Holistic Prediction of the pKa in Diverse Solvents Based on a Machine‐Learning Approach

Software: ADMET Predictor®

Abstract

While many approaches exist for predicting aqueous pKa values, fast and accurate prediction of non‐aqueous pKa values remains challenging. Based on the iBonD experimental pKa database (39 solvents), a holistic pKa prediction model was established using machine learning. Structural and physical‐organic‐parameter‐based descriptors (SPOC) were introduced to represent the electronic and structural features of the molecules. The models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a low mean absolute error (MAE) of 0.87 pKa units. The approach allows a comprehensive mapping of all possible pKa correlations between different solvents, and it was validated by accurately predicting the aqueous pKa and micro‐pKa values of pharmaceutical molecules and the pKa values of organocatalysts in DMSO and MeCN. An online prediction platform was constructed based on the current model; it provides pKa predictions for different types of X−H acidity in the most commonly used solvents.
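
For readers unfamiliar with the metric, the MAE quoted above is the average absolute difference between predicted and experimental pKa values. The following is a minimal sketch of that calculation only. The function name and the numbers are hypothetical and chosen for illustration; they are not taken from the paper or from the iBonD database, and the sketch does not reproduce the authors' model.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up pKa values, for illustration only (NOT data from the paper).
experimental = [4.8, 9.9, 12.3, 18.0]
predicted = [5.3, 9.4, 13.3, 17.0]

mae = mean_absolute_error(experimental, predicted)
print(f"MAE = {mae:.2f} pKa units")
```

An MAE of 0.87 pKa units therefore means that, averaged over the evaluated compounds, predictions deviate from experiment by somewhat less than one pKa unit.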

By Qi Yang, Yao Li, Dr. Jin‐Dong Yang, Yidi Liu, Dr. Long Zhang, Prof. Dr. Sanzhong Luo, Prof. Dr. Jin‐Pei Cheng