A novel strategy for prediction of human plasma protein binding using machine learning techniques

Publication: Chemometrics and Intelligent Laboratory Systems
Software: ADMET Predictor®

Abstract

Plasma protein binding (PPB) is a key determinant of drug ADME (absorption, distribution, metabolism, elimination) behavior and therefore has a significant impact on drug efficacy and toxicity. As drug discovery enters the era of rational drug design, it is desirable to use in silico models to predict PPB, enabling rapid initial screening of potential candidate compounds before time-consuming and costly in vitro and in vivo experimental assays. In this study, a global quantitative structure-activity relationship (QSAR) model of PPB was built on a large training set of more than 5000 compounds that covers a broad range of structural diversity. The uneven distribution of PPB values was commonly rectified using two mathematical transformations of PPB, but this reduced prediction accuracy at lower binding levels. To resolve this problem, we proposed a novel strategy of building separate models for different binding levels. The best model achieved a mean absolute error (MAE) of 0.076 on the test set, much lower than those of published models, and the MAE was further reduced to 0.041 at the high binding level (0.8–1). The models also performed well on the validation set, which contained some compounds from traditional Chinese medicine. In addition, the applicability domain was determined to identify new compounds that are appropriate for prediction with the built models. In conclusion, this study developed a novel strategy for constructing robust QSAR models for PPB prediction, which chemists could use to predict the PPB of candidate compounds efficiently and to guide structural modification in the early stages of drug development.
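The two error figures quoted in the abstract are the same statistic computed over different subsets: MAE is the mean of |observed − predicted| fraction bound, and the 0.8–1 figure restricts that mean to highly bound compounds. The minimal Python sketch below assumes PPB is expressed as a fraction bound between 0 and 1 and that compounds are assigned to a band by their observed value. The function names `mean_absolute_error` and `mae_in_band` and the five observed/predicted values are hypothetical illustrations, not code or data from the study.

```python
from typing import Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|) over paired observations."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_in_band(
    y_true: Sequence[float], y_pred: Sequence[float], band: Tuple[float, float]
) -> float:
    """MAE restricted to compounds whose OBSERVED fraction bound lies in [lo, hi].

    Assumption: band membership is decided by the observed value, not the prediction.
    """
    lo, hi = band
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if lo <= t <= hi]
    if not pairs:
        raise ValueError("no compounds fall in the requested band")
    observed_in_band, predicted_in_band = zip(*pairs)
    return mean_absolute_error(observed_in_band, predicted_in_band)


# Hypothetical fraction-bound values for five compounds (illustration only).
observed = [0.15, 0.45, 0.85, 0.95, 0.99]
predicted = [0.30, 0.40, 0.80, 0.97, 0.98]

print(f"overall MAE: {mean_absolute_error(observed, predicted):.3f}")  # overall MAE: 0.056
print(f"MAE in 0.8-1: {mae_in_band(observed, predicted, (0.8, 1.0)):.3f}")  # MAE in 0.8-1: 0.027
```

Stratifying by observed binding keeps each compound in the same band regardless of which model is evaluated, so per-band errors stay comparable across models; the abstract does not state which convention the study used.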