Predictive Models Based on Molecular Images and Molecular Descriptors for Drug Screening

Publication: ACS Omega
Software: ADMET Predictor®

Abstract

Various toxicity and pharmacokinetic evaluations are needed as screening experiments at the drug discovery stage. To reduce animal experiments and development costs, high-performance predictive models based on quantitative structure–activity relationship analysis are desired. From these evaluation targets, we selected 50% lethal dose (LD50), blood–brain barrier penetration (BBBP), and the clearance (CL) pathway for this investigation and constructed predictive models for each target using 636–11,886 compounds. First, we constructed predictive models using the DeepSnap-deep learning (DL) method, with images of compounds as features. The calculated area under the curve (AUC) and balanced accuracy (BAC) were, respectively, 0.887 and 0.818 for LD50, 0.893 and 0.824 for BBBP, and 0.883 and 0.763 for the CL pathway. Next, molecular descriptors (MDs) of the compounds were calculated using Molecular Operating Environment, alvaDesc, and ADMET Predictor, and MD-based predictive models were constructed from these MDs using DataRobot. The calculated AUC and BAC were, respectively, 0.931 and 0.805 for LD50, 0.919 and 0.849 for BBBP, and 0.900 and 0.807 for the CL pathway. Finally, we constructed predictive models combining the DeepSnap-DL and MD-based methods. In ensemble models using the mean predictive probability of the DeepSnap-DL and MD-based methods, the calculated AUC and BAC were, respectively, 0.942 and 0.842 for LD50, 0.936 and 0.853 for BBBP, and 0.908 and 0.832 for the CL pathway, an improvement over either single method for all three targets.
Moreover, in consensus models that adopted only compounds for which the results of the two methods agreed, the calculated BAC values for LD50, BBBP, and the CL pathway were 0.916, 0.918, and 0.847, respectively, indicating higher predictive performance than the ensemble models for all three targets. The predictive models combining the DeepSnap-DL and MD-based methods displayed high predictive performance for LD50, BBBP, and the CL pathway. Therefore, the application of this approach to prediction targets in various drug discovery screenings is expected to accelerate drug discovery.
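
The two combination strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy probabilities, and the 0.5 classification threshold are all assumptions made for illustration, and the abstract does not state which cutoff was used. The ensemble averages the predicted probabilities of the two methods; the consensus keeps only compounds for which both methods assign the same class, so it scores fewer compounds than the ensemble.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute BAC")
    return 0.5 * (tp / n_pos + tn / n_neg)


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binarize probabilities; the 0.5 threshold is an assumed default."""
    return [1 if p >= threshold else 0 for p in probs]


def ensemble_mean(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Ensemble: mean predicted probability of the two methods."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def consensus_subset(
    y_true: Sequence[int],
    p_a: Sequence[float],
    p_b: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Consensus: keep only compounds where both methods predict the same class."""
    kept_true: List[int] = []
    kept_pred: List[int] = []
    for t, la, lb in zip(y_true, to_labels(p_a, threshold), to_labels(p_b, threshold)):
        if la == lb:
            kept_true.append(t)
            kept_pred.append(la)
    return kept_true, kept_pred


# Hypothetical toy data: six compounds, two models (image-based and MD-based).
y_true = [1, 1, 1, 0, 0, 0]
p_image = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]
p_md = [0.8, 0.8, 0.2, 0.3, 0.2, 0.4]

bac_ensemble = balanced_accuracy(y_true, to_labels(ensemble_mean(p_image, p_md)))
cons_true, cons_pred = consensus_subset(y_true, p_image, p_md)
bac_consensus = balanced_accuracy(cons_true, cons_pred)
coverage = len(cons_true) / len(y_true)

print(f"ensemble BAC:  {bac_ensemble:.3f}")
print(f"consensus BAC: {bac_consensus:.3f} on {coverage:.0%} of compounds")
```

Reporting the coverage alongside the consensus BAC makes the trade-off explicit: the consensus score is computed only on the compounds where the two methods agree, whereas the ensemble returns a prediction for every compound.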

By Hideaki Mamada, Mari Takahashi, Mizuki Ogino, Yukihiro Nomura, and Yoshihiro Uesawa