Abstract
Drug discovery is a rigorous process that can cost up to 3 billion dollars and take more than 10 years to bring a new therapeutic from bench to bedside. While virtual screening (such as molecular docking) can significantly speed up the discovery process and improve hit rates, its throughput lags behind the explosive growth of publicly available chemical databases, which already contain billions of entries. This recent surge of available chemical entities presents great opportunities for discovering novel classes of small-molecule drugs, but it also creates a significant demand for faster docking methods. In this thesis, we illustrated the need for a faster screening method by virtually screening 7.6 million molecules against Thymocyte selection-associated high mobility group box protein (TOX). We then demonstrated that the deep learning-based method of ‘Progressive Docking (PD2.0)’ can speed up such virtual screening by up to a hundredfold. In particular, by utilizing deep learning QSAR models trained on the docking scores of a subset of the database, one can iteratively approximate the docking outcome of the unprocessed entries. We tested the developed method against various targets, including ETS transcription factor ERG, Estrogen Receptor Activation Function 2 (ERAF2), Androgen Receptor (AR), Estrogen Receptor (ER), and Sodium-Ion Channel (Nav1.7), among others.
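The iterative idea summarized above can be illustrated with a minimal sketch. It is not the PD2.0 implementation: the one-dimensional descriptor, the `mock_dock` scoring function, the nearest-neighbour surrogate (standing in for a deep learning QSAR model), and all parameter names and values are assumptions chosen only to keep the example self-contained.

```python
import random

def mock_dock(descriptor):
    # Hypothetical stand-in for a docking run; lower score means better binding.
    return (descriptor - 0.3) ** 2

def knn_predict(docked, x, k=3):
    # Assumed surrogate model: average score of the k nearest docked molecules.
    # A real workflow would use a trained QSAR model here instead.
    nearest = sorted(docked, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)

def progressive_screen(library, n_iter=3, sample_size=50,
                       keep_fraction=0.3, seed=0):
    rng = random.Random(seed)
    pool = list(library)   # molecules that have not been docked yet
    docked = []            # (descriptor, docking score) pairs
    for _ in range(n_iter):
        if len(pool) <= sample_size:
            break
        # 1. Dock a random subset of the remaining pool.
        rng.shuffle(pool)
        batch, pool = pool[:sample_size], pool[sample_size:]
        docked.extend((x, mock_dock(x)) for x in batch)
        # 2. Predict scores for the undocked entries with the surrogate
        #    and keep only the fraction predicted to score best.
        pool.sort(key=lambda x: knn_predict(docked, x))
        pool = pool[:max(1, int(len(pool) * keep_fraction))]
    # 3. Dock only the molecules that survived every filtering round.
    docked.extend((x, mock_dock(x)) for x in pool)
    return docked

library = [i / 999 for i in range(1000)]   # toy descriptors in [0, 1]
results = progressive_screen(library)
best = min(results, key=lambda pair: pair[1])
n_docked = len(results)
```

In this toy run only a fraction of the library is ever passed to `mock_dock`, which is where the speed-up would come from; the size of that fraction depends entirely on the assumed `sample_size` and `keep_fraction` values.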