Abstract
By using an in-house data set of small-molecule structures, encoded by Ghose–Crippen parameters, several machine learning techniques were applied to distinguish between kinase inhibitors and other molecules with no reported activity on any protein kinase. All four approaches pursued—support-vector machines (SVM), artificial neural networks (ANN), k nearest neighbor classification with GA-optimized feature selection (GA/kNN), and recursive partitioning (RP)—proved capable of providing a reasonable discrimination. Nevertheless, substantial differences in performance among the methods were observed. For all techniques tested, the use of a consensus vote of the 13 different models derived improved the quality of the predictions in terms of accuracy, precision, recall, and F1 value. Support-vector machines, followed by the GA/kNN combination, outperformed the other techniques when comparing the average of individual models. By using the respective majority votes, the prediction of neural networks yielded the highest F1 value, followed by SVMs.