Abstract
Background
Bioequivalence risk assessment as an extension of quality risk management lacks examples of quantitative approaches to risk assessment at an early stage of generic drug development. The aim of our study was to develop a model-based approach for bioequivalence risk assessment that uses pharmacokinetic and physicochemical characteristics of drugs as predictors and would standardize the first step of risk assessment.
Methods
The Sandoz in-house bioequivalence database of 128 bioequivalence studies with poorly soluble drugs (23.5% non-bioequivalent) was used to train and validate the model. Four different modeling approaches, random forest, XGBoost, logistic regression and naïve Bayes, were compared.
Results
Among the best performing machine learning models, random forest was selected and optimized for the number of features, resulting in an accuracy of 84% on the test data set. The most important features for prediction were those related to solubility (dose number, acid dissociation constant), absorption and elimination rate, effective permeability, variability of pharmacokinetic endpoints, and absolute bioavailability. All features had a conceivable influence on the model predictions.
Conclusion
The model was used to develop a bioequivalence risk assessment approach to categorize drugs in early development into high, medium or low risk classes.
Graphical abstract
By Dejan Krajcar, Dejan Velušček, Iztok Grabnar