Machine learning driven bioequivalence risk assessment at an early stage of generic drug development

Publication: Eur J Pharm Biopharm
Division: PBPK

Abstract

Background

Bioequivalence risk assessment, as an extension of quality risk management, lacks examples of quantitative approaches to risk assessment at an early stage of generic drug development. The aim of our study was to develop a model-based approach for bioequivalence risk assessment that uses pharmacokinetic and physicochemical characteristics of drugs as predictors and standardizes the first step of risk assessment.

Methods

The Sandoz in-house bioequivalence database of 128 bioequivalence studies with poorly soluble drugs (23.5% non-bioequivalent) was used to train and validate the model. Four modeling approaches (random forest, XGBoost, logistic regression and naïve Bayes) were compared.
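
With roughly one study in four being non-bioequivalent, a held-out test set is easier to interpret if it keeps a similar class ratio. The following is a minimal sketch of a stratified split, assuming a simple index-based split; the abstract does not describe how the data were actually partitioned. The function name `stratified_split`, the 20% test fraction, and the synthetic labels are illustrative assumptions, not the authors' procedure or data.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels only: 128 studies, 30 marked non-bioequivalent (1),
# which is close to the proportion reported in the abstract.
labels = [1] * 30 + [0] * 98
train_idx, test_idx = stratified_split(labels)
```

Splitting within each class separately keeps the minority (non-bioequivalent) class represented in the test set, which matters when accuracy is later reported on that set.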

Results

Among the best-performing machine learning models, random forest was selected and optimized for the number of features, resulting in an accuracy of 84% on the test data set. The most important features for prediction were those related to solubility (dose number, acid dissociation constant), absorption and elimination rate, effective permeability, variability of pharmacokinetic endpoints, and absolute bioavailability. All features had a conceivable influence on the model predictions.
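
As an illustration of one of the solubility-related features named above, the dose number is conventionally defined as the dose dissolved in a reference volume divided by the drug's solubility. The sketch below assumes the usual 250 mL reference volume; the function name, parameter names, and example values are hypothetical and are not taken from the study.

```python
def dose_number(dose_mg, solubility_mg_per_ml, volume_ml=250.0):
    """Dose number Do = (dose / reference volume) / solubility.

    volume_ml defaults to 250 mL, the conventional reference volume;
    this default is an assumption, not a value quoted from the study.
    """
    if solubility_mg_per_ml <= 0 or volume_ml <= 0:
        raise ValueError("solubility and volume must be positive")
    return (dose_mg / volume_ml) / solubility_mg_per_ml


# Hypothetical drug: 100 mg dose, solubility of 0.01 mg/mL.
example_do = dose_number(dose_mg=100.0, solubility_mg_per_ml=0.01)
```

A dose number above 1 means the dose would not fully dissolve in the reference volume, which is why the feature is informative for poorly soluble drugs; the hypothetical example gives a value of about 40.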

Conclusion

The model was used to develop a bioequivalence risk assessment approach to categorize drugs in early development into high, medium or low risk classes.
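
One way such a categorization could work is to threshold the model's predicted probability of a non-bioequivalent outcome. The abstract does not state how the classes are derived, so the sketch below is an assumption throughout: the function name `risk_class` and the cutoffs of 0.3 and 0.6 are hypothetical placeholders, not values from the study.

```python
def risk_class(p_non_be, low_cutoff=0.3, high_cutoff=0.6):
    """Map a predicted probability of non-bioequivalence to a risk class.

    The cutoffs are illustrative placeholders, not values from the study.
    """
    if not 0.0 <= p_non_be <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if not low_cutoff < high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")

    if p_non_be >= high_cutoff:
        return "high"
    if p_non_be >= low_cutoff:
        return "medium"
    return "low"


# Hypothetical predicted probabilities for three drug candidates.
classes = [risk_class(p) for p in (0.10, 0.45, 0.80)]
```

In practice the cutoffs would need to be chosen against the cost of a failed bioequivalence study versus the cost of additional early development work.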

By Dejan Krajcar, Dejan Velušček, Iztok Grabnar