Models of Regioselectivity (Likely Sites of Metabolic Attack) for Human Cytochrome P450 Enzymes 1A2, 2A6, 2B6, 2C8, 2C19, 2C9, 2D6, 2E1, and 3A4


Metabolism plays a critical role in the bioavailability of drugs and in drug-drug interactions. The cytochrome P450 enzymes (CYPs) are probably the most important class of Phase I metabolizing enzymes, accounting for the majority of Phase I metabolic transformations of most drugs. Knowledge of the specific metabolites resulting from these transformations is often important in understanding toxicity, efficacy (in the case of prodrugs), clearance, and many other key aspects of drug pharmacokinetics. Complementing our substrate classification models, we have also developed models for predicting atomic sites of metabolic oxidation for CYP isoforms 1A2, 2A6, 2B6, 2C8, 2C19, 2C9, 2D6, 2E1, and 3A4. Together, these isoforms account for ~90-95% of CYP-mediated reactions. Tools for generating the likely metabolite(s) given the specific atomic site(s) of metabolism are in development, as is the extension of the substrate classification and site of oxidation models to additional CYP isoforms.

The datasets used to train the models make use of our own extensively curated and updated version of the Accelrys Metabolite database (distributed as the Symyx Metabolite database prior to the merger of Symyx and Accelrys) for the majority of reactions and associated literature references. Other data sources included published datasets of sites of metabolism and general review articles. The data underwent substantial curation based on information contained in original literature citations as well as additional references found in the course of this work. These curation updates included corrections to reactant and product structures, CYP assignments, and often additional sites identified in more recent publications. Some site assignments were removed when further literature investigations revealed the particular reaction in question was mediated by a different CYP or even a non-CYP enzyme.

AP_Metabolite_7.png
Molecule-based performance of CYP 450 site models. "Top N" columns show the percentage of molecules correctly classified, i.e., molecules for which at least one of the top N scoring atoms is an observed site of metabolism.
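
The "Top N" metric described in the caption can be sketched as follows. This is a minimal illustration, not ADMET Predictor code: the function name, the data layout (a mapping from atom labels to scores plus a set of observed sites), and the example values are all assumptions made for the sake of the example.

```python
def top_n_accuracy(molecules, n):
    """Percentage of molecules with an observed site among the n top-scoring atoms.

    molecules: list of (scores, observed) pairs, where `scores` maps an atom
    label to its model score and `observed` is the set of experimentally
    observed sites of metabolism (hypothetical layout, for illustration only).
    """
    if not molecules:
        raise ValueError("need at least one molecule")
    hits = 0
    for scores, observed in molecules:
        # Rank atom labels from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        # A molecule counts as correct if any of its top n atoms was observed.
        if observed.intersection(ranked[:n]):
            hits += 1
    return 100.0 * hits / len(molecules)


# Made-up data: two molecules with hypothetical atom labels and scores.
example = [
    ({"C1": 900, "C4": 500, "N7": 100}, {"C4"}),
    ({"C2": 300, "O5": 800}, {"O5"}),
]
print(top_n_accuracy(example, 1))  # 50.0
print(top_n_accuracy(example, 2))  # 100.0
```

In this made-up example the first molecule is missed at N = 1 because its observed site is only the second-highest scoring atom, which is why the Top 2 figure is higher than the Top 1 figure.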

Each candidate atom of a molecule receives a continuous score (a propensity towards metabolic attack) from the site model for a given CYP. The highest scoring atom of a given molecule is classified as a site (unless the score is below a predetermined threshold) and all other atoms of that molecule within a predetermined fraction of that score are also assigned as sites. The atomic scores and the site assignments may be viewed in the Structure Visualization window within ADMET Predictor. A sample display of atomic CYP scores is shown below:

AP_Metabolite_1.png

The scores range from 0-1000 with higher scores indicating a greater likelihood of being a metabolic site. The highest scoring atom is highlighted with a red hashed circle. Other atoms with scores near the maximum are also highlighted (none in this case). Atoms that are not candidate sites are not assigned a score.
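
The site assignment rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the actual threshold and fraction used by the models are not given here, so `min_score` and `min_ratio` are hypothetical parameters, and reading "within a predetermined fraction of that score" as "at least `min_ratio` times the top score" is one plausible interpretation.

```python
def assign_sites(scores, min_score, min_ratio):
    """Return the set of atoms classified as likely sites of metabolism.

    scores:    mapping of candidate atom label -> score (0-1000)
    min_score: hypothetical threshold the top score must reach
    min_ratio: hypothetical fraction of the top score (0-1) that other
               atoms must reach to be assigned as sites as well
    """
    if not scores:
        return set()
    top = max(scores.values())
    # No site is assigned when even the best atom scores below the threshold.
    if top < min_score:
        return set()
    # The top atom always qualifies; others qualify if close enough to it.
    return {atom for atom, score in scores.items() if score >= min_ratio * top}


# Made-up scores for three candidate atoms.
scores = {"C1": 850, "C5": 790, "N3": 400}
print(sorted(assign_sites(scores, min_score=500, min_ratio=0.9)))  # ['C1', 'C5']
print(assign_sites(scores, min_score=900, min_ratio=0.9))          # set()
```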

It is critical to bear in mind that the models were trained on known substrates of a given CYP enzyme. Many factors determine WHETHER a molecule is actually an observed substrate, condition-dependent chemical kinetics being one of them. The atomic properties that make an atom a likely site of metabolism are independent of whether the molecule as a whole is a substrate and can therefore be metabolized at all. Consequently, the site prediction displays depend on the corresponding CYP substrate classification models described below. If the molecule is predicted to be a non-substrate, the corresponding Structure Visualization display still shows the predicted sites, but highlights them with gray (rather than red) hashed circles, as seen below. In other words, gray highlights mean: "this/these would be the likely atomic site(s) if the molecule were a substrate".

AP_Metabolite_2.png



Substrate Classification Models for Human Cytochrome P450 Enzymes 1A2, 2A6, 2B6, 2C8, 2C19, 2C9, 2D6, 2E1, and 3A4

The ADMET Predictor human P450 substrate classification package includes models that predict whether a given chemical structure is a substrate of each of these P450 isozymes in Phase I metabolism.

The P450 substrate/non-substrate data sets were compiled from the Accelrys Metabolite database, the DrugBank database, and other public resources. The data were extensively curated against the original research literature. A large number of errors in the literature and databases, such as incorrect chemical structures and incorrect enzyme information, were identified and corrected.

AP_Metabolite_8.png
Performance of substrate models of Cytochrome P450: 1A2, 2C19, 2C9, 2D6, and 3A4.


Inhibition Models for Human Cytochrome P450 Enzymes 1A2, 2C19, 2C9, 2D6, and 3A4

The inhibitory potency of drugs against cytochrome P450 enzymes is important for the study of drug toxicities and drug-drug interactions. The ADMET Predictor™ CYP inhibition classification package includes five global inhibition models, one each for the CYP 1A2, 2C19, 2C9, 2D6, and 3A4 isoforms, as well as two substrate-specific inhibition models: one for Human Liver Microsome (HLM) CYP 3A4 with midazolam as substrate and one for recombinantly expressed CYP 3A4 with testosterone as substrate.

The inhibitor/non-inhibitor cutoff value was set at Ki = 30 µM for CYP 1A2. Exact threshold values for the remaining models are unknown, since the training databases for these models contained only binary indicators. Non-substrates that have not been identified as inhibitors of CYP 2C19 were accepted as non-inhibitors, as very few non-inhibitors of CYP 2C19 have been publicly identified.

MET_Inh.PNG
Performance of Inhibition Models of CYP P450 1A2, 2C19, 2C9, 2D6, and 3A4

In addition, ADMET Predictor features two regression models for predicting substrate-specific inhibition constant (Ki) values, in µM, for HLM CYP 3A4 with midazolam as substrate and for recombinantly expressed CYP 3A4 with testosterone as substrate, respectively.

MET_Inh_Regr.PNG
ADMET Predictor MET_3A4_Ki_mid (left) and MET_3A4_Ki_tes (right) Models Validation


Kinetic Models for Metabolism by Human Cytochrome P450 Enzymes 1A2, 2C9, 2C19, 2D6, and 3A4

The Michaelis-Menten constant (Km), a measure of the affinity of the enzyme for its substrate, the maximum metabolic rate (Vmax), and the intrinsic clearance (CLint) are three important parameters describing the activity of cytochrome P450 enzymes, a superfamily of hemoproteins involved in drug metabolism in the human body. Different metabolites are possible for the same molecule and the same enzyme, and they are usually formed at different rates.

Within its Metabolism Module, ADMET Predictor provides a package of human CYP450 enzyme kinetic models, comprising Km, Vmax, and CLint models for five important CYP isozymes: 1A2, 2C9, 2C19, 2D6, and 3A4. It is crucial to understand that these models are resolved atomically, i.e., the three kinetic parameters are predicted for each detected site of metabolic attack by these CYPs. Previous versions of these models delivered only an overall rate of metabolism per substrate. The models were developed using Artificial Neural Network Ensemble (ANNE) methodology and 2D molecular descriptors. Although the three models were developed independently, their outputs have been reconciled to always obey the CLint = Vmax / Km relationship. The predicted parameter values are expected to be used in human physiological pharmacokinetic/pharmacodynamic (PK/PD) models for purposes of risk assessment and to support decision-making in drug discovery.
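
How the outputs are reconciled is not described here. As a purely illustrative sketch, one simple way to make three independent predictions obey CLint = Vmax / Km is to work in log units, where the relationship becomes log CLint = log Vmax - log Km, and to spread any disagreement equally over the three values (a least-squares projection onto that constraint). The function name and the equal-weight choice are assumptions, not the method used in ADMET Predictor.

```python
def reconcile_logs(log_km, log_vmax, log_clint):
    """Adjust three log10 predictions so that log_clint == log_vmax - log_km.

    Hypothetical scheme: the disagreement between the predicted log CLint
    and the value implied by Vmax / Km is split equally among the three
    predictions, which is the smallest total adjustment in the
    least-squares sense.
    """
    residual = log_clint - (log_vmax - log_km)
    shift = residual / 3.0
    return log_km - shift, log_vmax + shift, log_clint - shift


# Made-up predictions that disagree by 0.3 log units.
km, vmax, clint = reconcile_logs(1.0, 2.0, 1.3)
print(round(km, 3), round(vmax, 3), round(clint, 3))  # 0.9 2.1 1.2
```

After the adjustment, the reconciled log CLint equals the difference of the reconciled log Vmax and log Km, so the three values are consistent when converted back from log units.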

Experimental Km, Vmax, and CLint data for 77-165 individual reactions per enzyme were compiled from the literature with careful examination of the original articles. For each enzyme, the dataset contained substrates with kinetic parameters measured in in vitro metabolic studies on enzyme-specific microsomes from virus-infected cells expressing the cloned human enzyme (recombinant data). The dataset also included 138 human liver microsomal CYP3A4-mediated metabolic reactions (HLM data). Note that experimental results obtained with recombinant enzyme can differ substantially from results obtained with liver microsomes due to differences in P450 Reductase : CYP ratios and in cytochrome b5 content.

The ranges of the squared correlation coefficient, R², and the root-mean-squared error (RMSE, in log units), all calculated on external test sets, are shown in the table below.

Model        R² range       RMSE range
log Km       0.626-0.864    0.422-0.554
log Vmax     0.470-0.830    0.465-0.566
log CLint    0.647-0.861    0.430-0.649
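
For readers who want to compute the same two statistics on their own test sets, here is a minimal sketch. It assumes that "squared correlation coefficient" means the squared Pearson correlation between observed and predicted log values; the exact definition used for the table is not stated, and the function name and data are hypothetical.

```python
import math


def r_squared_and_rmse(observed, predicted):
    """Squared Pearson correlation and RMSE for paired log-scale values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    r2 = (sxy * sxy) / (sxx * syy)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, rmse


# Made-up observed and predicted log Km values.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r_squared_and_rmse(obs, pred)
print(round(r2, 3), round(rmse, 3))  # 0.982 0.158
```

Because the RMSE is computed on log-transformed values, an RMSE of 0.5 corresponds to a typical error of about a factor of three in the underlying parameter.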

MET_2D6.PNG
ADMET Predictor MET_2D6_Km (left), MET_2D6_Vmax (center), and MET_2D6_CLint (right) Models' Performance

MET_2D6_kinetics_propranolol_deriv.PNG
An illustration of site-specific kinetics of metabolic oxidation of a propranolol derivative by human CYP 2D6. The first diagram shows metabolic propensity scores calculated by our regioselectivity model CYP_2D6_Sites.


Probability of Metabolism by Human Uridine 5'-Diphosphate-Glucuronosyltransferases (UGT)

The UGT enzymes, distributed in various organs of the human body, catalyze the Phase II glucuronidation reaction (formation of a linkage between glucuronic acid and a nucleophile), which facilitates the elimination of xenobiotics. Some compounds are glucuronidated directly by UGTs without first being metabolized by Phase I enzymes. Most of the UGT enzymes in humans are produced by the liver, but one, UGT 1A10, is produced by the GI tract.

We have developed classification QSAR models from literature data for nine UGT isozymes that mediate Phase II drug metabolism: UGT1A1, UGT1A3, UGT1A4, UGT1A6, UGT1A8, UGT1A9, UGT1A10, UGT2B7, and UGT2B15. The UGT models predict whether a compound will be metabolized by one or more of these enzymes and were developed using the Artificial Neural Network Ensemble (ANNE) training methodology. Independent test sets were used to validate all models.

MET_UGT.PNG
Performance of Classification Models for UGT-mediated Metabolism