Scientific Journal of Biomedical Engineering

Generalizable protein druggability prediction with a two-stage classifier and time-constrained Bayesian optimization

Document Type: Full Research Paper

Authors

Department of Biomedical Engineering, Faculty of Engineering, Meybod University, Meybod, Iran

DOI: 10.22041/ijbme.2025.2039584.1922
Abstract
Drug discovery demands rapid and reliable identification of protein targets, yet most automated approaches prioritize accuracy while overlooking inference latency, robustness, and reproducible tuning. Sequence- and graph-based deep models are often costly and slow, and evolutionary hyperparameter search exhibits high variance. The central gap is the absence of a method that jointly optimizes accuracy and latency while routing easy and hard cases differently and setting control parameters appropriately. We therefore propose a two-stage classifier built on CatBoost, with early exit for high-confidence instances and a deeper pass for ambiguous ones. It is coupled with latency-aware Bayesian Hyperband (LABO-HB) for hyperparameter tuning, a single scalar objective that combines accuracy and latency, and data-driven threshold selection. After a cost–benefit audit of deep versus boosted alternatives, we adopted this two-stage design because it preserves accuracy while reducing inference time and performance variance. We evaluated the method on the balanced ProTar-II dataset, the unseen ProTar-II-Ind set, and DPI-CDF, which is compiled from DrugBank and Swiss-Prot, using rigorous preprocessing, sequence-based feature extraction, strict leakage prevention, and fair train–validation–test splits. Generalization remained stable across internal and external benchmarks. The method achieved 96.6% overall accuracy and an F1 score of 96.5%, while significantly reducing prediction time relative to baselines. The combination of early exit and LABO-HB delivers high accuracy with lower latency and reduced variability, is robust to class imbalance, and yields reproducible outputs. The approach is practically useful in drug design: it enables agile target prioritization during early screening and more judicious allocation of laboratory resources.
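
The early-exit routing and the combined accuracy and latency objective described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the confidence thresholds (`low`, `high`), the latency budget, and the penalty form are all assumptions made for illustration, and the toy probability functions merely stand in for the shallow and deep CatBoost stages. The abstract does not state how accuracy and latency are combined, so the linear penalty below is only one plausible choice.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
ProbaFn = Callable[[Features], float]  # returns an estimate of P(druggable)


def two_stage_predict(
    x: Features,
    fast_model: ProbaFn,
    deep_model: ProbaFn,
    low: float = 0.1,    # hypothetical lower confidence threshold
    high: float = 0.9,   # hypothetical upper confidence threshold
) -> Tuple[int, str]:
    """Return (label, stage); exit early when the fast stage is confident."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    p_fast = fast_model(x)
    if p_fast >= high:
        return 1, "fast"
    if p_fast <= low:
        return 0, "fast"
    # Ambiguous instance: defer to the deeper, slower stage.
    return int(deep_model(x) >= 0.5), "deep"


def latency_aware_score(
    accuracy: float,
    latency_ms: float,
    budget_ms: float,
    penalty: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Scalar objective (assumed form): accuracy minus a penalty for
    exceeding the latency budget, measured relative to that budget."""
    if budget_ms <= 0:
        raise ValueError("budget_ms must be positive")
    overshoot = max(0.0, latency_ms - budget_ms) / budget_ms
    return accuracy - penalty * overshoot


# Toy stand-ins for the two stages: each reads a probability from the input.
def toy_fast(x: Features) -> float:
    return x[0]


def toy_deep(x: Features) -> float:
    return x[1]


if __name__ == "__main__":
    print(two_stage_predict([0.95, 0.0], toy_fast, toy_deep))
    print(two_stage_predict([0.50, 0.7], toy_fast, toy_deep))
    print(latency_aware_score(accuracy=0.9, latency_ms=30.0, budget_ms=20.0))
```

In a sketch like this, a tuner would maximize `latency_aware_score` over hyperparameters, while `low` and `high` would be chosen from validation data rather than fixed by hand.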

Volume 19, Issue 3
Autumn 2025
Pages 231-240

  • Received: 26 August 2024
  • Revised: 20 September 2025
  • Accepted: 24 November 2025