Two-Step Imputation and AdaBoost-Based Classification for Early Prediction of Sepsis on Imbalanced Clinical Data
Sepsis is a life-threatening response to infection that causes tissue damage, organ failure, and death. Effective early prediction of sepsis would improve patients’ diagnosis and, by enabling appropriate early intervention, reduce the cost associated with late-stage sepsis. However, effective early prediction is challenging because sepsis biomarkers are neither obvious nor definitive, and sepsis datasets are heavily imbalanced against positive sepsis diagnoses and contain many missing values. Early prediction of sepsis in ICUs using clinical data is the objective of the PhysioNet/Computing in Cardiology Challenge 2019.
In this article, we propose a machine learning algorithm to aid in the early detection of sepsis.
We applied linear interpolation to fill missing values and implemented a sample-weighted AdaBoost model to predict sepsis 6 hours before clinical diagnosis.
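As a rough illustration of the two ingredients named above, the following minimal sketch fills interior gaps in an hourly series by linear interpolation and computes inverse-class-frequency sample weights for imbalanced labels. The function names, the toy heart-rate values, and the particular weighting formula are assumptions made for illustration, not the authors' implementation; the abstract does not specify how edge gaps are handled or how the weights were chosen, so leading and trailing gaps are simply left unfilled here.

```python
from typing import List, Optional


def interpolate_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Leading and trailing gaps stay None because there is no known value on
    one side to interpolate from (an assumption of this sketch).
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def balanced_sample_weights(labels: List[int]) -> List[float]:
    """Weight each sample by n_samples / (n_classes * count of its class)."""
    counts = {c: labels.count(c) for c in set(labels)}
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


# Hypothetical hourly heart-rate readings with missing entries.
heart_rate = [80.0, None, None, 92.0, None, 96.0, None]
# Hypothetical labels: sepsis-positive hours (1) are the minority class.
labels = [0, 0, 0, 0, 0, 0, 1, 1]

print(interpolate_linear(heart_rate))
print(balanced_sample_weights(labels))
```

Weights of this kind can be passed to a boosting implementation that accepts per-sample weights, for example through the `sample_weight` argument of `fit` in scikit-learn's `AdaBoostClassifier`, so that the rare positive hours count for more during training.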
The data cover more than 40,000 patients from three geographically distinct U.S. hospital systems and combine hourly vital signs, lab values, and static patient descriptions.
The challenge metric, however, did not directly reward models for their generalizability across institutions.
Our approach is evaluated using the Utility Score, a new metric that serves as the Challenge's official scoring criterion. Our approach was among the top 10% of entries to the Challenge on a hidden test set.
Herein, we demonstrate that our proposed approach was the most effective of the Challenge entrants when such generalizability is explicitly accounted for in model evaluation.