The New Zealand Medical Journal

Journal of the New Zealand Medical Association, 11-June-2010, Vol 123 No 1316

Illness severity scoring for Intensive Care at Middlemore Hospital, New Zealand: past and future
Susan L Mann, Mark R Marshall, Alec Holt, Brendon Woodford, Anthony B Williams


The Acute Physiological and Chronic Health Evaluation (APACHE) II score is a popular illness severity scoring system for intensive care units (ICUs). Scoring systems such as the APACHE II allow researchers and clinicians to quantify patient illness severity with a greater degree of accuracy and precision, which is critical when evaluating practice patterns and outcomes, both within and between intensive care units. This study aims to: assess changes in APACHE II scores and the hospital-standardised mortality ratio at our ICU over a nine-year period from 1 January 1997 to 31 December 2005; assess changes in the performance of the APACHE II scoring system in predicting patient hospital mortality over the same period; and identify any clinical subgroups in which APACHE II scoring was particularly inaccurate or imprecise.

Retrospective audit of a single-centre relational database. The APACHE II scoring system was evaluated by year for discrimination (the ability to distinguish patients who will die from those who will survive to hospital discharge) using receiver operating characteristic (ROC) curves, and for calibration (the ability to predict mortality rate across classes of risk) using goodness-of-fit as assessed by the Hosmer-Lemeshow statistic.
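The two performance measures described above can be illustrated with a minimal sketch. It assumes each patient has a predicted risk of death (between 0 and 1) and an observed outcome (1 = died in hospital, 0 = survived). The function names, the use of ten equal-sized risk groups, and the degrees of freedom are illustrative assumptions, not details taken from the study.

```python
from typing import Sequence, Tuple


def roc_auc(risks: Sequence[float], died: Sequence[int]) -> float:
    """Discrimination: area under the ROC curve in its pairwise form.

    This is the probability that a randomly chosen non-survivor was given a
    higher predicted risk than a randomly chosen survivor (ties count 0.5).
    """
    dead = [r for r, d in zip(risks, died) if d == 1]
    alive = [r for r, d in zip(risks, died) if d == 0]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for risk_dead in dead:
        for risk_alive in alive:
            if risk_dead > risk_alive:
                wins += 1.0
            elif risk_dead == risk_alive:
                wins += 0.5
    return wins / (len(dead) * len(alive))


def hosmer_lemeshow(
    risks: Sequence[float], died: Sequence[int], groups: int = 10
) -> Tuple[float, int]:
    """Calibration: Hosmer-Lemeshow chi-square statistic and degrees of freedom.

    Patients are sorted by predicted risk and split into `groups` bins of
    roughly equal size. Within each bin, observed deaths are compared with
    the sum of predicted risks (expected deaths).
    """
    pairs = sorted(zip(risks, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    bins_used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(r for r, _ in chunk)
        observed = sum(d for _, d in chunk)
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        # Simplification: bins with zero variance (all risks 0 or 1) are skipped.
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
        bins_used += 1
    # Assumption: bins - 2 degrees of freedom; some authors use the number of
    # bins itself when validating a model on external data.
    return statistic, max(bins_used - 2, 0)
```

A larger Hosmer-Lemeshow statistic indicates a wider gap between predicted and observed mortality; converting it to a p-value requires the chi-square distribution, which is left out of this sketch.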

Data from 7703 patients were available for analysis. There was a decrease in overall hospital mortality, from approximately 19% at the beginning of the period of observation to approximately 12% at the end. There was also a decrease in the hospital-standardised mortality ratio from 0.94 (95% CI 0.82–1.06) to 0.66 (95% CI 0.55–0.76). In general, both the APACHE II score and risk of death model performed adequately in each year, with areas under the ROC curve (AUC) of >0.8, albeit with progressively poorer performance over time and ‘model fade’ that approached statistical significance. Calibration of the APACHE II risk of death model became progressively poorer, as indicated by the Hosmer-Lemeshow statistic, with a statistically significant difference between predicted and observed mortality from 2003 onwards. Overall, there was moderately poor model performance in the diagnostic groups with the largest number of patients (sepsis and post-surgical complications).
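For readers unfamiliar with the standardised mortality ratio reported above, the following sketch shows one common way to compute it: observed hospital deaths divided by the deaths expected from the model's predicted risks. The confidence interval uses a simple normal approximation that treats the observed count as Poisson; this is an assumption made for illustration, and the interval method used in the study is not stated in this abstract.

```python
import math
from typing import Sequence, Tuple


def standardised_mortality_ratio(
    risks: Sequence[float], died: Sequence[int], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (SMR, lower bound, upper bound).

    SMR = observed deaths / expected deaths, where expected deaths is the
    sum of the predicted risks of death. An SMR below 1 means fewer deaths
    occurred than the model predicted.
    """
    observed = sum(died)
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    smr = observed / expected
    # Assumption: approximate interval, SMR +/- z * sqrt(observed) / expected.
    half_width = z * math.sqrt(observed) / expected
    return smr, max(smr - half_width, 0.0), smr + half_width
```

As a usage example, eight patients each with a predicted risk of 0.5 give four expected deaths; if four die, the SMR is 1.0.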

This study shows the progressively worse performance of the APACHE II illness severity scoring system over time due to ‘model fade’. This is especially so in common diagnostic categories, making this a clinically relevant finding. Future approaches to illness severity scoring should be tested and compared, such as re-estimating coefficients of the APACHE II diagnostic categories or using locally developed ones, moving to later evolutions of the system such as the APACHE III or APACHE IV, or developing novel artificial intelligence approaches.
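To show where re-estimated diagnostic category coefficients would enter, here is a minimal sketch of the logistic form of the APACHE II risk of death model as originally published by Knaus and colleagues (1985). The intercept and slopes are quoted from that publication as recalled and should be checked against it before use. The diagnostic category weight is supplied by the caller, and the function name is hypothetical.

```python
import math


def apache2_risk_of_death(
    score: int, diagnostic_weight: float, emergency_surgery: bool = False
) -> float:
    """Predicted hospital mortality from the APACHE II logistic equation.

    logit = -3.517 + 0.146 * score + diagnostic_weight (+ 0.603 if the
    patient was admitted after emergency surgery). Re-estimating the model
    locally would mean replacing `diagnostic_weight` (and possibly the
    other coefficients) with values fitted to local data.
    """
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the diagnostic weight is an additive term on the log-odds scale, a weight that no longer reflects outcomes in a common category shifts the predicted risk for every patient in that category.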

