Illness severity scoring for Intensive Care at
Middlemore Hospital, New Zealand: past and future
Susan L Mann, Mark R Marshall, Alec Holt, Brendon Woodford,
Anthony B Williams
Illness severity scoring systems such as the Acute
Physiological and Chronic Health Evaluation (APACHE) have become important tools
for the evaluation and planning of intensive care practice patterns. These
systems objectively estimate patient risk for mortality from acute physiological
and chronic health status. They are not, however, a tool used for deciding
treatment for individual patients; they are a group measurement used for
patients who have similar disease processes. Their origin in the late 1970s and
early 1980s was driven by the need to relate such practice patterns to patient
outcomes.
In the modern setting, tools such as the APACHE scoring
system allow researchers and clinicians to quantify patient illness severity
with a greater degree of accuracy and precision, which is essential for
benchmarking and program evaluation. The interest in illness severity scoring
systems is evidenced by the extensive body of literature that continues to
advance both technical aspects of the systems themselves, and the applications
for which they are used.
Middlemore Hospital was one of the earlier facilities in New
Zealand to implement APACHE II scoring in clinical settings. The routine scoring
of patients began in the Intensive Care Unit (ICU) in 1986. This advance was
facilitated in no small part by the availability of one of the developers of the
APACHE system, Dr Jack Zimmerman, who spent an extended sabbatical in New
Zealand, some of which was at Middlemore Hospital.
Despite enthusiastic support for APACHE II scoring by
international opinion leaders at the time, its relevance and utility in New
Zealand have been questioned from an early stage. The external validity of the
system in a population so different from that in which it was developed was
acknowledged by Zimmerman et al:1
The NZ hospitals designated
1.7% of their total beds for intensive care compared to 5.6% in the US
hospitals. The average age for NZ admissions was 42 compared to 55 in the US
(p<0.0001). The NZ ICUs admitted fewer patients with severe chronic failing
health (NZ 8.7%, US 18%) and following elective surgery (NZ 8%, US 40%).
Approximately half the NZ admissions were for trauma, drug overdose, and asthma
while these diagnoses accounted for 11% of US admissions. When controlled for
differences in casemix and severity of illness, hospital mortality rates in NZ
were comparable to the US. This study demonstrates substantial differences in
patient selection among these US and NZ.
Furthermore, after more than two decades of use, it is
unclear whether the performance of the APACHE II has been maintained. Patient
casemix in New Zealand has changed from earlier times, and the improvements in
supportive care that are now available may have decreased mortality for any
given illness severity.
International opinion leaders are in general moving towards
the more recently developed scoring systems such as APACHE versions III and IV,
which have been shown to outperform older versions in studies of North American
and European ICU populations.2
This paradigm is slowly translating to clinical practice in
this part of the world: the Australian and New Zealand Intensive Care Society
adult patient database now collects data sufficient to model both APACHE II and
III scores.3
There are three aims of this study. We aim to:
Methods
Study population and
setting—Middlemore Hospital is the main hospital within the
Counties Manukau District Health Board (CMDHB). The hospital serves a large
urban population. The district catchment includes Manukau City which is rapidly
expanding: the population has grown from 356,006 in 1996 to 454,655 at last
census in 2006. The population can be summarily characterised as being young,
multi-ethnic, and of low socioeconomic status compared with the rest of New
Zealand.4
Middlemore Hospital is a tertiary referral centre for
plastic surgery, burns, orthopaedics, and a range of medical sub-specialities.
Any patient requiring neurosurgical or cardiothoracic surgical intervention is
referred on to Auckland City Hospital as Middlemore Hospital does not have these
facilities; all other patient categories remain at Middlemore Hospital.
Although there is a specialist regional paediatric
hospital in the area, Middlemore Hospital is also a paediatric hospital; the
Middlemore Hospital ICU therefore cares for children down to 2 kg in weight
who require intensive care, accounting for approximately 120 paediatric
admissions per year.
The hospital is academically affiliated and thus a
teaching institution. Middlemore Hospital has had between 700 and 900 acute beds
over the time in which this research was done, and now also includes a satellite
surgical centre which caters for the majority of elective cases apart from those
that are particularly high risk.
Currently, the Middlemore ICU is nominally a seven
funded-bed Level 3 facility. Since the inception of the Middlemore Hospital ICU
in the late 1960s, the unit has been structurally modified on several occasions.
As a result of both national and local changes in healthcare strategy, the
unit has at times had nominated HDU beds, and at other times not. Since 2004,
there has been a four funded-bed Level I intensive care unit at a satellite
surgical centre, which currently shares clinical governance, staff, policies and
procedures with the main ICU at Middlemore Hospital. These patients were not
included in this study.
Data source—All data were
sourced from a single-centre relational database that has been in continuous use
at the Middlemore Hospital ICU since January 1986. The database contains
information on all patients admitted to ICU during this period, using data that
is prospectively collected, collated, and agreed upon by senior specialists and
the charge nurse at the time.
Data collection was progressively expanded during this
period to ultimately include demographic information, APACHE II score,
diagnostic information, ventilatory and inotropic support, procedures performed,
and patient outcome. Patients who were less than 15 years of age, or who had
been admitted solely for the purpose of a procedure such as difficult central
venous line or endoscopy were not scored, as the system was not devised for
these groups. The database specifically records patient death at both ICU
and hospital discharge.
The database includes locally developed diagnostic
codes (“adclasses” and “subclasses”) in addition to the
APACHE II ones, which were developed to better reflect and discriminate disease
categories related to the local population (see Appendix). Generic APACHE II
diagnostic codes do not necessarily provide a realistic reflection of the local
disease categories and population outcomes. They can be ‘localised’
by adjustments to the disease categorisation and/or the category weights
subsequently used with the APACHE II scores for calculating risk of death.
This approach is supported in the case of Middlemore Hospital by Zimmerman et
al, who emphasised differences between North American and New Zealand ICU
patient populations.1
Data were prospectively stored in Microsoft Access
(Microsoft Corporation, Seattle, WA, USA), and retrospectively abstracted for
analyses from a 9-year period from 1 January 1997 to 31 December 2005.
Calculation of APACHE II scores and risk of
death—All APACHE II scores and risk of death were calculated at
patient hospital discharge using the prospectively stored data and the logistic
regression equation developed by Knaus et al.5
The data for calculation of the APACHE II score included physiological
measurements in the first 24 hours of ICU admission, age and chronic health
status.
The APACHE II risk of death is calculated not only from
scores but also diagnostic categories, which were rigorously and continuously
evaluated by the senior ICU medical staff during the process of prospective data
collection. Such minimisation of misclassification was necessary to avoid error
arising from the heavy reliance of the APACHE II risk of death formula on reason
for ICU admission.
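The calculation described above can be illustrated with a minimal sketch. It uses the coefficients of the published APACHE II equation of Knaus et al; the function name and the example patient are hypothetical, and the diagnostic category weight must be taken from the published weight table (or a locally derived one), which is not reproduced here. This is not the code used in the study.

```python
import math

# Coefficients of the published APACHE II hospital-mortality equation
# (Knaus et al, 1985). Diagnostic category weights are NOT defined here;
# they must be supplied from the published (or a local) weight table.
INTERCEPT = -3.517
PER_SCORE_POINT = 0.146
EMERGENCY_SURGERY = 0.603


def apache2_risk_of_death(score, diagnostic_weight, emergency_surgery=False):
    """Return the predicted probability of hospital death (0 to 1)."""
    if not 0 <= score <= 71:
        raise ValueError("APACHE II scores range from 0 to 71")
    logit = INTERCEPT + PER_SCORE_POINT * score + diagnostic_weight
    if emergency_surgery:
        logit += EMERGENCY_SURGERY
    # Invert the logit to obtain a probability.
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical patient: score 14, illustrative category weight of 0.0.
    print(round(apache2_risk_of_death(14, 0.0), 3))
```

Because the diagnostic weight enters the logit directly, a misclassified reason for admission shifts the predicted risk for every score, which is why the diagnostic categories were reviewed so closely.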
Statistics—Standard statistics
were used to describe data, making particular use of median and interquartile
range to avoid assumptions around data distribution. Hypothesis testing was
undertaken using Kruskal-Wallis equality-of-populations rank test for continuous
variables, and the Pearson's Chi-squared test for categorical ones.
Risk-adjusted mortality by year was assessed by
hospital standardised mortality ratios and 95% confidence intervals (regarding
observed mortality as a binomial variable), which were obtained by dividing the
number of observed hospital deaths in each year by the number of predicted ones
using the APACHE II system.6
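As a rough illustration of this calculation, the sketch below computes a standardised mortality ratio for one year of admissions. The function name is hypothetical, and the interval uses a normal approximation to the binomial distribution of observed deaths; this is an assumption made for the example, since the exact interval method is not specified here.

```python
import math


def hospital_smr(observed_deaths, predicted_risks, z=1.96):
    """Return (SMR, lower, upper) for one year of admissions.

    predicted_risks holds one APACHE II predicted risk of death per patient.
    The interval treats the observed death count as a binomial variable and
    uses a normal approximation (an assumed method for this sketch).
    """
    n = len(predicted_risks)
    expected = sum(predicted_risks)  # predicted number of hospital deaths
    if n == 0 or expected <= 0:
        raise ValueError("need at least one patient with non-zero risk")
    p = observed_deaths / n
    sd_deaths = math.sqrt(n * p * (1.0 - p))  # SD of a binomial count
    smr = observed_deaths / expected
    lower = max(0.0, observed_deaths - z * sd_deaths) / expected
    upper = (observed_deaths + z * sd_deaths) / expected
    return smr, lower, upper
```

A ratio below 1 with an upper limit below 1 indicates fewer observed deaths than the APACHE II system predicted for that year.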
Overall predictive performance of the APACHE II scoring
system by year was gauged through discrimination (ability to discriminate
between the patients who will die or survive at hospital discharge) and
calibration (ability to predict mortality rate over classes of risk).
Discrimination was assessed using receiver operating characteristic (ROC)
curves, which plot the true positive rate (sensitivity, or correctly predicted
hospital deaths / observed hospital deaths) against the false positive rate
(1-specificity, or incorrectly predicted hospital deaths / observed hospital
survivors).
The predictive performance is indicated in this method
by the ROC area under the curve (AUC), with a value of 0.5 equating to random
prediction and a value of 1.0 equating to perfect discrimination. The slope of the
curve indicates the ratio of true positives to false positives, which is also known
as the likelihood ratio.7 For the analyses in
this article, equality of ROC AUC for each year of study was
compared.8
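The ROC AUC can be read as the probability that a randomly chosen patient who died was given a higher score (or risk) than a randomly chosen survivor. The sketch below computes it directly from that definition; the function name and data are hypothetical, and it does not reproduce the Stata routines used for the analyses or the test of equality between years.

```python
def roc_auc(scores, died):
    """Area under the ROC curve by pairwise comparison.

    scores: APACHE II score or predicted risk for each patient.
    died:   truthy if the patient died in hospital, falsy otherwise.
    Ties between a death and a survivor count as half a concordant pair.
    """
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for s_death in deaths:
        for s_surv in survivors:
            if s_death > s_surv:
                concordant += 1.0
            elif s_death == s_surv:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical scores for six patients, three of whom died.
    print(roc_auc([10, 25, 11, 30, 8, 12], [0, 1, 1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; a rank-based calculation would be preferable for a cohort of several thousand.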
Calibration was assessed using the correspondence
between the number of observed hospital deaths and the number of predicted
hospital deaths within each 10% stratum (decile) of the cohort’s expected
risk of death. The predictive performance is indicated in this method by
goodness-of-fit as assessed by the Hosmer-Lemeshow
statistic.9
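A minimal sketch of the Hosmer-Lemeshow calculation follows. It assumes fixed 10% strata of predicted risk (0 to 10%, 10 to 20%, and so on) and skips empty strata; the function name is hypothetical, and statistical packages may group patients differently (for example into equal-sized deciles), so the values need not match those from Stata.

```python
def hosmer_lemeshow(predicted_risks, died, strata=10):
    """Return (statistic, degrees_of_freedom) over fixed-width risk strata."""
    n = [0] * strata
    observed = [0] * strata
    expected = [0.0] * strata
    for risk, outcome in zip(predicted_risks, died):
        if not 0.0 <= risk <= 1.0:
            raise ValueError("predicted risks must lie between 0 and 1")
        g = min(int(risk * strata), strata - 1)  # a risk of 1.0 joins the top stratum
        n[g] += 1
        observed[g] += 1 if outcome else 0
        expected[g] += risk
    statistic = 0.0
    used = 0
    for n_g, o_g, e_g in zip(n, observed, expected):
        if n_g == 0:
            continue  # empty stratum
        variance = e_g * (1.0 - e_g / n_g)
        if variance <= 0.0:
            continue  # every risk in the stratum is exactly 0 or 1
        statistic += (o_g - e_g) ** 2 / variance
        used += 1
    # Conventionally referred to a chi-squared distribution with
    # (number of strata used - 2) degrees of freedom.
    return statistic, max(used - 2, 0)
```

Each stratum contributes the squared difference between observed and predicted deaths, scaled by its binomial variance, so a large statistic reflects strata in which the two counts diverge.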
The predictive performance of the APACHE II scoring
system in major clinical subgroups was assessed by discrimination using hospital
standardised mortality ratios within each of the major
“adclasses”.
All analyses were performed using Microsoft Excel
(Microsoft Corporation, Seattle, WA, USA) and Intercooled Stata 9.2 (Statacorp,
College Station, TX, USA) software.
Ethics—The need for formal
approval for the research process was waived by the National (New Zealand)
Health and Disability Ethics Committee under the provisions made for clinical
audit.
Results
Data from 7703 patients were available for analysis.
Baseline patient characteristics are presented in Table
1. Numbers of patients admitted to the ICU increased steadily from 686 in
1997 to 730 in 2005. The demographic characteristics of patients changed over
the period of observation, with a trend to older and more Māori patients.
There has also been a change in casemix of patients, with a
reduction in the number of patients with diagnoses of poisoning and trauma, and
an increase in the number of patients admitted after elective or emergency
surgery. Patient length of stay has progressively reduced, as has the proportion
of patients requiring mechanical ventilation. Overall hospital mortality
decreased from approximately 19% at the beginning of the period of observation
to approximately 12% at the end.
Figure 1. APACHE II scores and risk scores by year, presented as boxplots
Note: In these plots, the middle horizontal line represents the median; the box the second and third quartiles; and the whiskers the upper and lower extreme values which are no more than 1.5 × the interquartile range beyond the middle quartiles.
Figure 2. Hospital-standardised mortality ratio and 95% confidence intervals, by year
The APACHE II score decreased marginally over the period of
observation as illustrated in Figure 1, with a median value of 14 in 1997 (IQR
9–21) and a corresponding value of 13 in 2005 (IQR 9–21). Although
this reduction did achieve statistical significance (p=0.0001), it cannot be
regarded as being clinically important. APACHE II predicted risk of death has
remained stable over the period of observation, with a minor trend to reduction
that did not achieve statistical significance (p=0.11).
The hospital-standardised mortality ratio decreased over the
period of observation as illustrated in Figure 2, with a value of 0.94 (95%
confidence intervals 0.82–1.06) in 1997 and a corresponding value of 0.66
(95% confidence intervals 0.55–0.76) in 2005. Model adequacy for
discrimination by APACHE II score is illustrated by year in Figures 3 and 4. In
general, the APACHE II score performs adequately in each year with ROC curve
AUCs of >0.8. However, there is deteriorating accuracy of mortality
predictions over time (otherwise known as ‘model fade’10) that
approaches statistical significance.
Corresponding model adequacy for discrimination by APACHE II
predicted risk of death is illustrated in Figures 5 and 6. The risk model
performs similarly to the APACHE II score showing a like degree of ‘model
fade’.
Figure 3. ROC curves for APACHE II Score, by
year
Note: The predictive performance is
indicated by the ROC area, with a value of 0.5 equating to random prediction and
a value of 1.0 equating to perfect discrimination.
Figure 4. ROC curve AUC (95%CI) for APACHE II
Score, by year, as shown in figure 3
Note: Marker labels indicate the P
value for the test of equality of ROC areas relative to the reference year of
1997.
Figure 5. ROC curves for the APACHE II Risk
score by year
Note: The predictive performance is
indicated by the ROC area, with a value of 0.5 equating to random prediction,
and a value of 1.0 equating to perfect discrimination.
Figure 6. ROC curve AUC (95%CI) for APACHE II
Risk Score, by year, as shown in figure 5
Note: Marker labels indicate the P value for
the test of equality of ROC areas relative to the reference year of
1997.
Figure 7. Calibration curves for APACHE II
predicted risk of death, by year, showing the number of observed and predicted
deaths within each 10% stratum (decile) of the cohort’s expected risk of
death (axis label: number of patients in each decile of risk). Predictive
performance is assessed by the Hosmer-Lemeshow statistic (see table 2)
Table 2. Model adequacy for calibration by
APACHE II predicted risk of death, by year, as indicated by the Hosmer-Lemeshow
goodness-of-fit statistic for each of the calibration curves in figure 7. A high
Hosmer-Lemeshow statistic and a P value <0.05 indicate poor correspondence
between the number of observed and predicted deaths within each 10% stratum
(decile) of the cohort’s expected risk of death
Figure 8. Hospital-standardised mortality ratio
(observed/predicted hospital deaths) for clinical diagnostic subgroups
(“adclasses” as described in Appendix)
Model adequacy for calibration by APACHE II predicted risk
of death is illustrated by year in Figure 7. There is progressively poorer
goodness-of-fit as indicated by the Hosmer-Lemeshow statistic, with a
statistically significant difference between the predicted and observed
mortality from 2003 onwards as shown in Table 2.
Figure 8 illustrates model adequacy for discrimination by
APACHE II predicted risk of death, according to clinical diagnostic subgroup.
Although model adequacy was poorest in patients with neurological failure, there
were only a small number of patients in this group. In contrast, the large
number of patients with sepsis, respiratory failure, postoperative status, and
circulatory failure makes the moderately poor model adequacy in these clinical
subgroups more clinically relevant.
Discussion
Our data show that there has generally been a change in the
overall casemix of patients admitted to the Middlemore Hospital ICU, with a
decrease in the number of patients with poisonings and trauma over the period of
observation, and an increase in those with complications as a result of surgery.
APACHE II scores have remained fairly constant over the
period of observation, with only a subtle trend to decreasing patient illness
severity that, although statistically significant, was not clinically
important. The data also show that there has been a reduction in crude and
risk-adjusted mortality, as assessed by mortality rates and by hospital
standardised mortality ratios. Over the same period, there has been a steady
drop in the proportion of patients receiving mechanical ventilation and in the
average length of patient stay.
Correlation between mechanical ventilation and increments in
length of patient ICU stay has been noted in other studies.3
This change in outcomes and practice pattern
may reflect the benefits of clinical pathways within our hospital, and the
earlier detection and correction of physiological derangements that occurs in
the modern, more pro-active approach to the provision of intensive care.
An alternative, more pessimistic view is that this scenario
may reflect earlier discharges from our ICU to accommodate increasing demand in
a setting of increasingly limited resources. Reassuringly, if this latter
scenario is the true one, then outcomes appear to have been maintained despite
this.
The data are in general terms consistent with a recent paper
by Moran et al reporting on intensive care outcomes using an international
Australian and New Zealand ICU database (ANZICS database), which to date has not
included data from Middlemore Hospital and can therefore be regarded as
independent. These investigators reported an improvement in overall
risk-adjusted mortality over the last 11 years, which they did not attribute to
any one specific factor.3
Most medical administrators and practitioners would consider
these improved outcomes to be in some part causally related to corresponding
improvements in clinical care and therapeutic interventions. It would, however,
take a more complex minimum dataset than both the ANZICS database and our local
one to study this question appropriately.
There are two major findings of this study relating to the
predictive performance of the APACHE II system. The first is that there has been
progressive deterioration in model adequacy in terms of both discrimination and
calibration. Predictive performance is generally acceptable when ROC curve AUCs
are >0.8, and using these and similar criteria it seems that continuing use
of this system in our current practice may be unreasonable. The second is that
the performance of the APACHE II system has been better sustained in some
clinical diagnostic subgroups than in others.
As is common to most ICUs, the largest clinical diagnostic
subgroups in our dataset are sepsis and post-surgical complications, and the
APACHE II system has moderately poor model adequacy in these subgroups, with
prediction error of between 25% and 50%. Of note, the subgroups with the largest
prediction error in our dataset constitute only ~10% of the entire Middlemore
ICU population.
The finding of ‘model fade’ over time is also
consistent with those of Moran et al, who demonstrated deteriorating model
adequacy for the APACHE II system over time, in terms of both
discrimination and calibration. This was the case even after the authors
recalibrated the APACHE II model by re-estimating coefficients for the
Australasian population, thereby optimising discrimination and calibration.
This is an important subtlety, since the performance of all
illness severity scoring models is well known to be poorer in populations that
are different from those in which they have been developed. This simple
recalibration adjusts for geographical differences in measured patient
characteristics (physiology and diagnosis), although it does not consider ICU
characteristics and different organisational characteristics of healthcare
systems as predictive variables. The Intensive Care National Audit and Research
Centre (ICNARC) model began in essence as an adaptation of the APACHE model,
developed by Rowan et al. in the 1990s in the United
Kingdom,10 but over the years it has evolved into a
completely independent model that is widely used in the
UK.11
Opinion leaders now recommend regular recalibration of
illness scoring systems to local and more contemporary
cohorts,12 although to our knowledge there is
no consensus or even propositions concerning thresholds for model performance
that would trigger the recalibration process, or standardised methodology around
the recalibration itself.
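To make the idea of recalibration concrete, the sketch below shows one simple variant, logistic recalibration, in which a new intercept and slope are fitted to the logit of the original predicted risk. This is offered only as an illustration under stated assumptions: it is a simpler technique than re-estimating every coefficient of the model, the function names are hypothetical, and it is not the method used by any of the cited authors.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_recalibration(predicted_risks, died, iterations=25):
    """Fit logit(new risk) = a + b * logit(original risk) by Newton-Raphson.

    predicted_risks must lie strictly between 0 and 1.
    Returns (a, b); a = 0 and b = 1 mean no recalibration is needed.
    """
    x = [_logit(r) for r in predicted_risks]
    y = [1.0 if d else 0.0 for d in died]
    a, b = 0.0, 1.0  # start from the original model
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = _expit(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p            # gradient with respect to a
            g_b += (yi - p) * xi     # gradient with respect to b
            h_aa += w                # information matrix entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            raise ValueError("cannot fit: predicted risks do not vary")
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def recalibrate(risk, a, b):
    """Apply a fitted intercept and slope to one original predicted risk."""
    return _expit(a + b * _logit(risk))
```

Such a fit corrects systematic over- or under-prediction in a contemporary local cohort, but it cannot improve discrimination, because the ranking of patients by risk is unchanged.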
‘Model fade’ and poor model performance in
diagnostic subgroups have led to the evolution of existing scores into third
and fourth generations of illness severity scoring systems, such as SAPS III and
APACHE III and IV.2,12 The evolution of these
scores did not involve simple recalibration of models by re-estimating
coefficients, and instead involved the application of new statistical methods,
the addition of new variables, an increase in the number of diagnostic groups,
and a change to the measurement of certain physiological and diagnostic
variables.
These scores can be expected to perform better as a result
of their development in a cohort that is more contemporary and externally valid
in terms of casemix, and also by using clinical information that was not
initially taken into consideration during the development of the earlier
systems.
There is a widespread move amongst ICUs to this newer
generation of illness scoring systems, although their performance is only
marginally better than earlier versions of the scores that have been more simply
recalibrated by re-estimating coefficients.13
Notwithstanding, the APACHE III system is currently used more widely in the USA,
with demonstrably greater discrimination and calibration than the original
APACHE II system.2 It is too early to say at
this time whether more recent evolutions of these systems such as the APACHE IV
and SAPS III systems will demonstrate continued improvement.
The findings of our study do not address one of the
conundrums of illness severity scoring: the interpretation of changes in scores
and outcomes over time. As with other studies, it is impossible to tell from our
data whether our results are due to improved patient care and access to care, or
alternatively from the deteriorating performance of scoring systems because of
changing patient casemix.
Our cumulative clinical experience is in keeping with
others: ICU patients are in general sicker than previously, with improving
outcomes despite this. Confirmation of this perception will only be forthcoming
with studies that extend data collection to include other indicators of patient
illness severity and practice patterns, and the use of statistical approaches
that use causal or structural time series modelling.
The strength of this study is its size and completeness.
Running from 1997 to 2005 inclusive, it contains a large dataset covering a
nine-year period without gaps. The major weaknesses of this study are those that
are inherent to any scoring system that is dependent on clinical classification
of patients into diagnostic categories (whether local diagnostic codes
(“adclasses” and “subclasses”) or APACHE II ones). There
are no explicit criteria to improve consistency within or between ICUs in making
these classifications, although all due care was taken in our database to limit
subjectivity and optimise accuracy and precision as much as possible.
In terms of the future of illness severity scoring, good
reasons abound for us to persist with the APACHE scoring system at Middlemore
Hospital, as opposed to moving to others such as organ failure scoring systems
(Multiple Organ Dysfunction Score, Sepsis-related Organ Failure Assessment).
The choice of method within any particular ICU is critically
dependent on the degree of confidence in its use; the APACHE scoring systems are
better validated at Middlemore Hospital ICU than the other choices. Moreover, it
is our opinion that the APACHE scoring systems are also subject to rigorous
remodelling and adaptation: this is essential to ensure that the system reflects
changes in underlying characteristics of patients and healthcare delivery
systems, and therefore correctly models the relationships with patients’
outcomes.2,14
Notwithstanding, there have been encouraging results with
loosely-termed ‘artificial intelligence’ approaches. Frize and
Walker reported early success of their pilot of neural networks in both adult
and neonatal intensive care.15 Investigation
into these modelling methods may prove fruitful for the future, and may result
in better performance, although this is yet to be definitively
demonstrated.16–18
Our data indicate that we should be preparing to move
forward from the APACHE II system. Three workstreams are suggested by the
results of this study, which should probably be run concurrently with the
results determining the final solution for illness severity scoring.
The first workstream should involve recalibration of the
APACHE II model by re-estimating coefficients for our local population using
local diagnostic codes (“adclasses” and “subclasses”)
and/or APACHE II ones. The second should involve a trial of the APACHE III
system. The third should involve a pilot of artificial intelligence approaches.
The performance of these three approaches in our population
should determine which illness severity scoring system should be used in the short
and medium term. However, it would appear that regular re-calibration should be
undertaken irrespective of what model is chosen, in order to minimise
‘model fade’ and provide clinicians and managers interested in
benchmarking a well validated model to predict mortality.
Competing interests: None known.
Author information: Susan L Mann, Department of Intensive Care Medicine, Counties Manukau District Health Board, Manukau, South Auckland; Mark R Marshall, Nephrologist, Department of Internal Medicine, Counties Manukau District Health Board, Manukau, Auckland; Alec Holt, Director Health Informatics Programme, Department of Information Science, University of Otago, Dunedin; Brendon Woodford, Department of Information Science, University of Otago, Dunedin; Anthony B Williams, Intensivist, Department of Intensive Care Medicine, Counties Manukau District Health Board, Manukau, South Auckland
Acknowledgements: The authors thank Mr Mpatisi Moyo (Decision Support, Middlemore Hospital) and Mr Gary Jackson (Public Health Physician, Counties Manukau District Health Board).
Correspondence: Susan L Mann, PO Box
25-075, St Heliers, Auckland 1740, New Zealand. Fax: +64 (0)9 2760034;
email: smann@xtra.co.nz
References:
Appendix
(“Adclass and subclass” classification)
ADCLASS
– Use the first category that fits the
patient
SUBCLASS
1. TRA type of trauma: BLUnt; PENetrating; BURn
2. POI type of poisoning: CYClic medication (+ sedative); SEDative medication; MEDication (other); NON-medicinal
4. ANA type of anaphylaxis: MEDicinal; NON-medicinal
5. ASP type of asphyxiation: DROwning; HANging; STRangulation
7. SEP locus of sepsis: BLOod only; ENDocardium; GENital tract; GIT tract; JOInt; MENinges; RESp tract; SOFt tissues; URInary tract; VAScular catheter; WOUnd; MIScellaneous
10. SUR type of surgery: ABDominal; ENT; FACio-maxillary and dental; GYNaecological; NECk; ORThopaedic; PLAstic
11. CIR type of circulatory failure: DYSrhythmia; CCU overflow; PULmonary embolism; AMI (acute); UNDiagnosed shock; CGS (cardiogenic shock); CHF (congestive heart failure); MIScellaneous
12. CNS type of CNS failure: VIRal encephalitis; SEIzures; CVA; SAH (subarachnoid haem); MIScellaneous
13. GIF type of GI failure: HAEmorrhage; HEPatic failure; PANcreatitis
14. MET type of metabolic failure: HEAt stroke; HYPothermia; MIScellaneous; DIAbetic
15. NEUromuscular failure: MYAsthenia; MIScellaneous; GBS (Guillain-Barré); TETanus
16. REN type of renal failure: ARF (acute); CRF (chronic)
17. RES type of respiratory failure: OSA; CORD; MISCELLANEOUS
18. PROcedure type admitted for: CVP insertion; Dialysis; OTHer