Observational studies are a useful tool in epidemiology.[[1]] One study revealed that approximately 68% of published articles in the four leading United States obstetrics and gynaecology journals were of observational nature.[[2]] In obstetric research, they provide the opportunity to study relatively rare adverse events like stillbirth or neonatal death (NND). Despite their value, observational studies come with biases and investigators have an obligation to identify and mitigate these,[[3]] which includes adequate reporting of the study design and methodology.[[4]]
Epidemiological maternity research in New Zealand is usually performed on national Government-held data. The highest quality perinatal data in the country is collected and held by the Perinatal and Maternal Mortality Review Committee (PMMRC), which is an independent committee under the Health Quality & Safety Commission New Zealand. PMMRC data is only made available to a limited number of researchers, due to data sovereignty issues. An alternative data source for maternity research in New Zealand is provided by the Statistics New Zealand (NZ) Integrated Data Infrastructure (IDI). The IDI is a collection of New Zealand Government and non-government administrative and survey data, held by Statistics NZ. Researchers are granted access to merge and interrogate data sources only within the strict privacy rules of the IDI environment, as projects are required to meet all “five safes” (safe people, safe projects, safe settings, safe data, and safe output).[[5]] Data in the IDI is linked at the individual level, which allows personal information to be connected across different sectors (such as income, migration, and health). Each person is provided with a unique identification (ID) number in a central “spine”, by which the various datasets can be joined. All people ever resident in New Zealand (i.e., citizens or those with visas that allow residency, work or study) and captured in one of the data sources are included. The IDI data are “refreshed” (i.e., updated to include newer data and additional data sources) up to four times a year. By connecting data from multiple sources not otherwise linked on an individual level, questions around complex issues can be researched with high quality across the population.[[6]]
As both maternal and infant data are generally necessary in perinatal analyses, creating datasets for maternity research can be complicated. This methodological report was prepared to assist New Zealand researchers in developing comprehensive datasets for national pregnancy studies, with a focus on perinatal death. We describe a standardised method for creating a “core” dataset within the IDI, allowing for consistent national reporting, and include suggestions for additional tables. We additionally aim to improve the understanding of the used datasets and variables.
An application was made with Statistics NZ to use the data within the IDI. Once approved, a dataset for perinatal research was built in Microsoft SQL Server Management Studio (using IDI refresh IDI_Clean_20211020). The national data sources used were the Maternity Collection (MAT),[[7]] Births, Deaths and Marriages (BDM),[[8,9]] the National Minimum Dataset (NMDS),[[10]] the Mortality Collection (MORT),[[11]] the Chronic Conditions dataset (CC),[[12]] Census,[[13]] and Immigration data.[[14]] “General” or “central” IDI tables used included the full birth date, full death date, and address notification tables.[[15]] A succinct summary of these data sources has been provided in Appendix 1. See Table 1 for an overview of abbreviations used in this methodological report. Finally, PMMRC publicly available annual report data was used as the gold standard to validate the tables created.[[16]]
A numerator table (including all perinatal deaths) and a denominator table (including all births) were created separately, to include all births between 2008 and 2017. The main dataset used was the MAT collection. Data quality of the MAT collection varies by item but has improved significantly since 2008.[[17]] Thus, data from 2008 onward are most useful for perinatal research. The proposed method for creating the core dataset, as well as two corresponding full SQL scripts (for a table including and a table excluding multiples), are provided in Appendices 2–4.
Even though data quality has improved since 2008, some variables still have a high degree of missing data. This is particularly true for women cared for exclusively by district health boards (DHB), due to a funding change introduced in July 2007. DHB employed midwifery teams are no longer required to claim for primary maternity services, while self-employed community-based lead maternity carers (LMCs; midwives, general practitioners, or obstetric specialists) must submit pregnancy data prior to payment from the Ministry of Health (MoH).[[18]] As a result, completeness of some data for DHB-registered pregnancies varies widely, while the same data for non-DHB pregnancies is mostly complete. It should also be noted that since 2008 the variable “LMC type” is inaccurate for women under DHB care.[[17]] Finally, when considering data quality issues, some variables can be taken from either MAT or BDM data sources. We validated the following variables for use: maternal age, fetal sex, gestational age, and birthweight.
Pregnancy complications such as gestational diabetes (GDM) or hypertensive disorders of pregnancy are important outcomes in routine maternity research. The MAT delivery table in the IDI identifies births that were complicated by pre-eclampsia or eclampsia, identified by NMDS. MAT, however, does not identify pregnancies of mothers with pre-existing hypertensive disorders or gestational hypertension. Secondly, the indicator for GDM is not offered in the IDI as this field is incorrectly labelled. It indicates both pre-existing diabetes and GDM, and according to the MoH no validation process is undertaken for this field. For that reason, we propose to add NMDS and CC datasets to the core dataset. Unfortunately, data on primary care diagnoses are not available in the IDI. Hospital admissions can be joined to the correct pregnancy by maternal ID and admission dates. Following a similar method, mothers with pre-existing diabetes (as an important risk factor) can be identified by joining the CC dataset.
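The date-based join described above can be illustrated with a minimal sketch. It assumes a pregnancy window running from an estimated start (delivery date minus gestational age) to the delivery date; this window rule, the function name, and all field names (`mother_id`, `delivery_date`, `gestation_weeks`, `admission_date`, `label`) are hypothetical and are not the IDI column names or the exact rule used in the appendices.

```python
from datetime import date, timedelta


def link_admissions_to_pregnancies(deliveries, admissions, default_gestation_weeks=40):
    """Attach each hospital admission to the pregnancy during which it occurred.

    An admission is linked when it belongs to the same mother and falls between
    the estimated start of pregnancy and the delivery date (both inclusive).
    The window definition is an assumption made for illustration only.
    """
    linked = {}
    for delivery in deliveries:
        # Fall back to an assumed 40 weeks when gestational age is missing.
        weeks = delivery.get("gestation_weeks") or default_gestation_weeks
        window_start = delivery["delivery_date"] - timedelta(weeks=weeks)
        key = (delivery["mother_id"], delivery["delivery_date"])
        linked[key] = [
            admission
            for admission in admissions
            if admission["mother_id"] == delivery["mother_id"]
            and window_start <= admission["admission_date"] <= delivery["delivery_date"]
        ]
    return linked


# Made-up records: one mother with two consecutive pregnancies.
deliveries = [
    {"mother_id": 1, "delivery_date": date(2010, 6, 1), "gestation_weeks": 40},
    {"mother_id": 1, "delivery_date": date(2012, 3, 15), "gestation_weeks": 38},
]
admissions = [
    {"mother_id": 1, "admission_date": date(2010, 4, 10), "label": "admission_a"},
    {"mother_id": 1, "admission_date": date(2011, 1, 5), "label": "admission_b"},
    {"mother_id": 1, "admission_date": date(2012, 1, 20), "label": "admission_c"},
    {"mother_id": 2, "admission_date": date(2010, 5, 1), "label": "admission_d"},
]
linked = link_admissions_to_pregnancies(deliveries, admissions)
```

Keying the result on mother and delivery date keeps consecutive pregnancies of the same mother apart, so an admission between two pregnancies (here `admission_b`) is linked to neither.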
Ethnicity is an important demographic risk factor widely used in pregnancy research in New Zealand, and key to examining health equity. Characteristics of ethnicity recorded in New Zealand include that it is self-defined, it can change over time and an individual may identify with more than one ethnic group. The use of ethnicity data in health research is addressed by the Ethnicity Data Protocols for the Health and Disability Sector by the MoH.[[19]] According to this protocol, ethnicity data can be categorized at four different “levels” following a hierarchical structure; with level four being the most detailed level of reporting (containing 231 ethnicity codes). These codes are then aggregated into ethnicity levels three to one. As an illustration, code “43112” stands for Fijian Indian and aggregates into “431” Indian (level three), “43” Indian (level two) and “4” Asian (level one). Generally, level two ethnicity data are used in health research for reporting, which includes 22 groups. In this aggregation a high level of detail is maintained for some ethnicities (such as Māori, Pacific Peoples, Chinese or Indian), while other minority groups are merged despite large heterogeneity (such as other Asian ethnicities, African or Latin American).
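The aggregation in the Fijian Indian example can be sketched as a simple truncation of the level four code. This assumes that the prefix relationship shown in that example (five digits at level four, then three, two and one digit) holds for every code; the function name is hypothetical and the protocol itself remains the authoritative mapping.

```python
from typing import Dict


def aggregate_ethnicity(level4_code: str) -> Dict[int, str]:
    """Derive level one to three codes from a five-digit level four code.

    Assumes each coarser level is a prefix of the level four code, as in the
    "43112" example; the official concordance should be checked before use.
    """
    if len(level4_code) != 5 or not level4_code.isdigit():
        raise ValueError("expected a five-digit level four ethnicity code")
    return {
        4: level4_code,
        3: level4_code[:3],
        2: level4_code[:2],
        1: level4_code[:1],
    }


levels = aggregate_ethnicity("43112")
```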
Unfortunately, the MAT dataset only holds level two ethnicity data. Due to the data collection methods of MAT, ethnicity data may also have been completed by a healthcare provider, leading to potential inaccuracy or lack of detail. Moreover, someone’s ethnicity can reflect a contextual response. This might occur when a mother believes she will receive better care by reporting a different ethnicity. In contrast, BDM birth registration provides high-quality level four ethnicity data, including information reported by the parents directly outside of the healthcare setting. The requirement for parents to complete birth registration separately from the LMC leads to ethnicity data akin to ethnicity reported in the national Census, and BDM is generally considered second choice to Census ethnicity data.[[20]] However, since ethnicity can change and BDM is recorded closer to the birth event than Census, we argue that BDM should be used as the main source in maternity studies. Note that in cases where ethnicity data is missing, the source-ranked ethnicity table (“central” table in the IDI) may be consulted, although only level one ethnicity is provided.
Some research questions will require both ethnicity (i.e., a subjective belief, related to cultural behaviours and practices) and country of birth (COB; i.e., an objective measure, more closely linked to ethnic origin), considering increasing migration and ethnic diversity globally.[[21]] Combining these variables in analyses might provide a better understanding of pregnancy risk factors, since common issues associated with migration in first generation migrants (such as socio-economic deprivation or diverse health literacy) may result in differential health outcomes compared to second and third generation women.[[22–24]] COB data are not available from the standard birth tables and should therefore be obtained from alternative datasets. Census or immigration data are the highest quality sources for this variable. Since COB is fixed, the datasets can simply be linked by maternal ID, regardless of the correct pregnancy event. An alternative method, when the only interest is whether a mother was New Zealand born or not, is to join the parent ID on the infant’s BDM birth record with the BDM births table. If the mother’s birth was registered, she was New Zealand born. In contrast, if the mother’s birth cannot be identified in BDM births, she was most likely born overseas.
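The New Zealand born check described above amounts to a membership test of the mother's ID against the set of people with their own birth registration. A minimal sketch, with hypothetical field names and made-up IDs rather than real IDI identifiers:

```python
def flag_nz_born(infant_births, registered_birth_ids):
    """Flag each infant's mother as NZ born when her own birth appears in BDM births.

    infant_births: records holding an infant ID and the mother's (parent) ID.
    registered_birth_ids: IDs of all people whose birth was registered in BDM.
    """
    flags = {}
    for record in infant_births:
        if record["mother_id"] in registered_birth_ids:
            flags[record["infant_id"]] = "NZ born"
        else:
            # Absence of a registration only suggests an overseas birth.
            flags[record["infant_id"]] = "most likely overseas born"
    return flags


infant_births = [
    {"infant_id": 101, "mother_id": 11},
    {"infant_id": 102, "mother_id": 12},
]
registered_birth_ids = {11, 101, 102}
flags = flag_nz_born(infant_births, registered_birth_ids)
```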
The MAT and BDM datasets do not contain any information on individual level socio-economic status. The current classification system used to monitor deprivation, which is widely used in health and social research, is the New Zealand Socioeconomic Deprivation Indices (NZDep).[[25]] NZDep is a decile score based on area of domicile, divided into meshblocks or larger Census Area Units where a higher level of confidentiality is required. The NZDep is based on Census data, with decile one representing the 10% of the population who live in the least deprived areas and decile ten representing the 10% of the population who live in the most deprived areas in New Zealand. The NZDep2013 is the fifth updated version since 1991 and combines nine variables from the 2013 Census. A limitation of this method is that the NZDep represents area-level deprivation and does not necessarily reflect the socio-economic status of an individual, although it is a close approximation and may be used as a proxy in large datasets.[[26]] Alternative proxy variables for consideration include the New Zealand Indices of Multiple Deprivation,[[27]] region of domicile (sourced from the address notifications table), DHB of domicile (sourced from MAT), or personal income level (sourced from Census), although the authors of this paper have no experience using these alternative sources.
To provide an estimation of socio-economic status in our birth cohort, the registered address closest to the date of birth, and prior to delivery, was chosen for each delivery event, to allow for geographic movement over time and to best capture the mother’s socio-economic status during pregnancy. Note that where a full date of birth is missing (primarily among perinatal deaths), birth year and month sourced from the BDM births or MAT table may be used. The meshblock associated with this address was then extracted and linked to the corresponding NZDep2006 and 2013 decile scores (births after 2008 and before 2013 linked to NZDep2006 and from 2013 linked to NZDep2013).
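The address selection and NZDep version rule above can be sketched as follows. The field names, meshblock labels and decile values are made up for illustration, the lookup structure is an assumption, and the sketch treats "prior to delivery" as on or before the delivery date.

```python
from datetime import date


def address_before_delivery(notifications, delivery_date):
    """Return the latest address notification dated on or before the delivery."""
    prior = [n for n in notifications if n["notification_date"] <= delivery_date]
    if not prior:
        return None
    return max(prior, key=lambda n: n["notification_date"])


def nzdep_version(birth_year):
    """Births before 2013 use NZDep2006; births from 2013 use NZDep2013."""
    return "NZDep2006" if birth_year < 2013 else "NZDep2013"


def deprivation_decile(notifications, delivery_date, decile_lookup):
    """Look up the decile of the meshblock the mother lived in before delivery."""
    address = address_before_delivery(notifications, delivery_date)
    if address is None:
        return None
    version = nzdep_version(delivery_date.year)
    # Returns None when the meshblock has no score in the chosen version.
    return decile_lookup[version].get(address["meshblock"])


notifications = [
    {"notification_date": date(2009, 1, 10), "meshblock": "MB_A"},
    {"notification_date": date(2011, 5, 2), "meshblock": "MB_B"},
    {"notification_date": date(2014, 8, 20), "meshblock": "MB_C"},
]
decile_lookup = {
    "NZDep2006": {"MB_A": 3, "MB_B": 7},
    "NZDep2013": {"MB_B": 8, "MB_C": 2},
}
decile_2012 = deprivation_decile(notifications, date(2012, 2, 1), decile_lookup)
decile_2013 = deprivation_decile(notifications, date(2013, 6, 1), decile_lookup)
decile_2008 = deprivation_decile(notifications, date(2008, 5, 1), decile_lookup)
```

In this example the same address (`MB_B`) yields a different decile depending on the birth year, because the NZDep version changes at 2013; a delivery with no earlier notification is left unscored.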
Six thousand, seven hundred and ninety perinatal deaths (4,768 stillbirths and 2,022 NNDs) and 617,375 live births were identified in our dataset. In comparison, PMMRC annual report data comprised 6,518 perinatal deaths (4,779 stillbirths, including 1,456 terminations, and 1,739 NNDs) and 617,321 live births.[[16]] As a result, our numerator dataset includes 272 more perinatal mortalities compared to the gold standard (an approximate 4.0% overreporting in deaths). Our denominator dataset includes 54 more live births compared to PMMRC report data (a 0.05% overreporting in live births). Thus overall, our datasets represent a close approximation to the gold standard. We found that 99.2% of all births (99.5% of live births and 77.4% of perinatal deaths) in our final dataset had a record from both MAT delivery and BDM births; providing complete data for the majority of cases. Cases that could not be joined to both sources, were mostly missing BDM data (example in Table 2).
Smoking status at registration was unknown for 44.0% of women with a DHB-registered pregnancy, while this was only 0.04–1.5% for women under the care of an LMC provider. Missingness was even higher for maternal BMI, with 58.0% and 0.1–2.2% of missing data among these groups respectively. The variable “booking trimester” was missing for 48.5% of women with a DHB-registered pregnancy, compared to 0.01–0.2% of women booked with another LMC type. In our dataset, 37.1% of DHB cases were noted as LMC type “unknown”. The following variables were validated.
Maternal age is provided directly by MAT, whereas for BDM it was calculated from the maternal birth year and month and the delivery date. In our cohort, there was a 95.6% overlap where this variable was available from both datasets. Of the 4.4% of non-matches, 95.4% differed by only one year. Consequently, the MAT dataset may be used for maternal age (accounting for 99.9% of all cases).
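A minimal sketch of deriving maternal age from a birth year and month and a delivery date is given below. Because the day of birth is not used, the sketch assumes the birthday has already passed when the delivery falls in the birth month; this tie-breaking rule is an assumption and is one plausible reason for one-year differences between sources.

```python
from datetime import date


def maternal_age_at_delivery(birth_year, birth_month, delivery_date):
    """Age in completed years at delivery, using only birth year and month."""
    age = delivery_date.year - birth_year
    # Without a day of birth, only an earlier calendar month counts as
    # "birthday not yet reached" (assumed rule).
    if delivery_date.month < birth_month:
        age -= 1
    return age


age_before_birthday = maternal_age_at_delivery(1985, 7, date(2012, 3, 10))
age_in_birth_month = maternal_age_at_delivery(1985, 3, date(2012, 3, 10))
```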
The overlap in fetal sex was 99.96% where both MAT and BDM variables were available. Hence, either variable can be used in analysis (accounting for 98.2% of all cases).
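Overlap figures of this kind can be computed as the share of matching values among records where both sources are available, with coverage taken as the share of records where at least one source is available. The sketch below reflects that reading of the text; the function names and example values are illustrative only.

```python
def percent_overlap(pairs):
    """Percentage of matching values among records present in both sources."""
    both = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not both:
        return None
    matches = sum(1 for a, b in both if a == b)
    return 100.0 * matches / len(both)


def percent_coverage(pairs):
    """Percentage of records with a value in at least one source."""
    if not pairs:
        return None
    covered = sum(1 for a, b in pairs if a is not None or b is not None)
    return 100.0 * covered / len(pairs)


# (MAT value, BDM value) per birth; None marks a missing value.
sex_pairs = [("F", "F"), ("M", "M"), ("M", "F"), (None, "F"), ("F", None), (None, None)]
overlap = percent_overlap(sex_pairs)
coverage = percent_coverage(sex_pairs)
```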
The MAT dataset notes gestational age in weeks, while BDM also registers days. There was an 89.1% overlap in gestational age week where both variables were available. Of the remaining 10.9%, it appeared that 0.7% only differed by one day (e.g., 37 and 36+6), while 7.0% differed by one day to one week (e.g., 39 and 40+0). Among cases in which a larger difference existed (3.2% of total), birthweight was more likely to correlate with MAT gestational age and therefore MAT may be prioritised for use. However, BDM should be used in analyses including customised birthweight centiles,[[28]] where the absence of gestation in days leads to systematic over-estimation of birthweight centiles. Using both tables, 98.0% of cases are accounted for.
The overlap in birthweight was 96.6% where both MAT and BDM variables were available. Of the remaining 3.4%, 1.7% only differed by 100 grams and 0.6% differed by more than 500 grams. Either variable can be used in analysis, accounting for 94.0% of all cases.
Close to 99.0% of deliveries could be linked to a NZDep score.
Level four ethnicity from the 2013 Census had an 89.4% overlap with BDM ethnicity, in cases where only one ethnic group was recorded in both sources (N=392,004). The corresponding overlap for level three ethnicity was 90.6% and level two ethnicity 95.7%. Therefore, if BDM is missing, Census provides a good alternative. If both are missing, MAT ethnicity can be used as a surrogate. This method may also mitigate some data quality differences between ethnicities, as the availability of BDM ethnicity data for perinatal deaths differs per group (Table 2).
Between 2008 and 2017, 88.3% of mothers had a known COB from the 2018 Census. Where Census 2018 data was missing, Census 2013 data was used, with a 99.2% agreement between the two surveys among women for whom both were available. If both were missing, then immigration data was used, with an 89.7% and 88.8% agreement with Census 2018 and 2013 respectively. Immigration metadata suggests using nationality over COB; however, in our dataset this resulted in less agreement with Census (82.1% and 80.7%). Finally, in this report nationality was used as a surrogate for COB if all other COB data was missing. This is justified by an 85.4% agreement between COB and nationality in the immigration dataset. By combining all four variables, COB was available for 98.9% of all mothers.
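The source hierarchy for COB described above (Census 2018, then Census 2013, then immigration COB, then nationality) can be sketched as a first-non-missing lookup. The key names are hypothetical labels for the four variables, not IDI column names.

```python
# Priority order taken from the text; key names are illustrative.
COB_SOURCE_ORDER = (
    "census_2018_cob",
    "census_2013_cob",
    "immigration_cob",
    "immigration_nationality",
)


def resolve_cob(record):
    """Return (value, source) from the highest-priority source that is not missing."""
    for source in COB_SOURCE_ORDER:
        value = record.get(source)
        if value:  # treats None and empty strings as missing
            return value, source
    return None, None


mother_a = {"census_2018_cob": None, "census_2013_cob": "India", "immigration_cob": "Fiji"}
mother_b = {"immigration_nationality": "Samoa"}
cob_a = resolve_cob(mother_a)
cob_b = resolve_cob(mother_b)
cob_missing = resolve_cob({})
```

Returning the source alongside the value makes it possible to report how many mothers were classified from each dataset, or to run sensitivity analyses that exclude the nationality surrogate.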
View Tables 1–2.
This methodological paper describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI. A strength of this proposed approach is the ability to create a comprehensive dataset including perinatal deaths and live births from a variety of national sources, using our combined knowledge, and defining PMMRC data as the gold standard; thereby utilising the best quality data from each dataset available. All steps in creating this dataset have been justified and validated. Complete understanding of the data sources, including the quality of the variables used and general inconsistencies in metadata, will also improve the accuracy of research output. Since these data sources are available to all researchers who are granted permission to use the Statistics NZ IDI, this will increase accessibility.
In developing this methodology, some limitations to the IDI were discovered, such as restricted use of the MORT dataset. Even though MORT is considered the best source for stillbirths, this methodology uses the MAT and BDM datasets to create the numerator table. This is justified because, while MORT identified 6,270 perinatal deaths between 2008 and 2017, only 1,955 (31.2%) of these could be matched to MAT or BDM tables. Thus, clinical data including important variables would be unavailable for almost 70% of all mortalities. We suspect this low matching rate is due to a linking error within the IDI, which should be addressed by Statistics NZ. In addition, only 59 cases had a different birth status according to MORT, validating this approach.
We also offer recommendations to improve the quality of perinatal data, to further enhance these resources. Firstly, the transfer of BMI and smoking data from DHB primary care facilities to the MAT datasets should be facilitated to eliminate systematic bias in analyses that control for these variables, as the highest degree of missing data is among high-risk mothers under DHB care, and who are also more likely to suffer perinatal mortalities. Consideration should be given to excluding DHB-registered pregnancies from analyses that require adjustment for these variables. For many years the PMMRC has recommended the MoH to “urgently require DHBs to provide complete and accurate registration data to the MAT dataset”, without success.[[29]] Additionally, the variable “booking trimester” was missing for almost half of women with a DHB-registered pregnancy, despite late booking being associated with poorer perinatal outcomes and socio-economic deprivation.[[30,31]]
Important obstetric risk factors, such as maternal pre-existing chronic conditions, should be collected in the MAT dataset. Others, such as GDM or hypertensive disorders of pregnancy, may need to be validated as the quality of these variables is not clear. For instance, Lawrence et al. investigated the prevalence of GDM according to some commonly used data sources in New Zealand and found an underreporting in NMDS (3.8%, compared to 5.9% reported by DHBs or 6.9% reported by laboratories).[[32]] There was 70% agreement on the presence of GDM between the data sources. We also suggest that validation studies of routine maternity datasets are performed. This will assist researchers in the interpretation of results of a widely used data source. Furthermore, pregnancy research requires both mother and infant data in most analyses. Mothers may appear in a table more than once with consecutive pregnancies, or a pregnancy may result in multiple infants, which complicates the building of a perinatal dataset. Including stillbirths in the MAT infant dataset will provide more detailed information about the birth and simplify the process of creating a dataset. Importantly, although this methodological report offers quality improvements for maternity research, consideration should be given to making the PMMRC national dataset available within the IDI. Use of this dataset would eliminate many of the data quality issues described in this paper that are associated with perinatal mortality studies.
Even though the IDI provides a promising avenue for perinatal studies, there are barriers to accessing the data. Each new project requires a comprehensive application process. New research projects are assessed seven times a year, with a turnaround time of approximately six weeks. Successful applications are required to pay a one-off fee ($500). Once approved, specific users of the IDI will need to be authorised by Statistics NZ, undergo confidentiality training, and any changes to the project are subject to evaluation. Researchers are granted access to merge and interrogate data sources only available within the strict privacy rules of the Datalab; available in a few cities across New Zealand. Finally, researchers are recommended to have intermediate SQL coding skills.
In conclusion, this methodological report aims to improve the quality of routine maternity studies in New Zealand by offering an alternative approach to conventional data sources, while simultaneously increasing knowledge and accessibility.
View Appendices 1 & 2.
View Appendices 3 & 4: To see copies of the original SQL files, please contact corresponding author Esti C de Graaff at e.degraaff@auckland.ac.nz
The highest quality perinatal data in New Zealand is collected and collated by the Perinatal and Maternal Mortality Review Committee (PMMRC) and is made available to a limited number of researchers. Therefore, maternity, and perinatal mortality studies are generally performed on Government-held data. This report offers an alternative approach with in-depth justification for the methodology, while simultaneously improving the understanding of the data sources.
A standardised method for creating a comprehensive maternity dataset within the Statistics New Zealand Integrated Data Infrastructure (IDI) was developed and a validation dataset was created to include all births between 2008 and 2017.
A close approximation to the PMMRC annual report data was found, with 4.0% over-reporting of perinatal deaths and 0.05% over-reporting of live births in the IDI dataset. Several variables, including important pregnancy risk factors, were validated for use. Limitations to the datasets were explored and additional tables in the IDI were proposed, to include variables on pregnancy complications, ethnicity and country of birth, and socio-economic data.
This methodological report describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI, including a variety of national data sources. Recommendations for further enhancement of these resources have been offered.
1) Hoppe DJ, Schemitsch EH, Morshed S, et al. Hierarchy of Evidence: Where Observational Studies Fit in and Why We Need Them. J Bone Joint Surg Am. 2009;91(Supplement_3).
2) Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of Study Designs in Four Major US Journals of Obstetrics and Gynecology. Gynecol Obstet Invest. 2001;51(1):8-11.
3) Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248-52.
4) Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500-24.
5) Milne B, Atkinson J, Blakely T, et al. Data Resource Profile: The New Zealand Integrated Data Infrastructure (IDI). Int J Epidemiol. 2019;48.
6) Atkinson J, Blakely T. New Zealand’s Integrated Data Infrastructure (IDI): Value to date and future opportunities. Int J Popul Data Sci. 2017;1.
7) National Health Board Business Unit. National Maternity Collection Data Mart Data Dictionary. 2011.
8) Statistics New Zealand. IDI DIA Life Event data. 2021.
9) Births, Deaths, Marriages, and Relationships Registration (Prescribed Information) Regulations 1995 (SR 1995/183). 2021.
10) National Health Board. National Minimum Dataset (Hospital Events) Data Dictionary. 2014.
11) National Health Board. Mortality Collection Data Dictionary v1.8. 2021.
12) Statistics New Zealand. IDI MOH Chronic Condition/Significant Health Event Cohort data. 2021.
13) Statistics New Zealand. the Statistics Act 1975, version as at 28 October 2021.
14) Statistics New Zealand. IDI Data Dictionary: Immigration data (July 2015 edition). 2021.
15) Statistics New Zealand. Statistical standard for meshblock. Available from www.stats.govt.nz; 2016.
16) Perinatal and Maternal Mortality Review Committee. Thirteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2017 | Te tuku pūrongo mō te mate me te whakamate 2017. Wellington: Health Quality & Safety Commission; 2019.
17) Perinatal and Maternal Mortality Review Committee. Methodology and definitions for Perinatal and Maternal Mortality Review Committee (PMMRC) reporting. 2018.
18) Ministry of Health. Primary Maternity Services Notice Pursuant to Section 88 of the New Zealand Public Health And Disability Act 2000. 2007.
19) Ministry of Health. HISO 10001:2017 Ethnicity Data Protocols. 2017.
20) Reid G, Bycroft C, Gleisner F. Comparison of ethnicity information in administrative data and the census. Available from www.stats.govt.nz; 2016.
21) Stronks K, Kulu-Glasgow I, Agyemang C. The utility of 'country of birth' for the classification of ethnic groups in health research: the Dutch experience. Ethn Health. 2009;14(3):255-69.
22) Premkumar A, Debbink MP, Silver RM, et al. Association of Acculturation With Adverse Pregnancy Outcomes. Obstet Gynecol. 2020;135(2):301-9.
23) Bakken KS, Skjeldal OH, Stray-Pedersen B. Obstetric Outcomes of First- and Second-Generation Pakistani Immigrants: A Comparison Study at a Low-Risk Maternity Ward in Norway. J Immigr Minor Health. 2017;19(1):33-40.
24) Horner J, Ameratunga SN. Monitoring immigrant health and wellbeing in New Zealand: addressing the tyranny of misleading averages. Aust Health Rev. 2012;36(4):390-3.
25) Atkinson J, Salmond C, Crampton P. NZDEP2013 Index of Deprivation. Department of Public Health, University of Otago, Wellington; 2014.
26) Crampton P, Salmond C, Atkinson J. A comparison of the NZDep and New Zealand IMD indexes of socioeconomic deprivation. Kōtuitui. 2020;15(1):154-69.
27) Exeter DJ, Zhao J, Crengle S, et al. The New Zealand Indices of Multiple Deprivation (IMD): A new suite of indicators for social and health research in Aotearoa, New Zealand. PLOS ONE. 2017;12(8):e0181260.
28) Anderson N, Sadler L, Stewart A, McCowan L. Maternal and pathological pregnancy characteristics in customised birthweight centiles and identification of at-risk small-for-gestational-age infants: a retrospective cohort study. BJOG. 2012;119(7):848-56.
29) Perinatal and Maternal Mortality Review Committee. Fourteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2018 | Te tuku pūrongo mō te mate me te whakamate 2018. Wellington: Health Quality & Safety Commission; 2021.
30) Flenady V, Koopmans L, Middleton P, et al. Major risk factors for stillbirth in high-income countries: a systematic review and meta-analysis. Lancet. 2011;377(9774):1331-40.
31) Bartholomew K, Morton SMB, Atatoa Carr PE, et al. Early engagement with a Lead Maternity Carer: Results from Growing Up in New Zealand. Aust N Z J Obstet Gynaecol. 2015;55(3):227-32.
32) Lawrence RL, Wall CR, Bloomfield FH. Prevalence of gestational diabetes according to commonly used data sources: an observational study. BMC Pregnancy Childbirth. 2019;19(1):34.
Observational studies are a useful tool in epidemiology.[[1]] One study revealed that approximately 68% of published articles in the four leading United States obstetrics and gynaecology journals were of observational nature.[[2]] In obstetric research, they provide the opportunity to study relatively rare adverse events like stillbirth or neonatal death (NND). Despite their value, observational studies come with biases and investigators have an obligation to identify and mitigate these,[[3]] which includes adequate reporting of the study design and methodology.[[4]]
Epidemiological maternity research in New Zealand is usually performed on national Government-held data. The highest quality perinatal data in the country is collected and held by the Perinatal and Maternal Mortality Review Committee (PMMRC), which is an independent committee under the Health Quality & Safety Commission New Zealand. PMMRC data is only made available to a limited number of researchers, due to data sovereignty issues. An alternative data source for maternity research in New Zealand is provided by the Statistics New Zealand (NZ) Integrated Data Infrastructure (IDI). The IDI is a collection of New Zealand Government and non-government administrative and survey data, held by Statistics NZ. Researchers are granted access to merge and interrogate data sources only available within the strict privacy rules of the IDI environment; as projects are required to meet all “five safes” (safe people, safe projects, safe settings, safe data, and safe output).[[5]] Data in the IDI is linked at the individual level, which allows personal information to be connected across different sectors (such as income, migration, and health). Each person is provided with a unique identification (ID) number in a central “spine”, by which the various datasets can be joined. All people ever resident in New Zealand (i.e., citizens or those with visas that allow residency, work or study) and captured in one of the data sources, are included. The IDI data are “refreshed” (i.e., updated, to include newer data, and additional data sources) up to four times a year. By connecting data from multiple sources not otherwise linked on an individual level, questions around complex issues can be researched with high quality across the population.[[6]]
As both maternal and infant data are generally necessary in perinatal analyses, creating datasets for maternity research can be complicated. This methodological report was prepared to assist New Zealand researchers in developing comprehensive datasets for national pregnancy studies, with a focus on perinatal death. We describe a standardised method for creating a “core” dataset within the IDI, allowing for consistent national reporting, and include suggestions for additional tables. We additionally aim to improve the understanding of the used datasets and variables.
An application was made with Statistics NZ to use the data within the IDI. Once approved, a dataset for perinatal research was built in Microsoft SQL Server Management Studio (using IDI refresh IDI_Clean_20211020). The national data sources used were the Maternity Collection (MAT),[[7]] Births, Deaths and Marriages (BDM),[[8,9]] the National Minimum Dataset (NMDS),[[10]] the Mortality Collection (MORT),[[11]] the Chronic Conditions dataset (CC),[[12]] Census,[[13]] and Immigration data.[[14]] “General” or “central” IDI tables used included the full birth date, full death date, and address notification tables.[[15]] A succinct summary of these data sources is provided in Appendix 1. See Table 1 for an overview of abbreviations used in this methodological report. Finally, publicly available PMMRC annual report data was used as the gold standard to validate the tables created.[[16]]
A numerator table (including all perinatal deaths) and a denominator table (including all births) were created separately, to include all births between 2008 and 2017. The main dataset used was the MAT collection. Data quality of the MAT collection varies by item but has improved significantly since 2008.[[17]] Thus, data from 2008 onward are most useful for perinatal research. The proposed method for creating the core dataset, as well as two corresponding full SQL scripts (one for a table including multiples and one for a table excluding them), are provided in Appendices 2–4.
Even though data quality has improved since 2008, some variables still have a high degree of missing data. This is particularly true for women cared for exclusively by district health boards (DHBs), due to a funding change introduced in July 2007. DHB-employed midwifery teams are no longer required to claim for primary maternity services, while self-employed community-based lead maternity carers (LMCs; midwives, general practitioners, or obstetric specialists) must submit pregnancy data prior to payment from the Ministry of Health (MoH).[[18]] As a result, completeness of some data for DHB-registered pregnancies varies widely, while the same data for non-DHB pregnancies is mostly complete. It should also be noted that since 2008 the variable “LMC type” is inaccurate for women under DHB care.[[17]] Finally, when considering data quality issues, some variables can be taken from either MAT or BDM data sources. We validated the following variables for use: maternal age, fetal sex, gestational age, and birthweight.
Pregnancy complications such as gestational diabetes (GDM) or hypertensive disorders of pregnancy are important outcomes in routine maternity research. The MAT delivery table in the IDI flags births that were complicated by pre-eclampsia or eclampsia, as identified by NMDS. MAT, however, does not identify pregnancies of mothers with pre-existing hypertensive disorders or gestational hypertension. In addition, the indicator for GDM is not offered in the IDI as this field is incorrectly labelled: it indicates both pre-existing diabetes and GDM, and according to the MoH no validation process is undertaken for this field. For that reason, we propose adding the NMDS and CC datasets to the core dataset. Unfortunately, data on primary care diagnoses are not available in the IDI. Hospital admissions can be joined to the correct pregnancy by maternal ID and admission dates. Following a similar method, mothers with pre-existing diabetes (as an important risk factor) can be identified by joining the CC dataset.
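The join of hospital admissions to the correct pregnancy can be sketched as follows. This is a minimal illustration only, not the SQL from the appendices: it uses Python's built-in sqlite3 module rather than the IDI environment, and the table names (pregnancy, admission), column names, the pregnancy_start window boundary, and all rows are hypothetical, synthetic stand-ins.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with synthetic, hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pregnancy (
            pregnancy_id INTEGER, mother_id INTEGER,
            pregnancy_start TEXT, delivery_date TEXT);
        CREATE TABLE admission (
            mother_id INTEGER, admission_date TEXT, diagnosis TEXT);
        -- Mother 100 has two pregnancies; mother 200 has one.
        INSERT INTO pregnancy VALUES
            (1, 100, '2010-01-10', '2010-10-15'),
            (2, 100, '2013-03-01', '2013-12-05'),
            (3, 200, '2012-02-01', '2012-11-01');
        INSERT INTO admission VALUES
            (100, '2010-08-01', 'gestational hypertension'),
            (100, '2012-01-20', 'unrelated admission'),
            (200, '2012-06-15', 'gestational diabetes');
        """
    )
    return conn


def link_admissions_to_pregnancies(conn):
    """Match admissions to pregnancies by maternal ID and admission date.

    An admission is attached to a pregnancy only when it belongs to the same
    mother and its date falls inside that pregnancy's window. ISO-formatted
    date strings compare correctly as text.
    """
    query = """
        SELECT p.pregnancy_id, a.diagnosis
        FROM pregnancy AS p
        JOIN admission AS a
          ON a.mother_id = p.mother_id
         AND a.admission_date BETWEEN p.pregnancy_start AND p.delivery_date
        ORDER BY p.pregnancy_id, a.admission_date
    """
    return conn.execute(query).fetchall()


linked = link_admissions_to_pregnancies(build_demo_database())
print(linked)
```

In this toy data the admission dated between the two pregnancies of mother 100 is not attached to either pregnancy, which is the behaviour the date condition is meant to provide. How the pregnancy window is defined in practice is a design decision that this sketch does not settle.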
Ethnicity is an important demographic risk factor widely used in pregnancy research in New Zealand, and key to examining health equity. Ethnicity as recorded in New Zealand is self-defined, can change over time, and may include more than one ethnic group per individual. The use of ethnicity data in health research is addressed by the Ethnicity Data Protocols for the Health and Disability Sector by the MoH.[[19]] According to this protocol, ethnicity data can be categorised at four different “levels” following a hierarchical structure, with level four being the most detailed level of reporting (containing 231 ethnicity codes). These codes are then aggregated into ethnicity levels three to one. As an illustration, code “43112” stands for Fijian Indian and aggregates into “431” Indian (level three), “43” Indian (level two) and “4” Asian (level one). Generally, level two ethnicity data, which include 22 groups, are used for reporting in health research. In this aggregation a high level of detail is maintained for some ethnicities (such as Māori, Pacific Peoples, Chinese or Indian), while other minority groups are merged despite large heterogeneity (such as other Asian ethnicities, African or Latin American).
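The aggregation in the “43112” example can be expressed as prefix truncation of the level four code. The sketch below assumes that every five-digit level four code aggregates in the same way as that example; the function name is hypothetical and the protocol itself remains the authority on valid codes.

```python
def aggregate_ethnicity_code(level4_code: str) -> dict:
    """Derive level three, two and one codes from a five-digit level four code.

    Assumption: the hierarchy is encoded in the leading digits, as in the
    example "43112" -> "431" -> "43" -> "4".
    """
    if len(level4_code) != 5 or not level4_code.isdigit():
        raise ValueError("expected a five-digit level four ethnicity code")
    return {
        4: level4_code,
        3: level4_code[:3],
        2: level4_code[:2],
        1: level4_code[:1],
    }


levels = aggregate_ethnicity_code("43112")
print(levels)  # {4: '43112', 3: '431', 2: '43', 1: '4'}
```

Keeping the codes as strings rather than integers avoids accidental arithmetic on them and keeps the truncation explicit.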
Unfortunately, the MAT dataset only holds level two ethnicity data. Due to the data collection methods of MAT, ethnicity data may also have been completed by a healthcare provider, leading to potential inaccuracy or lack of detail. Moreover, someone’s ethnicity can reflect a contextual response. This might occur when a mother believes she will receive better care when reporting a different ethnicity. In contrast, BDM birth registration provides high-quality level four ethnicity data, including information reported by the parents directly outside of the healthcare setting. The requirement for parents to complete birth registration separately from the LMC leads to ethnicity data akin to ethnicity reported in the national Census, and BDM is generally considered second choice to Census ethnicity data.[[20]] However, since ethnicity can change and BDM is recorded closer to the birth event than Census, we argue that BDM should be used as the main source in maternity studies. Note that in cases where ethnicity data is missing, the source-ranked ethnicity table (“central” table in the IDI) may be consulted, although only level one ethnicity is provided.
Some research questions will require both ethnicity (i.e., a subjective belief, related to cultural behaviours and practices) and country of birth (COB; i.e., an objective measure, more closely linked to ethnic origin), considering increasing migration and ethnic diversity globally.[[21]] Combining these variables in analyses might provide a better understanding of pregnancy risk factors, since common issues associated with migration in first generation migrants (such as socio-economic deprivation or diverse health literacy) may result in differential health outcomes compared to second and third generation women.[[22–24]] COB data are not available from the standard birth tables and should therefore be obtained from alternative datasets. Census or immigration data present as the highest quality sources for this variable. Since COB is fixed, the datasets can simply be linked by maternal ID, regardless of the correct pregnancy event. An alternative method for consideration, when solely interested in whether a mother was New Zealand born or not, is to join the parent ID on the infant’s BDM birth record with the BDM births table. If the mother’s birth was registered, she was New Zealand born. In contrast, if the mother’s birth cannot be identified in BDM births, she was most likely born overseas.
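The alternative method, in which the parent ID on the infant's birth record is looked up in the births table itself, amounts to a self-join. The following sketch uses Python's built-in sqlite3 module with a hypothetical table bdm_births(child_id, mother_id) and synthetic rows; the real IDI table and column names differ, and child_id is assumed to be unique.

```python
import sqlite3


def flag_nz_born_mothers(conn):
    """Return (child_id, mother_nz_born) for births with a known mother ID.

    The births table is left-joined to itself: if the mother's own birth is
    found as a registered birth, she is flagged 1 (New Zealand born),
    otherwise 0 (most likely born overseas).
    """
    query = """
        SELECT infant.child_id,
               CASE WHEN mother.child_id IS NOT NULL THEN 1 ELSE 0 END
        FROM bdm_births AS infant
        LEFT JOIN bdm_births AS mother
          ON mother.child_id = infant.mother_id
        WHERE infant.mother_id IS NOT NULL
        ORDER BY infant.child_id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bdm_births (child_id INTEGER PRIMARY KEY, mother_id INTEGER);
    -- Person 10 was registered at birth and later gave birth to child 20.
    -- Child 21's mother (ID 99) has no birth registration in the table.
    INSERT INTO bdm_births VALUES (10, NULL), (20, 10), (21, 99);
    """
)
flags = flag_nz_born_mothers(conn)
print(flags)  # [(20, 1), (21, 0)]
```

A flag of 0 only means that no registration was found, so it inherits any gaps in the coverage or linkage of the births table.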
The MAT and BDM datasets do not contain any information on individual level socio-economic status. The current classification system used to monitor deprivation, which is widely used in health and social research, is the New Zealand Socioeconomic Deprivation Indices (NZDep).[[25]] NZDep is a decile score based on area of domicile, divided into meshblocks or larger Census Area Units where a higher level of confidentiality is required. The NZDep is based on Census data, with decile one representing the 10% of the population who live in the least deprived areas and decile ten representing the 10% of the population who live in the most deprived areas in New Zealand. The NZDep2013 is the fifth updated version since 1991 and combines nine variables from the 2013 Census. A limitation of this method is that the NZDep represents area-level deprivation and does not necessarily reflect the socio-economic status of an individual, although it is a close approximation and may be used as a proxy in large datasets.[[26]] Alternative proxy variables for consideration include the New Zealand Indices of Multiple Deprivation,[[27]] region of domicile (sourced from the address notifications table), DHB of domicile (sourced from MAT), or personal income level (sourced from Census), although the authors of this paper have no experience using these alternative sources.
To provide an estimation of socio-economic status in our birth cohort, the registered address closest to the date of birth, and prior to delivery, was chosen for each delivery event, to allow for geographic movement over time and to best capture the mother’s socio-economic status during pregnancy. Note that where a full date of birth is missing (primarily among perinatal deaths), birth year and month sourced from the BDM births or MAT table may be used. The meshblock associated with this address was then extracted and linked to the corresponding NZDep2006 and 2013 decile scores (births after 2008 and before 2013 linked to NZDep2006 and from 2013 linked to NZDep2013).
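The address selection and the choice of NZDep version can be sketched in a few lines. The function names, the (notification_date, meshblock) tuple layout and the meshblock labels are hypothetical, and treating an address notified on the day of delivery as “prior to delivery” is an assumption of this sketch rather than a rule stated in the method.

```python
from datetime import date


def meshblock_at_delivery(address_notifications, delivery_date):
    """Return the meshblock of the latest address notified on or before delivery.

    address_notifications is a list of (notification_date, meshblock) tuples.
    Returns None when no address was notified before the delivery.
    """
    prior = [n for n in address_notifications if n[0] <= delivery_date]
    if not prior:
        return None
    return max(prior, key=lambda n: n[0])[1]


def nzdep_version(delivery_date):
    """Births before 2013 use NZDep2006; births from 2013 use NZDep2013."""
    return "NZDep2013" if delivery_date.year >= 2013 else "NZDep2006"


# Synthetic address history for one mother.
history = [
    (date(2009, 3, 1), "MB_A"),
    (date(2011, 7, 15), "MB_B"),
    (date(2014, 1, 10), "MB_C"),
]
delivery = date(2012, 5, 20)
print(meshblock_at_delivery(history, delivery), nzdep_version(delivery))
```

Here the 2012 delivery is assigned meshblock MB_B, because the 2014 address was notified after the birth, and the decile would be looked up in NZDep2006.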
Six thousand, seven hundred and ninety perinatal deaths (4,768 stillbirths and 2,022 NNDs) and 617,375 live births were identified in our dataset. In comparison, PMMRC annual report data comprised 6,518 perinatal deaths (4,779 stillbirths, including 1,456 terminations, and 1,739 NNDs) and 617,321 live births.[[16]] As a result, our numerator dataset includes 272 more perinatal mortalities compared to the gold standard (approximately 4.0% overreporting of deaths). Our denominator dataset includes 54 more live births compared to PMMRC report data (a 0.05% overreporting in live births). Thus overall, our datasets represent a close approximation to the gold standard. We found that 99.2% of all births (99.5% of live births and 77.4% of perinatal deaths) in our final dataset had a record from both MAT delivery and BDM births, providing complete data for the majority of cases. Cases that could not be joined to both sources were mostly missing BDM data (example in Table 2).
Smoking status at registration was unknown for 44.0% of women with a DHB-registered pregnancy, while this was only 0.04–1.5% for women under the care of an LMC provider. Missingness was even higher for maternal BMI, at 58.0% and 0.1–2.2% among these groups respectively. The variable “booking trimester” was missing for 48.5% of women with a DHB-registered pregnancy, compared to 0.01–0.2% of women booked with another LMC type. In our dataset, 37.1% of DHB cases were recorded as LMC type “unknown”. The following variables were validated.
While maternal age is provided directly by MAT, for BDM it was calculated from the maternal birth year and month and the delivery date. In our cohort, there was a 95.6% overlap where this variable was available from both datasets. Of the 4.4% non-matches, 95.4% differed by only one year. Consequently, the MAT dataset may be used for maternal age (accounting for 99.9% of all cases).
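A calculation of maternal age from birth year, birth month and delivery date might look like the sketch below. The function name is hypothetical, and because no day of birth is used, the birthday is assumed to fall on the first day of the birth month; for deliveries in the birth month the result can therefore be one year too high, which is one plausible (but here unverified) contributor to one-year discrepancies between sources.

```python
from datetime import date


def maternal_age_at_delivery(birth_year, birth_month, delivery_date):
    """Age in completed years, treating the birthday as the 1st of the month."""
    age = delivery_date.year - birth_year
    if delivery_date.month < birth_month:
        age -= 1  # birthday not yet reached in the delivery year
    return age


print(maternal_age_at_delivery(1985, 6, date(2015, 5, 20)))  # 29
print(maternal_age_at_delivery(1985, 6, date(2015, 6, 1)))   # 30
```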
The overlap in fetal sex was 99.96% where both MAT and BDM variables were available. Hence, either variable can be used in analysis (accounting for 98.2% of all cases).
The MAT dataset notes gestational age in weeks, while BDM also registers days. There was an 89.1% overlap in gestational age week where both variables were available. Of the remaining 10.9%, it appeared that 0.7% only differed by one day (e.g., 37 and 36+6), while 7.0% differed by one day to one week (e.g., 39 and 40+0). Among cases in which a larger difference existed (3.2% of total), birthweight was more likely to correlate with MAT gestational age and therefore MAT may be prioritised for use. However, BDM should be used in analyses including customised birthweight centiles,[[28]] where the absence of gestation in days leads to systematic over-estimation of birthweight centiles. Using both tables, 98.0% of cases are accounted for.
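The comparison of whole-week and week-plus-day gestational ages can be illustrated as follows. This is one possible reading of the examples in the text (37 versus 36+6 as a one-day difference, 39 versus 40+0 as a one-week difference): the MAT value is treated as completed weeks plus zero days, and the function name and category labels are hypothetical.

```python
def compare_gestation(mat_weeks, bdm_weeks, bdm_days):
    """Classify agreement between MAT (weeks) and BDM (weeks + days) gestation."""
    if not 0 <= bdm_days <= 6:
        raise ValueError("bdm_days must be between 0 and 6")
    if mat_weeks == bdm_weeks:
        return "same week"
    # Assumption: MAT weeks are converted to days as weeks + 0 days.
    difference_in_days = abs(mat_weeks * 7 - (bdm_weeks * 7 + bdm_days))
    if difference_in_days == 1:
        return "one day"
    if difference_in_days <= 7:
        return "up to one week"
    return "more than one week"


print(compare_gestation(37, 36, 6))  # one day
print(compare_gestation(39, 40, 0))  # up to one week
```

Cases in the last category are the ones for which the text suggests checking birthweight before deciding which source to prioritise.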
The overlap in birthweight was 96.6% where both MAT and BDM variables were available. Of the remaining 3.4%, 1.7% only differed by 100 grams and 0.6% differed by more than 500 grams. Either variable can be used in analysis, accounting for 94.0% of all cases.
Close to 99.0% of deliveries could be linked to a NZDep score.
Level four ethnicity from the 2013 Census had an 89.4% overlap with BDM ethnicity, in cases where only one ethnic group was recorded in both sources (N=392,004). The corresponding overlap for level three ethnicity was 90.6% and level two ethnicity 95.7%. Therefore, if BDM is missing, Census provides a good alternative. If both are missing, MAT ethnicity can be used as a surrogate. This method may also mitigate some data quality differences between ethnicities, as the availability of BDM ethnicity data for perinatal deaths differs per group (Table 2).
Between 2008 and 2017, 88.3% of mothers had a known COB from the 2018 Census. Where Census 2018 data was missing, Census 2013 data was used, with a 99.2% agreement between the two surveys among women where both were available. If both were missing, then immigration data was used, with an 89.7% and 88.8% agreement with Census 2018 and 2013 respectively. Immigration metadata suggests using nationality over COB; however, in our dataset this resulted in less agreement with Census (82.1% and 80.7%). Finally, in this report nationality was used as a surrogate for COB if all other COB data was missing. This is justified by an 85.4% agreement between COB and nationality in the immigration dataset. By combining all four variables, COB was available for 98.9% of all mothers.
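The order of preference described above (Census 2018, then Census 2013, then immigration COB, then nationality) is a simple fallback chain. The sketch below assumes one dictionary per mother with hypothetical field names and synthetic values; returning the source alongside the value makes it possible to report how often each fallback was needed.

```python
# Hypothetical field names, listed in the order of preference from the text.
COB_SOURCE_ORDER = (
    "census_2018_cob",
    "census_2013_cob",
    "immigration_cob",
    "immigration_nationality",
)


def resolve_country_of_birth(record):
    """Return (value, source) from the first non-missing source, else (None, None)."""
    for source in COB_SOURCE_ORDER:
        value = record.get(source)
        if value:  # skips None and empty strings
            return value, source
    return None, None


# Synthetic example: both Census values are missing for this mother.
mother = {
    "census_2018_cob": None,
    "census_2013_cob": "",
    "immigration_cob": "India",
    "immigration_nationality": "India",
}
print(resolve_country_of_birth(mother))  # ('India', 'immigration_cob')
```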
View Tables 1–2.
This methodological paper describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI. A strength of this proposed approach is the ability to create a comprehensive dataset including perinatal deaths and live births from a variety of national sources, using our combined knowledge, and defining PMMRC data as the gold standard; thereby utilising the best quality data from each dataset available. All steps in creating this dataset have been justified and validated. Complete understanding of the data sources, including the quality of the variables used and general inconsistencies in metadata, will also improve the accuracy of research output. Since these data sources are available to all researchers who are granted permission to use the Statistics NZ IDI, this will increase accessibility.
In developing this methodology, some limitations to the IDI were discovered, such as restricted use of the MORT dataset. Even though MORT is considered the best source for stillbirths, this methodology uses the MAT and BDM datasets to create the numerator table. This is justified because, while MORT identified 6,270 perinatal deaths between 2008 and 2017, only 1,955 (31.2%) of these could be matched to MAT or BDM tables. Thus, clinical data including important variables would be unavailable for almost 70% of all mortalities. We suspect this low matching rate is due to a linking error within the IDI, which should be addressed by Statistics NZ. In addition, only 59 cases had a different birth status according to MORT, validating this approach.
We also offer recommendations to improve the quality of perinatal data, to further enhance these resources. Firstly, the transfer of BMI and smoking data from DHB primary care facilities to the MAT datasets should be facilitated to eliminate systematic bias in analyses that control for these variables, as the highest degree of missing data is among high-risk mothers under DHB care, who are also more likely to experience perinatal mortality. Consideration should be given to excluding DHB-registered pregnancies from analyses that require adjustment for these variables. For many years the PMMRC has recommended that the MoH “urgently require DHBs to provide complete and accurate registration data to the MAT dataset”, without success.[[29]] Additionally, the variable “booking trimester” was missing for almost half of women with a DHB-registered pregnancy, despite late booking being associated with poorer perinatal outcomes and socio-economic deprivation.[[30,31]]
Important obstetric risk factors, such as maternal pre-existing chronic conditions, should be collected in the MAT dataset. Others, such as GDM or hypertensive disorders of pregnancy, may need to be validated as the quality of these variables is not clear. For instance, Lawrence et al. investigated the prevalence of GDM according to some commonly used data sources in New Zealand and found an underreporting in NMDS (3.8%, compared to 5.9% reported by DHBs or 6.9% reported by laboratories).[[32]] There was 70% agreement on the presence of GDM between the data sources. We also suggest that validation studies of routine maternity datasets are performed. This will assist researchers in the interpretation of results of a widely used data source. Furthermore, pregnancy research requires both mother and infant data in most analyses. Mothers may appear in a table more than once with consecutive pregnancies or a pregnancy may result in multiple infants, which complicates the building of a perinatal dataset. Including stillbirths in the MAT infant dataset would provide more detailed information about the birth and simplify the process of creating a dataset. Importantly, although this methodological report offers quality improvement for maternity research, making the PMMRC national dataset available within the IDI should be considered. Use of this dataset would eliminate many of the data quality issues described in this paper that are associated with perinatal mortality studies.
Even though the IDI provides a promising avenue for perinatal studies, there are barriers to accessing the data. Each new project requires a comprehensive application process. New research projects are assessed seven times a year, with a turnaround time of approximately six weeks. Successful applications are required to pay a one-off fee ($500). Once approved, specific users of the IDI will need to be authorised by Statistics NZ, undergo confidentiality training, and any changes to the project are subject to evaluation. Researchers are granted access to merge and interrogate data sources only available within the strict privacy rules of the Datalab; available in a few cities across New Zealand. Finally, researchers are recommended to have intermediate SQL coding skills.
In conclusion, this methodological report aims to improve the quality of routine maternity studies in New Zealand by offering an alternative approach to conventional data sources, while simultaneously increasing knowledge and accessibility.
View Appendices 1 & 2.
View Appendices 3 & 4: To see copies of the original SQL files, please contact corresponding author Esti C de Graaff at e.degraaff@auckland.ac.nz
The highest quality perinatal data in New Zealand is collected and collated by the Perinatal and Maternal Mortality Review Committee (PMMRC) and is made available to a limited number of researchers. Therefore, maternity and perinatal mortality studies are generally performed on Government-held data. This report offers an alternative approach with in-depth justification for the methodology, while simultaneously improving the understanding of the data sources.
A standardised method for creating a comprehensive maternity dataset within the Statistics New Zealand Integrated Data Infrastructure (IDI) was developed and a validation dataset was created to include all births between 2008 and 2017.
A close approximation to the PMMRC annual report data was found, with 4.0% over-reporting of perinatal deaths and 0.05% over-reporting of live births in the IDI dataset. Several variables, including important pregnancy risk factors, were validated for use. Limitations to the datasets were explored and additional tables in the IDI were proposed, to include variables on pregnancy complications, ethnicity and country of birth, and socio-economic data.
This methodological report describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI, including a variety of national data sources. Recommendations for further enhancement of these resources have been offered.
1) Hoppe DJ, Schemitsch EH, Morshed S, et al. Hierarchy of Evidence: Where Observational Studies Fit in and Why We Need Them. J Bone Joint Surg Am. 2009;91(Supplement_3).
2) Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of Study Designs in Four Major US Journals of Obstetrics and Gynecology. Gynecol Obstet Invest. 2001;51(1):8-11.
3) Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248-52.
4) Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500-24.
5) Milne B, Atkinson J, Blakely T, et al. Data Resource Profile: The New Zealand Integrated Data Infrastructure (IDI). Int J Epidemiol. 2019;48.
6) Atkinson J, Blakely T. New Zealand’s Integrated Data Infrastructure (IDI): Value to date and future opportunities. Int J Popul Data Sci. 2017;1.
7) National Health Board Business Unit. National Maternity Collection Data Mart Data Dictionary. 2011.
8) Statistics New Zealand. IDI DIA Life Event data. 2021.
9) Births, Deaths, Marriages, and Relationships Registration (Prescribed Information) Regulations 1995 (SR 1995/183). 2021.
10) National Health Board. National Minimum Dataset (Hospital Events) Data Dictionary. 2014.
11) National Health Board. Mortality Collection Data Dictionary v1.8. 2021.
12) Statistics New Zealand. IDI MOH Chronic Condition/Significant Health Event Cohort data. 2021.
13) Statistics New Zealand. The Statistics Act 1975, version as at 28 October 2021.
14) Statistics New Zealand. IDI Data Dictionary: Immigration data (July 2015 edition). 2021.
15) Statistics New Zealand. Statistical standard for meshblock. Available from www.stats.govt.nz; 2016.
16) Perinatal and Maternal Mortality Review Committee. Thirteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2017 | Te tuku pūrongo mō te mate me te whakamate 2017. Wellington: Health Quality & Safety Commission; 2019.
17) Perinatal and Maternal Mortality Review Committee. Methodology and definitions for Perinatal and Maternal Mortality Review Committee (PMMRC) reporting. 2018.
18) Ministry of Health. Primary Maternity Services Notice Pursuant to Section 88 of the New Zealand Public Health and Disability Act 2000. 2007.
19) Ministry of Health. HISO 10001:2017 Ethnicity Data Protocols. 2017.
20) Reid G, Bycroft C, Gleisner F. Comparison of ethnicity information in administrative data and the census. Available from www.stats.govt.nz; 2016.
21) Stronks K, Kulu-Glasgow I, Agyemang C. The utility of 'country of birth' for the classification of ethnic groups in health research: the Dutch experience. Ethn Health. 2009;14(3):255-69.
22) Premkumar A, Debbink MP, Silver RM, et al. Association of Acculturation With Adverse Pregnancy Outcomes. Obstet Gynecol. 2020;135(2):301-9.
23) Bakken KS, Skjeldal OH, Stray-Pedersen B. Obstetric Outcomes of First- and Second-Generation Pakistani Immigrants: A Comparison Study at a Low-Risk Maternity Ward in Norway. J Immigr Minor Health. 2017;19(1):33-40.
24) Horner J, Ameratunga SN. Monitoring immigrant health and wellbeing in New Zealand: addressing the tyranny of misleading averages. Aust Health Rev. 2012;36(4):390-3.
25) Atkinson J, Salmond C, Crampton P. NZDEP2013 Index of Deprivation. Department of Public Health, University of Otago, Wellington; 2014.
26) Crampton P, Salmond C, Atkinson J. A comparison of the NZDep and New Zealand IMD indexes of socioeconomic deprivation. Kōtuitui. 2020;15(1):154-69.
27) Exeter DJ, Zhao J, Crengle S, et al. The New Zealand Indices of Multiple Deprivation (IMD): A new suite of indicators for social and health research in Aotearoa, New Zealand. PLOS ONE. 2017;12(8):e0181260.
28) Anderson N, Sadler L, Stewart A, McCowan L. Maternal and pathological pregnancy characteristics in customised birthweight centiles and identification of at-risk small-for-gestational-age infants: a retrospective cohort study. BJOG. 2012;119(7):848-56.
29) Perinatal and Maternal Mortality Review Committee. Fourteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2018 | Te tuku pūrongo mō te mate me te whakamate 2018. Wellington: Health Quality & Safety Commission; 2021.
30) Flenady V, Koopmans L, Middleton P, et al. Major risk factors for stillbirth in high-income countries: a systematic review and meta-analysis. Lancet. 2011;377(9774):1331-40.
31) Bartholomew K, Morton SMB, Atatoa Carr PE, et al. Early engagement with a Lead Maternity Carer: Results from Growing Up in New Zealand. Aust N Z J Obstet Gynaecol. 2015;55(3):227-32.
32) Lawrence RL, Wall CR, Bloomfield FH. Prevalence of gestational diabetes according to commonly used data sources: an observational study. BMC Pregnancy Childbirth. 2019;19(1):34.
Observational studies are a useful tool in epidemiology.[[1]] One study revealed that approximately 68% of published articles in the four leading United States obstetrics and gynaecology journals were of observational nature.[[2]] In obstetric research, they provide the opportunity to study relatively rare adverse events like stillbirth or neonatal death (NND). Despite their value, observational studies come with biases and investigators have an obligation to identify and mitigate these,[[3]] which includes adequate reporting of the study design and methodology.[[4]]
Epidemiological maternity research in New Zealand is usually performed on national Government-held data. The highest quality perinatal data in the country is collected and held by the Perinatal and Maternal Mortality Review Committee (PMMRC), which is an independent committee under the Health Quality & Safety Commission New Zealand. PMMRC data is only made available to a limited number of researchers, due to data sovereignty issues. An alternative data source for maternity research in New Zealand is provided by the Statistics New Zealand (NZ) Integrated Data Infrastructure (IDI). The IDI is a collection of New Zealand Government and non-government administrative and survey data, held by Statistics NZ. Researchers are granted access to merge and interrogate data sources only available within the strict privacy rules of the IDI environment; as projects are required to meet all “five safes” (safe people, safe projects, safe settings, safe data, and safe output).[[5]] Data in the IDI is linked at the individual level, which allows personal information to be connected across different sectors (such as income, migration, and health). Each person is provided with a unique identification (ID) number in a central “spine”, by which the various datasets can be joined. All people ever resident in New Zealand (i.e., citizens or those with visas that allow residency, work or study) and captured in one of the data sources, are included. The IDI data are “refreshed” (i.e., updated, to include newer data, and additional data sources) up to four times a year. By connecting data from multiple sources not otherwise linked on an individual level, questions around complex issues can be researched with high quality across the population.[[6]]
As both maternal and infant data are generally necessary in perinatal analyses, creating datasets for maternity research can be complicated. This methodological report was prepared to assist New Zealand researchers in developing comprehensive datasets for national pregnancy studies, with a focus on perinatal death. We describe a standardised method for creating a “core” dataset within the IDI, allowing for consistent national reporting, and include suggestions for additional tables. We additionally aim to improve the understanding of the used datasets and variables.
An application was made with Statistics NZ to use the data within the IDI. Once approved, a dataset for perinatal research was built in Microsoft SQL Management Server Studio (using IDI refresh IDI_Clean_20211020). National data sources used, were the Maternity Collection (MAT),[[7]] Births, Deaths and Marriages (BDM),[[8,9]] the National Minimum Dataset (NMDS),[[10]] the Mortality Collection (MORT),[[11]] the Chronic Conditions dataset (CC),[[12]] Census,[[13]] and Immigration data.[[14]] “General” or “central” IDI tables used included the full birth date, full death date, and address notification tables.[[15]] A succinct and essential summary of these data sources has been provided in Appendix 1. See Table 1 for an overview of abbreviations used in this methodological report. Finally, PMMRC publicly available annual report data was used as the gold standard to validate the tables created.[[16]]
A numerator table (including all perinatal deaths) and a denominator table (including all births) were created separately, to include all births between 2008 and 2017. The main dataset used was the MAT collection. Data quality of the MAT collection varies by item but has improved significantly since 2008.[[17]] Thus, data from 2008 onward are most useful for perinatal research. The proposed method for creating the core dataset, as well as two corresponding full SQL codes (for a table in and excluding multiples) are provided in Appendices 2–4.
Even though data quality has improved since 2008, some variables still have a high degree of missing data. This is particularly true for women cared for exclusively by district health boards (DHB), due to a funding change introduced in July 2007. DHB employed midwifery teams are no longer required to claim for primary maternity services, while self-employed community-based lead maternity carers (LMCs; midwives, general practitioners, or obstetric specialists) must submit pregnancy data prior to payment from the Ministry of Health (MoH).[[18]] As a result, completeness of some data for DHB-registered pregnancies varies widely, while the same data for non-DHB pregnancies is mostly complete. It should also be noted that since 2008 the variable “LMC type” is inaccurate for women under DHB care.[[17]] Finally, when considering data quality issues, some variables can be taken from either MAT or BDM data sources. We validated the following variables for use: maternal age, fetal sex, gestational age, and birthweight.
Pregnancy complications such as gestational diabetes (GDM) or hypertensive disorders of pregnancy are important outcomes in routine maternity research. The MAT delivery table in the IDI identifies births that were complicated by pre-eclampsia or eclampsia, identified by NMDS. MAT, however, does not identify pregnancies of mothers with pre-existing hypertensive disorders or gestational hypertension. Secondly, the indicator for GDM is not offered in the IDI as this field is incorrectly labelled. It indicates both pre-existing diabetes and GDM, and according to the MoH no validation process is undertaken for this field. For that reason, we propose to add NMDS and CC datasets to the core dataset. Unfortunately, data on primary care diagnoses are not available in the IDI. Hospital admissions can be joined to the correct pregnancy by maternal ID and admission dates. Following a similar method, mothers with pre-existing diabetes (as an important risk factor) can be identified by joining the CC dataset.
Ethnicity is an important demographic risk factor widely used in pregnancy research in New Zealand, and key to examining health equity. Characteristics of ethnicity recorded in New Zealand include that it is self-defined, it can change over time and an individual may identify with more than one ethnic group. The use of ethnicity data in health research is addressed by the Ethnicity Data Protocols for the Health and Disability Sector by the MoH.[[19]] According to this protocol, ethnicity data can be categorized at four different “levels” following a hierarchical structure; with level four being the most detailed level of reporting (containing 231 ethnicity codes). These codes are then aggregated into ethnicity levels three to one. As an illustration, code “43112” stands for Fijian Indian and aggregates into “431” Indian (level three), “43” Indian (level two) and “4” Asian (level one). Generally, level two ethnicity data are used in health research for reporting, which includes 22 groups. In this aggregation a high level of detail is maintained for some ethnicities (such as Māori, Pacific Peoples, Chinese or Indian), while other minority groups are merged despite large heterogeneity (such as other Asian ethnicities, African or Latin American).
Unfortunately, the MAT dataset only holds level two ethnicity data. Due to the data collection methods of MAT, ethnicity data may also have been completed by a healthcare provider, leading to potential inaccuracy or lack of detail. Moreover, someone’s ethnicity can reflect a contextual response. This might occur when a mother believes she will receive better care when reporting a different ethnicity. In contrast, BDM birth registration provides high-quality level four ethnicity data, including information reported by the parents directly outside of the healthcare setting. Because parents are required to complete birth registration separately from the LMC, BDM ethnicity data are akin to ethnicity reported in the national Census and are generally considered second choice to Census ethnicity data.[[20]] However, since ethnicity can change and BDM is recorded closer to the birth event than Census, we argue that BDM should be used as the main source in maternity studies. Note that in cases where ethnicity data is missing, the source-ranked ethnicity table (“central” table in the IDI) may be consulted, although only level one ethnicity is provided.
Some research questions will require both ethnicity (i.e., a subjective belief, related to cultural behaviours and practices) and country of birth (COB; i.e., an objective measure, more closely linked to ethnic origin), considering increasing migration and ethnic diversity globally.[[21]] Combining these variables in analyses might provide a better understanding of pregnancy risk factors, since common issues associated with migration in first generation migrants (such as socio-economic deprivation or diverse health literacy) may result in differential health outcomes compared to second and third generation women.[[22–24]] COB data are not available from the standard birth tables and should therefore be obtained from alternative datasets. Census or immigration data present as the highest quality sources for this variable. Since COB is fixed, the datasets can simply be linked by maternal ID, regardless of the correct pregnancy event. An alternative method for consideration, when solely interested in whether a mother was New Zealand born or not, is to join the parent ID on the infant’s BDM birth record with the BDM births table. If the mother’s birth was registered, she was New Zealand born. In contrast, if the mother’s birth cannot be identified in BDM births, she was most likely born overseas.
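The New Zealand born check can be sketched as a simple lookup. The following Python fragment is illustrative only; the identifiers (`infant_id`, `mother_id`, `bdm_registered_ids`) are hypothetical stand-ins for the parent ID on the infant's BDM record and the person IDs present in the BDM births table.

```python
def nz_born_flags(infant_records, bdm_registered_ids):
    """Flag each infant's mother as NZ born if her own birth appears in BDM births."""
    registered = set(bdm_registered_ids)
    # True: mother's birth was registered in New Zealand.
    # False: no registration found, so she was most likely (not certainly) born overseas.
    return {r["infant_id"]: r["mother_id"] in registered for r in infant_records}

# Hypothetical data: mothers 10 and 11 gave birth; only mother 10 has her own BDM birth record.
infant_records = [
    {"infant_id": "A", "mother_id": 10},
    {"infant_id": "B", "mother_id": 11},
]
bdm_registered_ids = [10, 12, 13]

flags = nz_born_flags(infant_records, bdm_registered_ids)
print(flags)
```

A `False` flag should be read as a probable rather than a confirmed overseas birth, since a failed link can also reflect a linkage gap.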
Neither the MAT nor the BDM dataset contains any information on individual level socio-economic status. The current classification system used to monitor deprivation, which is widely used in health and social research, is the New Zealand Socioeconomic Deprivation Indices (NZDep).[[25]] NZDep is a decile score based on area of domicile, divided into meshblocks or larger Census Area Units where a higher level of confidentiality is required. The NZDep is based on census data, with decile one representing the 10% of the population who live in the least deprived areas and decile ten representing the 10% of the population who live in the most deprived areas in New Zealand. The NZDep2013 is the fifth updated version since 1991 and combines nine variables from the 2013 Census. A limitation of this method is that the NZDep represents area-level deprivation and does not necessarily reflect the socio-economic status of an individual, although it is a close approximation and may be used as a proxy in large datasets.[[26]] Alternative proxy variables for consideration include the New Zealand Indices of Multiple Deprivation,[[27]] region of domicile (sourced from the address notifications table), DHB of domicile (sourced from MAT), or personal income level (sourced from Census), although the authors of this paper have no experience using these alternative sources.
To provide an estimation of socio-economic status in our birth cohort, the registered address closest to the date of birth, and prior to delivery, was chosen for each delivery event, to allow for geographic movement over time, and to best capture mothers’ socio-economic status during pregnancy. Note that where a full date of birth is missing (primarily among perinatal deaths), birth year and month sourced from the BDM births or MAT table may be used. The meshblock associated with this address was then extracted and linked to the corresponding NZDep2006 and 2013 decile scores (births after 2008 and before 2013 linked to NZDep2006 and from 2013 linked to NZDep2013).
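As an illustration of this selection rule, the sketch below picks the most recent address notified on or before delivery and then chooses the NZDep version by birth year. It is a simplified Python outline under stated assumptions: the field names, meshblock labels and decile values are hypothetical, and the lookup tables are plain dictionaries rather than IDI tables.

```python
from datetime import date

def address_at_delivery(addresses, delivery_date):
    """Most recent address notified on or before the delivery date, or None."""
    prior = [a for a in addresses if a["notified"] <= delivery_date]
    return max(prior, key=lambda a: a["notified"]) if prior else None

def nzdep_decile(meshblock, delivery_date, dep2006, dep2013):
    """Births before 2013 use NZDep2006; births from 2013 use NZDep2013."""
    table = dep2013 if delivery_date.year >= 2013 else dep2006
    return table.get(meshblock)  # None when the meshblock has no score

# Hypothetical address history and decile lookups.
addresses = [
    {"notified": date(2011, 1, 5), "meshblock": "MB_A"},
    {"notified": date(2012, 9, 1), "meshblock": "MB_B"},
    {"notified": date(2014, 2, 1), "meshblock": "MB_C"},
]
dep2006 = {"MB_A": 4, "MB_B": 6}
dep2013 = {"MB_A": 5, "MB_B": 7, "MB_C": 2}

delivery = date(2013, 4, 20)
chosen = address_at_delivery(addresses, delivery)
decile = nzdep_decile(chosen["meshblock"], delivery, dep2006, dep2013)
print(chosen["meshblock"], decile)
```

Restricting to addresses notified before delivery prevents a post-birth move (here the 2014 address) from being attributed to the pregnancy.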
A total of 6,790 perinatal deaths (4,768 stillbirths and 2,022 NNDs) and 617,375 live births were identified in our dataset. In comparison, PMMRC annual report data comprised 6,518 perinatal deaths (4,779 stillbirths, including 1,456 terminations, and 1,739 NNDs) and 617,321 live births.[[16]] As a result, our numerator dataset includes 272 more perinatal deaths than the gold standard (an approximate 4.0% over-reporting of deaths). Our denominator dataset includes 54 more live births than the PMMRC report data (a 0.05% over-reporting of live births). Thus overall, our datasets represent a close approximation to the gold standard. We found that 99.2% of all births (99.5% of live births and 77.4% of perinatal deaths) in our final dataset had a record from both MAT delivery and BDM births, providing complete data for the majority of cases. Cases that could not be joined to both sources were mostly missing BDM data (example in Table 2).
Smoking status at registration was unknown for 44.0% of women with a DHB-registered pregnancy, while this was only 0.04–1.5% for women under the care of an LMC provider. Missing data were even more common for maternal BMI, at 58.0% and 0.1–2.2% among these groups respectively. The variable “booking trimester” was missing for 48.5% of women with a DHB-registered pregnancy, compared to 0.01–0.2% of women booked with another LMC type. In our dataset, 37.1% of DHB cases were noted as LMC type “unknown”. The following variables were validated.
While maternal age is provided by MAT, it was calculated from maternal birth year, month and delivery date in BDM. In our cohort, there was a 95.6% overlap where this variable was available from both datasets. Of the 4.4% of non-matches, 95.4% differed by only one year. Consequently, the MAT dataset may be used for maternal age (accounting for 99.9% of all cases).
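The BDM-based age calculation can be sketched as follows. This Python fragment is illustrative and rests on one labelled assumption: because only birth year and month are used, a delivery in the mother's birth month is treated as falling after her birthday. That ambiguity is one plausible reason for one-year differences between sources.

```python
from datetime import date

def maternal_age(birth_year, birth_month, delivery_date):
    """Age in completed years at delivery from birth year and month only."""
    age = delivery_date.year - birth_year
    # Assumption: if delivery is in the birth month, the birthday has already passed.
    if delivery_date.month < birth_month:
        age -= 1
    return age

delivery = date(2012, 3, 14)
ages = [
    maternal_age(1985, 9, delivery),  # birthday later in the year
    maternal_age(1985, 3, delivery),  # same month: day unknown, birthday assumed passed
    maternal_age(1985, 2, delivery),  # birthday already passed
]
print(ages)
```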
The overlap in fetal sex was 99.96% where both MAT and BDM variables were available. Hence, either variable can be used in analysis (accounting for 98.2% of all cases).
The MAT dataset notes gestational age in weeks, while BDM also registers days. There was an 89.1% overlap in gestational age week where both variables were available. Of the remaining 10.9%, it appeared that 0.7% only differed by one day (e.g., 37 and 36+6), while 7.0% differed by one day to one week (e.g., 39 and 40+0). Among cases in which a larger difference existed (3.2% of total), birthweight was more likely to correlate with MAT gestational age and therefore MAT may be prioritised for use. However, BDM should be used in analyses including customised birthweight centiles,[[28]] where the absence of gestation in days leads to systematic over-estimation of birthweight centiles. Using both tables, 98.0% of cases are accounted for.
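The comparison categories above can be made concrete with a short Python sketch. The classification rule is an assumption inferred from the examples in the text (37 versus 36+6 counted as one day, 39 versus 40+0 as within one week): MAT weeks are treated as completed weeks with zero days whenever the week values differ. The function and category names are hypothetical.

```python
def classify_gestation(mat_weeks, bdm_weeks, bdm_days):
    """Compare MAT gestation (weeks) with BDM gestation (weeks + days)."""
    if mat_weeks == bdm_weeks:
        return "same week"
    # Assumed rule: treat the MAT value as weeks + 0 days when weeks differ.
    diff_days = abs(mat_weeks * 7 - (bdm_weeks * 7 + bdm_days))
    if diff_days == 1:
        return "one day"
    if diff_days <= 7:
        return "up to one week"
    return "more than one week"

examples = [
    classify_gestation(38, 38, 4),  # same completed week
    classify_gestation(37, 36, 6),  # 37 vs 36+6
    classify_gestation(39, 40, 0),  # 39 vs 40+0
    classify_gestation(34, 38, 2),  # large discrepancy
]
print(examples)
```

Only the last category would call for the birthweight plausibility check described in the text.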
The overlap in birthweight was 96.6% where both MAT and BDM variables were available. Of the remaining 3.4%, 1.7% only differed by 100 grams and 0.6% differed by more than 500 grams. Either variable can be used in analysis, accounting for 94.0% of all cases.
Close to 99.0% of deliveries could be linked to a NZDep score.
Level four ethnicity from the 2013 Census had an 89.4% overlap with BDM ethnicity, in cases where only one ethnic group was recorded in both sources (N=392,004). The corresponding overlap for level three ethnicity was 90.6% and level two ethnicity 95.7%. Therefore, if BDM is missing, Census provides a good alternative. If both are missing, MAT ethnicity can be used as a surrogate. This method may also mitigate some data quality differences between ethnicities, as the availability of BDM ethnicity data for perinatal deaths differs per group (Table 2).
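The rise in agreement from level four to level two follows from the hierarchical coding, which the following Python sketch illustrates. It assumes that coarser levels are prefixes of the five-digit level four code, as in the Fijian Indian example (“43112”, “431”, “43”, “4”). Apart from “43112”, the code pairs are hypothetical strings chosen only to show the mechanism.

```python
# Number of leading digits kept at each ethnicity level (assumed prefix hierarchy).
DIGITS = {1: 1, 2: 2, 3: 3, 4: 5}

def aggregate(code, level):
    """Aggregate a level four ethnicity code to the requested level."""
    return code[:DIGITS[level]]

def agreement(pairs, level):
    """Share of (BDM, Census) single-ethnicity pairs that agree at a given level."""
    same = sum(1 for bdm, census in pairs if aggregate(bdm, level) == aggregate(census, level))
    return same / len(pairs)

# Hypothetical (BDM, Census) level four codes for four mothers.
pairs = [
    ("43112", "43112"),  # agrees at every level
    ("43112", "43111"),  # agrees from level three upward
    ("43112", "43211"),  # agrees from level two upward
    ("43112", "41111"),  # agrees at level one only
]

rates = {level: agreement(pairs, level) for level in (4, 3, 2, 1)}
print(rates)
```

Because aggregation can only merge categories, agreement at a coarser level is never lower than at a finer one, consistent with the 89.4%, 90.6% and 95.7% reported above.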
Between 2008 and 2017, 88.3% of mothers had a known COB from the 2018 Census. Where Census 2018 data was missing, Census 2013 data was used, with a 99.2% agreement between the two surveys among women where both were available. If both were missing, then immigration data was used, with an 89.7% and 88.8% agreement with Census 2018 and 2013 respectively. Immigration metadata suggests using nationality over COB; however, in our dataset this resulted in less agreement with Census (82.1% and 80.7%). Finally, in this report nationality was used as a surrogate for COB if all other COB data was missing. This is justified by an 85.4% agreement between COB and nationality in the immigration dataset. By combining all four variables, COB was available for 98.9% of all mothers.
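The source hierarchy described above amounts to taking the first non-missing value in a fixed priority order. A minimal Python sketch follows; the key names and sample records are hypothetical, and the authors' own implementation is in SQL.

```python
# Priority order as described in the text: Census 2018, Census 2013,
# immigration COB, then immigration nationality as a surrogate.
SOURCE_PRIORITY = ("census_2018", "census_2013", "immigration_cob", "immigration_nationality")

def resolve_cob(record):
    """Return (country, source) from the first source with a non-missing value."""
    for source in SOURCE_PRIORITY:
        value = record.get(source)
        if value:
            return value, source
    return None, None

def coverage(records):
    """Share of mothers with a COB available from any source."""
    return sum(1 for r in records if resolve_cob(r)[0] is not None) / len(records)

# Hypothetical mothers with different patterns of missing data.
mothers = [
    {"census_2018": "New Zealand", "census_2013": "New Zealand"},
    {"census_2018": None, "census_2013": "India"},
    {"immigration_cob": "Fiji", "immigration_nationality": "Fiji"},
    {"immigration_nationality": "China"},
    {},
]

resolved = [resolve_cob(m) for m in mothers]
print(resolved, coverage(mothers))
```

Keeping the source alongside the value makes it possible to run sensitivity analyses that exclude the surrogate nationality-based records.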
View Tables 1–2.
This methodological paper describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI. A strength of this proposed approach is the ability to create a comprehensive dataset including perinatal deaths and live births from a variety of national sources, using our combined knowledge, and defining PMMRC data as the gold standard; thereby utilising the best quality data from each dataset available. All steps in creating this dataset have been justified and validated. Complete understanding of the data sources, including the quality of the variables used and general inconsistencies in metadata, will also improve the accuracy of research output. Since these data sources are available to all researchers who are granted permission to use the Statistics NZ IDI, this will increase accessibility.
In developing this methodology, some limitations to the IDI were discovered, such as restricted use of the MORT dataset. Even though MORT is considered the best source for stillbirths, this methodology uses the MAT and BDM datasets to create the numerator table. This is justified because, while MORT identified 6,270 perinatal deaths between 2008 and 2017, only 1,955 (31.2%) of these could be matched to MAT or BDM tables. Thus, clinical data including important variables would be unavailable for almost 70% of all deaths. We suspect this low matching rate is due to a linking error within the IDI, which should be addressed by Statistics NZ. In addition, only 59 cases had a different birth status according to MORT, validating this approach.
We also offer recommendations to improve the quality of perinatal data, to further enhance these resources. Firstly, the transfer of BMI and smoking data from DHB primary care facilities to the MAT datasets should be facilitated to eliminate systematic bias in analyses that control for these variables, as the highest degree of missing data is among high-risk mothers under DHB care, who are also more likely to suffer perinatal mortalities. Consideration should be given to excluding DHB-registered pregnancies from analyses that require adjustment for these variables. For many years the PMMRC has recommended that the MoH “urgently require DHBs to provide complete and accurate registration data to the MAT dataset”, without success.[[29]] Additionally, the variable “booking trimester” was missing for almost half of women with a DHB-registered pregnancy, despite late booking being associated with poorer perinatal outcomes and socio-economic deprivation.[[30,31]]
Important obstetric risk factors, such as maternal pre-existing chronic conditions, should be collected in the MAT dataset. Others, such as GDM or hypertensive disorders of pregnancy, may need to be validated as the quality of these variables is not clear. For instance, Lawrence et al. investigated the prevalence of GDM according to some commonly used data sources in New Zealand and found an underreporting in NMDS (3.8%, compared to 5.9% reported by DHBs or 6.9% reported by laboratories).[[32]] There was 70% agreement on the presence of GDM between the data sources. We also suggest that validation studies of routine maternity datasets are performed. This will assist researchers in the interpretation of results of a widely used data source. Furthermore, pregnancy research requires both mother and infant data in most analyses. Mothers may appear in a table more than once with consecutive pregnancies or a pregnancy may result in multiple infants, which complicates the building of a perinatal dataset. Including stillbirths in the MAT infant dataset will provide more detailed information about the birth and simplify the process of creating a dataset. Importantly, however, although this methodology report offers quality improvement for maternity research, making the PMMRC national dataset available within the IDI should be considered. Use of this dataset would eliminate many data quality issues described in this paper associated with perinatal mortality studies.
Even though the IDI provides a promising avenue for perinatal studies, there are barriers to accessing the data. Each new project requires a comprehensive application process. New research projects are assessed seven times a year, with a turnaround time of approximately six weeks. Successful applications are required to pay a one-off fee ($500). Once approved, specific users of the IDI will need to be authorised by Statistics NZ, undergo confidentiality training, and any changes to the project are subject to evaluation. Researchers are granted access to merge and interrogate data sources only available within the strict privacy rules of the Datalab; available in a few cities across New Zealand. Finally, researchers are recommended to have intermediate SQL coding skills.
In conclusion, this methodological report aims to improve the quality of routine maternity studies in New Zealand by offering an alternative approach to conventional data sources, while simultaneously increasing knowledge and accessibility.
View Appendices 1 & 2.
View Appendices 3 & 4: To see copies of the original SQL files, please contact corresponding author Esti C de Graaff at e.degraaff@auckland.ac.nz
The highest quality perinatal data in New Zealand is collected and collated by the Perinatal and Maternal Mortality Review Committee (PMMRC) and is made available to a limited number of researchers. Therefore, maternity and perinatal mortality studies are generally performed on Government-held data. This report offers an alternative approach with in-depth justification for the methodology, while simultaneously improving the understanding of the data sources.
A standardised method for creating a comprehensive maternity dataset within the Statistics New Zealand Integrated Data Infrastructure (IDI) was developed and a validation dataset was created to include all births between 2008 and 2017.
A close approximation to the PMMRC annual report data was found, with 4.0% over-reporting of perinatal deaths and 0.05% over-reporting of live births in the IDI dataset. Several variables, including important pregnancy risk factors, were validated for use. Limitations to the datasets were explored and additional tables in the IDI were proposed, to include variables on pregnancy complications, ethnicity and country of birth, and socio-economic data.
This methodological report describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI, including a variety of national data sources. Recommendations for further enhancement of these resources have been offered.
1) Hoppe DJ, Schemitsch EH, Morshed S, et al. Hierarchy of Evidence: Where Observational Studies Fit in and Why We Need Them. J Bone Joint Surg Am. 2009;91(Supplement_3).
2) Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of Study Designs in Four Major US Journals of Obstetrics and Gynecology. Gynecol Obstet Invest. 2001;51(1):8-11.
3) Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248-52.
4) Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500-24.
5) Milne B, Atkinson J, Blakely T, et al. Data Resource Profile: The New Zealand Integrated Data Infrastructure (IDI). Int J Epidemiol. 2019;48.
6) Atkinson J, Blakely T. New Zealand’s Integrated Data Infrastructure (IDI): Value to date and future opportunities. Int J Popul Data Sci. 2017;1.
7) National Health Board Business Unit. National Maternity Collection Data Mart Data Dictionary. 2011.
8) Statistics New Zealand. IDI DIA Life Event data. 2021.
9) Births, Deaths, Marriages, and Relationships Registration (Prescribed Information) Regulations 1995 (SR 1995/183). 2021.
10) National Health Board. National Minimum Dataset (Hospital Events) Data Dictionary. 2014.
11) National Health Board. Mortality Collection Data Dictionary v1.8. 2021.
12) Statistics New Zealand. IDI MOH Chronic Condition/Significant Health Event Cohort data. 2021.
13) Statistics New Zealand. The Statistics Act 1975, version as at 28 October 2021.
14) Statistics New Zealand. IDI Data Dictionary: Immigration data (July 2015 edition). 2021.
15) Statistics New Zealand. Statistical standard for meshblock. Available from www.stats.govt.nz; 2016.
16) Perinatal and Maternal Mortality Review Committee. Thirteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2017 | Te tuku pūrongo mō te mate me te whakamate 2017. Wellington: Health Quality & Safety Commission; 2019.
17) Perinatal and Maternal Mortality Review Committee. Methodology and definitions for Perinatal and Maternal Mortality Review Committee (PMMRC) reporting. 2018.
18) Ministry of Health. Primary Maternity Services Notice Pursuant to Section 88 of the New Zealand Public Health And Disability Act 2000. 2007.
19) Ministry of Health. HISO 10001:2017 Ethnicity Data Protocols. 2017.
20) Reid G, Bycroft C, Gleisner F. Comparison of ethnicity information in administrative data and the census. Available from www.stats.govt.nz; 2016.
21) Stronks K, Kulu-Glasgow I, Agyemang C. The utility of 'country of birth' for the classification of ethnic groups in health research: the Dutch experience. Ethn Health. 2009;14(3):255-69.
22) Premkumar A, Debbink MP, Silver RM, et al. Association of Acculturation With Adverse Pregnancy Outcomes. Obstet Gynecol. 2020;135(2):301-9.
23) Bakken KS, Skjeldal OH, Stray-Pedersen B. Obstetric Outcomes of First- and Second-Generation Pakistani Immigrants: A Comparison Study at a Low-Risk Maternity Ward in Norway. J Immigr Minor Health. 2017;19(1):33-40.
24) Horner J, Ameratunga SN. Monitoring immigrant health and wellbeing in New Zealand: addressing the tyranny of misleading averages. Aust Health Rev. 2012;36(4):390-3.
25) Atkinson J, Salmond C, Crampton P. NZDEP2013 Index of Deprivation. Department of Public Health, University of Otago, Wellington; 2014.
26) Crampton P, Salmond C, Atkinson J. A comparison of the NZDep and New Zealand IMD indexes of socioeconomic deprivation. Kōtuitui. 2020;15(1):154-69.
27) Exeter DJ, Zhao J, Crengle S, et al. The New Zealand Indices of Multiple Deprivation (IMD): A new suite of indicators for social and health research in Aotearoa, New Zealand. PLOS ONE. 2017;12(8):e0181260.
28) Anderson N, Sadler L, Stewart A, McCowan L. Maternal and pathological pregnancy characteristics in customised birthweight centiles and identification of at-risk small-for-gestational-age infants: a retrospective cohort study. BJOG. 2012;119(7):848-56.
29) Perinatal and Maternal Mortality Review Committee. Fourteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2018 | Te tuku pūrongo mō te mate me te whakamate 2018. Wellington: Health Quality & Safety Commission; 2021.
30) Flenady V, Koopmans L, Middleton P, et al. Major risk factors for stillbirth in high-income countries: a systematic review and meta-analysis. Lancet. 2011;377(9774):1331-40.
31) Bartholomew K, Morton SMB, Atatoa Carr PE, et al. Early engagement with a Lead Maternity Carer: Results from Growing Up in New Zealand. Aust N Z J Obstet Gynaecol. 2015;55(3):227-32.
32) Lawrence RL, Wall CR, Bloomfield FH. Prevalence of gestational diabetes according to commonly used data sources: an observational study. BMC Pregnancy Childbirth. 2019;19(1):34.
Pregnancy complications such as gestational diabetes (GDM) or hypertensive disorders of pregnancy are important outcomes in routine maternity research. The MAT delivery table in the IDI identifies births that were complicated by pre-eclampsia or eclampsia, identified by NMDS. MAT, however, does not identify pregnancies of mothers with pre-existing hypertensive disorders or gestational hypertension. Secondly, the indicator for GDM is not offered in the IDI as this field is incorrectly labelled. It indicates both pre-existing diabetes and GDM, and according to the MoH no validation process is undertaken for this field. For that reason, we propose to add NMDS and CC datasets to the core dataset. Unfortunately, data on primary care diagnoses are not available in the IDI. Hospital admissions can be joined to the correct pregnancy by maternal ID and admission dates. Following a similar method, mothers with pre-existing diabetes (as an important risk factor) can be identified by joining the CC dataset.
Ethnicity is an important demographic risk factor widely used in pregnancy research in New Zealand, and key to examining health equity. Characteristics of ethnicity recorded in New Zealand include that it is self-defined, it can change over time and an individual may identify with more than one ethnic group. The use of ethnicity data in health research is addressed by the Ethnicity Data Protocols for the Health and Disability Sector by the MoH.[[19]] According to this protocol, ethnicity data can be categorized at four different “levels” following a hierarchical structure; with level four being the most detailed level of reporting (containing 231 ethnicity codes). These codes are then aggregated into ethnicity levels three to one. As an illustration, code “43112” stands for Fijian Indian and aggregates into “431” Indian (level three), “43” Indian (level two) and “4” Asian (level one). Generally, level two ethnicity data are used in health research for reporting, which includes 22 groups. In this aggregation a high level of detail is maintained for some ethnicities (such as Māori, Pacific Peoples, Chinese or Indian), while other minority groups are merged despite large heterogeneity (such as other Asian ethnicities, African or Latin American).
Unfortunately, the MAT dataset only holds level two ethnicity data. Due to the data collection methods of MAT, ethnicity data may also have been completed by a healthcare provider, leading to potential inaccuracy or lack of detail. Moreover, someone’s ethnicity can reflect a contextual response. This might occur, when a mother believes she will receive better care when reporting a different ethnicity. In contrast, BDM birth registration provides high-quality level four ethnicity data, including information reported by the parents directly outside of the healthcare setting. The requirement for parents to complete birth registration separately from the LMC leads to ethnicity data akin to ethnicity reported in the national Census and is generally considered second choice to Census ethnicity data.[[20]] However, since ethnicity can change and BDM is recorded closer to the birth event than Census, we argue that BDM should be used as the main source in maternity studies. Note that in cases where ethnicity data is missing, the source-ranked ethnicity table (“central” table in the IDI) may be consulted, although only level one ethnicity is provided.
Some research questions will require both ethnicity (i.e., a subjective belief, related to cultural behaviours and practices) and country of birth (COB; i.e., an objective measure, more closely linked to ethnic origin), considering increasing migration and ethnic diversity globally.[[21]] Combining these variables in analyses might provide a better understanding of pregnancy risk factors, since common issues associated with migration in first generation migrants (such as socio-economic deprivation or diverse health literacy) may result in differential health outcomes compared to second and third generation women.[[22–24]] COB data are not available from the standard birth tables and should therefore be obtained from alternative datasets. Census or immigration data present as the highest quality sources for this variable. Since COB is fixed, the datasets can simply be linked by maternal ID, regardless of the correct pregnancy event. An alternative method for consideration, when solely interested whether a mother was New Zealand born or not, is to join parent ID on the infants BDM birth record, with the BDM births table. If the mother’s birth was registered, she was New Zealand born. In contrast, if the mother’s birth cannot be identified in BDM births, she was most likely born overseas.
The MAT or BDM datasets do not contain any information on individual level socio-economic status. The current classification system used to monitor deprivation, which is widely used in health and social research, is the New Zealand Socioeconomic Deprivation Indices (NZDep).[[25]] NZDep is a decile score based on area of domicile, divided into meshblocks or larger Census Area Units where a higher level of confidentiality is required. The NZDep is based on census data; with decile one representing 10% of the population who live in the least deprived areas and decile ten representing 10% of the population who live in the most deprived areas in New Zealand. The NZDep2013 is the fifth updated version since 1991 and combines nine variables from the 2013 Census. A limitation of this method is that the NZDep represents area-level deprivation and does not necessarily reflect the socio-economic status of an individual, although it is a close approximation and may be used as a proxy in large datasets.[[26]] Alternative proxy variables for consideration, include the New Zealand Indices of Multiple Deprivation,[[27]] region of domicile (sourced from the address notifications table), DHB of domicile (sourced from MAT), or personal income level (sourced from Census), although the researchers of this paper have no experience using these alternative sources.
To provide an estimate of socio-economic status in our birth cohort, the registered address closest to the date of birth, and prior to delivery, was chosen for each delivery event, to allow for geographic movement over time and to best capture mothers’ socio-economic status during pregnancy. Note that where a full date of birth is missing (primarily among perinatal deaths), birth year and month sourced from the BDM births or MAT table may be used. The meshblock associated with this address was then extracted and linked to the corresponding NZDep2006 and 2013 decile scores (births after 2008 and before 2013 linked to NZDep2006 and from 2013 linked to NZDep2013).
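The address selection and NZDep linkage described above can be sketched as follows. This is an illustrative Python sketch, not the SQL used in the IDI: the record layout, meshblock codes and decile values are hypothetical, and treating an address notified on the date of birth itself as "prior to delivery" is an assumption.

```python
from datetime import date


def address_before_birth(addresses, birth_date):
    """Return the most recent address notified on or before the birth date."""
    prior = [a for a in addresses if a["notified"] <= birth_date]
    return max(prior, key=lambda a: a["notified"]) if prior else None


def nzdep_version(birth_date):
    """Births before 2013 link to NZDep2006; births from 2013 link to NZDep2013."""
    return "NZDep2006" if birth_date.year < 2013 else "NZDep2013"


def nzdep_decile(addresses, birth_date, decile_lookup):
    """Decile for the meshblock of the address closest to, and before, birth."""
    address = address_before_birth(addresses, birth_date)
    if address is None:
        return None  # no address registered before delivery
    return decile_lookup[nzdep_version(birth_date)].get(address["meshblock"])


# Hypothetical toy data for one mother who moved twice.
addresses = [
    {"meshblock": "MB001", "notified": date(2010, 1, 15)},
    {"meshblock": "MB002", "notified": date(2012, 6, 1)},
    {"meshblock": "MB003", "notified": date(2014, 3, 10)},
]
decile_lookup = {
    "NZDep2006": {"MB001": 3, "MB002": 7},
    "NZDep2013": {"MB002": 8, "MB003": 2},
}

decile_2012_birth = nzdep_decile(addresses, date(2012, 9, 1), decile_lookup)
decile_2013_birth = nzdep_decile(addresses, date(2013, 5, 1), decile_lookup)
```

In this toy example both births resolve to the same meshblock (`MB002`), but the 2012 birth is scored against NZDep2006 and the 2013 birth against NZDep2013, which is why the same address can yield different deciles.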
A total of 6,790 perinatal deaths (4,768 stillbirths and 2,022 NNDs) and 617,375 live births were identified in our dataset. In comparison, PMMRC annual report data comprised 6,518 perinatal deaths (4,779 stillbirths, including 1,456 terminations, and 1,739 NNDs) and 617,321 live births.[[16]] As a result, our numerator dataset includes 272 more perinatal deaths than the gold standard (an approximate 4.0% overreporting in deaths). Our denominator dataset includes 54 more live births than the PMMRC report data (a 0.05% overreporting in live births). Thus overall, our datasets represent a close approximation to the gold standard. We found that 99.2% of all births (99.5% of live births and 77.4% of perinatal deaths) in our final dataset had a record from both MAT delivery and BDM births, providing complete data for the majority of cases. Cases that could not be joined to both sources were mostly missing BDM data (example in Table 2).
Smoking status at registration was unknown for 44.0% of women with a DHB-registered pregnancy, while this was only 0.04–1.5% for women under the care of an LMC provider. Missing data were even more common for maternal BMI, at 58.0% and 0.1–2.2% among these groups respectively. The variable “booking trimester” was missing for 48.5% of women with a DHB-registered pregnancy, compared to 0.01–0.2% of women booked with another LMC type. Among DHB cases, 37.1% were noted as LMC type “unknown” in our dataset. The following variables were validated.
While maternal age is provided by MAT, it was calculated from maternal birth year, month and delivery date in BDM. In our cohort, there was a 95.6% overlap where this variable was available from both datasets. Of the 4.4% non-matches, 95.4% differed by only one year. Consequently, the MAT dataset may be used for maternal age (accounting for 99.9% of all cases).
The overlap in fetal sex was 99.96% where both MAT and BDM variables were available. Hence, either variable can be used in analysis (accounting for 98.2% of all cases).
The MAT dataset notes gestational age in weeks, while BDM also registers days. There was an 89.1% overlap in gestational age week where both variables were available. Of the remaining 10.9%, it appeared that 0.7% only differed by one day (e.g., 37 and 36+6), while 7.0% differed by one day to one week (e.g., 39 and 40+0). Among cases in which a larger difference existed (3.2% of total), birthweight was more likely to correlate with MAT gestational age and therefore MAT may be prioritised for use. However, BDM should be used in analyses including customised birthweight centiles,[[28]] where the absence of gestation in days leads to systematic over-estimation of birthweight centiles. Using both tables, 98.0% of cases are accounted for.
The overlap in birthweight was 96.6% where both MAT and BDM variables were available. Of the remaining 3.4%, 1.7% only differed by 100 grams and 0.6% differed by more than 500 grams. Either variable can be used in analysis, accounting for 94.0% of all cases.
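The overlap figures reported for maternal age, fetal sex, gestational age and birthweight follow the same pattern: agreement is measured only among cases where both MAT and BDM hold a value. A minimal Python sketch of that calculation is given below, using hypothetical birthweights in grams. Reading "accounting for X% of all cases" as the share of cases with a value in at least one source is our interpretation, and the `tolerance` parameter is an illustrative addition.

```python
def overlap_and_coverage(pairs, tolerance=0):
    """Compare one variable recorded in two sources.

    pairs: list of (mat_value, bdm_value) tuples, with None for missing.
    Returns (overlap, coverage):
      overlap  - share of cases with both values that agree within tolerance
      coverage - share of all cases with a value in at least one source
    """
    both = [(m, b) for m, b in pairs if m is not None and b is not None]
    n_either = sum(1 for m, b in pairs if m is not None or b is not None)
    n_agree = sum(1 for m, b in both if abs(m - b) <= tolerance)
    overlap = n_agree / len(both) if both else None
    coverage = n_either / len(pairs) if pairs else None
    return overlap, coverage


# Hypothetical toy data: (MAT birthweight, BDM birthweight) in grams.
birthweights = [
    (3400, 3400),   # exact match
    (3100, 3200),   # differs by 100 g
    (2900, None),   # MAT only
    (None, 3600),   # BDM only
    (None, None),   # missing from both sources
]

exact_overlap, coverage = overlap_and_coverage(birthweights)
near_overlap, _ = overlap_and_coverage(birthweights, tolerance=100)
```

Widening the tolerance shows how much of the disagreement is minor, in the same way the text separates differences of 100 grams from differences of more than 500 grams.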
Close to 99.0% of deliveries could be linked to a NZDep score.
Level four ethnicity from the 2013 Census had an 89.4% overlap with BDM ethnicity, in cases where only one ethnic group was recorded in both sources (N=392,004). The corresponding overlap for level three ethnicity was 90.6% and level two ethnicity 95.7%. Therefore, if BDM is missing, Census provides a good alternative. If both are missing, MAT ethnicity can be used as a surrogate. This method may also mitigate some data quality differences between ethnicities, as the availability of BDM ethnicity data for perinatal deaths differs per group (Table 2).
Between 2008 and 2017, 88.3% of mothers had a known COB from the 2018 Census. Where Census 2018 data was missing, Census 2013 data was used, with a 99.2% agreement between the two surveys among women where both were available. If both were missing, then immigration data was used, with an 89.7% and 88.8% agreement with Census 2018 and 2013 respectively. Immigration metadata suggests using nationality over COB; however, in our dataset this resulted in less agreement with Census (82.1% and 80.7%). Finally, in this report nationality was used as a surrogate for COB if all other COB data was missing. This is justified by an 85.4% agreement between COB and nationality in the immigration dataset. By combining all four variables, COB was available for 98.9% of all mothers.
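The order of preference for COB sources described above (Census 2018, then Census 2013, then immigration COB, then immigration nationality) is a first-available-value rule. The sketch below illustrates it in Python; the field names and records are hypothetical and are not IDI column names.

```python
# Sources in order of preference, as described in the text.
# The key names are illustrative assumptions.
COB_SOURCES = (
    "census2018_cob",
    "census2013_cob",
    "immigration_cob",
    "immigration_nationality",  # surrogate, used only as a last resort
)


def resolve_cob(record):
    """Return (country, source) from the first source holding a value."""
    for source in COB_SOURCES:
        value = record.get(source)
        if value is not None:
            return value, source
    return None, None  # COB unavailable from all four variables


# Hypothetical toy records for four mothers.
mothers = [
    {"census2018_cob": "New Zealand", "census2013_cob": "New Zealand"},
    {"census2013_cob": "India", "immigration_cob": "India"},
    {"immigration_nationality": "Samoa"},
    {},
]

resolved = [resolve_cob(m) for m in mothers]
```

Returning the source alongside the value makes it possible to report how many mothers were classified from each variable, or to run sensitivity analyses that exclude the nationality surrogate.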
View Tables 1–2.
This methodological paper describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI. A strength of this proposed approach is the ability to create a comprehensive dataset including perinatal deaths and live births from a variety of national sources, using our combined knowledge, and defining PMMRC data as the gold standard; thereby utilising the best quality data from each dataset available. All steps in creating this dataset have been justified and validated. Complete understanding of the data sources, including the quality of the variables used and general inconsistencies in metadata, will also improve the accuracy of research output. Since these data sources are available to all researchers who are granted permission to use the Statistics NZ IDI, this will increase accessibility.
In developing this methodology, some limitations to the IDI were discovered, such as restricted use of the MORT dataset. Even though MORT is considered the best source for stillbirths, this methodology uses the MAT and BDM datasets to create the numerator table. This is justified because, while MORT identified 6,270 perinatal deaths between 2008 and 2017, only 1,955 (31.2%) of these could be matched to MAT or BDM tables. Thus, clinical data including important variables would be unavailable for almost 70% of all mortalities. We suspect this low matching rate is due to a linking error within the IDI, which should be addressed by Statistics NZ. In addition, only 59 cases had a different birth status according to MORT, validating this approach.
We also offer recommendations to improve the quality of perinatal data, to further enhance these resources. Firstly, the transfer of BMI and smoking data from DHB primary care facilities to the MAT datasets should be facilitated to eliminate systematic bias in analyses that control for these variables, as the highest degree of missing data is among high-risk mothers under DHB care, who are also more likely to experience perinatal deaths. Consideration should be given to excluding DHB-registered pregnancies from analyses that require adjustment for these variables. For many years the PMMRC has recommended that the MoH “urgently require DHBs to provide complete and accurate registration data to the MAT dataset”, without success.[[29]] Additionally, the variable “booking trimester” was missing for almost half of women with a DHB-registered pregnancy, despite late booking being associated with poorer perinatal outcomes and socio-economic deprivation.[[30,31]]
Important obstetric risk factors, such as maternal pre-existing chronic conditions, should be collected in the MAT dataset. Others, such as GDM or hypertensive disorders of pregnancy, may need to be validated as the quality of these variables is not clear. For instance, Lawrence et al. investigated the prevalence of GDM according to some commonly used data sources in New Zealand and found an underreporting in NMDS (3.8%, compared to 5.9% reported by DHBs or 6.9% reported by laboratories).[[32]] There was 70% agreement on the presence of GDM between the data sources. We also suggest that validation studies of routine maternity datasets are performed. This will assist researchers in interpreting results from a widely used data source. Furthermore, pregnancy research requires both mother and infant data in most analyses. Mothers may appear in a table more than once with consecutive pregnancies, or a pregnancy may result in multiple infants, which complicates the building of a perinatal dataset. Including stillbirths in the MAT infant dataset will provide more detailed information about the birth and simplify the process of creating a dataset. Importantly, although this methodology report offers quality improvement for maternity research, making the PMMRC national dataset available within the IDI should be considered. Use of this dataset would eliminate many of the data quality issues described in this paper that are associated with perinatal mortality studies.
Even though the IDI provides a promising avenue for perinatal studies, there are barriers to accessing the data. Each new project requires a comprehensive application process. New research projects are assessed seven times a year, with a turnaround time of approximately six weeks. Successful applicants are required to pay a one-off fee ($500). Once a project is approved, specific users of the IDI will need to be authorised by Statistics NZ and undergo confidentiality training, and any changes to the project are subject to evaluation. Researchers are granted access to merge and interrogate data sources only within the strict privacy rules of the Datalab, which is available in a few cities across New Zealand. Finally, researchers are recommended to have intermediate SQL coding skills.
In conclusion, this methodological report aims to improve the quality of routine maternity studies in New Zealand by offering an alternative approach to conventional data sources, while simultaneously increasing knowledge and accessibility.
View Appendices 1 & 2.
View Appendices 3 & 4: To see copies of the original SQL files, please contact corresponding author Esti C de Graaff at e.degraaff@auckland.ac.nz
The highest quality perinatal data in New Zealand is collected and collated by the Perinatal and Maternal Mortality Review Committee (PMMRC) and is made available to a limited number of researchers. Therefore, maternity and perinatal mortality studies are generally performed on Government-held data. This report offers an alternative approach with in-depth justification for the methodology, while simultaneously improving the understanding of the data sources.
A standardised method for creating a comprehensive maternity dataset within the Statistics New Zealand Integrated Data Infrastructure (IDI) was developed and a validation dataset was created to include all births between 2008 and 2017.
A close approximation to the PMMRC annual report data was found, with 4.0% over-reporting of perinatal deaths and 0.05% over-reporting of live births in the IDI dataset. Several variables, including important pregnancy risk factors, were validated for use. Limitations to the datasets were explored and additional tables in the IDI were proposed, to include variables on pregnancy complications, ethnicity and country of birth, and socio-economic data.
This methodological report describes an opportunity for standardised, high-quality maternity research in New Zealand using the IDI, including a variety of national data sources. Recommendations for further enhancement of these resources have been offered.
1) Hoppe DJ, Schemitsch EH, Morshed S, et al. Hierarchy of Evidence: Where Observational Studies Fit in and Why We Need Them. J Bone Joint Surg Am. 2009;91(Supplement_3).
2) Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of Study Designs in Four Major US Journals of Obstetrics and Gynecology. Gynecol Obstet Invest. 2001;51(1):8-11.
3) Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248-52.
4) Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500-24.
5) Milne B, Atkinson J, Blakely T, et al. Data Resource Profile: The New Zealand Integrated Data Infrastructure (IDI). Int J Epidemiol. 2019;48.
6) Atkinson J, Blakely T. New Zealand’s Integrated Data Infrastructure (IDI): Value to date and future opportunities. Int J Popul Data Sci. 2017;1.
7) National Health Board Business Unit. National Maternity Collection Data Mart Data Dictionary. 2011.
8) Statistics New Zealand. IDI DIA Life Event data. 2021.
9) Births, Deaths, Marriages, and Relationships Registration (Prescribed Information) Regulations 1995 (SR 1995/183). 2021.
10) National Health Board. National Minimum Dataset (Hospital Events) Data Dictionary. 2014.
11) National Health Board. Mortality Collection Data Dictionary v1.8. 2021.
12) Statistics New Zealand. IDI MOH Chronic Condition/Significant Health Event Cohort data. 2021.
13) Statistics New Zealand. Statistics Act 1975, version as at 28 October 2021.
14) Statistics New Zealand. IDI Data Dictionary: Immigration data (July 2015 edition). 2021.
15) Statistics New Zealand. Statistical standard for meshblock. Available from www.stats.govt.nz; 2016.
16) Perinatal and Maternal Mortality Review Committee. Thirteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Toru o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2017 | Te tuku pūrongo mō te mate me te whakamate 2017. Wellington: Health Quality & Safety Commission; 2019.
17) Perinatal and Maternal Mortality Review Committee. Methodology and definitions for Perinatal and Maternal Mortality Review Committee (PMMRC) reporting. 2018.
18) Ministry of Health. Primary Maternity Services Notice Pursuant to Section 88 of the New Zealand Public Health and Disability Act 2000. 2007.
19) Ministry of Health. HISO 10001:2017 Ethnicity Data Protocols. 2017.
20) Reid G, Bycroft C, Gleisner F. Comparison of ethnicity information in administrative data and the census. Available from www.stats.govt.nz; 2016.
21) Stronks K, Kulu-Glasgow I, Agyemang C. The utility of 'country of birth' for the classification of ethnic groups in health research: the Dutch experience. Ethn Health. 2009;14(3):255-69.
22) Premkumar A, Debbink MP, Silver RM, et al. Association of Acculturation With Adverse Pregnancy Outcomes. Obstet Gynecol. 2020;135(2):301-9.
23) Bakken KS, Skjeldal OH, Stray-Pedersen B. Obstetric Outcomes of First- and Second-Generation Pakistani Immigrants: A Comparison Study at a Low-Risk Maternity Ward in Norway. J Immigr Minor Health. 2017;19(1):33-40.
24) Horner J, Ameratunga SN. Monitoring immigrant health and wellbeing in New Zealand: addressing the tyranny of misleading averages. Aust Health Rev. 2012;36(4):390-3.
25) Atkinson J, Salmond C, Crampton P. NZDEP2013 Index of Deprivation. Department of Public Health, University of Otago, Wellington; 2014.
26) Crampton P, Salmond C, Atkinson J. A comparison of the NZDep and New Zealand IMD indexes of socioeconomic deprivation. Kōtuitui. 2020;15(1):154-69.
27) Exeter DJ, Zhao J, Crengle S, et al. The New Zealand Indices of Multiple Deprivation (IMD): A new suite of indicators for social and health research in Aotearoa, New Zealand. PLOS ONE. 2017;12(8):e0181260.
28) Anderson N, Sadler L, Stewart A, McCowan L. Maternal and pathological pregnancy characteristics in customised birthweight centiles and identification of at-risk small-for-gestational-age infants: a retrospective cohort study. BJOG. 2012;119(7):848-56.
29) Perinatal and Maternal Mortality Review Committee. Fourteenth Annual Report of the Perinatal and Maternal Mortality Review Committee | Te Pūrongo ā-Tau Tekau mā Whā o te Komiti Arotake Mate Pēpi, Mate Whaea Hoki: Reporting mortality and morbidity 2018 | Te tuku pūrongo mō te mate me te whakamate 2018. Wellington: Health Quality & Safety Commission; 2021.
30) Flenady V, Koopmans L, Middleton P, et al. Major risk factors for stillbirth in high-income countries: a systematic review and meta-analysis. Lancet. 2011;377(9774):1331-40.
31) Bartholomew K, Morton SMB, Atatoa Carr PE, et al. Early engagement with a Lead Maternity Carer: Results from Growing Up in New Zealand. Aust N Z J Obstet Gynaecol. 2015;55(3):227-32.
32) Lawrence RL, Wall CR, Bloomfield FH. Prevalence of gestational diabetes according to commonly used data sources: an observational study. BMC Pregnancy Childbirth. 2019;19(1):34.