PREDICTION OF SEVERE RETINOPATHY OF PREMATURITY IN 24 TO 30 WEEKS GESTATION INFANTS USING BIRTH CHARACTERISTICS

 

R.E. Zackula,1 Talkad S. Raghuveer 2

  1. Department of Research, University of Kansas School of Medicine-Wichita, 1010 N. Kansas, Wichita, KS, 67214 USA.
  2. Department of Pediatrics, University of Kansas School of Medicine at Wichita, Division of Neonatology, Pediatrix Medical Group of KS, Wesley Medical Center, 550 N Hillside, Wichita, KS, 67214, USA

Corresponding Author:

Talkad S. Raghuveer, MD

Professor

Department of Pediatrics

University of Kansas School of Medicine at Wichita

Division of Neonatology

Pediatrix Medical Group of KS

Wesley Medical Center

550 N. Hillside

Wichita, KS  67214, USA

Email: raghuveer.talkad3@gmail.com

Abbreviations: ROP: Retinopathy of Prematurity; GA: Gestational Age; SDS: Standard Deviation Score; BWSDS: Birth Weight Standard Deviation Score

Manuscript Citation:

Pivodic A, Hård AL, Löfqvist C, Smith LEH, Wu C, Bründer MC, Lagrèze WA, Stahl A, Holmström G, Albertsson-Wikland K, Johansson H, Nilsson S, Hellström A. Individual risk prediction for sight-threatening retinopathy of prematurity using birth characteristics. JAMA Ophthalmol. 2020 Jan;138(1):21-29.

TYPE OF INVESTIGATION - Retrospective cohort study

STUDY QUESTION

Among preterm infants born at 24-30 weeks' gestation, can an individualized prediction model based on birth characteristics adequately estimate the risk of treatment for sight-threatening ROP?

METHODS

Design: Retrospective cohort study using data from the Swedish National Patient Registry of infants screened for Retinopathy of Prematurity (ROP) from January 1, 2007 to August 7, 2018. The purpose was to develop and validate an individualized predictive model to estimate the risk of treatment for sight-threatening ROP using birth characteristics. (ROP treatment was laser surgery in most cases, anti-vascular endothelial growth factor antibody in some cases, and a combination of both treatments in a few cases; Hellström A, personal communication.)

Outcome: The study outcome was ROP treatment, a dichotomous variable based on the International Classification of Retinopathy of Prematurity and the Early Treatment for Retinopathy of Prematurity (ETROP) criteria for treatment.

Patients: The target population was 9,135 extremely premature infants born 2007-2018 from the Swedish National Registry for Retinopathy of Prematurity (SWEDROP). Excluded were 1,388 infants born at greater than 31 weeks and 138 infants with missing data, leaving a total of 7,609 infants. From these, two internal groups were constructed: the model development group and the temporal validation group. These groups were further subdivided by gestational age, < 24 weeks and ≥ 24 weeks, the latter being the focus of the development model. Hence, the internal model development group comprised 6,947 infants born 2007-2017, while the internal temporal validation group contained 308 infants born 2017-2018. The same gestational-age cohorts were used for the two external geographical validation models, which included 1,485 infants born in the United States from 2005-2010 and 329 European infants born 2011-2017.

Intervention/Exposure: Prematurity (birth at 24 to 30 weeks' gestation)

Statistical Analysis: Poisson regression for time-varying data was used, with ROP treatment as the outcome and birth characteristics as predictors (a schematic sketch of this setup follows the Methods items below).

Follow-up: 20 postnatal weeks (graphs show up to 28 postnatal weeks)
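Below is a minimal sketch, in Python, of the kind of time-varying Poisson setup described under Statistical Analysis above. It is not the authors' code: the file name and column names (infant_weeks.csv, treated, postnatal_week, gestational_age, sex, bw_sds) are hypothetical, and the formula omits the piecewise terms and interactions of the published model.

```python
# Hypothetical sketch of a Poisson regression on person-week ("long") data;
# each row is one infant-week, with `treated` = 1 only in the week ROP
# treatment occurred. Not the authors' implementation.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("infant_weeks.csv")  # hypothetical long-format dataset

model = smf.glm(
    "treated ~ postnatal_week + gestational_age + sex + bw_sds",
    data=df,
    family=sm.families.Poisson(),
).fit()
print(model.summary())
```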

RESULTS

The overall incidence of ROP needing treatment in SWEDROP was 5.8% (442 of 7,609). The incidence of ROP treatment was 40.1% (142/354) for infants with GA < 24 weeks, 10.2% (287/2,806) for those with GA 24 weeks to less than 28 weeks, and 0.3% (13/4,449) for those at least 28 weeks GA. The development model for GA 24 to 30 weeks included: piecewise linear current postnatal age (break points, 8 and 12 weeks), piecewise linear continuous GA given in weeks and days (break point, 27 weeks), sex, piecewise linear birth weight standard deviation score (BWSDS; break point, -1 SDS), a postnatal age × piecewise linear GA interaction, a sex × GA interaction, and a postnatal age × piecewise linear BWSDS interaction.
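As an illustration of how such piecewise linear (linear spline) terms can be coded, the hypothetical Python helper below builds the hinge terms for postnatal age with break points at 8 and 12 weeks; the break points come from the paper, but the code itself is ours and not the authors' implementation.

```python
import numpy as np

def linear_spline(x: np.ndarray, knot: float) -> np.ndarray:
    """Hinge term max(x - knot, 0), used alongside the raw variable x."""
    return np.maximum(x - knot, 0.0)

postnatal_week = np.array([4.0, 8.0, 10.0, 12.0, 16.0])

# Design columns for a piecewise linear effect of postnatal age with
# break points at 8 and 12 weeks (the slope can change at each knot).
pna_basis = np.column_stack([
    postnatal_week,
    linear_spline(postnatal_week, 8.0),
    linear_spline(postnatal_week, 12.0),
])
print(pna_basis)
```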

All predictive models were assessed with discrimination and calibration. Discrimination is the ability of a model to differentiate between those who do and do not develop the outcome (here, ROP needing treatment) and is measured by the area under the receiver operating characteristic curve (AUC); calibration is the agreement between predictions from the model and observed outcomes, usually shown as departure from a reference line on a graph.
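For readers unfamiliar with these measures, the short Python sketch below computes an AUC and a binned calibration comparison on simulated data; the event rate and predicted risks are invented purely to illustrate the two concepts and do not reflect the study data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Simulated outcome (~6% event rate) and somewhat informative predicted risks.
y_true = rng.binomial(1, 0.06, size=5000)
y_prob = np.clip(
    0.06 + 0.25 * (y_true - 0.06) + rng.normal(0, 0.05, size=5000),
    0.001, 0.999,
)

print("AUC (discrimination):", roc_auc_score(y_true, y_prob))

# Calibration: observed event fraction vs. mean predicted risk per bin;
# a well-calibrated model has these pairs close to the diagonal.
obs, pred = calibration_curve(y_true, y_prob, n_bins=10)
print(np.column_stack([pred, obs]))
```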

Each model showed high predictive ability: the AUC was 0.90 (95% CI, 0.89-0.92) for internal model development and 0.94 (95% CI, 0.90-0.98) for internal temporal validation; for external geographical validation, the AUC was 0.87 (95% CI, 0.84-0.89) for the U.S. cohort and 0.90 (95% CI, 0.85-0.95) for the European cohort. Sensitivity of the final model was 99.0%. The risk of infants needing ROP treatment increased between postnatal weeks 8 and 12 and decreased thereafter. Calibration plots were reported in the supplement; there was only slight departure from the calibration line.

STUDY CONCLUSIONS

The authors conclude that the predictive model based on birth characteristics is generalizable and enables individualized, early prediction of the risk of needing ROP treatment for infants born at 24 to 30 weeks' gestation.

COMMENTARY

The primary purpose of the Pivodic et al. model was to predict ROP needing treatment in premature infants using only birth characteristics. The Prognosis Research Strategy (PROGRESS) 3 was the guide for the statistical plan and model development strategy [2] in this high-quality work. However, by completing PROBAST [3], a newly developed instrument for assessing statistical models, we found some deficiencies and uncovered potential bias in the predictive model.

Assessment Instrument

The PROBAST instrument (Prediction model study Risk of Bias Assessment Tool) was designed specifically to assess statistical model studies making individualized predictions [3]. The instrument includes an explanation and elaboration, along with a template for conducting an assessment. The explanation article defines risk of bias as “shortcomings in the study design, conduct or analysis, which leads to systematically distorted estimates of a model’s predictive performance or to an inadequate model to address the research question.” The template includes 20 signaling questions exploring four study domains: participants, predictors, outcome, and analysis. Scoring of each question and domain is low, high, or unclear in terms of bias. Within the analysis domain, model predictive performance is evaluated using calibration, discrimination or classification measures. PROBAST is available at http://development.probast.org/

Our assessment included the development and validation models; however, it was limited to models assessing risk of severe ROP needing treatment in infants born at 24 to 30 weeks gestation. Table 1 includes the results for the signaling questions, followed by the rationale for each domain rating.

Table 1. PROBAST Assessment. Each signaling question is rated for risk of bias in the developed model (Dev) and the validation model (Val).

Participant Domain
  1.1 Were appropriate data sources used, e.g. cohort, RCT or nested case-control study data? (Dev: Unclear; Val: Unclear)
  1.2 Were all inclusions and exclusions of participants appropriate? (Dev: High; Val: High)

Predictor Domain
  2.1 Were predictors defined and assessed in a similar way for all participants? (Dev: Unclear; Val: Unclear)
  2.2 Were predictor assessments made without knowledge of outcome data? (Dev: Low; Val: Unclear)
  2.3 Are all predictors available at the time the model is intended to be used? (Dev: Low; Val: Unclear)

Outcome Domain
  3.1 Was the outcome determined appropriately? (Dev: Low; Val: Unclear)
  3.2 Was a pre-specified or standard outcome definition used? (Dev: Low; Val: Unclear)
  3.3 Were predictors excluded from the outcome definition? (Dev: Unclear; Val: Unclear)
  3.4 Was the outcome defined and determined in a similar way for all participants? (Dev: Low; Val: Unclear)
  3.5 Was the outcome determined without knowledge of predictor information? (Dev: Unclear; Val: Unclear)
  3.6 Was the time interval between predictor assessment and outcome determination appropriate? (Dev: Unclear; Val: Unclear)

Analysis Domain
  4.1 Were there a reasonable number of participants with the outcome? (Dev: Low; Val: High)
  4.2 Were continuous and categorical predictors handled appropriately? (Dev: Low; Val: Unclear)
  4.3 Were all enrolled participants included in the analysis? (Dev: Low; Val: Unclear)
  4.4 Were participants with missing data handled appropriately? (Dev: High; Val: High)
  4.5 Was selection of predictors based on univariable analysis avoided? (Dev: High; Val: N/A)
  4.6 Were complexities in the data (e.g. censoring, competing risks, sampling of controls) accounted for appropriately? (Dev: Unclear; Val: Unclear)
  4.7 Were relevant model performance measures evaluated appropriately? (Dev: Low; Val: High)
  4.8 Were model overfitting and optimism in model performance accounted for? (Dev: Unclear; Val: N/A)
  4.9 Do predictors and their assigned weights in the final model correspond to the results from multivariable analysis? (Dev: Unclear; Val: N/A)

Dev: developed model; Val: validation model

Rationale for PROBAST ratings

Participant Domain

Subjects for the developed model were from the Swedish National Patient Registry. There is no information regarding race/ethnicity in the development model, nor is there an assessment of the 138 infants who were excluded from the target population because of missing data. For example, AUC by race/ethnicity, as reported for the US cohort, varied from 0.79 for Hispanic infants to 0.90 for Black infants, indicating that this variable contributes to the model's ability to discriminate. Also, any missing data should be evaluated for randomness and imputed where possible, or the excluded infants should be evaluated for differences from the analyzed cohort. Together, these issues may have introduced bias. Thus, there is "High concern" for the participant domain.

Predictor Domain

First, birth weight SDS (BWSDS) is a standardized score of birth weight relative to an expected reference weight, which is based on GA, sex, and birth weight for all healthy singletons born at GA of at least 24 weeks from 1990 to 1999 and registered in the Medical Birth Register of Sweden. This scoring system may not be widely used. Next, there was no evaluation of predictors for collinearity (predictors that are highly related to each other). Collinearity may be suspected in the Pivodic et al. model, which uses both gestational age and birth weight, along with their interaction, because these variables are highly associated. Such collinearity can inflate variance, unduly influencing the slope parameter estimates and rendering future predictions for individuals less accurate.
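A simple collinearity check of the kind suggested here is sketched below: variance inflation factors (VIFs) computed on simulated gestational age and birth weight columns. The data and the interaction term are invented; large VIFs (well above roughly 5-10) flag the inflated-variance problem described above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
ga = rng.uniform(24, 30, size=500)               # gestational age, weeks
bw = 100 * ga + rng.normal(0, 150, size=500)     # birth weight correlated with GA

X = sm.add_constant(pd.DataFrame({"ga": ga, "bw": bw, "ga_x_bw": ga * bw}))

# VIF for each column; large values indicate predictors that are nearly
# linear combinations of the others.
for i, col in enumerate(X.columns):
    print(f"{col}: {variance_inflation_factor(X.values, i):.1f}")
```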

Further, limiting the model to birth characteristics (the stated goal of Pivodic et al.) may lead to overfitting (too few outcome events relative to the number of predictors) or underfitting (failing to include important predictors) of predictive models for severe ROP, and may not fully capture the true risk. For example, Pivodic et al. state that postnatal age (not strictly a birth characteristic) was the best variable for predicting the temporal risk of ROP treatment. Care received while in the NICU may also modify this risk. Incorporating variables such as oxygen saturation targets, type of infant feeding (breast milk versus formula), and supplementation with certain nutrients (vitamin A, omega-3 fatty acids, and vitamin E) may change the estimates of risk for developing severe ROP [4]. Thus, the risk of bias was rated as "High" for the predictor domain.

Outcome Domain

Evaluations for ROP needing treatment may not be consistent across all locations; research shows observer bias between centers for ophthalmologists assessing acute ROP [5]. Therefore, the risk of bias is “Unclear” in the outcome domain.

Analysis Domain

First, ROP needing treatment was a rare event, occurring in approximately 4.1% of infants born at 24 to 30 weeks' gestation (300 of 7,255). As sample size decreases, the number of infants with the event also decreases. For example, the temporal validation group contained 308 infants, implying only about 13 with ROP needing treatment. With smaller samples, the risk increases of selecting spurious predictors (overfitting) or failing to include important predictors (underfitting). Thus, the temporal group may not have provided enough evidence for accurate internal validation.
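A back-of-the-envelope calculation of this concern is shown below; the roughly 100-event minimum often suggested for validation studies is cited here as a general rule of thumb for context, not a threshold used by Pivodic et al.

```python
prevalence = 0.041        # ROP needing treatment at 24-30 weeks (300/7,255)
n_temporal = 308          # infants in the temporal validation group

expected_events = prevalence * n_temporal
print(f"expected treated infants: {expected_events:.0f}")   # roughly 13
print("commonly suggested minimum for a validation sample: ~100 events")
```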

As stated above, collinearity may contribute to risk of bias in all models. Collinearity can alter coefficients, inadvertently reversing their signs or inflating their error terms, resulting in inaccurate risk estimates and over-optimistic p values. A different type of regression analysis could help overcome any collinearity issue; for example, ridge regression, lasso, and elastic net procedures all help reduce model variance. Specifically, elastic net selects among all candidate predictors, including possible interactions, for the "best" variables to predict an outcome, as sketched below. With such an approach, the arbitrary criterion used by Pivodic et al. of retaining predictors with univariable analysis results < 0.10 would not be needed.
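The sketch below shows, on simulated data, what an elastic net approach looks like in practice: a penalized logistic regression that shrinks and selects among correlated predictors and their interactions instead of screening them with univariable cut-offs. The variables, coefficients, and tuning grid are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 2000

# Simulated birth characteristics and interaction terms (illustrative only).
ga = rng.uniform(24, 30, size=n)
bw_sds = rng.normal(0, 1, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([ga, bw_sds, sex, ga * bw_sds, ga * sex])

# Simulated outcome driven mainly by GA and BWSDS.
logit = -2.0 - 0.8 * (ga - 27.0) - 0.6 * bw_sds
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Elastic net penalized logistic regression with cross-validated tuning;
# uninformative coefficients are shrunk toward (or exactly to) zero.
enet = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(
        penalty="elasticnet", solver="saga", l1_ratios=[0.2, 0.5, 0.8],
        scoring="roc_auc", max_iter=5000,
    ),
).fit(X, y)

print(enet.named_steps["logisticregressioncv"].coef_)
```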

Internal temporal validation

Internal temporal validation of a development model occurs when investigators use the same predictors, outcome definitions, and measurements but sample from a later period. Here, the infants born 2017 to 2018 in the Swedish National Patient Registry formed the temporal group, and the coefficients estimated during model development were used to predict their outcomes. Results showed good AUC model performance; however, because data collection in the registry spanned 2007 to 2018, it is important to recognize and account for changes in medical practice during this extended period. For example, recommendations for oxygen saturation targets in preterm infants changed [4].

External geographical validation

The intent of external model validation is to quantify the developed model's predictive performance in a new participant dataset, typically collected by different investigators, that contains measures similar to those used in the developed model. The data may come from different settings, intentionally different populations, or different locations.

Pivodic et al. selected two groups for external validation: a sample of infants born in the U.S. from 2005 to 2010 and a sample of infants born in Germany (European group) from 2011 to 2017. As with temporal validation, the coefficients from the developed model were used to predict outcomes for the two external cohorts. All validation assessments showed the model was performing well as measured by high calibration, discrimination, classification, and sensitivity.

In creating a predictive model for risk of treatment for severe ROP, it is important to have both high sensitivity (correctly identifying those who require treatment; the true positive rate) and high specificity (correctly identifying those who would not need to be screened; the true negative rate). However, positive predictive value (PPV) ranged from 9.2% to 21.8%; that is, at best only about one in five infants who screened positive actually needed ROP treatment. With no known biomarkers for ROP, routine retinal examinations are necessary to detect the small percentage of infants requiring treatment. Moreover, increased survival of extremely premature infants leads to an increased number of infants needing screening. Unfortunately, the number of ROP-trained ophthalmologists is limited and expected to decrease [6]. As such, predictive models may help to identify premature infants who are not at risk for severe ROP (such as infants > 27 weeks GA). However, specificity was markedly reduced in some of the Pivodic et al. models, ranging from 10.5% to 49.3%; thus, these models may not be useful for ruling out premature infants who are not at risk for ROP. For these reasons, the risk of bias was "High" in the analysis domain.
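To make this trade-off concrete, the sketch below computes sensitivity, specificity, and PPV from an invented 2 × 2 screening table with a rare outcome; the counts are ours and do not reproduce the paper's figures, but they show why a highly sensitive model can still have a low PPV.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and PPV from a 2x2 screening table."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # fraction of positives who truly need treatment
    }

# Invented example: 300 treated infants out of 7,255 screened, a model that
# catches 297 of them but also flags about half of the untreated infants.
print(screening_metrics(tp=297, fp=3478, fn=3, tn=3477))
```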

 

In conclusion, the evaluation using PROBAST showed that risk of bias may be present, which would limit generalizability. As a result, we recommend the following:

  1. Select samples for the development of models that more closely represent the target population in order to increase generalizability of the final model.
  2. Conduct a thorough assessment of potential predictors, including evaluation of multicollinearity.
  3. Use larger samples for validation models (the incidence of severe ROP is trending up in recent studies) [4].
  4. Incorporate changes in neonatal care into all models; for example, recent changes in recommendations for oxygen saturation levels.
  5. Consider more robust regression techniques to select predictors, such as elastic net, that reduce variance while considering all possible interactions. Alternatively, a Bayesian approach might be useful in that prior knowledge would be incorporated into the model building process.

The American Academy of Ophthalmology conducted a systematic study of predictive models and defined criteria for model development, assigning ratings of Level I, Level II, and Level III [7]. By this measure, the Pivodic et al. study appears to meet the criteria for a Level I rating (high-quality study). However, according to PROBAST, these models may in fact be biased. Despite this potential shortcoming, Pivodic et al. demonstrated that the risk of ROP needing treatment peaked at 12 weeks' postnatal age, regardless of gestational age at birth, and was not related to postmenstrual age. The rate of increase was 54% per week from postnatal weeks 8 through 12; afterwards, risk decreased by 30% per week, as illustrated below. This information may help with scheduling ROP screenings and in deploying strategies to prevent progression of ROP [4].
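As a rough illustration of these weekly rates: the 54% and 30% per-week figures are from the paper, while the cumulative multiples below are our own arithmetic.

```python
weekly_rise = 1.54      # 54% increase per week, postnatal weeks 8-12
weekly_fall = 0.70      # 30% decrease per week after the peak

print(f"cumulative rise, weeks 8-12: {weekly_rise ** 4:.1f}-fold")
print(f"risk 4 weeks after the peak: {weekly_fall ** 4:.2f} of peak")
```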

EBM LESSON: EVALUATION OF A PREDICTION MODEL

The PROBAST instrument appears to provide substantial insight into the risk of bias when developing a prediction model for individuals. Constructing a model to predict an outcome is a complex statistical procedure; therein lies the danger of using models to predict individual outcomes before fully understanding their potential risk of bias. While we acknowledge that Pivodic et al. employed high-quality, rigorous tools for evaluating their models, by using the new PROBAST instrument we easily identified shortcomings that could significantly alter risk estimates of ROP requiring treatment in premature infants.

Acknowledgment:

The Journal Club is a collaboration between the American Academy of Pediatrics Section on Neonatal-Perinatal Medicine and the International Society of Evidence-Based Neonatology (EBNEO.org).
References

  1. Pivodic A, Hård AL, Löfqvist C, Smith LEH, Wu C, et al. Individual risk prediction for sight-threatening retinopathy of prematurity using birth characteristics. JAMA Ophthalmol. 2020;138(1):21-29.
  2. Steyerberg EW, Moons KGM, van der Windt DA, et al; PROGRESS Group. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.
  3. Moons KG, Wolff RF, Riley RD, et al. PROBAST: A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration. Ann Intern Med. 2019;170:W1–W33. https://www.equator-network.org/, accessed 20/18/2020.
  4. Raghuveer TS, Zackula RE. Strategies to Prevent Severe Retinopathy: A 2020 update and Meta-analysis. In Press. 2020
  5. Darlow BA, Elder MJ, Horwood LJ, Donoghue DA, Henderson-Smart DJ; Australian and New Zealand Neonatal Network. Does observer bias contribute to variations in the rate of retinopathy of prematurity between centres? Clin Exp Ophthalmol. 2008 Jan-Feb; 36(1):43-6.
  6. Wong RK, Ventura CV, Espiritu MJ, Yonekawa Y, Henchoz L, Chiang MF, et al. Training fellows for retinopathy of prematurity care: a web-based survey. J AAPOS 2012; 16 (2): 177-181.
  7. Hutchinson AK, Melia M, Yang MB, VanderVeen DK, Wilson LB, Lambert SR. Clinical models and algorithms for the prediction of Retinopathy of Prematurity. Ophthalmology 2016; 123: 804-816.