
JU INSIGHT: Multi-institutional Validation of Improved Vesicoureteral Reflux Assessment With Simple and Machine Learning Approaches

By: Adree Khondker, BHSc; Jethro C. C. Kwong, MD; Priyank Yadav, MCh; Justin Y. H. Chan, MD; Anuradha Singh, MD; Marta Skreta, MSc; Lauren Erdman, PhD; Daniel T. Keefe, MD, MSc; Katherine Fischer, MD; Gregory Tasian, MD, MSc; Jessica H. Hannick, MD, MSc; Frank Papanikolaou, MD; Benjamin J. Cooper, BSc (c); Christopher S. Cooper, MD; Mandy Rickard, MN-NP; Armando J. Lorenzo, MD, MSc | Posted on: 01 Dec 2022

Khondker A, Kwong JCC, Yadav P, et al. Multi-institutional validation of improved vesicoureteral reflux assessment with simple and machine learning approaches. J Urol. 2022;208(6):1314-1322.

Study Need and Importance

Vesicoureteral reflux (VUR) severity is communicated as a grade from I to V on voiding cystourethrogram (VCUG). However, current estimates suggest that clinicians assign discordant grades to one-third of VCUGs, especially in moderate VUR. We aimed to quantify the reliability of reflux grading and to develop a model that performs this task more reliably. Multiple raters graded each VCUG to establish the agreed VUR grade for each image. We then quantified the ureteral tortuosity and ureteral dilatation of each renal unit and developed a machine learning model, qVUR, which uses these features to determine VUR grade with improved reliability over clinical grading (see Figure).
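This summary does not state how tortuosity or dilatation was measured. Purely as an illustration, one common geometric definition of tortuosity is the length of a traced path divided by the straight-line distance between its endpoints. The sketch below assumes that definition; the function name and the sample coordinates are hypothetical and are not taken from the qVUR code base.

```python
import math


def tortuosity(points):
    """Path length divided by end-to-end distance (1.0 means a straight line).

    `points` is a sequence of (x, y) coordinates traced along a centerline.
    This arc-to-chord ratio is an assumed definition, not the qVUR method.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Sum the lengths of consecutive segments along the traced path.
    path_length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Straight-line distance between the first and last points.
    chord_length = math.dist(points[0], points[-1])
    if chord_length == 0:
        raise ValueError("endpoints coincide; tortuosity is undefined")
    return path_length / chord_length


# Hypothetical traces: a straight path and a path with one bend.
straight = [(0, 0), (0, 5), (0, 10)]
kinked = [(0, 0), (3, 4), (0, 8)]

print(tortuosity(straight))
print(tortuosity(kinked))
```

Under this definition a perfectly straight trace scores 1.0 and any bend raises the value, so the ratio increases as the traced path deviates further from a straight line.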

What We Found

In a large multi-institutional database encompassing 1,492 renal units, traditional subjective reflux grading showed low inter-rater reliability. Ureteral tortuosity and dilatation were strongly correlated with VUR grade. Importantly, qVUR predicted VUR grade with accuracy comparable to that of human raters. Moreover, qVUR improved the reliability of VUR grading between clinicians from “fair agreement” to “substantial agreement.” Model performance was stable on external validation and was not biased by age, gender, or indication for VCUG.
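The labels “fair agreement” and “substantial agreement” match the Landis and Koch bands commonly used to interpret kappa statistics (0.21 to 0.40 and 0.61 to 0.80, respectively). The summary does not say which agreement statistic the study used, so the sketch below assumes unweighted Cohen's kappa for two raters. The grades in the example are invented for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented VUR grades (1 to 5) from two hypothetical raters.
rater_a = [1, 2, 3, 3, 4, 5, 2, 3]
rater_b = [1, 3, 3, 2, 4, 5, 2, 4]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2), landis_koch_label(kappa))
```

Cohen's kappa treats every disagreement equally. For ordinal grades such as I to V, a weighted kappa, or a multi-rater statistic such as Fleiss' kappa when more than two clinicians grade each image, may be the more appropriate choice.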

Limitations

Our study is limited by a lack of clinical outcome data, such as the number of urinary tract infections or rates of spontaneous resolution. In addition, unlike traditional grading, our model does not incorporate the appearance of the renal calyces when determining grade.

Interpretation for Patient Care

We packaged qVUR into a simple, free-to-use online application (https://sickkidsurology.shinyapps.io/qVUR) and publicly shared our full code base. When VUR grade is assessed in a clinical or academic setting, qVUR offers a more reliable grade than traditional methods. It requires no expert training, so it can be used in institutions without specialized pediatric radiologists.
