AUA2023 BEST POSTERS Artificial Intelligence–generated R.E.N.A.L.+ Score Surpasses Human-generated Score in Predicting Renal Oncologic Outcomes

By: Nour Abdallah, MD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Andrew Wood, MD, MS, Glickman Urological and Kidney Institute, Cleveland, Ohio; Tarik Benidir, MD, MSc, Glickman Urological and Kidney Institute, Cleveland, Ohio; Nicholas Heller, BS, University of Minnesota, Minneapolis; Fabian Isensee, PhD, German Cancer Research Center (DKFZ) Heidelberg, University of Heidelberg, Germany; Resha Tejpaul, BA, University of Minnesota, Minneapolis; Dillon Corrigan, MS, Lerner Research Institute, Cleveland, Ohio; Chalairat Suk-ouichai, MD, Siriraj Hospital, Mahidol University, Bangkok, Thailand; Griffin Struyk, MD, University of Minnesota, Minneapolis; Keenan Moore, BA, University of Minnesota, Minneapolis; Nitin Venkatesh, BS, University of Minnesota, Minneapolis; Onuralp Ergun, MD, Cleveland Clinic Research Institute, Ohio; Alex You, BS, Case Western Reserve University, Cleveland, Ohio; Rebecca Campbell, MD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Erick M. Remer, MD, Imaging Institute, Cleveland Clinic, Ohio; Samuel Haywood, MD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Venkatesh Kirshnamurthi, MD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Robert Abouassaly, MD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Steven Campbell, MD, PhD, Glickman Urological and Kidney Institute, Cleveland, Ohio; Nikolaos Papanikolopoulos, PhD, University of Minnesota, Minneapolis; Christopher J. Weight, MD, MS, Glickman Urological and Kidney Institute, Cleveland, Ohio | Posted on: 30 Aug 2023

The R.E.N.A.L. (radius, endophytic/exophytic, nearness to the collecting system, anterior/posterior, location relative to the polar lines) nephrometry score aims to quantify the surgical complexity of a renal mass.1 Although prospectively validated as a predictor of perioperative, pathologic, and survival outcomes,1-4 the score has seen limited clinical adoption because of its interobserver variability and the considerable time clinicians need to generate it.5,6 Artificial intelligence (AI) technology could alleviate these 2 barriers: we previously demonstrated that AI-generated R.E.N.A.L. nephrometry scores predict oncologic outcomes in patients with a renal mass as reliably as scores generated by human experts, making them a potentially valuable tool for clinicians.7 Furthermore, an AI scoring system can overcome another fundamental limitation of the R.E.N.A.L. score, namely the ordinal nature of its components. Although categorizing continuous variables (eg, a renal mass diameter of 4-7 cm gets a score of 2) was meant to simplify and standardize score generation and to reduce the time it requires, some valuable information might be lost along the way,8 potentially decreasing the score’s predictive accuracy by blurring differences within the same category. For example, a 6.9-cm, 95% endophytic renal mass and a 4.1-cm, 50% endophytic renal mass are scored equally by the current ordinal R.E.N.A.L. score; however, their prognoses vary considerably. Additionally, the R.E.N.A.L. score weighs each of its components equally and simply adds their values to simplify its generation. An automated AI-based score would not be bound by these limitations and could combine continuous score components through complex multivariate models. Thus, we generated a modified R.E.N.A.L. score (AI+ score) using AI-generated segmentations and continuous rather than ordinal variables, combined using multivariable logistic regression.
We hypothesized that the AI+ score would predict oncologic outcomes better than scores using ordinal variables.
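
The information loss from binning can be illustrated with a minimal Python sketch of the radius ("R") component alone. The cut points used here (4 cm or less scores 1, more than 4 cm but less than 7 cm scores 2, 7 cm or more scores 3) follow the original R.E.N.A.L. definition;2 the function name `radius_points` is hypothetical and is not taken from the study's software.

```python
def radius_points(diameter_cm: float) -> int:
    """Ordinal 'R' points: <=4 cm -> 1, >4 and <7 cm -> 2, >=7 cm -> 3."""
    if diameter_cm <= 4.0:
        return 1
    if diameter_cm < 7.0:
        return 2
    return 3


# The two tumors from the example above differ by 2.8 cm in diameter,
# yet both land in the same ordinal bin.
for diameter in (4.1, 6.9):
    print(f"{diameter} cm -> R = {radius_points(diameter)}")
# prints:
# 4.1 cm -> R = 2
# 6.9 cm -> R = 2
```

A continuous score keeps the 2.8-cm difference that the ordinal bin discards.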

Figure 1. Receiver operating characteristic curves assessing the predictive ability of pathologic outcomes for the different R.E.N.A.L. (radius, endophytic/exophytic, nearness to the collecting system, anterior/posterior, location relative to the polar lines) scores: artificial intelligence (AI)–generated continuous score (red), AI-generated categorical score (green), and human-generated score (blue). FPF indicates false-positive fraction; TPV, true-positive value.

Figure 2. Receiver operating characteristic curves assessing the predictive ability of surgery type for the different R.E.N.A.L. (radius, endophytic/exophytic, nearness to the collecting system, anterior/posterior, location relative to the polar lines) scores: artificial intelligence (AI)–generated continuous score (red), AI-generated categorical score (green), and human-generated score (blue).

Table 1. Baseline Characteristics (N=300)

| Characteristic | Value |
| --- | --- |
| Gender, No. (%) | |
| Female | 120 (40) |
| Male | 180 (60) |
| Age, median (IQR), y | 60 (51-68) |
| Tumor diameter, median (IQR), cm | 4.20 (2.60-6.12) |
| Body mass index, median (IQR), kg/m² | 29 (26-35) |
| Baseline eGFR, median (IQR), mL/min/1.73 m² | 72 (60-81) |
| AI-R.E.N.A.L. score, median (IQR) | 8.00 (6.00-9.00) |
| Human-R.E.N.A.L. score, median (IQR) | 8.00 (6.25-9.00) |
| Malignant renal mass, No. (%) | 275 (92) |
| Pathologic T-stage, No. (%) | |
| 0 | 24 (8.0) |
| 1a | 121 (40) |
| 1b | 60 (20) |
| 2a | 15 (5.0) |
| 2b | 5 (1.7) |
| 3 | 8 (2.7) |
| 3a | 62 (21) |
| 4 | 5 (1.7) |
| Tumor necrosis, No. (%) | 69 (23) |
| Tumor grade, No. (%) | |
| 0 | 25 (9.3) |
| 1 | 33 (12) |
| 2 | 119 (44) |
| 3 | 66 (25) |
| 4 | 26 (9.7) |
| Surgical technique, No. (%) | |
| Laparoscopic | 49 (16) |
| Open | 79 (26) |
| Robotic | 172 (57) |
| Nephrectomy type, No. (%) | |
| Partial | 188 (63) |
| Radical | 112 (37) |
| Estimated blood loss, median (IQR), mL | 200 (100-400) |
| Complications, No. (%) | 47 (16) |
| Length of hospital stay, median (IQR), d | 3.00 (2.00-4.00) |
Abbreviations: AI, artificial intelligence; eGFR, estimated glomerular filtration rate; IQR, interquartile range; R.E.N.A.L., radius, endophytic/exophytic, nearness to the collecting system, anterior/posterior, location relative to the polar lines.

Table 2. Multivariable Logistic Regression Models Using Continuous Artificial Intelligence+ Score Components

| R.E.N.A.L. component | Malignant renal mass, OR (95% CI) | P value | High stage, OR (95% CI) | P value | High grade, OR (95% CI) | P value | Tumor necrosis, OR (95% CI) | P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R (per cm) | 1.02 (0.98-1.03) | .37 | 2.84 (1.34-4.45) | < .001 | 1.80 (0.33-3.30) | .01 | 4.41 (2.60-6.10) | < .001 |
| E | 1.83 (0.20-1.71) | .59 | 0.17 (0.02-1.10) | .06 | 0.10 (0.02-0.61) | .008 | 1.01 (0.99-1.03) | .22 |
| N (per mm) | 0.96 (0.92-1.01) | .16 | 0.96 (0.90-1.02) | .22 | 0.97 (0.92-1.02) | .20 | 0.96 (0.90-1.03) | .29 |
| L | 1.01 (0.99-1.02) | .18 | 1.00 (0.99-1.01) | .95 | 1.00 (0.99-1.01) | .61 | 0.96 (0.68-1.35) | .93 |
Abbreviations: CI, confidence interval; OR, odds ratio; R.E.N.A.L., radius, endophytic/exophytic, nearness to the collecting system, anterior/posterior, location relative to the polar lines.
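
As a reading aid for the per-unit odds ratios in Table 2: in a logistic model, an OR per cm corresponds to a log-odds coefficient of ln(OR), so a difference of k cm multiplies the odds by OR raised to the power k, provided the other components are held constant and the effect is linear on the log-odds scale. The short Python sketch below applies this to the reported OR for high-stage disease. The helper name `odds_multiplier` is hypothetical, and the 2.8-cm difference is borrowed from the 4.1-cm vs 6.9-cm example purely for illustration.

```python
import math


def odds_multiplier(or_per_unit: float, delta_units: float) -> float:
    """Factor by which the odds change for a delta_units difference."""
    beta = math.log(or_per_unit)         # log-odds coefficient per unit
    return math.exp(beta * delta_units)  # equals or_per_unit ** delta_units


or_high_stage_per_cm = 2.84  # R component, high stage (Table 2)

# Odds multiplier for a 1-cm and a 2.8-cm larger tumor, other components fixed.
print(round(odds_multiplier(or_high_stage_per_cm, 1.0), 2))
print(round(odds_multiplier(or_high_stage_per_cm, 2.8), 1))
```

This is arithmetic on a published point estimate only; it ignores the width of the confidence interval and says nothing about absolute risk, which would also require the model intercept (not reported here).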

This was a retrospective study of 300 consecutive patients with preoperative CT scans showing suspected renal cancer at a single institution from 2010 to 2018. We excluded patients with a tumor thrombus and, in cases of multiple tumors, used the largest tumor to generate the nephrometry score. The inclusion criteria were based on the previously published KiTS19 (kidney and kidney tumor segmentation challenge) protocol.9 This cohort, with scans, segmentations, clinical details, and outcomes, is now publicly available at https://kits21.kits-challenge.org/. We used a deep neural network approach to automatically generate segmentation masks of the kidney parenchyma and tumor. Geometric algorithms then automatically estimated the score components as ordinal and continuous variables. Each tumor was assigned an AI-generated continuous value for each component of the R.E.N.A.L. score (AI+ score), an AI-generated ordinal R.E.N.A.L. score (AI-score), and a human-calculated traditional R.E.N.A.L. score (H-score). The H-score was tabulated by 3 trained medical personnel. The AI+ score was treated as the set of automatically generated continuous R.E.N.A.L. components, without adding them into a total score; its ability to predict outcomes was therefore studied using multivariable logistic regression. We then compared the predictive ability of the AI+, AI, and H-scores, and we analyzed the relative importance of the AI+ score components.
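
The logistic regression step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's analysis code: the cohort is synthetic, the fitting method (batch gradient descent on standardized features) is a stand-in for whatever statistical software the authors used, and every name (`fit_logistic`, `diameter_cm`, `endophytic_pct`) is hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def standardize(column):
    """Center to mean 0 and scale to SD 1 so gradient descent is stable."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Approximate fit of intercept + weights by gradient descent on log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic cohort (NOT study data): two continuous components per tumor.
random.seed(0)
diameter_cm = [random.uniform(1.0, 10.0) for _ in range(300)]
endophytic_pct = [random.uniform(0.0, 100.0) for _ in range(300)]
# Hypothetical ground truth: only diameter drives the binary outcome.
outcome = [1 if random.random() < sigmoid(-3.0 + 0.6 * d) else 0
           for d in diameter_cm]

X = list(zip(standardize(diameter_cm), standardize(endophytic_pct)))
weights = fit_logistic(X, outcome)
print("intercept, diameter, endophytic:", [round(w, 2) for w in weights])
```

Because each component keeps its own coefficient, the model can weight diameter heavily and a less informative component lightly, which the equal-weight sum of the ordinal score cannot do. Coefficients here are per standard deviation of each feature; exponentiating a coefficient fitted on unscaled inputs would give a per-unit odds ratio of the kind shown in Table 2.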

The baseline characteristics of the cohort are presented in Table 1. The median age was 60 years (IQR 51-68), and 40% of patients were female. The median tumor size was 4.2 cm (IQR 2.6-6.12), and 92% of the tumors were malignant, including 27%, 37%, and 23% with high-stage disease, high-grade disease, and necrosis, respectively. As shown in Figure 1 and Figure 2, the AI+ score outperformed the AI and H-scores in predicting malignancy (area under the receiver operating characteristic curve [AUC] 0.69 vs 0.67 vs 0.64, respectively), high-stage disease (AUC 0.82 vs 0.65 vs 0.71, respectively), high-grade disease (AUC 0.78 vs 0.65 vs 0.65, respectively), pathologic tumor necrosis (AUC 0.81 vs 0.72 vs 0.74, respectively), and a partial nephrectomy approach (AUC 0.88 vs 0.74 vs 0.79, respectively). Of the AI+ score components, the maximal tumor diameter (“R”) was the most important outcome predictor. As shown in Table 2, in multivariable logistic regression modeling, the R component was a statistically significant predictor of high-stage disease (per cm, odds ratio [OR] 2.84 [1.34-4.45]; P < .001), high-grade disease (per cm, OR 1.80 [0.33-3.30]; P = .01), and the presence of tumor necrosis (per cm, OR 4.41 [2.60-6.10]; P < .001).

Our study had some limitations, including previously reported segmentation-related barriers.7 Although the algorithm could not generate an AI score for 6 cases for unclear reasons, we believe a 2% failure rate is acceptable at this stage. Furthermore, although all surgeries were performed at a single center, the CT scan images were acquired at around 70 medical facilities, which supports the generalizability and external validity of our work.
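
For readers less familiar with the AUC values reported above, the metric can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The Python sketch below uses this standard rank-based definition on a toy data set; the six cases and both score lists are invented for illustration and are not study data, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (NOT study data): 1 = adverse outcome, 0 = no adverse outcome.
labels = [0, 0, 1, 0, 1, 1]
continuous = [2.1, 3.0, 4.1, 4.4, 5.2, 6.9]  # eg, diameter in cm
ordinal = [1, 1, 2, 2, 2, 2]                 # the same cases after binning

print(round(auc(continuous, labels), 3))  # 0.889
print(round(auc(ordinal, labels), 3))     # 0.833
```

In this toy case, binning creates ties between positive and negative cases, which lowers the AUC. That is the mechanism by which a continuous score can discriminate better than its ordinal counterpart, although the size of the effect in any real cohort depends on the data.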

The AI+ score, generated with continuous variables and without human intervention at any step, provided meaningful predictions of oncologic outcomes that were superior to those of scores generated in an ordinal fashion. These fully automated segmentations and nephrometry scores can offer substantial benefits at the clinical and research levels, despite being in their infancy. Indeed, automatically generating R.E.N.A.L. nephrometry scores increases the uniformity, quality, and volume of renal mass complexity data by reducing interobserver variability in interpretation and generation, as well as the personnel and time required. Furthermore, in an era of new developments in the management of renal cell carcinoma, such as adjuvant immunotherapy and active surveillance, there is a crucial need to improve the accuracy of patient counseling on the likelihood of adverse pathology and other oncologic outcomes. Our data should reassure clinicians that they can rely on the fully automated version of the familiar R.E.N.A.L. nephrometry score to predict such important outcomes, and this work can be considered a key intermediate step in the development of even more advanced machine-learning–based radiomic scoring systems.

  1. Joshi SS, Uzzo RG. Renal tumor anatomic complexity: clinical implications for urologists. Urol Clin North Am. 2017;44(2):179-187.
  2. Kutikov A, Uzzo RG. The R.E.N.A.L. nephrometry score: a comprehensive standardized system for quantitating renal tumor size, location and depth. J Urol. 2009;182(3):844-853.
  3. Kutikov A, Smaldone MC, Egleston BL, et al. Anatomic features of enhancing renal masses predict malignant and high-grade pathology: a preoperative nomogram using the RENAL nephrometry score. Eur Urol. 2011;60(2):241-248.
  4. Weight CJ, Atwell TD, Fazzio RT, et al. A multidisciplinary evaluation of inter-reviewer agreement of the nephrometry score and the prediction of long-term outcomes. J Urol. 2011;186(4):1223-1228.
  5. Chapin BF, Wood CG. The RENAL nephrometry nomogram: statistically significant, but is it clinically relevant? Eur Urol. 2011;60(2):249-251.
  6. Spaliviero M, Poon BY, Aras O, et al. Interobserver variability of R.E.N.A.L., PADUA, and centrality index nephrometry score systems. World J Urol. 2015;33(6):853-858.
  7. Heller N, Tejpaul R, Isensee F, et al. Computer-generated R.E.N.A.L. nephrometry scores yield comparable predictive results to those of human-expert scores in predicting oncologic and perioperative outcomes. J Urol. 2022;207(5):1105-1115.
  8. Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332(7549):1080.
  9. Heller N, Isensee F, Maier-Hein KH, et al. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: results of the KiTS19 challenge. Med Image Anal. 2021;67:101821.
