
Building Supervised Computer Vision Models of Kidney Surgery With Surgical Videos of Donor Nephrectomy

By: Mahendra Bhandari, MD, MBA, Henry Ford Hospital, Detroit, Michigan; Hamid Ali, BE, MS, RediMinds Research, Southfield, Michigan; Vipin Tyagi, MS, MCh (Urology), Sir Ganga Ram Hospital, Delhi, India; Gautam Chaudhury, MS, MCh (Urology), All India Institute of Medical Sciences, Jodhpur; Carolyn Pratt, BA, PhD, RediMinds Research, Southfield, Michigan; Randy Nguyen, BS, RediMinds Research, Southfield, Michigan; Abdul Rahman, BS, RediMinds Research, Southfield, Michigan; Keri Martin, BS, MPH, RediMinds Research, Southfield, Michigan; Rajesh Ahlawat, MS, MCh (Urology), Medanta–The Medicity, Gurgaon, India; Chiruvella Malikarjun, MS, MCh (Urology), Asian Institute of Nephrology and Urology, Hyderabad, India; Aneesh Srivastava, MS, MCh (Urology), Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, India; Deepak Dubey, MS, FRCS, MCh (Urology), Manipal Hospitals, Bangalore, India; Mahesh Desai, MS, FRCS, Muljibhai Patel Urological Hospital, Nadiad, India; Shunji Nagai, MD, PhD, Henry Ford Health System, Detroit, Michigan; Ajay Sharma, MS, FRCS, Royal Liverpool University Hospital, England; Madhu Reddiboina, BE, MS, RediMinds Research, Southfield, Michigan | Posted on: 27 Nov 2023

Introduction

Currently, intraoperative identification of surgical structures depends heavily on individual surgeon judgment and experience and is vulnerable to time pressure, fatigue, and confirmation bias. Surgical video annotation is fundamental to constructing supervised computer vision (CV) models,1 and such models could faithfully visualize critical structures embedded in the opaque, undissected surgical target, with the output delivered through interface devices. CV is a branch of machine learning that exploits the capability of machines to achieve a human level of understanding of the surgical target. It uses deep learning algorithms and mathematical techniques to analyze quantifiable image features, such as color, texture, and position, at the pixel level.2 The facial recognition technology used routinely for security and authentication is a familiar example of a CV model in application.

The evolution of real-time surgical decision support has underscored the need for high-performance CV models during kidney surgeries. Such models, adept at detecting anatomical landmarks, can superimpose patient-specific 3D models onto the surgical scene. This capability is pivotal for surgeons, offering a dynamic view of the surgical scene and facilitating precise dissection, thereby minimizing intraoperative accidents.1 Our best-performing model achieved mean intersection over union (IoU) values of 0.49, 0.72, and 0.74 for the renal artery, renal vein, and spleen, respectively (Figure 1). Consensus building among 3 annotators significantly improved model performance.


Figure 1. A visual representation of real-time inferences produced by the top-performing models on a surgical video. The models adeptly identify renal arteries (highlighted in yellow), renal veins (in green), and the spleen (in purple). The original input images are displayed on the left, with their corresponding model outputs or inferences presented on the right.
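For readers unfamiliar with the metric, IoU divides the area of overlap between a predicted mask and its ground-truth mask by the area of their union, so 1.0 indicates a perfect match. A minimal NumPy sketch of the computation (illustrative only, not the evaluation code used in this study):

    import numpy as np

    def mask_iou(pred, truth):
        # IoU between 2 binary segmentation masks of equal shape.
        pred, truth = pred.astype(bool), truth.astype(bool)
        union = np.logical_or(pred, truth).sum()
        if union == 0:
            return 1.0  # both masks empty: treat as perfect agreement
        return np.logical_and(pred, truth).sum() / union

    def mean_iou(pred_masks, truth_masks):
        # Mean IoU over paired (prediction, ground truth) masks for one class.
        return float(np.mean([mask_iou(p, t) for p, t in zip(pred_masks, truth_masks)]))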

Data Collection and Curation

We built a supervised deep learning model from surgical videos of laparoscopic live donor nephrectomy (LLDN) as a baseline model of normal kidney anatomy and of structures commonly encountered during kidney surgery. Three experienced urologists and kidney transplant surgeons worked remotely in close collaboration with CV scientists to annotate the renal artery, renal vein, and spleen in the images. A periodic consensus mechanism was developed to standardize identification and labeling of the structures. We curated 4291 images from 6 surgical videos of LLDN for annotation using our custom-built platform, RediMinds Ground Truth Factory (Figure 2). This platform facilitated collaboration among stakeholders across remote sites and served as a one-stop solution for data collection, curation, annotation, and postprocessing.


Figure 2. The Ground Truth Factory by RediMinds: a meticulously designed platform tailored for the collaborative creation of artificial intelligence (AI) surgical tools. Ground Truth Factory streamlines data upload, annotation, management, and AI model development, emphasizing secure, swift, and efficient collaboration. With built-in processes for data anonymization prior to upload, the platform ensures robust security, adhering to both Health Insurance Portability and Accountability Act (HIPAA) and General Data Protection Regulation (GDPR) standards. NIST indicates National Institute of Standards and Technology; SSAE 18 SOC2, Statement on Standards for Attestation Engagements 18 System and Organization Controls.

We handled a large amount of data; the scale can be appreciated from the fact that 1 minute of high-definition surgical video contains 25 times the data found in a high-resolution computed tomography image.3 Through meticulous data curation, we maximized image variability, ensuring a diverse data set. An automated curation tool built on our baseline model further streamlined the process, helping identify images requiring additional model training. Deep learning algorithms have achieved human-level performance in object classification.2
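As an illustration of how curation for image variability might be implemented (a hypothetical sketch, not the study's automated tool), one can sample frames from a video and retain only those whose color histogram differs sufficiently from the last retained frame:

    import cv2

    def extract_varied_frames(video_path, sample_every=25, min_diff=0.25):
        # Sample every Nth frame; keep a frame only if its color histogram
        # differs enough from the last kept frame (Bhattacharyya distance).
        # Thresholds are illustrative, not values used in the study.
        capture = cv2.VideoCapture(video_path)
        kept, last_hist, index = [], None, 0
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if index % sample_every == 0:
                hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                    [0, 256, 0, 256, 0, 256])
                hist = cv2.normalize(hist, hist).flatten()
                if last_hist is None or cv2.compareHist(
                        last_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > min_diff:
                    kept.append(frame)
                    last_hist = hist
            index += 1
        capture.release()
        return kept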

Annotation and Data Cleaning

The RediMinds Ground Truth Factory platform converted curated images into accessible jobs for annotators. Initial annotations were undertaken by a single urologist; 2 more joined after inconsistencies were observed. In the second phase, we semiautomated the process, using the model trained in phase 1 to select images from the remaining raw videos; this semiautomation increased the image selection rate from 55.7% to 70%. Data cleaning was essential after annotation. We finalized an annotated data set of 2164 images for model training and 324 images for testing, rich in instances of the renal artery, renal vein, and spleen.
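Although the exact selection logic is not detailed here, a common pattern for such model-assisted curation is to prioritize frames the baseline model is least certain about, since those are the most informative to annotate next. The sketch below assumes a hypothetical predict_proba interface that returns per-pixel class probabilities:

    import numpy as np

    def select_uncertain_frames(frames, model, low=0.3, high=0.7, max_frames=500):
        # Score each frame by the fraction of pixels with ambiguous class
        # probabilities, then keep the most uncertain frames for annotation.
        # The interface and thresholds are illustrative assumptions.
        scored = []
        for frame in frames:
            probs = model.predict_proba(frame)  # hypothetical API
            uncertainty = float(np.mean((probs > low) & (probs < high)))
            scored.append((uncertainty, frame))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [frame for _, frame in scored[:max_frames]]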

Consensus Building

A continuous feedback loop was maintained between the technical team and the annotators. Discrepancies in annotations were diligently addressed, with consensus discussions playing a pivotal role. Rules for annotation were established, emphasizing that only visible structures should be annotated and that obscured or overlapped structures should be excluded. Consensus building among annotators improved model performance (Figure 3); the annotators' choices and the model's training behavior together provided the basis for the consensus approach.4-6


Figure 3. Annotation strategies applied by surgeon annotators to a singular surgical image: renal vein, clip, and clip applicator (A); erroneous annotation over renal vein, clip, and applicator (objects of different texture; B); partial annotation of renal vein above the clip applicator (C); and correctly performed annotation of renal vein excluding the clip and the clip applicator (D).
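One plausible way to surface such discrepancies (an illustrative sketch, not the platform's actual mechanism) is to compute pairwise IoU among the annotators' masks for each structure and flag low-agreement images for consensus discussion:

    from itertools import combinations

    import numpy as np

    def pairwise_agreement(masks):
        # Mean pairwise IoU across annotators' binary masks for one structure
        # in one image; a low value flags the image for consensus review.
        ious = []
        for a, b in combinations(masks, 2):
            a, b = a.astype(bool), b.astype(bool)
            union = np.logical_or(a, b).sum()
            ious.append(np.logical_and(a, b).sum() / union if union else 1.0)
        return float(np.mean(ious))

    # Example: flag an image when the 3 renal vein annotations diverge.
    # if pairwise_agreement([mask_a, mask_b, mask_c]) < 0.5: route to discussion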

Model Optimization and Testing

One widely used CV model architecture is the convolutional neural network, which consists of multiple neural layers, including convolutional layers that extract visual features. Convolutional neural networks are designed to process visual data such as images, as in this case.
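A minimal PyTorch sketch of such a network, reduced to 2 convolutional feature-extraction layers and a small classification head (illustrative only, far smaller than the architectures used in this study):

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        # Convolutional layers extract visual features (edges, textures);
        # the head maps pooled features to class scores.
        def __init__(self, num_classes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
            )

        def forward(self, x):
            return self.head(self.features(x))

    scores = TinyCNN()(torch.randn(1, 3, 224, 224))  # 1 RGB image -> 3 class scores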

Four model architectures were selected: (1) UNet, a model designed for medical imaging applications; (2) DeepLabV3+, a model widely used in surgical image segmentation of the liver, kidney, and small intestine; (3) Detectron2, a model library developed by Facebook's artificial intelligence research group; and (4) Mask Region-Based Convolutional Neural Network (Mask RCNN), a model that provides both segmentation and detection, producing bounding boxes, masks, and confidence scores for each prediction.
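As an illustration of how one of these off-the-shelf architectures can be instantiated, the sketch below uses torchvision's Mask RCNN implementation with 4 output classes (background plus the renal artery, renal vein, and spleen); the training configuration used in this study is not shown:

    import torch
    from torchvision.models.detection import maskrcnn_resnet50_fpn

    # Untrained Mask RCNN with a 4-class head (background + 3 structures).
    model = maskrcnn_resnet50_fpn(weights=None, num_classes=4)
    model.eval()
    with torch.no_grad():
        # Inference takes a list of [0, 1] image tensors of shape (3, H, W).
        predictions = model([torch.rand(3, 512, 512)])
    print(predictions[0].keys())  # boxes, labels, scores, masks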

Each model underwent rigorous testing, with hyperparameters such as batch size, data augmentation strategy, number of epochs, and learning rate being optimized. In addition to IoU, we calculated the pixel-level metrics precision, recall, and F1 score. Recall, equivalent to sensitivity, is the proportion of true positives (TPs) among all actual instances of a class, or TP/(TP + FN), where FN denotes false negatives. Precision (reported as specificity in the Table) is the proportion of TPs among all positive predictions, or TP/(TP + FP), where FP denotes false positives. The F1 score combines precision and recall into a single overall metric.7 These metrics were employed to gauge model efficacy (Table), and precision-recall curves were constructed for the standout models.
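A short sketch of these pixel-level metrics, computed from TP, FP, and FN pixel counts for a single class (the counts below are arbitrary examples, not study data):

    def pixel_metrics(tp, fp, fn):
        # Precision: of the pixels predicted as the class, how many are correct.
        precision = tp / (tp + fp) if tp + fp else 0.0
        # Recall (sensitivity): of the true class pixels, how many were found.
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # IoU from the same counts: overlap over union.
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        return precision, recall, f1, iou

    print(pixel_metrics(tp=7200, fp=2500, fn=1800))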

Table. A Detailed Presentation of Performance Metrics—Intersection Over Union, Precision (Specificity), Recall (Sensitivity), and F1 Score—for Each Anatomical Structure (Renal Artery, Renal Vein, and Spleen)

Model                    DeepLabV3+            Mask RCNN             Detectron2            UNet
Class                    RA     RV     SP      RA     RV     SP      RA     RV     SP      RA     RV     SP
IoU                      0.23   0.58   0.70    0.49*  0.72*  0.74*   0.41   0.68   0.66    0.16   0.50   0.44
Precision (specificity)  0.53   0.80   0.87    0.74*  0.87*  0.88*   0.56   0.83   0.86    0.38   0.75   0.86
Recall (sensitivity)     0.79*  0.87*  0.82*   0.64   0.81   0.81    0.73   0.80   0.74    0.78   0.81   0.74
F1 score                 0.64   0.83   0.84    0.68*  0.84*  0.85*   0.64   0.81   0.80    0.52   0.78   0.80
Abbreviations: IoU, intersection over union; Mask RCNN, Mask Region-Based Convolutional Neural Network; RA, renal artery; RV, renal vein; SP, spleen.
These metrics are derived from the 4 computer vision algorithms evaluated: DeepLabV3+, Mask RCNN, Detectron2, and UNet. The top-performing value for each metric and structure is marked with an asterisk.

Comments

The endeavor to develop CV models for anatomical landmark detection is a significant stride toward our overarching goal: creating intraoperative guidance tools for kidney surgery. The past decade has witnessed a renaissance in CV models, with their integration into medical devices becoming increasingly prevalent. One of the primary challenges we faced was the dearth of large, labeled data sets. Precise annotation is a labor-intensive process, necessitating collaboration between urologists and computer scientists.

Our choice of LLDN was strategic, offering a model closely aligned with the standard surgical anatomy of the kidney. This foundational model can be expanded to accommodate kidney pathologies, enhancing its applicability to a broader range of kidney surgeries. The focus on larger anatomical structures, like the renal artery, renal vein, and spleen, was deliberate, allowing us to refine our model-building pipeline and foster consensus among annotators.

The models we optimized in this study are emblematic of the cutting-edge advancements in CV as applied to medical imaging. As we look ahead, the inclusion of additional classes will be crucial, enabling comprehensive surgical scene recognition. Structures like the left gonadal vein, ureter, and adrenal gland, which are susceptible to surgical injury, will require extensive data sets to account for their variability.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Small Business Technology Transfer Grant No. 1953822. RediMinds, Inc has provided further funding for this project. Dave Meinhard, Vattikuti Foundation, provided support for the video.

  1. Hashimoto DA, Rosman G, Rus D, Meireles OR. Artificial intelligence in surgery: promises and perils. Ann Surg. 2018;268(1):70-76.
  2. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211-252.
  3. Natarajan P, Frenzel JC, Smaltz DH. Demystifying Big Data and Machine Learning for Healthcare. Routledge; 2021.
  4. Bhandari M, Hamid A, Tyagi V, Choudhary GR, Mallikarjuna C, Desai M. The art of data labelling for building supervised computer vision models for kidney surgery. Eur Urol. 2022;81:S1839-S1840.
  5. Reddiboina M, Ali H, Pratt C, Chaudhary G, Tyagi V, Bhandari M. How did we improve the model performance of the supervised computer vision model for laparoscopic live donor nephrectomy. Eur Urol. 2022;81:S1605-S1606.
  6. Bhandari M, Ali H, Desai M, et al. MP10-10 Complexities in annotating surgical videos to build supervised deep learning models for laparoscopic live donor nephrectomy. J Urol. 2021;206(Suppl 3):e172.
  7. Salman M, Riaz A, Sajid H, Hasan O. m2caiSeg: semantic segmentation of laparoscopic images using convolutional neural networks. arXiv. 2020. doi:10.48550/arXiv.2008.10134
