Journal Briefs: Urology Briefs: An Objective Scoring Tool to Evaluate Surgical Dissection: Dissection Assessment for Robotic Technique

By: Erik B. Vanstrum, BA; Andrew J. Hung, MD | Posted on: 03 Sep 2021

Vanstrum EB, Ma R, Maya-Silva J et al: Development and validation of an objective scoring tool to evaluate surgical dissection: Dissection Assessment for Robotic Technique (DART). Urol Pract 2021; 8: 596.

With increasing use of robotic surgery, there is a need to ensure that trainees are safely and efficiently learning the associated technical skills. Indeed, there is much room for improvement on this front: a recent report revealed that 61% of graduating urology residents do not feel confident performing a robotic radical prostatectomy.¹ Competency assessment has been explored as a means to objectively and reproducibly evaluate surgical ability in the operating room. Such evaluation has implications that extend beyond the training of new surgeons, including a potential role as a credentialing mechanism.

Numerous tools have been developed to evaluate surgical competency, with urology at the forefront of development and implementation of these tools.² Early robotics assessment tools focused on the evaluation of global skills, including surgical autonomy and the ability to operate the robotics controls.³ Newer tools have increased granularity to focus on specific procedures and even steps within a procedure.⁴,⁵ While these tools provide comprehensive feedback, they are narrow in scope. On the other hand, broadly applicable tools such as Global Evaluative Assessment of Robotic Skills, or GEARS, are not designed to provide comprehensive feedback on detailed surgical technique. In this study, we blended common features from previously established evaluation tools in order to create a detailed assessment of a fundamental surgical skill that is common to many procedures: tissue dissection. Dissection Assessment for Robotic Technique (DART) is designed to be both comprehensive and widely applicable.⁶

Figure. DART: 3-point scale.

We began with the Delphi method to validate both structure and content of DART (see figure). After thorough vetting by a multispecialty panel of 14 expert robotic surgeons, a single element of the tool remained contentious: agreement could not be reached on use of a 3-point vs a 5-point scale. Those in favor of a 3-point scale argued that the tool would be more reproducible and standardized in practical use, while those in favor of the more traditional 5-point Likert scale believed that the increased options provided better opportunity to differentiate levels of skill. As consensus could not be reached, we elected to continue our evaluation with both scales.

Next, a group of 10 raters used DART to evaluate a total of 46 surgical videos split evenly between the pelvic lymph node and seminal vesicle dissection steps of robot-assisted radical prostatectomy. These videos were scored using 3-point and 5-point DART scales by both surgeon and nonsurgeon raters over the course of 3 rounds. We showed that the 3-point scale has greater interrater reliability as compared with the 5-point scale and that these scales differentiate expert and novice surgeons equally well. Due to the improved reliability and our analysis suggesting indistinguishable ability to differentiate levels of experience, we recommend use of the 3-point scale for future study.
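For readers who want a concrete picture of how rater agreement on an ordinal scale can be quantified, the following is a minimal sketch in Python. It assumes Fleiss' kappa as the agreement statistic; this brief does not state which statistic the study used, so that choice is an assumption. The function name `fleiss_kappa` and the scores in `example_scores` are hypothetical, invented purely for illustration, and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a fixed number of raters per item.

    ratings: list of items (e.g. videos), each a list with one score per rater.
    categories: every score value allowed on the scale, e.g. [1, 2, 3].
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # For each item, count how many raters chose each category.
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    expected = sum(p * p for p in proportions)

    if expected == 1.0:
        # Every rating fell in a single category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 3-point scores: 4 videos, each scored by 4 raters (not study data).
example_scores = [
    [3, 3, 3, 2],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [3, 2, 3, 3],
]
print(fleiss_kappa(example_scores, categories=[1, 2, 3]))
```

Under this assumption, the same function could be run on 3-point and 5-point scores of the same videos (passing `categories=[1, 2, 3, 4, 5]` for the latter) to compare agreement between scales; values closer to 1 indicate stronger agreement beyond chance.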

In creating this tool, our goal was to methodically demonstrate a robust and transparent validation process. We detailed interrater variability data over 3 rounds of video scoring with DART, which revealed a slight learning curve. After a 10-video “training round,” rater agreement improved and then plateaued. Additionally, we showed that no prior surgical training is required to use DART. Indeed, our nonsurgically trained raters showed better agreement as compared with their surgically trained counterparts.

DART, like other evaluation tools, is resource-intensive in that it requires time-consuming manual scoring of surgical video. As a means to circumvent the resource hurdles of technical skills assessment, crowdsourced evaluation has been trialed and shown to be comparable to scores provided by expert raters.⁷ However, widespread application of assessment tools in an educational or clinical setting may require automation. Machine learning applications in image processing have already shown promise toward automating skills assessment.⁸ Here, we rigorously evaluated DART to ensure a robust tool prior to engaging in computer vision experiments aimed toward automation.

DART is an objective and reproducible tool to evaluate tissue dissection, a foundational skillset applied across surgical procedures. We showed that this tool can be employed in a variety of surgical contexts and can effectively differentiate surgeon experience. Future work will seek to automate this tool.

1. Okhunov Z, Safiullah S, Patel R et al: Evaluation of urology residency training and perceived resident abilities in the United States. J Surg Educ 2019; 76: 936.
2. Vaidya A, Aydin A, Ridgley J et al: Current status of technical skills assessment tools in surgery: a systematic review. J Surg Res 2020; 246: 342.
3. Goh AC, Goldfarb DW, Sander JC et al: Global evaluative assessment of robotic skills: validation of a clinical assessment tool to measure robotic surgical skills. J Urol 2012; 187: 247.
4. Hussein AA, Ghani KR, Peabody J et al: Development and validation of an objective scoring tool for robot-assisted radical prostatectomy: prostatectomy assessment and competency evaluation. J Urol 2017; 197: 1237.
5. Raza SJ, Field E, Jay C et al: Surgical competency for urethrovesical anastomosis during robot-assisted radical prostatectomy: development and validation of the robotic anastomosis competency evaluation. Urology 2015; 85: 27.
6. Vanstrum EB, Ma R, Maya-Silva J et al: Development and validation of an objective scoring tool to evaluate surgical dissection: Dissection Assessment for Robotic Technique (DART). Urol Pract 2021; 8: 596.
7. Holst D, Kowalewski TM, White LW et al: Crowd-sourced assessment of technical skills: differentiating animate surgical skill through the wisdom of crowds. J Endourol 2015; 29: 1183.
8. Baghdadi A, Hussein AA, Ahmed Y et al: A computer vision technique for automated assessment of surgical performance using surgeons’ console-feed videos. Int J Comput Assist Radiol Surg 2019; 14: 697.