UPJ INSIGHT New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

By: Linda My Huynh, MSc, MD/PhD Scholars Program, University of Nebraska Medical Center, Omaha; Benjamin T. Bonebrake, BSBA, College of Medicine, University of Nebraska Medical Center, Omaha; Kaitlyn Schultis, BA, College of Medicine, University of Nebraska Medical Center, Omaha; Alan Quach, MD, University of Nebraska Medical Center, Omaha; Christopher M. Deibert, MD, MPH, University of Nebraska Medical Center, Omaha | Posted on: 20 Jul 2023

Huynh LM, Bonebrake BT, Schultis K, Quach A, Deibert CM. New artificial intelligence ChatGPT performs poorly on the 2022 Self-assessment Study Program for urology. Urol Pract. 2023;10(4):408-416.

Study Need and Importance

Artificial intelligence holds great promise in a wide variety of industries, including clinical medicine and medical education. One application of artificial intelligence is the large language model, which has gained attention in popular media since the release of ChatGPT by OpenAI Inc. Given its promising performance on the United States Medical Licensing Examination, we evaluated its performance on the Self-assessment Study Program for urology, the most widely used examination preparation and lifelong learning tool for urologists.

What We Found

Of the 150 questions on the 2022 Self-assessment Study Program exam, ChatGPT correctly answered less than 30% (see Figure). When provided with multiple-choice answer options, it performed marginally better than on open-ended versions of the same questions (28.2% vs 26.7%). When given regenerative feedback to improve, however, ChatGPT did not answer substantially more questions correctly. Overall, responses to open-ended questions were written at a postgraduate reading level and were often vague and nonspecific.
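For readers who want to run a comparable multiple-choice versus open-ended evaluation themselves, the minimal sketch below shows one way to script it against OpenAI's chat API. This is not the authors' protocol (the study posed questions to the public ChatGPT interface and graded responses manually); the model name, placeholder question text, and the simple letter-matching check are assumptions introduced here for illustration only.

```python
# Minimal sketch, not the study's method: compare a model's answers to a question
# posed open-ended versus with lettered answer choices. Requires the `openai`
# package and an OPENAI_API_KEY in the environment. Question text, options, and
# the model name are placeholders/assumptions, not content from the paper.

from openai import OpenAI

client = OpenAI()


def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single prompt and return the model's text response."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


question_stem = "Placeholder clinical vignette ... Which management option is most appropriate?"
options = {
    "A": "Option A (placeholder)",
    "B": "Option B (placeholder)",
    "C": "Option C (placeholder, assumed correct for this example)",
    "D": "Option D (placeholder)",
}
answer_key = "C"

# Open-ended form: the stem only, with no answer choices shown.
open_ended_reply = ask(question_stem)

# Multiple-choice form: stem plus lettered options, asking for a single letter.
mc_prompt = (
    question_stem
    + "\n"
    + "\n".join(f"{letter}. {text}" for letter, text in options.items())
    + "\nAnswer with the single best option letter."
)
mc_reply = ask(mc_prompt)

# Crude automatic check of the multiple-choice reply against the key;
# the published study used manual grading of each response.
is_correct = mc_reply.strip().upper().startswith(answer_key)
print("Open-ended reply:", open_ended_reply[:200], "...")
print("Multiple-choice reply:", repr(mc_reply), "-> correct:", is_correct)
```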

Figure. Template of multiple-choice question posed to ChatGPT. The correct answer to this question is “C. hydrocodone-acetaminophen 5 mg/325 mg by mouth every six hours as needed (12 tablets).” This multiple-choice entry was registered as correct. BID indicates twice daily.

Limitations

The use of artificial intelligence in medical education requires exposure to appropriate training sets and adequate reinforcement from medical professionals early in development. With proper training on urological guidelines, it is possible that artificial intelligence could perform better on standardized assessments. However, the extent of such discipline-specific training for ChatGPT is unknown. As these models are still in the early stages of development, further research is required to understand their limitations and capabilities.

Interpretation for Patient Care

While this article did not assess the role of artificial intelligence in direct patient care, the present study revealed that this artificial intelligence platform does not yet have the training to serve a vital role in urological education. Prior to use in medicine, these models must be rigorously tested and validated to ensure accurate and reliable results.
