ARTIFICIAL INTELLIGENCE

Fighting Fire With Fire: Large Language Models May Allow Pediatric Urologists to Reclaim Time

By: Pavithra Rao, BS, School of Medicine, Oregon Health & Science University, Portland; Lauren M. McGee, MD, Oregon Health & Science University, Portland; Casey Seideman, MD, Oregon Health & Science University, Portland | Posted on: 19 Jan 2024

Artificial intelligence (AI) platforms are taking health care by storm. In the past year, ChatGPT has made headlines for its performance on the USMLE (United States Medical Licensing Examination) Step 1, Step 2 CK (clinical knowledge), and Step 3, scoring at or near the passing threshold on all 3 exams without any additional training.1 In a cross-sectional study, evaluators preferred ChatGPT's responses over physicians' responses to patient questions drawn from a public online forum 79% of the time when judged on quality and empathy.2 ChatGPT has also provided mostly appropriate responses to simple cardiovascular disease prevention questions when assessed by preventive cardiology clinicians.3 As physicians become increasingly burdened by electronic documentation demands, products such as ChatGPT have exciting potential as time-saving tools if they can consistently demonstrate accuracy and reliability in their responses.

To further our understanding of the applications and limits of ChatGPT specifically, our research team investigated 2 broad uses of the software that hold potential not just for pediatric urologists, but for all patient-facing health care providers. We answer the following questions below.

Can ChatGPT Produce Pediatric Urology Procedure–Specific Discharge Instructions From Scratch?

Our first study evaluates whether ChatGPT can create accurate and customizable postoperative discharge instructions. Institution-specific postoperative discharge instructions for orchiopexy, circumcision, and hernia repair served as the control. ChatGPT was instructed to provide discharge instructions for each of these surgeries using a standardized prompt that differed only in the surgery name. ChatGPT performed worse across all domains compared with our institution's instructions: it used informal language, was nonspecific, and at times provided contradictory or inaccurate information. In the pediatric postoperative instructions, ChatGPT surprisingly advised patients to avoid spicy food on the grounds that it would irritate the incision, advice that was not applicable (Figure). All ChatGPT-generated instructions were above the goal sixth-grade reading level. In short, ChatGPT was effective at creating the framework of discharge instructions (correct structure), but it was not suitable for creating them without significant oversight. Its usefulness in this domain lies in its ability to turn a blank page into the scaffold of a document, which may ease but not fully replace the mental burden of creating educational materials from scratch.

Figure. Excerpt from ChatGPT-generated pediatric urology postoperative instructions.
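
The study prompted ChatGPT through its standard chat interface, but the standardized-prompt design is straightforward to reproduce programmatically. The sketch below is illustrative only: it assumes the OpenAI Python client and the textstat package, and the prompt wording and model name are hypothetical stand-ins rather than the exact prompt used in the study.

```python
# Sketch: generate procedure-specific discharge instructions with a standardized
# prompt that differs only in the surgery name, then estimate reading level.
# Assumptions: openai>=1.0 and textstat are installed; the prompt text and model
# name below are illustrative, not the exact ones used in the study.
from openai import OpenAI
import textstat

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT_TEMPLATE = (
    "Write postoperative discharge instructions for a pediatric patient "
    "who underwent {surgery}. Target a sixth-grade reading level."
)

for surgery in ["orchiopexy", "circumcision", "inguinal hernia repair"]:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(surgery=surgery)}],
    )
    instructions = response.choices[0].message.content
    # Flesch-Kincaid grade level as one common readability estimate (goal: ~6)
    grade = textstat.flesch_kincaid_grade(instructions)
    print(f"{surgery}: estimated grade level {grade:.1f}")
    print(instructions)
```

Templating the prompt so that only the surgery name changes is what keeps the comparison across procedures consistent, and the readability check mirrors the sixth-grade goal described above.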

Is ChatGPT Able to Accurately Translate Medical Educational Materials From English to Other Languages? Does It Outperform Google Translate?

Our second study compares ChatGPT vs Google Translate (GT) for translating patient instructions, given that an estimated 21 million Americans demonstrate a non-English language preference when visiting a health care provider.4 Postoperative discharge instructions for pediatric circumcision and a patient education sheet on undescended testicle were used as source documents. The instructions were copied and pasted into ChatGPT and GT and translated into Spanish, Vietnamese, and Russian. Members of our institution's Language Services Department independently reviewed the translations and assessed them for errors in meaning, expression, and technical accuracy, with each error designated minor, major, or critical. ChatGPT's performance was superior to GT for the Spanish translations. However, GT was more effective for the Vietnamese translations, and both ChatGPT and GT produced low-quality Russian translations. As seen in other studies, large language models appear to excel with more commonly spoken and written languages, likely because they can draw on a larger repository of source material in their training data. In short, ChatGPT may be useful for quickly translating educational materials for non-English language preference patients who would otherwise not receive language-concordant information to take home, but these translations carry the risk of errors in meaning and expression, which at times can lead to critical misunderstandings and poorer care.
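
In the study, the documents were pasted into each tool by hand and graded by professional reviewers, but the head-to-head collection step could be scripted. The sketch below is a rough illustration under stated assumptions: it uses the OpenAI Python client and the Google Cloud Translation v2 client, the file name and prompt wording are hypothetical, and the minor/major/critical error grading remains a human task.

```python
# Sketch: send the same English source document to ChatGPT and Google Translate
# and collect the outputs for human review. Assumptions: openai>=1.0 and
# google-cloud-translate are installed and credentialed; in the study the
# documents were pasted into the web interfaces, and error grading was done by
# bilingual reviewers, not by code.
from openai import OpenAI
from google.cloud import translate_v2 as translate

openai_client = OpenAI()        # assumes OPENAI_API_KEY is configured
gt_client = translate.Client()  # assumes Google Cloud credentials are configured

TARGETS = {"Spanish": "es", "Vietnamese": "vi", "Russian": "ru"}

def chatgpt_translate(text: str, language: str) -> str:
    """Ask the chat model for a translation of the full document."""
    response = openai_client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Translate these patient instructions into {language}:\n\n{text}",
        }],
    )
    return response.choices[0].message.content

def google_translate(text: str, code: str) -> str:
    """Machine translation via the Cloud Translation v2 API."""
    result = gt_client.translate(text, source_language="en", target_language=code)
    return result["translatedText"]

# Hypothetical file name; in the study the source was an institutional handout.
source_text = open("circumcision_discharge_instructions.txt", encoding="utf-8").read()

for language, code in TARGETS.items():
    outputs = {
        "ChatGPT": chatgpt_translate(source_text, language),
        "Google Translate": google_translate(source_text, code),
    }
    # Translations are then handed to bilingual reviewers, who tally errors in
    # meaning, expression, and technical accuracy as minor, major, or critical.
    for tool, translation in outputs.items():
        print(f"--- {language} / {tool} ---\n{translation}\n")
```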

Looking to the Future

Large language models such as ChatGPT are in their infancy, yet they are poised to provide invaluable advances across many professions, medicine included. We have an opportunity as a field to become early adopters of this technology, and we believe that, at a time when health care burnout has never been higher, it is our responsibility to explore assets that can combat electronic medical record–related fatigue. With Google Bard and ChatGPT-4 offering newer features, most notably the ability to interface with images, there are countless opportunities for future research. Notably, Google recently announced new generative AI search capabilities that would allow doctors to pull information from clinical notes, scanned documents, and electronic health records into one place.5 As these services continue to advance, they show increasing promise for improving clinical efficiency and helping health care providers reclaim some of their valuable lost time.

  1. Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47(1):33.
  2. Shen Y, Heacock L, Elias J, et al. ChatGPT and other large language models are double-edged swords. Radiology. 2023;307(2):e230163.
  3. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
  4. Flores G. The impact of medical interpreter services on the quality of health care: a systematic review. Med Care Res Rev. 2005;62(3):255-299.
  5. Capoot A. Google announces new generative AI search capabilities for doctors. CNBC. October 9, 2023. https://www.cnbc.com/2023/10/09/google-announces-new-generative-ai-search-capabilities-for-doctors-.html
