
UPJ INSIGHT
AUA Committee Members Rate Artificial Intelligence–Generated Responses for Female Stress Urinary Incontinence

By: Annie Chen, MD, Houston Methodist Hospital, Texas; Jerril Jacob, MHA, University of Texas Health Houston; Kuemin Hwang, MD, Houston Methodist Hospital, Texas; Kathleen Kobashi, MD, Houston Methodist Hospital, Texas; Ricardo R. Gonzalez, MD, Houston Methodist Hospital, Texas | Posted on: 17 Jul 2024

Chen A, Jacob J, Hwang K, Kobashi K, Gonzalez RR. AUA Guideline Committee members determine quality of artificial intelligence–generated responses for female stress urinary incontinence. Urol Pract. 2024;11(4):693-698. doi:10.1097/UPJ.0000000000000577

Study Need and Importance

Stress urinary incontinence (SUI) affects millions of women worldwide. Given ChatGPT’s growing ubiquity, patients may turn to the platform for advice about SUI. If patients are to use ChatGPT for adjunctive medical counsel, the urologic community needs to critically evaluate its output. The objective of this study was to have experts in the field evaluate the quality of the clinical information about SUI that ChatGPT generates (Table).

What We Found

Using standardized questionnaires, AUA committee members, who are experts in the field, rated ChatGPT-produced responses on SUI as moderate to moderately high in quality, moderate in reliability, excellent in understandability, and poor in actionability. The reading level of the material was advanced; lowering it is one way to make generated responses more comprehensible.
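
The article does not state which readability tool was used to judge reading level. As a minimal sketch, assuming a standard formula such as the Flesch-Kincaid grade computed with the open-source textstat package (an illustrative choice, not the authors’ method), reading level can be estimated like this:

```python
# Minimal sketch: estimating the reading level of an AI-generated response.
# Assumes the open-source textstat package (pip install textstat); the
# Flesch-Kincaid metric is an illustrative choice, not the study's method.
import textstat

response = (
    "Stress urinary incontinence is the involuntary leakage of urine "
    "during activities that raise abdominal pressure, such as coughing, "
    "sneezing, or exercise."
)

print(f"Flesch-Kincaid grade level: {textstat.flesch_kincaid_grade(response):.1f}")
print(f"Flesch reading ease: {textstat.flesch_reading_ease(response):.1f}")
# Patient education materials are often targeted at roughly a 6th- to
# 8th-grade level; substantially higher grades read as "advanced."
```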

Limitations

Surveys were based on a single ChatGPT query, and responses to the same query vary from one time to another. Additionally, the study drew on a small sample of experts in the field, which may introduce expert bias. A sketch illustrating that run-to-run variability follows.
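
As a minimal sketch of that variability, the snippet below resubmits one prompt to the OpenAI chat API several times; the model name, prompt, and sampling settings are illustrative assumptions, not the study’s protocol:

```python
# Minimal sketch: sampling the same query repeatedly to observe response
# variability. Requires the openai package and an OPENAI_API_KEY; the
# model and prompt below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "What is stress urinary incontinence and how is it treated?"

responses = []
for _ in range(3):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,      # nonzero temperature -> varying output
    )
    responses.append(completion.choices[0].message.content)

# Identical prompts can yield different wording and emphasis, which is
# why conclusions drawn from a single query may not generalize.
for i, text in enumerate(responses, 1):
    print(f"--- Response {i} ---\n{text[:200]}\n")
```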

Interpretation for Patient Care

Although the quality of the SUI information was rated highly, the authors recommend holding ChatGPT, at a minimum, to the highest possible standard of complete accuracy and reliability. To reach patients who want to obtain health information by conversing with natural language processors, the urologic and gynecologic communities should develop this technology and integrate existing patient handouts from major societies, including the AUA; the Society of Urodynamics, Female Pelvic Medicine & Urogenital Reconstruction; the American College of Obstetricians and Gynecologists; and the International Continence Society, to disseminate high-quality information to patients.

Table. Average Scores for DISCERN and Patient Education Materials Assessment Tool Standardized Surveys by Category

Category                                           Definition     Diagnosis      Management     Surgery specific   Overall
DISCERN, average                                                                                                   3.63
  Reliability
    Average (SD)                                   3.25 (1.7)     3.00 (1.6)     3.10 (1.60)    3.29 (1.56)        3.16
    Raw score average (SD)                         65 (5.2)       60 (13.8)      62.3 (9.3)     65.8 (9.8)         63.3
  Quality of treatment descriptions, if described
    Average (SD)                                   2.38 (1.6)     2.8 (1.3)      3.17 (1.13)    3.62 (1.33)        2.99
    Raw score average (SD)                         47.6 (5.9)     56.2 (8.2)     63.5 (8.7)     73.5 (15.8)        60.2
  Overall quality, score (%)                       4 (80)         3.3 (67)       3.6 (72)       4 (80)             3.73 (74.6)
PEMAT
  % Understandability (SD)                         95.2 (6.7)     88.1 (6.7)     92.9 (7.1)     81.4 (9.5)         89.4
  % Actionability (SD)                             14.6 (10.6)    18.8 (7.2)     19.3 (4.7)     19.4 (6.2)         18.0
Accuracy, average*                                 3.83           3.33           3.17           3.67               3.5

Abbreviations: PEMAT, Patient Education Materials Assessment Tool.
*Accuracy is the average score on a 4-point Likert scale.
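
For readers unfamiliar with how PEMAT percentages are derived, the sketch below shows the standard scoring arithmetic (the share of applicable items rated “agree”); the item ratings are fabricated for illustration, not data from the study:

```python
# Minimal sketch of PEMAT-style scoring: each item is rated agree (1) or
# disagree (0), not-applicable items are excluded, and the score is the
# percentage of applicable items rated agree. Ratings here are made up.
def pemat_score(ratings):
    """ratings: list of 1 (agree), 0 (disagree), or None (not applicable)."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("No applicable items to score.")
    return 100.0 * sum(applicable) / len(applicable)

understandability = [1, 1, 1, 0, 1, None, 1, 1]  # illustrative ratings
actionability = [0, 0, 1, None, 0]               # illustrative ratings

print(f"Understandability: {pemat_score(understandability):.1f}%")
print(f"Actionability: {pemat_score(actionability):.1f}%")
```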
