
MEDICAL STUDENT COLUMN
Appropriateness of ChatGPT Responses Adjusted for Patient Literacy: A Follow-up Study

By: Yash B. Shah, BS, Thomas Jefferson University, Philadelphia, Pennsylvania; Sohan Shah, BS, Thomas Jefferson University, Philadelphia, Pennsylvania; Anushka Ghosh, BS, Thomas Jefferson University, Philadelphia, Pennsylvania; Costas D. Lallas, MD, Thomas Jefferson University, Philadelphia, Pennsylvania; Mihir S. Shah, MD, Thomas Jefferson University, Philadelphia, Pennsylvania | Posted on: 17 Jul 2024

Given the surge in popularity of generative artificial intelligence (AI), patients are increasingly using chatbots, including ChatGPT, for medical advice. Numerous studies have evaluated the accuracy of ChatGPT-written medical information; however, few have assessed its readability.1-3 Because the average American reads at a sixth-grade level, it is imperative that online education be not only accurate but also accessible and actionable.4,5

We previously demonstrated that AI technology can improve readability, as measured by validated quantitative scoring, when provided with the modifier “Explain it to me like I am in sixth grade.”4 Here, we present a brief follow-up study that qualitatively analyzes the changes AI makes when presented with this modifier. We hypothesized that although the prompt would yield simpler language, ChatGPT might interpret it literally and return language inappropriate for adult urology patients.
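For context, one widely used validated readability formula is the Flesch-Kincaid Grade Level, which estimates the US school grade needed to understand a text from average sentence length and syllables per word. The sketch below is a minimal illustration of how such a score can be computed; the vowel-group syllable heuristic and the example sentences are our own assumptions for demonstration, not the scoring pipeline used in our prior study.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (illustrative heuristic only)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a typical silent trailing 'e'
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Hypothetical example: a plain-language explanation vs a technical one.
plain = "Low testosterone can make you feel tired. A doctor can check it with a blood test."
technical = "Hypogonadism is characterized by diminished endogenous testosterone biosynthesis, warranting serologic evaluation."
print(f"Plain: grade {flesch_kincaid_grade(plain):.1f}")
print(f"Technical: grade {flesch_kincaid_grade(technical):.1f}")
```

A lower grade score indicates simpler language; production readability tools use more careful syllable counting than this heuristic.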

Methods

In January 2024, ChatGPT 3.5 was given 44 prompts on common urology topics, including penile augmentation, infertility, premature ejaculation, low testosterone, and sperm retrieval. The reading level modifier “Explain it to me like I am in sixth grade” was appended as previously described.4 Three independent reviewers then performed a qualitative textual analysis of the responses to identify sentences geared specifically toward a child in sixth grade, as opposed to an adult with the reading comprehension of a sixth grader.
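Readers who wish to reproduce a setup like this could script the queries rather than use the ChatGPT web interface. The following is a minimal sketch assuming the official OpenAI Python SDK; the example prompts and the "gpt-3.5-turbo" model name are our assumptions, standing in for the study's 44 prompts and the ChatGPT 3.5 interface.

```python
from openai import OpenAI  # assumes the official openai Python SDK is installed

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

MODIFIER = "Explain it to me like I am in sixth grade."

# Illustrative prompts drawn from the study's topic areas (the full set had 44).
prompts = [
    "What is low testosterone?",
    "What causes premature ejaculation?",
    "How does sperm retrieval work?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for ChatGPT 3.5; an assumption
        messages=[{"role": "user", "content": f"{prompt} {MODIFIER}"}],
    )
    print(response.choices[0].message.content)
```

Collecting the outputs this way would let reviewers analyze the saved text directly, as was done manually in this study.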

Our Findings

While the medical information presented by ChatGPT was written in plain language, the responses often included phrases that are inappropriate for an adult patient. In fact, several responses included phrases that could distress patients experiencing challenges with sexual health or fertility. This raises concerns about the use of current AI chatbots in patient education: although quality and quantitative readability have been demonstrated, nuanced factors such as empathy and humanism remain unclear.

Ninety-one percent of the responses about low testosterone recommended that the patient seek parental advice. While this advice is appropriate for a true sixth grader, it is likely inappropriate when counseling adults with a lower reading comprehension level. Further, 20% of the premature ejaculation responses compared the condition to a video game, and another 20% described climax as “something special happening.” Additionally, 40% of infertility responses described intercourse as a “baby-making game,” a term that would likely be distressing. Coupled with the high mental health burden among individuals facing premature ejaculation and infertility, such terminology lacks empathy and poses a significant risk of further lowering a patient’s self-esteem.

Eighty-eight percent of the ChatGPT responses discussing sperm retrieval began by promising the user a simple answer; for example, they often started with the statement “Don’t worry; I’ll explain it in simple terms.” Phrases such as this might have an infantilizing effect on the patient by implying that they are unable to understand critical aspects of their own health.

Overall, although we previously demonstrated that AI offers statistically significant readability benefits when evaluated via validated quantitative formulae, the actual content appears to lack empathy, appropriate tone, and an understanding of figurative language. Ultimately, while AI has the powerful ability to rewrite content for patients, the optimal prompts for ensuring high-quality, usable content remain unclear. This study also highlights a major limitation of quantitative metrics: although these formulae remain the current standard for evaluating readability, our findings indicate the need to develop new research tools for studying AI in patient education.

Key Takeaways

OpenAI’s ChatGPT is a powerful tool with the potential to revolutionize patient education through personalization, greater accessibility, and faster content development. Nonetheless, further research is necessary to develop the optimal tool and patient-specific prompt before ChatGPT can be regularly recommended to patients seeking to learn more about their health.

  1. Cocci A, Pezzoli M, Lo Re M, et al. Quality of information and appropriateness of ChatGPT outputs for urology patients. Prostate Cancer Prostatic Dis. 2023;27(1):103-108. doi:10.1038/s41391-023-00705-y
  2. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6:1169595. doi:10.3389/frai.2023.1169595
  3. Musheyev D, Pan A, Loeb S, Kabarriti AE. How well do artificial intelligence chatbots respond to the top search queries about urological malignancies? Eur Urol. 2024;85(1):13-16. doi:10.1016/j.eururo.2023.07.004
  4. Shah YB, Ghosh A, Hochberg AR, et al. Comparison of ChatGPT and traditional patient education materials for men’s health. Urol Pract. 2024;11(1):87-94. doi:10.1097/UPJ.0000000000000490
  5. Shah YB, Glatter R, Madad S. In layman’s terms: the power and problem of science communication. Disaster Med Public Health Prep. 2022;1-3. doi:10.1017/dmp.2022.131
