
ARTIFICIAL INTELLIGENCE

What All Urologists Should Know: What Generative Artificial Intelligence Is and What It Isn’t

By: Andrew T. Gabrielson, MD, The James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland; Brendan Wallace, MD, The James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland; Anobel Y. Odisho, MD, MPH, University of California, San Francisco School of Medicine | Posted on: 19 Jan 2024

Artificial intelligence (AI) describes the simulation of human-like abilities such as writing, problem-solving, and parsing of language by a computer system or model.1 While AI technologies have been improving for decades, recent increases in computational power and the creation of transformer-based models dramatically accelerated their development.2 In late 2022, the public release of the user-friendly ChatGPT (generative pretrained transformer) triggered a surge of interest in generative AI (GAI).3,4 GAI spans multiple modalities, including text, images, video, and audio. Many investigators and vendors are incorporating GAI tools into routine care delivery.5 It is important for urologists to understand the strengths and limitations of GAI to ensure its effective and ethical use in clinical practice.

What Is GPT, and What Are Large Language Models?

Generative (GPT)

GAI is a subset of AI that focuses on the creation of novel content (text, images, audio) based on patterns and relationships present in a model’s training data. Given an input prompt, the model uses probability to generate output, repeatedly predicting the most likely next element.
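To make that concrete, the following is a minimal sketch, in Python, of probability-weighted next-word sampling. The vocabulary and probabilities are invented toy values for illustration, not output from any real model.

import random

# Toy next-word probabilities after the prompt "The prostate is a".
# These numbers are invented for illustration, not from a real model.
next_word_probs = {
    "gland": 0.62,
    "organ": 0.25,
    "muscle": 0.08,
    "banana": 0.05,
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Draw one word at random, weighted by the model's probabilities."""
    words = list(probs)
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

# Each call may return a different word; high-probability words dominate.
print(sample_next_word(next_word_probs))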

Pretrained (GPT)

GAI models are trained on massive datasets and generate outputs based on a static dataset rather than recursively learning from an ever-growing dataset.1 Many of today’s publicly available text-based models (eg, ChatGPT) were first trained on general data scraped from the internet. To increase their utility, the models are further refined with specialized datasets and with reinforcement learning that incorporates human feedback on model outputs.1 Over many iterations of human users interacting with the model and providing feedback, the model learns to generate higher-quality output.

Transformer (GPT)

Transformer-based models are neural networks that learn context and track relationships in sequential data, such as the words in a sentence. What makes transformers unique is self-attention, which allows the model to detect subtle ways in which even distant data elements interact. They use differential weighting to assign significance to various inputs and relationships and use probability to predict the best next output (Figure).


Figure. Given the prompt, “Who is Taylor Swift dating?,” the model may provide many potential outputs. However, with self-attention and reinforcement learning with human input, the model will prioritize the answer with the highest probability of reward. The probability of reward with outputs like Joe Jonas, Taylor Lautner, or Travis Kelce may be significantly higher than outputs such as Andrew Gabrielson, Brendan Wallace, or Anobel Odisho if the large language model dataset is trained on data obtained from the internet and refined based on user feedback.
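For readers who want to see the mechanism, below is a minimal sketch of scaled dot-product self-attention in Python with NumPy. The token vectors here are random stand-ins; a real transformer derives the queries, keys, and values from learned projection weights rather than using the inputs directly.

import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence of token vectors.

    X has shape (sequence_length, d): one row per token.
    """
    d = X.shape[1]
    # In a real transformer, Q, K, V come from learned projections of X;
    # here we use X directly to keep the sketch short.
    Q, K, V = X, X, X
    # Pairwise similarity of every token with every other token.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax turns similarities into attention weights (each row sums to 1).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of all token vectors, so even
    # distant tokens can influence one another.
    return weights @ V

tokens = np.random.rand(5, 8)  # 5 tokens, 8-dimensional embeddings
print(self_attention(tokens).shape)  # (5, 8)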

Large Language Models and Capabilities in Care Delivery

Large language models (LLMs) are a type of GPT specifically designed and trained for natural language processing tasks. These models are considered “large” because they are pretrained on vast amounts of text data and contain enormous numbers of parameters (in some cases hundreds of billions), at training costs in the tens of millions of dollars.

Nearly every major tech company has developed its own foundation model, each with its own performance characteristics and level of public accessibility. LLMs such as ChatGPT or Bard are widely available conversational chatbots that urologists can use to simplify administrative tasks, assist with low-stakes compositions, and streamline certain aspects of clinical care.4 GAI can be used to draft emails, summarize information from lengthy reports or research articles, assist with literature review, offer ideas to help spark creativity, generate text to advocate for coverage of clinical services, design complex call schedules, and even assist with programming (Table; see the sketch after the Table).6 LLMs may also be useful for distilling health information into digestible formats for both patients and providers.7

Table. What Can Text Generative Artificial Intelligence Do?

Text generation: Create coherent and contextually relevant text based on a given prompt
Language translation: Translate text from one language to another, transform the style of a given text (eg, from formal to informal), or simulate the language patterns and personalities of different characters
Text summarization: Condense long pieces of text into shorter, concise versions
Text completion: Provide suggestions to complete a given text or sentence
Question answering: Answer questions posed in natural language based on the information provided
Sentiment analysis: Determine the sentiment (positive, negative, neutral) expressed in a piece of text
Named entity recognition: Identify and classify entities (such as persons, organizations, and locations) in a text
Information extraction: Extract relevant information from unstructured text
Conversational agents: Engage in natural language conversations, providing responses and maintaining context
Data generation: Generate synthetic data for various applications
Text-based problem-solving: Assist in solving problems by providing relevant information and insights
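As one illustration of the summarization task in the Table, the following is a minimal sketch using OpenAI’s Python client (version 1 or later). The model name, prompt, and placeholder abstract are assumptions chosen for this example, and no protected health information should ever be submitted to a public cloud service this way.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# A published abstract or other nonsensitive text. Never submit
# patient data to a public cloud service.
abstract = "Placeholder text of a published research abstract goes here."

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name; use whatever your account offers
    messages=[
        {"role": "system", "content": "You are a careful medical editor."},
        {
            "role": "user",
            "content": f"Summarize this abstract in 3 plain-language bullet points for a patient:\n\n{abstract}",
        },
    ],
)
print(response.choices[0].message.content)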

As more LLMs are designed specifically for clinical care, the authors anticipate that GAI may be incorporated into higher-stakes tasks such as helping patients interact with their health data (ie, “what are my cancer treatment options?”), understand jargon-laden imaging or pathology reports, and triage clinical symptoms and the need for medical evaluation.7 These models may also prove beneficial in health care operations such as supply chain and capacity management, clinical decision support, utilization management, billing/reimbursement, and quality benchmarking.5 LLMs that streamline documentation of clinical encounters have already been deployed in clinical settings.

What GAI Is Not

At its core, GAI is rooted in probability. GAI does not “understand” what it is writing; it lacks inherent morality and truth and does not hold personal beliefs. The model simply strives to produce the most likely response given its training data and the answers it has been rewarded for in prior interactions. All GAI models are subject to the limitations of their training data; because many of today’s publicly available LLMs are trained primarily on data derived from the internet, these models are susceptible to the same biases, stereotypes, and inaccuracies found on the internet. Furthermore, because output is generated one element at a time, with each element conditioned on what came before, early errors can compound, leaving GAI prone to error and hallucination (the generation of plausible-sounding but false or nonsensical content). GAI does not genuinely problem solve; rather, it generates responses patterned on similar problems encountered in its training data.

GAI tools are masters of correlation, not causation. In many instances this may be adequate, but the two should not be confused.

For these reasons, urologists should use discretion when incorporating GAI into their clinical or administrative workflows. GAI responses should not be used without careful review, particularly for high-stakes tasks such as clinical decision-making. Since the public release of GAI tools, users have quickly found that the quality of the output is highly dependent on the quality of the prompt; for example, asking a model to “summarize this report in 3 plain-language bullet points for a patient” will generally produce more useful output than simply asking it to “summarize this.”

Currently, publicly available cloud-based GAI models are generally not HIPAA (Health Insurance Portability and Accountability Act of 1996) compliant, and protected health information should not be entered into them. Users should also exercise caution when submitting sensitive business data or intellectual property without a clear understanding of the vendor’s data use and retention policies, as submissions may become part of future training data.8 Organizations can implement these tools in a safe and secure fashion, such as by running an open-source tool locally or through an organizationally approved implementation of a cloud-based tool. For example, the University of California, San Francisco has deployed a HIPAA-compliant instance of OpenAI’s GPT models, allowing safe use of LLMs with patient data.
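As a sketch of the locally run option, the snippet below loads a small open-source model with the Hugging Face transformers library so that inference happens entirely on local hardware. The model name and prompt are illustrative choices, not recommendations, and institutional policy review is still advisable before using real patient data.

from transformers import pipeline

# Load an open-source model onto local hardware; "gpt2" is one small
# example of an openly available model, not an endorsement.
generator = pipeline("text-generation", model="gpt2")

# Because inference happens locally, the input text never leaves
# this machine.
prompt = "A plain-language explanation of a Gleason score is"
print(generator(prompt, max_new_tokens=60)[0]["generated_text"])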

These tools are not a replacement for manuscript writing, but they can be useful for tasks such as brainstorming, summarization, or copyediting. Journals such as Nature, Science, Journal of the American Medical Association, Proceedings of the National Academy of Sciences, European Urology, and The Journal of Urology® have published guideline statements regarding their use.9-11 Additionally, the cross-disciplinary CANGARU guidelines (ChatGPT, Generative Artificial Intelligence, and Natural Large Language Models for Accountable Reporting and Use) seek to codify ethical use, disclosure, and proper reporting of such tools in academia.12

Conclusion

The integration of GAI into various aspects of health care presents both opportunities and challenges. In their present form, LLMs offer the potential to reduce administrative burden by assisting with low-stakes compositions. However, it is crucial to recognize that GAI does not possess inherent understanding; it operates only on patterns in its training data and is therefore susceptible to biases and errors in its responses.

For urologists, the use of GAI should be approached with discretion, especially in high-stakes clinical scenarios. Quality outputs depend on the quality of inputs (“garbage in, garbage out”), and users should exercise caution when handling sensitive or protected health information.

As the field of GAI continues to evolve, ethical considerations, transparency, and proper reporting will play a crucial role in its responsible integration into clinical and academic practices.

Funding: None.

Disclosures: A.Y.O. has received in-kind research support from Microsoft Research. The other authors have no financial relationships to disclose.

Attestation statement: No data were used in the preparation of the manuscript.

  1. Riedl M. A very gentle introduction to large language models without the hype. Medium. 2023. https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e
  2. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. arXiv preprint. 2017. doi:10.48550/arXiv.1706.03762
  3. Yang S. The abilities and limitations of ChatGPT. Anaconda. 2022. Accessed December 10, 2022. https://www.anaconda.com/blog/the-abilities-and-limitations-of-chatgpt
  4. Gabrielson AT, Odisho AY, Canes D. Harnessing generative artificial intelligence to improve efficiency among urologists: welcome ChatGPT. J Urol. 2023;209(5):827-829.
  5. Sahni NR, Carrus B. Artificial intelligence in U.S. health care delivery. N Engl J Med. 2023;389(4):348-358.
  6. Roose K. The brilliance and weirdness of ChatGPT. New York Times. December 5, 2022:B1.
  7. Jeblick K, Schachtner B, Dexl J, et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol. 2023. doi:10.1007/s00330-023-10213-1
  8. Powles J, Hodson H. Google DeepMind and healthcare in an age of algorithms. Health Technol (Berl). 2017;7(4):351-367.
  9. Thorp HH. ChatGPT is fun, but not an author. Science. 2023;379(6630):313.
  10. Authorship and AI tools. Committee on Publication Ethics; 2023.
  11. Flanagin A, Bibbins-Domingo K, Berkwits M, Christiansen SL. Nonhuman “authors” and implications for the integrity of scientific publication and medical knowledge. JAMA. 2023;329(8):637.
  12. Cacciamani GE, Eppler MB, Ganjavi C, et al. Development of the ChatGPT, generative artificial intelligence and natural large language models for accountable reporting and use (CANGARU) guidelines. arXiv preprint. 2023. doi:10.48550/arXiv.2307.08974
