Will Advances in Medical AI Race Ahead While Our Wisdom to Apply Those Advances Crawls?

By: Christopher Weight, MD, MS, Cleveland Clinic, Ohio | Posted on: 03 Jul 2024

“I want you all to write down one question,” my gray-haired professor said. This was one quarter of a century ago and I was a first-year medical student at the University of Utah, fresh out of our white coat ceremony, and seated in the lecture hall with my classmates at the dawn of our medical careers. “This could be any sort of question,” he continued, “but the most probing, the most intriguing, or the question that prompts the most thought will be rewarded with a $100 bookstore voucher!”

As one of the few medical students at the time who was married, and one of the even fewer who was a parent, that $100 got my attention. I labored for a while to produce a good question. Though my exact wording escapes me, I essentially asked, “Will advances in medical technology race ahead while our wisdom to apply those advances crawls?” Much to my surprise, I won the $100 bookstore voucher. Equally surprising, this question continues to weigh on my mind to this day.

Do not get me wrong, I am not a luddite; I believe artificial intelligence (AI) can improve patient outcomes in urology and medicine. The possibilities are endless. There are 2 very promising AI approaches that I believe will change urology in the very near future, 2 reasons why AI will best doctors on certain clinical tasks, and 2 pitfalls where I am concerned that AI can go very wrong.

Large Language Models

At this point, large language models (LLMs) like OpenAI’s ChatGPT, Google’s BERT, or Meta’s Llama hardly need an introduction. These models can interpret and generate natural language at a level that is stunningly human-like. These models have the potential to improve patient outcomes in 3 straightforward ways:

  1. Answering medical questions
  2. Automating medical charting and billing
  3. Curating electronic medical record data for use in quality control or research

From chatbots that can answer patients’ questions in real time to AI scribes that can document and summarize the clinical encounter, tools powered by LLMs can help mitigate many aspects of medical practice that lead to burnout. Our team also recently used an LLM to extract structured pathology data from ∼10,000 kidney tumor surgeries, and it achieved 95% to 97% agreement with data extracted by hand by medical personnel. LLMs could be the advancement that finally realizes the potential promised by electronic medical records for so many years.

Computer Vision

Figure. The image above was created in a few minutes with the help of OpenAI’s ChatGPT 4.0 and Adobe Photoshop as a visual way to prompt reflection on my question. It depicts a man who clearly is unable to move as fast as all the creations around him. The blueprints of all “his” technological advancements are bursting out of him, symbolizing that everything visible was created by humans. Everything, that is, except humans themselves and the plants at his feet. These were created by the wisdom of nature. He is facing us, and he does not seem to know that one of his very creations is flying directly toward him and he may not have time to get out of the way. If you look closely at his expression, you might gather that he is uneasy with his creations. He suspects that, although his creations may be flashy, fast, and wonderful, something is not right here.

Although language is powerful, medicine is largely built on more expressive modalities like images and video. The Figure is just one example of how modern AI techniques allow for computers to understand, interpret, and even generate visual data. Three areas primed for immediate impact with medical AI tools include:

  1. Enhanced assessment, objectivity, and quantitation in the interpretation of medical images by radiologists and pathologists
  2. Visual augmentation of medical procedures and surgery
  3. Powerful, patient-specific simulation for training and education

Our team at the Cleveland Clinic has worked to develop algorithms to compute nephrometry scores based on CT scans of renal tumors. The data curation, labeling, and refinement necessary for this work took some time, but then, overnight, we could generate every kind of nephrometry score on thousands of images. When we wanted to compare them to human experts, it took another year to collect the scores generated by humans! Now that we have a system in place, we can generate scores on 20,000 patients (about the seating capacity of a professional sports arena) with minimal manual effort. Can you imagine how long it would take humans to score 20,000 scans?

Why AI Will Beat Doctors on Some Clinical Tasks

AI has 2 big advantages over human doctors on certain tasks:

  1. AI systems are trained once and deployed widely, whereas human doctors must each be trained individually, by the hundreds of thousands every year.
  2. Judgment, prediction, and interpretation can be objective and reproducible.

It is true that AI models routinely cost massive amounts of both time and money to train, but after that they can be duplicated and used endlessly. Even as medicine evolves with more advanced treatments and diagnostics, the investment to do the necessary fine-tuning or retraining will be insignificant compared to what is needed for human medical education today. If these models really do surpass all humans on accuracy in diagnosis or outcomes from surgery, it will be difficult to justify a human in their stead.

The second advantage arises because human judgment is so variable. The interpretation of medical images varies from radiologist to radiologist and from pathologist to pathologist, with agreement of only 40% to 70% in many studies.1 What is worse, it varies within the same radiologist or pathologist. Indeed, radiologists and pathologists often agree with themselves only 70% to 80% of the time when looking at the same image months apart! AI will surpass humans in some of these judgment tasks because we can program out bias and the interpretation can be 100% reproducible.
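For readers who want to see how agreement figures like these are typically computed, here is a minimal sketch in Python. The readings are invented for illustration and are not data from the cited study, and the function names are hypothetical. The sketch also computes Cohen's kappa, a common chance-corrected agreement statistic that is not discussed above but is often reported alongside raw percent agreement.

```python
from collections import Counter


def percent_agreement(reads_a, reads_b):
    """Fraction of cases on which two sets of readings match."""
    if len(reads_a) != len(reads_b) or not reads_a:
        raise ValueError("readings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reads_a, reads_b))
    return matches / len(reads_a)


def cohens_kappa(reads_a, reads_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(reads_a)
    observed = percent_agreement(reads_a, reads_b)
    counts_a, counts_b = Counter(reads_a), Counter(reads_b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement: probability both readers pick the same label
    # if each assigned labels independently at their own observed rates.
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical complexity grades from two readers on the same 10 scans.
reader_a = ["low", "low", "high", "high", "low",
            "high", "low", "high", "low", "low"]
reader_b = ["low", "high", "high", "high", "low",
            "low", "low", "high", "high", "low"]

agreement = percent_agreement(reader_a, reader_b)  # 7 of 10 cases match
kappa = cohens_kappa(reader_a, reader_b)           # roughly 0.4

# A deterministic algorithm rereading the same scans agrees with itself fully.
self_agreement = percent_agreement(reader_a, reader_a)

print(f"agreement={agreement:.2f}, kappa={kappa:.2f}, self={self_agreement:.2f}")
```

In this made-up example the two readers agree on 70% of scans, yet the chance-corrected figure is noticeably lower, which is one reason raw agreement percentages can flatter human readers. Note that perfect reproducibility is not the same as accuracy: a model can agree with itself every time and still be wrong.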

AI Pitfalls

While AI systems are usually only as good as the data on which they are trained (although this is debated), they can only ever be trusted to be as good as the data on which they are validated. As we have seen with nomograms, even a prediction tool that was carefully validated on prospective data 10 years ago cannot be trusted to perform as well today. Unfortunately, most of the currently available AI algorithms are proprietary and difficult to independently test or verify. Large independent collaboratives, led by physicians and drawing on multi-institutional, international patient populations, are needed to validate new models. For example, you could imagine a collection of patients with prostate cancer, with digitized H&E slides of their cancer, MRIs of their prostates, and appropriate clinical outcomes, managed by an international team of urologists, radiologists, and pathologists who could test any new claim that an algorithm is improving patient care. Personally, I would much sooner trust such independent verification than studies conducted in-house by AI companies.

Finally, as we develop these wonderful tools, we need to be thoughtful in how we apply them. Let us not implement tools that will worsen our lives or run us down.

I conclude with a quote from the introduction of John Green’s The Anthropocene Reviewed: “We are powerful enough to radically reshape Earth’s climate and biodiversity, but not powerful enough to choose how we reshape them.”2 So here we stand, powerful enough to create language, art, and music; clone voices; create movies; and reshape urologic care with AI. It is an awesome power. But let us also be wise enough to apply these tools in such a way that they make our lives better as physicians and allow us to achieve superior patient outcomes—without making ourselves obsolete.

  1. Weight CJ, Atwell TD, Fazzio RT, et al. A multidisciplinary evaluation of inter-reviewer agreement of the nephrometry score and the prediction of long-term outcomes. J Urol. 2011;186(4):1223-1228. doi:10.1016/j.juro.2011.05.052
  2. Green J. The Anthropocene Reviewed. Dutton; 2021.
