Large Language Artificial Intelligence Models and Clinical Reasoning: The Frontier in 2024

Popular AI models like GPT demonstrate expert-like reasoning performance but also substantial human-like cognitive biases.

Publicly available large language models (LLMs), such as GPT-4 and Gemini-1.0-Pro, are capable of expert-level clinical reasoning, but they are also susceptible to the same biases that complicate human cognition. Several recent studies illustrate these points.

In one study, each of 50 physicians was presented with six complex clinical vignettes. The physicians were randomized to use either standard diagnostic support tools alone (e.g., online references) or standard tools plus GPT-4 (JAMA Netw Open 2024; 7:e2440969). Giving clinicians access to GPT-4 did not improve their diagnostic performance compared with standard tools alone. However, GPT-4 alone outperformed each of the randomized human groups in diagnostic reasoning scores. Th…

Author

Raja-Elie E. Abdulnour, MD

Disclosures

Nothing to disclose
