Publicly available large language models (LLMs), such as GPT-4 and Gemini-1.0-Pro, are capable of expert-level clinical reasoning, but they are also susceptible to the same biases that complicate human cognition. Several recent studies illustrate these points.
In one study, six complex clinical vignettes were presented to each of 50 physicians. The physicians were randomized to use either standard diagnostic support tools alone (e.g., online references) or standard diagnostic support tools plus GPT-4 (JAMA Netw Open 2024; 7:e2440969). Compared with standard tools alone, giving clinicians access to GPT-4 did not improve diagnostic performance. However, GPT-4 alone outperformed each of the randomized human groups in diagnostic reasoning scores. Th…