Universitätsbibliothek Heidelberg
Status: Bibliographic entry

Availability
Location: ---
Copies: ---
heiBIB
 Online resource
Authors: Schubert, Marc Cicero [author]
 Wick, Wolfgang [author]
 Venkataramani, Varun [author]
Title: Performance of large language models on a neurology board-style examination
Statement of responsibility: Marc Cicero Schubert; Wolfgang Wick, MD; Varun Venkataramani, MD, PhD
E-year: 2023
Date: December 7, 2023
Extent: 11 pages
Notes: Viewed on February 26, 2024
Source title: Contained in: JAMA network open
Source place: Chicago, Ill.: American Medical Association, 2018
Source volume/issue: 6(2023), issue 12, December, article ID e2346721, pages 1-11
Source ISSN: 2574-3805
Abstract:
Importance: Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models on neurology board examinations remains unexplored.
Objective: To assess the performance of LLMs on neurology board-style examinations.
Design, Setting, and Participants: This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation used a question bank resembling neurology board-style examination questions and was validated with a small question cohort by the European Board for Neurology. All questions were categorized into lower-order (recall, understanding) and higher-order (apply, analyze, synthesize) questions based on the Bloom taxonomy for learning and assessment. Performance by LLM ChatGPT versions 3.5 (LLM 1) and 4 (LLM 2) was assessed in relation to overall scores, question type, and topics, along with the confidence level and reproducibility of answers.
Main Outcomes and Measures: Overall percentage scores of the 2 LLMs.
Results: LLM 2 significantly outperformed LLM 1, correctly answering 1662 of 1956 questions (85.0%) vs 1306 questions (66.8%) for LLM 1. Notably, LLM 2's performance exceeded the mean human score of 73.8%, effectively achieving near-passing and passing grades on the neurology board-style examination. LLM 2 outperformed human users on behavioral, cognitive, and psychological questions and outperformed LLM 1 in 6 categories. Both LLMs performed better on lower-order than higher-order questions, with LLM 2 excelling in both. Both models consistently used confident language, even when providing incorrect answers. Reproducible answers from both LLMs were associated with a higher percentage of correct answers than inconsistent answers.
Conclusions and Relevance: Despite the absence of neurology-specific training, LLM 2 demonstrated commendable performance, whereas LLM 1 performed slightly below the human average. While higher-order cognitive tasks were more challenging for both models, LLM 2's results were equivalent to passing grades on specialized neurology examinations. These findings suggest that, with further refinement, LLMs could have significant applications in clinical neurology and health care.
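As a quick sanity check of the percentages reported in the Results section, the following short Python sketch recomputes the scores from the stated counts. The question total (1956) and the correct-answer counts (1662 and 1306) come from the abstract; the function name and the rounding to one decimal place are illustrative assumptions, not part of the record.

    # Recompute the percentage scores reported in the abstract.
    # Counts (1956 total; 1662 and 1306 correct) are taken from the abstract;
    # rounding to one decimal place is an assumption.
    TOTAL_QUESTIONS = 1956

    def percent_correct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
        """Return the share of correctly answered questions as a percentage."""
        return round(100 * correct / total, 1)

    print(percent_correct(1662))  # LLM 2 (ChatGPT-4): 85.0
    print(percent_correct(1306))  # LLM 1 (ChatGPT-3.5): 66.8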
DOI: doi:10.1001/jamanetworkopen.2023.46721
URL: Please note: this is a bibliographic entry. Full-text access for members of the university is available only if there is a subscription to the journal or edited volume in question, or if the title is open access.

Full text: https://doi.org/10.1001/jamanetworkopen.2023.46721
Carrier: Online resource
Language: eng
K10plus-PPN: 1881551342
Links: → Journal

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/69185472