Universitätsbibliothek Heidelberg
Status: Bibliography entry

Availability
Location: ---
Copies: ---
heiBIB
Online resource
Authors: Gnatzy, Richard [Author]
 Lacher, Martin [Author]
 Berger, Michael [Author]
 Boettcher, Michael [Author]
 Deffaa, Oliver J. [Author]
 Kübler, Joachim [Author]
 Madadi-Sanjani, Omid [Author]
 Martynov, Illya [Author]
 Mayer, Steffi [Author]
 Pakarinen, Mikko P. [Author]
 Wagner, Richard [Author]
 Wester, Tomas [Author]
 Zani, Augusto [Author]
 Aubert, Ophelia [Author]
Title: Solving complex pediatric surgical case studies
Subtitle: a comparative analysis of Copilot, ChatGPT-4, and experienced pediatric surgeons' performance
Statement of responsibility: Richard Gnatzy, Martin Lacher, Michael Berger, Michael Boettcher, Oliver J. Deffaa, Joachim Kübler, Omid Madadi-Sanjani, Illya Martynov, Steffi Mayer, Mikko P. Pakarinen, Richard Wagner, Tomas Wester, Augusto Zani, Ophelia Aubert
Year: 2025
Extent: 8 pages
Illustrations: illustrations, diagrams
Notes: Article published online: April 2, 2025; viewed on June 10, 2025
Source title: Contained in: European journal of pediatric surgery
Source place: Stuttgart: Thieme, 1991
Source year: 2025
Source volume/issue: (2025)
Source ISSN: 1439-359X
Abstract: The emergence of large language models (LLMs) has led to notable advancements across multiple sectors, including medicine. Yet, their effect in pediatric surgery remains largely unexplored. This study aims to assess the ability of the artificial intelligence (AI) models ChatGPT-4 and Microsoft Copilot to propose diagnostic procedures, primary and differential diagnoses, as well as answer clinical questions using complex clinical case vignettes of classic pediatric surgical diseases. We conducted the study in April 2024. We evaluated the performance of LLMs using 13 complex clinical case vignettes of pediatric surgical diseases and compared responses to a human cohort of experienced pediatric surgeons. Additionally, pediatric surgeons rated the diagnostic recommendations of LLMs for completeness and accuracy. To determine differences in performance, we performed statistical analyses. ChatGPT-4 achieved a higher test score (52.1%) than Copilot (47.9%), but lower than pediatric surgeons (68.8%). Overall differences in performance between ChatGPT-4, Copilot, and pediatric surgeons were found to be statistically significant (p < 0.01). ChatGPT-4 demonstrated superior performance in generating differential diagnoses compared to Copilot (p < 0.05). No statistically significant differences were found between the AI models regarding suggestions for diagnostics and primary diagnosis. Overall, the recommendations of LLMs were rated as average by pediatric surgeons. This study reveals significant limitations in the performance of AI models in pediatric surgery. Although LLMs exhibit potential across various areas, their reliability and accuracy in handling clinical decision-making tasks are limited. Further research is needed to improve AI capabilities and establish its usefulness in the clinical setting.
DOI: 10.1055/a-2551-2131
URL: Please note: this is a bibliography entry. Full-text access for members of the university is available only if the library holds a subscription to the corresponding journal/edited volume or the title is open access.

Full text: https://doi.org/10.1055/a-2551-2131
 DOI: https://doi.org/10.1055/a-2551-2131
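
The DOI resolver linked above also supports content negotiation, so the bibliographic metadata behind this record can be fetched programmatically. A minimal sketch in Python, assuming the third-party requests package, network access, and that the DOI's registration agency serves the media types listed in the public DOI content-negotiation documentation:

    # Minimal sketch: retrieve citation metadata for this record's DOI
    # via DOI content negotiation. Requires the `requests` package.
    import requests

    DOI_URL = "https://doi.org/10.1055/a-2551-2131"

    # CSL JSON carries structured metadata (title, authors, container title, ISSN).
    resp = requests.get(
        DOI_URL,
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    resp.raise_for_status()
    meta = resp.json()
    print(meta.get("title"))
    print(meta.get("container-title"))                      # journal name
    print([a.get("family") for a in meta.get("author", [])])  # author surnames

    # A formatted reference can be requested the same way, e.g. in APA style.
    apa = requests.get(
        DOI_URL,
        headers={"Accept": "text/x-bibliography; style=apa"},
        timeout=30,
    )
    print(apa.text.strip())

The same Accept headers work with any HTTP client; for example, requesting application/x-bibtex from the same URL should return a BibTeX entry suitable for a reference manager.
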
Medium: Online resource
Language: English
K10plus PPN: 1927853052
Links: → Journal

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/69354825