Universitätsbibliothek Heidelberg
Status: Bibliographic entry

Availability
Location: ---
Copies: ---
heiBIB
 Online resource
Authors: Richter-Pechanski, Phillip [author]
 Wiesenbach, Philipp [author]
 Schwab, Dominic Mathias [author]
 Kiriakou, Christina [author]
 Geis, Nicolas [author]
 Dieterich, Christoph [author]
 Frank, Anette [author]
Title: Clinical information extraction for lower-resource languages and domains with few-shot learning using pretrained language models and prompting
Statement of responsibility: Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic Mathias Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank
E-year: 2024
Year: 31 October 2024
Extent: 24 pages
Illustrations: illustrations
Notes: Viewed on 2 April 2025
Source title: Contained in: Natural language processing
Source place: Cambridge : Cambridge University Press, 2025
Source year: 2024
Source volume/issue: (2024), pages 1-24
Source ISSN: 2977-0424
Abstract: A vast amount of clinical data are still stored in unstructured text. Automatic extraction of medical information from these data poses several challenges: high costs of clinical expertise, restricted computational resources, strict privacy regulations, and limited interpretability of model predictions. Recent domain-adaptation and prompting methods using lightweight masked language models have shown promising results with minimal training data and allow well-established interpretability methods to be applied. We are the first to present a systematic evaluation of advanced domain-adaptation and prompting methods on a lower-resource medical domain task, performing multi-class section classification on German doctor’s letters. We evaluate a variety of models, model sizes, (further-pre)training regimes, and task settings, and conduct extensive class-wise evaluations supported by Shapley values to validate the quality of small-scale training data and to ensure the interpretability of model predictions. We show that in few-shot learning scenarios, a lightweight, domain-adapted pretrained language model, prompted with just 20 shots per section class, outperforms a traditional classification model by increasing accuracy from … to …. By using Shapley values for model selection and training-data optimization, we could further increase accuracy up to …. Our analyses reveal that pretraining masked language models on general-language data is important for successful domain transfer to medical language, so that further-pretraining general-language models on domain-specific documents can outperform models pretrained on domain-specific data only. Our evaluations show that prompting based on general-language pretrained masked language models, combined with further-pretraining on medical-domain data, achieves significant accuracy improvements over traditional models with minimal training data. Further performance improvements and interpretable results can be achieved using methods such as Shapley values. Our findings highlight the feasibility of deploying powerful machine learning methods in clinical settings and can serve as a process-oriented guideline for lower-resource languages and domains, such as clinical information extraction projects.
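The abstract describes cloze-style prompting of a masked language model for few-shot section classification. Below is a minimal sketch of that general technique, not the authors' code: the checkpoint name, the German prompt template, and the verbalizer tokens are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): cloze-style prompting
# with a masked language model (MLM) for section classification.
from transformers import pipeline

# Any German masked LM can stand in for the paper's domain-adapted model;
# this checkpoint name is an illustrative assumption.
fill_mask = pipeline("fill-mask", model="bert-base-german-cased")

# Hypothetical verbalizers: one vocabulary token per section class.
VERBALIZERS = {
    "anamnesis": "Anamnese",
    "findings": "Befund",
    "diagnosis": "Diagnose",
}

def classify_section(text: str) -> str:
    """Score each class verbalizer in a cloze template and return the
    class whose token gets the highest masked-token probability."""
    prompt = f"{text} Abschnitt: {fill_mask.tokenizer.mask_token}"
    scores = {}
    for label, token in VERBALIZERS.items():
        # `targets` restricts fill-mask scoring to the given candidate token.
        result = fill_mask(prompt, targets=[token])
        scores[label] = result[0]["score"]
    return max(scores, key=scores.get)

print(classify_section("Der Patient berichtet über Brustschmerzen seit drei Tagen."))
```

In pattern-exploiting setups like this, the prompt template and the choice of verbalizer tokens strongly affect few-shot accuracy; the paper additionally uses Shapley values to inspect which input tokens drive each prediction and to guide model selection and training-data optimization.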
DOI: 10.1017/nlp.2024.52
URL: Please note: this is a bibliographic entry. Full-text access for university members is available only if a subscription exists for the corresponding journal/collected volume or the title is Open Access.

Full text: https://doi.org/10.1017/nlp.2024.52
 Full text: https://www.cambridge.org/core/journals/natural-language-processing/article/clinical-information-extraction-for-lowerres ...
 DOI: https://doi.org/10.1017/nlp.2024.52
Medium: Online resource
Language: eng
Subject headings: few-shot learning
 language models
 medical information extraction
 pretraining
 prompting
K10plus-PPN: 1921184035
Links: → Journal

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/69327359