Universitätsbibliothek Heidelberg
Status: Bibliography entry
Location: ---
Copies: ---
heiBIB
Online resource
Authors: Almasian, Satya [author]
 Spitz, Andreas [author]
 Gertz, Michael [author]
Title: Word embeddings for entity-annotated texts
Statement of responsibility: Satya Almasian, Andreas Spitz, and Michael Gertz
E-year: 2019
Year: 6 Feb 2019
Extent: 15 pages
Notes: Viewed on 19 Feb 2019
Source title: Contained in: De.arxiv.org
Source place: [S.l.] : Arxiv.org, 1991
Source year: 2019
Source volume/issue: (2019), article number 1902.02078, 15 pages
Abstract: Word embeddings are useful for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.
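The abstract's second approach treats an entity-annotated corpus as a co-occurrence graph whose nodes are both words and entities. As an illustrative sketch only (not the authors' implementation), the following builds such a weighted graph from tokenized sentences; the `ENT::` prefix marking entity mentions is a hypothetical annotation scheme chosen for this example:

```python
from collections import Counter, defaultdict

def cooccurrence_graph(sentences, window=2):
    """Build a weighted co-occurrence graph from tokenized sentences.

    Entity mentions are assumed to be pre-linked and collapsed into
    single tokens (e.g. "ENT::Barack_Obama"), so words and entities
    become nodes of the same graph.
    """
    graph = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            # Count every neighbor within the symmetric context window.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    graph[tok][tokens[j]] += 1
    return graph

sentences = [
    ["ENT::Barack_Obama", "was", "elected", "president"],
    ["the", "president", "visited", "ENT::Berlin"],
]
g = cooccurrence_graph(sentences, window=2)
print(g["president"]["ENT::Berlin"])  # prints 1: word and entity share an edge
```

Node-embedding methods can then be run over this graph; the paper evaluates when such graph-based embeddings recover the performance that naive training on the annotated corpus loses.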
URL: Please note: This is a bibliography entry. Full-text access for members of the university is available here only if a subscription exists for the corresponding journal/edited volume, or if this is an Open Access title.

Full text: http://arxiv.org/abs/1902.02078
Medium: Online resource
Language: eng
Subject heading: Computer Science - Computation and Language
K10plus-PPN:1587741687
Links: → Collection

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/68359648