HEIDI: Ali, Wazir: Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention


Status: Bibliographic record

Location: ---
Copies: ---

	Online resource
Authors:	Ali, Wazir [Author]
	Kumar, Jay [Author]
	Tumrani, Saifullah [Author]
	Nour, Redhwan [Author]
	Noor, Adeeb [Author]
	Xu, Zenglin [Author]
Title:	Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention
Statement of responsibility:	Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE)
Year:	2025
Extent:	10 pages
Illustrations:	Illustrations
Notes:	Published online: 27 November 2024; article version: 13 December 2024; viewed on 4 June 2025
Source title:	In: Institute of Electrical and Electronics Engineers: IEEE Access
Source place:	New York, NY: IEEE, 2013
Source year:	2025
Source volume/issue:	13 (2025), pages 183133-183142
Source ISSN:	2169-3536
Abstract:	Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity: its script is cursive, and its characters have inherent joining and non-joining properties that are independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets.
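Treating segmentation as sequence labeling means every character of unsegmented text receives a boundary label, and the words are read back off those labels. The record does not say which label set SGNWS uses. The minimal sketch below assumes the common B/I/E/S scheme (begin, inside, end, single-character word). The helpers `words_to_tags` and `tags_to_words` are hypothetical names for illustration, and a Latin-alphabet toy string stands in for Sindhi text. In a model like the one described, the BiLSTM encoder, self-attention and CRF layers would predict the tag sequence. This code only shows the conversion on either side of that tagger.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character of a pre-segmented sentence with B/I/E/S.

    Assumed scheme (illustration only): B = first character of a
    multi-character word, I = interior character, E = last character,
    S = a single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:  # skip empty strings; they have no characters to label
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["I"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from an unsegmented string and one predicted tag per character."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word boundary follows this character
            words.append(current)
            current = ""
    if current:  # keep trailing characters if the tag sequence ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    gold = ["this", "is", "a", "test"]
    tags = words_to_tags(gold)
    print(tags)  # ['B', 'I', 'I', 'E', 'B', 'E', 'S', 'B', 'I', 'I', 'E']
    print(tags_to_words("thisisatest", tags))  # ['this', 'is', 'a', 'test']
```

A CRF output layer is a natural fit for labels of this kind because it scores whole tag sequences. It can therefore learn that transitions such as B followed by S are unlikely, which per-character classification alone does not capture.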
DOI:	doi:10.1109/ACCESS.2024.3507382
URL:	Please note: This is a bibliographic record. Full-text access for members of the university is available only if the university subscribes to the journal or edited volume, or if the title is open access. Free full text: https://doi.org/10.1109/ACCESS.2024.3507382
	Free full text: https://ieeexplore.ieee.org/document/10769409/authors
	DOI: https://doi.org/10.1109/ACCESS.2024.3507382
Medium:	Online resource
Language:	eng
Subject keywords:	Attention mechanism
	Computer science
	Context modeling
	Labeling
	Long short term memory
	long short-term memory
	neural network
	Noise measurement
	Recurrent neural networks
	representation learning
	Representation learning
	Robustness
	Tagging
	White spaces
	word segmentation
K10plus-PPN:	1927457734
Links:	→ Journal

Permanent link to this title (bookmarkable): https://katalog.ub.uni-heidelberg.de/titel/69353670
