| Online resource |
Authors: | Ali, Wazir [author]  |
| Kumar, Jay [author]  |
| Tumrani, Saifullah [author]  |
| Nour, Redhwan [author]  |
| Noor, Adeeb [author]  |
| Xu, Zenglin [author]  |
Title: | Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention |
Statement of responsibility: | Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE) |
Year: | 2025 |
Extent: | 10 pp. |
Illustrations: | Illustrations |
Notes: | Published online: 27 November 2024; article version: 13 December 2024; accessed 4 June 2025 |
Source title: | Contained in: Institute of Electrical and Electronics Engineers: IEEE Access |
Source place: | New York, NY : IEEE, 2013 |
Source year: | 2025 |
Source volume/issue: | 13 (2025), pages 183133-183142 |
Source ISSN: | 2169-3536 |
Abstract: | Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It’s cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets. |
DOI: | doi:10.1109/ACCESS.2024.3507382 |
URL: | Please note: This is a bibliographic entry. Full-text access for members of the university is available here only if there is a subscription to the corresponding journal or collected volume, or if it is an open-access title.
free of charge: Full text: https://doi.org/10.1109/ACCESS.2024.3507382 |
| free of charge: Full text: https://ieeexplore.ieee.org/document/10769409/authors |
| DOI: https://doi.org/10.1109/ACCESS.2024.3507382 |
Medium: | Online resource |
Language: | eng |
Subject keywords: | Attention mechanism |
| Computer science |
| Context modeling |
| Labeling |
| Long short term memory |
| long short-term memory |
| neural network |
| Noise measurement |
| Recurrent neural networks |
| representation learning |
| Representation learning |
| Robustness |
| Tagging |
| White spaces |
| word segmentation |
K10plus-PPN: | 1927457734 |
Links: | → Journal |
Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention / Ali, Wazir [author]; 2025 (Online resource)