Computational linguistics for revitalization and polyglotism

Keywords: Language planning, Language Technology, Endangered Languages, Language Economics, Language as a Resource

Abstract

Despite existing laws, the Peruvian State in practice ignores multiculturalism and behaves as a monolingual and monocultural organization. Because this misguided paradigm remains in place, the state has not invested enough in developing the language skills needed to serve all citizens equally. The consequences are a lack of promotion, discrimination and, ultimately, the isolation that leads to the extinction of our indigenous languages. Our initiative seeks to change this paradigm and to awaken national pride in our native roots in three ways: by demonstrating that our languages can be used in the modern technological world just as well-established languages are, that they can carry culture and entertainment under contemporary canons, and that they provide economic value to the nation, which justifies their preservation beyond the question of rights. This document describes a roadmap for developing computational linguistics for under-supported languages that are still spoken by millions of people, such as Quechua, Aymara, Guaraní, Nahuatl, Mixtec, Otomi, Quiche, Mayan and Zapotec. Because speakers of these languages are massively present in urban environments and habitually use the Internet and mobile telephony, we are committed to building corpora of these languages through online crowdsourcing.

Published
2020-11-16
How to Cite
Camacho Caballero, L., & Zevallos Salazar, R. (2020). Computational linguistics for revitalization and polyglotism. Letras (Lima), 91(134), 184-198. https://doi.org/10.30920/letras.91.134.9
Section
Short communications and research progress