
Development of a Prosodic Read Speech Syllabic Corpus of the Yoruba Language

Akintoba Emmanuel Akinwonmi. Published in Information Sciences.

Communications on Applied Electronics
Year of Publication: 2021
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Akintoba Emmanuel Akinwonmi
DOI: 10.5120/cae2021652884

Akintoba Emmanuel Akinwonmi. Development of a Prosodic Read Speech Syllabic Corpus of the Yoruba Language. Communications on Applied Electronics 7(36):13-32, June 2021.

BibTeX

@article{10.5120/cae2021652884,
	author = {Akintoba Emmanuel Akinwonmi},
	title = {Development of a Prosodic Read Speech Syllabic Corpus of the Yoruba Language},
	journal = {Communications on Applied Electronics},
	issue_date = {June 2021},
	volume = {7},
	number = {36},
	month = {Jun},
	year = {2021},
	issn = {2394-4714},
	pages = {13-32},
	numpages = {20},
	url = {http://www.caeaccess.org/archives/volume7/number36/882-2021652884},
	doi = {10.5120/cae2021652884},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

The literature shows that annotated databases of speech text or audio files are needed primarily because basic Natural Language Processing (NLP) studies of a language require corpus data. Such investigations span the phonetic, aural and etymological representations of the language, and can also extend to its grammatical, semantic, pragmatic and syntactic characterization. At a secondary level, an annotated speech corpus is needed for speech synthesis, typified by Text-to-Speech (TTS), and for speech recognition, as in Speech-to-Text (STT). Yoruba, a widely spoken but resource-scarce language, has few digital resources, and its computerization poses unique challenges. An annotated speech corpus is one such resource, so this research was motivated by the need to add to the scanty resources available for the language. Textual input was mined from four sources: two Standard Yoruba (SY) works of fiction, an SY grammar textbook and an SY online scripture. A hybrid of the Falaschi scheme and the add-on procedure of Radová and Vopálka was applied to extract a phonetically balanced text bag of 7376 phrases and sentences, with a view to minimizing extraction cost while maximizing phonetic coverage of all Standard Yoruba syllabic events. The selected text was read by an expert, recorded in a suitable environment and saved as wave files, which were then annotated with Praat. A relational database was developed to host the corpus metadata. The corpus performed well when tested with a Standard Yoruba TTS system. This paper presents the design, implementation and results of the research.
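
The extraction step trades reading cost against phonetic coverage. The sketch below is a generic greedy coverage heuristic in that spirit; it is not the paper's exact hybrid of the Falaschi scheme and the Radová and Vopálka add-on procedure. It assumes candidates are already syllabified with hyphens (a hypothetical input format), weights each syllable type by the inverse of its frequency in the pool so that rare syllables are favoured, normalizes by sentence length to keep reading cost down, and finishes with a simple top-up pass for any syllable type the budget missed. The names `syllables` and `select_balanced` are illustrative only.

```python
from collections import Counter


def syllables(sentence):
    """Split a hyphen-syllabified sentence (hypothetical format) into syllables."""
    return [s for word in sentence.split() for s in word.split("-") if s]


def select_balanced(candidates, budget):
    """Choose sentences that together cover every syllable type in the pool.

    Phase 1 (greedy): while under `budget`, take the sentence whose unseen
    syllable types carry the most weight per syllable read; a type's weight
    is 1 / (its frequency in the pool), so rare syllables are favoured.
    Phase 2 (top-up): for each syllable type still missing, append the
    shortest unused sentence containing it.
    """
    units = {c: syllables(c) for c in candidates}
    freq = Counter(s for c in candidates for s in units[c])
    weight = {s: 1.0 / n for s, n in freq.items()}
    chosen, covered, pool = [], set(), list(candidates)

    def gain(sentence):
        new = set(units[sentence]) - covered
        return sum(weight[s] for s in new) / max(len(units[sentence]), 1)

    # Phase 1: rarity-weighted, length-normalised greedy selection.
    while pool and len(chosen) < budget:
        best = max(pool, key=gain)  # ties go to the earliest candidate
        if gain(best) == 0:
            break  # nothing left adds a new syllable type
        chosen.append(best)
        covered.update(units[best])
        pool.remove(best)

    # Phase 2: top up syllable types the budget did not reach.
    for syl in freq:  # Counter keeps first-seen order
        if syl not in covered:
            best = min((c for c in pool if syl in units[c]),
                       key=lambda c: len(units[c]))
            chosen.append(best)
            covered.update(units[best])
            pool.remove(best)
    return chosen


if __name__ == "__main__":
    pool = ["bà-bá", "bà-bá mi", "ọ-mọ", "i-lé", "mi"]
    print(select_balanced(pool, budget=2))
```

With the toy pool above and a budget of 2, the greedy phase takes `"ọ-mọ"` and `"i-lé"` (each made only of syllables that occur once in the pool), and the top-up pass then adds `"bà-bá"` and `"mi"`. Length normalization is why `"mi"` wins over the longer `"bà-bá mi"` once `bà` and `bá` are already covered.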
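
Praat stores annotations as TextGrid files. As an illustration only, assuming syllable boundaries are already known as non-overlapping `(start, end, label)` triples in seconds, the following writes a single interval tier in Praat's long text format. The tier name `syllables` and the function names are hypothetical. Praat expects the intervals of a tier to cover its whole time range without gaps, so unlabelled stretches are filled with empty intervals; the tier's end time can be taken from the wave file with Python's standard `wave` module.

```python
import wave


def wav_duration(path):
    """Duration in seconds of a PCM .wav file, via the stdlib `wave` module."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def fill_gaps(intervals, xmax):
    """Make non-overlapping intervals tile [0, xmax] with empty-label fillers."""
    tiled, t = [], 0.0
    for start, end, label in sorted(intervals):
        if start > t:
            tiled.append((t, start, ""))
        tiled.append((start, end, label))
        t = end
    if t < xmax:
        tiled.append((t, xmax, ""))
    return tiled


def textgrid_text(intervals, xmax, tier="syllables"):
    """Render one IntervalTier as a Praat long-format TextGrid string."""
    rows = fill_gaps(intervals, xmax)
    out = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(rows)}",
    ]
    for i, (start, end, label) in enumerate(rows, 1):
        text = label.replace('"', '""')  # Praat escapes quotes by doubling
        out += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{text}"',
        ]
    return "\n".join(out) + "\n"
```

A typical call would be `textgrid_text(segments, wav_duration(path))`, with the result written to a `.TextGrid` file opened with `encoding="utf-8"` so that tone-marked Yoruba labels survive.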
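
The abstract does not give the metadata schema, so the following is a hypothetical minimal layout in SQLite through Python's standard `sqlite3` module: one table of selected sentences with their source, one of recordings pointing to wave and TextGrid files, and one of time-aligned syllable segments. All table and column names are assumptions. A grouped query over the segment table then yields the syllable inventory, which is one way to check coverage of syllabic events.

```python
import sqlite3

# Hypothetical schema; the paper's actual tables are not described.
SCHEMA = """
CREATE TABLE sentence (
    id      INTEGER PRIMARY KEY,
    source  TEXT NOT NULL,   -- e.g. fiction, grammar textbook, scripture
    text    TEXT NOT NULL
);
CREATE TABLE recording (
    id            INTEGER PRIMARY KEY,
    sentence_id   INTEGER NOT NULL REFERENCES sentence(id),
    wav_path      TEXT NOT NULL UNIQUE,
    textgrid_path TEXT
);
CREATE TABLE syllable_segment (
    id           INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    start_s      REAL NOT NULL,
    end_s        REAL NOT NULL CHECK (end_s > start_s),
    syllable     TEXT NOT NULL
);
"""


def open_corpus(path=":memory:"):
    """Open (or create) the metadata database with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these off by default
    conn.executescript(SCHEMA)
    return conn


def syllable_inventory(conn):
    """Token count and mean duration per syllable type, most frequent first."""
    return conn.execute(
        """SELECT syllable, COUNT(*) AS n, AVG(end_s - start_s) AS mean_s
           FROM syllable_segment
           GROUP BY syllable
           ORDER BY n DESC, syllable"""
    ).fetchall()
```

Keeping segments in their own table, rather than only inside TextGrid files, lets coverage and duration statistics be computed with plain SQL while the TextGrids remain the editable source of truth.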

Keywords

Speech, Corpus, Yoruba Language, Chunk, Syllabification