
Enhanced Evaluation of Sentiment Analysis for Tamil Text-to-Speech Synthesis using Hidden Semi-Markov Model

B. Sudhakar, R. Bensraj. Published in Pattern Recognition.

Communications on Applied Electronics
Year of Publication: 2015
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: B. Sudhakar, R. Bensraj
DOI: 10.5120/cae2015651971

B. Sudhakar and R. Bensraj. Article: Enhanced Evaluation of Sentiment Analysis for Tamil Text-to-Speech Synthesis using Hidden Semi-Markov Model. Communications on Applied Electronics 3(6):13-16, December 2015. Published by Foundation of Computer Science (FCS), NY, USA. BibTeX

@article{key:article,
	author = {B. Sudhakar and R. Bensraj},
	title = {Article: Enhanced Evaluation of Sentiment Analysis for Tamil Text-to-Speech Synthesis using Hidden Semi-Markov Model},
	journal = {Communications on Applied Electronics},
	year = {2015},
	volume = {3},
	number = {6},
	pages = {13-16},
	month = {December},
	note = {Published by Foundation of Computer Science (FCS), NY, USA}
}

Abstract

In recent years, speech synthesis has become a dynamic research area within speech processing, driven by the growing use of automated systems with spoken-language interfaces. This paper presents a Tamil text-to-speech (TTS) synthesis system based on a hidden semi-Markov model (HSMM) that conveys sentiment in the synthesized speech output. Four HSMM training methods are proposed for this task. The first, sentiment-dependent (SD) modeling, trains a separate model for each emotion. In the second, sentiment adaptation (SA), a model is first trained on neutral speech and then adapted to each emotion in the database. The third, the sentiment-independent (SI) technique, first trains an average emotion model on data from all sentiments in the speech database; an adapted model is then constructed for each emotion. In the fourth, sentiment adaptive training (SAT), the average emotion model is trained while the output and state-duration distributions are simultaneously normalized. These methods are evaluated on a Tamil speech database containing four categories of emotional speech: anger, joy, sadness, and disgust. To assess and compare the four approaches on synthesized emotional speech, a subjective emotion-recognition-rate test was performed. Among the four methods, sentiment adaptive training yields the highest emotion recognition rates for all emotions.
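The core device of the paper, the hidden semi-Markov model, differs from an ordinary HMM in that each state carries an explicit duration distribution instead of the implicit geometric dwell time produced by self-transitions. The following minimal Python sketch illustrates that idea only; all state counts, transition probabilities, and duration parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal HSMM sketch: each state has an explicit Gaussian duration model,
# so the transition matrix contains no self-transitions (dwelling in a
# state is governed by the drawn duration, not by a self-loop).
rng = np.random.default_rng(0)

n_states = 3
# Transition matrix with zero diagonal; rows sum to 1.
trans = np.array([[0.0, 0.7, 0.3],
                  [0.4, 0.0, 0.6],
                  [0.5, 0.5, 0.0]])
# Per-state duration parameters (mean, std) in frames -- illustrative only.
dur_mean = np.array([5.0, 8.0, 4.0])
dur_std = np.array([1.0, 2.0, 1.0])

def sample_state_sequence(n_frames, start_state=0):
    """Sample a frame-level state sequence of exactly n_frames frames."""
    seq = []
    state = start_state
    while len(seq) < n_frames:
        # Draw an explicit duration for the current state (at least 1 frame).
        d = max(1, int(round(rng.normal(dur_mean[state], dur_std[state]))))
        seq.extend([state] * d)
        # Jump to a different state according to the transition matrix.
        state = rng.choice(n_states, p=trans[state])
    return seq[:n_frames]

seq = sample_state_sequence(50)
```

In a synthesis system such as the one described, each state would additionally hold output distributions over acoustic features; the SAT variant normalizes both those output distributions and the duration distributions during average-model training.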


Keywords

HMM, HSMM, TTS