Urdu Text-to-Speech Conversion Using Deep Learning
Paper ID: 2064
Recent advances in communication technologies have opened new channels of communication for speakers of many languages around the world. Text-to-Speech (TTS) systems have been proposed to facilitate language learning and communication among people who speak different languages. Of the thousands of languages spoken worldwide, Urdu is one of the most widely spoken in the subcontinent and in many other countries. However, the existing TTS literature focuses mainly on languages other than Urdu, and existing approaches to Urdu TTS do not use current deep learning techniques. This study therefore aims to build a system that takes Urdu text as input and produces the corresponding speech audio using state-of-the-art techniques. The proposed deep learning-based approach uses Tacotron 2 with WaveGlow, an architecture built from multiple Long Short-Term Memory (LSTM) and convolutional layers whose hidden layers help improve the results. The proposed Urdu TTS system is trained on a preprocessed dataset and then tested on a set of 100 sentences. The results are evaluated using the Mean Opinion Score (MOS), a standard performance measure for TTS systems. The evaluation shows that the proposed approach outperforms existing approaches, achieving a MOS of 3.76.
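For readers unfamiliar with the metric, the following is a minimal sketch of how a MOS is commonly computed: the arithmetic mean of listener ratings on a 1 (bad) to 5 (excellent) scale. The function names, the normal-approximation confidence interval, and the sample ratings are illustrative assumptions, not the evaluation protocol or data of this study.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return the MOS: the arithmetic mean of ratings on a 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return statistics.fmean(ratings)


def mos_with_interval(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumption: a normal approximation, z * s / sqrt(n), is adequate;
    with few listeners a t-distribution would be more appropriate.
    """
    mos = mean_opinion_score(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings from five hypothetical listeners for one sentence.
example_ratings = [4, 3, 4, 5, 3]
example_mos, example_half_width = mos_with_interval(example_ratings)
```

In practice the ratings for all test sentences and all listeners are pooled before averaging, and the interval gives a sense of how much the score could vary with a different listener panel.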