Preprocessing models for speech technologies
The impact of the Normalizer and the Grapheme-to-Phone on hybrid systems
DOI:
https://doi.org/10.26334/2183-9077/rapln10ano2023a6
Keywords:
speech technologies, normalizer, grapheme-to-phone, linguistic knowledge, models
Abstract
This paper describes the linguistic preprocessing methods used in hybrid systems provided by Defined.ai, an international Artificial Intelligence (AI) company. The startup focuses on providing high-quality data, models, and AI tools. The main goal of this work is to improve the quality of preprocessing models by applying linguistic knowledge. We focus on two introductory linguistic models in a speech pipeline: the Normalizer and the Grapheme-to-Phone (G2P) model. To that end, two projects were conducted in collaboration with the Defined.ai Machine Learning team. The first expands and improves a European Portuguese Normalizer model. The second creates G2P models for two languages, Swedish and Russian. Results show that a rule-based approach to the Normalizer and G2P increases their accuracy and performance, a significant advantage for improving Defined.ai tools and speech pipelines. In addition, building on the results of the first project, we made the Normalizer easier to use by enriching each rule with linguistic knowledge. Accordingly, our research demonstrates the added value of linguistic knowledge in preprocessing models.
License
Copyright (c) 2023 Bruna Carriço, Christopher Shulby, Helena Moniz

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal the right of first publication. The articles are simultaneously licensed under the Creative Commons Attribution License, which allows sharing of the work with an acknowledgement of authorship and initial publication in this journal.
The authors have permission to make the version of the text published in RAPL available in institutional repositories or other platforms for the distribution of academic papers (e.g., ResearchGate).