Researchers from the Aina Project traveled to Portugal to present the latest developments in speech technologies. Iberspeech was held this year at the University of Aveiro from November 11 to 13. The year 2024 has been key in the development of resources and datasets to improve and optimize models such as Matxa-TTS and the other speech technologies the Aina Project technical team works with.
The conference is a meeting point for teams researching speech technologies for Iberian languages. It is also a great opportunity to promote collaboration between industry, universities and the wider academic community. It serves as a key forum for exchanging knowledge and impressions on the different approaches to the resources and technologies being developed, such as automatic speech recognition (ASR) systems and text-to-speech (TTS) synthesis models, among others.
Technologies in constant evolution
Over the three days, Aina’s team presented some of its most outstanding developments, such as the corpus built from data obtained from 3cat, a resource that contains more than 731 hours and 21 minutes of data and includes manual transcriptions. All of this data has also been verified using four different ASR systems. The team also presented the main characteristics of “LaFresCat”, a key dataset with 3.5 hours of recordings in different dialectal variants of Catalan. LaFresCat has been crucial in training the Matxa-TTS model, the speech synthesis solution that takes into account the cultural representativeness of Catalan.
Aina also proposed new architectures that allow for improvements in automatic conversation detection and transcription. The objective is to improve this technique in noisier environments, where the voice can be more difficult to follow. This would be achieved by integrating extralinguistic information and combining audio and text in “dialogue systems”. These are just some of the new developments presented at the event. The congress also addressed techniques to improve the quality of datasets that incorporate crowdsourced contributions, such as the data obtained through the Common Voice initiative. The approach consists of filtering this data before it is used to train speech synthesis models, a process that, according to a study by the Aina Project, would slightly improve the quality of these audio resources.
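As a rough illustration of what filtering crowdsourced clips before TTS training can look like, here is a minimal Python sketch. It is not the method used in the Aina Project study, whose criteria are not described here. The `Clip` record, the `quality_score` field and every threshold are hypothetical; only the idea of per-clip up and down votes is borrowed from Common Voice metadata.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One crowdsourced recording (hypothetical record layout)."""
    path: str
    up_votes: int         # community votes confirming the clip is valid
    down_votes: int       # community votes rejecting the clip
    duration_s: float     # clip length in seconds
    quality_score: float  # hypothetical 1-5 audio quality estimate


def keep_clip(clip, min_up_votes=2, max_down_votes=0,
              min_duration_s=1.0, max_duration_s=15.0, min_quality=3.0):
    """Return True if the clip passes every (assumed) threshold."""
    return (
        clip.up_votes >= min_up_votes
        and clip.down_votes <= max_down_votes
        and min_duration_s <= clip.duration_s <= max_duration_s
        and clip.quality_score >= min_quality
    )


def filter_clips(clips, **thresholds):
    """Keep only the clips that pass keep_clip, preserving input order."""
    return [clip for clip in clips if keep_clip(clip, **thresholds)]


if __name__ == "__main__":
    candidates = [
        Clip("a.mp3", up_votes=3, down_votes=0, duration_s=4.2, quality_score=4.1),
        Clip("b.mp3", up_votes=2, down_votes=1, duration_s=5.0, quality_score=4.5),
        Clip("c.mp3", up_votes=5, down_votes=0, duration_s=0.4, quality_score=4.8),
    ]
    for clip in filter_clips(candidates):
        print(clip.path)
```

In practice the thresholds would be tuned against listening tests or an objective quality measure, since stricter filtering trades dataset size for cleaner audio.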
Congresses such as Iberspeech are key to the exchange of knowledge. The experiences shared with researchers from other centers and projects help shape the actions and research being carried out. This is a great opportunity to improve all the processes involved in generating linguistic resources. In addition, all developments are now available in the Aina Kit.