Speech technologies

Speech is the most natural interface between humans and machines, provided that all its possible variations are taken into account. Chatbots are one possible type of application. Speech also varies with personal characteristics such as gender and age. This variety in ways of speaking is, at the same time, a barrier to entry. To overcome it and facilitate the adoption of these technologies, it is necessary to work beforehand on the creation of:

DATA RESOURCES AND MODELS

PLATFORMS AND INTEGRATION TOOLS

To implement speech technologies as an interface between humans and machines, AINA will work on generating and updating:

LARGE SPEECH SYNTHESIS (TTS) MODELS
SPEECH RECOGNITION (STT)
SPEECH-TO-SPEECH AUTOMATIC TRANSLATION (MT S2S)

AINA’S RESEARCH FOCUSES ON:

Investigate new and impactful architectures and use them to expand the catalog of speech technology models.

Develop multilingual STT models to transcribe multilingual recordings.

Develop TTS models for speaker and language transfer based on automatic dubbing.

Train domain-specific models (audiovisual production, telephony, conversational).

Promote the presence and continuity of Catalan in the most popular technologies and environments of the free software community.

Other machine learning models.