The Institut d’Estudis Aranesi (IEA-AALO) will transfer voice, text and metadata to the Barcelona Supercomputing Center (BSC-CNS). These are key resources through which Aina will, for the first time, develop language models in Aranese as well. The two entities have signed a key agreement to incorporate the Occitan language into the artificial intelligence systems developed by the Aina Project, which is coordinated by the center’s Language Technology Unit.
Under the agreement, the BSC may pre-process the data in order to integrate it into the Aina corpus. The datasets available on Hugging Face are fundamental for training the models and for Language Technologies (TL). It is through these resources that Aina will also develop language models in Aranese.
For the Institut d’Estudis Aranesi, this is “a step that can lead to an important advance for the development of technologies in the Occitan language that can facilitate linguistic study and analysis as well as greater diffusion and promotion of the language through text writing or automatic correction applications, among others”, according to Jèp de Montoya, president of the IEA-AALO. The Aina Project, led by the Barcelona Supercomputing Center and financed by the Generalitat de Catalunya, thus expands its range of collaborations beyond Catalan. Through this transversal vision of language models, Aina wants to become a crucial tool for the promotion of languages with few digital resources. Artificial intelligence systems offer a unique opportunity to strengthen the presence of low-resource languages in the digital sphere. All resources developed by Aina, such as the Flor 6.3B model, are available in the Aina KIT.
Project Aina | Communication and press
press.languagetech@bsc.es