The Salamandra model is available in versions with 2, 7 and 40 billion parameters.

All resources are available for integration through the Aina Kit.

The Salamandra family of models is now available in three sizes: 2B, 7B and 40B parameters. This is a fundamental milestone for the Aina Project, promoted by the Generalitat de Catalunya and developed by the Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS), as it marks a step forward in the creation of a public, multilingual AI infrastructure. Salamandra is the first large language model trained from scratch using the supercomputing capabilities of MareNostrum 5, located at the BSC-CNS.

The models are also available in trained and quantized versions, which makes them easier for interested companies and organizations to adopt. In total, the Salamandra 2B model versions have already been downloaded more than 45 thousand times, the 7B versions more than 81 thousand times and the 40B versions more than 5 thousand times. These figures show the ecosystem's strong interest in integrating open AI resources in Catalan.

Salamandra model family from the Aina Project

Salamandra is one of the largest open-source language models developed in Europe, and it promotes the responsible use of AI through careful treatment of its training data. Specifically, the dataset covers 35 European languages and contains more than 2 trillion tokens. Across the different training epochs, accurate representation of the different languages, such as Catalan, is ensured. All training data, as well as the evaluations and limitations of each model, are documented in that model's information page on Hugging Face.

29 May 2025 | Scientific news