{"id":8441,"date":"2024-11-12T17:24:39","date_gmt":"2024-11-12T16:24:39","guid":{"rendered":"https:\/\/projecteaina.cat\/tech\/?post_type=publicacions&#038;p=8441"},"modified":"2024-11-21T18:54:43","modified_gmt":"2024-11-21T17:54:43","slug":"3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition","status":"publish","type":"publicacions","link":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/","title":{"rendered":"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition"},"excerpt":{"rendered":"<p>In this work, we present the 3CatParla, a new corpus of broadcast television in Catalan intended for the field of automatic speech recognition. It comprises 731 hours and 21 minutes of speech data with manual transcriptions verified using four different ASR systems. We also introduce an acoustic model trained on 3CatParla, demonstrating its capability to produce highly accurate acoustic models. The model trained with 3CatParla is publicly available on HuggingFace under an Apache 2.0 license.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"_acf_changed":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"class_list":["post-8441","publicacions","type-publicacions","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/\" \/>\n<meta property=\"og:locale\" content=\"ca_ES\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech\" \/>\n<meta property=\"og:description\" content=\"In this work, we present the 3CatParla, a new corpus of broadcast television in Catalan intended for the field of automatic speech recognition. It comprises 731 hours and 21 minutes of speech data with manual transcriptions verified using four different ASR systems. We also introduce an acoustic model trained on 3CatParla, demonstrating its capability to produce highly accurate acoustic models. The model trained with 3CatParla is publicly available on HuggingFace under an Apache 2.0 license.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/\" \/>\n<meta property=\"og:site_name\" content=\"Projecte Aina Tech\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-21T17:54:43+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@projecte_aina\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/publicacions\\\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\\\/\",\"url\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/publicacions\\\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\\\/\",\"name\":\"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#website\"},\"datePublished\":\"2024-11-12T16:24:39+00:00\",\"dateModified\":\"2024-11-21T17:54:43+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/publicacions\\\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\\\/#breadcrumb\"},\"inLanguage\":\"ca\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/publicacions\\\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/publicacions\\\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Inici\",\"item\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#website\",\"url\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/\",\"name\":\"Projecte Aina Tech\",\"description\":\"Impulsant l&#039;\u00fas del catal\u00e0 en l&#039;era digital\",\"publisher\":{\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ca\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#organization\",\"name\":\"Projecte Aina Tech\",\"url\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ca\",\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/cropped-aina-home-logo.jpg\",\"contentUrl\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/cropped-aina-home-logo.jpg\",\"width\":512,\"height\":512,\"caption\":\"Projecte Aina Tech\"},\"image\":{\"@id\":\"https:\\\/\\\/projecteaina.cat\\\/tech\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/projecte_aina\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/projecte-aina\\\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/","og_locale":"ca_ES","og_type":"article","og_title":"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech","og_description":"In this work, we present the 3CatParla, a new corpus of broadcast television in Catalan intended for the field of automatic speech recognition. It comprises 731 hours and 21 minutes of speech data with manual transcriptions verified using four different ASR systems. We also introduce an acoustic model trained on 3CatParla, demonstrating its capability to produce highly accurate acoustic models. The model trained with 3CatParla is publicly available on HuggingFace under an Apache 2.0 license.","og_url":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/","og_site_name":"Projecte Aina Tech","article_modified_time":"2024-11-21T17:54:43+00:00","twitter_card":"summary_large_image","twitter_site":"@projecte_aina","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/","url":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/","name":"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition - Projecte Aina Tech","isPartOf":{"@id":"https:\/\/projecteaina.cat\/tech\/#website"},"datePublished":"2024-11-12T16:24:39+00:00","dateModified":"2024-11-21T17:54:43+00:00","breadcrumb":{"@id":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/#breadcrumb"},"inLanguage":"ca","potentialAction":[{"@type":"ReadAction","target":["https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/projecteaina.cat\/tech\/publicacions\/3catparla-a-new-open-source-corpus-of-broadcast-tv-in-catalan-for-automatic-speech-recognition\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Inici","item":"https:\/\/projecteaina.cat\/tech\/"},{"@type":"ListItem","position":2,"name":"3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition"}]},{"@type":"WebSite","@id":"https:\/\/projecteaina.cat\/tech\/#website","url":"https:\/\/projecteaina.cat\/tech\/","name":"Projecte Aina Tech","description":"Impulsant l&#039;\u00fas del catal\u00e0 en l&#039;era digital","publisher":{"@id":"https:\/\/projecteaina.cat\/tech\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/projecteaina.cat\/tech\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ca"},{"@type":"Organization","@id":"https:\/\/projecteaina.cat\/tech\/#organization","name":"Projecte Aina Tech","url":"https:\/\/projecteaina.cat\/tech\/","logo":{"@type":"ImageObject","inLanguage":"ca","@id":"https:\/\/projecteaina.cat\/tech\/#\/schema\/logo\/image\/","url":"https:\/\/projecteaina.cat\/tech\/wp-content\/uploads\/2023\/11\/cropped-aina-home-logo.jpg","contentUrl":"https:\/\/projecteaina.cat\/tech\/wp-content\/uploads\/2023\/11\/cropped-aina-home-logo.jpg","width":512,"height":512,"caption":"Projecte Aina Tech"},"image":{"@id":"https:\/\/projecteaina.cat\/tech\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/projecte_aina","https:\/\/www.linkedin.com\/company\/projecte-aina\/"]}]}},"_links":{"self":[{"href":"https:\/\/projecteaina.cat\/tech\/wp-json\/wp\/v2\/publicacions\/8441","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/projecteaina.cat\/tech\/wp-json\/wp\/v2\/publicacions"}],"about":[{"href":"https:\/\/projecteaina.cat\/tech\/wp-json\/wp\/v2\/types\/publicacions"}],"wp:attachment":[{"href":"https:\/\/projecteaina.cat\/tech\/wp-json\/wp\/v2\/media?parent=8441"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}