Pangeanic heads a consortium with leading machine translation and NLP companies KantanMT and Tilde to create the largest-ever farm of neural machine translation engines, translating between all European languages.
The new project is financed by the “Connecting Europe Facility” program, and the Government of Spain is also involved through SEAD, the State Secretariat for Digital Advancement. NTEU’s goal is as simple as it is ambitious: to implement automatic translation in Public Administrations so that data flows seamlessly across Member States, irrespective of the language of the source document.
This is the objective of Neural Translation for the EU (NTEU), which will receive around two million euros to develop 506 different near-human neural machine translation engines within two years. The engines will require a huge amount of data, which the three companies are gathering from their own repositories, the EU’s own very large repository and other European-sponsored projects. As a result, European Public Administrations will be able to integrate machine translation services within their own national infrastructures and achieve automatic translation between all the official languages of the European Union. Initially, the consortium will develop the engines for European Public Administrations, although its ambition is to offer specialist services to bodies and institutions outside the Public Administrations, as well as to governments and agencies worldwide.
A challenge for the consortium is to find (and create) bilingual data for uncommon language combinations, such as Estonian and Portuguese, or to create a Maltese-Greek translation engine that does not pivot through English, as other popular free tools do.
NTEU project consortium companies and quality evaluation
The three development companies are Pangeanic from Valencia, KantanMT from Ireland and Tilde from Latvia. The General Technical Office of the Spanish Language Technology Plan, which has already collaborated with these companies on previous projects, will coordinate the evaluation of the results, which will later be validated by different universities in an open bid.
The European Commission’s interest in this project lies in its objective of extending the coverage of the current eTranslation system, promoted by the Commission itself, which currently translates only from and into English and a few other major European languages, such as French and German. Translation and language technologies are a key tool in the European strategy to create a digital single market across language barriers.
Given how heavily machine learning technology depends on data, the great challenge will be to obtain training corpora of sufficient quality and quantity to train the different engines, covering both bilingual and monolingual data. To fill the gaps for language pairs with less initial data, the plan is to use automatic text generation techniques based on state-of-the-art multilayer neural networks.
The project has received coverage in the national press and in technology magazines:
- La Razón: https://innovadores.larazon.es/es/not/el-nuevo-google-translate-de-la-ue-tiene-sello-espanol
- Blog RuralVía: https://blog.ruralvia.com/sabias-que-una-empresa-espanola-desarrollara-el-google-translate-de-la-union-europea/