Facebook researchers have announced that they have developed a faster and more accurate AI language translation system.
According to an article online, the researchers claim that the new technology could also help computers translate many more languages, especially low-resource languages such as Urdu and Burmese.
Machine translation systems have evolved in recent years and are now thought to offer a high level of accuracy for some common language pairs, such as English to Arabic or English to Spanish. However, they are still far from reliable for the many less common pairings, because of the amount of data required for accuracy.
The Facebook AI Research division (FAIR) has trained a machine translation system by providing it with large amounts of data from sites such as Wikipedia. Each text was fed into the system as an independent piece of text in a single language; such collections are known as monolingual corpora. Most AI translation systems use monolingual corpora alongside parallel corpora, in which each sentence is paired with its translation, to learn efficiently.
The most striking feature of FAIR’s system is that it uses only monolingual corpora.
Antoine Bordes, a research scientist and head of FAIR’s Paris lab, said: “Building a parallel corpus is complicated because you need to find people fluent in two languages to create it. For instance, if you wanted to build a parallel corpus of Portuguese/Nepali, you would need to find people fluent in these two languages, which would be very difficult. On the other side, building monolingual corpora Portuguese/Nepali is very easy: you just need to download webpages from Portuguese and from Nepali websites, it doesn’t matter if they are not parallel sentences or if they talk about different things.”
Bordes added: “The novelty in our approach is that we can train MT systems from monolingual corpora only, we don’t need any parallel corpus. Potentially, given a book written in an alien language, we could use our model to translate it into English.”
The translation system will be presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP). It could put Facebook well ahead of other social media platforms, as users would be able to instantly translate posts from around the world.
Marc Aurelio Ranzato, another FAIR research scientist, said: “Our new approach provides a dramatic improvement over previous state-of-the-art unsupervised approaches, and is equivalent to supervised approaches trained with nearly 100,000 reference translations.”
He continued: “To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.”
This is a giant leap forward for machine translation, and for speakers of the thousands of languages around the world that translators, human or machine, seldom cover.