AppTek Launches New Metadata-Informed Neural Machine Translation System for Enterprises

AppTek has announced the release of a new neural machine translation (NMT) system that takes metadata as additional input to customise the MT output, giving localisation professionals more accurate, user-influenced machine translations. The company has also expanded its core machine translation platform to support hundreds of language and dialect pairs.

AppTek’s new metadata-aware NMT system is changing how professional translators work with machine translation output. Until now, most off-the-shelf MT systems have functioned as a “black box”: source-language text is converted into target-language text with little or no awareness of the surrounding context or of the domain or topic of the source text, and with limited control over the resulting output. Traditionally, enterprises have needed to train, deploy and maintain multiple MT systems to handle translation tasks that differ in language, dialect, domain, topic and more, at the risk of high deployment costs and overfitted models.

With AppTek’s new metadata-informed NMT platform, enterprise customers can now access a single NMT system that covers multi-domain, multi-genre, multi-dialect content, which increases the quality and adaptability of the system.

By feeding additional metadata into the system, customers gain more control over the MT output, and translators can simply “flip the switch” to the desired customised translation through the relevant functionality in the user interface of the editing tools they work with.

Examples of MT output customisation achieved using additional metadata include:

—Style – switch between formal and informal styles, such as that between a telenovela and a documentary, and get a translation with an appropriate politeness register depending on speaker status and relationships;
—Length Control for Automatic Dubbing and Subtitling Tasks – generate shorter or longer translations with minimal information loss or distortion for tasks with hard length constraints;
—Speaker Gender – toggle to the correct speaker gender, which influences inflections for certain parts of speech, especially in morphologically rich languages such as Czech;
—Domain – adapt to the genre of the text, such as news programmes, patents or talk shows, to increase overall accuracy and the use of relevant, in-domain translations of ambiguous words at the document level;
—Extended Context – optionally make the system consider neighbouring sentences within a document when translating a particular sentence, so that ambiguities such as pronoun translation can be resolved;
—Glossary – account for official or mandatory translations which the system may otherwise translate differently; and
—Language Variety – account for multiple languages and dialects within a single system, and handle mixed-language content.
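
The announcement does not say how the metadata reaches the model. One approach commonly described for NMT systems is to prepend the metadata to the source sentence as pseudo-tokens, so a single model can condition on it. The Python sketch below illustrates that general idea only. It is not AppTek’s implementation, and every name in it (`TranslationMetadata`, `tag_source`, the `<name:value>` tag format and the `<sep>` marker) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence


@dataclass(frozen=True)
class TranslationMetadata:
    """Hypothetical container for the kinds of metadata listed above."""
    style: Optional[str] = None           # e.g. "formal" or "informal"
    speaker_gender: Optional[str] = None  # e.g. "female" or "male"
    domain: Optional[str] = None          # e.g. "news" or "patents"
    length: Optional[str] = None          # e.g. "short" or "long"
    variety: Optional[str] = None         # e.g. a dialect label


def tag_source(sentence: str,
               meta: TranslationMetadata,
               context: Sequence[str] = ()) -> str:
    """Build a tagged source string for a hypothetical metadata-aware model.

    Metadata fields that are set become pseudo-tokens such as <style:formal>.
    Preceding sentences, if given, are placed before the sentence to be
    translated, each followed by an assumed <sep> marker.
    """
    tags = [
        f"<{f.name}:{getattr(meta, f.name)}>"
        for f in fields(meta)
        if getattr(meta, f.name) is not None
    ]
    parts = list(tags)
    for previous in context:
        parts.extend([previous, "<sep>"])
    parts.append(sentence)
    return " ".join(parts)


if __name__ == "__main__":
    meta = TranslationMetadata(style="formal", domain="news")
    print(tag_source("How are you?", meta))
```

In this sketch, switching from a formal to an informal translation only changes one field of the metadata object. The source sentence and the model stay the same, which is the sense in which a single system can serve several styles, domains or varieties.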

“By incorporating metadata to influence the MT output we are able to inject some ‘world knowledge’ into our platform,” said Evgeny Matusov, AppTek’s Lead Science Architect for Neural Machine Translation. “This improves the overall quality and adaptability of the system output and can be accomplished within a single multi-purpose system designed to reduce environmental footprint and cost.”

AppTek’s metadata-informed MT technology is now available for translation from English to selected European languages and their varieties, with more language pairs coming soon. The system can be customised and adapted to the needs of enterprise customers by utilising existing parallel domain-specific translation corpora found inside company archives.

“As the demand for content localisation continues to skyrocket, enterprises need to continue to innovate and find new ways to further accelerate production workflows,” said Kyle Maddock, SVP Marketing at AppTek. “Our metadata-informed MT system has been specifically designed with translation professionals in mind, giving them more control over the MT output, which can further speed up the localisation process.”