The English Bible exists in many different written styles, which makes it an ideal source text for work on style translation.
According to research published in the journal Royal Society Open Science, this is not the first parallel dataset created for style translation, but it is the first to use the Bible.
Building style translators (tools that keep a text in the same language but transform its style) is difficult because of the enormous amount of data they require. For example, a style translator could take an English selection of "Moby Dick" and translate it into different versions suitable for young readers or non-native English speakers.
This study, however, suggests that the Bible may be an important resource for that purpose: each version of the Bible contains more than 31,000 verses, which the scientists used to produce more than 1.5 million unique pairings of source and target verses. The texts were fed into two algorithms: a statistical machine translation system called "Moses" and a neural network framework commonly used in machine translation, "Seq2Seq".
As early as 2011, a group of Israeli developers presented software that offered new clues about who wrote the Hebrew Bible (Aleppo Codex). The algorithm, developed by a team led by Moshe Koppel of Bar-Ilan University, analyzed style and word choice to pick out parts of a text that were likely written by different authors, highlighting their diversity.
The team used 34 stylistically distinct versions of the Bible, ranging in linguistic complexity from the "King James Version" to the "Basic English Bible."
As an additional benefit to the research team, the Bible is already fully indexed through the consistent use of book, chapter and verse numbers.
As noted by the study collaborator Daniel Rockmore, Professor of Computer Science at Dartmouth:
"Humans have been performing the task of organizing biblical texts for centuries, so we didn't have to put our faith in less reliable alignment algorithms."
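To make the indexing point concrete, here is a minimal Python sketch of how verses from different versions could be paired using nothing more than their book, chapter and verse numbers. It is an illustration under stated assumptions, not the researchers' actual pipeline: the version labels ("KJV", "BBE"), the dictionary layout and the `build_pairs` helper are all hypothetical, and the two-version sample stands in for the full set of texts.

```python
from itertools import permutations

# Hypothetical toy data: each version maps (book, chapter, verse) to its text.
# A real corpus would hold tens of thousands of verses per version.
versions = {
    "KJV": {
        ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
        ("Genesis", 1, 2): "And the earth was without form, and void; and darkness "
                           "was upon the face of the deep. And the Spirit of God "
                           "moved upon the face of the waters.",
    },
    "BBE": {
        ("Genesis", 1, 1): "At the first God made the heaven and the earth.",
    },
}


def build_pairs(versions):
    """Pair each verse with the same verse in every other version.

    The (book, chapter, verse) key does the alignment, so no
    sentence-alignment algorithm is needed. Verses missing from
    either side of a pairing are simply skipped.
    """
    pairs = []
    # Ordered pairs: KJV -> BBE and BBE -> KJV are separate training directions.
    for source, target in permutations(versions, 2):
        shared_refs = versions[source].keys() & versions[target].keys()
        for ref in sorted(shared_refs):
            pairs.append(
                (source, target, ref, versions[source][ref], versions[target][ref])
            )
    return pairs


if __name__ == "__main__":
    for source, target, ref, src_text, tgt_text in build_pairs(versions):
        print(f"{source} -> {target} {ref}: {src_text!r} => {tgt_text!r}")
```

In this sketch only Genesis 1:1 appears in both versions, so it is the only verse that gets paired, once in each direction. With more versions and verses, the same lookup by reference would scale to a large set of source and target pairings.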