A framework for learning.
A fundamental limitation of Machine Assisted Translation (MAT) systems is their need for very large corpora of training data. Most Bible translation projects cannot provide such corpora, which restricts the contribution of these systems to the later stages of a project.
Project Paddington seeks to overcome this limitation by kick-starting MAT with a bilingual lexicon, together with morphology and syntax tables, all compiled from very small amounts of text. This brings forward the moment when MAT systems can begin to contribute to a translation.
There are three major areas of work being developed for Paddington:
- A set of small processes we call parseBots (pB – hence Paddington), each tasked with learning about the content and structure of a very small piece of text,
- The Language Module, which collects and collates the items learnt by the bots, and
- An Interaction Module tasked with engaging with the user to verify the findings of the rest of the system.
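To make the division of labour between the three parts concrete, here is a minimal sketch of how they might fit together. It is not Paddington's actual code: the names `Finding`, `word_bot`, `suffix_bot`, `LanguageModule`, `InteractionModule` and `learn` are all hypothetical, and the two toy bots use deliberately naive rules (whitespace tokenisation, two-letter suffixes) chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Finding:
    """One item a bot believes it has learnt (hypothetical structure)."""
    kind: str   # e.g. "lexicon", "morphology" or "syntax"
    value: str


# A parseBot is modelled here as a function from a small text to findings.
ParseBot = Callable[[str], Iterable[Finding]]


def word_bot(text: str) -> Iterable[Finding]:
    """Toy bot: proposes every whitespace-separated word as a lexicon item."""
    for word in text.lower().split():
        yield Finding("lexicon", word)


def suffix_bot(text: str) -> Iterable[Finding]:
    """Toy bot: proposes the last two letters of longer words as a suffix."""
    for word in text.lower().split():
        if len(word) > 3:
            yield Finding("morphology", "-" + word[-2:])


@dataclass
class LanguageModule:
    """Collects findings from all bots and collates them by frequency."""
    counts: Counter = field(default_factory=Counter)
    verified: Set[Finding] = field(default_factory=set)

    def collect(self, findings: Iterable[Finding]) -> None:
        self.counts.update(findings)

    def candidates(self, min_count: int = 2) -> List[Finding]:
        # Unverified findings seen at least min_count times.
        return [f for f, n in self.counts.items()
                if n >= min_count and f not in self.verified]


@dataclass
class InteractionModule:
    """Puts candidate findings to the user; `ask` stands in for a real UI."""
    ask: Callable[[Finding], bool]

    def review(self, language: LanguageModule, min_count: int = 2) -> None:
        for finding in language.candidates(min_count):
            if self.ask(finding):
                language.verified.add(finding)


def learn(text: str, bots: Iterable[ParseBot],
          ask: Callable[[Finding], bool]) -> LanguageModule:
    """Run every bot over the text, collate the results, then verify them."""
    language = LanguageModule()
    for bot in bots:
        language.collect(bot(text))
    InteractionModule(ask).review(language)
    return language
```

In this sketch the user's answer is supplied as a callback, so a call such as `learn(text, [word_bot, suffix_bot], ask)` could be driven by a command-line prompt, a graphical dialogue or a test stub without changing the bots or the collation step.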
Many of our other systems have a role to play in building the capabilities of Paddington.
We presented a paper about Project Paddington, "Learning from Sparse Data", at the ASLING TC39 conference in London in November 2017.