Broadening the Perspective

As we approach Autumn there is much for which to be thankful in the work of the last few months. We have made steady progress with our Sparse Data Learning Model, and good outcomes came from both the software symposium at the American Bible Society in Philadelphia and the ACL/DeepLo 2018 conference in Australia.

Major international conferences are excellent ways to get a synopsis of the ‘state of the art’ in a field. ACL 2018 was no exception, and coupled with the DeepLo conference later the same week, which focussed on low-resource languages, the two events provided a clear picture of how well, or otherwise, languages without strong commercial support are served by mainstream Machine Translation (MT).

When the MAT team was first set up by BFBS in 1991, the state of the art was considered to be Rule-Based MT (RBMT). Our work then in Statistical MT (SMT) was distinctly left-field. Over the next 15-20 years RBMT gradually lost ground to SMT, and by about 2010 Phrase-Based SMT (PBSMT) was established as the leading methodology for MT systems. Our own SMT systems have been available in ParaTExt since about 2007.

The last two years, however, have seen a sea-change in state-of-the-art MT. The advent of Neural MT (NMT) from Google (and others) in 2016 brought a new kid on the block, one that seemed able to do things that bit better than the existing PBSMT systems. You might expect that we too would be looking to NMT for future systems, but the reality is a little different.

SMT systems, particularly PBSMT systems, require a lot of example data from which to learn the equivalences between a pair of texts. NMT, sadly, needs even more, to the point that using NMT between language pairs with scant training data is simply impractical. This limitation has led us to explore other approaches, and it was to examine our approach alongside mainstream NMT solutions that Jon travelled to Melbourne.

There were some interesting outcomes. Broadly speaking, NMT systems cannot contribute where low-resource languages are concerned. There are, however, some things we can borrow from NMT systems.

NMT is based upon what is sometimes called ‘Deep Learning’ (DL). DL uses multi-layered neural networks (NNs) to learn the correct outputs for a given input. This involves combining signals from many different nodes in the network to assess the validity of a hypothesis. NNs have a number of mechanisms that do this, one of which (Threshold Logic Units) we have adapted to allow us to combine signals from different analyses into aggregate conclusions.
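To give a flavour of the idea, here is a minimal sketch of a threshold logic unit in Python. It is an illustration only, not our production code: the analyses, weights and threshold are invented, but the principle is the same — several weighted signals are summed, and a conclusion is accepted only when the total clears a threshold.

```python
# Minimal threshold logic unit (TLU) sketch: weighted signals are summed
# and the unit "fires" only when the total reaches a threshold.
# The scores, weights and threshold below are illustrative values only.

def tlu(signals, weights, threshold):
    """Return True if the weighted sum of the signals reaches the threshold."""
    total = sum(s * w for s, w in zip(signals, weights))
    return total >= threshold

# Three hypothetical analyses each score the same hypothesis (0.0 to 1.0)...
analysis_scores = [0.9, 0.4, 0.7]
# ...and each analysis is trusted to a different degree.
analysis_weights = [0.5, 0.2, 0.3]

print(tlu(analysis_scores, analysis_weights, threshold=0.6))  # True: 0.74 >= 0.6
```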

Something else NNs offer is an intriguing ability to perform better when they are thinking about different things at the same time(!). How does that work? Well, an NN is trained to perform a task and then, once it is performing well, the same network is trained to do something entirely different while retaining its original ability. The odd thing is that when you do this, the performance of the network on its original task improves. How’s that for bizarre?
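Again by way of illustration only (the tasks, sizes and data below are made up, and this is not our model), the usual arrangement is for two tasks to share the same underlying layers. Training on either task updates those shared layers, which is where the cross-task benefit comes from.

```python
# Multi-task sketch: one shared "body" feeds two task-specific heads.
# Training on task B also updates the shared layers learned for task A.
# All shapes, tasks and data here are invented for illustration.

import torch
import torch.nn as nn

shared = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # layers both tasks use
head_a = nn.Linear(32, 4)                             # e.g. the original task
head_b = nn.Linear(32, 2)                             # e.g. a second, different task

opt = torch.optim.SGD(
    list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters()),
    lr=0.1,
)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y, head):
    """One update: the chosen head *and* the shared body both learn."""
    opt.zero_grad()
    loss = loss_fn(head(shared(x)), y)
    loss.backward()
    opt.step()
    return loss.item()

# Fake batches for each task; in practice these would be real labelled data.
x_a, y_a = torch.randn(8, 16), torch.randint(0, 4, (8,))
x_b, y_b = torch.randn(8, 16), torch.randint(0, 2, (8,))

train_step(x_a, y_a, head_a)  # learn task A
train_step(x_b, y_b, head_b)  # learning task B also refines the shared body
```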

These two characteristics are also implicit in the work we have in hand on our Sparse Data Learning Model. Disparate but related analyses are run against the same data and the outputs combine to strengthen learning outcomes.

All in all it is fascinating, distinctly leading (bleeding?) edge, and building it into a useful machine for Bible translation is proving one of the most interesting and extensive projects we have ever undertaken.