New Understandings

It has been a little while since we were last in touch, but a lot has happened since then, both in terms of our work and in the wider world of Bible translation. One of the glories of translating the Bible is that it is one of relatively few church-related activities in which the majority of churches willingly collaborate. It has its challenges, it’s true, but the disparate fragments of Christendom are, on the whole, very willing to sit down together for the task of translating the Scriptures.

We were reminded of this recently by a communiqué from UBS celebrating an agreement with the Eastern and Oriental Orthodox Churches to collaborate with the UBS in “mutual commitment to continuing relations and collaboration in all aspects of activity related to the Holy Scripture”. This is the culmination of years of diplomacy: the dialogue which produced the agreement began in 1991 but has its roots in the founding of UBS in 1948.

UBS reported: “The agreement, a Memorandum of Understanding, is unique because it is the first agreement of the respective Churches with an organisation that is not another Church. It is also the first time that the Eastern Orthodox Church and the Oriental Orthodox Churches, who are not in communion with one another, have jointly signed a document like this”. Deo gratias.

Meanwhile, in what feels a little like another world, the team have been working hard turning ideas into runnable code. It is said that the two biggest threats to successful IT projects are complexity and novelty. Both have the capacity to derail the best plans (can you derail a plan? mmm…) and both are significantly present in our ParseBot system. Nevertheless, progress has been good and we are now close to having a solid test system in place.

Last Sunday the Church began a new year by looking forward once more to the coming Christ. The prophecies of His coming were read again and the certainty of His return reaffirmed. Yet 1.5 billion people still cannot hear these truths in their own language.

Broadening the Perspective

As we approach autumn there is much for which to be thankful in the work of the last few months. We have made steady progress with our Sparse Data Learning Model, and good outcomes were forthcoming from both the software symposium at the American Bible Society in Philadelphia and the ACL/DeepLo 2018 conference in Australia.

Major international conferences are excellent ways to get a synopsis of the ‘state of the art’ in a field. ACL 2018 was no exception and, coupled with the DeepLo conference later the same week, which focussed on low-resource languages, the two events provided a clear picture of how well, or otherwise, languages without strong commercial support are served by mainstream Machine Translation (MT).

When the MAT team was first set up by BFBS in 1991 the state of the art was considered to be Rule-Based MT (RBMT). Our work then in Statistical MT (SMT) was distinctly left-field. Over the next 15-20 years RBMT gradually lost ground to SMT, and by about 2010 Phrase-Based SMT (PBSMT) was established as the leading methodology for MT systems. Our own SMT systems have been available in ParaTExt since about 2007.

The last two years, however, have seen a sea-change in state-of-the-art MT. The advent of Neural MT (NMT) from Google (and others) in 2016 brought a new kid onto the block who seemed able to do things that bit better than the existing PBSMT systems. You might expect that we too would be looking to NMT for future systems, but the reality is a little different.

SMT systems, particularly PBSMT, require a lot of example data from which to learn the equivalences between a pair of texts. NMT, sadly, needs even more, to the point that using NMT to translate between language pairs with little training data is simply impractical. This limitation has led us to imagine other approaches, and it was the chance to examine our approach alongside mainstream NMT solutions that took Jon to Melbourne.

There were some interesting outcomes. Broadly speaking, NMT systems cannot contribute where low-resource languages are concerned. There are, however, some things we can borrow from NMT systems.

NMT is based upon what is sometimes called ‘Deep Learning’ (DL). DL uses multi-layered neural networks (NNs) to learn the correct outputs for a given input. This involves combining signals from many different nodes in the network to assess the validity of a hypothesis. NNs have a number of mechanisms for doing this, one of which (Threshold Logic Units) we have adapted to allow us to combine signals from different analyses into aggregate conclusions.
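For the technically curious, here is a minimal sketch of a threshold logic unit: a weighted sum of incoming signals compared against a threshold. The signal names, weights and threshold below are invented for illustration and are not those of our actual system.

```python
# Minimal sketch of a threshold logic unit (TLU): combine several
# weighted signals and compare the total against a threshold.

def tlu(signals, weights, threshold):
    """Return True if the weighted sum of the signals reaches the threshold."""
    total = sum(weights[name] * value for name, value in signals.items())
    return total >= threshold

# Hypothetical example: three analyses each score a candidate word as a
# proper name, and the TLU aggregates their evidence into one decision.
signals = {"spelling_match": 0.9, "collocation": 0.4, "frequency": 0.7}
weights = {"spelling_match": 0.5, "collocation": 0.2, "frequency": 0.3}

print(tlu(signals, weights, threshold=0.6))  # True: 0.45 + 0.08 + 0.21 = 0.74
```

In our setting the ‘signals’ would come from different analyses of the same text, each contributing its own evidence towards a single decision.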

Something else NNs offer is an intriguing ability to perform better when they are thinking about different things at the same time(!). How does that work? Well, a NN is trained to perform a task and then, once it is performing well, the same network is trained to do something entirely different while retaining its original ability. The odd thing is that when you do this, the performance of the network on its original task improves. How’s that for bizarre?
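For those who like to see the shape of such things, below is a minimal sketch (written in PyTorch purely for illustration) of the usual multi-task arrangement: a shared trunk of layers with a separate output head for each task, so that training on the second task also reshapes the representation the first task depends on. The layer sizes and task names are invented for the example.

```python
# Illustrative multi-task network: one shared trunk, two task-specific heads.
import torch
import torch.nn as nn

class SharedTrunkModel(nn.Module):
    def __init__(self, n_features=16, n_hidden=32, n_out_a=4, n_out_b=2):
        super().__init__()
        # Layers shared by both tasks
        self.trunk = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        # Separate output heads, one per task
        self.head_a = nn.Linear(n_hidden, n_out_a)   # the original task
        self.head_b = nn.Linear(n_hidden, n_out_b)   # the second, unrelated task

    def forward(self, x, task="a"):
        h = self.trunk(x)
        return self.head_a(h) if task == "a" else self.head_b(h)

model = SharedTrunkModel()
x = torch.randn(8, 16)                # a batch of 8 illustrative inputs
print(model(x, task="a").shape)       # torch.Size([8, 4])
print(model(x, task="b").shape)       # torch.Size([8, 2])
```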

These two characteristics are also implicit in the work we have in hand in our Sparse Data Learning Model. Disparate but related analyses are run against the same data and the outputs combine to strengthen learning outcomes.

All in all it is fascinating, distinctly leading (bleeding?) edge, and building it into a useful machine for Bible translation is proving one of the most interesting and extensive projects we have ever undertaken.

 

Working together

Chiang Mai report

Both Jon and Neil travelled to Payap University in Chiang Mai early in May for meetings with colleagues working with SIL/Wycliffe and GBI. The focus was on discovering and exploiting the ways in which our research efforts might support one another’s work, and the outcomes were very encouraging. Our own focus on language learning machines was strongly endorsed and we can already see ways in which this work might be exploited in systems under development by colleagues elsewhere.

The whole area of machine learning and automatic translation technologies is an increasingly important topic for Bible translation, and our key funders are known to be keen to exploit such technological developments. Keeping our work central to this is important so that opportunities are not missed and the widest possible benefits are secured for translators.

Soon after our return from Chiang Mai we received an invitation to contribute to an important discussion with ETEN, our major funders, at the American Bible Society HQ in Philadelphia, PA, in July. This meeting will bring together the Chiang Mai participants with a wider group from other organisations working in the same field, in discussion with funders. This is an important meeting for us. The work we do is primary research in a highly complex, technical field. Building confidence and understanding with funders is key to the long-term outcomes towards which we are working. Jon travels out to PA on 10th July for this meeting.

DeepLo 2018

The response to our request for help with the costs of attending the DeepLo conference on MAT for low-resource languages in Melbourne was strong, and Jon travels on from PA to Australia (via LHR) on 13th-16th July to attend ACL 2018 and then DeepLo from 16th-19th July. Thank you to all those who responded so generously. We are hopeful that we shall not only have the opportunity to present our work in this key research forum but also to learn more about other initiatives in support of low-resource languages from the wider research community.
This is a punishing schedule but these two events will be key for the project in terms both of wider support and future research directions.

Less is More

In a world where we are surrounded by texts it can be strange to realise that sometimes the first piece of text ever written in a language is the one being typed by the Bible translator. Our Sparse Data Research project looks to glean as much information about a language as possible from the very start of a project, perhaps from very small amounts of text. Research is showing good results after just ten verses have been translated.

In fact we find that more data doesn’t necessarily help this process. Small groups of verses will often have related words, forms and names and be in a consistent genre. When more data is added we find the ‘noise’ of this new data can drown out the ‘signal’ that can be found in the small group of verses.
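A toy illustration of the effect (with invented tokens rather than real project data): a pattern that dominates a small, consistent sample becomes statistically faint once a larger, less-related block of text is mixed in.

```python
# Toy demonstration of 'noise' drowning 'signal' as more data is added.
from collections import Counter

focused = ("yesu akasema neno " * 10).split()                    # small, consistent sample
mixed = focused + ("watu wengi mji mkubwa nchi " * 40).split()   # plus unrelated material

def relative_frequency(tokens, target):
    return Counter(tokens)[target] / len(tokens)

print(round(relative_frequency(focused, "yesu"), 3))  # 0.333 -- the pattern stands out
print(round(relative_frequency(mixed, "yesu"), 3))    # 0.043 -- same pattern, now buried
```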

Less truly is more in these circumstances. One could almost say there’s a ‘still, small voice’ waiting to be heard above the loudness of the less focussed approach.

Less is, unfortunately, not always more where budgets are concerned. An invitation to participate in a workshop focussed precisely on the problem of working in low-resource languages (DeepLo 2018) is both welcome and timely. DeepLo is the first forum at which machine learning specialists and computational linguists are invited to meet and discuss these issues, and our work is central to their focus. The invitation is an affirmation of our work, but the venue, Melbourne, Australia, would make a huge hole in our travel budget for the year. Nevertheless, the opportunity not only to present our work but also to hear from others how they are approaching these issues is important. If you would like to make a donation towards the costs of attending the DeepLo workshop, please do so here.

Getting a wider team together

The Paralexica team will be in Chiang Mai, Thailand in May to meet colleagues looking at progress in computer-aided translation projects and, specifically, at how our research might help. On-line meetings have already taken place with a larger group to prepare for the meetings in Chiang Mai, to ensure the time is used as efficiently as possible. Bringing a focus to these meetings is good, and the chance to meet colleagues face-to-face is so important. Having time to chat over a meal and to let the team run with an idea in an unstructured way often brings good outcomes. Those back-of-the-envelope (or napkin) scribblings can be highly fruitful!

Chiang Mai was chosen for our meetings partly because there is a great deal of Bible translation work in South East Asia, where many communities are still in need of scripture in their own language. The difficulty of providing good linguistic analysis for languages with little or no resources is well recognised here.

 

Dealing with Distraction

2017 was an exciting year for the team as we developed our plans for Sparse Data Analysis and presented proposals to colleagues and others at meetings and conferences. As we begin 2018 there seems to be a mountain of work ahead to realise the potential in these plans. Keeping focussed and energised (particularly on dark February days) can sometimes be hard.

As we move into 2018 there are a number of distractions around. Technical setbacks can delay progress, and the increasing need to justify our work as funding for global translation comes under pressure can feel like a distraction when there is so much to do.

The lady holding her Bible on the right is Ozoonwa Nyumbe from South Sudan. This is where our focus needs to be. Threats to funding are not just a distraction to us in our work; in the end they mean that people like Ozoonwa may not be able to discover God’s love through scripture in their own language.

Please pray,

  • Giving thanks for:
    • the support of friends and colleagues worldwide
    • lengthening days and the approach of spring
    • all who contribute to the task of Bible translation
  • Praying for:
    • stable and sufficient funding for long-term projects
    • clarity and understanding as the team wrestle with complex problems
    • a right focus and freedom from distraction

Christmas transformations

This time of year is all about transformation. This weekend I will be changing the appearance of my home by festooning a couple of hundred small light bulbs over the front of the house and garden. I will, no doubt, change my diet over the next few weeks, increasing by an alarming amount my calorie intake. Soon days will start to get longer as we head into the new year.

The nativity story seems to me to be one of radical transformation. Everyone involved sees great change of some sort: changes not only of physical location, but also of how life might be henceforth. Many make a journey, often of considerable length and probable hardship. One imagines shepherds telling campfire stories of angels and the special baby for generations; Magi realising the implications for themselves and the world when the heavens reveal the coming of a king; Mary and Joseph wondering what life will look like in the coming years and coming to terms with it not looking like what they might have planned. All respond to a ‘call’ of some sort. This call pays no regard to social standing, wealth or intellect. All are invited (or somehow compelled?) to take part in the greatest transformation and mystery of all – that of the God of the universe becoming incarnate as a baby. Once again at Christmas we too are invited to consider this great transformation and how our life might henceforth change.

I don’t think it is too much to say the Bible is part of this transformation. When the Bible comes to a community in its heart language it brings with it change for both individuals and the whole community. This article shows what happened when the new Beembe translation arrived at a village in Congo-Brazzaville. Through your support we hope our work can continue to be part of this transformation.

The end of another year inevitably brings a look back to what has been achieved and a look forward to plans for the next year. We are grateful that the Singapore meetings in November saw affirmation that our research is bringing good results and should be carried forward; specifically, that we should instigate further testing early in 2017 with a view to building production software later in the year.

Please pray:

  • Give thanks for the success of the UBS iCAP and AsLing conferences in November.
  • Give thanks for affirmation of current work and guidance for future developments.
  • For successful future testing and development work into 2017.

One of our supporting fellowships here in the UK has set up an on-line donation page on MyDonate where you can send a donation and get the benefit of Gift Aid as well. We shall need to replace some fairly elderly computers soon and hope to use these funds towards the cost (about £4,000). If you’d prefer to donate via more traditional means please contact us at gtp@biblesocieties.org and we shall send you details.

Thank you very much!


Almighty God, you have poured upon us the new light of your incarnate Word: Grant that this light, enkindled in our hearts, may shine forth in our lives; through Jesus Christ our Lord, who lives and reigns with you, in the unity of the Holy Spirit, one God, now and for ever. Amen.

We pray you have a peaceful and blessed Christmas time.

As we move into the new year this prayer reminds us of the impact the incarnation can have in our lives and in the lives of those around us.

Thank you for your continued prayers and support.

ASLING TC39

Friday 17th November 2017 found us in London attending the Association internationale pour la promotion des technologies Linguistiques (#ASLING) 39th Translating and the Computer conference (#TC39). The conference is held annually in London at the Institution of Mechanical Engineers on Birdcage Walk, under the watchful eye of George Stephenson, whose portrait hangs in the conference room.

The conference attracts a wide following of computational linguists and translators in equal measure, representing academic researchers, commercial translation providers and government agencies, including EU and UN translation services. In this respect the TC conference is, in our experience, a unique blend of research, reality and pragmatism, and as such it represents a highly knowledgeable arena within which to present our work for peer review.

We have presented here on many occasions (see: publication list) and our work has always been well received (as evidenced by the number of return invitations we get). This year we presented our early research on Learning from Sparse Data or, as we call it, Project Paddington. We were not at all sure how well this would go down. The peer reviewers’ comments on our paper had not been entirely enthusiastic, not least because the approach we are working with is, for good reasons, diametrically opposed to most current research in Machine Translation. Some of the reviewers clearly felt we ought to fall into line. So it was with a little trepidation that Jon clambered onto the rostrum to present our paper in the very last session of the conference.

To our relief (and, to be honest, some surprise) the response was enthusiastic. The first comment during questions came from the CEO of a commercial MT provider, who thanked us for the paper and then went on to say, “This is fantastic work. I have always felt that this kind of model is how our systems should be approaching language learning; it just feels right, and now you have demonstrated that it can work! I am going to model this as soon as possible, thank you!” Further, equally supportive questions and comments followed until the session had to be closed to allow the conference as a whole to be formally ended. As we left the building half an hour later we were still in deep discussion with other delegates about our work, including the conference keynote speaker, Prof. Alexander Waibel of the Carnegie Mellon University International Center (sic) for Advanced Communication Technologies, who identified many points of contact with the work of his department and mainstream MT challenges.

All in all, it was a very pleasing outcome and we came away much encouraged that our recent research is very much at the forefront of developing language technologies.

ParseBots, Language Models and a New Name

It has been rather too long since our last newsletter. Our only excuse is that we have been busy but the summer holiday season has offered us a chance to draw breath at last.

7000++ => ParaLexica

The first thing we need to share is that we have decided to rebrand the team. With so much of our work so closely coupled to the UBS ParaTExt project, we felt a better name for the project would be ParaLexica, which has something of a family feel to it alongside ParaTExt as well as sounding vaguely linguistic. We have a nice new website – paralexica.net – to go with our new name and a blog – paralexica.net/theblog/ – to which copies of these bulletins will be posted.

Some of you may remember a bulletin in April entitled Aeroplanes, Jet Lag and Sparse Learning. Thankfully, there have been no more aeroplanes since then and the jet lag has passed but the question of Learning from Sparse Data is still very much with us.

The biggest limitation of all Machine Translation (MT) systems is the need to train the system with lots of examples before it can function. In the case of an NT translation this means that much of the task may be complete before there is enough data to train the machine to help. Colleagues in both UBS and SIL have been encouraging us to consider how our systems might be brought on line earlier in a project, perhaps even from day one.

We set out to imagine a way for a machine to learn right from the very start of a translation project. This proved a very fruitful exercise; it is astonishing what can be imagined once you put aside the idea that something cannot be done. The idea began in an office in the SIL centre in Dallas, was developed further at EACL 2017 in Valencia and finally gathered some flesh on its bones in the first part of the summer.

We can think of Language Learning as having three phases: Discovery, Validation and Verification. Discovery is the task of recognising structures or patterns in a language that may represent meaning or function. Validation is convincing ourselves that a pattern is worth investigating further. Verification is seeking confirmation that the analysis is good. Much of the early summer was devoted to modelling this concept and we have been very pleased with the results.
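As a sketch of the shape of that model (our own illustrative Python, not the project’s actual code), Discovery proposes candidate patterns, Validation filters those worth pursuing, and Verification asks an external judge, ultimately the translator, to confirm. The suffix-counting heuristic, the threshold and the toy verses are all invented for the example.

```python
# Toy three-phase pipeline: Discovery -> Validation -> Verification.
from dataclasses import dataclass

@dataclass
class Pattern:
    description: str
    score: float          # how strongly the data supports the pattern

def discover(verses):
    """Discovery: propose candidate patterns from the text (toy version)."""
    counts = {}
    for word in " ".join(verses).split():
        counts[word[-2:]] = counts.get(word[-2:], 0) + 1
    return [Pattern(f"suffix -{s}", float(n)) for s, n in counts.items()]

def validate(patterns, threshold=2.0):
    """Validation: keep only patterns that look worth investigating further."""
    return [p for p in patterns if p.score >= threshold]

def verify(patterns, confirm):
    """Verification: ask an external judge (ultimately the translator) to confirm."""
    return [p for p in patterns if confirm(p)]

verses = ["akaenda akasema", "wakaenda wakasema", "akaimba"]
candidates = validate(discover(verses))
confirmed = verify(candidates, confirm=lambda p: True)  # auto-confirm for the demo
print([p.description for p in confirmed])   # ['suffix -da', 'suffix -ma']
```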

Introducing Project Paddington

We are calling the idea Project Paddington (PB). Why? Well, unusually, it is not just the name of a bear. As we began to model the discovery stage of the process we imagined a whole set of ‘Bots’, each of which was able to parse one element of natural language. This gave us morphBots, nameBots, syntaxBots, stemBots and so on. Collectively we thought of them as parseBots (PB), at which point a name for the project became obvious.
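To give a flavour of the idea, here is an illustrative sketch of how a family of parseBots might share one interface and pool their findings. The bot names echo those above; the interface, the toy heuristics and the pooling step are our own guesses for the purpose of the example, not the actual design.

```python
# Illustrative parseBot family: small analysers with a common interface
# whose per-verse findings are pooled into one view.
from typing import Protocol

class ParseBot(Protocol):
    name: str
    def parse(self, verse: str) -> dict:
        """Return {finding: confidence} for one verse."""
        ...

class NameBot:
    name = "nameBot"
    def parse(self, verse):
        # Toy heuristic: capitalised words are candidate proper names.
        return {w: 0.8 for w in verse.split() if w[:1].isupper()}

class MorphBot:
    name = "morphBot"
    def parse(self, verse):
        # Toy heuristic: note words sharing a common ending.
        return {w: 0.6 for w in verse.split() if w.endswith("ing")}

def aggregate(bots, verse):
    """Pool every bot's findings into a single view of the verse."""
    findings = {}
    for bot in bots:
        for item, confidence in bot.parse(verse).items():
            findings.setdefault(item, []).append((bot.name, confidence))
    return findings

print(aggregate([NameBot(), MorphBot()], "Simon was fishing beside Galilee"))
# {'Simon': [('nameBot', 0.8)], 'Galilee': [('nameBot', 0.8)], 'fishing': [('morphBot', 0.6)]}
```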

Turning this proposal into a viable process is a lot of work. Our existing systems (not least CogNomen) can drive many of the Bots but others will need to be developed from scratch. This task alone would easily consume the remaining funding we have and that is before we have begun to consider how best to aggregate the results from the various Bots into a coherent model for the language. Equally, designing and building a verification process which allows the (non-technical) translator to verify what the Bots are learning is not trivial.

We took the proposal to UBS, showed them a demo and asked the question: Do we put this to one side until we have finished the current work schedule or should we make this a priority? The response was: This is a very exciting development which keeps the translator at the heart of the process. We like it a lot and we want you to prioritise it.

So, we are embarking on a major piece of work of a scale well beyond the capacity of our current funding. Did we mention prayer..?

Figure and Ground…

Sometimes it can be difficult to see the wood for the trees, but equally you need to see the big picture to understand the detail. The team has had a lot of detail to focus on over the last month or two: Neil has been hard at work putting our names finder through final testing, and the EACL conference provided a wealth of information on related research elsewhere. Yet amongst this wealth of minutiae common issues are becoming clearer and a bigger picture has begun to emerge.

Two threads in particular can be discerned. One, Sparse Data Learning, we wrote about in the last newsletter; the other is so much part of our world that it has become part of the scenery. Electronics engineers call it Signal and Noise. Finding a structure or pattern in a text is all about recognising that pattern from amongst the noise around it. For CogNomen, the pattern is the string of letters that represents a proper name and the noise is all the other words collocated with that string, some of which will share some of the letters in the pattern. Working out which of the words is the name you want is an exercise in finding a signal in the midst of noisy distractions. Only if the Signal-to-Noise Ratio (SNR) is in our favour can the pattern be found.
Just as CogNomen (CGN) is seeking a particular pattern in the text stream, so a morphology analyser is looking for common patterns that represent the way words change their function, and a glossing engine is trying to find patterns in the places where a particular word or phrase ought to be.

All of these are exercises in discerning signal from noise, or recognising figure from ground. Interestingly, once you have found a figure, if you then treat the figure as ground you may well discover more signals in what was noise, just like the Escher drawing above. An example would be a machine that can discover morphology templates in a language. Once it has recognised the English -ed and -ing suffixes, if we then think of those signals as noise and mask them out, stems of words emerge from the background: learned, learning => learn*. Then, by flipping the signal and noise again, we find learner, learns, learnt and so on.
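Here is the same figure-and-ground flip as a small, self-contained sketch (toy word list and suffixes only): first the known suffixes are treated as the figure and masked out to expose stems, then the stems become the pattern and further forms built on them emerge from the remaining words.

```python
# Toy figure/ground flip for morphology discovery.
KNOWN_SUFFIXES = ("ing", "ed")
words = ["learned", "learning", "learner", "learns", "learnt", "turned", "turning"]

# Pass 1: mask the suffixes (the original 'signal') to reveal stems.
stems = {w[:-len(s)] for w in words for s in KNOWN_SUFFIXES if w.endswith(s)}
print(sorted(stems))  # ['learn', 'turn']

# Pass 2: flip figure and ground -- with the stems now treated as the pattern,
# further forms built on them stand out from the remaining words.
for stem in sorted(stems):
    related = [w for w in words
               if w.startswith(stem) and not any(w == stem + s for s in KNOWN_SUFFIXES)]
    print(stem, "->", related)
# learn -> ['learner', 'learns', 'learnt']
# turn -> []
```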

It’s a bit like Thomas and Philip: “How can we know the way?”, “Show us the Father!” The answer is always ‘look for the pattern’. In the end, that’s what this work is all about: helping others recognise the pattern of Christ in the scriptures. It’s good to remember the big picture.

Please pray:

  • giving thanks
    • for a good conference at EACL 2017 in Spain and for the wealth of new ideas it has generated for us,
    • that our funding is now secure until May 2018,
    • that our hardware appeal has reached its target and we are now able to replace ageing systems
  • and for wisdom as we plan for the next 12 months.

O Almighty God, whom truly to know is everlasting life; Grant us perfectly to know thy Son Jesus Christ to be the way, the truth, and the life; that, following the steps of thy holy Apostle, Saint Philip we may stedfastly walk in the way that leadeth to eternal life; through the same thy Son Jesus Christ our Lord.
Amen.

Aeroplanes, Jet Lag and Sparse Learning…

March has been a strange sort of month for the team. There has been a lot of travelling, especially for Jon, who is still trying to manage the after-effects of an 8-hour shift one way closely followed by a 6-hour shift the other. Some of us think he’s always been a bit vague… Travelling can not only be tiring but it can also get in the way of what we think of as the real work, so there has to be a good reason for one of us to clamber onto an aeroplane.

In this case the request from UBS to evaluate a system proposed by colleagues elsewhere proved well worth the hassle of aeroplanes and jet lag. Jon travelled to Chiang Mai to learn more of the work under evaluation and then both Jon and Neil flew out to Dallas to help create a development roadmap and consider where our systems might contribute. Colleagues from other Bible Societies and Mission Agencies took part and it was good to renew acquaintance with old friends, some not seen for many years.

The good news for us was that our current R&D strategy is already closely aligned with the opportunities created by this new initiative, and the time spent in Chiang Mai and Dallas has also proved very fruitful in encouraging us to exploit our systems in new and exciting contexts. One observation which emerged from the discussions was the need for translators to benefit from MAT systems at the earliest stages of a translation. This has encouraged us to think much harder about how our systems can learn from very sparse data. Sparse Learning, or Incremental Learning as it is sometimes called, is a big challenge for machines. We are looking forward to exploring what might be done.

There is one more aeroplane to negotiate before Easter (when we were first planning for 2017 we imagined it would be the only one in the first half of the year!) as both Jon and Neil are attending the Association for Computational Linguistics European chapter conference, EACL 2017. This is a triennial event, held this time in Valencia in Spain. It is an opportunity for us to learn more of other people’s work, to see where the trends in wider machine translation are leading and how Bible translation might benefit.

Please pray:

  • that new collaborations may bear good fruit,
  • giving thanks for travelling mercies in February and March and praying for safe journeys to and from EACL 2017 in Spain.
  • for focus and clarity as we work towards delivering our names finder module for ParaTExt,
  • and for space to imagine new ways of helping translators worldwide.

Don’t forget you can still contribute to our computer hardware costs on-line at our donation page on MyDonate and get the benefit of Gift Aid as well. If you’d prefer to donate via more traditional means please contact us and we shall send you details.


God of love,
passionate and strong,
tender and careful:
watch over us and hold us
all the days of our life;
through Jesus Christ our Lord.
Amen.