In a world where we are surrounded by texts, it can be strange to realise that sometimes the first ever piece of text written in a language is the one being typed by the Bible translator. Our Sparse Data Research project looks to glean as much information about a language as possible at the very start of a project, perhaps from very small amounts of text. Research is showing good results after just ten verses have been translated.
In fact we find that more data doesn’t necessarily help this process. Small groups of verses will often have related words, forms and names and be in a consistent genre. When more data is added we find the ‘noise’ of this new data can drown out the ‘signal’ that can be found in the small group of verses.
Less truly is more in these circumstances. One could almost say there’s a ‘still, small voice’ waiting to be heard above the loudness of the less focussed approach.
Less is, unfortunately, not always more where budgets are concerned. An invitation to participate in a workshop focussed precisely on the problem of working in ‘low-resource languages’ (DeepLo 2018) is both welcome and timely. DeepLo is the first forum at which machine learning specialists and computational linguists are invited to meet and discuss these issues. Our work is central to their focus. The invitation is an affirmation of our work, but the venue, Melbourne, Australia, would make a huge hole in our travel budget for the year. Nevertheless, the opportunity not only to present our work but also to hear from others how they are approaching these issues is important. If you would like to make a donation towards the costs of attending the DeepLo workshop, please do so here.
Getting a wider team together
The Paralexica team will be in Chiang Mai, Thailand in May to meet colleagues looking at progress in computer-aided translation projects and specifically at how our research might help. On-line meetings have already taken place with a larger group to prepare for the meetings in Chiang Mai, to ensure the time is used as efficiently as possible. Bringing a focus to these meetings is good, and the chance to meet colleagues face-to-face is so important. Having the time to chat over a meal, to let the team run with an idea in an unstructured way, often brings good outcomes. Those back-of-the-envelope or napkin scribblings can be highly fruitful!
Chiang Mai was chosen for our meetings partly because there is a large Bible translation work in South East Asia, where many communities are still in need of scripture in their own language. The difficulties of providing good linguistic analysis for languages with little or no resources are well recognised here.