Sometimes it can be difficult to see the wood for the trees, yet you also need to see the big picture to understand the detail. The team has had a lot of detail to focus on over the last month or two. Neil has been hard at work putting our names finder through final testing, and the EACL conference provided a wealth of information on related research elsewhere. Amid all this detail, though, common issues are becoming clearer and a bigger picture has begun to emerge.
Two threads in particular can be discerned. One, Sparse Data Learning, we wrote about in the last newsletter; the other is so much a part of our world that it has become part of the scenery. Electronics engineers call it Signal and Noise. Finding a structure or pattern in a text is all about recognising that pattern amid the noise around it. For CogNomen, the pattern is the sequence of letters that represents a proper name, and the noise is all the other words that occur alongside it, some of which will share some of the letters in the pattern. Working out which of the words is the name you want is an exercise in finding a signal in the midst of noisy distractions. Only if the signal-to-noise ratio (SNR) is in our favour can the pattern be found.
Just as CogNomen seeks a particular pattern in the text stream, so a morphology analyser looks for the common patterns that represent the way words change their function, and a glossing engine tries to find patterns in the places where a particular word or phrase ought to be.
All of these are exercises in discerning signal from noise, or distinguishing figure from ground. Interestingly, once you have found a figure, if you then treat the figure as ground you may well discover more signals in what was noise, just like the Escher drawing above. An example would be a machine that can discover morphology templates in a language. Once it has recognised the English -ed and -ing suffixes, if we then treat those signals as noise and mask them out, word stems emerge from the background: learned, learning => learn*. Flipping signal and noise again, we find learner, learns, learnt, and so on.
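For readers curious how that figure/ground flip might look in practice, here is a minimal Python sketch. It is not the team's morphology analyser: the function names `find_stems` and `find_new_suffixes`, the toy word list, and the naive string matching are all hypothetical, chosen only to illustrate the idea of masking known suffixes to reveal stems and then masking the stems to reveal new suffixes.

```python
from collections import defaultdict

# Hypothetical illustration only; not the team's actual analyser.

def find_stems(words, known_suffixes):
    """Treat known suffixes as noise: whatever remains once one is stripped is a candidate stem."""
    stems = set()
    for word in words:
        for suffix in known_suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stems.add(word[: -len(suffix)])
    return stems

def find_new_suffixes(words, stems, known_suffixes):
    """Flip figure and ground: treat the stems as noise and collect what follows them."""
    candidates = defaultdict(set)
    for word in words:
        for stem in stems:
            if word.startswith(stem) and len(word) > len(stem):
                suffix = word[len(stem):]
                if suffix not in known_suffixes:
                    candidates[suffix].add(word)
    return dict(candidates)

words = ["learned", "learning", "learner", "learns", "learnt",
         "walked", "walking", "walks"]
known = ["ed", "ing"]

stems = find_stems(words, known)
new_suffixes = find_new_suffixes(words, stems, known)
print(sorted(stems))
print({s: sorted(ws) for s, ws in sorted(new_suffixes.items())})
```

On this toy list the stems found are `learn` and `walk`, and the second pass turns up `er`, `s` and `t` as candidate suffixes. A real analyser would need statistics to separate genuine suffixes from chance overlaps, which is exactly where the signal-to-noise question returns.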
It’s a bit like Thomas and Philip: “How can we know the way?” and “Show us the Father!” The answer is always ‘look for the pattern’. In the end, that’s what this work is all about: helping others recognise the pattern of Christ in the scriptures. It’s good to remember the big picture.
- giving thanks
- for a good conference at EACL 2017 in Spain and for the wealth of new ideas it has generated for us,
- that our funding is now secure until May 2018,
- that our hardware appeal has reached its target and we are now able to replace ageing systems,
- and for wisdom as we plan for the next 12 months.