Umap Cymraeg, the Welsh language version of Umap, has now been up and running for about three weeks. I thought it high time I gave a quick explanation in English of what it is and who's behind it, so that curious non-Welsh speakers can chip in with any comments or questions they have about the service.
I've given the full background to the project in the Welsh language post, so a Google Translation will have to suffice if you want the details. In short, Luistxo Fernandez, a web developer for CodeSyntax in Eibar in the Basque Country, contacted me to ask if I wanted to collaborate on a Welsh language version of their already operational Basque and Catalan tweet aggregators. I was only too happy to oblige, seeing as my current job is studying towards a PhD on online participatory culture and the Welsh language. Serendipity is a wonderful thing.
What does Umap do?
Umap Cymraeg does several things:
- it builds a database of Twitter users who have used Welsh in their tweets (or have the potential to do so). Users are added manually at first, but the system then begins to add users by itself;
- it filters these users' tweets for the ones that are in Welsh, then publishes them on the main page;
- it analyses this stream of Welsh tweets for popular words and hashtags and creates hourly, daily, weekly and monthly lists of top trending topics;
- it analyses the links in Welsh language tweets to discover which are most popular, giving a dynamic chart of top news and shared links.
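This post doesn't describe how Umap's engine is written, so what follows is only a rough sketch, under stated assumptions, of how the trending-topics and top-links lists could be computed: take tweets already classified as Welsh, keep those posted within a time window (an hour, a day, a week or a month), and count hashtags and links. The regular expressions, the function names `count_matches` and `top_trends`, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Simplified patterns (assumptions, not Umap's actual rules). In Python 3, \w is
# Unicode-aware, so Welsh letters such as ŵ and ŷ stay inside a hashtag.
HASHTAG_RE = re.compile(r"#\w+")
LINK_RE = re.compile(r"https?://\S+")  # naive: trailing punctuation is kept


def count_matches(texts, pattern, normalise=lambda s: s):
    """Count how many tweets mention each match (repeats within one tweet count once)."""
    counts = Counter()
    for text in texts:
        counts.update({normalise(m) for m in pattern.findall(text)})
    return counts


def top_trends(tweets, now, window, n=10):
    """Return the n most common hashtags and links among (timestamp, text) tweets
    posted between now - window and now, inclusive."""
    recent = [text for ts, text in tweets if now - window <= ts <= now]
    return {
        # Design choice: fold case so #Eisteddfod and #eisteddfod count as one tag.
        "hashtags": count_matches(recent, HASHTAG_RE, str.lower).most_common(n),
        # URL paths can be case-sensitive, so links are counted exactly as written.
        "links": count_matches(recent, LINK_RE).most_common(n),
    }


# Hypothetical sample data: three tweets assumed to be already classified as Welsh.
now = datetime(2024, 1, 1, 13, 0)
tweets = [
    (datetime(2024, 1, 1, 12, 15), "Noswaith dda! #eisteddfod http://example.com/a"),
    (datetime(2024, 1, 1, 12, 40), "Gwych #Eisteddfod #cymraeg"),
    (datetime(2024, 1, 1, 9, 0), "Hen neges #cymraeg http://example.com/a"),  # outside the hour
]
print(top_trends(tweets, now, timedelta(hours=1)))
# {'hashtags': [('#eisteddfod', 2), ('#cymraeg', 1)], 'links': [('http://example.com/a', 1)]}
```

Passing `timedelta(days=1)`, `timedelta(weeks=1)` and so on gives the daily and weekly lists from the same code; the harder part in practice is the Welsh-language filter that decides which tweets reach this step.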
How accurate is it?
At present, going by rough calculations and simple observation, it seems to recognise about 60-65% of Welsh language tweets. Problems arise when tweets are short or mix Welsh with another language. It is, however, very good at not publishing tweets that aren't in Welsh, with relatively few English-only tweets getting through. I hope we can improve the recognition rate as we go on, but even at this level it seems good enough to give an overview of the public discussion in Welsh.
Why is Umap needed?
Some may not quite see the point of duplicating content from Twitter, but the aim is to provide a space in which Welsh tweets are seen in context with each other, rather than in a sea of other content. Umap creates a distinct Welsh language space out of a very complex and busy Twitter stream.
Apart from being able to dip into a monolingual Welsh Twitter, this has several other spin-off advantages:
- This is the first time that trends amongst Welsh language tweets have been identified. From a mapping perspective alone this is useful, but it can also be harnessed for marketing, discovery, brand and news monitoring, and other uses.
- Umap's top news/links feed offers a new way of seeing what the conversation is about at any given moment. No service has previously been able to gauge automatically what is making waves online in the Welsh language. Of course, it can also show that little is shared, or that the range of things shared in Welsh is limited, but even that is valuable for building a better discussion in the Welsh language. Spot the gaps, and we can then find ways to fill them.
- Umap can work as a first point of contact for new Twitter users who want to discover other Welsh speakers. More than 1,000 users are now being followed, and the 50 busiest are ranked on a page. This is already being used as a signpost to where the Welsh language conversation is happening.
- Umap can be built upon. Twitter only displays tweets for a limited time, whereas Umap displays them until it runs out of space, and all published tweets are archived and searchable. This archive could be valuable as a research tool or as a corpus of informal Welsh language usage. In future I hope developers will be able to use this data for secondary applications of all sorts. After all, one of the great things about Twitter is that services like Umap can be built on top of it, allowing for many innovations.
Lastly, I want to point out again that Umap has been developed by Luistxo Fernandez’s team in CodeSyntax, with a little help on the Welsh version from me. If you have an idea about using Umap’s engine for other languages or groups of Twitter users (including English) then please get in touch with them.
As I said above, serendipity is a wonderful thing. Umap Cymraeg tries to make it happen more often in the Welsh language.