Bug #464
open
Task #1: Define the list of applications
Task #6: Propose additional software for inclusion
Feature #340: Add tanglet, a Boggle-like game
Tanglet should work with more languages
Added by Jean-Michel Philippe about 13 years ago.
Updated over 12 years ago.
Due date:
09/30/2012 (over 12 years late)
Description
Currently Tanglet only works for Czech, English and French. Dictionaries are missing for other languages. We need to contact the upstream author to find out how to make a dictionary for a new language. His instructions must then be tested on a new language and written up on a new page of our website.
- Target version changed from 2012-02 to 2012-08
Looking at the source code, it seems Tanglet only needs a word list for each language. The word list is a compressed plain text file. One way to get such word lists is to extract them from Aspell dictionaries, as explained in the following blog post:
http://typethinker.blogspot.fr/2008/02/fun-with-aspell-word-lists.html
Trials showed that this is quite fast, but the output contains possibly unwanted words:
- for Italian, many combinations of articles and words like “quell'abarico”
- for French, many accented letters that Tanglet does not offer, for ease of play
The first case can be solved by removing words that contain a single quote, while the second requires a tool like unaccent. Unaccent has to be run on each line separately (otherwise it crashes), which is unexpectedly fast. On the other hand, this produces duplicate words, which then have to be filtered out with uniq; that step is a bit slow but acceptable.
Sample scripts:
$ sudo apt-get install aspell aspell-it aspell-fr
$ aspell -l it dump master | aspell -l it expand | sed -r 's/ [^ ]+'"'"'[^ ]+//g' | less
$ aspell -l fr dump master | aspell -l fr expand | while IFS= read -r LINE; do printf '%s\n' "$LINE" | unaccent UTF8; done | uniq | less
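One caveat on the deduplication step: uniq only drops duplicate lines that are adjacent, so duplicates created by accent stripping can survive if they do not end up next to each other. Below is a minimal sketch of an alternative, assuming the order of the word list does not matter to Tanglet (not checked here); dedupe_words is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: remove duplicate words from stdin, one word per line.
# sort -u sorts first, so duplicates are removed even when not adjacent.
# LC_ALL=C gives a byte-wise ordering that does not depend on the locale.
dedupe_words() {
    LC_ALL=C sort -u
}

# "eleve" appears twice, but not on adjacent lines
printf 'eleve\nete\neleve\n' | uniq | wc -l          # uniq keeps all 3 lines
printf 'eleve\nete\neleve\n' | dedupe_words | wc -l  # 2 unique words remain
```

In the French command, this would take the place of the final uniq.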
In the French extraction from Aspell, words containing single quotes have to be removed too. In the Italian output, there are several words per line, as well as names of people, places, etc., starting with an uppercase letter. This can be solved using:
$ tr ' ' '\n' | grep -o '^[a-z]*'
Finally, the Italian list needs to be run through unaccent too.
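The Italian steps above can be combined into one filter. This is only a sketch under assumptions: the input has already been through unaccent, only the letters a to z are wanted, and clean_words is a hypothetical name. Unlike grep -o '^[a-z]*', which prints the leading lowercase part of a line (so quell'abarico would yield quell), this version drops any word that contains another character.

```shell
#!/bin/sh
# Hypothetical filter: reads aspell-style output on stdin and prints one
# candidate word per line.
clean_words() {
    # 1. split lines holding several words into one word per line
    # 2. keep only words made entirely of lowercase ASCII letters, which
    #    drops proper nouns (uppercase initial) and words with an apostrophe
    # 3. sort and remove duplicates
    tr ' ' '\n' | LC_ALL=C grep -x '[a-z][a-z]*' | LC_ALL=C sort -u
}
```

The output of the aspell and unaccent steps could then be piped through clean_words before the list is saved.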
Tanglet also has a project on Transifex. Our translation duplicates it, but ours is more advanced. We can easily offer our work upstream.
Yes, you're right. I'm actually in touch with the author of Tanglet because I found that it currently offers too few languages to play with. So I wrote a script that collects words from dictionary files to build Tanglet word lists. I haven't talked to him about Transifex yet; I should.
Note that this is a recurring problem: we have many translations on Transifex that never go back to upstream, although some of our translators took the time to contact upstream projects. If you think you'd have time to coordinate between our Transifex project and upstream, you're welcome!
At the moment I am very busy proofreading Linux From Scratch. I have added this to my to-do list, but I would first like to correct the pages of the DoudouLinux website and complete the French translation.
No problem, we're not in a hurry!
Of course not. I try to do at least one small thing every week.
I've recently received two scripts from the upstream developer. I can now generate dice from lists of words and generate a list of words suitable for the most recent versions of Tanglet (not the Squeeze one, we don't need it for this version). All the scripts have been successfully tested with several languages, including Greek and Russian, which use non-Latin alphabets.
The generated files will be uploaded to:
http://media.doudoulinux.org/tanglet/
Right now there are only the lists of words. We need to put together the files for the lists of words, the dice and the language name. These files can then be tested by copying the directory into /usr/share/tanglet/data/.
$ ls /usr/share/tanglet/data/
cs en fr
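Testing a generated language could look like the sketch below. The function name install_lang and the example directory ./it are hypothetical; the file names inside a language directory are not specified here, so the whole directory is copied as-is.

```shell
#!/bin/sh
# Hypothetical helper: copy a language directory into a Tanglet data directory.
# $1: language directory to install (for example ./it)
# $2: data directory, /usr/share/tanglet/data by default
install_lang() {
    src=$1
    dest=${2:-/usr/share/tanglet/data}
    # refuse to continue if the source is not a directory
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    # copy the directory with its contents, then list the installed languages
    cp -R "$src" "$dest"/ && ls "$dest"
}
```

Writing under /usr/share normally requires root privileges, so a first trial is easier against a scratch copy of the data directory passed as the second argument.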
- Due date changed from 01/16/2012 to 09/30/2012
- Status changed from New to In Progress
- Assignee set to Jean-Michel Philippe
- % Done changed from 0 to 80
- Estimated time changed from 10:00 h to 6:00 h