Bug #464

Tanglet should work with more languages

Added by Jean-Michel Philippe over 12 years ago. Updated over 11 years ago.

Status: In Progress
Priority: Normal
Category: Translation
Start date: 12/12/2011
Due date: 09/30/2012 (over 11 years late)
% Done: 80%
Estimated time: 6:00 h
Spent time:

Description

Tanglet currently works only for Czech, English and French; dictionaries are missing for other languages. We need to ask the upstream author how to build a dictionary for a new language. His instructions must then be tested on a new language and written up on a new page of our website.

Actions #1

Updated by Jean-Michel Philippe about 12 years ago

The latest version, 1.2, provides cs, en, fr, he and nl. See:

http://gottcode.org/tanglet/

Actions #2

Updated by Jean-Michel Philippe about 12 years ago

  • Target version changed from 2012-02 to 2012-08
Actions #3

Updated by Jean-Michel Philippe over 11 years ago

Looking at the source code, it seems Tanglet only needs a word list for each language. The word list is a compressed plain-text file. One way to get such word lists is to extract them from Aspell dictionaries, as explained in the following blog post:

http://typethinker.blogspot.fr/2008/02/fun-with-aspell-word-lists.html

Trials showed that the extraction is quite fast, but it yields words that are probably unwanted:

  • for Italian, many combinations of articles and words, like “quell'abarico”
  • for French, many accented letters, which Tanglet does not offer for ease of play

The first case can be solved by removing words that contain a single quote; the second requires a tool such as unaccent. Unaccent has to be run on each line separately (otherwise it crashes), which is nevertheless unexpectedly fast. On the other hand, this produces duplicate words, which then have to be filtered out with uniq; that step is a bit slow but acceptable. Since uniq only removes adjacent duplicates, the output is sorted first.

Sample scripts:

$ sudo apt-get install aspell aspell-it aspell-fr
# Italian: drop space-preceded tokens that contain a single quote
$ aspell -l it dump master | aspell -l it expand | sed -r 's/ [^ ]+'"'"'[^ ]+//g' | less
# French: strip accents line by line, then sort so that uniq catches every duplicate
$ aspell -l fr dump master | aspell -l fr expand | while read -r LINE; do echo "$LINE" | unaccent UTF8; done | sort | uniq | less
Actions #4

Updated by Jean-Michel Philippe over 11 years ago

The Git repository of Tanglet now provides German too:

https://github.com/gottcode/tanglet

Actions #5

Updated by Jean-Michel Philippe over 11 years ago

In the French list extracted from Aspell, single quotes have to be removed too. In Italian, lines contain several words, and names of people, places, etc. start with an uppercase letter. Both issues can be handled by piping the output through:

$ tr ' ' '\n' | grep -o '^[a-z]*'

Finally, Italian needs unaccent to be run too.
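
The cleanup steps discussed in this thread (one word per line, no apostrophes, no capitalised names, no duplicates) could be wrapped in a small shell function. This is only a sketch: clean_wordlist is a hypothetical name, not something shipped with Tanglet or Aspell, and it assumes accents have already been stripped (for example with unaccent), because any word still containing a non-ASCII letter is dropped. Unlike the grep -o filter above, it discards non-matching words entirely instead of keeping their lowercase prefix.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of Tanglet or Aspell).
# Reads word-list text on stdin, possibly with several words per line,
# and prints one lowercase ASCII word per line, sorted, without duplicates.
clean_wordlist() {
    # Put every space-separated word on its own line.
    tr ' ' '\n' |
        # Keep only lines made entirely of a-z; LC_ALL=C restricts the
        # range to the 26 ASCII letters whatever the user's locale is.
        # This drops empty lines, capitalised names, words containing a
        # single quote and words that still carry an accent.
        LC_ALL=C grep -x '[a-z][a-z]*' |
        # Sort and remove duplicates in one step.
        LC_ALL=C sort -u
}
```

It is meant to be placed at the end of an extraction pipeline, after the unaccent loop, in place of the uniq step.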

Actions #6

Updated by Stéphane Aulery over 11 years ago

Tanglet is also available on Transifex. Our translation duplicates the upstream one but is more advanced, so we can easily offer our work upstream.

Actions #7

Updated by Jean-Michel Philippe over 11 years ago

Yes, you're right. I'm actually in touch with the author of Tanglet because I found that it currently offers too few languages to play with, so I wrote a script that collects words from dictionary files to build Tanglet word lists. I haven't mentioned Transifex to him yet; I should.

Note that this is a recurring problem: we have many translations on Transifex that never go back to upstream, although some of our translators took the time to contact upstream projects. If you think you'd have time to coordinate between our Transifex project and upstream, you're welcome!

Actions #8

Updated by Stéphane Aulery over 11 years ago

At the moment I am very busy proofreading Linux From Scratch. I am adding this to my to-do list, but I would first like to correct the DoudouLinux website page and complete the French translation.

Actions #9

Updated by Jean-Michel Philippe over 11 years ago

No problem, we're not in a hurry!

Actions #10

Updated by Stéphane Aulery over 11 years ago

No, of course. I try to do at least one small thing every week.

Actions #11

Updated by Jean-Michel Philippe over 11 years ago

I recently received two scripts from the upstream developer. I can now generate dice from word lists, and generate word lists suitable for the most recent versions of Tanglet (not the Squeeze one; we don't need it for that version). All the scripts have been tested successfully with several languages, including Greek and Russian, which use non-Latin alphabets.

The generated files will be uploaded to:

http://media.doudoulinux.org/tanglet/

Right now only the word lists are there. We still need to put together, for each language, the files for the word list, the dice and the language name. The resulting directory can then be tested by copying it into /usr/share/tanglet/data/.

$ ls /usr/share/tanglet/data/
cs en fr
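
Putting the per-language files together could look like the sketch below. The function name assemble_lang_dir and its arguments are hypothetical, and the names of the input files are left to the caller, since the exact file names Tanglet expects are not stated here. The function only gathers the given files into a directory named after the language code, ready to be copied into the data directory.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an upstream script).
# Usage: assemble_lang_dir LANG DEST FILE...
# Copies the given files into DEST/LANG/ and fails without creating
# the directory if no file is given or if any file is missing.
assemble_lang_dir() {
    [ "$#" -ge 3 ] || { echo "usage: assemble_lang_dir LANG DEST FILE..." >&2; return 1; }
    lang=$1
    dest=$2
    shift 2
    # Check every input first so that no incomplete directory is created.
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
    done
    mkdir -p "$dest/$lang" && cp "$@" "$dest/$lang/"
}
```

For example, after running it for "it" with a destination of ./out, the directory ./out/it can be copied into /usr/share/tanglet/data/ to try the new language.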
Actions #12

Updated by Jean-Michel Philippe over 11 years ago

  • Due date changed from 01/16/2012 to 09/30/2012
  • Status changed from New to In Progress
  • Assignee set to Jean-Michel Philippe
  • % Done changed from 0 to 80
  • Estimated time changed from 10:00 h to 6:00 h