http://team.doudoulinux.org/http://team.doudoulinux.org/favicon.ico?16699090422012-01-06T22:25:46ZDoudouLinux project lifeMigration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=15792012-01-06T22:25:46ZJean-Michel Philippe
<ul></ul><p>The latest version, 1.2, provides cs, en, fr, he and nl. See:</p>
<p><a class="external" href="http://gottcode.org/tanglet/">http://gottcode.org/tanglet/</a></p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=16982012-03-28T20:11:11ZJean-Michel Philippe
<ul><li><strong>Target version</strong> changed from <i>2012-02</i> to <i>2012-08</i></li></ul> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=22042012-08-27T09:02:12ZJean-Michel Philippe
<ul></ul><p>Looking at the source code, it seems Tanglet only needs a word list for each language. The word list is a compressed plain-text file. One way to get such word lists is to extract them from Aspell dictionaries, as explained in the following blog post:</p>
<p><a class="external" href="http://typethinker.blogspot.fr/2008/02/fun-with-aspell-word-lists.html">http://typethinker.blogspot.fr/2008/02/fun-with-aspell-word-lists.html</a></p>
<p>Trials showed that this is quite fast, but it can produce unwanted words:</p>
<ul>
<li>for Italian, many combinations of articles and words like <em>“quell'abarico”</em></li>
<li>for French, many accented letters that Tanglet does not offer, for ease of play</li>
</ul>
<p>The first case can be solved by removing words that contain a single quote, while the second requires a tool such as <em>unaccent</em>. Unaccent has to be run on each line separately (otherwise it crashes), yet this is unexpectedly fast. On the other hand, unaccenting produces duplicate words, so the results must then be filtered with <em>uniq</em>, which is a bit slow but acceptable.</p>
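<p>The three steps described above (drop quoted words, strip accents, remove duplicates) can be sketched as a single filter. This is only an illustration under stated assumptions: <code>clean_wordlist</code> is a hypothetical helper name, the <code>sed</code> substitutions are a stand-in for <em>unaccent</em> that covers only a handful of French and Italian letters, and <code>sort -u</code> is used instead of <em>uniq</em> because it also removes duplicates that are not adjacent.</p>

```shell
#!/bin/sh
# Hypothetical helper: clean a raw word list read from stdin (one word per line).
clean_wordlist() {
    # 1. Drop every word that contains a single quote (e.g. quell'abarico).
    grep -v "'" |
    # 2. Strip accents with explicit substitutions. This is a stand-in for
    #    unaccent and only covers a few French/Italian letters (assumption).
    sed -E -e 's/à|â|ä/a/g' -e 's/é|è|ê|ë/e/g' -e 's/î|ï|ì/i/g' \
           -e 's/ô|ö|ò/o/g' -e 's/ù|û|ü/u/g' -e 's/ç/c/g' |
    # 3. Remove the duplicates created by step 2; sort -u also catches
    #    duplicates that are not next to each other.
    sort -u
}

# Usage example with a tiny inline word list instead of an Aspell dump.
printf '%s\n' "quell'abarico" 'été' 'ete' 'cote' 'côte' | clean_wordlist
```

<p>With the inline list above, the quoted word is dropped and the accented and unaccented spellings collapse into one entry each. For a real dictionary, the input would come from the Aspell dump instead of <code>printf</code>.</p>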
<p>Sample scripts:</p>
<pre><code>$ sudo apt-get install aspell aspell-it aspell-fr<br />$ aspell -l it dump master | aspell -l it expand | sed -r 's/ [^ ]+'"'"'[^ ]+//g' | less<br />$ aspell -l fr dump master | aspell -l fr expand | while read -r LINE; do echo "$LINE" | unaccent UTF8; done | uniq | less</code></pre> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=22052012-08-27T09:16:58ZJean-Michel Philippe
<ul></ul><p>On Git, Tanglet now provides German too:</p>
<p><a class="external" href="https://github.com/gottcode/tanglet">https://github.com/gottcode/tanglet</a></p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=22062012-08-27T11:35:42ZJean-Michel Philippe
<ul></ul><p>In the French extraction from Aspell, words with single quotes must be removed too. In Italian, there are multiple words per line, as well as names of people, places, etc., which start with an uppercase letter. This can be solved using:</p>
<pre><code>$ tr ' ' '\n' | grep -o '^[a-z]*'</code></pre>
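<p>A slightly stricter variant of the filter above, given here only as a sketch: <code>grep -o '^[a-z]*'</code> can emit the lowercase prefix of a rejected word (for instance <em>quell</em> from <em>quell'abarico</em>), whereas <code>grep -x</code> keeps a line only when the whole word is lowercase. The function name <code>split_lowercase_words</code> is hypothetical, and <code>LC_ALL=C</code> is an assumption made so that <code>[a-z]</code> means plain ASCII letters, which implies accents have to be stripped beforehand.</p>

```shell
#!/bin/sh
# Hypothetical helper: put one word per line, then keep only the words made
# entirely of ASCII lowercase letters (drops proper nouns and quoted forms).
split_lowercase_words() {
    # tr splits lines on spaces; grep -x requires the whole line to match.
    tr ' ' '\n' | LC_ALL=C grep -x '[a-z][a-z]*'
}

# Usage example with two made-up input lines.
printf '%s\n' "casa Roma" "quell'abarico gatto" | split_lowercase_words
```

<p>Only <em>casa</em> and <em>gatto</em> survive here: <em>Roma</em> starts with an uppercase letter and <em>quell'abarico</em> contains a single quote.</p>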
<p>Finally, the Italian list needs to be run through unaccent too.</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=22952012-09-02T23:07:19ZStéphane Aulerylkppo@free.fr
<ul></ul><p>Tanglet is also available on Transifex. Our translation duplicates the one there, but ours is more advanced. We can easily offer our work upstream.</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=22972012-09-09T19:37:45ZJean-Michel Philippe
<ul></ul><p>Yes, you're right. In fact I'm in touch with the author of Tanglet because I found that it currently offers too few languages to play with. So I wrote a script that collects words from dictionary files to build Tanglet word lists. I haven't mentioned Transifex to him yet; I should.</p>
<p>Note that this is a recurring problem: we have many translations on Transifex that never go back to upstream, although some of our translators took the time to contact upstream projects. If you think you'd have time to coordinate between our Transifex project and upstream, you're welcome!</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=23012012-09-10T11:54:44ZStéphane Aulerylkppo@free.fr
<ul></ul><p>At the moment I am very busy proofreading Linux From Scratch. I have added this to my to-do list, but I would first like to correct the DoudouLinux website page and complete the French translation.</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=23022012-09-10T18:09:37ZJean-Michel Philippe
<ul></ul><p>No problem, we're not in a hurry!</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=23042012-09-10T22:04:23ZStéphane Aulerylkppo@free.fr
<ul></ul><p>No, of course. I try to do at least one small thing every week.</p> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=23182012-09-23T14:34:09ZJean-Michel Philippe
<ul></ul><p>I recently received two scripts from the upstream developer. I can now generate dice from lists of words, and generate a list of words suitable for the most recent versions of Tanglet (not the Squeeze one; we don't need it for this version). All the scripts have been successfully tested with several languages, including Greek and Russian, which use non-Latin alphabets.</p>
<p>The generated files will be uploaded to:</p>
<p><a class="external" href="http://media.doudoulinux.org/tanglet/">http://media.doudoulinux.org/tanglet/</a></p>
<p>Right now only the word lists are there. For each language, we still need to put together the files for the word list, the dice and the language name. These files can then be tested by copying the language directory into <code>/usr/share/tanglet/data/</code>.</p>
<pre><code>$ ls /usr/share/tanglet/data/<br /> cs en fr</code></pre> Migration to Squeeze - Bug #464: Tanglet should work with more languageshttp://team.doudoulinux.org/issues/464?journal_id=23192012-09-23T14:35:05ZJean-Michel Philippe
<ul><li><strong>Due date</strong> changed from <i>01/16/2012</i> to <i>09/30/2012</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>Jean-Michel Philippe</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>80</i></li><li><strong>Estimated time</strong> changed from <i>10:00 h</i> to <i>6:00 h</i></li></ul>