Re: [D-I] Automatic spellcheck of po files

Frans Pop Wed, 19 Jan 2005 17:12:57 -0600

On Wednesday 19 January 2005 22:20, Petter Reinholdtsen wrote:
> Is there some charset problem?  I looked at the
> unknown words for nb, and "går" and "når" are definitely not unknown
> words in the dictionary.


I see the same kind of problem with Dutch.

The unknown wordlist shows 'BraziliÃ«', which is what the UTF-8 bytes of
'Brazilië' (Dutch for Brazil) look like when read as ISO-8859-1.
I've just checked the aspell Dutch wordlist and Brazilië _is_ included.

$ aspell dump master /usr/lib/aspell/dutch | grep "Brazil"
Braziliaanse
Braziliaans
Braziliaan
Brazilianen
Brazilië

It looks like the dump prints an ISO-8859-1 encoded list.
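
To illustrate the mismatch, here is a minimal Python sketch (not part of the
spellcheck scripts; the function name is made up for this example) that
reproduces the 'BraziliÃ«' from the unknown wordlist by reading UTF-8 bytes
as if they were ISO-8859-1:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as ISO-8859-1.

    This mimics what an 8-bit-only tool sees when it is fed UTF-8 input.
    (Hypothetical helper, for illustration only.)
    """
    utf8_bytes = text.encode("utf-8")       # 'ë' becomes two bytes: C3 AB
    return utf8_bytes.decode("iso-8859-1")  # each byte becomes one character


if __name__ == "__main__":
    print(misread_utf8_as_latin1("Brazilië"))  # prints BraziliÃ«
```

A wordlist that contains the single-byte 'ë' will never match that two-character
sequence, which would explain why the word is reported as unknown.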

I think the manpage for aspell gives the answer:
<quote>
  --encoding=string
       The  encoding  the input text is in. Valid values are ``utf-8'',
       ``iso8859-*'',  ``koi8-r'',  ``viscii'',  ``cp1252'',  ``machine
!!     unsigned  16'',  ``machine  unsigned  32''.  However, the Aspell
!!     utility will currently only function correctly with 8-bit encod-
!!     ings. utf-8 support is planned for the future. The two ``machine
       unsigned'' encodings are intended to be used by  other  programs
       using  the  Aspell library and it is unlikely the Aspell utility
       will ever support these encodings.
</quote>

So it looks as if you may have to run the files through iconv, converting
them from UTF-8 to an 8-bit encoding such as ISO-8859-1, before you test them
(or, even better, patch aspell so it supports utf-8 ;-)
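
As a rough sketch of that conversion step, the Python below does more or less
what `iconv -f UTF-8 -t ISO-8859-1` would do. The function name is only an
assumption for this example, and how the result gets handed to aspell is left
out, since that depends on how the spellcheck script is set up:

```python
def utf8_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    With errors="strict" (the default) a character that has no
    ISO-8859-1 equivalent raises UnicodeEncodeError; pass
    errors="replace" to substitute '?' instead.
    (Hypothetical helper, for illustration only.)
    """
    text = data.decode("utf-8")
    return text.encode("iso-8859-1", errors=errors)


if __name__ == "__main__":
    converted = utf8_to_latin1("Brazilië".encode("utf-8"))
    # The two-byte UTF-8 sequence for 'ë' is now the single byte 0xEB.
    print(converted)
```

This only works for languages whose characters all fit in the chosen 8-bit
encoding; for other languages a different target charset would be needed.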
