On Wednesday 19 January 2005 22:20, Petter Reinholdtsen wrote:
> Is there some charset problem? I looked at the
> unknown words for nb, and "går" and "når" are definitely not unknown
> words in the dictionary.
I see the same kind of problem with Dutch. The unknown wordlist shows 'BraziliÃ«', which is 'Brazilië' (Dutch for Brazil) encoded in UTF-8 but read as ISO-8859-1. I've just checked the Aspell Dutch wordlist, and Brazilië _is_ included:

$ aspell dump master /usr/lib/aspell/dutch | grep "Brazil"
Braziliaanse
Braziliaans
Braziliaan
Brazilianen
Brazilië

It looks like the dump prints an ISO-8859-1 encoded list. I think the manpage for aspell gives the answer:

<quote>
   --encoding=string
          The encoding the input text is in. Valid values are ``utf-8'',
          ``iso8859-*'', ``koi8-r'', ``viscii'', ``cp1252'', ``machine
!!        unsigned 16'', ``machine unsigned 32''. However, the Aspell
!!        utility will currently only function correctly with 8-bit
!!        encodings. utf-8 support is planned for the future. The two
          ``machine unsigned'' encodings are intended to be used by other
          programs using the Aspell library and it is unlikely the Aspell
          utility will ever support these encodings.
</quote>

So it looks as if you may have to convert the files with iconv before you test them (or, even better, patch aspell so it supports utf-8 ;-)
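To make the mismatch concrete, here is a minimal Python sketch (not from the original thread; the helper names `utf8_to_latin1` and `misread_as_latin1` are hypothetical). It assumes, as the dump above suggests, that the dictionary is stored in ISO-8859-1. In UTF-8, 'ë' is the two bytes C3 AB, which an 8-bit tool reading ISO-8859-1 sees as 'Ã«'. Re-encoding the input to ISO-8859-1 first, roughly what `iconv -f UTF-8 -t ISO-8859-1` does, gives the single byte the dictionary expects.

```python
def utf8_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (similar in spirit to iconv).

    With errors="strict", characters that ISO-8859-1 cannot represent
    raise UnicodeEncodeError instead of being silently dropped.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors=errors)


def misread_as_latin1(data: bytes) -> str:
    """Show what an 8-bit tool assuming ISO-8859-1 'sees' in UTF-8 input."""
    return data.decode("iso-8859-1")


if __name__ == "__main__":
    for word in ["Brazilië", "går", "når"]:
        utf8 = word.encode("utf-8")
        # e.g. 'Brazilië' is misread as 'BraziliÃ«', 'går' as 'gÃ¥r'
        print(word, "->", misread_as_latin1(utf8), "->", utf8_to_latin1(utf8))
```

Converting only helps for words that ISO-8859-1 can represent. Text with, for example, '€' cannot be converted losslessly, which is one reason real UTF-8 support in aspell would be the better fix.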