On Sat, Sep 23, 2006 at 09:12:28PM +0300, Martin-Éric Racine wrote:
> Hello Brian,
>
> It seems that a recent release of the Russian wordlist (source:
> rus-ispell) contains words that aspell interprets as illegal, which makes
> the hash generation break, leaving users with an empty hash file (Bug
> #385403) and making this an RC bug. Both words definitely exist in the
> Russian language, so I am unsure how to solve this issue. Any ideas?
The error message

====postinst log===
????????????? ????? aspell-ru (0.99g3-1) ...
aspell-autobuildhash: processing: ru [ru]
??????: /usr/lib/aspell//ru_affix.dat:1246: The condition "???" does not guarantee that "????" can always be stripped.
aspell-autobuildhash: processing: ru [ru]
??????: /usr/lib/aspell//ru_affix.dat:1246: The condition "???" does not guarantee that "????" can always be stripped.
===================

suggests that the rule on line 1246 of ru_affix.dat is wrong and cannot
always be applied to the strings it matches. Indeed, looking at that rule,
I see something that, expressed in terms of 7bit chars, looks like

  SFX L estn as stn

which means: if the word ends in 'stn', strip 'estn' and replace it with
'as'. That is buggy: a word ending in 'astn' matches the rule, but 'estn'
cannot be stripped from it, hence the error.

I blindly modified that line into something valid, and then got a
different set of errors and warnings,

---------------------
Warning: The word "???" is invalid. The total length is larger than 240 characters. Skipping word.
...
Error: The word "?????????" is invalid. The total word length, with soundslike data, is larger than 240 characters.
---------------------

which disappear if I use the .wl file instead of the .cwl for building the
hash (I also removed the offending number from the first line of the
wordlist). I tested this with a personal sarge backport of aspell, so I
cannot tell whether this only means that my backport is buggy or whether
it is a general problem. I hope to try it tomorrow with a current aspell.

Regarding the original problem, the right ru_affix.dat fix is definitely
something for rus-ispell upstream, or at least for somebody fluent in
Russian. I am attaching a patch with the changes I used here to test that
the rule was failing. Do not consider it a real patch: I do not speak
Russian, so it is just a dummy test.

Hope this helps

-- 
Agustin
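To illustrate the kind of check behind the 'does not guarantee' message, here is a minimal Python sketch. It is not aspell's actual code: the function names are hypothetical, and the condition syntax is assumed to be only literal characters, '.', and [...] / [^...] classes. It tests whether every word matching a suffix rule's condition must end in the string to be stripped.

```python
import re

# One token per condition position: a bracketed class or a single character.
_TOKEN = re.compile(r"\[\^?[^\]]*\]|.")


def condition_positions(condition):
    """Return, per position, the set of allowed characters.

    None stands for "cannot be enumerated" ('.' or a negated class).
    """
    positions = []
    for tok in _TOKEN.findall(condition):
        if tok == "." or tok.startswith("[^"):
            positions.append(None)
        elif tok.startswith("["):
            positions.append(set(tok[1:-1]))
        else:
            positions.append({tok})
    return positions


def condition_guarantees_strip(strip, condition):
    """True if every word matching `condition` must end in `strip`."""
    if strip == "0":  # "0" is assumed to mean "strip nothing"
        return True
    positions = condition_positions(condition)
    if len(positions) < len(strip):
        # The condition is shorter than the strip string, so the extra
        # leading characters of the strip string are unconstrained.
        return False
    tail = positions[-len(strip):]
    return all(allowed == {ch} for allowed, ch in zip(tail, strip))


# The 7bit rendering of the rule discussed above, before and after the change.
print(condition_guarantees_strip("estn", "stn"))   # False
print(condition_guarantees_strip("estn", "estn"))  # True
```

With the original rule the condition covers only three characters while four are stripped, so the first stripped character is never checked. Widening the condition to the full strip string, as in the test patch, makes the check pass.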
--- ru_affix.dat.orig	2006-09-24 22:05:38.000000000 +0200
+++ ru_affix.dat	2006-09-24 22:05:55.000000000 +0200
@@ -1244,7 +1244,7 @@
 SFX L ÓÑ ÌÁÓØ [^ÁÉÑØ]ÓÑ
 SFX L ÅÞØÓÑ £ËÓÑ ÅÞØÓÑ
 SFX L ÅÞØ £Ë ÅÞØ
-SFX L ÅÚÔÉ £Ú ÚÔÉ
+SFX L ÅÚÔÉ £Ú ÅÚÔÉ
 SFX L ÅÓÔØ ÌÏ ÞÅÓÔØ
 SFX L ÅÓÔØ ÌÉ ÞÅÓÔØ
 SFX L ÅÓÔØ ÌÁ ÞÅÓÔØ
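The accented Latin characters in the patch look like Cyrillic bytes displayed with the wrong charset. Assuming ru_affix.dat is KOI8-R encoded and was rendered here as Latin-1 (an assumption; the message does not state the encoding), a short Python sketch can recover a readable form of a rule. The helper name `redecode` is hypothetical.

```python
def redecode(text):
    """Reinterpret Latin-1-rendered text as KOI8-R (assumed source encoding)."""
    return text.encode("latin-1").decode("koi8_r")


# The rule removed by the patch, and its replacement.
old_rule = "SFX L ÅÚÔÉ £Ú ÚÔÉ"
new_rule = "SFX L ÅÚÔÉ £Ú ÅÚÔÉ"

print(redecode(old_rule))
print(redecode(new_rule))
```

This is only for reading the rules. The patch itself has to be applied to the file in its original byte encoding.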

