Re: PATCH for ticket 7026

Vincent van Ravesteijn Tue, 16 Nov 2010 04:16:50 -0800

>> This will work too I guess.
>
> In the sense of "avoid the crash"...
>
> The purpose of hasDigit() is to test for occurrences of digits to avoid spell
> check of words with digits.
> A docstring may very well contain digits coded outside the range of 0x00 ..
> 0x7F (ascii 0-9).
> Unicode contains more numerals in different encodings.
>
> Stephan


Are you sure that the numeric characters in other parts of the
spectrum cannot occur in real words that need to be spellchecked? An
example showing that this can happen comes from Chinese:

三 means '3', but 三角 means triangle.

Ok, I don't know what iswdigit() returns for 三, and I guess that
spellchecking for Chinese makes no sense, but you get the idea.

It would be worse if there were some language in which such a numeric
character occurs in, say, 10% of all words (as a common ending or
something): then 10% of the words would not be spellchecked.

It feels like we are trying to be smart, but I'd feel better if we
knew exactly what we are doing: which words are not spellchecked, and
why.

Besides, I read on this website:
http://linux.about.com/library/cmd/blcmdl3_iswdigit.htm
"The wide character class "digit" always contains exactly the digits
'0' to '9'.", so I'm not sure whether iswdigit() has any added value.
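
To make the alternative concrete, here is a minimal sketch of an
explicit ASCII-only check. It is not the actual hasDigit() from the
patch: the name has_ascii_digit() and the use of std::u32string in
place of docstring are assumptions made only for illustration. The
point is that a plain range comparison tells us exactly which words get
skipped, with no dependence on the locale or on how iswdigit() treats
characters outside ASCII.

```cpp
#include <string>

// Hypothetical helper, not the real hasDigit() from the patch:
// reports whether a UTF-32 string contains one of the ASCII digits
// '0'..'9'.
bool has_ascii_digit(std::u32string const & word)
{
    for (char32_t c : word) {
        // Explicit range check: no locale lookup is involved, and code
        // points above 0x7F (for example U+4E09, the Chinese numeral
        // for three) never match.
        if (c >= U'0' && c <= U'9')
            return true;
    }
    return false;
}
```

With this check a word like "abc1" would be skipped by the
spellchecker, while 三角 would still be passed on to it.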

Vincent
