On Tue, Nov 16, 2010 at 01:16:38PM +0100, Vincent van Ravesteijn wrote:
> >> This will work too I guess.
> >
> > In the sense of "avoid the crash"...
> >
> > The purpose of hasDigit() is to test for occurrences of digits, so
> > that words containing digits are not spellchecked.
> > A docstring may very well contain digits encoded outside the range
> > 0x00..0x7F (ASCII 0-9).
> > Unicode contains more numerals, in different encodings.
> >
> > Stephan
> 
> Are you sure that numeric characters in other parts of the
> spectrum cannot occur in real words that need to be spellchecked? An
> example showing that this can be the case comes from Chinese:
> 
> 三 means '3', but 三角 means triangle.
>
> OK, I don't know what iswdigit() returns for 三, and I guess that
> spellchecking Chinese makes no sense, but you get the idea.
> 
> It would be worse if there were some language in which such a numeric
> character occurs in, say, 10% of all words (as some common ending or
> similar); then 10% of the words would not be spellchecked.
> 
> It feels like we are trying to be smart, but then I'd feel better if
> we knew exactly what we are doing: which words are not spellchecked,
> and why.
> 
> Besides, I read on this
> website: http://linux.about.com/library/cmd/blcmdl3_iswdigit.htm
> "The wide character class "digit" always contains exactly the digits
> '0' to '9'.", so I'm not sure whether it has any added value.

I experimented a bit on Solaris. Using the attached isdigit.c program
I get the output in (the also attached) isdigit.out. As you can see,
the output is incorrect outside the ASCII range, and the program
segfaults, too.

However, if I stick an "#undef isdigit" right after "#include <ctype.h>",
I get no crash and the correct result:

$ ./isdigit
 48 0x30
 49 0x31
 50 0x32
 51 0x33
 52 0x34
 53 0x35
 54 0x36
 55 0x37
 56 0x38
 57 0x39

which is exactly the same as the output of the attached iswdigit.c program.
So, using the macro version of isdigit() produces wrong results, and even
a crash, if the argument is not in the ASCII range. Using iswdigit()
produces the same result as the function version of isdigit().

Moral: either we stick an "#undef isdigit" in our code or we switch
to iswdigit(). In the latter case, though, a locale expert should
clarify under what conditions the output of iswdigit() differs from
that of isdigit().

-- 
Enrico
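/* isdigit.c: print every value in 0..0xFFFF for which isdigit() is true.
 *
 * isdigit() is only defined for EOF and for values representable as
 * unsigned char; larger arguments are undefined behavior. With the
 * <ctype.h> macro on Solaris this gave wrong results and a segfault.
 * The loop deliberately exercises that case. */
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    int wc;

    for (wc = 0; wc <= 0xFFFF; wc++) {
        if (isdigit(wc)) {          /* out of range for wc > 255 */
            printf("%3d", wc);
            printf(" %#4x\n", wc);
        }
    }
    return 0;
}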

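/* iswdigit.c: print every value in 0..0xFFFF for which iswdigit() is true.
 * iswdigit() takes a wint_t, so every value in this range is a valid
 * argument. */
#include <stdio.h>
#include <wctype.h>

int main(void)
{
    int wc;

    for (wc = 0; wc <= 0xFFFF; wc++) {
        if (iswdigit(wc)) {
            printf("%3d", wc);
            printf(" %#4x\n", wc);
        }
    }
    return 0;
}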

 48 0x30
 49 0x31
 50 0x32
 51 0x33
 52 0x34
 53 0x35
 54 0x36
 55 0x37
 56 0x38
 57 0x39
261 0x105
262 0x106
263 0x107
264 0x108
269 0x10d
270 0x10e
271 0x10f
272 0x110
277 0x115
278 0x116
279 0x117
280 0x118
285 0x11d
286 0x11e
287 0x11f
288 0x120
293 0x125
294 0x126
295 0x127
296 0x128
301 0x12d
302 0x12e
303 0x12f
304 0x130
309 0x135
310 0x136
311 0x137
312 0x138
317 0x13d
318 0x13e
319 0x13f
320 0x140
325 0x145
326 0x146
327 0x147
328 0x148
333 0x14d
334 0x14e
335 0x14f
336 0x150
341 0x155
342 0x156
343 0x157
344 0x158
349 0x15d
350 0x15e
351 0x15f
352 0x160
357 0x165
358 0x166
359 0x167
360 0x168
365 0x16d
366 0x16e
367 0x16f
368 0x170
373 0x175
374 0x176
375 0x177
376 0x178
381 0x17d
382 0x17e
383 0x17f
384 0x180
523 0x20b
524 0x20c
525 0x20d
526 0x20e
Segmentation fault
