Dear Horiguchi-san, Fujii-san,

Great work. Thank you for replying and for the analysis!

>  A. "^-?[0-9]+.*" : returns valid padding. p goes after the last digit.
>  B. "^[^0-9-].*"  : padding = 0, p doesn't advance.
>  C. "^-[^0-9].*"  : padding = 0, p advances by 1 byte.
>  D. "^-"          : padding = 0, p advances by 1 byte.
>   (if *p == 0 then breaks)

I confirmed them and your patterns are correct.

> If we want to make the behaviors C and D same with the current, the
> else clause should be like the follows, but I don't think we need to
> do that.
>               else
>               {
>                   padding = 0;
>                   if (*p == '-')
>                       p++;
>               }

This handling is not complex, so I would like to add it if possible.
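
For reference, here is a minimal sketch of how I understand cases A to D
once that else clause is included. It is only an illustration, not the
actual patch: parse_padding_sketch is a hypothetical name, and the use of
isdigit() plus strtol() is an assumption based on this discussion.
Overflow handling is left out.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the function in the patch): parse an optional
 * padding specification at p, store the result in *padding, and return
 * the advanced pointer.
 */
static const char *
parse_padding_sketch(const char *p, int *padding)
{
    /* Look past an optional leading '-' to see whether a digit follows. */
    const char *digits = (*p == '-') ? p + 1 : p;

    if (isdigit((unsigned char) *digits))
    {
        /* Case A: "-?[0-9]+"; strtol consumes the sign and the digits. */
        char       *endptr;

        *padding = (int) strtol(p, &endptr, 10);
        p = endptr;
    }
    else
    {
        /* Cases B, C and D: there is no valid number, so padding is 0. */
        *padding = 0;

        /* Cases C and D: skip the lone '-' to keep the current behavior. */
        if (*p == '-')
            p++;
    }

    return p;
}
```

With this sketch, "-10p" gives padding = -10 with p left on 'p', "p" gives
padding = 0 with p unchanged, and "-p" gives padding = 0 with p advanced
by 1 byte.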

> One possible cause of a difference in behavior is character class
> handling including multibyte characters of isdigit and strtol.  If
> isdigit accepts '一' as a digit (some platforms might do this) , and
> strtol doesn't (I believe it is universal behavior), '%一0p' is
> converted to '%' and the pointer moves onto '一'. But I don't think we
> need to do something for such a crazy specification.

Does isdigit() handle multi-byte characters correctly? The argument of
isdigit() is effectively just an unsigned char value, which is 1 byte.
Hence I thought that it cannot recognize 'ー'.
Actually, I was considering a different possibility. Perhaps isdigit()
just checks whether the value of its argument is between 48 and 57
('0' to '9'), which would mean that the first byte of some multi-byte
characters may be accepted as a digit in some locales.
But of course, I agree that this is a crazy case.
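
To illustrate the byte-level view, here is a small sketch under the
assumption of UTF-8 and the default "C" locale; count_digit_bytes is a
hypothetical helper written only for this mail. In UTF-8 every byte of a
multi-byte character is 0x80 or larger, so a plain range check against
48..57 cannot match any of them. Other encodings and locales may behave
differently.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the bytes of s that isdigit() accepts.
 * Each byte is cast to unsigned char, because passing a negative char
 * value to isdigit() is undefined behavior.
 */
static size_t
count_digit_bytes(const char *s)
{
    size_t      n = 0;

    for (; *s != '\0'; s++)
    {
        if (isdigit((unsigned char) *s))
            n++;
    }

    return n;
}
```

For example, count_digit_bytes("%10p") counts the two ASCII digits, while
the UTF-8 bytes of '一' (0xE4 0xB8 0x80) are not counted in the "C" locale.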

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


