In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote:
> Right.  Unfortunately, after starting on this, I realized that that's the
> easy part.  Unicode has a fairly well-defined way of figuring out if a
> character is a digit (see if its category is Nd (Number/digit), and if so
> what its value is (the value of the "decimal" property)).

Can it also tell you the base used for digit strings in that character
set? Actually, I don't know if there are any modern writing systems that
don't use base ten, but certainly if you were dealing with some ancient
scripts that used sexagesimal numbers that might be a problem ;-)

> However, there appears to be no good way of determining if something is a
> decimal point, a sign indicator, or an E/e (exponent signifier).

I suspected there wouldn't be.

> The attached patch will let the chartype layer decide if a character is a
> digit, and what its value is.

The patch seems to be missing though...

> Note also that is_digit should now return the value of the digit if it is a
> digit, or 42 if it isn't.  (I had to use something, and ~0 sometimes wanted
> to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0.  0, of
> course, can't be used for not-a-digit, since is_digit('0')==0.)

I was assuming there would be a separate digit_value() routine to avoid
that problem. Apart from anything else, there will doubtless be many
other is_xxx() routines in due course which will be simple boolean
tests.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
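
For illustration, here is a minimal sketch of the split suggested above: a
boolean is_digit() plus a separate digit_value(). This is not the missing
patch and not the actual chartype interface. The digit_zero table, the use
of uint32_t for code points, and the -1 return for non-digits are all
assumptions made for the example, and only three Nd ranges are listed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: the code point of digit zero for a few scripts whose
 * characters have category Nd. In each of these ranges the ten digits are
 * contiguous, so value = code point - zero. A real table would come from
 * the Unicode data files. */
static const uint32_t digit_zero[] = {
    0x0030, /* ASCII digits        */
    0x0660, /* Arabic-Indic digits */
    0x0966  /* Devanagari digits   */
};

#define N_DIGIT_RANGES (sizeof digit_zero / sizeof digit_zero[0])

/* Simple boolean test, like any other is_xxx() routine:
 * returns 1 if c is a decimal digit in a listed range, 0 otherwise. */
int is_digit(uint32_t c)
{
    size_t i;
    for (i = 0; i < N_DIGIT_RANGES; i++) {
        if (c >= digit_zero[i] && c <= digit_zero[i] + 9)
            return 1;
    }
    return 0;
}

/* Separate value lookup: returns 0..9 for a digit. The -1 result for a
 * non-digit is an assumption of this sketch; since the return type is int,
 * it cannot collide with a real digit value the way 0 would. */
int digit_value(uint32_t c)
{
    size_t i;
    for (i = 0; i < N_DIGIT_RANGES; i++) {
        if (c >= digit_zero[i] && c <= digit_zero[i] + 9)
            return (int)(c - digit_zero[i]);
    }
    return -1;
}
```

With this split, is_digit('0') is true and digit_value('0') is 0, so no
magic not-a-digit value such as 42 is needed in the boolean test.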