On Tue, Apr 1, 2008 at 5:40 PM, Chas. Owens <[EMAIL PROTECTED]> wrote:
> 2008/4/1 Jay Savage <[EMAIL PROTECTED]>:
>  snip
>
> >     my ($last) = $number =~ /.*(\d)/;
>  >
>  >  Let Perl worry about what is and isn't a digit.
>  snip
>
>  Unfortunately, with the rise of UNICODE, \d is no longer what one
>  expects* ([0-9]).  It now includes all characters marked as digits in
>  UNICODE.  This includes characters like "\x{1813}" (MONGOLIAN DIGIT
>  THREE).  The \w character class also no longer matches [a-zA-Z0-9_],
>  but instead matches any character marked as a word character by
>  UNICODE; however, this is much less of a problem since, unlike digits,
>  a character is still a character (try adding "\x{1813}" + 1).
>
>  * note, you can get the old behavior back by using the bytes pragma.
>

Exactly. The functionality is there to be taken advantage of. I don't
see why that is "unfortunate" at all. How is "Mongolian digit three"
less of a digit than Arabic numeral three? That \d is Unicode-aware
is, IMO, its strongest selling point over a simple roll-your-own
character class. \d finds digits; [0-9] finds Arabic numerals.
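
To make the difference concrete, here is a minimal sketch that runs the
regex from earlier in the thread against a string ending in "\x{1813}".
The sample value of $number and the names $last_d and $last_ascii are
made up for illustration; only the /.*(\d)/ pattern comes from the
thread.

```perl
use strict;
use warnings;

# Hypothetical input: two ASCII digits followed by U+1813
# (MONGOLIAN DIGIT THREE), the character mentioned above.
my $number = "12\x{1813}";

# \d matches any character Unicode classifies as a decimal digit,
# so the last "digit" found is the Mongolian one.
my ($last_d) = $number =~ /.*(\d)/;

# [0-9] matches only the ten ASCII digits, so the greedy .* has to
# back off until the capture lands on "2".
my ($last_ascii) = $number =~ /.*([0-9])/;

# Print code points rather than the characters themselves to avoid
# "Wide character" warnings on a non-UTF-8 output handle.
printf "last via \\d:    U+%04X\n", ord $last_d;
printf "last via [0-9]: U+%04X\n", ord $last_ascii;
```

Which of the two results is "right" depends on what the surrounding
code does with $last afterwards: if it is going to be used in
arithmetic, [0-9] is probably the safer choice; if the goal is to
recognize digits as text, \d does more of the work.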

As for what one expects...that's a different story.

Best,

-- jay
--------------------------------------------------
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org

values of β will give rise to dom!
