On Tue, Apr 1, 2008 at 5:40 PM, Chas. Owens <[EMAIL PROTECTED]> wrote:
> 2008/4/1 Jay Savage <[EMAIL PROTECTED]>:
> snip
> > > my ($last) = $number =~ /.*(\d)/;
> >
> > Let Perl worry about what is and isn't a digit.
> snip
>
> Unfortunately, with the rise of UNICODE, \d is no longer what one
> expects* ([0-9]). It now includes all characters marked as digits in
> UNICODE. This includes characters like "\x{1813}" (MONGOLIAN DIGIT
> THREE). The \w character class also no longer matches [a-zA-Z0-9_],
> but instead matches any character marked as a word character by
> UNICODE; however, this is much less of a problem since, unlike digits,
> a character is still a character (try adding "\x{1813}" + 1).
>
> * note, you can get the old behavior back by using the bytes pragma.
>
Exactly. The functionality is there to be taken advantage of. I don't
see why that is "unfortunate" at all. How is "Mongolian digit three"
less of a digit than Arabic numeral three? That \d is Unicode-aware is,
IMO, its strongest selling point over a simple roll-your-own character
class. \d finds digits; [0-9] finds Arabic numerals.

As for what one expects...that's a different story.

Best,

-- jay
--------------------------------------------------
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org

values of β will give rise to dom!
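
For anyone who wants to see the difference between \d and [0-9] for themselves, here is a minimal sketch. The sub names and the sample string are made up for illustration (they are not from the thread), and it assumes a character string built with "\x{...}" escapes rather than raw bytes read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: last digit as \d sees it (Unicode-aware on
# character strings).
sub last_digit_unicode {
    my ($text) = @_;
    my ($last) = $text =~ /.*(\d)/s;
    return $last;
}

# Hypothetical helper: same idea, restricted to the ten ASCII digits.
sub last_digit_ascii {
    my ($text) = @_;
    my ($last) = $text =~ /.*([0-9])/s;
    return $last;
}

# Illustrative input: ASCII "42" followed by MONGOLIAN DIGIT THREE.
my $mongolian_three = "\x{1813}";
my $number          = "42" . $mongolian_three;

my $unicode_last = last_digit_unicode($number);
my $ascii_last   = last_digit_ascii($number);

# Print code points rather than the characters themselves, to avoid
# "Wide character" warnings on an unconfigured STDOUT.
printf "\\d    picked U+%04X\n", ord $unicode_last;
printf "[0-9] picked U+%04X\n", ord $ascii_last;

# Numeric conversion does not understand non-ASCII digits, so the
# Mongolian digit numifies to 0 (the warning is silenced only here).
my $sum = do { no warnings 'numeric'; $mongolian_three + 1 };
print "U+1813 + 1 = $sum\n";
```

Which of the two you want depends on what happens to the capture afterwards: if it feeds arithmetic, [0-9] is probably the safer choice; if it is only being recognized as "a digit", \d does more. On Perl 5.14 and later, the /a modifier is another way to restrict \d to ASCII.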