On Mon, Jun 28, 2010 at 19:48, John W. Krahn <jwkr...@shaw.ca> wrote:
snip
> s/\((\d+)\)/(<a href="mypage.php?$1">$1</a>)/g;
snip

Since Perl 5.8.0, \d does not mean [0-9]; it means any character that
is classified as a digit in Unicode.  In Perl 5.12.1, there are 577
characters that will match \d.  If it is your intent to replace "᠔᠒"
with '<a href="mypage.php?᠔᠒">᠔᠒</a>', then \d is a good choice;
however, if you want to match only the digits you can do math with,
I would suggest using [0-9].
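
To see the difference for yourself, here is a minimal sketch.  The
helper names are made up for this example, and the Mongolian digits
are written as \x{} escapes so the file's encoding doesn't matter:

```perl
use strict;
use warnings;

# Mongolian digits four and two (U+1814, U+1812), written as escapes
# so the example does not depend on the source file's encoding.
my $mongolian = "\x{1814}\x{1812}";
my $ascii     = "42";

# Hypothetical helpers, named for this illustration only.
sub all_unicode_digits { return $_[0] =~ /\A\d+\z/    ? 1 : 0 }
sub all_ascii_digits   { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }

for my $s ($mongolian, $ascii) {
    # Show code points rather than the characters themselves, which
    # avoids "Wide character" warnings on an unconfigured STDOUT.
    my $codepoints = join ' ', map { sprintf('U+%04X', ord $_) } split //, $s;
    printf "%-14s \\d+: %d   [0-9]+: %d\n",
        $codepoints, all_unicode_digits($s), all_ascii_digits($s);
}
```

The Mongolian string should match \d+ but not [0-9]+, while "42"
matches both.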

Note that Mongolian isn't the only problem: there is also "𝟺𝟸", which
looks like "42" but is really "\x{1d7fa}\x{1d7f8}".  If you want both
"\x{1d7fa}\x{1d7f8}" and "42" to point to the same page, you will need
to use some form of transliteration like [Unicode::Digits][1]:

    use Unicode::Digits qw/digits_to_int/;

    s{
        \( (\d+) \)
    }{
        sprintf '(<a href="mypage.php?%d">%s</a>)', digits_to_int($1), $1
    }xeg;

[1]: http://search.cpan.org/dist/Unicode-Digits/lib/Unicode/Digits.pm
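
If installing a CPAN module is not an option, a similar result may be
possible with the core Unicode::UCD module, whose num function (added
in Perl 5.14, as far as I know) returns the numeric value of a string
of digits, or undef if it does not consider the string safe.  This is
a different technique from Unicode::Digits, not a drop-in replacement,
and link_numbers is a name invented for this sketch:

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);   # assumes Perl 5.14 or later

# Hypothetical helper: link every parenthesised run of digits to
# mypage.php, using the ASCII value of the number in the URL.
sub link_numbers {
    my ($text) = @_;
    $text =~ s{
        \( (\d+) \)
    }{
        my $digits = $1;
        my $n      = num($digits);
        defined $n
            ? sprintf('(<a href="mypage.php?%d">%s</a>)', $n, $digits)
            : "($digits)";
    }xeg;
    return $text;
}

# Both spellings of 42 should end up pointing at mypage.php?42.
my $plain = link_numbers("see (42)");
my $fancy = link_numbers("see (\x{1d7fa}\x{1d7f8})");
```

Strings that num rejects, such as digits mixed from two scripts, are
left unlinked here; whether that is the right behaviour depends on
your data.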

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
