I'm trying to throw out URLs containing any invalid characters, such as
'@'. According to RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt):
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.

I'd like to throw out a URL like
'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252'
(even though this one works perfectly fine; go figure). I've tried:
        if ($url =~ /^[^A-Za-z0-9$-_.+!*'(),]+$/) { # any invalid URL characters in the string?
           # Remember, special regex characters lose their meaning inside []
           print "Invalid character in URL at line $.: $url\n";
           next;
        }

According to my Camel book, special regex characters are supposed to lose
their special meaning inside []. Yet that obviously isn't true for
'-' when it separates the start and end of a range. I suspected the fourth
'-', at '$-', was being read as a range, so I tried to escape it by
preceding it with a backslash or '\Q', but both gave strange "Use of
uninitialized value in concatenation" warnings.
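For what it's worth, here is a minimal sketch of the kind of fix I've been circling around (the ':' and '/' in the class are my own addition so an ordinary http:// URL isn't rejected outright; they aren't in the original class):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two changes to the character class from the snippet above:
#   * '\$' -- a bare '$' followed by '-' interpolates the special
#     variable $- ($FORMAT_LINES_LEFT), which is what triggers the
#     "Use of uninitialized value in concatenation" warning; escaping
#     it keeps a literal dollar sign.
#   * '-' moved to the end of the class, where it is always literal.
# The anchors are also dropped: /^[^...]+$/ matches only strings made
# up ENTIRELY of invalid characters, while a bare /[^...]/ fires on
# the first invalid character anywhere in the string.

my @urls = (
    'http://example.org/good+page_(1).html',
    'http://jncicancerspectrum.oupjournals.org/cgi/content/full/jnci;91/3/252',
);

for my $url (@urls) {
    if ($url =~ /[^A-Za-z0-9\$_.+!*'(),:\/-]/) {
        print "Invalid character in URL: $url\n";
        next;
    }
    print "OK: $url\n";
}
```

Run as written, this accepts the first URL and rejects the second (the ';' is outside the class).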

Any suggestions? Thanks for your help and thoughts.

-Kevin Zembower

-----
E. Kevin Zembower
Unix Administrator
Johns Hopkins University/Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD  21202
410-659-6139

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>