John Rudd wrote:
The only problem I can think of is that an ampersand in a _URL_ is legal
(IIRC, in CGI form URLs, the ampersand is used to delimit different
variables, so if the URL in question carries some form of context, like
ack'ing a sign-up, it might legitimately contain an &). So, you need to
distinguish between "& before the third /" and "& after the third / and
probably after a ?". The former is bad. The latter should be ok.
I find it simpler to just remove the '%' and '#' from the expression and use
http://[\w\d\.]*\&
so that '&' is not matched in the path part even if the slash is
encoded. While this doesn't catch all discrepancies, it catches the
example spams.
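
For anyone who wants to try the expression outside of a rule file, here is a
minimal sketch in Python. The function name and the example.com / example.net
URLs are made up for illustration and are not taken from the spams discussed;
only the pattern itself comes from the message above.

```python
import re

# Pattern from the message above: after "http://", allow only word
# characters, digits and dots, then require a literal '&'.
# Because '/', '%' and '#' are not in the character class, the match
# cannot run past the host part into the path or query string.
HOST_AMP = re.compile(r"http://[\w\d\.]*\&")


def ampersand_in_host(url: str) -> bool:
    """Return True if an '&' appears directly after host-like characters."""
    return HOST_AMP.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://www.example.com&foo.example.net/",       # '&' in the host part
        "http://www.example.com/signup?id=1&ack=yes",    # '&' in the query string
        "http://www.example.com%2Fsignup&ack=yes",       # encoded slash, then '&'
    ]
    for url in samples:
        print(ampersand_in_host(url), url)
```

Only the first sample is flagged. A host containing a character outside the
class (a hyphen, say) stops the match early, which is one of the cases this
simpler expression does not catch.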