John Rudd wrote:
The only problem I can think of is that an ampersand in a _URL_ is legal (IIRC, in CGI form URLs, ampersand is used to delimit different variables, so if the URL in question contains some form of context, like ack'ing a sign-up, it might legitimately contain an &). So, you need to distinguish between "& before the third /" and "& after the third / and probably after a ?". The former is bad. The latter should be ok.

I find it simpler to just remove the '%' and '#' from the expression and use
        http://[\w\d\.]*\&;
so that '&' is not matched in the path part even if the slash is encoded. While this doesn't catch all discrepancies, it catches the example spams.
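
To illustrate, here is a rough Python sketch of how that expression behaves. The pattern is copied as written above, including the trailing ';'. The function name and the sample URLs are made up for illustration and are not taken from the example spams.

```python
import re

# Pattern copied verbatim from the post, including the trailing ';'.
EARLY_AMP = re.compile(r"http://[\w\d\.]*\&;")

def has_early_ampersand(text: str) -> bool:
    """Return True if '&;' directly follows the host part of an http URL."""
    return EARLY_AMP.search(text) is not None

# Hypothetical sample URLs (assumptions, not from the thread).
samples = [
    "http://example.com&;foo",            # '&;' right after the host
    "http://example.com/signup?a=1&b=2",  # ordinary CGI query string
    "http://example.com%2Fpath&;foo",     # encoded slash before the '&;'
]

for url in samples:
    print(has_early_ampersand(url), url)
```

Only the first sample matches. The character class has no '/', '%' or '#', so the match cannot run past a literal or an encoded slash, and an '&' in the query string is left alone.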



