Hi, I recently received my first spam mail containing control characters in URLs, I think from a spammer called Empire Towers.
Although 20_uri_tests.cf contains the test HTTP_CTRL_CHARS_HOST, this mail did not trigger it. That made me curious, as did the slight difficulty of tracing the IP addresses behind the URLs :-).

HTTP_CTRL_CHARS_HOST does not trigger because the test never sees those URLs. The following URL

  http://www.05-cray.category.unique.zaam.net^A^T^T^T.co.fr|https.am2002.goopt.com:8101

appears as 2 separate URLs:

  http://www.05-cray.category.unique.zaam.net
  https.am2002.goopt.com:8101

Apparently the Perl modules URI and URI::Find never expected characters like these, so the regular expressions taken from them do not allow for them. The attached patch will hopefully rectify this by adding "|" and the control characters to the set of characters accepted in a URI.

ciao
Klaus

PS: I will file a bug in Bugzilla.
--- PerMsgStatus.pm 7 May 2002 01:53:12 -0000 1.105
+++ PerMsgStatus.pm 19 May 2002 02:50:05 -0000
@@ -1125,9 +1125,9 @@
 }
 
 # Taken from URI and URI::Find
-my $reserved = q(;/?:@&=+$,[]\#);
+my $reserved = q(;/?:@&=+$,[]\#|);
 my $mark = q(-_.!~*'()); #'; emacs
-my $unreserved = "A-Za-z0-9\Q$mark\E";
+my $unreserved = "A-Za-z0-9\Q$mark\E\x00-\x08\x0b\x0c\x0e-\x1f";
 my $uricSet = quotemeta($reserved) . $unreserved . "%";
 
 my $schemeRE = '[a-zA-Z][a-zA-Z0-9.+\-]*';
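
For anyone who wants to see the effect of the two changed lines in isolation, here is a minimal sketch. It is not the matching code from PerMsgStatus.pm: the helpers uric_set and leading_uri and their parameters are hypothetical names introduced only for this illustration, and the matcher is deliberately simplified to "a run of allowed characters from the start of the string". It only assumes that the character set is used inside a regex character class, as the variable name $uricSet suggests.

```perl
use strict;
use warnings;

# Build the set of characters allowed in a URI the same way the patched
# lines do. The two parameters are hypothetical; they exist only so the
# old and new sets can be compared side by side.
sub uric_set {
    my ($extra_reserved, $extra_unreserved) = @_;
    my $reserved   = q(;/?:@&=+$,[]\#) . $extra_reserved;
    my $mark       = q(-_.!~*'());
    my $unreserved = "A-Za-z0-9\Q$mark\E" . $extra_unreserved;
    return quotemeta($reserved) . $unreserved . "%";
}

# Old set: as before the patch. New set: "|" plus the control characters.
my $old_set = uric_set('', '');
my $new_set = uric_set('|', "\x00-\x08\x0b\x0c\x0e-\x1f");

# The URL from the spam mail; ^A is \x01 and ^T is \x14.
my $url = "http://www.05-cray.category.unique.zaam.net"
        . "\x01\x14\x14\x14.co.fr|https.am2002.goopt.com:8101";

# Simplified matcher (an assumption, not the real URI regex): return the
# longest run of allowed characters at the start of the text.
sub leading_uri {
    my ($set, $text) = @_;
    return $text =~ /^([$set]+)/ ? $1 : '';
}

my $old_match = leading_uri($old_set, $url);
my $new_match = leading_uri($new_set, $url);

print "old set matched ", length($old_match), " of ", length($url), " characters\n";
print "new set matched ", length($new_match), " of ", length($url), " characters\n";
```

With the old set the match stops at the first control character, so only the part up to zaam.net is seen. With the new set the whole string is matched as one URL, which is what a test like HTTP_CTRL_CHARS_HOST needs in order to see the control characters at all.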