Hi,

I'm working with Devon Carraway's URIBL plugin and have been testing how effectively it finds URIs, using roughly 6 million lines of email traffic from a day in the life of our mail servers. My testing has shown that the following line in the while ( $l = $transaction->body_getline ) loop within lookup_start is problematic:

    # Dodge inserted-semicolon munging
    $l =~ tr/;//d;

Unlike the other "dodge this sort of munging" operations, neither examining my test results nor asking Uncle Google has made it clear to me what "inserted-semicolon munging" really is. Can anyone shed light on how semicolons could be used to obfuscate URIs so that the URIBL plugin can't detect them? If I understand this, perhaps I can come up with a safer alternative.

I'll paste some of the output of my test script to demonstrate the effect of tr/;//d. The 'Original result' is what we find when using tr/;//d; the 'New result' is what we find without it.

<TD> <B>Required:</B><BR><BR><FONT size=2 face=Tahoma> .NET</FONT><BR><FONT size=2 face=Tahoma> Blah</FONT><BR><FONT size=2 face=Tahoma> Blah</FONT><BR></TD>
Results differ!
Original result: nbsp.net
New result: no match

Wichita, KS 67204, USA www.somesite.com =
Results differ!
Original result: nbspwww.somesite.com
New result: www.somesite.com

... you get the picture.

-Jared
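
P.S. For anyone who wants to reproduce the effect, here is a minimal sketch. It assumes the raw lines contained &nbsp; entities where the pasted samples show a plain space, which is only a guess based on the "nbsp" prefix in the results. The helper names (find_hosts, strip_all_semicolons, strip_after_entities) and the crude host pattern are hypothetical and are not taken from the plugin. The last routine is one possible alternative, not a tested fix: it replaces HTML entities with a space before deleting the remaining semicolons. Whether that still defeats the munging the original comment refers to is exactly what I don't know.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the plugin's URI matching: collect
# host-like tokens ending in a few well-known TLDs.
sub find_hosts {
    my ($l) = @_;
    my @hosts;
    while ($l =~ /((?:[a-z0-9-]+\.)+(?:com|net|org))\b/gi) {
        push @hosts, lc $1;
    }
    return @hosts;
}

# Current behaviour: delete every semicolon in the line.
sub strip_all_semicolons {
    my ($l) = @_;
    $l =~ tr/;//d;
    return $l;
}

# Possible alternative (an assumption, not the plugin's code):
# replace named and numeric HTML entities with a space first,
# then delete whatever semicolons are left.
sub strip_after_entities {
    my ($l) = @_;
    $l =~ s/&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/ /gi;
    $l =~ tr/;//d;
    return $l;
}

# Sample lines with the assumed &nbsp; entities restored.
my @samples = (
    '<FONT size=2 face=Tahoma>&nbsp;.NET</FONT>',
    'Wichita, KS 67204, USA&nbsp;www.somesite.com =',
);

for my $sample (@samples) {
    my @old = find_hosts(strip_all_semicolons($sample));
    my @new = find_hosts(strip_after_entities($sample));
    print "Original result: ", (@old ? "@old" : "no match"), "\n";
    print "New result: ",      (@new ? "@new" : "no match"), "\n\n";
}
```

With the semicolons deleted outright, "&nbsp;.NET" becomes "&nbsp.NET" and the entity name fuses with the following text, which would explain the bogus nbsp.net and nbspwww.somesite.com matches.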