Hi,

I'm working with Devon Carraway's URIBL plugin and have been testing its
effectiveness in finding URIs using 6 million lines or so of email
traffic from a day in the life of our mail servers.  My testing has shown
that the following line in the while ( $l = $transaction->body_getline )
loop within lookup_start is problematic:

        # Dodge inserted-semicolon munging
        $l =~ tr/;//d;

Unlike the plugin's other "dodge this sort of munging" operations, this
one puzzles me: neither my test results nor uncle Google has made it clear
what "inserted-semicolon munging" really is.  Can anyone shed light on
how semicolons could be used to obfuscate URIs so the URIBL plugin can't
detect them?  If I have an understanding of this, perhaps I can come up
with a safer alternative.  I'll paste some of the output of my test script
to demonstrate the effect of tr/;//d.  The 'Original result' is what we
find with tr/;//d in place; the 'New result' is what we find without it.

<TD>&nbsp;<B>Required:</B><BR><BR><FONT size=2
face=Tahoma>&nbsp;.NET</FONT><BR><FONT size=2
face=Tahoma>&nbsp;Blah</FONT><BR><FONT size=2
face=Tahoma>&nbsp;Blah</FONT><BR></TD>
Results differ!
Original result: nbsp.net
New result:      no match

Wichita, KS 67204, USA &nbsp;&nbsp;www.somesite.com&nbsp;&nbsp; =
Results differ!
Original result: nbspwww.somesite.com
New result:      www.somesite.com

... you get the picture.
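
For anyone who wants to reproduce this outside the plugin, here is a
minimal sketch.  Note that find_hosts and its regex are hypothetical
stand-ins assumed purely for illustration, not the plugin's actual
matching code (which is more involved); only the tr/;//d line is taken
from the plugin.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the plugin's host matching.  The regex below
# is an assumption for illustration and only knows three TLDs.
sub find_hosts {
    my ($line, $strip_semicolons) = @_;

    # The line in question from lookup_start
    $line =~ tr/;//d if $strip_semicolons;

    my @hosts;
    while ($line =~ /([a-z0-9][a-z0-9.-]*\.(?:com|net|org))\b/gi) {
        push @hosts, lc $1;
    }
    return @hosts;
}

# The two sample lines from the test output above
my @samples = (
    'face=Tahoma>&nbsp;.NET</FONT><BR><FONT size=2',
    'Wichita, KS 67204, USA &nbsp;&nbsp;www.somesite.com&nbsp;&nbsp; =',
);

for my $line (@samples) {
    my @with    = find_hosts($line, 1);
    my @without = find_hosts($line, 0);
    printf "Original result: %s\n", @with    ? "@with"    : 'no match';
    printf "New result:      %s\n", @without ? "@without" : 'no match';
}
```

Once the semicolon is deleted, an HTML entity such as &nbsp; loses its
terminator and its name runs straight into whatever text follows it,
which is where the bogus nbsp.net and nbspwww.somesite.com come from.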

-Jared



