-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gary,

Saturday, August 2, 2003, 1:29:54 PM, you wrote:

GF> body REMOVE_OBFUSCATE
GF> /(Rem(o|0)ve|Delete).{0,10}y(o|0)ur.{0,10}(e[-]?mai(l|1)|address)/i
GF> describe REMOVE_OBFUSCATE Remove y0ur e-mail

GF> Let's say that I think the odds of a spam are higher if the
GF> obfuscated form is used, than when the regular form is used. Can you
GF> suggest a way to modify this pattern so that the pattern only matches
GF> obfuscated uses? Note: to meet my definition of obfuscated, only one
GF> of the substitutions above must appear. For example,

GF> Remove y0ur e-mail

GF> will suffice as an obfuscated form of "Remove your e-mail".

The way I do this is to look for the individual obfuscated words, eg:

body L_b_MaskedW0rdsb /(discreet1y|d0ct0r|appr0ved|m0ney|fr0m)/i
describe L_b_MaskedW0rdsb masked spam word(s)
score L_b_MaskedW0rdsb 3.1

body L_b_MaskedW0rdsc /(casin0|0nline|m0re|[EMAIL PROTECTED]|F0r|[EMAIL PROTECTED])/i
describe L_b_MaskedW0rdsc masked spam word(s)
score L_b_MaskedW0rdsc 3.1

body L_b_MaskedW0rdsd /(m0ve|[EMAIL PROTECTED]|[EMAIL PROTECTED])/i
describe L_b_MaskedW0rdsd masked spam word(s)
score L_b_MaskedW0rdsd 3.1

body L_b_MaskedW0rds2 /(0bscene|AmaZlNG|SENsatl0NAL|SlCkenlNG)/i
describe L_b_MaskedW0rds2 masked spam word(s)
score L_b_MaskedW0rds2 3.1

(note: I've several more rules, and more entries in some of these
rules; abbreviated for simplicity of display)

I don't care whether "y0ur" is in a remove line, or talking about a
body part, or a debt problem, or a mortgage. If that word is in an
email, it's likely to be spam.

The question then becomes, how high do you score these rules, and how
do you split the words between rules? If you put "m0ve", "y0ur", and
"ma1l" into three separate rules, then the line "rem0ve y0ur e-ma1l"
will get three scores added to your spam.

One warning: keep your individual words "long". I'm leaning to a
minimum length of 4 characters.
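
As a rough sketch of how the splitting and length points could combine:
the fragment below puts three obfuscated words of four or more
characters into a single rule, so that a line such as
"rem0ve y0ur e-ma1l" hits once rather than three times. The rule name
and score are placeholders, and the \b word-boundary anchors are an
assumption added here, not something taken from the rules above; they
are meant to stop a word from matching inside a longer run of letters
and digits.

```
# Hypothetical rule name; the score is a placeholder to be tuned locally.
# \b (an addition, not in the rules above) requires a non-word character,
# or the start/end of the text, on each side of the word.
body     L_b_MaskedW0rds_sketch  /\b(?:rem0ve|y0ur|ma1l)\b/i
describe L_b_MaskedW0rds_sketch  masked spam word(s)
score    L_b_MaskedW0rds_sketch  3.1
```

With the anchors in place, "m0ve" alone would no longer match inside
"rem0ve", so the full word is listed instead. The anchors do nothing
when the string happens to be bounded by characters such as "+" or "/",
so they complement the length limit rather than replace it.
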
Short strings of this type of text can easily match random strings in
PGP signatures, website management links, perfectly valid mailing list
unsubscribe links, etc. I'm thinking of removing the "F0r" from the
above rules for this reason (and "d0" is already removed, because of
excessive false positives).

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPyxLGJebK8E4qh1HEQLhqACgrmYK+C0RZIajdxNbaRU0542pLhcAoIwH
pgyMjICsMd7bCFr6nbpS83N2
=tuTn
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk