-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Gary,

Saturday, August 2, 2003, 1:29:54 PM, you wrote:

GF> body REMOVE_OBFUSCATE
GF> /(Rem(o|0)ve|Delete).{0,10}y(o|0)ur.{0,10}(e[-]?mai(l|1)|address)/i
GF> describe REMOVE_OBFUSCATE       Remove y0ur e-mail

GF> Let's say that I think the odds of a spam are higher if the
GF> obfuscated form is used, than when the regular form is used. Can you
GF> suggest a way to modify this pattern so that the pattern only matches
GF> obfuscated uses? Note: to meet my definition of obfuscated, only one
GF> of the substitutions above must appear. For example,    
GF>    Remove y0ur e-mail
GF> will suffice as an obfuscated form of "Remove your e-mail".

The way I do this is to look for the individual obfuscated words, e.g.:
body     L_b_MaskedW0rdsb  /(discreet1y|d0ct0r|appr0ved|m0ney|fr0m)/i
describe L_b_MaskedW0rdsb  masked spam word(s)
score    L_b_MaskedW0rdsb  3.1
body     L_b_MaskedW0rdsc  /(casin0|0nline|m0re|[EMAIL PROTECTED]|F0r|[EMAIL PROTECTED])/i
describe L_b_MaskedW0rdsc  masked spam word(s)
score    L_b_MaskedW0rdsc  3.1
body     L_b_MaskedW0rdsd  /(m0ve|[EMAIL PROTECTED]|[EMAIL PROTECTED])/i
describe L_b_MaskedW0rdsd  masked spam word(s)
score    L_b_MaskedW0rdsd  3.1
body     L_b_MaskedW0rds2  /(0bscene|AmaZlNG|SENsatl0NAL|SlCkenlNG)/i
describe L_b_MaskedW0rds2  masked spam word(s)
score    L_b_MaskedW0rds2  3.1

(Note: I have several more rules, and more entries in some of these rules;
they are abbreviated here for simplicity of display.)

I don't care whether "y0ur" is in a remove line, or talking about a body
part, or a debt problem, or a mortgage. If that word is in an email, it's
likely to be spam.
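
For anyone who would rather keep the rule tied to the remove phrase, here
is a rough, untested sketch of one possible approach: pair Gary's pattern
with a plain-spelling copy of it, and let a meta rule fire only when the
first hits and the second does not. The sub-rule names and the score are
hypothetical placeholders, not taken from any ruleset in this thread.

# Sub-rules (names starting with "__") carry no score of their own.
# __REMOVE_ANY is Gary's original pattern: plain or obfuscated spelling.
body     __REMOVE_ANY      /(Rem(o|0)ve|Delete).{0,10}y(o|0)ur.{0,10}(e[-]?mai(l|1)|address)/i
# __REMOVE_PLAIN is the same pattern with the digit substitutions removed.
body     __REMOVE_PLAIN    /(Remove|Delete).{0,10}your.{0,10}(e[-]?mail|address)/i
# Fire only when some form matched but no plain form did.
meta     REMOVE_OBFUSCATE  (__REMOVE_ANY && !__REMOVE_PLAIN)
describe REMOVE_OBFUSCATE  Remove y0ur e-mail (obfuscated only)
# Placeholder score; tune against your own mail.
score    REMOVE_OBFUSCATE  1.0

The limitation of this sketch is that a meta rule is evaluated for the
message as a whole, so a message containing both the plain and the
obfuscated phrase would not hit REMOVE_OBFUSCATE.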

The question then becomes: how high do you score these rules, and how do
you split the words between rules? If you put "m0ve", "y0ur", and "ma1l"
into three separate rules, then the line "rem0ve y0ur e-ma1l" will hit all
three rules, and all three scores will be added to that message's spam
score.
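
To illustrate that trade-off, here is a sketch with the three words in
separate rules. The rule names and the 1.5 scores are made up for the
example; they are not taken from the rules above.

# One rule per masked word, each with a deliberately modest score.
body     L_b_Masked_m0ve  /m0ve/i
describe L_b_Masked_m0ve  masked spam word: m0ve
score    L_b_Masked_m0ve  1.5
body     L_b_Masked_y0ur  /y0ur/i
describe L_b_Masked_y0ur  masked spam word: y0ur
score    L_b_Masked_y0ur  1.5
body     L_b_Masked_ma1l  /ma1l/i
describe L_b_Masked_ma1l  masked spam word: ma1l
score    L_b_Masked_ma1l  1.5

With these, "rem0ve y0ur e-ma1l" would collect 1.5 + 1.5 + 1.5 = 4.5
points. A single combined rule /(m0ve|y0ur|ma1l)/i scored at 3.1 would, by
default, add 3.1 once no matter how many of the words appear.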

One warning: keep your individual words "long". I'm leaning toward a
minimum length of 4 characters. Short strings of this type can easily
match random strings in PGP signatures, website management links,
perfectly valid mailing list unsubscribe links, etc. I'm thinking of
removing the "F0r" from the above rules for this reason (and "d0" has
already been removed because of excessive false positives).
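
One possible way to keep a short word without matching it inside random
strings is to anchor it with \b word boundaries. This is only a sketch:
the rule name and score are hypothetical, and it has not been checked
against a mail corpus.

# \b requires a word boundary on each side, so "F0r" must stand alone.
body     L_b_MaskedF0r  /\bF0r\b/i
describe L_b_MaskedF0r  masked spam word: F0r
# Placeholder score, kept low because the word is short.
score    L_b_MaskedF0r  1.0

The boundaries stop the pattern from matching in the middle of a run of
letters and digits, such as a line of a PGP signature, so they should
reduce (not eliminate) that kind of false positive. The cost is that the
rule no longer matches the string when it is embedded in a longer word.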

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPyxLGJebK8E4qh1HEQLhqACgrmYK+C0RZIajdxNbaRU0542pLhcAoIwH
pgyMjICsMd7bCFr6nbpS83N2
=tuTn
-----END PGP SIGNATURE-----




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
