> -----Original Message-----
> From: Robert Menschel [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 30, 2003 9:43 PM
> To: Ralf Guenthner
> Cc: [EMAIL PROTECTED]
> Subject: Re: [SAtalk] Creative spam, any ideas?
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Ralf,
>
> Monday, June 30, 2003, 1:37:22 AM, you wrote:
>
> RG> The spam below slipped through SA 2.54. Note how they substitute
> RG> possible trigger terms with other characters, like a capital "I". Any
> RG> ideas what to do to catch stuff like this? The mail also contained a
> RG> rather graphic image...
>
> 1) This is where Bayes excels. Feed them to Bayes as confirmed spam; the
> tokens will add up quite quickly.
>
> 2) I've been collecting these into rules which identify the use of masked
> words, eg:
> body L_b_MaskedW0rds
> /L0SE|[EMAIL PROTECTED]|si0n|casin0|0nline|m0re|[EMAIL PROTECTED]|F0r|d0|[EMAIL PROTECTED]|Ple
> [EMAIL PROTECTED]|m0ve|ph
> [EMAIL PROTECTED]|[EMAIL PROTECTED]|[EMAIL PROTECTED]/i
> describe L_b_MaskedW0rds masked spam word(s)
> score L_b_MaskedW0rds 0.1
> body L_b_MaskedW0rds2 /WeIcome|Mldget|AnimaI|sieix|E\}\{treme|FlSTlng|Tatt00ed|Iadies|MasslVE|Ioads|BlZarre|hardc0re|0bscene|AmaZlNG|SENsatl0NAL|SlCkenlNG/i
> body L_b_MaskedW0rds3 /\bl[i1]v[e3] .{0,9}(?:fuck(?:[i1]ng)?|s[e3]x|nak[e3]d|g[i1]rls?|v[i1]rg[i1]ns?|t[e3][e3]ns?|p[0o]rn[0o]?)\b/i
> body L_b_MaskedW0rds4 /\b(excIusive|GiangBiang|sIut|ganigbainged|duides|hairdciore|ExcIude|pIz)/i
>
> I have another set which looks for similar items in subject headers. I've
> just begun using these, so my scores are set very low right now, while I
> check for errors in the rules, false positives, etc. Once I'm comfortable
> and confident with them, the scores will be raised to much higher levels.
>
> On a related topic, which I think I've seen asked before but don't
> remember seeing an answer, which is better (more efficient) within SA:
> single rules with many alternatives such as I have above, or many rules
> with few alternatives? Does one form use less computer resources than
> another?
>
> Bob Menschel

These are good to try to catch, but I tend to tackle them last. There are
sooo many ways to obfuscate a word that chasing them all is silly. So I only
go for a few of the most popular words and write a rule to catch any
obfuscated instance of each. I've also learned that putting too many things
in one rule will drive you nuts, so it is better to keep one rule per word.
Something like MY_OBFU_FREE just looks for the word "free" obfuscated, and
so on. Here is a list, off the top of my head, of the ones I look for:

Free
casino
ejaculate
sex
intercourse
penis
adult
girls
sluts
hardcore
movies

Maybe a few more I can't remember. But each has its own rule, with just
about every possibility of OBFU I could come up with. Chances are that if
they have other words OBFU'd, they have these as well. So I hit most with
just these famous few.

HTH

Chris Santerre
System Admin
"You should never, never doubt what nobody is sure about." - Willy Wonka

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
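
The one-rule-per-word approach described in the reply above could be sketched roughly as follows. This is only an illustration in Python, not the rules actually used by anyone in this thread: the look-alike table, the filler-character gap, and the helper names `obfu_pattern` and `sa_rule` are all assumptions made for the example. Only the `MY_OBFU_<WORD>` naming and the low starting score of 0.1 echo the messages above. The pattern deliberately skips the plain spelling of the word, so that only disguised forms hit.

```python
import re

# Hypothetical look-alike table (an assumption for this sketch): characters
# that might stand in for a letter. With a case-insensitive match, "i" in the
# "l" class also covers the capital "I" substitution mentioned in the thread.
LOOKALIKES = {"a": "a4", "e": "e3", "i": "i1l", "l": "l1i", "o": "o0", "s": "s5"}

# Allow up to two non-alphanumeric filler characters between letters,
# e.g. "f.r.e.e" or "f-r-e-e". The limit of two is an arbitrary choice.
GAP = r"[\W_]{0,2}"


def obfu_pattern(word):
    """Return regex source matching disguised spellings of `word` only."""
    word = word.lower()
    if not word.isalpha():
        raise ValueError("expected a plain alphabetic word")
    # One character class per letter: the letter plus its look-alikes.
    classes = ["[" + LOOKALIKES.get(ch, ch) + "]" for ch in word]
    # The negative lookahead rejects the ordinary, undisguised spelling.
    return r"\b(?!" + word + r"\b)" + GAP.join(classes) + r"\b"


def sa_rule(word, score=0.1):
    """Format the pattern as body/describe/score lines for one word."""
    name = "MY_OBFU_" + word.upper()
    return (
        f"body     {name} /{obfu_pattern(word)}/i\n"
        f"describe {name} Obfuscated spelling of '{word}'\n"
        f"score    {name} {score}\n"
    )


if __name__ == "__main__":
    rx = re.compile(obfu_pattern("free"), re.IGNORECASE)
    for sample in ["Get it free today", "Get it fr3e today", "F-R-E-E offer"]:
        print(sample, "->", bool(rx.search(sample)))
    print(sa_rule("free"))
```

Anything generated this way would still need checking with `spamassassin --lint` and a run against known-good mail before the score is raised, since a loose look-alike table can easily produce false positives.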