--- "John D. Hardin" <[EMAIL PROTECTED]> wrote:

> On Sat, 7 Apr 2007, J. wrote:
>
> > --- "John D. Hardin" <[EMAIL PROTECTED]> wrote:
> >
> > > You might want to look at this instead of trying to hand-roll
> > > obfuscation rules:
> > >
> > > http://www.impsec.org/~jhardin/antispam/obfusc.pl
> >
> > Thanks John. I have no idea what the program does but it does seem
> > to catch a lot of the stuff I was going after.
>
> Basically, given a word list and scores it generates re's to catch
> most simple obfuscations of those words. Theo is right, it largely
> overlaps the ReplaceTags plugin stuff, but I think there are a few
> obfuscations that it catches that ReplaceTags does not (after an
> admittedly brief look at ReplaceTags)...
>
> > The re is huge so I can't easily figure out what it's doing, but
> > it does miss some of the spam I was targeting with my rule though.
> > for example this one:
> >
> > http://binaryops.com/spam3.txt
>
> Yeah, at some point the obfuscation becomes problematic to detect
> with a low rate of false positives, and it is to some degree a game
> of whack-a-mole.
>
> However, if the obfuscation becomes complex enough to be difficult to
> automatically detect, it becomes that much more difficult for the
> victim to be able to *read* and make sense of, so the more esoteric
> obfuscations become self-limiting.
>
> > It was mail like that which forced me to use the .{0,4} clauses in
> > my rule. I'm probably causing some false positives though
> > especially since my scoring is really high.
>
> Using .{0,4} is far too loose and will cause massive FPs. It's a
> little better to try to match the specific extreme obfuscation
> technique, in this case (?:\s[a-z]{2}\s)? (from your sample). Of
> course, this will probably rot quickly.
>
> Did you also create a rule for the "from $3, 33" parts?
> --
> John Hardin KA7OHZ

Actually, the re in my rule was the only thing I could come up with
that matched all the spam that was getting through that day. I'm not
sure how common those kinds of mails are now, but I lowered the score
a lot in my rule, so hopefully it won't cause many (if any) FPs.

I didn't bother with the "from $3, 33" part, but you're right that it
might be a good way to avoid trouble if I make that part of the re.

Here's the work file I used while making the re:

http://binaryops.com/spamwork.txt
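
One rough way to see the difference between the two gap patterns
discussed above is to build both regexes for a single word and run them
against an obfuscated string and an ordinary sentence. The sketch below
is in Python rather than SpamAssassin rule syntax. The word "stock", the
helper build_pattern(), and both sample strings are made up for
illustration; they are not taken from spam3.txt or from any actual rule.

```python
import re


def build_pattern(word, gap):
    """Join the letters of `word`, placing the regex fragment `gap` between each pair."""
    return re.compile(gap.join(re.escape(ch) for ch in word))


# Hypothetical target word (an assumption, not from the thread).
WORD = "stock"

# The two gap fragments quoted in the thread.
LOOSE_GAP = r".{0,4}"             # up to four arbitrary characters
TIGHT_GAP = r"(?:\s[a-z]{2}\s)?"  # optionally: space, two lowercase letters, space

loose = build_pattern(WORD, LOOSE_GAP)
tight = build_pattern(WORD, TIGHT_GAP)

# Invented sample strings (assumptions for illustration only).
obfuscated = "buy s qx t zz ock today"  # two-letter filler between some letters
innocent = "this is the top pick"       # ordinary text with s..t..o..c..k in order


def matches(pattern, text):
    """Return True if `pattern` matches anywhere in `text`."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for label, text in (("obfuscated", obfuscated), ("innocent", innocent)):
        print(label, "loose:", matches(loose, text), "tight:", matches(tight, text))
```

With these inputs both patterns hit the obfuscated string, but only the
.{0,4} version also hits the innocent sentence, which is the kind of FP
being warned about. The tighter pattern only covers this one filler
style, so it would need updating as the obfuscation changes.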