I wonder if a better way to do this would be to add an extra field to the rule (or maybe change BODY to BODY_STRIPPED or HEADER_STRIPPED) that removes everything that is *not* a letter before doing the regexp check. That is, it does a s/[^a-zA-Z]//g on the body/header before checking the rule. I can't see it being too useful on the body, but it would be great for catching those Per\scri;ption subject lines.
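A minimal sketch of the stripping idea, written in Python rather than the Perl that SpamAssassin itself uses. The function names are hypothetical and HEADER_STRIPPED is only the proposed rule type, not an existing SpamAssassin feature. Note that the substitution removes whitespace too, so a rule run against stripped text would have to be written without spaces.

```python
import re


def strip_non_letters(text: str) -> str:
    """Python equivalent of s/[^a-zA-Z]//g: keep ASCII letters only."""
    return re.sub(r"[^a-zA-Z]", "", text)


def header_stripped_match(pattern: str, header: str) -> bool:
    """Hypothetical HEADER_STRIPPED check: strip first, then match.

    The match is case-insensitive, mirroring the /i flag on the rules
    quoted in this thread.
    """
    stripped = strip_non_letters(header)
    return re.search(pattern, stripped, re.IGNORECASE) is not None


if __name__ == "__main__":
    subject = r"Per\scri;ption"
    print(strip_non_letters(subject))
    print(header_stripped_match("perscription", subject))
```

Because the stripping happens once per header rather than once per rule, the cost should stay small compared with expanding every rule into a longer regexp.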
Rich Puhek wrote:

Roger Merchberger wrote:
> Rumor has it that Charles Gregory may have mentioned these words:
>
>> [snippety]
>> Rule:
>> BODY RULENAME /a string/i
>>
>> Coded Rule:
>> BODY RULENAME /a{1,3} s{1,3}t{1,3}r{1,3}i{1,3}n{1,3}g{1,3}/i
>>
>> You get the idea. This could be quite burdensome to implement manually,
>> but an easy enough thing to automate 'behind the scenes'.
>
> However, if one were to do this with every body ruleset that exists,
> it quite possibly could crush the SA server, as it would multiply the amount
> of CPU used to do a match like that, quite possibly exponentially. [1]
>
> If there was a way of optimizing the search (or at least only doing it
> on the subject of the mail, not the body) it wouldn't be a bad idea, but
> [[ as always with this type of
> measure/countermeasure/countercountermeasure war ]] as soon as it was
> widespread, the spammers would stop this yet again, and move onto the
> next useful (for them) obfuscation scheme... :-/

Would something like "excessive" instances of /(\w)\1/ work? Obviously such patterns are fairly common in regular English, but perhaps looking for an excessive quantity in an email could be an indication of the above problem.

Another possible solution might be to preprocess the mail with something like: s/(\w)\1/\1/ in order to cull out the crap.

But... like you said, it's an arms race. Fortunately, Bayes should eat up the double-letter obfuscations...

--Rich
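
For reference, Rich's two suggestions could be sketched roughly as below, again in Python rather than Perl. The function names are made up for illustration, and the collapsing function is an assumption that goes slightly beyond the quoted s/(\w)\1/\1/: it is applied globally and collapses whole runs, so tripled letters are handled as well. No threshold for "excessive" is suggested anywhere in the thread, so the ratio is simply returned to the caller.

```python
import re

# Same pattern as /(\w)\1/: a word character immediately repeated.
DOUBLED = re.compile(r"(\w)\1")


def doubled_letter_ratio(text: str) -> float:
    """Doubled-character matches per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(DOUBLED.findall(text)) / len(words)


def collapse_repeats(text: str) -> str:
    """Collapse any run of the same word character down to one.

    Assumption: uses (\\w)\\1+ applied everywhere, which is broader
    than the single substitution quoted in the thread.
    """
    return re.sub(r"(\w)\1+", r"\1", text)


if __name__ == "__main__":
    sample = "aa sttrrinng"
    print(doubled_letter_ratio(sample))
    print(collapse_repeats(sample))
```

The obvious drawback is that collapsing also damages legitimate words ("letter" becomes "leter"), so any rule matched against collapsed text would need to be collapsed the same way.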