Roger Merchberger wrote:
Rumor has it that Charles Gregory may have mentioned these words:
[snippety] Rule: BODY RULENAME /a string/i
Coded Rule: BODY RULENAME /a{1,3} s{1,3}t{1,3}r{1,3}i{1,3}n{1,3}g{1,3}/i
You get the idea. This could be quite burdensome to implement manually, but an easy enough thing to automate 'behind the scenes'.
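For what it's worth, the 'behind the scenes' expansion could be sketched roughly as follows. This is Python rather than SA's own Perl, and the function name `expand_rule` and the `max_repeat` parameter are made up for illustration; nothing like this is claimed to exist in SpamAssassin.

```python
import re

def expand_rule(literal, max_repeat=3):
    """Turn a plain string into a regex that tolerates repeated letters.

    Hypothetical helper: each letter or digit gets a {1,max_repeat}
    quantifier, whitespace is kept as-is, anything else is escaped.
    """
    parts = []
    for ch in literal:
        if ch.isalnum():
            parts.append(f"{ch}{{1,{max_repeat}}}")
        elif ch.isspace():
            parts.append(ch)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

pattern = expand_rule("a string")
print(pattern)
# The /i flag on the rule corresponds to re.IGNORECASE here.
print(bool(re.search(pattern, "buy A STTRIING now", re.IGNORECASE)))
```

The expansion itself is a one-time cost when the rules are compiled; the open question in this thread is what the expanded patterns cost at match time.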
However, if one were to do this with every body ruleset that exists, it quite possibly could crush the SA server, as it would multiply the amount of CPU used to do a match like that, quite possibly exponentially. [1]
If there were a way of optimizing the search (or at least only doing it on the subject of the mail, not the body) it wouldn't be a bad idea, but [[ as always with this type of measure/countermeasure/countercountermeasure war ]] as soon as it was widespread, the spammers would stop using this trick yet again and move on to the next useful (for them) obfuscation scheme... :-/
Would something like "excessive" instances of /(\w)\1/ work? Obviously such patterns are fairly common in regular English, but perhaps looking for an excessive quantity in an email could be an indication of the above problem.
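A rough sketch of that counting idea, in Python for convenience. The helper names `doubled_count` and `doubled_density` are invented here, and what counts as "excessive" is left open: any threshold would have to be tuned against real ham and spam, since ordinary words like "coffee" or "meeting" already contribute hits.

```python
import re

# Same pattern as in the message: a word character followed by itself.
DOUBLED = re.compile(r"(\w)\1")

def doubled_count(text):
    """Number of non-overlapping doubled-character matches in text."""
    return len(DOUBLED.findall(text))

def doubled_density(text):
    """Doubled-character matches per word (0.0 for text with no words)."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return doubled_count(text) / len(words)

for sample in ("Please send the report today", "viiagra freee"):
    print(sample, doubled_count(sample), doubled_density(sample))
```

Note that \w also matches digits and underscores, so something like "100" counts as a hit too; restricting the pattern to [a-z] would be a reasonable variation.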
Another possible solution might be to preprocess the mail with something like: s/(\w)\1/$1/g in order to cull out the crap.
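Here is a minimal sketch of that preprocessing step, again in Python. It uses `\1+` rather than a bare `\1` so that triples collapse as well; that is an assumption on my part, as is the name `collapse_repeats`.

```python
import re

def collapse_repeats(text):
    """Collapse each run of a repeated word character to a single character."""
    return re.sub(r"(\w)\1+", r"\1", text)

print(collapse_repeats("a sttrriinng"))  # obfuscated text
print(collapse_repeats("free offer"))    # legitimate doubles collapse too
```

The catch is visible in the second call: legitimate doubles are collapsed as well ("free" becomes "fre"), so the rule strings would need the same treatment before being matched against the collapsed body.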
But... like you said, it's an arms race. Fortunately, Bayes should eat up the double-letter obfuscations...
--Rich
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk