My question is, what is the threshold for subject/body text like V(A)G1NAS or C()CKS to actually
invoke a score to be added to the overall score?
No static rule in the current ruleset will ever add score for those particular phrases. However, the Bayes tokenizer does a very good job if it has been trained on this kind of mail. Bayes aside, I could send you an email containing hundreds of instances of the string "c()cks" and it would never get a point for it.
The problem is that these are forms of text obfuscation, and there are thousands of possible kinds. Trying to code rules for each and every possible variant isn't practical. I've written a few custom rules for some of the more common ones, especially the spaced-out ones that put .'s or _'s between every letter, but even my setup doesn't catch every possibility.
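To give a feel for what one of those spaced-out rules has to match, here is a minimal sketch in Python rather than in SpamAssassin rule syntax. The function name spaced_out_pattern, the separator set, and the sample word are all assumptions made for illustration; they are not taken from my actual rules.

```python
import re

def spaced_out_pattern(word, separators="._ -"):
    """Build a regex matching `word` with one or more separators between every letter.

    Hypothetical helper for illustration; not part of any SpamAssassin ruleset.
    """
    # Character class of allowed separators, e.g. [\._\ \-]+
    sep = "[" + re.escape(separators) + "]+"
    # Join the escaped letters with the separator class: v SEP i SEP a ...
    return re.compile(sep.join(re.escape(ch) for ch in word), re.IGNORECASE)

pattern = spaced_out_pattern("viagra")

print(bool(pattern.search("buy v.i.a.g.r.a now")))   # separated by dots
print(bool(pattern.search("buy V_I_A_G_R_A now")))   # separated by underscores
print(bool(pattern.search("buy viagra now")))        # plain word, no separators
```

Note that this only covers one family of obfuscation for one word. Every new word and every new trick needs another pattern, which is exactly why the approach doesn't scale.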
It may be possible to eventually add an eval test that searches for a lot of different kinds of obfuscated text, but right now it's not possible with a simple rule. It's almost like you want a "deobfuscated_body" ruletype where the message is scanned several times for a string with various kinds of de-mangling done in advance.
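As a rough sketch of that "scan several times with de-mangling done in advance" idea, the Python below builds a few normalized variants of the text and checks each one for the target words. This is not SpamAssassin code and no such ruletype exists in the current ruleset; the LOOKALIKES table, the function names, and the sample words are all hypothetical, and a usable table would have to be far larger.

```python
import re

# Hypothetical look-alike table (assumption); order matters, "()" must be
# swapped before any punctuation is stripped.
LOOKALIKES = [("()", "o"), ("0", "o"), ("1", "i"), ("@", "a"), ("$", "s")]

def swap_lookalikes(text):
    """Replace common look-alike sequences with the letters they imitate."""
    for fake, real in LOOKALIKES:
        text = text.replace(fake, real)
    return text

def drop_inner_punctuation(text):
    """Remove punctuation sandwiched between letters/digits: 'v.i.a' -> 'via'."""
    return re.sub(r"(?<=[a-z0-9])[^a-z0-9\s]+(?=[a-z0-9])", "", text)

def variants(text):
    """Return the de-mangled versions of the text that will each be scanned."""
    lowered = text.lower()
    swapped = swap_lookalikes(lowered)
    return [lowered, swapped, drop_inner_punctuation(swapped)]

def deobfuscated_hits(text, words):
    """Scan every variant and collect the target words found in any of them."""
    hits = set()
    for variant in variants(text):
        for word in words:
            if word in variant:
                hits.add(word)
    return hits

found = deobfuscated_hits("Cheap M()RTGAGE and v.1.a.g.r.a today",
                          ["mortgage", "viagra"])
print(sorted(found))
```

The passes are deliberately crude: stripping inner punctuation also turns "don't" into "dont" and glues "foo.bar" together, and plain substring matching will produce false positives. It also does nothing about letters separated by spaces. A real eval test would need to weigh those trade-offs.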
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk