Well, if everyone stopped using SpamAssassin, it would work better too, so I blame all users.
Yeah. Blame users. Users are the easiest to blame, anyways.
Well, I heard about this weakness before we even adopted Bayes. It was coming, one way or another.
There's one non-Bayesian rule in 2.60 to catch some of these, but we could probably use more rules to catch tricks like this.
Does anyone have a stockpile of this stuff? I was thinking of some filters for this myself earlier. What are the strategies other people have been using/pondering for it?
I think most times, these messages look like:
studs hairiness miltonic waldo pseudoinstruction RzneXzvxrRzneXongpu.pbzRzneX monocotyledon conceals rickshaws raising lutheranizers gels modulating cautions bowed verbally storeyed tormenting dairy pruners height
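A couple of things could be done with this. How frequently do you see the word "rickshaws" in email? Or "pseudoinstruction"? I think if you see a word you've only ever seen once in "ham" before, and then four more like it in the same message, you're looking at gibberish.
Additionally, you'll notice that most of these words are longer than 5 characters, which is what you'd expect from words pulled straight out of a dictionary file. The average word length in mine is about 9.6 characters: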
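To make the rare-word idea concrete, here is a rough sketch in Python. It is not anything SpamAssassin does today; the function names, the `ham_counts` table (a word-to-count map built from your own ham) and the threshold of five rare words are all assumptions for illustration.

```python
import re
from collections import Counter

def count_rare_words(text, ham_counts, max_seen=1):
    """Count distinct words in text that appeared at most max_seen times in ham.

    ham_counts is a hypothetical word -> count table built from known-good mail.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(1 for w in words if ham_counts.get(w, 0) <= max_seen)

def has_many_rare_words(text, ham_counts, threshold=5):
    """Flag a message with at least `threshold` rare words (threshold is a guess)."""
    return count_rare_words(text, ham_counts) >= threshold

# Toy ham table; a real one would come from the user's own mail.
ham_counts = Counter({"the": 100, "meeting": 20, "is": 80, "at": 60,
                      "noon": 5, "rickshaws": 1})

print(has_many_rare_words("studs hairiness miltonic waldo rickshaws", ham_counts))
print(has_many_rare_words("the meeting is at noon", ham_counts))
```

A table lookup per word is cheap, but building and storing the ham word counts is the real cost, and a small ham corpus will make a lot of perfectly ordinary words look rare.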
[scorch:~] alex% perl -lne '$s+=length;END{print $s/$.}' /usr/share/dict/words
9.58507174263739
However, in practice, if we take a real message like this email, we'll find that the longer words are interspersed with many smaller words: if, the, and, I, me, my, and so on. So if we see several occurrences of words we don't see frequently, along with a lack of the smaller words that make speech understandable, it is probably gibberish.
I suppose a spammer could defeat this with a sort of anti-Bayesian filtering of their own: take a file full of commonly used words and intersperse them with smaller words, or just use legitimate text (somebody mentioned the Declaration of Independence). A counter-countermeasure, I guess, would be to use fuzzy logic and determine a normal word-frequency ratio or a normal large:small word ratio. In combination with other filters, that might be worth implementing.
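A minimal sketch of the short-word check, again in Python and again only an illustration: the three-character cutoff, the ten-word minimum and the 10% ratio are numbers I made up, and the function names are hypothetical.

```python
import re

def short_word_ratio(text, max_len=3):
    """Fraction of words in text that are max_len characters or shorter."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    short = sum(1 for w in words if len(w) <= max_len)
    return short / len(words)

def lacks_short_words(text, min_words=10, min_ratio=0.1):
    """Flag text that is long enough to judge but has almost no short words.

    min_words and min_ratio are assumed values, not measured ones.
    """
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) >= min_words and short_word_ratio(text) < min_ratio

gibberish = ("studs hairiness miltonic waldo pseudoinstruction monocotyledon "
             "conceals rickshaws raising lutheranizers gels modulating cautions")
normal = ("However, in practice, if we take this email, we will find that "
          "the words are mixed with many smaller words")

print(lacks_short_words(gibberish))
print(lacks_short_words(normal))
```

On its own this would misfire on things like lists of product names, so it only makes sense as one signal among several, scored together with the rare-word count.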
It isn't a regex, though, and would be somewhat slow.
alex
Spamassassin-talk mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/spamassassin-talk