Re: A different approach to scoring spamassassin hits

Loren Wilton Sat, 30 Jun 2007 15:30:04 -0700

> And after typing all this I'm thinking you might be right. But part of this approach is to run all these rules in YES/NO fashion and see if the probability is significant. For example: if I tested for SOME_TEST=NO and found it was scoring a probability of ~0.500, then it's indisputable that you are right.
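The check described in the quoted paragraph (does a token like SOME_TEST=NO score near 0.5?) can be sketched as follows. This is a minimal illustration, assuming a simple Graham-style per-token probability computed from hypothetical ham/spam training counts; SpamAssassin's own Bayes code applies further adjustments, and the helper names and tolerance here are made up for the example.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Graham-style per-token spam probability (hypothetical helper).

    spam_hits / ham_hits: training messages of each class containing the token.
    n_spam / n_ham: total training messages of each class.
    """
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: treat it as neutral
    return spam_freq / (spam_freq + ham_freq)


def is_uninformative(p, tolerance=0.05):
    """True if a token's probability is close enough to 0.5 to carry no real signal."""
    return abs(p - 0.5) < tolerance


# Hypothetical counts for a NO_SOME_TEST token over 1000 spam and 1000 ham messages.
p = token_spam_probability(spam_hits=480, ham_hits=500, n_spam=1000, n_ham=1000)
print(round(p, 3), is_uninformative(p))  # prints: 0.49 True
```

A token that lands this close to 0.5 contributes almost nothing to the combined score, which is the outcome the quoted paragraph says would settle the question.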

Well, this still doesn't make any real sense to me; it seems equivalent to the attempts at bayes poison that spammers stick into their spams: a bunch of words totally unrelated to the mail in the hopes of outweighing the useful terms. Now their trick works as a good spam indication because the words they pick aren't common to my ham mails, so it is really a good spam indication rather than poison. I'm not immediately convinced that will hold for the usage you intend. Maybe. Maybe not.

However, if you want to do this, remember that bayes works on tokens and has a tokenizer. So SOME_RULE=YES is probably either two or three tokens, and you will end up scoring on the probability of YES and NO, along with the frequency of the rule names, which will be 1. So you probably want to do NO_SOME_RULE and YES_OTHER_RULE or the like when you build the insert list. Again, though, I'm not sure I see the point in the yes and no factors; the presence or absence of a word in the mail seems like a pretty good yes/no indication to me.
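The tokenization point can be illustrated with a toy example. This is a sketch under stated assumptions: `naive_tokenize` is a hypothetical stand-in that splits on anything other than letters, digits and underscores, not SpamAssassin's real tokenizer, and `rule_tokens` is a made-up helper for building the insert list.

```python
import re


def naive_tokenize(text):
    """Toy tokenizer (assumption): split on any run of non-word characters."""
    return [t for t in re.split(r"[^A-Za-z0-9_]+", text) if t]


def rule_tokens(results):
    """Turn {rule_name: hit?} into single-word YES_/NO_ tokens (hypothetical helper)."""
    return [("YES_" if hit else "NO_") + name for name, hit in results.items()]


results = {"SOME_RULE": False, "OTHER_RULE": True}

# "NAME=YES" style: each entry is split, so YES and NO become tokens shared by every rule.
print(naive_tokenize("SOME_RULE=NO OTHER_RULE=YES"))
# prints: ['SOME_RULE', 'NO', 'OTHER_RULE', 'YES']

# Prefixed style: each rule/outcome pair survives as one distinct token.
print(naive_tokenize(" ".join(rule_tokens(results))))
# prints: ['NO_SOME_RULE', 'YES_OTHER_RULE']
```

With the first form, bayes learns the probability of the bare words YES and NO across all rules, which says nothing about any particular rule; the prefixed form keeps one token per rule outcome.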

Were I doing it, I'd try it both ways and see if there is any difference in results.


       Loren
