John Rudd wrote:

I wouldn't do that.

Please note that I said it the short way; of course I don't jump straight to disabling rules. I first check whether the message could reasonably have been flagged as spam (a "reasonable" FP). If so, that's life. If possible, I see whether I can create a rule that gets it hammed without breaking the whole filter. If the tests that made it classify as spam don't make sense to me, I check whether I can lower some of them, but some tests just get disabled.
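For example (rule and domain names here are purely illustrative), a narrow compensating rule in local.cf usually does the job without touching the stock scores:

    # hypothetical example: a known-good sender whose mail keeps tripping
    # a couple of stock rules; give it back a few points instead of
    # changing the stock rules themselves
    header   LOCAL_GOODSENDER  From:addr =~ /\@newsletter\.example\.com$/i
    describe LOCAL_GOODSENDER  Known-good sender that trips some stock rules
    score    LOCAL_GOODSENDER  -2.0

whitelist_from_rcvd is another option when the sending relay is stable.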




Just because legitimate mail triggers some rule doesn't mean that the rule is flawed. Using your example, triggering "no_real_name" does not mean that the message is spam, it means that the message has _some_ similarity to at least some spam messages (the higher the score, the stronger the similarity). And, that's absolutely true: statistically, when looking at the corpus which was used to create the rules database, a higher percentage of "no_real_name" messages were spam.

As I already said in another thread, the statistical results depend on the attributes you are checking. The perceptron will not wake up and say "hey, come on, this attribute is no good". So if you run a mass-check with rules like:
- IP parity
- first letter of sender
- mailer (e.g. "The Bat!")
- relay = comcast, free.fr, ...
...

then the perceptron will give you what you asked for: scores.
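To make the point concrete, here is the kind of throwaway test I mean (rule names made up); run a mass-check over it and the perceptron will dutifully hand back a score whether or not the attribute means anything:

    # deliberately meaningless tests: mass-check will still produce hit
    # statistics for them, and the perceptron will still score them
    header   L_FIRST_LETTER_A  From:addr =~ /^a/i
    describe L_FIRST_LETTER_A  Sender address starts with the letter A

    header   L_MAILER_THEBAT   X-Mailer =~ /\bThe Bat!/i
    describe L_MAILER_THEBAT   Message was sent with The Bat!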

I also understand that people in the US may see fewer encoded subjects, but at least in .fr we get them all the time (because of our accented letters, and because many companies still use software that predates MIME). And if I find a legitimate IP in a DNSBL used by SA, then I just stop using that DNSBL.
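In practice that is a one-line change in local.cf: zero out the rule that corresponds to the list. RCVD_IN_SORBS_DUL is only an example; use whichever rule maps to the list that is causing trouble:

    # stop scoring on a DNSBL whose listings I don't trust for my traffic
    score RCVD_IN_SORBS_DUL 0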


Now, if legit messages were not just triggering those rules, but also triggering enough rules to be flagged as spam ... then I would lower the value of those rules, but not disable those rules.

I disable the rules, and if I get false negatives, I see what I can do. So far, the (very few) spam messages that slipped through would have been missed anyway.
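Either way it is a one-line change in local.cf; SUBJ_ILLEGAL_CHARS is just an example of a rule that fires constantly on perfectly normal accented subjects here:

    # lowering the score (John's approach) ...
    score SUBJ_ILLEGAL_CHARS 1.0
    # ... versus disabling the rule outright (mine); use one or the other
    score SUBJ_ILLEGAL_CHARS 0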

But I would only do that if I could see that a large percentage of should-be-ham messages were being flagged as spam by that rule AND that the rule wasn't being useful in flagging spam messages. The reason is: if the message is being flagged, but it shouldn't have been, then perhaps my "corpus" of messages differs significantly enough from the SA internal corpus that my score values need to be different. But that doesn't mean that the rules are so disjoint from tracking spam that they should be entirely disabled. They just don't have the same weighting that my corpus needs.

If, instead, most messages passing through my mail servers that triggered that rule really did seem to be spam, then I wouldn't alter the score at all. I would just pass the should-have-been-ham message into my Bayesian learner and hope that a low Bayes score for messages like that would offset the rules that had flagged it as spam.
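In practice that amounts to feeding the misflagged message back to the Bayes database, for example (the path is of course illustrative):

    # teach Bayes that this message was ham, so similar mail gets a
    # lower BAYES_* score next time
    sa-learn --ham /path/to/misflagged-message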


Everybody has their own situation. I am very FP-sensitive: I would rather receive spam than lose an important mail. After all, I do review my spam, so the fewer FPs there are, the faster I can review my junk folder.
