On Mon, 12 Jan 2004, David A. Carter wrote:

> What does concern me is how SpamAssassin should deal with Habeas marks,
> which clearly *is* on-topic. Specifically, should SpamAssassin
> auto-learn Habeas-marked messages as ham, as it does today?

This is no different than the question "Should SpamAssassin auto-learn a
high-scoring false negative as ham, as it does today?"

The answer of course is that, by definition, SA can't tell it's a false
negative (if it could, it wouldn't have been a negative, would it?) so
the only way to prevent it from mislearning the occasional false negative
(or positive) is to turn off autolearning entirely.
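Here is a minimal Python sketch of score-threshold auto-learning that shows why the learner can't catch a false negative. It is not SpamAssassin's code. The threshold values are assumed to match the defaults of the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam options, and the real implementation computes the score it compares somewhat differently. The function name `autolearn_decision` is hypothetical.

```python
from typing import Optional

# Assumed to mirror the defaults of bayes_auto_learn_threshold_nonspam
# and bayes_auto_learn_threshold_spam; this is a simplified sketch.
NONSPAM_THRESHOLD = 0.1   # learn as ham below this score
SPAM_THRESHOLD = 12.0     # learn as spam at or above this score


def autolearn_decision(score: float) -> Optional[str]:
    """Return 'ham', 'spam', or None (don't learn) for a message score.

    Only the score is consulted. There is no way to tell whether the
    message is really spam, so a false negative that scores low enough
    is learned as ham exactly like genuine ham.
    """
    if score < NONSPAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return None  # in-between scores are not auto-learned
```

Take a spam that would score 6.0 on its content but carries a forged Habeas mark worth a hypothetical -8.0. It ends up at -2.0, and `autolearn_decision(-2.0)` returns `"ham"`. Nothing in the decision can distinguish it from real ham.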

It's usually easier to promptly re-learn a false negative as spam than it
is to re-learn a false positive as ham, because FNs probably go right into
your mailbox while FPs are dropped in a quarantine (or worse).  Unless
you're not paying attention, a flood of obvious FNs is not going to
"poison" the Bayes database for very long.

That the Habeas mark is what causes the FN is irrelevant, except insofar
as it's an obvious way for a spammer to get a better score.

Also, I think you seriously underestimate how hard it is to pump enough bad
data into a Bayes database to get something misclassified.

Finally, I think people are overly concerned about "poisoning" their
databases by learning messages containing the Habeas headers as spam (or
ham).  Remember that Bayes only pays attention to tokens that clearly
appear in more of one type of message than the other; if a token appears
too regularly in both, it gets ignored and the decision is made by looking
at other tokens.  Correctly learning such messages as spam or ham will only
teach Bayes that the Habeas headers are not a reliable basis for a
decision.  It won't teach Bayes to make the wrong decision unless the
entire message (and thus the rest of its content) is learned the wrong
way, which brings us back to the original question about auto-learning.
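The filtering described above can be sketched in Python. The sketch is loosely modeled on Gary Robinson's token-probability formula f(w) = (s*x + n*p(w)) / (s + n), the kind of estimate SpamAssassin's Bayes classifier uses. The constants `S`, `X` and `MIN_STRENGTH` are hypothetical illustration values, not SpamAssassin's. The token names are also made up, since SpamAssassin tokenizes headers differently.

```python
from typing import Dict, Tuple

S = 1.0             # hypothetical strength of the prior (Robinson's s)
X = 0.5             # hypothetical prior probability for rarely seen tokens
MIN_STRENGTH = 0.1  # hypothetical: ignore tokens with |f(w) - 0.5| below this


def token_prob(spam_count: int, ham_count: int, nspam: int, nham: int) -> float:
    """Robinson-style f(w): how spammy a token is, pulled toward X when rare."""
    b = spam_count / nspam  # fraction of learned spam containing the token
    g = ham_count / nham    # fraction of learned ham containing the token
    if b + g == 0:
        return X
    p = b / (b + g)
    n = spam_count + ham_count
    return (S * X + n * p) / (S + n)


def useful_tokens(counts: Dict[str, Tuple[int, int]],
                  nspam: int, nham: int) -> Dict[str, float]:
    """Keep only tokens whose probability is far enough from 0.5 to matter."""
    kept = {}
    for token, (spam_count, ham_count) in counts.items():
        f = token_prob(spam_count, ham_count, nspam, nham)
        if abs(f - 0.5) >= MIN_STRENGTH:
            kept[token] = f
    return kept


counts = {
    "X-Habeas-SWE-1": (50, 50),  # hypothetical header token, common in both
    "viagra": (40, 0),           # hypothetical spam-only content token
    "meeting": (1, 30),          # hypothetical mostly-ham content token
}
kept = useful_tokens(counts, nspam=100, nham=100)
```

With 100 spam and 100 ham learned, the header token seen in 50 of each gets f(w) = 50.5 / 101 = 0.5 exactly and is dropped. The spammy and hammy content tokens survive and make the decision.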

If the Habeas headers still concern you, use bayes_ignore_header for them;
don't spend your time manually deleting them.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk