On Mon, 12 Jan 2004, David A. Carter wrote:

> What does concern me is how SpamAssassin should deal with Habeas marks,
> which clearly *is* on-topic. Specifically, should SpamAssassin
> auto-learn Habeas-marked messages as ham, as it does today?
This is no different from the question "Should SpamAssassin auto-learn a high-scoring false negative as ham, as it does today?" The answer, of course, is that by definition SA can't tell it's a false negative (if it could, it wouldn't have been a negative, would it?). So the only way to prevent it from mislearning the occasional false negative (or positive) is to turn off autolearning entirely.

It's usually easier to promptly re-learn a false negative as spam than it is to re-learn a false positive as ham, because FNs probably go right into your mailbox while FPs are dropped in a quarantine (or worse). Unless you're not paying attention, a flood of obvious FNs is not going to "poison" the Bayes database for very long.

That the Habeas mark is what causes the FN is irrelevant, except insofar as it's an obvious way for a spammer to get a better score.

Also, I think you seriously misjudge the difficulty of pumping enough bad data into a Bayes database to get something misclassified.

Finally, I think people are overly concerned about "poisoning" their databases by learning messages containing the Habeas headers as spam (or ham). Remember that Bayes only pays attention to tokens that clearly appear in more of one type of message than the other; if a token appears with similar frequency in both, it gets ignored and the decision is made by looking at other tokens. All that correctly learning messages as spam or ham will do is teach Bayes that the Habeas headers are not a reliable way to make a decision. You won't teach it to make the wrong decision unless the entire message (and thus the rest of the content) is learned the wrong way, which returns us to the original question about auto-learning.

If the Habeas headers still concern you, use bayes_ignore_header for them; don't spend your time manually deleting them.
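To illustrate why a header that shows up in both spam and ham stops mattering, here is a minimal Python sketch, assuming a Robinson-style smoothed per-token spam probability and a hypothetical cutoff `min_strength` below which near-neutral tokens are discarded. The constants (`s`, `x`, `min_strength`), the helper names, and the token strings are illustrative assumptions, not SpamAssassin's actual Bayes code, defaults, or tokenization.

```python
from dataclasses import dataclass


@dataclass
class TokenCounts:
    spam: int  # number of learned spam messages containing the token
    ham: int   # number of learned ham messages containing the token


def token_probability(c, nspam, nham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability for one token.

    s (strength of the prior) and x (prior probability) are assumed values.
    """
    spam_ratio = c.spam / nspam if nspam else 0.0
    ham_ratio = c.ham / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = c.spam + c.ham
    # Blend the raw estimate with the prior; more observations -> less smoothing.
    return (s * x + n * p) / (s + n)


def significant_tokens(counts, nspam, nham, min_strength=0.1):
    """Keep only tokens whose probability is far enough from neutral (0.5)."""
    result = {}
    for token, c in counts.items():
        f = token_probability(c, nspam, nham)
        if abs(f - 0.5) >= min_strength:
            result[token] = f
    return result


if __name__ == "__main__":
    # Hypothetical counts after learning 1000 spam and 1000 ham messages.
    counts = {
        "X-Habeas-SWE-1": TokenCounts(spam=50, ham=50),  # seen equally in both
        "viagra": TokenCounts(spam=200, ham=1),
        "meeting": TokenCounts(spam=2, ham=150),
    }
    for token, f in significant_tokens(counts, 1000, 1000).items():
        print(token, round(f, 3))
```

With these made-up counts the Habeas token lands at exactly 0.5 and is filtered out, while the strongly one-sided tokens survive and decide the classification. Correctly learning more Habeas-marked spam and ham only keeps pushing that token toward neutral; it doesn't push the verdict the wrong way.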
_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk