Tony Earnshaw said:

> Tom Meunier wrote:
>
> > I'm kind of confused here. The way I see it (which could very well be
> > a misunderstanding, mind you) is that the reason it autolearns spam
> > over 15 points by default is to make darned sure that it doesn't learn
> > a false positive. Then one would augment its learning by feeding
> > missed spams through sa-learn.
> >
> > The only reason I can think of to NOT feed low-scoring spams through
> > sa-learn is that I've decided that a spam that scores 5.x points has
> > no interesting tokens. Quite the opposite is true; that's why we feed
> > it with a corpus of known spam in the first place, rather than feeding
> > it a corpus of known spam that has been run through spamassassin
> > manually and the under-15 spams weeded out. Same goes with
> > hand-feeding hams that score 4.x points, in the theory that there's a
> > fixed probability that a ham from that source will at some point
> > trigger another test and trip it over the threshold.
> >
> > Perhaps I misunderstand. If so, I'd appreciate alternate viewpoints
> > and discussion.
>
> To get a reasonable base, it's been my understanding that you teach
> Bayes what is spam and what isn't. Your basic spam score's already
> defined (default +5.0) in local.cf. You go on doing that until you've
> got 200 of the things (spam.) To my mind that should be closer to 500 or
> even 1,000, but never mind. You do that to get a reasonably biased base.
>
> You don't start contradicting what you've taught it by teaching it low
> scoring spam until after you've reached your minimum bias of 200.
> You'll confuse the whole Bayes database if you do anything different.
> Why in goodness name put a minimum score of 5 in the first place, if
> you're going to contradict yourself?

Actually, Tom's dead right. If it's spam, feed it to the bayes learner as
spam; if it's ham, do the opposite. Stuff that the learner got wrong is
especially valuable, as it "fixes" the tokens that were misleading it in
the first place.

--j.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
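
The point about misclassified mail "fixing" misleading tokens can be
illustrated with a toy model. The sketch below is only an assumption-laden
illustration: ToyBayes, learn() and token_spam_prob() are hypothetical
names, the per-token estimate uses simple add-one smoothing, and none of
it is SpamAssassin's actual Bayes code. It shows a token that has only
been seen in ham ("invoice") moving toward spam once a missed, low-scoring
spam containing it is learned as spam.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token spam estimator. NOT SpamAssassin's algorithm."""

    def __init__(self):
        self.spam_tokens = Counter()  # number of spam messages containing each token
        self.ham_tokens = Counter()   # number of ham messages containing each token
        self.nspam = 0                # spam messages learned so far
        self.nham = 0                 # ham messages learned so far

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_spam_prob(self, token):
        # Add-one smoothed rate of the token in spam versus ham.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)


bayes = ToyBayes()
bayes.learn("meeting invoice attached", is_spam=False)
bayes.learn("invoice for march", is_spam=False)
bayes.learn("cheap pills now", is_spam=True)

# "invoice" has only appeared in ham, so it currently leans toward ham.
before = bayes.token_spam_prob("invoice")

# A spam that slipped through with a low score is hand-fed as spam.
bayes.learn("invoice cheap pills", is_spam=True)
after = bayes.token_spam_prob("invoice")

print(f"invoice: before={before:.3f} after={after:.3f}")
```

In this toy setup the estimate for "invoice" rises after the correction,
which is the effect described above: the message the learner would have
gotten wrong is exactly the one that adjusts the misleading token.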