Tony Earnshaw said:
> Tom Meunier wrote:
> > I'm kind of confused here.  The way I see it (which could very well
> > be a misunderstanding, mind you) is that the reason it autolearns
> > spam over 15 points by default is to make darned sure that it
> > doesn't learn a false positive.  Then one would augment its learning
> > by feeding missed spams through sa-learn.  The only reason I can
> > think of to NOT feed low-scoring spams through sa-learn is that I've
> > decided that a spam that scores 5.x points has no interesting
> > tokens.  Quite the opposite is true; that's why we feed it with a
> > corpus of known spam in the first place, rather than feeding it a
> > corpus of known spam that has been run through spamassassin manually
> > and the under-15 spams weeded out.  Same goes with hand-feeding hams
> > that score 4.x points, in the theory that there's a fixed
> > probability that a ham from that source will at some point trigger
> > another test and trip it over the threshold.
> >
> > Perhaps I misunderstand.  If so, I'd appreciate alternate viewpoints
> > and discussion.
> 
> To get a reasonable base, it's been my understanding that you teach 
> Bayes what is spam and what isn't. Your basic spam score's already 
> defined (default +5.0) in local.cf. You go on doing that until you've 
> got 200 of the things (spam). To my mind that should be closer to 500 or 
> even 1,000, but never mind. You do that to get a reasonably biased base.
> 
> You don't start contradicting what you've taught it by teaching it 
> low-scoring spam until after you've reached your minimum bias of 200.
> 
> You'll confuse the whole Bayes database if you do anything different. 
> Why in goodness' name put a minimum score of 5 in the first place, if 
> you're going to contradict yourself?

Actually, Tom's dead right.

If it's spam, feed it to the bayes learner as spam; if it's ham, do
the opposite.  Stuff that the learner got wrong is especially valuable,
as it "fixes" the tokens that were misleading it in the first place.
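
To make the "fixes the tokens" point concrete, here is a toy sketch in
Python. It is not SpamAssassin's actual Bayes code; the ToyBayes class,
the smoothing, and the sample messages are all made up for illustration.
It only shows how learning one missed spam pulls a misleading token's
score back towards spam.

```python
from collections import Counter


class ToyBayes:
    """Toy per-token spam estimate (hypothetical, NOT SpamAssassin's algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()  # number of spam messages containing each token
        self.ham_tokens = Counter()   # number of ham messages containing each token
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def spamminess(self, token):
        # Smoothed share of spam (resp. ham) messages that contain the token.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)


db = ToyBayes()
db.learn("meeting agenda for the quarterly report", is_spam=False)
db.learn("quarterly report attached", is_spam=False)
db.learn("cheap pills online", is_spam=True)

# "report" has only been seen in ham so far, so it looks hammy.
before = db.spamminess("report")

# A spam that slipped through, hand-fed to the learner as spam.
db.learn("quarterly report on cheap pills", is_spam=True)
after = db.spamminess("report")

print(f"report: {before:.3f} -> {after:.3f}")
```

The token that helped the spam slip through ends up with a higher spam
estimate after the correction, which is why the messages the learner got
wrong are the most useful ones to feed it.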

--j.

