Matt Kettler <[EMAIL PROTECTED]> writes:

> At 07:31 AM 9/10/2004, Gustafson, Tim wrote:
> >What I'm worried about is that I have so many more SPAM than HAM messages.
> >Is this dangerous?
> 
> No, in fact it's closer to optimal than a 50-50 mix is...
> 
> Remember, Bayes is a statistical system. Statistics work best when they
> are as close to reality as possible. If most of your mail is spam, so
> should most of your training.
> (You're currently at a 79% spam training ratio; if your real spam level is
> a bit over 70%, you're quite close to reality. That's VERY good.)

Actually, Bayes works better if the balance is closer to 50/50.  That's
why we added the additional auto-learning thresholds to make it possible
to balance by reducing the amount of one type of mail (generally, spam)
learned.
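
As a rough sketch of what that looks like in local.cf (option names as
in the SpamAssassin 3.x Mail::SpamAssassin::Conf documentation, so check
the docs for your version; the spam value below is an illustrative
assumption, not a recommendation), raising the spam threshold means
fewer spam messages qualify for auto-learning:

```
# Illustrative values only; tune for your own mail flow.
bayes_auto_learn 1

# Messages scoring below this are auto-learned as ham (default 0.1).
bayes_auto_learn_threshold_nonspam 0.1

# Messages scoring above this are auto-learned as spam (default 12.0).
# A higher value here means less spam gets learned.
bayes_auto_learn_threshold_spam 15.0
```

Running "sa-learn --dump magic" afterwards shows the nspam and nham
counts, so you can see whether the ratio is moving toward balance.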

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
