Matt Kettler <[EMAIL PROTECTED]> writes:

> At 07:31 AM 9/10/2004, Gustafson, Tim wrote:
> >What I'm worried about is that I have so many more SPAM than HAM messages.
> >Is this dangerous?
>
> No, in fact it's closer to optimal than a 50-50 mix is...
>
> Remember, Bayes is a statistical system.. Statistics work best when they
> are as close to reality as possible. If most of your mail is spam, so
> should most of your training.
> (You're currently at a 79% spam training ratio, if your real spam level is
> a bit over 70%, you're quite close to reality. That's VERY good.)
Actually, Bayes works better if the balance is closer to 50/50. That's why we added the additional auto-learning thresholds, which make it possible to balance training by reducing how much of one type of mail (generally spam) gets learned.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
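To make the arithmetic in this thread concrete, here is a minimal sketch, assuming you already know how many spam and ham messages Bayes has been trained on (SpamAssassin reports these as `nspam` and `nham` in `sa-learn --dump magic`). The function names `spam_ratio` and `spam_to_skip` are hypothetical helpers for illustration only, not part of SpamAssassin. The second one estimates how many spam messages would have to go unlearned, with the ham count held fixed, to bring the training mix down to a target share such as 50/50. In SpamAssassin itself the lever for this is the auto-learn thresholds, not a calculation like this one.

```python
from fractions import Fraction


def spam_ratio(nspam: int, nham: int) -> float:
    """Fraction of all trained messages that were spam (e.g. 0.79 for 79%)."""
    total = nspam + nham
    if total == 0:
        raise ValueError("no messages trained yet")
    return nspam / total


def spam_to_skip(nspam: int, nham: int, target: Fraction = Fraction(1, 2)) -> int:
    """Hypothetical helper: how many spam messages to leave untrained so the
    spam share is at most `target`, keeping the ham count unchanged."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # nspam' / (nspam' + nham) <= target  =>  nspam' <= target * nham / (1 - target)
    # Fraction keeps this exact, so the floor is never off by one from rounding.
    max_spam = int(target * nham / (1 - target))
    return max(0, nspam - max_spam)


if __name__ == "__main__":
    # Hypothetical counts giving the 79% spam training ratio mentioned above.
    nspam, nham = 790, 210
    print(f"spam ratio: {spam_ratio(nspam, nham):.0%}")
    print(f"spam to leave untrained for 50/50: {spam_to_skip(nspam, nham)}")
```

With 790 spam and 210 ham, the ratio is 79%, and reaching an even split means leaving 580 of those spam messages untrained. This is the same effect a stricter spam auto-learn threshold has over time: fewer borderline spams get learned.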