On Tue, 2003-08-19 at 11:11, Martin Bretschneider wrote:
> Hi Sa-users and devs again,
>
> I've got ham since 2000 in my archive containing
> 1. personal emails from friends
> 2. important mailings lists (university and so)
> 3. not that important mailing list (GnuPG, OpenOffice.org, Sylpheed,
> GNOME and so on)
>
> I have got spam of the last 3 months. Which ham should I take to
> train sa-learn? Each ham since last 3 years? Also the not that important
> mailing lists?
>
> TIA Martin
>
I'd figure it's better to use your recent ham, the same as with spam. That way your Bayes database contains tokens from what's happening now, as opposed to what happened years ago. I'd feed all the spam to Bayes and roughly the same amount of the most recent ham, including some from all the mailing lists. Keeping it recent and keeping it roughly in balance seems to work best.

Having said that about balance, some of my client accounts are way off balance, auto-learning 500+ spam and only 10-12 ham a day, every day, for 2 months or more. I'm just waiting to see what happens when it breaks.

-- 
Yorkshire Dave

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
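
One way to read the advice above (all the spam, about as much ham, newest first, with every list represented) is a round-robin pick across ham sources. This is only a rough sketch: the function name `pick_recent_ham`, the `(timestamp, path)` tuples and the source labels are made up for illustration, and round-robin is just one possible way to make sure each source contributes.

```python
from itertools import zip_longest


def pick_recent_ham(ham_by_source, spam_count):
    """Pick up to spam_count ham messages, newest first, drawing from every source.

    ham_by_source maps a source label (hypothetical, e.g. "friends") to a list
    of (timestamp, path) tuples. spam_count is how many spam messages you have.
    """
    # Sort each source newest-first so every round takes the most recent
    # message that source still has to offer.
    queues = [
        sorted(msgs, key=lambda m: m[0], reverse=True)
        for msgs in ham_by_source.values()
    ]

    picked = []
    # Take one message per source per round until the ham count matches the
    # spam count or every source is exhausted.
    for round_msgs in zip_longest(*queues):
        for msg in round_msgs:
            if msg is None:
                continue  # this source has run out of messages
            if len(picked) >= spam_count:
                return picked
            picked.append(msg)
    return picked


if __name__ == "__main__":
    ham = {
        "friends": [(1, "f1"), (3, "f3"), (2, "f2")],
        "gnupg": [(5, "g5"), (4, "g4")],
        "uni": [(6, "u6")],
    }
    for timestamp, path in pick_recent_ham(ham, 4):
        print(timestamp, path)
```

The selected paths could then be passed to `sa-learn --ham`, with the spam going to `sa-learn --spam`.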