On Tue, 2003-08-19 at 11:11, Martin Bretschneider wrote:
> Hi SA users and devs again,
> 
> My archive holds ham going back to 2000, containing
> 1. personal emails from friends
> 2. important mailing lists (university and so on)
> 3. less important mailing lists (GnuPG, OpenOffice.org, Sylpheed,
> GNOME and so on)
> 
> I have spam from the last 3 months. Which ham should I use to
> train sa-learn? All ham from the last 3 years? Also the less important
> mailing lists?
> 
> TIA  Martin
> 

I'd figure it's better to use your recent ham, the same as with spam.
That way your Bayes database contains tokens from what's happening now,
as opposed to what happened years ago.

I'd feed all the spam to Bayes, plus roughly the same number of the most
recent ham messages, including some from all the mailing lists. Keeping
it recent and keeping it roughly in balance seems to work best.
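
As a rough sketch of that selection, assuming one message per file (Maildir or MH style) and using file modification time as a stand-in for message age, something like the following would pick as many of the newest ham messages as there are spam messages. The function names and the directory names in the comments are hypothetical, not part of SpamAssassin.

```python
from pathlib import Path


def scan(directory):
    """Return (mtime, path) pairs for every regular file in a directory.

    Assumption: each file holds exactly one message.
    """
    return [(p.stat().st_mtime, p) for p in Path(directory).iterdir() if p.is_file()]


def pick_recent_ham(ham, spam_count):
    """Return the paths of the newest `spam_count` ham messages, newest first.

    `ham` is an iterable of (mtime, path) pairs. If there is less ham than
    spam, all of the ham is returned.
    """
    newest_first = sorted(ham, key=lambda item: item[0], reverse=True)
    return [path for _, path in newest_first[:spam_count]]


# Hypothetical usage, with made-up directory names:
#   spam = scan("Mail/spam")
#   ham = pick_recent_ham(scan("Mail/ham"), len(spam))
# The spam directory could then be passed to `sa-learn --spam` and the
# selected ham files to `sa-learn --ham`.
```

To get ham from every mailing list, as suggested above, the same function could be run per folder with a smaller count each.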

Having said that about balance, some of my client accounts are way off
balance, auto-learning 500+ spam and only 10-12 ham a day every day for
2 months or more. I'm just waiting to see what happens when it breaks.


-- 
Yorkshire Dave


_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
