At 12:11 PM 8/19/2003 +0200, Martin Bretschneider wrote:
I have ham going back to 2000 in my archive, containing:
1. personal emails from friends
2. important mailing lists (university and so on)
3. less important mailing lists (GnuPG, OpenOffice.org, Sylpheed,
GNOME and so on)

I have spam from the last 3 months. Which ham should I use to
train sa-learn? All ham from the last 3 years? Including the less
important mailing lists?

Ideally you should train on a "representative" mix of ham that covers about the same period of time as your spam pile.


The idea is that you should train bayes on the kind of email it's likely to receive in the future, not just a subset of it.

So I'd pick off the last 3-6 months of ham and feed it to bayes, including your mailing lists. Just be sure to avoid any spam messages that came through the lists, as those will throw off the bayes tokens if they get learned as ham.
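
If your archive is in a format Python can read, picking off the recent ham could look roughly like the sketch below. It is only an illustration: the function name recent_messages and the window_days parameter are made up for this example, and it assumes each message carries a parseable Date header (messages without one are skipped).

```python
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import Iterable, Iterator


def recent_messages(
    messages: Iterable[Message],
    newest: datetime,
    window_days: int = 180,
) -> Iterator[Message]:
    """Yield messages whose Date header falls in the window ending at `newest`.

    `newest` must be timezone-aware. 180 days is roughly the upper end of
    the 3-6 month range suggested above; pass ~90 to match a 3-month spam pile.
    """
    cutoff = newest - timedelta(days=window_days)
    for msg in messages:
        raw = msg.get("Date")
        if not raw:
            continue  # no Date header: cannot place it in time, skip
        try:
            sent = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            continue  # unparseable Date header, skip
        if sent.tzinfo is None:
            # A "-0000" offset parses as a naive datetime; treat it as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if cutoff <= sent <= newest:
            yield msg
```

You could feed it a mailbox.mbox opened on your ham archive, write the selected messages to a new mbox file, and then train on that file with sa-learn --ham --mbox. Messages that are actually spam still have to be weeded out of the list folders by hand first.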



_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
