Hi Ed,

You can either wait until auto-learning catches up and has learned 200
messages of each kind, or, if you're impatient (like me) and just want
to get it going:
Train it with as much recent good mail as you can get, then make up the
difference with older mail.  Don't get all wrapped up in the timestamp
issue.  Train it with as much of your own spam as you can get.  (Quit
deleting it for a while!)  Don't worry about the SpamAssassin markup;
sa-learn will ignore that when learning.  If you absolutely can't get to
200 spams, grab some from the public corpus and learn no more of them
than you need to reach the 200-spam level.  This is not optimal, as the
headers are not YOUR headers, but I've verified that it works REASONABLY
well.  Better than no Bayes at all, it does.  I believe any dire warnings
you hear about learning on spam that's not your own should be toned down
to, "Well, that isn't really optimal, so don't overdo it."

Keep in mind that a "large corpus" is optimal, but the minimum corpus
needed to get Bayes working is 200 hams and 200 spams.
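The selection rules above (recent ham first, topped up with older ham; all of your own spam, topped up from a public corpus only as far as needed; 200 of each as the minimum) can be sketched in Python. This is only an illustration under stated assumptions: the `Message` type, the helper names, and the `recent_since` cutoff are hypothetical, and the sketch only decides *which* messages to feed in. The actual learning would still be done by sa-learn (e.g. `sa-learn --ham` and `sa-learn --spam` in current releases).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Minimum training counts from the message: 200 hams and 200 spams.
MIN_HAM = 200
MIN_SPAM = 200


@dataclass
class Message:
    path: str           # hypothetical: where the message is stored
    received: datetime  # when the message arrived


def pick_hams(hams, recent_since, minimum=MIN_HAM):
    """Use every ham received on or after recent_since, then top up with
    the newest of the older hams until `minimum` is reached."""
    newest_first = sorted(hams, key=lambda m: m.received, reverse=True)
    recent = [m for m in newest_first if m.received >= recent_since]
    older = [m for m in newest_first if m.received < recent_since]
    shortfall = max(0, minimum - len(recent))
    return recent + older[:shortfall]


def pick_spams(own_spams, public_spams, minimum=MIN_SPAM):
    """Use all of your own spam; borrow from a public corpus only as
    many messages as are needed to reach `minimum`."""
    shortfall = max(0, minimum - len(own_spams))
    return list(own_spams) + list(public_spams[:shortfall])


def bayes_ready(n_ham, n_spam):
    """True once both training counts meet the 200/200 minimum."""
    return n_ham >= MIN_HAM and n_spam >= MIN_SPAM


if __name__ == "__main__":
    now = datetime(2003, 6, 1)
    # Hypothetical mailbox: 50 recent hams plus 500 old archived hams.
    hams = [Message(f"ham/recent/{i}", now - timedelta(days=i)) for i in range(50)]
    hams += [Message(f"ham/archive/{i}", now - timedelta(days=400 + i)) for i in range(500)]
    own = [f"spam/mine/{i}" for i in range(120)]
    public = [f"spam/public/{i}" for i in range(1000)]

    ham_set = pick_hams(hams, recent_since=now - timedelta(days=90))
    spam_set = pick_spams(own, public)
    print(len(ham_set), len(spam_set), bayes_ready(len(ham_set), len(spam_set)))
    # prints: 200 200 True
```

With 120 of your own spams, only 80 public-corpus spams get pulled in; with 200 or more of your own, none are.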

-tom

 

> -----Original Message-----
> From: Ed Greenberg [mailto:[EMAIL PROTECTED] 
> 
> It says I have to send a large corpus of recent mail through 
> it sorted as spam and non-spam.  I have a large corpus of 
> non-spam in my archives, but it's not recent. Do I have to 
> worry about teaching Bayes that an old timestamp is a sign of 
> good mail?
> 
> I don't have a large corpus of 
> "spam-that-spamassassin-already-missed". I don't even have a 
> large corpus of spam that spamassassin got, since (a) I don't 
> save it and (b) all that spam has been deformed by SA. It's 
> been defanged, changed into an attachment, prefixed with SA's output.
> 
> So how do I get started?
> 
> SA is installed, but not integrated with the mailer. Instead, 
> I use procmail to run the mail that has passed all the 
> mailing list filters through SA. So the Spam Test is after 
> the mailing lists are drawn off.
> 


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
