On Sun, 20 Jul 2003, Daniel Carrera muttered drunkenly:
> Hello,
> 
> I'm trying to figure out how to use 'sa-learn' to train SA's spam filter.
> I have already looked at the man page.
> 
> I am not entirely sure what kind of input is acceptable to sa-learn.  
> 
> On teaching ham:
> ================
> I gather that I can give sa-learn an entire mailbox and it'll know how to 
> interpret it.  Am I right? I am using mutt.  Are mutt's mailboxes similar 
> to what sa-learn expects?

sa-learn understands Unix mbox format (messages preceded with a line
matching the regex "^From "), and maildir format (grabbing all files in
a directory save for those starting with a dot, so it can handle mh and
Gnus's nnml formats, too).
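
To check what a mailbox looks like before handing it to sa-learn, the
two rules above can be sketched in Python. This is only an illustration
of the rules as described here, not sa-learn's actual code; the function
names are made up for the example.

```python
import os
import re

# Assumption: a message starts at any line matching "^From ", as described
# above. Real mbox readers handle more edge cases than this sketch does.
FROM_LINE = re.compile(r"^From ")


def count_mbox_messages(text):
    """Count the lines that would start a new message in mbox text."""
    return sum(1 for line in text.splitlines() if FROM_LINE.match(line))


def list_message_files(directory):
    """List regular files in a directory, skipping names that start with a dot.

    Each returned file is assumed to hold exactly one message.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(directory, name))
    )
```

If count_mbox_messages() returns zero for a mutt folder, the folder is
probably not stored as mbox, and the directory listing is the one to try.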

Anything else, you'll have to teach it.

> On teaching spam:
> =================
> This is the tricky one.  Most of the spam I get is successfully filtered 
> by the current SA rules.  These are stored in a separate file.  I would 
> like to use this file to train SA.  The problem is that SA alters the 
> header significantly.  SA's output includes an analysis explaining why it 
> thinks that the given email is spam.  This will do bad things to the 
> statistical approach.  Is there a way to use this output to train SA?

That's fine; SA understands what it (and all previous versions of it)
did to the headers and body, and automatically reverses it (the
equivalent of a `spamassassin -d') before learning.

If I were you, I'd take care to sa-learn the spam that SA misses (and
the ham that SA thinks is spam) appropriately; those corrections will
have the most impact on the effectiveness of SA.
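
A minimal sketch of that correction step, assuming the misclassified
mail has been saved into two mbox files. The helper names are invented
for the example, and the --spam, --ham and --mbox options should be
checked against the sa-learn man page for your version.

```python
import subprocess


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn argument list for one mbox file.

    kind is "spam" for spam that SA missed, or "ham" for legitimate
    mail that SA wrongly flagged.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]


def train(kind, mbox_path):
    """Run sa-learn; raises CalledProcessError if it exits non-zero."""
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)
```

For example, train("spam", "missed-spam") followed by
train("ham", "false-positives"), where both file names are placeholders
for wherever you save the corrections.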

-- 
`We cannot get a new line down the pipe due to a blockage and we cannot
 dig up the road to clear the blockage because it is covered with the
 wrong type of tarmac.' --- British Telecom, via Mark Lowes


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
