Re: Feeding SA-learn

Anthony Peacock Wed, 23 Jan 2008 03:07:23 -0800

Diego Pomatta wrote:

Anthony Peacock wrote:
Can I feed a plain text file representing just the body
of a message to sa-learn?
/Diego
Yes you can, what's to stop it?
I just sent your message body as --ham, and it told me it learned one message.
I meant without the headers, just the body.
ok thanks
Well the short answer is, yes you can.
The slightly longer answer is that you won't get as good results doing this, as the Bayes system uses tokens found in the complete message. By only learning on the body you will not gain any advantage for tokens found in headers.
Yep, I know, precisely the problem is that I don't have the original headers after the mail has been delivered. My intention was to manually feed the few spam messages that slip through undetected. By the time I get a hold of those, they are in the recipient's mail client inbox, not in the server. I was thinking, if I save the mail as EML files, would that preserve the headers in a way that sa-learn can parse correctly?



Depends on the client.

For instance, Thunderbird stores its folders in mbox format, so sa-learn can work against those files as-is. Other email clients can save emails in text format complete with headers.
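
As a rough sketch of the mbox route, something like the following Python could check that a folder file still carries headers and then build the sa-learn call. The function names and the folder path are made up for illustration; the only sa-learn options assumed are --spam, --ham and --mbox.

```python
import mailbox


def build_sa_learn_command(mbox_path, is_spam=True):
    """Return the argument list for training sa-learn on one mbox file.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def count_messages_with_headers(mbox_path):
    """Count messages in the mbox that still have a From or Received header."""
    # create=False makes this fail loudly if the path is wrong,
    # instead of silently creating an empty mailbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return sum(1 for msg in box if msg["From"] or msg["Received"])
    finally:
        box.close()
```

To actually train, the list from build_sa_learn_command("/path/to/Junk") could be handed to subprocess.run(..., check=True) on a machine where sa-learn is installed; the path here is a placeholder for wherever the client keeps its folder file.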


The biggest problem with this is training the users to do that consistently.


--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan.  For a CAT scan,
 they're only looking for one thing, whereas a PET scan could result in
 a lot of things."    - Carl Princi, 2002/07/19
