Good day to all!

I have a Red Hat 8 server which serves mail for my
domain.

I have been wrestling with a way to train my filter
on spam and ham.  I read that it isn't prudent to
train the filter with mail that has been forwarded
to another mailbox, since the modified headers result
in a badly trained filter.  That made me paranoid when
I noticed that Evolution itself adds header info to the
messages it stores, so training directly on the Evolution
mbox could be problematic.  I devised a method that others
may find useful, so I thought I would share.

In my case my mail folders are in /root/evolution/local.
I created a spam folder to which I move my spam.
Since I read my mail under the root account, that folder
resides in /root/evolution/local/spam/mbox.

My idea was to make an entry in my procmailrc that saves
a copy of all incoming mail in /var/spool/mail/allmail.
This lets me preserve the original, unmodified mail until
I can use it to train the filter, but I didn't want to
have to dig through it with vi to sort the ham from the
spam.

I wrote a program that scans the mail in spam/mbox or
Inbox/mbox and builds a linked list of the Message-IDs
of those messages.  Next, it opens the allmail file and
copies to stdout each raw message whose Message-ID
matches one from the source list.
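
For anyone who wants to see the matching step without
reading the published source, here is a rough Python
sketch of the same idea.  It is not the mailcut code
itself: the function names are made up for illustration,
it uses a set where the program uses a linked list, and
it assumes every message starts at a line beginning with
"From " and carries an unfolded Message-ID header.

```python
import re
import sys

# Matches an unfolded Message-ID header line (case-insensitive).
MESSAGE_ID = re.compile(rb"message-id:\s*(\S+)", re.IGNORECASE)


def split_mbox(data):
    """Split raw mbox bytes into messages at lines starting with 'From '."""
    messages, current = [], []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def message_id(raw):
    """Return the Message-ID value from a message's headers, or None."""
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header section
        match = MESSAGE_ID.match(line)
        if match:
            return match.group(1)
    return None


def mailcut(sorted_data, archive_data):
    """Return the archive messages whose Message-ID occurs in sorted_data."""
    wanted = {mid for mid in map(message_id, split_mbox(sorted_data)) if mid}
    return b"".join(
        msg for msg in split_mbox(archive_data) if message_id(msg) in wanted
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical command line: <sorted mbox> <archive mbox>
    with open(sys.argv[1], "rb") as sorted_file:
        sorted_bytes = sorted_file.read()
    with open(sys.argv[2], "rb") as archive_file:
        archive_bytes = archive_file.read()
    sys.stdout.buffer.write(mailcut(sorted_bytes, archive_bytes))
```

Working on bytes rather than decoded text keeps the
copied messages byte-for-byte identical to what is in
the archive, which is the whole point of training from
the untouched copies.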

To train, I use Evolution to move all my spam from my
Inbox folder to my spam folder.

Then run:
  mailcut /root/evolution/local/Inbox/mbox \
   /var/spool/mail/allmail >hambunch

followed by:
  sa-learn --ham --mbox hambunch

followed by:
  mailcut /root/evolution/local/spam/mbox \
   /var/spool/mail/allmail >spambunch

followed by:
  sa-learn --spam --mbox spambunch

The code is published at www.heggood.com/mailcut.html

If anyone notices anything off-track in my thinking, please let
me know.

Regards,
-steve-


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
