Gary Funck wrote:
It is a pain, esp. on a big mailbox, and you need a large sample, say 2000 or so each of ham and spam, to train the Bayes engine.
What I did was fire up 'mutt' and use its 'tag' capabilities to tag the spam that I wanted to extract and deposit into my spam sample. It is important to remember that this low-scoring spam is exactly the stuff that will help Bayes do a better job. Anyway, I'd first sort by sender's address, find the obvious outliers who were spammers, tag those, and then write/append them to a spam mbox. I'd also sort by subject and rescan manually for spam. It still took some time, but doing things this way eased the pain.
I use a slightly different approach.
I filter my emails into 4 different IMAP folders: slightly-spammy, somewhat-spammy, pretty-spammy, and very-spammy. The filtering is based on an increasing number of SA hits (actually the number of "*" characters in the X-Spam-Level: header).
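A minimal sketch of that folder selection in Python, in case it helps. The star thresholds and the `folder_for` helper are assumptions for illustration only; my actual filter rules are not shown here, so pick cut-offs that suit your own mail.

```python
from email import message_from_string

# Hypothetical thresholds: (minimum number of "*", folder), highest first.
THRESHOLDS = [
    (15, "very-spammy"),
    (10, "pretty-spammy"),
    (7, "somewhat-spammy"),
    (5, "slightly-spammy"),
]

def spam_level(msg):
    """Count the "*" characters in the X-Spam-Level header (0 if absent)."""
    return str(msg.get("X-Spam-Level", "")).count("*")

def folder_for(msg):
    """Return the folder for a message, or INBOX if it scores below all thresholds."""
    level = spam_level(msg)
    for minimum, folder in THRESHOLDS:
        if level >= minimum:
            return folder
    return "INBOX"

if __name__ == "__main__":
    sample = message_from_string("X-Spam-Level: ******\nSubject: test\n\nbody\n")
    print(folder_for(sample))
```

Checking the highest threshold first means each message lands in exactly one folder.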
I also have a to-learn folder, with a pair of subfolders: ham and spam, which are not automatically populated.
Any FNs (false negatives) that land in my inbox are manually moved to the to-learn.spam folder.
At regular intervals I do the following:
I scan through the slightly-spammy folder, copy any ham to the ham folder, and move the original to my inbox. Any spam gets moved to my spam "to-learn" folder.
The "somewhat-spammy" folder gets a quick look for the rare FP. I follow the same procedure for that folder as for "slightly-spammy"; it's just quicker to scan, since FPs there are rare.
The last two folders are pretty high scoring. I'm thinking of combining them, since they get treated the same. I do a quick look for FP messages (once in a great while, someone on debian-user posts from an IP that's in just about every RBL I use; otherwise no FPs). After handling them, I search for messages that *don't* hit BAYES_99 and move them into the to-learn.spam folder. The rest I just delete (this procedure gets plenty of spam as it is).
Eventually (haven't automated this step yet, since my IMAP server and SA server are on different boxes), I run sa-learn on my to-learn folders. After that, I move those messages into a corpus.
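If I do get around to automating it, the sa-learn step might look something like this sketch. It assumes the two to-learn folders have already been copied to the SA box as mbox files; the file paths and the `sa_learn_command` and `learn` helpers are hypothetical.

```python
import subprocess

def sa_learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox file of ham or spam."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    return ["sa-learn", "--" + kind, "--mbox", mbox_path]

def learn(kind, mbox_path):
    """Run sa-learn on the mbox file and raise if it exits with an error."""
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)

if __name__ == "__main__":
    # Hypothetical file names; printing only, so nothing is trained by accident.
    for kind, path in (("ham", "to-learn/ham.mbox"), ("spam", "to-learn/spam.mbox")):
        print(" ".join(sa_learn_command(kind, path)))
```

Keeping the command builder separate from `learn` makes it easy to check the command line before anything touches the Bayes database.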
Most of the moving is done from Netscape, since I use a Windows machine for day-to-day work, but it would obviously work with any IMAP client.
--Rich