I'm using Postfix 2.4.6, Amavisd-new 2.5.2, ClamAV 0.91.2 and
Mail-SpamAssassin 3.2.3 in a Linux mail filter. I'm having problems
getting enough ham and spam for Bayes training.
I know that public corpora and starter databases are available, but I
would prefer to train using our own ham and spam. Unfortunately, this is
a very labor-intensive and slow process.
Right now, I'm using the Postfix always_bcc parameter to send a copy of
every e-mail to a Linux user's mailbox. I then manually classify the
e-mails and save them to separate disk files one by one. The downside
is that each e-mail is altered: the recipient changes and several
X-Amavisd headers are added, and I understand that might affect Bayes
accuracy. It's also a pain...
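
For illustration, the added headers could be stripped from each saved
message before training, along the lines of the Python sketch below. The
header prefixes and the function name strip_filter_headers are
assumptions made for the example; the real list depends on what the
filter actually adds.

```python
import email
from email import policy

# Assumed prefixes (lowercase); adjust to the headers your filter really adds.
FILTER_HEADER_PREFIXES = ("x-amavis", "x-virus-scanned", "x-spam-")


def strip_filter_headers(raw_bytes, prefixes=FILTER_HEADER_PREFIXES):
    """Return the message as bytes with filter-added headers removed.

    Hypothetical helper: parses one raw message, deletes every header
    whose name starts with one of the given prefixes, and re-serializes.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.compat32)
    # Copy the key list first, since headers are deleted while iterating.
    for name in list(msg.keys()):
        if name.lower().startswith(prefixes):
            del msg[name]  # removes all occurrences of this header name
    return msg.as_bytes()
```

Each saved file would be read in binary mode, passed through this
function, and written back out before being fed to the Bayes trainer.
This does not undo the recipient change, only the extra headers.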
I'm curious: how do the rest of you approach this problem?
Thanks!
Ken Morley