On Fri, 4 Oct 2002, Pat Suwalski wrote:

> I've got a few years of MailMan archives (in standard mbox format) that 
> could really use some SPAM weeding. The archive is upwards of 3 gigs.
> 
> I was wondering if there is any way (or utility) to get/allow 
> spamassassin to run through each message of an mbox file separately.

If you've got procmail -- and hence formail (not to be confused with
"formmail", the BUGGY_CGI) -- you can do this:

  formail -s spamassassin -L < rawmbox > taggedmbox

(add -P if spamassassin is v2.31 or older).  However, that's dismally
slow; if you install PPerl (look for it on CPAN) you can speed things
up quite a bit with

  formail -s pperl /path/to/spamassassin -L < rawmbox > taggedmbox

but it still may take on the order of 5-7 seconds per message.  It might
be a little faster still to fire up spamd and use spamc in place of 
spamassassin.

How you clean the tagged messages out afterwards is up to you.
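One possible way, sketched here without needing anything beyond awk: walk the tagged mbox and route each message to a spam file or a ham file depending on whether its header block carries the "X-Spam-Flag: YES" line that SpamAssassin 2.x adds to messages it scores as spam. The function name split_tagged_mbox and its argument order are made up for illustration, and it assumes a classic mbox where every message starts with a "From " line and body lines beginning with "From " have been escaped as ">From ".

```shell
# Hypothetical helper: split_tagged_mbox TAGGED SPAM_OUT HAM_OUT
# Assumes SpamAssassin marked spam with an "X-Spam-Flag: YES" header
# and that messages are separated by unescaped "From " lines.
split_tagged_mbox() {
    # Start both output files empty so reruns don't double-append.
    : > "$2" && : > "$3" || return 1
    LC_ALL=C awk -v spam="$2" -v ham="$3" '
    BEGIN { in_hdr = 1 }
    # Write the buffered message to the right file, then reset state.
    function flush(   out) {
        if (buf != "") {
            out = is_spam ? spam : ham
            printf "%s", buf >> out
        }
        buf = ""; is_spam = 0; in_hdr = 1
    }
    /^From / { flush() }                     # a new message begins
    in_hdr && /^$/ { in_hdr = 0 }            # first blank line ends the header
    in_hdr && tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { is_spam = 1 }
    { buf = buf $0 "\n" }                    # keep every line byte-for-byte
    END { flush() }
    ' "$1"
}
```

Something like "split_tagged_mbox taggedmbox spam.mbox clean.mbox" would then leave the keepers in clean.mbox. Only the header block is checked, so a message that merely quotes the flag in its body stays on the ham side.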

One thing it might be wise to do is prefilter to weed out big messages
(250k+).  They'll either take an inordinately long time (spamassassin) or
get ignored entirely (spamc). In fact, I don't know what formail will do
with a message that spamc rejects as too big (does spamc regurgitate the
whole thing even though it doesn't pass it to spamd?).
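A rough sketch of that prefilter, again in plain awk so it runs before formail ever sees the file: messages larger than a byte limit go to one mbox, everything else to another, and only the small one gets fed to spamassassin/spamc. The name mbox_prefilter is hypothetical, the default limit of 256000 bytes is just 250 * 1024 to match the "250k+" above, and it makes the same "From "-separator assumption as any mbox splitter. LC_ALL=C is there so awk's length() counts bytes rather than multibyte characters.

```shell
# Hypothetical helper: mbox_prefilter RAW SMALL_OUT BIG_OUT [LIMIT_BYTES]
# Messages whose size exceeds LIMIT_BYTES (default 256000) go to BIG_OUT.
mbox_prefilter() {
    : > "$2" && : > "$3" || return 1
    LC_ALL=C awk -v small="$2" -v big="$3" -v limit="${4:-256000}" '
    # Emit the buffered message to the file chosen by its size.
    function flush(   out) {
        if (buf == "") return
        out = (length(buf) > limit + 0) ? big : small
        printf "%s", buf >> out
        buf = ""
    }
    /^From / { flush() }          # start of the next message
    { buf = buf $0 "\n" }         # accumulate the current message
    END { flush() }
    ' "$1"
}
```

For example, "mbox_prefilter rawmbox small.mbox big.mbox" and then running the formail pipeline on small.mbox; big.mbox is probably small enough in message count to eyeball or handle separately.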

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk