On Fri, 4 Oct 2002, Pat Suwalski wrote:

> I've got a few years of MailMan archives (in standard mbox format) that
> could really use some SPAM weeding. The archive is upwards of 3 gigs.
>
> I was wondering if there is any way (or utility) to get/allow
> spamassassin to run through each message of an mbox file separately.

If you've got procmail -- and hence formail (not to be confused with "formmail", the BUGGY_CGI) -- you can do this:

    formail -s spamassassin -L < rawmbox > taggedmbox

(add -P if spamassassin is v2.31 or older). However, that's dismally slow; if you install PPerl (look for it on CPAN) you can speed things up quite a bit with

    formail -s pperl /path/to/spamassassin -L < rawmbox > taggedmbox

but it may still take on the order of 5-7 seconds per message. It might be a little faster still to fire up spamd and use spamc in place of spamassassin.

How you clean the tagged messages out afterwards is up to you.

One thing it might be wise to do is prefilter to weed out big messages (250k+). They'll either take an inordinately long time (spamassassin) or get ignored entirely (spamc). In fact, I don't know what formail will do with a message that spamc rejects as too big (does spamc regurgitate the whole thing even though it doesn't pass it to spamd?).
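
For what it's worth, here is a rough sketch of the two bookkeeping steps around the formail run, the size prefilter and the cleanup afterwards, using Python's standard mailbox module. Treat it as an illustration rather than a tested recipe for a 3-gig archive: the function names and output file names are made up for this example, the 250k cutoff is just the figure mentioned above, and the cleanup assumes spamassassin marked spam with its usual "X-Spam-Flag: YES" header.

```python
import mailbox

SIZE_LIMIT = 250 * 1024  # the 250k cutoff suggested above; adjust to taste


def _open_boxes(src_path, *out_paths):
    # The source must already exist; output mboxes are created if missing.
    src = mailbox.mbox(src_path, create=False)
    return src, [mailbox.mbox(p) for p in out_paths]


def split_by_size(src_path, small_path, large_path, limit=SIZE_LIMIT):
    """Prefilter: copy each message to small_path or large_path by byte size."""
    src, (small, large) = _open_boxes(src_path, small_path, large_path)
    counts = {"small": 0, "large": 0}
    try:
        for key in src.iterkeys():
            # Only one message is held in memory at a time.
            size = len(src.get_bytes(key))
            msg = src.get_message(key)
            if size > limit:
                large.add(msg)
                counts["large"] += 1
            else:
                small.add(msg)
                counts["small"] += 1
    finally:
        # close() flushes pending additions to disk.
        small.close()
        large.close()
        src.close()
    return counts


def split_by_spam_flag(tagged_path, ham_path, spam_path):
    """Cleanup: separate messages carrying 'X-Spam-Flag: YES' from the rest."""
    tagged, (ham, spam) = _open_boxes(tagged_path, ham_path, spam_path)
    counts = {"ham": 0, "spam": 0}
    try:
        for key in tagged.iterkeys():
            msg = tagged.get_message(key)
            flag = str(msg.get("X-Spam-Flag", "")).strip().upper()
            if flag.startswith("YES"):
                spam.add(msg)
                counts["spam"] += 1
            else:
                ham.add(msg)
                counts["ham"] += 1
    finally:
        ham.close()
        spam.close()
        tagged.close()
    return counts
```

The idea would be to call split_by_size("rawmbox", "smallmbox", "bigmbox") first, feed smallmbox to formail in place of rawmbox, and then call split_by_spam_flag("taggedmbox", "hammbox", "spammbox") on the result. Both functions copy messages into new files and leave the original archive untouched, which seems the safer choice for a few years of list history.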