Re: Bulk spam scan

mouss Thu, 31 Jan 2008 16:26:14 -0800

Martin Gregorie wrote:

spamassassin --mbox <mbox >scanned.mbox

No, SA doesn't know how to split up messages for scanning;  sa-learn
is the only SA component that can extract messages from an mbox mail
folder.

In that case, what does the --mbox option do? Not what I expected,
evidently.


it tells spamassassin that the mail is stored in mbox format.

from the man page:

... and files are assumed to be in file format, with a single message per file.

If I accidentally mangled my own personal mail flow such that
everything got put in my system inbox, for instance, I might just move
my system mailbox file from /var/spool/mail to ~/spammy-inbox, and
run:

$ formail -s procmail -m ~/.procmailrc < ~/spammy-inbox

No accident: I've been collecting all inbound and outbound mail with an
"always_bcc" Postfix directive that pushes it through a procmail recipe
and shell script that stores it in a set of mbox files and switches
files when they get near the mbox size limit defined in Postfix.


why not deliver these dups to maildir instead of mbox?

Meanwhile I've built a proper archive system with a loader that can
extract mail from mbox files, split it up and index the messages.

I'm pretty certain that some of the mbox files precede me installing SA,
so I'd like to push them through SA before pushing them through the
archive loader and, hopefully, end up with a similar, spam-scanned set
of mbox files.

(I'd move the mailbox out of /var/spool/mail so I didn't keep
appending old messages to the end of it over and over;  some mail
*does* get delivered there.)

Yes, that makes sense. Thanks for the formail tip. I can build a script
round that to do my scan and refiling job.
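
A script of that kind might look like the sketch below. It is only an
illustration, not something posted in this thread: the function name
scan_mbox and the SCAN_FILTER variable are made up for the example, and
it uses plain awk rather than formail to split the mbox on "From "
lines that follow a blank line. The filter defaults to spamc, which
assumes a running spamd and assumes the filter passes the leading
"From " line through unchanged.

```shell
#!/bin/sh
# scan_mbox [FILTER]  (hypothetical helper, not part of SpamAssassin)
# Reads an mbox on stdin, runs FILTER once per message, and writes the
# concatenated filter output to stdout.
# FILTER defaults to "spamc" (assumes spamd is running); any command that
# reads one message on stdin and writes it to stdout can be used instead.
scan_mbox() {
    SCAN_FILTER="${1:-spamc}" awk '
        BEGIN { cmd = ENVIRON["SCAN_FILTER"] }
        # A "From " line at the start of the file, or right after a blank
        # line, begins a new message: let the previous filter run finish.
        /^From / && (NR == 1 || prev == "") {
            if (running) close(cmd)
            running = 0
        }
        # Feed the current line to the filter (started on first use).
        { print | cmd; running = 1; prev = $0 }
        END { if (running) close(cmd) }
    '
}
```

It would be called as "scan_mbox < old.mbox > scanned.mbox". Because
each filter run is closed before the next one starts, the messages come
out in their original order.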

Hmm.  I'm pretty sure it's pointed out in several places that SA does
not know how to process more than one message per call, but I've been
using it long enough that I just know that's how it works.  <g>


man spamassassin-run (excerpt above).

I'd got that message for SA's normal operation and have looked at the
innards of spamc closely enough to see that it can only handle a single
message at a time. As I said above, it was the --mbox option that
confused me because, in general, an mbox file contains multiple
messages.

Given that I'm running spamc + spamd, I have two final questions:

- would it be better to use spamc/spamd for the scan in place of
  SpamAssassin?


yes. This way, SA code is loaded once (spamd).

- if spamd is the way to go, do I need to stop my normal mail
  system while the scan is running or will spamd keep the two
  streams separate? I assume it does, but it's always good to check.


sorry, I don't understand. someone else probably will...
