Parsing mbox file

K Old Tue, 26 Aug 2003 18:07:39 +0000

Hello everyone,

I am using SpamAssassin to decide which mail on my server is spam.
Everything works well: mail flagged as spam is appended to one of two
files, almost-certainly-spam or probably-spam.  Although the vast
majority of the mail that ends up in these files is spam, every now and
then a valid message gets tagged as spam and I need to restore it.  So I
look through these files to verify that all of the messages really are
spam, which is time consuming.


I'm writing a script that pulls out the From, Subject and
X-Spam-Status headers, and that part is fine.  The kicker is that when
SpamAssassin writes a message (in mbox format) to the file, it writes
two copies: one containing the SpamAssassin flags, etc., and the
original message, left untouched so that restoring it is easy.

The file therefore has the SpamAssassin copy first, then the original
message, so when my script greps for ^From: I end up with duplicate
lines.  I'd like to remove the duplicates so that I only get something
like the following for each message:

From: [EMAIL PROTECTED]
Subject: an AWESOME!!!! DEAL!!!!
X-Spam-Status: *************************

I've looked at a few modules on CPAN, but I haven't parsed mbox files
before and would like suggestions.  From what I understand, if I can
just get every other message, I'll get what I need.
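
To make the every-other-message idea concrete, here is a rough sketch.
It is written in Python with the standard-library mailbox module rather
than Perl, purely for illustration.  It assumes the file strictly
alternates between the SpamAssassin copy and the untouched original, in
that order; the function name summarize_tagged is made up for this
example.

```python
import mailbox

# Headers to report for each tagged message.
HEADERS = ("From", "Subject", "X-Spam-Status")


def summarize_tagged(path, headers=HEADERS):
    """Return one dict of header values per SpamAssassin-tagged message.

    Assumption: messages are stored in pairs, tagged copy first and the
    untouched original second, so only even positions are kept.
    """
    box = mailbox.mbox(path, create=False)
    try:
        summaries = []
        for index, message in enumerate(box):
            if index % 2:
                # Odd positions hold the untouched originals; skip them.
                continue
            # A missing header is reported as an empty string.
            summaries.append({name: message.get(name, "") for name in headers})
        return summaries
    finally:
        box.close()
```

Calling summarize_tagged("probably-spam") would then give one entry per
pair, which can be printed as From/Subject/X-Spam-Status lines.  If the
pairing ever breaks, a safer variant might keep only the messages that
actually carry an X-Spam-Status header instead of relying on position.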

Any advice/suggestions?

Thanks,
Kevin


-- 
K Old <[EMAIL PROTECTED]>

