Hello everyone,

I'm using version 2.55 of SA and everything works great.  I'm trying
to find a good way to parse the almost-certainly-spam and probably-spam
files that are produced by SA.  Although the majority of the mail that
makes it into these files is spam, every now and then a valid message
is tagged as spam and I need to restore it.  So I look through these
files to verify that all of the emails are indeed spam.  The trouble is
that it's time consuming.

I'm writing a Perl script that will pull out the From, Subject and
X-Spam-Status headers, and all of that is fine.  The kicker is that when
SpamAssassin writes the messages (in mbox format) to the file, it writes
two copies: one containing the SpamAssassin flags, etc., and the other
the original message, left untouched so that restoring it is easy.

The file therefore has the SpamAssassin message first, then the
original message, so when my script greps for ^From: I end up
getting duplicate lines.  I'd like to be able to "remove" the duplicates
so that I only get something like the following:

From: [EMAIL PROTECTED]
Subject: an AWESOME!!!! DEAL!!!!
X-Spam-Status: *************************

I've looked at a few modules on CPAN, but I haven't parsed mbox files
before and would like suggestions.  From what I understand, if I can
just get every other message, I'll get what I need.
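
To make the every-other-message idea concrete, here is a rough sketch.  It
is in Python rather than Perl, only because Python's standard mailbox
module reads mbox files directly; the same logic should carry over to a
Perl mbox parser.  It assumes the tagged copy always comes first and that
tagged and original copies strictly alternate, which I have not verified
for every case.  The function name summarize_tagged is made up for this
example.

```python
import mailbox


def summarize_tagged(mbox_path, headers=("From", "Subject", "X-Spam-Status")):
    """Return one dict of selected headers per tagged message.

    Assumption: messages alternate tagged copy, original copy, tagged
    copy, ... so the tagged copies sit at even positions (0, 2, 4, ...).
    """
    # create=False: do not silently create the file if the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    summaries = []
    try:
        for index, message in enumerate(box):
            if index % 2 == 1:
                # Odd positions hold the untouched original; skip it.
                continue
            # A missing header becomes an empty string instead of None.
            summaries.append({name: message.get(name, "") for name in headers})
    finally:
        box.close()
    return summaries
```

Calling summarize_tagged("probably-spam") and printing each header of each
returned dict would give one From/Subject/X-Spam-Status group per message
pair.  If the alternation assumption ever breaks, a safer variant would
keep only the messages that actually carry an X-Spam-Status header.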

Any advice/suggestions?

Thanks,
Kevin
-- 
K Old <[EMAIL PROTECTED]>



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
