Hello everyone, I'm using SpamAssassin (SA) version 2.55 and everything works great. I'm trying to find a good way to parse the almost-certainly-spam and probably-spam files that SA produces. Although the vast majority of the mail that ends up in these files is spam, every now and then a valid message is tagged as spam and I need to restore it. So I look through these files to verify that every message in them is indeed spam, which is time-consuming.
I'm writing a Perl script that pulls out the From, Subject and X-Spam-Status headers, and that part works fine. The catch is that when SpamAssassin writes a message to the file (in mbox format), it writes two copies: one containing the SpamAssassin flags and report, and the original message, left untouched so that restoring it is easy. Because the SpamAssassin copy comes first and the original follows it, my script's grep for ^From: returns duplicate lines. I'd like to remove the duplicates so that I get only something like the following:

    From: [EMAIL PROTECTED]
    Subject: an AWESOME!!!! DEAL!!!!
    X-Spam-Status: *************************

I've looked at a few modules on CPAN, but I haven't parsed mbox files before and would like suggestions. From what I understand, if I can just take every other message I'll get what I need. Any advice or suggestions?

Thanks,
Kevin
--
K Old <[EMAIL PROTECTED]>

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
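Grepping the raw file for ^From: matches every such line anywhere in the file, including lines in message bodies. Parsing the mbox and reading only each message's top-level headers avoids that. The sketch below uses Python's standard-library `mailbox` module purely as an illustration; the original script is Perl, and the same logic should carry over to an mbox parser from CPAN. It rests on an assumption not confirmed above: the SpamAssassin-tagged copy carries an `X-Spam-Status` header and the untouched original does not. Filtering on that header drops the duplicates without depending on strict ordering. The `every_other` function implements the "every other message" idea from the question for comparison. Both function names are hypothetical.

```python
import mailbox

# Headers the question wants to see for each spam-tagged message.
WANTED = ("From", "Subject", "X-Spam-Status")


def spam_summaries(path):
    """Return one (From, Subject, X-Spam-Status) tuple per tagged message.

    Assumption: the SpamAssassin-tagged copy has an X-Spam-Status header and
    the untouched original does not, so skipping messages without it removes
    the duplicate copies regardless of their order in the file.
    """
    summaries = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Only top-level headers are read, so "From:" lines that appear
            # inside a message body are never matched.
            if msg["X-Spam-Status"] is None:
                continue
            summaries.append(tuple(msg.get(h, "") for h in WANTED))
    finally:
        box.close()
    return summaries


def every_other(path):
    """Alternative from the question: keep messages 0, 2, 4, ...

    This is only correct if every tagged message is immediately followed by
    exactly one untouched original; a single missing copy shifts the pairing.
    """
    box = mailbox.mbox(path, create=False)
    try:
        return [tuple(msg.get(h, "") for h in WANTED)
                for i, msg in enumerate(box) if i % 2 == 0]
    finally:
        box.close()


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python spam_summary.py probably-spam
    for mbox_path in sys.argv[1:]:
        for sender, subject, status in spam_summaries(mbox_path):
            print("From:", sender)
            print("Subject:", subject)
            print("X-Spam-Status:", status)
            print()
```

Filtering on the header is the more defensive choice of the two. `every_other` silently pairs the wrong messages if the file ever contains an unpaired copy, while `spam_summaries` keeps working as long as the header assumption holds.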