Hello everyone,

I am using SpamAssassin on my server to decide what is spam and what isn't. Everything works well: flagged mail is appended to one of two files, almost-certainly-spam or probably-spam. The vast majority of the mail that ends up in these files really is spam, but every now and then a valid message gets tagged and I need to restore it. So I look through these files to verify that all of the messages are indeed spam, which is time consuming.
I'm writing a script that pulls out the From, Subject and X-Spam-Status headers, and that part is fine. The catch is that when SpamAssassin writes a message to the file (in mbox format), it writes two copies: first a copy containing the SpamAssassin flags, etc., and then the original message, left untouched so that restoring it is easy. Because of this, when my script greps for ^From: I get duplicate lines. I'd like to "remove" the duplicates so that I only get something like the following:

From: [EMAIL PROTECTED]
Subject: an AWESOME!!!! DEAL!!!!
X-Spam-Status: *************************

I've looked at a few modules on CPAN, but I haven't parsed mbox files before and would like suggestions. From what I understand, if I can just get every other message I'll get what I need.

Any advice/suggestions?

Thanks,
Kevin

--
K Old <[EMAIL PROTECTED]>
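
P.S. To make the "every other message" idea concrete, here is a rough sketch. It is written in Python with the standard-library mailbox module purely for illustration; it is not from CPAN, and the function name summarize_tagged is made up. It assumes the file is a well-formed mbox in which the SpamAssassin-tagged copy always comes first in each pair, so messages 0, 2, 4, ... are the ones to keep.

```python
import mailbox
from itertools import islice


def summarize_tagged(path, headers=("From", "Subject", "X-Spam-Status")):
    """Return one dict of the selected headers per SpamAssassin-tagged copy.

    Assumption: messages alternate strictly (tagged copy, then original),
    so only every other message, starting with the first, is inspected.
    """
    box = mailbox.mbox(path, create=False)
    try:
        rows = []
        # Step of 2 skips the untouched original that follows each tagged copy.
        for msg in islice(box, 0, None, 2):
            row = {}
            for name in headers:
                value = str(msg.get(name, ""))
                # Collapse folded (multi-line) header values onto one line.
                row[name] = " ".join(value.split())
            rows.append(row)
        return rows
    finally:
        box.close()
```

Calling summarize_tagged("probably-spam") would then give one entry per pair instead of two. If the strict alternation ever breaks, every later message would be off by one, so keeping only the messages that actually carry an X-Spam-Status header might be a safer selection rule than position.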