On 26 Aug 2003 14:06:56 -0400 K Old <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
> I'm using the 2.55 version of SA and everything works great. I'm trying
> to find a good way to parse the almost-certainly-spam and probably-spam
> files that are produced by SA. Given that the majority of the mail that
> makes it in these files is spam, every now and then a valid message will
> be tagged as spam and I need to restore it. So, I look through these
> files to verify that all of the emails are indeed spam. Thing is, it's
> time consuming.

Just out of curiosity, how many messages are you scanning through
(order-of-magnitude)?

> I'm writing a perl script that will strip out the From, Subject and
> X-Spam-Status and all that is fine. The kicker is that when
> SpamAssassin writes the messages (in mbox format) to the file, it writes
> two. One containing the SpamAssassin flags, etc. and the other is the
> original message which is left untouched so that restoring it is easy.

I think what you're seeing is the initial header from the SA-tagged
message followed by the original headers attached in message/rfc822
format. Ideally, you'd be able to strip the SpamAssassin 'wrapper' and
extract the message/rfc822 attachment, which should be easy if you can
extract the full message from the mbox and pipe it through
`spamassassin -d`. The trick now is extracting the full, individual
messages from the mbox format.

> I've looked at a few modules on CPAN, but haven't parsed mbox files
> before, and would like suggestions. From what I understand if I can
> just get every other message I'll get what I need.

Better to let perl do the heavy lifting rather than guessing which
pieces of the whole mbox file you need. Take a look at

    Mail::MboxParser
    Mail::Mbox::MessageParser
    Mail::Box
    Mail::Util

for info on parsing mbox files, and

    Mail::SpamAssassin->remove_spamassassin_markup()
    Mail::Internet

for manipulating individual messages.

> Any advice/suggestions?

I'd probably use one of the first four modules to extract a list of
messages from the mbox file, then convert each of those messages into
Mail::Internet objects to analyze the appropriate headers, and strip off
the original SA tagging of suspected false positives with
Mail::SpamAssassin. At that point, you can do whatever you want with the
suspected FPs.

hth,
-- Bob
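
The workflow described above (split the mbox into individual messages, read the From, Subject and X-Spam-Status headers, and look inside the message/rfc822 attachment for the original mail) can be sketched roughly as follows. This is only an illustration, written with Python's standard `mailbox` and `email` modules rather than the Perl modules named in the reply. The function names `original_message` and `summarize` are made up for the example, and it assumes that tagged mail carries an `X-Spam-Flag: YES` header and that the original message is attached as a message/rfc822 part.

```python
import mailbox
from email.message import Message


def original_message(msg: Message) -> Message:
    """Return the attached original if msg looks SA-wrapped, else msg itself.

    Assumption: wrapped mail has "X-Spam-Flag: YES" and carries the
    original as a message/rfc822 part.
    """
    if msg.get("X-Spam-Flag", "").strip().upper() != "YES":
        return msg
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part holds its message as a one-element list.
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return msg


def summarize(mbox_path: str):
    """Yield one dict of review-relevant headers per message in the mbox."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            inner = original_message(msg)
            yield {
                "from": msg.get("From", ""),
                "subject": msg.get("Subject", ""),
                "spam_status": msg.get("X-Spam-Status", ""),
                "original_subject": inner.get("Subject", ""),
            }
    finally:
        box.close()


if __name__ == "__main__":
    import sys

    # Usage: python review.py probably-spam
    for row in summarize(sys.argv[1]):
        print(row["spam_status"], "|", row["from"], "|", row["original_subject"])
```

This only lists candidates for review. For actually restoring a suspected false positive, piping the full message through `spamassassin -d` as suggested above keeps the unwrapping logic in SpamAssassin itself.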
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk