I have a number of spamtrap addresses that between them receive roughly
3000 to 6000 messages a day. Until recently I have used this mail only to
populate a database of machines that have sent me spam in the last 48 hours,
which is used as one of a series of checks on incoming connections.

I've just decided to try to do something more with all this data, and add
reporting through SpamAssassin. It was quick and easy to add to the existing
script (a Perl script that mail is piped to).

    require Mail::SpamAssassin;

    # Settings such as $sa_debug and $sa_user are defined earlier in the script
    my $spamtest = Mail::SpamAssassin->new({
        debug => $sa_debug,
        dont_copy_prefs   => 1,
        home_dir_for_helpers => $helpers_home,
        stop_at_threshold => 0,
        username => $sa_user,
        userprefs_filename => $sa_userprefs,
      });

    # Parse the message piped in on STDIN, then report it as spam
    my $samail = $spamtest->parse(\*STDIN);
    my $sastatus = $spamtest->report_as_spam($samail);

Trouble is, this is absolutely killing my server. Even with the MTA (Postfix)
configured to limit concurrent deliveries to the script to 2, it's eating all
the CPU and grinding things to a halt. Each report takes around 8 to 12
seconds, sometimes more.
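
To find out whether the time goes on parsing or on the reporting itself, each
step could be timed separately. Below is a minimal sketch using only core
modules. `time_step` is a name made up for illustration, and the summing loop
is just a stand-in for the real `parse` or `report_as_spam` call. Note that
CPU used by forked helpers is only counted once they have been waited for.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference and return (wall-clock seconds, CPU seconds, result).
# time_step is a hypothetical helper, not part of SpamAssassin.
sub time_step {
    my ($code) = @_;
    my $wall_start = [gettimeofday];
    my @cpu_start  = times;    # (user, system, child user, child system)
    my $result     = $code->();
    my @cpu_end    = times;
    my $wall       = tv_interval($wall_start);    # elapsed time up to now
    my $cpu        = 0;
    $cpu += $cpu_end[$_] - $cpu_start[$_] for 0 .. 3;
    return ($wall, $cpu, $result);
}

# Stand-in workload; in the real script this would wrap the call being measured.
my ($wall, $cpu, $result) = time_step(sub {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
    return $sum;
});
printf "wall %.3fs, cpu %.3fs\n", $wall, $cpu;
```

A large gap between wall-clock and CPU time would suggest waiting on the
network rather than parsing.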

I'm reporting using DCC (dccifd), Razor, SpamCop, and Bayes. Debug output shows
that there's an awful lot of parsing of the message going on. Is there a way to
avoid this? My guess is that this is necessary for Bayes, but I can't see why
it's needed for the others. If I set

    bayes_learn_during_report 0

am I likely to see an improvement? Or would I be better off going back to
first principles and writing some non-SA-based code to report to SpamCop,
Razor and DCC?

Better still, has someone else done it? Is there some nice, efficient, fast
code out there for spamtraps?

Cheers

-- 
Chris Hastie
