I have a number of spamtrap addresses that together receive roughly 3000 to 6000 messages a day. Until recently I have used this mail simply to populate a database of machines that have sent me spam in the last 48 hours, which is used as part of a series of checks on incoming connections.
I've just decided to try to do something more with all this data and add reporting through SpamAssassin. It was quick and easy to add to the existing script (a Perl script that mail is piped to):

    require Mail::SpamAssassin;

    # Build the SpamAssassin object from the script's existing settings
    $spamtest = Mail::SpamAssassin->new({
        debug                => $sa_debug,
        dont_copy_prefs      => 1,
        home_dir_for_helpers => $helpers_home,
        stop_at_threshold    => 0,
        username             => $sa_user,
        userprefs_filename   => $sa_userprefs,
    });

    # Parse the message piped in on STDIN, then report it as spam
    $samail = $spamtest->parse(\*STDIN);
    my $sastatus = $spamtest->report_as_spam($samail);

Trouble is, this is absolutely killing my server. Even with the MTA (Postfix) configured to limit concurrent deliveries to the script to 2, it's eating all the CPU and grinding things to a halt. Each report takes around 8 to 12 seconds, sometimes more. I'm reporting using DCC (dccifd), Razor, SpamCop, and Bayes.

Debug output shows that there's an awful lot of parsing of the message going on. Is there a way to avoid this? My guess is that the parsing is necessary for Bayes, but I can't see why it's needed for the others. If I set bayes_learn_during_report 0, am I likely to see an improvement? Or would I be better off going back to first principles and writing some non-SA code to report to SpamCop, Razor and DCC?

Better still, has someone else already done it? Is there some nice, efficient, fast code out there for spamtraps?

Cheers
-- 
Chris Hastie
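
P.S. For concreteness, the Bayes change I'm asking about would be a one-line addition to the preferences file that $sa_userprefs points to. This is only a sketch: I'm assuming the option is honoured when set in user preferences, and I haven't measured whether it actually reduces the parsing overhead.

```
# Assumption: placed in the file referenced by $sa_userprefs.
# Stops reported messages from also being learned as spam by Bayes.
bayes_learn_during_report 0
```

With this set, reporting to DCC, Razor and SpamCop should be unaffected; only the Bayes learning step is skipped.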