This lengthy email (sorry) contains three sections:

1. Filtering order (spam, virus vs. virus, spam vs. spam+virus)
2. SA's use of ClamAV to retain the benefits in #1
3. SA's use of short-circuiting to reduce frivolous scans
The filtering order I see recommended all the time is virus detection before spam detection. However, the vast majority of incoming mail is spam, and even most virus-laden mail is caught by spam filters without any hooks into "real" virus scanners. On a mail server that rejects mail at the door ("SMTP-time") for both anti-virus and anti-spam, a rejection in the first step means the second check never runs. The number of messages (spam and viruses alike) blocked by SpamAssassin vastly outnumbers the viruses that would have been blocked before running SA. So the only way to justify running virus detection ahead of SA would be if it were cheaper than SA by a factor larger than the spam-to-virus ratio. I am under the impression that virus checking is *not* that much cheaper than a fully loaded SA implementation, so spam detection should run first.

Counter-point: online lookups cost bandwidth and latency; virus detection doesn't (yet) require any.

Pause. Constructive comments and criticisms? Don't get too caught up in the above part; it is all illustrative in getting to my question below.

Mail that passes SpamAssassin but gets caught by ClamAV would add value to SA's Bayesian and AWL databases, and thus the message stands a chance of getting caught in the future regardless of its viral content. To best take advantage of that system without compromising short-circuiting, SA's ClamAV plugin should be configured to run at the very end of the scan. It should also be skipped for any message scoring high enough to hit autolearn (which should be higher than the SMTP rejection threshold). As I can't figure out how to do this, I run ClamAV separately.

How do I configure the ClamAV plugin to be run by SpamAssassin, but only on mail otherwise scoring under bayes_auto_learn_threshold_spam? The ShortCircuit and priority mechanisms do not seem to be capable of this.
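Returning briefly to the filtering-order argument in section 1, it can be made concrete with a back-of-the-envelope cost model. This is a Python sketch under stated assumptions. Each filter has a fixed average cost per message, and an SMTP-time rejection by the first filter means the second never runs. The costs and reject rates below are made-up illustrative numbers, not measurements. The model deliberately ignores the network-latency counter-point. Under these assumptions, virus-first only wins when SA costs more than ClamAV by a factor exceeding the ratio of their reject rates.

```python
"""Back-of-the-envelope model of SMTP-time filter ordering.

Assumptions (hypothetical, not measured): each filter has a fixed average
cost per message, and a message rejected by the first filter is never seen
by the second.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    cost: float         # average cost per scanned message (arbitrary units)
    reject_rate: float  # fraction of *all* incoming mail this filter rejects


def expected_cost(first: Filter, second: Filter) -> float:
    """Expected per-message cost when `second` only sees mail `first` passed."""
    return first.cost + (1.0 - first.reject_rate) * second.cost


def break_even_cost_ratio(spam: Filter, virus: Filter) -> float:
    """Virus-first is cheaper only when spam.cost / virus.cost exceeds this.

    virus.cost + (1 - pv) * spam.cost < spam.cost + (1 - ps) * virus.cost
    simplifies to ps * virus.cost < pv * spam.cost.
    """
    return spam.reject_rate / virus.reject_rate


if __name__ == "__main__":
    # Illustrative numbers only: 90% of mail rejected as spam, 1% as viral,
    # and a virus scan costing 30% of a fully loaded SA scan.
    sa = Filter("SpamAssassin", cost=1.0, reject_rate=0.90)
    clam = Filter("ClamAV", cost=0.3, reject_rate=0.01)
    print(f"spam first:  {expected_cost(sa, clam):.2f}")
    print(f"virus first: {expected_cost(clam, sa):.2f}")
    print(f"virus-first needs SA to cost > "
          f"{break_even_cost_ratio(sa, clam):.0f}x ClamAV")
```

With these illustrative numbers, spam-first costs about 1.03 units per message against 1.29 for virus-first. Virus-first would only pay off if SA were more than 90 times as expensive as ClamAV.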
The closest I can get is:

########
loadplugin CompareScores comparescores.pm
loadplugin ClamAV clamav.pm

ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  ifplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
    full __STOP_IF_SPAM eval:check_if_autolearn_spam()
  else
    full __STOP_IF_SPAM eval:check_score_is_over(12)
  endif
  # note, this is after AWL (1000)
  priority __STOP_IF_SPAM 10000
  shortcircuit __STOP_IF_SPAM on
endif

full CLAMAV eval:check_clamav()
describe CLAMAV Clam AntiVirus detected a virus
priority CLAMAV 10001
score CLAMAV 15
########

Of course, CompareScores and its two functions do not exist yet (or is there already something I can use to that effect?). This setup runs after the autowhitelist (AWL) because it has to. It would be nice to recalculate AWL after running CLAMAV, but the __STOP_IF_SPAM check would prevent AWL from running on any message that isn't already surefire spam. The workaround (requiring yet more new code) would be to recalculate AWL after priority 10001, ignoring the first AWL result; the "real" solution is described below.

Am I splitting hairs? Is this so trivial that it doesn't matter?

...

I'm sure Justin or Theo or some other developer will chime in and state that the whole short-circuiting system needs revisiting with the larger picture in mind: handling points of diminishing returns. Consider the following order of scanning within SA (with each step containing specific short-circuits as currently implemented):

1. local ham checks (only quick & efficient checks here)
2. local spam checks (only quick & efficient checks here)
3. network + slow ham checks
4. network + slow spam checks
5. autowhitelist

Step 4 would be able to have a default short-circuit every step of the way. It would only short-circuit the remainder of its own tasks, thus still enabling AWL. Once you hit the autolearn=spam threshold (or perhaps something higher if you really care about AWL), there's no reason to run more checks.
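The proposed five-step ordering, including running ClamAV last and only on mail still under the autolearn threshold, can be modeled in a short Python sketch. Everything here is hypothetical. The check functions, their scores, the no-op AWL stand-in `no_awl` and the `AUTOLEARN_SPAM` constant are illustrative only and are not SpamAssassin or plugin APIs. `AUTOLEARN_SPAM` is set to 12 to match the fallback in the config above. Only step 4 short-circuits, so slow ham checks can still rescue a message and AWL still runs on everything.

```python
"""Model of the proposed staged SpamAssassin scan order (hypothetical).

Only step 4 short-circuits, once the score reaches the autolearn threshold,
so the slowest checks (placed last, e.g. ClamAV) run only on mail still in
doubt. None of the functions below are SpamAssassin or plugin APIs.
"""
from typing import Callable, List, Tuple

Check = Callable[[str], float]       # returns a score delta for a message
Awl = Callable[[str, float], float]  # adjusts the final score

AUTOLEARN_SPAM = 12.0  # stand-in for bayes_auto_learn_threshold_spam


def scan(msg: str, local_ham: List[Check], local_spam: List[Check],
         slow_ham: List[Check], slow_spam: List[Check],
         awl: Awl) -> Tuple[float, List[str]]:
    """Return the final score and the names of the checks that ran."""
    score = 0.0
    ran: List[str] = []
    # Steps 1-3 always run, so slow ham checks can still rescue a message
    # that the local spam checks nailed.
    for check in local_ham + local_spam + slow_ham:
        score += check(msg)
        ran.append(check.__name__)
    # Step 4: skip the remaining slow spam checks once the message is
    # already past the autolearn threshold.
    for check in slow_spam:
        if score >= AUTOLEARN_SPAM:
            break
        score += check(msg)
        ran.append(check.__name__)
    # Step 5: AWL is never short-circuited.
    score = awl(msg, score)
    ran.append("awl")
    return score, ran


# Hypothetical toy checks with made-up scores.
def local_spam_rules(msg: str) -> float:
    return 13.0 if "viagra" in msg else 0.0


def network_ham_rules(msg: str) -> float:
    return 0.0


def dns_blocklists(msg: str) -> float:
    return 2.0


def razor2(msg: str) -> float:
    return 1.5


def clamav(msg: str) -> float:
    return 15.0 if "EICAR" in msg else 0.0


def no_awl(msg: str, score: float) -> float:
    return score  # placeholder: a real AWL would pull toward sender history


if __name__ == "__main__":
    for msg in ("cheap viagra", "hello EICAR"):
        score, ran = scan(msg, [], [local_spam_rules], [network_ham_rules],
                          [dns_blocklists, razor2, clamav], no_awl)
        print(f"{msg!r}: score={score} ran={ran}")
```

Here "cheap viagra" stops after the local and slow ham checks: no DNS, Razor2 or ClamAV. "hello EICAR" scores only 3.5 from the network checks, so it reaches ClamAV. That is exactly the kind of message whose verdict could then feed Bayes and AWL.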
This means that mail nailed by step 2 and not rescued by step 3 would bypass step 4 altogether: no DNS lookups, no Razor2, no ClamAV. It is my current understanding that SA doesn't do this.

-- 
Adam Katz
khopesh on irc://irc.freenode.net/#spamassassin
http://khopesh.com/Anti-spam