While trying to track down why spamassassin (2.64) suspiciously often took around 1.0x seconds to run, I came across this code snippet (Mail/SpamAssassin/Dns.pm, line 311):

    sleep 1;

This same code is still present in version 3.0 (at line 319). The code is executed while collecting the DNSBL lookups: if not all DNSBL query results are in, it waits for the answers to come in. Since we run local copies of most blacklists, I expected the answers in milliseconds, not seconds.

It turned out that the fix wasn't too difficult. I've made patches against SpamAssassin 2.64:

    http://www.xs4all.nl/~johnpc/spamassassin_harvest_dnsbl_select-2.64.patch

and against SpamAssassin 3.0:

    http://www.xs4all.nl/~johnpc/spamassassin_harvest_dnsbl_select-3.0.patch

This now uses IO::Select to wait for the first DNS socket to become available for reading. It works fine on the platforms I tested it on (FreeBSD 4.10 and Debian Linux "unstable"), but this might have issues on inferior OSes from the Evil Empire. I haven't checked this. If it does, however, I suggest adding a test for $Config{'osname'} eq 'MSWin32', and just calling "sleep" in that case.

That it helps performance dramatically can be seen from some stats. On a regular mailflow (practically everything spam), average runtime of spamassassin before the patch was 0.92 seconds. After the patch, average runtime dropped to 0.26 seconds, a 3.5 times speedup.

Still not convinced? This is a distribution of the spamassassin runtime before the patch, rounded to tenths of a second:

    t     # of calls
    ----------------
    0.0       13
    0.1      117
    0.2       46
    0.3       19
    0.4        8
    0.5        2
    0.6        3
    0.7        5
    0.8        4
    0.9        3
    1.0        2
    1.1      571
    1.2       13
    1.3        1
    1.4        7
    1.6        2
    1.7       15
    1.8        4
    1.9        5
    2.0        1
    2.1        7
    2.2        1
    2.4        2
    3.0        1

(Note the absurdly large spike at 1.1 seconds.)

After the patch, I measured this distribution:

    t     # of calls
    ----------------
    0.0      137
    0.1     3286
    0.2     1271
    0.3      592
    0.4      118
    0.5       45
    0.6       24
    0.7       91
    0.8       72
    0.9       19
    1.0       14
    1.1       17
    1.2       15
    1.3        8
    1.4       16
    1.5        3
    1.6        3
    1.7        1
    1.8        1
    2.1       61
    2.2       38
    2.3       14
    2.4        6
    2.7        1

This looks a lot healthier...
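
For readers who don't want to open the patch: the idea of waiting on the DNS sockets with IO::Select instead of sleeping a fixed second can be sketched roughly as below. This is only an illustration, not the patch itself. The helper name wait_for_answer is made up, and a local socket pair stands in for a real DNS query socket (which assumes a Unix-like system).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use Socket;                      # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use Time::HiRes qw(time);

# Hypothetical helper (not taken from the patch): block until at least one
# of the given sockets is readable, or until $timeout seconds have passed.
# Returns the list of readable handles (empty on timeout).
sub wait_for_answer {
    my ($timeout, @socks) = @_;
    my $sel = IO::Select->new(@socks);
    return $sel->can_read($timeout);
}

# Stand-in for a DNS query socket: one end of a local socket pair.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";

# Simulate an answer from a fast local DNSBL mirror that is already waiting.
syswrite($writer, "answer\n");

my $start   = time;
my @ready   = wait_for_answer(1, $reader);   # 1 second is only the upper bound
my $elapsed = time - $start;

printf "readable sockets: %d, waited %.3f seconds\n", scalar(@ready), $elapsed;
```

With "sleep 1" the loop always pays the full second, even if the answer is already there; with can_read the one second is only the worst case.
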
The calls at 2.x seconds are probably due to other DNS lookups. I haven't figured that out, but it's not significant enough to make me worry ;)

Hope this helps (and gets to someone with enough CVS commit clearance :)

BTW: you can use this piece of code any way you want; specifically allowed is including it in future SpamAssassin releases. Does this make the lawyers happy enough?

-- 
#!perl -pl      # mmfppfmpmmpp mmpffm <[EMAIL PROTECTED]>
$p=3-2*/[^\W\dmpf_]/i;s.[a-z]{$p}.vec($f=join('',$p-1?chr(sub{$_[0]*9+$_[1]*3+
$_[2]}->(map{/p|f/i+/f/i}split//,$&)+97):('m',p,f)[map{((ord$&)%32-1)/$_%3}(9,
3,1)]),5,1)='`'lt$&;$f.eig;                                # Jan-Pieter Cornet