[Sent to Talk & Devel lists, please trim Cc when you answer]

First, I need to apologize for sending all of this mixed together in one patch:
- I already sent two of these pieces separately to this list earlier
- The patches are somewhat interleaved, so splitting them would have meant writing multiple versions
- I didn't have time to feed in each piece slowly, hoping/waiting for it to be included; I need the whole thing for sourceforge.net by last week :)
My RBL overhaul to stop the problems I explained earlier hasn't changed:
http://bugzilla.spamassassin.org/show_bug.cgi?id=370

I rewrote my patch to log how long SA took to run, and in which parts of the code the clock (not CPU) time was actually spent. This is quite useful if you care about how long SA takes to scan a message, especially if you scan at SMTP time (which I do with SA-Exim). This is, of course, a user option.

To give you an idea, this is what the log looks like:

    0.000: Starting SpamAssassin Check
    0.268: Init completed
    0.269: Created message object, checking message
    0.270: Launching RBL queries in the background
    0.388: Finished launching RBL queries in the background
    0.388: Starting head tests
    0.494: Finished head tests (Delta: 0.106s)
    0.494: Starting body tests
    0.870: Finished body tests (Delta: 0.376s)
    0.870: Starting raw body tests
    0.983: Finished raw body tests (Delta: 0.113s)
    0.983: Starting full message tests
    1.097: Razor -> Starting razor test (10 secs max)
    1.241: Razor -> Finished razor test: confirmed spam (Delta: 0.145s)
    1.242: Finished full message tests (Delta: 0.259s)
    1.242: Starting head eval tests
    1.308: Finished head eval tests (Delta: 0.065s)
    1.308: Starting RBL tests (will wait up to secs before giving up)
    1.309: RBL -> Waiting for result on 161.71.244.211.bl.spamcop.net.
    11.411: RBL -> Timeout on 161.71.244.211.bl.spamcop.net. (Delta: 10.101s)
    11.413: RBL -> Waiting for result on 161.71.244.211.orbs.dorkslayers.com.
    11.415: RBL -> No match on 161.71.244.211.orbs.dorkslayers.com. (Delta: 0.002s)
    11.416: RBL -> Waiting for result on 161.71.244.211.relays.osirusoft.com.
    11.418: RBL -> No match on 161.71.244.211.relays.osirusoft.com. (Delta: 0.002s)
    11.422: RBL -> Waiting for result on 161.71.244.211.relays.ordb.org.
    11.424: RBL -> No match on 161.71.244.211.relays.ordb.org. (Delta: 0.002s)
    11.426: RBL -> Waiting for result on 161.71.244.211.ipwhois.rfc-ignorant.org.
    11.474: RBL -> match on 161.71.244.211.ipwhois.rfc-ignorant.org. (Delta: 0.048s)
    11.476: RBL -> Waiting for result on 161.71.244.211.dialups.mail-abuse.org.
    11.520: RBL -> match on 161.71.244.211.dialups.mail-abuse.org. (Delta: 0.043s)
    11.523: RBL -> Waiting for result on 161.71.244.211.blackholes.mail-abuse.org.
    11.525: RBL -> No match on 161.71.244.211.blackholes.mail-abuse.org. (Delta: 0.002s)
    11.527: RBL -> Waiting for result on 161.71.244.211.relays.mail-abuse.org.
    11.529: RBL -> No match on 161.71.244.211.relays.mail-abuse.org. (Delta: 0.002s)
    11.531: RBL -> Waiting for result on 161.71.244.211.relays.visi.com.
    11.533: RBL -> No match on 161.71.244.211.relays.visi.com. (Delta: 0.002s)
    11.538: Finished all RBL tests (Delta: 10.230s)
    11.545: Done checking message (Delta: 11.276s)
    11.545: Done running SpamAssassin (Delta: 11.545s)

Next, my big goal was to put an upper bound on how long SA can take. I didn't touch the MX test, mainly because only one such lookup is done, and SA scores you negatively if it doesn't succeed. Also, the current code can optionally run that check 3 times, which didn't fit with my scheme of starting all the DNS queries at the beginning.

Basically, if you look at PerMsgStatus, I did the following:

    timelog("Launching RBL queries in the background", "rblbg", 1);
    # Here, we launch all the DNS RBL queries and let them run while we
    # inspect the message -- Marc
    $self->do_rbl_eval_tests(0);
    timelog("Finished launching RBL queries in the background", "rblbg", 22);

    timelog("Starting head tests", "headtest", 1);
    $self->do_head_tests();
    timelog("Finished head tests", "headtest", 2);
    (...)
    timelog("Starting head eval tests", "headevaltest", 1);
    $self->do_head_eval_tests();
    timelog("Finished head eval tests", "headevaltest", 2);

    timelog("Starting RBL tests (will wait up to $self->{conf}->{dns_timeout} secs before giving up)", "rblblock", 1);
    # This time we want to harvest the DNS results -- Marc
    $self->do_rbl_eval_tests(1);
    # And now we can compute rules that depend on those results
    $self->do_rbl_res_eval_tests();
    timelog("Finished all RBL tests", "rblblock", 2);

RBL rules now use the prefix rbleval or rblreseval, so that you can easily put them in a different set and run them in the intended order.

All the DNS queries are launched with bgsend before anything else happens, and we harvest the results at the end (waiting up to rbl_timeout).

Note that while I put in some support for old RBL rules defined as:

    header RCVD_IN_OSIRUSOFT_COM eval:check_rbl('osirusoft', 'relays.osirusoft.com.')

you're supposed to define them like this:

    header RCVD_IN_OSIRUSOFT_COM rbleval:check_rbl('osirusoft', 'relays.osirusoft.com.')

My last change concerns debugging. I noticed it was kind of a mess: everyone added his/her own debug messages, and either left them behind or commented them out before committing. I implemented debug sets, which for now are turned on and off inside SpamAssassin.pm's new function, but they could be moved to the command line. At least they can be turned on and off easily in one place. I'll quote the code:

    # This should be moved elsewhere, I know, but SA really needs debug sets
    # I'm putting the initialization here for now, move it if you want
    # For each part of the code, you can set debug levels.
    # If the level is progressive, use negative numbers (the more negative,
    # the more debug info is put out), and if you want to use bit fields,
    # use positive numbers.
    # All code path debug codes should be listed here with a value of 0 if you
    # want them disabled -- Marc
    #$DEBUG->{datediff}=-1;
    #$DEBUG->{razor}=-3;
    $DEBUG->{rbl}=0;
    $DEBUG->{timelog}=0;
    # Bitfield:
    # header regex: 1 | body-text: 2 | uri tests: 4 | raw-body-text: 8
    # full-text regexp: 16 | run_eval_tests: 32 | run_rbl_eval_tests: 64
    $DEBUG->{rulesrun}=64;

The dbg function now looks like this:

    # Only the first argument is needed, and it can be a reference to a list if
    # you want
    sub dbg {
      my ($msg, $codepath, $level) = @_;
      my $dbg=$Mail::SpamAssassin::DEBUG;

      $msg=join('',@{$msg}) if (ref $msg);

      if (defined $codepath) {
        if (not defined $dbg->{$codepath}) {
          warn("dbg called with codepath $codepath, but it's not defined, skipping (message was \"$msg\")\n");
          return 0;
        } elsif (not defined $level) {
          warn("dbg called with codepath $codepath, but no level threshold (message was \"$msg\")\n");
        }
      }

      return if (not $dbg->{enabled});

      # Negative levels are just level numbers, the more negative, the more debug
      return if (defined $level and $level<0 and not $dbg->{$codepath} <= $level);
      # Positive levels are bit fields
      return if (defined $level and $level>0 and not $dbg->{$codepath} & $level);

      warn "debug: $msg\n";
    }

It used to accept a list, but no one used that anywhere in the code (I checked), so the fact that it now only accepts a scalar for the debug message should be OK. I did make it accept an array ref, though, just in case someone wants to use the feature.

That's about it. Feel free to ask questions. I'll submit the patch to bugzilla, and in the meantime you can also find it here:
http://marc.merlins.org/linux/SA/sa_realrbl-rbltimeout-timelog-debugsets.diff

Cheers,
Marc
-- 
Microsoft is to operating systems & security .... ....
what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger [EMAIL PROTECTED] for PGP key
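The level filtering in dbg() packs two different schemes into two lines, so here is a minimal Perl sketch of just that rule, assuming the same semantics as the quoted code: a negative level is a verbosity threshold (a message at level -2 is shown when its code path is set to -2 or lower), a positive level is a bit mask checked against the code path's setting, and a message with no level (or level 0) always passes. The helper name dbg_enabled() and the %debug table are hypothetical, for illustration only; the sketch also leaves out the global `enabled` switch and the warnings the real dbg() emits.

```perl
use strict;
use warnings;

# Hypothetical per-code-path settings, mirroring the $DEBUG examples above:
# negative = progressive verbosity, positive = bit field, 0 = disabled.
my %debug = (
    razor    => -3,   # show razor messages at levels -1, -2 and -3
    rbl      => 0,    # rbl debugging switched off
    rulesrun => 64,   # only the run_rbl_eval_tests bit (64)
);

# Hypothetical helper: returns 1 if a message for $codepath at $level
# would be printed under the filtering rule in dbg(), 0 otherwise.
sub dbg_enabled {
    my ($codepath, $level) = @_;

    # Unknown code paths are skipped, as in dbg()
    return 0 unless defined $debug{$codepath};
    my $setting = $debug{$codepath};

    # No level or level 0: nothing to filter on, so the message is shown
    return 1 if !defined $level || $level == 0;

    # Negative level: the setting must be at least as negative
    return $setting <= $level ? 1 : 0 if $level < 0;

    # Positive level: the message's bit(s) must be set in the mask
    return ($setting & $level) ? 1 : 0;
}

for my $case (['razor', -2], ['razor', -4], ['rbl', -1],
              ['rulesrun', 64], ['rulesrun', 2]) {
    my ($path, $lvl) = @$case;
    printf "%-8s level %3d: %s\n", $path, $lvl,
        dbg_enabled($path, $lvl) ? 'shown' : 'skipped';
}
```

With these settings, a razor message at level -2 is printed while one at -4 is not, rbl messages are all suppressed, and only rulesrun messages tagged 64 (run_rbl_eval_tests) get through.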