[Sent to Talk & Devel lists, please trim Cc when you answer]

First, I need to apologize for sending all this mixed in one patch:
- I did send two of these separately earlier here
- The patches are somewhat interleaved, so splitting them would have meant
  writing multiple versions
- I didn't have time to refeed each piece slowly and hoping/waiting for it
  to be included, I need the whole thing for sourceforge.net for last week :)

My RBL rehaul to stop the problems I explained earlier hasn't changed:
http://bugzilla.spamassassin.org/show_bug.cgi?id=370

I rewrote  my patch  to log  how long  it took for  SA to  run and  in which
portions of  the code the clock  (not CPU) time was  actually spent. This is
quite  useful  if you  care  about  how long  SA  took  to scan  a  message,
especially if you do it at SMTP time (which I do with SA-Exim)
This is obviously a user option

To give you an idea, that's what the log looks like:
0.000: Starting SpamAssassin Check
0.268: Init completed
0.269: Created message object, checking message
0.270: Launching RBL queries in the background
0.388: Finished launching RBL queries in the background
0.388: Starting head tests
0.494: Finished head tests (Delta: 0.106s)
0.494: Starting body tests
0.870: Finished body tests (Delta: 0.376s)
0.870: Starting raw body tests
0.983: Finished raw body tests (Delta: 0.113s)
0.983: Starting full message tests
1.097: Razor -> Starting razor test (10 secs max)
1.241: Razor -> Finished razor test: confirmed spam (Delta: 0.145s)
1.242: Finished full message tests (Delta: 0.259s)
1.242: Starting head eval tests
1.308: Finished head eval tests (Delta: 0.065s)
1.308: Starting RBL tests (will wait up to  secs before giving up)
1.309: RBL -> Waiting for result on 161.71.244.211.bl.spamcop.net.
11.411: RBL -> Timeout on 161.71.244.211.bl.spamcop.net. (Delta: 10.101s)
11.413: RBL -> Waiting for result on 161.71.244.211.orbs.dorkslayers.com.
11.415: RBL -> No match on 161.71.244.211.orbs.dorkslayers.com. (Delta: 0.002s)
11.416: RBL -> Waiting for result on 161.71.244.211.relays.osirusoft.com.
11.418: RBL -> No match on 161.71.244.211.relays.osirusoft.com. (Delta: 0.002s)
11.422: RBL -> Waiting for result on 161.71.244.211.relays.ordb.org.
11.424: RBL -> No match on 161.71.244.211.relays.ordb.org. (Delta: 0.002s)
11.426: RBL -> Waiting for result on 161.71.244.211.ipwhois.rfc-ignorant.org.
11.474: RBL -> match on 161.71.244.211.ipwhois.rfc-ignorant.org. (Delta: 0.048s)
11.476: RBL -> Waiting for result on 161.71.244.211.dialups.mail-abuse.org.
11.520: RBL -> match on 161.71.244.211.dialups.mail-abuse.org. (Delta: 0.043s)
11.523: RBL -> Waiting for result on 161.71.244.211.blackholes.mail-abuse.org.
11.525: RBL -> No match on 161.71.244.211.blackholes.mail-abuse.org. (Delta: 0.002s)
11.527: RBL -> Waiting for result on 161.71.244.211.relays.mail-abuse.org.
11.529: RBL -> No match on 161.71.244.211.relays.mail-abuse.org. (Delta: 0.002s)
11.531: RBL -> Waiting for result on 161.71.244.211.relays.visi.com.
11.533: RBL -> No match on 161.71.244.211.relays.visi.com. (Delta: 0.002s)
11.538: Finished all RBL tests (Delta: 10.230s)
11.545: Done checking message (Delta: 11.276s)
11.545: Done running SpamAssassin (Delta: 11.545s)


Then, my big goal was to put a bound on SA.
I didn't touch the MX test mainly  because there's only one that's done, and
SA  scores you  negatively  if  it didn't  succeed. Also,  the current  code
actually runs the check  3 times (as an option), so that  didn't fit with my
scheme of starting all the DNS queries at the beginning.

Basically, if you look at PerMsgStatus, I did the following:

    timelog("Launching RBL queries in the background", "rblbg", 1);
    # Here, we launch all the DNS RBL queries and let them run while we
    # inspect the message -- Marc
    $self->do_rbl_eval_tests(0);
    timelog("Finished launching RBL queries in the background", "rblbg", 22);

    timelog("Starting head tests", "headtest", 1);
    $self->do_head_tests();
    timelog("Finished head tests", "headtest", 2);
(...)
    timelog("Starting head eval tests", "headevaltest", 1);
    $self->do_head_eval_tests();
    timelog("Finished head eval tests", "headevaltest", 2);

    timelog("Starting RBL tests (will wait up to $self->{conf}->{dns_timeout} secs 
before giving up)", "rblblock", 1);
    # This time we want to harvest the DNS results -- Marc
    $self->do_rbl_eval_tests(1);
    # And now we can compute rules that depend on those results
    $self->do_rbl_res_eval_tests();
    timelog("Finished all RBL tests", "rblblock", 2);

RBL rules now get prefixed with rbleval or rblreseval so that you can easily
put them in a different set, and run them in the order intended.
All the DNS queries are launched with bgsend before everything else happens,
and we harvest the results at the end (waiting up to rbl_timeout)
Note that while I put in some support for old RBL rules defined as:
header RCVD_IN_OSIRUSOFT_COM    eval:check_rbl('osirusoft', 'relays.osirusoft.com.')
You're supposed to define them as such:
header RCVD_IN_OSIRUSOFT_COM    rbleval:check_rbl('osirusoft', 'relays.osirusoft.com.')


My last change was debugging.
I noticed it was kind of a mess, everyone put his/her own debug messages, and
left them behind or commented them out before the commit.
I   implemented  debug   sets   which   are  turned   on   and  off   inside
SpamAssassin.pm's new  function right now,  but they  could be moved  to the
command line. At least they can be turned on and off easilhy in one place.

I'll quote the code:
  # This should be moved elsewhere, I know, but SA really needs debug sets 
  # I'm putting the intialization here for now, move it if you want

  # For each part of the code, you can set debug levels. If the level is
  # progressive, use negative numbers (the more negative, the move debug info
  # is put out), and if you want to use bit fields, use positive numbers
  # All code path debug codes should be listed here with a value of 0 if you
  # want them disabled -- Marc

  #$DEBUG->{datediff}=-1;
  #$DEBUG->{razor}=-3;
  $DEBUG->{rbl}=0;
  $DEBUG->{timelog}=0;
  # Bitfield:
  # header regex: 1 | body-text: 2 | uri tests: 4 | raw-body-text: 8
  # full-text regexp: 16 | run_eval_tests: 32 | run_rbl_eval_tests: 64
  $DEBUG->{rulesrun}=64;


The dbg function now looks like this:

# Only the first argument is needed, and it can be a reference to a list if
# you want
sub dbg {
  my ($msg, $codepath, $level) = @_;
  my $dbg=$Mail::SpamAssassin::DEBUG;

  $msg=join('',@{$msg}) if (ref $msg);

  if (defined $codepath) {
    if (not defined $dbg->{$codepath}) {
      warn("dbg called with codepath $codepath, but it's not defined, skipping 
(message was \"$msg\"\n");
      return 0;
    } elsif (not defined $level) {
      warn("dbg called with codepath $codepath, but no level threshold (message was 
\"$msg\"\n");
    }
  }
  return if (not $dbg->{enabled});
  # Negative levels are just level numbers, the more negative, the more debug
  return if (defined $level and $level<0 and not $dbg->{$codepath} <= $level);
  # Positive levels are bit fields
  return if (defined $level and $level>0 and not $dbg->{$codepath} & $level);

  warn "debug: $msg\n";
}

It used  to accept  a list  but no  one used  that anywhere  in the  code (I
checked), so the  fact that it only  accepts a scalar for  the debug message
should be ok.
I made it accept an array ref just in case someone wanted to use the feature
though.

That's about it.
Feel free to ask questions.

I'll submit the patch to bugzilla, and you can also find it here in the 
meantime:
http://marc.merlins.org/linux/SA/sa_realrbl-rbltimeout-timelog-debugsets.diff

Cheers,
Marc
-- 
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
  
Home page: http://marc.merlins.org/   |   Finger [EMAIL PROTECTED] for PGP key

_______________________________________________________________

Don't miss the 2002 Sprint PCS Application Developer's Conference
August 25-28 in Las Vegas -- http://devcon.sprintpcs.com/adp/index.cfm

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

Reply via email to