Hi,

On Fri, 23 Jan 2004 00:25:02 +0100 (MET) Matthias Fuhrmann
<[EMAIL PROTECTED]> wrote:

> On Thu, 22 Jan 2004, Vermyndax wrote:
> 
> > Matthias...
> >
> > Argh, that looked abysmally easy.  I guess I could have taken a crack at
> > that after all.
> 
> may i take this as a sign of success (my english isnt that good in all
> terms ...) ? :)

I'm sorry. My German isn't so good, is it? I only studied German for
three years at college and I have forgotten all of it... ;)

> > You might want to try Bob's 1.5 script though...
> 
> doesnt works for me yet, just posted, maybe he'll fix that too :)

The problem in sa-stats.pl v1.3 is in line 100, deep in a gory regex:

    88  LINE: while (<LOG>) {
    89
    90  # Agh... this is ugly.
    91    if (m/
    92  ^(\w{3})\s+             # Month
    93  (\d+)\s+                # Day
    94  (\d\d):(\d\d):(\d\d)\s+ # HH:MM:SS
    95  \w+\s+                  # Hostname?
    96  spamd\[\d+\]:\s+        # spamd[PID]
    97  (clean\smessage|identified\sspam)\s  # Status
    98  \(([-0-9.]+)\/([-0-9.]+)\)\s # Score, Threshold
    99  for\s
   100  \w+:\d+\s             # for daf:1000
   101  in\s
   102  [0-9.]+\sseconds,\s+
   103  [0-9]+\sbytes\./x) {  # There's an extra space at the end for some reason.
   104
   105
   106      #Split line into components

To fix, change:

   100  \w+:\d+\s             # for daf:1000

to

   100  [^:]+:\d+\s           # for daf:1000

Going back to your (Matthias') log entries, we can see the problem:

Jan 22 22:58:01 stinger spamd[19733]: clean message (4.9/5.0) for (unknown):108 in 1.2 seconds, 1674 bytes.
Jan 22 22:57:51 stinger spamd[19717]: identified spam (27.1/5.0) for (unknown):108 in 4.1 seconds, 2842 bytes.
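
To check this against the first of those entries, here is a minimal
sketch that runs only the "for user:uid in" fragment of the regex, old
and changed, against the line. The sub names matches_old and matches_new
are mine, made up for illustration; they are not part of sa-stats.pl.

```perl
use strict;
use warnings;

# First of the log entries above, joined back onto one line.
my $line = 'Jan 22 22:58:01 stinger spamd[19733]: clean message (4.9/5.0) '
         . 'for (unknown):108 in 1.2 seconds, 1674 bytes.';

# Hypothetical helpers: each tests just the "for user:uid in" fragment.
# matches_old uses \w+ as in v1.3, matches_new uses [^:]+ as in the fix.
sub matches_old { return $_[0] =~ m/\sfor\s\w+:\d+\sin\s/   ? 1 : 0 }
sub matches_new { return $_[0] =~ m/\sfor\s[^:]+:\d+\sin\s/ ? 1 : 0 }

# prints "old: 0, new: 1"
printf "old: %d, new: %d\n", matches_old($line), matches_new($line);
```

A line with an ordinary user such as 'for daf:1000 in' matches both
fragments, so the change should not affect entries that already parsed.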

The portion of the log entry that should be matched by /\w+:\d+/ is
'(unknown):108' -- the problem is that '(' and ')' are not matched by
\w. The quick answer is to replace \w with [^:], but that's still
brittle. We can simplify that regex a little, mostly by using
Parse::Syslog to extract the generic elements and then a smaller
regex to get at the spamd-specific data. From my version:

    102 my $YEAR = (localtime(time))[5]; # this is years since 1900
    103
    104 my $parser = Parse::Syslog->new( $opt{'logfile'},
    105                                 year   => $YEAR + 1900,
    106                                 );
    107
    108 parseloop:
    109 while (my $sl = $parser->next) {
    110     next parseloop unless ($sl->{'program'} eq 'spamd');
    111     if ($sl->{'text'} =~ m/
    112         (clean\smessage|identified\sspam)\s  # Status
    113         \(([-0-9.]+)\/([-0-9.]+)\)\s         # Score, Threshold
    114         for\s
    115         ([^:]+):\d+\s                        # for daf:1000
    116         in\s
    117         ([0-9.]+)\sseconds,\s+
    118         ([0-9]+)\sbytes\.
    119     /x) {
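
For anyone who wants to try the smaller regex without the rest of the
script, here is a rough stand-alone sketch. It assumes that
$sl->{'text'} holds the message after "spamd[PID]: ", and the sub name
parse_spamd_text and the hash keys are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the smaller regex to the message text and
# returns a hash ref of the captured fields, or undef if it does not match.
sub parse_spamd_text {
    my ($text) = @_;
    return undef unless $text =~ m/
        (clean\smessage|identified\sspam)\s  # Status
        \(([-0-9.]+)\/([-0-9.]+)\)\s         # Score, Threshold
        for\s
        ([^:]+):\d+\s                        # User, e.g. daf:1000
        in\s
        ([0-9.]+)\sseconds,\s+
        ([0-9]+)\sbytes\.
    /x;
    return { status => $1, score => $2, threshold => $3,
             user   => $4, seconds => $5, bytes => $6 };
}

# Message part of the second log entry quoted earlier.
my $r = parse_spamd_text(
    'identified spam (27.1/5.0) for (unknown):108 in 4.1 seconds, 2842 bytes.');
print "$r->{status} $r->{score}/$r->{threshold} user=$r->{user}\n" if $r;
```
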

In line 102 $YEAR should be set via a command line option or by a
standard, smart algorithm for guessing the year. I chose the simple,
dumb way of taking the current year, which is guaranteed to break once
the log spans a year boundary (say, processing the December 31st logs
on January 1st). I believe Duncan fixed this but I don't know how; I
haven't seen a recent official version of the code.
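
One possible heuristic, sketched below, is to compare the month of a log
entry with the current month and step back a year when the entry's month
has not happened yet. This is only my guess at a reasonable approach,
not Duncan's fix; guess_year and %MONTH are hypothetical names, and it
assumes the log is less than a year old.

```perl
use strict;
use warnings;

# Month abbreviations as they appear in syslog, mapped to 0-based numbers.
my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: guess the four-digit year of a syslog entry.
# $now_mon is 0-based and $now_year is years since 1900, i.e. elements
# 4 and 5 of localtime().
sub guess_year {
    my ($log_mon_name, $now_mon, $now_year) = @_;
    my $log_mon = $MONTH{$log_mon_name};
    die "unknown month '$log_mon_name'\n" unless defined $log_mon;
    my $year = $now_year + 1900;
    # An entry from a month later than the current one is assumed to be
    # from the previous year.
    $year-- if $log_mon > $now_mon;
    return $year;
}

my ($mon, $year) = (localtime(time))[4, 5];
print guess_year('Dec', $mon, $year), "\n";
```

Called with the month of the first entry in the log, the result could be
passed to Parse::Syslog->new in place of $YEAR + 1900.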

hth,

-- Bob


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
