Hi,

On Fri, 23 Jan 2004 00:25:02 +0100 (MET) Matthias Fuhrmann <[EMAIL PROTECTED]> wrote:
> On Thu, 22 Jan 2004, Vermyndax wrote:
>
> > Matthias...
> >
> > Argh, that looked abysmally easy.  I guess I could have taken a crack at
> > that after all.
>
> may i take this as a sign of success (my english isnt that good in all
> terms ...) ? :)

I'm sorry. My German isn't so good, is it? I only studied German for three years in college and I have forgotten all of it... ;)

> > You might want to try Bob's 1.5 script though...
>
> doesnt works for me yet, just posted, maybe he'll fix that too :)

The problem in sa-stats.pl v1.3 is in line 100, deep in a gory regex:

     88  LINE: while (<LOG>) {
     89
     90    # Agh... this is ugly.
     91    if (m/
     92        ^(\w{3})\s+                            # Month
     93        (\d+)\s+                               # Day
     94        (\d\d):(\d\d):(\d\d)\s+                # HH:MM:SS
     95        \w+\s+                                 # Hostname?
     96        spamd\[\d+\]:\s+                       # spamd[PID]
     97        (clean\smessage|identified\sspam)\s    # Status
     98        \(([-0-9.]+)\/([-0-9.]+)\)\s           # Score, Threshold
     99        for\s
    100        \w+:\d+\s                              # for daf:1000
    101        in\s
    102        [0-9.]+\sseconds,\s+
    103        [0-9]+\sbytes\./x) {   # There's an extra space at the end for some reason.
    104
    105
    106      #Split line into components

To fix, change:

    100        \w+:\d+\s                              # for daf:1000

to

    100        [^:]+:\d+\s                            # for daf:1000

Going back to your (Matthias') log entries, we can see the problem:

    Jan 22 22:58:01 stinger spamd[19733]: clean message (4.9/5.0) for (unknown):108 in 1.2 seconds, 1674 bytes.
    Jan 22 22:57:51 stinger spamd[19717]: identified spam (27.1/5.0) for (unknown):108 in 4.1 seconds, 2842 bytes.

The portion of the log entry that should be matched by /\w+:\d+/ is '(unknown):108' -- the problem is that '(' and ')' are not matched by \w. The quick answer is to replace \w with [^:], but that's still brittle.

There's a little we can do to simplify that regex, mostly by using Parse::Syslog to extract the generic syslog elements (timestamp, host, program) and then using a smaller regex to get at the spamd-specific data.
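To make the \w versus [^:] difference concrete, here is a minimal sketch that runs just the "for user:uid" fragment of each pattern against two message texts. The first sample is copied from Matthias' log; the second is an invented line with an ordinary username. The helper names old_matches and new_matches are made up for this illustration and do not appear in sa-stats.pl.

```perl
use strict;
use warnings;

# The "for user:uid" fragment as written in sa-stats.pl v1.3 (line 100)
# and with the character-class fix applied.
my $old_re = qr/for\s\w+:\d+\s/;
my $new_re = qr/for\s[^:]+:\d+\s/;

# Hypothetical helpers, named for this sketch only.
sub old_matches { my ($text) = @_; return $text =~ $old_re ? 1 : 0 }
sub new_matches { my ($text) = @_; return $text =~ $new_re ? 1 : 0 }

my @samples = (
    # Taken from Matthias' log (message text only).
    'clean message (4.9/5.0) for (unknown):108 in 1.2 seconds, 1674 bytes.',
    # Invented line with an ordinary username, for comparison.
    'clean message (1.2/5.0) for daf:1000 in 0.8 seconds, 900 bytes.',
);

# Print whether each pattern matches each sample.
for my $text (@samples) {
    printf "old=%d new=%d  %s\n",
        old_matches($text), new_matches($text), $text;
}
```

The '(unknown)' sample should show old=0 new=1, and the plain username should match under both patterns. Note that [^:]+ accepts anything up to the colon, which is exactly why it is still brittle.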
From my version:

    102  my $YEAR = (localtime(time))[5]; # this is years since 1900
    103
    104  my $parser = Parse::Syslog->new( $opt{'logfile'},
    105                                   year => $YEAR + 1900,
    106                                 );
    107
    108  parseloop:
    109  while (my $sl = $parser->next) {
    110    next parseloop unless ($sl->{'program'} eq 'spamd');
    111    if ($sl->{'text'} =~ m/
    112        (clean\smessage|identified\sspam)\s    # Status
    113        \(([-0-9.]+)\/([-0-9.]+)\)\s           # Score, Threshold
    114        for\s
    115        ([^:]+):\d+\s                          # for daf:1000
    116        in\s
    117        ([0-9.]+)\sseconds,\s+
    118        ([0-9]+)\sbytes\.
    119        /x) {

In line 102, $YEAR should be set via a command line option or a standard, smart algorithm for guessing the year. I chose the simple, dumb way of taking the current year, which is guaranteed to break when the December 31st logs are processed after the new year has started. I believe Duncan fixed this but I don't know how; I haven't seen a recent official version of the code.

hth,

-- Bob