Hi,

We had some problems with our mail server over the last few weeks.
Some messages were not delivered and we wanted to know why.
But looking through the log file by hand is time-consuming.
So I want to write a parser that analyses the logs and converts them to XML.

But I have never written a parser before, and now I'm sitting in front 
of the log file trying to write the grammar for pyparsing.

First of all I need to know if it is possible to parse that kind of info 
into XML.
Here is an excerpt of the logfile lines I'm interested in:

Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:22 mailrelay spamd[1364]: spamd: processing message 
<[EMAIL PROTECTED]> for nobody:65534
Sep 18 04:15:25 mailrelay spamd[1364]: spamd: result: Y 15 - 
BAYES_99,DATE_IN_PAST_03_06,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_DSN,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_MUA_OUTLOOK,SPF_SOFTFAIL 
scantime=3.1,size=8086,user=nobody,uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=55277,mid=<[EMAIL PROTECTED]>,bayes=1,autolearn=no

Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: 
to=<[EMAIL PROTECTED]>, relay=10.49.0.7[10.49.0.7], 
delay=1, status=sent (250 2.6.0 
<[EMAIL PROTECTED]> Queued mail for delivery)

I filtered them by "message-id", so all of the lines above belong to 
the message 
"[EMAIL PROTECTED]".
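
To make the line layout concrete, here is a rough sketch of how a single line might be split into fields. It uses the standard `re` module rather than a pyparsing grammar, and it assumes every entry is one physical line in the real log (the excerpt above is only wrapped by the mail client). The names `LINE_RE`, `QUEUE_RE` and `split_line` are made up for illustration.

```python
import re

# Assumed layout: "<Mon DD HH:MM:SS> <host> <process>[<pid>]: <rest>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<rest>.*)$"
)
# Assumption: postfix payloads start with an upper-case hex queue ID and a colon.
QUEUE_RE = re.compile(r"^(?P<queue_id>[0-9A-F]+): (?P<body>.*)$")


def split_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    queue = QUEUE_RE.match(fields["rest"])
    if queue:
        # postfix line: separate the queue ID from the remaining text
        fields["queue_id"] = queue.group("queue_id")
        fields["rest"] = queue.group("body")
    else:
        # e.g. spamd lines, which carry no queue ID
        fields["queue_id"] = None
    return fields
```

For example, `split_line("Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed")` yields a dict whose `queue_id` is `"755387301"` and whose `rest` is `"removed"`. The `key=value` pairs in `rest` would still need their own rule.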

The original logfile is about 25 MB, so I can't post all of the 
lines of course ;-)

Looking at these lines I realized that there are "Queue IDs":
755387301
DA1431965E
EF90720AD

Filtering the log for these IDs results in the following lines:

Sep 18 02:15:11 mailrelay postfix/smtpd[10841]: 755387301: 
client=unknown[194.25.242.123]
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: 
from=<[EMAIL PROTECTED]>, size=8152, nrcpt=7 (queue active)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: 
to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: 
to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: 
to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: 
to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed

Sep 18 04:15:25 mailrelay postfix/pickup[13175]: DA1431965E: uid=65534 
from=<nobody>
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: DA1431965E: 
from=<[EMAIL PROTECTED]>, size=11074, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: 
to=<[EMAIL PROTECTED]>, relay=localhost[127.0.0.1], 
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: DA1431965E: removed

Sep 18 04:15:25 mailrelay postfix/smtpd[11704]: EF90720AD: 
client=localhost[127.0.0.1]
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: 
message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: 
to=<[EMAIL PROTECTED]>, relay=localhost[127.0.0.1], 
delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: 
from=<[EMAIL PROTECTED]>, size=11263, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: 
to=<[EMAIL PROTECTED]>, relay=10.49.0.7[10.49.0.7], 
delay=1, status=sent (250 2.6.0 
<[EMAIL PROTECTED]> Queued mail for delivery)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: removed

So far I have done all of this by hand on the command line with grep...
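
What I imagine instead of repeated grep runs is something like the following sketch: read the file once, collect the lines per queue ID, and remember which message-id each queue ID belongs to. It uses plain `re`, not pyparsing, and the names `QUEUE_LINE`, `MESSAGE_ID` and `group_by_message_id` are my own invention. It also assumes each log entry is a single line in the real file.

```python
import re
from collections import defaultdict

# Assumption: a postfix entry looks like "...process[pid]: QUEUEID: text"
QUEUE_LINE = re.compile(r"\]: ([0-9A-F]+): (.*)$")
MESSAGE_ID = re.compile(r"message-id=<([^>]*)>")


def group_by_message_id(lines):
    """One pass over the lines: group by queue ID, then by message-id."""
    lines_by_queue = defaultdict(list)
    message_id_of = {}
    for line in lines:
        found = QUEUE_LINE.search(line)
        if not found:
            continue  # e.g. spamd lines, which carry no queue ID
        queue_id, body = found.groups()
        lines_by_queue[queue_id].append(body)
        mid = MESSAGE_ID.search(body)
        if mid:
            message_id_of[queue_id] = mid.group(1)

    # Invert the mapping: one message-id can own several queue IDs.
    queues_by_message = defaultdict(list)
    for queue_id, mid in message_id_of.items():
        queues_by_message[mid].append(queue_id)
    return dict(lines_by_queue), dict(queues_by_message)
```

Called as `group_by_message_id(open("mail.log"))` (the file name is a placeholder), this would keep the whole result in memory, which seems acceptable for a 25 MB file. The spamd lines would have to be matched separately through their message-id.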

Is it possible to parse this big logfile only ONCE and extract all this 
info into XML?

Like this:

<message id="[EMAIL PROTECTED]">
   <timestamp>Sep 18 04:15:26</timestamp>
   <from>[EMAIL PROTECTED]</from>
   <to>[EMAIL PROTECTED]</to>
   <to>[EMAIL PROTECTED]</to>
   <to>[EMAIL PROTECTED]</to>
   <to>[EMAIL PROTECTED]</to>
   <queueID>EF90720AD</queueID>
   <queueID>DA1431965E</queueID>
   <queueID>755387301</queueID>
   <spamd>
        <score>15</score>
        <filtered>yes</filtered>
        <sendto>[EMAIL PROTECTED]</sendto>
   </spamd>
</message>
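
Producing that structure is probably the easy part. A minimal sketch with the standard `xml.etree.ElementTree` module follows, assuming the parsed data already sits in a plain dict. The dict keys (`message_id`, `sender`, `recipients`, `queue_ids`, `spamd`) and the function name `message_to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def message_to_xml(record):
    """Build a <message> element from a plain dict (hypothetical layout)."""
    message = ET.Element("message", id=record["message_id"])
    ET.SubElement(message, "timestamp").text = record["timestamp"]
    ET.SubElement(message, "from").text = record["sender"]
    for recipient in record["recipients"]:
        ET.SubElement(message, "to").text = recipient
    for queue_id in record["queue_ids"]:
        ET.SubElement(message, "queueID").text = queue_id
    spam = record.get("spamd")
    if spam is not None:
        # The spamd block is optional: not every message has a result line.
        spamd = ET.SubElement(message, "spamd")
        ET.SubElement(spamd, "score").text = str(spam["score"])
        ET.SubElement(spamd, "filtered").text = "yes" if spam["filtered"] else "no"
        if "sendto" in spam:
            ET.SubElement(spamd, "sendto").text = spam["sendto"]
    return message
```

`ET.tostring(message_to_xml(record), encoding="unicode")` then gives the XML as a string, with special characters in the addresses escaped automatically.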

The goal of this is to provide a web interface where we can see if the 
messages were filtered as spam (or deleted by our virus scanner).

Is it possible? Or do I have to scan / parse the file more than once?

Andi

-- 
Mozilla Thunderbird 1.5.0.7
Arch Linux