Hi, we have had some problems with our mailserver in the last few weeks. Some messages were not delivered and we wanted to know why, but looking through the logfile is a time-consuming process. So I want to write a parser that analyses the logs and converts them to XML.
But I have never written a parser before, and now I'm sitting in front of the logfile trying to write the grammar for pyparsing. First of all I need to know if it is possible to parse that kind of info into XML. Here is an excerpt of the logfile lines I'm interested in:

Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:22 mailrelay spamd[1364]: spamd: processing message <[EMAIL PROTECTED]> for nobody:65534
Sep 18 04:15:25 mailrelay spamd[1364]: spamd: result: Y 15 - BAYES_99,DATE_IN_PAST_03_06,DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_DSN,DNS_FROM_RFC_POST,DNS_FROM_RFC_WHOIS,FORGED_MUA_OUTLOOK,SPF_SOFTFAIL scantime=3.1,size=8086,user=nobody,uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=55277,mid=<[EMAIL PROTECTED]>,bayes=1,autolearn=no
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: to=<[EMAIL PROTECTED]>, relay=10.49.0.7[10.49.0.7], delay=1, status=sent (250 2.6.0 <[EMAIL PROTECTED]> Queued mail for delivery)

They are filtered by "message-id", so all the lines above have something to do with the message "[EMAIL PROTECTED]".
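To make the line structure concrete, here is a rough first attempt at splitting one line into its parts. It uses the standard re module rather than pyparsing, and the names LINE_RE, QUEUE_RE and parse_line are made up for this sketch. It assumes every line starts with the same "month day time host process[pid]:" prefix as in the excerpt, and that postfix lines then continue with the queue ID and a colon.

```python
import re

# Assumed common prefix: "<month> <day> <time> <host> <process>[<pid>]: <rest>"
LINE_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[^\[\s]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<rest>.*)"
)

# Assumption: postfix lines continue with "<queue id>: <detail>".
QUEUE_RE = re.compile(r"(?P<queue_id>[0-9A-F]+):\s+(?P<detail>.*)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Only postfix lines carry a queue ID; spamd lines keep just the prefix fields.
    if fields["process"].startswith("postfix/"):
        q = QUEUE_RE.match(fields["rest"])
        if q is not None:
            fields.update(q.groupdict())
    return fields


sample = ("Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: "
          "from=<[EMAIL PROTECTED]>, size=8152, nrcpt=7 (queue active)")
parsed = parse_line(sample)
print(parsed["timestamp"], parsed["process"], parsed["queue_id"])
```

The same split could probably be written as a pyparsing grammar; the regular expressions are only meant to show which pieces a grammar would have to name.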
The original logfile is about 25 MB, so I can't post all of the lines of course ;-) Looking at these lines I realized that there are "Queue IDs":

755387301
DA1431965E
EF90720AD

Filtering the log for these IDs results in the following lines:

Sep 18 02:15:11 mailrelay postfix/smtpd[10841]: 755387301: client=unknown[194.25.242.123]
Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: from=<[EMAIL PROTECTED]>, size=8152, nrcpt=7 (queue active)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: 755387301: removed

Sep 18 04:15:25 mailrelay postfix/pickup[13175]: DA1431965E: uid=65534 from=<nobody>
Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:25 mailrelay postfix/qmgr[11082]: DA1431965E: from=<[EMAIL PROTECTED]>, size=11074, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: to=<[EMAIL PROTECTED]>, relay=localhost[127.0.0.1], delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: DA1431965E: removed

Sep 18 04:15:25 mailrelay postfix/smtpd[11704]: EF90720AD: client=localhost[127.0.0.1]
Sep 18 04:15:26 mailrelay postfix/cleanup[13057]: EF90720AD: message-id=<[EMAIL PROTECTED]>
Sep 18 04:15:26 mailrelay postfix/smtp[11703]: DA1431965E: to=<[EMAIL PROTECTED]>, relay=localhost[127.0.0.1], delay=1, status=sent (250 Ok: queued as EF90720AD)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: from=<[EMAIL PROTECTED]>, size=11263, nrcpt=1 (queue active)
Sep 18 04:15:26 mailrelay postfix/smtp[10879]: EF90720AD: to=<[EMAIL PROTECTED]>, relay=10.49.0.7[10.49.0.7], delay=1, status=sent (250 2.6.0 <[EMAIL PROTECTED]> Queued mail for delivery)
Sep 18 04:15:26 mailrelay postfix/qmgr[11082]: EF90720AD: removed

All this work is done on the command line with grep... Is it possible to parse this big logfile only ONCE and extract all this info into XML? Like this:

<message id="[EMAIL PROTECTED]">
  <timestamp>Sep 18 04:15:26</timestamp>
  <from>[EMAIL PROTECTED]</from>
  <to>[EMAIL PROTECTED]</to>
  <to>[EMAIL PROTECTED]</to>
  <to>[EMAIL PROTECTED]</to>
  <to>[EMAIL PROTECTED]</to>
  <queueID>EF90720AD</queueID>
  <queueID>DA1431965E</queueID>
  <queueID>755387301</queueID>
  <spamd>
    <score>15</score>
    <filtered>yes</filtered>
    <sendto>[EMAIL PROTECTED]</sendto>
  </spamd>
</message>

The goal of this is to provide a web interface where we can see if the messages were filtered as spam (or deleted by our virus scanner). Is it possible? Or do I have to scan / parse the file more than once?

Andi
--
Mozilla Thunderbird 1.5.0.7
Arch Linux
--
http://mail.python.org/mailman/listinfo/python-list
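For illustration, the single-pass idea could look roughly like the sketch below: read every line once, collect the facts per queue ID in a dictionary, and only afterwards group the queue IDs that share a message-id into one <message> element. This uses the standard re and xml.etree.ElementTree modules instead of pyparsing. The names POSTFIX_RE, FIELD_RE, collect and to_xml are invented for this sketch, the line layout is assumed to match the excerpt, and timestamps and the spamd lines are left out to keep it short.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed layout: "... postfix/<service>[<pid>]: <queue id>: <detail>"
POSTFIX_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+postfix/\w+\[\d+\]:\s+"
    r"(?P<qid>[0-9A-F]+):\s+(?P<detail>.*)$"
)
# Only key=<value> pairs for these three keys are picked up.
FIELD_RE = re.compile(r"\b(message-id|from|to)=<([^>]*)>")


def collect(lines):
    """Read the log once and gather message-id, senders and recipients per queue ID."""
    queues = defaultdict(lambda: {"message_id": None, "from": [], "to": []})
    for line in lines:
        m = POSTFIX_RE.match(line)
        if m is None:
            continue  # spamd and other lines are skipped in this sketch
        rec = queues[m.group("qid")]
        for key, value in FIELD_RE.findall(m.group("detail")):
            if key == "message-id":
                rec["message_id"] = value
            else:
                rec[key].append(value)
    return queues


def to_xml(queues):
    """Group queue IDs that share a message-id into one <message> element."""
    root = ET.Element("messages")
    by_msg = {}
    for qid, rec in queues.items():
        if rec["message_id"] is None:
            continue  # no cleanup line seen for this queue ID
        msg = by_msg.get(rec["message_id"])
        if msg is None:
            msg = ET.SubElement(root, "message", id=rec["message_id"])
            by_msg[rec["message_id"]] = msg
        for tag in ("from", "to"):
            for value in rec[tag]:
                ET.SubElement(msg, tag).text = value
        ET.SubElement(msg, "queueID").text = qid
    return root


sample = [
    "Sep 18 04:15:22 mailrelay postfix/cleanup[12103]: 755387301: message-id=<[EMAIL PROTECTED]>",
    "Sep 18 04:15:22 mailrelay postfix/qmgr[11082]: 755387301: from=<[EMAIL PROTECTED]>, size=8152, nrcpt=7 (queue active)",
    "Sep 18 04:15:25 mailrelay postfix/pipe[11659]: 755387301: to=<[EMAIL PROTECTED]>, relay=procmail, delay=14, status=sent (filter)",
    "Sep 18 04:15:25 mailrelay postfix/cleanup[12074]: DA1431965E: message-id=<[EMAIL PROTECTED]>",
]
root = to_xml(collect(sample))
print(ET.tostring(root, encoding="unicode"))
```

The file itself is read only once; the grouping afterwards works on the in-memory dictionary. For the real file, an open file object could be passed to collect() in place of the sample list. Whether a 25 MB log fits comfortably in memory this way, and whether grouping by message-id alone is enough to link the queue IDs, are both assumptions that would need checking against the full log.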