Hello, I'm enhancing a program I use to follow Postfix logs in sort of real-time. It does a couple of things, and the most important is to record a single-line summaries for each message, storing source IP address, from, to, timestamp, size, final delivery destination (mail relay or mailbox) and status. In order to keep up with a lot of activity, I need to keep some data in memory, so that I can summarize all the log lines describing a message process.
The problem is: many things can go wrong in a message delivery, and, worse, syslog, based on UDP, can lose datagrams due to different causes. So such a program running for a long time could end up storing a lot of incomplete messages in memory. I am planning to use timestamps in order to flush longstanding messages, but of course I don't want to discard too much incomplete data. Looking at the progress of an email message, I think I've found a possible problem (for me) with logging. Let's see a normal example: 1- Client connects -> smtpd records a "connect from..." 2- client sends valid "HELO" 3- client sends valid "MAIL FROM:" 4- client sends valid "RCPT TO:" -> smtpd logs a "client=" line with a queue id (at this point, I store the source IP address and queue id for the message) 5- client sends "DATA" and message headers, body. 6- client finishes sending body, sends "." -> cleanup logs "message-id=" 7- qmgr logs "from=" 8- transport (local, etc) logs 'to=' with delivery status, etc. 9- qmgr logs "queue id removed" 10- smtpd logs "disconnect" So far, so good, this one is trivial to follow. But, what happens if the client isn't so predictable? Let's see an example: Zaphod:~ borjam$ telnet 172.16.2.117 smtp Trying 172.16.2.117... Connected to 172.16.2.117. Escape character is '^]'. 220 pruebazfs1.sarenet.es ESMTP Postfix HELO testing 250 pruebazfs1.sarenet.es MAIL FROM: <bor...@sarenet.es> 250 2.1.0 Ok RCPT TO: <bor...@pruebazfs1> 250 2.1.5 Ok quit 221 2.0.0 Bye Connection closed by foreign host. And the log records this: Feb 23 15:08:20 pruebazfs1 postfix/smtpd[4800]: connect from unknown[172.16.1.204] Feb 23 15:08:56 pruebazfs1 postfix/smtpd[4800]: 522243F471: client=unknown[172.16.1.204] Feb 23 15:09:00 pruebazfs1 postfix/smtpd[4800]: disconnect from unknown[172.16.1.204] After this, I have recorded a (queue id, ip address) tuple, and the client disconnected. So it could be a good bet to just remove the aborted (queue_id,ip_addr) tuple from my list. But I'm worried that I could lose messages if the server is slow. The "message-id=" line is logged by a different program, cleanup. Doing a pipelined send, I've seen that the progress is logged as follows: Feb 23 15:13:33 pruebazfs1 postfix/smtpd[4819]: connect from unknown[172.16.1.204] Feb 23 15:13:33 pruebazfs1 postfix/smtpd[4819]: improper command pipelining after HELO from unknown[172.16.1.204] Feb 23 15:13:33 pruebazfs1 postfix/smtpd[4819]: 4DEAD418A8: client=unknown[172.16.1.204] +++Feb 23 15:13:33 pruebazfs1 postfix/cleanup[4823]: 4DEAD418A8: message-id=<20100223141333.4dead41...@pruebazfs1.sarenet.es> Feb 23 15:13:33 pruebazfs1 postfix/qmgr[4746]: 4DEAD418A8: from=<bor...@sarenet.es>, size=358, nrcpt=1 (queue active) +++Feb 23 15:13:33 pruebazfs1 postfix/smtpd[4819]: disconnect from unknown[172.16.1.204] I'm marking the two interesting lines with a "+++". I could just discard a (queue_id, ip_addr) tuple in case I see a "disconnect" line from the same smtpd process. But, is it possible for the "message-id=" line from "cleanup" to be delayed? I've noticed that smtpd, upon reception of a valid "rcpt to", connects to a cleanup process and there seems to be a strong coupling between the two. Is there, maybe, a guarantee (due to the program flow in smtpd) that the "message-id=" line from cleanup will always be logged before the "disconnected" line from smtpd? Is it a synchronous coupling, or messages sent from smtpd go to a cleanup queue to be processed asynchronously, and hence it would be possible to see a "disconnected" smtpd line before the "message-id=" line from cleanup? I hope this message is understandable ;) I need to avoid memory leaks, but I don't want to be over aggressive with discards. Mail servers can be painfully slow now and then, under heavy loads... By the way, wouldn't it be better to have a message similar to the "queue_id removed" logged by qmgr, when this happens? I mean, a queue id has been allocated by the smtpd/cleanup pair, and in case the client disconnects before "DATA" there's no other mention of that queue id on the system log. Thank you very much in advance, Borja.