Luigi Iotti:
> Hi all
> 
> I operate two very different postfix machines. One is heavily loaded and has
> decent hardware; the other is my home machine. Both run CentOS5 with
> postfix-2.3.3, amavis, spamassassin and clamav. On both machines there is a
> mail account subscribed to the same mailing list (specifically, the popular
> Squid web proxy daemon mailing list).
> From time to time, one or both of these accounts exhibit the same problem
> while receiving a message from the mentioned mailing list.
> A message is received saying (I paste a transcript of the error I receive
> from my home machine, but the problem on the other is the same):
> 
> Return-Path: <[EMAIL PROTECTED]>
> From: [EMAIL PROTECTED] (Mail Delivery System)
> To: [EMAIL PROTECTED] (Postmaster)
> Subject: Postfix SMTP server: errors from squid-cache.org[12.160.37.9]
> 
> Transcript of session follows.
> 
>  Out: 220 barattolo.rinnanet.it ESMTP Postfix
>  In:  HELO squid-cache.org
>  Out: 250 barattolo.rinnanet.it
>  In:  MAIL FROM:<[EMAIL PROTECTED]>
>  Out: 250 2.1.0 Ok
>  In:  RCPT TO:<[EMAIL PROTECTED]>
>  Out: 250 2.1.5 Ok
>  In:  DATA
>  Out: 354 End data with <CR><LF>.<CR><LF>
>  Out: 451 4.3.0 Error: queue file write error
> 
> Session aborted, reason: lost connection
> 
> 
> Having a look at the logs, I find:
> Sep 24 06:51:13 barattolo postfix/smtpd[5832]: connect from
> squid-cache.org[12.160.37.9]
> ...
> Sep 24 06:52:08 barattolo postfix/smtpd[5832]: NOQUEUE: filter: RCPT from
> squid-cache.org[12.160.37.9]: <squid-cache.org[12.160.37.9]>: Client host
> triggers FILTER smtp-amavis:[127.0.0.1]:10024;
> from=<[EMAIL PROTECTED]>
> to=<[EMAIL PROTECTED]> proto=SMTP helo=<squid-cache.org>
> Sep 24 06:52:08 barattolo postfix/smtpd[5832]: 2928F10000E:
> client=squid-cache.org[12.160.37.9]
> ...
> Sep 24 07:52:07 barattolo postfix/cleanup[5848]: warning: 2928F10000E: read
> timeout on cleanup socket
> ...
> Sep 24 08:01:48 barattolo postfix/smtpd[5832]: disconnect from
> squid-cache.org[12.160.37.9]
> 
> I'm tempted to think that this is a problem with the mailing list manager, and to
> forget about it, but I would like to be sure that the fault is not partly or
> totally mine.
> Any suggestions?

Normally, all Postfix network and inter-process I/O is subject to
time limits. On Linux these time limits are implemented with poll().
Network and inter-process I/O are done over TCP or UNIX-domain
sockets.  Those sockets are in blocking mode, and Postfix relies
on the kernel to return early when a read or write operation is
incomplete (sockets are in blocking mode because of Solaris bugs;
by now I should perhaps stop working around bugs from 1996).
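
To illustrate the pattern described above, here is a minimal sketch in
Python rather than Postfix's own C code. It is not the Postfix
implementation; the names timed_read and demo are hypothetical. It
waits with poll() for up to a time limit, then does a single read on
a socket that is still in blocking mode, relying on the kernel to
return whatever data is available instead of waiting for a full buffer.

```python
import select
import socket


def timed_read(sock, max_bytes, timeout_s):
    """Wait up to timeout_s for readable data, then do one blocking recv().

    Hypothetical helper for illustration only; not a Postfix function.
    """
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    if not events:
        raise TimeoutError("read timeout")
    # The socket is still in blocking mode. recv() returns as soon as
    # some data is available; it does not wait for max_bytes.
    return sock.recv(max_bytes)


def demo():
    # A connected pair of UNIX-domain stream sockets stands in for
    # the channel between two processes.
    a, b = socket.socketpair()
    try:
        b.sendall(b"250 2.1.0 Ok\r\n")
        first = timed_read(a, 4096, 1.0)
        try:
            # Nothing more was sent, so this should hit the time limit.
            timed_read(a, 4096, 0.1)
            timed_out = False
        except TimeoutError:
            timed_out = True
        return first, timed_out
    finally:
        a.close()
        b.close()
```

The time limit here is enforced only by the poll() call. If the kernel
reports the socket as readable and the following read then blocks
anyway, nothing in user space can interrupt it.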

In your case, the smtpd process gets stuck, the cleanup process
gives up after waiting for one hour, and then the smtpd process
becomes un-stuck more than 9 minutes later.  In the meantime, the
SMTP client and the cleanup process have gone away, but of course
the smtpd process discovers that only after it becomes un-stuck.
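
The intervals above can be checked against the timestamps in the
quoted log. This is a small Python sketch; the helper name elapsed is
hypothetical, and it assumes all events fall on the same day, as they
do in the log excerpt.

```python
from datetime import datetime


def elapsed(start, end):
    """Return the time between two same-day HH:MM:SS log timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Queue file 2928F10000E created -> cleanup read timeout
queued_to_timeout = elapsed("06:52:08", "07:52:07")
# cleanup read timeout -> smtpd logs the disconnect
timeout_to_disconnect = elapsed("07:52:07", "08:01:48")

print(queued_to_timeout)      # 0:59:59
print(timeout_to_disconnect)  # 0:09:41
```

The first interval is about one hour, which would be consistent with
a 3600s ipc_timeout setting; that is an assumption, since the main.cf
was not posted.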

I have no idea why the smtpd process would get stuck except of
course for kernel bugs.

        Wietse
