On 17 Aug 2016, at 11:34, Zhang Huangbin wrote:
Dear all,
I have a problem with my own Postfix policy server (written in Python).
Postfix usually works fine with it, but sometimes it raises an error like
this:
Aug 17 08:32:52 mail1 postfix/smtpd[24298]: warning: problem talking
to server 127.0.0.1:1234: Connection reset by peer
"Connection reset by peer" means exactly what it says. That's your
policy server unilaterally closing an existing connection from a Postfix
smtpd process with PID 24298. It is also possible that your policy
server died in some unexpected way and the OS closed the connection.
Aug 17 08:34:05 mail1 postfix/smtpd[24771]: warning: problem talking
to server 127.0.0.1:1234: Connection timed out
That's a new Postfix smtpd process with PID 24771 trying and failing to
open a new connection to the policy service. How long it will wait for
the connection to succeed is controlled by smtpd_policy_service_timeout.
Your config shows that to be 1000s (16 minutes 40 seconds), so that smtpd
process started trying to open that connection at about 08:17:25.
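To double-check that arithmetic, here is a minimal Python sketch. The function name is made up for illustration; the only inputs are the timestamp from the log line and the timeout value from the config.

```python
from datetime import datetime, timedelta


def connection_attempt_start(failure_time: str, timeout_seconds: int) -> str:
    """Return the HH:MM:SS at which a timed-out connect attempt began."""
    failed_at = datetime.strptime(failure_time, "%H:%M:%S")
    started_at = failed_at - timedelta(seconds=timeout_seconds)
    return started_at.strftime("%H:%M:%S")


# "Connection timed out" was logged at 08:34:05 with a 1000s timeout.
print(connection_attempt_start("08:34:05", 1000))  # 08:17:25
```

So the policy server log should be examined from 08:17:25 onward, not just around the time of the warning.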
At the time Postfix raised these errors, my policy server was still
working and properly processing requests (I checked its log file).
If the policy server did not log something at or shortly before 'Aug 17
08:32:52' and possibly even before 08:17:25 then it needs to do more
extensive logging. The Postfix smtpd process with PID 24298 had an open
connection to 127.0.0.1:1234 that was reset at that time. On Linux it is
possible for the "OOM Killer" facility to kill a process under extreme
memory pressure and to configure the system to respawn a daemon, so it
can *look* like a daemon stays up because it has been respawned shortly
after being killed. Of course the PID of the policy server would change
in that sort of circumstance.
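One cheap way to notice a silent respawn is to have the policy server record its PID at startup and compare it with the one recorded by the previous run. The sketch below assumes a PID file at a path of your choosing; the function name and the message wording are hypothetical, not something Postfix or any library provides.

```python
import os


def note_restart(pid_file: str) -> str:
    """Record the current PID and report whether it differs from the last run."""
    previous = None
    if os.path.exists(pid_file):
        with open(pid_file) as handle:
            previous = handle.read().strip() or None

    current = str(os.getpid())
    with open(pid_file, "w") as handle:
        handle.write(current)

    if previous is None:
        return f"started with pid {current} (no earlier pid recorded)"
    if previous != current:
        return f"started with pid {current}; previous pid was {previous}"
    return f"pid {current} unchanged"
```

Logging the returned message at startup gives you a timestamped line every time the daemon comes up, which you can line up against the Postfix warnings.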
I don't know how to reproduce this issue, except to wait (it happens
randomly, especially when the server is busy). Do you have any idea or
hint about how I can debug this issue, on the Postfix side, on my policy
server side, or both?
Make the policy server log its actions in detail, at least until you can
figure this out.
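As a rough illustration of what "in detail" could mean, here is a minimal sketch of a policy server that logs every connection open, every request, every reply, every exception, and every connection close, with the PID on each line. It is not your server: the class and function names are invented, it always answers DUNNO, and it assumes the usual policy delegation framing of name=value lines ended by an empty line.

```python
import logging
import socketserver

log = logging.getLogger("policyd")


def parse_request(lines):
    """Turn the 'name=value' lines of one policy request into a dict."""
    attrs = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


class PolicyHandler(socketserver.StreamRequestHandler):
    def handle(self):
        peer = "%s:%d" % self.client_address[:2]
        log.info("connection opened from %s", peer)
        try:
            pending = []
            for raw in self.rfile:
                line = raw.decode("utf-8", "replace").rstrip("\r\n")
                if line:
                    pending.append(line)
                    continue
                # An empty line ends one request.
                attrs = parse_request(pending)
                pending = []
                log.info("request from %s: %r", peer, attrs)
                self.wfile.write(b"action=DUNNO\n\n")
                log.info("replied DUNNO to %s", peer)
        except Exception:
            # Log the traceback so an unexpected death is never silent.
            log.exception("error while serving %s", peer)
            raise
        finally:
            log.info("connection closed for %s", peer)


def serve(host="127.0.0.1", port=1234):
    """Run the sketch server; %(process)d puts the PID on every log line."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s pid=%(process)d %(levelname)s %(message)s",
    )
    with socketserver.ThreadingTCPServer((host, port), PolicyHandler) as server:
        server.serve_forever()
```

Calling serve() starts it on 127.0.0.1:1234. With logging like this, a reset at 08:32:52 should have a matching "connection closed" or traceback line, and a change in pid= between lines shows a restart.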