As it turned out, the problem was with 1-2-3 Systems' VPS service.
Through this mailing list I found another person who was having the
exact same problem as me, with qmgr being killed constantly, and who also
happened to be hosted on the same physical box as me at 1-2-3 Systems.
Since we were the only two people that either of us could find anywhere
on the 'net who were having this issue with Postfix, it was pretty clear
that this one thing we both had in common must be the cause. I opened a
ticket with 1-2-3 Systems, and they eventually found and fixed the
problem on their box, which solved it for both of us.
Unfortunately, 1-2-3 Systems refuses to tell me exactly what they found
or how they fixed it, even after I asked them several times (they will
only give me very vague answers such as "it was a problem between the
VPS and host, and we corrected it"), but it was clearly something
outside of the VPS environment that was beyond my control. My
assumption is that they are running paravirtualization and that their
host system had some sort of monitoring script that was routinely
killing Postfix-related processes within VPSes running on the host.
That's just a guess, but it's the best I have.
In any event it's fixed. Thanks to everyone who replied!
- Jeff
On 12/20/2010 1:22 PM, Jeff Morris wrote:
On 12/18/2010 11:03 PM, Victor Duchovni wrote:
postfix/master[20377]: warning: process /usr/libexec/postfix/qmgr pid 20380 killed by signal 15
This is SIGTERM. Are you running "postfix stop" frequently?
No, I'm not running it at all. In fact, in the interest of
troubleshooting this, I have re-installed my VPS from a clean CentOS
5.5 image, and done *nothing* but "yum erase sendmail", "yum install
postfix", "service postfix start". And I still get the same problem.
And only on this one VPS with 123Systems, not on any of the dozens of
other Postfix mail servers I am responsible for.
Don't "restart" Postfix every 5 minutes.
I'm not.
As I said, the master.cf has "wakeup" set to 300 seconds, but this is
the default setting, not something I modified, and it is the same
setting as all of my other servers (which do not exhibit this
problem.) If it were not there, then I don't believe that qmgr would
run at all, except when a connection comes in on port 25. I haven't
looked at the Postfix source code, but it seems like Postfix is smart
enough to check for qmgr when a connection comes in, see that it
isn't running, and spawn it. Likewise, every 5 minutes it's trying
to wake up qmgr, seeing that it's not running, and spawning it. In
other words, Postfix is trying its darnedest to keep things running,
but *something* is sending a SIGTERM to qmgr several seconds after it
starts up. And as Wietse mentioned in a separate reply, we can rule
out that it's Postfix which is sending the SIGTERM to qmgr, because if
it were, it would not be logging the warning.
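
For anyone who wants to tally these warnings, here is a minimal Python
sketch. It assumes each master warning sits on a single log line in the
format quoted above. The function names are made up for illustration,
and the log location (for example /var/log/maillog on CentOS) is an
assumption, so adjust it for your system.

```python
import re
import signal
from collections import Counter

# Matches the master(8) warning quoted above, e.g.
# "postfix/master[20377]: warning: process /usr/libexec/postfix/qmgr
#  pid 20380 killed by signal 15" (assumed to be on one line in the log).
KILLED_RE = re.compile(
    r"postfix/master\[\d+\]: warning: process (?P<path>\S+) "
    r"pid (?P<pid>\d+) killed by signal (?P<signum>\d+)"
)


def parse_killed(line):
    """Return (daemon, pid, signal_name) for a 'killed by signal' line, else None."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    signum = int(m.group("signum"))
    try:
        # Translate the number into a name such as SIGTERM.
        name = signal.Signals(signum).name
    except ValueError:
        # Unknown on this platform; keep the raw number.
        name = "signal %d" % signum
    daemon = m.group("path").rsplit("/", 1)[-1]
    return daemon, int(m.group("pid")), name


def summarize(lines):
    """Count how often each (daemon, signal_name) pair appears in the lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_killed(line)
        if parsed is not None:
            daemon, _pid, name = parsed
            counts[(daemon, name)] += 1
    return counts
```

To use it, open the mail log and pass the file object to summarize(),
then print the counts. A steady stream of (qmgr, SIGTERM) entries with
no matching "postfix stop" on your side points at something outside
Postfix sending the signal.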
And not only am I running with a clean VPS image, I've even tried
killing everything non-essential, to the point where basically all
that's running on the VPS is init, postfix, and sshd, and yet the
problem persists. There's no cron running, no scripts, no other
daemons, nothing.
Interestingly, I also received one other off-list response to my email
from someone else who is experiencing the exact same problem. Despite
*hours* of Googling, he is the only other person I've managed to come
across with this same issue, and here's the kicker... he's on a VPS
with 123Systems as well. So there's the commonality. I'm not one to
believe in coincidences, so now I'm pretty much convinced that there
must be something that 123Systems is doing which is causing this.
Either they have some sort of monitoring running on the host which is
somehow sending a SIGTERM to qmgr within the guest, or they have done
something to their default CentOS image which is causing it (although
for the life of me I can't imagine what, since even if I replace the
Postfix config files with the config from my other, working VPS, I
still get this same behavior.)
I have opened a ticket with 123Systems to see if they can shed any
light on this. I'll post a follow-up here when I have anything new to
report.
Thanks.
- Jeff