Christian Rohmann:
> > Does this server run in a virtual machine?
> Yeah, the Debian Lenny (amd64) runs on VMware ESX 4.1 hosts. The guests
> itself are Vmware HW revision 7.

VMware has an entire KB article on problems with delivering timer
interrupts to guest machines, and the hoops that they are jumping
through to avoid poor performance. See
http://tech.groups.yahoo.com/group/postfix-users/message/269786

> > What is the output from "grep fatal" on today's and yesterday's maillog
> > file?
> None, not a single line.
>
> > What is the output from "grep watchdog" on all your maillog files?
> Same as above -> nothing.
>
> I guess that rules out this issue here?

No, it confirms my suspicion that either a) you run Postfix < 2.4
and do "postfix stop" or "reload" frequently, or b) your virtual
timers are broken, or c) you used "grep" on compressed files instead
of using "zgrep" or "bzgrep".

All Postfix daemons including the master have an alarm(3) timer
that aborts the process when it becomes stuck. Normally all processes
reset their alarm timer frequently; when they become stuck, they
stop resetting their alarm timer. When the timer goes off, it logs
a watchdog error and kills the process.

> On 10/29/2010 05:43 PM, lst_ho...@kwsoft.de wrote:
> > Maybe another instance of this problem?
> > http://tech.groups.yahoo.com/group/postfix-users/message/269786
>
> Even though at some point postfix stopped at EPOLL_WAIT...

That does not look like the problem with "postfix stop" or "reload"
with Postfix < 2.4 which sometimes triggers a deadlock in syslog().

So we still have the possibility that your timer support is broken
such that even the per-process alarm timer is no longer working.

Postfix relies heavily on timer support to enforce sanity.
Specifically, Postfix relies on short-term timers (implemented with
poll and epoll on Linux) to enforce time limits on read/write
operations, and relies on long-term alarm timers to kill off a
process that hangs because some short-timer failed to go off.

If both layers of safety fail due to broken (virtual) timer support,
then it is not possible to run Postfix reliably.

	Wietse
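
To illustrate the two timer layers described above, here is a minimal
Python sketch. It is not Postfix code (Postfix is written in C); the
names timed_read, run_demo, WATCHDOG_SECONDS and READ_TIMEOUT_MS, and
the timeout values, are hypothetical and chosen only for this example.
It assumes a Unix system where poll() and alarm() are available.

```python
import os
import select
import signal
import time

WATCHDOG_SECONDS = 2    # hypothetical long-term (watchdog) limit
READ_TIMEOUT_MS = 200   # hypothetical short-term read limit


class WatchdogExpired(Exception):
    """Raised by the SIGALRM handler when the watchdog fires."""


def on_alarm(signum, frame):
    # A real daemon would log a watchdog error and terminate;
    # raising an exception keeps this sketch observable.
    raise WatchdogExpired("watchdog timeout")


def timed_read(fd, timeout_ms):
    """Short-term timer: wait at most timeout_ms for fd to be readable."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        raise TimeoutError("read timed out")
    return os.read(fd, 4096)


def run_demo():
    signal.signal(signal.SIGALRM, on_alarm)
    read_fd, write_fd = os.pipe()
    events = []
    try:
        # Healthy case: the short-term timer catches the stalled read,
        # and the process gets to reset its watchdog afterwards.
        signal.alarm(WATCHDOG_SECONDS)
        try:
            timed_read(read_fd, READ_TIMEOUT_MS)  # nothing is ever written
        except TimeoutError:
            events.append("short-term timeout")
        signal.alarm(WATCHDOG_SECONDS)            # reset the watchdog

        # Stuck case: the process stops resetting the watchdog,
        # so the long-term alarm timer eventually goes off.
        signal.alarm(1)
        try:
            while True:
                time.sleep(0.1)
        except WatchdogExpired:
            events.append("watchdog fired")
    finally:
        signal.alarm(0)                           # disarm the watchdog
        os.close(read_fd)
        os.close(write_fd)
    return events


if __name__ == "__main__":
    print(run_demo())
```

If timer delivery is broken, neither the poll() timeout nor the alarm
would arrive on time, and a loop like the one above would simply hang.
That is the failure mode the message describes.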