Hey again,

On 10/29/2010 07:23 PM, Wietse Venema wrote:
> The main loop in the master is as follows:
>
> forever {
>     set an alarm for 1000s
>     do an EPOLL_WAIT for up to 500s and handle any child process
>     events, or short-term timer requests that are implemented
>     around the EPOLL_WAIT timer.
>     respond to sighup (the sighup flag is set by a signal handler)
>     respond to sigchld (the sigchld flag is set by a signal handler)
> }
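
As a rough illustration of the loop quoted above, here is a minimal Python sketch. It is not Postfix's actual C code: the names `pending`, `on_sighup`, `on_sigchld` and `run_loop` are made up for this example, the event handling is a placeholder, and Python's `selectors` module (epoll-backed on Linux) stands in for the direct EPOLL_WAIT call. The timeouts are parameters so the loop can be tried with short values.

```python
import selectors
import signal
import time

# Flags set by the signal handlers and checked by the loop (illustrative names).
pending = {"sighup": False, "sigchld": False}


def on_sighup(signum, frame):
    pending["sighup"] = True


def on_sigchld(signum, frame):
    pending["sigchld"] = True


def run_loop(iterations, wait_timeout=500.0, alarm_timeout=1000):
    """Run the loop `iterations` times and return how long each wait took."""
    signal.signal(signal.SIGHUP, on_sighup)
    signal.signal(signal.SIGCHLD, on_sigchld)
    selector = selectors.DefaultSelector()  # uses epoll on Linux
    waits = []
    try:
        for _ in range(iterations):
            # Watchdog alarm. No SIGALRM handler is installed in this sketch,
            # so the default action would terminate the process if one pass
            # of the loop ever took longer than alarm_timeout seconds.
            signal.alarm(alarm_timeout)

            # Timed wait for events; with nothing registered it simply
            # sleeps until the timeout expires.
            start = time.monotonic()
            events = selector.select(timeout=wait_timeout)
            waits.append(time.monotonic() - start)

            for key, mask in events:
                pass  # placeholder: a real server would dispatch events here

            if pending["sighup"]:
                pending["sighup"] = False   # placeholder for the sighup response
            if pending["sigchld"]:
                pending["sigchld"] = False  # placeholder for the sigchld response
    finally:
        signal.alarm(0)  # cancel the pending watchdog alarm
        selector.close()
    return waits
```

For example, `run_loop(2, wait_timeout=0.05, alarm_timeout=5)` should return two durations, each close to the requested timeout. A wait that never returns would correspond to the case where strace shows no activity.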

Just now one machine had the issue again. I checked and saw that we
were down to just two smtpd processes, and even though master was still
bound to port 25, no new connections were accepted. I tried to telnet
to it, but the connection was not accepted and timed out.

How does the timer issue relate to the master process no longer
accepting TCP/IP connections on port 25?

> It would be worthwhile to see what strace reports when you leave
> it running. If strace reports nothing in 500s then EPOLL_WAIT is
> not working. If strace reports nothing after 1000s then the alarm
> timer is also not working.

I'll try to gather some strace data for you. I guess the strace should
be of the master process? Could you give me a hint on which options you
would like me to use?

On 10/29/2010 07:04 PM, Wietse Venema wrote:
> VMware has an entire KB article on problems with delivering timer
> interrupts to guest machines, and the hoops that they are jumping
> through to avoid poor performance. See
> http://tech.groups.yahoo.com/group/postfix-users/message/269786

Thanks for the hint, I already printed that article to read over the
weekend.

Thanks for your help,
Christian