We've got a pool of servers running Postfix. Each server runs BIND to cache DNS queries. We are running into an issue where DNS queries intermittently fail (the cause is beyond the scope of this discussion). When this happens several times in a row, Postfix starts queueing ALL mail that would go to the affected destination for exactly 5 minutes.

For example, BIND, with query logging turned on, shows several log entries like this:

Oct 19 11:53:12 hkglppfpool4 named[206415]: client @0x7f32b806b440 127.0.0.1#53827 (cluster9out.us.messagelabs.com): query failed (SERVFAIL) for cluster9out.us.messagelabs.com/IN/A at ../../../bin/named/query.c:8580

At the same time Postfix logs:

Oct 19 11:53:12 hkglppfpool4 postfix/smtp[131030]: 4MspyQ3Fm6z511Sx: to=<tengyilian1428...@126.com>, relay=none, delay=10, delays=0.14/0/10/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=cluster9out.us.messagelabs.com type=A: Host not found, try again)

When this happens, Postfix starts deferring ALL mail that should be delivered to cluster9out.us.messagelabs.com for exactly 300 seconds. The named query logs show no queries for this hostname during those 5 minutes, so Postfix is not even attempting the lookup any more. After the 5 minutes are up, new messages routing to cluster9out.us.messagelabs.com are delivered without being deferred and the queued messages begin to go out.

Testing shows that the DNS issue is very short term, lasting for 1 second or so. However, the pool of servers can handle a large number of messages in a short time period, so this combination of events amplifies a short-term DNS issue into 5 minutes of queued messages. We've seen the queues get up over 1000 messages before the 5 minutes are up.

The above is just one example. We're seeing these delivery delays for several different hosts.

The correct solution is to fix the underlying DNS issue. Until then, however, we'd like to mitigate the consequences. Are there configuration options that will

a) adjust the number of DNS failures before Postfix starts deferring the messages
b) adjust the timeout before Postfix stops queueing messages

Thanks,
Eric Wilkison