--On Tuesday, February 24, 2009 9:26 AM -0500 Wietse Venema <wie...@porcupine.org> wrote:

Further investigation tracks this down to something failing with DNS
resolution after a while.  Don't know why, but it does seem to be a
problem with OS X and catastrophic failure.

Since I don't maintain copies of every Postfix-enabled platform (*)
I will rely on you to provide accurate observations.

        Wietse

(*) I have a couple of representative platforms running in VMware, but
    that is only for testing my own Postfix distribution.

I'm definitely convinced it is an OSX 10.5 bug and not a postfix bug at this point, but hopefully this can help others if they ever run into it. I don't have a solution yet. Here are more gory details. Two clients have had this occur in different circumstances, but in both cases OSX was forced to go down uncleanly.

For Client A, it started after they had a power outage. For Client B, it happened after they had a HD failure. I don't know how Client B recovered the failed HD. In both cases, once postfix has been running for a while after the failure, it starts complaining that it can't do startTLS operations to LDAP. In addition, mail files start showing up in /var/spool/postfix/maildrop. Further investigation revealed that these mail files are being generated by sudo. The same sudo command never generated them prior to the crashes of these servers.

I was finally able to get access to client B's server while the startTLS failures were occurring. At that point I raised the debuglevel to 7 in the LDAP map file postfix was attempting to use. This resulted in the following being logged:

ldap_connect_to_host: getaddrinfo failed: Temporary failure in name resolution

I then disabled startTLS and verified that connections still failed with the same issue. I.e., startTLS was never the problem (which is good. :P ).

Further examination of the system logs showed that other processes were also having problems resolving the host via DNS:

auth failed: curl_easy_perform: error(6): Couldn't resolve host 'domain.com'

In both cases, the host in question is the local system, which has its correct entries in /etc/hosts, and the nslookup, dig, and host commands all worked fine when I ran them as several different users.
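
One caveat on that last test: nslookup, dig, and host send their queries straight to the DNS servers, while the LDAP client library and curl go through getaddrinfo() and the system resolver, so the command-line tools can succeed while getaddrinfo() fails. A minimal Python sketch of a check that exercises the getaddrinfo() path instead is below. The helper name check_resolution and the host names are placeholders of mine, not anything from the affected servers, and whether this reproduces the failure outside the long-running postfix processes is an assumption I have not verified.

```python
import socket


def check_resolution(host, port=389):
    """Resolve host through getaddrinfo(), the call that failed in the LDAP log.

    Returns (True, "addr1, addr2, ...") on success or
    (False, "getaddrinfo failed: ...") on a resolver error.
    The name check_resolution is hypothetical; port 389 is the standard LDAP port.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, "getaddrinfo failed: %s" % exc
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the server's own host name.
    ok, detail = check_resolution("localhost")
    print("OK" if ok else "FAIL", detail)
```

Running this from cron every few minutes and logging the result would at least show whether freshly started processes see the same failure as the long-running ones.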

The files being generated by sudo show that it is failing to find users that don't exist in /etc/passwd (which, on OSX, is all users except the ones created by Apple for system use):

T1235430448 195461Arewrite_context=localFSystem AdministratorSrootMTo: rootN
From: 502N:Subject: *** SECURITY information for domain.com
***NN�domain.com : Feb 23 23:07:28 : 2 : uid 502 does not exist in the
passwd file! ; TTY=unknown ; PWD=unknown ; USER=root ;
COMMAND=/opt/zimbra/libexec/zmmtastatusNXRrootE
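
The sudo message suggests that uid-to-account lookups were failing along with host lookups. I am assuming sudo's check amounts to a getpwuid()-style lookup; I have not confirmed that against the sudo source. A minimal Python sketch for testing whether a given uid resolves is below. The helper name lookup_uid is hypothetical, and uid 502 is just the one from the message above.

```python
import pwd


def lookup_uid(uid):
    """Return the account name for uid, or None if no entry is found.

    pwd.getpwuid() raises KeyError when the lookup returns no entry,
    which is the condition sudo appears to be reporting (assumption).
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


if __name__ == "__main__":
    # uid 502 is the one named in the sudo report above.
    print(lookup_uid(502))
```

If this returns None for an account that normally exists, the problem is in the lookup path and not in sudo itself.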


Apparently this has bitten other people:

<http://discussions.apple.com/thread.jspa?threadID=1527762&tstart=421>

If we ever get a solution from Apple, I will update further.


It is interesting to note that stopping/restarting postfix resolves the issue for a few hours. Then the failures return and persist until the next restart.

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration
