--On Tuesday, February 24, 2009 9:26 AM -0500 Wietse Venema
<wie...@porcupine.org> wrote:
Further investigation traced this to DNS resolution failing after a while.
I don't know why, but it does seem to be a problem with OS X after a
catastrophic failure.
Since I don't maintain copies of every Postfix-enabled platform (*)
I will rely on you to provide accurate observations.
Wietse
(*) I have a couple of representative platforms running in VMware, but
that is only for testing my own Postfix distribution.
At this point I'm convinced it is an OSX 10.5 bug and not a postfix bug,
but hopefully this can help others if they ever run into it. I don't have
a solution yet. Here are more gory details. Two clients have hit this in
different circumstances, but in both cases OSX had been forced to go down
uncleanly.
For Client A, it started after a power outage. For Client B, it happened
after a hard disk failure; I don't know how Client B recovered the failed
disk. In both cases, once postfix has been running for a while after the
failure, it starts complaining that it can't do startTLS
operations to LDAP. In addition, mail files start showing up in
/var/spool/postfix/maildrop. Further investigation revealed that these
mail files are being generated by sudo. The same sudo command never
generated them prior to the crashes of these servers.
I was finally able to get access to client B's server while the startTLS
failures were occurring. At that point I raised debuglevel to 7 in the
LDAP map file postfix was using. This resulted in the following being
logged:
ldap_connect_to_host: getaddrinfo failed: Temporary failure in name resolution
I then disabled startTLS and verified that connections still failed with
the same issue. I.e., startTLS was never the problem (which is good. :P ).
Further examination of the system logs showed that other processes were
also having problems resolving the host via DNS:
auth failed: curl_easy_perform: error(6): Couldn't resolve host 'domain.com'
In both cases, the host in question is the local system, which has the
correct entries in /etc/hosts, and the nslookup, dig, and host commands
all worked fine for me as several different users.
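One caveat on that check: dig, host and nslookup send their queries straight to the DNS servers with their own resolver code and do not read /etc/hosts, whereas the failing LDAP lookup went through getaddrinfo(), as the log line above shows. A probe that calls getaddrinfo() itself, left running on the affected server alongside postfix, might show exactly when the system resolver starts failing. The following is a minimal C sketch of that idea under those assumptions, not a tested diagnostic; the program and its arguments (host, seconds between probes, number of probes) are hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Resolve `host` once through the system resolver, i.e. the same
 * getaddrinfo() call named in the ldap_connect_to_host log line.
 * Returns 0 on success, or the EAI_* code from getaddrinfo(). */
static int probe_host(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 answers */
    hints.ai_socktype = SOCK_STREAM; /* one result per address, TCP */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

int main(int argc, char **argv)
{
    const char *host = argc > 1 ? argv[1] : "localhost";
    int interval = argc > 2 ? atoi(argv[2]) : 60; /* seconds between probes */
    int count = argc > 3 ? atoi(argv[3]) : 1;     /* 0 = run until killed */

    if (interval < 1)
        interval = 1;

    for (int i = 0; count == 0 || i < count; i++) {
        int rc = probe_host(host);
        time_t now = time(NULL);
        char stamp[32] = "";

        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", localtime(&now));
        if (rc == 0)
            printf("%s %s: ok\n", stamp, host);
        else
            printf("%s %s: getaddrinfo failed: %s\n", stamp, host,
                   gai_strerror(rc));
        fflush(stdout); /* keep the log current when redirected to a file */

        if (count == 0 || i + 1 < count)
            sleep((unsigned)interval);
    }
    return 0;
}
```

Compiled as, say, gai_probe, running `./gai_probe domain.com 300 0` would probe every five minutes until interrupted, printing a timestamp with either "ok" or the gai_strerror() text for each attempt, which could then be lined up against the postfix restarts.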
The files being generated by sudo show that it is failing to find users
that don't exist in /etc/passwd (which, on OSX, is every user except the
system accounts created by Apple):
T1235430448 195461Arewrite_context=localFSystem AdministratorSrootMTo:
rootN
From: 502N:Subject: *** SECURITY information for domain.com
***NNdomain.com : Feb 23 23:07:28 : 2 : uid 502 does not exist in the
passwd file! ; TTY=unknown ; PWD=unknown ; USER=root ;
COMMAND=/opt/zimbra/libexec/zmmtastatusNXRrootE
Apparently this has bitten other people:
<http://discussions.apple.com/thread.jspa?threadID=1527762&tstart=421>
If we ever get a solution from Apple, I will update further.
Interestingly, stopping and restarting postfix resolves the issue for a
few hours; then it recurs until postfix is restarted again.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration