On 05.05.2011 13:19, Wietse Venema wrote:
> Does this help?
>
>     Wietse
>
> smtp_dns_resolver_options (default: empty)
>     DNS Resolver options for the Postfix SMTP client. Specify zero or more
>     of the following options, separated by comma or whitespace. Option
>     names are case-sensitive. Some options refer to domain names that are
>     specified in the file /etc/resolv.conf or equivalent.
>
>     res_defnames
>         Append the current domain name to single-component names (those
>         that do not contain a "." character). This can produce incorrect
>         results, and is the hard-coded behavior prior to Postfix 2.8.
>
>     res_dnsrch
>         Search for host names in the current domain and in parent
>         domains. This can produce incorrect results and is therefore not
>         recommended.

Hi Wietse,

res_defnames helps, but defeats the purpose.

Executive summary:

We are facing a massive bug in the GNU glibc 2.11 and eglibc 2.13
resolvers, which fail to even attempt a query for a name without dots
if RES_DEFNAMES is unset. FreeBSD 8.2, DragonflyBSD 2.10 and Solaris 10
are unaffected. No DNS configuration whatsoever can sidestep the
problem.

Bug report filed as
<http://sourceware.org/bugzilla/show_bug.cgi?id=12734>.

Proposals:

Recognizing that the libc resolver fix (if ever made) is unlikely to
propagate everywhere needed before hell freezes over,

- Postfix could retry a failed query of a bare hostname (one without
  any dots) with a dot appended, i.e. if dns_query(... "barehost" ...)
  failed, pretend it had been dns_query(... "barehost." ...) and retry.

Alternatively,

- Postfix hostname validation could permit host names with a trailing
  dot (for instance, in sender_dependent_relayhost_maps), to the extent
  that I can specify [localhost.]:12345. The libc resolvers get this
  right on all my systems, so this would allow me to manually work
  around the resolver bug. Currently this doesn't work (I tried it
  yesterday): postfix/smtp commits suicide after logging

  "fatal: valid hostname or network address required in server
  description: [localhost.]:1777"

Note: Backporting the latter to earlier Postfix releases that do not
strip RES_DEFNAMES might be beneficial, as the trailing dot "anchors"
the domain name search in the root domain, so I can manually prevent
localhost.foo.example from being resolved in lieu of localhost.

Here's the analysis:

In GNU libc, if _res.options does NOT contain the RES_DEFNAMES bit and
a name without dots is being resolved, no query is ever sent, and
res_search returns its initialized values, i.e. -1, with
h_errno == HOST_NOT_FOUND. Ouch!

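To make the first proposal (retry a bare hostname with a dot appended) more concrete, here is a minimal sketch. It is not Postfix code: lookup_fn, bare_hostname(), search_with_dot_retry() and fake_buggy_lookup() are hypothetical names I made up for illustration, and the fake lookup merely imitates the failure mode described above (bare name fails, rooted name succeeds) so the logic can be exercised without DNS traffic.

```c
/* Sketch only: retry a failed bare-hostname lookup with a trailing dot.
 * All names below are hypothetical, not Postfix or libc APIs. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A res_search()-like call reduced to its name argument:
 * returns the answer length, or -1 on failure. */
typedef int (*lookup_fn)(const char *name);

/* A name is "bare" if it is non-empty and contains no dot at all. */
static int bare_hostname(const char *name)
{
    assert(name != NULL);
    return name[0] != '\0' && strchr(name, '.') == NULL;
}

/* Try the name as given; if that fails and the name is bare, retry
 * once with "." appended so the resolver treats it as rooted. */
static int search_with_dot_retry(lookup_fn lookup, const char *name)
{
    char rooted[256];   /* assumption: room for hostname + "." + NUL */
    int res = lookup(name);

    if (res >= 0 || !bare_hostname(name))
        return res;
    if (strlen(name) + 2 > sizeof rooted)
        return res;     /* too long to retry; keep the original failure */
    snprintf(rooted, sizeof rooted, "%s.", name);
    return lookup(rooted);
}

/* Stand-in for the buggy resolver with RES_DEFNAMES stripped: names
 * without a dot fail without a query; 43 is a placeholder length. */
static int fake_buggy_lookup(const char *name)
{
    return strchr(name, '.') == NULL ? -1 : 43;
}
```

In a real program the lookup callback would presumably wrap res_search(name, C_IN, T_A, buf, sizeof buf); I kept it as a function pointer only so that the retry logic can be checked in isolation.
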
This bug is verifiable with the source code. For GNU libc, see
<http://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/res_query.c;h=5ff352e2fc6056bad92238df1fb0c826f48a2f51;hb=HEAD#l323>

For FreeBSD libc, see lines 371ff. in
<http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/resolv/res_query.c?annotate=1.6;only_with_tag=MAIN>.

The code under "If the query has not already been tried..." is missing
from GNU libc, and I've confirmed with GDB that this code is actually
what saves the FreeBSD resolver's neck.

FreeBSD releases after 2006 don't have that bug (I haven't checked
older versions), and neither do Solaris 10 or DragonflyBSD 2.10 (the
only versions I've tried). I haven't bothered to check MacOS or NetBSD.

Now, what can we do? Practically, I doubt we'll get all the resolvers
fixed before hell freezes over. Defaulting to res_defnames (even if
only on Linux) is clearly wrong too.

Test program trace on FreeBSD (OK):

$ truss ./try-resolv localhost 2>&1 | egrep '^[^()]+$|send|recv'
sendto(4,"\b\t\^A\0\0\^A\0\0\0\0\0\0\tloca"...,27,0x0,NULL,0x0) = 27 (0x1b)
recvfrom(4,"\b\t\M^E\M^@\0\^A\0\^A\0\0\0\0\t"...,512,0x0,{ AF_INET 192.168.0.14:53 },0x7fffffffd38c) = 43 (0x2b)
default _res.options = 800002C1
strip flags= 282
stripped _res.options = 80000041
res search result: 43
process exit, rval = 0

Test program traces on Linux:

$ grep DEFNAMES /usr/include/resolv.h
#define RES_DEFNAMES 0x00000080 /* use default domain name */
#define RES_DEFAULT (RES_RECURSE|RES_DEFNAMES|RES_DNSRCH|RES_NOIP6DOTINT)

$ strace -e send,recv ./try-resolv localhost
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
res search result: -1, h_errno: 1 (Unknown host)

This is bad. As you can see, there is no DNS traffic in send/recv in
this case.

Now anchor the search as a FQHN (again, on Linux):

$ strace -e send,recv ./try-resolv localhost.
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
send(3, "\327\327\1\0\0\1\0\0\0\0\0\0\tlocalhost\0\0\1\0\1", 27, MSG_NOSIGNAL) = 27
res search result: 43

We get a result, as expected from the resolver source code.

Now, leave RES_DEFNAMES set (again, Linux):

$ strace -e send,recv ./try-resolv localhost 0x800c1
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
forced _res.options = 800C1
send(3, "\324g\1\0\0\1\0\0\0\0\0\0\tlocalhost\7example\3"..., 43, MSG_NOSIGNAL) = 43
res search result: 57

We see that we get a different result: the one your Postfix code change
was trying to avoid.

To compile the source below, use either of these:

gcc -O -o try-resolv try-resolv.c -lresolv   # on Linux and Solaris
gcc -O -o try-resolv try-resolv.c            # on FreeBSD/DragonflyBSD

/* try-resolv.c - a program to demonstrate a GNU libc resolver bug
   triggered by stripping RES_DEFNAMES from _res.options. */

/* (C) 2011 Matthias Andree, MIT license,
   see http://opensource.org/licenses/mit-license */

#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <resolv.h>
#include <netdb.h>

unsigned char buf[512];

void barf(const char *s)
{
    fputs(s, stderr);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv)
{
    int res = res_init();
    long opts;
    /* the flags that get stripped from the resolver defaults */
    long pfix_flags = RES_DEBUG | RES_DNSRCH | RES_DEFNAMES;

    if (argc > 1 && 0 == strcmp(argv[1], "-h")) {
        printf("Usage: %s [<hostname> [<options-in-hex>]]\n", argv[0]);
        exit(EXIT_SUCCESS);
    }

    if (res)
        barf("res_init() failed.");

    printf("default _res.options = %lX\n", _res.options);
    printf("strip flags= %lX\n", pfix_flags);
    _res.options &= ~pfix_flags; /* strip flags */
    printf("stripped _res.options = %lX\n", _res.options);

    /* optional second argument overrides _res.options entirely */
    if (argc > 2 && sscanf(argv[2], "%li", &opts) == 1) {
        _res.options = opts;
        printf("forced _res.options = %lX\n", _res.options);
    }

    res = res_search(argc > 1 ?
                     argv[1] : "localhost", C_IN, T_A, buf, sizeof buf);
    printf("res search result: %d", res);
    if (res == -1)
        printf(", h_errno: %d (%s)", h_errno, hstrerror(h_errno));
    printf("\n");
    exit(EXIT_SUCCESS);
}
/* end of try-resolv.c */

Hope that clears up the recent issue. I suspect that many older posts
(esp. before 2006) were related to domains where localhost.example.org
wasn't defined.

Best regards,
Matthias