On 05.05.2011 at 13:19, Wietse Venema wrote:
> Does this help?
> 
>       Wietse
> 
> smtp_dns_resolver_options (default: empty)
>        DNS Resolver options for the Postfix SMTP client.  Specify zero or more
>        of the following options, separated by  comma  or  whitespace.   Option
>        names  are  case-sensitive. Some options refer to domain names that are
>        specified in the file /etc/resolv.conf or equivalent.
> 
>        res_defnames
>               Append the current domain name to single-component names  (those
>               that do not contain a "." character). This can produce incorrect
>               results, and is the hard-coded behavior prior to Postfix 2.8.
> 
>        res_dnsrch
>               Search for host names  in  the  current  domain  and  in  parent
>               domains. This can produce incorrect results and is therefore not
>               recommended.

Hi Wietse,

res_defnames helps, but defeats the purpose.


Executive summary:

We are facing a massive bug in the GNU glibc 2.11 and eglibc 2.13
resolvers, which fail to even attempt a query for a name without dots if
RES_DEFNAMES is unset.  FreeBSD 8.2, DragonflyBSD 2.10 and Solaris 10
are unaffected.

No DNS configuration whatsoever can sidestep the problem.

Bug report filed as <http://sourceware.org/bugzilla/show_bug.cgi?id=12734>.


Proposals:

Recognizing that the libc resolver fix (if ever made) is unlikely to
propagate everywhere needed before hell will have frozen over,

- Postfix could retry a failed query of a bare hostname (one without any
dots) with a dot appended, i. e. if dns_query(... "barehost" ...)
failed, pretend it had been dns_query(... "barehost." ...) and retry.

Alternatively,

- Postfix hostname validation could permit host names with a trailing
dot (for instance, in sender_dependent_relayhost_maps), so that I can
specify [localhost.]:12345.  The libc resolvers get the trailing-dot
form right on all my systems, so this would allow me to manually work
around the resolver bug.

  Currently, this doesn't work (I tried it yesterday): postfix/smtp
aborts after logging "fatal: valid hostname or network address required
in server description: [localhost.]:1777"
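
A minimal sketch of the first proposal (retry a bare host name with a
dot appended), not actual Postfix code.  The callback type lookup_fn and
the names lookup_with_root_retry() and demo_lookup() are made up for
illustration; demo_lookup() merely mimics the behaviour seen in the
Linux traces further down (failure for names without dots) so that the
retry logic can be exercised without a resolver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup callback: returns the answer length (> 0) on
   success or -1 on failure, the way res_search() does. */
typedef int (*lookup_fn)(const char *name);

/* Look up name; if that fails and name contains no dot at all, retry
   once with a trailing dot, which anchors the name in the root domain. */
int lookup_with_root_retry(lookup_fn lookup, const char *name)
{
    char anchored[256];
    int ret;

    assert(lookup != NULL && name != NULL);
    ret = lookup(name);
    if (ret >= 0 || strchr(name, '.') != NULL)
        return ret;                 /* success, or not a bare host name */
    if (*name == '\0' || strlen(name) + 2 > sizeof anchored)
        return ret;                 /* nothing to anchor, or too long */
    snprintf(anchored, sizeof anchored, "%s.", name);
    return lookup(anchored);        /* second and last attempt */
}

/* Stand-in for a resolver with the bug described in this message:
   names without any dot fail, everything else "resolves". */
int demo_lookup(const char *name)
{
    return strchr(name, '.') != NULL ? 43 : -1;
}
```

In real use the callback would wrap something like
res_search(name, C_IN, T_A, buf, sizeof buf), as in the test program
at the end of this message.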


Note: Backporting the second proposal to earlier Postfix releases that
do not strip RES_DEFNAMES might be beneficial, as the trailing dot
"anchors" the domain name search in the root domain, so I can manually
prevent localhost.foo.example from being resolved in lieu of localhost.
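
A minimal sketch of what trailing-dot-tolerant validation could look
like.  The function hostname_ok_with_root_dot() is hypothetical and is
NOT Postfix's hostname validator; it only checks label lengths and a
letters/digits/hyphen alphabet, and ignores finer rules such as hyphen
placement.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accept labels of 1..63 letters, digits or
   hyphens, separated by single dots, and tolerate exactly one
   trailing dot (the root anchor).  Returns 1 if valid, 0 otherwise. */
int hostname_ok_with_root_dot(const char *name)
{
    size_t len, i, label_len = 0;

    assert(name != NULL);
    len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        len--;                      /* ignore one trailing root dot */
    if (len == 0 || len > 253)
        return 0;                   /* empty, just ".", or too long */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) name[i];

        if (c == '.') {
            if (label_len == 0)
                return 0;           /* empty label, e.g. "a..b" */
            label_len = 0;
        } else if (isalnum(c) || c == '-') {
            if (++label_len > 63)
                return 0;           /* label too long */
        } else {
            return 0;               /* character not allowed */
        }
    }
    return label_len > 0;           /* reject a second trailing dot */
}
```

With a check along these lines, "localhost." and "localhost" would both
pass, while "localhost.." and "." would still be rejected.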


Here's the analysis:

In GNU libc, if _res.options does NOT contain the RES_DEFNAMES bit and
the name being resolved has no dots, no query is ever sent, and
res_search returns its initialized value, i. e. -1, with h_errno ==
HOST_NOT_FOUND.  Ouch!

This bug is verifiable in the source code; for GNU libc,
see
<http://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/res_query.c;h=5ff352e2fc6056bad92238df1fb0c826f48a2f51;hb=HEAD#l323>

For FreeBSD libc, see lines 371ff. in
<http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/resolv/res_query.c?annotate=1.6;only_with_tag=MAIN>.
 The code under the comment "If the query has not already been tried..."
is missing from GNU libc, and I've confirmed with GDB that this code is
actually what saves the FreeBSD resolver's neck.
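
The difference can be illustrated with a deliberately simplified model
of the decision just described.  It is based only on the description in
this message, not copied from either library; the function
model_sends_query() and the constant MODEL_DEFNAMES are made up for the
sketch, and it assumes the default "ndots" threshold of 1 and a
configured default domain.

```c
#include <assert.h>
#include <string.h>

/* Same bit value as RES_DEFNAMES in the grep output further down,
   redefined so the sketch does not depend on <resolv.h>. */
#define MODEL_DEFNAMES 0x080UL

/* Returns 1 if the modelled search logic would send at least one
   query for name, 0 if it would give up without sending anything.
   final_as_is says whether the final "try the name as-is if that has
   not been tried yet" step exists (1 = FreeBSD-like, 0 = GNU-like). */
int model_sends_query(const char *name, unsigned long options,
                      int final_as_is)
{
    const char *cp;
    int dots = 0;

    assert(name != NULL);
    for (cp = name; *cp != '\0'; cp++)
        dots += (*cp == '.');

    if (dots >= 1)
        return 1;       /* enough dots: the name is queried as-is */
    if (options & MODEL_DEFNAMES)
        return 1;       /* bare name: default domain gets appended */
    return final_as_is; /* only the fallback can still send a query */
}
```

In this model, "localhost" with RES_DEFNAMES stripped sends nothing
unless the final fallback exists, which matches the traces below.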

FreeBSD releases after 2006 don't have that bug (I haven't checked older
versions), and neither do Solaris 10 or DragonflyBSD 2.10 (the only
versions I've tried). I haven't bothered to check MacOS or NetBSD.

Now, what can we do? Practically, I doubt we'll get all the resolvers
fixed before hell freezes over.

Defaulting to res_defnames (even if only on Linux) is clearly wrong too.


Test program trace on FreeBSD (OK)

$ truss ./try-resolv localhost  2>&1 | egrep '^[^()]+$|send|recv'
sendto(4,"\b\t\^A\0\0\^A\0\0\0\0\0\0\tloca"...,27,0x0,NULL,0x0) = 27 (0x1b)
recvfrom(4,"\b\t\M^E\M^@\0\^A\0\^A\0\0\0\0\t"...,512,0x0,{ AF_INET
192.168.0.14:53 },0x7fffffffd38c) = 43 (0x2b)
default _res.options = 800002C1
strip flags= 282
stripped _res.options = 80000041
res search result: 43
process exit, rval = 0


Test program traces on Linux:

$ grep DEFNAMES /usr/include/resolv.h
#define RES_DEFNAMES    0x00000080      /* use default domain name */
#define RES_DEFAULT     (RES_RECURSE|RES_DEFNAMES|RES_DNSRCH|RES_NOIP6DOTINT)

$ strace -e send,recv ./try-resolv localhost
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
res search result: -1, h_errno: 1 (Unknown host)

This is bad.  As you can see, no DNS traffic in send/recv in this case.
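
The option words printed in the Linux traces can be cross-checked with a
few lines of C.  This is only a sanity check of the arithmetic; the
TRACE_ constants repeat the RES_DEBUG, RES_DEFNAMES and RES_DNSRCH bit
values under made-up names so that the snippet compiles without the
resolver headers, and strip_options() and check_linux_trace() are
likewise illustrative names.

```c
#include <assert.h>

#define TRACE_RES_DEBUG    0x00000002UL
#define TRACE_RES_DEFNAMES 0x00000080UL
#define TRACE_RES_DNSRCH   0x00000200UL

/* Clear the given flag bits, as the test program does. */
unsigned long strip_options(unsigned long options, unsigned long flags)
{
    return options & ~flags;
}

/* Re-derive the values shown in the Linux traces. */
void check_linux_trace(void)
{
    unsigned long flags =
        TRACE_RES_DEBUG | TRACE_RES_DEFNAMES | TRACE_RES_DNSRCH;

    assert(flags == 0x282UL);                       /* "strip flags" */
    assert(strip_options(0x802C1UL, flags) == 0x80041UL);
    /* Putting RES_DEFNAMES back gives the forced value 0x800c1. */
    assert((0x80041UL | TRACE_RES_DEFNAMES) == 0x800C1UL);
}
```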

Now anchor the search as a FQHN (again, on Linux):

$ strace -e send,recv ./try-resolv localhost.
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
send(3, "\327\327\1\0\0\1\0\0\0\0\0\0\tlocalhost\0\0\1\0\1", 27,
MSG_NOSIGNAL) = 27
res search result: 43

We get a result, as expected from the resolver source code.

Now, leave RES_DEFNAMES set (again, Linux):

$ strace -e send,recv ./try-resolv localhost 0x800c1
default _res.options = 802C1
strip flags= 282
stripped _res.options = 80041
forced _res.options = 800C1
send(3, "\324g\1\0\0\1\0\0\0\0\0\0\tlocalhost\7example\3"..., 43,
MSG_NOSIGNAL) = 43
res search result: 57

We see that we get a different result - the one your Postfix code change
was trying to avoid.
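
To read the question name out of a send() payload like the ones above,
the DNS wire format can be decoded by hand.  The following is a minimal
sketch with a made-up function name, decode_qname(); it handles only
the first question of an uncompressed query.  The sample bytes are the
complete payload from the "localhost." trace; the payload in the last
trace is truncated by strace and is therefore not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode the first question name of a DNS query into dotted text.
   Returns 0 on success, -1 on malformed or oversized input.  Name
   compression is not handled. */
int decode_qname(const unsigned char *msg, size_t len,
                 char *out, size_t outlen)
{
    size_t pos = 12;                /* the fixed DNS header is 12 bytes */
    size_t used = 0;

    assert(msg != NULL && out != NULL);
    while (pos < len) {
        size_t label = msg[pos++];  /* length byte of the next label */

        if (label == 0) {           /* zero-length root label: done */
            if (used == 0) {        /* the name is the root itself */
                if (outlen < 2)
                    return -1;
                out[used++] = '.';
            }
            out[used] = '\0';
            return 0;
        }
        if (label > 63 || pos + label > len || used + label + 2 > outlen)
            return -1;              /* bad length or no room left */
        memcpy(out + used, msg + pos, label);
        used += label;
        out[used++] = '.';
        pos += label;
    }
    return -1;                      /* ran off the end of the packet */
}

/* Payload of the send() call in the "localhost." trace. */
const unsigned char sample_query[] =
    "\327\327\1\0\0\1\0\0\0\0\0\0\tlocalhost\0\0\1\0\1";
```

Called as decode_qname(sample_query, sizeof sample_query - 1, out,
sizeof out), this should leave the anchored name "localhost." in out.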


To compile the source below, use either of these:

gcc -O -o try-resolv try-resolv.c -lresolv   # on Linux and Solaris
gcc -O -o try-resolv try-resolv.c            # on FreeBSD/DragonflyBSD

/* try-resolv.c - a program to demonstrate a GNU libc resolver bug
   triggered by stripping RES_DEFNAMES from _res.options. */
/* (C) 2011 Matthias Andree, MIT license,
   see http://opensource.org/licenses/mit-license */
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <resolv.h>
#include <netdb.h>

unsigned char buf[512];

void barf(const char *s)
{
    fputs(s, stderr);
    exit(EXIT_FAILURE);
}

int main(int argc, char **argv) {
    int res = res_init();
    long opts;
    /* unsigned long matches the %lX conversions used below */
    unsigned long pfix_flags = RES_DEBUG | RES_DNSRCH | RES_DEFNAMES;

    if (argc > 1 && 0 == strcmp(argv[1], "-h")) {
        printf("Usage: %s [<hostname> [<options-in-hex>]]\n", argv[0]);
        exit(EXIT_SUCCESS);
    }

    if (res) barf("res_init() failed.\n");
    printf("default _res.options = %lX\n", _res.options);
    printf("strip flags= %lX\n", pfix_flags);
    _res.options &= ~pfix_flags;    /* strip flags */
    printf("stripped _res.options = %lX\n", _res.options);
    if (argc > 2 && sscanf(argv[2], "%li", &opts) == 1) {
        _res.options = opts;
        printf("forced _res.options = %lX\n", _res.options);
    }

    res = res_search(argc > 1 ? argv[1] : "localhost", C_IN, T_A,
                     buf, sizeof buf);
    printf("res search result: %d", res);
    if (res == -1)
        printf(", h_errno: %d (%s)", h_errno, hstrerror(h_errno));
    printf("\n");

    exit(EXIT_SUCCESS);
}
/* end of try-resolv.c */

Hope that clears up the recent issue.  I suspect that many older posts
(esp. before 2006) were related to domains where localhost.example.org
wasn't defined.

Best regards,
Matthias
