On Wed, Nov 16, 2016 at 11:15:35PM +0100, Walter Doekes wrote:

> this week we stumbled upon an issue where we could not send mail to certain
> domains, for instance em...@umcg.nl.
> 
> Nov 16 17:04:08 mail postfix/smtp[13330]: warning:
>     no MX host for umcg.nl has a valid address record
> Nov 16 17:04:08 mail postfix/smtp[13330]: 1D1D21422C2:
>     to=<em...@umcg.nl>, relay=none, delay=2257,
>     delays=2256/0.02/0.52/0, dsn=4.4.3, status=deferred
>     (Host or domain name not found. Name service error
>     for name=umcg-nl.mail.protection.outlook.com type=A:
>     Host not found, try again)
> 
> It turned out that this was the cause:
> 
>   $ dig MX umcg.nl +short
>   10 umcg-nl.mail.protection.outlook.com.
> 
>   $ dig NS mail.protection.outlook.com. +short
>   ns1-proddns.glbdns.o365filtering.com.
>   ns2-proddns.glbdns.o365filtering.com.
> 
>   $ dig A umcg-nl.mail.protection.outlook.com.  \
>       @ns1-proddns.glbdns.o365filtering.com. +edns +dnssec |
>     grep FORMERR
>   ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 46904
>   ;; WARNING: EDNS query returned status FORMERR -
>       retry with '+nodnssec +noedns'

I can't reproduce your observations using unbound as the local
resolver:


    $ dig +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl \
        -t a umcg-nl.mail.protection.outlook.com

    ; <<>> DiG 9.10.4-P2 <<>> +dnssec +ad +noall +comment +cmd +qu +ans +auth +nocl +nottl -t a umcg-nl.mail.protection.outlook.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10562
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags: do; udp: 4096
    ;; QUESTION SECTION:
    ;umcg-nl.mail.protection.outlook.com. IN        A

    ;; ANSWER SECTION:
    umcg-nl.mail.protection.outlook.com. A 213.199.154.23
    umcg-nl.mail.protection.outlook.com. A 213.199.154.87

Postfix will not directly query the remote nameserver, and indeed
with DANE you're supposed to be configured to *only* query the
local resolver.  What resolver is that?  And how is it configured?

Once the A records come back insecure (AD=0), Postfix will not
query for TLSA records.
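
A rough way to check this from the shell, as a sketch only: the
helper name has_ad_flag is made up for illustration and is not part
of Postfix or dig.  It reads dig output on stdin and succeeds only
if the header flags line carries "ad".

```shell
# Succeed (exit 0) if the dig header on stdin has the AD flag set.
# Matches lines such as ";; flags: qr rd ra ad; QUERY: 1, ..."
# has_ad_flag is a hypothetical helper name, not a dig feature.
has_ad_flag() {
    grep -Eq '^;; flags:[^;]* ad( |;)'
}

# Example use against the local resolver:
#   dig +dnssec +ad -t a umcg-nl.mail.protection.outlook.com | has_ad_flag \
#       && echo "secure (AD=1)" || echo "insecure (AD=0)"
```

For the output shown above (flags: qr rd ra) the check fails, that
is, the answer is insecure.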

> Apparently some Microsoft Office 365 mail servers do not support EDNS and
> return FORMERR. This propagated through our DNS recursors as SERVFAIL and
> caused the lookup to fail.

FORMERR is the expected/standard response in this case, and your
resolver is expected to fall back to non-EDNS queries.
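
The fallback expected of the resolver can be imitated by hand with
dig.  This is only a sketch of the idea, not how any particular
resolver implements it; the function names rcode_of and
query_with_fallback are invented for this example, and the name and
server are whatever the caller passes in.

```shell
# Print the RCODE (NOERROR, FORMERR, SERVFAIL, ...) found in the
# header of dig output read from stdin.
rcode_of() {
    sed -n 's/.*->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query with EDNS and the DO bit first; if the server answers
# FORMERR, retry once without EDNS.  $1 = name, $2 = server.
query_with_fallback() {
    out=$(dig +edns +dnssec -t a "$1" "@$2")
    if [ "$(printf '%s\n' "$out" | rcode_of)" = "FORMERR" ]; then
        out=$(dig +noedns +nodnssec -t a "$1" "@$2")
    fi
    printf '%s\n' "$out"
}

# Example use:
#   query_with_fallback umcg-nl.mail.protection.outlook.com \
#       ns1-proddns.glbdns.o365filtering.com | rcode_of
```

A resolver that turns the first FORMERR into SERVFAIL for its
clients, instead of retrying as above, is the component at fault.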

> Some more digging revealed that EDNS was enabled on the query through
> `smtp_addr_list`:
> 
>      else if (smtp_tls_insecure_mx_policy > TLS_LEV_MAY)
>         res_opt = RES_USE_DNSSEC;

That setting affects communication between Postfix and the local
resolver; it does not control the options on the next hop query.

> The USE_DNSSEC causes the subsequent queries to use USE_EDNS0 with the DO
> flag and that killed our interoperability with the Microsoft Office 365 DNS.

This analysis is flawed.  Your resolver is not supposed to
unconditionally use EDNS upstream just because the local client is
using EDNS.

> - Apart from Microsoft upgrading their servers to 2016 and supporting EDNS,
> is this issue something postfix should handle?

The problem is your resolver.

> - Would postfix have handled FORMERR but not SERVFAIL and are my caching
> resolvers to blame?

The latter.

> - Should postfix retry the query without EDNS on unexpected errors?

No.

-- 
        Viktor.
