Victor, Wietse,
On 2016-12-06 11:16, wie...@porcupine.org wrote:
MRob:
For the last few days, I've been seeing a large number of failures in the
log for domains using protection.outlook.com:
to=<u...@example.com>, relay=none, delay=13190, delays=13187/0.08/2.2/0,
dsn=4.4.3, status=deferred (Host or domain name not found. Name service
error for name=example-com.mail.protection.outlook.com type=AAAA: Host
not found, try again)
Do you need IPv6 support? If not, disable it and avoid useless lookups.
No, it was only enabled because that is the Postfix default. I have
disabled it to rule it out as a contributing factor. Unfortunately, the
issue persists.
These domains do have A records, but some of them can take anywhere from
0.75 to 3 seconds to return a result from a DNS lookup (using dig).
When Postfix reports that it cannot find an AAAA record, can I assume that
every time it retries, it also looks for the A record?
If you enable both IPv4 and IPv6, then Postfix must look for both
A and AAAA records. There is no IP protocol field in MX records.
The current Postfix default is to randomize equal-preference A and
AAAA lookups, so I am surprised that the last failure is always for
AAAA lookups.
This is strange, then, because until I disabled IPv6, the logs for these
problem domains only showed errors looking up AAAA. Only after disabling
IPv6 do I now see this:
status=deferred (Host or domain name not found. Name service error for
name=example-com.mail.protection.outlook.com type=A: Host not found, try
again)
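To check whether the failures really were all AAAA before the change and all A afterwards, the deferral messages can be tallied by hostname and record type. The following is only a minimal sketch: the function name tally_dns_failures is hypothetical, and the regular expression assumes the message wording shown in the excerpts quoted in this thread.

```python
import re
from collections import Counter

# Matches the resolver failure text seen in the deferral messages above, e.g.
# "Name service error for name=HOST type=AAAA: Host not found, try again".
PATTERN = re.compile(r"Name service error for name=(\S+) type=(\w+):")


def tally_dns_failures(log_text):
    """Count (hostname, record type) pairs among name service errors.

    Whitespace is collapsed first so that messages wrapped across several
    lines (as in a pasted email) still match.
    """
    flat = " ".join(log_text.split())
    return Counter(PATTERN.findall(flat))
```

Calling tally_dns_failures() on the contents of the mail log returns a Counter keyed by (hostname, type), so a host that fails for both A and AAAA shows up as two separate entries.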
Is the problem a lookup timeout? Never seen this before the last few
days, so am inclined to think it's mostly their problem, or is there
something I could do?
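One way to probe the timeout question is to time repeated lookups through the system resolver rather than through dig. This is a rough sketch under assumptions: time_lookup and repeat_lookups are hypothetical helper names, and socket.getaddrinfo goes through the operating system's resolver stack, which is not necessarily the same code path as Postfix's own DNS client, so the numbers are only indicative.

```python
import socket
import time


def time_lookup(host, family=socket.AF_INET, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, outcome) for a single address lookup.

    outcome is "ok" on success, or "error: ..." with the resolver's message.
    The resolver argument exists so a stand-in can be supplied for testing.
    """
    start = time.monotonic()
    try:
        resolver(host, None, family, socket.SOCK_STREAM)
        outcome = "ok"
    except socket.gaierror as exc:
        outcome = f"error: {exc}"
    return time.monotonic() - start, outcome


def repeat_lookups(host, count=5, **kwargs):
    """Run time_lookup several times to expose intermittent slowness."""
    return [time_lookup(host, **kwargs) for _ in range(count)]
```

Running repeat_lookups() against one of the affected hostnames, with family=socket.AF_INET and then socket.AF_INET6, should show whether the slow or failing answers are intermittent and whether they differ by record type.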
This could be a messed-up DNS resolver anywhere in the path, including
a bad resolv.conf file under /var/spool/postfix/etc, or some
'security' filter that breaks connectivity to some DNS server.
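The resolv.conf possibility can be checked by comparing the nameserver lines in the system file with those in the copy under the Postfix queue directory. A minimal sketch follows, assuming the chroot copy lives at /var/spool/postfix/etc/resolv.conf as mentioned above (the actual location depends on the queue directory in use); the function names are hypothetical.

```python
from pathlib import Path


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments introduced by '#' or ';'.
        line = line.split("#", 1)[0].split(";", 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def compare_resolv_conf(system_text, chroot_text):
    """Report the nameservers in each file and whether the lists are identical."""
    system_ns = nameservers(system_text)
    chroot_ns = nameservers(chroot_text)
    return {"system": system_ns, "chroot": chroot_ns, "match": system_ns == chroot_ns}


def compare_files(system_path="/etc/resolv.conf",
                  chroot_path="/var/spool/postfix/etc/resolv.conf"):
    """Read both files from disk and compare them (paths are assumptions)."""
    return compare_resolv_conf(Path(system_path).read_text(),
                               Path(chroot_path).read_text())
```

If "match" comes back False, a chrooted Postfix process may be querying different servers than dig does when run from the shell on the same host.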
Victor suggested in a mail prior to yours (Victor, please correct me if
I misunderstand) that it could have been due to Microsoft providing ipv6
responses for some domains, but some of those responses being EDNS0,
which our local resolver may not know how to handle. This seemed
plausible, but now that I took out ipv6 and the error continues with A,
I am less certain.
Having removed ipv6 from the question, I get the error I quoted above
even for domains that do resolve using "dig" from the CLI of the same
host. Why would there be that kind of discrepancy?
For me, A and AAAA lookups of example-com.mail.protection.outlook.com
are instantaneous (reply: NXDOMAIN).
Of course I changed the real domain to protect the innocent
(example-com). Is it appropriate to give a real, live example? It may or
may not help, because A is resolving fine with dig, but postfix is
having trouble itself.