Hi,

We are experiencing strange intermittent issues when resolving 
outlook.office365.com, but also with other domains like e.g. amazonaws.com or 
snort.org. But let’s choose office365.com as example for now. 
outlook.office365.com is a CNAME to lb.geo.office365.com, and office365.com 
delegates the geo subdomain to different nameservers; 2 of them are showing 
some issues on intodns.com [1] (which may or may not be related to this 
problem). 

When querying one of the office365.com nameservers, it correctly delegates, as 
far as I understand:

# dig a lb.geo.office365.com @ns1.msft.net +noadditional +nostats

; <<>> DiG 9.10.4 <<>> a lb.geo.office365.com @ns1.msft.net +noadditional 
+nostats
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37098
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 5
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;lb.geo.office365.com.          IN      A

;; AUTHORITY SECTION:
geo.office365.com.      300     IN      NS      glb1.glbdns2.microsoft.com.
geo.office365.com.      300     IN      NS      ns1.p21.dynect.net.
geo.office365.com.      300     IN      NS      ns3.p21.dynect.net.
geo.office365.com.      300     IN      NS      ns4.p21.dynect.net.
geo.office365.com.      300     IN      NS      ns2.p21.dynect.net.
geo.office365.com.      300     IN      NS      glb2.glbdns2.microsoft.com.

Still, BIND (sometimes) decides to return SERVFAIL to the client immediately 
after receiving this response. Some interesting debug log lines:

resolver: debug 3: resquery 0x7f26fecc8010 (fctx 
0x7f26fecb4458(lb.geo.office365.com/A)): sent
resolver: debug 3: resquery 0x7f26fecc8010 (fctx 
0x7f26fecb4458(lb.geo.office365.com/A)): response
resolver: debug 10: received packet:
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): 
noanswer_response
resolver: debug 10: log_ns_ttl: fctx 0x7f26fecb4458: noanswer_response: 
lb.geo.office365.com (in 'office365.com'?): 1 172499
resolver: debug 10: log_ns_ttl: fctx 0x7f26fecb4458: DELEGATION: 
lb.geo.office365.com (in 'geo.office365.com'?): 0 172499
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): cache_message
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): [result: 
success] query canceled in response();  responding
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): cancelquery
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): nameservers now 
above QDOMAIN
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): done
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): stopeverything
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): cancelqueries
resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): sendevents
client: error: query client=0x7f2700055ca0 thread=0x7f2709813700 
(lb.geo.office365.com/A): query_find: unexpected error after resuming: SERVFAIL
query-errors: debug 1: client 127.0.0.1#35062 (outlook.office365.com): query 
failed (SERVFAIL) for outlook.office365.com/IN/A at query.c:7837

“nameservers now above QDOMAIN” sounds like a geo.office365.com nameserver 
refers back to an office365.com nameserver? The thing is though, I cannot see 
any such response packet in tcpdump. Is this information taken (wrongly) from 
cache then? The same log message appears at all times for any of the failing 
domains we’ve seen so far. 

Note that this doesn’t seem to happen with an empty cache and we are also not 
able to trigger it on a test machine. It only happens on loaded machines once 
the cache TTL of the queried record expires. We can reproduce it with the 
latest patch levels of both 9.10 and 9.9. 

Regards,
Thomas


[1] http://intodns.com/geo.office365.com

Attachment: smime.p7s
Description: S/MIME cryptographic signature

_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

Reply via email to