Hi Mark, I may have found another (possibly related?) bug:
I noticed that when validating a signed zone using delv by querying a local BIND caching server (v9.10.3-P4), it sometimes suddenly alerts "no valid RRSIG”. Indeed, when querying “dig ds mydomain +dnssec", it returns the DS records, but no RRSIG at all. The following sequence of commands (output simplified) makes me think this might be related to prefetch/cache expiry as well (prefetch value 2): $ while true; do dig ds mydomain; sleep 1; done ;; ANSWER SECTION: mydomain. 3 IN DS […] mydomain. 3 IN DS […] mydomain. 3 IN RRSIG DS […] ;; ANSWER SECTION: mydomain. 3600 IN DS […] mydomain. 3600 IN DS […] mydomain. 2 IN RRSIG DS […] ;; ANSWER SECTION: mydomain. 3599 IN DS […] mydomain. 3599 IN DS […] mydomain. 1 IN RRSIG DS […] ;; ANSWER SECTION: mydomain. 3598 IN DS […] mydomain. 3598 IN DS […] mydomain. 0 IN RRSIG DS […] ;; ANSWER SECTION: mydomain. 3597 IN DS […] mydomain. 3597 IN DS […] What’s your take on this? Regards, Thomas > On 20.06.2016, at 08:39, Mark Andrews <ma...@isc.org> wrote: > > > A fix for this is in review and should be in the next maintainance > release. > > Mark > > In message <16a2cdfd-694d-444a-a760-17c9d7517...@open.ch>, Thomas Sturm > writes: >> >> I am now able to reliably reproduce the behaviour with dig querying BIND >> 9.10.4-P1 (not 9.9, apparently) with "prefetch 0”: >> >> $ while true; do dig outlook.office365.com +noauthority +noadditional >> +tries=1 +retry=0; sleep 0.1; done >> >> Wait for 5 minutes, once the TTL expires, this should show about 5-7 >> SERVFAIL responses. >> >> prefetch 1 or 2 makes it harder to reproduce and it only happens >> (sometimes) on loaded systems. prefetch 10 makes it go away. >> >> It never happens after restarting or flushing the cache. And it never >> happens when querying x seconds _after_ the TTL expired. Could there be >> an issue processing cached client requests during cache expiry, and since >> it only happens on 9.10, potentially related to prefetching? >> >> >> >>> On 16.06.2016, at 10:00, Thomas Sturm <t...@open.ch> wrote: >>> >>> Hi, >>> >>> We are experiencing strange intermittent issues when resolving >> outlook.office365.com, but also with other domains like e.g. >> amazonaws.com or snort.org. But let’s choose office365.com as example for >> now. outlook.office365.com is a CNAME to lb.geo.office365.com, and >> office365.com delegates the geo subdomain to different nameservers; 2 of >> them are showing some issues on intodns.com [1] (which may or may not be >> related to this problem). >>> >>> When querying one of the office365.com nameservers, it correctly >> delegates, as far as I understand: >>> >>> # dig a lb.geo.office365.com @ns1.msft.net +noadditional +nostats >>> >>> ; <<>> DiG 9.10.4 <<>> a lb.geo.office365.com @ns1.msft.net >> +noadditional +nostats >>> ;; global options: +cmd >>> ;; Got answer: >>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37098 >>> ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 5 >>> ;; WARNING: recursion requested but not available >>> >>> ;; OPT PSEUDOSECTION: >>> ; EDNS: version: 0, flags:; udp: 4000 >>> ;; QUESTION SECTION: >>> ;lb.geo.office365.com. IN A >>> >>> ;; AUTHORITY SECTION: >>> geo.office365.com. 300 IN NS >> glb1.glbdns2.microsoft.com. >>> geo.office365.com. 300 IN NS ns1.p21.dynect.net. >>> geo.office365.com. 300 IN NS ns3.p21.dynect.net. >>> geo.office365.com. 300 IN NS ns4.p21.dynect.net. >>> geo.office365.com. 300 IN NS ns2.p21.dynect.net. >>> geo.office365.com. 300 IN NS >> glb2.glbdns2.microsoft.com. >>> >>> Still, BIND (sometimes) decides to return SERVFAIL to the client >> immediately after receiving this response. Some interesting debug log >> lines: >>> >>> resolver: debug 3: resquery 0x7f26fecc8010 (fctx >> 0x7f26fecb4458(lb.geo.office365.com/A)): sent >>> resolver: debug 3: resquery 0x7f26fecc8010 (fctx >> 0x7f26fecb4458(lb.geo.office365.com/A)): response >>> resolver: debug 10: received packet: >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> noanswer_response >>> resolver: debug 10: log_ns_ttl: fctx 0x7f26fecb4458: noanswer_response: >> lb.geo.office365.com (in 'office365.com'?): 1 172499 >>> resolver: debug 10: log_ns_ttl: fctx 0x7f26fecb4458: DELEGATION: >> lb.geo.office365.com (in 'geo.office365.com'?): 0 172499 >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> cache_message >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> [result: success] query canceled in response(); responding >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> cancelquery >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> nameservers now above QDOMAIN >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): done >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> stopeverything >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> cancelqueries >>> resolver: debug 3: fctx 0x7f26fecb4458(lb.geo.office365.com/A): >> sendevents >>> client: error: query client=0x7f2700055ca0 thread=0x7f2709813700 >> (lb.geo.office365.com/A): query_find: unexpected error after resuming: >> SERVFAIL >>> query-errors: debug 1: client 127.0.0.1#35062 (outlook.office365.com): >> query failed (SERVFAIL) for outlook.office365.com/IN/A at query.c:7837 >>> >>> “nameservers now above QDOMAIN” sounds like a geo.office365.com >> nameserver refers back to an office365.com nameserver? The thing is >> though, I cannot see any such response packet in tcpdump. Is this >> information taken (wrongly) from cache then? The same log message appears >> at all times for any of the failing domains we’ve seen so far. >>> >>> Note that this doesn’t seem to happen with an empty cache and we are >> also not able to trigger it on a test machine. It only happens on loaded >> machines once the cache TTL of the queried record expires. We can >> reproduce it with the latest patch levels of both 9.10 and 9.9. >>> >>> Regards, >>> Thomas >>> >>> >>> [1] >> http://intodns.com/geo.office365.com______________________________________ >> _________ >>> Please visit https://lists.isc.org/mailman/listinfo/bind-users to >> unsubscribe from this list >>> >>> bind-users mailing list >>> bind-users@lists.isc.org >>> https://lists.isc.org/mailman/listinfo/bind-users >> >> >> -- >> thomas sturm >> principal engineer >> >> open systems ag >> raeffelstrasse 29 >> ch-8045 zurich >> t: +41 58 100 10 10 >> f: +41 58 100 10 11 >> >> t...@open.ch >> >> http://www.open.ch >> >> > > -- > Mark Andrews, ISC > 1 Seymour St., Dundas Valley, NSW 2117, Australia > PHONE: +61 2 9871 4742 INTERNET: ma...@isc.org -- thomas sturm principal engineer open systems ag raeffelstrasse 29 ch-8045 zurich t: +41 58 100 10 10 f: +41 58 100 10 11 t...@open.ch http://www.open.ch
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list bind-users mailing list bind-users@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-users