Hi there,

I am running two instances of named on the same server (BIND 9.16.33 on Alpine 
3.16). They use completely separate config directories, and they have separate 
working directories as well as control ports. Let's call them NS1 and NS2.

NS1 is a forwarding instance. It listens on any:53 and forwards all requests to 
127.0.0.1:153.
NS2 is a regular BIND 9 instance. It serves one zone (test.com) and listens on 
127.0.0.1:153.

My understanding is that when NS1 receives a request for "test.com", it will 
initially forward that query to NS2 for resolution, and then cache the result 
in memory for the TTL of that record. The next request coming in for "test.com" 
should be served from NS1's in-memory cache, and NS2 should be out of the 
picture.
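
To make that expectation concrete, here is a minimal Python sketch of the 
behaviour I am assuming. It is only a model of my mental picture, not how BIND 
implements its cache. The names ForwardingCache, resolve, upstream and clock 
are made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical model of a caching forwarder (NOT BIND's implementation).
Answer = Tuple[str, int]  # (address, TTL in seconds)


class ForwardingCache:
    def __init__(
        self,
        upstream: Callable[[str], Answer],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.upstream = upstream  # stands in for NS2
        self.clock = clock        # injectable so the model can be tested
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def resolve(self, name: str) -> str:
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            # Fresh cache hit: the upstream server is not contacted.
            return entry[0]
        # Cache miss or expired entry: forward the query and cache the answer.
        address, ttl = self.upstream(name)
        self.cache[name] = (address, now + ttl)
        return address
```

In this model, a second resolve("test.com") call made before the TTL runs out 
never reaches the upstream function, which is what I expected from NS1.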

Based on that, I am running some tests. Initial dump of NS1's memory shows an 
empty cache:
/ # cat /var/cache/ns1/named_dump.db
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129172701
;
; Address database dump

Next, I send an A record request for test.com to NS1, which returns the correct 
result. Dumping the cache:
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129172835
; authanswer
; stale
test.com. 86390 IN A 10.10.10.10
;
; Address database dump

This shows that the A record is cached by NS1 at this point, and should be 
valid for the next 86390 seconds.
The next test is to kill NS2 and query the record again. The desired outcome is 
for NS1 to answer the query without needing NS2.
After killing NS2, however, NS1 fails to resolve the query. Looking at NS1's cache:
;
; Start view _default
;
;
; Cache dump of view '_default' (cache _default)
;
; using a 86400 second stale ttl
$DATE 20221129173157
; authanswer
; stale
test.com. 86188 IN A 10.10.10.10
;
; Address database dump

This shows me that the cache entry still exists and is valid. Looking at the logs:
29-Nov-2022 17:31:52.014 serve-stale: info: test.com resolver failure, stale answer unavailable
29-Nov-2022 17:31:52.014 query-errors: info: client @0x7feeb7f1b308 192.168.56.1#59506 (test.com): query failed (SERVFAIL) for test.com/IN/A at query.c:5871

This tells me the query fails because the stale answer is unavailable.
In NS1's config, I have:
options {
    listen-on port 53 { any; };
    listen-on-v6 { none; };

    directory "/var/cache/ns1";

    recursion yes;
    allow-transfer { none; };
    allow-query { any; };

    forwarders {
        127.0.0.1 port 153;
    };
    forward only;

    stale-answer-enable yes;
    stale-answer-ttl 300;

    dnssec-validation yes;

    statistics-file "/var/run/named.ns1.stats";

    auth-nxdomain no;
};
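
As a sanity check on the two cache dumps taken after the query, the short 
Python snippet below compares the wall-clock time between their $DATE stamps 
with the drop in the displayed TTL. The $DATE values and TTLs are copied from 
the dumps above; the helper name parse_dump_date is made up, and I am assuming 
$DATE is in YYYYMMDDHHMMSS form.

```python
from datetime import datetime


def parse_dump_date(stamp: str) -> datetime:
    """Parse a $DATE value from the cache dump (assumed YYYYMMDDHHMMSS)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


# Values copied from the dump after the first query and the dump after NS2 was killed.
first_date, first_ttl = parse_dump_date("20221129172835"), 86390
second_date, second_ttl = parse_dump_date("20221129173157"), 86188

elapsed = int((second_date - first_date).total_seconds())
ttl_drop = first_ttl - second_ttl

print(elapsed, ttl_drop)  # prints "202 202"
```

So the displayed TTL is counting down in step with real time between the two 
dumps, and the same entry is present in both.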

Two questions about this situation:
1. Why would the test.com entry in the cache be marked stale if its TTL has not 
expired yet? The ideal scenario would be for the forwarder not to reach out to 
NS2 unless necessary. Am I misunderstanding the stale record concept?
2. Why is the stale answer not available in this scenario, even though stale 
answers are enabled and the cache entry exists and is valid? Am I missing some 
configuration?

Any help would be appreciated.

Regards
Hamid Maadani

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users