>> Is that pulling the old-style stats file, or the HTTP-based stats channel?
As should be evident from my other message, this is using the
HTTP-based stats channel.

> If the latter... the zone list (and by extension the root
> document) seems to take a long time to process, and involves
> some sort of locking that blocks all query processing while the
> list is being generated. We encountered this on a 3+ million
> zone instance.. BIND would stop answering queries for several
> minutes if anyone requested the root stats document or the zone
> list.

Since this name server is approximately a pure recursive resolver,
the list of authoritative zones is short: only 3 configured zones
("localhost", "127.in-addr.arpa" and the corresponding zone for the
IPv6 loopback), plus the "automatic" zones. So the halting of query
processing while the list of zones is processed should not be an
issue here.

That said, I'm also rather baffled that BIND would have to stop
processing all queries while traversing the zone instances; that
certainly seems to have an excessive effect on normal operations.

> As Ray says, you may be better off individually querying each
> of the other documents and processing those rather than polling
> the root doc to get them all in one shot.

It's not "me" who is doing the querying, it's the collectd
software. In the syscall trace, I see indeed that it is asking for
the root document:

    GET / HTTP/1.1
    Host: localhost:8053
    User-Agent: collectd/5.7.2
    Accept: */*

However, your advice to query the separate documents in individual
requests would:

 * require a rewrite of the BIND module in collectd
 * still not entirely get rid of the problem that some queries are
   put on hold while the stats channel data is processed and sent

Looking at the system call trace shows me that other BIND threads
do process DNS queries while the single thread which does the HTTP
handling does not. Hence my suggestion to instead use a dedicated
thread for the stats / HTTP handling.
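
For illustration only, here is a rough sketch of what polling the
individual documents (instead of the root document) might look like.
This is not how collectd does it. The "/xml/v3/<name>" paths are my
assumption about the per-document layout of the XML stats channel and
should be checked against the BIND version in use; the helper names
(build_urls, http_get, poll) are made up for this example. The zone
list is left out on purpose, since that is the document reported to
be expensive.

```python
from urllib.request import urlopen

# Assumed per-document names under the XML v3 layout; verify these
# against the statistics channel of your BIND version. The zone list
# is deliberately not requested.
DOCUMENTS = ("server", "net", "mem", "tasks")


def build_urls(host="localhost", port=8053, docs=DOCUMENTS, prefix="/xml/v3"):
    """Return a mapping of document name to its stats channel URL."""
    return {doc: f"http://{host}:{port}{prefix}/{doc}" for doc in docs}


def http_get(url, timeout=5.0):
    """Fetch one document and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def poll(urls, fetch=http_get):
    """Fetch each document in its own request.

    Returns (bodies, errors). A failure on one document (connection
    refused, timeout, HTTP error) is recorded and does not stop the
    remaining requests.
    """
    bodies, errors = {}, {}
    for name, url in urls.items():
        try:
            bodies[name] = fetch(url)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            errors[name] = exc
    return bodies, errors
```

Calling poll(build_urls()) against the local stats channel would
issue four small requests rather than one for "/". Note that this
only shrinks each request; it does nothing about the HTTP handling
sharing a thread with query processing.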

Oh, BTW, it also seems that BIND in my case wastes 15ms doing
needless getsockname() syscalls on FD's which are invalid as part of
the early stages of stats processing:

  5645     17 named    1504698577.991440645 CALL  getsockname(0xffffffff,0x7f7fef1f06e0,0x7f7fef1f069c)
  5645     17 named    1504698577.991446511 RET   getsockname -1 errno 9 Bad file descriptor

(repeated lots of times).

Regards,

- Håvard