It would be difficult, and possibly impossible, to keep processing queries while formatting a report on those same queries without losing information in the report. A separate thread creating the report might have to stop query processing, take a snapshot of the data at that point in time, save it somewhere, restart query processing, and then format the report from the saved data. In that case, there would still be a brief interval when named could not handle queries. One might have to write a prototype to determine how long that interruption would be.
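The snapshot approach described above could be sketched roughly as follows. This is a minimal illustration in Python, not BIND's C code, and every name in it (QueryStats, record, snapshot, format_report) is hypothetical. It assumes the statistics are simple counters that can be copied while a lock is held; the real structures in named may not be that easy to copy.

```python
import threading


class QueryStats:
    """Hypothetical counter store; not BIND's actual data structure."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def record(self, qtype):
        # Called by query-processing threads; the lock is held only briefly.
        with self._lock:
            self._counters[qtype] = self._counters.get(qtype, 0) + 1

    def snapshot(self):
        # The only point where query threads can be blocked by reporting:
        # the counters are copied while the lock is held.
        with self._lock:
            return dict(self._counters)


def format_report(snapshot):
    # Works on the saved copy, so slow formatting does not block queries.
    return "\n".join(
        f"{name} {count}" for name, count in sorted(snapshot.items())
    )


if __name__ == "__main__":
    stats = QueryStats()
    for qtype in ("A", "A", "AAAA"):
        stats.record(qtype)
    saved = stats.snapshot()     # brief pause for query threads
    stats.record("A")            # queries continue after the snapshot
    print(format_report(saved))  # report reflects the snapshot only
```

In this sketch the interruption is the time needed to copy the counters, which is the quantity a prototype would have to measure against the real data.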
Charles Elliott

-----Original Message-----
From: bind-users [mailto:bind-users-boun...@lists.isc.org] On Behalf Of Havard Eidnes
Sent: Wednesday, September 6, 2017 8:40 AM
To: m...@conundrum.com
Cc: bind-us...@isc.org
Subject: Re: Strange recursor response time pattern

>> Is that pulling the old-style stats file, or the HTTP-based stats channel?

As should be evident from my other message, this is using the
HTTP-based stats channel.

> If the latter... the zone list (and by extension the root
> document) seems to take a long time to process, and involves some sort
> of locking that blocks all query processing while the list is being
> generated. We encountered this on a 3+ million zone instance.. BIND
> would stop answering queries for several minutes if anyone requested
> the root stats document or the zone list.

Since this name server is approximately a pure recursive
resolver, the list of authoritative zones is short, in fact only
3 configured zones ("localhost", "127.in-addr.arpa" and the
corresponding for IPv6 loopback), and then there's the
"automatic" zones in addition, but still, the halting of query
processing while the list of zones is processed should not be an
issue here.

That said, I'm also rather baffled that BIND would have to stop
processing all queries while traversing the zone instances; that
certainly seems to have an excessive effect on normal operations.

> As Ray says, you may be better off individually querying each of the
> other documents and processing those rather than polling the root doc
> to get them all in one shot.

It's not "me" who is doing the querying, it's the collectd software.
In the syscall trace, I see indeed that it is asking for the root
document:

  GET / HTTP/1.1
  Host: localhost:8053
  User-Agent: collectd/5.7.2
  Accept: */*

However, your advice to query the separate documents in
individual requests would:

 * require a rewrite of the BIND module in collectd
 * still not entirely get rid of the problem that some queries
   are put on hold while the stats channel data is processed and
   sent

Looking at the system call trace shows me that other BIND threads
do process DNS queries while this single thread which does the
HTTP handling does not. Hence my suggestion to instead use a
dedicated thread for the stats / HTTP handling.

Oh, BTW, it also seems that BIND in my case wastes 15ms doing
needless getsockname() syscalls on FD's which are invalid as part
of the early stages of stats processing:

  5645 17 named 1504698577.991440645 CALL getsockname(0xffffffff,0x7f7fef1f06e0,0x7f7fef1f069c)
  5645 17 named 1504698577.991446511 RET getsockname -1 errno 9 Bad file descriptor

(repeated lots of times).

Regards,

- Håvard
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users