On 09/12/2020 07:30, Kiran Kumar via Pdns-users wrote:
> How do we minimize answers-slow? We are running on CentOS Linux release 7.9.2009 (Core)
> on a VM with 4 vCPUs and 16GB RAM.
>
> rec_control get-all | grep answer
> *answers-slow    80903*
> answers0-1      598471
> answers1-10     1057756
> answers10-100   2342082
> answers100-1000 1341675

For explanation see: https://docs.powerdns.com/recursor/metrics.html#gathered-information

answers-slow counts queries answered after more than 1 second, and in your case these represent about 1.5% of answers.  However, you've not shown packetcache-hits, so the fraction of client queries affected is likely to be far less than that.
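The 1.5% figure comes from dividing answers-slow by the total of all the answers buckets; below, b ranges over the 0-1, 1-10, 10-100 and 100-1000 ms buckets. The second line assumes, as the paragraph above implies, that packet-cache hits are not counted in those buckets, so adding them to the denominator can only make the fraction smaller:

```latex
\[
f_{\text{slow}}
  = \frac{\text{answers-slow}}{\text{answers-slow} + \sum_{b} \text{answers}_{b}}
  = \frac{80903}{80903 + 598471 + 1057756 + 2342082 + 1341675}
  = \frac{80903}{5420887}
  \approx 1.49\%
\]
\[
f_{\text{slow, all queries}}
  = \frac{\text{answers-slow}}{\text{answers-slow} + \sum_{b} \text{answers}_{b} + \text{packetcache-hits}}
  < f_{\text{slow}}
\]
```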

In resolving a given query, the recursor is going to have to contact one or more authoritative nameservers on the Internet. These are some reasons why it might take more than 1 second to get the final answer:

- the answer is not already in cache (obviously); this happens more frequently if the authoritative server for that domain uses a low TTL; AND
- the first authoritative server tried is down (or there is a transient network problem reaching it), so pdns times out and tries another one; OR
- multiple authoritative servers need to be contacted, with a large round-trip time to each; OR
- the client is querying for a domain which is completely lame / broken, so no answer can be found.
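As a rough illustration, assume the recursor's network-timeout setting is at its default of 1500 ms (check your own recursor.conf) and that the second authoritative server replies after a hypothetical 50 ms round trip. The first line shows that a single timeout alone already exceeds the 1-second threshold. The second line shows the "many servers" case with no timeouts at all: a hypothetical lookup needing n = 4 sequential queries at 300 ms round-trip time each. Both ignore the recursor's own processing time:

```latex
\[ t_{\text{answer}} \approx t_{\text{timeout}} + \mathrm{RTT}_2 = 1500\,\text{ms} + 50\,\text{ms} = 1550\,\text{ms} > 1000\,\text{ms} \]
\[ t_{\text{answer}} \approx n \cdot \mathrm{RTT} = 4 \times 300\,\text{ms} = 1200\,\text{ms} > 1000\,\text{ms} \]
```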

This doesn't necessarily indicate a problem with your own pdns server at all.  It could just as well be caused by problems with some authoritative domains on the Internet.  Heaven knows there are plenty of broken domains out there :-)

It could, however, be made worse by packet loss or congestion on your network or on your network's upstream link.  If your recursor is on a private IP address behind NAT, it would be better to put it on a public IP address, so that the NAT device doesn't have to create state for every outbound query the recursor makes.  If your uplink is congested, causing latency and packet loss, then there's not much you can do short of buying more bandwidth.
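If the recursor does sit behind NAT and the NAT device is a Linux box using netfilter connection tracking (an assumption; hardware routers and firewalls expose this differently), it is worth checking how close its state table is to full. Each outbound UDP query with a new source port creates an entry, and when the table is full the kernel drops new packets and logs "nf_conntrack: table full, dropping packet". A minimal sketch; the function name conntrack_usage is made up:

```sh
# conntrack_usage [COUNT_FILE] [MAX_FILE]: show how full the netfilter
# connection-tracking table is. With no arguments it reads the standard
# Linux /proc files; other paths can be passed in, e.g. for testing.
# Note: POSIX sh has no "local", so the variables below are global.
conntrack_usage() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    # Print "current/maximum entries (percentage full)".
    awk -v c="$count" -v m="$max" \
        'BEGIN { printf "%d/%d entries (%.1f%% full)\n", c, m, 100 * c / m }'
}
```

Run it on the NAT gateway during a busy period; a table that is regularly close to 100% full would be consistent with the packet loss described above.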

It could also be made worse by excessive load on your server causing it to fall behind or drop queries, or by insufficient RAM causing it to evict cache entries prematurely, so you should also use a suitable tool to monitor your server's resource utilisation (netdata <https://github.com/netdata/netdata> is very good for this: it monitors at 1-second resolution by default, so you can see short bursts of activity).  However, your server may be completely fine.
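Because rec_control get-all reports counters accumulated since the recursor started, a short burst of slow answers or dropped queries can be hidden in the totals. A minimal sketch that compares two snapshots taken a few minutes apart, so the result can be lined up against what your monitoring shows for the same interval (the function name slow_delta and the snapshot paths are hypothetical):

```sh
# slow_delta OLD NEW: compare two saved `rec_control get-all` snapshots and
# report how much each counter increased between them.
# Usage (hypothetical paths):
#   rec_control get-all > /tmp/before; sleep 300
#   rec_control get-all > /tmp/after
#   slow_delta /tmp/before /tmp/after
slow_delta() {
    awk '
        # While reading the first file (NR == FNR), remember the old values.
        NR == FNR { old[$1] = $2; next }
        # For the second file, store the increase of each counter.
        { delta[$1] = $2 - old[$1] }
        END {
            slow = delta["answers-slow"] + 0
            total = slow
            # Add the answers0-1 .. answers100-1000 buckets.
            for (k in delta) if (k ~ /^answers[0-9]/) total += delta[k]
            printf "answers: %d, slow: %d", total, slow
            if (total > 0) printf " (%.2f%%)", 100 * slow / total
            printf ", over-capacity-drops: %d\n", delta["over-capacity-drops"] + 0
        }' "$1" "$2"
}
```

If the recursor restarts between the two snapshots, its counters reset and the deltas will be negative and meaningless.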

For comparison, here's the tiny cache on my home network:

root@cache1:~# rec_control get-all | egrep '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'
answers-slow    348
answers0-1    6118
answers1-10    7149
answers10-100    9074
answers100-1000    4695
over-capacity-drops    0
packetcache-hits    1983665
policy-drops    0

and here's a production DNS cache in a data centre:

root@wrn-dns1:~# rec_control get-all | egrep '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'
answers-slow    1710185
answers0-1    40045388
answers1-10    132638392
answers10-100    101328465
answers100-1000    11033827
over-capacity-drops    0
packetcache-hits    8907014600
policy-drops    0

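The fraction of answers-slow out of all the answers counters (roughly 1.3% on my home cache and 0.6% on the production cache) is not hugely different from what you see.  Also notice that packetcache-hits is far higher again: counting those as well, slow answers are only about 0.02% of all queries on both caches.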
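To make the same comparison on your own server, a small POSIX shell function wrapping awk can compute both fractions from rec_control get-all output. It is a sketch: the name slow_fraction is made up, it assumes the "name value" per-line format shown in the transcripts above, and it assumes packetcache-hits are not already included in the answers buckets:

```sh
# slow_fraction: read `rec_control get-all` output on stdin and report
# answers-slow as a percentage of (a) all answers buckets and
# (b) those buckets plus packetcache-hits.
# Usage: rec_control get-all | slow_fraction
slow_fraction() {
    awk '
        $1 == "answers-slow"     { slow = $2 }
        $1 ~ /^answers[0-9]/     { fast += $2 }  # answers0-1 .. answers100-1000
        $1 == "packetcache-hits" { pc = $2 }     # stays 0 if absent
        END {
            total = slow + fast
            if (total == 0) { print "no answers counted yet"; exit 1 }
            printf "slow/answers: %.2f%%\n", 100 * slow / total
            printf "slow/(answers+packetcache-hits): %.4f%%\n", 100 * slow / (total + pc)
        }'
}
```

Fed the home-cache figures above, it prints 1.27% and 0.0173%.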

Regards,

Brian.

_______________________________________________
Pdns-users mailing list
Pdns-users@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/pdns-users
