I suspect if you tcpdump/wireshark the DNS traffic, you'll find a query 
goes out, and either the response is delayed by 2 seconds, or no response 
is received and your client re-sends the request.
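
You mentioned you're already using http-trace; if it helps, here is a minimal
untested sketch (with https://example.com/ standing in for your real endpoint)
that logs just the DNS phase of a request, so you can line the slow lookups up
against the packet capture by timestamp:

package main

import (
    "log"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    // Placeholder URL: substitute the integration endpoint you actually call.
    req, err := http.NewRequest("GET", "https://example.com/", nil)
    if err != nil {
        log.Fatal(err)
    }

    var dnsStart time.Time
    trace := &httptrace.ClientTrace{
        DNSStart: func(info httptrace.DNSStartInfo) {
            dnsStart = time.Now()
            log.Printf("DNS start: host=%s", info.Host)
        },
        DNSDone: func(info httptrace.DNSDoneInfo) {
            // log's default timestamps give you the wall-clock time
            // to match against the tcpdump output.
            log.Printf("DNS done: took=%s err=%v", time.Since(dnsStart), info.Err)
        },
    }
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    resp.Body.Close()
}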

To understand this, you'll need to find out, from inside your pod, what your
upstream recursive DNS server is. Usually `cat /etc/resolv.conf` will tell you
(in a Kubernetes pod it typically points at the cluster DNS service, e.g.
CoreDNS), but if resolution goes through systemd it could be `resolvectl
status` or suchlike. Then you need to work out what's going on upstream.

You should note that a 2-second delay a few times per day for DNS resolution
is not unusual, and there are lots of possible reasons. It could be as simple
as occasional packet loss between your k8s node and your DNS recursor (DNS is
usually carried over UDP, and UDP does not guarantee delivery). Just one lost
packet can cause a 1-2 second delay, depending on the client's retransmission
policy.
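
(Go's built-in resolver takes its retransmission timeout and attempt count
from the options line in /etc/resolv.conf; from memory the defaults are 5
seconds and 2 attempts if nothing is set there.) If you want to watch for
these spikes independently of your HTTP calls, a small probe along the lines
of the untested sketch below (example.com stands in for the name your app
actually resolves) exercises the same pure-Go resolver path and logs anything
slow:

package main

import (
    "context"
    "log"
    "net"
    "time"
)

func main() {
    const host = "example.com" // placeholder: the name your app resolves

    // PreferGo forces the same pure-Go resolver you get with CGO_ENABLED=0.
    r := &net.Resolver{PreferGo: true}
    for {
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        start := time.Now()
        addrs, err := r.LookupHost(ctx, host)
        elapsed := time.Since(start)
        cancel()

        if err != nil || elapsed > 500*time.Millisecond {
            log.Printf("slow or failed lookup: took=%s addrs=%v err=%v",
                elapsed, addrs, err)
        }
        time.Sleep(10 * time.Second)
    }
}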

However, a more likely explanation is this: the record has expired from the
DNS recursor's cache. When the recursor next gets a query for that name, it
needs to locate the upstream authoritative DNS servers for the domain, and if
the one it tries first is down, it will time out and retry against a
different one. Furthermore, it also needs to resolve the *names* of the
authoritative servers (from the NS records) into addresses, and if those have
expired too, that adds further delays. A delay of several seconds for all of
this is quite common.
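
If you want a rough idea of whether one of the domain's authoritative servers
is the weak link, you can time those lookups yourself. Here is an untested
sketch (example.com stands in for the real domain): it asks for the NS
records and then times resolving each nameserver's own name; a very slow or
failing entry points at the kind of brokenness described above.

package main

import (
    "context"
    "log"
    "net"
    "time"
)

func main() {
    const domain = "example.com" // placeholder: the domain you actually query

    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()

    // Find the delegated nameservers for the domain.
    nss, err := net.DefaultResolver.LookupNS(ctx, domain)
    if err != nil {
        log.Fatalf("NS lookup for %s failed: %v", domain, err)
    }

    // Time how long each nameserver's own name takes to resolve.
    for _, ns := range nss {
        start := time.Now()
        addrs, err := net.DefaultResolver.LookupHost(ctx, ns.Host)
        log.Printf("%s: took=%s addrs=%v err=%v",
            ns.Host, time.Since(start), addrs, err)
    }
}

(Note this goes via your local recursor, so cached answers may hide the
problem; dig +trace against the domain gives a fuller picture of the
delegation.)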

This is just life: many DNS domains are broken in this way, because people
don't know how to delegate their domains or run their authoritative
nameservers properly. If you tell us the actual domain you're querying, maybe
we can identify the problem with the domain - but you'll have to get the
domain owner to fix it.

As a sticking-plaster over the problem: if you run your own DNS recursor with
suitable software, you can get it to refresh the record *before* it expires.
In PowerDNS Recursor this is controlled by refresh-on-ttl-perc
<https://docs.powerdns.com/recursor/settings.html#refresh-on-ttl-perc>; BIND
calls it "prefetch" <https://kb.isc.org/docs/aa-01122>. (Other nameserver
software may or may not have this feature.)
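
If running your own recursor isn't practical, the same refresh-before-expiry
idea can be approximated inside the application. This is only an untested
sketch of the concept (the hostname and interval are placeholders, since the
Go resolver doesn't expose TTLs), not a substitute for a proper caching
recursor:

package main

import (
    "context"
    "log"
    "net"
    "sync"
    "time"
)

// cachedResolver keeps the last successful answer for one host and refreshes
// it in the background, so the request path never waits on a cold lookup.
type cachedResolver struct {
    host  string
    mu    sync.RWMutex
    addrs []string
}

func newCachedResolver(host string, refreshEvery time.Duration) *cachedResolver {
    c := &cachedResolver{host: host}
    c.refresh() // warm the cache once at startup
    go func() {
        for range time.Tick(refreshEvery) {
            c.refresh()
        }
    }()
    return c
}

func (c *cachedResolver) refresh() {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    addrs, err := net.DefaultResolver.LookupHost(ctx, c.host)
    if err != nil {
        log.Printf("refresh of %s failed, keeping old answer: %v", c.host, err)
        return
    }
    c.mu.Lock()
    c.addrs = addrs
    c.mu.Unlock()
}

// Addrs returns the most recently refreshed addresses.
func (c *cachedResolver) Addrs() []string {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.addrs
}

func main() {
    // Placeholders: the real hostname and an interval shorter than its TTL.
    r := newCachedResolver("example.com", 30*time.Second)
    log.Println("cached addresses:", r.Addrs())
}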

At the end of the day, though, DNS issues are not related to the Go
programming language.

On Friday, 9 May 2025 at 17:40:57 UTC+1 Cipov Peter wrote:

> Hello Community
>
> I have a question regarding the native Go DNS lookup, as my app is compiled
> statically (CGO_ENABLED=0). For some reason this setup behaves
> unpredictably, sometimes (a few times a day) having a DNS lookup of >2s. I
> am using http-trace to get this number. I have tried to look into the code
> to see whether there is a way to drill down into these 2s (no luck so far).
> My use case is making a quick HTTP call to an integration (optimal total
> request time < 500ms).
>
> using
> GODEBUG="netdns=2"
> CGO_ENABLED=0
> GOOS=linux
> GOARCH=amd64
>
> running in Docker (Debian bookworm) as a k8s pod
>
> logs:
> go package net: confVal.netCgo = false  netGo = false
> go package net: cgo resolver not supported; using Go's DNS resolver
>
> I have checked the source code and have not seen much tracing information
> about why these DNS spikes sometimes occur. Did I miss some option to get
> insight into why a DNS lookup takes so long? I cannot distinguish whether
> it is waiting on a network call or on some internal timeout.
>
> Thank you
>
