Thanks Ian. I've raised #39178 <https://github.com/golang/go/issues/39178>.
On Wednesday, 20 May 2020 00:49:57 UTC+1, Ian Lance Taylor wrote: > > On Tue, May 19, 2020 at 4:27 PM <thereal...@gmail.com <javascript:>> > wrote: > > > > I get DNS lookup timeouts many times a day when running Go HTTP clients > on Kubernetes, where UDP packets are sometimes dropped due to a race > condition in the Linux kernel. I'd like to propose a small standard library > change to help debug these scenarios and get some feedback on the idea and > implementation here before posting it as a Github issue. > > > > For any http.Client use with a timeout set on the transport, when the > timeout is exceeded and an error is returned it can be challenging to > discover the cause of the timeout from the error message. Is it DNS related > or TCP related or something else? Consider the following simulation of a > DNS lookup that does not return within the timeout. > > > > package main > > > > import ( > > "context" > > "net" > > "net/http" > > "time" > > ) > > > > var timeout = 50 * time.Millisecond > > var host = "http://example.com/" > > > > // delayedDialer introduces a delay when performing DNS lookups. > > func delayedDialer(ctx context.Context, network, address string) > (net.Conn, error) { > > time.Sleep(2 * timeout) > > d := net.Dialer{} > > return d.DialContext(ctx, network, address) > > } > > > > func main() { > > r := &net.Resolver{ > > PreferGo: true, > > Dial: delayedDialer, > > } > > tr := &http.Transport{ > > DialContext: (&net.Dialer{ > > Timeout: timeout, > > Resolver: r, > > }).DialContext, > > } > > client := &http.Client{Transport: tr} > > _, err := client.Get(host) > > if err != nil { > > panic(err) > > } > > } > > > > In the event of a DNS lookup timeout, the output is: > > > > panic: Get "http://example.com/": dial tcp: i/o timeout > > > > In the event of a TCP connection timeout (comment out the time.Sleep() > call) the output is: > > > > panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o > timeout > > > > The missing IP address in the DNS lookup timeout output is the only clue > that the root cause is a DNS problem, rather than a TCP problem. If users > do not expect to see an IP address in that error message, they won't know > it is missing, and their debugging and testing effort may be wasted > investigating possible TCP-related causes, instead of investigating DNS > (which normally runs over UDP). > > > > The error message could communicate the root cause to users more > clearly. After a DNS lookup timeout, the error returned by client.Get() > looks like this in Go 1.14.3: > > > > &url.Error{ > > Op: "Get", > > URL: "http://example.com/", > > Err: &net.OpError{ > > Op: "dial", > > Net: "tcp", > > Source: nil, > > Addr: nil, > > Err: &poll.TimeoutError{}, > > }, > > } > > > > Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future > releases after commit d422f54. > > > > The resolver in net/lookup.go is the source of the *poll.TimeoutError. > Instead it could return a *net.DNSError error when a lookup times out. > Doing so would provide additional error context for all use cases (not only > http.Client) and may save debugging time. For example client.Get() could > return: > > > > &url.Error{ > > Op: "Get", > > URL: "http://example.com/", > > Err: &net.OpError{ > > Op: "dial", > > Net: "tcp", > > Source: nil, > > Addr: nil, > > Err: &net.DNSError{ > > Err: "i/o timeout", > > Name: "example.com", > > Server: "", > > IsTimeout: true, > > IsTemporary: false, > > IsNotFound: false, > > }, > > }, > > } > > > > This would preserve the existing "i/o timeout" string and the ability to > call err.Timeout(), while adding "lookup example.com: " to the Error() > output and the capability to explicitly check for a *net.DNSError. The > output from my simulator code would also be more explicit: > > > > panic: Get "http://example.com/": dial tcp: lookup example.com: i/o > timeout > > > > The following change to implement this works in my usage and passes the > tests in all.bash. > > > > $ git diff > > diff --git a/src/net/lookup.go b/src/net/lookup.go > > index 5f7119872a..b1a22dd4e6 100644 > > --- a/src/net/lookup.go > > +++ b/src/net/lookup.go > > @@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context, > network, host string) ([]IP > > }() > > } > > err := mapErr(ctx.Err()) > > + if t, ok := err.(timeout); ok && t.Timeout() { > > + err = &DNSError{Err: err.Error(), Name: host, > IsTimeout: true} > > + } > > if trace != nil && trace.DNSDone != nil { > > trace.DNSDone(nil, false, err) > > } > > > > I can submit a Github issue and pull request if anyone thinks that this > is a good idea and implementation. > > I don't know if that is the right implementation, but what you say > sounds plausible, and I encourage you to open an issue. Thanks. > > Ian > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/f5c7b794-d74d-4eb5-8a43-7ae0196946ea%40googlegroups.com.