Thanks Ian. I've raised #39178 <https://github.com/golang/go/issues/39178>.

On Wednesday, 20 May 2020 00:49:57 UTC+1, Ian Lance Taylor wrote:
>
> On Tue, May 19, 2020 at 4:27 PM <thereal...@gmail.com <javascript:>> 
> wrote: 
> > 
> > I get DNS lookup timeouts many times a day when running Go HTTP clients 
> on Kubernetes, where UDP packets are sometimes dropped due to a race 
> condition in the Linux kernel. I'd like to propose a small standard library 
> change to help debug these scenarios and get some feedback on the idea and 
> implementation here before posting it as a Github issue. 
> > 
> > For any http.Client use with a timeout set on the transport, when the 
> timeout is exceeded and an error is returned it can be challenging to 
> discover the cause of the timeout from the error message. Is it DNS related 
> or TCP related or something else? Consider the following simulation of a 
> DNS lookup that does not return within the timeout. 
> > 
> > package main 
> > 
> > import ( 
> >     "context" 
> >     "net" 
> >     "net/http" 
> >     "time" 
> > ) 
> > 
> > var timeout = 50 * time.Millisecond 
> > var host = "http://example.com/"; 
> > 
> > // delayedDialer introduces a delay when performing DNS lookups. 
> > func delayedDialer(ctx context.Context, network, address string) 
> (net.Conn, error) { 
> >     time.Sleep(2 * timeout) 
> >     d := net.Dialer{} 
> >     return d.DialContext(ctx, network, address) 
> > } 
> > 
> > func main() { 
> >     r := &net.Resolver{ 
> >         PreferGo: true, 
> >         Dial:     delayedDialer, 
> >     } 
> >     tr := &http.Transport{ 
> >         DialContext: (&net.Dialer{ 
> >             Timeout:  timeout, 
> >             Resolver: r, 
> >         }).DialContext, 
> >     } 
> >     client := &http.Client{Transport: tr} 
> >     _, err := client.Get(host) 
> >     if err != nil { 
> >         panic(err) 
> >     } 
> > } 
> > 
> > In the event of a DNS lookup timeout, the output is: 
> > 
> > panic: Get "http://example.com/": dial tcp: i/o timeout 
> > 
> > In the event of a TCP connection timeout (comment out the time.Sleep() 
> call) the output is: 
> > 
> > panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o 
> timeout 
> > 
> > The missing IP address in the DNS lookup timeout output is the only clue 
> that the root cause is a DNS problem, rather than a TCP problem. If users 
> do not expect to see an IP address in that error message, they won't know 
> it is missing, and their debugging and testing effort may be wasted 
> investigating possible TCP-related causes, instead of investigating DNS 
> (which normally runs over UDP). 
> > 
> > The error message could communicate the root cause to users more 
> clearly. After a DNS lookup timeout, the error returned by client.Get() 
> looks like this in Go 1.14.3: 
> > 
> > &url.Error{ 
> >   Op:  "Get", 
> >   URL: "http://example.com/";, 
> >   Err: &net.OpError{ 
> >     Op:     "dial", 
> >     Net:    "tcp", 
> >     Source: nil, 
> >     Addr:   nil, 
> >     Err:    &poll.TimeoutError{}, 
> >   }, 
> > } 
> > 
> > Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future 
> releases after commit d422f54. 
> > 
> > The resolver in net/lookup.go is the source of the *poll.TimeoutError. 
> Instead it could return a *net.DNSError error when a lookup times out. 
> Doing so would provide additional error context for all use cases (not only 
> http.Client) and may save debugging time. For example client.Get() could 
> return: 
> > 
> > &url.Error{ 
> >   Op:  "Get", 
> >   URL: "http://example.com/";, 
> >   Err: &net.OpError{ 
> >     Op:     "dial", 
> >     Net:    "tcp", 
> >     Source: nil, 
> >     Addr:   nil, 
> >     Err:    &net.DNSError{ 
> >       Err:         "i/o timeout", 
> >       Name:        "example.com", 
> >       Server:      "", 
> >       IsTimeout:   true, 
> >       IsTemporary: false, 
> >       IsNotFound:  false, 
> >     }, 
> >   }, 
> > } 
> > 
> > This would preserve the existing "i/o timeout" string and the ability to 
> call err.Timeout(), while adding "lookup example.com: " to the Error() 
> output and the capability to explicitly check for a *net.DNSError. The 
> output from my simulator code would also be more explicit: 
> > 
> > panic: Get "http://example.com/": dial tcp: lookup example.com: i/o 
> timeout 
> > 
> > The following change to implement this works in my usage and passes the 
> tests in all.bash. 
> > 
> > $ git diff 
> > diff --git a/src/net/lookup.go b/src/net/lookup.go 
> > index 5f7119872a..b1a22dd4e6 100644 
> > --- a/src/net/lookup.go 
> > +++ b/src/net/lookup.go 
> > @@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context, 
> network, host string) ([]IP 
> >                         }() 
> >                 } 
> >                 err := mapErr(ctx.Err()) 
> > +               if t, ok := err.(timeout); ok && t.Timeout() { 
> > +                       err = &DNSError{Err: err.Error(), Name: host, 
> IsTimeout: true} 
> > +               } 
> >                 if trace != nil && trace.DNSDone != nil { 
> >                         trace.DNSDone(nil, false, err) 
> >                 } 
> > 
> > I can submit a Github issue and pull request if anyone thinks that this 
> is a good idea and implementation. 
>
> I don't know if that is the right implementation, but what you say 
> sounds plausible, and I encourage you to open an issue.  Thanks. 
>
> Ian 
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/f5c7b794-d74d-4eb5-8a43-7ae0196946ea%40googlegroups.com.

Reply via email to