Excellent analysis. Idempotence and exactly-once delivery are often glossed 
over, yet they are usually critical to proper system design.

The key for me is to remember that the request can fail at ANY point in the 
flow.

XA transactions can solve this, but most systems these days rely on eventual 
consistency for scalability. 

> On Dec 7, 2020, at 5:59 AM, 'Axel Wagner' via golang-nuts 
> <golang-nuts@googlegroups.com> wrote:
> 
> 
> We recently had the same issue.
> 
>> On Mon, Dec 7, 2020 at 11:58 AM Gregor Best <b...@pferdewetten.de> wrote:
> 
>> Hi!
>> 
>> We're using a 3rd party provider's API to handle some of our customer
>> requests. Interaction with their API consists of essentially POST'ing
>> a small XML document to them.
>> 
>> From time to time, `net/http`'s `Client.Do` returns an `io.EOF`
>> when sending the request. For now, the provider always reported
>> those instances as "we didn't get your request".
>> 
>> A cursory search of various GitHub issues and a glance at the source
>> of `net/http` seem to indicate that `io.EOF` is almost always
>> caused by the server closing the connection, but the client not
>> getting the "it's now closed" signal before it tries to re-use the
>> connection.
> 
> That was what I concluded as well. I think it could theoretically also happen 
> if a new connection is opened and immediately closed by the server.
> 
>> FWIW, `fasthttp`'s HTTP client implementation treats `io.EOF` as
>> "this request needs to be retried", but I don't know how much that
>> knowledge transfers to `net/http`.
> 
> I think `fasthttp` is behaving incorrectly - in particular, if it also does 
> so for POST requests (you mention that you use them). They are, in general, 
> not idempotent and there is a race where the client sends the request, the 
> server receives it and starts handling it (causing some observable 
> side-effects) but then dies before it can send a response, with the 
> connection being closed by the kernel. If the client retries that (at a 
> different backend, or once the server got restarted), you might end up with 
> corrupted state.
> 
> AIUI, `net/http` never assumes requests are retriable - even GET requests - 
> and leaves it up to the application to decide whether a request can be 
> retried or not. Our solution was to verify that all our requests *can* be 
> retried and then wrap the client call with retries.
> 
>> Is my interpretation of the situation correct? Or are there other
>> circumstances where the request _did_ end up at the remote end and
>> `io.EOF` is returned?
> 
> I think in general, you can't distinguish (as a client) whether or not the 
> server received the message. For example, you can try a TCP server that 
> immediately closes the connection. The `net/http` client will report an EOF. 
> Though, to clarify: In this test it didn't return `io.EOF`, but an 
> `*url.Error` wrapping `io.EOF`. So if you actually get an unwrapped `io.EOF`, 
> the answer might be different.
> 
>> I guess what I'm asking is: Is it safe (as in: requests won't end
>> up on the remote twice or more times) to retry POST requests when
>> `Client.Do` returns an `io.EOF`?
>> 
>> Note that disabling connection reuse (as was suggested by a number
>> of Stack Overflow posts) is an option that we'd like to avoid unless
>> there's absolutely no other way to handle this.
> 
> If I'm correct in my understanding, even disabling keep-alive won't really 
> help - though it might reduce the number of these errors significantly. It 
> will always be possible that the server closes the connection while a request 
> is in-flight. If that is sufficient, a middle-ground might be to reduce the 
> keep-alive timeout on the client (or increase it on the server):
> https://golang.org/pkg/net/http/#Transport.IdleConnTimeout
> In our case, the server was a Java server and it used a timeout of 30s, while 
> the Go client defaults to a 90s timeout. If you instead use, say, a 20s 
> timeout on the go client, you still get most of the performance benefit of 
> keep-alive, but the client will assume the connection is useless 10s before 
> the server actually closes it. Not ideal, but it should significantly improve 
> the situation.
> 
> IMO the only real solution, though, is to make requests idempotent, e.g. by 
> adding a unique request ID. It's a lot of work and only really effective if 
> it's propagated all the way down. But it's still easier to achieve than 
> exactly-once-delivery :)
>  
>> 
>> -- 
>>    Gregor Best
>>    b...@pferdewetten.de
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to golang-nuts+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/golang-nuts/a31d42a5-6a81-0579-a380-b268d10f4eb0%40pferdewetten.de.
> 

