Thanks for the replies, guys. Looks like we (and our provider) will have
to do a bit of soul-searching wrt idempotent API requests. At least it's
good to see that we're not entirely off the beaten path with what we're
doing :)
On 07.12.20 12:58, Axel Wagner wrote:
We recently had the same issue.
On Mon, Dec 7, 2020 at 11:58 AM Gregor Best <b...@pferdewetten.de> wrote:
Hi!
We're using a 3rd party provider's API to handle some of our customer
requests. Interaction with their API consists of essentially POST'ing
a small XML document to them.
From time to time, `net/http`'s `Client.Do` returns an `io.EOF`
when sending the request. So far, the provider has always reported
those instances as "we didn't get your request".
A cursory search of various GitHub issues and a glance at the source
of `net/http` seem to indicate that `io.EOF` is almost always
caused by the server closing the connection, but the client not
getting the "it's now closed" signal before it tries to re-use the
connection.
That was what I concluded as well. I think it could theoretically also
happen if a new connection is opened and immediately closed by the server.
FWIW, `fasthttp`'s HTTP client implementation treats `io.EOF` as
"this request needs to be retried", but I don't know how much that
knowledge transfers to `net/http`.
I think `fasthttp` is behaving incorrectly - in particular, if it also
does so for POST requests (you mention that you use them). They are, in
general, not idempotent and there is a race where the client sends the
request, the server receives it and starts handling it (causing some
observable side-effects) but then dies before it can send a response,
with the connection being closed by the kernel. If the client retries
that request (at a different backend, or once the server has restarted),
you might end up with corrupted state.
AIUI, `net/http` never assumes requests are retriable - even GET
requests - and leaves it up to the application to decide whether a
request can be retried or not. Our solution was to verify that all our
requests *can* be retried and then wrap the client call with retries.
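A minimal sketch of such a retry wrapper, for illustration only. It is not the code used by anyone in this thread: `doWithRetry`, `flakyTransport`, `demo`, the attempt count, the backoff and the `provider.invalid` URL are all made up for the example, and it only makes sense once you have verified that repeating the request is harmless.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// doWithRetry POSTs body to url and retries only when the error chain
// contains io.EOF. Hypothetical helper; it assumes the caller has already
// verified that repeating the request is safe.
func doWithRetry(c *http.Client, url string, body []byte, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		// Build a fresh request each time so the body is sent from the start.
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "text/xml")
		resp, err := c.Do(req)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		if !errors.Is(err, io.EOF) {
			break // some other failure: do not retry blindly
		}
		time.Sleep(time.Duration(i+1) * 10 * time.Millisecond) // arbitrary backoff
	}
	return nil, lastErr
}

// flakyTransport stands in for a server that drops the first few requests:
// it fails with io.EOF `failures` times, then answers 200.
type flakyTransport struct {
	failures, calls int
}

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	if req.Body != nil {
		req.Body.Close() // a RoundTripper is expected to close the body
	}
	if t.calls <= t.failures {
		return nil, io.EOF
	}
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       http.NoBody,
		Request:    req,
	}, nil
}

// demo runs doWithRetry against the stub and reports how many attempts were made.
func demo() (int, error) {
	ft := &flakyTransport{failures: 2}
	c := &http.Client{Transport: ft}
	resp, err := doWithRetry(c, "http://provider.invalid/api", []byte("<ping/>"), 3)
	if err != nil {
		return ft.calls, err
	}
	resp.Body.Close()
	return ft.calls, nil
}

func main() {
	calls, err := demo()
	fmt.Println("attempts:", calls, "error:", err)
}
```

The check uses `errors.Is` rather than `err == io.EOF` because `Client.Do` wraps transport errors in a `*url.Error`.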
Is my interpretation of the situation correct? Or are there other
circumstances where the request _did_ end up at the remote end and
`io.EOF` is returned?
I think in general, you can't distinguish (as a client) whether or not
the server received the message. For example, you can try a TCP server
that immediately closes the connection. The `net/http` client will
report an EOF. Though, to clarify: In this test it didn't return
`io.EOF`, but an `*url.Error` wrapping `io.EOF`. So if you actually get
an unwrapped `io.EOF`, the answer might be different.
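A rough sketch of that kind of experiment, as a variant rather than the exact test described above. Here the server reads the whole request before hanging up, which corresponds to the "received but never answered" case and keeps the result from depending on timing. The names `serveOnceAndHangUp`, `probe` and `describe` are invented for this example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/url"
	"strings"
)

// serveOnceAndHangUp accepts one connection, reads one complete request
// (headers and body) and then closes the connection without replying.
func serveOnceAndHangUp(ln net.Listener) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	req, err := http.ReadRequest(bufio.NewReader(conn))
	if err != nil {
		return
	}
	io.Copy(io.Discard, req.Body) // drain the body so the close is a clean FIN
}

// probe POSTs a tiny XML document to the hang-up server and returns the
// error reported by the net/http client.
func probe() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	go serveOnceAndHangUp(ln)

	c := &http.Client{Transport: &http.Transport{}}
	resp, err := c.Post("http://"+ln.Addr().String(), "text/xml", strings.NewReader("<ping/>"))
	if err == nil {
		resp.Body.Close()
	}
	return err
}

// describe reports whether err is a *url.Error and whether io.EOF is
// somewhere in its chain.
func describe(err error) (isURLError, wrapsEOF bool) {
	var uerr *url.Error
	return errors.As(err, &uerr), errors.Is(err, io.EOF)
}

func main() {
	err := probe()
	isURLError, wrapsEOF := describe(err)
	fmt.Println(err)
	fmt.Println("is *url.Error:", isURLError, "wraps io.EOF:", wrapsEOF)
}
```

From the client's point of view, this failure looks the same as one where the server never saw the request, which is why the error alone cannot tell you whether a retry is safe.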
I guess what I'm asking is: Is it safe (as in: requests won't end
up on the remote two or more times) to retry POST requests when
`Client.Do` returns an `io.EOF`?
Note that disabling connection reuse (as was suggested by a number
of Stack Overflow posts) is an option that we'd like to avoid unless
there's absolutely no other way to handle this.
If I'm correct in my understanding, even disabling keep-alive won't
really help - though it might reduce the number of these errors
significantly. It will always be possible that the server closes the
connection while a request is in-flight. If that is sufficient, a
middle-ground might be to reduce the keep-alive timeout on the client
(or increase it on the server):
https://golang.org/pkg/net/http/#Transport.IdleConnTimeout
In our case, the server was a Java server and it used a timeout of 30s,
while the Go client defaults to a 90s timeout. If you instead use, say,
a 20s timeout on the Go client, you still get most of the performance
benefit of keep-alive, but the client will assume the connection is
useless 10s before the server actually closes it. Not ideal, but it
should significantly improve the situation.
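As a sketch, lowering the client-side idle timeout could look like the following. The 20s value is just the figure from the paragraph above and should sit below whatever your server uses. `newTransport`, `newClient` and `clientIdleTimeout` are illustrative names, and the code assumes `http.DefaultTransport` has not been replaced with a different type.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// clientIdleTimeout is deliberately shorter than the server's keep-alive
// timeout (30s in the example above), so the client gives up on idle
// connections before the server closes them.
const clientIdleTimeout = 20 * time.Second

// newTransport copies the default transport settings and only lowers
// IdleConnTimeout.
func newTransport() *http.Transport {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.IdleConnTimeout = clientIdleTimeout
	return tr
}

// newClient returns a client that uses the adjusted transport.
func newClient() *http.Client {
	return &http.Client{Transport: newTransport()}
}

func main() {
	c := newClient()
	fmt.Println("idle timeout:", c.Transport.(*http.Transport).IdleConnTimeout)
}
```

The client should be created once and shared. A new transport per request would give up connection reuse altogether.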
IMO the only real solution, though, is to make requests idempotent, e.g.
by adding a unique request ID. It's a lot of work and only really
effective if it's propagated all the way down. But it's still easier to
achieve than exactly-once-delivery :)
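A minimal sketch of the request ID idea, assuming the receiving side can keep track of the IDs it has already handled. Everything here is hypothetical: `newRequestID` and `dedupe` are invented names, and the in-memory map is only a stand-in for whatever durable store a real backend would need.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// newRequestID returns a random 128-bit ID as 32 hex characters. The client
// would generate it once per logical request and reuse it on every retry.
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// dedupe remembers the result produced for each request ID. It lives in
// memory only, so it does not survive a restart of the process.
type dedupe struct {
	mu   sync.Mutex
	seen map[string]string
}

func newDedupe() *dedupe {
	return &dedupe{seen: make(map[string]string)}
}

// Do runs handle at most once per id; repeated calls with the same id
// return the stored result instead of repeating the side effect.
// Holding the lock while handle runs keeps the sketch simple but
// serializes all requests.
func (d *dedupe) Do(id string, handle func() string) string {
	d.mu.Lock()
	defer d.mu.Unlock()
	if res, ok := d.seen[id]; ok {
		return res
	}
	res := handle()
	d.seen[id] = res
	return res
}

func main() {
	id, err := newRequestID()
	if err != nil {
		panic(err)
	}
	d := newDedupe()
	executed := 0
	for attempt := 0; attempt < 3; attempt++ {
		res := d.Do(id, func() string { executed++; return "accepted" })
		fmt.Println("attempt", attempt, "->", res)
	}
	fmt.Println("side effect executed", executed, "time(s)")
}
```

The lookup has to happen wherever the side effect actually takes place, which is what makes this only effective when the ID is propagated all the way down.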
--
Gregor Best
b...@pferdewetten.de
--
You received this message because you are subscribed to the Google
Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/a31d42a5-6a81-0579-a380-b268d10f4eb0%40pferdewetten.de.
--
Gregor Best
b...@pferdewetten.de