Hi Valters,

On Sun, Sep 17, 2023 at 04:25:48AM +0300, Valters Jansons wrote:
> But, gRPC is essentially a framework (binary encoding and custom
> headers) for object-oriented HTTP/2. My observed issue is an HTTP/2
> processing issue by HAProxy, when the frontend client doesn't send an
> END_STREAM flag.

Yep, that's what I understood (i.e. an unfinished upload).
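
For reference, a minimal sketch of such a client (an assumption on my
side: Go with the golang.org/x/net/http2 and hpack packages; the
address, path and body are placeholders) would open a stream, send
HEADERS and one DATA frame, and then simply never set END_STREAM:

// Minimal sketch of an "unfinished upload": the client opens a stream,
// sends HEADERS plus one DATA frame, and never sets END_STREAM.
// Note: for brevity this sketch does not ACK the server's SETTINGS,
// so a strict peer may eventually time the connection out.
package main

import (
	"bytes"
	"log"
	"net"
	"time"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/hpack"
)

func main() {
	// "localhost:8080" stands in for the HAProxy h2c frontend.
	conn, err := net.Dial("tcp", "localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// HTTP/2 connection preface followed by an empty SETTINGS frame.
	if _, err := conn.Write([]byte(http2.ClientPreface)); err != nil {
		log.Fatal(err)
	}
	framer := http2.NewFramer(conn, conn)
	if err := framer.WriteSettings(); err != nil {
		log.Fatal(err)
	}

	// HPACK-encode the request headers.
	var hbuf bytes.Buffer
	enc := hpack.NewEncoder(&hbuf)
	for _, hf := range []hpack.HeaderField{
		{Name: ":method", Value: "POST"},
		{Name: ":scheme", Value: "http"},
		{Name: ":authority", Value: "localhost"},
		{Name: ":path", Value: "/upload"},
	} {
		enc.WriteField(hf)
	}

	// HEADERS without END_STREAM: a request body is announced...
	framer.WriteHeaders(http2.HeadersFrameParam{
		StreamID:      1,
		BlockFragment: hbuf.Bytes(),
		EndHeaders:    true,
		EndStream:     false,
	})
	// ...one DATA frame follows, still without END_STREAM, then nothing.
	framer.WriteData(1, false, []byte("partial body"))

	// Read whatever comes back (SETTINGS, HEADERS, DATA, RST_STREAM)
	// while the request stays half-open on our side.
	conn.SetReadDeadline(time.Now().Add(5 * time.Second))
	for {
		f, err := framer.ReadFrame()
		if err != nil {
			log.Printf("read: %v", err)
			return
		}
		log.Printf("got %T: %+v", f, f.Header())
	}
}

With such a client the request stays half-open on the frontend side
while the backend may already have produced its complete response,
which is the situation discussed below.
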
> When I implement a similar scenario in other languages, the failure
> rate is much lower. For example, I pushed a custom Java client to
> https://github.com/sigv/grpcopen/tree/main/java that I was testing
> with nginx (same HAProxy configuration as my other proof). When this
> request succeeds, however, the session state is reported SD--. The
> gRPC example to me is simply the most reliable failure, as there is
> some timing involved, which I can't seem to yet figure out fully.
>
> To highlight, the H2 standard (RFC 7540), section 8.1, states "the
> last frame in the sequence bears an END_STREAM flag". It is however
> never clarified whether that is a "MUST" or a "SHOULD". The client
> could be argued to be non-compliant, assuming it is intended as
> "MUST". Maybe this is the reason why the h2spec project does not
> cover such a scenario ("who would do such a thing").
>
> What is even more important for this issue, though, is the last
> paragraph of the same section:
>
>    An HTTP response is complete after the server sends -- or the
>    client receives -- a frame with the END_STREAM flag set (including
>    any CONTINUATION frames needed to complete a header block). A
>    server can send a complete response prior to the client sending an
>    entire request if the response does not depend on any portion of
>    the request that has not been sent and received. When this is
>    true, a server MAY request that the client abort transmission of a
>    request without error by sending a RST_STREAM with an error code
>    of NO_ERROR after sending a complete response (i.e., a frame with
>    the END_STREAM flag). Clients MUST NOT discard responses as a
>    result of receiving such a RST_STREAM, though clients can always
>    discard responses at their discretion for other reasons.
>
> To paraphrase, when the backend server sends END_STREAM, the response
> is complete and the server is not interested in the request data
> anymore. The END_STREAM is a success in the eyes of the H2 spec, and
> "clients MUST NOT discard responses as a result" (that is, HAProxy is
> expected to send the response to the client upon END_STREAM). HAProxy
> should not treat it as an abrupt close. When the backend server sends
> RST_STREAM(NO_ERROR) to abort the request, HAProxy must simply stop
> sending data to the backend server.

I agree with this, and that's exactly what we're trying to achieve with
Christopher's patch. The problem is not just in H2 but at the upper
layer, where we're trying to forward the client body to the server and
end up on a closed stream. This is an error at that layer (akin to the
broken pipe when sending to a closed pipe or socket). The issue is that
when we face it, we immediately interrupt processing and deliver a 502
to the client, regardless of what response might be pending in the
response channel.

The change Christopher proposed consists in not failing the send if
there's anything pending in the response channel. Apparently, based on
your tests, it's still not sufficient, so we need to figure out why.
We're particularly interested in finding the root cause instead of just
papering over it, because such issues can also cause truncated
responses and similar problems in other rare cases, so it's important
to understand the whole sequence of events.
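
To make that wire sequence concrete, the backend side boils down to the
following sketch (again assuming Go's golang.org/x/net/http2 and hpack
packages; respondEarly, framer and streamID are hypothetical names, not
anything taken from HAProxy): the response is completed first, then
RST_STREAM(NO_ERROR) asks the client to stop uploading.

// Sketch of the server-side sequence from RFC 7540 section 8.1:
// complete the response before the request has ended, then abort the
// request "without error".
package h2sketch

import (
	"bytes"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/hpack"
)

// respondEarly assumes framer wraps an established HTTP/2 connection
// and streamID identifies a request stream that is still open on the
// client side (no END_STREAM received yet).
func respondEarly(framer *http2.Framer, streamID uint32) error {
	var hbuf bytes.Buffer
	enc := hpack.NewEncoder(&hbuf)
	enc.WriteField(hpack.HeaderField{Name: ":status", Value: "200"})

	// Complete response: HEADERS plus a final DATA frame with END_STREAM.
	if err := framer.WriteHeaders(http2.HeadersFrameParam{
		StreamID:      streamID,
		BlockFragment: hbuf.Bytes(),
		EndHeaders:    true,
	}); err != nil {
		return err
	}
	if err := framer.WriteData(streamID, true, []byte("done")); err != nil {
		return err
	}

	// The response is complete; NO_ERROR means "abort the request
	// without error", so the peer is expected to keep the response,
	// not discard it.
	return framer.WriteRSTStream(streamID, http2.ErrCodeNo)
}

It is exactly at this point that the proxy may still have client body
queued for the now-reset backend stream, which is the "write to a
closed stream" error mentioned above, and why the fix has to make sure
the pending response wins over that error.
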
> I understand there is some complexity with the mux, and tying H1 into
> it. But the H2 spec says the exchange is successful, and the observed
> half-close is okay.

In the mux it is totally OK (unless there's a rare case where we
destroy data there, but I don't see any case where this can happen).
The issue is related to the sequencing of analysers at the upper layer,
where we always process the request first and the response second.
Christopher found that your reproducer was working for him with the
patch, but it's very likely timing-dependent, so we'll need to continue
to analyze this.

Thanks!
Willy

