Le 29/07/2024 à 16:30, Jens Wahnes a écrit :
Christopher Faulet wrote:
Le 29/07/2024 à 09:05, Christopher Faulet a écrit :
Thanks, I will investigate. It is indeed most probably an issue with the
splicing, as Willy said. I will try to find the bug on the 2.8 and
figure out if
upper versions are affected too.
I'm able to reproduce the issue by hacking the code, forcing a
connection error by hand. It occurs when an error is reported on the
connection when haproxy tries to send data using kernel splicing. But it
is only an issue when a filter is attached to the applicative stream. I
guess you have enabled the HTTP compression. The response is not
compressed of course, otherwise the kernel splicing would not be used.
But it is still attached to the stream and it has an effect in this case.
AFAIK, the older versions are not affected. On newer versions, I don't
really know. There is an issue with my hack, but timeouts are still
active and a true client abort is properly detected. So, I'm inclined to
think there is no issue on these versions. But my fix will probably be
applicable too.
I'm on the fix. I must test when this happens on server side, to be
sure. But it should be fixed soon.
Thank you for the update.
My results so far: Everything is fine on 2.8.10 without splicing.
On 3.0.3 with splicing turned on, I have also not seen any lingering
sessions, but I have only been running version 3.0.3 for some hours now,
so this could still happen. I'll rather let it run for some more time
before drawing conclusions.
Thanks for the confirmation. On 3.0, I was unable to reproduce the issue. So I'm
not surprised.
I pushed some patches that should fix your issue. They cannot be applied as-is
on the 2.8. You can use attached patches for the 2.8 if you want to try. It
could help to be sure they properly fix your issue.
--
Christopher Faulet
From 681f19beaaf95872f7e0a5e1442546571268472c Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Mon, 29 Jul 2024 17:48:16 +0200
Subject: [PATCH 1/3] BUG/MEDIUM: stconn: Report error on SC on send if a
previous SE error was set
When a send on a connection is performed, if a SE error (or a pending error)
was already reported earlier, we leave immediately. No send is performed.
However, we must be sure to report the error at the SC level if necessary.
Indeed, the SE error may have been reported during the zero-copy data
forwarding. So during receive on the opposite side. In that case, we may
have missed the opportunity to report it at the SC level.
The patch must be backported as far as 2.8.
---
src/stconn.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/stconn.c b/src/stconn.c
index 20a1d9b0b2..2df175e7a4 100644
--- a/src/stconn.c
+++ b/src/stconn.c
@@ -1586,6 +1586,8 @@ static int sc_conn_send(struct stconn *sc)
if (sc->state < SC_ST_CON)
return 0;
BUG_ON(sc_ep_test(sc, SE_FL_EOS|SE_FL_ERROR|SE_FL_ERR_PENDING) == (SE_FL_EOS|SE_FL_ERR_PENDING));
+ if (sc_ep_test(sc, SE_FL_ERROR))
+ sc->flags |= SC_FL_ERROR;
return 1;
}
--
2.45.2
From 7e6bf5ea450b034bc090fd662fd5057f428c0f63 Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Tue, 30 Jul 2024 08:41:03 +0200
Subject: [PATCH 2/3] BUG/MEDIUM: mux-pt/mux-h1: Release the pipe on connection
error on sending path
When data are sent using the kernel splicing, if a connection error
occurred, the pipe must be released. Indeed, in that case, no more data can
be sent and there is no reason to not release the pipe. But it is in fact an
issue for the stream because the channel will appear are not empty. This may
prevent the stream to be released. This happens on 2.8 when a filter is also
attached on it. On 2.9 and upper, it seems there is not issue. But it is
hard to be sure and the current patch remains valid is all cases. On 2.6 and
lower, the code is not the same and, AFAIK, there is no issue.
This patch must be backported to 2.8. However, on 2.8, there is no zero-copy
data forwarding. The patch must be adapted. There is no done_ff/resume_ff
callback functions for muxes. The pipe must released in sc_conn_send() when
an error flag is set on the SE, after the call to snd_pipe callback
function.
---
src/stconn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/stconn.c b/src/stconn.c
index 2df175e7a4..ebe05bc585 100644
--- a/src/stconn.c
+++ b/src/stconn.c
@@ -1608,7 +1608,7 @@ static int sc_conn_send(struct stconn *sc)
if (ret > 0)
did_send = 1;
- if (!oc->pipe->data) {
+ if (!oc->pipe->data || sc_ep_test(sc, SE_FL_ERROR | SE_FL_ERR_PENDING)) {
put_pipe(oc->pipe);
oc->pipe = NULL;
}
--
2.45.2
From 13e3f3b91a1f7f7da7d8eaad371d501b7c76530f Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Tue, 30 Jul 2024 09:28:23 +0200
Subject: [PATCH 3/3] BUG/MINOR: stconn: Request to send something to be woken
up when the pipe is full
After a receive via the kernel splicing, if the pipe is full, more room is
requested. But we must be sure something was sent to wake the SC up.
Otherwise, if sends are skipped, because there was a connection error for
instance, some extra wakeups may be experienced. On newer versions, this fix
was included by chance when the zero-copy data forwarding was introduced. On
older versions, the API is not the same when the issue does not exist.
It is 2.8-specific. There is no upstream commit ID. And no backport needed.
---
src/stconn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/stconn.c b/src/stconn.c
index ebe05bc585..777f47b9d1 100644
--- a/src/stconn.c
+++ b/src/stconn.c
@@ -1305,7 +1305,7 @@ static int sc_conn_recv(struct stconn *sc)
/* the pipe is full or we have read enough data that it
* could soon be full. Let's stop before needing to poll.
*/
- sc_need_room(sc, 0);
+ sc_need_room(sc, -1);
goto done_recv;
}
--
2.45.2