On 26/07/2024 at 19:51, Jens Wahnes wrote:
Hi everyone,
I'm trying to move from Haproxy 2.4 to 2.8 and encountered some trouble
with 2.8 that did not occur on 2.4.
Specifically, this seems to concern plain HTTP connections only, i.e.
non-HTTPS traffic. I have not seen an example of this happening with
HTTPS connections. However, it could also be that it is specific to
HTTP/1 as opposed to HTTP/2 (we do not have any unencrypted HTTP/2
connections).
Some of the backend connections are not properly closed with Haproxy 2.8
and keep lingering for hours or days. That is, the client connection is
gone, but the backend connection is still there in state CLOSE_WAIT.
This in turn causes trouble with seamless reloads: the old Haproxy
processes never terminate and as several newer ones pile up, they take
up memory.
I have not experienced this with Haproxy 2.4.27: on 2.4.27, these
backend TCP connections were properly closed. It only became a
problem with newer versions like 2.8.10.
To give an example, a connection stuck in CLOSE_WAIT may look like this
in `ss` output:
```
CLOSE-WAIT 0      0      172.16.240.53:48528   172.16.240.86:2000   users:(("haproxy",pid=1099145,fd=239))
```
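To spot such lingering connections without root privileges, here is a minimal sketch that reads `/proc/net/tcp` directly (Linux-only, IPv4 only; `ss -tnp state close-wait` gives the same view plus process names). This is an illustrative helper, not part of Haproxy:

```python
CLOSE_WAIT = "08"  # TCP state code the kernel uses in /proc/net/tcp

def hex_to_addr(hex_pair: str) -> str:
    """Convert the kernel's 'AABBCCDD:PPPP' notation to 'a.b.c.d:port'.
    IPv4 addresses in /proc/net/tcp are stored little-endian per byte."""
    ip_hex, port_hex = hex_pair.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in range(6, -2, -2)]
    return ".".join(octets) + ":" + str(int(port_hex, 16))

def close_wait_sockets(path: str = "/proc/net/tcp"):
    """Return (local, remote) address pairs of sockets in CLOSE_WAIT."""
    stuck = []
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local, remote, state = fields[1], fields[2], fields[3]
            if state == CLOSE_WAIT:
                stuck.append((hex_to_addr(local), hex_to_addr(remote)))
    return stuck

if __name__ == "__main__":
    for local, remote in close_wait_sockets():
        print(f"CLOSE_WAIT {local} -> {remote}")
```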
In Haproxy's own `show sess`, the above example would look like this:
```
0x7efc6b7b5400: [26/Jul/2024:18:24:50.308214] id=237988 proto=tcpv4
source=10.22.113.130:50205
flags=0x53c0a, conn_retries=0, conn_exp=<NEVER> conn_et=0x000
srv_conn=(nil), pend_pos=(nil) waiting=0 epoch=0
frontend=Loadbalancer (id=48 mode=http), listener=http (id=5)
addr=10.210.18.56:80
backend=_std_ (id=3 mode=http) addr=172.16.240.53:48528
server=hermes (id=1) addr=172.16.240.86:2000
task=0x7efc72205ec0 (state=0x00 nice=0 calls=6 rate=0 exp=<NEVER>
tid=1(1/1) age=17m41s)
txn=0x7efc6bf72aa0 flags=0x0 meth=1 status=403 req.st=MSG_CLOSED
rsp.st=MSG_CLOSING req.f=0x44 rsp.f=0x0d
scf=0x7efc7233e8a0 flags=0x00030406 state=CLO
endp=CONN,0x7efc6d080ec0,0x04014a01 sub=0 rex=<NEVER> wex=<NEVER>
h1s=0x7efc6d080ec0 h1s.flg=0x104010 .sd.flg=0x4014a01
.req.state=MSG_DONE .res.state=MSG_DATA
.meth=GET status=403 .sd.flg=0x04014a01 .sc.flg=0x00030406
.sc.app=0x7efc6b7b5400 .subs=(nil)
h1c=0x7efc6d5f5d40 h1c.flg=0x20d00 .sub=0 .ibuf=0@(nil)+0/0
.obuf=0@(nil)+0/0 .task=0x7efc6d004360 .exp=<NEVER>
co0=0x7efc7230f140 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM
target=LISTENER:0x7efc712d5e00
flags=0x801c0300 fd=194 fd.state=1c22 updt=0 fd.tmask=0x2
scb=0x7efc6b7c85e0 flags=0x0003501b state=CLO
endp=CONN,0x7efc6d095bc0,0x05004a01 sub=0 rex=<NEVER> wex=<NEVER>
h1s=0x7efc6d095bc0 h1s.flg=0x4010 .sd.flg=0x5004a01
.req.state=MSG_DONE .res.state=MSG_DONE
.meth=GET status=403 .sd.flg=0x05004a01 .sc.flg=0x0003501b
.sc.app=0x7efc6b7b5400 .subs=(nil)
h1c=0x7efc6d087280 h1c.flg=0x80000100 .sub=0 .ibuf=0@(nil)+0/0
.obuf=0@(nil)+0/0 .task=0x7efc6d052500 .exp=<NEVER>
co1=0x7efc723ede00 ctrl=tcpv4 xprt=RAW mux=H1 data=STRM
target=SERVER:0x7efc71e48000
flags=0x00040300 fd=239 fd.state=1122 updt=0 fd.tmask=0x2
req=0x7efc6b7b5420 (f=0x848000 an=0x40000 pipe=0 tofwd=0 total=113)
an_exp=<NEVER> buf=0x7efc6b7b5428 data=(nil) o=0 p=0 i=0 size=0
htx=0x8c65c0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
res=0x7efc6b7b5470 (f=0xa0240000 an=0x24000000 pipe=2310 tofwd=0
total=25601)
an_exp=<NEVER> buf=0x7efc6b7b5478 data=(nil) o=0 p=0 i=0 size=0
htx=0x8c65c0 flags=0x0 size=0 data=0 used=0 wrap=NO extra=0
```
Issuing a `shutdown session` command does not resolve the issue (e.g.
`shutdown session 0x7efc6b7b5400` in this example). So far, killing the
whole process has been the only way for me to get rid of these leftover
connections.
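For reference, commands like `show sess` and `shutdown session` above are issued over Haproxy's stats socket, e.g. with `echo "show sess" | socat stdio /run/haproxy/admin.sock`. The same can be scripted; a minimal sketch (the socket path is an assumption, use whatever your `stats socket` configuration line specifies):

```python
import socket

def haproxy_cmd(cmd: str, sock_path: str = "/run/haproxy/admin.sock") -> str:
    """Send one command to Haproxy's stats socket and return the raw
    text reply. sock_path is an assumed default; adjust it to match
    the 'stats socket' line in your configuration."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall((cmd + "\n").encode())
        chunks = []
        while data := s.recv(4096):  # Haproxy closes after replying
            chunks.append(data)
    return b"".join(chunks).decode()

# e.g. haproxy_cmd("show sess")
#      haproxy_cmd("shutdown session 0x7efc6b7b5400")
```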
So my question is: does this ring a bell for anyone? Maybe something
that has been fixed in newer versions (like 3.0)?
Thanks, I will investigate. It is indeed most probably an issue with
splicing, as Willy said. I will try to find the bug in 2.8 and figure out
whether later versions are affected too.
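If splicing is indeed involved (note the non-zero `pipe=2310` on the `res` channel in the dump above), one possible interim workaround, an assumption rather than a confirmed fix, is to disable kernel splicing globally with the `nosplice` keyword, or to remove any `option splice-*` directives:

```
global
    nosplice    # disable splice() usage entirely (workaround sketch)
```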
--
Christopher Faulet