Hi Maciej,

On Tue, Mar 21, 2023 at 11:14:48AM +0100, Maciej Zdeb wrote:
> Hi,
>
> I'm observing a strange issue with haproxy 2.4.22 (but it was also on
> previous versions).
>
> I have set maxconn to 200000 in global and defaults configuration section
> and with following configuration
>
> frontend front
>     mode http
>     option http-keep-alive
>
>     bind 10.0.0.10:443 ssl crt /etc/cert/crt.pem alpn h2,http/1.1 process 1/1
>     bind 10.0.0.10:443 ssl crt /etc/cert/crt.pem alpn h2,http/1.1 process 1/2
>     ...
>     bind 10.0.0.10:443 ssl crt /etc/cert/crt.pem alpn h2,http/1.1 process 1/20
>     default_backend back
>
> backend back
>     option http-keep-alive
>     mode http
>     http-reuse always
>     option httpchk GET /health HTTP/1.0\r\nHost:\ ttt.local
>     http-check expect string OK
>     timeout queue 1s
>     default-server maxconn 2000
>
>     default-server resolve-prefer ipv4 resolvers default-dns
>     server slot_0_checker 10.0.0.82:31011 check weight 0 disabled cookie
> slot_0_checker
>     server slot_1_checker 10.0.0.236:31011 check weight 0 disabled cookie
> slot_1_checker
>     server slot_0_0 10.0.0.82:31011 source ${SNAT_741_0} track slot_0_checker
> weight 50 disabled cookie slot_0_0
>     server slot_1_0 10.0.0.236:31011 source ${SNAT_741_0} track
> slot_1_checker weight 51 disabled cookie slot_1_0
>
> I'm experiencing a situation in which clients cannot connect (termination
> state CQ-- or sQ--) which is expected when the traffic is high (maxconn
> 2000 on each server) and HAProxy is using much more CPU. However after such
> an event when traffic is lower or when I cut off the traffic completely
> (cpu idle is almost 100%) I still cannot connect to the proxy. Termination
> state is still sQ-- and I receive 503 in response while the stats page and
> CLI reports that there are no connections to proxy.

Just to be clear on these last few points: when you say you cannot
connect, you mean in fact that the connection to haproxy is established
but your request cannot reach the server, right? A 503 will indeed
indicate a failure to find a server or a connection that died in the
queue. Does your stats page indicate that there are still connections
in the queue for the servers or the backend?

A test could be useful: force the LB algorithm to something
deterministic (e.g. "balance source"). If, for any reason, there's an
issue with the backend's queue showing remaining entries, subsequent
requests will go directly to the queue as well. With deterministic
algorithms there isn't this shortcut since we don't know upfront where
to go, so that could give us a hint about what is blocking.

> Am I missing something or is it a bug?

At first glance, from your description, it sounds like a bug. Your queue
timeout is small enough that it should be possible to eliminate all
pending requests and try to connect to the servers again. At least this
should be reflected in the stats page, and from what you're saying it
doesn't even seem to be the case.

Willy
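
For reference, a minimal sketch of the test suggested above. It assumes
the rest of the "back" backend stays exactly as posted; the only new
line is the "balance" directive, and the server lines are left out here
for brevity:

    backend back
        mode http
        # Assumption, for the test only: hash on the client source address
        # instead of the default roundrobin algorithm.
        balance source
        timeout queue 1s
        default-server maxconn 2000

Comparing the queue counters (the qcur field on the stats page or in
the output of "show stat" on the CLI) before and after this change
should help tell whether leftover queue entries are what keeps new
requests from reaching the servers.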