Hi,

I've encountered a situation where HAProxy does not fail over from a server it has marked as DOWN to a backup server it has marked as UP. I have managed to reproduce this consistently in a test environment; here is the (I hope) relevant configuration:
```
defaults http_defaults_1
    mode http
    hash-type consistent
    hash-balance-factor 150
    maxconn 4096
    http-reuse safe
    option abortonclose
    option allbackups
    timeout check 2s
    timeout connect 15s
    timeout client 30s
    timeout server 30s

backend server-backend from http_defaults_1
    balance roundrobin
    # We keep at the default because retrying anything else risks duplicating events on these servers
    retry-on conn-failure
    server server-006 fd8a...0006:8080 maxconn 32 check inter 250 alpn h2
    server server-012 fd8a...0012:8080 maxconn 32 check backup alpn h2
```

This is with HAProxy 3.0.3-95a607c running on a VPS with 16GB RAM (we have seen the same issue on a dedicated server with 64GB), running Ubuntu 24.04.1 with the default net.ipv4.tcp_retries2 = 15, net.ipv4.tcp_syn_retries = 6, and a tcp_fin_timeout of 60s (these also apply to IPv6 connections). CPU usage stays under 20%.

Once I have a small load running (20 req/sec), if I make port 8080 on server-006 temporarily unavailable by restarting the service on it, HAProxy logs the transition of server-006 to DOWN (the stats socket and the server_check_failure metric show the same) and server-012 picks up requests as expected, with no 5xx errors recorded.

However, if I instead kill the server-006 machine (so that a TCP health check to it with `nc` fails with a timeout rather than a connection refused), the server is marked DOWN as before, but every request coming into HAProxy for that backend returns a 5xx error to the client after 15s (the `timeout connect`), and server-012 does not receive any requests despite showing as UP on the stats socket. This "not failed over" state of 100% 5xx errors goes on for minutes, sometimes hours, and how long seems to depend on the load. Reducing the load to a few requests a minute avoids the issue, and dropping the load while in the "not failed over" state also clears it.

I would have expected the <=32 in-flight requests to be redispatched to server-012 as soon as server-006 was marked DOWN, and the remaining requests (up to the maxconn of 4096) to be held in the frontend queue until the in-flight ones finished, but understandably things get more complicated when you consider timeouts.

The haproxy_backend_current_queue metric, which is normally 0 at this load, jumps to around 200 during the issue and drops straight back to 0 when it resolves; there is no draining period. Looking at `ss -6tnp` during the issue, there are only SYN-SENT connections to both servers, no ESTAB until the issue resolves. I tried dropping net.ipv4.tcp_retries2 down to 3 and tcp_syn_retries to 2, but this made no difference.

Hoping someone can point me in the right direction to understand what is happening here.

Thanks,
Miles
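P.S. For completeness, this is roughly how I applied the sysctl changes mentioned above while testing (commands reproduced from memory, so treat them as approximate):

```
# temporarily lower TCP retransmission limits on the HAProxy host
sudo sysctl -w net.ipv4.tcp_retries2=3
sudo sysctl -w net.ipv4.tcp_syn_retries=2
```

and this is how I was checking the server states during the issue (the socket path is from my test box, yours will differ):

```
# query server states for the backend via the runtime API
echo "show servers state server-backend" | socat stdio /var/run/haproxy/admin.sock
```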