On Tue, Mar 21, 2023 at 02:26:03PM +0100, Maciej Zdeb wrote:
> On Tue, 21 Mar 2023 at 11:39, Willy Tarreau <w...@1wt.eu> wrote:
>
> > Just to be clear on these last few points, when you say you cannot
> > connect, you mean in fact that the connection establishes to haproxy
> > but your request cannot reach the server, right? 503 will indeed
> > indicate a failure to find a server or a connection that died in the
> > queue.
>
> Correct! TCP connection from client to haproxy is established and haproxy
> returns 503 (termination state sQ--).

OK.

> > A test could be useful, to force the LB algorithm to something
> > deterministic (e.g. "balance source").
>
> I will check if it is possible to balance by source. This frontend serves
> around 8000 rps and I'm not sure if changing "ratio" algorithm to "balance
> source" won't cause any troubles (with customers behind huge NAT).

Yeah I definitely understand. Or maybe you can divert a small ratio of them
to a purposely created backend from the frontend? Rest assured I'm not
encouraging you to degrade your config, just trying to find ideas to isolate
the issue.

> Sample log of request that was made AFTER the traffic switch to another
> data-center.
>
> {
>   "_source": {
>     "status_code": "503",
>     "time_queue": "1001",      <- timeout queue 1s
>     "time_sess_tot": "1012",
>     "ssl_cipher": "TLS_AES_256_GCM_SHA384",
>     "memb_conc_conn": "0",     <- %sc (server concurrent connections)

This one is normal since no server was selected ("<NOSRV>" below) due to a
non-deterministic algorithm and no cookie, hence the first available server
will be picked when an entry gets dequeued.

What you *seem* to be missing in these logs is "%bc" and "%bq" to see the
state of the backend itself. My suspicion is that %bq is not zero despite
%bc being zero. I don't see why, but this part is extremely tricky, and
maybe we have a tiny race there with some entries never being dequeued
(though the timeout should take care of them normally).

Willy
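
For illustration only, the two ideas above (diverting a small share of
requests to a dedicated backend that uses "balance source", and logging the
backend-level counters) might look roughly like the fragment below. This is
a sketch, not the actual configuration: the frontend, backend and server
names, the addresses, the 1% ratio and the plain-text log layout are all
assumptions and would need to be adapted.

```haproxy
# Sketch only: every name, address and ratio below is hypothetical.

frontend fe_main
    # (existing bind/mode/ssl settings stay as they are)

    # Log backend-level state next to the server-level one:
    #   %Tw = time spent in queue, %ts = termination state,
    #   %bc = backend concurrent connections, %bq = backend queue length,
    #   %sc = server concurrent connections,  %sq = server queue length
    log-format "%ci:%cp [%tr] %ft %b/%s %ST %Tw/%Tt %ts bc=%bc bq=%bq sc=%sc sq=%sq"

    # Divert roughly 1 request out of 100 to the test backend.
    # rand(100) returns an integer between 0 and 99.
    use_backend bk_test_source if { rand(100) lt 1 }
    default_backend bk_main

backend bk_test_source
    # Same servers as the main backend, but a deterministic algorithm.
    balance source
    timeout queue 1s
    server srv1 192.0.2.11:80 check
    server srv2 192.0.2.12:80 check
```

With an existing JSON log-format, only the %bc and %bq fields would need to
be added next to the current %sc one. A 503 logged with sQ-- where bq is
non-zero while bc is zero would match the suspicion described above.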