Hi Willy,

On Thu, 28 Mar 2024 04:37:11 +0100 Willy Tarreau <w...@1wt.eu> wrote:
> Thanks guys! So there seems to be an annoying bug. However I'm not sure
> how this is related to your "connection refused", except if you try to
> connect at the moment the process crashes and restarts, of course. I'm
> seeing that the bug here is stktable_requeue_exp() calling task_queue()
> with an invalid task expiration. I'm having a look now. I'll respond in
> the issue with what I can find, thanks for your report.

These "connection refused" errors come from our watchdog, but the effects
are just as perceptible from the outside. When our watchdog hits this
situation, it forcefully restarts HAProxy (we have 2 instances), because
otherwise there would be considerable service degradation.

If you remember, there's https://github.com/haproxy/haproxy/issues/1895
and we talked briefly about this in person, at HAProxyConf.

But this is incredibly elusive to reproduce; it comes and goes. It might
happen every few minutes, or not happen at all for months. It is not tied
to a specific setup: it shows up across different versions, kernels, and
machines. In fact, we have no better way to detect the situation, at least
none as fast, reactive, and resilient.

>
> Since you were speaking about FD count and maxconn at 900k, please let
> me take this opportunity for a few extra sanity checks. By default we
> assign up to about 50% of the FD to pipes (i.e. up to 25% pipes compared
> to connections), so if maxconn is 900k you can reach 1800 + 900 = 2700k
> FD. One thing to keep in mind is that /proc/sys/fs/nr_open sets a
> per-process hard limit and usually is set to 1M, and that
> /proc/sys/fs/file-max sets a system-wide limit and depends on the amount
> of RAM, so both may interact with such a large setting. We could for
> example imagine that at ~256k connections with as many pipes you're
> reaching around 1M FDs and that the connection from socat to the CLI
> socket cannot be accepted and is rejected.
> Since you recently updated
> your kernel, it might be worth checking if the default values are still
> in line with your usage.

We set our defaults pretty high in anticipation:

    /proc/sys/fs/file-max = 5M
    /proc/sys/fs/nr_open = 3M

Even with our software stack, we do not reach the limits. A long time ago
we did hit them (the limits were lower back then), and the effects were
devastating.

Cheers,

--
Ricardo Nabinger Sanchez https://www.taghos.com.br/
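
As a rough sanity check of the figures discussed in this thread, here is a minimal sketch that compares an estimated FD count against the two kernel limits. It takes the 2700k estimate quoted for maxconn = 900k as given, and assumes "5M" and "3M" mean 5,000,000 and 3,000,000. The names `read_limit` and `fd_headroom` are hypothetical helpers, not part of HAProxy or any library.

```python
from pathlib import Path


def read_limit(path: str) -> int:
    """Read one integer limit from a procfs file (Linux only)."""
    return int(Path(path).read_text().split()[0])


def fd_headroom(estimated_fds: int, nr_open: int, file_max: int) -> dict:
    """Compare an FD estimate with the per-process and system-wide limits."""
    return {
        # Margin left under the per-process hard limit (fs.nr_open).
        "per_process": nr_open - estimated_fds,
        # Margin left under the system-wide limit (fs.file-max).
        "system_wide": file_max - estimated_fds,
        # True only if the estimate stays below both limits.
        "fits": estimated_fds < nr_open and estimated_fds < file_max,
    }


# Figures taken from this thread (assumption: M means 10**6).
ESTIMATED_FDS = 2_700_000  # quoted estimate for maxconn = 900k
report = fd_headroom(ESTIMATED_FDS, nr_open=3_000_000, file_max=5_000_000)
print(report)
```

On a live Linux host, the hard-coded limits could be replaced with `read_limit("/proc/sys/fs/nr_open")` and `read_limit("/proc/sys/fs/file-max")` to check the values actually in effect after a kernel update.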