Hi Willy,

On Thu, 28 Mar 2024 04:37:11 +0100
Willy Tarreau <w...@1wt.eu> wrote:

> Thanks guys! So there seems to be an annoying bug. However I'm not sure
> how this is related to your "connection refused", except if you try to
> connect at the moment the process crashes and restarts, of course. I'm
> seeing that the bug here is stktable_requeue_exp() calling task_queue()
> with an invalid task expiration. I'm having a look now. I'll respond in
> the issue with what I can find, thanks for your report.

These "connection refused" errors come from our watchdog, but the effects
are also perceptible from the outside.  When our watchdog hits this
situation, it forcefully restarts HAProxy (we run 2 instances), because
otherwise there is considerable service degradation.  As you may
remember, there's https://github.com/haproxy/haproxy/issues/1895, and we
talked briefly about this in person at HAProxyConf.

But this is incredibly elusive to reproduce; it comes and goes.  It
might happen every few minutes, or not at all for months.  It is not
tied to a specific setup either: we have seen it across different
versions, kernels, and machines.  In fact, we have no better way to
detect the situation, at least none as fast, reactive, and resilient.


> 
> Since you were speaking about FD count and maxconn at 900k, please let
> me take this opportunity for a few extra sanity checks. By default we
> assign up to about 50% of the FD to pipes (i.e. up to 25% pipes compared
> to connections), so if maxconn is 900k you can reach 1800 + 900 = 2700k
> FD. One thing to keep in mind is that /proc/sys/fs/nr_open sets a
> per-process hard limit and usually is set to 1M, and that
> /proc/sys/fs/file-max sets a system-wide limit and depends on the amount
> of RAM, so both may interact with such a large setting. We could for
> example imagine that at ~256k connections with as many pipes you're
> reaching around 1M FDs and that the connection from socat to the CLI
> socket cannot be accepted and is rejected. Since you recently updated
> your kernel, it might be worth checking if the default values are still
> in line with your usage.

We set our defaults pretty high in anticipation:

        /proc/sys/fs/file-max = 5M;
        /proc/sys/fs/nr_open = 3M;

Even counting the rest of our software stack, we do not reach these
limits.  A long time ago we did hit them (the limits were lower back
then), and the effects were devastating.
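
For anyone wanting to repeat the sanity check, here is a minimal sketch
that mirrors the back-of-the-envelope arithmetic from your message
(2 FDs per connection, plus up to ~50% more for pipes, so 900k maxconn
gives 1800k + 900k = 2700k FDs) and compares it with fs.nr_open and
fs.file-max.  The function names are hypothetical, and the formula is
only the rule of thumb quoted above, not HAProxy's own internal FD
computation.

```python
"""Rough FD budget check (sketch; rule of thumb from this thread only)."""
from pathlib import Path


def estimate_fds(maxconn: int) -> int:
    """Estimate FDs for a given maxconn (hypothetical helper).

    Assumption: each connection costs 2 FDs (client side + server side),
    and pipes may add up to ~50% on top, e.g. 900k -> 1800k + 900k.
    """
    conn_fds = 2 * maxconn
    pipe_fds = conn_fds // 2
    return conn_fds + pipe_fds


def read_sysctl(path: str) -> int:
    """Read the first integer from a /proc/sys file (Linux only)."""
    return int(Path(path).read_text().split()[0])


def check_limits(maxconn: int, nr_open: int, file_max: int) -> list[str]:
    """Return warnings when the estimate exceeds a kernel limit."""
    need = estimate_fds(maxconn)
    warnings = []
    if need > nr_open:  # per-process hard limit
        warnings.append(f"fs.nr_open={nr_open} < estimated {need} FDs per process")
    if need > file_max:  # system-wide limit
        warnings.append(f"fs.file-max={file_max} < estimated {need} FDs system-wide")
    return warnings


if __name__ == "__main__":
    if Path("/proc/sys/fs/nr_open").exists():  # only meaningful on Linux
        nr_open = read_sysctl("/proc/sys/fs/nr_open")
        file_max = read_sysctl("/proc/sys/fs/file-max")
        for msg in check_limits(900_000, nr_open, file_max) or ["limits look sufficient"]:
            print(msg)
```

Note that fs.file-max is shared by every process on the machine, so with
2 HAProxy instances plus the rest of the stack, the system-wide headroom
needed is larger than a single process's estimate.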

Cheers,

-- 
Ricardo Nabinger Sanchez             https://www.taghos.com.br/
