Re: [ANNOUNCE] haproxy-3.1-dev3 (more infos on the story with fd-hard-limit and systemd)

Willy Tarreau Wed, 17 Jul 2024 02:26:09 -0700

Hi Lukas,

On Tue, Jul 16, 2024 at 11:28:12PM +0200, Lukas Tribus wrote:
> Hi Valentine, hi Willy,
> 
> after spending some time testing I agree tuning maxconn/fd-limits is hard ...

In fact we know that it's hard for experts, and it's even harder for
new users.

> With 8GB RAM we can still OOM with 1M FDs / 500k maxconn (no TLS), but
> it appears to be around the sweetspot.

It happens to be the previous common limit, with plenty of distros
offering 1M FDs by default. For me it's already large, I would be
fine with a much lower value, but lowering a limit could possibly
break some use cases, and given that basically nobody complained
about OOM in regular usages recently, it was sort of a confirmation
that the default 1M that was applied for quite some time was OK for
most users.

> I thought it would require more memory considering that we suggest
> 1GB of memory for 20k non-TLS connections or 8k TLS connections, but
> my test was indeed synthetic with zero features used, and it's not
> only about haproxy userspace but the system as well.

Yes, it's really a whole. It's even more difficult nowadays because
it depends on whether data stay stuck in system buffers or not.
Haproxy aggressively recycles its buffers, and since 3.0 it even avoids
reading data if some are stuck downstream so it tries to be very
memory efficient, but we know that memory usage can increase a lot
with SSL and congested links.

> lukas@dev:~/haproxy$ git grep -B3 -A1 "GB of RAM"
(...)

I think we might be doing better than before on this but I don't
want to encourage users to put more connections on low memory. RAM
is so cheap that the smallest VM you can find has 2GB of RAM for a
single CPU core, so better encourage them to have enough RAM to
evacuate the data rather than risking OOM.

I would *love* to find a reasonable way to autosize all this based
on available RAM, but there's no portable solution to this, which
means that whatever hack we'll do would still require a fallback
anyway.
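
To illustrate why a fallback stays necessary, here is a minimal sketch
of such an autosizing hack. The sysconf names are the ones exposed on
Linux and some other Unix systems. The helper names and the
bytes_per_conn parameter are hypothetical and only serve the example;
this is not how haproxy sizes anything.

```python
import os

def physical_ram_bytes():
    """Return physical RAM in bytes, or None when the platform can't tell."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # No os.sysconf at all, or these names are unknown on this OS.
        return None

def autosized_limit(fallback, bytes_per_conn, probe=physical_ram_bytes):
    """Lower the limit according to RAM when known, else keep the fallback."""
    ram = probe()
    if ram is None or ram <= 0:
        return fallback
    # Only ever lower the fallback, never raise it.
    return min(fallback, ram // bytes_per_conn)
```

Whatever the probe returns, the caller still has to supply a fallback
value for the systems where the probe yields nothing.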

> > What I would really like is to no longer see any maxconn in a regular
> > configuration because there's no good value and we've seen them copied
> > over and over.
> 
> By setting a global maxconn you force yourself to at least think about it.

I totally agree with that. However our experience (particularly from
discussing appliance sizing with potential customers) is that nobody
has any idea of their maxconn, not even within an order of magnitude.
Most users usually know more or less what their bandwidth is, sometimes
number of visitors per day, and that's about all. We really need to
consider that those who post here and on github are the exception and
not the norm. Another factor that comes into play is how connections
are accounted. Those coming from other LBs acting at a lower layer
(e.g. LVS) count the total number of connections tracked in the LB's
table, which includes TIME_WAIT, which can represent 90-99% depending
on the workloads! That's how you hear that port 80, which only serves
the redirect for a web site working exclusively in SSL and gets 1000
conn/s, requires 60000 concurrent connections while the reality is
rather less than 100!
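
The arithmetic behind those two figures can be sketched as follows. It
assumes a TIME_WAIT duration of 60 seconds (the usual Linux value) and
an average lifetime of 50 ms for a redirect connection. Both numbers
are illustrative assumptions, not measurements.

```python
# Why a connection-tracking table overstates concurrency.
# Assumed values (not measurements): 60 s in TIME_WAIT, 50 ms per redirect.

def tracked_entries(conn_rate, time_wait_s=60.0):
    """Entries a lower-layer LB keeps for connections sitting in TIME_WAIT."""
    return conn_rate * time_wait_s

def concurrent_connections(conn_rate, avg_lifetime_s):
    """Little's law: connections alive at once = arrival rate x lifetime."""
    return conn_rate * avg_lifetime_s

rate = 1000  # conn/s on the port 80 redirect
print(tracked_entries(rate))               # table entries, mostly TIME_WAIT
print(concurrent_connections(rate, 0.05))  # connections really open at once
```

With these assumptions the table holds 60000 entries while only about
50 connections are really open at any instant.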

> I'm assuming by "regular configuration" you mean small scale/size? In
> this case I agree.

Yes that was it. The vast majority of users have way less than even
100 req/s on average and technical parameters like maxconn are the
least of their concerns.

> > I'm confused now, I don't see how, given that the change only *lowers*
> > an existing limit, it never raises it. It's precisely because of the
> > risk of OOM with OSes switching the default from one million FDs to one
> > billion that we're proposing to keep the previous limit of 1 million as
> > a sane upper bound. The only risk I'm seeing would be users discovering
> > that they cannot accept more than ~500k concurrent connections on a large
> > system. But I claim that those dealing with such loads *do* carefully size
> > and configure their systems and services (RAM, fd, conntrack, monitoring
> > tools etc). Thus I'm not sure which scenario you have in mind that this
> > change could result in such a report as above.
> 
> True, I confused memory required for initialization with memory
> allocated when actually used.

OK no pb!
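
For the record, the "only lowers, never raises" behaviour discussed
above boils down to a clamp. A minimal sketch, assuming the ceiling is
1048576: the name DEFAULT_MAXFD comes from the patch discussed in this
thread, while the exact value and the helper function are assumptions
made for illustration, and explicit settings in the configuration are
ignored here.

```python
DEFAULT_MAXFD = 1 << 20  # "1 million" FDs; exact value assumed for illustration

def effective_fd_limit(system_hard_limit, default_maxfd=DEFAULT_MAXFD):
    """Cap the limit inherited from the OS; never raise it."""
    return min(system_hard_limit, default_maxfd)

print(effective_fd_limit(1 << 30))  # OS offering ~1 billion FDs gets capped
print(effective_fd_limit(65536))    # a smaller limit is left untouched
```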

> On Thu, 11 Jul 2024 at 14:44, Willy Tarreau <w...@1wt.eu> wrote:
> >
> > My take on this limit is that most users should not care. Those dealing
> > with high loads have to do their homework and are used to doing this,
> > and those deploying in extremely small environments are also used to
> > adjusting limits (even sometimes rebuilding with specific options), and
> > I'm fine with leaving a bit of work for both extremities.
> 
> Considering how non-trivial tuning maxconn/fd-hard-limit/ulimit for a
> specific memory size and configuration is, I (now) have to agree.

OK ;-)

> On Tue, 16 Jul 2024 at 16:22, Valentine Krasnobaeva
> <vkrasnoba...@haproxy.com> wrote:
> >
> > It is obscure for some users 'fd-hard-limit'. And a lot of them
> > may ask: "What is the best value, according to my environment,
> > which I should put here ?", "What will be the impact?"
> 
> This is completely true and why I prefer users think about maxconn
> instead, but like I said, even that is hard.

I think the correct approach (at least the message we deliver to users)
should be:
  - if you don't know your sizing, the defaults should generally be OK

  - if you're dealing with an extremely small system, then please
    run some rough calculations on the number of connections
    you can fit in the small amount of memory, and set maxconn to that
    value to protect the system

  - if you're dealing with an extremely large system, then please set
    maxconn to the number of concurrent connections you want to deal
    with, and adjust your memory sizing accordingly

  - in any case, observe stats and resource consumption, and after some
    time and experience gathered on the system's behavior, please do
    set reasonable limits that correspond to your workload and sizing.
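
As an example of the "rough calculations" for a small system, here is a
sketch based on the rule of thumb quoted earlier in this thread (about
1 GB of RAM per 20k non-TLS connections or per 8k TLS connections).
The function name and the reserved_gb allowance are hypothetical, and
the result is only a starting point to be corrected by observation.

```python
# Rule of thumb quoted earlier in this thread: about 1 GB of RAM per
# 20k non-TLS connections or per 8k TLS connections.
CONNS_PER_GB = {"plain": 20_000, "tls": 8_000}

def rough_maxconn(ram_gb, kind="plain", reserved_gb=0.5):
    """Very rough maxconn for a small system.

    reserved_gb is a hypothetical allowance for the OS and other
    processes; pick it from observation, not from this sketch.
    """
    usable = max(ram_gb - reserved_gb, 0)
    return int(usable * CONNS_PER_GB[kind])

print(rough_maxconn(2))         # 2 GB VM, no TLS
print(rough_maxconn(2, "tls"))  # same VM, TLS
```

The resulting number is what would go into the global maxconn setting
to protect the system.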

At this point, do you (or anyone else) still have any objection to
backporting the DEFAULT_MAXFD patch so as to preserve the current
defaults for users, and/or do you have any alternate proposal, or just
want to discuss other possibilities?

Thanks!
Willy
