Hi Lukas,

and first, many thanks for sharing your thoughts and opinions on this.

[ responding to both of your messages at once ]

On Wed, Jul 10, 2024 at 09:30:55PM +0200, Lukas Tribus wrote:
> On Wed, 10 Jul 2024 at 16:39, Willy Tarreau <w...@1wt.eu> wrote:
> >
> > Another change that will need to be backported after some time concerns
> > the handling of default FD limits.
(...)
> I wholeheartedly hate default implicit limits and I also pretty much
> disagree with fd-hard-limit in general, but allow me to quote your own
> post here from github issue #2043 comment
> https://github.com/haproxy/haproxy/issues/2043#issuecomment-1433593837

I don't like having had to deal with such limits for ~23 years now, but the
fact is that it's one of the strict, non-bypassable, system-imposed limits.
The problem is that while the vast majority of users don't care about the
number of FDs, this value cannot be changed at runtime and has serious
implications on RAM usage and even on our ability to cleanly accept the
connections we're engaged in processing. So in any case we need to respect
a limit, and for this we have to work with what operating systems are doing.

For decades they would present 1024 soft and somewhat more hard, but not that
much (e.g. 4k), and you needed to start as root to go beyond that. Then some
OSes started to expose much higher hard values by default (256k to 1M) so that
it was no longer required to be root to start a service. During this time,
such limits were more or less tailored around RAM sizing. Now it seems
we're reaching a limit, with extreme values being advertised without any
relation to the allocated RAM. I think that containers are part of the cause
of this.

> > we used to have a 2k maxconn limit for a very long time and it was causing
> > much more harm than such an error: the process used to start well and was
> > working perfectly fine until the day there was a big rush on the site and it
> > wouldn't accept more connections than the default limit. I'm not that much
> > tempted by setting new high default limits. We do have some users running
> > with 2+ million concurrent connections, or roughly 5M FDs. That's already
> > way above what most users would consider an acceptable default limit, and
> > anything below this could mean that such users wouldn't know about the
> > setting and could get trapped.
> 
> I disagree that we need to heuristically guess those values like I
> believe I said in the past.

My problem is that such a limit *does* exist (and if you look at ulimit -a,
it's one of the rare ones that's never unlimited), so we have to apply a
value: with too low a value we reject traffic at the worst possible moments
(when there are the most possible witnesses of your site falling down), and
with too high a value we cannot start anymore. Limits are imposed on the
process and it needs to work within them.
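
For reference (the numbers below are only an example and will differ per
system), the soft and hard values a process inherits can be checked from
the shell:

    $ ulimit -Sn    # soft limit on open files, what we get by default
    1024
    $ ulimit -Hn    # hard limit, the ceiling the soft one may be raised to
    1048576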

> "But containers ..." should not be an argument to forgo the principle
> of least surprise.

I agree even though they're part of the problem (but no longer the only
one).

> There are ways to push defaults like this out if really needed: with
> default configuration files, like we have in examples/ and like
> distributions provide in their repositories. This default the users
> will then find in the configuration file and can look it up in the
> documentation if they want.

I'm not against encouraging users to find sane limits in the config
files they copy-paste all over the place. Similarly, I think that if
systemd starts to advertise very large values, we should probably
encourage shipping unit files that set the hard limit to 1M or so (i.e.
the hard value that was previously implicitly presented to the daemon).
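
To illustrate (just a sketch, the exact value and whether it belongs in the
packaged unit file or a local drop-in is obviously up for discussion), that
would be something along these lines:

    [Service]
    # cap the hard limit presented to the daemon at ~1M FDs
    LimitNOFILE=1048576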

> At the very least we need a *stern* configuration warning that we now
> default to 1M fd, although I would personally consider this (lack of
> all fd-hard-limit, ulimit and global maxconn) leading to heuristic
> fd-hard-limit a critical error.

When Valentine, who worked on the patch, talked to me about the problem,
these were among the possibilities we thought about. I initially
disagreed with the error because I considered that having to set yet
another limit to keep your config running is quite a pain (a long time
ago users were really bothered by the relation between ulimit-n and
maxconn). But I was wrong on one point: I had forgotten that fd-hard-limit
only applies when maxconn/ulimit-n etc. are not set, so that wouldn't
affect users who have already set their values correctly.

> I also consider backporting this change - even with a configuration
> warning - dangerous.

I know, but we don't get to decide which distro users run their stable
version on :-/

> So here a few proposals:
> 
> Proposal 1:
> 
> - remove fd-hard-limit as it was a confusing mistake in the first place

No, I disagree with this one. fd-hard-limit *is* useful. It says "if you
don't know what a good value is, stay within system limits and in no case
go beyond this". I consider that it adds reliability to configs and will
stop the mess of users forcing absurd maxconn values whose impact they
don't necessarily understand. I'm even using it myself, not because of
resources but because it allows haproxy to start faster by not having to
initialize all 1M FD entries that I know I'm not going to use.
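
For the record, this is nothing more than a line like the following in the
global section (the value is only an example of course):

    global
        # never consider more than ~1M FDs regardless of what the system
        # advertises; the automatic maxconn is then derived within this
        # bound when not set explicitly
        fd-hard-limit 1048576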

> - exit with a configuration error when global maxconn is not set

That goes with the above; I disagree with this. This was our mistake in
the old days: maxconn had to be edited in configs. It was also Apache's
problem in the 1.3 era, when we started to deploy haproxy everywhere in
front of it because MaxClients was impossible to tune. The default of 150
was way too low even for moderate sites, which learned it the hard way
when the site failed to respond, and restarting with a larger value the
next day would cause swap and OOM, making the situation worse.

System limits are there, and whenever we can we should follow them,
because they're generally adjusted in central places where users expect
to find them. maxconn is service-specific within system-imposed sizing
constraints, and it makes sense that some users just want to take whatever
the OS offers.

> - put global maxconn in all example configurations, encourage
> Debian/RH to do the same

What I would really like is to no longer see any maxconn in a regular
configuration, because there's no universally good value and we've seen
them copied over and over. How many times have we asked "are you sure you
really need that high a maxconn?" in bug reports (even when it was
unrelated to the problem)?

> - document accordingly
> 
> 
> Proposal 2:
> 
> - keep fd-hard-limit
> - exit with a configuration error when fd-hard-limit needs to guess 1M

That's an option I think I can live with, even if by default it will
really annoy all users by mandating yet another obscure setting,
especially for developers starting it in the foreground on the command
line. In this case we might want to provide an equivalent command-line
argument and suggest it in the error message.

> - put fd-hard-limit in all example configurations, encourage Debian/RH
> to do the same

Instead I'd really encourage them to put the limit into the systemd
unit file, I guess, since that's where the change happens in the first
place. But that's something we need to discuss here with other users and
distro maintainers as well.

> - document accordingly

Agreed on this. Valentine is currently working on explaining the
relationship between all of these settings for the management doc, so
that one does not need to already know a keyword to figure out how it
relates to the others. I think this will help quite a bit.

> Otherwise the next bug report will be that haproxy OOM's (in
> production and only when encountering load) by default with systems
> with less than 16 GB of RAM. The same bug reporter just needs a VM
> with 8 GB RAM or less.

I'm confused now; I don't see how, given that the change only *lowers*
an existing limit, it never raises it. It's precisely because of the
risk of OOM with OSes switching the default from one million FDs to one
billion that we're proposing to keep the previous limit of 1 million as
a sane upper bound. The only risk I'm seeing would be users discovering
that they cannot accept more than ~500k concurrent connections on a large
system. But I claim that those dealing with such loads *do* carefully size
and configure their systems and services (RAM, FDs, conntrack, monitoring
tools etc). Thus I'm not sure what scenario you have in mind in which this
change could result in a report like the one above.

Thanks!
Willy
