On Mon, Mar 10, 2025 at 03:17:49PM +0700, Luke Seelenbinder wrote:
> >> Our init-addr is `init-addr libc,last,none`. Due to a complex set of
> >> factors, using libc to resolve a host can simply hang instead of failing.
> >> When HAProxy starts up and libc hangs, the startup times out instead of
> >> failing with no IP (i.e. `none`).
> > 
> > Ah OK, makes sense. That said, if a regular server does not resolve,
> > normally it doesn't boot. You mean that here it still boots with no
> > address?
> 
> No, that's the problem; we'd prefer HAProxy to have no IP for a server at
> boot than to fail to boot, but since libc just hangs instead of failing, it
> doesn't finish booting, and the startup times out. This ends up creating a
> service restart cycle.

Ah got it now!

> This means the docs are slightly misleading, however? Since `init-addr
> libc,last,none` results in a hang if libc just hangs vs failing.

Sure but we don't know that libc fails until it responds :-/ The fallback
here corresponds to the cases where it says "I can't resolve that one",
which is not the case in your situation.

> I don't know
> if a hard or configurable timeout on the underlying call makes sense?

I really have no idea if it's doable at all, to be honest. A function
is called and the program is blocked for all this time. I'm not sure
this is cancellable. I can imagine that those relying on a database for
example can have difficulties stopping a pending request :-/
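
To illustrate the limitation, here is a rough sketch in Python (not
anything HAProxy does; the name resolve_with_timeout and its resolver
parameter are made up for the example). The caller can stop waiting after
a deadline, but the blocked lookup itself is not cancelled: the helper
thread stays stuck in the libc call until it returns on its own.

```python
import queue
import socket
import threading


def resolve_with_timeout(host, timeout, resolver=socket.getaddrinfo):
    """Return the first address found for host, or None on failure or timeout.

    The lookup runs in a daemon thread. On timeout we only stop waiting;
    the thread itself may remain blocked inside the resolver call.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            infos = resolver(host, None, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the textual address.
            results.put(infos[0][4][0])
        except (OSError, IndexError):
            # socket.gaierror is a subclass of OSError: "can't resolve".
            results.put(None)

    threading.Thread(target=worker, daemon=True).start()
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        # Deadline reached while the resolver was still blocked.
        return None
```

A wrapper script could run something like this before starting the
process and only pass along an address when one came back in time.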

> >> Is there a way to set the timeout for a libc address resolution? We may be
> >> able to drop `libc` in the init-addr list entirely due to a generally
> >> better setup now, but it's useful in some cases.
> > 
> > I'm not aware of any way to tune the libc's resolver, though if there
> > is, it will be libc-specific, and even specific to the backend used by
> > the libc. What could be done, however, could indeed be to use a plain
> > IP address, but passing it via an environment variable in the global
> > section (or sourced from another file). This may be easier to handle
> > than hard-coding IP addresses. E.g.:
> > 
> >    global
> >        setenv NS1_ADDR   tcp@10.11.12.1:5353
> >        setenv NS2_ADDR   tcp@10.11.12.2:5353
> > 
> >    resolvers mydns
> >        nameserver ns1 "$NS1_ADDR"
> >        nameserver ns2 "$NS2_ADDR"
> 
> Fair enough. I figured that would be the answer... In that case, I think the
> best option for us is to remove `libc` entirely, and make sure our failure
> modes are good enough if it comes up without a backend for a few seconds.

OK. Initially I wanted us to support direct access to resolvers via
resolv.conf, but this would be a chicken-and-egg problem as it would
either require the polling loop to be already running, or need a
duplicate code path just for that. Another approach could be to have
a "hosts" mode that relies on /etc/hosts that we'd parse ourselves
maybe, but I have not seen any demand for this so I doubt there's much
sympathy for this :-/
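
For what it's worth, parsing that file format is not much work. A minimal
sketch in Python, only to show the idea (parse_hosts is a hypothetical
helper, and "first entry wins" for a repeated name is an assumption made
here, not a statement about how any given libc behaves):

```python
def parse_hosts(text):
    """Map each lowercased host name or alias to the first address listed for it."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments, then split on any whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without names
        address, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address seen for a name (assumption, see above).
            table.setdefault(name.lower(), address)
    return table
```

It would be fed the contents of /etc/hosts once at startup, and a lookup
is then a plain dictionary access that cannot block.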

Willy

