On Mon, Mar 10, 2025 at 03:17:49PM +0700, Luke Seelenbinder wrote:
> >> Our init-addr is `init-addr libc,last,none`. Due to a complex set of factors,
> >> using libc to resolve a host can simply hang, instead of fail. When HAProxy
> >> starts up and libc hangs, the startup times out instead of failing with no IP
> >> (i.e. `none`).
> >
> > Ah OK makes sense. That said, if a regular server does not resolve,
> > normally it doesn't boot. You mean that here it still boots with no
> > address ?
>
> No, that's the problem; we'd prefer HAProxy to have no IP for a server at
> boot than to fail to boot, but since libc just hangs instead of failing, it
> doesn't finish booting, and the startup times out. This ends up creating a
> service restart cycle.

Ah got it now!

> This means the docs are slightly misleading, however? Since `init-addr
> libc,last,none` results in a hang if libc just hangs vs failing.

Sure but we don't know that libc fails until it responds :-/ The fallback
here corresponds to the cases where it says "I can't resolve that one",
which is not the case in your situation.

> I don't know
> if a hard or configurable timeout on the underlying call makes sense?

I really have no idea if it's doable at all, to be honest. A function is
called and the program is blocked for all this time. I'm not sure this is
cancellable. I can imagine that those relying on a database for example
can have difficulties stopping a pending request :-/

> >> Is there a way to set the timeout for a libc address resolution? We may be
> >> able to drop `libc` in the init-addr list entirely due to a generally better
> >> setup now, but it's useful in some cases.
> >
> > I'm not aware of any way to tune the libc's resolver, though if there
> > is, it will be libc-specific, and even specific to the backend used by
> > the libc. What could be done, however, could indeed be to use a plain
> > IP address, but passing it via an environment variable in the global
> > section (or sourced from another file). This may be easier to handle
> > than hard-coding IP addresses. E.g:
> >
> >     global
> >         setenv NS1_ADDR tcp@10.11.12.1:5353
> >         setenv NS2_ADDR tcp@10.11.12.2:5353
> >
> >     resolvers
> >         nameserver ns1 "$NS1_ADDR"
> >         nameserver ns2 "$NS2_ADDR"
>
> Fair enough. I figured that would be the answer... In that case, I think the
> best option for us is to remove `libc` entirely, and make sure our failure
> modes are good enough if it comes up without a backend for a few seconds.

OK. Initially I wanted us to support direct access to resolvers via
resolv.conf, but this would be a chicken-and-egg problem as it would
either require to have the polling loop already running, or to have a
duplicate code part just for that.

Another approach could be to have a "hosts" mode that relies on /etc/hosts
that we'd parse ourselves maybe, but I have not seen any demand for this
so I doubt there's much sympathy for this :-/

Willy
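
For reference, the option settled on above (dropping `libc` from init-addr and letting the runtime resolvers fill in the address after startup) might look roughly like the sketch below. The section name `mydns`, the backend name, the server name and hostname, and the timer values are all made up for illustration and not taken from any real setup; only the nameserver addresses come from the example quoted earlier.

```
global
    # Nameserver addresses kept in one place instead of being hard-coded
    # in the resolvers section (addresses from the example above).
    setenv NS1_ADDR tcp@10.11.12.1:5353
    setenv NS2_ADDR tcp@10.11.12.2:5353

# "mydns" is a hypothetical section name.
resolvers mydns
    nameserver ns1 "$NS1_ADDR"
    nameserver ns2 "$NS2_ADDR"
    # Illustrative timers only, to be tuned for the environment.
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s

# Hypothetical backend and server names.
backend app
    # No "libc" here: startup never blocks in the libc resolver. The server
    # starts with its last known address if one is available, otherwise
    # with no address, until the resolvers above provide one.
    server app1 app.example.internal:8080 check resolvers mydns init-addr last,none
```

With this the server may remain without an address for the first few seconds after boot, which is the failure mode mentioned above that needs to be acceptable.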