On Thu, 13 Mar 2025 at 08:23, Willy Tarreau <w...@1wt.eu> wrote:
>
> Hi Lukas,
>
> On Tue, Mar 11, 2025 at 03:26:59PM +0000, Lukas Tribus wrote:
> > Using both libc and haproxy resolvers can lead to hard to diagnose issues
> > when their behaviour diverges; recommend using only one type of resolver.
> >
> > Should be backported to stable versions.
> > ---
> >
> > > I think the docs could be updated to reflect this.
> >
> > That's my opinion at least, so here is an RFC doc patch for this.
> >
> > I don't know if others agree; there may be corner cases I'm not thinking
> > of.
>
> I'm thinking that maybe we should soften the language a bit to explain
> that the problem is mixing libc with resolvers that do not come from
> resolv.conf.

I disagree because there are lots of possible problems with this
configuration; using different nameservers is just one of them.

> In fact originally when DNS started to be useful, we've seen a lot of
> just standard resolv.conf being used, to the point that a new option
> "parse-resolv-conf" was added to ease this.
>
> I think that over time the ecosystem has matured a bit and cleaned up
> that mess, leaving users with some servers for the service discovery
> and the servers used by the system (if at all). And in my opinion,
> the problem arises in this specific case.

You are thinking of a case where resolv.conf points to some recursive
nameserver, and the haproxy resolvers configuration points to different
ones. However, DNS is complex and there are *a lot* of behavior
differences one can shoot oneself in the foot with, other than using
different servers:

- for libc we can use gethostbyname or getaddrinfo based on how haproxy
  was built, impacting address family results
- resolv.conf does not force UDP or TCP, libc decides, and in most but
  not all libc implementations a UDP query automatically falls back to
  TCP
- haproxy resolves explicitly either via UDP or TCP, it is the user's
  responsibility to fix issues and configure fallbacks
- haproxy only supports FQDNs while libc may search domains (man 5
  resolv.conf has lots of options)
- handling of bigger responses is likely different
- EDNS0 handling is likely different
- handling of DNS flags is likely different

For example case 1:

An admin configures private resolvers in TCP mode to avoid issues with
bigger response sizes; however, unbeknownst to him, TCP mode is not
available/reachable for unrelated reasons. The same name servers are
configured in /etc/resolv.conf, so libc is able to resolve the private
server IPs without issues, because libc uses UDP before falling back to
TCP.
How much time and back and forth does this need in a support call, to
find out that the haproxy internal resolver never *updated* the server
IPs at run time because it never worked in the first place, hidden by
the libc resolver which makes everything apparently work, when it would
have been immediately obvious if libc resolution had been disabled?

For example case 2:

Lack of FQDN: same as case 1, libc searches a hostname in the local
domain, haproxy does not. Again the internal resolver will fail to
update server IPs and libc will hide this problem for some time.

Even Luke's problem in this case was not really related to the different
results of nameservers, but to the distinction between where libc
resolving stops and haproxy internal resolving starts.

Every subtle difference in behavior can make the difference between a
simple and a complex diagnosis, when two different implementations are
involved, whether the root cause is an external factor, a local
misconfiguration or a bug.

> > @@ -18242,13 +18242,19 @@ init-addr {last | libc | none | <ip>},[...]*
> > instances on the fly. This option defaults to "last,libc" indicating
> > that the
> > previous address found in the state file (if any) is used first,
> > otherwise
> > the libc's resolver is used. This ensures continued compatibility with
> > the
> > - historic behavior.
> > + historic behavior. When using the haproxy resolvers disabling libc based
> > + resolution is recommended, also see section 5.3.
>
> Maybe we could say something along:
>
> "When haproxy explicitly uses different resolvers than the system's
> ones, disabling libc based resolution is highly recommended, also
> see section 5.3"
>
> or any variant ? What do you (and Luke) think ?

This is like saying that as long as you are pointing to the same name
servers, dual resolution is fine, which I disagree with for the reasons
mentioned above.
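
To make the recommendation concrete, here is a minimal sketch of a
resolver-only setup. The section name "mydns", the nameserver address
and the server hostname are placeholders made up for illustration, and
the timing values are arbitrary:

```
resolvers mydns
    # placeholder address, replace with the real name server
    nameserver ns1 192.0.2.53:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s

defaults
    # disable libc resolution when using resolvers
    default-server init-addr last,none

backend app
    # haproxy resolvers need an FQDN, search domains are not applied
    server app1 app1.example.internal:8080 resolvers mydns check
```

With "last,none", a server with no usable address in the state file
starts without an address, in a down state, until the internal resolver
provides one. A broken resolvers section should therefore show up right
away instead of being masked by a successful libc lookup at startup.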
>
> > + Example 2:
> > + defaults
> > + # disable libc resolution when using resolvers
> > + default-server init-addr last,none
>
> Then here we could say "when using different resolvers".
>
> > inter <delay>
> > fastinter <delay>
> > downinter <delay>
> > @@ -19281,7 +19287,8 @@ workload.
> > This chapter describes how HAProxy can be configured to process server's
> > name
> > resolution at run time.
> > Whether run time server name resolution has been enable or not, HAProxy
> > will
> > -carry on doing the first resolution when parsing the configuration.
> > +keep trying to resolve names at startup during configuration parsing via
> > libc
> > +for backwards compatibility.
>
> "keep trying" makes me think it insists, which is not true because at
> the first error it fails to start. However, the libc resolvers are
> generally blocking, and can be slow since serialized. Probably that
> all of these concepts should be handled to clarify the picture.

Yeah, I didn't like "carry on", but it also works without it:

> Whether run time server name resolution has been enabled or not, HAProxy will
> do the first resolution at startup during configuration parsing via libc
> for backwards compatibility.

> Something along these lines maybe ?
>
> Unless explicitly disabled via the server "init-addr" keyword, HAProxy
> will resolve server addresses on startup using the standard method
> provided by the operating system's C library ("libc"). It is important
> to understand that while this resolution generally relies on DNS, it
> can also involve other mechanisms that are specific to the deployment.
> If an address cannot be resolved, the process will stop with an error.
> In addition, resolutions are serialized, so that resolving addresses
> for 1000 servers will result in 1000 request-response cycles, which
> can take quite some time.
> Also, when DNS servers are unreachable or
> unresponsive, the libc can take a very long time before timing out for
> each and every server, rendering a startup impractical. Finally, if the
> servers are configured to rely on a "resolvers" section that references
> different DNS servers, the response from the libc might cause startup
> errors, or worse, long delays. For this reason it is important not to
> mix libc with other resolvers, and adjust the "init-addr" server setting
> according to the desired behavior.
>
> I'm fine with any other proposal, I just want to be sure that these points
> are clarified, because clearly the DNS part in the doc suffers quite a bit
> and would deserve a refresh!

I really want to drive home the point that this is not *only* about
different DNS servers, but also about different resolution behavior when
using the same DNS servers, because our code and the libc code are not
the same.

Lukas