> On 21. Feb 2025, at 04:39, Rick Macklem <rick.mack...@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 4:28 PM Steve Rikli <s...@genyosha.net> wrote:
>>
>> On Wed, Feb 19, 2025 at 02:40:15PM -0800, Rick Macklem wrote:
>>>
>>> The subject line basically describes the problem glebius@
>>> ran into. When doing an NFS mount in /etc/fstab, it failed
>>> since the DNS service was not yet working and, as such,
>>> the DNS lookup of the server fqdn failed, causing the mount
>>> to fail. Note that this behaviour has existed for decades.
>>>
>>> He feels this is a bug and that mount_nfs(8) should retry
>>> getaddrinfo(3) calls until success, instead of failing the
>>> mount when the first attempt fails.
>>> The problem with just retrying getaddrinfo(3) is that it
>>> could retry forever for simple failures like a typo in the
>>> server fqdn.
>>> I can see several ways this can be handled and would
>>> like feedback from others w.r.t. these alternatives.
>>>
>>> 1) Simply document this case and encourage use of
>>> host names in /etc/hosts for NFS servers along with
>>> specifying use of file before dns in nsswitch.conf.
>>> Doing this results in the mounts working whether or
>>> not DNS is working.
>>>
>>> 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
>>> until it succeeds. (I feel this would be a POLA violation,
>>> given that the current behaviour has existed for decades
>>> and for simple cases where the fqdn will never resolve
>>> the behaviour would be to hang at the mount attempt
>>> during boot unless "bg" is specified for the /etc/fstab entry.)
>>>
>>> 3) Add a new NFS mount option "retrydns=<N>", which would enable
>>> retries of getaddrinfo(3). This would avoid any POLA violation and
>>> would allow for a convenient way to document the behaviour in
>>> "man mount_nfs".
>>>
>>> 4) ???
>>>
>>> So, what do you think is the preferred change?
>>
>> I don't think I would change mount_nfs code behavior for this.
>>
>> That is, requiring services and daemons etc. to workaround missing,
>> misconfigured, slow, or misbehaving nameservice (whether it's DNS,
>> /etc/hosts, NIS, whatever) seems like more complexity, possibly not
>> effective, and maybe not focused on the right thing.
>>
>> Now, without meaning to be presumptuous, it may be worth re-examining
>> the startup sequence, e.g. to make sure NFS mounts are tried after the
>> known dependencies can reasonably be expected to have started, including
>> the network, plus local_unbound or bind (if used), possibly others.
>>
>> After a quick look, I don't see an obvious problem with the sequence,
>> but more knowledgeable eyes than mine are welcome. I don't quite follow
>> some of the output from rcorder and service -r.
>>
>>> ps: I looked and the return value from getaddrinfo(3) does not
>>> appear to be useful to discern the case of "DNS service not
>>> running yet". (I think it replies EAI_FAIL for this case.)
>>
>> In that area, I'll note FreeBSD rc.d has a "NETWORKING" dependency for
>> PROVIDE and REQUIRE, and it's included in scripts like nfsclient,
>> mountcritremote et al. However there seems to be no similar dependency
>> for something like "NAMESERVICE" (generic, as opposed to "named"
>> specifically), and I'm not sure how that might be implemented, even
>> assuming it could be useful in a situation like this.
>>
>> I.e. there are many things to potentially check for "can the system
>> resolve hostnames yet", and not all of them involve running a local
>> instance of named, unbound, etc.
>>
>> In general, if I were running into problems with nameservice not being
>> available by the time NFS mounts happen, I think I'd start by looking
>> into possible nameservice issues, then check out some mechanisms other
>> folks have mentioned (fstab IP addresses or late option, rc.conf
>> netwait_enable, etc.) rather than coding workarounds into NFS itself.
> Well, the patch I have created (it took about 15min) only changes behaviour
> if a new "retrydns" option is used. As such, I think it might be useful for
> some, but doesn't change things unless someone uses it.
>
> I agree with you that I don't think the rc scripts have a way to check REQUIRE
> dns working. (I, personally, always put the fqdn for NFS servers in /etc/hosts
> and make sure "files" is first in nsswitch.conf, but others argue that is not
> feasible for some deployments.) (Using IP numbers works for AUTH_SYS,
> but not Kerberized mounts.)
>
> Note that there is already "retrycnt", which specifies how many times to
> retry the mount, but that retry loop doesn't include getaddrinfo(3) calls.
> --> Personally, I do not like always doing retries since I often
> type mount commands manually and I'm a terrible typist, so I
> often mistype the server's name.
>
> This reply was mostly a followup on all the good comments and
> not just yours.
>
> Thanks everyone, for your comments, rick
>
my 2cents:

There is a difference between the name service not responding and a name
not resolving. In the first case, it will go to:

     bg      If an initial attempt to contact the server fails, fork off a
             child to keep trying the mount in the background. Useful for
             fstab(5), where the file system mount is not critical to
             multiuser operation.

     bgnow   Like bg, fork off a child to keep trying the mount in the
             background, but do not attempt to mount in the foreground
             first. This eliminates a 60+ second timeout when the server
             is not responding. Useful for speeding up the boot process of
             a client when the server is likely to be unavailable. This is
             often the case for interdependent servers such as
             cross-mounted servers (each of two servers is an NFS client
             of the other) and for cluster nodes that must boot before the
             file servers.

In the second case, it's a failure you cannot recover from.

rgds,
toomas
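
To illustrate the distinction between "name service not responding" and "name not resolving", here is a minimal Python sketch of a bounded retry loop, similar in spirit to the proposed "retrydns=<N>" option. It is not the mount_nfs(8) patch. It assumes the resolver reports the transient case as EAI_AGAIN, which, per Rick's note about EAI_FAIL, may not hold on FreeBSD. The names resolve_with_retry, TRANSIENT_ERRORS, and the resolver and sleep parameters are hypothetical and exist only for this example.

```python
import socket
import time

# Assumption: a name service that is not answering yet shows up as EAI_AGAIN.
# Every other error code is treated as permanent (for example a mistyped fqdn).
TRANSIENT_ERRORS = {socket.EAI_AGAIN}


def resolve_with_retry(host, port, retries=3, delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host, retrying at most `retries` extra times on transient errors.

    Permanent errors are re-raised immediately, so a typo in the server
    name fails on the first attempt instead of hanging the caller.
    """
    attempt = 0
    while True:
        try:
            return resolver(host, port)
        except socket.gaierror as err:
            # Give up on permanent errors or once the retry budget is spent.
            if err.errno not in TRANSIENT_ERRORS or attempt >= retries:
                raise
            attempt += 1
            sleep(delay)
```

With retries=0 this behaves like a single lookup, which matches the idea that nothing changes unless the option is used. If the resolver cannot tell the two cases apart, the set of transient codes would have to be widened, and a typo would then also consume the full retry budget before failing.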