On Wed, Feb 19, 2025 at 02:40:15PM -0800, Rick Macklem wrote:
> 
> The subject line basically describes the problem glebius@
> ran into.  When doing an NFS mount in /etc/fstab, it failed
> since the DNS service was not yet working and, as such,
> the DNS lookup of the server fqdn failed, causing the mount
> to fail. Note that this behaviour has existed for decades.
> 
> He feels this is a bug and that mount_nfs(8) should retry
> getaddrinfo(3) calls until success, instead of failing the
> mount when the first attempt fails.
> The problem with just retrying getaddrinfo(3) is that it
> could retry forever for simple failures like a typo in the
> server fqdn.
> I can see several ways this can be handled and would
> like feedback from others w.r.t. these alternatives.
> 
> 1) Simply document this case and encourage use of
>     host names in /etc/hosts for NFS servers along with
>     specifying use of file before dns in nsswitch.conf.
>      Doing this results in the mounts working whether or
>       not DNS is working.
> 
> 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
>      until it succeeds. (I feel this would be a POLA violation,
>      given that the current behaviour has existed for decades
>      and for simple cases where the fqdn will never resolve
>      the behaviour would be to hang at the mount attempt
>      during boot unless "bg" is specified for the /etc/fstab entry.)
> 
> 3) Add a new NFS mount option "retrydns=<N>", which would enable
>     retries of getaddrinfo(3). This would avoid any POLA violation and
>     would allow for a convenient way to document the behaviour in
>     "man mount_nfs".
> 
> 4) ???
> 
> So, what do you think is the preferred change?

I don't think I would change mount_nfs's behavior for this.

That is, requiring services and daemons to work around a missing,
misconfigured, slow, or misbehaving name service (whether it's DNS,
/etc/hosts, NIS, or whatever) adds complexity, may not be effective,
and may not be focused on the right thing.

Now, without meaning to be presumptuous, it may be worth re-examining
the startup sequence, e.g. to make sure NFS mounts are tried after the
known dependencies can reasonably be expected to have started, including
the network, plus local_unbound or bind (if used), possibly others.

After a quick look, I don't see an obvious problem with the sequence,
but more knowledgeable eyes than mine are welcome.  I don't quite follow
some of the output from rcorder and service -r.

> ps: I looked and the return value from getaddrinfo(3) does not
>       appear to be useful to discern the case of "DNS service not
>       running yet". (I think it replies EAI_FAIL for this case.)

In that area, I'll note FreeBSD rc.d has a "NETWORKING" dependency for
PROVIDE and REQUIRE, and it's included in scripts like nfsclient,
mountcritremote et al. However, there seems to be no similar dependency
for something like "NAMESERVICE" (generic, as opposed to "named"
specifically), and I'm not sure how that might be implemented, even
assuming it could be useful in a situation like this.

I.e. there are many things to potentially check for "can the system
resolve hostnames yet", and not all of them involve running a local
instance of named, unbound, etc.
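
One backend-agnostic way to ask "can the system resolve hostnames yet" is
to call getaddrinfo(3) itself, since it goes through whatever nsswitch.conf
selects. The following is only a minimal sketch of a bounded retry, roughly
the shape a "retrydns=<N>" style option might take. The helper name
resolve_with_retry and its parameters are hypothetical and are not part of
mount_nfs(8). It retries on any error, because the return value does not
appear to distinguish "name service not up yet" from other failures.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from mount_nfs): try to resolve "host" up to
 * "attempts" times, sleeping "delay_sec" seconds between tries.
 * "flags" is passed through as hints.ai_flags (e.g. 0 or AI_NUMERICHOST).
 * Returns 0 on success, otherwise the last getaddrinfo(3) error code.
 */
static int
resolve_with_retry(const char *host, int attempts, unsigned int delay_sec,
    int flags)
{
	struct addrinfo hints, *res;
	int error, i;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = flags;

	error = EAI_FAIL;	/* reported if attempts < 1 */
	for (i = 0; i < attempts; i++) {
		res = NULL;
		error = getaddrinfo(host, NULL, &hints, &res);
		if (error == 0) {
			freeaddrinfo(res);
			return (0);
		}
		fprintf(stderr, "attempt %d: %s\n", i + 1,
		    gai_strerror(error));
		/* Bounded: a typo in the name gives up after "attempts". */
		if (i + 1 < attempts)
			sleep(delay_sec);
	}
	return (error);
}
```

A caller might use something like resolve_with_retry("nfs.example.org", 5,
2, 0), where the host name is a placeholder. Because the loop is bounded, a
name that never resolves still fails after a fixed time instead of hanging
the boot.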

In general, if I were running into problems with nameservice not being
available by the time NFS mounts happen, I think I'd start by looking
into possible nameservice issues, then check out some of the mechanisms
other folks have mentioned (IP addresses or the late option in fstab,
netwait_enable in rc.conf, etc.) rather than coding workarounds into
NFS itself.
