Re: RFC: mount_nfs failure due to dns not running yet

Rick Macklem Thu, 20 Feb 2025 19:58:51 -0800

On Thu, Feb 20, 2025 at 4:28 PM Steve Rikli <s...@genyosha.net> wrote:
>
> On Wed, Feb 19, 2025 at 02:40:15PM -0800, Rick Macklem wrote:
> >
> > The subject line basically describes the problem glebius@
> > ran into.  When doing an NFS mount in /etc/fstab, it failed
> > since the DNS service was not yet working and, as such,
> > the DNS lookup of the server fqdn failed, causing the mount
> > to fail. Note that this behaviour has existed for decades.
> >
> > He feels this is a bug and that mount_nfs(8) should retry
> > getaddrinfo(3) calls until success, instead of failing the
> > mount when the first attempt fails.
> > The problem with just retrying getaddrinfo(3) is that it
> > could retry forever for simple failures like a typo in the
> > server fqdn.
> > I can see several ways this can be handled and would
> > like feedback from others w.r.t. these alternatives.
> >
> > 1) Simply document this case and encourage use of
> >     host names in /etc/hosts for NFS servers along with
> >     specifying use of file before dns in nsswitch.conf.
> >      Doing this results in the mounts working whether or
> >       not DNS is working.
> >
> > 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
> >      until it succeeds. (I feel this would be a POLA violation,
> >      given that the current behaviour has existed for decades
> >      and for simple cases where the fqdn will never resolve
> >      the behaviour would be to hang at the mount attempt
> >      during boot unless "bg" is specified for the /etc/fstab entry.)
> >
> > 3) Add a new NFS mount option "retrydns=<N>", which would enable
> >     retries of getaddrinfo(3). This would avoid any POLA violation and
> >     would allow for a convenient way to document the behaviour in
> >     "man mount_nfs".
> >
> > 4) ???
> >
> > So, what do you think is the preferred change?
>
> I don't think I would change mount_nfs code behavior for this.
>
> That is, requiring services and daemons etc. to workaround missing,
> misconfigured, slow, or misbehaving nameservice (whether it's DNS,
> /etc/hosts, NIS, whatever) seems like more complexity, possibly not
> effective, and maybe not focused on the right thing.
>
> Now, without meaning to be presumptuous, it may be worth re-examining
> the startup sequence, e.g. to make sure NFS mounts are tried after the
> known dependencies can reasonably be expected to have started, including
> the network, plus local_unbound or bind (if used), possibly others.
>
> After a quick look, I don't see an obvious problem with the sequence,
> but more knowledgeable eyes than mine are welcome.  I don't quite follow
> some of the output from rcorder and service -r.
>
> > ps: I looked and the return value from getaddrinfo(3) does not
> >       appear to be useful to discern the case of "DNS service not
> >       running yet". (I think it replies EAI_FAIL for this case.)
>
> In that area, I'll note FreeBSD rc.d has a "NETWORKING" dependency for
> PROVIDE and REQUIRE, and it's included in scripts like nfsclient,
> mountcritremote et al. However there seems to be no similar dependency
> for something like "NAMESERVICE" (generic, as opposed to "named"
> specifically), and I'm not sure how that might be implemented, even
> assuming it could be useful in a situation like this.
>
> I.e. there are many things to potentially check for "can the system
> resolve hostnames yet", and not all of them involve running a local
> instance of named, unbound, etc.
>
> In general, if I were running into problems with nameservice not being
> available by the time NFS mounts happen, I think I'd start by looking
> into possible nameservice issues, then check out some mechanisms other
> folks have mentioned (fstab IP addresses or late option, rc.conf
> netwait_enable, etc.) rather than coding workarounds into NFS itself.
Well, the patch I have created (it took about 15 minutes) only changes behaviour
if a new "retrydns" option is used. As such, I think it might be useful for some,
but it doesn't change things unless someone uses it.
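
To illustrate the idea, here is a rough sketch of a bounded retry loop
around getaddrinfo(3). It is not the actual patch: the function name
resolve_with_retry(), its parameters, and the fixed delay between attempts
are all made up for illustration, and it assumes "retrydns=<N>" means
"retry up to N more times after the first failure". It retries on any
error, since the return value does not seem to tell us whether DNS is
simply not up yet.

```c
/* Ask for the POSIX.1-2001 interfaces (getaddrinfo, sleep) explicitly. */
#define _POSIX_C_SOURCE 200112L

#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT the real mount_nfs(8) code.
 *
 * Calls getaddrinfo(3) once and, if that fails, up to max_retries more
 * times, sleeping delay_secs seconds between attempts. With
 * max_retries == 0 it behaves like a single getaddrinfo(3) call, which
 * matches the current (no retry) behaviour.
 *
 * Returns 0 on success or the last getaddrinfo(3) error code. If
 * attempts is not NULL, the number of calls made is stored there.
 */
int
resolve_with_retry(const char *host, const char *port,
    const struct addrinfo *hints, int max_retries, unsigned int delay_secs,
    struct addrinfo **res, int *attempts)
{
	int error, tries;

	assert(host != NULL && res != NULL && max_retries >= 0);

	tries = 0;
	for (;;) {
		error = getaddrinfo(host, port, hints, res);
		tries++;
		/* Stop on success or once the retry budget is used up. */
		if (error == 0 || tries > max_retries)
			break;
		sleep(delay_secs);
	}
	if (attempts != NULL)
		*attempts = tries;
	return (error);
}
```

Because the retry count is bounded and defaults to zero, a typo in the
server name still fails after a finite time, and nothing changes for
mounts that do not ask for retries.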


I agree with you; I don't think the rc scripts have a way to REQUIRE that
DNS is working. (I, personally, always put the fqdn for NFS servers in /etc/hosts
and make sure "files" is first in nsswitch.conf, but others argue that is not
feasible for some deployments. Using IP numbers works for AUTH_SYS,
but not for Kerberized mounts.)

Note that there is already "retrycnt", which specifies how many times to
retry the mount, but that retry loop doesn't include the getaddrinfo(3) calls.
--> Personally, I do not like always doing retries since I often
     type mount commands manually and I'm a terrible typist, so I
     often mistype the server's name.

This reply is mostly a follow-up to all the good comments,
not just yours.

Thanks, everyone, for your comments. rick
