Excuse the top posting. I am replying to a top post.

Correct. This is not a bug. It's been NFS behaviour ever since I
switched careers from IBM Mainframe to UNIX ~ 30 years ago. Sun Solaris
behaved this way, as did Tru64, DG/UX and HP-UX. Typically a sysadmin
needed to -- and needs to today -- put NFS server IPs in hosts(5).
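
As an illustration only (the address, host name and export path below
are placeholders, not anyone's real configuration), the hosts(5)
approach amounts to something like:

```
# /etc/hosts -- placeholder address from the documentation range
192.0.2.10      nfs1.example.org nfs1

# /etc/nsswitch.conf -- consult files before dns
hosts: files dns

# /etc/fstab -- the server name now resolves without DNS
nfs1.example.org:/export/home   /home   nfs   rw   0   0
```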

One person suggested using late for NFS shares. That is also what Red
Hat does. NFS mounts are flagged with _netdev, which systemd (and prior
to that upstart [rc.d]) honours. _netdev causes the NFS mount to take
place after the network is fully up, including DNS resolution.
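
For reference, the Linux fstab(5) option is spelled _netdev. A typical
entry looks roughly like this (server name and paths are placeholders):

```
# Linux /etc/fstab
nfs1.example.org:/export/home   /home   nfs   defaults,_netdev   0   0
```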

Solaris didn't do this when I last worked on it and AFAIK it still
doesn't.

I think our choices are either to document that sysadmins must use
hosts(5) or ensure NFS shares are mounted late, or to mount NFS shares
after the network is fully up.

Retrying forever, until DNS finally provides a good answer, can
potentially hang boot. This would be especially troublesome for remote
unattended reboots, in which remediation would require calling remote
eyes-and-hands support to "fix" the situation on the console.
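
To make the contrast concrete, here is a minimal sketch of a bounded
retry, roughly what a retrydns=<N> style option (suggested in the
quoted message below) might do. This is not mount_nfs(8) code:
resolve_with_retry(), resolver_fn and the delay parameter are names I
made up for illustration. It retries on any error, since the
getaddrinfo(3) return value reportedly does not distinguish "DNS not
up yet" from other failures.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Same signature as getaddrinfo(3), so a stub resolver can be swapped in. */
typedef int (*resolver_fn)(const char *, const char *,
    const struct addrinfo *, struct addrinfo **);

/*
 * Hypothetical helper: call the resolver once, then up to `retries'
 * more times, sleeping `delay_sec' seconds between attempts.
 * Returns 0 on success, otherwise the EAI_* code of the last attempt.
 */
static int
resolve_with_retry(resolver_fn resolve, const char *host,
    const char *service, const struct addrinfo *hints,
    struct addrinfo **res, int retries, unsigned int delay_sec)
{
	int attempt, error;

	for (attempt = 0;; attempt++) {
		error = resolve(host, service, hints, res);
		/* Stop on success or once the retry budget is spent. */
		if (error == 0 || attempt >= retries)
			return (error);
		if (delay_sec > 0)
			sleep(delay_sec);
	}
}
```

A caller would pass getaddrinfo itself as the resolver, e.g.
resolve_with_retry(getaddrinfo, host, "2049", &hints, &res, 5, 2).
A typo in the server name then costs a bounded delay instead of a hung
boot.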

BTW, with NFSv3 and v2, uninterruptible mounts, i.e. those without the
intr option, did behave this way. NFSv4 doesn't support intr.

I think the easiest solution would be some documentation. Next would be
mounting NFS shares later, at about the same time late mounts are
processed (actually, immediately prior), like Red Hat Linux does. A
_netdev fstab(5) option could serve the same purpose it does in Linux,
immediately prior to late option handling.
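
For what it's worth, the existing late option can already be used this
way today. A sketch of such an fstab(5) entry, with a placeholder
server name and paths:

```
# FreeBSD /etc/fstab -- defer this NFS mount to the late mount stage
nfs1.example.org:/export/home   /home   nfs   rw,late   0   0
```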

Altering the kernel to wait forever is undesirable. This would result in
boot hangs requiring console access to work around the problem. This
would be a PITA and a POLA violation for unattended remote sites.

-- 
Cheers,
Cy Schubert <cy.schub...@cschubert.com>
FreeBSD UNIX:  <c...@freebsd.org>   Web:  https://FreeBSD.org
NTP:           <c...@nwtime.org>    Web:  https://nwtime.org

                        e^(i*pi)+1=0


On Thu, 20 Feb 2025 01:25:20 +0100
Lars Tunkrans <drsn...@gmail.com> wrote:

> This situation has existed these past 40 years.  You have to put your
> IP address : hostname pairs into /etc/hosts if you don't have access to a
> working DNS.  This is not a bug.  It's the way name resolution works.
> 
> On Wed, 19 Feb 2025 at 23:40, Rick Macklem <rick.mack...@gmail.com> wrote:
> 
> > Hi,
> >
> > The subject line basically describes the problem glebius@
> > ran into.  When doing an NFS mount in /etc/fstab, it failed
> > since the DNS service was not yet working and, as such,
> > the DNS lookup of the server fqdn failed, causing the mount
> > to fail. Note that this behaviour has existed for decades.
> >
> > He feels this is a bug and that mount_nfs(8) should retry
> > getaddrinfo(3) calls until success, instead of failing the
> > mount when the first attempt fails.
> > The problem with just retrying getaddrinfo(3) is that it
> > could retry forever for simple failures like a typo in the
> > server fqdn.
> > I can see several ways this can be handled and would
> > like feedback from others w.r.t. these alternatives.
> >
> > 1) Simply document this case and encourage use of
> >     host names in /etc/hosts for NFS servers along with
> >     specifying use of files before dns in nsswitch.conf.
> >      Doing this results in the mounts working whether or
> >       not DNS is working.
> >
> > 2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
> >      until it succeeds. (I feel this would be a POLA violation,
> >      given that the current behaviour has existed for decades
> >      and for simple cases where the fqdn will never resolve
> >      the behaviour would be to hang at the mount attempt
> >      during boot unless "bg" is specified for the /etc/fstab entry.)
> >
> > 3) Add a new NFS mount option "retrydns=<N>", which would enable
> >     retries of getaddrinfo(3). This would avoid any POLA violation and
> >     would allow for a convenient way to document the behaviour in
> >     "man mount_nfs".
> >
> > 4) ???
> >
> > So, what do you think is the preferred change?
> >
> > rick
> > ps: I looked and the return value from getaddrinfo(3) does not
> >       appear to be useful to discern the case of "DNS service not
> >       running yet". (I think it replies EAI_FAIL for this case.)
> >
> >  

