On 2/19/25 5:40 PM, Rick Macklem wrote:
Hi,

The subject line basically describes the problem glebius@
ran into.  When doing an NFS mount in /etc/fstab, it failed
since the DNS service was not yet working and, as such,
the DNS lookup of the server fqdn failed, causing the mount
to fail. Note that this behaviour has existed for decades.

He feels this is a bug and that mount_nfs(8) should retry
getaddrinfo(3) calls until success, instead of failing the
mount when the first attempt fails.
The problem with just retrying getaddrinfo(3) is that it
could retry forever for simple failures like a typo in the
server fqdn.
I can see several ways this can be handled and would
like feedback from others w.r.t. these alternatives.

1) Simply document this case and encourage use of
     host names in /etc/hosts for NFS servers along with
     specifying use of files before dns in nsswitch.conf.
     Doing this results in the mounts working whether or
     not DNS is working.

2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
      until it succeeds. (I feel this would be a POLA violation,
      given that the current behaviour has existed for decades
      and for simple cases where the fqdn will never resolve
      the behaviour would be to hang at the mount attempt
      during boot unless "bg" is specified for the /etc/fstab entry.)

3) Add a new NFS mount option "retrydns=<N>", which would enable
     retries of getaddrinfo(3). This would avoid any POLA violation and
     would allow for a convenient way to document the behaviour in
     "man mount_nfs".

4) ???

Split the difference? Take a retry count: -1 for "try forever", default to 3, configurable up to insanity? Also, rather than covering only DNS, apply the retries to just about any failure except an actual administrative failure (mountd refusing the mount, for example).

If this gets added, there should be either an exponential backoff with a configurable max (default to 30s), or a configurable static delay (default to 3s? 10s?). The mount_nfs process should log loudly every time the delay gets triggered.
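
To make the proposal concrete, here is a rough Python sketch of the policy I have in mind. It is not mount_nfs(8) code and not a patch. The names backoff_delays and resolve_with_retry, the 1s starting delay, and the injectable resolver/sleep arguments are all made up for illustration. Only the -1/3 retry counts and the 30s cap come from the suggestion above.

```python
import logging
import socket
import time

log = logging.getLogger("mount_nfs_sketch")  # hypothetical logger name


def backoff_delays(base=1.0, cap=30.0):
    """Yield delays that double from `base` and never exceed `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def resolve_with_retry(host, retries=3, base=1.0, cap=30.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host`, retrying failed lookups with exponential backoff.

    retries: number of retries after the first attempt; -1 retries forever.
    resolver/sleep are injectable so the policy can be exercised without DNS.
    """
    delays = backoff_delays(base, cap)
    attempt = 0
    while True:
        try:
            return resolver(host, 2049)  # 2049 is the standard NFS port
        except socket.gaierror as err:
            if retries != -1 and attempt >= retries:
                raise  # retry budget exhausted: fail the mount as today
            delay = next(delays)
            attempt += 1
            # Log loudly every time a delay is triggered.
            log.warning("lookup of %s failed (%s); retry %d in %.0fs",
                        host, err, attempt, delay)
            sleep(delay)
```

With retries=3 a typo in the server fqdn still fails the mount after a bounded wait, while retries=-1 gives the "try forever" behaviour for people who want it.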

Honestly, this would be handy in any number of crazy situations where you have a need to wait for something else to start.  I've been bitten by the "just fail the mount" behavior before, but I worked around it instead of thinking of changing the behavior.

Daniel

