On 2/19/25 5:40 PM, Rick Macklem wrote:
Hi,
The subject line basically describes the problem glebius@
ran into. When doing an NFS mount in /etc/fstab, it failed
since the DNS service was not yet working and, as such,
the DNS lookup of the server fqdn failed, causing the mount
to fail. Note that this behaviour has existed for decades.
He feels this is a bug and that mount_nfs(8) should retry
getaddrinfo(3) calls until success, instead of failing the
mount when the first attempt fails.
The problem with just retrying getaddrinfo(3) is that it
could retry forever for simple failures like a typo in the
server fqdn.
I can see several ways this can be handled and would
like feedback from others w.r.t. these alternatives.
1) Simply document this case and encourage use of
host names in /etc/hosts for NFS servers along with
specifying files before dns on the hosts line of nsswitch.conf.
Doing this results in the mounts working whether or
not DNS is working.
2) Call it a bug and patch mount_nfs(8) to retry getaddrinfo(3)
until it succeeds. (I feel this would be a POLA violation,
given that the current behaviour has existed for decades
and for simple cases where the fqdn will never resolve
the behaviour would be to hang at the mount attempt
during boot unless "bg" is specified for the /etc/fstab entry.)
3) Add a new NFS mount option "retrydns=<N>", which would enable
retries of getaddrinfo(3). This would avoid any POLA violation and
would allow for a convenient way to document the behaviour in
"man mount_nfs".
4) ???
Split the difference? -1 for "try forever", default to 3, configurable
up to insanity? Also, rather than just DNS, apply this to just about
any failure except an actual administrative failure (mountd refusing
the mount, for example).
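To make the "-1 means forever, default 3" idea concrete, here is a minimal sketch in C of the retry decision. Both function names are hypothetical and nothing like them exists in mount_nfs(8) today. It assumes that only EAI_AGAIN marks a transient lookup failure, so a typo in the fqdn (typically EAI_NONAME) fails at once instead of hanging. Whether the resolver reliably returns EAI_AGAIN when DNS is not yet up at boot would need to be checked.

```c
#include <netdb.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of mount_nfs(8): decide whether a
 * getaddrinfo(3) error code is worth retrying.  Treating only
 * EAI_AGAIN as transient is an assumption made for this sketch.
 */
static bool
gai_error_is_transient(int ecode)
{
	return (ecode == EAI_AGAIN);
}

/*
 * Hypothetical helper: max_retries < 0 means "try forever",
 * otherwise allow at most max_retries retries after the first attempt.
 */
static bool
should_retry(int ecode, int retries_done, int max_retries)
{
	if (!gai_error_is_transient(ecode))
		return (false);		/* permanent failure, give up now */
	if (max_retries < 0)
		return (true);		/* unlimited retries requested */
	return (retries_done < max_retries);
}
```

With the suggested default of 3, a lookup that keeps returning EAI_AGAIN is attempted four times in total before the mount fails.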
If this gets added, there should either be an exponential backoff with a
configurable max (default to 30s), or a configurable static delay
(default to 3s? 10s?). The mount_nfs process should log loudly every
time the delay gets triggered.
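A rough sketch in C of the exponential backoff variant follows. The names backoff_next(), resolve_with_retry() and RETRY_FOREVER are made up for illustration, the 1 second starting delay is an assumption, and writing to stderr stands in for whatever logging mount_nfs(8) would really use. Only EAI_AGAIN is retried here, which is also an assumption.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <unistd.h>

#define RETRY_FOREVER	(-1)	/* hypothetical "-1 for try forever" value */

/* Double the delay in seconds, never exceeding max_delay. */
unsigned int
backoff_next(unsigned int delay, unsigned int max_delay)
{
	if (delay > max_delay / 2)
		return (max_delay);
	return (delay * 2);
}

/*
 * Hypothetical wrapper around getaddrinfo(3).  Retries only on
 * EAI_AGAIN, sleeping between attempts and logging every retry.
 * Returns 0 on success or the last getaddrinfo(3) error code.
 */
int
resolve_with_retry(const char *host, const char *serv,
    const struct addrinfo *hints, struct addrinfo **res,
    int max_retries, unsigned int max_delay)
{
	unsigned int delay = 1;		/* assumed initial delay, seconds */
	int ecode, retries = 0;

	for (;;) {
		ecode = getaddrinfo(host, serv, hints, res);
		if (ecode != EAI_AGAIN)
			return (ecode);	/* success or permanent failure */
		if (max_retries != RETRY_FOREVER && retries >= max_retries)
			return (ecode);	/* retry budget used up */
		fprintf(stderr, "mount_nfs: %s: %s; retry %d in %us\n",
		    host, gai_strerror(ecode), retries + 1, delay);
		sleep(delay);
		delay = backoff_next(delay, max_delay);
		retries++;
	}
}
```

With a 30s cap the delays run 1, 2, 4, 8, 16, 30, 30, ... seconds. A static delay would just skip the backoff_next() call.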
Honestly, this would be handy in any number of crazy situations where
you have a need to wait for something else to start. I've been bitten
by the "just fail the mount" behavior before, but I worked around it
instead of thinking of changing the behavior.
Daniel