Bug#711021: marked as done (mount.nfs timeout for GETPORT is much too short)

Debian Bug Tracking System Sun, 22 Sep 2024 14:15:25 -0700

Your message dated Sun, 22 Sep 2024 23:12:00 +0200
with message-id <c427a6cc8fb80b564d884356119c3ad77201c9b1.ca...@decadent.org.uk>
and subject line Re: mount.nfs timeout for GETPORT is much too short
has caused the Debian Bug report #711021,
regarding mount.nfs timeout for GETPORT is much too short
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)


-- 
711021: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=711021
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

--- Begin Message ---

Package: nfs-common
Version: 1:1.2.6-3
Severity: important

This NFS client stopped being able to mount from my NFS server at boot
time, around the time I upgraded them both to wheezy.  I think the
problem started when only the server was upgraded and was ultimately
triggered by avahi-daemon being installed.  Since cups now recommends
avahi-daemon, this can be considered a common configuration.

I took a packet capture on both sides (which matched, so no packets are
being lost) and saw that:

- The client makes a GETPORT call
- The client retries a few times at 1 second intervals, then (if using
  TCP) closes the connection
- About 5 seconds after the first call from the client, the server sends
  a reply.  (strace-ing rpcbind showed it requesting a reverse DNS lookup
  from avahi, which apparently has a 5 second timeout for mDNS lookups.
  The client should have had a proper reverse DNS entry, but didn't.)
- The client sends a RST (TCP) or ICMP port unreachable error (UDP) when
  receiving the reply
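
The sequence above can be sketched as a small timing model. This is an
illustration only, not nfs-utils code. The 1 second retry interval and the
roughly 5 second server delay come from the capture described above; the
retry count of 3 and the function name are assumptions, since the report
only says "a few times".

```python
def reply_arrives_in_time(server_delay, retry_interval=1.0, attempts=3):
    """Return True if a reply sent `server_delay` seconds after the first
    call arrives before the client stops listening.

    The client is modelled as sending `attempts` calls spaced
    `retry_interval` seconds apart and giving up one interval after the
    last call. Both defaults are illustrative assumptions.
    """
    give_up_at = retry_interval * attempts
    return server_delay <= give_up_at


# Server stalled about 5 s on an mDNS reverse lookup: the reply is too late.
print(reply_arrives_in_time(5.0))   # False
# A prompt reply is accepted.
print(reply_arrives_in_time(0.2))   # True
```

Under this model any server delay longer than the client's total retry
window produces the late reply, and hence the RST or ICMP error, seen in
the capture.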

The relevant functions include nfs_pmap_getport() in
support/nfs/getport.c, which even has a comment to say:

 *  2.  This version times out quickly by default.  It time-limits the
 *      connect process as well as the actual RPC call, and even allows the
 *      caller to specify the timeout.

I don't know why it does this, though perhaps the intent was to
fail-over quickly when auto-detecting whether the remote portmap/rpcbind
uses TCP or UDP.  But having failed to query on both protocols, the
timeout ought to be increased when retrying.
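
One way to picture the suggestion above is a timeout schedule that stays
short while probing each protocol and grows only after both have failed.
This is a hypothetical sketch, not how nfs-utils behaves: the function
name, the TCP-then-UDP order, the starting value, the doubling factor and
the cap are all assumptions chosen for illustration.

```python
def timeout_schedule(rounds, initial=1.0, factor=2.0, cap=30.0):
    """Yield (protocol, timeout_seconds) pairs.

    Each round probes TCP and then UDP with the same timeout. After a
    round in which both are assumed to have failed, the timeout is
    multiplied by `factor`, up to `cap`. All values are illustrative.
    """
    timeout = initial
    for _ in range(rounds):
        for proto in ("tcp", "udp"):
            yield proto, timeout
        timeout = min(timeout * factor, cap)


# First timeout in the schedule long enough to cover a 5 second reply delay.
print(next(t for _, t in timeout_schedule(4) if t >= 5.0))   # 8.0
```

With these assumed numbers the first round still fails over quickly, and
the fourth round waits long enough for a reply delayed by 5 seconds.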

Ben.

-- Package-specific info:
-- rpcinfo --
   program vers proto   port
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  46254  status
    100024    1   tcp  58492  status
    100021    1   udp  33374  nlockmgr
    100021    3   udp  33374  nlockmgr
    100021    4   udp  33374  nlockmgr
    100021    1   tcp  40195  nlockmgr
    100021    3   tcp  40195  nlockmgr
    100021    4   tcp  40195  nlockmgr
-- /etc/default/nfs-common --
NEED_STATD=
STATDOPTS=
NEED_IDMAPD=
NEED_GSSD=
-- /etc/idmapd.conf --
[General]
Verbosity = 0
Pipefs-Directory = /var/lib/nfs/rpc_pipefs
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
-- /etc/fstab --
shadbolt:/home  /home           nfs     nfsvers=3,nodev,nosuid,mountproto=tcp   0       0
shadbolt:/usr/local /usr/local  nfs     nfsvers=3,nodev,nosuid,mountproto=tcp   0       0
-- /proc/mounts --
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
shadbolt:/home /home nfs rw,nosuid,nodev,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.2.1,mountvers=3,mountport=33045,mountproto=tcp,local_lock=none,addr=192.168.2.1 0 0
shadbolt:/usr/local /usr/local nfs rw,nosuid,nodev,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.2.1,mountvers=3,mountport=33045,mountproto=tcp,local_lock=none,addr=192.168.2.1 0 0

-- System Information:
Debian Release: 7.0
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-common depends on:
ii  adduser             3.113+nmu3
ii  initscripts         2.88dsf-41
ii  libc6               2.13-38
ii  libcap2             1:2.22-1.2
ii  libcomerr2          1.42.5-1.1
ii  libdevmapper1.02.1  2:1.02.74-7
ii  libevent-2.0-5      2.0.19-stable-3
ii  libgssglue1         0.4-2
ii  libk5crypto3        1.10.1+dfsg-5
ii  libkeyutils1        1.5.5-3
ii  libkrb5-3           1.10.1+dfsg-5
ii  libmount1           2.20.1-5.3
ii  libnfsidmap2        0.25-4
ii  libtirpc1           0.2.2-5
ii  libwrap0            7.6.q-24
ii  lsb-base            4.1+Debian8
ii  rpcbind             0.2.0-8
ii  ucf                 3.0025+nmu3

Versions of packages nfs-common recommends:
ii  python  2.7.3-4

Versions of packages nfs-common suggests:
pn  open-iscsi  <none>
pn  watchdog    <none>

-- no debconf information


--- End Message ---

--- Begin Message ---

I wrote:
> I'm not sure that this bug was ever fixed.
> 
> nfs-utils actually uses nfs_getport() to get the port.  That passes a
> timeout of {-1, 0} to libtirpc, which is invalid and should result in
> using the rpcbind client's default timeout.

This analysis was wrong.

nfs_getport() first calls nfs_gp_get_rpcbclient() ->
nfs_get_rpcclient() -> nfs_get_tcpclient() or nfs_get_udpclient(), and
those last two functions update the timeout to be 10 seconds (TCP) or 3
seconds (UDP).  That hasn't changed between the version I reported
against (1.2.6-3) and the current 2.7.1-3.
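
For orientation, the figures quoted above can be set against the 5 second
mDNS lookup delay from the original report. This is plain arithmetic under
a loud assumption: that each quoted value bounds the whole wait for a
GETPORT reply, which this message does not state. The names below are
hypothetical and nothing here is taken from the nfs-utils source.

```python
# Seconds, as quoted in this message for nfs_get_tcpclient() and
# nfs_get_udpclient(). Treating them as the total wait is an assumption.
CLIENT_TIMEOUTS = {"tcp": 10.0, "udp": 3.0}

# Seconds, the avahi mDNS lookup timeout described in the original report.
SERVER_DELAY = 5.0


def outlasts_delay(proto, delay=SERVER_DELAY):
    """Return True if the quoted timeout for `proto` exceeds `delay`."""
    return CLIENT_TIMEOUTS[proto] > delay


for proto in ("tcp", "udp"):
    print(proto, outlasts_delay(proto))
```

This only compares the quoted numbers; it says nothing about whether the
original failure would recur on a current system.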

Salvatore Bonaccorso <car...@debian.org> wrote:
[...]
> Is this still something we need to report upstream? (And if so, could
> you do it?).

I can't find the bug, and I don't particularly care to investigate
any more, so I'm closing this report.

Ben.

-- 
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.


--- End Message ---
