Package: nfs-common
Version: 1:1.3.4-2.1+deb9u1
Severity: important

Dear Maintainer,

FYI: /usr/share/bug/nfs-common/script warns of an error. In my case, 'cat /etc/fstab|grep nfs >&3' returns 1 because 'grep' finds no match (my NFS mounts are all in autofs). It should probably get a '|| true' to avoid user confusion in the bug report. The same goes for the other grep statements.

Note: This is a doozy, a whole tower of fail (some of it my fault for implementing an earlier workaround for NFS deadlocks and forgetting it was there). I'm not sure which package to report it against, but the core trigger is in /usr/sbin/start-statd, so I'm starting there.

Yes, my system is likely configured in a pretty non-standard way: an /etc/nfsmount.conf forcing NFSv3, put in place to avoid a deadlock on earlier systems, has come back to bite me as a deadlock later on :-(

Other subsystems involved:
- systemd
- dbus-daemon
- autofs

Ultimately, my goal here is to help establish robust, systemd-aware coordination between the various parts involved, to avoid another similar issue caused by these inter-dependencies. I don't know if rewriting 'start-statd' to take systemd state into account is the correct solution, but IMHO systemd/dbus-daemon are utterly fragile in this situation and extremely difficult to debug (you need systemctl to do stuff, but it won't work; you can't restart dbus-daemon without systemctl; and kill -TERM on pid 1 doesn't work ...).

Summary:
- The system is configured for NFSv3 mounts via /etc/nfsmount.conf.
  Note: this was to work around a bug in NFSv4.[012] that caused deadlocks against NFSv3 servers running Jessie. I do not recall the bug #. Something changed in a recent Stretch patchlevel, as this was working fine up until I patched and rebooted.
- The systemd unit rpc-statd.service is disabled.
- automount/autofs -> nfs is called, triggering start-statd, which runs 'systemctl start rpc-statd', which takes down dbus-daemon and never completes.
- Regardless of where the blame lies, it is possible that it is wrong to call 'systemctl' from inside 'start-statd' *if* it is itself being called from a systemd unit.

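For the reportbug helper issue noted at the top of this report, a minimal sketch of the '|| true' guard might look like the following. The function name dump_nfs_fstab and its file argument are hypothetical and used only for illustration; the real /usr/share/bug/nfs-common/script may be structured differently (it writes to file descriptor 3, whereas this sketch writes to stdout).

```shell
#!/bin/sh
# Sketch only: dump_nfs_fstab is a hypothetical helper, not part of nfs-common.
set -e

# Print the NFS entries of an fstab-style file ($1, default /etc/fstab).
# grep exits with status 1 when nothing matches; '|| true' keeps that from
# being treated as a failure when all NFS mounts live in autofs maps.
# Caveat: this also hides grep's exit status 2 (e.g. unreadable file),
# although grep still prints its own error message to stderr in that case.
dump_nfs_fstab() {
    grep nfs "${1:-/etc/fstab}" || true
}
```

In the bug script itself the call would presumably be redirected the same way as the existing line, e.g. 'dump_nfs_fstab /etc/fstab >&3'.
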
If the system is configured for NFSv3 mounts via /etc/nfsmount.conf and the systemd unit 'rpc-statd' is disabled, then the automounter sets off a chain during boot (at least on our system) that forcibly tries to run 'systemctl start rpc-statd' via /usr/sbin/start-statd.

The systemctl call never completes (I don't know whether that is because systemctl calls can't be nested, or can't be made outside the normal startup flow, or something else), and eventually dbus-daemon stops responding (so this could be a bug that needs to be transferred there). This locks up the entire boot process: all systemctl calls time out, and dbus-daemon is sitting in EAGAIN (resource temporarily unavailable).

Additionally, I wasn't able to ssh in (even though systemd had started sshd) because 'pam_motd' in /etc/pam.d/sshd calls update-motd, which also blocked hard, never completed, and was uninterruptible. Once I commented 'pam_motd' out, I could ssh in and <CTRL>C something hanging on NFS to get a shell. (Again, tower of fail.)

Once in, if I killed the 'systemctl start rpc-statd', the system would return to responsiveness (systemctl could again contact dbus-daemon).

systemd-cgls showed:

    +-autofs.service
    | +-1453 /usr/sbin/automount --pid-file /var/run/autofs.pid
    | +-1465 /bin/mount -t nfs -s -o intr,nodev,nosuid ral-local-linux:/exports/linux-amd64 /var/autofs/mnt/linux-amd64
    | +-1466 /sbin/mount.nfs ral-local-linux:/exports/linux-amd64 /var/autofs/mnt/linux-amd64 -s -o rw,nodev,nosuid,intr
    | +-1467 /bin/sh /usr/sbin/start-statd
    | -1470 systemctl start rpc-statd.service
      ^^^^ this hangs dbus-daemon and brings down the whole systemd kingdom.

Before it hung, ...

    puffin:/etc/default/grub.d# systemctl list-jobs
    JOB UNIT                           TYPE  STATE
    607 apt-daily.service              start running
    462 nfs-config.service             start running
    468 apt-daily-upgrade.service      start waiting
    460 rpc-statd-notify.service       start waiting
    453 rpc-statd.service              start waiting
    464 systemd-tmpfiles-clean.service start running

Note: 'ral-local-linux' is our NFS-shared /usr/local.

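Since killing the stuck 'systemctl start rpc-statd' was enough to make the system responsive again, one conceivable mitigation is to bound that call with a timeout. The following is only a sketch under that assumption: run_bounded and the 30-second value are hypothetical, and this is neither the actual content of /usr/sbin/start-statd nor a tested fix for the deadlock. It relies on the coreutils 'timeout' command, which exits with status 124 when it has to stop the command.

```shell
#!/bin/sh
# Sketch only: run_bounded is a hypothetical helper, not taken from nfs-utils.

# Run a command ($2 onwards) but give up after $1 seconds.
# coreutils 'timeout' sends SIGTERM at the deadline and then exits with 124;
# otherwise it passes through the exit status of the command.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Hypothetical use inside a start-statd-like script:
#   if [ -d /run/systemd/system ]; then
#       run_bounded 30 systemctl start rpc-statd.service && exit 0
#   fi
#   # otherwise fall through to whatever fallback the script already has
```

Whether a bounded call is preferable to not calling systemctl at all from within a unit's process tree is exactly the open question of this report; the sketch only limits how long the mount can be held up.
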
The automount of /usr/local may have been triggered early because 'cron' had started and users' '@reboot' jobs were launching.

Note: I have a lot of systemd debug output and other captured logs that I can provide if needed.

Here is the /etc/nfsmount.conf that was in use before:

    [ NFSMount_Global_Options ]
    nfsvers=3

Thanks,
--stephen

-- Package-specific info:
-- rpcinfo --
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  48853  mountd
    100005    1   tcp  45675  mountd
    100005    2   udp  56398  mountd
    100005    2   tcp  58131  mountd
    100005    3   udp  49109  mountd
    100005    3   tcp  48261  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049
    100021    1   udp  54879  nlockmgr
    100021    3   udp  54879  nlockmgr
    100021    4   udp  54879  nlockmgr
    100021    1   tcp  41063  nlockmgr
    100021    3   tcp  41063  nlockmgr
    100021    4   tcp  41063  nlockmgr
    100007    2   udp    806  ypbind
    100007    1   udp    806  ypbind
    100007    2   tcp    807  ypbind
    100007    1   tcp    807  ypbind
    100024    1   udp  58391  status
    100024    1   tcp  34239  status
-- /etc/default/nfs-common --
NEED_STATD=
STATDOPTS=
NEED_IDMAPD=yes
NEED_GSSD=
-- /etc/idmapd.conf --
[General]
Verbosity = 0
Pipefs-Directory = /run/rpc_pipefs
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
-- /etc/fstab --

-- System Information:
Debian Release: 9.13
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.9.0-14-amd64 (SMP w/24 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-common depends on:
ii  adduser              3.115
ii  init-system-helpers  1.48
ii  keyutils             1.5.9-9
ii  libc6                2.24-11+deb9u4
ii  libcap2              1:2.25-1
ii  libcomerr2           1.43.4-2+deb9u2
ii  libdevmapper1.02.1   2:1.02.137-2
ii  libevent-2.0-5       2.0.21-stable-3
ii  libgssapi-krb5-2     1.15-1+deb9u2
ii  libk5crypto3         1.15-1+deb9u2
ii  libkeyutils1         1.5.9-9
ii  libkrb5-3            1.15-1+deb9u2
ii  libmount1            2.29.2-1+deb9u1
ii  libnfsidmap2         0.25-5.1
ii  libtirpc1            0.2.5-1.2+deb9u1
ii  libwrap0             7.6.q-26
ii  lsb-base             9.20161125
ii  rpcbind              0.2.3-0.6
ii  ucf                  3.0036

Versions of packages nfs-common recommends:
ii  python  2.7.13-2

Versions of packages nfs-common suggests:
pn  open-iscsi  <none>
pn  watchdog    <none>

Versions of packages nfs-kernel-server depends on:
ii  init-system-helpers  1.48
ii  keyutils             1.5.9-9
ii  libblkid1            2.29.2-1+deb9u1
ii  libc6                2.24-11+deb9u4
ii  libcap2              1:2.25-1
ii  libsqlite3-0         3.16.2-5+deb9u3
ii  libtirpc1            0.2.5-1.2+deb9u1
ii  libwrap0             7.6.q-26
ii  lsb-base             9.20161125
ii  netbase              5.4
ii  ucf                  3.0036

-- Configuration Files:
/etc/default/nfs-common changed [not included]

-- no debconf information