Package: nfs-kernel-server Version: 1:1.3.4-2.1 Severity: important -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
I have recently upgraded my NFS server from Jessie to Stretch. It was the last system on my network that needed to be upgraded. After the upgrade, I found that NFSv4 was nearly unusable. All user home directories, as well as backup and scratch space, are mounted via NFS. Users trying to log in to a default GNOME session would experience hangs that appeared to never resolve themselves. Logging in through ssh resulted in ~2 minutes delay before a bash prompt would appear. I attempted several things to troubleshoot the problem, including reverting to kernel 3.16.0 from Jessie and nfs-utils 1.2.8-9.1. I tried going all the way back to nfs-utils 1.2.8-9, but the version of systemd in Stretch has "nfs-common (<< 1:1.2.8-9.1)". Even after reverting the problem persisted. Based on that, I think that the problem may have been introduced as far back as 1.2.8-9.1. The problem also manifested itself in the form of problems for users logged directly into the NFS server. Even though file accesses are local in that scenario, if an NFS client had attemped to access a file and hung on it, then it would also cause problems on the NFS server. For example, I have a large MBOX file that I access with mutt. If I access the file from an NFS client then it appears to hang with messages like this flooding syslog on the client: vmunix: [918369.611122] NFS: nfs4_reclaim_open_state: Lock reclaim failed Trying to then open that same mbox with mutt on the NFS server causes mutt to segfault because the call to fcntl() returns an error. The only thing that resolved the problem was switching my exports from sec=krb5p to sec=krb5i. Of course, this is considerably less desirable than krb5p in some cases. I believe the that problem is with rpc.svcgssd. Here is part of what got captured in syslog during a failure: Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: using domain: example.com Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: Realms list: 'EXAMPLE.COM' Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: processing 'Method' list Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sname = nfs/ws1.example....@example.com Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509615211 (25678 from now), clnt: n...@ws1.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example....@example.com Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14448 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example....@example.com Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13881 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example....@example.com Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.example....@example.com Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example....@example.com Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14280 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sname = robe...@example.com Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509623547 (33835 from now), clnt: <null>, uid: 3167, gid: 100, num aux grps: 17: Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 1) 100 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 2) 4 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 3) 24 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 4) 29 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 5) 33 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 6) 44 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 7) 46 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 8) 50 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 9) 60 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 10) 113 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 11) 115 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 12) 125 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 13) 132 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 14) 500 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 15) 501 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 16) 502 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 17) 503 Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: leaving poll Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: handling null request Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.example....@example.com Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: doing downcall Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14260 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0: Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sending null reply Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload] Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: finished handling null request Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: entering poll Nov 1 22:32:42 nfssrv rpc.svcgssd[30152]: exiting on signal 15 - -- Package-specific info: - -- rpcinfo -- program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 391002 2 tcp 650 sgi_fam 100005 1 udp 32767 mountd 100005 1 tcp 32767 mountd 100005 2 udp 32767 mountd 100005 2 tcp 32767 mountd 100005 3 udp 32767 mountd 100005 3 tcp 32767 mountd 100003 3 tcp 2049 nfs 100003 4 tcp 2049 nfs 100227 3 tcp 2049 100003 3 udp 2049 nfs 100003 4 udp 2049 nfs 100227 3 udp 2049 100021 1 udp 52460 nlockmgr 100021 3 udp 52460 nlockmgr 100021 4 udp 52460 nlockmgr 100021 1 tcp 37351 nlockmgr 100021 3 tcp 37351 nlockmgr 100021 4 tcp 37351 nlockmgr - -- /etc/default/nfs-kernel-server -- RPCNFSDCOUNT=8 RPCNFSDPRIORITY=0 RPCMOUNTDOPTS="-p 32767" NEED_SVCGSSD="yes" RPCSVCGSSDOPTS="" - -- /etc/exports -- /srv/nfs 192.168.0.0/16(sync,rw,fsid=0,crossmnt,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i) /srv/nfs/home 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i) /srv/nfs/backup 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i) /srv/nfs/scratch 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i) - -- System Information: Debian Release: 9.2 APT prefers stable-updates APT policy: (500, 'stable-updates'), (500, 'stable') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores) Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages nfs-kernel-server depends on: ii init-system-helpers 1.48 ii keyutils 1.5.9-9 ii libblkid1 2.29.2-1 ii libc6 2.24-11+deb9u1 ii libcap2 1:2.25-1 ii libsqlite3-0 3.16.2-5 ii libtirpc1 0.2.5-1.2 ii libwrap0 7.6.q-26 ii lsb-base 9.20161125 ii netbase 5.4 ii nfs-common 1:1.3.4-2.1 ii ucf 3.0036 nfs-kernel-server recommends no packages. nfs-kernel-server suggests no packages. - -- no debconf information -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEz9ERzDttUsU/BH8iLNd4Xt2nsg8FAln6mi4ACgkQLNd4Xt2n sg+OPRAAksDf+IR6dTn1Gl8OM4IOMVi5S48i+S6n3sDy9Uv04zRX1sdvTuzvGF1d 4mRqPson+eQ3CnuMQIH8XOMZJdrZoB7bsjMWjZC9FgBn42oE7wOxcMJW+FC1qCYe AJMan4ssGDec62XnBkkZY2byhvbXygEmSq/cCQlCKp3amwe3VEm+lqf1qFKl2euV qrnkgk7aAgrH6zu+q5WgAyrEZzvDi7NbcGheUe0NitAKEu2zR+Iq6mMOEP9rd91s sgJ2tCHvwZtE6FoUHk9GJrluZyzItBIGV+aaA9PFjJcoQgxB5mE5cS0vFlfXiVbm xS9v67UJXp/H7R5S+mnoMDqfKi+TCW+FaLEj5s+4bpYd7sk6Pyqugj3yEj6OYKAw u2G7iB4MA64roxGeIEghgPvW/Zmuvpa+pi6RjOUcyvO4lyNGMVUM7TP59fWuXbwv kwQ+coBI6SxAUnKOiA9MADHZcUxjdhPWckkTFagijv9mipAY9+G6+fWBY3iFZrMO vrGABHecEzV5TzKv4J+sSGl5581/+vImK3SIUA6ILBED1SkWmFFonekE9P0YDgFr tR2+vcCvhQehYSQW/hifJpbY/3puUXYOvqQmv+Be3AVw6hBaxVfzY1yjkDemjKXr oHmDZ5L76TokKoiv45JYIWAr8sjNsxScDUlhnP+Ou7Tvi8eUIqM= =CbNN -----END PGP SIGNATURE-----