This bug was fixed in the package linux-gke - 5.15.0-1080.86
---------------
linux-gke (5.15.0-1080.86) jammy; urgency=medium
* jammy/linux-gke: 5.15.0-1080.86 -proposed tracker (LP: #2107001)
* nfsd hangs and never recovers after NFS4ERR_DELAY and a connection loss
(LP: #2103564)
- NFSD: Reset cb_seq_status after NFS4ERR_DELAY
[ Ubuntu: 5.15.0-139.149 ]
* jammy/linux: 5.15.0-139.149 -proposed tracker (LP: #2107038)
* Packaging resync (LP: #1786013)
- [Packaging] update annotations scripts
* CVE-2023-52664
- net: atlantic: eliminate double free in error handling logic
* CVE-2023-52927
- netfilter: allow exp not to be removed in nf_ct_find_expectation
-- Benjamin Wheeler <[email protected]> Wed, 16 Apr 2025 15:12:27 -0400
** Changed in: linux-gke (Ubuntu Jammy)
Status: Fix Committed => Fix Released
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-52664
** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-52927
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-gke in Ubuntu.
https://bugs.launchpad.net/bugs/2103564
Title:
nfsd hangs and never recovers after NFS4ERR_DELAY and a connection
loss
Status in linux package in Ubuntu:
Fix Released
Status in linux-gke package in Ubuntu:
New
Status in linux source package in Jammy:
Fix Committed
Status in linux-gke source package in Jammy:
Fix Released
Status in linux source package in Noble:
Fix Committed
Bug description:
BugLink: https://bugs.launchpad.net/bugs/2103564
[Impact]
nfsd loops forever in nfsd4_cb_sequence_done() after it receives an
NFS4ERR_DELAY and the connection is subsequently lost.
What happens is that the NFS4ERR_DELAY reply sets cb->cb_seq_status to -10008,
but the value is never set back to 1 before the call is retried, so
nfsd4_cb_sequence_done() keeps treating every retry as another NFS4ERR_DELAY.
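The loop can be illustrated with a small user-space model. This is only a sketch of the behaviour described above, not the kernel code: struct model_cb and the model_* functions are hypothetical names, and the model assumes that the status field is only overwritten when a reply actually arrives. The value 10008 is taken from the -10008 quoted above.

```c
#include <stdbool.h>

/* NFS4ERR_DELAY; nfsd stores it negated, hence the -10008 in this report. */
#define NFS4ERR_DELAY 10008

/* Hypothetical, stripped-down stand-in for the callback structure. */
struct model_cb {
    int cb_seq_status;  /* 1 means "no reply has been decoded yet" */
};

/*
 * Models reply decoding: the status is only overwritten when a reply
 * arrives. After a connection loss nothing arrives, so the previous
 * value is left in place.
 */
static void model_decode_reply(struct model_cb *cb, bool reply_arrived,
                               int status)
{
    if (reply_arrived)
        cb->cb_seq_status = status;
}

/*
 * Models the NFS4ERR_DELAY branch of the "done" handler. Returns true
 * when the handler asks for a delayed retry, false when it leaves the
 * NFS4ERR_DELAY branch.
 */
static bool model_sequence_done(struct model_cb *cb, bool with_fix)
{
    if (cb->cb_seq_status == -NFS4ERR_DELAY) {
        if (with_fix)
            cb->cb_seq_status = 1;  /* the reset that the fix adds */
        return true;                /* delay, then retry the call */
    }
    return false;
}

/*
 * Counts the retries that follow one NFS4ERR_DELAY reply and a lost
 * connection. max_iters caps the loop so the unfixed case terminates.
 */
static int model_retries_after_connection_loss(bool with_fix, int max_iters)
{
    struct model_cb cb = { .cb_seq_status = 1 };
    int retries = 0;

    model_decode_reply(&cb, true, -NFS4ERR_DELAY);  /* first reply: DELAY */
    while (retries < max_iters && model_sequence_done(&cb, with_fix)) {
        retries++;
        model_decode_reply(&cb, false, 0);          /* connection is gone */
    }
    return retries;
}
```

In this model, the unfixed handler retries until the cap is reached, while the fixed handler retries once and then leaves the NFS4ERR_DELAY branch.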
The stack trace looks like:
watchdog: BUG: soft lockup - CPU#33 stuck for 22s! [kworker/u120:29:1520679]
Kernel panic - not syncing: softlockup: hung tasks
CPU: 33 PID: 1520679 Comm: kworker/u120:29 Tainted: G L 5.15.0-1069-gke #75-Ubuntu
Workqueue: rpciod rpc_async_schedule [sunrpc]
Call Trace:
RIP: 0010:__rpc_sleep_on_priority_timeout+0x7b/0x110 [sunrpc]
Code: 0f b6 f9 66 90 44 89 fa 48 89 de 4d 8d 7e 50 4c 89 f7 e8 c8 fb ff ff 4c 89 6b 28 49 8b 46 50 49 39 c7 74 5a 4d 3b 6e 60 78 54 <49> 8b 56 50 48 8d 43 60 48 89 42 08 48 89 53 60 4c 89 7b 68 49 89
...
rpc_sleep_on_timeout+0x56/0xa0 [sunrpc]
rpc_delay+0x29/0x30 [sunrpc]
nfsd4_cb_sequence_done+0x1b9/0x250 [nfsd]
nfsd4_cb_done+0x1d/0xf0 [nfsd]
rpc_exit_task+0x5c/0x110 [sunrpc]
? __rpc_sleep_on_priority+0x80/0x80 [sunrpc]
__rpc_execute+0x68/0x270 [sunrpc]
rpc_async_schedule+0x30/0x50 [sunrpc]
process_one_work+0x22b/0x3d0
worker_thread+0x53/0x420
? process_one_work+0x3d0/0x3d0
kthread+0x12a/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x22/0x30
</TASK>
There is no workaround.
[Fix]
This was fixed in 6.9-rc1 by:
commit 961b4b5e86bf56a2e4b567f81682defa5cba957e
From: Chuck Lever <[email protected]>
Date: Fri, 26 Jan 2024 12:45:17 -0500
Subject: NFSD: Reset cb_seq_status after NFS4ERR_DELAY
Link: https://github.com/torvalds/linux/commit/961b4b5e86bf56a2e4b567f81682defa5cba957e
This is present in 5.15.179 and 6.6.76 upstream stable.
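For linux-gke on Jammy, the changelog at the top of this message gives 5.15.0-1080.86 as the fixed package. A rough way to tell whether a node is running a fixed kernel is to compare the ABI number in its `uname -r` output against 1080. The sketch below assumes the release string has the form 5.15.0-<abi>-gke, as in the 5.15.0-1069-gke seen in the stack trace; gke_release_has_fix() and GKE_FIXED_ABI are hypothetical names, and the check says nothing about other flavours or series.

```c
#include <stdio.h>
#include <string.h>

/* First Jammy linux-gke ABI carrying the fix, per 5.15.0-1080.86. */
#define GKE_FIXED_ABI 1080

/*
 * Returns 1 if `release` (as printed by `uname -r`) is a Jammy linux-gke
 * 5.15 kernel at or above the fixed ABI, 0 if it is an older one, and -1
 * if the string is not of the assumed 5.15.0-<abi>-gke form, in which
 * case this check does not apply.
 */
static int gke_release_has_fix(const char *release)
{
    int abi = 0;
    char flavour[16] = "";

    if (sscanf(release, "5.15.0-%d-%15s", &abi, flavour) != 2)
        return -1;
    if (strcmp(flavour, "gke") != 0)
        return -1;
    return abi >= GKE_FIXED_ABI ? 1 : 0;
}
```

The string to pass in can be obtained from uname(2) or from the output of `uname -r`.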
[Testcase]
There is no known synthetic reproducer available.
Currently we see it in production workloads on Google Kubernetes Engine. We
have successfully deployed and run a test kernel in production, with no
further incidents occurring. Before that, it would lock up once a day.
The test kernel is available in the following PPA:
https://launchpad.net/~mruffell/+archive/ubuntu/sf407307-test
If you install the kernel from the PPA, the issue no longer occurs.
[Where problems can occur]
We are resetting the value of cb->cb_seq_status back to 1 so that the callback
can leave the NFS4ERR_DELAY state and actually make some progress, instead of
being trapped there.
If a regression were to occur, it would affect NFS v4.x systems. It is
unlikely to cause any real issues; the most likely symptom would be some
flapping between NFS4ERR_DELAY and sending callbacks.