** Summary changed:

- NFS connections block while causing a high-bandwidth RPC-pingpong between client and server
+ NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong between client and server

** Description changed:

- There's a bug in kernels before Linux 5.0 that affects NFS 4.1 connections. The bug presents itself like this:
-    * On NFS clients: Attempts to access mounted NFS shares associated with the affected server
-      block indefinitely.
-    * On the network: A storm of repeated RPCs between NFS client and server uses a lot
-      of bandwidth. Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.
-    * Other NFS clients connected to the same NFS server: Performance drops dramatically.
+ BugLink: https://bugs.launchpad.net/bugs/1828978
  
- A patch is available to fix this problem:
+ [Impact]
  
- <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3453d5708b33efe76f40eca1c0ed60923094b971>
+ There is a bug in NFS v4.1 that causes a large number of RPC calls
+ between a client and server when a previous RPC call is interrupted.
+ This uses a large amount of bandwidth and can saturate the network.
  
- Is it possible to integrate the patch into the 4.18 kernel series?
- I'm using Ubuntu 18.04.2 LTS as NFS client and server.
+ The symptoms are as follows:
  
- Thank you.
+ * On NFS clients:
+ Attempts to access mounted NFS shares associated with the affected server block indefinitely.
+  
+ * On the network:
+ A storm of repeated RPCs between NFS client and server uses a lot of bandwidth. Each RPC is acknowledged by the server with an NFS4ERR_SEQ_MISORDERED error.
  
- Best regards,
+ * Other NFS clients connected to the same NFS server:
+ Performance drops dramatically.
  
- Frank Burkhardt
+ This occurs during a "false retry", when a client attempts to make a new
+ RPC call using a slot+sequence number that references an older, cached
+ call. This happens when a user process interrupts an RPC call that is in
+ progress.
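
To make the "false retry" concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not kernel code: the class `ServerSlot`, its fields and the operation strings are hypothetical names invented for illustration, and the slot rules are a simplified reading of the NFSv4.1 session slot behaviour (next sequence number is a new call, same sequence number is treated as a retransmission, anything else is misordered).

```python
from dataclasses import dataclass


@dataclass
class ServerSlot:
    """Toy model of one NFSv4.1 session slot on the server (hypothetical)."""
    last_seq: int = 0        # last sequence number executed on this slot
    cached_reply: str = ""   # reply cached for that sequence number

    def handle(self, seq: int, op: str) -> str:
        if seq == self.last_seq + 1:
            # New request: execute it and cache the reply.
            self.last_seq = seq
            self.cached_reply = f"reply to {op}"
            return self.cached_reply
        if seq == self.last_seq:
            # Same slot+sequence: treated as a retransmission, so the
            # cached reply is returned without looking at the new op.
            return self.cached_reply
        return "NFS4ERR_SEQ_MISORDERED"


slot = ServerSlot()
# The server executes the call, but the client process is interrupted
# before it sees the reply, so the client never advances its sequence.
first = slot.handle(1, "READ a")
# The client then reuses sequence 1 for a different call.
second = slot.handle(1, "WRITE b")
print(second)
```

In this model the second call is answered with the cached reply to the first one, which is the mismatch that the slot+sequence mechanism is meant to rule out.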
+ 
+ [Fix]
+ 
+ This was fixed in 5.1 upstream with the below commit:
+ 
+ commit 3453d5708b33efe76f40eca1c0ed60923094b971
+ Author: Trond Myklebust <trond.mykleb...@hammerspace.com>
+ Date:   Wed Jun 20 17:53:34 2018 -0400
+ Subject: NFSv4.1: Avoid false retries when RPC calls are interrupted
+ 
+ The fix is to pre-emptively increment the sequence number if an RPC call
+ is interrupted. To address corner cases, the client interprets the
+ NFS4ERR_SEQ_MISORDERED error as a sign that it needs to locate an
+ appropriate sequence number between the value it sent and the last
+ successfully acked SEQUENCE call.
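
The recovery logic described above can be sketched roughly as follows. This is an illustrative Python model, not the kernel implementation: `ServerSlot`, `ClientSlot`, `on_interrupt` and `call` are hypothetical names, and stepping down one sequence number at a time is an assumption about how "locate an appropriate sequence number" might be done.

```python
MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class ServerSlot:
    """Toy server slot: accepts last_seq + 1, replays last_seq (hypothetical)."""

    def __init__(self, last_seq=0):
        self.last_seq = last_seq

    def handle(self, seq):
        if seq == self.last_seq + 1:
            self.last_seq = seq
            return "OK"
        if seq == self.last_seq:
            return "CACHED"
        return MISORDERED


class ClientSlot:
    """Toy client slot with the pre-emptive bump and fallback (hypothetical)."""

    def __init__(self):
        self.last_acked = 0  # last sequence number the server acknowledged
        self.next_seq = 1    # sequence number to use for the next call

    def on_interrupt(self):
        # The call using next_seq was interrupted: assume the server may
        # have consumed that number and move past it pre-emptively.
        self.next_seq += 1

    def call(self, server):
        seq = self.next_seq
        while True:
            status = server.handle(seq)
            # Stop on any answer other than MISORDERED, or once there is
            # no lower candidate above the last acknowledged number.
            if status != MISORDERED or seq <= self.last_acked + 1:
                break
            seq -= 1
        if status == "OK":
            self.last_acked = seq
            self.next_seq = seq + 1
        return seq, status


# Case A: the server did receive the interrupted call (sequence 1).
server_a, client_a = ServerSlot(last_seq=1), ClientSlot()
client_a.on_interrupt()
print(client_a.call(server_a))

# Case B: the interrupted call never reached the server.
server_b, client_b = ServerSlot(last_seq=0), ClientSlot()
client_b.on_interrupt()
print(client_b.call(server_b))
```

In case A the bumped sequence number 2 is accepted directly. In case B sequence 2 is rejected as misordered, the client falls back to 1, and the call succeeds after a bounded number of attempts instead of looping indefinitely.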
+ 
+ Commit 3453d5708b33efe76f40eca1c0ed60923094b971 is a clean cherry-pick
+ to disco.
+ 
+ [Testcase]
+ 
+ This is difficult to reproduce on test systems, and has instead been
+ verified on a production NFS v4.1 system in a customer environment. This
+ server is heavily trafficked and has a large number of different NFS
+ clients connected to it.
+ 
+ I have built a test kernel that contains the above patch, and also
+ patches for Bug 1842037. It is available here:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf241068-test
+ 
+ Note that the above kernel is for bionic HWE, and not explicitly disco.
+ 
+ Discussion about the patch validation can be found at the bottom of Bug
+ 1842037.
+ 
+ On unpatched kernels, expect to see the symptoms described in [Impact].
+ On patched kernels, these symptoms should not occur.
+ 
+ [Regression Potential]
+ 
+ The changes are localised to NFS v4.1 only, and other versions of NFS
+ are not affected. If a regression occurs, users can downgrade NFS
+ versions to v4.0 or v3.x until a fix is made.
+ 
+ The changes only take effect when connections are interrupted, and
+ would not be invoked under typical blue-sky scenarios.
+ 
+ There have been no fixup commits or commits near the requested commit in
+ newer kernels, which suggests that this commit fixes the issue and has
+ been adopted by the community.

** Tags added: disco sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1828978

Title:
  NFSv4.1: Interrupted connections cause high bandwidth RPC ping-pong
  between client and server

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1828978/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
