** Description changed:

- Can you please cherry-pick those 2 important commits from upstream?
+ BugLink: https://bugs.launchpad.net/bugs/2093871
  
- 1. net: introduce helper sendpages_ok()
- (https://github.com/torvalds/linux/commit/23a55f44)
+ [Impact]
  
- Network drivers are using sendpage_ok() to check the first page of an
- iterator in order to disable MSG_SPLICE_PAGES. The iterator can
- represent list of contiguous pages.
+ Currently the nvme-tcp and drbg subsystems try to enable the MSG_SPLICE_PAGES
+ flag on pages to be written, and when MSG_SPLICE_PAGES is set, eventually it
+ calls skb_splice_from_iter(), which then checks all pages with sendpage_ok()
+ to see if all the pages are sendable.
  
- When MSG_SPLICE_PAGES is enabled skb_splice_from_iter() is being used,
- it requires all pages in the iterator to be sendable. Therefore it needs
- to check that each page is sendable.
+ At the moment, both subsystems only check the first page in a potentially
+ contiguous block of pages, if they are sendpage_ok(), and if the first page 
is,
+ then it just assumes all the rest are sendpage_ok() too, and sends the I/O off
+ to eventually be found out by skb_splice_from_iter(). If one or more of the
+ pages in the contiguous block is not sendpage_ok(), then we get a warn 
printed,
+ data transfer is aborted. In the nvme-tcp case, IO then hangs.
  
- The patch introduces a helper sendpages_ok(), it returns true if all the
- contiguous pages are sendable.
+ This patchset introduces sendpages_ok() which iterates over each page in a
+ contiguous block, checks if it is sendpage_ok(), and only returns true if all
+ of them are.
  
- Drivers who want to send contiguous pages with MSG_SPLICE_PAGES may use
- this helper to check whether the page list is OK. If the helper does not
- return true, the driver should remove MSG_SPLICE_PAGES flag.
+ This resolves the whole MSG_SPLICE_PAGES flag situation, since you can now
+ depend on the result of sendpages_ok(), instead of just assuming everything is
+ okay.
  
+ This issue is what caused bug 2075110 [0] to be discovered in the first place,
+ since it was responsible for contigious blocks of pages where the first was
+ sendpage_ok(), but pages further into the block were not.
  
- 2. nvme-tcp: use sendpages_ok() instead of sendpage_ok() 
(https://github.com/torvalds/linux/commit/6af7331a)
+ [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
  
- Currently nvme_tcp_try_send_data() use sendpage_ok() in order to disable
- MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
- may represent contiguous pages.
+ Even with "md/md-bitmap: fix writing non bitmap pages" applied, the issue can
+ still happen, e.g. with merged IO pages, so this fix is still needed to
+ eliminate the issue.
  
- MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
- pages it sends with sendpage_ok().
+ [Fix]
  
- When nvme_tcp_try_send_data() sends an iterator that the first page is
- sendable, but one of the other pages isn't skb_splice_from_iter() warns
- and aborts the data transfer.
+ The fixes landed in mainline 6.12-rc1:
  
- Using the new helper sendpages_ok() in order to disable MSG_SPLICE_PAGES
- solves the issue.
+ commit 23a55f4492fcf868d068da31a2cd30c15f46207d
+ Author: Ofir Gal <ofir....@volumez.com>
+ Date:   Thu Jul 18 11:45:12 2024 +0300
+ Subject: net: introduce helper sendpages_ok()
+ Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23a55f4492fcf868d068da31a2cd30c15f46207d
+     
+ commit 6af7331a70b4888df43ec1d7e1803ae2c43b6981
+ Author: Ofir Gal <ofir....@volumez.com>
+ Date:   Thu Jul 18 11:45:13 2024 +0300
+ Subject: nvme-tcp: use sendpages_ok() instead of sendpage_ok()
+ Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af7331a70b4888df43ec1d7e1803ae2c43b6981
+     
+ commit 7960af373ade3b39e10106ef415e43a1d2aa48c6
+ Author: Ofir Gal <ofir....@volumez.com>
+ Date:  Thu Jul 18 11:45:14 2024 +0300
+ Subject: drbd: use sendpages_ok() instead of sendpage_ok()
+ Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7960af373ade3b39e10106ef415e43a1d2aa48c6
+ 
+ They are needed for noble and oracular.
+ 
+ [Testcase]
+ 
+ This is the same testcase as the original bug 2075110 [0], as the fix is
+ designed to prevent it or similar other bugs from happening again.
+ 
+ [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
+ 
+ Because of this, the fix:
+ 
+ commit ab99a87542f194f28e2364a42afbf9fb48b1c724
+ Author: Ofir Gal <ofir....@volumez.com>
+ Date: Fri Jun 7 10:27:44 2024 +0300
+ Subject: md/md-bitmap: fix writing non bitmap pages
+ Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab99a87542f194f28e2364a42afbf9fb48b1c724
+ 
+ needs to be reverted during your test runs, or you won't see the issue
+ reproduce.
+ 
+ You can use this ppa for updated kernels with the revert to trigger the
+ issue:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-revert
+ 
+ This can be reproduced by running blktests md/001 [1], which the author
+ of the fix created to act as a regression test for this issue.
+ 
+ [1]
+ 
https://github.com/osandov/blktests/commit/a24a7b462816fbad7dc6c175e53fcc764ad0a822
+ 
+ Deploy a fresh Noble VM, that has a scratch NVME disk.
+ 
+ $ sudo apt install build-essential fio
+ $ git clone https://github.com/osandov/blktests.git
+ $ cd blktests
+ $ make
+ $ echo "TEST_DEVS=(/dev/nvme0n1)" > config
+ $ sudo ./check md/001
+ 
+ The md/001 test will hang an affected system, and the above oops message
+ will be visible in dmesg.
+ 
+ A test kernel is available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-test
+ 
+ This has both the fixes for this bug, and also bug 2075110. The issue will not
+ reproduce.
+ 
+ There is also a test kernel available with the fix for this bug present, and 
the
+ fix for bug 2075110 reverted, so you can see the impact of these patches only:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-repro
+ 
+ This will also not reproduce the issue anymore.
+ 
+ [Where problems could occur]
+ 
+ What we are changing is rather simple. Instead of checking the first page and
+ assuming all the rest in the contiguous block are sendpage_ok(), we now
+ check each page in the contiguous block to see if all of them are 
sendpage_ok().
+ 
+ If any aren't, then we abort the write to the driver, and try again later. 
This
+ saves us time.
+ 
+ However, it does take longer to call sendpage_ok() on each of the pages in the
+ contiguous block, so there will be a minor performance hit.
+ 
+ Small performance hit for correctness should be okay.
+ 
+ Currently we are only applying to nvme-tcp and drbg subsystems. If a 
regression
+ were to occur, it would affect users of those subsystems only.
+ 
+ [Other info]
+ 
+ Upstream mailing list:
+ https://lore.kernel.org/all/20240718084515.3833733-1-ofir....@volumez.com/T/#u

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2093871

Title:
  Introduce and use sendpages_ok() instead of sendpage_ok() in nvme-tcp
  and drbg

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  In Progress
Status in linux source package in Oracular:
  In Progress

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/2093871

  [Impact]

  Currently the nvme-tcp and drbg subsystems try to enable the MSG_SPLICE_PAGES
  flag on pages to be written, and when MSG_SPLICE_PAGES is set, eventually it
  calls skb_splice_from_iter(), which then checks all pages with sendpage_ok()
  to see if all the pages are sendable.

  At the moment, both subsystems only check the first page in a potentially
  contiguous block of pages, if they are sendpage_ok(), and if the first page 
is,
  then it just assumes all the rest are sendpage_ok() too, and sends the I/O off
  to eventually be found out by skb_splice_from_iter(). If one or more of the
  pages in the contiguous block is not sendpage_ok(), then we get a warn 
printed,
  data transfer is aborted. In the nvme-tcp case, IO then hangs.

  This patchset introduces sendpages_ok() which iterates over each page in a
  contiguous block, checks if it is sendpage_ok(), and only returns true if all
  of them are.

  This resolves the whole MSG_SPLICE_PAGES flag situation, since you can now
  depend on the result of sendpages_ok(), instead of just assuming everything is
  okay.

  This issue is what caused bug 2075110 [0] to be discovered in the first place,
  since it was responsible for contigious blocks of pages where the first was
  sendpage_ok(), but pages further into the block were not.

  [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110

  Even with "md/md-bitmap: fix writing non bitmap pages" applied, the issue can
  still happen, e.g. with merged IO pages, so this fix is still needed to
  eliminate the issue.

  [Fix]

  The fixes landed in mainline 6.12-rc1:

  commit 23a55f4492fcf868d068da31a2cd30c15f46207d
  Author: Ofir Gal <ofir....@volumez.com>
  Date:   Thu Jul 18 11:45:12 2024 +0300
  Subject: net: introduce helper sendpages_ok()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23a55f4492fcf868d068da31a2cd30c15f46207d
      
  commit 6af7331a70b4888df43ec1d7e1803ae2c43b6981
  Author: Ofir Gal <ofir....@volumez.com>
  Date:   Thu Jul 18 11:45:13 2024 +0300
  Subject: nvme-tcp: use sendpages_ok() instead of sendpage_ok()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af7331a70b4888df43ec1d7e1803ae2c43b6981
      
  commit 7960af373ade3b39e10106ef415e43a1d2aa48c6
  Author: Ofir Gal <ofir....@volumez.com>
  Date:  Thu Jul 18 11:45:14 2024 +0300
  Subject: drbd: use sendpages_ok() instead of sendpage_ok()
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7960af373ade3b39e10106ef415e43a1d2aa48c6

  They are needed for noble and oracular.

  [Testcase]

  This is the same testcase as the original bug 2075110 [0], as the fix is
  designed to prevent it or similar other bugs from happening again.

  [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110

  Because of this, the fix:

  commit ab99a87542f194f28e2364a42afbf9fb48b1c724
  Author: Ofir Gal <ofir....@volumez.com>
  Date: Fri Jun 7 10:27:44 2024 +0300
  Subject: md/md-bitmap: fix writing non bitmap pages
  Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab99a87542f194f28e2364a42afbf9fb48b1c724

  needs to be reverted during your test runs, or you won't see the issue
  reproduce.

  You can use this ppa for updated kernels with the revert to trigger
  the issue:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-revert

  This can be reproduced by running blktests md/001 [1], which the
  author of the fix created to act as a regression test for this issue.

  [1]
  
https://github.com/osandov/blktests/commit/a24a7b462816fbad7dc6c175e53fcc764ad0a822

  Deploy a fresh Noble VM, that has a scratch NVME disk.

  $ sudo apt install build-essential fio
  $ git clone https://github.com/osandov/blktests.git
  $ cd blktests
  $ make
  $ echo "TEST_DEVS=(/dev/nvme0n1)" > config
  $ sudo ./check md/001

  The md/001 test will hang an affected system, and the above oops
  message will be visible in dmesg.

  A test kernel is available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-test

  This has both the fixes for this bug, and also bug 2075110. The issue will not
  reproduce.

  There is also a test kernel available with the fix for this bug present, and 
the
  fix for bug 2075110 reverted, so you can see the impact of these patches only:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-repro

  This will also not reproduce the issue anymore.

  [Where problems could occur]

  What we are changing is rather simple. Instead of checking the first page and
  assuming all the rest in the contiguous block are sendpage_ok(), we now
  check each page in the contiguous block to see if all of them are 
sendpage_ok().

  If any aren't, then we abort the write to the driver, and try again later. 
This
  saves us time.

  However, it does take longer to call sendpage_ok() on each of the pages in the
  contiguous block, so there will be a minor performance hit.

  Small performance hit for correctness should be okay.

  Currently we are only applying to nvme-tcp and drbg subsystems. If a 
regression
  were to occur, it would affect users of those subsystems only.

  [Other info]

  Upstream mailing list:
  https://lore.kernel.org/all/20240718084515.3833733-1-ofir....@volumez.com/T/#u

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2093871/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to