This bug is awaiting verification that the linux-azure/5.15.0-1006.7
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-jammy' to 'verification-done-jammy'. If the
problem still exists, change the tag 'verification-needed-jammy' to
'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1973169

Title:
  [Azure][CVM] Fix swiotlb_max_mapping_size() for potential bounce
  buffer allocation failure in storvsc

Status in linux-azure package in Ubuntu:
  New
Status in linux-azure source package in Jammy:
  In Progress

Bug description:
  SRU Justification

  [Impact]
  Description of problem:

  When the v5.15 linux-azure kernel is used for a CVM on Azure, it uses
  swiotlb for bounce buffering.
  We recently found an issue in swiotlb_max_mapping_size(), which is used
  by the SCSI subsystem APIs that the hv_storvsc driver relies on.

  The issue is: currently swiotlb_max_mapping_size() always reports
  256KB (i.e. 128 bounce buffer slots), but swiotlb_tbl_map_single() is
  unable to allocate a bounce buffer for an unaligned 256KB request, and
  eventually it can get stuck and we see the call trace below (note: this
  call trace was obtained from a SLES VM, but I believe the issue exists
  in all distro kernels supporting CVM, and Tianyu says he is able to
  reproduce it in an Ubuntu CVM when trying to mount an XFS file
  system):

  [ 186.458666][ C1] swiotlb_tbl_map_single+0x396/0x920
  [ 186.458669][ C1] swiotlb_map+0xaa/0x2d0
  [ 186.458674][ C1] dma_direct_map_sg+0xee/0x2c0
  [ 186.458677][ C1] __dma_map_sg_attrs+0x30/0x70
  [ 186.458680][ C1] dma_map_sg_attrs+0xa/0x20
  [ 186.458681][ C1] scsi_dma_map+0x35/0x40
  [ 186.458684][ C1] storvsc_queuecommand+0x20b/0x890
  [ 186.458696][ C1] scsi_queue_rq+0x606/0xb80
  [ 186.458698][ C1] __blk_mq_try_issue_directly+0x149/0x1c0
  [ 186.458702][ C1] blk_mq_try_issue_directly+0x15/0x50
  [ 186.458704][ C1] blk_mq_submit_bio+0x4b6/0x620
  [ 186.458706][ C1] __submit_bio+0xe8/0x160
  [ 186.458708][ C1] submit_bio_noacct_nocheck+0xf0/0x2b0
  [ 186.458713][ C1] submit_bio+0x42/0xd0
  [ 186.458714][ C1] submit_bio_wait+0x54/0xb0
  [ 186.458718][ C1] xfs_rw_bdev+0x180/0x1b0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458769][ C1] xlog_do_io+0x8d/0x140 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458819][ C1] xlog_bread+0x1f/0x40 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458859][ C1] xlog_find_verify_cycle+0xc8/0x180 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458899][ C1] xlog_find_head+0x2ae/0x3a0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458937][ C1] xlog_find_tail+0x44/0x360 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.458978][ C1] xlog_recover+0x2b/0x170 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.459056][ C1] xfs_log_mount+0x15b/0x270 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.459098][ C1] xfs_mountfs+0x49e/0x830 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.459224][ C1] xfs_fs_fill_super+0x5c2/0x7c0 [xfs 172cb9b0bc08b0ee82c7c88dc584daeab1b34d46]
  [ 186.459303][ C1] get_tree_bdev+0x163/0x260
  [ 186.459307][ C1] vfs_get_tree+0x25/0xc0
  [ 186.459309][ C1] path_mount+0x704/0x9c0

  Details: For example, the original physical address from the SCSI
  layer can be 0x1_0903_f200 with size=256KB, and when
  swiotlb_tbl_map_single() calls swiotlb_find_slots(), it passes
  "alloc_size + offset" (i.e. 256KB + 0x200) to swiotlb_find_slots(),
  which calculates "nslots = nr_slots(alloc_size) ==> 129" and fails to
  allocate a bounce buffer, as the maximum allowable number of
  contiguous slabs to map is IO_TLB_SEGSIZE (128).
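
  To make the arithmetic concrete, here is a small standalone
  illustration (not kernel code; it just restates the example above
  using the v5.15 swiotlb constants IO_TLB_SHIFT=11 and
  IO_TLB_SEGSIZE=128):

    #include <stdio.h>

    /* Constants mirroring kernel/dma/swiotlb.c in v5.15 */
    #define IO_TLB_SHIFT   11
    #define IO_TLB_SIZE    (1UL << IO_TLB_SHIFT)   /* 2KB per bounce buffer slot */
    #define IO_TLB_SEGSIZE 128UL                   /* max contiguous slots       */

    int main(void)
    {
            unsigned long alloc_size = 256 * 1024; /* request size from SCSI   */
            unsigned long offset     = 0x200;      /* low bits kept because of
                                                      storvsc's min align mask */
            /* same arithmetic as nr_slots() in swiotlb_find_slots() */
            unsigned long nslots =
                    (alloc_size + offset + IO_TLB_SIZE - 1) / IO_TLB_SIZE;

            /* prints 262144 (256KB) -- what swiotlb_max_mapping_size() promises */
            printf("max mapping size: %lu\n", IO_TLB_SIZE * IO_TLB_SEGSIZE);
            /* prints 129, which exceeds the 128-slot limit, so the request
               can never be satisfied */
            printf("nslots needed:    %lu\n", nslots);
            return 0;
    }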

  The issue affects the hv_storvsc driver, as it calls
  dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
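
  For reference, that 0x200 offset comes from the min align mask: in the
  v5.15 swiotlb code, swiotlb_tbl_map_single() preserves the low bits of
  the original address selected by dma_get_min_align_mask() within an
  IO_TLB_SIZE slot, and adds them on top of the allocation size before
  looking for slots. Roughly (paraphrasing swiotlb_align_offset(), not a
  verbatim copy of the kernel source):

    offset = orig_addr & dma_get_min_align_mask(dev) & (IO_TLB_SIZE - 1);
           /* = 0x1_0903_f200 & 0xfff & 0x7ff = 0x200 for storvsc */
    index  = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
           /* = 256KB + 0x200 -> 129 slots -> always fails        */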

  dma_set_min_align_mask() is also called by hv_netvsc, but netvsc is
  not affected as netvsc never calls swiotlb_tbl_map_single() with a
  near-to-256KB size.

  dma_set_min_align_mask() is also called by the NVMe driver, but since
  PCI device assignment is not currently supported for CVM, NVMe is not
  affected for now.

  Tianyu Lan made a fix which is under review:
  https://lwn.net/ml/linux-kernel/20220510142109.777738-1-ltykernel%40gmail.com/
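
  The general idea of the fix (a sketch of the approach only, not
  necessarily the exact patch under review; see the link above) is to
  make swiotlb_max_mapping_size() account for the device's min align
  mask, so the SCSI layer is never told it can map more than
  swiotlb_find_slots() can actually satisfy:

    /* sketch only, assuming the v5.15 swiotlb constants and helpers */
    size_t swiotlb_max_mapping_size(struct device *dev)
    {
            unsigned int min_align_mask = dma_get_min_align_mask(dev);
            size_t min_align = 0;

            /* reserve room for the worst-case alignment offset that
             * swiotlb_tbl_map_single() will add to the request      */
            if (min_align_mask)
                    min_align = roundup(min_align_mask, IO_TLB_SIZE);

            return (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE - min_align;
    }

  With storvsc's mask of HV_HYP_PAGE_SIZE - 1 (0xfff), this would report
  252KB instead of 256KB, so even an unaligned request still fits within
  the 128 contiguous slots that swiotlb can provide.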

  Note: the linux-azure-cvm v5.4 kernel doesn't need the fix, as that
  kernel uses a vmbus private bounce buffering implementation
  (drivers/hv/hv_bounce.c) rather than swiotlb.

  [Test Case]

  Microsoft tested

  [Where things could go wrong]

  Bounce buffers may fail to allocate.

  [Other Info]

  SF: #00336634

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1973169/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
