Public bug reported:

Hi everyone. Possible io_uring regression with QEMU on Ubuntu's kernel.
With the latest Ubuntu 20.04 HWE kernel, 5.8.0-59, I'm noticing some weirdness when using QEMU/libvirt with the following storage configuration:

  <disk type="block" device="disk">
    <driver name="qemu" type="raw" cache="none" io="io_uring" discard="unmap" detect_zeroes="unmap"/>
    <source dev="/dev/disk/by-id/md-uuid-df271a1e:9dfb7edb:8dc4fbb8:c43e652f-part1" index="1"/>
    <backingStore/>
    <target dev="vda" bus="virtio"/>
    <alias name="virtio-disk0"/>
    <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
  </disk>

QEMU version is 5.2+dfsg-9ubuntu3 and libvirt version is 7.0.0-2ubuntu2.

The guest VM is unable to handle I/O properly with io_uring, and nuking io="io_uring" fixes the issue. On one machine (EPYC 7742) the partition table cannot be read, and on another (Ryzen 9 3950X) ext4 detects journal problems and ultimately remounts the guest disk read-only:

[ 2.712321] virtio_blk virtio5: [vda] 3906519775 512-byte logical blocks (2.00 TB/1.82 TiB)
[ 2.714054] vda: detected capacity change from 0 to 2000138124800
[ 2.963671] blk_update_request: I/O error, dev vda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.964909] Buffer I/O error on dev vda, logical block 0, async page read
[ 2.966021] blk_update_request: I/O error, dev vda, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.967177] Buffer I/O error on dev vda, logical block 1, async page read
[ 2.968330] blk_update_request: I/O error, dev vda, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.969504] Buffer I/O error on dev vda, logical block 2, async page read
[ 2.970767] blk_update_request: I/O error, dev vda, sector 3 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.971624] Buffer I/O error on dev vda, logical block 3, async page read
[ 2.972170] blk_update_request: I/O error, dev vda, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.972728] Buffer I/O error on dev vda, logical block 4, async page read
[ 2.973308] blk_update_request: I/O error, dev vda, sector 5 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.973920] Buffer I/O error on dev vda, logical block 5, async page read
[ 2.974496] blk_update_request: I/O error, dev vda, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.975093] Buffer I/O error on dev vda, logical block 6, async page read
[ 2.975685] blk_update_request: I/O error, dev vda, sector 7 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.976295] Buffer I/O error on dev vda, logical block 7, async page read
[ 2.980074] blk_update_request: I/O error, dev vda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.981104] Buffer I/O error on dev vda, logical block 0, async page read
[ 2.981786] blk_update_request: I/O error, dev vda, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 2.982083] ixgbe 0000:06:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
[ 2.982442] Buffer I/O error on dev vda, logical block 1, async page read
[ 2.983642] ldm_validate_partition_table(): Disk read failed.

Kernel 5.8.0-55 is fine, and the only io_uring-related change between 5.8.0-55 and 5.8.0-59 is commit 4b982bd0f383 ("io_uring: don't mark S_ISBLK async work as unbounded").

I'm hesitant to run lspci -vnvn and post the other bug-reporting logs, as the machine includes some proprietary hardware (quite unrelated to this specific issue), but the issue is reproducible on multiple machines; a rough standalone reproduction command is sketched below.
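For reference, the libvirt configuration above should correspond to roughly the following standalone QEMU command line (a sketch only, not the exact command libvirt generates; the machine/memory/CPU values are placeholders, and aio=io_uring is the QEMU counterpart of libvirt's io="io_uring"):

  # sketch: placeholder machine/memory/CPU values; the drive path is the one from the XML above
  qemu-system-x86_64 \
    -machine q35,accel=kvm -m 4G -smp 4 \
    -drive file=/dev/disk/by-id/md-uuid-df271a1e:9dfb7edb:8dc4fbb8:c43e652f-part1,if=virtio,format=raw,cache=none,aio=io_uring,discard=unmap,detect-zeroes=unmap

Dropping io="io_uring" from the libvirt XML (or, equivalently, not passing aio=io_uring here so QEMU falls back to its default aio mode) makes the guest read the disk normally again.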
The other machine where I was able to reproduce this runs production software that needs to be up 24/7, so I'm hesitant to gather logs there as well.

Thanks,
Regards

Update: It was commit 87c9cfe0fa1fb ("block: don't ignore REQ_NOWAIT for direct IO"), i.e. upstream commit f8b78caf21d5bc3fcfc40c18898f9d52ed1451a5. I've double-checked by resetting the tree to Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1 and reverting that patch alone, which fixes the issue. This patch seems to have been backported to multiple stable trees, so I'm not exactly sure why only Canonical's 5.8 is affected. FWIW, 5.8.0-61 is also affected.
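For completeness, the revert test was roughly the following (the checkout and revert match what's described above; the build commands are just the standard Ubuntu kernel packaging flow, shown as an example rather than my exact steps):

  # in a checkout of the Ubuntu hwe-5.8 kernel source
  git checkout Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1
  git revert --no-edit 87c9cfe0fa1fb   # "block: don't ignore REQ_NOWAIT for direct IO"
  # rebuild and boot the resulting kernel, e.g. via the usual packaging targets
  fakeroot debian/rules clean
  fakeroot debian/rules binary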
** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1935017