[Touch-packages] [Bug 1856871] Re: i/o error if next unused loop device is queried

Jorge Merlino Wed, 29 Mar 2023 13:36:45 -0700

** Description changed:

- This is reproducible in Bionic and later.
  
- Here's an example running 'focal':
+ [Impact]
  
- $ lsb_release -cs
- focal
+ * There's an I/O error on fsync() in a detached loop device if it has
+ been previously attached. The issue is that write cache is enabled in
+ the attach path in loop_configure() but it isn't disabled in the detach
+ path; thus it remains enabled in the block device regardless of whether
+ it is attached or not.
  
- $ uname -r
- 5.3.0-24-generic
+ * fsync() on detached loop devices can be called by partition tools and
+ commands run by sosreport, so the unexpected kernel error message might
+ surprise users or even distract from the actual issue being
+ investigated. It might also trigger alerts in
+ logging/monitoring/alerting stacks.
  
- The error is:
- blk_update_request: I/O error, dev loop2, sector 0
+ [Fix]
  
- and on a more recent kernel:
+ * Disable write cache in the detach path
  
- kernel: [18135.185709] blk_update_request: I/O error, dev loop18, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
+ [Test Plan]
  
+ * Attach and detach an image to a loop device and check the fsync()
+ return value afterwards
  
- How to trigger it:
- $ sosreport -o block
+ # DEV=/dev/loop7
  
- or more precisely the cmd causing the situation inside the block plugin:
- $ parted -s $(losetup -f) unit s print
+ # IMG=/tmp/image
+ # truncate --size 1M $IMG
  
- https://github.com/sosreport/sos/blob/master/sos/plugins/block.py#L52
+ # losetup $DEV $IMG
+ # losetup -d $DEV
  
- but if I run it on the unused loop device after that one, in this case
- /dev/loop3 (which is also unused), there are no errors.
+ Before:
+     # strace -e fsync parted -s $DEV print 2>&1 | grep fsync
+     fsync(3)                                = -1 EIO (Input/output error)
+     Warning: Error fsyncing/closing /dev/loop7: Input/output error
+     [  982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
  
- While I agree that sosreport shouldn't query unused loop devices, there
- is definitely something going on with the next unused loop device.
+ After:
+     # strace -e fsync parted -s $DEV print 2>&1 | grep fsync
+     fsync(3)                                = 0
  
- What differentiates loop2 from loop3 and any other unused ones?
+ [Where problems could occur]
  
- 3 things so far I have noticed:
- * loop2 is the next unused loop device (losetup -f)
- * A reboot is needed (if some loop modification (snap install, mount loop, ...) has been made at runtime)
- * I have also noticed that loop2 (or whatever the next unused one is) has some stats, as opposed to other unused loop devices. The stats already exist right after system boot for the next unused loop device.
- 
- /sys/block/loop2/stat
- ::::::::::::::
- 2 0 10 0 1 0 0 0 0 0 0
- 
- 2  = number of read I/Os processed
- 10 = number of sectors read
- 1  = number of write I/Os processed
- 
- Explanation of each column:
- https://www.kernel.org/doc/html/latest/block/stat.html
- 
- while /dev/loop3 doesn't
- 
- /sys/block/loop3/stat
- ::::::::::::::
- 0 0 0 0 0 0 0 0 0 0 0
- 
- This tells me that something during the boot process most likely
- acquired (on purpose or not) the next unused loop device and possibly
- didn't release it cleanly.
- 
- If loop2 is generating errors, and I install a snap, the snap squashfs
- will take loop2, making loop3 the next unused loop device.
- 
- If I query loop3 with 'parted' right after, no errors.
- 
- If I reboot and query loop3 again, then I get an error.
- 
- To trigger the errors, it needs to be after a reboot, and it only impacts
- the first available unused loop device (losetup -f).
- 
- This was tested with focal/systemd, which is very close to the latest
- upstream code.
- 
- This has been tested with the latest v5.5 mainline kernel as well.
- 
- For now, I don't think it's a kernel problem; I'm thinking more of a
- userspace misbehaviour in handling loop devices (or block devices) at
- boot.
+ * The detach path for block devices is modified. Worst case scenario
+ would be an error when detaching loop devices.
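The previous description, removed in the diff above, relied on the /sys/block/loopN/stat counters to single out the next unused loop device. A minimal sketch of that check, assuming the column order from the kernel block/stat documentation linked there (field 1 read I/Os, field 3 sectors read, field 5 write I/Os). The helper names loop_stat_summary and loops_with_io are hypothetical, and the optional argument lets them run against a copy of the sysfs tree instead of the live /sys:

```shell
# Hypothetical helper: summarise the fields of a block device stat file that
# the old description refers to (read I/Os, sectors read, write I/Os).
loop_stat_summary() {
    local f=${1:?usage: loop_stat_summary /sys/block/loopN/stat}
    # Field 1 = read I/Os, field 3 = sectors read, field 5 = write I/Os.
    awk '{ printf "read_ios=%s read_sectors=%s write_ios=%s\n", $1, $3, $5 }' "$f"
}

# Hypothetical helper: print the name of every loop device whose stat file
# has at least one non-zero counter. SYSFS defaults to /sys.
loops_with_io() {
    local sysfs=${1:-/sys} s
    for s in "$sysfs"/block/loop*/stat; do
        # Skip the literal glob when no loop devices exist.
        [ -r "$s" ] || continue
        awk -v dev="$(basename "$(dirname "$s")")" '
            { for (i = 1; i <= NF; i++) if ($i != 0) { print dev; exit } }
        ' "$s"
    done
}
```

On a live system loops_with_io also lists attached devices (for example snap squashfs mounts), so its output would need to be compared against `losetup -a` to isolate an unused device with non-zero counters.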

** Also affects: parted (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: udev (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: systemd (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: snapd (Ubuntu Focal)
   Importance: Undecided
       Status: New

** No longer affects: udev (Ubuntu Focal)

** No longer affects: systemd (Ubuntu Focal)

** No longer affects: snapd (Ubuntu Focal)

** No longer affects: parted (Ubuntu Focal)

** Changed in: linux (Ubuntu Focal)
       Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
     Assignee: (unassigned) => Jorge Merlino (jorge-merlino)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1856871

Title:
  i/o error if next unused loop device is queried

Status in linux package in Ubuntu:
  Fix Released
Status in parted package in Ubuntu:
  Invalid
Status in snapd package in Ubuntu:
  Invalid
Status in systemd package in Ubuntu:
  Invalid
Status in udev package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  In Progress
Status in linux source package in Jammy:
  Fix Released

Bug description:
  
  [Impact]

  * There's an I/O error on fsync() in a detached loop device if it has
  been previously attached. The issue is that write cache is enabled in
  the attach path in loop_configure() but it isn't disabled in the detach
  path; thus it remains enabled in the block device regardless of whether
  it is attached or not.

  * fsync() on detached loop devices can be called by partition tools and
  commands run by sosreport, so the unexpected kernel error message might
  surprise users or even distract from the actual issue being
  investigated. It might also trigger alerts in
  logging/monitoring/alerting stacks.
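The stale state described in the first bullet should be visible from userspace. A minimal sketch, assuming the kernel exposes the queue's cache mode at /sys/block/<dev>/queue/write_cache, which reads either "write back" or "write through". The helper name loop_write_cache is hypothetical. On an unfixed kernel, a device that was attached and then detached would be expected to still report "write back".

```shell
# Hypothetical helper: print the block-layer write-cache mode of a loop device.
# Accepts "loop7" or "/dev/loop7"; SYSFS defaults to /sys but can point at a
# fixture tree for testing.
loop_write_cache() {
    local dev=${1#/dev/}
    local sysfs=${2:-/sys}
    local f="$sysfs/block/$dev/queue/write_cache"
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    # Expected values: "write back" (cache enabled) or "write through".
    cat "$f"
}
```

For example, running `loop_write_cache /dev/loop7` right after the `losetup -d` step of the test plan shows whether the detach path left the cache enabled.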

  [Fix]

  * Disable write cache in the detach path
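Until a fixed kernel is installed, the same effect can presumably be approximated from userspace by switching a detached device's queue back to write-through, since the block layer then drops the empty flush that fsync() would otherwise send to the unbound device. This is a hedged stop-gap sketch, not part of the fix. It assumes the write_cache attribute is writable by root and that the /sys/block/<dev>/loop directory only exists while a backing file is attached. The helper name loop_disable_write_cache is hypothetical.

```shell
# Hypothetical stop-gap: set a detached loop device's queue to write-through.
# SYSFS defaults to /sys; writing the real attribute requires root.
loop_disable_write_cache() {
    local dev=${1#/dev/}
    local sysfs=${2:-/sys}
    local f="$sysfs/block/$dev/queue/write_cache"
    if [ ! -e "$f" ]; then
        echo "no write_cache attribute for $dev" >&2
        return 1
    fi
    # Assumption: the loop/ directory (with backing_file) exists only while
    # the device is attached, so refuse to touch a bound device.
    if [ -e "$sysfs/block/$dev/loop/backing_file" ]; then
        echo "$dev is still attached; leaving it alone" >&2
        return 1
    fi
    # Intended to mirror what the patched detach path does in the kernel.
    echo "write through" > "$f"
}
```

Attaching the device again re-enables the cache through loop_configure(), so on an unfixed kernel this would need repeating after every detach.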

  [Test Plan]

  * Attach and detach an image to a loop device and check the fsync()
  return value afterwards

  # DEV=/dev/loop7

  # IMG=/tmp/image
  # truncate --size 1M $IMG

  # losetup $DEV $IMG
  # losetup -d $DEV

  Before:
      # strace -e fsync parted -s $DEV print 2>&1 | grep fsync
      fsync(3)                                = -1 EIO (Input/output error)
      Warning: Error fsyncing/closing /dev/loop7: Input/output error
      [  982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0

  After:
      # strace -e fsync parted -s $DEV print 2>&1 | grep fsync
      fsync(3)                                = 0
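The Before/After comparison can be scripted. A minimal sketch that reuses the example DEV and IMG values from the test plan and classifies the strace output. The function names are hypothetical; run_loop_fsync_test needs root, losetup, strace and parted on a disposable system, while classify_fsync is plain text processing.

```shell
# Hypothetical classifier: read strace output on stdin and print PASS when
# every fsync() returned 0, FAIL if any returned something else, and NONE
# if no fsync() call was seen at all.
classify_fsync() {
    awk '
        /^fsync\(/ {
            seen = 1
            # strace prints the return value after the last "= ".
            n = split($0, parts, "= ")
            split(parts[n], rv, " ")
            if (rv[1] != "0") failed = 1
        }
        END {
            if (!seen) print "NONE"
            else if (failed) print "FAIL"
            else print "PASS"
        }
    '
}

# Hypothetical wrapper around the test plan steps above (run as root).
run_loop_fsync_test() {
    local dev=${1:-/dev/loop7} img=${2:-/tmp/image}
    truncate --size 1M "$img" || return 1
    losetup "$dev" "$img" || return 1
    losetup -d "$dev" || return 1
    # strace and parted both write to stderr, so merge it into the pipe.
    strace -e fsync parted -s "$dev" print 2>&1 | classify_fsync
}
```

As root, `run_loop_fsync_test /dev/loop7 /tmp/image` would be expected to print FAIL on an affected kernel and PASS once the fix is in place.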

  [Where problems could occur]

  * The detach path for block devices is modified. Worst case scenario
  would be an error when detaching loop devices.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1856871/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
