The sequence is:

exec growpart
  exec sgdisk --info  # read-only
  exec sgdisk --pretend  # read-only
  exec sgdisk --backup  # read-only copy
  # modification of disk starts
  exec sgdisk --move-second-header \
              --delete=PART \
              --new=PART \
              --typecode --partition-guid --change-name
  # now that sgdisk has *closed* the filehandle on the disk, systemd-udevd will
  # get an inotify signal and trigger udevd to run udev scripts on the disk.
  # this includes the *removal* of symlinks due to the --delete portion of the
  # sgdisk call, and following the removal, the --new will trigger the add
  # run of the rules, which would recreate the symlinks.

  # update kernel partition sizes; this is an ioctl, so it does not trigger any
  # udev events
  exec partx --update
  # the kernel has the new partition sizes, and udev scripts/events are all
  # queued (and possibly in flight)
exit growpart
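Because partx --update has already handed the kernel the new partition sizes at this point, the new size can be read from sysfs without going through any udev-created symlink. A minimal sketch, assuming the kernel name of the partition (e.g. sdb1) is already known; the helper name kernel_partition_size_bytes and its sysfs_root parameter are hypothetical and not part of cloud-init or growpart:

```python
import os


def kernel_partition_size_bytes(kname, sysfs_root="/sys"):
    """Return the kernel's current size for a block device, in bytes.

    Hypothetical helper (not cloud-init API). It reads
    <sysfs_root>/class/block/<kname>/size, which the kernel reports in
    512-byte sectors. No /dev/disk/by-* symlink is involved, so it does
    not depend on udev having finished its work.
    """
    path = os.path.join(sysfs_root, "class", "block", kname, "size")
    with open(path) as f:
        sectors = int(f.read().strip())
    # The sysfs "size" attribute is always in 512-byte units,
    # independent of the disk's logical sector size.
    return sectors * 512
```

The fixed multiplier of 512 is deliberate: sysfs uses 512-byte units for this attribute even on 4K-sector disks.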

cloud-init invokes get_size() operation which:
   # this is where the race occurs if the symlink created by udev is *not* present
   os.open(/dev/disk/by-id/fancy-symlink-with-partuuid-points-to-sdb1)
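For reference, a sketch of what such a get_size() could look like. The os.open(filename, os.O_RDONLY) call matches the traceback in the bug description; measuring the size by seeking to the end of the descriptor is an assumption made for illustration:

```python
import os


def get_size(filename):
    """Return the size in bytes of a file or block device.

    Sketch only: assumes the size is obtained by seeking to the end of
    the opened descriptor. os.open() follows symlinks, so if the
    udev-created /dev/disk/by-* link does not exist at this moment it
    raises FileNotFoundError, which is the failure seen in this bug.
    """
    fd = os.open(filename, os.O_RDONLY)
    try:
        # lseek to the end returns the new offset, i.e. the size.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```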
  
Dan had put a udevadm settle in this spot, like so:

def get_size(filename):
    util.subp(['udevadm', 'settle'])
    os.open(....)
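A slightly more targeted variant of the same idea, sketched under the assumption that the udevadm on the image supports the --timeout and --exit-if-exists options of udevadm settle. get_size_after_settle and its run parameter are hypothetical names used only for illustration:

```python
import os
import subprocess


def get_size_after_settle(filename, timeout=30, run=subprocess.run):
    """Wait on the udev queue, then measure the device.

    Hypothetical variant of the settle-then-open approach (not
    cloud-init API). --exit-if-exists lets settle return as soon as
    the symlink exists; --timeout bounds the wait. `run` is injectable
    so the external command can be stubbed out.
    """
    # check=False: a settle timeout is not treated as fatal here; if the
    # link is still missing, os.open() below raises FileNotFoundError.
    run(["udevadm", "settle", "--timeout=%d" % timeout,
         "--exit-if-exists=%s" % filename], check=False)
    fd = os.open(filename, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

As I understand it, settle only waits for events udevd already knows about; if the relevant uevent has not been queued yet, settle returns right away and the open can still fail.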

So, you're suggesting that somehow _not all_ of the uevents triggered by
the sgdisk command in growpart would have been queued before we
call udevadm settle?

If some other events are happening, how is cloud-init to know, so that
it can take action to "handle this race" more robustly?
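One option on the cloud-init side is to tolerate a delay rather than try to detect the race: poll for the path with a bounded deadline. A minimal sketch; wait_for_path and its timeout/interval defaults are hypothetical, not existing cloud-init API:

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper. Returns True if the path appeared and False
    otherwise, leaving it to the caller to decide whether a missing
    symlink is fatal.
    """
    deadline = time.monotonic() + timeout
    while True:
        # os.path.exists() follows symlinks, so a dangling link counts
        # as "not there yet".
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

This only absorbs a slow udev; it cannot tell that apart from a symlink that will never be created, so a False return still has to be handled as a failure.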

Lastly, if there is a *race* in symlink creation/removal, or a delay in
uevent propagation, why is that a userspace issue, let alone a cloud-init
issue? This isn't universally reproducible; rather, it shows up under
fairly narrow circumstances with certain kernel and udev combinations,
while the growpart/cloud-init code remains the same.




** Changed in: cloud-init
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in systemd package in Ubuntu:
  New

Bug description:
  On Azure, cloud-init's growpart module regularly (20-30% of the time)
  fails to extend the partition to full size.

  For example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to happen only on
  Cosmic and later.
