A couple of comments on the suggested path: > Imho the sequency of commands should be: > * take flock on the device, to neutralise udev
+1 on this approach. Do you know if the flock will block systemd's inotify write watch on the block device which triggers udevd? This is the typical race we see with partition creation and rules executing. > * modify device with sfdisk > * reread partitions tables (i would say with blockdev --rereadpt, rather than > partx/partprobe) I'm not sure we can use blockdev --rereadpt as we are operating upon the root disk we're booted on and my understanding is that the ioctl that partx uses is the only way to update the kernel partition table while the disk is in use, otherwise we'd see the normal warning message like when you fdisk your booted device and it says the disk is busy and cannot read the partition table. > * release the flock +1 > * udevadm trigger --action=add --wait device (or trigger && settle) I don't relish the idea of *re-adding* actions on the disk again since the partx update should have already emitted the uevents associated with the new partitions. However, we could do this as a way to force reloading of everything. I'd like to withhold judgement on whether we need this after testing with use of flock on the device. > And like have a canary "only use locked codepath on this region" such that we can assert through testing that this no longer happens with new code, but does with old code. The change to cloud-utils growpart could add a flag (--use-flock) so cloud-init could emit different log messages on which path it takes (including a warning if we cannot use flock (ie, you may race). -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu. https://bugs.launchpad.net/bugs/1834875 Title: cloud-init growpart race with udev Status in cloud-init: Incomplete Status in cloud-utils: New Status in linux-azure package in Ubuntu: New Status in systemd package in Ubuntu: Incomplete Bug description: On Azure, it happens regularly (20-30%), that cloud-init's growpart module fails to extend the partition to full size. Such as in this example: ======================================== 2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True) 2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True) 2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds 2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always 2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed 2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed Traceback (most recent call last): File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules freq=freq) File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run return self._runners.run(name, functor, args, freq, clear_on_fail) File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run results = functor(*args) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle func=resize_devices, args=(resizer, devices)) File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time ret = func(*args, **kwargs) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices (old, new) = resizer.resize(disk, ptnum, blockdev) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize return (before, get_size(partdev)) File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size fd = os.open(filename, os.O_RDONLY) FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3' ======================================== @rcj suggested this is a race with udev. This seems to only happen on Cosmic and later. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp