On Mon, Aug 26, 2019 at 4:05 AM Tobias Koch <1834...@bugs.launchpad.net> wrote:
> > (Odds are that whatever causes it to be recreated later in boot would be
> > blocked by cloud-init waiting.)
>
> But that's not happening. The instance does boot normally, the only
> service degraded is cloud-init and there is no significant delay either.
>
> So conversely, if I put a loop into cloud-init and just waited on the
> symlink to appear and if that worked with minimal delay, would that
> refute the above?
>

That's still a workaround for a race whose cause we don't know, and we also
don't know why it isn't more widespread. The code in cloud-init, growpart,
sgdisk and partx is stable (it has not changed significantly in some time).
We don't have a root cause for the race at this time.

When cloud-init invokes growpart, the symlink exists; when growpart returns,
sometimes it does not. If anything, growpart should address the race itself,
and at this point it would have to pick up a workaround as well. Let's at
least make sure we understand the actual race before we look further into
workarounds.

From what I can see of what growpart is doing, the sgdisk command will clear
the partition table (this involves removing the partition and then re-adding
it), which triggers udev. Further, Dan has shown that partx --update can also
trigger a remove and an add. Looking at the partx update code, *sometimes* it
will remove and add; however, if the partition to be updated *exists*, it
will instead issue an update (resize) ioctl, which only updates the size
value in sysfs.
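Since the open question is what the failing path actually does, one way to gather that information is to record, immediately before and after growpart runs, whether the by-partuuid symlink resolves and what size sysfs reports for the partition. A grown size with the link still present would be consistent with the resize-ioctl path described above; a missing link would suggest a remove/add sequence whose re-add has not (yet) recreated the symlink. This is only a minimal sketch: `snapshot` and `trace_resize` are hypothetical helpers, not part of cloud-init or growpart, and it assumes the partition size is exposed in 512-byte sectors at /sys/class/block/<kname>/size.

```python
import os
import subprocess

# Assumption: the kernel reports each block device's size in 512-byte
# sectors at <sysfs_root>/<kname>/size. The root is a parameter so the
# sketch can also be run against a fake directory tree.
SYSFS_BLOCK = "/sys/class/block"


def snapshot(symlink, kname, sysfs_root=SYSFS_BLOCK):
    """Hypothetical helper: record whether `symlink` resolves and the sysfs size."""
    size_path = os.path.join(sysfs_root, kname, "size")
    try:
        with open(size_path) as f:
            sectors = int(f.read().strip())
    except FileNotFoundError:
        # No sysfs entry, e.g. the partition was removed and not yet re-added.
        sectors = None
    return {
        # os.path.exists() follows symlinks, so a dangling link counts as missing.
        "symlink_exists": os.path.exists(symlink),
        "size_sectors": sectors,
    }


def trace_resize(cmd, symlink, kname, sysfs_root=SYSFS_BLOCK):
    """Hypothetical helper: snapshot state, run `cmd`, snapshot again."""
    before = snapshot(symlink, kname, sysfs_root)
    # The exit status is returned rather than raised, so the second
    # snapshot is still taken when the command fails.
    result = subprocess.run(cmd)
    after = snapshot(symlink, kname, sysfs_root)
    return before, result.returncode, after
```

For the instance in this bug, the call would look like `trace_resize(["growpart", "/dev/sda", "1"], "/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3", "sda1")`. Because it takes a single snapshot right after the command exits, it cannot tell whether udev would have recreated the link moments later; it can only help narrow down which path was taken.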
The partx update logic is here:
https://github.com/karelzak/util-linux/blob/53ae7d60cfeacd4e87bfe6fcc015b58b78ef4555/disk-utils/partx.c#L451

This makes me think that, in the successful path, partx --update takes the
partx_resize_partition path, which submits the resize ioctl
(BLKPG_RESIZE_PARTITION):
https://github.com/karelzak/util-linux/blob/917f53cf13c36d32c175f80f2074576595830573/include/partx.h#L54

In the Linux kernel, that ioctl is handled here:
https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L100

and it just updates the size value in sysfs:
https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L146

which, AFAICT, does not emit any new uevents.

Lastly, in either path (partx update vs. partx remove/add), invoking udevadm
settle after the binary has exited is the reasonable way to ensure that *if*
any uevents were created, they have been processed. growpart could add
udevadm settle code; so could cloud-init. We actually did that in our first
test package, and it did not ensure that the symlink was present.

All of this suggests to me that *something* isn't processing the sequence of
uevents in such a way that, once they've all been processed, the symlink
exists. We must be missing some other bit of information about the failing
path, where the symlink is eventually recreated (possibly due to some other
write or close on the disk which re-triggers the rules).

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in cloud-utils:
  New
Status in systemd package in Ubuntu:
  New

Bug description:
  On Azure, it happens regularly (20-30%), that cloud-init's growpart
  module fails to extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions