Public bug reported:

The rescue and unrescue operations in Nova assume that calls to `connect_volume` in os-brick are idempotent. This is currently true, but it was never something we guaranteed in os-brick.
With the recent CVE [1][2] we realized that in `connect_volume` os-brick cannot assume that, if one or more devices are present for the provided connection information, they belong to the right volume. Even if it is the right volume, os-brick cannot assume that sysfs holds the right information (such as the volume size). It therefore needs to clean things up to the best of its ability before actually connecting and, just in case, confirm right before returning a path to the caller that the device it is going to return is correct and consistent (that is, the multipath only has devices with the same size and SCSI ID).

This means that os-brick's `connect_volume` will no longer be idempotent, by design, once patch [3] merges; the change prevents data leaks in any corner cases.

This will break Nova's rescue and unrescue operations: the rescue call stashes the original XML [4] and unrescue unstashes it [5], but in between Nova calls `connect_volume` for the rescue instance, effectively disconnecting the original device path. Reusing that original path therefore points either to a non-existent device or to a volume of another instance.
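The failure mode described above can be illustrated with a toy model. This is only a sketch: `FakeConnector`, its `exists` helper, the volume id, and the device names are all hypothetical stand-ins, and nothing here calls the real os-brick or Nova code. It assumes a `connect_volume` that removes any existing device for the volume before creating a new one, which is the non-idempotent behaviour the report describes.

```python
class FakeConnector:
    """Hypothetical stand-in for a connector; not the os-brick API."""

    def __init__(self):
        self._next = 0
        self.devices = {}  # volume id -> device path currently present

    def connect_volume(self, volume_id):
        # Non-idempotent by design: drop any device already present for
        # this volume, then hand out a freshly created device path.
        self.devices.pop(volume_id, None)
        path = "/dev/sd" + "bcdefghij"[self._next]
        self._next += 1
        self.devices[volume_id] = path
        return {"path": path}

    def exists(self, path):
        # True if some attached volume currently uses this device path.
        return path in self.devices.values()


connector = FakeConnector()

# Original attach: this path ends up in the XML that rescue stashes.
stashed = connector.connect_volume("vol-1")["path"]

# Rescue connects the same volume again for the rescue instance.
rescue = connector.connect_volume("vol-1")["path"]

# Unrescue restores the stashed XML, which still names the old path.
print(stashed, connector.exists(stashed))
print(rescue, connector.exists(rescue))
```

In this model the stashed path no longer exists after the second call, while the path returned by the most recent call does. The model does not cover the other case mentioned above, where the old path has since been reused by a volume of another instance.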
We can see an example of the non-existent device case in the failed CI job [6], where test `tempest.api.compute.servers.test_server_rescue.ServerStableDeviceRescueTest.test_stable_device_rescue_disk_virtio_with_volume_attached` fails with a nova-compute error [7]:

  libvirt.libvirtError: Cannot access storage file '/dev/sdd': No such file or directory

[1]: https://nvd.nist.gov/vuln/detail/CVE-2023-2088
[2]: https://bugs.launchpad.net/nova/+bug/2004555
[3]: https://review.opendev.org/c/openstack/os-brick/+/882841
[4]: https://github.com/openstack/nova/blob/71b105a4cfea054827e09b5b8df6be845909275a/nova/virt/libvirt/driver.py#L4229-L4232
[5]: https://github.com/openstack/nova/blob/71b105a4cfea054827e09b5b8df6be845909275a/nova/virt/libvirt/driver.py#L4323-L4328
[6]: https://a30336fa6a8fca5c6dba-fe779e5654b21fdff79727b204dfb7d6.ssl.cf1.rackcdn.com/882841/3/check/os-brick-src-tempest-lvm-lio-barbican/8ef7adf/testr_results.html
[7]: https://zuul.opendev.org/t/openstack/build/8ef7adf6a82248d8b9f94eb5b5bba73c/log/controller/logs/screen-n-cpu.txt?severity=4#77239

** Affects: nova
     Importance: High
         Status: Triaged

** Tags: cinder libvirt rescue volumes

https://bugs.launchpad.net/bugs/2020699

Title:
  Nova's rescue and unrescue assumes os-brick connect_volume is idempotent

Status in OpenStack Compute (nova):
  Triaged