[Yahoo-eng-team] [Bug 2095173] [NEW] Ephemeral volumes can be corrupted during VM migration

Doug Szumski Fri, 17 Jan 2025 06:31:49 -0800

Public bug reported:

Steps to reproduce:


1. Configure XFS as the default ephemeral volume filesystem. Use a recent
   distro that defaults to the XFS v5 on-disk format.
2. Create a VM with an ephemeral volume.
3. Log into the VM and check that the ephemeral volume is mounted.
4. Cold migrate the VM to a new host.
5. Log into the VM and check the kernel logs, which will show that
   corruption has been detected on the volume.

Why does this happen?

During step 2, Nova creates a backing file and creates an XFS v5
filesystem on it with a unique UUID. A new qcow2 image is created using
this backing file and is passed to the VM.

When the VM is migrated, Nova copies the top level qcow2 image to the
destination hypervisor. It then *recreates* the backing file, which
causes the UUID to change.

From inside the VM, XFS detects the metadata corruption and refuses to
mount the volume.
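
The mechanism above can be illustrated with a toy model. This is only a
sketch under stated assumptions, not Nova or qemu code: the names
make_backing_file, Overlay and NUM_BLOCKS are made up for illustration,
and it assumes (as I understand XFS v5) that metadata blocks carry a copy
of the filesystem UUID. It makes no claim about which blocks end up in the
qcow2 image on a real system.

```python
import uuid

NUM_BLOCKS = 8  # toy image size, in blocks (hypothetical value)


def make_backing_file():
    """Model a freshly formatted backing file: every metadata block is
    stamped with the UUID generated when the filesystem was created."""
    fs_uuid = uuid.uuid4()
    return {block: fs_uuid for block in range(NUM_BLOCKS)}


class Overlay:
    """Toy copy-on-write layer: stores only the blocks the guest wrote."""

    def __init__(self, backing):
        self.backing = backing
        self.written = {}

    def read(self, block):
        # Blocks the guest never wrote fall through to the backing file.
        return self.written.get(block, self.backing[block])

    def rewrite(self, block):
        # The guest updates a block; the block keeps the UUID stamp that
        # was visible at the time of the write.
        self.written[block] = self.read(block)

    def visible_uuids(self):
        # All UUID stamps the guest can see through the overlay.
        return {self.read(block) for block in range(NUM_BLOCKS)}


# Source host: one backing file, one overlay, the guest modifies two blocks.
overlay = Overlay(make_backing_file())
overlay.rewrite(2)
overlay.rewrite(5)
uuids_before = overlay.visible_uuids()

# Migration as described: the overlay is kept, the backing file is recreated.
overlay.backing = make_backing_file()
uuids_after = overlay.visible_uuids()

print(len(uuids_before), len(uuids_after))  # prints: 1 2
```

After the backing file is recreated, the guest sees blocks stamped with two
different UUIDs, which is the kind of inconsistency a filesystem that
checks these stamps would report as corruption.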

The same issue happens during live migration, but you have to reboot or
remount the volume to see the corruption.

As best as I can tell, this affects all supported releases. Not all file
systems detect the corruption. For example, if you force the use of the
XFS v4 format, the corruption isn't detected and everything appears fine.
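
To compare the backing file on the source and destination hosts, something
like the following could be used. It is a rough sketch, not a tested tool:
the field offsets (sb_uuid at byte 32, sb_versionnum at byte 100, version
in the low 4 bits) are taken from my reading of the XFS on-disk format,
and it assumes the backing file is a raw image with the filesystem
starting at offset 0. The function names are hypothetical.

```python
import struct
import uuid

XFS_MAGIC = b"XFSB"
SB_UUID_OFFSET = 32          # sb_uuid, 16 bytes (assumed offset)
SB_VERSIONNUM_OFFSET = 100   # sb_versionnum, big-endian uint16 (assumed offset)
SB_VERSION_MASK = 0x000F     # low bits hold the format version (4 or 5)


def read_xfs_superblock(data):
    """Return (format_version, filesystem_uuid) from the first bytes of
    an XFS filesystem."""
    if len(data) < SB_VERSIONNUM_OFFSET + 2:
        raise ValueError("need at least 102 bytes of the superblock")
    if data[:4] != XFS_MAGIC:
        raise ValueError("not an XFS superblock")
    fs_uuid = uuid.UUID(bytes=bytes(data[SB_UUID_OFFSET:SB_UUID_OFFSET + 16]))
    (versionnum,) = struct.unpack_from(">H", data, SB_VERSIONNUM_OFFSET)
    return versionnum & SB_VERSION_MASK, fs_uuid


def read_from_raw_image(path):
    """Read the superblock fields from a raw image file at the given path."""
    with open(path, "rb") as image:
        return read_xfs_superblock(image.read(512))
```

Running read_from_raw_image against the backing file before and after the
migration should return different UUIDs if the file was recreated, and the
version field shows whether the filesystem uses the v4 or v5 format.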

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2095173

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2095173/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
