On 3/26/2025 5:34 PM, Michael Roth wrote:
On Wed, Mar 26, 2025 at 05:13:50PM -0300, Fabiano Rosas wrote:
Michael Roth <michael.r...@amd.com> writes:
Quoting Tom Lendacky (2025-03-26 14:21:31)
On 3/26/25 13:46, Tom Lendacky wrote:
On 3/7/25 12:15, Fabiano Rosas wrote:
From: Steve Sistare <steven.sist...@oracle.com>
Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks
in the migration stream file and recreate them later, because the physical
memory for the blocks is pinned and registered for vfio. Add a blocker
for volatile ram blocks.
Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be
sufficient for CPR, but it has not been tested yet.
Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
Reviewed-by: Fabiano Rosas <faro...@suse.de>
Reviewed-by: Peter Xu <pet...@redhat.com>
Reviewed-by: David Hildenbrand <da...@redhat.com>
Message-ID: <1740667681-257312-1-git-send-email-steven.sist...@oracle.com>
Signed-off-by: Fabiano Rosas <faro...@suse.de>
---
include/exec/memory.h | 3 ++
include/exec/ramblock.h | 1 +
migration/savevm.c | 2 ++
system/physmem.c | 66 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 72 insertions(+)
This patch breaks booting an SNP guest as it triggers the following
assert:
qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion `*errp == NULL'
failed.
Usually this means the error has already been set previously, which is
not allowed.
I tracked it to the err_setg() call in ram_block_add_cpr_blocker().
It looks like the error message is unable to be printed because
rb->cpr_blocker is NULL.
Adding aux-ram-share=on to the -machine object gets me past the error and
therefore the assertion, but isn't that an incompatible change to how an
SNP guest has to be started?
If I update the err_setg() call to use the errp parameter that is passed
into ram_block_add_cpr_blocker(), I get the following message and then
the guest launch terminates:
The usage at ram_block_add_cpr_blocker() is correct, the cpr_blocker
gets initialized and registered as a migration blocker. The errp only
becomes relevant later when migration starts and the error condition is
met.
qemu-system-x86_64: Memory region pc.bios is not compatible with CPR.
share=on is required for memory-backend objects, and aux-ram-share=on is
required.
Since errp is an &error_fatal, it causes QEMU to exit, so this^ error
message is bogus.
The qemu parameters I used prior to this patch that allowed an SNP guest
to launch were:
-machine type=q35,confidential-guest-support=sev0,memory-backend=ram1
-object memory-backend-memfd,id=ram1,size=16G,share=true,prealloc=false
With these parameters after this patch, the launch fails.
I think it might be failing because the caller of
ram_block_add_cpr_blocker() is passing in &error_abort, but if the
error_setg() is call on a properly initialized cpr_blocker value then
SNP is still able to boot for me.
I'm not sure where the best spot is
to initialize cpr_blocker, it probably needs to be done before either
ram_block_add_cpr_blocker() or ram_block_del_cpr_blocker() are callable,
but the following avoids the reported crash at least:
diff --git a/system/physmem.c b/system/physmem.c
index 44dd129662..bff0fdcaac 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -4176,6 +4176,7 @@ void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
return;
}
+ rb->cpr_blocker = NULL;
Could it be the cpr_blocker already got set at ram_block_add() in the
RAM_GUEST_MEMFD path?
That seems to be the case: in some cases ram_block_add() sets cpr_blocker
when (new_block->flags & RAM_GUEST_MEMFD) is true, and then soon after
when ram_block_add_cpr_blocker() is called on the same RAMBlock:
2025-03-26T21:08:15.092427Z qemu-system-x86_64: warning: ram_block_add:
new_block 0x55c247e4c880 new_block->cpr_blocker (nil) name ram1
2025-03-26T21:08:15.124710Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c2480fde00 new_block->cpr_blocker (nil) name pc.bios
2025-03-26T21:08:15.126190Z qemu-system-x86_64: warning:
ram_block_add_cpr_blocker: rb 0x55c2480fde00 rb->cpr_blocker 0x55c2480fe050
name pc.bios
2025-03-26T21:08:15.138582Z qemu-system-x86_64: warning: ram_block_add:
new_block 0x55c247e3c1e0 new_block->cpr_blocker (nil) name pc.rom
2025-03-26T21:08:15.138938Z qemu-system-x86_64: warning:
ram_block_add_cpr_blocker: rb 0x55c247e3c1e0 rb->cpr_blocker 0x55c247e3c890
name pc.rom
2025-03-26T21:08:16.185577Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c248db9200 rb->cpr_blocker (nil) name /rom@etc/acpi/tables
2025-03-26T21:08:16.187140Z qemu-system-x86_64: warning:
ram_block_add_cpr_blocker: rb 0x55c248085620 rb->cpr_blocker (nil) name
/rom@etc/table-loader
2025-03-26T21:08:16.188029Z qemu-system-x86_64: warning:
ram_block_add_cpr_blocker: rb 0x55c2480ce220 rb->cpr_blocker (nil) name
/rom@etc/acpi/rsd
Thanks everyone for debugging this. To summarize, ram_block_add_cpr_blocker
already blocks
guest_memfd, because rb->fd < 0. The fix is to delete this redundant code in
ram_block_add:
error_setg(&new_block->cpr_blocker,
"Memory region %s uses guest_memfd, "
"which is not supported with CPR.",
memory_region_name(new_block->mr));
migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
-1);
I will submit a fix (unless Tom or Michael would prefer to author it).
- Steve