On 01.12.2020 10:08, Peter Krempa wrote:
On Thu, Nov 26, 2020 at 18:17:28 +0300, Andrey Gruzdev via wrote:
This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas, which he
implemented in his series '[PATCH v0 0/4] migration: add background snapshot'.
Hi,
I gave this a try when attempting to implement the libvirt code for this
feature. I ran into a problem of the migration failing right away. The
VM's CPUs were running at that point.
QEMU logged the following to stdout/err:
2020-12-01T06:50:42.334062Z qemu-system-x86_64: uffd_register_memory() failed: start=7f2007fff000 len=33554432000 mode=2 errno=22
2020-12-01T06:50:42.334072Z qemu-system-x86_64: ram_write_tracking_start() failed: restoring initial memory state
2020-12-01T06:50:42.334074Z qemu-system-x86_64: uffd_protect_memory() failed: start=7f2007fff000 len=33554432000 mode=0 errno=2
2020-12-01T06:50:42.334076Z qemu-system-x86_64: uffd_unregister_memory() failed: start=7f2007fff000 len=33554432000 errno=22
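For reference, errno 22 here is EINVAL, i.e. the UFFDIO_REGISTER ioctl rejected
the range outright. On kernels of this era, userfaultfd write-protect (mode=2,
UFFDIO_REGISTER_MODE_WP) is only supported for anonymous private mappings, which
would explain the failure if the guest RAM is backed by hugetlbfs or shmem. The
failing call can be reproduced outside of QEMU with a minimal sketch along these
lines (an illustration against linux/userfaultfd.h, not QEMU's actual
uffd_register_memory() code):

/*
 * Minimal standalone reproducer for the uffd_register_memory() failure
 * above. Hypothetical illustration only, not QEMU's actual code.
 * Requires linux/userfaultfd.h from a 5.7+ kernel for UFFD-WP.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

int main(void)
{
    /* userfaultfd has no glibc wrapper; invoke the syscall directly */
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) {
        perror("userfaultfd");
        return 1;
    }

    /* Handshake: ask the kernel for write-protect fault support */
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        perror("UFFDIO_API");   /* kernel too old or feature missing */
        return 1;
    }

    /* Stand-in for a guest RAM block; swap in a hugetlbfs or shmem
     * mapping here to test other memory backends */
    size_t len = 32 * 1024 * 1024;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* mode=2 in the log above is UFFDIO_REGISTER_MODE_WP */
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)mem, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
        perror("UFFDIO_REGISTER");  /* EINVAL matches errno=22 above */
        return 1;
    }

    printf("write-protect registration succeeded\n");
    close(uffd);
    return 0;
}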
The migration was started by the following QMP conversation:
QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"background-snapshot","state":true}]},"id":"libvirt-14"}
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": "libvirt-14"}
QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 buf={"execute":"migrate-set-parameters","arguments":{"max-bandwidth":9223372036853727232},"id":"libvirt-15"}
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": "libvirt-15"}
QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-16"}
QEMU_MONITOR_IO_SEND_FD: mon=0x7fff9c20c610 fd=44 ret=72 errno=0
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": "libvirt-16"}
QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-17"}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 1606805733, "microseconds": 962424}, "event": "MIGRATION", "data": {"status": "setup"}}
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {}, "id": "libvirt-17"}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 1606805733, "microseconds": 966306}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 1606805733, "microseconds": 966355}, "event": "MIGRATION", "data": {"status": "active"}}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 1606805733, "microseconds": 966488}, "event": "STOP"}
QEMU_MONITOR_RECV_EVENT: mon=0x7fff9c20c610 event={"timestamp": {"seconds": 1606805733, "microseconds": 970326}, "event": "MIGRATION", "data": {"status": "failed"}}
QEMU_MONITOR_IO_WRITE: mon=0x7fff9c20c610 buf={"execute":"query-migrate","id":"libvirt-18"}
QEMU_MONITOR_RECV_REPLY: mon=0x7fff9c20c610 reply={"return": {"status": "failed"}, "id": "libvirt-18"}
qemuMigrationJobCheckStatus:1685 : operation failed: snapshot job: unexpectedly failed
$ uname -r
5.8.18-300.fc33.x86_64
The QMP conversation above was created by libvirt with the following patchset applied:
https://gitlab.com/pipo.sk/libvirt/-/commits/background-snapshot
git fetch https://gitlab.com/pipo.sk/libvirt.git background-snapshot
Start the snapshot via:
virsh snapshot-create-as --memspec /tmp/snap.mem --diskspec sdb,snapshot=no --diskspec sda,snapshot=no --no-metadata upstream
Note you can omit --diskspec if you have a diskless VM.
The patches are VERY much a work in progress, as I still need to figure out
the proper sequencing to ensure a consistent snapshot.
Note that in cases where qemu can't guarantee that the
background-snapshot feature will work, it should not advertise it. We
need a way to check whether it's possible to use, so we can replace
the existing --live flag with it rather than adding a new one and
shifting the problem of checking whether the feature works onto the user.
Hi,
Maybe you are using hugetlbfs as the memory backend?
I totally agree that we need some way to check that the kernel and the VM's
memory backend support the feature before the capability can be enabled.
I need to think about that...
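As a rough sketch of such a check (the helper name uffd_check_wp_support is
made up here, not existing QEMU code): open a userfaultfd, do the UFFDIO_API
handshake requesting UFFD_FEATURE_PAGEFAULT_FLAG_WP, and only advertise the
capability if the kernel acks it. A trial UFFDIO_REGISTER on each RAMBlock
would additionally catch incompatible memory backends such as hugetlbfs.

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical probe (not existing QEMU code): returns true only if the
 * running kernel acks userfaultfd write-protect support. Checking the
 * VM's memory backend would still need a per-RAMBlock trial
 * UFFDIO_REGISTER on top of this.
 */
static bool uffd_check_wp_support(void)
{
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (uffd < 0) {
        return false;   /* no userfaultfd support at all */
    }

    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    bool ok = ioctl(uffd, UFFDIO_API, &api) == 0 &&
              (api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP);

    close(uffd);
    return ok;
}

int main(void)
{
    printf("uffd write-protect: %s\n",
           uffd_check_wp_support() ? "supported" : "not supported");
    return 0;
}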
Thanks,
--
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH +7-903-247-6397
virtuozzo.com