On 220722 2210, Claudio Fontana wrote:
> Hi Richard,
>
> On 7/22/22 21:20, Richard Liu wrote:
> > This RFC adds a virtual device for snapshot/restores within QEMU. I am
> > working on this as a part of QEMU Google Summer of Code 2022. Fast
> > snapshot/restores within QEMU is helpful for code fuzzing.
> >
> > I reused the migration code for saving and restoring virtual device and CPU
> > state. As for the RAM, I am using a simple COW mmaped file to do restores.
> >
> > The loadvm migration function I used for doing restores only worked after I
> > called it from a qemu_bh. I'm not sure if I should run the migration code
> > in a separate thread (see patch 3), since currently it is running as a part
> > of the device code in the vCPU thread.
> >
> > This is a rough first revision and feedback on the cpu and device state
> > restores is appreciated.
>
> As I understand it, usually the save and restore of VM state in QEMU can best
> be managed by libvirt APIs, and for example using the libvirt command line
> tool virsh:
>
> $ virsh save (or managedsave)
>
> $ virsh restore (or start)
>
> These commands start a QEMU migration using the QMP protocol to a file
> descriptor, previously opened by libvirt to contain the state file.
>
> (getfd QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-2811
>
> (migrate QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1947
>
> This is unfortunately currently very slow.
>
> Maybe you could help thinking out or with the implementation of the solution?
> I tried to push this approach that only involves libvirt, using the existing
> QEMU multifd migration to a socket:
>
> https://listman.redhat.com/archives/libvir-list/2022-June/232252.html
>
> performance is very good compared with what is possible today, but it won't
> be upstreamable because it is not deemed optimal, and libvirt wants the code
> to be in QEMU.
>
> What about helping in thinking out how the QEMU-based solution could look
> like?
>
> The requirements for now in my view seem to be:
>
> * avoiding the kernel file page trashing for large transfers
>   which currently requires in my view changing QEMU to be able to migrate a
>   stream to an fd that is open with O_DIRECT.
>   In practice this means somehow making all QEMU migration stream writes
>   block-friendly (adding some buffering?).
>
> * allow concurrent parallel transfers
>   to be able to use extra cpu resources to speed up the transfer if such
>   resources are available.
>
> * we should be able to transfer multiple GB/s with modern nvmes for super
>   fast VM state save and restore (few seconds even for a 30GB VM),
>   and we should do no worse than the prototype fully implemented in libvirt,
>   otherwise it would not make sense to implement it in QEMU.
>
> What do you think?

Hi Claudio,

These changes aim to restore a VM hundreds to thousands of times per second
within the same process. Do you think that is achievable with the design of
QMP migrate? We want to avoid serializing/transferring all of memory over the
FD, so for now this series only uses migration code for device state.
Currently (in 3/3), the memory is "restored" simply by re-mmapping MAP_PRIVATE
from file-backed memory. However, future versions might use dirty-page
tracking with a shadow memory snapshot, to avoid the page faults that result
from the mmap + MAP_PRIVATE approach.

In terms of the way the guest initiates snapshots/restores, maybe there is a
neater way to do this with QMP, by providing the guest with access to QMP via
a serial device. That way, we avoid the need for a custom virtual device.
Right now, the snapshots are requested/restored over MMIO, since we need to
make snapshots at precise locations in the guest's execution (i.e. a specific
program counter in a process running in the guest). I wonder if there is a way
to achieve that with QMP forwarded to the guest.

-Alex

>
> Ciao,
>
> Claudio
>
> >
> > To test locally, boot up any linux distro. I used the following C file to
> > interact with the PCI snapshot device:
> >
> > #include <stdio.h>
> > #include <stdint.h>
> > #include <fcntl.h>
> > #include <sys/mman.h>
> > #include <unistd.h>
> >
> > int main() {
> >     int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0", O_RDWR | O_SYNC);
> >     size_t size = 1024 * 1024;
> >     uint32_t* memory = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >
> >     printf("%x\n", memory[0]);
> >
> >     int a = 0;
> >     memory[0] = 0x101; // save snapshot
> >     printf("before: value of a = %d\n", a);
> >     a = 1;
> >     printf("middle: value of a = %d\n", a);
> >     memory[0] = 0x102; // load snapshot
> >     printf("after: value of a = %d\n", a);
> >
> >     return 0;
> > }
> >
> > Richard Liu (3):
> >   create skeleton snapshot device and add docs
> >   implement ram save/restore
> >   use migration code for cpu and device save/restore
> >
> >  docs/devel/snapshot.rst |  26 +++++++
> >  hw/i386/Kconfig         |   1 +
> >  hw/misc/Kconfig         |   3 +
> >  hw/misc/meson.build     |   1 +
> >  hw/misc/snapshot.c      | 164 ++++++++++++++++++++++++++++++++++++++++
> >  migration/savevm.c      |  84 ++++++++++++++++++++
> >  migration/savevm.h      |   3 +
> >  7 files changed, 282 insertions(+)
> >  create mode 100644 docs/devel/snapshot.rst
> >  create mode 100644 hw/misc/snapshot.c
> >
>
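
For reference, the MAP_PRIVATE restore mentioned in the reply above
(re-mmapping file-backed memory copy-on-write) can be sketched in a few lines
of standalone C. This is only an illustration under stated assumptions, not
the code from patch 3/3: the names map_snapshot and demo_cow_restore, the /tmp
scratch file, and the four-page size are all made up for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the snapshot file copy-on-write. With a non-NULL
 * addr, MAP_FIXED replaces whatever is mapped there, which drops any
 * private (dirty) pages of the old mapping. */
static char *map_snapshot(int fd, size_t size, void *addr)
{
    int flags = MAP_PRIVATE | (addr ? MAP_FIXED : 0);
    void *p = mmap(addr, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if writes made after the "snapshot" are undone by the
 * "restore", -1 on any failure. */
int demo_cow_restore(void)
{
    char path[] = "/tmp/ram-snapshot-XXXXXX"; /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    size_t size = 4 * (size_t)page;
    char *ram = NULL;
    int rc = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until close() */

    /* "Save": the file is the snapshot; byte 0 is 'A', the rest is zero. */
    if (ftruncate(fd, (off_t)size) != 0 || pwrite(fd, "A", 1, 0) != 1)
        goto out;

    ram = map_snapshot(fd, size, NULL);
    if (!ram)
        goto out;

    /* Run: writes land in private page copies; the file is not modified. */
    ram[0] = 'B';
    ram[page] = 'C';

    /* "Restore": remap in place over the same address range. */
    if (map_snapshot(fd, size, ram) == ram)
        rc = (ram[0] == 'A' && ram[page] == 0) ? 0 : -1;

    munmap(ram, size);
out:
    close(fd);
    return rc;
}
```

Calling demo_cow_restore() should return 0 on a Linux system with a writable
/tmp. Each restore is a single mmap call, but every page touched afterwards
faults again, which is the overhead the dirty-page-tracking idea is meant to
avoid.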