On 220722 2210, Claudio Fontana wrote:
> Hi Richard,
> 
> On 7/22/22 21:20, Richard Liu wrote:
> > This RFC adds a virtual device for snapshots/restores within QEMU. I am
> > working on this as part of QEMU Google Summer of Code 2022. Fast
> > snapshots/restores within QEMU are helpful for code fuzzing.
> > 
> > I reused the migration code for saving and restoring virtual device and CPU
> > state. As for the RAM, I am using a simple CoW-mmapped file to do restores.
> > 
> > The loadvm migration function I used for doing restores only worked after I
> > called it from a qemu_bh. I'm not sure if I should run the migration code in
> > a separate thread (see patch 3), since it currently runs as part of the
> > device code in the vCPU thread.
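> > 
> > Roughly, what worked was deferring the loadvm call into a bottom half so
> > that it runs in the main loop rather than in the vCPU thread. A minimal
> > sketch, assuming the series' device state struct and a hypothetical
> > load_snapshot_state() helper:
> > 
> >     #include "qemu/osdep.h"
> >     #include "qemu/main-loop.h"
> > 
> >     static void snapshot_restore_bh(void *opaque)
> >     {
> >         SnapshotState *s = opaque;
> >         /* runs from the main loop, outside the vCPU thread */
> >         load_snapshot_state(s);
> >     }
> > 
> >     /* in the device's MMIO write handler: */
> >     s->bh = qemu_bh_new(snapshot_restore_bh, s);
> >     qemu_bh_schedule(s->bh);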
> > 
> > This is a rough first revision, and feedback on the CPU and device state
> > restores is appreciated.
> 
> As I understand it, the save and restore of VM state in QEMU is usually
> best managed through the libvirt APIs, for example using the libvirt
> command line tool virsh:
> 
> $ virsh save (or managedsave)
> 
> $ virsh restore (or start)
> 
> These commands start a QEMU migration, using the QMP protocol, to a file
> descriptor previously opened by libvirt to contain the state file.
> 
> (getfd QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-2811
> 
> (migrate QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1947
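> 
> Concretely, the QMP sequence looks roughly like this (the fd name is
> illustrative; the file descriptor itself is passed over the QMP socket
> with SCM_RIGHTS before the getfd command):
> 
>     { "execute": "getfd", "arguments": { "fdname": "savefile" } }
>     { "execute": "migrate", "arguments": { "uri": "fd:savefile" } }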
> 
> Unfortunately, this is currently very slow.
> 
> Maybe you could help think this through, or help with implementing a
> solution? I tried to push an approach that involves only libvirt, using
> the existing QEMU multifd migration to a socket:
> 
> https://listman.redhat.com/archives/libvir-list/2022-June/232252.html
> 
> Performance is very good compared with what is possible today, but it won't
> be upstreamable: it is not deemed optimal, and libvirt wants the code to be
> in QEMU.
> 
> What about helping to think out what the QEMU-based solution could look
> like?
> 
> The requirements for now in my view seem to be:
> 
> * avoiding kernel page cache thrashing for large transfers,
>   which in my view currently requires changing QEMU to be able to migrate a
>   stream to an fd that is open with O_DIRECT. In practice this means somehow
>   making all QEMU migration stream writes block-friendly (adding some
>   buffering? see the sketch after this list).
> 
> * allowing concurrent parallel transfers,
>   to be able to use extra CPU resources to speed up the transfer if such
>   resources are available.
> 
> * transferring multiple GB/s with modern NVMe drives, for super fast VM
>   state save and restore (a few seconds even for a 30GB VM).
>   We should do no worse than the prototype fully implemented in libvirt;
>   otherwise it would not make sense to implement it in QEMU.
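> 
> On the O_DIRECT point, a minimal sketch of what I mean by buffering
> (DirectWriter, dw_open and dw_write are illustrative names, not existing
> QEMU code): stage the migration stream in an aligned buffer and only
> issue block-sized writes, since O_DIRECT requires aligned buffers,
> offsets and lengths:
> 
>     #define _GNU_SOURCE   /* for O_DIRECT */
>     #include <fcntl.h>
>     #include <stdlib.h>
>     #include <string.h>
>     #include <unistd.h>
> 
>     #define BUF_SIZE (1024 * 1024)  /* a multiple of the block size */
> 
>     typedef struct {
>         int fd;
>         char *buf;     /* aligned staging buffer */
>         size_t used;
>     } DirectWriter;
> 
>     static int dw_open(DirectWriter *w, const char *path)
>     {
>         w->fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
>         if (w->fd < 0) {
>             return -1;
>         }
>         if (posix_memalign((void **)&w->buf, 4096, BUF_SIZE)) {
>             return -1;
>         }
>         w->used = 0;
>         return 0;
>     }
> 
>     static int dw_write(DirectWriter *w, const void *data, size_t len)
>     {
>         while (len > 0) {
>             size_t n = BUF_SIZE - w->used;
>             if (n > len) {
>                 n = len;
>             }
>             memcpy(w->buf + w->used, data, n);
>             w->used += n;
>             data = (const char *)data + n;
>             len -= n;
>             if (w->used == BUF_SIZE) {
>                 /* flush only full, aligned blocks */
>                 if (write(w->fd, w->buf, BUF_SIZE) != BUF_SIZE) {
>                     return -1;
>                 }
>                 w->used = 0;
>             }
>         }
>         return 0;
>     }
> 
> The final partial block would still need padding, or a plain write after
> clearing O_DIRECT via fcntl(F_SETFL); omitted here for brevity.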
> 
> What do you think?

Hi Claudio,
These changes aim to restore a VM hundreds to thousands of times per second
within the same process. Do you think that is achievable with the design
of QMP migrate? We want to avoid serializing/transferring all of memory
over the fd, so right now this series only uses migration code for device
state. Currently (in 3/3), the memory is "restored" simply by re-mmapping
MAP_PRIVATE from file-backed memory. However, future versions might use
dirty-page tracking with a shadow memory snapshot, to avoid the page
faults that result from the mmap + MAP_PRIVATE approach.
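
A minimal sketch of that restore path (ram_ptr, ram_size and ram_fd are
illustrative names, not the actual patch code): replacing the mapping in
place throws away every CoW page dirtied since the snapshot, and MAP_FIXED
reuses the same address, so existing pointers into guest RAM stay valid:

    #include <sys/mman.h>

    /* Reset guest RAM to the pristine file-backed snapshot contents. */
    static void *restore_ram(void *ram_ptr, size_t ram_size, int ram_fd)
    {
        /* MAP_FIXED atomically replaces the old mapping at ram_ptr */
        return mmap(ram_ptr, ram_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_FIXED, ram_fd, 0);
    }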

In terms of how the guest initiates snapshots/restores, maybe there is a
neater way to do this with QMP, by providing the guest with access to QMP
via a serial device. That way, we avoid the need for a custom virtual
device. Right now, snapshots are requested/restored over MMIO, since we
need to take snapshots at precise points in the guest's execution (i.e. at
a specific program counter in a process running in the guest). I wonder if
there is a way to achieve that with QMP forwarded to the guest.

-Alex

> 
> Ciao,
> 
> Claudio
> 
> > 
> > To test locally, boot up any Linux distro. I used the following C file to
> > interact with the PCI snapshot device:
> > 
> >     #include <stdio.h>
> >     #include <stdint.h>
> >     #include <fcntl.h>
> >     #include <sys/mman.h>
> >     #include <unistd.h>
> > 
> >     int main() {
> >         /* BAR0 of the PCI snapshot device */
> >         int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0",
> >                       O_RDWR | O_SYNC);
> >         if (fd < 0) {
> >             perror("open");
> >             return 1;
> >         }
> > 
> >         size_t size = 1024 * 1024;
> >         uint32_t *memory = mmap(NULL, size, PROT_READ | PROT_WRITE,
> >                                 MAP_SHARED, fd, 0);
> >         if (memory == MAP_FAILED) {
> >             perror("mmap");
> >             return 1;
> >         }
> > 
> >         printf("%x\n", memory[0]);
> > 
> >         int a = 0;
> >         memory[0] = 0x101; // save snapshot
> >         printf("before: value of a = %d\n", a);
> >         a = 1;
> >         printf("middle: value of a = %d\n", a);
> >         memory[0] = 0x102; // load snapshot
> >         printf("after: value of a = %d\n", a);
> > 
> >         return 0;
> >     }
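> > 
> > (Build it inside the guest with something like "gcc test.c -o test" and
> > run it as root, so that resource0 is writable.)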
> > 
> > Richard Liu (3):
> >   create skeleton snapshot device and add docs
> >   implement ram save/restore
> >   use migration code for cpu and device save/restore
> > 
> >  docs/devel/snapshot.rst |  26 +++++++
> >  hw/i386/Kconfig         |   1 +
> >  hw/misc/Kconfig         |   3 +
> >  hw/misc/meson.build     |   1 +
> >  hw/misc/snapshot.c      | 164 ++++++++++++++++++++++++++++++++++++++++
> >  migration/savevm.c      |  84 ++++++++++++++++++++
> >  migration/savevm.h      |   3 +
> >  7 files changed, 282 insertions(+)
> >  create mode 100644 docs/devel/snapshot.rst
> >  create mode 100644 hw/misc/snapshot.c
> > 
> 
