On Wed, Jul 25, 2018 at 08:17:37PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Fri, Jun 29, 2018 at 12:53:59PM +0100, Dr. David Alan Gilbert wrote:
> > > * Denis Plotnikov (dplotni...@virtuozzo.com) wrote:
> > > > The patch set adds the ability to make external snapshots while VM is
> > > > running.
> > >
> > > cc'ing in Andrea since this uses sigsegv's to avoid userfault-wp that
> > > isn't there yet.
> > >
> > > Hi Denis,
> > > How robust are you finding this SEGV based trick; for example what
> > > about things like the kernel walking vhost queues or similar kernel
> > > nasties?
> >
> > (I'm commenting on this old series to keep the discussion together)
> >
> > If we want to make this series really work for people, we should
> > possibly need to know whether it could work with vhost (otherwise we
> > might need to go back to userfaultfd write-protection).
> >
> > I digged a bit on the vhost-net IO, it should be using two ways to
> > write to guest memory:
> >
> > - copy_to_user(): this should possibly still be able to be captured by
> >   mprotect() (after some confirmation from Paolo, but still we'd
> >   better try it out)
>
> What confuses me here is who is going to get the signal from this and
> how we recover from the signal - or does it come back as an error
> on the vhost fd somehow?

The problem is having to handle every sigsegv manually in vhost-net, by
trapping copy_to_user failing to copy the full buffer or put_user
returning -EFAULT. Those errors would need to be forwarded by vhost-net
to qemu userland so that it can call mprotect after copying the data.
That's not conceptually different from having uffd-wp send the message,
except that uffd-wp requires zero changes to vhost-net and to every
other piece of kernel code that may have to write to the write-protected
memory.

It may look like the uffd-wp model is a wish-list feature, similar to an
optimization, but without it, when the WP fault is triggered by kernel
code, the sigsegv model falls apart and requires all kinds of ad-hoc
changes just for this single feature. Plus uffd-wp has other benefits:
it makes it all reliable in terms of not increasing the number of vmas
in use during the snapshot. Finally, it makes it faster too, with no
mmap_sem for reading and no sigsegv signals.

The non-cooperative features got merged first because there was much
activity on the kernel side on that front, but this is an ideal time to
nail down the remaining issues in uffd-wp, I think. That, I believe, is
time better spent than trying to emulate it with sigsegv and changing
all drivers to send new events down to qemu specific to the sigsegv
handling. We considered this before doing uffd for postcopy too, but
overall it's unreliable and more work (not a single change was then
needed to KVM code with uffd to handle postcopy, and here it should be
the same).

Thanks,
Andrea