On Fri, Jul 07, 2017 at 01:01:56PM +0100, Dr. David Alan Gilbert wrote:
> > > Take care of deadlocking; any thread in the client that
> > > accesses a userfault protected page can stall.
> >
> > And it can happen under a lock quite easily.
> > What exactly is proposed here?
> > Maybe we want to reuse the new channel that the IOMMU uses.
>
> There's no fundamental reason to get deadlocks as long as you
> get it right; the qemu thread that processes the user-fault's
> is a separate independent thread, so once it's going the client
> can do whatever it likes and it will get woken up without
> intervention.
You take a lock for the channel, then access guest memory.
Then the thread that gets messages from qemu can't get on the channel
to mark range as populated.

> Some care is needed around the postcopy-end; reception of the
> message that tells you to drop the userfault enables (which
> frees anything that hasn't been woken) must be allowed to happen
> for the postcopy complete; we take care that QEMUs fault
> thread lives on until that message is acknowledged.
>
> I'm more worried about how this will work in a full packet switch
> when one vhost-user client for an incoming migration stalls
> the whole switch unless care is taken about the design.
> How do we figure out whether this is going to fly on a full stack?

It's performance though. Client could run in a separate
thread for a while until migration finishes.
We need to make sure there's explicit documentation
that tells clients at what point they might block.

> That's my main reason for getting this WIP set out here to
> get comments.

What will happen if QEMU dies? Is there a way to unblock the client?

> > > There's a nasty hack of a lock around the set_mem_table message.
> >
> > Yes.
> >
> > > I've not looked at the recent IOMMU code.
> > >
> > > Some cleanup and a lot of corner cases need thinking about.
> > >
> > > There are probably plenty of unknown issues as well.
> >
> > At the protocol level, I'd like to rename the feature to
> > USER_PAGEFAULT. Client does not really know anything about
> > copies, it's all internal to qemu.
> > Spec can document that it's used by qemu for postcopy.
>
> OK, tbh I suspect that using it for anything else would be tricky
> without adding more protocol features for that other use case.
>
> Dave

Why exactly? How does client have to know it's migration?

-- 
MST
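
For illustration only, the lock-ordering hazard raised above (take the
channel lock, then touch guest memory, while the thread that reads
messages from qemu needs the same channel) can be modelled with a toy
sketch. None of this is vhost-user or QEMU code: `channel_lock`,
`page_populated`, `access_guest_memory`, `message_thread` and `worker`
are hypothetical names, a `threading.Event` stands in for a
userfault-protected page, and the timeout exists only so the demo
terminates. A real client stalled this way would not time out.

```python
import threading

# Hypothetical stand-ins, not real vhost-user objects.
channel_lock = threading.Lock()      # guards the channel to qemu
page_populated = threading.Event()   # models a userfault-protected page


def access_guest_memory(timeout):
    """Model touching a protected page: block until it is populated.

    Returns True if the page became available, False if the wait timed out.
    """
    return page_populated.wait(timeout)


def message_thread():
    """Model the thread that reads qemu's message and marks the range populated."""
    with channel_lock:               # it needs the channel to get the message
        page_populated.set()


def worker(hold_lock_during_access, timeout=0.5):
    """Return True if the guest-memory access completed, False if it stalled."""
    page_populated.clear()
    reader = threading.Thread(target=message_thread)
    if hold_lock_during_access:
        # Hazardous ordering: the reader cannot take channel_lock,
        # so nothing can ever mark the page as populated.
        with channel_lock:
            reader.start()
            ok = access_guest_memory(timeout)
    else:
        # Safer ordering: finish with the channel and drop the lock
        # before touching memory that might fault.
        with channel_lock:
            pass
        reader.start()
        ok = access_guest_memory(timeout)
    reader.join()
    return ok
```

In this model `worker(True)` reports a stall and `worker(False)` completes,
which is the reason for wanting the spec to say explicitly at which points
a client may block.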