Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

Michael S. Tsirkin Mon, 19 Mar 2018 19:59:46 -0700

On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
> On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
> > On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
> > > On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> > > > On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> > > > > On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > > > > > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > > > OTOH it seems that if thread stops nothing will wake it up
> > > > whem vm is restarted. Such bahaviour change across vmstop/vmstart
> > > > is unexpected.
> > > > I do not understand why we want to increment the counter
> > > > on vm stop though. It does make sense to stop the thread
> > > > but why not resume where we left off when vm is resumed?
> > > > 
> > > I'm not sure which counter we incremented. But it would be clear if we 
> > > have
> > > a high level view of how it works (it is symmetric actually). Basically, 
> > > we
> > > start the optimization when each round starts and stop it at the end of 
> > > each
> > > round (i.e. before we do the bitmap sync), as shown below:
> > > 
> > > 1) 1st Round starts --> free_page_start
> > > 2) 1st Round in progress..
> > > 3) 1st Round ends  --> free_page_stop
> > > 4) 2nd Round starts --> free_page_start
> > > 5) 2nd Round in progress..
> > > 6) 2nd Round ends --> free_page_stop
> > > ......
> > > 
> > > For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
> > > finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
> > > optimization thread exits immediately. That is, this optimization thread 
> > > is
> > > gone forever (the optimization we can do for this round is done). We won't
> > > know when would the VM be woken up:
> > > A) If the VM is woken up very soon when the migration thread is still in
> > > progress of 2), then in 4) a new optimization thread (not the same one for
> > > the first round) will be created and start the optimization for the 2nd
> > > round as usual (If you have questions about 3) in this case, that
> > > free_page_stop will do nothing than just return, since the optimization
> > > thread has exited) ;
> > > B) If the VM is woken up after the whole migration has ended, there is 
> > > still
> > > no point in resuming the optimization.
> > > 
> > > I think this would be the simple design for the first release of this
> > > optimization. There are possibilities to improve case A) above by 
> > > continuing
> > > optimization for the 1st Round as it is still in progress, but I think
> > > adding that complexity for this rare case wouldn't be worthwhile (at least
> > > for now). What would you think?
> > > 
> > > 
> > > Best,
> > > Wei
> > In my opinion this just makes the patch very messy.
> > 
> > E.g. attempts to attach a debugger to the guest will call vmstop and
> > then behaviour changes. This is a receipe for heisenbugs which are then
> > extremely painful to debug.
> > 
> > It is not really hard to make things symmetrical:
> > e.g. if you stop on vmstop then you should start on vmstart, etc.
> > And stopping thread should not involve a bunch of state
> > changes, just stop it and that's it.
> > 
> 
> "stop it" - do you mean to
> 1) make the thread exit (i.e.make virtio_balloon_poll_free_page_hints exit
> the while loop and return NULL); or
> 2) keep the thread staying in the while loop but yield running (e.g.
> sleep(1) or block on a mutex)? (or please let me know if you suggested a
> different implementation about stopping the thread)


I would say it makes more sense to make it block on something.

BTW I still think you are engaging in premature optimization here.
What you are doing here is a "data plane for balloon".
I would make the feature work first by processing this in a BH.
Creating threads immediately opens up questions of isolation,
cgroups etc.

> If we go with 1), then we would not be able to resume the thread.
> If we go with 2), then there is no guarantee to make the thread continue to
> run immediately (there will be a scheduling delay which is not predictable)
> when the vm is woken up. If the thread cannot run immediately when the the
> VM is woken up, it will be effectively the same as 1).
> 
> In terms of heisenbugs, I think we can recommend developers (of this
> feature) to use methods that don't stop the VM (e.g. print).
> 
> Best,
> Wei

Sorry this does not makes sense to me.  The reality is that for most
people migration works just fine.  These patches implement a niche
optimization. #1 priority should be to break no existing flows.

-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

Reply via email to