* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Thu, Feb 07, 2019 at 05:33:25PM -0500, Neil Skrypuch wrote:
> > Thanks for your email!
> 
> Please post your QEMU command-line.
> 
> > The clock jump numbers above are from NTP, but you can see that they
> > are quite close to the amount of time spent in raw_co_invalidate_cache.
> > So, it looks like flushing the cache is just taking a long time and
> > stalling the guest, which causes the clock jump. This isn't too
> > surprising, as the entire disk image was just written as part of the
> > block mirror and would likely still be in the cache.
> > 
> > I see the use case for this feature, but I don't think it applies here,
> > as we're not technically using shared storage. I believe an option to
> > toggle this behaviour on/off and/or some sort of heuristic to guess
> > whether or not it should be enabled by default would be in order here.
> 
> It would be good to figure out how to perform the flush without
> affecting guest time at all.  The clock jump will also inconvenience
> users who do need the flush, so I'd rather not work around the clock
> jump for a subset of users only.
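[As a standalone illustration of the effect described above (this is not
QEMU code; the /tmp path and the 1 GB size are arbitrary choices), the
sketch below dirties a large file through the page cache and then times
fdatasync() and posix_fadvise(POSIX_FADV_DONTNEED) separately, roughly
mirroring the flush-then-drop sequence in raw_co_invalidate_cache():]

    /* Standalone demo: time fdatasync() vs. posix_fadvise(DONTNEED)
     * on a file that was just written through the page cache. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        const char *path = "/tmp/fadvise-test";   /* throwaway demo path */
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Dirty ~1 GB of page cache, like a freshly mirrored image */
        char *buf = calloc(1, 1 << 20);
        for (int i = 0; i < 1024; i++) {
            if (write(fd, buf, 1 << 20) != (1 << 20)) {
                perror("write");
                return 1;
            }
        }

        double t0 = now_sec();
        if (fdatasync(fd) != 0) {        /* analogous to bdrv_co_flush() */
            perror("fdatasync");
            return 1;
        }
        double t1 = now_sec();
        int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        double t2 = now_sec();

        printf("fdatasync: %.3fs  fadvise: %.3fs (ret=%d)\n",
               t1 - t0, t2 - t1, ret);

        close(fd);
        unlink(path);
        free(buf);
        return 0;
    }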
One thing that makes Neil's setup different is that, with the source and
destination on the same host, the fadvise is bound to drop pages that are
still actively in use by the source.

But I'm also curious at what point in the migration we call the
invalidate, and hence which threads get held up, and in which state.

Neil: Another printf would also be interesting, between the bdrv_co_flush
and the posix_fadvise; I'm assuming it's the bdrv_co_flush that's taking
the time, but it would be good to check.

Dave

> Stefan

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
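[For reference, the extra timing Dave asks for might look roughly like the
fragment below, inside raw_co_invalidate_cache() in block/file-posix.c.
Only the bdrv_co_flush()/posix_fadvise() ordering is taken from the thread;
the surrounding function body, the variable names (bs, s, ret, errp) and
the error handling are assumed/simplified, and g_get_monotonic_time() is
just one convenient timestamp source.]

    /*
     * Sketch of extra timing inside raw_co_invalidate_cache()
     * (block/file-posix.c); surrounding code elided, details assumed.
     */
    fprintf(stderr, "invalidate_cache: before flush   %" PRId64 " us\n",
            g_get_monotonic_time());

    ret = bdrv_co_flush(bs);                  /* suspected slow part */

    fprintf(stderr, "invalidate_cache: before fadvise %" PRId64 " us\n",
            g_get_monotonic_time());

    if (ret < 0) {
        error_setg_errno(errp, -ret, "flush failed");
        return;
    }

    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);

    fprintf(stderr, "invalidate_cache: after fadvise  %" PRId64 " us\n",
            g_get_monotonic_time());

[Comparing the three timestamps on a migration that shows the clock jump
would indicate whether the flush or the fadvise accounts for the bulk of
the stall.]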