Re: [Qemu-devel] Rethinking missed tick catchup

Gleb Natapov Thu, 13 Sep 2012 09:06:58 -0700

On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
> Gleb Natapov <g...@redhat.com> writes:
> 
> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
> >> Gleb Natapov <g...@redhat.com> writes:
> >> 
> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> >> >> "Daniel P. Berrange" <berra...@redhat.com> writes:
> >> >> 
> >> >> I think it's better for QEMU to talk to qemu-ga.  We can tell when a large
> >> >> period of time has passed in QEMU because we'll accumulate a large
> >> >> number of missed ticks.
> >> >> 
> >> > With RTC configured to use vm clock we will not.
> >> 
> >> Not for host suspend.  For stop and live migration, we stop vm_clock.
> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
> >> 
> > Hmm, true. What about hooking into suspend and doing vmstop during
> > suspend?
> 
> Is suspend the only foreseeable way for this problem to happen?  I don't
> think it is which is what concerns me about any approach that relies on
> "hooking suspend".
> 
With the RTC using the real time clock, setting the host time far ahead
of its current value will trigger the same behaviour, I think.


> Also, I don't think there is a generic way to "hook suspend".
> 
> >> >> This could happen because of stop, host suspend, live migration to a
> >> >> file, etc.
> >> >> 
> >> >> It's much easier for us to call into qemu-ga to do the time correction
> >> >> whenever this event occurs than to try and have libvirt figure out when
> >> >> it's necessary.
> >> > And if the guest does not have qemu-ga, which is better: injecting
> >> > interrupts like crazy for the next 2 minutes, or leaving the guest
> >> > with incorrect time?
> >> 
> >> Yes, at least that's fixable by the end-user.  QEMU consuming 100% CPU
> >> for a prolonged period of time isn't fixable.
> >> 
> > You mean yes to "leave guest with incorrect time"? QEMU will still
> > consume 100% of cpu for some time, calling the qemu_timer callback
> > millions of times. The timedrift code is not the right level to fix that.
> 
> Not if we put a cap on how many interrupts we'll try to catch up.
> 
Interrupt catchup happens at another level. If the guest was stopped for
24 hours while the RTC was configured to 1kHz, qemu_timer will fire the
callback 88473600 times. Each invocation will try to inject an interrupt
and fail, incrementing coalesced_irq instead. You can cap coalesced_irq,
but the callback will still fire 88473600 times.
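
As a rough check of the figure above: 88473600 equals 24 * 3600 * 1024, so "1kHz" is presumably the RTC's 1024 Hz periodic rate (that reading is an assumption, as are all names in the sketch). The second function is a toy model, not QEMU code, of why capping coalesced_irq alone leaves the number of callback invocations unchanged.

```python
# Assumption: "1kHz" means the RTC periodic rate of 1024 Hz.
RTC_HZ = 1024
STOPPED_SECONDS = 24 * 60 * 60  # guest stopped for 24 hours


def missed_callbacks(stopped_seconds, rate_hz):
    """Number of timer periods that elapsed while the guest was stopped."""
    return stopped_seconds * rate_hz


def replay(missed, irq_cap):
    """Toy model: the timer callback runs once per missed period.

    Every injection is assumed to fail, so the callback only bumps a
    counter (standing in for coalesced_irq) until the cap is reached.
    Returns (callback invocations, final counter value).
    """
    callbacks = 0
    coalesced_irq = 0
    for _ in range(missed):
        callbacks += 1  # the callback itself still runs
        if coalesced_irq < irq_cap:
            coalesced_irq += 1  # only the counter is capped
    return callbacks, coalesced_irq


if __name__ == "__main__":
    print(missed_callbacks(STOPPED_SECONDS, RTC_HZ))
    # Small numbers keep the demo quick; the shape of the result is the same.
    print(replay(10_000, irq_cap=100))
```

In this model the cap bounds the counter but not the loop, which is the point being made: the work has to be cut off where the timer is rearmed, not where interrupts are counted.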

> As I mentioned previously, if we accrue more than X number of missed
> ticks, we should simply declare bankruptcy and reset the counter.
> 
> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
> reset the guest's clock based on reading the hardware clock via a
> 'guest-resync-time' command.
> 
> If it isn't, time will be off.  Hopefully the guest is running NTP and
> can correct itself.  Otherwise, at least the admin can manually fix the
> time.
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > --
> >                     Gleb.

--
                        Gleb.
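
The cap-and-reset policy proposed in the quoted text (give up once more than X ticks are missed, reset the counter, and ask qemu-ga to resync if it is present) can be sketched as follows. This is a minimal illustration and not QEMU code: the class, the threshold value, and the `resync` callback are hypothetical, with the callback standing in for the proposed 'guest-resync-time' command.

```python
class TickCatchup:
    """Toy model of missed-tick accounting with a bankruptcy threshold."""

    def __init__(self, max_missed, resync=None):
        self.max_missed = max_missed  # the "X" from the proposal (hypothetical value)
        self.resync = resync          # hypothetical hook; None means no guest agent
        self.missed = 0

    def account_missed(self, ticks):
        """Record missed ticks and report which path was taken."""
        self.missed += ticks
        if self.missed <= self.max_missed:
            return "catching-up"      # backlog small enough to replay
        self.missed = 0               # declare bankruptcy: drop the backlog
        if self.resync is not None:
            self.resync()             # ask the guest agent to fix the clock
            return "resynced"
        return "dropped"              # guest time stays off until NTP or the admin fixes it


if __name__ == "__main__":
    calls = []
    with_agent = TickCatchup(max_missed=1000, resync=lambda: calls.append("resync"))
    print(with_agent.account_missed(500))
    print(with_agent.account_missed(600))
    print(TickCatchup(max_missed=1000).account_missed(5000))
```

Note that this only models the counter; as pointed out earlier in the thread, it does not by itself stop the timer callback from firing once per missed period.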
