Re: [Qemu-devel] Rethinking missed tick catchup

Gleb Natapov Thu, 13 Sep 2012 11:56:31 -0700

On Thu, Sep 13, 2012 at 01:33:31PM -0500, Anthony Liguori wrote:
> Gleb Natapov <g...@redhat.com> writes:
> 
> > On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
> >> Gleb Natapov <g...@redhat.com> writes:
> >> 
> >> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
> >> >> Gleb Natapov <g...@redhat.com> writes:
> >> >> 
> >> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
> >> >> >> "Daniel P. Berrange" <berra...@redhat.com> writes:
> >> >> >> 
> >> >> >> I think it's better for QEMU to talk to qemu-ga.  We can tell when a
> >> >> >> large period of time has passed in QEMU because we'll accumulate a
> >> >> >> large number of missed ticks.
> >> >> >> 
> >> >> > With RTC configured to use vm clock we will not.
> >> >> 
> >> >> Not for host suspend.  For stop and live migration, we stop vm_clock.
> >> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
> >> >> 
> >> > Hmm, true. What about hooking into suspend and doing vmstop during
> >> > suspend. 
> >> 
> >> Is suspend the only foreseeable way for this problem to happen?  I don't
> >> think it is which is what concerns me about any approach that relies on
> >> "hooking suspend".
> >> 
> > With the RTC using the real time clock, setting the host time far ahead
> > of what it is will trigger the same behaviour, I think.
> >
> >> Also, I don't think there is a generic way to "hook suspend".
> >> 
> >> >> >> This could happen because of stop, host suspend, live migration to a
> >> >> >> file, etc.
> >> >> >> 
> >> >> >> It's much easier for us to call into qemu-ga to do the time correction
> >> >> >> whenever this event occurs than to try and have libvirt figure out
> >> >> >> when it's necessary.
> >> >> > And if the guest does not have qemu-ga, what is better: inject
> >> >> > interrupts like crazy for the next 2 minutes, or leave the guest with
> >> >> > incorrect time?
> >> >> 
> >> >> Yes, at least that's fixable by the end-user.  QEMU consuming 100% CPU
> >> >> for a prolonged period of time isn't fixable.
> >> >> 
> >> > You mean yes to "leave guest with incorrect time"? QEMU will still
> >> > consume 100% of CPU for some time calling the qemu_timer callback
> >> > millions of times. The timedrift code is not the right level to fix that.
> >> 
> >> Not if we put a cap on how many interrupts we'll try to catch up.
> >> 
> > Interrupt catchup happens at another level. If the guest was stopped for
> > 24 hours while the RTC was configured for 1 kHz (1024 Hz), qemu_timer will
> > fire the callback 88473600 times (24 × 3600 × 1024). Each invocation will
> > try to inject an interrupt and fail, incrementing coalesced_irq instead.
> > You can cap coalesced_irq, but the callback will still fire 88473600 times.
> 
> That's a bug.
> 
> The next period calculation should not be based on the last period +
> length of period but rather on the current time + delta to next period
> boundary.
> 
I disagree that this is a bug. This is by design, to account for timer
signals that were delivered too late.

> IOW, we shouldn't arm timers to expire backwards in time from when
> the event occurred.  That should be accounted as a missed tick.
> 
Not all users of qemu_timer have their own missed tick accounting, so
qemu_timer provides a general one. We can create another time source
for qemu_timer without this behaviour and use it in the RTC.


--
                        Gleb.
