Gleb Natapov <g...@redhat.com> writes:

> On Thu, Sep 13, 2012 at 10:56:56AM -0500, Anthony Liguori wrote:
>> Gleb Natapov <g...@redhat.com> writes:
>>
>> > On Thu, Sep 13, 2012 at 09:35:18AM -0500, Anthony Liguori wrote:
>> >> Gleb Natapov <g...@redhat.com> writes:
>> >>
>> >> > On Thu, Sep 13, 2012 at 09:06:29AM -0500, Anthony Liguori wrote:
>> >> >> "Daniel P. Berrange" <berra...@redhat.com> writes:
>> >> >>
>> >> >> I think it's better for QEMU to talk to qemu-ga. We can tell when a
>> >> >> large period of time has passed in QEMU because we'll accumulate a
>> >> >> large number of missed ticks.
>> >> >>
>> >> > With RTC configured to use vm clock we will not.
>> >>
>> >> Not for host suspend. For stop and live migration, we stop vm_clock.
>> >> But QEMU isn't aware of host suspend so vm_clock cannot be stopped.
>> >>
>> > Hmm, true. What about hooking into suspend and doing vmstop during
>> > suspend.
>>
>> Is suspend the only foreseeable way for this problem to happen? I don't
>> think it is which is what concerns me about any approach that relies on
>> "hooking suspend".
>>
> With RTC using real time clock setting host time far ahead of what is it
> will trigger same behaviour I think.
>
>> Also, I don't think there is a generic way to "hook suspend".
>>
>> >> >> This could happen because of stop, host suspend, live migration to a
>> >> >> file, etc.
>> >> >>
>> >> >> It's much easier for us to call into qemu-ga to do the time correction
>> >> >> whenever this event occurs than to try and have libvirt figure out when
>> >> >> it's necessary.
>> >> > And if guest does not have qemu-ga what is better inject interrupts like
>> >> > crazy for next 2 minutes or leave guest with incorrect time?
>> >>
>> >> Yes, at least that's fixable by the end-user. QEMU consuming 100% CPU
>> >> for a prolonged period of time isn't fixable.
>> >>
>> > You mean yes to "leave guest with incorrect time"?
>> > QEMU will still consume 100% of cpu for some time calling qemu_timer
>> > callback millions times. timedrift code is not the right level to fix
>> > that.
>>
>> Not if we put a cap on how many interrupts we'll try to catch up.
>>
> Interrupts catchup happens at another level. If guest was stopped for
> 24 hours while RTC was configured to 1kHz qemu_timer will fire callback
> 88473600 times. Each invocation will try to inject interrupt and fail
> incrementing coalesced_irq instead. You can cap coalesced_irq but
> callback will still fire 88473600 times.

That's a bug. The next period calculation should not be based on the
last period + length of period but rather on the current time + delta to
next period boundary.

IOW, we shouldn't arm timers to expire backwards in time from when the
event occurred. Any period that has already elapsed should be accounted
as a missed tick.

Regards,

Anthony Liguori

>
>> As I mentioned previously, if we accrue more than X number of missed
>> ticks, we should simply declare bankruptcy and reset the counter.
>>
>> When that occurs, *if* qemu-ga is present, we should ask qemu-ga to
>> reset the guest's clock based on reading the hardware clock via a
>> 'guest-resync-time' command.
>>
>> If it isn't, time will be off. Hopefully the guest is running NTP and
>> can correct itself. Otherwise, at least the admin can manually fix the
>> time.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>> >
>> > --
>> > Gleb.
>
> --
> Gleb.
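
For illustration, the rearming rule from the reply above (next expiry computed from the current time plus the delta to the next period boundary, with elapsed periods counted as missed ticks) could look roughly like the sketch below. This is not QEMU code: `next_expiry`, `account_missed` and the `cap` parameter are hypothetical names, times are plain integers in an arbitrary unit, and the reset-on-overflow behaviour is only one reading of the "declare bankruptcy" idea in the quoted text.

```python
def next_expiry(last_expiry, now, period):
    """Return (next_expiry, missed_ticks) for a periodic timer.

    Hypothetical helper, not a QEMU API. The next expiry is placed on the
    first period boundary strictly after `now`, so the timer is never armed
    in the past. Boundaries that already elapsed are reported as missed
    ticks instead of each triggering its own callback.
    """
    if now < last_expiry:
        # The scheduled expiry is still in the future: nothing was missed.
        return last_expiry, 0
    missed = (now - last_expiry) // period
    return last_expiry + (missed + 1) * period, missed


def account_missed(coalesced, missed, cap):
    """Add missed ticks to the coalesced counter, resetting it past `cap`.

    Returns (new_coalesced, resync_needed). `cap` stands in for the
    unspecified "X" from the thread; resync_needed marks the point where a
    guest time resync would be requested if an agent is available.
    """
    total = coalesced + missed
    if total > cap:
        return 0, True
    return total, False
```

With `last_expiry=0`, `period=10` and `now=35`, a single call yields a next expiry of 40 and 3 missed ticks, so a long stop costs one callback invocation rather than one per elapsed period.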