On 07/30/2012 03:43 PM, Sasha Levin wrote: >>>>> >>>>>> Then work backwards to see the last place it is >>>>>> programmed (APIC_TMICT/APIC_TDCR). >>>>> >>>>> This looks like what you're looking for: >>>>> >>>>> kvm_apic: apic_write APIC_TMICT = 0x3e >> >> What about APIC_TMICT? Might be configured just once on boot. > > Did you mean APIC_TDCR?
Yes, sorry. > > I'm seeing both in the trace logs, but it's APIC_TMICT thats shown when the > hang occurs. > > vm-19727 [002] ...1 66669.010699: kvm_apic: apic_read APIC_TDCR > = 0x0 > vm-19727 [002] ...1 66669.010699: kvm_msr: msr_read 83e = 0x0 > vm-19727 [002] d..2 66669.010699: kvm_entry: vcpu 0 > vm-19727 [002] d..2 66669.010700: kvm_exit: reason MSR_WRITE > rip 0xffffffff810a06a8 info 0 0 > vm-19727 [002] ...1 66669.010701: kvm_apic: apic_write > APIC_TDCR = 0x3 > vm-19727 [002] ...1 66669.010701: kvm_msr: msr_write 83e = 0x3 > vm-19727 [002] d..2 66669.010701: kvm_entry: vcpu 0 TDCR is needed to interpret TMICT. > >>>>> >>>>> >>>>> Basically it's spinning on the following 4 lines of trace when it happens: >>>>> >>>>> 16063.549189: kvm_apic: apic_write APIC_TMICT = 0x3e >>>>> 16063.549190: kvm_msr: msr_write 838 = 0x3e >>>>> 16063.549190: kvm_exit: reason EXTERNAL_INTERRUPT rip >>>>> 0xffffffff8115de02 info 0 0 >>>>> 16063.549190: kvm_entry: vcpu 2 >>>>> >>>> >>>> What, no kvm_inj_virq? >>>> >>>> Are you running with pveoi? Try disabling it. >> >> What's the answer to this question? > > The host is running 3.5, so that shouldn't be enabled. Ok. >> Please add hrtimer_start and hrtimer_expire_entry to your traced events. >> > > A bigger log with those two enabled: > > vm-19730 [003] ...1 66730.739585: kvm_apic: apic_write > APIC_TMICT = 0x3e > vm-19716 [000] d..3 66730.739586: hrtimer_start: > hrtimer=ffff880133cc3f78 function=posix_timer_fn expires=66882054763626 > softexpires=66882054763626 > vm-19730 [003] d..2 66730.739586: hrtimer_start: > hrtimer=ffff88011e00d3d0 function=kvm_timer_fn expires=66882053774813 > softexpires=66882053774813 vcpu 3 programs the timer, and kvm sets an hrtimer, but fat in the future? > vm-19730 [003] ...1 66730.739586: kvm_msr: msr_write 838 = 0x3e > vm-19730 [003] d..2 66730.739587: kvm_entry: vcpu 3 > vm-19730 [003] d..2 66730.739589: kvm_exit: reason > PENDING_INTERRUPT rip 0xffffffff8115c684 info 0 0 > vm-19729 [001] d..2 66730.739590: kvm_exit: reason MSR_WRITE > rip 0xffffffff810a06a8 info 0 0 > vm-19730 [003] d.h2 66730.739590: hrtimer_expire_entry: > hrtimer=ffff88011e00d3d0 function=kvm_timer_fn now=66882053778032 Nope, the ftrace time and hrtimer time aren't synchronized. > vm-19729 [001] ...1 66730.739591: kvm_apic: apic_write > APIC_TMICT = 0x3e vcpu 2 sets its timer. > vm-19730 [003] ...1 66730.739591: kvm_apic_accept_irq: apicid 3 > vec 239 (Fixed|edge) (coalesced) The original vcpu gets its interrupt delivered. > vm-19729 [001] d..2 66730.739591: hrtimer_start: > hrtimer=ffff88020d208490 function=kvm_timer_fn expires=66882053780380 > softexpires=66882053780380 vcpu 2 sets its hrtimer. > vm-19730 [003] ...1 66730.739592: kvm_inj_virq: irq 239 > vm-19729 [001] ...1 66730.739592: kvm_msr: msr_write 838 = 0x3e > vm-19730 [003] d..2 66730.739593: kvm_entry: vcpu 3 > vm-19729 [001] d..2 66730.739593: kvm_entry: vcpu 2 > vm-19730 [003] d..2 66730.739594: kvm_exit: reason MSR_WRITE > rip 0xffffffff810a06a8 info 0 0 > vm-19729 [001] d..2 66730.739594: kvm_exit: reason > EXTERNAL_INTERRUPT rip 0xffffffff83601f61 info 0 0 vcpu 2 hrtimer fires, but it doesn't know it yet. > vm-19730 [003] ...1 66730.739595: kvm_apic: apic_write APIC_EOI > = 0x0 > vm-19730 [003] ...1 66730.739595: kvm_msr: msr_write 80b = 0x0 > vm-19729 [001] d.h2 66730.739596: hrtimer_expire_entry: > hrtimer=ffff88020d208490 function=kvm_timer_fn now=66882053783462 vcpu 2 hrtimer fires. > vm-19730 [003] ...1 66730.739596: kvm_apic_accept_irq: apicid 3 > vec 239 (Fixed|edge) > vm-19729 [001] ...1 66730.739597: kvm_apic_accept_irq: apicid 2 > vec 239 (Fixed|edge) (coalesced) and delivers a guest interrupt, but guest interrupts are still blocked. > vm-19730 [003] d..2 66730.739597: kvm_entry: vcpu 3 > vm-19729 [001] d..2 66730.739597: kvm_entry: vcpu 2 > vm-19729 [001] d..2 66730.739598: kvm_exit: reason > PENDING_INTERRUPT rip 0xffffffff8115c684 info 0 0 > vm-19729 [001] ...1 66730.739599: kvm_apic_accept_irq: apicid 2 > vec 239 (Fixed|edge) (coalesced) > vm-19729 [001] ...1 66730.739600: kvm_inj_virq: irq 239 > vm-19729 [001] d..2 66730.739600: kvm_entry: vcpu 2 Interrupts unblocked, interrupt delivered. So: either the hrtimer is set for way too low a deadline, or perhaps it expires prematurely. TDCR=3 means divide by 16, together with TMICT=0x3e and a bus frequency of 1GHz this means 992 ns expiration, which is consistent with what we see. Possible causes: - the APIC calibration in the guest failed, so it is programming too low values into the timer - it actually needs 1 us wakeups and then can't keep up (esp. as kvm interrupt injection is slowing it down) You can try to find out by changing arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of (say) 20 microseconds which will let the guest live long enough for you to ftrace it and see what kind of timers it is programming. > vm-19729 [001] d..2 66730.739601: kvm_exit: reason MSR_WRITE > rip 0xffffffff810a06a8 info 0 0 > vm-19729 [001] ...1 66730.739602: kvm_apic: apic_write APIC_EOI > = 0x0 > vm-19729 [001] ...1 66730.739602: kvm_msr: msr_write 80b = 0x0 > vm-19730 [003] d..2 66730.739603: kvm_exit: reason MSR_WRITE > rip 0xffffffff810a06a8 info 0 0 > vm-19729 [001] ...1 66730.739603: kvm_apic_accept_irq: apicid 2 > vec 239 (Fixed|edge) > vm-19730 [003] ...1 66730.739604: kvm_apic: apic_write > APIC_TMICT = 0x3e > vm-19729 [001] d..2 66730.739604: kvm_entry: vcpu 2 > vm-19730 [003] d..2 66730.739604: hrtimer_start: > hrtimer=ffff88011e00d3d0 function=kvm_timer_fn expires=66882053793430 > softexpires=66882053793430 > vm-19730 [003] ...1 66730.739605: kvm_msr: msr_write 838 = 0x3e > vm-19730 [003] d..2 66730.739606: kvm_entry: vcpu 3 > vm-19730 [003] d..2 66730.739607: kvm_exit: reason > PENDING_INTERRUPT rip 0xffffffff8115c684 info 0 0 > vm-19729 [001] d..2 66730.739608: kvm_exit: reason > EXTERNAL_INTERRUPT rip 0xffffffff8115f2a1 info 0 0 > vm-19730 [003] d.h2 66730.739608: hrtimer_expire_entry: > hrtimer=ffff88011e00d3d0 function=kvm_timer_fn now=66882053796355 > vm-19729 [001] ...1 66730.739609: kvm_userspace_exit: reason > restart (4) > vm-19730 [003] ...1 66730.739610: kvm_apic_accept_irq: apicid 3 > vec 239 (Fixed|edge) (coalesced) > vm-19730 [003] ...1 66730.739611: kvm_userspace_exit: reason > restart (4) > vm-19716 [000] ...1 66730.739612: kvm_set_irq: gsi 4 level 1 > source 0 > vm-19716 [000] ...2 66730.739614: kvm_pic_set_irq: chip 0 pin 4 > (edge) > -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/