On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > On 19/01/16 13:32, Andrew Jones wrote: > > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote: > >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > >>>> The virt board has an arch timer, which is always on. Emit the > >>>> "always-on" property to indicate to Linux that it can switch off the > >>>> periodic timer and reduces the amount of interrupts injected into a > >>>> guest. > >>>> > >>>> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org> > >>>> --- > >>>> hw/arm/virt.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c > >>>> index 05f9087..265fe9a 100644 > >>>> --- a/hw/arm/virt.c > >>>> +++ b/hw/arm/virt.c > >>>> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo > >>>> *vbi, int gictype) > >>>> qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > >>>> "arm,armv7-timer"); > >>>> } > >>>> + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > >>>> qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, > >>>> irqflags, > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, > >>>> irqflags, > >>>> -- > >>>> 2.1.2.330.g565301e.dirty > >>>> > >>>> > >>> > >>> Hi Christoffer, > >>> > >>> We should also patch the ACPI generation at the same time. I think > >>> something like > >>> > >>> - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > >>> + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; > >> > >> I'm really not familiar enough with ACPI to be comfortable writing code > >> for this or testing this. > >> > >> But if someone can pick this up and add the ACPI bits or can post a > >> follow-up patch, then I'm all for it :) > > > > I can post a follow-up patch. > > > >> > >>> > >>> should do it. > >>> > >>> Also, having the guest reduce the number of interrupts sounds good. Can > >>> you point me to something to read about how/why a guest may choose to do > >>> that, and what the trade-offs are? > >>> > >> Not really, but you can ask Marc. > > > > OK, CCing him. One thing I see is that without this change we're > > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > > it's not true. Having that set may disable the oneshot capabilityj > > necessary to switch to nohz mode? I'll just stop there with my > > speculation though, so Marc won't have to correct too much... > > You're spot on. See 82a5619 in the kernel tree. When I did a similar > change in kvmtool, I saw a massive reduction in the number of timer > interrupts injected (specially when the number of vcpu is relatively high). > > This also have interesting benefits when running on a model, where > you're trying to squeeze the last bits of "performance" from the monster... >
Hmm, I'm probably testing this wrong, but I don't see any difference in the number of injected timer interrupts. My guest, which I boot with UEFI, has CONFIG_ARM_ARCH_TIMER=y CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y CONFIG_ARM_TIMER_SP804=y CONFIG_HIGH_RES_TIMERS=y CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set CONFIG_NO_HZ=y CONFIG_HZ_1000=y CONFIG_HZ=1000 I've boot a guest using DT with and without this patch ---WITHOUT--- # ls /proc/device-tree/timer compatible interrupts name # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 6958 5766 5166 5187 5576 5129 4695 4398 GIC 27 Edge arch_timer # sleep 120 && cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7557 5986 5487 5265 6232 5868 5464 4438 GIC 27 Edge arch_timer ---WITH--- # ls /proc/device-tree/timer always-on compatible interrupts name # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7005 6080 4996 5391 5165 5257 4930 4844 GIC 27 Edge arch_timer # sleep 120 && cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7523 6505 5264 6717 5273 5391 5526 4901 GIC 27 Edge arch_timer And kvm trace data has ---WITHOUT--- $ grep kvm_timer_update_irq trace.out | wc -l 94336 ---WITH--- $ grep kvm_timer_update_irq trace.out | wc -l 95838 Any suggestions? Thanks, drew