On Wed, Jan 20, 2016 at 04:20:03PM +0000, Marc Zyngier wrote: > On 20/01/16 15:06, Andrew Jones wrote: > > On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote: > >> On 20/01/16 14:01, Andrew Jones wrote: > >>> On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote: > >>>> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > >>>>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>>>>> OK, CCing him. One thing I see is that without this change we're > >>>>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > >>>>>> it's not true. Having that set may disable the oneshot capabilityj > >>>>>> necessary to switch to nohz mode? I'll just stop there with my > >>>>>> speculation though, so Marc won't have to correct too much... > >>>>> > >>>>> You're spot on. See 82a5619 in the kernel tree. When I did a similar > >>>>> change in kvmtool, I saw a massive reduction in the number of timer > >>>>> interrupts injected (specially when the number of vcpu is relatively > >>>>> high). > >>>>> > >>>>> This also have interesting benefits when running on a model, where > >>>>> you're trying to squeeze the last bits of "performance" from the > >>>>> monster... > >>>>> > >>>> > >>>> Hmm, I'm probably testing this wrong, but I don't see any difference in > >>>> the number of injected timer interrupts. My guest, which I boot with > >>>> UEFI, has > >>>> > >>>> CONFIG_ARM_ARCH_TIMER=y > >>>> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y > >>>> CONFIG_ARM_TIMER_SP804=y > >>>> CONFIG_HIGH_RES_TIMERS=y > >>>> CONFIG_TICK_ONESHOT=y > >>>> CONFIG_NO_HZ_COMMON=y > >>>> # CONFIG_HZ_PERIODIC is not set > >>>> CONFIG_NO_HZ_IDLE=y > >>>> # CONFIG_NO_HZ_FULL is not set > >>>> CONFIG_NO_HZ=y > >>>> CONFIG_HZ_1000=y > >>>> CONFIG_HZ=1000 > >>>> > >>>> I've boot a guest using DT with and without this patch > >>>> > >>>> ---WITHOUT--- > >>>> > >>>> # ls /proc/device-tree/timer > >>>> compatible interrupts name > >>>> # cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>>> CPU6 CPU7 > >>>> 3: 6958 5766 5166 5187 5576 5129 > >>>> 4695 4398 GIC 27 Edge arch_timer > >>>> # sleep 120 && cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>>> CPU6 CPU7 > >>>> 3: 7557 5986 5487 5265 6232 5868 > >>>> 5464 4438 GIC 27 Edge arch_timer > >>>> > >>>> ---WITH--- > >>>> > >>>> # ls /proc/device-tree/timer > >>>> always-on compatible interrupts name > >>>> # cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>>> CPU6 CPU7 > >>>> 3: 7005 6080 4996 5391 5165 5257 > >>>> 4930 4844 GIC 27 Edge arch_timer > >>>> # sleep 120 && cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 > >>>> CPU6 CPU7 > >>>> 3: 7523 6505 5264 6717 5273 5391 > >>>> 5526 4901 GIC 27 Edge arch_timer > >>>> > >>>> > >>>> > >>>> And kvm trace data has > >>>> > >>>> ---WITHOUT--- > >>>> $ grep kvm_timer_update_irq trace.out | wc -l > >>>> 94336 > >>>> ---WITH--- > >>>> $ grep kvm_timer_update_irq trace.out | wc -l > >>>> 95838 > >>>> > >>>> > >>> > >>> Must be how I'm looking, because I just tried kvmtool with/without > >>> Marc's patch that adds always-on, but don't see any reduction of > >>> interrupts there either. I used a defconfig guest kernel. Also, > >>> not that I think it should matter, but my host kernel is 4.4-rc4 > >>> based. > >>> > >>> I'd like to be able to see a difference with/without this always-on > >>> patch, not because I don't think we should take it anyway, but because > >>> I need a test case for the ACPI counterpart. > >> > >> I just run a couple of quick tests, measuring interrupt rate (vmstat 1) > >> on the host, with one VM (2 vcpus) idling, and I'm seeing the following > >> thing: > >> > >> Without "always-on": ~380 interrupts per second > >> With "always-on": ~40 interrupts per second > >> > >> This is with kvmtool, 32bit host (but none of that is arch specific > >> anyway). > >> > > > > For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the > > following. > > > > Without "always-on": mean=56.370 sd=33.404 min=1 max=244 > > With "always-on": mean=51.580 sd=33.361 min=1 max=273 > > > > I'm also using kvmtool, and my guest is idle. > > > > So a difference between 32 and 64bit hosts? Again, my guest config is > > now just a defconfig. My host config is not, but I'm not sure what > > options to look for other than what I wrote above, which are the same > > for my host. > > Just tried on Seattle with a 64bit guest, and there is hardly any > difference indeed. Both host and guest are "mostly" defconfig as well. > So there is a kernel configuration difference. > > Running my 32bit guest on a 64bit host definitely shows a massive > difference (with 8 vcpus): > > Without "always-on": ~1200 interrupts per second > With "always-on": ~50 interrupts per second > > [Head scratching, poking Mark] > > Right, I now know what is going on: The arm64 kernel uses > tick_setup_hrtimer_broadcast() so that it can still use the arch timer > as a broadcast timer (forcing one CPU to remain on), while the 32bit > kernel relies on the presence of a backup timer (sp804 anyone?) or the > guarantee that the timer cannot go away (always-on). > > This is probably why I'm seeing such a gain with a 32bit guest, and none > with a 64bit guest (the kernel already does the right thing). As to why > there is such a difference between the two architectures, this is a > story for another day... >
Thanks Marc! I just confirmed with an AArch32 guest using QEMU, and the patch we've hijacked for this thread. Without the patch I get a megaton of interrupts (~14000/s). With the patch, after letting the guest chill for a while, I'm getting ~150/s (8 vcpus). I guess I don't have a test case for the ACPI code though. afaik we only have UEFI for AArch64 guests, and we don't have ACPI boot without UEFI. Or maybe I can hack time_init to remove the tick_setup_hrtimer_broadcast call? drew