On Mon, 26 Oct 2015 22:05:08 -0700, Jeremy Chadwick wrote:
> (I am not subscribed to the mailing list, please keep me CC'd)
>
> Issue: a stable/10 system that has an abnormally high load average (e.g.
> 0.15, but may be higher depending on other variables which I can't
> account for) when the machine is definitely idle (i.e. cannot be traced
> to high interrupt usage per vmstat -i, cannot be traced to a userland
> process or kernel thread, etc.).
>
> This problem has been discussed many times on the FreeBSD mailing lists
> and the FreeBSD forum (including some folks seeing it on 9.x, but my
> complaint here is focused on 10.x so please focus there).
>
> I'd politely like to request that anyone experiencing this, or who has
> experienced it (and if you know when it stopped or why, including what
> you may have done, include that), to chime in on this ticket from 2012
> (made for 9.x but style of issue still applies; c#5 is quite valid):
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
>
> For those still experiencing it, I'd suggest reading c#8 and seeing if
> sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At
> this time I would not suggest leaving that set indefinitely, as it does
> seem to increase the interrupt rate in cpuX:timer in vmstat -i. But for
> me kern.eventtimer.periodic=1 "fixes" the issue)
Jeremy, I've waited till now to reply, despite really welcoming serious discussion of these LA issues. Rather than joining the PR thread just yet, I'm going to report a short summary of findings, having monitored this ever since it turned up (originally noticed on 9.2 about two years ago). I have an embarrassing amount of data to report when appropriate.

Like you, I think this is far from 'cosmetic', as is oft suggested, especially in some fairly ill-informed forum posts but also in various list posts. I've been watching load averages from OS/2, through FreeBSD 2.2, to the present, as well as on some Linux systems, and have never before seen anything like this, especially on virtually entirely idle systems. While the LAs reported during (say) make -j4 buildworld appear more sane, who knows? I'm not suggesting that there's any sort of performance hit from this (so far I don't think there is), but 'cosmetic' suggests that it doesn't matter ..

I've just brought my 9.3-R Lenovo X200 up to stable/9 to be sure nothing had changed in this respect; it hasn't. So despite your desire to focus on 10.x, and your issue being with systems using the LAPIC event timer, I think there are related, but different, manifestations of the issue(s).

Firstly, the '0.6' LA at idle often seen in the various reports you've so thoroughly referenced, including bug 173541, seems to occur most often (if not entirely) on laptops with Core 2 and Core 2 Duo processors that by default use the HPET as the event timer (and not in per-CPU mode, as eventtimers(4) would imply). I see long-term 15-minute LAs of ~0.55 to ~0.67, from completely idle to idle with X running, with 200% idle shown.
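As a rough illustration of what a steady ~0.6 LA on an idle box means arithmetically: the classic BSD load average is an exponentially decaying average of the sampled run-queue length, taken roughly every 5 seconds, with decay factors exp(-5/60), exp(-5/300) and exp(-5/900) for the 1-, 5- and 15-minute figures. The sketch below is only a floating-point model of that update (the kernel uses fixed-point arithmetic), and the names update_loadavg and simulate, plus the idea of a random "busy" sample, are hypothetical. It shows that a long-run 15-minute LA near 0.6 corresponds to the sampler counting one runnable thread in roughly 60% of its samples.

```python
import math
import random

# Assumed sampling scheme: one run-queue sample every 5 seconds, folded into
# 1-, 5- and 15-minute exponentially decaying averages (floating point here,
# not the kernel's fixed-point arithmetic).
SAMPLE_INTERVAL_S = 5.0
WINDOWS_S = (60.0, 300.0, 900.0)
DECAY = tuple(math.exp(-SAMPLE_INTERVAL_S / w) for w in WINDOWS_S)


def update_loadavg(loadavg, nrun):
    """Fold one run-queue sample (nrun) into the three moving averages."""
    return tuple(e * la + nrun * (1.0 - e) for e, la in zip(DECAY, loadavg))


def simulate(busy_fraction, samples, seed=0):
    """Hypothetical 'idle' machine: each sample sees one runnable thread with
    probability busy_fraction, otherwise none. Returns every (1, 5, 15) triple."""
    rng = random.Random(seed)
    loadavg = (0.0, 0.0, 0.0)
    history = []
    for _ in range(samples):
        nrun = 1 if rng.random() < busy_fraction else 0
        loadavg = update_loadavg(loadavg, nrun)
        history.append(loadavg)
    return history


if __name__ == "__main__":
    # About 14 hours of 5-second samples; average the 15-minute figure over
    # the second half, after the start-up transient has died away.
    hist = simulate(busy_fraction=0.6, samples=10_000)
    tail = [la[2] for la in hist[len(hist) // 2:]]
    print(f"mean 15-minute load: {sum(tail) / len(tail):.2f}")
```

Run as-is it should print a value close to 0.6. The samples here are drawn independently at random, whereas on a real system whether a thread is counted as runnable may well depend on when the sample lands relative to timer interrupts, which is exactly what's in question; so this models only the averaging, not the cause.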
On these Core 2 laptops, switching to LAPIC does indeed 'fix' the issue, giving genuine 0.00 0.00 0.00 LAs when completely idle, and say 0.10-0.15 short-term when poking at it a bit with X running. However, C3 is unavailable when using LAPIC, which pushes power consumption on battery from less than 8W to almost 12W (idle, screen off, lid down), i.e. roughly +50% consumption or -33% battery life, so it is not a viable solution for laptops actually used away from mains power. Correspondingly, system temperatures are consistently up to 20% higher, on AC or battery, when using LAPIC.

So your problem occurs while using LAPIC, and mine (and others') while using HPET. Despite some semi-obvious relatedness there are differences, and I don't want to tromp on or confuse the issue you're chasing, yet I think there are some underlying (possibly just mathematical?) issues here.

I've spent some days trying to follow the HPET and LAPIC code, but it's been too long since I thought I had a tiny grip on any of this, before mav@ turned it on its head, for which we're of course grateful :) For example:

  /sys/x86/x86/local_apic.c
  /sys/amd64/include/apicvar.h
  /sys/dev/acpica/acpi_hpet.[ch]

There are some other differences between HPET and LAPIC behaviour on my machines, best seen in systat -vm, especially regarding interrupt usage, in both one-shot and periodic modes for each timer. Showing them requires posting quite a lot of data .. so I guess I'm wondering: is that appropriate now?

I should add that I'm having some health issues that make it difficult for me to spend much time digging deeply into code or doing much testing, so that's made me a bit reluctant to dive into this .. but I can't resist!

cheers,
Ian

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"