On Fri, Dec 26, 2014 at 12:57 PM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Fri, Dec 26, 2014 at 10:12 AM, Dave Jones <da...@codemonkey.org.uk> wrote:
>> On Fri, Dec 26, 2014 at 11:34:10AM -0500, Dave Jones wrote:
>>
>> > One thing I think I'll try is to try and narrow down which
>> > syscalls are triggering those "Clocksource hpet had cycles off"
>> > messages. I'm still unclear on exactly what is doing
>> > the stomping on the hpet.
>>
>> First I ran trinity with "-g vm" which limits it to use just
>> a subset of syscalls, specifically VM related ones.
>> That triggered the messages. Further experiments revealed:
>
> So I can trigger the false positives with my original patch quite
> easily by just putting my box under some load. My numbers are nowhere
> near as bad as yours, but then, I didn't put it under as much load
> anyway. Just a regular "make -j64" of the kernel.
>
> I suspect your false positives are bigger partly because of the load,
> but mostly because you presumably have preemption enabled too. I don't
> do preemption in my normal kernels, and that limits the damage of the
> race a bit.
>
> I have a newer version of the patch that gets rid of the false
> positives with some ordering rules instead, and just for you I hacked
> it up to say where the problem happens too, but it's likely too late.
>
> The fact that the original racy patch seems to make a difference for
> you does say that yes, we seem to be zeroing in on the right area
> here, but I'm not seeing what's wrong. I was hoping for big jumps from
> your HPET, since your "TSC unstable" messages do kind of imply that
> such really big jumps can happen.
>
> I'm attaching my updated hacky patch, although I assume it's much too
> late for that machine. Don't look too closely at the backtrace
> generation part, that's just a quick hack, and only works with frame
> pointers enabled anyway.
>
> So I'm still a bit unhappy about not figuring out *what* is wrong. And
> I'd still like the dmidecode from that machine, just for posterity. In
> case we can figure out some pattern.
>
> So right now I can imagine several reasons:
>
>  - actual hardware bug.
>
>    This is *really* unlikely, though. It should hit everybody. The
>    HPET is in the core intel chipset, we're not talking random unusual
>    hardware by fly-by-night vendors here.
>
>  - some SMM/BIOS "power management" feature.
>
>    We've seen this before, where the SMM saves/restores the TSC on
>    entry/exit in order to hide itself from the system. I could imagine
>    similar code for the HPET counter. SMM writers use some bad drugs to
>    dull their pain.
>
>    And with the HPET counter, since it's not even per-CPU, the "save
>    and restore HPET" will actually show up as "HPET went backwards" to
>    the other non-SMM CPU's if it happens
>
>  - a bug in our own clocksource handling.
>
>    I'm not seeing it. But maybe my patch hides it for some magical reason.
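To make the SMM scenario above concrete: clocksource deltas are computed
modulo the counter mask, so an HPET that gets saved and restored (i.e.
briefly steps backwards) shows up to the other CPUs not as a negative
delta but as one enormous forward jump. A check for that is roughly of
the following shape -- an illustrative, untested sketch against the 3.x
clocksource fields, not the patch Linus attached:

#include <linux/clocksource.h>
#include <linux/bug.h>

/*
 * Illustration only: flag a cycle delta that is implausibly large for
 * the time that can actually have passed between two reads.  A counter
 * that stepped backwards wraps around the mask and lands in this range.
 */
static void check_suspicious_delta(struct clocksource *cs, u64 last, u64 now)
{
	u64 delta = (now - last) & cs->mask;

	/*
	 * More than half the counter range since the previous read almost
	 * certainly means the counter jumped (or went backwards), not
	 * that we really idled that long.
	 */
	if (delta > (cs->mask >> 1))
		WARN_ONCE(1, "%s: suspicious cycle delta %llu (last %llu, now %llu)\n",
			  cs->name,
			  (unsigned long long)delta,
			  (unsigned long long)last,
			  (unsigned long long)now);
}

Real code would also have to be careful about how and where it samples
"last" relative to the timekeeper update, which is presumably where the
false positives and the ordering rules mentioned above come in.
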
So I sent out a first-step validation check to warn us if we end up with
idle periods that are larger than we expect. It doesn't yet cap the
timekeeping_get_ns() output (like your patch effectively does), but it
would be easy to do that in a following patch.

I did notice while testing this that the max_idle_ns (the max idle time
we report to the scheduler) for the hpet is only ~16 seconds, and we'll
overflow after just ~21 seconds. This second number maps closely to the
22-second stalls seen in the NMI watchdog reports, which seems
interesting, but I also realize that qemu uses a 100MHz hpet, whereas
real hardware is likely to be a bit slower, so maybe that's just chance.

I'd be interested if folks seeing anything similar to Dave would give my
patch a shot.

thanks
-john
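P.S. For reference, the "cap the timekeeping_get_ns() output" step
mentioned above would conceptually look something like the sketch below.
This is not the posted patch (the real timekeeping core keeps its own
cached copies of the clocksource fields and has locking and NTP
adjustment to deal with); it just illustrates clamping a single delta to
the range the mult/shift conversion can safely handle:

#include <linux/clocksource.h>
#include <linux/bug.h>

/*
 * Sketch: never convert more cycles in one step than the clocksource
 * said it could handle.  "max_cycles" would be precomputed from the
 * same math that produces max_idle_ns.
 */
static inline s64 capped_get_ns(struct clocksource *cs, u64 cycle_last,
				u64 max_cycles)
{
	u64 delta = (cs->read(cs) - cycle_last) & cs->mask;

	if (unlikely(delta > max_cycles)) {
		WARN_ONCE(1, "%s: cycle delta %llu exceeds max %llu, capping\n",
			  cs->name,
			  (unsigned long long)delta,
			  (unsigned long long)max_cycles);
		delta = max_cycles;
	}

	/* Standard cycles -> nanoseconds conversion. */
	return clocksource_cyc2ns(delta, cs->mult, cs->shift);
}

The warning is the part that matters for tracking this down; the clamp
just keeps time from jumping wildly if it ever fires.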