On Mon, Feb 29, 2016 at 10:12:08PM +0000, Liang, Kan wrote:
> In SDM "18.4.4.4 Re-configuring PEBS Facilities" it mentioned that
> a quiescent period is needed between stopping the prior event counting and
> setting up a new PEBS event when software needs to reconfigure PEBS
> facilities.
> The quiescent period is to allow any latent residual PEBS records to complete
> its capture at their previously specified buffer address
> That requirement only can be found in Core Microarchitecture.

But that should apply to all (PEBS) event scheduling, not just the multi
thing.

Also very convenient that quiescent period is so well defined. How long
should we wait, a day?

> I think it may implies that there is some observed delay in writing PEBS
> buffer.

Doesn't it explicitly state just that?

> So if perf record precise hw event with very small period, the slow PEBS
> writing may lockup the CPU.

And I still don't see how this would explain a lockup in the MSR writes.

[ Jiri, can you disable that stupid panic on hard lockup and let it run
  for a while, see if all the lockup msgs hit the same IP? Also, can you
  look where exactly that IP lives in the code? ]

So I suspect it actually just did the PERF_GLOBAL_CTRL write, how else
would the hardware watchdog trigger on that same CPU. After that,
there's only BTS muck, which you're not using, so WTH is it actually
stuck on?

> If so, I think disabling the multiple pebs should be a good way.

As said, this should affect any and all PEBS event scheduling, not just
the multi stuff.