On 8/8/2018 9:47 AM, Peter Zijlstra wrote:
> On Wed, Aug 08, 2018 at 03:55:54PM +0000, Luck, Tony wrote:
>>> So _why_ doesn't this work? As said by Tony, that first call should
>>> prime the caches, so the second and third calls should not generate any
>>> misses.
>>
>> How much code/data is involved? If there is a lot, then you may be unlucky
>> with cache coloring and the later parts of the "prime the caches" code path
>> may evict some lines loaded in the early parts.
>
> Well, Reinette used perf_event_read_local() which is unfortunately quite
> a bit. But the inline I proposed is a single load and depending on
> rdpmcl() or native_read_pmc() a call to or just a single inline asm
> rdpmc instruction.
>
> That should certainly work I think.

I am in the process of testing this variation.

Reinette