On Fri, Jan 18, 2019 at 12:24:20PM -0500, Vince Weaver wrote:
> On Fri, 18 Jan 2019, Peter Zijlstra wrote:
> > 
> > You can actually use rdpmc when you attach to a CPU, but you have to
> > ensure that the userspace component is guaranteed to run on that very
> > CPU (sched_setaffinity(2) comes to mind).
> 
> unfortunately the HPC people using PAPI would probably be annoyed if we 
> started binding their threads to cores out from under them.

Quite.. :-)
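
FWIW, the pinning itself is trivial; completely untested sketch below
(pid 0 means the calling thread). The problem is purely that a library
can't do this behind its caller's back:

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to the CPU the event was opened on, so
 * that a subsequent rdpmc actually reads that CPU's counter. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}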

> > The best we could possibly do is put the (target, not current) cpu
> > number in the mmap page; but userspace should already know this, for
> > it created the event in the first place.
> 
> one other thing the kernel could do is just disable rdpmc (setting index 
> to 0) in the case where the original perf_event_open() cpu parameter != -1
> 
> though that would stop the case where we were on the same CPU from 
> working.

Indeed.

> The issue is that currently, if you're not careful, the rdpmc() 
> interface will sometimes return plausible (but wrong) results for a 
> cross-CPU rdpmc() call, even if you properly fall back to read() when 
> ->index is 0.  It's a bit surprising, and it looks like it will take a 
> decent amount of userspace code to work around the issue, which cuts 
> into the low-overhead nature of rdpmc.
> 
> If the answer is simply that this is the way the kernel is going to do 
> it, that's fine; I just have to add workarounds to PAPI and then get the 
> perf_event_open() manpage updated to make sure people are aware of the 
> issue.

I'm not sure there really is anything the kernel can do to help here...
One thing you could look at is using rseq together with a CPU number
stored in the userspace descriptor: if rseq reports a matching CPU
number, use rdpmc; otherwise, or on an rseq abort, use read().
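
Something like the completely untested sketch below; it assumes the
caller stashed the cpu argument it passed to perf_event_open() in its
own event descriptor, and it uses sched_getcpu() as a stand-in for
reading rseq's cpu_id field, so there is still a small migration window
between the CPU check and the rdpmc itself that a real rseq critical
section would close by aborting to the read() path:

#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sched.h>
#include <stdint.h>
#include <unistd.h>

static uint64_t rdpmc(uint32_t counter)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (counter));
	return (uint64_t)hi << 32 | lo;
}

/*
 * target_cpu is the cpu argument the caller passed to
 * perf_event_open(); fd/pc are the event fd and its mmap()ed first
 * page.  Assumes a single event with no read_format extras, so
 * read() returns exactly 8 bytes.
 */
static int read_count(int fd, struct perf_event_mmap_page *pc,
		      int target_cpu, uint64_t *val)
{
	uint32_t seq, idx;
	uint64_t offset = 0;
	int64_t pmc = 0;
	int ok = 0;

	do {
		seq = pc->lock;
		asm volatile("" ::: "memory");	/* barrier() */

		idx = pc->index;	/* 0 means rdpmc is disabled */
		offset = pc->offset;
		ok = pc->cap_user_rdpmc && idx &&
		     sched_getcpu() == target_cpu;
		if (ok) {
			pmc = rdpmc(idx - 1);
			pmc <<= 64 - pc->pmc_width;	/* sign-extend to */
			pmc >>= 64 - pc->pmc_width;	/* pmc_width bits */
		}

		asm volatile("" ::: "memory");
	} while (pc->lock != seq);

	if (ok) {
		*val = offset + pmc;
		return 0;
	}

	/* wrong CPU, or rdpmc disabled: fall back to the syscall */
	return read(fd, val, sizeof(*val)) == sizeof(*val) ? 0 : -1;
}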
