Re: [OMPI users] How to yield CPU more when not computing (was curious behavior during wait for broadcast: 100% cpu)

Dave Love Wed, 09 Nov 2016 08:40:31 -0800

Jeff Hammond <jeff.scie...@gmail.com> writes:

>> I see sleeping for ‘0s’ typically taking ≳50μs on Linux (measured on
>> RHEL 6 or 7, without specific tuning, on recent Intel).  It doesn't look
>> like something you want in paths that should be low latency, but maybe
>> there's something you can do to improve that?  (sched_yield takes <1μs.)
>
> I demonstrated a bunch of different implementations with the instruction to
> "pick one of these...", where establishing the relationship between
> implementation and performance was left as an exercise for the reader :-)


My point was that only one of them seemed to be available on RHEL6 to
this exercised reader.  I have no complaint about the list of
possibilities, which was useful.
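
For anyone wanting to repeat the comparison quoted above (a zero-length
sleep against sched_yield), here is a minimal sketch of the sort of
timing loop I mean.  The helper names (now_ns, mean_sleep0_ns,
mean_yield_ns) are made up for illustration, and the choice of
nanosleep with a zero timespec as the "0s" sleep is an assumption; it is
not necessarily the call used for the figures above.  With an old glibc
such as RHEL6's, clock_gettime needs -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/* Current monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Mean wall-clock cost of one zero-length nanosleep(), in ns
   (hypothetical helper, not part of any library). */
static double mean_sleep0_ns(int iters)
{
    const struct timespec zero = { 0, 0 };
    double start;
    int i;

    start = now_ns();
    for (i = 0; i < iters; i++)
        nanosleep(&zero, NULL);
    return (now_ns() - start) / iters;
}

/* Mean wall-clock cost of one sched_yield(), in ns
   (hypothetical helper, not part of any library). */
static double mean_yield_ns(int iters)
{
    double start;
    int i;

    start = now_ns();
    for (i = 0; i < iters; i++)
        sched_yield();
    return (now_ns() - start) / iters;
}
```

Calling each helper with, say, 10000 iterations from a small main and
printing the two means gives the per-call figures to compare.  The
numbers will depend on the kernel, the timer slack setting and what
else is runnable on the core, so treat them as indicative only.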

> Note that MPI implementations may be interested in taking advantage of
> https://software.intel.com/en-us/blogs/2016/10/06/intel-xeon-phi-product-family-x200-knl-user-mode-ring-3-monitor-and-mwait.

Is that really useful if it's KNL-specific and MSR-based, with a setup
that implementations couldn't assume?

>> Is cpu_relax available to userland?  (GCC has an x86-specific intrinsic
>> __builtin_ia32_pause in fairly recent versions, but it's not in RHEL6's
>> gcc-4.4.)
>
> The pause instruction is available in ring3.  Just use that if cpu_relax
> wrapper is not implemented.

[OK; I meant in a userland library.]
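
Lacking such a library, a fallback along the following lines would
presumably do for compilers without __builtin_ia32_pause.  It is only a
sketch: cpu_relax here is a local, hypothetical wrapper named after the
kernel macro, and spin_until_set is an illustrative helper, not
anything from Open MPI.  "rep; nop" is the encoding of the PAUSE
instruction, so it needs no intrinsic support from the compiler.

```c
/* Hypothetical userland stand-in for the kernel's cpu_relax(). */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "rep; nop" assembles to PAUSE; the "memory" clobber also acts
       as a compiler barrier so the polled variable is re-read. */
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    /* Other architectures: compiler barrier only (an assumption;
       a real port would use the platform's own spin hint). */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Poll *flag until it becomes non-zero or max_spins polls have been
   made.  Returns the number of cpu_relax() calls made before the flag
   was seen set, or -1 if the limit was reached.  A volatile int is
   used for brevity; production code sharing the flag between threads
   should use proper atomics. */
static long spin_until_set(volatile int *flag, long max_spins)
{
    long spins;

    for (spins = 0; spins < max_spins; spins++) {
        if (*flag)
            return spins;
        cpu_relax();
    }
    return -1;
}
```

The bounded loop is there so that a caller can fall back to
sched_yield or a sleep once spinning has gone on too long.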

Are there published measurements of the typical effects of spinning, and
of the ways of ameliorating it, on some sort of "representative" system?
