Hi,
On 27/03/2017 17:51, Jeff Squyres (jsquyres) wrote:
> 1. Recall that sched_yield() has effectively become a no-op in newer Linux kernels.
> Hence, Open MPI's "yield when idle" may not do much to actually de-schedule a
> currently-running process.
Yes, I'm aware of this. However, that behavior should affect both Open MPI
versions (1.10.1 and 1.10.2) in the same way.
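Just to make sure we are talking about the same thing, below is a minimal sketch
of the kind of "yield when idle" spin-wait loop I have in mind (my own toy
example, not Open MPI's actual progress engine; compile with -pthread):

    #include <pthread.h>
    #include <sched.h>       /* sched_yield() */
    #include <stdatomic.h>
    #include <unistd.h>

    static atomic_int request_done;        /* set when the "message" arrives */

    /* Spin until the request completes, optionally yielding on every pass. */
    static void wait_for_completion(int yield_when_idle)
    {
        while (!atomic_load(&request_done)) {
            /* ... a real progress engine would poll the network here ... */
            if (yield_when_idle)
                sched_yield();   /* with CFS this often returns immediately,
                                    so the core is not necessarily released */
        }
    }

    static void *completer(void *arg)
    {
        (void)arg;
        sleep(1);                          /* simulate a late sender */
        atomic_store(&request_done, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, completer, NULL);
        wait_for_completion(1);            /* burns a core even while yielding */
        pthread_join(t, NULL);
        return 0;
    }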
> 2. As for why there is a difference between version 1.10.1 and 1.10.2 in oversubscription
> behavior, we likely do not know offhand (as all of these emails have shown!). Honestly,
> we don't really pay much attention to oversubscription performance -- our focus tends to
> be on under/exactly-subscribed performance, because that's the normal operating mode for
> MPI applications. With oversubscribed, we have typically just said "all bets are
> off" and leave it at that.
I agree that oversubscription is not the typical usage scenario, and I
can understand that optimizing its performance is not a priority. But
maybe the problem I'm facing is just a symptom that something is not
working properly, and that could also impact undersubscribed scenarios
(to a lesser extent, of course).
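One thing I can try, to keep the two versions on an equal footing, is to pass
the oversubscription directive explicitly instead of relying on the defaults,
e.g. (the binary name and process count are placeholders, and the exact option
spelling should be checked against each build's mpirun man page):

    taskset -c 0-27 mpirun -np <N> --oversubscribe ./my_app
    # or, using the mapping modifier instead of the standalone flag:
    taskset -c 0-27 mpirun -np <N> --map-by socket:OVERSUBSCRIBE ./my_app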
> 3. I don't recall if there was a default affinity policy change between 1.10.1
> and 1.10.2. Do you know that your taskset command is -- for absolutely sure --
> overriding what Open MPI is doing? Or is what Open MPI is doing in terms of
> affinity/binding getting merged with what your taskset call is doing
> somehow...? (seems unlikely, but I figured I'd ask anyway)
Regarding the changes between 1.10.1 and 1.10.2, I only found one that
seems related to oversubscription (i.e. "Correctly handle
oversubscription when not given directives to permit it"). I don't know
whether this could be having an impact here.
Regarding the interaction of Open MPI's affinity options with taskset, I'd say
it is a combination: with taskset I'm just constraining the affinity placement
decided by Open MPI to the set of processors 0 to 27. In any case, the affinity
configuration reported by Open MPI is the same for v1.10.1 and v1.10.2, namely:
    Mapper requested: NULL
    Last mapper: round_robin
    Mapping policy: BYSOCKET
    Ranking policy: SLOT
    Binding policy: NONE:IF-SUPPORTED
    Cpu set: NULL
    PPR: NULL
    Cpus-per-rank: 1
    Num new daemons: 0
    New daemon starting vpid INVALID
    Num nodes: 1
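In case it helps, this is how I can double-check what each rank actually ends
up allowed to run on under both versions (the binary name and process count are
placeholders):

    # Open MPI's own view of the mapping/binding of every rank:
    taskset -c 0-27 mpirun -np <N> --report-bindings ./my_app

    # Effective affinity mask enforced by the kernel for a running rank:
    grep Cpus_allowed_list /proc/<pid_of_rank>/status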
> Per text later in your mail, "taskset -c 0-27" corresponds to the first
> hardware thread on each core.
> Hence, this is effectively binding each process to the set of all "first hardware
> threads" across all cores.
Yes, that was the intention: to avoid running two MPI processes on the
same physical core.
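To confirm that logical CPUs 0-27 really are the first hardware thread of each
core (and not, say, both threads of the first 14 cores), the topology can be
checked with hwloc or sysfs, for example:

    lstopo --no-io          # hwloc's view of sockets, cores and PUs
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
                            # shows which logical CPU shares cpu0's core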
> I'm guessing that this difference is going to end up being the symptom of a
> highly complex system, of which spin-waiting is playing a part. I.e., if Open
> MPI weren't spin waiting, this might not be happening.
I'm not sure how much spin-waiting matters here, taking into account that
Open MPI is running in degraded mode (i.e., yielding when idle).
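If it helps to rule this part out, I can force and inspect the yield-when-idle
setting explicitly, so that both installations definitely run with the same
value (the exact ompi_info syntax and parameter level may vary a bit between
releases, and the binary name is again a placeholder):

    # Force yield-when-idle (degraded mode) for both versions:
    taskset -c 0-27 mpirun -np <N> --mca mpi_yield_when_idle 1 ./my_app

    # Check the parameter's default/current value:
    ompi_info --param mpi all --level 9 | grep yield_when_idle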
Thanks