Re: [OMPI users] Performance degradation of OpenMPI 1.10.2 when oversubscribed?

2017-03-28 Thread Jordi Guitart

Hi,

On 27/03/2017 17:51, Jeff Squyres (jsquyres) wrote:

> 1. Recall that sched_yield() has effectively become a no-op in newer Linux
> kernels.  Hence, Open MPI's "yield when idle" may not do much to actually
> de-schedule a currently-running process.

Yes, I'm aware of this. However, this should impact both OpenMPI versions in
the same way.
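
To make the point concrete, here is a minimal C sketch of the "yield when idle" pattern, i.e. a spin-wait that yields the CPU on every iteration. This is only an illustration under my own assumptions (the flag name is made up), not OpenMPI's actual progress loop:

#define _GNU_SOURCE
#include <sched.h>
#include <stdatomic.h>

extern atomic_int message_ready;   /* hypothetical flag set when data arrives */

void wait_for_message(void)
{
    /* Spin-wait that yields on every iteration. */
    while (!atomic_load(&message_ready)) {
        /* On current CFS kernels sched_yield() only moves the caller to the
         * end of its runqueue at the same priority, so it may be rescheduled
         * immediately and keep burning the CPU despite the yield. */
        sched_yield();
    }
}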

> 2. As for why there is a difference between version 1.10.1 and 1.10.2 in
> oversubscription behavior, we likely do not know offhand (as all of these
> emails have shown!).  Honestly, we don't really pay much attention to
> oversubscription performance -- our focus tends to be on under/exactly-subscribed
> performance, because that's the normal operating mode for MPI applications.
> With oversubscription, we have typically just said "all bets are off" and
> left it at that.

I agree that oversubscription is not the typical usage scenario, and I can
understand that optimizing its performance is not a priority. But maybe the
problem that I'm facing is just a symptom that something is not working
properly, and this could also impact undersubscription scenarios (of course,
to a lesser extent).


> 3. I don't recall if there was a default affinity policy change between
> 1.10.1 and 1.10.2.  Do you know that your taskset command is -- for
> absolutely sure -- overriding what Open MPI is doing?  Or is what Open MPI
> is doing in terms of affinity/binding getting merged with what your taskset
> call is doing somehow...?  (seems unlikely, but I figured I'd ask anyway)

Regarding the changes between 1.10.1 and 1.10.2, I only found one that seems
related to oversubscription (i.e. "Correctly handle oversubscription when not
given directives to permit it"). I don't know whether this could be having an
impact somehow...

Regarding the interaction of the OpenMPI affinity options with taskset, I'd
say that it is a combination: with taskset I'm just constraining the affinity
placement decided by OpenMPI to the set of processors from 0 to 27. In any
case, the affinity configuration is the same for v1.10.1 and v1.10.2, namely:


 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYSOCKET  Ranking policy: SLOT
 Binding policy: NONE:IF-SUPPORTED  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 1
 Num new daemons: 0  New daemon starting vpid INVALID
 Num nodes: 1

> Per text later in your mail, "taskset -c 0-27" corresponds to the first
> hardware thread on each core.
>
> Hence, this is effectively binding each process to the set of all "first
> hardware threads" across all cores.

Yes, that was the intention: to avoid running two MPI processes on the same
physical core.
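
One way to check how the taskset mask and the OpenMPI binding end up combining is to have each rank print its effective CPU mask. A minimal sketch, assuming Linux and a plain C MPI program (this is not part of my original runs):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Print the CPUs this rank is actually allowed to run on, i.e. the
     * taskset mask combined with whatever binding OpenMPI applied. */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d allowed CPUs:", rank);
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (CPU_ISSET(cpu, &mask))
                printf(" %d", cpu);
        }
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}

Launching this with the same "taskset -c 0-27" prefix under 1.10.1 and 1.10.2 should show whether the effective masks differ between the two versions.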

> I'm guessing that this difference is going to end up being the symptom of a
> highly complex system, of which spin-waiting is playing a part.  I.e., if
> Open MPI weren't spin waiting, this might not be happening.

I'm not sure about the impact of spin-waiting, given that OpenMPI is running
in degraded mode (i.e. it should already be yielding when idle).


Thanks



Re: [OMPI users] Openmpi 1.10.4 crashes with 1024 processes

2017-03-28 Thread Götz Waschk
Hi everyone,

So how do I proceed with this problem? Do you need more information?
Should I open a bug report on GitHub?

Regards, Götz Waschk