On Mar 27, 2017, at 11:00 AM, r...@open-mpi.org wrote:
>
> I’m confused - mpi_yield_when_idle=1 is precisely the “oversubscribed”
> setting. So why would you expect different results?
A few points to add to Ralph's question:

1. Recall that sched_yield() has effectively become a no-op in newer Linux
kernels.  Hence, Open MPI's "yield when idle" may not do much to actually
de-schedule a currently-running process.  (See the sketch in the P.S. below
for what such a yield-when-idle loop boils down to.)

2. As for why there is a difference in oversubscription behavior between
1.10.1 and 1.10.2, we likely do not know offhand (as all of these emails
have shown!).  Honestly, we don't pay much attention to oversubscription
performance -- our focus tends to be on under- and exactly-subscribed
performance, because that's the normal operating mode for MPI applications.
With oversubscription, we have typically just said "all bets are off" and
left it at that.

3. I don't recall whether there was a default affinity policy change between
1.10.1 and 1.10.2.  Do you know -- for absolutely sure -- that your taskset
command is overriding what Open MPI is doing?  Or is what Open MPI is doing
in terms of affinity/binding somehow getting merged with what your taskset
call is doing?  (Seems unlikely, but I figured I'd ask anyway.)  See the
P.P.S. below for one way to check.

One more question -- see below:

>> Thanks for your feedback. As described here
>> (https://www.open-mpi.org/faq/?category=running#oversubscribing), OpenMPI
>> detects that I'm oversubscribing and runs in degraded mode (yielding the
>> processor). Anyway, I repeated the experiments setting explicitly the
>> yielding flag, and I obtained the same weird results:
>>
>> $HOME/openmpi-bin-1.10.1/bin/mpirun --mca mpi_yield_when_idle 1 -np 36
>> taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36 -> Time in seconds = 82.79
>> $HOME/openmpi-bin-1.10.2/bin/mpirun --mca mpi_yield_when_idle 1 -np 36
>> taskset -c 0-27 $HOME/NPB/NPB3.3-MPI/bin/bt.C.36 -> Time in seconds = 110.93

Per text later in your mail, "taskset -c 0-27" corresponds to the first
hardware thread on each core.  Hence, this is effectively binding each
process to the set of all "first hardware threads" across all cores.

>> Given these results, it seems that spin-waiting is not causing the issue.

I'm guessing that this difference will end up being a symptom of a highly
complex system, of which spin-waiting is only one part.  I.e., if Open MPI
weren't spin-waiting, this might not be happening.

-- 
Jeff Squyres
jsquy...@cisco.com
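P.S. On point 1: roughly speaking, "yield when idle" amounts to the kind of
loop sketched below.  This is just an illustrative sketch in plain C -- the
try_progress() stand-in is made up for the example and is not Open MPI's
actual progress engine:

    #include <sched.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for the library's progress engine: in a real
     * MPI implementation this would poll the network / shared memory once.
     * It is only a placeholder so the sketch compiles. */
    static volatile bool message_arrived = false;

    static bool try_progress(void)
    {
        return message_arrived;
    }

    static void wait_for_completion(bool yield_when_idle)
    {
        while (!try_progress()) {
            if (yield_when_idle) {
                /* On newer Linux kernels, sched_yield() only moves the
                 * caller to the back of the run queue for its priority
                 * level; if nothing else is runnable there, it returns
                 * immediately and the process keeps spinning on the core. */
                sched_yield();
            }
            /* else: pure spin-wait, the usual mode when not oversubscribed */
        }
    }

The point being: with or without the flag, the process stays runnable and
keeps polling; the flag only changes how politely it spins.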
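P.P.S. On point 3: one way to see what binding each process actually ended
up with (i.e., whether your taskset mask survived the launch) is to have
every rank print its effective CPU affinity mask.  A minimal sketch,
assuming Linux/glibc (compile with mpicc):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        cpu_set_t mask;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Query the calling process's effective CPU affinity mask. */
        CPU_ZERO(&mask);
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
            printf("rank %d is bound to CPUs:", rank);
            for (int c = 0; c < CPU_SETSIZE; ++c) {
                if (CPU_ISSET(c, &mask)) {
                    printf(" %d", c);
                }
            }
            printf("\n");
        }

        MPI_Finalize();
        return 0;
    }

Comparing that output under 1.10.1 vs. 1.10.2 (with and without the taskset
prefix), and against what "mpirun --report-bindings" says Open MPI did,
should tell you whether the two binding mechanisms are fighting each other.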