Hello,
I am getting some strange results when I enable the MCA parameter mpi_yield_when_idle.

What happens is that for MPI programs that synchronize heavily (lots of MPI_Barrier and MPI_Wait calls), I get a very good speedup (2.x) by turning on the parameter (e.g. with the CG benchmark of the NAS Parallel Benchmarks suite). I am not oversubscribing nodes: I am running 8 processes on an SMP system with exactly 8 physical cores (each pair of cores shares a cache).

The only explanation I could come up with is a temperature issue: the clock speed of the entire chip may be scaled down if all the cores get too hot (because of the busy waiting). However, I tried to replicate the behavior with a trivial (non-MPI) code in which one core does some work while the others (belonging to the same chip) busy-wait, and I did not get the same speedup when I switched from the busy-wait to the idle implementation.
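
For reference, a minimal sketch of what such a non-MPI test might look like, assuming POSIX threads and C11 atomics. The names run_experiment, do_work, waiter and NUM_WAITERS are hypothetical and this is not the exact code I ran; it also assumes that the "idle" variant simply calls sched_yield() between polls, and it leaves out timing and thread pinning.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_WAITERS 7 /* assumption: 8 cores, one worker plus seven waiters */

static atomic_int done_flag; /* set to 1 by the worker when it has finished */

/* Waiter thread: polls the flag. If the int pointed to by arg is non-zero,
 * it gives up the core between polls instead of spinning. */
static void *waiter(void *arg)
{
    int yield_when_idle = *(const int *)arg;
    while (!atomic_load_explicit(&done_flag, memory_order_acquire)) {
        if (yield_when_idle)
            sched_yield();
    }
    return NULL;
}

/* Hypothetical workload: sums i * 0.5 for i in [0, iterations). */
static double do_work(long iterations)
{
    volatile double acc = 0.0; /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < iterations; i++)
        acc += (double)i * 0.5;
    return acc;
}

/* Starts the waiters, does the work on the calling thread, then releases
 * and joins the waiters. Returns the number of waiters, or -1 if some
 * thread could not be created. */
int run_experiment(int yield_when_idle, long iterations, double *result)
{
    pthread_t threads[NUM_WAITERS];
    int started = 0;

    assert(result != NULL);
    atomic_store(&done_flag, 0);

    for (int i = 0; i < NUM_WAITERS; i++) {
        if (pthread_create(&threads[i], NULL, waiter, &yield_when_idle) != 0)
            break;
        started++;
    }

    *result = do_work(iterations);

    /* Release the waiters and wait for all of them to exit. */
    atomic_store_explicit(&done_flag, 1, memory_order_release);
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == NUM_WAITERS ? started : -1;
}
```

Calling run_experiment(0, n, &r) and run_experiment(1, n, &r) around a timer (e.g. clock_gettime) gives the two cases to compare; the program has to be built with -pthread.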

Does anyone have any idea why this is happening?

regards, Simone
