On Jan 11, 2007, at 6:59 AM, Wolfgang Wieser wrote:

I'm currently in the process of selecting an MPI implementation to be
used on a compute server cluster at the University of Munich.
Since MPI_THREAD_MULTIPLE is a requirement, I went for OpenMPI.

Sorry for the delay in replying here -- all the OMPI developers are crunching to meet our internal deadlines for the upcoming OMPI v1.2 release.

Note that our MPI_THREAD_MULTIPLE support is haphazard at best. :-\ Multi-threaded support has been designed in from the very beginning, but it has not yet risen high enough in priority for us to fully test and debug it.

The setup is a rack of boxes running Linux and connected with
gigabit ethernet.

However, there is a severe problem:
Blocking functions like MPI_Probe() suck all CPU power.
But as everybody knows, select(2), poll(2) and recently also
epoll(2) were invented to give implementers a possibility to write
programs with multiple IO channels without the need for busy waiting.

So, I wonder if there is a way to have OpenMPI not make use of busy
waiting but rather apply some kernel-level event selection function
like the ones mentioned above.

The problem is that OMPI may have to poll several different types of networks, including shared memory, and not all of them can be waited on with a single blocking call. So we revert to a polling approach, which sucks up lots of CPU. We pretty much assume that the MPI process has free rein over the processor. For multi-threaded scenarios, blocking progress threads are the plan, but as I mentioned above, these are *very* loosely tested. I would not consider them stable.
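
To illustrate the point about polling several network types, here is a minimal C sketch of a progress loop. It is not OMPI's actual code: transport_t, progress_once(), busy_wait(), and the two example poll functions are hypothetical names made up for this example. The shared-memory-style transport is modeled as a plain value in memory, which has no file descriptor for select(2) or poll(2) to wait on, so the loop has to keep checking it.

```c
#include <stddef.h>

/* Hypothetical transport descriptor (not an OMPI type).
 * poll() returns the number of events it completed on this pass. */
typedef struct {
    const char *name;
    int (*poll)(void *state);
    void *state;
} transport_t;

/* Make one pass over every transport; return the total events handled. */
static int progress_once(transport_t *t, size_t n)
{
    int events = 0;
    for (size_t i = 0; i < n; i++)
        events += t[i].poll(t[i].state);
    return events;
}

/* Spin until some transport reports an event.
 * Returns the number of passes made; the CPU is busy the whole time. */
static unsigned long busy_wait(transport_t *t, size_t n)
{
    unsigned long passes = 1;
    while (progress_once(t, n) == 0)
        passes++;
    return passes;
}

/* Example poll function: a flag in memory that becomes ready after a
 * fixed number of checks, standing in for a shared-memory queue. */
static int countdown_poll(void *state)
{
    int *remaining = state;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}

/* Example poll function: a transport with nothing pending. */
static int never_ready(void *state)
{
    (void)state;
    return 0;
}
```

If every transport were a socket, the loop body could be replaced by one blocking select(2) call; the in-memory transport is what rules that out in this sketch.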

What you can do, however, is tell OMPI to poll in a less aggressive mode -- meaning that we effectively call sched_yield() in every iteration. You can do this by setting the "mpi_yield_when_idle" MCA parameter to 1. For example:

  shell$ mpirun --mca mpi_yield_when_idle 1 -np 4 a.out
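
As a rough picture of what the less aggressive mode means, the following C sketch shows a wait loop that calls sched_yield() on every idle iteration. This is only an illustration under assumed names (ready_fn, wait_until_ready(), ready_after() are hypothetical), not the code OMPI runs.

```c
#include <sched.h>
#include <stdbool.h>

/* Hypothetical readiness check supplied by the caller. */
typedef bool (*ready_fn)(void *ctx);

/* Poll is_ready() until it returns true.
 * When yield_when_idle is set, give up the CPU after each idle check so
 * that other runnable processes get scheduled. Returns the number of
 * idle iterations. */
static unsigned long wait_until_ready(ready_fn is_ready, void *ctx,
                                      bool yield_when_idle)
{
    unsigned long idle = 0;
    while (!is_ready(ctx)) {
        idle++;
        if (yield_when_idle)
            sched_yield();
    }
    return idle;
}

/* Example check: becomes ready once the counter reaches zero. */
static bool ready_after(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

Note that a loop like this is still polling: if nothing else on the machine wants the CPU, sched_yield() returns immediately and the process will still show up as using a full processor. The benefit is that other runnable processes are no longer starved.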

Additionally, the OMPI developers are currently discussing whether to allow blocking when TCP is the only network in use (e.g., you disable shared memory at run time). It's unclear yet whether this will be included in v1.2, but if it is, it will take effect when you disable shared memory. For example:

  shell$ mpirun --mca btl ^sm -np 4 a.out

See the FAQ for more information about how to set MCA parameters, etc.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
