The primary person you need to talk to is turning in her dissertation within the next few days. So I think she's kinda busy at the moment... :-)

Sorry for the delay -- I'll take a shot at answers below...


On Aug 14, 2007, at 4:39 PM, smai...@ksu.edu wrote:

Can anyone help on this?

-Thanks,
Sarang.

Quoting smai...@ksu.edu:

Hi,
I am doing research on parallel techniques for shared-memory
systems (NUMA). I understand that Open MPI is intelligent enough to utilize
shared-memory systems and that it uses processor affinity.

Open MPI has coarse-grained processor-affinity control, see:

http://www.open-mpi.org/faq/?category=tuning#using-paffinity

Expect to see more functionality / flexibility here in the future...
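For example (this is a sketch -- the exact parameter name is the one described in that FAQ entry for the 1.2 series, so check ompi_info on your version), you can ask Open MPI to bind each process to a processor with the mpi_paffinity_alone MCA parameter:

    shell$ mpirun --mca mpi_paffinity_alone 1 -np 4 ./my_mpi_app

(./my_mpi_app is just a placeholder for your own executable, of course.)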

Is the Open MPI design of MPI_AllReduce the same for shared-memory (NUMA) systems as for distributed systems? Can someone please describe the MPI_AllReduce design, in brief, in terms of the processes and their interaction on shared memory?

Open MPI is fundamentally based on plugins. We have plugins for various flavors of collective algorithms (see the code base: ompi/mca/coll/), one of which is "sm" (shared memory). The shared memory collectives are currently quite limited but are being expanded and improved by Indiana University (e.g., IIRC, allreduce uses the shared memory reduce followed by a shared memory bcast).
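If you want to see which coll components your build actually has, ompi_info will list them (component names may differ slightly between versions, but a typical build should show at least basic, self, sm, and tuned):

    shell$ ompi_info | grep coll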

The "tuned" collective plugin has its own implementation(s) of Allreduce -- Jelena or George will have to comment here. They do not assume shared memory; they use well-known algorithms for allreduce. The "tuned" component basically implements a wide variety of algorithms for each MPI collective and attempts to choose which one will be best to use at run-time. U. Tennessee has done a lot of work in this area and I think they have several published papers on it.

The "basic" plugin is the dirt-simple correct-but-not-optimized component that does simple linear and logarithmic algorithms for all the MPI collectives. If we don't have a usable algorithm anywhere else, we fall back to the basic plugin (e.g., allreduce is a reduce followed by a bcast).

Otherwise, please suggest a good reference for this.

Our basic philosophy / infrastructure for MPI collectives is based on this paper:

    http://www.open-mpi.org/papers/ics-2004/

That said, work that happened literally last week is just about to hit the development trunk (within a week or so -- still doing some debugging) that brings Goodness here: a first level of mixing-and-matching between collective components that do not provide all the MPI algorithms. I can explain more if you care.

Hope this helps...

--
Jeff Squyres
Cisco Systems
