On 7/22/2010 4:11 PM, Gus Correa wrote:
Hi Cristobal

Cristobal Navarro wrote:
Yes,
I was aware of the big difference, hehe.

Now that OpenMP and Open MPI are being discussed, I've always wondered whether it's a
good idea to model a solution in the following way, using both OpenMP
and Open MPI.
Suppose you have n nodes, each with a quad-core CPU (so you have n*4 processors).
Launch n processes, one per node, according to the n nodes available.
Set a resource manager like SGE to fill the n*4 slots using round robin.
In each process, make use of the other cores available on the node
with OpenMP.

If this is possible, then each process could make use of the shared
memory model locally at each node, avoiding unnecessary I/O through the
network. What do you think?

Before asking what we think about this, please check the many references posted on this subject over the last decade, then refine your question to the parts you are actually interested in hearing about; evidently much of this topic is of no interest to you.

Yes, it is possible, and many of the atmosphere/oceans/climate codes
that we run are written with this capability. In other areas of
science and engineering this is probably the case too.

However, this is not necessarily better/faster/simpler than dedicating all the cores to MPI processes.

In my view, this is due to:

1) OpenMP has a different scope than MPI,
and to some extent is limited by more stringent requirements;

2) Most modern MPI implementations (and Open MPI is an example) use shared memory mechanisms to communicate between processes that reside
on a single physical node/computer.
The shared memory communication of several MPI implementations does greatly improve the efficiency of message passing among ranks assigned to the same node. However, those ranks also communicate with ranks on other nodes, so there is a large potential advantage for hybrid MPI/OpenMP as the number of cores in use increases. If you aren't interested in running on more than 8 nodes or so, perhaps you won't care about this.

3) Writing hybrid code with MPI and OpenMP requires more effort,
and much care so as not to let the two forms of parallelism step on
each other's toes.
The MPI standard specifies MPI_Init_thread to indicate which combination of MPI and threading you intend to use, and to inquire whether that model is supported by the active MPI library. In the case where there is only 1 MPI process per node (possibly using several cores via OpenMP threading), there is no requirement for special affinity support. If there is more than 1 MPI_THREAD_FUNNELED rank per multi-CPU node, it becomes important to maintain cache locality for each rank.
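
For illustration, here is a minimal hybrid sketch along the lines of the scheme described in the question above: one MPI rank per node requests MPI_THREAD_FUNNELED and fans out into OpenMP threads for the node-local work. The compile command is an assumption; flags vary by compiler.

  /* hybrid.c -- minimal MPI + OpenMP sketch (one rank per node assumed).
   * Compile with e.g.: mpicc -fopenmp hybrid.c -o hybrid              */
  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided, rank;

      /* FUNNELED: only the thread that called MPI_Init_thread
       * will make MPI calls.                                   */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      if (provided < MPI_THREAD_FUNNELED) {
          fprintf(stderr, "no MPI_THREAD_FUNNELED support\n");
          MPI_Abort(MPI_COMM_WORLD, 1);
      }
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Each rank uses the node's remaining cores via OpenMP. */
      #pragma omp parallel
      printf("rank %d, thread %d of %d\n",
             rank, omp_get_thread_num(), omp_get_num_threads());

      /* Back on the main thread; it alone makes MPI calls. */
      MPI_Finalize();
      return 0;
  }

To get one rank per node you would launch with something like mpirun's -npernode 1 (if your Open MPI version supports it), or the equivalent placement policy in your SGE setup.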

OpenMP operates mostly through compiler directives/pragmas interspersed
in the code.  For instance, you can parallelize inner loops in no time,
provided there are no data dependencies across the iterations of the loop. All it takes is to write one or two directive/pragma lines.
More than loop parallelization can be done with OpenMP, of course,
although not as much as can be done with MPI.
Still, with OpenMP, you are restricted to working in a shared memory environment.
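
To make the one-or-two-line point concrete, here is a sketch of such a loop; the function and its arguments are hypothetical.

  /* Scale b[] into a[]; iterations are independent, so a single
   * pragma is enough to spread them over the node's cores.      */
  void scale(double *a, const double *b, double s, int n)
  {
      int i;
      #pragma omp parallel for
      for (i = 0; i < n; i++)
          a[i] = s * b[i];
  }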

By contrast, MPI requires more effort to program, but it takes advantage
of shared memory and networked environments
(and perhaps extended grids too).
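
As a rough contrast with the one-pragma loop above, even a trivial two-rank exchange in MPI spells out ranks, tags, and communicators explicitly; a minimal (hypothetical) example:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                 /* rank 0 sends ...        */
          value = 42;
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {          /* ... and rank 1 receives */
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);
          printf("rank 1 got %d\n", value);
      }

      MPI_Finalize();
      return 0;
  }

The same code runs whether the two ranks share a node (shared memory transport) or sit on different nodes (network transport), which is the flexibility being paid for.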

<snips>
snipped tons of stuff rather than attempt to reconcile top postings

--
Tim Prince
