I have used POSIX threads together with Open MPI without problems on our
Opteron 2216 cluster (4 cores per node). Moving to core-level
parallelization with multithreading resulted in significant performance
gains.
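
To give an idea of the per-node part, here is a minimal sketch of one
process splitting a loop across several POSIX threads. It is not our
production code: the names parallel_sum, slice_t and MAX_THREADS are
made up for illustration, and the reduction is just a stand-in for real
work. In a hybrid setup I would assume one such process per node, with
MPI handling only the communication between nodes.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */

/* Work assigned to one thread: a contiguous slice of the input. */
typedef struct {
    const double *data;
    size_t begin;        /* first index, inclusive */
    size_t end;          /* last index, exclusive  */
    double partial;      /* result written by the thread */
} slice_t;

/* Thread entry point: sum the elements of one slice. */
static void *sum_slice(void *arg)
{
    slice_t *s = (slice_t *)arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles with nthreads threads inside a single process.
   Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int parallel_sum(const double *data, size_t n, int nthreads, double *out)
{
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];

    if (nthreads < 1 || nthreads > MAX_THREADS || out == NULL)
        return -1;

    size_t chunk = n / (size_t)nthreads;
    size_t rem = n % (size_t)nthreads;
    size_t pos = 0;
    int started = 0;
    int rc = 0;

    for (int t = 0; t < nthreads; ++t) {
        /* The first `rem` threads take one extra element each. */
        size_t len = chunk + ((size_t)t < rem ? 1 : 0);
        slice[t].data = data;
        slice[t].begin = pos;
        slice[t].end = pos + len;
        slice[t].partial = 0.0;
        pos += len;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0) {
            rc = -1;
            break;
        }
        ++started;
    }

    /* Join every thread that was started, then combine partial sums. */
    double total = 0.0;
    for (int t = 0; t < started; ++t) {
        pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }

    if (rc == 0)
        *out = total;
    return rc;
}
```

On a 4-core node one would call parallel_sum(data, n, 4, &result) and
link with -pthread. Each thread writes only to its own slice_t, so no
locking is needed until the final combine after the joins.
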
Sam Adams wrote:
I have been looking, but I haven't really found a good answer about
system-level threading. We are about to get a new cluster of
dual-processor, quad-core nodes, i.e. 8 cores per node. Traditionally I
would just tell MPI to launch two processes per dual-processor,
single-core node, but with eight cores on a node, having 8 processes
seems inefficient.
My question is this: does Open MPI detect that a node has multiple
cores and automatically use something like pthreads instead of creating
new processes when I request 8 processes for a node? Or should I run a
single process per node and use OpenMP or pthreads explicitly to get
better per-node performance?