Hello! I have Open MPI 1.9a1r32104 and Open MPI 1.5.5. I get much better performance with Open MPI 1.5.5 and OpenMP on 8 cores in the following program:
#define N 10000000

int main(int argc, char *argv[])
{
    /* ... */
    MPI_Init(&argc, &argv);
    /* ... */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }
#pragma omp parallel for shared(a, b, c) private(i)
    for (i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
    /* ... */
    MPI_Finalize();
}

I got the following timings on 1 node, submitting each run with:

for i in 1 2 4 8; do
    export OMP_NUM_THREADS=$i
    sbatch -p test -t 5 --exclusive -N 1 -o hybrid-hello_omp$i.out -e hybrid-hello_omp$i.err ompi_mxm3.0 ./hybrid-hello
done

* Open MPI 1.5.5 (Data for node: node1-128-17 Num slots: 8 Max slots: 0):
  * 8 threads: 0.014527 sec
  * 4 threads: 0.016138 sec
  * 2 threads: 0.018764 sec
  * 1 thread: 0.029963 sec
* Open MPI 1.9a1r32104 (node1-128-29: slots=8 max_slots=0 slots_inuse=0 state=UP):
  * 8 threads: 0.035055 sec
  * 4 threads: 0.029859 sec
  * 2 threads: 0.019564 sec (same as Open MPI 1.5.5)
  * 1 thread: 0.028394 sec (same as Open MPI 1.5.5)

So it looks like Open MPI 1.9 uses only 2 of the 8 cores. What can I do about this?

$ cat ompi_mxm3.0
#!/bin/sh
[ x"$TMPDIR" == x"" ] && TMPDIR=/tmp
HOSTFILE=${TMPDIR}/hostfile.${SLURM_JOB_ID}
srun hostname -s | sort | uniq -c | awk '{print $2" slots="$1}' > $HOSTFILE || { rm -f $HOSTFILE; exit 255; }
LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --display-allocation --hostfile $HOSTFILE "$@"
rc=$?
rm -f $HOSTFILE
exit $rc

For Open MPI 1.5.5 I remove the LD_PRELOAD from the run script.
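To check whether the 1.9 run really confines the threads to two cores, a small affinity probe can be run in place of hybrid-hello. The program below is a minimal sketch (not the original benchmark; the file name affinity-check.c is hypothetical, and it assumes glibc's GNU-specific sched_getcpu()). Each OpenMP thread prints the CPU it is currently running on, so if mpirun has bound the process to a subset of cores, only those CPU numbers will ever appear:

/* affinity-check.c: sketch of a hybrid MPI+OpenMP affinity probe.
   Build with: mpicc -fopenmp affinity-check.c -o affinity-check
   _GNU_SOURCE is required for sched_getcpu() from <sched.h>. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

#pragma omp parallel
    {
        /* Each thread reports the CPU it is executing on at this moment.
           Under a process bound to 2 cores, only 2 distinct CPU numbers
           can show up here, no matter how many threads are started. */
        printf("rank %d: thread %d of %d on cpu %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads(),
               sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}

Running this under both versions should show whether 1.9 restricts the process to a subset of cores; mpirun's --report-bindings option prints the binding that was applied, which gives the same picture from the launcher's side.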