Hello!
I have Open MPI 1.9a1r32104 and Open MPI 1.5.5.
I get much better performance from Open MPI 1.5.5 with OpenMP on 8 cores
in this program:
....

#define N 10000000
int main(int argc, char *argv[]) {
...............
    MPI_Init(&argc, &argv);
...............
    for (i = 0; i < N; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

#pragma omp parallel for shared(a, b, c) private(i)
    for (i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
.............
    MPI_Finalize();
}
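
For reference, here is a self-contained sketch of the benchmark that compiles as-is; the elided allocation, timing, and output code is my reconstruction (using omp_get_wtime() as the timer), so it may differ from the actual program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

#define N 10000000

int main(int argc, char *argv[]) {
    int i;
    double *a, *b, *c, t;

    MPI_Init(&argc, &argv);

    /* Reconstructed: the allocations are elided in the snippet above. */
    a = malloc(N * sizeof(double));
    b = malloc(N * sizeof(double));
    c = malloc(N * sizeof(double));

    for (i = 0; i < N; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

    /* Time only the OpenMP loop (reconstructed timer). */
    t = omp_get_wtime();
#pragma omp parallel for shared(a, b, c) private(i)
    for (i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
    printf("%f sec\n", omp_get_wtime() - t);

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}

(built with e.g. "mpicc -fopenmp hybrid-hello.c -o hybrid-hello")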
On one node I ran:

for i in 1 2 4 8; do
    export OMP_NUM_THREADS=$i
    sbatch -p test -t 5 --exclusive -N 1 \
        -o hybrid-hello_omp$i.out -e hybrid-hello_omp$i.err \
        ompi_mxm3.0 ./hybrid-hello
done

and got these timings:
* Open MPI 1.5.5 (Data for node: node1-128-17  Num slots: 8  Max slots: 0):
  * 8 threads: 0.014527 sec
  * 4 threads: 0.016138 sec
  * 2 threads: 0.018764 sec
  * 1 thread:  0.029963 sec
* Open MPI 1.9a1r32104 (node1-128-29: slots=8 max_slots=0 slots_inuse=0 state=UP):
  * 8 threads: 0.035055 sec
  * 4 threads: 0.029859 sec
  * 2 threads: 0.019564 sec (same as Open MPI 1.5.5)
  * 1 thread:  0.028394 sec (same as Open MPI 1.5.5)
So it looks like Open MPI 1.9 is using only 2 of the 8 cores.
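
To check that guess, a small probe like the following can be launched the same way; it prints the core each OpenMP thread actually runs on (this probe is mine, not part of the benchmark; sched_getcpu() is glibc-specific, hence the _GNU_SOURCE define):

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>   /* sched_getcpu(), glibc */
#include <omp.h>

int main(void) {
    /* If the launcher bound this process to only 2 hardware threads,
       at most 2 distinct cpu ids appear here, whatever OMP_NUM_THREADS is. */
    #pragma omp parallel
    printf("thread %d on cpu %d\n", omp_get_thread_num(), sched_getcpu());
    return 0;
}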

What can I do about this?

$ cat ompi_mxm3.0
#!/bin/sh
# Fall back to /tmp if TMPDIR is unset ('=' rather than bash's '==' for POSIX sh)
[ x"$TMPDIR" = x"" ] && TMPDIR=/tmp
HOSTFILE=${TMPDIR}/hostfile.${SLURM_JOB_ID}
# Build a hostfile from the SLURM allocation: one "<host> slots=<n>" line per node
srun hostname -s | sort | uniq -c | awk '{print $2" slots="$1}' > $HOSTFILE || { rm -f $HOSTFILE; exit 255; }
# Launch through mpirun with MXM preloaded into every rank
LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so \
mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --display-allocation --hostfile $HOSTFILE "$@"
rc=$?
rm -f $HOSTFILE
exit $rc

For Open MPI 1.5.5 I remove LD_PRELOAD from the run script.
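
If the newer mpirun binds each rank to a couple of cores by default (the binding defaults changed in the 1.7/1.8 series), a variant of the wrapper's mpirun line that disables binding and reports what was applied may be worth trying; --bind-to none and --report-bindings are options in that series, but I have not confirmed that binding is the cause here:

mpirun --bind-to none --report-bindings \
    -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off \
    --display-allocation --hostfile $HOSTFILE "$@"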
