Hello!
I've built Open MPI 1.6.1rc3 with MXM support, but when I try to launch
an application using this MTL, it hangs, and I can't figure out why.
If I launch it with fewer than 128 processes, everything works fine, since
MXM isn't used in that case. I've tried setting the threshold (mtl_mxm_np) to 0
and launching 2 processes, with the same result: it hangs on startup.
What could be causing this problem?
Here is the command I execute:
/opt/openmpi/1.6.1/mxm-test/bin/mpirun \
-np $NP \
-hostfile hosts_fdr2 \
--mca mtl mxm \
--mca btl ^tcp \
--mca mtl_mxm_np 0 \
-x OMP_NUM_THREADS=$NT \
-x LD_LIBRARY_PATH \
--bind-to-core \
-npernode 16 \
--mca coll_fca_np 0 -mca coll_fca_enable 0 \
./IMB-MPI1 -npmin $NP \
Allreduce Reduce Barrier Bcast Allgather Allgatherv
I'm running the tests on nodes with Intel Sandy Bridge processors and FDR
InfiniBand. Open MPI was configured with the following parameters:
CC=icc CXX=icpc F77=ifort FC=ifort ./configure \
--prefix=/opt/openmpi/1.6.1rc3/mxm-test --with-mxm=/opt/mellanox/mxm \
--with-fca=/opt/mellanox/fca --with-knem=/usr/share/knem
I'm using the latest OFED from Mellanox (1.5.3-3.1.0) on CentOS 6.1 with the
default kernel (2.6.32-131.0.15).
Compilation against the default MXM (1.0.601) failed, so I installed the latest
version from Mellanox: 1.1.1227.
Best regards, Pavel Mezentsev.