Given a simple hello.c:
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[])
{
    int size, rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Get_processor_name(name, &len);     /* host this process runs on */
    printf("%s: Process %d out of %d\n", name, rank, size);
    MPI_Finalize();
    return 0;
}
For n = 1:
mpirun -n 1 ./hello
it works correctly.
For n > 1, it segfaults with signal 11.
I used gdb to trace the problem to the ml coll component:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6750845 in ml_coll_hier_barrier_setup()
from <path to openmpi 1.8.5>/lib/openmpi/mca_coll_ml.so
Running with the ml component excluded:
mpirun -n 2 --mca coll ^ml ./hello
works correctly.
I am using Mellanox OFED 2.3-2.0.5-rhel6.4-x86_64, if that is at all relevant.
Open MPI 1.8.5 was built with the following options:
rpmbuild --rebuild --define 'configure_options --with-verbs=/usr
--with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3"
--enable-mpirun-prefix-by-default
--enable-orterun-prefix-by-default --disable-debug
--with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized
--without-mpi-param-check --with-contrib-vt-flags=--disable-iotrace
--enable-builtin-atomics --enable-cxx-exceptions --enable-sparse-groups
--enable-mpi-thread-multiple --enable-memchecker
--enable-btl-openib-failover --with-hwloc=internal --with-verbs --with-x
--with-slurm --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca
--with-mxm=/opt/mellanox/mxm --with-hcoll=/opt/mellanox/hcoll'
openmpi-1.8.5-1.src.rpm
The compiler is gcc version 5.1.1.
Thanks in advance.