Daniel, ok, thanks
it seems that even if priority is zero, some code gets executed. I will confirm this tomorrow and send you a patch to work around the issue if my guess is proven right.

Cheers,

Gilles

On Sunday, June 21, 2015, Daniel Letai <d...@letai.org.il> wrote:
> MCA coll: parameter "coll_ml_priority" (current value: "0", data source:
> default, level: 9 dev/all, type: int)
>
> Not sure how to read this, but for any n>1 mpirun only works with --mca
> coll ^ml
>
> Thanks for helping
>
> On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote:
>
> This is really odd...
>
> you can run
> ompi_info --all
> and search coll_ml_priority
>
> it will display the current value and the origin
> (e.g. default, system wide config, user config, cli, environment variable)
>
> Cheers,
>
> Gilles
>
> On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:
>
>> No, that's the issue.
>> I had to disable it to get things working.
>>
>> That's why I included my config settings - I couldn't figure out which
>> option enabled it, so I could remove it from the configuration...
>>
>> On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote:
>>
>> Daniel,
>>
>> The ML module is not ready for production and is disabled by default.
>>
>> Did you explicitly enable this module?
>> If yes, I encourage you to disable it.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:
>>
>>> given a simple hello.c:
>>>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char* argv[])
>>> {
>>>     int size, rank, len;
>>>     char name[MPI_MAX_PROCESSOR_NAME];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Get_processor_name(name, &len);
>>>
>>>     printf("%s: Process %d out of %d\n", name, rank, size);
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> for n=1
>>> mpirun -n 1 ./hello
>>> it works correctly.
>>>
>>> for n>1 it segfaults with signal 11.
>>> I used gdb to trace the problem to the ml coll component:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00007ffff6750845 in ml_coll_hier_barrier_setup()
>>>    from <path to openmpi 1.8.5>/lib/openmpi/mca_coll_ml.so
>>>
>>> running with
>>> mpirun -n 2 --mca coll ^ml ./hello
>>> works correctly.
>>>
>>> using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's at all relevant.
>>> openmpi 1.8.5 was built with the following options:
>>>
>>> rpmbuild --rebuild --define 'configure_options --with-verbs=/usr
>>> --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3"
>>> --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default
>>> --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx
>>> --with-platform=optimized --without-mpi-param-check
>>> --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics
>>> --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple
>>> --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal
>>> --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm
>>> --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm
>>> --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm
>>>
>>> gcc version 5.1.1
>>>
>>> Thanks in advance
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/06/27154.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/06/27155.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/06/27157.php
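For readers hitting the same segfault, a minimal sketch of the workaround discussed in the thread, assuming a stock Open MPI 1.8.x install that reads MCA parameters from the command line, the environment, and $HOME/.openmpi/mca-params.conf (the grep filter and the per-user config path are illustrative additions, not from the thread):

    # check the current value and origin of the ml priority parameter
    ompi_info --all | grep coll_ml_priority

    # per-run workaround, as used in the thread
    mpirun -n 2 --mca coll ^ml ./hello

    # persistent per-user workaround: exclude the ml component for every run
    echo "coll = ^ml" >> $HOME/.openmpi/mca-params.conf

    # equivalent environment-variable form
    export OMPI_MCA_coll=^ml

The "^" prefix asks the MCA framework to exclude the listed component(s) rather than select them, so the rest of the coll framework keeps working while coll_ml stays out of the job.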