MCA coll: parameter "coll_ml_priority" (current value: "0", data source: default, level: 9 dev/all, type: int)

I'm not sure how to interpret this, but for any n>1, mpirun only works with --mca coll ^ml.
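
(In case it helps: a possible way to make the workaround persistent, assuming
the standard per-user MCA parameter file location, would be

mkdir -p $HOME/.openmpi
echo "coll = ^ml" >> $HOME/.openmpi/mca-params.conf

which should be equivalent to passing --mca coll ^ml on every mpirun
invocation.)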

Thanks for helping

On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote:
This is really odd...

you can run
ompi_info --all
and search for coll_ml_priority

it will display the current value and its origin
(e.g. default, system-wide config, user config, command line, environment variable)
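
For example (just a sketch, assuming grep is available on the node):
ompi_info --all | grep coll_ml_priority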

Cheers,

Gilles

On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:

    No, that's the issue.
    I had to disable it to get things working.

    That's why I included my configure settings - I couldn't figure out
    which option enabled it, so that I could remove it from the
    configuration...

    On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote:
    Daniel,

    The ML module is not ready for production and is disabled by default.

    Did you explicitly enable this module?
    If so, I encourage you to disable it.

    Cheers,

    Gilles

    On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:

        Given a simple hello.c:

        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char* argv[])
        {
                int size, rank, len;
                char name[MPI_MAX_PROCESSOR_NAME];

                MPI_Init(&argc, &argv);
                MPI_Comm_size(MPI_COMM_WORLD, &size);
                MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                MPI_Get_processor_name(name, &len);

                printf("%s: Process %d out of %d\n", name, rank, size);

                MPI_Finalize();
                return 0;
        }
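
        (This can be compiled with the Open MPI wrapper compiler, e.g.
        mpicc -o hello hello.c)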

        For n=1,
        mpirun -n 1 ./hello
        works correctly.

        For n>1 it segfaults with signal 11.
        I used gdb to trace the problem to the ml coll component:

        Program received signal SIGSEGV, Segmentation fault.
        0x00007ffff6750845 in ml_coll_hier_barrier_setup()
            from <path to openmpi 1.8.5>/lib/openmpi/mca_coll_ml.so

        Running with
        mpirun -n 2 --mca coll ^ml ./hello
        works correctly.
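
        (As an aside, the same selection could presumably also be made via the
        environment before launching, e.g.
        export OMPI_MCA_coll="^ml"
        assuming the usual OMPI_MCA_<param> naming convention.)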

        I'm using Mellanox OFED 2.3-2.0.5-rhel6.4-x86_64, in case it's at all
        relevant.
        Open MPI 1.8.5 was built with the following options:
        rpmbuild --rebuild --define 'configure_options
        --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc
        CXX=g++ FC=gfortran CFLAGS="-g -O3"
        --enable-mpirun-prefix-by-default
        --enable-orterun-prefix-by-default --disable-debug
        --with-knem=/opt/knem-1.1.1.90mlnx --with-platform=optimized
        --without-mpi-param-check
        --with-contrib-vt-flags=--disable-iotrace
        --enable-builtin-atomics --enable-cxx-exceptions
        --enable-sparse-groups --enable-mpi-thread-multiple
        --enable-memchecker --enable-btl-openib-failover
        --with-hwloc=internal --with-verbs --with-x --with-slurm
        --with-pmi=/opt/slurm --with-fca=/opt/mellanox/fca
        --with-mxm=/opt/mellanox/mxm
        --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm

        gcc version 5.1.1

        Thanks in advance



