Gilles,

Attached the two output logs.

Thanks,
Daniel

On 06/22/2015 08:08 AM, Gilles Gouaillardet wrote:
Daniel,

I double-checked this and I cannot make any sense of these logs.

If coll_ml_priority is zero, then I do not see any way ml_coll_hier_barrier_setup can be invoked.

could you please run again with --mca coll_base_verbose 100
with and without --mca coll ^ml
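
A minimal sketch of the two runs requested above, assuming a two-process launch of the hello binary from the original report. The log file names and the loaded_components helper are illustrative and not part of Open MPI; the helper only relies on the "found loaded component" wording visible in the logs attached to this thread.

```shell
# Hypothetical helper: list the coll components that a verbose log reports as loaded.
# Reads log text on stdin and prints one sorted, de-duplicated component name per line.
loaded_components() {
    sed -n 's/.*components_open: found loaded component \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Capture both runs only when mpirun and the hello binary are present.
# with_ml.log and without_ml.log are placeholder names.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello ]; then
    # The first run may segfault, so its exit status is deliberately ignored.
    mpirun -n 2 --mca coll_base_verbose 100 ./hello > with_ml.log 2>&1 || true
    mpirun -n 2 --mca coll_base_verbose 100 --mca coll '^ml' ./hello > without_ml.log 2>&1 || true

    loaded_components < with_ml.log > with_ml.txt
    loaded_components < without_ml.log > without_ml.txt

    # Names printed here are loaded only when ml is not excluded.
    comm -23 with_ml.txt without_ml.txt
fi
```

Quoting '^ml' keeps shells that treat ^ specially from rewriting the argument.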

Cheers,

Gilles

On 6/22/2015 12:08 AM, Gilles Gouaillardet wrote:
Daniel,

ok, thanks

It seems that even if the priority is zero, some code gets executed.
I will confirm this tomorrow and send you a patch to work around the issue if my guess proves right.

Cheers,

Gilles

On Sunday, June 21, 2015, Daniel Letai <d...@letai.org.il> wrote:

    MCA coll: parameter "coll_ml_priority" (current value: "0", data
    source: default, level: 9 dev/all, type: int)

    Not sure how to read this, but for any n>1 mpirun only works with
    --mca coll ^ml

    Thanks for helping

    On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote:
    This is really odd...

    you can run
    ompi_info --all
    and search coll_ml_priority

    it will display the current value and the origin
    (e.g. default, system wide config, user config, cli, environment
    variable)
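
    A minimal sketch of that lookup, assuming ompi_info prints each
    parameter on a single line in the form 'current value: "...", data
    source: ..., level: ...'. The mca_param_origin helper is illustrative
    and not part of Open MPI.

    ```shell
    # Hypothetical helper: extract the value and data source of one MCA parameter
    # from `ompi_info --all` style text read on stdin.
    mca_param_origin() {
        grep "\"$1\"" |
            sed -n 's/.*current value: "\([^"]*\)", data source: \([^,]*\),.*/value=\1 source=\2/p'
    }

    # Query the real installation only when ompi_info is on PATH.
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info --all | mca_param_origin coll_ml_priority || true
    fi
    ```

    A source other than "default" would point at a config file, an
    environment variable, or the command line as the place that set it.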

    Cheers,

    Gilles

    On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:

        No, that's the issue.
        I had to disable it to get things working.

        That's why I included my config settings - I couldn't figure
        out which option enabled it, so I could remove it from the
        configuration...

        On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote:
        Daniel,

        ML module is not ready for production and is disabled by
        default.

        Did you explicitly enable this module?
        If yes, I encourage you to disable it.

        Cheers,

        Gilles

        On Thursday, June 18, 2015, Daniel Letai
        <d...@letai.org.il> wrote:

            given a simple hello.c:

            #include <stdio.h>
            #include <mpi.h>

            int main(int argc, char* argv[])
            {
                    int size, rank, len;
                    char name[MPI_MAX_PROCESSOR_NAME];

                    MPI_Init(&argc, &argv);
                    MPI_Comm_size(MPI_COMM_WORLD, &size);
                    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                    MPI_Get_processor_name(name, &len);

                    printf("%s: Process %d out of %d\n", name, rank, size);

                    MPI_Finalize();
                    return 0;
            }

            for n=1
            mpirun -n 1 ./hello
            it works correctly.

            for n>1 it segfaults with signal 11
            used gdb to trace the problem to ml coll:

            Program received signal SIGSEGV, Segmentation fault.
            0x00007ffff6750845 in ml_coll_hier_barrier_setup()
                from <path to openmpi 1.8.5>/lib/openmpi/mca_coll_ml.so

            running with
            mpirun -n 2 --mca coll ^ml ./hello
            works correctly

            using mellanox ofed 2.3-2.0.5-rhel6.4-x86_64, if it's
            at all relevant.
            openmpi 1.8.5 was built with following options:
            rpmbuild --rebuild --define 'configure_options
            --with-verbs=/usr --with-verbs-libdir=/usr/lib64 CC=gcc
            CXX=g++ FC=gfortran CFLAGS="-g -O3"
            --enable-mpirun-prefix-by-default
            --enable-orterun-prefix-by-default --disable-debug
            --with-knem=/opt/knem-1.1.1.90mlnx
            --with-platform=optimized --without-mpi-param-check
            --with-contrib-vt-flags=--disable-iotrace
            --enable-builtin-atomics --enable-cxx-exceptions
            --enable-sparse-groups --enable-mpi-thread-multiple
            --enable-memchecker --enable-btl-openib-failover
            --with-hwloc=internal --with-verbs --with-x
            --with-slurm --with-pmi=/opt/slurm
            --with-fca=/opt/mellanox/fca
            --with-mxm=/opt/mellanox/mxm
            --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm

            gcc version 5.1.1

            Thanks in advance
            _______________________________________________
            users mailing list
            us...@open-mpi.org
            Subscription:
            http://www.open-mpi.org/mailman/listinfo.cgi/users
            Link to this post:
            http://www.open-mpi.org/community/lists/users/2015/06/27154.php




[tux1:16402] mca: base: components_register: registering coll components
[tux1:16402] mca: base: components_register: found loaded component hierarch
[tux1:16402] mca: base: components_register: component hierarch register function successful
[tux1:16402] mca: base: components_register: found loaded component self
[tux1:16402] mca: base: components_register: component self register function successful
[tux1:16402] mca: base: components_register: found loaded component inter
[tux1:16402] mca: base: components_register: component inter register function successful
[tux1:16402] mca: base: components_register: found loaded component fca
[tux1:16402] mca: base: components_register: component fca register function successful
[tux1:16402] mca: base: components_register: found loaded component hcoll
[tux1:16402] mca: base: components_register: component hcoll register function successful
[tux1:16402] mca: base: components_register: found loaded component tuned
[tux1:16402] mca: base: components_register: component tuned register function successful
[tux1:16402] mca: base: components_register: found loaded component sm
[tux1:16402] mca: base: components_register: component sm register function successful
[tux1:16402] mca: base: components_register: found loaded component basic
[tux1:16402] mca: base: components_register: component basic register function successful
[tux1:16402] mca: base: components_register: found loaded component libnbc
[tux1:16402] mca: base: components_register: component libnbc register function successful
[tux1:16402] mca: base: components_open: opening coll components
[tux1:16402] mca: base: components_open: found loaded component hierarch
[tux1:16402] mca: base: components_open: found loaded component self
[tux1:16402] mca: base: components_open: found loaded component inter
[tux1:16402] mca: base: components_open: found loaded component fca
[tux1:16402] mca: base: components_open: component fca open function successful
[tux1:16402] mca: base: components_open: found loaded component hcoll
[tux1:16402] mca: base: components_open: component hcoll open function successful
[tux1:16402] mca: base: components_open: found loaded component tuned
[tux1:16402] mca: base: components_open: component tuned open function successful
[tux1:16402] mca: base: components_open: found loaded component sm
[tux1:16402] mca: base: components_open: found loaded component basic
[tux1:16402] mca: base: components_open: found loaded component libnbc
[tux1:16402] mca: base: components_open: component libnbc open function successful
[tux1:16403] mca: base: components_register: registering coll components
[tux1:16403] mca: base: components_register: found loaded component hierarch
[tux1:16403] mca: base: components_register: component hierarch register function successful
[tux1:16403] mca: base: components_register: found loaded component self
[tux1:16403] mca: base: components_register: component self register function successful
[tux1:16403] mca: base: components_register: found loaded component inter
[tux1:16403] mca: base: components_register: component inter register function successful
[tux1:16403] mca: base: components_register: found loaded component fca
[tux1:16403] mca: base: components_register: component fca register function successful
[tux1:16403] mca: base: components_register: found loaded component hcoll
[tux1:16403] mca: base: components_register: component hcoll register function successful
[tux1:16403] mca: base: components_register: found loaded component tuned
[tux1:16403] mca: base: components_register: component tuned register function successful
[tux1:16403] mca: base: components_register: found loaded component sm
[tux1:16403] mca: base: components_register: component sm register function successful
[tux1:16403] mca: base: components_register: found loaded component basic
[tux1:16403] mca: base: components_register: component basic register function successful
[tux1:16403] mca: base: components_register: found loaded component libnbc
[tux1:16403] mca: base: components_register: component libnbc register function successful
[tux1:16403] mca: base: components_open: opening coll components
[tux1:16403] mca: base: components_open: found loaded component hierarch
[tux1:16403] mca: base: components_open: found loaded component self
[tux1:16403] mca: base: components_open: found loaded component inter
[tux1:16403] mca: base: components_open: found loaded component fca
[tux1:16403] mca: base: components_open: component fca open function successful
[tux1:16403] mca: base: components_open: found loaded component hcoll
[tux1:16403] mca: base: components_open: component hcoll open function successful
[tux1:16403] mca: base: components_open: found loaded component tuned
[tux1:16403] mca: base: components_open: component tuned open function successful
[tux1:16403] mca: base: components_open: found loaded component sm
[tux1:16403] mca: base: components_open: found loaded component basic
[tux1:16403] mca: base: components_open: found loaded component libnbc
[tux1:16403] mca: base: components_open: component libnbc open function successful
[tux1:16403] coll:find_available: querying coll component hierarch
[tux1:16403] coll:find_available: coll component hierarch is available
[tux1:16403] coll:find_available: querying coll component self
[tux1:16403] coll:find_available: coll component self is available
[tux1:16403] coll:find_available: querying coll component inter
[tux1:16403] coll:find_available: coll component inter is available
[tux1:16403] coll:find_available: querying coll component fca
[tux1:16403] coll:find_available: coll component fca is available
[tux1:16403] coll:find_available: querying coll component hcoll
[tux1:16403] coll:find_available: coll component hcoll is available
[tux1:16403] coll:find_available: querying coll component tuned
[tux1:16403] coll:find_available: coll component tuned is available
[tux1:16403] coll:find_available: querying coll component sm
[tux1:16403] coll:sm:init_query: no other local procs; disqualifying myself
[tux1:16403] coll:find_available: coll component sm is not available
[tux1:16403] mca: base: close: component sm closed
[tux1:16403] mca: base: close: unloading component sm
[tux1:16402] coll:find_available: querying coll component hierarch
[tux1:16402] coll:find_available: coll component hierarch is available
[tux1:16402] coll:find_available: querying coll component self
[tux1:16402] coll:find_available: coll component self is available
[tux1:16402] coll:find_available: querying coll component inter
[tux1:16402] coll:find_available: coll component inter is available
[tux1:16402] coll:find_available: querying coll component fca
[tux1:16402] coll:find_available: coll component fca is available
[tux1:16402] coll:find_available: querying coll component hcoll
[tux1:16402] coll:find_available: coll component hcoll is available
[tux1:16402] coll:find_available: querying coll component tuned
[tux1:16402] coll:find_available: coll component tuned is available
[tux1:16402] coll:find_available: querying coll component sm
[tux1:16402] coll:sm:init_query: no other local procs; disqualifying myself
[tux1:16402] coll:find_available: coll component sm is not available
[tux1:16402] mca: base: close: component sm closed
[tux1:16402] mca: base: close: unloading component sm
[tux1:16403] coll:find_available: querying coll component basic
[tux1:16403] coll:find_available: coll component basic is available
[tux1:16403] coll:find_available: querying coll component libnbc
[tux1:16403] coll:find_available: coll component libnbc is available
[tux1:16402] coll:find_available: querying coll component basic
[tux1:16402] coll:find_available: coll component basic is available
[tux1:16402] coll:find_available: querying coll component libnbc
[tux1:16402] coll:find_available: coll component libnbc is available
[tux1:16403] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
[tux1:16403] coll:base:comm_select: Checking all available modules
[tux1:16403] coll:base:comm_select: component not available: hierarch
[tux1:16403] coll:base:comm_select: component not available: self
[tux1:16402] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
[tux1:16402] coll:base:comm_select: Checking all available modules
[tux1:16402] coll:base:comm_select: component not available: hierarch
[tux1:16402] coll:base:comm_select: component not available: self
[tux1:16402] coll:base:comm_select: component not available: inter
[tux1:16402] coll:base:comm_select: component not available: fca
[tux1:16403] coll:base:comm_select: component not available: inter
[tux1:16403] coll:base:comm_select: component not available: fca
[tux1:16402] coll:base:comm_select: component available: hcoll, priority: 90
[tux1:16402] coll:base:comm_select: component available: tuned, priority: 30
[tux1:16402] coll:base:comm_select: component available: basic, priority: 10
[tux1:16403] coll:base:comm_select: component available: hcoll, priority: 90
[tux1:16403] coll:base:comm_select: component available: tuned, priority: 30
[tux1:16403] coll:base:comm_select: component available: basic, priority: 10
[tux1:16402] coll:base:comm_select: component available: libnbc, priority: 10
[tux1:16403] coll:base:comm_select: component available: libnbc, priority: 10
[tux1:16402] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
[tux1:16402] coll:base:comm_select: Checking all available modules
[tux1:16402] coll:base:comm_select: component not available: hierarch
[tux1:16403] coll:base:comm_select: new communicator: MPI_COMM_SELF (cid 1)
[tux1:16403] coll:base:comm_select: Checking all available modules
[tux1:16403] coll:base:comm_select: component not available: hierarch
[tux1:16402] coll:base:comm_select: component available: self, priority: 75
[tux1:16402] coll:base:comm_select: component not available: inter
[tux1:16403] coll:base:comm_select: component available: self, priority: 75
[tux1:16403] coll:base:comm_select: component not available: inter
[tux1:16403] coll:base:comm_select: component not available: fca
[tux1:16403] coll:base:comm_select: component not available: hcoll
[tux1:16403] coll:base:comm_select: component not available: tuned
[tux1:16403] coll:base:comm_select: component available: basic, priority: 10
[tux1:16403] coll:base:comm_select: component available: libnbc, priority: 10
[tux1:16402] coll:base:comm_select: component not available: fca
[tux1:16402] coll:base:comm_select: component not available: hcoll
[tux1:16402] coll:base:comm_select: component not available: tuned
[tux1:16402] coll:base:comm_select: component available: basic, priority: 10
[tux1:16402] coll:base:comm_select: component available: libnbc, priority: 10
tux1: Process 0 out of 2
tux1: Process 1 out of 2
[tux1:16403] mca: base: close: unloading component hierarch
[tux1:16402] mca: base: close: unloading component hierarch
[tux1:16402] mca: base: close: unloading component self
[tux1:16403] mca: base: close: unloading component self
[tux1:16402] mca: base: close: unloading component inter
[tux1:16403] mca: base: close: unloading component inter
[tux1:16403] mca: base: close: component fca closed
[tux1:16403] mca: base: close: unloading component fca
[tux1:16402] mca: base: close: component fca closed
[tux1:16402] mca: base: close: unloading component fca
[tux1:16403] mca: base: close: component hcoll closed
[tux1:16403] mca: base: close: unloading component hcoll
[tux1:16403] mca: base: close: component tuned closed
[tux1:16403] mca: base: close: unloading component tuned
[tux1:16403] mca: base: close: unloading component basic
[tux1:16403] mca: base: close: component libnbc closed
[tux1:16403] mca: base: close: unloading component libnbc
[tux1:16402] mca: base: close: component hcoll closed
[tux1:16402] mca: base: close: unloading component hcoll
[tux1:16402] mca: base: close: component tuned closed
[tux1:16402] mca: base: close: unloading component tuned
[tux1:16402] mca: base: close: unloading component basic
[tux1:16402] mca: base: close: component libnbc closed
[tux1:16402] mca: base: close: unloading component libnbc
[tux1:15540] mca: base: components_register: registering coll components
[tux1:15540] mca: base: components_register: found loaded component ml
[tux1:15540] mca: base: components_register: component ml register function successful
[tux1:15540] mca: base: components_register: found loaded component hierarch
[tux1:15540] mca: base: components_register: component hierarch register function successful
[tux1:15540] mca: base: components_register: found loaded component self
[tux1:15540] mca: base: components_register: component self register function successful
[tux1:15540] mca: base: components_register: found loaded component inter
[tux1:15540] mca: base: components_register: component inter register function successful
[tux1:15540] mca: base: components_register: found loaded component fca
[tux1:15540] mca: base: components_register: component fca register function successful
[tux1:15540] mca: base: components_register: found loaded component hcoll
[tux1:15540] mca: base: components_register: component hcoll register function successful
[tux1:15540] mca: base: components_register: found loaded component tuned
[tux1:15540] mca: base: components_register: component tuned register function successful
[tux1:15540] mca: base: components_register: found loaded component sm
[tux1:15540] mca: base: components_register: component sm register function successful
[tux1:15540] mca: base: components_register: found loaded component basic
[tux1:15540] mca: base: components_register: component basic register function successful
[tux1:15540] mca: base: components_register: found loaded component libnbc
[tux1:15540] mca: base: components_register: component libnbc register function successful
[tux1:15540] mca: base: components_open: opening coll components
[tux1:15540] mca: base: components_open: found loaded component ml
[tux1:15540] mca: base: close: component ml closed
[tux1:15540] mca: base: close: unloading component ml
[tux1:15540] mca: base: components_open: found loaded component hierarch
[tux1:15540] mca: base: components_open: found loaded component self
[tux1:15540] mca: base: components_open: found loaded component inter
[tux1:15540] mca: base: components_open: found loaded component fca
[tux1:15540] mca: base: components_open: component fca open function successful
[tux1:15540] mca: base: components_open: found loaded component hcoll
[tux1:15540] mca: base: components_open: component hcoll open function successful
[tux1:15540] mca: base: components_open: found loaded component tuned
[tux1:15540] mca: base: components_open: component tuned open function successful
[tux1:15540] mca: base: components_open: found loaded component sm
[tux1:15540] mca: base: components_open: found loaded component basic
[tux1:15540] mca: base: components_open: found loaded component libnbc
[tux1:15540] mca: base: components_open: component libnbc open function successful
[tux1:15541] mca: base: components_register: registering coll components
[tux1:15541] mca: base: components_register: found loaded component ml
[tux1:15541] mca: base: components_register: component ml register function successful
[tux1:15541] mca: base: components_register: found loaded component hierarch
[tux1:15541] mca: base: components_register: component hierarch register function successful
[tux1:15541] mca: base: components_register: found loaded component self
[tux1:15541] mca: base: components_register: component self register function successful
[tux1:15541] mca: base: components_register: found loaded component inter
[tux1:15541] mca: base: components_register: component inter register function successful
[tux1:15541] mca: base: components_register: found loaded component fca
[tux1:15541] mca: base: components_register: component fca register function successful
[tux1:15541] mca: base: components_register: found loaded component hcoll
[tux1:15541] mca: base: components_register: component hcoll register function successful
[tux1:15541] mca: base: components_register: found loaded component tuned
[tux1:15541] mca: base: components_register: component tuned register function successful
[tux1:15541] mca: base: components_register: found loaded component sm
[tux1:15541] mca: base: components_register: component sm register function successful
[tux1:15541] mca: base: components_register: found loaded component basic
[tux1:15541] mca: base: components_register: component basic register function successful
[tux1:15541] mca: base: components_register: found loaded component libnbc
[tux1:15541] mca: base: components_register: component libnbc register function successful
[tux1:15541] mca: base: components_open: opening coll components
[tux1:15541] mca: base: components_open: found loaded component ml
[tux1:15541] mca: base: close: component ml closed
[tux1:15541] mca: base: close: unloading component ml
[tux1:15541] mca: base: components_open: found loaded component hierarch
[tux1:15541] mca: base: components_open: found loaded component self
[tux1:15541] mca: base: components_open: found loaded component inter
[tux1:15541] mca: base: components_open: found loaded component fca
[tux1:15541] mca: base: components_open: component fca open function successful
[tux1:15541] mca: base: components_open: found loaded component hcoll
[tux1:15541] mca: base: components_open: component hcoll open function successful
[tux1:15541] mca: base: components_open: found loaded component tuned
[tux1:15541] mca: base: components_open: component tuned open function successful
[tux1:15541] mca: base: components_open: found loaded component sm
[tux1:15541] mca: base: components_open: found loaded component basic
[tux1:15541] mca: base: components_open: found loaded component libnbc
[tux1:15541] mca: base: components_open: component libnbc open function successful
[tux1:15541] coll:find_available: querying coll component hierarch
[tux1:15541] coll:find_available: coll component hierarch is available
[tux1:15541] coll:find_available: querying coll component self
[tux1:15541] coll:find_available: coll component self is available
[tux1:15541] coll:find_available: querying coll component inter
[tux1:15541] coll:find_available: coll component inter is available
[tux1:15541] coll:find_available: querying coll component fca
[tux1:15541] coll:find_available: coll component fca is available
[tux1:15541] coll:find_available: querying coll component hcoll
[tux1:15541] coll:find_available: coll component hcoll is available
[tux1:15541] coll:find_available: querying coll component tuned
[tux1:15541] coll:find_available: coll component tuned is available
[tux1:15541] coll:find_available: querying coll component sm
[tux1:15540] coll:find_available: querying coll component hierarch
[tux1:15540] coll:find_available: coll component hierarch is available
[tux1:15540] coll:find_available: querying coll component self
[tux1:15540] coll:find_available: coll component self is available
[tux1:15540] coll:find_available: querying coll component inter
[tux1:15540] coll:find_available: coll component inter is available
[tux1:15540] coll:find_available: querying coll component fca
[tux1:15540] coll:find_available: coll component fca is available
[tux1:15540] coll:find_available: querying coll component hcoll
[tux1:15541] coll:sm:init_query: no other local procs; disqualifying myself
[tux1:15541] coll:find_available: coll component sm is not available
[tux1:15541] mca: base: close: component sm closed
[tux1:15541] mca: base: close: unloading component sm
[tux1:15540] coll:find_available: coll component hcoll is available
[tux1:15540] coll:find_available: querying coll component tuned
[tux1:15540] coll:find_available: coll component tuned is available
[tux1:15540] coll:find_available: querying coll component sm
[tux1:15540] coll:sm:init_query: no other local procs; disqualifying myself
[tux1:15540] coll:find_available: coll component sm is not available
[tux1:15540] mca: base: close: component sm closed
[tux1:15540] mca: base: close: unloading component sm
[tux1:15541] coll:find_available: querying coll component basic
[tux1:15541] coll:find_available: coll component basic is available
[tux1:15541] coll:find_available: querying coll component libnbc
[tux1:15541] coll:find_available: coll component libnbc is available
[tux1:15540] coll:find_available: querying coll component basic
[tux1:15540] coll:find_available: coll component basic is available
[tux1:15540] coll:find_available: querying coll component libnbc
[tux1:15540] coll:find_available: coll component libnbc is available
[tux1:15541] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
[tux1:15541] coll:base:comm_select: Checking all available modules
[tux1:15541] coll:base:comm_select: component not available: hierarch
[tux1:15541] coll:base:comm_select: component not available: self
[tux1:15540] coll:base:comm_select: new communicator: MPI_COMM_WORLD (cid 0)
[tux1:15540] coll:base:comm_select: Checking all available modules
[tux1:15540] coll:base:comm_select: component not available: hierarch
[tux1:15540] coll:base:comm_select: component not available: self
[tux1:15540] coll:base:comm_select: component not available: inter
[tux1:15540] coll:base:comm_select: component not available: fca
[tux1:15541] coll:base:comm_select: component not available: inter
[tux1:15541] coll:base:comm_select: component not available: fca
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 15540 on node tux1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
