Daniel,

ok, thanks

it seems that even if the priority is zero, some of the coll/ml code still gets executed.
I will confirm this tomorrow and, if my guess proves right, send you a patch to work
around the issue.
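
in the meantime you can make the workaround permanent instead of passing
--mca coll ^ml on every run. a rough sketch, assuming the default per-user
config location (the exact paths may differ on your install):

# blacklist the ml collective component for all your runs
echo "coll = ^ml" >> $HOME/.openmpi/mca-params.conf

# or only for the current shell session
export OMPI_MCA_coll=^ml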

Cheers,

Gilles

On Sunday, June 21, 2015, Daniel Letai <d...@letai.org.il> wrote:

>  MCA coll: parameter "coll_ml_priority" (current value: "0", data source:
> default, level: 9 dev/all, type: int)
>
> Not sure how to read this, but for any n>1, mpirun only works with --mca
> coll ^ml
>
> Thanks for helping
>
> On 06/18/2015 04:36 PM, Gilles Gouaillardet wrote:
>
> This is really odd...
>
>  you can run
> ompi_info --all
> and search for coll_ml_priority
>
>  it will display the current value and its origin
> (e.g. default, system-wide config, user config, command line, environment variable)
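>
>  for instance, one way to filter the output (assuming the ompi_info in
> your PATH comes from this 1.8.5 install):
>
> ompi_info --all | grep coll_ml_priority
>
>  or, to query only the coll/ml component:
>
> ompi_info --param coll ml --level 9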
>
>  Cheers,
>
>  Gilles
>
> On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:
>
>>  No, that's the issue.
>> I had to disable it to get things working.
>>
>> That's why I included my config settings - I couldn't figure out which
>> option enabled it, so that I could remove it from the configuration...
>>
>> On 06/18/2015 02:43 PM, Gilles Gouaillardet wrote:
>>
>> Daniel,
>>
>>  The ML (coll/ml) module is not ready for production and is disabled by default.
>>
>>  Did you explicitly enable this module?
>> If so, I encourage you to disable it.
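>>
>>  for example, these are the usual places such a setting could come from
>> (a rough check, assuming the default config file locations):
>>
>> env | grep OMPI_MCA
>> grep coll $HOME/.openmpi/mca-params.conf
>> grep coll <install prefix>/etc/openmpi-mca-params.conf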
>>
>>  Cheers,
>>
>>  Gilles
>>
>> On Thursday, June 18, 2015, Daniel Letai <d...@letai.org.il> wrote:
>>
>>> given a simple hello.c:
>>>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char* argv[])
>>> {
>>>         int size, rank, len;
>>>         char name[MPI_MAX_PROCESSOR_NAME];
>>>
>>>         MPI_Init(&argc, &argv);
>>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         MPI_Get_processor_name(name, &len);
>>>
>>>         printf("%s: Process %d out of %d\n", name, rank, size);
>>>
>>>         MPI_Finalize();
>>>         return 0;
>>> }
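>>>
>>> compiled with the MPI wrapper compiler, e.g.
>>> mpicc -o hello hello.c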
>>>
>>> for n=1
>>> mpirun -n 1 ./hello
>>> it works correctly.
>>>
>>> for n>1 it segfaults with signal 11.
>>> I used gdb to trace the problem to the coll/ml component:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00007ffff6750845 in ml_coll_hier_barrier_setup()
>>>     from <path to openmpi 1.8.5>/lib/openmpi/mca_coll_ml.so
>>>
>>> running with
>>> mpirun -n 2 --mca coll ^ml ./hello
>>> works correctly
>>>
>>> I am using Mellanox OFED 2.3-2.0.5-rhel6.4-x86_64, in case it's relevant.
>>> Open MPI 1.8.5 was built with the following options:
>>> rpmbuild --rebuild --define 'configure_options --with-verbs=/usr
>>> --with-verbs-libdir=/usr/lib64 CC=gcc CXX=g++ FC=gfortran CFLAGS="-g -O3"
>>> --enable-mpirun-prefix-by-default --enable-orterun-prefix-by-default
>>> --disable-debug --with-knem=/opt/knem-1.1.1.90mlnx
>>> --with-platform=optimized --without-mpi-param-check
>>> --with-contrib-vt-flags=--disable-iotrace --enable-builtin-atomics
>>> --enable-cxx-exceptions --enable-sparse-groups --enable-mpi-thread-multiple
>>> --enable-memchecker --enable-btl-openib-failover --with-hwloc=internal
>>> --with-verbs --with-x --with-slurm --with-pmi=/opt/slurm
>>> --with-fca=/opt/mellanox/fca --with-mxm=/opt/mellanox/mxm
>>> --with-hcoll=/opt/mellanox/hcoll' openmpi-1.8.5-1.src.rpm
>>>
>>> gcc version 5.1.1
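>>>
>>> in case it helps, the coll components present in this build can be listed
>>> with
>>> ompi_info | grep "MCA coll"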
>>>
>>> Thanks in advance
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/06/27154.php
>>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2015/06/27155.php
>>
>>
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27157.php
>
>
>
