Thank you,

That seems to solve the problem.

Best Regards,

Nilo Menezes

On 5/19/2015 3:34 PM, Ralph Castain wrote:
It looks like you have PSM-enabled cards on your system as well as Ethernet, and we are picking that up. Try adding "-mca pml ob1" to your command line and see if that helps.
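
For readers of the archive, here is a minimal sketch of how that suggestion might be applied to the mpirun line from job.sh in the quoted message. It assumes the only change is the added "--mca pml ob1" option (mpirun accepts both "-mca" and "--mca"); every other argument is copied from the original post, and the exact form the poster ended up using is not shown in this thread.

```shell
# Sketch only: the mpirun line from job.sh with the suggested option added.
# Assumption: "--mca pml ob1" is the sole change. It asks Open MPI to use the
# ob1 point-to-point layer, which sends traffic through the BTLs listed below
# (tcp, self) rather than through PSM.
mpirun --mca pml ob1 \
       --mca btl tcp,self \
       --mca btl_tcp_if_include 172.24.38.0/24 \
       --mca oob_tcp_if_include eth0 \
       /home/umons/info/menezes/drsim/build/NameResolution/gameoflife_mpi2 \
       --columns=1000 --rows=1000
```

The option order should not matter; the new option is placed first here only to make the change easy to spot.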


On Tue, May 19, 2015 at 5:04 AM, Nilo Menezes <n...@nilo.pro.br> wrote:

    Hello,

    I'm trying to run Open MPI with multithread support enabled.

    I'm getting these error messages before init finishes:
    [node011:61627] PSM returned unhandled/unknown connect error:
    Operation timed out
    [node011:61627] PSM EP connect error (unknown connect error):

    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
    abort,
    ***    and potentially your MPI job)
    [node005:51948] Local abort before MPI_INIT completed
    successfully; not able to aggregate error messages, and not able
    to guarantee that all other processes were killed!
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
    abort,
    ***    and potentially your MPI job)
    [node039:57062] Local abort before MPI_INIT completed
    successfully; not able to aggregate error messages, and not able
    to guarantee that all other processes were killed!
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
    abort,
    ***    and potentially your MPI job)
    [node012:64036] Local abort before MPI_INIT completed
    successfully; not able to aggregate error messages, and not able
    to guarantee that all other processes were killed!
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
    abort,
    ***    and potentially your MPI job)
    [node008:14098] Local abort before MPI_INIT completed
    successfully; not able to aggregate error messages, and not able
    to guarantee that all other processes were killed!
    *** An error occurred in MPI_Init_thread
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
    abort,
    ***    and potentially your MPI job)
    [node011:61627] Local abort before MPI_INIT completed
    successfully; not able to aggregate error messages, and not able
    to guarantee that all other processes were killed!
    [node005:51887] 1 more process has sent help message
    help-mpi-runtime / mpi_init:startup:internal-failure
    [node005:51887] Set MCA parameter "orte_base_help_aggregate" to 0
    to see all help / error messages

    The library was configured with:
    ./configure \
    --prefix=/home/opt \
    --enable-static \
    --enable-mpi-thread-multiple \
    --with-threads

    gcc 4.8.2

    On Linux:
    Linux node001 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15
    13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

    The job was started with:
    sbatch --nodes=6 --ntasks=30 --mem=4096 -o result/TOn6t30.txt \
           -e result/TEn6t30.txt job.sh


    job.sh contains:
    mpirun --mca btl tcp,self \
           --mca btl_tcp_if_include 172.24.38.0/24 \
           --mca oob_tcp_if_include eth0 \
           /home/umons/info/menezes/drsim/build/NameResolution/gameoflife_mpi2 \
           --columns=1000 --rows=1000

    I call MPI_Init_thread with:
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    The program is a simple Game of Life simulation. It runs fine on a
    single node (with one or many tasks), but fails on random nodes
    when distributed across several.

    Any hint may help.

    Best Regards,

    Nilo Menezes
    _______________________________________________
    users mailing list
    us...@open-mpi.org
    Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
    Link to this post:
    http://www.open-mpi.org/community/lists/users/2015/05/26879.php




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/05/26880.php
