Correct.

Could it be because Torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it?
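If that is the case, bypassing the Torque launcher should make mpiexec usable on the head node. A rough sketch of what I have in mind (untested; plm is the standard launcher framework, but treat the exact options as an assumption on my part):

    # Is pbs_mom running on this node at all?
    pgrep -l pbs_mom || echo "no pbs_mom here"

    # Force the rsh/ssh launcher instead of the Torque/TM one,
    # with some verbosity to see what mpiexec tries to contact
    mpiexec --mca plm rsh --mca plm_base_verbose 10 -np 4 ./ring_c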

Maxime

On 2014-08-15 17:31, Joshua Ladd wrote:
But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end login node, if I understood you correctly.

Josh


On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:

    Here are the requested files.

    In the archive, you will find the output of configure, make, and
    make install, as well as config.log, the environment when running
    ring_c, and the output of ompi_info --all.

    Just as a reminder, the ring_c example compiled and ran, but
    produced no output and exited with code 65.
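    In case it helps, this is roughly how I tested it (the verbosity
    flag is just something I added for this run, so take the exact
    command as approximate):

        mpicc ring_c.c -o ring_c
        mpiexec -np 4 --mca btl_base_verbose 10 ./ring_c
        echo "exit code: $?"   # this is where I see 65 with no output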

    Thanks,

    Maxime

    On 2014-08-14 15:26, Joshua Ladd wrote:
    One more thing, Maxime: can you please make sure you've covered
    everything here:

    http://www.open-mpi.org/community/help/

    Josh


    On Thu, Aug 14, 2014 at 3:18 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

        And maybe include your LD_LIBRARY_PATH
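        Something like this would do (assuming ring_c is the binary you
        built from the examples directory; adjust the path if not):

            echo $LD_LIBRARY_PATH | tr ':' '\n'
            ldd ./ring_c | grep -Ei 'libmpi|libpthread'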

        Josh


        On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

            Can you try to run the example code "ring_c" across nodes?
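            Something along these lines, where nodeA and nodeB are
            placeholders for two of your compute nodes:

                mpiexec -np 2 --host nodeA,nodeB ./ring_c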

            Josh


            On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault
            <maxime.boissonnea...@calculquebec.ca> wrote:

                Yes,
                Everything has been built with GCC 4.8.x, although x
                might have changed between the OpenMPI 1.8.1 build and
                the Gromacs build. For OpenMPI 1.8.2rc4, however, the
                exact same compiler was used for everything.

                Maxime

                On 2014-08-14 14:57, Joshua Ladd wrote:
                Hmmm...weird. Seems like maybe a mismatch between
                libraries. Did you build OMPI with the same compiler
                as you did GROMACS/Charm++?

                I'm stealing this suggestion from an old Gromacs
                forum with essentially the same symptom:

                "Did you compile Open MPI and Gromacs with the same
                compiler (i.e. both gcc and the same version)? You
                write you tried different OpenMPI versions and
                different GCC versions but it is unclear whether
                those match. Can you provide more detail how you
                compiled (including all options you specified)? Have
                you tested any other MPI program linked against
                those Open MPI versions? Please make sure (e.g. with
                ldd) that the MPI and pthread library you compiled
                against is also used for execution. If you compiled
                and run on different hosts, check whether the error
                still occurs when executing on the build host."

                http://redmine.gromacs.org/issues/1025
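                For the ldd check, something like the following should
                be enough (assuming mdrunmpi is the Gromacs binary in
                your PATH; adjust the name or path otherwise):

                    ldd $(which mdrunmpi) | grep -Ei 'libmpi|libpthread|libgmx'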

                Josh




                On Thu, Aug 14, 2014 at 2:40 PM, Maxime Boissonneault
                <maxime.boissonnea...@calculquebec.ca> wrote:

                    I just tried Gromacs with two nodes. It crashes,
                    but with a different error:
                    [gpu-k20-13:142156] *** Process received signal ***
                    [gpu-k20-13:142156] Signal: Segmentation fault (11)
                    [gpu-k20-13:142156] Signal code: Address not mapped (1)
                    [gpu-k20-13:142156] Failing at address: 0x8
                    [gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
                    [gpu-k20-13:142156] [ 1] /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
                    [gpu-k20-13:142156] [ 2] /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
                    [gpu-k20-13:142156] [ 3] /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
                    [gpu-k20-13:142156] [ 4] /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
                    [gpu-k20-13:142156] [ 5] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
                    [gpu-k20-13:142156] [ 6] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
                    [gpu-k20-13:142156] [ 7] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
                    [gpu-k20-13:142156] [ 8] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
                    [gpu-k20-13:142156] [ 9] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
                    [gpu-k20-13:142156] [10] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
                    [gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
                    [gpu-k20-13:142156] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
                    [gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
                    [gpu-k20-13:142156] *** End of error message ***
                    --------------------------------------------------------------------------
                    mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13
                    exited on signal 11 (Segmentation fault).
                    --------------------------------------------------------------------------
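                    Since the crash happens inside cuInit() via
                    cudaDriverGetVersion(), I will also sanity-check the
                    driver/runtime pairing on that node, roughly like
                    this (just a quick check, nothing definitive):

                        nvidia-smi | head -n 5                  # driver version on gpu-k20-13
                        ls -l /usr/lib64/nvidia/libcuda.so.1    # driver library that appears in the trace
                        ldd $(which mdrunmpi) | grep -i cuda    # which libcudart/libcuda get picked up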



                    We do not have MPI_THREAD_MULTIPLE enabled in our
                    build, so Charm++ cannot be using this level of
                    threading. The configure line for OpenMPI was:

                    ./configure --prefix=$PREFIX \
                        --with-threads --with-verbs=yes \
                        --enable-shared --enable-static \
                        --with-io-romio-flags="--with-file-system=nfs+lustre" \
                        --without-loadleveler --without-slurm --with-tm \
                        --with-cuda=$(dirname $(dirname $(which nvcc)))
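                    To confirm what the build actually enabled, I can
                    grep ompi_info (the exact wording of its output may
                    differ a bit between versions, so the patterns are
                    approximate):

                        ompi_info | grep -i "thread support"    # expect MPI_THREAD_MULTIPLE: no
                        ompi_info | grep -iE "MCA plm|MCA btl"  # tm and openib components built?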

                    Maxime


                    On 2014-08-14 14:20, Joshua Ladd wrote:
                    What about between nodes? Since this is coming from
                    the OpenIB BTL, it would be good to check this.

                    Do you know what the MPI thread level is set to
                    when used with the Charm++ runtime? Is it
                    MPI_THREAD_MULTIPLE? The OpenIB BTL is not
                    thread safe.

                    Josh


                    On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault
                    <maxime.boissonnea...@calculquebec.ca> wrote:

                        Hi,
                        I ran Gromacs successfully with OpenMPI 1.8.1
                        and CUDA 6.0.37 on a single node, with 8 ranks
                        and multiple OpenMP threads.

                        Maxime


                        On 2014-08-14 14:15, Joshua Ladd wrote:
                        Hi, Maxime

                        Just curious, are you able to run a vanilla MPI
                        program? Can you try one of the example programs
                        in the "examples" subdirectory? It looks like a
                        threading issue to me.

                        Thanks,

                        Josh






































--
---------------------------------
Maxime Boissonneault
Computational Analyst - Calcul Québec, Université Laval
Ph.D. in Physics
