Yes,
everything has been built with GCC 4.8.x, although x might have changed
between the OpenMPI 1.8.1 build and the GROMACS build. For OpenMPI
1.8.2rc4, however, it was the exact same compiler for everything.
Maxime
On 2014-08-14 14:57, Joshua Ladd wrote:
Hmmm... weird. Seems like maybe a mismatch between libraries. Did you
build OMPI with the same compiler as you did GROMACS/Charm++?
I'm stealing this suggestion from an old GROMACS forum thread describing
essentially the same symptom:
"Did you compile Open MPI and Gromacs with the same compiler (i.e.
both gcc and the same version)? You write you tried different OpenMPI
versions and different GCC versions but it is unclear whether those
match. Can you provide more detail how you compiled (including all
options you specified)? Have you tested any other MPI program linked
against those Open MPI versions? Please make sure (e.g. with ldd) that
the MPI and pthread library you compiled against is also used for
execution. If you compiled and run on different hosts, check whether
the error still occurs when executing on the build host."
http://redmine.gromacs.org/issues/1025
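A quick way to cross-check all of that at once would be something like the following (the mdrunmpi path below is just a guess on my part, so adjust it to your actual install):

ompi_info | grep -i compiler        # compiler Open MPI itself was built with
mpicc --version                     # compiler the Open MPI wrapper invokes now
gcc --version                       # compiler used for GROMACS/Charm++, if it was plain gcc
ldd /software-gpu/apps/gromacs/4.6.5_gcc/bin/mdrunmpi | grep -E 'libmpi|libpthread'   # libraries actually resolved at run time (path is a guess)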
Josh
On Thu, Aug 14, 2014 at 2:40 PM, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:
I just tried GROMACS with two nodes. It crashes, but with a
different error. I get:
[gpu-k20-13:142156] *** Process received signal ***
[gpu-k20-13:142156] Signal: Segmentation fault (11)
[gpu-k20-13:142156] Signal code: Address not mapped (1)
[gpu-k20-13:142156] Failing at address: 0x8
[gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
[gpu-k20-13:142156] [ 1] /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
[gpu-k20-13:142156] [ 2] /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
[gpu-k20-13:142156] [ 3] /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
[gpu-k20-13:142156] [ 4] /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
[gpu-k20-13:142156] [ 5] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
[gpu-k20-13:142156] [ 6] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
[gpu-k20-13:142156] [ 7] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
[gpu-k20-13:142156] [ 8] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
[gpu-k20-13:142156] [ 9] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
[gpu-k20-13:142156] [10] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
[gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
[gpu-k20-13:142156] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
[gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
[gpu-k20-13:142156] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
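The backtrace dies inside cuInit(), reached through cudaDriverGetVersion() while GROMACS prints its version info, so a minimal sanity check would be whether the CUDA driver is usable at all when launched through mpiexec on that node. Something along these lines (the mdrunmpi path is only a guess) would tell:

mpiexec -np 1 -host gpu-k20-13 nvidia-smi
# path to mdrunmpi is a guess; adjust to the actual install location
mpiexec -np 1 -host gpu-k20-13 ldd /software-gpu/apps/gromacs/4.6.5_gcc/bin/mdrunmpi | grep -i cuda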
We do not have MPI_THREAD_MULTIPLE enabled in our build, so
Charm++ cannot be using this level of threading. The configure
line for OpenMPI was
./configure --prefix=$PREFIX \
--with-threads --with-verbs=yes --enable-shared --enable-static \
--with-io-romio-flags="--with-file-system=nfs+lustre" \
--without-loadleveler --without-slurm --with-tm \
--with-cuda=$(dirname $(dirname $(which nvcc)))
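For reference, the thread support level of an installed build can be confirmed with something like:

ompi_info | grep -i thread    # the "Thread support" line should show whether MPI_THREAD_MULTIPLE is enabled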
Maxime
On 2014-08-14 14:20, Joshua Ladd wrote:
What about between nodes? Since this is coming from the OpenIB
BTL, it would be good to check that.
Do you know what the MPI thread level is set to when used with
the Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is
not thread-safe.
Josh
On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:
Hi,
I ran GROMACS successfully with OpenMPI 1.8.1 and CUDA 6.0.37
on a single node, with 8 ranks and multiple OpenMP threads.
Maxime
On 2014-08-14 14:15, Joshua Ladd wrote:
Hi, Maxime
Just curious, are you able to run a vanilla MPI program? Can
you try one of the example programs in the "examples"
subdirectory? This looks like a threading issue to me.
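If you still have the Open MPI build tree around, something like this should do it (assuming the bundled examples were left in place):

cd examples          # in the Open MPI source tree
make                 # builds hello_c, ring_c, etc.
mpiexec -np 2 ./hello_c
mpiexec -np 2 ./ring_c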
Thanks,
Josh
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25027.php
--
---------------------------------
Maxime Boissonneault
Computational analyst - Calcul Québec, Université Laval
Ph.D. in physics