I just tried Gromacs on two nodes. It crashes, but with a different error. I get:
[gpu-k20-13:142156] *** Process received signal ***
[gpu-k20-13:142156] Signal: Segmentation fault (11)
[gpu-k20-13:142156] Signal code: Address not mapped (1)
[gpu-k20-13:142156] Failing at address: 0x8
[gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
[gpu-k20-13:142156] [ 1] /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
[gpu-k20-13:142156] [ 2] /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
[gpu-k20-13:142156] [ 3] /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
[gpu-k20-13:142156] [ 4] /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
[gpu-k20-13:142156] [ 5] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
[gpu-k20-13:142156] [ 6] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
[gpu-k20-13:142156] [ 7] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
[gpu-k20-13:142156] [ 8] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
[gpu-k20-13:142156] [ 9] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
[gpu-k20-13:142156] [10] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
[gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
[gpu-k20-13:142156] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
[gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
[gpu-k20-13:142156] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
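The top of the backtrace is CUDA driver initialization (cuInit, reached through cudaDriverGetVersion while Gromacs prints its GPU version info). As a sanity check, an untested sketch like the one below exercises the same call on its own, outside of Gromacs and MPI; the file name and compile line are just placeholders:

/* cuda_check.cu (placeholder name) -- untested sketch.
 * Build with:  nvcc cuda_check.cu -o cuda_check
 * Run it on each compute node to see whether plain CUDA driver
 * initialization already segfaults there. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driver = 0, runtime = 0, ndev = 0;

    /* Same entry point as in the backtrace above. */
    cudaError_t err = cudaDriverGetVersion(&driver);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaRuntimeGetVersion(&runtime);
    cudaGetDeviceCount(&ndev);

    printf("driver %d, runtime %d, %d device(s)\n", driver, runtime, ndev);
    return 0;
}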



We do not have MPI_THREAD_MULTIPLE enabled in our build, so Charm++ cannot be using that level of threading. The configure line for OpenMPI was:
./configure --prefix=$PREFIX \
      --with-threads --with-verbs=yes --enable-shared --enable-static \
      --with-io-romio-flags="--with-file-system=nfs+lustre" \
       --without-loadleveler --without-slurm --with-tm \
       --with-cuda=$(dirname $(dirname $(which nvcc)))
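
If it helps, an untested sketch like this should confirm at runtime which thread level this build actually grants (file name, node names and launch line are just placeholders):

/* thread_level.c (placeholder name) -- untested sketch.
 * Build with:  mpicc thread_level.c -o thread_level
 * Run with:    mpiexec -np 2 --host <node1>,<node2> ./thread_level */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for the highest level and print what the library grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    printf("provided thread level: %s\n",
           provided == MPI_THREAD_MULTIPLE   ? "MPI_THREAD_MULTIPLE" :
           provided == MPI_THREAD_SERIALIZED ? "MPI_THREAD_SERIALIZED" :
           provided == MPI_THREAD_FUNNELED   ? "MPI_THREAD_FUNNELED" :
                                               "MPI_THREAD_SINGLE");

    MPI_Finalize();
    return 0;
}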

Maxime


On 2014-08-14 14:20, Joshua Ladd wrote:
What about between nodes? Since this is coming from the OpenIB BTL, it would be good to check this.

Do you know what the MPI thread level is set to when used with the Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not thread safe.

Josh


On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:

    Hi,
    I ran gromacs successfully with OpenMPI 1.8.1 and Cuda 6.0.37 on a
    single node, with 8 ranks and multiple OpenMP threads.

    Maxime


    On 2014-08-14 14:15, Joshua Ladd wrote:
    Hi, Maxime

    Just curious, are you able to run a vanilla MPI program? Can you
    try one of the example programs in the "examples" subdirectory?
    It looks like a threading issue to me.
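
    For instance, an untested sketch in the spirit of examples/hello_c.c
    (build with mpicc, launch with mpiexec -np <N> --host <node1>,<node2>;
    node names are placeholders) would show whether plain MPI even starts
    across both nodes:

    #include <stdio.h>
    #include <mpi.h>

    /* Each rank prints its rank and the host it runs on; the barrier
     * forces at least a little inter-node communication. */
    int main(int argc, char **argv)
    {
        int rank, size, name_len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &name_len);

        printf("rank %d of %d on %s\n", rank, size, name);

        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }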

    Thanks,

    Josh












--
---------------------------------
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in physics
