Hmmm... weird. That looks like it could be a mismatch between libraries. Did
you build OMPI with the same compiler as you used for GROMACS/Charm++?
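
A quick way to compare (rough sketch; exact output and paths will differ on
your system):

    # compiler recorded in the Open MPI build
    ompi_info | grep -i "compiler"

    # compiler currently in your PATH (presumably the one GROMACS/Charm++ saw)
    which gcc && gcc --version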

I'm stealing this suggestion from an old GROMACS issue report with essentially
the same symptom:

"Did you compile Open MPI and Gromacs with the same compiler (i.e. both gcc
and the same version)? You write you tried different OpenMPI versions and
different GCC versions but it is unclear whether those match. Can you
provide more detail how you compiled (including all options you specified)?
Have you tested any other MPI program linked against those Open MPI
versions? Please make sure (e.g. with ldd) that the MPI and pthread library
you compiled against is also used for execution. If you compiled and run on
different hosts, check whether the error still occurs when executing on the
build host."

http://redmine.gromacs.org/issues/1025
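
For the ldd part, something along these lines should show which MPI, pthread,
and CUDA libraries the binary actually resolves at run time (mdrunmpi is the
binary name from your backtrace; adjust the path if it is not in your PATH):

    ldd $(which mdrunmpi) | grep -E 'mpi|pthread|cuda'

If the libmpi.so that shows up there is not the 1.8.x build you intended to
link against, that would explain a lot.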

Josh




On Thu, Aug 14, 2014 at 2:40 PM, Maxime Boissonneault <
maxime.boissonnea...@calculquebec.ca> wrote:

>  I just tried Gromacs with two nodes. It crashes, but with a different
> error. I get
> [gpu-k20-13:142156] *** Process received signal ***
> [gpu-k20-13:142156] Signal: Segmentation fault (11)
> [gpu-k20-13:142156] Signal code: Address not mapped (1)
> [gpu-k20-13:142156] Failing at address: 0x8
> [gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
> [gpu-k20-13:142156] [ 1]
> /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
> [gpu-k20-13:142156] [ 2]
> /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
> [gpu-k20-13:142156] [ 3]
> /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
> [gpu-k20-13:142156] [ 4]
> /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
> [gpu-k20-13:142156] [ 5]
> /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
> [gpu-k20-13:142156] [ 6]
> /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
> [gpu-k20-13:142156] [ 7]
> /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
> [gpu-k20-13:142156] [ 8]
> /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
> [gpu-k20-13:142156] [ 9]
> /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
> [gpu-k20-13:142156] [10]
> /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
> [gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
> [gpu-k20-13:142156] [12]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
> [gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
> [gpu-k20-13:142156] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
>
>
> We do not have MPI_THREAD_MULTIPLE enabled in our build, so Charm++ cannot
> be using this level of threading. The configure line for OpenMPI was
> ./configure --prefix=$PREFIX \
>     --with-threads --with-verbs=yes --enable-shared --enable-static \
>     --with-io-romio-flags="--with-file-system=nfs+lustre" \
>     --without-loadleveler --without-slurm --with-tm \
>     --with-cuda=$(dirname $(dirname $(which nvcc)))
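
(Side note: it is worth confirming what thread support that build actually
reports. The exact wording varies between Open MPI versions, but something
like this usually shows it:

    ompi_info | grep -i thread

If that reports MPI_THREAD_MULTIPLE: no, then Charm++ indeed cannot be getting
MULTIPLE out of this build.)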
>
> Maxime
>
>
> On 2014-08-14 14:20, Joshua Ladd wrote:
>
>  What about between nodes? Since this is coming from the OpenIB BTL, it
> would be good to check that.
>
> Do you know what the MPI thread level is set to when used with the Charm++
> runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not thread safe.
>
>  Josh
>
>
> On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault <
> maxime.boissonnea...@calculquebec.ca> wrote:
>
>>  Hi,
>> I ran gromacs successfully with OpenMPI 1.8.1 and Cuda 6.0.37 on a single
>> node, with 8 ranks and multiple OpenMP threads.
>>
>> Maxime
>>
>>
>> On 2014-08-14 14:15, Joshua Ladd wrote:
>>
>>   Hi, Maxime
>>
>>  Just curious, are you able to run a vanilla MPI program? Can you try one
>> of the example programs in the "examples" subdirectory? Looks like a
>> threading issue to me.
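
(To make that concrete: the Open MPI source tarball ships small test programs
in its examples/ directory -- hello_c.c, ring_c.c, etc. -- so a rough check,
assuming the 1.8.1 tarball is still around, would be:

    cd openmpi-1.8.1/examples
    mpicc ring_c.c -o ring_c
    mpiexec -np 2 ./ring_c

Running that across the same two nodes, with the same launch options you use
for GROMACS, would tell us whether the MPI layer itself is healthy.)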
>>
>>  Thanks,
>>
>>  Josh
>>
>>
>>
>>
>
>
>
>
>
>
> --
> ---------------------------------
> Maxime Boissonneault
> Computing analyst - Calcul Québec, Université Laval
> Ph.D. in physics
>
>
>
