But OMPI 1.8.x does run the ring_c program successfully on your compute node, right? The error only happens on the front-end login node if I understood you correctly.
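For concreteness, here is roughly what I would try, first from an interactive shell on one compute node and then across two; the node names and the examples path below are placeholders for your setup:

    cd /path/to/openmpi-1.8.x/examples      # the source tree you built from
    mpicc ring_c.c -o ring_c                # wrappers from the install under test
    mpiexec -np 2 ./ring_c ; echo "exit: $?"
    mpiexec -np 2 -host gpu-k20-13,gpu-k20-14 ./ring_c ; echo "exit: $?"

A healthy run prints a few "Process ... sending/exiting" lines and exits 0, rather than the silent exit code 65 you reported.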
Josh

On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:

> Here are the requested files.
>
> In the archive, you will find the output of configure, make, and make
> install, as well as the config.log, the environment when running ring_c,
> and the output of ompi_info --all.
>
> Just as a reminder, the ring_c example compiled and ran, but produced no
> output and exited with code 65.
>
> Thanks,
>
> Maxime
>
> On 2014-08-14 15:26, Joshua Ladd wrote:
>
>> One more, Maxime, can you please make sure you've covered everything
>> here:
>>
>> http://www.open-mpi.org/community/help/
>>
>> Josh
>>
>> On Thu, Aug 14, 2014 at 3:18 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>
>>> And maybe include your LD_LIBRARY_PATH.
>>>
>>> Josh
>>>
>>> On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>>
>>> Can you try to run the example code "ring_c" across nodes?
>>>
>>> Josh
>>>
>>> On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault
>>> <maxime.boissonnea...@calculquebec.ca> wrote:
>>>
>>>> Yes, everything has been built with GCC 4.8.x, although x might have
>>>> changed between the OpenMPI 1.8.1 build and the Gromacs build. For
>>>> OpenMPI 1.8.2rc4, however, it was the exact same compiler for
>>>> everything.
>>>>
>>>> Maxime
>>>>
>>>> On 2014-08-14 14:57, Joshua Ladd wrote:
>>>>
>>>> Hmmm... weird. Seems like maybe a mismatch between libraries. Did you
>>>> build OMPI with the same compiler as you did GROMACS/Charm++?
>>>>
>>>> I'm stealing this suggestion from an old Gromacs forum post with
>>>> essentially the same symptom:
>>>>
>>>> "Did you compile Open MPI and Gromacs with the same compiler (i.e.
>>>> both gcc and the same version)? You write you tried different OpenMPI
>>>> versions and different GCC versions but it is unclear whether those
>>>> match. Can you provide more detail how you compiled (including all
>>>> options you specified)? Have you tested any other MPI program linked
>>>> against those Open MPI versions? Please make sure (e.g. with ldd) that
>>>> the MPI and pthread library you compiled against is also used for
>>>> execution. If you compiled and run on different hosts, check whether
>>>> the error still occurs when executing on the build host."
>>>>
>>>> http://redmine.gromacs.org/issues/1025
>>>>
>>>> Josh
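A concrete way to act on that ldd advice, sketched here with the mdrunmpi binary name that appears in the backtrace below; adjust names and paths to your setup:

    which mpicc mpirun && mpirun --version
    ldd $(which mdrunmpi) | grep -E 'libmpi|libpthread'
    echo $LD_LIBRARY_PATH | tr ':' '\n'

The libmpi.so line should resolve into the same prefix that was passed to OpenMPI's configure, both on the node where you build and on the nodes where the job actually runs.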
>>>>
>>>> On Thu, Aug 14, 2014 at 2:40 PM, Maxime Boissonneault
>>>> <maxime.boissonnea...@calculquebec.ca> wrote:
>>>>
>>>>> I just tried Gromacs with two nodes. It crashes, but with a different
>>>>> error. I get:
>>>>>
>>>>> [gpu-k20-13:142156] *** Process received signal ***
>>>>> [gpu-k20-13:142156] Signal: Segmentation fault (11)
>>>>> [gpu-k20-13:142156] Signal code: Address not mapped (1)
>>>>> [gpu-k20-13:142156] Failing at address: 0x8
>>>>> [gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
>>>>> [gpu-k20-13:142156] [ 1] /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
>>>>> [gpu-k20-13:142156] [ 2] /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
>>>>> [gpu-k20-13:142156] [ 3] /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
>>>>> [gpu-k20-13:142156] [ 4] /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
>>>>> [gpu-k20-13:142156] [ 5] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
>>>>> [gpu-k20-13:142156] [ 6] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
>>>>> [gpu-k20-13:142156] [ 7] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
>>>>> [gpu-k20-13:142156] [ 8] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
>>>>> [gpu-k20-13:142156] [ 9] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
>>>>> [gpu-k20-13:142156] [10] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
>>>>> [gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
>>>>> [gpu-k20-13:142156] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
>>>>> [gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
>>>>> [gpu-k20-13:142156] *** End of error message ***
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13
>>>>> exited on signal 11 (Segmentation fault).
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> We do not have MPI_THREAD_MULTIPLE enabled in our build, so Charm++
>>>>> cannot be using this level of threading. The configure line for
>>>>> OpenMPI was:
>>>>>
>>>>> ./configure --prefix=$PREFIX \
>>>>>     --with-threads --with-verbs=yes --enable-shared --enable-static \
>>>>>     --with-io-romio-flags="--with-file-system=nfs+lustre" \
>>>>>     --without-loadleveler --without-slurm --with-tm \
>>>>>     --with-cuda=$(dirname $(dirname $(which nvcc)))
>>>>>
>>>>> Maxime
>>>>>
>>>>> On 2014-08-14 14:20, Joshua Ladd wrote:
>>>>>
>>>>> What about between nodes? Since this is coming from the OpenIB BTL, it
>>>>> would be good to check this.
>>>>>
>>>>> Do you know what the MPI thread level is set to when used with the
>>>>> Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not
>>>>> thread safe.
>>>>>
>>>>> Josh
>>>>>
>>>>> On Thu, Aug 14, 2014 at 2:17 PM, Maxime Boissonneault
>>>>> <maxime.boissonnea...@calculquebec.ca> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I ran gromacs successfully with OpenMPI 1.8.1 and Cuda 6.0.37 on a
>>>>>> single node, with 8 ranks and multiple OpenMP threads.
>>>>>>
>>>>>> Maxime
>>>>>>
>>>>>> On 2014-08-14 14:15, Joshua Ladd wrote:
>>>>>>
>>>>>> Hi, Maxime
>>>>>>
>>>>>> Just curious, are you able to run a vanilla MPI program? Can you try
>>>>>> one of the example programs in the "examples" subdirectory? Looks
>>>>>> like a threading issue to me.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Josh

> --
> ---------------------------------
> Maxime Boissonneault
> Computational analyst - Calcul Québec, Université Laval
> Ph.D. in physics
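P.S. On the MPI_THREAD_MULTIPLE question in the quoted thread: a quick way to confirm what the installed build supports, assuming the ompi_info first in your PATH belongs to the 1.8.x install under test, is

    ompi_info | grep -i "thread support"

A build configured without --enable-mpi-thread-multiple should report "MPI_THREAD_MULTIPLE: no" there, which would mean Charm++ cannot actually be granted that level at runtime.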
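Also, since the new segfault is inside cuInit, reached through cudaDriverGetVersion while GROMACS prints its GPU version info, it may be worth ruling out a driver problem on that node independently of MPI. A rough sketch; gpu-k20-13 is taken from the backtrace, and the deviceQuery path is only a guess at where the CUDA 6.0 samples might have been built:

    ssh gpu-k20-13 nvidia-smi                              # does the driver answer at all?
    ssh gpu-k20-13 ls -l /usr/lib64/nvidia/libcuda.so.1    # the library the backtrace points into
    ssh gpu-k20-13 /software-gpu/cuda/6.0.37/samples/bin/x86_64/linux/release/deviceQuery

If those look clean, the way the library stack comes together at MPI launch time becomes the more likely suspect.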