Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
very friendly way to handle that error. -Original Message- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault Sent: Tuesday, August 19, 2014 10:39 AM To: Open MPI Users Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes Hi, I believe I found

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Rolf vandeVaart
Open MPI Users >Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes > >Hi, >I believe I found what the problem was. My script set the >CUDA_VISIBLE_DEVICES based on the content of $PBS_GPUFILE. Since the >GPUs were listed twice in the $PBS_GPUFILE because of the two n
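The fix Maxime describes, de-duplicating the GPU list taken from $PBS_GPUFILE before exporting CUDA_VISIBLE_DEVICES, might be sketched as below. The one-GPU-per-line "host-gpuN" file format and the hostname are assumptions for illustration, not necessarily the exact format your PBS/Torque installation writes:

```shell
# Hypothetical sketch: build CUDA_VISIBLE_DEVICES from a PBS GPU file,
# dropping the duplicate entries that appear when each GPU is listed twice.
host=gpu-k20-07                    # normally $(hostname)
gpufile=$(mktemp)                  # stands in for $PBS_GPUFILE
cat > "$gpufile" <<'EOF'
gpu-k20-07-gpu0
gpu-k20-07-gpu1
gpu-k20-07-gpu0
gpu-k20-07-gpu1
EOF
# Keep this host's lines, strip the prefix to get the GPU index,
# de-duplicate numerically, and join the indices with commas.
CUDA_VISIBLE_DEVICES=$(sed -n "s/^${host}-gpu//p" "$gpufile" | sort -un | paste -sd, -)
export CUDA_VISIBLE_DEVICES
echo "$CUDA_VISIBLE_DEVICES"       # 0,1
rm -f "$gpufile"
```

With the duplicates removed, each rank sees each device index once instead of twice, which avoids two ranks landing on the same GPU by accident.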

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
-Original Message- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault Sent: Tuesday, August 19, 2014 8:55 AM To: Open MPI Users Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes Hi, I recompiled OMPI 1.8.1 without Cuda and with debug, but it did not

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Rolf vandeVaart
2014 8:55 AM >To: Open MPI Users >Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes > >Hi, >I recompiled OMPI 1.8.1 without Cuda and with debug, but it did not give me >much more information. >[mboisson@gpu-k20-07 simple_cuda_mpi]$ ompi_info | grep debug >

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
Hi, I recompiled OMPI 1.8.1 without Cuda and with debug, but it did not give me much more information. [mboisson@gpu-k20-07 simple_cuda_mpi]$ ompi_info | grep debug Prefix: /software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda Internal debug support: yes Memory debugging supp

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Maxime Boissonneault
: [OMPI users] Segfault with MPI + Cuda on multiple nodes Same thing : [mboisson@gpu-k20-07 simple_cuda_mpi]$ export MALLOC_CHECK_=1 [mboisson@gpu-k20-07 simple_cuda_mpi]$ mpiexec -np 2 --map-by ppr:1:node cudampi_simple malloc: using debugging hooks malloc: using debugging hooks [gpu-k20-07:47628

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-19 Thread Alex A. Granovsky
] Segfault with MPI + Cuda on multiple nodes Same thing : [mboisson@gpu-k20-07 simple_cuda_mpi]$ export MALLOC_CHECK_=1 [mboisson@gpu-k20-07 simple_cuda_mpi]$ mpiexec -np 2 --map-by ppr:1:node cudampi_simple malloc: using debugging hooks malloc: using debugging hooks [gpu-k20-07:47628] *** Process

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-18 Thread Maxime Boissonneault
It's building... to be continued tomorrow morning. On 2014-08-18 16:45, Rolf vandeVaart wrote: Just to help reduce the scope of the problem, can you retest with a non-CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the configure line to help with the stack trace?

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-18 Thread Maxime Boissonneault
Same thing : [mboisson@gpu-k20-07 simple_cuda_mpi]$ export MALLOC_CHECK_=1 [mboisson@gpu-k20-07 simple_cuda_mpi]$ mpiexec -np 2 --map-by ppr:1:node cudampi_simple malloc: using debugging hooks malloc: using debugging hooks [gpu-k20-07:47628] *** Process received signal *** [gpu-k20-07:47628] Si

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-18 Thread Alex A. Granovsky
Try the following: export MALLOC_CHECK_=1 and then run it again Kind regards, Alex Granovsky -Original Message- From: Maxime Boissonneault Sent: Tuesday, August 19, 2014 12:23 AM To: Open MPI Users Subject: [OMPI users] Segfault with MPI + Cuda on multiple nodes Hi, Since my previ
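For reference, glibc's MALLOC_CHECK_ variable (as implemented in the glibc versions current at the time of this thread) enables extra heap-consistency checking: 1 prints a diagnostic on stderr when corruption is detected, 2 calls abort() immediately, and 3 does both. A minimal sketch of the suggested run:

```shell
# Enable glibc heap checking: 1 = print a diagnostic, 2 = abort, 3 = both.
export MALLOC_CHECK_=1
# Then re-run the failing program under the same launch command, e.g.
# (command taken from the thread):
#   mpiexec -np 2 --map-by ppr:1:node cudampi_simple
echo "MALLOC_CHECK_=$MALLOC_CHECK_"
```

The "malloc: using debugging hooks" lines in the later replies confirm the setting took effect in both ranks.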

Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

2014-08-18 Thread Rolf vandeVaart
Just to help reduce the scope of the problem, can you retest with a non-CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the configure line to help with the stack trace? >-Original Message- >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime >Boissonn
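The rebuild Rolf suggests, a non-CUDA-aware Open MPI 1.8.1 configured with --enable-debug, might look like the sketch below. The prefix matches the one Maxime later reports in his ompi_info output; the make parallelism is illustrative:

```shell
# Sketch of a debug, non-CUDA build of Open MPI 1.8.1 (run in the source tree).
./configure --prefix=/software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda \
            --enable-debug        # extra runtime checks + symbols for stack traces
make -j8 && make install

# Confirm the debug build afterwards:
ompi_info | grep -i debug
```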