[...] very friendly way to handle that error.

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault
Sent: Tuesday, August 19, 2014 10:39 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

>Hi,
>I believe I found what the problem was. My script set
>CUDA_VISIBLE_DEVICES based on the content of $PBS_GPUFILE. Since the
>GPUs were listed twice in the $PBS_GPUFILE because of the two nodes [...]
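
(A minimal sketch of that kind of setup with the duplicates filtered out. This is not Maxime's actual script, and it assumes $PBS_GPUFILE lists one "<hostname>-gpu<N>" entry per allocated GPU, which depends on the Torque/Moab configuration:)

# Hypothetical job-script fragment: build CUDA_VISIBLE_DEVICES from the
# $PBS_GPUFILE entries that belong to this node, keeping each GPU id only once.
GPU_IDS=$(grep "^$(hostname)" "$PBS_GPUFILE" \
          | sed 's/.*gpu//' \
          | sort -un \
          | tr '\n' ',' | sed 's/,$//')
export CUDA_VISIBLE_DEVICES="$GPU_IDS"
echo "Node $(hostname): CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"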

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Maxime Boissonneault
Sent: Tuesday, August 19, 2014 8:55 AM
To: Open MPI Users
Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

Hi,
I recompiled OMPI 1.8.1 without Cuda and with debug, but it did not give me much more information.

[mboisson@gpu-k20-07 simple_cuda_mpi]$ ompi_info | grep debug
  Prefix: /software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda
  Internal debug support: yes
  Memory debugging support: [...]
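
(For reference, a configure line along these lines would produce such a non-CUDA, debug-enabled build under that prefix. The exact options Maxime used are not shown in the thread, so treat this as a sketch:)

# Sketch only: a debug build of Open MPI 1.8.1 without CUDA support,
# installed under the prefix that ompi_info reported above.
./configure --prefix=/software-gpu/mpi/openmpi/1.8.1-debug_gcc4.8_nocuda \
            --enable-debug \
            CC=gcc CXX=g++ FC=gfortran
make -j8 && make install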

Subject: Re: [OMPI users] Segfault with MPI + Cuda on multiple nodes

Same thing :
[mboisson@gpu-k20-07 simple_cuda_mpi]$ export MALLOC_CHECK_=1
[mboisson@gpu-k20-07 simple_cuda_mpi]$ mpiexec -np 2 --map-by ppr:1:node cudampi_simple
malloc: using debugging hooks
malloc: using debugging hooks
[gpu-k20-07:47628] *** Process received signal ***
[gpu-k20-07:47628] Signal: Segmentation fault (11)
[...]
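
(As an aside on that knob: glibc's MALLOC_CHECK_ accepts values 0 to 3. A value of 1, as suggested, only prints a diagnostic when heap corruption is detected; 2 aborts immediately; 3 does both, which tends to stop the program closer to the corrupting allocation. A variant of the run above, not something tried in the thread:)

# Sketch: same run, but abort at the first detected heap corruption.
export MALLOC_CHECK_=3
mpiexec -np 2 --map-by ppr:1:node cudampi_simple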

It's building... to be continued tomorrow morning.

On 2014-08-18 16:45, Rolf vandeVaart wrote:
>Just to help reduce the scope of the problem, can you retest with a non
>CUDA-aware Open MPI 1.8.1? And if possible, use --enable-debug in the
>configure line to help with the stack trace?
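
(One easy pitfall when switching between builds is running the test against a different mpiexec than intended. A quick way to check which installation is being picked up and whether it was built with CUDA support; the mpi_built_with_cuda_support parameter should exist in the 1.7/1.8 series, but treat the exact output as an assumption:)

# Sketch: confirm which Open MPI install is on PATH and whether it is CUDA-aware.
which mpiexec
ompi_info | grep "Prefix:"
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value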

Try the following:
export MALLOC_CHECK_=1
and then run it again

Kind regards,
Alex Granovsky

-----Original Message-----
From: Maxime Boissonneault
Sent: Tuesday, August 19, 2014 12:23 AM
To: Open MPI Users
Subject: [OMPI users] Segfault with MPI + Cuda on multiple nodes

Hi,
Since my previous [...]