Hi Kristina,

Although cuda-memcheck reports it as an error, this really is an expected 
return code from cuPointerGetAttributes. The CUDA-aware build of OpenMPI calls 
cuPointerGetAttributes to determine whether a pointer is a host or a device 
pointer. Memory allocated with the system allocator (malloc, global, stack and 
static data) is not part of the Unified Virtual Address Space (UVA) known to 
the driver, so cuPointerGetAttributes returns CUDA_ERROR_INVALID_VALUE for 
such pointers. For OpenMPI this simply means that the pointer is a host pointer.
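
As a rough illustration of that decision (a sketch only, not OpenMPI's actual 
code), the logic can be written in C as shown here. The SKETCH_* constants are 
local stand-ins that mirror the values in cuda.h so the snippet compiles 
without the CUDA toolkit, and classify_pointer is a hypothetical helper. Real 
code would pass in the CUresult and CUmemorytype obtained from the driver query.

```c
/* Local stand-ins mirroring values from <cuda.h> (assumption: used here only
 * so the sketch builds without the CUDA toolkit; real code should use the
 * driver's own CUresult and CUmemorytype definitions). */
enum {
    SKETCH_CUDA_SUCCESS             = 0, /* CUDA_SUCCESS */
    SKETCH_CUDA_ERROR_INVALID_VALUE = 1  /* CUDA_ERROR_INVALID_VALUE */
};
enum {
    SKETCH_MEMORYTYPE_HOST   = 1, /* CU_MEMORYTYPE_HOST */
    SKETCH_MEMORYTYPE_DEVICE = 2  /* CU_MEMORYTYPE_DEVICE */
};

typedef enum { PTR_HOST, PTR_DEVICE, PTR_QUERY_FAILED } ptr_kind;

/* Hypothetical helper: interpret the outcome of a pointer-attribute query.
 * query_result is the code returned by the driver call; memory_type is only
 * meaningful when the query succeeded. */
static ptr_kind classify_pointer(int query_result, unsigned memory_type)
{
    /* The driver does not know the pointer (malloc, stack, static data):
     * this is the expected case for plain host memory. */
    if (query_result == SKETCH_CUDA_ERROR_INVALID_VALUE)
        return PTR_HOST;

    /* Any other failure is a genuine error and should be reported. */
    if (query_result != SKETCH_CUDA_SUCCESS)
        return PTR_QUERY_FAILED;

    /* The driver knows the pointer. In this sketch, anything that is not
     * device memory is treated as host memory. */
    return memory_type == SKETCH_MEMORYTYPE_DEVICE ? PTR_DEVICE : PTR_HOST;
}
```

Read this way, the "error 1" in the cuda-memcheck log corresponds to the first 
branch: the query fails with CUDA_ERROR_INVALID_VALUE and the buffer is handled 
as ordinary host memory.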

Hope this helps

Jiri

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of users-
> requ...@open-mpi.org
> Sent: Wednesday, June 8, 2016 18:00
> To: us...@open-mpi.org
> Subject: users Digest, Vol 3525, Issue 3
>    1. cuda-memcheck reports errors for MPI functions after    use of
>       cudaSetDevice (Kristina Tesch)
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 8 Jun 2016 14:59:24 +0200
> From: Kristina Tesch <kristina.te...@gmx.de>
> To: us...@open-mpi.org
> Subject: [OMPI users] cuda-memcheck reports errors for MPI functions
>       after   use of cudaSetDevice
> Message-ID: <2efdd862-c5b1-44f1-9710-932fa7411...@gmx.de>
> Content-Type: text/plain; charset="us-ascii"
> 
> Hello everyone,
> 
> in my application I use CUDA-aware OpenMPI 1.10.2 together with CUDA
> 7.5. If I call cudaSetDevice() cuda-memcheck reports this error for all
> subsequent MPI function calls:
> 
> ========= CUDA-MEMCHECK
> ========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to
> "invalid argument" on CUDA API call to cuPointerGetAttributes.
> =========     Saved host backtrace up to driver entry point at error
> =========     Host Frame:/usr/lib64/libcuda.so.1 (cuPointerGetAttributes +
> 0x18d) [0x144ffd]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-
> y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 [0xb0f52]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-
> pal.so.13 (mca_cuda_convertor_init + 0xac) [0x3cbcc]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-
> pal.so.13 (opal_convertor_prepare_for_recv + 0x25) [0x33f65]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-
> y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12
> (mca_pml_ob1_recv_req_start + 0x15e) [0x1b487e]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-
> y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (mca_pml_ob1_irecv +
> 0xc4) [0x1ab464]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-
> y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12
> (ompi_coll_tuned_barrier_intra_recursivedoubling + 0xde) [0x13d79e]
> =========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-
> 5.3.0/openmpicuda-1.10.2-
> y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (MPI_Barrier + 0x72)
> [0x86eb2]
> =========     Host Frame:./Errortest [0x2cb3]
> =========     Host Frame:/usr/lib64/libc.so.6 (__libc_start_main + 0xf5)
> [0x21b15]
> =========     Host Frame:./Errortest [0x2b99]
> 
> A minimal example that reproduces the error on my system is:
> #include <mpi.h>
> #include <cuda_runtime.h>
> 
> int main(int argc, char *argv[]) {
> 
>     MPI_Init(&argc, &argv);
> 
>     cudaSetDevice(0);
> 
>     MPI_Barrier(MPI_COMM_WORLD);
> 
>     MPI_Finalize();
>     return 0;
> }
> 
> I find the same behavior when cudaSetDevice() is swapped with MPI_Init().
> How can I avoid these errors and still select the GPU to work on?
> 
> Thank you,
> Kristina
> 

