Hi Kristina,

Although it's reported by cuda-memcheck as an error, it really is an expected return code from cuPointerGetAttributes. The CUDA-aware build of OpenMPI calls cuPointerGetAttributes to query whether a pointer is a host or a device pointer. Memory allocated with the system allocator (malloc, global, stack, and static data) is not part of the Unified Virtual Address Space (UVA) known to the driver, and therefore cuPointerGetAttributes returns CUDA_ERROR_INVALID_VALUE for those pointers. For OpenMPI this simply means that the buffer is a host pointer, so these reports can safely be ignored.
Hope this helps,
Jiri

> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of users-requ...@open-mpi.org
> Sent: Wednesday, 8 June 2016 18:00
> To: us...@open-mpi.org
> Subject: users Digest, Vol 3525, Issue 3
>
> 1. cuda-memcheck reports errors for MPI functions after use of
>    cudaSetDevice (Kristina Tesch)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 8 Jun 2016 14:59:24 +0200
> From: Kristina Tesch <kristina.te...@gmx.de>
> To: us...@open-mpi.org
> Subject: [OMPI users] cuda-memcheck reports errors for MPI functions
>     after use of cudaSetDevice
> Message-ID: <2efdd862-c5b1-44f1-9710-932fa7411...@gmx.de>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello everyone,
>
> in my application I use CUDA-aware OpenMPI 1.10.2 together with CUDA
> 7.5. If I call cudaSetDevice(), cuda-memcheck reports this error for all
> subsequent MPI function calls:
>
> ========= CUDA-MEMCHECK
> ========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttributes.
> ========= Saved host backtrace up to driver entry point at error
> ========= Host Frame:/usr/lib64/libcuda.so.1 (cuPointerGetAttributes + 0x18d) [0x144ffd]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 [0xb0f52]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-pal.so.13 (mca_cuda_convertor_init + 0xac) [0x3cbcc]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-pal.so.13 (opal_convertor_prepare_for_recv + 0x25) [0x33f65]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (mca_pml_ob1_recv_req_start + 0x15e) [0x1b487e]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (mca_pml_ob1_irecv + 0xc4) [0x1ab464]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (ompi_coll_tuned_barrier_intra_recursivedoubling + 0xde) [0x13d79e]
> ========= Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (MPI_Barrier + 0x72) [0x86eb2]
> ========= Host Frame:./Errortest [0x2cb3]
> ========= Host Frame:/usr/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
> ========= Host Frame:./Errortest [0x2b99]
>
> A minimal example that reproduces the error on my system is:
>
> #include <mpi.h>
> #include <cuda_runtime.h>  /* declares cudaSetDevice() */
>
> int main(int argc, char *argv[]) {
>
>     MPI_Init(&argc, &argv);
>
>     cudaSetDevice(0);
>
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
> I find the same behavior when cudaSetDevice() is swapped with MPI_Init().
> How can I avoid these errors and still select the GPU to work on?
>
> Thank you,
> Kristina

NVIDIA GmbH, Wuerselen, Germany, Amtsgericht Aachen, HRB 8361
Managing Director: Karen Theresa Burns