Hello everyone,

In my application I use CUDA-aware OpenMPI 1.10.2 together with CUDA 7.5. If I
call cudaSetDevice(), cuda-memcheck reports this error for all subsequent MPI
function calls:

========= CUDA-MEMCHECK
========= Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttributes.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib64/libcuda.so.1 (cuPointerGetAttributes + 0x18d) [0x144ffd]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 [0xb0f52]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-pal.so.13 (mca_cuda_convertor_init + 0xac) [0x3cbcc]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libopen-pal.so.13 (opal_convertor_prepare_for_recv + 0x25) [0x33f65]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (mca_pml_ob1_recv_req_start + 0x15e) [0x1b487e]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (mca_pml_ob1_irecv + 0xc4) [0x1ab464]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (ompi_coll_tuned_barrier_intra_recursivedoubling + 0xde) [0x13d79e]
=========     Host Frame:/modules/opt/spack/linux-x86_64/gcc-5.3.0/openmpicuda-1.10.2-y246ecjlhmkkh7lhbrgvdwpazc4mgetr/lib/libmpi.so.12 (MPI_Barrier + 0x72) [0x86eb2]
=========     Host Frame:./Errortest [0x2cb3]
=========     Host Frame:/usr/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
=========     Host Frame:./Errortest [0x2b99]
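
For reference, the trace above comes from wrapping each MPI rank in cuda-memcheck, i.e. an invocation roughly along these lines (the exact rank count should not matter):

mpirun -np 2 cuda-memcheck ./Errortest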

A minimal example that reproduces the error on my system is:
#include <mpi.h>
#include <cuda_runtime.h>   /* for cudaSetDevice() */

int main(int argc, char *argv[]) {

    MPI_Init(&argc, &argv);

    /* selecting a device here triggers the cuda-memcheck errors above */
    cudaSetDevice(0);

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

I see the same behavior when the order of cudaSetDevice() and MPI_Init() is
swapped (the variant is sketched below). How can I avoid these errors and still
select the GPU to work on?
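
The swapped variant I mean is simply the same source with the two calls reordered:

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char *argv[]) {

    /* selecting the device before MPI_Init() makes no difference:
       cuda-memcheck reports the same cuPointerGetAttributes errors */
    cudaSetDevice(0);

    MPI_Init(&argc, &argv);

    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}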

Thank you, 
Kristina

