Hi there,

I am issuing multiple non-blocking MPI sends and receives on GPU buffers,
followed by an MPI_Waitall at the end; I also repeat this process multiple times.

The Open MPI version that I am using is 1.10.2.

When multiple processes are assigned to a single GPU (i.e., when CUDA IPC is
used), I get the following error at the beginning:

The call to cuIpcGetEventHandle failed. This is a unrecoverable error and will
cause the program to abort.
  cuIpcGetEventHandle return value:   1

and this one at the end of my benchmark:

The call to cuEventDestory failed. This is a unrecoverable error and will
cause the program to abort.
  cuEventDestory return value:   400
Check the cuda.h file for what the return value means.


*Note 1:*

This error doesn't appear if only one iteration of the non-blocking
send/receive calls is run (i.e., MPI_Waitall is called only once).

It also doesn't appear if multiple iterations are run but MPI_Waitall is
not called.

*Note 2:*

This error doesn't appear if the buffers are allocated on the host.

*Note 3:*

This error doesn't appear if cuda_ipc is disabled or if Open MPI version
1.8.8 is used.


I'd appreciate it if you could let me know what causes this issue and how it
can be resolved.

Regards,
Iman
