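Short and portable answer: yes, you need to call
cudaStreamSynchronize(stream) before the MPI_Isend, or you may send garbage
data. The kernel launch is asynchronous and MPI knows nothing about your
stream, so nothing else guarantees that Kernel1 has finished writing sbuf
when MPI reads it.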
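
As a minimal sketch of the portable approach, the program below fills a
device buffer on a user stream, synchronizes that stream, and only then
posts the send. The kernel name (fill), the buffer size and the two-rank
layout are made up for illustration; it assumes a CUDA-aware MPI build and
at least two ranks.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical stand-in for Kernel1: writes the data that will be sent.
__global__ void fill(double *buf, int n, double value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int n = 1 << 20;
    double *sbuf = NULL;
    cudaMalloc((void **)&sbuf, n * sizeof(double));
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    if (rank == 0) {
        fill<<<(n + 255) / 256, 256, 0, stream>>>(sbuf, n, 42.0);

        // The launch above returns immediately. Wait for the kernel to
        // finish so that sbuf is complete before MPI reads it.
        cudaStreamSynchronize(stream);

        MPI_Request req;
        MPI_Isend(sbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);

        // Kernels that do not write to sbuf (the Kernel2 of the question)
        // could be launched here and overlap with the send.

        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(sbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        double first = 0.0;
        cudaMemcpy(&first, sbuf, sizeof(double), cudaMemcpyDeviceToHost);
        printf("rank 1: first element received = %f\n", first);
    }

    cudaStreamDestroy(stream);
    cudaFree(sbuf);
    MPI_Finalize();
    return 0;
}
```

Something like "nvcc sync_isend.cu -lmpi" followed by "mpirun -np 2 ./a.out"
should build and run it, although the exact include and library flags depend
on the installation.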

If you are willing to accept a less portable solution, you can obtain the
streams Open MPI uses internally and launch your kernels on them, so that
in-stream ordering guarantees the correctness of your Isend. Open MPI has
two hidden CUDA streams, one for device-to-host and one for host-to-device
transfers, which can be queried with functions that are not part of the MPI
standard: mca_common_cuda_get_dtoh_stream and
mca_common_cuda_get_htod_stream.
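
A rough sketch of what that could look like follows. Treat it as an
illustration under assumptions, not a recipe: the prototype declared here
(no arguments, returning an opaque pointer that is cast to cudaStream_t) is
assumed rather than taken from a public header, the symbol may not be
exported by every Open MPI build, and extra link flags may be needed. The
kernel name and sizes are made up. If the stream is not available, the
sketch falls back to an explicit synchronization.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

// ASSUMPTION: internal Open MPI function, not declared in mpi.h. The
// prototype below is assumed and may differ between Open MPI versions.
extern "C" void *mca_common_cuda_get_dtoh_stream(void);

// Hypothetical stand-in for Kernel1.
__global__ void fill(double *buf, int n, double value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int n = 1 << 20;
    double *sbuf = NULL;
    cudaMalloc((void **)&sbuf, n * sizeof(double));

    if (rank == 0) {
        // Ask Open MPI for its device-to-host stream. It may be NULL if
        // the library has not set up its CUDA support yet (assumption).
        cudaStream_t ompi_stream =
            (cudaStream_t)mca_common_cuda_get_dtoh_stream();

        if (ompi_stream != NULL) {
            // Enqueue the producer kernel on Open MPI's own stream so that
            // work Open MPI queues there later is ordered after it.
            fill<<<(n + 255) / 256, 256, 0, ompi_stream>>>(sbuf, n, 42.0);
        } else {
            // Portable fallback: default stream plus an explicit sync.
            fill<<<(n + 255) / 256, 256>>>(sbuf, n, 42.0);
            cudaDeviceSynchronize();
        }

        MPI_Request req;
        MPI_Isend(sbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(sbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(sbuf);
    MPI_Finalize();
    return 0;
}
```

Code written this way is tied to Open MPI internals and will not build
against other MPI implementations, which is the portability cost mentioned
above.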

George.


On Wed, Nov 27, 2019 at 4:02 PM Zhang, Junchao via users <
users@lists.open-mpi.org> wrote:

> Hi,
>   Suppose I have this piece of code and I use cuda-aware MPI,
>               cudaMalloc(&sbuf,sz);
>
>    Kernel1<<<...,stream>>>(...,sbuf);
>    MPI_Isend(sbuf,...);
>    Kernel2<<<...,stream>>>();
>
>
>   Do I need to call cudaStreamSynchronize(stream) before MPI_Isend() to
> make sure data in sbuf is ready to send?  If not, why?
>
>   Thank you.
>
> --Junchao Zhang
>
