>> return 1;
>> }
>> }
>>
>> for (int i = 0; i < num_threads; i++) {
>>   if (pthread_join(threads[i], NULL)) {
>>     fprintf(stderr, "Error joining thread\n");
Sent: Wednesday, November 27, 2019 5:43 PM
To: George Bosilca <bosi...@icl.utk.edu>
Cc: Zhang, Junchao <jczh...@mcs.anl.gov>; Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] CUDA mpi question
I was pointed to "2.7. Synchronization and Memory Ordering" of
https://docs.nvidia.com/pdf/GPUDirect_RDMA.pdf. It is on topic, but
unfortunately it is too short and I could not understand it.
I also checked cudaStreamAddCallback/cudaLaunchHostFunc, whose documentation
says the host function "must not make any CUDA API calls".
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote:
> On Wed, Nov 27, 2019 at 3:16 PM George Bosilca wrote:
>
>> Short and portable answer: you need to sync before the Isend or you will
>> send garbage data.
>>
> Ideally, I want to formulate my code into a series of asynchronous "kernel
> launch, kernel launch, ..." without synchronization.
Short and portable answer: you need to sync before the Isend or you will
send garbage data.
Assuming you are willing to go for a less portable solution you can get the
OMPI streams and add your kernels inside, so that the sequential order will
guarantee correctness of your isend. We have 2 hidden ...
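For reference, a minimal sketch of the portable approach described above
("sync before the Isend"), again assuming a CUDA-aware Open MPI so the device
buffer can be passed to MPI directly; compute_step, the ranks, and the message
size are placeholders. The kernels are chained on one stream, and a single
cudaStreamSynchronize on that stream before MPI_Isend is what guarantees the
buffer is complete; without it the send may pick up stale data.

// Sketch only: hypothetical names, assumes CUDA-aware Open MPI and 2 ranks.
#include <mpi.h>
#include <cuda_runtime.h>

#define N 1024

__global__ void compute_step(double *buf, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) buf[i] += 1.0;
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double *d_buf;
  cudaStream_t s;
  cudaStreamCreate(&s);
  cudaMalloc((void **)&d_buf, N * sizeof(double));
  cudaMemsetAsync(d_buf, 0, N * sizeof(double), s);

  // "kernel launch, kernel launch, ..." all ordered on the same stream
  compute_step<<<(N + 255) / 256, 256, 0, s>>>(d_buf, N);
  compute_step<<<(N + 255) / 256, 256, 0, s>>>(d_buf, N);

  // One synchronization before the Isend so the buffer is fully written
  cudaStreamSynchronize(s);

  MPI_Request req;
  if (rank == 0)
    MPI_Isend(d_buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
  else
    MPI_Irecv(d_buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
  MPI_Wait(&req, MPI_STATUS_IGNORE);

  cudaFree(d_buf);
  cudaStreamDestroy(s);
  MPI_Finalize();
  return 0;
}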