Hi Geoff:

Our original implementation used cuMemcpy for copying GPU device memory into
and out of host memory.  However, we learned that cuMemcpy causes a
synchronization with all outstanding work on the GPU, which means that running
a kernel and doing communication could not overlap very well.  So now we create
an internal stream and use it, together with cuMemcpyAsync/cuStreamSynchronize,
to do the copy.
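
Roughly, the difference between the two paths looks like the sketch below.
This is only an illustration, not the actual Open MPI code; the buffer and
stream names are made up and error checking is omitted.

/* Sketch only (CUDA driver API): contrast the old cuMemcpy path with the
 * new cuMemcpyAsync + private-stream path.  Error checking omitted. */
#include <cuda.h>
#include <stdint.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUstream internal_stream;     /* the library's own stream */
    CUdeviceptr dev_buf;
    void *host_buf;
    size_t nbytes = 1 << 20;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuMemAlloc(&dev_buf, nbytes);
    cuMemHostAlloc(&host_buf, nbytes, 0);   /* pinned, so the async copy can overlap */
    cuStreamCreate(&internal_stream, CU_STREAM_NON_BLOCKING);

    /* Old path: cuMemcpy() serializes with all other GPU work, so the copy
     * waits for any running kernel to finish first. */
    cuMemcpy((CUdeviceptr)(uintptr_t)host_buf, dev_buf, nbytes);

    /* New path: queue the copy on the internal stream and wait only on that
     * stream; kernels on other streams can keep running. */
    cuMemcpyAsync((CUdeviceptr)(uintptr_t)host_buf, dev_buf, nbytes,
                  internal_stream);
    cuStreamSynchronize(internal_stream);

    cuStreamDestroy(internal_stream);
    cuMemFreeHost(host_buf);
    cuMemFree(dev_buf);
    cuCtxDestroy(ctx);
    return 0;
}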

It turns out that in Jeremia’s case, he wanted to have a long-running kernel
and he wanted the MPI_Send/MPI_Recv to happen at the same time.  With the use
of cuMemcpy, the MPI library was waiting for his kernel to complete before
doing the copy.
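
His pattern was roughly the following sketch (not his actual code; the kernel,
buffer names and sizes are invented, and it assumes a CUDA-aware build so that
device pointers can be passed straight to MPI_Send/MPI_Recv):

/* Sketch: a long-running kernel on its own stream while a blocking
 * MPI_Send/MPI_Recv moves a separate device buffer.  Error checking omitted. */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void long_running_kernel(float *data, int n)
{
    /* stand-in for real work */
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main(int argc, char **argv)
{
    int rank, n = 1 << 20;
    float *work_buf, *msg_buf;
    cudaStream_t compute_stream;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaMalloc((void **)&work_buf, n * sizeof(float));
    cudaMalloc((void **)&msg_buf, n * sizeof(float));
    cudaStreamCreate(&compute_stream);

    /* The kernel keeps running on its own stream ... */
    long_running_kernel<<<1, 256, 0, compute_stream>>>(work_buf, n);

    /* ... while MPI moves a device buffer between ranks 0 and 1.  With the
     * synchronous cuMemcpy path, the copy inside MPI waits for the kernel to
     * complete; with cuMemcpyAsync on MPI's internal stream it does not. */
    if (rank == 0)
        MPI_Send(msg_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(msg_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaStreamSynchronize(compute_stream);
    cudaStreamDestroy(compute_stream);
    cudaFree(work_buf);
    cudaFree(msg_buf);
    MPI_Finalize();
    return 0;
}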

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Geoffrey Paulsen
Sent: Wednesday, August 12, 2015 12:55 PM
To: us...@open-mpi.org
Cc: us...@open-mpi.org; Sameh S Sharkawi
Subject: Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's

I'm confused about why this application needs an asynchronous cuMemcpyAsync()
in a blocking MPI call.  Rolf, could you please explain?

And how is a call to cuMemcpyAsync() followed by a synchronization any
different from a cuMemcpy() in this use case?

I would still expect that if the MPI_Send/MPI_Recv call issued the
cuMemcpyAsync(), it would be MPI's responsibility to issue the synchronization
call as well.



---
Geoffrey Paulsen
Software Engineer, IBM Platform MPI
IBM Platform-MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com


----- Original message -----
From: Rolf vandeVaart <rvandeva...@nvidia.com>
Sent by: "users" <users-boun...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Cc:
Subject: Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's
Date: Tue, Aug 11, 2015 1:45 PM

I talked with Jeremia off list and we figured out what was going on.  There is
the ability to use cuMemcpyAsync/cuStreamSynchronize rather than cuMemcpy, but
it was never made the default in the Open MPI 1.8 series.  So, to get that
behavior you need the following:

--mca mpi_common_cuda_cumemcpy_async 1

It is too late to change this in 1.8, but it will be made the default behavior
in 1.10 and all future versions.  In addition, he is right that these variables
are not visible in the Open MPI 1.8 series.  That was a bug, and it has been
fixed in Open MPI v2.0.0.  Currently, there are no plans to bring that fix back
into 1.10.
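
For example, the full command line would look something like this (the process
count and executable name are just placeholders):

mpirun -np 2 --mca mpi_common_cuda_cumemcpy_async 1 ./your_cuda_app

On Open MPI v2.0.0 and later, where the variable is visible, you should be able
to confirm it is registered with something like:

ompi_info --all | grep cumemcpy_async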

Rolf

>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeremia Bär
>Sent: Tuesday, August 11, 2015 9:17 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's
>
>Hi!
>
>In my current application, MPI_Send/MPI_Recv hangs when using buffers in
>GPU device memory of an Nvidia GPU. I realized this is due to the fact that
>OpenMPI uses the synchronous cuMemcpy rather than the asynchronous
>cuMemcpyAsync (see stack trace at the bottom). However, in my application,
>synchronous copies cannot be used.
>
>I scanned through the source and saw that support for async memcpys is
>available. It's controlled by 'mca_common_cuda_cumemcpy_async' in
>./ompi/mca/common/cuda/common_cuda.c
>However, I can't find a way to enable it. It's not exposed in 'ompi_info' (but
>registered?). How can I enforce the use of cuMemcpyAsync in OpenMPI?
>Version used is OpenMPI 1.8.5.
>
>Thank you,
>Jeremia
>
>(gdb) bt
>#0  0x00002aaaaaaaba11 in clock_gettime ()
>#1  0x00000039e5803e46 in clock_gettime () from /lib64/librt.so.1
>#2  0x00002aaaab58a7ae in ?? () from /usr/lib64/libcuda.so.1
>#3  0x00002aaaaaf41dfb in ?? () from /usr/lib64/libcuda.so.1
>#4  0x00002aaaaaf1f623 in ?? () from /usr/lib64/libcuda.so.1
>#5  0x00002aaaaaf17361 in ?? () from /usr/lib64/libcuda.so.1
>#6  0x00002aaaaaf180b6 in ?? () from /usr/lib64/libcuda.so.1
>#7  0x00002aaaaae860c2 in ?? () from /usr/lib64/libcuda.so.1
>#8  0x00002aaaaae8621a in ?? () from /usr/lib64/libcuda.so.1
>#9  0x00002aaaaae69d85 in cuMemcpy () from /usr/lib64/libcuda.so.1
>#10 0x00002aaaaf0a7dea in mca_common_cuda_cu_memcpy () from
>/home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmca_common_c
>uda.so.1
>#11 0x00002aaaac992544 in opal_cuda_memcpy () from
>/home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#12 0x00002aaaac98adf7 in opal_convertor_pack () from
>/home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#13 0x00002aaab167c611 in mca_pml_ob1_send_request_start_copy () from
>/home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pm
>l_ob1.so
>#14 0x00002aaab167353f in mca_pml_ob1_send () from
>/home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pm
>l_ob1.so
>#15 0x00002aaaabf4f322 in PMPI_Send () from
>/users/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmpi.so.1
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: http://www.open-mpi.org/community/lists/users/2015/08/27424.php
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/08/27431.php

