Hi Geoff: Our original implementation used cuMemcpy for copying GPU memory into and out of host memory. However, we learned that cuMemcpy causes a synchronization of all work on the GPU. This means that running a kernel and doing communication could not be overlapped very well. So, now we create an internal stream and use it with cuMemcpyAsync/cuStreamSynchronize to do the copy.
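(For illustration, here is a minimal sketch of that pattern using the CUDA driver API: the copy is queued on a dedicated non-blocking stream and only that stream is waited on, so work running on other streams is not stalled. The names, sizes, and structure below are illustrative only, not Open MPI's actual internals; build with -lcuda.)

/* Minimal sketch (not Open MPI's internals) of the copy path described
 * above: a dedicated non-blocking stream plus cuMemcpyAsync/
 * cuStreamSynchronize, so a device-to-host copy waits only on its own
 * stream instead of synchronizing the whole device the way cuMemcpy does. */
#include <cuda.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { CUresult e_ = (call); if (e_ != CUDA_SUCCESS) { \
    fprintf(stderr, "CUDA error %d at line %d\n", (int)e_, __LINE__); exit(1); } } while (0)

static CUstream internal_stream;   /* created once, reused for every copy */

/* Copy 'bytes' from device to host without stalling other streams. */
static void copy_device_to_host(void *host_dst, CUdeviceptr dev_src, size_t bytes)
{
    /* With unified addressing (64-bit Linux), a host pointer can be passed
     * to cuMemcpyAsync as a CUdeviceptr. */
    CHECK(cuMemcpyAsync((CUdeviceptr)(uintptr_t)host_dst, dev_src, bytes, internal_stream));
    CHECK(cuStreamSynchronize(internal_stream));  /* wait for this stream only */
}

int main(void)
{
    CUdevice dev; CUcontext ctx; CUdeviceptr dbuf;
    void *hbuf;
    size_t bytes = 1 << 20;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));
    CHECK(cuStreamCreate(&internal_stream, CU_STREAM_NON_BLOCKING));
    CHECK(cuMemAllocHost(&hbuf, bytes));   /* pinned host staging buffer */
    CHECK(cuMemAlloc(&dbuf, bytes));

    /* A kernel running on some other stream keeps going while this copy runs. */
    copy_device_to_host(hbuf, dbuf, bytes);

    CHECK(cuMemFree(dbuf));
    CHECK(cuMemFreeHost(hbuf));
    CHECK(cuStreamDestroy(internal_stream));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}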
It turns out that in Jeremia's case, he wanted a long-running kernel and he wanted the MPI_Send/MPI_Recv to happen at the same time. With cuMemcpy, the MPI library was waiting for his kernel to complete before doing the cuMemcpy. (A sketch of this overlap pattern appears after the quoted thread below.)

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Geoffrey Paulsen
Sent: Wednesday, August 12, 2015 12:55 PM
To: us...@open-mpi.org
Cc: us...@open-mpi.org; Sameh S Sharkawi
Subject: Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's

I'm confused why this application needs an asynchronous cuMemcpyAsync() in a blocking MPI call. Rolf, could you please explain? And how is a call to cuMemcpyAsync() followed by a synchronization any different from a cuMemcpy() in this use case? I would still expect that if the MPI_Send/MPI_Recv call issued the cuMemcpyAsync(), it would be MPI's responsibility to issue the synchronization call as well.

---
Geoffrey Paulsen
Software Engineer, IBM Platform-MPI
Phone: 720-349-2832
Email: gpaul...@us.ibm.com
www.ibm.com

----- Original message -----
From: Rolf vandeVaart <rvandeva...@nvidia.com>
Sent by: "users" <users-boun...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Cc:
Subject: Re: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's
Date: Tue, Aug 11, 2015 1:45 PM

I talked with Jeremia off list and we figured out what was going on. There is the ability to use cuMemcpyAsync/cuStreamSynchronize rather than cuMemcpy, but it was never made the default for the Open MPI 1.8 series. So, to get that behavior you need the following:

--mca mpi_common_cuda_cumemcpy_async 1

It is too late to change this in 1.8, but it will be made the default behavior in 1.10 and all future versions. In addition, he is right about not being able to see these variables in the Open MPI 1.8 series. This was a bug and it has been fixed in Open MPI v2.0.0. Currently, there are no plans to bring that fix back into 1.10.

Rolf

>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeremia Bär
>Sent: Tuesday, August 11, 2015 9:17 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's
>
>Hi!
>
>In my current application, MPI_Send/MPI_Recv hangs when using buffers in
>GPU device memory of an NVIDIA GPU. I realized this is due to the fact that
>Open MPI uses the synchronous cuMemcpy rather than the asynchronous
>cuMemcpyAsync (see the stack trace at the bottom). However, in my
>application, synchronous copies cannot be used.
>
>I scanned through the source and saw that support for async memcpy's is
>available. It is controlled by 'mca_common_cuda_cumemcpy_async' in
>./ompi/mca/common/cuda/common_cuda.c
>However, I can't find a way to enable it. It's not exposed in 'ompi_info' (but
>registered?). How can I enforce the use of cuMemcpyAsync in Open MPI?
>The version used is Open MPI 1.8.5.
>
>Thank you,
>Jeremia
>
>(gdb) bt
>#0 0x00002aaaaaaaba11 in clock_gettime ()
>#1 0x00000039e5803e46 in clock_gettime () from /lib64/librt.so.1
>#2 0x00002aaaab58a7ae in ?? () from /usr/lib64/libcuda.so.1
>#3 0x00002aaaaaf41dfb in ?? () from /usr/lib64/libcuda.so.1
>#4 0x00002aaaaaf1f623 in ?? () from /usr/lib64/libcuda.so.1
>#5 0x00002aaaaaf17361 in ?? () from /usr/lib64/libcuda.so.1
>#6 0x00002aaaaaf180b6 in ?? () from /usr/lib64/libcuda.so.1
>#7 0x00002aaaaae860c2 in ?? () from /usr/lib64/libcuda.so.1
>#8 0x00002aaaaae8621a in ?? () from /usr/lib64/libcuda.so.1
>#9 0x00002aaaaae69d85 in cuMemcpy () from /usr/lib64/libcuda.so.1
>#10 0x00002aaaaf0a7dea in mca_common_cuda_cu_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmca_common_cuda.so.1
>#11 0x00002aaaac992544 in opal_cuda_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#12 0x00002aaaac98adf7 in opal_convertor_pack () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#13 0x00002aaab167c611 in mca_pml_ob1_send_request_start_copy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
>#14 0x00002aaab167353f in mca_pml_ob1_send () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
>#15 0x00002aaaabf4f322 in PMPI_Send () from /users/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmpi.so.1
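To make the overlap use case discussed above concrete, here is a hedged sketch: a placeholder long-running kernel on its own stream overlapped with a blocking MPI_Send/MPI_Recv on device buffers. The kernel body, sizes, and file name are illustrative; it assumes a CUDA-aware Open MPI build so device pointers can be passed straight to MPI. On the 1.8 series it would be launched with the MCA flag quoted above, e.g. mpirun -np 2 --mca mpi_common_cuda_cumemcpy_async 1 ./overlap.

/* overlap.cu -- sketch only: a placeholder long-running kernel on its own
 * stream while a blocking MPI_Send/MPI_Recv moves a separate device buffer.
 * Assumes a CUDA-aware Open MPI; build e.g. with
 *   nvcc -I<mpi include dir> -c overlap.cu && mpicxx overlap.o -o overlap -lcudart */
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void long_running_kernel(float *work, int n)
{
    /* Placeholder computation that keeps the GPU busy for a while. */
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        for (int k = 0; k < 10000; ++k)
            work[i] = work[i] * 0.999f + 1.0f;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *work = NULL, *msg = NULL;            /* both in device memory */
    cudaMalloc((void **)&work, n * sizeof(float));
    cudaMalloc((void **)&msg,  n * sizeof(float));
    cudaMemset(msg, 0, n * sizeof(float));

    cudaStream_t compute;
    cudaStreamCreate(&compute);
    long_running_kernel<<<128, 256, 0, compute>>>(work, n);

    /* With the asynchronous copy path enabled, this blocking send/receive of
     * a device buffer does not have to wait for the kernel above to finish. */
    if (rank == 0)
        MPI_Send(msg, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(msg, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaStreamSynchronize(compute);             /* now wait for the kernel */
    cudaStreamDestroy(compute);
    cudaFree(work);
    cudaFree(msg);
    MPI_Finalize();
    return 0;
}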