I talked with Jeremia off-list and we figured out what was going on. Open MPI can use cuMemcpyAsync/cuStreamSynchronize rather than cuMemcpy, but this was never made the default for the Open MPI 1.8 series. So, to get that behavior, you need to set the following MCA parameter:

    --mca mpi_common_cuda_cumemcpy_async 1
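For example, the parameter can be given on the mpirun command line, or exported beforehand using the usual OMPI_MCA_ environment-variable form. The process count and executable name below are placeholders, not from the original thread:

    # on the mpirun command line:
    mpirun -np 2 --mca mpi_common_cuda_cumemcpy_async 1 ./gpu_app

    # or, equivalently, via the environment before launching:
    export OMPI_MCA_mpi_common_cuda_cumemcpy_async=1
    mpirun -np 2 ./gpu_app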
It is too late to change this in 1.8, but it will be made the default behavior in 1.10 and all future versions. In addition, he is right about not being able to see these variables in the Open MPI 1.8 series. This was a bug, and it has been fixed in Open MPI v2.0.0. Currently, there are no plans to bring that fix back into 1.10. (A minimal sketch of the device-buffer MPI_Send/MPI_Recv pattern Jeremia describes is appended after the quoted message below.)

Rolf

>-----Original Message-----
>From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeremia Bär
>Sent: Tuesday, August 11, 2015 9:17 AM
>To: us...@open-mpi.org
>Subject: [OMPI users] CUDA Buffers: Enforce asynchronous memcpy's
>
>Hi!
>
>In my current application, MPI_Send/MPI_Recv hangs when using buffers in
>GPU device memory of an Nvidia GPU. I realized this is due to the fact that
>Open MPI uses the synchronous cuMemcpy rather than the asynchronous
>cuMemcpyAsync (see the stack trace at the bottom). However, in my application,
>synchronous copies cannot be used.
>
>I scanned through the source and saw that support for async memcpy's is
>available. It's controlled by 'mca_common_cuda_cumemcpy_async' in
>./ompi/mca/common/cuda/common_cuda.c
>However, I can't find a way to enable it. It's not exposed in 'ompi_info' (but
>registered?). How can I enforce the use of cuMemcpyAsync in Open MPI?
>The version used is Open MPI 1.8.5.
>
>Thank you,
>Jeremia
>
>(gdb) bt
>#0  0x00002aaaaaaaba11 in clock_gettime ()
>#1  0x00000039e5803e46 in clock_gettime () from /lib64/librt.so.1
>#2  0x00002aaaab58a7ae in ?? () from /usr/lib64/libcuda.so.1
>#3  0x00002aaaaaf41dfb in ?? () from /usr/lib64/libcuda.so.1
>#4  0x00002aaaaaf1f623 in ?? () from /usr/lib64/libcuda.so.1
>#5  0x00002aaaaaf17361 in ?? () from /usr/lib64/libcuda.so.1
>#6  0x00002aaaaaf180b6 in ?? () from /usr/lib64/libcuda.so.1
>#7  0x00002aaaaae860c2 in ?? () from /usr/lib64/libcuda.so.1
>#8  0x00002aaaaae8621a in ?? () from /usr/lib64/libcuda.so.1
>#9  0x00002aaaaae69d85 in cuMemcpy () from /usr/lib64/libcuda.so.1
>#10 0x00002aaaaf0a7dea in mca_common_cuda_cu_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmca_common_cuda.so.1
>#11 0x00002aaaac992544 in opal_cuda_memcpy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#12 0x00002aaaac98adf7 in opal_convertor_pack () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libopen-pal.so.6
>#13 0x00002aaab167c611 in mca_pml_ob1_send_request_start_copy () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
>#14 0x00002aaab167353f in mca_pml_ob1_send () from /home/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/openmpi/mca_pml_ob1.so
>#15 0x00002aaaabf4f322 in PMPI_Send () from /users/jbaer/local_root/opt/openmpi_from_src_1.8.5/lib/libmpi.so.1
>
>_______________________________________________
>users mailing list
>us...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>Link to this post: http://www.open-mpi.org/community/lists/users/2015/08/27424.php
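For reference, here is a minimal sketch of the usage pattern described in the quoted message: a buffer allocated in GPU device memory is handed directly to MPI_Send/MPI_Recv, and a CUDA-aware Open MPI build performs the device copy internally (via cuMemcpy, or cuMemcpyAsync once the MCA parameter above is set). This is not code from the original thread; the message size and tag are placeholders and error checking is omitted.

    /* Requires a CUDA-aware Open MPI build and at least 2 ranks. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1024;
        double *d_buf;                                /* buffer in GPU device memory */
        cudaMalloc((void **)&d_buf, n * sizeof(double));
        cudaMemset(d_buf, 0, n * sizeof(double));

        if (rank == 0) {
            /* device pointer passed straight to MPI_Send; the internal
             * host/device copy is where the cuMemcpy in the backtrace occurs */
            MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }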