The CUDA-aware support is only available when running with the verbs interface to InfiniBand. It does not work with the PSM interface, which is what your installation is using. To verify this, disable PSM. This can be done in a variety of ways, but try running like this:
    mpirun -mca pml ob1 .....

This will force the use of the verbs support layer (openib) with the CUDA-aware support. (An example full command line is sketched at the end of this thread.)

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of KESTENER Pierre
Sent: Wednesday, October 30, 2013 12:07 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] OpenMPI-1.7.3 - cuda support

Dear Rolf,

thanks for looking into this. Here is the complete backtrace for execution using 2 GPUs on the same node:

(cuda-gdb) bt
#0  0x00007ffff711d885 in raise () from /lib64/libc.so.6
#1  0x00007ffff711f065 in abort () from /lib64/libc.so.6
#2  0x00007ffff0387b8d in psmi_errhandler_psm (ep=<value optimized out>, err=PSM_INTERNAL_ERR,
    error_string=<value optimized out>, token=<value optimized out>) at psm_error.c:76
#3  0x00007ffff0387df1 in psmi_handle_error (ep=0xfffffffffffffffe, error=PSM_INTERNAL_ERR,
    buf=<value optimized out>) at psm_error.c:154
#4  0x00007ffff0382f6a in psmi_am_mq_handler_rtsmatch (toki=0x7fffffffc6a0, args=0x7fffed0461d0,
    narg=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at ptl.c:200
#5  0x00007ffff037a832 in process_packet (ptl=0x737818, pkt=0x7fffed0461c0, isreq=<value optimized out>)
    at am_reqrep_shmem.c:2164
#6  0x00007ffff037d90f in amsh_poll_internal_inner (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1756
#7  amsh_poll (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1810
#8  0x00007ffff03a0329 in __psmi_poll_internal (ep=0x737538, poll_amsh=<value optimized out>) at psm.c:465
#9  0x00007ffff039f0af in psmi_mq_wait_inner (ireq=0x7fffffffc848) at psm_mq.c:299
#10 psmi_mq_wait_internal (ireq=0x7fffffffc848) at psm_mq.c:334
#11 0x00007ffff037db21 in amsh_mq_send_inner (ptl=0x737818, mq=<value optimized out>, epaddr=0x6eb418,
    flags=<value optimized out>, tag=844424930131968, ubuf=0x1308350000, len=32768) at am_reqrep_shmem.c:2339
#12 amsh_mq_send (ptl=0x737818, mq=<value optimized out>, epaddr=0x6eb418, flags=<value optimized out>,
    tag=844424930131968, ubuf=0x1308350000, len=32768) at am_reqrep_shmem.c:2387
#13 0x00007ffff039ed71 in __psm_mq_send (mq=<value optimized out>, dest=<value optimized out>,
    flags=<value optimized out>, stag=<value optimized out>, buf=<value optimized out>,
    len=<value optimized out>) at psm_mq.c:413
#14 0x00007ffff05c4ea8 in ompi_mtl_psm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_mtl_psm.so
#15 0x00007ffff1eeddea in mca_pml_cm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_pml_cm.so
#16 0x00007ffff79253da in PMPI_Sendrecv () from /gpfslocal/pub/openmpi/1.7.3/lib/libmpi.so.1
#17 0x00000000004045ef in ExchangeHalos (cartComm=0x715460, devSend=0x1308350000, hostSend=0x7b8710,
    hostRecv=0x7c0720, devRecv=0x1308358000, neighbor=1, elemCount=4096) at CUDA_Aware_MPI.c:70
#18 0x00000000004033d8 in TransferAllHalos (cartComm=0x715460, domSize=0x7fffffffcd80, topIndex=0x7fffffffcd60,
    neighbors=0x7fffffffcd90, copyStream=0xaa4450, devBlocks=0x7fffffffcd30, devSideEdges=0x7fffffffcd20,
    devHaloLines=0x7fffffffcd10, hostSendLines=0x7fffffffcd00, hostRecvLines=0x7fffffffccf0) at Host.c:400
#19 0x000000000040363c in RunJacobi (cartComm=0x715460, rank=0, size=2, domSize=0x7fffffffcd80,
    topIndex=0x7fffffffcd60, neighbors=0x7fffffffcd90, useFastSwap=0, devBlocks=0x7fffffffcd30,
    devSideEdges=0x7fffffffcd20, devHaloLines=0x7fffffffcd10, hostSendLines=0x7fffffffcd00,
    hostRecvLines=0x7fffffffccf0, devResidue=0x1310480000, copyStream=0xaa4450, iterations=0x7fffffffcd44,
    avgTransferTime=0x7fffffffcd48) at Host.c:466
#20 0x0000000000401ccb in main (argc=4, argv=0x7fffffffcea8) at Jacobi.c:60

Pierre.

________________________________
From: KESTENER Pierre
Sent: Wednesday, October 30, 2013 4:34 PM
To: us...@open-mpi.org
Cc: KESTENER Pierre
Subject: OpenMPI-1.7.3 - cuda support

Hello,

I'm having problems running a simple CUDA-aware MPI application, the one found at
https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example
I have modified the symbol ENV_LOCAL_RANK into OMPI_COMM_WORLD_LOCAL_RANK.

My cluster has 2 K20m GPUs per node, with a QLogic IB stack.
The normal CUDA/MPI application works fine, but the CUDA-aware MPI app crashes when using 2 MPI processes on the 2 GPUs of the same node; the error message is:

    Assertion failure at ptl.c:200: nbytes == msglen

I can send the complete backtrace from cuda-gdb if needed.

The same app, when running on 2 GPUs on 2 different nodes, gives another error:

    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8. Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]

Can someone give me hints where to look to track down this problem?
Thank you.

Pierre Kestener.
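
For reference, here is a minimal sketch of the kind of per-rank GPU selection the modified sample relies on, assuming the usual pattern of reading OMPI_COMM_WORLD_LOCAL_RANK (set by Open MPI's mpirun for each local process) and picking a device before MPI_Init. The round-robin mapping and the printout are illustrative, not taken from the sample source:

/* Minimal sketch (assumed, not from the sample): bind each rank to a GPU
 * using the local rank exported by Open MPI's mpirun. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    /* Read the local rank before MPI_Init so the device is selected
     * before any CUDA-aware MPI transfers take place. */
    const char *localRankStr = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    int localRank = localRankStr ? atoi(localRankStr) : 0;

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount > 0)
        cudaSetDevice(localRank % deviceCount);   /* e.g. 2 K20m GPUs per node */

    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d -> local rank %d -> GPU %d\n", rank, localRank,
           deviceCount > 0 ? localRank % deviceCount : -1);

    MPI_Finalize();
    return 0;
}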
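The backtrace above enters PSM from PMPI_Sendrecv, called by ExchangeHalos() (CUDA_Aware_MPI.c:70). As a rough sketch of what such a CUDA-aware exchange looks like, under the assumption of double-precision halo elements (elemCount=4096 with len=32768 in frame #11 suggests 8-byte elements) and with a hypothetical function name and tag, the device buffers are passed straight to MPI:

/* Hedged sketch of a CUDA-aware halo exchange along the lines of ExchangeHalos()
 * in the sample: device pointers are handed directly to MPI_Sendrecv and the
 * CUDA-aware transport moves the GPU data. Function name, tag and datatype are
 * illustrative. */
#include <mpi.h>

#define HALO_TAG 7  /* arbitrary illustrative tag */

/* devSend/devRecv are device (GPU) buffers of elemCount doubles;
 * neighbor is the rank of the adjacent process in cartComm. */
void ExchangeHaloSketch(MPI_Comm cartComm, double *devSend, double *devRecv,
                        int neighbor, int elemCount)
{
    MPI_Sendrecv(devSend, elemCount, MPI_DOUBLE, neighbor, HALO_TAG,
                 devRecv, elemCount, MPI_DOUBLE, neighbor, HALO_TAG,
                 cartComm, MPI_STATUS_IGNORE);
}

With a transport that is not CUDA-aware (the PSM MTL here), the device pointer, visible as ubuf=0x1308350000 in frame #11, reaches code that expects host memory, which matches Rolf's diagnosis; the normal (non-CUDA-aware) variant of the sample presumably stages the halos through the hostSend/hostRecv buffers instead.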
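Finally, an illustrative full command line for Rolf's workaround. Only -mca pml ob1 comes from his reply; -np 2 and the binary name jacobi_cuda_aware_mpi appear elsewhere in this thread, and the remaining application arguments are placeholders:

    mpirun -np 2 -mca pml ob1 ./jacobi_cuda_aware_mpi <application arguments>

Selecting the ob1 PML keeps Open MPI off the cm PML / PSM MTL path seen in frames #14-#15 above, so the CUDA-aware openib support is used instead.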