[OMPI users] OpenMPI-1.7.3 - cuda support
Hello,

I'm having problems running a simple CUDA-aware MPI application, the one found at
https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example
I have modified the symbol ENV_LOCAL_RANK into OMPI_COMM_WORLD_LOCAL_RANK.
My cluster has 2 K20m GPUs per node, with a QLogic IB stack.

The normal CUDA/MPI application works fine, but the CUDA-aware MPI app crashes when using 2 MPI processes over the 2 GPUs of the same node; the error message is:

    Assertion failure at ptl.c:200: nbytes == msglen

I can send the complete backtrace from cuda-gdb if needed.

The same app, when running on 2 GPUs on 2 different nodes, gives another error:

    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8.
    Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]

Can someone give me hints on where to look to track this problem?

Thank you.

Pierre Kestener.
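[Editor's note] The symbol change above concerns how each rank picks its GPU from the per-node (local) rank that the launcher exports in the environment. A minimal sketch of that pattern, assuming one rank per GPU and device selection done before MPI_Init (the helper name below is illustrative, not the sample's code):

    /* Sketch of local-rank-based GPU selection (assumption: one MPI rank
     * per GPU, local rank read from an environment variable set by mpirun). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* With Open MPI, the per-node rank is exported as OMPI_COMM_WORLD_LOCAL_RANK. */
    #define ENV_LOCAL_RANK "OMPI_COMM_WORLD_LOCAL_RANK"

    static void SelectDeviceByLocalRank(void)
    {
        const char *localRankStr = getenv(ENV_LOCAL_RANK);
        int localRank = localRankStr ? atoi(localRankStr) : 0;

        int deviceCount = 0;
        if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
            fprintf(stderr, "No CUDA device visible to local rank %d\n", localRank);
            exit(EXIT_FAILURE);
        }

        /* Two K20m GPUs per node: local ranks 0 and 1 map to devices 0 and 1. */
        cudaSetDevice(localRank % deviceCount);
    }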
Re: [OMPI users] OpenMPI-1.7.3 - cuda support
Dear Rolf,

thanks for looking into this. Here is the complete backtrace for execution using 2 GPUs on the same node:

(cuda-gdb) bt
#0  0x7711d885 in raise () from /lib64/libc.so.6
#1  0x7711f065 in abort () from /lib64/libc.so.6
#2  0x70387b8d in psmi_errhandler_psm (ep=, err=PSM_INTERNAL_ERR, error_string=, token=) at psm_error.c:76
#3  0x70387df1 in psmi_handle_error (ep=0xfffe, error=PSM_INTERNAL_ERR, buf=) at psm_error.c:154
#4  0x70382f6a in psmi_am_mq_handler_rtsmatch (toki=0x7fffc6a0, args=0x7fffed0461d0, narg=, buf=, len=) at ptl.c:200
#5  0x7037a832 in process_packet (ptl=0x737818, pkt=0x7fffed0461c0, isreq=) at am_reqrep_shmem.c:2164
#6  0x7037d90f in amsh_poll_internal_inner (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1756
#7  amsh_poll (ptl=0x737818, replyonly=0) at am_reqrep_shmem.c:1810
#8  0x703a0329 in __psmi_poll_internal (ep=0x737538, poll_amsh=) at psm.c:465
#9  0x7039f0af in psmi_mq_wait_inner (ireq=0x7fffc848) at psm_mq.c:299
#10 psmi_mq_wait_internal (ireq=0x7fffc848) at psm_mq.c:334
#11 0x7037db21 in amsh_mq_send_inner (ptl=0x737818, mq=, epaddr=0x6eb418, flags=, tag=844424930131968, ubuf=0x130835, len=32768) at am_reqrep_shmem.c:2339
#12 amsh_mq_send (ptl=0x737818, mq=, epaddr=0x6eb418, flags=, tag=844424930131968, ubuf=0x130835, len=32768) at am_reqrep_shmem.c:2387
#13 0x7039ed71 in __psm_mq_send (mq=, dest=, flags=, stag=, buf=, len=) at psm_mq.c:413
#14 0x705c4ea8 in ompi_mtl_psm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_mtl_psm.so
#15 0x71eeddea in mca_pml_cm_send () from /gpfslocal/pub/openmpi/1.7.3/lib/openmpi/mca_pml_cm.so
#16 0x779253da in PMPI_Sendrecv () from /gpfslocal/pub/openmpi/1.7.3/lib/libmpi.so.1
#17 0x004045ef in ExchangeHalos (cartComm=0x715460, devSend=0x130835, hostSend=0x7b8710, hostRecv=0x7c0720, devRecv=0x1308358000, neighbor=1, elemCount=4096) at CUDA_Aware_MPI.c:70
#18 0x004033d8 in TransferAllHalos (cartComm=0x715460, domSize=0x7fffcd80, topIndex=0x7fffcd60, neighbors=0x7fffcd90, copyStream=0xaa4450, devBlocks=0x7fffcd30, devSideEdges=0x7fffcd20, devHaloLines=0x7fffcd10, hostSendLines=0x7fffcd00, hostRecvLines=0x7fffccf0) at Host.c:400
#19 0x0040363c in RunJacobi (cartComm=0x715460, rank=0, size=2, domSize=0x7fffcd80, topIndex=0x7fffcd60, neighbors=0x7fffcd90, useFastSwap=0, devBlocks=0x7fffcd30, devSideEdges=0x7fffcd20, devHaloLines=0x7fffcd10, hostSendLines=0x7fffcd00, hostRecvLines=0x7fffccf0, devResidue=0x131048, copyStream=0xaa4450, iterations=0x7fffcd44, avgTransferTime=0x7fffcd48) at Host.c:466
#20 0x00401ccb in main (argc=4, argv=0x7fffcea8) at Jacobi.c:60

Pierre.
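[Editor's note] Frame #17 shows the assertion being hit underneath an MPI_Sendrecv that is called with device buffers (ExchangeHalos at CUDA_Aware_MPI.c:70). For readers unfamiliar with CUDA-aware MPI, a minimal sketch of that kind of halo exchange, with illustrative names and types rather than the sample's exact code:

    /* Sketch of a CUDA-aware halo exchange: device pointers are handed
     * directly to MPI_Sendrecv, so the MPI library (not the application)
     * is responsible for moving the data between GPU memory and the network.
     * Illustrative only; not the sample's exact code. */
    #include <mpi.h>

    void ExchangeHaloSketch(MPI_Comm cartComm, double *devSend, double *devRecv,
                            int neighbor, int elemCount)
    {
        /* With a CUDA-aware MPI, devSend/devRecv may be cudaMalloc'ed buffers. */
        MPI_Sendrecv(devSend, elemCount, MPI_DOUBLE, neighbor, 0,
                     devRecv, elemCount, MPI_DOUBLE, neighbor, 0,
                     cartComm, MPI_STATUS_IGNORE);
    }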
Re: [OMPI users] OpenMPI-1.7.3 - cuda support
Thanks for your help, it is working now; I hadn't noticed that limitation.

Best regards,

Pierre Kestener.

From: users [users-boun...@open-mpi.org] on behalf of Rolf vandeVaart [rvandeva...@nvidia.com]
Sent: Wednesday, October 30, 2013 17:26
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI-1.7.3 - cuda support

The CUDA-aware support is only available when running with the verbs interface to InfiniBand. It does not work with the PSM interface, which is being used in your installation. To verify this, you need to disable the usage of PSM. This can be done in a variety of ways, but try running like this:

    mpirun --mca pml ob1 ...

This will force the use of the verbs support layer (openib) with the CUDA-aware support.
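[Editor's note] A hedged example of what such a launch line might look like for the Jacobi sample above; the process count and host file name are illustrative, and the sample's own command-line arguments are omitted:

    # Force the ob1 PML (verbs/openib path) instead of PSM so the
    # CUDA-aware support is used.  -np count and host file are illustrative.
    mpirun --mca pml ob1 -np 2 --hostfile myhosts ./jacobi_cuda_aware_mpi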