The CUDA person is now responding. I will try to reproduce this. I looked through the zip file but did not see the mpirun command. Can this be reproduced with -np 4 running across four nodes? Also, in your original message you wrote "Likewise, it doesn't matter if I enable CUDA support or not." Can you provide more detail about what that means? Thanks.
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, November 06, 2014 1:05 PM
To: Open MPI Users
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

I was hoping our CUDA person would respond, but in the interim, I would suggest trying the nightly 1.8.4 tarball, as we are getting ready to release it, and I know there were some CUDA-related patches since 1.8.1:
http://www.open-mpi.org/nightly/v1.8/

On Nov 5, 2014, at 4:45 PM, Steven Eliuk <s.el...@samsung.com> wrote:

OpenMPI: 1.8.1 with CUDA RDMA...

Thanks, sir, and sorry for the late response.

Kindest Regards,
- Steven Eliuk, Ph.D. Comp Sci,
Advanced Software Platforms Lab,
SRA - SV,
Samsung Electronics,
1732 North First Street,
San Jose, CA 95112,
Work: +1 408-652-1976,
Work: +1 408-544-5781 Wednesdays,
Cell: +1 408-819-4407.

From: Ralph Castain <rhc.open...@gmail.com>
Reply-To: Open MPI Users <us...@open-mpi.org>
Date: Monday, November 3, 2014 at 10:02 AM
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Randomly long (100ms vs 7000+ms) fulfillment of MPI_Ibcast

Which version of OMPI were you testing?

On Nov 3, 2014, at 9:14 AM, Steven Eliuk <s.el...@samsung.com> wrote:

Hello,

We were using OpenMPI for some testing. Everything works fine, but at random, MPI_Ibcast() takes a long time to finish. We have a standalone program just to test it.
The following are the profiling results of the simple test program on our cluster:

Ibcast 604 MB takes 103 ms
Ibcast 608 MB takes 106 ms
Ibcast 612 MB takes 105 ms
Ibcast 616 MB takes 105 ms
Ibcast 620 MB takes 107 ms
Ibcast 624 MB takes 107 ms
Ibcast 628 MB takes 108 ms
Ibcast 632 MB takes 110 ms
Ibcast 636 MB takes 110 ms
Ibcast 640 MB takes 7437 ms
Ibcast 644 MB takes 115 ms
Ibcast 648 MB takes 111 ms
Ibcast 652 MB takes 112 ms
Ibcast 656 MB takes 112 ms
Ibcast 660 MB takes 114 ms
Ibcast 664 MB takes 114 ms
Ibcast 668 MB takes 115 ms
Ibcast 672 MB takes 116 ms
Ibcast 676 MB takes 116 ms
Ibcast 680 MB takes 116 ms
Ibcast 684 MB takes 122 ms
Ibcast 688 MB takes 7385 ms
Ibcast 692 MB takes 8729 ms
Ibcast 696 MB takes 120 ms
Ibcast 700 MB takes 124 ms
Ibcast 704 MB takes 121 ms
Ibcast 708 MB takes 8240 ms
Ibcast 712 MB takes 122 ms
Ibcast 716 MB takes 123 ms
Ibcast 720 MB takes 123 ms
Ibcast 724 MB takes 124 ms
Ibcast 728 MB takes 125 ms
Ibcast 732 MB takes 125 ms
Ibcast 736 MB takes 126 ms

As you can see, Ibcast occasionally takes a very long time to finish, and which sizes are affected appears to be completely random. The same program was compiled and tested with MVAPICH2-GDR, and it ran smoothly. Both tests were run exclusively on our four-node cluster, without contention. Likewise, it doesn't matter whether I enable CUDA support or not.

The following is the configuration of our servers: we have four nodes in this test, each with one K40 GPU, connected with Mellanox InfiniBand. Please find attached config details and some sample code...
<Ibcast_config_details.txt.zip><Ibcast_SampleCode.cpp>

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25662.php

Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25695.php