On Feb 29, 2012, at 2:30 PM, Jingcha Joba wrote:

> Squyres,
> I thought RDMA read and write are implemented as one-sided communication
> using get and put respectively. Is that not so?
Yes and no. Keep in mind the difference between two things here:

- an underlying transport's one-sided capabilities (e.g., using InfiniBand RDMA reads/writes)
- MPI one-sided and/or two-sided message passing

Most OpenFabrics-capable MPIs use OF RDMA reads and writes for sending large messages (both one- and two-sided). But that's not always the case. For example, it may not be worth using RDMA for short messages because of the cost of registering memory, negotiating the target address for the RDMA read/write (which may require a round-trip ACK), etc.

So OF-capable MPIs basically divorce the two issues: the underlying transport will choose the "best" method (whether that's a send/recv-style exchange, an RDMA-style exchange, or a mixture of the two).

Make sense?

> On Wed, Feb 29, 2012 at 10:49 AM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
> FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him
> (because he wrote it). :-)
>
> The fact is that the MPI-2 one-sided stuff is extremely complicated and
> somewhat open to interpretation. In practice, I haven't seen the MPI-2
> one-sided stuff used much in the wild. The MPI-3 working group just revamped
> the one-sided support and generally made it much mo'betta. Brian is
> re-implementing that stuff, and I believe it'll also be much mo'betta.
>
> My point: I wouldn't worry if not all one-sided benchmarks run with OMPI.
> No one uses them (yet) anyway.
>
> On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:
>
> > When I ran my OSU tests, I was able to get numbers out of all the tests
> > except latency_mt (which was expected, as I didn't compile Open MPI with
> > multi-threaded support).
> > A good way to tell whether the problem is in Open MPI or in your custom
> > OFED stack would be to use some other device, such as tcp instead of ib,
> > and rerun these one-sided comm tests.
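A sketch of that A/B comparison (the host addresses and benchmark path here are the ones from the run reported later in this thread; adjust them for your cluster):

```shell
# Run one of the hanging one-sided tests over the openib (InfiniBand) BTL:
mpirun -np 2 --mca btl openib,self,sm \
       -H 192.168.0.175,192.168.0.174 \
       /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw

# Re-run the same test forcing the TCP BTL, bypassing the OFED stack.
# If it now runs to completion, the hang most likely lives in the custom
# OFED stack rather than in Open MPI's one-sided code:
mpirun -np 2 --mca btl tcp,self,sm \
       -H 192.168.0.175,192.168.0.174 \
       /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
```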
> > On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
> > I'm pretty sure that they are correct. Our one-sided implementation is
> > buggier than I'd like (indeed, I'm in the process of rewriting most of it
> > as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> > that the bugs are in Open MPI's one-sided support. Can you try a more
> > recent release (something from the 1.5 tree) and see if the problem
> > persists?
> >
> > Thanks,
> >
> > Brian
> >
> > On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
> >
> > > FWIW, I'm immediately suspicious of *any* MPI application that uses the
> > > MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
> > > these two OSU benchmarks are using those operations.
> > >
> > > Is it known that these two benchmarks are correct?
> > >
> > > On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> > >
> > >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> > >> stack implemented to work on specific hardware. We tested the stack
> > >> with qperf and the Intel Benchmarks (IMB-3.2.2); they went fine. We
> > >> want to execute the osu_benchmarks-3.1.1 suite on our OFED.
> > >>
> > >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> > >> <dvrao....@gmail.com> wrote:
> > >> Hi,
> > >> I tried executing the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> > >> I could run 10 benchmark tests out of the 14 in the suite (all except
> > >> osu_put_bibw, osu_put_bw, osu_get_bw, and osu_latency_mt); the
> > >> remaining tests hang at some message size.
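For reference, the pattern the OSU put/get benchmarks exercise is roughly the following. This is a minimal sketch of MPI-2 one-sided usage, not the benchmarks' actual code; it assumes a working MPI installation (compile with mpicc, run with mpirun -np 2):

```c
/* Minimal MPI-2 one-sided sketch: rank 0 puts an int into a window
 * exposed by rank 1. Requires exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes one int through the window. */
    MPI_Win_create(&buf, (MPI_Aint) sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open the access epoch */
    if (rank == 0) {
        int value = 42;
        MPI_Put(&value, 1, MPI_INT,     /* origin buffer          */
                1, 0, 1, MPI_INT,       /* target rank 1, disp 0  */
                win);
    }
    MPI_Win_fence(0, win);              /* close epoch: put is now visible */

    if (rank == 1)
        printf("rank 1 received %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Whether the bytes of that MPI_Put travel via an RDMA write or a send/recv exchange is the transport's choice, per the discussion above.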
> > >> The output is shown below.
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test1
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test2
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> > >> # Size        Bi-Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1             0.00
> > >> 2             0.00
> > >> 4             0.01
> > >> 8             0.03
> > >> 16            0.07
> > >> 32            0.15
> > >> 64            0.11
> > >> 128           0.21
> > >> 256           0.43
> > >> 512           0.88
> > >> 1024          2.10
> > >> 2048          4.21
> > >> 4096          8.10
> > >> 8192          16.19
> > >> 16384         8.46
> > >> 32768         20.34
> > >> 65536         39.85
> > >> 131072        84.22
> > >> 262144        142.23
> > >> 524288        234.83
> > >> mpirun: killing job...
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> > >> on signal 0 (Unknown signal 0).
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test1
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >> --------------------------------------------------------------------------
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >> --------------------------------------------------------------------------
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test2
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance. You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >> btl_openib_warn_no_device_params_found to 0.
> > >> --------------------------------------------------------------------------
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> > >> # Size        Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1             0.02
> > >> 2             0.05
> > >> 4             0.10
> > >> 8             0.19
> > >> 16            0.39
> > >> 32            0.77
> > >> 64            1.53
> > >> 128           2.57
> > >> 256           4.16
> > >> 512           8.30
> > >> 1024          16.62
> > >> 2048          33.22
> > >> 4096          66.51
> > >> 8192          42.45
> > >> 16384         11.99
> > >> 32768         18.20
> > >> 65536         76.04
> > >> 131072        98.64
> > >> 262144        407.66
> > >> 524288        489.84
> > >> mpirun: killing job...
> > >>
> > >> --------------------------------------------------------------------------
> > >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> > >> on signal 0 (Unknown signal 0).
> > >> --------------------------------------------------------------------------
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> I even checked the logs, but I couldn't see any errors.
> > >> Could you suggest a way to overcome/debug this issue?
> > >>
> > >> Thanks for the kind reply.
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer, One Convergence Devices Pvt Ltd.,
> > >> Jubille Hills, Hyderabad.
> > >>
> > >> _______________________________________________
> > >> users mailing list
> > >> us...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > --
> > Brian W. Barrett
> > Dept. 1423: Scalable System Software
> > Sandia National Laboratories

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/