FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him (because he wrote it). :-)
The fact is that the MPI-2 one-sided stuff is extremely complicated and somewhat open to interpretation. In practice, I haven't seen the MPI-2 one-sided stuff used much in the wild. The MPI-3 working group just revamped the one-sided support and generally made it much mo'betta. Brian is re-implementing that stuff, and I believe it'll also be much mo'betta.

My point: I wouldn't worry if not all one-sided benchmarks run with OMPI. No one uses them (yet) anyway.


On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:

> When I ran my osu tests , I was able to get the numbers out of all the tests
> except latency_mt (which was obvious, as I didnt compile open-mpi with multi
> threaded support).
> A good way to know if the problem is with openmpi or with your custom OFED
> stack would be to use some other device like tcp instead of ib and rerun
> these one sided comm tests.
>
> On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
> I'm pretty sure that they are correct. Our one-sided implementation is
> buggier than I'd like (indeed, I'm in the process of rewriting most of it
> as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> that the bugs are in Open MPI's onesided support. Can you try a more
> recent release (something from the 1.5 tree) and see if the problem
> persists?
>
> Thanks,
>
> Brian
>
> On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
>
> >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> >MPI one-sided operations (i.e., MPI_PUT and MPI_GET). It looks like
> >these two OSU benchmarks are using those operations.
> >
> >Is it known that these two benchmarks are correct?
> >
> >
> >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> >
> >> Sorry, i forgot to introduce the system.. Ours is the customized OFED
> >>stack implemented to work on the specific hardware.. We tested the stack
> >>with the q-perf and Intel Benchmarks(IMB-3.2.2).. they went fine.. We
> >>want to execute the osu_benchamark3.1.1 suite on our OFED..
> >>
> >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> >><dvrao....@gmail.com> wrote:
> >> Hiii,
> >> I tried executing osu_benchamarks-3.1.1 suite with the openmpi-1.4.3...
> >>I could run 10 bench-mark tests (except osu_put_bibw,osu_put_bw,osu_
> >> get_bw,osu_latency_mt) out of 14 tests in the bench-mark suite... and
> >>the remaining tests are hanging at some message size.. the output is
> >>shown below
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> >>orte_base_help_aggregate 0
> >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test1
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test2
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> >> # Size      Bi-Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1           0.00
> >> 2           0.00
> >> 4           0.01
> >> 8           0.03
> >> 16          0.07
> >> 32          0.15
> >> 64          0.11
> >> 128         0.21
> >> 256         0.43
> >> 512         0.88
> >> 1024        2.10
> >> 2048        4.21
> >> 4096        8.10
> >> 8192        16.19
> >> 16384       8.46
> >> 32768       20.34
> >> 65536       39.85
> >> 131072      84.22
> >> 262144      142.23
> >> 524288      234.83
> >> mpirun: killing job...
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> >>on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> >>orte_base_help_aggregate 0
> >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test1
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >> Local host: test2
> >> Device name: plx2_0
> >> Device vendor ID: 0x10b5
> >> Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance. You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >> btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> >> # Size      Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1           0.02
> >> 2           0.05
> >> 4           0.10
> >> 8           0.19
> >> 16          0.39
> >> 32          0.77
> >> 64          1.53
> >> 128         2.57
> >> 256         4.16
> >> 512         8.30
> >> 1024        16.62
> >> 2048        33.22
> >> 4096        66.51
> >> 8192        42.45
> >> 16384       11.99
> >> 32768       18.20
> >> 65536       76.04
> >> 131072      98.64
> >> 262144      407.66
> >> 524288      489.84
> >> mpirun: killing job...
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> >>on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> I even checked the logs but i couldn't see any errors...
> >> Could you suggest a way to overcome/debug this issue..
> >>
> >> Thanks for the kind reply..
> >>
> >> --
> >> Thanks & Regards,
> >> D.Venkateswara Rao,
> >> Software Engineer,One Convergence Devices Pvt Ltd.,
> >> Jubille Hills,Hyderabad.
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >--
> >Jeff Squyres
> >jsquy...@cisco.com
> >For corporate legal information go to:
> >http://www.cisco.com/web/about/doing_business/legal/cri/
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
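
Follow-up note on the transport check suggested in the quoted thread (rerun over tcp instead of ib): the sketch below is not something posted in the thread, only an illustration. It reuses the host addresses, prefix, and benchmark path from the quoted mpirun command and swaps openib for tcp in the btl list. The RUN variable is a hypothetical convenience so the command can be inspected before it is launched. If the benchmark still hangs over tcp, that would point toward Open MPI's one-sided code rather than the custom OFED stack.

```shell
#!/bin/sh
# Sketch only: same job as in the quoted output, but over the tcp BTL
# instead of openib, to separate Open MPI one-sided bugs from OFED stack bugs.
# Hosts, prefix, and benchmark path are copied from the quoted command.
HOSTS=192.168.0.175,192.168.0.174
BENCH=/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw

CMD="mpirun --prefix /usr/local/ -np 2 --mca btl tcp,self,sm -H $HOSTS $BENCH"

# RUN is a hypothetical switch: by default only print the command.
if [ "${RUN:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Running it with RUN=1 launches the job; without it, the script only prints the command line.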