FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him 
(because he wrote it).  :-)

The fact is that the MPI-2 one-sided stuff is extremely complicated and 
somewhat open to interpretation.  In practice, I haven't seen the MPI-2 
one-sided stuff used much in the wild.  The MPI-3 working group just revamped 
the one-sided support and generally made it much mo'betta.  Brian is 
re-implementing that stuff, and I believe it'll also be much mo'betta.

My point: I wouldn't worry if not all one-sided benchmarks run with OMPI.  No 
one uses them (yet) anyway.
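
For anyone unfamiliar with the one-sided API under discussion, here is a minimal, illustrative sketch of an MPI-2 one-sided exchange (an MPI_Put bracketed by MPI_Win_fence calls). It is not the OSU benchmark code, just the basic pattern those benchmarks exercise:

```c
/* Minimal MPI-2 one-sided sketch: rank 0 puts a value into rank 1's
 * window.  Illustrative only -- not the OSU benchmark code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank exposes one int through the window. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open the access epoch */
    if (rank == 0) {
        int value = 42;
        MPI_Put(&value, 1, MPI_INT, 1 /* target rank */, 0 /* displacement */,
                1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);              /* close the epoch; put is complete */

    if (rank == 1)
        printf("rank 1 received %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with at least two ranks (e.g., mpirun -np 2 ./a.out).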


On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:

> When I ran my OSU tests, I was able to get numbers out of all of them 
> except latency_mt (which was expected, as I didn't compile Open MPI with 
> multi-threaded support).
> A good way to tell whether the problem is in Open MPI or in your custom OFED 
> stack would be to use another device, such as tcp instead of ib, and rerun 
> these one-sided comm tests.
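
To illustrate that suggestion: a run forced over TCP might look like the following (the hostnames and benchmark path are taken from the original report; adjust them for your setup):

```
mpirun --prefix /usr/local/ -np 2 --mca btl tcp,self,sm \
    -H 192.168.0.175,192.168.0.174 \
    /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
```

If the test completes over tcp but hangs over openib, the problem is more likely in the openib path or the custom OFED stack than in the benchmark itself.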
> On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
> I'm pretty sure that they are correct.  Our one-sided implementation is
> buggier than I'd like (indeed, I'm in the process of rewriting most of it
> as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> that the bugs are in Open MPI's onesided support.  Can you try a more
> recent release (something from the 1.5 tree) and see if the problem
> persists?
> 
> Thanks,
> 
> Brian
> 
> On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
> 
> >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> >MPI one-sided operations (i.e., MPI_PUT and MPI_GET).  It looks like
> >these two OSU benchmarks are using those operations.
> >
> >Is it known that these two benchmarks are correct?
> >
> >
> >
> >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> >
> >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> >>stack implemented to work on specific hardware. We tested the stack
> >>with q-perf and the Intel MPI Benchmarks (IMB-3.2.2); those went fine. We
> >>want to run the osu_benchmarks-3.1.1 suite on our OFED.
> >>
> >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> >><dvrao....@gmail.com> wrote:
> >> Hi,
> >> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> >>I could run 10 benchmark tests (all except osu_put_bibw, osu_put_bw,
> >>osu_get_bw, and osu_latency_mt) out of the 14 in the suite; the
> >>remaining tests hang at some message size. The output is shown
> >>below:
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> >>orte_base_help_aggregate 0
> >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test1
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test2
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> >> # Size     Bi-Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1                         0.00
> >> 2                         0.00
> >> 4                         0.01
> >> 8                         0.03
> >> 16                        0.07
> >> 32                        0.15
> >> 64                        0.11
> >> 128                       0.21
> >> 256                       0.43
> >> 512                       0.88
> >> 1024                      2.10
> >> 2048                      4.21
> >> 4096                      8.10
> >> 8192                     16.19
> >> 16384                     8.46
> >> 32768                    20.34
> >> 65536                    39.85
> >> 131072                   84.22
> >> 262144                  142.23
> >> 524288                  234.83
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> >>on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> >>orte_base_help_aggregate 0
> >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test1
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test2
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1                         0.02
> >> 2                         0.05
> >> 4                         0.10
> >> 8                         0.19
> >> 16                        0.39
> >> 32                        0.77
> >> 64                        1.53
> >> 128                       2.57
> >> 256                       4.16
> >> 512                       8.30
> >> 1024                     16.62
> >> 2048                     33.22
> >> 4096                     66.51
> >> 8192                     42.45
> >> 16384                    11.99
> >> 32768                    18.20
> >> 65536                    76.04
> >> 131072                   98.64
> >> 262144                  407.66
> >> 524288                  489.84
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> >>on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> I even checked the logs, but I couldn't see any errors.
> >> Could you suggest a way to overcome or debug this issue?
> >>
> >> Thanks for the kind reply.
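> >>
> >> (Aside: the "No preset parameters were found" warnings above are harmless,
> >> but they can be silenced by adding an entry for the device to one of the
> >> files named by the btl_openib_device_param_files MCA parameter. A
> >> hypothetical stanza for the device in the log might look like the
> >> following; the section name is arbitrary and the tuning values are
> >> placeholders to be replaced with ones appropriate for the hardware.)

```ini
# Hypothetical entry for the PLX device reported in the log above.
# Section name is arbitrary; tuning values are guesses, not recommendations.
[PLX plx2]
vendor_id = 0x10b5
vendor_part_id = 4277
use_eager_rdma = 1
mtu = 2048
```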
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> D. Venkateswara Rao,
> >> Software Engineer, One Convergence Devices Pvt Ltd.,
> >> Jubilee Hills, Hyderabad.
> >>
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >--
> >Jeff Squyres
> >jsquy...@cisco.com
> >For corporate legal information go to:
> >http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> 
> 
> --
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

