On Feb 29, 2012, at 2:30 PM, Jingcha Joba wrote:

> Squyres,
> I thought RDMA read and write are implemented as one-sided communication using 
> get and put, respectively.
> Is that not so? 

Yes and no.

Keep in mind the difference between two things here:

- An underlying transport's one-sided capabilities (e.g., using InfiniBand 
RDMA reads/writes)
- MPI one-sided and/or two-sided message passing

Most OpenFabrics-capable MPI's use OF RDMA reads and writes for sending large 
messages (both one and two sided).  But it's not always the case.  For example, 
it may not be worth it to use RDMA for short messages because of the cost of 
registering memory, negotiating the target address for the RDMA read/write 
(which may require a round-trip ACK), etc.

So OF-capable MPI's basically divorce the two issues.  The underlying transport 
will choose the "best" method (whether it's a send/recv style exchange, an 
RDMA-style exchange, or a mixture of the two).
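
To make the trade-off concrete, here is a minimal Python sketch of that kind of size-based decision. It is not Open MPI's actual logic: the TransportCosts type, the choose_protocol function, and all of the cost numbers are hypothetical and exist only to illustrate why short messages may not be worth an RDMA setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransportCosts:
    """Hypothetical per-message costs in microseconds (illustrative, not measured)."""
    registration_us: float = 20.0   # registering memory before an RDMA read/write
    handshake_us: float = 5.0       # round trip to negotiate the RDMA target address
    copy_us_per_kib: float = 0.5    # extra buffer copies on the send/recv path


def choose_protocol(msg_bytes: int, costs: TransportCosts = TransportCosts()) -> str:
    """Pick the cheaper path for one message under the assumed cost model."""
    # Fixed setup cost paid once per RDMA transfer in this toy model.
    rdma_overhead = costs.registration_us + costs.handshake_us
    # Copy cost on the send/recv path grows with the message size.
    copy_overhead = costs.copy_us_per_kib * (msg_bytes / 1024)
    return "rdma" if copy_overhead > rdma_overhead else "send/recv"


if __name__ == "__main__":
    for size in (64, 4096, 1 << 20):
        print(size, choose_protocol(size))
```

In this toy model the fixed RDMA setup cost dominates for small messages, so they stay on the send/recv path, while large messages amortize the setup and switch to RDMA. A real transport would weigh more factors than these three.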

Make sense?


> On Wed, Feb 29, 2012 at 10:49 AM, Jeffrey Squyres <jsquy...@cisco.com> wrote:
> FWIW, if Brian says that our one-sided stuff is a bit buggy, I believe him 
> (because he wrote it).  :-)
> 
> The fact is that the MPI-2 one-sided stuff is extremely complicated and 
> somewhat open to interpretation.  In practice, I haven't seen the MPI-2 
> one-sided stuff used much in the wild.  The MPI-3 working group just revamped 
> the one-sided support and generally made it much mo'betta.  Brian is 
> re-implementing that stuff, and I believe it'll also be much mo'betta.
> 
> My point: I wouldn't worry if not all one-sided benchmarks run with OMPI.  No 
> one uses them (yet) anyway.
> 
> 
> On Feb 29, 2012, at 1:42 PM, Jingcha Joba wrote:
> 
> > When I ran my OSU tests, I was able to get numbers out of all the 
> > tests except latency_mt (which was expected, as I didn't compile Open MPI 
> > with multi-threaded support).
> > A good way to know if the problem is with openmpi or with your custom OFED 
> > stack would be to use some other device like tcp instead of ib and rerun 
> > these one sided comm tests.
> > On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov> 
> > wrote:
> > I'm pretty sure that they are correct.  Our one-sided implementation is
> > buggier than I'd like (indeed, I'm in the process of rewriting most of it
> > as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> > that the bugs are in Open MPI's onesided support.  Can you try a more
> > recent release (something from the 1.5 tree) and see if the problem
> > persists?
> >
> > Thanks,
> >
> > Brian
> >
> > On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
> >
> > >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> > >MPI one-sided operations (i.e., MPI_PUT and MPI_GET).  It looks like
> > >these two OSU benchmarks are using those operations.
> > >
> > >Is it known that these two benchmarks are correct?
> > >
> > >
> > >
> > >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> > >
> > > >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> > > >>stack implemented to work on specific hardware. We tested the stack
> > > >>with q-perf and the Intel benchmarks (IMB-3.2.2), and they ran fine. We
> > > >>want to run the osu_benchmarks-3.1.1 suite on our OFED stack.
> > >>
> > >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku
> > >><dvrao....@gmail.com> wrote:
> > > >> Hi,
> > > >> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> > > >>I could run 10 of the 14 benchmark tests in the suite (all except
> > > >>osu_put_bibw, osu_put_bw, osu_get_bw, and osu_latency_mt). The
> > > >>remaining tests hang at some message size. The output is
> > > >>shown below.
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> > >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> > >>orte_base_help_aggregate 0
> > >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test1
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance.  You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >>       btl_openib_warn_no_device_params_found to 0.
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test2
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance.  You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >>       btl_openib_warn_no_device_params_found to 0.
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> > >> # Size     Bi-Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1                         0.00
> > >> 2                         0.00
> > >> 4                         0.01
> > >> 8                         0.03
> > >> 16                        0.07
> > >> 32                        0.15
> > >> 64                        0.11
> > >> 128                       0.21
> > >> 256                       0.43
> > >> 512                       0.88
> > >> 1024                      2.10
> > >> 2048                      4.21
> > >> 4096                      8.10
> > >> 8192                     16.19
> > >> 16384                     8.46
> > >> 32768                    20.34
> > >> 65536                    39.85
> > >> 131072                   84.22
> > >> 262144                  142.23
> > >> 524288                  234.83
> > >> mpirun: killing job...
> > >>
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited
> > >>on signal 0 (Unknown signal 0).
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl
> > >>openib,self,sm -H 192.168.0.175,192.168.0.174 --mca
> > >>orte_base_help_aggregate 0
> > >>/root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test1
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance.  You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >>       btl_openib_warn_no_device_params_found to 0.
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> failed to create doorbell file /dev/plx2_char_dev
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> WARNING: No preset parameters were found for the device that Open MPI
> > >> detected:
> > >>
> > >>   Local host:            test2
> > >>   Device name:           plx2_0
> > >>   Device vendor ID:      0x10b5
> > >>   Device vendor part ID: 4277
> > >>
> > >> Default device parameters will be used, which may result in lower
> > >> performance.  You can edit any of the files specified by the
> > >> btl_openib_device_param_files MCA parameter to set values for your
> > >> device.
> > >>
> > >> NOTE: You can turn off this warning by setting the MCA parameter
> > >>       btl_openib_warn_no_device_params_found to 0.
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> alloc_srq max: 512 wqe_shift: 5
> > >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> > >> # Size        Bandwidth (MB/s)
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> plx2_create_qp line: 415
> > >> 1                         0.02
> > >> 2                         0.05
> > >> 4                         0.10
> > >> 8                         0.19
> > >> 16                        0.39
> > >> 32                        0.77
> > >> 64                        1.53
> > >> 128                       2.57
> > >> 256                       4.16
> > >> 512                       8.30
> > >> 1024                     16.62
> > >> 2048                     33.22
> > >> 4096                     66.51
> > >> 8192                     42.45
> > >> 16384                    11.99
> > >> 32768                    18.20
> > >> 65536                    76.04
> > >> 131072                   98.64
> > >> 262144                  407.66
> > >> 524288                  489.84
> > >> mpirun: killing job...
> > >>
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited
> > >>on signal 0 (Unknown signal 0).
> > >>
> > >>-------------------------------------------------------------------------
> > >>-
> > >> 2 total processes killed (some possibly by mpirun during cleanup)
> > >> mpirun: clean termination accomplished
> > >>
> > >> I even checked the logs but i couldn't see any errors...
> > >> Could you suggest a way to overcome/debug this issue..
> > >>
> > >> Thanks for the kind reply..
> > >>
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer,One Convergence Devices Pvt Ltd.,
> > >> Jubille Hills,Hyderabad.
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> D.Venkateswara Rao,
> > >> Software Engineer,One Convergence Devices Pvt Ltd.,
> > >> Jubille Hills,Hyderabad.
> > >>
> > >> _______________________________________________
> > >> users mailing list
> > >> us...@open-mpi.org
> > >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >
> > >--
> > >Jeff Squyres
> > >jsquy...@cisco.com
> > >For corporate legal information go to:
> > >http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> >
> >
> > --
> >  Brian W. Barrett
> >  Dept. 1423: Scalable System Software
> >  Sandia National Laboratories
> >
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
