> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
> ibv_rc_pingpongs?
With 11 ibv_rc_pingpong's: http://pastebin.com/85sPcA47

Code to do that => https://gist.github.com/1233173 (a rough sketch of the
same idea follows at the end of this message).

Latencies are around 20 microseconds.

My job seems to be doing well so far with ofud!

[sboisver12@colosse2 ray]$ qstat
job-ID   prior    name        user        state  submit/start at      queue         slots  ja-task-ID
-----------------------------------------------------------------------------------------------------
3047460  0.55384  fish-Assem  sboisver12  r      09/21/2011 15:02:25  med@r104-n58  256

> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf
> of Jeff Squyres [jsquy...@cisco.com]
> Sent: September 21, 2011 15:28
> To: Open MPI Users
> Subject: Re: [OMPI users] RE : Latency of 250 microseconds with Open-MPI
> 1.4.3, Mellanox Infiniband and 256 MPI ranks
>
> On Sep 21, 2011, at 3:17 PM, Sébastien Boisvert wrote:
>
>> Meanwhile, I contacted some people at SciNet, which is also part of
>> Compute Canada.
>>
>> They told me to try Open-MPI 1.4.3 with the Intel compiler, using
>> --mca btl self,ofud to use the ofud BTL instead of openib for
>> OpenFabrics transport.
>>
>> This worked quite well -- I got a low latency of 35 microseconds. Yay!
>
> That's still pretty terrible.
>
> Per your comments below, yes, ofud was never finished. I believe it
> doesn't have retransmission code in there, so if anything is dropped by
> the network (and in a congested/busy network there will be drops), the
> job will likely hang.
>
> The ofud and openib BTLs should have similar latencies. Indeed, openib
> should actually have slightly lower HRT ping-pong latencies because of
> protocol and transport differences between the two.
>
> The openib BTL should give about the same latency as ibv_rc_pingpong,
> which you cited at about 11 microseconds (I assume there must be
> multiple hops in that IB network for the latency to be that high), which
> jibes with your "only 1 process sends" RAY network test
> (http://pastebin.com/dWMXsHpa).
>
> It's not uncommon for latency to go up if multiple processes are all
> banging on the HCA, but it shouldn't go up noticeably if there are only
> 2 processes on each node doing simple ping-pong tests, for example.
>
> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
> ibv_rc_pingpongs?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
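For reference, here is a minimal sketch of one way to script N concurrent
ibv_rc_pingpong pairs between two nodes (this is an illustration, not the
contents of the gist above). It assumes passwordless ssh to both nodes,
ibv_rc_pingpong in the PATH, and placeholder host names nodeA/nodeB; each
pair gets its own TCP port for connection setup via ibv_rc_pingpong's -p
option.

#!/usr/bin/env python
# Sketch: launch N concurrent ibv_rc_pingpong pairs between two nodes
# and print each pair's latency/bandwidth report.
import subprocess
import time

N = 11                  # number of concurrent ping-pong pairs
BASE_PORT = 18515       # ibv_rc_pingpong's default port; one port per pair
SERVER = "nodeA"        # placeholder host names -- substitute your own
CLIENT = "nodeB"

procs = []
for i in range(N):
    # Start the server side of pair i on SERVER, listening on its own port.
    procs.append(subprocess.Popen(
        ["ssh", SERVER, "ibv_rc_pingpong", "-p", str(BASE_PORT + i)],
        stdout=subprocess.PIPE))

time.sleep(2)  # give the servers time to start listening

for i in range(N):
    # Connect the client side of pair i from CLIENT back to SERVER.
    procs.append(subprocess.Popen(
        ["ssh", CLIENT, "ibv_rc_pingpong", "-p", str(BASE_PORT + i), SERVER],
        stdout=subprocess.PIPE))

for p in procs:
    out, _ = p.communicate()
    print(out.decode())

Each pair reports its own round-trip numbers, so you can see directly
whether latency degrades as more pairs bang on the HCA at once.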