On Sep 21, 2011, at 3:17 PM, Sébastien Boisvert wrote:

> Meanwhile, I contacted some people at SciNet, which is also part of Compute
> Canada.
>
> They told me to try Open-MPI 1.4.3 with the Intel compiler with --mca btl
> self,ofud to use the ofud BTL instead of openib for OpenFabrics transport.
>
> This worked quite good -- I got a low latency of 35 microseconds. Yay !
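(For the archives: the run described above amounts to something like the
following; the application name, process count, and hostfile here are just
placeholders:

    mpirun --mca btl self,ofud -np 16 -hostfile my_hosts ./my_mpi_app

)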
That's still pretty terrible.

Per your comments below, yes, ofud was never finished. I believe it has no
retransmission code, so if anything is dropped by the network (and in a
congested/busy network, there will be drops), the job will likely hang.

The ofud and openib BTLs should have similar latencies. Indeed, openib
should actually have slightly lower HRT ping-pong latencies because of
protocol and transport differences between the two. The openib BTL should
give about the same latency as ibv_rc_pingpong, which you cited at about 11
microseconds (I assume there must be multiple hops in that IB network for
it to be that high). That jibes with your "only 1 process sends" Ray
network test (http://pastebin.com/dWMXsHpa).

It's not uncommon for latency to go up when multiple processes are all
banging on the HCA, but it shouldn't go up noticeably if there are only 2
processes on each node doing simple ping-pong tests, for example. What
happens if you run 2 ibv_rc_pingpong's on each node? Or N
ibv_rc_pingpongs? (See the sketch after my signature for one way to try
this.)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
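A rough sketch of the 2-pingpongs-per-node test, assuming two nodes named
node1 and node2 and the default IB device; each server/client pair needs
its own TCP port for connection setup (the -p flag; the default is 18515):

    # on node1: start two listeners on distinct ports
    ibv_rc_pingpong -p 18515 &
    ibv_rc_pingpong -p 18516 &

    # on node2: connect one client to each listener
    ibv_rc_pingpong -p 18515 node1 &
    ibv_rc_pingpong -p 18516 node1 &

Each pair reports its own usec/iter figure; to scale up to N pairs, keep
adding instances on successive ports and watch whether the per-pair
latency degrades.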