>> I don't think the Open MPI TCP BTL will pass the SDP socket type when
>> creating sockets -- SDP is much lower performance than native verbs/RDMA.
>> You should use a "native" interface to your RDMA network instead (which one
>> you use depends on which kind of network you have).
> 
> I have a rather naive follow-up question along this line: why is there not a 
> native mode for (garden variety) Ethernet?

> There are at least three things that Ethernet-based networks do for
> acceleration / low latency:

> 1. Bypass the OS for injecting and receiving network packets
> 2. Use a wire protocol other than TCP
> 3. Include other offload functionality (e.g., RDMA, or RDMA-like capabilities)

> Enabling these things typically requires additional support from the NIC's 
> drivers and/or firmware.  Hence, you typically can't just take any old 
> Ethernet NIC and expect that the above three things work.

> Several Ethernet NIC vendors have enabled these kinds of things in their NICs 
> (e.g., I am on the usNIC team at Cisco, where we enable these things on the 
> Cisco NIC in our UCS server line).

> There was a project a few years ago called Open-MX that used the generic 
> Ethernet driver in Linux to accomplish #2 for just about any Ethernet NIC, 
> but it never really caught on, and has since bit-rotted.
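
Purely as an illustration of point #2 above, here is a minimal sketch of handing a message to an ordinary NIC as a raw L2 Ethernet frame via Linux's AF_PACKET sockets. It is not Open-MX, usNIC, or Open MPI code, and it still goes through the kernel, so it only illustrates the "wire protocol other than TCP" part, not OS bypass. The interface name, destination MAC, and experimental EtherType are placeholders, and it needs CAP_NET_RAW to run.

    /* Illustrative only -- not Open-MX or usNIC code.  Sends one message as a
     * raw L2 Ethernet frame through the kernel's AF_PACKET interface, i.e. a
     * wire protocol other than TCP, but with no OS bypass.  Interface name,
     * destination MAC, and EtherType are placeholders; needs CAP_NET_RAW. */
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <net/ethernet.h>
    #include <net/if.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0) { perror("socket"); return 1; }

        /* Pick the outgoing NIC by interface index; the frame itself already
         * carries its own L2 header, so no IP/TCP/UDP state is involved. */
        struct sockaddr_ll addr = {0};
        addr.sll_family  = AF_PACKET;
        addr.sll_ifindex = if_nametoindex("eth0");              /* placeholder NIC */
        addr.sll_halen   = ETH_ALEN;
        unsigned char dst[ETH_ALEN] = {0x02, 0, 0, 0, 0, 0x01}; /* placeholder MAC */
        memcpy(addr.sll_addr, dst, ETH_ALEN);

        /* Hand-built frame: 14-byte Ethernet header + application payload.
         * Any reliability (acks, retransmits) has to live above this layer. */
        unsigned char frame[ETH_ZLEN] = {0};
        struct ether_header *eh = (struct ether_header *)frame;
        memcpy(eh->ether_dhost, dst, ETH_ALEN);
        eh->ether_type = htons(0x88B5);   /* EtherType reserved for local experiments */
        const char payload[] = "hello over raw L2";
        memcpy(frame + sizeof(*eh), payload, sizeof(payload));

        if (sendto(fd, frame, sizeof(frame), 0,
                   (struct sockaddr *)&addr, sizeof(addr)) < 0)
            perror("sendto");
        close(fd);
        return 0;
    }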

> Is it because it lacks the end-to-end guarantees of TCP, InfiniBand, and the 
> like? These days, switched Ethernet is very reliable, isn't it? (I mean in 
> terms of the packet drop rate due to congestion.) So if the application only 
> needs data chunks of around 8 KB max, which would not need to be fragmented 
> (using jumbo frames), won't native Ethernet be much more efficient?

> The Cisco usNIC stack was initially OS-bypass injection of simple L2 Ethernet 
> frames.  It did all of its own retransmission and whatnot in Open MPI itself 
> (*all* network types have drops and/or frame corruption, due to congestion 
> and lots of other everyday kinds of traffic management -- *some* layer in the 
> network has to handle such drops/retransmits if you want them to look 
> reliable to a higher level in the stack).  
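
To make the "*some* layer has to handle drops/retransmits" point concrete, here is a minimal stop-and-wait sketch in which the sender owns its own retransmission above an unreliable UDP socket. It is not the usNIC or Open MPI protocol; the address, the one-second timer, the five-attempt limit, and the one-byte ack are placeholder choices for illustration.

    /* Illustrative only -- not the usNIC or Open MPI protocol.  A stop-and-wait
     * sender that owns its own retransmission above an unreliable UDP socket:
     * send, wait for a one-byte ack, retransmit on timeout.  The address,
     * 1-second timer, and 5-attempt limit are placeholder choices. */
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Returns 0 once the peer acknowledges, -1 after max_tries timeouts. */
    static int send_reliable(int fd, const struct sockaddr_in *dest,
                             const void *buf, size_t len, int max_tries)
    {
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };   /* retransmit timer */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        for (int attempt = 1; attempt <= max_tries; attempt++) {
            sendto(fd, buf, len, 0, (const struct sockaddr *)dest, sizeof(*dest));

            char ack;                          /* peer echoes one byte as an ack */
            if (recv(fd, &ack, 1, 0) == 1)
                return 0;                      /* acknowledged: done */
            /* recv timed out (or failed): loop around and retransmit. */
            fprintf(stderr, "no ack on attempt %d\n", attempt);
        }
        return -1;                             /* give up; caller decides what next */
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dest = { .sin_family = AF_INET,
                                    .sin_port   = htons(9000) };  /* placeholder */
        inet_pton(AF_INET, "192.0.2.1", &dest.sin_addr);          /* placeholder */

        const char msg[] = "payload";
        if (send_reliable(fd, &dest, msg, sizeof(msg), 5) != 0)
            fprintf(stderr, "peer unreachable after 5 attempts\n");
        close(fd);
        return 0;
    }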

> We eventually "upgraded" usNIC to the UDP wire protocol because our customers 
> told us that they want to switch usNIC traffic around L3 networks in their 
> datacenters.  We typically use jumbo frames to get good bandwidth.  The 
> addition of a few bytes per packet (i.e., the size difference between a raw 
> L2 Ethernet frame and a UDP packet) is typically not enough to affect the 
> bandwidth curve for large packets -- especially when using jumbo frames.  
> Additionally, Cisco gear switches L2 and L3 packets at exactly the same 
> speed, so we don't lose any native fabric performance by upgrading from L2 
> frames to UDP packets.
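
A quick back-of-the-envelope check of the header-overhead point, assuming an illustrative 9000-byte jumbo MTU and an 8 KiB application chunk (neither number comes from the post): the extra IPv4+UDP bytes cost well under one percent of each frame's payload capacity, and the 8 KiB chunk still fits unfragmented.

    /* Back-of-the-envelope check, assuming an illustrative 9000-byte jumbo MTU
     * and an 8 KiB application chunk (neither number comes from the post). */
    #include <stdio.h>

    int main(void)
    {
        const int mtu       = 9000;       /* jumbo-frame MTU (payload after the L2 header) */
        const int ipv4_hdr  = 20;         /* IPv4 header, no options   */
        const int udp_hdr   = 8;          /* UDP header                */
        const int app_chunk = 8 * 1024;   /* 8 KiB application message */

        int raw_l2_payload = mtu;                       /* whole MTU usable */
        int udp_payload    = mtu - ipv4_hdr - udp_hdr;  /* 8972 bytes       */

        printf("raw L2 payload per frame : %d bytes\n", raw_l2_payload);
        printf("UDP payload per frame    : %d bytes (%.2f%% less)\n",
               udp_payload, 100.0 * (ipv4_hdr + udp_hdr) / mtu);
        printf("8 KiB chunk fits unfragmented in both cases: %s\n",
               app_chunk <= udp_payload ? "yes" : "no");
        return 0;
    }
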
I am fairly incompetent with anything other than the TCP/IP stack used in most 
OSes, so my first instinct is to use TCP, as it is all I am familiar with. I am 
also working in the embedded world, where a cluster/domain may only involve one 
type of interconnect -- Ethernet is used in one situation and PCIe in another. 
Essentially, native Ethernet may not be an option all the time.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
