On Fri, Jul 07, 2006 at 06:53:20AM +0000, David Miller wrote:
>
> What I am saying, however, is that we need to understand the
> technology and the hooks you guys want before we put any of it in.

Yes indeed.  Here is what I've understood so far, so let's see if we can
start building a consensus.

1) RDMA over straight Infiniband is not contentious.  In this case no
   IP networking is involved.

2) RDMA over TCP/IP (or SCTP) can theoretically run on any network that
   supports IP, including Infiniband and Ethernet.

3) When RDMA over TCP is completely done in hardware, i.e., it has its
   own IP address, MAC address, and simply presents an RDMA interface
   (whatever that may be) to Linux, we're OK with it.  This is similar
   to how some iSCSI adapters work.

4) When RDMA over TCP is done completely in the Linux networking stack,
   we don't have a problem because the existing TCP stack is still in
   charge.  However, this is pretty pointless.

5) RDMA over TCP on the receive side is offloaded into the NIC.  This
   allows the NIC to directly place data into the application's buffer.
   We're starting to have a little bit of a problem because it means
   that part of the incoming IP traffic is now being directly processed
   by the NIC, with no input from the Linux TCP/IP stack.  However, as
   long as the connection establishment/acks are still controlled/seen
   by Linux we can probably live with it.

6) RDMA over TCP on the transmit side is offloaded into the NIC.  This
   is starting to look very worrying.  The reason is that we lose all
   control over crucial aspects of TCP like congestion control.  It is
   now completely up to the NIC to do that.  For straight RDMA over
   Infiniband this isn't an issue because the traffic is not likely to
   travel across the Internet.  However, for RDMA over TCP, one of
   their goals is to support sending traffic over the Internet, so this
   is a concern.  Incidentally, this is why they need to know about
   things like MAC/route/MTU changing.

7) RDMA over TCP is completely offloaded into the NIC; however, it
   still uses Linux's IP address and MAC address, and relies on us to
   tell it about events such as MTU updates or MAC changes.

In addition to the problems we have in 5) and 6), we now have a portion
of TCP port space which has suddenly become invisible to Linux.  What's
more, we lose control (e.g., netfilter) over what connections may or may
not be established.

So to my mind, RDMA over TCP is most problematic when it shares the same
IP/MAC address as the Linux host, and when the transmit side and/or the
connection establishment (cases 6 and 7) is offloaded into the NIC.  This
also happens to be the only scenario where they need the notification
patch that started all this discussion.

BTW, this URL gives an interesting perspective on RDMA over TCP
(particularly Q14/Q15):

http://www.rdmaconsortium.org/home/FAQs_Apr25.htm

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt