This patch updates the design document to reflect the netlink-based
kernel-userspace interface implementation, along with a few other changes.
The changes are covered at a high level.

Please feel free to extend the document with any details that you think
were missed.

Signed-off-by: Nithin Raju <nit...@vmware.com>
---
 datapath-windows/DESIGN |  180 ++++++++++++++++++++++++++++++++--------------
 1 files changed, 125 insertions(+), 55 deletions(-)

diff --git a/datapath-windows/DESIGN b/datapath-windows/DESIGN
index b438c44..f81dad0 100644
--- a/datapath-windows/DESIGN
+++ b/datapath-windows/DESIGN
@@ -1,20 +1,14 @@
                        OVS-on-Hyper-V Design Document
                        ==============================
-There has been an effort in the recent past to develop the Open vSwitch (OVS)
-solution onto multiple hypervisor platforms such as FreeBSD and Microsoft
-Hyper-V. VMware has been working on a OVS solution for Microsoft Hyper-V for
-the past few months and has successfully completed the implementation.
-
-This document provides details of the development effort. We believe this
-document should give enough information to members of the community who are
-curious about the developments of OVS on Hyper-V. The community should also be
-able to get enough information to make plans to leverage the deliverables of
-this effort.
-
-The userspace portion of the OVS has already been ported to Hyper-V and
-committed to the openvswitch repo. So, this document will mostly emphasize on
-the kernel driver, though we touch upon some of the aspects of userspace as
-well.
+There has been a community effort to develop a port of Open vSwitch on
+Microsoft Hyper-V. In this document, we provide details of the development
+effort. We believe this document should give enough information to understand
+the overall design.
+
+The userspace portion of the OVS has been ported to Hyper-V in a separate
+effort, and committed to the openvswitch repo. So, this document will mostly
+focus on the kernel driver, though we touch upon some of the aspects of
+userspace as well.
 
 We cover the following topics:
 1. Background into relevant Hyper-V architecture
@@ -80,17 +74,18 @@ has been used to retrieve some of the configuration information that OVS needs.
   |      | |  DAEMON/CTL  |       | |           |  |            | |
   +------+-++---+---------+       | +--+------+-+  +----+------++ | +--------+
   |  DPIF-  |   | netdev- |       |    |VIF #1|         |VIF #2|  | |Physical|
-  | Windows |<=>| Windows |       |    +------+         +------+  | |  NIC   |
+  | Netlink |   | Windows |       |    +------+         +------+  | |  NIC   |
   +---------+   +---------+       |      ||                   /\  | +--------+
-User     /\                       |      || *#1*         *#4* ||  |     /\
-=========||=======================+------||-------------------||--+     ||
-Kernel   ||                              \/                   ||  ||=====/
-         \/                           +-----+                 +-----+ *#5*
+User     /\         /\            |      || *#1*         *#4* ||  |     /\
+=========||=========||============+------||-------------------||--+     ||
+Kernel   ||         ||                   \/                   ||  ||=====/
+         \/         \/                +-----+                 +-----+ *#5*
  +-------------------------------+    |     |                 |     |
  |   +----------------------+    |    |     |                 |     |
  |   |   OVS Pseudo Device  |    |    |     |                 |     |
- |   +----------------+-----+    |    |     |                 |     |
- |                               |    |  I  |                 |     |
+ |   +----------------------+    |    |     |                 |     |
+ |      | Netlink Impl. |        |    |     |                 |     |
+ |      -----------------        |    |  I  |                 |     |
  | +------------+                |    |  N  |                 |  E  |
  | |  Flowtable | +------------+ |    |  G  |                 |  G  |
  | +------------+ |  Packet    | |*#2*|  R  |                 |  R  |
@@ -140,7 +135,9 @@ are:
  * Interface between the userspace and the kernel module.
  * Event notifications are significantly different.
  * The communication interface between DPIF and the kernel module need not be
-   implemented in the way OVS on Linux does.
+   implemented in the way OVS on Linux does. That said, it would be
+   advantageous to have a similar interface to the kernel module for reasons of
+   readability and maintenance.
  * Any licensing issues of using Linux kernel code directly.
 
 Due to these differences, it was a straightforward decision to develop the
@@ -159,13 +156,17 @@ called ovs-wind. At a high level ovs-wind manages keeps the ovsdb used by
 userspace in sync with the kernel state. More details in the userspace section.
 
 As explained in the OVS porting design document [7], DPIF is the portion of
-userspace that interfaces with the kernel portion of the OVS. Each platform can
-have its own implementation of the DPIF provider whose interface is defined in
-dpif-provider.h [3]. For OVS on Hyper-V, we have an implementation of DPIF
-provider for Hyper-V. The communication interface between userspace and the
-kernel is a pseudo device and is different from that of the Linux’s DPIF
-provider which uses netlink. But, as long as the DPIF provider interface is the
-same, the callers should be agnostic of the underlying communication interface.
+userspace that interfaces with the kernel portion of the OVS. The interface
+that each DPIF provider has to provide is defined in dpif-provider.h [3].
+Though each platform is allowed to have its own implementation of the DPIF
+provider, community feedback suggested that it is a good idea to
+share code whenever possible. Thus, the DPIF provider for OVS on Hyper-V shares
+code with the DPIF provider on Linux. This interface is implemented in
+dpif-netlink.c (formerly dpif-linux.c).
+
+We'll elaborate more on the kernel-userspace interface in a dedicated section
+below. Here it suffices to say that the DPIF provider implementation for
+Windows is netlink based and shares code with the Linux one.
 
 2.a) Kernel module (datapath)
 -----------------------------
@@ -208,6 +209,35 @@ the OVS kernel module. This is equivalent to the typical character device
 interface on POSIX platforms. The pseudo device supports a whole bunch of
 ioctls that netdev and DPIF on OVS userspace make use of.
 
+Netlink messages
+----------------
+The communication between OVS userspace and OVS kernel datapath is in the form
+of Netlink messages [8]. More on this in the section on Kernel-userspace
+interface (#2.c). In the kernel, a full-fledged netlink message parser has been
+implemented along the lines of the netlink message parser in OVS userspace. In
+fact, a lot of the code is ported code.
+
+Along the lines of 'struct ofpbuf' in OVS userspace, a managed buffer has been
+implemented in the kernel datapath to make it easier to parse and construct
+netlink messages.
+
+Netlink sockets
+---------------
+On Linux, OVS userspace utilizes netlink sockets to pass netlink messages back
+and forth. Since much of the userspace code, including the DPIF provider in
+dpif-netlink.c (formerly dpif-linux.c), has been reused, pseudo-netlink sockets
+have been implemented in OVS userspace. Windows lacks native netlink socket
+support, and the socket family is not extensible either. Hence it is not
+possible to provide a native implementation of netlink sockets. We implement
+pseudo-netlink sockets in lib/netlink-socket.c that appear to be netlink
+sockets from higher levels. However, the implementation opens a handle to the
+pseudo device for each pseudo-netlink socket. More on this in later
+sections.
+
+Typical netlink semantics of read message, write message, dump, and transaction
+have been implemented so that higher level layers are not affected by the
+netlink implementation not being native.
+
 Switch/Datapath management
 --------------------------
 As explained above, we hook onto the management callback functions in the NDIS
@@ -279,36 +309,72 @@ interface to the OVS kernel driver.
 
 2.c) Kernel-Userspace interface
 -------------------------------
-DPIF-Windows
-------------
-DPIF-Windows is the Windows implementation of the interface defined in dpif-
-provider.h, and provides an interface into the OVS kernel driver. We implement
-most of the callbacks required by the DPIF provider. A quick summary of the
-functionality implemented is as follows:
- * dp_dump, dp_get: dump all datapath information or get information for a
-   particular datapath.  Currently we only support one datapath.
- * flow_dump, flow_put, flow_get, flow_flush: These functions retrieve all
-   flows in the kernel, add a flow to the kernel, get a specific flow and
-   delete all the flows in the kernel.
- * recv_set, recv, recv_wait, recv_purge: these poll packets for upcalls.
- * execute: This is used to send packets from userspace to the kernel. The
-   packets could be either flow miss packet punted from kernel earlier or
-   userspace generated packets.
- * vport_dump, vport_get, ext_info: These functions dump all ports in the
-   kernel, get a specific port in the kernel, or get extended information
-   about a port.
- * event_subscribe, wait, poll: These functions subscribe, wait and poll the
-   events that kernel posts.  A typical example is kernel notices a port has
-   gone up/down, and would like to notify the userspace.
+As explained earlier, OVS on Hyper-V shares the DPIF provider implementation
+with Linux. The DPIF provider on Linux uses Netlink sockets and Netlink
+messages. Netlink sockets and messages are extensively used on Linux to
+exchange information between userspace and kernel. In order to satisfy these
+dependencies, netlink sockets (pseudo and non-native) and netlink messages
+are implemented on Hyper-V.
+
+The following are the major advantages of sharing DPIF provider code:
+1. Maintenance is simpler:
+   Any change made to the interface defined in dpif-provider.h need not be
+   propagated to multiple implementations. Also, developers familiar with the
+   Linux implementation of the DPIF provider can easily ramp up on the Hyper-V
+   implementation as well.
+2. Netlink messages provide inherent advantages:
+   Netlink messages are known for their extensibility. Each message is
+   versioned, so the data structures provide mechanisms for version checking
+   and for forward and backward compatibility with the kernel module.
+
+openvswitch.h and OvsDpInterfaceExt.h
+-------------------------------------
+Since the DPIF provider is shared with Linux, the kernel datapath provides the
+same interface as the Linux datapath. The interface is defined in
+datapath/linux/compat/include/linux/openvswitch.h. Derivatives of this
+interface file are created during OVS userspace compilation. The derivative for
+the kernel datapath on Hyper-V is in the following location:
+datapath-windows/include/OvsDpInterface.h
+
+That said, there are Windows specific extensions that are defined in the
+interface file:
+datapath-windows/include/OvsDpInterfaceExt.h
+
+Netlink sockets
+---------------
+As explained in other sections, a version of netlink sockets has been
+implemented in lib/netlink-socket.c for Windows. The implementation creates a
+handle to the OVS pseudo device, and emulates netlink socket semantics of
+receive message, send message, dump, and transact. Most of the nl_* functions
+are supported.
+
+The fact that the implementation is non-native manifests in various ways.
+One example is that the PID for the netlink socket is not automatically created
+when a handle is created to the OVS pseudo device. There's an extra command
+(defined in OvsDpInterfaceExt.h) that is used to grab the PID generated in the
+kernel.
+
+DPIF provider
+-------------
+As has been alluded to in earlier sections, the netlink socket and netlink
+message based DPIF provider on Linux has been ported to Windows.
+Correspondingly, the file has been renamed from lib/dpif-linux.c to
+lib/dpif-netlink.c.
+
+Most of the code is common. Some divergence is in the code to receive
+packets. The Linux implementation uses epoll() [9], which is not natively
+supported on Windows.
 
 Netdev-Windows
 --------------
-We have a Windows implementation of the the interface defined in lib/netdev-
-provider.h. The implementation provided functionality to get extended
+We have a Windows implementation of the interface defined in lib/netdev-
+provider.h. The implementation provides functionality to get extended
 information about an interface. It is limited in functionality compared to the
 Linux implementation of the netdev provider and cannot be used to add any
-interfaces in the kernel such as a tap interface.
-
+interfaces in the kernel such as a tap interface or to send/receive packets.
+The netdev-windows implementation uses the datapath interface extensions
+defined in:
+datapath-windows/include/OvsDpInterfaceExt.h
 
 2.d) Flow of a packet
 ---------------------
@@ -369,3 +435,7 @@ http://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
 http://msdn.microsoft.com/en-us/library/windows/hardware/ff557015(v=vs.85).aspx
 7. How to Port Open vSwitch to New Software or Hardware
 http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=PORTING
+8. Netlink
+http://en.wikipedia.org/wiki/Netlink
+9. epoll
+http://en.wikipedia.org/wiki/Epoll
-- 
1.7.4.1
