Hi Nithin,

I have only one minor comment. Please see inline.
Other than that, looks good to me.

Acked-by: Sorin Vinturis <svintu...@cloudbasesolutions.com> 

Thanks,
Sorin

-----Original Message-----
From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Nithin Raju
Sent: Friday, 21 November, 2014 02:28
To: dev@openvswitch.org
Subject: [ovs-dev] [PATCH v2] datapath-windows: update DESIGN document

In this patch, we update the design document to reflect the netlink-based
kernel-userspace interface implementation and a few other changes.
I have covered the changes at a high level.

Please feel free to extend the document with more details that you think
were missed.

Signed-off-by: Nithin Raju <nit...@vmware.com>
---
 datapath-windows/DESIGN |  260 +++++++++++++++++++++++++++++-----------------
 1 files changed, 164 insertions(+), 96 deletions(-)

diff --git a/datapath-windows/DESIGN b/datapath-windows/DESIGN
index b438c44..638990d 100644
--- a/datapath-windows/DESIGN
+++ b/datapath-windows/DESIGN
@@ -1,20 +1,13 @@
                        OVS-on-Hyper-V Design Document
                        ==============================
-There has been an effort in the recent past to develop the Open vSwitch (OVS)
-solution onto multiple hypervisor platforms such as FreeBSD and Microsoft
-Hyper-V. VMware has been working on a OVS solution for Microsoft Hyper-V for
-the past few months and has successfully completed the implementation.
-
-This document provides details of the development effort. We believe this 
-document should give enough information to members of the community who are 
-curious about the developments of OVS on Hyper-V. The community should also be 
-able to get enough information to make plans to leverage the deliverables of 
-this effort.
-
-The userspace portion of the OVS has already been ported to Hyper-V and 
-committed to the openvswitch repo. So, this document will mostly emphasize on 
-the kernel driver, though we touch upon some of the aspects of userspace as 
-well.
+There has been a community effort to develop Open vSwitch on Microsoft Hyper-V.
+In this document, we provide details of the development effort. We 
+believe this document should give enough information to understand the
+overall design.
+
+The userspace portion of OVS has been ported to Hyper-V in a separate
+effort, and committed to the openvswitch repo. So, this document will
+mostly emphasize the kernel driver, though we touch upon some of the
+aspects of userspace as well.
 
 We cover the following topics:
 1. Background into relevant Hyper-V architecture
@@ -48,13 +41,13 @@
 In Hyper-V, the virtual machine is called the Child Partition. Each VIF or
 physical NIC on the Hyper-V extensible switch is attached via a port. Each port
 is both on the ingress path or the egress path of the switch. The ingress path
 is used for packets being sent out of a port, and egress is used for packet
-being received on a port. By design, NDIS provides a layered interface, where 
-in the ingress path, higher level layers call into lower level layers, and on 
-the egress path, it is the other way round. In addition, there is a object 
-identifier (OID) interface for control operations Eg. addition of a port. The 
-workflow for the calls is similar in nature to the packets, where higher level 
-layers call into the lower level layers. A good representational diagram of 
-this architecture is in [4].
+being received on a port. By design, NDIS provides a layered interface.
+In this layered interface, higher level layers call into lower level
+layers in the ingress path. In the egress path, it is the other way
+round. In addition, there is an object identifier (OID) interface for
+control operations, e.g. addition of a port. The workflow for the calls
+is similar in nature to the packets, where higher level layers call
+into the lower level layers. A good representational diagram of this
+architecture is in [4].
 
 Windows Filtering Platform (WFP)[5] is a platform implemented on Hyper-V that  
provides APIs and services for filtering packets. WFP has been utilized to
@@ -75,22 +68,23 @@
 has been used to retrieve some of the configuration information that OVS needs.
                                   |                               |
   +------+ +--------------+       | +-----------+  +------------+ |
   |      | |              |       | |           |  |            | |
-  | OVS- | |     OVS      |       | | Virtual   |  | Virtual    | |
-  | wind | |  USERSPACE   |       | | Machine #1|  | Machine #2 | |
-  |      | |  DAEMON/CTL  |       | |           |  |            | |
+  | ovs- | |     OVS-     |       | | Virtual   |  | Virtual    | |
+  | *ctl | |  USERSPACE   |       | | Machine #1|  | Machine #2 | |
+  |      | |    DAEMON    |       | |           |  |            | |
   +------+-++---+---------+       | +--+------+-+  +----+------++ | +--------+
-  |  DPIF-  |   | netdev- |       |    |VIF #1|         |VIF #2|  | |Physical|
-  | Windows |<=>| Windows |       |    +------+         +------+  | |  NIC   |
+  |  dpif-  |   | netdev- |       |    |VIF #1|         |VIF #2|  | |Physical|
+  | netlink |   | windows |       |    +------+         +------+  | |  NIC   |
   +---------+   +---------+       |      ||                   /\  | +--------+
-User     /\                       |      || *#1*         *#4* ||  |     /\
-=========||=======================+------||-------------------||--+     ||
-Kernel   ||                              \/                   ||  ||=====/
-         \/                           +-----+                 +-----+ *#5*
+User     /\         /\            |      || *#1*         *#4* ||  |     /\
+=========||=========||============+------||-------------------||--+     ||
+Kernel   ||         ||                   \/                   ||  ||=====/
+         \/         \/                +-----+                 +-----+ *#5*
  +-------------------------------+    |     |                 |     |
  |   +----------------------+    |    |     |                 |     |
  |   |   OVS Pseudo Device  |    |    |     |                 |     |
- |   +----------------+-----+    |    |     |                 |     |
- |                               |    |  I  |                 |     |
+ |   +----------------------+    |    |     |                 |     |
+ |      | Netlink Impl. |        |    |     |                 |     |
+ |      -----------------        |    |  I  |                 |     |
  | +------------+                |    |  N  |                 |  E  |
  | |  Flowtable | +------------+ |    |  G  |                 |  G  |
  | +------------+ |  Packet    | |*#2*|  R  |                 |  R  |
@@ -110,9 +104,8 @@ Kernel   ||                              \/                   ||  ||=====/
 Figure 2 shows the various blocks involved in the OVS Windows implementation,  
along with some of the components available in the NDIS stack, and also the  
virtual machines. The workflow of a packet being transmitted from a VIF out and 
-into another VIF and to a physical NIC is also shown. New userspace components 
-being added as also shown. Later on in this section, we’ll discuss the flow of 
-a packet at a high level.
+into another VIF and to a physical NIC is also shown. Later on in this 
+section, we will discuss the flow of a packet at a high level.
 
 The figure gives a general idea of where the OVS userspace and the kernel  
components fit in, and how they interface with each other.
@@ -122,9 +115,11 @@
 a forwarding extension roughly implementing the following
 sub-modules/functionality. Details of each of these sub-components in the
 kernel are contained in later sections:
  * Interfacing with the NDIS stack
+ * Netlink message parser
+ * Netlink sockets
  * Switch/Datapath management
  * Interfacing with userspace portion of the OVS solution to implement the
-   necessary ioctls that userspace needs
+   necessary functionality that userspace needs
  * Port management
  * Flowtable/Actions/packet forwarding
  * Tunneling
@@ -140,32 +135,36 @@ are:
  * Interface between the userspace and the kernel module.
  * Event notifications are significantly different.
  * The communication interface between DPIF and the kernel module need not be
-   implemented in the way OVS on Linux does.
+   implemented in the way OVS on Linux does. That said, it would be
+   advantageous to have a similar interface to the kernel module for reasons of
+   readability and maintainability.
  * Any licensing issues of using Linux kernel code directly.
 
 Due to these differences, it was a straightforward decision to develop the  
datapath for OVS on Hyper-V from scratch rather than porting the one on Linux.
-A re-development focussed on the following goals:
+A re-development focused on the following goals:
  * Adhere to the existing requirements of userspace portion of OVS (such as
-   ovs- vswitchd), to minimize changes in the userspace workflow.
+   ovs-vswitchd), to minimize changes in the userspace workflow.
  * Fit well into the typical workflow of a Hyper-V extensible switch forwarding
    extension.
 
 The userspace portion of the OVS solution is mostly POSIX code, and not very 
-Linux specific. Majority of the code has already been ported and committed to 
-the openvswitch repo. Most of the daemons such as ovs-vswitchd or ovsdb-server 
-can run on Windows now. One additional daemon that has been implemented is 
-called ovs-wind. At a high level ovs-wind manages keeps the ovsdb used by 
-userspace in sync with the kernel state. More details in the userspace section.
+Linux specific. The majority of the userspace code does not interface
+directly with the kernel datapath and was ported independently of the 
+kernel datapath effort.
 
 As explained in the OVS porting design document [7], DPIF is the portion of
-userspace that interfaces with the kernel portion of the OVS. Each platform can
-have its own implementation of the DPIF provider whose interface is defined in
-dpif-provider.h [3]. For OVS on Hyper-V, we have an implementation of DPIF
-provider for Hyper-V. The communication interface between userspace and the
-kernel is a pseudo device and is different from that of the Linux’s DPIF
-provider which uses netlink. But, as long as the DPIF provider interface is the
-same, the callers should be agnostic of the underlying communication interface.
+userspace that interfaces with the kernel portion of the OVS. The
+interface that each DPIF provider has to implement is defined in
+dpif-provider.h [3].
+Though each platform is allowed to have its own implementation of the 
+DPIF provider, it was found, via community feedback, than it is desired 
[Sorin]: "than it is desired" --> "that it is desired"

+to share code whenever possible. Thus, the DPIF provider for OVS on 
+Hyper-V shares code with the DPIF provider on Linux. This interface is 
+implemented in dpif-netlink.c, formerly dpif-linux.c.
+
+We'll elaborate more on the kernel-userspace interface in a dedicated
+section below. Here it suffices to say that the DPIF provider
+implementation for Windows is netlink-based and shares code with the
+Linux one.
 
 2.a) Kernel module (datapath)
 -----------------------------
@@ -178,8 +177,8 @@
 This is consistent with using a single datapath in the kernel on Linux. All the
 physical adapters are connected as external adapters to the extensible switch.
 
 When the OVS switch extension registers itself as a filter driver, it also 
-registers callbacks for the switch management and datapath functions. In other 
-words, when a switch is created on the Hyper-V root partition (host), the
+registers callbacks for the switch/port management and datapath 
+functions. In other words, when a switch is created on the Hyper-V root 
+partition (host), the
 extension gets an activate callback upon which it can initialize the data  
structures necessary for OVS to function. Similarly, there are callbacks for  
when a port gets added to the Hyper-V switch, and an External Network adapter 
@@ -190,7 +189,7 @@ packet is received on an external NIC.
 As shown in the figures, an extensible switch extension gets to see a packet  
sent by the VM (VIF) twice - once on the ingress path and once on the egress  
path. Forwarding decisions are to be made on the ingress path. Correspondingly, 
-we’ll be hooking onto the following interfaces:
+we will be hooking onto the following interfaces:
  * Ingress send indication: intercept packets for performing flow based
   forwarding. This includes straight forwarding to output ports. Any packet
   modifications needed to be performed are done here either inline or by
@@ -203,11 +202,41 @@ we’ll be hooking onto the following interfaces:
 
 Interfacing with OVS userspace
 ------------------------------
-We’ve implemented a pseudo device interface for letting OVS userspace talk to
+We have implemented a pseudo device interface for letting OVS userspace talk to
 the OVS kernel module. This is equivalent to the typical character device 
-interface on POSIX platforms. The pseudo device supports a whole bunch of
+interface on POSIX platforms where we can register custom functions for
+read, write and ioctl functionality. The pseudo device supports a number of
 ioctls that netdev and DPIF on OVS userspace make use of.
 
+Netlink message parser
+----------------------
+The communication between OVS userspace and OVS kernel datapath is in
+the form of Netlink messages [8]. More details about this are provided
+in section 2.c, kernel-userspace interface. In the kernel, a
+full-fledged netlink message parser has been implemented along the
+lines of the netlink message parser in OVS userspace. In fact, a lot of
+the code is ported code.
+
+Along the lines of 'struct ofpbuf' in OVS userspace, a managed buffer
+has been implemented in the kernel datapath to make it easier to parse
+and construct netlink messages.
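As a rough illustration of what such a parser deals with, here is a minimal,
self-contained sketch of netlink-style type-length-value (TLV) attribute
construction and lookup over a fixed buffer. All names below (mini_buf,
put_attr, get_attr) are invented for this example and are not the actual
kernel or OVS APIs; only the 4-byte-aligned TLV idea carries over.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAP 256
#define NLA_HDR 4                      /* 2-byte length + 2-byte type */
#define NLA_ALIGN(n) (((n) + 3) & ~3)  /* netlink pads to 4 bytes */

struct mini_buf {
    uint8_t data[BUF_CAP];
    size_t len;
};

/* Append one attribute; returns 0 on success, -1 if it won't fit. */
static int put_attr(struct mini_buf *b, uint16_t type,
                    const void *payload, uint16_t size)
{
    size_t total = NLA_ALIGN(NLA_HDR + size);
    if (b->len + total > BUF_CAP) {
        return -1;
    }
    uint16_t len = NLA_HDR + size;     /* length includes the header */
    memcpy(b->data + b->len, &len, 2);
    memcpy(b->data + b->len + 2, &type, 2);
    memcpy(b->data + b->len + NLA_HDR, payload, size);
    memset(b->data + b->len + NLA_HDR + size, 0,
           total - NLA_HDR - size);    /* zero the alignment padding */
    b->len += total;
    return 0;
}

/* Walk the buffer and return a pointer to the payload of the first
 * attribute with the given type, or NULL if absent or malformed. */
static const void *get_attr(const struct mini_buf *b, uint16_t type)
{
    size_t off = 0;
    while (off + NLA_HDR <= b->len) {
        uint16_t len, t;
        memcpy(&len, b->data + off, 2);
        memcpy(&t, b->data + off + 2, 2);
        if (len < NLA_HDR || off + len > b->len) {
            return NULL;               /* malformed attribute */
        }
        if (t == type) {
            return b->data + off + NLA_HDR;
        }
        off += NLA_ALIGN(len);
    }
    return NULL;
}
```

A managed buffer like the real one additionally tracks headroom and grows on
demand; the fixed-size version above keeps the example short.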
+
+Netlink sockets
+---------------
+On Linux, OVS userspace utilizes netlink sockets to pass netlink
+messages back and forth. Since much of the userspace code, including
+the DPIF provider in dpif-netlink.c (formerly dpif-linux.c), has been
+reused, pseudo-netlink sockets have been implemented in OVS userspace.
+Windows lacks native netlink socket support, and the socket family is
+not extensible either. Hence it is not possible to provide a native
+implementation of netlink sockets.
+We emulate netlink sockets in lib/netlink-socket.c and support all of 
+the nl_* APIs to higher levels. The implementation opens a handle to 
+the pseudo device for each netlink socket. Some more details on this 
+topic are provided in the userspace section on netlink sockets.
+
+Typical netlink semantics of read message, write message, dump, and 
+transaction have been implemented so that higher level layers are not 
+affected by the netlink implementation not being native.
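The transaction semantics mentioned above can be sketched as follows. This is
a toy model, not the actual implementation: an in-memory queue stands in for
the pseudo-device handle, and every name here is invented. The point it
illustrates is that a transact must match the reply to its request by netlink
sequence number, skipping any stale replies.

```c
#include <assert.h>
#include <stdint.h>

struct toy_msg {
    uint32_t seq;                  /* netlink sequence number */
    uint32_t payload;
};

#define QCAP 8
struct toy_device {                /* stand-in for the device handle */
    struct toy_msg q[QCAP];
    int head, tail;
    uint32_t next_seq;
};

static void dev_reply(struct toy_device *d, uint32_t seq, uint32_t payload)
{
    d->q[d->tail % QCAP] = (struct toy_msg) { seq, payload };
    d->tail++;
}

static int dev_read(struct toy_device *d, struct toy_msg *m)
{
    if (d->head == d->tail) {
        return -1;                 /* nothing queued */
    }
    *m = d->q[d->head % QCAP];
    d->head++;
    return 0;
}

/* "Transact": send a request, then read until the reply whose sequence
 * number matches ours, discarding stale replies from earlier requests. */
static int toy_transact(struct toy_device *d, uint32_t request,
                        uint32_t *reply)
{
    uint32_t seq = d->next_seq++;
    /* A real implementation would write the request to the device
     * here; the toy device computes its reply on the spot. */
    dev_reply(d, seq, request + 1);
    struct toy_msg m;
    while (dev_read(d, &m) == 0) {
        if (m.seq == seq) {
            *reply = m.payload;
            return 0;
        }
        /* stale reply from an earlier exchange: skip it */
    }
    return -1;
}
```

Dump semantics are similar, except that multiple replies share one sequence
number and a terminator message ends the iteration.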
+
 Switch/Datapath management
 --------------------------
 As explained above, we hook onto the management callback functions in the NDIS 
@@ -267,48 +296,83 @@ used.
 
 2.b) Userspace components
 -------------------------
-A new daemon has been added to userspace to manage the entities in OVSDB, and 
-also to keep it in sync with the kernel state, and this include bridges, 
-physical NICs, VIFs etc. For example, upon bootup, ovs-wind does a get on the 
-kernel to get a list of the bridges, and the corresponding ports and populates 
-OVSDB. If a new VIF gets added to the kernel switch because a user powered on a
-Virtual Machine, ovs-wind detects it, and adds a corresponding entry in the
-ovsdb. This implies that ovs-wind has a synchronous as well as an asynchronous 
-interface to the OVS kernel driver.
-
+The userspace portion of the OVS solution is mostly POSIX code, and not 
+very Linux specific. The majority of the userspace code does not interface
+directly with the kernel datapath and was ported independently of the 
+kernel datapath effort.
+
+In this section, we cover the userspace components that interface with 
+the kernel datapath.
+
+As explained earlier, OVS on Hyper-V shares the DPIF provider 
+implementation with Linux. The DPIF provider on Linux uses netlink 
+sockets and netlink messages. Netlink sockets and messages are 
+extensively used on Linux to exchange information between userspace and 
+kernel. In order to satisfy these dependencies, netlink sockets
+(pseudo and non-native) and netlink messages are implemented on Hyper-V.
+
+The following are the major advantages of sharing DPIF provider code:
+1. Maintenance is simpler:
+   Any change made to the interface defined in dpif-provider.h need not be
+   propagated to multiple implementations. Also, developers familiar with the
+   Linux implementation of the DPIF provider can easily ramp up on the Hyper-V
+   implementation as well.
+2. Netlink messages provide inherent advantages:
+   Netlink messages are known for their extensibility. Each message is
+   versioned, so the provided data structures offer a mechanism to perform
+   version checking and forward/backward compatibility with the kernel
+   module.
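A minimal sketch of the compatibility check that such a versioned message
header enables. The header layout and names are invented for the example (the
real messages carry a generic-netlink-style header defined in the shared
interface files); the idea is simply that a receiver can refuse messages from
a newer protocol revision instead of misinterpreting them.

```c
#include <assert.h>
#include <stdint.h>

/* Invented header: each message names its command and protocol version. */
struct toy_genl_hdr {
    uint8_t cmd;
    uint8_t version;
};

enum { TOY_VERSION_SUPPORTED = 1 };

/* Accept messages at or below the revision we understand; reject
 * anything newer rather than guessing at fields added later. */
static int check_version(const struct toy_genl_hdr *h)
{
    return h->version <= TOY_VERSION_SUPPORTED ? 0 : -1;
}
```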
+
+Netlink sockets
+---------------
+As explained in other sections, an emulation of netlink sockets has 
+been implemented in lib/netlink-socket.c for Windows. The 
+implementation creates a handle to the OVS pseudo device, and emulates 
+netlink socket semantics of receive message, send message, dump, and 
+transact. Most of the nl_* functions are supported.
+
+The fact that the implementation is non-native manifests in various ways.
+One example is that the PID for the netlink socket is not automatically
+assigned in userspace when a handle is created to the OVS pseudo 
+device. There's an extra command (defined in OvsDpInterfaceExt.h) that 
+is used to grab the PID generated in the kernel.
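That PID handshake can be modeled as below. This is purely illustrative: the
toy structures stand in for the device handle and the kernel's PID allocator,
and none of the names correspond to the actual command defined in
OvsDpInterfaceExt.h.

```c
#include <assert.h>
#include <stdint.h>

struct toy_kernel {
    uint32_t next_pid;             /* per-handle PIDs handed out by kernel */
};

struct toy_nl_sock {
    uint32_t pid;                  /* unknown until fetched from the kernel */
};

/* Stand-in for the extra "get PID" device command. */
static uint32_t toy_get_pid_cmd(struct toy_kernel *k)
{
    return k->next_pid++;
}

/* Opening an emulated netlink socket is two steps: create the device
 * handle, then ask the kernel which PID it assigned to that handle. */
static void toy_nl_sock_create(struct toy_kernel *k, struct toy_nl_sock *s)
{
    s->pid = toy_get_pid_cmd(k);
}
```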
+
+DPIF provider
+--------------
+As has been mentioned in earlier sections, the netlink socket and 
+netlink message based DPIF provider on Linux has been ported to Windows.
+Correspondingly, the file is now called lib/dpif-netlink.c, renamed
+from lib/dpif-linux.c.
 
-2.c) Kernel-Userspace interface
--------------------------------
-DPIF-Windows
-------------
-DPIF-Windows is the Windows implementation of the interface defined in dpif- 
-provider.h, and provides an interface into the OVS kernel driver. We implement 
-most of the callbacks required by the DPIF provider. A quick summary of the 
-functionality implemented is as follows:
- * dp_dump, dp_get: dump all datapath information or get information for a
-   particular datapath.  Currently we only support one datapath.
- * flow_dump, flow_put, flow_get, flow_flush: These functions retrieve all
-   flows in the kernel, add a flow to the kernel, get a specific flow and
-   delete all the flows in the kernel.
- * recv_set, recv, recv_wait, recv_purge: these poll packets for upcalls.
- * execute: This is used to send packets from userspace to the kernel. The
-   packets could be either flow miss packet punted from kernel earlier or
-   userspace generated packets.
- * vport_dump, vport_get, ext_info: These functions dump all ports in the
-   kernel, get a specific port in the kernel, or get extended information
-   about a port.
- * event_subscribe, wait, poll: These functions subscribe, wait and poll the
-   events that kernel posts.  A typical example is kernel notices a port has
-   gone up/down, and would like to notify the userspace.
+Most of the code is common. Some divergence is in the code to receive
+packets. The Linux implementation uses epoll() [9], which is not
+natively supported on Windows.
 
 Netdev-Windows
 --------------
-We have a Windows implementation of the the interface defined in lib/netdev- 
-provider.h. The implementation provided functionality to get extended 
-information about an interface. It is limited in functionality compared to the 
-Linux implementation of the netdev provider and cannot be used to add any 
-interfaces in the kernel such as a tap interface.
+We have a Windows implementation of the interface defined in 
+lib/netdev-provider.h. The implementation provides functionality to get 
+extended information about an interface. It is limited in functionality 
+compared to the Linux implementation of the netdev provider and cannot 
+be used to add any interfaces in the kernel such as a tap interface or 
+to send/receive packets. The netdev-windows implementation uses the 
+datapath interface extensions defined in:
+datapath-windows/include/OvsDpInterfaceExt.h
 
+2.c) Kernel-Userspace interface
+-------------------------------
+openvswitch.h and OvsDpInterfaceExt.h
+-------------------------------------
+Since the DPIF provider is shared with Linux, the kernel datapath 
+provides the same interface as the Linux datapath. The interface is 
+defined in datapath/linux/compat/include/linux/openvswitch.h. 
+Derivatives of this interface file are created during OVS userspace 
+compilation. The derivative for the kernel datapath on Hyper-V is
+provided in the following location:
+datapath-windows/include/OvsDpInterface.h
+
+That said, there are Windows specific extensions that are defined in 
+the interface file:
+datapath-windows/include/OvsDpInterfaceExt.h
 
 2.d) Flow of a packet
 ---------------------
@@ -354,9 +418,9 @@ driver.
 
 Reference list:
 ===============
-1: Hyper-V Extensible Switch
+1. Hyper-V Extensible Switch
 http://msdn.microsoft.com/en-us/library/windows/hardware/hh598161(v=vs.85).aspx
-2: Hyper-V Extensible Switch Extensions
+2. Hyper-V Extensible Switch Extensions
 http://msdn.microsoft.com/en-us/library/windows/hardware/hh598169(v=vs.85).aspx
 3. DPIF Provider
 http://openvswitch.sourcearchive.com/documentation/1.1.0-1/dpif-
@@ -369,3 +433,7 @@ 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366510(v=vs.85).aspx
 http://msdn.microsoft.com/en-us/library/windows/hardware/ff557015(v=vs.85).aspx
 7. How to Port Open vSwitch to New Software or Hardware  
http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=PORTING
+8. Netlink
+http://en.wikipedia.org/wiki/Netlink
+9. epoll
+http://en.wikipedia.org/wiki/Epoll
--
1.7.4.1

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
