This patch adds support for a new port type, dpdkvhostuser, to the
userspace datapath. It builds on the existing vhost-cuse infrastructure
but disables vhost-cuse ports by default in favour of vhost-user ports;
vhost-cuse can still be selected at configure time with --with-vhostcuse.

Adding a dpdkvhostuser port creates a Unix domain socket. When the path
of that socket is provided to QEMU, the socket carries communication
between the virtio-net device in the VM and the OVS port.

Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
---
 INSTALL.DPDK.md         | 115 ++++++++++++++++++++++++++++++++++++------------
 acinclude.m4            |  13 ++++++
 configure.ac            |   1 +
 lib/netdev-dpdk.c       |  52 ++++++++++++++++++++--
 lib/netdev.c            |   4 ++
 vswitchd/ovs-vswitchd.c |   2 +
 6 files changed, 155 insertions(+), 32 deletions(-)
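
For reviewers, the socket naming and device matching added in
lib/netdev-dpdk.c can be sketched in isolation as below. This is a minimal
sketch, not the patch code: SOCK_PATH_TEMPLATE, SOCK_PATH_MAX and both
function names are hypothetical stand-ins for VHOST_USER_PORT_SOCK_PATH,
IF_NAME_SZ and the inline logic in netdev_dpdk_vhost_construct() and
new_device(). The truncation check is an assumption added for illustration;
the patch itself does not test the snprintf() return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for VHOST_USER_PORT_SOCK_PATH and IF_NAME_SZ. */
#define SOCK_PATH_TEMPLATE "/tmp/%s"
#define SOCK_PATH_MAX 108

/* Formats the socket path for 'port_name' into 'buf'.  Returns true on
 * success and false if the formatted path does not fit in 'size' bytes.
 * (The truncation check is an addition for illustration only.) */
static bool
vhost_user_socket_path(char *buf, size_t size, const char *port_name)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, SOCK_PATH_TEMPLATE, port_name);
    return n >= 0 && (size_t) n < size;
}

/* Mirrors the comparison in new_device(): a vhost-user device is matched
 * to a port by comparing the name the device reports with the port's
 * socket path rather than with the port name. */
static bool
socket_path_matches(const char *dev_ifname, const char *socket_path)
{
    return strncmp(dev_ifname, socket_path, SOCK_PATH_MAX) == 0;
}
```

With these helpers, a port named vhost-user1 yields the path
/tmp/vhost-user1, and a device reporting that same path matches the port.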

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 4deeb87..bc2a5a2 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -294,47 +294,106 @@ the vswitchd.
 DPDK vhost:
 -----------
 
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
+1.  vhost-user
+2.  vhost-cuse
+
+vhost-user is enabled by default in DPDK, and OVS follows the same
+default.
+
+Should you wish to use vhost-cuse instead of vhost-user, you must do the
+following:
+1.  Enable vhost-cuse in DPDK and re-build. At the moment this can be
+    achieved by modifying the `$RTE_SDK/lib/librte_vhost/Makefile` file
+    and commenting out vhost-user SRCS and uncommenting vhost-cuse SRCS:
+
+      `SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
+       # SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c vhost_user/virtio-net-user.c vhost_user/fd_man.c`
+
+2.  Enable vhost-cuse in OVS and re-build. This can be achieved by using
+    the `--with-vhostcuse` flag in the `./configure` step like so:
+
+      `./configure --with-dpdk=$DPDK_BUILD --with-vhostcuse`
 
 Prerequisites:
-1.  DPDK 1.8 with vhost support enabled and recompile OVS as above.
+1.  DPDK 2.0 built with vhost support enabled, and OVS recompiled as above.
 
      Update `config/common_linuxapp` so that DPDK is built with vhost
      libraries:
 
      `CONFIG_RTE_LIBRTE_VHOST=y`
 
-2.  Insert the Cuse module:
+2.  (Optional) If using vhost-cuse:
+
+    2.1  Insert the Cuse module:
 
-      `modprobe cuse`
+          `modprobe cuse`
 
-3.  Build and insert the `eventfd_link` module:
+    2.2  Build and insert the `eventfd_link` module:
 
-     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
-     `make`
-     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+          `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
+          `make`
+          `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
 
 Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
+arbitrary names.
+
+When adding vhost ports to the switch, the port type to use depends on
+which type of vhost is enabled.
 
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+  -  For vhost-user (default), the name of the port type is `dpdkvhostuser`
 
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+      `ovs-vsctl add-port br0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser`
 
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+     This action creates a socket located at `/tmp/vhost-user1`, which you
+     must provide to your VM on the QEMU command line. More instructions
+     on this can be found in the next section "DPDK vhost-user VM
+     configuration".
 
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+  -  For vhost-cuse, the name of the port type is `dpdkvhost`
 
+      `ovs-vsctl add-port br0 vhost-cuse1 -- set Interface vhost-cuse1 type=dpdkvhost`
 
-DPDK vhost VM configuration:
-----------------------------
+     When attaching vhost-cuse ports to QEMU, the name provided during the
+     add-port operation must match the ifname parameter on the QEMU command
+     line. More instructions on this can be found in the section "DPDK
+     vhost-cuse VM configuration".
+
+DPDK vhost-user VM configuration:
+---------------------------------
+DPDK vhost-user works with QEMU v2.2.0. Follow the steps below to attach
+vhost-user port(s) to a VM.
+
+1. Configure sockets:
+   Pass the following parameters to QEMU to attach a vhost-user device.
+
+     ```
+     -chardev socket,id=char0,path=/tmp/vhost-user-1
+     -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce
+     -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+     ```
 
-   vhost ports use a Linux* character device to communicate with QEMU.
+     ...where vhost-user-1 is the name of the vhost-user port added
+     to the switch.
+     Repeat the above parameters for multiple devices, changing the
+     chardev path and id as necessary.
+
+2. Configure huge pages:
+   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+   virtio-net device's virtual rings and packet buffers by mapping the VM's
+   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
+   memory into their process address space, pass the following parameters
+   to QEMU:
+
+     `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
+     -numa node,memdev=mem -mem-prealloc`
+
+
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+   vhost-cuse ports use a Linux* character device to communicate with QEMU.
    By default it is set to `/dev/vhost-net`. It is possible to reuse this
    standard device for DPDK vhost, which makes setup a little simpler but it
    is better practice to specify an alternative character device in order to
@@ -400,7 +459,7 @@ DPDK vhost VM configuration:
    QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
    virtio-net device's virtual rings and packet buffers mapping the VM's
    physical memory on hugetlbfs. To enable vhost-ports to map the VM's
-   memory into their process address space, pass the following paramters
+   memory into their process address space, pass the following parameters
    to QEMU:
 
      `-object memory-backend-file,id=mem,size=4096,mem-path=/dev/hugepages,
@@ -415,8 +474,8 @@ the guest must be built without any modifications to the default
    `CONFIG_RTE_LIBRTE_VHOST=n`
 
 
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
 
 The QEMU wrapper script automatically detects and calls QEMU with the
 necessary parameters. It performs the following actions:
@@ -442,8 +501,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
   script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
   netdev=net1,mac=00:00:00:00:00:01
 
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
 
 If you are using libvirt, you must enable libvirt to access the character
 device by adding it to controllers cgroup for libvirtd using the following
@@ -517,7 +576,7 @@ Now you may launch your VM using virt-manager, or like so:
 
     `virsh create my_vhost_vm.xml`
 
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
 ----------------------------------------------------------
 
 To use the qemu-wrapper script in conjuntion with libvirt, follow the
@@ -545,7 +604,7 @@ steps in the previous section before proceeding with the following steps:
   the correct emulator location and set any additional options. If you are
   using a alternative character device name, please set "us_vhost_path" to the
   location of that device. The script will automatically detect and insert
-  the correct "vhostfd" value in the QEMU command line arguements.
+  the correct "vhostfd" value in the QEMU command line arguments.
 
   5. Use virt-manager to launch the VM
 
diff --git a/acinclude.m4 b/acinclude.m4
index 18598b3..2113dfb 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -224,6 +224,19 @@ AC_DEFUN([OVS_CHECK_DPDK], [
   AM_CONDITIONAL([DPDK_NETDEV], test -n "$RTE_SDK")
 ])
 
+dnl OVS_CHECK_VHOST_CUSE
+dnl
+dnl Enable DPDK vhost-cuse support in favour of vhost-user
+AC_DEFUN([OVS_CHECK_VHOST_CUSE], [
+  AC_ARG_WITH(vhostcuse,
+              [AC_HELP_STRING([--with-vhostcuse],
+                              [Enable DPDK vhost-cuse])])
+
+  if test X"$with_vhostcuse" != X && test X"$with_vhostcuse" != Xno; then
+    AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])
+  fi
+])
+
 dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
 dnl
 dnl Greps FILE for REGEX.  If it matches, runs IF-MATCH, otherwise IF-NO-MATCH.
diff --git a/configure.ac b/configure.ac
index 8d47eb9..14c4b35 100644
--- a/configure.ac
+++ b/configure.ac
@@ -163,6 +163,7 @@ AC_ARG_VAR(KARCH, [Kernel Architecture String])
 AC_SUBST(KARCH)
 OVS_CHECK_LINUX
 OVS_CHECK_DPDK
+OVS_CHECK_VHOST_CUSE
 OVS_CHECK_PRAGMA_MESSAGE
 AC_SUBST([OVS_CFLAGS])
 AC_SUBST([OVS_LDFLAGS])
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8714c52..4c86080 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -86,11 +86,17 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
 #define TX_HTHRESH 0  /* Default values of TX host threshold reg. */
 #define TX_WTHRESH 0  /* Default values of TX write-back threshold reg. */
 
+#ifdef VHOST_CUSE
 #define MAX_BASENAME_SZ NAME_MAX /* Maximum character device basename size. */
+#endif
 #define MAX_PKT_BURST 32           /* Max burst size for RX/TX */
 
+#ifdef VHOST_CUSE
 /* Character device basename. */
 char *dev_basename = NULL;
+#else
+#define VHOST_USER_PORT_SOCK_PATH "/tmp/%s" /* Socket Location Template */
+#endif
 
 static const struct rte_eth_conf port_conf = {
     .rxmode = {
@@ -215,6 +221,11 @@ struct netdev_dpdk {
     /* virtio-net structure for vhost device */
     OVSRCU_TYPE(struct virtio_net *) virtio_dev;
 
+#ifndef VHOST_CUSE
+    /* socket location for vhost-user device */
+    char socket_path[IF_NAME_SZ];
+#endif
+
     /* In dpdk_list. */
     struct ovs_list list_node OVS_GUARDED_BY(dpdk_spinlock);
     rte_spinlock_t dpdkr_tx_lock;
@@ -229,7 +240,7 @@ static bool thread_is_pmd(void);
 
 static int netdev_dpdk_construct(struct netdev *);
 
-void *start_cuse_session_loop(void *dummy);
+void *start_vhost_loop(void *dummy);
 
 struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
 
@@ -609,6 +620,18 @@ netdev_dpdk_vhost_construct(struct netdev *netdev_)
     netdev_->n_txq = NR_QUEUE;
     netdev_->n_rxq = NR_QUEUE;
 
+#ifndef VHOST_CUSE
+    snprintf(netdev->socket_path, sizeof(netdev->socket_path), VHOST_USER_PORT_SOCK_PATH, netdev_->name);
+    err = rte_vhost_driver_register(netdev->socket_path);
+    if (err != 0) {
+        VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
+                 netdev->socket_path);
+        goto unlock_dev;
+    }
+
+    VLOG_INFO("Socket %s created for vhost-user port %s\n", netdev->socket_path, netdev_->name);
+#endif
+
     VLOG_INFO("%s  is associated with VHOST port #%d\n", netdev_->name,
             netdev->port_id);
 
@@ -1531,7 +1554,11 @@ new_device(struct virtio_net *dev)
     rte_spinlock_lock(&dpdk_spinlock);
     /* Add device to the vhost port with the same name as that passed down. */
     LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
+#ifdef VHOST_CUSE
         if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
+#else
+        if (strncmp(dev->ifname, netdev->socket_path, IF_NAME_SZ) == 0) {
+#endif
             rte_spinlock_lock(&netdev->spinlock);
             ovsrcu_set(&netdev->virtio_dev, dev);
             rte_spinlock_unlock(&netdev->spinlock);
@@ -1606,10 +1633,15 @@ const struct virtio_net_device_ops virtio_net_device_ops =
 static int
 dpdk_vhost_class_init(void)
 {
+#ifdef VHOST_CUSE
     int err = -1;
+#endif
+
+    rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_MRG_RXBUF);
 
     rte_vhost_driver_callback_register(&virtio_net_device_ops);
 
+#ifdef VHOST_CUSE
     /* Register CUSE device to handle IOCTLs.
      * Unless otherwise specified on the vswitchd command line, dev_basename
      * is set to vhost-net.
@@ -1620,14 +1652,15 @@ dpdk_vhost_class_init(void)
         VLOG_ERR("CUSE device setup failure.\n");
         return -1;
     }
+#endif
 
-    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
 
     return 0;
 }
 
 void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
 {
      pthread_detach(pthread_self());
 
@@ -1840,7 +1873,9 @@ int
 dpdk_init(int argc, char **argv)
 {
     int result;
+#ifdef VHOST_CUSE
     int base = 0;
+#endif
 
     if (argc < 2 || strcmp(argv[1], "--dpdk"))
         return 0;
@@ -1853,6 +1888,7 @@ dpdk_init(int argc, char **argv)
     argc--;
     argv++;
 
+#ifdef VHOST_CUSE
     /* If the basename parameter has been provided, set 'dev_basename' to
      * this string if it meets the correct criteria. Otherwise, set it to the
      * default (vhost-net).
@@ -1876,6 +1912,7 @@ dpdk_init(int argc, char **argv)
         dev_basename = "vhost-net";
         VLOG_INFO("No basename provided - defaulting to /dev/vhost-net\n");
     }
+#endif
 
     /* Make sure things are initialized ... */
     result = rte_eal_init(argc, argv);
@@ -1892,8 +1929,11 @@ dpdk_init(int argc, char **argv)
 
     /* We are called from the main thread here */
     thread_set_nonpmd();
-
+#ifdef VHOST_CUSE
     return result + 1 + base;
+#else
+    return result + 1;
+#endif
 }
 
 const struct netdev_class dpdk_class =
@@ -1926,7 +1966,11 @@ const struct netdev_class dpdk_ring_class =
 
 const struct netdev_class dpdk_vhost_class =
     NETDEV_DPDK_CLASS(
+#ifdef VHOST_CUSE
         "dpdkvhost",
+#else
+        "dpdkvhostuser",
+#endif
         dpdk_vhost_class_init,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
diff --git a/lib/netdev.c b/lib/netdev.c
index 149b39a..523a011 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -109,7 +109,11 @@ netdev_is_pmd(const struct netdev *netdev)
 {
     return (!strcmp(netdev->netdev_class->type, "dpdk") ||
             !strcmp(netdev->netdev_class->type, "dpdkr") ||
+#ifdef VHOST_CUSE
             !strcmp(netdev->netdev_class->type, "dpdkvhost"));
+#else
+            !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
+#endif
 }
 
 static void
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 5497f6c..4b0429a 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -253,7 +253,9 @@ usage(void)
     vlog_usage();
     printf("\nDPDK options:\n"
            "  --dpdk options          Initialize DPDK datapath.\n"
+#ifdef VHOST_CUSE
            "  --basename BASENAME     override default character device name\n"
+#endif
            "                          for use with userspace vHost.\n");
     printf("\nOther options:\n"
            "  --unixctl=SOCKET        override default control socket name\n"
-- 
1.9.3
