Re: [ovs-dev] [PATCH v3 1/1] netdev-dpdk: add dpdk vhost ports

Flavio Leitner Tue, 12 Aug 2014 20:22:30 -0700

I've noticed that the directory eventfd_link and its
Makefile.in are in the tarball generated by 'make dist'.
However, the eventfd_link.c and eventfd_link.h are missing.


I haven't looked at anything else in this patch.
Thanks,
fbl


On Fri, Aug 08, 2014 at 01:28:55PM +0100, maryam.tahhan wrote:
> This patch implements the vhost-net offload API.  It adds support for
> a new port type to userspace datapath called dpdkvhost. This allows KVM
> (QEMU) to offload the servicing of virtio-net devices to it's associated
> dpdkvhost port. Instructions for use are in INSTALL.DPDK.
> 
> This has been tested on Intel multi-core platforms and with clients that
> have virtio-net interfaces.
> 
> Signed-off-by: maryam.tahhan <maryam.tah...@intel.com>
> ---
>  INSTALL.DPDK                          |  186 +++++-
>  Makefile.am                           |   46 +-
>  configure.ac                          |    1 +
>  lib/automake.mk                       |    7 +-
>  lib/netdev-dpdk.c                     |  804 ++++++++++++++++++++++--
>  lib/netdev.c                          |    3 +-
>  lib/vhost-net-cdev.c                  |  387 ++++++++++++
>  lib/vhost-net-cdev.h                  |   81 +++
>  lib/virtio-net.c                      | 1093 
> +++++++++++++++++++++++++++++++++
>  lib/virtio-net.h                      |  125 ++++
>  utilities/automake.mk                 |    3 +-
>  utilities/eventfd_link/Makefile.in    |   86 +++
>  utilities/eventfd_link/eventfd_link.c |  179 ++++++
>  utilities/eventfd_link/eventfd_link.h |   79 +++
>  utilities/qemu-wrap.py                |  389 ++++++++++++
>  15 files changed, 3385 insertions(+), 84 deletions(-)
>  create mode 100644 lib/vhost-net-cdev.c
>  create mode 100644 lib/vhost-net-cdev.h
>  create mode 100644 lib/virtio-net.c
>  create mode 100644 lib/virtio-net.h
>  create mode 100644 utilities/eventfd_link/Makefile.in
>  create mode 100644 utilities/eventfd_link/eventfd_link.c
>  create mode 100644 utilities/eventfd_link/eventfd_link.h
>  create mode 100755 utilities/qemu-wrap.py
> 
> diff --git a/INSTALL.DPDK b/INSTALL.DPDK
> index c74fa5c..3bbd84e 100644
> --- a/INSTALL.DPDK
> +++ b/INSTALL.DPDK
> @@ -220,7 +220,7 @@ ethernet ring and then places that same mbuf on the 
> transmit ring of
>  the ethernet ring.  It is a trivial loopback application.
>  
>  In addition to executing the client in the host, you can execute it within
> -a guest VM. To do so you will need a patched qemu.  You can download the
> +a guest VM. To do so you will need a patched QEMU.  You can download the
>  patch and getting started guide at :
>  
>  https://01.org/packet-processing/downloads
> @@ -229,6 +229,190 @@ A general rule of thumb for better performance is that 
> the client
>  application should not be assigned the same dpdk core mask "-c" as
>  the vswitchd.
>  
> +
> +Using the DPDK vhost with ovs-vswitchd:
> +--------------------------------------
> +Prerequisites: FUSE (packages: fuse and fuse-devel) & CUSE
> + - Insert the FUSE Kernel module e.g. modprobe fuse
> +
> + - Build and insert the eventfd_link module located in
> +   <top-level ovs dir>/utilities/eventfd_link.  From the top-level ovs dir:
> +    make eventfd && insmod <toplevel ovs dir>/utilities/eventfd_link.ko
> +
> + - Remove /dev/vhost-net
> +    NOTE: Vhost ports use a Linux* character device to communicate with QEMU.
> +    The basename and the index in dpdk_vhost_class_init() are used to
> +    generate the character device's name. By default it is set to 
> /dev/vhost-net.
> +    If you wish to use kernel vhost in parallel, simply change the values of
> +    dev_index and dev_basename in dpdk_vhost_class_init()) and recompile.
> +    In other words user space vhost can co-exist with Kernel vhost if you 
> wish to use
> +    both. Please refer to the Guest setup section for more information on 
> how to
> +    setup the guest in both cases.
> +
> + - Now you can add dpdkvhost devices.
> +
> +   e.g.
> +   ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 
> type=dpdkvhost
> +
> + - Finally setup a guest with a virtio-net device attached to the vhost port.
> +
> + **Note: tested with QEMU versions 1.4.2 and 1.6.2**
> +
> +Guest Setup Requirements:
> +-------------------------
> +1. The Guest must be configured with virtio-net adapters and offloads MUST BE
> +   DISABLED. e.g:
> +  -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> +  -device 
> virtio-net-pci,netdev=net1,mac=xx:xx:xx:xx:xx:xx,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
> +
> +2. QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> +   virtio-net device's virtual rings and packet buffers mapping the VM's 
> physical
> +   memory on hugetlbfs. To enable vhost-ports to map the VM's memory into 
> their
> +   process address space, pass the following paramters to QEMU:
> +  -mem-path /dev/hugepages -mem-prealloc
> +
> +3. Redirect QEMU to communicate with a vhost port instead of the vhost-net 
> kernel
> +   module by passing in an open file descriptor, e.g:
> +   -netdev 
> tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,vhostfd=<open fd>
> +
> +   The open file descriptor must be passed to QEMU running as a child 
> process.
> +
> + NOTE 1: If you are using /dev/vhost-net you don't need to pass in 
> vhostfd=<open fd>.
> +         vhost=on is enough.
> +
> + NOTE 2: The easiest way to setup a Guest that isn't using /dev/vhost-net is 
> to
> +         use the qemu-wrap.py script located in utilities. This python script
> +         automates the requirements specified above and can be used in 
> conjunction
> +         with libvirt.
> +
> + /dev/vhost-net:
> + --------------
> + If you are using /dev/vhost-net simply pass the following parameters to 
> QEMU to
> + create a virtio-net device:
> +  -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> +  -device 
> virtio-net-pci,netdev=net1,mac=xx:xx:xx:xx:xx:xx,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
> +
> + NOTE:
> +  - The specified offloads MUST BE DISABLED for userspace vhost.
> +  - To create multiple devices repeat the -netdev -device parameters.
> +
> + /dev/<basename-index>:
> + -----------------------
> + If you are NOT using /dev/vhost-net you have two options to launch your VMs:
> + 1. You can use the QEMU wrapper qemu-wrap.py provided in utilities (Please 
> see
> +    QEMU wrapper setup instructions below).
> + 2. You must manually pass the open file descriptor to the character device 
> to
> +    QEMU running as a child process as specified in the Guest Setup section.
> +
> +In the /dev/<basename-index> case you must enable libvirt to access the 
> userspace
> +device file by adding it to controllers cgroup for libvirtd using the 
> following
> +steps:
> +
> +    a) In /etc/libvirt/qemu.conf add/edit the following lines:
> +          1) cgroup_controllers = [ ... "devices", ... ]
> +          2) clear_emulator_capabilities = 0
> +          3) user = "root"
> +          4) group = "root"
> +          5) cgroup_device_acl = [
> +                 "/dev/null", "/dev/full", "/dev/zero",
> +                 "/dev/random", "/dev/urandom",
> +                 "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> +                 "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> +                 "/dev/<devbase-name>-<index>",
> +                 "/dev/hugepages"]
> +
> +    b) Disable SELinux or set to permissive mode
> +
> +    c) Mount cgroup device controller
> +         "mkdir /dev/cgroup"
> +         "mount -t cgroup none /dev/cgroup -o devices"
> +    d) Restart the libvirtd process
> +        e.g. on Fedora "systemctl restart libvirtd.service"
> +
> + QEMU wrapper instructions:
> + --------------------------
> +The QEMU wrapper script automatically detects and calls QEMU with the 
> necessary
> +parameters required to integrate with the vhost sample code. It performs the 
> following
> +actions:
> +- Automatically detects the location of the hugetlbfs and inserts this into 
> the
> +  command line parameters.
> +- Automatically open file descriptors for each virtio-net device and inserts 
> this into
> +  the command line parameters.
> +- Disables offloads on each virtio-net device.
> +- Calls QEMU passing both the command line parameters passed to the script 
> itself
> +  and those it has auto-detected.
> +
> +Using qemu-wrap.py:
> +------------------
> +  You MUST edit the configuration parameters section of the script to point 
> to
> +  the correct emulator location and set any additional options.
> +  NOTE: emul_path and us_vhost_path must be set.  All other parameters are
> +  optional.
> +
> +To use directly on the command line simply pass the wrapper some of the QEMU
> +parameters it will configure the rest, for e.g:
> +    qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
> +    --enable-kvm -nographic -vnc none -net none -netdev 
> tap,id=net1,script=no,
> +    downscript=no,ifname=if1,vhost=on -device virtio-net-pci,netdev=net1,
> +    mac=00:00:00:00:00:01
> +
> +qemu-wrap.py + Libvirt Integration:
> +-----------------------------------
> +  1. Place qemu-wrap.py in libvirtd's binary search PATH ($PATH)
> +     Ideally in the same directory that the QEMU binary is located.
> +
> +  2. Ensure that the script has the same owner/group and file
> +     permissions as the QEMU binary.
> +
> +  3. Update the VM xml file using "virsh edit VM.xml"
> +
> +     3.a) Set the VM to use the launch script
> +
> +        Set the emulator path contained in the
> +        <emulator><emulator/> tags
> +
> +        e.g replace <emulator>/usr/bin/qemu-kvm<emulator/>
> +         with    <emulator>/usr/bin/qemu-wrap.py<emulator/>
> +
> +     3.b) Set the VM's devices to use vhost-net offload
> +
> +            <interface type='network'>
> +              <mac address='xx:xx:xx:xx:xx:xx'/>
> +              <source network='default'/>
> +              <model type='virtio'/>
> +              <driver name='vhost'/>
> +              <address type=.../>
> +            </interface>
> +
> +  4. Enable libvirt to access our userpace device file by adding it to
> +     controllers cgroup for libvirtd as specified in the 
> /dev/<basename-index>
> +     section above.
> +  5. Set hugetlbfs_mount variable - ( Optional )
> +     VMs using userspace vhost must use hugepage backed memory. This can be
> +     enabled in the libvirt XML config by adding a memory backing section to 
> the
> +     XML config e.g.
> +              <memoryBacking>
> +              <hugepages/>
> +              </memoryBacking>
> +     This memory backing section should be added after the <memory> and
> +     <currentMemory> sections. This will add flags "-mem-prealloc -mem-path 
> <path>"
> +     to the QEMU command line. The hugetlbfs_mount variable can be used
> +     to override the default <path> passed through by libvirt.
> +     if "-mem-prealloc" or "-mem-path <path>" are not passed
> +     through and a vhost device is detected then these options will
> +     be automatically added by this script. This script will detect
> +     the system hugetlbfs mount point to be used for <path>. The
> +     default <path> for this script can be overidden by the
> +     hugetlbfs_dir variable in the configuration section of this script.
> +
> +    6. Restart the libvirtd system process
> +         e.g. on Fedora "systemctl restart libvirtd.service"
> +
> +    7. Edit the Configuration Parameters section of the script
> +       to point to the correct emulator location and set any additional 
> options.
> +
> +    8. Use virt-manager to launch the VM
> +
>  Restrictions:
>  -------------
>  
> diff --git a/Makefile.am b/Makefile.am
> index eddacaf..8a5ae42 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -7,7 +7,7 @@
>  
>  AUTOMAKE_OPTIONS = foreign subdir-objects
>  ACLOCAL_AMFLAGS = -I m4
> -SUBDIRS = datapath
> +SUBDIRS = datapath utilities/eventfd_link
>  
>  AM_CPPFLAGS = $(SSL_CFLAGS)
>  AM_LDFLAGS = $(SSL_LDFLAGS)
> @@ -29,6 +29,10 @@ AM_CPPFLAGS += $(SSL_INCLUDES)
>  AM_CFLAGS = -Wstrict-prototypes
>  AM_CFLAGS += $(WARNING_FLAGS)
>  
> +if DPDK_NETDEV
> +AM_CFLAGS += -D_FILE_OFFSET_BITS=64
> +endif
> +
>  if NDEBUG
>  AM_CPPFLAGS += -DNDEBUG
>  AM_CFLAGS += -fomit-frame-pointer
> @@ -166,20 +170,25 @@ CLEAN_LOCAL += clean-pycov
>  if GNU_MAKE
>  ALL_LOCAL += dist-hook-git
>  dist-hook-git: distfiles
> -     @if test -e $(srcdir)/.git && (git --version) >/dev/null 2>&1; then \
> -       (cd datapath && $(MAKE) distfiles);                               \
> -       (cat distfiles; sed 's|^|datapath/|' datapath/distfiles) |        \
> -         LC_ALL=C sort -u > all-distfiles;                               \
> -       (cd $(srcdir) && git ls-files) | grep -v '\.gitignore$$' |        \
> -         LC_ALL=C sort -u > all-gitfiles;                                \
> -       LC_ALL=C comm -1 -3 all-distfiles all-gitfiles > missing-distfiles; \
> -       if test -s missing-distfiles; then                                \
> -         echo "The distribution is missing the following files:";        \
> -         cat missing-distfiles;                                          \
> -         exit 1;                                                         \
> -       fi;                                                               \
> +     @if test -e $(srcdir)/.git && (git --version) >/dev/null 2>&1; then  \
> +       (cd datapath && $(MAKE) distfiles);                                \
> +       (cat distfiles; sed 's|^|datapath/|' datapath/distfiles) |         \
> +         LC_ALL=C sort -u > all-distfiles;                                \
> +       (cd $(srcdir)/utilities/eventfd_link && $(MAKE) distfiles);        \
> +       (cat distfiles; sed 's|^|utilities/eventfd_link/|'                 \
> +       utilities/eventfd_link/distfiles) >> $(srcdir)/all-distfiles;      \
> +       (cat distfiles all-distfiles > output);                            \
> +       (cat output | LC_ALL=C sort -u > all-distfiles);                   \
> +       (git ls-files) | grep -v '\.gitignore$$' |                         \
> +         LC_ALL=C sort -u > all-gitfiles;                                 \
> +       LC_ALL=C comm -1 -3 all-distfiles all-gitfiles > missing-distfiles;\
> +       if test -s missing-distfiles; then                                 \
> +         echo "The distribution is missing the following files:";         \
> +         cat missing-distfiles;                                           \
> +         exit 1;                                                          \
> +       fi;                                                                \
>       fi
> -CLEANFILES += all-distfiles all-gitfiles missing-distfiles
> +CLEANFILES += output all-distfiles all-gitfiles missing-distfiles
>  # The following is based on commands for the Automake "distdir" target.
>  distfiles: Makefile
>       @srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \
> @@ -199,7 +208,7 @@ config-h-check:
>       @cd $(srcdir); \
>       if test -e .git && (git --version) >/dev/null 2>&1 && \
>          git --no-pager grep -L '#include <config\.h>' `git ls-files | grep 
> '\.c$$' | \
> -               grep -vE 
> '^datapath|^lib/sflow|^third-party|^datapath-windows'`; \
> +               grep -vE 
> '^datapath|^lib/sflow|^third-party|^datapath-windows|^utilities/eventfd_link/eventfd_link'`;
>  \
>       then \
>           echo "See above for list of violations of the rule that"; \
>           echo "every C source file must #include <config.h>."; \
> @@ -253,7 +262,7 @@ thread-safety-check:
>       if test -e .git && (git --version) >/dev/null 2>&1 && \
>          grep -n -f build-aux/thread-safety-blacklist \
>              `git ls-files | grep '\.[ch]$$' \
> -                | $(EGREP) -v '^datapath|^lib/sflow|^third-party'` /dev/null 
> \
> +                | $(EGREP) -v 
> '^datapath|^lib/sflow|^third-party|^utilities/eventfd_link/eventfd_link'` 
> /dev/null \
>              | $(EGREP) -v ':[        ]*/?\*'; \
>       then \
>           echo "See above for list of calls to functions that are"; \
> @@ -299,6 +308,11 @@ if LINUX_ENABLED
>       cd datapath/linux && $(MAKE) modules_install
>  endif
>  
> +eventfd:
> +if DPDK_NETDEV
> +     cd utilities/eventfd_link && $(MAKE) module
> +endif
> +
>  include m4/automake.mk
>  include lib/automake.mk
>  include ofproto/automake.mk
> diff --git a/configure.ac b/configure.ac
> index 971c7b3..4864892 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -127,6 +127,7 @@ AC_CONFIG_FILES(datapath/linux/Kbuild)
>  AC_CONFIG_FILES(datapath/linux/Makefile)
>  AC_CONFIG_FILES(datapath/linux/Makefile.main)
>  AC_CONFIG_FILES(tests/atlocal)
> +AC_CONFIG_FILES(utilities/eventfd_link/Makefile)
>  
>  dnl This makes sure that include/openflow gets created in the build 
> directory.
>  AC_CONFIG_COMMANDS([include/openflow/openflow.h.stamp])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 4628c9b..9fb2ec4 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -316,9 +316,14 @@ lib_libopenvswitch_la_SOURCES += \
>  endif
>  
>  if DPDK_NETDEV
> +lib_libopenvswitch_la_LDFLAGS += -lfuse -ldl
>  lib_libopenvswitch_la_SOURCES += \
>         lib/netdev-dpdk.c \
> -       lib/netdev-dpdk.h
> +       lib/netdev-dpdk.h \
> +       lib/virtio-net.c \
> +       lib/virtio-net.h \
> +       lib/vhost-net-cdev.c \
> +       lib/vhost-net-cdev.h
>  endif
>  
>  if WIN32
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index b45e367..ff1f9d6 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -27,6 +27,12 @@
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <stdio.h>
> +#include <rte_string_fns.h>
> +#include <sys/eventfd.h>
> +#include <sys/param.h>
> +#include <arpa/inet.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>
>  
>  #include "dpif-netdev.h"
>  #include "list.h"
> @@ -46,6 +52,8 @@
>  #include "timeval.h"
>  #include "unixctl.h"
>  #include "vlog.h"
> +#include "virtio-net.h"
> +#include "vhost-net-cdev.h"
>  
>  VLOG_DEFINE_THIS_MODULE(dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> @@ -83,6 +91,11 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
> 20);
>  #define TX_HTHRESH 0  /* Default values of TX host threshold reg. */
>  #define TX_WTHRESH 0  /* Default values of TX write-back threshold reg. */
>  
> +/* Maximum character device basename size. */
> +#define MAX_BASENAME_SZ 12
> +#define DEVICE_RX            1
> +#define MAX_PKT_BURST 32  /* Max burst size for RX/TX */
> +
>  static const struct rte_eth_conf port_conf = {
>      .rxmode = {
>          .mq_mode = ETH_MQ_RX_RSS,
> @@ -126,6 +139,7 @@ static const struct rte_eth_txconf tx_conf = {
>  enum { MAX_RX_QUEUE_LEN = 192 };
>  enum { MAX_TX_QUEUE_LEN = 384 };
>  enum { DRAIN_TSC = 200000ULL };
> +enum { DPDK = 0, VHOST };
>  
>  static int rte_eal_init_ret = ENODEV;
>  
> @@ -151,8 +165,8 @@ struct dpdk_mp {
>      struct list list_node OVS_GUARDED_BY(dpdk_mutex);
>  };
>  
> -struct dpdk_tx_queue {
> -    rte_spinlock_t tx_lock;
> +struct dpdk_queue {
> +    rte_spinlock_t queue_lock;
>      int count;
>      uint64_t tsc;
>      struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN];
> @@ -178,8 +192,11 @@ struct netdev_dpdk {
>      struct netdev up;
>      int port_id;
>      int max_packet_len;
> +    int type; /*DPDK or VHOST*/
> +    char *name;
>  
> -    struct dpdk_tx_queue tx_q[NR_QUEUE];
> +    struct dpdk_queue tx_q[NR_QUEUE];
> +    struct dpdk_queue rx_q[NR_QUEUE]; /* used only for vhost device*/
>  
>      struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex);
>  
> @@ -196,6 +213,9 @@ struct netdev_dpdk {
>      struct rte_eth_link link;
>      int link_reset_cnt;
>  
> +    /*virtio-net structure for vhost device*/
> +    struct virtio_net *virtio_dev;
> +
>      /* In dpdk_list. */
>      struct list list_node OVS_GUARDED_BY(dpdk_mutex);
>  };
> @@ -469,10 +489,12 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int 
> port_no) OVS_REQUIRES(dpdk
>      ovs_mutex_lock(&netdev->mutex);
>  
>      for (i = 0; i < NR_QUEUE; i++) {
> -        rte_spinlock_init(&netdev->tx_q[i].tx_lock);
> +        rte_spinlock_init(&netdev->tx_q[i].queue_lock);
> +        rte_spinlock_init(&netdev->rx_q[i].queue_lock);
>      }
>  
>      netdev->port_id = port_no;
> +    netdev->type = DPDK;
>  
>      netdev->flags = 0;
>      netdev->mtu = ETHER_MTU;
> @@ -516,6 +538,76 @@ dpdk_dev_parse_name(const char dev_name[], const char 
> prefix[],
>  }
>  
>  static int
> +netdev_dpdk_vhost_construct(struct netdev *netdev_)
> +{
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    struct netdev_dpdk *tmp_netdev;
> +    unsigned int port_no = 0;
> +    int err = 0;
> +    int i;
> +    struct rte_pktmbuf_pool_private *mbp_priv;
> +
> +    if (rte_eal_init_ret) {
> +        return rte_eal_init_ret;
> +    }
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_init(&netdev->mutex);
> +    ovs_mutex_lock(&netdev->mutex);
> +
> +    for (i = 0; i < NR_QUEUE; i++) {
> +        rte_spinlock_init(&netdev->tx_q[i].queue_lock);
> +        rte_spinlock_init(&netdev->rx_q[i].queue_lock);
> +    }
> +
> +    if (!list_is_empty(&dpdk_list)) {
> +        LIST_FOR_EACH (tmp_netdev, list_node, &dpdk_list) {
> +            if (tmp_netdev->type== VHOST) {
> +                port_no++;
> +            }
> +        }
> +    }
> +
> +    netdev->port_id = port_no;
> +    netdev->type = VHOST;
> +
> +    netdev->flags = 0;
> +    netdev->mtu = ETHER_MTU;
> +    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
> +
> +    /* TODO: need to discover device node at run time. */
> +    netdev->socket_id = SOCKET0;
> +
> +    netdev->rx_q->count = 0;
> +    netdev->tx_q->count = 0;
> +
> +    netdev->virtio_dev = NULL;
> +
> +    netdev->name = netdev_->name;
> +
> +    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
> +    if (!netdev->dpdk_mp) {
> +        err = ENOMEM;
> +        goto unlock_dev;
> +    }
> +    netdev_->n_rxq = NR_QUEUE;
> +
> +    mbp_priv = rte_mempool_get_priv(netdev->dpdk_mp->mp);
> +    netdev->buf_size =
> +            mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
> +
> +    VLOG_INFO("%s  is associated with VHOST port #%d\n", netdev->name,
> +            netdev->port_id);
> +
> +    list_push_back(&dpdk_list, &netdev->list_node);
> +
> +unlock_dev:
> +    ovs_mutex_unlock(&netdev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +
> +    return err;
> +}
> +
> +static int
>  netdev_dpdk_construct(struct netdev *netdev)
>  {
>      unsigned int port_no;
> @@ -542,9 +634,15 @@ netdev_dpdk_destruct(struct netdev *netdev_)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev_);
>  
> -    ovs_mutex_lock(&dev->mutex);
> -    rte_eth_dev_stop(dev->port_id);
> -    ovs_mutex_unlock(&dev->mutex);
> +    /* Can't remove a port while a guest is attached to it. */
> +    if (dev->type == VHOST && dev->virtio_dev != NULL) {
> +        VLOG_ERR("Can not remove port, vhost device still attached\n");
> +                return;
> +    } else {
> +        ovs_mutex_lock(&dev->mutex);
> +        rte_eth_dev_stop(dev->port_id);
> +        ovs_mutex_unlock(&dev->mutex);
> +    }
>  
>      ovs_mutex_lock(&dpdk_mutex);
>      list_remove(&dev->list_node);
> @@ -617,10 +715,294 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq_)
>      rte_free(rx);
>  }
>  
> -static inline void
> +/*
> + * Function to convert guest physical addresses to vhost virtual addresses.
> + *  This is used to convert virtio buffer addresses.
> + */
> +inline static uint64_t __attribute__((always_inline))
> +gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
> +{
> +    struct virtio_memory_regions *region;
> +    uint32_t regionidx;
> +    uint64_t vhost_va = 0;
> +
> +    for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> +        region = &dev->mem->regions[regionidx];
> +        if ((guest_pa >= region->guest_phys_address) &&
> +        (guest_pa <= region->guest_phys_address_end)) {
> +            vhost_va = region->address_offset + guest_pa;
> +            break;
> +        }
> +    }
> +
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") GPA %p| VVA %p\n",
> +        dev->device_fh, (void*)(uintptr_t)guest_pa, 
> (void*)(uintptr_t)vhost_va);
> +
> +    return vhost_va;
> +}
> +
> +/*
> + * This function adds buffers to the virtio device's RX virtqueue. Buffers 
> are
> + * received from the vhost port send function.
> + */
> +inline static void __attribute__((always_inline))
> +virtio_dev_rx(struct netdev_dpdk *vhost_dev, struct rte_mbuf **pkts,
> +              uint32_t count)
> +{
> +    struct virtio_net *dev;
> +    struct vhost_virtqueue *vq;
> +    struct vring_desc *desc;
> +    struct rte_mbuf *buff;
> +    /* The virtio_hdr is initialised to 0. */
> +    struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0,0,0,0,0,0},0};
> +    uint64_t buff_addr = 0, buff_hdr_addr = 0;
> +    uint64_t bytes = 0;
> +    uint32_t head[MAX_PKT_BURST], packet_len = 0;
> +    uint32_t head_idx, packet_success = 0;
> +    uint16_t avail_idx, res_cur_idx, free_entries;
> +    uint16_t res_base_idx, res_end_idx;
> +    uint8_t success = 0;
> +
> +    dev = vhost_dev->virtio_dev;
> +    vq = dev->virtqueue[VIRTIO_RXQ];
> +    count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
> +    /* Reserve available buffers. */
> +    do {
> +            res_base_idx = vq->last_used_idx_res;
> +            avail_idx = *((volatile uint16_t *)&vq->avail->idx);
> +
> +            free_entries = (avail_idx - res_base_idx);
> +            /* Check that we have enough buffers. */
> +            if (unlikely(count > free_entries)) {
> +                count = free_entries;
> +            }
> +            if (count == 0) {
> +                return;
> +            }
> +            res_end_idx = res_base_idx + count;
> +            /* vq->last_used_idx_res is atomically updated. */
> +            success = rte_atomic16_cmpset(&vq->last_used_idx_res, 
> res_base_idx,
> +                res_end_idx);
> +    } while (unlikely(success == 0));
> +
> +    res_cur_idx = res_base_idx;
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") Current Index %d| End Index %d\n",
> +            dev->device_fh, res_cur_idx, res_end_idx);
> +
> +    /* Prefetch available ring to retrieve indexes. */
> +    rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]);
> +
> +    /* Retrieve all of the head indexes first to avoid caching issues. */
> +    for (head_idx = 0; head_idx < count; head_idx++) {
> +        head[head_idx] =
> +            vq->avail->ring[(res_cur_idx + head_idx) & (vq->size - 1)];
> +    }
> +
> +    /*Prefetch descriptor index. */
> +    rte_prefetch0(&vq->desc[head[packet_success]]);
> +
> +    while (res_cur_idx != res_end_idx) {
> +        /* Get descriptor from available ring */
> +        desc = &vq->desc[head[packet_success]];
> +        buff = pkts[packet_success];
> +        /* Convert from gpa to vva (guest physical addr -> vhost virtual 
> addr)*/
> +        buff_addr = gpa_to_vva(dev, desc->addr);
> +        /* Prefetch buffer address. */
> +        rte_prefetch0((void*)(uintptr_t)buff_addr);
> +
> +        /* Copy virtio_hdr to packet and increment buffer address */
> +        buff_hdr_addr = buff_addr;
> +        packet_len = rte_pktmbuf_data_len(buff) + vq->vhost_hlen;
> +
> +        /*
> +         * If the descriptors are chained the header and data are placed in
> +         * separate buffers.
> +         */
> +        if (desc->flags & VRING_DESC_F_NEXT) {
> +            desc->len = vq->vhost_hlen;
> +            desc = &vq->desc[desc->next];
> +            /* Buffer address translation. */
> +            buff_addr = gpa_to_vva(dev, desc->addr);
> +            desc->len = rte_pktmbuf_data_len(buff);
> +        } else {
> +            buff_addr += vq->vhost_hlen;
> +            desc->len = packet_len;
> +        }
> +
> +        /* Update used ring with desc information */
> +        vq->used->ring[res_cur_idx & (vq->size - 1)].id = 
> head[packet_success];
> +        vq->used->ring[res_cur_idx & (vq->size - 1)].len = packet_len;
> +
> +        /* Copy mbuf data to buffer */
> +        rte_memcpy((void *)(uintptr_t)buff_addr, (const void*)buff->pkt.data,
> +                rte_pktmbuf_data_len(buff));
> +
> +        res_cur_idx++;
> +        packet_success++;
> +        bytes += rte_pktmbuf_data_len(buff);
> +
> +        rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const 
> void*)&virtio_hdr,
> +            vq->vhost_hlen);
> +
> +        if (res_cur_idx < res_end_idx) {
> +            /* Prefetch descriptor index. */
> +            rte_prefetch0(&vq->desc[head[packet_success]]);
> +        }
> +    }
> +    rte_compiler_barrier();
> +
> +    /* Wait until it's our turn to add our buffer to the used ring. */
> +    while (unlikely(vq->last_used_idx != res_base_idx)) {
> +        rte_pause();
> +    }
> +
> +    *(volatile uint16_t *)&vq->used->idx += count;
> +    vq->last_used_idx = res_end_idx;
> +
> +    /* Kick the guest if necessary. */
> +    if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> +        eventfd_write((int)vq->kickfd, 1);
> +    }
> +
> +    /* Update the TX stats. */
> +    ovs_mutex_lock(&vhost_dev->mutex);
> +    vhost_dev->stats.tx_packets += packet_success;
> +    vhost_dev->stats.tx_bytes += bytes;
> +    ovs_mutex_unlock(&vhost_dev->mutex);
> +    return;
> +}
> +
> +/*
> + * This function pulls buffers from the virtio device's TX virtqueue.
> + */
> +inline static uint16_t __attribute__((always_inline))
> +virtio_dev_tx(struct netdev_dpdk *vhost_dev, struct rte_mbuf ** bufs)
> +{
> +    struct virtio_net *dev = vhost_dev->virtio_dev;
> +    struct rte_mbuf *mbuf;
> +    struct vhost_virtqueue *vq = NULL;
> +    struct vring_desc *desc = NULL;
> +    uint64_t buff_addr = 0, bytes = 0;
> +    uint32_t head[MAX_PKT_BURST];
> +    uint32_t used_idx = 0, i;
> +    uint16_t free_entries = 0, packet_success = 0, avail_idx = 0;
> +    unsigned buf_len = 0;
> +
> +    vq = dev->virtqueue[VIRTIO_TXQ];
> +    avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
> +
> +    /* If there are no available buffers then return. */
> +    if (vq->last_used_idx == avail_idx) {
> +        return 0;
> +    }
> +
> +    VLOG_DBG_RL(&rl,"(%"PRIu64") virtio_dev_tx()\n", dev->device_fh);
> +
> +    /* Prefetch available ring to retrieve head indexes. */
> +    rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]);
> +
> +    /* Get the number of free entries in the ring. */
> +    free_entries = (avail_idx - vq->last_used_idx);
> +
> +    /* Limit to MAX_PKT_BURST. */
> +    if (free_entries > MAX_PKT_BURST) {
> +        free_entries = MAX_PKT_BURST;
> +    }
> +
> +    VLOG_DBG_RL(&rl,"(%"PRIu64") Buffers available %d\n", dev->device_fh,
> +        free_entries);
> +    /* Retrieve all of the head indexes first to avoid caching issues. */
> +    for (i = 0; i < free_entries; i++) {
> +        head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)];
> +    }
> +
> +    /* Prefetch descriptor index. */
> +    rte_prefetch0(&vq->desc[head[packet_success]]);
> +    rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
> +
> +    /* If we are on a non pmd thread we have to use the mempool mutex, 
> because
> +     * every non pmd thread shares the same mempool cache */
> +    if (!thread_is_pmd()) {
> +        ovs_mutex_lock(&nonpmd_mempool_mutex);
> +    }
> +
> +    while (packet_success < free_entries) {
> +        desc = &vq->desc[head[packet_success]];
> +
> +        /* Discard first buffer as it is the virtio header. */
> +        desc = &vq->desc[desc->next];
> +
> +        /* Buffer address translation. */
> +        buff_addr = gpa_to_vva(dev, desc->addr);
> +        /* Prefetch buffer address. */
> +        rte_prefetch0((void*)(uintptr_t)buff_addr);
> +
> +        used_idx = vq->last_used_idx & (vq->size - 1);
> +
> +        if (packet_success < (free_entries - 1)) {
> +            /* Prefetch descriptor index. */
> +            rte_prefetch0(&vq->desc[head[packet_success+1]]);
> +            rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]);
> +        }
> +
> +        /* Update used index buffer information. */
> +        vq->used->ring[used_idx].id = head[packet_success];
> +        vq->used->ring[used_idx].len = 0;
> +
> +        /* Allocate an mbuf and populate the structure. */
> +        mbuf = rte_pktmbuf_alloc(vhost_dev->dpdk_mp->mp);
> +        if (!mbuf) {
> +            ovs_mutex_lock(&vhost_dev->mutex);
> +            vhost_dev->stats.rx_dropped++;
> +            ovs_mutex_unlock(&vhost_dev->mutex);
> +            VLOG_ERR("Failed to allocate memory for mbuf.\n");
> +            if (!thread_is_pmd()) {
> +                ovs_mutex_unlock(&nonpmd_mempool_mutex);
> +            }
> +            goto out;
> +        }
> +
> +        mbuf->pkt.data_len = desc->len;
> +        mbuf->pkt.pkt_len = mbuf->pkt.data_len;
> +
> +        /* Copy the packet contents to the mbuf. */
> +        rte_memcpy((void*)mbuf->pkt.data,  (const void 
> *)(uintptr_t)buff_addr,
> +           mbuf->pkt.data_len);
> +
> +        /* Add the buffer to the rx_q. */
> +        bufs[buf_len++] = mbuf;
> +        vq->last_used_idx++;
> +        packet_success++;
> +        bytes += mbuf->pkt.data_len;
> +    }
> +
> +    rte_compiler_barrier();
> +
> +    vq->used->idx += packet_success;
> +
> +    /* If we are on a non pmd thread we have to use the mempool mutex, 
> because
> +     * every non pmd thread shares the same mempool cache */
> +    if (!thread_is_pmd()) {
> +        ovs_mutex_unlock(&nonpmd_mempool_mutex);
> +    }
> +
> +    /* Kick guest if required. */
> +    if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
> +        eventfd_write((int)vq->kickfd, 1);
> +
> +out:
> +    /* Update the TX stats. */
> +    ovs_mutex_lock(&vhost_dev->mutex);
> +    vhost_dev->stats.rx_packets += packet_success;
> +    vhost_dev->stats.rx_bytes += bytes;
> +    ovs_mutex_unlock(&vhost_dev->mutex);
> +    return packet_success;
> +}
> +
> +inline static void
>  dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
>  {
> -    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
> +    struct dpdk_queue *txq = &dev->tx_q[qid];
>      uint32_t nb_tx;
>  
>      nb_tx = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts, txq->count);
> @@ -640,16 +1022,48 @@ dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
>  static inline void
>  dpdk_queue_flush(struct netdev_dpdk *dev, int qid)
>  {
> -    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
> +    struct dpdk_queue *txq = &dev->tx_q[qid];
>  
>      if (txq->count == 0) {
>          return;
>      }
> -    rte_spinlock_lock(&txq->tx_lock);
> +    rte_spinlock_lock(&txq->queue_lock);
>      dpdk_queue_flush__(dev, qid);
> -    rte_spinlock_unlock(&txq->tx_lock);
> +    rte_spinlock_unlock(&txq->queue_lock);
>  }
>  
> +/*receive: i.e TX out from Guest*/
> +static int
> +netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_,
> +                           struct dpif_packet **packets, int *c)
> +{
> +    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq_);
> +    struct netdev *netdev = rx->up.netdev;
> +    struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
> +    struct virtio_net *virtio_dev = vhost_dev->virtio_dev;
> +    int nb_rx = 0;
> +
> +    rte_spinlock_lock(&vhost_dev->rx_q->queue_lock);
> +    vhost_dev->rx_q->count = 0;
> +    rte_spinlock_unlock(&vhost_dev->rx_q->queue_lock);
> +
> +    if (virtio_dev != NULL && virtio_dev->flags == VIRTIO_DEV_RUNNING &&
> +            !(virtio_dev->remove)) {
> +        nb_rx = virtio_dev_tx(vhost_dev, (struct rte_mbuf **)packets);
> +        if (!nb_rx) {
> +            return EAGAIN;
> +        }
> +
> +        rte_spinlock_lock(&vhost_dev->rx_q->queue_lock);
> +        vhost_dev->rx_q->count = nb_rx;
> +        rte_spinlock_unlock(&vhost_dev->rx_q->queue_lock);
> +    }
> +
> +    *c = (int) nb_rx;
> +
> +    return 0;
> + }
> +
>  static int
>  netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dpif_packet **packets,
>                       int *c)
> @@ -674,16 +1088,57 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct 
> dpif_packet **packets,
>      return 0;
>  }
>  
> +/*Send: i.e. RX into Guest*/
> +static int
> +netdev_dpdk_vhost_send(struct netdev *netdev, struct dpif_packet **pkts,
> +                       int cnt, bool may_steal)
> +{
> +    int i;
> +    struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
> +    struct dpdk_queue *txq = &vhost_dev->tx_q[NON_PMD_THREAD_TX_QUEUE];
> +
> +    if (vhost_dev->virtio_dev == NULL ||
> +        vhost_dev->virtio_dev->ready != DEVICE_RX ||
> +        vhost_dev->virtio_dev->remove ||
> +        vhost_dev->virtio_dev->flags != VIRTIO_DEV_RUNNING) {
> +        VLOG_WARN_RL(&rl, "virtio_dev not added yet");
> +
> +        ovs_mutex_lock(&vhost_dev->mutex);
> +        vhost_dev->stats.tx_dropped+= cnt;
> +        ovs_mutex_unlock(&vhost_dev->mutex);
> +
> +        if (may_steal) {
> +            for (i = 0; i < cnt; i++) {
> +               dpif_packet_delete(pkts[i]);
> +            }
> +        }
> +        return 0;
> +    }
> +
> +    rte_spinlock_lock(&txq->queue_lock);
> +    txq->count = cnt;
> +    virtio_dev_rx(vhost_dev, (struct rte_mbuf **)pkts, cnt);
> +    if (may_steal) {
> +        for (i = 0; i < cnt; i++) {
> +            dpif_packet_delete(pkts[i]);
> +        }
> +    }
> +    txq->count = 0;
> +    rte_spinlock_unlock(&txq->queue_lock);
> +
> +    return 0;
> +}
> +
>  inline static void
>  dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
>                 struct rte_mbuf **pkts, int cnt)
>  {
> -    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
> +    struct dpdk_queue *txq = &dev->tx_q[qid];
>      uint64_t diff_tsc;
>  
>      int i = 0;
>  
> -    rte_spinlock_lock(&txq->tx_lock);
> +    rte_spinlock_lock(&txq->queue_lock);
>      while (i < cnt) {
>          int freeslots = MAX_TX_QUEUE_LEN - txq->count;
>          int tocopy = MIN(freeslots, cnt-i);
> @@ -702,7 +1157,7 @@ dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
>              dpdk_queue_flush__(dev, qid);
>          }
>      }
> -    rte_spinlock_unlock(&txq->tx_lock);
> +    rte_spinlock_unlock(&txq->queue_lock);
>  }
>  
>  /* Tx function. Transmit packets indefinitely */
> @@ -870,42 +1325,42 @@ netdev_dpdk_set_mtu(const struct netdev *netdev, int 
> mtu)
>      struct dpdk_mp *old_mp;
>      struct dpdk_mp *mp;
>  
> -    ovs_mutex_lock(&dpdk_mutex);
> -    ovs_mutex_lock(&dev->mutex);
> -    if (dev->mtu == mtu) {
> -        err = 0;
> -        goto out;
> -    }
> +    if (dev->type == DPDK && dev->mtu != mtu) {
> +        ovs_mutex_lock(&dpdk_mutex);
> +        ovs_mutex_lock(&dev->mutex);
> +        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> +        if (!mp) {
> +            err = ENOMEM;
> +            goto out;
> +         }
>  
> -    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> -    if (!mp) {
> -        err = ENOMEM;
> -        goto out;
> -    }
> +        rte_eth_dev_stop(dev->port_id);
>  
> -    rte_eth_dev_stop(dev->port_id);
> +        old_mtu = dev->mtu;
> +        old_mp = dev->dpdk_mp;
> +        dev->dpdk_mp = mp;
> +        dev->mtu = mtu;
> +        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>  
> -    old_mtu = dev->mtu;
> -    old_mp = dev->dpdk_mp;
> -    dev->dpdk_mp = mp;
> -    dev->mtu = mtu;
> -    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> +        err = dpdk_eth_dev_init(dev);
> +        if (err) {
> +            dpdk_mp_put(mp);
> +            dev->mtu = old_mtu;
> +            dev->dpdk_mp = old_mp;
> +            dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> +            dpdk_eth_dev_init(dev);
> +            goto out;
> +        }
>  
> -    err = dpdk_eth_dev_init(dev);
> -    if (err) {
> -        dpdk_mp_put(mp);
> -        dev->mtu = old_mtu;
> -        dev->dpdk_mp = old_mp;
> -        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> -        dpdk_eth_dev_init(dev);
> -        goto out;
> +        dpdk_mp_put(old_mp);
> +        netdev_change_seq_changed(netdev);
> +        ovs_mutex_unlock(&dev->mutex);
> +        ovs_mutex_unlock(&dpdk_mutex);
> +    } else {
> +        err = 0;
>      }
>  
> -    dpdk_mp_put(old_mp);
> -    netdev_change_seq_changed(netdev);
>  out:
> -    ovs_mutex_unlock(&dev->mutex);
> -    ovs_mutex_unlock(&dpdk_mutex);
>      return err;
>  }
>  
> @@ -913,6 +1368,43 @@ static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>  
>  static int
> +netdev_dpdk_vhost_get_stats(const struct netdev *netdev,
> +                            struct netdev_stats *stats)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    *stats = dev->stats_offset;
> +    /*Unsupported Stats*/
> +    stats->rx_errors = UINT64_MAX;
> +    stats->tx_errors = UINT64_MAX;
> +    stats->multicast = UINT64_MAX;
> +    stats->collisions = UINT64_MAX;
> +    stats->rx_crc_errors = UINT64_MAX;
> +    stats->rx_fifo_errors = UINT64_MAX;
> +    stats->rx_frame_errors = UINT64_MAX;
> +    stats->rx_length_errors = UINT64_MAX;
> +    stats->rx_missed_errors= UINT64_MAX;
> +    stats->rx_over_errors= UINT64_MAX;
> +    stats->tx_aborted_errors= UINT64_MAX;
> +    stats->tx_carrier_errors= UINT64_MAX;
> +    stats->tx_errors= UINT64_MAX;
> +    stats->tx_fifo_errors= UINT64_MAX;
> +    stats->tx_heartbeat_errors= UINT64_MAX;
> +    stats->tx_window_errors= UINT64_MAX;
> +    /*Supported Stats*/
> +    stats->rx_packets += dev->stats.rx_packets;
> +    stats->tx_packets += dev->stats.tx_packets;
> +    stats->rx_bytes +=dev->stats.rx_bytes;
> +    stats->tx_bytes +=dev->stats.tx_bytes;
> +    stats->tx_dropped += dev->stats.tx_dropped;
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return 0;
> +}
> +
> +
> +static int
>  netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats 
> *stats)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> @@ -1016,8 +1508,17 @@ netdev_dpdk_get_carrier(const struct netdev *netdev_, 
> bool *carrier)
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev_);
>  
>      ovs_mutex_lock(&dev->mutex);
> -    check_link_status(dev);
> -    *carrier = dev->link.link_status;
> +    if (dev->type == VHOST) {
> +        struct virtio_net *virtio_dev = dev->virtio_dev;
> +        if (virtio_dev != NULL && virtio_dev->flags == VIRTIO_DEV_RUNNING) {
> +                *carrier =1;
> +        }else{
> +            *carrier = 0 ;
> +        }
> +    } else {
> +        check_link_status(dev);
> +        *carrier = dev->link.link_status;
> +    }
>      ovs_mutex_unlock(&dev->mutex);
>  
>      return 0;
> @@ -1062,18 +1563,20 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
>          return 0;
>      }
>  
> -    if (dev->flags & NETDEV_UP) {
> -        err = rte_eth_dev_start(dev->port_id);
> -        if (err)
> -            return -err;
> -    }
> +    if (dev->type == DPDK) {
> +        if (dev->flags & NETDEV_UP) {
> +            err = rte_eth_dev_start(dev->port_id);
> +            if (err)
> +                return -err;
> +        }
>  
> -    if (dev->flags & NETDEV_PROMISC) {
> -        rte_eth_promiscuous_enable(dev->port_id);
> -    }
> +        if (dev->flags & NETDEV_PROMISC) {
> +            rte_eth_promiscuous_enable(dev->port_id);
> +        }
>  
> -    if (!(dev->flags & NETDEV_UP)) {
> -        rte_eth_dev_stop(dev->port_id);
> +        if (!(dev->flags & NETDEV_UP)) {
> +            rte_eth_dev_stop(dev->port_id);
> +        }
>      }
>  
>      return 0;
> @@ -1184,6 +1687,150 @@ netdev_dpdk_set_admin_state(struct unixctl_conn 
> *conn, int argc,
>      unixctl_command_reply(conn, "OK");
>  }
>  
> +/*
> + * Set virtqueue flags so that we do not receive interrupts.
> + */
> +static void
> +set_irq_status (struct virtio_net *dev)
> +{
> +    dev->virtqueue[VIRTIO_RXQ]->used->flags = VRING_USED_F_NO_NOTIFY;
> +    dev->virtqueue[VIRTIO_TXQ]->used->flags = VRING_USED_F_NO_NOTIFY;
> +}
> +
> +/*
> + * A new virtio-net device is added to a vhost port.  The device is added to
> + * the first available port.
> + */
> +static int
> +new_device (struct virtio_net *dev)
> +{
> +    struct netdev_dpdk *netdev;
> +    bool count = 0;
> +
> +    /* Reset ready flag. */
> +    dev->ready = DEVICE_RX;
> +    dev->remove = 0;
> +
> +    /* Disable notifications. */
> +    set_irq_status(dev);
> +    dev->flags |= VIRTIO_DEV_RUNNING;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    /* Add device to first available vhost port. */
> +    LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
> +        if ( netdev->type == VHOST && netdev->virtio_dev == NULL) {
> +            ovs_mutex_lock(&netdev->mutex);
> +            netdev->virtio_dev = dev;
> +            ovs_mutex_unlock(&netdev->mutex);
> +            count = 1;
> +            break;
> +        }
> +    }
> +
> +    ovs_mutex_unlock(&dpdk_mutex);
> +
> +    if (!count) {
> +        VLOG_ERR("(%ld)  VHOST Device can't be added to first available port 
> \n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    VLOG_INFO("(%ld)  VHOST Device has been added to vhost port %s \n",
> +        dev->device_fh, netdev->name);
> +
> +    return 0;
> +}
> +
> +/*
> + * Remove a virtio-net device from the specific vhost port.  Use dev->remove
> + * flag to stop any more packets from being sent or received to/from a VM and
> + * ensure all currently queued packets have been sent/received before 
> removing
> + *  the device.
> + */
> +static void
> +destroy_device (volatile struct virtio_net *dev)
> +{
> +    struct netdev_dpdk *vhost_dev;
> +    int tx_count, rx_count;
> +    dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +    /* Set the remove flag to stop any more incoming or outgoing packets. */
> +    dev->remove = 1;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    LIST_FOR_EACH (vhost_dev, list_node, &dpdk_list) {
> +        if (vhost_dev->virtio_dev == dev) {
> +            do {
> +                /*
> +                 * Wait until there are no outgoing or incoming packets to
> +                 * remove the device.
> +                 */
> +                ovs_mutex_lock(&vhost_dev->mutex);
> +                rte_spinlock_lock(&vhost_dev->rx_q->queue_lock);
> +                rx_count = vhost_dev->rx_q->count;
> +                rte_spinlock_unlock(&vhost_dev->rx_q->queue_lock);
> +
> +                rte_spinlock_lock(&vhost_dev->tx_q->queue_lock);
> +                tx_count = vhost_dev->tx_q->count;
> +                rte_spinlock_unlock(&vhost_dev->tx_q->queue_lock);
> +                rte_pause();
> +            }while (tx_count != 0 && rx_count != 0) ;
> +
> +            vhost_dev->virtio_dev = NULL;
> +            ovs_mutex_unlock(&vhost_dev->mutex);
> +        }
> +    }
> +
> +    ovs_mutex_unlock(&dpdk_mutex);
> +
> +    VLOG_INFO("%ld Vhost Device has been removed\n", dev->device_fh);
> +}
> +/*
> + * These callbacks allow virtio-net devices to be added to vhost ports when
> + * configuration has been fully complete.
> + */
> +const struct virtio_net_device_ops virtio_net_device_ops =
> +{
> +    .new_device =  new_device,
> +    .destroy_device = destroy_device,
> +};
> +
> +static int
> +dpdk_vhost_class_init(void)
> +{
> +    int ret = -1, dev_index = 0;
> +    /*
> +     * Character device basename.  Can be set to something else for
> +     * co-existance with Linux vhost-net.
> +     */
> +    char dev_basename[MAX_BASENAME_SZ] = "vhost-net";
> +    /*
> +     * Note to use a different character device try "usvhost" for the
> +     * dev_basename and 1 for the index.
> +     */
> +
> +    if (list_is_empty(&dpdk_list)) {
> +        list_init(&dpdk_list);
> +    }
> +    if (list_is_empty(&dpdk_mp_list)) {
> +        list_init(&dpdk_mp_list);
> +    }
> +
> +    /* Register CUSE device to handle IOCTLs. */
> +    ret = register_cuse_device((char*)&dev_basename, dev_index,
> +                                get_virtio_net_callbacks());
> +    if (ret != 0) {
> +        VLOG_ERR("CUSE device setup failure.\n");
> +        return -1;
> +    }
> +
> +    init_virtio_net(&virtio_net_device_ops);
> +
> +    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
> +
> +    return 0;
> +}
> +
>  static void
>  dpdk_common_init(void)
>  {
> @@ -1213,6 +1860,12 @@ dpdk_class_init(void)
>  
>      VLOG_INFO("Ethernet Device Count: %d", (int)rte_eth_dev_count());
>  
> +    if (list_is_empty(&dpdk_list)) {
> +        list_init(&dpdk_list);
> +    }
> +    if (list_is_empty(&dpdk_mp_list)) {
> +        list_init(&dpdk_mp_list);
> +    }
>      return 0;
>  }
>  
> @@ -1316,7 +1969,8 @@ unlock_dpdk:
>      return err;
>  }
>  
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT)              \
> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, SEND,        \
> +    GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)            \
>  {                                                             \
>      NAME,                                                     \
>      INIT,                       /* init */                    \
> @@ -1331,7 +1985,7 @@ unlock_dpdk:
>      NULL,                       /* netdev_dpdk_set_config */  \
>      NULL,                       /* get_tunnel_config */       \
>                                                                \
> -    netdev_dpdk_send,           /* send */                    \
> +    SEND,                       /* send */                    \
>      NULL,                       /* send_wait */               \
>                                                                \
>      netdev_dpdk_set_etheraddr,                                \
> @@ -1342,9 +1996,9 @@ unlock_dpdk:
>      netdev_dpdk_get_carrier,                                  \
>      netdev_dpdk_get_carrier_resets,                           \
>      netdev_dpdk_set_miimon,                                   \
> -    netdev_dpdk_get_stats,                                    \
> +    GET_STATS,                                                \
>      netdev_dpdk_set_stats,                                    \
> -    netdev_dpdk_get_features,                                 \
> +    GET_FEATURES,                                             \
>      NULL,                       /* set_advertisements */      \
>                                                                \
>      NULL,                       /* set_policing */            \
> @@ -1366,7 +2020,7 @@ unlock_dpdk:
>      NULL,                       /* get_in6 */                 \
>      NULL,                       /* add_router */              \
>      NULL,                       /* get_next_hop */            \
> -    netdev_dpdk_get_status,                                   \
> +    GET_STATUS,                                               \
>      NULL,                       /* arp_lookup */              \
>                                                                \
>      netdev_dpdk_update_flags,                                 \
> @@ -1375,7 +2029,7 @@ unlock_dpdk:
>      netdev_dpdk_rxq_construct,                                \
>      netdev_dpdk_rxq_destruct,                                 \
>      netdev_dpdk_rxq_dealloc,                                  \
> -    netdev_dpdk_rxq_recv,                                     \
> +    RXQ_RECV,                                                 \
>      NULL,                       /* rx_wait */                 \
>      NULL,                       /* rxq_drain */               \
>  }
> @@ -1417,13 +2071,34 @@ const struct netdev_class dpdk_class =
>      NETDEV_DPDK_CLASS(
>          "dpdk",
>          dpdk_class_init,
> -        netdev_dpdk_construct);
> +        netdev_dpdk_construct,
> +        netdev_dpdk_send,
> +        netdev_dpdk_get_stats,
> +        netdev_dpdk_get_features,
> +        netdev_dpdk_get_status,
> +        netdev_dpdk_rxq_recv);
>  
>  const struct netdev_class dpdk_ring_class =
>      NETDEV_DPDK_CLASS(
>          "dpdkr",
>          NULL,
> -        netdev_dpdk_ring_construct);
> +        netdev_dpdk_ring_construct,
> +        netdev_dpdk_send,
> +        netdev_dpdk_get_stats,
> +        netdev_dpdk_get_features,
> +        netdev_dpdk_get_status,
> +        netdev_dpdk_rxq_recv);
> +
> +const struct netdev_class dpdk_vhost_class =
> +    NETDEV_DPDK_CLASS(
> +        "dpdkvhost",
> +        dpdk_vhost_class_init,
> +        netdev_dpdk_vhost_construct,
> +        netdev_dpdk_vhost_send,
> +        netdev_dpdk_vhost_get_stats,
> +        NULL,
> +        NULL,
> +        netdev_dpdk_vhost_rxq_recv);
>  
>  void
>  netdev_dpdk_register(void)
> @@ -1438,6 +2113,7 @@ netdev_dpdk_register(void)
>          dpdk_common_init();
>          netdev_register_provider(&dpdk_class);
>          netdev_register_provider(&dpdk_ring_class);
> +        netdev_register_provider(&dpdk_vhost_class);
>          ovsthread_once_done(&once);
>      }
>  }
> diff --git a/lib/netdev.c b/lib/netdev.c
> index ea16ccb..1e49a32 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -99,7 +99,8 @@ bool
>  netdev_is_pmd(const struct netdev *netdev)
>  {
>      return (!strcmp(netdev->netdev_class->type, "dpdk") ||
> -            !strcmp(netdev->netdev_class->type, "dpdkr"));
> +            !strcmp(netdev->netdev_class->type, "dpdkr") ||
> +            !strcmp(netdev->netdev_class->type, "dpdkvhost"));
>  }
>  
>  static void
> diff --git a/lib/vhost-net-cdev.c b/lib/vhost-net-cdev.c
> new file mode 100644
> index 0000000..2413655
> --- /dev/null
> +++ b/lib/vhost-net-cdev.c
> @@ -0,0 +1,387 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <config.h>
> +#include <errno.h>
> +#include <fuse/cuse_lowlevel.h>
> +#include <linux/limits.h>
> +#include <linux/vhost.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#include <rte_config.h>
> +#include <rte_eal.h>
> +#include <rte_ethdev.h>
> +#include <rte_string_fns.h>
> +#include "vhost-net-cdev.h"
> +#include "vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpdk_vhost_net_cdev);
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +
> +#define FUSE_OPT_DUMMY "\0\0"
> +#define FUSE_OPT_FORE "-f\0\0"
> +#define FUSE_OPT_NOMULTI "-s\0\0"
> +
> +const uint32_t default_major = 231;
> +const uint32_t default_minor = 1;
> +const char cuse_device_name[]    = "/dev/cuse";
> +const char default_cdev[] = "vhost-net";
> +
> +static struct fuse_session *session;
> +static struct vhost_net_device_ops const *ops;
> +
> +/*
> + * Returns vhost_device_ctx from given fuse_req_t. The index is populated 
> later
> + * when the device is added to the device linked list.
> + */
> +static struct vhost_device_ctx
> +fuse_req_to_vhost_ctx(fuse_req_t req, struct fuse_file_info *fi)
> +{
> +    struct vhost_device_ctx ctx;
> +    struct fuse_ctx const *const req_ctx = fuse_req_ctx(req);
> +
> +    ctx.pid = req_ctx->pid;
> +    ctx.fh = fi->fh;
> +
> +    return ctx;
> +}
> +
> +/*
> + * When the device is created in QEMU it gets initialised here and added to 
> the
> + * device linked list.
> + */
> +static void
> +vhost_net_open(fuse_req_t req, struct fuse_file_info *fi)
> +{
> +    struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
> +    int err = 0;
> +
> +    err = ops->new_device(ctx);
> +    if (err == -1) {
> +        fuse_reply_err(req, EPERM);
> +        return;
> +    }
> +
> +    fi->fh = err;
> +
> +    VLOG_INFO( "(%"PRIu64") Device configuration started\n", fi->fh);
> +    fuse_reply_open(req, fi);
> +}
> +
> +/*
> + * When QEMU is shutdown or killed the device gets released.
> + */
> +static void
> +vhost_net_release(fuse_req_t req, struct fuse_file_info *fi)
> +{
> +    int err = 0;
> +    struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
> +
> +    ops->destroy_device(ctx);
> +    VLOG_INFO( "(%"PRIu64") Device released\n", ctx.fh);
> +    fuse_reply_err(req, err);
> +}
> +
> +/*
> + * Boilerplate code for CUSE IOCTL
> + * Implicit arguments: ctx, req, result.
> + */
> +#define VHOST_IOCTL(func) do {             \
> +    result = (func)(ctx);                  \
> +    fuse_reply_ioctl(req, result, NULL, 0);\
> +} while(0)
> +
> +/*
> + * Boilerplate IOCTL RETRY
> + * Implicit arguments: req.
> + */
> +#define VHOST_IOCTL_RETRY(size_r, size_w) do {                              \
> +    struct iovec iov_r = { arg, (size_r) };                                 \
> +    struct iovec iov_w = { arg, (size_w) };                                 \
> +    fuse_reply_ioctl_retry(req, &iov_r, (size_r)?1:0, &iov_w, (size_w)?1:0);\
> +} while(0)
> +
> +/*
> + * Boilerplate code for CUSE Read IOCTL
> + * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
> + */
> +#define VHOST_IOCTL_R(type, var, func) do {    \
> +    if (!in_bufsz) {                           \
> +        VHOST_IOCTL_RETRY(sizeof(type), 0);    \
> +    } else {                                   \
> +        (var) = *(const type * ) in_buf;       \
> +        result = func(ctx, &(var));            \
> +        fuse_reply_ioctl(req, result, NULL, 0);\
> +    }                                          \
> +} while(0)
> +
> +/*
> + *    Boilerplate code for CUSE Write IOCTL
> + * Implicit arguments: ctx, req, result, out_bufsz.
> + */
> +#define    VHOST_IOCTL_W(type, var, func) do {              \
> +    if (!out_bufsz) {                                       \
> +        VHOST_IOCTL_RETRY(0, sizeof(type));                 \
> +    } else {                                                \
> +        result = (func)(ctx, &(var));                       \
> +        fuse_reply_ioctl(req, result, &(var), sizeof(type));\
> +    }                                                       \
> +} while(0)
> +
> +/*
> + * Boilerplate code for CUSE Read/Write IOCTL
> + * Implicit arguments: ctx, req, result, in_bufsz, in_buf.
> + */
> +#define VHOST_IOCTL_RW(type1, var1, type2, var2, func) do {   \
> +    if (!in_bufsz) {                                          \
> +        VHOST_IOCTL_RETRY(sizeof(type1), sizeof(type2));      \
> +    } else {                                                  \
> +        (var1) = *(const type1* ) (in_buf);                   \
> +        result = (func)(ctx, (var1), &(var2));                \
> +        fuse_reply_ioctl(req, result, &(var2), sizeof(type2));\
> +    }                                                         \
> +} while(0)
> +
> +/*
> + * The IOCTLs are handled using CUSE/FUSE in userspace.  Depending on
> + * the type of IOCTL a buffer is requested to read or to write.  This
> + * request is handled by FUSE and the buffer is then given to CUSE.
> + */
> +static void
> +vhost_net_ioctl(fuse_req_t req, int cmd, void *arg,
> +    struct fuse_file_info *fi, __rte_unused unsigned flags,
> +    const void *in_buf, size_t in_bufsz, size_t out_bufsz)
> +{
> +    struct vhost_device_ctx ctx = fuse_req_to_vhost_ctx(req, fi);
> +    struct vhost_vring_file file;
> +    struct vhost_vring_state state;
> +    struct vhost_vring_addr addr;
> +    static struct vhost_memory mem_temp;
> +    uint64_t features;
> +    uint32_t index;
> +    int result = 0;
> +
> +    switch(cmd) {
> +    case VHOST_NET_SET_BACKEND:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_NET_SET_BACKEND\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_backend);
> +            break;
> +
> +    case VHOST_GET_FEATURES:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_GET_FEATURES\n",
> +                ctx.fh);
> +            VHOST_IOCTL_W(uint64_t, features, ops->get_features);
> +            break;
> +
> +    case VHOST_SET_FEATURES:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_FEATURES\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(uint64_t, features, ops->set_features);
> +            break;
> +
> +    case VHOST_RESET_OWNER:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_RESET_OWNER\n", 
> ctx.fh);
> +            VHOST_IOCTL(ops->reset_owner);
> +            break;
> +
> +    case VHOST_SET_OWNER:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_OWNER\n", ctx.fh);
> +            VHOST_IOCTL(ops->set_owner);
> +            break;
> +
> +    case VHOST_SET_MEM_TABLE:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_MEM_TABLE\n",
> +                ctx.fh);
> +            switch (in_bufsz) {
> +            case 0:
> +                    VHOST_IOCTL_RETRY(sizeof(struct vhost_memory), 0);
> +                    break;
> +
> +            case sizeof(struct vhost_memory):
> +                    mem_temp = *(const struct vhost_memory *) in_buf;
> +                    if (mem_temp.nregions > 0) {
> +                        VHOST_IOCTL_RETRY(sizeof(struct vhost_memory) +
> +                            (sizeof(struct vhost_memory_region) *
> +                             mem_temp.nregions), 0);
> +                    } else {
> +                        result = -1;
> +                        fuse_reply_ioctl(req, result, NULL, 0);
> +                    }
> +                    break;
> +
> +            default:
> +                    result = ops->set_mem_table(ctx, in_buf, 
> mem_temp.nregions);
> +                    if (result) {
> +                        fuse_reply_err(req, EINVAL);
> +                    } else {
> +                        fuse_reply_ioctl(req, result, NULL, 0);
> +                    }
> +            }
> +            break;
> +
> +    case VHOST_SET_VRING_NUM:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_VRING_NUM\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_state, state, 
> ops->set_vring_num);
> +            break;
> +
> +    case VHOST_SET_VRING_BASE:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_VRING_BASE\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_state, state, 
> ops->set_vring_base);
> +            break;
> +
> +    case VHOST_GET_VRING_BASE:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_GET_VRING_BASE\n",
> +                ctx.fh);
> +            VHOST_IOCTL_RW(uint32_t, index, struct vhost_vring_state, state,
> +                ops->get_vring_base);
> +            break;
> +
> +    case VHOST_SET_VRING_ADDR:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_VRING_ADDR\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_addr, addr, 
> ops->set_vring_addr);
> +            break;
> +
> +    case VHOST_SET_VRING_KICK:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_VRING_KICK\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_file, file, 
> ops->set_vring_kick);
> +            break;
> +
> +    case VHOST_SET_VRING_CALL:
> +            VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: VHOST_SET_VRING_CALL\n",
> +                ctx.fh);
> +            VHOST_IOCTL_R(struct vhost_vring_file, file, 
> ops->set_vring_call);
> +            break;
> +
> +    default:
> +            VLOG_ERR("(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh);
> +            result = -1;
> +            fuse_reply_ioctl(req, result, NULL, 0);
> +    }
> +
> +    if (result < 0) {
> +        VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: FAIL\n", ctx.fh);
> +    } else {
> +        VLOG_DBG_RL(&rl, "(%"PRIu64") IOCTL: SUCCESS\n", ctx.fh);
> +    }
> +}
> +
> +/*
> + * Structure handling open, release and ioctl function pointers is populated.
> + */
> +static const struct cuse_lowlevel_ops vhost_net_ops = {
> +    .open        = vhost_net_open,
> +    .release    = vhost_net_release,
> +    .ioctl        = vhost_net_ioctl,
> +};
> +
> +/*
> + * cuse_info is populated and used to register the cuse device.
> + * vhost_net_device_ops are also passed when the device is registered in
> + * netdev-dpdk-vhost.c.
> + */
> +int
> +register_cuse_device(const char *base_name, int index,
> +    struct vhost_net_device_ops const * const pops)
> +{
> +    struct cuse_info cuse_info;
> +    char device_name[PATH_MAX] = "";
> +    char char_device_name[PATH_MAX] = "";
> +    const char *device_argv[] = { device_name };
> +
> +    char fuse_opt_dummy[] = FUSE_OPT_DUMMY;
> +    char fuse_opt_fore[] = FUSE_OPT_FORE;
> +    char fuse_opt_nomulti[] = FUSE_OPT_NOMULTI;
> +    char *fuse_argv[] = {fuse_opt_dummy, fuse_opt_fore, fuse_opt_nomulti};
> +
> +    if (access(cuse_device_name, R_OK | W_OK) < 0) {
> +        VLOG_ERR("Character device %s can't be accessed, maybe not exist\n",
> +            cuse_device_name);
> +        return -1;
> +    }
> +
> +    /*
> +     * The device name is created.  This is passed to QEMU so that it can
> +     * register the device with our application.  The index allows us to have
> +     * multiple instances of userspace vhost which we can then add devices to
> +     * separately.
> +     */
> +    if (strncmp(base_name, default_cdev, PATH_MAX)!=0) {
> +        rte_snprintf(device_name, PATH_MAX, "DEVNAME=%s-%d", base_name, 
> index);
> +        rte_snprintf(char_device_name, PATH_MAX, "/dev/%s-%d", base_name,
> +            index);
> +    } else {
> +        rte_snprintf(device_name, PATH_MAX, "DEVNAME=%s", base_name);
> +        rte_snprintf(char_device_name, PATH_MAX, "/dev/%s", base_name);
> +    }
> +
> +    /* Check if device already exists. */
> +    if (access(char_device_name, F_OK) != -1) {
> +        VLOG_ERR("Character device %s already exists\n", char_device_name);
> +        return -1;
> +    }
> +
> +    memset(&cuse_info, 0, sizeof(cuse_info));
> +    cuse_info.dev_major = default_major;
> +    cuse_info.dev_minor = default_minor + index;
> +    cuse_info.dev_info_argc = 1;
> +    cuse_info.dev_info_argv = device_argv;
> +    cuse_info.flags = CUSE_UNRESTRICTED_IOCTL;
> +
> +    ops = pops;
> +
> +    session =
> +        cuse_lowlevel_setup(3, fuse_argv, &cuse_info, &vhost_net_ops, 0, 
> NULL);
> +    if (session == NULL) {
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +/*
> + * The CUSE session is launched allowing the application to receive open,
> + * release and ioctl calls.
> + */
> +void *
> +start_cuse_session_loop(void *dummy OVS_UNUSED)
> +{
> +    pthread_detach(pthread_self());
> +    fuse_session_loop(session);
> +
> +    return NULL;
> +}
> diff --git a/lib/vhost-net-cdev.h b/lib/vhost-net-cdev.h
> new file mode 100644
> index 0000000..398f314
> --- /dev/null
> +++ b/lib/vhost-net-cdev.h
> @@ -0,0 +1,81 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _VHOST_NET_CDEV_H_
> +#define _VHOST_NET_CDEV_H_
> +
> +#include <linux/vhost.h>
> +
> +struct vhost_memory;
> +struct vhost_vring_state;
> +struct vhost_vring_addr;
> +struct vhost_vring_file;
> +
> +/* Structure used to identify device context. */
> +struct vhost_device_ctx {
> +    pid_t    pid; /* PID of process calling the IOCTL. */
> +    uint64_t fh; /* Populated with fi->fh to track the device index. */
> +};
> +
> +/*
> + * Structure contains function pointers to be defined in virtio-net.c.  These
> + * functions are called in CUSE context and are used to configure devices.
> + */
> +struct vhost_net_device_ops {
> +    int (* new_device) (struct vhost_device_ctx);
> +    void (* destroy_device) (struct vhost_device_ctx);
> +
> +    int (* get_features) (struct vhost_device_ctx, uint64_t *);
> +    int (* set_features) (struct vhost_device_ctx, uint64_t *);
> +
> +    int (* set_mem_table) (struct vhost_device_ctx, const void *, uint32_t);
> +
> +    int (* set_vring_num) (struct vhost_device_ctx, struct vhost_vring_state 
> *);
> +    int (* set_vring_addr) (struct vhost_device_ctx, struct vhost_vring_addr 
> *);
> +    int (* set_vring_base) (struct vhost_device_ctx, struct 
> vhost_vring_state *);
> +    int (* get_vring_base) (struct vhost_device_ctx, uint32_t, struct 
> vhost_vring_state *);
> +
> +    int (* set_vring_kick) (struct vhost_device_ctx, struct vhost_vring_file 
> *);
> +    int (* set_vring_call) (struct vhost_device_ctx, struct vhost_vring_file 
> *);
> +
> +    int (* set_backend) (struct vhost_device_ctx, struct vhost_vring_file *);
> +
> +    int (* set_owner) (struct vhost_device_ctx);
> +    int (* reset_owner) (struct vhost_device_ctx);
> +};
> +
> +int register_cuse_device(const char *base_name, int index,
> +                         struct vhost_net_device_ops const * const);
> +void *start_cuse_session_loop(void *dummy);
> +
> +#endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/virtio-net.c b/lib/virtio-net.c
> new file mode 100644
> index 0000000..eebdcf6
> --- /dev/null
> +++ b/lib/virtio-net.c
> @@ -0,0 +1,1093 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <config.h>
> +#include <dirent.h>
> +#include <fuse/cuse_lowlevel.h>
> +#include <linux/vhost.h>
> +#include <linux/virtio_net.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdlib.h>
> +
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +#include <rte_config.h>
> +#include <rte_ethdev.h>
> +#include <rte_string_fns.h>
> +#include "virtio-net.h"
> +#include "vhost-net-cdev.h"
> +#include "utilities/eventfd_link/eventfd_link.h"
> +#include "vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpdk_vhost_virtio_net);
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +const char eventfd_cdev[] = "/dev/eventfd-link";
> +
> +/* Device ops to add/remove device to data core. */
> +static struct virtio_net_device_ops const * notify_ops;
> +/* Root address of the linked list in the configuration core. */
> +static struct virtio_net_config_ll *ll_root = NULL;
> +
> +/*
> + * Features supported by this application.
> + * RX merge buffers are disabled by default.
> + */
> +uint64_t VHOST_FEATURES = (0ULL << VIRTIO_NET_F_MRG_RXBUF);
> +
> +/* Line size for reading maps file. */
> +const uint32_t BUFSIZE = PATH_MAX;
> +
> +/* Size of prot char array in procmap. */
> +#define PROT_SZ 5
> +
> +/* Number of elements in procmap struct. */
> +#define PROCMAP_SZ 8
> +
> +/* Structure containing information gathered from maps file. */
> +struct procmap {
> +    uint64_t    va_start; /* Start virtual address in file. */
> +    uint64_t    len; /* Size of file. */
> +    uint64_t    pgoff; /* Not used. */
> +    uint32_t    maj; /* Not used. */
> +    uint32_t    min; /* Not used. */
> +    uint32_t    ino; /* Not used. */
> +    char        prot[PROT_SZ]; /* Not used. */
> +    char        fname[PATH_MAX]; /* File name. */
> +};
> +
> +/*
> + * Converts QEMU virtual address to Vhost virtual address.  This function is
> + * used to convert the ring addresses to our address space.
> + */
> +static uint64_t
> +qva_to_vva(struct virtio_net *dev, uint64_t qemu_va)
> +{
> +    struct virtio_memory_regions *region;
> +    uint64_t vhost_va = 0;
> +    uint32_t regionidx = 0;
> +
> +    /* Find the region where the address lives. */
> +    for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> +        region = &dev->mem->regions[regionidx];
> +        if ((qemu_va >= region->userspace_address) &&
> +        (qemu_va <= region->userspace_address + region->memory_size)) {
> +            vhost_va = dev->mem->mapped_address + qemu_va -
> +                       dev->mem->base_address;
> +            break;
> +        }
> +    }
> +    return vhost_va;
> +}
> +
> +/*
> + * Locate the file containing QEMU's memory space and map it to our address
> + * space.
> + */
> +static int
> +host_memory_map (struct virtio_net *dev, struct virtio_memory *mem, pid_t 
> pid,
> +                 uint64_t addr)
> +{
> +    struct dirent *dptr = NULL;
> +    struct procmap procmap;
> +    DIR *dp = NULL;
> +    int fd;
> +    int i;
> +    char memfile[PATH_MAX];
> +    char mapfile[PATH_MAX];
> +    char procdir[PATH_MAX];
> +    char resolved_path[PATH_MAX];
> +    FILE *fmap;
> +    void *map;
> +    uint8_t found = 0;
> +    char line[BUFSIZE];
> +    char dlm[] = "-   :   ";
> +    char *str, *sp, *in[PROCMAP_SZ];
> +    char *end = NULL;
> +
> +    /* Path where mem files are located. */
> +    rte_snprintf (procdir, PATH_MAX, "/proc/%u/fd/", pid);
> +    /* Maps file used to locate mem file. */
> +    rte_snprintf (mapfile, PATH_MAX, "/proc/%u/maps", pid);
> +
> +    fmap = fopen(mapfile, "r");
> +    if (fmap == NULL) {
> +        VLOG_ERR("(%"PRIu64") Failed to open maps file for pid %d\n",
> +            dev->device_fh, pid);
> +        return -1;
> +    }
> +
> +    /* Read through maps file until we find out base_address. */
> +    while (fgets(line, BUFSIZE, fmap) != 0) {
> +        str = line;
> +        errno = 0;
> +        /* Split line in to fields. */
> +        for (i = 0; i < PROCMAP_SZ; i++) {
> +            if (((in[i] = strtok_r(str, &dlm[i], &sp)) == NULL) ||
> +            (errno != 0)) {
> +                fclose(fmap);
> +                return -1;
> +            }
> +            str = NULL;
> +        }
> +
> +        /* Convert/Copy each field as needed. */
> +        procmap.va_start = strtoull(in[0], &end, 16);
> +        if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        procmap.len = strtoull(in[1], &end, 16);
> +        if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        procmap.pgoff = strtoull(in[3], &end, 16);
> +        if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        procmap.maj = strtoul(in[4], &end, 16);
> +        if ((in[4] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        procmap.min = strtoul(in[5], &end, 16);
> +        if ((in[5] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        procmap.ino = strtoul(in[6], &end, 16);
> +        if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
> +        (errno != 0)) {
> +            fclose(fmap);
> +            return -1;
> +        }
> +
> +        memcpy(&procmap.prot, in[2], PROT_SZ);
> +        memcpy(&procmap.fname, in[7], PATH_MAX);
> +
> +        if (procmap.va_start == addr) {
> +            procmap.len = procmap.len - procmap.va_start;
> +            found = 1;
> +            break;
> +        }
> +    }
> +    fclose(fmap);
> +
> +    if (!found) {
> +        VLOG_ERR("(%"PRIu64") Failed to find memory file in pid %d maps 
> file\n",
> +            dev->device_fh, pid);
> +        return -1;
> +    }
> +
> +    /* Find the guest memory file among the process fds. */
> +    dp = opendir(procdir);
> +    if (dp == NULL) {
> +        VLOG_ERR("(%"PRIu64") Cannot open pid %d process directory \n",
> +            dev->device_fh, pid);
> +        return -1;
> +    }
> +
> +    found = 0;
> +
> +    /* Read the fd directory contents. */
> +    while (NULL != (dptr = readdir(dp))) {
> +        rte_snprintf (memfile, PATH_MAX, "/proc/%u/fd/%s", pid, 
> dptr->d_name);
> +        realpath(memfile, resolved_path);
> +        if (resolved_path == NULL) {
> +            VLOG_ERR("(%"PRIu64") Failed to resolve fd directory\n",
> +                dev->device_fh);
> +            closedir(dp);
> +            return -1;
> +        }
> +
> +        if ((strncmp(resolved_path, procmap.fname,
> +                ((strlen(procmap.fname) < PATH_MAX) ?
> +                strlen (procmap.fname) : PATH_MAX))) == 0) {
> +             found = 1;
> +            break;
> +        }
> +    }
> +
> +    closedir(dp);
> +
> +    if (found == 0) {
> +        VLOG_ERR("(%"PRIu64") Failed to find memory file for pid %d\n",
> +            dev->device_fh, pid);
> +        return -1;
> +    }
> +    /* Open the shared memory file and map the memory into this process. */
> +    fd = open(memfile, O_RDWR);
> +
> +    if (fd == -1) {
> +        VLOG_ERR("(%"PRIu64") Failed to open %s for pid %d\n", 
> dev->device_fh,
> +            memfile, pid);
> +        return -1;
> +    }
> +
> +    map = mmap(0, (size_t)procmap.len, PROT_READ|PROT_WRITE ,
> +            MAP_POPULATE|MAP_SHARED, fd, 0);
> +    close (fd);
> +
> +    if (map == MAP_FAILED) {
> +        VLOG_ERR("(%"PRIu64") Error mapping the file %s for pid %d\n",
> +            dev->device_fh, memfile, pid);
> +        return -1;
> +    }
> +
> +    /* Store the memory address and size in the device data structure */
> +    mem->mapped_address = (uint64_t)(uintptr_t)map;
> +    mem->mapped_size = procmap.len;
> +
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") Mem File: %s->%s - Size: %llu - VA: %p\n",
> +                dev->device_fh, memfile, resolved_path,
> +                (long long unsigned)mem->mapped_size, map);
> +    return 0;
> +}
> +
> +/*
> + * Retrieves an entry from the devices configuration linked list.
> + */
> +static struct virtio_net_config_ll *
> +get_config_ll_entry(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net_config_ll *ll_dev = ll_root;
> +
> +    /* Loop through linked list until the device_fh is found. */
> +    while (ll_dev != NULL) {
> +        if ((ll_dev->dev.device_fh == ctx.fh)) {
> +            return ll_dev;
> +        }
> +        ll_dev = ll_dev->next;
> +    }
> +
> +    return NULL;
> +}
> +
> +/*
> + * Searches the configuration core linked list and retrieves the device if it
> + * exists.
> + */
> +static struct virtio_net *
> +get_device(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net_config_ll *ll_dev;
> +
> +    ll_dev = get_config_ll_entry(ctx);
> +
> +    /*
> +     * If a matching entry is found in the linked list, return the device in
> +     * that entry.
> +     */
> +    if (ll_dev) {
> +        return &ll_dev->dev;
> +    }
> +
> +    VLOG_ERR("(%"PRIu64") Device not found in linked list.\n", ctx.fh);
> +    return NULL;
> +}
> +
> +/* Add entry containing a device to the device configuration linked list. */
> +static void
> +add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
> +{
> +    struct virtio_net_config_ll *ll_dev = ll_root;
> +
> +    /* If ll_dev == NULL then this is the first device so go to else */
> +    if (ll_dev) {
> +        /* If the 1st device_fh != 0 then we insert our device here. */
> +        if (ll_dev->dev.device_fh != 0) {
> +            new_ll_dev->dev.device_fh = 0;
> +            new_ll_dev->next = ll_dev;
> +            ll_root = new_ll_dev;
> +        } else {
> +            /*
> +             * Increment through the ll until we find un unused device_fh.
> +             * Insert the device at that entry.
> +             */
> +            while ((ll_dev->next != NULL) && (ll_dev->dev.device_fh ==
> +                (ll_dev->next->dev.device_fh - 1))) {
> +                ll_dev = ll_dev->next;
> +            }
> +
> +            new_ll_dev->dev.device_fh = ll_dev->dev.device_fh + 1;
> +            new_ll_dev->next = ll_dev->next;
> +            ll_dev->next = new_ll_dev;
> +        }
> +    } else {
> +        ll_root = new_ll_dev;
> +        ll_root->dev.device_fh = 0;
> +    }
> +}
> +
> +/*
> + * Unmap any memory, close any file descriptors and free any memory owned by 
> a
> + * device.
> + */
> +static void
> +cleanup_device(struct virtio_net *dev)
> +{
> +    /* Unmap QEMU memory file if mapped. */
> +    if (dev->mem) {
> +        munmap((void*)(uintptr_t)dev->mem->mapped_address,
> +                (size_t)dev->mem->mapped_size);
> +        free(dev->mem);
> +    }
> +
> +    /* Close any event notifiers opened by device. */
> +    if (dev->virtqueue[VIRTIO_RXQ]->callfd) {
> +        close((int)dev->virtqueue[VIRTIO_RXQ]->callfd);
> +    }
> +    if (dev->virtqueue[VIRTIO_RXQ]->kickfd) {
> +        close((int)dev->virtqueue[VIRTIO_RXQ]->kickfd);
> +    }
> +    if (dev->virtqueue[VIRTIO_TXQ]->callfd) {
> +        close((int)dev->virtqueue[VIRTIO_TXQ]->callfd);
> +    }
> +    if (dev->virtqueue[VIRTIO_TXQ]->kickfd) {
> +        close((int)dev->virtqueue[VIRTIO_TXQ]->kickfd);
> +    }
> +}
> +
> +/* Release virtqueues and device memory. */
> +static void
> +free_device(struct virtio_net_config_ll *ll_dev)
> +{
> +    /* Free any malloc'd memory. */
> +    free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
> +    free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
> +    free(ll_dev);
> +}
> +
> +/* Remove an entry from the device configuration linked list. */
> +static struct virtio_net_config_ll *
> +rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
> +                   struct virtio_net_config_ll *ll_dev_last)
> +{
> +    /* First remove the device and then clean it up. */
> +    if (ll_dev == ll_root) {
> +        ll_root = ll_dev->next;
> +        cleanup_device(&ll_dev->dev);
> +        free_device(ll_dev);
> +        return ll_root;
> +    } else {
> +        if (likely(ll_dev_last != NULL)) {
> +            ll_dev_last->next = ll_dev->next;
> +            cleanup_device(&ll_dev->dev);
> +            free_device(ll_dev);
> +            return ll_dev_last->next;
> +        } else {
> +            cleanup_device(&ll_dev->dev);
> +            free_device(ll_dev);
> +            VLOG_ERR("Remove entry from config_ll failed\n");
> +            return NULL;
> +        }
> +    }
> +}
> +
> +/* Initialise all variables in device structure. */
> +static void
> +init_device(struct virtio_net *dev)
> +{
> +    uint64_t vq_offset;
> +
> +    /*
> +     * Virtqueues have already been malloc'ed so we don't want to set them to
> +     * NULL.
> +     */
> +    vq_offset = offsetof(struct virtio_net, mem);
> +
> +    /* Set everything to 0. */
> +    memset((void*)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
> +        (sizeof(struct virtio_net) - (size_t)vq_offset));
> +    memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
> +    memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
> +
> +    /* Backends are set to -1 indicating an inactive device. */
> +    dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
> +    dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
> +}
> +
> +/*
> + * Function is called from the CUSE open function.  The device structure is
> + * initialised and a new entry is added to the device configuration linked
> + * list.
> + */
> +static int
> +new_device(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net_config_ll *new_ll_dev;
> +    struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
> +
> +    /* Setup device and virtqueues. */
> +    new_ll_dev = malloc(sizeof(struct virtio_net_config_ll));
> +    if (new_ll_dev == NULL) {
> +        VLOG_ERR("(%"PRIu64") Failed to allocate memory for dev.\n", ctx.fh);
> +        return -1;
> +    }
> +
> +    virtqueue_rx = malloc(sizeof(struct vhost_virtqueue));
> +    if (virtqueue_rx == NULL) {
> +        free(new_ll_dev);
> +        VLOG_ERR("(%"PRIu64") Failed to allocate memory for virtqueue_rx.\n",
> +            ctx.fh);
> +        return -1;
> +    }
> +
> +    virtqueue_tx = malloc(sizeof(struct vhost_virtqueue));
> +    if (virtqueue_tx == NULL) {
> +        free(virtqueue_rx);
> +        free(new_ll_dev);
> +        VLOG_ERR("(%"PRIu64") Failed to allocate memory for virtqueue_tx.\n",
> +            ctx.fh);
> +        return -1;
> +    }
> +
> +    new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
> +    new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
> +
> +    /* Initialise device and virtqueues. */
> +    init_device(&new_ll_dev->dev);
> +
> +    new_ll_dev->next = NULL;
> +
> +    /* Add entry to device configuration linked list. */
> +    add_config_ll_entry(new_ll_dev);
> +
> +    return new_ll_dev->dev.device_fh;
> +}
> +
> +/*
> + * Function is called from the CUSE release function.  This function will
> + * cleanup the device and remove it from device configuration linked list.
> + */
> +static void
> +destroy_device(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net_config_ll *ll_dev_cur_ctx, *ll_dev_last = NULL;
> +    struct virtio_net_config_ll *ll_dev_cur = ll_root;
> +
> +    /* Find the linked list entry for the device to be removed. */
> +    ll_dev_cur_ctx = get_config_ll_entry(ctx);
> +    while (ll_dev_cur != NULL) {
> +        /*
> +         * If the device is found or a device that doesn't exist is found 
> then
> +         * it is removed.
> +         */
> +        if (ll_dev_cur == ll_dev_cur_ctx) {
> +            /*
> +             * If the device is running on a data core then call the function
> +             * to remove it from the data core.
> +             */
> +            if ((ll_dev_cur->dev.flags & VIRTIO_DEV_RUNNING)) {
> +                notify_ops->destroy_device(&(ll_dev_cur->dev));
> +            }
> +            ll_dev_cur = rm_config_ll_entry(ll_dev_cur, ll_dev_last);
> +        } else {
> +            ll_dev_last = ll_dev_cur;
> +            ll_dev_cur = ll_dev_cur->next;
> +        }
> +    }
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_OWNER
> + * This function just returns success at the moment unless the device hasn't
> + * been initialised.
> + */
> +static int
> +set_owner(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_RESET_OWNER
> + */
> +static int
> +reset_owner(struct vhost_device_ctx ctx)
> +{
> +    struct virtio_net_config_ll *ll_dev;
> +
> +    ll_dev = get_config_ll_entry(ctx);
> +
> +    cleanup_device(&ll_dev->dev);
> +    init_device(&ll_dev->dev);
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_GET_FEATURES
> + * The features that we support are requested.
> + */
> +static int
> +get_features(struct vhost_device_ctx ctx, uint64_t *pu)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /* Send our supported features. */
> +    *pu = VHOST_FEATURES;
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_FEATURES
> + * We receive the negotiated set of features supported by us and the virtio
> + * device.
> + */
> +static int
> +set_features(struct vhost_device_ctx ctx, uint64_t *pu)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL || (*pu & ~VHOST_FEATURES)) {
> +        return -1;
> +    }
> +
> +    /* Store the negotiated feature list for the device. */
> +    dev->features = *pu;
> +
> +    /* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
> +    if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
> +        VLOG_DBG_RL(&rl, "(%"PRIu64") Mergeable RX buffers enabled\n",
> +                    dev->device_fh);
> +        dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
> +                        sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +        dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
> +                        sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +    } else {
> +        VLOG_DBG_RL(&rl, "(%"PRIu64") Mergeable RX buffers disabled\n",
> +                    dev->device_fh);
> +        dev->virtqueue[VIRTIO_RXQ]->vhost_hlen = sizeof(struct 
> virtio_net_hdr);
> +        dev->virtqueue[VIRTIO_TXQ]->vhost_hlen = sizeof(struct 
> virtio_net_hdr);
> +    }
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_MEM_TABLE
> + * This function creates and populates the memory structure for the device.
> + * This includes storing offsets used to translate buffer addresses.
> + */
> +static int
> +set_mem_table(struct vhost_device_ctx ctx, const void *mem_regions_addr,
> +              uint32_t nregions)
> +{
> +    struct virtio_net *dev;
> +    struct vhost_memory_region *mem_regions;
> +    struct virtio_memory *mem;
> +    uint64_t size = offsetof(struct vhost_memory, regions);
> +    uint32_t regionidx, valid_regions;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    if (dev->mem) {
> +        munmap((void*)(uintptr_t)dev->mem->mapped_address,
> +                (size_t)dev->mem->mapped_size);
> +        free(dev->mem);
> +    }
> +
> +    /* Malloc the memory structure depending on the number of regions. */
> +    mem = calloc(1, sizeof(struct virtio_memory) +
> +            (sizeof(struct virtio_memory_regions) * nregions));
> +    if (mem == NULL) {
> +        VLOG_ERR("(%"PRIu64") Failed to allocate memory for dev->mem.\n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    mem->nregions = nregions;
> +
> +    mem_regions =
> +    (void*)(uintptr_t)((uint64_t)(uintptr_t)mem_regions_addr + size);
> +
> +    for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
> +        /* Populate the region structure for each region. */
> +        mem->regions[regionidx].guest_phys_address =
> +                        mem_regions[regionidx].guest_phys_addr;
> +        mem->regions[regionidx].guest_phys_address_end =
> +                        mem->regions[regionidx].guest_phys_address +
> +                        mem_regions[regionidx].memory_size;
> +        mem->regions[regionidx].memory_size =
> +                        mem_regions[regionidx].memory_size;
> +        mem->regions[regionidx].userspace_address =
> +                        mem_regions[regionidx].userspace_addr;
> +        VLOG_DBG_RL(&rl,
> +                "(%"PRIu64") REGION: %u - GPA: %p - QEMU VA: %p - SIZE"
> +                " (%"PRIu64")\n", dev->device_fh, regionidx,
> +                (void*)(uintptr_t)mem->regions[regionidx].guest_phys_address,
> +                (void*)(uintptr_t)mem->regions[regionidx].userspace_address,
> +                mem->regions[regionidx].memory_size);
> +
> +        /* Set the base address mapping. */
> +        if (mem->regions[regionidx].guest_phys_address == 0x0) {
> +            mem->base_address = mem->regions[regionidx].userspace_address;
> +            /* Map VM memory file */
> +            if (host_memory_map(dev, mem, ctx.pid, mem->base_address) != 0) {
> +                free(mem);
> +                return -1;
> +            }
> +        }
> +    }
> +
> +    /* Check that we have a valid base address. */
> +    if (mem->base_address == 0) {
> +        VLOG_ERR("(%"PRIu64") Failed to find base address of qemu memory"
> +            " file.\n", dev->device_fh);
> +        free(mem);
> +        return -1;
> +    }
> +
> +    /*
> +     * Check if all of our regions have valid mappings.  Usually one does not
> +     * exist in the QEMU memory file.
> +     */
> +    valid_regions = mem->nregions;
> +    for (regionidx = 0; regionidx < mem->nregions; regionidx++) {
> +        if ((mem->regions[regionidx].userspace_address < mem->base_address) 
> ||
> +            (mem->regions[regionidx].userspace_address > (mem->base_address +
> +             mem->mapped_size))) {
> +                valid_regions--;
> +        }
> +    }
> +
> +    /*
> +     * If a region does not have a valid mapping we rebuild our memory struct
> +     * to contain only valid entries.
> +     */
> +    if (valid_regions != mem->nregions) {
> +        VLOG_DBG_RL(&rl,"(%"PRIu64") Not all memory regions exist in the 
> QEMU"
> +                    " mem file. Re-populating mem structure\n",
> +            dev->device_fh);
> +
> +        /*
> +         * Re-populate the memory structure with only valid regions. Invalid
> +         * regions are over-written with memmove.
> +         */
> +        valid_regions = 0;
> +
> +        for (regionidx = mem->nregions; 0 != regionidx--;) {
> +            if ((mem->regions[regionidx].userspace_address < 
> mem->base_address)
> +                || (mem->regions[regionidx].userspace_address >
> +                (mem->base_address + mem->mapped_size))) {
> +                memmove(&mem->regions[regionidx], &mem->regions[regionidx + 
> 1],
> +                    sizeof(struct virtio_memory_regions) * valid_regions);
> +            } else {
> +                valid_regions++;
> +            }
> +        }
> +    }
> +    mem->nregions = valid_regions;
> +    dev->mem = mem;
> +
> +    /*
> +     * Calculate the address offset for each region. This offset is used to
> +     * identify the vhost virtual address corresponding to a QEMU guest
> +     * physical address.
> +     */
> +    for (regionidx = 0; regionidx < dev->mem->nregions; regionidx++) {
> +        dev->mem->regions[regionidx].address_offset =
> +                dev->mem->regions[regionidx].userspace_address -
> +                dev->mem->base_address + dev->mem->mapped_address -
> +                dev->mem->regions[regionidx].guest_phys_address;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_VRING_NUM
> + * The virtio device sends us the size of the descriptor ring.
> + */
> +static int
> +set_vring_num(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * State->index refers to the queue index. The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    dev->virtqueue[state->index]->size = state->num;
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_VRING_ADDR
> + * The virtio device sends us the desc, used and avail ring addresses. This
> + * function then converts these to our address space.
> + */
> +static int
> +set_vring_addr(struct vhost_device_ctx ctx, struct vhost_vring_addr *addr)
> +{
> +    struct virtio_net *dev;
> +    struct vhost_virtqueue *vq;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * addr->index refers to the queue index.  The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    vq = dev->virtqueue[addr->index];
> +
> +    /* The addresses are converted from QEMU virtual to Vhost virtual. */
> +    vq->desc =
> +        (struct vring_desc*)(uintptr_t)qva_to_vva(dev, addr->desc_user_addr);
> +    if (vq->desc == 0) {
> +        VLOG_ERR("(%"PRIu64") Failed to find descriptor ring address.\n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    vq->avail =
> +        (struct vring_avail*)(uintptr_t)qva_to_vva(dev, 
> addr->avail_user_addr);
> +    if (vq->avail == 0) {
> +        VLOG_ERR("(%"PRIu64") Failed to find available ring address.\n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    vq->used =
> +        (struct vring_used*)(uintptr_t)qva_to_vva(dev, addr->used_user_addr);
> +    if (vq->used == 0) {
> +        VLOG_ERR("(%"PRIu64") Failed to find used ring address.\n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") mapped address desc: %p\n", dev->device_fh,
> +        vq->desc);
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") mapped address avail: %p\n", 
> dev->device_fh,
> +        vq->avail);
> +    VLOG_DBG_RL(&rl, "(%"PRIu64") mapped address used: %p\n", dev->device_fh,
> +        vq->used);
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_VRING_BASE
> + * The virtio device sends us the available ring last used index.
> + */
> +static int
> +set_vring_base(struct vhost_device_ctx ctx, struct vhost_vring_state *state)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * State->index refers to the queue index. The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    dev->virtqueue[state->index]->last_used_idx = state->num;
> +    dev->virtqueue[state->index]->last_used_idx_res = state->num;
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_GET_VRING_BASE
> + * We send the virtio device our available ring last used index.
> + */
> +static int
> +get_vring_base(struct vhost_device_ctx ctx, uint32_t index,
> +    struct vhost_vring_state *state)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    state->index = index;
> +    /*
> +     * State->index refers to the queue index. The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    state->num = dev->virtqueue[state->index]->last_used_idx;
> +
> +    return 0;
> +}
> +
> +/*
> + * This function uses the eventfd_link kernel module to copy an eventfd file
> + * descriptor provided by QEMU in to our process space.
> + */
> +static int
> +eventfd_copy(struct virtio_net *dev, struct eventfd_copy *eventfd_copy)
> +{
> +    int eventfd_link, ret;
> +
> +    /* Open the character device to the kernel module. */
> +    eventfd_link = open(eventfd_cdev, O_RDWR);
> +    if (eventfd_link < 0) {
> +        VLOG_ERR("(%"PRIu64") eventfd_link module is not loaded\n",
> +            dev->device_fh);
> +        return -1;
> +    }
> +
> +    /* Call the IOCTL to copy the eventfd. */
> +    ret = ioctl(eventfd_link, EVENTFD_COPY, eventfd_copy);
> +    close(eventfd_link);
> +
> +    if (ret < 0) {
> +        VLOG_ERR("(%"PRIu64") EVENTFD_COPY ioctl failed\n", dev->device_fh);
> +        return -1;
> +    }
> +
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_VRING_CALL
> + * The virtio device sends an eventfd to interrupt the guest.  This fd gets
> + * copied in to our process space.
> + */
> +static int
> +set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> +{
> +    struct virtio_net *dev;
> +    struct eventfd_copy eventfd_kick;
> +    struct vhost_virtqueue *vq;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * file->index refers to the queue index.  The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    vq = dev->virtqueue[file->index];
> +
> +    if (vq->kickfd) {
> +        close((int)vq->kickfd);
> +    }
> +
> +    /* Populate the eventfd_copy structure and call eventfd_copy. */
> +    vq->kickfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +    eventfd_kick.source_fd = vq->kickfd;
> +    eventfd_kick.target_fd = file->fd;
> +    eventfd_kick.target_pid = ctx.pid;
> +
> +    if (eventfd_copy(dev, &eventfd_kick)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_SET_VRING_KICK
> + * The virtio device sends an eventfd that it can use to notify us.  This fd
> + * gets copied in to our process space.
> + */
> +static int
> +set_vring_kick(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> +{
> +    struct virtio_net *dev;
> +    struct eventfd_copy eventfd_call;
> +    struct vhost_virtqueue *vq;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * file->index refers to the queue index.  The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    vq = dev->virtqueue[file->index];
> +
> +    if (vq->callfd) {
> +        close((int)vq->callfd);
> +    }
> +
> +    /* Populate the eventfd_copy structure and call eventfd_copy. */
> +    vq->callfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> +    eventfd_call.source_fd = vq->callfd;
> +    eventfd_call.target_fd = file->fd;
> +    eventfd_call.target_pid = ctx.pid;
> +
> +    if (eventfd_copy(dev, &eventfd_call)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND
> + * To complete device initialisation when the virtio driver is loaded we are
> + * provided with a valid fd for a tap device (not used by us).  If this 
> happens
> + * then we can add the device to a data core. When the virtio driver is 
> removed
> + * we get fd=-1.  At that point we remove the device from the data core.  The
> + * device will still exist in the device configuration linked list.
> + */
> +static int
> +set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
> +{
> +    struct virtio_net *dev;
> +
> +    dev = get_device(ctx);
> +    if (dev == NULL) {
> +        return -1;
> +    }
> +
> +    /*
> +     * file->index refers to the queue index.  The TX queue is 1, RX queue is
> +     * 0.
> +     */
> +    dev->virtqueue[file->index]->backend = file->fd;
> +
> +    /*
> +     * If the device isn't already running and both backend fds are set we 
> add
> +     * the device.
> +     */
> +    if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
> +        if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) 
> &&
> +            ((int)dev->virtqueue[VIRTIO_RXQ]->backend != 
> VIRTIO_DEV_STOPPED)) {
> +            notify_ops->new_device(dev);
> +        }
> +    /* Otherwise we remove it. */
> +    } else {
> +        if (file->fd == VIRTIO_DEV_STOPPED) {
> +            notify_ops->destroy_device(dev);
> +        }
> +    }
> +    return 0;
> +}
> +
> +/*
> + * Function pointers are set for the device operations to allow CUSE to call
> + * functions when an IOCTL, device_add or device_release is received.
> + */
> +static const struct vhost_net_device_ops vhost_device_ops =
> +{
> +    .new_device = new_device,
> +    .destroy_device = destroy_device,
> +
> +    .get_features = get_features,
> +    .set_features = set_features,
> +
> +    .set_mem_table = set_mem_table,
> +
> +    .set_vring_num = set_vring_num,
> +    .set_vring_addr = set_vring_addr,
> +    .set_vring_base = set_vring_base,
> +    .get_vring_base = get_vring_base,
> +
> +    .set_vring_kick = set_vring_kick,
> +    .set_vring_call = set_vring_call,
> +
> +    .set_backend = set_backend,
> +
> +    .set_owner = set_owner,
> +    .reset_owner = reset_owner,
> +};
> +
> +/*
> + * Called by main to setup callbacks when registering CUSE device.
> + */
> +struct vhost_net_device_ops const *
> +get_virtio_net_callbacks(void)
> +{
> +    return &vhost_device_ops;
> +}
> +
> +/*
> + * Register ops so that we can add/remove device to data core.
> + */
> +int
> +init_virtio_net(struct virtio_net_device_ops const * const ops)
> +{
> +    notify_ops = ops;
> +
> +    return 0;
> +}
> +
> +/*
> + * Currently not used as we Ctrl+c to exit application.
> + */
> +int
> +deinit_virtio_net(void)
> +{
> +    return 0;
> +}
> diff --git a/lib/virtio-net.h b/lib/virtio-net.h
> new file mode 100644
> index 0000000..205fe03
> --- /dev/null
> +++ b/lib/virtio-net.h
> @@ -0,0 +1,125 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _VIRTIO_NET_H_
> +#define _VIRTIO_NET_H_
> +
> +#include <sys/eventfd.h>
> +/* Used to indicate that the device is running on a data core */
> +#define VIRTIO_DEV_RUNNING 1
> +
> +/* Backend value set by guest. */
> +#define VIRTIO_DEV_STOPPED -1
> +
> +/* Enum for virtqueue management. */
> +enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
> +
> +/* Structure contains variables relevant to TX/RX virtqueues. */
> +struct vhost_virtqueue {
> +    struct vring_desc  *desc; /* Virtqueue descriptor ring. */
> +    struct vring_avail *avail; /* Virtqueue available ring. */
> +    struct vring_used  *used; /* Virtqueue used ring. */
> +    uint32_t           size; /* Size of descriptor ring. */
> +    uint32_t           backend;
> +    /* Backend value to determine if device should started/stopped. */
> +    uint16_t           vhost_hlen;
> +    /* Vhost header length (varies depending on RX merge buffers. */
> +    volatile uint16_t  last_used_idx;
> +    /* Last index used on the available ring */
> +    volatile uint16_t  last_used_idx_res;
> +    /* Used for multiple devices reserving buffers. */
> +    eventfd_t          callfd;
> +    /* Currently unused as polling mode is enabled. */
> +    eventfd_t          kickfd;
> +    /* Used to notify the guest (trigger interrupt). */
> +} __rte_cache_aligned;
> +
> +/*
> + * Device structure contains all configuration information relating to the
> + * device.
> + */
> +struct virtio_net {
> +    struct vhost_virtqueue *virtqueue[VIRTIO_QNUM];
> +    /* Contains all virtqueue information. */
> +    struct virtio_memory   *mem;
> +    /* QEMU memory and memory region information. */
> +    uint64_t               features; /* Negotiated feature set. */
> +    uint64_t               device_fh; /* device identifier. */
> +    uint32_t               flags;
> +    /* Device flags. Only used to check if device is running. */
> +    volatile uint8_t       ready;
> +    /* A device is set as ready if the MAC address has been set. */
> +    volatile uint8_t       remove;
> +    /* Device is marked for removal from the data core. */
> +} __rte_cache_aligned;
> +
> +/* Device linked list structure for configuration. */
> +struct virtio_net_config_ll {
> +    struct virtio_net           dev;    /* Virtio device. */
> +    struct virtio_net_config_ll *next; /* Next entry on linked list. */
> +};
> +
> +/*
> + * Information relating to memory regions including offsets to addresses in
> + * QEMUs memory file.
> + */
> +struct virtio_memory_regions {
> +    uint64_t    guest_phys_address; /* Base guest physical address of 
> region. */
> +    uint64_t    guest_phys_address_end; /* End guest physical address of 
> region. */
> +    uint64_t    memory_size; /* Size of region. */
> +    uint64_t    userspace_address; /* Base userspace address of region. */
> +    uint64_t    address_offset;  /* Offset of region for address 
> translation. */
> +};
> +
> +/* Memory structure includes region and mapping information. */
> +struct virtio_memory {
> +    uint64_t base_address; /* Base QEMU userspace address of the memory 
> file. */
> +    uint64_t mapped_address;
> +    /* Mapped address of memory file base in our applications memory space. 
> */
> +    uint64_t mapped_size; /* Total size of memory file. */
> +    uint32_t nregions; /* Number of memory regions. */
> +    struct virtio_memory_regions regions[0]; /* Memory region information. */
> +};
> +
> +/* Device operations to add/remove device. */
> +struct virtio_net_device_ops {
> +    int (* new_device) (struct virtio_net *); /* Add device. */
> +    void (* destroy_device) (volatile struct virtio_net *); /* Remove 
> device. */
> +};
> +
> +int init_virtio_net(struct virtio_net_device_ops const * const);
> +int deinit_virtio_net(void);
> +
> +struct vhost_net_device_ops const * get_virtio_net_callbacks(void);
> +
> +#endif /* _VIRTIO_NET_H_ */
> diff --git a/utilities/automake.mk b/utilities/automake.mk
> index 3e38e37..7a244fc 100644
> --- a/utilities/automake.mk
> +++ b/utilities/automake.mk
> @@ -35,7 +35,8 @@ EXTRA_DIST += \
>       utilities/ovs-save \
>       utilities/ovs-tcpundump.in \
>       utilities/ovs-test.in \
> -     utilities/ovs-vlan-test.in
> +     utilities/ovs-vlan-test.in \
> +     utilities/qemu-wrap.py
>  MAN_ROOTS += \
>       utilities/ovs-appctl.8.in \
>       utilities/ovs-benchmark.1.in \
> diff --git a/utilities/eventfd_link/Makefile.in 
> b/utilities/eventfd_link/Makefile.in
> new file mode 100644
> index 0000000..9e310c6
> --- /dev/null
> +++ b/utilities/eventfd_link/Makefile.in
> @@ -0,0 +1,86 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +AUTOMAKE_OPTIONS = -Wno-portability
> +
> +srcdir = @abs_srcdir@
> +builddir = @abs_builddir@
> +srcdir = @abs_srcdir@
> +top_srcdir = @abs_top_srcdir@
> +top_builddir = @top_builddir@
> +VERSION = @VERSION@
> +KSRC = /lib/modules/$(shell uname -r)/build
> +
> +DISTFILES = $(srcdir)/eventfd_link.c $(srcdir)/eventfd_link.h 
> $(srcdir)/Makefile.in
> +obj-m := eventfd_link.o
> +default: all
> +all:
> +module:
> +     $(MAKE) -C $(KSRC) M=$(srcdir) modules
> +
> +clean:
> +     rm -f *.o *.ko *.mod.* Module.symvers *.cmd distfiles 
> utilities-distfiles
> +
> +distclean: clean
> +
> +distfiles:
> +     @srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \
> +     topsrcdirstrip=`echo "$(top_srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \
> +     list='$(DISTFILES)'; \
> +     for file in $$list; do echo $$file; done | \
> +       sed -e "s|^$$srcdirstrip/||;t" \
> +           -e "s|^$$topsrcdirstrip/|$(top_builddir)/|;t" | sort -u > $@
> +CLEANFILES = distfiles
> +
> +install:
> +install-data:
> +install-exec:
> +uninstall:
> +install-dvi:
> +install-html:
> +install-info:
> +install-ps:
> +install-pdf:
> +installdirs:
> +check: all
> +installcheck:
> +mostlyclean:
> +dvi:
> +pdf:
> +ps:
> +info:
> +html:
> +tags:
> +TAGS:
> +modules_install:
> +maintainer-clean: distclean
> +
> +.PHONY: all clean distclean distdir
> diff --git a/utilities/eventfd_link/eventfd_link.c 
> b/utilities/eventfd_link/eventfd_link.c
> new file mode 100644
> index 0000000..11468c2
> --- /dev/null
> +++ b/utilities/eventfd_link/eventfd_link.c
> @@ -0,0 +1,179 @@
> +/*-
> + *  * GPL LICENSE SUMMARY
> + *  *
> + *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *  *
> + *  *   This program is free software; you can redistribute it and/or modify
> + *  *   it under the terms of version 2 of the GNU General Public License as
> + *  *   published by the Free Software Foundation.
> + *  *
> + *  *   This program is distributed in the hope that it will be useful, but
> + *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  *   General Public License for more details.
> + *  *
> + *  *   You should have received a copy of the GNU General Public License
> + *  *   along with this program; if not, write to the Free Software
> + *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 
> 02110-1301 USA.
> + *  *   The full GNU General Public License is included in this distribution
> + *  *   in the file called LICENSE.GPL.
> + *  *
> + *  *   Contact Information:
> + *  *   Intel Corporation
> + *   */
> +
> +#include <linux/eventfd.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/rcupdate.h>
> +#include <linux/file.h>
> +#include <linux/slab.h>
> +#include <linux/fs.h>
> +#include <linux/mmu_context.h>
> +#include <linux/sched.h>
> +#include <asm/mmu_context.h>
> +#include <linux/fdtable.h>
> +#include "eventfd_link.h"
> +
> +/* get_files_struct is copied from fs/file.c. */
> +struct files_struct *
> +get_files_struct (struct task_struct *task)
> +{
> +    struct files_struct *files;
> +
> +    task_lock (task);
> +    files = task->files;
> +    if (files) {
> +        atomic_inc (&files->count);
> +    }
> +    task_unlock (task);
> +
> +    return files;
> +}
> +
> +/* put_files_struct is extracted from fs/file.c */
> +void
> +put_files_struct (struct files_struct *files)
> +{
> +    if (atomic_dec_and_test (&files->count)) {
> +        BUG ();
> +    }
> +}
> +
> +static long
> +eventfd_link_ioctl (struct file *f, unsigned int ioctl, unsigned long arg)
> +{
> +    void __user *argp = (void __user *) arg;
> +    struct task_struct *task_target = NULL;
> +    struct file *file;
> +    struct files_struct *files;
> +    struct fdtable *fdt;
> +    struct eventfd_copy eventfd_copy;
> +
> +    switch (ioctl) {
> +    case EVENTFD_COPY:
> +        if (copy_from_user (&eventfd_copy, argp, sizeof (struct 
> eventfd_copy)))
> +            return -EFAULT;
> +
> +        /* Find the task struct for the target pid. */
> +        task_target =
> +            pid_task (find_vpid (eventfd_copy.target_pid), PIDTYPE_PID);
> +        if (task_target == NULL) {
> +            printk (KERN_DEBUG "Failed to get mem ctx for target pid\n");
> +            return -EFAULT;
> +        }
> +
> +        files = get_files_struct (current);
> +        if (files == NULL) {
> +            printk (KERN_DEBUG "Failed to get files struct\n");
> +            return -EFAULT;
> +        }
> +
> +        rcu_read_lock ();
> +        file = fcheck_files (files, eventfd_copy.source_fd);
> +        if (file) {
> +            if (file->f_mode & FMODE_PATH
> +                    || !atomic_long_inc_not_zero (&file->f_count)) {
> +                file = NULL;
> +            }
> +        }
> +        rcu_read_unlock ();
> +        put_files_struct (files);
> +
> +        if (file == NULL) {
> +            printk (KERN_DEBUG "Failed to get file from source pid\n");
> +            return 0;
> +        }
> +
> +        /* Release the existing eventfd in the source process. */
> +        spin_lock (&files->file_lock);
> +        filp_close (file, files);
> +        fdt = files_fdtable (files);
> +        fdt->fd[eventfd_copy.source_fd] = NULL;
> +        spin_unlock (&files->file_lock);
> +
> +        /* Find the file struct associated with the target fd. */
> +        files = get_files_struct (task_target);
> +        if (files == NULL) {
> +            printk (KERN_DEBUG "Failed to get files struct\n");
> +            return -EFAULT;
> +        }
> +
> +        rcu_read_lock ();
> +        file = fcheck_files (files, eventfd_copy.target_fd);
> +        if (file) {
> +            if (file->f_mode & FMODE_PATH
> +                    || !atomic_long_inc_not_zero (&file->f_count)) {
> +                file = NULL;
> +            }
> +        }
> +        rcu_read_unlock ();
> +        put_files_struct (files);
> +
> +        if (file == NULL) {
> +            printk (KERN_DEBUG "Failed to get file from target pid\n");
> +            return 0;
> +        }
> +
> +        /* Install the file struct from the target process into the file 
> desciptor of the source
> +         * process.
> +         */
> +        fd_install (eventfd_copy.source_fd, file);
> +
> +        return 0;
> +
> +    default:
> +        return -ENOIOCTLCMD;
> +    }
> +}
> +
> +static const struct file_operations eventfd_link_fops = {
> +    .owner = THIS_MODULE,
> +    .unlocked_ioctl = eventfd_link_ioctl,
> +};
> +
> +static struct miscdevice eventfd_link_misc = {
> +    .name = "eventfd-link",
> +    .fops = &eventfd_link_fops,
> +};
> +
> +static int __init
> +eventfd_link_init (void)
> +{
> +    return misc_register (&eventfd_link_misc);
> +}
> +
> +static void __exit
> +eventfd_link_exit (void)
> +{
> +    misc_deregister (&eventfd_link_misc);
> +}
> +
> +module_init (eventfd_link_init);
> +module_exit (eventfd_link_exit);
> +MODULE_VERSION ("0.0.1");
> +MODULE_LICENSE ("GPL v2");
> +MODULE_AUTHOR ("Anthony Fee");
> +MODULE_DESCRIPTION ("Link eventfd");
> +MODULE_ALIAS ("devname:eventfd-link");
> diff --git a/utilities/eventfd_link/eventfd_link.h 
> b/utilities/eventfd_link/eventfd_link.h
> new file mode 100644
> index 0000000..8e7e551
> --- /dev/null
> +++ b/utilities/eventfd_link/eventfd_link.h
> @@ -0,0 +1,79 @@
> +/*-
> + *  * This file is provided under a dual BSD/GPLv2 license.  When using or
> + *  *   redistributing this file, you may do so under either license.
> + *  *
> + *  *   GPL LICENSE SUMMARY
> + *  *
> + *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *  *
> + *  *   This program is free software; you can redistribute it and/or modify
> + *  *   it under the terms of version 2 of the GNU General Public License as
> + *  *   published by the Free Software Foundation.
> + *  *
> + *  *   This program is distributed in the hope that it will be useful, but
> + *  *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  *   General Public License for more details.
> + *  *
> + *  *   You should have received a copy of the GNU General Public License
> + *  *   along with this program; if not, write to the Free Software
> + *  *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 
> 02110-1301 USA.
> + *  *   The full GNU General Public License is included in this distribution
> + *  *   in the file called LICENSE.GPL.
> + *  *
> + *  *   Contact Information:
> + *  *   Intel Corporation
> + *  *
> + *  *   BSD LICENSE
> + *  *
> + *  *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *  *   All rights reserved.
> + *  *
> + *  *   Redistribution and use in source and binary forms, with or without
> + *  *   modification, are permitted provided that the following conditions
> + *  *   are met:
> + *  *
> + *  *     * Redistributions of source code must retain the above copyright
> + *  *       notice, this list of conditions and the following disclaimer.
> + *  *     * Redistributions in binary form must reproduce the above copyright
> + *  *       notice, this list of conditions and the following disclaimer in
> + *  *       the documentation and/or other materials provided with the
> + *  *       distribution.
> + *  *     * Neither the name of Intel Corporation nor the names of its
> + *  *       contributors may be used to endorse or promote products derived
> + *  *       from this software without specific prior written permission.
> + *  *
> + *  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *  *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *  *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *  *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *  *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *  *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *  *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *  *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *  *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *  *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *  *
> + *   */
> +
> +#ifndef _EVENTFD_LINK_H_
> +#define _EVENTFD_LINK_H_
> +
> +/*
> + *   ioctl to copy an fd entry in calling process to an fd in a target 
> process
> + */
> +#define EVENTFD_COPY 1
> +
> +/*
> + *   arguements for the EVENTFD_COPY ioctl
> + */
> +struct eventfd_copy {
> +    /* fd in the target pid */
> +    unsigned target_fd;
> +    /* fd in the calling pid */
> +    unsigned source_fd;
> +    /* pid of the target pid */
> +    pid_t target_pid;
> +};
> +#endif /* _EVENTFD_LINK_H_ */
> diff --git a/utilities/qemu-wrap.py b/utilities/qemu-wrap.py
> new file mode 100755
> index 0000000..1d5a7f4
> --- /dev/null
> +++ b/utilities/qemu-wrap.py
> @@ -0,0 +1,389 @@
> +#!/usr/bin/python
> +#
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel Corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +#
> +
> +#####################################################################
> +# This script is designed to modify the call to the QEMU emulator
> +# to support userspace vhost when starting a guest machine through
> +# libvirt with vhost enabled. The steps to enable this are as follows
> +# and should be run as root:
> +#
> +# 1. Place this script in a libvirtd's binary search PATH ($PATH)
> +#    A good location would be in the same directory that the QEMU
> +#    binary is located
> +#
> +# 2. Ensure that the script has the same owner/group and file
> +#    permissions as the QEMU binary
> +#
> +# 3. Update the VM xml file using "virsh edit VM.xml"
> +#
> +#    3.a) Set the VM to use the launch script
> +#
> +#            Set the emulator path contained in the
> +#            <emulator><emulator/> tags
> +#
> +#            e.g replace <emulator>/usr/bin/qemu-kvm<emulator/>
> +#        with    <emulator>/usr/bin/qemu-wrap.py<emulator/>
> +#
> +#     3.b) Set the VM's device's to use vhost-net offload
> +#
> +#            <interface type="network">
> +#            <model type="virtio"/>
> +#            <driver name="vhost"/>
> +#            <interface/>
> +#
> +# 4. Enable libvirt to access our userpace device file by adding it to
> +#    controllers cgroup for libvirtd using the following steps
> +#
> +#   4.a) In /etc/libvirt/qemu.conf add/edit the following lines:
> +#         1) cgroup_controllers = [ ... "devices", ... ]
> +#              2) clear_emulator_capabilities = 0
> +#         3) user = "root"
> +#         4) group = "root"
> +#         5) cgroup_device_acl = [
> +#                "/dev/null", "/dev/full", "/dev/zero",
> +#                "/dev/random", "/dev/urandom",
> +#                "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> +#                "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> +#                "/dev/<devbase-name>-<index>",
> +#                "/dev/hugepages"
> +#            ]
> +#
> +#   4.b) Disable SELinux or set to permissive mode
> +#
> +#   4.c) Mount cgroup device controller
> +#        "mkdir /dev/cgroup"
> +#        "mount -t cgroup none /dev/cgroup -o devices"
> +#
> +#   4.d) Set hugetlbfs_mount variable - ( Optional )
> +#        VMs using userspace vhost must use hugepage backed
> +#        memory. This can be enabled in the libvirt XML
> +#        config by adding a memory backing section to the
> +#        XML config e.g.
> +#             <memoryBacking>
> +#             <hugepages/>
> +#             </memoryBacking>
> +#        This memory backing section should be added after the
> +#        <memory> and <currentMemory> sections. This will add
> +#        flags "-mem-prealloc -mem-path <path>" to the QEMU
> +#        command line. The hugetlbfs_mount variable can be used
> +#        to override the default <path> passed through by libvirt.
> +#
> +#        if "-mem-prealloc" or "-mem-path <path>" are not passed
> +#        through and a vhost device is detected then these options will
> +#        be automatically added by this script. This script will detect
> +#        the system hugetlbfs mount point to be used for <path>. The
> +#        default <path> for this script can be overidden by the
> +#        hugetlbfs_dir variable in the configuration section of this script.
> +#
> +#
> +#   4.e) Restart the libvirtd system process
> +#        e.g. on Fedora "systemctl restart libvirtd.service"
> +#
> +#
> +#   4.f) Edit the Configuration Parameters section of this script
> +#        to point to the correct emulator location and set any
> +#        addition options
> +#
> +# The script modifies the libvirtd Qemu call by modifying/adding
> +# options based on the configuration parameters below.
> +# NOTE:
> +#     emul_path and us_vhost_path must be set
> +#     All other parameters are optional
> +#####################################################################
> +
> +
> +#############################################
> +# Configuration Parameters
> +#############################################
> +#Path to QEMU binary
> +emul_path = "/usr/local/bin/qemu-system-x86_64"
> +
> +#Path to userspace vhost device file
> +# This filename should match the --dev-basename --dev-index parameters of
> +# the command used to launch the userspace vhost sample application e.g.
> +# if the sample app lauch command is:
> +#    ./build/vhost-switch ..... --dev-basename usvhost --dev-index 1
> +# then this variable should be set to:
> +#   us_vhost_path = "/dev/usvhost-1"
> +us_vhost_path = "/dev/usvhost-1"
> +
> +#List of additional user defined emulation options. These options will
> +#be added to all Qemu calls
> +emul_opts_user = []
> +
> +#List of additional user defined emulation options for vhost only.
> +#These options will only be added to vhost enabled guests
> +emul_opts_user_vhost = []
> +
> +#For all VHOST enabled VMs, the VM memory is preallocated from hugetlbfs
> +# Set this variable to one to enable this option for all VMs
> +use_huge_all = 0
> +
> +#Instead of autodetecting, override the hugetlbfs directory by setting
> +#this variable
> +hugetlbfs_dir = ""
> +
> +#############################################
> +
> +
> +#############################################
> +# ****** Do Not Modify Below this Line ******
> +#############################################
> +
> +import sys, os, subprocess
> +import time
> +import signal
> +
> +
> +#List of open userspace vhost file descriptors
> +fd_list = []
> +
> +#additional virtio device flags when using userspace vhost
> +vhost_flags = [ "csum=off",
> +                "gso=off",
> +                "guest_tso4=off",
> +                "guest_tso6=off",
> +                "guest_ecn=off"
> +              ]
> +
> +#String of the path to the Qemu process pid
> +qemu_pid = "/tmp/%d-qemu.pid" % os.getpid()
> +
> +#############################################
> +# Signal haldler to kill Qemu subprocess
> +#############################################
> +def kill_qemu_process(signum, stack):
> +    pidfile = open(qemu_pid, 'r')
> +    pid = int(pidfile.read())
> +    os.killpg(pid, signal.SIGTERM)
> +    pidfile.close()
> +
> +
> +#############################################
> +# Find the system hugefile mount point.
> +# Note:
> +# if multiple hugetlbfs mount points exist
> +# then the first one found will be used
> +#############################################
> +def find_huge_mount():
> +
> +    if (len(hugetlbfs_dir)):
> +        return hugetlbfs_dir
> +
> +    huge_mount = ""
> +
> +    if (os.access("/proc/mounts", os.F_OK)):
> +        f = open("/proc/mounts", "r")
> +        line = f.readline()
> +        while line:
> +            line_split = line.split(" ")
> +            if line_split[2] == 'hugetlbfs':
> +                huge_mount = line_split[1]
> +                break
> +            line = f.readline()
> +    else:
> +        print "/proc/mounts not found"
> +        exit (1)
> +
> +    f.close
> +    if len(huge_mount) == 0:
> +        print "Failed to find hugetlbfs mount point"
> +        exit (1)
> +
> +    return huge_mount
> +
> +
> +#############################################
> +# Get a userspace Vhost file descriptor
> +#############################################
> +def get_vhost_fd():
> +
> +    if (os.access(us_vhost_path, os.F_OK)):
> +        fd = os.open( us_vhost_path, os.O_RDWR)
> +    else:
> +        print ("US-Vhost file %s not found" %us_vhost_path)
> +        exit (1)
> +
> +    return fd
> +
> +
> +#############################################
> +# Check for vhostfd. if found then replace
> +# with our own vhost fd and append any vhost
> +# flags onto the end
> +#############################################
> +def modify_netdev_arg(arg):
> +
> +    global fd_list
> +    vhost_in_use = 0
> +    s = ''
> +    new_opts = []
> +    netdev_opts = arg.split(",")
> +
> +    for opt in netdev_opts:
> +        #check if vhost is used
> +        if "vhost" == opt[:5]:
> +            vhost_in_use = 1
> +        else:
> +            new_opts.append(opt)
> +
> +    #if using vhost append vhost options
> +    if vhost_in_use == 1:
> +        #append vhost on option
> +        new_opts.append('vhost=on')
> +        #append vhostfd ption
> +        new_fd = get_vhost_fd()
> +        new_opts.append('vhostfd=' + str(new_fd))
> +        fd_list.append(new_fd)
> +
> +    #concatenate all options
> +    for opt in new_opts:
> +        if len(s) > 0:
> +                     s+=','
> +
> +        s+=opt
> +
> +    return s
> +
> +
> +#############################################
> +# Main
> +#############################################
> +def main():
> +
> +    global fd_list
> +    global vhost_in_use
> +    new_args = []
> +    num_cmd_args = len(sys.argv)
> +    emul_call = ''
> +    mem_prealloc_set = 0
> +    mem_path_set = 0
> +    num = 0;
> +
> +    #parse the parameters
> +    while (num < num_cmd_args):
> +        arg = sys.argv[num]
> +
> +             #Check netdev +1 parameter for vhostfd
> +        if arg == '-netdev':
> +            num_vhost_devs = len(fd_list)
> +            new_args.append(arg)
> +
> +            num+=1
> +            arg = sys.argv[num]
> +            mod_arg = modify_netdev_arg(arg)
> +            new_args.append(mod_arg)
> +
> +            #append vhost flags if this is a vhost device
> +            # and -device is the next arg
> +            # i.e -device -opt1,-opt2,...,-opt3,%vhost
> +            if (num_vhost_devs < len(fd_list)):
> +                num+=1
> +                arg = sys.argv[num]
> +                if arg == '-device':
> +                    new_args.append(arg)
> +                    num+=1
> +                    new_arg = sys.argv[num]
> +                    for flag in vhost_flags:
> +                        new_arg = ''.join([new_arg,',',flag])
> +                    new_args.append(new_arg)
> +                else:
> +                    new_args.append(arg)
> +        elif arg == '-mem-prealloc':
> +            mem_prealloc_set = 1
> +            new_args.append(arg)
> +        elif arg == '-mem-path':
> +            mem_path_set = 1
> +            new_args.append(arg)
> +
> +        else:
> +            new_args.append(arg)
> +
> +        num+=1
> +
> +    #Set Qemu binary location
> +    emul_call+=emul_path
> +    emul_call+=" "
> +
> +    #Add prealloc mem options if using vhost and not already added
> +    if ((len(fd_list) > 0) and (mem_prealloc_set == 0)):
> +        emul_call += "-mem-prealloc "
> +
> +    #Add mempath mem options if using vhost and not already added
> +    if ((len(fd_list) > 0) and (mem_path_set == 0)):
> +        #Detect and add hugetlbfs mount point
> +        mp = find_huge_mount()
> +        mp = "".join(["-mem-path ", mp])
> +        emul_call += mp
> +        emul_call += " "
> +
> +    #add user options
> +    for opt in emul_opts_user:
> +        emul_call += opt
> +        emul_call += " "
> +
> +    #Add add user vhost only options
> +    if len(fd_list) > 0:
> +        for opt in emul_opts_user_vhost:
> +            emul_call += opt
> +            emul_call += " "
> +
> +    #Add updated libvirt options
> +    iter_args = iter(new_args)
> +    #skip 1st arg i.e. call to this script
> +    next(iter_args)
> +    for arg in iter_args:
> +        emul_call+=str(arg)
> +        emul_call+= " "
> +
> +    emul_call += "-pidfile %s " % qemu_pid
> +    #Call QEMU
> +    process = subprocess.Popen(emul_call, shell=True, preexec_fn=os.setsid)
> +
> +    for sig in [signal.SIGTERM, signal.SIGINT, signal.SIGHUP, 
> signal.SIGQUIT]:
> +        signal.signal(sig, kill_qemu_process)
> +
> +    process.wait()
> +
> +    #Close usvhost files
> +    for fd in fd_list:
> +        os.close(fd)
> +    #Cleanup temporary files
> +    if os.access(qemu_pid, os.F_OK):
> +        os.remove(qemu_pid)
> +
> +
> +
> +if __name__ == "__main__":
> +    main()
> -- 
> 1.9.0
> 
> --------------------------------------------------------------
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
> 
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). Any review or distribution by others 
> is strictly prohibited. If you are not the intended recipient, please contact 
> the sender and delete all copies.
> 
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
> 
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH v3 1/1] netdev-dpdk: add dpdk vhost ports

Reply via email to