> -----Original Message-----
> From: Loftus, Ciara
> Sent: Wednesday, May 18, 2016 3:43 PM
> To: Traynor, Kevin <kevin.tray...@intel.com>; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD
>
> > > DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser'
> > > ports to be controlled by the librte_ether API, like physical
> > > 'dpdk' ports. The commit integrates this functionality into OVS,
> > > and refactors some of the existing vhost code such that it is
> > > vhost-cuse specific. Similarly, there is now some overlap between
> > > dpdk and vhost-user port code.
> > >
> > > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> >
> > Hi, a few minor comments below. I didn't review the cuse-specific
> > code this time around.
>
> Thanks Kevin for the feedback, my responses are inline.
>
> Ciara
>
> >
> > Kevin.
> >
> > > ---
> > >  INSTALL.DPDK.md   |  12 ++
> > >  NEWS              |   2 +
> > >  lib/netdev-dpdk.c | 628 +++++++++++++++++++++++++++++++++---------------------
> > >  3 files changed, 396 insertions(+), 246 deletions(-)
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > index 93f92e4..db7153a 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -990,6 +990,18 @@ Restrictions:
> > >      increased to the desired number of queues. Both DPDK and OVS must be
> > >      recompiled for this change to take effect.
> > >
> > > +  DPDK 'eth' type ports:
> > > +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context of
> > > +    DPDK as they are all managed by the rte_ether API. This means that they
> > > +    adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
> > > +    default is set to 32. This means by default the combined total number of
> > > +    dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. This
> > > +    value can be changed if desired by modifying the configuration file in
> > > +    DPDK, or by overriding the default value on the command line when building
> > > +    DPDK. eg.
> > > +
> > > +    `make install CONFIG_RTE_MAX_ETHPORTS=64`
> > > +
> > >  Bug Reporting:
> > >  --------------
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 4e81cad..841314b 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -32,6 +32,8 @@ Post-v2.5.0
> > >       * DB entries have been added for many of the DPDK EAL command line
> > >         arguments. Additional arguments can be passed via the dpdk-extra
> > >         entry.
> > > +     * vHost PMD integration brings vhost-user ports under control of the
> > > +       rte_ether DPDK API.
> > >     - ovs-benchmark: This utility has been removed due to lack of use and
> > >       bitrot.
> > >     - ovs-appctl:
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index 89d783a..814ef83 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -55,6 +55,7 @@
> > >  #include "unixctl.h"
> > >
> > >  #include "rte_config.h"
> > > +#include "rte_eth_vhost.h"
> > >  #include "rte_mbuf.h"
> > >  #include "rte_meter.h"
> > >  #include "rte_virtio_net.h"
> > > @@ -139,6 +140,11 @@ static char *cuse_dev_name = NULL;  /* Character device cuse_dev_name. */
> > >  #endif
> > >  static char *vhost_sock_dir = NULL;  /* Location of vhost-user sockets */
> > >
> > > +/* Array that tracks the used & unused vHost user driver IDs */
> > > +static unsigned int vhost_user_drv_ids[RTE_MAX_ETHPORTS];
> >
> > I think you can replace this array with a counter. You don't need a
> > unique id - just that you are < MAX.
>
> I considered at first using a counter, but what if the counter reaches
> the MAX but we still have space for more vHost ports?
> e.g. we add RTE_MAX_ETHPORTS vHost ports, delete all the ports, then
> try to add one again but can't because the counter is at max.
> Even if we decrement the counter on delete, this still doesn't solve
> the problem, because the port we delete won't necessarily be the last
> one we've added.

ok, got it - it's part of the name and that has to be unique.
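For the archive, the free-slot scan I'd picture here is roughly the
below (untested sketch, helper names made up for illustration) - the
array lets an ID be reused as soon as its port is deleted, which a bare
counter can't do:

    /* Sketch only, not from the patch. Assumes vhost_user_drv_ids[i]
     * is non-zero while ID 'i' is in use. Caller holds dpdk_mutex. */
    static int
    vhost_user_drv_id_alloc(void)
    {
        int i;

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (!vhost_user_drv_ids[i]) {
                vhost_user_drv_ids[i] = 1;  /* Mark slot as in use. */
                return i;                   /* Unique, reusable ID. */
            }
        }
        return -1;  /* All RTE_MAX_ETHPORTS IDs are taken. */
    }

    static void
    vhost_user_drv_id_free(unsigned int id)
    {
        vhost_user_drv_ids[id] = 0;  /* Slot can be handed out again. */
    }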
> > > +/* Maximum string length allowed to provide to rte_eth_attach function */
> > > +#define DEVARGS_MAX (RTE_ETH_NAME_MAX_LEN + PATH_MAX + 18)
> > > +
> > >  /*
> > >   * Maximum amount of time in micro seconds to try and enqueue to vhost.
> > >   */
> > > @@ -172,7 +178,8 @@ enum { DRAIN_TSC = 200000ULL };
> > >
> > >  enum dpdk_dev_type {
> > >      DPDK_DEV_ETH = 0,
> > > -    DPDK_DEV_VHOST = 1,
> > > +    DPDK_DEV_VHOST_USER = 1,
> > > +    DPDK_DEV_VHOST_CUSE = 2,
> > >  };
> > >
> > >  static int rte_eal_init_ret = ENODEV;
> > > @@ -358,12 +365,22 @@ struct netdev_dpdk {
> > >      int real_n_rxq;
> > >      bool txq_needs_locking;
> > >
> > > -    /* virtio-net structure for vhost device */
> > > +    /* Spinlock for vhost cuse transmission. Other DPDK devices use spinlocks
> > > +     * in dpdk_tx_queue */
> > > +    rte_spinlock_t vhost_cuse_tx_lock;
> > > +
> > > +    /* virtio-net structure for vhost cuse device */
> > >      OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> > >
> > > +    /* Number of virtqueue pairs reported by the guest */
> > > +    uint32_t vhost_qp_nb;
> > > +
> > >      /* Identifier used to distinguish vhost devices from each other */
> > >      char vhost_id[PATH_MAX];
> > >
> > > +    /* ID of vhost user port given to the PMD driver */
> > > +    unsigned int vhost_pmd_id;
> > > +
> >
> > This could be removed if you just use a counter as per comment above.
> >
> > >      /* In dpdk_list. */
> > >      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> > >
> > > @@ -381,16 +398,20 @@ struct netdev_rxq_dpdk {
> > >  static bool dpdk_thread_is_pmd(void);
> > >
> > >  static int netdev_dpdk_construct(struct netdev *);
> > > +static int netdev_dpdk_vhost_user_construct(struct netdev *);
> > >
> > >  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
> > >
> > >  void link_status_changed_callback(uint8_t port_id,
> > >          enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> > > +void vring_state_changed_callback(uint8_t port_id,
> > > +        enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> > >
> > >  static bool
> > > -is_dpdk_class(const struct netdev_class *class)
> > > +is_dpdk_eth_class(const struct netdev_class *class)
> > >  {
> > > -    return class->construct == netdev_dpdk_construct;
> > > +    return ((class->construct == netdev_dpdk_construct) ||
> > > +            (class->construct == netdev_dpdk_vhost_user_construct));
> > >  }
> > >
> > >  /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
> > > @@ -592,7 +613,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> > >      }
> > >
> > >      dev->up.n_rxq = n_rxq;
> > > -    dev->real_n_txq = n_txq;
> > > +    /* Only set real_n_txq for physical devices. vHost User devices will
> > > +     * set this value correctly during vring_state_changed callbacks.
> > > +     */
> > > +    if (dev->type == DPDK_DEV_ETH) {
> > > +        dev->real_n_txq = n_txq;
> > > +    }
> > >
> > >      return 0;
> > >  }
> > > @@ -697,6 +723,118 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
> > >      }
> > >  }
> > >
> > > +/*
> > > + * Fixes mapping for vhost-user tx queues. Must be called after each
> > > + * enabling/disabling of queues and real_n_txq modifications.
> > > + */
> > > +static void
> > > +netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    int *enabled_queues, n_enabled = 0;
> > > +    int i, k, total_txqs = dev->real_n_txq;
> > > +
> > > +    enabled_queues = dpdk_rte_mzalloc(total_txqs * sizeof *enabled_queues);
> > > +
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        /* Enabled queues always mapped to themselves. */
> > > +        if (dev->tx_q[i].map == i) {
> > > +            enabled_queues[n_enabled++] = i;
> > > +        }
> > > +    }
> > > +
> > > +    if (n_enabled == 0 && total_txqs != 0) {
> > > +        enabled_queues[0] = OVS_VHOST_QUEUE_DISABLED;
> > > +        n_enabled = 1;
> > > +    }
> > > +
> > > +    k = 0;
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        if (dev->tx_q[i].map != i) {
> > > +            dev->tx_q[i].map = enabled_queues[k];
> > > +            k = (k + 1) % n_enabled;
> > > +        }
> > > +    }
> > > +
> > > +    VLOG_DBG("TX queue mapping for %s\n", dev->vhost_id);
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        VLOG_DBG("%2d --> %2d", i, dev->tx_q[i].map);
> > > +    }
> > > +
> > > +    rte_free(enabled_queues);
> > > +}
> > > +
> > > +static void
> > > +netdev_dpdk_vhost_set_queues(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    dev->real_n_rxq = dev->vhost_qp_nb;
> > > +    dev->real_n_txq = dev->vhost_qp_nb;
> > > +    dev->txq_needs_locking = true;
> > > +
> > > +    /* Enable TX queue 0 by default if it wasn't disabled. */
> > > +    if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) {
> > > +        dev->tx_q[0].map = 0;
> > > +    }
> > > +
> > > +    netdev_dpdk_remap_txqs(dev);
> > > +
> > > +    return;
> > > +}
> > > +
> > > +/* Clears mapping for all available queues of vhost interface. */
> > > +static void
> > > +netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    int i;
> > > +
> > > +    for (i = 0; i < dev->real_n_txq; i++) {
> > > +        dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
> > > +    }
> > > +}
> > > +
> > > +void
> > > +vring_state_changed_callback(uint8_t port_id,
> > > +                             enum rte_eth_event_type type OVS_UNUSED,
> > > +                             void *param OVS_UNUSED)
> > > +{
> > > +    struct netdev_dpdk *dev;
> > > +    struct rte_eth_vhost_queue_event event;
> > > +    int err = 0;
> > > +
> > > +    err = rte_eth_vhost_get_queue_event(port_id, &event);
> > > +    if (err || (event.rx == 1)) {
> >
> > if (err || event.rx) would be clearer
>
> ok
>
> > > +        return;
> > > +    }
> > > +
> > > +    ovs_mutex_lock(&dpdk_mutex);
> > > +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> > > +        if (port_id == dev->port_id) {
> > > +            ovs_mutex_lock(&dev->mutex);
> > > +            if (event.enable) {
> > > +                dev->tx_q[event.queue_id].map = event.queue_id;
> > > +                dev->vhost_qp_nb++;
> > > +            } else {
> > > +                dev->tx_q[event.queue_id].map = OVS_VHOST_QUEUE_DISABLED;
> > > +                dev->vhost_qp_nb--;
> > > +            }
> > > +            if (dev->vhost_qp_nb > dev->up.n_rxq) {
> >
> > There are a few places throughout the code where vhost_qp_nb is
> > assumed to have a value, though it is initialized as 0. It *should*
> > be ok, but perhaps the initialization could be changed, or 0 could
> > be incorporated into the range checks. What do you think?
>
> vhost_qp_nb is only used after VM boot, so it should be safe to assume
> it will have a valid value - although zero may be a valid value if we
> boot with no queues (not sure if that's possible!).
> I think the init should stay as 0 - it's representative of the virtual
> queue pairs, and we have zero when the port is being created before
> the VM exists.
> By incorporating 0 into the range checks, are you suggesting reporting
> an error when the queues reported = 0?

My general point was that the rx/tx queues are being inited to 1,
whereas the qp count is inited to 0. As long as we can't do anything we
shouldn't while they are in that state, because something will catch
it, it's ok. Checking for 0 qp would be one way.
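e.g. something along these lines early in the vhost rx/tx handlers
would make the "no queue pairs yet" state explicit rather than relying
on a later check to catch it (sketch only, not from the patch, and the
exact function it lands in is just my guess):

    /* Sketch: treat "VM not booted yet" as nothing to do rather than
     * assuming vhost_qp_nb is already valid. */
    if (OVS_UNLIKELY(dev->vhost_qp_nb == 0)) {
        return EAGAIN;  /* Port exists but guest reports no queues. */
    }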
[snip]

> > > +    if (dev->type == DPDK_DEV_ETH) {
> > > +        /* DPDK counts imissed as errors, but count them here as dropped
> > > +         * instead */
> > > +        stats->rx_errors = rte_stats.ierrors - rte_stats.imissed;
> > > +        stats->tx_errors = rte_stats.oerrors;
> > > +        stats->multicast = rte_stats.imcasts;
> > > +
> > > +        rte_spinlock_lock(&dev->stats_lock);
> > > +        stats->tx_dropped = dev->stats.tx_dropped;
> > > +        rte_spinlock_unlock(&dev->stats_lock);
> > > +    } else {
> > > +        stats->rx_errors = UINT64_MAX;
> > > +        stats->tx_errors = UINT64_MAX;
> > > +        stats->multicast = UINT64_MAX;
> > > +
> > > +        rte_spinlock_lock(&dev->stats_lock);
> > > +        stats->tx_dropped = UINT64_MAX;
> >
> > We should still be able to count tx dropped.
>
> You're right. Will fix in v4.
>
> > > +        rte_spinlock_unlock(&dev->stats_lock);
> > > +    }
> >
> > I assume there's some rework because of the extended stats that have
> > merged now?
>
> I think when this was submitted it was rebased on top of those
> changes.

Are they reporting for vhost-pmd? If not, you may need to skip that
check or set some of them to UINT64_MAX.
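Re the tx_dropped fix: presumably the v4 vhost branch just keeps the
locked read that the eth branch already does, since OVS counts vhost tx
drops itself in the send path. A sketch of what I'd expect (not the
actual v4 code):

    } else {
        stats->rx_errors = UINT64_MAX;
        stats->tx_errors = UINT64_MAX;
        stats->multicast = UINT64_MAX;

        rte_spinlock_lock(&dev->stats_lock);
        /* tx_dropped is maintained by OVS itself, so it is valid even
         * without rte_eth error stats for the port. */
        stats->tx_dropped = dev->stats.tx_dropped;
        rte_spinlock_unlock(&dev->stats_lock);
    }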