Re: [ovs-dev] [PATCH] netdev-dpdk: Set pmd thread priority

2016-04-25 Thread Traynor, Kevin

On 21/04/2016 16:16, Bhanuprakash Bodireddy wrote:

Set the DPDK pmd thread scheduling policy to SCHED_RR and static
priority to the highest priority value of the policy. This is to deal with
the pmd thread starvation case where another CPU-hogging process can get
scheduled/affinitized to the same core where the pmd is running, thereby
significantly impacting the datapath performance.

The realtime scheduling policy is applied only when CPU mask is passed
to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where the
policy and priority shall not be applied to pmd thread spawned on core0.
For example:

 * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd
   thread shall be created and affinitized to 'core 0' with default
   scheduling policy and priority applied.

 * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd
   threads shall be spawned on the corresponding core(s) in the mask
   and real time scheduling policy SCHED_RR and highest static
   priority is applied to the pmd thread(s).

To reproduce use following commands:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
taskset 0x2 cat /dev/zero > /dev/null &
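
As a rough illustration of how the masks in the commands above map to cores, the sketch below decodes a hexadecimal CPU mask bit by bit. The helper names (`mask_to_cores`, `would_get_rt_policy`) are hypothetical and not part of OVS; the second one only mirrors the `if (pmd->core_id)` check in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not OVS code: decode a hex CPU mask string such as
 * "6" into the core ids it selects.  Returns the number of ids written. */
static int
mask_to_cores(const char *hex_mask, unsigned cores[], int max_cores)
{
    uint64_t mask = strtoull(hex_mask, NULL, 16);
    int n = 0;

    assert(max_cores >= 0);
    for (unsigned core = 0; core < 64 && n < max_cores; core++) {
        if (mask & (UINT64_C(1) << core)) {
            cores[n++] = core;
        }
    }
    return n;
}

/* Mirrors the 'if (pmd->core_id)' test in the patch: a pmd thread on
 * core 0 keeps the default scheduling policy. */
static int
would_get_rt_policy(unsigned core_id)
{
    return core_id != 0;
}
```

Under these assumptions, `pmd-cpu-mask=6` selects cores 1 and 2, and `taskset 0x2` pins the competing `cat` process to core 1, which is where the contention shows up.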


Even though it seems the most likely case - I'm not sure that we can 
always assume the user who put the non-OVS process on the core did so by 
mistake and would want us to increase our priority.




Signed-off-by: Bhanuprakash Bodireddy 
---
  lib/dpif-netdev.c |  9 +
  lib/netdev-dpdk.c | 14 ++
  lib/netdev-dpdk.h |  1 +
  3 files changed, 24 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1e8a37c..4a46816 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2670,6 +2670,15 @@ pmd_thread_main(void *f_)
  /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
  ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
  pmd_thread_setaffinity_cpu(pmd->core_id);
+
+#ifdef DPDK_NETDEV
+/* Set pmd thread's scheduling policy to SCHED_RR and priority to
+ * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)
+ * pmd-cpu-mask=1, default scheduling policy and priority shall
+ * apply to pmd thread */
+ if (pmd->core_id)
+pmd_thread_setpriority();


Similar to above, I don't think we can assume anything special about 
core 0. This type of change sounds like something that would be better 
done at a layer above vswitch which has more system wide knowledge.


fwiw, it would be cleaner to remove the #ifdef from here and create a 
dummy fn in netdev-dpdk.h, also the 'if' needs {}
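
One way to read the suggestion above, as a sketch only: the header provides a real prototype when DPDK is compiled in and an empty inline stub otherwise, so the caller needs no `#ifdef`. `DPDK_NETDEV` and `pmd_thread_setpriority` come from the patch; `maybe_raise_priority` is a hypothetical stand-in for the call site in `pmd_thread_main()`.

```c
/* What the declaration in netdev-dpdk.h could look like (sketch). */
#ifdef DPDK_NETDEV
void pmd_thread_setpriority(void);
#else
/* Stub used when OVS is built without DPDK: nothing to do. */
static inline void
pmd_thread_setpriority(void)
{
}
#endif

/* Hypothetical stand-in for the call site in pmd_thread_main(): no #ifdef,
 * and the 'if' body is braced.  Returns 1 if the priority call was made. */
static int
maybe_raise_priority(unsigned core_id)
{
    if (core_id) {
        pmd_thread_setpriority();
        return 1;
    }
    return 0;
}
```

The return value exists only so the sketch can be checked; the real call site would not need it.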



+#endif
  reload:
  emc_cache_init(&pmd->flow_cache);

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..6518c87 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2926,6 +2926,20 @@ pmd_thread_setaffinity_cpu(unsigned cpu)
  return 0;
  }

+void
+pmd_thread_setpriority(void)
+{
+struct sched_param threadparam;
+int err;
+
+memset(&threadparam, 0, sizeof(threadparam));
+threadparam.sched_priority = sched_get_priority_max(SCHED_RR);
+err = pthread_setschedparam(pthread_self(), SCHED_RR, &threadparam);
+if (err) {
+VLOG_WARN("Thread priority error %d",err);
+}
+}
+
  static bool
  dpdk_thread_is_pmd(void)
  {
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 646d3e2..168673b 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -26,6 +26,7 @@ int dpdk_init(int argc, char **argv);
  void netdev_dpdk_register(void);
  void free_dpdk_buf(struct dp_packet *);
  int pmd_thread_setaffinity_cpu(unsigned cpu);
+void pmd_thread_setpriority(void);

  #else




___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Add vHost User PMD

2016-04-28 Thread Traynor, Kevin

On 21/04/2016 13:20, Ciara Loftus wrote:

DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
to be controlled by the librte_ether API, like physical 'dpdk' ports.
The commit integrates this functionality into OVS, and refactors some
of the existing vhost code such that it is vhost-cuse specific.
Similarly, there is now some overlap between dpdk and vhost-user port
code.

Signed-off-by: Ciara Loftus 
---
  INSTALL.DPDK.md   |  12 ++
  NEWS  |   2 +
  lib/netdev-dpdk.c | 515 +-


Hi Ciara, there's a lot of churn in this file. It might be worth
considering whether it could be split across a few commits to
help reviewers, e.g. new features like adding get_features, get_status
for vhost could be a separate patch at least.



  3 files changed, 254 insertions(+), 275 deletions(-)
  mode change 100644 => 100755 lib/netdev-dpdk.c


file permission change.



diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7f76df8..5006812 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -945,6 +945,18 @@ Restrictions:
  increased to the desired number of queues. Both DPDK and OVS must be
  recompiled for this change to take effect.

+  DPDK 'eth' type ports:
+  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context of
+DPDK as they are all managed by the rte_ether API. This means that they
+adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
+default is set to 32. This means by default the combined total number of
+dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. This
+value can be changed if desired by modifying the configuration file in
+DPDK, or by overriding the default value on the command line when building
+DPDK. eg.
+
+`make install CONFIG_RTE_MAX_ETHPORTS=64`


format is not registering right for this in my md viewer.


+
  Bug Reporting:
  --

diff --git a/NEWS b/NEWS
index ea7f3a1..4dc0201 100644
--- a/NEWS
+++ b/NEWS
@@ -26,6 +26,8 @@ Post-v2.5.0
 assignment.
   * Type of log messages from PMD threads changed from INFO to DBG.
   * QoS functionality with sample egress-policer implementation.
+ * vHost PMD integration brings vhost-user ports under control of the
+   rte_ether DPDK API.
 - ovs-benchmark: This utility has been removed due to lack of use and
   bitrot.
 - ovs-appctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
old mode 100644
new mode 100755
index 208c5f5..4fccd63
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -56,6 +56,7 @@
  #include "rte_mbuf.h"
  #include "rte_meter.h"
  #include "rte_virtio_net.h"
+#include "rte_eth_vhost.h"


nit: generally these go in alphabetical order.



  VLOG_DEFINE_THIS_MODULE(dpdk);
  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
@@ -109,6 +110,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / 
ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))

  static char *cuse_dev_name = NULL;/* Character device cuse_dev_name. */
  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
+/* Array that tracks the used & unused vHost user driver IDs */
+static unsigned int vhost_user_drv_ids[RTE_MAX_ETHPORTS];

  /*
   * Maximum amount of time in micro seconds to try and enqueue to vhost.
@@ -143,7 +146,8 @@ enum { DRAIN_TSC = 20ULL };

  enum dpdk_dev_type {
  DPDK_DEV_ETH = 0,
-DPDK_DEV_VHOST = 1,
+DPDK_DEV_VHOST_USER = 1,
+DPDK_DEV_VHOST_CUSE = 2,
  };

  static int rte_eal_init_ret = ENODEV;
@@ -275,8 +279,6 @@ struct dpdk_tx_queue {
  * from concurrent access.  It is used only
  * if the queue is shared among different
  * pmd threads (see 'txq_needs_locking'). 
*/
-int map;   /* Mapping of configured vhost-user queues
-* to enabled by guest. */
  uint64_t tsc;
  struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN];
  };
@@ -329,12 +331,22 @@ struct netdev_dpdk {
  int real_n_rxq;
  bool txq_needs_locking;

-/* virtio-net structure for vhost device */
+/* Spinlock for vhost cuse transmission. Other DPDK devices use spinlocks
+ * in dpdk_tx_queue */
+rte_spinlock_t vhost_cuse_tx_lock;


Why can't we continue to use the lock in dpdk_tx_queue, rather than
adding a cuse-specific lock?



+
+/* virtio-net structure for vhost cuse device */
  OVSRCU_TYPE(struct virtio_net *) virtio_dev;

  /* Identifier used to distinguish vhost devices from each other */
  char vhost_id[PATH_MAX];

+/* ID of vhost user port given to the PMD driver */
+unsigned int vhost_pmd_id;
+
+/* Number of virtqueue pairs reported by the guest */
+uint32_t reported_queues;


Is this useful? We could just use the real_n_*xq's directly.


+
  /* In dpdk_list. */
  struct ovs_list list

Re: [ovs-dev] [PATCH] netdev-dpdk: Add vHost User PMD

2016-05-13 Thread Traynor, Kevin
> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Panu
> Matilainen
> Sent: Tuesday, May 10, 2016 2:51 PM
> To: Daniele Di Proietto ; Loftus, Ciara
> 
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Add vHost User PMD
> 
> On 04/30/2016 03:23 AM, Daniele Di Proietto wrote:
> 
> > I see that vhost-cuse is still handled separately. Is it possible to use
> > the vhost pmd also for vhost-cuse? Otherwise we still basically have to
> > handle differently three cases: NIC PMD, vhost user pmd, vhost cuse. Maybe
> > it's time to remove vhost-cuse (I understand this is a separate issue,
> > though)?
> 
> It's a separate issue but sufficiently related that it's worth rehashing
> now - getting rid of it first would presumably simplify the remaining
> code quite a bit.
> 
> At least my personal experience with it has been that vhost-cuse only
> ever worked once in a full moon with stars aligned just so, and with
> just the right magic kernel versions in host and guest.
> 
> My 5c? Just put the poor thing out of its misery already :)
> 
>   - Panu -

Hi Panu,

I didn't share the pain in running vhost-cuse (although it's a long time
since I ran it) but I agree with removing it. vhost-user is clearly the
way to go now, with better support in QEMU/DPDK and no out-of-tree kernel
module.

QEMU/DPDK support - vhost-user has been available since QEMU 2.1/2.
Anyone willing to upgrade to a new OVS release without vhost-cuse would
surely be ok to use a version of QEMU that supports vhost-user, multiqueue
etc. 

Distros - the latest Fedora and Ubuntu releases ship with versions of QEMU
that support vhost-user. CentOS has QEMU 2.0, but that supports neither
vhost-cuse nor vhost-user.

Code - now that we have a vhost-user pmd in DPDK, removing vhost-cuse will
be a significant clean up. 

+1 from me.

Kevin.




Re: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk watchdog thread

2016-05-13 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ciara
> Loftus
> Sent: Wednesday, May 11, 2016 4:31 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk watchdog
> thread
> 
> Instead of continuously polling for link status changes on 'dpdk'
> ports, register a callback function that will be triggered when DPDK
> detects that the link status of that port has changed.

rte_eth_link_get_nowait() returns void, so polling it in a thread won't
indicate some kind of error in dpdk. I can't see any benefit of the thread
- using the callback means one less thread and less locking.
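
The difference between the two approaches can be sketched without DPDK. The toy registry below is purely illustrative: `register_lsc_callback`, `fire_link_event` and `MAX_PORTS` are hypothetical names standing in for `rte_eth_dev_callback_register()` and the driver's event path. The point is that the handler runs only when an event is delivered, instead of a thread waking every few seconds to poll every port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4   /* Hypothetical limit for this sketch. */

typedef void (*lsc_cb)(uint8_t port_id, void *param);

static struct {
    lsc_cb cb;
    void *param;
} lsc_registry[MAX_PORTS];

/* Remember which handler to run for 'port_id'.  Returns 0 on success,
 * -1 if the port id is out of range. */
static int
register_lsc_callback(uint8_t port_id, lsc_cb cb, void *param)
{
    if (port_id >= MAX_PORTS) {
        return -1;
    }
    lsc_registry[port_id].cb = cb;
    lsc_registry[port_id].param = param;
    return 0;
}

/* Stand-in for the driver noticing a link change: run the handler, if any. */
static void
fire_link_event(uint8_t port_id)
{
    assert(port_id < MAX_PORTS);
    if (lsc_registry[port_id].cb) {
        lsc_registry[port_id].cb(port_id, lsc_registry[port_id].param);
    }
}

/* Example handler: counts how many link events it has seen. */
static void
count_link_events(uint8_t port_id, void *param)
{
    (void) port_id;
    ++*(int *) param;
}
```

In this model a port with no events costs nothing, whereas the watchdog loop in the removed code took both mutexes for every port on every interval.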

Acked-by: Kevin Traynor 

> 
> Signed-off-by: Ciara Loftus 
> Suggested-by: Kevin Traynor 
> ---
>  lib/netdev-dpdk.c | 55 ++
> -
>  1 file changed, 30 insertions(+), 25 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index af86d19..89d783a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -62,8 +62,6 @@
>  VLOG_DEFINE_THIS_MODULE(dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> 
> -#define DPDK_PORT_WATCHDOG_INTERVAL 5
> -
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>  #define OVS_VPORT_DPDK "ovs_dpdk"
> 
> @@ -386,6 +384,9 @@ static int netdev_dpdk_construct(struct netdev *);
> 
>  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk
> *dev);
> 
> +void link_status_changed_callback(uint8_t port_id,
> +enum rte_eth_event_type type OVS_UNUSED, void *param
> OVS_UNUSED);
> +
>  static bool
>  is_dpdk_class(const struct netdev_class *class)
>  {
> @@ -536,27 +537,6 @@ check_link_status(struct netdev_dpdk *dev)
>  }
>  }
> 
> -static void *
> -dpdk_watchdog(void *dummy OVS_UNUSED)
> -{
> -struct netdev_dpdk *dev;
> -
> -pthread_detach(pthread_self());
> -
> -for (;;) {
> -ovs_mutex_lock(&dpdk_mutex);
> -LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> -ovs_mutex_lock(&dev->mutex);
> -check_link_status(dev);
> -ovs_mutex_unlock(&dev->mutex);
> -}
> -ovs_mutex_unlock(&dpdk_mutex);
> -xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> -}
> -
> -return NULL;
> -}
> -
>  static int
>  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
> n_txq)
>  {
> @@ -717,6 +697,27 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev,
> unsigned int n_txqs)
>  }
>  }
> 
> +void
> +link_status_changed_callback(uint8_t port_id,
> +  enum rte_eth_event_type type
> OVS_UNUSED,
> +  void *param OVS_UNUSED)
> +{
> +struct netdev_dpdk *dev;
> +
> +ovs_mutex_lock(&dpdk_mutex);
> +LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +if (port_id == dev->port_id) {
> +ovs_mutex_lock(&dev->mutex);
> +check_link_status(dev);
> +ovs_mutex_unlock(&dev->mutex);
> +break;
> +}
> +}
> +ovs_mutex_unlock(&dpdk_mutex);
> +
> +return;
> +}
> +
>  static int
>  netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
>   enum dpdk_dev_type type)
> @@ -774,6 +775,12 @@ netdev_dpdk_init(struct netdev *netdev, unsigned
> int port_no,
>  netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
>  }
> 
> +if (type == DPDK_DEV_ETH) {
> +rte_eth_dev_callback_register(port_no,
> RTE_ETH_EVENT_INTR_LSC,
> +
> (void*)link_status_changed_callback,
> +  NULL);
> +}
> +
>  ovs_list_push_back(&dpdk_list, &dev->list_node);
> 
>  unlock:
> @@ -3207,8 +3214,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>  /* We are called from the main thread here */
>  RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
> 
> -ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
> -
>  #ifdef VHOST_CUSE
>  /* Register CUSE device to handle IOCTLs.
>   * Unless otherwise specified, cuse_dev_name is set to vhost-net.
> --
> 2.4.3
> 


Re: [ovs-dev] [PATCH] netdev-dpdk: Add vHost User PMD

2016-05-13 Thread Traynor, Kevin
> -----Original Message-----
> From: Loftus, Ciara
> Sent: Tuesday, May 10, 2016 10:22 AM
> To: Traynor, Kevin 
> Cc: dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH] netdev-dpdk: Add vHost User PMD
> 
> > On 21/04/2016 13:20, Ciara Loftus wrote:
> > > DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser'
> ports
> > > to be controlled by the librte_ether API, like physical 'dpdk'
> ports.
> > > The commit integrates this functionality into OVS, and refactors
> some
> > > of the existing vhost code such that it is vhost-cuse specific.
> > > Similarly, there is now some overlap between dpdk and vhost-user
> port
> > > code.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > >   INSTALL.DPDK.md   |  12 ++
> > >   NEWS  |   2 +
> > >   lib/netdev-dpdk.c | 515 +---
> --
> > 
> >
> > Hi Ciara, there's a lot of churn in this file. It might be worth
> > considering whether it could be split across a few commits to
> > help reviewers, e.g. new features like adding get_features, get_status
> > for vhost could be a separate patch at least.
> 
> I've split into 3:
> - remove watchdog
> - add pmd
> - add get_stats & get_features

Great - this helps review a lot

> 
> Couldn't quite find a way to split it up more.
> 
> >
> > >   3 files changed, 254 insertions(+), 275 deletions(-)
> > >   mode change 100644 => 100755 lib/netdev-dpdk.c
> >
> > file permission change.
> 
> Woops. Fixed in v2.
> 
> >
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > index 7f76df8..5006812 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -945,6 +945,18 @@ Restrictions:
> > >   increased to the desired number of queues. Both DPDK and OVS
> must
> > be
> > >   recompiled for this change to take effect.
> > >
> > > +  DPDK 'eth' type ports:
> > > +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in
> the context
> > of
> > > +DPDK as they are all managed by the rte_ether API. This means
> that
> > they
> > > +adhere to the DPDK configuration option
> CONFIG_RTE_MAX_ETHPORTS
> > which by
> > > +default is set to 32. This means by default the combined
> total number of
> > > +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with
> DPDK is 32.
> > This
> > > +value can be changed if desired by modifying the
> configuration file in
> > > +DPDK, or by overriding the default value on the command line
> when
> > building
> > > +DPDK. eg.
> > > +
> > > +`make install CONFIG_RTE_MAX_ETHPORTS=64`
> >
> > format is not registering right for this in my md viewer.
> 
> It's looking ok on mine. What doesn't look right?

The ``'s are showing in atom.io - could be just atom or schema related.

> 
> >
> > > +
> > >   Bug Reporting:
> > >   --
> > >
> > > diff --git a/NEWS b/NEWS
> > > index ea7f3a1..4dc0201 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -26,6 +26,8 @@ Post-v2.5.0
> > >  assignment.
> > >* Type of log messages from PMD threads changed from INFO
> to DBG.
> > >* QoS functionality with sample egress-policer
> implementation.
> > > + * vHost PMD integration brings vhost-user ports under
> control of the
> > > +   rte_ether DPDK API.
> > >  - ovs-benchmark: This utility has been removed due to lack of
> use and
> > >bitrot.
> > >  - ovs-appctl:
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > old mode 100644
> > > new mode 100755
> > > index 208c5f5..4fccd63
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -56,6 +56,7 @@
> > >   #include "rte_mbuf.h"
> > >   #include "rte_meter.h"
> > >   #include "rte_virtio_net.h"
> > > +#include "rte_eth_vhost.h"
> >
> > nit: generally these go in alphabetical order.
> 
> Ok
> 
> >
> > >
> > >   VLOG_DEFINE_THIS_MODULE(dpdk);
> > >   static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > > @@ -109,6 +110,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBU

Re: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD

2016-05-17 Thread Traynor, Kevin
> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ciara
> Loftus
> Sent: Wednesday, May 11, 2016 4:31 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD
> 
> DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser'
> ports
> to be controlled by the librte_ether API, like physical 'dpdk' ports.
> The commit integrates this functionality into OVS, and refactors some
> of the existing vhost code such that it is vhost-cuse specific.
> Similarly, there is now some overlap between dpdk and vhost-user port
> code.
> 
> Signed-off-by: Ciara Loftus 

Hi, a few minor comments below. I didn't review the cuse-specific code this
time around.

Kevin.


> ---
>  INSTALL.DPDK.md   |  12 ++
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 628 +---
> --
>  3 files changed, 396 insertions(+), 246 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 93f92e4..db7153a 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -990,6 +990,18 @@ Restrictions:
>  increased to the desired number of queues. Both DPDK and OVS
> must be
>  recompiled for this change to take effect.
> 
> +  DPDK 'eth' type ports:
> +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the
> context of
> +DPDK as they are all managed by the rte_ether API. This means
> that they
> +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS
> which by
> +default is set to 32. This means by default the combined total
> number of
> +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK
> is 32. This
> +value can be changed if desired by modifying the configuration
> file in
> +DPDK, or by overriding the default value on the command line
> when building
> +DPDK. eg.
> +
> +`make install CONFIG_RTE_MAX_ETHPORTS=64`
> +
>  Bug Reporting:
>  --
> 
> diff --git a/NEWS b/NEWS
> index 4e81cad..841314b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -32,6 +32,8 @@ Post-v2.5.0
>   * DB entries have been added for many of the DPDK EAL command
> line
> arguments. Additional arguments can be passed via the dpdk-
> extra
> entry.
> + * vHost PMD integration brings vhost-user ports under control
> of the
> +   rte_ether DPDK API.
> - ovs-benchmark: This utility has been removed due to lack of use
> and
>   bitrot.
> - ovs-appctl:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 89d783a..814ef83 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -55,6 +55,7 @@
>  #include "unixctl.h"
> 
>  #include "rte_config.h"
> +#include "rte_eth_vhost.h"
>  #include "rte_mbuf.h"
>  #include "rte_meter.h"
>  #include "rte_virtio_net.h"
> @@ -139,6 +140,11 @@ static char *cuse_dev_name = NULL;/*
> Character device cuse_dev_name. */
>  #endif
>  static char *vhost_sock_dir = NULL;   /* Location of vhost-user
> sockets */
> 
> +/* Array that tracks the used & unused vHost user driver IDs */
> +static unsigned int vhost_user_drv_ids[RTE_MAX_ETHPORTS];

I think you can replace this array with a counter. You don't need a
unique id - just that you are < MAX.
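
A counter-based version might look like the sketch below. The names `vhost_user_port_count`, `vhost_user_port_get` and `vhost_user_port_put` are hypothetical, and `RTE_MAX_ETHPORTS` is defined locally (to the default of 32 mentioned in the INSTALL.DPDK.md hunk) only so the fragment compiles without DPDK headers. A real version would also need whatever locking the surrounding code uses.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef RTE_MAX_ETHPORTS
#define RTE_MAX_ETHPORTS 32   /* Default quoted in the INSTALL.DPDK.md hunk. */
#endif

/* Number of vhost-user ports currently created (hypothetical). */
static unsigned int vhost_user_port_count;

/* Claim a slot for a new port; fails once the limit is reached. */
static bool
vhost_user_port_get(void)
{
    if (vhost_user_port_count >= RTE_MAX_ETHPORTS) {
        return false;
    }
    vhost_user_port_count++;
    return true;
}

/* Release a slot when a port is destroyed. */
static void
vhost_user_port_put(void)
{
    assert(vhost_user_port_count > 0);
    vhost_user_port_count--;
}
```

This only covers the limit check. Whether the PMD device name can be built without a per-port unique id is a separate question that the sketch does not address.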

> +/* Maximum string length allowed to provide to rte_eth_attach
> function */
> +#define DEVARGS_MAX (RTE_ETH_NAME_MAX_LEN + PATH_MAX + 18)
> +
>  /*
>   * Maximum amount of time in micro seconds to try and enqueue to
> vhost.
>   */
> @@ -172,7 +178,8 @@ enum { DRAIN_TSC = 20ULL };
> 
>  enum dpdk_dev_type {
>  DPDK_DEV_ETH = 0,
> -DPDK_DEV_VHOST = 1,
> +DPDK_DEV_VHOST_USER = 1,
> +DPDK_DEV_VHOST_CUSE = 2,
>  };
> 
>  static int rte_eal_init_ret = ENODEV;
> @@ -358,12 +365,22 @@ struct netdev_dpdk {
>  int real_n_rxq;
>  bool txq_needs_locking;
> 
> -/* virtio-net structure for vhost device */
> +/* Spinlock for vhost cuse transmission. Other DPDK devices use
> spinlocks
> + * in dpdk_tx_queue */
> +rte_spinlock_t vhost_cuse_tx_lock;
> +
> +/* virtio-net structure for vhost cuse device */
>  OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> 
> +/* Number of virtqueue pairs reported by the guest */
> +uint32_t vhost_qp_nb;
> +
>  /* Identifier used to distinguish vhost devices from each other
> */
>  char vhost_id[PATH_MAX];
> 
> +/* ID of vhost user port given to the PMD driver */
> +unsigned int vhost_pmd_id;
> +

This could be removed if you just use a counter as per comment above.

>  /* In dpdk_list. */
>  struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> 
> @@ -381,16 +398,20 @@ struct netdev_rxq_dpdk {
>  static bool dpdk_thread_is_pmd(void);
> 
>  static int netdev_dpdk_construct(struct netdev *);
> +static int netdev_dpdk_vhost_user_construct(struct netdev *);
> 
>  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk
> *dev);
> 
>  void link_status_changed_callback(uint8_t port_id,
>  enum rte_eth_event_type type OVS_U

Re: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Add vhost-user 'get_features' & 'get_status' functions

2016-05-17 Thread Traynor, Kevin
> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ciara Loftus
> Sent: Wednesday, May 11, 2016 4:31 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Add vhost-user
> 'get_features' & 'get_status' functions
> 
> Implementations for the netdev functions 'get_features' and
> 'get_status' are now available for vhost-user thanks to the addition of
> the vHost PMD.
> 
> Signed-off-by: Ciara Loftus 
> ---
>  lib/netdev-dpdk.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 814ef83..fce1655 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2301,15 +2301,18 @@ netdev_dpdk_get_status(const struct netdev
> *netdev, struct smap *args)
>  smap_add_format(args, "max_rx_queues", "%u",
> dev_info.max_rx_queues);
>  smap_add_format(args, "max_tx_queues", "%u",
> dev_info.max_tx_queues);
>  smap_add_format(args, "max_mac_addrs", "%u",
> dev_info.max_mac_addrs);
> -smap_add_format(args, "max_hash_mac_addrs", "%u",
> dev_info.max_hash_mac_addrs);
> -smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> -smap_add_format(args, "max_vmdq_pools", "%u",
> dev_info.max_vmdq_pools);
> 
> -if (dev_info.pci_dev) {
> -smap_add_format(args, "pci-vendor_id", "0x%u",
> -dev_info.pci_dev->id.vendor_id);
> -smap_add_format(args, "pci-device_id", "0x%x",
> -dev_info.pci_dev->id.device_id);
> +if (dev->type == DPDK_DEV_ETH) {
> +smap_add_format(args, "max_hash_mac_addrs", "%u",
> +dev_info.max_hash_mac_addrs);
> +smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> +smap_add_format(args, "max_vmdq_pools", "%u",
> dev_info.max_vmdq_pools);
> +if (dev_info.pci_dev) {
> +smap_add_format(args, "pci-vendor_id", "0x%u",
> +dev_info.pci_dev->id.vendor_id);
> +smap_add_format(args, "pci-device_id", "0x%x",
> +dev_info.pci_dev->id.device_id);
> +}
>  }
> 
>  return 0;
> @@ -3431,8 +3434,8 @@ static const struct netdev_class OVS_UNUSED
> dpdk_vhost_user_class =
>  netdev_dpdk_vhost_user_send,
>  netdev_dpdk_get_carrier,
>  netdev_dpdk_get_stats,
> -NULL,
> -NULL,
> +netdev_dpdk_get_features,
> +netdev_dpdk_get_status,

Maybe a comment for 1/3, but I just thought of it while reviewing this:
do you need to call check_link_status() in netdev_dpdk_get_status() now
that it's not on a timer anymore? Or is it guaranteed to be called for
all interfaces prior to netdev_dpdk_get_status()?


>  netdev_dpdk_vhost_user_rxq_recv);
> 
>  void
> --
> 2.4.3
> 


Re: [ovs-dev] [PATCH v2 1/2] netdev-dpdk: Fix coremask logic.

2016-05-18 Thread Traynor, Kevin
> -----Original Message-----
> From: Traynor, Kevin
> Sent: Tuesday, May 10, 2016 2:41 PM
> To: dev@openvswitch.org
> Cc: Traynor, Kevin ; Aaron Conole
> 
> Subject: [PATCH v2 1/2] netdev-dpdk: Fix coremask logic.
> 
> Only set the thread affinity back to the pre rte_eal_init() value
> when the user has not specified a coremask.

Hi Aaron - Do these patches look ok to you? I noticed while running
that the dpdk-lcore-mask was not having an effect due to a typo that
crept in late on. 

Thanks,
Kevin.
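
For readers following the one-character fix, the intended behaviour can be sketched as below. `restore_affinity_if_auto` is a hypothetical helper, not the code in `dpdk_init__()`; it assumes the caller saved the thread's CPU set before EAL init, as the `cpuset` variable in the patch context suggests.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: put the calling thread back on the CPU set it had
 * before EAL init, but only when the lcore mask was auto-determined.  If
 * the user supplied a mask, the affinity chosen by EAL is left untouched.
 * Returns 0 on success or when nothing needed doing, else an errno value. */
static int
restore_affinity_if_auto(bool auto_determine, const cpu_set_t *saved)
{
    assert(saved != NULL);
    if (!auto_determine) {
        return 0;
    }
    return pthread_setaffinity_np(pthread_self(), sizeof *saved, saved);
}
```

With the condition inverted, as it was before the fix, a user-supplied dpdk-lcore-mask would be overwritten by the saved affinity, which matches the symptom described above.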

> 
> Fixes: 88964e6428dc("netdev-dpdk: Autofill lcore coremask if
> absent")
> CC: Aaron Conole 
> Signed-off-by: Kevin Traynor 
> ---
>  lib/netdev-dpdk.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index af86d19..79fcd1a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3188,7 +3188,7 @@ dpdk_init__(const struct smap
> *ovs_other_config)
>  }
> 
>  /* Set the main thread affinity back to pre rte_eal_init()
> value */
> -if (!auto_determine) {
> +if (auto_determine) {
>  err = pthread_setaffinity_np(pthread_self(),
> sizeof(cpu_set_t),
>   &cpuset);
>  if (err) {
> --
> 1.7.4.1



Re: [ovs-dev] [PATCH v2 1/2] netdev-dpdk: Fix coremask logic.

2016-05-18 Thread Traynor, Kevin
> -----Original Message-----
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Wednesday, May 18, 2016 3:17 PM
> To: Traynor, Kevin 
> Cc: dev@openvswitch.org
> Subject: Re: [PATCH v2 1/2] netdev-dpdk: Fix coremask logic.
> 
> "Traynor, Kevin"  writes:
> 
> >> -----Original Message-----
> >> From: Traynor, Kevin
> >> Sent: Tuesday, May 10, 2016 2:41 PM
> >> To: dev@openvswitch.org
> >> Cc: Traynor, Kevin ; Aaron Conole
> >> 
> >> Subject: [PATCH v2 1/2] netdev-dpdk: Fix coremask logic.
> >>
> >> Only set the thread affinity back to the pre rte_eal_init() value
> >> when the user has not specified a coremask.
> >
> > Hi Aaron - Do these patches look ok to you? I noticed while running
> > that the dpdk-lcore-mask was not having an effect due to a typo that
> > crept in late on.
> 
> Whoops, sorry - thought I had responded, but looks like I didn't.
> 
> Yes, this and 2/2 look good to me.
> 
> They both have my Ack. Apologies for the time.

OK, great. Thanks for the review.

> 
> > Thanks,
> > Kevin.
> >
> >>
> >> Fixes: 88964e6428dc("netdev-dpdk: Autofill lcore coremask if
> >> absent")
> >> CC: Aaron Conole 
> >> Signed-off-by: Kevin Traynor 
> >> ---
> >>  lib/netdev-dpdk.c |2 +-
> >>  1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index af86d19..79fcd1a 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -3188,7 +3188,7 @@ dpdk_init__(const struct smap
> >> *ovs_other_config)
> >>  }
> >>
> >>  /* Set the main thread affinity back to pre rte_eal_init()
> >> value */
> >> -if (!auto_determine) {
> >> +if (auto_determine) {
> >>  err = pthread_setaffinity_np(pthread_self(),
> >> sizeof(cpu_set_t),
> >>   &cpuset);
> >>  if (err) {
> >> --
> >> 1.7.4.1


Re: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Add vhost-user 'get_features' & 'get_status' functions

2016-05-18 Thread Traynor, Kevin
> -----Original Message-----
> From: Loftus, Ciara
> Sent: Wednesday, May 18, 2016 3:56 PM
> To: Traynor, Kevin ; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Add vhost-user
> 'get_features' & 'get_status' functions
> 
> > >
> > > Implementations for the netdev functions 'get_features' and
> > > 'get_status' are now available for vhost-user thanks to the
> addition of
> > > the vHost PMD.
> > >
> > > Signed-off-by: Ciara Loftus 
> > > ---
> > >  lib/netdev-dpdk.c | 23 +--
> > >  1 file changed, 13 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index 814ef83..fce1655 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -2301,15 +2301,18 @@ netdev_dpdk_get_status(const struct netdev
> > > *netdev, struct smap *args)
> > >  smap_add_format(args, "max_rx_queues", "%u",
> > > dev_info.max_rx_queues);
> > >  smap_add_format(args, "max_tx_queues", "%u",
> > > dev_info.max_tx_queues);
> > >  smap_add_format(args, "max_mac_addrs", "%u",
> > > dev_info.max_mac_addrs);
> > > -smap_add_format(args, "max_hash_mac_addrs", "%u",
> > > dev_info.max_hash_mac_addrs);
> > > -smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> > > -smap_add_format(args, "max_vmdq_pools", "%u",
> > > dev_info.max_vmdq_pools);
> > >
> > > -if (dev_info.pci_dev) {
> > > -smap_add_format(args, "pci-vendor_id", "0x%u",
> > > -dev_info.pci_dev->id.vendor_id);
> > > -smap_add_format(args, "pci-device_id", "0x%x",
> > > -dev_info.pci_dev->id.device_id);
> > > +if (dev->type == DPDK_DEV_ETH) {
> > > +smap_add_format(args, "max_hash_mac_addrs", "%u",
> > > +dev_info.max_hash_mac_addrs);
> > > +smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> > > +smap_add_format(args, "max_vmdq_pools", "%u",
> > > dev_info.max_vmdq_pools);
> > > +if (dev_info.pci_dev) {
> > > +smap_add_format(args, "pci-vendor_id", "0x%u",
> > > +dev_info.pci_dev->id.vendor_id);
> > > +smap_add_format(args, "pci-device_id", "0x%x",
> > > +dev_info.pci_dev->id.device_id);
> > > +}
> > >  }
> > >
> > >  return 0;
> > > @@ -3431,8 +3434,8 @@ static const struct netdev_class OVS_UNUSED
> > > dpdk_vhost_user_class =
> > >  netdev_dpdk_vhost_user_send,
> > >  netdev_dpdk_get_carrier,
> > >  netdev_dpdk_get_stats,
> > > -NULL,
> > > -NULL,
> > > +netdev_dpdk_get_features,
> > > +netdev_dpdk_get_status,
> >
> > Maybe a comment for 1/3 but just thought of it while reviewing this:
> > do you need to call check_link_status() in netdev_dpdk_get_status() now
> > that it's not on a timer anymore? Or is it guaranteed to be called for
> > all interfaces prior to netdev_dpdk_get_status()?
> 
> Do you mean in netdev_dpdk_get_carrier() ?
> Perhaps the call isn't needed anymore... but there's maybe one corner
> case.
> Not sure if it's possible, but if the link-status-changed interrupt and
> netdev_dpdk_get_carrier() are both called, whoever takes the dpdk_mutex
> first will continue first. If netdev_dpdk_get_carrier() gets the mutex,
> it will only get up-to-date link info if check_link_status() is called.
> If the call is not there, get_carrier() will return the old link status,
> free the mutex, and the interrupt will continue on and update the status
> to the new one immediately after.
> So I think it should be kept, but I don't have strong feelings about
> it either way.

That's not what I was getting at. I was just wondering if the dev->link
status structure will always be populated before we report it out.

Now I see there is a call to rte_eth_link_get_nowait() in
dpdk_eth_dev_init() so it will be fine - thanks.
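
For anyone following along, the corner case Ciara describes above can be
sketched roughly as below. This is only an illustration, not the netdev-dpdk
code: demo_link_dev, demo_hw_link_up, demo_refresh_link() and
demo_get_carrier() are made-up names standing in for the netdev, the state
the NIC would report, check_link_status() and netdev_dpdk_get_carrier(), and
a plain pthread mutex stands in for the OVS mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a netdev: a cached link state plus its lock. */
struct demo_link_dev {
    pthread_mutex_t mutex;
    bool cached_link_up;     /* What get_carrier() reports. */
};

/* Hypothetical stand-in for the state the NIC would report right now. */
static bool demo_hw_link_up = true;

static void
demo_link_dev_init(struct demo_link_dev *dev, bool link_up)
{
    assert(dev != NULL);
    pthread_mutex_init(&dev->mutex, NULL);
    dev->cached_link_up = link_up;
}

/* Stand-in for check_link_status(): copy the current state into the cache.
 * The caller must hold dev->mutex. */
static void
demo_refresh_link(struct demo_link_dev *dev)
{
    dev->cached_link_up = demo_hw_link_up;
}

/* Stand-in for get_carrier().  With 'refresh' set, the cache is updated
 * under the lock before it is read, so the caller never sees a value older
 * than the moment it took the mutex.  Without it, the caller reports
 * whatever the last update left behind. */
static bool
demo_get_carrier(struct demo_link_dev *dev, bool refresh)
{
    bool up;

    assert(dev != NULL);
    pthread_mutex_lock(&dev->mutex);
    if (refresh) {
        demo_refresh_link(dev);
    }
    up = dev->cached_link_up;
    pthread_mutex_unlock(&dev->mutex);
    return up;
}
```

If the link goes down and get_carrier() wins the mutex before the callback
runs, the non-refreshing variant still reports "up" until the callback has
had its turn; the refreshing variant reports "down" straight away.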

> 
> Thanks,
> Ciara
> 
> >
> >
> > >  netdev_dpdk_vhost_user_rxq_recv);
> > >
> > >  void
> > > --
> > > 2.4.3
> > >
> > > ___
> > > dev mailing list
> > > dev@openvswitch.org
> > > http://openvswitch.org/mailman/listinfo/dev
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD

2016-05-18 Thread Traynor, Kevin

> -Original Message-
> From: Loftus, Ciara
> Sent: Wednesday, May 18, 2016 3:43 PM
> To: Traynor, Kevin ; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD
> 
> > >
> > > DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
> > > to be controlled by the librte_ether API, like physical 'dpdk' ports.
> > > The commit integrates this functionality into OVS, and refactors some
> > > of the existing vhost code such that it is vhost-cuse specific.
> > > Similarly, there is now some overlap between dpdk and vhost-user port
> > > code.
> > >
> > > Signed-off-by: Ciara Loftus 
> >
> > Hi, few minor comments below. I didn't review the cuse specific code this
> > time around.
> Thanks Kevin for the feedback, my responses are inline.
> 
> Ciara
> 
> >
> > Kevin.
> >
> >
> > > ---
> > >  INSTALL.DPDK.md   |  12 ++
> > >  NEWS  |   2 +
> > >  lib/netdev-dpdk.c | 628 +-----
> > >  3 files changed, 396 insertions(+), 246 deletions(-)
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > index 93f92e4..db7153a 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -990,6 +990,18 @@ Restrictions:
> > >  increased to the desired number of queues. Both DPDK and OVS
> > > must be
> > >  recompiled for this change to take effect.
> > >
> > > +  DPDK 'eth' type ports:
> > > +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context of
> > > +DPDK as they are all managed by the rte_ether API. This means that they
> > > +adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
> > > +default is set to 32. This means by default the combined total number of
> > > +dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. This
> > > +value can be changed if desired by modifying the configuration file in
> > > +DPDK, or by overriding the default value on the command line when building
> > > +DPDK. eg.
> > > +
> > > +`make install CONFIG_RTE_MAX_ETHPORTS=64`
> > > +
> > >  Bug Reporting:
> > >  --
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 4e81cad..841314b 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -32,6 +32,8 @@ Post-v2.5.0
> > >   * DB entries have been added for many of the DPDK EAL command line
> > > arguments. Additional arguments can be passed via the dpdk-extra
> > > entry.
> > > + * vHost PMD integration brings vhost-user ports under control of the
> > > +   rte_ether DPDK API.
> > > - ovs-benchmark: This utility has been removed due to lack of use and
> > >   bitrot.
> > > - ovs-appctl:
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index 89d783a..814ef83 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -55,6 +55,7 @@
> > >  #include "unixctl.h"
> > >
> > >  #include "rte_config.h"
> > > +#include "rte_eth_vhost.h"
> > >  #include "rte_mbuf.h"
> > >  #include "rte_meter.h"
> > >  #include "rte_virtio_net.h"
> > > @@ -139,6 +140,11 @@ static char *cuse_dev_name = NULL;/*
> > > Character device cuse_dev_name. */
> > >  #endif
> > >  static char *vhost_sock_dir = NULL;   /* Location of vhost-user
> > > sockets */
> > >
> > > +/* Array that tracks the used & unused vHost user driver IDs */
> > > +static unsigned int vhost_user_drv_ids[RTE_MAX_ETHPORTS];
> >
> > I think you can replace this array with a counter. You don't need a
> > unique id - just that you are < MAX.
> 
> I considered at first using a counter, but what if the counter reaches
> the MAX but we still have space for most vHost ports?
> eg. We add RTE_MAX_ETHPORTS vHost ports, delete all the ports, then
> try to add one again but can't because the counter is at max.
> Ev
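
The difference between the two approaches discussed above can be sketched as
follows. This is a minimal illustration under assumptions, not the code from
the patch: DEMO_MAX_PORTS stands in for RTE_MAX_ETHPORTS, and demo_id_used,
demo_id_alloc() and demo_id_free() are hypothetical names. A per-ID flag
lets a deleted port's ID be handed out again, which a counter that only
ever increments cannot do.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PORTS 4   /* Stand-in for RTE_MAX_ETHPORTS. */

/* One flag per driver ID: true while a port is using that ID. */
static bool demo_id_used[DEMO_MAX_PORTS];

/* Returns the lowest free ID and marks it used, or -1 if all are taken. */
static int
demo_id_alloc(void)
{
    for (int i = 0; i < DEMO_MAX_PORTS; i++) {
        if (!demo_id_used[i]) {
            demo_id_used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Releases an ID so that a later demo_id_alloc() can hand it out again. */
static void
demo_id_free(int id)
{
    assert(id >= 0 && id < DEMO_MAX_PORTS);
    demo_id_used[id] = false;
}
```

A counter that is decremented on delete would also enforce the limit, but it
would not say which ID is free, so it only works if the IDs themselves do
not need to be unique.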

Re: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk watchdog thread

2016-05-20 Thread Traynor, Kevin
[cross-posting to dpdk mailing list]

> -Original Message-
> From: Torgny Lindberg [mailto:torgny.lindb...@ericsson.com]
> Sent: Thursday, May 19, 2016 8:26 AM
> To: Traynor, Kevin ; Loftus, Ciara
> ; dev@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk
> watchdog thread
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Traynor,
> > Kevin
> > Sent: den 13 maj 2016 12:47
> > To: Loftus, Ciara; dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk watchdog
> > thread
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ciara
> > > Loftus
> > > Sent: Wednesday, May 11, 2016 4:31 PM
> > > To: dev@openvswitch.org
> > > Subject: [ovs-dev] [PATCH v3 1/3] netdev-dpdk: Remove dpdk watchdog
> > > thread
> > >
> > > Instead of continuously polling for link status changes on 'dpdk'
> > > ports, register a callback function that will be triggered when DPDK
> > > detects that the link status of that port has changed.
> >
> > rte_eth_link_get_nowait() returns void, so polling it in a thread won't
> > indicate some kind of error in dpdk. I can't see any benefit of the thread
> > - using the callback means one less thread and less locking.
> >
> > Acked-by: Kevin Traynor 
> >
> 
> 
> With this patch a 4s delay before detecting link-down would be
> introduced,
> which is from the viewpoint of many use cases an unacceptably long
> delay.
> I would like to suggest that the existing poll method is kept as it
> detects
> and acts on link failures much faster (millisecond time scale),
> alternatively that both poll and interrupt methods are supported
> and the one to use is selected by configuration.
> 
> The delay occurs inside the dpdk driver.
> (See e.g. dpdk, ixgbe_ethdev.c, IXGBE_LINK_DOWN_CHECK_TIMEOUT)
> 

Hi Torgny,

Thanks for pointing that out, I hadn't realized the additional delay.
Do you think the default should be changed in DPDK? 

Kevin.

> 
> Best regards,
> Torgny Lindberg
> 
> 
> > >
> > > Signed-off-by: Ciara Loftus 
> > > Suggested-by: Kevin Traynor 
> > > ---
> > >  lib/netdev-dpdk.c | 55 ++-----
> > >  1 file changed, 30 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index af86d19..89d783a 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -62,8 +62,6 @@
> > >  VLOG_DEFINE_THIS_MODULE(dpdk);
> > >  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > >
> > > -#define DPDK_PORT_WATCHDOG_INTERVAL 5
> > > -
> > >  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> > >  #define OVS_VPORT_DPDK "ovs_dpdk"
> > >
> > > @@ -386,6 +384,9 @@ static int netdev_dpdk_construct(struct netdev *);
> > >
> > >  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
> > >
> > > +void link_status_changed_callback(uint8_t port_id,
> > > +enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> > > +
> > >  static bool
> > >  is_dpdk_class(const struct netdev_class *class)
> > >  {
> > > @@ -536,27 +537,6 @@ check_link_status(struct netdev_dpdk *dev)
> > >  }
> > >  }
> > >
> > > -static void *
> > > -dpdk_watchdog(void *dummy OVS_UNUSED)
> > > -{
> > > -struct netdev_dpdk *dev;
> > > -
> > > -pthread_detach(pthread_self());
> > > -
> > > -for (;;) {
> > > -ovs_mutex_lock(&dpdk_mutex);
> > > -LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> > > -ovs_mutex_lock(&dev->mutex);
> > > -check_link_status(dev);
> > > -ovs_mutex_unlock(&dev->mutex);
> > > -}
> > > -ovs_mutex_unlock(&dpdk_mutex);
> > > -xsleep(DPDK_PORT_WATCHDOG_INTERVAL);
> > > -}
> > > -
> > > -return NULL;
> > > -}
> > > -
> > >  static int
> > >  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
> > > n_txq)
> > >  {
> > > @@ -717,6 +697,27 @@ netdev

Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

2016-05-23 Thread Traynor, Kevin
> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, May 17, 2016 4:09 PM
> To: dev@openvswitch.org; Daniele Di Proietto 
> Cc: Dyasly Sergey ; Heetae Ahn
> ; Flavio Leitner ;
> Traynor, Kevin ; Pravin B Shelar
> ; Ilya Maximets 
> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
> __netdev_dpdk_vhost_send().
> 
> There are situations when a PMD thread can hang forever inside
> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
> 
> This happens if rte_vring_available_entries() is always positive and
> rte_vhost_enqueue_burst() can't send anything (possible with a broken
> ring).
> 
> In this case time expiration will never be checked and the 'do {} while
> (cnt)' loop will be infinite.
> 
> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside the
> guest VM.
> Also it may be reproduced by manually breaking the ring structure inside
> the guest VM.
> 
> Fix that by checking time expiration even if we have available
> entries.

Hi Ilya,

Thanks for catching this. This intersects with something else I've seen
wrt retry code and there's a few options...

1. Remove retries when nothing is sent. For the VM that needs retries it is a
good thing to have, but Bhanu and I saw in a recent test with multiple VMs
that if one VM causes a lot of retries there is a large performance degradation
for the other VMs. So I changed the retry to only occur when at least one
packet has been sent on the previous call. I put a patch up here:
http://openvswitch.org/pipermail/dev/2016-May/071517.html

If we keep retries we can either

2. Make the coordination between rte_vring_available_entries() and
rte_vhost_enqueue_burst() more robust, as per your patch.

3. As you've shown that we can't rely on rte_vring_available_entries() to
know we can enqueue, how about just removing it and using
rte_vhost_enqueue_burst() directly in the retry loop?

My preference would be for 1. because on balance I'd rather one VM did not
degrade the performance of others than have retries. Of course there could
be some compromise between them as well, i.e. reduce the number of retries,
but any retries could affect performance for another path if they are using
the same core.

What do you think?

Kevin. 

> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> How to reproduce manually:
> 
> * Start packet flow.
> 
> * Start testpmd inside guest VM under gdb:
>   # gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w
> :00:0a.0 \\
> -- --burst=64 --txd=512 --rxd=512
> 
> * Break virtqueue ring somehow:
>   ^C
>   # (gdb) break virtqueue_enqueue_xmit
>   # (gdb) p txvq->vq_ring.avail->idx
>   $1 = 30784
>   # (gdb) p &txvq->vq_ring.avail->idx
>   $2 = (uint16_t *) 0x7fff7ea9b002
>   # (gdb) set {uint16_t}0x7fff7ea9b002 = 0
>   # (gdb) disable 1
>   # (gdb) c
>   Continuing.
> 
> * Hard-stop testpmd:
>   ^C
>   # (gdb) quit
>   A debugging session is active.
>   Quit anyway? (y or n) y
> 
> * Start testpmd inside VM again and see results:
>   # gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w
> :00:0a.0 \\
> -- --burst=64 --txd=512 --rxd=512
>   EAL: Detected 5 lcore(s)
>   EAL: Probing VFIO support...
>   EAL: PCI device :00:0a.0 on NUMA socket -1
>   EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>   ***SSH hangs and QEMU crashes here***
> 
> * On the OVS side we can see hang of PMD thread:
>   |3|ovs_rcu(urcu2)|WARN|blocked 4005 ms waiting for pmd54 to
> quiesce
>   |4|ovs_rcu(urcu2)|WARN|blocked 8005 ms waiting for pmd54 to
> quiesce
>   |5|ovs_rcu(urcu2)|WARN|blocked 16004 ms waiting for pmd54 to
> quiesce
>   
> 
> 
> lib/netdev-dpdk.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 87879d5..aef6ea4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1367,24 +1367,24 @@ __netdev_dpdk_vhost_send(struct netdev
> *netdev, int qid,
>  cur_pkts = &cur_pkts[tx_pkts];
>  } else {
>  uint64_t timeout = VHOST_ENQ_RETRY_USECS *
> rte_get_timer_hz() / 1E6;
> -unsigned int expired = 0;
> +bool expired = false;
> 
> -if (!start) {
> +if (start) {
> +expired = (rte_get_timer_cycles() - start) > timeout;
> +} else {
>  start = rte_get_timer_cycles();
>  }
> -
>  /*
>   * Unable to enqueue packets to vho

Re: [ovs-dev] [dpdk-dev] Crashing OVS+DPDK at the host, from inside of a KVM Guest

2016-05-25 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Christian
> Ehrhardt
> Sent: Wednesday, May 25, 2016 7:06 AM
> To: Martinx - ジェームズ 
> Cc:  ; dev 
> Subject: Re: [dpdk-dev] Crashing OVS+DPDK at the host, from inside of
> a KVM Guest
> 
> Hi,
> ping ...
> 
> Later on I want to look at it again once we upgraded to more recent
> releases of the software components involved, but those have to be made
> ready to use first :-/
> 
> But the description is good and I wonder if anybody else could reproduce
> this and/or would have a hint on where this might come from or already
> existing related fixes.
> 
> I mean in general nothing should be able to crash the host right?

Hi, I don't know if they are related to the issue that is being seen,
but Yuanhan made some fixes in DPDK 16.04 regarding a malicious guest
affecting the host. rte_vhost_dequeue_burst() is showing in the stack
trace so it might be worth testing with the latest code to see if it's the
same issue and has been fixed.

Kevin.

> 
> 
> P.S. yeah two list cross posting, but it is yet unclear which it belongs to
> so I'll keep it
> 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> On Sun, May 15, 2016 at 7:08 AM, Martinx - ジェームズ
> 
> wrote:
> 
> > Guys,
> >
> >  If using OVS 2.5 with DPDK 2.2, on Ubuntu Xenial, it is possible to crash
> > the OVS running at the host, from inside of a KVM Guest.
> >
> >  Basically, what I'm trying to do, is to run OVS+DPDK at the host, and
> > also, inside of a KVM Guest, with multi-queue, but it doesn't work and
> > crashes.
> >
> >  Soon as you enable multi-queue at the guest, it crashes the OVS of the
> > host!
> >
> > OVS+DPDK segfault at the host, after running "ovs-vsctl set Open_vSwitch .
> > other_config:n-dpdk-rxqs=4" within a KVM Guest:
> >
> > https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1577088
> >
> > Thanks!
> > Thiago
> >


Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

2016-06-10 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ilya
> Maximets
> Sent: Thursday, June 2, 2016 6:24 AM
> To: Daniele Di Proietto ; Traynor, Kevin
> ; dev@openvswitch.org
> Cc: Dyasly Sergey ; Flavio Leitner
> 
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in
> __netdev_dpdk_vhost_send().
> 
> On 02.06.2016 04:32, Daniele Di Proietto wrote:
> >
> > On 25/05/2016 04:03, "Ilya Maximets"  wrote:
> >
> >> On 23.05.2016 17:55, Traynor, Kevin wrote:
> >>>> -Original Message-
> >>>> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> >>>> Sent: Tuesday, May 17, 2016 4:09 PM
> >>>> To: dev@openvswitch.org; Daniele Di Proietto
> 
> >>>> Cc: Dyasly Sergey ; Heetae Ahn
> >>>> ; Flavio Leitner ;
> >>>> Traynor, Kevin ; Pravin B Shelar
> >>>> ; Ilya Maximets 
> >>>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
> >>>> __netdev_dpdk_vhost_send().
> >>>>
> >>>> There are situations when a PMD thread can hang forever inside
> >>>> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
> >>>>
> >>>> This happens if rte_vring_available_entries() is always positive and
> >>>> rte_vhost_enqueue_burst() can't send anything (possible with a broken
> >>>> ring).
> >>>>
> >>>> In this case time expiration will never be checked and the 'do {}
> >>>> while (cnt)' loop will be infinite.
> >>>>
> >>>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside
> >>>> the guest VM.
> >>>> Also it may be reproduced by manually breaking the ring structure
> >>>> inside the guest VM.
> >>>>
> >>>> Fix that by checking time expiration even if we have available
> >>>> entries.
> >>>
> >>> Hi Ilya,
> >>
> >> Hi, Kevin.
> >>
> >> Christian and Thiago CC-ed, because, I think, they're faced with a
> >> similar issue.
> >>
> >>>
> >>> Thanks for catching this. This intersects with something else I've
> >>> seen wrt retry code and there's a few options...
> >>>
> >>> 1. Remove retries when nothing is sent. For the VM that needs retries
> >>> it is a good thing to have, but Bhanu and I saw in a recent test with
> >>> multiple VMs that if one VM causes a lot of retries there is a large
> >>> performance degradation for the other VMs. So I changed the retry to
> >>> only occur when at least one packet has been sent on the previous
> >>> call. I put a patch up here:
> >>> http://openvswitch.org/pipermail/dev/2016-May/071517.html
> >>>
> >>> If we keep retries we can either
> >>>
> >>> 2. Make the coordination between rte_vring_available_entries() and
> >>> rte_vhost_enqueue_burst() more robust, as per your patch.
> >>>
> >>> 3. As you've shown that we can't rely on rte_vring_available_entries()
> >>> to know we can enqueue, how about just removing it and using
> >>> rte_vhost_enqueue_burst() directly in the retry loop?
> >>>
> >>> My preference would be for 1. because on balance I'd rather one VM
> >>> did not degrade the performance of others than have retries. Of
> >>> course there could be some compromise between them as well, i.e.
> >>> reduce the number of retries, but any retries could affect performance
> >>> for another path if they are using the same core.
> >>>
> >>> What do you think?
> >>
> >> I'm worried about scenarios with "pulsing" traffic, i.e. if we have a
> >> not very big but sufficient amount of packets to overload the vring in
> >> a short time and a long period of silence after that. HW can keep many
> >> more packets in its RX queues than can be pushed to the virtio ring.
> >> In this scenario, without retrying, most of the packets will be dropped.
> >>
> >> How about just decreasing VHOST_ENQ_RETRY_USECS to, maybe, 1 usec,
> >> with my fix applied of course? Such an interval should be enough to
> >> handle 20G traffic with 64B packets by one PMD thread. And, also, this
> >> timeout may be applied to both cases
> >> (something sent 
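
One possible way to read the 1 usec figure above is worked through below.
This is a back-of-the-envelope sketch under assumptions that are not stated
in the thread: a burst of 32 packets is handed to the ring per interval, and
each 64B frame costs a further 20 bytes on the wire (preamble, start-of-frame
delimiter and inter-frame gap). The demo_* names are made up for the example.

```c
#include <assert.h>

/* Bytes on the wire per frame beyond the frame itself: 7B preamble,
 * 1B start-of-frame delimiter and 12B inter-frame gap. */
#define DEMO_ETH_OVERHEAD_BYTES 20

/* Packets per second at line rate for a given link speed and frame size. */
static double
demo_line_rate_pps(double link_bps, unsigned int frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((frame_bytes + DEMO_ETH_OVERHEAD_BYTES) * 8.0);
}

/* Packets per second drained if one burst of 'burst' packets is enqueued
 * every 'interval_usec' microseconds. */
static double
demo_burst_rate_pps(unsigned int burst, double interval_usec)
{
    assert(interval_usec > 0);
    return burst * 1e6 / interval_usec;
}
```

Under those assumptions, 20 Gbit/s of 64B frames is 20e9 / (84 * 8), or about
29.76 Mpps, while 32 packets every microsecond is 32 Mpps, so a 1 usec
interval would keep up with the link.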

Re: [ovs-dev] [PATCH] netdev-dpdk: Fix crash when changing the vhost-user port.

2016-03-22 Thread Traynor, Kevin
> -Original Message-
> From: Ilya Maximets [mailto:i.maxim...@samsung.com]
> Sent: Tuesday, March 22, 2016 12:42 PM
> To: dev@openvswitch.org; Daniele Di Proietto 
> Cc: Dyasly Sergey ; Ben Pfaff ; Flavio
> Leitner ; Traynor, Kevin ; Ilya
> Maximets 
> Subject: [PATCH] netdev-dpdk: Fix crash when changing the vhost-user port.
> 
> According to netdev-provider API:
>   'The "destruct" function is not allowed to fail.'
> 
> netdev-dpdk breaks this restriction for vhost-user ports.
> This leads to SIGABRT or SIGSEGV in the dpdk_watchdog thread
> because 'dealloc' will be called anyway, regardless of
> the result of 'destruct'.
> 
> For example, if we call
>   # ovs-vsctl set interface vhost1 ofport_request=5
> while QEMU is still attached, we'll get:
> --[cut]--
> |dpdk|ERR|Can not remove port, vhost device still attached
> VHOST_CONFIG: socket created, fd:98
> VHOST_CONFIG: fail to bind fd:98, remove file:/home/vhost1 and try again.
> |dpdk|ERR|vhost-user socket device setup failure for socket /home/vhost1
> |bridge|WARN|could not open network device vhost1 (Unknown error -1)
> ovs-vswitchd(dpdk_watchdog1): lib/netdev-dpdk.c:532: ovs_mutex_lock_at()
> passed uninitialized ovs_mutex
> 
> Program received signal SIGABRT, Aborted.
> --[cut]--
> 
> Fix that by removing the port anyway, even when a guest is still
> attached. The guest becomes an orphan in that case but OVS
> will not crash and will continue forwarding for other ports.
> A VM restart is required to restore connectivity.

The issue in destruct was reported (without a crash) by Jan also. 
http://openvswitch.org/pipermail/discuss/2016-February/020271.html
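
To spell out why an early return from 'destruct' is dangerous, here is a
small sketch of the lifecycle. It is an illustration only, not the
netdev-dpdk code: demo_vhost_dev, demo_list and the demo_* functions are
hypothetical stand-ins for the netdev, the list that a background thread
walks, and the provider callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device record kept on a global singly linked list that a
 * background thread would walk. */
struct demo_vhost_dev {
    struct demo_vhost_dev *next;
    bool guest_attached;
};

static struct demo_vhost_dev *demo_list;
static unsigned int demo_orphaned_guests;

static struct demo_vhost_dev *
demo_alloc_and_construct(bool guest_attached)
{
    struct demo_vhost_dev *dev = calloc(1, sizeof *dev);

    assert(dev != NULL);
    dev->guest_attached = guest_attached;
    dev->next = demo_list;
    demo_list = dev;
    return dev;
}

/* "destruct" must not fail: an attached guest is only noted, and the
 * device is always unlinked, because "dealloc" runs next no matter what. */
static void
demo_destruct(struct demo_vhost_dev *dev)
{
    struct demo_vhost_dev **p;

    if (dev->guest_attached) {
        demo_orphaned_guests++;     /* Warn, but carry on. */
    }
    for (p = &demo_list; *p != NULL; p = &(*p)->next) {
        if (*p == dev) {
            *p = dev->next;
            break;
        }
    }
}

static void
demo_dealloc(struct demo_vhost_dev *dev)
{
    free(dev);
}

/* Returns true if 'dev' is still reachable from the global list. */
static bool
demo_on_list(const struct demo_vhost_dev *dev)
{
    const struct demo_vhost_dev *p;

    for (p = demo_list; p != NULL; p = p->next) {
        if (p == dev) {
            return true;
        }
    }
    return false;
}
```

If demo_destruct() returned before unlinking, demo_dealloc() would free a
device that is still on the list, and the next walk of the list would touch
freed memory, which matches the kind of crash shown in the log above.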

I wanted to try and fold in re-add without restarting the guest along
with this. I got it working for the most part, but without vhost client mode
support in DPDK it's hard to cover all possibilities and not at least
leak the sockets. I think a re-add without restarting the guest is
something to revisit when vhost pmd and vhost client mode are available
in DPDK.

Acked-by: Kevin Traynor 

> 
> Fixes: 58397e6c1e6c ("netdev-dpdk: add dpdk vhost-cuse ports")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-dpdk.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6ac0eec..f4ed210 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -869,10 +869,13 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev_)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev_);
> 
> -/* Can't remove a port while a guest is attached to it. */
> +/* Guest becomes an orphan if still attached. */
>  if (netdev_dpdk_get_virtio(dev) != NULL) {
> -VLOG_ERR("Can not remove port, vhost device still attached");
> -return;
> +VLOG_ERR("Removing port '%s' while vhost device still attached.",
> + netdev_->name);
> +VLOG_ERR("To restore connectivity after re-adding of port, VM on
> socket"
> + " '%s' must be restarted.",
> + dev->vhost_id);
>  }
> 
>  if (rte_vhost_driver_unregister(dev->vhost_id)) {
> --
> 2.5.0



Re: [ovs-dev] [dpdk-ovs] OVS 2.5.0 is incompatible with latest dpdk-16.04

2016-04-12 Thread Traynor, Kevin
Hi Alex,

> -Original Message-
> From: Wang, Alex
> Sent: Tuesday, April 12, 2016 6:50 AM
> To: Traynor, Kevin ; Xu, Qian Q
> 
> Cc: dev@openvswitch.org; Zheng, HaiyanX ; Qiu,
> Michael ; Qian, Xiaobing 
> Subject: [ovs-dev] [dpdk-ovs] OVS 2.5.0 is incompatible with latest dpdk-16.04
> 
> Hi Kevin,
> 
> Latest official DPDK-16.04 has been released and we're trying to use
> OVS (v2.5.0) with this version of DPDK. However, current OVS does not
> support DPDK 16.04 and we hit an incompatibility which caused OVS to fail
> to build with DPDK 16.04.

You are in a hurry - it was only released 10 hours ago :)

> 
> According to the failure message, corresponding code updates are also needed
> in OVS to support DPDK16.04. I would like to know do we have any plan to
> implement the fix/patch to support DPDK16.04?

Yes, Michal has a patch almost ready to submit. I think it just needs a final
test run now that DPDK 16.04 is released.

The patch will be for OVS master, not for the OVS 2.5 branch. We haven't
typically backported support for newer versions of DPDK, although I suspect
it would apply if you wanted to use it with OVS 2.5.

Kevin.

> 
> lib/netdev-dpdk.c: In function 'netdev_dpdk_get_features':
> lib/netdev-dpdk.c:1569:29: error: 'ETH_LINK_AUTONEG_DUPLEX' undeclared (first
> use in this function)
>  if (link.link_duplex == ETH_LINK_AUTONEG_DUPLEX) {
>      ^
> lib/netdev-dpdk.c:1569:29: note: each undeclared identifier is reported only
> once for each function it appears in
> lib/netdev-dpdk.c:1574:32: error: 'ETH_LINK_SPEED_10' undeclared (first use
> in this function)
>  if (link.link_speed == ETH_LINK_SPEED_10) {
>     ^
> lib/netdev-dpdk.c:1577:32: error: 'ETH_LINK_SPEED_100' undeclared (first use
> in this function)
>  if (link.link_speed == ETH_LINK_SPEED_100) {
>     ^
> lib/netdev-dpdk.c:1580:32: error: 'ETH_LINK_SPEED_1000' undeclared (first use
> in this function)
>  if (link.link_speed == ETH_LINK_SPEED_1000) {
>     ^
> lib/netdev-dpdk.c:1593:32: error: 'ETH_LINK_SPEED_1' undeclared (first
> use in this function)
>  if (link.link_speed == ETH_LINK_SPEED_1) {
>     ^
> Makefile:4026: recipe for target 'lib/netdev-dpdk.lo' failed
> make[2]: *** [lib/netdev-dpdk.lo] Error 1
> make[2]: Leaving directory '/home/openvswitch-2.5.0'
> Makefile:4678: recipe for target 'all-recursive' failed
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory '/home/openvswitch-2.5.0'
> Makefile:2619: recipe for target 'all' failed
> make: *** [all] Error 2
> [root@localhost openvswitch-2.5.0]#
> 
> Our test configurations:
> 
> OS kernel: Fedora 23 + Kernel 4.4.6
> Motherboard: Intel WildCat Pass
> BIOS: D0016
> OVS version: 2.5.0
> DPDK version: 16.04
> 
> Best Regards,
> Alex



Re: [ovs-dev] [PATCH v11 3/8] netdev-dpdk: Convert initialization from cmdline to db

2016-04-12 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Aaron Conole
> Sent: Friday, April 1, 2016 4:32 PM
> To: dev@openvswitch.org; Flavio Leitner ; Traynor, Kevin
> ; Panu Matilainen ;
> Wojciechowicz, RobertX ; Mooney, Sean K
> ; Andy Zhou ; Daniele Di Proietto
> ; Zoltan Kiss ; Christian
> Ehrhardt ; Ben Pfaff 
> Subject: [ovs-dev] [PATCH v11 3/8] netdev-dpdk: Convert initialization from
> cmdline to db
> 
> Existing DPDK integration is provided by use of command line options which
> must be split out and passed to librte in a special manner. However, this
> forces any configuration to be passed by way of a special DPDK flag, and
> interferes with ovs+dpdk packaging solutions.
> 
> This commit delays dpdk initialization until after the OVS database
> connection is established, at which point ovs initializes librte. It
> pulls all of the config data from the OVS database, and assembles a
> new argv/argc pair to be passed along.
> 
> Signed-off-by: Aaron Conole 
> ---

Hi Aaron,

There's a few hunks in this patch that don't apply cleanly to master
anymore. It conflicts with the cleanup in d46285.


8<-8<-

> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 7d6976f..c350247 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -171,11 +171,63 @@
>  
>
> 
> +   +  type='{"type": "integer", "minInteger": 1}'>
> +
> +  Specifies the maximum number of rx queues to be created for each dpdk
> +  interface.  If not specified or specified to 0, one rx queue will
> +  be created for each dpdk interface by default.
> +
> +  

This was removed from master - looks like it snuck back in through a rebase.

I'll give it another review and quick test when it's rebased, but with the
issues above addressed consider it to be acked.

Acked-by: Kevin Traynor 



Re: [ovs-dev] [PATCH] Update relevant artifacts to add support for DPDK 16.04.

2016-04-13 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, April 13, 2016 8:50 AM
> To: Weglicki, MichalX ; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] Update relevant artifacts to add support for
> DPDK 16.04.
> 

[snip]

> As an aside, I've been thinking maybe this is a case where OVS could
> support both DPDK 2.2 and 16.04. I know it's unprecedented but maybe that
> could change, restricting OVS to just one DPDK version seems
> unnecessarily strict when talking about differences this trivial.

Judging by the ML, it's more commonly requested to use the current release
of DPDK with the last release of OVS, e.g. OVS 2.5 and DPDK 16.04, than
OVS master with DPDK X-1.

Even for a trivial case like above, it would be ok now to add support for
DPDK X-1, but if we then add OVS code to take advantage of new DPDK X features
(e.g. vhost pmd) we'll end up with messy code. Also testing efforts would
increase (double?), so I don't think we would get it without a cost.
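
To make the trade-off concrete, a compatibility layer for a pure rename
might look something like the sketch below. It is an illustration under
assumptions, not OVS or DPDK code: DEMO_NEW_API is a hypothetical build
switch, and all of the DEMO_* constants are made-up stand-ins for whatever
each library release would export.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch: 1 = "new" library release, 0 = "old" one. */
#ifndef DEMO_NEW_API
#define DEMO_NEW_API 1
#endif

/* Stand-ins for the constants each release would export.  These names are
 * made up for the sketch; they are not DPDK identifiers. */
#if DEMO_NEW_API
#define DEMO_LIB_SPEED_NUM_10M   10
#define DEMO_LIB_SPEED_NUM_100M  100
#define DEMO_LIB_SPEED_NUM_1G    1000
#else
#define DEMO_LIB_LINK_SPEED_10   10
#define DEMO_LIB_LINK_SPEED_100  100
#define DEMO_LIB_LINK_SPEED_1000 1000
#endif

/* Compatibility layer: everything below uses only DEMO_SPEED_*. */
#if DEMO_NEW_API
#define DEMO_SPEED_10M   DEMO_LIB_SPEED_NUM_10M
#define DEMO_SPEED_100M  DEMO_LIB_SPEED_NUM_100M
#define DEMO_SPEED_1G    DEMO_LIB_SPEED_NUM_1G
#else
#define DEMO_SPEED_10M   DEMO_LIB_LINK_SPEED_10
#define DEMO_SPEED_100M  DEMO_LIB_LINK_SPEED_100
#define DEMO_SPEED_1G    DEMO_LIB_LINK_SPEED_1000
#endif

enum demo_feature {
    DEMO_F_NONE,
    DEMO_F_10MB,
    DEMO_F_100MB,
    DEMO_F_1GB,
    DEMO_F_COUNT
};

/* Maps a numeric link speed (Mbps) to a feature value. */
static enum demo_feature
demo_speed_to_feature(uint32_t link_speed)
{
    switch (link_speed) {
    case DEMO_SPEED_10M:
        return DEMO_F_10MB;
    case DEMO_SPEED_100M:
        return DEMO_F_100MB;
    case DEMO_SPEED_1G:
        return DEMO_F_1GB;
    default:
        return DEMO_F_NONE;
    }
}

/* Returns a printable name for a feature value. */
static const char *
demo_feature_name(enum demo_feature f)
{
    static const char *const names[DEMO_F_COUNT] = {
        "none", "10MB", "100MB", "1GB"
    };

    assert(f < DEMO_F_COUNT);
    return names[f];
}
```

A rename like this stays contained in one block of macros. The cost shows up
once code depends on a feature that only exists in the newer release: every
such call site then needs its own #if, and both build variants need testing.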

Kevin.

> 
>   - Panu -


Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.

2015-06-24 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Panu Matilainen
> Sent: Wednesday, June 24, 2015 9:33 AM
> To: Pravin Shelar; Jesse Gross
> Cc: dev@openvswitch.org; Flavio Leitner
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> 
> On 06/24/2015 05:06 AM, Pravin Shelar wrote:
> > On Tue, Jun 23, 2015 at 2:51 PM, Jesse Gross  wrote:
> >> On Mon, Jun 22, 2015 at 8:08 PM, Pravin Shelar  wrote:
> >>> On Fri, Jun 19, 2015 at 11:24 AM, Daniele Di Proietto
> >>>  wrote:
> >>>>
> >>>>
> >>>> On 18/06/2015 23:57, "Traynor, Kevin"  wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>> -Original Message-
> >>>>>> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> >>>>>> Proietto
> >>>>>> Sent: Tuesday, June 16, 2015 7:39 PM
> >>>>>> To: dev@openvswitch.org
> >>>>>> Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> >>>>>>
> >>>>>> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
> >>>>>> set in 'ol_flags'.  Otherwise the hash is garbage and doesn't
> >>>>>> relate to the packet.
> >>>>>>
> >>>>>> This fixes an issue with vhost, which, being a virtual NIC, doesn't
> >>>>>> compute the hash.
> >>>>>>
> >>>>>> Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, forcing
> >>>>>> OVS to compute a hash in software.  This has a significant impact on
> >>>>>> performance (-30% throughput in a single flow setup) which can be
> >>>>>> mitigated if the CPU supports crc32c instructions.
> >>>>>
> >>>>>
> >>>>>
> >>>>> As per the other thread on this I'm a bit concerned about the
> performance
> >>>>>
> >>>>> drop from this patch, so I did some testing of this and alternative/
> >>>>>
> >>>>> complimentary solutions.
> >>>>>
> >>>>>
> >>>>>
> >>>>> Here's the options I looked at and some comments:
> >>>>>
> >>>>> 1. This patch in isolation: vhost drops about ~15% vhost-vhost and
> >>>>>
> >>>>> phy-vhost-phy (because of sw hash) but also there is drops of ~25% for
> >>>>>
> >>>>> phy-phy and ~15% drop for phy-ivshmem-phy.
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2. Leave the code as is and let EMC misses happen for vhost rx pkts:
> >>>>>
> >>>>> I measure this at ~35% drop if missed *everytime* for vhost-vhost. We
> >>>>>
> >>>>> see in testing that it can also never happen, but this is not
> realistic.
> >>>>>
> >>>>> There should be no impact to other DPDK interfaces.
> >>>>>
> >>>>>
> >>>>>
> >>>>> 3. Add hash reset for packets from vhost: This is another way of
> forcing
> >>>>>
> >>>>> the software hash for vhost rx and it is roughly equivalent in
> performance
> >>>>>
> >>>>> to 1. for vhost-vhost (~15% drop), while there is no significant drop
> >>>>>
> >>>>> for phy-vhost-phy. There should be no impact to other DPDK interfaces.
> >>>>>
> >>>>>
> >>>>>
> >>>>> 4. Apply this patch and turn off Rx Vectorisation. vhost-vhost will
> drop
> >>>>>
> >>>>> ~15% as per 1. and there should be nothing significant for phy-vhost-
> phy.
> >>>>>
> >>>>> We would lose the 10% gain that rx vectorisation gave us for phy-phy.
> >>>>>
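
The check described in the quoted patch can be sketched in plain C. This is only an illustration under stated assumptions, not the OVS or DPDK code: `struct pkt`, `PKT_FLAG_RSS_HASH`, `sw_hash()` and `pkt_get_hash()` are hypothetical stand-ins for the mbuf, the PKT_RX_RSS_HASH bit in 'ol_flags', and the software hash OVS falls back to. FNV-1a is used here purely to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PKT_RX_RSS_HASH bit in an mbuf's ol_flags. */
#define PKT_FLAG_RSS_HASH (UINT64_C(1) << 1)

/* Hypothetical, simplified packet descriptor (not struct rte_mbuf). */
struct pkt {
    uint64_t ol_flags;      /* Offload flags filled in by the driver. */
    uint32_t rss_hash;      /* Only meaningful if PKT_FLAG_RSS_HASH is set. */
    const uint8_t *data;    /* Start of the packet headers. */
    size_t len;             /* Number of header bytes to hash. */
};

/* Software fallback: 32-bit FNV-1a over the header bytes.  This is an
 * assumption for illustration; OVS uses its own flow hash. */
static uint32_t
sw_hash(const uint8_t *data, size_t len)
{
    uint32_t h = UINT32_C(2166136261);

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT32_C(16777619);
    }
    return h;
}

/* Trust the NIC-provided hash only when the driver marks it as valid;
 * otherwise (e.g. a virtual NIC such as vhost) hash in software. */
static uint32_t
pkt_get_hash(const struct pkt *p)
{
    if (p->ol_flags & PKT_FLAG_RSS_HASH) {
        return p->rss_hash;
    }
    return sw_hash(p->data, p->len);
}
```

The extra branch is cheap; the cost discussed in this thread comes from the software hash that has to run whenever the flag is missing.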

Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.

2015-06-26 Thread Traynor, Kevin

> -Original Message-
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Wednesday, June 24, 2015 5:00 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org; Flavio Leitner; Panu Matilainen; Jesse Gross; Pravin
> Shelar
> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> 
> 
> 
> On 24/06/2015 09:47, "Traynor, Kevin"  wrote:

..[snip]..

> >
> >I don't expect there will be a DPDK 2.0.1 release either. I'm optimistic
> >we
> >
> >can get a standalone patch to fix the issue in DPDK 2.1 which we will have
> >
> >at the end of July. We could then roll DPDK 2.1 support into OVS master
> >(and
> >
> >presumably OVS 2.5).
> >
> >
> >
> >The issue is fixed as part of the unified packet api changes but that
> >won't
> >
> >be available (by default) until DPDK 2.2, so obviously we would prefer
> >not to
> >
> >have to wait until then.
> >
> 
> I sent a patch to the list that implements the workaround.


Thanks Daniele. FYI - the unified/non-unified packet type patches to fix this
are on the dpdk-dev list:
http://dpdk.org/ml/archives/dev/2015-June/020247.html
http://dpdk.org/ml/archives/dev/2015-June/020112.html



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] vhost-user performance issue while using I350 NIC

2015-06-29 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of rajeev satya
> Sent: Monday, June 29, 2015 5:54 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] vhost-user performance issue while using I350 NIC
> 
> Hi All,
> While sending 1G bidirectional traffic of 64bytes size, I see a low
> performance for Phy-VM-Phy setup using OVS with DPDK. I assigned 1core to
> vswitchd process and also did performance tuning step for setting core
> affinity as mentioned in INSTALL.DPDK.md. But still I get around 1.1G
> throughput. When the same configuration is used for phy-phy I observe good
> performance. By increasing the cores to vswitchd also I could observe good
> performance. But I want to know if there is a possibility to get good
> performance by assigning 1core itself.
> 
> Following are the Platform and setup details:
> NOTE: Used latest ovs-master and DPDK2.0.0
> 1. Intel Xeon E5 2603 v3 (2 Sockets)
> 2. hugepagesz=1G, hugepages=8,isolcpus=1,2,3,4,5,6,7,8
> 3. Bound two I350 nics to igb_uio driver.
> 4. Ensured that the dpdk ports and the cores assigned to vswitchd are
> mapped to same socket.(included 'socket-mem 4096' also)
> 5. Brought up the OVS+DPDK as mentioned in INSTALL.DPDK.md for vhost-user
> implementation.
> 6. Brought up VM using qemu with 4 vcpus and ran DPDK l2fwd inside it.
> 7. Used DPDK Pktgen to pump 1G bidirectional traffic of 64bytes size.
> 
> I observe that, even though 1G bidirectional traffic is pumped, the rate is
> still 1100/1100. I am really not sure why each nic is not transmitting
> beyond 550Mbps. When I use the same configuration for Phy-Phy I see the
> rate as 2000/2000.
> 
> Can you please let me know If I should make any I350 NIC specific changes
> in the code?
> All the RX/TX descriptor values of my I350 nic are set to its default
> values in my linux host. Should I do any tuning in my nic to increase
> performance?
> 
> I'm new to Openvswitch and want to learn the internals. It would be really
> helpful if you could let me know packet path in the source code for the
> Phy-VM-Phy scenario, so I can get a clear understanding and work on
> improving performance.

Ballpark, the figures you have look correct given that the E5-2603 is a
1.6 GHz part and at present code path is CPU bound. As you mentioned you
can increase throughput by adding another pmd/core. We are looking into
optimizations on the code path for this in OVS and DPDK which should
increase performance on a single core over the next months.

If you wanted to try and tune the NIC rx/tx queue config you could
modify the code here
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L451-462

Other things you could try would be checking the core affinitization,
enabling hyper-threading and using 2 PMDs/logical cores (still 1 physical
core), and disabling mergeable buffers on the vhost interface.
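
To make the affinitization suggestion concrete: pmd-cpu-mask is a hex bitmask in which bit N selects core N. The helper below is a hypothetical sketch (the name `cores_to_mask` is not from OVS) that builds such a mask from a list of core IDs. Which logical cores are hyper-thread siblings depends on the machine, so the IDs in the note after the code are examples only.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a CPU bitmask from a list of core IDs (bit N set == core N).
 * IDs of 64 or above are ignored to keep the shift well defined. */
static uint64_t
cores_to_mask(const unsigned int *cores, size_t n)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        if (cores[i] < 64) {
            mask |= UINT64_C(1) << cores[i];
        }
    }
    return mask;
}
```

For cores 1 and 2 this gives 0x6, which would be set with "ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6".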

You can follow the packet code path from here
https://github.com/openvswitch/ovs/blob/master/lib/dpif-netdev.c#L2694

> 
> Thanks in advance.
> 
> Regards,
> Rajeev.


Re: [ovs-dev] segmentation fault in openvswitch

2015-06-30 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of
> ravali.bu...@wipro.com
> Sent: Tuesday, June 30, 2015 7:39 AM
> To: dev@openvswitch.org
> Subject: [ovs-dev] segmentation fault in openvswitch
> 
> Hi Team,
> From below link https://access.redhat.com/documentation/en-
> US/Red_Hat_Enterprise_Linux/6/html-
> single/Virtualization_Tuning_and_Optimization_Guide/index.html
> In section 4.5: Multi-queue virtio-net works well for incoming traffic, but
> can occasionally hurt performance for outgoing traffic. Enabling multi-queue
> virtio-net increases the total throughput, and in parallel increases CPU
> consumption.
> So for configuring multi-queue virtio-net for vhost-user I have applied the
> patch from https://lists.gnu.org/archive/html/qemu-devel/2015-
> 05/msg05657.html .
> 
> After applying the patch and running the QEMU command I am able to see the
> segmentation fault in openvswitch.
> 
> Can you just let us know how to overcome this.

Multiqueue vhost-user support is not added to DPDK or OVS yet. It should be
part of DPDK 2.2 and then we can add support to OVS.

> 
> Thanks & Regards,
> Ravali
> 


Re: [ovs-dev] [RFC PATCH 0/6] Increase miniflow's capacity.

2015-07-10 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jarno Rajahalme
> Sent: Thursday, July 9, 2015 6:16 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [RFC PATCH 0/6] Increase miniflow's capacity.
> 
> Up to now struct miniflow has been limited to 63 64-bit units.  This
> series increases this capacity to 128 64-bit units.  For presumed
> performance reasons the new miniflow uses one 64-bit map for tunnel
> metadata and another for the rest of the metadata and the fields
> extracted from packet headers.
> 
> Before making miniflow more complex, this series simplifies it a bit
> by always inlining the miniflow data and cleaning up the interface a
> bit.
> 
> All performance testing is yet to be done.  I would be thankful if
> someone verifies the performance impact on the DPDK datapath, if any.

Hi Jarno, I can run some tests on this early next week when I get access
to a board. 
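
For readers unfamiliar with the layout being discussed, here is a rough sketch of how a lookup could work with two 64-bit maps. It is an assumption-laden illustration, not the code from the series: `struct mini`, `popcount64()` and `mini_get_pkt()` are hypothetical names, and the ordering of the values (tunnel units first) is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-map compressed flow, loosely modelled on the cover
 * letter: one bit per 64-bit unit that is present, values stored densely
 * with all tunnel units first, then the remaining units. */
struct mini {
    uint64_t tnl_map;       /* Bits for tunnel metadata units. */
    uint64_t pkt_map;       /* Bits for all other units. */
    const uint64_t *values; /* popcount(tnl_map) + popcount(pkt_map) items. */
};

/* Portable population count (number of 1-bits). */
static unsigned int
popcount64(uint64_t x)
{
    unsigned int n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Return unit 'idx' of the non-tunnel part, or 0 if it is not present.
 * The caller guarantees idx < 64. */
static uint64_t
mini_get_pkt(const struct mini *m, unsigned int idx)
{
    uint64_t bit = UINT64_C(1) << idx;

    if (!(m->pkt_map & bit)) {
        return 0;
    }
    /* Skip every tunnel value, then the packet values below 'idx'. */
    return m->values[popcount64(m->tnl_map)
                     + popcount64(m->pkt_map & (bit - 1))];
}
```

Two maps give 128 addressable units, and a lookup in the non-tunnel part costs one extra popcount compared with a single map, which is the kind of overhead the performance tests are meant to catch.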

> 
> Jarno Rajahalme (6):
>   tests: Check for core files before exiting.
>   meta-flow: Add a missing break statement.
>   lib: Always inline miniflows.
>   match: Single malloc minimatch.
>   flow: Eliminate miniflow_clone() and minimask_clone().
>   flow: Split miniflow's map.
> 
>  lib/classifier-private.h |  122 
>  lib/classifier.c |  149 --
>  lib/dpif-netdev.c|  103 +-
>  lib/flow.c   |  482 ++--
> --
>  lib/flow.h   |  252 +---
>  lib/match.c  |   39 ++--
>  lib/match.h  |   12 +-
>  lib/meta-flow.c  |2 +-
>  lib/tnl-ports.c  |4 +-
>  ofproto/ofproto.c|   10 +-
>  tests/ofproto-macros.at  |6 +-
>  tests/test-classifier.c  |  130 +++--
>  12 files changed, 716 insertions(+), 595 deletions(-)
> 
> --
> 1.7.10.4
> 


Re: [ovs-dev] [RFC PATCH 0/6] Increase miniflow's capacity.

2015-07-15 Thread Traynor, Kevin
> From: Jarno Rajahalme [mailto:jrajaha...@nicira.com]
> Sent: Friday, July 10, 2015 6:38 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [RFC PATCH 0/6] Increase miniflow's capacity.
> 
> 
> On Jul 10, 2015, at 9:38 AM, Traynor, Kevin  wrote:
> 
> 
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jarno Rajahalme
> Sent: Thursday, July 9, 2015 6:16 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [RFC PATCH 0/6] Increase miniflow's capacity.
> 
> Up to now struct miniflow has been limited to 63 64-bit units.  This
> series increases this capacity to 128 64-bit units.  For presumed
> performance reasons the new miniflow uses one 64-bit map for tunnel
> metadata and another for the rest of the metadata and the fields
> extracted from packet headers.
> 
> Before making miniflow more complex, this series simplifies it a bit
> by always inlining the miniflow data and cleaning up the interface a
> bit.
> 
> All performance testing is yet to be done.  I would be thankful if
> someone verifies the performance impact on the DPDK datapath, if any.
> 
> Hi Jarno, I can run some tests on this early next week when I get access
> to a board.
> 
> 
> Kevin,
> 
> I just sent a v2 based on feedback from Daniele, so please test on that
> instead!
> 
>   Jarno

I've tested this multiple times for phy-phy with dpdk and a bi-directional
flow. It's showing an avg. of 16.39 mpps on head of master and 16.67 mpps with
your changes - so no need to be worried!

I ran a few tests on the other dpdk ports and there was a slight drop of 100K
pps for vhost, but I think this is just test variance.

> 
> 
> 
> 
> Jarno Rajahalme (6):
>  tests: Check for core files before exiting.
>  meta-flow: Add a missing break statement.
>  lib: Always inline miniflows.
>  match: Single malloc minimatch.
>  flow: Eliminate miniflow_clone() and minimask_clone().
>  flow: Split miniflow's map.
> 
> lib/classifier-private.h |  122 
> lib/classifier.c |  149 --
> lib/dpif-netdev.c|  103 +-
> lib/flow.c   |  482 ++--
> --
> lib/flow.h   |  252 +---
> lib/match.c  |   39 ++--
> lib/match.h  |   12 +-
> lib/meta-flow.c  |2 +-
> lib/tnl-ports.c  |4 +-
> ofproto/ofproto.c|   10 +-
> tests/ofproto-macros.at  |6 +-
> tests/test-classifier.c  |  130 +++--
> 12 files changed, 716 insertions(+), 595 deletions(-)
> 
> --
> 1.7.10.4
> 


Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.

2015-07-23 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, July 16, 2015 7:48 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we
> don't get any failure.
> 
> It has been observed that some DPDK devices (e.g. Intel XL710) report a
> high number of queues but make some of them available only for special
> functions (SRIOV).  Therefore the queues will be counted in
> rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail.
> 
> This commit works around the issue by retrying the device initialization
> with a smaller number of queues, if a queue fails to setup.
> 
> Reported-by: Ian Stokes 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 100 +++-
> --
>  1 file changed, 73 insertions(+), 27 deletions(-)


Acked-by: Kevin Traynor 
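
The workaround in the commit message can be illustrated with a small, self-contained model. This is a sketch only: `struct fake_dev`, `fake_queue_setup()` and `setup_queues_with_retry()` are hypothetical names, and the real patch works on the DPDK port and calls the actual queue setup functions.

```c
#include <stdbool.h>

/* Hypothetical device model: it advertises 'reported' queues, but only the
 * first 'usable' ones can actually be set up. */
struct fake_dev {
    int reported;
    int usable;
};

/* Stand-in for a per-queue setup call such as rte_eth_tx_queue_setup(). */
static bool
fake_queue_setup(const struct fake_dev *dev, int qid)
{
    return qid < dev->usable;
}

/* Try to set up 'n_q' queues.  If queue 'i' fails, restart the whole
 * initialization asking for only 'i' queues.  Returns the number of queues
 * that were set up, or 0 if not even one queue works. */
static int
setup_queues_with_retry(const struct fake_dev *dev, int n_q)
{
    while (n_q > 0) {
        int i;

        for (i = 0; i < n_q; i++) {
            if (!fake_queue_setup(dev, i)) {
                break;
            }
        }
        if (i == n_q) {
            return n_q;         /* Every requested queue came up. */
        }
        n_q = i;                /* Retry with the count that worked. */
    }
    return 0;
}
```

With a device that reports 8 queues but only allows 4, the first pass fails at queue 4 and the second pass succeeds with 4 queues.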


Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if initialization fails.

2015-07-23 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, July 16, 2015 7:48 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if
> initialization fails.
> 
> netdev_dpdk_set_multiq() should not set the number of configured rxq
> and txq if the driver initialization fails (meaning that the driver
> failed to setup the queues).  Otherwise, on a subsequent call to
> netdev_dpdk_set_multiq(), the code may believe that the queues have
> already been setup and there's no work to be done.
> 
> This commit fixes the problem by restoring the old values if
> dpdk_eth_dev_init() fails.
> 
> Reported-by: Ian Stokes 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 8b843db..5ae805e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -743,6 +743,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned
> int n_txq,
>  {
>  struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>  int err = 0;
> +int old_rxq, old_txq;
> 
>  if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) {
>  return err;
> @@ -753,12 +754,20 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned
> int n_txq,
> 
>  rte_eth_dev_stop(netdev->port_id);
> 
> +old_txq = netdev->up.n_txq;
> +old_rxq = netdev->up.n_rxq;
>  netdev->up.n_txq = n_txq;
>  netdev->up.n_rxq = n_rxq;
> 
>  rte_free(netdev->tx_q);
>  err = dpdk_eth_dev_init(netdev);
>  netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq);
> +if (err) {
> +/* If there has been an error, it means that the requested queues
> + * have not been created.  Restore the old numbers. */
> +netdev->up.n_txq = old_txq;
> +netdev->up.n_rxq = old_rxq;

I had thought that we should restore the previous netdev->tx_q, but at present
txq's are fixed, so I think it is fine. If txq's become configurable we can
change it.

It would be good to get these patches into the OVS 2.4 branch if there is
still time.

Acked-by: Kevin Traynor 

> +}
> 
>  netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;
> 
> --
> 2.1.4
> 
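
The save-and-restore pattern in the patch above, reduced to a standalone sketch. Everything here is hypothetical scaffolding for illustration (`struct fake_netdev`, `set_multiq()`, and the `init_ok`/`init_fail` hooks standing in for dpdk_eth_dev_init()); only the ordering of the steps follows the diff.

```c
/* Hypothetical, reduced device state (not struct netdev_dpdk). */
struct fake_netdev {
    int n_txq;
    int n_rxq;
};

/* Type of the (re)initialization hook; returns 0 on success or a positive
 * errno-style value on failure.  Stands in for dpdk_eth_dev_init(). */
typedef int (*init_fn)(struct fake_netdev *);

static int
set_multiq(struct fake_netdev *dev, int n_txq, int n_rxq, init_fn init)
{
    int old_txq, old_rxq, err;

    if (dev->n_txq == n_txq && dev->n_rxq == n_rxq) {
        return 0;               /* Nothing to do. */
    }

    old_txq = dev->n_txq;
    old_rxq = dev->n_rxq;
    dev->n_txq = n_txq;
    dev->n_rxq = n_rxq;

    err = init(dev);
    if (err) {
        /* Queues were not created: forget the requested numbers so a
         * later call with the same arguments is not skipped above. */
        dev->n_txq = old_txq;
        dev->n_rxq = old_rxq;
    }
    return err;
}

/* Two trivial hooks for experimenting with the function above. */
static int init_ok(struct fake_netdev *dev) { (void) dev; return 0; }
static int init_fail(struct fake_netdev *dev) { (void) dev; return 5; }
```

Without the restore step, a failed call would leave the requested counts in place and the early-return check would wrongly skip the next attempt with the same arguments.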


Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if initialization fails.

2015-07-24 Thread Traynor, Kevin
>>> size after 10.1 seconds
> >>> 2015-07-22T23:38:47.903Z|00017|memory|INFO|handlers:11 ports:1
> >>> revalidators:5 rules:5
> >>> 2015-07-22T23:38:51.118Z|00018|bridge|WARN|could not open network
> >> device
> >>> dpdk0 (No such device)
> >>>
> >>> I can reproduce it if there are any more questions.
> >>>
> >>> Cheers,
> >>>
> >>> Luis E. P.
> >>>
> >>>
> >>> On Jul 23, 2015, at 16:33, Luis E Pena
> >>> mailto:lp...@vmware.com>> wrote:
> >>>
> >>> Will do.
> >>>
> >>> Luis E. P.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Jul 23, 2015, at 16:31, Ethan Jackson
> >>> mailto:et...@nicira.com>> wrote:
> >>>
> >>> Would you please summarize the errors on list?  Daniele should
> >>> probably have a look at it since he wrote the patch originally.
> >>>
> >>> Ethan
> >>>
> >>> On Thu, Jul 23, 2015 at 4:27 PM, Luis E Pena
> >>> mailto:lp...@vmware.com>> wrote:
> >>> I ran into errors when using master when adding a dpdk port. I worked
> >>> with Pravin yesterday and we think that the cause is this patch.
> >>> When we tried branch-2.4, we ran into no errors.
> >>> Today I am working on confirming that this is the patch.
> >>>
> >>> Luis E. P.
> >>>
> >>> Sent from my iPhone
> >>>
> >>> On Jul 23, 2015, at 15:38, Ethan Jackson
> >>> mailto:et...@nicira.com>> wrote:
> >>>
> >>> Ben, Justin, should this be backported?  I'm not up on the policy at
> >> the
> >>> moment.
> >>>
> >>> Etha
> >>>
> >>> On Thu, Jul 23, 2015 at 3:42 AM, Traynor, Kevin
> >>> mailto:kevin.tray...@intel.com>> wrote:
> >>>
> >>> -Original Message-
> >>> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> >>> Proietto
> >>> Sent: Thursday, July 16, 2015 7:48 PM
> >>> To: dev@openvswitch.org<mailto:dev@openvswitch.org>
> >>> Subject: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if
> >>> initialization fails.
> >>>
> >>> netdev_dpdk_set_multiq() should not set the number of configured rxq
> >>> and txq if the driver initialization fails (meaning that the driver
> >>> failed to setup the queues).  Otherwise, on a subsequent call to
> >>> netdev_dpdk_set_multiq(), the code may believe that the queues have
> >>> already been setup and there's no work to be done.
> >>>
> >>> This commit fixes the problem by restoring the old values if
> >>> dpdk_eth_dev_init() fails.
> >>>
> >>> Reported-by: Ian Stokes
> >>> mailto:ian.sto...@intel.com>>
> >>> Signed-off-by: Daniele Di Proietto
> >>> mailto:diproiet...@vmware.com>>
> >>> ---
> >>> lib/netdev-dpdk.c | 9 +
> >>> 1 file changed, 9 insertions(+)
> >>>
> >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >>> index 8b843db..5ae805e 100644
> >>> --- a/lib/netdev-dpdk.c
> >>> +++ b/lib/netdev-dpdk.c
> >>> @@ -743,6 +743,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_,
> >>> unsigned
> >>> int n_txq,
> >>> {
> >>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> >>> int err = 0;
> >>> +int old_rxq, old_txq;
> >>>
> >>> if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) {
> >>> return err;
> >>> @@ -753,12 +754,20 @@ netdev_dpdk_set_multiq(struct netdev *netdev_,
> >>> unsigned
> >>> int n_txq,
> >>>
> >>> rte_eth_dev_stop(netdev->port_id);
> >>>
> >>> +old_txq = netdev->up.n_txq;
> >>> +old_rxq = netdev->up.n_rxq;
> >>> netdev->up.n_txq = n_txq;
> >>> netdev->up.n_rxq = n_rxq;
> >>>
> >>> rte_free(netdev->tx_q);
> >>> err = dpdk_eth_dev_init(netdev);
> >>> netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq);
> >>> +if (err) {
> >>> +/* If there has been an error, it means that the requested
> >> queues
> >>> +

Re: [ovs-dev] [dpdk-dev] Fwd: OVS with DPDK ..Error packets

2015-07-31 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Srikanth Akula
> Sent: Wednesday, July 29, 2015 10:32 PM
> To: d...@dpdk.org; dev@openvswitch.org
> Subject: [dpdk-dev] Fwd: OVS with DPDK ..Error packets
> 
> (+DPDK dev team )
> 
> 
> Hello ,
> 
> I am trying to test the OVS_DPDK performance and found that a lot of packets
> are being treated as error packets.
> 
> ovs-vsctl get Interface dpdk0 statistics
> {collisions=0, rx_bytes=38915076374, rx_crc_err=0, rx_dropped=0,
> *rx_errors=3840287219*, rx_frame_err=0, rx_over_err=0, rx_packets=292972799,
> tx_bytes=38935883904, tx_dropped=0, tx_errors=0, tx_packets=293068162}
> 
> I am running DPDK application inside my VM .
> 
> Looks like there is a buffer issue ( 64Bytes - 10Gbps)
> 
> Could  somebody let me know if i have missed any configuration in DPDK/OVS ?

Errors can show here when you are sending traffic in at a rate higher than can
be handled, so it can be normal to see that in some cases. 

First thing I would check is that you have your PMD(s) and qemu threads doing
the fwding core affinitised to different cores so they get the max amount of
cycles they can. 

I would also check a simple phy-phy test to eliminate any test equipment/NIC
setup issues.

> 
> -Srikanth


Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Translate Geneve options per-flow, not per-packet.

2015-08-04 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jesse Gross
> Sent: Thursday, July 30, 2015 4:10 AM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 2/2] dpif-netdev: Translate Geneve options per-
> flow, not per-packet.
> 
> The kernel implementation of Geneve options stores the TLV option
> data in the flow exactly as received, without any further parsing.
> This is then translated to known options for the purposes of matching
> on flow setup (which will then install a datapath flow in the form
> the kernel is expecting).
> 
> The userspace implementation behaves a little bit differently - it
> looks up known options as each packet is received. The reason for this
> is there is a much tighter coupling between datapath and flow translation
> and the representation is generally expected to be the same. This works
> but it incurs work on a per-packet basis that could be done per-flow
> instead.
> 
> This introduces a small translation step for Geneve packets between
> datapath and flow lookup for the userspace datapath in order to
> allow the same kind of processing that the kernel does.
> 
> There is a second benefit to this as well: for some operations it is
> preferable to keep the options exactly as they were received on the wire,
> which this enables. One example is that for packets that are executed from
> ofproto-dpif-upcall to the datapath, this avoids the translation of
> Geneve metadata. Since this conversion is potentially lossy (for unknown
> options), keeping everything in the same format removes the possibility
> of dropping options if the packet comes back up to userspace and the
> Geneve option translation table has changed. To help with these types of
> operations, most functions can understand both formats of data and
> seamlessly
> do the right thing.

I tested std bi-directional phy-phy flows with dpdk to see if this affected
performance for them and it looks to be fine - same performance with and
without this patch.

In general, my performance is down a few % from a couple of weeks ago but I
think it's something in my setup. 

> 
> Signed-off-by: Jesse Gross 
> ---
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev.c |  55 ++-
>  lib/flow.c|  48 --
>  lib/flow.h|  13 +-
>  lib/geneve.h  |  63 
>  lib/meta-flow.c   |   6 +-
>  lib/netdev-vport.c|  26 ++--
>  lib/odp-execute.c |   2 +-
>  lib/odp-util.c|  58 ---
>  lib/odp-util.h|  12 +-
>  lib/packets.h |  41 +
>  lib/tun-metadata.c| 352 ++-
> ---
>  lib/tun-metadata.h|  74 ++---
>  ofproto/ofproto-dpif-sflow.c  |   2 +-
>  ofproto/ofproto-dpif-upcall.c |   2 +-
>  tests/tunnel-push-pop.at  |   2 +-
>  16 files changed, 534 insertions(+), 223 deletions(-)
>  create mode 100644 lib/geneve.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index faca968..5b6e9e8 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -81,6 +81,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/fatal-signal.h \
>   lib/flow.c \
>   lib/flow.h \
> + lib/geneve.h \
>   lib/guarded-list.c \
>   lib/guarded-list.h \
>   lib/hash.c \
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f587df5..c31a7e0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1884,8 +1884,8 @@ dpif_netdev_mask_from_nlattrs(const struct nlattr
> *key, uint32_t key_len,
>  if (mask_key_len) {
>  enum odp_key_fitness fitness;
> 
> -fitness = odp_flow_key_to_mask(mask_key, mask_key_len, key,
> key_len,
> -   &wc->masks, flow);
> +fitness = odp_flow_key_to_mask_udpif(mask_key, mask_key_len, key,
> + key_len, &wc->masks, flow);
>  if (fitness) {
>  /* This should not happen: it indicates that
>   * odp_flow_key_from_mask() and odp_flow_key_to_mask()
> @@ -1919,7 +1919,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr
> *key, uint32_t key_len,
>  {
>  odp_port_t in_port;
> 
> -if (odp_flow_key_to_flow(key, key_len, flow)) {
> +if (odp_flow_key_to_flow_udpif(key, key_len, flow)) {
>  /* This should not happen: it indicates that
> odp_flow_key_from_flow()
>   * and odp_flow_key_to_flow() disagree on the acceptable form of a
>   * flow.  Log the problem as an error, with enough details to
> enable
> @@ -3014,11 +3014,25 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet *packet_,
>   struct ofpbuf *actions, struct ofpbuf *put_actions)
>  {
>  struct dp_netdev *dp = pmd->dp;
> +struct flow_tnl orig_tunnel;
> +int err;
> 
>  if (OVS_UNLIKELY(!dp->upcall_cb)) {
>  return ENODEV;
>  }
> 
> +orig_tunnel.

Re: [ovs-dev] [RFC] dpdk: support multiple queues in vhost

2015-08-11 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Flavio Leitner
> Sent: Friday, July 31, 2015 11:30 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner
> Subject: [ovs-dev] [RFC] dpdk: support multiple queues in vhost
> 
> This RFC is based on the vhost multiple queues work on
> dpdk-dev: http://dpdk.org/ml/archives/dev/2015-June/019345.html

Hi Flavio - the patch looks good, one minor comment below.

> 
> Signed-off-by: Flavio Leitner 
> ---
>  lib/netdev-dpdk.c | 61 -
> --
>  1 file changed, 40 insertions(+), 21 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 5ae805e..493172c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -215,12 +215,9 @@ struct netdev_dpdk {
>   * If the numbers match, 'txq_needs_locking' is false, otherwise it is
>   * true and we will take a spinlock on transmission */
>  int real_n_txq;
> +int real_n_rxq;
>  bool txq_needs_locking;
> 
> -/* Spinlock for vhost transmission.  Other DPDK devices use spinlocks in
> - * dpdk_tx_queue */
> -rte_spinlock_t vhost_tx_lock;
> -
>  /* virtio-net structure for vhost device */
>  OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> 
> @@ -602,13 +599,10 @@ dpdk_dev_parse_name(const char dev_name[], const char
> prefix[],
>  static int
>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>  {
> -struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> -
>  if (rte_eal_init_ret) {
>  return rte_eal_init_ret;
>  }
> 
> -rte_spinlock_init(&netdev->vhost_tx_lock);
>  return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
>  }
> 
> @@ -791,9 +785,16 @@ netdev_dpdk_vhost_set_multiq(struct netdev *netdev_,
> unsigned int n_txq,
>  ovs_mutex_lock(&dpdk_mutex);
>  ovs_mutex_lock(&netdev->mutex);
> 
> +rte_free(netdev->tx_q);
> +/* FIXME: the number of vqueues needs to match */
>  netdev->up.n_txq = n_txq;
> -netdev->real_n_txq = 1;
> -netdev->up.n_rxq = 1;
> +netdev->up.n_rxq = n_rxq;
> +
> +/* vring has txq = rxq */
> +netdev->real_n_txq = n_rxq;
> +netdev->real_n_rxq = n_rxq;
> +netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;
> +netdev_dpdk_alloc_txq(netdev, netdev->up.n_txq);
> 
>  ovs_mutex_unlock(&netdev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
> @@ -904,14 +905,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_,
>  struct netdev *netdev = rx->up.netdev;
>  struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
>  struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
> -int qid = 1;
> +int qid = rxq_->queue_id;
>  uint16_t nb_rx = 0;
> 
>  if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {
>  return EAGAIN;
>  }
> 
> -nb_rx = rte_vhost_dequeue_burst(virtio_dev, qid,
> +nb_rx = rte_vhost_dequeue_burst(virtio_dev, VIRTIO_TXQ + qid * 2,
>  vhost_dev->dpdk_mp->mp,
>  (struct rte_mbuf **)packets,
>  NETDEV_MAX_BURST);
> @@ -958,8 +959,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct
> dp_packet **packets,
>  }
> 
>  static void
> -__netdev_dpdk_vhost_send(struct netdev *netdev, struct dp_packet **pkts,
> - int cnt, bool may_steal)
> +__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> + struct dp_packet **pkts, int cnt,
> + bool may_steal)
>  {
>  struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
>  struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
> @@ -974,13 +976,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, struct
> dp_packet **pkts,
>  goto out;
>  }
> 
> -/* There is vHost TX single queue, So we need to lock it for TX. */
> -rte_spinlock_lock(&vhost_dev->vhost_tx_lock);
> +if (vhost_dev->txq_needs_locking) {
> +qid = qid % vhost_dev->real_n_txq;
> +rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> +}
> 
>  do {
> +int vhost_qid = VIRTIO_RXQ + qid * VIRTIO_QNUM;
>  unsigned int tx_pkts;
> 
> -tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
> +tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid,
>cur_pkts, cnt);
>  if (OVS_LIKELY(tx_pkts)) {
>  /* Packets have been sent.*/
> @@ -999,7 +1004,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, struct
> dp_packet **pkts,
>   * Unable to enqueue packets to vhost interface.
>   * Check available entries before retrying.
>   */
> -while (!rte_vring_available_entries(virtio_dev, VIRTIO_RXQ)) {
> +while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
>  if (OVS_UNLIKEL
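
The queue index arithmetic used in the RFC above can be written out on its own. This is a sketch under assumptions: the enum values are thought to mirror the ones in DPDK's vhost header of that time, and the helper names (`vhost_tx_vring`, `vhost_rx_vring`, `fold_txq`) are hypothetical.

```c
/* Virtqueue roles within one queue pair, named from the guest's point of
 * view.  The values are assumed to match DPDK's vhost library. */
enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1, VIRTIO_QNUM = 2 };

/* Host transmit: enqueue into the guest's RX ring of pair 'qid'. */
static int
vhost_tx_vring(int qid)
{
    return VIRTIO_RXQ + qid * VIRTIO_QNUM;
}

/* Host receive: dequeue from the guest's TX ring of pair 'qid'. */
static int
vhost_rx_vring(int qid)
{
    return VIRTIO_TXQ + qid * VIRTIO_QNUM;
}

/* When there are more OVS tx queues than real ones, several senders share
 * a ring, so the queue id is folded with a modulo (and a lock is taken). */
static int
fold_txq(int qid, int real_n_txq)
{
    return qid % real_n_txq;
}
```

Queue pair 0 therefore uses rings 0 (host tx) and 1 (host rx), pair 1 uses rings 2 and 3, and so on.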

Re: [ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD

2015-08-24 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Zoltan Kiss
> Sent: Friday, August 21, 2015 7:05 PM
> To: d...@dpdk.org; dev@openvswitch.org
> Cc: Richardson, Bruce; Ananyev, Konstantin
> Subject: [ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD
> 
> Hi,
> 
> I've set up a simple packet forwarding perf test on a dual-port 10G
> 82599ES: one port receives 64 byte UDP packets, the other sends it out,
> one core used. I've used latest OVS with DPDK 2.1, and the first result
> was only 13.2 Mpps, which was a bit far from the 13.9 I've seen last
> year with the same test. The first thing I've changed was to revert back
> to the old behaviour about this issue:
> 
> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
> 
> So instead of the new default I've passed 2048 + RTE_PKTMBUF_HEADROOM.
> That increased the performance to 13.5, but to figure out what's wrong
> started to play with the receive functions. First I've disabled vector
> PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only 12.5 Mpps. So
> then I've enabled scattered RX, and with
> ixgbe_recv_pkts_lro_bulk_alloc() I could manage to get 13.98 Mpps, which
> is I guess as close as possible to the 14.2 line rate (on my HW at
> least, with one core)
> Does anyone have a good explanation about why the vector PMD performs so
> significantly worse? I would expect that on a 3.2 GHz i5-4570 one core
> should be able to reach ~14 Mpps, SG and vector PMD shouldn't make a
> difference.

I've previously turned vectorisation on/off and found that for tx it makes
a significant difference. For rx it didn't make much of a difference, but
the rx bulk allocation which gets enabled with it did improve performance.

Is there something else also running on the current pmd core? Did you
try moving it to another? Also, did you compile OVS with -O3/-Ofast? They
tend to give a performance boost.

Are you hitting 3.2 GHz for the core with the pmd? I think that is only
with turbo boost, so it may not be achievable all the time.
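
As a back-of-the-envelope check on why the clock speed matters, the per-packet cycle budget can be computed as below. This is a rough sketch with hypothetical helper names; the 20 bytes of per-frame wire overhead (preamble, start delimiter, inter-frame gap) is standard Ethernet, while the interpretation of the quoted 14.2 Mpps figure is my assumption.

```c
#include <stdint.h>

/* Per-frame overhead on the wire that is not part of the frame itself:
 * 7-byte preamble + 1-byte start delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20

/* Maximum frames per second for a link of 'link_bps' bits per second and
 * frames of 'frame_bytes' bytes (including the 4-byte CRC). */
static uint64_t
line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD) * 8);
}

/* CPU cycles available per packet for one core running at 'hz'. */
static uint64_t
cycles_per_packet(uint64_t hz, uint64_t pps)
{
    return pps ? hz / pps : 0;
}
```

For 10G this gives roughly 14.88 Mpps with 64-byte frames and roughly 14.2 Mpps with 68-byte frames, which would match the quoted line rate if the generator's 64 bytes exclude the CRC. At 3.2 GHz and 14.2 Mpps a single core has only about 225 cycles per packet, so a core that is not actually running at turbo frequency loses a noticeable part of that budget.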

> I've tried to look into it with oprofile, but the results were quite
> strange: 35% of the samples were from miniflow_extract, the part where
> parse_vlan calls data_pull to jump after the MAC addresses. The oprofile
> snippet (1M samples):
> 
> 511454      19   0.0037  flow.c:511
> 511458     149   0.0292  dp-packet.h:266
> 51145f    4264   0.8357  dp-packet.h:267
> 511466      18   0.0035  dp-packet.h:268
> 51146d      43   0.0084  dp-packet.h:269
> 511474     172   0.0337  flow.c:511
> 51147a    4320   0.8467  string3.h:51
> 51147e  358763  70.3176  flow.c:99
> 511482       2  3.9e-04  string3.h:51
> 511485    3060   0.5998  string3.h:51
> 511488    1693   0.3318  string3.h:51
> 51148c    2933   0.5749  flow.c:326
> 511491      47   0.0092  flow.c:326
> 
> And the corresponding disassembled code:
> 
> 511454:   49 83 f9 0d             cmp    r9,0xd
> 511458:   c6 83 81 00 00 00 00    mov    BYTE PTR [rbx+0x81],0x0
> 51145f:   66 89 83 82 00 00 00    mov    WORD PTR [rbx+0x82],ax
> 511466:   66 89 93 84 00 00 00    mov    WORD PTR [rbx+0x84],dx
> 51146d:   66 89 8b 86 00 00 00    mov    WORD PTR [rbx+0x86],cx
> 511474:   0f 86 af 01 00 00       jbe    511629
> 
> 51147a:   48 8b 45 00             mov    rax,QWORD PTR [rbp+0x0]
> 51147e:   4c 8d 5d 0c             lea    r11,[rbp+0xc]
> 511482:   49 89 00                mov    QWORD PTR [r8],rax
> 511485:   8b 45 08                mov    eax,DWORD PTR [rbp+0x8]
> 511488:   41 89 40 08             mov    DWORD PTR [r8+0x8],eax
> 51148c:   44 0f b7 55 0c          movzx  r10d,WORD PTR [rbp+0xc]
> 511491:   66 41 81 fa 81 00       cmp    r10w,0x81
> 
> My only explanation to this so far is that I misunderstand something
> about the oprofile results.
> 
> Regards,
> 
> Zoltan
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] coverage: Add coverage_try_clear() for performance-critical threads.

2015-08-25 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Monday, August 24, 2015 7:08 PM
> To: Alex Wang
> Cc: dev; Ilya Maximets
> Subject: Re: [ovs-dev] [PATCH] coverage: Add coverage_try_clear() for
> performance-critical threads.
> 
> Hi Alex,
> 
> sorry for the delay and thanks for taking care of this.
> I couldn't experience any noticeable performance drop.

Same for me - no performance drop in my tests.

> 
> Acked-by: Daniele Di Proietto 
> 
> On 22/08/2015 16:44, "Alex Wang"  wrote:
> 
> >Sorry for the delay of pushing this and related dpdk patches,
> >
> >Want to spend some time next week measuring the performance impact,~
> >
> >Thanks,
> >Alex Wang,
> >
> >On Fri, Aug 21, 2015 at 12:48 PM, Ben Pfaff  wrote:
> >
> >> On Thu, Aug 13, 2015 at 11:48:49AM -0700, Alex Wang wrote:
> >> > For performance-critical threads like pmd threads, we currently make
> >>them
> >> > never call coverage_clear() to avoid contention over the global mutex
> >> > 'coverage_mutex'.  So, even though pmd thread still keeps updating
> >>their
> >> > thread-local coverage count, the count is never attributed to the
> >>global
> >> > total.  But it is useful to have them available.
> >> >
> >> > This commit makes this happen by implementing a non-contending version
> >> > of the clear function, coverage_try_clear().  The function will use
> >> > the ovs_mutex_trylock() and return immediately if the mutex cannot
> >> > be acquired.  Since threads like pmd thread are always busy-looping,
> >> > the lock will eventually be acquired.
> >> >
> >> > Requested-by: Ilya Maximets 
> >> > Signed-off-by: Alex Wang 
> >>
> >> This seems like an improvement.  I can imagine better data structures
> >> but I don't know whether they're worthwhile.
> >>
> >> Acked-by: Ben Pfaff 
> >>
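
The trylock approach described in the commit message above can be sketched with
plain pthreads. This is not the OVS implementation: total_mutex, global_total,
local_count and counter_try_flush() are hypothetical stand-ins for
coverage_mutex, the global totals, the thread-local counters and
coverage_try_clear(). The point it tries to show is that a busy-polling thread
never blocks. If the lock is taken, the thread keeps its local count and simply
tries again on a later iteration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the global mutex and the global total. */
static pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long long global_total;

/* Per-thread counter, updated without any locking. */
static _Thread_local unsigned long long local_count;

/* Tries to fold this thread's count into the global total.  Returns false
 * without blocking, and without touching the local count, if another
 * thread currently holds the mutex. */
static bool
counter_try_flush(void)
{
    int rc;

    if (pthread_mutex_trylock(&total_mutex) != 0) {
        return false;           /* Busy: keep counting, retry later. */
    }
    global_total += local_count;
    local_count = 0;
    rc = pthread_mutex_unlock(&total_mutex);
    assert(rc == 0);
    (void) rc;
    return true;
}
```

A polling loop would call counter_try_flush() once per iteration (or every N
iterations) and ignore a false return.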


Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.

2015-09-08 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Tuesday, June 16, 2015 7:39 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> 
> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
> set in 'ol_flags'.  Otherwise the hash is garbage and doesn't
> relate to the packet.
> 
> This fixes an issue with vhost, which, being a virtual NIC, doesn't
> compute the hash.
> 
> Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, forcing
> OVS to compute a hash in software.  This has a significant impact on
> performance (-30% throughput in a single flow setup) which can be
> mitigated if the CPU supports crc32c instructions.

Hi Daniele - revisiting this patch now that DPDK 2.1 support has been
integrated into OVS. I've checked and the ixgbe vector PMD correctly
sets the RSS bit in ol_flags, so this patch should be able to do its
job now that it will get the correct input. The patch still applies and LGTM.

> 
> Reported-by: Dongjun 
> Suggested-by: Flavio Leitner 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dp-packet.h   | 11 +++
>  lib/dpif-netdev.c |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index e4c2593..6840750 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -529,11 +529,22 @@ dp_packet_set_rss_hash(struct dp_packet *p, uint32_t
> hash)
>  {
>  #ifdef DPDK_NETDEV
>  p->mbuf.hash.rss = hash;
> +p->mbuf.ol_flags |= PKT_RX_RSS_HASH;
>  #else
>  p->rss_hash = hash;
>  #endif
>  }
> 
> +static inline bool
> +dp_packet_rss_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> +return p->mbuf.ol_flags & PKT_RX_RSS_HASH;
> +#else
> +return true;
> +#endif
> +}
> +
>  #ifdef  __cplusplus
>  }
>  #endif
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f13169c..c4a4b3a 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3036,7 +3036,7 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet
> *packet,
>  {
>  uint32_t hash, recirc_depth;
> 
> -hash = dp_packet_get_rss_hash(packet);
> +hash = dp_packet_rss_valid(packet) ? dp_packet_get_rss_hash(packet) : 0;
>  if (OVS_UNLIKELY(!hash)) {
>  hash = miniflow_hash_5tuple(mf, 0);
>  dp_packet_set_rss_hash(packet, hash);
> --
> 2.1.4
> 


Re: [ovs-dev] [PATCH v2] dpif-netdev: Check for PKT_RX_RSS_HASH flag.

2015-09-09 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Wednesday, September 9, 2015 4:46 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> 
> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
> set in 'ol_flags'.  Otherwise the hash is garbage and doesn't
> relate to the packet.
> 
> This fixes an issue with vhost, which, being a virtual NIC, doesn't
> compute the hash.
> 
> Reported-by: Dongjun 
> Suggested-by: Flavio Leitner 
> Signed-off-by: Daniele Di Proietto 
> ---
> v1 -> v2:
> 
> * Added a comment above dp_packet_get_rss_hash()
> * Added an OVS_UNUSED attribute on dp_packet_rss_valid()
> ---

Thanks Daniele - I tested it and it is working as expected.

Acked-by: Kevin Traynor 

>  lib/dp-packet.h   | 13 +
>  lib/dpif-netdev.c |  2 +-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index e4c2593..5532bee 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -514,6 +514,8 @@ dp_packet_reset_packet(struct dp_packet *b, int off)
>  b->l2_5_ofs = b->l3_ofs = b->l4_ofs = UINT16_MAX;
>  }
> 
> +/* Returns the RSS hash of the packet 'p'.  Note that the returned value is
> + * correct only if 'dp_packet_rss_valid(p)' returns true */
>  static inline uint32_t
>  dp_packet_get_rss_hash(struct dp_packet *p)
>  {
> @@ -529,11 +531,22 @@ dp_packet_set_rss_hash(struct dp_packet *p, uint32_t
> hash)
>  {
>  #ifdef DPDK_NETDEV
>  p->mbuf.hash.rss = hash;
> +p->mbuf.ol_flags |= PKT_RX_RSS_HASH;
>  #else
>  p->rss_hash = hash;
>  #endif
>  }
> 
> +static inline bool
> +dp_packet_rss_valid(struct dp_packet *p OVS_UNUSED)
> +{
> +#ifdef DPDK_NETDEV
> +return p->mbuf.ol_flags & PKT_RX_RSS_HASH;
> +#else
> +return true;
> +#endif
> +}
> +
>  #ifdef  __cplusplus
>  }
>  #endif
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index db76290..490ced3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3100,7 +3100,7 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet
> *packet,
>  {
>  uint32_t hash, recirc_depth;
> 
> -hash = dp_packet_get_rss_hash(packet);
> +hash = dp_packet_rss_valid(packet) ? dp_packet_get_rss_hash(packet) : 0;
>  if (OVS_UNLIKELY(!hash)) {
>  hash = miniflow_hash_5tuple(mf, 0);
>  dp_packet_set_rss_hash(packet, hash);
> --
> 2.1.4
> 


Re: [ovs-dev] [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to db

2015-12-21 Thread Traynor, Kevin
Hi Aaron,

> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, December 18, 2015 6:28 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner; Traynor, Kevin
> Subject: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to db
> 
> Existing DPDK integration is provided by use of command line options which
> must be split out and passed to librte in a special manner. However, this
> forces any configuration to be passed by way of a special DPDK flag, and
> interferes with ovs+dpdk packaging solutions.
> 
> This commit delays dpdk initialization until after the OVS database
> connection
> is established, and then initializes librte. It pulls all of the config data
> from the OVS database, and assembles a new argv/argc pair to be passed along.

FYI - There are some whitespace warnings showing up when applying the patchset.
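
To make the "assembles a new argv/argc pair" step above concrete, here is a
minimal sketch of the mapping from database keys to EAL flags. The keys, flags
and defaults are the ones from the options table in this patch series.
struct kv, kv_get() and build_eal_args() are hypothetical and only stand in for
the other_config smap and get_dpdk_args(). Unlike the patch, the sketch stores
pointers instead of strdup()ing, so argv is only valid as long as the config is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one other_config key/value pair. */
struct kv {
    const char *key;
    const char *value;
};

struct opt_map {
    const char *cfg_key;        /* Key in other_config. */
    const char *eal_flag;       /* Flag handed to the EAL. */
    const char *default_value;  /* NULL: leave the flag out when unset. */
};

static const char *
kv_get(const struct kv *cfg, size_t n_cfg, const char *key)
{
    for (size_t i = 0; i < n_cfg; i++) {
        if (!strcmp(cfg[i].key, key)) {
            return cfg[i].value;
        }
    }
    return NULL;
}

/* Fills 'argv' (capacity 'cap') with "ovs" followed by flag/value pairs.
 * Returns the resulting argc, or -1 if 'argv' is too small. */
static int
build_eal_args(const struct kv *cfg, size_t n_cfg,
               const char **argv, size_t cap)
{
    static const struct opt_map opts[] = {
        {"dpdk_lcore_mask",   "-c",           "0x1"},
        {"dpdk_mem_channels", "-n",           "4"},
        {"dpdk_alloc_mem",    "-m",           NULL},
        {"dpdk_socket_mem",   "--socket-mem", "1024,0"},
        {"dpdk_hugepage_dir", "--huge-dir",   NULL},
    };
    size_t argc = 0;

    assert(cap >= 1);
    argv[argc++] = "ovs";
    for (size_t i = 0; i < sizeof opts / sizeof opts[0]; i++) {
        const char *value = kv_get(cfg, n_cfg, opts[i].cfg_key);

        if (!value) {
            value = opts[i].default_value;
        }
        if (!value) {
            continue;           /* Not set and no default: skip the flag. */
        }
        if (argc + 2 > cap) {
            return -1;
        }
        argv[argc++] = opts[i].eal_flag;
        argv[argc++] = value;
    }
    return (int) argc;
}
```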

> 
> Signed-off-by: Aaron Conole 
> ---
>  INSTALL.DPDK.md |  60 -
>  lib/netdev-dpdk.c   | 172 +-
> --
>  lib/netdev-dpdk.h   |  19 --
>  vswitchd/bridge.c   |   3 +
>  vswitchd/ovs-vswitchd.c |  25 ++-
>  5 files changed, 181 insertions(+), 98 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..b9d92d0 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -143,22 +143,48 @@ Using the DPDK with ovs-vswitchd:
> 
>  5. Start vswitchd:
> 
> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> -   argument. This needs to be first argument passed to vswitchd process.
> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> -   for dpdk initialization.
> +   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
> +   other_config database. The recognized configuration options are listed.
> 
> +   * dpdk
> +   This is a bolean configuration option. A value of 'true' signals

typo boolean

> +   Open_vSwitch to initialize the DPDK EAL at startup. A set of nominal
> +   defaults are provided so that simply enabling this option will be
> sufficient
> +   to configure DPDK enabled ports.
> +
> +   * dpdk_lcore_mask
> +   This sets the core mask affinity of non-PMD threads that are spawned by
> the
> +   EAL. It will not impact the affinities of the bridge, or other Open
> vSwitch

I think we need to explain a bit more about the behavior of this param and its
default, i.e. how many cores and which ones are used when it is not set.

> +   userspace threads.
> +
> +   * dpdk_mem_channels
> +   This sets the number of memory spread channels in the CPU to be used by
> +   DPDK. It is purely an optimization flag.
> +
> +   * dpdk_hugepage_dir
> +   Directory where hugetlbfs is mounted
> +
> +   * cuse_dev_name
> +   Option to set the vhost_cuse character device name.
> +
> +   * vhost_sock_dir
> +   Option to set the path to the vhost_user unix socket files.
> +
> +   NOTE: Changing any of these options requires restarting the ovs-vswitchd
> +   application.
> +
> ```
> export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
> -   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
> +   ovs-vsctl set Open_vSwitch . other_config:dpdk=true
> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> ```

To be consistent with the rest of the guide, we should leave in the command to
start vswitchd, even if there is nothing dpdk specific about it anymore. It will
save someone having to go find it.

> 
> If allocated more than one GB hugepage (as for IVSHMEM), set amount and
> use NUMA node 0 memory:
> 
> ```
> -   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
> -   -- unix:$DB_SOCK --pidfile --detach
> +   ovs-vsctl set Open_vSwitch . other_config:dpdk_socket_mem="1024,0"
> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> ```
> 
>  6. Add bridge & ports
> @@ -521,11 +547,12 @@ have arbitrary names.
>   `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
>   to your VM on the QEMU command line. More instructions on this can be
>   found in the next section "DPDK vhost-user VM configuration"
> - Note: If you wish for the vhost-user sockets to be created in a
> - directory other than `/usr/local/var/run/openvswitch`, you may specify
> - another location on the ovs-vswitchd command line like so:
> +
> +  - If you wish for the vhost-user sockets to be created in a directory
> other
> +than `/usr/local/var/run/openvswitch`, you may specify another location
> +in the ovsdb like:
> 
> -  `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
> +`./vsw

Re: [ovs-dev] [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent

2015-12-21 Thread Traynor, Kevin
> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, December 18, 2015 6:28 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner; Traynor, Kevin
> Subject: [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent
> 
> The user has control over the DPDK internal lcore coremask, but this
> parameter can be autofilled with a bit more intelligence. If the user
> does not fill this parameter in, we use the lowest set bit in the
> current task CPU affinity.
> 
> Signed-off-by: Aaron Conole 
> Cc: Kevin Traynor 
> ---
>  lib/netdev-dpdk.c | 47 ---
>  1 file changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 2a81058..696430f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -65,6 +65,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 20);
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>  #define OVS_VPORT_DPDK "ovs_dpdk"
> 
> +#define MAX_BUFSIZ 256
> +
>  /*
>   * need to reserve tons of extra space in the mbufs so we can align the
>   * DMA addresses to 4KB.
> @@ -2147,7 +2149,8 @@ grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
>  }
> 
>  static int
> -get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv,
> +  int argc)
>  {
>  struct dpdk_options_map {
>  const char *ovs_configuration;
> @@ -2155,13 +2158,13 @@ get_dpdk_args(const struct ovsrec_open_vswitch
> *ovs_cfg, char ***argv)
>  bool default_enabled;
>  const char *default_value;
>  } opts[] = {
> -{"dpdk_lcore_mask", "-c", true, "0x1"},
> +{"dpdk_lcore_mask", "-c", false, NULL},
>  {"dpdk_mem_channels", "-n", true, "4"},
>  {"dpdk_alloc_mem", "-m", false, NULL},
>  {"dpdk_socket_mem", "--socket-mem", true, "1024,0"},
>  {"dpdk_hugepage_dir", "--huge-dir", false, NULL},
>  };
> -int i, ret = 1;
> +int i, ret = argc;
> 
>  for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
>  const char *lookup = smap_get(&ovs_cfg->other_config,
> @@ -2203,7 +2206,8 @@ __dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
>  {
>  char **argv = NULL;
>  int result;
> -int argc;
> +int argc = 0, argc_tmp;
> +bool auto_determine = true;
>  int err;
>  cpu_set_t cpuset;
> 
> @@ -2236,12 +2240,41 @@ __dpdk_init(const struct ovsrec_open_vswitch
> *ovs_cfg)
>  ovs_abort(0, "Thread getaffinity error %d.", err);
>  }
> 
> -argv = grow_argv(&argv, 0, 1);
> +argv = grow_argv(&argv, argc, argc+1);
>  if (!argv) {
>  ovs_abort(0, "Unable to allocate an initial argv.");
>  }
> -argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
> -argc = get_dpdk_args(ovs_cfg, &argv);
> +argv[argc++] = strdup("ovs"); /* TODO use prctl to get process name */
> +
> +argc_tmp = get_dpdk_args(ovs_cfg, &argv, argc);
> +
> +while(argc_tmp != argc) {
> +if (!strcmp("-c", argv[argc++])) {
> +auto_determine = false;
> +}
> +}
> +
> +/**
> + * NOTE: This is an unsophisticated mechanism for determining the DPDK
> + * lcore for the DPDK Master.
> + */
> +if (auto_determine)
> +{

Coding standard: the opening brace goes on the same line, i.e.
'if (auto_determine) {'.

> +int i;
> +for (i = 0; i < CPU_SETSIZE; i++) {
> +if (CPU_ISSET(i, &cpuset)) {

I had been thinking we could put in a check here to ensure that the
core we have selected is suitable for the socket-mem but I'm not sure
if it's really needed. We could always add it later, it won't change
the user interface.

> +char buf[MAX_BUFSIZ];
> +snprintf(buf, MAX_BUFSIZ, "0x%08llX", (1ULL<<i));
> +argv = grow_argv(&argv, argc, argc+2);
> +if (!argv) {
> +ovs_abort(0, "Unable to grow argv for coremask");
> +}
> +argv[argc++] = strdup("-c");
> +argv[argc++] = strdup(buf);
> +i = CPU_SETSIZE;
> +}
> +}
> +}
> 
>  argv = grow_argv(&argv, argc, argc+1);
>  if (!argv) {
> --
> 2.6.1.133.gf5b6079
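
For reference, the "lowest set bit in the current task CPU affinity" rule from
the commit message can be sketched as below. To keep it self-contained the
affinity is passed in as a plain 64-bit mask rather than a cpu_set_t, and
lowest_set_bit() and format_coremask() are hypothetical helper names. The
output format follows the "0x%08llX" string used in the patch. One caveat this
makes visible: a single 64-bit shift can only express cores 0-63, so anything
above that needs different handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns the index of the lowest set bit in 'affinity', or -1 if the
 * mask is empty.  'affinity' is a simplified stand-in for a cpu_set_t. */
static int
lowest_set_bit(uint64_t affinity)
{
    for (int i = 0; i < 64; i++) {
        if (affinity & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}

/* Writes a single-core hex mask for 'cpu' into 'buf'.  Returns 0 on
 * success, or -1 if 'cpu' is out of range or 'buf' is too small. */
static int
format_coremask(int cpu, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    if (cpu < 0 || cpu >= 64) {
        return -1;
    }
    n = snprintf(buf, len, "0x%08llX", 1ULL << cpu);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```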



Re: [ovs-dev] [PATCH 4/5] lib/daemon: Move the user:group code up one level

2015-12-21 Thread Traynor, Kevin
> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, December 18, 2015 6:28 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner; Traynor, Kevin
> Subject: [PATCH 4/5] lib/daemon: Move the user:group code up one level
> 
> It will be useful in the future to be able to set ownership on other
> files which Open vSwitch creates. Allowing the specification of such
> ownership using the standard user:group notation by the user is
> desirable. So move the code which parses that information up one level
> to be used from other modules.

I don't think patches 4-5 are related to 1-3. Having them together might
hold up one or the other? 



Re: [ovs-dev] [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to db

2015-12-22 Thread Traynor, Kevin
Hi Aaron, 

I ran a few tests today...some more comments on the back of this, 

> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, December 21, 2015 7:24 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org; Flavio Leitner
> Subject: Re: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to
> db
> 
> "Traynor, Kevin"  writes:
> > Hi Aaron,
> >
> >> -Original Message-
> >> From: Aaron Conole [mailto:acon...@redhat.com]
> >> Sent: Friday, December 18, 2015 6:28 PM
> >> To: dev@openvswitch.org
> >> Cc: Flavio Leitner; Traynor, Kevin
> >> Subject: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to
> db
> >>
> >> Existing DPDK integration is provided by use of command line options which
> >> must be split out and passed to librte in a special manner. However, this
> >> forces any configuration to be passed by way of a special DPDK flag, and
> >> interferes with ovs+dpdk packaging solutions.
> >>
> >> This commit delays dpdk initialization until after the OVS database
> >> connection
> >> is established, and then initializes librte. It pulls all of the config
> data
> >> from the OVS database, and assembles a new argv/argc pair to be passed
> along.
> >
> > FYI - There is some whitespace warnings showing up when applying the
> > patchset.
> 
> D'oh! Okay, I'll make sure to fix that when I resubmit v2.
> 
> >>
> >> Signed-off-by: Aaron Conole 
> >> ---
> >>  INSTALL.DPDK.md |  60 -
> >>  lib/netdev-dpdk.c   | 172 +--
> ---
> >> --
> >>  lib/netdev-dpdk.h   |  19 --
> >>  vswitchd/bridge.c   |   3 +
> >>  vswitchd/ovs-vswitchd.c |  25 ++-
> >>  5 files changed, 181 insertions(+), 98 deletions(-)
> >>
> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >> index 96b686c..b9d92d0 100644
> >> --- a/INSTALL.DPDK.md
> >> +++ b/INSTALL.DPDK.md
> >> @@ -143,22 +143,48 @@ Using the DPDK with ovs-vswitchd:
> >>
> >>  5. Start vswitchd:
> >>
> >> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> >> -   argument. This needs to be first argument passed to vswitchd process.
> >> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> >> -   for dpdk initialization.
> >> +   DPDK configuration arguments can be passed to vswitchd via
> Open_vSwitch
> >> +   other_config database. The recognized configuration options are
> listed.
> >>
> >> +   * dpdk
> >> +   This is a bolean configuration option. A value of 'true' signals
> >
> > typo boolean

I don't see the value of having a dpdk=true/false config option at present.
It means that a user now has to configure the build for dpdk and also
set this in the database, which is more work than before. If I build with
dpdk and don't set this, I'm seeing "ovs-vswitchd: virtual memory exhausted". 

It might be a good idea to have a config like this in the context of a
unified build where it is the only option needed to enable dpdk but that's
for another day. The current build dependent dpdk_init()'s should be enough
to cover with/without dpdk.

> >
> >> +   Open_vSwitch to initialize the DPDK EAL at startup. A set of nominal
> >> +   defaults are provided so that simply enabling this option will be
> >> sufficient
> >> +   to configure DPDK enabled ports.
> >> +
> >> +   * dpdk_lcore_mask
> >> +   This sets the core mask affinity of non-PMD threads that are spawned
> by
> >> the
> >> +   EAL. It will not impact the affinities of the bridge, or other Open
> >> vSwitch
> >
> > I think we need to explain a bit more about the behavior of this param and
> the
> > default wrt num of cores and which ones are default.
> 
> Okay. I think you're right, because it seems like there's confusion all
> over the place about the DPDK lcore threads relationship with OVS threads.
> 
> >> +   userspace threads.
> >> +
> >> +   * dpdk_mem_channels
> >> +   This sets the number of memory spread channels in the CPU to be used
> by
> >> +   DPDK. It is purely an optimization flag.
> >> +
> >> +   * dpdk_hugepage_dir
> >> +   Directory where hugetlbfs is mounted
> >> +
> >> +   * cuse_dev_name
> >> +   Option to set t

Re: [ovs-dev] [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent

2015-12-22 Thread Traynor, Kevin
> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, December 21, 2015 7:27 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org; Flavio Leitner
> Subject: Re: [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent
> 
> "Traynor, Kevin"  writes:
> >> -Original Message-
> >> From: Aaron Conole [mailto:acon...@redhat.com]
> >> Sent: Friday, December 18, 2015 6:28 PM
> >> To: dev@openvswitch.org
> >> Cc: Flavio Leitner; Traynor, Kevin
> >> Subject: [PATCH 3/5] netdev-dpdk: Autofill lcore coremask if absent
> >>
> >> The user has control over the DPDK internal lcore coremask, but this
> >> parameter can be autofilled with a bit more intelligence. If the user
> >> does not fill this parameter in, we use the lowest set bit in the
> >> current task CPU affinity.
> >>
> >> Signed-off-by: Aaron Conole 
> >> Cc: Kevin Traynor 
> >> ---
> >>  lib/netdev-dpdk.c | 47 ---
> >>  1 file changed, 40 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 2a81058..696430f 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -65,6 +65,8 @@ static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5,
> >> 20);
> >>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> >>  #define OVS_VPORT_DPDK "ovs_dpdk"
> >>
> >> +#define MAX_BUFSIZ 256
> >> +
> >>  /*
> >>   * need to reserve tons of extra space in the mbufs so we can align the
> >>   * DMA addresses to 4KB.
> >> @@ -2147,7 +2149,8 @@ grow_argv(char ***argv, size_t cur_siz, size_t
> grow_by)
> >>  }
> >>
> >>  static int
> >> -get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
> >> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv,
> >> +  int argc)
> >>  {
> >>  struct dpdk_options_map {
> >>  const char *ovs_configuration;
> >> @@ -2155,13 +2158,13 @@ get_dpdk_args(const struct ovsrec_open_vswitch
> >> *ovs_cfg, char ***argv)
> >>  bool default_enabled;
> >>  const char *default_value;
> >>  } opts[] = {
> >> -{"dpdk_lcore_mask", "-c", true, "0x1"},
> >> +{"dpdk_lcore_mask", "-c", false, NULL},
> >>  {"dpdk_mem_channels", "-n", true, "4"},
> >>  {"dpdk_alloc_mem", "-m", false, NULL},
> >>  {"dpdk_socket_mem", "--socket-mem", true, "1024,0"},
> >>  {"dpdk_hugepage_dir", "--huge-dir", false, NULL},
> >>  };
> >> -int i, ret = 1;
> >> +int i, ret = argc;
> >>
> >>  for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
> >>  const char *lookup = smap_get(&ovs_cfg->other_config,
> >> @@ -2203,7 +2206,8 @@ __dpdk_init(const struct ovsrec_open_vswitch
> *ovs_cfg)
> >>  {
> >>  char **argv = NULL;
> >>  int result;
> >> -int argc;
> >> +int argc = 0, argc_tmp;
> >> +bool auto_determine = true;
> >>  int err;
> >>  cpu_set_t cpuset;
> >>
> >> @@ -2236,12 +2240,41 @@ __dpdk_init(const struct ovsrec_open_vswitch
> >> *ovs_cfg)
> >>  ovs_abort(0, "Thread getaffinity error %d.", err);
> >>  }
> >>
> >> -argv = grow_argv(&argv, 0, 1);
> >> +argv = grow_argv(&argv, argc, argc+1);
> >>  if (!argv) {
> >>  ovs_abort(0, "Unable to allocate an initial argv.");
> >>  }
> >> -argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
> >> -argc = get_dpdk_args(ovs_cfg, &argv);
> >> +argv[argc++] = strdup("ovs"); /* TODO use prctl to get process name
> */
> >> +
> >> +argc_tmp = get_dpdk_args(ovs_cfg, &argv, argc);
> >> +
> >> +while(argc_tmp != argc) {
> >> +if (!strcmp("-c", argv[argc++])) {
> >> +auto_determine = false;

we can break; here.
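
A sketch of the early exit, assuming the scan is pulled out into a helper
(args_contain() is a hypothetical name, not something in the patch). One thing
to watch: in the loop quoted above argc doubles as the scan cursor, so a bare
break there would leave argc short of argc_tmp unless it is set afterwards.
Scanning with a separate index avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if any of argv[start] .. argv[end - 1] equals 'flag'.
 * Stops at the first match. */
static bool
args_contain(const char *const argv[], int start, int end, const char *flag)
{
    assert(start >= 0 && start <= end);
    for (int i = start; i < end; i++) {
        if (!strcmp(flag, argv[i])) {
            return true;        /* Early exit on the first match. */
        }
    }
    return false;
}
```

With that, the caller could do something like
'auto_determine = !args_contain(argv, argc, argc_tmp, "-c"); argc = argc_tmp;'.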

If we are not going to autodetermine, later we still set the affinities
back to the non-isolcpu'd cores. This means that the -c is used for the
rte_eal_init() only and the non-p

Re: [ovs-dev] [PATCH] INSTALL.DPDK.md: Clarify DPDK arguments.

2015-12-22 Thread Traynor, Kevin

> -Original Message-
> From: Zoltan Kiss [mailto:zoltan.k...@linaro.org]
> Sent: Tuesday, December 15, 2015 6:56 PM
> To: Traynor, Kevin; Aaron Conole
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] INSTALL.DPDK.md: Clarify DPDK arguments.
> 
> Hi,
> 
> On 15/12/15 14:50, Traynor, Kevin wrote:
> >> Seems good, assuming that the thread affinity at the time dpdk_init()
> >> >was called reflects what cores are allowed for non-PMD threads. And what
> >> >if the user wants to change that later?
> > Hi - not sure what you mean by "allowed for non-PMD threads". Is there an
> example?
> 
> I don't know how OVS determines the cores where non-PMD threads should
> run. I guess the most basic requirement is that they should NOT be on
> the PMD cores, if possible.

Hi Zoltan, sorry for the delayed response. 

yeah, at present it's just based on the -c. So the user will have knowledge
of what is being used for pmd and non-pmd. Ideally, we can start to create
defaults where that knowledge is not needed. 

> But then, you revert the affinity changes made by rte_eal_init(), so the
> only thing we set with the -c value is the lcore_id of the calling
> thread. I guess that doesn't have too much relevance. Or is there any
> case where the affinity of the non-PMD thread (which calls
> rte_eal_init()), changes, and lcore_id should follow that?

That's an interesting point - if the affinity floated across non-isolcpu'd
cores, it would be fine. If someone explicitly taskset it then I think we
may need to add a few LOC to account for that. It should be straightforward
to catch though.

> 
> Zoli


Re: [ovs-dev] [PATCH v2] lib/netdev-dpdk: increase ring name length for dpdkr ports

2016-01-11 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Mauricio Vasquez
> B
> Sent: Sunday, January 10, 2016 6:28 PM
> To: dev@openvswitch.org
> Cc: acon...@bytheb.org
> Subject: [ovs-dev] [PATCH v2] lib/netdev-dpdk: increase ring name length for
> dpdkr ports
> 
> A ring name length of 10 characters is not enough for dpdkr ports
> starting from dpdkr10, then it is increased to RTE_RING_NAMESIZE
> characters.

Looks good to me. There's some existing headroom for name length in
ring_client.c but you may want to also increase it as part of this change now
that even larger numbers are possible? It should work up to  as is, so it
would be just to catch someone using magic numbers.
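
To make the length problem concrete: "dpdkr10_tx" is 10 characters, so with the
terminating NUL it needs 11 bytes and no longer fits a 10-byte buffer. Also note
that snprintf() reports truncation by returning the length it would have
written, not a negative value, so a check for err < 0 alone does not catch it.
A minimal sketch, where make_ring_name() is a hypothetical helper and is not
part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Formats "<dev_name>_<suffix>" into 'buf'.  Returns 0 on success, or -1
 * on an output error or if the name had to be truncated. */
static int
make_ring_name(char *buf, size_t len, const char *dev_name,
               const char *suffix)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%s_%s", dev_name, suffix);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With a 10-byte buffer this succeeds for "dpdkr9" and fails for "dpdkr10". With
a larger buffer both fit.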

> 
> Signed-off-by: Mauricio Vasquez B 
> ---
> v2:
> - Use RTE_RING_NAMESIZE instead of a numerical constant.
> 
>  lib/netdev-dpdk.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index b209df2..90512aa 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1921,7 +1921,7 @@ dpdk_ring_create(const char dev_name[], unsigned int
> port_no,
>   unsigned int *eth_port_id)
>  {
>  struct dpdk_ring *ivshmem;
> -char ring_name[10];
> +char ring_name[RTE_RING_NAMESIZE];
>  int err;
> 
>  ivshmem = dpdk_rte_mzalloc(sizeof *ivshmem);
> @@ -1930,7 +1930,7 @@ dpdk_ring_create(const char dev_name[], unsigned int
> port_no,
>  }
> 
>  /* XXX: Add support for multiquque ring. */
> -err = snprintf(ring_name, 10, "%s_tx", dev_name);
> +err = snprintf(ring_name, sizeof(ring_name), "%s_tx", dev_name);
>  if (err < 0) {
>  return -err;
>  }
> @@ -1943,7 +1943,7 @@ dpdk_ring_create(const char dev_name[], unsigned int
> port_no,
>  return ENOMEM;
>  }
> 
> -err = snprintf(ring_name, 10, "%s_rx", dev_name);
> +err = snprintf(ring_name, sizeof(ring_name), "%s_rx", dev_name);
>  if (err < 0) {
>  return -err;
>  }
> --
> 1.9.1
> 


Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Convert initialization from cmdline to db

2016-01-12 Thread Traynor, Kevin
> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, January 4, 2016 9:47 PM
> To: dev@openvswitch.org; Flavio Leitner; Traynor, Kevin
> Subject: [PATCH v2 2/3] netdev-dpdk: Convert initialization from cmdline to
> db
> 
> Existing DPDK integration is provided by use of command line options which
> must be split out and passed to librte in a special manner. However, this
> forces any configuration to be passed by way of a special DPDK flag, and
> interferes with ovs+dpdk packaging solutions.
> 
> This commit delays dpdk initialization until the first DPDK netdev is added
> to the bridge, at which point ovs initializes librte. It pulls all of
> the config data from the OVS database, and assembles a new argv/argc
> pair to be passed along.
> 
> Signed-off-by: Aaron Conole 
> ---
> v2:
> * Removed trailing whitespace
> * Followed for() loop brace coding style
> * Automatically enable DPDK when adding a DPDK enabled port
> * Fixed an issue on startup when DPDK enabled ports are present
> * Updated the documentation (including vswitch.xml) and documented all
>   new parameters
> * Dropped the premature initialization test

Hi, mostly very minor comments below,

> 
> INSTALL.DPDK.md |  75 
>  lib/netdev-dpdk.c   | 224 +++---
> --
>  lib/netdev-dpdk.h   |  19 ++--
>  vswitchd/bridge.c   |   3 +
>  vswitchd/ovs-vswitchd.c |  25 +-
>  vswitchd/vswitch.xml| 102 +-
>  6 files changed, 341 insertions(+), 107 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..2dd2120 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -143,22 +143,59 @@ Using the DPDK with ovs-vswitchd:
> 
>  5. Start vswitchd:
> 
> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> -   argument. This needs to be first argument passed to vswitchd process.
> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> -   for dpdk initialization.
> +   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
> +   other_config database. The recognized configuration options are listed.

Suggest adding: 'Defaults will be provided for all values not set.'

> +
> +   * dpdk-lcore-mask
> +   Specifies the CPU cores on which dpdk lcore threads should be spawned.
> +   The DPDK lcore threads are used for DPDK library tasks, such as
> +   library internal message processing, logging, etc. Value should be in
> +   the form of a hex string (so '0x123') similar to the 'taskset' mask
> +   input.

'CPU cores' and '0x123' imply that multiple cores will be used in this case.
It would be better to change these to 'CPU core' and '0x1'.

Also, judging by how pmd-cpu-mask behaves, I don't think the 0x prefix is
accepted.
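
To illustrate both points, here is a minimal sketch of a mask parser. It is
not taken from OVS or DPDK; parse_hex_mask() and mask_core_count() are
hypothetical names. It relies only on the standard strtoull() behaviour that
base 16 accepts an optional "0x" prefix, and it says nothing about what the
real option parser accepts.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not OVS code): parse a hex CPU mask string.
 * strtoull() with base 16 accepts the string with or without "0x". */
static int
parse_hex_mask(const char *s, uint64_t *mask)
{
    char *end;
    unsigned long long value;

    if (!s || *s == '\0') {
        return EINVAL;
    }

    errno = 0;
    value = strtoull(s, &end, 16);
    if (errno == ERANGE) {
        return ERANGE;
    }
    if (end == s || *end != '\0') {
        return EINVAL;          /* No digits at all, or trailing junk. */
    }

    *mask = value;
    return 0;
}

/* Hypothetical helper: count how many cores a mask selects. */
static int
mask_core_count(uint64_t mask)
{
    int n = 0;

    for (; mask; mask &= mask - 1) {
        n++;                    /* Each iteration clears the lowest set bit. */
    }
    return n;
}
```

With this sketch, '0x123' and '123' parse to the same mask and select four
cores (0, 1, 5 and 8), whereas '0x1' selects a single core.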

> +   If not specified, the value will be determined by choosing the lowest
> +   CPU core from initial cpu affinity list. Otherwise, the value will be
> +   passed directly to the DPDK library.
> +   For performance reasons, it is best to set this to a single core on
> +   the system, rather than allow lcore threads to float.

I suppose it depends on the system load and vswitch usage as to which will give
better performance, but I agree that setting a dedicated core is at least more
deterministic and less risky, so the statement is probably fine.

> +
> +   * dpdk-mem-channels
> +   This sets the number of memory spread channels per CPU socket. It is
> purely
> +   an optimization flag.
> +
> +   * dpdk-alloc-mem
> +   This sets the total memory to preallocate from hugepages regardless of
> +   processor socket. It is recommended to use dpdk-socket-mem instead.
> +
> +   * dpdk-socket-mem
> +   Comma separated list of memory to pre-allocate from hugepages on specific
> +   sockets.
> +
> +   * dpdk-hugepage-dir
> +   Directory where hugetlbfs is mounted
> +
> +   * cuse-dev-name
> +   Option to set the vhost_cuse character device name.
> +
> +   * vhost-sock-dir
> +   Option to set the path to the vhost_user unix socket files.
> +
> +   NOTE: Changing any of these options requires restarting the ovs-vswitchd
> +   application.
> +
> +   Open vSwitch can be started as normal. DPDK will not be initialized until
> +   the first DPDK-enabled port is added to the bridge.
> 
> ```
> export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
> -   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> ```
> 
> If allocated more than one GB hugepage (as for IVSHMEM), set amount and
> use NUMA node 0 memory:
> 
> ```
> -   ovs-vswitchd --dpdk -c

Re: [ovs-dev] [PATCH v2 3/3] netdev-dpdk: Autofill lcore coremask if absent

2016-01-12 Thread Traynor, Kevin

> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, January 4, 2016 9:47 PM
> To: dev@openvswitch.org; Flavio Leitner; Traynor, Kevin
> Subject: [PATCH v2 3/3] netdev-dpdk: Autofill lcore coremask if absent
> 
> The user has control over the DPDK internal lcore coremask, but this
> parameter can be autofilled with a bit more intelligence. If the user
> does not fill this parameter in, we use the lowest set bit in the
> current task CPU affinity. Otherwise, we will reassign the current
> thread to the specified lcore mask, in addition to the dpdk lcore
> threads.
> 
> Signed-off-by: Aaron Conole 
> Cc: Kevin Traynor 
> ---
> v2:
> * Fix a conditional branch coding standard issue
> * When lcore coremask is set, do not reset the affinities as
>   suggested by Kevin Traynor
> 
>  lib/netdev-dpdk.c | 58 -
> --
>  1 file changed, 47 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 2ce9f71..75f40ff 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -65,6 +65,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 20);
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>  #define OVS_VPORT_DPDK "ovs_dpdk"
> 
> +#define MAX_BUFSIZ 256
> +
>  /*
>   * need to reserve tons of extra space in the mbufs so we can align the
>   * DMA addresses to 4KB.
> @@ -2192,7 +2194,8 @@ grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
>  }
> 
>  static int
> -get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv,
> +  int argc)
>  {
>  struct dpdk_options_map {
>  const char *ovs_configuration;
> @@ -2200,14 +2203,14 @@ get_dpdk_args(const struct ovsrec_open_vswitch
> *ovs_cfg, char ***argv)
>  bool default_enabled;
>  const char *default_value;
>  } opts[] = {
> -{"dpdk-lcore-mask", "-c", true, "0x1"},
> +{"dpdk-lcore-mask", "-c", false, NULL},
>  /* XXX: DPDK 2.2.0 support, the true should become false for -n */
>  {"dpdk-mem-channels", "-n", true, "4"},
>  {"dpdk-alloc-mem", "-m", false, NULL},
>  {"dpdk-socket-mem", "--socket-mem", true, "1024,0"},
>  {"dpdk-hugepage-dir", "--huge-dir", false, NULL},
>  };
> -int i, ret = 1;
> +int i, ret = argc;
> 
>  for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
>  const char *lookup = smap_get(&ovs_cfg->other_config,
> @@ -2250,7 +2253,8 @@ __dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
>  {
>  char **argv = NULL;
>  int result;
> -int argc;
> +int argc = 0, argc_tmp;
> +bool auto_determine = true;
>  int err;
>  cpu_set_t cpuset;
> 
> @@ -2279,12 +2283,41 @@ __dpdk_init(const struct ovsrec_open_vswitch
> *ovs_cfg)
>  ovs_abort(0, "Thread getaffinity error %d.", err);
>  }
> 
> -argv = grow_argv(&argv, 0, 1);
> +argv = grow_argv(&argv, argc, argc+1);

argc and 1 are added in grow_argv(), so I think it should be 
'argv = grow_argv(&argv, argc, 1);'. Similar for other grow_argv() calls
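
A minimal sketch of why the third argument matters, assuming (as the comment
above says) that grow_argv() adds its two size arguments internally.
grow_argv_sketch() and slots_after_grow() are hypothetical stand-ins, not the
function from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: number of slots requested by a grow call. */
static size_t
slots_after_grow(size_t cur_siz, size_t grow_by)
{
    return cur_siz + grow_by;
}

/* Hypothetical stand-in for grow_argv(): resizes '*argv' so that it can hold
 * 'cur_siz + grow_by' pointers.  Returns the new array, or NULL on failure
 * (in which case '*argv' is left untouched). */
static char **
grow_argv_sketch(char ***argv, size_t cur_siz, size_t grow_by)
{
    size_t n = slots_after_grow(cur_siz, grow_by);
    char **new_argv = realloc(*argv, n * sizeof *new_argv);

    if (new_argv) {
        /* Clear only the newly added slots. */
        memset(new_argv + cur_siz, 0, grow_by * sizeof *new_argv);
        *argv = new_argv;
    }
    return new_argv;
}
```

With argc == 1, passing 'argc+2' as the third argument asks for 4 slots,
while passing '2' asks for the 3 that are actually needed. The extra
allocation is harmless, but it hides the intent of the call.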

>  if (!argv) {
>  ovs_abort(0, "Unable to allocate an initial argv.");
>  }
> -argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
> -argc = get_dpdk_args(ovs_cfg, &argv);
> +argv[argc++] = strdup("ovs"); /* TODO use prctl to get process name */
> +
> +argc_tmp = get_dpdk_args(ovs_cfg, &argv, argc);
> +
> +while(argc_tmp != argc) {
> +if (!strcmp("-c", argv[argc++])) {
> +auto_determine = false;
> +break;
> +}
> +}
> +
> +/**
> + * NOTE: This is an unsophisticated mechanism for determining the DPDK
> + * lcore for the DPDK Master.
> + */
> +if (auto_determine) {
> +int i;
> +for (i = 0; i < CPU_SETSIZE; i++) {
> +if (CPU_ISSET(i, &cpuset)) {
> +char buf[MAX_BUFSIZ];
> +snprintf(buf, MAX_BUFSIZ, "0x%08llX", (1ULL<<i));
> +argv = grow_argv(&argv, argc, argc+2);
> +if (!argv) {
> +ovs_abort(0, "Unable to grow argv for coremask");
> +}
> +argv[argc++] = strdup("-c");
> +argv[argc++

Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Convert initialization from cmdline to db

2016-01-12 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Aaron Conole
> Sent: Monday, January 11, 2016 6:51 PM
> To: Zoltan Kiss
> Cc: dev@openvswitch.org; Flavio Leitner
> Subject: Re: [ovs-dev] [PATCH v2 2/3] netdev-dpdk: Convert initialization
> from cmdline to db
> 
> Zoltan Kiss  writes:
> > On 08/01/16 16:31, Aaron Conole wrote:
> >> Panu Matilainen  writes:
> >>> On 01/04/2016 11:46 PM, Aaron Conole wrote:
>  Existing DPDK integration is provided by use of command line options
> which
>  must be split out and passed to librte in a special manner. However,
> this
>  forces any configuration to be passed by way of a special DPDK flag, and
>  interferes with ovs+dpdk packaging solutions.
> 
>  This commit delays dpdk initialization until the first DPDK netdev is
> added
>  to the bridge, at which point ovs initializes librte.
> >>>
> >>> On thing to keep in mind is that rte_eal_init() can and will tear down
> >>> the entire process on failure since DPDK calls rte_panic() if
> >>> something so much as sneezes. In current OVS this occurs on service
> >>> startup where its relatively harmless, but with lazy initialization
> >>> there could be already be other activity that is in risk of getting
> >>> terminated when the first DPDK port is added.
> >>>
> >>> Fixing rte_eal_init() to gracefully return on failure has been
> >>> discussed, and agreed on in principle, on DPDK list but all current
> >>> DPDK versions are nasty wrt that.
> >>>
> >>>   - Panu -
> >>
> >> So, I've waffled back and forth on this. I understand the reason to be
> >> nervous about an always init option (because that wastes lots of
> >> resources when the system won't ever use dpdk). I also understand the
> >> possible issues *today* with dpdk_init, but even then, it's a dpdk issue
> >> which we want fixed anyway, so I don't know that this should hold up
> >> this patch.
> >
> > I couldn't find the original email where this discussion happened: why
> > is it required to be able to init DPDK on the fly? I mean it's a nice
> > bit, but brings in a lot of trouble. On the other hand, asking the
> > user to set a "other_config:odp=true" and then restart ovs-vswitchd is
> > not a huge thing to ask.
> 
> You won't find such a discussion - it's never been "required," as
> such. It is very desirable, as having such automatic behavior reduces the
> amount of steps required to get up and running with DPDK enabled
> OVS. This is, imho, the argument of both Panu and Kevin when they think
> a big on/off switch is less than elegant. I tend to agree with that, as well.
> 
> If the user is going through the trouble of compiling with DPDK support, and
> then wants to add a DPDK port, having to also set
> "other_config:dpdk=yes" seems like too many ok dialogs. I hope the
> analogy makes sense.
> 
> That said, it's really an unneeded enhancement, and I've pitched the
> idea to folks at the office within my elastic-shooting range. The answer
> has been consistently "This is an enhancement, not a requirement." So
> I'll drop the lazy-init feature for now.
> 
> >> It's definitely a more elegant solution to do the lazy init, but I think
> >> there could be times where we want a "stop everything" button for
> >> occasions where testing without DPDK doing it's thing are desired.
> >>
> >> However, I think I've come up with a solution that gives us flexibility
> >> to support these cases without much additional work, so let me know what
> >> you think:
> >>
> >> First, go back to the dpdk true/false flag
> >> Second, add a patch in the series which changes true/false to a tristate
> >> on/off/lazy allowing the policy of when to initialize DPDK to be user
> >> defined. We can default it to 'off' or 'lazy', but it can be changed to
> >> 'on' if we want.
> >>
> >> What do folks think? Too much work and code for not enough gains?
> 
> As I wrote above, the 'second' part of this is a great follow up
> enhancement, but for now I'm going to go back to the giant flag, and we
> can take it as 'future development'.

I think we all agree that removing the mandatory vswitchd DPDK cmdline args by
putting them in the db/code with defaults is good for usability, and getting
that alone enabled through this patch would be a good outcome IMHO.

I see "other_config:dpdk=yes" and lazy init as ways to enable a common build.
That's good for usability and support, but I think it's a separate enough change
to warrant a separate patchset/discussion.



Re: [ovs-dev] [PATCH v2] lib/netdev-dpdk: increase ring name length for dpdkr ports

2016-01-15 Thread Traynor, Kevin

> -Original Message-
> From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
> Sent: Thursday, January 14, 2016 9:15 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org; acon...@bytheb.org
> Subject: Re: [ovs-dev] [PATCH v2] lib/netdev-dpdk: increase ring name length
> for dpdkr ports
> 
> Hello Kevin,
> 
> It can only work up to 255 actually, notice that the port number is
> first parsed and saved in the variable "uint8_t client_id", then the
> type of this variable should be changed too. Additionally, the port
> number in ovs is saved in the member "int user_port_id;" of the
> dpdk_ring structure.

Hi - yes, you're right! I had just checked the length of the buffer, not the
client_id.
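
For the buffer-length side of this, a minimal sketch of how truncation can be
detected. format_ring_name() is a hypothetical helper, not code from the
patch. The patch only checks for a negative return from snprintf(), whereas a
return value greater than or equal to the buffer size means the name did not
fit.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): build "<dev_name>_<suffix>" and
 * report truncation instead of silently using a shortened ring name. */
static int
format_ring_name(char *buf, size_t size, const char *dev_name,
                 const char *suffix)
{
    int n = snprintf(buf, size, "%s_%s", dev_name, suffix);

    if (n < 0) {
        return EINVAL;          /* Output error. */
    }
    if ((size_t) n >= size) {
        return ENAMETOOLONG;    /* Name plus NUL terminator did not fit. */
    }
    return 0;
}
```

With a 10-byte buffer, "dpdkr9_tx" (9 characters plus the terminator) fits,
but "dpdkr10_tx" (10 characters plus the terminator) does not, which is the
case the patch addresses.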

> 
> My proposal would be to change the type of both variables to 'unsigned
> int' and add some error handling in the case overflow happens during
> the parsing.
> 
> What do you think?

That sounds good. I'm not really sure how likely it is that someone would
decide to use a very large number, and ring_client.c is only a test program,
but if it's simple to put in the check, there's no harm in having it as good
reference code.
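
A minimal sketch of what such a check might look like, assuming the port
number arrives as a decimal string. parse_port_no() is a hypothetical name and
this is not the code in ring_client.c.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal port number into an unsigned int,
 * rejecting junk and values that do not fit. */
static int
parse_port_no(const char *s, unsigned int *port_no)
{
    char *end;
    unsigned long value;

    /* strtoul() accepts signs and leading whitespace; insist on a digit. */
    if (!s || !isdigit((unsigned char) s[0])) {
        return EINVAL;
    }

    errno = 0;
    value = strtoul(s, &end, 10);
    if (errno == ERANGE || value > UINT_MAX) {
        return ERANGE;          /* Too large for unsigned int. */
    }
    if (*end != '\0') {
        return EINVAL;          /* Trailing characters after the number. */
    }

    *port_no = (unsigned int) value;
    return 0;
}
```

Parsing into a wider type first and range-checking before the assignment
avoids the silent wrap-around that a direct store into a uint8_t would give.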

> 
> 
> On 11/01/2016, Traynor, Kevin  wrote:
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Mauricio
> >> Vasquez
> >> B
> >> Sent: Sunday, January 10, 2016 6:28 PM
> >> To: dev@openvswitch.org
> >> Cc: acon...@bytheb.org
> >> Subject: [ovs-dev] [PATCH v2] lib/netdev-dpdk: increase ring name length
> >> for
> >> dpdkr ports
> >>
> >> A ring name length of 10 characters is not enough for dpdkr ports
> >> starting from dpdkr10, then it is increased to RTE_RING_NAMESIZE
> >> characters.
> >
> > Looks good to me. There's some existing headroom for name length in
> > ring_client.c
> > but you may want to also increase it as part of this change now that even
> > larger
> > numbers are possible? It should work up to  as is, so it would be just
> > to catch
> > someone using magic numbers.
> >
> >>
> >> Signed-off-by: Mauricio Vasquez B
> >> 
> >> ---
> >> v2:
> >> - Use RTE_RING_NAMESIZE instead of a numerical constant.
> >>
> >>  lib/netdev-dpdk.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index b209df2..90512aa 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -1921,7 +1921,7 @@ dpdk_ring_create(const char dev_name[], unsigned
> >> int
> >> port_no,
> >>   unsigned int *eth_port_id)
> >>  {
> >>  struct dpdk_ring *ivshmem;
> >> -char ring_name[10];
> >> +char ring_name[RTE_RING_NAMESIZE];
> >>  int err;
> >>
> >>  ivshmem = dpdk_rte_mzalloc(sizeof *ivshmem);
> >> @@ -1930,7 +1930,7 @@ dpdk_ring_create(const char dev_name[], unsigned
> >> int
> >> port_no,
> >>  }
> >>
> >>  /* XXX: Add support for multiquque ring. */
> >> -err = snprintf(ring_name, 10, "%s_tx", dev_name);
> >> +err = snprintf(ring_name, sizeof(ring_name), "%s_tx", dev_name);
> >>  if (err < 0) {
> >>  return -err;
> >>  }
> >> @@ -1943,7 +1943,7 @@ dpdk_ring_create(const char dev_name[], unsigned
> >> int
> >> port_no,
> >>  return ENOMEM;
> >>  }
> >>
> >> -err = snprintf(ring_name, 10, "%s_rx", dev_name);
> >> +err = snprintf(ring_name, sizeof(ring_name), "%s_rx", dev_name);
> >>  if (err < 0) {
> >>  return -err;
> >>  }
> >> --
> >> 1.9.1
> >>


Re: [ovs-dev] [PATCH RFC v2] dpif-netdev: Allow different numbers of rx queues for different ports.

2016-01-20 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: Wednesday, January 20, 2016 3:24 PM
> To: Iezzi, Federico; dev@openvswitch.org; Ben Pfaff; Daniele Di Proietto;
> Alex Wang; Joe Stringer; Aaron Conole
> Cc: Dyasly Sergey
> Subject: Re: [ovs-dev] [PATCH RFC v2] dpif-netdev: Allow different numbers of
> rx queues for different ports.
> 
> Hi,
> It seems that just no one wants to review them. Even a bug fixes.
> Last time my PMD related patch was reviewed in October, after a
> month and two pings.
> 
> Best regards, Ilya Maximets.

Hi Ilya, I'll review this when I'm finished reviewing Aaron's cmdline patchset.

thanks,
Kevin.

> 
> On 20.01.2016 17:48, Iezzi, Federico wrote:
> > Hi there,
> >
> > It seems that a lot of DPDK and DPDK related patches are still in pending
> since before Christmas.
> > Is there any problem with them?
> >
> > Thanks,
> > Federico
> >


Re: [ovs-dev] [PATCH v5 2/4] netdev-dpdk: Convert initialization from cmdline to db

2016-01-20 Thread Traynor, Kevin
> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, January 18, 2016 8:29 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner; Traynor, Kevin; Panu Matilainen; Zoltan Kiss
> Subject: [PATCH v5 2/4] netdev-dpdk: Convert initialization from cmdline to
> db
> 
> Existing DPDK integration is provided by use of command line options which
> must be split out and passed to librte in a special manner. However, this
> forces any configuration to be passed by way of a special DPDK flag, and
> interferes with ovs+dpdk packaging solutions.
> 
> This commit delays dpdk initialization until after the OVS database
> connection is established, at which point ovs initializes librte. It
> pulls all of the config data from the OVS database, and assembles a
> new argv/argc pair to be passed along.
> 
> Signed-off-by: Aaron Conole 
> ---
> v2:
> * Removed trailing whitespace
> * Followed for() loop brace coding style
> * Automatically enable DPDK when adding a DPDK enabled port
> * Fixed an issue on startup when DPDK enabled ports are present
> * Updated the documentation (including vswitch.xml) and documented all
>   new parameters
> * Dropped the premature initialization test
> 
> v3:
> * Improved description language in INSTALL.DPDK.md
> * Fixed the ovs-vsctl examples for DPDK
> * Returned to the global dpdk-init (bullet 3 from v2)
> * Fixed a build error when compiling without dpdk support enabled
> * converted to xstrdup, for consistency after rebasing
> 
> v4:
> * No change
> 
> v5:
> * Adjust the ovs-dev script to account for the new dpdk configuration
> * Update the ovs-vswitchd.8.in pointing to INSTALL.DPDK.md

Hi Aaron. I have only one real comment on this patch, below. The other patches
look fine to me, but I didn't get a chance to test them.

thanks,
Kevin.

> 
>  FAQ.md |   6 +-
>  INSTALL.DPDK.md|  81 ++-
>  lib/netdev-dpdk.c  | 191 ---
> --
>  lib/netdev-dpdk.h  |  22 --
>  utilities/ovs-dev.py   |  11 ++-
>  vswitchd/bridge.c  |   3 +
>  vswitchd/ovs-vswitchd.8.in |   5 +-
>  vswitchd/ovs-vswitchd.c|  25 +-
>  vswitchd/vswitch.xml   | 118 +++-
>  9 files changed, 346 insertions(+), 116 deletions(-)
> 
> diff --git a/FAQ.md b/FAQ.md
> index 29b2e19..c233118 100644
> --- a/FAQ.md
> +++ b/FAQ.md
> @@ -431,9 +431,9 @@ A: Yes.  How you configure it depends on what you mean by
> "promiscuous
> 
>  A: Firstly, you must have a DPDK-enabled version of Open vSwitch.
> 
> -   If your version is DPDK-enabled it will support the --dpdk
> -   argument on the command line and will display lines with
> -   "EAL:..." during startup when --dpdk is supplied.
> +   If your version is DPDK-enabled it will support the other_config:dpdk-
> init
> +   configuration in the database and will display lines with
> +   "EAL:..." during startup when other_config:dpdk-init is set to 'true'.
> 
> Secondly, when adding a DPDK port, unlike a system port, the
> type for the interface must be specified. For example;
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..46bd1a8 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -143,22 +143,64 @@ Using the DPDK with ovs-vswitchd:
> 
>  5. Start vswitchd:
> 
> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> -   argument. This needs to be first argument passed to vswitchd process.
> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> -   for dpdk initialization.
> +   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
> +   other_config database. The recognized configuration options are listed.
> +   Defaults will be provided for all values not explicitly set.
> +
> +   * dpdk-init
> +   Specifies whether OVS should initialize and support DPDK ports. This is
> +   a boolean, and defaults to false.

I'm assuming you've renamed it from 'dpdk' to 'dpdk-init' so that it can be
set to false while still leaving open the possibility of adding a lazy init at
a later time?

If so, I think it is a good idea, because we wouldn't want a situation in the
future where 'dpdk'=false but an introduced lazy init means DPDK is used anyway.
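
To make the naming point concrete, a rough sketch of how a 'dpdk-init' value
could later grow a third state without changing the meaning of 'false'. The
enum, the function name and the "lazy" value are all hypothetical; the patch
itself only documents a boolean that defaults to false.

```c
#include <string.h>

/* Hypothetical policy values; only off/on exist in the patch as a boolean. */
enum dpdk_init_policy {
    DPDK_INIT_OFF,              /* Never initialize DPDK. */
    DPDK_INIT_ON,               /* Initialize DPDK at startup. */
    DPDK_INIT_LAZY              /* Hypothetical future value. */
};

/* Hypothetical parser for other_config:dpdk-init. */
static enum dpdk_init_policy
parse_dpdk_init(const char *value)
{
    if (!value) {
        return DPDK_INIT_OFF;   /* Documented default is false. */
    }
    if (!strcmp(value, "true")) {
        return DPDK_INIT_ON;
    }
    if (!strcmp(value, "lazy")) {
        return DPDK_INIT_LAZY;  /* Assumed spelling, not in the patch. */
    }
    return DPDK_INIT_OFF;       /* "false" and anything unrecognized. */
}
```

In this sketch 'false' always means DPDK is never initialized, so adding a
lazy mode later would not silently change existing configurations.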

> +
> +   * dpdk-lcore-mask
> +   Specifies the CPU cores on which dpdk lcore threads should be spawned.
> +   The DPDK lcore threads are used for DPDK library tasks, such as
> +   library internal message processing, logging, etc. Value should be in
> +   the form of a hex string (so '0x123') similar to the 'taskset' mask

Re: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Autofill lcore coremask if absent

2016-01-20 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Aaron Conole
> Sent: Tuesday, January 19, 2016 1:28 PM
> To: Qiu, Michael
> Cc: dev@openvswitch.org; Flavio Leitner; Zoltan Kiss
> Subject: Re: [ovs-dev] [PATCH v3 3/3] netdev-dpdk: Autofill lcore coremask if
> absent
> 
> "Qiu, Michael"  writes:
> > On 1/14/2016 5:18 AM, Aaron Conole wrote:
> >> The user has control over the DPDK internal lcore coremask, but this
> >> parameter can be autofilled with a bit more intelligence. If the user
> >> does not fill this parameter in, we use the lowest set bit in the
> >> current task CPU affinity. Otherwise, we will reassign the current
> >> thread to the specified lcore mask, in addition to the dpdk lcore
> >> threads.
> >
> > It's not a good idea to use the lowest set bit in the current task CPU
> > affinity,
> >
> > I think numa info should be considered as the NIC could belongs to
> > different Socket.
> >
> > If remote socket, it will lead bad performance.
> 
> I agree with your concerns, but they are tuning and optimization. The
> point of defaults is to get something up and running "good enough."
> Without an auto-assigned coremask, the user is forced to pick one and
> that is something of an inconvenience just to try out DPDK. So we have a
> default mechanism in place to tide over the user.
> 
> This default exists to handle the case where the user does not tune
> appropriately to their system. See
> http://openvswitch.org/pipermail/dev/2015-December/063626.html for some
> additonal context. Also keep in mind, this is solely for the lcore
> threads - PMD threads have their own CPU mask.

+1
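
For reference, the default being discussed (lowest set bit of the current
affinity, formatted as a coremask) can be sketched as below. This is a
simplified illustration, not the patch code: it uses a plain 64-bit mask in
place of cpu_set_t to stay portable, and lowest_core() and format_core_mask()
are hypothetical names. The "0x%08llX" format string is the one used in the
patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: index of the lowest core set in 'affinity',
 * or -1 if no core is set. */
static int
lowest_core(uint64_t affinity)
{
    for (int i = 0; i < 64; i++) {
        if (affinity & (UINT64_C(1) << i)) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical helper: write a single-core mask for 'core' into 'buf'.
 * Returns 0 on success, -1 if the core is out of range or 'buf' is too
 * small. */
static int
format_core_mask(int core, char *buf, size_t size)
{
    int n;

    if (core < 0 || core >= 64) {
        return -1;              /* Shifting by 64 or more is undefined. */
    }
    n = snprintf(buf, size, "0x%08llX", 1ULL << core);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For an affinity of cores 3 and 5 (mask 0x28), this picks core 3 and produces
"0x00000008". As noted above, this only affects the lcore threads; PMD
threads have their own CPU mask.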

> 
> > Thanks,
> > Michael
> 
> Thanks for the review, Michael!
> 
> -Aaron


Re: [ovs-dev] [OVS-DPDK] Unable to successfully add IN port to datapath, failed to add port1 as port: Unknown error -19

2016-01-21 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of beerappa s m
> Sent: Wednesday, January 20, 2016 10:37 AM
> To: 许志峰
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [OVS-DPDK] Unable to successfully add IN port to
> datapath, failed to add port1 as port: Unknown error -19
> 
> Hi,
> 
> OVS works nicely with DPDK. you can follow the same tutorial.
>  http://openvswitch.org/support/dist-docs/INSTALL.DPDK.md.txt

Yes, this file is the best place for instructions for OVS with DPDK. The
original mail mixes up two different open source vswitch projects:

 * OVS with DPDK, [ovs-dev] mailing list: OVS which has DPDK integrated and is
   active and maintained.

 * OVDK, [OVS-DPDK] mailing list: an older fork of OVS that is not active or
   maintained.

> 
> As mentioned in the table before starting the vswitchd daemon, you need to
> bind the interfaces to dpdk. and then start the switchd as follows.
> 
> 1. Bind the interface to dpdk something like this.
> 
> ./tools/dpdk_nic_bind.py --bind=igb_uio ens2f0
> 
> ./tools/dpdk_nic_bind.py --bind=igb_uio ens2f1
> ./tools/dpdk_nic_bind.py --status
> 
> 2. Then start the OVS vswitchd :
> ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
> 
> 
> 3. Create the bridge:
> 
> ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev -- set bridge
> br0 protocols=OpenFlow13
> 
> 
> 4.Add the ports to switchd :
> 
> ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set
> interface dpdk0 ofport_request=1
> 
> ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk -- set
> interface dpdk1 ofport_request=2
> 
> 
> *Note here*:
> 
> 1.  As mentioned the tutorial,  OVS expects DPDK device names to start
> with "dpdk"
> and end with a portid. vswitchd should print (in the log file) the number
> of dpdk devices found.
> 
> 2. And Type should be dpdk.
> 
> 
> So here its dpdk0 and dpdk1.
> 
> I hope it will helps you.
> 
> Regards,
> Beeru
> 
> 
> 
> On Wed, Jan 20, 2016 at 3:30 PM, 许志峰  wrote:
> 
> > Hello all,
> > I am using OVS-DPDK, following this tutorial
> >
> > https://github.com/01org/dpdk-
> ovs/blob/development/docs/04_Sample_Configurations/00_Phy-Phy.md
> > .
> > and I encountered some problems.
> >
> > I have two physical NICs binded to DPDK driver
> >
> > Network devices using DPDK-compatible driver
> > 
> > :06:00.0 '82574L Gigabit Network Connection' drv=igb_uio
> > unused=vfio-pci
> > :07:00.0 '82574L Gigabit Network Connection' drv=igb_uio
> > unused=vfio-pci
> >
> > Then I add a bridge br0 and 2 ports, follow the, just as the tutorial did.
> > and the ovs-vsctl show is normal.
> >
> > 170c321a-0dea-48c5-94dd-f53ec5587a44
> > Bridge "br0"
> > Port "port2"
> > Interface "port2"
> > type: dpdkphy
> > options: {port="1"}
> > Port "port1"
> > Interface "port1"
> > type: dpdkphy
> > options: {port="0"}
> > Port "br0"
> > Interface "br0"
> > type: internal
> >
> > I start ovs-dpdk and it works well and Statistic info is printed. Then I
> > start vswitch by./vswitchd/ovs-vswitchd -c 0x100 --proc-type=secondary --
> > --pidfile=/tmp/vswitchd.pid part of the output is
> >
> > 2016-01-16T09:20:40Z|00036|dpdk_link|INFO|Found OVDK03_Control_Alloc_Ring
> > 2016-01-16T09:20:40Z|00037|dpif_dpdk|ERR|Unable to successfully add IN
> > port to datapath, error '-19'
> > 2016-01-16T09:20:40Z|00038|dpif|WARN|dpdk@ovs-dpdk: failed to add
> > port1 as port: Unknown error -19
> > 2016-01-16T09:20:40Z|00039|dpif_dpdk|ERR|Unable to successfully add IN
> > port to datapath, error '-19'
> > 2016-01-16T09:20:40Z|00040|dpif|WARN|dpdk@ovs-dpdk: failed to add
> > port2 as port: Unknown error -19
> > 2016-01-16T09:20:40Z|00041|bridge|INFO|bridge br0: added interface br0
> > on port 65534
> > 2016-01-16T09:20:40Z|00042|bridge|INFO|bridge br0: using datapath ID
> > 86f3dadf9c41
> > 2016-01-16T09:20:40Z|00043|connmgr|INFO|br0: added service controller
> > "punix:/usr/local/var/run/openvswitch/br0.mgmt"
> > 2016-01-16T09:20:40Z|00044|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.1.2
> > 2016-01-16T09:20:50Z|00045|memory|INFO|16588 kB peak resident set size
> > after 10.0 seconds
> > 2016-01-16T09:20:50Z|00046|memory|INFO|dispatchers:1 flow_dumpers:1
> > handlers:5 ports:1 revalidators:3 rules:4
> >
> > I tried to use dpdk0/dpdk1 to replace port1/port2 and got the same error.
> > http://openvswitch.org/support/dist-docs/INSTALL.DPDK.md.txt step 6
> > I also tried ovs_dpdk_16/17 because I saw the last page of this PDF
> >
> > https://01.org/sites/default/files/downloads/packet-
> processing/intel_dpdk_vswitch_gsg_0_7.pdf
> >
> > I found a same question here
> >  https://lists.01.org/pipermail/dpdk-ovs/2014-May/000973.html
> > 
> > the solution is to use the command below

Re: [ovs-dev] [PATCH] netdev-dpdk: Add vhost-user multiqueue support

2016-01-22 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Flavio Leitner
> Sent: Thursday, January 21, 2016 1:27 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner
> Subject: [ovs-dev] [PATCH] netdev-dpdk: Add vhost-user multiqueue support
> 
> Most of the network cards today supports multiple receive
> and transmit queues (MQ).  The core idea is that on packet
> reception, a NIC can send different packets to different
> queues to distribute processing among CPUs running in parallel.
> The packet distribution is based on a result of a filter applied
> on each packet headers. The filter should keep all packets from
> the same flow on the same queue to avoid re-ordering while
> distributing different flows among all available queues.
> 
> This is how the packet moves in a typical vhost-user use-case:
> 
> NIC OVS
> DPDK port  bridge --- vhost-user  qemu  virtio eth0
> 
> The DPDK ports, OVS bridges, virtio network driver and
> recently QEMU (vhost-user) supports MQ.  This patch adds MQ
> support to OVS that leverages DPDK vhost library to implement
> vhost-user interfaces.

Looks good to me. I'll ack once DPDK 2.2 is listed as the required version for
OVS.
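
As an aside, the queue-selection idea in the commit message (same flow, same
queue) can be sketched as a hash of the flow fields taken modulo the number of
queues. This is only an illustration: real NICs typically use their own RSS
hash, whereas this sketch swaps in a simple FNV-1a hash, and struct flow_key
and select_queue() are hypothetical names that do not appear in OVS or DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow description used only for this illustration. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
};

/* 32-bit FNV-1a over 'len' bytes, continuing from 'hash'. */
static uint32_t
fnv1a_update(uint32_t hash, const void *data, size_t len)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < len; i++) {
        hash ^= p[i];
        hash *= 16777619u;      /* FNV prime. */
    }
    return hash;
}

/* Map a flow to one of 'n_queues' queues.  Fields are hashed one by one so
 * that struct padding never influences the result. */
static unsigned int
select_queue(const struct flow_key *key, unsigned int n_queues)
{
    uint32_t hash = 2166136261u;    /* FNV offset basis. */

    hash = fnv1a_update(hash, &key->src_ip, sizeof key->src_ip);
    hash = fnv1a_update(hash, &key->dst_ip, sizeof key->dst_ip);
    hash = fnv1a_update(hash, &key->src_port, sizeof key->src_port);
    hash = fnv1a_update(hash, &key->dst_port, sizeof key->dst_port);
    hash = fnv1a_update(hash, &key->proto, sizeof key->proto);

    return n_queues ? hash % n_queues : 0;
}
```

Because the queue depends only on the flow fields, every packet of a flow
lands on the same queue and ordering within the flow is preserved, while
different flows are spread across the available queues.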

> 
> Signed-off-by: Flavio Leitner 
> ---
>  INSTALL.DPDK.md   |  12 ++
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 115 ++--
> --
>  3 files changed, 105 insertions(+), 24 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..0952e28 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -567,6 +567,18 @@ Follow the steps below to attach vhost-user port(s) to a
> VM.
> -numa node,memdev=mem -mem-prealloc
> ```
> 
> +3. Optional: Enable multiqueue support
> +   QEMU needs to be configured with multiple queues and the number queues
> +   must be less or equal to Open vSwitch other_config:n-dpdk-rxqs.
> +   The $q below is the number of queues.
> +   The $v is the number of vectors, which is '$q x 2 + 2'.
> +
> +   ```
> +   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> +   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
> +   -device virtio-net-
> pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
> +   ```
> +
>  DPDK vhost-cuse:
>  
> 
> diff --git a/NEWS b/NEWS
> index 5c18867..9d41245 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,8 @@ Post-v2.5.0
>   * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
> - ovs-ofctl:
>   * queue-get-config command now allows a queue ID to be specified.
> +   - DPDK:
> + * Added multiqueue support to vhost-user
> 
>  v2.5.0 - xx xxx 
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index de7e488..1af8d1a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -221,12 +221,9 @@ struct netdev_dpdk {
>   * If the numbers match, 'txq_needs_locking' is false, otherwise it is
>   * true and we will take a spinlock on transmission */
>  int real_n_txq;
> +int real_n_rxq;
>  bool txq_needs_locking;
> 
> -/* Spinlock for vhost transmission.  Other DPDK devices use spinlocks in
> - * dpdk_tx_queue */
> -rte_spinlock_t vhost_tx_lock;
> -
>  /* virtio-net structure for vhost device */
>  OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> 
> @@ -654,13 +651,10 @@ dpdk_dev_parse_name(const char dev_name[], const char
> prefix[],
>  static int
>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>  {
> -struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> -
>  if (rte_eal_init_ret) {
>  return rte_eal_init_ret;
>  }
> 
> -rte_spinlock_init(&netdev->vhost_tx_lock);
>  return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
>  }
> 
> @@ -834,7 +828,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned
> int n_txq,
>  }
> 
>  static int
> -netdev_dpdk_vhost_set_multiq(struct netdev *netdev_, unsigned int n_txq,
> +netdev_dpdk_vhost_cuse_set_multiq(struct netdev *netdev_, unsigned int
> n_txq,
>   unsigned int n_rxq)
>  {
>  struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> @@ -850,6 +844,32 @@ netdev_dpdk_vhost_set_multiq(struct netdev *netdev_,
> unsigned int n_txq,
>  netdev->up.n_txq = n_txq;
>  netdev->real_n_txq = 1;
>  netdev->up.n_rxq = 1;
> +netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;
> +
> +ovs_mutex_unlock(&netdev->mutex);
> +ovs_mutex_unlock(&dpdk_mutex);
> +
> +return err;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_multiq(struct netdev *netdev_, unsigned int n_txq,
> + unsigned int n_rxq)
> +{
> +struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +int err = 0;
> +
> +if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) {
> +return err;
> +}
> +
> +ovs_mutex_lock(&dpdk_mutex);
> +ovs

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Add vhost-user multiqueue support

2016-01-26 Thread Traynor, Kevin
> -Original Message-
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: Tuesday, January 26, 2016 6:58 PM
> To: dev@openvswitch.org
> Cc: Traynor, Kevin; Flavio Leitner
> Subject: [PATCH v2] netdev-dpdk: Add vhost-user multiqueue support
> 
> Most network cards today support multiple receive
> and transmit queues (MQ).  The core idea is that on packet
> reception, a NIC can send different packets to different
> queues to distribute processing among CPUs running in parallel.
> The packet distribution is based on the result of a filter applied
> to each packet's headers. The filter should keep all packets from
> the same flow on the same queue to avoid re-ordering while
> distributing different flows among all available queues.
> 
> This is how the packet moves in a typical vhost-user use-case:
> 
> NIC OVS
> DPDK port  bridge --- vhost-user  qemu  virtio eth0
> 
> The DPDK ports, OVS bridges, virtio network driver and
> recently QEMU (vhost-user) support MQ.  This patch adds MQ
> support to OVS that leverages DPDK vhost library to implement
> vhost-user interfaces.

Acked-by: Kevin Traynor 
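
For illustration, the queue-selection idea from the commit message and the
vector count from the INSTALL.DPDK.md hunk ('$q x 2 + 2') can be sketched in
C. This is only a minimal sketch and not code from the patch: struct flow_key,
hash_u32(), select_queue() and virtio_vectors() are hypothetical names, and
the FNV-1a style hash is just a stand-in for whatever filter a NIC really
applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key: the header fields that identify one flow. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t proto;
};

/* Folds the four bytes of 'v' into hash 'h' (FNV-1a style mixing; an
 * assumption for illustration, not the filter used by any real NIC). */
static uint32_t
hash_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Maps a flow to one of 'n_queues' queues (n_queues must be > 0).  The
 * same key always yields the same queue, so packets of one flow are not
 * re-ordered, while different flows may land on different queues. */
static unsigned
select_queue(const struct flow_key *key, unsigned n_queues)
{
    uint32_t h = 2166136261u;

    h = hash_u32(h, key->src_ip);
    h = hash_u32(h, key->dst_ip);
    h = hash_u32(h, ((uint32_t) key->src_port << 16) | key->dst_port);
    h = hash_u32(h, key->proto);
    return h % n_queues;
}

/* Number of vectors for 'n_queues' queues, per the documentation hunk:
 * $v = $q x 2 + 2. */
static unsigned
virtio_vectors(unsigned n_queues)
{
    return n_queues * 2 + 2;
}
```

With four queues, for example, virtio_vectors(4) gives the value to pass as
vectors=$v on the QEMU command line.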

> 
> Signed-off-by: Flavio Leitner 
> ---
>  INSTALL.DPDK.md   |  12 ++
>  NEWS  |   2 +
>  lib/netdev-dpdk.c | 115 ++--
> --
>  3 files changed, 105 insertions(+), 24 deletions(-)
> 
>  V2:
> Rebased now that branch master supports DPDK 2.2.0
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index c601358..e8ef4b5 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -567,6 +567,18 @@ Follow the steps below to attach vhost-user port(s) to a
> VM.
> -numa node,memdev=mem -mem-prealloc
> ```
> 
> +3. Optional: Enable multiqueue support
> +   QEMU needs to be configured with multiple queues and the number queues
> +   must be less or equal to Open vSwitch other_config:n-dpdk-rxqs.
> +   The $q below is the number of queues.
> +   The $v is the number of vectors, which is '$q x 2 + 2'.
> +
> +   ```
> +   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> +   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
> +   -device virtio-net-
> pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
> +   ```
> +
>  DPDK vhost-cuse:
>  
> 
> diff --git a/NEWS b/NEWS
> index 5c18867..9d41245 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,8 @@ Post-v2.5.0
>   * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
> - ovs-ofctl:
>   * queue-get-config command now allows a queue ID to be specified.
> +   - DPDK:
> + * Added multiqueue support to vhost-user
> 
>  v2.5.0 - xx xxx 
>  -
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 31e56b6..09ccc2c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -221,12 +221,9 @@ struct netdev_dpdk {
>   * If the numbers match, 'txq_needs_locking' is false, otherwise it is
>   * true and we will take a spinlock on transmission */
>  int real_n_txq;
> +int real_n_rxq;
>  bool txq_needs_locking;
> 
> -/* Spinlock for vhost transmission.  Other DPDK devices use spinlocks in
> - * dpdk_tx_queue */
> -rte_spinlock_t vhost_tx_lock;
> -
>  /* virtio-net structure for vhost device */
>  OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> 
> @@ -654,13 +651,10 @@ dpdk_dev_parse_name(const char dev_name[], const char
> prefix[],
>  static int
>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>  {
> -struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> -
>  if (rte_eal_init_ret) {
>  return rte_eal_init_ret;
>  }
> 
> -rte_spinlock_init(&netdev->vhost_tx_lock);
>  return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
>  }
> 
> @@ -834,7 +828,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned
> int n_txq,
>  }
> 
>  static int
> -netdev_dpdk_vhost_set_multiq(struct netdev *netdev_, unsigned int n_txq,
> +netdev_dpdk_vhost_cuse_set_multiq(struct netdev *netdev_, unsigned int
> n_txq,
>   unsigned int n_rxq)
>  {
>  struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> @@ -850,6 +844,32 @@ netdev_dpdk_vhost_set_multiq(struct netdev *netdev_,
> unsigned int n_txq,
>  netdev->up.n_txq = n_txq;
>  netdev->real_n_txq = 1;
>  netdev->up.n_rxq = 1;
> +netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;
> +
> +ovs_mutex_unlock(&netdev->mutex);
> +ovs_mutex_unlock(&d

Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command line to DB based

2016-02-05 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Friday, February 5, 2016 2:22 AM
> To: Flavio Leitner; Aaron Conole; Andy Zhou
> Cc: 
> Subject: Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command
> line to DB based
> 
> Sorry for jumping in so late on this.
> 
> I'm absolutely not a fan of the way we currently handle DPDK command
> line options, so thanks Aaron for taking this initiative.
> 
> I think that one of the main reasons for this series, correct me if I'm
> wrong, is that it's currently hard to start OVS with DPDK from the
> init scripts.

The main benefit I see of these patches is that it removes a requirement for
the user to have knowledge of DPDK init params by providing defaults, while
also catering for a power user who wants to set them. 

> 
> The init script 'ovs-ctl' starts both the database and vswitchd.
> The DPDK parameters should be written by the user in the database
> with ovs-vsctl, after the database is started, but before vswitchd is.
> Currently there's no chance to do so.
> 
> The user can still influence the behavior of ovs-ctl by editing
> /etc/default/openvswitch-switch and then ovs-ctl could either
> populate the database or use command line arguments for
> ovs-vswitchd.
> 
> I'm not an expert on init scripts though, any input is appreciated.
> 
> One more thing: DPDK 2.2 made -c and -n optional with commits
> 4fce65a6be17("eal: default to using all cores") and
> 19bfa4ddb1a9("eal: make the -n argument optional"), so I'm not sure
> we should provide a default and expose them to the user.   

That's noted for -n. 
+/* XXX: DPDK 2.2.0 support, the true should become false for -n */


Using the DPDK default -c (all cores) is not ideal as it will create a
thread on every core which will not be used. Anyway, whatever about
creating something for passing the DPDK init, the real default that is
now provided is that the control type threads in vswitchd will run with
the same affinity as vswitchd. That allows for use of the existing threading
model in vswitchd and that by default those threads will float across
multiple cores as scheduled by Linux.
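
To make the -c point concrete, here is a small C sketch of what a core mask
implies. It is an illustration only: coremask_count() and
coremask_lowest_core() are hypothetical helpers, and picking the lowest set
core is an assumption made for the example, not necessarily how the series
chooses the lcore.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the number of cores selected by a hexadecimal core mask string
 * such as "ff00" (the value a user would pass with -c). */
static int
coremask_count(const char *hex_mask)
{
    uint64_t mask = strtoull(hex_mask, NULL, 16);
    int n = 0;

    while (mask) {
        mask &= mask - 1;   /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Reduces 'mask' to a mask containing only its lowest set core.
 * A zero mask stays zero. */
static uint64_t
coremask_lowest_core(uint64_t mask)
{
    return mask & (~mask + 1);
}
```

A mask of "ff00" selects eight cores, so a thread per core means eight
threads; reducing it to one core (0x100 here) leaves a single one.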

> A power user is still able to influence them via the extra options

That's true, they could alternatively be set through extra options.

With the exception of socket-mem, I don't think 99% of users will want to
set -c or -n anymore once defaults are provided.

Kevin.

> 
> Daniele
> 
> On 04/02/2016 07:38, "Flavio Leitner"  wrote:
> 
> >On Thu, 04 Feb 2016 09:51:05 -0500
> >Aaron Conole  wrote:
> >
> >> Hi Andy,
> >>
> >> Andy Zhou  writes:
> >> > Sorry for jumping in late in the review cycle. But I am not sure if
> >> > there is significant advantage in store dpdk options in OVSDB.
> >>
> >> No problem - always good to have a fresh pair of eyes. Additionally,
> >> the cover letter should be updated, because I wrote it in a hurry and
> >> it doesn't state the advantages clearly.
> >>
> >> Thanks very much for the review!
> >>
> >> If folks believe this is not worthwhile, or feel the other way, it
> >> would be nice to hear from them - get an ACK or NAK, or anything
> >> else? :) I'm okay dropping this, btw, if folks think it isn't
> >> worthwhile, and all that. I'd prefer not to, given how much work has
> >> gone into it, though.
> >>
> >> > On Fri, Jan 29, 2016 at 9:56 AM, Aaron Conole 
> >> > wrote:
> >> >
> >> >
> >> >  Currently, configuration of DPDK parameters is done via the
> >> > command line through a --dpdk **OPTIONS** -- command line argument.
> >> > This has a number of challenges, including:
> >> >  * It must be the first option passed to ovs-vswitchd
> >> >
> >> >
> >> > This can be improved by specifying dpdk options in quotes, as
> >> > --dpdk "options", then --dpdk can be anywhere in the command line.
> >>
> >> Sure, and that solves the positional requirement, but requiring that
> >> we have to put quotes around dpdk options (when we _know_ there will
> >> be many - it's not like you can get away with no options), is really
> >> hacky. At least, I think so.
> >
> >That becomes an exception not only to OVS but to other services.
> >I found very unusual to have to use:
> >
> >--dpdk "-c ff00 --socket-mem=1024,0" --nochdir 
> >
> >and then you have to provide some way to the user to customize that.
> >
> >
> >> >  * It breaks from the way most other things are configured in OVS
> >> >
> >> >
> >> > ovs-vswitchd still has command line options.
> >>
> >> Sure, but they're all 'logistic' - where to reach the database, and
> >> how to configure the logging. None are 'features' (for instance, PMD
> >> thread affinities).
> >>
> >> It remains that dpdk is the only major feature in OVS which requires
> >> passing command line arguments.
> >
> >DPDK options are actually the datapath configuration (how much mem,
> >which socket, where to run PMDs...) . As we do with other configs,
> >we should store them in the DB as well, just

Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command line to DB based

2016-02-09 Thread Traynor, Kevin


> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, January 29, 2016 5:57 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner ; Panu Matilainen ;
> Traynor, Kevin ; Zoltan Kiss
> ; Christian Ehrhardt
> 
> Subject: [PATCH v8 0/5] Convert DPDK configuration from command line to DB
> based
> 
> Currently, configuration of DPDK parameters is done via the command line
> through a --dpdk **OPTIONS** -- command line argument. This has a number of
> challenges, including:
> * It must be the first option passed to ovs-vswitchd
> * It breaks from the way most other things are configured in OVS
> * It doesn't allow an easy way to populate defaults
> 
> 
> This series brings the following changes to openvswitch:
> * All DPDK options are taken from the ovs database rather than the
>   command line
> * DPDK lcores are optionally auto-assigned to a single core based on the
>   bridge coremask.
> * Updated documentation
> 
> v2:
> * Dropped the vhost-user socket configuration options. Those can be re-added
>   as an extension
> * Incorporated feedback from Kevin Traynor.
> 
> v3:
> * Went back to a global dpdk-init
> * Language cleanup and various minor fixes
> 
> v4:
> * Added a way to pass arbitrary eal arguments
> 
> v5:
> * Restore the socket-mem default, and fix up the ovs-dev.py script, along
>   with the manpage for ovsdb-server
> 
> v6:
> * Correct a documentation issue with INSTALL.DPDK.md
> * Correct a non-dpdk enabled OVS incorrect warning variable
> * Remove an excess whitespace
> 
> v7:
> * After testing by Christian with dpdk-alloc-mem
> 
> v8:
> * Confirmed ``make check`` operation with and without dpdk.
>   Retested on live-host

Hi,

I've done some testing on this patchset and I couldn't find any issues.
 - tested that -c and -n defaults and explicit values are catered for
 - tested dpdk-init=t/f leads to dpdk initialization or not
 - tested that use of both dpdk-socket-mem and dpdk-alloc-mem is caught 
 - tested that a string can be passed in through extra_args
 - tested the code won't catch using a db entry dpdk-socket-mem and also
   putting --socket-mem in extra_args, however dpdk will barf
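
As a rough illustration of the third check in the list above (both
dpdk-socket-mem and dpdk-alloc-mem being set), a mutual-exclusion test could
look like the C sketch below. struct mem_config and mem_config_valid() are
hypothetical names for this example and are not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the two database keys named above.
 * NULL means the key is not set. */
struct mem_config {
    const char *socket_mem;   /* Value of dpdk-socket-mem, if any. */
    const char *alloc_mem;    /* Value of dpdk-alloc-mem, if any. */
};

/* Returns true if at most one of the two mutually exclusive keys is set. */
static bool
mem_config_valid(const struct mem_config *cfg)
{
    return !(cfg->socket_mem && cfg->alloc_mem);
}
```

A check of this shape only looks at the database keys, which matches the last
item in the list: a --socket-mem placed in extra_args is not seen by it.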

On command line args vs. db entries vs. a string of args in the db, if there
is doubt on this then let's debate further. This will change how ovs with
dpdk is used, so better debate it out and get it right.

There's one or two of the db entries that may be able to be reused later for
other things e.g. vhostuser socket location, so that would be a + for them.
Backwards compatibility would be a + for command line args. Daniele has
mentioned scripting also. I'm sure there's other +/-'s.

Kevin.

> 
> Aaron Conole (5):
>   netdev-dpdk: Restore thread affinity after DPDK init
>   netdev-dpdk: Convert initialization from cmdline to db
>   netdev-dpdk: Autofill lcore coremask if absent
>   netdev-dpdk: Allow arbitrary eal arguments
>   NEWS: Announce the DPDK EAL configuration change
> 
>  FAQ.md |   6 +-
>  INSTALL.DPDK.md|  90 ++---
>  NEWS   |   5 +
>  lib/netdev-dpdk.c  | 327 ++-
> --
>  lib/netdev-dpdk.h  |  22 ++-
>  utilities/ovs-dev.py   |   7 +-
>  vswitchd/bridge.c  |   3 +
>  vswitchd/ovs-vswitchd.8.in |   5 +-
>  vswitchd/ovs-vswitchd.c|  25 +---
>  vswitchd/vswitch.xml   | 128 +-
>  10 files changed, 513 insertions(+), 105 deletions(-)
> 
> --
> 2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command line to DB based

2016-02-10 Thread Traynor, Kevin

> -Original Message-
> From: Aaron Conole [mailto:acon...@bytheb.org]
> Sent: Wednesday, February 10, 2016 1:23 PM
> To: Traynor, Kevin 
> Cc: dev@openvswitch.org; Flavio Leitner ; Mooney, Sean K
> 
> Subject: Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command
> line to DB based
> 
> "Traynor, Kevin"  writes:
> >> -Original Message-
> >> From: Aaron Conole [mailto:acon...@redhat.com]
> >> Sent: Friday, January 29, 2016 5:57 PM
> >> To: dev@openvswitch.org
> >> Cc: Flavio Leitner ; Panu Matilainen
> ;
> >> Traynor, Kevin ; Zoltan Kiss
> >> ; Christian Ehrhardt
> >> 
> >> Subject: [PATCH v8 0/5] Convert DPDK configuration from command line to DB
> >> based
> >>
> >> Currently, configuration of DPDK parameters is done via the command line
> >> through a --dpdk **OPTIONS** -- command line argument. This has a number
> of
> >> challenges, including:
> >> * It must be the first option passed to ovs-vswitchd
> >> * It breaks from the way most other things are configured in OVS
> >> * It doesn't allow an easy way to populate defaults
> >>
> >>
> >> This series brings the following changes to openvswitch:
> >> * All DPDK options are taken from the ovs database rather than the
> >>   command line
> >> * DPDK lcores are optionally auto-assigned to a single core based on the
> >>   bridge coremask.
> >> * Updated documentation
> >>
> >> v2:
> >> * Dropped the vhost-user socket configuration options. Those can be re-
> added
> >>   as an extension
> >> * Incorporated feedback from Kevin Traynor.
> >>
> >> v3:
> >> * Went back to a global dpdk-init
> >> * Language cleanup and various minor fixes
> >>
> >> v4:
> >> * Added a way to pass arbitrary eal arguments
> >>
> >> v5:
> >> * Restore the socket-mem default, and fix up the ovs-dev.py script, along
> >>   with the manpage for ovsdb-server
> >>
> >> v6:
> >> * Correct a documentation issue with INSTALL.DPDK.md
> >> * Correct a non-dpdk enabled OVS incorrect warning variable
> >> * Remove an excess whitespace
> >>
> >> v7:
> >> * After testing by Christian with dpdk-alloc-mem
> >>
> >> v8:
> >> * Confirmed ``make check`` operation with and without dpdk.
> >>   Retested on live-host
> >
> > Hi,
> >
> > I've done some testing on this patchset and I couldn't find any
> > issues.
> 
> Cool; does that mean I have your Tested-by? :)

Yes, you can add Tested-by and Acked-by for me. I'm Signed-off-by on 1/5
so my ack doesn't make sense for that one.

> 
> >  - tested that -c and -n defaults and explicit values are catered for
> >  - tested dpdk-init=t/f leads to dpdk initialization or not
> >  - tested that use of both dpdk-socket-mem and dpdk-alloc-mem is caught
> >  - tested that a string can be passed in through extra_args
> >  - tested the code won't catch using a db entry dpdk-socket-mem and also
> >putting --socket-mem in extra_args, however dpdk will barf
> >
> > On command line args vs. db entries vs. a string of args in the db, if
> there
> > is doubt on this then let's debate further. This will change how ovs with
> > dpdk is used, so better debate it out and get it right.
> 
> I don't think there's any real doubt. I think the approach is the best
> way to do this. I have agreement from almost everyone else, I think?
> Anyone still need to be convinced?
> 
> > There's one or two of the db entries that may be able to reused later for
> > other things e.g. vhostuser socket location, so that would be a + for them.
> > Backwards compatibility would be a + for command line args. Daniele has
> > mentioned scripting also. I'm sure there's other +/-'s.
> 
> I don't know - scripting vswitchd? I think that sounds a little strange;
> it isn't some kind of ephemeral service that comes and goes. And it's
> not like this patch prevents the same kinds of arbitrary commands to be
> passed to the EAL (since 4/5 does precisely that). The only change
> required is doing ovs-vsctl before ovs-vswitch in the 'starting
> vswitchd' case. Is that really a huge deal?
> 
> There's always pros and cons. I haven't heard any explicit NAK, or any
> explicit ACK. It would be nice for that to happen, since I can't
> maintain this series out-of-tree forever, and th

Re: [ovs-dev] [PATCH v9 5/6] netdev-dpdk: Check dpdk-extra when reading db

2016-02-15 Thread Traynor, Kevin

> -Original Message-
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Thursday, February 11, 2016 8:16 PM
> To: dev@openvswitch.org
> Cc: Flavio Leitner ; Panu Matilainen ;
> Traynor, Kevin ; Wojciechowicz, RobertX
> ; Zoltan Kiss ;
> Ansis Atteka ; Christian Ehrhardt
> ; Mooney, Sean K 
> Subject: [PATCH v9 5/6] netdev-dpdk: Check dpdk-extra when reading db
> 
> A previous patch introduced the ability to pass arbitrary EAL command
> line options via the dpdk_extras database entry. This commit enhances
> that by warning the user when such a configuration is detected and
> preferring the value in the database.

hi Aaron,

I think we need a small doc update because of this patch. This is
allowing (with warning) and preferring something that docs from 4/6
say is not allowed. 

+   values. You MUST not pass arguments via dpdk-extra if they can be passed
+   via another database parameter.

I tested this functionality for -c and --socket-mem and I see that
dpdk_extras is being preferred.
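
The precedence described above can be sketched in standalone C as follows.
This is a simplified illustration, not the code from the patch: struct
db_option, args_contain() and merge_args() are hypothetical names, and placing
the database options ahead of the extra arguments is an assumption of the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of an EAL flag with its database-supplied value. */
struct db_option {
    const char *flag;    /* For example "-c" or "--socket-mem". */
    const char *value;
};

/* Returns true if 'flag' appears as its own token in 'args'. */
static bool
args_contain(const char *const *args, size_t n_args, const char *flag)
{
    for (size_t i = 0; i < n_args; i++) {
        if (!strcmp(args[i], flag)) {
            return true;
        }
    }
    return false;
}

/* Builds the final argument list in 'out' (capacity 'out_cap').  Database
 * options whose flag is already present in 'extra' are skipped, then the
 * extra arguments are appended unchanged.  Returns the number of entries
 * written, or -1 if 'out' is too small. */
static int
merge_args(const struct db_option *db, size_t n_db,
           const char *const *extra, size_t n_extra,
           const char **out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < n_db; i++) {
        if (args_contain(extra, n_extra, db[i].flag)) {
            continue;    /* The extra arguments win. */
        }
        if (n + 2 > out_cap) {
            return -1;
        }
        out[n++] = db[i].flag;
        out[n++] = db[i].value;
    }
    for (size_t i = 0; i < n_extra; i++) {
        if (n + 1 > out_cap) {
            return -1;
        }
        out[n++] = extra[i];
    }
    return (int) n;
}
```

Note that this sketch matches whole tokens only, so an extra argument written
as a single "--socket-mem=1024,0" token would not be recognised by
args_contain() here.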

Tested-by: Kevin Traynor 

One other thing I noticed is that 1/6 is giving a build warning, as
'cuse_dev_name' now needs to be wrapped in the #ifdef VHOST_CUSE

lib/netdev-dpdk.c:109:14: warning: 'cuse_dev_name' defined but not used [-Wunused-variable]
 static char *cuse_dev_name = NULL;/* Character device cuse_dev_name. */

Not sure if these are big enough to warrant a re-spin or they could be quick
follow ups.

Kevin.

> 
> Suggested-by: Sean K Mooney 
> Signed-off-by: Aaron Conole 
> ---
> v9:
> * Added as suggested by Sean K Mooney
> 
>  lib/netdev-dpdk.c | 66 +
> --
>  1 file changed, 55 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 1d6b907..b376e40 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2257,6 +2257,17 @@ dpdk_option_extend(char ***argv, int argc, const char
> *option,
>  (*argv)[argc+1] = xstrdup(value);
>  }
> 
> +static char **
> +move_argv(char ***argv, size_t cur_size, char **src_argv, size_t src_argc)
> +{
> +char **newargv = grow_argv(argv, cur_size, src_argc);
> +while(src_argc--) {
> +newargv[cur_size+src_argc] = src_argv[src_argc];
> +src_argv[src_argc] = 0;
> +}
> +return newargv;
> +}
> +
>  static int
>  extra_dpdk_args(const char *ovs_cfg, char ***argv, int argc)
>  {
> @@ -2274,9 +2285,21 @@ extra_dpdk_args(const char *ovs_cfg, char ***argv, int
> argc)
>  return ret;
>  }
> 
> +static bool
> +argv_contains(char **argv_haystack, const size_t argc_haystack,
> +  const char *needle)
> +{
> +for(size_t i = 0; i < argc_haystack; ++i) {
> +if (!strcmp(argv_haystack[i], needle))
> +return true;
> +}
> +return false;
> +}
> +
>  static int
>  construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg,
> -   char ***argv, const int initial_size)
> +   char ***argv, const int initial_size,
> +   char **extra_args, const size_t extra_argc)
>  {
>  struct dpdk_options_map {
>  const char *ovs_configuration;
> @@ -2298,8 +2321,13 @@ construct_dpdk_options(const struct
> ovsrec_open_vswitch *ovs_cfg,
>  lookup = opts[i].default_value;
> 
>  if(lookup) {
> -dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup);
> -ret += 2;
> +if (!argv_contains(extra_args, extra_argc, opts[i].dpdk_option))
> {
> +dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup);
> +ret += 2;
> +} else {
> +VLOG_WARN("Ignoring database defined option '%s' due to "
> +  "dpdk_extras config", opts[i].dpdk_option);
> +}
>  }
>  }
> 
> @@ -2308,7 +2336,8 @@ construct_dpdk_options(const struct ovsrec_open_vswitch
> *ovs_cfg,
> 
>  static int
>  construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg,
> - char ***argv, const int initial_size)
> + char ***argv, const int initial_size,
> + char **extra_args, const size_t extra_argc)
>  {
>  struct dpdk_exclusive_options_map {
>  const char *category;
> @@ -2356,9 +2385,15 @@ construct_dpdk_mutex_options(const struct
> ovsrec_open_vswitch *ovs_cfg,
>  ovs_abort(0, "Unable to cope with DPDK settings.");
>  }
> 
> -dpdk_option_extend(argv, ret, popt->eal_dpdk_options[found_pos],
> -  

Re: [ovs-dev] [PATCH] netdev-dpdk: Put cuse thread into quiescent state.

2015-03-26 Thread Traynor, Kevin

> -Original Message-
> From: Ben Pfaff [mailto:b...@nicira.com]
> Sent: Thursday, March 26, 2015 4:28 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Put cuse thread into quiescent
> state.
> 
> On Wed, Mar 25, 2015 at 05:43:06PM +, Kevin Traynor wrote:
> > As ovsrcu_synchronize() is used when setting virtio_dev to NULL,
> > ovsrcu_quiesce_start() must be called before destroy_device() returns.
> > Otherwise there will be warnings about the thread not quiescing.
> > Use of ovs_thread_create() instead of pthread_create() is optional but
> > as we are now setting quiescent state, it is added.
> >
> > Signed-off-by: Kevin Traynor 
> 
> Is the quiescent state permanent?  Normally ovsrcu_quiesce_start() and
> ovsrcu_quiesce_end() should be paired, but this patch only seems to add
> the former.

ovsrcu_quiesce_end() is called as part of the ovsrcu_synchronize() call when
destroy_device() is called. I could put an explicit ovsrcu_quiesce_end() call
for readability if that's preferred? It would be redundant in terms of
functionality but would show the start/end pair. Otherwise I could add a
comment to explain.
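
To show the state changes I mean, here is a toy C model. It is not the OVS
RCU code: the toy_* functions are hypothetical stand-ins that only track a
flag, and toy_synchronize() simply assumes the behaviour described above
(the caller is no longer quiescent when it returns).

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_quiescent;    /* Toy model of one thread's RCU state. */
static int toy_sync_calls;    /* Counts toy_synchronize() calls. */

static void
toy_quiesce_start(void)
{
    toy_quiescent = true;
}

static void
toy_quiesce_end(void)
{
    toy_quiescent = false;
}

/* Toy synchronize: quiesces while "waiting", then ends the quiescent
 * state before returning, as assumed in the explanation above. */
static void
toy_synchronize(void)
{
    toy_quiesce_start();
    /* A real implementation would wait for the other threads here. */
    toy_quiesce_end();
    toy_sync_calls++;
}

/* Toy destroy_device(): synchronize leaves the thread non-quiescent, so
 * the quiescent state is entered again before returning. */
static void
toy_destroy_device(void)
{
    toy_synchronize();
    assert(!toy_quiescent);
    toy_quiesce_start();
}
```

In this model the thread is quiescent both before and after
toy_destroy_device(), and the only "end" happens inside toy_synchronize().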


Re: [ovs-dev] [PATCH] netdev-dpdk: Put cuse thread into quiescent state.

2015-03-27 Thread Traynor, Kevin
> I changed this part of original patch so that there is no need for the
> fuse thread enter into quiescent state. But I think it is unavoidable
> without changing RCU implementation. So we need this patch. If you add
> comments and resend I will apply it.

Thanks. I will resend.


Re: [ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add dpdk vhost-user ports

2015-04-02 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Pravin Shelar
> Sent: Friday, March 27, 2015 6:08 PM
> To: Flavio Leitner
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add dpdk vhost-user ports
> 
> On Fri, Mar 27, 2015 at 6:57 AM, Flavio Leitner  wrote:
> > On Fri, 20 Mar 2015 10:53:14 -0700
> > Pravin Shelar  wrote:
> >
> >> On Thu, Mar 19, 2015 at 11:48 AM, Ciara Loftus
> >>  wrote:
> >> > This patch adds support for a new port type to the userspace
> >> > datapath called dpdkvhostuser. It adds to the existing
> >> > infrastructure of vhost-cuse, however disables vhost-cuse ports in
> >> > favour of vhost-user ports.
> >> >
> >> > A new dpdkvhostuser port will create a unix domain socket which when
> >> > provided to QEMU is used to facilitate communication between the
> >> > virtio-net device on the VM and the OVS port.
> >> >
> >> Thanks for the patch. I have pushed OVS DPDK vHost cuse patch. Once we
> >> add support for vhost user, vhost-cuse support should be dropped.
> >
> > Do you mean to literally take vhost-cuse out of the code? If so, yes,
> > I think that makes sense.
> >
> 
> Yes. thats what I meant.

One issue with removing is that vhost-user is only supported from QEMU 2.1.0
onwards. If anyone is using an older version of QEMU they would not be able 
to use vhost-user, whereas vhost-cuse will support older versions of QEMU.

> >
> >> I think we need to wait for dpdk 2.0 for vhost-user support. So can
> >> you post rebased patch once we move to DPDK 2.0?
> >
> > Makes sense too. I'd suggest to post another RFC in the meanwhile with
> > the documentation and code updated to remove vhost-cuse entirely so we
> > can spot/test bugs before the official DPDK 2.0 is out the doors.
> >
> 
> Sure, whenever it is ready.


Re: [ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add dpdk vhost-user ports

2015-04-07 Thread Traynor, Kevin

> -Original Message-
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Friday, April 3, 2015 6:32 AM
> To: Traynor, Kevin
> Cc: Flavio Leitner; Rogers, Gerald; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add dpdk vhost-user ports
> 
> On Thu, Apr 2, 2015 at 8:37 AM, Traynor, Kevin 
> wrote:
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Pravin Shelar
> >> Sent: Friday, March 27, 2015 6:08 PM
> >> To: Flavio Leitner
> >> Cc: dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add dpdk vhost-user
> ports
> >>
> >> On Fri, Mar 27, 2015 at 6:57 AM, Flavio Leitner  wrote:
> >> > On Fri, 20 Mar 2015 10:53:14 -0700
> >> > Pravin Shelar  wrote:
> >> >
> >> >> On Thu, Mar 19, 2015 at 11:48 AM, Ciara Loftus
> >> >>  wrote:
> >> >> > This patch adds support for a new port type to the userspace
> >> >> > datapath called dpdkvhostuser. It adds to the existing
> >> >> > infrastructure of vhost-cuse, however disables vhost-cuse ports in
> >> >> > favour of vhost-user ports.
> >> >> >
> >> >> > A new dpdkvhostuser port will create a unix domain socket which when
> >> >> > provided to QEMU is used to facilitate communication between the
> >> >> > virtio-net device on the VM and the OVS port.
> >> >> >
> >> >> Thanks for the patch. I have pushed OVS DPDK vHost cuse patch. Once we
> >> >> add support for vhost user, vhost-cuse support should be dropped.
> >> >
> >> > Do you mean to literally take vhost-cuse out of the code? If so, yes,
> >> > I think that makes sense.
> >> >
> >>
> >> Yes. thats what I meant.
> >
> > One issue with removing is that vhost-user is only supported from QEMU
> 2.1.0
> > onwards. If anyone is using an older version of QEMU they would not be able
> > to use vhost-user, whereas vhost-cuse will support older versions of QEMU.
> >
> 
> Are there distribution still using older QEMU?

I had a scan around for qemu versions - if anyone wants to correct/add please
do so.

Fedora 21: 2.1.3
Fedora 20: 1.6.2
CentOS 7/ RHEL 7.0: 1.5.3
RHEL 7.1: 2.1
Ubuntu 14.10: 2.1
Ubuntu 14.04: 2.0.0
Debian wheezy-backports: 2.1
Debian wheezy: 1.1.2
OpenSuse 13.2: 2.1
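
For anyone wanting to check a version from the list above against the 2.1.0
cut-off mentioned earlier in the thread, a small C sketch follows.
qemu_supports_vhost_user() is a hypothetical helper written for this example;
it only compares version numbers and knows nothing about distribution
backports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if 'version' ("major.minor" or "major.minor.patch") is at
 * least 2.1.0.  Strings without at least major and minor numbers are
 * treated as unsupported. */
static bool
qemu_supports_vhost_user(const char *version)
{
    int major = 0;
    int minor = 0;
    int patch = 0;

    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) < 2) {
        return false;
    }
    if (major != 2) {
        return major > 2;
    }
    /* Major is 2: any minor >= 1 qualifies, whatever the patch level. */
    return minor >= 1;
}
```

By this check Fedora 21 (2.1.3) and RHEL 7.1 (2.1) would qualify, while
Ubuntu 14.04 (2.0.0) and Fedora 20 (1.6.2) would not.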



Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.

2015-04-09 Thread Traynor, Kevin

> -Original Message-
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Thursday, April 9, 2015 9:10 PM
> To: Traynor, Kevin; Daniele Di Proietto
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.
> 
> On Wed, Apr 8, 2015 at 8:43 AM, Kevin Traynor 
> wrote:
> > Change eth rx burst size from 192 to 32. This significantly
> > improves performance for packets that will be forwarded
> > through dpdkvhost ports, as the max dpdkvhost tx burst
> > size (32) will not be exceeded. There are negligible
> > effects in other scenarios.
> >
> > Signed-off-by: Kevin Traynor 
> 
> Daniele,
> You mentioned that you are going to fix the issue by handling
> different burst size of vhost devices. Do you have the fix?

It wasn't clear in the commit message but just to clarify, the
dpdkvhost max burst size is set in the DPDK vhost library and if
we try and enqueue >32 packets only 32 will be enqueued. We can
add a back-off/retry but it will give better performance to
ensure that the vhost burst size is not exceeded.
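
For reference, a retry approach of the kind mentioned above could look
roughly like the C sketch below. It is only an illustration under stated
assumptions: enqueue_fn, fake_enqueue() and send_in_chunks() are hypothetical
names, and fake_enqueue() merely imitates a call that accepts at most 32
packets at a time. It is not the DPDK vhost API.

```c
#include <assert.h>
#include <stddef.h>

enum { VHOST_MAX_BURST = 32 };    /* Burst limit described above. */

/* Hypothetical enqueue callback: takes up to 'n' packets and returns how
 * many were actually accepted. */
typedef size_t enqueue_fn(void *const *pkts, size_t n);

static size_t fake_calls;         /* Number of fake_enqueue() calls. */

/* Stand-in enqueue that never accepts more than VHOST_MAX_BURST packets. */
static size_t
fake_enqueue(void *const *pkts, size_t n)
{
    (void) pkts;
    fake_calls++;
    return n > VHOST_MAX_BURST ? VHOST_MAX_BURST : n;
}

/* Sends 'cnt' packets in chunks of at most VHOST_MAX_BURST.  A chunk for
 * which nothing is accepted is retried up to 'max_retries' times before
 * giving up.  Returns the number of packets sent. */
static size_t
send_in_chunks(enqueue_fn *enqueue, void *const *pkts, size_t cnt,
               unsigned max_retries)
{
    size_t sent = 0;
    unsigned retries = 0;

    while (sent < cnt) {
        size_t chunk = cnt - sent;
        size_t done;

        if (chunk > VHOST_MAX_BURST) {
            chunk = VHOST_MAX_BURST;
        }
        done = enqueue(pkts + sent, chunk);
        if (done) {
            sent += done;
            retries = 0;
        } else if (retries++ >= max_retries) {
            break;
        }
    }
    return sent;
}
```

With a 192 packet rx burst this takes six enqueue calls per burst, which is
the extra work that keeping the rx burst at 32 avoids.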

> 
> > ---
> >  lib/netdev-dpdk.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index f69154b..eb51072 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -145,7 +145,7 @@ static const struct rte_eth_txconf tx_conf = {
> >  .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS|ETH_TXQ_FLAGS_NOOFFLOADS,
> >  };
> >
> > -enum { MAX_RX_QUEUE_LEN = 192 };
> > +enum { MAX_RX_QUEUE_LEN = 32 };
> >  enum { MAX_TX_QUEUE_LEN = 384 };
> >  enum { DPDK_RING_SIZE = 256 };
> >  BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
> > --
> > 1.7.4.1
> >


Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.

2015-04-13 Thread Traynor, Kevin

> -Original Message-
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Friday, April 10, 2015 2:01 PM
> To: Traynor, Kevin
> Cc: Pravin Shelar; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.
> 
> 
> > On 10 Apr 2015, at 00:22, Traynor, Kevin  wrote:
> >
> >
> >> -Original Message-
> >> From: Pravin Shelar [mailto:pshe...@nicira.com]
> >> Sent: Thursday, April 9, 2015 10:50 PM
> >> To: Traynor, Kevin
> >> Cc: Daniele Di Proietto; dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.
> >>
> >> On Thu, Apr 9, 2015 at 2:36 PM, Traynor, Kevin 
> >> wrote:
> >>>
> >>>> -----Original Message-
> >>>> From: Pravin Shelar [mailto:pshe...@nicira.com]
> >>>> Sent: Thursday, April 9, 2015 9:10 PM
> >>>> To: Traynor, Kevin; Daniele Di Proietto
> >>>> Cc: dev@openvswitch.org
> >>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.
> >>>>
> >>>> On Wed, Apr 8, 2015 at 8:43 AM, Kevin Traynor 
> >>>> wrote:
> >>>>> Change eth rx burst size from 192 to 32. This significantly
> >>>>> improves performance for packets that will be forwarded
> >>>>> through dpdkvhost ports, as the max dpdkvhost tx burst
> >>>>> size (32) will not be exceeded. There are negligible
> >>>>> effects in other scenarios.
> >>>>>
> >>>>> Signed-off-by: Kevin Traynor 
> >>>>
> >>>> Daniele,
> >>>> You mentioned that you are going to fix the issue by handling
> >>>> different burst size of vhost devices. Do you have the fix?
> >>>
> >>> It wasn't clear in the commit message but just to clarify, the
> >>> dpdkvhost max burst size is set in the DPDK vhost library and if
> >>> we try and enqueue >32 packets only 32 will be enqueued. We can
> >>> add a back-off/retry but it will give better performance to
> >>> ensure that the vhost burst size is not exceeded.
> >>>
> >>
> >> How about adding for loop in vhost_send() to handle burst of packets
> >> larger than 32?
> >
> > Yeah, the DPDK sample app uses a backoff(small sleep)/retry loop but
> > you could put it in a busy loop too. I was planning to add some retry
> > code anyway as it's possible that you could get bursts of >32 from a
> > different interface, so it's good to handle this.
> >
> > The virtqueues are of very limited size (128 packets I think), so they
> > are not ideally suited to bursty traffic which I guess is why DPDK have
> > set the max burst size to 32. Time the PMD spends waiting/retrying is
> > time it could be doing something else, so I think it is better to reduce
> > the burst size where possible. I haven't seen any advantage to the 192 eth
> > rx burst size in my testing as changing to 32 I get negligible difference
> > on phy2phy tests (~1%) and I get a 2x-3x performance boost for vhost
> > loopback tests, but perhaps there are some use cases where it is
> > advantageous?
> 
> I agree with you guys that we should use a loop here,
> like in dpdk_queue_flush__(). How much slower would that be?
> 
> 

I've tested a simple retry loop with the 192 burst size and the performance
is very poor. With the retry loop and a burst size of 32, I see good
performance. I think this is because with the bigger burst sizes we are
hitting limits/contention in the vhost interface and we have to loop until
they are resolved. 

I think we need a retry loop, but it's not a performant replacement for
optimizing the burst size where we can at the moment. I can share the code
if you want to test it out on your systems.
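
A rough sketch of the retry idea discussed above, for illustration only. It is
not the code that was tested: mock_vhost_enqueue(), send_with_retry() and the
max_retries limit are hypothetical names, and the mock only imitates the
32-packet per-call cap described in this thread.

```c
#include <assert.h>
#include <stddef.h>

enum { VHOST_MAX_BURST = 32 };        /* Per-call cap described in the thread. */

static unsigned int mock_busy_calls;  /* Next N calls accept nothing. */
static unsigned int enqueue_calls;    /* Number of enqueue attempts made. */

/* Hypothetical stand-in for the vhost enqueue call.  Accepts at most
 * VHOST_MAX_BURST packets per call and returns how many it accepted. */
static unsigned int
mock_vhost_enqueue(void **pkts, unsigned int cnt)
{
    (void) pkts;
    enqueue_calls++;
    if (mock_busy_calls) {
        mock_busy_calls--;
        return 0;
    }
    return cnt > VHOST_MAX_BURST ? VHOST_MAX_BURST : cnt;
}

/* Tries to send 'cnt' packets by calling the enqueue function repeatedly.
 * Gives up once more than 'max_retries' consecutive calls accept nothing.
 * Returns the number of packets actually enqueued. */
static unsigned int
send_with_retry(void **pkts, unsigned int cnt, unsigned int max_retries)
{
    unsigned int sent = 0;
    unsigned int retries = 0;

    assert(pkts != NULL);
    while (sent < cnt) {
        unsigned int n = mock_vhost_enqueue(pkts + sent, cnt - sent);

        if (n) {
            sent += n;
            retries = 0;
        } else if (++retries > max_retries) {
            break;
        }
    }
    return sent;
}
```

With a burst of 192 the loop above needs six enqueue calls in the best case,
while a burst of 32 needs one, which is the cost the smaller rx burst avoids.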




Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.

2015-04-14 Thread Traynor, Kevin

> -----Original Message-----
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Monday, April 13, 2015 6:53 PM
> To: Traynor, Kevin
> Cc: Daniele Di Proietto; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Change eth rx burst size.
> 

...

> 
> We need to loop to handle different burst sizes. So Can you post patch
> with the loop and change default burst size to 32 for better
> performance. We can optimize performance for different burst sizes
> later on.

Sure, I will tidy it up and post the patch.


Re: [ovs-dev] [PATCH] netdev-dpdk: Add vhost enqueue retries.

2015-05-12 Thread Traynor, Kevin

> -----Original Message-----
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Tuesday, May 12, 2015 5:24 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Add vhost enqueue retries.
> 
> On Mon, May 11, 2015 at 5:28 AM, Kevin Traynor 
> wrote:
> > The max allowed burst size for a single vhost enqueue is 32.
> > This code facilitates trying to send greater than the burst
> > size of packets to the vhost interface by adding a retry loop
> > and calling vhost enqueue multiple times. As this could
> > potentially block, a timeout is added.
> >
> > Signed-off-by: Kevin Traynor 
> 
> > ---
> >  lib/netdev-dpdk.c |   43 +--
> >  1 files changed, 37 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index cbb266d..3ab5995 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -104,6 +104,11 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> >  /* Character device cuse_dev_name. */
> >  char *cuse_dev_name = NULL;
> >
> > +/*
> > + * Maximum amount of time in micro seconds to try and enqueue to vhost.
> > + */
> > +#define VHOST_ENQ_RETRY_USECS 100
> > +
> >  static const struct rte_eth_conf port_conf = {
> >  .rxmode = {
> >  .mq_mode = ETH_MQ_RX_RSS,
> > @@ -901,7 +906,12 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, struct
> dp_packet **pkts,
> >  {
> >  struct netdev_dpdk *vhost_dev = netdev_dpdk_cast(netdev);
> >  struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
> > -int tx_pkts, i;
> > +struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> > +unsigned int total_pkts = cnt;
> > +unsigned int tx_pkts, i;
> > +unsigned int expired = 0;
> > +uint64_t start;
> > +uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
> >
> 
> I changed the transmit function a bit to avoid the division on every
> transmit call and pushed patch to master.
> 
> Thanks.

The changes look good - they minimize the timer impact in the normal case,
thanks.
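
The pushed change is not shown in this thread, so the following is only one
way the division could be kept off the normal path: compute the timeout
lazily, only when the first enqueue attempt was incomplete. The names
mock_timer_hz(), retry_timeout_ticks() and timeout_for_send() are
hypothetical, and the 2 GHz timer frequency is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_ENQ_RETRY_USECS 100

static unsigned int timeout_computations;   /* Counts slow-path divisions. */

/* Hypothetical stand-in for a timer frequency query; 2 GHz is assumed. */
static uint64_t
mock_timer_hz(void)
{
    return UINT64_C(2000000000);
}

/* Converts the retry budget from microseconds to timer ticks.  This is the
 * only place where the division happens. */
static uint64_t
retry_timeout_ticks(void)
{
    timeout_computations++;
    return VHOST_ENQ_RETRY_USECS * mock_timer_hz() / 1000000;
}

/* Returns 0 when the first attempt enqueued everything (no retry needed, no
 * division performed), otherwise the retry timeout in timer ticks. */
static uint64_t
timeout_for_send(unsigned int requested, unsigned int enqueued_first_try)
{
    assert(enqueued_first_try <= requested);
    if (enqueued_first_try == requested) {
        return 0;
    }
    return retry_timeout_ticks();
}
```

An integer divisor is used here instead of the 1E6 floating-point constant
from the quoted patch, which also keeps the arithmetic in uint64_t.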


Re: [ovs-dev] [PATCH 0/7] Userspace datapath performance improvements

2015-05-12 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, April 23, 2015 7:40 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 0/7] Userspace datapath performance improvements
> 
> This series contains different tweaks to improve the performance of the
> userspace datapath with DPDK ports.
> 
> The first commits reduce the size of struct dp_packet to three cachelines
> (two used by DPDK and one for our metadata). I've put in also some style
> fixes for lib/dp-packet.h
> 
> Then, a microoptimization in the packet metadata initialization (which
> appears to be a bottleneck for simple workflows), together with the
> dp_packet changes, seems to improve single flow phy2phy throughput
> 
> The last two commits change the way the userspace datapath handles output
> batches: this should give a significant improvement to multiple megaflows
> scenarios
> 

I've tested with a single flow and it shows a 9% throughput increase for
Phy-2-Phy on my system - and that doesn't account for the other improvements
for multiple megaflows. Patches look good to me. I've one minor comment - will
send inline. Thanks.



Re: [ovs-dev] [PATCH 7/7] dpif-netdev: Share emc and fast path output batches.

2015-05-12 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, April 23, 2015 7:40 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 7/7] dpif-netdev: Share emc and fast path output
> batches.
> 
> Until now the exact match cache processing was able to handle only four
> megaflows.  The rest of the packets were passed to the megaflow
> classifier.
> 
> The limit was arbitrarily set to four also because the algorithm used to
> group packets in output batches didn't perform well with a lot of
> megaflows.
> 
> After changing the algorithm and after some performance testing it seems
> much better to just share the same output batches between the exact match
> cache and the megaflow classifier.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dpif-netdev.c | 71 +++--
> --
>  1 file changed, 29 insertions(+), 42 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 333f5a4..0c3f9e7 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3063,7 +3063,6 @@ packet_batch_init(struct packet_batch *batch, struct
> dp_netdev_flow *flow)
>  static inline void
>  packet_batch_execute(struct packet_batch *batch,
>   struct dp_netdev_pmd_thread *pmd,
> - enum dp_stat_type hit_type,
>   long long now)
>  {
>  struct dp_netdev_actions *actions;
> @@ -3077,15 +3076,12 @@ packet_batch_execute(struct packet_batch *batch,
> 
>  dp_netdev_execute_actions(pmd, batch->packets, batch->packet_count,
> true,
>actions->actions, actions->size);
> -
> -dp_netdev_count_packet(pmd, hit_type, batch->packet_count);
>  }
> 
>  static inline bool
>  dp_netdev_queue_batches(struct dp_packet *pkt,
>  struct dp_netdev_flow *flow, const struct miniflow
> *mf,
> -struct packet_batch *batches, size_t *n_batches,
> -size_t max_batches)
> +struct packet_batch *batches, size_t *n_batches)
>  {
>  struct packet_batch *batch;
> 
> @@ -3100,10 +3096,6 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>  return true;
>  }
> 
> -if (OVS_UNLIKELY(*n_batches >= max_batches)) {
> -return false;
> -}
> -
>  batch = &batches[(*n_batches)++];
>  packet_batch_init(batch, flow);
>  packet_batch_update(batch, pkt, mf);
> @@ -3119,24 +3111,22 @@ dp_packet_swap(struct dp_packet **a, struct dp_packet
> **b)
>  }
> 
>  /* Try to process all ('cnt') the 'packets' using only the exact match cache
> - * 'flow_cache'. If a flow is not found for a packet 'packets[i]', or if
> there
> - * is no matching batch for a packet's flow, the miniflow is copied into
> 'keys'
> - * and the packet pointer is moved at the beginning of the 'packets' array.
> + * 'flow_cache'. If a flow is not found for a packet 'packets[i]', the
> + * miniflow is copied into 'keys' and the packet pointer is moved at the
> + * beginning of the 'packets' array.
>   *
>   * The function returns the number of packets that needs to be processed in
> the
>   * 'packets' array (they have been moved to the beginning of the vector).
>   */
>  static inline size_t
>  emc_processing(struct dp_netdev_pmd_thread *pmd, struct dp_packet **packets,
> -   size_t cnt, struct netdev_flow_key *keys, long long now)
> +   size_t cnt, struct netdev_flow_key *keys,
> +   struct packet_batch batches[], size_t *n_batches)
>  {
> -struct netdev_flow_key key;
> -struct packet_batch batches[4];
>  struct emc_cache *flow_cache = &pmd->flow_cache;
> -size_t n_batches, i;
> -size_t notfound_cnt = 0;
> +struct netdev_flow_key key;
> +size_t i, notfound_cnt = 0;
> 
> -n_batches = 0;
>  miniflow_initialize(&key.mf, key.buf);
>  for (i = 0; i < cnt; i++) {
>  struct dp_netdev_flow *flow;
> @@ -3152,8 +3142,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, struct
> dp_packet **packets,
> 
>  flow = emc_lookup(flow_cache, &key);
>  if (OVS_UNLIKELY(!dp_netdev_queue_batches(packets[i], flow, &key.mf,
> -  batches, &n_batches,
> -  ARRAY_SIZE(batches)))) {
> +  batches, n_batches))) {
>  if (i != notfound_cnt) {
>  dp_packet_swap(&packets[i], &packets[notfound_cnt]);
>  }
> @@ -3162,9 +3151,7 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, struct
> dp_packet **packets,
>  }
>  }
> 
> -for (i = 0; i < n_batches; i++) {
> -packet_batch_execute(&batches[i], pmd, DP_STAT_EXACT_HIT, now);
> -}
> +dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, cnt - notfound_cnt);
> 
>  return notfound_cnt;
>  }
> @@ -3172,7 +3159,8 @@ emc_processing(str

Re: [ovs-dev] netdev-dpdk: Doubt about rings in dpdkr port type

2015-05-18 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Mauricio Vásquez
> Sent: Monday, May 18, 2015 9:31 AM
> To: dev@openvswitch.org; ivano cerrato
> Subject: [ovs-dev] netdev-dpdk: Doubt about rings in dpdkr port type
> 
> I'm performing a series of VM2VM communication testing where a virtual
> machine is a packet generator while the other one just receives all the
> packets.
> 
> When I tested dpdk-ovs[1] I used a port type called dpdkclient: this port
> has 4 rings (tx, rx, alloc_q and free_q). alloc_q and free_q are used
> because, due to an issue with DPDK[2], it's not possible to call
> rte_pktmbuf_alloc or rte_pktmbuf_free inside a guest. (dpdk-ovs provides
> packets to the guest application through alloc_q, and the guest application
> requests freeing of packets through free_q.)
> 
> I'm trying to do the same tests using ovs with dpdk[3] but I realized that
> dpdkr port type has just the rx and tx rings, so at this moment the sender
> application does not have a way to get new mbufs.
> 
> Basically my questions are:
> * Is there another way to get mbufs in a guest application?

I don't know of another safe way to do this other than alloc/free queues.

> * If not, are the alloc_q and free_q planned to be added in dpdkr port?

No immediate plans for this. Depending on what the purpose of your testing is,
you could use vhost to source in one VM and sink in another.

> 
> Thanks in advance.
> 
> [1] https://github.com/01org/dpdk-ovs
> [2]
> https://github.com/01org/dpdk-
> ovs/blob/development/guest/ovs_client/ovs_client.c#L158-L175
> [3] https://github.com/openvswitch/ovs


Re: [ovs-dev] [PATCH] dpdk: Ditch MAX_PKT_BURST macro.

2015-05-18 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ethan Jackson
> Sent: Monday, May 18, 2015 5:08 PM
> To: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] dpdk: Ditch MAX_PKT_BURST macro.
> 
> This version of the patch breaks sparse, I sent out another.
> 
> Ethan

The change makes sense - I tested this version on various dpdk interfaces and
as expected there were no performance issues.

> 
> On Sat, May 16, 2015 at 11:24 AM, Ethan Jackson  wrote:
> > The MAX_PKT_BURST and NETDEV_MAX_RX_BATCH macros had a confusing
> > relationship.  They basically purport to do the same thing, making it
> > unclear which is the source of truth.
> >
> > Furthermore, while NETDEV_MAX_RX_BATCH was 256, MAX_PKT_BURST was 32,
> > meaning we never process a batch larger than 32 packets further adding
> > to the confusion.
> >
> > This patch resolves the issue by removing MAX_PKT_BURST completely,
> > and shrinking the new NETDEV_MAX_BURST macro to only 32.  This should
> > have no change in the execution path except shrinking a couple of
> > structs and memory allocations (can't hurt).
> >
> > Signed-off-by: Ethan Jackson 
> > ---
> >  lib/dpif-netdev.c | 10 +-
> >  lib/netdev-dpdk.c |  7 ++-
> >  lib/netdev.h  |  2 +-
> >  3 files changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index f1d65f5..4216865 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -2500,7 +2500,7 @@ dp_netdev_process_rxq_port(struct
> dp_netdev_pmd_thread *pmd,
> > struct dp_netdev_port *port,
> > struct netdev_rxq *rxq)
> >  {
> > -struct dp_packet *packets[NETDEV_MAX_RX_BATCH];
> > +struct dp_packet *packets[NETDEV_MAX_BURST];
> >  int error, cnt;
> >
> >  cycles_count_start(pmd);
> > @@ -3027,7 +3027,7 @@ struct packet_batch {
> >
> >  struct dp_netdev_flow *flow;
> >
> > -struct dp_packet *packets[NETDEV_MAX_RX_BATCH];
> > +struct dp_packet *packets[NETDEV_MAX_BURST];
> >  };
> >
> >  static inline void
> > @@ -3397,7 +3397,7 @@ dp_execute_cb(void *aux_, struct dp_packet **packets,
> int cnt,
> >
> >  case OVS_ACTION_ATTR_TUNNEL_PUSH:
> >  if (*depth < MAX_RECIRC_DEPTH) {
> > -struct dp_packet *tnl_pkt[NETDEV_MAX_RX_BATCH];
> > +struct dp_packet *tnl_pkt[NETDEV_MAX_BURST];
> >  int err;
> >
> >  if (!may_steal) {
> > @@ -3423,7 +3423,7 @@ dp_execute_cb(void *aux_, struct dp_packet **packets,
> int cnt,
> >
> >  p = dp_netdev_lookup_port(dp, portno);
> >  if (p) {
> > -struct dp_packet *tnl_pkt[NETDEV_MAX_RX_BATCH];
> > +struct dp_packet *tnl_pkt[NETDEV_MAX_BURST];
> >  int err;
> >
> >  if (!may_steal) {
> > @@ -3485,7 +3485,7 @@ dp_execute_cb(void *aux_, struct dp_packet **packets,
> int cnt,
> >
> >  case OVS_ACTION_ATTR_RECIRC:
> >  if (*depth < MAX_RECIRC_DEPTH) {
> > -struct dp_packet *recirc_pkts[NETDEV_MAX_RX_BATCH];
> > +struct dp_packet *recirc_pkts[NETDEV_MAX_BURST];
> >
> >  if (!may_steal) {
> > dp_netdev_clone_pkt_batch(recirc_pkts, packets, cnt);
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 505ab75..b06f92a 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -99,8 +99,6 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> >  #define TX_HTHRESH 0  /* Default values of TX host threshold reg. */
> >  #define TX_WTHRESH 0  /* Default values of TX write-back threshold reg. */
> >
> > -#define MAX_PKT_BURST 32   /* Max burst size for RX/TX */
> > -
> >  /* Character device cuse_dev_name. */
> >  char *cuse_dev_name = NULL;
> >
> > @@ -862,7 +860,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_,
> >  nb_rx = rte_vhost_dequeue_burst(virtio_dev, qid,
> >  vhost_dev->dpdk_mp->mp,
> >  (struct rte_mbuf **)packets,
> > -MAX_PKT_BURST);
> > +NETDEV_MAX_BURST);
> >  if (!nb_rx) {
> >  return EAGAIN;
> >  }
> > @@ -889,8 +887,7 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct
> dp_packet **packets,
> >
> >  nb_rx = rte_eth_rx_burst(rx->port_id, rxq_->queue_id,
> >   (struct rte_mbuf **) packets,
> > - MIN((int) NETDEV_MAX_RX_BATCH,
> > - (int) MAX_PKT_BURST));
> > + NETDEV_MAX_BURST);
> >  if (!nb_rx) {
> >  return EAGAIN;
> >  }
> > diff --git a/lib/netdev.h b/lib/netdev.h
> > index 71c0af1..9d412ee 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -338,7 +338,7 @@ typedef void netdev_dump_queue_stats_cb(unsigned int
> queue_id,
> >  in

Re: [ovs-dev] [PATCH] netdev-dpdk: Use default NIC configuration.

2015-05-21 Thread Traynor, Kevin

> -----Original Message-----
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Thursday, May 21, 2015 4:55 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use default NIC configuration.
> 
> This seems a really nice improvement to me. Thanks!
> 
> Unfortunately the patch doesn't apply anymore, could you
> rebase and include my ack?
> 
> Acked-by: Daniele Di Proietto 

Done - thanks for the ack.


Re: [ovs-dev] Is the tx spinlock in __netdev_dpdk_vhost_send necessary?

2015-06-10 Thread Traynor, Kevin

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Dongjun
> Sent: Tuesday, June 9, 2015 4:36 AM
> To: dev@openvswitch.org
> Subject: [ovs-dev] Is the tx spinlock in __netdev_dpdk_vhost_send necessary?
> 
> This is the source code of "__netdev_dpdk_vhost_send" in master branch:
> "
>  ...
>  /* There is vHost TX single queue, So we need to lock it for TX. */
>  rte_spinlock_lock(&vhost_dev->vhost_tx_lock);
> 
>  do {
>  unsigned int tx_pkts;
> 
>  tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
>cur_pkts, cnt);
>  ...
> "
> 
> There is a spinlock for the vhost TX single queue, but it seems the DPDK API
> "virtio_dev_rx" or "virtio_dev_merge_rx" called in
> "rte_vhost_enqueue_burst" has a lock-free mechanism.
> I tried removing the spinlock and did a simple concurrency test: two
> TCP streams, from the main thread (core 0) and a pmd thread (core 1), were
> sent to the same guest, and it worked well.
> 
> I would deeply appreciate some help in resolving my confusion.

The current thinking is that the locking should be provided by the application
(OVS) and not by the vhost library, as the application knows whether it is
needed or not. There was a recent DPDK patch to make locking optional in the
vhost library; however, I think it may get superseded by some other changes in
the DPDK vhost library. In any case, we are looking to remove the additional
lock.
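
To illustrate the point about the application owning the lock, here is a
minimal sketch using a C11 atomic_flag as the spinlock. It is not the OVS or
DPDK code: struct mock_txq, mock_txq_send() and the 'shared' argument are
hypothetical names, and the "enqueue" is reduced to a counter update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single tx queue that may be used by several threads. */
struct mock_txq {
    atomic_flag lock;        /* Spinlock owned by the application. */
    unsigned int enqueued;   /* Total packets placed on the queue. */
};

static void
mock_txq_lock(struct mock_txq *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock,
                                             memory_order_acquire)) {
        /* Spin until the current holder clears the flag. */
    }
}

static void
mock_txq_unlock(struct mock_txq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Adds 'cnt' packets to the queue and returns the new total.  The caller
 * passes 'shared' = true only when more than one thread can send on this
 * queue, so the lock is taken only when the application knows it is needed. */
static unsigned int
mock_txq_send(struct mock_txq *q, unsigned int cnt, bool shared)
{
    unsigned int total;

    assert(q != NULL);
    if (shared) {
        mock_txq_lock(q);
    }
    q->enqueued += cnt;
    total = q->enqueued;
    if (shared) {
        mock_txq_unlock(q);
    }
    return total;
}
```

The design choice shown is only that the decision to lock sits with the
caller; whether the library underneath is itself safe for concurrent callers
is a separate question that this sketch does not answer.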

> 
> 
> 


Re: [ovs-dev] Is this an issue for DPDK vhost rss?

2015-06-11 Thread Traynor, Kevin

> -----Original Message-----
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Thursday, June 11, 2015 4:36 PM
> To: Flavio Leitner
> Cc: dev@openvswitch.org; Traynor, Kevin; Gray, Mark D
> Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> 
> 
> 
> On 11/06/2015 15:23, "Flavio Leitner"  wrote:
> 
> >On Thu, Jun 11, 2015 at 12:45:10PM +0000, Daniele Di Proietto wrote:
> >>
> >>
> >> On 11/06/2015 04:25, "Flavio Leitner"  wrote:
> >>
> >> >On Wed, Jun 10, 2015 at 03:13:21PM +0000, Daniele Di Proietto wrote:
> >> >>
> >> >>
> >> >> On 10/06/2015 12:48, "Gray, Mark D"  wrote:
> >> >> >
> >> >> >The vhost port won't generate an RSS hash because it is a virtual
> >>NIC.
> >> >> >
> >> >> >>
> >> >> >
> >> >> >> It doesn't cause a problem, just make the pkt fall into a slow
> >>path,
> >> >> >>should we
> >> >> >
> >> >> >> fix it?
> >> >>
> >> >> Thanks for investigating this.  We should definitely fix it.
> >> >>
> >> >> >
> >> >> >> The flag ol_flags may be useful for OVS or let DPDK fix this in
> >>vhost
> >> >> >>rcv.
> >> >> >
> >> >> >
> >> >> >
> >> >> >How do you propose that this would work? The RSS would still have
> >>to be
> >> >> >
> >> >> >generated in software.
> >> >> >
> >> >> >>
> >> >>
> >> >> A simple solution would be to reset the RSS hash inside
> >> >> netdev_dpdk_vhost_rxq_recv().  Other netdev providers that do not
> >> >>support
> >> >> reading the RSS hash (netdev-linux, netdev-bsd) call
> >> >> dp_packet_set_rss_hash(pkt, 0) on every received packet.
> >> >>
> >> >> This would probably have a small impact on performance, but it's
> >>better
> >> >> than trashing the exact match cache.
> >> >>
> >> >> I don't believe there's anything that DPDK can do here (like
> >>resetting
> >> >>the
> >> >> hash when the packet is freed), but please correct me if I'm wrong.
> >> >>
> >> >> Thoughts?
> >> >
> >> >I think Dongjun is right and we should use ol_flags.
> >> >Something like this:
> >> >
> >> >diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> >> >index e4c2593..ad315d1 100644
> >> >--- a/lib/dp-packet.h
> >> >+++ b/lib/dp-packet.h
> >> >@@ -518,7 +518,12 @@ static inline uint32_t
> >> > dp_packet_get_rss_hash(struct dp_packet *p)
> >> > {
> >> > #ifdef DPDK_NETDEV
> >> >-return p->mbuf.hash.rss;
> >> >+if (p->mbuf.ol_flags & PKT_RX_RSS_HASH) {
> >> >+return p->mbuf.hash.rss;
> >> >+}
> >> >+else {
> >> >+return 0;
> >> >+}
> >> > #else
> >> > return p->rss_hash;
> >> > #endif
> >> >
> >> >
> >> >I haven't tested that but rte_vhost_dequeue_burst() does
> >> >ol_flags = 0, so this would be a generic solution for all
> >> >dpdk devices.
> >> >
> >> >Thoughts?
> >> >
> >> >fbl
> >>
> >> This seems a much better solution.  I had no idea that ol_flags
> >> included PKT_RX_RSS_HASH.
> >>
> >> Unfortunately, I tested it and it appears that ixgbe_recv_pkts_vec()
> >> is not setting the flag correctly (it is documented in the comments).
> >> Other ideas?
> >
> >The ixgbe_recv_pkts_vec() should at least set ol_flags to zero
> >because it isn't providing a valid hash and what you get is garbage
> >from the OVS stack.  Is that what you are seeing?
> 
> No, the problem is the opposite: ixgbe_recv_pkts_vec() returns a valid
> rss hash, but doesn't set PKT_RX_RSS_HASH in ol_flags.  

agreed, that's what I'm seeing.

> This doesn't
> affect correctness, but it forces us to always calculate the hash in
> software.

not with the present code, but with the fix suggested by Flavio it would.

> 
> Is it possible to make ixgbe_recv_pkts_vec() set PKT_RX_RSS_HASH in DPDK?

we'll find out about this - not sure if there's a reason for not setting it.

> Otherwise, should we avoid using the vectorized version?

that's debatable - from a performance view it may be better to leave it in
and take the hit elsewhere for the time being if there's a possibility that
it will be changed in DPDK later.
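
A small sketch of the trade-off being discussed, under stated assumptions: the
struct, the flag value MOCK_PKT_RX_RSS_HASH and mock_sw_hash() are hypothetical
stand-ins, not the DPDK mbuf layout or the OVS hash. It only shows that once
the hash is trusted solely when the flag is set, a driver that fills in a
valid hash without setting the flag pushes every packet onto the software
hash path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "the rss field holds a valid hash". */
#define MOCK_PKT_RX_RSS_HASH (UINT64_C(1) << 1)

struct mock_mbuf {
    uint64_t ol_flags;   /* Offload flags set by the receive routine. */
    uint32_t rss;        /* RSS hash, meaningful only if flagged. */
    uint32_t flow_id;    /* Stand-in for the packet headers. */
};

static unsigned int sw_hash_count;   /* Number of software hashes computed. */

/* Hypothetical software hash over the stand-in headers; never returns 0. */
static uint32_t
mock_sw_hash(const struct mock_mbuf *m)
{
    sw_hash_count++;
    return (m->flow_id * UINT32_C(2654435761)) | 1;
}

/* Mirrors the shape of the proposed check: trust the hash only when the
 * receive routine flagged it as valid, otherwise report "no hash" (0). */
static uint32_t
mock_get_rss_hash(const struct mock_mbuf *m)
{
    return (m->ol_flags & MOCK_PKT_RX_RSS_HASH) ? m->rss : 0;
}

/* Returns the hash used for the cache lookup, falling back to software. */
static uint32_t
mock_hash_for_lookup(const struct mock_mbuf *m)
{
    uint32_t hash;

    assert(m != NULL);
    hash = mock_get_rss_hash(m);
    return hash ? hash : mock_sw_hash(m);
}
```

In this model a packet carrying a good hash but no flag is hashed again in
software, which matches the behaviour described above for the vector receive
routine combined with the ol_flags check.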

> 
> That said, we probably should go with your solution anyway.
> 
> >If so, ixgbe_recv_pkts_vec() needs to set ol_flags to zero. This will
> >fix for every user of dpdk, not just OVS.
> >
> >As a possible workaround for OVS we could fix dp_netdev_process_rxq_port()
> >to memset(0) the packets mbufs array which should be faster than loop
> >through the array and set fields to zero.
> 
> I don't understand what you mean here exactly
> 
> Thanks,
> 
> Daniele



Re: [ovs-dev] Is this an issue for DPDK vhost rss?

2015-06-15 Thread Traynor, Kevin

> -----Original Message-----
> From: Traynor, Kevin
> Sent: Thursday, June 11, 2015 4:55 PM
> To: Daniele Di Proietto; Flavio Leitner
> Cc: dev@openvswitch.org; Gray, Mark D
> Subject: RE: [ovs-dev] Is this an issue for DPDK vhost rss?
> 
> 
> > -----Original Message-----
> > From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> > Sent: Thursday, June 11, 2015 4:36 PM
> > To: Flavio Leitner
> > Cc: dev@openvswitch.org; Traynor, Kevin; Gray, Mark D
> > Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> >
> >
> >
> > On 11/06/2015 15:23, "Flavio Leitner"  wrote:
> >
> > >On Thu, Jun 11, 2015 at 12:45:10PM +0000, Daniele Di Proietto wrote:
> > >>
> > >>
> > >> On 11/06/2015 04:25, "Flavio Leitner"  wrote:
> > >>
> > >> >On Wed, Jun 10, 2015 at 03:13:21PM +0000, Daniele Di Proietto wrote:
> > >> >>
> > >> >>
> > >> >> On 10/06/2015 12:48, "Gray, Mark D"  wrote:
> > >> >> >
> > >> >> >The vhost port won't generate an RSS hash because it is a virtual
> > >>NIC.
> > >> >> >
> > >> >> >>
> > >> >> >
> > >> >> >> It doesn't cause a problem, just make the pkt fall into a slow
> > >>path,
> > >> >> >>should we
> > >> >> >
> > >> >> >> fix it?
> > >> >>
> > >> >> Thanks for investigating this.  We should definitely fix it.
> > >> >>
> > >> >> >
> > >> >> >> The flag ol_flags may be useful for OVS or let DPDK fix this in
> > >>vhost
> > >> >> >>rcv.
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> >How do you propose that this would work? The RSS would still have
> > >>to be
> > >> >> >
> > >> >> >generated in software.
> > >> >> >
> > >> >> >>
> > >> >>
> > >> >> A simple solution would be to reset the RSS hash inside
> > >> >> netdev_dpdk_vhost_rxq_recv().  Other netdev providers that do not
> > >> >>support
> > >> >> reading the RSS hash (netdev-linux, netdev-bsd) call
> > >> >> dp_packet_set_rss_hash(pkt, 0) on every received packet.
> > >> >>
> > >> >> This would probably have a small impact on performance, but it's
> > >>better
> > >> >> than trashing the exact match cache.
> > >> >>
> > >> >> I don't believe there's anything that DPDK can do here (like
> > >>resetting
> > >> >>the
> > >> >> hash when the packet is freed), but please correct me if I'm wrong.
> > >> >>
> > >> >> Thoughts?
> > >> >
> > >> >I think Dongjun is right and we should use ol_flags.
> > >> >Something like this:
> > >> >
> > >> >diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> > >> >index e4c2593..ad315d1 100644
> > >> >--- a/lib/dp-packet.h
> > >> >+++ b/lib/dp-packet.h
> > >> >@@ -518,7 +518,12 @@ static inline uint32_t
> > >> > dp_packet_get_rss_hash(struct dp_packet *p)
> > >> > {
> > >> > #ifdef DPDK_NETDEV
> > >> >-return p->mbuf.hash.rss;
> > >> >+if (p->mbuf.ol_flags & PKT_RX_RSS_HASH) {
> > >> >+return p->mbuf.hash.rss;
> > >> >+}
> > >> >+else {
> > >> >+return 0;
> > >> >+}
> > >> > #else
> > >> > return p->rss_hash;
> > >> > #endif
> > >> >
> > >> >
> > >> >I haven't tested that but rte_vhost_dequeue_burst() does
> > >> >ol_flags = 0, so this would be a generic solution for all
> > >> >dpdk devices.
> > >> >
> > >> >Thoughts?
> > >> >
> > >> >fbl
> > >>
> > >> This seems a much better solution.  I had no idea that ol_flags
> > >> included PKT_RX_RSS_HASH.
> > >>
> > >> Unfortunately, I tested it and it appears that ixgbe_recv_pkts_vec()
> >

Re: [ovs-dev] Is this an issue for DPDK vhost rss?

2015-06-17 Thread Traynor, Kevin

> -----Original Message-----
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: Tuesday, June 16, 2015 6:28 PM
> To: Daniele Di Proietto
> Cc: Traynor, Kevin; dev@openvswitch.org; Gray, Mark D
> Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> 
> On Mon, Jun 15, 2015 at 05:55:13PM +0000, Daniele Di Proietto wrote:
> > On 15/06/2015 12:16, "Traynor, Kevin"  wrote:
> > >There is a dpdk patchset that contains a potential fix for this and lots
> > >of
> > >other changes, but I haven't tested yet.
> > >http://dpdk.org/ml/archives/dev/2015-June/018436.html
> 
> I skimmed over the patchset and it is an ABI breaker, so I
> think the policy demands to announce it on 2.1 and merge only
> in 2.2 release.

I tested with a subset of the patches and the ol_flags.rss bit is being set
correctly.

It could be merged and available in DPDK 2.1 with a config parameter. However,
even with this we'd need to assess the rest of the changes and compatibility
with OVS.

> 
> Maybe it is possible to separate the ol_flags fix into a
> smaller and simple patch to be accepted as bugfix yet in 2.1.

That would be ideal.

> 
> 
> > >> > Otherwise, should we avoid using the vectorized version?
> > >>
> > >> that's debatable - from a performance view it may be better to leave it
> > >>in
> > >> and take the hit elsewhere for the time being if there's a possibility
> > >>that
> > >> it will be changed in DPDK later.
> > >
> > >With a loop to reset the rss after the rte_vhost_dequeue_burst() call I'm
> > >seeing a drop of ~100kpps in vhost performance. Rx vectorisation gives a
> > >gain
> > >of about ~1 mpps on my system for the phy2phy cases.
> > >
> > >Using the ol_flags check is the right option when DPDK supports setting it
> > >correctly with rx vectorisation. In the meantime there's choice of using
> > >the
> > >reset loop or removing rx vectorisation - what do you think?
> >
> > Thanks for sharing these results.  I've observed that if OVS can't use the
> > RSS
> > hash and has to compute we lose ~2Mpps on a single flow phy2phy test.
> >
> > Despite this, I still think we should consider the ol_flags because:
> >
> > * DPDK drivers (other than ixgbe) should use ol_flags as well to mark the
> >   RSS hash as valid
> > * ixgbe_recv_pkts_vec() will report PKT_RX_RSS_HASH in future releases (the
> >   patch you sent will be effective since DPDK 2.2, right?)
> 
> I agree with the above.
> 
> > If the throughput with the non-vector rx routine is higher we can disable
> > the vector rx as a temporary workaround.
> 
> Could you point me to the vector and non-vector rx routines?
> I feel like I am missing something.
> 
> Thanks,
> fbl
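
For reference, the "reset loop" option mentioned above amounts to something
like the sketch below. It is illustrative only: struct mock_pkt and
reset_rss_hashes() are hypothetical names, and the real code would operate on
the packets returned by the vhost dequeue call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet with only the field that matters here. */
struct mock_pkt {
    uint32_t rss_hash;   /* 0 means "no hash, compute one in software". */
};

/* Clears the hash of every packet in a received burst, so that stale values
 * left in reused buffers are never treated as valid hashes. */
static void
reset_rss_hashes(struct mock_pkt **pkts, size_t cnt)
{
    assert(pkts != NULL || cnt == 0);
    for (size_t i = 0; i < cnt; i++) {
        pkts[i]->rss_hash = 0;
    }
}
```

The per-packet store in this loop is the cost that shows up as the vhost
throughput drop quoted above; the ol_flags check avoids it once the drivers
set the flag reliably.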



Re: [ovs-dev] Is this an issue for DPDK vhost rss?

2015-06-17 Thread Traynor, Kevin

> -----Original Message-----
> From: Traynor, Kevin
> Sent: Wednesday, June 17, 2015 10:12 AM
> To: Flavio Leitner; Daniele Di Proietto
> Cc: dev@openvswitch.org; Gray, Mark D
> Subject: RE: [ovs-dev] Is this an issue for DPDK vhost rss?
> 
> 
> > -----Original Message-----
> > From: Flavio Leitner [mailto:f...@sysclose.org]
> > Sent: Tuesday, June 16, 2015 6:28 PM
> > To: Daniele Di Proietto
> > Cc: Traynor, Kevin; dev@openvswitch.org; Gray, Mark D
> > Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> >
> > On Mon, Jun 15, 2015 at 05:55:13PM +0000, Daniele Di Proietto wrote:
> > > On 15/06/2015 12:16, "Traynor, Kevin"  wrote:
> > > >There is a dpdk patchset that contains a potential fix for this and lots
> > > >of
> > > >other changes, but I haven't tested yet.
> > > > >http://dpdk.org/ml/archives/dev/2015-June/018436.html
> >
> > I skimmed over the patchset and it is an ABI breaker, so I
> > think the policy demands to announce it on 2.1 and merge only
> > in 2.2 release.
> 
> I tested with a subset of the patches and the ol_flags.rss bit is being set
> correctly.
> 
> It could be merged and available in DPDK 2.1 with a config parameter.
> However,
> even with this we'd need to assess the rest of the changes and compatibility
> with OVS.
> 
> >
> > Maybe it is possible to separate the ol_flags fix into a
> > smaller and simple patch to be accepted as bugfix yet in 2.1.
> 
> That would be ideal.
> 
> >
> >
> > > >> > Otherwise, should we avoid using the vectorized version?
> > > >>
> > > >> that's debatable - from a performance view it may be better to leave
> it
> > > >>in
> > > >> and take the hit elsewhere for the time being if there's a possibility
> > > >>that
> > > >> it will be changed in DPDK later.
> > > >
> > > >With a loop to reset the rss after the rte_vhost_dequeue_burst() call
> I'm
> > > > >seeing a drop of ~100kpps in vhost performance. Rx vectorisation gives a
> > > >gain
> > > >of about ~1 mpps on my system for the phy2phy cases.
> > > >
> > > >Using the ol_flags check is the right option when DPDK supports setting
> it
> > > >correctly with rx vectorisation. In the meantime there's choice of using
> > > >the
> > > >reset loop or removing rx vectorisation - what do you think?
> > >
> > > Thanks for sharing these results.  I've observed that if OVS can't use
> the
> > > RSS
> > > hash and has to compute we lose ~2Mpps on a single flow phy2phy test.
> > >
> > > Despite this, I still think we should consider the ol_flags because:
> > >
> > > * DPDK drivers (other than ixgbe) should use ol_flags as well to mark the
> > >   RSS hash as valid
> > > * ixgbe_recv_pkts_vec() will report PKT_RX_RSS_HASH in future releases
> (the
> > >   patch you sent will be effective since DPDK 2.2, right?)
> >
> > I agree with the above.

I'm also seeing a ~3 mpps drop in phy2phy when not using the hardware hash.
It's a ~25% drop on phy2phy vs. a ~2.5% drop on the vhost interface with the
hash reset workaround. There may not be a DPDK fix that we can incorporate
until towards the end of the year (DPDK 2.2?) so IMHO, with this size of
performance drop it would be better to use the workaround until there's a
DPDK fix. 
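
For reference, a minimal sketch of what the hash reset workaround amounts to. It is a standalone illustration only: `struct pkt`, `reset_rss_hashes()`, `sw_hash()` and `pkt_get_hash()` are hypothetical stand-ins, not the OVS or DPDK definitions. It assumes a hash value of 0 is treated as "not set", so clearing the field after a vhost rx burst forces the software hash to be computed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a received packet; not the DPDK mbuf. */
struct pkt {
    uint32_t rss_hash;   /* 0 means "no valid hash". */
    uint32_t flow_key;   /* Placeholder for the parsed headers. */
};

/* Clear whatever stale value the hash field holds after a vhost rx burst. */
static void
reset_rss_hashes(struct pkt *pkts[], size_t cnt)
{
    for (size_t i = 0; i < cnt; i++) {
        pkts[i]->rss_hash = 0;
    }
}

/* Placeholder software hash; never returns 0 so the result reads as valid. */
static uint32_t
sw_hash(uint32_t flow_key)
{
    uint32_t h = flow_key * 2654435761u;

    return h ? h : 1;
}

/* Use the stored hash if present, otherwise compute and cache one. */
static uint32_t
pkt_get_hash(struct pkt *p)
{
    if (!p->rss_hash) {
        p->rss_hash = sw_hash(p->flow_key);
    }
    return p->rss_hash;
}
```

The per-packet loop is the extra cost on the vhost rx path; packets from physical ports would skip it and keep the hardware hash.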

> >
> > > If the throughput with the non-vector rx routine is higher we can disable
> > > the vector rx as a temporary workaround.
> >
> > Could you point me to the vector and non-vector rx routines?
> > I feel like I am missing something.
> >
> > Thanks,
> > fbl



Re: [ovs-dev] Is this an issue for DPDK vhost rss?

2015-06-17 Thread Traynor, Kevin

> -Original Message-
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: Wednesday, June 17, 2015 2:31 PM
> To: Traynor, Kevin
> Cc: Daniele Di Proietto; dev@openvswitch.org; Gray, Mark D
> Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> 
> On Wed, Jun 17, 2015 at 01:18:29PM +, Traynor, Kevin wrote:
> >
> > > -Original Message-
> > > From: Traynor, Kevin
> > > Sent: Wednesday, June 17, 2015 10:12 AM
> > > To: Flavio Leitner; Daniele Di Proietto
> > > Cc: dev@openvswitch.org; Gray, Mark D
> > > Subject: RE: [ovs-dev] Is this an issue for DPDK vhost rss?
> > >
> > >
> > > > -Original Message-
> > > > From: Flavio Leitner [mailto:f...@sysclose.org]
> > > > Sent: Tuesday, June 16, 2015 6:28 PM
> > > > To: Daniele Di Proietto
> > > > Cc: Traynor, Kevin; dev@openvswitch.org; Gray, Mark D
> > > > Subject: Re: [ovs-dev] Is this an issue for DPDK vhost rss?
> > > >
> > > > > On Mon, Jun 15, 2015 at 05:55:13PM +0000, Daniele Di Proietto wrote:
> > > > > On 15/06/2015 12:16, "Traynor, Kevin" 
> wrote:
> > > > > >There is a dpdk patchset that contains a potential fix for this and
> lots
> > > > > >of
> > > > > >other changes, but I haven't tested yet.
> > > > > > >http://dpdk.org/ml/archives/dev/2015-June/018436.html
> > > >
> > > > I skimmed over the patchset and it is an ABI breaker, so I
> > > > think the policy demands to announce it on 2.1 and merge only
> > > > in 2.2 release.
> > >
> > > I tested with a subset of the patches and the ol_flags.rss bit is being
> set
> > > correctly.
> > >
> > > It could be merged and available in DPDK 2.1 with a config parameter.
> > > However,
> > > even with this we'd need to assess the rest of the changes and
> compatibility
> > > with OVS.
> > >
> > > >
> > > > Maybe it is possible to separate the ol_flags fix into a
> > > > smaller and simple patch to be accepted as bugfix yet in 2.1.
> > >
> > > That would be ideal.
> > >
> > > >
> > > >
> > > > > >> > Otherwise, should we avoid using the vectorized version?
> > > > > >>
> > > > > >> that's debatable - from a performance view it may be better to
> leave
> > > it
> > > > > >>in
> > > > > >> and take the hit elsewhere for the time being if there's a
> possibility
> > > > > >>that
> > > > > >> it will be changed in DPDK later.
> > > > > >
> > > > > >With a loop to reset the rss after the rte_vhost_dequeue_burst()
> call
> > > I'm
> > > > > > >seeing a drop of ~100kpps in vhost performance. Rx vectorisation
> gives a
> > > > > >gain
> > > > > >of about ~1 mpps on my system for the phy2phy cases.
> > > > > >
> > > > > >Using the ol_flags check is the right option when DPDK supports
> setting
> > > it
> > > > > >correctly with rx vectorisation. In the meantime there's choice of
> using
> > > > > >the
> > > > > >reset loop or removing rx vectorisation - what do you think?
> > > > >
> > > > > Thanks for sharing these results.  I've observed that if OVS can't
> use
> > > the
> > > > > RSS
> > > > > hash and has to compute we lose ~2Mpps on a single flow phy2phy test.
> > > > >
> > > > > Despite this, I still think we should consider the ol_flags because:
> > > > >
> > > > > * DPDK drivers (other than ixgbe) should use ol_flags as well to mark
> the
> > > > >   RSS hash as valid
> > > > > * ixgbe_recv_pkts_vec() will report PKT_RX_RSS_HASH in future
> releases
> > > (the
> > > > >   patch you sent will be effective since DPDK 2.2, right?)
> > > >
> > > > I agree with the above.
> >
> > I'm also seeing a ~3 mpps drop in phy2phy when not using the hardware hash.
> > It's a ~25% drop on phy2phy vs. a ~2.5% drop on the vhost interface with
> the
> > hash reset workaround. There may not be a DPDK fix that we can incorporate
> > until towards the end of the year (DPDK 2.2?) so IMHO, with this size of
> > performance drop it would be better to use the workaround until there's a
> > DPDK fix.
> 
> What happens if you just set ol_flags in DPDK when there is a
> valid hash with the proposed patch from Daniele?

I'm assuming performance would be fine with that. I haven't gone through the
code to see if there's a simple DPDK patch to enable just that yet.

> 
> fbl
> 



Re: [ovs-dev] [PATCH] INSTALL.DPDK: remove experimental statement

2015-06-18 Thread Traynor, Kevin

> -Original Message-
> From: Gray, Mark D
> Sent: Thursday, June 18, 2015 9:06 AM
> To: Flavio Leitner; diproiet...@vmware.com; Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: RE: [PATCH] INSTALL.DPDK: remove experimental statement
> 
> > Subject: [PATCH] INSTALL.DPDK: remove experimental statement
> >
> > Signed-off-by: Flavio Leitner 
> > ---
> >  INSTALL.DPDK.md | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> >   It looks like a good time to promote DPDK support before
> >   2.4 is branched off.
> >   What do you think?
> 
> +1 :)

Looks good to me!

> 
> >
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index cdef6cf..a2d012e
> > 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -5,9 +5,6 @@ Open vSwitch can use Intel(R) DPDK lib to operate entirely
> > in  userspace. This file explains how to install and use Open vSwitch in
> such a
> > mode.
> >
> > -The DPDK support of Open vSwitch is considered experimental.
> > -It has not been thoroughly tested.
> > -
> >  This version of Open vSwitch should be built manually with `configure`
> and
> > `make`.
> >
> > --
> > 2.1.0



Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.

2015-06-18 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Tuesday, June 16, 2015 7:39 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> 
> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is
> set in 'ol_flags'.  Otherwise the hash is garbage and doesn't
> relate to the packet.
> 
> This fixes an issue with vhost, which, being a virtual NIC, doesn't
> compute the hash.
> 
> Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, forcing
> OVS to compute a hash in software.  This has a significant impact on
> performance (-30% throughput in a single flow setup) which can be
> mitigated if the CPU supports crc32c instructions.

As per the other thread on this, I'm a bit concerned about the performance
drop from this patch, so I tested it along with some alternative/
complementary solutions.

Here are the options I looked at, with some comments:
1. This patch in isolation: vhost-vhost and phy-vhost-phy drop by ~15%
(because of the sw hash), but there are also drops of ~25% for phy-phy and
~15% for phy-ivshmem-phy.

2. Leave the code as is and let EMC misses happen for vhost rx pkts:
I measure this at a ~35% drop for vhost-vhost if it misses *every time*. In
testing we also see cases where it never misses, but that is not realistic.
There should be no impact to other DPDK interfaces.

3. Add hash reset for packets from vhost: This is another way of forcing
the software hash for vhost rx and it is roughly equivalent in performance
to 1. for vhost-vhost (~15% drop), while there is no significant drop
for phy-vhost-phy. There should be no impact to other DPDK interfaces.

4. Apply this patch and turn off Rx Vectorisation. vhost-vhost will drop
~15% as per 1. and there should be nothing significant for phy-vhost-phy.
We would lose the 10% gain that rx vectorisation gave us for phy-phy.
There should be no impact for dpdkr ports. 


In terms of not knowing whether the hw hash is valid or not if the flag is
not checked, I would have expected the pmd to return an error on config if
the hash wasn't supported, but I'm not sure that it does. 
In the worst case where there was an incorrect hash, it would miss the EMC
which is about a 45% drop for phy-phy. I would think it's pretty safe that
if we configure it, the hash will be correct but I guess there is a
possibility it wouldn't be. 

Even if it is possible to get a smaller patch to fix the underlying issue
in DPDK, it would be in DPDK 2.1 at the earliest meaning the performance
would remain low until sometime in August. If it's DPDK 2.2, then it would
be sometime in December. This would mean any performance drops would be
present in OVS 2.4 and possibly OVS 2.5.

Sorry :( but based on the performance drop with this patch in isolation it
would be a NAK from me. My preference would be 3 which gives best performance,
or 4 which is a bit lower for phy-phy but safer.

Kevin.

> 
> Reported-by: Dongjun 
> Suggested-by: Flavio Leitner 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/dp-packet.h   | 11 +++
>  lib/dpif-netdev.c |  2 +-
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index e4c2593..6840750 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -529,11 +529,22 @@ dp_packet_set_rss_hash(struct dp_packet *p, uint32_t
> hash)
>  {
>  #ifdef DPDK_NETDEV
>  p->mbuf.hash.rss = hash;
> +p->mbuf.ol_flags |= PKT_RX_RSS_HASH;
>  #else
>  p->rss_hash = hash;
>  #endif
>  }
> 
> +static inline bool
> +dp_packet_rss_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> +return p->mbuf.ol_flags & PKT_RX_RSS_HASH;
> +#else
> +return true;
> +#endif
> +}
> +
>  #ifdef  __cplusplus
>  }
>  #endif
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f13169c..c4a4b3a 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3036,7 +3036,7 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet
> *packet,
>  {
>  uint32_t hash, recirc_depth;
> 
> -hash = dp_packet_get_rss_hash(packet);
> +hash = dp_packet_rss_valid(packet) ? dp_packet_get_rss_hash(packet) : 0;
>  if (OVS_UNLIKELY(!hash)) {
>  hash = miniflow_hash_5tuple(mf, 0);
>  dp_packet_set_rss_hash(packet, hash);
> --
> 2.1.4
> 


Re: [ovs-dev] OVS DPDK support technical direction

2014-12-18 Thread Traynor, Kevin
Hi John,

The path using DPDK through the netdev can use the exact match cache in 
dpif-netdev.c to enable very fast switching. You could think of this as an 
equivalent to the fast path in OVDK. Have a look at the rates that Madhu Challa 
presented at the OVS Fall Conference on Slide 3 
http://openvswitch.org/support/ovscon2014/18/1600-ovs_perf.pptx
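
To illustrate the idea, here is a minimal sketch of an exact match cache sitting in front of a slower lookup. It is not the dpif-netdev.c implementation: `struct flow_key`, `cache_lookup()` and `slow_path_lookup()` are hypothetical names, and the one-entry-per-slot replacement policy is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTRIES 256            /* Must be a power of two. */

/* Hypothetical flow key; the real OVS key is a miniflow. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct cache_entry {
    bool in_use;
    struct flow_key key;
    int out_port;
};

struct exact_match_cache {
    struct cache_entry entries[CACHE_ENTRIES];
    unsigned long hits, misses;
};

static bool
key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip
           && a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Stand-in for the slower wildcard classifier lookup. */
static int
slow_path_lookup(const struct flow_key *key)
{
    return (int) (key->dst_ip & 0x3);
}

/* Fast path: one slot indexed by the packet hash, compared for an exact
 * match.  On a miss, fall back to the slow lookup and cache its result. */
static int
cache_lookup(struct exact_match_cache *c, const struct flow_key *key,
             uint32_t hash)
{
    struct cache_entry *e = &c->entries[hash & (CACHE_ENTRIES - 1)];

    if (e->in_use && key_equal(&e->key, key)) {
        c->hits++;
        return e->out_port;
    }
    c->misses++;
    e->in_use = true;
    e->key = *key;
    e->out_port = slow_path_lookup(key);
    return e->out_port;
}
```

Only the first packet of a flow (or a packet whose slot was taken over by another flow) pays for the slow lookup; later packets with the same key are served from the cache.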

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of John Xiao
> Sent: Wednesday, December 17, 2014 4:10 AM
> To: dev@openvswitch.org
> Subject: [ovs-dev] OVS DPDK support technical direction
> 
> Hi,
> 
> As we know that Intel stopped its effort on OVDK and OVS DPDK support
> will all be in OVS upstream, and I have one question after digging
> OVDK and OVS DPDK architecture:
> 
> - IMHO, OVS DPDK support can follow the existing "fast/slow path"
> model as being done in OVS linux kernel, i.e. just another fast path
> happening in DPDK threads, and this is pretty much the mechanism used
> by OVDK by extending a new kind of dpif provider. But for upstream OVS
> DPDK support, I can only see netdev extension, I don't know what's the
> reason behind, can anybody post any discussion relevant on this if
> any? Do we see huge benefit when every packet going through "netdev"
> type of data path instead of a fast/slow path fashion? or I missed
> anything obvious?
> 
> Thanks,
> John


Re: [ovs-dev] OVS DPDK support technical direction

2014-12-19 Thread Traynor, Kevin

> -Original Message-
> From: John Xiao [mailto:johnxiao.cl...@gmail.com]
> Sent: Thursday, December 18, 2014 2:46 PM
> To: Traynor, Kevin
> Subject: Re: [ovs-dev] OVS DPDK support technical direction
> 
> Hi Kevin,
> 
> The performance number looks great!
> - For DPDK OVS, the number only shows NIC - OVS - NIC as vhost-user is
> not supported yet, do you know what's the schedule for vhost-user
> support?

Assuming you do mean vhost-user (and not userspace-vhost aka vhost-cuse), it is 
planned to be integrated into the DPDK vhost library in DPDK 2.0, which is 
targeted for the end of March. At that point we can look at upgrading OVS to 
support DPDK 2.0 and being able to use the DPDK vhost library for vhost-user.

> - On slide #8, it's interesting to see TCP_STREAM degrade quite a lot
> in offload case.
> 
> Thanks,
> Jun
> 
> On Thu, Dec 18, 2014 at 9:32 PM, Traynor, Kevin  
> wrote:
> > Hi John,
> >
> > The path using DPDK through the netdev can use the exact match cache in 
> > dpif-netdev.c to enable very
> fast switching. You could think of this as an equivalent to the fast path in 
> OVDK. Have a look at the
> rates that Madhu Challa presented at the OVS Fall Conference on Slide 3
> http://openvswitch.org/support/ovscon2014/18/1600-ovs_perf.pptx
> >
> > Thanks,
> > Kevin.
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of John Xiao
> >> Sent: Wednesday, December 17, 2014 4:10 AM
> >> To: dev@openvswitch.org
> >> Subject: [ovs-dev] OVS DPDK support technical direction
> >>
> >> Hi,
> >>
> >> As we know that Intel stopped its effort on OVDK and OVS DPDK support
> >> will all be in OVS upstream, and I have one question after digging
> >> OVDK and OVS DPDK architecture:
> >>
> >> - IMHO, OVS DPDK support can follow the existing "fast/slow path"
> >> model as being done in OVS linux kernel, i.e. just another fast path
> >> happening in DPDK threads, and this is pretty much the mechanism used
> >> by OVDK by extending a new kind of dpif provider. But for upstream OVS
> >> DPDK support, I can only see netdev extension, I don't know what's the
> >> reason behind, can anybody post any discussion relevant on this if
> >> any? Do we see huge benefit when every packet going through "netdev"
> >> type of data path instead of a fast/slow path fashion? or I missed
> >> anything obvious?
> >>
> >> Thanks,
> >> John


Re: [ovs-dev] [PATCH v5 1/1] netdev-dpdk: add dpdk vhost ports

2014-12-21 Thread Traynor, Kevin
Hi,

I'd like to get some feedback about using RCU on the virtio_dev structure as 
per the comment below. Comments inline.

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Pravin Shelar
> Sent: Tuesday, October 21, 2014 12:00 AM
> To: Tahhan, Maryam
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v5 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Mon, Sep 29, 2014 at 10:10 AM, maryam.tahhan  
> wrote:
> > This patch implements the vhost-net offload API.  It adds support for
> > a new port type to userspace datapath called dpdkvhost. This allows KVM
> > (QEMU) to offload the servicing of virtio-net devices to its associated
> > dpdkvhost port. Instructions for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients that
> > have virtio-net interfaces.
> >
> > ver 5:
> >   - rebased against latest master
> > ver 4:
> >   - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in 
> > utilities/automake.mk
> >   - rebased with master to work with DPDK 1.7
> > ver 3:
> >   - rebased with master
> > ver 2:
> >   - rebased with master
> >
> > Signed-off-by: maryam.tahhan 
> > ---
> Thanks for the patch, I have following comments.
> - NON_PMD_THREAD_TX_QUEUE is not used in the patch.
> - why do we need to limit MAX_BASENAME_SZ to 12
> - dev_basename[] should be configurable at run time or it should be
> vswitchd parameter.
> - any reason for limiting MAX_PKT_BURST to 32?
> - netdev_dpdk->type should be an enum type with DPDK and VHOST members.
> I am not sure if you really need the type member in netdev_dpdk. You can
> just define separate device ops for DPDK and VHOST if you want to do
> special processing for each device.
> - there is no check on gpa_to_vva() return value.
> - in function virtio_dev_rx() variables can be defined in local block
> rather than in function block. this simplifies code reading.
> - in function virtio_dev_rx() virtio_hdr is always zero, so it can be
> static variable rather than on stack.
> - in function virtio_dev_rx() virtio_hdr should be copied before packet data.
> - in function virtio_dev_rx() can we prefetch next vq->desc before
> copying packet data?
> - ovs-mutex is a bit heavyweight, can you use rte-spin-lock
> - can you reverse name of virtio_dev_rx(), virtio_dev_tx(), since this
> is ovs-netdev code we could use name same as ovs context.
> - virtio_dev_tx() is always called from PMD thread, so no need to the check.
> - there is no synchronization for netdev_dpdk->rx_count and tx_count.
> - destroy_device() can use RCU based mechanism rather than polling packet 
> count.
> - Currently new_device() assigns ports in linear fashion. How does it
> handle the multiple bridge case where a vhost port might belong to an
> available port on a different bridge?
> - virtio_net_device_ops ops should be set before registering cuse device.
> - destroy function sets virtio_dev to NULL, so no need to check for
> remove flag, of course you need to RCUfy it first.
(Additional comment later added): device destroy can set p->virtio_dev to NULL 
and call
ovsrcu_postpone(). All read/write code path can check for NULL before
using virtio_dev. That should be enough, I do not see any need to
check tx_count or rx_count.


I'm seeing two issues with using RCU for the virtio_dev...

1. The memory of the virtio_dev is not allocated in netdev-dpdk, but is passed 
in as part of the new_device() callback from the DPDK vhost library. As soon as 
the destroy_device() fn returns, I don't think we can have any guarantees about 
the state of this memory. So for another thread to continue to use it, or for a 
postpone callback to modify it after the destroy_device() fn has returned would 
be unsafe. In any of the other examples I see of RCU set/postpone, it looks 
like the memory has been allocated locally so we can guarantee the integrity of 
it until all threads have quiesced, at which time we can free it in the 
postpone callback. 

2. The cuse thread loop is in the fuse library and only occasionally calls the 
OVS code for new_device()/destroy_device(), so we can't make it periodically 
quiesce - we can only call ovsrcu_quiesce_start() at the very start of the 
thread. When I put a call to ovsrcu_postpone() in destroy_device(), I see that 
the rcu thread is continually waiting for the cuse thread to quiesce and the 
postpone callback does not get called. I'm thinking that if the 
ovsrcu_postpone() fn is used in destroy_device() then the cuse thread would 
have to have a periodic call to quiesce which I don't think is possible? 

I think we may have no option but to block in the destroy_device() fn until we 
know that virtio_dev is not being used anymore by other threads. We can stop 
threads initiating more operations on it by checking for NULL or REMOVE flag 
and we can check that in-progress operations are finished by the tx/rx counters.
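
A minimal sketch of that blocking approach, under some assumptions: `struct port`, `struct vdev` and the three functions below are hypothetical stand-ins rather than netdev-dpdk or DPDK vhost code, and a single in-flight counter replaces the separate tx/rx counters. The counter is raised before the pointer is read, so the destroy side cannot miss an operation that saw a non-NULL device.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the OVS or DPDK vhost definitions. */
struct vdev {
    int id;
};

struct port {
    _Atomic(struct vdev *) dev;   /* NULL once the device is going away. */
    atomic_int in_flight;         /* Operations currently using 'dev'. */
};

/* Datapath side: returns the device to use, or NULL if it was removed.
 * The caller must call port_end_use() only when non-NULL is returned. */
static struct vdev *
port_begin_use(struct port *p)
{
    struct vdev *dev;

    atomic_fetch_add(&p->in_flight, 1);
    dev = atomic_load(&p->dev);
    if (!dev) {
        atomic_fetch_sub(&p->in_flight, 1);
    }
    return dev;
}

static void
port_end_use(struct port *p)
{
    atomic_fetch_sub(&p->in_flight, 1);
}

/* destroy_device() side: stop new users, then block until current ones
 * have finished.  After this returns nothing is touching the device. */
static void
port_detach_and_wait(struct port *p)
{
    atomic_store(&p->dev, (struct vdev *) NULL);
    while (atomic_load(&p->in_flight) > 0) {
        /* Busy-wait; a real implementation would likely yield or sleep. */
    }
}
```

The default sequentially consistent atomics are used for simplicity; they cost more than strictly needed on the datapath, which is part of the trade-off against an RCU-based scheme.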

Let me know if I've misunderstood and there is a way to use the RCUs here.

> 

Re: [ovs-dev] OVS DPDK support technical direction

2014-12-21 Thread Traynor, Kevin

> -Original Message-
> From: John Xiao [mailto:johnxiao.cl...@gmail.com]
> Sent: Saturday, December 20, 2014 2:19 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK support technical direction
> 
> Thanks Kevin,
> 
> There is a librte_vhost dir in upstream DPDK repo, would that be the
> base of DPDK vhost library you are talking about? 

Yes, currently it supports vhost-cuse. It will be changed to support 
vhost-user, there is an RFC patch available at present
http://dpdk.org/ml/archives/dev/2014-December/009798.html


> BTW, could you help
> to elaborate what would be exactly included in the library?

I don't have any docs on it other than the RFC. You can ask on the DPDK mailing 
list

> 
> John
> 
> On Fri, Dec 19, 2014 at 11:53 PM, Traynor, Kevin
>  wrote:
> >
> >> -Original Message-
> >> From: John Xiao [mailto:johnxiao.cl...@gmail.com]
> >> Sent: Thursday, December 18, 2014 2:46 PM
> >> To: Traynor, Kevin
> >> Subject: Re: [ovs-dev] OVS DPDK support technical direction
> >>
> >> Hi Kevin,
> >>
> >> The performance number looks great!
> >> - For DPDK OVS, the number only shows NIC - OVS - NIC as vhost-user is
> >> not supported yet, do you know what's the schedule for vhost-user
> >> support?
> >
> > Assuming you do mean vhost-user (and not userspace-vhost aka vhost-cuse), 
> > it is planned to be
> integrated into the DPDK vhost library in DPDK 2.0, which is targeted for the 
> end of March. At that
> point we can look at upgrading OVS to support DPDK 2.0 and being able to use 
> the DPDK vhost library
> for vhost-user.
> >
> >> - On slide #8, it's interesting to see TCP_STREAM degrade quite a lot
> >> in offload case.
> >>
> >> Thanks,
> >> Jun
> >>
> >> On Thu, Dec 18, 2014 at 9:32 PM, Traynor, Kevin  
> >> wrote:
> >> > Hi John,
> >> >
> >> > The path using DPDK through the netdev can use the exact match cache in 
> >> > dpif-netdev.c to enable
> very
> >> fast switching. You could think of this as an equivalent to the fast path 
> >> in OVDK. Have a look at
> the
> >> rates that Madhu Challa presented at the OVS Fall Conference on Slide 3
> >> http://openvswitch.org/support/ovscon2014/18/1600-ovs_perf.pptx
> >> >
> >> > Thanks,
> >> > Kevin.
> >> >
> >> >> -Original Message-
> >> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of John Xiao
> >> >> Sent: Wednesday, December 17, 2014 4:10 AM
> >> >> To: dev@openvswitch.org
> >> >> Subject: [ovs-dev] OVS DPDK support technical direction
> >> >>
> >> >> Hi,
> >> >>
> >> >> As we know that Intel stopped its effort on OVDK and OVS DPDK support
> >> >> will all be in OVS upstream, and I have one question after digging
> >> >> OVDK and OVS DPDK architecture:
> >> >>
> >> >> - IMHO, OVS DPDK support can follow the existing "fast/slow path"
> >> >> model as being done in OVS linux kernel, i.e. just another fast path
> >> >> happening in DPDK threads, and this is pretty much the mechanism used
> >> >> by OVDK by extending a new kind of dpif provider. But for upstream OVS
> >> >> DPDK support, I can only see netdev extension, I don't know what's the
> >> >> reason behind, can anybody post any discussion relevant on this if
> >> >> any? Do we see huge benefit when every packet going through "netdev"
> >> >> type of data path instead of a fast/slow path fashion? or I missed
> >> >> anything obvious?
> >> >>
> >> >> Thanks,
> >> >> John


Re: [ovs-dev] OVS DPDK support technical direction

2014-12-21 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Traynor, Kevin
> Sent: Sunday, December 21, 2014 5:52 PM
> To: John Xiao
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK support technical direction
> 
> 
> > -Original Message-
> > From: John Xiao [mailto:johnxiao.cl...@gmail.com]
> > Sent: Saturday, December 20, 2014 2:19 AM
> > To: Traynor, Kevin
> > Cc: dev@openvswitch.org
> > Subject: Re: [ovs-dev] OVS DPDK support technical direction
> >
> > Thanks Kevin,
> >
> > There is a librte_vhost dir in upstream DPDK repo, would that be the
> > base of DPDK vhost library you are talking about?
> 
> Yes, currently it supports vhost-cuse. It will be changed to support 
> vhost-user, there is an RFC patch
> available at present
> http://dpdk.org/ml/archives/dev/2014-December/009798.html
> 
> 
> BTW, could you help
> > to elaborate what would be exactly included in the library?
> 
> I don't have any docs on it other than the RFC. You can ask on the DPDK 
> mailing list

To clarify, I was referring to extensions for vhost-user. For information about 
the current vhost-cuse only library you can look here http://www.dpdk.org/doc

> 
> >
> > John
> >
> > On Fri, Dec 19, 2014 at 11:53 PM, Traynor, Kevin
> >  wrote:
> > >
> > >> -Original Message-
> > >> From: John Xiao [mailto:johnxiao.cl...@gmail.com]
> > >> Sent: Thursday, December 18, 2014 2:46 PM
> > >> To: Traynor, Kevin
> > >> Subject: Re: [ovs-dev] OVS DPDK support technical direction
> > >>
> > >> Hi Kevin,
> > >>
> > >> The performance number looks great!
> > >> - For DPDK OVS, the number only shows NIC - OVS - NIC as vhost-user is
> > >> not supported yet, do you know what's the schedule for vhost-user
> > >> support?
> > >
> > > Assuming you do mean vhost-user (and not userspace-vhost aka vhost-cuse), 
> > > it is planned to be
> > integrated into the DPDK vhost library in DPDK 2.0, which is targeted for 
> > the end of March. At that
> > point we can look at upgrading OVS to support DPDK 2.0 and being able to 
> > use the DPDK vhost library
> > for vhost-user.
> > >
> > >> - On slide #8, it's interesting to see TCP_STREAM degrade quite a lot
> > >> in offload case.
> > >>
> > >> Thanks,
> > >> Jun
> > >>
> > >> On Thu, Dec 18, 2014 at 9:32 PM, Traynor, Kevin 
> > >>  wrote:
> > >> > Hi John,
> > >> >
> > >> > The path using DPDK through the netdev can use the exact match cache 
> > >> > in dpif-netdev.c to enable
> > very
> > >> fast switching. You could think of this as an equivalent to the fast 
> > >> path in OVDK. Have a look at
> > the
> > >> rates that Madhu Challa presented at the OVS Fall Conference on Slide 3
> > >> http://openvswitch.org/support/ovscon2014/18/1600-ovs_perf.pptx
> > >> >
> > >> > Thanks,
> > >> > Kevin.
> > >> >
> > >> >> -Original Message-
> > >> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of John Xiao
> > >> >> Sent: Wednesday, December 17, 2014 4:10 AM
> > >> >> To: dev@openvswitch.org
> > >> >> Subject: [ovs-dev] OVS DPDK support technical direction
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >> As we know that Intel stopped its effort on OVDK and OVS DPDK support
> > >> >> will all be in OVS upstream, and I have one question after digging
> > >> >> OVDK and OVS DPDK architecture:
> > >> >>
> > >> >> - IMHO, OVS DPDK support can follow the existing "fast/slow path"
> > >> >> model as being done in OVS linux kernel, i.e. just another fast path
> > >> >> happening in DPDK threads, and this is pretty much the mechanism used
> > >> >> by OVDK by extending a new kind of dpif provider. But for upstream OVS
> > >> >> DPDK support, I can only see netdev extension, I don't know what's the
> > >> >> reason behind, can anybody post any discussion relevant on this if
> > >> >> any? Do we see huge benefit when every packet going through "netdev"
> > >> >> type of data path instead of a fast/slow path fashion? or I missed
> > >> >> anything obvious?
> > >> >>
> > >> >> Thanks,
> > >> >> John


Re: [ovs-dev] [PATCH 2/2] miniflow: Use 64-bit data.

2015-01-13 Thread Traynor, Kevin
Hi Jarno,

I ran perf top before and after the miniflow patches. It looks like lookups
are missing in the EMC after the patches, hence the performance drop.

Before: b2623fdbb31570e7e3e39ac9c074b0978b4dd2dc ~13 mpps

  22.83%  ovs-vswitchd[.] ixgbe_recv_pkts
  19.69%  ovs-vswitchd[.] miniflow_extract
  16.86%  ovs-vswitchd[.] emc_processing
  13.12%  ovs-vswitchd[.] dp_netdev_process_rxq_port.isra.16
  11.76%  libc-2.18.so[.] __memcmp_sse4_1
  11.32%  ovs-vswitchd[.] ixgbe_xmit_pkts_vec
   1.52%  ovs-vswitchd[.] netdev_dpdk_eth_send
   0.46%  libc-2.18.so[.] __memcpy_sse2_unaligned
   0.44%  ovs-vswitchd[.] dpdk_queue_flush__
   0.37%  ovs-vswitchd[.] memcmp@plt
   0.19%  [vdso]  [.] __vdso_clock_gettime
   0.13%  ovs-vswitchd[.] netdev_send
   0.13%  ovs-vswitchd[.] netdev_dpdk_rxq_recv
   0.12%  ovs-vswitchd[.] dp_execute_cb
   0.10%  ovs-vswitchd[.] netdev_rxq_recv
   0.10%  ovs-vswitchd[.] cmap_find
   0.10%  ovs-vswitchd[.] odp_execute_actions
   0.07%  ovs-vswitchd[.] pmd_thread_main
   0.07%  libc-2.18.so[.] __clock_gettime
   0.05%  ovs-vswitchd[.] dp_netdev_lookup_port
   0.05%  ovs-vswitchd[.] dp_netdev_input
   0.05%  ovs-vswitchd[.] nl_attr_type
   0.04%  ovs-vswitchd[.] time_timespec__
   
After: d70e8c28f992c0d8c2918aa0733b935ce1a0caed ~4.8 mpps

  20.89%  ovs-vswitchd[.] dpcls_lookup
  14.72%  ovs-vswitchd[.] emc_processing
  14.45%  libc-2.18.so[.] __memcmp_sse4_1
  11.71%  ovs-vswitchd[.] emc_insert
   8.73%  ovs-vswitchd[.] ixgbe_recv_pkts
   7.76%  ovs-vswitchd[.] fast_path_processing
   7.60%  ovs-vswitchd[.] miniflow_extract
   4.62%  ovs-vswitchd[.] dp_netdev_process_rxq_port.isra.16
   4.27%  ovs-vswitchd[.] ixgbe_xmit_pkts_vec
   2.89%  ovs-vswitchd[.] cmap_find_batch
   0.52%  ovs-vswitchd[.] memcmp@plt
   0.49%  ovs-vswitchd[.] netdev_dpdk_eth_send
   0.18%  ovs-vswitchd[.] dpdk_queue_flush__
   0.16%  libc-2.18.so[.] memset
   0.11%  libc-2.18.so[.] __memcpy_sse2_unaligned
   0.08%  [vdso]  [.] __vdso_clock_gettime
   0.06%  ovs-vswitchd[.] netdev_dpdk_rxq_recv
   0.05%  ovs-vswitchd[.] cmap_find
   0.04%  ovs-vswitchd[.] netdev_send
   0.04%  ovs-vswitchd[.] dp_netdev_input
   0.04%  ovs-vswitchd[.] netdev_rxq_recv
   0.03%  ovs-vswitchd[.] pmd_thread_main
   0.03%  ovs-vswitchd[.] odp_execute_actions
   0.03%  ovs-vswitchd[.] dp_execute_cb
   0.02%  libc-2.18.so[.] __clock_gettime

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jarno Rajahalme
> Sent: Monday, January 12, 2015 6:57 PM
> To: Loftus, Ciara
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] miniflow: Use 64-bit data.
> 
> 
> On Jan 12, 2015, at 9:16 AM, Loftus, Ciara  wrote:
> 
> > Hi,
> >
> > After running some performance tests on the latest master, it appears that 
> > this commit has caused
> netdev DPDK performance to drop significantly (by > 50 %). Has anybody else 
> seen this?
> >
> 
> I saw notified of this last week, and did some checking on the patch to find 
> out what is going on, but
> nothing came up. I’d need someone running the DPDK performance tests to send 
> me perf data taken while
> running the tests so that I have something to go with.
> 
> Thanks,
> 
>   Jarno
> 
> > Regards,
> > Ciara
> >
> > -Original Message-
> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jarno Rajahalme
> > Sent: Wednesday, December 17, 2014 6:31 PM
> > To: dev@openvswitch.org
> > Subject: [ovs-dev] [PATCH 2/2] miniflow: Use 64-bit data.
> >
> > So far the compressed flow data in struct miniflow has been in 32-bit
> > words with a 63-bit map, allowing for a maximum size of struct flow of
> > 252 bytes.  With the forthcoming Geneve options this is not sufficient
> > any more.
> >
> > This patch solves the problem by changing the miniflow data to 64-bit
> > words, doubling the flow max size to 504 bytes.  Since the word size
> > is doubled, there is some loss in compression efficiency.  To counter
> > this some of the flow fields have been reordered to keep related
> > fields together (e.g., the source and destination IP addresses share
> > the same 64-bit word).
> >
> > This change should speed up flow data processing on 64-bit CPUs, which
> > may help counterbalance the impact of making the struct flow bigger in
> > the future.
> >
> > Classifier lookup stage boundaries are also changed to 64-bit
> > alignment, as the current algorithm depends on each miniflow word to
> > not be split between ranges.  This has resulted in new padding (part
> > of the 'mpls_lse' field).
> >
> > The 'dp_hash' field is also moved to packet metadata to eliminate
>

Re: [ovs-dev] [PATCH] DPDK Initialization in OVS: Modification done in DPDK initialization functions to reflect recent modifications to DPDK codebase

2015-01-23 Thread Traynor, Kevin
Hi Shankari,

I can't see any attachment. Can you send the patch using git send-email ?

A few questions:
- What version and recent changes in DPDK are you referring to?
- What compiler/linker errors are you fixing?
- Did you run make distcheck?

FYI - There was a patch submitted for DPDK v1.8.0 support here,
http://openvswitch.org/pipermail/dev/2014-December/049720.html

The reason I ask about 'make distcheck' is that in DPDK v1.8.0 there are mbuf 
changes that use offsets to the data. Even after changing the underlying 
access methods to the buffers, some unit tests are failing. In one case
a change is needed to ofpbuf_resize__() to atomically move the buffer so that 
the base/offset combination is valid. There are some other failures that are 
being investigated.

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Shankari 
> Vaidyalingam
> Sent: Friday, January 23, 2015 12:16 AM
> To: Ben Pfaff
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] DPDK Initialization in OVS: Modification done 
> in DPDK initialization
> functions to reflect recent modifications to DPDK codebase
> 
> Apologies. Forgot to to include the diffs in the attachment.
> Please find enclosed the diffs along with the description of the changes.
> 
> Regards
> Shankari.V
> 
> On Fri, Jan 23, 2015 at 4:37 AM, Ben Pfaff  wrote:
> 
> > On Fri, Jan 23, 2015 at 12:16:37AM +0530, Shankari Vaidyalingam wrote:
> > > I have enclosed the changes made in OVS DPDK code.
> > > I'm planning to submit these changes as a patch.
> > > Without the changes mentioned in this patch, compiler and linker errors
> > > (enclosed)  are seen during the build.
> > > Hence please review the changes (in the enclosed file)  I have made in
> > the
> > > OVS code repository so that I can proceed further.
> >
> > Where is the file?
> >


Re: [ovs-dev] [PATCH] miniflow: Fix miniflow push of L4 port numbers.

2015-02-02 Thread Traynor, Kevin
This patch is for the issue reported here:
http://openvswitch.org/pipermail/dev/2015-January/050368.html. It's tested for 
TCP and UDP. It would have been slightly more efficient to replace the 
miniflow_push_words() with miniflow_push_words_32() and remove the explicit 
igmp_group_ip4 padding, but that would make it less readable/consistent.
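
As a rough illustration of why the old 64-bit copy caused EMC misses, here is 
a standalone sketch. It is not the OVS code: the struct and both helper 
functions are made up for this example, and the struct only models the first 
eight bytes of a TCP header. The point it tries to show is that a 64-bit copy 
starting at the source port also picks up the sequence number, so two packets 
of the same flow produce different key words.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first 8 bytes of a TCP header.
 * This is NOT the OVS 'struct tcp_header'. */
struct tcp_hdr_sketch {
    uint16_t src;   /* source port */
    uint16_t dst;   /* destination port */
    uint32_t seq;   /* sequence number, changes from packet to packet */
};

/* Old behaviour (sketch): copy one whole 64-bit word starting at the
 * source port. The sequence number ends up in the key word as well. */
static uint64_t
key_word_64bit_copy(const struct tcp_hdr_sketch *tcp)
{
    uint64_t word;

    memcpy(&word, tcp, sizeof word);
    return word;
}

/* Fixed behaviour (sketch): copy only the two 16-bit ports and leave the
 * remaining 32 bits as zero padding. */
static uint64_t
key_word_ports_only(const struct tcp_hdr_sketch *tcp)
{
    uint64_t word = 0;

    memcpy(&word, &tcp->src, sizeof tcp->src);
    memcpy((char *) &word + sizeof tcp->src, &tcp->dst, sizeof tcp->dst);
    return word;
}
```

With two headers that share ports but differ in sequence number, the first 
helper returns different words and the second returns the same word, which is 
the property an exact-match lookup needs.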

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Kevin Traynor
> Sent: Monday, February 2, 2015 10:48 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH] miniflow: Fix miniflow push of L4 port numbers.
> 
> Replace a 64 bit copy of L4 src/dst ports that was also
> including additional packet params (e.g. TCP Seq Num). This
> was later resulting in all packets from the flow missing in
> the EMC.
> 
> Signed-off-by: Kevin Traynor 
> Reported-by: Ciara Loftus 
> ---
>  lib/flow.c |9 ++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/flow.c b/lib/flow.c
> index 43bb003..b0cb71d 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -672,21 +672,24 @@ miniflow_extract(struct ofpbuf *packet, const struct 
> pkt_metadata *md,
>  miniflow_push_be32(mf, arp_tha[2], 0);
>  miniflow_push_be32(mf, tcp_flags,
> TCP_FLAGS_BE32(tcp->tcp_ctl));
> -miniflow_push_words(mf, tp_src, &tcp->tcp_src, 1);
> +miniflow_push_be16(mf, tp_src, tcp->tcp_src);
> +miniflow_push_be16(mf, tp_dst, tcp->tcp_dst);
>  miniflow_pad_to_64(mf, igmp_group_ip4);
>  }
>  } else if (OVS_LIKELY(nw_proto == IPPROTO_UDP)) {
>  if (OVS_LIKELY(size >= UDP_HEADER_LEN)) {
>  const struct udp_header *udp = data;
> 
> -miniflow_push_words(mf, tp_src, &udp->udp_src, 1);
> +miniflow_push_be16(mf, tp_src, udp->udp_src);
> +miniflow_push_be16(mf, tp_dst, udp->udp_dst);
>  miniflow_pad_to_64(mf, igmp_group_ip4);
>  }
>  } else if (OVS_LIKELY(nw_proto == IPPROTO_SCTP)) {
>  if (OVS_LIKELY(size >= SCTP_HEADER_LEN)) {
>  const struct sctp_header *sctp = data;
> 
> -miniflow_push_words(mf, tp_src, &sctp->sctp_src, 1);
> +miniflow_push_be16(mf, tp_src, sctp->sctp_src);
> +miniflow_push_be16(mf, tp_dst, sctp->sctp_dst);
>  miniflow_pad_to_64(mf, igmp_group_ip4);
>  }
>  } else if (OVS_LIKELY(nw_proto == IPPROTO_ICMP)) {
> --
> 1.7.4.1
> 


Re: [ovs-dev] [PATCH] netdev-dpdk: Allow changing NON_PMD_CORE_ID for testing purpose.

2015-02-05 Thread Traynor, Kevin
Hi,

I've done some quick testing on this patch for different values of 
NON_PMD_CORE_ID and it looks to be working fine. 

The only issue I've seen is that I'm using isolcpus for all cores except core 0, 
so when I change the NON_PMD_CORE_ID to non-zero, the pmd is also being 
scheduled on core 0 and throughput suffers depending on system load.

I'd suggest adding a comment beside the #define to warn that this value affects 
where the pmd is scheduled, and it may be on a non-isolated core.
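
Something along these lines is what I have in mind. This is only a sketch: the 
comment wording is a suggestion, and core_usable_by_pmd() is a made-up helper 
for this example that mirrors the ovs_assert(cpu != NON_PMD_CORE_ID) check in 
the patch.

```c
#include <stdbool.h>

/* Core reserved for use by non pmd threads.
 *
 * WARNING (suggested wording only): this value also affects where pmd
 * threads are scheduled. If it is changed, a pmd thread may end up on a
 * core that is not isolated (e.g. one left out of isolcpus), and
 * throughput will then depend on system load. */
#define NON_PMD_CORE_ID 0

/* Hypothetical helper: returns true if 'cpu' may host a pmd thread,
 * i.e. it is not the core reserved for non pmd threads. */
static bool
core_usable_by_pmd(int cpu)
{
    return cpu != NON_PMD_CORE_ID;
}
```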

Thanks,
Kevin.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di 
> Proietto
> Sent: Thursday, February 5, 2015 12:58 AM
> To: Alex Wang
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Allow changing NON_PMD_CORE_ID 
> for testing purpose.
> 
> Hey Alex,
> 
> I remember that calling certain DPDK functions (that we use to
> initialize devices) with '_lcore_id !=0' resulted in an error. If this
> is not the case anymore with DPDK 1.7.1 (i.e. if you tested this patch
> with NON_PMD_CORE_ID!=0 and it worked), then I have no objection.
> 
> I would test this myself, but I don't have access to a DPDK capable
> system right now. If you want to be more sure about this you can build
> DPDK with CONFIG_RTE_LIBRTE_MBUF_DEBUG=y and watch for failed
> assertions.
> 
> Last thing: we could also avoid using the _lcore_id to set the
> affinity of a thread (e.g. a thread with '_lcore_id == 1' doesn't
> necessarily need to be pinned to cpu 1). This would be another way to
> achieve the same goal, but I prefer your approach, because it is more
> consistent with DPDK internals.
> 
> Hope this helps. Let me know if there's anything else about this
> 
> Thanks,
> 
> Daniele
> 
> 2015-02-05 0:14 GMT+01:00 Alex Wang :
> > Hey Daniele,
> >
> > Do you still remember why you mentioned:
> >
> > "/* We have to use 0 to allow non pmd threads to perform certain DPDK
> > * operations, like rte_eth_dev_configure(). */
> > "
> > in your commit: db73f716 (netdev-dpdk: Fix race condition with DPDK
> > mempools in non pmd threads)
> >
> > This posted commit works during my manual test.  And the
> > rte_eth_dev_configure() in dpdk-1.7.1 does not require the caller lcore id
> > to
> > be 0.
> >
> > But Pravin mentioned that you may know more about why use 0 for non-pmd
> > threads (to prevent crash?).
> >
> > Could you share some thoughts?
> >
> > Thanks,
> > Alex Wang,
> >
> > On Wed, Feb 4, 2015 at 2:14 PM, Pravin Shelar  wrote:
> >>
> >> On Tue, Feb 3, 2015 at 5:54 PM, Alex Wang  wrote:
> >> > For testing purpose, developers may want to change the NON_PMD_CORE_ID
> >> > and use a different core for non-pmd threads.  Since the netdev-dpdk
> >> > module is hard-coded to assert the non-pmd threads using core 0, such
> >> > change will cause abortion of OVS.
> >> >
> >> > This commit fixes the assertion and allows changing NON_PMD_CORE_ID.
> >> >
> >> > Signed-off-by: Alex Wang 
> >> > ---
> >> >  lib/dpctl.c   |2 +-
> >> >  lib/dpif-netdev.h |1 -
> >> >  lib/netdev-dpdk.c |   12 ++--
> >> >  lib/netdev-dpdk.h |2 ++
> >> >  4 files changed, 9 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/lib/dpctl.c b/lib/dpctl.c
> >> > index 4c2614b..125023c 100644
> >> > --- a/lib/dpctl.c
> >> > +++ b/lib/dpctl.c
> >> > @@ -31,11 +31,11 @@
> >> >  #include "dirs.h"
> >> >  #include "dpctl.h"
> >> >  #include "dpif.h"
> >> > -#include "dpif-netdev.h"
> >> >  #include "dynamic-string.h"
> >> >  #include "flow.h"
> >> >  #include "match.h"
> >> >  #include "netdev.h"
> >> > +#include "netdev-dpdk.h"
> >> >  #include "netlink.h"
> >> >  #include "odp-util.h"
> >> >  #include "ofp-parse.h"
> >> > diff --git a/lib/dpif-netdev.h b/lib/dpif-netdev.h
> >> > index d811507..410fcfa 100644
> >> > --- a/lib/dpif-netdev.h
> >> > +++ b/lib/dpif-netdev.h
> >> > @@ -42,7 +42,6 @@ static inline void dp_packet_pad(struct ofpbuf *b)
> >> >
> >> >  #define NR_QUEUE   1
> >> >  #define NR_PMD_THREADS 1
> >> > -#define NON_PMD_CORE_ID 0
> >> >
> >> >  #ifdef  __cplusplus
> >> >  }
> >> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> > index 0ede200..391695f 100644
> >> > --- a/lib/netdev-dpdk.c
> >> > +++ b/lib/netdev-dpdk.c
> >> > @@ -1553,8 +1553,8 @@ pmd_thread_setaffinity_cpu(int cpu)
> >> >  VLOG_ERR("Thread affinity error %d",err);
> >> >  return err;
> >> >  }
> >> > -/* lcore_id 0 is reseved for use by non pmd threads. */
> >> > -ovs_assert(cpu);
> >> > +/* NON_PMD_CORE_ID is reserved for use by non pmd threads. */
> >> > +ovs_assert(cpu != NON_PMD_CORE_ID);
> >>
> >> >  RTE_PER_LCORE(_lcore_id) = cpu;
> >> >
> >> >  return 0;
> >> > @@ -1563,13 +1563,13 @@ pmd_thread_setaffinity_cpu(int cpu)
> >> >  void
> >> >  thread_set_nonpmd(void)
> >> >  {
> >> > -/* We have to use 0 to allow non pmd threads to perform certain
> >> > DPDK
> >> > - * operations, like rte_eth_dev_configure(). */
> >> > -RTE_PER_LCORE(_lcore_id) = 

Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

2015-02-12 Thread Traynor, Kevin
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, January 21, 2015 11:19 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Thu, Jan 08, 2015 at 11:05:02PM +, Kevin Traynor wrote:
> > This patch adds support for a new port type to userspace datapath
> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > of virtio-net devices to its associated dpdkvhost port. Instructions
> > for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients
> > that have virtio-net interfaces.
> >
> >  ver 6:
> >- rebased with master
> >- modified to use DPDK v1.8.0 vhost library
> >- reworked for review comments
> >  ver 5:
> >- rebased against latest master
> >  ver 4:
> >- added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> >  utilities/automake.mk
> >- rebased with master to work with DPDK 1.7 ver 3:
> >- rebased with master
> >  ver 2:
> >- rebased with master
> >
> > Signed-off-by: Ciara Loftus 
> > Signed-off-by: Kevin Traynor 
> > Signed-off-by: Maryam Tahhan 
> > ---
> >  INSTALL.DPDK.md |  236 +
> >  Makefile.am |4 +
> >  lib/automake.mk |1 +
> >  lib/netdev-dpdk.c   |  649 
> > +++
> >  lib/netdev.c|3 +-
> >  utilities/automake.mk   |3 +-
> >  utilities/qemu-wrap.py  |  389 
> >  vswitchd/ovs-vswitchd.c |4 +-
> >  8 files changed, 1177 insertions(+), 112 deletions(-)
> >  mode change 100644 => 100755 lib/netdev-dpdk.c
> >  create mode 100755 utilities/qemu-wrap.py
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 2cc7636..da8116d 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -17,6 +17,7 @@ Building and Installing:
> >  
> >
> >  Required DPDK 1.7
> > +Optional `fuse`, `fuse-devel`
> >
> >  1. Configure build & install DPDK:
> >1. Set `$DPDK_DIR`
> > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is 
> > that the client
> >  application should not be assigned the same dpdk core mask "-c" as
> >  the vswitchd.
> >
> > +DPDK vHost:
> > +---
> > +
> > +Prerequisites:
> > +1.  DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > +
> > + Update `config/common_linuxapp` so that DPDK is built with vHost
> > + libraries:
> > +
> > + `CONFIG_RTE_LIBRTE_VHOST=y`
> > +
> > +2.  Insert the Fuse module:
> > +
> > +  `modprobe fuse`
> > +
> > +3.  Build and insert the `eventfd_link` module:
> > +
> > + `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > + `make`
> > + `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> > +
> > +4.  Remove /dev/vhost-net character device:
> > +
> > +  `rm -rf /dev/vhost-net`
> 
> I think it's not a good idea to tell people to do this,
> best to drop this section and put "with standard vhost"
> here instead.

It's not clear what you'd like to see dropped. This will be necessary 
if using the default vhost file, so I can change the text to make that clearer.

> 
> > +
> > +Following the steps above to create a bridge, you can now add DPDK vHost
> > +as a port to the vswitch.
> > +
> > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 
> > type=dpdkvhost`
> > +
> > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > +
> > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC 
> > type=dpdkvhost`
> > +
> > +However, please note that when attaching userspace devices to QEMU, the
> > +name provided during the add-port operation must match the ifname parameter
> > +on the QEMU command line.
> > +
> > +DPDK vHost VM configuration:
> > +
> > +
> > +1. Configure virtio-net adaptors:
> > +   The guest must be configured with virtio-net adapters and offloads
> > +   MUST BE DISABLED.
> 
> Any plans to address this?

There are no plans at present.

> 
> > +This means the following parameters should be passed
> > +   to the QEMU binary:
> > +
> > + ```
> > + -netdev tap,id=,script=no,downscript=no,ifname=

Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

2015-02-13 Thread Traynor, Kevin

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Thursday, February 12, 2015 2:09 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 

Michael, thanks for the feedback - will look to see what we can do on these.


Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.

2015-02-26 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Wednesday, February 25, 2015 4:47 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> 
> With this commit the '--dpdk' option doesn't need to be at the beginning
> of the command line. Furthermode, the code that calls 'rte_eal_init()'
> can be slightly simplified by using the 'optind' variable. The change is
> totally backward compatible
> 

I've tested various combinations of this and have seen no issues. 

> Documentation and manpages are updated accordingly.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  INSTALL.DPDK.md| 14 --
>  lib/netdev-dpdk.c  | 41 +
>  lib/netdev-dpdk.h  | 20 +---
>  vswitchd/ovs-vswitchd.8.in |  9 ++---
>  vswitchd/ovs-vswitchd.c| 11 +--
>  5 files changed, 49 insertions(+), 46 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 4c443e5..72318a8 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -77,7 +77,7 @@ Using the DPDK with ovs-vswitchd:
> 
>  1. Setup system boot
> Add the following options to the kernel bootline:
> -
> +
> `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
> 
>  2. Setup DPDK devices:
> @@ -139,10 +139,12 @@ Using the DPDK with ovs-vswitchd:
> 
>  5. Start vswitchd:
> 
> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> -   argument. This needs to be first argument passed to vswitchd process.
> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> -   for dpdk initialization.
> +   To initialize DPDK support the '--dpdk' option must be used. It is
> +   followed by suboptions that are passed to the DPDK library. The
> suboptions
> +   list is terminated by `--`: the remaining options are parsed by vswitchd.
> +
> +   Please not that `-c` and `-n` DPDK suboptions are required (although `-c`
> +   is ignored by OVS)

typo - "Please note"

"-c is ignored by OVS" is true, but it may give the impression that it has no 
impact on OVS operation. It impacts the core affinity of the vswitchd 
process, so it might be worth adding that.

thanks,
Kevin.



Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.

2015-03-03 Thread Traynor, Kevin

> -Original Message-
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Thursday, February 26, 2015 1:54 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> 
> 
> > On 26 Feb 2015, at 13:19, Traynor, Kevin  wrote:
> >
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> >> Proietto
> >> Sent: Wednesday, February 25, 2015 4:47 PM
> >> To: dev@openvswitch.org
> >> Subject: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> >>
> >> With this commit the '--dpdk' option doesn't need to be at the beginning
> >> of the command line. Furthermode, the code that calls 'rte_eal_init()'
> >> can be slightly simplified by using the 'optind' variable. The change is
> >> totally backward compatible
> >>
> >
> > I've tested various combinations of this and have seen no issues.
> >
> 
> Thank you for testing this, it's really appreciated.
> 
> >> Documentation and manpages are updated accordingly.
> >>
> >> Signed-off-by: Daniele Di Proietto 
> >> ---
> >> INSTALL.DPDK.md| 14 --
> >> lib/netdev-dpdk.c  | 41 +
> >> lib/netdev-dpdk.h  | 20 +---
> >> vswitchd/ovs-vswitchd.8.in |  9 ++---
> >> vswitchd/ovs-vswitchd.c| 11 +--
> >> 5 files changed, 49 insertions(+), 46 deletions(-)
> >>
> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >> index 4c443e5..72318a8 100644
> >> --- a/INSTALL.DPDK.md
> >> +++ b/INSTALL.DPDK.md
> >> @@ -77,7 +77,7 @@ Using the DPDK with ovs-vswitchd:
> >>
> >> 1. Setup system boot
> >>Add the following options to the kernel bootline:
> >> -
> >> +
> >>`default_hugepagesz=1GB hugepagesz=1G hugepages=1`
> >>
> >> 2. Setup DPDK devices:
> >> @@ -139,10 +139,12 @@ Using the DPDK with ovs-vswitchd:
> >>
> >> 5. Start vswitchd:
> >>
> >> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> >> -   argument. This needs to be first argument passed to vswitchd process.
> >> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> >> -   for dpdk initialization.
> >> +   To initialize DPDK support the '--dpdk' option must be used. It is
> >> +   followed by suboptions that are passed to the DPDK library. The
> >> suboptions
> >> +   list is terminated by `--`: the remaining options are parsed by
> vswitchd.
> >> +
> >> +   Please not that `-c` and `-n` DPDK suboptions are required (although
> `-c`
> >> +   is ignored by OVS)
> >
> > typo - "Please note”
> >
> 
> Oops, I’ll fix that
> 
> > "-c is ignored by OVS" is true, but it may mislead that it has no
> > impact on OVS operation - it impacts the core affinity of the vswitchd
> > process, so might be worth adding that.
> >
> 
> -c is passed to DPDK, but it is ignored by OVS. To configure the threads and
> the affinity we currently use the database key "other_config:pmd-cpu-mask”
> (INSTALL.DPDK.md “Performance tuning”). Perhaps we should add a reference to
> the database key here? That’s probably going to change somehow, in an effort
> to ease OVS DPDK deployments. Thoughts?

The db key is just for the PMD(s). All the other vswitchd threads run on the 
lowest core specified in the -c option, so probably we should mention that.
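
To make "lowest core specified in the -c option" concrete, here is a small 
sketch of how that core can be derived from a hex core mask. The function name 
is made up for this example and this is not how OVS or DPDK compute it 
internally; it only shows the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: returns the index of the lowest set bit in 'mask',
 * i.e. the lowest numbered core in the core mask, or -1 if the mask is
 * empty. */
static int
lowest_core_in_mask(uint64_t mask)
{
    int core;

    for (core = 0; core < 64; core++) {
        if (mask & (UINT64_C(1) << core)) {
            return core;
        }
    }
    return -1;
}
```

For example, a mask of 0x6 covers cores 1 and 2, so the lowest core is 1, 
while a mask of 0x1 gives core 0.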

> 
> Thanks,
> 
> Daniele



Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

2015-03-04 Thread Traynor, Kevin

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Thursday, February 12, 2015 2:09 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Thu, Feb 12, 2015 at 12:59:17PM +, Traynor, Kevin wrote:
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Wednesday, January 21, 2015 11:19 AM
> > > To: Traynor, Kevin
> > > Cc: dev@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost
> ports
> > >
> > > On Thu, Jan 08, 2015 at 11:05:02PM +, Kevin Traynor wrote:
> > > > This patch adds support for a new port type to userspace datapath
> > > > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > > > of virtio-net devices to its associated dpdkvhost port. Instructions
> > > > for use are in INSTALL.DPDK.
> > > >
> > > > This has been tested on Intel multi-core platforms and with clients
> > > > that have virtio-net interfaces.
> > > >
> > > >  ver 6:
> > > >- rebased with master
> > > >- modified to use DPDK v1.8.0 vhost library
> > > >- reworked for review comments
> > > >  ver 5:
> > > >- rebased against latest master
> > > >  ver 4:
> > > >- added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> > > >  utilities/automake.mk
> > > >- rebased with master to work with DPDK 1.7 ver 3:
> > > >- rebased with master
> > > >  ver 2:
> > > >- rebased with master
> > > >
> > > > Signed-off-by: Ciara Loftus 
> > > > Signed-off-by: Kevin Traynor 
> > > > Signed-off-by: Maryam Tahhan 
> > > > ---
> > > >  INSTALL.DPDK.md |  236 +
> > > >  Makefile.am |4 +
> > > >  lib/automake.mk |1 +
> > > >  lib/netdev-dpdk.c   |  649
> +++
> > > >  lib/netdev.c|3 +-
> > > >  utilities/automake.mk   |3 +-
> > > >  utilities/qemu-wrap.py  |  389 
> > > >  vswitchd/ovs-vswitchd.c |4 +-
> > > >  8 files changed, 1177 insertions(+), 112 deletions(-)
> > > >  mode change 100644 => 100755 lib/netdev-dpdk.c
> > > >  create mode 100755 utilities/qemu-wrap.py
> > > >
> > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > > index 2cc7636..da8116d 100644
> > > > --- a/INSTALL.DPDK.md
> > > > +++ b/INSTALL.DPDK.md
> > > > @@ -17,6 +17,7 @@ Building and Installing:
> > > >  
> > > >
> > > >  Required DPDK 1.7
> > > > +Optional `fuse`, `fuse-devel`
> > > >
> > > >  1. Configure build & install DPDK:
> > > >1. Set `$DPDK_DIR`
> > > > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is
> that the client
> > > >  application should not be assigned the same dpdk core mask "-c" as
> > > >  the vswitchd.
> > > >
> > > > +DPDK vHost:
> > > > +---
> > > > +
> > > > +Prerequisites:
> > > > +1.  DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > > > +
> > > > + Update `config/common_linuxapp` so that DPDK is built with vHost
> > > > + libraries:
> > > > +
> > > > + `CONFIG_RTE_LIBRTE_VHOST=y`
> > > > +
> > > > +2.  Insert the Fuse module:
> > > > +
> > > > +  `modprobe fuse`
> > > > +
> > > > +3.  Build and insert the `eventfd_link` module:
> > > > +
> > > > + `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > > > + `make`
> > > > + `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> > > > +
> > > > +4.  Remove /dev/vhost-net character device:
> > > > +
> > > > +  `rm -rf /dev/vhost-net`
> > >
> > > I think it's not a good idea to tell people to do this,
> > > best to drop this section and put "with standard vhost"
> > > here instead.
> >
> > Not clear what you'd like to see dropped?

> > This will be necessary
> > if using the

Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

2015-03-06 Thread Traynor, Kevin

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, March 4, 2015 6:54 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Wed, Mar 04, 2015 at 06:00:51PM +, Traynor, Kevin wrote:
> > > > > > + 2. Disable SELinux or set to permissive mode
> > > > >
> > > > >
> > > > > It's a work-around, but the right thing to do is really
> > > > > to write up correct selinux policies.
> > > > > Any plans to do this?
> > > >
> > > > No plans for this at present
> > >
> > > That's pretty bad, so one has to give up some security to
> > > gain some other feature. How does one make a call?
> > > Why don't you want to fix it?
> >
> > We haven't been able to get to do this now. I'm not clear yet
> > if this will be needed for vhost-user?
> 
> Well normally yes. Updating selinux policies is easy - you set it to
> record what's going on, then package it.

ok, I've requested that DPDK add this as it will be useful for any users 
of the DPDK vhost libs.

> > > > It's not something we've looked at, but will bring it up with the dpdk
> team
> > >
> > > Please do, wrapper scripts simply can't be supported by libvirt.
> >
> > The vhostfd could be put manually into libvirt or the wrapper script could
> be used.
> > We didn't see another way to get it into the XML?
> 
> If using libvirt, one just sets the backend path for tun and/or vhost:
>   
> 
> see https://libvirt.org/formatdomain.html

thanks - I'll try that,

Kevin.

> 
> 
> > >
> > > > >
> > > > > > +
> > > > > > +DPDK vHost VM configuration with QEMU wrapper:
> > > > >
> > > > > ...
> > > > >


Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.

2015-03-06 Thread Traynor, Kevin

> -Original Message-
> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> Sent: Wednesday, March 4, 2015 6:59 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> 
> 
> > On 3 Mar 2015, at 09:17, Traynor, Kevin  wrote:
> >
> >>
> >> -Original Message-
> >> From: Daniele Di Proietto [mailto:diproiet...@vmware.com]
> >> Sent: Thursday, February 26, 2015 1:54 PM
> >> To: Traynor, Kevin
> >> Cc: dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> >>
> >>
> >>> On 26 Feb 2015, at 13:19, Traynor, Kevin  wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> >>>> Proietto
> >>>> Sent: Wednesday, February 25, 2015 4:47 PM
> >>>> To: dev@openvswitch.org
> >>>> Subject: [ovs-dev] [PATCH] vswitchd: simplify dpdk option parsing.
> >>>>
> >>>> With this commit the '--dpdk' option doesn't need to be at the beginning
> >>>> of the command line. Furthermode, the code that calls 'rte_eal_init()'
> >>>> can be slightly simplified by using the 'optind' variable. The change is
> >>>> totally backward compatible
> >>>>
> >>>
> >>> I've tested various combinations of this and have seen no issues.
> >>>
> >>
> >> Thank you for testing this, it's really appreciated.
> >>
> >>>> Documentation and manpages are updated accordingly.
> >>>>
> >>>> Signed-off-by: Daniele Di Proietto 
> >>>> ---
> >>>> INSTALL.DPDK.md| 14 --
> >>>> lib/netdev-dpdk.c  | 41 +---
> -
> >>>> lib/netdev-dpdk.h  | 20 +---
> >>>> vswitchd/ovs-vswitchd.8.in |  9 ++---
> >>>> vswitchd/ovs-vswitchd.c| 11 +--
> >>>> 5 files changed, 49 insertions(+), 46 deletions(-)
> >>>>
> >>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >>>> index 4c443e5..72318a8 100644
> >>>> --- a/INSTALL.DPDK.md
> >>>> +++ b/INSTALL.DPDK.md
> >>>> @@ -77,7 +77,7 @@ Using the DPDK with ovs-vswitchd:
> >>>>
> >>>> 1. Setup system boot
> >>>>   Add the following options to the kernel bootline:
> >>>> -
> >>>> +
> >>>>   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
> >>>>
> >>>> 2. Setup DPDK devices:
> >>>> @@ -139,10 +139,12 @@ Using the DPDK with ovs-vswitchd:
> >>>>
> >>>> 5. Start vswitchd:
> >>>>
> >>>> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> >>>> -   argument. This needs to be first argument passed to vswitchd
> process.
> >>>> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> >>>> -   for dpdk initialization.
> >>>> +   To initialize DPDK support the '--dpdk' option must be used. It is
> >>>> +   followed by suboptions that are passed to the DPDK library. The
> >>>> suboptions
> >>>> +   list is terminated by `--`: the remaining options are parsed by
> >> vswitchd.
> >>>> +
> >>>> +   Please not that `-c` and `-n` DPDK suboptions are required (although
> >> `-c`
> >>>> +   is ignored by OVS)
> >>>
> >>> typo - "Please note”
> >>>
> >>
> >> Oops, I’ll fix that
> >>
> >>> "-c is ignored by OVS" is true, but it may mislead that it has no
> >>> impact on OVS operation - it impacts the core affinity of the vswitchd
> >>> process, so might be worth adding that.
> >>>
> >>
> >> -c is passed to DPDK, but it is ignored by OVS. To configure the threads
> >> and the affinity we currently use the database key
> >> "other_config:pmd-cpu-mask" (INSTALL.DPDK.md "Performance tuning").
> >> Perhaps we should add a reference to the database key here? That's
> >> probably going to change somehow, in an effort to ease OVS DPDK
> >> deployments. Thoughts?
> >
> > The db key is just for the PMD(s). All the other vswitchd threads run on
> > the lowest core specified in the -c option, so probably we should mention
> > that.
> >
> 
> You’re right, they do. If you agree I'd prefer changing that (seems a weird
> behaviour). I’ll post something soon

Yeah, at the moment there's a combination of the -c param, the db key and the
NON_PMD_CORE_ID, which is not ideal. It would need some thought on how best to
consolidate them. Something to ponder for the weekend ;-)
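
To make the "lowest core specified in the -c option" behaviour concrete, here
is a minimal sketch of how the lowest core in a mask could be found. The
helper name is hypothetical and is not an OVS or DPDK function; it only
assumes that the -c value has already been parsed into a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an OVS or DPDK function: returns the index of the
 * lowest core set in 'core_mask', or -1 if no core is set. */
static int
lowest_core_in_mask(uint64_t core_mask)
{
    int core;

    for (core = 0; core < 64; core++) {
        if (core_mask & (UINT64_C(1) << core)) {
            return core;
        }
    }
    return -1;
}
```

With a mask of 0x6 (cores 1 and 2) this returns 1, which under the behaviour
described above would be the core the non-PMD vswitchd threads land on.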

> 
> Thanks
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports

2015-03-12 Thread Traynor, Kevin

> -Original Message-
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Thursday, March 12, 2015 7:20 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> 
> On Thu, Mar 5, 2015 at 1:42 PM, Kevin Traynor 
> wrote:
> > This patch adds support for a new port type to userspace datapath
> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > of virtio-net devices to its associated dpdkvhost port. Instructions
> > for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients
> > that have virtio-net interfaces.
> >
> 
> What is required QEMU version for this patch ?

It's been tested with 2.1.0 and 2.2.0. It has also previously been tested 
with 1.6.2 but only the '-mem-path /dev/hugepages -mem-prealloc' flags are 
needed to share the hugepage for that version. It won't work with 2.0.0 as 
there isn't a way to share the hugepage.



Re: [ovs-dev] Some questions about DPDK+OVS

2015-03-13 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
> Sent: Friday, March 13, 2015 4:02 PM
> To: lin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] Some questions about DPDK+OVS
> 
> On Fri, Mar 13, 2015 at 04:16:25PM +0800, lin wrote:
> > 1.DPDK+OVS take advantage of OVS's user space bridge to do packet
> > forwarding, are there any plans to support user space vxlan tunneling?
> 
> Already supported, please read README-native-tunneling.md.
> 
> > 2.Are there any plans to merge user space vhost driver into OVS mainline?
> 
> I believe that's underway.
> 
> > Also, how about the offload features exposure to the VM, like, TCP offload,
> > checksum offload, scatter-gather, etc?
> 
> I don't know.
> 
> > 3.Are there any plans to support QOS and ACL features in DPDK OVS on user
> > space?
> 
> ACLs are supported.
> 
> I do not know of anyone working on QoS.

There's an RFC patch for plugging in a QoS scheduler here:
http://openvswitch.org/pipermail/dev/2015-January/050642.html

All feedback welcome.



Re: [ovs-dev] [PATCH 0/6] DPDK: simplify configuration

2015-03-16 Thread Traynor, Kevin
> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, March 12, 2015 6:05 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 0/6] DPDK: simplify configuration
> 
> This series improves OVS configuration with DPDK in three ways:
> 
> * netdev-dpdk is patched to work on smaller systems (without 1GB hugepages)
>   or with smaller NICs (without lots of transmission queues).
> * the 'other_config:nonpmd-cpu-mask' key is introduced: it can be used to
>   limit OVS non PMD threads to a particular set of cores.
> * the 'other_config:n-pmd-cores' key is introduced: it allows setting the
>   number of PMD threads without specifying a CPU mask.

Hi, I've reviewed this patchset and have a few comments/questions on it.

I haven't tested it yet, but I'm wondering what the impact is on the dpdk -c
parameter. Is it no longer used for OVS?

At present the NON_PMD_CORE_ID define overrides the db settings (which is
clearly documented). Is it still needed now that a db key is available? Perhaps
it would be simpler if the define were overridden when a key is specified?
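
As a rough illustration of that suggestion, the sketch below lets a db key
value take precedence and only falls back to the compile-time define when no
usable key is set. Everything here is an assumption for illustration: the
function name, the EXAMPLE_NON_PMD_CORE_ID value of 0, and the idea that the
key is a hex string are not taken from the patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed default core for non-PMD threads; the value is illustrative only. */
#define EXAMPLE_NON_PMD_CORE_ID 0

/* Hypothetical helper: returns the core mask for non-PMD threads.  A valid,
 * non-zero hex mask from the db key wins; otherwise the define is used. */
static uint64_t
nonpmd_core_mask(const char *db_mask)
{
    if (db_mask && db_mask[0] != '\0') {
        char *end;
        unsigned long long mask = strtoull(db_mask, &end, 16);

        /* Accept the key only if the whole string parsed and a core is set. */
        if (*end == '\0' && mask != 0) {
            return mask;
        }
    }
    return UINT64_C(1) << EXAMPLE_NON_PMD_CORE_ID;
}
```

The point of the design is that there is a single place where the precedence
is decided, rather than the define silently overriding the user's setting.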



Re: [ovs-dev] [PATCH 4/6] dpif-netdev: Allow controlling non PMD threads' affinity

2015-03-16 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, March 12, 2015 6:05 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 4/6] dpif-netdev: Allow controlling non PMD
> threads' affinity
> 
> This commit introduces the 'other_config:nonpmd-cpu-mask' key to control
> the CPU affinity of non PMD threads.
> 
> Signed-off-by: Daniele Di Proietto 
> ---

[snip]

>  static char *
> diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
> index 7b4878eb..3612766 100644
> --- a/lib/dpif-provider.h
> +++ b/lib/dpif-provider.h
> @@ -309,9 +309,10 @@ struct dpif_class {
>  /* If 'dpif' creates its own I/O polling threads, refreshes poll threads
>   * configuration.  'n_rxqs' configures the number of rx_queues, which
>   * are distributed among threads.  'cmask' configures the cpu mask
> - * for setting the polling threads' cpu affinity. */
> + * for setting the polling threads' cpu affinity.  'cmask_nonpmd'
> + * configures the cpumask of the remaining OVS threads */
>  int (*poll_threads_set)(struct dpif *dpif, unsigned int n_rxqs,
> -const char *cmask);
> +const char *cmask, const char *cmask_nonpmd);

You should probably rename cmask to cmask_pmd, similar to how you have made the
names more specific elsewhere.


Re: [ovs-dev] [PATCH 2/6] netdev-dpdk: Adapt the requested number of tx and rx queues.

2015-03-16 Thread Traynor, Kevin

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, March 12, 2015 6:05 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 2/6] netdev-dpdk: Adapt the requested number of tx
> and rx queues.
> 
> This commit changes the semantics of 'netdev_set_multiq()' to allow OVS
> DPDK to run on device with limited multi queue support.

This is great, because on a dual-socket system with 18-core Haswell CPUs and HT
enabled you could be looking for 72 tx queues.
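
The arithmetic behind the 72 can be sketched as below, assuming one tx queue
is requested per logical core. The helper names are hypothetical; the locking
rule simply mirrors the "real_n_txq != n_txq" comparison in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: tx queues requested if one is wanted per logical
 * core (sockets x cores per socket x hardware threads per core). */
static unsigned int
requested_txqs(unsigned int sockets, unsigned int cores_per_socket,
               unsigned int threads_per_core)
{
    return sockets * cores_per_socket * threads_per_core;
}

/* Hypothetical helper: queues must be shared under a lock whenever the
 * device provides a different number of queues than was requested. */
static bool
txq_needs_locking(unsigned int requested, unsigned int real)
{
    return real != requested;
}
```

For the system above, requested_txqs(2, 18, 2) is 72, so a NIC that only
offers, say, 16 tx queues would end up on the locking path.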

> 
> * If a netdev doesn't have the requested number of rxqs it can simply
>   inform the datapath without failing.
> * If a netdev doesn't have the requested number of txqs it should try
>   to create as many as possible and use locking.
> 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 94 +++--
> --
>  lib/netdev-provider.h | 11 ++
>  lib/netdev.c  | 10 ++
>  vswitchd/vswitch.xml  |  2 +-
>  4 files changed, 80 insertions(+), 37 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 54bc318..2278377 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c

[snip]

> @@ -656,8 +684,10 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned
> int n_txq,
>  netdev->up.n_txq = n_txq;
>  netdev->up.n_rxq = n_rxq;
>  rte_free(netdev->tx_q);
> -netdev_dpdk_alloc_txq(netdev, n_txq);
>  err = dpdk_eth_dev_init(netdev);
> +netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq);
> +
> +netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;

There is probably no point in allocating here if dpdk_eth_dev_init() has
returned an error. You could just skip to the mutex unlocking.
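
Something along these lines is what I have in mind. This is only a sketch
with stand-in types and functions (struct fake_dev, fake_dev_init() and
fake_set_multiq() are made up for illustration and are not the netdev-dpdk
code); the mutex handling is reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the netdev state touched by set_multiq. */
struct fake_dev {
    int n_txq;               /* Number of tx queues requested. */
    int real_n_txq;          /* Number of tx queues actually available. */
    int hw_max_txq;          /* Queues the pretend hardware supports. */
    int init_err;            /* Error that fake_dev_init() should report. */
    int *tx_q;               /* Per-queue state. */
    bool txq_needs_locking;
};

/* Stand-in for device init: fails on request, else clamps the queue count. */
static int
fake_dev_init(struct fake_dev *dev)
{
    if (dev->init_err) {
        return dev->init_err;
    }
    dev->real_n_txq = dev->n_txq < dev->hw_max_txq ? dev->n_txq
                                                   : dev->hw_max_txq;
    return 0;
}

static int
fake_set_multiq(struct fake_dev *dev, int n_txq)
{
    int err;

    assert(n_txq > 0 && dev->hw_max_txq > 0);

    dev->n_txq = n_txq;
    free(dev->tx_q);
    dev->tx_q = NULL;

    err = fake_dev_init(dev);
    if (err) {
        goto out;            /* Init failed: skip the allocation entirely. */
    }

    dev->tx_q = calloc(dev->real_n_txq, sizeof *dev->tx_q);
    if (!dev->tx_q) {
        err = ENOMEM;
        goto out;
    }
    dev->txq_needs_locking = dev->real_n_txq != dev->n_txq;

out:
    /* The real code would unlock its mutexes here. */
    return err;
}
```

That way a failed init leaves no half-configured tx queue array behind, and
both the success and failure paths share the same unlock sequence.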

> 
>  ovs_mutex_unlock(&netdev->mutex);
>  ovs_mutex_unlock(&dpdk_mutex);
> @@ -921,12 +951,21 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
>  }
> 

[snip]

> 
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, MULTIQ, SEND)  \
> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT)  \
>  { \
>  NAME, \
>  INIT,   /* init */\
> @@ -1429,9 +1455,9 @@ unlock_dpdk:
>  NULL, /* push header */   \
>  NULL, /* pop header */\
>  netdev_dpdk_get_numa_id,/* get_numa_id */ \
> -MULTIQ, /* set_multiq */  \
> +netdev_dpdk_set_multiq, /* set_multiq */  \

I don't think netdev_dpdk_set_multiq() is needed for dpdkr, as at present
you can't change the number of queues for dpdkr. It doesn't do any harm
either. Is there a reason you put it in, e.g. future proofing?



Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports

2015-03-16 Thread Traynor, Kevin

> -Original Message-
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Monday, March 16, 2015 2:46 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> 
> On Thu, Mar 12, 2015 at 7:18 AM, Traynor, Kevin 
> wrote:
> >
> >> -Original Message-
> >> From: Pravin Shelar [mailto:pshe...@nicira.com]
> >> Sent: Thursday, March 12, 2015 7:20 AM
> >> To: Traynor, Kevin
> >> Cc: dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> >>
> >> On Thu, Mar 5, 2015 at 1:42 PM, Kevin Traynor 
> >> wrote:
> >> > This patch adds support for a new port type to userspace datapath
> >> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> >> > of virtio-net devices to its associated dpdkvhost port. Instructions
> >> > for use are in INSTALL.DPDK.
> >> >
> >> > This has been tested on Intel multi-core platforms and with clients
> >> > that have virtio-net interfaces.
> >> >
> >>
> >> What is required QEMU version for this patch ?
> >
> > It's been tested with 2.1.0 and 2.2.0. It has also previously been tested
> > with 1.6.2 but only the '-mem-path /dev/hugepages -mem-prealloc' flags are
> > needed to share the hugepage for that version. It won't work with 2.0.0 as
> > there isn't a way to share the hugepage.
> >
> 
> Thanks for the info. I will try it this week.
> I had another question about netdev-dpdk mutex, why is it converted to
> spin_lock?

We changed it based on this review comment:
' - ovs-mutex is bit heavy weight, can you use rte-spin-lock '

There were a few different mutexes, so we thought it was a general comment and
converted them to spinlocks.


Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports

2015-03-16 Thread Traynor, Kevin

> -Original Message-
> From: Pravin Shelar [mailto:pshe...@nicira.com]
> Sent: Monday, March 16, 2015 5:39 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> 
> On Mon, Mar 16, 2015 at 10:27 AM, Traynor, Kevin
>  wrote:
> >
> >> -Original Message-
> >> From: Pravin Shelar [mailto:pshe...@nicira.com]
> >> Sent: Monday, March 16, 2015 2:46 AM
> >> To: Traynor, Kevin
> >> Cc: dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> >>
> >> On Thu, Mar 12, 2015 at 7:18 AM, Traynor, Kevin 
> >> wrote:
> >> >
> >> >> -Original Message-
> >> >> From: Pravin Shelar [mailto:pshe...@nicira.com]
> >> >> Sent: Thursday, March 12, 2015 7:20 AM
> >> >> To: Traynor, Kevin
> >> >> Cc: dev@openvswitch.org
> >> >> Subject: Re: [ovs-dev] [PATCH v8] netdev-dpdk: add dpdk vhost-cuse ports
> >> >>
> >> >> On Thu, Mar 5, 2015 at 1:42 PM, Kevin Traynor 
> >> >> wrote:
> >> >> > This patch adds support for a new port type to userspace datapath
> >> >> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> >> >> > of virtio-net devices to its associated dpdkvhost port. Instructions
> >> >> > for use are in INSTALL.DPDK.
> >> >> >
> >> >> > This has been tested on Intel multi-core platforms and with clients
> >> >> > that have virtio-net interfaces.
> >> >> >
> >> >>
> >> >> What is required QEMU version for this patch ?
> >> >
> >> > It's been tested with 2.1.0 and 2.2.0. It has also previously been tested
> >> > with 1.6.2 but only the '-mem-path /dev/hugepages -mem-prealloc' flags are
> >> > needed to share the hugepage for that version. It won't work with 2.0.0 as
> >> > there isn't a way to share the hugepage.
> >> >
> >>
> >> Thanks for the info. I will try it this week.
> >> I had another question about netdev-dpdk mutex, why is it converted to
> >> spin_lock?
> >
> > We changed it based on this review comment:
> > ' - ovs-mutex is bit heavy weight, can you use rte-spin-lock '
> >
> But the mutex is not used in packet rx or tx path. So it should not matter.

OK, understood. If you want me to re-spin I can do that; it should be just a
search and replace.

> 
> > There were a few different mutexes, so we thought it was a general comment
> > and converted them to spinlocks.

