[dpdk-dev] [PATCH] vhost: Expose virtio interrupt requirement on rte_vhost API

2017-09-23 Thread Jan Scheurich
Performance tests with the OVS DPDK datapath have shown that the tx throughput 
over a vhostuser port into a VM with an interrupt-based virtio driver is 
limited by the overhead incurred by virtio interrupts. The OVS PMD spends up to 
30% of its cycles in system calls kicking the eventfd. Also the core running 
the vCPU is heavily loaded with generating the virtio interrupts in KVM on the 
host and handling these interrupts in the virtio-net driver in the guest. This 
limits the throughput to about 500-700 Kpps with a single vCPU.

OVS is trying to address this issue by batching packets to a vhostuser port for 
some time to limit the virtio interrupt frequency. With a 50 us batching period 
we have measured an iperf3 throughput increase of 15% and a PMD utilization 
decrease from 45% to 30%.

On the other hand, guests using virtio PMDs do not benefit from time-based tx 
batching. Instead they experience a 2-3% performance penalty and an average 
latency increase of 30-40 us. OVS therefore intends to apply time-based tx 
batching only for vhostuser tx queues that need to trigger virtio interrupts.

Today this information is hidden inside the rte_vhost library and not 
accessible to users of the API. This patch adds a function to the API to query 
it.
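
As a purely illustrative sketch (not part of the patch itself): a consumer
such as OVS could gate its time-based tx batching on the proposed query
roughly as follows. The set_tx_batching() hook is a hypothetical
placeholder, not DPDK API.

#include <stdint.h>
#include <stdio.h>
#include <rte_vhost.h>

/* Hypothetical consumer-side hook: enable/disable time-based tx batching. */
static void
set_tx_batching(int vid, uint16_t qid, int enable)
{
    printf("vhost vid %d qid %u: time-based tx batching %s\n",
           vid, (unsigned)qid, enable ? "on" : "off");
}

static void
refresh_tx_batching_policy(int vid, uint16_t qid)
{
    /* Batch only when the guest requests tx interrupts; polling
     * virtio-PMD guests would only pay the added latency. The flag
     * lives in shared memory, so re-evaluate it periodically. */
    set_tx_batching(vid, qid, rte_vhost_tx_interrupt_requested(vid, qid));
}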

Signed-off-by: Jan Scheurich 

---

 lib/librte_vhost/rte_vhost.h | 12 ++++++++++++
 lib/librte_vhost/vhost.c     | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 8c974eb..d62338b 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -444,6 +444,18 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);

+/**
+ * Does the virtio driver request interrupts for a vhost tx queue?
+ *
+ * @param vid
+ *  vhost device ID
+ * @param qid
+ *  virtio queue index in mq case
+ * @return
+ *  1 if true, 0 if false
+ */
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0b6aa1c..bd1ebf9 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -503,3 +503,22 @@ struct virtio_net *

return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
 }
+
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid)
+{
+struct virtio_net *dev;
+struct vhost_virtqueue *vq;
+
+dev = get_device(vid);
+if (dev == NULL)
+return 0;
+
+vq = dev->virtqueue[qid];
+if (vq == NULL)
+return 0;
+
+if (unlikely(vq->enabled == 0 || vq->avail == NULL))
+return 0;
+
+return !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+}



[dpdk-dev] [PATCH v2] vhost: Expose virtio interrupt need on rte_vhost API

2017-09-23 Thread Jan Scheurich
Performance tests with the OVS DPDK datapath have shown 
that the tx throughput over a vhostuser port into a VM with 
an interrupt-based virtio driver is limited by the overhead
incurred by virtio interrupts. The OVS PMD spends up to 30% 
of its cycles in system calls kicking the eventfd. Also the core
running the vCPU is heavily loaded with generating the virtio
interrupts in KVM on the host and handling these interrupts
in the virtio-net driver in the guest. This limits the throughput
to about 500-700 Kpps with a single vCPU.

OVS is trying to address this issue by batching packets to a
vhostuser port for some time to limit the virtio interrupt 
frequency. With a 50 us batching period we have measured an
iperf3 throughput increase of 15% and a PMD utilization
decrease from 45% to 30%.

On the other hand, guests using virtio PMDs do not benefit from
time-based tx batching. Instead they experience a 2-3%
performance penalty and an average latency increase of 
30-40 us. OVS therefore intends to apply time-based tx 
batching only for vhostuser tx queues that need to trigger
virtio interrupts.

Today this information is hidden inside the rte_vhost library
and not accessible to users of the API. This patch adds a
function to the API to query it.

Signed-off-by: Jan Scheurich 

---

v1 -> v2:
Fixed too long commit lines
Fixed white-space errors and warnings

 lib/librte_vhost/rte_vhost.h | 12 ++++++++++++
 lib/librte_vhost/vhost.c     | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 8c974eb..d62338b 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -444,6 +444,18 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);

+/**
+ * Does the virtio driver request interrupts for a vhost tx queue?
+ *
+ * @param vid
+ *  vhost device ID
+ * @param qid
+ *  virtio queue index in mq case
+ * @return
+ *  1 if true, 0 if false
+ */
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0b6aa1c..c6e636e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -503,3 +503,22 @@ struct virtio_net *

return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
 }
+
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid)
+{
+   struct virtio_net *dev;
+   struct vhost_virtqueue *vq;
+
+   dev = get_device(vid);
+   if (dev == NULL)
+   return 0;
+
+   vq = dev->virtqueue[qid];
+   if (vq == NULL)
+   return 0;
+
+   if (unlikely(vq->enabled == 0 || vq->avail == NULL))
+   return 0;
+
+   return !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+}


[dpdk-dev] [PATCH v3] vhost: Expose virtio interrupt need on rte_vhost API

2017-09-23 Thread Jan Scheurich
Performance tests with the OVS DPDK datapath have shown 
that the tx throughput over a vhostuser port into a VM with 
an interrupt-based virtio driver is limited by the overhead
incurred by virtio interrupts. The OVS PMD spends up to 30% 
of its cycles in system calls kicking the eventfd. Also the core
running the vCPU is heavily loaded with generating the virtio
interrupts in KVM on the host and handling these interrupts
in the virtio-net driver in the guest. This limits the throughput
to about 500-700 Kpps with a single vCPU.

OVS is trying to address this issue by batching packets to a
vhostuser port for some time to limit the virtio interrupt 
frequency. With a 50 us batching period we have measured an
iperf3 throughput increase of 15% and a PMD utilization
decrease from 45% to 30%.

On the other hand, guests using virtio PMDs do not benefit from
time-based tx batching. Instead they experience a 2-3%
performance penalty and an average latency increase of 
30-40 us. OVS therefore intends to apply time-based tx 
batching only for vhostuser tx queues that need to trigger
virtio interrupts.

Today this information is hidden inside the rte_vhost library
and not accessible to users of the API. This patch adds a
function to the API to query it.

Signed-off-by: Jan Scheurich 

---

v2 -> v3:
Fixed even more white-space errors and warnings
v1 -> v2:
Fixed too long commit lines
Fixed white-space errors and warnings

 lib/librte_vhost/rte_vhost.h | 12 ++++++++++++
 lib/librte_vhost/vhost.c     | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 8c974eb..d62338b 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -444,6 +444,18 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
  */
 uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
 
+/**
+ * Does the virtio driver request interrupts for a vhost tx queue?
+ *
+ * @param vid
+ *  vhost device ID
+ * @param qid
+ *  virtio queue index in mq case
+ * @return
+ *  1 if true, 0 if false
+ */
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0b6aa1c..c6e636e 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -503,3 +503,22 @@ struct virtio_net *
 
return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
 }
+
+int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid)
+{
+   struct virtio_net *dev;
+   struct vhost_virtqueue *vq;
+
+   dev = get_device(vid);
+   if (dev == NULL)
+   return 0;
+
+   vq = dev->virtqueue[qid];
+   if (vq == NULL)
+   return 0;
+
+   if (unlikely(vq->enabled == 0 || vq->avail == NULL))
+   return 0;
+
+   return !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
+}


Re: [dpdk-dev] [PATCH v3] vhost: Expose virtio interrupt need on rte_vhost API

2017-10-04 Thread Jan Scheurich
Friendly reminder: 
Could somebody please have a look at this patch now that the DPDK
summit is over?

Thanks, Jan

> -----Original Message-----
> From: Jan Scheurich
> Sent: Saturday, 23 September, 2017 22:32
> To: 'dev@dpdk.org' 
> Subject: [PATCH v3] vhost: Expose virtio interrupt need on rte_vhost API
> 
> Performance tests with the OVS DPDK datapath have shown
> that the tx throughput over a vhostuser port into a VM with
> an interrupt-based virtio driver is limited by the overhead
> incurred by virtio interrupts. The OVS PMD spends up to 30%
> of its cycles in system calls kicking the eventfd. Also the core
> running the vCPU is heavily loaded with generating the virtio
> interrupts in KVM on the host and handling these interrupts
> in the virtio-net driver in the guest. This limits the throughput
> to about 500-700 Kpps with a single vCPU.
> 
> OVS is trying to address this issue by batching packets to a
> vhostuser port for some time to limit the virtio interrupt
> frequency. With a 50 us batching period we have measured an
> iperf3 throughput increase of 15% and a PMD utilization
> decrease from 45% to 30%.
> 
> On the other hand, guests using virtio PMDs do not benefit from
> time-based tx batching. Instead they experience a 2-3%
> performance penalty and an average latency increase of
> 30-40 us. OVS therefore intends to apply time-based tx
> batching only for vhostuser tx queues that need to trigger
> virtio interrupts.
> 
> Today this information is hidden inside the rte_vhost library
> and not accessible to users of the API. This patch adds a
> function to the API to query it.
> 
> Signed-off-by: Jan Scheurich 
> 
> ---
> 
> v2 -> v3:
>   Fixed even more white-space errors and warnings
> v1 -> v2:
>   Fixed too long commit lines
>   Fixed white-space errors and warnings
> 
>  lib/librte_vhost/rte_vhost.h | 12 ++++++++++++
>  lib/librte_vhost/vhost.c     | 19 +++++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index 8c974eb..d62338b 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -444,6 +444,18 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
>   */
>  uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> 
> +/**
> + * Does the virtio driver request interrupts for a vhost tx queue?
> + *
> + * @param vid
> + *  vhost device ID
> + * @param qid
> + *  virtio queue index in mq case
> + * @return
> + *  1 if true, 0 if false
> + */
> +int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0b6aa1c..c6e636e 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -503,3 +503,22 @@ struct virtio_net *
> 
>   return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
>  }
> +
> +int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid)
> +{
> + struct virtio_net *dev;
> + struct vhost_virtqueue *vq;
> +
> + dev = get_device(vid);
> + if (dev == NULL)
> + return 0;
> +
> + vq = dev->virtqueue[qid];
> + if (vq == NULL)
> + return 0;
> +
> + if (unlikely(vq->enabled == 0 || vq->avail == NULL))
> + return 0;
> +
> + return !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> +}


Re: [dpdk-dev] [PATCH v3] vhost: Expose virtio interrupt need on rte_vhost API

2017-10-16 Thread Jan Scheurich

> Date: Fri, 6 Oct 2017 14:40:15 +0800
> From: Yuanhan Liu 
> To: Jan Scheurich 
> Cc: "'dev@dpdk.org'" 
> Subject: Re: [dpdk-dev] [PATCH v3] vhost: Expose virtio interrupt need
>   on rte_vhost API
> Message-ID: <20171006064015.GD1545@yliu-home>
> Content-Type: text/plain; charset=us-ascii
> 
> On Sat, Sep 23, 2017 at 08:31:37PM +, Jan Scheurich wrote:
> ...
> > +int rte_vhost_tx_interrupt_requested(int vid, uint16_t qid)
> > +{
> > +   struct virtio_net *dev;
> > +   struct vhost_virtqueue *vq;
> > +
> > +   dev = get_device(vid);
> > +   if (dev == NULL)
> > +   return 0;
> > +
> > +   vq = dev->virtqueue[qid];
> > +   if (vq == NULL)
> > +   return 0;
> > +
> > +   if (unlikely(vq->enabled == 0 || vq->avail == NULL))
> > +   return 0;
> > +
> > +   return !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
> 
> Two comments here:
> 
> - as you see, the flag is stored in vq->avail, which resides in
>   shared memory. That means the virtio driver could change the value
>   at any time.
> 
>   That said, this API is not meant to be invoked just once; you would
>   have to invoke it repeatedly, which might be a bit costly.

That is understood.

> - OTOH, you might want to try the "rte_vhost_get_vhost_vring" API, which
>   exposes vq->avail; therefore, the interrupt flag is also exposed.

Thanks for the pointer. I checked, and everything needed to access the
interrupt flag is already publicly available through the existing API. The
proposed new API function is therefore not needed; I will mark the patch as
obsolete.
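
A minimal sketch of that alternative (assuming DPDK >= 17.08, where
rte_vhost_get_vhost_vring() fills a struct rte_vhost_vring that exposes the
avail ring; tx_interrupt_requested() is just an illustrative local helper):

#include <stdint.h>
#include <rte_vhost.h>

#ifndef VRING_AVAIL_F_NO_INTERRUPT
#define VRING_AVAIL_F_NO_INTERRUPT 1  /* per the virtio ring spec */
#endif

static int
tx_interrupt_requested(int vid, uint16_t qid)
{
    struct rte_vhost_vring vring;

    if (rte_vhost_get_vhost_vring(vid, qid, &vring) != 0 ||
        vring.avail == NULL)
        return 0;

    /* The flag sits in guest-shared memory: re-read it before each burst. */
    return !(vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT);
}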

Regards, Jan


Re: [dpdk-dev] [ovs-dev] [PATCH] netdev-dpdk: defer MTU set after interface start

2017-12-13 Thread Jan Scheurich
> > The issue only arises with the qede PMD and 67fe6d635193
> > ("netdev-dpdk: use rte_eth_dev_set_mtu.")
>
> I had some more time to look at this today, but this patch breaks OVS DPDK 
> for existing supported DPDK ports in my testing.
>
> I tested with an XL710 and the MTU fails to apply; the device must be 
> stopped before configuration changes such as MTU can be applied. See the 
> log messages below.
>
> 2017-11-28T17:13:50Z|00086|dpdk|ERR|i40e_dev_mtu_set(): port 0 must be 
> stopped before configuration
> 2017-11-28T17:13:50Z|00087|netdev_dpdk|ERR|Interface dpdk0 MTU (1500) setup 
> error: Device or resource busy
>
> Did you come across this in your testing? I guess you didn't see it with 
> the qede PMD. In my case the DPDK devices simply fail to be added to the 
> bridge.
>
> As is, the patch would not be acceptable, as it breaks existing 
> functionality. It would have to be reworked to handle devices that 
> cannot be reconfigured while busy.
>
> Thanks
> Ian

I fully support the original decision to switch to using rte_eth_dev_set_mtu() 
in OVS. The prior approach of setting max_rx_pkt_len in rte_eth_dev_configure() 
was non-portable, as that field has no well-defined semantics and its relation 
to the MTU size differs for almost every PMD.

I had a look at the qede PMD implementation of rte_eth_dev_set_mtu() in DPDK 
17.05 and 17.11. It assumes that the device is started, because qede_set_mtu() 
unconditionally stops the device before and restarts it after changing the MTU 
value. Given that other PMDs like i40e don't accept the call after start, there 
is no way for OVS to use rte_eth_dev_set_mtu() that works with all drivers.

I think it is a weakness of the rte_ethdev API that it does not clearly specify 
when functions like rte_eth_dev_set_mtu() may be called. From the documentation 
of the -EBUSY error one can guess that the intention was to optionally support 
it while the device is started. Implicitly one could conclude that it MUST be 
supported while the device is stopped. That is logical and also what most PMDs 
do.

I would say the qede PMD should be fixed: it should accept 
rte_eth_dev_set_mtu() at any time between rte_eth_dev_configure() and 
rte_eth_dev_start() (and optionally also after start).
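
A tolerant way for an application to handle both PMD behaviours, assuming a 
PMD that returns -EBUSY when it needs the device stopped first, could look 
roughly like this sketch (error handling reduced to the essentials):

#include <errno.h>
#include <stdint.h>
#include <rte_ethdev.h>

static int
set_mtu_portable(uint16_t port_id, uint16_t mtu)
{
    /* First try on the running device; some PMDs support this. */
    int err = rte_eth_dev_set_mtu(port_id, mtu);

    if (err == -EBUSY) {
        /* This PMD (e.g. i40e) only accepts MTU changes while stopped. */
        rte_eth_dev_stop(port_id);
        err = rte_eth_dev_set_mtu(port_id, mtu);
        rte_eth_dev_start(port_id);  /* restart error handling elided */
    }
    return err;
}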

BR, Jan