[dpdk-dev] [PATCH v4 06/14] vhost: flush vhost enqueue shadow ring by batch

2019-10-08 Thread Marvin Liu
Buffer vhost enqueue shadow ring updates, and flush the shadow ring once the number of buffered descriptors exceeds one batch. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index e50e137ca..18a207fc6 100644
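
The buffering idea in this patch description can be sketched roughly as follows. This is only an illustration under stated assumptions: the types `shadow_elem` and `shadow_ring`, the functions `shadow_enqueue` and `shadow_flush`, and the batch size of 4 are hypothetical and are not taken from the patch itself.

```c
#include <stdint.h>

#define BATCH_SIZE 4u   /* assumed number of descriptors per batch */
#define SHADOW_MAX 64u  /* assumed capacity of the shadow buffer */

/* Hypothetical shadow entry: one used-ring update that is not yet published. */
struct shadow_elem {
	uint16_t id;
	uint32_t len;
};

struct shadow_ring {
	struct shadow_elem elems[SHADOW_MAX];
	uint16_t count;   /* entries buffered and not yet flushed */
	uint32_t flushed; /* total entries flushed so far */
};

/* Publish all buffered entries. A real implementation would write the
 * used descriptors to the ring here; this sketch only counts them. */
static void
shadow_flush(struct shadow_ring *s)
{
	s->flushed += s->count;
	s->count = 0;
}

/* Buffer one update and flush once a full batch has accumulated. */
static void
shadow_enqueue(struct shadow_ring *s, uint16_t id, uint32_t len)
{
	s->elems[s->count].id = id;
	s->elems[s->count].len = len;
	s->count++;
	if (s->count >= BATCH_SIZE)
		shadow_flush(s);
}
```

With these assumptions, three calls to `shadow_enqueue` leave three entries buffered, and the fourth call triggers a single flush of all four.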

[dpdk-dev] [PATCH v4 05/14] vhost: add batch dequeue function

2019-10-08 Thread Marvin Liu
Add batch dequeue function like enqueue function for packed ring. Batch dequeue function will not support chained descriptors; single packet dequeue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index e241436c7..e50e137ca

[dpdk-dev] [PATCH v4 08/14] vhost: buffer vhost dequeue shadow ring

2019-10-08 Thread Marvin Liu
: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 7bf9ff9b7..f62e9ec3f 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -42,6 +42,8 @@ #define PACKED_RX_USED_FLAG (0ULL | VRING_DESC_F_AVAIL | VRING_DESC_F_USED

[dpdk-dev] [PATCH v4 10/14] vhost: optimize enqueue function of packed ring

2019-10-08 Thread Marvin Liu
Optimize vhost device Tx datapath by separating functions. Packets that can be filled into one descriptor will be handled by batch and others will be handled one by one as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 1b0fa2c64

[dpdk-dev] [PATCH v4 07/14] vhost: add flush function for batch enqueue

2019-10-08 Thread Marvin Liu
Flush used flags when batched enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 18a207fc6..7bf9ff9b7 100644 --- a/lib/librte_vhost/vhost.h +++

[dpdk-dev] [PATCH v4 09/14] vhost: split enqueue and dequeue flush functions

2019-10-08 Thread Marvin Liu
Vhost enqueue descriptors are updated by batch number, while vhost dequeue descriptors are buffered. Meanwhile in dequeue function only first descriptor is buffered. Due to these differences, split vhost enqueue and dequeue flush functions. Signed-off-by: Marvin Liu diff --git a/lib

[dpdk-dev] [PATCH v4 11/14] vhost: add batch and single zero dequeue functions

2019-10-08 Thread Marvin Liu
Optimize vhost zero copy dequeue path like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5f2822ba2..deb9d0e39 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1881,6 +1881,141

[dpdk-dev] [PATCH v4 13/14] vhost: check whether disable software pre-fetch

2019-10-08 Thread Marvin Liu
Disable software pre-fetch actions on Skylake and later platforms. Hardware can fetch needed data for vhost; additional software pre-fetch will impact performance. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 30839a001..5f3b42e56 100644

[dpdk-dev] [PATCH v4 14/14] vhost: optimize packed ring dequeue when in-order

2019-10-08 Thread Marvin Liu
When the VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize the dequeue function by updating only the first used descriptor. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 046e497c2..6f28082bc 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v4 12/14] vhost: optimize dequeue function of packed ring

2019-10-08 Thread Marvin Liu
Optimize vhost device Rx datapath by separating functions. Non-chained and direct descriptors will be handled by batch and others will be handled one by one as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index deb9d0e39..56c2080fb

[dpdk-dev] [PATCH v5 00/13] vhost packed ring performance optimization

2019-10-14 Thread Marvin Liu
iated Marvin Liu (13): vhost: add packed ring indexes increasing function vhost: add packed ring single enqueue vhost: try to unroll for each loop vhost: add packed ring batch enqueue vhost: add packed ring single dequeue vhost: add packed ring batch dequeue vhost: flush enqueue updat

[dpdk-dev] [PATCH v5 01/13] vhost: add packed ring indexes increasing function

2019-10-14 Thread Marvin Liu
When vhost is doing [en/de]queue, the vq's local variables last_[used/avail]_idx will be increased. Adding inline functions avoids duplicated code. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5131a97a3..22a3ddc38 100644 --- a/lib/librte_
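
As a rough illustration of the kind of helper this description refers to, the sketch below advances a packed-ring index and toggles a wrap counter when the index passes the ring size. The struct `vq_sketch` and the function name `vq_inc_last_avail` are hypothetical stand-ins and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal stand-in for the virtqueue state used here. */
struct vq_sketch {
	uint16_t size;               /* number of descriptors in the ring */
	uint16_t last_avail_idx;     /* next available descriptor to read */
	bool     avail_wrap_counter; /* flips each time the index wraps */
};

/* Advance last_avail_idx by num, wrapping at the ring size.
 * Assumes num <= size and size <= 32768, so the sum fits in 16 bits. */
static inline void
vq_inc_last_avail(struct vq_sketch *vq, uint16_t num)
{
	vq->last_avail_idx += num;
	if (vq->last_avail_idx >= vq->size) {
		vq->avail_wrap_counter = !vq->avail_wrap_counter;
		vq->last_avail_idx -= vq->size;
	}
}
```

A matching helper for `last_used_idx` would have the same shape, which is presumably why a shared inline function removes duplication.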

[dpdk-dev] [PATCH v5 02/13] vhost: add packed ring single enqueue

2019-10-14 Thread Marvin Liu
Add vhost enqueue function for single packet and meanwhile leave space for the flush used ring function. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 42b662080..142c14e04 100644 --- a/lib/librte_vhost

[dpdk-dev] [PATCH v5 03/13] vhost: try to unroll for each loop

2019-10-14 Thread Marvin Liu
Create macro for adding unroll pragma before for each loop. Batch functions will consist of several small loops which can be optimized by compilers' loop unrolling pragma. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623
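
A minimal sketch of such a macro is shown below, assuming an unroll factor of 4. The macro name `for_each_try_unroll` and the helper `sum_lens` are hypothetical. The pragma spellings are the ones documented by Clang, the Intel compiler, and GCC 8 or later; any other compiler falls back to a plain loop.

```c
#include <stdint.h>

/* Pick the unroll pragma spelling for the compiler in use.
 * Clang is tested first because it also defines __GNUC__. */
#if defined(__clang__)
#define for_each_try_unroll(iter, start, end) _Pragma("unroll 4") \
	for ((iter) = (start); (iter) < (end); (iter)++)
#elif defined(__INTEL_COMPILER)
#define for_each_try_unroll(iter, start, end) _Pragma("unroll (4)") \
	for ((iter) = (start); (iter) < (end); (iter)++)
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#define for_each_try_unroll(iter, start, end) _Pragma("GCC unroll 4") \
	for ((iter) = (start); (iter) < (end); (iter)++)
#else
#define for_each_try_unroll(iter, start, end) \
	for ((iter) = (start); (iter) < (end); (iter)++)
#endif

/* Example of a small loop that the compiler may unroll. */
static inline uint32_t
sum_lens(const uint32_t *lens, uint16_t n)
{
	uint32_t sum = 0;
	uint16_t i;

	for_each_try_unroll(i, 0, n)
		sum += lens[i];
	return sum;
}
```

The pragma is only a hint, so the loop behaves the same whether or not the compiler honors it.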

[dpdk-dev] [PATCH v5 04/13] vhost: add packed ring batch enqueue

2019-10-14 Thread Marvin Liu
Batch enqueue function will first check whether descriptors are cache aligned. It will also check prerequisites in the beginning. Batch enqueue function does not support chained mbufs; single packet enqueue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost
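
The alignment check mentioned here might look roughly like the sketch below. It assumes a 64-byte cache line and a cache-line-aligned ring base; the descriptor layout follows the 16-byte packed descriptor of the virtio 1.1 specification. The names `packed_desc`, `BATCH_SIZE`, `BATCH_MASK` and `batch_is_possible` are hypothetical, and the other prerequisites the patch checks (descriptor flags, buffer lengths, chained mbufs) are not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumed cache line size in bytes */

/* Packed-ring descriptor layout (16 bytes). */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Number of descriptors that fit in one cache line (4 under the assumptions). */
#define BATCH_SIZE (CACHE_LINE_SIZE / sizeof(struct packed_desc))
#define BATCH_MASK (BATCH_SIZE - 1)

/* Return true when a batch starting at avail_idx begins on a cache-line
 * boundary and does not run past the end of the ring. */
static inline bool
batch_is_possible(uint16_t avail_idx, uint16_t ring_size)
{
	if (avail_idx & BATCH_MASK)
		return false;
	if ((uint32_t)avail_idx + BATCH_SIZE > ring_size)
		return false;
	return true;
}
```

When the check fails, the caller would fall back to the single packet path described in the same series.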

[dpdk-dev] [PATCH v5 06/13] vhost: add packed ring batch dequeue

2019-10-14 Thread Marvin Liu
Add batch dequeue function like enqueue function for packed ring. Batch dequeue function will not support chained descriptors; single packet dequeue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 18d01cb19..96bf763b1

[dpdk-dev] [PATCH v5 10/13] vhost: optimize packed ring enqueue

2019-10-14 Thread Marvin Liu
Optimize vhost device packed ring enqueue function by splitting batch and single functions. Packets that can be filled into one desc will be handled by batch and others will be handled by single as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost

[dpdk-dev] [PATCH v5 05/13] vhost: add packed ring single dequeue

2019-10-14 Thread Marvin Liu
Add vhost single packet dequeue function for packed ring and meanwhile leave space for the shadow used ring update function. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index a8130dc06..e1b06c1ce 100644 --- a/lib

[dpdk-dev] [PATCH v5 11/13] vhost: add packed ring zcopy batch and single dequeue

2019-10-14 Thread Marvin Liu
Add vhost packed ring zero copy batch and single dequeue functions like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5cdca9a7f..01d1603e3 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost

[dpdk-dev] [PATCH v5 08/13] vhost: flush batched enqueue descs flags directly

2019-10-14 Thread Marvin Liu
Flush used flags when batched enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index a60b88d89..bf3c30f43 100644 --- a/lib/librte_vhost/vhost.h +++

[dpdk-dev] [PATCH v5 12/13] vhost: optimize packed ring dequeue

2019-10-14 Thread Marvin Liu
Optimize vhost device packed ring dequeue function by splitting batch and single functions. Non-chained and direct descriptors will be handled by batch and others will be handled by single as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost

[dpdk-dev] [PATCH v5 09/13] vhost: buffer packed ring dequeue updates

2019-10-14 Thread Marvin Liu
Buffer as many used ring updates as possible in the vhost dequeue function to coordinate with the virtio driver. To support buffering, a shadow used ring element should contain the descriptor's flags. The first shadowed ring index is recorded for calculating the buffered number. Signed-off-by: Marvin Liu

[dpdk-dev] [PATCH v5 07/13] vhost: flush enqueue updates by batch

2019-10-14 Thread Marvin Liu
Buffer the vhost enqueue shadowed ring flush action until the buffered number exceeds one batch. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 96bf763b1..a60b88d89 100644 --- a/lib/librte_vhost/vhost.h

[dpdk-dev] [PATCH v5 13/13] vhost: optimize packed ring dequeue when in-order

2019-10-14 Thread Marvin Liu
When the VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize the dequeue function by updating only the first used descriptor. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 85ccc02da..88632caff 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v6 00/13] vhost packed ring performance optimization

2019-10-15 Thread Marvin Liu
used ring update when in_order negotiated Marvin Liu (13): vhost: add packed ring indexes increasing function vhost: add packed ring single enqueue vhost: try to unroll for each loop vhost: add packed ring batch enqueue vhost: add packed ring single dequeue vhost: add packed ring batch de

[dpdk-dev] [PATCH v6 02/13] vhost: add packed ring single enqueue

2019-10-15 Thread Marvin Liu
Add vhost enqueue function for single packet and meanwhile leave space for the flush used ring function. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 42b662080..142c14e04 100644 --- a/lib/librte_vhost

[dpdk-dev] [PATCH v6 03/13] vhost: try to unroll for each loop

2019-10-15 Thread Marvin Liu
Create macro for adding unroll pragma before for each loop. Batch functions will consist of several small loops which can be optimized by compilers' loop unrolling pragma. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623

[dpdk-dev] [PATCH v6 01/13] vhost: add packed ring indexes increasing function

2019-10-15 Thread Marvin Liu
When vhost is doing [en/de]queue, the vq's local variables last_[used/avail]_idx will be increased. Adding inline functions avoids duplicated code. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5131a97a3..22a3ddc38 100644 --- a/lib/librte_

[dpdk-dev] [PATCH v6 05/13] vhost: add packed ring single dequeue

2019-10-15 Thread Marvin Liu
Add vhost single packet dequeue function for packed ring and meanwhile leave space for the shadow used ring update function. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index a8130dc06..e1b06c1ce 100644 --- a/lib

[dpdk-dev] [PATCH v6 04/13] vhost: add packed ring batch enqueue

2019-10-15 Thread Marvin Liu
Batch enqueue function will first check whether descriptors are cache aligned. It will also check prerequisites in the beginning. Batch enqueue function does not support chained mbufs; single packet enqueue function will handle them. Signed-off-by: Marvin Liu Reviewed-by: Maxime Coquelin diff

[dpdk-dev] [PATCH v6 06/13] vhost: add packed ring batch dequeue

2019-10-15 Thread Marvin Liu
Add batch dequeue function like enqueue function for packed ring. Batch dequeue function will not support chained descriptors; single packet dequeue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 18d01cb19..96bf763b1

[dpdk-dev] [PATCH v6 07/13] vhost: flush enqueue updates by batch

2019-10-15 Thread Marvin Liu
Buffer the vhost enqueue shadowed ring flush action until the buffered number exceeds one batch. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 96bf763b1..a60b88d89 100644 --- a/lib/librte_vhost/vhost.h

[dpdk-dev] [PATCH v6 09/13] vhost: buffer packed ring dequeue updates

2019-10-15 Thread Marvin Liu
Buffer as many used ring updates as possible in the vhost dequeue function to coordinate with the virtio driver. To support buffering, a shadow used ring element should contain the descriptor's flags. The first shadowed ring index is recorded for calculating the buffered number. Signed-off-by: Marvin Liu

[dpdk-dev] [PATCH v6 08/13] vhost: flush batched enqueue descs directly

2019-10-15 Thread Marvin Liu
Flush used elements when batched enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu Reviewed-by: Gavin Hu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index a60b88d89..bf3c30f43 100644 ---

[dpdk-dev] [PATCH v6 11/13] vhost: add packed ring zcopy batch and single dequeue

2019-10-15 Thread Marvin Liu
Add vhost packed ring zero copy batch and single dequeue functions like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5cdca9a7f..01d1603e3 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost

[dpdk-dev] [PATCH v6 10/13] vhost: optimize packed ring enqueue

2019-10-15 Thread Marvin Liu
Optimize vhost device packed ring enqueue function by splitting batch and single functions. Packets that can be filled into one desc will be handled by batch and others will be handled by single as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost

[dpdk-dev] [PATCH v6 12/13] vhost: optimize packed ring dequeue

2019-10-15 Thread Marvin Liu
Optimize vhost device packed ring dequeue function by splitting batch and single functions. Non-chained and direct descriptors will be handled by batch and others will be handled by single as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost

[dpdk-dev] [PATCH v6 13/13] vhost: optimize packed ring dequeue when in-order

2019-10-15 Thread Marvin Liu
When the VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize the dequeue function by updating only the first used descriptor. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 7c5b4..93ebdd7b6 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH 2/2] net/virtio: on demand cleanup when doing in order xmit

2019-08-26 Thread Marvin Liu
Check whether enough descriptors have been freed before the enqueue operation. If more space is needed, try to clean up the used ring on demand. This gives more chances to clean up the used ring, and thus helps RFC2544 perf. Signed-off-by: Marvin Liu --- drivers/net/virtio/virtio_rxtx.c | 73

[dpdk-dev] [PATCH 1/2] net/virtio: update stats when in order xmit done

2019-08-26 Thread Marvin Liu
When doing xmit in-order enqueue, packets are buffered and then flushed into the avail ring. It is possible that there is no free room in the avail ring, so some buffered packets can't be transmitted. So move the stats update to just after successful avail ring updates. Signed-off-by: Marvin Liu --- driver

[dpdk-dev] [PATCH v1 01/14] vhost: add single packet enqueue function

2019-09-05 Thread Marvin Liu
Add vhost enqueue function for single packet and meanwhile leave space for the flush used ring function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5b85b832d..5ad0a8175 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost

[dpdk-dev] [PATCH v1 03/14] vhost: add single packet dequeue function

2019-09-05 Thread Marvin Liu
Add vhost single packet dequeue function for packed ring and meanwhile leave space for the shadow used ring update function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 51ed20543..454e8b33e 100644 --- a/lib/librte_vhost/virtio_net.c +++ b

[dpdk-dev] [PATCH v1 04/14] vhost: add burst dequeue function

2019-09-05 Thread Marvin Liu
Add burst dequeue function like enqueue function for packed ring. Burst dequeue function will not support chained descriptors; single packet dequeue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index ed8b4aabf..b33f29ba0

[dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function for packed ring

2019-09-05 Thread Marvin Liu
Burst enqueue function will first check whether descriptors are cache aligned. It will also check prerequisites in the beginning. Burst enqueue function does not support chained mbufs; single packet enqueue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b

[dpdk-dev] [PATCH v1 05/14] vhost: rename flush shadow used ring functions

2019-09-05 Thread Marvin Liu
Simplify flush shadow used ring function names as all shadow rings are reflected to used rings. No need to emphasize ring type. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index f34df3733..7116c389d 100644 --- a/lib/librte_vhost

[dpdk-dev] [PATCH v1 00/14] vhost packed ring performance optimization

2019-09-05 Thread Marvin Liu
. Disable software prefetch if hardware can do better. After all these methods are done, single core vhost PvP performance with 64B packets on Xeon 8180 can be boosted by 40%. Marvin Liu (14): vhost: add single packet enqueue function vhost: add burst enqueue function for packed ring vhost: add single

[dpdk-dev] [PATCH v1 09/14] vhost: split enqueue and dequeue flush functions

2019-09-05 Thread Marvin Liu
Vhost enqueue descriptors are updated by burst number, while vhost dequeue descriptors are buffered. Meanwhile in dequeue function only first descriptor is buffered. Due to these differences, split vhost enqueue and dequeue flush functions. Signed-off-by: Marvin Liu diff --git a/lib

[dpdk-dev] [PATCH v1 08/14] vhost: buffer vhost dequeue shadow ring

2019-09-05 Thread Marvin Liu
: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5471acaf7..b161082ca 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -42,6 +42,8 @@ #define VIRTIO_RX_USED_FLAG (0ULL | VRING_DESC_F_AVAIL | VRING_DESC_F_USED

[dpdk-dev] [PATCH v1 07/14] vhost: add flush function for burst enqueue

2019-09-05 Thread Marvin Liu
Flush used flags when burst enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 86552cbeb..5471acaf7 100644 --- a/lib/librte_vhost/vhost.h +++

[dpdk-dev] [PATCH v1 06/14] vhost: flush vhost enqueue shadow ring by burst

2019-09-05 Thread Marvin Liu
Buffer vhost enqueue shadow ring updates, and flush the shadow ring once the number of buffered descriptors exceeds one burst. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index b33f29ba0..86552cbeb 100644

[dpdk-dev] [PATCH v1 13/14] vhost: cache address translation result

2019-09-05 Thread Marvin Liu
Cache address translation result and use it in next translation. Because only a limited number of regions are supported, buffers are most likely in the same region when doing data transmission. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 7fb172912
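
The caching idea described here can be sketched as below, under stated assumptions: `mem_region`, `mem_table`, `in_region` and `translate` are hypothetical names for illustration and do not reproduce the structures in `rte_vhost.h`. The sketch remembers the region that satisfied the previous lookup and tries it first, falling back to a linear scan on a miss.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory region: [guest_addr, guest_addr + size). */
struct mem_region {
	uint64_t guest_addr;
	uint64_t size;
	uint64_t host_addr;
};

struct mem_table {
	const struct mem_region *regions;
	size_t nregions;
	const struct mem_region *last_hit; /* cached result, may be NULL */
};

/* True when [gpa, gpa + len) lies entirely inside region r. */
static inline int
in_region(const struct mem_region *r, uint64_t gpa, uint64_t len)
{
	return gpa >= r->guest_addr && len <= r->size &&
	       gpa - r->guest_addr <= r->size - len;
}

/* Translate a guest address to a host address, or return 0 on failure.
 * The cached region is tried first; a full scan refreshes the cache. */
static uint64_t
translate(struct mem_table *t, uint64_t gpa, uint64_t len)
{
	size_t i;

	if (t->last_hit && in_region(t->last_hit, gpa, len))
		return t->last_hit->host_addr + (gpa - t->last_hit->guest_addr);

	for (i = 0; i < t->nregions; i++) {
		if (in_region(&t->regions[i], gpa, len)) {
			t->last_hit = &t->regions[i];
			return t->regions[i].host_addr +
			       (gpa - t->regions[i].guest_addr);
		}
	}
	return 0;
}
```

If consecutive buffers fall in the same region, as the description expects, most lookups are served by the cached entry without scanning the table.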

[dpdk-dev] [PATCH v1 10/14] vhost: optimize Rx function of packed ring

2019-09-05 Thread Marvin Liu
Optimize vhost device rx function by separating descriptors: non-chained and direct descriptors will be handled by burst and others will be handled one by one as before. Pre-fetch descriptors in next two cache lines as hardware will load two cache line data automatically. Signed-off-by: Marvin Liu

[dpdk-dev] [PATCH v1 11/14] vhost: add burst and single zero dequeue functions

2019-09-05 Thread Marvin Liu
Optimize vhost zero copy dequeue path like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 269ec8a43..8032229a0 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1979,6 +1979,108

[dpdk-dev] [PATCH v1 14/14] vhost: check whether disable software pre-fetch

2019-09-05 Thread Marvin Liu
Disable software pre-fetch actions on Skylake and Cascadelake platforms. Hardware can fetch needed data for vhost; additional software pre-fetch will have an impact on performance. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623e91c0

[dpdk-dev] [PATCH v1 12/14] vhost: optimize Tx function of packed ring

2019-09-05 Thread Marvin Liu
Optimize vhost device tx function like rx function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 8032229a0..554617292 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -302,17 +302,6

[dpdk-dev] [PATCH v2 2/2] net/virtio: on demand cleanup when doing in order xmit

2019-09-10 Thread Marvin Liu
Check whether there is enough space before the burst enqueue operation. If more space is needed, try to clean up used descriptors on demand. This gives more chances to free used descriptors, and thus will help RFC2544 performance. Signed-off-by: Marvin Liu --- drivers/net/virtio/virtio_rxtx.c

[dpdk-dev] [PATCH v2 1/2] net/virtio: update stats when in order xmit done

2019-09-10 Thread Marvin Liu
When doing xmit in-order enqueue, packets are buffered and then flushed into avail ring. Buffered packets can be dropped due to insufficient space. Moving stats update action just after successful avail ring updates can guarantee correctness. Signed-off-by: Marvin Liu --- drivers/net/virtio

[dpdk-dev] [PATCH v3 1/2] net/virtio: update stats when in order xmit done

2019-09-18 Thread Marvin Liu
er Rx and Tx") Cc: sta...@dpdk.org Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 27ead19fb..91df5b1d0 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -106,6 +106,48 @@ vq_ring_free

[dpdk-dev] [PATCH v3 2/2] net/virtio: on demand cleanup when in order xmit

2019-09-18 Thread Marvin Liu
: e5f456a98d3c ("net/virtio: support in-order Rx and Tx") Cc: sta...@dpdk.org Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 91df5b1d0..7d5c60532 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/vir

[dpdk-dev] [PATCH v2 01/16] vhost: add single packet enqueue function

2019-09-19 Thread Marvin Liu
Add vhost enqueue function for single packet and meanwhile leave space for the flush used ring function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5b85b832d..2b5c47145 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost

[dpdk-dev] [PATCH v2 00/16] vhost packed ring performance optimization

2019-09-19 Thread Marvin Liu
NG_SZ - PKT_BURST) - Optimize dequeue used ring update when in_order negotiated Marvin Liu (16): vhost: add single packet enqueue function vhost: unify unroll pragma parameter vhost: add burst enqueue function for packed ring vhost: add single packet dequeue function vhost: add burst de

[dpdk-dev] [PATCH v2 02/16] vhost: unify unroll pragma parameter

2019-09-19 Thread Marvin Liu
Add macro for unifying Clang/ICC/GCC unroll pragma format. Burst functions consist of several small loops which can be optimized by the compiler’s loop unrolling pragma. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623e91c0..30839a001 100644

[dpdk-dev] [PATCH v2 04/16] vhost: add single packet dequeue function

2019-09-19 Thread Marvin Liu
Add vhost single packet dequeue function for packed ring and meanwhile leave space for the shadow used ring update function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index c664b27c5..047fa7dc8 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v2 06/16] vhost: rename flush shadow used ring functions

2019-09-19 Thread Marvin Liu
Simplify flush shadow used ring function names as all shadow rings are reflected to used rings. No need to emphasize ring type. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 23c0f4685..ebd6c175d 100644 --- a/lib/librte_vhost

[dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring

2019-09-19 Thread Marvin Liu
Burst enqueue function will first check whether descriptors are cache aligned. It will also check prerequisites in the beginning. Burst enqueue function does not support chained mbufs; single packet enqueue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b

[dpdk-dev] [PATCH v2 05/16] vhost: add burst dequeue function

2019-09-19 Thread Marvin Liu
Add burst dequeue function like enqueue function for packed ring. Burst dequeue function will not support chained descriptors; single packet dequeue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 67889c80a..9fa3c8adf

[dpdk-dev] [PATCH v2 09/16] vhost: buffer vhost dequeue shadow ring

2019-09-19 Thread Marvin Liu
: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 9c42c7db0..14e87f670 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -42,6 +42,8 @@ #define VIRTIO_RX_USED_FLAG (0ULL | VRING_DESC_F_AVAIL | VRING_DESC_F_USED

[dpdk-dev] [PATCH v2 07/16] vhost: flush vhost enqueue shadow ring by burst

2019-09-19 Thread Marvin Liu
Buffer vhost enqueue shadow ring updates, and flush the shadow ring once the number of buffered descriptors exceeds one burst. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 9fa3c8adf..000648dd4 100644

[dpdk-dev] [PATCH v2 08/16] vhost: add flush function for burst enqueue

2019-09-19 Thread Marvin Liu
Flush used flags when burst enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 000648dd4..9c42c7db0 100644 --- a/lib/librte_vhost/vhost.h +++

[dpdk-dev] [PATCH v2 10/16] vhost: split enqueue and dequeue flush functions

2019-09-19 Thread Marvin Liu
Vhost enqueue descriptors are updated by burst number, while vhost dequeue descriptors are buffered. Meanwhile in dequeue function only first descriptor is buffered. Due to these differences, split vhost enqueue and dequeue flush functions. Signed-off-by: Marvin Liu diff --git a/lib

[dpdk-dev] [PATCH v2 13/16] vhost: optimize dequeue function of packed ring

2019-09-19 Thread Marvin Liu
Optimize vhost device Rx datapath by separating functions. Non-chained and direct descriptors will be handled by burst and others will be handled one by one as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index a8df74f87..066514e43

[dpdk-dev] [PATCH v2 11/16] vhost: optimize enqueue function of packed ring

2019-09-19 Thread Marvin Liu
Optimize vhost device Tx datapath by separating functions. Packets that can be filled into one descriptor will be handled by burst and others will be handled one by one as before. Pre-fetch descriptors in next two cache lines as hardware will load two cache line data automatically. Signed-off-by: Marvin

[dpdk-dev] [PATCH v2 16/16] vhost: optimize packed ring dequeue when in-order

2019-09-19 Thread Marvin Liu
When the VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize the dequeue function by updating only the first used descriptor. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 357517cdd..a7bb4ec79 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v2 14/16] vhost: cache address translation result

2019-09-19 Thread Marvin Liu
Cache address translation result and use it in next translation. Because only a limited number of regions are supported, buffers are most likely in the same region when doing data transmission. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 7fb172912

[dpdk-dev] [PATCH v2 12/16] vhost: add burst and single zero dequeue functions

2019-09-19 Thread Marvin Liu
Optimize vhost zero copy dequeue path like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 2418b4e45..a8df74f87 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1909,6 +1909,144

[dpdk-dev] [PATCH v2 15/16] vhost: check whether disable software pre-fetch

2019-09-19 Thread Marvin Liu
Disable software pre-fetch actions on Skylake and Cascadelake platforms. Hardware can fetch needed data for vhost; additional software pre-fetch will have an impact on performance. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 30839a001

[dpdk-dev] [PATCH] net/virtio: fix mbuf data and pkt length mismatch

2019-09-22 Thread Marvin Liu
et/virtio: optimize Tx enqueue for packed ring") Fixes: e5f456a98d3c ("net/virtio: support in-order Rx and Tx") Cc: sta...@dpdk.org Reported-by: Stephen Hemminger Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index 27ead19

[dpdk-dev] [PATCH v3 00/15] vhost packed ring performance optimization

2019-09-25 Thread Marvin Liu
when in_order negotiated Marvin Liu (15): vhost: add single packet enqueue function vhost: unify unroll pragma parameter vhost: add batch enqueue function for packed ring vhost: add single packet dequeue function vhost: add batch dequeue function vhost: flush vhost enqueue shadow ring by

[dpdk-dev] [PATCH v3 01/15] vhost: add single packet enqueue function

2019-09-25 Thread Marvin Liu
Add vhost enqueue function for single packet and meanwhile leave space for the flush used ring function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5b85b832d..520c4c6a8 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost

[dpdk-dev] [PATCH v3 02/15] vhost: unify unroll pragma parameter

2019-09-25 Thread Marvin Liu
Add macro for unifying Clang/ICC/GCC unroll pragma format. Batch functions consist of several small loops which can be optimized by the compiler’s loop unrolling pragma. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 8623e91c0..30839a001 100644

[dpdk-dev] [PATCH v3 04/15] vhost: add single packet dequeue function

2019-09-25 Thread Marvin Liu
Add vhost single packet dequeue function for packed ring and meanwhile leave space for the shadow used ring update function. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5e08f7d9b..17aabe8eb 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v3 03/15] vhost: add batch enqueue function for packed ring

2019-09-25 Thread Marvin Liu
Batch enqueue function will first check whether descriptors are cache aligned. It will also check prerequisites in the beginning. Batch enqueue function does not support chained mbufs; single packet enqueue function will handle them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b

[dpdk-dev] [PATCH v3 05/15] vhost: add batch dequeue function

2019-09-25 Thread Marvin Liu
Add a batch dequeue function for packed ring, mirroring the batch enqueue function. The batch dequeue function does not support chained descriptors; the single packet dequeue function handles them. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index e241436c7..e50e137ca

[dpdk-dev] [PATCH v3 07/15] vhost: add flush function for batch enqueue

2019-09-25 Thread Marvin Liu
Flush used flags when batched enqueue function is finished. Descriptor's flags are pre-calculated as they will be reset by vhost. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 18a207fc6..7bf9ff9b7 100644 --- a/lib/librte_vhost/vhost.h +++

[dpdk-dev] [PATCH v3 06/15] vhost: flush vhost enqueue shadow ring by batch

2019-09-25 Thread Marvin Liu
Buffer vhost enqueue shadow ring updates and flush the shadow ring only once the number of buffered descriptors exceeds one batch. Thus virtio can receive packets at a faster frequency. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index e50e137ca..18a207fc6 100644
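The buffering idea in the commit message above can be sketched roughly as follows. This is a toy model, not the vhost code: the structure `toy_vq`, its fields, the functions `toy_flush_shadow` and `toy_enqueue_used`, the batch size of 4 and the `>=` threshold comparison are all assumptions for illustration. A real implementation would write the buffered entries into the used descriptors with the required memory barriers at the point marked in the comment.

```c
#include <stdint.h>

#define TOY_BATCH_SIZE 4  /* hypothetical number of descriptors per batch */
#define TOY_SHADOW_MAX 64 /* hypothetical shadow ring capacity */

struct toy_used_elem {
	uint16_t id;
	uint32_t len;
};

struct toy_vq {
	struct toy_used_elem shadow[TOY_SHADOW_MAX];
	uint16_t shadow_count;  /* entries buffered but not yet flushed */
	uint32_t flushes;       /* how many times the shadow ring was flushed */
	uint32_t flushed_elems; /* total entries made visible to the guest */
};

/* Publish every buffered entry in one go, then empty the buffer. */
static void
toy_flush_shadow(struct toy_vq *vq)
{
	if (vq->shadow_count == 0)
		return;

	/* Real code would update the used descriptors and their flags here. */
	vq->flushed_elems += vq->shadow_count;
	vq->flushes++;
	vq->shadow_count = 0;
}

/* Record one used entry; flush only once a full batch has accumulated. */
static void
toy_enqueue_used(struct toy_vq *vq, uint16_t id, uint32_t len)
{
	vq->shadow[vq->shadow_count].id = id;
	vq->shadow[vq->shadow_count].len = len;
	vq->shadow_count++;

	if (vq->shadow_count >= TOY_BATCH_SIZE)
		toy_flush_shadow(vq);
}
```

In this model, six single-packet enqueues trigger one flush of four entries and leave two entries buffered for a later flush.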

[dpdk-dev] [PATCH v3 08/15] vhost: buffer vhost dequeue shadow ring

2019-09-25 Thread Marvin Liu
: Marvin Liu diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 7bf9ff9b7..f62e9ec3f 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -42,6 +42,8 @@ #define PACKED_RX_USED_FLAG (0ULL | VRING_DESC_F_AVAIL | VRING_DESC_F_USED

[dpdk-dev] [PATCH v3 09/15] vhost: split enqueue and dequeue flush functions

2019-09-25 Thread Marvin Liu
Vhost enqueue descriptors are updated by batch number, while vhost dequeue descriptors are buffered, and in the dequeue function only the first descriptor is buffered. Because of these differences, split the vhost enqueue and dequeue flush functions. Signed-off-by: Marvin Liu diff --git a/lib

[dpdk-dev] [PATCH v3 10/15] vhost: optimize enqueue function of packed ring

2019-09-25 Thread Marvin Liu
Optimize the vhost device Tx datapath by separating functions. Packets that fit into one descriptor are handled by batch; others are handled one by one as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 1b0fa2c64

[dpdk-dev] [PATCH v3 12/15] vhost: optimize dequeue function of packed ring

2019-09-25 Thread Marvin Liu
Optimize the vhost device Rx datapath by separating functions. Non-chained, direct descriptors are handled by batch; others are handled one by one as before. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 9ab95763a..20624efdc

[dpdk-dev] [PATCH v3 13/15] vhost: cache address translation result

2019-09-25 Thread Marvin Liu
Cache the address translation result and use it in the next translation. Because only a limited number of regions are supported, buffers are most likely in the same region during data transmission. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 7fb172912
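One way to picture the caching described above is a lookup that tries the region which served the previous translation before walking the whole region table. The sketch below is illustrative only: `toy_region`, `toy_mem`, `toy_region_contains` and `toy_gpa_to_hva` are hypothetical names, and returning 0 for an untranslatable address is an assumption of this example, not something the snippet states.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_region {
	uint64_t gpa_start; /* guest physical start address */
	uint64_t size;      /* region length in bytes */
	uint64_t hva_start; /* host virtual address mapped to gpa_start */
};

struct toy_mem {
	const struct toy_region *regions;
	size_t nregions;
	size_t cached;       /* index of the region that served the last hit */
	uint32_t cache_hits; /* statistics only, to make the effect visible */
};

/* True if [gpa, gpa + len) lies entirely inside the region. */
static int
toy_region_contains(const struct toy_region *r, uint64_t gpa, uint64_t len)
{
	return gpa >= r->gpa_start && len <= r->size &&
		gpa - r->gpa_start <= r->size - len;
}

/* Translate a guest physical address; returns 0 if no region matches. */
static uint64_t
toy_gpa_to_hva(struct toy_mem *m, uint64_t gpa, uint64_t len)
{
	const struct toy_region *r;
	size_t i;

	/* Fast path: retry the region used by the previous translation. */
	if (m->cached < m->nregions) {
		r = &m->regions[m->cached];
		if (toy_region_contains(r, gpa, len)) {
			m->cache_hits++;
			return r->hva_start + (gpa - r->gpa_start);
		}
	}

	/* Slow path: scan every region and remember the one that matched. */
	for (i = 0; i < m->nregions; i++) {
		r = &m->regions[i];
		if (toy_region_contains(r, gpa, len)) {
			m->cached = i;
			return r->hva_start + (gpa - r->gpa_start);
		}
	}

	return 0;
}
```

The fast path pays off when consecutive buffers fall in the same region, which is the situation the commit message expects to be the common case.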

[dpdk-dev] [PATCH v3 11/15] vhost: add batch and single zero dequeue functions

2019-09-25 Thread Marvin Liu
Optimize vhost zero copy dequeue path like normal dequeue path. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index c485e7f49..9ab95763a 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1881,6 +1881,141

[dpdk-dev] [PATCH v3 15/15] vhost: optimize packed ring dequeue when in-order

2019-09-25 Thread Marvin Liu
When the VIRTIO_F_IN_ORDER feature is negotiated, vhost can optimize the dequeue function by updating only the first used descriptor. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index e3872e384..1e113fb3a 100644 --- a/lib/librte_vhost/virtio_net.c

[dpdk-dev] [PATCH v3 14/15] vhost: check whether disable software pre-fetch

2019-09-25 Thread Marvin Liu
Disable software pre-fetch actions on Skylake and later platforms. Hardware can fetch the data vhost needs, so additional software pre-fetch will impact performance. Signed-off-by: Marvin Liu diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile index 30839a001..5f3b42e56 100644

[dpdk-dev] [PATCH] net/virtio: enable packet data prefetch on x86

2020-11-11 Thread Marvin Liu
Data prefetch instructions can preload data into the CPU’s hierarchical cache before the data is accessed. The virtio datapath uses this feature to accelerate data access. As the config option RTE_PMD_PACKET_PREFETCH was discarded, packet data prefetch is now enabled based on architecture. Signed-off-by: Marvin Liu
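As a rough illustration of enabling a prefetch hint per architecture, the sketch below defines a macro that expands to the GCC/Clang builtin `__builtin_prefetch` on x86 and to a no-op elsewhere. The names `toy_packet_prefetch` and `toy_sum_first_bytes` are hypothetical, and the architecture test via compiler predefined macros is an assumption; the snippet above does not show which macros or prefetch variant the patch uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Enable the prefetch hint only on x86 builds with a GCC-compatible compiler. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define toy_packet_prefetch(p) __builtin_prefetch(p)
#else
#define toy_packet_prefetch(p) do { (void)(p); } while (0)
#endif

/*
 * Touch the first byte of each packet, hinting the next packet's data
 * into cache while the current one is being processed.
 */
static uint32_t
toy_sum_first_bytes(const uint8_t *const pkts[], size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + 1 < n)
			toy_packet_prefetch(pkts[i + 1]);
		sum += pkts[i][0];
	}

	return sum;
}
```

A prefetch is purely a performance hint, so the result of the loop is the same on every architecture.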

[dpdk-dev] [PATCH] net/virtio: fix invalid indirect desc length

2020-10-14 Thread Marvin Liu
mbuf segments number for calculating correct desc length. Fixes: de8b3d238074 ("net/virtio: fix indirect descs in packed datapaths") Cc: sta...@dpdk.org Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h index 8c8ab9889..42c4c9882 10

[dpdk-dev] [PATCH v2] config: enable packet data prefetch

2020-09-22 Thread Marvin Liu
Data prefetch instructions can preload data into the CPU’s hierarchical cache before the data is accessed. Virtualized data paths like virtio use this feature for acceleration. Since most modern CPUs support prefetching, packet data prefetch can be enabled by default. Signed-off-by: Marvin Liu

[dpdk-dev] [PATCH 2/2] net/virtio: use indirect ring in packed datapath

2020-09-28 Thread Marvin Liu
Like split ring, packed ring will use indirect ring elements when queued mbufs need multiple descriptors. Thus each packet takes only one slot even when it has multiple segments. Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index

[dpdk-dev] [PATCH 1/2] net/virtio: setup Tx region for packed ring

2020-09-28 Thread Marvin Liu
Add packed indirect descriptors format into virtio Tx region. When initializing vring, packed indirect descriptors will be initialized if ring type is packed. Signed-off-by: Marvin Liu diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 013a2904e

[dpdk-dev] [PATCH v3 1/5] vhost: add vectorized data path

2020-10-09 Thread Marvin Liu
requirements. Otherwise will fallback to original data path. Signed-off-by: Marvin Liu diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst index d36f3120b..efdaf4de0 100644 --- a/doc/guides/nics/vhost.rst +++ b/doc/guides/nics/vhost.rst @@ -64,6 +64,11 @@ The user can specify

[dpdk-dev] [PATCH v3 0/5] vhost add vectorized data path

2020-10-09 Thread Marvin Liu
* dynamically allocate memory regions structure * remove unlikely hint for in_order v2: * add vIOMMU support * add dequeue offloading * rebase code Marvin Liu (5): vhost: add vectorized data path vhost: reuse packed ring functions vhost: prepare memory regions addresses vhost: add packed ring
