[dpdk-dev] [PATCH 1/4] regex/mlx5: fix size of setup constants

2021-06-01 Thread Michael Baum
The constant representing the metadata size is defined as an unsigned
32-bit integer. Similarly, the constant representing the maximal output
size is also defined as an unsigned 32-bit integer.

Expressions using these constants can overflow when they are evaluated
in 32-bit arithmetic and then used in a context that expects a size_t
(64 bits, unsigned).

Change the size of the above constants to 64-bit.

Fixes: 30d604bb1504 ("regex/mlx5: fix type of setup constants")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/regex/mlx5/mlx5_regex_fastpath.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/regex/mlx5/mlx5_regex_fastpath.c 
b/drivers/regex/mlx5/mlx5_regex_fastpath.c
index b57e7d7794..3ef5e6c1eb 100644
--- a/drivers/regex/mlx5/mlx5_regex_fastpath.c
+++ b/drivers/regex/mlx5/mlx5_regex_fastpath.c
@@ -25,8 +25,8 @@
 #include "mlx5_regex.h"
 
 #define MLX5_REGEX_MAX_WQE_INDEX 0x
-#define MLX5_REGEX_METADATA_SIZE UINT32_C(64)
-#define MLX5_REGEX_MAX_OUTPUT RTE_BIT32(11)
+#define MLX5_REGEX_METADATA_SIZE UINT64_C(64)
+#define MLX5_REGEX_MAX_OUTPUT RTE_BIT64(11)
 #define MLX5_REGEX_WQE_CTRL_OFFSET 12
 #define MLX5_REGEX_WQE_METADATA_OFFSET 16
 #define MLX5_REGEX_WQE_GATHER_OFFSET 32
-- 
2.25.1



[dpdk-dev] [PATCH 2/4] compress/mlx5: fix constant size in QP creation

2021-06-01 Thread Michael Baum
The mlx5_compress_qp_setup function shifts the numeric constant 1 and
passes the result as a parameter to the rte_calloc function.

The rte_calloc function expects a size_t (64 bits, unsigned) but instead
receives a 32-bit value, because the numeric constant is only 32 bits
wide. If the shift count is 32 or more, the value is lost even though
the function can accept a 64-bit argument.

Change the size of the numeric constant 1 to 64-bit.

Fixes: 8619fcd5161b ("compress/mlx5: support queue pair operations")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/compress/mlx5/mlx5_compress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/compress/mlx5/mlx5_compress.c 
b/drivers/compress/mlx5/mlx5_compress.c
index 80c564f10b..90d009c56b 100644
--- a/drivers/compress/mlx5/mlx5_compress.c
+++ b/drivers/compress/mlx5/mlx5_compress.c
@@ -209,7 +209,7 @@ mlx5_compress_qp_setup(struct rte_compressdev *dev, 
uint16_t qp_id,
return -rte_errno;
}
dev->data->queue_pairs[qp_id] = qp;
-   opaq_buf = rte_calloc(__func__, 1u << log_ops_n,
+   opaq_buf = rte_calloc(__func__, RTE_BIT64(log_ops_n),
  sizeof(struct mlx5_gga_compress_opaque),
  sizeof(struct mlx5_gga_compress_opaque));
if (opaq_buf == NULL) {
-- 
2.25.1



[dpdk-dev] [PATCH 3/4] vdpa/mlx5: fix constant type in QP creation

2021-06-01 Thread Michael Baum
The mlx5_vdpa_event_qp_create function shifts the numeric constant 1,
multiplies it by another constant, and finally assigns the result to a
uint64_t variable.

The numeric constant is a signed 32-bit int. If the shift sets its MSB
(the sign bit), sign extension makes the uint64_t variable receive a
different value than the function intended.

Define the numeric constant 1 as uint64_t in the first place.

Fixes: 8395927cdfaf ("vdpa/mlx5: prepare HW queues")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/vdpa/mlx5/mlx5_vdpa_event.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_event.c 
b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
index 88f6a4256d..3541c652ce 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_event.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_event.c
@@ -629,8 +629,8 @@ mlx5_vdpa_event_qp_create(struct mlx5_vdpa_priv *priv, 
uint16_t desc_n,
attr.wq_umem_id = eqp->umem_obj->umem_id;
attr.wq_umem_offset = 0;
attr.dbr_umem_id = eqp->umem_obj->umem_id;
-   attr.dbr_address = (1 << log_desc_n) * MLX5_WSEG_SIZE;
attr.ts_format = mlx5_ts_format_conv(priv->qp_ts_format);
+   attr.dbr_address = RTE_BIT64(log_desc_n) * MLX5_WSEG_SIZE;
eqp->sw_qp = mlx5_devx_cmd_create_qp(priv->ctx, &attr);
if (!eqp->sw_qp) {
DRV_LOG(ERR, "Failed to create SW QP(%u).", rte_errno);
-- 
2.25.1



[dpdk-dev] [PATCH 4/4] net/mlx5: fix constant type in MP allocation

2021-06-01 Thread Michael Baum
The mlx5_mprq_alloc_mp function shifts the numeric constant 1 and passes
the result as a parameter to the rte_mempool_create function.

The rte_mempool_create function expects a void pointer (64 bits,
uintptr_t) but instead receives a 32-bit value, because the numeric
constant is only 32 bits wide. If the shift count is 32 or more, the
value is lost even though the function can accept a 64-bit argument.

Change the size of the numeric constant 1 to 64-bit.

Fixes: 3a22f3877c9d ("net/mlx5: replace external mbuf shared memory")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/net/mlx5/mlx5_rxq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index bb9a908087..950f327f03 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1240,7 +1240,7 @@ mlx5_mprq_alloc_mp(struct rte_eth_dev *dev)
snprintf(name, sizeof(name), "port-%u-mprq", dev->data->port_id);
mp = rte_mempool_create(name, obj_num, obj_size, MLX5_MPRQ_MP_CACHE_SZ,
0, NULL, NULL, mlx5_mprq_buf_init,
-   (void *)(uintptr_t)(1 << strd_num_n),
+   (void *)(uintptr_t)RTE_BIT64(strd_num_n),
dev->device->numa_node, 0);
if (mp == NULL) {
DRV_LOG(ERR,
-- 
2.25.1



Re: [dpdk-dev] [PATCH] vfio: fix stdbool usage without include

2021-06-01 Thread Thomas Monjalon
01/06/2021 07:42, Christian Ehrhardt:
> This became visible by backporting the following for the 19.11 stable tree:
>  c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"
> 
> The usage of type bool in the vfio code would require "#include
> <stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
> It happens that in eal_vfio_mp_sync.c it comes after "#include
> <rte_log.h>".
> 
> And rte_log.h since 20.05 includes stdbool since this change:
>  241e67bfe "log: add API to check if a logtype can log in a given level"
> and thereby masks the issue in >20.05.
> 
> It should be safe to include stdbool.h from rte_vfio.h itself
> to have bool present exactly when needed for the struct it defines
> using that type.

A line "Fixes" is missing for the record of the root cause.





Re: [dpdk-dev] [PATCH 09/28] raw/cnxk_bphy: add bphy cgx/rpm skeleton driver

2021-06-01 Thread Thomas Monjalon
31/05/2021 23:41, Tomasz Duszynski:
> Add baseband phy cgx/rpm skeleton driver. At this point
> it merely probes a matching device.
> 
> Signed-off-by: Tomasz Duszynski 
> Signed-off-by: Jakub Palider 

For the second version, please pay attention to details
like sorting things alphabetically, blank lines,
underlining of correct size, etc. Thanks.




[dpdk-dev] Problem while running dpdk

2021-06-01 Thread Raunak Laddha
Hello,
I am using dpdk 20.11.1. I tried to use your resources to build a custom 
application that uses dpdk. I compiled my app with both meson and a makefile to 
see if I get different results for my issue. I used pkg-config to load the 
cflags and ldflags in my application, and I am linking dpdk as a static library.
Problem I am facing:
No buses are loaded. I can see that RTE_INIT_PRIO is used as a constructor to 
register the buses, but in my application the buses are not registered.
I tried running the dpdk app named test-pipeline to check whether it works, and 
it does in that case. But the same application built with my makefile or meson 
file does not work.
The first call in my program is rte_eal_init(argc, argv) (the init call of dpdk).
My app compiles, and I verified the cflags and ldflags to check that the dpdk 
flags are added.
My assumption is that whatever RTE_INIT_PRIO registers is loaded correctly in 
the dpdk test-pipeline app but not in my custom app.
Is there any config I am missing?
I have attached the makefile and meson file.

Thanks,
Raunak
#include <rte_eal.h>

int main(int argc, char **argv) {
	if (rte_eal_init(argc, argv) < 0)
		return -1;
	return 0;
}

Re: [dpdk-dev] [PATCH v2] vhost/vhost_crypto: do not use possibly NULL Pointers

2021-06-01 Thread Maxime Coquelin
Hi Thierry,

On 5/24/21 11:08 AM, Thierry Herbelot wrote:
> Use vc_req only after it was checked not to be NULL.
> 
> Fixes: 2d962bb736521 ("vhost/crypto: fix possible TOCTOU attack")
> Cc: sta...@dpdk.org
> Cc: Maxime Coquelin 
> Cc: Chenbo Xia 
> 
> Signed-off-by: Thierry Herbelot 
> --
> V2: fix copy/paste typo
> ---
>  lib/vhost/vhost_crypto.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/vhost/vhost_crypto.c b/lib/vhost/vhost_crypto.c
> index 6689c52df239..926b5c0bd94a 100644
> --- a/lib/vhost/vhost_crypto.c
> +++ b/lib/vhost/vhost_crypto.c
> @@ -1337,13 +1337,15 @@ vhost_crypto_finalize_one_request(struct 
> rte_crypto_op *op,
>   struct rte_mbuf *m_src = op->sym->m_src;
>   struct rte_mbuf *m_dst = op->sym->m_dst;
>   struct vhost_crypto_data_req *vc_req = rte_mbuf_to_priv(m_src);
> - struct vhost_virtqueue *vq = vc_req->vq;
> - uint16_t used_idx = vc_req->desc_idx, desc_idx;
> + struct vhost_virtqueue *vq;
> + uint16_t used_idx, desc_idx;
>  
>   if (unlikely(!vc_req)) {
>   VC_LOG_ERR("Failed to retrieve vc_req");
>   return NULL;
>   }
> + vq = vc_req->vq;
> + used_idx = vc_req->desc_idx;
>  
>   if (old_vq && (vq != old_vq))
>   return vq;
> 

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



[dpdk-dev] [PATCH v2] vfio: fix stdbool usage without include

2021-06-01 Thread Christian Ehrhardt
This became visible by backporting the following for the 19.11 stable tree:
 c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"

The usage of type bool in the vfio code would require "#include
<stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
It happens that in eal_vfio_mp_sync.c it comes after "#include
<rte_log.h>".

And rte_log.h since 20.05 includes stdbool since this change:
 241e67bfe "log: add API to check if a logtype can log in a given level"
and thereby mitigates the issue.

It should be safe to include stdbool.h from rte_vfio.h itself
to be present exactly when needed for the struct it defines using that
type.

Fixes: c13ca4e81cac ("vfio: fix DMA mapping granularity for IOVA as VA")

Signed-off-by: Christian Ehrhardt 
---
 lib/eal/include/rte_vfio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index e7a87454bea..2d90b364801 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -14,6 +14,7 @@
 extern "C" {
 #endif
 
+#include <stdbool.h>
 #include <stdint.h>
 
 /*
-- 
2.31.1



Re: [dpdk-dev] [PATCH] vfio: fix stdbool usage without include

2021-06-01 Thread Christian Ehrhardt
On Tue, Jun 1, 2021 at 9:25 AM Thomas Monjalon  wrote:
>
> 01/06/2021 07:42, Christian Ehrhardt:
> > This became visible by backporting the following for the 19.11 stable tree:
> >  c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"
> >
> > The usage of type bool in the vfio code would require "#include
> > <stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
> > It happens that in eal_vfio_mp_sync.c it comes after "#include
> > <rte_log.h>".
> >
> > And rte_log.h since 20.05 includes stdbool since this change:
> >  241e67bfe "log: add API to check if a logtype can log in a given level"
> > and thereby masks the issue in >20.05.
> >
> > It should be safe to include stdbool.h from rte_vfio.h itself
> > to have bool present exactly when needed for the struct it defines
> > using that type.
>
> A line "Fixes" is missing for the record of the root cause.

Thanks Thomas for having a look,
it is somewhat up for debate what exactly the root cause is here, but I
think c13ca4e81cac, which introduced the use of bool without adding the
header, is the right reference.

I'll send a v2 with that added

>
>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd


Re: [dpdk-dev] [PATCH] net/virtio: fix kernel set features for multi-queue devices

2021-06-01 Thread Maxime Coquelin



On 5/28/21 3:20 PM, Thierry Herbelot wrote:
> Restore the original code, where VHOST_SET_FEATURES is applied to
> all vhostfds of the device.
> 
> Fixes: cc0151b34dee ("net/virtio: add virtio-user features ops")
> Cc: sta...@dpdk.org
> Cc: Maxime Coquelin 
> Cc: Chenbo Xia 
> 
> Signed-off-by: Thierry Herbelot 
> ---
>  drivers/net/virtio/virtio_user/vhost_kernel.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
> b/drivers/net/virtio/virtio_user/vhost_kernel.c
> index ad46f10a9300..d65f89e1fc16 100644
> --- a/drivers/net/virtio/virtio_user/vhost_kernel.c
> +++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
> @@ -158,6 +158,8 @@ static int
>  vhost_kernel_set_features(struct virtio_user_dev *dev, uint64_t features)
>  {
>   struct vhost_kernel_data *data = dev->backend_data;
> + uint32_t i;
> + int ret;
>  
>   /* We don't need memory protection here */
>   features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
> @@ -166,7 +168,16 @@ vhost_kernel_set_features(struct virtio_user_dev *dev, 
> uint64_t features)
>   features &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
>   features &= ~(1ULL << VIRTIO_NET_F_MQ);
>  
> - return vhost_kernel_ioctl(data->vhostfds[0], VHOST_SET_FEATURES, 
> &features);
> + for (i = 0; i < dev->max_queue_pairs; ++i) {
> + if (data->vhostfds[i] < 0)
> + continue;
> +
> + ret = vhost_kernel_ioctl(data->vhostfds[i], VHOST_SET_FEATURES, 
> &features);
> + if (ret < 0)
> + return ret;
> + }
> +
> + return 0;
>  }
>  
>  static int
> 

Thanks for fixing it, it should be the last one...

Except GET_FEATURES, which was also queried for every queue pair, but I
don't think it makes sense to query it and just drop the value read.
What do you think?

Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [dpdk-dev] [PATCH] net/virtio: fix kernel set features for multi-queue devices

2021-06-01 Thread Thierry Herbelot

Hello Maxime,

On 6/1/21 9:51 AM, Maxime Coquelin wrote:



On 5/28/21 3:20 PM, Thierry Herbelot wrote:

Restore the original code, where VHOST_SET_FEATURES is applied to
all vhostfds of the device.

Fixes: cc0151b34dee ("net/virtio: add virtio-user features ops")
Cc: sta...@dpdk.org
Cc: Maxime Coquelin 
Cc: Chenbo Xia 

Signed-off-by: Thierry Herbelot 
---
  drivers/net/virtio/virtio_user/vhost_kernel.c | 13 -
  1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c 
b/drivers/net/virtio/virtio_user/vhost_kernel.c
index ad46f10a9300..d65f89e1fc16 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -158,6 +158,8 @@ static int
  vhost_kernel_set_features(struct virtio_user_dev *dev, uint64_t features)
  {
struct vhost_kernel_data *data = dev->backend_data;
+   uint32_t i;
+   int ret;
  
  	/* We don't need memory protection here */

features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
@@ -166,7 +168,16 @@ vhost_kernel_set_features(struct virtio_user_dev *dev, 
uint64_t features)
features &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
features &= ~(1ULL << VIRTIO_NET_F_MQ);
  
-	return vhost_kernel_ioctl(data->vhostfds[0], VHOST_SET_FEATURES, &features);

+   for (i = 0; i < dev->max_queue_pairs; ++i) {
+   if (data->vhostfds[i] < 0)
+   continue;
+
+   ret = vhost_kernel_ioctl(data->vhostfds[i], VHOST_SET_FEATURES, 
&features);
+   if (ret < 0)
+   return ret;
+   }
+
+   return 0;
  }
  
  static int




Thanks for fixing it, it should be the last one...

Except GET_FEATURES that was also queried for every queue pair, but I
don't think it makes sense to query it and just drop the value read.
What do you think?


Indeed, GET_FEATURES returns a single value: let's assume the value for 
the first queue pair is the right one.


Thierry



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



--
Thierry Herbelot
Senior Software Engineer
Tel: +33 1 39 30 92 61
http://www.6wind.com/

Follow us:
https://www.linkedin.com/company/6wind/
https://twitter.com/6WINDsoftware
https://www.youtube.com/user/6windsoftware


[dpdk-dev] 20.11.2 patches review and test

2021-06-01 Thread Xueming(Steven) Li
Hi all,

Here is a list of patches targeted for stable release 20.11.2.

The planned date for the final release is 15th June.

Please help with testing and validation of your use cases and report
any issues/results with reply-all to this mail. For the final release
the fixes and reported validations will be added to the release notes.

A release candidate tarball can be found at:

https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc1

These patches are located at branch 20.11 of dpdk-stable repo:
https://dpdk.org/browse/dpdk-stable/


Thanks.

Xueming Li 

---
Ajit Khaparde (3):
  net/bnxt: fix RSS context cleanup
  net/bnxt: check kvargs parsing
  net/bnxt: fix resource cleanup

Alvin Zhang (7):
  net/ice: fix VLAN filter with PF
  net/i40e: fix input set field mask
  net/igc: fix Rx RSS hash offload capability
  net/igc: fix Rx error counter for bad length
  net/e1000: fix Rx error counter for bad length
  net/e1000: fix max Rx packet size
  net/igc: fix Rx packet size

Anatoly Burakov (2):
  fbarray: fix log message on truncation error
  power: do not skip saving original P-state governor

Andrew Boyer (1):
  net/ionic: fix completion type in lif init

Andrew Rybchenko (3):
  net/failsafe: fix RSS hash offload reporting
  net/failsafe: report minimum and maximum MTU
  common/sfc_efx: remove GENEVE from supported tunnels

Ankur Dwivedi (1):
  crypto/octeontx: fix session-less mode

Apeksha Gupta (1):
  examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
  crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (1):
  net/i40evf: fix packet loss for X722

Bruce Richardson (1):
  build: exclude meson files from examples installation

Chenbo Xia (1):
  examples/vhost: check memory table query

Chengchang Tang (15):
  net/hns3: fix HW buffer size on MTU update
  net/hns3: fix processing Tx offload flags
  net/hns3: fix Tx checksum for UDP packets with special port
  net/hns3: fix long task queue pairs reset time
  ethdev: validate input in module EEPROM dump
  ethdev: validate input in register info
  ethdev: validate input in EEPROM info
  net/hns3: fix rollback after setting PVID failure
  net/hns3: fix timing in resetting queues
  net/hns3: fix queue state when concurrent with reset
  net/hns3: fix configure FEC when concurrent with reset
  net/hns3: fix use of command status enumeration
  examples: add eal cleanup to examples
  net/bonding: fix adding itself as its slave
  net/hns3: fix timing in mailbox

Chengwen Feng (15):
  net/hns3: fix flow counter value
  net/hns3: fix VF mailbox head field
  net/hns3: support get device version when dump register
  net/hns3: fix some packet types
  net/hns3: fix missing outer L4 UDP flag for VXLAN
  net/hns3: remove VLAN/QinQ ptypes from support list
  test: check thread creation
  common/dpaax: fix possible null pointer access
  examples/ethtool: remove unused parsing
  net/hns3: fix flow director lock
  net/e1000/base: fix timeout for shadow RAM write
  net/hns3: fix setting default MAC address in bonding of VF
  net/hns3: fix possible mismatched response of mailbox
  net/hns3: fix VF handling LSC event in secondary process
  net/hns3: fix verification of NEON support

Ciara Loftus (1):
  net/af_xdp: fix error handling during Rx queue setup

Conor Walsh (1):
  examples/l3fwd: fix LPM IPv6 subnets

Cristian Dumitrescu (3):
  table: fix actions with different data size
  pipeline: fix instruction translation
  pipeline: fix endianness conversions

Dapeng Yu (3):
  net/igc: remove MTU setting limitation
  net/e1000: remove MTU setting limitation
  examples/packet_ordering: fix port configuration

David Harton (1):
  net/ena: fix releasing Tx ring mbufs

David Marchand (8):
  doc: fix sphinx rtd theme import in GHA
  service: clean references to removed symbol
  eal: fix evaluation of log level option
  ci: hook to GitHub Actions
  ci: enable v21 ABI checks
  ci: fix package installation in GitHub Actions
  ci: ignore APT update failure in GitHub Actions
  ci: catch coredumps

Dekel Peled (1):
  common/mlx5: fix DevX read output buffer size

Dmitry Kozlyuk (3):
  net/pcap: fix format string
  eal/windows: add missing SPDX license tag
  buildtools: fix all drivers disabled on Windows

Ed Czeck (2):
  net/ark: update packet director initial state
  net/ark: refactor Rx buffer recovery

Elad Nachman (2):
  kni: support async user request
  kni: fix kernel deadlock with bifurcated device

Feifei Wang (2):
  net/i40e: fix parsing packet type for NEON
  test/trace: fix race on collected perf data

Ferruh Yigit (3):
  power: remove duplicated symbols from map file
  log/linux: make default output stderr
  license: fix 

[dpdk-dev] [PATCH v2 0/3] l3fwd improvements

2021-06-01 Thread Ruifeng Wang
This series of patches include changes to l3fwd example application.
Some improvements are made for better usage of CPU cycles and memory.

v2:
Dropped 1/4 prefetch to write change from v1.
Dropped 4/4 data struct change from v1.
Added 1/3 code reorganize.
Updated 3/3 to add 'const'. (Jerin)

Ruifeng Wang (3):
  examples/l3fwd: reorganize code for better performance
  examples/l3fwd: eliminate unnecessary calculations
  examples/l3fwd: eliminate unnecessary reloads in loop

 examples/l3fwd/l3fwd_lpm.c  | 10 ++
 examples/l3fwd/l3fwd_lpm_neon.h | 10 +-
 examples/l3fwd/l3fwd_neon.h | 10 +-
 3 files changed, 16 insertions(+), 14 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v2 1/3] examples/l3fwd: reorganize code for better performance

2021-06-01 Thread Ruifeng Wang
Moved the rfc1812 processing prior to the NEON register stores.
On N1SDP, this reorganization mitigates CPU frontend and backend stalls
when forwarding.

On N1SDP with an MLX5 40G NIC, this change showed a 10.2% performance
gain in a single-port, single-core MRR test.
On ThunderX2, this change showed no performance degradation.

Signed-off-by: Ruifeng Wang 
---
 examples/l3fwd/l3fwd_neon.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
index 86ac5971d7..ea7fe22d00 100644
--- a/examples/l3fwd/l3fwd_neon.h
+++ b/examples/l3fwd/l3fwd_neon.h
@@ -43,11 +43,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t 
dst_port[FWDSTEP])
ve[2] = vsetq_lane_u32(vgetq_lane_u32(te[2], 3), ve[2], 3);
ve[3] = vsetq_lane_u32(vgetq_lane_u32(te[3], 3), ve[3], 3);
 
-   vst1q_u32(p[0], ve[0]);
-   vst1q_u32(p[1], ve[1]);
-   vst1q_u32(p[2], ve[2]);
-   vst1q_u32(p[3], ve[3]);
-
rfc1812_process((struct rte_ipv4_hdr *)
((struct rte_ether_hdr *)p[0] + 1),
&dst_port[0], pkt[0]->packet_type);
@@ -60,6 +55,11 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t 
dst_port[FWDSTEP])
rfc1812_process((struct rte_ipv4_hdr *)
((struct rte_ether_hdr *)p[3] + 1),
&dst_port[3], pkt[3]->packet_type);
+
+   vst1q_u32(p[0], ve[0]);
+   vst1q_u32(p[1], ve[1]);
+   vst1q_u32(p[2], ve[2]);
+   vst1q_u32(p[3], ve[3]);
 }
 
 /*
-- 
2.25.1



[dpdk-dev] [PATCH v2 2/3] examples/l3fwd: eliminate unnecessary calculations

2021-06-01 Thread Ruifeng Wang
Both the L2 and L3 headers are used in forwarding processing, and the
two headers are in the same cache line, so prefetching with the L2
header address has the same effect as prefetching with the L3 header
address.

Changed to use the L2 header address for prefetching. The change showed
no measurable performance improvement, but it removes unnecessary
address-calculation instructions.

Signed-off-by: Ruifeng Wang 
Acked-by: Jerin Jacob 
---
 examples/l3fwd/l3fwd_lpm_neon.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
index d6c0ba64ab..78ee83b76c 100644
--- a/examples/l3fwd/l3fwd_lpm_neon.h
+++ b/examples/l3fwd/l3fwd_lpm_neon.h
@@ -98,14 +98,14 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf 
**pkts_burst,
if (k) {
for (i = 0; i < FWDSTEP; i++) {
rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
-   struct rte_ether_hdr *) + 1);
+   void *));
}
 
for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
for (i = 0; i < FWDSTEP; i++) {
rte_prefetch0(rte_pktmbuf_mtod(
pkts_burst[j + i + FWDSTEP],
-   struct rte_ether_hdr *) + 1);
+   void *));
}
 
processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
@@ -125,17 +125,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf 
**pkts_burst,
switch (m) {
case 3:
rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-   struct rte_ether_hdr *) + 1);
+   void *));
j++;
/* fallthrough */
case 2:
rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-   struct rte_ether_hdr *) + 1);
+   void *));
j++;
/* fallthrough */
case 1:
rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
-   struct rte_ether_hdr *) + 1);
+   void *));
j++;
}
 
-- 
2.25.1



[dpdk-dev] [PATCH v2 3/3] examples/l3fwd: eliminate unnecessary reloads in loop

2021-06-01 Thread Ruifeng Wang
The number of Rx queues and the number of Tx ports in the lcore config
are constant while the l3fwd application runs, but the compiler does not
have this information.

Copied the values from the lcore config into local variables and used
those for iteration. The compiler can see that the local variables do
not change, so the qconf reloads at each iteration can be eliminated.

The change showed 1.8% performance uplift in single core, single port,
single queue test on N1SDP platform with MLX5 NIC.

Signed-off-by: Ruifeng Wang 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Honnappa Nagarahalli 
---
 examples/l3fwd/l3fwd_lpm.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 427c72b1d2..ff1c18a442 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -154,14 +154,16 @@ lpm_main_loop(__rte_unused void *dummy)
lcore_id = rte_lcore_id();
qconf = &lcore_conf[lcore_id];
 
-   if (qconf->n_rx_queue == 0) {
+   const uint16_t n_rx_q = qconf->n_rx_queue;
+   const uint16_t n_tx_p = qconf->n_tx_port;
+   if (n_rx_q == 0) {
RTE_LOG(INFO, L3FWD, "lcore %u has nothing to do\n", lcore_id);
return 0;
}
 
RTE_LOG(INFO, L3FWD, "entering main loop on lcore %u\n", lcore_id);
 
-   for (i = 0; i < qconf->n_rx_queue; i++) {
+   for (i = 0; i < n_rx_q; i++) {
 
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
@@ -181,7 +183,7 @@ lpm_main_loop(__rte_unused void *dummy)
diff_tsc = cur_tsc - prev_tsc;
if (unlikely(diff_tsc > drain_tsc)) {
 
-   for (i = 0; i < qconf->n_tx_port; ++i) {
+   for (i = 0; i < n_tx_p; ++i) {
portid = qconf->tx_port_id[i];
if (qconf->tx_mbufs[portid].len == 0)
continue;
@@ -197,7 +199,7 @@ lpm_main_loop(__rte_unused void *dummy)
/*
 * Read packet from RX queues
 */
-   for (i = 0; i < qconf->n_rx_queue; ++i) {
+   for (i = 0; i < n_rx_q; ++i) {
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-- 
2.25.1



Re: [dpdk-dev] Problem while running dpdk

2021-06-01 Thread Bruce Richardson
On Thu, May 27, 2021 at 10:05:39PM +, Raunak Laddha wrote:
> Hello,
> I am using dpdk 20.11.1 . I tried to use your resource to build a custom 
> application which uses dpdk. I compiled my app using meson and makefile to 
> see if I get different result for my issue. I have used pkg-config to load 
> cflags and ldflags in my application and linking it as a static library.
> Problem I am facing:
> No buses are loaded. I can see that RTE_INIT_PRIO is used as constructor to 
> load the buses. But in my application, buses are not loaded.
> I tried to run the dpdk app named test-pipeline to check if it works and it 
> does in that case. But same application with my makefile or meson file does 
> not work.
> First call in my program is rte_eal_init(argc, argv); (the init call of dpdk).
> My app gets compiles. Also verified cflags and ldflags to check if dpdk flags 
> are added.
> My assumption is whatever RTE_INIT_PRIO is loading, it gets loaded correctly 
> in dpdk test-pipeline app but not in my custom app.
> Is there any config I am missing?
> I have attached the makefile and meson file.
> 
> Thanks,
> Raunak

Hi Raunak,

I'm afraid that the attachments got stripped from the email. To help resolve
the problem you encountered, the first thing to check would be the actual
link-command used when linking your app. Check that the drivers are being
linked into the static binary appropriately.  Also, depending on the Linux
distro in use, some versions of pkg-config have a problem with reordering
the linker flags, so I'd recommend installing and using pkgconf package
rather than pkg-config to remove this as a source of error.
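For reference, a minimal makefile in the style of DPDK's example apps, taking the link order from pkg-config's --static output; the target and file names here are assumptions:

```make
# Minimal static-link makefile sketch for a DPDK app (names illustrative).
PKGCONF ?= pkg-config

CFLAGS  += -O2 $(shell $(PKGCONF) --cflags libdpdk)
# --static pulls in the driver archives (wrapped in --whole-archive),
# which is what keeps the bus/driver constructors in the binary.
LDFLAGS_STATIC = $(shell $(PKGCONF) --static --libs libdpdk)

myapp: main.c
	$(CC) $(CFLAGS) main.c -o $@ $(LDFLAGS_STATIC)
```

If the drivers are linked without the whole-archive wrapping, their RTE_INIT_PRIO constructors are dropped by the linker and no buses get registered, which matches the symptom described above.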

Regards,
/Bruce


[dpdk-dev] [PATCH v3 0/4] support AVF RSS and FDIR for GTPoGRE packet

2021-06-01 Thread Lingyu Liu
Support AVF RSS and FDIR for GTPoGRE packet.

Lingyu Liu (4):
  net/iavf: support flow pattern for GTPoGRE
  common/iavf: add header types for GRE
  net/iavf: support AVF FDIR for GTPoGRE tunnel packet
  net/iavf: support AVF RSS for GTPoGRE packet
---
 V3 change:
 - add GTPU extension header pattern

 drivers/common/iavf/virtchnl.h   |   1 +
 drivers/net/iavf/iavf_fdir.c |  66 +++
 drivers/net/iavf/iavf_generic_flow.c | 600 +++
 drivers/net/iavf/iavf_generic_flow.h |  80 
 drivers/net/iavf/iavf_hash.c |  48 +++
 5 files changed, 795 insertions(+)

-- 
2.25.1



[dpdk-dev] [PATCH v3 1/4] net/iavf: support flow pattern for GTPoGRE

2021-06-01 Thread Lingyu Liu
Add GTPoGRE pattern support for AVF FDIR and RSS.

Signed-off-by: Lingyu Liu 
---
 drivers/net/iavf/iavf_generic_flow.c | 600 +++
 drivers/net/iavf/iavf_generic_flow.h |  80 
 2 files changed, 680 insertions(+)

diff --git a/drivers/net/iavf/iavf_generic_flow.c 
b/drivers/net/iavf/iavf_generic_flow.c
index 242bb4abc5..758f615c39 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -433,6 +433,606 @@ enum rte_flow_item_type 
iavf_pattern_eth_ipv4_gtpu_ipv4_icmp[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
+/* IPV4 GRE IPv4 UDP GTPU IPv4*/
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+/* IPV4 GRE IPv4 UDP GTPU IPv6*/
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+/* IPV4 GRE IPv6 UDP GTPU IPv4*/
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+/* IPV4 GRE IPv6 UDP GTPU IPv6*/
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_GTPU,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};

[dpdk-dev] [PATCH v3 2/4] common/iavf: add header types for GRE

2021-06-01 Thread Lingyu Liu
Add a virtchnl protocol header type to support AVF FDIR and RSS for GRE.

Signed-off-by: Lingyu Liu 
---
 drivers/common/iavf/virtchnl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 3a60faff93..197edce8a1 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -1504,6 +1504,7 @@ enum virtchnl_proto_hdr_type {
 */
VIRTCHNL_PROTO_HDR_IPV4_FRAG,
VIRTCHNL_PROTO_HDR_IPV6_EH_FRAG,
+   VIRTCHNL_PROTO_HDR_GRE,
 };
 
 /* Protocol header field within a protocol header. */
-- 
2.25.1



[dpdk-dev] [PATCH v3 3/4] net/iavf: support AVF FDIR for GTPoGRE tunnel packet

2021-06-01 Thread Lingyu Liu
Support AVF FDIR for inner header of GTPoGRE tunnel packet.

++---+
|Pattern |Input Set  |
++---+
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv4|inner: src/dst ip  |
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv4/udp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv4/tcp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv6|inner: src/dst ip  |
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv6/udp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv4/gtpu/(eh/)ipv6/tcp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv4|inner: src/dst ip  |
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv4/udp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv4/tcp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv6|inner: src/dst ip  |
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv6/udp|inner: src/dst ip, src/dst port|
|eth/ipv4/gre/ipv6/gtpu/(eh/)ipv6/tcp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv4|inner: src/dst ip  |
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv4/udp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv4/tcp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv6|inner: src/dst ip  |
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv6/udp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv4/gtpu/(eh/)ipv6/tcp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv4|inner: src/dst ip  |
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv4/udp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv4/tcp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv6|inner: src/dst ip  |
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv6/udp|inner: src/dst ip, src/dst port|
|eth/ipv6/gre/ipv6/gtpu/(eh/)ipv6/tcp|inner: src/dst ip, src/dst port|
++---+

Signed-off-by: Lingyu Liu 
---
 drivers/net/iavf/iavf_fdir.c | 66 
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/iavf/iavf_fdir.c b/drivers/net/iavf/iavf_fdir.c
index f238a83c84..d5d97b7ef0 100644
--- a/drivers/net/iavf/iavf_fdir.c
+++ b/drivers/net/iavf/iavf_fdir.c
@@ -164,6 +164,54 @@ static struct iavf_pattern_match_item iavf_fdir_pattern[] 
= {
{iavf_pattern_eth_ipv4_gtpu_eh_ipv6, IAVF_FDIR_INSET_GTPU_IPV6, 
IAVF_INSET_NONE},
{iavf_pattern_eth_ipv4_gtpu_eh_ipv6_udp, IAVF_FDIR_INSET_GTPU_IPV6_UDP, 
IAVF_INSET_NONE},
{iavf_pattern_eth_ipv4_gtpu_eh_ipv6_tcp, IAVF_FDIR_INSET_GTPU_IPV6_TCP, 
IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4,   
IAVF_FDIR_INSET_GTPU_IPV4, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_udp,   
IAVF_FDIR_INSET_GTPU_IPV4_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_tcp,   
IAVF_FDIR_INSET_GTPU_IPV4_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6,   
IAVF_FDIR_INSET_GTPU_IPV6, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_udp,   
IAVF_FDIR_INSET_GTPU_IPV6_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_tcp,   
IAVF_FDIR_INSET_GTPU_IPV6_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4,   
IAVF_FDIR_INSET_GTPU_IPV4, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_udp,   
IAVF_FDIR_INSET_GTPU_IPV4_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_tcp,   
IAVF_FDIR_INSET_GTPU_IPV4_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6,   
IAVF_FDIR_INSET_GTPU_IPV6, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_udp,   
IAVF_FDIR_INSET_GTPU_IPV6_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_tcp,   
IAVF_FDIR_INSET_GTPU_IPV6_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4,   
IAVF_FDIR_INSET_GTPU_IPV4, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4_udp,   
IAVF_FDIR_INSET_GTPU_IPV4_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4_tcp,   
IAVF_FDIR_INSET_GTPU_IPV4_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6,   
IAVF_FDIR_INSET_GTPU_IPV6, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6_udp,   
IAVF_FDIR_INSET_GTPU_IPV6_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6_tcp,   
IAVF_FDIR_INSET_GTPU_IPV6_TCP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4,   
IAVF_FDIR_INSET_GTPU_IPV4, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4_udp,   
IAVF_FDIR_INSET_GTPU_IPV4_UDP, IAVF_INSET_NONE},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4_tcp,   
IAVF_FDIR_INSET_GTPU_IPV4_TCP, IAVF_INSET_NONE},
+   

[dpdk-dev] [PATCH v3 4/4] net/iavf: support AVF RSS for GTPoGRE packet

2021-06-01 Thread Lingyu Liu
Support AVF RSS for the innermost header of GTPoGRE packets. It supports
RSS based on innermost IP src + dst address and TCP/UDP src + dst
port.

Signed-off-by: Lingyu Liu 
---
 drivers/net/iavf/iavf_hash.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/iavf/iavf_hash.c b/drivers/net/iavf/iavf_hash.c
index 5d3d62839b..b2bb625b98 100644
--- a/drivers/net/iavf/iavf_hash.c
+++ b/drivers/net/iavf/iavf_hash.c
@@ -420,6 +420,54 @@ static struct iavf_pattern_match_item 
iavf_hash_pattern_list[] = {
{iavf_pattern_eth_ipv6_gtpu_eh_ipv4,
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
{iavf_pattern_eth_ipv6_gtpu_eh_ipv4_udp,
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
{iavf_pattern_eth_ipv6_gtpu_eh_ipv4_tcp,
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4,  
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_udp,  
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv4_tcp,  
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6,  
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_udp,  
IAVF_RSS_TYPE_GTPU_IPV6_UDP,&inner_ipv6_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_ipv6_tcp,  
IAVF_RSS_TYPE_GTPU_IPV6_TCP,&inner_ipv6_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4,  
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_udp,  
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv4_tcp,  
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6,  
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_udp,  
IAVF_RSS_TYPE_GTPU_IPV6_UDP,&inner_ipv6_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_ipv6_tcp,  
IAVF_RSS_TYPE_GTPU_IPV6_TCP,&inner_ipv6_tcp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4,  
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4_udp,  
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv4_tcp,  
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6,  
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6_udp,  
IAVF_RSS_TYPE_GTPU_IPV6_UDP,&inner_ipv6_udp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv4_gtpu_ipv6_tcp,  
IAVF_RSS_TYPE_GTPU_IPV6_TCP,&inner_ipv6_tcp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4,  
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4_udp,  
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv4_tcp,  
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv6,  
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv6_udp,  
IAVF_RSS_TYPE_GTPU_IPV6_UDP,&inner_ipv6_udp_tmplt},
+   {iavf_pattern_eth_ipv6_gre_ipv6_gtpu_ipv6_tcp,  
IAVF_RSS_TYPE_GTPU_IPV6_TCP,&inner_ipv6_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv4,   
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv4_udp,   
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv4_tcp,   
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv6,   
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv6_udp,   
IAVF_RSS_TYPE_GTPU_IPV6_UDP,&inner_ipv6_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv4_gtpu_eh_ipv6_tcp,   
IAVF_RSS_TYPE_GTPU_IPV6_TCP,&inner_ipv6_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_eh_ipv4,   
IAVF_RSS_TYPE_GTPU_IPV4,&inner_ipv4_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_eh_ipv4_udp,   
IAVF_RSS_TYPE_GTPU_IPV4_UDP,&inner_ipv4_udp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_eh_ipv4_tcp,   
IAVF_RSS_TYPE_GTPU_IPV4_TCP,&inner_ipv4_tcp_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_eh_ipv6,   
IAVF_RSS_TYPE_GTPU_IPV6,&inner_ipv6_tmplt},
+   {iavf_pattern_eth_ipv4_gre_ipv6_gtpu_eh_i

[dpdk-dev] [PATCH v2] vfio: fix stdbool usage without include

2021-06-01 Thread Christian Ehrhardt
This became visible when backporting the following to the 19.11 stable tree:
 c13ca4e8 "vfio: fix DMA mapping granularity for IOVA as VA"

The usage of type bool in the vfio code would require "#include
<stdbool.h>", but rte_vfio.h has no direct paths to stdbool.h.
It happens that in eal_vfio_mp_sync.c it comes after "#include
<rte_log.h>".

And rte_log.h since 20.05 includes stdbool since this change:
 241e67bfe "log: add API to check if a logtype can log in a given level"
and thereby mitigates the issue.

It should be safe to include stdbool.h from rte_vfio.h itself, so it
is present exactly when needed by the struct the header defines using
that type.

Fixes: c13ca4e81cac ("vfio: fix DMA mapping granularity for IOVA as VA")

Signed-off-by: Christian Ehrhardt 
---
 lib/eal/include/rte_vfio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index e7a87454bea..2d90b364801 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -14,6 +14,7 @@
 extern "C" {
 #endif
 
+#include <stdbool.h>
 #include <stdint.h>
 
 /*
-- 
2.31.1



[dpdk-dev] [PATCH 0/2] MLX5 PMD tuning

2021-06-01 Thread Ruifeng Wang
This series include optimizations for MLX5 PMD.
In tests on Arm N1SDP with MLX5 40G NIC, changes
showed performance gain.

Ruifeng Wang (2):
  net/mlx5: remove redundant operations
  net/mlx5: reduce unnecessary memory access

 drivers/net/mlx5/mlx5_rxtx_vec.c  | 6 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +
 2 files changed, 5 insertions(+), 10 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH 1/2] net/mlx5: remove redundant operations

2021-06-01 Thread Ruifeng Wang
Some operations on mask are redundant and can be removed.
The change yielded 1.6% performance gain on N1SDP.
On ThunderX2, slight performance uplift was also observed.

Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: sta...@dpdk.org

Signed-off-by: Ruifeng Wang 
---
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h 
b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 2234fbe6b2..98a75b09c6 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -768,18 +768,11 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile 
struct mlx5_cqe *cq,
  comp_mask), 0)) /
  (sizeof(uint16_t) * 8);
/* D.6 mask out entries after the compressed CQE. */
-   mask = vcreate_u16(comp_idx < MLX5_VPMD_DESCS_PER_LOOP ?
-  -1UL >> (comp_idx * sizeof(uint16_t) * 8) :
-  0);
-   invalid_mask = vorr_u16(invalid_mask, mask);
+   invalid_mask = vorr_u16(invalid_mask, comp_mask);
/* D.7 count non-compressed valid CQEs. */
n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
   invalid_mask), 0)) / (sizeof(uint16_t) * 8);
nocmp_n += n;
-   /* D.2 get the final invalid mask. */
-   mask = vcreate_u16(n < MLX5_VPMD_DESCS_PER_LOOP ?
-  -1UL >> (n * sizeof(uint16_t) * 8) : 0);
-   invalid_mask = vorr_u16(invalid_mask, mask);
/* D.3 check error in opcode. */
opcode = vceq_u16(resp_err_check, opcode);
opcode = vbic_u16(opcode, invalid_mask);
-- 
2.25.1



[dpdk-dev] [PATCH 2/2] net/mlx5: reduce unnecessary memory access

2021-06-01 Thread Ruifeng Wang
The MR btree length is constant during Rx replenish.
Move retrieval of the value out of the loop to reduce data loads.
A slight performance uplift was measured on N1SDP.

Signed-off-by: Ruifeng Wang 
---
 drivers/net/mlx5/mlx5_rxtx_vec.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d5af2d91ff..fc7e2a7f41 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
volatile struct mlx5_wqe_data_seg *wq =
&((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
unsigned int i;
+   uint16_t btree_len;
 
if (n >= rxq->rq_repl_thresh) {
MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
@@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
rxq->stats.rx_nombuf += n;
return;
}
+
+   btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
for (i = 0; i < n; ++i) {
void *buf_addr;
 
@@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
  RTE_PKTMBUF_HEADROOM);
/* If there's a single MR, no need to replace LKey. */
-   if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
-> 1))
+   if (unlikely(btree_len > 1))
wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
}
rxq->rq_ci += n;
-- 
2.25.1



[dpdk-dev] [PATCH] doc: announce removal of ABIs in PCI bus driver

2021-06-01 Thread Chenbo Xia
All ABIs in the PCI bus driver, which are defined in rte_bus_pci.h,
will be removed and the header will be made internal.

Signed-off-by: Chenbo Xia 
---
 doc/guides/rel_notes/deprecation.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..b01f46c62e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -147,3 +147,8 @@ Deprecation Notices
 * cmdline: ``cmdline`` structure will be made opaque to hide platform-specific
   content. On Linux and FreeBSD, supported prior to DPDK 20.11,
   original structure will be kept until DPDK 21.11.
+
+* pci: To reduce unnecessary ABIs exposed by DPDK bus driver, "rte_bus_pci.h"
+  will be made internal in 21.11 and macros/data structures/functions defined
+  in the header will not be considered as ABI anymore. This change is inspired
+  by the RFC https://patchwork.dpdk.org/project/dpdk/list/?series=17176.
-- 
2.17.1



[dpdk-dev] [PATCH] net/octeontx2: fix flow create on CN98xx

2021-06-01 Thread psatheesh
From: Satheesh Paul 

CN96xx and CN98xx have 4096 and 16384 MCAM entries respectively.
Align the code with these numbers.

Fixes: 092b3834185 ("net/octeontx2: add flow init and fini")

Signed-off-by: Satheesh Paul 
---
 drivers/common/octeontx2/otx2_dev.h |  3 +++
 drivers/net/octeontx2/otx2_flow.c   | 16 ++--
 drivers/net/octeontx2/otx2_flow.h   |  1 -
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/common/octeontx2/otx2_dev.h 
b/drivers/common/octeontx2/otx2_dev.h
index cd4fe517d..9d8dcca79 100644
--- a/drivers/common/octeontx2/otx2_dev.h
+++ b/drivers/common/octeontx2/otx2_dev.h
@@ -55,6 +55,9 @@
 (RVU_PCI_REV_MINOR(otx2_dev_revid(dev)) == 0x0) && \
 (RVU_PCI_REV_MIDR_ID(otx2_dev_revid(dev)) == 0x0))
 
+#define otx2_dev_is_98xx(dev)   \
+(RVU_PCI_REV_MIDR_ID(otx2_dev_revid(dev)) == 0x3)
+
 struct otx2_dev;
 
 /* Link status callback */
diff --git a/drivers/net/octeontx2/otx2_flow.c 
b/drivers/net/octeontx2/otx2_flow.c
index 1c90d753f..6df073218 100644
--- a/drivers/net/octeontx2/otx2_flow.c
+++ b/drivers/net/octeontx2/otx2_flow.c
@@ -1003,12 +1003,23 @@ flow_fetch_kex_cfg(struct otx2_eth_dev *dev)
return rc;
 }
 
+#define OTX2_MCAM_TOT_ENTRIES_96XX (4096)
+#define OTX2_MCAM_TOT_ENTRIES_98XX (16384)
+
+static int otx2_mcam_tot_entries(struct otx2_eth_dev *dev)
+{
+   if (otx2_dev_is_98xx(dev))
+   return OTX2_MCAM_TOT_ENTRIES_98XX;
+   else
+   return OTX2_MCAM_TOT_ENTRIES_96XX;
+}
+
 int
 otx2_flow_init(struct otx2_eth_dev *hw)
 {
uint8_t *mem = NULL, *nix_mem = NULL, *npc_mem = NULL;
struct otx2_npc_flow_info *npc = &hw->npc_flow;
-   uint32_t bmap_sz;
+   uint32_t bmap_sz, tot_mcam_entries = 0;
int rc = 0, idx;
 
rc = flow_fetch_kex_cfg(hw);
@@ -1020,7 +1031,8 @@ otx2_flow_init(struct otx2_eth_dev *hw)
rte_atomic32_init(&npc->mark_actions);
npc->vtag_actions = 0;
 
-   npc->mcam_entries = NPC_MCAM_TOT_ENTRIES >> npc->keyw[NPC_MCAM_RX];
+   tot_mcam_entries = otx2_mcam_tot_entries(hw);
+   npc->mcam_entries = tot_mcam_entries >> npc->keyw[NPC_MCAM_RX];
/* Free, free_rev, live and live_rev entries */
bmap_sz = rte_bitmap_get_memory_footprint(npc->mcam_entries);
mem = rte_zmalloc(NULL, 4 * bmap_sz * npc->flow_max_priority,
diff --git a/drivers/net/octeontx2/otx2_flow.h 
b/drivers/net/octeontx2/otx2_flow.h
index 82a5064d9..790e6ef1e 100644
--- a/drivers/net/octeontx2/otx2_flow.h
+++ b/drivers/net/octeontx2/otx2_flow.h
@@ -35,7 +35,6 @@ enum {
 /* 32 bytes from LDATA_CFG & 32 bytes from FLAGS_CFG */
 #define NPC_MAX_EXTRACT_DATA_LEN   (64)
 #define NPC_LDATA_LFLAG_LEN(16)
-#define NPC_MCAM_TOT_ENTRIES   (4096)
 #define NPC_MAX_KEY_NIBBLES(31)
 /* Nibble offsets */
 #define NPC_LAYER_KEYX_SZ  (3)
-- 
2.25.4



Re: [dpdk-dev] [PATCH v1] raw/ifpga/base: check address before assigning

2021-06-01 Thread Zhang, Qi Z



> -Original Message-
> From: Huang, Wei 
> Sent: Monday, May 31, 2021 1:23 PM
> To: dev@dpdk.org; Xu, Rosen ; Zhang, Qi Z
> 
> Cc: sta...@dpdk.org; Zhang, Tianfei ; Yigit, Ferruh
> ; Huang, Wei 
> Subject: [PATCH v1] raw/ifpga/base: check address before assigning
> 
> In max10_staging_area_init(), variable "start" from fdt_get_reg() may be
> invalid; it should be checked before being assigned to member variable
> "staging_area_base" of structure "intel_max10_device".
> 
> Coverity issue: 367480, 367482
> Fixes: a05bd1b40bde ("raw/ifpga: add FPGA RSU APIs")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Wei Huang 
> Acked-by: Tianfei Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi


Re: [dpdk-dev] [PATCH] net/ice: fix default RSS key generation

2021-06-01 Thread Zhang, Qi Z



> -Original Message-
> From: Yu, DapengX 
> Sent: Thursday, May 27, 2021 2:43 PM
> To: Yang, Qiming ; Zhang, Qi Z
> 
> Cc: dev@dpdk.org; Yigit, Ferruh ; Yu, DapengX
> ; sta...@dpdk.org
> Subject: [PATCH] net/ice: fix default RSS key generation
> 
> From: Dapeng Yu 
> 
> In the original implementation, device reconfiguration would generate a new
> default RSS key if none was provided by the user, which is unexpected when
> updating a completely unrelated configuration.
> 
> This patch keeps the default RSS key unchanged during the lifetime of the
> DPDK application, even if there are multiple reconfigurations.
> 
> Fixes: 50370662b727 ("net/ice: support device and queue ops")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dapeng Yu 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi


Re: [dpdk-dev] [PATCH] net/iavf: fix error handle for unsupported promisc configure

2021-06-01 Thread Zhang, Qi Z



> -Original Message-
> From: Xing, Beilei 
> Sent: Thursday, May 27, 2021 9:54 AM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH] net/iavf: fix error handle for unsupported promisc
> configure
> 
> 
> 
> > -Original Message-
> > From: Zhang, Qi Z 
> > Sent: Wednesday, May 26, 2021 5:53 PM
> > To: Xing, Beilei 
> > Cc: dev@dpdk.org; Zhang, Qi Z ; sta...@dpdk.org
> > Subject: [PATCH] net/iavf: fix error handle for unsupported promisc
> > configure
> >
> > iavf_execute_vf_cmd returns a standard error code, not IAVF_xxx. The
> > patch fixes the wrong error handling in iavf_config_promisc.
> >
> > Fixes: 1e4d55a7fe71 ("net/iavf: optimize promiscuous device
> > operations")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Qi Zhang 
> > ---
> >  drivers/net/iavf/iavf_vchnl.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/iavf/iavf_vchnl.c
> > b/drivers/net/iavf/iavf_vchnl.c index
> > 5d57e8b541..02e828f9b7 100644
> > --- a/drivers/net/iavf/iavf_vchnl.c
> > +++ b/drivers/net/iavf/iavf_vchnl.c
> > @@ -1257,8 +1257,8 @@ iavf_config_promisc(struct iavf_adapter *adapter,
> > PMD_DRV_LOG(ERR,
> > "fail to execute command
> > CONFIG_PROMISCUOUS_MODE");
> >
> > -   if (err == IAVF_NOT_SUPPORTED)
> > -   return -ENOTSUP;
> > +   if (err == -ENOTSUP)
> > +   return err;
> >
> > return -EAGAIN;
> > }
> > --
> > 2.26.2
> 
> Acked-by: Beilei Xing 

Applied to dpdk-next-net-intel.

Thanks
Qi
> 



Re: [dpdk-dev] [PATCH v1] net/i40e: fix flow director does not work

2021-06-01 Thread Zhang, Qi Z



> -Original Message-
> From: dev  On Behalf Of Xing, Beilei
> Sent: Monday, May 24, 2021 11:33 AM
> To: Yang, SteveX ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] net/i40e: fix flow director does not work
> 
> 
> 
> > -Original Message-
> > From: Yang, SteveX 
> > Sent: Wednesday, May 19, 2021 11:28 AM
> > To: dev@dpdk.org
> > Cc: Xing, Beilei ; Yang, SteveX
> > 
> > Subject: [PATCH v1] net/i40e: fix flow director does not work
> >
> > When the user configured a flow rule with a raw packet via the
> > "flow_director_filter" command, it reset all previous FDIR input set
> > flags with "i40e_flow_set_fdir_inset()".
> >
> > Skip configuring the flow input set when a raw packet rule is used.
> >
> > Fixes: ff04964ea6d5 ("net/i40e: fix flow director for common pctypes")
> >
> > Signed-off-by: Steve Yang 
> > ---
> >  drivers/net/i40e/i40e_fdir.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_fdir.c
> > b/drivers/net/i40e/i40e_fdir.c index
> > ac0e09bfdd..3c7cf1ba90 100644
> > --- a/drivers/net/i40e/i40e_fdir.c
> > +++ b/drivers/net/i40e/i40e_fdir.c
> > @@ -1768,7 +1768,8 @@ i40e_flow_add_del_fdir_filter(struct rte_eth_dev
> > *dev,
> >
> > if (add) {
> > /* configure the input set for common PCTYPEs*/
> > -   if (!filter->input.flow_ext.customized_pctype) {
> > +   if (!filter->input.flow_ext.customized_pctype &&
> > +   !filter->input.flow_ext.pkt_template) {
> > ret = i40e_flow_set_fdir_inset(pf, pctype,
> > filter->input.flow_ext.input_set);
> > if (ret < 0)
> > --
> > 2.27.0
> 
> Acked-by: Beilei Xing 

Applied to dpdk-next-net-intel.

Thanks
Qi


[dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Ivan Malov
By its very name, action PORT_ID means that packets hit an ethdev with the
given DPDK port ID. At least the current comments don't state the opposite.
That said, since port representors had been adopted, applications like OvS
have been misusing the action. They misread its purpose as sending packets
to the opposite end of the "wire" plugged to the given ethdev, for example,
redirecting packets to the VF itself rather than to its representor ethdev.
Another example: OvS relies on this action with the admin PF's ethdev port
ID specified in it in order to send offloaded packets to the physical port.

Since there might be applications which use this action in its valid sense,
one can't just change the documentation to greenlight the opposite meaning.
This patch adds an explicit bit to the action configuration which will let
applications, depending on their needs, leverage the two meanings properly.
Applications like OvS, as well as PMDs, will have to be corrected when the
patch has been applied. But the improved clarity of the action is worth it.

The proposed change is not the only option. One could avoid changes in OvS
and PMDs if the new configuration field had the opposite meaning, with the
action itself meaning delivery to the represented port and not to DPDK one.
Alternatively, one could define a brand new action with the said behaviour.

One may also consider clarifying item PORT_ID meaning in a separate change.

Signed-off-by: Ivan Malov 
---
 lib/ethdev/rte_flow.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 961a5884f..f45937bd7 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -2635,13 +2635,22 @@ struct rte_flow_action_phy_port {
 /**
  * RTE_FLOW_ACTION_TYPE_PORT_ID
  *
- * Directs matching traffic to a given DPDK port ID.
+ * Directs matching traffic to an ethdev with the given DPDK port ID or
+ * to the upstream port (the peer side of the wire) corresponding to it.
+ *
+ * It's assumed that it's the PMD (typically, its instance at the admin
+ * PF) which controls the binding between a (representor) ethdev and an
+ * upstream port. Typical bindings: VF rep. <=> VF, PF <=> network port.
+ * If the PMD instance is unaware of the binding between the ethdev and
+ * its upstream port (or can't control it), it should reject the action
+ * with the upstream bit specified and log an appropriate error message.
  *
  * @see RTE_FLOW_ITEM_TYPE_PORT_ID
  */
 struct rte_flow_action_port_id {
uint32_t original:1; /**< Use original DPDK port ID if possible. */
-   uint32_t reserved:31; /**< Reserved, must be zero. */
+   uint32_t upstream:1; /**< Use the upstream port for this one. */
+   uint32_t reserved:30; /**< Reserved, must be zero. */
uint32_t id; /**< DPDK port ID. */
 };
 
-- 
2.20.1



[dpdk-dev] [Bug 724] Guest causes DPDK to read out of bounds

2021-06-01 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=724

Bug ID: 724
   Summary: Guest causes DPDK to read out of bounds
   Product: DPDK
   Version: 20.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: vhost/virtio
  Assignee: dev@dpdk.org
  Reporter: cheng1.ji...@intel.com
  Target Milestone: ---

Report From: dsfasd daf 
Report Date: Thu, 11 Mar 2021 10:24:24 +

Report:


Hi, 
I am clark, a security researcher of Tencent Blade Team. I recently discovered
several security vulnerabilities in DPDK, as follows

1. 
Code:
examples/vhost/virtio_net.c 
vs_enqueue_pkts()
desc_indexes[i] = vr->avail->ring[used_idx];
...
uint16_t desc_idx = desc_indexes[i];
err = enqueue_pkt(dev, vr, pkts[i], desc_idx);

enqueue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
struct rte_mbuf *m, uint16_t desc_idx) {
...
desc = &vr->desc[desc_idx];
}
description:
desc_indexes[i] = vr->avail->ring[used_idx] Its value can be fully
controlled by the guest, which will cause out-of-bounds writing in the
enqueue_pkt function
harm:
Guest causes DPDK to write out of bounds
patch suggestions:
vs_enqueue_pkts() {
...
+   if (vr->avail->ring[used_idx] >= vr->size)
+   return 0;
desc_indexes[i] = vr->avail->ring[used_idx];
...
}

2. 
Code:
examples/vhost/virtio_net.c 
vs_dequeue_pkts()
desc_indexes[i] = vr->avail->ring[avail_idx];
dequeue_pkt(dev, vr, pkts[i], desc_indexes[i],
mbuf_pool);
dequeue_pkt(struct vhost_dev *dev, struct rte_vhost_vring *vr,
struct rte_mbuf *m, uint16_t desc_idx, struct rte_mempool *mbuf_pool) {
desc = &vr->desc[desc_idx];
}
description:
desc_indexes[i] = vr->avail->ring[avail_idx]; Its value can be fully
controlled by the guest, which will cause out-of-bounds reading in the
dequeue_pkt function.
harm:
Guest causes DPDK to read out of bounds
patch suggestions:
vs_dequeue_pkts() {
...
+   if (vr->avail->ring[avail_idx] >= vr->size)
+   return 0;
desc_indexes[i] = vr->avail->ring[avail_idx];
...
}

3. 
Code:
examples/vhost_blk/vhost_blk.c
vq_get_desc_idx()
desc_idx = vq->vring.avail->ring[last_avail_idx];
process_vq()
desc_idx = vq_get_desc_idx(vq);
task = &vq->tasks[desc_idx];
...
process_blk_task(task);
description:
desc_idx = vq->vring.avail->ring[last_avail_idx]; Its value can be
fully controlled by the guest, process_blk_task(task); will further cause
out-of-bounds read and write.
harm:
Guest causes DPDK to read and write out of bounds
patch suggestions:
process_vq() {
desc_idx = vq_get_desc_idx(vq);
+   if (desc_idx >= vq->vring.size)
+   return;
task = &vq->tasks[desc_idx];


4. 
Code:
lib/librte_vhost/vhost_user.c  
vhost_user_postcopy_register() 
if (read_vhost_message(main_fd, &ack_msg) <= 0) {}
description:
vhost_user_postcopy_register is called in the vhost_user_set_mem_table
function, When dev->postcopy_listening was set to 1,
vhost_user_postcopy_register will call read_vhost_message 
and wait for qemu to respond to this message. If there is a Malicious qemu
process does not reply to this message, DPDK will wait for the response
indefinitely, and other legitimate qemu processes 
will not be able to communicate with DPDK normally. This will result in A DoS
attack.
harm:
qemu causes DPDK denial of service
patch suggestions:
Add a timeout mechanism

5. 
Code:
lib/librte_vhost/vhost_crypto.c
rte_vhost_crypto_fetch_requests()
uint16_t desc_idx = vq->avail->ring[used_idx];
struct vring_desc *head = &vq->desc[desc_idx];
if (unlikely(vhost_crypto_process_one_req(vcrypto, vq,
op, head, descs, used_idx) < 0))
description:
uint16_t desc_idx = vq->avail->ring[used_idx]; Its value can be fully
controlled by the guest, vhost_crypto_process_one_req(task); will further cause
out-of-bounds reading.
harm:
Guest causes DPDK to read out of bounds
patch suggestions:
rte_vhost_crypto_fetch_requests()
uint16_t desc_idx = vq->avail->ring[used_idx];
+   if (desc_idx >= vq->size)
+   break;

Re: [dpdk-dev] [PATCH] net/iavf: use write combining store for tail updates

2021-06-01 Thread Zhang, Qi Z



> -Original Message-
> From: dev  On Behalf Of Radu Nicolau
> Sent: Wednesday, May 12, 2021 6:29 PM
> To: dev@dpdk.org
> Cc: Richardson, Bruce ; Noonan, Gordon
> ; Wu, Jingjing ; Xing,
> Beilei ; Nicolau, Radu 
> Subject: [dpdk-dev] [PATCH] net/iavf: use write combining store for tail
> updates
> 
> From: Gordon Noonan 
> 
> Performance improvement: use a write combining store instead of a regular
> mmio write to update queue tail registers.
> 
> Signed-off-by: Gordon Noonan 
> Signed-off-by: Radu Nicolau 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi


[dpdk-dev] [PATCH v1 0/7] Enhancements for PMD power management

2021-06-01 Thread Anatoly Burakov
This patchset introduces several changes related to PMD power management:

- Add inverted checks to monitor intrinsics, based on previous patchset [1] but
  incorporating feedback [2] - this hopefully will make it possible to add
  support for .get_monitor_addr in virtio
- Add a new intrinsic to monitor multiple addresses, based on RTM instruction
  set and the TPAUSE instruction
- Add support for PMD power management on multiple queues, as well as all
  accompanying infrastructure and example apps changes

[1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=*
[2] 
http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274

Anatoly Burakov (7):
  power_intrinsics: allow monitor checks inversion
  net/af_xdp: add power monitor support
  eal: add power monitor for multiple events
  power: remove thread safety from PMD power API's
  power: support callbacks for multiple Rx queues
  power: support monitoring multiple Rx queues
  l3fwd-power: support multiqueue in PMD pmgmt modes

 drivers/net/af_xdp/rte_eth_af_xdp.c   |  25 +
 examples/l3fwd-power/main.c   |  39 +-
 lib/eal/arm/rte_power_intrinsics.c|  11 +
 lib/eal/include/generic/rte_cpuflags.h|   2 +
 .../include/generic/rte_power_intrinsics.h|  39 ++
 lib/eal/ppc/rte_power_intrinsics.c|  11 +
 lib/eal/version.map   |   3 +
 lib/eal/x86/rte_cpuflags.c|   2 +
 lib/eal/x86/rte_power_intrinsics.c|  74 ++-
 lib/power/meson.build |   3 +
 lib/power/rte_power_pmd_mgmt.c| 500 +-
 lib/power/rte_power_pmd_mgmt.h|  40 ++
 lib/power/version.map |   3 +
 13 files changed, 596 insertions(+), 156 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

2021-06-01 Thread Anatoly Burakov
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value.

This commit adds an option to reverse the check, so that we can have
monitor sleep aborted if the expected value *doesn't* match what's in
memory. This allows us to both implement all currently implemented
driver code, as well as support more use cases which don't easily map to
previous semantics (such as waiting on writes to AF_XDP counter value).

Since the old behavior is the default, no need to adjust existing
implementations.
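
The resulting abort condition can be summarized in a few lines. This stand-alone sketch mirrors the check added to the x86 rte_power_monitor() implementation in the diff below; the function name is invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* should the monitor sleep be aborted, given the current masked value?
 * mirrors the invert semantics described above */
static bool monitor_should_abort(uint64_t masked, uint64_t expected,
				 uint8_t invert)
{
	if (!invert)
		return masked == expected; /* classic: abort when value matches */
	return masked != expected;         /* inverted: abort when it differs */
}
```

With invert set, a driver can arm the monitor on a counter's current value and be woken by any write that changes it, which is exactly the AF_XDP use case.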

Signed-off-by: Anatoly Burakov 
---
 lib/eal/include/generic/rte_power_intrinsics.h | 4 
 lib/eal/x86/rte_power_intrinsics.c | 5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/eal/include/generic/rte_power_intrinsics.h 
b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..1006c2edfc 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -31,6 +31,10 @@ struct rte_power_monitor_cond {
  *   4, or 8. Supplying any other value will result in
  *   an error.
  */
+   uint8_t invert;  /**< Invert check for expected value (e.g. instead of
+ *   checking if `val` matches something, check if
+ *   `val` *doesn't* match a particular value)
+ */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c 
b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..5d944e9aa4 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -117,7 +117,10 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
const uint64_t masked = cur_value & pmc->mask;
 
/* if the masked value is already matching, abort */
-   if (masked == pmc->val)
+   if (!pmc->invert && masked == pmc->val)
+   goto end;
+   /* same, but for inverse check */
+   if (pmc->invert && masked != pmc->val)
goto end;
}
 
-- 
2.25.1



[dpdk-dev] [PATCH v1 2/7] net/af_xdp: add power monitor support

2021-06-01 Thread Anatoly Burakov
Implement support for .get_monitor_addr in AF_XDP driver.

Signed-off-by: Anatoly Burakov 
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c 
b/drivers/net/af_xdp/rte_eth_af_xdp.c
index eb5660a3dc..dfbf74ea53 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "compat.h"
 
@@ -788,6 +789,29 @@ eth_dev_configure(struct rte_eth_dev *dev)
return 0;
 }
 
+static int
+eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct pkt_rx_queue *rxq = rx_queue;
+   unsigned int *prod = rxq->rx.producer;
+   const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */
+
+   /* watch for changes in producer ring */
+   pmc->addr = (void*)prod;
+
+   /* store current value */
+   pmc->val = cur_val;
+   pmc->mask = (uint32_t)~0; /* mask entire uint32_t value */
+
+   /* AF_XDP producer ring index is 32-bit */
+   pmc->size = sizeof(uint32_t);
+
+   /* this requires an inverted check */
+   pmc->invert = 1;
+
+   return 0;
+}
+
 static int
 eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
@@ -1448,6 +1472,7 @@ static const struct eth_dev_ops ops = {
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
+   .get_monitor_addr = eth_get_monitor_addr
 };
 
 /** parse busy_budget argument */
-- 
2.25.1



[dpdk-dev] [PATCH v1 3/7] eal: add power monitor for multiple events

2021-06-01 Thread Anatoly Burakov
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Anatoly Burakov 
---
 lib/eal/arm/rte_power_intrinsics.c| 11 +++
 lib/eal/include/generic/rte_cpuflags.h|  2 +
 .../include/generic/rte_power_intrinsics.h| 35 ++
 lib/eal/ppc/rte_power_intrinsics.c| 11 +++
 lib/eal/version.map   |  3 +
 lib/eal/x86/rte_cpuflags.c|  2 +
 lib/eal/x86/rte_power_intrinsics.c| 69 +++
 7 files changed, 133 insertions(+)

diff --git a/lib/eal/arm/rte_power_intrinsics.c 
b/lib/eal/arm/rte_power_intrinsics.c
index e83f04072a..78f55b7203 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/include/generic/rte_cpuflags.h 
b/lib/eal/include/generic/rte_cpuflags.h
index 28a5aecde8..d35551e931 100644
--- a/lib/eal/include/generic/rte_cpuflags.h
+++ b/lib/eal/include/generic/rte_cpuflags.h
@@ -24,6 +24,8 @@ struct rte_cpu_intrinsics {
/**< indicates support for rte_power_monitor function */
uint32_t power_pause : 1;
/**< indicates support for rte_power_pause function */
+   uint32_t power_monitor_multi : 1;
+   /**< indicates support for rte_power_monitor_multi function */
 };
 
 /**
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h 
b/lib/eal/include/generic/rte_power_intrinsics.h
index 1006c2edfc..acb0d759ce 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -113,4 +113,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id);
 __rte_experimental
 int rte_power_pause(const uint64_t tsc_timestamp);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Monitor a set of addresses for changes. This will cause the CPU to enter an
+ * architecture-defined optimized power state until either one of the specified
+ * memory addresses is written to, a certain TSC timestamp is reached, or other
+ * reasons cause the CPU to wake up.
+ *
+ * Additionally, `expected` 64-bit values and 64-bit masks are provided. If
+ * mask is non-zero, the current value pointed to by the `p` pointer will be
+ * checked against the expected value, and if they do not match, the entering 
of
+ * optimized power state may be aborted.
+ *
+ * @warning It is responsibility of the user to check if this function is
+ *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
+ *   Failing to do so may result in an illegal CPU instruction error.
+ *
+ * @param pmc
+ *   An array of monitoring condition structures.
+ * @param num
+ *   Length of the `pmc` array.
+ * @param tsc_timestamp
+ *   Maximum TSC timestamp to wait for. Note that the wait behavior is
+ *   architecture-dependent.
+ *
+ * @return
+ *   0 on success
+ *   -EINVAL on invalid parameters
+ *   -ENOTSUP if unsupported
+ */
+__rte_experimental
+int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp);
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/ppc/rte_power_intrinsics.c 
b/lib/eal/ppc/rte_power_intrinsics.c
index 7fc9586da7..f00b58ade5 100644
--- a/lib/eal/ppc/rte_power_intrinsics.c
+++ b/lib/eal/ppc/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index fe5c3dac98..4ccd5475d6 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -423,6 +423,9 @@ EXPERIMENTAL {
rte_version_release; # WINDOWS_NO_EXPORT
rte_version_suffix; # WINDOWS_NO_EXPORT
rte_version_year; # WINDOWS_NO_EXPORT
+
+   # added in 21.08
+   rte_power_monitor_multi; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
diff --git a/lib/eal/x86/rte_cpuflags.c b/lib/eal/x86/rte_cpuflags.c
index a96312ff7f..d339734a8c 100644
--- a/lib/eal/x86/rte_cpuflags.c
+++ b

[dpdk-dev] [PATCH v1 4/7] power: remove thread safety from PMD power API's

2021-06-01 Thread Anatoly Burakov
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov 
---
 lib/power/meson.build  |   3 +
 lib/power/rte_power_pmd_mgmt.c | 106 -
 lib/power/rte_power_pmd_mgmt.h |   6 ++
 3 files changed, 35 insertions(+), 80 deletions(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
 'rte_power_pmd_mgmt.h',
 'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..0707c60a4f 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
/**< Callback mode for this queue */
const struct rte_eth_rxtx_callback *cur_cb;
/**< Callback instance */
-   volatile bool umwait_in_progress;
-   /**< are we currently sleeping? */
uint64_t empty_poll_stats;
/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf 
**pkts __rte_unused,
struct rte_power_monitor_cond pmc;
uint16_t ret;
 
-   /*
-* we might get a cancellation request while being
-* inside the callback, in which case the wakeup
-* wouldn't work because it would've arrived too early.
-*
-* to get around this, we notify the other thread that
-* we're sleeping, so that it can spin until we're done.
-* unsolicited wakeups are perfectly safe.
-*/
-   q_conf->umwait_in_progress = true;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-   /* check if we need to cancel sleep */
-   if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-   /* use monitoring condition to sleep */
-   ret = rte_eth_get_monitor_addr(port_id, qidx,
-   &pmc);
-   if (ret == 0)
-   rte_power_monitor(&pmc, UINT64_MAX);
-   }
-   q_conf->umwait_in_progress = false;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+   /* use monitoring condition to sleep */
+   ret = rte_eth_get_monitor_addr(port_id, qidx,
+   &pmc);
+   if (ret == 0)
+   rte_power_monitor(&pmc, UINT64_MAX);
}
} else
q_conf->empty_poll_stats = 0;
@@ -183,6 +162,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
 {
struct pmd_queue_cfg *queue_cfg;
struct rte_eth_dev_info info;
+   rte_rx_callback_fn clb;
int ret;
 
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -232,17 +212,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
ret = -ENOTSUP;
goto end;
}
-   /* initialize data before enabling the callback */
-   queue_cfg->empty_poll_stats = 0;
-   queue_cfg->cb_mode = mode;
-   queue_cfg->umwait_in_progress = false;
-   queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-   /* ensure we update our state before callback starts */
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-   queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-   clb_umwait, NULL);
+   clb = clb_umwait;
break;
}
case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +239,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
ret = -ENOTSUP;
goto end;
}
-   /* initialize data before enablin

[dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

2021-06-01 Thread Anatoly Burakov
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of cores to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when a special designated "power saving" queue is polled. To
  put it another way, we have no idea which queue the user will poll in
  what order, so we rely on them telling us that queue X is the last one
  in the polling loop, so any power management should happen there.
- A new API is added to mark a specific Rx queue as "power saving".
  Failing to call this API will result in no power management, however
  when having only one queue per core it is obvious which queue is the
  "power saving" one, so things will still work without this new API for
  use cases that were previously working without it.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.
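
A toy model of that shared per-lcore state, with names invented for illustration (the real implementation lives in rte_power_pmd_mgmt.c), shows why only the designated power-save queue may trigger sleep while any queue's traffic resets the counter:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* illustrative stand-in for the per-lcore config described above */
struct lcore_pmgmt {
	uint16_t power_save_q; /* designated last queue in the poll loop */
	uint64_t empty_polls;  /* shared across all queues on this lcore */
	bool slept;            /* did we enter the power-saving state? */
};

/* called after each rx burst on any queue polled by this lcore */
static void on_poll(struct lcore_pmgmt *c, uint16_t qid, uint16_t nb_rx)
{
	if (nb_rx != 0) {
		c->empty_polls = 0; /* traffic anywhere resets the counter */
		return;
	}
	if (qid != c->power_save_q)
		return; /* only the power-save queue may trigger sleep */
	if (++c->empty_polls > 512) /* stand-in for EMPTYPOLL_MAX */
		c->slept = true; /* real code enters monitor/pause here */
}
```

Because the counter is only incremented at the power-save queue, sleeping happens once per full polling loop rather than once per queue.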

Signed-off-by: Anatoly Burakov 
---
 lib/power/rte_power_pmd_mgmt.c | 335 ++---
 lib/power/rte_power_pmd_mgmt.h |  34 
 lib/power/version.map  |   3 +
 3 files changed, 306 insertions(+), 66 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 0707c60a4f..60dd21a19c 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -33,7 +33,19 @@ enum pmd_mgmt_state {
PMD_MGMT_ENABLED
 };
 
-struct pmd_queue_cfg {
+struct queue {
+   uint16_t portid;
+   uint16_t qid;
+};
+struct pmd_core_cfg {
+   struct queue queues[RTE_MAX_ETHPORTS];
+   /**< Which port-queue pairs are associated with this lcore? */
+   struct queue power_save_queue;
+   /**< When polling multiple queues, all but this one will be ignored */
+   bool power_save_queue_set;
+   /**< When polling multiple queues, power save queue must be set */
+   size_t n_queues;
+   /**< How many queues are in the list? */
volatile enum pmd_mgmt_state pwr_mgmt_state;
/**< State of power management for this queue */
enum rte_power_pmd_mgmt_type cb_mode;
@@ -43,8 +55,97 @@ struct pmd_queue_cfg {
uint64_t empty_poll_stats;
/**< Number of empty polls */
 } __rte_cache_aligned;
+static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
 
-static struct pmd_queue_cfg 
port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+static inline bool
+queue_equal(const struct queue *l, const struct queue *r)
+{
+   return l->portid == r->portid && l->qid == r->qid;
+}
+
+static inline void
+queue_copy(struct queue *dst, const struct queue *src)
+{
+   dst->portid = src->portid;
+   dst->qid = src->qid;
+}
+
+static inline bool
+queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
+   const struct queue *pwrsave = &cfg->power_save_queue;
+
+   /* if there's only single queue, no need to check anything */
+   if (cfg->n_queues == 1)
+   return true;
+   return cfg->power_save_queue_set && queue_equal(q, pwrsave);
+}
+
+static int
+queue_list_find(const struct pmd_core_cfg *cfg, const struct queue *q,
+   size_t *idx) {
+   size_t i;
+   for (i = 0; i < cfg->n_queues; i++) {
+   const struct queue *cur = &cfg->queues[i];
+   if (queue_equal(cur, q)) {
+   if (idx != NULL)
+   *idx = i;
+   return 0;
+   }
+   }
+   return -1;
+}
+
+static int
+queue_set_power_save(struct pmd_core_cfg *cfg, const struct queue *q) {
+   if (queue_list_find(cfg, q, NULL) < 0)
+   return -ENOENT;
+   queue_copy(&cfg->power_save_queue, q);
+   cfg->power_save_queue_set = true;
+   return 0;
+}
+
+static int
+queue_list_add(struct pmd_core_cfg *cfg, const struct queue *q)
+{
+   size_t idx = cfg->n_queues;
+   if (idx >= RTE_DIM(cfg->queues))
+   return -ENOSPC;
+   /* is it already in the list? */
+   if (queue_list_find(cfg, q, NULL) == 0)
+   return -EEXIST;
+   queue_copy(&cfg->queues[idx], q);
+   cfg->n_queues++;

[dpdk-dev] [PATCH v1 6/7] power: support monitoring multiple Rx queues

2021-06-01 Thread Anatoly Burakov
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov 
---
 lib/power/rte_power_pmd_mgmt.c | 75 +-
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 60dd21a19c..9e0b8bdfaf 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -147,6 +147,23 @@ queue_list_remove(struct pmd_core_cfg *cfg, const struct 
queue *q)
return 0;
 }
 
+static inline int
+get_monitor_addresses(struct pmd_core_cfg *cfg,
+   struct rte_power_monitor_cond *pmc)
+{
+   size_t i;
+   int ret;
+
+   for (i = 0; i < cfg->n_queues; i++) {
+   struct rte_power_monitor_cond *cur = &pmc[i];
+   struct queue *q = &cfg->queues[i];
+   ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
+   if (ret < 0)
+   return ret;
+   }
+   return 0;
+}
+
 static void
 calc_tsc(void)
 {
@@ -175,6 +192,48 @@ calc_tsc(void)
}
 }
 
+static uint16_t
+clb_multiwait(uint16_t port_id, uint16_t qidx,
+   struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+   uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+   const unsigned int lcore = rte_lcore_id();
+   const struct queue q = {port_id, qidx};
+   const bool empty = nb_rx == 0;
+   struct pmd_core_cfg *q_conf;
+
+   q_conf = &lcore_cfg[lcore];
+
+   /* early exit */
+   if (likely(!empty)) {
+   q_conf->empty_poll_stats = 0;
+   } else {
+   /* do we care about this particular queue? */
+   if (!queue_is_power_save(q_conf, &q))
+   return nb_rx;
+
+   /*
+* we can increment unconditionally here because if there were
+* non-empty polls in other queues assigned to this core, we
+* dropped the counter to zero anyway.
+*/
+   q_conf->empty_poll_stats++;
+   if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+   struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
+   uint16_t ret;
+
+   /* gather all monitoring conditions */
+   ret = get_monitor_addresses(q_conf, pmc);
+
+   if (ret == 0)
+   rte_power_monitor_multi(pmc,
+   q_conf->n_queues, UINT64_MAX);
+   }
+   }
+
+   return nb_rx;
+}
+
 static uint16_t
 clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts 
__rte_unused,
uint16_t nb_rx, uint16_t max_pkts __rte_unused,
@@ -315,14 +374,19 @@ static int
 check_monitor(struct pmd_core_cfg *cfg, const struct queue *qdata)
 {
struct rte_power_monitor_cond dummy;
+   bool multimonitor_supported;
 
/* check if rte_power_monitor is supported */
if (!global_data.intrinsics_support.power_monitor) {
RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not 
supported\n");
return -ENOTSUP;
}
+   /* check if multi-monitor is supported */
+   multimonitor_supported =
+   global_data.intrinsics_support.power_monitor_multi;
 
-   if (cfg->n_queues > 0) {
+   /* if we're adding a new queue, do we support multiple queues? */
+   if (cfg->n_queues > 0 && !multimonitor_supported) {
RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not 
supported\n");
return -ENOTSUP;
}
@@ -338,6 +402,13 @@ check_monitor(struct pmd_core_cfg *cfg, const struct queue 
*qdata)
return 0;
 }
 
+static inline rte_rx_callback_fn
+get_monitor_callback(void)
+{
+   return global_data.intrinsics_support.power_monitor_multi ?
+   clb_multiwait : clb_umwait;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
@@ -385,7 +456,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, 
uint16_t port_id,
if (ret < 0)
goto end;
 
-   clb = clb_umwait;
+   clb = get_monitor_callback();
break;
case RTE_POWER_MGMT_TYPE_SCALE:
/* check if we can add a new queue */
-- 
2.25.1



[dpdk-dev] [PATCH v1 7/7] l3fwd-power: support multiqueue in PMD pmgmt modes

2021-06-01 Thread Anatoly Burakov
Currently, l3fwd-power enforces the limitation of having one queue per
lcore. This is no longer necessary, so remove the limitation, and always
mark the last queue in qconf as the power save queue.

Signed-off-by: Anatoly Burakov 
---
 examples/l3fwd-power/main.c | 39 +++--
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8dfed1634..3057c06936 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -2498,6 +2498,27 @@ mode_to_str(enum appmode mode)
}
 }
 
+static void
+pmd_pmgmt_set_up(unsigned int lcore, uint16_t portid, uint16_t qid, bool last)
+{
+   int ret;
+
+   ret = rte_power_ethdev_pmgmt_queue_enable(lcore, portid,
+   qid, pmgmt_type);
+   if (ret < 0)
+   rte_exit(EXIT_FAILURE,
+   "rte_power_ethdev_pmgmt_queue_enable: err=%d, 
port=%d\n",
+   ret, portid);
+
+   if (!last)
+   return;
+   ret = rte_power_ethdev_pmgmt_queue_set_power_save(lcore, portid, qid);
+   if (ret < 0)
+   rte_exit(EXIT_FAILURE,
+   "rte_power_ethdev_pmgmt_queue_set_power_save: err=%d, 
port=%d\n",
+   ret, portid);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2723,12 +2744,6 @@ main(int argc, char **argv)
printf("\nInitializing rx queues on lcore %u ... ", lcore_id );
fflush(stdout);
 
-   /* PMD power management mode can only do 1 queue per core */
-   if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) {
-   rte_exit(EXIT_FAILURE,
-   "In PMD power management mode, only one queue 
per lcore is allowed\n");
-   }
-
/* init RX queues */
for(queue = 0; queue < qconf->n_rx_queue; ++queue) {
struct rte_eth_rxconf rxq_conf;
@@ -2767,15 +2782,9 @@ main(int argc, char **argv)
 "Fail to add ptype cb\n");
}
 
-   if (app_mode == APP_MODE_PMD_MGMT) {
-   ret = rte_power_ethdev_pmgmt_queue_enable(
-   lcore_id, portid, queueid,
-   pmgmt_type);
-   if (ret < 0)
-   rte_exit(EXIT_FAILURE,
-   
"rte_power_ethdev_pmgmt_queue_enable: err=%d, port=%d\n",
-   ret, portid);
-   }
+   if (app_mode == APP_MODE_PMD_MGMT)
+   pmd_pmgmt_set_up(lcore_id, portid, queueid,
+   queue == (qconf->n_rx_queue - 1));
}
}
 
-- 
2.25.1



Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Ilya Maximets
On 6/1/21 1:14 PM, Ivan Malov wrote:
> By its very name, action PORT_ID means that packets hit an ethdev with the
> given DPDK port ID. At least the current comments don't state the opposite.
> That said, since port representors had been adopted, applications like OvS
> have been misusing the action. They misread its purpose as sending packets
> to the opposite end of the "wire" plugged to the given ethdev, for example,
> redirecting packets to the VF itself rather than to its representor ethdev.
> Another example: OvS relies on this action with the admin PF's ethdev port
> ID specified in it in order to send offloaded packets to the physical port.
> 
> Since there might be applications which use this action in its valid sense,
> one can't just change the documentation to greenlight the opposite meaning.
> This patch adds an explicit bit to the action configuration which will let
> applications, depending on their needs, leverage the two meanings properly.
> Applications like OvS, as well as PMDs, will have to be corrected when the
> patch has been applied. But the improved clarity of the action is worth it.
> 
> The proposed change is not the only option. One could avoid changes in OvS
> and PMDs if the new configuration field had the opposite meaning, with the
> action itself meaning delivery to the represented port and not to DPDK one.
> Alternatively, one could define a brand new action with the said behaviour.

We had already very similar discussions regarding the understanding of what
the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
VF and not to the representor device:
  
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
This wasn't enforced though, IIUC, for existing code and semantics is still 
mixed.

I still think that configuration should be applied to VF, and the same applies
to rte_flow API.  IMHO, average application should not care if device is
a VF itself or its representor.  Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev 
functionality
in the linux kernel and also matches with how the average user thinks about
representor devices.

If some specific use-case requires to distinguish VF from the representor,
there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.


[dpdk-dev] [PATCH] net/mlx5: add TCP and IPv6 to supported flow items list in Windows

2021-06-01 Thread Tal Shnaiderman
WINOF2 2.70 Windows kernel driver allows DevX rule creation
of types TCP and IPv6.

Added the types to the supported items in mlx5_flow_os_item_supported
to allow them to be created in the PMD.

Signed-off-by: Tal Shnaiderman 
---
 drivers/net/mlx5/windows/mlx5_flow_os.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/windows/mlx5_flow_os.h 
b/drivers/net/mlx5/windows/mlx5_flow_os.h
index 26c3e59789..df92f25ce6 100644
--- a/drivers/net/mlx5/windows/mlx5_flow_os.h
+++ b/drivers/net/mlx5/windows/mlx5_flow_os.h
@@ -42,6 +42,8 @@ mlx5_flow_os_item_supported(int item)
case RTE_FLOW_ITEM_TYPE_ETH:
case RTE_FLOW_ITEM_TYPE_IPV4:
case RTE_FLOW_ITEM_TYPE_UDP:
+   case RTE_FLOW_ITEM_TYPE_TCP:
+   case RTE_FLOW_ITEM_TYPE_IPV6:
return true;
default:
return false;
-- 
2.16.1.windows.4



Re: [dpdk-dev] [EXT] Re: [dpdk-users] DPDK issue with Marvell NIC QLogic Corp. FastLinQ QL41000

2021-06-01 Thread Igor Russkikh
Adding my team members for that.

Strangely, even if for some reason we hit an OOM condition, the driver should
still reject the load gracefully.

Is there any possibility FW image could be corrupted/of bad size?

Regards, 
  Igor

On 5/26/2021 11:57 AM, Varghese, Vipin wrote:
> External Email
> 
> --
> Based on the logs shared from `Turing Team`, the observation is as follows
> 
> 1. 1GB huge pages are used in the application
> 
> ```logs
> 12:46:11.792 §EAL: No free hugepages reported in hugepages-2048kB
> 12:46:11.792 §EAL: No available hugepages reported in hugepages-2048kB
> ```
> 
> 2. The issue comes right at `device probe to initialize`
> 
> ```logs
> 12:46:46.382 §EAL: Probe PCI driver: net_qede (1077:8070) device: 
> :65:00.0 (socket 0)
> 12:46:46.403 §[QEDE PMD: ()]ecore_load_mcp_offsets:The address of the MCP 
> scratch-pad is not configured
> 12:46:46.403 §[QEDE PMD: ()]ecore_mcp_cmd_init:MCP is not initialized
> 12:46:46.404 §[QEDE PMD: ()]ecore_mcp_cmd_and_union:MFW is not initialized!
> 12:46:46.404 §[QEDE PMD: ()]ecore_hw_get_nvm_info:Shared memory not 
> initialized
> 12:46:46.404 §[QEDE PMD: ()]ecore_hw_prepare_single:Failed to get HW 
> information
> 12:46:46.404 §[qed_probe:74(65:00.0:dpdk-port-0)]hw prepare failed
> 12:46:46.404 §[qede_common_dev_init:2566(65:00.0:dpdk-port-> 0)]qede probe 
> failed rc -3
> ```
> 
> Hence, in my opinion, this is related to the NIC in use and how it is getting
> initialized, so please try to debug the driver.
> 
>> -Original Message-
>> From: dev  On Behalf Of Truring Team
>> Sent: Wednesday, May 26, 2021 10:12 AM
>> To: Nishant Verma 
>> Cc: dev@dpdk.org; USERS 
>> Subject: Re: [dpdk-dev] [dpdk-users] DPDK issue with Marvell NIC QLogic
>> Corp. FastLinQ QL41000
>>
>> Yes , I have configured it via grub.
>>
>> Grub:
>> "default_hugepagesz=1G hugepagesz=1G hugepages=2"
>> grub2-mkconfig -o /boot/grub2/grub.cfg
>>
>> Manual:
>> mkdir /dev/hugepages1G
>> mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G echo 2 >
>> /sys/devices/system/node/node1/hugepages/hugepages-
>> 1048576kB/nr_hugepages
>>
>> Regards
>> Puneet
>>
>> On Wed, 26 May 2021 at 10:05, Nishant Verma  wrote:
>>
>>> meminfo states: Hugepagesize:1048576 kB
>>> But  in your initial mail you state Hugepage size is 1G.
>>>
>>> Have you configured HugePages via grub?
>>>
>>>
>>> Thanks.
>>>
>>> Regards,
>>> NV
>>>
>>> On Wed, May 26, 2021 at 12:21 AM Truring Team 
>> wrote:
>>>
 Hi Nishant,

 cat /proc/meminfo
 11:59:29.200 MemTotal:   23101672 kB
 11:59:29.200 MemFree:15254764 kB
 11:59:29.200 MemAvailable:   19243184 kB
 11:59:29.200 Buffers:6792 kB
 11:59:29.201 Cached:  4084844 kB
 11:59:29.201 SwapCached:0 kB
 11:59:29.201 Active:  3609016 kB
 11:59:29.201 Inactive:1175160 kB
 11:59:29.201 Active(anon): 695644 kB
 11:59:29.202 Inactive(anon):19716 kB
 11:59:29.202 Active(file):2913372 kB
 11:59:29.202 Inactive(file):  1155444 kB
 11:59:29.202 Unevictable:   0 kB
 11:59:29.202 Mlocked:   0 kB
 11:59:29.203 SwapTotal:  11665404 kB
 11:59:29.204 SwapFree:   11665404 kB
 11:59:29.204 Dirty:32 kB
 11:59:29.204 Writeback: 0 kB
 11:59:29.204 AnonPages:694764 kB
 11:59:29.205 Mapped:   140220 kB
 11:59:29.205 Shmem: 22800 kB
 11:59:29.205 Slab: 445016 kB
 11:59:29.205 SReclaimable: 289968 kB
 11:59:29.205 SUnreclaim:   155048 kB
 11:59:29.206 KernelStack:   16624 kB
 11:59:29.206 PageTables:35892 kB
 11:59:29.206 NFS_Unstable:  0 kB
 11:59:29.206 Bounce:0 kB
 11:59:29.206 WritebackTmp:  0 kB
 11:59:29.207 CommitLimit:22167664 kB
 11:59:29.207 Committed_AS:4342668 kB
 11:59:29.207 VmallocTotal:   34359738367 kB
 11:59:29.207 VmallocUsed:  463188 kB
 11:59:29.207 VmallocChunk:   34358835196 kB
 11:59:29.208 HardwareCorrupted: 0 kB
 11:59:29.208 AnonHugePages:329728 kB
 11:59:29.208 CmaTotal:  0 kB
 11:59:29.209 CmaFree:   0 kB
 11:59:29.209 HugePages_Total:1024
 11:59:29.209 HugePages_Free: 1019
 11:59:29.209 HugePages_Rsvd:0
 11:59:29.209 HugePages_Surp:0
 11:59:29.209 Hugepagesize:   2048 kB
 11:59:29.209 DirectMap4k:  381760 kB
 11:59:29.209 DirectMap2M:13973504 kB
 11:59:29.215 DirectMap1G:11534336 kB
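
 As a side note, fields like Hugepagesize can be checked programmatically
 instead of reading the dump by eye. A minimal C sketch (the helper name is
 made up, not part of DPDK) that pulls a value out of /proc/meminfo-style
 text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract a numeric field (e.g. "Hugepagesize")
 * from /proc/meminfo-style text, so a script can assert on the page
 * size the kernel actually selected. Returns -1 if not found. */
static long meminfo_value(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line != NULL) {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* advance past the newline */
	}
	return -1;	/* field not found */
}
```

 Reading the file itself and feeding its contents to such a helper is left
 out for brevity.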

 Regards


 On Wed, 26 May 2021 at 09:43, Nishant Verma 
>> wrote:

> Can you paste output of /proc/meminfo ?
>
>
>
> Regards,
> NV
>
> On Tue, May 25, 2021 at 9:52 PM Truring Team 
> wrote:
>
>> Hi Team,
>>
>> I am trying

Re: [dpdk-dev] [PATCH v3] guides: add a guide for developing unit tests

2021-06-01 Thread Aaron Conole
Ferruh Yigit  writes:

> On 3/9/2021 3:57 PM, Aaron Conole wrote:
>> The DPDK testing infrastructure includes a comprehensive set of
>> libraries, utilities, and CI integrations for developers to test
>> their code changes.  This isn't well documented, however.
>> 
>> Document the basics for adding a test suite to the infrastructure
>> and enabling that test suite for continuous integration platforms
>> so that newer developers can understand how to develop test suites
>> and test cases.
>
> +1 to adding this long missing documentation, thanks.
>
>> 
>> Reviewed-by: David Marchand 
>> Signed-off-by: Aaron Conole 
>> ---
>> v0->v1: Added information for TEST_SKIPPED and details about generating
>> code coverage to help with ideas for writing unit test cases.
>> v1->v2: Corrected some spelling, rephrased a bit after suggestions by
>> Ray.
>> v2->v3: Rewrite the meson build block, updated the copyright section,
>> and change the title to be a bit nicer
>> 
>>  doc/guides/contributing/index.rst   |   1 +
>>  doc/guides/contributing/testing.rst | 243 
>>  2 files changed, 244 insertions(+)
>>  create mode 100644 doc/guides/contributing/testing.rst
>> 
>> diff --git a/doc/guides/contributing/index.rst 
>> b/doc/guides/contributing/index.rst
>> index 2fefd91931..41909d949b 100644
>> --- a/doc/guides/contributing/index.rst
>> +++ b/doc/guides/contributing/index.rst
>> @@ -14,6 +14,7 @@ Contributor's Guidelines
>>  abi_versioning
>>  documentation
>>  patches
>> +testing
>>  vulnerability
>>  stable
>>  cheatsheet
>> diff --git a/doc/guides/contributing/testing.rst 
>> b/doc/guides/contributing/testing.rst
>> new file mode 100644
>> index 00..0757d71ad0
>> --- /dev/null
>> +++ b/doc/guides/contributing/testing.rst
>> @@ -0,0 +1,243 @@
>> +..  SPDX-License-Identifier: BSD-3-Clause
>> +Copyright 2021 The DPDK contributors
>> +
>> +DPDK Testing Guidelines
>> +===
>> +
>> +This document outlines the guidelines for running and adding new
>> +tests to the in-tree DPDK test suites.
>> +
>
> I think both the section name (DPDK Testing Guidelines) and the file name
> (testing.rst) are too broad compared to what is described here.

Makes sense to me.

> What about using "DPDK Unit Test Guidelines", and 'unit_test.rst'?

Sure - I will make this change.

>> +The DPDK test suite model is loosely based on the xunit model, where
>> +tests are grouped into test suites, and suites are run by runners.
>> +For a basic overview, see the basic Wikipedia article on xunit:
>> +`xUnit - Wikipedia `_.
>> +
>> +
>> +Running a test
>> +--
>> +
>> +DPDK tests are run via the main test runner, the `dpdk-test` app.
>> +The `dpdk-test` app is a command-line interface that facilitates
>> +running various tests or test suites.
>> +
>> +There are two modes of operation.  The first mode is as an interactive
>> +command shell that allows launching specific test suites.  This is
>> +the default operating mode of `dpdk-test` and can be done by::
>> +
>> +  $ ./build/app/test/dpdk-test --dpdk-options-here
>> +  EAL: Detected 4 lcore(s)
>> +  EAL: Detected 1 NUMA nodes
>> + EAL: Static memory layout is selected, amount of reserved memory
>> can be adjusted with -m or --socket-mem
>> +  EAL: Multi-process socket /run/user/26934/dpdk/rte/mp_socket
>> +  EAL: Selected IOVA mode 'VA'
>> +  EAL: Probing VFIO support...
>> +  EAL: PCI device :00:1f.6 on NUMA socket -1
>> +  EAL:   Invalid NUMA socket, default to 0
>> +  EAL:   probe driver: 8086:15d7 net_e1000_em
>> +  APP: HPET is not enabled, using TSC as default timer
>> +  RTE>>
>> +
>> +At the prompt, simply type the name of the test suite you wish to run
>> +and it will execute.
>> +
>> +The second form is useful for a scripting environment, and is used by
>> +the DPDK meson build system.  This mode is invoked by assigning a
>> +specific test suite name to the environment variable `DPDK_TEST`
>> +before invoking the `dpdk-test` command, such as::
>> +
>> +  $ DPDK_TEST=version_autotest ./build/app/test/dpdk-test 
>> --dpdk-options-here
>> +  EAL: Detected 4 lcore(s)
>> +  EAL: Detected 1 NUMA nodes
>> + EAL: Static memory layout is selected, amount of reserved memory
>> can be adjusted with -m or --socket-mem
>> +  EAL: Multi-process socket /run/user/26934/dpdk/rte/mp_socket
>> +  EAL: Selected IOVA mode 'VA'
>> +  EAL: Probing VFIO support...
>> +  EAL: PCI device :00:1f.6 on NUMA socket -1
>> +  EAL:   Invalid NUMA socket, default to 0
>> +  EAL:   probe driver: 8086:15d7 net_e1000_em
>> +  APP: HPET is not enabled, using TSC as default timer
>> +  RTE>>version_autotest
>> +  Version string: 'DPDK 20.02.0-rc0'
>> +  Test OK
>> +  RTE>>$
>> +
>
> According to the code, it is also possible to run a unit test by providing
> its name as an argument to 'dpdk-test', like below. And the benefit of this
> is that it is possible
> to provide multiple tests at

Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Eli Britstein



On 6/1/2021 3:10 PM, Ilya Maximets wrote:


On 6/1/21 1:14 PM, Ivan Malov wrote:

By its very name, action PORT_ID means that packets hit an ethdev with the
given DPDK port ID. At least the current comments don't state the opposite.
That said, since port representors had been adopted, applications like OvS
have been misusing the action. They misread its purpose as sending packets
to the opposite end of the "wire" plugged to the given ethdev, for example,
redirecting packets to the VF itself rather than to its representor ethdev.
Another example: OvS relies on this action with the admin PF's ethdev port
ID specified in it in order to send offloaded packets to the physical port.

Since there might be applications which use this action in its valid sense,
one can't just change the documentation to greenlight the opposite meaning.
This patch adds an explicit bit to the action configuration which will let
applications, depending on their needs, leverage the two meanings properly.
Applications like OvS, as well as PMDs, will have to be corrected when the
patch has been applied. But the improved clarity of the action is worth it.

The proposed change is not the only option. One could avoid changes in OvS
and PMDs if the new configuration field had the opposite meaning, with the
action itself meaning delivery to the represented port and not to DPDK one.
Alternatively, one could define a brand new action with the said behaviour.


It doesn't make any sense to attach the VF itself to OVS, but only its 
representor.


For the PF, when in switchdev mode, it is the "uplink representor", so 
it is also a representor.


That said, OVS does not care about the type of the port. It doesn't matter 
whether it's an "upstream" port or not, or whether it's a representor or not.




We had already very similar discussions regarding the understanding of what
the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
VF and not to the representor device:
   
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
This wasn't enforced though, IIUC, for existing code and semantics is still 
mixed.

I am not sure how this is related.


I still think that configuration should be applied to VF, and the same applies
to rte_flow API.  IMHO, average application should not care if device is
a VF itself or its representor.  Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev 
functionality
in the linux kernel and also matches with how the average user thinks about
representor devices.
Right. This is the way representors work. It is fully aligned with 
configuration of OVS-kernel.


If some specific use-case requires to distinguish VF from the representor,
there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.


Re: [dpdk-dev] [RFC PATCH] ethdev: add support for testpmd-compliant flow rule dumping

2021-06-01 Thread Ivan Malov

Hi Ori,

Your review efforts are much appreciated. I understand your concern 
about the partial item/action coverage, but there are some points to be 
considered when addressing it:
- It's anyway hardly possible to use the printed flow directly in 
testpmd if it contains "opaque", or "PMD-specific", items/actions in 
terms of the tunnel offload model. These items/actions have to be 
omitted when printing the flow, and their absence in the resulting 
string means that copy/pasting the flow to testpmd isn't helpful in this 
particular case.
- There's action ENCAP which also can't be fully represented by the tool 
in question, simply because it has no parameters. In testpmd, one first 
has to issue the "set vxlan" command to configure the encap. header, whilst 
"vxlan" token in the flow rule string just refers to the previously set 
encap. parameters. The suggested flow print helper can't reliably print 
these two components ("set vxlan" and the flow rule itself) as they 
belong to different testpmd command strings.
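
For illustration, the two-step testpmd workflow mentioned above looks 
roughly like this (command shapes as in the testpmd user guide; the 
addresses and values are placeholders):

```
testpmd> set vxlan ip-version ipv4 vni 4 udp-src 4789 udp-dst 4789 ip-src 10.0.0.1 ip-dst 10.0.0.2 eth-src 11:11:11:11:11:11 eth-dst 22:22:22:22:22:22
testpmd> flow create 0 ingress pattern eth / ipv4 / udp / end actions vxlan_encap / queue index 0 / end
```

The "vxlan_encap" action in the second command silently picks up whatever 
the first command configured, which is exactly why a single printed flow 
rule string cannot capture both.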


As you might see, completeness of the solution wouldn't necessarily be 
reachable, even if full item/action coverage was provided.


As for the item/action coverage itself, it's rather controversial. On 
the one hand, yes, we should probably try to cover more items and 
actions in the suggested patch, to the extent allowed by our current 
priorities. But on the other hand, the existing coverage might not be 
that poor: it's fairly elaborate and at least allows to print the most 
common flow rules.


Yes, macros and some other cunning ways to cover more flow specifics 
might come in handy, but, at the same time, can be rather error prone. 
Sometimes it's more robust to just write the code out in full.


Thank you.

On 30/05/2021 10:27, Ori Kam wrote:

Hi Ivan,

First, nice idea, and thanks for picking up the ball.

Before a detailed review,
The main thing I'm concerned about is that this print will be partially 
supported.
I know that you covered this issue by printing "unknown" for unsupported 
items/actions,
but it means that a single unsupported item/action is enough to make the 
flow unusable in testpmd.
To get full support, a developer needs to add such a print for each new 
item/action. I agree it is possible, but it has a high overhead for each 
feature.

Maybe we should somehow create macros for the prints, or find other ways 
that are easier to support.

For example, just printing the IPv4 item takes 7 function calls, each one 
with error checking, and I'm not counting the dedicated functions.
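
One possible shape for such a macro (a hedged sketch with made-up names, 
just to illustrate the idea, not a worked-out proposal):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: append one field via snprintf and bail out of the calling
 * function on error or truncation, so each field print is one line. */
#define DUMP_FIELD(buf, off, size, ...)					\
	do {								\
		int rc_ = snprintf((buf) + (off), (size) - (off),	\
				   __VA_ARGS__);			\
		if (rc_ < 0 || (size_t)rc_ >= (size) - (off))		\
			return -1;					\
		(off) += (size_t)rc_;					\
	} while (0)

/* With the macro, an item dump shrinks to one line per field. */
static int dump_ipv4_item(char *buf, size_t size,
			  unsigned int src, unsigned int dst)
{
	size_t off = 0;

	DUMP_FIELD(buf, off, size, "ipv4 src is %u ", src);
	DUMP_FIELD(buf, off, size, "dst is %u ", dst);
	return (int)off;	/* characters written */
}
```

Whether early-return on truncation or a "keep counting" convention is 
preferable ties into the return-value discussion elsewhere in this thread.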



Best,
Ori



-Original Message-
From: Ivan Malov 
Sent: Thursday, May 27, 2021 11:25 AM
To: dev@dpdk.org
Cc: NBU-Contact-Thomas Monjalon ; Ferruh Yigit
; Andrew Rybchenko
; Ori Kam ; Ray
Kinsella ; Neil Horman 
Subject: [RFC PATCH] ethdev: add support for testpmd-compliant flow rule
dumping

DPDK applications (for example, OvS) or tests which use RTE flow API need to
log created or rejected flow rules to help to recognise what goes right or
wrong. From this standpoint, testpmd-compliant format is nice for the
purpose because it allows to copy-paste the flow rules and debug using
testpmd.

Recognisable pattern items:
VOID, VF, PF, PHY_PORT, PORT_ID, ETH, VLAN, IPV4, IPV6, UDP, TCP, VXLAN,
NVGRE, GENEVE, MARK, PPPOES, PPPOED.

Recognisable actions:
VOID, JUMP, MARK, FLAG, QUEUE, DROP, COUNT, RSS, PF, VF, PHY_PORT,
PORT_ID, OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID,
OF_SET_VLAN_PCP, VXLAN_ENCAP, VXLAN_DECAP.

Recognisable RSS types (action RSS):
IPV4, FRAG_IPV4, NONFRAG_IPV4_TCP, NONFRAG_IPV4_UDP,
NONFRAG_IPV4_OTHER, IPV6, FRAG_IPV6, NONFRAG_IPV6_TCP,
NONFRAG_IPV6_UDP, NONFRAG_IPV6_OTHER, IPV6_EX, IPV6_TCP_EX,
IPV6_UDP_EX, L3_SRC_ONLY, L3_DST_ONLY, L4_SRC_ONLY, L4_DST_ONLY.

Unrecognised parts of the flow specification are represented by tokens
"{unknown}" and "{unknown bits}". Interested parties are welcome to
extend this tool to recognise more items and actions.

Signed-off-by: Ivan Malov 
---
  lib/ethdev/meson.build|1 +
  lib/ethdev/rte_flow.h |   33 +
  lib/ethdev/rte_flow_snprint.c | 1681
+
  lib/ethdev/version.map|3 +
  4 files changed, 1718 insertions(+)
  create mode 100644 lib/ethdev/rte_flow_snprint.c

diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build index
0205c853df..97bba4fa1b 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -8,6 +8,7 @@ sources = files(
  'rte_class_eth.c',
  'rte_ethdev.c',
  'rte_flow.c',
+   'rte_flow_snprint.c',
  'rte_mtr.c',
  'rte_tm.c',
  )
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
961a5884fe..cd5e9ef631 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4288,6 +4288,39 @@ rte_flow_tunnel_item_release(uint16_t port_id,
 struct rte_flow_item *items,
 uint32_t num_of_items,
 st

Re: [dpdk-dev] [RFC PATCH] ethdev: add support for testpmd-compliant flow rule dumping

2021-06-01 Thread Ivan Malov

Hi Stephen,

I agree that the API rte_flow_snprint() itself would look better if it 
provided the number of characters in its return value, like snprintf 
does. However, with respect to all internal helpers, this wouldn't be 
that clear and simple: one would have to update the buffer pointer and 
decrease the buffer size before each internal (smaller) helper 
invocation. That would make the code more cumbersome in many places.
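
For illustration, the snprintf-style convention would look roughly like 
this (hypothetical helper names, not the proposed API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the snprintf return-value convention: every helper returns
 * the number of characters the complete output needs, so a composite
 * dump is just a sum of tail calls. The offset is clamped so that a
 * too-small buffer is never overrun while the required total size is
 * still computed. (Negative snprintf error returns are elided here.) */
static size_t clamp(size_t off, size_t size)
{
	return off < size ? off : size;
}

static int attr_snprint(char *buf, size_t size, unsigned int group)
{
	return snprintf(buf, size, "group %u ", group);
}

static int pattern_snprint(char *buf, size_t size, const char *pattern)
{
	return snprintf(buf, size, "pattern %s ", pattern);
}

static int rule_snprint(char *buf, size_t size, unsigned int group,
			const char *pattern)
{
	size_t off = 0;

	off += attr_snprint(buf + clamp(off, size),
			    size - clamp(off, size), group);
	off += pattern_snprint(buf + clamp(off, size),
			       size - clamp(off, size), pattern);
	return (int)off;	/* chars needed, excluding the nul byte */
}
```

The clamping is the "cumbersome" part referred to above; the question is 
whether hiding it in one small helper is an acceptable price.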


In v2, I will at least try to make the main API return the number of 
characters. Other than that, it can be discussed further.


Thank you.

On 31/05/2021 05:28, Stephen Hemminger wrote:

On Sun, 30 May 2021 07:27:32 +
Ori Kam  wrote:



DPDK applications (for example, OvS) or tests which use RTE flow API need to
log created or rejected flow rules to help to recognise what goes right or
wrong. From this standpoint, testpmd-compliant format is nice for the
purpose because it allows to copy-paste the flow rules and debug using
testpmd.

Recognisable pattern items:
VOID, VF, PF, PHY_PORT, PORT_ID, ETH, VLAN, IPV4, IPV6, UDP, TCP, VXLAN,
NVGRE, GENEVE, MARK, PPPOES, PPPOED.

Recognisable actions:
VOID, JUMP, MARK, FLAG, QUEUE, DROP, COUNT, RSS, PF, VF, PHY_PORT,
PORT_ID, OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID,
OF_SET_VLAN_PCP, VXLAN_ENCAP, VXLAN_DECAP.

Recognisable RSS types (action RSS):
IPV4, FRAG_IPV4, NONFRAG_IPV4_TCP, NONFRAG_IPV4_UDP,
NONFRAG_IPV4_OTHER, IPV6, FRAG_IPV6, NONFRAG_IPV6_TCP,
NONFRAG_IPV6_UDP, NONFRAG_IPV6_OTHER, IPV6_EX, IPV6_TCP_EX,
IPV6_UDP_EX, L3_SRC_ONLY, L3_DST_ONLY, L4_SRC_ONLY, L4_DST_ONLY.

Unrecognised parts of the flow specification are represented by tokens
"{unknown}" and "{unknown bits}". Interested parties are welcome to
extend this tool to recognise more items and actions.

Signed-off-by: Ivan Malov 
---
  lib/ethdev/meson.build|1 +
  lib/ethdev/rte_flow.h |   33 +
  lib/ethdev/rte_flow_snprint.c | 1681
+
  lib/ethdev/version.map|3 +
  4 files changed, 1718 insertions(+)
  create mode 100644 lib/ethdev/rte_flow_snprint.c

diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build index
0205c853df..97bba4fa1b 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -8,6 +8,7 @@ sources = files(
  'rte_class_eth.c',
  'rte_ethdev.c',
  'rte_flow.c',
+   'rte_flow_snprint.c',
  'rte_mtr.c',
  'rte_tm.c',
  )
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
961a5884fe..cd5e9ef631 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4288,6 +4288,39 @@ rte_flow_tunnel_item_release(uint16_t port_id,
 struct rte_flow_item *items,
 uint32_t num_of_items,
 struct rte_flow_error *error);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Dump testpmd-compliant textual representation of the flow rule.
+ * Invoke this with zero-size buffer to learn the string size and
+ * invoke this for the second time to actually dump the flow rule.
+ * The buffer size on the second invocation = the string size + 1.
+ *
+ * @param[out] buf
+ *   Buffer to save the dump in, or NULL
+ * @param buf_size
+ *   Buffer size, or 0
+ * @param[out] nb_chars_total
+ *   Resulting string size (excluding the terminating null byte)
+ * @param[in] attr
+ *   Flow rule attributes.
+ * @param[in] pattern
+ *   Pattern specification (list terminated by the END pattern item).
+ * @param[in] actions
+ *   Associated actions (list terminated by the END action).
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise
+ */
+__rte_experimental
+int
+rte_flow_snprint(char *buf, size_t buf_size, size_t *nb_chars_total,
+const struct rte_flow_attr *attr,
+const struct rte_flow_item pattern[],
+const struct rte_flow_action actions[]);
+


The code would be clearer and simpler if you adopted the same return value
as snprintf. Then lots of places could be just tail calls and the nb_chars_total
would be unnecessary.



--
Ivan M


Re: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check

2021-06-01 Thread Burakov, Anatoly

On 28-May-21 10:09 AM, Ananyev, Konstantin wrote:




On 25-May-21 10:15 AM, Liu, Yong wrote:




-Original Message-
From: dev  On Behalf Of Anatoly Burakov
Sent: Tuesday, May 11, 2021 11:32 PM
To: dev@dpdk.org; McDaniel, Timothy ;

Xing,

Beilei ; Wu, Jingjing ; Yang,
Qiming ; Zhang, Qi Z ;
Wang, Haiyue ; Matan Azrad
; Shahaf Shuler ; Viacheslav
Ovsiienko ; Richardson, Bruce
; Ananyev, Konstantin

Cc: Loftus, Ciara 
Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check

Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value.

We can reverse the check, and instead have monitor sleep to be aborted
if the expected value *doesn't* match what's in memory. This allows us
to both implement all currently implemented driver code, as well as
support more use cases which don't easily map to previous semantics
(such as waiting on writes to AF_XDP counter value).



Hi Anatoly,
In the virtio spec, the packed descriptor format utilizes two bits for
representing the status: one bit for the available status, one bit for the
used status.

For checking the status more precisely, it is necessary to check the value
against the expected value.

The monitor function in the virtio datapath can still work with the new
semantics, but it may lead to some useless I/O calls.

Based on that, I'd like to keep the previous semantics.
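
For reference, the packed-ring availability test mentioned above looks 
roughly like this (a sketch along the lines of the virtio 1.1 spec's 
"Packed Virtqueues" section, not the actual PMD code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A packed-ring descriptor is available when its AVAIL bit matches the
 * driver's wrap counter and its USED bit does not. Two bits must thus
 * be compared against expected values, not just matched/mismatched as
 * a whole word. */
#define VIRTQ_DESC_F_AVAIL	(1u << 7)
#define VIRTQ_DESC_F_USED	(1u << 15)

static bool desc_is_avail(uint16_t flags, bool wrap_counter)
{
	bool avail = (flags & VIRTQ_DESC_F_AVAIL) != 0;
	bool used = (flags & VIRTQ_DESC_F_USED) != 0;

	return avail != used && avail == wrap_counter;
}
```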

Regards,
Marvin



Thanks for your feedback! Would making this an option make things
better? Because we need the inverted semantics for AF_XDP, it can't work
without it. So, we either invert all of them, or we have an option to do
regular or inverted check on a per-condition basis. Would that work?



It would be great if we could select the check type based on an input
parameter. Just in the virtio datapath, we need both inverted and original 
semantics for different ring formats.



Should we perhaps consider introducing a _check_ callback to be provided by 
the PMD?
That way we can leave these various check details inside the PMD itself,
and the monitor will just read the specified address and call the callback.
Konstantin



Getting monitor condition *is* "the check" IMO. I think adding an option 
to the comparison should cover pretty much all worthwhile use cases 
without overcomplicating things. In any case, patches already sent [1] :)


[1] http://patches.dpdk.org/project/dpdk/list/?series=17191
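
For what it's worth, a per-condition option could look as simple as this 
(a sketch with hypothetical names, not the actual rte_power_monitor API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Abort the sleep when the masked current value matches the expected
 * value, or, with 'invert' set, when it does not match. This covers
 * both the original semantics and the AF_XDP use case in one check. */
static bool monitor_wakeup(uint64_t cur, uint64_t expected,
			   uint64_t mask, bool invert)
{
	bool match = (cur & mask) == (expected & mask);

	return invert ? !match : match;
}
```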

--
Thanks,
Anatoly


Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Ivan Malov

Hi Ilya,

Thank you for reviewing the proposal at such short notice. I'm afraid 
that prior discussions overlook the simple fact that the whole problem 
is not limited to just VF representors. Action PORT_ID is also used with 
respect to the admin PF's ethdev, which "represents itself" (and by no 
means it represents the underlying physical/network port). In this case, 
one cannot state that the application treats it as a physical port, just 
like one states that the application perceives representors as VFs 
themselves.


Given these facts, it would not be quite right to just align the 
documentation with the de-facto action meaning assumed by OvS.
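
To make the proposal concrete, here is a sketch of what such an explicit 
bit could look like (a hypothetical struct, not the actual rte_flow 
definition):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: encode the two meanings of PORT_ID as an explicit bit in the
 * action configuration, as discussed in this thread. */
struct flow_action_port_id_sketch {
	uint32_t to_represented:1; /* 1: deliver to the entity represented
				    * by ethdev 'id' (e.g. the VF behind a
				    * representor); 0: deliver to the
				    * ethdev itself. */
	uint32_t reserved:31;
	uint32_t id;		   /* DPDK ethdev port ID */
};
```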


On 01/06/2021 15:10, Ilya Maximets wrote:

On 6/1/21 1:14 PM, Ivan Malov wrote:

By its very name, action PORT_ID means that packets hit an ethdev with the
given DPDK port ID. At least the current comments don't state the opposite.
That said, since port representors had been adopted, applications like OvS
have been misusing the action. They misread its purpose as sending packets
to the opposite end of the "wire" plugged to the given ethdev, for example,
redirecting packets to the VF itself rather than to its representor ethdev.
Another example: OvS relies on this action with the admin PF's ethdev port
ID specified in it in order to send offloaded packets to the physical port.

Since there might be applications which use this action in its valid sense,
one can't just change the documentation to greenlight the opposite meaning.
This patch adds an explicit bit to the action configuration which will let
applications, depending on their needs, leverage the two meanings properly.
Applications like OvS, as well as PMDs, will have to be corrected when the
patch has been applied. But the improved clarity of the action is worth it.

The proposed change is not the only option. One could avoid changes in OvS
and PMDs if the new configuration field had the opposite meaning, with the
action itself meaning delivery to the represented port and not to DPDK one.
Alternatively, one could define a brand new action with the said behaviour.


We had already very similar discussions regarding the understanding of what
the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by default to
VF and not to the representor device:
   
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
This wasn't enforced though, IIUC, for existing code and semantics is still 
mixed.

I still think that configuration should be applied to VF, and the same applies
to rte_flow API.  IMHO, average application should not care if device is
a VF itself or its representor.  Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev 
functionality
in the linux kernel and also matches with how the average user thinks about
representor devices.

If some specific use-case requires to distinguish VF from the representor,
there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.



--
Ivan M


Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Andrew Rybchenko
On 6/1/21 4:24 PM, Eli Britstein wrote:
> 
> On 6/1/2021 3:10 PM, Ilya Maximets wrote:
>>
>> On 6/1/21 1:14 PM, Ivan Malov wrote:
>>> By its very name, action PORT_ID means that packets hit an ethdev
>>> with the
>>> given DPDK port ID. At least the current comments don't state the
>>> opposite.
>>> That said, since port representors had been adopted, applications
>>> like OvS
>>> have been misusing the action. They misread its purpose as sending
>>> packets
>>> to the opposite end of the "wire" plugged to the given ethdev, for
>>> example,
>>> redirecting packets to the VF itself rather than to its representor
>>> ethdev.
>>> Another example: OvS relies on this action with the admin PF's ethdev
>>> port
>>> ID specified in it in order to send offloaded packets to the physical
>>> port.
>>>
>>> Since there might be applications which use this action in its valid
>>> sense,
>>> one can't just change the documentation to greenlight the opposite
>>> meaning.
>>> This patch adds an explicit bit to the action configuration which
>>> will let
>>> applications, depending on their needs, leverage the two meanings
>>> properly.
>>> Applications like OvS, as well as PMDs, will have to be corrected
>>> when the
>>> patch has been applied. But the improved clarity of the action is
>>> worth it.
>>>
>>> The proposed change is not the only option. One could avoid changes
>>> in OvS
>>> and PMDs if the new configuration field had the opposite meaning,
>>> with the
>>> action itself meaning delivery to the represented port and not to
>>> DPDK one.
>>> Alternatively, one could define a brand new action with the said
>>> behaviour.
> 
> It doesn't make any sense to attach the VF itself to OVS, but only its
> representor.

OvS is not the only DPDK application.

> For the PF, when in switchdev mode, it is the "uplink representor", so
> it is also a representor.

Strictly speaking it is not a representor from DPDK point of
view. E.g. representors have corresponding flag set which is
definitely clear in the case of PF.

> That said, OVS does not care of the type of the port. It doesn't matter
> if it's an "upstream" or not, or if it's a representor or not.

Yes, it is clear, but let's put OvS aside. Let's consider a
DPDK application which has a number of ethdev ports. Some may
belong to a single switch domain, some may be from different
switch domains (i.e. different NICs). Can I use the PORT_ID
action to redirect ingress traffic to a specified ethdev port?
It looks like no, but IMHO that is the definition of the
PORT_ID action.

>> We had already very similar discussions regarding the understanding of
>> what
>> the representor really is from the DPDK API's point of view, and the last
>> time, IIUC, it was concluded by a tech. board that representor should be
>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by
>> default to
>> VF and not to the representor device:
>>   
>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376
>>
>> This wasn't enforced though, IIUC, for existing code and semantics is
>> still mixed.
> I am not sure how this is related.
>>
>> I still think that configuration should be applied to VF, and the same
>> applies
>> to rte_flow API.  IMHO, average application should not care if device is
>> a VF itself or its representor.  Everything should work exactly the same.
>> I think this matches with the original idea/design of the switchdev
>> functionality
>> in the linux kernel and also matches with how the average user thinks
>> about
>> representor devices.
> Right. This is the way representors work. It is fully aligned with
> configuration of OVS-kernel.
>>
>> If some specific use-case requires to distinguish VF from the
>> representor,
>> there should probably be a separate special API/flag for that.
>>
>> Best regards, Ilya Maximets.



Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Eli Britstein



On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:


On 6/1/21 4:24 PM, Eli Britstein wrote:

On 6/1/2021 3:10 PM, Ilya Maximets wrote:


On 6/1/21 1:14 PM, Ivan Malov wrote:

By its very name, action PORT_ID means that packets hit an ethdev
with the
given DPDK port ID. At least the current comments don't state the
opposite.
That said, since port representors had been adopted, applications
like OvS
have been misusing the action. They misread its purpose as sending
packets
to the opposite end of the "wire" plugged to the given ethdev, for
example,
redirecting packets to the VF itself rather than to its representor
ethdev.
Another example: OvS relies on this action with the admin PF's ethdev
port
ID specified in it in order to send offloaded packets to the physical
port.

Since there might be applications which use this action in its valid
sense,
one can't just change the documentation to greenlight the opposite
meaning.
This patch adds an explicit bit to the action configuration which
will let
applications, depending on their needs, leverage the two meanings
properly.
Applications like OvS, as well as PMDs, will have to be corrected
when the
patch has been applied. But the improved clarity of the action is
worth it.

The proposed change is not the only option. One could avoid changes
in OvS
and PMDs if the new configuration field had the opposite meaning,
with the
action itself meaning delivery to the represented port and not to
DPDK one.
Alternatively, one could define a brand new action with the said
behaviour.

It doesn't make any sense to attach the VF itself to OVS, but only its
representor.

OvS is not the only DPDK application.

True. It is just that the focus of this commit message is OVS.



For the PF, when in switchdev mode, it is the "uplink representor", so
it is also a representor.

Strictly speaking it is not a representor from DPDK point of
view. E.g. representors have corresponding flag set which is
definitely clear in the case of PF.

This is the per-PMD responsibility. The API should not care.



That said, OVS does not care of the type of the port. It doesn't matter
if it's an "upstream" or not, or if it's a representor or not.

Yes, it is clear, but let's put OvS aside. Let's consider a
DPDK application which has a number of ethdev ports. Some may
belong to a single switch domain, some may be from different
switch domains (i.e. different NICs). Can I use the PORT_ID
action to redirect ingress traffic to a specified ethdev port?
It looks like no, but IMHO that is the definition of the
PORT_ID action.


Let's separate the API from the implementation. From the API point of view, 
yes, the user may request it. Nothing wrong with it.


From an implementation point of view, yes, it might fail, but not
necessarily, even across different NICs. Maybe the HW of a certain vendor has
the capability to do it?


We can't know, so I think the API should allow it.




We had already very similar discussions regarding the understanding of
what
the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by
default to
VF and not to the representor device:

https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376

This wasn't enforced though, IIUC, for existing code and semantics is
still mixed.

I am not sure how this is related.

I still think that configuration should be applied to VF, and the same
applies
to rte_flow API.  IMHO, average application should not care if device is
a VF itself or its representor.  Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev
functionality
in the linux kernel and also matches with how the average user thinks
about
representor devices.

Right. This is the way representors work. It is fully aligned with
configuration of OVS-kernel.

If some specific use-case requires to distinguish VF from the
representor,
there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.


Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Ivan Malov

Hi Eli,

On 01/06/2021 16:24, Eli Britstein wrote:


On 6/1/2021 3:10 PM, Ilya Maximets wrote:



On 6/1/21 1:14 PM, Ivan Malov wrote:
By its very name, action PORT_ID means that packets hit an ethdev 
with the
given DPDK port ID. At least the current comments don't state the 
opposite.
That said, since port representors had been adopted, applications 
like OvS
have been misusing the action. They misread its purpose as sending 
packets
to the opposite end of the "wire" plugged to the given ethdev, for 
example,
redirecting packets to the VF itself rather than to its representor 
ethdev.
Another example: OvS relies on this action with the admin PF's ethdev 
port
ID specified in it in order to send offloaded packets to the physical 
port.


Since there might be applications which use this action in its valid 
sense,
one can't just change the documentation to greenlight the opposite 
meaning.
This patch adds an explicit bit to the action configuration which 
will let
applications, depending on their needs, leverage the two meanings 
properly.
Applications like OvS, as well as PMDs, will have to be corrected 
when the
patch has been applied. But the improved clarity of the action is 
worth it.


The proposed change is not the only option. One could avoid changes 
in OvS
and PMDs if the new configuration field had the opposite meaning, 
with the
action itself meaning delivery to the represented port and not to 
DPDK one.
Alternatively, one could define a brand new action with the said 
behaviour.


It doesn't make any sense to attach the VF itself to OVS, but only its 
representor.


Sure. But that doesn't invalidate the idea of the patch.



For the PF, when in switchdev mode, it is the "uplink representor", so 
it is also a representor.




No. According to the existing "port representors" documentation, the 
admin PF port "represents itself", that is, the PF, and by no means 
represents the underlying upstream port. And this makes a really big 
difference. One can indeed state that plugging VFs rather than their 
representors into DPDK/OvS is useless, but the same statement is not 
applicable to the admin's PF.


That said, OVS does not care about the type of the port. It doesn't matter 
if it's an "upstream" port or not, or if it's a representor or not.




From the high-level standpoint, indeed, the port type is a don't-care 
to OvS, but the truth is that the DPDK offload path in OvS, being a 
lower-level component, must respect the original meaning of all 
underlying DPDK primitives. Reconciling the top-level expectations (OvS) 
with the lower-level means (the DPDK flow library) *is* effectively the 
proper job of application integration. And if for some reason the 
existing DPDK component misreads the real semantics of a lower-level 
action, that cannot be justified by high-level OvS principles.




We had already very similar discussions regarding the understanding of 
what

the representor really is from the DPDK API's point of view, and the last
time, IIUC, it was concluded by a tech. board that representor should be
a "ghost of a VF", i.e. DPDK APIs should apply configuration by 
default to

VF and not to the representor device:
   
https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376 

This wasn't enforced though, IIUC, for existing code and semantics is 
still mixed.

I am not sure how this is related.


I still think that configuration should be applied to VF, and the same 
applies

to rte_flow API.  IMHO, average application should not care if device is
a VF itself or its representor.  Everything should work exactly the same.
I think this matches with the original idea/design of the switchdev 
functionality
in the linux kernel and also matches with how the average user thinks 
about

representor devices.
Right. This is the way representors work. It is fully aligned with 
configuration of OVS-kernel.


If some specific use-case requires to distinguish VF from the 
representor,

there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.


--
Ivan M


Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Ivan Malov




On 01/06/2021 17:44, Eli Britstein wrote:


On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:



On 6/1/21 4:24 PM, Eli Britstein wrote:

On 6/1/2021 3:10 PM, Ilya Maximets wrote:



On 6/1/21 1:14 PM, Ivan Malov wrote:

By its very name, action PORT_ID means that packets hit an ethdev
with the
given DPDK port ID. At least the current comments don't state the
opposite.
That said, since port representors had been adopted, applications
like OvS
have been misusing the action. They misread its purpose as sending
packets
to the opposite end of the "wire" plugged to the given ethdev, for
example,
redirecting packets to the VF itself rather than to its representor
ethdev.
Another example: OvS relies on this action with the admin PF's ethdev
port
ID specified in it in order to send offloaded packets to the physical
port.

Since there might be applications which use this action in its valid
sense,
one can't just change the documentation to greenlight the opposite
meaning.
This patch adds an explicit bit to the action configuration which
will let
applications, depending on their needs, leverage the two meanings
properly.
Applications like OvS, as well as PMDs, will have to be corrected
when the
patch has been applied. But the improved clarity of the action is
worth it.

The proposed change is not the only option. One could avoid changes
in OvS
and PMDs if the new configuration field had the opposite meaning,
with the
action itself meaning delivery to the represented port and not to
DPDK one.
Alternatively, one could define a brand new action with the said
behaviour.

It doesn't make any sense to attach the VF itself to OVS, but only its
representor.

OvS is not the only DPDK application.

True. It is just the focus of this commit message is OVS.


Not the focus, but rather the most illustrative example.




For the PF, when in switchdev mode, it is the "uplink representor", so
it is also a representor.

Strictly speaking, it is not a representor from the DPDK point of
view. E.g. representors have the corresponding flag set, which is
definitely clear in the case of the PF.

This is the per-PMD responsibility. The API should not care.



That said, OVS does not care about the type of the port. It doesn't matter
if it's an "upstream" port or not, or if it's a representor or not.

Yes, it is clear, but let's put OvS aside. Let's consider a
DPDK application which has a number of ethdev ports. Some may
belong to a single switch domain, some may be from different
switch domains (i.e. different NICs). Can I use PORT_ID action
to redirect ingress traffic to a specified ethdev port using
PORT_ID action? It looks like no, but IMHO it is the definition
of the PORT_ID action.


Let's separate the API from the implementation. From an API point of view, yes, the 
user may request it. Nothing wrong with that.


 From an implementation point of view, yes, it might fail, but not 
necessarily, even across different NICs. Maybe the HW of a certain vendor has 
the capability to do it?


We can't know, so I think the API should allow it.




We had already very similar discussions regarding the understanding of
what
the representor really is from the DPDK API's point of view, and the 
last
time, IIUC, it was concluded by a tech. board that representor 
should be

a "ghost of a VF", i.e. DPDK APIs should apply configuration by
default to
VF and not to the representor device:

https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376 



This wasn't enforced though, IIUC, for existing code and semantics is
still mixed.

I am not sure how this is related.

I still think that configuration should be applied to VF, and the same
applies
to rte_flow API.  IMHO, average application should not care if 
device is
a VF itself or its representor.  Everything should work exactly the 
same.

I think this matches with the original idea/design of the switchdev
functionality
in the linux kernel and also matches with how the average user thinks
about
representor devices.

Right. This is the way representors work. It is fully aligned with
configuration of OVS-kernel.

If some specific use-case requires to distinguish VF from the
representor,
there should probably be a separate special API/flag for that.

Best regards, Ilya Maximets.


--
Ivan M


Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics

2021-06-01 Thread Andrew Rybchenko
On 6/1/21 5:44 PM, Eli Britstein wrote:
> 
> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote:
>>
>>
>> On 6/1/21 4:24 PM, Eli Britstein wrote:
>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote:


 On 6/1/21 1:14 PM, Ivan Malov wrote:
> By its very name, action PORT_ID means that packets hit an ethdev
> with the
> given DPDK port ID. At least the current comments don't state the
> opposite.
> That said, since port representors had been adopted, applications
> like OvS
> have been misusing the action. They misread its purpose as sending
> packets
> to the opposite end of the "wire" plugged to the given ethdev, for
> example,
> redirecting packets to the VF itself rather than to its representor
> ethdev.
> Another example: OvS relies on this action with the admin PF's ethdev
> port
> ID specified in it in order to send offloaded packets to the physical
> port.
>
> Since there might be applications which use this action in its valid
> sense,
> one can't just change the documentation to greenlight the opposite
> meaning.
> This patch adds an explicit bit to the action configuration which
> will let
> applications, depending on their needs, leverage the two meanings
> properly.
> Applications like OvS, as well as PMDs, will have to be corrected
> when the
> patch has been applied. But the improved clarity of the action is
> worth it.
>
> The proposed change is not the only option. One could avoid changes
> in OvS
> and PMDs if the new configuration field had the opposite meaning,
> with the
> action itself meaning delivery to the represented port and not to
> DPDK one.
> Alternatively, one could define a brand new action with the said
> behaviour.
>>> It doesn't make any sense to attach the VF itself to OVS, but only its
>>> representor.
>> OvS is not the only DPDK application.
> True. It is just the focus of this commit message is OVS.
>>
>>> For the PF, when in switchdev mode, it is the "uplink representor", so
>>> it is also a representor.
>> Strictly speaking, it is not a representor from the DPDK point of
>> view. E.g. representors have the corresponding flag set, which is
>> definitely clear in the case of the PF.
> This is the per-PMD responsibility. The API should not care.
>>
>>> That said, OVS does not care about the type of the port. It doesn't matter
>>> if it's an "upstream" port or not, or if it's a representor or not.
>> Yes, it is clear, but let's put OvS aside. Let's consider a
>> DPDK application which has a number of ethdev ports. Some may
>> belong to a single switch domain, some may be from different
>> switch domains (i.e. different NICs). Can I use PORT_ID action
>> to redirect ingress traffic to a specified ethdev port using
>> PORT_ID action? It looks like no, but IMHO it is the definition
>> of the PORT_ID action.
> 
> Let's separate the API from the implementation. From an API point of view, yes, the
> user may request it. Nothing wrong with that.
> 
> From an implementation point of view, yes, it might fail, but not
> necessarily, even across different NICs. Maybe the HW of a certain vendor has
> the capability to do it?
> 
> We can't know, so I think the API should allow it.

Hold on. What should it allow? It is two opposite meanings:
 1. Direct traffic to DPDK ethdev port specified using ID to be
received and processed by the DPDK application.
 2. Direct traffic to an upstream port represented by the
DPDK port.

The patch tries to address the ambiguity, the misuse in OvS
(a misuse, from my point of view, when judged against the action
documentation) and the mis-implementation in a number of PMDs
(done to work with OvS), and tries to sort it all out with an
explanation of why the proposed direction is chosen. I realize
that it could be painful, but IMHO it is the best option here.
Yes, it is a point to discuss.

To start with, we should agree that the problem exists.
Second, we should agree on a direction for solving it.

>>
 We had already very similar discussions regarding the understanding of
 what
 the representor really is from the DPDK API's point of view, and the
 last
 time, IIUC, it was concluded by a tech. board that representor
 should be
 a "ghost of a VF", i.e. DPDK APIs should apply configuration by
 default to
 VF and not to the representor device:

 https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-tho...@monjalon.net/#104376


 This wasn't enforced though, IIUC, for existing code and semantics is
 still mixed.
>>> I am not sure how this is related.
 I still think that configuration should be applied to VF, and the same
 applies
 to rte_flow API.  IMHO, average application should not care if
 device is
 a VF itself or its representor.  Everything should work exactly the
 sa

Re: [dpdk-dev] [21.08 PATCH v1 2/2] net/af_xdp: add power monitor support

2021-06-01 Thread Liang Ma
AF_XDP will eventually support umwait. Looking forward to reviewing the
updated version.


Re: [dpdk-dev] [RFC PATCH] ethdev: add support for testpmd-compliant flow rule dumping

2021-06-01 Thread Stephen Hemminger
On Tue, 1 Jun 2021 17:17:24 +0300
Ivan Malov  wrote:

> Hi Stephen,
> 
> I agree that the API rte_flow_snprint() itself would look better if it 
> provided the number of characters in its return value, like snprintf 
> does. However, with respect to all internal helpers, this wouldn't be 
> that clear and simple: one would have to update the buffer pointer and 
> decrease the buffer size before each internal (smaller) helper 
> invocation. That would make the code more cumbersome in many places.
> 
> In v2, I will at least try to make the main API return the number of 
> characters. Other than that, it can be discussed further.
> 
> Thank you.
> 
> On 31/05/2021 05:28, Stephen Hemminger wrote:
> > On Sun, 30 May 2021 07:27:32 +
> > Ori Kam  wrote:
> >   
> >>>
> >>> DPDK applications (for example, OvS) or tests which use RTE flow API need 
> >>> to
> >>> log created or rejected flow rules to help to recognise what goes right or
> >>> wrong. From this standpoint, testpmd-compliant format is nice for the
> >>> purpose because it allows to copy-paste the flow rules and debug using
> >>> testpmd.
> >>>
> >>> Recognisable pattern items:
> >>> VOID, VF, PF, PHY_PORT, PORT_ID, ETH, VLAN, IPV4, IPV6, UDP, TCP, VXLAN,
> >>> NVGRE, GENEVE, MARK, PPPOES, PPPOED.
> >>>
> >>> Recognisable actions:
> >>> VOID, JUMP, MARK, FLAG, QUEUE, DROP, COUNT, RSS, PF, VF, PHY_PORT,
> >>> PORT_ID, OF_POP_VLAN, OF_PUSH_VLAN, OF_SET_VLAN_VID,
> >>> OF_SET_VLAN_PCP, VXLAN_ENCAP, VXLAN_DECAP.
> >>>
> >>> Recognisable RSS types (action RSS):
> >>> IPV4, FRAG_IPV4, NONFRAG_IPV4_TCP, NONFRAG_IPV4_UDP,
> >>> NONFRAG_IPV4_OTHER, IPV6, FRAG_IPV6, NONFRAG_IPV6_TCP,
> >>> NONFRAG_IPV6_UDP, NONFRAG_IPV6_OTHER, IPV6_EX, IPV6_TCP_EX,
> >>> IPV6_UDP_EX, L3_SRC_ONLY, L3_DST_ONLY, L4_SRC_ONLY, L4_DST_ONLY.
> >>>
> >>> Unrecognised parts of the flow specification are represented by tokens
> >>> "{unknown}" and "{unknown bits}". Interested parties are welcome to
> >>> extend this tool to recognise more items and actions.
> >>>
> >>> Signed-off-by: Ivan Malov 
> >>> ---
> >>>   lib/ethdev/meson.build|1 +
> >>>   lib/ethdev/rte_flow.h |   33 +
> >>>   lib/ethdev/rte_flow_snprint.c | 1681
> >>> +
> >>>   lib/ethdev/version.map|3 +
> >>>   4 files changed, 1718 insertions(+)
> >>>   create mode 100644 lib/ethdev/rte_flow_snprint.c
> >>>
> >>> diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build index
> >>> 0205c853df..97bba4fa1b 100644
> >>> --- a/lib/ethdev/meson.build
> >>> +++ b/lib/ethdev/meson.build
> >>> @@ -8,6 +8,7 @@ sources = files(
> >>>   'rte_class_eth.c',
> >>>   'rte_ethdev.c',
> >>>   'rte_flow.c',
> >>> + 'rte_flow_snprint.c',
> >>>   'rte_mtr.c',
> >>>   'rte_tm.c',
> >>>   )
> >>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> >>> 961a5884fe..cd5e9ef631 100644
> >>> --- a/lib/ethdev/rte_flow.h
> >>> +++ b/lib/ethdev/rte_flow.h
> >>> @@ -4288,6 +4288,39 @@ rte_flow_tunnel_item_release(uint16_t port_id,
> >>>struct rte_flow_item *items,
> >>>uint32_t num_of_items,
> >>>struct rte_flow_error *error);
> >>> +
> >>> +/**
> >>> + * @warning
> >>> + * @b EXPERIMENTAL: this API may change without prior notice
> >>> + *
> >>> + * Dump testpmd-compliant textual representation of the flow rule.
> >>> + * Invoke this with zero-size buffer to learn the string size and
> >>> + * invoke this for the second time to actually dump the flow rule.
> >>> + * The buffer size on the second invocation = the string size + 1.
> >>> + *
> >>> + * @param[out] buf
> >>> + *   Buffer to save the dump in, or NULL
> >>> + * @param buf_size
> >>> + *   Buffer size, or 0
> >>> + * @param[out] nb_chars_total
> >>> + *   Resulting string size (excluding the terminating null byte)
> >>> + * @param[in] attr
> >>> + *   Flow rule attributes.
> >>> + * @param[in] pattern
> >>> + *   Pattern specification (list terminated by the END pattern item).
> >>> + * @param[in] actions
> >>> + *   Associated actions (list terminated by the END action).
> >>> + *
> >>> + * @return
> >>> + *   0 on success, a negative errno value otherwise
> >>> + */
> >>> +__rte_experimental
> >>> +int
> >>> +rte_flow_snprint(char *buf, size_t buf_size, size_t *nb_chars_total,
> >>> +  const struct rte_flow_attr *attr,
> >>> +  const struct rte_flow_item pattern[],
> >>> +  const struct rte_flow_action actions[]);
> >>> +  
> > 
> > The code would be clearer and simpler if you adopted the same return value
> > as snprintf. Then lots of places could be just tail calls and the 
> > nb_chars_total
> > would be unnecessary.
> >   
> 

One other thing. Code for this kind of thing grows like a weed.
It would be good to change from if/else/switch to a more table-driven
approach.


[dpdk-dev] [PATCH v7 00/10] eal: Add EAL API for threading

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model, so DPDK on
Windows currently relies on a header file that hides the Windows calls
behind pthread-matching interfaces. Given that EAL should isolate the environment
specifics from the applications and libraries and mediate
all the communication with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread_types.h (librte_eal/include)
* rte_thread_windows_types.h (librte_eal/windows/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

For flexibility, the user is offered the option of either using the 
RTE_THREAD_* API or
a 3rd party thread library, through a meson flag “use_external_thread_lib”.
By default, this flag is set to FALSE, which means Windows libraries and 
applications
will use the RTE_THREAD_* API for managing threads.

If compiling on Windows and the “use_external_thread_lib” is *not* set,
the following files will be parsed: 
* include/rte_thread.h
* windows/include/rte_thread_windows_types.h
* windows/rte_thread.c
In all other cases, the compilation/parsing includes the following files:
* include/rte_thread.h 
* include/rte_thread_types.h
* common/rte_thread.c

**A schematic example of the design**
--
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
return CreateThread();
}

lib/librte_eal/windows/meson.build
if get_option('use_external_thread_lib')
sources += 'librte_eal/common/rte_thread.c'
else
sources += 'librte_eal/windows/rte_thread.c'
endif
-

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
enum rte_thread_priority priority;
rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.

*Priority* is represented through an enum that currently advertises
two values for priority:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority  - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
   with a new value for priority

The user can choose thread priority through an EAL parameter,
when starting an application.  If EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise administrator has an option to set one of available options:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 

*Affinity* is described by the already known “rte_cpuset_t” type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
   rte_thread_attr_t object
rte_thread_set/get_affinity  – sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 

**Future work**
Note that this patchset was focused on introducing new API that will
remove the Windows pthread.h shim. In DPDK, there are still a few references
to pthread_* that were not implemented in the shim.
The long term plan is for EAL to provide full threading support:
* Adding support for conditional variables
* Additional functionality offered by pthread_* (such as pthread_setname_np, 
etc.)
* Static mutex initializers are not used on Windows. If we must continue
  using them, they need to be platform dependent and an implementation will
  need to be provided for Windows.

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representati

[dpdk-dev] [PATCH v7 01/10] eal: add thread id and simple thread functions

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Use a portable, type-safe representation for the thread identifier.
Add functions for comparing thread ids and obtaining the thread id
for the current thread.
---
 lib/eal/common/rte_thread.c   | 105 ++
 lib/eal/include/rte_thread.h  |  53 +++--
 lib/eal/include/rte_thread_types.h|  10 ++
 .../include/rte_windows_thread_types.h|  10 ++
 lib/eal/windows/rte_thread.c  |  17 +++
 5 files changed, 186 insertions(+), 9 deletions(-)
 create mode 100644 lib/eal/common/rte_thread.c
 create mode 100644 lib/eal/include/rte_thread_types.h
 create mode 100644 lib/eal/windows/include/rte_windows_thread_types.h

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
new file mode 100644
index 00..1292f7a8f8
--- /dev/null
+++ b/lib/eal/common/rte_thread.c
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+struct eal_tls_key {
+   pthread_key_t thread_index;
+};
+
+rte_thread_t
+rte_thread_self(void)
+{
+   rte_thread_t thread_id = { 0 };
+
+   thread_id.opaque_id = pthread_self();
+
+   return thread_id;
+}
+
+int
+rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
+{
+   return pthread_equal(t1.opaque_id, t2.opaque_id);
+}
+
+int
+rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
+{
+   int err;
+   rte_thread_key k;
+
+   k = malloc(sizeof(*k));
+   if (k == NULL) {
+   RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_create(&(k->thread_index), destructor);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_create failed: %s\n",
+strerror(err));
+   free(k);
+   return err;
+   }
+   *key = k;
+   return 0;
+}
+
+int
+rte_thread_key_delete(rte_thread_key key)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_key_delete(key->thread_index);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_key_delete failed: %s\n",
+strerror(err));
+   free(key);
+   return err;
+   }
+   free(key);
+   return 0;
+}
+
+int
+rte_thread_value_set(rte_thread_key key, const void *value)
+{
+   int err;
+
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   return EINVAL;
+   }
+   err = pthread_setspecific(key->thread_index, value);
+   if (err != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_setspecific failed: %s\n",
+   strerror(err));
+   return err;
+   }
+   return 0;
+}
+
+void *
+rte_thread_value_get(rte_thread_key key)
+{
+   if (key == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   return pthread_getspecific(key->thread_index);
+}
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 8be8ed8f36..347df1a6ae 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2021 Mellanox Technologies, Ltd
+ * Copyright(c) 2021 Microsoft Corporation
  */
+#include 
 
 #include 
 #include 
@@ -20,11 +22,50 @@
 extern "C" {
 #endif
 
+#include 
+#if defined(RTE_USE_WINDOWS_THREAD_TYPES)
+#include 
+#else
+#include 
+#endif
+
+/**
+ * Thread id descriptor.
+ */
+typedef struct rte_thread_tag {
+   uintptr_t opaque_id; /**< thread identifier */
+} rte_thread_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
 typedef struct eal_tls_key *rte_thread_key;
 
+/**
+ * Get the id of the calling thread.
+ *
+ * @return
+ *   Return the thread id of the calling thread.
+ */
+__rte_experimental
+rte_thread_t rte_thread_self(void);
+
+/**
+ * Check if 2 thread ids are equal.
+ *
+ * @param t1
+ *   First thread id.
+ *
+ * @param t2
+ *   Second thread id.
+ *
+ * @return
+ *   If the ids are equal, return nonzero.
+ *   Otherwise, return 0.
+ */
+__rte_experimental
+int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
+
 #ifdef RTE_HAS_CPUSET
 
 /**
@@ -63,9 +104,7 @@ void rte_thread_get_affinity(rte_cpuset_t *cpusetp);
  *
  * @return
  *   On success, zero.
- *   On failure, a negative number and an error number is set in rte_errno.
- *   rte_errno can be: ENOMEM  - Memory allocation error.
- * ENOEXEC - Specific OS error.
+ *   On failure, return a positive errno-style error number.
  */
 
 __rte_experimental
@@ -80,9 +119,7 @@ int rte_thread_key_create(rte_

[dpdk-dev] [PATCH v7 03/10] eal/windows: translate Windows errors to errno-style errors

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add a function to translate Windows error codes to
errno-style error codes. The possible return values are chosen
to provide as much semantic compatibility across platforms as
possible.
---
 lib/eal/include/rte_thread.h |  5 +-
 lib/eal/windows/rte_thread.c | 90 +++-
 2 files changed, 71 insertions(+), 24 deletions(-)

diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index eff00023d7..f3eeb28753 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -236,9 +236,8 @@ int rte_thread_value_set(rte_thread_key key, const void 
*value);
  *
  * @return
  *   On success, value data pointer (can also be NULL).
- *   On failure, NULL and an error number is set in rte_errno.
- *   rte_errno can be: EINVAL  - Invalid parameter passed.
- * ENOEXEC - Specific OS error.
+ *   On failure, NULL and a positive error number is set in rte_errno.
+ *
  */
 __rte_experimental
 void *rte_thread_value_get(rte_thread_key key);
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index cc319d3628..6ea1dc2a05 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -13,6 +13,54 @@ struct eal_tls_key {
DWORD thread_index;
 };
 
+/* Translates the most common error codes related to threads */
+static int
+thread_translate_win32_error(DWORD error)
+{
+   switch (error) {
+   case ERROR_SUCCESS:
+   return 0;
+
+   case ERROR_INVALID_PARAMETER:
+   return EINVAL;
+
+   case ERROR_INVALID_HANDLE:
+   return EFAULT;
+
+   case ERROR_NOT_ENOUGH_MEMORY:
+   /* FALLTHROUGH */
+   case ERROR_NO_SYSTEM_RESOURCES:
+   return ENOMEM;
+
+   case ERROR_PRIVILEGE_NOT_HELD:
+   /* FALLTHROUGH */
+   case ERROR_ACCESS_DENIED:
+   return EACCES;
+
+   case ERROR_ALREADY_EXISTS:
+   return EEXIST;
+
+   case ERROR_POSSIBLE_DEADLOCK:
+   return EDEADLK;
+
+   case ERROR_INVALID_FUNCTION:
+   /* FALLTHROUGH */
+   case ERROR_CALL_NOT_IMPLEMENTED:
+   return ENOSYS;
+   }
+
+   return EINVAL;
+}
+
+static int
+thread_log_last_error(const char* message)
+{
+   DWORD error = GetLastError();
+   RTE_LOG(DEBUG, EAL, "GetLastError()=%lu: %s\n", error, message);
+
+   return thread_translate_win32_error(error);
+}
+
 rte_thread_t
 rte_thread_self(void)
 {
@@ -85,18 +133,18 @@ int
 rte_thread_key_create(rte_thread_key *key,
__rte_unused void (*destructor)(void *))
 {
+   int ret;
+
*key = malloc(sizeof(**key));
if ((*key) == NULL) {
RTE_LOG(DEBUG, EAL, "Cannot allocate TLS key.\n");
-   rte_errno = ENOMEM;
-   return -1;
+   return ENOMEM;
}
(*key)->thread_index = TlsAlloc();
if ((*key)->thread_index == TLS_OUT_OF_INDEXES) {
-   RTE_LOG_WIN32_ERR("TlsAlloc()");
+   ret = thread_log_last_error("TlsAlloc()");
free(*key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
return 0;
 }
@@ -104,16 +152,16 @@ rte_thread_key_create(rte_thread_key *key,
 int
 rte_thread_key_delete(rte_thread_key key)
 {
-   if (!key) {
+   int ret;
+
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
if (!TlsFree(key->thread_index)) {
-   RTE_LOG_WIN32_ERR("TlsFree()");
+   ret = thread_log_last_error("TlsFree()");
free(key);
-   rte_errno = ENOEXEC;
-   return -1;
+   return ret;
}
free(key);
return 0;
@@ -122,19 +170,17 @@ rte_thread_key_delete(rte_thread_key key)
 int
 rte_thread_value_set(rte_thread_key key, const void *value)
 {
+   int ret;
char *p;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
-   rte_errno = EINVAL;
-   return -1;
+   return EINVAL;
}
/* discard const qualifier */
p = (char *) (uintptr_t) value;
if (!TlsSetValue(key->thread_index, p)) {
-   RTE_LOG_WIN32_ERR("TlsSetValue()");
-   rte_errno = ENOEXEC;
-   return -1;
+   return thread_log_last_error("TlsSetValue()");
}
return 0;
 }
@@ -143,16 +189,18 @@ void *
 rte_thread_value_get(rte_thread_key key)
 {
void *output;
+   DWORD ret = 0;
 
-   if (!key) {
+   if (key == NULL) {
RTE_LOG(DEBUG, EAL, "Invalid TLS key.\n");
rte_errno = EINVAL;
return NULL;
}
output = TlsGetValue(key->thread_index);
-   if (GetLastError() !

[dpdk-dev] [PATCH v7 02/10] eal: add thread attributes

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement thread attributes for:
* thread affinity
* thread priority
Implement functions for managing thread attributes.

Priority is represented through an enum that allows for two levels:
- RTE_THREAD_PRIORITY_NORMAL
- RTE_THREAD_PRIORITY_REALTIME_CRITICAL

Affinity is described by the existing "rte_cpuset_t" type.

An rte_thread_attr_t object can be set to the default values
by calling *rte_thread_attr_init()*.
---
 lib/eal/common/rte_thread.c   | 51 +++
 lib/eal/include/rte_thread.h  | 89 +++
 lib/eal/include/rte_thread_types.h|  3 +
 .../include/rte_windows_thread_types.h|  3 +
 lib/eal/windows/rte_thread.c  | 53 +++
 5 files changed, 199 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 1292f7a8f8..4b1e8f995e 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,56 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_attr_init(rte_thread_attr_t *attr)
+{
+   RTE_ASSERT(attr != NULL);
+
+   CPU_ZERO(&attr->cpuset);
+   attr->priority = RTE_THREAD_PRIORITY_NORMAL;
+
+   return 0;
+}
+
+int
+rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if (thread_attr == NULL || cpuset == NULL) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+   thread_attr->cpuset = *cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_get_affinity(rte_thread_attr_t *thread_attr,
+rte_cpuset_t *cpuset)
+{
+   if ((thread_attr == NULL) || (cpuset == NULL)) {
+   RTE_LOG(DEBUG, EAL, "Invalid thread attributes parameter\n");
+   return EINVAL;
+   }
+
+   *cpuset = thread_attr->cpuset;
+   return 0;
+}
+
+int
+rte_thread_attr_set_priority(rte_thread_attr_t *thread_attr,
+enum rte_thread_priority priority)
+{
+   if (thread_attr == NULL) {
+   RTE_LOG(DEBUG, EAL,
+   "Unable to set priority attribute, invalid 
parameter\n");
+   return EINVAL;
+   }
+
+   thread_attr->priority = priority;
+   return 0;
+}
+
 int
 rte_thread_key_create(rte_thread_key *key, void (*destructor)(void *))
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 347df1a6ae..eff00023d7 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -36,6 +36,26 @@ typedef struct rte_thread_tag {
uintptr_t opaque_id; /**< thread identifier */
 } rte_thread_t;
 
+/**
+ * Thread priority values.
+ */
+enum rte_thread_priority {
+   RTE_THREAD_PRIORITY_UNDEFINED = 0,
+   /**< priority hasn't been defined */
+   RTE_THREAD_PRIORITY_NORMAL= 1,
+   /**< normal thread priority, the default */
+   RTE_THREAD_PRIORITY_REALTIME_CRITICAL = 2,
+   /**< highest thread priority allowed */
+};
+
+/**
+ * Representation for thread attributes.
+ */
+typedef struct {
+   enum rte_thread_priority priority; /**< thread priority */
+   rte_cpuset_t cpuset; /**< thread affinity */
+} rte_thread_attr_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -66,6 +86,75 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Initialize the attributes of a thread.
+ * These attributes can be passed to the rte_thread_create() function
+ * that will create a new thread and set its attributes according to attr.
+ *
+ * @param attr
+ *   Thread attributes to initialize.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_init(rte_thread_attr_t *attr);
+
+/**
+ * Set the CPU affinity value in the thread attributes pointed to
+ * by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes in which affinity will be updated.
+ *
+ * @param cpuset
+ *   Points to the value of the affinity to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_attr_set_affinity(rte_thread_attr_t *thread_attr,
+   rte_cpuset_t *cpuset);
+
+/**
+ * Get the value of CPU affinity that is set in the thread attributes pointed
+ * to by 'thread_attr'.
+ *
+ * @param thread_attr
+ *   Points to the thread attributes from which affinity will be retrieved.
+ *
+ * @param cpuset
+ *   Pointer to the memory that will store the affinity.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style er

[dpdk-dev] [PATCH v7 04/10] eal: implement functions for thread affinity management

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Implement functions for getting/setting thread affinity.
Threads can be pinned to specific cores by setting their
affinity attribute.
---
 lib/eal/common/rte_thread.c   |  14 +++
 lib/eal/include/rte_thread.h  |  36 
 lib/eal/windows/eal_lcore.c   | 169 +-
 lib/eal/windows/eal_windows.h |  10 ++
 lib/eal/windows/rte_thread.c  | 127 -
 5 files changed, 310 insertions(+), 46 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 4b1e8f995e..ceb27feaa7 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -34,6 +34,20 @@ rte_thread_equal(rte_thread_t t1, rte_thread_t t2)
return pthread_equal(t1.opaque_id, t2.opaque_id);
 }
 
+int
+rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset)
+{
+   return pthread_setaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
+}
+
+int
+rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset)
+{
+   return pthread_getaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index f3eeb28753..1f02962146 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -86,6 +86,42 @@ rte_thread_t rte_thread_self(void);
 __rte_experimental
 int rte_thread_equal(rte_thread_t t1, rte_thread_t t2);
 
+/**
+ * Set the affinity of thread 'thread_id' to the cpu set
+ * specified by 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to set the affinity.
+ *
+ * @param cpuset
+ *   Pointer to CPU affinity to set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_affinity_by_id(rte_thread_t thread_id,
+   const rte_cpuset_t *cpuset);
+
+/**
+ * Get the affinity of thread 'thread_id' and store it
+ * in 'cpuset'.
+ *
+ * @param thread_id
+ *Id of the thread for which to get the affinity.
+ *
+ * @param cpuset
+ *   Pointer for storing the affinity value.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
+   rte_cpuset_t *cpuset);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/windows/eal_lcore.c b/lib/eal/windows/eal_lcore.c
index 476c2d2bdf..519a62b96d 100644
--- a/lib/eal/windows/eal_lcore.c
+++ b/lib/eal/windows/eal_lcore.c
@@ -2,7 +2,6 @@
  * Copyright(c) 2019 Intel Corporation
  */
 
-#include 
 #include 
 #include 
 
@@ -27,13 +26,15 @@ struct socket_map {
 };
 
 struct cpu_map {
-   unsigned int socket_count;
unsigned int lcore_count;
+   unsigned int socket_count;
+   unsigned int cpu_count;
struct lcore_map lcores[RTE_MAX_LCORE];
struct socket_map sockets[RTE_MAX_NUMA_NODES];
+   GROUP_AFFINITY cpus[CPU_SETSIZE];
 };
 
-static struct cpu_map cpu_map = { 0 };
+static struct cpu_map cpu_map;
 
 /* eal_create_cpu_map() is called before logging is initialized */
 static void
@@ -47,13 +48,111 @@ log_early(const char *format, ...)
va_end(va);
 }
 
+static int
+eal_query_group_affinity(void)
+{
+   SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *infos = NULL;
+   DWORD infos_size = 0;
+   int ret = 0;
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, NULL,
+ &infos_size)) {
+   DWORD error = GetLastError();
+   if (error != ERROR_INSUFFICIENT_BUFFER) {
+   log_early("Cannot get group information size, "
+ "error %lu\n", error);
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+   }
+
+   infos = malloc(infos_size);
+   if (infos == NULL) {
+   log_early("Cannot allocate memory for NUMA node information\n");
+   rte_errno = ENOMEM;
+   ret = -1;
+   goto cleanup;
+   }
+
+   if (!GetLogicalProcessorInformationEx(RelationGroup, infos,
+ &infos_size)) {
+   log_early("Cannot get group information, error %lu\n",
+ GetLastError());
+   rte_errno = EINVAL;
+   ret = -1;
+   goto cleanup;
+   }
+
+   cpu_map.cpu_count = 0;
+   USHORT group_count = infos->Group.ActiveGroupCount;
+   for (USHORT group_number = 0; group_number < group_count; 
group_number++) {
+   KAFFINITY affinity = 
infos->Group.GroupInfo[group_number].ActiveProcessorMask;
+
+ 

[dpdk-dev] [PATCH v7 05/10] eal: implement thread priority management functions

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function for setting the priority for a thread.
Priorities on multiple platforms are similarly determined by
a priority value and a priority class/policy.

On Linux, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* policy SCHED_OTHER
* priority value:   (sched_get_priority_min(SCHED_OTHER) +
 sched_get_priority_max(SCHED_OTHER))/2;
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* policy SCHED_RR
* priority value: sched_get_priority_max(SCHED_RR);

On Windows, the following mapping is created:
RTE_THREAD_PRIORITY_NORMAL corresponds to
* class NORMAL_PRIORITY_CLASS
* priority THREAD_PRIORITY_NORMAL
RTE_THREAD_PRIORITY_REALTIME_CRITICAL corresponds to
* class REALTIME_PRIORITY_CLASS
* priority THREAD_PRIORITY_TIME_CRITICAL
---
 lib/eal/common/rte_thread.c   | 51 ++
 lib/eal/include/rte_thread.h  | 17 
 lib/eal/include/rte_thread_types.h|  3 -
 .../include/rte_windows_thread_types.h|  3 -
 lib/eal/windows/rte_thread.c  | 92 +++
 5 files changed, 160 insertions(+), 6 deletions(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index ceb27feaa7..5cee19bb7d 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -48,6 +48,57 @@ rte_thread_get_affinity_by_id(rte_thread_t thread_id,
return pthread_getaffinity_np(thread_id.opaque_id, sizeof(*cpuset), 
cpuset);
 }
 
+static int
+thread_map_priority_to_os_value(enum rte_thread_priority eal_pri, int *os_pri, 
int *pol)
+{
+   RTE_VERIFY(os_pri != NULL);
+   RTE_VERIFY(pol != NULL);
+
+   /* Clear the output parameters */
+   *os_pri = sched_get_priority_min(SCHED_OTHER) - 1;
+   *pol = -1;
+
+   switch (eal_pri)
+   {
+   case RTE_THREAD_PRIORITY_NORMAL:
+   *pol = SCHED_OTHER;
+
+   /*
+* Choose the middle of the range to represent
+* the priority 'normal'.
+* On Linux, this should be 0, since both
+* sched_get_priority_min/_max return 0 for SCHED_OTHER.
+*/
+   *os_pri = (sched_get_priority_min(SCHED_OTHER) +
+   sched_get_priority_max(SCHED_OTHER))/2;
+   break;
+   case RTE_THREAD_PRIORITY_REALTIME_CRITICAL:
+   *pol = SCHED_RR;
+   *os_pri = sched_get_priority_max(SCHED_RR);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL, "The requested priority value is 
invalid.\n");
+   return EINVAL;
+   }
+   return 0;
+}
+
+int
+rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority)
+{
+   int ret;
+   int policy;
+   struct sched_param param;
+
+   ret = thread_map_priority_to_os_value(priority, ¶m.sched_priority, 
&policy);
+   if (ret != 0) {
+   return ret;
+   }
+
+   return pthread_setschedparam(thread_id.opaque_id, policy, ¶m);
+}
+
 int
 rte_thread_attr_init(rte_thread_attr_t *attr)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 1f02962146..5c54cd9d67 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -122,6 +122,23 @@ __rte_experimental
 int rte_thread_get_affinity_by_id(rte_thread_t thread_id,
rte_cpuset_t *cpuset);
 
+/**
+ * Set the priority of a thread.
+ *
+ * @param thread_id
+ *Id of the thread for which to set priority.
+ *
+ * @param priority
+ *   Priority value to be set.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_set_priority(rte_thread_t thread_id,
+   enum rte_thread_priority priority);
+
 /**
  * Initialize the attributes of a thread.
  * These attributes can be passed to the rte_thread_create() function
diff --git a/lib/eal/include/rte_thread_types.h 
b/lib/eal/include/rte_thread_types.h
index 996232c636..d67b24a563 100644
--- a/lib/eal/include/rte_thread_types.h
+++ b/lib/eal/include/rte_thread_types.h
@@ -7,7 +7,4 @@
 
 #include 
 
-#define EAL_THREAD_PRIORITY_NORMAL   0
-#define EAL_THREAD_PRIORITY_REALTIME_CIRTICAL99
-
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/include/rte_windows_thread_types.h 
b/lib/eal/windows/include/rte_windows_thread_types.h
index 5bdeaad3d4..60e6d94553 100644
--- a/lib/eal/windows/include/rte_windows_thread_types.h
+++ b/lib/eal/windows/include/rte_windows_thread_types.h
@@ -7,7 +7,4 @@
 
 #include 
 
-#define EAL_THREAD_PRIORITY_NORMAL THREAD_PRIORITY_NORMAL
-#define EAL_THREAD_PRIORITY_REALTIME_CIRTICAL  THREAD_PRIORITY_TIME_CRITICAL
-
 #endif /* _RTE_THREAD_TYP

[dpdk-dev] [PATCH v7 06/10] eal: add thread lifetime management

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add function for thread creation, join, canceling.

The *rte_thread_create()* function can optionally receive an rte_thread_attr_t
object that will cause the thread to be created with the affinity and priority
described by the attributes object. If no rte_thread_attr_t is passed
(parameter is NULL), the default affinity and priority are used.

On Windows, the function executed by a thread when the thread starts is
represented by a function pointer of type DWORD (*func)(void *).
On other platforms, the function pointer is of type void *(*func)(void *).

Casting between these two function pointer types to unify the API
across all platforms may result in undefined behavior.
To fix this issue, a wrapper that matches the signature required by
CreateThread() has been created on Windows.
---
 lib/eal/common/rte_thread.c  | 110 +
 lib/eal/include/rte_thread.h |  53 
 lib/eal/windows/rte_thread.c | 155 +++
 3 files changed, 318 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 5cee19bb7d..2e06f16a69 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -149,6 +149,116 @@ rte_thread_attr_set_priority(rte_thread_attr_t 
*thread_attr,
return 0;
 }
 
+int
+rte_thread_create(rte_thread_t *thread_id,
+ const rte_thread_attr_t *thread_attr,
+ void *(*thread_func)(void *), void *args)
+{
+   int ret = 0;
+   pthread_attr_t attr;
+   pthread_attr_t *attrp = NULL;
+   struct sched_param param = {
+   .sched_priority = 0,
+   };
+   int policy = SCHED_OTHER;
+
+   if (thread_attr != NULL) {
+   ret = pthread_attr_init(&attr);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_init failed\n");
+   goto cleanup;
+   }
+
+   attrp = &attr;
+
+   /*
+* Set the inherit scheduler parameter to explicit,
+* otherwise the priority attribute is ignored.
+*/
+   ret = pthread_attr_setinheritsched(attrp,
+  PTHREAD_EXPLICIT_SCHED);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setinheritsched 
failed\n");
+   goto cleanup;
+   }
+
+   /*
+* In case a realtime scheduling policy is requested,
+* the sched_priority parameter is set to the value stored in
+* thread_attr. Otherwise, for the default scheduling policy
+* (SCHED_OTHER) sched_priority needs to be initialized to 0.
+*/
+   if (thread_attr->priority == 
RTE_THREAD_PRIORITY_REALTIME_CRITICAL) {
+   policy = SCHED_RR;
+   param.sched_priority = thread_attr->priority;
+   }
+
+   ret = pthread_attr_setschedpolicy(attrp, policy);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedpolicy 
failed\n");
+   goto cleanup;
+   }
+
+   ret = pthread_attr_setschedparam(attrp, ¶m);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setschedparam 
failed\n");
+   goto cleanup;
+   }
+
+   ret = pthread_attr_setaffinity_np(attrp,
+ sizeof(thread_attr->cpuset),
+ &thread_attr->cpuset);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_attr_setaffinity_np 
failed\n");
+   goto cleanup;
+   }
+   }
+
+   ret = pthread_create(&thread_id->opaque_id, attrp, thread_func, args);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_create failed\n");
+   goto cleanup;
+   }
+
+cleanup:
+   if (attrp != NULL)
+   pthread_attr_destroy(&attr);
+
+   return ret;
+}
+
+int
+rte_thread_join(rte_thread_t thread_id, int *value_ptr)
+{
+   int ret = 0;
+   void *res = NULL;
+   void **pres = NULL;
+
+   if (value_ptr != NULL)
+   pres = &res;
+
+   ret = pthread_join(thread_id.opaque_id, pres);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "pthread_join failed\n");
+   return ret;
+   }
+
+   if (pres != NULL)
+   *value_ptr = *(int *)(*pres);
+
+   return 0;
+}
+
+int rte_thread_cancel(rte_thread_t thread_id)
+{
+   /*
+* TODO: Behavior is different between POSIX and Windows threads.
+* POSIX threads wait for a cancellation point.
+* Current Windows emulation kills thread at any point.
+*/
+   return pthread_ca

[dpdk-dev] [PATCH v7 07/10] eal: implement functions for mutex management

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for mutex init, destroy, lock, unlock.

On Linux, static initialization of a mutex is possible
through PTHREAD_MUTEX_INITIALIZER.

Windows does not have a static initializer.
Initialization is only done through InitializeCriticalSection().

To simulate static initialization, a fake initializer has been added:
the rte_thread_mutex_lock() function verifies whether the mutex has been
initialized with this fake initializer and, if so, performs the
additional initialization.
---
 lib/eal/common/rte_thread.c   | 24 ++
 lib/eal/include/rte_thread.h  | 53 
 lib/eal/include/rte_thread_types.h|  4 +
 .../include/rte_windows_thread_types.h|  9 ++
 lib/eal/windows/rte_thread.c  | 83 ++-
 5 files changed, 172 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index 2e06f16a69..e8e4af0451 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -249,6 +249,30 @@ rte_thread_join(rte_thread_t thread_id, int *value_ptr)
return 0;
 }
 
+int
+rte_thread_mutex_init(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_init(mutex, NULL);
+}
+
+int
+rte_thread_mutex_lock(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_lock(mutex);
+}
+
+int
+rte_thread_mutex_unlock(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_unlock(mutex);
+}
+
+int
+rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
+{
+   return pthread_mutex_destroy(mutex);
+}
+
 int rte_thread_cancel(rte_thread_t thread_id)
 {
/*
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 2ff207f8bb..2fca662616 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -248,6 +248,58 @@ int rte_thread_create(rte_thread_t *thread_id,
 __rte_experimental
 int rte_thread_join(rte_thread_t thread_id, int *value_ptr);
 
+/**
+ * Initializes a mutex.
+ *
+ * @param mutex
+ *The mutex to be initialized.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_init(rte_thread_mutex_t *mutex);
+
+/**
+ * Locks a mutex.
+ *
+ * @param mutex
+ *The mutex to be locked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_lock(rte_thread_mutex_t *mutex);
+
+/**
+ * Unlocks a mutex.
+ *
+ * @param mutex
+ *The mutex to be unlocked.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_unlock(rte_thread_mutex_t *mutex);
+
+/**
+ * Releases all resources associated with a mutex.
+ *
+ * @param mutex
+ *The mutex to be uninitialized.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_mutex_destroy(rte_thread_mutex_t *mutex);
+
 /**
  * Terminates a thread.
  *
@@ -269,6 +321,7 @@ int rte_thread_cancel(rte_thread_t thread_id);
  *
  * @param cpusetp
  *   Pointer to CPU affinity to set.
+ *
  * @return
  *   On success, return 0; otherwise return -1;
  */
diff --git a/lib/eal/include/rte_thread_types.h 
b/lib/eal/include/rte_thread_types.h
index d67b24a563..7bb0d2948c 100644
--- a/lib/eal/include/rte_thread_types.h
+++ b/lib/eal/include/rte_thread_types.h
@@ -7,4 +7,8 @@
 
 #include 
 
+#define RTE_THREAD_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
+
+typedef pthread_mutex_t rte_thread_mutex_t;
+
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/include/rte_windows_thread_types.h 
b/lib/eal/windows/include/rte_windows_thread_types.h
index 60e6d94553..c6c8502bfb 100644
--- a/lib/eal/windows/include/rte_windows_thread_types.h
+++ b/lib/eal/windows/include/rte_windows_thread_types.h
@@ -7,4 +7,13 @@
 
 #include 
 
+#define WINDOWS_MUTEX_INITIALIZER   (void*)-1
+#define RTE_THREAD_MUTEX_INITIALIZER{WINDOWS_MUTEX_INITIALIZER}
+
+struct thread_mutex_t {
+   void* mutex_id;
+};
+
+typedef struct thread_mutex_t rte_thread_mutex_t;
+
 #endif /* _RTE_THREAD_TYPES_H_ */
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index 321b44caf6..f81876f4f2 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -470,6 +470,88 @@ rte_thread_join(rte_thread_t thread_id, int *value_ptr)
return ret;
 }
 
+int
+rte_thread_mutex_init(rte_thread_mutex_t *mutex)
+{
+   int ret = 0;
+   CRITICAL_SECTION *m = NULL;
+
+   RTE_VERIFY(mutex != NULL);
+
+   m = calloc(1, sizeof(*m));
+   if (m == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize mutex. Insufficient 
memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+
+   InitializeCriticalSection(m)

[dpdk-dev] [PATCH v7 09/10] eal: add EAL argument for setting thread priority

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Allow the user to choose the thread priority through an EAL
command line argument.

The user can choose the thread priority through an EAL parameter
when starting an application. If the parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can set one of the available options:
 --thread-prio normal
 --thread-prio realtime

 Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p 
---
 lib/eal/common/eal_common_options.c | 28 +++-
 lib/eal/common/eal_internal_cfg.h   |  2 ++
 lib/eal/common/eal_options.h|  2 ++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 66f9114715..773cefdff7 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -107,6 +107,7 @@ eal_long_options[] = {
{OPT_TELEMETRY, 0, NULL, OPT_TELEMETRY_NUM},
{OPT_NO_TELEMETRY,  0, NULL, OPT_NO_TELEMETRY_NUM },
{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+   {OPT_THREAD_PRIORITY,   1, NULL, OPT_THREAD_PRIORITY_NUM},
 
/* legacy options that will be removed in future */
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
@@ -1406,6 +1407,24 @@ eal_parse_simd_bitwidth(const char *arg)
return 0;
 }
 
+static int
+eal_parse_thread_priority(const char *arg)
+{
+   struct internal_config *internal_conf =
+   eal_get_internal_configuration();
+   enum rte_thread_priority priority;
+
+   if (!strncmp("normal", arg, sizeof("normal")))
+   priority = RTE_THREAD_PRIORITY_NORMAL;
+   else if (!strncmp("realtime", arg, sizeof("realtime")))
+   priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
+   else
+   return -1;
+
+   internal_conf->thread_priority = priority;
+   return 0;
+}
+
 static int
 eal_parse_base_virtaddr(const char *arg)
 {
@@ -1819,7 +1838,13 @@ eal_parse_common_option(int opt, const char *optarg,
return -1;
}
break;
-
+   case OPT_THREAD_PRIORITY_NUM:
+   if (eal_parse_thread_priority(optarg) < 0) {
+   RTE_LOG(ERR, EAL, "invalid parameter for --"
+   OPT_THREAD_PRIORITY "\n");
+   return -1;
+   }
+   break;
/* don't know what to do, leave this to caller */
default:
return 1;
@@ -2082,6 +2107,7 @@ eal_common_usage(void)
   "  (can be used multiple times)\n"
   "  --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of 
native RDTSC\n"
   "  --"OPT_PROC_TYPE" Type of this process 
(primary|secondary|auto)\n"
+  "  --"OPT_THREAD_PRIORITY"   Set threads priority 
(normal|realtime)\n"
 #ifndef RTE_EXEC_ENV_WINDOWS
   "  --"OPT_SYSLOG"Set syslog facility\n"
 #endif
diff --git a/lib/eal/common/eal_internal_cfg.h 
b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..b2996cd65b 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -94,6 +94,8 @@ struct internal_config {
unsigned int no_telemetry; /**< true to disable Telemetry */
struct simd_bitwidth max_simd_bitwidth;
/**< max simd bitwidth path to use */
+   enum rte_thread_priority thread_priority;
+   /**< thread priority to configure */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 7b348e707f..9f5b209f64 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -93,6 +93,8 @@ enum {
OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_THREAD_PRIORITY  "thread-prio"
+   OPT_THREAD_PRIORITY_NUM,
 
/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST "pci-blacklist"
-- 
2.31.0.vfs.0.1



[dpdk-dev] [PATCH v7 08/10] eal: implement functions for thread barrier management

2021-06-01 Thread Narcisa Ana Maria Vasile
From: Narcisa Vasile 

Add functions for barrier init, destroy, wait.

A portable type is used to represent a barrier identifier.
The rte_thread_barrier_wait() function returns the same value
on all platforms.
---
 lib/eal/common/rte_thread.c  | 61 
 lib/eal/include/rte_thread.h | 58 ++
 lib/eal/windows/rte_thread.c | 56 +
 3 files changed, 175 insertions(+)

diff --git a/lib/eal/common/rte_thread.c b/lib/eal/common/rte_thread.c
index e8e4af0451..7560585784 100644
--- a/lib/eal/common/rte_thread.c
+++ b/lib/eal/common/rte_thread.c
@@ -273,6 +273,67 @@ rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
return pthread_mutex_destroy(mutex);
 }
 
+int
+rte_thread_barrier_init(rte_thread_barrier_t *barrier, int count)
+{
+   int ret = 0;
+   pthread_barrier_t *pthread_barrier = NULL;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(count > 0);
+
+   pthread_barrier = calloc(1, sizeof(*pthread_barrier));
+   if (pthread_barrier == NULL) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize barrier. Insufficient 
memory!\n");
+   ret = ENOMEM;
+   goto cleanup;
+   }
+   ret = pthread_barrier_init(pthread_barrier, NULL, count);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Unable to initialize barrier, ret = %d\n", 
ret);
+   goto cleanup;
+   }
+
+   barrier->barrier_id = pthread_barrier;
+   pthread_barrier = NULL;
+
+cleanup:
+   free(pthread_barrier);
+   return ret;
+}
+
+int rte_thread_barrier_wait(rte_thread_barrier_t *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+   RTE_VERIFY(barrier->barrier_id != NULL);
+
+   ret = pthread_barrier_wait(barrier->barrier_id);
+   if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
+   ret = RTE_THREAD_BARRIER_SERIAL_THREAD;
+   }
+
+   return ret;
+}
+
+int rte_thread_barrier_destroy(rte_thread_barrier_t *barrier)
+{
+   int ret = 0;
+
+   RTE_VERIFY(barrier != NULL);
+
+   ret = pthread_barrier_destroy(barrier->barrier_id);
+   if (ret != 0) {
+   RTE_LOG(DEBUG, EAL, "Unable to destroy barrier, ret = %d\n", 
ret);
+   }
+
+   free(barrier->barrier_id);
+   barrier->barrier_id = NULL;
+
+   return ret;
+}
+
 int rte_thread_cancel(rte_thread_t thread_id)
 {
/*
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index 2fca662616..06b23571a1 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -29,6 +29,11 @@ extern "C" {
 #include 
 #endif
 
+/**
+ * Returned by rte_thread_barrier_wait() when call is successful.
+ */
+#define RTE_THREAD_BARRIER_SERIAL_THREAD -1
+
 /**
  * Thread id descriptor.
  */
@@ -56,6 +61,13 @@ typedef struct {
rte_cpuset_t cpuset; /**< thread affinity */
 } rte_thread_attr_t;
 
+/**
+ * Thread barrier representation.
+ */
+typedef struct rte_thread_barrier_tag {
   void *barrier_id;  /**< barrier identifier */
+} rte_thread_barrier_t;
+
 /**
  * TLS key type, an opaque pointer.
  */
@@ -300,6 +312,52 @@ int rte_thread_mutex_unlock(rte_thread_mutex_t *mutex);
 __rte_experimental
 int rte_thread_mutex_destroy(rte_thread_mutex_t *mutex);
 
+/**
+ * Initializes a synchronization barrier.
+ *
+ * @param barrier
+ *A pointer that references the newly created 'barrier' object.
+ *
+ * @param count
+ *The number of threads that must enter the barrier before
+ *the threads can continue execution.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_init(rte_thread_barrier_t *barrier, int count);
+
+/**
+ * Causes the calling thread to wait at the synchronization barrier 'barrier'.
+ *
+ * @param barrier
+ *The barrier used for synchronizing the threads.
+ *
+ * @return
+ *   Return RTE_THREAD_BARRIER_SERIAL_THREAD for the thread synchronized
+ *  at the barrier.
+ *   Return 0 for all other threads.
+ *   Return a positive errno-style error number, in case of failure.
+ */
+__rte_experimental
+int rte_thread_barrier_wait(rte_thread_barrier_t *barrier);
+
+/**
+ * Releases all resources used by a synchronization barrier
+ * and uninitializes it.
+ *
+ * @param barrier
+ *   The barrier to be destroyed.
+ *
+ * @return
+ *   On success, return 0.
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_barrier_destroy(rte_thread_barrier_t *barrier);
+
 /**
  * Terminates a thread.
  *
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index f81876f4f2..e1778b603e 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -552,6 +552,62 @@ rte_thread_mutex_destroy(rte_thread_mutex_t *mutex)
return 0;
 }
 
+int
+rte_thread_barrier_init(rte_thread_barrier_t *barrier, int

[dpdk-dev] [PATCH 1/2] net/ice: add Tx AVX2 offload path

2021-06-01 Thread Wenzhuo Lu
Add a specific path for Tx AVX2.
In this path, the HW offload features are supported, such as
checksum insertion and VLAN insertion.
This path is chosen automatically according to the
configuration.

'inline' is used, so the duplicate code is generated
by the compiler.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ice/ice_rxtx.c  | 36 -
 drivers/net/ice/ice_rxtx.h  |  2 ++
 drivers/net/ice/ice_rxtx_vec_avx2.c | 54 ++---
 3 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 49abcb2..7c9474e 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -3294,7 +3294,7 @@
 #ifdef RTE_ARCH_X86
struct ice_tx_queue *txq;
int i;
-   int tx_check_ret;
+   int tx_check_ret = -1;
bool use_avx512 = false;
bool use_avx2 = false;
 
@@ -3313,13 +3313,13 @@
PMD_DRV_LOG(NOTICE,
"AVX512 is not supported in build env");
 #endif
-   if (!use_avx512 && tx_check_ret == ICE_VECTOR_PATH &&
-   (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1 ||
-   rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) == 1) &&
-   rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256)
+   if ((rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1 ||
+rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) == 1) &&
+   rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256)
use_avx2 = true;
 
-   if (!use_avx512 && tx_check_ret == ICE_VECTOR_OFFLOAD_PATH)
+   if (!use_avx2 && !use_avx512 &&
+   tx_check_ret == ICE_VECTOR_OFFLOAD_PATH)
ad->tx_vec_allowed = false;
 
if (ad->tx_vec_allowed) {
@@ -3337,6 +3337,7 @@
}
 
if (ad->tx_vec_allowed) {
+   dev->tx_pkt_prepare = NULL;
if (use_avx512) {
 #ifdef CC_AVX512_SUPPORT
if (tx_check_ret == ICE_VECTOR_OFFLOAD_PATH) {
@@ -3345,6 +3346,7 @@
dev->data->port_id);
dev->tx_pkt_burst =
ice_xmit_pkts_vec_avx512_offload;
+   dev->tx_pkt_prepare = ice_prep_pkts;
} else {
PMD_DRV_LOG(NOTICE,
"Using AVX512 Vector Tx (port %d).",
@@ -3353,14 +3355,22 @@
}
 #endif
} else {
-   PMD_DRV_LOG(DEBUG, "Using %sVector Tx (port %d).",
-   use_avx2 ? "avx2 " : "",
-   dev->data->port_id);
-   dev->tx_pkt_burst = use_avx2 ?
-   ice_xmit_pkts_vec_avx2 :
-   ice_xmit_pkts_vec;
+   if (tx_check_ret == ICE_VECTOR_OFFLOAD_PATH) {
+   PMD_DRV_LOG(NOTICE,
+   "Using AVX2 OFFLOAD Vector Tx (port %d).",
+   dev->data->port_id);
+   dev->tx_pkt_burst =
+   ice_xmit_pkts_vec_avx2_offload;
+   dev->tx_pkt_prepare = ice_prep_pkts;
+   } else {
+   PMD_DRV_LOG(DEBUG, "Using %sVector Tx (port %d).",
+   use_avx2 ? "avx2 " : "",
+   dev->data->port_id);
+   dev->tx_pkt_burst = use_avx2 ?
+   ice_xmit_pkts_vec_avx2 :
+   ice_xmit_pkts_vec;
+   }
}
-   dev->tx_pkt_prepare = NULL;
 
return;
}
diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h
index b29387c..595dc66 100644
--- a/drivers/net/ice/ice_rxtx.h
+++ b/drivers/net/ice/ice_rxtx.h
@@ -255,6 +255,8 @@ uint16_t ice_recv_scattered_pkts_vec_avx2(void *rx_queue,
  uint16_t nb_pkts);
 uint16_t ice_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t ice_xmit_pkts_vec_avx2_offload(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
 uint16_t ice_recv_pkts_vec_avx512(void *rx_queue, struct rte_mbuf **rx_pkts,
  uint16_t nb_pkts);
 uint16_t ice_recv_pkts_vec_avx512_offload(void *rx_queue,
diff --git a/drivers/ne

[dpdk-dev] [PATCH 0/2] add Rx/Tx offload paths for ICE AVX2

2021-06-01 Thread Wenzhuo Lu
Add specific paths for Rx/Tx AVX2, called offload paths.
These paths support the HW offload features, such as checksum, VLAN and
RSS offload.
These paths are chosen automatically according to the configuration.

Wenzhuo Lu (2):
  net/ice: add Tx AVX2 offload path
  net/ice: add Rx AVX2 offload path

 doc/guides/rel_notes/release_21_08.rst |   6 +
 drivers/net/ice/ice_rxtx.c |  86 +--
 drivers/net/ice/ice_rxtx.h |   7 +
 drivers/net/ice/ice_rxtx_vec_avx2.c| 402 +++--
 4 files changed, 307 insertions(+), 194 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH 2/2] net/ice: add Rx AVX2 offload path

2021-06-01 Thread Wenzhuo Lu
Add a specific path for Rx AVX2.
In this path, the HW offload features are supported, such as
checksum, VLAN stripping and RSS hash.
This path is chosen automatically according to the
configuration.

'inline' is used, so the duplicate code is generated
by the compiler.

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_21_08.rst |   6 +
 drivers/net/ice/ice_rxtx.c |  50 +++--
 drivers/net/ice/ice_rxtx.h |   5 +
 drivers/net/ice/ice_rxtx_vec_avx2.c| 348 ++---
 4 files changed, 243 insertions(+), 166 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf..203b772 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Intel ice driver.**
+
+  * In AVX2 code, added the new RX and TX paths to use the HW offload
+features. When the HW offload features are configured to be used, the
+offload paths are chosen automatically. In parallel the support for HW
+offload features was removed from the legacy AVX2 paths.
 
 Removed Items
 -
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 7c9474e..4e51fd6 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1999,7 +1999,9 @@
dev->rx_pkt_burst == ice_recv_scattered_pkts_vec_avx512_offload ||
 #endif
dev->rx_pkt_burst == ice_recv_pkts_vec_avx2 ||
-   dev->rx_pkt_burst == ice_recv_scattered_pkts_vec_avx2)
+   dev->rx_pkt_burst == ice_recv_pkts_vec_avx2_offload ||
+   dev->rx_pkt_burst == ice_recv_scattered_pkts_vec_avx2 ||
+   dev->rx_pkt_burst == ice_recv_scattered_pkts_vec_avx2_offload)
return ptypes;
 #endif
 
@@ -3058,7 +3060,7 @@
 #ifdef RTE_ARCH_X86
struct ice_rx_queue *rxq;
int i;
-   int rx_check_ret;
+   int rx_check_ret = -1;
bool use_avx512 = false;
bool use_avx2 = false;
 
@@ -3113,14 +3115,25 @@

ice_recv_scattered_pkts_vec_avx512;
}
 #endif
+   } else if (use_avx2) {
+   if (rx_check_ret == ICE_VECTOR_OFFLOAD_PATH) {
+   PMD_DRV_LOG(NOTICE,
+   "Using AVX2 OFFLOAD Vector Scattered Rx (port %d).",
+   dev->data->port_id);
+   dev->rx_pkt_burst =
+   ice_recv_scattered_pkts_vec_avx2_offload;
+   } else {
+   PMD_DRV_LOG(NOTICE,
+   "Using AVX2 Vector Scattered Rx (port %d).",
+   dev->data->port_id);
+   dev->rx_pkt_burst =
+   ice_recv_scattered_pkts_vec_avx2;
+   }
} else {
PMD_DRV_LOG(DEBUG,
-   "Using %sVector Scattered Rx (port %d).",
-   use_avx2 ? "avx2 " : "",
+   "Using Vector Scattered Rx (port %d).",
dev->data->port_id);
-   dev->rx_pkt_burst = use_avx2 ?
-   ice_recv_scattered_pkts_vec_avx2 :
-   ice_recv_scattered_pkts_vec;
+   dev->rx_pkt_burst = ice_recv_scattered_pkts_vec;
}
} else {
if (use_avx512) {
@@ -3139,14 +3152,25 @@
ice_recv_pkts_vec_avx512;
}
 #endif
+   } else if (use_avx2) {
+   if (rx_check_ret == ICE_VECTOR_OFFLOAD_PATH) {
+   PMD_DRV_LOG(NOTICE,
+   "Using AVX2 OFFLOAD Vector Rx (port %d).",
+   dev->data->port_id);
+   dev->rx_pkt_burst =
+   ice_recv_pkts_vec_avx2_offload;
+   } else {
+   PMD_DRV_LOG(NOTICE,
+   "Using AVX2 Vector Rx (port %d).",
+   dev->data->port_id);
+   dev->rx_pkt_burst =
+ 

[dpdk-dev] [PATCH v2 0/4] support AVF RSS and FDIR for GRE tunnel packet

2021-06-01 Thread Wenjun Wu
[PATCH v2 1/4] net/iavf: support flow pattern for GRE
[PATCH v2 2/4] common/iavf: add header types for GRE
[PATCH v2 3/4] net/iavf: support AVF RSS for GRE tunnel packet
[PATCH v2 4/4] net/iavf: support AVF FDIR for GRE tunnel packet

v2:
* Delete the share code patch, because it is not necessary for this
  patch set.
* Delete the definition of ETH_RSS_GRE and related dependencies,
  because GRE header is not needed for hash input set.

Wenjun Wu (4):
  net/iavf: support flow pattern for GRE
  common/iavf: add header types for GRE
  net/iavf: support AVF RSS for GRE tunnel packet
  net/iavf: support AVF FDIR for GRE tunnel packet

 drivers/common/iavf/virtchnl.h   |   1 +
 drivers/net/iavf/iavf_fdir.c |  55 ++
 drivers/net/iavf/iavf_generic_flow.c | 105 +++
 drivers/net/iavf/iavf_generic_flow.h |  14 
 drivers/net/iavf/iavf_hash.c |  27 +--
 5 files changed, 197 insertions(+), 5 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v2 1/4] net/iavf: support flow pattern for GRE

2021-06-01 Thread Wenjun Wu
Add GRE pattern support for AVF FDIR and RSS.

Patterns are listed below:
  1. eth/ipv4/gre/ipv4
  2. eth/ipv4/gre/ipv6
  3. eth/ipv6/gre/ipv4
  4. eth/ipv6/gre/ipv6
  5. eth/ipv4/gre/ipv4/tcp
  6. eth/ipv4/gre/ipv6/tcp
  7. eth/ipv4/gre/ipv4/udp
  8. eth/ipv4/gre/ipv6/udp
  9. eth/ipv6/gre/ipv4/tcp
  10. eth/ipv6/gre/ipv6/tcp
  11. eth/ipv6/gre/ipv4/udp
  12. eth/ipv6/gre/ipv6/udp
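For reference, pattern 8 above would map onto a testpmd flow rule along these lines (port id 0 and the RSS action details are illustrative, not part of the patch):

```
flow create 0 ingress pattern eth / ipv4 / gre / ipv6 / udp / end \
     actions rss types ipv6-udp end queues end / end
```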

Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf_generic_flow.c | 105 +++
 drivers/net/iavf/iavf_generic_flow.h |  14 
 2 files changed, 119 insertions(+)

diff --git a/drivers/net/iavf/iavf_generic_flow.c b/drivers/net/iavf/iavf_generic_flow.c
index 242bb4abc5..d9ba5735b2 100644
--- a/drivers/net/iavf/iavf_generic_flow.c
+++ b/drivers/net/iavf/iavf_generic_flow.c
@@ -822,6 +822,111 @@ enum rte_flow_item_type iavf_pattern_eth_ipv4_ecpri[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
+/* GRE */
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv6[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv4_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv4_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv6_tcp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_TCP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv6_udp[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_GRE,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_UDP,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
 typedef struct iavf_flow_engine * (*parse_engine_t)(struct iavf_adapter *ad,
struct rte_flow *flow,
struct iavf_parser_list *parser_list,
diff --git a/drivers/net/iavf/iavf_generic_flow.h b/drivers/net/iavf/iavf_generic_flow.h
index e19da15518..48fd03a973 100644
--- a/drivers/net/iavf/iavf_generic_flow.h
+++ b/drivers/net/iavf/iavf_generic_flow.h
@@ -308,6 +308,20 @@ extern enum rte_flow_item_type iavf_pattern_eth_ipv6_pfcp[];
 extern enum rte_flow_item_type iavf_pattern_eth_ecpri[];
 extern enum rte_flow_item_type iavf_pattern_eth_ipv4_ecpri[];
 
+/* GRE */
+extern enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4[];
+extern enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6[];
+extern enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv4[];
+extern enum rte_flow_item_type iavf_pattern_eth_ipv6_gre_ipv6[];
+extern enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv4_tcp[];
+extern enum rte_flow_item_type iavf_pattern_eth_ipv4_gre_ipv6_tcp[];
+extern enum rte_flow_item_type 

[dpdk-dev] [PATCH v2 2/4] common/iavf: add header types for GRE

2021-06-01 Thread Wenjun Wu
Add a virtchnl protocol header type to support AVF FDIR and RSS for GRE.

Signed-off-by: Wenjun Wu 
---
 drivers/common/iavf/virtchnl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 3a60faff93..197edce8a1 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -1504,6 +1504,7 @@ enum virtchnl_proto_hdr_type {
 */
VIRTCHNL_PROTO_HDR_IPV4_FRAG,
VIRTCHNL_PROTO_HDR_IPV6_EH_FRAG,
+   VIRTCHNL_PROTO_HDR_GRE,
 };
 
 /* Protocol header field within a protocol header. */
-- 
2.25.1



[dpdk-dev] [PATCH v2 3/4] net/iavf: support AVF RSS for GRE tunnel packet

2021-06-01 Thread Wenjun Wu
Support AVF RSS for the inner header of GRE tunnel packets. RSS can
be computed over the inner IP src + dst addresses and the TCP/UDP
src + dst ports.

Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf_hash.c | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/iavf/iavf_hash.c b/drivers/net/iavf/iavf_hash.c
index 5d3d62839b..f4f0bcbfef 100644
--- a/drivers/net/iavf/iavf_hash.c
+++ b/drivers/net/iavf/iavf_hash.c
@@ -30,6 +30,7 @@
 #defineIAVF_PHINT_GTPU_EH_UP   BIT_ULL(3)
 #define IAVF_PHINT_OUTER_IPV4  BIT_ULL(4)
 #define IAVF_PHINT_OUTER_IPV6  BIT_ULL(5)
+#define IAVF_PHINT_GRE BIT_ULL(6)
 
 #define IAVF_PHINT_GTPU_MSK(IAVF_PHINT_GTPU| \
 IAVF_PHINT_GTPU_EH | \
@@ -428,6 +429,12 @@ static struct iavf_pattern_match_item iavf_hash_pattern_list[] = {
	{iavf_pattern_eth_ipv4_gtpc,		ETH_RSS_IPV4,			&ipv4_udp_gtpc_tmplt},
	{iavf_pattern_eth_ecpri,		ETH_RSS_ECPRI,			&eth_ecpri_tmplt},
	{iavf_pattern_eth_ipv4_ecpri,		ETH_RSS_ECPRI,			&ipv4_ecpri_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv4,	IAVF_RSS_TYPE_INNER_IPV4,	&inner_ipv4_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv4,	IAVF_RSS_TYPE_INNER_IPV4,	&inner_ipv4_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv4_tcp,	IAVF_RSS_TYPE_INNER_IPV4_TCP,	&inner_ipv4_tcp_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv4_tcp,	IAVF_RSS_TYPE_INNER_IPV4_TCP,	&inner_ipv4_tcp_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv4_udp,	IAVF_RSS_TYPE_INNER_IPV4_UDP,	&inner_ipv4_udp_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv4_udp,	IAVF_RSS_TYPE_INNER_IPV4_UDP,	&inner_ipv4_udp_tmplt},
/* IPv6 */
{iavf_pattern_eth_ipv6, 
IAVF_RSS_TYPE_OUTER_IPV6,   &outer_ipv6_tmplt},
{iavf_pattern_eth_ipv6_frag_ext,
IAVF_RSS_TYPE_OUTER_IPV6_FRAG,  &outer_ipv6_frag_tmplt},
@@ -458,6 +465,12 @@ static struct iavf_pattern_match_item iavf_hash_pattern_list[] = {
	{iavf_pattern_eth_ipv6_l2tpv3,		IAVF_RSS_TYPE_IPV6_L2TPV3,	&ipv6_l2tpv3_tmplt},
	{iavf_pattern_eth_ipv6_pfcp,		IAVF_RSS_TYPE_IPV6_PFCP,	&ipv6_pfcp_tmplt},
	{iavf_pattern_eth_ipv6_gtpc,		ETH_RSS_IPV6,			&ipv6_udp_gtpc_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv6,	IAVF_RSS_TYPE_INNER_IPV6,	&inner_ipv6_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv6,	IAVF_RSS_TYPE_INNER_IPV6,	&inner_ipv6_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv6_tcp,	IAVF_RSS_TYPE_INNER_IPV6_TCP,	&inner_ipv6_tcp_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv6_tcp,	IAVF_RSS_TYPE_INNER_IPV6_TCP,	&inner_ipv6_tcp_tmplt},
+	{iavf_pattern_eth_ipv4_gre_ipv6_udp,	IAVF_RSS_TYPE_INNER_IPV6_UDP,	&inner_ipv6_udp_tmplt},
+	{iavf_pattern_eth_ipv6_gre_ipv6_udp,	IAVF_RSS_TYPE_INNER_IPV6_UDP,	&inner_ipv6_udp_tmplt},
 };
 
 static struct iavf_flow_engine iavf_hash_engine = {
@@ -592,11 +605,11 @@ iavf_hash_parse_pattern(const struct rte_flow_item pattern[], uint64_t *phint,
 
switch (item->type) {
case RTE_FLOW_ITEM_TYPE_IPV4:
-   if (!(*phint & IAVF_PHINT_GTPU_MSK))
+   if (!(*phint & IAVF_PHINT_GTPU_MSK) && !(*phint & IAVF_PHINT_GRE))
*phint |= IAVF_PHINT_OUTER_IPV4;
break;
case RTE_FLOW_ITEM_TYPE_IPV6:
-   if (!(*phint & IAVF_PHINT_GTPU_MSK))
+   if (!(*phint & IAVF_PHINT_GTPU_MSK) && !(*phint & IAVF_PHINT_GRE))
*phint |= IAVF_PHINT_OUTER_IPV6;
break;
case RTE_FLOW_ITEM_TYPE_GTPU:
@@ -627,6 +640,8 @@ iavf_hash_parse_pattern(const struct rte_flow_item pattern[], uint64_t *phint,
return -rte_errno;
}
break;
+   case RTE_FLOW_ITEM_TYPE_GRE:
+   *phint |= IAVF_PHINT_GRE;
default:
break;
}
@@ -867,7 +882,7 @@ iavf_refine_proto_hdrs_by_pattern(struct virtchnl_proto_hdrs *proto_hdrs,
struct virtchnl_proto_hdr *hdr2;
int i, shift_count = 1;
 
-   if (!(phint & IAVF_PHINT_GTPU_MSK))
+   if (!(phint & IAVF_PHINT_GTPU_MSK) && !(phint & IAVF_PHINT_GRE))
return;
 
if (phint & IAVF_PHINT_LAYERS_MSK)
@@ -883,10 +898,10 @@ iavf_refine_proto_hdrs_by_pattern(struct virtchnl_proto_hdrs *proto_hdrs,
}
 
if (shift_count == 1) {
-   /* adding gtpu header at layer 0 */
+   /* adding tunn

[dpdk-dev] [PATCH v2 4/4] net/iavf: support AVF FDIR for GRE tunnel packet

2021-06-01 Thread Wenjun Wu
Support AVF FDIR for the inner header of GRE tunnel packets.

+--+---+
|   Pattern|Input Set  |
+--+---+
| eth/ipv4/gre/ipv4| inner: src/dst ip, dscp   |
| eth/ipv4/gre/ipv4/udp| inner: src/dst ip, dscp, src/dst port |
| eth/ipv4/gre/ipv4/tcp| inner: src/dst ip, dscp, src/dst port |
| eth/ipv4/gre/eh/ipv6 | inner: src/dst ip, tc |
| eth/ipv4/gre/eh/ipv6/udp | inner: src/dst ip, tc, src/dst port   |
| eth/ipv4/gre/eh/ipv6/tcp | inner: src/dst ip, tc, src/dst port   |
| eth/ipv6/gre/ipv4| inner: src/dst ip, dscp   |
| eth/ipv6/gre/ipv4/udp| inner: src/dst ip, dscp, src/dst port |
| eth/ipv6/gre/ipv4/tcp| inner: src/dst ip, dscp, src/dst port |
| eth/ipv6/gre/ipv6| inner: src/dst ip, tc |
| eth/ipv6/gre/ipv6/udp| inner: src/dst ip, tc, src/dst port   |
| eth/ipv6/gre/ipv6/tcp| inner: src/dst ip, tc, src/dst port   |
+--+---+

Signed-off-by: Wenjun Wu 
---
 drivers/net/iavf/iavf_fdir.c | 55 
 1 file changed, 55 insertions(+)

diff --git a/drivers/net/iavf/iavf_fdir.c b/drivers/net/iavf/iavf_fdir.c
index f238a83c84..c0b748caca 100644
--- a/drivers/net/iavf/iavf_fdir.c
+++ b/drivers/net/iavf/iavf_fdir.c
@@ -139,6 +139,30 @@
 #define IAVF_FDIR_INSET_ECPRI (\
IAVF_INSET_ECPRI)
 
+#define IAVF_FDIR_INSET_GRE_IPV4 (\
+   IAVF_INSET_TUN_IPV4_SRC | IAVF_INSET_TUN_IPV4_DST | \
+   IAVF_INSET_TUN_IPV4_TOS | IAVF_INSET_TUN_IPV4_PROTO)
+
+#define IAVF_FDIR_INSET_GRE_IPV4_TCP (\
+   IAVF_FDIR_INSET_GRE_IPV4 | IAVF_INSET_TUN_TCP_SRC_PORT | \
+   IAVF_INSET_TUN_TCP_DST_PORT)
+
+#define IAVF_FDIR_INSET_GRE_IPV4_UDP (\
+   IAVF_FDIR_INSET_GRE_IPV4 | IAVF_INSET_TUN_UDP_SRC_PORT | \
+   IAVF_INSET_TUN_UDP_DST_PORT)
+
+#define IAVF_FDIR_INSET_GRE_IPV6 (\
+   IAVF_INSET_TUN_IPV6_SRC | IAVF_INSET_TUN_IPV6_DST | \
+   IAVF_INSET_TUN_IPV6_TC | IAVF_INSET_TUN_IPV6_NEXT_HDR)
+
+#define IAVF_FDIR_INSET_GRE_IPV6_TCP (\
+   IAVF_FDIR_INSET_GRE_IPV6 | IAVF_INSET_TUN_TCP_SRC_PORT | \
+   IAVF_INSET_TUN_TCP_DST_PORT)
+
+#define IAVF_FDIR_INSET_GRE_IPV6_UDP (\
+   IAVF_FDIR_INSET_GRE_IPV6 | IAVF_INSET_TUN_UDP_SRC_PORT | \
+   IAVF_INSET_TUN_UDP_DST_PORT)
+
 static struct iavf_pattern_match_item iavf_fdir_pattern[] = {
	{iavf_pattern_ethertype,		 IAVF_FDIR_INSET_ETH,		  IAVF_INSET_NONE},
	{iavf_pattern_eth_ipv4,			 IAVF_FDIR_INSET_ETH_IPV4,	  IAVF_INSET_NONE},
@@ -178,6 +202,18 @@ static struct iavf_pattern_match_item iavf_fdir_pattern[] = {
	{iavf_pattern_eth_ipv6_pfcp,		 IAVF_FDIR_INSET_PFCP,		  IAVF_INSET_NONE},
	{iavf_pattern_eth_ecpri,		 IAVF_FDIR_INSET_ECPRI,		  IAVF_INSET_NONE},
	{iavf_pattern_eth_ipv4_ecpri,		 IAVF_FDIR_INSET_ECPRI,		  IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv4,	IAVF_FDIR_INSET_GRE_IPV4,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv4_tcp,	IAVF_FDIR_INSET_GRE_IPV4_TCP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv4_udp,	IAVF_FDIR_INSET_GRE_IPV4_UDP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv6,	IAVF_FDIR_INSET_GRE_IPV6,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv6_tcp,	IAVF_FDIR_INSET_GRE_IPV6_TCP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv4_gre_ipv6_udp,	IAVF_FDIR_INSET_GRE_IPV6_UDP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv4,	IAVF_FDIR_INSET_GRE_IPV4,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv4_tcp,	IAVF_FDIR_INSET_GRE_IPV4_TCP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv4_udp,	IAVF_FDIR_INSET_GRE_IPV4_UDP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv6,	IAVF_FDIR_INSET_GRE_IPV6,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv6_tcp,	IAVF_FDIR_INSET_GRE_IPV6_TCP,	IAVF_INSET_NONE},
+	{iavf_pattern_eth_ipv6_gre_ipv6_udp,	IAVF_FDIR_INSET_GRE_IPV6_UDP,	IAVF_INSET_NONE},
 };
 
 static struct iavf_flow_parser iavf_fdir_parser;
@@ -596,6 +632,7 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter *ad,
const struct rte_flow_item_ah *ah_spec, *ah_mask;
const struct rte_flow_item_pfcp *pfcp_spec, *pfcp_mask;
const struct rte_flow_item_ecpri *ecpri_spec, *ecpri_mask;
+   const struct rte_flow_item_gre *gre_spec, *gre_mask;
const struct rte_flow_item *item = pattern;
struct virtchnl_proto_hdr *hdr, *hdr1 = NULL;
struct rte_ecpri_common_hdr ecpri_common;
@@ -1195,6 +1232,24 @@ iavf_fdir_parse_pattern(__rte_unused struct iavf_adapter *ad,

Re: [dpdk-dev] [PATCH] fix Marvell maintainer email for atlantic

2021-06-01 Thread Devendra Singh Rawat
ACK for Marvell QLogic qede PMD.

Thanks,
Devendra

> -Original Message-
> From: Igor Russkikh 
> Sent: Tuesday, May 25, 2021 3:27 PM
> To: dev@dpdk.org
> Cc: Ferruh Yigit ; Jerin Jacob Kollanukkaran
> ; Rasesh Mody ; Devendra Singh
> Rawat ; Ariel Elior ; Igor
> Russkikh 
> Subject: [PATCH] fix Marvell maintainer email for atlantic
> 
> Fixing ex-Aquantia email - it is now part of Marvell.
> Removing Pavel Belous email - he is not in company now.
> 
> Also adding Marvell company prefix to the driver names for both atlantic and
> qede, to eliminate any confusion.
> 
> Qlogic is still actively used as a well known marketing name for the device
> family, so keeping it.
> 
> Signed-off-by: Igor Russkikh 
> Signed-off-by: Rasesh Mody 
> ---
>  MAINTAINERS | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..c25959b546 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -599,9 +599,8 @@ F: drivers/net/axgbe/
>  F: doc/guides/nics/axgbe.rst
>  F: doc/guides/nics/features/axgbe.ini
> 
> -Aquantia atlantic
> -M: Igor Russkikh 
> -M: Pavel Belous 
> +Marvell/Aquantia atlantic
> +M: Igor Russkikh 
>  F: drivers/net/atlantic/
>  F: doc/guides/nics/atlantic.rst
>  F: doc/guides/nics/features/atlantic.ini
> @@ -872,7 +871,7 @@ F: drivers/net/ionic/
>  F: doc/guides/nics/ionic.rst
>  F: doc/guides/nics/features/ionic.ini
> 
> -QLogic bnx2x
> +Marvell QLogic bnx2x
>  M: Rasesh Mody 
>  M: Shahed Shaikh 
>  T: git://dpdk.org/next/dpdk-next-net-mrvl
> @@ -880,7 +879,7 @@ F: drivers/net/bnx2x/
>  F: doc/guides/nics/bnx2x.rst
>  F: doc/guides/nics/features/bnx2x*.ini
> 
> -QLogic qede PMD
> +Marvell QLogic qede PMD
>  M: Rasesh Mody 
>  M: Devendra Singh Rawat 
>  M: Igor Russkikh 
> --
> 2.25.1



[dpdk-dev] [PATCH 1/2] vhost: add unsafe API to drain pkts in async vhost

2021-06-01 Thread Cheng Jiang
Applications need to stop DMA transfers and finish all the in-flight
packets when the guest memory is hot-plugged and async vhost is used.
This patch provides an unsafe API to drain the in-flight packets which
are submitted to the DMA engine in the vhost async data path, and
enables it in the vhost example.

Signed-off-by: Cheng Jiang 
---
 examples/vhost/main.c   | 48 +++-
 examples/vhost/main.h   |  1 +
 lib/vhost/rte_vhost_async.h | 22 +
 lib/vhost/version.map   |  3 ++
 lib/vhost/virtio_net.c  | 90 +++--
 5 files changed, 139 insertions(+), 25 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d2179eadb9..70bb67c7f8 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -851,8 +851,11 @@ complete_async_pkts(struct vhost_dev *vdev)
 
complete_count = rte_vhost_poll_enqueue_completed(vdev->vid,
VIRTIO_RXQ, p_cpl, MAX_PKT_BURST);
-   if (complete_count)
+   if (complete_count) {
free_pkts(p_cpl, complete_count);
+   __atomic_sub_fetch(&vdev->pkts_inflight, complete_count, __ATOMIC_SEQ_CST);
+   }
+
 }
 
 static __rte_always_inline void
@@ -895,6 +898,7 @@ drain_vhost(struct vhost_dev *vdev)
complete_async_pkts(vdev);
ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ,
m, nr_xmit, m_cpu_cpl, &cpu_cpl_nr);
+   __atomic_add_fetch(&vdev->pkts_inflight, ret - cpu_cpl_nr, __ATOMIC_SEQ_CST);
 
if (cpu_cpl_nr)
free_pkts(m_cpu_cpl, cpu_cpl_nr);
@@ -1226,6 +1230,9 @@ drain_eth_rx(struct vhost_dev *vdev)
enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
VIRTIO_RXQ, pkts, rx_count,
m_cpu_cpl, &cpu_cpl_nr);
+   __atomic_add_fetch(&vdev->pkts_inflight, enqueue_count - cpu_cpl_nr,
+   __ATOMIC_SEQ_CST);
+
if (cpu_cpl_nr)
free_pkts(m_cpu_cpl, cpu_cpl_nr);
 
@@ -1397,8 +1404,15 @@ destroy_device(int vid)
"(%d) device has been removed from data core\n",
vdev->vid);
 
-   if (async_vhost_driver)
+   if (async_vhost_driver) {
+   uint16_t n_pkt = 0;
+   struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+   n_pkt = rte_vhost_drain_queue_thread_unsafe(vid, VIRTIO_RXQ, m_cpl,
+   vdev->pkts_inflight);
+
+   free_pkts(m_cpl, n_pkt);
rte_vhost_async_channel_unregister(vid, VIRTIO_RXQ);
+   }
 
rte_free(vdev);
 }
@@ -1487,6 +1501,35 @@ new_device(int vid)
return 0;
 }
 
+static int
+vring_state_changed(int vid, uint16_t queue_id, int enable)
+{
+   struct vhost_dev *vdev = NULL;
+
+   TAILQ_FOREACH(vdev, &vhost_dev_list, global_vdev_entry) {
+   if (vdev->vid == vid)
+   break;
+   }
+   if (!vdev)
+   return -1;
+
+   if (queue_id != VIRTIO_RXQ)
+   return 0;
+
+   if (async_vhost_driver) {
+   if (!enable) {
+   uint16_t n_pkt;
+   struct rte_mbuf *m_cpl[vdev->pkts_inflight];
+
+   n_pkt = rte_vhost_drain_queue_thread_unsafe(vid, queue_id,
+   m_cpl, vdev->pkts_inflight);
+   free_pkts(m_cpl, n_pkt);
+   }
+   }
+
+   return 0;
+}
+
 /*
  * These callback allow devices to be added to the data core when configuration
  * has been fully complete.
@@ -1495,6 +1538,7 @@ static const struct vhost_device_ops virtio_net_device_ops =
 {
.new_device =  new_device,
.destroy_device = destroy_device,
+   .vring_state_changed = vring_state_changed,
 };
 
 /*
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 0ccdce4b4a..e7b1ac60a6 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -51,6 +51,7 @@ struct vhost_dev {
uint64_t features;
size_t hdr_len;
uint16_t nr_vrings;
+   uint16_t pkts_inflight;
struct rte_vhost_memory *mem;
struct device_statistics stats;
TAILQ_ENTRY(vhost_dev) global_vdev_entry;
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 6faa31f5ad..041f40cf04 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -193,4 +193,26 @@ __rte_experimental
 uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
 
+/**
+ * This function checks async completion status and empties all packets
+ * for a specific vhost device queue. Packets which are inflight will
+ * be returned in an array.
+ *
+ * @note This function do

[dpdk-dev] [PATCH 0/2] vhost: handle memory hotplug for async vhost

2021-06-01 Thread Cheng Jiang
When the guest memory is hotplugged, a vhost application which
enables DMA acceleration must stop DMA transfers before the vhost
library re-maps the guest memory.

This patch set provides an unsafe API to drain the in-flight packets
which are submitted to the DMA engine in the vhost async data path,
and notifies the vhost application to stop DMA transfers.

Cheng Jiang (1):
  vhost: add unsafe API to drain pkts in async vhost

Jiayu Hu (1):
  vhost: handle memory hotplug for async vhost

 examples/vhost/main.c   | 48 +++-
 examples/vhost/main.h   |  1 +
 lib/vhost/rte_vhost_async.h | 22 +
 lib/vhost/version.map   |  3 ++
 lib/vhost/vhost_user.c  |  9 
 lib/vhost/virtio_net.c  | 90 +++--
 6 files changed, 148 insertions(+), 25 deletions(-)

--
2.29.2



[dpdk-dev] [PATCH 2/2] vhost: handle memory hotplug for async vhost

2021-06-01 Thread Cheng Jiang
From: Jiayu Hu 

When the guest memory is hotplugged, the vhost application which
enables DMA acceleration must stop DMA transfers before the vhost
re-maps the guest memory.

This patch is to notify the vhost application of stopping DMA
transfers.

Signed-off-by: Jiayu Hu 
---
 lib/vhost/vhost_user.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8f0eba6412..6800e60c2d 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1223,6 +1223,15 @@ vhost_user_set_mem_table(struct virtio_net **pdev, 
struct VhostUserMsg *msg,
vdpa_dev->ops->dev_close(dev->vid);
dev->flags &= ~VIRTIO_DEV_VDPA_CONFIGURED;
}
+
+   /* notify the backend application to stop DMA transfers */
+   if (dev->async_copy && dev->notify_ops->vring_state_changed) {
+   for (i = 0; i < dev->nr_vring; i++) {
+   dev->notify_ops->vring_state_changed(dev->vid,
+   i, 0);
+   }
+   }
+
free_mem_region(dev);
rte_free(dev->mem);
dev->mem = NULL;
-- 
2.29.2



[dpdk-dev] [PATCH v2] lib/vhost: enable IOMMU for async vhost

2021-06-01 Thread xuan . ding
From: Xuan Ding 

For async copy, it is unsafe to directly use the physical address,
and the current address translation from GPA to HPA via SW also
takes CPU cycles; both can benefit from the IOMMU.

Since the existing DMA engines support the platform IOMMU, this
patch enables IOMMU for async vhost, which defines IOAT devices
to use virtual addresses instead of physical addresses.

When set memory table, the frontend's memory will be mapped
to the default container of DPDK where IOAT devices has been
added into. When DMA copy fails, the virtual address provided
to IOAT devices also allow us fallback to SW copy or PA copy.

With IOMMU enabled, to use IOAT devices:
1. IOAT devices must be bound to vfio-pci, rather than igb_uio.
2. DPDK must use "--iova-mode=va".
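Assuming the usual DPDK tooling, that setup might look like the following; the PCI address and the application name are placeholders:

```
# bind the IOAT channel to vfio-pci so that the IOMMU is used
usertools/dpdk-devbind.py --bind=vfio-pci 0000:00:04.0

# run the vhost application in VA IOVA mode
./dpdk-vhost-app --iova-mode=va ...
```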

Signed-off-by: Xuan Ding 
---

v2:
* Fixed a format issue.
* Added the dma unmap logic when device is closed.
---
 doc/guides/prog_guide/vhost_lib.rst |  20 +
 lib/vhost/vhost_user.c  | 125 +---
 lib/vhost/virtio_net.c  |  30 +++
 3 files changed, 69 insertions(+), 106 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index d18fb98910..9891394e50 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -420,3 +420,23 @@ Finally, a set of device ops is defined for device specific operations:
 * ``get_notify_area``
 
   Called to get the notify area info of the queue.
+
+  Vhost async data path
+  ---------------------
+
+* Address mode
+
+Modern IOAT devices support using the IOMMU, which avoids relying on
+the unsafe HPA. Besides, the CPU cycles spent by SW translating from
+GPA to HPA can also be saved. IOAT devices are therefore defined to
+use virtual addresses instead of physical addresses.
+
+With IOMMU enabled, to use IOAT devices:
+1. IOAT devices must be bound to vfio-pci, rather than igb_uio.
+2. DPDK must use ``--iova-mode=va``.
+
+* Fallback
+
+When the DMA copy fails, the user who implements the transfer_data
+callback can fall back to a SW copy, or to a PA copy obtained
+through rte_mem_virt2iova().
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8f0eba6412..1154c7ee24 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "iotlb.h"
 #include "vhost.h"
@@ -141,6 +142,34 @@ get_blk_size(int fd)
return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
 }
 
+static int
+async_dma_map(struct rte_vhost_mem_region *region, bool do_map)
+{
+   int ret = 0;
+   if (do_map) {
+   /* Add mapped region into the default container of DPDK. */
+   ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
+   region->host_user_addr,
+   region->host_user_addr,
+   region->size);
+   if (ret) {
+   VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
+   return ret;
+   }
+   } else {
+   /* Remove mapped region from the default container of DPDK. */
+   ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
+   region->host_user_addr,
+   region->host_user_addr,
+   region->size);
+   if (ret) {
+   VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
+   return ret;
+   }
+   }
+   return ret;
+}
+
 static void
 free_mem_region(struct virtio_net *dev)
 {
@@ -155,6 +184,9 @@ free_mem_region(struct virtio_net *dev)
if (reg->host_user_addr) {
munmap(reg->mmap_addr, reg->mmap_size);
close(reg->fd);
+
+   if (dev->async_copy)
+   async_dma_map(reg, false);
}
}
 }
@@ -866,87 +898,6 @@ vhost_user_set_vring_base(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_OK;
 }
 
-static int
-add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
-  uint64_t host_phys_addr, uint64_t size)
-{
-   struct guest_page *page, *last_page;
-   struct guest_page *old_pages;
-
-   if (dev->nr_guest_pages == dev->max_guest_pages) {
-   dev->max_guest_pages *= 2;
-   old_pages = dev->guest_pages;
-   dev->guest_pages = rte_realloc(dev->guest_pages,
-   dev->max_guest_pages * sizeof(*page),
-   RTE_CACHE_LINE_SIZE);
-   if (dev->guest_pages == NULL) {
-   VHOST_LOG_CONFIG(ERR, "cannot realloc guest_pa
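[Editorial note] The async_dma_map() helper added earlier in this patch registers each guest memory region with the default VFIO container as an identity mapping (IOVA == host_user_addr), and must unmap the same regions on teardown. A toy model of that map/unmap symmetry, with an illustrative in-memory container standing in for rte_vfio_container_dma_map()/rte_vfio_container_dma_unmap() (names and layout are not real DPDK API), might look like:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a VFIO container: records identity (IOVA == VA)
 * mappings the way async_dma_map() registers each guest memory
 * region. Illustrative only. */
#define MAX_MAPS 8

struct toy_container {
	uint64_t va[MAX_MAPS];
	uint64_t len[MAX_MAPS];
	int n;
};

static int toy_dma_map(struct toy_container *c, uint64_t va, uint64_t len)
{
	if (c->n == MAX_MAPS)
		return -1;
	c->va[c->n] = va;   /* identity mapping: IOVA == VA */
	c->len[c->n] = len;
	c->n++;
	return 0;
}

static int toy_dma_unmap(struct toy_container *c, uint64_t va, uint64_t len)
{
	for (int i = 0; i < c->n; i++) {
		if (c->va[i] == va && c->len[i] == len) {
			/* remove by swapping in the last entry */
			c->va[i] = c->va[--c->n];
			c->len[i] = c->len[c->n];
			return 0;
		}
	}
	return -1; /* nothing to unmap: map/unmap calls must stay paired */
}

/* Mirrors the patch's flow: map every region when the memory table is
 * set (do_map == true), unmap the same regions on device close. */
static int map_region(struct toy_container *c, uint64_t host_user_addr,
		      uint64_t size, bool do_map)
{
	return do_map ? toy_dma_map(c, host_user_addr, size)
		      : toy_dma_unmap(c, host_user_addr, size);
}
```

The point the model makes is the pairing invariant: every region mapped on SET_MEM_TABLE must be unmapped exactly once in free_mem_region(), which is why the patch hooks the unmap there.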

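[Editorial note] The commit message notes that, with virtual addresses handed to the IOAT devices, a failed DMA copy can fall back to a SW copy on the same pointers. A minimal sketch of that fallback shape, using a hypothetical DMA-enqueue hook rather than any real IOAT API, might look like:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical DMA-enqueue hook: returns 0 when the job is accepted,
 * negative when the hardware queue is full or rejects it. Stands in
 * for an IOAT enqueue inside a transfer_data callback; not a real
 * DPDK API. */
typedef int (*dma_enqueue_t)(void *dst, const void *src, size_t len);

/* Because src/dst are virtual addresses (IOMMU enabled,
 * --iova-mode=va), a failed DMA enqueue can fall back to a plain CPU
 * memcpy on the same pointers -- no GPA->HPA translation needed. */
static int copy_with_fallback(dma_enqueue_t dma, void *dst,
			      const void *src, size_t len)
{
	if (dma != NULL && dma(dst, src, len) == 0)
		return 0;       /* hardware copy accepted */
	memcpy(dst, src, len);  /* SW fallback path */
	return 1;               /* signal that the CPU did the copy */
}
```

A PA-copy fallback would instead translate the pointers with rte_mem_virt2iova() before resubmitting, as the vhost_lib.rst addition describes.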
[dpdk-dev] DTS Workgroup: MoM 05/26/2021

2021-06-01 Thread Honnappa Nagarahalli
Attendees:
Ashwin Shekar
Brandon Lo
Honnappa Nagarahalli
Lijuan Tu
Juraj Linkes
Owen Hilyard

The meeting announcements are sent to dev@dpdk.org.
Minutes:
Review action items from 5/19/2021
1) Test-pmd investigation is in progress. The patch does not apply with git; 
the 'patch' command must be used instead.
2) DTS master is merged to dts-next (Thanks Lijuan)
3) Compiling the list is in progress. The Excel sheet will capture the test 
cases along with the classes, and the diff if possible. This work is captured 
at [2]
4) A DTS Bugzilla ticket to fix IAVF enablement in the framework has been 
created [3]; Lijuan will get it fixed.
5) LF has added a link to DTS bugs under the DTS links in hosted projects
6) The work item related discussions are captured in [1]

[1] https://docs.google.com/document/d/1c5S0_mZzFvzZfYkqyORLT2-qNvUb-fBdjA6DGusy4yM/edit?usp=sharing
[2] https://docs.google.com/spreadsheets/d/1i7x4ecPiRXNKOrOy0the5WyGSWbdMxPLK7aWbuWI4ew/edit#gid=880561943
[3] https://bugs.dpdk.org/show_bug.cgi?id=715