Re: [dpdk-dev] [PATCH] service: print errors to rte log

2019-08-21 Thread Van Haaren, Harry
> -----Original Message-----
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, August 21, 2019 12:33 AM
> To: Van Haaren, Harry 
> Cc: dev@dpdk.org; Stephen Hemminger 
> Subject: [PATCH] service: print errors to rte log
> 
> EAL should always use rte_log instead of putting errors to
> stderr (which may be redirected to /dev/null in a daemon).
> 
> Also checks for null before rte_free are unnecessary.
> 
> Signed-off-by: Stephen Hemminger 

Thanks - good improvements.

A few nit-picks, I'll send a v2 based on your changes here with
the below notes implemented.

I'll add my Sign-off for code changes, and Acked-by for the whole,
hope that's OK, if you'd prefer two different patches just let me know.

-H

> ---
>  lib/librte_eal/common/rte_service.c | 23 +++
>  1 file changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_eal/common/rte_service.c
> b/lib/librte_eal/common/rte_service.c
> index c3653ebae46c..aa2f8f3ef4b1 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -70,10 +70,12 @@ static struct rte_service_spec_impl *rte_services;
>  static struct core_state *lcore_states;
>  static uint32_t rte_service_library_initialized;
> 
> +
>  int32_t rte_service_init(void)
>  {

Added line here should really split return-value and function into
two lines. Found another instance of this, splitting that too to make
the whole file consistent.

Rest of file uses 1 line to split variable declarations and functions,
so one line will do.



>   if (!rte_services) {
> - printf("error allocating rte services array\n");
> + RTE_LOG(ERR, EAL,
> + "error allocating rte services array\n");
>   goto fail_mem;

Some of these "strings" can be on the same line as RTE_LOG and stay
inside the 80 char limit, moving them up a line for consistency.
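
The NULL checks before rte_free() can go because rte_free(), like free() in standard C, does nothing when passed NULL. A minimal sketch of the same fail_mem cleanup pattern, using plain calloc()/free() so that it builds without DPDK; init_tables(), its parameters and the two static pointers are hypothetical stand-ins, not the code in rte_service.c:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rte_services and lcore_states. */
static int *services;
static int *core_states;

/* Allocate both tables; on failure release whatever was allocated.
 * fail_second forces the second allocation to fail, for demonstration.
 */
int
init_tables(size_t n, int fail_second)
{
	services = calloc(n, sizeof(*services));
	if (!services)
		goto fail_mem;

	core_states = fail_second ? NULL : calloc(n, sizeof(*core_states));
	if (!core_states)
		goto fail_mem;

	return 0;

fail_mem:
	/* free(NULL) is a no-op, so no "if (ptr)" guards are needed. */
	free(services);
	free(core_states);
	services = NULL;
	core_states = NULL;
	return -ENOMEM;
}
```

Calling init_tables(4, 1) takes the fail_mem path with one pointer set and one NULL, and both free() calls are safe.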


[dpdk-dev] [PATCH v2] service: print errors to rte log

2019-08-21 Thread Harry van Haaren
From: Stephen Hemminger 

EAL should always use rte_log instead of putting errors to
stderr (which may be redirected to /dev/null in a daemon).

Also checks for null before rte_free are unnecessary.
Minor code consistency improvements.

Signed-off-by: Stephen Hemminger 
Signed-off-by: Harry van Haaren 
Acked-by: Harry van Haaren 
---
 lib/librte_eal/common/rte_service.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index c3653ebae..fe0907720 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -70,10 +70,12 @@ static struct rte_service_spec_impl *rte_services;
 static struct core_state *lcore_states;
 static uint32_t rte_service_library_initialized;
 
-int32_t rte_service_init(void)
+int32_t
+rte_service_init(void)
 {
if (rte_service_library_initialized) {
-   printf("service library init() called, init flag %d\n",
+   RTE_LOG(NOTICE, EAL,
+   "service library init() called, init flag %d\n",
rte_service_library_initialized);
return -EALREADY;
}
@@ -82,14 +84,14 @@ int32_t rte_service_init(void)
sizeof(struct rte_service_spec_impl),
RTE_CACHE_LINE_SIZE);
if (!rte_services) {
-   printf("error allocating rte services array\n");
+   RTE_LOG(ERR, EAL, "error allocating rte services array\n");
goto fail_mem;
}
 
lcore_states = rte_calloc("rte_service_core_states", RTE_MAX_LCORE,
sizeof(struct core_state), RTE_CACHE_LINE_SIZE);
if (!lcore_states) {
-   printf("error allocating core states array\n");
+   RTE_LOG(ERR, EAL, "error allocating core states array\n");
goto fail_mem;
}
 
@@ -108,10 +110,8 @@ int32_t rte_service_init(void)
rte_service_library_initialized = 1;
return 0;
 fail_mem:
-   if (rte_services)
-   rte_free(rte_services);
-   if (lcore_states)
-   rte_free(lcore_states);
+   rte_free(rte_services);
+   rte_free(lcore_states);
return -ENOMEM;
 }
 
@@ -121,11 +121,8 @@ rte_service_finalize(void)
if (!rte_service_library_initialized)
return;
 
-   if (rte_services)
-   rte_free(rte_services);
-
-   if (lcore_states)
-   rte_free(lcore_states);
+   rte_free(rte_services);
+   rte_free(lcore_states);
 
rte_service_library_initialized = 0;
 }
@@ -397,8 +394,8 @@ rte_service_may_be_active(uint32_t id)
return 0;
 }
 
-int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
-   uint32_t serialize_mt_unsafe)
+int32_t
+rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t serialize_mt_unsafe)
 {
/* run service on calling core, using all-ones as the service mask */
if (!service_valid(id))
-- 
2.17.1



Re: [dpdk-dev] Sync up status for Mellanox PMD barrier investigation

2019-08-21 Thread Phil Yang (Arm Technology China)
Some update for this thread.

The most critical datapath of the mlx5 PMD uses several rte_cio_wmb()/rte_cio_rmb()
barriers ('dmb osh' on aarch64).
C11 atomics are a good replacement for rte_smp_rmb()/rte_smp_wmb(), since they relax
the data synchronization barriers between CPUs.
However, the mlx5 PMD has to write data back to the HW, so it uses many
rte_cio_rmb()/rte_cio_wmb() calls to synchronize that data.
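
To illustrate the CPU-to-CPU case, here is a minimal C11 sketch of publishing data with release/acquire ordering, the kind of pattern that can stand in for an rte_smp_wmb()/rte_smp_rmb() pair. It says nothing about ordering against a device, which is the part the PMD still needs the rte_cio barriers for. The mailbox type and function names are hypothetical:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox shared between two CPU threads. */
struct mailbox {
	uint32_t payload;   /* plain data, written before the flag */
	atomic_uint ready;  /* 0 = empty, 1 = payload is valid */
};

/* Producer: the release store orders the payload write before the flag. */
void
mailbox_publish(struct mailbox *mb, uint32_t value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: the acquire load orders the flag read before the payload read.
 * Returns 1 and fills *value if a payload was available, 0 otherwise.
 */
int
mailbox_poll(struct mailbox *mb, uint32_t *value)
{
	if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
		return 0;
	*value = mb->payload;
	return 1;
}
```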

Please check the details below. All comments are welcome. Thanks.

/// Data path ///
drivers/net/mlx5/mlx5_rxtx.c=950=mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t mbuf_prepare)
drivers/net/mlx5/mlx5_rxtx.c:1002:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1004:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1010:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c=1272=mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
drivers/net/mlx5/mlx5_rxtx.c:1385:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1387:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c=1549=mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
drivers/net/mlx5/mlx5_rxtx.c:1741:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1745:   rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx_vec_neon.h=366=rxq_burst_v(struct mlx5_rxq_data *rxq, struct rte_mbuf **pkts, uint16_t pkts_n,
drivers/net/mlx5/mlx5_rxtx_vec_neon.h:530:   rte_cio_rmb();

Commit messages:
net/mlx5: cleanup memory barriers: mlx5_rx_burst
https://git.dpdk.org/dpdk/commit/?id=9afa3f74658afc0e21fbe5c3884c55a21ff49299

net/mlx5: add Multi-Packet Rx support : mlx5_rx_burst_mprq
https://git.dpdk.org/dpdk/commit/?id=7d6bf6b866b8c25ec06539b3eeed1db4f785577c

net/mlx5: use coherent I/O memory barrier
https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=0cfdc1808de82357a924a479dc3f89de88cd91c2

net/mlx5: extend Rx completion with error handling
https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=88c0733535d6a7ce79045d4d57a1d78d904067c8

net/mlx5: fix synchronization on polling Rx completions
https://git.dpdk.org/dpdk/commit/?id=1742c2d9fab07e66209f2d14e7daa50829fc4423


Thanks,
Phil Yang

From: Phil Yang (Arm Technology China)
Sent: Thursday, August 15, 2019 6:35 PM
To: Honnappa Nagarahalli 
Subject: Sync up status for Mellanox PMD barrier investigation

Hi Honnappa,

I have checked all the barriers in the mlx5 PMD data path. As far as I understand, it
uses the barriers correctly (it uses DMB to synchronize memory data between
CPUs).
The attachment lists the positions of these barriers.
I just want to sync up with you on the status. Do you have any ideas or suggestions
on which part we should start optimizing?

Best Regards,
Phil Yang


[dpdk-dev] [PATCH 0/3] add unit tests for eal vfio library

2019-08-21 Thread Chaitanya Babu Talluri
1/3: fix vfio unmap that fails unexpectedly
2/3: fix vfio unmap that succeeds unexpectedly
3/3: add unit tests for eal vfio

Signed-off-by: Chaitanya Babu Talluri 

Chaitanya Babu Talluri (3):
  lib/eal: fix vfio unmap that fails unexpectedly
  lib/eal: fix vfio unmap that succeeds unexpectedly
  app/test: add unit tests for eal vfio

 app/test/Makefile   |   1 +
 app/test/meson.build|   2 +
 app/test/test_eal_vfio.c| 728 
 lib/librte_eal/linux/eal/eal_vfio.c |  59 ++-
 4 files changed, 783 insertions(+), 7 deletions(-)
 create mode 100644 app/test/test_eal_vfio.c

-- 
2.17.2



Re: [dpdk-dev] [PATCH] eal: remove redundant error output

2019-08-21 Thread Aaron Conole
Stephen Hemminger  writes:

> The function rte_eal_init_alert ends up printing the same message
> twice. Once via RTE_LOG and once to stderr. Remove the fprintf
> to stderr since it is redundant.
>
> Signed-off-by: Stephen Hemminger 
> ---

This was originally added at your suggestion:

http://mails.dpdk.org/archives/dev/2017-January/056431.html

Because sometimes we have these alerts before logging is up (so the
RTE_LOG(...) won't show up, I gather).

Is it possible to have an either/or?
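
One way to picture such an either/or is a helper that emits the message exactly once, through the log if it is available and to stderr otherwise. This is only a sketch of the idea; how EAL would actually know that logging is up is not shown in this thread, so the log_stream pointer and both function names are hypothetical:

```c
#include <stdio.h>

/* Hypothetical log destination; NULL means logging is not set up yet. */
static FILE *log_stream;

void
set_log_stream(FILE *f)
{
	log_stream = f;
}

/* Report a fatal init message exactly once: through the log stream if one
 * is configured, otherwise directly to the fallback stream (e.g. stderr).
 * Returns 1 if the log stream was used, 0 if the fallback was used.
 */
int
init_alert(FILE *fallback, const char *msg)
{
	if (log_stream != NULL) {
		fprintf(log_stream, "EAL: %s\n", msg);
		return 1;
	}
	fprintf(fallback, "EAL: FATAL: %s\n", msg);
	return 0;
}
```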

>  lib/librte_eal/linux/eal/eal.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 946222ccdb7a..076fb3cbde5f 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -949,7 +949,6 @@ static int rte_eal_vfio_setup(void)
>  
>  static void rte_eal_init_alert(const char *msg)
>  {
> - fprintf(stderr, "EAL: FATAL: %s\n", msg);
>   RTE_LOG(ERR, EAL, "%s\n", msg);
>  }


[dpdk-dev] [PATCH 1/3] lib/eal: fix vfio unmap that fails unexpectedly

2019-08-21 Thread Chaitanya Babu Talluri
Unmap of multiple pages fails after a sequence of partial map/unmaps.
The scenario is that multiple maps are created in user_mem_maps,
after multiple map/unmap/remap sequences.

For example,
Steps:
1. Map 3 pages together
2. Un-map page1
3. Re-map page 1
4. Un-map page 2
5. Re-map page 2
6. Un-map page 3
7. Re-map page 3
8. Un-map all pages

Unmap fails when there are duplicate entries in user_mem_maps.

The fix is to validate if the input VA, IOVA exists in
user_mem_maps before creating map.

Fixes: 73a63908 ("vfio: allow to map other memory regions")
Cc: sta...@dpdk.org

Signed-off-by: Chaitanya Babu Talluri 
---
 lib/librte_eal/linux/eal/eal_vfio.c | 46 +
 1 file changed, 46 insertions(+)

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..104912077 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -212,6 +212,41 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
return NULL;
 }
 
+static int
+find_user_mem_map_overlap(struct user_mem_maps *user_mem_maps, uint64_t addr,
+   uint64_t iova, uint64_t len)
+{
+   uint64_t va_end = addr + len;
+   uint64_t iova_end = iova + len;
+   int i;
+
+   for (i = 0; i < user_mem_maps->n_maps; i++) {
+   struct user_mem_map *map = &user_mem_maps->maps[i];
+   uint64_t map_va_end = map->addr + map->len;
+   uint64_t map_iova_end = map->iova + map->len;
+
+   bool no_lo_va_overlap = addr < map->addr && va_end <= map->addr;
+   bool no_hi_va_overlap = addr >= map_va_end &&
+   va_end > map_va_end;
+   bool no_lo_iova_overlap = iova < map->iova &&
+   iova_end <= map->iova;
+   bool no_hi_iova_overlap = iova >= map_iova_end &&
+   iova_end > map_iova_end;
+
+   /* check input VA and iova is not within the
+* existing map's range
+*/
+   if ((no_lo_va_overlap || no_hi_va_overlap) &&
+   (no_lo_iova_overlap || no_hi_iova_overlap))
+   continue;
+   else
+   /* map overlaps */
+   return 1;
+   }
+   /* map doesn't overlap */
+   return 0;
+}
+
 /* this will sort all user maps, and merge/compact any adjacent maps */
 static void
 compact_user_maps(struct user_mem_maps *user_mem_maps)
@@ -1732,6 +1767,17 @@ container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
ret = -1;
goto out;
}
+
+   /* check whether vaddr and iova exists in user_mem_maps */
+   ret = find_user_mem_map_overlap(user_mem_maps, vaddr, iova, len);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Mapping overlaps with a previously "
+   "existing mapping\n");
+   rte_errno = EEXIST;
+   ret = -1;
+   goto out;
+   }
+
/* map the entry */
if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
/* technically, this will fail if there are currently no devices
-- 
2.17.2



[dpdk-dev] [PATCH 2/3] lib/eal: fix vfio unmap that succeeds unexpectedly

2019-08-21 Thread Chaitanya Babu Talluri
Un-map of page with valid virtual address and
another page's IOVA succeeds unexpectedly.
An entry in user_mem_maps can refer multiple pages.
Currently in such case to unmap single page, VA
and IOVA related to entry in user_mem_maps is
checked but not based on page (based on the
page size), this is the cause.

The solution is that in find_user_mem_maps,
check whether user input iova is in relation with
input virtual address of the page which is to be
unmapped.

Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Cc: sta...@dpdk.org

Signed-off-by: Chaitanya Babu Talluri 
---
 lib/librte_eal/linux/eal/eal_vfio.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 104912077..04c284cb2 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -184,13 +184,13 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
uint64_t iova, uint64_t len)
 {
uint64_t va_end = addr + len;
-   uint64_t iova_end = iova + len;
int i;
 
for (i = 0; i < user_mem_maps->n_maps; i++) {
struct user_mem_map *map = &user_mem_maps->maps[i];
uint64_t map_va_end = map->addr + map->len;
-   uint64_t map_iova_end = map->iova + map->len;
+   uint64_t diff_addr_len = addr - map->addr;
+   uint64_t expected_iova = map->iova + diff_addr_len;
 
/* check start VA */
if (addr < map->addr || addr >= map_va_end)
@@ -199,11 +199,10 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
if (va_end <= map->addr || va_end > map_va_end)
continue;
 
-   /* check start IOVA */
-   if (iova < map->iova || iova >= map_iova_end)
-   continue;
-   /* check if IOVA end is within boundaries */
-   if (iova_end <= map->iova || iova_end > map_iova_end)
+   /* check whether user input iova is in sync with
+* user_mem_map entry's iova
+*/
+   if (expected_iova != iova)
continue;
 
/* we've found our map */
-- 
2.17.2



[dpdk-dev] [PATCH 3/3] app/test: add unit tests for eal vfio

2019-08-21 Thread Chaitanya Babu Talluri
Unit test cases are added for eal vfio library.
eal_vfio_autotest added to meson build file.

Signed-off-by: Chaitanya Babu Talluri 
---
 app/test/Makefile|   1 +
 app/test/meson.build |   2 +
 app/test/test_eal_vfio.c | 728 +++
 3 files changed, 731 insertions(+)
 create mode 100644 app/test/test_eal_vfio.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..9b9c78b4e 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -137,6 +137,7 @@ SRCS-y += test_cpuflags.c
 SRCS-y += test_mp_secondary.c
 SRCS-y += test_eal_flags.c
 SRCS-y += test_eal_fs.c
+SRCS-y += test_eal_vfio.c
 SRCS-y += test_alarm.c
 SRCS-y += test_interrupts.c
 SRCS-y += test_version.c
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..2ec9c863a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -36,6 +36,7 @@ test_sources = files('commands.c',
'test_distributor_perf.c',
'test_eal_flags.c',
'test_eal_fs.c',
+   'test_eal_vfio.c',
'test_efd.c',
'test_efd_perf.c',
'test_errno.c',
@@ -175,6 +176,7 @@ fast_test_names = [
 'eal_flags_file_prefix_autotest',
 'eal_flags_misc_autotest',
 'eal_fs_autotest',
+   'eal_vfio_autotest',
 'errno_autotest',
 'event_ring_autotest',
 'func_reentrancy_autotest',
diff --git a/app/test/test_eal_vfio.c b/app/test/test_eal_vfio.c
new file mode 100644
index 0..8995573df
--- /dev/null
+++ b/app/test/test_eal_vfio.c
@@ -0,0 +1,728 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#if !defined(RTE_EXEC_ENV_LINUX) || !defined(RTE_EAL_VFIO)
+static int
+test_eal_vfio(void)
+{
+   printf("VFIO not supported, skipping test\n");
+   return TEST_SKIPPED;
+}
+
+#else
+
+#define PAGESIZE sysconf(_SC_PAGESIZE)
+#define INVALID_CONTAINER_FD -5
+#define THREE_PAGES 3
+#define UNMAPPED_ADDR 0x1500
+
+uint64_t virtaddr_64;
+const char *name = "heap";
+size_t map_length;
+int container_fds[RTE_MAX_VFIO_CONTAINERS];
+
+static int
+check_get_mem(void *addr, rte_iova_t *iova)
+{
+   const struct rte_memseg_list *msl;
+   const struct rte_memseg *ms;
+   rte_iova_t expected_iova;
+
+   msl = rte_mem_virt2memseg_list(addr);
+   if (!msl->external) {
+   printf("%s():%i: Memseg list is not marked as "
+   "external\n", __func__, __LINE__);
+   return -1;
+   }
+   ms = rte_mem_virt2memseg(addr, msl);
+   if (ms == NULL) {
+   printf("%s():%i: Failed to retrieve memseg for "
+   "external mem\n", __func__, __LINE__);
+   return -1;
+   }
+   if (ms->addr != addr) {
+   printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+   return -1;
+   }
+   expected_iova = (iova == NULL) ? RTE_BAD_IOVA : iova[0];
+   if (ms->iova != expected_iova) {
+   printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+   return -1;
+   }
+   return 0;
+}
+
+/* Initialize container fds */
+static int
+initialize_container_fds(void)
+{
+   int i = 0;
+
+   for (i = 0; i < RTE_MAX_VFIO_CONTAINERS; i++)
+   container_fds[i] = -1;
+
+   return TEST_SUCCESS;
+}
+
+/* To test vfio container create */
+static int
+test_vfio_container_create(void)
+{
+   int ret = 0, i = 0;
+
+   /* check max containers limit */
+   for (i = 1; i < RTE_MAX_VFIO_CONTAINERS; i++) {
+   container_fds[i] = rte_vfio_container_create();
+   TEST_ASSERT(container_fds[i] >  0, "Test to check "
+   "rte_vfio_container_create with max "
+   "containers limit: Failed\n");
+   }
+
+   /* check rte_vfio_container_create when exceeds max containers limit */
+   ret = rte_vfio_container_create();
+   TEST_ASSERT(ret == -1, "Test to check "
+   "rte_vfio_container_create container "
+   "when exceeds limit: Failed\n");
+
+   return TEST_SUCCESS;
+}
+
+/* To test vfio container destroy */
+static int
+test_vfio_container_destroy(void)
+{
+   int i = 0, ret = 0;
+
+   /* check to destroy max container limit */
+   for (i = 1; i < RTE_MAX_VFIO_CONTAINERS; i++) {
+   ret = rte_vfio_container_destroy(container_fds[i]);
+   TEST_ASSERT(ret == 0, "Test to check "
+   "rte_vfio_container_destroy: Failed\n");
+   container_fds[i] = -1;
+   }
+
+   /* check rte_vfio_container_destroy with valid but non existing value */
+   ret = rte_vfio_container_destroy(0);
+   TEST_ASSERT(ret == -1, "Test to check rte_vfio_container_destroy with "
+

Re: [dpdk-dev] [PATCH 1/3] lib/eal: fix vfio unmap that fails unexpectedly

2019-08-21 Thread Burakov, Anatoly

On 21-Aug-19 2:02 PM, Chaitanya Babu Talluri wrote:

Unmap of multiple pages fails after a sequence of partial map/unmaps.
The scenario is that multiple maps are created in user_mem_maps,
after multiple map/unmap/remap sequences.

For example,
Steps:
1. Map 3 pages together
2. Un-map page1
3. Re-map page 1
4. Un-map page 2
5. Re-map page 2
6. Un-map page 3
7. Re-map page 3
8. Un-map all pages


I don't think this description is correct in relation to what is being 
fixed here. The code attempts to prevent overlaps, but there are no 
overlaps in the above example - none of the above operations would 
trigger the added code.
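
To make the point concrete: with half-open ranges, a page that is re-mapped after being unmapped sits next to its neighbours but does not overlap them. A minimal sketch of such an overlap test follows; it is not the code from the patch, and the function name is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if half-open ranges [a, a + a_len) and [b, b + b_len) share any byte.
 * Assumes non-zero lengths and that neither range wraps around the end of
 * the address space.
 */
bool
ranges_overlap(uint64_t a, uint64_t a_len, uint64_t b, uint64_t b_len)
{
	return a < b + b_len && b < a + a_len;
}
```

Assuming a 0x1000-byte page, re-mapping page 1 at [0x0, 0x1000) while pages 2 and 3 remain mapped at [0x1000, 0x3000) gives ranges_overlap(0x0, 0x1000, 0x1000, 0x2000) == false, whereas mapping page 2 again while all three pages are mapped gives true.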


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 2/3] lib/eal: fix vfio unmap that succeeds unexpectedly

2019-08-21 Thread Burakov, Anatoly

On 21-Aug-19 2:02 PM, Chaitanya Babu Talluri wrote:

Un-map of page with valid virtual address and
another page's IOVA succeeds unexpectedly.
An entry in user_mem_maps can refer multiple pages.
Currently in such case to unmap single page, VA
and IOVA related to entry in user_mem_maps is
checked but not based on page (based on the
page size), this is the cause.

The solution is that in find_user_mem_maps,
check whether user input iova is in relation with
input virtual address of the page which is to be
unmapped.


The description could be clearer. Suggested rewording:

Unmapping a page with a VA that is found in the list of current mappings
will succeed even if the IOVA for the chunk being unmapped is
mismatched. Fix it by checking that the IOVA matches the expected IOVA
exactly.
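
A minimal sketch of that exact-match check, with a simplified mapping entry. The struct and function name are hypothetical; only the expected-IOVA arithmetic (map IOVA plus the VA offset into the map) follows the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mapping entry: a VA range and the IOVA of its
 * first byte.
 */
struct mem_map {
	uint64_t addr;
	uint64_t iova;
	uint64_t len;
};

/* True if [addr, addr + len) lies inside the map and iova is exactly the
 * IOVA that the map assigns to addr.
 */
bool
chunk_matches_map(const struct mem_map *map, uint64_t addr, uint64_t iova,
		uint64_t len)
{
	/* containment check written so that it cannot overflow */
	if (addr < map->addr || len > map->len ||
			addr - map->addr > map->len - len)
		return false;
	return iova == map->iova + (addr - map->addr);
}
```

With a three-page map starting at VA 0x10000 and IOVA 0x80000, the second page's VA (0x11000) matches only IOVA 0x81000; passing the third page's IOVA (0x82000) is rejected.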


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH 3/3] app/test: add unit tests for eal vfio

2019-08-21 Thread Aaron Conole
Chaitanya Babu Talluri  writes:

> Unit test cases are added for eal vfio library.
> eal_vfio_autotest added to meson build file.
>
> Signed-off-by: Chaitanya Babu Talluri 
> ---

Thanks for adding unit tests for the vfio library.

In this case, there seem to be some failures - can you help determine
the cause?

https://travis-ci.com/ovsrobot/dpdk/jobs/227066776


[dpdk-dev] [PATCH] net/vmxnet3: fix RSS setting on v4

2019-08-21 Thread Eduard Serra Miralles
When RSS is set up through the v4 API, ESX requires
IPv4/6 TCP RSS to be requested.

This patch will:
- Set IPv4/6 TCP RSS when these have not been set. A warning
message is logged to tell the application that IPv4/6 TCP RSS
is being set on its behalf.
- Skip RSS configuration altogether unless MQ_RSS has been
requested, similar to v3.

The alternative (returning an error) was considered; the intent
is to ease the task of setting up and running vmxnet3 in situations
where it's supposed to be most straightforward (testpmd, pktgen).
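
The check-and-force step can be sketched on its own as follows. The flag values here are hypothetical (the real ETH_RSS_* constants differ), as is the function name; the sketch only shows the mask test and the OR that the patch applies to rss_hf:

```c
#include <stdint.h>

/* Hypothetical flag values; the real ETH_RSS_* constants differ. */
#define HF_IPV4_TCP (UINT64_C(1) << 0)
#define HF_IPV6_TCP (UINT64_C(1) << 1)
#define HF_IPV4_UDP (UINT64_C(1) << 2)
#define HF_MANDATORY (HF_IPV4_TCP | HF_IPV6_TCP)

/* Return hf with all mandatory bits set; *changed reports whether any
 * mandatory bit was missing, i.e. whether a warning should be logged.
 */
uint64_t
force_mandatory_rss(uint64_t hf, int *changed)
{
	*changed = (hf & HF_MANDATORY) != HF_MANDATORY;
	return hf | HF_MANDATORY;
}
```

Comparing the masked value against the full mandatory mask, rather than testing for non-zero, is what catches the case where only one of the two TCP bits was requested.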

Signed-off-by: Eduard Serra 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 3 ++-
 drivers/net/vmxnet3/vmxnet3_ethdev.h | 4 
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 8 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 57feb37..0a7047e 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -769,7 +769,8 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
PMD_INIT_LOG(DEBUG, "Failed to setup memory region\n");
}
 
-   if (VMXNET3_VERSION_GE_4(hw)) {
+   if (VMXNET3_VERSION_GE_4(hw) &&
+   dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) {
/* Check for additional RSS  */
ret = vmxnet3_v4_rss_configure(dev);
if (ret != VMXNET3_SUCCESS) {
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
index 8c2b6f8..6e3ce7d 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
@@ -38,6 +38,10 @@
ETH_RSS_NONFRAG_IPV4_UDP | \
ETH_RSS_NONFRAG_IPV6_UDP)
 
+#define VMXNET3_MANDATORY_V4_RSS ( \
+   ETH_RSS_NONFRAG_IPV4_TCP | \
+   ETH_RSS_NONFRAG_IPV6_TCP)
+
 /* RSS configuration structure - shared with device through GPA */
 typedef struct VMXNET3_RSSConf {
uint16_t   hashType;
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 7794d74..dd99684 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -1311,6 +1311,14 @@ vmxnet3_v4_rss_configure(struct rte_eth_dev *dev)
 
cmdInfo->setRSSFields = 0;
port_rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
+
+   if ((port_rss_conf->rss_hf & VMXNET3_MANDATORY_V4_RSS) !=
+   VMXNET3_MANDATORY_V4_RSS) {
+   PMD_INIT_LOG(WARNING, "RSS: IPv4/6 TCP is required for vmxnet3 v4 RSS, "
+"automatically setting it");
+   port_rss_conf->rss_hf |= VMXNET3_MANDATORY_V4_RSS;
+   }
+
rss_hf = port_rss_conf->rss_hf &
(VMXNET3_V4_RSS_MASK | VMXNET3_RSS_OFFLOAD_ALL);
 
-- 
2.7.4



[dpdk-dev] [PATCH] timer: remove check_tsc_flags()

2019-08-21 Thread Jim Harris
This code was added 7+ years ago (commit fb022b85ba),
presumably when variant TSCs were still somewhat
common?  But this code doesn't do anything except print
a warning, and the warning doesn't give any kind of
advice to the user, so let's just remove it.

While the warning has no functional meaning, the
/proc/cpuinfo parsing consumes a non-trivial amount
of time which is especially noticeable in secondary
processes.  On my test system, it consumes
21ms out of the 66ms total execution time for
rte_eal_init() in a secondary process.

Signed-off-by: Jim Harris 
---
 lib/librte_eal/linux/eal/eal_timer.c |   36 --
 1 file changed, 36 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_timer.c b/lib/librte_eal/linux/eal/eal_timer.c
index 76ec17034..a904a8297 100644
--- a/lib/librte_eal/linux/eal/eal_timer.c
+++ b/lib/librte_eal/linux/eal/eal_timer.c
@@ -192,41 +192,6 @@ rte_eal_hpet_init(int make_default)
 }
 #endif
 
-static void
-check_tsc_flags(void)
-{
-   char line[512];
-   FILE *stream;
-
-   stream = fopen("/proc/cpuinfo", "r");
-   if (!stream) {
-   RTE_LOG(WARNING, EAL, "WARNING: Unable to open /proc/cpuinfo\n");
-   return;
-   }
-
-   while (fgets(line, sizeof line, stream)) {
-   char *constant_tsc;
-   char *nonstop_tsc;
-
-   if (strncmp(line, "flags", 5) != 0)
-   continue;
-
-   constant_tsc = strstr(line, "constant_tsc");
-   nonstop_tsc = strstr(line, "nonstop_tsc");
-   if (!constant_tsc || !nonstop_tsc)
-   RTE_LOG(WARNING, EAL,
-   "WARNING: cpu flags "
-   "constant_tsc=%s "
-   "nonstop_tsc=%s "
-   "-> using unreliable clock cycles !\n",
-   constant_tsc ? "yes":"no",
-   nonstop_tsc ? "yes":"no");
-   break;
-   }
-
-   fclose(stream);
-}
-
 uint64_t
 get_tsc_freq(void)
 {
@@ -263,6 +228,5 @@ rte_eal_timer_init(void)
eal_timer_source = EAL_TIMER_TSC;
 
set_tsc_freq();
-   check_tsc_flags();
return 0;
 }



[dpdk-dev] [PATCH v2] timer: remove check_tsc_flags()

2019-08-21 Thread Jim Harris
This code was added 7+ years ago:

commit fb022b85bae4 ("timer: check TSC reliability")

presumably when variant TSCs were still somewhat
common?  But this code doesn't do anything except print
a warning, and the warning doesn't give any kind of
advice to the user, so let's just remove it.

While the warning has no functional meaning, the
/proc/cpuinfo parsing consumes a non-trivial amount
of time which is especially noticeable in secondary
processes.  On my test system, it consumes
21ms out of the 66ms total execution time for
rte_eal_init() in a secondary process.

Signed-off-by: Jim Harris 
---
 lib/librte_eal/linux/eal/eal_timer.c |   36 --
 1 file changed, 36 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_timer.c b/lib/librte_eal/linux/eal/eal_timer.c
index 76ec17034..a904a8297 100644
--- a/lib/librte_eal/linux/eal/eal_timer.c
+++ b/lib/librte_eal/linux/eal/eal_timer.c
@@ -192,41 +192,6 @@ rte_eal_hpet_init(int make_default)
 }
 #endif
 
-static void
-check_tsc_flags(void)
-{
-   char line[512];
-   FILE *stream;
-
-   stream = fopen("/proc/cpuinfo", "r");
-   if (!stream) {
-   RTE_LOG(WARNING, EAL, "WARNING: Unable to open /proc/cpuinfo\n");
-   return;
-   }
-
-   while (fgets(line, sizeof line, stream)) {
-   char *constant_tsc;
-   char *nonstop_tsc;
-
-   if (strncmp(line, "flags", 5) != 0)
-   continue;
-
-   constant_tsc = strstr(line, "constant_tsc");
-   nonstop_tsc = strstr(line, "nonstop_tsc");
-   if (!constant_tsc || !nonstop_tsc)
-   RTE_LOG(WARNING, EAL,
-   "WARNING: cpu flags "
-   "constant_tsc=%s "
-   "nonstop_tsc=%s "
-   "-> using unreliable clock cycles !\n",
-   constant_tsc ? "yes":"no",
-   nonstop_tsc ? "yes":"no");
-   break;
-   }
-
-   fclose(stream);
-}
-
 uint64_t
 get_tsc_freq(void)
 {
@@ -263,6 +228,5 @@ rte_eal_timer_init(void)
eal_timer_source = EAL_TIMER_TSC;
 
set_tsc_freq();
-   check_tsc_flags();
return 0;
 }



[dpdk-dev] [PATCH] timer: use rte_mp_msg to get freq from primary process

2019-08-21 Thread Jim Harris
Ideally, get_tsc_freq_arch() is able to provide the
TSC rate using architecture-specific means.  When that
is not possible, DPDK reverts to calculating the
TSC rate with a 100ms nanosleep or 1s sleep.  This fallback
occurs more frequently in VMs, which often do not have
access to the data they need from arch-specific means
(CPUID leaf 0x15 or MSR 0xCE on x86).

In secondary processes, the extra 100ms is especially
noticeable and consumes the bulk of rte_eal_init()
execution time.  So in secondary processes, if
we cannot get the TSC rate using get_tsc_freq_arch(),
try to get the TSC rate from the primary process
instead using rte_mp_msg.  This is much faster than
100ms.

Reduces rte_eal_init() execution time in a secondary
process from 165ms to 66ms on my test system.
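
The resulting order of preference (arch-specific, then primary process, then measurement) amounts to trying a list of sources until one returns a non-zero rate. A minimal standalone sketch of that chain; the typedef, function names and the example frequencies are hypothetical and no DPDK calls are involved:

```c
#include <stddef.h>
#include <stdint.h>

/* A frequency source returns the TSC rate in Hz, or 0 if it cannot tell. */
typedef uint64_t (*freq_source_t)(void);

/* Try each source in order and return the first non-zero answer. */
uint64_t
first_known_freq(const freq_source_t *sources, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t hz = sources[i]();

		if (hz != 0)
			return hz;
	}
	return 0;
}

/* Hypothetical sources for the usage example. */
uint64_t src_arch_unavailable(void) { return 0; }
uint64_t src_from_primary(void) { return UINT64_C(2400000000); }
uint64_t src_measured(void) { return UINT64_C(2399990000); }
```

With the sources ordered { src_arch_unavailable, src_from_primary, src_measured }, the slow measurement is never called because the primary's answer is accepted first.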

Signed-off-by: Jim Harris 
Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35
---
 lib/librte_eal/common/eal_common_timer.c |   61 ++
 1 file changed, 61 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 145543de7..4c58cea6e 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -15,9 +15,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
+#define EAL_TIMER_MP "eal_timer_mp_sync"
+
+struct timer_mp_param {
+   uint64_t tsc_hz;
+};
+
 /* The frequency of the RDTSC timer resolution */
 static uint64_t eal_tsc_resolution_hz;
 
@@ -74,12 +81,58 @@ estimate_tsc_freq(void)
return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ);
 }
 
+static uint64_t
+get_tsc_freq_from_primary(void)
+{
+   struct rte_mp_msg mp_req = {0};
+   struct rte_mp_reply mp_reply = {0};
+   struct timer_mp_param *r;
+   struct timespec ts = {.tv_sec = 1, .tv_nsec = 0};
+   uint64_t tsc_hz;
+
+   strcpy(mp_req.name, EAL_TIMER_MP);
+   if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) ||
+   mp_reply.nb_received != 1) {
+   tsc_hz = 0;
+   } else {
+   r = (struct timer_mp_param *)mp_reply.msgs[0].param;
+   tsc_hz = r->tsc_hz;
+   }
+
+   free(mp_reply.msgs);
+   return tsc_hz;
+}
+
+static int
+timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg,
+const void *peer)
+{
+   struct rte_mp_msg reply = {0};
+   struct timer_mp_param *r = (struct timer_mp_param *)reply.param;
+
+   r->tsc_hz = eal_tsc_resolution_hz;
+   strcpy(reply.name, EAL_TIMER_MP);
+   reply.len_param = sizeof(*r);
+
+   return rte_mp_reply(&reply, peer);
+}
+
 void
 set_tsc_freq(void)
 {
uint64_t freq;
+   int rc;
 
freq = get_tsc_freq_arch();
+   if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   /* We couldn't get the TSC frequency through arch-specific
+*  means.  If this is a secondary process, try to get the
+*  TSC frequency from the primary process - this will
+*  be much faster than get_tsc_freq() or estimate_tsc_freq()
+*  below.
+*/
+   freq = get_tsc_freq_from_primary();
+   }
if (!freq)
freq = get_tsc_freq();
if (!freq)
@@ -87,6 +140,14 @@ set_tsc_freq(void)
 
RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
eal_tsc_resolution_hz = freq;
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary);
+   if (rc) {
+   RTE_LOG(WARNING, EAL, "Could not register mp_action - "
+   "secondary processes will calculate TSC rate "
+   "independently.\n");
+   }
+   }
 }
 
 void rte_delay_us_callback_register(void (*userfunc)(unsigned int))



[dpdk-dev] [PATCH v2 0/7] ethdev: add new Rx offload flags

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add new Rx offload flags `DEV_RX_OFFLOAD_RSS_HASH` and
`DEV_RX_OFFLOAD_FLOW_MARK`. These flags can be used to
enable/disable PMD writes to rte_mbuf fields `hash.rss` and `hash.fdir.hi`
and also `ol_flags:PKT_RX_RSS` and `ol_flags:PKT_RX_FDIR`.

Add a new packet type set function, `rte_eth_dev_set_supported_ptypes`, which
allows the application to inform PMDs about the packet types it is interested in.
Based on the ptypes requested by the application, PMDs can optimize the Rx path.

For example, if a given PMD doesn't support any packet types that the
application is interested in then the application can disable[1] writes to
`mbuf.packet_type` done by the PMD and use a software ptype parser.
[1] rte_eth_dev_set_supported_ptypes(*port_id*, 0);

v2 Changes:
--
- Update release notes. (Andrew)
- Redo commit logs. (Andrew)
- Disable ptype parsing for unsupported examples. (Jerin)
- Disable RSS write only in generic mode eventdev_pipeline. (Jerin)
- Modify the set_supported_ptypes function to return the successfully set mask
  instead of failure.
- Dropped the set_supported_ptypes changes to drivers by handling it in the
  library layer; interested PMDs can add it in.


Pavan Nikhilesh (7):
  ethdev: add set ptype function
  ethdev: add mbuf RSS update as an offload
  ethdev: add flow action type update as an offload
  drivers/net: update Rx RSS hash offload capabilities
  drivers/net: update Rx flow flag and mark offload capabilities
  examples/eventdev_pipeline: add new Rx RSS hash offload
  examples: disable Rx packet type parsing

 doc/guides/nics/features.rst  |  24 +++-
 doc/guides/rel_notes/release_19_11.rst|   7 ++
 drivers/net/bnxt/bnxt_ethdev.c|   4 +-
 drivers/net/cxgbe/cxgbe.h |   3 +-
 drivers/net/dpaa/dpaa_ethdev.c|   3 +-
 drivers/net/dpaa2/dpaa2_ethdev.c  |   3 +-
 drivers/net/e1000/igb_rxtx.c  |   3 +-
 drivers/net/enic/enic_res.c   |   4 +-
 drivers/net/fm10k/fm10k_ethdev.c  |   3 +-
 drivers/net/hinic/hinic_pmd_ethdev.c  |   3 +-
 drivers/net/i40e/i40e_ethdev.c|   4 +-
 drivers/net/iavf/iavf_ethdev.c|   4 +-
 drivers/net/ice/ice_ethdev.c  |   4 +-
 drivers/net/ixgbe/ixgbe_rxtx.c|   4 +-
 drivers/net/liquidio/lio_ethdev.c |   3 +-
 drivers/net/mlx4/mlx4_rxq.c   |   3 +-
 drivers/net/mlx5/mlx5_rxq.c   |   4 +-
 drivers/net/netvsc/hn_rndis.c |   3 +-
 drivers/net/nfp/nfp_net.c |   3 +-
 drivers/net/octeontx2/otx2_ethdev.c   |   3 +-
 drivers/net/octeontx2/otx2_ethdev.h   |  16 +--
 drivers/net/octeontx2/otx2_flow_parse.c   |   3 +-
 drivers/net/qede/qede_ethdev.c|   3 +-
 drivers/net/sfc/sfc_ef10_essb_rx.c|   3 +-
 drivers/net/sfc/sfc_ef10_rx.c |   3 +-
 drivers/net/sfc/sfc_rx.c  |   4 +-
 drivers/net/thunderx/nicvf_ethdev.h   |   3 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c  |   3 +-
 examples/bbdev_app/main.c |   1 +
 examples/bond/main.c  |   2 +
 examples/distributor/Makefile |   1 +
 examples/distributor/main.c   |   1 +
 examples/eventdev_pipeline/main.c | 114 +
 .../pipeline_worker_generic.c | 118 ++
 .../eventdev_pipeline/pipeline_worker_tx.c| 114 +
 examples/exception_path/Makefile  |   1 +
 examples/exception_path/main.c|   1 +
 examples/flow_classify/flow_classify.c|   1 +
 examples/flow_filtering/Makefile  |   1 +
 examples/flow_filtering/main.c|   1 +
 examples/ip_pipeline/link.c   |   1 +
 examples/ip_reassembly/Makefile   |   1 +
 examples/ip_reassembly/main.c |   1 +
 examples/ipsec-secgw/ipsec-secgw.c|   1 +
 examples/ipv4_multicast/Makefile  |   1 +
 examples/ipv4_multicast/main.c|   1 +
 examples/kni/main.c   |   1 +
 examples/l2fwd-cat/Makefile   |   1 +
 examples/l2fwd-cat/l2fwd-cat.c|   1 +
 examples/l2fwd-crypto/main.c  |   1 +
 examples/l2fwd-jobstats/Makefile  |   1 +
 examples/l2fwd-jobstats/main.c|   1 +
 examples/l2fwd-keepalive/Makefile |   1 +
 examples/l2fwd-keepalive/main.c   |   1 +
 examples/l2fwd/Makefile   |   1 +
 examples/l2fwd/main.c |   1 +
 examples/l3fwd-acl/Makefile   |   1 +
 examples/l3fwd-acl/main.c |   1 +
 examples/l3fwd-power/main.c   |   1 +
 examples/l3fwd-vf/Makefile|   1 +
 examples/l3fwd-vf/main.c  |   1 +
 examples/link_statu

[dpdk-dev] [PATCH v2 1/7] ethdev: add set ptype function

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add `rte_eth_dev_set_supported_ptypes` function that will allow the
application to inform the PMD about the packet types it is interested in.
Based on the ptypes set, PMDs can optimize their Rx path.

- If the application doesn't want any ptype information, it can call
`rte_eth_dev_set_supported_ptypes(ethdev_id, RTE_PTYPE_UNKNOWN)` and the PMD
will set `rte_mbuf::packet_type` to 0.

- If the application doesn't call `rte_eth_dev_set_supported_ptypes`, the PMD
can set `rte_mbuf::packet_type` to any of the ptypes reported by
`rte_eth_dev_get_supported_ptypes`.

- If the application is interested only in the L2/L3 layers, it can inform the
PMD to update `rte_mbuf::packet_type` with the L2/L3 ptype by calling
`rte_eth_dev_set_supported_ptypes(ethdev_id,
RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK)`.
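
The library-layer handling in the diff below boils down to OR-ing the device's
supported ptypes into one mask and intersecting it with the requested mask. The
following standalone model sketches just that step, without any DPDK headers; the
`model_*` names and the sample values are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* List terminator; plays the role RTE_PTYPE_UNKNOWN plays in the patch. */
#define MODEL_PTYPE_UNKNOWN 0u

/* OR together every entry of an UNKNOWN-terminated list of ptypes. */
static uint32_t
model_all_ptype_mask(const uint32_t *ptypes)
{
	uint32_t mask = 0;
	int i;

	if (ptypes == NULL)
		return 0;
	for (i = 0; ptypes[i] != MODEL_PTYPE_UNKNOWN; ++i)
		mask |= ptypes[i];
	return mask;
}

/* Keep only the requested bits that the device reports as supported. */
static uint32_t
model_effective_ptype_mask(const uint32_t *supported, uint32_t requested)
{
	return model_all_ptype_mask(supported) & requested;
}

/* Usage example with made-up ptype values. */
static void
model_ptype_mask_example(void)
{
	const uint32_t supported[] = { 0x1, 0x10, 0x100, MODEL_PTYPE_UNKNOWN };

	/* Request the low byte only: the 0x100 entry is filtered out. */
	assert(model_effective_ptype_mask(supported, 0xff) == 0x11);
	/* A zero mask requests no ptype information at all. */
	assert(model_effective_ptype_mask(supported, 0) == 0);
}
```

In the patch itself the zero-mask case is passed straight to the driver callback
rather than going through the intersection; the model folds both cases together.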

Suggested-by: Konstantin Ananyev 
Signed-off-by: Pavan Nikhilesh 
---
 doc/guides/nics/features.rst | 12 ++---
 doc/guides/rel_notes/release_19_11.rst   |  7 ++
 lib/librte_ethdev/rte_ethdev.c   | 32 
 lib/librte_ethdev/rte_ethdev.h   | 16 
 lib/librte_ethdev/rte_ethdev_core.h  |  6 +
 lib/librte_ethdev/rte_ethdev_version.map |  3 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index c4e128d2f..d4d55f721 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -582,10 +582,14 @@ Supports inner packet L4 checksum.
 Packet type parsing
 ---
 
-Supports packet type parsing and returns a list of supported types.
-
-* **[implements] eth_dev_ops**: ``dev_supported_ptypes_get``.
-* **[related]API**: ``rte_eth_dev_get_supported_ptypes()``.
+Supports packet type parsing and returns a list of supported types. Allows
+application to set ptypes it is interested in.
+
+* **[implements] eth_dev_ops**: ``dev_supported_ptypes_get``,
+  ``dev_supported_ptypes_set``.
+* **[related]API**: ``rte_eth_dev_get_supported_ptypes()``,
+  ``rte_eth_dev_set_supported_ptypes()``.
+* **[provides]   mbuf**: ``mbuf.packet_type``.
 
 
 .. _nic_features_timesync:
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..a7cec1fe8 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,13 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added API in ethdev to set supported packet types**
+
+  *  Added new API ``rte_eth_dev_set_supported_ptypes`` that allows an
+ application to request PMD to set specific ptypes defined
+ through ``rte_eth_dev_set_supported_ptypes`` in ``rte_mbuf::packet_type``.
+  *  This scheme will allow PMDs to avoid lookup to internal ptype table on Rx
+ and thereby improve Rx performance if application wishes do so.
 
 Removed Items
 -
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 17d183e1f..f529cbe9f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -2602,6 +2602,38 @@ rte_eth_dev_get_supported_ptypes(uint16_t port_id, uint32_t ptype_mask,
return j;
 }
 
+uint32_t
+rte_eth_dev_set_supported_ptypes(uint16_t port_id, uint32_t ptype_mask)
+{
+   int i;
+   struct rte_eth_dev *dev;
+   const uint32_t *all_ptypes;
+   uint32_t all_ptype_mask = 0;
+   uint32_t supp_ptype_mask = 0;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_get, 0);
+
+   if (ptype_mask == 0) {
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_set,
+   0);
+   return (*dev->dev_ops->dev_supported_ptypes_set)(dev,
+   ptype_mask);
+   }
+
+   all_ptypes = (*dev->dev_ops->dev_supported_ptypes_get)(dev);
+   if (all_ptypes == NULL)
+   return 0;
+
+   for (i = 0; all_ptypes[i] != RTE_PTYPE_UNKNOWN; ++i)
+   all_ptype_mask |= all_ptypes[i];
+
+   supp_ptype_mask = all_ptype_mask & ptype_mask;
+
+   return (*dev->dev_ops->dev_supported_ptypes_set)(dev, supp_ptype_mask);
+}
+
 void
 rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr)
 {
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index dc6596bc9..1ab0af4d8 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -2431,6 +2431,22 @@ int rte_eth_dev_fw_version_get(uint16_t port_id,
  */
 int rte_eth_dev_get_supported_ptypes(uint16_t port_id, uint32_t ptype_mask,
 uint32_t *ptypes, int num);
+/**
+ * Request Ethernet device to set only specific packet types in the packet.
+ *
+ * Application can use this function to set only specific ptypes

[dpdk-dev] [PATCH v2 2/7] ethdev: add mbuf RSS update as an offload

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` which can be used to
enable/disable PMD writes to `rte_mbuf::hash::rss`.
PMDs notify the validity of `rte_mbuf::hash::rss` to the application
by enabling the `PKT_RX_RSS_HASH` flag in `rte_mbuf::ol_flags`.
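
On the application side, the contract described above means the hash field should be
read only when the validity flag is set. A minimal standalone sketch of that check
follows; `struct model_mbuf`, `MODEL_PKT_RX_RSS_HASH` and `model_flow_hash()` are
hypothetical stand-ins for illustration, not the real `rte_mbuf` definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the PKT_RX_RSS_HASH bit in ol_flags (value is illustrative). */
#define MODEL_PKT_RX_RSS_HASH (1ULL << 1)

/* Reduced model of the two mbuf fields involved. */
struct model_mbuf {
	uint64_t ol_flags;
	uint32_t rss;
};

/*
 * Return the RSS hash if the PMD marked it valid, otherwise a
 * caller-provided fallback (for example a software-computed hash).
 */
static uint32_t
model_flow_hash(const struct model_mbuf *m, uint32_t fallback)
{
	if (m->ol_flags & MODEL_PKT_RX_RSS_HASH)
		return m->rss;
	return fallback;
}

/* Usage example: one mbuf with a valid hash, one without. */
static void
model_flow_hash_example(void)
{
	struct model_mbuf valid = { MODEL_PKT_RX_RSS_HASH, 0xabcd1234u };
	struct model_mbuf stale = { 0, 0xdeadbeefu };

	assert(model_flow_hash(&valid, 7) == 0xabcd1234u);
	/* Flag not set: the stale field contents must be ignored. */
	assert(model_flow_hash(&stale, 7) == 7);
}
```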

Signed-off-by: Pavan Nikhilesh 
Reviewed-by: Andrew Rybchenko 
---
 doc/guides/nics/features.rst   | 2 ++
 lib/librte_ethdev/rte_ethdev.c | 1 +
 lib/librte_ethdev/rte_ethdev.h | 1 +
 3 files changed, 4 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index d4d55f721..f79b69b38 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -274,6 +274,7 @@ Supports RSS hashing on RX.
 
 * **[uses] user config**: ``dev_conf.rxmode.mq_mode`` = ``ETH_MQ_RX_RSS_FLAG``.
 * **[uses] user config**: ``dev_conf.rx_adv_conf.rss_conf``.
+* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_RSS_HASH``.
 * **[provides] rte_eth_dev_info**: ``flow_type_rss_offloads``.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_RSS_HASH``, ``mbuf.rss``.
 
@@ -286,6 +287,7 @@ Inner RSS
 Supports RX RSS hashing on Inner headers.
 
 * **[uses]rte_flow_action_rss**: ``level``.
+* **[uses]rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_RSS_HASH``.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_RSS_HASH``, ``mbuf.rss``.
 
 
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index f529cbe9f..9c5517d5f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -129,6 +129,7 @@ static const struct {
RTE_RX_OFFLOAD_BIT2STR(KEEP_CRC),
RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+   RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 1ab0af4d8..836b30074 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1013,6 +1013,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_KEEP_CRC         0x0001
 #define DEV_RX_OFFLOAD_SCTP_CKSUM       0x0002
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x0004
+#define DEV_RX_OFFLOAD_RSS_HASH         0x0008
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 DEV_RX_OFFLOAD_UDP_CKSUM | \
-- 
2.22.0



[dpdk-dev] [PATCH v2 4/7] drivers/net: update Rx RSS hash offload capabilities

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add DEV_RX_OFFLOAD_RSS_HASH flag for all PMDs that support RSS hash
delivery.

Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/bnxt/bnxt_ethdev.c   |  3 ++-
 drivers/net/cxgbe/cxgbe.h|  3 ++-
 drivers/net/dpaa/dpaa_ethdev.c   |  3 ++-
 drivers/net/dpaa2/dpaa2_ethdev.c |  3 ++-
 drivers/net/e1000/igb_rxtx.c |  3 ++-
 drivers/net/enic/enic_res.c  |  3 ++-
 drivers/net/fm10k/fm10k_ethdev.c |  3 ++-
 drivers/net/hinic/hinic_pmd_ethdev.c |  3 ++-
 drivers/net/i40e/i40e_ethdev.c   |  3 ++-
 drivers/net/iavf/iavf_ethdev.c   |  3 ++-
 drivers/net/ice/ice_ethdev.c |  3 ++-
 drivers/net/ixgbe/ixgbe_rxtx.c   |  3 ++-
 drivers/net/liquidio/lio_ethdev.c|  3 ++-
 drivers/net/mlx4/mlx4_rxq.c  |  3 ++-
 drivers/net/mlx5/mlx5_rxq.c  |  3 ++-
 drivers/net/netvsc/hn_rndis.c|  3 ++-
 drivers/net/nfp/nfp_net.c|  3 ++-
 drivers/net/octeontx2/otx2_ethdev.c  |  3 ++-
 drivers/net/octeontx2/otx2_ethdev.h  | 15 ---
 drivers/net/qede/qede_ethdev.c   |  3 ++-
 drivers/net/sfc/sfc_ef10_essb_rx.c   |  2 +-
 drivers/net/sfc/sfc_ef10_rx.c|  3 ++-
 drivers/net/sfc/sfc_rx.c |  3 ++-
 drivers/net/thunderx/nicvf_ethdev.h  |  3 ++-
 drivers/net/vmxnet3/vmxnet3_ethdev.c |  3 ++-
 25 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 6685ee7d9..6c106baf7 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -160,7 +160,8 @@ static const struct rte_pci_id bnxt_pci_id_map[] = {
 DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM | \
 DEV_RX_OFFLOAD_JUMBO_FRAME | \
 DEV_RX_OFFLOAD_KEEP_CRC | \
-DEV_RX_OFFLOAD_TCP_LRO)
+DEV_RX_OFFLOAD_TCP_LRO | \
+DEV_RX_OFFLOAD_RSS_HASH)
 
 static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask);
 static void bnxt_print_link_info(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h
index 3f97fa58b..22e61a55c 100644
--- a/drivers/net/cxgbe/cxgbe.h
+++ b/drivers/net/cxgbe/cxgbe.h
@@ -47,7 +47,8 @@
   DEV_RX_OFFLOAD_UDP_CKSUM | \
   DEV_RX_OFFLOAD_TCP_CKSUM | \
   DEV_RX_OFFLOAD_JUMBO_FRAME | \
-  DEV_RX_OFFLOAD_SCATTER)
+  DEV_RX_OFFLOAD_SCATTER | \
+  DEV_RX_OFFLOAD_RSS_HASH)
 
 
 #define CXGBE_DEVARG_KEEP_OVLAN "keep_ovlan"
diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
index 7154fb9b4..18c7bd0d5 100644
--- a/drivers/net/dpaa/dpaa_ethdev.c
+++ b/drivers/net/dpaa/dpaa_ethdev.c
@@ -49,7 +49,8 @@
 /* Supported Rx offloads */
 static uint64_t dev_rx_offloads_sup =
DEV_RX_OFFLOAD_JUMBO_FRAME |
-   DEV_RX_OFFLOAD_SCATTER;
+   DEV_RX_OFFLOAD_SCATTER |
+   DEV_RX_OFFLOAD_RSS_HASH;
 
 /* Rx offloads which cannot be disabled */
 static uint64_t dev_rx_offloads_nodis =
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index dd6a78f9f..55a1c4455 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -38,7 +38,8 @@ static uint64_t dev_rx_offloads_sup =
DEV_RX_OFFLOAD_TCP_CKSUM |
DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM |
DEV_RX_OFFLOAD_VLAN_FILTER |
-   DEV_RX_OFFLOAD_JUMBO_FRAME;
+   DEV_RX_OFFLOAD_JUMBO_FRAME |
+   DEV_RX_OFFLOAD_RSS_HASH;
 
 /* Rx offloads which cannot be disabled */
 static uint64_t dev_rx_offloads_nodis =
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index c5606de5d..684fa4ad8 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -1646,7 +1646,8 @@ igb_get_rx_port_offloads_capa(struct rte_eth_dev *dev)
  DEV_RX_OFFLOAD_TCP_CKSUM   |
  DEV_RX_OFFLOAD_JUMBO_FRAME |
  DEV_RX_OFFLOAD_KEEP_CRC|
- DEV_RX_OFFLOAD_SCATTER;
+ DEV_RX_OFFLOAD_SCATTER |
+ DEV_RX_OFFLOAD_RSS_HASH;
 
return rx_offload_capa;
 }
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 9405e1933..607a085f8 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -198,7 +198,8 @@ int enic_get_vnic_config(struct enic *enic)
DEV_RX_OFFLOAD_VLAN_STRIP |
DEV_RX_OFFLOAD_IPV4_CKSUM |
DEV_RX_OFFLOAD_UDP_CKSUM |
-   DEV_RX_OFFLOAD_TCP_CKSUM;
+   DEV_RX_OFFLOAD_TCP_CKSUM |
+   DEV_RX_OFFLOAD_RSS_HASH;
enic->tx_offload_mask =
   

[dpdk-dev] [PATCH v2 5/7] drivers/net: update Rx flow flag and mark capabilities

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add DEV_RX_OFFLOAD_FLOW_MARK flag for all PMDs that support flow action
flag and mark.

Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/bnxt/bnxt_ethdev.c  | 3 ++-
 drivers/net/enic/enic_res.c | 3 ++-
 drivers/net/i40e/i40e_ethdev.c  | 3 ++-
 drivers/net/iavf/iavf_ethdev.c  | 3 ++-
 drivers/net/ice/ice_ethdev.c| 3 ++-
 drivers/net/ixgbe/ixgbe_rxtx.c  | 3 ++-
 drivers/net/mlx5/mlx5_rxq.c | 3 ++-
 drivers/net/octeontx2/otx2_ethdev.h | 3 ++-
 drivers/net/octeontx2/otx2_flow_parse.c | 3 ++-
 drivers/net/sfc/sfc_ef10_essb_rx.c  | 3 ++-
 drivers/net/sfc/sfc_rx.c| 3 ++-
 11 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 6c106baf7..fd1fb7eda 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -161,7 +161,8 @@ static const struct rte_pci_id bnxt_pci_id_map[] = {
 DEV_RX_OFFLOAD_JUMBO_FRAME | \
 DEV_RX_OFFLOAD_KEEP_CRC | \
 DEV_RX_OFFLOAD_TCP_LRO | \
-DEV_RX_OFFLOAD_RSS_HASH)
+DEV_RX_OFFLOAD_RSS_HASH | \
+DEV_RX_OFFLOAD_FLOW_MARK)
 
 static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask);
 static void bnxt_print_link_info(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 607a085f8..3503d5d7e 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -199,7 +199,8 @@ int enic_get_vnic_config(struct enic *enic)
DEV_RX_OFFLOAD_IPV4_CKSUM |
DEV_RX_OFFLOAD_UDP_CKSUM |
DEV_RX_OFFLOAD_TCP_CKSUM |
-   DEV_RX_OFFLOAD_RSS_HASH;
+   DEV_RX_OFFLOAD_RSS_HASH |
+   DEV_RX_OFFLOAD_FLOW_MARK;
enic->tx_offload_mask =
PKT_TX_IPV6 |
PKT_TX_IPV4 |
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7058e0213..6311943be 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3512,7 +3512,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_RX_OFFLOAD_VLAN_EXTEND |
DEV_RX_OFFLOAD_VLAN_FILTER |
DEV_RX_OFFLOAD_JUMBO_FRAME |
-   DEV_RX_OFFLOAD_RSS_HASH;
+   DEV_RX_OFFLOAD_RSS_HASH |
+   DEV_RX_OFFLOAD_FLOW_MARK;
 
dev_info->tx_queue_offload_capa = DEV_TX_OFFLOAD_MBUF_FAST_FREE;
dev_info->tx_offload_capa =
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index aef91a79b..7bdaa87b1 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -518,7 +518,8 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_RX_OFFLOAD_SCATTER |
DEV_RX_OFFLOAD_JUMBO_FRAME |
DEV_RX_OFFLOAD_VLAN_FILTER |
-   DEV_RX_OFFLOAD_RSS_HASH;
+   DEV_RX_OFFLOAD_RSS_HASH |
+   DEV_RX_OFFLOAD_FLOW_MARK;
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
DEV_TX_OFFLOAD_QINQ_INSERT |
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index fc0f0003f..8b8d55e4a 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -2134,7 +2134,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
DEV_RX_OFFLOAD_QINQ_STRIP |
DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM |
DEV_RX_OFFLOAD_VLAN_EXTEND |
-   DEV_RX_OFFLOAD_RSS_HASH;
+   DEV_RX_OFFLOAD_RSS_HASH |
+   DEV_RX_OFFLOAD_FLOW_MARK;
dev_info->tx_offload_capa |=
DEV_TX_OFFLOAD_QINQ_INSERT |
DEV_TX_OFFLOAD_IPV4_CKSUM |
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index fa572d184..1481e2426 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2873,7 +2873,8 @@ ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev)
   DEV_RX_OFFLOAD_JUMBO_FRAME |
   DEV_RX_OFFLOAD_VLAN_FILTER |
   DEV_RX_OFFLOAD_SCATTER |
-  DEV_RX_OFFLOAD_RSS_HASH;
+  DEV_RX_OFFLOAD_RSS_HASH |
+  DEV_RX_OFFLOAD_FLOW_MARK;
 
if (hw->mac.type == ixgbe_mac_82598EB)
offloads |= DEV_RX_OFFLOAD_VLAN_STRIP;
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b5fd57693..1bf01bda3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -369,7 +369,8 @

[dpdk-dev] [PATCH v2 6/7] examples/eventdev_pipeline: add new Rx RSS hash offload

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Since pipeline_generic uses `rte_mbuf::hash::rss`, add the new Rx offload
flag `DEV_RX_OFFLOAD_RSS_HASH` to inform the PMD to copy the RSS hash result
into the mbuf.
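
Requesting an offload like this is usually guarded by a capability check, as the
removed `port_init()` code below does for `DEV_TX_OFFLOAD_MBUF_FAST_FREE`. A
standalone sketch of that pattern follows; `MODEL_RX_OFFLOAD_RSS_HASH` and
`model_select_rx_offloads()` are hypothetical names and the bit values are
illustrative, not the `DEV_RX_OFFLOAD_*` values from `rte_ethdev.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative offload bits (not the real DEV_RX_OFFLOAD_* values). */
#define MODEL_RX_OFFLOAD_SCATTER  (1ULL << 0)
#define MODEL_RX_OFFLOAD_RSS_HASH (1ULL << 1)

/*
 * Return the subset of 'wanted' that the device advertises in
 * 'rx_offload_capa'. If 'missing' is non-NULL, report the wanted
 * bits the device lacks so the caller can warn or fall back.
 */
static uint64_t
model_select_rx_offloads(uint64_t wanted, uint64_t rx_offload_capa,
			 uint64_t *missing)
{
	if (missing != NULL)
		*missing = wanted & ~rx_offload_capa;
	return wanted & rx_offload_capa;
}

/* Usage example: a device that supports scatter but not RSS hash delivery. */
static void
model_select_rx_offloads_example(void)
{
	uint64_t missing = 0;
	uint64_t enabled;

	enabled = model_select_rx_offloads(
		MODEL_RX_OFFLOAD_SCATTER | MODEL_RX_OFFLOAD_RSS_HASH,
		MODEL_RX_OFFLOAD_SCATTER, &missing);
	assert(enabled == MODEL_RX_OFFLOAD_SCATTER);
	assert(missing == MODEL_RX_OFFLOAD_RSS_HASH);
}
```

Whether a missing RSS hash offload should be a hard error for this example
application is a design decision the sketch leaves to the caller.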

Signed-off-by: Pavan Nikhilesh 
---
 Currently, there is no means to retrieve set configuration from an ethdev
 without touching the internal structures of `rte_ethdev`. So, moving port
 configuration into specific pipeline models is the only way.

 examples/eventdev_pipeline/main.c | 113 -
 .../pipeline_worker_generic.c | 118 ++
 .../eventdev_pipeline/pipeline_worker_tx.c| 114 +
 3 files changed, 232 insertions(+), 113 deletions(-)

diff --git a/examples/eventdev_pipeline/main.c b/examples/eventdev_pipeline/main.c
index f4e57f541..a73b61d59 100644
--- a/examples/eventdev_pipeline/main.c
+++ b/examples/eventdev_pipeline/main.c
@@ -242,118 +242,6 @@ parse_app_args(int argc, char **argv)
}
 }

-/*
- * Initializes a given port using global settings and with the RX buffers
- * coming from the mbuf_pool passed as a parameter.
- */
-static inline int
-port_init(uint8_t port, struct rte_mempool *mbuf_pool)
-{
-   struct rte_eth_rxconf rx_conf;
-   static const struct rte_eth_conf port_conf_default = {
-   .rxmode = {
-   .mq_mode = ETH_MQ_RX_RSS,
-   .max_rx_pkt_len = RTE_ETHER_MAX_LEN,
-   },
-   .rx_adv_conf = {
-   .rss_conf = {
-   .rss_hf = ETH_RSS_IP |
- ETH_RSS_TCP |
- ETH_RSS_UDP,
-   }
-   }
-   };
-   const uint16_t rx_rings = 1, tx_rings = 1;
-   const uint16_t rx_ring_size = 512, tx_ring_size = 512;
-   struct rte_eth_conf port_conf = port_conf_default;
-   int retval;
-   uint16_t q;
-   struct rte_eth_dev_info dev_info;
-   struct rte_eth_txconf txconf;
-
-   if (!rte_eth_dev_is_valid_port(port))
-   return -1;
-
-   rte_eth_dev_info_get(port, &dev_info);
-   if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
-   port_conf.txmode.offloads |=
-   DEV_TX_OFFLOAD_MBUF_FAST_FREE;
-   rx_conf = dev_info.default_rxconf;
-   rx_conf.offloads = port_conf.rxmode.offloads;
-
-   port_conf.rx_adv_conf.rss_conf.rss_hf &=
-   dev_info.flow_type_rss_offloads;
-   if (port_conf.rx_adv_conf.rss_conf.rss_hf !=
-   port_conf_default.rx_adv_conf.rss_conf.rss_hf) {
-   printf("Port %u modified RSS hash function based on hardware support,"
-   "requested:%#"PRIx64" configured:%#"PRIx64"\n",
-   port,
-   port_conf_default.rx_adv_conf.rss_conf.rss_hf,
-   port_conf.rx_adv_conf.rss_conf.rss_hf);
-   }
-
-   /* Configure the Ethernet device. */
-   retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
-   if (retval != 0)
-   return retval;
-
-   /* Allocate and set up 1 RX queue per Ethernet port. */
-   for (q = 0; q < rx_rings; q++) {
-   retval = rte_eth_rx_queue_setup(port, q, rx_ring_size,
-   rte_eth_dev_socket_id(port), &rx_conf,
-   mbuf_pool);
-   if (retval < 0)
-   return retval;
-   }
-
-   txconf = dev_info.default_txconf;
-   txconf.offloads = port_conf_default.txmode.offloads;
-   /* Allocate and set up 1 TX queue per Ethernet port. */
-   for (q = 0; q < tx_rings; q++) {
-   retval = rte_eth_tx_queue_setup(port, q, tx_ring_size,
-   rte_eth_dev_socket_id(port), &txconf);
-   if (retval < 0)
-   return retval;
-   }
-
-   /* Display the port MAC address. */
-   struct rte_ether_addr addr;
-   rte_eth_macaddr_get(port, &addr);
-   printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
-  " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
-   (unsigned int)port,
-   addr.addr_bytes[0], addr.addr_bytes[1],
-   addr.addr_bytes[2], addr.addr_bytes[3],
-   addr.addr_bytes[4], addr.addr_bytes[5]);
-
-   /* Enable RX in promiscuous mode for the Ethernet device. */
-   rte_eth_promiscuous_enable(port);
-
-   return 0;
-}
-
-static int
-init_ports(uint16_t num_ports)
-{
-   uint16_t portid;
-
-   if (!cdata.num_mbuf)
-   cdata.num_mbuf = 16384 * num_ports;
-
-   struct rte_mempool *mp = rte_pktmbuf_pool_create("packet_pool",
-   /* mbufs */ cdata.num_mbuf,
-   /* cache_size */ 512,
-   /* priv_size*/ 0,
-

[dpdk-dev] [PATCH v2 3/7] ethdev: add flow action type update as an offload

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Add new Rx offload flag `DEV_RX_OFFLOAD_FLOW_MARK` that can be used to
enable/disable PMD writes to `rte_mbuf::hash::fdir::hi` and
`rte_mbuf::ol_flags` when flow actions `RTE_FLOW_ACTION_MARK` and
`RTE_FLOW_ACTION_FLAG` are enabled.

PMDs notify the validity of `rte_mbuf::hash::fdir::hi` to the application
by enabling the `PKT_RX_FDIR_ID` flag in `rte_mbuf::ol_flags`.

Signed-off-by: Pavan Nikhilesh 
---
 doc/guides/nics/features.rst   | 12 
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  1 +
 lib/librte_ethdev/rte_flow.h   |  6 --
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index f79b69b38..338b19e03 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -594,6 +594,18 @@ application to set ptypes it is interested in.
 * **[provides]   mbuf**: ``mbuf.packet_type``.
 
 
+.. _nic_features_flow_flag_mark:
+
+Flow flag/mark update
+-
+
+Supports flow action type update to ``mbuf.ol_flags`` and ``mbuf.hash.fdir.hi``.
+
+* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_FLOW_MARK``.
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_FDIR``, ``mbuf.ol_flags:PKT_RX_FDIR_ID;``,
+  ``mbuf.hash.fdir.hi``
+
+
 .. _nic_features_timesync:
 
 Timesync
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 9c5517d5f..bcbe06c5c 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -130,6 +130,7 @@ static const struct {
RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM),
RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
RTE_RX_OFFLOAD_BIT2STR(RSS_HASH),
+   RTE_RX_OFFLOAD_BIT2STR(FLOW_MARK),
 };
 
 #undef RTE_RX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 836b30074..44686ec21 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1014,6 +1014,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_SCTP_CKSUM       0x0002
 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x0004
 #define DEV_RX_OFFLOAD_RSS_HASH         0x0008
+#define DEV_RX_OFFLOAD_FLOW_MARK        0x0010
 
 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
 DEV_RX_OFFLOAD_UDP_CKSUM | \
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index b66bf1495..5d9d88d76 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -1316,7 +1316,8 @@ enum rte_flow_action_type {
 
/**
 * Attaches an integer value to packets and sets PKT_RX_FDIR and
-* PKT_RX_FDIR_ID mbuf flags.
+* PKT_RX_FDIR_ID mbuf flags when
+* `rx_mode:offloads:DEV_RX_OFFLOAD_FLOW_MARK` is set.
 *
 * See struct rte_flow_action_mark.
 */
@@ -1324,7 +1325,8 @@ enum rte_flow_action_type {
 
/**
 * Flags packets. Similar to MARK without a specific value; only
-* sets the PKT_RX_FDIR mbuf flag.
+* sets the PKT_RX_FDIR mbuf flag when
+* `rx_mode:offloads:DEV_RX_OFFLOAD_FLOW_MARK` is set
 *
 * No associated configuration structure.
 */
-- 
2.22.0



[dpdk-dev] [PATCH v2 7/7] examples: disable Rx packet type parsing

2019-08-21 Thread pbhagavatula
From: Pavan Nikhilesh 

Disable packet type parsing in examples that don't use
`rte_mbuf::packet_type` by setting ptype_mask to 0 in
`rte_eth_dev_set_supported_ptypes`.

Signed-off-by: Pavan Nikhilesh 
---
 examples/bbdev_app/main.c  | 1 +
 examples/bond/main.c   | 2 ++
 examples/distributor/Makefile  | 1 +
 examples/distributor/main.c| 1 +
 examples/distributor/meson.build   | 1 +
 examples/eventdev_pipeline/main.c  | 1 +
 examples/eventdev_pipeline/meson.build | 1 +
 examples/exception_path/Makefile   | 1 +
 examples/exception_path/main.c | 1 +
 examples/exception_path/meson.build| 1 +
 examples/flow_classify/flow_classify.c | 1 +
 examples/flow_filtering/Makefile   | 1 +
 examples/flow_filtering/main.c | 1 +
 examples/flow_filtering/meson.build| 1 +
 examples/ip_pipeline/link.c| 1 +
 examples/ip_reassembly/Makefile| 1 +
 examples/ip_reassembly/main.c  | 1 +
 examples/ip_reassembly/meson.build | 1 +
 examples/ipsec-secgw/ipsec-secgw.c | 1 +
 examples/ipv4_multicast/Makefile   | 1 +
 examples/ipv4_multicast/main.c | 1 +
 examples/ipv4_multicast/meson.build| 1 +
 examples/kni/main.c| 1 +
 examples/l2fwd-cat/Makefile| 1 +
 examples/l2fwd-cat/l2fwd-cat.c | 1 +
 examples/l2fwd-cat/meson.build | 1 +
 examples/l2fwd-crypto/main.c   | 1 +
 examples/l2fwd-jobstats/Makefile   | 1 +
 examples/l2fwd-jobstats/main.c | 1 +
 examples/l2fwd-jobstats/meson.build| 1 +
 examples/l2fwd-keepalive/Makefile  | 1 +
 examples/l2fwd-keepalive/main.c| 1 +
 examples/l2fwd-keepalive/meson.build   | 1 +
 examples/l2fwd/Makefile| 1 +
 examples/l2fwd/main.c  | 1 +
 examples/l2fwd/meson.build | 1 +
 examples/l3fwd-acl/Makefile| 1 +
 examples/l3fwd-acl/main.c  | 1 +
 examples/l3fwd-acl/meson.build | 1 +
 examples/l3fwd-power/main.c| 1 +
 examples/l3fwd-vf/Makefile | 1 +
 examples/l3fwd-vf/main.c   | 1 +
 examples/l3fwd-vf/meson.build  | 1 +
 examples/link_status_interrupt/Makefile| 1 +
 examples/link_status_interrupt/main.c  | 1 +
 examples/link_status_interrupt/meson.build | 1 +
 examples/load_balancer/Makefile| 1 +
 examples/load_balancer/init.c  | 1 +
 examples/load_balancer/meson.build | 1 +
 examples/packet_ordering/Makefile  | 1 +
 examples/packet_ordering/main.c| 1 +
 examples/packet_ordering/meson.build   | 1 +
 examples/ptpclient/Makefile| 1 +
 examples/ptpclient/meson.build | 1 +
 examples/ptpclient/ptpclient.c | 1 +
 examples/qos_meter/Makefile| 1 +
 examples/qos_meter/main.c  | 2 ++
 examples/qos_meter/meson.build | 1 +
 examples/qos_sched/Makefile| 1 +
 examples/qos_sched/init.c  | 1 +
 examples/qos_sched/meson.build | 1 +
 examples/quota_watermark/qw/Makefile   | 1 +
 examples/quota_watermark/qw/init.c | 1 +
 examples/rxtx_callbacks/main.c | 1 +
 examples/server_node_efd/server/Makefile   | 1 +
 examples/server_node_efd/server/init.c | 1 +
 examples/skeleton/Makefile | 1 +
 examples/skeleton/basicfwd.c   | 1 +
 examples/skeleton/meson.build  | 1 +
 examples/tep_termination/Makefile  | 1 +
 examples/tep_termination/meson.build   | 1 +
 examples/tep_termination/vxlan_setup.c | 1 +
 examples/vhost/Makefile| 1 +
 examples/vhost/main.c  | 1 +
 examples/vm_power_manager/Makefile | 1 +
 examples/vm_power_manager/main.c   | 1 +
 examples/vm_power_manager/meson.build  | 1 +
 examples/vmdq/Makefile | 1 +
 examples/vmdq/main.c   | 1 +
 examples/vmdq/meson.build  | 1 +
 examples/vmdq_dcb/Makefile | 1 +
 examples/vmdq_dcb/main.c   | 1 +
 examples/vmdq_dcb/meson.build  | 1 +
 83 files changed, 85 insertions(+)

diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c
index 9acf666dc..8ae6e4972 100644
--- a/examples/bbdev_app/main.c
+++ b/examples/bbdev_app/main.c
@@ -478,6 +478,7 @@ initialize_ports(struct app_config_params *app_params,
}
 
rte_eth_promiscuous_enable(port_id);
+   rte_eth_dev_set_supported_ptypes(port_id, 0);
 
rte_eth_macaddr_get(port_id, &bbdev_port_eth_addr);
print_mac(port_id, &bbdev_port_eth_addr);
diff --git a/examples/bond/main.c b/examples/bond/main.c
index 1c0df9d46..ffb911fc5 100644
--- a/examples/bond/main.c
+++ b/examples/bond/main

[dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD

2019-08-21 Thread Yongseok Koh
Matan has kindly agreed to replace me as maintainer of the mlx5 PMD.
Good luck!

Signed-off-by: Yongseok Koh 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4100260861..30dbb8be55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -715,8 +715,8 @@ F: doc/guides/nics/mlx4.rst
 F: doc/guides/nics/features/mlx4.ini
 
 Mellanox mlx5
+M: Matan Azrad 
 M: Shahaf Shuler 
-M: Yongseok Koh 
 M: Viacheslav Ovsiienko 
 T: git://dpdk.org/next/dpdk-next-net-mlx
 F: drivers/net/mlx5/
-- 
2.21.0.196.g041f5ea



[dpdk-dev] [PATCH v2] timer: use rte_mp_msg to get freq from primary process

2019-08-21 Thread Jim Harris
Ideally, get_tsc_freq_arch() is able to provide the
TSC rate using architecture-specific means.  When that
is not possible, DPDK reverts to calculating the
TSC rate with a 100ms nanosleep or 1s sleep.  The latter
occurs more frequently in VMs which often do not have
access to the data they need from arch-specific means
(CPUID leaf 0x15 or MSR 0xCE on x86).

In secondary processes, the extra 100ms is especially
noticeable and consumes the bulk of rte_eal_init()
execution time.  So in secondary processes, if
we cannot get the TSC rate using get_tsc_freq_arch(),
try to get the TSC rate from the primary process
instead using rte_mp_msg.  This is much faster than
100ms.

Reduces rte_eal_init() execution time in a secondary
process from 165ms to 66ms on my test system.
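
The lookup order described above (arch-specific, then the primary process, then the
slower measured paths) can be modelled as a chain of sources where the first non-zero
answer wins. The sketch below is a standalone illustration with hypothetical names and
made-up frequencies; it does not use `rte_mp_request_sync()` or any other DPDK call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A frequency source returns the TSC rate in Hz, or 0 if it cannot tell. */
typedef uint64_t (*model_freq_source_fn)(void);

/* Try each source in order and return the first non-zero frequency. */
static uint64_t
model_pick_tsc_freq(const model_freq_source_fn *sources, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t hz = sources[i]();

		if (hz != 0)
			return hz;
	}
	return 0;
}

/* Hypothetical sources with made-up values. */
static uint64_t model_src_arch_unavailable(void) { return 0; }
static uint64_t model_src_from_primary(void) { return 2400000000ULL; }
static uint64_t model_src_measured(void) { return 2399990000ULL; }

/* Usage example: a secondary process in a VM without arch-specific data. */
static void
model_pick_tsc_freq_example(void)
{
	const model_freq_source_fn secondary_order[] = {
		model_src_arch_unavailable,
		model_src_from_primary,
		model_src_measured,
	};

	/* The primary's answer is used, so the slow measurement never runs. */
	assert(model_pick_tsc_freq(secondary_order, 3) == 2400000000ULL);
}
```

In the patch the primary-process step is skipped when the caller is itself the
primary, which corresponds to leaving that entry out of the list.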

Signed-off-by: Jim Harris 
Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35
---
 lib/librte_eal/common/eal_common_timer.c |   62 ++
 1 file changed, 62 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c
index 145543de7..ad965455d 100644
--- a/lib/librte_eal/common/eal_common_timer.c
+++ b/lib/librte_eal/common/eal_common_timer.c
@@ -15,9 +15,17 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "eal_private.h"
 
+#define EAL_TIMER_MP "eal_timer_mp_sync"
+
+struct timer_mp_param {
+   uint64_t tsc_hz;
+};
+
 /* The frequency of the RDTSC timer resolution */
 static uint64_t eal_tsc_resolution_hz;
 
@@ -74,12 +82,58 @@ estimate_tsc_freq(void)
return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ);
 }
 
+static uint64_t
+get_tsc_freq_from_primary(void)
+{
+   struct rte_mp_msg mp_req = {0};
+   struct rte_mp_reply mp_reply = {0};
+   struct timer_mp_param *r;
+   struct timespec ts = {.tv_sec = 1, .tv_nsec = 0};
+   uint64_t tsc_hz;
+
+   strcpy(mp_req.name, EAL_TIMER_MP);
+   if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) ||
+   mp_reply.nb_received != 1) {
+   tsc_hz = 0;
+   } else {
+   r = (struct timer_mp_param *)mp_reply.msgs[0].param;
+   tsc_hz = r->tsc_hz;
+   }
+
+   free(mp_reply.msgs);
+   return tsc_hz;
+}
+
+static int
+timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg,
+const void *peer)
+{
+   struct rte_mp_msg reply = {0};
+   struct timer_mp_param *r = (struct timer_mp_param *)reply.param;
+
+   r->tsc_hz = eal_tsc_resolution_hz;
+   strcpy(reply.name, EAL_TIMER_MP);
+   reply.len_param = sizeof(*r);
+
+   return rte_mp_reply(&reply, peer);
+}
+
 void
 set_tsc_freq(void)
 {
uint64_t freq;
+   int rc;
 
freq = get_tsc_freq_arch();
+   if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   /* We couldn't get the TSC frequency through arch-specific
+*  means.  If this is a secondary process, try to get the
+*  TSC frequency from the primary process - this will
+*  be much faster than get_tsc_freq() or estimate_tsc_freq()
+*  below.
+*/
+   freq = get_tsc_freq_from_primary();
+   }
if (!freq)
freq = get_tsc_freq();
if (!freq)
@@ -87,6 +141,14 @@ set_tsc_freq(void)
 
RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
eal_tsc_resolution_hz = freq;
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary);
+   if (rc && rte_errno != ENOTSUP) {
+   RTE_LOG(WARNING, EAL, "Could not register mp_action - "
+   "secondary processes will calculate TSC rate "
+   "independently.\n");
+   }
+   }
 }
 
 void rte_delay_us_callback_register(void (*userfunc)(unsigned int))



Re: [dpdk-dev] Sync up status for Mellanox PMD barrier investigation

2019-08-21 Thread Phil Yang (Arm Technology China)
Please disregard my last message. It was mistakenly sent to the wrong group. 
Sorry about that.

Thanks,
Phil Yang

> -Original Message-
> From: dev  On Behalf Of Phil Yang (Arm
> Technology China)
> Sent: Wednesday, August 21, 2019 5:58 PM
> To: Honnappa Nagarahalli 
> Cc: dev@dpdk.org; nd 
> Subject: Re: [dpdk-dev] Sync up status for Mellanox PMD barrier
> investigation
> 
> Some update for this thread.
> 
> In the most critical datapath of the mlx5 PMD, some rte_cio_wmb/rte_cio_rmb
> barriers ('dmb osh' on aarch64) are in use.
> C11 atomics are a good replacement for rte_smp_rmb/rte_smp_wmb, relaxing the
> data synchronization barriers between CPUs.
> However, the mlx5 PMD needs to write data back to the HW, so it uses a lot of
> rte_cio_rmb/rte_cio_wmb to synchronize data.
> 
> Please check details below. All comments are welcomed. Thanks.
> 
>  Data path ///
> drivers/net/mlx5/mlx5_rxtx.c=950=mlx5_rx_err_handle(struct
> mlx5_rxq_data *rxq, uint8_t mbuf_prepare)
> drivers/net/mlx5/mlx5_rxtx.c:1002:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c:1004:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c:1010:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c=1272=mlx5_rx_burst(void *dpdk_rxq, struct
> rte_mbuf **pkts, uint16_t pkts_n)
> drivers/net/mlx5/mlx5_rxtx.c:1385:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c:1387:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c=1549=mlx5_rx_burst_mprq(void *dpdk_rxq,
> struct rte_mbuf **pkts, uint16_t pkts_n)
> drivers/net/mlx5/mlx5_rxtx.c:1741:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx.c:1745:   rte_cio_wmb();
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h=366=rxq_burst_v(struct
> mlx5_rxq_data *rxq, struct rte_mbuf **pkts, uint16_t pkts_n,
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h:530:   rte_cio_rmb();
> 
> Commit messages:
> net/mlx5: cleanup memory barriers: mlx5_rx_burst
> https://git.dpdk.org/dpdk/commit/?id=9afa3f74658afc0e21fbe5c3884c55a21
> ff49299
> 
> net/mlx5: add Multi-Packet Rx support : mlx5_rx_burst_mprq
> https://git.dpdk.org/dpdk/commit/?id=7d6bf6b866b8c25ec06539b3eeed1db
> 4f785577c
> 
> net/mlx5: use coherent I/O memory barrier
> https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=0cfdc18
> 08de82357a924a479dc3f89de88cd91c2
> 
> net/mlx5: extend Rx completion with error handling
> https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=88c073
> 3535d6a7ce79045d4d57a1d78d904067c8
> 
> net/mlx5: fix synchronization on polling Rx completions
> https://git.dpdk.org/dpdk/commit/?id=1742c2d9fab07e66209f2d14e7daa508
> 29fc4423
> 
> 
> Thanks,
> Phil Yang
> 
> From: Phil Yang (Arm Technology China)
> Sent: Thursday, August 15, 2019 6:35 PM
> To: Honnappa Nagarahalli 
> Subject: Sync up status for Mellanox PMD barrier investigation
> 
> Hi Honnappa,
> 
> I have checked all the barriers in the mlx5 PMD data path. In my understanding, it
> uses the barriers correctly (using DMB to synchronize memory data
> between CPUs).
> The attachment lists the positions of these barriers.
> I just want to sync up with you on the status. Do you have any ideas or
> suggestions on which part we should start optimizing?
> 
> Best Regards,
> Phil Yang


Re: [dpdk-dev] [PATCH] net/i40e: add checking for messages from VF

2019-08-21 Thread Ye Xiaolong
On 08/20, alvinx.zh...@intel.com wrote:
>From: Alvin Zhang 
>
>If the VF driver in a VM continuously sends invalid messages through the
>mailbox, it wastes CPU cycles in the PF driver and impacts the configuration
>of other VF drivers. This feature counts the invalid and
>unsupported messages from each VF; when the count for a VF
>exceeds the maximum limit, the PF driver ignores all messages from that
>VF for some seconds.
>
>Signed-off-by: Alvin Zhang 
>---
> drivers/net/i40e/i40e_ethdev.c |  80 +
> drivers/net/i40e/i40e_ethdev.h |  30 +++
> drivers/net/i40e/i40e_pf.c | 189 -
> 3 files changed, 258 insertions(+), 41 deletions(-)
>
>diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>index 4e40b7a..045ba49 100644
>--- a/drivers/net/i40e/i40e_ethdev.c
>+++ b/drivers/net/i40e/i40e_ethdev.c
>@@ -44,6 +44,7 @@
> #define ETH_I40E_SUPPORT_MULTI_DRIVER "support-multi-driver"
> #define ETH_I40E_QUEUE_NUM_PER_VF_ARG "queue-num-per-vf"
> #define ETH_I40E_USE_LATEST_VEC   "use-latest-supported-vec"
>+#define ETH_I40E_MAX_VF_WRONG_MSG "vf_max_wrong_msg"
> 
> #define I40E_CLEAR_PXE_WAIT_MS 200
> 
>@@ -406,6 +407,7 @@ static int i40e_sw_tunnel_filter_insert(struct i40e_pf *pf,
>   ETH_I40E_SUPPORT_MULTI_DRIVER,
>   ETH_I40E_QUEUE_NUM_PER_VF_ARG,
>   ETH_I40E_USE_LATEST_VEC,
>+  ETH_I40E_MAX_VF_WRONG_MSG,
>   NULL};
> 
> static const struct rte_pci_id pci_id_i40e_map[] = {
>@@ -1256,6 +1258,82 @@ static inline void i40e_config_automask(struct i40e_pf *pf)
>   return 0;
> }
> 
>+static int
>+read_vf_msg_check_info(__rte_unused const char *key,
>+ const char *value,
>+ void *opaque)
>+{
>+  struct i40e_wrong_vf_msg info;
>+
>+  memset(&info, 0, sizeof(info));
>+  /*
>+   * VF message checking function need 3 parameters, max_invalid,
>+   * max_unsupported and silence_seconds.
>+   * When continuous invalid or unsupported message statistics
>+   * from a VF exceed the limitation of 'max_invalid' or
>+   * 'max_unsupported', PF will ignore any message from that VF for
>+   * 'silence_seconds' seconds.
>+   */
>+  if (sscanf(value, "%u:%u:%lu", &info.max_invalid,
>+  &info.max_unsupport, &info.silence_seconds)
>+  != 3) {
>+  PMD_DRV_LOG(ERR, "vf_max_wrong_msg error! format like: "
>+  "vf_max_wrong_msg=4:6:60");
>+  return -EINVAL;
>+  }
>+
>+  /*
>+   * If invalid or unsupported message checking function is enabled
>+   * by setting max_invalid or max_unsupport variable to not zero,
>+   * 'slience_seconds' must be greater than zero.
>+   */
>+  if ((info.max_invalid | info.max_unsupport) &&

Should this be a logical OR: info.max_invalid || info.max_unsupport?

Also, I'd prefer 'unsupported' in your variable names, since 'unsupport' is
not a valid word.
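
To illustrate both comments, here is a standalone sketch of the parse
and check with the logical operator and the renamed field. It is only
an example: `struct vf_msg_limits` and `parse_vf_msg_limits()` are
made-up names, not the driver's, and the logging is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct i40e_wrong_vf_msg. */
struct vf_msg_limits {
	unsigned int max_invalid;
	unsigned int max_unsupported;
	unsigned long silence_seconds;
};

/* Parse "max_invalid:max_unsupported:silence_seconds", e.g. "4:6:60". */
static int
parse_vf_msg_limits(const char *value, struct vf_msg_limits *out)
{
	struct vf_msg_limits info;

	assert(value != NULL && out != NULL);

	memset(&info, 0, sizeof(info));
	if (sscanf(value, "%u:%u:%lu", &info.max_invalid,
		   &info.max_unsupported, &info.silence_seconds) != 3)
		return -EINVAL;

	/* Logical OR states the intent: "is either limit enabled?" */
	if ((info.max_invalid || info.max_unsupported) &&
	    !info.silence_seconds)
		return -EINVAL;

	*out = info;
	return 0;
}
```

For these unsigned fields the result is the same as with '|', so this
is a readability point rather than a functional one.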

>+  !info.silence_seconds) {
>+  PMD_DRV_LOG(ERR, "vf_max_wrong_msg error! last integer"
>+  " must be larger than zero");
>+  return -EINVAL;
>+  }
>+
>+  memcpy(opaque, &info, sizeof(struct i40e_wrong_vf_msg));
>+  return 0;
>+}
>+
>+static int
>+i40e_parse_vf_msg_check_info(struct rte_eth_dev *dev,
>+  struct i40e_wrong_vf_msg *wrong_info)
>+{
>+  int ret = 0;
>+  int kvargs_count;
>+  struct rte_kvargs *kvlist;
>+
>+  /* reset all to zero */
>+  memset(wrong_info, 0, sizeof(*wrong_info));
>+
>+  if (!dev->device->devargs)
>+  return ret;
>+
>+  kvlist = rte_kvargs_parse(dev->device->devargs->args, valid_keys);
>+  if (!kvlist)
>+  return -EINVAL;
>+
>+  kvargs_count = rte_kvargs_count(kvlist, ETH_I40E_MAX_VF_WRONG_MSG);
>+  if (!kvargs_count)
>+  goto free_end;
>+
>+  if (kvargs_count > 1)
>+  PMD_DRV_LOG(WARNING, "More than one argument \"%s\" and only "
>+  "the first invalid or last valid one is used !",
>+  ETH_I40E_MAX_VF_WRONG_MSG);

How about allowing only one vf_max_wrong_msg argument?

>+
>+  if (rte_kvargs_process(kvlist, ETH_I40E_MAX_VF_WRONG_MSG,
>+  read_vf_msg_check_info, wrong_info) < 0)
>+  ret = -EINVAL;
>+
>+free_end:
>+  rte_kvargs_free(kvlist);
>+  return ret;
>+}
>+
> #define I40E_ALARM_INTERVAL 5 /* us */
> 
> static int
>@@ -1328,6 +1406,8 @@ static inline void i40e_config_automask(struct i40e_pf *pf)
>   return -EIO;
>   }
> 
>+  /* read VF message checking function parameters */
>+  i40e_parse_vf_msg_check_info(dev, &pf->wrong_vf_msg_conf);
>   /* Check if need to support multi-driver */
>   i40e_support_multi_driver(dev);
>   /* Check if users want the latest supported vec path */
>diff --git a/drivers/net/i40e/i40e_ethdev.h b/drive

Re: [dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD

2019-08-21 Thread Shahaf Shuler
Wednesday, August 21, 2019 11:56 PM, Yongseok Koh:
> Subject: [dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD
> 
> Matan thankfully accepted to replace myself as maintainer for mlx5 PMD.
> Good luck!
> 
> Signed-off-by: Yongseok Koh 

Thank you, Koh, for all the hard work and the maintenance of the PMD.

Acked-by: Shahaf Shuler 

> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4100260861..30dbb8be55 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -715,8 +715,8 @@ F: doc/guides/nics/mlx4.rst
>  F: doc/guides/nics/features/mlx4.ini
> 
>  Mellanox mlx5
> +M: Matan Azrad 
>  M: Shahaf Shuler 
> -M: Yongseok Koh 
>  M: Viacheslav Ovsiienko 
>  T: git://dpdk.org/next/dpdk-next-net-mlx
>  F: drivers/net/mlx5/
> --
> 2.21.0.196.g041f5ea



[dpdk-dev] [PATCH 02/13] net/bnxt: prevent device access when device is in reset

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Refactor init and uninit functions so that the driver can fail
the eth_dev_ops callbacks and block access to Tx and Rx queues
when the device is in reset or in an error state.

Transmit and receive queues are freed during reset cleanup and
reallocated during recovery. So we block all data path handling
in this state. The eth_dev dev_started field is updated depending
on the status of the device.

Signed-off-by: Kalesh AP 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Santoshkumar Karanappa Rastapur 
Reviewed-by: Somnath Kotur 
---
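Note for reviewers: the guard pattern added to the eth_dev_ops
callbacks can be modelled in isolation as below. This is a sketch, not
the driver code. `struct bnxt_model`, `model_in_error()` and
`model_configure_op()` are hypothetical names; the two flag values are
copied from the bnxt.h hunk in patch 03/13 of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the bnxt.h hunk of patch 03/13. */
#define BNXT_FLAG_FW_RESET    (1 << 15)
#define BNXT_FLAG_FATAL_ERROR (1 << 16)

/* Reduced stand-in for struct bnxt: only the field the guard reads. */
struct bnxt_model {
	uint32_t flags;
};

/* Mirrors is_bnxt_in_error(): a fatal error wins over a pending reset. */
static int
model_in_error(const struct bnxt_model *bp)
{
	if (bp->flags & BNXT_FLAG_FATAL_ERROR)
		return -EIO;
	if (bp->flags & BNXT_FLAG_FW_RESET)
		return -EBUSY;
	return 0;
}

/* Hypothetical int-returning callback: it propagates the errno. */
static int
model_configure_op(struct bnxt_model *bp)
{
	int rc = model_in_error(bp);

	if (rc)
		return rc;
	/* ... device access would go here ... */
	return 0;
}
```

Callbacks that return void, such as bnxt_dev_info_get_op() in this
patch, can only return early and cannot report which state blocked
them.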
 drivers/net/bnxt/bnxt.h|   1 +
 drivers/net/bnxt/bnxt_ethdev.c | 455 ++---
 drivers/net/bnxt/bnxt_hwrm.c   |   2 -
 drivers/net/bnxt/bnxt_ring.c   |  32 +++
 drivers/net/bnxt/bnxt_ring.h   |   1 +
 drivers/net/bnxt/bnxt_rxq.c|  25 ++
 drivers/net/bnxt/bnxt_rxr.c|  17 ++
 drivers/net/bnxt/bnxt_rxr.h|   2 +
 drivers/net/bnxt/bnxt_stats.c  |  34 ++-
 drivers/net/bnxt/bnxt_txq.c|   7 +
 drivers/net/bnxt/bnxt_txr.c|  27 ++
 drivers/net/bnxt/bnxt_txr.h|   2 +
 12 files changed, 452 insertions(+), 153 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 0c9f994ea..49418cac9 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -465,6 +465,7 @@ struct bnxt {
 
 int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete);
 int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg);
+int is_bnxt_in_error(struct bnxt *bp);
 
 bool is_bnxt_supported(struct rte_eth_dev *dev);
 bool bnxt_stratus_device(struct bnxt *bp);
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 6685ee7d9..33ff4a5a7 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -167,6 +167,16 @@ static void bnxt_print_link_info(struct rte_eth_dev *eth_dev);
 static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu);
 static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev);
 
+int is_bnxt_in_error(struct bnxt *bp)
+{
+   if (bp->flags & BNXT_FLAG_FATAL_ERROR)
+   return -EIO;
+   if (bp->flags & BNXT_FLAG_FW_RESET)
+   return -EBUSY;
+
+   return 0;
+}
+
 /***/
 
 /*
@@ -207,6 +217,10 @@ static int bnxt_alloc_mem(struct bnxt *bp)
 {
int rc;
 
+   rc = bnxt_alloc_ring_grps(bp);
+   if (rc)
+   goto alloc_mem_err;
+
rc = bnxt_alloc_async_ring_struct(bp);
if (rc)
goto alloc_mem_err;
@@ -501,6 +515,9 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
uint16_t max_vnics, i, j, vpool, vrxq;
unsigned int max_rx_rings;
 
+   if (is_bnxt_in_error(bp))
+   return;
+
/* MAC Specifics */
dev_info->max_mac_addrs = bp->max_l2_ctx;
dev_info->max_hash_mac_addrs = 0;
@@ -602,6 +619,10 @@ static int bnxt_dev_configure_op(struct rte_eth_dev *eth_dev)
bp->tx_nr_rings = eth_dev->data->nb_tx_queues;
bp->rx_nr_rings = eth_dev->data->nb_rx_queues;
 
+   rc = is_bnxt_in_error(bp);
+   if (rc)
+   return rc;
+
if (BNXT_VF(bp) && (bp->flags & BNXT_FLAG_NEW_RM)) {
rc = bnxt_hwrm_check_vf_rings(bp);
if (rc) {
@@ -791,8 +812,10 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev)
 
eth_dev->rx_pkt_burst = bnxt_receive_function(eth_dev);
eth_dev->tx_pkt_burst = bnxt_transmit_function(eth_dev);
+
bnxt_enable_int(bp);
bp->flags |= BNXT_FLAG_INIT_DONE;
+   eth_dev->data->dev_started = 1;
bp->dev_stopped = 0;
return 0;
 
@@ -835,6 +858,11 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev)
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
 
+   eth_dev->data->dev_started = 0;
+   /* Prevent crashes when queues are still in use */
+   eth_dev->rx_pkt_burst = &bnxt_dummy_recv_pkts;
+   eth_dev->tx_pkt_burst = &bnxt_dummy_xmit_pkts;
+
bnxt_disable_int(bp);
 
/* disable uio/vfio intr/eventfd mapping */
@@ -889,6 +917,9 @@ static void bnxt_mac_addr_remove_op(struct rte_eth_dev *eth_dev,
struct bnxt_filter_info *filter, *temp_filter;
uint32_t i;
 
+   if (is_bnxt_in_error(bp))
+   return;
+
/*
 * Loop through all VNICs from the specified filter flow pools to
 * remove the corresponding MAC addr filter
@@ -924,6 +955,10 @@ static int bnxt_mac_addr_add_op(struct rte_eth_dev *eth_dev,
struct bnxt_filter_info *filter;
int rc = 0;
 
+   rc = is_bnxt_in_error(bp);
+   if (rc)
+   return rc;
+
if (BNXT_VF(bp) & !BNXT_VF_IS_TRUSTED(bp)) {
PMD_DRV_LOG(ERR, "Cannot add MAC address to a VF interface\n");
return -ENOTSUP;
@@ -969,6 +1004,10 @@ int bnxt_link_update_op(struct rte_eth_dev *eth_dev, i

[dpdk-dev] [PATCH 01/13] net/bnxt: hsi version update

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/hsi_struct_def_dpdk.h | 137 +
 1 file changed, 137 insertions(+)

diff --git a/drivers/net/bnxt/hsi_struct_def_dpdk.h b/drivers/net/bnxt/hsi_struct_def_dpdk.h
index 6c98c1d6d..009571725 100644
--- a/drivers/net/bnxt/hsi_struct_def_dpdk.h
+++ b/drivers/net/bnxt/hsi_struct_def_dpdk.h
@@ -33621,4 +33621,141 @@ struct hwrm_nvm_validate_option_cmd_err {
uint8_t unused_0[7];
 } __attribute__((packed));
 
+/*
+ * hwrm_fw_reset *
+ **/
+
+
+/* hwrm_fw_reset_input (size:192b/24B) */
+struct hwrm_fw_reset_input {
+   /* The HWRM command request type. */
+   uint16_t req_type;
+   /*
+* The completion ring to send the completion event on. This should
+* be the NQ ID returned from the `nq_alloc` HWRM command.
+*/
+   uint16_t cmpl_ring;
+   /*
+* The sequence ID is used by the driver for tracking multiple
+* commands. This ID is treated as opaque data by the firmware and
+* the value is returned in the `hwrm_resp_hdr` upon completion.
+*/
+   uint16_t seq_id;
+   /*
+* The target ID of the command:
+* * 0x0-0xFFF8 - The function ID
+* * 0xFFF8-0xFFFE - Reserved for internal processors
+* * 0x - HWRM
+*/
+   uint16_t target_id;
+   /*
+* A physical address pointer pointing to a host buffer that the
+* command's response data will be written. This can be either a host
+* physical address (HPA) or a guest physical address (GPA) and must
+* point to a physically contiguous block of memory.
+*/
+   uint64_t resp_addr;
+   /* Type of embedded processor. */
+   uint8_t embedded_proc_type;
+   /* Boot Processor */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_BOOT \
+   UINT32_C(0x0)
+   /* Management Processor */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_MGMT \
+   UINT32_C(0x1)
+   /* Network control processor */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_NETCTRL \
+   UINT32_C(0x2)
+   /* RoCE control processor */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_ROCE \
+   UINT32_C(0x3)
+   /*
+* Host (in multi-host environment): This is only valid if requester is IPC.
+* Reinit host hardware resources and PCIe.
+*/
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST \
+   UINT32_C(0x4)
+   /* AP processor complex (in multi-host environment). Use host_idx to control which core is reset */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_AP \
+   UINT32_C(0x5)
+   /* Reset all blocks of the chip (including all processors) */
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_CHIP \
+   UINT32_C(0x6)
+   /*
+* Host (in multi-host environment): This is only valid if requester is IPC.
+* Reinit host hardware resources.
+*/
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST_RESOURCE_REINIT \
+   UINT32_C(0x7)
+   #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_LAST \
+   HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST_RESOURCE_REINIT
+   /* Type of self reset. */
+   uint8_t selfrst_status;
+   /* No Self Reset */
+   #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTNONE \
+   UINT32_C(0x0)
+   /* Self Reset as soon as possible to do so safely */
+   #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTASAP \
+   UINT32_C(0x1)
+   /* Self Reset on PCIe Reset */
+   #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTPCIERST \
+   UINT32_C(0x2)
+   /* Self Reset immediately after notification to all clients. */
+   #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTIMMEDIATE \
+   UINT32_C(0x3)
+   #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_LAST \
+   HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTIMMEDIATE
+   /*
+* Indicate which host is being reset. 0 means first host.
+* Only valid when embedded_proc_type is host in multihost
+* environment
+*/
+   uint8_t host_idx;
+   uint8_t flags;
+   /*
+* When this bit is '1', then the core firmware initiates
+* the reset only after graceful shut down of all registered instances.
+* If not, the device will continue with the existing firmware.
+*/
+   #define HWRM_FW_RESET_INPUT_FLAGS_RESET_GRACEFUL UINT32_C(0x1)
+   uint8_t unused_0[4];
+} __attribute__((packed));
+
+/* hwrm_fw_reset_output (size:128b/16B) */
+struct hwrm_fw_reset_output {
+   /* The specific error status for the command. */
+   uint16_t error_code;
+   /* The HWR

[dpdk-dev] [PATCH 00/13] bnxt patchset to support device error recovery

2019-08-21 Thread Ajit Khaparde
This patchset adds support to monitor the health of the firmware and the
underlying device and recover to an operational state in case of error.
We can also detect if a FW upgrade is in progress and quiesce all
access to the device and recover once FW indicates everything is ready.

Patchset against dpdk-next-net. Please apply.

Kalesh AP (13):
  net/bnxt: hsi version update
  net/bnxt: prevent device access when device is in reset
  net/bnxt: handle reset notify async event from FW
  net/bnxt: inform firmware about IF state changes
  net/bnxt: handle fatal event from FW under error conditions
  net/bnxt: query firmware error recovery capabilities
  net/bnxt: map status registers for FW health monitoring
  net/bnxt: advertise error recovery capability and handle async event
  net/bnxt: add code for periodic FW health monitoring
  net/bnxt: use BIT macro instead of bit fields
  net/bnxt: reschedule the health check alarm correctly
  net/bnxt: add support for FW reset
  net/bnxt: reduce verbosity of logs

 drivers/net/bnxt/bnxt.h| 130 +++-
 drivers/net/bnxt/bnxt_cpr.c|  78 +++
 drivers/net/bnxt/bnxt_cpr.h|  18 +
 drivers/net/bnxt/bnxt_ethdev.c | 817 -
 drivers/net/bnxt/bnxt_hwrm.c   | 200 +-
 drivers/net/bnxt/bnxt_hwrm.h   |   7 +
 drivers/net/bnxt/bnxt_ring.c   |  39 +-
 drivers/net/bnxt/bnxt_ring.h   |   1 +
 drivers/net/bnxt/bnxt_rxq.c|  25 +
 drivers/net/bnxt/bnxt_rxr.c|  17 +
 drivers/net/bnxt/bnxt_rxr.h|   2 +
 drivers/net/bnxt/bnxt_stats.c  |  34 +-
 drivers/net/bnxt/bnxt_txq.c|   7 +
 drivers/net/bnxt/bnxt_txr.c|  27 +
 drivers/net/bnxt/bnxt_txr.h|   2 +
 drivers/net/bnxt/bnxt_util.h   |   4 +
 drivers/net/bnxt/bnxt_vnic.c   |   7 +-
 drivers/net/bnxt/hsi_struct_def_dpdk.h | 137 +
 18 files changed, 1339 insertions(+), 213 deletions(-)

-- 
2.20.1 (Apple Git-117)



[dpdk-dev] [PATCH 03/13] net/bnxt: handle reset notify async event from FW

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

When the FW upgrade is initiated the current instance
of FW issues a HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY
async notification to the driver. On receiving this notification,
the PMD shall quiesce itself and poll on the HWRM_VER_GET FW
command at regular intervals.

Once the VER_GET command succeeds, the driver should go through
the rediscovery process and re-initialize the device.

Also register with FW for the reset notify async event.

Signed-off-by: Kalesh AP 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Somnath Kotur 
---
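Note for reviewers: the timeout derivation in the RESET_NOTIFY handler
can be sketched as below. The two default values come from this patch;
`fw_ticks_to_msecs()` is a hypothetical helper, not a function in the
driver. The sketch returns a 32-bit result as its own choice, whereas
the patch stores the values in the uint16_t fields fw_reset_min_msecs
and fw_reset_max_msecs.

```c
#include <assert.h>
#include <stdint.h>

#define BNXT_MAX_FW_RESET_TIMEOUT 6000 /* ms, from this patch */
#define BNXT_MIN_FW_READY_TIMEOUT 2000 /* ms, from this patch */

/*
 * Hypothetical helper: convert a firmware value given in units of
 * 100 ms to milliseconds, falling back to a default when it is 0.
 * The 32-bit result cannot wrap for any 16-bit input.
 */
static uint32_t
fw_ticks_to_msecs(uint16_t ticks_100ms, uint32_t default_msecs)
{
	return ticks_100ms ? (uint32_t)ticks_100ms * 100 : default_msecs;
}
```

For example, a timestamp value of 50 gives 5000 ms, and a value of 0
selects the 6000 ms or 2000 ms default.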
 drivers/net/bnxt/bnxt.h|  15 +
 drivers/net/bnxt/bnxt_cpr.c|  14 +
 drivers/net/bnxt/bnxt_cpr.h|   1 +
 drivers/net/bnxt/bnxt_ethdev.c | 110 -
 drivers/net/bnxt/bnxt_hwrm.c   |  39 +---
 drivers/net/bnxt/bnxt_hwrm.h   |   2 +
 6 files changed, 158 insertions(+), 23 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 49418cac9..8797b032e 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -333,6 +333,16 @@ struct bnxt_ctx_mem_info {
struct bnxt_ctx_pg_info *tqm_mem[BNXT_MAX_TC_Q];
 };
 
+/* Maximum Firmware Reset bail out value in milliseconds */
+#define BNXT_MAX_FW_RESET_TIMEOUT  6000
+/* Minimum time required for the firmware readiness in milliseconds */
+#define BNXT_MIN_FW_READY_TIMEOUT  2000
+/* Frequency for the firmware readiness check in milliseconds */
+#define BNXT_FW_READY_WAIT_INTERVAL 100
+
+#define US_PER_MS  1000
+#define NS_PER_US  1000
+
 #define BNXT_HWRM_SHORT_REQ_LEN sizeof(struct hwrm_short_input)
 struct bnxt {
void *bar0;
@@ -358,6 +368,8 @@ struct bnxt {
 #define BNXT_FLAG_DFLT_VNIC_SET (1 << 12)
 #define BNXT_FLAG_THOR_CHIP (1 << 13)
 #define BNXT_FLAG_STINGRAY (1 << 14)
+#define BNXT_FLAG_FW_RESET (1 << 15)
+#define BNXT_FLAG_FATAL_ERROR  (1 << 16)
 #define BNXT_FLAG_EXT_STATS_SUPPORTED  (1 << 29)
 #define BNXT_FLAG_NEW_RM   (1 << 30)
 #define BNXT_FLAG_INIT_DONE (1U << 31)
@@ -461,6 +473,9 @@ struct bnxt {
struct bnxt_ptp_cfg *ptp_cfg;
uint16_t vf_resv_strategy;
struct bnxt_ctx_mem_info *ctx;
+
+   uint16_t fw_reset_min_msecs;
+   uint16_t fw_reset_max_msecs;
 };
 
 int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete);
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index 655bcf1a8..cefb5db2a 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -40,6 +40,20 @@ void bnxt_handle_async_event(struct bnxt *bp,
case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED:
PMD_DRV_LOG(INFO, "Port conn async event\n");
break;
+   case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+   /* timestamp_lo/hi values are in units of 100ms */
+   bp->fw_reset_max_msecs = async_cmp->timestamp_hi ?
+   rte_le_to_cpu_16(async_cmp->timestamp_hi) * 100 :
+   BNXT_MAX_FW_RESET_TIMEOUT;
+   bp->fw_reset_min_msecs = async_cmp->timestamp_lo ?
+   async_cmp->timestamp_lo * 100 :
+   BNXT_MIN_FW_READY_TIMEOUT;
+   PMD_DRV_LOG(INFO,
+   "Firmware non-fatal reset event received\n");
+
+   bp->flags |= BNXT_FLAG_FW_RESET;
+   bnxt_dev_reset_and_resume(bp);
+   break;
default:
PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id);
break;
diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h
index 8c6a34b61..4f86e3f60 100644
--- a/drivers/net/bnxt/bnxt_cpr.h
+++ b/drivers/net/bnxt/bnxt_cpr.h
@@ -106,5 +106,6 @@ struct bnxt;
 void bnxt_handle_async_event(struct bnxt *bp, struct cmpl_base *cmp);
 void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmp);
 int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp);
+int bnxt_dev_reset_and_resume(struct bnxt *bp);
 
 #endif
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 33ff4a5a7..1aef227f2 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bnxt.h"
 #include "bnxt_cpr.h"
@@ -166,6 +167,8 @@ static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask);
 static void bnxt_print_link_info(struct rte_eth_dev *eth_dev);
 static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu);
 static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev);
+static int bnxt_init_resources(struct bnxt *bp, bool reconfig_dev);
+static int bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev);
 
 int is_bnxt_in_error(struct bnxt *bp)
 {
@@ -201,19 +204,25 @@ static uint16_t  bnxt_r

[dpdk-dev] [PATCH 04/13] net/bnxt: inform firmware about IF state changes

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Use latest firmware API to inform firmware about IF state changes.
Firmware has the option to clean up resources during IF down and
to require the driver to reserve resources again during IF up.

Signed-off-by: Kalesh AP 
Reviewed-by: Santoshkumar Karanappa Rastapur 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
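Note for reviewers: the two early returns in bnxt_hwrm_if_change() can
be read as a single predicate, sketched below. `should_send_if_change()`
is a hypothetical name used only here. The flag values are the ones
visible in the bnxt.h hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BNXT_FLAG_FW_RESET         (1 << 15)
#define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17)

/*
 * Hypothetical predicate: true when FUNC_DRV_IF_CHANGE would be sent
 * for the requested interface state.
 */
static bool
should_send_if_change(uint32_t flags, bool up)
{
	/* Firmware did not advertise the capability: nothing to send. */
	if (!(flags & BNXT_FLAG_FW_CAP_IF_CHANGE))
		return false;
	/* "Down" is suppressed while a firmware reset is being handled. */
	if (!up && (flags & BNXT_FLAG_FW_RESET))
		return false;
	return true;
}
```

Only the "down" notification is suppressed during reset recovery; an
"up" notification is still sent when the capability is present.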
 drivers/net/bnxt/bnxt.h|  1 +
 drivers/net/bnxt/bnxt_ethdev.c |  4 
 drivers/net/bnxt/bnxt_hwrm.c   | 35 ++
 drivers/net/bnxt/bnxt_hwrm.h   |  1 +
 4 files changed, 41 insertions(+)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 8797b032e..394a2a941 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -370,6 +370,7 @@ struct bnxt {
 #define BNXT_FLAG_STINGRAY (1 << 14)
 #define BNXT_FLAG_FW_RESET (1 << 15)
 #define BNXT_FLAG_FATAL_ERROR  (1 << 16)
+#define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17)
 #define BNXT_FLAG_EXT_STATS_SUPPORTED  (1 << 29)
 #define BNXT_FLAG_NEW_RM   (1 << 30)
 #define BNXT_FLAG_INIT_DONE (1U << 31)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 1aef227f2..f7b2ef179 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -803,6 +803,8 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev)
bp->rx_cp_nr_rings, RTE_ETHDEV_QUEUE_STAT_CNTRS);
}
 
+   bnxt_hwrm_if_change(bp, 1);
+
rc = bnxt_init_chip(bp);
if (rc)
goto error;
@@ -829,6 +831,7 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev)
return 0;
 
 error:
+   bnxt_hwrm_if_change(bp, 0);
bnxt_shutdown_nic(bp);
bnxt_free_tx_mbufs(bp);
bnxt_free_rx_mbufs(bp);
@@ -895,6 +898,7 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev)
bnxt_free_tx_mbufs(bp);
bnxt_free_rx_mbufs(bp);
bnxt_shutdown_nic(bp);
+   bnxt_hwrm_if_change(bp, 0);
bp->dev_stopped = 1;
 }
 
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index b27dbe87e..17c7b5e9e 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -716,6 +716,11 @@ int bnxt_hwrm_func_driver_register(struct bnxt *bp)
rc = bnxt_hwrm_send_message(bp, &req, sizeof(req), BNXT_USE_CHIMP_MB);
 
HWRM_CHECK_RESULT();
+
+   flags = rte_le_to_cpu_32(resp->flags);
+   if (flags & HWRM_FUNC_DRV_RGTR_OUTPUT_FLAGS_IF_CHANGE_SUPPORTED)
+   bp->flags |= BNXT_FLAG_FW_CAP_IF_CHANGE;
+
HWRM_UNLOCK();
 
bp->flags |= BNXT_FLAG_REGISTERED;
@@ -4649,3 +4654,33 @@ int bnxt_hwrm_set_mac(struct bnxt *bp)
 
return rc;
 }
+
+int bnxt_hwrm_if_change(struct bnxt *bp, bool state)
+{
+   struct hwrm_func_drv_if_change_output *resp = bp->hwrm_cmd_resp_addr;
+   struct hwrm_func_drv_if_change_input req = {0};
+   int rc;
+
+   if (!(bp->flags & BNXT_FLAG_FW_CAP_IF_CHANGE))
+   return 0;
+
+   /* Do not issue FUNC_DRV_IF_CHANGE during reset recovery.
+* If we issue FUNC_DRV_IF_CHANGE with flags down before
+* FUNC_DRV_UNRGTR, FW resets before FUNC_DRV_UNRGTR
+*/
+   if (!state && (bp->flags & BNXT_FLAG_FW_RESET))
+   return 0;
+
+   HWRM_PREP(req, FUNC_DRV_IF_CHANGE, BNXT_USE_CHIMP_MB);
+
+   if (state)
+   req.flags =
+   rte_cpu_to_le_32(HWRM_FUNC_DRV_IF_CHANGE_INPUT_FLAGS_UP);
+
+   rc = bnxt_hwrm_send_message(bp, &req, sizeof(req), BNXT_USE_CHIMP_MB);
+
+   HWRM_CHECK_RESULT();
+   HWRM_UNLOCK();
+
+   return rc;
+}
diff --git a/drivers/net/bnxt/bnxt_hwrm.h b/drivers/net/bnxt/bnxt_hwrm.h
index a03620532..2f57e950b 100644
--- a/drivers/net/bnxt/bnxt_hwrm.h
+++ b/drivers/net/bnxt/bnxt_hwrm.h
@@ -201,4 +201,5 @@ int bnxt_hwrm_tunnel_redirect_query(struct bnxt *bp, uint32_t *type);
 int bnxt_hwrm_tunnel_redirect_info(struct bnxt *bp, uint8_t tun_type,
   uint16_t *dst_fid);
 int bnxt_hwrm_set_mac(struct bnxt *bp);
+int bnxt_hwrm_if_change(struct bnxt *bp, bool state);
 #endif
-- 
2.20.1 (Apple Git-117)



[dpdk-dev] [PATCH 08/13] net/bnxt: advertise error recovery capability and handle async event

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

1. Advertise the HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_ERROR_RECOVERY_SUPPORT flag
   in the FUNC_DRV_RGTR command.
2. Request the async event ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY
   in the FUNC_DRV_RGTR command.
3. Handle the async event EVENT_ID_ERROR_RECOVERY from FW.

Error recovery support will be used by firmware only if all the driver
instances support the error recovery process.

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Signed-off-by: Ajit Khaparde 
---
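Note for reviewers: the way the ERROR_RECOVERY handler mirrors the
event bits into the recovery flags can be sketched as below. The two
BNXT_FLAG_* values are taken from this patch. `EVT_MASTER_FUNC` and
`EVT_RECOVERY_ENABLED` are assumed bit values for illustration only,
since the real ones come from hsi_struct_def_dpdk.h, and
`apply_recovery_event()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* From this patch (struct bnxt_error_recovery_info flags). */
#define BNXT_FLAG_MASTER_FUNC      (1u << 2)
#define BNXT_FLAG_RECOVERY_ENABLED (1u << 3)

/* Assumed event bit values, for illustration only. */
#define EVT_MASTER_FUNC      0x1u
#define EVT_RECOVERY_ENABLED 0x2u

/*
 * Hypothetical helper: set or clear each recovery flag according to
 * the matching event bit, leaving all other flag bits untouched.
 */
static uint32_t
apply_recovery_event(uint32_t flags, uint32_t event_data)
{
	if (event_data & EVT_MASTER_FUNC)
		flags |= BNXT_FLAG_MASTER_FUNC;
	else
		flags &= ~BNXT_FLAG_MASTER_FUNC;

	if (event_data & EVT_RECOVERY_ENABLED)
		flags |= BNXT_FLAG_RECOVERY_ENABLED;
	else
		flags &= ~BNXT_FLAG_RECOVERY_ENABLED;

	return flags;
}
```

Each event fully overwrites the two bits, so a later event with a bit
cleared turns the corresponding state off again.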
 drivers/net/bnxt/bnxt.h  |  2 ++
 drivers/net/bnxt/bnxt_cpr.c  | 45 
 drivers/net/bnxt/bnxt_cpr.h  | 12 ++
 drivers/net/bnxt/bnxt_hwrm.c |  5 
 drivers/net/bnxt/bnxt_hwrm.h |  2 ++
 5 files changed, 66 insertions(+)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 1da09569d..f9147a9a8 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -365,6 +365,8 @@ struct bnxt_error_recovery_info {
uint8_t delay_after_reset[BNXT_NUM_RESET_REG];
 #define BNXT_FLAG_ERROR_RECOVERY_HOST  (1 << 0)
 #define BNXT_FLAG_ERROR_RECOVERY_CO_CPU (1 << 1)
+#define BNXT_FLAG_MASTER_FUNC  (1 << 2)
+#define BNXT_FLAG_RECOVERY_ENABLED (1 << 3)
uint32_t flags;
 };
 
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index 6e0b1d67e..7f5b3314e 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -20,6 +20,7 @@ void bnxt_handle_async_event(struct bnxt *bp,
struct hwrm_async_event_cmpl *async_cmp =
(struct hwrm_async_event_cmpl *)cmp;
uint16_t event_id = rte_le_to_cpu_16(async_cmp->event_id);
+   struct bnxt_error_recovery_info *info;
uint32_t event_data;
 
/* TODO: HWRM async events are not defined yet */
@@ -63,6 +64,31 @@ void bnxt_handle_async_event(struct bnxt *bp,
bp->flags |= BNXT_FLAG_FW_RESET;
bnxt_dev_reset_and_resume(bp);
break;
+   case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY:
+   info = bp->recovery_info;
+
+   if (!info)
+   return;
+
+   PMD_DRV_LOG(INFO, "Error recovery async event received\n");
+
+   event_data = rte_le_to_cpu_32(async_cmp->event_data1) &
+   EVENT_DATA1_FLAGS_MASK;
+
+   if (event_data & EVENT_DATA1_FLAGS_MASTER_FUNC)
+   info->flags |= BNXT_FLAG_MASTER_FUNC;
+   else
+   info->flags &= ~BNXT_FLAG_MASTER_FUNC;
+
+   if (event_data & EVENT_DATA1_FLAGS_RECOVERY_ENABLED)
+   info->flags |= BNXT_FLAG_RECOVERY_ENABLED;
+   else
+   info->flags &= ~BNXT_FLAG_RECOVERY_ENABLED;
+
+   PMD_DRV_LOG(INFO, "recovery enabled(%d), master function(%d)\n",
+   bnxt_is_recovery_enabled(bp),
+   bnxt_is_master_func(bp));
+   break;
default:
PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id);
break;
@@ -184,3 +210,22 @@ int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp)
 
return evt;
 }
+
+bool bnxt_is_master_func(struct bnxt *bp)
+{
+   if (bp->recovery_info->flags & BNXT_FLAG_MASTER_FUNC)
+   return true;
+
+   return false;
+}
+
+bool bnxt_is_recovery_enabled(struct bnxt *bp)
+{
+   struct bnxt_error_recovery_info *info;
+
+   info = bp->recovery_info;
+   if (info && (info->flags & BNXT_FLAG_RECOVERY_ENABLED))
+   return true;
+
+   return false;
+}
diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h
index 4e63fd12f..22fba5b40 100644
--- a/drivers/net/bnxt/bnxt_cpr.h
+++ b/drivers/net/bnxt/bnxt_cpr.h
@@ -113,4 +113,16 @@ int bnxt_dev_reset_and_resume(struct bnxt *bp);
 #define EVENT_DATA1_REASON_CODE_MASK   \
HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK
 
+#define EVENT_DATA1_FLAGS_MASK \
+   HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASK
+
+#define EVENT_DATA1_FLAGS_MASTER_FUNC  \
+   HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC
+
+#define EVENT_DATA1_FLAGS_RECOVERY_ENABLED \
+   HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_RECOVERY_ENABLED
+
+bool bnxt_is_recovery_enabled(struct bnxt *bp);
+bool bnxt_is_master_func(struct bnxt *bp);
+
 #endif
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 2d9c43c98..350e867bf 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -685,6 +685,8 @@ int bnxt_hwrm_func_driver_register(struct bnxt *bp)
return 0;
 
flags = HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_HOT_RESET_SUPPORT;
+   if (bp->flags & BNXT_FLAG_FW_CAP_ERROR_RECOVERY)
+   

[dpdk-dev] [PATCH 12/13] net/bnxt: add support for FW reset

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Add code to perform FW_RESET. When the driver detects an error in FW,
it has to initiate recovery by resetting the cores. FW advertises
the method to use for a core reset, along with the reset register
offsets and values, in the response to the HWRM_ERROR_RECOVERY_QCFG
command.

There are 2 ways to recover from the error.
1. The master function issues core resets to recover from the error.
2. The master function detects the chimp dead condition and notifies
   the Kong processor through the FW_RESET HWRM command. The Kong
   processor then sends a RESET_NOTIFY async event with
   REASON_CODE_FW_EXCEPTION_FATAL to all the PFs/VFs, telling them
   that chimp is dead and that it is going to reset the chimp.
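
The choice between the two methods can be sketched as below. This is an
illustration only: recovery_caps, reset_path and choose_reset_path() are
names invented here, and the flag values simply mirror
BNXT_FLAG_ERROR_RECOVERY_HOST and BNXT_FLAG_ERROR_RECOVERY_CO_CPU.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define RECOVERY_HOST    (UINT32_C(1) << 0)  /* master function writes reset regs */
#define RECOVERY_CO_CPU  (UINT32_C(1) << 1)  /* Kong processor resets chimp */

enum reset_path { RESET_PATH_NONE, RESET_PATH_HOST_REGS, RESET_PATH_CO_CPU };

/* Hypothetical container for what FW advertised. */
struct recovery_caps {
	uint32_t flags;
};

/* Only the master function, with recovery enabled, may start a reset. */
static enum reset_path choose_reset_path(const struct recovery_caps *caps,
					 int is_master, int recovery_enabled)
{
	assert(caps != NULL);
	if (!is_master || !recovery_enabled)
		return RESET_PATH_NONE;
	if (caps->flags & RECOVERY_HOST)
		return RESET_PATH_HOST_REGS;
	if (caps->flags & RECOVERY_CO_CPU)
		return RESET_PATH_CO_CPU;
	return RESET_PATH_NONE;
}
```

Checking the host method first matches the order used in
bnxt_fw_reset_all() in this patch.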

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h|   1 +
 drivers/net/bnxt/bnxt_ethdev.c | 103 -
 drivers/net/bnxt/bnxt_hwrm.c   |  26 +
 drivers/net/bnxt/bnxt_hwrm.h   |   1 +
 4 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index edaef7897..9ea84ec2f 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -389,6 +389,7 @@ struct bnxt_error_recovery_info {
 #define BNXT_FW_STATUS_REG_OFF(reg)  ((reg) & ~BNXT_FW_STATUS_REG_TYPE_MASK)
 
 #define BNXT_GRCP_WINDOW_2_BASE  0x2000
+#define BNXT_GRCP_WINDOW_3_BASE  0x3000
 
 #define BNXT_HWRM_SHORT_REQ_LEN  sizeof(struct hwrm_short_input)
 struct bnxt {
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index e7b0b44c4..095395dae 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3499,6 +3499,19 @@ static const struct eth_dev_ops bnxt_dev_ops = {
.timesync_read_tx_timestamp = bnxt_timesync_read_tx_timestamp,
 };
 
+static uint32_t bnxt_map_reset_regs(struct bnxt *bp, uint32_t reg)
+{
+   uint32_t offset;
+
+   /* Only pre-map the reset GRC registers using window 3 */
+   rte_write32(reg & 0xfffff000, (uint8_t *)bp->bar0 +
+   BNXT_GRCPF_REG_WINDOW_BASE_OUT + 8);
+
+   offset = BNXT_GRCP_WINDOW_3_BASE + (reg & 0xffc);
+
+   return offset;
+}
+
 int bnxt_map_fw_health_status_regs(struct bnxt *bp)
 {
struct bnxt_error_recovery_info *info = bp->recovery_info;
@@ -3542,6 +3555,34 @@ static void bnxt_unmap_fw_health_status_regs(struct bnxt *bp)
BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
 }
 
+static void bnxt_write_fw_reset_reg(struct bnxt *bp, uint32_t index)
+{
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   uint32_t delay = info->delay_after_reset[index];
+   uint32_t val = info->reset_reg_val[index];
+   uint32_t reg = info->reset_reg[index];
+   uint32_t type, offset;
+
+   type = BNXT_FW_STATUS_REG_TYPE(reg);
+   offset = BNXT_FW_STATUS_REG_OFF(reg);
+
+   switch (type) {
+   case BNXT_FW_STATUS_REG_TYPE_CFG:
+   rte_pci_write_config(bp->pdev, &val, sizeof(val), offset);
+   break;
+   case BNXT_FW_STATUS_REG_TYPE_GRC:
+   offset = bnxt_map_reset_regs(bp, offset);
+   rte_write32(val, (uint8_t *)bp->bar0 + offset);
+   break;
+   case BNXT_FW_STATUS_REG_TYPE_BAR0:
+   rte_write32(val, (uint8_t *)bp->bar0 + offset);
+   break;
+   }
+   /* wait on a specific interval of time until core reset is complete */
+   if (delay)
+   rte_delay_ms(delay);
+}
+
 static void bnxt_dev_cleanup(struct bnxt *bp)
 {
bnxt_set_hwrm_link_config(bp, false);
@@ -3636,6 +3677,58 @@ uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index)
return val;
 }
 
+static int bnxt_fw_reset_all(struct bnxt *bp)
+{
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   uint32_t i;
+   int rc = 0;
+
+   if (info->flags & BNXT_FLAG_ERROR_RECOVERY_HOST) {
+   /* Reset through master function driver */
+   for (i = 0; i < info->reg_array_cnt; i++)
+   bnxt_write_fw_reset_reg(bp, i);
+   /* Wait for time specified by FW after triggering reset */
+   rte_delay_ms(info->master_func_wait_period_after_reset);
+   } else if (info->flags & BNXT_FLAG_ERROR_RECOVERY_CO_CPU) {
+   /* Reset with the help of Kong processor */
+   rc = bnxt_hwrm_fw_reset(bp);
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed to reset FW\n");
+   }
+
+   return rc;
+}
+
+static void bnxt_fw_reset_cb(void *arg)
+{
+   struct bnxt *bp = arg;
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   int rc = 0;
+
+   /* Only Master function can do FW reset */
+   if (bnxt_is_master_func(bp) &&
+   bnxt_is_recovery_enabled(bp)) {
+   rc = bnxt_fw_reset_all(bp);
+   if (rc) {
+  

[dpdk-dev] [PATCH 13/13] net/bnxt: reduce verbosity of logs

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

When IOMMU is available, EAL picks IOVA as VA as the default IOVA mode.
This causes the bnxt driver to log warning messages saying
"Memzone physical address same as virtual." and "Using rte_mem_virt2iova()"
during load.

Reduce the verbosity of logs to DEBUG.

Signed-off-by: Kalesh AP 
Reviewed-by: Lance Richardson 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_ethdev.c | 21 +
 drivers/net/bnxt/bnxt_ring.c   |  7 +++
 drivers/net/bnxt/bnxt_vnic.c   |  7 +++
 3 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 095395dae..13f1ff6fb 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3893,10 +3893,9 @@ static int bnxt_alloc_ctx_mem_blk(__rte_unused struct bnxt *bp,
memset(mz->addr, 0, mz->len);
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr) {
-   PMD_DRV_LOG(WARNING,
-   "Memzone physical address same as virtual.\n");
-   PMD_DRV_LOG(WARNING,
-   "Using rte_mem_virt2iova()\n");
+   PMD_DRV_LOG(DEBUG,
+   "physical address same as virtual\n");
+   PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n");
mz_phys_addr = rte_mem_virt2iova(mz->addr);
if (mz_phys_addr == RTE_BAD_IOVA) {
PMD_DRV_LOG(ERR,
@@ -3929,10 +3928,9 @@ static int bnxt_alloc_ctx_mem_blk(__rte_unused struct bnxt *bp,
memset(mz->addr, 0, mz->len);
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr) {
-   PMD_DRV_LOG(WARNING,
+   PMD_DRV_LOG(DEBUG,
"Memzone physical address same as virtual.\n");
-   PMD_DRV_LOG(WARNING,
-   "Using rte_mem_virt2iova()\n");
+   PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n");
for (sz = 0; sz < mem_size; sz += BNXT_PAGE_SIZE)
rte_mem_lock_page(((char *)mz->addr) + sz);
mz_phys_addr = rte_mem_virt2iova(mz->addr);
@@ -4120,9 +4118,9 @@ static int bnxt_alloc_stats_mem(struct bnxt *bp)
memset(mz->addr, 0, mz->len);
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr) {
-   PMD_DRV_LOG(WARNING,
+   PMD_DRV_LOG(DEBUG,
"Memzone physical address same as virtual.\n");
-   PMD_DRV_LOG(WARNING,
+   PMD_DRV_LOG(DEBUG,
"Using rte_mem_virt2iova()\n");
mz_phys_addr = rte_mem_virt2iova(mz->addr);
if (mz_phys_addr == RTE_BAD_IOVA) {
@@ -4158,10 +4156,9 @@ static int bnxt_alloc_stats_mem(struct bnxt *bp)
memset(mz->addr, 0, mz->len);
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr) {
-   PMD_DRV_LOG(WARNING,
+   PMD_DRV_LOG(DEBUG,
"Memzone physical address same as virtual\n");
-   PMD_DRV_LOG(WARNING,
-   "Using rte_mem_virt2iova()\n");
+   PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n");
mz_phys_addr = rte_mem_virt2iova(mz->addr);
if (mz_phys_addr == RTE_BAD_IOVA) {
PMD_DRV_LOG(ERR,
diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c
index f19865c83..2f57e038a 100644
--- a/drivers/net/bnxt/bnxt_ring.c
+++ b/drivers/net/bnxt/bnxt_ring.c
@@ -212,10 +212,9 @@ int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx,
mz_phys_addr_base = mz->iova;
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr_base) {
-   PMD_DRV_LOG(WARNING,
-   "Memzone physical address same as virtual.\n");
-   PMD_DRV_LOG(WARNING,
-   "Using rte_mem_virt2iova()\n");
+   PMD_DRV_LOG(DEBUG,
+   "Memzone physical address same as virtual.\n");
+   PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n");
for (sz = 0; sz < total_alloc_len; sz += getpagesize())
rte_mem_lock_page(((char *)mz->addr) + sz);
mz_phys_addr_base = rte_mem_virt2iova(mz->addr);
diff --git a/drivers/net/bnxt/bnxt_vnic.c b/drivers/net/bnxt/bnxt_vnic.c
index 98415633e..9ea99388b 100644
--- a/drivers/net/bnxt/bnxt_vnic.c
+++ b/drivers/net/bnxt/bnxt_vnic.c
@@ -150,10 +150,9 @@ int bnxt_alloc_vnic_attributes(struct bnxt *bp)
}
mz_phys_addr = mz->iova;
if ((unsigned long)mz->addr == mz_phys_addr) {
-   PMD_DRV_LOG(WARNING,
-   "Memzone physical address s

[dpdk-dev] [PATCH 07/13] net/bnxt: map status registers for FW health monitoring

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

The HWRM_ERROR_RECOVERY_QCFG command returns the offsets of the FW status
registers used for periodic firmware health monitoring. Map them to GRC
window 2.
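
The offset arithmetic behind the mapping can be sketched in isolation as
below. The macro and function names are made up for this note, and the
4 KB page mask (0xfffff000) is an assumption about the window size; the
sketch only computes offsets and does not touch hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_TYPE_MASK      UINT32_C(3)           /* low 2 bits: address space */
#define REG_TYPE_GRC       UINT32_C(1)
#define GRC_WINDOW_2_BASE  UINT32_C(0x2000)
#define GRC_PAGE_MASK      UINT32_C(0xfffff000)  /* assumes 4 KB windows */
#define GRC_OFFSET_MASK    UINT32_C(0xffc)       /* drops the 2 type bits */

/*
 * Compute window-relative offsets for every GRC register in regs[].
 * Returns 0 on success and stores the page to program into the window
 * in *window_base (UINT32_MAX if no GRC register was found).
 * Returns -1 if the GRC registers do not all share one page, because a
 * single window can expose only one page.
 */
static int map_status_regs(const uint32_t *regs, uint32_t *mapped, size_t n,
			   uint32_t *window_base)
{
	uint32_t base = UINT32_MAX;  /* sentinel: no GRC register seen yet */
	size_t i;

	assert(regs != NULL && mapped != NULL && window_base != NULL);

	for (i = 0; i < n; i++) {
		uint32_t reg = regs[i];

		if ((reg & REG_TYPE_MASK) != REG_TYPE_GRC)
			continue;
		if (base == UINT32_MAX)
			base = reg & GRC_PAGE_MASK;
		else if ((reg & GRC_PAGE_MASK) != base)
			return -1;
		mapped[i] = GRC_WINDOW_2_BASE + (reg & GRC_OFFSET_MASK);
	}

	*window_base = base;
	return 0;
}
```

Registers that live in config space or BAR0 are skipped, so their entries
in mapped[] are left untouched.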

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h| 22 -
 drivers/net/bnxt/bnxt_ethdev.c | 44 ++
 drivers/net/bnxt/bnxt_hwrm.c   |  4 
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 19bd13a7f..1da09569d 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -354,7 +354,9 @@ struct bnxt_error_recovery_info {
 #define BNXT_FW_HEARTBEAT_CNT_REG  1
 #define BNXT_FW_RECOVERY_CNT_REG   2
 #define BNXT_FW_RESET_INPROG_REG   3
-   uint32_t status_regs[4];
+#define BNXT_FW_STATUS_REG_CNT 4
+   uint32_t status_regs[BNXT_FW_STATUS_REG_CNT];
+   uint32_t mapped_status_regs[BNXT_FW_STATUS_REG_CNT];
uint32_t reset_inprogress_reg_mask;
 #define BNXT_NUM_RESET_REG 16
uint8_t reg_array_cnt;
@@ -366,6 +368,22 @@ struct bnxt_error_recovery_info {
uint32_t flags;
 };
 
+/* address space location of register */
+#define BNXT_FW_STATUS_REG_TYPE_MASK   3
+/* register is located in PCIe config space */
+#define BNXT_FW_STATUS_REG_TYPE_CFG    0
+/* register is located in GRC address space */
+#define BNXT_FW_STATUS_REG_TYPE_GRC    1
+/* register is located in BAR0  */
+#define BNXT_FW_STATUS_REG_TYPE_BAR0   2
+/* register is located in BAR1  */
+#define BNXT_FW_STATUS_REG_TYPE_BAR1   3
+
+#define BNXT_FW_STATUS_REG_TYPE(reg)   ((reg) & BNXT_FW_STATUS_REG_TYPE_MASK)
+#define BNXT_FW_STATUS_REG_OFF(reg)    ((reg) & ~BNXT_FW_STATUS_REG_TYPE_MASK)
+
+#define BNXT_GRCP_WINDOW_2_BASE        0x2000
+
 #define BNXT_HWRM_SHORT_REQ_LEN  sizeof(struct hwrm_short_input)
 struct bnxt {
void *bar0;
@@ -510,6 +528,8 @@ int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete);
 int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg);
 int is_bnxt_in_error(struct bnxt *bp);
 
+int bnxt_map_fw_health_status_regs(struct bnxt *bp);
+
 bool is_bnxt_supported(struct rte_eth_dev *dev);
 bool bnxt_stratus_device(struct bnxt *bp);
 extern const struct rte_flow_ops bnxt_flow_ops;
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 18046c00a..52c460d2c 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3496,6 +3496,49 @@ static const struct eth_dev_ops bnxt_dev_ops = {
.timesync_read_tx_timestamp = bnxt_timesync_read_tx_timestamp,
 };
 
+int bnxt_map_fw_health_status_regs(struct bnxt *bp)
+{
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   uint32_t reg_base = 0xffffffff;
+   int i;
+
+   /* Only pre-map the monitoring GRC registers using window 2 */
+   for (i = 0; i < BNXT_FW_STATUS_REG_CNT; i++) {
+   uint32_t reg = info->status_regs[i];
+
+   if (BNXT_FW_STATUS_REG_TYPE(reg) != BNXT_FW_STATUS_REG_TYPE_GRC)
+   continue;
+
+   if (reg_base == 0xffffffff)
+   reg_base = reg & 0xfffff000;
+   if ((reg & 0xfffff000) != reg_base)
+   return -ERANGE;
+
+   /* Use mask 0xffc as the Lower 2 bits indicates
+* address space location
+*/
+   info->mapped_status_regs[i] = BNXT_GRCP_WINDOW_2_BASE +
+   (reg & 0xffc);
+   }
+
+   if (reg_base == 0xffffffff)
+   return 0;
+
+   rte_write32(reg_base, (uint8_t *)bp->bar0 +
+   BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
+
+   return 0;
+}
+
+static void bnxt_unmap_fw_health_status_regs(struct bnxt *bp)
+{
+   if (!(bp->flags & BNXT_FLAG_FW_CAP_ERROR_RECOVERY))
+   return;
+
+   rte_write32(0, (uint8_t *)bp->bar0 +
+   BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
+}
+
 static void bnxt_dev_cleanup(struct bnxt *bp)
 {
bnxt_set_hwrm_link_config(bp, false);
@@ -4227,6 +4270,7 @@ bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev)
bnxt_free_int(bp);
bnxt_free_mem(bp, reconfig_dev);
bnxt_hwrm_func_buf_unrgtr(bp);
+   bnxt_unmap_fw_health_status_regs(bp);
rc = bnxt_hwrm_func_driver_unregister(bp, 0);
bp->flags &= ~BNXT_FLAG_REGISTERED;
bnxt_free_ctx_mem(bp);
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index e2c993936..2d9c43c98 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -4767,6 +4767,10 @@ int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 err:
HWRM_UNLOCK();
 
+   /* Map the FW status registers */
+   if (!rc)
+   rc = bnxt_map_fw_health

[dpdk-dev] [PATCH 10/13] net/bnxt: use BIT macro instead of bit fields

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Use the BIT macro instead of bit fields.
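
For reference, a helper of this kind usually looks like the sketch below.
The definition shown is an assumption made for illustration (the macro
actually added to bnxt_util.h may differ), and the FLAG_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition; using an unsigned operand keeps BIT(31) well defined. */
#define BIT(n)  (UINT32_C(1) << (n))

/* Hypothetical flags, written the way the patch writes the BNXT_FLAG_* set. */
#define FLAG_REGISTERED  BIT(0)
#define FLAG_VF          BIT(1)
#define FLAG_INIT_DONE   BIT(31)

/* Returns nonzero if the single-bit flag is set in flags. */
static inline int flag_is_set(uint32_t flags, uint32_t flag)
{
	/* flag must name exactly one bit */
	assert(flag != 0 && (flag & (flag - 1)) == 0);
	return (flags & flag) != 0;
}
```

Besides being shorter, the macro avoids the signed-shift pitfall that made
the old code spell the top bit as (1U << 31).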

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h  | 73 ++--
 drivers/net/bnxt/bnxt_util.h |  4 ++
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index a23c4a64c..93aac15b4 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -19,6 +19,7 @@
 #include 
 
 #include "bnxt_cpr.h"
+#include "bnxt_util.h"
 
 #define BNXT_MAX_MTU   9574
 #define VLAN_TAG_SIZE  4
@@ -198,16 +199,16 @@ struct bnxt_ptp_cfg {
struct bnxt *bp;
 #define BNXT_MAX_TX_TS 1
uint16_t rxctl;
-#define BNXT_PTP_MSG_SYNC                  (1 << 0)
-#define BNXT_PTP_MSG_DELAY_REQ             (1 << 1)
-#define BNXT_PTP_MSG_PDELAY_REQ            (1 << 2)
-#define BNXT_PTP_MSG_PDELAY_RESP           (1 << 3)
-#define BNXT_PTP_MSG_FOLLOW_UP             (1 << 8)
-#define BNXT_PTP_MSG_DELAY_RESP            (1 << 9)
-#define BNXT_PTP_MSG_PDELAY_RESP_FOLLOW_UP (1 << 10)
-#define BNXT_PTP_MSG_ANNOUNCE              (1 << 11)
-#define BNXT_PTP_MSG_SIGNALING             (1 << 12)
-#define BNXT_PTP_MSG_MANAGEMENT            (1 << 13)
+#define BNXT_PTP_MSG_SYNC                  BIT(0)
+#define BNXT_PTP_MSG_DELAY_REQ             BIT(1)
+#define BNXT_PTP_MSG_PDELAY_REQ            BIT(2)
+#define BNXT_PTP_MSG_PDELAY_RESP           BIT(3)
+#define BNXT_PTP_MSG_FOLLOW_UP             BIT(8)
+#define BNXT_PTP_MSG_DELAY_RESP            BIT(9)
+#define BNXT_PTP_MSG_PDELAY_RESP_FOLLOW_UP BIT(10)
+#define BNXT_PTP_MSG_ANNOUNCE              BIT(11)
+#define BNXT_PTP_MSG_SIGNALING             BIT(12)
+#define BNXT_PTP_MSG_MANAGEMENT            BIT(13)
 #define BNXT_PTP_MSG_EVENTS  (BNXT_PTP_MSG_SYNC | \
 BNXT_PTP_MSG_DELAY_REQ |   \
 BNXT_PTP_MSG_PDELAY_REQ |  \
@@ -363,10 +364,10 @@ struct bnxt_error_recovery_info {
uint32_t reset_reg[BNXT_NUM_RESET_REG];
uint32_t reset_reg_val[BNXT_NUM_RESET_REG];
uint8_t delay_after_reset[BNXT_NUM_RESET_REG];
-#define BNXT_FLAG_ERROR_RECOVERY_HOST    (1 << 0)
-#define BNXT_FLAG_ERROR_RECOVERY_CO_CPU  (1 << 1)
-#define BNXT_FLAG_MASTER_FUNC            (1 << 2)
-#define BNXT_FLAG_RECOVERY_ENABLED       (1 << 3)
+#define BNXT_FLAG_ERROR_RECOVERY_HOST    BIT(0)
+#define BNXT_FLAG_ERROR_RECOVERY_CO_CPU  BIT(1)
+#define BNXT_FLAG_MASTER_FUNC            BIT(2)
+#define BNXT_FLAG_RECOVERY_ENABLED       BIT(3)
uint32_t flags;
 
uint32_t last_heart_beat;
@@ -399,28 +400,28 @@ struct bnxt {
void *doorbell_base;
 
uint32_t flags;
-#define BNXT_FLAG_REGISTERED             (1 << 0)
-#define BNXT_FLAG_VF                     (1 << 1)
-#define BNXT_FLAG_PORT_STATS             (1 << 2)
-#define BNXT_FLAG_JUMBO                  (1 << 3)
-#define BNXT_FLAG_SHORT_CMD              (1 << 4)
-#define BNXT_FLAG_UPDATE_HASH            (1 << 5)
-#define BNXT_FLAG_PTP_SUPPORTED          (1 << 6)
-#define BNXT_FLAG_MULTI_HOST             (1 << 7)
-#define BNXT_FLAG_EXT_RX_PORT_STATS      (1 << 8)
-#define BNXT_FLAG_EXT_TX_PORT_STATS      (1 << 9)
-#define BNXT_FLAG_KONG_MB_EN             (1 << 10)
-#define BNXT_FLAG_TRUSTED_VF_EN          (1 << 11)
-#define BNXT_FLAG_DFLT_VNIC_SET          (1 << 12)
-#define BNXT_FLAG_THOR_CHIP              (1 << 13)
-#define BNXT_FLAG_STINGRAY               (1 << 14)
-#define BNXT_FLAG_FW_RESET               (1 << 15)
-#define BNXT_FLAG_FATAL_ERROR            (1 << 16)
-#define BNXT_FLAG_FW_CAP_IF_CHANGE       (1 << 17)
-#define BNXT_FLAG_FW_CAP_ERROR_RECOVERY  (1 << 18)
-#define BNXT_FLAG_EXT_STATS_SUPPORTED    (1 << 29)
-#define BNXT_FLAG_NEW_RM                 (1 << 30)
-#define BNXT_FLAG_INIT_DONE              (1U << 31)
+#define BNXT_FLAG_REGISTERED             BIT(0)
+#define BNXT_FLAG_VF                     BIT(1)
+#define BNXT_FLAG_PORT_STATS             BIT(2)
+#define BNXT_FLAG_JUMBO                  BIT(3)
+#define BNXT_FLAG_SHORT_CMD              BIT(4)
+#define BNXT_FLAG_UPDATE_HASH            BIT(5)
+#define BNXT_FLAG_PTP_SUPPORTED          BIT(6)
+#define BNXT_FLAG_MULTI_HOST             BIT(7)
+#define BNXT_FLAG_EXT_RX_PORT_STATS      BIT(8)
+#define BNXT_FLAG_EXT_TX_PORT_STATS      BIT(9)
+#define BNXT_FLAG_KONG_MB_EN             BIT(10)
+#define BNXT_FLAG_TRUSTED_VF_EN          BIT(11)
+#define BNXT_FLAG_DFLT_VNIC_SET          BIT(12)
+#define BNXT_FLAG_THOR_CHIP              BIT(13)
+#define BNXT_FLAG_STINGRAY               BIT(14)
+#define BNXT_FLAG_FW_RESET               BIT(15)
+#define BNXT_FLAG_FATAL_ERROR            BIT(16)
+#define BNXT_FLAG_FW_CAP_IF_CHANGE       BIT(17)
+#define BNXT_FLAG_FW_CAP_ERROR_RECOVERY  BIT(18)
+#define BNXT_FLAG_EXT_STATS_SUPPORTED    BIT(19)
+#defi

[dpdk-dev] [PATCH 06/13] net/bnxt: query firmware error recovery capabilities

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

In the driver-initiated error recovery process, the driver has to know
the register offsets and values needed to initiate a FW reset. The HWRM
command HWRM_ERROR_RECOVERY_QCFG is used to obtain all of these registers
and values. The command response includes the FW heartbeat register, the
health status register, the error counter register, and the register
offsets and values used to reset the chip if firmware crashes and
becomes unresponsive.

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h| 27 +++
 drivers/net/bnxt/bnxt_ethdev.c | 10 
 drivers/net/bnxt/bnxt_hwrm.c   | 89 ++
 drivers/net/bnxt/bnxt_hwrm.h   |  1 +
 4 files changed, 127 insertions(+)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 394a2a941..19bd13a7f 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -343,6 +343,29 @@ struct bnxt_ctx_mem_info {
 #define US_PER_MS  1000
 #define NS_PER_US  1000
 
+struct bnxt_error_recovery_info {
+   /* All units in milliseconds */
+   uint32_t driver_polling_freq;
+   uint32_t master_func_wait_period;
+   uint32_t normal_func_wait_period;
+   uint32_t master_func_wait_period_after_reset;
+   uint32_t max_bailout_time_after_reset;
+#define BNXT_FW_STATUS_REG 0
+#define BNXT_FW_HEARTBEAT_CNT_REG  1
+#define BNXT_FW_RECOVERY_CNT_REG   2
+#define BNXT_FW_RESET_INPROG_REG   3
+   uint32_t status_regs[4];
+   uint32_t reset_inprogress_reg_mask;
+#define BNXT_NUM_RESET_REG 16
+   uint8_t reg_array_cnt;
+   uint32_t reset_reg[BNXT_NUM_RESET_REG];
+   uint32_t reset_reg_val[BNXT_NUM_RESET_REG];
+   uint8_t delay_after_reset[BNXT_NUM_RESET_REG];
+#define BNXT_FLAG_ERROR_RECOVERY_HOST    (1 << 0)
+#define BNXT_FLAG_ERROR_RECOVERY_CO_CPU  (1 << 1)
+   uint32_t flags;
+};
+
 #define BNXT_HWRM_SHORT_REQ_LEN  sizeof(struct hwrm_short_input)
 struct bnxt {
void *bar0;
@@ -371,6 +394,7 @@ struct bnxt {
 #define BNXT_FLAG_FW_RESET (1 << 15)
 #define BNXT_FLAG_FATAL_ERROR  (1 << 16)
 #define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17)
+#define BNXT_FLAG_FW_CAP_ERROR_RECOVERY  (1 << 18)
 #define BNXT_FLAG_EXT_STATS_SUPPORTED  (1 << 29)
 #define BNXT_FLAG_NEW_RM   (1 << 30)
 #define BNXT_FLAG_INIT_DONE  (1U << 31)
@@ -477,6 +501,9 @@ struct bnxt {
 
uint16_t fw_reset_min_msecs;
uint16_t fw_reset_max_msecs;
+
+   /* Struct to hold adapter error recovery related info */
+   struct bnxt_error_recovery_info *recovery_info;
 };
 
 int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete);
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index a0b9e8f9e..18046c00a 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -4071,6 +4071,11 @@ static int bnxt_init_fw(struct bnxt *bp)
if (rc)
return rc;
 
+   /* Get the adapter error recovery support info */
+   rc = bnxt_hwrm_error_recovery_qcfg(bp);
+   if (rc)
+   bp->flags &= ~BNXT_FLAG_FW_CAP_ERROR_RECOVERY;
+
if (mtu >= RTE_ETHER_MIN_MTU && mtu <= BNXT_MAX_MTU &&
mtu != bp->eth_dev->data->mtu)
bp->eth_dev->data->mtu = mtu;
@@ -4228,6 +4233,11 @@ bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev)
if (!reconfig_dev)
bnxt_free_hwrm_resources(bp);
 
+   if (bp->recovery_info != NULL) {
+   rte_free(bp->recovery_info);
+   bp->recovery_info = NULL;
+   }
+
return rc;
 }
 
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 17c7b5e9e..e2c993936 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -626,6 +626,13 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
if (flags & HWRM_FUNC_QCAPS_OUTPUT_FLAGS_EXT_STATS_SUPPORTED)
bp->flags |= BNXT_FLAG_EXT_STATS_SUPPORTED;
 
+   if (flags & HWRM_FUNC_QCAPS_OUTPUT_FLAGS_ERROR_RECOVERY_CAPABLE) {
+   bp->flags |= BNXT_FLAG_FW_CAP_ERROR_RECOVERY;
+   PMD_DRV_LOG(DEBUG, "Adapter Error recovery SUPPORTED\n");
+   } else {
+   bp->flags &= ~BNXT_FLAG_FW_CAP_ERROR_RECOVERY;
+   }
+
HWRM_UNLOCK();
 
return rc;
@@ -4684,3 +4691,85 @@ int bnxt_hwrm_if_change(struct bnxt *bp, bool state)
 
return rc;
 }
+
+int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
+{
+   struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+   struct bnxt_error_recovery_info *info;
+   struct hwrm_error_recovery_qcfg_input req = {0};
+   uint32_t flags = 0;
+   unsigned int i;
+   int

[dpdk-dev] [PATCH 11/13] net/bnxt: reschedule the health check alarm correctly

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

When the driver receives the error recovery notify event from FW
for the first time, it has to read the heartbeat count register and
the recovery count register and schedule the FW health check task to
periodically monitor the FW health.

FW may send this event again later, when the state of the master function
changes. There is no need to schedule the health check task again then.
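
The "schedule once" behaviour can be sketched with a flag guard as below.
This is a simplified model, not the driver code: struct dev and its alarm
counter stand in for the real device and for rte_eal_alarm_set(), and the
guard is placed inside the scheduling helper rather than in the event
handler as the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED, i.e. BIT(22), in this patch. */
#define HEALTH_CHECK_SCHEDULED  (UINT32_C(1) << 22)

/* Hypothetical device model; alarms_armed counts simulated alarm setups. */
struct dev {
	uint32_t flags;
	unsigned int alarms_armed;
};

/* Arms the health check once; returns false if it was already armed. */
static bool schedule_health_check(struct dev *d)
{
	assert(d != NULL);
	if (d->flags & HEALTH_CHECK_SCHEDULED)
		return false;
	d->alarms_armed++;  /* stand-in for arming the periodic alarm */
	d->flags |= HEALTH_CHECK_SCHEDULED;
	return true;
}

/* Clearing the flag on cancel lets a later event arm the check again. */
static void cancel_health_check(struct dev *d)
{
	assert(d != NULL);
	d->flags &= ~HEALTH_CHECK_SCHEDULED;
}
```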

Signed-off-by: Kalesh AP 
Reviewed-by: Santoshkumar Karanappa Rastapur 
Signed-off-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt.h| 1 +
 drivers/net/bnxt/bnxt_cpr.c| 3 +++
 drivers/net/bnxt/bnxt_ethdev.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index 93aac15b4..edaef7897 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -422,6 +422,7 @@ struct bnxt {
 #define BNXT_FLAG_EXT_STATS_SUPPORTED  BIT(19)
 #define BNXT_FLAG_NEW_RM   BIT(20)
 #define BNXT_FLAG_INIT_DONE  BIT(21)
+#define BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED  BIT(22)
 #define BNXT_PF(bp)  (!((bp)->flags & BNXT_FLAG_VF))
 #define BNXT_VF(bp)  ((bp)->flags & BNXT_FLAG_VF)
 #define BNXT_NPAR(bp)  ((bp)->port_partition_type)
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index a692fbe7c..50f93bd21 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -89,6 +89,9 @@ void bnxt_handle_async_event(struct bnxt *bp,
bnxt_is_recovery_enabled(bp),
bnxt_is_master_func(bp));
 
+   if (bp->flags & BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED)
+   return;
+
info->last_heart_beat =
bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG);
info->last_reset_counter =
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 0317eb888..e7b0b44c4 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3687,6 +3687,7 @@ void bnxt_schedule_fw_health_check(struct bnxt *bp)
 
rte_eal_alarm_set(US_PER_MS * polling_freq,
  bnxt_check_fw_health, (void *)bp);
+   bp->flags |= BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED;
 }
 
 static void bnxt_cancel_fw_health_check(struct bnxt *bp)
@@ -3695,6 +3696,7 @@ static void bnxt_cancel_fw_health_check(struct bnxt *bp)
return;
 
rte_eal_alarm_cancel(bnxt_check_fw_health, (void *)bp);
+   bp->flags &= ~BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED;
 }
 
 static bool bnxt_vf_pciid(uint16_t id)
-- 
2.20.1 (Apple Git-117)



[dpdk-dev] [PATCH 05/13] net/bnxt: handle fatal event from FW under error conditions

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

When firmware hits an unrecoverable error condition, it initiates
recovery by sending the async event EVENT_CMPL_EVENT_ID_RESET_NOTIFY,
with data1 set to RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL,
to all host drivers, and then resets the chip.

The recovery procedure follows the same sequence as the one for hot FW upgrade.
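
A minimal sketch of the reason-code check follows. The mask and reason
values are placeholders chosen for this note (the real ones are the HWRM
RESET_NOTIFY definitions); the device flag bits mirror BNXT_FLAG_FW_RESET
and BNXT_FLAG_FATAL_ERROR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encoding of the reason code field in event data1. */
#define REASON_CODE_MASK           UINT32_C(0xff)
#define REASON_FW_EXCEPTION_FATAL  UINT32_C(0x2)

#define DEV_FLAG_FW_RESET     (UINT32_C(1) << 15)
#define DEV_FLAG_FATAL_ERROR  (UINT32_C(1) << 16)

/*
 * The reason code is an enumerated field, so compare the masked value for
 * equality instead of testing a single bit.
 */
static void note_reset_reason(uint32_t event_data1, uint32_t *dev_flags)
{
	assert(dev_flags != NULL);
	if ((event_data1 & REASON_CODE_MASK) == REASON_FW_EXCEPTION_FATAL)
		*dev_flags |= DEV_FLAG_FATAL_ERROR;
	/* Both fatal and non-fatal notifications start the reset sequence. */
	*dev_flags |= DEV_FLAG_FW_RESET;
}
```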

Signed-off-by: Kalesh AP 
Reviewed-by: Somnath Kotur 
Reviewed-by: Ajit Khaparde 
---
 drivers/net/bnxt/bnxt_cpr.c| 13 +++--
 drivers/net/bnxt/bnxt_cpr.h|  5 +
 drivers/net/bnxt/bnxt_ethdev.c |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index cefb5db2a..6e0b1d67e 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -20,6 +20,7 @@ void bnxt_handle_async_event(struct bnxt *bp,
struct hwrm_async_event_cmpl *async_cmp =
(struct hwrm_async_event_cmpl *)cmp;
uint16_t event_id = rte_le_to_cpu_16(async_cmp->event_id);
+   uint32_t event_data;
 
/* TODO: HWRM async events are not defined yet */
/* Needs to handle: link events, error events, etc. */
@@ -41,6 +42,7 @@ void bnxt_handle_async_event(struct bnxt *bp,
PMD_DRV_LOG(INFO, "Port conn async event\n");
break;
case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+   event_data = rte_le_to_cpu_32(async_cmp->event_data1);
/* timestamp_lo/hi values are in units of 100ms */
bp->fw_reset_max_msecs = async_cmp->timestamp_hi ?
rte_le_to_cpu_16(async_cmp->timestamp_hi) * 100 :
@@ -48,8 +50,15 @@ void bnxt_handle_async_event(struct bnxt *bp,
bp->fw_reset_min_msecs = async_cmp->timestamp_lo ?
async_cmp->timestamp_lo * 100 :
BNXT_MIN_FW_READY_TIMEOUT;
-   PMD_DRV_LOG(INFO,
-   "Firmware non-fatal reset event received\n");
+   if ((event_data & EVENT_DATA1_REASON_CODE_MASK) ==
+   EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL) {
+   PMD_DRV_LOG(INFO,
+   "Firmware fatal reset event received\n");
+   bp->flags |= BNXT_FLAG_FATAL_ERROR;
+   } else {
+   PMD_DRV_LOG(INFO,
+   "Firmware non-fatal reset event received\n");
+   }
 
bp->flags |= BNXT_FLAG_FW_RESET;
bnxt_dev_reset_and_resume(bp);
diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h
index 4f86e3f60..4e63fd12f 100644
--- a/drivers/net/bnxt/bnxt_cpr.h
+++ b/drivers/net/bnxt/bnxt_cpr.h
@@ -108,4 +108,9 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmp);
 int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp);
 int bnxt_dev_reset_and_resume(struct bnxt *bp);
 
+#define EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL \
+   HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL
+#define EVENT_DATA1_REASON_CODE_MASK   \
+   HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK
+
 #endif
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index f7b2ef179..a0b9e8f9e 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3512,6 +3512,9 @@ static void bnxt_dev_recover(void *arg)
int timeout = bp->fw_reset_max_msecs;
int rc = 0;
 
+   /* Clear Error flag so that device re-init should happen */
+   bp->flags &= ~BNXT_FLAG_FATAL_ERROR;
+
do {
rc = bnxt_hwrm_ver_get(bp);
if (rc == 0)
-- 
2.20.1 (Apple Git-117)



[dpdk-dev] [PATCH 09/13] net/bnxt: add code for periodic FW health monitoring

2019-08-21 Thread Ajit Khaparde
From: Kalesh AP 

Periodically poll the FW heartbeat register and the FW recovery counter
register to check the FW health. The polling frequency is advertised
by the FW in the HWRM_ERROR_RECOVERY_QCFG response.
Schedule the task upon receiving the async event from FW.
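
The health decision itself can be sketched as below, following the rule
stated in the code comment of this patch: a stalled heartbeat or a changed
reset counter means a reset is needed. struct fw_health and
fw_needs_reset() are names invented for this note, and the register reads
are replaced by plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the values seen at the previous poll. */
struct fw_health {
	uint32_t last_heart_beat;
	uint32_t last_reset_counter;
};

/*
 * Returns true when FW looks unhealthy. On a healthy poll the snapshot is
 * updated so that the next poll compares against fresh values.
 */
static bool fw_needs_reset(struct fw_health *h, uint32_t heart_beat,
			   uint32_t reset_counter)
{
	assert(h != NULL);
	if (heart_beat == h->last_heart_beat)
		return true;   /* heartbeat did not advance */
	if (reset_counter != h->last_reset_counter)
		return true;   /* FW went through a reset since the last poll */
	h->last_heart_beat = heart_beat;
	h->last_reset_counter = reset_counter;
	return false;
}
```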

Signed-off-by: Kalesh AP 
Reviewed-by: Ajit Khaparde 
Reviewed-by: Somnath Kotur 
---
 drivers/net/bnxt/bnxt.h|  5 ++
 drivers/net/bnxt/bnxt_cpr.c|  7 +++
 drivers/net/bnxt/bnxt_ethdev.c | 89 ++
 3 files changed, 101 insertions(+)

diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index f9147a9a8..a23c4a64c 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -368,6 +368,9 @@ struct bnxt_error_recovery_info {
 #define BNXT_FLAG_MASTER_FUNC  (1 << 2)
 #define BNXT_FLAG_RECOVERY_ENABLED (1 << 3)
uint32_t flags;
+
+   uint32_t last_heart_beat;
+   uint32_t last_reset_counter;
 };
 
 /* address space location of register */
@@ -531,6 +534,8 @@ int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg);
 int is_bnxt_in_error(struct bnxt *bp);
 
 int bnxt_map_fw_health_status_regs(struct bnxt *bp);
+uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index);
+void bnxt_schedule_fw_health_check(struct bnxt *bp);
 
 bool is_bnxt_supported(struct rte_eth_dev *dev);
 bool bnxt_stratus_device(struct bnxt *bp);
diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index 7f5b3314e..a692fbe7c 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -88,6 +88,13 @@ void bnxt_handle_async_event(struct bnxt *bp,
PMD_DRV_LOG(INFO, "recovery enabled(%d), master function(%d)\n",
bnxt_is_recovery_enabled(bp),
bnxt_is_master_func(bp));
+
+   info->last_heart_beat =
+   bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG);
+   info->last_reset_counter =
+   bnxt_read_fw_status_reg(bp, BNXT_FW_RECOVERY_CNT_REG);
+
+   bnxt_schedule_fw_health_check(bp);
break;
default:
PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id);
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 52c460d2c..0317eb888 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -169,6 +169,7 @@ static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu);
 static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev);
 static int bnxt_init_resources(struct bnxt *bp, bool reconfig_dev);
 static int bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev);
+static void bnxt_cancel_fw_health_check(struct bnxt *bp);
 
 int is_bnxt_in_error(struct bnxt *bp)
 {
@@ -880,6 +881,8 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev)
/* disable uio/vfio intr/eventfd mapping */
rte_intr_disable(intr_handle);
 
+   bnxt_cancel_fw_health_check(bp);
+
bp->flags &= ~BNXT_FLAG_INIT_DONE;
if (bp->eth_dev->data->dev_started) {
/* TBD: STOP HW queues DMA */
@@ -3608,6 +3611,92 @@ int bnxt_dev_reset_and_resume(struct bnxt *bp)
return rc;
 }
 
+uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index)
+{
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   uint32_t reg = info->status_regs[index];
+   uint32_t type, offset, val = 0;
+
+   type = BNXT_FW_STATUS_REG_TYPE(reg);
+   offset = BNXT_FW_STATUS_REG_OFF(reg);
+
+   switch (type) {
+   case BNXT_FW_STATUS_REG_TYPE_CFG:
+   rte_pci_read_config(bp->pdev, &val, sizeof(val), offset);
+   break;
+   case BNXT_FW_STATUS_REG_TYPE_GRC:
+   offset = info->mapped_status_regs[index];
+   /* FALLTHROUGH */
+   case BNXT_FW_STATUS_REG_TYPE_BAR0:
+   val = rte_le_to_cpu_32(rte_read32((uint8_t *)bp->bar0 +
+  offset));
+   break;
+   }
+
+   return val;
+}
+
+/* Driver should poll FW heartbeat, reset_counter with the frequency
+ * advertised by FW in HWRM_ERROR_RECOVERY_QCFG.
+ * When the driver detects heartbeat stop or change in reset_counter,
+ * it has to trigger a reset to recover from the error condition.
+ * A “master PF” is the function who will have the privilege to
+ * initiate the chimp reset. The master PF will be elected by the
+ * firmware and will be notified through async message.
+ */
+static void bnxt_check_fw_health(void *arg)
+{
+   struct bnxt *bp = arg;
+   struct bnxt_error_recovery_info *info = bp->recovery_info;
+   uint32_t val = 0;
+
+   if (!info || !bnxt_is_recovery_enabled(bp) ||
+   is_bnxt_in_error(bp))
+   return;
+
+   val = bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG);
+   if (val == info->last_heart_beat)
+

Re: [dpdk-dev] [PATCH v2] timer: use rte_mp_msg to get freq from primary process

2019-08-21 Thread Ye Xiaolong
On 08/21, Jim Harris wrote:
>Ideally, get_tsc_freq_arch() is able to provide the
>TSC rate using architecture-specific means.  When that
>is not possible, DPDK falls back to calculating the
>TSC rate with a 100ms nanosleep or 1s sleep.  This fallback
>occurs more frequently in VMs, which often do not have
>access to the data they need from arch-specific means
>(CPUID leaf 0x15 or MSR 0xCE on x86).
>
>In secondary processes, the extra 100ms is especially
>noticeable and consumes the bulk of rte_eal_init()
>execution time.  So in secondary processes, if
>we cannot get the TSC rate using get_tsc_freq_arch(),
>try to get the TSC rate from the primary process
>instead using rte_mp_msg.  This is much faster than
>100ms.
>
>Reduces rte_eal_init() execution time in a secondary
>process from 165ms to 66ms on my test system.
>
>Signed-off-by: Jim Harris 
>Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35

This Change-Id line is unnecessary.

Thanks,
Xiaolong

>---
> lib/librte_eal/common/eal_common_timer.c |   62 ++
> 1 file changed, 62 insertions(+)
>
>diff --git a/lib/librte_eal/common/eal_common_timer.c 
>b/lib/librte_eal/common/eal_common_timer.c
>index 145543de7..ad965455d 100644
>--- a/lib/librte_eal/common/eal_common_timer.c
>+++ b/lib/librte_eal/common/eal_common_timer.c
>@@ -15,9 +15,17 @@
> #include 
> #include 
> #include 
>+#include 
>+#include 
> 
> #include "eal_private.h"
> 
>+#define EAL_TIMER_MP "eal_timer_mp_sync"
>+
>+struct timer_mp_param {
>+  uint64_t tsc_hz;
>+};
>+
> /* The frequency of the RDTSC timer resolution */
> static uint64_t eal_tsc_resolution_hz;
> 
>@@ -74,12 +82,58 @@ estimate_tsc_freq(void)
>   return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ);
> }
> 
>+static uint64_t
>+get_tsc_freq_from_primary(void)
>+{
>+  struct rte_mp_msg mp_req = {0};
>+  struct rte_mp_reply mp_reply = {0};
>+  struct timer_mp_param *r;
>+  struct timespec ts = {.tv_sec = 1, .tv_nsec = 0};
>+  uint64_t tsc_hz;
>+
>+  strcpy(mp_req.name, EAL_TIMER_MP);
>+  if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) ||
>+  mp_reply.nb_received != 1) {
>+  tsc_hz = 0;
>+  } else {
>+  r = (struct timer_mp_param *)mp_reply.msgs[0].param;
>+  tsc_hz = r->tsc_hz;
>+  }
>+
>+  free(mp_reply.msgs);
>+  return tsc_hz;
>+}
>+
>+static int
>+timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg,
>+   const void *peer)
>+{
>+  struct rte_mp_msg reply = {0};
>+  struct timer_mp_param *r = (struct timer_mp_param *)reply.param;
>+
>+  r->tsc_hz = eal_tsc_resolution_hz;
>+  strcpy(reply.name, EAL_TIMER_MP);
>+  reply.len_param = sizeof(*r);
>+
>+  return rte_mp_reply(&reply, peer);
>+}
>+
> void
> set_tsc_freq(void)
> {
>   uint64_t freq;
>+  int rc;
> 
>   freq = get_tsc_freq_arch();
>+  if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) {
>+  /* We couldn't get the TSC frequency through arch-specific
>+   *  means.  If this is a secondary process, try to get the
>+   *  TSC frequency from the primary process - this will
>+   *  be much faster than get_tsc_freq() or estimate_tsc_freq()
>+   *  below.
>+   */
>+  freq = get_tsc_freq_from_primary();
>+  }
>   if (!freq)
>   freq = get_tsc_freq();
>   if (!freq)
>@@ -87,6 +141,14 @@ set_tsc_freq(void)
> 
>   RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000);
>   eal_tsc_resolution_hz = freq;
>+  if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>+  rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary);
>+  if (rc && rte_errno != ENOTSUP) {
>+  RTE_LOG(WARNING, EAL, "Could not register mp_action - "
>+  "secondary processes will calculate TSC rate "
>+  "independently.\n");
>+  }
>+  }
> }
> 
> void rte_delay_us_callback_register(void (*userfunc)(unsigned int))
>
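
The fallback order described in the commit message (arch-specific value first, then the primary process, then a timed estimate) can be sketched in plain C. This is only an illustration: the probe type and the three stand-in functions are hypothetical and do not call any DPDK API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe type: returns the TSC frequency in Hz, or 0 on failure. */
typedef uint64_t (*tsc_probe_fn)(void);

/* Try each probe in order and return the first non-zero result. */
static uint64_t
first_nonzero_freq(const tsc_probe_fn *probes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t hz = probes[i]();

		if (hz != 0)
			return hz;
	}
	return 0;
}

/* Stand-ins (assumptions) for the arch-specific, primary-process and
 * timed-estimate sources discussed in the patch.
 */
static uint64_t probe_arch(void) { return 0; }              /* data unavailable */
static uint64_t probe_primary(void) { return 2500000000ULL; }
static uint64_t probe_estimate(void) { return 2499000000ULL; }
```

With the probes ordered as `{ probe_arch, probe_primary, probe_estimate }`, the slow estimate is only reached when both cheaper sources return 0.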


[dpdk-dev] [PATCH v4 3/6] ticketlock: use new API to reduce contention on aarch64

2019-08-21 Thread Gavin Hu
While using ticket lock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.

Running ticketlock_autotest on ThunderX2, Ampere eMAG80, and Arm N1SDP[1]
showed variance between runs, but no notable performance gain or
degradation was seen with or without this patch.

[1] https://community.arm.com/developer/tools-software/oss-platforms/w/\
docs/440/neoverse-n1-sdp
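
As a rough illustration of the change, a ticket lock built on a wait-until-equal helper might look like the sketch below. It uses C11 atomics rather than the DPDK headers, and every `demo_` name is hypothetical; the helper is a portable stand-in for rte_wait_until_equal_acquire_16(), not its actual implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ticket lock; the field names mirror the patch (next/current). */
struct demo_ticketlock {
	_Atomic uint16_t next;    /* next ticket to hand out */
	_Atomic uint16_t current; /* ticket currently being served */
};

/* Portable stand-in for rte_wait_until_equal_acquire_16(). */
static inline void
demo_wait_until_equal_acquire_16(_Atomic uint16_t *addr, uint16_t expected)
{
	while (atomic_load_explicit(addr, memory_order_acquire) != expected)
		; /* a real implementation would pause or wait for an event here */
}

static inline void
demo_ticketlock_lock(struct demo_ticketlock *tl)
{
	/* Take a ticket, then wait until it is being served. */
	uint16_t me = atomic_fetch_add_explicit(&tl->next, 1,
						memory_order_relaxed);

	demo_wait_until_equal_acquire_16(&tl->current, me);
}

static inline void
demo_ticketlock_unlock(struct demo_ticketlock *tl)
{
	uint16_t i = atomic_load_explicit(&tl->current, memory_order_relaxed);

	/* Serve the next ticket; release pairs with the acquire load above. */
	atomic_store_explicit(&tl->current, (uint16_t)(i + 1),
			      memory_order_release);
}
```

The lock path no longer contains an explicit polling loop; how the wait is carried out is left entirely to the helper.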

Signed-off-by: Gavin Hu 
Reviewed-by: Honnappa Nagarahalli 
Tested-by: Phil Yang 
Tested-by: Pavan Nikhilesh 
---
 lib/librte_eal/common/include/generic/rte_ticketlock.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_ticketlock.h 
b/lib/librte_eal/common/include/generic/rte_ticketlock.h
index d9bec87..232bbe9 100644
--- a/lib/librte_eal/common/include/generic/rte_ticketlock.h
+++ b/lib/librte_eal/common/include/generic/rte_ticketlock.h
@@ -66,8 +66,7 @@ static inline void
 rte_ticketlock_lock(rte_ticketlock_t *tl)
 {
uint16_t me = __atomic_fetch_add(&tl->s.next, 1, __ATOMIC_RELAXED);
-   while (__atomic_load_n(&tl->s.current, __ATOMIC_ACQUIRE) != me)
-   rte_pause();
+   rte_wait_until_equal_acquire_16(&tl->s.current, me);
 }
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v4 2/6] eal: add the APIs to wait until equal

2019-08-21 Thread Gavin Hu
The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'.
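
A minimal sketch of how such a family of functions can be generated from one macro, assuming C11 atomics instead of the DPDK headers. The `DEMO_`/`demo_` names are hypothetical, and the loop body is the plain polling fallback, not the WFE variant.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Generate demo_wait_until_equal_<order>_<size>() for one integer type and
 * one memory order. All names here are illustrative only.
 */
#define DEMO_WAIT_UNTIL_EQUAL(op_name, size, type, memorder) \
static inline void \
demo_wait_until_equal_##op_name##_##size(_Atomic type *addr, type expected) \
{ \
	while (atomic_load_explicit(addr, memorder) != expected) \
		; /* fallback path: keep polling until the value matches */ \
}

DEMO_WAIT_UNTIL_EQUAL(relaxed, 16, uint16_t, memory_order_relaxed)
DEMO_WAIT_UNTIL_EQUAL(acquire, 16, uint16_t, memory_order_acquire)
DEMO_WAIT_UNTIL_EQUAL(relaxed, 32, uint32_t, memory_order_relaxed)
DEMO_WAIT_UNTIL_EQUAL(acquire, 32, uint32_t, memory_order_acquire)
DEMO_WAIT_UNTIL_EQUAL(relaxed, 64, uint64_t, memory_order_relaxed)
DEMO_WAIT_UNTIL_EQUAL(acquire, 64, uint64_t, memory_order_acquire)
```

Callers pick the width and ordering by name, for example `demo_wait_until_equal_acquire_64(&flag, 1)`, and return as soon as the location holds the expected value.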

Signed-off-by: Gavin Hu 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Steve Capper 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Honnappa Nagarahalli 
Reviewed-by: Phil Yang 
Acked-by: Pavan Nikhilesh 
---
 .../common/include/arch/arm/rte_pause_64.h | 30 ++
 lib/librte_eal/common/include/generic/rte_pause.h  | 26 ++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
index 93895d3..dabde17 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_ARM64_H_
@@ -17,6 +18,35 @@ static inline void rte_pause(void)
asm volatile("yield" ::: "memory");
 }
 
+#ifdef RTE_ARM_USE_WFE
+#define __WAIT_UNTIL_EQUAL(name, asm_op, wide, type) \
+static __rte_always_inline void \
+rte_wait_until_equal_##name(volatile type * addr, type expected) \
+{ \
+   type tmp; \
+   asm volatile( \
+   #asm_op " %" #wide "[tmp], %[addr]\n" \
+   "cmp %" #wide "[tmp], %" #wide "[expected]\n" \
+   "b.eq   2f\n" \
+   "sevl\n" \
+   "1: wfe\n" \
+   #asm_op " %" #wide "[tmp], %[addr]\n" \
+   "cmp %" #wide "[tmp], %" #wide "[expected]\n" \
+   "bne 1b\n" \
+   "2:\n" \
+   : [tmp] "=&r" (tmp) \
+   : [addr] "Q"(*addr), [expected] "r"(expected) \
+   : "cc", "memory"); \
+}
+/* Wait for *addr to be updated with expected value */
+__WAIT_UNTIL_EQUAL(relaxed_16, ldxrh, w, uint16_t)
+__WAIT_UNTIL_EQUAL(acquire_16, ldaxrh, w, uint16_t)
+__WAIT_UNTIL_EQUAL(relaxed_32, ldxr, w, uint32_t)
+__WAIT_UNTIL_EQUAL(acquire_32, ldaxr, w, uint32_t)
+__WAIT_UNTIL_EQUAL(relaxed_64, ldxr, x, uint64_t)
+__WAIT_UNTIL_EQUAL(acquire_64, ldaxr, x, uint64_t)
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/generic/rte_pause.h 
b/lib/librte_eal/common/include/generic/rte_pause.h
index 52bd4db..4741f8a 100644
--- a/lib/librte_eal/common/include/generic/rte_pause.h
+++ b/lib/librte_eal/common/include/generic/rte_pause.h
@@ -1,10 +1,10 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_H_
 #define _RTE_PAUSE_H_
-
 /**
  * @file
  *
@@ -12,6 +12,10 @@
  *
  */
 
+#include 
+#include 
+#include 
+
 /**
  * Pause CPU execution for a short while
  *
@@ -20,4 +24,24 @@
  */
 static inline void rte_pause(void);
 
+#if !defined(RTE_ARM_USE_WFE)
+#define __WAIT_UNTIL_EQUAL(op_name, size, type, memorder) \
+__rte_always_inline \
+static void\
+rte_wait_until_equal_##op_name##_##size(volatile type *addr, \
+   type expected) \
+{ \
+   while (__atomic_load_n(addr, memorder) != expected) \
+   rte_pause(); \
+}
+
+/* Wait for *addr to be updated with expected value */
+__WAIT_UNTIL_EQUAL(relaxed, 16, uint16_t, __ATOMIC_RELAXED)
+__WAIT_UNTIL_EQUAL(acquire, 16, uint16_t, __ATOMIC_ACQUIRE)
+__WAIT_UNTIL_EQUAL(relaxed, 32, uint32_t, __ATOMIC_RELAXED)
+__WAIT_UNTIL_EQUAL(acquire, 32, uint32_t, __ATOMIC_ACQUIRE)
+__WAIT_UNTIL_EQUAL(relaxed, 64, uint64_t, __ATOMIC_RELAXED)
+__WAIT_UNTIL_EQUAL(acquire, 64, uint64_t, __ATOMIC_ACQUIRE)
+#endif /* RTE_ARM_USE_WFE */
+
 #endif /* _RTE_PAUSE_H_ */
-- 
2.7.4



[dpdk-dev] [PATCH v4 4/6] ring: use wfe to wait for ring tail update on aarch64

2019-08-21 Thread Gavin Hu
Instead of polling for tail to be updated, use wfe instruction.

Signed-off-by: Gavin Hu 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Steve Capper 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Honnappa Nagarahalli 
---
 lib/librte_ring/rte_ring_c11_mem.h | 4 ++--
 lib/librte_ring/rte_ring_generic.h | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring_c11_mem.h 
b/lib/librte_ring/rte_ring_c11_mem.h
index 0fb73a3..764d8f1 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -2,6 +2,7 @@
  *
  * Copyright (c) 2017,2018 HXT-semitech Corporation.
  * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org
+ * Copyright (c) 2019 Arm Limited
  * All rights reserved.
  * Derived from FreeBSD's bufring.h
  * Used as BSD-3 Licensed with permission from Kip Macy.
@@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, 
uint32_t new_val,
 * we need to wait for them to complete
 */
if (!single)
-   while (unlikely(ht->tail != old_val))
-   rte_pause();
+   rte_wait_until_equal_relaxed_32(&ht->tail, old_val);
 
__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
 }
diff --git a/lib/librte_ring/rte_ring_generic.h 
b/lib/librte_ring/rte_ring_generic.h
index 953cdbb..6828527 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, 
uint32_t new_val,
 * we need to wait for them to complete
 */
if (!single)
-   while (unlikely(ht->tail != old_val))
-   rte_pause();
+   rte_wait_until_equal_relaxed_32(&ht->tail, old_val);
 
ht->tail = new_val;
 }
-- 
2.7.4



[dpdk-dev] [PATCH v4 0/6] use WFE for locks and ring on aarch64

2019-08-21 Thread Gavin Hu
DPDK has multiple use cases where the core repeatedly polls a location in
memory. This polling results in many cache and memory transactions.

The Arm architecture provides the WFE (Wait For Event) instruction, which
allows the CPU core to enter a low-power state until woken up by an update to
the memory location being polled, thus reducing the cache and memory
transactions.

x86 has the PAUSE hint instruction to reduce such overhead.

The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'.

For non-Arm platforms, these APIs are just wrappers around a while loop
with rte_pause, so there are no performance differences.

For Arm platforms, use of WFE can be configured using the
CONFIG_RTE_ARM_USE_WFE option. It is disabled by default.

Currently, use of WFE is supported only for aarch64 platforms. armv7
platforms do support the WFE instruction, but they require explicit wake-up
events (sev) and are less performant.

Testing shows that performance varies across different platforms, with
some showing degradation.

CONFIG_RTE_ARM_USE_WFE should be enabled depending on the performance
benchmarking on the target platforms. Power saving should be a bonus,
but currently we don't have ways to characterize that.

V4:
- rename the config as CONFIG_RTE_ARM_USE_WFE to indicate it applies to arm only
- introduce a macro for assembly skeleton to reduce the duplication of code
- add one patch for nxp fslmc to address a compiling error
V3:
- Convert RFCs to patches
V2:
- Use inline functions instead of macros
- Add load and compare in the beginning of the APIs
- Fix some style errors in asm inline
V1:
- Add the new APIs and use it for ring and locks

Gavin Hu (6):
  bus/fslmc: fix the conflicting dmb function
  eal: add the APIs to wait until equal
  ticketlock: use new API to reduce contention on aarch64
  ring: use wfe to wait for ring tail update on aarch64
  spinlock: use wfe to reduce contention on aarch64
  config: add WFE config entry for aarch64

 config/arm/meson.build |  1 +
 config/common_base |  6 +
 drivers/bus/fslmc/mc/fsl_mc_sys.h  | 10 +---
 drivers/bus/fslmc/mc/mc_sys.c  |  3 +--
 .../common/include/arch/arm/rte_pause_64.h | 30 ++
 .../common/include/arch/arm/rte_spinlock.h | 25 ++
 lib/librte_eal/common/include/generic/rte_pause.h  | 26 ++-
 .../common/include/generic/rte_ticketlock.h|  3 +--
 lib/librte_ring/rte_ring_c11_mem.h |  4 +--
 lib/librte_ring/rte_ring_generic.h |  3 +--
 10 files changed, 99 insertions(+), 12 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v4 1/6] bus/fslmc: fix the conflicting dmb function

2019-08-21 Thread Gavin Hu
There are two definitions conflicting with each other; for more
details, refer to [1].

include/rte_atomic_64.h:19: error: "dmb" redefined [-Werror]
drivers/bus/fslmc/mc/fsl_mc_sys.h:36: note: this is the location of the
previous definition
 #define dmb() {__asm__ __volatile__("" : : : "memory"); }

The fix is to include the spinlock.h file before the other header files;
this is in line with the coding style[2] about the "header includes".
The fix also changes the function to take an argument so that it is
meaningful on arm.

[1] http://inbox.dpdk.org/users/VI1PR08MB537631AB25F41B8880DCCA988FDF0@i
VI1PR08MB5376.eurprd08.prod.outlook.com/T/#u
[2] https://doc.dpdk.org/guides/contributing/coding_style.html
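
The argument-taking form relies on the preprocessor's `#` operator to paste the barrier option into the instruction string. A small sketch of that mechanism, with hypothetical `DEMO_`/`demo_` names and the compiler-provided `__aarch64__` macro standing in for RTE_ARCH_ARM64:

```c
#include <string.h>

/* Hypothetical helper that builds the instruction text the same way the
 * patched macro does: "dmb " followed by the stringified option.
 */
#define DEMO_DMB_TEXT(opt) "dmb " #opt

/* Barrier in the style of the patch: a real instruction on arm64, and an
 * empty statement elsewhere. __aarch64__ is used here as an assumption in
 * place of the DPDK build macro.
 */
#if defined(__aarch64__)
#define demo_dmb(opt) do { __asm__ volatile("dmb " #opt : : : "memory"); } while (0)
#else
#define demo_dmb(opt) do { } while (0)
#endif
```

With this shape, `demo_dmb(ld)` expands to the text `dmb ld` and `demo_dmb(st)` to `dmb st`, so read and write barriers can differ instead of sharing one empty compiler barrier.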

Fixes: 3af733ba8da8 ("bus/fslmc: introduce MC object functions")
Cc: sta...@dpdk.org

Signed-off-by: Gavin Hu 
Reviewed-by: Phil Yang 
---
 drivers/bus/fslmc/mc/fsl_mc_sys.h | 10 +++---
 drivers/bus/fslmc/mc/mc_sys.c |  3 +--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/bus/fslmc/mc/fsl_mc_sys.h 
b/drivers/bus/fslmc/mc/fsl_mc_sys.h
index d0c7b39..fe9dc95 100644
--- a/drivers/bus/fslmc/mc/fsl_mc_sys.h
+++ b/drivers/bus/fslmc/mc/fsl_mc_sys.h
@@ -33,10 +33,14 @@ struct fsl_mc_io {
 #include 
 
 #ifndef dmb
-#define dmb() {__asm__ __volatile__("" : : : "memory"); }
+#ifdef RTE_ARCH_ARM64
+#define dmb(opt) {asm volatile("dmb " #opt : : : "memory"); }
+#else
+#define dmb(opt)
 #endif
-#define __iormb()  dmb()
-#define __iowmb()  dmb()
+#endif
+#define __iormb()  dmb(ld)
+#define __iowmb()  dmb(st)
 #define __arch_getq(a) (*(volatile uint64_t *)(a))
 #define __arch_putq(v, a)  (*(volatile uint64_t *)(a) = (v))
 #define __arch_putq32(v, a)(*(volatile uint32_t *)(a) = (v))
diff --git a/drivers/bus/fslmc/mc/mc_sys.c b/drivers/bus/fslmc/mc/mc_sys.c
index efafdc3..22143ef 100644
--- a/drivers/bus/fslmc/mc/mc_sys.c
+++ b/drivers/bus/fslmc/mc/mc_sys.c
@@ -4,11 +4,10 @@
  * Copyright 2017 NXP
  *
  */
+#include 
 #include 
 #include 
 
-#include 
-
 /** User space framework uses MC Portal in shared mode. Following change
  * introduces lock in MC FLIB
  */
-- 
2.7.4



[dpdk-dev] [PATCH v4 5/6] spinlock: use wfe to reduce contention on aarch64

2019-08-21 Thread Gavin Hu
In acquiring a spinlock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.

Running the micro benchmarking and the testpmd and l3fwd traffic tests
on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no
notable performance gain or degradation was measured.
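
For readers less familiar with the assembly, the control flow of the new lock path (wait until the lock looks free, try to take it, go back to waiting on failure) can be approximated with C11 atomics. This is a portable sketch with hypothetical `demo_` names; it polls where the aarch64 code waits for an event and is not the patch's implementation.

```c
#include <stdatomic.h>

struct demo_spinlock {
	atomic_int locked; /* 0 = free, 1 = held */
};

static inline void
demo_spinlock_lock(struct demo_spinlock *sl)
{
	for (;;) {
		int expected = 0;

		/* Wait until the lock looks free (the wfe loop in the asm). */
		while (atomic_load_explicit(&sl->locked,
					    memory_order_relaxed) != 0)
			;
		/* Try to take it; if another core won the race, wait again. */
		if (atomic_compare_exchange_weak_explicit(&sl->locked,
				&expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static inline int
demo_spinlock_trylock(struct demo_spinlock *sl)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&sl->locked,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

static inline void
demo_spinlock_unlock(struct demo_spinlock *sl)
{
	atomic_store_explicit(&sl->locked, 0, memory_order_release);
}
```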

Signed-off-by: Gavin Hu 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Phil Yang 
Reviewed-by: Steve Capper 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Honnappa Nagarahalli 
Tested-by: Pavan Nikhilesh 
---
 .../common/include/arch/arm/rte_spinlock.h | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h 
b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
index 1a6916b..7b8328e 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
@@ -16,6 +16,31 @@ extern "C" {
 #include 
 #include "generic/rte_spinlock.h"
 
+/* armv7a does support WFE, but an explicit wake-up signal using SEV is
+ * required (must be preceded by DSB to drain the store buffer) and
+ * this is less performant, so keep armv7a implementation unchanged.
+ */
+#ifndef RTE_FORCE_INTRINSICS
+static inline void
+rte_spinlock_lock(rte_spinlock_t *sl)
+{
+   unsigned int tmp;
+   /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
+* faqs/ka16809.html
+*/
+   asm volatile(
+   "sevl\n"
+   "1: wfe\n"
+   "2: ldaxr %w[tmp], %w[locked]\n"
+   "cbnz   %w[tmp], 1b\n"
+   "stxr   %w[tmp], %w[one], %w[locked]\n"
+   "cbnz   %w[tmp], 2b\n"
+   : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
+   : [one] "r" (1)
+   : "cc", "memory");
+}
+#endif
+
 static inline int rte_tm_supported(void)
 {
return 0;
-- 
2.7.4



[dpdk-dev] [PATCH v4 6/6] config: add WFE config entry for aarch64

2019-08-21 Thread Gavin Hu
Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled by default.
It can be enabled selectively based on performance benchmarking.
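
One detail that may be worth checking when the option is consumed in headers: if a build system emits a disabled option as a macro defined to 0 (an assumption about how the generated config looks, not something this patch states), then `#ifdef` and `#if` behave differently. The sketch below uses a hypothetical `DEMO_USE_WFE` macro to show the difference.

```c
/* Assumption: the option is emitted as a 0/1 value rather than left
 * undefined when disabled.
 */
#define DEMO_USE_WFE 0

#ifdef DEMO_USE_WFE
#define DEMO_IFDEF_TAKEN 1 /* taken: the macro is defined, even though it is 0 */
#else
#define DEMO_IFDEF_TAKEN 0
#endif

#if DEMO_USE_WFE
#define DEMO_IF_TAKEN 1
#else
#define DEMO_IF_TAKEN 0    /* taken: the macro evaluates to 0 */
#endif
```

Here `DEMO_IFDEF_TAKEN` is 1 while `DEMO_IF_TAKEN` is 0, so a value-based test is the safer choice whenever the macro can be defined as 0.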

Signed-off-by: Gavin Hu 
Reviewed-by: Ruifeng Wang 
Reviewed-by: Steve Capper 
Reviewed-by: Honnappa Nagarahalli 
Reviewed-by: Phil Yang 
Acked-by: Pavan Nikhilesh 
---
 config/arm/meson.build | 1 +
 config/common_base | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 979018e..18ecd53 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,7 @@ impl_dpaa = ['NXP DPAA', flags_dpaa, machine_args_generic]
 impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic]
 
 dpdk_conf.set('RTE_FORCE_INTRINSICS', 1)
+dpdk_conf.set('RTE_ARM_USE_WFE', 0)
 
 if not dpdk_conf.get('RTE_ARCH_64')
dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
diff --git a/config/common_base b/config/common_base
index 8ef75c2..d4cf974 100644
--- a/config/common_base
+++ b/config/common_base
@@ -570,6 +570,12 @@ CONFIG_RTE_CRYPTO_MAX_DEVS=64
 CONFIG_RTE_LIBRTE_PMD_ARMV8_CRYPTO=n
 CONFIG_RTE_LIBRTE_PMD_ARMV8_CRYPTO_DEBUG=n
 
+# Use WFE instructions to implement the rte_wait_until_equal_xxx APIs;
+# calling these APIs puts the cores in a low power state while waiting
+# for the memory address to become equal to the expected value.
+# This is supported only by aarch64.
+CONFIG_RTE_ARM_USE_WFE=n
+
 #
 # Compile NXP CAAM JR crypto Driver
 #
-- 
2.7.4



[dpdk-dev] [PATCH] net/af_packet: fix for stale sockets

2019-08-21 Thread Abhishek Sachan
The af_packet driver leaves a stale socket after the device is removed.
Ring buffers are memory mapped when the device is added using rte_dev_probe,
but there is no corresponding munmap call when the device is removed/closed.
This commit fixes the issue by calling munmap
from rte_pmd_af_packet_remove().
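
The pairing of map and unmap with a matching length can be sketched as follows. This is not the driver code: the structs and `demo_` functions are hypothetical, an anonymous mapping stands in for the packet socket ring, and the length formula simply mirrors the one used in the patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the two tpacket_req fields used in the patch. */
struct demo_ring_req {
	unsigned int block_size;
	unsigned int block_nr;
};

struct demo_queue {
	void *map;      /* start of the mapped ring memory */
	size_t map_len; /* length passed to mmap(), reused for munmap() */
};

/* Same shape as the length in the patch: 2 * block size * block count. */
static size_t
demo_map_len(const struct demo_ring_req *req)
{
	return (size_t)2 * req->block_size * req->block_nr;
}

static int
demo_queue_map(struct demo_queue *q, const struct demo_ring_req *req)
{
	size_t len = demo_map_len(req);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	q->map = p;
	q->map_len = len;
	return 0;
}

/* Release the mapping on removal; safe to call on an unmapped queue. */
static int
demo_queue_unmap(struct demo_queue *q)
{
	int rc;

	if (q->map == NULL)
		return 0;
	rc = munmap(q->map, q->map_len);
	q->map = NULL;
	q->map_len = 0;
	return rc;
}
```

Recording the length at map time is a design choice of this sketch; the patch instead recomputes it from the stored request when the device is removed.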

Bugzilla ID: 339
Cc: sta...@dpdk.org

Signed-off-by: Abhishek Sachan 
Reviewed-by: John W. Linville 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 82bf2cd..6df09f2 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -972,6 +972,7 @@ rte_pmd_af_packet_remove(struct rte_vdev_device *dev)
 {
struct rte_eth_dev *eth_dev = NULL;
struct pmd_internals *internals;
+   struct tpacket_req *req;
unsigned q;
 
PMD_LOG(INFO, "Closing AF_PACKET ethdev on numa socket %u",
@@ -992,7 +993,10 @@ rte_pmd_af_packet_remove(struct rte_vdev_device *dev)
return rte_eth_dev_release_port(eth_dev);
 
internals = eth_dev->data->dev_private;
+   req = &internals->req;
for (q = 0; q < internals->nb_queues; q++) {
+   munmap(internals->rx_queue[q].map,
+   2 * req->tp_block_size * req->tp_block_nr);
rte_free(internals->rx_queue[q].rd);
rte_free(internals->tx_queue[q].rd);
}
-- 
2.7.4



[dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details

2019-08-21 Thread Ruifeng Wang
From: Honnappa Nagarahalli 

Add a section to describe a design to integrate QSBR RCU library
with other libraries in DPDK.

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
Reviewed-by: Ruifeng Wang 
---
 doc/guides/prog_guide/rcu_lib.rst | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/doc/guides/prog_guide/rcu_lib.rst 
b/doc/guides/prog_guide/rcu_lib.rst
index 8fe5b1f73..2869441ca 100644
--- a/doc/guides/prog_guide/rcu_lib.rst
+++ b/doc/guides/prog_guide/rcu_lib.rst
@@ -186,3 +186,54 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, 
these APIs aid
 in debugging issues. One can mark the access to shared data structures on the
 reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if
 all the locks are unlocked.
+
+Integrating QSBR RCU with other libraries
+-----------------------------------------
+
+Lock-free algorithms place additional burden on the application to reclaim
+memory. Integrating memory reclaiming mechanisms in the libraries helps
+remove some of the burden. Though the QSBR method presents flexibility to
+achieve performance, it presents challenges while integrating with libraries.
+
+The memory reclaiming process using QSBR can be split into 4 parts:
+
+#. Initialization
+#. Quiescent State Reporting
+#. Reclaiming Resources
+#. Shutdown
+
+The design proposed here requires the application to handle 'Initialization'
+and 'Quiescent State Reporting'. So,
+
+* the application has to create the RCU variable and register the reader threads to report their quiescent state.
+* the application has to register the same RCU variable with the library.
+* reader threads in the application have to report the quiescent state. This allows for the application to control the length of the critical section/how frequently the application wants to report the quiescent state.
+
+The library will handle 'Reclaiming Resources' part of the process. The
+libraries will make use of the writer thread context to execute the memory
+reclaiming algorithm. So,
+
+* library should provide an API to register an RCU variable that it will use.
+* library should trigger the readers to report quiescent state status upon deleting the resources by calling ``rte_rcu_qsbr_start``.
+
+* library should store the token and deleted resources for later use to free them after the readers have reported their quiescent state. Since the readers will report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired results. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case the length of FIFO would be equal to the maximum number of resources the data structure supports. However, in most cases, the length will be much smaller. But, the library should not take the length of FIFO as an input from the application. Instead, it should implement a data structure which should be able to grow/shrink dynamically. Overhead introduced by such a data structure on delete operations should be considered as well.
+
+* library should query the quiescent state and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources present in the FIFO already, the delete API should peek the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource should be freed. This process can be repeated till the quiescent state status for a token returns failure indicating that subsequent tokens will also fail quiescent state status query. The same process can be incorporated while adding new entries in the data structure if the library runs out of resources.
+
+The 'Shutdown' process needs to be shared between the application and the
+library.
+
+* library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should make use of non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the library should print an error and stop the memory reclamation process.
+
+* the application should make sure that the reader threads are not using the shared data structure, and unregister the reader threads from the QSBR variable before calling the library's shutdown function.
+
+Integrating the resource reclamation with libraries removes the burden from
+the application and makes it easy to use lock-free algorithms.
+
+This design has several advantages over currently known methods.
+
+#. Application does not need a dedicated thread to reclaim resources. Memory
+   reclamation happ

[dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR

2019-08-21 Thread Ruifeng Wang
Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.
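
The reclaim step can be pictured as a FIFO of (token, index) pairs that is drained from the head while the tokens have completed. The sketch below is a simplified model in plain C: the fixed-size array FIFO, the `demo_` names and the `completed` counter used in place of a real quiescent state query are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_FIFO_CAP    16
#define DEMO_RECLAIM_MAX 8 /* upper bound per call, like RCU_QSBR_RECLAIM_SIZE */

struct demo_item {
	uint64_t token; /* token recorded when the group was deleted */
	uint32_t index; /* index of the deleted group */
};

struct demo_fifo {
	struct demo_item items[DEMO_FIFO_CAP];
	size_t head;  /* oldest pending item */
	size_t count; /* number of pending items */
};

static int
demo_fifo_push(struct demo_fifo *f, uint64_t token, uint32_t index)
{
	size_t tail;

	if (f->count == DEMO_FIFO_CAP)
		return -1;
	tail = (f->head + f->count) % DEMO_FIFO_CAP;
	f->items[tail] = (struct demo_item){ token, index };
	f->count++;
	return 0;
}

/* Hypothetical stand-in for a non-blocking quiescent state check: a token
 * counts as complete once the readers have passed it.
 */
static int
demo_qs_check(uint64_t completed, uint64_t token)
{
	return token <= completed;
}

/* Reclaim from the head while tokens are complete. Stop at the first pending
 * token, since later tokens cannot have completed either. Returns the number
 * of indexes written to out.
 */
static size_t
demo_reclaim(struct demo_fifo *f, uint64_t completed, uint32_t *out)
{
	size_t n = 0;

	while (n < DEMO_RECLAIM_MAX && f->count > 0 &&
	       demo_qs_check(completed, f->items[f->head].token)) {
		out[n++] = f->items[f->head].index;
		f->head = (f->head + 1) % DEMO_FIFO_CAP;
		f->count--;
	}
	return n;
}
```

Bounding the work per call keeps the cost of a delete operation predictable, at the price of leaving some completed entries in the FIFO until a later call.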

Signed-off-by: Ruifeng Wang 
Reviewed-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
---
 lib/librte_lpm/Makefile|   3 +-
 lib/librte_lpm/meson.build |   2 +
 lib/librte_lpm/rte_lpm.c   | 218 +++--
 lib/librte_lpm/rte_lpm.h   |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/meson.build|   3 +-
 6 files changed, 239 insertions(+), 15 deletions(-)

diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index a7946a1c5..ca9e16312 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_lpm.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index a5176d8ae..19a35107f 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -2,9 +2,11 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
+allow_experimental_apis = true
 sources = files('rte_lpm.c', 'rte_lpm6.c')
 headers = files('rte_lpm.h', 'rte_lpm6.h')
 # since header files have different names, we can install all vector headers
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 3a929a1b1..1efdef22d 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2019 Arm Limited
  */
 
 #include 
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rte_lpm.h"
 
@@ -39,6 +41,11 @@ enum valid_flag {
VALID
 };
 
+struct __rte_lpm_qs_item {
+   uint64_t token; /**< QSBR token.*/
+   uint32_t index; /**< tbl8 group index.*/
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include 
@@ -381,6 +388,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm)
 
rte_mcfg_tailq_write_unlock();
 
+   if (lpm->qsv)
+   rte_ring_free(lpm->qs_fifo);
rte_free(lpm->tbl8);
rte_free(lpm->rules_tbl);
rte_free(lpm);
@@ -390,6 +399,145 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04);
 MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
rte_lpm_free_v1604);
 
+/* Add an item into FIFO.
+ * return: 0 - success
+ */
+static int
+__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo,
+   struct __rte_lpm_qs_item *item)
+{
+   if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token) != 0) {
+   rte_errno = ENOSPC;
+   return 1;
+   }
+   if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index) != 0) {
+   void *obj;
+   /* token needs to be dequeued when index enqueue fails */
+   rte_ring_sc_dequeue(fifo, &obj);
+   rte_errno = ENOSPC;
+   return 1;
+   }
+
+   return 0;
+}
+
+/* Remove item from FIFO.
+ * Used when data observed by rte_ring_peek.
+ */
+static void
+__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo,
+   struct __rte_lpm_qs_item *item)
+{
+   void *obj_token = NULL;
+   void *obj_index = NULL;
+
+   (void)rte_ring_sc_dequeue(fifo, &obj_token);
+   (void)rte_ring_sc_dequeue(fifo, &obj_index);
+
+   if (item) {
+   item->token = (uint64_t)((uintptr_t)obj_token);
+   item->index = (uint32_t)((uintptr_t)obj_index);
+   }
+}
+
+/* Max number of tbl8 groups to reclaim at one time. */
+#define RCU_QSBR_RECLAIM_SIZE  8
+
+/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL),
+ * reclaim will be triggered by tbl8_free.
+ */
+#define RCU_QSBR_RECLAIM_LEVEL 3
+
+/* Reclaim some tbl8 groups based on quiescent state check.
+ * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max.
+ * return: 0 - success, 1 - no group reclaimed.
+ */
+static uint32_t
+__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index)
+{
+   struct __rte_lpm_qs_item qs_item;
+   struct rte_lpm_tbl_entry *tbl8_entry = NULL;
+   void *obj_token;
+   uint32_t cnt = 0;
+
+   /* Check reader threads quiescent state and
+* reclaim as many tbl8 groups as possible.
+*/
+   while ((cnt < RCU_QSBR_RECLAIM_SIZE) &&
+   (rte_ring_peek(lpm->qs_fifo, &obj_

[dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API

2019-08-21 Thread Ruifeng Wang
The peek API allows fetching the next available object in the ring
without dequeuing it. This helps in scenarios where the decision to
dequeue an object depends on its value.

Signed-off-by: Dharmik Thakkar 
Signed-off-by: Ruifeng Wang 
Reviewed-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
---
 lib/librte_ring/rte_ring.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2a9f768a1..d3d0d5e18 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
r->cons.single, available);
 }
 
+/**
+ * Peek one object from a ring.
+ *
+ * The peek API allows fetching the next available object in the ring
+ * without dequeuing it. This API is not multi-thread safe with respect
+ * to other consumer threads.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, object available
+ *   - -ENOENT: Not enough entries in the ring.
+ */
+__rte_experimental
+static __rte_always_inline int
+rte_ring_peek(struct rte_ring *r, void **obj_p)
+{
+   uint32_t prod_tail = r->prod.tail;
+   uint32_t cons_head = r->cons.head;
+   uint32_t count = (prod_tail - cons_head) & r->mask;
+   unsigned int n = 1;
+   if (count) {
+   DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
+   return 0;
+   }
+   return -ENOENT;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1
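
The peek-then-dequeue pattern described in this patch can be illustrated
with a minimal standalone sketch. It does not use rte_ring: the toy_ring
type and every toy_* function below are hypothetical names invented for
illustration, and the ring is a plain single-producer/single-consumer
array with no memory barriers. It only shows how a consumer can inspect
the head object and decide, based on its value, whether to dequeue it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u                 /* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

/* Hypothetical single-threaded ring; NOT the DPDK rte_ring. */
struct toy_ring {
	uint32_t head;                   /* next slot to dequeue */
	uint32_t tail;                   /* next slot to enqueue */
	void *slots[TOY_RING_SIZE];
};

/* Append one object; returns 0 on success, -ENOBUFS when full. */
static int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
	if (r->tail - r->head == TOY_RING_SIZE)
		return -ENOBUFS;
	r->slots[r->tail & TOY_RING_MASK] = obj;
	r->tail++;
	return 0;
}

/* Copy the head object into *obj_p without consuming it.
 * Returns 0 on success, -ENOENT when the ring is empty.
 */
static int toy_ring_peek(const struct toy_ring *r, void **obj_p)
{
	if (r->tail == r->head)
		return -ENOENT;
	*obj_p = r->slots[r->head & TOY_RING_MASK];
	return 0;
}

/* Consume the head object; same return values as toy_ring_peek. */
static int toy_ring_dequeue(struct toy_ring *r, void **obj_p)
{
	int ret = toy_ring_peek(r, obj_p);

	if (ret == 0)
		r->head++;
	return ret;
}

/* Dequeue objects while their value is <= limit and stop at the first
 * larger one, which stays in the ring. Returns the number dequeued.
 */
static unsigned int toy_ring_drain_upto(struct toy_ring *r, uintptr_t limit)
{
	void *obj;
	unsigned int n = 0;

	assert(r != NULL);
	while (toy_ring_peek(r, &obj) == 0 && (uintptr_t)obj <= limit) {
		(void)toy_ring_dequeue(r, &obj);
		n++;
	}
	return n;
}
```

Without a peek operation the consumer would have to dequeue the object
first and then find somewhere to put it back if it turned out not to be
ready, which a FIFO ring cannot do without reordering.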



[dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library

2019-08-21 Thread Ruifeng Wang
This patchset integrates RCU QSBR support with the LPM library.

A document is added describing the suggested design for integrating
the RCU library with other libraries in DPDK.
As an example, the LPM library adds the integration. RCU is used
to safely free tbl8 groups so that they can be recycled. A table will
not be reclaimed or reused until all readers have finished referencing it.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use.

A new API, rte_ring_peek, is introduced to help manage the
reclaim FIFO queue.


Honnappa Nagarahalli (1):
  doc/rcu: add RCU integration design details

Ruifeng Wang (2):
  lib/ring: add peek API
  lib/lpm: integrate RCU QSBR

 doc/guides/prog_guide/rcu_lib.rst  |  51 +++
 lib/librte_lpm/Makefile            |   3 +-
 lib/librte_lpm/meson.build         |   2 +
 lib/librte_lpm/rte_lpm.c           | 218 +++--
 lib/librte_lpm/rte_lpm.h           |  22 +++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 lib/librte_ring/rte_ring.h         |  30 
 lib/meson.build                    |   3 +-
 8 files changed, 320 insertions(+), 15 deletions(-)

-- 
2.17.1
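
The deferred-reclaim flow outlined in this cover letter can be sketched
in a few lines of standalone C. This is only an illustration under
stated assumptions, not the code from the series: toy_fifo, toy_qs and
all toy_* functions are hypothetical, the quiescent-state check is
reduced to comparing a token against a "completed" counter, and the
sketch checks for room for both words up front rather than rolling back
a partial push. Each freed tbl8 group is queued as a (token, index)
pair, and a group is handed back only once the check says every reader
has moved past its token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FIFO_SLOTS  16u              /* power of two; 2 slots per item */
#define TOY_FIFO_MASK   (TOY_FIFO_SLOTS - 1u)
#define TOY_RECLAIM_MAX 8u               /* max groups reclaimed per call */

/* Hypothetical FIFO of (token, index) pairs; NOT the DPDK rte_ring. */
struct toy_fifo {
	uint32_t head;
	uint32_t tail;
	uint64_t slots[TOY_FIFO_SLOTS];
};

/* Hypothetical stand-in for an RCU QSBR variable: all readers are
 * assumed to have passed every token <= completed.
 */
struct toy_qs {
	uint64_t completed;
};

static bool toy_qs_check(const struct toy_qs *qs, uint64_t token)
{
	return token <= qs->completed;
}

/* Queue a freed group; returns 0 on success, 1 when the FIFO is full. */
static int toy_fifo_push(struct toy_fifo *f, uint64_t token, uint32_t index)
{
	if (TOY_FIFO_SLOTS - (f->tail - f->head) < 2)
		return 1;
	f->slots[f->tail++ & TOY_FIFO_MASK] = token;
	f->slots[f->tail++ & TOY_FIFO_MASK] = index;
	return 0;
}

/* Reclaim up to TOY_RECLAIM_MAX groups whose token has completed.
 * Reclaimed indexes are written to freed[]; returns how many.
 */
static unsigned int
toy_reclaim_chunk(struct toy_fifo *f, const struct toy_qs *qs,
		  uint32_t freed[TOY_RECLAIM_MAX])
{
	unsigned int cnt = 0;

	assert(f != NULL && qs != NULL && freed != NULL);
	while (cnt < TOY_RECLAIM_MAX && f->tail != f->head) {
		/* Peek at the oldest token without consuming it. */
		uint64_t token = f->slots[f->head & TOY_FIFO_MASK];

		if (!toy_qs_check(qs, token))
			break;           /* readers may still hold it */
		freed[cnt++] =
			(uint32_t)f->slots[(f->head + 1) & TOY_FIFO_MASK];
		f->head += 2;            /* pop token and index together */
	}
	return cnt;
}
```

Because entries are queued in token order, the loop can stop at the
first token that has not completed: nothing behind it can be ready
either, which is why peeking at the head is sufficient.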