Re: [dpdk-dev] [PATCH] service: print errors to rte log
> -----Original Message-----
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, August 21, 2019 12:33 AM
> To: Van Haaren, Harry
> Cc: dev@dpdk.org; Stephen Hemminger
> Subject: [PATCH] service: print errors to rte log
>
> EAL should always use rte_log instead of putting errors to
> stderr (which may be redirected to /dev/null in a daemon).
>
> Also, checks for NULL before rte_free are unnecessary.
>
> Signed-off-by: Stephen Hemminger

Thanks - good improvements. I have a few nit-picks, listed below. I'll send a v2 based on your changes with these notes implemented. I'll add my Signed-off-by for the code changes and an Acked-by for the whole patch; I hope that's OK. If you'd prefer two separate patches, just let me know.

-H

> ---
>  lib/librte_eal/common/rte_service.c | 23 +++
>  1 file changed, 11 insertions(+), 12 deletions(-)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index c3653ebae46c..aa2f8f3ef4b1 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -70,10 +70,12 @@ static struct rte_service_spec_impl *rte_services;
>  static struct core_state *lcore_states;
>  static uint32_t rte_service_library_initialized;
>
> +
>  int32_t rte_service_init(void)
>  {

Instead of the added blank line, the return type and the function name should really be split onto two lines. I found another instance of this and split that too, to make the whole file consistent. The rest of the file uses one blank line to separate variable declarations from functions, so one line will do.

>  	if (!rte_services) {
> -		printf("error allocating rte services array\n");
> +		RTE_LOG(ERR, EAL,
> +			"error allocating rte services array\n");
>  		goto fail_mem;

Some of these strings fit on the same line as RTE_LOG and stay within the 80-character limit, so I'm moving them up a line for consistency.
[dpdk-dev] [PATCH v2] service: print errors to rte log
From: Stephen Hemminger

EAL should always use rte_log instead of putting errors to
stderr (which may be redirected to /dev/null in a daemon).

Also, checks for NULL before rte_free are unnecessary.

Minor code consistency improvements.

Signed-off-by: Stephen Hemminger
Signed-off-by: Harry van Haaren
Acked-by: Harry van Haaren
---
 lib/librte_eal/common/rte_service.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index c3653ebae..fe0907720 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -70,10 +70,12 @@ static struct rte_service_spec_impl *rte_services;
 static struct core_state *lcore_states;
 static uint32_t rte_service_library_initialized;
 
-int32_t rte_service_init(void)
+int32_t
+rte_service_init(void)
 {
 	if (rte_service_library_initialized) {
-		printf("service library init() called, init flag %d\n",
+		RTE_LOG(NOTICE, EAL,
+			"service library init() called, init flag %d\n",
 			rte_service_library_initialized);
 		return -EALREADY;
 	}
@@ -82,14 +84,14 @@ int32_t rte_service_init(void)
 			sizeof(struct rte_service_spec_impl),
 			RTE_CACHE_LINE_SIZE);
 	if (!rte_services) {
-		printf("error allocating rte services array\n");
+		RTE_LOG(ERR, EAL, "error allocating rte services array\n");
 		goto fail_mem;
 	}
 
 	lcore_states = rte_calloc("rte_service_core_states", RTE_MAX_LCORE,
 			sizeof(struct core_state), RTE_CACHE_LINE_SIZE);
 	if (!lcore_states) {
-		printf("error allocating core states array\n");
+		RTE_LOG(ERR, EAL, "error allocating core states array\n");
 		goto fail_mem;
 	}
 
@@ -108,10 +110,8 @@ int32_t rte_service_init(void)
 	rte_service_library_initialized = 1;
 	return 0;
 fail_mem:
-	if (rte_services)
-		rte_free(rte_services);
-	if (lcore_states)
-		rte_free(lcore_states);
+	rte_free(rte_services);
+	rte_free(lcore_states);
 	return -ENOMEM;
 }
 
@@ -121,11 +121,8 @@ rte_service_finalize(void)
 	if (!rte_service_library_initialized)
 		return;
 
-	if (rte_services)
-		rte_free(rte_services);
-
-	if (lcore_states)
-		rte_free(lcore_states);
+	rte_free(rte_services);
+	rte_free(lcore_states);
 
 	rte_service_library_initialized = 0;
 }
@@ -397,8 +394,8 @@ rte_service_may_be_active(uint32_t id)
 	return 0;
 }
 
-int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
-		uint32_t serialize_mt_unsafe)
+int32_t
+rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t serialize_mt_unsafe)
 {
 	/* run service on calling core, using all-ones as the service mask */
 	if (!service_valid(id))
-- 
2.17.1
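Dropping the NULL checks is safe because rte_free() is documented to do nothing when passed NULL, the same contract as C's free(). A minimal sketch of the resulting single-label cleanup pattern in plain C follows. It assumes malloc-family calls stand in for rte_calloc()/rte_free() and fprintf(stderr) stands in for RTE_LOG(). The names `services`, `core_states`, `service_init_sketch` and the `fail_second` switch are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rte_services and lcore_states. */
static int *services;
static int *core_states;

/*
 * Same shape as rte_service_init() after the patch: two allocations and a
 * single cleanup label that frees both unconditionally. free(NULL), like
 * rte_free(NULL), does nothing, so no NULL checks are needed.
 * fail_second simulates the second allocation failing.
 */
static int
service_init_sketch(size_t n, int fail_second)
{
	/* Reset so the cleanup path never frees a stale pointer. */
	core_states = NULL;

	services = calloc(n, sizeof(*services));
	if (services == NULL) {
		fprintf(stderr, "error allocating services array\n");
		goto fail_mem;
	}

	if (!fail_second)
		core_states = calloc(n, sizeof(*core_states));
	if (core_states == NULL) {
		fprintf(stderr, "error allocating core states array\n");
		goto fail_mem;
	}
	return 0;

fail_mem:
	free(services);    /* may be NULL: no-op */
	free(core_states); /* may be NULL: no-op */
	services = NULL;
	core_states = NULL;
	return -ENOMEM;
}
```

The real function additionally refuses to run twice via rte_service_library_initialized. The sketch omits that guard and only shows why the unconditional frees are correct.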
Re: [dpdk-dev] Sync up status for Mellanox PMD barrier investigation
Here is an update for this thread.

In the most critical data path of the mlx5 PMD, several rte_cio_wmb()/rte_cio_rmb() barriers ('dmb osh' on aarch64) are in use. C11 atomics are a good replacement for rte_smp_rmb()/rte_smp_wmb(), relaxing the data synchronization barriers between CPUs. However, the mlx5 PMD needs to write data back to the hardware, so it uses many rte_cio_rmb()/rte_cio_wmb() calls to synchronize that data. Details are below. All comments are welcome.

Thanks.

Data path
/////////
drivers/net/mlx5/mlx5_rxtx.c=950=mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t mbuf_prepare)
drivers/net/mlx5/mlx5_rxtx.c:1002: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1004: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1010: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c=1272=mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
drivers/net/mlx5/mlx5_rxtx.c:1385: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1387: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c=1549=mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
drivers/net/mlx5/mlx5_rxtx.c:1741: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx.c:1745: rte_cio_wmb();
drivers/net/mlx5/mlx5_rxtx_vec_neon.h=366=rxq_burst_v(struct mlx5_rxq_data *rxq, struct rte_mbuf **pkts, uint16_t pkts_n,
drivers/net/mlx5/mlx5_rxtx_vec_neon.h:530: rte_cio_rmb();

Commit messages:
net/mlx5: cleanup memory barriers: mlx5_rx_burst
https://git.dpdk.org/dpdk/commit/?id=9afa3f74658afc0e21fbe5c3884c55a21ff49299
net/mlx5: add Multi-Packet Rx support : mlx5_rx_burst_mprq
https://git.dpdk.org/dpdk/commit/?id=7d6bf6b866b8c25ec06539b3eeed1db4f785577c
net/mlx5: use coherent I/O memory barrier
https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=0cfdc1808de82357a924a479dc3f89de88cd91c2
net/mlx5: extend Rx completion with error handling
https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=88c0733535d6a7ce79045d4d57a1d78d904067c8
net/mlx5: fix synchronization on polling Rx completions
https://git.dpdk.org/dpdk/commit/?id=1742c2d9fab07e66209f2d14e7daa50829fc4423

Thanks,
Phil Yang

From: Phil Yang (Arm Technology China)
Sent: Thursday, August 15, 2019 6:35 PM
To: Honnappa Nagarahalli
Subject: Sync up status for Mellanox PMD barrier investigation

Hi Honnappa,

I have checked all the barriers in the mlx5 PMD data path. In my understanding, it uses the barriers correctly (DMB to synchronize memory between CPUs). The attachment lists the locations of these barriers. I just want to sync up with you on the status. Do you have any ideas or suggestions on which part we should start optimizing?

Best Regards,
Phil Yang
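The thread separates two cases. One is CPU-to-CPU ordering, which C11 atomics can express directly. The other is CPU-to-device ordering, which the rte_cio_*mb() barriers provide and which the C11 memory model is not designed to cover. A minimal sketch of the CPU-to-CPU case follows, using <stdatomic.h> and POSIX threads. A release store publishes the payload, and an acquire load on the other thread guarantees the payload is visible. In the older style, rte_smp_wmb()/rte_smp_rmb() around a plain flag do this job. All names are hypothetical, and nothing here touches device memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t payload;   /* plain data written by the producer */
static atomic_int ready;   /* publication flag */

static void *
producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Release: the payload store cannot move after this store. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static uint32_t
consume(void)
{
	/* Acquire: once ready == 1 is seen, the payload store is visible. */
	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;
	return payload;
}

/* Run one producer/consumer hand-off and return what the consumer read. */
static uint32_t
run_handoff(void)
{
	pthread_t t;
	uint32_t v;

	atomic_store_explicit(&ready, 0, memory_order_relaxed);
	payload = 0;
	pthread_create(&t, NULL, producer, NULL);
	v = consume();
	pthread_join(t, NULL);
	return v;
}
```

On aarch64 a compiler typically lowers the release/acquire pair to stlr/ldar instead of full dmb barriers. This is the kind of relaxation the thread is after for the CPU-only paths.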
[dpdk-dev] [PATCH 0/3] add unit tests for eal vfio library
1/3: fix vfio unmap that fails unexpectedly
2/3: fix vfio unmap that succeeds unexpectedly
3/3: add unit tests for eal vfio

Signed-off-by: Chaitanya Babu Talluri

Chaitanya Babu Talluri (3):
  lib/eal: fix vfio unmap that fails unexpectedly
  lib/eal: fix vfio unmap that succeeds unexpectedly
  app/test: add unit tests for eal vfio

 app/test/Makefile                   |   1 +
 app/test/meson.build                |   2 +
 app/test/test_eal_vfio.c            | 728
 lib/librte_eal/linux/eal/eal_vfio.c |  59 ++-
 4 files changed, 783 insertions(+), 7 deletions(-)
 create mode 100644 app/test/test_eal_vfio.c

-- 
2.17.2
Re: [dpdk-dev] [PATCH] eal: remove redundant error output
Stephen Hemminger writes:

> The function rte_eal_init_alert ends up printing the same message
> twice. Once via RTE_LOG and once to stderr. Remove the fprintf
> to stderr since it is redundant.
>
> Signed-off-by: Stephen Hemminger
> ---

This was originally added at your suggestion:

  http://mails.dpdk.org/archives/dev/2017-January/056431.html

As I understand it, the reason was that some of these alerts fire before logging is up, so the RTE_LOG(...) output would not show up. Is it possible to have an either/or?

>  lib/librte_eal/linux/eal/eal.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 946222ccdb7a..076fb3cbde5f 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -949,7 +949,6 @@ static int rte_eal_vfio_setup(void)
>
>  static void rte_eal_init_alert(const char *msg)
>  {
> -	fprintf(stderr, "EAL: FATAL: %s\n", msg);
>  	RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
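One possible shape for the "either/or" asked about above is sketched below: print directly to stderr only while logging is not yet usable, and go through the log system otherwise, so each alert appears exactly once. This is a minimal sketch, not the EAL's actual behaviour. It assumes a hypothetical `logging_ready` flag that the EAL would set once its log subsystem is initialized. `log_backend()` is a hypothetical stand-in for RTE_LOG(ERR, EAL, ...), which in a real build may route to syslog instead of stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum alert_sink { ALERT_STDERR, ALERT_LOG };

/* Hypothetical flag: set once the EAL log subsystem is usable. */
static bool logging_ready;

/* Hypothetical stand-in for RTE_LOG(ERR, EAL, "%s\n", msg). */
static void
log_backend(const char *msg)
{
	fprintf(stderr, "EAL: %s\n", msg);
}

/* Emit the alert exactly once: straight to stderr while logging is not
 * up yet, through the log backend afterwards. Returns the sink used. */
static enum alert_sink
init_alert(const char *msg)
{
	if (!logging_ready) {
		fprintf(stderr, "EAL: FATAL: %s\n", msg);
		return ALERT_STDERR;
	}
	log_backend(msg);
	return ALERT_LOG;
}
```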
[dpdk-dev] [PATCH 1/3] lib/eal: fix vfio unmap that fails unexpectedly
Unmap of multiple pages fails after a sequence of partial map/unmaps.

The scenario is that multiple maps are created in user_mem_maps after
multiple map/unmap/remap sequences. For example:
1. Map 3 pages together
2. Un-map page 1
3. Re-map page 1
4. Un-map page 2
5. Re-map page 2
6. Un-map page 3
7. Re-map page 3
8. Un-map all pages

Unmap fails when there are duplicate entries in user_mem_maps.

The fix is to check whether the input VA or IOVA range already exists in
user_mem_maps before creating a map.

Fixes: 73a63908 ("vfio: allow to map other memory regions")
Cc: sta...@dpdk.org

Signed-off-by: Chaitanya Babu Talluri
---
 lib/librte_eal/linux/eal/eal_vfio.c | 46 +
 1 file changed, 46 insertions(+)

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..104912077 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -212,6 +212,41 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
 	return NULL;
 }
 
+static int
+find_user_mem_map_overlap(struct user_mem_maps *user_mem_maps, uint64_t addr,
+		uint64_t iova, uint64_t len)
+{
+	uint64_t va_end = addr + len;
+	uint64_t iova_end = iova + len;
+	int i;
+
+	for (i = 0; i < user_mem_maps->n_maps; i++) {
+		struct user_mem_map *map = &user_mem_maps->maps[i];
+		uint64_t map_va_end = map->addr + map->len;
+		uint64_t map_iova_end = map->iova + map->len;
+
+		bool no_lo_va_overlap = addr < map->addr && va_end <= map->addr;
+		bool no_hi_va_overlap = addr >= map_va_end &&
+				va_end > map_va_end;
+		bool no_lo_iova_overlap = iova < map->iova &&
+				iova_end <= map->iova;
+		bool no_hi_iova_overlap = iova >= map_iova_end &&
+				iova_end > map_iova_end;
+
+		/* check input VA and iova is not within the
+		 * existing map's range
+		 */
+		if ((no_lo_va_overlap || no_hi_va_overlap) &&
+				(no_lo_iova_overlap || no_hi_iova_overlap))
+			continue;
+		else
+			/* map overlaps */
+			return 1;
+	}
+	/* map doesn't overlap */
+	return 0;
+}
+
 /* this will sort all user maps, and merge/compact any adjacent maps */
 static void
 compact_user_maps(struct user_mem_maps *user_mem_maps)
@@ -1732,6 +1767,17 @@ container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 		ret = -1;
 		goto out;
 	}
+
+	/* check whether vaddr and iova exists in user_mem_maps */
+	ret = find_user_mem_map_overlap(user_mem_maps, vaddr, iova, len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Mapping overlaps with a previously "
+				"existing mapping\n");
+		rte_errno = EEXIST;
+		ret = -1;
+		goto out;
+	}
+
 	/* map the entry */
 	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
 		/* technically, this will fail if there are currently no devices
-- 
2.17.2
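For non-zero lengths, the four `no_*_overlap` flags in find_user_mem_map_overlap() reduce to the usual half-open interval test, applied once to the VA range and once to the IOVA range. A minimal sketch of the same predicate in its positive form follows: two ranges [a, a + a_len) and [b, b + b_len) overlap if and only if each starts before the other ends. `struct mem_map`, `ranges_overlap()` and `map_conflicts()` are hypothetical names, and, like the patch, the sketch ignores address wrap-around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_map {   /* hypothetical mirror of struct user_mem_map */
	uint64_t addr;
	uint64_t iova;
	uint64_t len;
};

/* Half-open ranges [a, a + a_len) and [b, b + b_len) overlap
 * iff each one starts before the other one ends. */
static bool
ranges_overlap(uint64_t a, uint64_t a_len, uint64_t b, uint64_t b_len)
{
	return a < b + b_len && b < a + a_len;
}

/* A new mapping conflicts with an existing one if either its VA range or
 * its IOVA range overlaps, matching the patch's continue/return logic. */
static bool
map_conflicts(const struct mem_map *m, uint64_t addr, uint64_t iova,
	      uint64_t len)
{
	return ranges_overlap(addr, len, m->addr, m->len) ||
	       ranges_overlap(iova, len, m->iova, m->len);
}
```

Writing the test positively avoids the double negation in `(no_lo || no_hi) && (no_lo || no_hi)`, which is easy to get wrong when editing.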
[dpdk-dev] [PATCH 2/3] lib/eal: fix vfio unmap that succeeds unexpectedly
Unmapping a page with a valid virtual address but another page's IOVA
succeeds unexpectedly.

An entry in user_mem_maps can refer to multiple pages. When a single page
of such an entry is unmapped, the VA and the IOVA are each checked only
against the bounds of the whole entry, not against each other at the
page's offset. A mismatched IOVA that still falls inside the entry is
therefore accepted.

The solution is for find_user_mem_map() to check that the user-supplied
IOVA corresponds to the virtual address of the page being unmapped.

Fixes: 73a6390859 ("vfio: allow to map other memory regions")
Cc: sta...@dpdk.org

Signed-off-by: Chaitanya Babu Talluri
---
 lib/librte_eal/linux/eal/eal_vfio.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 104912077..04c284cb2 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -184,13 +184,13 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
 		uint64_t iova, uint64_t len)
 {
 	uint64_t va_end = addr + len;
-	uint64_t iova_end = iova + len;
 	int i;
 
 	for (i = 0; i < user_mem_maps->n_maps; i++) {
 		struct user_mem_map *map = &user_mem_maps->maps[i];
 		uint64_t map_va_end = map->addr + map->len;
-		uint64_t map_iova_end = map->iova + map->len;
+		uint64_t diff_addr_len = addr - map->addr;
+		uint64_t expected_iova = map->iova + diff_addr_len;
 
 		/* check start VA */
 		if (addr < map->addr || addr >= map_va_end)
@@ -199,11 +199,10 @@ find_user_mem_map(struct user_mem_maps *user_mem_maps, uint64_t addr,
 		if (va_end <= map->addr || va_end > map_va_end)
 			continue;
 
-		/* check start IOVA */
-		if (iova < map->iova || iova >= map_iova_end)
-			continue;
-		/* check if IOVA end is within boundaries */
-		if (iova_end <= map->iova || iova_end > map_iova_end)
+		/* check whether user input iova is in sync with
+		 * user_mem_map entry's iova
+		 */
+		if (expected_iova != iova)
 			continue;
 
 		/* we've found our map */
-- 
2.17.2
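The patched check requires the IOVA to equal `map->iova + (addr - map->addr)` exactly, instead of merely falling somewhere inside the entry's IOVA range. A minimal standalone sketch of that lookup follows. `struct mem_map` and `find_map()` are hypothetical names that mirror the patched find_user_mem_map() loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_map {   /* hypothetical mirror of struct user_mem_map */
	uint64_t addr;
	uint64_t iova;
	uint64_t len;
};

/* Return the map that fully contains [addr, addr + len) and whose IOVA at
 * that offset equals iova, or NULL if there is none. */
static const struct mem_map *
find_map(const struct mem_map *maps, int n, uint64_t addr, uint64_t iova,
	 uint64_t len)
{
	uint64_t va_end = addr + len;

	for (int i = 0; i < n; i++) {
		const struct mem_map *m = &maps[i];
		uint64_t map_va_end = m->addr + m->len;

		if (addr < m->addr || addr >= map_va_end)
			continue; /* start VA outside this map */
		if (va_end <= m->addr || va_end > map_va_end)
			continue; /* end VA outside this map */
		if (m->iova + (addr - m->addr) != iova)
			continue; /* IOVA does not correspond to this VA */
		return m;
	}
	return NULL;
}
```

With a three-page entry at VA 0x1000 / IOVA 0x10000, unmapping VA 0x2000 is accepted only with IOVA 0x11000. The old range check would also have accepted 0x12000, which belongs to a different page of the same entry.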
[dpdk-dev] [PATCH 3/3] app/test: add unit tests for eal vfio
Unit test cases are added for eal vfio library. eal_vfio_autotest added to meson build file. Signed-off-by: Chaitanya Babu Talluri --- app/test/Makefile| 1 + app/test/meson.build | 2 + app/test/test_eal_vfio.c | 728 +++ 3 files changed, 731 insertions(+) create mode 100644 app/test/test_eal_vfio.c diff --git a/app/test/Makefile b/app/test/Makefile index 26ba6fe2b..9b9c78b4e 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -137,6 +137,7 @@ SRCS-y += test_cpuflags.c SRCS-y += test_mp_secondary.c SRCS-y += test_eal_flags.c SRCS-y += test_eal_fs.c +SRCS-y += test_eal_vfio.c SRCS-y += test_alarm.c SRCS-y += test_interrupts.c SRCS-y += test_version.c diff --git a/app/test/meson.build b/app/test/meson.build index ec40943bd..2ec9c863a 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -36,6 +36,7 @@ test_sources = files('commands.c', 'test_distributor_perf.c', 'test_eal_flags.c', 'test_eal_fs.c', + 'test_eal_vfio.c', 'test_efd.c', 'test_efd_perf.c', 'test_errno.c', @@ -175,6 +176,7 @@ fast_test_names = [ 'eal_flags_file_prefix_autotest', 'eal_flags_misc_autotest', 'eal_fs_autotest', + 'eal_vfio_autotest', 'errno_autotest', 'event_ring_autotest', 'func_reentrancy_autotest', diff --git a/app/test/test_eal_vfio.c b/app/test/test_eal_vfio.c new file mode 100644 index 0..8995573df --- /dev/null +++ b/app/test/test_eal_vfio.c @@ -0,0 +1,728 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2019 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "test.h" + +#if !defined(RTE_EXEC_ENV_LINUX) || !defined(RTE_EAL_VFIO) +static int +test_eal_vfio(void) +{ + printf("VFIO not supported, skipping test\n"); + return TEST_SKIPPED; +} + +#else + +#define PAGESIZE sysconf(_SC_PAGESIZE) +#define INVALID_CONTAINER_FD -5 +#define THREE_PAGES 3 +#define UNMAPPED_ADDR 0x1500 + +uint64_t virtaddr_64; +const char *name = "heap"; +size_t map_length; +int 
container_fds[RTE_MAX_VFIO_CONTAINERS]; + +static int +check_get_mem(void *addr, rte_iova_t *iova) +{ + const struct rte_memseg_list *msl; + const struct rte_memseg *ms; + rte_iova_t expected_iova; + + msl = rte_mem_virt2memseg_list(addr); + if (!msl->external) { + printf("%s():%i: Memseg list is not marked as " + "external\n", __func__, __LINE__); + return -1; + } + ms = rte_mem_virt2memseg(addr, msl); + if (ms == NULL) { + printf("%s():%i: Failed to retrieve memseg for " + "external mem\n", __func__, __LINE__); + return -1; + } + if (ms->addr != addr) { + printf("%s():%i: VA mismatch\n", __func__, __LINE__); + return -1; + } + expected_iova = (iova == NULL) ? RTE_BAD_IOVA : iova[0]; + if (ms->iova != expected_iova) { + printf("%s():%i: IOVA mismatch\n", __func__, __LINE__); + return -1; + } + return 0; +} + +/* Initialize container fds */ +static int +initialize_container_fds(void) +{ + int i = 0; + + for (i = 0; i < RTE_MAX_VFIO_CONTAINERS; i++) + container_fds[i] = -1; + + return TEST_SUCCESS; +} + +/* To test vfio container create */ +static int +test_vfio_container_create(void) +{ + int ret = 0, i = 0; + + /* check max containers limit */ + for (i = 1; i < RTE_MAX_VFIO_CONTAINERS; i++) { + container_fds[i] = rte_vfio_container_create(); + TEST_ASSERT(container_fds[i] > 0, "Test to check " + "rte_vfio_container_create with max " + "containers limit: Failed\n"); + } + + /* check rte_vfio_container_create when exceeds max containers limit */ + ret = rte_vfio_container_create(); + TEST_ASSERT(ret == -1, "Test to check " + "rte_vfio_container_create container " + "when exceeds limit: Failed\n"); + + return TEST_SUCCESS; +} + +/* To test vfio container destroy */ +static int +test_vfio_container_destroy(void) +{ + int i = 0, ret = 0; + + /* check to destroy max container limit */ + for (i = 1; i < RTE_MAX_VFIO_CONTAINERS; i++) { + ret = rte_vfio_container_destroy(container_fds[i]); + TEST_ASSERT(ret == 0, "Test to check " + "rte_vfio_container_destroy: Failed\n"); 
+ container_fds[i] = -1; + } + + /* check rte_vfio_container_destroy with valid but non existing value */ + ret = rte_vfio_container_destroy(0); + TEST_ASSERT(ret == -1, "Test to check rte_vfio_container_destroy with " +
Re: [dpdk-dev] [PATCH 1/3] lib/eal: fix vfio unmap that fails unexpectedly
On 21-Aug-19 2:02 PM, Chaitanya Babu Talluri wrote:
> Unmap of multiple pages fails after a sequence of partial map/unmaps.
>
> The scenario is that multiple maps are created in user_mem_maps,
> after multiple map/unmap/remap sequences.
> For an example,
> Steps:
> 1. Map 3 pages together
> 2. Un-map page1
> 3. Re-map page 1
> 4. Un-map page 2
> 5. Re-map page 2
> 6. Un-map page 3
> 7. Re-map page 3
> 8. Un-map all pages

I don't think this description is correct in relation to what is being fixed here. The code attempts to prevent overlaps, but there are no overlaps in the above example - none of the above operations would trigger the added code.

-- 
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH 2/3] lib/eal: fix vfio unmap that succeeds unexpectedly
On 21-Aug-19 2:02 PM, Chaitanya Babu Talluri wrote:
> Un-map of page with valid virtual address and another page's IOVA
> succeeds unexpectedly.
>
> An entry in user_mem_maps can refer multiple pages. Currently in
> such case to unmap single page, VA and IOVA related to entry in
> user_mem_maps is checked but not based on page (based on the page
> size), this is the cause.
>
> The solution is that in find_user_mem_maps, check whether user
> input iova is in relation with input virtual address of the page
> which is to be unmapped.

The description could be clearer. Suggested rewording:

Unmapping a page with a VA that is found in the list of current mappings will succeed even if the IOVA for the chunk being unmapped is mismatched. Fix it by checking whether the IOVA address matches the expected IOVA address exactly.

-- 
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH 3/3] app/test: add unit tests for eal vfio
Chaitanya Babu Talluri writes: > Unit test cases are added for eal vfio library. > eal_vfio_autotest added to meson build file. > > Signed-off-by: Chaitanya Babu Talluri > --- Thanks for adding unit tests for the vfio library. In this case, there seems to be some failures - can you help determine the cause: https://travis-ci.com/ovsrobot/dpdk/jobs/227066776
[dpdk-dev] [PATCH] net/vmxnet3: fix RSS setting on v4
When RSS is set up through the v4 API, ESX requires IPv4/6 TCP RSS to be
requested.

This patch:
- Sets IPv4/6 TCP RSS when it has not been requested, and logs a warning
  so that the application knows these hash types were added.
- Adds a check to skip RSS configuration altogether unless ETH_MQ_RX_RSS
  has been requested, similar to v3.

Returning an error instead was considered. The intent is to ease setting up
and running vmxnet3 in situations where it is supposed to be most
straightforward (testpmd, pktgen).

Signed-off-by: Eduard Serra
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 3 ++-
 drivers/net/vmxnet3/vmxnet3_ethdev.h | 4 ++++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 8 ++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 57feb37..0a7047e 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -769,7 +769,8 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
 		PMD_INIT_LOG(DEBUG, "Failed to setup memory region\n");
 	}
 
-	if (VMXNET3_VERSION_GE_4(hw)) {
+	if (VMXNET3_VERSION_GE_4(hw) &&
+	    dev->data->dev_conf.rxmode.mq_mode == ETH_MQ_RX_RSS) {
 		/* Check for additional RSS */
 		ret = vmxnet3_v4_rss_configure(dev);
 		if (ret != VMXNET3_SUCCESS) {
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h b/drivers/net/vmxnet3/vmxnet3_ethdev.h
index 8c2b6f8..6e3ce7d 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
@@ -38,6 +38,10 @@
 	ETH_RSS_NONFRAG_IPV4_UDP | \
 	ETH_RSS_NONFRAG_IPV6_UDP)
 
+#define VMXNET3_MANDATORY_V4_RSS ( \
+	ETH_RSS_NONFRAG_IPV4_TCP | \
+	ETH_RSS_NONFRAG_IPV6_TCP)
+
 /* RSS configuration structure - shared with device through GPA */
 typedef struct VMXNET3_RSSConf {
 	uint16_t hashType;
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 7794d74..dd99684 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -1311,6 +1311,14 @@ vmxnet3_v4_rss_configure(struct rte_eth_dev *dev)
 	cmdInfo->setRSSFields = 0;
 
 	port_rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
+
+	if ((port_rss_conf->rss_hf & VMXNET3_MANDATORY_V4_RSS) !=
+	    VMXNET3_MANDATORY_V4_RSS) {
+		PMD_INIT_LOG(WARNING, "RSS: IPv4/6 TCP is required for vmxnet3 v4 RSS, "
+			     "automatically setting it");
+		port_rss_conf->rss_hf |= VMXNET3_MANDATORY_V4_RSS;
+	}
+
 	rss_hf = port_rss_conf->rss_hf &
 		(VMXNET3_V4_RSS_MASK | VMXNET3_RSS_OFFLOAD_ALL);
 
-- 
2.7.4
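The check in vmxnet3_v4_rss_configure() compares against the full mandatory mask, not just a non-zero intersection. That way, a request that contains only one of the two TCP hash types is also topped up. A minimal sketch of that logic follows, assuming hypothetical placeholder bit values. The driver itself uses ETH_RSS_NONFRAG_IPV4_TCP and ETH_RSS_NONFRAG_IPV6_TCP from rte_ethdev.h. `apply_mandatory_rss()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical placeholder bits, standing in for ETH_RSS_NONFRAG_*. */
#define RSS_IPV4_TCP (UINT64_C(1) << 0)
#define RSS_IPV6_TCP (UINT64_C(1) << 1)
#define RSS_IPV4_UDP (UINT64_C(1) << 2)
#define MANDATORY_V4_RSS (RSS_IPV4_TCP | RSS_IPV6_TCP)

/* Force both mandatory TCP hash types on, warning if any were missing.
 * Comparing with the full mask also catches a partial request. */
static uint64_t
apply_mandatory_rss(uint64_t rss_hf)
{
	if ((rss_hf & MANDATORY_V4_RSS) != MANDATORY_V4_RSS) {
		fprintf(stderr, "RSS: IPv4/6 TCP is required, setting it\n");
		rss_hf |= MANDATORY_V4_RSS;
	}
	return rss_hf;
}
```

In the patch the update is written back into dev->data->dev_conf.rx_adv_conf.rss_conf, so the forced hash types remain visible in the port configuration after start.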
[dpdk-dev] [PATCH] timer: remove check_tsc_flags()
This code was added 7+ years ago (commit fb022b85ba), presumably when variant TSCs were still somewhat common? But this code doesn't do anything except print a warning, and the warning doesn't give any kind of advice to the user, so let's just remove it. While the warning has no functional meaning, the /proc/cpuinfo parsing consumes a non-trivial amount of time which is especially noticeable in secondary processes. On my test system, it consumes 21ms out of the 66ms total execution time for rte_eal_init() in a secondary process. Signed-off-by: Jim Harris --- lib/librte_eal/linux/eal/eal_timer.c | 36 -- 1 file changed, 36 deletions(-) diff --git a/lib/librte_eal/linux/eal/eal_timer.c b/lib/librte_eal/linux/eal/eal_timer.c index 76ec17034..a904a8297 100644 --- a/lib/librte_eal/linux/eal/eal_timer.c +++ b/lib/librte_eal/linux/eal/eal_timer.c @@ -192,41 +192,6 @@ rte_eal_hpet_init(int make_default) } #endif -static void -check_tsc_flags(void) -{ - char line[512]; - FILE *stream; - - stream = fopen("/proc/cpuinfo", "r"); - if (!stream) { - RTE_LOG(WARNING, EAL, "WARNING: Unable to open /proc/cpuinfo\n"); - return; - } - - while (fgets(line, sizeof line, stream)) { - char *constant_tsc; - char *nonstop_tsc; - - if (strncmp(line, "flags", 5) != 0) - continue; - - constant_tsc = strstr(line, "constant_tsc"); - nonstop_tsc = strstr(line, "nonstop_tsc"); - if (!constant_tsc || !nonstop_tsc) - RTE_LOG(WARNING, EAL, - "WARNING: cpu flags " - "constant_tsc=%s " - "nonstop_tsc=%s " - "-> using unreliable clock cycles !\n", - constant_tsc ? "yes":"no", - nonstop_tsc ? "yes":"no"); - break; - } - - fclose(stream); -} - uint64_t get_tsc_freq(void) { @@ -263,6 +228,5 @@ rte_eal_timer_init(void) eal_timer_source = EAL_TIMER_TSC; set_tsc_freq(); - check_tsc_flags(); return 0; }
[dpdk-dev] [PATCH v2] timer: remove check_tsc_flags()
This code was added 7+ years ago: commit fb022b85bae4 ("timer: check TSC reliability") presumably when variant TSCs were still somewhat common? But this code doesn't do anything except print a warning, and the warning doesn't give any kind of advice to the user, so let's just remove it. While the warning has no functional meaning, the /proc/cpuinfo parsing consumes a non-trivial amount of time which is especially noticeable in secondary processes. On my test system, it consumes 21ms out of the 66ms total execution time for rte_eal_init() in a secondary process. Signed-off-by: Jim Harris --- lib/librte_eal/linux/eal/eal_timer.c | 36 -- 1 file changed, 36 deletions(-) diff --git a/lib/librte_eal/linux/eal/eal_timer.c b/lib/librte_eal/linux/eal/eal_timer.c index 76ec17034..a904a8297 100644 --- a/lib/librte_eal/linux/eal/eal_timer.c +++ b/lib/librte_eal/linux/eal/eal_timer.c @@ -192,41 +192,6 @@ rte_eal_hpet_init(int make_default) } #endif -static void -check_tsc_flags(void) -{ - char line[512]; - FILE *stream; - - stream = fopen("/proc/cpuinfo", "r"); - if (!stream) { - RTE_LOG(WARNING, EAL, "WARNING: Unable to open /proc/cpuinfo\n"); - return; - } - - while (fgets(line, sizeof line, stream)) { - char *constant_tsc; - char *nonstop_tsc; - - if (strncmp(line, "flags", 5) != 0) - continue; - - constant_tsc = strstr(line, "constant_tsc"); - nonstop_tsc = strstr(line, "nonstop_tsc"); - if (!constant_tsc || !nonstop_tsc) - RTE_LOG(WARNING, EAL, - "WARNING: cpu flags " - "constant_tsc=%s " - "nonstop_tsc=%s " - "-> using unreliable clock cycles !\n", - constant_tsc ? "yes":"no", - nonstop_tsc ? "yes":"no"); - break; - } - - fclose(stream); -} - uint64_t get_tsc_freq(void) { @@ -263,6 +228,5 @@ rte_eal_timer_init(void) eal_timer_source = EAL_TIMER_TSC; set_tsc_freq(); - check_tsc_flags(); return 0; }
[dpdk-dev] [PATCH] timer: use rte_mp_msg to get freq from primary process
Ideally, get_tsc_freq_arch() is able to provide the TSC rate using architecture-specific means. When that is not possible, DPDK reverts to calculating the TSC rate with a 100ms nanosleep or 1s sleep. The latter occurs more frequently in VMs which often do not have access to the data they need from arch-specific means (CPUID leaf 0x15 or MSR 0xCE on x86). In secondary processes, the extra 100ms is especially noticeable and consumes the bulk of rte_eal_init() execution time. So in secondary processes, if we cannot get the TSC rate using get_tsc_freq_arch(), try to get the TSC rate from the primary process instead using rte_mp_msg. This is much faster than 100ms. Reduces rte_eal_init() execution time in a secondary process from 165ms to 66ms on my test system. Signed-off-by: Jim Harris Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35 --- lib/librte_eal/common/eal_common_timer.c | 61 ++ 1 file changed, 61 insertions(+) diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c index 145543de7..4c58cea6e 100644 --- a/lib/librte_eal/common/eal_common_timer.c +++ b/lib/librte_eal/common/eal_common_timer.c @@ -15,9 +15,16 @@ #include #include #include +#include #include "eal_private.h" +#define EAL_TIMER_MP "eal_timer_mp_sync" + +struct timer_mp_param { + uint64_t tsc_hz; +}; + /* The frequency of the RDTSC timer resolution */ static uint64_t eal_tsc_resolution_hz; @@ -74,12 +81,58 @@ estimate_tsc_freq(void) return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ); } +static uint64_t +get_tsc_freq_from_primary(void) +{ + struct rte_mp_msg mp_req = {0}; + struct rte_mp_reply mp_reply = {0}; + struct timer_mp_param *r; + struct timespec ts = {.tv_sec = 1, .tv_nsec = 0}; + uint64_t tsc_hz; + + strcpy(mp_req.name, EAL_TIMER_MP); + if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) || + mp_reply.nb_received != 1) { + tsc_hz = 0; + } else { + r = (struct timer_mp_param *)mp_reply.msgs[0].param; + tsc_hz = r->tsc_hz; + } + + 
free(mp_reply.msgs); + return tsc_hz; +} + +static int +timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg, +const void *peer) +{ + struct rte_mp_msg reply = {0}; + struct timer_mp_param *r = (struct timer_mp_param *)reply.param; + + r->tsc_hz = eal_tsc_resolution_hz; + strcpy(reply.name, EAL_TIMER_MP); + reply.len_param = sizeof(*r); + + return rte_mp_reply(&reply, peer); +} + void set_tsc_freq(void) { uint64_t freq; + int rc; freq = get_tsc_freq_arch(); + if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* We couldn't get the TSC frequency through arch-specific +* means. If this is a secondary process, try to get the +* TSC frequency from the primary process - this will +* be much faster than get_tsc_freq() or estimate_tsc_freq() +* below. +*/ + freq = get_tsc_freq_from_primary(); + } if (!freq) freq = get_tsc_freq(); if (!freq) @@ -87,6 +140,14 @@ set_tsc_freq(void) RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000); eal_tsc_resolution_hz = freq; + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary); + if (rc) { + RTE_LOG(WARNING, EAL, "Could not register mp_action - " + "secondary processes will calculate TSC rate " + "independently.\n"); + } + } } void rte_delay_us_callback_register(void (*userfunc)(unsigned int))
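After this patch, set_tsc_freq() tries the sources in this order:

1. The arch-specific lookup.
2. The primary process over rte_mp_msg, for secondary processes only.
3. get_tsc_freq().
4. estimate_tsc_freq().

Per the commit message, steps 3 and 4 are the 100ms nanosleep and 1s sleep measurements. A minimal sketch of that ordering follows, with the sources passed in as function pointers so that stubs can replace the slow measurements. `struct freq_sources`, `pick_tsc_freq()` and the stub functions are hypothetical. `is_primary` stands in for `rte_eal_process_type() == RTE_PROC_PRIMARY`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of frequency sources; each returns 0 on failure. */
struct freq_sources {
	uint64_t (*arch)(void);         /* like get_tsc_freq_arch() */
	uint64_t (*from_primary)(void); /* like get_tsc_freq_from_primary() */
	uint64_t (*measure)(void);      /* like get_tsc_freq() */
	uint64_t (*estimate)(void);     /* like estimate_tsc_freq() */
};

/* Same order as set_tsc_freq() after the patch: only a secondary process
 * asks the primary, and the slow measurements run only if nothing else
 * produced a value. */
static uint64_t
pick_tsc_freq(const struct freq_sources *s, bool is_primary)
{
	uint64_t freq = s->arch();

	if (freq == 0 && !is_primary)
		freq = s->from_primary();
	if (freq == 0)
		freq = s->measure();
	if (freq == 0)
		freq = s->estimate();
	return freq;
}

/* Example stubs: no arch-specific answer, the primary knows the rate. */
static uint64_t no_answer(void) { return 0; }
static uint64_t primary_answer(void) { return UINT64_C(2000000000); }
static uint64_t measured(void) { return UINT64_C(1999000000); }
```

The primary itself never takes the rte_mp_msg path. It computes the rate locally and then registers the EAL_TIMER_MP action, so secondaries can only query it after the primary's own set_tsc_freq() has run.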
[dpdk-dev] [PATCH v2 0/7] ethdev: add new Rx offload flags
From: Pavan Nikhilesh

Add new Rx offload flags `DEV_RX_OFFLOAD_RSS_HASH` and
`DEV_RX_OFFLOAD_FLOW_MARK`. These flags can be used to enable/disable PMD
writes to the rte_mbuf fields `hash.rss` and `hash.fdir.hi`, and to
`ol_flags:PKT_RX_RSS` and `ol_flags:PKT_RX_FDIR`.

Add a new packet type set function, `rte_eth_dev_set_supported_ptypes`, which
allows the application to inform PMDs about the packet types it is interested
in. Based on the ptypes requested by the application, PMDs can optimize the
Rx path.

For example, if a given PMD doesn't support any of the packet types that the
application is interested in, the application can disable[1] the writes to
`mbuf.packet_type` done by the PMD and use a software ptype parser.
	[1] rte_eth_dev_set_supported_ptypes(*port_id*, 0);

v2 Changes:
-----------
- Update release notes. (Andrew)
- Redo commit logs. (Andrew)
- Disable ptype parsing for unsupported examples. (Jerin)
- Disable RSS write only in generic mode eventdev_pipeline. (Jerin)
- Modify set_supported_ptypes function to return the successfully set mask
  instead of failure.
- Dropped the set_supported_ptypes additions to drivers by handling it in
  the library layer; interested PMDs can add it in.
Pavan Nikhilesh (7): ethdev: add set ptype function ethdev: add mbuf RSS update as an offload ethdev: add flow action type update as an offload drivers/net: update Rx RSS hash offload capabilities drivers/net: update Rx flow flag and mark offload capabilities examples/eventdev_pipeline: add new Rx RSS hash offload examples: disable Rx packet type parsing doc/guides/nics/features.rst | 24 +++- doc/guides/rel_notes/release_19_11.rst| 7 ++ drivers/net/bnxt/bnxt_ethdev.c| 4 +- drivers/net/cxgbe/cxgbe.h | 3 +- drivers/net/dpaa/dpaa_ethdev.c| 3 +- drivers/net/dpaa2/dpaa2_ethdev.c | 3 +- drivers/net/e1000/igb_rxtx.c | 3 +- drivers/net/enic/enic_res.c | 4 +- drivers/net/fm10k/fm10k_ethdev.c | 3 +- drivers/net/hinic/hinic_pmd_ethdev.c | 3 +- drivers/net/i40e/i40e_ethdev.c| 4 +- drivers/net/iavf/iavf_ethdev.c| 4 +- drivers/net/ice/ice_ethdev.c | 4 +- drivers/net/ixgbe/ixgbe_rxtx.c| 4 +- drivers/net/liquidio/lio_ethdev.c | 3 +- drivers/net/mlx4/mlx4_rxq.c | 3 +- drivers/net/mlx5/mlx5_rxq.c | 4 +- drivers/net/netvsc/hn_rndis.c | 3 +- drivers/net/nfp/nfp_net.c | 3 +- drivers/net/octeontx2/otx2_ethdev.c | 3 +- drivers/net/octeontx2/otx2_ethdev.h | 16 +-- drivers/net/octeontx2/otx2_flow_parse.c | 3 +- drivers/net/qede/qede_ethdev.c| 3 +- drivers/net/sfc/sfc_ef10_essb_rx.c| 3 +- drivers/net/sfc/sfc_ef10_rx.c | 3 +- drivers/net/sfc/sfc_rx.c | 4 +- drivers/net/thunderx/nicvf_ethdev.h | 3 +- drivers/net/vmxnet3/vmxnet3_ethdev.c | 3 +- examples/bbdev_app/main.c | 1 + examples/bond/main.c | 2 + examples/distributor/Makefile | 1 + examples/distributor/main.c | 1 + examples/eventdev_pipeline/main.c | 114 + .../pipeline_worker_generic.c | 118 ++ .../eventdev_pipeline/pipeline_worker_tx.c| 114 + examples/exception_path/Makefile | 1 + examples/exception_path/main.c| 1 + examples/flow_classify/flow_classify.c| 1 + examples/flow_filtering/Makefile | 1 + examples/flow_filtering/main.c| 1 + examples/ip_pipeline/link.c | 1 + examples/ip_reassembly/Makefile | 1 + examples/ip_reassembly/main.c | 1 
+ examples/ipsec-secgw/ipsec-secgw.c| 1 + examples/ipv4_multicast/Makefile | 1 + examples/ipv4_multicast/main.c| 1 + examples/kni/main.c | 1 + examples/l2fwd-cat/Makefile | 1 + examples/l2fwd-cat/l2fwd-cat.c| 1 + examples/l2fwd-crypto/main.c | 1 + examples/l2fwd-jobstats/Makefile | 1 + examples/l2fwd-jobstats/main.c| 1 + examples/l2fwd-keepalive/Makefile | 1 + examples/l2fwd-keepalive/main.c | 1 + examples/l2fwd/Makefile | 1 + examples/l2fwd/main.c | 1 + examples/l3fwd-acl/Makefile | 1 + examples/l3fwd-acl/main.c | 1 + examples/l3fwd-power/main.c | 1 + examples/l3fwd-vf/Makefile| 1 + examples/l3fwd-vf/main.c | 1 + examples/link_statu
[dpdk-dev] [PATCH v2 1/7] ethdev: add set ptype function
From: Pavan Nikhilesh Add the `rte_eth_dev_set_supported_ptypes` function, which allows the application to inform the PMD of the packet types it is interested in. Based on the ptypes set, PMDs can optimize their Rx path. - If the application does not want any ptype information, it can call `rte_eth_dev_set_supported_ptypes(ethdev_id, RTE_PTYPE_UNKNOWN)` and the PMD will set `rte_mbuf::packet_type` to 0. - If the application does not call `rte_eth_dev_set_supported_ptypes`, the PMD may set `rte_mbuf::packet_type` to any of the ptypes reported by `rte_eth_dev_get_supported_ptypes`. - If the application is interested only in the L2/L3 layers, it can ask the PMD to fill `rte_mbuf::packet_type` with only the L2/L3 ptypes by calling `rte_eth_dev_set_supported_ptypes(ethdev_id, RTE_PTYPE_L2_MASK | RTE_PTYPE_L3_MASK)`. Suggested-by: Konstantin Ananyev Signed-off-by: Pavan Nikhilesh --- doc/guides/nics/features.rst | 12 ++--- doc/guides/rel_notes/release_19_11.rst | 7 ++ lib/librte_ethdev/rte_ethdev.c | 32 lib/librte_ethdev/rte_ethdev.h | 16 lib/librte_ethdev/rte_ethdev_core.h | 6 + lib/librte_ethdev/rte_ethdev_version.map | 3 +++ 6 files changed, 72 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index c4e128d2f..d4d55f721 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -582,10 +582,14 @@ Supports inner packet L4 checksum. Packet type parsing --- -Supports packet type parsing and returns a list of supported types. - -* **[implements] eth_dev_ops**: ``dev_supported_ptypes_get``. -* **[related]API**: ``rte_eth_dev_get_supported_ptypes()``. +Supports packet type parsing and returns a list of supported types. Allows +application to set ptypes it is interested in. + +* **[implements] eth_dev_ops**: ``dev_supported_ptypes_get``, + ``dev_supported_ptypes_set``. +* **[related]API**: ``rte_eth_dev_get_supported_ptypes()``, + ``rte_eth_dev_set_supported_ptypes()``. +* **[provides] mbuf**: ``mbuf.packet_type``. ..
_nic_features_timesync: diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst index 8490d897c..a7cec1fe8 100644 --- a/doc/guides/rel_notes/release_19_11.rst +++ b/doc/guides/rel_notes/release_19_11.rst @@ -56,6 +56,13 @@ New Features Also, make sure to start the actual text at the margin. = +* **Added API in ethdev to set supported packet types** + + * Added new API ``rte_eth_dev_set_supported_ptypes`` that allows an + application to request PMD to set specific ptypes defined + through ``rte_eth_dev_set_supported_ptypes`` in ``rte_mbuf::packet_type``. + * This scheme will allow PMDs to avoid lookup to internal ptype table on Rx + and thereby improve Rx performance if application wishes do so. Removed Items - diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 17d183e1f..f529cbe9f 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -2602,6 +2602,38 @@ rte_eth_dev_get_supported_ptypes(uint16_t port_id, uint32_t ptype_mask, return j; } +uint32_t +rte_eth_dev_set_supported_ptypes(uint16_t port_id, uint32_t ptype_mask) +{ + int i; + struct rte_eth_dev *dev; + const uint32_t *all_ptypes; + uint32_t all_ptype_mask = 0; + uint32_t supp_ptype_mask = 0; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_get, 0); + + if (ptype_mask == 0) { + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_supported_ptypes_set, + 0); + return (*dev->dev_ops->dev_supported_ptypes_set)(dev, + ptype_mask); + } + + all_ptypes = (*dev->dev_ops->dev_supported_ptypes_get)(dev); + if (all_ptypes == NULL) + return 0; + + for (i = 0; all_ptypes[i] != RTE_PTYPE_UNKNOWN; ++i) + all_ptype_mask |= all_ptypes[i]; + + supp_ptype_mask = all_ptype_mask & ptype_mask; + + return (*dev->dev_ops->dev_supported_ptypes_set)(dev, supp_ptype_mask); +} + void rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr 
*mac_addr) { diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index dc6596bc9..1ab0af4d8 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -2431,6 +2431,22 @@ int rte_eth_dev_fw_version_get(uint16_t port_id, */ int rte_eth_dev_get_supported_ptypes(uint16_t port_id, uint32_t ptype_mask, uint32_t *ptypes, int num); +/** + * Request Ethernet device to set only specific packet types in the packet. + * + * Application can use this function to set only specific ptypes
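The masking step in `rte_eth_dev_set_supported_ptypes` above (OR together the `RTE_PTYPE_UNKNOWN`-terminated list returned by `dev_supported_ptypes_get`, then AND the result with the caller's mask) can be sketched as a standalone C function. The `DEMO_PTYPE_*` constants are illustrative stand-ins that assume a one-nibble-per-layer layout. They are not the DPDK definitions, and the function is not the ethdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative ptype bits (assumed layout: one nibble per layer). */
#define DEMO_PTYPE_UNKNOWN  0x00000000u
#define DEMO_PTYPE_L2_ETHER 0x00000001u
#define DEMO_PTYPE_L2_MASK  0x0000000fu
#define DEMO_PTYPE_L3_IPV4  0x00000010u
#define DEMO_PTYPE_L3_MASK  0x000000f0u
#define DEMO_PTYPE_L4_TCP   0x00000100u
#define DEMO_PTYPE_L4_MASK  0x00000f00u

/*
 * Fold a DEMO_PTYPE_UNKNOWN-terminated list of supported ptypes into a
 * single mask, then keep only the bits the application asked for.
 */
static uint32_t
demo_effective_ptype_mask(const uint32_t *supported, uint32_t requested)
{
	uint32_t all = 0;
	size_t i;

	assert(supported != NULL);
	for (i = 0; supported[i] != DEMO_PTYPE_UNKNOWN; i++)
		all |= supported[i];

	return all & requested;
}
```

For example, if a device reports Ethernet, IPv4 and TCP, requesting `DEMO_PTYPE_L2_MASK | DEMO_PTYPE_L3_MASK` leaves only the L2 and L3 bits set. This corresponds to the L2/L3-only case in the commit message.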
[dpdk-dev] [PATCH v2 2/7] ethdev: add mbuf RSS update as an offload
From: Pavan Nikhilesh Add a new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` that can be used to enable or disable the PMD's writes to `rte_mbuf::hash::rss`. PMDs notify the application that `rte_mbuf::hash::rss` is valid by setting the `PKT_RX_RSS_HASH` flag in `rte_mbuf::ol_flags`. Signed-off-by: Pavan Nikhilesh Reviewed-by: Andrew Rybchenko --- doc/guides/nics/features.rst | 2 ++ lib/librte_ethdev/rte_ethdev.c | 1 + lib/librte_ethdev/rte_ethdev.h | 1 + 3 files changed, 4 insertions(+) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index d4d55f721..f79b69b38 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -274,6 +274,7 @@ Supports RSS hashing on RX. * **[uses] user config**: ``dev_conf.rxmode.mq_mode`` = ``ETH_MQ_RX_RSS_FLAG``. * **[uses] user config**: ``dev_conf.rx_adv_conf.rss_conf``. +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_RSS_HASH``. * **[provides] rte_eth_dev_info**: ``flow_type_rss_offloads``. * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_RSS_HASH``, ``mbuf.rss``. @@ -286,6 +287,7 @@ Inner RSS Supports RX RSS hashing on Inner headers. * **[uses]rte_flow_action_rss**: ``level``. +* **[uses]rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_RSS_HASH``. * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_RSS_HASH``, ``mbuf.rss``.
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index f529cbe9f..9c5517d5f 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -129,6 +129,7 @@ static const struct { RTE_RX_OFFLOAD_BIT2STR(KEEP_CRC), RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM), RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM), + RTE_RX_OFFLOAD_BIT2STR(RSS_HASH), }; #undef RTE_RX_OFFLOAD_BIT2STR diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 1ab0af4d8..836b30074 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -1013,6 +1013,7 @@ struct rte_eth_conf { #define DEV_RX_OFFLOAD_KEEP_CRC0x0001 #define DEV_RX_OFFLOAD_SCTP_CKSUM 0x0002 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x0004 +#define DEV_RX_OFFLOAD_RSS_HASH0x0008 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \ DEV_RX_OFFLOAD_UDP_CKSUM | \ -- 2.22.0
[dpdk-dev] [PATCH v2 4/7] drivers/net: update Rx RSS hash offload capabilities
From: Pavan Nikhilesh Add DEV_RX_OFFLOAD_RSS_HASH flag for all PMDs that support RSS hash delivery. Signed-off-by: Pavan Nikhilesh --- drivers/net/bnxt/bnxt_ethdev.c | 3 ++- drivers/net/cxgbe/cxgbe.h| 3 ++- drivers/net/dpaa/dpaa_ethdev.c | 3 ++- drivers/net/dpaa2/dpaa2_ethdev.c | 3 ++- drivers/net/e1000/igb_rxtx.c | 3 ++- drivers/net/enic/enic_res.c | 3 ++- drivers/net/fm10k/fm10k_ethdev.c | 3 ++- drivers/net/hinic/hinic_pmd_ethdev.c | 3 ++- drivers/net/i40e/i40e_ethdev.c | 3 ++- drivers/net/iavf/iavf_ethdev.c | 3 ++- drivers/net/ice/ice_ethdev.c | 3 ++- drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++- drivers/net/liquidio/lio_ethdev.c| 3 ++- drivers/net/mlx4/mlx4_rxq.c | 3 ++- drivers/net/mlx5/mlx5_rxq.c | 3 ++- drivers/net/netvsc/hn_rndis.c| 3 ++- drivers/net/nfp/nfp_net.c| 3 ++- drivers/net/octeontx2/otx2_ethdev.c | 3 ++- drivers/net/octeontx2/otx2_ethdev.h | 15 --- drivers/net/qede/qede_ethdev.c | 3 ++- drivers/net/sfc/sfc_ef10_essb_rx.c | 2 +- drivers/net/sfc/sfc_ef10_rx.c| 3 ++- drivers/net/sfc/sfc_rx.c | 3 ++- drivers/net/thunderx/nicvf_ethdev.h | 3 ++- drivers/net/vmxnet3/vmxnet3_ethdev.c | 3 ++- 25 files changed, 55 insertions(+), 31 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 6685ee7d9..6c106baf7 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -160,7 +160,8 @@ static const struct rte_pci_id bnxt_pci_id_map[] = { DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM | \ DEV_RX_OFFLOAD_JUMBO_FRAME | \ DEV_RX_OFFLOAD_KEEP_CRC | \ -DEV_RX_OFFLOAD_TCP_LRO) +DEV_RX_OFFLOAD_TCP_LRO | \ +DEV_RX_OFFLOAD_RSS_HASH) static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask); static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); diff --git a/drivers/net/cxgbe/cxgbe.h b/drivers/net/cxgbe/cxgbe.h index 3f97fa58b..22e61a55c 100644 --- a/drivers/net/cxgbe/cxgbe.h +++ b/drivers/net/cxgbe/cxgbe.h @@ -47,7 +47,8 @@ DEV_RX_OFFLOAD_UDP_CKSUM | \ DEV_RX_OFFLOAD_TCP_CKSUM | \ DEV_RX_OFFLOAD_JUMBO_FRAME 
| \ - DEV_RX_OFFLOAD_SCATTER) + DEV_RX_OFFLOAD_SCATTER | \ + DEV_RX_OFFLOAD_RSS_HASH) #define CXGBE_DEVARG_KEEP_OVLAN "keep_ovlan" diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c index 7154fb9b4..18c7bd0d5 100644 --- a/drivers/net/dpaa/dpaa_ethdev.c +++ b/drivers/net/dpaa/dpaa_ethdev.c @@ -49,7 +49,8 @@ /* Supported Rx offloads */ static uint64_t dev_rx_offloads_sup = DEV_RX_OFFLOAD_JUMBO_FRAME | - DEV_RX_OFFLOAD_SCATTER; + DEV_RX_OFFLOAD_SCATTER | + DEV_RX_OFFLOAD_RSS_HASH; /* Rx offloads which cannot be disabled */ static uint64_t dev_rx_offloads_nodis = diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c index dd6a78f9f..55a1c4455 100644 --- a/drivers/net/dpaa2/dpaa2_ethdev.c +++ b/drivers/net/dpaa2/dpaa2_ethdev.c @@ -38,7 +38,8 @@ static uint64_t dev_rx_offloads_sup = DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM | DEV_RX_OFFLOAD_VLAN_FILTER | - DEV_RX_OFFLOAD_JUMBO_FRAME; + DEV_RX_OFFLOAD_JUMBO_FRAME | + DEV_RX_OFFLOAD_RSS_HASH; /* Rx offloads which cannot be disabled */ static uint64_t dev_rx_offloads_nodis = diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index c5606de5d..684fa4ad8 100644 --- a/drivers/net/e1000/igb_rxtx.c +++ b/drivers/net/e1000/igb_rxtx.c @@ -1646,7 +1646,8 @@ igb_get_rx_port_offloads_capa(struct rte_eth_dev *dev) DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_JUMBO_FRAME | DEV_RX_OFFLOAD_KEEP_CRC| - DEV_RX_OFFLOAD_SCATTER; + DEV_RX_OFFLOAD_SCATTER | + DEV_RX_OFFLOAD_RSS_HASH; return rx_offload_capa; } diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c index 9405e1933..607a085f8 100644 --- a/drivers/net/enic/enic_res.c +++ b/drivers/net/enic/enic_res.c @@ -198,7 +198,8 @@ int enic_get_vnic_config(struct enic *enic) DEV_RX_OFFLOAD_VLAN_STRIP | DEV_RX_OFFLOAD_IPV4_CKSUM | DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM; + DEV_RX_OFFLOAD_TCP_CKSUM | + DEV_RX_OFFLOAD_RSS_HASH; enic->tx_offload_mask =
[dpdk-dev] [PATCH v2 5/7] drivers/net: update Rx flow flag and mark capabilities
From: Pavan Nikhilesh Add DEV_RX_OFFLOAD_FLOW_MARK flag for all PMDs that support flow action flag and mark. Signed-off-by: Pavan Nikhilesh --- drivers/net/bnxt/bnxt_ethdev.c | 3 ++- drivers/net/enic/enic_res.c | 3 ++- drivers/net/i40e/i40e_ethdev.c | 3 ++- drivers/net/iavf/iavf_ethdev.c | 3 ++- drivers/net/ice/ice_ethdev.c| 3 ++- drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++- drivers/net/mlx5/mlx5_rxq.c | 3 ++- drivers/net/octeontx2/otx2_ethdev.h | 3 ++- drivers/net/octeontx2/otx2_flow_parse.c | 3 ++- drivers/net/sfc/sfc_ef10_essb_rx.c | 3 ++- drivers/net/sfc/sfc_rx.c| 3 ++- 11 files changed, 22 insertions(+), 11 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 6c106baf7..fd1fb7eda 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -161,7 +161,8 @@ static const struct rte_pci_id bnxt_pci_id_map[] = { DEV_RX_OFFLOAD_JUMBO_FRAME | \ DEV_RX_OFFLOAD_KEEP_CRC | \ DEV_RX_OFFLOAD_TCP_LRO | \ -DEV_RX_OFFLOAD_RSS_HASH) +DEV_RX_OFFLOAD_RSS_HASH | \ +DEV_RX_OFFLOAD_FLOW_MARK) static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask); static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c index 607a085f8..3503d5d7e 100644 --- a/drivers/net/enic/enic_res.c +++ b/drivers/net/enic/enic_res.c @@ -199,7 +199,8 @@ int enic_get_vnic_config(struct enic *enic) DEV_RX_OFFLOAD_IPV4_CKSUM | DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM | - DEV_RX_OFFLOAD_RSS_HASH; + DEV_RX_OFFLOAD_RSS_HASH | + DEV_RX_OFFLOAD_FLOW_MARK; enic->tx_offload_mask = PKT_TX_IPV6 | PKT_TX_IPV4 | diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 7058e0213..6311943be 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -3512,7 +3512,8 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) DEV_RX_OFFLOAD_VLAN_EXTEND | DEV_RX_OFFLOAD_VLAN_FILTER | 
DEV_RX_OFFLOAD_JUMBO_FRAME | - DEV_RX_OFFLOAD_RSS_HASH; + DEV_RX_OFFLOAD_RSS_HASH | + DEV_RX_OFFLOAD_FLOW_MARK; dev_info->tx_queue_offload_capa = DEV_TX_OFFLOAD_MBUF_FAST_FREE; dev_info->tx_offload_capa = diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c index aef91a79b..7bdaa87b1 100644 --- a/drivers/net/iavf/iavf_ethdev.c +++ b/drivers/net/iavf/iavf_ethdev.c @@ -518,7 +518,8 @@ iavf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) DEV_RX_OFFLOAD_SCATTER | DEV_RX_OFFLOAD_JUMBO_FRAME | DEV_RX_OFFLOAD_VLAN_FILTER | - DEV_RX_OFFLOAD_RSS_HASH; + DEV_RX_OFFLOAD_RSS_HASH | + DEV_RX_OFFLOAD_FLOW_MARK; dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT | DEV_TX_OFFLOAD_QINQ_INSERT | diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index fc0f0003f..8b8d55e4a 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -2134,7 +2134,8 @@ ice_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) DEV_RX_OFFLOAD_QINQ_STRIP | DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM | DEV_RX_OFFLOAD_VLAN_EXTEND | - DEV_RX_OFFLOAD_RSS_HASH; + DEV_RX_OFFLOAD_RSS_HASH | + DEV_RX_OFFLOAD_FLOW_MARK; dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_QINQ_INSERT | DEV_TX_OFFLOAD_IPV4_CKSUM | diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index fa572d184..1481e2426 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -2873,7 +2873,8 @@ ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev) DEV_RX_OFFLOAD_JUMBO_FRAME | DEV_RX_OFFLOAD_VLAN_FILTER | DEV_RX_OFFLOAD_SCATTER | - DEV_RX_OFFLOAD_RSS_HASH; + DEV_RX_OFFLOAD_RSS_HASH | + DEV_RX_OFFLOAD_FLOW_MARK; if (hw->mac.type == ixgbe_mac_82598EB) offloads |= DEV_RX_OFFLOAD_VLAN_STRIP; diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index b5fd57693..1bf01bda3 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -369,7 +369,8 @
[dpdk-dev] [PATCH v2 6/7] examples/eventdev_pipeline: add new Rx RSS hash offload
From: Pavan Nikhilesh Since pipeline_generic uses `rte_mbuf::hash::rss`, add the new Rx offload flag `DEV_RX_OFFLOAD_RSS_HASH` to ask the PMD to copy the RSS hash result into the mbuf. Signed-off-by: Pavan Nikhilesh --- Currently, there is no way to retrieve the configuration that was set on an ethdev without touching the internal structures of `rte_ethdev`, so moving the port configuration into the specific pipeline models is the only option. examples/eventdev_pipeline/main.c | 113 - .../pipeline_worker_generic.c | 118 ++ .../eventdev_pipeline/pipeline_worker_tx.c| 114 + 3 files changed, 232 insertions(+), 113 deletions(-) diff --git a/examples/eventdev_pipeline/main.c b/examples/eventdev_pipeline/main.c index f4e57f541..a73b61d59 100644 --- a/examples/eventdev_pipeline/main.c +++ b/examples/eventdev_pipeline/main.c @@ -242,118 +242,6 @@ parse_app_args(int argc, char **argv) } } -/* - * Initializes a given port using global settings and with the RX buffers - * coming from the mbuf_pool passed as a parameter.
- */ -static inline int -port_init(uint8_t port, struct rte_mempool *mbuf_pool) -{ - struct rte_eth_rxconf rx_conf; - static const struct rte_eth_conf port_conf_default = { - .rxmode = { - .mq_mode = ETH_MQ_RX_RSS, - .max_rx_pkt_len = RTE_ETHER_MAX_LEN, - }, - .rx_adv_conf = { - .rss_conf = { - .rss_hf = ETH_RSS_IP | - ETH_RSS_TCP | - ETH_RSS_UDP, - } - } - }; - const uint16_t rx_rings = 1, tx_rings = 1; - const uint16_t rx_ring_size = 512, tx_ring_size = 512; - struct rte_eth_conf port_conf = port_conf_default; - int retval; - uint16_t q; - struct rte_eth_dev_info dev_info; - struct rte_eth_txconf txconf; - - if (!rte_eth_dev_is_valid_port(port)) - return -1; - - rte_eth_dev_info_get(port, &dev_info); - if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) - port_conf.txmode.offloads |= - DEV_TX_OFFLOAD_MBUF_FAST_FREE; - rx_conf = dev_info.default_rxconf; - rx_conf.offloads = port_conf.rxmode.offloads; - - port_conf.rx_adv_conf.rss_conf.rss_hf &= - dev_info.flow_type_rss_offloads; - if (port_conf.rx_adv_conf.rss_conf.rss_hf != - port_conf_default.rx_adv_conf.rss_conf.rss_hf) { - printf("Port %u modified RSS hash function based on hardware support," - "requested:%#"PRIx64" configured:%#"PRIx64"\n", - port, - port_conf_default.rx_adv_conf.rss_conf.rss_hf, - port_conf.rx_adv_conf.rss_conf.rss_hf); - } - - /* Configure the Ethernet device. */ - retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); - if (retval != 0) - return retval; - - /* Allocate and set up 1 RX queue per Ethernet port. */ - for (q = 0; q < rx_rings; q++) { - retval = rte_eth_rx_queue_setup(port, q, rx_ring_size, - rte_eth_dev_socket_id(port), &rx_conf, - mbuf_pool); - if (retval < 0) - return retval; - } - - txconf = dev_info.default_txconf; - txconf.offloads = port_conf_default.txmode.offloads; - /* Allocate and set up 1 TX queue per Ethernet port. 
*/ - for (q = 0; q < tx_rings; q++) { - retval = rte_eth_tx_queue_setup(port, q, tx_ring_size, - rte_eth_dev_socket_id(port), &txconf); - if (retval < 0) - return retval; - } - - /* Display the port MAC address. */ - struct rte_ether_addr addr; - rte_eth_macaddr_get(port, &addr); - printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 - " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", - (unsigned int)port, - addr.addr_bytes[0], addr.addr_bytes[1], - addr.addr_bytes[2], addr.addr_bytes[3], - addr.addr_bytes[4], addr.addr_bytes[5]); - - /* Enable RX in promiscuous mode for the Ethernet device. */ - rte_eth_promiscuous_enable(port); - - return 0; -} - -static int -init_ports(uint16_t num_ports) -{ - uint16_t portid; - - if (!cdata.num_mbuf) - cdata.num_mbuf = 16384 * num_ports; - - struct rte_mempool *mp = rte_pktmbuf_pool_create("packet_pool", - /* mbufs */ cdata.num_mbuf, - /* cache_size */ 512, - /* priv_size*/ 0, -
[dpdk-dev] [PATCH v2 3/7] ethdev: add flow action type update as an offload
From: Pavan Nikhilesh Add a new Rx offload flag `DEV_RX_OFFLOAD_FLOW_MARK` that can be used to enable or disable the PMD's writes to `rte_mbuf::hash::fdir::hi` and `rte_mbuf::ol_flags` when the flow actions `RTE_FLOW_ACTION_MARK` and `RTE_FLOW_ACTION_FLAG` are enabled. PMDs notify the application that `rte_mbuf::hash::fdir::hi` is valid by setting the `PKT_RX_FDIR_ID` flag in `rte_mbuf::ol_flags`. Signed-off-by: Pavan Nikhilesh --- doc/guides/nics/features.rst | 12 lib/librte_ethdev/rte_ethdev.c | 1 + lib/librte_ethdev/rte_ethdev.h | 1 + lib/librte_ethdev/rte_flow.h | 6 -- 4 files changed, 18 insertions(+), 2 deletions(-) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index f79b69b38..338b19e03 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -594,6 +594,18 @@ application to set ptypes it is interested in. * **[provides] mbuf**: ``mbuf.packet_type``. +.. _nic_features_flow_flag_mark: + +Flow flag/mark update +- + +Supports flow action type update to ``mbuf.ol_flags`` and ``mbuf.hash.fdir.hi``. + +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_FLOW_MARK``. +* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_FDIR``, ``mbuf.ol_flags:PKT_RX_FDIR_ID;``, + ``mbuf.hash.fdir.hi`` + + ..
_nic_features_timesync: Timesync diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 9c5517d5f..bcbe06c5c 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -130,6 +130,7 @@ static const struct { RTE_RX_OFFLOAD_BIT2STR(SCTP_CKSUM), RTE_RX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM), RTE_RX_OFFLOAD_BIT2STR(RSS_HASH), + RTE_RX_OFFLOAD_BIT2STR(FLOW_MARK), }; #undef RTE_RX_OFFLOAD_BIT2STR diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 836b30074..44686ec21 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -1014,6 +1014,7 @@ struct rte_eth_conf { #define DEV_RX_OFFLOAD_SCTP_CKSUM 0x0002 #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM 0x0004 #define DEV_RX_OFFLOAD_RSS_HASH0x0008 +#define DEV_RX_OFFLOAD_FLOW_MARK 0x0010 #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \ DEV_RX_OFFLOAD_UDP_CKSUM | \ diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h index b66bf1495..5d9d88d76 100644 --- a/lib/librte_ethdev/rte_flow.h +++ b/lib/librte_ethdev/rte_flow.h @@ -1316,7 +1316,8 @@ enum rte_flow_action_type { /** * Attaches an integer value to packets and sets PKT_RX_FDIR and -* PKT_RX_FDIR_ID mbuf flags. +* PKT_RX_FDIR_ID mbuf flags when +* `rx_mode:offloads:DEV_RX_OFFLOAD_FLOW_MARK` is set. * * See struct rte_flow_action_mark. */ @@ -1324,7 +1325,8 @@ enum rte_flow_action_type { /** * Flags packets. Similar to MARK without a specific value; only -* sets the PKT_RX_FDIR mbuf flag. +* sets the PKT_RX_FDIR mbuf flag when +* `rx_mode:offloads:DEV_RX_OFFLOAD_FLOW_MARK` is set * * No associated configuration structure. */ -- 2.22.0
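The `RTE_RX_OFFLOAD_BIT2STR()` entries added above rely on a table built by a token-pasting and stringizing macro. With this approach, each flag's printable name is derived from its identifier instead of being typed twice. Below is a self-contained sketch of the same technique, with a reverse name-to-flag lookup added for illustration. The `DEMO_*` names and bit values are hypothetical, and this is not the ethdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; only the table technique matters here. */
#define DEMO_RX_OFFLOAD_RSS_HASH  (UINT64_C(1) << 19)
#define DEMO_RX_OFFLOAD_FLOW_MARK (UINT64_C(1) << 20)

/* Paste the suffix onto the prefix for the value and stringize it for the
 * name, so every entry is written exactly once. */
#define DEMO_RX_OFFLOAD_BIT2STR(_name) \
	{ DEMO_RX_OFFLOAD_##_name, #_name }

static const struct {
	uint64_t offload;
	const char *name;
} demo_rx_offload_names[] = {
	DEMO_RX_OFFLOAD_BIT2STR(RSS_HASH),
	DEMO_RX_OFFLOAD_BIT2STR(FLOW_MARK),
};

#undef DEMO_RX_OFFLOAD_BIT2STR

#define DEMO_N_OFFLOADS \
	(sizeof(demo_rx_offload_names) / sizeof(demo_rx_offload_names[0]))

/* Map one offload bit to its name, or "UNKNOWN". */
static const char *
demo_rx_offload_name(uint64_t offload)
{
	size_t i;

	for (i = 0; i < DEMO_N_OFFLOADS; i++)
		if (demo_rx_offload_names[i].offload == offload)
			return demo_rx_offload_names[i].name;
	return "UNKNOWN";
}

/* Reverse lookup, e.g. for parsing offload names from a config file;
 * returns 0 for an unrecognised name. */
static uint64_t
demo_rx_offload_from_name(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < DEMO_N_OFFLOADS; i++)
		if (strcmp(demo_rx_offload_names[i].name, name) == 0)
			return demo_rx_offload_names[i].offload;
	return 0;
}
```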
[dpdk-dev] [PATCH v2 7/7] examples: disable Rx packet type parsing
From: Pavan Nikhilesh Disable packet type parsing in examples that don't use `rte_mbuf::packet_type` by setting ptype_mask to 0 in `rte_eth_dev_set_supported_ptypes`. Signed-off-by: Pavan Nikhilesh --- examples/bbdev_app/main.c | 1 + examples/bond/main.c | 2 ++ examples/distributor/Makefile | 1 + examples/distributor/main.c| 1 + examples/distributor/meson.build | 1 + examples/eventdev_pipeline/main.c | 1 + examples/eventdev_pipeline/meson.build | 1 + examples/exception_path/Makefile | 1 + examples/exception_path/main.c | 1 + examples/exception_path/meson.build| 1 + examples/flow_classify/flow_classify.c | 1 + examples/flow_filtering/Makefile | 1 + examples/flow_filtering/main.c | 1 + examples/flow_filtering/meson.build| 1 + examples/ip_pipeline/link.c| 1 + examples/ip_reassembly/Makefile| 1 + examples/ip_reassembly/main.c | 1 + examples/ip_reassembly/meson.build | 1 + examples/ipsec-secgw/ipsec-secgw.c | 1 + examples/ipv4_multicast/Makefile | 1 + examples/ipv4_multicast/main.c | 1 + examples/ipv4_multicast/meson.build| 1 + examples/kni/main.c| 1 + examples/l2fwd-cat/Makefile| 1 + examples/l2fwd-cat/l2fwd-cat.c | 1 + examples/l2fwd-cat/meson.build | 1 + examples/l2fwd-crypto/main.c | 1 + examples/l2fwd-jobstats/Makefile | 1 + examples/l2fwd-jobstats/main.c | 1 + examples/l2fwd-jobstats/meson.build| 1 + examples/l2fwd-keepalive/Makefile | 1 + examples/l2fwd-keepalive/main.c| 1 + examples/l2fwd-keepalive/meson.build | 1 + examples/l2fwd/Makefile| 1 + examples/l2fwd/main.c | 1 + examples/l2fwd/meson.build | 1 + examples/l3fwd-acl/Makefile| 1 + examples/l3fwd-acl/main.c | 1 + examples/l3fwd-acl/meson.build | 1 + examples/l3fwd-power/main.c| 1 + examples/l3fwd-vf/Makefile | 1 + examples/l3fwd-vf/main.c | 1 + examples/l3fwd-vf/meson.build | 1 + examples/link_status_interrupt/Makefile| 1 + examples/link_status_interrupt/main.c | 1 + examples/link_status_interrupt/meson.build | 1 + examples/load_balancer/Makefile| 1 + examples/load_balancer/init.c | 1 +
examples/load_balancer/meson.build | 1 + examples/packet_ordering/Makefile | 1 + examples/packet_ordering/main.c| 1 + examples/packet_ordering/meson.build | 1 + examples/ptpclient/Makefile| 1 + examples/ptpclient/meson.build | 1 + examples/ptpclient/ptpclient.c | 1 + examples/qos_meter/Makefile| 1 + examples/qos_meter/main.c | 2 ++ examples/qos_meter/meson.build | 1 + examples/qos_sched/Makefile| 1 + examples/qos_sched/init.c | 1 + examples/qos_sched/meson.build | 1 + examples/quota_watermark/qw/Makefile | 1 + examples/quota_watermark/qw/init.c | 1 + examples/rxtx_callbacks/main.c | 1 + examples/server_node_efd/server/Makefile | 1 + examples/server_node_efd/server/init.c | 1 + examples/skeleton/Makefile | 1 + examples/skeleton/basicfwd.c | 1 + examples/skeleton/meson.build | 1 + examples/tep_termination/Makefile | 1 + examples/tep_termination/meson.build | 1 + examples/tep_termination/vxlan_setup.c | 1 + examples/vhost/Makefile| 1 + examples/vhost/main.c | 1 + examples/vm_power_manager/Makefile | 1 + examples/vm_power_manager/main.c | 1 + examples/vm_power_manager/meson.build | 1 + examples/vmdq/Makefile | 1 + examples/vmdq/main.c | 1 + examples/vmdq/meson.build | 1 + examples/vmdq_dcb/Makefile | 1 + examples/vmdq_dcb/main.c | 1 + examples/vmdq_dcb/meson.build | 1 + 83 files changed, 85 insertions(+) diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c index 9acf666dc..8ae6e4972 100644 --- a/examples/bbdev_app/main.c +++ b/examples/bbdev_app/main.c @@ -478,6 +478,7 @@ initialize_ports(struct app_config_params *app_params, } rte_eth_promiscuous_enable(port_id); + rte_eth_dev_set_supported_ptypes(port_id, 0); rte_eth_macaddr_get(port_id, &bbdev_port_eth_addr); print_mac(port_id, &bbdev_port_eth_addr); diff --git a/examples/bond/main.c b/examples/bond/main.c index 1c0df9d46..ffb911fc5 100644 --- a/examples/bond/main.c +++ b/examples/bond/main
[dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD
Matan has kindly agreed to replace me as maintainer of the mlx5 PMD. Good luck! Signed-off-by: Yongseok Koh --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 4100260861..30dbb8be55 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -715,8 +715,8 @@ F: doc/guides/nics/mlx4.rst F: doc/guides/nics/features/mlx4.ini Mellanox mlx5 +M: Matan Azrad M: Shahaf Shuler -M: Yongseok Koh M: Viacheslav Ovsiienko T: git://dpdk.org/next/dpdk-next-net-mlx F: drivers/net/mlx5/ -- 2.21.0.196.g041f5ea
[dpdk-dev] [PATCH v2] timer: use rte_mp_msg to get freq from primary process
Ideally, get_tsc_freq_arch() is able to provide the TSC rate using architecture-specific means. When that is not possible, DPDK falls back to calculating the TSC rate with a 100ms nanosleep or a 1s sleep. This fallback happens more often in VMs, which frequently lack access to the data that the arch-specific methods need (CPUID leaf 0x15 or MSR 0xCE on x86). In secondary processes, the extra 100ms is especially noticeable and accounts for the bulk of rte_eal_init() execution time. So, in secondary processes, if we cannot get the TSC rate using get_tsc_freq_arch(), try to get it from the primary process instead using rte_mp_msg, which is much faster than 100ms. This reduces rte_eal_init() execution time in a secondary process from 165ms to 66ms on my test system. Signed-off-by: Jim Harris Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35 --- lib/librte_eal/common/eal_common_timer.c | 62 ++ 1 file changed, 62 insertions(+) diff --git a/lib/librte_eal/common/eal_common_timer.c b/lib/librte_eal/common/eal_common_timer.c index 145543de7..ad965455d 100644 --- a/lib/librte_eal/common/eal_common_timer.c +++ b/lib/librte_eal/common/eal_common_timer.c @@ -15,9 +15,17 @@ #include #include #include +#include +#include #include "eal_private.h" +#define EAL_TIMER_MP "eal_timer_mp_sync" + +struct timer_mp_param { + uint64_t tsc_hz; +}; + /* The frequency of the RDTSC timer resolution */ static uint64_t eal_tsc_resolution_hz; @@ -74,12 +82,58 @@ estimate_tsc_freq(void) return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ); } +static uint64_t +get_tsc_freq_from_primary(void) +{ + struct rte_mp_msg mp_req = {0}; + struct rte_mp_reply mp_reply = {0}; + struct timer_mp_param *r; + struct timespec ts = {.tv_sec = 1, .tv_nsec = 0}; + uint64_t tsc_hz; + + strcpy(mp_req.name, EAL_TIMER_MP); + if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) || + mp_reply.nb_received != 1) { + tsc_hz = 0; + } else { + r = (struct timer_mp_param *)mp_reply.msgs[0].param; + tsc_hz = r->tsc_hz; + } +
free(mp_reply.msgs); + return tsc_hz; +} + +static int +timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg, +const void *peer) +{ + struct rte_mp_msg reply = {0}; + struct timer_mp_param *r = (struct timer_mp_param *)reply.param; + + r->tsc_hz = eal_tsc_resolution_hz; + strcpy(reply.name, EAL_TIMER_MP); + reply.len_param = sizeof(*r); + + return rte_mp_reply(&reply, peer); +} + void set_tsc_freq(void) { uint64_t freq; + int rc; freq = get_tsc_freq_arch(); + if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) { + /* We couldn't get the TSC frequency through arch-specific +* means. If this is a secondary process, try to get the +* TSC frequency from the primary process - this will +* be much faster than get_tsc_freq() or estimate_tsc_freq() +* below. +*/ + freq = get_tsc_freq_from_primary(); + } if (!freq) freq = get_tsc_freq(); if (!freq) @@ -87,6 +141,14 @@ set_tsc_freq(void) RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000); eal_tsc_resolution_hz = freq; + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary); + if (rc && rte_errno != ENOTSUP) { + RTE_LOG(WARNING, EAL, "Could not register mp_action - " + "secondary processes will calculate TSC rate " + "independently.\n"); + } + } } void rte_delay_us_callback_register(void (*userfunc)(unsigned int))
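With this patch, `set_tsc_freq()` tries the sources in order: the architecture-specific query, then the primary process (secondary processes only), then the sleep-based measurement and estimate. This is a "first non-zero answer wins" chain. The sketch below models that chain with hypothetical stub sources. It does not call `rte_mp_request_sync()` or any other EAL function, and the frequencies are made-up values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A frequency source returns Hz, or 0 when it cannot tell. */
typedef uint64_t (*demo_freq_fn)(void);

struct demo_freq_source {
	demo_freq_fn fn;
	bool secondary_only; /* e.g. "ask the primary process" */
};

/* Return the first non-zero frequency, skipping secondary-only sources in
 * the primary process. Returns 0 if every applicable source fails. */
static uint64_t
demo_pick_tsc_freq(const struct demo_freq_source *src, size_t n,
		   bool is_secondary)
{
	size_t i;

	assert(src != NULL || n == 0);
	for (i = 0; i < n; i++) {
		uint64_t hz;

		if (src[i].secondary_only && !is_secondary)
			continue;
		hz = src[i].fn();
		if (hz != 0)
			return hz;
	}
	return 0;
}

/* Hypothetical stubs standing in for the real lookups. */
static uint64_t demo_arch_query(void) { return 0; } /* e.g. no CPUID 0x15 in a VM */
static uint64_t demo_ask_primary(void) { return UINT64_C(2000000000); }
static uint64_t demo_sleep_measure(void) { return UINT64_C(1999990000); }

static const struct demo_freq_source demo_chain[] = {
	{ demo_arch_query, false },
	{ demo_ask_primary, true },
	{ demo_sleep_measure, false },
};
```

With this ordering, a secondary process gets the primary's value without sleeping, while the primary skips that entry and falls through to the measurement.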
Re: [dpdk-dev] Sync up status for Mellanox PMD barrier investigation
Please disregard my last message. It was mistakenly sent to the wrong group. Sorry about that. Thanks, Phil Yang > -Original Message- > From: dev On Behalf Of Phil Yang (Arm > Technology China) > Sent: Wednesday, August 21, 2019 5:58 PM > To: Honnappa Nagarahalli > Cc: dev@dpdk.org; nd > Subject: Re: [dpdk-dev] Sync up status for Mellanox PMD barrier > investigation > > Some updates on this thread. > > The most critical data path of the mlx5 PMD uses several rte_cio_w/rmb barriers > ('dmb osh' on aarch64). > C11 atomics are a good replacement for rte_smp_r/wmb to relax the data > synchronization barriers between CPUs. > However, the mlx5 PMD needs to write data back to the HW, so it uses many > rte_cio_r/wmb barriers to synchronize data. > > Please check the details below. All comments are welcome. Thanks. > > Data path /// > drivers/net/mlx5/mlx5_rxtx.c=950=mlx5_rx_err_handle(struct > mlx5_rxq_data *rxq, uint8_t mbuf_prepare) > drivers/net/mlx5/mlx5_rxtx.c:1002: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c:1004: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c:1010: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c=1272=mlx5_rx_burst(void *dpdk_rxq, struct > rte_mbuf **pkts, uint16_t pkts_n) > drivers/net/mlx5/mlx5_rxtx.c:1385: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c:1387: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c=1549=mlx5_rx_burst_mprq(void *dpdk_rxq, > struct rte_mbuf **pkts, uint16_t pkts_n) > drivers/net/mlx5/mlx5_rxtx.c:1741: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx.c:1745: rte_cio_wmb(); > drivers/net/mlx5/mlx5_rxtx_vec_neon.h=366=rxq_burst_v(struct > mlx5_rxq_data *rxq, struct rte_mbuf **pkts, uint16_t pkts_n, > drivers/net/mlx5/mlx5_rxtx_vec_neon.h:530: rte_cio_rmb(); > > Commit messages: > net/mlx5: cleanup memory barriers: mlx5_rx_burst > https://git.dpdk.org/dpdk/commit/?id=9afa3f74658afc0e21fbe5c3884c55a21 > ff49299 > > net/mlx5: add Multi-Packet Rx support : mlx5_rx_burst_mprq > https://git.dpdk.org/dpdk/commit/?id=7d6bf6b866b8c25ec06539b3eeed1db > 4f785577c
> > net/mlx5: use coherent I/O memory barrier > https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=0cfdc1808de82357a924a479dc3f89de88cd91c2 > > net/mlx5: extend Rx completion with error handling > https://git.dpdk.org/dpdk/commit/drivers/net/mlx5/mlx5_rxtx.c?id=88c0733535d6a7ce79045d4d57a1d78d904067c8 > > net/mlx5: fix synchronization on polling Rx completions > https://git.dpdk.org/dpdk/commit/?id=1742c2d9fab07e66209f2d14e7daa50829fc4423 > > > Thanks, > Phil Yang > > From: Phil Yang (Arm Technology China) > Sent: Thursday, August 15, 2019 6:35 PM > To: Honnappa Nagarahalli > Subject: Sync up status for Mellanox PMD barrier investigation > > Hi Honnappa, > > I have checked all the barriers in the mlx5 PMD data path. In my understanding, it > uses the barriers correctly (DMB is used to synchronize memory data > between CPUs). > The attachment lists the positions of these barriers. > I just want to sync up with you on the status. Do you have any ideas or > suggestions on which part we should start optimizing? > > Best Regards, > Phil Yang
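For the CPU-to-CPU case mentioned above (replacing rte_smp_r/wmb), a minimal sketch of the C11 release/acquire pattern follows. It is a generic single-producer/single-consumer ring, not mlx5 code; `spsc_ring`, `spsc_push` and `spsc_pop` are hypothetical names. The release store on the index takes the place of "write data, rte_smp_wmb(), update index", and the paired acquire load takes the place of "read index, rte_smp_rmb(), read data". As far as the C11 standard goes, its memory model only defines ordering between threads, so this pattern does not by itself cover the rte_cio_* cases, where the observer is the NIC reading descriptors or doorbells.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPSC ring, used only to illustrate release/acquire. */
#define RING_SIZE 8u /* must be a power of two */

struct spsc_ring {
	uint32_t slots[RING_SIZE];
	_Atomic uint32_t head; /* next slot the consumer reads */
	_Atomic uint32_t tail; /* next slot the producer writes */
};

/* Producer: fill the slot, then publish it with a release store. */
bool spsc_push(struct spsc_ring *r, uint32_t v)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* acquire pairs with the consumer's release of head: the slot
	 * being reused has really been read */
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_SIZE)
		return false; /* full */
	r->slots[tail & (RING_SIZE - 1)] = v;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: the acquire load of tail makes the slot contents visible. */
bool spsc_pop(struct spsc_ring *r, uint32_t *v)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false; /* empty */
	*v = r->slots[head & (RING_SIZE - 1)];
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

On aarch64 a release store and an acquire load typically compile to STLR/LDAR rather than full DMB instructions, which is the kind of relaxation the thread is after.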
Re: [dpdk-dev] [PATCH] net/i40e: add checking for messages from VF
On 08/20, alvinx.zh...@intel.com wrote: >From: Alvin Zhang > >If VF driver in VM continuous sending invalid messages by mailbox, >it will waste CPU cycles on PF driver and impact other VF drivers >configuration. New feature can count the numbers of invalid and >unsupported messages from VFs, when the statistics from a VF >exceed maximum limit, PF driver will ignore any message from that >VF for some seconds. > >Signed-off-by: Alvin Zhang >--- > drivers/net/i40e/i40e_ethdev.c | 80 + > drivers/net/i40e/i40e_ethdev.h | 30 +++ > drivers/net/i40e/i40e_pf.c | 189 - > 3 files changed, 258 insertions(+), 41 deletions(-) > >diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c >index 4e40b7a..045ba49 100644 >--- a/drivers/net/i40e/i40e_ethdev.c >+++ b/drivers/net/i40e/i40e_ethdev.c >@@ -44,6 +44,7 @@ > #define ETH_I40E_SUPPORT_MULTI_DRIVER "support-multi-driver" > #define ETH_I40E_QUEUE_NUM_PER_VF_ARG "queue-num-per-vf" > #define ETH_I40E_USE_LATEST_VEC "use-latest-supported-vec" >+#define ETH_I40E_MAX_VF_WRONG_MSG "vf_max_wrong_msg" > > #define I40E_CLEAR_PXE_WAIT_MS 200 > >@@ -406,6 +407,7 @@ static int i40e_sw_tunnel_filter_insert(struct i40e_pf *pf, > ETH_I40E_SUPPORT_MULTI_DRIVER, > ETH_I40E_QUEUE_NUM_PER_VF_ARG, > ETH_I40E_USE_LATEST_VEC, >+ ETH_I40E_MAX_VF_WRONG_MSG, > NULL}; > > static const struct rte_pci_id pci_id_i40e_map[] = { >@@ -1256,6 +1258,82 @@ static inline void i40e_config_automask(struct i40e_pf >*pf) > return 0; > } > >+static int >+read_vf_msg_check_info(__rte_unused const char *key, >+ const char *value, >+ void *opaque) >+{ >+ struct i40e_wrong_vf_msg info; >+ >+ memset(&info, 0, sizeof(info)); >+ /* >+ * VF message checking function need 3 parameters, max_invalid, >+ * max_unsupported and silence_seconds. >+ * When continuous invalid or unsupported message statistics >+ * from a VF exceed the limitation of 'max_invalid' or >+ * 'max_unsupported', PF will ignore any message from that VF for >+ * 'silence_seconds' seconds. 
>+ */ >+ if (sscanf(value, "%u:%u:%lu", &info.max_invalid, >+ &info.max_unsupport, &info.silence_seconds) >+ != 3) { >+ PMD_DRV_LOG(ERR, "vf_max_wrong_msg error! format like: " >+ "vf_max_wrong_msg=4:6:60"); >+ return -EINVAL; >+ } >+ >+ /* >+ * If invalid or unsupported message checking function is enabled >+ * by setting max_invalid or max_unsupport variable to not zero, >+ * 'slience_seconds' must be greater than zero. >+ */ >+ if ((info.max_invalid | info.max_unsupport) && info.max_invalid || info.max_unsupport? Also, I'd prefer "unsupported" in your variable names, since "unsupport" is not a valid word. >+ !info.silence_seconds) { >+ PMD_DRV_LOG(ERR, "vf_max_wrong_msg error! last integer" >+ " must be larger than zero"); >+ return -EINVAL; >+ } >+ >+ memcpy(opaque, &info, sizeof(struct i40e_wrong_vf_msg)); >+ return 0; >+} >+ >+static int >+i40e_parse_vf_msg_check_info(struct rte_eth_dev *dev, >+ struct i40e_wrong_vf_msg *wrong_info) >+{ >+ int ret = 0; >+ int kvargs_count; >+ struct rte_kvargs *kvlist; >+ >+ /* reset all to zero */ >+ memset(wrong_info, 0, sizeof(*wrong_info)); >+ >+ if (!dev->device->devargs) >+ return ret; >+ >+ kvlist = rte_kvargs_parse(dev->device->devargs->args, valid_keys); >+ if (!kvlist) >+ return -EINVAL; >+ >+ kvargs_count = rte_kvargs_count(kvlist, ETH_I40E_MAX_VF_WRONG_MSG); >+ if (!kvargs_count) >+ goto free_end; >+ >+ if (kvargs_count > 1) >+ PMD_DRV_LOG(WARNING, "More than one argument \"%s\" and only " >+ "the first invalid or last valid one is used !", >+ ETH_I40E_MAX_VF_WRONG_MSG); What about just allowing a single wrong msg argument?
>+ >+ if (rte_kvargs_process(kvlist, ETH_I40E_MAX_VF_WRONG_MSG, >+ read_vf_msg_check_info, wrong_info) < 0) >+ ret = -EINVAL; >+ >+free_end: >+ rte_kvargs_free(kvlist); >+ return ret; >+} >+ > #define I40E_ALARM_INTERVAL 5 /* us */ > > static int >@@ -1328,6 +1406,8 @@ static inline void i40e_config_automask(struct i40e_pf >*pf) > return -EIO; > } > >+ /* read VF message checking function parameters */ >+ i40e_parse_vf_msg_check_info(dev, &pf->wrong_vf_msg_conf); > /* Check if need to support multi-driver */ > i40e_support_multi_driver(dev); > /* Check if users want the latest supported vec path */ >diff --git a/drivers/net/i40e/i40e_ethdev.h b/drive
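A minimal sketch of the devarg semantics discussed above, written standalone so it can be tried outside the driver. Everything here is hypothetical: `struct vf_msg_limits` stands in for `struct i40e_wrong_vf_msg` (using the "unsupported" spelling suggested in review), and the counting policy assumes "continuous" in the commit message means consecutive bad messages, with any valid message resetting the counters. The enable check uses the logical `||` suggested in review; for unsigned operands the original bitwise `|` gives the same truth value, so this is a readability change. The sketch reads the third field with `%llu` into an `unsigned long long` because the type of `silence_seconds` is not visible in the quoted hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct i40e_wrong_vf_msg. */
struct vf_msg_limits {
	uint32_t max_invalid;     /* 0 disables invalid-message checking */
	uint32_t max_unsupported; /* 0 disables unsupported-message checking */
	uint64_t silence_seconds; /* how long to ignore a misbehaving VF */
};

/* Parse "max_invalid:max_unsupported:silence_seconds", e.g. "4:6:60". */
int parse_vf_msg_limits(const char *value, struct vf_msg_limits *out)
{
	unsigned int inv, unsup;
	unsigned long long silence;

	if (sscanf(value, "%u:%u:%llu", &inv, &unsup, &silence) != 3)
		return -EINVAL;
	/* If either check is enabled, a zero silence period is an error. */
	if ((inv != 0 || unsup != 0) && silence == 0)
		return -EINVAL;

	out->max_invalid = inv;
	out->max_unsupported = unsup;
	out->silence_seconds = silence;
	return 0;
}

enum vf_msg_kind { VF_MSG_OK, VF_MSG_INVALID, VF_MSG_UNSUPPORTED };

/* Hypothetical per-VF bookkeeping kept by the PF. */
struct vf_msg_state {
	uint32_t invalid_cnt;     /* consecutive invalid messages */
	uint32_t unsupported_cnt; /* consecutive unsupported messages */
	uint64_t ignore_until;    /* time (s) before which messages are dropped */
};

/* Returns false if the PF should drop this message without handling it. */
bool vf_msg_accept(const struct vf_msg_limits *lim, struct vf_msg_state *st,
		   enum vf_msg_kind kind, uint64_t now)
{
	bool exceeded = false;

	if (now < st->ignore_until)
		return false; /* VF is still being silenced */

	switch (kind) {
	case VF_MSG_INVALID:
		exceeded = lim->max_invalid != 0 &&
			   ++st->invalid_cnt > lim->max_invalid;
		break;
	case VF_MSG_UNSUPPORTED:
		exceeded = lim->max_unsupported != 0 &&
			   ++st->unsupported_cnt > lim->max_unsupported;
		break;
	case VF_MSG_OK:
		/* a valid message ends the run of bad ones */
		st->invalid_cnt = 0;
		st->unsupported_cnt = 0;
		break;
	}

	if (exceeded) {
		st->invalid_cnt = 0;
		st->unsupported_cnt = 0;
		st->ignore_until = now + lim->silence_seconds;
		return false;
	}
	return true;
}
```

With `vf_max_wrong_msg=2:0:60`, a third consecutive invalid message at t=102 s silences the VF until t=162 s, and unsupported messages are never counted because their limit is 0.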
Re: [dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD
Wednesday, August 21, 2019 11:56 PM, Yongseok Koh: > Subject: [dpdk-dev] [PATCH] maintainers: update for Mellanox mlx5 PMD > > Matan thankfully accepted to replace myself as maintainer for mlx5 PMD. > Good luck! > > Signed-off-by: Yongseok Koh Thank you, Koh, for all the hard work and for maintaining the PMD. Acked-by: Shahaf Shuler > --- > MAINTAINERS | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4100260861..30dbb8be55 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -715,8 +715,8 @@ F: doc/guides/nics/mlx4.rst > F: doc/guides/nics/features/mlx4.ini > > Mellanox mlx5 > +M: Matan Azrad > M: Shahaf Shuler > -M: Yongseok Koh > M: Viacheslav Ovsiienko > T: git://dpdk.org/next/dpdk-next-net-mlx > F: drivers/net/mlx5/ > -- > 2.21.0.196.g041f5ea
[dpdk-dev] [PATCH 02/13] net/bnxt: prevent device access when device is in reset
From: Kalesh AP Refactor init and uninit functions so that the driver can fail the eth_dev_ops callbacks and accessing Tx and Rx queues when device is in reset or in error state. Transmit and receive queues are freed during reset cleanup and reallocated during recovery. So we block all data path handling in this state. The eth_dev dev_started field is updated depending on the status of the device. Signed-off-by: Kalesh AP Reviewed-by: Ajit Khaparde Reviewed-by: Santoshkumar Karanappa Rastapur Reviewed-by: Somnath Kotur --- drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 455 ++--- drivers/net/bnxt/bnxt_hwrm.c | 2 - drivers/net/bnxt/bnxt_ring.c | 32 +++ drivers/net/bnxt/bnxt_ring.h | 1 + drivers/net/bnxt/bnxt_rxq.c| 25 ++ drivers/net/bnxt/bnxt_rxr.c| 17 ++ drivers/net/bnxt/bnxt_rxr.h| 2 + drivers/net/bnxt/bnxt_stats.c | 34 ++- drivers/net/bnxt/bnxt_txq.c| 7 + drivers/net/bnxt/bnxt_txr.c| 27 ++ drivers/net/bnxt/bnxt_txr.h| 2 + 12 files changed, 452 insertions(+), 153 deletions(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 0c9f994ea..49418cac9 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -465,6 +465,7 @@ struct bnxt { int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete); int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg); +int is_bnxt_in_error(struct bnxt *bp); bool is_bnxt_supported(struct rte_eth_dev *dev); bool bnxt_stratus_device(struct bnxt *bp); diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 6685ee7d9..33ff4a5a7 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -167,6 +167,16 @@ static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu); static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev); +int is_bnxt_in_error(struct bnxt *bp) +{ + if (bp->flags & BNXT_FLAG_FATAL_ERROR) + return -EIO; + if (bp->flags & 
BNXT_FLAG_FW_RESET) + return -EBUSY; + + return 0; +} + /***/ /* @@ -207,6 +217,10 @@ static int bnxt_alloc_mem(struct bnxt *bp) { int rc; + rc = bnxt_alloc_ring_grps(bp); + if (rc) + goto alloc_mem_err; + rc = bnxt_alloc_async_ring_struct(bp); if (rc) goto alloc_mem_err; @@ -501,6 +515,9 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev, uint16_t max_vnics, i, j, vpool, vrxq; unsigned int max_rx_rings; + if (is_bnxt_in_error(bp)) + return; + /* MAC Specifics */ dev_info->max_mac_addrs = bp->max_l2_ctx; dev_info->max_hash_mac_addrs = 0; @@ -602,6 +619,10 @@ static int bnxt_dev_configure_op(struct rte_eth_dev *eth_dev) bp->tx_nr_rings = eth_dev->data->nb_tx_queues; bp->rx_nr_rings = eth_dev->data->nb_rx_queues; + rc = is_bnxt_in_error(bp); + if (rc) + return rc; + if (BNXT_VF(bp) && (bp->flags & BNXT_FLAG_NEW_RM)) { rc = bnxt_hwrm_check_vf_rings(bp); if (rc) { @@ -791,8 +812,10 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = bnxt_receive_function(eth_dev); eth_dev->tx_pkt_burst = bnxt_transmit_function(eth_dev); + bnxt_enable_int(bp); bp->flags |= BNXT_FLAG_INIT_DONE; + eth_dev->data->dev_started = 1; bp->dev_stopped = 0; return 0; @@ -835,6 +858,11 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev) struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev); struct rte_intr_handle *intr_handle = &pci_dev->intr_handle; + eth_dev->data->dev_started = 0; + /* Prevent crashes when queues are still in use */ + eth_dev->rx_pkt_burst = &bnxt_dummy_recv_pkts; + eth_dev->tx_pkt_burst = &bnxt_dummy_xmit_pkts; + bnxt_disable_int(bp); /* disable uio/vfio intr/eventfd mapping */ @@ -889,6 +917,9 @@ static void bnxt_mac_addr_remove_op(struct rte_eth_dev *eth_dev, struct bnxt_filter_info *filter, *temp_filter; uint32_t i; + if (is_bnxt_in_error(bp)) + return; + /* * Loop through all VNICs from the specified filter flow pools to * remove the corresponding MAC addr filter @@ -924,6 +955,10 @@ static int 
bnxt_mac_addr_add_op(struct rte_eth_dev *eth_dev, struct bnxt_filter_info *filter; int rc = 0; + rc = is_bnxt_in_error(bp); + if (rc) + return rc; + if (BNXT_VF(bp) & !BNXT_VF_IS_TRUSTED(bp)) { PMD_DRV_LOG(ERR, "Cannot add MAC address to a VF interface\n"); return -ENOTSUP; @@ -969,6 +1004,10 @@ int bnxt_link_update_op(struct rte_eth_dev *eth_dev, i
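The gating pattern in this patch can be sketched outside DPDK as follows: control-path ops call an error check first and fail fast, and stop swaps the burst handler for a no-op so that later polls return 0 instead of touching freed queues. The flag bit positions match `BNXT_FLAG_FW_RESET` and `BNXT_FLAG_FATAL_ERROR` as defined later in the series; `struct dev_ctx`, `dev_configure`, `dev_start`, `dev_stop` and the burst functions are hypothetical stand-ins, not the bnxt or ethdev API.

```c
#include <errno.h>
#include <stdint.h>

/* Same bit positions as BNXT_FLAG_FW_RESET / BNXT_FLAG_FATAL_ERROR. */
#define DEV_FLAG_FW_RESET    (1u << 15)
#define DEV_FLAG_FATAL_ERROR (1u << 16)

struct dev_ctx;
typedef uint16_t (*rx_burst_fn)(struct dev_ctx *dev, uint16_t nb_pkts);

/* Hypothetical device context. */
struct dev_ctx {
	uint32_t flags;
	int dev_started;
	rx_burst_fn rx_burst;
};

/* Mirrors is_bnxt_in_error(): a fatal error takes precedence over reset. */
int dev_in_error(const struct dev_ctx *dev)
{
	if (dev->flags & DEV_FLAG_FATAL_ERROR)
		return -EIO;
	if (dev->flags & DEV_FLAG_FW_RESET)
		return -EBUSY;
	return 0;
}

/* Stand-in for the real receive path: pretend every slot was filled. */
static uint16_t real_rx(struct dev_ctx *dev, uint16_t nb_pkts)
{
	(void)dev;
	return nb_pkts;
}

/* No-op handler used while queues are torn down. */
static uint16_t dummy_rx(struct dev_ctx *dev, uint16_t nb_pkts)
{
	(void)dev;
	(void)nb_pkts;
	return 0;
}

int dev_configure(struct dev_ctx *dev)
{
	int rc = dev_in_error(dev);

	if (rc)
		return rc; /* do not touch hardware in reset/error state */
	/* ... program the hardware here ... */
	return 0;
}

int dev_start(struct dev_ctx *dev)
{
	int rc = dev_in_error(dev);

	if (rc)
		return rc;
	dev->rx_burst = real_rx;
	dev->dev_started = 1;
	return 0;
}

void dev_stop(struct dev_ctx *dev)
{
	/* Mark stopped and swap in the dummy handler before freeing queues. */
	dev->dev_started = 0;
	dev->rx_burst = dummy_rx;
}
```

Returning distinct codes (-EIO vs -EBUSY) lets a caller tell a permanent failure from a transient reset it may retry after.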
[dpdk-dev] [PATCH 01/13] net/bnxt: hsi version update
From: Kalesh AP Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Reviewed-by: Ajit Khaparde --- drivers/net/bnxt/hsi_struct_def_dpdk.h | 137 + 1 file changed, 137 insertions(+) diff --git a/drivers/net/bnxt/hsi_struct_def_dpdk.h b/drivers/net/bnxt/hsi_struct_def_dpdk.h index 6c98c1d6d..009571725 100644 --- a/drivers/net/bnxt/hsi_struct_def_dpdk.h +++ b/drivers/net/bnxt/hsi_struct_def_dpdk.h @@ -33621,4 +33621,141 @@ struct hwrm_nvm_validate_option_cmd_err { uint8_t unused_0[7]; } __attribute__((packed)); +/* + * hwrm_fw_reset * + **/ + + +/* hwrm_fw_reset_input (size:192b/24B) */ +struct hwrm_fw_reset_input { + /* The HWRM command request type. */ + uint16_treq_type; + /* +* The completion ring to send the completion event on. This should +* be the NQ ID returned from the `nq_alloc` HWRM command. +*/ + uint16_tcmpl_ring; + /* +* The sequence ID is used by the driver for tracking multiple +* commands. This ID is treated as opaque data by the firmware and +* the value is returned in the `hwrm_resp_hdr` upon completion. +*/ + uint16_tseq_id; + /* +* The target ID of the command: +* * 0x0-0xFFF8 - The function ID +* * 0xFFF8-0xFFFE - Reserved for internal processors +* * 0x - HWRM +*/ + uint16_ttarget_id; + /* +* A physical address pointer pointing to a host buffer that the +* command's response data will be written. This can be either a host +* physical address (HPA) or a guest physical address (GPA) and must +* point to a physically contiguous block of memory. +*/ + uint64_tresp_addr; + /* Type of embedded processor. 
*/ + uint8_t embedded_proc_type; + /* Boot Processor */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_BOOT \ + UINT32_C(0x0) + /* Management Processor */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_MGMT \ + UINT32_C(0x1) + /* Network control processor */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_NETCTRL \ + UINT32_C(0x2) + /* RoCE control processor */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_ROCE \ + UINT32_C(0x3) + /* +* Host (in multi-host environment): This is only valid if requester is IPC. +* Reinit host hardware resources and PCIe. +*/ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST \ + UINT32_C(0x4) + /* AP processor complex (in multi-host environment). Use host_idx to control which core is reset */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_AP \ + UINT32_C(0x5) + /* Reset all blocks of the chip (including all processors) */ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_CHIP \ + UINT32_C(0x6) + /* +* Host (in multi-host environment): This is only valid if requester is IPC. +* Reinit host hardware resources. +*/ + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST_RESOURCE_REINIT \ + UINT32_C(0x7) + #define HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_LAST \ + HWRM_FW_RESET_INPUT_EMBEDDED_PROC_TYPE_HOST_RESOURCE_REINIT + /* Type of self reset. */ + uint8_t selfrst_status; + /* No Self Reset */ + #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTNONE \ + UINT32_C(0x0) + /* Self Reset as soon as possible to do so safely */ + #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTASAP \ + UINT32_C(0x1) + /* Self Reset on PCIe Reset */ + #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTPCIERST \ + UINT32_C(0x2) + /* Self Reset immediately after notification to all clients. */ + #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTIMMEDIATE \ + UINT32_C(0x3) + #define HWRM_FW_RESET_INPUT_SELFRST_STATUS_LAST \ + HWRM_FW_RESET_INPUT_SELFRST_STATUS_SELFRSTIMMEDIATE + /* +* Indicate which host is being reset. 0 means first host. 
+* Only valid when embedded_proc_type is host in multihost +* environment +*/ + uint8_t host_idx; + uint8_t flags; + /* +* When this bit is '1', then the core firmware initiates +* the reset only after graceful shut down of all registered instances. +* If not, the device will continue with the existing firmware. +*/ + #define HWRM_FW_RESET_INPUT_FLAGS_RESET_GRACEFUL UINT32_C(0x1) + uint8_t unused_0[4]; +} __attribute__((packed)); + +/* hwrm_fw_reset_output (size:128b/16B) */ +struct hwrm_fw_reset_output { + /* The specific error status for the command. */ + uint16_terror_code; + /* The HWR
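The "size:192b/24B" comment on `hwrm_fw_reset_input` can be checked at compile time. The sketch below copies the field layout from the patch into a standalone file and asserts its size and the offset where the command-specific fields begin. `fw_reset_fill_chip` is a hypothetical helper. The macro values are taken from the `#define`s above, but the particular combination (whole-chip reset, self reset ASAP, graceful flag) is only illustrative, not what the driver necessarily sends. The leading 16 bytes (`req_type` through `resp_addr`) are left to the caller, as the driver's own request-preparation code would fill them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the hwrm_fw_reset_input layout from the patch. */
struct hwrm_fw_reset_input {
	uint16_t req_type;
	uint16_t cmpl_ring;
	uint16_t seq_id;
	uint16_t target_id;
	uint64_t resp_addr;
	uint8_t  embedded_proc_type;
	uint8_t  selfrst_status;
	uint8_t  host_idx;
	uint8_t  flags;
	uint8_t  unused_0[4];
} __attribute__((packed));

/* Values copied from the patch's #defines. */
#define FW_RESET_PROC_TYPE_CHIP 0x6 /* ..._EMBEDDED_PROC_TYPE_CHIP */
#define FW_RESET_SELFRST_ASAP   0x1 /* ..._SELFRST_STATUS_SELFRSTASAP */
#define FW_RESET_FLAGS_GRACEFUL 0x1 /* ..._FLAGS_RESET_GRACEFUL */

_Static_assert(sizeof(struct hwrm_fw_reset_input) == 24,
	       "hwrm_fw_reset_input must be 24 bytes");
_Static_assert(offsetof(struct hwrm_fw_reset_input, embedded_proc_type) == 16,
	       "command fields start after the 16-byte request header");

/* Fill only the command-specific fields (bytes 16..23). */
void fw_reset_fill_chip(struct hwrm_fw_reset_input *req, bool graceful)
{
	memset(&req->embedded_proc_type, 0,
	       sizeof(*req) - offsetof(struct hwrm_fw_reset_input,
				       embedded_proc_type));
	req->embedded_proc_type = FW_RESET_PROC_TYPE_CHIP;
	req->selfrst_status = FW_RESET_SELFRST_ASAP;
	req->host_idx = 0; /* only meaningful for HOST in multi-host setups */
	if (graceful)
		req->flags |= FW_RESET_FLAGS_GRACEFUL;
}
```

Because every command-specific field is a single byte, no endianness conversion is needed here. The 16-bit and 64-bit header fields would still need `rte_cpu_to_le_*` in the real driver.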
[dpdk-dev] [PATCH 00/13] bnxt patchset to support device error recovery
This patchset adds support to monitor the health of the firmware and the underlying device, and to recover to an operational state in case of error. We can also detect if a FW upgrade is in progress, quiesce all access to the device, and recover once FW indicates everything is ready. Patchset against dpdk-next-net. Please apply. Kalesh AP (13): net/bnxt: hsi version update net/bnxt: prevent device access when device is in reset net/bnxt: handle reset notify async event from FW net/bnxt: inform firmware about IF state changes net/bnxt: handle fatal event from FW under error conditions net/bnxt: query firmware error recovery capabilities net/bnxt: map status registers for FW health monitoring net/bnxt: advertise error recovery capability and handle async event net/bnxt: add code for periodic FW health monitoring net/bnxt: use BIT macro instead of bit fields net/bnxt: reschedule the health check alarm correctly net/bnxt: add support for FW reset net/bnxt: reduce verbosity of logs drivers/net/bnxt/bnxt.h| 130 +++- drivers/net/bnxt/bnxt_cpr.c| 78 +++ drivers/net/bnxt/bnxt_cpr.h| 18 + drivers/net/bnxt/bnxt_ethdev.c | 817 - drivers/net/bnxt/bnxt_hwrm.c | 200 +- drivers/net/bnxt/bnxt_hwrm.h | 7 + drivers/net/bnxt/bnxt_ring.c | 39 +- drivers/net/bnxt/bnxt_ring.h | 1 + drivers/net/bnxt/bnxt_rxq.c| 25 + drivers/net/bnxt/bnxt_rxr.c| 17 + drivers/net/bnxt/bnxt_rxr.h| 2 + drivers/net/bnxt/bnxt_stats.c | 34 +- drivers/net/bnxt/bnxt_txq.c| 7 + drivers/net/bnxt/bnxt_txr.c| 27 + drivers/net/bnxt/bnxt_txr.h| 2 + drivers/net/bnxt/bnxt_util.h | 4 + drivers/net/bnxt/bnxt_vnic.c | 7 +- drivers/net/bnxt/hsi_struct_def_dpdk.h | 137 + 18 files changed, 1339 insertions(+), 213 deletions(-) -- 2.20.1 (Apple Git-117)
[dpdk-dev] [PATCH 03/13] net/bnxt: handle reset notify async event from FW
From: Kalesh AP When the FW upgrade is initiated the current instance of FW issues a HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY async notification to the driver. On receiving this notification, the PMD shall quiesce itself and poll on the HWRM_VER_GET FW command at regular intervals. Once the VER_GET command succeeds, the driver should go through the rediscovery process and re-initialize the device. Also register with FW for the reset notify async event. Signed-off-by: Kalesh AP Reviewed-by: Ajit Khaparde Reviewed-by: Somnath Kotur --- drivers/net/bnxt/bnxt.h| 15 + drivers/net/bnxt/bnxt_cpr.c| 14 + drivers/net/bnxt/bnxt_cpr.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 110 - drivers/net/bnxt/bnxt_hwrm.c | 39 +--- drivers/net/bnxt/bnxt_hwrm.h | 2 + 6 files changed, 158 insertions(+), 23 deletions(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 49418cac9..8797b032e 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -333,6 +333,16 @@ struct bnxt_ctx_mem_info { struct bnxt_ctx_pg_info *tqm_mem[BNXT_MAX_TC_Q]; }; +/* Maximum Firmware Reset bail out value in milliseconds */ +#define BNXT_MAX_FW_RESET_TIMEOUT 6000 +/* Minimum time required for the firmware readiness in milliseconds */ +#define BNXT_MIN_FW_READY_TIMEOUT 2000 +/* Frequency for the firmware readiness check in milliseconds */ +#define BNXT_FW_READY_WAIT_INTERVAL100 + +#define US_PER_MS 1000 +#define NS_PER_US 1000 + #define BNXT_HWRM_SHORT_REQ_LENsizeof(struct hwrm_short_input) struct bnxt { void*bar0; @@ -358,6 +368,8 @@ struct bnxt { #define BNXT_FLAG_DFLT_VNIC_SET(1 << 12) #define BNXT_FLAG_THOR_CHIP(1 << 13) #define BNXT_FLAG_STINGRAY (1 << 14) +#define BNXT_FLAG_FW_RESET (1 << 15) +#define BNXT_FLAG_FATAL_ERROR (1 << 16) #define BNXT_FLAG_EXT_STATS_SUPPORTED (1 << 29) #define BNXT_FLAG_NEW_RM (1 << 30) #define BNXT_FLAG_INIT_DONE(1U << 31) @@ -461,6 +473,9 @@ struct bnxt { struct bnxt_ptp_cfg *ptp_cfg; uint16_tvf_resv_strategy; struct bnxt_ctx_mem_info*ctx; + + 
uint16_tfw_reset_min_msecs; + uint16_tfw_reset_max_msecs; }; int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete); diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c index 655bcf1a8..cefb5db2a 100644 --- a/drivers/net/bnxt/bnxt_cpr.c +++ b/drivers/net/bnxt/bnxt_cpr.c @@ -40,6 +40,20 @@ void bnxt_handle_async_event(struct bnxt *bp, case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED: PMD_DRV_LOG(INFO, "Port conn async event\n"); break; + case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: + /* timestamp_lo/hi values are in units of 100ms */ + bp->fw_reset_max_msecs = async_cmp->timestamp_hi ? + rte_le_to_cpu_16(async_cmp->timestamp_hi) * 100 : + BNXT_MAX_FW_RESET_TIMEOUT; + bp->fw_reset_min_msecs = async_cmp->timestamp_lo ? + async_cmp->timestamp_lo * 100 : + BNXT_MIN_FW_READY_TIMEOUT; + PMD_DRV_LOG(INFO, + "Firmware non-fatal reset event received\n"); + + bp->flags |= BNXT_FLAG_FW_RESET; + bnxt_dev_reset_and_resume(bp); + break; default: PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id); break; diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h index 8c6a34b61..4f86e3f60 100644 --- a/drivers/net/bnxt/bnxt_cpr.h +++ b/drivers/net/bnxt/bnxt_cpr.h @@ -106,5 +106,6 @@ struct bnxt; void bnxt_handle_async_event(struct bnxt *bp, struct cmpl_base *cmp); void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmp); int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp); +int bnxt_dev_reset_and_resume(struct bnxt *bp); #endif diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 33ff4a5a7..1aef227f2 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "bnxt.h" #include "bnxt_cpr.h" @@ -166,6 +167,8 @@ static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask); static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); static int 
bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu); static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev); +static int bnxt_init_resources(struct bnxt *bp, bool reconfig_dev); +static int bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev); int is_bnxt_in_error(struct bnxt *bp) { @@ -201,19 +204,25 @@ static uint16_t bnxt_r
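The RESET_NOTIFY handling above converts `timestamp_lo`/`timestamp_hi` (units of 100 ms) into a minimum and maximum wait, falling back to 2000 ms and 6000 ms when a field is zero, and the commit message says the driver then polls HWRM_VER_GET at regular intervals. A standalone sketch of that arithmetic follows. The poll schedule (wait `min_ms`, then probe every 100 ms up to `max_ms`) is an assumption about how the constants fit together, and time is simulated: `fw_ready_after` is a hypothetical probe standing in for HWRM_VER_GET, where a driver would sleep or re-arm an alarm between probes. The patch stores the results in `uint16_t` fields, so a large `timestamp_hi` could wrap; this sketch widens to 32 bits instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defaults and poll interval from the patch (milliseconds). */
#define MAX_FW_RESET_TIMEOUT_MS   6000 /* BNXT_MAX_FW_RESET_TIMEOUT */
#define MIN_FW_READY_TIMEOUT_MS   2000 /* BNXT_MIN_FW_READY_TIMEOUT */
#define FW_READY_WAIT_INTERVAL_MS 100  /* BNXT_FW_READY_WAIT_INTERVAL */

struct fw_reset_window {
	uint32_t min_ms;
	uint32_t max_ms;
};

/* timestamp_lo/hi arrive in units of 100 ms; zero means "use default". */
struct fw_reset_window fw_reset_window_from_event(uint16_t ts_lo, uint16_t ts_hi)
{
	struct fw_reset_window w;

	w.min_ms = ts_lo ? (uint32_t)ts_lo * 100u : MIN_FW_READY_TIMEOUT_MS;
	w.max_ms = ts_hi ? (uint32_t)ts_hi * 100u : MAX_FW_RESET_TIMEOUT_MS;
	return w;
}

/* Hypothetical probe standing in for HWRM_VER_GET: ctx points at the
 * simulated elapsed time at which firmware becomes ready. */
bool fw_ready_after(uint32_t elapsed_ms, void *ctx)
{
	return elapsed_ms >= *(const uint32_t *)ctx;
}

/* Wait min_ms, then probe every interval until max_ms (simulated time).
 * Returns the elapsed time at which the probe succeeded, or -1. */
int64_t wait_fw_ready(struct fw_reset_window w,
		      bool (*ver_get_ok)(uint32_t elapsed_ms, void *ctx),
		      void *ctx)
{
	for (uint32_t t = w.min_ms; t <= w.max_ms; t += FW_READY_WAIT_INTERVAL_MS)
		if (ver_get_ok(t, ctx))
			return t;
	return -1;
}
```

With the defaults, firmware that becomes ready at 2550 ms is detected at the 2600 ms probe; if it is not ready by 6000 ms the wait gives up.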
[dpdk-dev] [PATCH 04/13] net/bnxt: inform firmware about IF state changes
From: Kalesh AP Use latest firmware API to inform firmware about IF state changes. Firmware has the option to clean up resources during IF down and to require the driver to reserve resources again during IF up. Signed-off-by: Kalesh AP Reviewed-by: Santoshkumar Karanappa Rastapur Reviewed-by: Somnath Kotur Reviewed-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 4 drivers/net/bnxt/bnxt_hwrm.c | 35 ++ drivers/net/bnxt/bnxt_hwrm.h | 1 + 4 files changed, 41 insertions(+) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 8797b032e..394a2a941 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -370,6 +370,7 @@ struct bnxt { #define BNXT_FLAG_STINGRAY (1 << 14) #define BNXT_FLAG_FW_RESET (1 << 15) #define BNXT_FLAG_FATAL_ERROR (1 << 16) +#define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17) #define BNXT_FLAG_EXT_STATS_SUPPORTED (1 << 29) #define BNXT_FLAG_NEW_RM (1 << 30) #define BNXT_FLAG_INIT_DONE(1U << 31) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 1aef227f2..f7b2ef179 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -803,6 +803,8 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev) bp->rx_cp_nr_rings, RTE_ETHDEV_QUEUE_STAT_CNTRS); } + bnxt_hwrm_if_change(bp, 1); + rc = bnxt_init_chip(bp); if (rc) goto error; @@ -829,6 +831,7 @@ static int bnxt_dev_start_op(struct rte_eth_dev *eth_dev) return 0; error: + bnxt_hwrm_if_change(bp, 0); bnxt_shutdown_nic(bp); bnxt_free_tx_mbufs(bp); bnxt_free_rx_mbufs(bp); @@ -895,6 +898,7 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev) bnxt_free_tx_mbufs(bp); bnxt_free_rx_mbufs(bp); bnxt_shutdown_nic(bp); + bnxt_hwrm_if_change(bp, 0); bp->dev_stopped = 1; } diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index b27dbe87e..17c7b5e9e 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -716,6 +716,11 @@ int 
bnxt_hwrm_func_driver_register(struct bnxt *bp) rc = bnxt_hwrm_send_message(bp, &req, sizeof(req), BNXT_USE_CHIMP_MB); HWRM_CHECK_RESULT(); + + flags = rte_le_to_cpu_32(resp->flags); + if (flags & HWRM_FUNC_DRV_RGTR_OUTPUT_FLAGS_IF_CHANGE_SUPPORTED) + bp->flags |= BNXT_FLAG_FW_CAP_IF_CHANGE; + HWRM_UNLOCK(); bp->flags |= BNXT_FLAG_REGISTERED; @@ -4649,3 +4654,33 @@ int bnxt_hwrm_set_mac(struct bnxt *bp) return rc; } + +int bnxt_hwrm_if_change(struct bnxt *bp, bool state) +{ + struct hwrm_func_drv_if_change_output *resp = bp->hwrm_cmd_resp_addr; + struct hwrm_func_drv_if_change_input req = {0}; + int rc; + + if (!(bp->flags & BNXT_FLAG_FW_CAP_IF_CHANGE)) + return 0; + + /* Do not issue FUNC_DRV_IF_CHANGE during reset recovery. +* If we issue FUNC_DRV_IF_CHANGE with flags down before +* FUNC_DRV_UNRGTR, FW resets before FUNC_DRV_UNRGTR +*/ + if (!state && (bp->flags & BNXT_FLAG_FW_RESET)) + return 0; + + HWRM_PREP(req, FUNC_DRV_IF_CHANGE, BNXT_USE_CHIMP_MB); + + if (state) + req.flags = + rte_cpu_to_le_32(HWRM_FUNC_DRV_IF_CHANGE_INPUT_FLAGS_UP); + + rc = bnxt_hwrm_send_message(bp, &req, sizeof(req), BNXT_USE_CHIMP_MB); + + HWRM_CHECK_RESULT(); + HWRM_UNLOCK(); + + return rc; +} diff --git a/drivers/net/bnxt/bnxt_hwrm.h b/drivers/net/bnxt/bnxt_hwrm.h index a03620532..2f57e950b 100644 --- a/drivers/net/bnxt/bnxt_hwrm.h +++ b/drivers/net/bnxt/bnxt_hwrm.h @@ -201,4 +201,5 @@ int bnxt_hwrm_tunnel_redirect_query(struct bnxt *bp, uint32_t *type); int bnxt_hwrm_tunnel_redirect_info(struct bnxt *bp, uint8_t tun_type, uint16_t *dst_fid); int bnxt_hwrm_set_mac(struct bnxt *bp); +int bnxt_hwrm_if_change(struct bnxt *bp, bool state); #endif -- 2.20.1 (Apple Git-117)
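The decision logic of `bnxt_hwrm_if_change()` can be separated from the HWRM I/O and checked on its own. In the patch, start sends "up" before `bnxt_init_chip()`, while stop and the start error path send "down". The command is skipped entirely unless firmware advertised IF_CHANGE support at registration, and "down" is also skipped during reset recovery. The enum and `decide_if_change` are hypothetical names for this sketch; the flag bit positions are the ones defined in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as BNXT_FLAG_FW_RESET / BNXT_FLAG_FW_CAP_IF_CHANGE. */
#define FLAG_FW_RESET         (1u << 15)
#define FLAG_FW_CAP_IF_CHANGE (1u << 17)

enum if_change_action {
	IF_CHANGE_SKIP,      /* do not send FUNC_DRV_IF_CHANGE */
	IF_CHANGE_SEND_UP,   /* send with the UP flag set */
	IF_CHANGE_SEND_DOWN, /* send with flags == 0 */
};

enum if_change_action decide_if_change(uint32_t dev_flags, bool up)
{
	/* Firmware did not report IF_CHANGE_SUPPORTED at registration. */
	if (!(dev_flags & FLAG_FW_CAP_IF_CHANGE))
		return IF_CHANGE_SKIP;
	/* During reset recovery, a "down" sent before FUNC_DRV_UNRGTR would
	 * make firmware reset early, so it is suppressed. */
	if (!up && (dev_flags & FLAG_FW_RESET))
		return IF_CHANGE_SKIP;
	return up ? IF_CHANGE_SEND_UP : IF_CHANGE_SEND_DOWN;
}
```

Keeping the decision pure like this makes the reset-recovery special case easy to unit test without firmware.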
[dpdk-dev] [PATCH 08/13] net/bnxt: advertise error recovery capability and handle async event
From: Kalesh AP 1. Advertise HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_ERROR_RECOVERY_SUPPORT flag in the FUNC_DRV_RGTR command. 2. request for the async event ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY in the FUNC_DRV_RGTR command. 3. handle the async event EVENT_ID_ERROR_RECOVERY from FW. Error recovery support will be used by firmware only if all the driver instances support error recovery process. Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h | 2 ++ drivers/net/bnxt/bnxt_cpr.c | 45 drivers/net/bnxt/bnxt_cpr.h | 12 ++ drivers/net/bnxt/bnxt_hwrm.c | 5 drivers/net/bnxt/bnxt_hwrm.h | 2 ++ 5 files changed, 66 insertions(+) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 1da09569d..f9147a9a8 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -365,6 +365,8 @@ struct bnxt_error_recovery_info { uint8_t delay_after_reset[BNXT_NUM_RESET_REG]; #define BNXT_FLAG_ERROR_RECOVERY_HOST (1 << 0) #define BNXT_FLAG_ERROR_RECOVERY_CO_CPU(1 << 1) +#define BNXT_FLAG_MASTER_FUNC (1 << 2) +#define BNXT_FLAG_RECOVERY_ENABLED (1 << 3) uint32_tflags; }; diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c index 6e0b1d67e..7f5b3314e 100644 --- a/drivers/net/bnxt/bnxt_cpr.c +++ b/drivers/net/bnxt/bnxt_cpr.c @@ -20,6 +20,7 @@ void bnxt_handle_async_event(struct bnxt *bp, struct hwrm_async_event_cmpl *async_cmp = (struct hwrm_async_event_cmpl *)cmp; uint16_t event_id = rte_le_to_cpu_16(async_cmp->event_id); + struct bnxt_error_recovery_info *info; uint32_t event_data; /* TODO: HWRM async events are not defined yet */ @@ -63,6 +64,31 @@ void bnxt_handle_async_event(struct bnxt *bp, bp->flags |= BNXT_FLAG_FW_RESET; bnxt_dev_reset_and_resume(bp); break; + case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: + info = bp->recovery_info; + + if (!info) + return; + + PMD_DRV_LOG(INFO, "Error recovery async event received\n"); + + event_data = rte_le_to_cpu_32(async_cmp->event_data1) & + 
EVENT_DATA1_FLAGS_MASK; + + if (event_data & EVENT_DATA1_FLAGS_MASTER_FUNC) + info->flags |= BNXT_FLAG_MASTER_FUNC; + else + info->flags &= ~BNXT_FLAG_MASTER_FUNC; + + if (event_data & EVENT_DATA1_FLAGS_RECOVERY_ENABLED) + info->flags |= BNXT_FLAG_RECOVERY_ENABLED; + else + info->flags &= ~BNXT_FLAG_RECOVERY_ENABLED; + + PMD_DRV_LOG(INFO, "recovery enabled(%d), master function(%d)\n", + bnxt_is_recovery_enabled(bp), + bnxt_is_master_func(bp)); + break; default: PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id); break; @@ -184,3 +210,22 @@ int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp) return evt; } + +bool bnxt_is_master_func(struct bnxt *bp) +{ + if (bp->recovery_info->flags & BNXT_FLAG_MASTER_FUNC) + return true; + + return false; +} + +bool bnxt_is_recovery_enabled(struct bnxt *bp) +{ + struct bnxt_error_recovery_info *info; + + info = bp->recovery_info; + if (info && (info->flags & BNXT_FLAG_RECOVERY_ENABLED)) + return true; + + return false; +} diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h index 4e63fd12f..22fba5b40 100644 --- a/drivers/net/bnxt/bnxt_cpr.h +++ b/drivers/net/bnxt/bnxt_cpr.h @@ -113,4 +113,16 @@ int bnxt_dev_reset_and_resume(struct bnxt *bp); #define EVENT_DATA1_REASON_CODE_MASK \ HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK +#define EVENT_DATA1_FLAGS_MASK \ + HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASK + +#define EVENT_DATA1_FLAGS_MASTER_FUNC \ + HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC + +#define EVENT_DATA1_FLAGS_RECOVERY_ENABLED \ + HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_RECOVERY_ENABLED + +bool bnxt_is_recovery_enabled(struct bnxt *bp); +bool bnxt_is_master_func(struct bnxt *bp); + #endif diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index 2d9c43c98..350e867bf 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -685,6 +685,8 @@ int 
bnxt_hwrm_func_driver_register(struct bnxt *bp) return 0; flags = HWRM_FUNC_DRV_RGTR_INPUT_FLAGS_HOT_RESET_SUPPORT; + if (bp->flags & BNXT_FLAG_FW_CAP_ERROR_RECOVERY) +
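The ERROR_RECOVERY event handler above masks `event_data1` and then sets or clears two cached bits (master function, recovery enabled) while preserving the other bits in `info->flags`. The sketch below reproduces that in a standalone form. The `EVT_*` bit values are hypothetical placeholders, because the real `HWRM_ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_*` values are not shown in the hunk. In the patch, `bnxt_is_recovery_enabled()` checks `recovery_info` for NULL but `bnxt_is_master_func()` does not. Callers may already guarantee non-NULL, but this sketch guards both helpers the same way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event_data1 bit values, for illustration only. */
#define EVT_FLAGS_MASK             0xffu
#define EVT_FLAGS_MASTER_FUNC      0x1u
#define EVT_FLAGS_RECOVERY_ENABLED 0x2u

/* Driver-side flag bits, as defined in the patch. */
#define FLAG_ERROR_RECOVERY_HOST (1u << 0)
#define FLAG_MASTER_FUNC         (1u << 2)
#define FLAG_RECOVERY_ENABLED    (1u << 3)

struct recovery_info {
	uint32_t flags;
};

static void set_flag(uint32_t *flags, uint32_t bit, bool on)
{
	if (on)
		*flags |= bit;
	else
		*flags &= ~bit;
}

/* Update cached role/state; event_data1 assumed already in host order. */
void handle_error_recovery_event(struct recovery_info *info, uint32_t event_data1)
{
	uint32_t data;

	if (info == NULL)
		return; /* recovery info was never queried */
	data = event_data1 & EVT_FLAGS_MASK;
	set_flag(&info->flags, FLAG_MASTER_FUNC, data & EVT_FLAGS_MASTER_FUNC);
	set_flag(&info->flags, FLAG_RECOVERY_ENABLED,
		 data & EVT_FLAGS_RECOVERY_ENABLED);
}

/* Both helpers tolerate a missing recovery_info. */
bool is_master_func(const struct recovery_info *info)
{
	return info != NULL && (info->flags & FLAG_MASTER_FUNC);
}

bool is_recovery_enabled(const struct recovery_info *info)
{
	return info != NULL && (info->flags & FLAG_RECOVERY_ENABLED);
}
```

Because each event rewrites both bits explicitly, a later event can demote a function from master without a separate "clear" message.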
[dpdk-dev] [PATCH 12/13] net/bnxt: add support for FW reset
From: Kalesh AP Added code to perform FW_RESET. When the driver detects an error in FW, it has to initiate the recovery by resetting the cores. FW advertises the method to do a core reset, along with the reset register offsets and the values to write, in response to the HWRM_ERROR_RECOVERY_QCFG command. There are 2 ways to recover from the error. 1. The master function issues core resets to recover from the error. 2. The master function detects the chimp dead condition and notifies the Kong processor about it through the FW_RESET HWRM command. The Kong processor then sends a RESET_NOTIFY async event with REASON_CODE_FW_EXCEPTION_FATAL to all the PFs/VFs, to notify them that the chimp is dead and that Kong is going to reset it. Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Reviewed-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_ethdev.c | 103 - drivers/net/bnxt/bnxt_hwrm.c | 26 + drivers/net/bnxt/bnxt_hwrm.h | 1 + 4 files changed, 130 insertions(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index edaef7897..9ea84ec2f 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -389,6 +389,7 @@ struct bnxt_error_recovery_info { #define BNXT_FW_STATUS_REG_OFF(reg)((reg) & ~BNXT_FW_STATUS_REG_TYPE_MASK) #define BNXT_GRCP_WINDOW_2_BASE0x2000 +#define BNXT_GRCP_WINDOW_3_BASE0x3000 #define BNXT_HWRM_SHORT_REQ_LENsizeof(struct hwrm_short_input) struct bnxt { diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index e7b0b44c4..095395dae 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3499,6 +3499,19 @@ static const struct eth_dev_ops bnxt_dev_ops = { .timesync_read_tx_timestamp = bnxt_timesync_read_tx_timestamp, }; +static uint32_t bnxt_map_reset_regs(struct bnxt *bp, uint32_t reg) +{ + uint32_t offset; + + /* Only pre-map the reset GRC registers using window 3 */ + rte_write32(reg & 0xf000, (uint8_t *)bp->bar0 + + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 
8); + + offset = BNXT_GRCP_WINDOW_3_BASE + 
(reg & 0xffc); + + return offset; +} + int bnxt_map_fw_health_status_regs(struct bnxt *bp) { struct bnxt_error_recovery_info *info = bp->recovery_info; @@ -3542,6 +3555,34 @@ static void bnxt_unmap_fw_health_status_regs(struct bnxt *bp) BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4); } +static void bnxt_write_fw_reset_reg(struct bnxt *bp, uint32_t index) +{ + struct bnxt_error_recovery_info *info = bp->recovery_info; + uint32_t delay = info->delay_after_reset[index]; + uint32_t val = info->reset_reg_val[index]; + uint32_t reg = info->reset_reg[index]; + uint32_t type, offset; + + type = BNXT_FW_STATUS_REG_TYPE(reg); + offset = BNXT_FW_STATUS_REG_OFF(reg); + + switch (type) { + case BNXT_FW_STATUS_REG_TYPE_CFG: + rte_pci_write_config(bp->pdev, &val, sizeof(val), offset); + break; + case BNXT_FW_STATUS_REG_TYPE_GRC: + offset = bnxt_map_reset_regs(bp, offset); + rte_write32(val, (uint8_t *)bp->bar0 + offset); + break; + case BNXT_FW_STATUS_REG_TYPE_BAR0: + rte_write32(val, (uint8_t *)bp->bar0 + offset); + break; + } + /* wait on a specific interval of time until core reset is complete */ + if (delay) + rte_delay_ms(delay); +} + static void bnxt_dev_cleanup(struct bnxt *bp) { bnxt_set_hwrm_link_config(bp, false); @@ -3636,6 +3677,58 @@ uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index) return val; } +static int bnxt_fw_reset_all(struct bnxt *bp) +{ + struct bnxt_error_recovery_info *info = bp->recovery_info; + uint32_t i; + int rc = 0; + + if (info->flags & BNXT_FLAG_ERROR_RECOVERY_HOST) { + /* Reset through master function driver */ + for (i = 0; i < info->reg_array_cnt; i++) + bnxt_write_fw_reset_reg(bp, i); + /* Wait for time specified by FW after triggering reset */ + rte_delay_ms(info->master_func_wait_period_after_reset); + } else if (info->flags & BNXT_FLAG_ERROR_RECOVERY_CO_CPU) { + /* Reset with the help of Kong processor */ + rc = bnxt_hwrm_fw_reset(bp); + if (rc) + PMD_DRV_LOG(ERR, "Failed to reset FW\n"); + } + + return rc; +} + +static void 
bnxt_fw_reset_cb(void *arg) +{ + struct bnxt *bp = arg; + struct bnxt_error_recovery_info *info = bp->recovery_info; + int rc = 0; + + /* Only Master function can do FW reset */ + if (bnxt_is_master_func(bp) && + bnxt_is_recovery_enabled(bp)) { + rc = bnxt_fw_reset_all(bp); + if (rc) { +
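The reset register descriptors from HWRM_ERROR_RECOVERY_QCFG encode the address space in their low two bits, and a GRC register is reached by pointing a BAR0 window at its 4 KB page and then accessing window base + (reg & 0xffc). The standalone sketch below mimics BNXT_FW_STATUS_REG_TYPE()/BNXT_FW_STATUS_REG_OFF() and bnxt_map_reset_regs(); the window register is replaced by a plain variable (window3_base, hypothetical) so it runs without hardware, and the 0xfffff000 page mask is an assumption, since this archive renders the mask as 0xf000, possibly truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as BNXT_FW_STATUS_REG_TYPE_MASK and friends in bnxt.h:
 * the low 2 bits select the address space, the rest is the offset. */
#define REG_TYPE_MASK 3u
#define REG_TYPE(reg) ((reg) & REG_TYPE_MASK)
#define REG_OFF(reg)  ((reg) & ~REG_TYPE_MASK)

enum reg_type { REG_TYPE_CFG = 0, REG_TYPE_GRC = 1, REG_TYPE_BAR0 = 2, REG_TYPE_BAR1 = 3 };

#define GRC_WINDOW_3_BASE 0x3000u
/* Assumption: the window base is the register's 4 KB-aligned address. */
#define GRC_BASE_MASK     0xfffff000u
#define GRC_OFFSET_MASK   0xffcu

/* Hypothetical stand-in for the window-3 base register in BAR0. */
static uint32_t window3_base;

/* Point window 3 at the register's page and return the BAR0 offset
 * through which the register is now reachable. */
static uint32_t map_reset_reg(uint32_t grc_off)
{
	window3_base = grc_off & GRC_BASE_MASK;
	return GRC_WINDOW_3_BASE + (grc_off & GRC_OFFSET_MASK);
}

/* Resolve a firmware-supplied descriptor to the BAR0 offset to write,
 * or UINT32_MAX for address spaces this sketch does not model. */
static uint32_t resolve_reset_reg(uint32_t desc)
{
	switch (REG_TYPE(desc)) {
	case REG_TYPE_GRC:
		return map_reset_reg(REG_OFF(desc));
	case REG_TYPE_BAR0:
		return REG_OFF(desc);
	default: /* CFG registers go through PCI config space instead */
		return UINT32_MAX;
	}
}
```

For example, the made-up descriptor 0x0012a405 (GRC type) programs window 3 with 0x0012a000 and resolves to BAR0 offset 0x3404.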
[dpdk-dev] [PATCH 13/13] net/bnxt: reduce verbosity of logs
From: Kalesh AP When IOMMU is available, EAL picks IOVA as VA as the default IOVA mode. This causes the bnxt driver to log warning messages saying "Memzone physical address same as virtual." and "Using rte_mem_virt2iova()" during load. Reduce the verbosity of logs to DEBUG. Signed-off-by: Kalesh AP Reviewed-by: Lance Richardson Reviewed-by: Somnath Kotur Reviewed-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 21 + drivers/net/bnxt/bnxt_ring.c | 7 +++ drivers/net/bnxt/bnxt_vnic.c | 7 +++ 3 files changed, 15 insertions(+), 20 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 095395dae..13f1ff6fb 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3893,10 +3893,9 @@ static int bnxt_alloc_ctx_mem_blk(__rte_unused struct bnxt *bp, memset(mz->addr, 0, mz->len); mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr) { - PMD_DRV_LOG(WARNING, - "Memzone physical address same as virtual.\n"); - PMD_DRV_LOG(WARNING, - "Using rte_mem_virt2iova()\n"); + PMD_DRV_LOG(DEBUG, + "physical address same as virtual\n"); + PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n"); mz_phys_addr = rte_mem_virt2iova(mz->addr); if (mz_phys_addr == RTE_BAD_IOVA) { PMD_DRV_LOG(ERR, @@ -3929,10 +3928,9 @@ static int bnxt_alloc_ctx_mem_blk(__rte_unused struct bnxt *bp, memset(mz->addr, 0, mz->len); mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr) { - PMD_DRV_LOG(WARNING, + PMD_DRV_LOG(DEBUG, "Memzone physical address same as virtual.\n"); - PMD_DRV_LOG(WARNING, - "Using rte_mem_virt2iova()\n"); + PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n"); for (sz = 0; sz < mem_size; sz += BNXT_PAGE_SIZE) rte_mem_lock_page(((char *)mz->addr) + sz); mz_phys_addr = rte_mem_virt2iova(mz->addr); @@ -4120,9 +4118,9 @@ static int bnxt_alloc_stats_mem(struct bnxt *bp) memset(mz->addr, 0, mz->len); mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr) { - PMD_DRV_LOG(WARNING, + 
PMD_DRV_LOG(DEBUG, "Memzone physical address same as virtual.\n"); - PMD_DRV_LOG(WARNING, + PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n"); mz_phys_addr = rte_mem_virt2iova(mz->addr); if (mz_phys_addr == RTE_BAD_IOVA) { @@ -4158,10 +4156,9 @@ static int bnxt_alloc_stats_mem(struct bnxt *bp) memset(mz->addr, 0, mz->len); mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr) { - PMD_DRV_LOG(WARNING, + PMD_DRV_LOG(DEBUG, "Memzone physical address same as virtual\n"); - PMD_DRV_LOG(WARNING, - "Using rte_mem_virt2iova()\n"); + PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n"); mz_phys_addr = rte_mem_virt2iova(mz->addr); if (mz_phys_addr == RTE_BAD_IOVA) { PMD_DRV_LOG(ERR, diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c index f19865c83..2f57e038a 100644 --- a/drivers/net/bnxt/bnxt_ring.c +++ b/drivers/net/bnxt/bnxt_ring.c @@ -212,10 +212,9 @@ int bnxt_alloc_rings(struct bnxt *bp, uint16_t qidx, mz_phys_addr_base = mz->iova; mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr_base) { - PMD_DRV_LOG(WARNING, - "Memzone physical address same as virtual.\n"); - PMD_DRV_LOG(WARNING, - "Using rte_mem_virt2iova()\n"); + PMD_DRV_LOG(DEBUG, + "Memzone physical address same as virtual.\n"); + PMD_DRV_LOG(DEBUG, "Using rte_mem_virt2iova()\n"); for (sz = 0; sz < total_alloc_len; sz += getpagesize()) rte_mem_lock_page(((char *)mz->addr) + sz); mz_phys_addr_base = rte_mem_virt2iova(mz->addr); diff --git a/drivers/net/bnxt/bnxt_vnic.c b/drivers/net/bnxt/bnxt_vnic.c index 98415633e..9ea99388b 100644 --- a/drivers/net/bnxt/bnxt_vnic.c +++ b/drivers/net/bnxt/bnxt_vnic.c @@ -150,10 +150,9 @@ int bnxt_alloc_vnic_attributes(struct bnxt *bp) } mz_phys_addr = mz->iova; if ((unsigned long)mz->addr == mz_phys_addr) { - PMD_DRV_LOG(WARNING, - "Memzone physical address s
[dpdk-dev] [PATCH 07/13] net/bnxt: map status registers for FW health monitoring
From: Kalesh AP HWRM_ERROR_RECOVERY_QCFG command returns the FW status registers offset for periodic firmware health check monitoring. Map them to GRC window 2. Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 22 - drivers/net/bnxt/bnxt_ethdev.c | 44 ++ drivers/net/bnxt/bnxt_hwrm.c | 4 3 files changed, 69 insertions(+), 1 deletion(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 19bd13a7f..1da09569d 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -354,7 +354,9 @@ struct bnxt_error_recovery_info { #define BNXT_FW_HEARTBEAT_CNT_REG 1 #define BNXT_FW_RECOVERY_CNT_REG 2 #define BNXT_FW_RESET_INPROG_REG 3 - uint32_tstatus_regs[4]; +#define BNXT_FW_STATUS_REG_CNT 4 + uint32_tstatus_regs[BNXT_FW_STATUS_REG_CNT]; + uint32_tmapped_status_regs[BNXT_FW_STATUS_REG_CNT]; uint32_treset_inprogress_reg_mask; #define BNXT_NUM_RESET_REG 16 uint8_t reg_array_cnt; @@ -366,6 +368,22 @@ struct bnxt_error_recovery_info { uint32_tflags; }; +/* address space location of register */ +#define BNXT_FW_STATUS_REG_TYPE_MASK 3 +/* register is located in PCIe config space */ +#define BNXT_FW_STATUS_REG_TYPE_CFG0 +/* register is located in GRC address space */ +#define BNXT_FW_STATUS_REG_TYPE_GRC1 +/* register is located in BAR0 */ +#define BNXT_FW_STATUS_REG_TYPE_BAR0 2 +/* register is located in BAR1 */ +#define BNXT_FW_STATUS_REG_TYPE_BAR1 3 + +#define BNXT_FW_STATUS_REG_TYPE(reg) ((reg) & BNXT_FW_STATUS_REG_TYPE_MASK) +#define BNXT_FW_STATUS_REG_OFF(reg)((reg) & ~BNXT_FW_STATUS_REG_TYPE_MASK) + +#define BNXT_GRCP_WINDOW_2_BASE0x2000 + #define BNXT_HWRM_SHORT_REQ_LENsizeof(struct hwrm_short_input) struct bnxt { void*bar0; @@ -510,6 +528,8 @@ int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete); int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg); int is_bnxt_in_error(struct bnxt *bp); +int bnxt_map_fw_health_status_regs(struct bnxt *bp); + bool 
is_bnxt_supported(struct rte_eth_dev *dev); bool bnxt_stratus_device(struct bnxt *bp); extern const struct rte_flow_ops bnxt_flow_ops; diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 18046c00a..52c460d2c 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3496,6 +3496,49 @@ static const struct eth_dev_ops bnxt_dev_ops = { .timesync_read_tx_timestamp = bnxt_timesync_read_tx_timestamp, }; +int bnxt_map_fw_health_status_regs(struct bnxt *bp) +{ + struct bnxt_error_recovery_info *info = bp->recovery_info; + uint32_t reg_base = 0x; + int i; + + /* Only pre-map the monitoring GRC registers using window 2 */ + for (i = 0; i < BNXT_FW_STATUS_REG_CNT; i++) { + uint32_t reg = info->status_regs[i]; + + if (BNXT_FW_STATUS_REG_TYPE(reg) != BNXT_FW_STATUS_REG_TYPE_GRC) + continue; + + if (reg_base == 0x) + reg_base = reg & 0xf000; + if ((reg & 0xf000) != reg_base) + return -ERANGE; + + /* Use mask 0xffc as the Lower 2 bits indicates +* address space location +*/ + info->mapped_status_regs[i] = BNXT_GRCP_WINDOW_2_BASE + + (reg & 0xffc); + } + + if (reg_base == 0x) + return 0; + + rte_write32(reg_base, (uint8_t *)bp->bar0 + + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4); + + return 0; +} + +static void bnxt_unmap_fw_health_status_regs(struct bnxt *bp) +{ + if (!(bp->flags & BNXT_FLAG_FW_CAP_ERROR_RECOVERY)) + return; + + rte_write32(0, (uint8_t *)bp->bar0 + + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4); +} + static void bnxt_dev_cleanup(struct bnxt *bp) { bnxt_set_hwrm_link_config(bp, false); @@ -4227,6 +4270,7 @@ bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev) bnxt_free_int(bp); bnxt_free_mem(bp, reconfig_dev); bnxt_hwrm_func_buf_unrgtr(bp); + bnxt_unmap_fw_health_status_regs(bp); rc = bnxt_hwrm_func_driver_unregister(bp, 0); bp->flags &= ~BNXT_FLAG_REGISTERED; bnxt_free_ctx_mem(bp); diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index e2c993936..2d9c43c98 100644 --- 
a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -4767,6 +4767,10 @@ int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp) err: HWRM_UNLOCK(); + /* Map the FW status registers */ + if (!rc) + rc = bnxt_map_fw_health
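bnxt_map_fw_health_status_regs() can serve every GRC status register through a single window (window 2) only if all of them share one 4 KB page, which is why it returns -ERANGE otherwise. A minimal sketch of that check with hypothetical names; the 0xffffffff "no page chosen yet" sentinel and the 0xfffff000 page mask are assumptions, since both constants appear truncated in this archive.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define STATUS_REG_CNT     4
#define REG_TYPE_GRC       1u
#define REG_TYPE(reg)      ((reg) & 3u)
#define GRC_WINDOW_2_BASE  0x2000u
/* Assumptions: 4 KB page mask and a "not chosen yet" sentinel. */
#define GRC_BASE_MASK      0xfffff000u
#define NO_BASE            0xffffffffu

/* Compute window-2 offsets for every GRC status register. One window
 * exposes one page, so all GRC registers must share a page. Returns 0
 * or -ERANGE; *win_base gets the page to program (NO_BASE if none). */
static int map_status_regs(const uint32_t regs[STATUS_REG_CNT],
			   uint32_t mapped[STATUS_REG_CNT], uint32_t *win_base)
{
	uint32_t base = NO_BASE;
	int i;

	for (i = 0; i < STATUS_REG_CNT; i++) {
		uint32_t reg = regs[i];

		if (REG_TYPE(reg) != REG_TYPE_GRC)
			continue;
		if (base == NO_BASE)
			base = reg & GRC_BASE_MASK;
		if ((reg & GRC_BASE_MASK) != base)
			return -ERANGE;
		/* 0xffc drops the two address-space bits */
		mapped[i] = GRC_WINDOW_2_BASE + (reg & 0xffcu);
	}
	*win_base = base;
	return 0;
}
```

As in the patch, the window itself is only programmed once every register has been checked, so a mismatch leaves it untouched.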
[dpdk-dev] [PATCH 10/13] net/bnxt: use BIT macro instead of bit fields
From: Kalesh AP use BIT macro instead of bit fields. Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h | 73 ++-- drivers/net/bnxt/bnxt_util.h | 4 ++ 2 files changed, 41 insertions(+), 36 deletions(-) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index a23c4a64c..93aac15b4 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -19,6 +19,7 @@ #include #include "bnxt_cpr.h" +#include "bnxt_util.h" #define BNXT_MAX_MTU 9574 #define VLAN_TAG_SIZE 4 @@ -198,16 +199,16 @@ struct bnxt_ptp_cfg { struct bnxt *bp; #define BNXT_MAX_TX_TS 1 uint16_trxctl; -#define BNXT_PTP_MSG_SYNC (1 << 0) -#define BNXT_PTP_MSG_DELAY_REQ (1 << 1) -#define BNXT_PTP_MSG_PDELAY_REQ(1 << 2) -#define BNXT_PTP_MSG_PDELAY_RESP (1 << 3) -#define BNXT_PTP_MSG_FOLLOW_UP (1 << 8) -#define BNXT_PTP_MSG_DELAY_RESP(1 << 9) -#define BNXT_PTP_MSG_PDELAY_RESP_FOLLOW_UP (1 << 10) -#define BNXT_PTP_MSG_ANNOUNCE (1 << 11) -#define BNXT_PTP_MSG_SIGNALING (1 << 12) -#define BNXT_PTP_MSG_MANAGEMENT(1 << 13) +#define BNXT_PTP_MSG_SYNC BIT(0) +#define BNXT_PTP_MSG_DELAY_REQ BIT(1) +#define BNXT_PTP_MSG_PDELAY_REQBIT(2) +#define BNXT_PTP_MSG_PDELAY_RESP BIT(3) +#define BNXT_PTP_MSG_FOLLOW_UP BIT(8) +#define BNXT_PTP_MSG_DELAY_RESPBIT(9) +#define BNXT_PTP_MSG_PDELAY_RESP_FOLLOW_UP BIT(10) +#define BNXT_PTP_MSG_ANNOUNCE BIT(11) +#define BNXT_PTP_MSG_SIGNALING BIT(12) +#define BNXT_PTP_MSG_MANAGEMENTBIT(13) #define BNXT_PTP_MSG_EVENTS(BNXT_PTP_MSG_SYNC |\ BNXT_PTP_MSG_DELAY_REQ | \ BNXT_PTP_MSG_PDELAY_REQ | \ @@ -363,10 +364,10 @@ struct bnxt_error_recovery_info { uint32_treset_reg[BNXT_NUM_RESET_REG]; uint32_treset_reg_val[BNXT_NUM_RESET_REG]; uint8_t delay_after_reset[BNXT_NUM_RESET_REG]; -#define BNXT_FLAG_ERROR_RECOVERY_HOST (1 << 0) -#define BNXT_FLAG_ERROR_RECOVERY_CO_CPU(1 << 1) -#define BNXT_FLAG_MASTER_FUNC (1 << 2) -#define BNXT_FLAG_RECOVERY_ENABLED (1 << 3) +#define BNXT_FLAG_ERROR_RECOVERY_HOST BIT(0) +#define 
BNXT_FLAG_ERROR_RECOVERY_CO_CPUBIT(1) +#define BNXT_FLAG_MASTER_FUNC BIT(2) +#define BNXT_FLAG_RECOVERY_ENABLED BIT(3) uint32_tflags; uint32_tlast_heart_beat; @@ -399,28 +400,28 @@ struct bnxt { void*doorbell_base; uint32_tflags; -#define BNXT_FLAG_REGISTERED (1 << 0) -#define BNXT_FLAG_VF (1 << 1) -#define BNXT_FLAG_PORT_STATS (1 << 2) -#define BNXT_FLAG_JUMBO(1 << 3) -#define BNXT_FLAG_SHORT_CMD(1 << 4) -#define BNXT_FLAG_UPDATE_HASH (1 << 5) -#define BNXT_FLAG_PTP_SUPPORTED(1 << 6) -#define BNXT_FLAG_MULTI_HOST(1 << 7) -#define BNXT_FLAG_EXT_RX_PORT_STATS(1 << 8) -#define BNXT_FLAG_EXT_TX_PORT_STATS(1 << 9) -#define BNXT_FLAG_KONG_MB_EN (1 << 10) -#define BNXT_FLAG_TRUSTED_VF_EN(1 << 11) -#define BNXT_FLAG_DFLT_VNIC_SET(1 << 12) -#define BNXT_FLAG_THOR_CHIP(1 << 13) -#define BNXT_FLAG_STINGRAY (1 << 14) -#define BNXT_FLAG_FW_RESET (1 << 15) -#define BNXT_FLAG_FATAL_ERROR (1 << 16) -#define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17) -#define BNXT_FLAG_FW_CAP_ERROR_RECOVERY(1 << 18) -#define BNXT_FLAG_EXT_STATS_SUPPORTED (1 << 29) -#define BNXT_FLAG_NEW_RM (1 << 30) -#define BNXT_FLAG_INIT_DONE(1U << 31) +#define BNXT_FLAG_REGISTERED BIT(0) +#define BNXT_FLAG_VF BIT(1) +#define BNXT_FLAG_PORT_STATS BIT(2) +#define BNXT_FLAG_JUMBOBIT(3) +#define BNXT_FLAG_SHORT_CMDBIT(4) +#define BNXT_FLAG_UPDATE_HASH BIT(5) +#define BNXT_FLAG_PTP_SUPPORTEDBIT(6) +#define BNXT_FLAG_MULTI_HOST BIT(7) +#define BNXT_FLAG_EXT_RX_PORT_STATSBIT(8) +#define BNXT_FLAG_EXT_TX_PORT_STATSBIT(9) +#define BNXT_FLAG_KONG_MB_EN BIT(10) +#define BNXT_FLAG_TRUSTED_VF_ENBIT(11) +#define BNXT_FLAG_DFLT_VNIC_SETBIT(12) +#define BNXT_FLAG_THOR_CHIPBIT(13) +#define BNXT_FLAG_STINGRAY BIT(14) +#define BNXT_FLAG_FW_RESET BIT(15) +#define BNXT_FLAG_FATAL_ERROR BIT(16) +#define BNXT_FLAG_FW_CAP_IF_CHANGE BIT(17) +#define BNXT_FLAG_FW_CAP_ERROR_RECOVERYBIT(18) +#define BNXT_FLAG_EXT_STATS_SUPPORTED BIT(19) +#defi
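The bnxt_util.h hunk that adds BIT() is not visible here, so its exact definition is unknown. A common definition, assumed for this sketch, shifts an unsigned long, which also removes the need for the special-cased 1U << 31 (shifting a signed 1 into bit 31 is undefined behavior in C). The context in patch 11/13 also shows that BNXT_FLAG_EXT_STATS_SUPPORTED, BNXT_FLAG_NEW_RM and BNXT_FLAG_INIT_DONE end up at bits 19-21 instead of 29-31. The flag names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition; the real one lives in bnxt_util.h. */
#ifndef BIT
#define BIT(n) (1UL << (n))
#endif

#define FLAG_REGISTERED BIT(0)
#define FLAG_VF         BIT(1)
#define FLAG_INIT_DONE  BIT(31)   /* no signed-overflow special case needed */

/* Flags are still stored in a 32-bit field, as in struct bnxt. */
static uint32_t set_flag(uint32_t flags, unsigned long bit)
{
	return flags | (uint32_t)bit;
}

static uint32_t clear_flag(uint32_t flags, unsigned long bit)
{
	/* ~bit is an unsigned long; the cast keeps only the low 32 bits,
	 * which is all a uint32_t flags field can hold anyway. */
	return flags & (uint32_t)~bit;
}
```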
[dpdk-dev] [PATCH 06/13] net/bnxt: query firmware error recovery capabilities
From: Kalesh AP In Driver initiated error recovery process, driver has to know about the registers offset and values to initiate FW reset. The HWRM command HWRM_ERROR_RECOVERY_QCFG is used to obtain all the registers and values required to initiate FW reset. This command response includes FW heart_beat register, health status register, Error counter register, register offsets and values to do chip reset if firmware crashes and becomes unresponsive. Signed-off-by: Kalesh AP Reviewed-by: Somnath Kotur Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 27 +++ drivers/net/bnxt/bnxt_ethdev.c | 10 drivers/net/bnxt/bnxt_hwrm.c | 89 ++ drivers/net/bnxt/bnxt_hwrm.h | 1 + 4 files changed, 127 insertions(+) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 394a2a941..19bd13a7f 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -343,6 +343,29 @@ struct bnxt_ctx_mem_info { #define US_PER_MS 1000 #define NS_PER_US 1000 +struct bnxt_error_recovery_info { + /* All units in milliseconds */ + uint32_tdriver_polling_freq; + uint32_tmaster_func_wait_period; + uint32_tnormal_func_wait_period; + uint32_tmaster_func_wait_period_after_reset; + uint32_tmax_bailout_time_after_reset; +#define BNXT_FW_STATUS_REG 0 +#define BNXT_FW_HEARTBEAT_CNT_REG 1 +#define BNXT_FW_RECOVERY_CNT_REG 2 +#define BNXT_FW_RESET_INPROG_REG 3 + uint32_tstatus_regs[4]; + uint32_treset_inprogress_reg_mask; +#define BNXT_NUM_RESET_REG 16 + uint8_t reg_array_cnt; + uint32_treset_reg[BNXT_NUM_RESET_REG]; + uint32_treset_reg_val[BNXT_NUM_RESET_REG]; + uint8_t delay_after_reset[BNXT_NUM_RESET_REG]; +#define BNXT_FLAG_ERROR_RECOVERY_HOST (1 << 0) +#define BNXT_FLAG_ERROR_RECOVERY_CO_CPU(1 << 1) + uint32_tflags; +}; + #define BNXT_HWRM_SHORT_REQ_LENsizeof(struct hwrm_short_input) struct bnxt { void*bar0; @@ -371,6 +394,7 @@ struct bnxt { #define BNXT_FLAG_FW_RESET (1 << 15) #define BNXT_FLAG_FATAL_ERROR (1 << 16) #define BNXT_FLAG_FW_CAP_IF_CHANGE (1 << 17) +#define 
BNXT_FLAG_FW_CAP_ERROR_RECOVERY(1 << 18) #define BNXT_FLAG_EXT_STATS_SUPPORTED (1 << 29) #define BNXT_FLAG_NEW_RM (1 << 30) #define BNXT_FLAG_INIT_DONE(1U << 31) @@ -477,6 +501,9 @@ struct bnxt { uint16_tfw_reset_min_msecs; uint16_tfw_reset_max_msecs; + + /* Struct to hold adapter error recovery related info */ + struct bnxt_error_recovery_info *recovery_info; }; int bnxt_link_update_op(struct rte_eth_dev *eth_dev, int wait_to_complete); diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index a0b9e8f9e..18046c00a 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -4071,6 +4071,11 @@ static int bnxt_init_fw(struct bnxt *bp) if (rc) return rc; + /* Get the adapter error recovery support info */ + rc = bnxt_hwrm_error_recovery_qcfg(bp); + if (rc) + bp->flags &= ~BNXT_FLAG_FW_CAP_ERROR_RECOVERY; + if (mtu >= RTE_ETHER_MIN_MTU && mtu <= BNXT_MAX_MTU && mtu != bp->eth_dev->data->mtu) bp->eth_dev->data->mtu = mtu; @@ -4228,6 +4233,11 @@ bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev) if (!reconfig_dev) bnxt_free_hwrm_resources(bp); + if (bp->recovery_info != NULL) { + rte_free(bp->recovery_info); + bp->recovery_info = NULL; + } + return rc; } diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c index 17c7b5e9e..e2c993936 100644 --- a/drivers/net/bnxt/bnxt_hwrm.c +++ b/drivers/net/bnxt/bnxt_hwrm.c @@ -626,6 +626,13 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp) if (flags & HWRM_FUNC_QCAPS_OUTPUT_FLAGS_EXT_STATS_SUPPORTED) bp->flags |= BNXT_FLAG_EXT_STATS_SUPPORTED; + if (flags & HWRM_FUNC_QCAPS_OUTPUT_FLAGS_ERROR_RECOVERY_CAPABLE) { + bp->flags |= BNXT_FLAG_FW_CAP_ERROR_RECOVERY; + PMD_DRV_LOG(DEBUG, "Adapter Error recovery SUPPORTED\n"); + } else { + bp->flags &= ~BNXT_FLAG_FW_CAP_ERROR_RECOVERY; + } + HWRM_UNLOCK(); return rc; @@ -4684,3 +4691,85 @@ int bnxt_hwrm_if_change(struct bnxt *bp, bool state) return rc; } + +int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp) +{ + 
struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr; + struct bnxt_error_recovery_info *info; + struct hwrm_error_recovery_qcfg_input req = {0}; + uint32_t flags = 0; + unsigned int i; + int
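The two recovery flags defined in this patch later drive bnxt_fw_reset_all() (patch 12/13): with BNXT_FLAG_ERROR_RECOVERY_HOST the master function writes every advertised reset register itself, and with BNXT_FLAG_ERROR_RECOVERY_CO_CPU it asks the Kong processor to reset the chip through HWRM_FW_RESET. A simplified sketch of that dispatch, where the register writes and the HWRM command are replaced by hypothetical hooks and the post-reset delays are omitted; the clamp on reg_array_cnt is an extra defensive check that is not in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_RESET_REG               16
#define FLAG_ERROR_RECOVERY_HOST    (1u << 0)
#define FLAG_ERROR_RECOVERY_CO_CPU  (1u << 1)

/* Trimmed-down copy of struct bnxt_error_recovery_info. */
struct recovery_info {
	uint32_t flags;
	uint8_t  reg_array_cnt;
	uint32_t reset_reg[NUM_RESET_REG];
	uint32_t reset_reg_val[NUM_RESET_REG];
};

/* Hypothetical hooks for the register write and HWRM_FW_RESET. */
struct reset_ops {
	void (*write_reg)(uint32_t reg, uint32_t val);
	int  (*send_fw_reset)(void);
};

/* Like bnxt_fw_reset_all(), returns 0 when neither flag is set. */
static int fw_reset_all(const struct recovery_info *info, const struct reset_ops *ops)
{
	if (info->flags & FLAG_ERROR_RECOVERY_HOST) {
		/* Defensive clamp added for the sketch; not in the patch. */
		unsigned int cnt = info->reg_array_cnt < NUM_RESET_REG ?
				   info->reg_array_cnt : NUM_RESET_REG;

		for (unsigned int i = 0; i < cnt; i++)
			ops->write_reg(info->reset_reg[i], info->reset_reg_val[i]);
		return 0;
	}
	if (info->flags & FLAG_ERROR_RECOVERY_CO_CPU)
		return ops->send_fw_reset();
	return 0;
}

/* Example hooks that only record what would have happened. */
static unsigned int writes_done;
static int fw_reset_sent;

static void record_write(uint32_t reg, uint32_t val)
{
	(void)reg;
	(void)val;
	writes_done++;
}

static int record_fw_reset(void)
{
	fw_reset_sent = 1;
	return 0;
}
```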
[dpdk-dev] [PATCH 11/13] net/bnxt: reschedule the health check alarm correctly
From: Kalesh AP When the driver receives the error recovery notify event from fw for the first time, it has to read the heartbeat count register and recovery count register and schedule the fw health check task for periodically monitoring the fw health. FW may send this event at a later time when the state of master function changes. There is no need to schedule the health check task this time. Signed-off-by: Kalesh AP Reviewed-by: Santoshkumar Karanappa Rastapur Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt.h| 1 + drivers/net/bnxt/bnxt_cpr.c| 3 +++ drivers/net/bnxt/bnxt_ethdev.c | 2 ++ 3 files changed, 6 insertions(+) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index 93aac15b4..edaef7897 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -422,6 +422,7 @@ struct bnxt { #define BNXT_FLAG_EXT_STATS_SUPPORTED BIT(19) #define BNXT_FLAG_NEW_RM BIT(20) #define BNXT_FLAG_INIT_DONEBIT(21) +#define BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULEDBIT(22) #define BNXT_PF(bp)(!((bp)->flags & BNXT_FLAG_VF)) #define BNXT_VF(bp)((bp)->flags & BNXT_FLAG_VF) #define BNXT_NPAR(bp) ((bp)->port_partition_type) diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c index a692fbe7c..50f93bd21 100644 --- a/drivers/net/bnxt/bnxt_cpr.c +++ b/drivers/net/bnxt/bnxt_cpr.c @@ -89,6 +89,9 @@ void bnxt_handle_async_event(struct bnxt *bp, bnxt_is_recovery_enabled(bp), bnxt_is_master_func(bp)); + if (bp->flags & BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED) + return; + info->last_heart_beat = bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG); info->last_reset_counter = diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 0317eb888..e7b0b44c4 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3687,6 +3687,7 @@ void bnxt_schedule_fw_health_check(struct bnxt *bp) rte_eal_alarm_set(US_PER_MS * polling_freq, bnxt_check_fw_health, (void *)bp); + bp->flags |= BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED; 
} static void bnxt_cancel_fw_health_check(struct bnxt *bp) @@ -3695,6 +3696,7 @@ static void bnxt_cancel_fw_health_check(struct bnxt *bp) return; rte_eal_alarm_cancel(bnxt_check_fw_health, (void *)bp); + bp->flags &= ~BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED; } static bool bnxt_vf_pciid(uint16_t id) -- 2.20.1 (Apple Git-117)
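This patch makes the error-recovery notification idempotent: the first event snapshots the counters and arms the periodic alarm, later events (for example when the master function changes) return early while BNXT_FLAG_FW_HEALTH_CHECK_SCHEDULED is set, and cancelling the alarm clears the flag. A reduced sketch, where struct dev and the alarms_armed counter are hypothetical stand-ins for struct bnxt and rte_eal_alarm_set():

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_FW_HEALTH_CHECK_SCHEDULED (1u << 22)

struct dev {
	uint32_t flags;
	unsigned int alarms_armed; /* counts stand-in rte_eal_alarm_set() calls */
};

static void schedule_health_check(struct dev *d)
{
	d->alarms_armed++;
	d->flags |= FLAG_FW_HEALTH_CHECK_SCHEDULED;
}

static void cancel_health_check(struct dev *d)
{
	d->flags &= ~FLAG_FW_HEALTH_CHECK_SCHEDULED;
}

/* Error-recovery async event: only arm the check if it is not armed yet. */
static void on_error_recovery_event(struct dev *d)
{
	if (d->flags & FLAG_FW_HEALTH_CHECK_SCHEDULED)
		return;
	/* the driver also snapshots the heartbeat and reset counters here */
	schedule_health_check(d);
}
```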
[dpdk-dev] [PATCH 05/13] net/bnxt: handle fatal event from FW under error conditions
From: Kalesh AP

When firmware hits an unrecoverable error condition, it initiates recovery by sending the async event EVENT_CMPL_EVENT_ID_RESET_NOTIFY, with data1 set to RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL, to all host drivers, and then resets the chip. The recovery procedure follows the same sequence as the one for a hot FW upgrade.

Signed-off-by: Kalesh AP
Reviewed-by: Somnath Kotur
Reviewed-by: Ajit Khaparde
---
 drivers/net/bnxt/bnxt_cpr.c    | 13 +++--
 drivers/net/bnxt/bnxt_cpr.h    |  5 +
 drivers/net/bnxt/bnxt_ethdev.c |  3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c
index cefb5db2a..6e0b1d67e 100644
--- a/drivers/net/bnxt/bnxt_cpr.c
+++ b/drivers/net/bnxt/bnxt_cpr.c
@@ -20,6 +20,7 @@ void bnxt_handle_async_event(struct bnxt *bp,
 	struct hwrm_async_event_cmpl *async_cmp =
 				(struct hwrm_async_event_cmpl *)cmp;
 	uint16_t event_id = rte_le_to_cpu_16(async_cmp->event_id);
+	uint32_t event_data;
 
 	/* TODO: HWRM async events are not defined yet */
 	/* Needs to handle: link events, error events, etc. */
@@ -41,6 +42,7 @@ void bnxt_handle_async_event(struct bnxt *bp,
 		PMD_DRV_LOG(INFO, "Port conn async event\n");
 		break;
 	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+		event_data = rte_le_to_cpu_32(async_cmp->event_data1);
 		/* timestamp_lo/hi values are in units of 100ms */
 		bp->fw_reset_max_msecs = async_cmp->timestamp_hi ?
 			rte_le_to_cpu_16(async_cmp->timestamp_hi) * 100 :
@@ -48,8 +50,15 @@ void bnxt_handle_async_event(struct bnxt *bp,
 		bp->fw_reset_min_msecs = async_cmp->timestamp_lo ?
async_cmp->timestamp_lo * 100 : BNXT_MIN_FW_READY_TIMEOUT; - PMD_DRV_LOG(INFO, - "Firmware non-fatal reset event received\n"); + if ((event_data & EVENT_DATA1_REASON_CODE_MASK) == + EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL) { + PMD_DRV_LOG(INFO, + "Firmware fatal reset event received\n"); + bp->flags |= BNXT_FLAG_FATAL_ERROR; + } else { + PMD_DRV_LOG(INFO, + "Firmware non-fatal reset event received\n"); + } bp->flags |= BNXT_FLAG_FW_RESET; bnxt_dev_reset_and_resume(bp); diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h index 4f86e3f60..4e63fd12f 100644 --- a/drivers/net/bnxt/bnxt_cpr.h +++ b/drivers/net/bnxt/bnxt_cpr.h @@ -108,4 +108,9 @@ void bnxt_handle_fwd_req(struct bnxt *bp, struct cmpl_base *cmp); int bnxt_event_hwrm_resp_handler(struct bnxt *bp, struct cmpl_base *cmp); int bnxt_dev_reset_and_resume(struct bnxt *bp); +#define EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL \ + HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL +#define EVENT_DATA1_REASON_CODE_MASK \ + HWRM_ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK + #endif diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index f7b2ef179..a0b9e8f9e 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3512,6 +3512,9 @@ static void bnxt_dev_recover(void *arg) int timeout = bp->fw_reset_max_msecs; int rc = 0; + /* Clear Error flag so that device re-init should happen */ + bp->flags &= ~BNXT_FLAG_FATAL_ERROR; + do { rc = bnxt_hwrm_ver_get(bp); if (rc == 0) -- 2.20.1 (Apple Git-117)
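The RESET_NOTIFY handler converts the firmware-supplied wait times from units of 100 ms to milliseconds, falls back to driver defaults when a field is zero, and masks the reason code out of event_data1 to decide whether the reset is fatal. The sketch below isolates that decoding as a pure function with hypothetical names; all constant values are placeholders, not the real HWRM or bnxt.h values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for this sketch only; the real ones come from the
 * HWRM headers (..._REASON_CODE_MASK, ..._FW_EXCEPTION_FATAL) and bnxt.h. */
#define REASON_CODE_MASK        0xffu
#define REASON_CODE_FATAL       0x2u
#define DFLT_MIN_FW_READY_MSECS 2000u
#define DFLT_MAX_FW_READY_MSECS 60000u

struct reset_notify {
	uint32_t event_data1;  /* already converted to CPU byte order */
	uint16_t timestamp_lo; /* minimum wait, units of 100 ms; 0 = unset */
	uint16_t timestamp_hi; /* maximum wait, units of 100 ms; 0 = unset */
};

struct reset_plan {
	uint32_t min_msecs;
	uint32_t max_msecs;
	bool fatal;
};

static struct reset_plan decode_reset_notify(const struct reset_notify *ev)
{
	struct reset_plan p;

	p.min_msecs = ev->timestamp_lo ? ev->timestamp_lo * 100u : DFLT_MIN_FW_READY_MSECS;
	p.max_msecs = ev->timestamp_hi ? ev->timestamp_hi * 100u : DFLT_MAX_FW_READY_MSECS;
	p.fatal = (ev->event_data1 & REASON_CODE_MASK) == REASON_CODE_FATAL;
	return p;
}
```

In the driver, a fatal result corresponds to setting BNXT_FLAG_FATAL_ERROR, which bnxt_dev_recover() clears again before re-initialization.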
[dpdk-dev] [PATCH 09/13] net/bnxt: add code for periodic FW health monitoring
From: Kalesh AP Periodically poll the FW heartbeat register and FW recovery counter registers to check the FW health. Polling frequency will be advertised by the FW in HWRM_ERROR_RECOVERY_QCFG response. Schedule the task upon receiving the async event from FW. Signed-off-by: Kalesh AP Reviewed-by: Ajit Khaparde Reviewed-by: Somnath Kotur --- drivers/net/bnxt/bnxt.h| 5 ++ drivers/net/bnxt/bnxt_cpr.c| 7 +++ drivers/net/bnxt/bnxt_ethdev.c | 89 ++ 3 files changed, 101 insertions(+) diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h index f9147a9a8..a23c4a64c 100644 --- a/drivers/net/bnxt/bnxt.h +++ b/drivers/net/bnxt/bnxt.h @@ -368,6 +368,9 @@ struct bnxt_error_recovery_info { #define BNXT_FLAG_MASTER_FUNC (1 << 2) #define BNXT_FLAG_RECOVERY_ENABLED (1 << 3) uint32_tflags; + + uint32_tlast_heart_beat; + uint32_tlast_reset_counter; }; /* address space location of register */ @@ -531,6 +534,8 @@ int bnxt_rcv_msg_from_vf(struct bnxt *bp, uint16_t vf_id, void *msg); int is_bnxt_in_error(struct bnxt *bp); int bnxt_map_fw_health_status_regs(struct bnxt *bp); +uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index); +void bnxt_schedule_fw_health_check(struct bnxt *bp); bool is_bnxt_supported(struct rte_eth_dev *dev); bool bnxt_stratus_device(struct bnxt *bp); diff --git a/drivers/net/bnxt/bnxt_cpr.c b/drivers/net/bnxt/bnxt_cpr.c index 7f5b3314e..a692fbe7c 100644 --- a/drivers/net/bnxt/bnxt_cpr.c +++ b/drivers/net/bnxt/bnxt_cpr.c @@ -88,6 +88,13 @@ void bnxt_handle_async_event(struct bnxt *bp, PMD_DRV_LOG(INFO, "recovery enabled(%d), master function(%d)\n", bnxt_is_recovery_enabled(bp), bnxt_is_master_func(bp)); + + info->last_heart_beat = + bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG); + info->last_reset_counter = + bnxt_read_fw_status_reg(bp, BNXT_FW_RECOVERY_CNT_REG); + + bnxt_schedule_fw_health_check(bp); break; default: PMD_DRV_LOG(INFO, "handle_async_event id = 0x%x\n", event_id); diff --git a/drivers/net/bnxt/bnxt_ethdev.c 
b/drivers/net/bnxt/bnxt_ethdev.c index 52c460d2c..0317eb888 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -169,6 +169,7 @@ static int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu); static int bnxt_dev_uninit(struct rte_eth_dev *eth_dev); static int bnxt_init_resources(struct bnxt *bp, bool reconfig_dev); static int bnxt_uninit_resources(struct bnxt *bp, bool reconfig_dev); +static void bnxt_cancel_fw_health_check(struct bnxt *bp); int is_bnxt_in_error(struct bnxt *bp) { @@ -880,6 +881,8 @@ static void bnxt_dev_stop_op(struct rte_eth_dev *eth_dev) /* disable uio/vfio intr/eventfd mapping */ rte_intr_disable(intr_handle); + bnxt_cancel_fw_health_check(bp); + bp->flags &= ~BNXT_FLAG_INIT_DONE; if (bp->eth_dev->data->dev_started) { /* TBD: STOP HW queues DMA */ @@ -3608,6 +3611,92 @@ int bnxt_dev_reset_and_resume(struct bnxt *bp) return rc; } +uint32_t bnxt_read_fw_status_reg(struct bnxt *bp, uint32_t index) +{ + struct bnxt_error_recovery_info *info = bp->recovery_info; + uint32_t reg = info->status_regs[index]; + uint32_t type, offset, val = 0; + + type = BNXT_FW_STATUS_REG_TYPE(reg); + offset = BNXT_FW_STATUS_REG_OFF(reg); + + switch (type) { + case BNXT_FW_STATUS_REG_TYPE_CFG: + rte_pci_read_config(bp->pdev, &val, sizeof(val), offset); + break; + case BNXT_FW_STATUS_REG_TYPE_GRC: + offset = info->mapped_status_regs[index]; + /* FALLTHROUGH */ + case BNXT_FW_STATUS_REG_TYPE_BAR0: + val = rte_le_to_cpu_32(rte_read32((uint8_t *)bp->bar0 + + offset)); + break; + } + + return val; +} + +/* Driver should poll FW heartbeat, reset_counter with the frequency + * advertised by FW in HWRM_ERROR_RECOVERY_QCFG. + * When the driver detects heartbeat stop or change in reset_counter, + * it has to trigger a reset to recover from the error condition. + * A “master PF” is the function who will have the privilege to + * initiate the chimp reset. 
The master PF will be elected by the + * firmware and will be notified through async message. + */ +static void bnxt_check_fw_health(void *arg) +{ + struct bnxt *bp = arg; + struct bnxt_error_recovery_info *info = bp->recovery_info; + uint32_t val = 0; + + if (!info || !bnxt_is_recovery_enabled(bp) || + is_bnxt_in_error(bp)) + return; + + val = bnxt_read_fw_status_reg(bp, BNXT_FW_HEARTBEAT_CNT_REG); + if (val == info->last_heart_beat) +
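The comment above bnxt_check_fw_health() states the detection rule: the FW is considered unhealthy if the heartbeat counter stops advancing or the recovery (reset) counter changes between polls. The function body is truncated in this archive, so the sketch below models only that rule as a pure function with hypothetical names; refreshing the snapshot on every poll is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct health_snapshot {
	uint32_t last_heart_beat;
	uint32_t last_reset_counter;
};

/* One polling step: returns true if the firmware looks unhealthy.
 * The snapshot is updated so the next poll compares against this one. */
static bool fw_health_poll(struct health_snapshot *s,
			   uint32_t heart_beat, uint32_t reset_counter)
{
	bool stalled = heart_beat == s->last_heart_beat;
	bool was_reset = reset_counter != s->last_reset_counter;

	s->last_heart_beat = heart_beat;
	s->last_reset_counter = reset_counter;
	return stalled || was_reset;
}
```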
Re: [dpdk-dev] [PATCH v2] timer: use rte_mp_msg to get freq from primary process
On 08/21, Jim Harris wrote: >Ideally, get_tsc_freq_arch() is able to provide the >TSC rate using architecture-specific means. When that >is not possible, DPDK reverts to calculating the >TSC rate with a 100ms nanosleep or 1s sleep. The latter >occurs more frequently in VMs which often do not have >access to the data they need from arch-specific means >(CPUID leaf 0x15 or MSR 0xCE on x86). > >In secondary processes, the extra 100ms is especially >noticeable and consumes the bulk of rte_eal_init() >execution time. So in secondary processes, if >we cannot get the TSC rate using get_tsc_freq_arch(), >try to get the TSC rate from the primary process >instead using rte_mp_msg. This is much faster than >100ms. > >Reduces rte_eal_init() execution time in a secondary >process from 165ms to 66ms on my test system. > >Signed-off-by: Jim Harris >Change-Id: I584419ed1c7d6f47841e0a0eb23f34c9f1186d35 This Change-Id line is unnecessary. Thanks, Xiaolong >--- > lib/librte_eal/common/eal_common_timer.c | 62 ++ > 1 file changed, 62 insertions(+) > >diff --git a/lib/librte_eal/common/eal_common_timer.c >b/lib/librte_eal/common/eal_common_timer.c >index 145543de7..ad965455d 100644 >--- a/lib/librte_eal/common/eal_common_timer.c >+++ b/lib/librte_eal/common/eal_common_timer.c >@@ -15,9 +15,17 @@ > #include > #include > #include >+#include >+#include > > #include "eal_private.h" > >+#define EAL_TIMER_MP "eal_timer_mp_sync" >+ >+struct timer_mp_param { >+ uint64_t tsc_hz; >+}; >+ > /* The frequency of the RDTSC timer resolution */ > static uint64_t eal_tsc_resolution_hz; > >@@ -74,12 +82,58 @@ estimate_tsc_freq(void) > return RTE_ALIGN_MUL_NEAR(rte_rdtsc() - start, CYC_PER_10MHZ); > } > >+static uint64_t >+get_tsc_freq_from_primary(void) >+{ >+ struct rte_mp_msg mp_req = {0}; >+ struct rte_mp_reply mp_reply = {0}; >+ struct timer_mp_param *r; >+ struct timespec ts = {.tv_sec = 1, .tv_nsec = 0}; >+ uint64_t tsc_hz; >+ >+ strcpy(mp_req.name, EAL_TIMER_MP); >+ if 
(rte_mp_request_sync(&mp_req, &mp_reply, &ts) || >+ mp_reply.nb_received != 1) { >+ tsc_hz = 0; >+ } else { >+ r = (struct timer_mp_param *)mp_reply.msgs[0].param; >+ tsc_hz = r->tsc_hz; >+ } >+ >+ free(mp_reply.msgs); >+ return tsc_hz; >+} >+ >+static int >+timer_mp_primary(__attribute__((unused)) const struct rte_mp_msg *msg, >+ const void *peer) >+{ >+ struct rte_mp_msg reply = {0}; >+ struct timer_mp_param *r = (struct timer_mp_param *)reply.param; >+ >+ r->tsc_hz = eal_tsc_resolution_hz; >+ strcpy(reply.name, EAL_TIMER_MP); >+ reply.len_param = sizeof(*r); >+ >+ return rte_mp_reply(&reply, peer); >+} >+ > void > set_tsc_freq(void) > { > uint64_t freq; >+ int rc; > > freq = get_tsc_freq_arch(); >+ if (!freq && rte_eal_process_type() != RTE_PROC_PRIMARY) { >+ /* We couldn't get the TSC frequency through arch-specific >+ * means. If this is a secondary process, try to get the >+ * TSC frequency from the primary process - this will >+ * be much faster than get_tsc_freq() or estimate_tsc_freq() >+ * below. >+ */ >+ freq = get_tsc_freq_from_primary(); >+ } > if (!freq) > freq = get_tsc_freq(); > if (!freq) >@@ -87,6 +141,14 @@ set_tsc_freq(void) > > RTE_LOG(DEBUG, EAL, "TSC frequency is ~%" PRIu64 " KHz\n", freq / 1000); > eal_tsc_resolution_hz = freq; >+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) { >+ rc = rte_mp_action_register(EAL_TIMER_MP, timer_mp_primary); >+ if (rc && rte_errno != ENOTSUP) { >+ RTE_LOG(WARNING, EAL, "Could not register mp_action - " >+ "secondary processes will calculate TSC rate " >+ "independently.\n"); >+ } >+ } > } > > void rte_delay_us_callback_register(void (*userfunc)(unsigned int)) >
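With this patch, set_tsc_freq() tries its frequency sources in order: get_tsc_freq_arch(), then (in secondary processes only) get_tsc_freq_from_primary(), then the sleep-based get_tsc_freq(), and finally estimate_tsc_freq(). A sketch of that fallback chain using function pointers, where every name and the stub sources are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each source returns 0 when it cannot determine the frequency. */
typedef uint64_t (*freq_source)(void);

/* Try sources in order; asking the primary is only done by secondaries. */
static uint64_t pick_tsc_freq(bool is_primary,
			      freq_source arch, freq_source from_primary,
			      freq_source measure, freq_source estimate)
{
	uint64_t freq = arch();

	if (!freq && !is_primary)
		freq = from_primary();
	if (!freq)
		freq = measure();
	if (!freq)
		freq = estimate();
	return freq;
}

/* Example stub sources. */
static uint64_t no_freq(void) { return 0; }
static uint64_t primary_freq(void) { return 2100000000ULL; }
static uint64_t slow_measure(void) { return 2099999000ULL; }
```

The ordering puts the cheap IPC query ahead of the 100 ms measurement, which is where the patch's reported init-time saving comes from.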
[dpdk-dev] [PATCH v4 3/6] ticketlock: use new API to reduce contention on aarch64
While using the ticket lock, cores repeatedly poll the lock variable. This polling is replaced by the rte_wait_until_equal API.

Running ticketlock_autotest on ThunderX2, Ampere eMAG80, and Arm N1SDP[1], there were variances between runs, but no notable performance gain or degradation was seen with or without this patch.

[1] https://community.arm.com/developer/tools-software/oss-platforms/w/docs/440/neoverse-n1-sdp

Signed-off-by: Gavin Hu
Reviewed-by: Honnappa Nagarahalli
Tested-by: Phil Yang
Tested-by: Pavan Nikhilesh
---
 lib/librte_eal/common/include/generic/rte_ticketlock.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_ticketlock.h b/lib/librte_eal/common/include/generic/rte_ticketlock.h
index d9bec87..232bbe9 100644
--- a/lib/librte_eal/common/include/generic/rte_ticketlock.h
+++ b/lib/librte_eal/common/include/generic/rte_ticketlock.h
@@ -66,8 +66,7 @@ static inline void
 rte_ticketlock_lock(rte_ticketlock_t *tl)
 {
 	uint16_t me = __atomic_fetch_add(&tl->s.next, 1, __ATOMIC_RELAXED);
-	while (__atomic_load_n(&tl->s.current, __ATOMIC_ACQUIRE) != me)
-		rte_pause();
+	rte_wait_until_equal_acquire_16(&tl->s.current, me);
 }
 
 /**
-- 
2.7.4
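The ticket lock relies on two counters: a core takes a ticket with a relaxed fetch-add on next, then waits until current equals its ticket; unlock, as sketched here, publishes current + 1 with release ordering. A portable sketch using the GCC/Clang __atomic builtins, where the wait helper mirrors the generic (non-WFE) rte_wait_until_equal_acquire_16() and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Spin until *addr == expected; the acquire load orders the critical
 * section after the lock handoff. */
static inline void wait_until_equal_acquire_16(volatile uint16_t *addr,
					       uint16_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		; /* DPDK calls rte_pause() here */
}

struct ticketlock {
	uint16_t current; /* ticket now being served */
	uint16_t next;    /* next ticket to hand out */
};

static void ticketlock_lock(struct ticketlock *tl)
{
	uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);

	wait_until_equal_acquire_16(&tl->current, me);
}

static void ticketlock_unlock(struct ticketlock *tl)
{
	uint16_t i = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);

	__atomic_store_n(&tl->current, i + 1, __ATOMIC_RELEASE);
}
```

Because waiters spin on a single value they can compare for equality, the loop maps directly onto the WFE-based variant on arm64 without changing the lock's semantics.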
[dpdk-dev] [PATCH v4 2/6] eal: add the APIs to wait until equal
The rte_wait_until_equalxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'. Signed-off-by: Gavin Hu Reviewed-by: Ruifeng Wang Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl Reviewed-by: Honnappa Nagarahalli Reviewed-by: Phil Yang Acked-by: Pavan Nikhilesh --- .../common/include/arch/arm/rte_pause_64.h | 30 ++ lib/librte_eal/common/include/generic/rte_pause.h | 26 ++- 2 files changed, 55 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h index 93895d3..dabde17 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h +++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2017 Cavium, Inc + * Copyright(c) 2019 Arm Limited */ #ifndef _RTE_PAUSE_ARM64_H_ @@ -17,6 +18,35 @@ static inline void rte_pause(void) asm volatile("yield" ::: "memory"); } +#ifdef RTE_ARM_USE_WFE +#define __WAIT_UNTIL_EQUAL(name, asm_op, wide, type) \ +static __rte_always_inline void \ +rte_wait_until_equal_##name(volatile type * addr, type expected) \ +{ \ + type tmp; \ + asm volatile( \ + #asm_op " %" #wide "[tmp], %[addr]\n" \ + "cmp%" #wide "[tmp], %" #wide "[expected]\n" \ + "b.eq 2f\n" \ + "sevl\n" \ + "1: wfe\n" \ + #asm_op " %" #wide "[tmp], %[addr]\n" \ + "cmp%" #wide "[tmp], %" #wide "[expected]\n" \ + "bne1b\n" \ + "2:\n" \ + : [tmp] "=&r" (tmp) \ + : [addr] "Q"(*addr), [expected] "r"(expected) \ + : "cc", "memory"); \ +} +/* Wait for *addr to be updated with expected value */ +__WAIT_UNTIL_EQUAL(relaxed_16, ldxrh, w, uint16_t) +__WAIT_UNTIL_EQUAL(acquire_16, ldaxrh, w, uint16_t) +__WAIT_UNTIL_EQUAL(relaxed_32, ldxr, w, uint32_t) +__WAIT_UNTIL_EQUAL(acquire_32, ldaxr, w, uint32_t) +__WAIT_UNTIL_EQUAL(relaxed_64, ldxr, x, uint64_t) +__WAIT_UNTIL_EQUAL(acquire_64, ldaxr, x, uint64_t) +#endif + #ifdef __cplusplus } #endif diff --git 
a/lib/librte_eal/common/include/generic/rte_pause.h b/lib/librte_eal/common/include/generic/rte_pause.h index 52bd4db..4741f8a 100644 --- a/lib/librte_eal/common/include/generic/rte_pause.h +++ b/lib/librte_eal/common/include/generic/rte_pause.h @@ -1,10 +1,10 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2017 Cavium, Inc + * Copyright(c) 2019 Arm Limited */ #ifndef _RTE_PAUSE_H_ #define _RTE_PAUSE_H_ - /** * @file * @@ -12,6 +12,10 @@ * */ +#include +#include +#include + /** * Pause CPU execution for a short while * @@ -20,4 +24,24 @@ */ static inline void rte_pause(void); +#if !defined(RTE_ARM_USE_WFE) +#define __WAIT_UNTIL_EQUAL(op_name, size, type, memorder) \ +__rte_always_inline \ +static void \ +rte_wait_until_equal_##op_name##_##size(volatile type *addr, \ + type expected) \ +{ \ + while (__atomic_load_n(addr, memorder) != expected) \ + rte_pause(); \ +} + +/* Wait for *addr to be updated with expected value */ +__WAIT_UNTIL_EQUAL(relaxed, 16, uint16_t, __ATOMIC_RELAXED) +__WAIT_UNTIL_EQUAL(acquire, 16, uint16_t, __ATOMIC_ACQUIRE) +__WAIT_UNTIL_EQUAL(relaxed, 32, uint32_t, __ATOMIC_RELAXED) +__WAIT_UNTIL_EQUAL(acquire, 32, uint32_t, __ATOMIC_ACQUIRE) +__WAIT_UNTIL_EQUAL(relaxed, 64, uint64_t, __ATOMIC_RELAXED) +__WAIT_UNTIL_EQUAL(acquire, 64, uint64_t, __ATOMIC_ACQUIRE) +#endif /* RTE_ARM_USE_WFE */ + #endif /* _RTE_PAUSE_H_ */ -- 2.7.4
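For readers without an aarch64 machine, the generic (non-WFE) path above boils down to a polling loop around an atomic load. The sketch below is a portable, stand-alone approximation of rte_wait_until_equal_acquire_32(), assuming GCC/Clang `__atomic` builtins and POSIX threads; `sched_yield()` stands in for rte_pause(), and all names in it (wait_until_equal_acquire_32, run_handoff_demo) are hypothetical. It also shows why an acquire variant exists: the waiter must see data that was published before the flag was set.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical portable analogue of the generic rte_wait_until_equal_acquire_32():
 * poll with an acquire load until *addr == expected. */
static inline void
wait_until_equal_acquire_32(volatile uint32_t *addr, uint32_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		sched_yield();   /* stand-in for rte_pause() (assumption) */
}

static volatile uint32_t ready;   /* the polled memory location */
static uint32_t payload;          /* data published before ready is set */

static void *
producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Release store pairs with the waiter's acquire load. */
	__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
	return NULL;
}

/* Start a producer, wait for the flag, and return the payload the waiter saw. */
static uint32_t
run_handoff_demo(void)
{
	pthread_t t;
	uint32_t seen;
	int rc;

	ready = 0;
	payload = 0;
	rc = pthread_create(&t, NULL, producer, NULL);
	assert(rc == 0);
	(void)rc;
	wait_until_equal_acquire_32(&ready, 1);
	seen = payload;   /* acquire/release ordering makes 42 visible here */
	pthread_join(t, NULL);
	return seen;
}
```

With a relaxed wait instead, the flag would still be observed, but nothing would order the later read of `payload` after it; that is the distinction the relaxed_xx and acquire_xx variants encode.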
[dpdk-dev] [PATCH v4 4/6] ring: use wfe to wait for ring tail update on aarch64
Instead of polling for the tail to be updated, use the WFE instruction. Signed-off-by: Gavin Hu Reviewed-by: Ruifeng Wang Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl Reviewed-by: Honnappa Nagarahalli --- lib/librte_ring/rte_ring_c11_mem.h | 4 ++-- lib/librte_ring/rte_ring_generic.h | 3 +-- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h index 0fb73a3..764d8f1 100644 --- a/lib/librte_ring/rte_ring_c11_mem.h +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -2,6 +2,7 @@ * * Copyright (c) 2017,2018 HXT-semitech Corporation. * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org + * Copyright (c) 2019 Arm Limited * All rights reserved. * Derived from FreeBSD's bufring.h * Used as BSD-3 Licensed with permission from Kip Macy. @@ -21,8 +22,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); } diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h index 953cdbb..6828527 100644 --- a/lib/librte_ring/rte_ring_generic.h +++ b/lib/librte_ring/rte_ring_generic.h @@ -23,8 +23,7 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val, * we need to wait for them to complete */ if (!single) - while (unlikely(ht->tail != old_val)) - rte_pause(); + rte_wait_until_equal_relaxed_32(&ht->tail, old_val); ht->tail = new_val; } -- 2.7.4
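To see why update_tail() has to wait at all, consider two producers that reserve consecutive slots: the later one must not advance the tail past a slot the earlier one has not finished writing. The following is a simplified, hypothetical model, not rte_ring itself: it reserves slots with `__atomic_fetch_add` instead of DPDK's compare-and-swap with a free-space check, and it calls `sched_yield()` where the patch calls rte_wait_until_equal_relaxed_32().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_ring_headtail. */
struct headtail {
	uint32_t head;   /* next slot a producer may reserve */
	uint32_t tail;   /* slots before tail are visible to consumers */
};

static struct headtail prod;

/* Mirrors the patched update_tail(): wait for earlier producers, then publish. */
static void
update_tail(struct headtail *ht, uint32_t old_val, uint32_t new_val, int single)
{
	if (!single)
		while (__atomic_load_n(&ht->tail, __ATOMIC_RELAXED) != old_val)
			sched_yield();   /* the patch waits here with WFE when enabled */
	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
}

static void *
producer(void *arg)
{
	uint32_t n = *(const uint32_t *)arg;

	for (uint32_t i = 0; i < n; i++) {
		/* Simplification: no capacity check, plain fetch-and-add reservation. */
		uint32_t old = __atomic_fetch_add(&prod.head, 1, __ATOMIC_RELAXED);
		/* ... the object would be copied into slot `old` here ... */
		update_tail(&prod, old, old + 1, 0);
	}
	return NULL;
}

/* Run two producers and return the final tail (should equal 2 * per_thread). */
static uint32_t
run_two_producers(uint32_t per_thread)
{
	pthread_t a, b;
	int rc;

	prod.head = prod.tail = 0;
	rc = pthread_create(&a, NULL, producer, &per_thread);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, producer, &per_thread);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return __atomic_load_n(&prod.tail, __ATOMIC_ACQUIRE);
}
```

Because each producer only waits for the slot reserved immediately before its own, and that slot's owner never waits on a later one, the chain always makes progress; the WFE variant changes how the waiting core spends that time, not the ordering.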
[dpdk-dev] [PATCH v4 0/6] use WFE for locks and ring on aarch64
DPDK has multiple use cases where a core repeatedly polls a location in memory. This polling results in many cache and memory transactions. The Arm architecture provides the WFE (Wait For Event) instruction, which allows the CPU core to enter a low-power state until it is woken up by an update to the memory location being polled, thus reducing cache and memory transactions. x86 has the PAUSE hint instruction to reduce such overhead. The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling for a memory location to become equal to a given value'. For non-Arm platforms, these APIs are just wrappers around a while loop with rte_pause(), so there is no performance difference. For Arm platforms, use of WFE can be configured using the CONFIG_RTE_ARM_USE_WFE option. It is disabled by default. Currently, use of WFE is supported only on aarch64 platforms. armv7 platforms do support the WFE instruction, but they require explicit wake-up events (SEV) and are less performant. Testing shows that performance varies across platforms, with some showing degradation, so CONFIG_RTE_ARM_USE_WFE should be enabled only after performance benchmarking on the target platform. Power saving should be a bonus, but currently we have no way to characterize it.
V4: - rename the config as CONFIG_RTE_ARM_USE_WFE to indicate it applies to Arm only - introduce a macro for the assembly skeleton to reduce code duplication - add one patch for NXP fslmc to address a compilation error V3: - Convert RFCs to patches V2: - Use inline functions instead of macros - Add load and compare at the beginning of the APIs - Fix some style errors in asm inline V1: - Add the new APIs and use them for ring and locks Gavin Hu (6): bus/fslmc: fix the conflicting dmb function eal: add the APIs to wait until equal ticketlock: use new API to reduce contention on aarch64 ring: use wfe to wait for ring tail update on aarch64 spinlock: use wfe to reduce contention on aarch64 config: add WFE config entry for aarch64 config/arm/meson.build | 1 + config/common_base | 6 + drivers/bus/fslmc/mc/fsl_mc_sys.h | 10 +--- drivers/bus/fslmc/mc/mc_sys.c | 3 +-- .../common/include/arch/arm/rte_pause_64.h | 30 ++ .../common/include/arch/arm/rte_spinlock.h | 25 ++ lib/librte_eal/common/include/generic/rte_pause.h | 26 ++- .../common/include/generic/rte_ticketlock.h | 3 +-- lib/librte_ring/rte_ring_c11_mem.h | 4 +-- lib/librte_ring/rte_ring_generic.h | 3 +-- 10 files changed, 99 insertions(+), 12 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH v4 1/6] bus/fslmc: fix the conflicting dmb function
Two definitions of dmb() conflict with each other; for more details, refer to [1]. include/rte_atomic_64.h:19: error: "dmb" redefined [-Werror] drivers/bus/fslmc/mc/fsl_mc_sys.h:36: note: this is the location of the previous definition #define dmb() {__asm__ __volatile__("" : : : "memory"); } The fix is to include the spinlock.h file before the other header files, which is in line with the "header includes" section of the coding style [2]. The fix also changes dmb() to take an option argument, so that it emits a meaningful barrier on Arm. [1] http://inbox.dpdk.org/users/VI1PR08MB537631AB25F41B8880DCCA988FDF0@i VI1PR08MB5376.eurprd08.prod.outlook.com/T/#u [2] https://doc.dpdk.org/guides/contributing/coding_style.html Fixes: 3af733ba8da8 ("bus/fslmc: introduce MC object functions") Cc: sta...@dpdk.org Signed-off-by: Gavin Hu Reviewed-by: Phil Yang --- drivers/bus/fslmc/mc/fsl_mc_sys.h | 10 +++--- drivers/bus/fslmc/mc/mc_sys.c | 3 +-- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/bus/fslmc/mc/fsl_mc_sys.h b/drivers/bus/fslmc/mc/fsl_mc_sys.h index d0c7b39..fe9dc95 100644 --- a/drivers/bus/fslmc/mc/fsl_mc_sys.h +++ b/drivers/bus/fslmc/mc/fsl_mc_sys.h @@ -33,10 +33,14 @@ struct fsl_mc_io { #include #ifndef dmb -#define dmb() {__asm__ __volatile__("" : : : "memory"); } +#ifdef RTE_ARCH_ARM64 +#define dmb(opt) {asm volatile("dmb " #opt : : : "memory"); } +#else +#define dmb(opt) #endif -#define __iormb() dmb() -#define __iowmb() dmb() +#endif +#define __iormb() dmb(ld) +#define __iowmb() dmb(st) #define __arch_getq(a) (*(volatile uint64_t *)(a)) #define __arch_putq(v, a) (*(volatile uint64_t *)(a) = (v)) #define __arch_putq32(v, a)(*(volatile uint32_t *)(a) = (v)) diff --git a/drivers/bus/fslmc/mc/mc_sys.c b/drivers/bus/fslmc/mc/mc_sys.c index efafdc3..22143ef 100644 --- a/drivers/bus/fslmc/mc/mc_sys.c +++ b/drivers/bus/fslmc/mc/mc_sys.c @@ -4,11 +4,10 @@ * Copyright 2017 NXP * */ +#include #include #include -#include - /** User space framework uses MC Portal in shared mode.
Following change * introduces lock in MC FLIB */ -- 2.7.4
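The patched macro relies on the preprocessor's stringizing operator: `"dmb " #opt` turns dmb(ld) into the instruction text "dmb ld". The sketch below shows the same pattern, assuming GCC/Clang inline-asm syntax and using the compiler's `__aarch64__` macro in place of RTE_ARCH_ARM64. BARRIER_TEXT() is a hypothetical helper that only builds the string, so the expansion can be checked on any architecture. One deliberate difference: the patch defines dmb(opt) as empty on non-ARM64 targets, whereas this sketch keeps a compiler-only barrier there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same stringizing as the patch, but only builds text. */
#define BARRIER_TEXT(opt) ("dmb " #opt)

#if defined(__aarch64__)
/* dmb(ld) emits "dmb ld", dmb(st) emits "dmb st". */
#define dmb(opt) __asm__ __volatile__("dmb " #opt : : : "memory")
#else
/* Sketch's choice: keep a compiler barrier (the patch leaves this empty). */
#define dmb(opt) __asm__ __volatile__("" : : : "memory")
#endif

#define iormb() dmb(ld)   /* order earlier loads before later accesses */
#define iowmb() dmb(st)   /* order earlier stores before later stores */

/* Plain variables standing in for device registers (hypothetical). */
static volatile uint32_t reg_data;
static volatile uint32_t reg_doorbell;

static void post_command(uint32_t cmd)
{
	reg_data = cmd;
	iowmb();            /* data must be written before the doorbell */
	reg_doorbell = 1;
}

static uint32_t read_doorbell(void)
{
	uint32_t v = reg_doorbell;
	iormb();            /* later reads must not be satisfied before this one */
	return v;
}

static int barrier_text_ok(void)
{
	return strcmp(BARRIER_TEXT(ld), "dmb ld") == 0 &&
	       strcmp(BARRIER_TEXT(st), "dmb st") == 0;
}
```

Taking the option as a macro argument lets one definition express both load and store barriers, which is what the __iormb()/__iowmb() split in the patch needs.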
[dpdk-dev] [PATCH v4 5/6] spinlock: use wfe to reduce contention on aarch64
When acquiring a spinlock, cores repeatedly poll the lock variable. This polling is replaced by a WFE-based wait loop. The micro-benchmarks and the testpmd and l3fwd traffic tests were run on ThunderX2, Ampere eMAG80 and Arm N1SDP; everything went well, and no notable performance gain or degradation was measured. Signed-off-by: Gavin Hu Reviewed-by: Ruifeng Wang Reviewed-by: Phil Yang Reviewed-by: Steve Capper Reviewed-by: Ola Liljedahl Reviewed-by: Honnappa Nagarahalli Tested-by: Pavan Nikhilesh --- .../common/include/arch/arm/rte_spinlock.h | 25 ++ 1 file changed, 25 insertions(+) diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h index 1a6916b..7b8328e 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h @@ -16,6 +16,31 @@ extern "C" { #include #include "generic/rte_spinlock.h" +/* armv7a does support WFE, but an explicit wake-up signal using SEV is + * required (must be preceded by DSB to drain the store buffer) and + * this is less performant, so keep armv7a implementation unchanged. + */ +#ifndef RTE_FORCE_INTRINSICS +static inline void +rte_spinlock_lock(rte_spinlock_t *sl) +{ + unsigned int tmp; + /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc. +* faqs/ka16809.html +*/ + asm volatile( + "sevl\n" + "1: wfe\n" + "2: ldaxr %w[tmp], %w[locked]\n" + "cbnz %w[tmp], 1b\n" + "stxr %w[tmp], %w[one], %w[locked]\n" + "cbnz %w[tmp], 2b\n" + : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked) + : [one] "r" (1) + : "cc", "memory"); +} +#endif + static inline int rte_tm_supported(void) { return 0; -- 2.7.4
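The assembly above is a test-and-test-and-set loop: `ldaxr` reads the lock with acquire semantics, `cbnz ... 1b` goes back to `wfe` while the lock is held, and `stxr` attempts to take it, retrying from the load if the exclusive store fails. A portable approximation using GCC/Clang `__atomic` builtins is sketched below; `sched_yield()` marks where WFE would sleep, and demo_spinlock_t, demo_spin_lock() and run_counter_demo() are hypothetical names, not DPDK API.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical mirror of rte_spinlock_t: 0 = free, 1 = held. */
typedef struct {
	volatile int locked;
} demo_spinlock_t;

static void demo_spin_lock(demo_spinlock_t *sl)
{
	int expected = 0;

	/* Try to take the lock; acquire ordering on success, like ldaxr/stxr. */
	while (!__atomic_compare_exchange_n(&sl->locked, &expected, 1, 0,
					    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
		/* Held: wait with plain loads (the WFE loop sleeps here instead). */
		while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED))
			sched_yield();
		expected = 0;   /* the failed CAS overwrote it with the observed value */
	}
}

static void demo_spin_unlock(demo_spinlock_t *sl)
{
	__atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
}

static demo_spinlock_t demo_lock;
static long counter;

static void *worker(void *arg)
{
	long n = *(const long *)arg;

	for (long i = 0; i < n; i++) {
		demo_spin_lock(&demo_lock);
		counter++;                  /* protected by demo_lock */
		demo_spin_unlock(&demo_lock);
	}
	return NULL;
}

/* Two threads increment a shared counter under the lock. */
static long run_counter_demo(long per_thread)
{
	pthread_t a, b;
	int rc;

	counter = 0;
	rc = pthread_create(&a, NULL, worker, &per_thread);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, worker, &per_thread);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return counter;
}
```

Spinning on a plain load before retrying the atomic avoids repeatedly requesting the cache line in exclusive state, which is the same traffic the WFE version aims to cut further.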
[dpdk-dev] [PATCH v4 6/6] config: add WFE config entry for aarch64
Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled by default. It can be enabled selectively based on performance benchmarking. Signed-off-by: Gavin Hu Reviewed-by: Ruifeng Wang Reviewed-by: Steve Capper Reviewed-by: Honnappa Nagarahalli Reviewed-by: Phil Yang Acked-by: Pavan Nikhilesh --- config/arm/meson.build | 1 + config/common_base | 6 ++ 2 files changed, 7 insertions(+) diff --git a/config/arm/meson.build b/config/arm/meson.build index 979018e..18ecd53 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -116,6 +116,7 @@ impl_dpaa = ['NXP DPAA', flags_dpaa, machine_args_generic] impl_dpaa2 = ['NXP DPAA2', flags_dpaa2, machine_args_generic] dpdk_conf.set('RTE_FORCE_INTRINSICS', 1) +dpdk_conf.set('RTE_ARM_USE_WFE', 0) if not dpdk_conf.get('RTE_ARCH_64') dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64) diff --git a/config/common_base b/config/common_base index 8ef75c2..d4cf974 100644 --- a/config/common_base +++ b/config/common_base @@ -570,6 +570,12 @@ CONFIG_RTE_CRYPTO_MAX_DEVS=64 CONFIG_RTE_LIBRTE_PMD_ARMV8_CRYPTO=n CONFIG_RTE_LIBRTE_PMD_ARMV8_CRYPTO_DEBUG=n +# Use WFE instructions to implement the rte_wait_until_equal_xxx APIs; +# calling these APIs puts the cores in a low power state while waiting +# for the memory address to become equal to the expected value. +# This is supported only by aarch64. +CONFIG_RTE_ARM_USE_WFE=n + # # Compile NXP CAAM JR crypto Driver # -- 2.7.4
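One detail worth checking when applying this config: the headers in this series test the option with `#ifdef RTE_ARM_USE_WFE` and `#if !defined(RTE_ARM_USE_WFE)`, while the meson hunk sets it to the integer 0. If the generated build-config header ends up containing `#define RTE_ARM_USE_WFE 0`, which is how meson's configuration data normally renders an integer, then `#ifdef` treats the option as enabled. The self-contained snippet below simply hard-codes that define as an assumption and shows how `#ifdef` and `#if` disagree.

```c
#include <assert.h>

/* Assumption: the generated build-config header contains this line. */
#define RTE_ARM_USE_WFE 0

/* Same guard style as rte_pause_64.h: tests definedness only. */
static int wfe_selected_by_ifdef(void)
{
#ifdef RTE_ARM_USE_WFE
	return 1;   /* taken: the macro is defined, even though its value is 0 */
#else
	return 0;
#endif
}

/* Value-based guard: a value of 0 disables the option. */
static int wfe_selected_by_if(void)
{
#if RTE_ARM_USE_WFE
	return 1;
#else
	return 0;   /* taken: the macro's value is 0 */
#endif
}
```

Meson documents that a boolean `false` is rendered as `#undef`, so either setting the option to `false` or testing its value with `#if` would make the two forms agree. The same reasoning applies to the `#ifndef RTE_FORCE_INTRINSICS` guard in the spinlock patch, since the meson hunk above sets that macro to 1.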
[dpdk-dev] [PATCH] net/af_packet: fix for stale sockets
The af_packet driver leaves a stale socket after the device is removed. Ring buffers are memory-mapped when the device is added using rte_dev_probe(), but there is no corresponding munmap() call when the device is removed or closed. This commit fixes the issue by calling munmap() from rte_pmd_af_packet_remove(). Bugzilla ID: 339 Cc: sta...@dpdk.org Signed-off-by: Abhishek Sachan Reviewed-by: John W. Linville --- drivers/net/af_packet/rte_eth_af_packet.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c index 82bf2cd..6df09f2 100644 --- a/drivers/net/af_packet/rte_eth_af_packet.c +++ b/drivers/net/af_packet/rte_eth_af_packet.c @@ -972,6 +972,7 @@ rte_pmd_af_packet_remove(struct rte_vdev_device *dev) { struct rte_eth_dev *eth_dev = NULL; struct pmd_internals *internals; + struct tpacket_req *req; unsigned q; PMD_LOG(INFO, "Closing AF_PACKET ethdev on numa socket %u", @@ -992,7 +993,10 @@ rte_pmd_af_packet_remove(struct rte_vdev_device *dev) return rte_eth_dev_release_port(eth_dev); internals = eth_dev->data->dev_private; + req = &internals->req; for (q = 0; q < internals->nb_queues; q++) { + munmap(internals->rx_queue[q].map, + 2 * req->tp_block_size * req->tp_block_nr); rte_free(internals->rx_queue[q].rd); rte_free(internals->tx_queue[q].rd); } -- 2.7.4
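The fix pairs the mmap() done at probe time with a munmap() of the same length at removal. The sketch below illustrates that pairing under stated assumptions: struct ring_req is a hypothetical mirror of the two struct tpacket_req fields used here, and an anonymous mapping stands in for the mapping of the AF_PACKET socket so the example runs without raw-socket privileges. The factor of 2 matches the patch, presumably because the RX and TX rings share one mapping.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical mirror of the struct tpacket_req fields used by the fix. */
struct ring_req {
	unsigned int tp_block_size;
	unsigned int tp_block_nr;
};

/* Length of the combined mapping: two rings of tp_block_nr blocks each. */
static size_t ring_map_len(const struct ring_req *req)
{
	return 2 * (size_t)req->tp_block_size * req->tp_block_nr;
}

/* Probe-time stand-in: anonymous memory instead of mmap() on the socket fd. */
static void *map_rings(const struct ring_req *req)
{
	void *p = mmap(NULL, ring_map_len(req), PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Remove-time counterpart: unmap with exactly the length that was mapped. */
static int unmap_rings(void *map, const struct ring_req *req)
{
	return munmap(map, ring_map_len(req));
}
```

Computing the length in `size_t` is a small design choice in this sketch: the expression in the patch is evaluated in `unsigned int`, which is fine for typical ring sizes but would wrap for very large ones.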
[dpdk-dev] [RFC PATCH 1/3] doc/rcu: add RCU integration design details
From: Honnappa Nagarahalli Add a section describing a design to integrate the QSBR RCU library with other libraries in DPDK. Signed-off-by: Honnappa Nagarahalli Reviewed-by: Gavin Hu Reviewed-by: Ruifeng Wang --- doc/guides/prog_guide/rcu_lib.rst | 51 +++ 1 file changed, 51 insertions(+) diff --git a/doc/guides/prog_guide/rcu_lib.rst b/doc/guides/prog_guide/rcu_lib.rst index 8fe5b1f73..2869441ca 100644 --- a/doc/guides/prog_guide/rcu_lib.rst +++ b/doc/guides/prog_guide/rcu_lib.rst @@ -186,3 +186,54 @@ However, when ``CONFIG_RTE_LIBRTE_RCU_DEBUG`` is enabled, these APIs aid in debugging issues. One can mark the access to shared data structures on the reader side using these APIs. The ``rte_rcu_qsbr_quiescent()`` will check if all the locks are unlocked. + +Integrating QSBR RCU with other libraries +- + +Lock-free algorithms place an additional burden on the application to reclaim +memory. Integrating memory reclamation mechanisms into the libraries helps +remove some of that burden. Although the QSBR method offers flexibility to +achieve performance, it presents challenges when integrating with libraries. + +The memory reclamation process using QSBR can be split into 4 parts: + +#. Initialization +#. Quiescent State Reporting +#. Reclaiming Resources +#. Shutdown + +The design proposed here requires the application to handle 'Initialization' +and 'Quiescent State Reporting'. So, + +* the application has to create the RCU variable and register the reader threads to report their quiescent state. +* the application has to register the same RCU variable with the library. +* reader threads in the application have to report the quiescent state. This allows the application to control the length of the critical section, that is, how frequently it reports the quiescent state. + +The library will handle the 'Reclaiming Resources' part of the process. The +libraries will make use of the writer thread context to execute the memory +reclamation algorithm.
So, + +* the library should provide an API to register an RCU variable that it will use. +* the library should trigger the readers to report their quiescent state status upon deleting resources, by calling ``rte_rcu_qsbr_start``. + +* the library should store the token and the deleted resources, so that it can free them after the readers have reported their quiescent state. Since the readers report the quiescent state status in the order of deletion, the library must store the tokens/resources in the order in which the resources were deleted. A FIFO data structure would achieve the desired result. The length of the FIFO would depend on the rate of deletion and the rate at which the readers report their quiescent state. In the worst case, the length of the FIFO would equal the maximum number of resources the data structure supports; in most cases, however, it will be much smaller. The library should not take the length of the FIFO as an input from the application. Instead, it should implement a data structure that can grow and shrink dynamically. The overhead such a data structure introduces on delete operations should be considered as well. + +* the library should query the quiescent state and free the resources. It should use the non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. This allows the application to do useful work while the readers report their quiescent state. If there are tokens/resources already present in the FIFO, the delete API should peek at the head of the FIFO and check the quiescent state status. If the status is success, the token/resource should be dequeued and the resource freed. This process can be repeated until the quiescent state check for a token fails, which indicates that the checks for all subsequent tokens will also fail. The same process can be used when adding new entries to the data structure if the library runs out of resources.
+ +The 'Shutdown' process needs to be shared between the application and the +library. + +* the library should check the quiescent state status of all the tokens that may be present in the FIFO and free the resources. It should use the non-blocking ``rte_rcu_qsbr_check`` API to query the quiescent state. If any of the tokens do not pass the quiescent state check, the library should print an error and stop the memory reclamation process. + +* the application should make sure that the reader threads are no longer using the shared data structure, and unregister the reader threads from the QSBR variable, before calling the library's shutdown function. + +Integrating resource reclamation into the libraries removes the burden from +the application and makes it easier to use lock-free algorithms. + +This design has several advantages over currently known methods. + +#. Application does not need a dedicated thread to reclaim resources. Memory + reclamation happ
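The reclamation steps described above (record a token at delete time, keep tokens in deletion order, and free from the head only while the non-blocking check succeeds) can be modelled in a few lines. In this hypothetical sketch, qs_start() and qs_check() are simulated stand-ins for rte_rcu_qsbr_start() and rte_rcu_qsbr_check(), driven by two counters rather than real reader threads, and the FIFO has a fixed capacity for brevity, whereas the design text asks for a FIFO that can grow and shrink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated QSBR state (hypothetical): tokens increase monotonically and a
 * token is safe once readers have reported quiescence after it was issued. */
static uint64_t token_issued;   /* last token handed out */
static uint64_t token_done;     /* readers have passed every token <= this */

static uint64_t qs_start(void) { return ++token_issued; }
static bool qs_check(uint64_t t) { return t <= token_done; }
static void readers_report_quiescent(void) { token_done = token_issued; }

#define FIFO_CAP 16u   /* power of two, so free-running indices wrap cleanly */

struct defer_item {
	uint64_t token;      /* token taken when the resource was deleted */
	uint32_t resource;   /* e.g. a tbl8 group index */
};

static struct defer_item fifo[FIFO_CAP];
static uint32_t fifo_head, fifo_tail;   /* free-running indices */

/* Delete path: remember the resource together with a fresh token. */
static bool defer_free(uint32_t resource)
{
	if (fifo_tail - fifo_head == FIFO_CAP)
		return false;   /* full: the caller should reclaim first */
	fifo[fifo_tail % FIFO_CAP] = (struct defer_item){ qs_start(), resource };
	fifo_tail++;
	return true;
}

/* Free from the head while tokens pass; tokens are in deletion order, so
 * the first failure means every later token would fail as well. */
static uint32_t reclaim(void (*free_fn)(uint32_t))
{
	uint32_t n = 0;

	while (fifo_head != fifo_tail &&
	       qs_check(fifo[fifo_head % FIFO_CAP].token)) {
		free_fn(fifo[fifo_head % FIFO_CAP].resource);
		fifo_head++;
		n++;
	}
	return n;
}

static uint32_t last_freed;
static void record_free(uint32_t r) { last_freed = r; }
```

In a real library the check would be rte_rcu_qsbr_check() on the registered QSBR variable, with `wait` set to false so that the delete path never blocks.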
[dpdk-dev] [RFC PATCH 3/3] lib/lpm: integrate RCU QSBR
Currently, the tbl8 group is freed even though the readers might be using the tbl8 group entries. The freed tbl8 group can be reallocated quickly. This results in incorrect lookup results. RCU QSBR process is integrated for safe tbl8 group reclaim. Refer to RCU documentation to understand various aspects of integrating RCU library into other libraries. Signed-off-by: Ruifeng Wang Reviewed-by: Honnappa Nagarahalli Reviewed-by: Gavin Hu --- lib/librte_lpm/Makefile| 3 +- lib/librte_lpm/meson.build | 2 + lib/librte_lpm/rte_lpm.c | 218 +++-- lib/librte_lpm/rte_lpm.h | 22 +++ lib/librte_lpm/rte_lpm_version.map | 6 + lib/meson.build| 3 +- 6 files changed, 239 insertions(+), 15 deletions(-) diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index a7946a1c5..ca9e16312 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -6,9 +6,10 @@ include $(RTE_SDK)/mk/rte.vars.mk # library name LIB = librte_lpm.a +CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -LDLIBS += -lrte_eal -lrte_hash +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu EXPORT_MAP := rte_lpm_version.map diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build index a5176d8ae..19a35107f 100644 --- a/lib/librte_lpm/meson.build +++ b/lib/librte_lpm/meson.build @@ -2,9 +2,11 @@ # Copyright(c) 2017 Intel Corporation version = 2 +allow_experimental_apis = true sources = files('rte_lpm.c', 'rte_lpm6.c') headers = files('rte_lpm.h', 'rte_lpm6.h') # since header files have different names, we can install all vector headers # without worrying about which architecture we actually need headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h') deps += ['hash'] +deps += ['rcu'] diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index 3a929a1b1..1efdef22d 100644 --- a/lib/librte_lpm/rte_lpm.c +++ b/lib/librte_lpm/rte_lpm.c @@ -1,5 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation + * 
Copyright(c) 2019 Arm Limited */ #include @@ -22,6 +23,7 @@ #include #include #include +#include #include "rte_lpm.h" @@ -39,6 +41,11 @@ enum valid_flag { VALID }; +struct __rte_lpm_qs_item { + uint64_t token; /**< QSBR token.*/ + uint32_t index; /**< tbl8 group index.*/ +}; + /* Macro to enable/disable run-time checks. */ #if defined(RTE_LIBRTE_LPM_DEBUG) #include @@ -381,6 +388,8 @@ rte_lpm_free_v1604(struct rte_lpm *lpm) rte_mcfg_tailq_write_unlock(); + if (lpm->qsv) + rte_ring_free(lpm->qs_fifo); rte_free(lpm->tbl8); rte_free(lpm->rules_tbl); rte_free(lpm); @@ -390,6 +399,145 @@ BIND_DEFAULT_SYMBOL(rte_lpm_free, _v1604, 16.04); MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm), rte_lpm_free_v1604); +/* Add an item into FIFO. + * return: 0 - success + */ +static int +__rte_lpm_rcu_qsbr_fifo_push(struct rte_ring *fifo, + struct __rte_lpm_qs_item *item) +{ + if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->token) != 0) { + rte_errno = ENOSPC; + return 1; + } + if (rte_ring_sp_enqueue(fifo, (void *)(uintptr_t)item->index) != 0) { + void *obj; + /* token needs to be dequeued when index enqueue fails */ + rte_ring_sc_dequeue(fifo, &obj); + rte_errno = ENOSPC; + return 1; + } + + return 0; +} + +/* Remove item from FIFO. + * Used when data observed by rte_ring_peek. + */ +static void +__rte_lpm_rcu_qsbr_fifo_pop(struct rte_ring *fifo, + struct __rte_lpm_qs_item *item) +{ + void *obj_token = NULL; + void *obj_index = NULL; + + (void)rte_ring_sc_dequeue(fifo, &obj_token); + (void)rte_ring_sc_dequeue(fifo, &obj_index); + + if (item) { + item->token = (uint64_t)((uintptr_t)obj_token); + item->index = (uint32_t)((uintptr_t)obj_index); + } +} + +/* Max number of tbl8 groups to reclaim at one time. */ +#define RCU_QSBR_RECLAIM_SIZE 8 + +/* When RCU QSBR FIFO usage is above 1/(2^RCU_QSBR_RECLAIM_LEVEL), + * reclaim will be triggered by tbl8_free. + */ +#define RCU_QSBR_RECLAIM_LEVEL 3 + +/* Reclaim some tbl8 groups based on quiescent state check. 
+ * RCU_QSBR_RECLAIM_SIZE groups will be reclaimed at max. + * return: 0 - success, 1 - no group reclaimed. + */ +static uint32_t +__rte_lpm_rcu_qsbr_reclaim_chunk(struct rte_lpm *lpm, uint32_t *index) +{ + struct __rte_lpm_qs_item qs_item; + struct rte_lpm_tbl_entry *tbl8_entry = NULL; + void *obj_token; + uint32_t cnt = 0; + + /* Check reader threads quiescent state and +* reclaim as many tbl8 groups as possible. +*/ + while ((cnt < RCU_QSBR_RECLAIM_SIZE) && + (rte_ring_peek(lpm->qs_fifo, &obj_
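The two tuning macros above translate into simple arithmetic: with RCU_QSBR_RECLAIM_LEVEL = 3, reclaim is triggered once the FIFO is more than 1/2^3 = 1/8 full, and each pass frees at most RCU_QSBR_RECLAIM_SIZE = 8 groups. The helpers below are hypothetical and only restate that arithmetic; how the patch actually measures FIFO usage (ring slots or (token, index) pairs, since each item occupies two ring entries) is not visible in the truncated hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch. */
#define RCU_QSBR_RECLAIM_SIZE  8
#define RCU_QSBR_RECLAIM_LEVEL 3

/* Hypothetical helper: trigger reclaim once usage exceeds capacity / 2^LEVEL. */
static bool reclaim_triggered(uint32_t used, uint32_t capacity)
{
	return used > (capacity >> RCU_QSBR_RECLAIM_LEVEL);
}

/* Hypothetical helper: one pass frees at most RCU_QSBR_RECLAIM_SIZE groups. */
static uint32_t reclaim_budget(uint32_t reclaimable)
{
	return reclaimable < RCU_QSBR_RECLAIM_SIZE ? reclaimable
						   : RCU_QSBR_RECLAIM_SIZE;
}
```

For a FIFO with capacity 1024, the threshold is 1024 >> 3 = 128 entries. Separately, __rte_lpm_rcu_qsbr_fifo_push() stores the 64-bit token through `(void *)(uintptr_t)`, which keeps all 64 bits only where pointers are 64 bits wide; on a 32-bit target the upper half would be dropped.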
[dpdk-dev] [RFC PATCH 2/3] lib/ring: add peek API
The peek API allows fetching the next available object in the ring without dequeuing it. This helps in scenarios where dequeuing of objects depends on their value. Signed-off-by: Dharmik Thakkar Signed-off-by: Ruifeng Wang Reviewed-by: Honnappa Nagarahalli Reviewed-by: Gavin Hu --- lib/librte_ring/rte_ring.h | 30 ++ 1 file changed, 30 insertions(+) diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index 2a9f768a1..d3d0d5e18 100644 --- a/lib/librte_ring/rte_ring.h +++ b/lib/librte_ring/rte_ring.h @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, r->cons.single, available); } +/** + * Peek one object from a ring. + * + * The peek API allows fetching the next available object in the ring + * without dequeuing it. This API is not multi-thread safe with respect + * to other consumer threads. + * + * @param r + * A pointer to the ring structure. + * @param obj_p + * A pointer to a void * pointer (object) that will be filled. + * @return + * - 0: Success, object available + * - -ENOENT: Not enough entries in the ring. + */ +__rte_experimental +static __rte_always_inline int +rte_ring_peek(struct rte_ring *r, void **obj_p) +{ + uint32_t prod_tail = r->prod.tail; + uint32_t cons_head = r->cons.head; + uint32_t count = (prod_tail - cons_head) & r->mask; + unsigned int n = 1; + if (count) { + DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *); + return 0; + } + return -ENOENT; +} + #ifdef __cplusplus } #endif -- 2.17.1
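The peek logic can be exercised outside DPDK with a toy ring that uses the same free-running head/tail counters and power-of-two mask. The sketch below is hypothetical: it is not rte_ring, it has no memory barriers, and it is meant for a single thread. It computes the entry count as the unmasked difference `prod_tail - cons_head`, which relies on unsigned wrap-around. That gives the same result as the patch's `& r->mask` as long as the ring never holds more than `mask` entries, as with a default rte_ring whose usable capacity is size - 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MINI_RING_SIZE 8u                    /* must be a power of two */
#define MINI_RING_MASK (MINI_RING_SIZE - 1)

/* Hypothetical toy ring with free-running 32-bit indices, single thread only. */
struct mini_ring {
	uint32_t prod_tail;                  /* advanced by enqueue */
	uint32_t cons_head;                  /* advanced by dequeue, never by peek */
	void *slots[MINI_RING_SIZE];
};

static int mini_enqueue(struct mini_ring *r, void *obj)
{
	/* Usable capacity is size - 1, as for a default rte_ring. */
	if (r->prod_tail - r->cons_head == MINI_RING_MASK)
		return -ENOBUFS;
	r->slots[r->prod_tail & MINI_RING_MASK] = obj;
	r->prod_tail++;
	return 0;
}

/* Return the next object without consuming it. */
static int mini_peek(const struct mini_ring *r, void **obj_p)
{
	uint32_t count = r->prod_tail - r->cons_head;   /* correct across wrap-around */

	if (count == 0)
		return -ENOENT;
	*obj_p = r->slots[r->cons_head & MINI_RING_MASK];
	return 0;
}

/* Dequeue is peek plus advancing the consumer head. */
static int mini_dequeue(struct mini_ring *r, void **obj_p)
{
	int rc = mini_peek(r, obj_p);

	if (rc == 0)
		r->cons_head++;
	return rc;
}
```

Like rte_ring_peek(), mini_peek() is only meaningful when a single consumer owns the ring; a second consumer could dequeue the peeked object between the peek and the decision based on its value.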
[dpdk-dev] [RFC PATCH 0/3] RCU integration with LPM library
This patchset integrates RCU QSBR support with the LPM library. A document is added describing a suggested design for integrating the RCU library with other libraries in DPDK. As an example, the LPM library adds the integration: RCU is used to safely free tbl8 groups so that they can be recycled. A tbl8 group will not be reclaimed or reused until readers have finished referencing it. A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to register an RCU variable that the LPM library will use. A new API, rte_ring_peek, is introduced to help manage the reclamation FIFO queue. Honnappa Nagarahalli (1): doc/rcu: add RCU integration design details Ruifeng Wang (2): lib/ring: add peek API lib/lpm: integrate RCU QSBR doc/guides/prog_guide/rcu_lib.rst | 51 +++ lib/librte_lpm/Makefile | 3 +- lib/librte_lpm/meson.build | 2 + lib/librte_lpm/rte_lpm.c | 218 +++-- lib/librte_lpm/rte_lpm.h | 22 +++ lib/librte_lpm/rte_lpm_version.map | 6 + lib/librte_ring/rte_ring.h | 30 lib/meson.build | 3 +- 8 files changed, 320 insertions(+), 15 deletions(-) -- 2.17.1