Re: [dpdk-dev] Compilation of MLX5 driver
Hi,
It has following files:

arch.h ib.h kern-abi.h mlx4dv.h mlx5dv.h opcode.h sa.h sa-kern-abi.h verbs.h

I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Regards,
Nitin

-Original Message-
From: Shahaf Shuler [mailto:shah...@mellanox.com]
Sent: Thursday, May 31, 2018 10:51 AM
To: Nitin Katiyar; Nélio Laranjeiro
Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] Compilation of MLX5 driver

Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
>
> Hi,
> I was compiling 17.05.02.
> Regards,
> Nitin
>
> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Wednesday, May 30, 2018 6:42 PM
> To: Nitin Katiyar
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Compilation of MLX5 driver
>
> Hi,
>
> On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > Hi,
> > I am trying to compile MLX5 PMD driver by setting
> > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation
> > error.
> >
> > fatal error: infiniband/mlx5_hw.h: No such file or directory

Can you list the files you have under /usr/include/infiniband ?

> >
> > I have installed MLNX_OFED_LINUX-4.2-1.2.0 on my Ubuntu 14.04
> > machine but still hitting the same error. Am I missing some other
> > package?
>
> Which version of DPDK are you using (it is important to help)?
>
> Regards,
>
> --
> Nélio Laranjeiro
> 6WIND
[dpdk-dev] [PATCH 1/2] log: remove useless intermediate buffer
Rather than copy the log message, we can use a precision in the format
string given to syslog.

Fixes: af75078fece3 ("first public release")

Signed-off-by: David Marchand
---
 lib/librte_eal/linuxapp/eal/eal_log.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_log.c b/lib/librte_eal/linuxapp/eal/eal_log.c
index ff14588..9d02ddd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_log.c
+++ b/lib/librte_eal/linuxapp/eal/eal_log.c
@@ -25,25 +25,14 @@
 static ssize_t
 console_log_write(__attribute__((unused)) void *c, const char *buf, size_t size)
 {
-	char copybuf[BUFSIZ + 1];
 	ssize_t ret;
-	uint32_t loglevel;
 
 	/* write on stdout */
 	ret = fwrite(buf, 1, size, stdout);
 	fflush(stdout);
 
-	/* truncate message if too big (should not happen) */
-	if (size > BUFSIZ)
-		size = BUFSIZ;
-
 	/* Syslog error levels are from 0 to 7, so subtract 1 to convert */
-	loglevel = rte_log_cur_msg_loglevel() - 1;
-	memcpy(copybuf, buf, size);
-	copybuf[size] = '\0';
-
-	/* write on syslog too */
-	syslog(loglevel, "%s", copybuf);
+	syslog(rte_log_cur_msg_loglevel() - 1, "%.*s", (int)size, buf);
 
 	return ret;
 }
--
2.7.4
[dpdk-dev] [PATCH 2/2] cmdline: remove useless intermediate buffer
Rather than copy the string, we can use a precision in the format string
given to printf.

Signed-off-by: David Marchand
---
 lib/librte_cmdline/cmdline_parse.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c b/lib/librte_cmdline/cmdline_parse.c
index 961f9be..9666e90 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -208,9 +208,6 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 	int err = CMDLINE_PARSE_NOMATCH;
 	int tok;
 	cmdline_parse_ctx_t *ctx;
-#ifdef RTE_LIBRTE_CMDLINE_DEBUG
-	char debug_buf[BUFSIZ];
-#endif
 	char *result_buf = result.buf;
 
 	if (!cl || !buf)
@@ -250,10 +247,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
 		return linelen;
 	}
 
-#ifdef RTE_LIBRTE_CMDLINE_DEBUG
-	strlcpy(debug_buf, buf, (linelen > 64 ? 64 : linelen));
-	debug_printf("Parse line : len=%d, <%s>\n", linelen, debug_buf);
-#endif
+	debug_printf("Parse line : len=%d, <%.*s>\n",
+		     linelen, linelen > 64 ? 64 : linelen, buf);
 
 	/* parse it !! */
 	inst = ctx[inst_num];
--
2.7.4
Re: [dpdk-dev] Compilation of MLX5 driver
On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> Hi,
> It has following files:
>
> arch.h ib.h kern-abi.h mlx4dv.h mlx5dv.h opcode.h sa.h
> sa-kern-abi.h verbs.h
>
> I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Did you install Mellanox OFED with the --dpdk --upstream-libs arguments
for the installation script?

If so, you should not pass them for this version; those options are for
DPDK v17.11 and higher.

Regards,

> Regards,
> Nitin
>
> -Original Message-
> From: Shahaf Shuler [mailto:shah...@mellanox.com]
> Sent: Thursday, May 31, 2018 10:51 AM
> To: Nitin Katiyar; Nélio Laranjeiro
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] Compilation of MLX5 driver
>
> Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> >
> > Hi,
> > I was compiling 17.05.02.
> > Regards,
> > Nitin
> >
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Wednesday, May 30, 2018 6:42 PM
> > To: Nitin Katiyar
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> >
> > Hi,
> >
> > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > Hi,
> > > I am trying to compile MLX5 PMD driver by setting
> > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation
> > > error.
> > >
> > > fatal error: infiniband/mlx5_hw.h: No such file or directory
>
> Can you list the files you have under /usr/include/infiniband ?
>
> > >
> > > I have installed MLNX_OFED_LINUX-4.2-1.2.0 on my Ubuntu 14.04
> > > machine but still hitting the same error. Am I missing some other
> > > package?
> >
> > Which version of DPDK are you using (it is important to help)?
> >
> > Regards,
> >
> > --
> > Nélio Laranjeiro
> > 6WIND

--
Nélio Laranjeiro
6WIND
[dpdk-dev] [PATCH v2] net/ixgbe: fix crash on detach
When detaching a port bound to ixgbe PMD, if the port does not have any
VFs, *vfinfo is not set and there is a NULL dereference attempt when
calling rte_eth_switch_domain_free(), which expects VFs to be used,
causing a segmentation fault.

Steps to reproduce:

./testpmd -- -i
testpmd> port stop all
testpmd> port close all
testpmd> port detach 0

Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports")
Cc: sta...@dpdk.org

Reported-by: Anatoly Burakov
Signed-off-by: Pablo de Lara
Tested-by: Anatoly Burakov
Acked-by: Remy Horton
---

v2:
- Cc stable as this fix is targeting code that was introduced
  in the previous release

 drivers/net/ixgbe/ixgbe_pf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4d199c802..c381acf44 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -135,14 +135,14 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 	RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
-	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
-	if (ret)
-		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
-
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
 		return;
 
+	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	rte_free(*vfinfo);
 	*vfinfo = NULL;
 }
--
2.17.0
Re: [dpdk-dev] Compilation of MLX5 driver
Yes, I installed it using --dpdk --upstream-libs. What is the way forward now?

Regards,
Nitin

-Original Message-
From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
Sent: Thursday, May 31, 2018 1:36 PM
To: Nitin Katiyar
Cc: Shahaf Shuler; dev@dpdk.org
Subject: Re: [dpdk-dev] Compilation of MLX5 driver

On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> Hi,
> It has following files:
>
> arch.h ib.h kern-abi.h mlx4dv.h mlx5dv.h opcode.h sa.h
> sa-kern-abi.h verbs.h
>
> I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Did you install Mellanox OFED with the --dpdk --upstream-libs arguments for the installation script?

If it is the case, you should not add them for this version, those options are for DPDK v17.11 and higher.

Regards,

> Regards,
> Nitin
>
> -Original Message-
> From: Shahaf Shuler [mailto:shah...@mellanox.com]
> Sent: Thursday, May 31, 2018 10:51 AM
> To: Nitin Katiyar; Nélio Laranjeiro
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] Compilation of MLX5 driver
>
> Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> >
> > Hi,
> > I was compiling 17.05.02.
> > Regards,
> > Nitin
> >
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Wednesday, May 30, 2018 6:42 PM
> > To: Nitin Katiyar
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> >
> > Hi,
> >
> > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > Hi,
> > > I am trying to compile MLX5 PMD driver by setting
> > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation
> > > error.
> > >
> > > fatal error: infiniband/mlx5_hw.h: No such file or directory
>
> Can you list the files you have under /usr/include/infiniband ?
>
> > >
> > > I have installed MLNX_OFED_LINUX-4.2-1.2.0 on my Ubuntu 14.04
> > > machine but still hitting the same error. Am I missing some other
> > > package?
> >
> > Which version of DPDK are you using (it is important to help)?
> >
> > Regards,
> >
> > --
> > Nélio Laranjeiro
> > 6WIND

--
Nélio Laranjeiro
6WIND
[dpdk-dev] [Bug 54] i40e port link status no updated for interrupt mode
https://dpdk.org/tracker/show_bug.cgi?id=54

Fan Zhang (roy.fan.zh...@intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
          Component|eventdev                    |ethdev
         Resolution|---                         |FIXED

--- Comment #1 from Fan Zhang (roy.fan.zh...@intel.com) ---
Sent patch to fix and merged.
https://dpdk.org/dev/patchwork/patch/40512/

--
You are receiving this mail because:
You are the assignee for the bug.
[dpdk-dev] [PATCH v6 0/3] Improve zero-length memzone allocation
This patchset does two things. First, it enables reserving zero-length
memzones that are IOVA-contiguous. Second, it fixes a long-standing race
condition in reserving zero-length memzones, where the malloc heap is not
locked between stats collection and reservation; the new code instead
allocates the biggest element on the spot.

Some limitations are added, but they are a trade-off between not having
race conditions and user convenience. It would be possible to lock all
heaps during zero-length memzone reserve, and that would keep the old
behavior, but given how long such an allocation may take (especially when
asking for IOVA-contiguous memory), a design decision was made to keep
things simple and only check other heaps if the current one is completely
busy.

Ideas on improvement are welcome.

v6:
- Rebase on top of 18.05
- Dropped malloc stats changes as no deprecation notice was sent, and I
  would like to integrate these changes in this release :)

v5:
- Use bound length if reserving memzone with zero length

v4:
- Fixes in memzone test
- Account for element padding
- Fix for wrong memzone size being returned
- Documentation fixes

Anatoly Burakov (3):
  malloc: add finding biggest free IOVA-contiguous element
  malloc: allow reserving biggest element
  memzone: improve zero-length memzone reserve

 lib/librte_eal/common/eal_common_memzone.c  |  70 ++---
 lib/librte_eal/common/include/rte_memzone.h |  24 ++-
 lib/librte_eal/common/malloc_elem.c         |  79 ++
 lib/librte_eal/common/malloc_elem.h         |   6 +
 lib/librte_eal/common/malloc_heap.c         | 126 +++
 lib/librte_eal/common/malloc_heap.h         |   4 +
 test/test/test_memzone.c                    | 165 +++-
 7 files changed, 343 insertions(+), 131 deletions(-)

--
2.17.0
[dpdk-dev] [PATCH v6 2/3] malloc: allow reserving biggest element
Add an internal-only function to allocate biggest element from the heap. Nominally, it supports SOCKET_ID_ANY as its socket argument, but it's essentially useless because other sockets will only be allocated from if the entire heap on current or specified socket is busy. Still, asking to reserve a biggest element will allow fixing race condition in memzone reserve that has been there for a long time. Signed-off-by: Anatoly Burakov Acked-by: Remy Horton --- lib/librte_eal/common/malloc_heap.c | 126 lib/librte_eal/common/malloc_heap.h | 4 + 2 files changed, 130 insertions(+) diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c index d6cf3af81..12aaf2d72 100644 --- a/lib/librte_eal/common/malloc_heap.c +++ b/lib/librte_eal/common/malloc_heap.c @@ -148,6 +148,52 @@ find_suitable_element(struct malloc_heap *heap, size_t size, return NULL; } +/* + * Iterates through the freelist for a heap to find a free element with the + * biggest size and requested alignment. Will also set size to whatever element + * size that was found. + * Returns null on failure, or pointer to element on success. 
+ */ +static struct malloc_elem * +find_biggest_element(struct malloc_heap *heap, size_t *size, + unsigned int flags, size_t align, bool contig) +{ + struct malloc_elem *elem, *max_elem = NULL; + size_t idx, max_size = 0; + + for (idx = 0; idx < RTE_HEAP_NUM_FREELISTS; idx++) { + for (elem = LIST_FIRST(&heap->free_head[idx]); + !!elem; elem = LIST_NEXT(elem, free_list)) { + size_t cur_size; + if (!check_hugepage_sz(flags, elem->msl->page_sz)) + continue; + if (contig) { + cur_size = + malloc_elem_find_max_iova_contig(elem, + align); + } else { + void *data_start = RTE_PTR_ADD(elem, + MALLOC_ELEM_HEADER_LEN); + void *data_end = RTE_PTR_ADD(elem, elem->size - + MALLOC_ELEM_TRAILER_LEN); + void *aligned = RTE_PTR_ALIGN_CEIL(data_start, + align); + /* check if aligned data start is beyond end */ + if (aligned >= data_end) + continue; + cur_size = RTE_PTR_DIFF(data_end, aligned); + } + if (cur_size > max_size) { + max_size = cur_size; + max_elem = elem; + } + } + } + + *size = max_size; + return max_elem; +} + /* * Main function to allocate a block of memory from the heap. * It locks the free list, scans it, and adds a new memseg if the @@ -174,6 +220,26 @@ heap_alloc(struct malloc_heap *heap, const char *type __rte_unused, size_t size, return elem == NULL ? NULL : (void *)(&elem[1]); } +static void * +heap_alloc_biggest(struct malloc_heap *heap, const char *type __rte_unused, + unsigned int flags, size_t align, bool contig) +{ + struct malloc_elem *elem; + size_t size; + + align = RTE_CACHE_LINE_ROUNDUP(align); + + elem = find_biggest_element(heap, &size, flags, align, contig); + if (elem != NULL) { + elem = malloc_elem_alloc(elem, size, align, 0, contig); + + /* increase heap's count of allocated elements */ + heap->alloc_count++; + } + + return elem == NULL ? 
NULL : (void *)(&elem[1]); +} + /* this function is exposed in malloc_mp.h */ void rollback_expand_heap(struct rte_memseg **ms, int n_segs, @@ -575,6 +641,66 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg, return NULL; } +static void * +heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags, + size_t align, bool contig) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct malloc_heap *heap = &mcfg->malloc_heaps[socket]; + void *ret; + + rte_spinlock_lock(&(heap->lock)); + + align = align == 0 ? 1 : align; + + ret = heap_alloc_biggest(heap, type, flags, align, contig); + + rte_spinlock_unlock(&(heap->lock)); + + return ret; +} + +void * +malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags, + size_t align, bool contig) +{ + int socket, i, cur_socket; + void *ret; + + /* return NULL if align is not power-of-2 */ + if ((align && !rte_is_power_of_2(align)))
[dpdk-dev] [PATCH v6 1/3] malloc: add finding biggest free IOVA-contiguous element
Adding internal-only function to find biggest free IOVA-contiguous malloc element. This is not exposed to external API. Signed-off-by: Anatoly Burakov Acked-by: Remy Horton Acked-by: Shreyansh Jain --- Notes: v6: - Patch was postponed to 18.08 but i forgot to add deprecation notice for the API changes, so these external malloc stats API changes have been dropped from this patchset v4: - Fix comments to be more up to date with v4 code - Add comments explaining trailer handling v2: - Add header to newly recalculated element start v2: - Add header to newly recalculated element start lib/librte_eal/common/malloc_elem.c | 79 + lib/librte_eal/common/malloc_elem.h | 6 +++ 2 files changed, 85 insertions(+) diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c index 9bfe9b9b4..f1bb4fee7 100644 --- a/lib/librte_eal/common/malloc_elem.c +++ b/lib/librte_eal/common/malloc_elem.c @@ -18,10 +18,89 @@ #include #include +#include "eal_internal_cfg.h" #include "eal_memalloc.h" #include "malloc_elem.h" #include "malloc_heap.h" +size_t +malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align) +{ + void *cur_page, *contig_seg_start, *page_end, *cur_seg_end; + void *data_start, *data_end; + rte_iova_t expected_iova; + struct rte_memseg *ms; + size_t page_sz, cur, max; + + page_sz = (size_t)elem->msl->page_sz; + data_start = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN); + data_end = RTE_PTR_ADD(elem, elem->size - MALLOC_ELEM_TRAILER_LEN); + /* segment must start after header and with specified alignment */ + contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align); + + /* if we're in IOVA as VA mode, or if we're in legacy mode with +* hugepages, all elements are IOVA-contiguous. 
+*/ + if (rte_eal_iova_mode() == RTE_IOVA_VA || + (internal_config.legacy_mem && rte_eal_has_hugepages())) + return RTE_PTR_DIFF(data_end, contig_seg_start); + + cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz); + ms = rte_mem_virt2memseg(cur_page, elem->msl); + + /* do first iteration outside the loop */ + page_end = RTE_PTR_ADD(cur_page, page_sz); + cur_seg_end = RTE_MIN(page_end, data_end); + cur = RTE_PTR_DIFF(cur_seg_end, contig_seg_start) - + MALLOC_ELEM_TRAILER_LEN; + max = cur; + expected_iova = ms->iova + page_sz; + /* memsegs are contiguous in memory */ + ms++; + + cur_page = RTE_PTR_ADD(cur_page, page_sz); + + while (cur_page < data_end) { + page_end = RTE_PTR_ADD(cur_page, page_sz); + cur_seg_end = RTE_MIN(page_end, data_end); + + /* reset start of contiguous segment if unexpected iova */ + if (ms->iova != expected_iova) { + /* next contiguous segment must start at specified +* alignment. +*/ + contig_seg_start = RTE_PTR_ALIGN(cur_page, align); + /* new segment start may be on a different page, so find +* the page and skip to next iteration to make sure +* we're not blowing past data end. +*/ + ms = rte_mem_virt2memseg(contig_seg_start, elem->msl); + cur_page = ms->addr; + /* don't trigger another recalculation */ + expected_iova = ms->iova; + continue; + } + /* cur_seg_end ends on a page boundary or on data end. if we're +* looking at data end, then malloc trailer is already included +* in the calculations. if we're looking at page end, then we +* know there's more data past this page and thus there's space +* for malloc element trailer, so don't count it here. 
+*/ + cur = RTE_PTR_DIFF(cur_seg_end, contig_seg_start); + /* update max if cur value is bigger */ + if (cur > max) + max = cur; + + /* move to next page */ + cur_page = page_end; + expected_iova = ms->iova + page_sz; + /* memsegs are contiguous in memory */ + ms++; + } + + return max; +} + /* * Initialize a general malloc_elem header structure */ diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h index 7331af9ca..e2bda4c02 100644 --- a/lib/librte_eal/common/malloc_elem.h +++ b/lib/librte_eal/common/malloc_elem.h @@ -179,4 +179,10 @@ malloc_elem_free_list_index(size_t size); void malloc_elem_free_list_insert(struct malloc_elem *elem); +/* + * Find biggest IOV
[dpdk-dev] [PATCH v6 3/3] memzone: improve zero-length memzone reserve
Currently, reserving zero-length memzones is done by looking at malloc statistics, and reserving biggest sized element found in those statistics. This has two issues. First, there is a race condition. The heap is unlocked between the time we check stats, and the time we reserve malloc element for memzone. This may lead to inability to reserve the memzone we wanted to reserve, because another allocation might have taken place and biggest sized element may no longer be available. Second, the size returned by malloc statistics does not include any alignment information, which is worked around by being conservative and subtracting alignment length from the final result. This leads to fragmentation and reserving memzones that could have been bigger but aren't. Fix all of this by using earlier-introduced operation to reserve biggest possible malloc element. This, however, comes with a trade-off, because we can only lock one heap at a time. So, if we check the first available heap and find *any* element at all, that element will be considered "the biggest", even though other heaps might have bigger elements. We cannot know what other heaps have before we try and allocate it, and it is not a good idea to lock all of the heaps at the same time, so, we will just document this limitation and encourage users to reserve memzones with socket id properly set. Also, fixup unit tests to account for the new behavior. 
Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc") Cc: sergio.gonzalez.mon...@intel.com Signed-off-by: Anatoly Burakov --- Notes: v6: - Rebase on 18.05 v5: - Use bound len when reserving bounded zero-length memzones v4: - Rebased on latest master - Improved documentation - Added accounting for element pad [1] - Fixed max len underflow in test - Checkpatch fixes [1] A patch has recently fixed a similar issue: https://dpdk.org/dev/patchwork/patch/39332/ The accounting for padding is also needed because size of the element may include not only malloc header overhead, but also the padding if it has any. At first glance, it would seem like additional change is needed for pre-18.05 code as well. However, on closer inspection, the original code was incorrect because it was comparing requested_len to 0, which is never zero and is always a minimum of cache line size due to earlier RTE_MAX() call (or rather, it could be zero, but in that case it would fail earlier). This downgrades the above quoted bug from "potential memory corruption bug" to "this bug was never a bug due to another bug". A proper fix for pre-18.05 would be to remove the check altogether and always go by requested_len, which is what we use to reserve memzones in the first place. I will submit it separately. lib/librte_eal/common/eal_common_memzone.c | 70 ++--- lib/librte_eal/common/include/rte_memzone.h | 24 ++- test/test/test_memzone.c| 165 +++- 3 files changed, 128 insertions(+), 131 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c index faa3b0615..7300fe05d 100644 --- a/lib/librte_eal/common/eal_common_memzone.c +++ b/lib/librte_eal/common/eal_common_memzone.c @@ -52,38 +52,6 @@ memzone_lookup_thread_unsafe(const char *name) return NULL; } - -/* This function will return the greatest free block if a heap has been - * specified. 
If no heap has been specified, it will return the heap and - * length of the greatest free block available in all heaps */ -static size_t -find_heap_max_free_elem(int *s, unsigned align) -{ - struct rte_mem_config *mcfg; - struct rte_malloc_socket_stats stats; - int i, socket = *s; - size_t len = 0; - - /* get pointer to global configuration */ - mcfg = rte_eal_get_configuration()->mem_config; - - for (i = 0; i < RTE_MAX_NUMA_NODES; i++) { - if ((socket != SOCKET_ID_ANY) && (socket != i)) - continue; - - malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats); - if (stats.greatest_free_size > len) { - len = stats.greatest_free_size; - *s = i; - } - } - - if (len < MALLOC_ELEM_OVERHEAD + align) - return 0; - - return len - MALLOC_ELEM_OVERHEAD - align; -} - static const struct rte_memzone * memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, int socket_id, unsigned int flags, unsigned int align, @@ -92,6 +60,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, struct rte_memzone *mz; struct rte_mem_config *mcfg; struct rte_fbarray *arr; + void *mz_addr; size_t requested_len; int mz_idx; bool contig; @@ -140,8 +109,7 @@ memzone_reserve_aligned_threa
[dpdk-dev] [PATCH v3] net/ixgbe: fix crash on detach
When detaching a port bound to ixgbe PMD, if the port does not have any
VFs, *vfinfo is not set and there is a NULL dereference attempt when
calling rte_eth_switch_domain_free(), which expects VFs to be used,
causing a segmentation fault.

Steps to reproduce:

./testpmd -- -i
testpmd> port stop all
testpmd> port close all
testpmd> port detach 0

Bugzilla ID: 57
Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports")
Cc: sta...@dpdk.org

Reported-by: Anatoly Burakov
Signed-off-by: Pablo de Lara
Tested-by: Anatoly Burakov
Acked-by: Remy Horton
---

Changes in v3:
- Added Bugzilla ID

Changes in v2:
- CC'd stable list

 drivers/net/ixgbe/ixgbe_pf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4d199c802..c381acf44 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -135,14 +135,14 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
 	RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
 	RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
-	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
-	if (ret)
-		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
-
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
 		return;
 
+	ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+	if (ret)
+		PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
 	rte_free(*vfinfo);
 	*vfinfo = NULL;
 }
--
2.17.0
Re: [dpdk-dev] Compilation of MLX5 driver
On Thu, May 31, 2018 at 09:14:03AM +, Nitin Katiyar wrote:
> Yes,I installed it using --dpdk --upstream-libs. What is the way
> forward now?

In v17.05, the MLX5 PMD still relies on libibverbs and libmlx5.

The options you used tell the Mellanox OFED installer which of
libibverbs/libmlx5 or rdma-core to install from the package. By passing
them you selected rdma-core, which is not supported by DPDK v17.05.

You need to install Mellanox OFED without those two options, so that
libibverbs and libmlx5 are installed, to make it work.

Regards,

> Regards,
> Nitin
>
> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Thursday, May 31, 2018 1:36 PM
> To: Nitin Katiyar
> Cc: Shahaf Shuler; dev@dpdk.org
> Subject: Re: [dpdk-dev] Compilation of MLX5 driver
>
> On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> > Hi,
> > It has following files:
> >
> > arch.h ib.h kern-abi.h mlx4dv.h mlx5dv.h opcode.h sa.h
> > sa-kern-abi.h verbs.h
> >
> > I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> > MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64
>
> Did you install Mellanox OFED with the --dpdk --upstream-libs arguments for
> the installation script?
>
> If it is the case, you should not add them for this version, those options
> are for DPDK v17.11 and higher.
>
> Regards,
>
> > Regards,
> > Nitin
> >
> > -Original Message-
> > From: Shahaf Shuler [mailto:shah...@mellanox.com]
> > Sent: Thursday, May 31, 2018 10:51 AM
> > To: Nitin Katiyar; Nélio Laranjeiro
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] Compilation of MLX5 driver
> >
> > Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> > >
> > > Hi,
> > > I was compiling 17.05.02.
> > > Regards,
> > > Nitin
> > >
> > > -Original Message-
> > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > > Sent: Wednesday, May 30, 2018 6:42 PM
> > > To: Nitin Katiyar
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> > >
> > > Hi,
> > >
> > > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > > Hi,
> > > > I am trying to compile MLX5 PMD driver by setting
> > > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation
> > > > error.
> > > >
> > > > fatal error: infiniband/mlx5_hw.h: No such file or directory
> >
> > Can you list the files you have under /usr/include/infiniband ?
> >
> > > >
> > > > I have installed MLNX_OFED_LINUX-4.2-1.2.0 on my Ubuntu 14.04
> > > > machine but still hitting the same error. Am I missing some other
> > > > package?
> > >
> > > Which version of DPDK are you using (it is important to help)?
> > >
> > > Regards,
> > >
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
>
> --
> Nélio Laranjeiro
> 6WIND

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH v2 0/2] Vhost: unitfy receive paths
> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, May 29, 2018 5:45 PM
> To: dev@dpdk.org; Bie, Tiwei; Wang, Zhihong
> Cc: Maxime Coquelin
> Subject: [PATCH v2 0/2] Vhost: unitfy receive paths
>
> Hi,
>
> This second version fixes the feature bit check in
> rxvq_is_mergeable(), and removes "mergeable" from rx funcs
> names. No difference is seen in the benchmarks.
>
> This series is preliminary work to ease the integration of
> packed ring layout support. But even without packed ring
> layout, the result is positive.
>
> The first patch unifies both paths, and the second one is a small
> optimization to avoid copying the batch_copy_nb_elems VQ field
> to/from the stack.
>
> With the series applied, I get a modest performance gain for
> both mergeable and non-mergeable cases, and the gain of
> about 300 LoC is non-negligible maintenance-wise.
>
> Rx-mrg=off benchmarks:
>
> +------------+-------+-------------+-------------+----------+
> |    Run     |  PVP  | Guest->Host | Host->Guest | Loopback |
> +------------+-------+-------------+-------------+----------+
> | v18.05-rc5 | 14.47 |       16.64 |       17.57 |    13.15 |
> | + series   | 14.87 |       16.86 |       17.70 |    13.30 |
> +------------+-------+-------------+-------------+----------+
>
> Rx-mrg=on benchmarks:
>
> +------------+------+-------------+-------------+----------+
> |    Run     | PVP  | Guest->Host | Host->Guest | Loopback |
> +------------+------+-------------+-------------+----------+
> | v18.05-rc5 | 9.38 |       13.78 |       16.70 |    12.79 |
> | + series   | 9.38 |       13.80 |       17.49 |    13.36 |
> +------------+------+-------------+-------------+----------+
>
> Note: Even without my series, the guest->host benchmark with
> mergeable buffers enabled looks suspicious as it should in
> theory be almost identical to when Rx mergeable buffers are
> disabled. To be investigated...
>
> Maxime Coquelin (2):
>   vhost: unify Rx mergeable and non-mergeable paths
>   vhost: improve batched copies performance
>
>  lib/librte_vhost/virtio_net.c | 376 +-
>  1 file changed, 37 insertions(+), 339 deletions(-)
>

Acked-by: Zhihong Wang

Thanks Maxime! This is really great to see. ;) We probably need the
same improvement for Virtio-pmd.

One comment on Virtio/Vhost performance analysis: no matter what type
of traffic is used (PVP, or Txonly-Rxonly, Loopback...), we need to be
clear on which side we're testing, and give the other side excess CPU
resources; otherwise we'll be testing whichever side is the slowest.

Since this patch is for Vhost, I suggest running N (e.g. N = 4) Virtio
threads on N cores, and the corresponding N Vhost threads on a single
core, to do the performance comparison. Do you think this makes sense?

For Guest -> Host, in my test I see that Rx-mrg=on has a negative impact
on the Virtio side, probably because Virtio touches something that's not
touched when Rx-mrg=off.

Thanks
-Zhihong
[dpdk-dev] i40evf: Problem with the statistics
Hi,

I am testing a packet drop scenario by setting the MTU size. My setup
uses the i40evf driver.

I set the DPDK interface's MTU size to 1800. I am sending 100 packets of
size 1918 each. I am expecting the drop counter to increment.

rte_eth_stats_get() returns ipackets equal to the number of packets I
sent. No drop counters are incrementing. Also, my application is not
receiving any packets.

Is there some issue with DPDK statistics?

The xstats output is as follows. It is not showing any drops, but the
rx_good_bytes count is incrementing.

NIC extended statistics for port 1
rx_good_packets: 656
tx_good_packets: 556
rx_good_bytes: 225160
tx_good_bytes: 33360
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0packets: 0
rx_q0bytes: 0
rx_q0errors: 0
tx_q0packets: 0
tx_q0bytes: 0
rx_bytes: 225160
rx_unicast_packets: 656
rx_multicast_packets: 0
rx_broadcast_packets: 0
rx_dropped_packets: 0
rx_unknown_protocol_packets: 0
tx_bytes: 33360
tx_unicast_packets: 556
tx_multicast_packets: 0
tx_broadcast_packets: 0
tx_dropped_packets: 0
tx_error_packets: 0

Thanks and Regards,
Mridula
[dpdk-dev] [RFC v2 0/6] Remove IPC threads
As previously discussed [1], IPC threads need to be removed and their
workload moved to interrupt thread. The transition is complete as far as
Linux support is concerned, however since there is no interrupt thread
on FreeBSD, this patchset effectively disables IPC on FreeBSD for now
(hence it still being an RFC and not a v1). Work on adding interrupt
thread to FreeBSD is in progress.

[1] http://dpdk.org/dev/patchwork/patch/36579/

Anatoly Burakov (2):
  ipc: remove IPC thread for async requests
  ipc: remove main IPC thread

Jianfeng Tan (4):
  eal/linux: use glibc malloc in alarm
  eal/linux: use glibc malloc in interrupt handling
  eal: bring forward init of interrupt handling
  eal: add IPC type for interrupt thread

 lib/librte_eal/common/eal_common_proc.c       | 233 +++--
 .../common/include/rte_eal_interrupts.h       |   1 +
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c       |   9 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c  |  19 +-
 test/test/test_interrupts.c                   |  29 ++-
 6 files changed, 137 insertions(+), 164 deletions(-)

-- 
2.17.0
[dpdk-dev] [RFC v2 1/6] eal/linux: use glibc malloc in alarm
From: Jianfeng Tan

We will rely on the alarm API for async IPC requests, as the following
patch indicates. rte_malloc could itself require an async IPC request.
To avoid this chicken-and-egg dilemma, we change the alarm
implementation to use glibc malloc.

Signed-off-by: Jianfeng Tan
Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal_alarm.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index c115e823a..391d2a65f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -19,7 +19,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 
@@ -91,7 +90,7 @@ eal_alarm_callback(void *arg __rte_unused)
 
 		rte_spinlock_lock(&alarm_list_lk);
 		LIST_REMOVE(ap, next);
-		rte_free(ap);
+		free(ap);
 	}
 
 	if (!LIST_EMPTY(&alarm_list)) {
@@ -122,7 +121,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback cb_fn, void *cb_arg)
 	if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
 		return -EINVAL;
 
-	new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
+	new_alarm = calloc(1, sizeof(*new_alarm));
 	if (new_alarm == NULL)
 		return -ENOMEM;
 
@@ -196,7 +195,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 
 			if (ap->executing == 0) {
 				LIST_REMOVE(ap, next);
-				rte_free(ap);
+				free(ap);
 				count++;
 			} else {
 				/* If calling from other context, mark that alarm is executing
@@ -220,7 +219,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void *cb_arg)
 
 			if (ap->executing == 0) {
 				LIST_REMOVE(ap, next);
-				rte_free(ap);
+				free(ap);
 				count++;
 				ap = ap_prev;
 			} else if (pthread_equal(ap->executing_id, pthread_self()) == 0)
-- 
2.17.0
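To illustrate the idea behind this patch, here is a minimal sketch of an alarm list whose entries come from the libc heap, so that adding or cancelling an entry never depends on a DPDK allocator that may itself need IPC. It is not the EAL code: `struct alarm_entry`, `alarm_add()`, `alarm_cancel()` and `noop_cb()` are hypothetical names, and locking and the timer itself are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the EAL alarm entry; not the DPDK definition. */
struct alarm_entry {
	LIST_ENTRY(alarm_entry) next;
	uint64_t timeout_us;
	void (*cb_fn)(void *);
	void *cb_arg;
};

static LIST_HEAD(alarm_list_head, alarm_entry) alarm_list =
	LIST_HEAD_INITIALIZER(alarm_list);

/* Allocate with calloc() so that no DPDK heap (and hence no IPC) is needed. */
static int
alarm_add(uint64_t us, void (*cb_fn)(void *), void *cb_arg)
{
	struct alarm_entry *e;

	if (us < 1 || cb_fn == NULL)
		return -EINVAL;

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return -ENOMEM;

	e->timeout_us = us;
	e->cb_fn = cb_fn;
	e->cb_arg = cb_arg;
	LIST_INSERT_HEAD(&alarm_list, e, next);
	return 0;
}

/* Remove every entry matching the callback; returns the number freed. */
static int
alarm_cancel(void (*cb_fn)(void *), void *cb_arg)
{
	struct alarm_entry *e, *tmp;
	int count = 0;

	for (e = LIST_FIRST(&alarm_list); e != NULL; e = tmp) {
		/* fetch the successor first, since e may be freed below */
		tmp = LIST_NEXT(e, next);
		if (e->cb_fn == cb_fn && e->cb_arg == cb_arg) {
			LIST_REMOVE(e, next);
			free(e);
			count++;
		}
	}
	return count;
}

/* Callback used only to exercise the sketch. */
static void
noop_cb(void *arg)
{
	(void)arg;
}
```

The trade-off is that these entries live in process-private memory rather than in hugepage-backed shared memory, which should be acceptable as long as the list is only used inside the process that owns it.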
[dpdk-dev] [RFC v2 3/6] eal/linux: use glibc malloc in interrupt handling
From: Jianfeng Tan

We will rely on the interrupt thread to implement IPC, and IPC
initialization happens at a very early stage, when the memory subsystem
is not initialized yet. So we change to use glibc malloc/free.

Signed-off-by: Jianfeng Tan
Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 056d41c12..180c0378a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -30,7 +30,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -405,8 +404,7 @@ rte_intr_callback_register(const struct rte_intr_handle *intr_handle,
 	}
 
 	/* allocate a new interrupt callback entity */
-	callback = rte_zmalloc("interrupt callback list",
-				sizeof(*callback), 0);
+	callback = calloc(1, sizeof(*callback));
 	if (callback == NULL) {
 		RTE_LOG(ERR, EAL, "Can not allocate memory\n");
 		return -ENOMEM;
@@ -431,10 +429,10 @@ rte_intr_callback_register(const struct rte_intr_handle *intr_handle,
 
 	/* no existing callbacks for this - add new source */
 	if (src == NULL) {
-		if ((src = rte_zmalloc("interrupt source list",
-				sizeof(*src), 0)) == NULL) {
+		src = calloc(1, sizeof(*src));
+		if (src == NULL) {
 			RTE_LOG(ERR, EAL, "Can not allocate memory\n");
-			rte_free(callback);
+			free(callback);
 			ret = -ENOMEM;
 		} else {
 			src->intr_handle = *intr_handle;
@@ -501,7 +499,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
 			if (cb->cb_fn == cb_fn && (cb_arg == (void *)-1 ||
 					cb->cb_arg == cb_arg)) {
 				TAILQ_REMOVE(&src->callbacks, cb, next);
-				rte_free(cb);
+				free(cb);
 				ret++;
 			}
 		}
@@ -509,7 +507,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle *intr_handle,
 		/* all callbacks for that source are removed. */
 		if (TAILQ_EMPTY(&src->callbacks)) {
 			TAILQ_REMOVE(&intr_sources, src, next);
-			rte_free(src);
+			free(src);
 		}
 	}
 
-- 
2.17.0
[dpdk-dev] [RFC v2 6/6] ipc: remove main IPC thread
Previously, to handle requests from peer(s), or replies for a request (sync or async) by itself, a dedicated IPC thread was set up. Now that every other piece of the puzzle is in place, we can get rid of the IPC thread, and move waiting for IPC messages entirely into the interrupt thread. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov Suggested-by: Thomas Monjalon --- Notes: RFC->RFCv2: - Fixed resource leaks - Improved readability lib/librte_eal/common/eal_common_proc.c | 46 ++--- 1 file changed, 25 insertions(+), 21 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index 6f3366403..162d67ca5 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -101,6 +102,8 @@ static struct { /**< used in async requests only */ }; +static struct rte_intr_handle ipc_intr_handle; + /* forward declarations */ static int mp_send(struct rte_mp_msg *msg, const char *peer, int type); @@ -350,18 +353,17 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) } } -static void * +static void mp_handle(void *arg __rte_unused) { struct mp_msg_internal msg; struct sockaddr_un sa; while (1) { - if (read_msg(&msg, &sa) == 0) - process_msg(&msg, &sa); + if (read_msg(&msg, &sa) < 0) + break; + process_msg(&msg, &sa); } - - return NULL; } static int @@ -570,7 +572,6 @@ rte_mp_channel_init(void) { char path[PATH_MAX]; int dir_fd; - pthread_t mp_handle_tid; /* create filter path */ create_socket_path("*", path, sizeof(path)); @@ -585,36 +586,32 @@ rte_mp_channel_init(void) if (dir_fd < 0) { RTE_LOG(ERR, EAL, "failed to open %s: %s\n", mp_dir_path, strerror(errno)); - return -1; + goto fail; } if (flock(dir_fd, LOCK_EX)) { RTE_LOG(ERR, EAL, "failed to lock %s: %s\n", mp_dir_path, strerror(errno)); - close(dir_fd); - return -1; + goto fail; } if (rte_eal_process_type() == RTE_PROC_PRIMARY && 
unlink_sockets(mp_filter)) { RTE_LOG(ERR, EAL, "failed to unlink mp sockets\n"); - close(dir_fd); - return -1; + goto fail; } if (open_socket_fd() < 0) { - close(dir_fd); - return -1; + goto fail; } - if (rte_ctrl_thread_create(&mp_handle_tid, "rte_mp_handle", - NULL, mp_handle, NULL) < 0) { - RTE_LOG(ERR, EAL, "failed to create mp thead: %s\n", - strerror(errno)); - close(mp_fd); - close(dir_fd); - mp_fd = -1; - return -1; + ipc_intr_handle.fd = mp_fd; + ipc_intr_handle.type = RTE_INTR_HANDLE_IPC; + + if (rte_intr_callback_register(&ipc_intr_handle, mp_handle, NULL) < 0) { + RTE_LOG(ERR, EAL, "failed to register IPC interrupt callback: %s\n", + strerror(errno)); + goto fail; } /* unlock the directory */ @@ -622,6 +619,13 @@ rte_mp_channel_init(void) close(dir_fd); return 0; +fail: + if (dir_fd >= 0) + close(dir_fd); + if (mp_fd >= 0) + close(mp_fd); + mp_fd = -1; + return -1; } /** -- 2.17.0
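The error-handling change in rte_mp_channel_init() above replaces several copies of the cleanup code with a single `fail:` label. A minimal, self-contained sketch of that pattern follows. It is only an illustration: `channel_init()` and `chan_fd` are hypothetical stand-ins, and a plain open() takes the place of open_socket_fd() and the callback registration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

static int chan_fd = -1; /* hypothetical stand-in for mp_fd */

/*
 * Lock a directory, open a channel file, then unlock. Every failure
 * jumps to one label that releases whatever has been acquired so far.
 */
static int
channel_init(const char *dir_path, const char *chan_path)
{
	int dir_fd;

	dir_fd = open(dir_path, O_RDONLY);
	if (dir_fd < 0) {
		fprintf(stderr, "failed to open %s: %s\n",
			dir_path, strerror(errno));
		goto fail;
	}

	if (flock(dir_fd, LOCK_EX)) {
		fprintf(stderr, "failed to lock %s: %s\n",
			dir_path, strerror(errno));
		goto fail;
	}

	/* stand-in for open_socket_fd() in the patch */
	chan_fd = open(chan_path, O_RDONLY);
	if (chan_fd < 0) {
		fprintf(stderr, "failed to open %s: %s\n",
			chan_path, strerror(errno));
		goto fail;
	}

	/* unlock the directory */
	flock(dir_fd, LOCK_UN);
	close(dir_fd);
	return 0;

fail:
	/* closing dir_fd also drops the lock if it was taken */
	if (dir_fd >= 0)
		close(dir_fd);
	if (chan_fd >= 0)
		close(chan_fd);
	chan_fd = -1;
	return -1;
}
```

For example, `channel_init(".", "/dev/null")` should succeed on a typical Linux system, while a non-existent directory or channel path takes the `fail:` route and leaves `chan_fd` at -1.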
[dpdk-dev] [RFC v2 4/6] eal: bring forward init of interrupt handling
From: Jianfeng Tan

IPC will rely on interrupt handling, so we move the init of interrupt
handling forward.

Signed-off-by: Jianfeng Tan
Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..f8a0c06d7 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -839,6 +839,11 @@ rte_eal_init(int argc, char **argv)
 
 	rte_config_init();
 
+	if (rte_eal_intr_init() < 0) {
+		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+		return -1;
+	}
+
 	/* Put mp channel init before bus scan so that we can init the vdev
 	 * bus through mp channel in the secondary process before the bus scan.
 	 */
@@ -968,11 +973,6 @@ rte_eal_init(int argc, char **argv)
 		rte_config.master_lcore, (int)thread_id, cpuset,
 		ret == 0 ? "" : "...");
 
-	if (rte_eal_intr_init() < 0) {
-		rte_eal_init_alert("Cannot init interrupt-handling thread\n");
-		return -1;
-	}
-
 	RTE_LCORE_FOREACH_SLAVE(i) {
 
 		/*
-- 
2.17.0
[dpdk-dev] [RFC v2 2/6] ipc: remove IPC thread for async requests
Previously, we were using two IPC threads - one to handle messages and synchronous requests, and another to handle asynchronous requests. To handle replies for an async request, rte_mp_handle woke up the rte_mp_handle_async thread to process through pthread_cond variable. Change it to handle asynchronous messages within the main IPC thread. To handle timeout events, for each async request which is sent, we set an alarm for it. If its reply is received before timeout, we will cancel the alarm when we handle the reply; otherwise, alarm will invoke the async_reply_handle() as the alarm callback. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov Suggested-by: Thomas Monjalon --- Notes: RFC->RFCv2: - Rebased on latest code - Implemented comments to the original RFC lib/librte_eal/common/eal_common_proc.c | 191 1 file changed, 65 insertions(+), 126 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index 707d8ab30..6f3366403 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -20,6 +20,7 @@ #include #include +#include #include #include #include @@ -94,11 +95,9 @@ TAILQ_HEAD(pending_request_list, pending_request); static struct { struct pending_request_list requests; pthread_mutex_t lock; - pthread_cond_t async_cond; } pending_requests = { .requests = TAILQ_HEAD_INITIALIZER(pending_requests.requests), .lock = PTHREAD_MUTEX_INITIALIZER, - .async_cond = PTHREAD_COND_INITIALIZER /**< used in async requests only */ }; @@ -106,6 +105,16 @@ static struct { static int mp_send(struct rte_mp_msg *msg, const char *peer, int type); +/* for use with alarm callback */ +static void +async_reply_handle(void *arg); + +/* for use with process_msg */ +static struct pending_request * +async_reply_handle_thread_unsafe(void *arg); + +static void +trigger_async_action(struct pending_request *req); static struct pending_request * find_pending_request(const char *dst, const char 
*act_name) @@ -290,6 +299,8 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) RTE_LOG(DEBUG, EAL, "msg: %s\n", msg->name); if (m->type == MP_REP || m->type == MP_IGN) { + struct pending_request *req = NULL; + pthread_mutex_lock(&pending_requests.lock); pending_req = find_pending_request(s->sun_path, msg->name); if (pending_req) { @@ -301,11 +312,14 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un *s) if (pending_req->type == REQUEST_TYPE_SYNC) pthread_cond_signal(&pending_req->sync.cond); else if (pending_req->type == REQUEST_TYPE_ASYNC) - pthread_cond_signal( - &pending_requests.async_cond); + req = async_reply_handle_thread_unsafe( + pending_req); } else RTE_LOG(ERR, EAL, "Drop mp reply: %s\n", msg->name); pthread_mutex_unlock(&pending_requests.lock); + + if (req != NULL) + trigger_async_action(req); return; } @@ -365,7 +379,6 @@ timespec_cmp(const struct timespec *a, const struct timespec *b) } enum async_action { - ACTION_NONE, /**< don't do anything */ ACTION_FREE, /**< free the action entry, but don't trigger callback */ ACTION_TRIGGER /**< trigger callback, then free action entry */ }; @@ -375,7 +388,7 @@ process_async_request(struct pending_request *sr, const struct timespec *now) { struct async_request_param *param; struct rte_mp_reply *reply; - bool timeout, received, last_msg; + bool timeout, last_msg; param = sr->async.param; reply = ¶m->user_reply; @@ -383,13 +396,6 @@ process_async_request(struct pending_request *sr, const struct timespec *now) /* did we timeout? */ timeout = timespec_cmp(¶m->end, now) <= 0; - /* did we receive a response? */ - received = sr->reply_received != 0; - - /* if we didn't time out, and we didn't receive a response, ignore */ - if (!timeout && !received) - return ACTION_NONE; - /* if we received a response, adjust relevant data and copy mesasge. 
*/ if (sr->reply_received == 1 && sr->reply) { struct rte_mp_msg *msg, *user_msgs, *tmp; @@ -448,118 +454,58 @@ trigger_async_action(struct pending_request *sr) free(sr->async.param->user_reply.msgs); free(sr->async.param); free(sr->request); + free(sr); } static struct pending_request * -check_trigger(struct timespec *ts) +async_reply_handle_thread_unsafe(void *arg) { - struct pending_request *next
[dpdk-dev] [RFC v2 5/6] eal: add IPC type for interrupt thread
From: Jianfeng Tan We are going to merge IPC into interrupt thread. This patch adds IPC type for interrupt thread. Signed-off-by: Jianfeng Tan Signed-off-by: Anatoly Burakov --- Notes: RFC->RFCv2: - Fixed typo in test app .../common/include/rte_eal_interrupts.h | 1 + lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 test/test/test_interrupts.c | 29 ++- 3 files changed, 34 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h index 6eb493271..344db768d 100644 --- a/lib/librte_eal/common/include/rte_eal_interrupts.h +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h @@ -35,6 +35,7 @@ enum rte_intr_handle_type { RTE_INTR_HANDLE_EXT, /**< external handler */ RTE_INTR_HANDLE_VDEV, /**< virtual device */ RTE_INTR_HANDLE_DEV_EVENT,/**< device event handle */ + RTE_INTR_HANDLE_IPC, /**< IPC event handle */ RTE_INTR_HANDLE_MAX /**< count of elements */ }; diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 180c0378a..390672739 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -560,6 +560,8 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle) /* not used at this moment */ case RTE_INTR_HANDLE_DEV_EVENT: return -1; + case RTE_INTR_HANDLE_IPC: + return -1; /* unknown handle type */ default: RTE_LOG(ERR, EAL, @@ -610,6 +612,8 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle) /* not used at this moment */ case RTE_INTR_HANDLE_DEV_EVENT: return -1; + case RTE_INTR_HANDLE_IPC: + return -1; /* unknown handle type */ default: RTE_LOG(ERR, EAL, @@ -679,6 +683,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds) call = true; break; case RTE_INTR_HANDLE_DEV_EVENT: + case RTE_INTR_HANDLE_IPC: bytes_read = 0; call = true; break; diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c index dc19175d3..fa18ddf75 100644 
--- a/test/test/test_interrupts.c +++ b/test/test/test_interrupts.c @@ -21,6 +21,7 @@ enum test_interrupt_handle_type { TEST_INTERRUPT_HANDLE_VALID_UIO, TEST_INTERRUPT_HANDLE_VALID_ALARM, TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT, + TEST_INTERRUPT_HANDLE_VALID_IPC, TEST_INTERRUPT_HANDLE_CASE1, TEST_INTERRUPT_HANDLE_MAX }; @@ -85,6 +86,10 @@ test_interrupt_init(void) intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type = RTE_INTR_HANDLE_DEV_EVENT; + intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC].fd = pfds.readfd; + intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC].type = + RTE_INTR_HANDLE_IPC; + intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd; intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO; @@ -263,6 +268,14 @@ test_interrupt_enable(void) return -1; } + /* check with specific valid intr_handle */ + test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC]; + if (rte_intr_enable(&test_intr_handle) == 0) { + printf("unexpectedly enable a specific intr_handle " + "successfully\n"); + return -1; + } + /* check with valid handler and its type */ test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1]; if (rte_intr_enable(&test_intr_handle) < 0) { @@ -327,6 +340,14 @@ test_interrupt_disable(void) return -1; } + /* check with specific valid intr_handle */ + test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC]; + if (rte_intr_disable(&test_intr_handle) == 0) { + printf("unexpectedly disable a specific intr_handle " + "successfully\n"); + return -1; + } + /* check with valid handler and its type */ test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1]; if (rte_intr_disable(&test_intr_handle) < 0) { @@ -424,7 +445,7 @@ test_interrupt(void) printf("Check valid alarm interrupt full path\n"); if (test_interrupt_full_path_check( - TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) { + TEST_INTERRUPT_HANDLE_VALID_IPC) < 0) { printf("failure occurred during checking valid alarm " "interrupt full path\n"); goto out; @@ -548,
[dpdk-dev] 16.11.7 (LTS) patches review and test
Hi all,

Here is a list of patches targeted for LTS release 16.11.7. Please help
review and test. The planned date for the final release is Monday, June
the 11th. Before that, please shout if anyone has objections with these
patches being applied.

Also for the companies committed to running regression tests, please run
the tests and report any issue before the release date.

These patches are located at branch 16.11 of dpdk-stable repo:
https://dpdk.org/browse/dpdk-stable/

Thanks.

Luca Boccassi

---
Ajit Khaparde (6):
      net/bnxt: fix Rx drop setting
      net/bnxt: fix endianness of flag
      net/bnxt: fix Rx checksum flags for tunnel frames
      net/bnxt: avoid freeing memzone multiple times
      net/bnxt: fix mbuf data offset initialization
      net/bnxt: fix Rx checksum flags

Alejandro Lucero (3):
      net/nfp: fix assigning port id in mbuf
      net/nfp: fix barrier location
      net/nfp: fix mbufs releasing when stop or close

Allain Legacy (1):
      ip_frag: fix double free of chained mbufs

Anatoly Burakov (3):
      memzone: fix size on reserving biggest memzone
      eal: remove unused path pattern
      mempool: fix virtual address population

Andrew Rybchenko (2):
      mempool: fix leak when no objects are populated
      test/mempool: fix autotest retry

Andy Green (29):
      eal: explicit cast of builtin for bsf32
      eal: explicit cast of core id when getting index
      eal: declare trace buffer at top of own block
      spinlock/x86: move stack declaration before code
      net: move stack variable at top of VLAN strip function
      ethdev: explicit cast of buffered Tx number
      hash: move stack declaration at top of CRC32c function
      hash: explicit casts for truncation in CRC32c
      net/nfp: fix memcpy out of source range
      net/bnx2x: do not cast function pointers as a policy
      net/bnx2x: fix KR2 device check
      net/bnx2x: fix memzone name overrun
      net/qede: replace strncpy by strlcpy
      net/qede: fix strncpy
      bus/pci: fix size of driver name buffer
      eal: fix casts in random functions
      mbuf: fix reference counter integer promotion
      mbuf: explicit casts of reference counter
      mbuf: explicit cast of headroom on reset
      mbuf: explicit cast of size on detach
      net: explicit cast of multicast bit clearing
      net: explicit cast of IP checksum to 16-bit
      net: explicit cast of protocol in IPv6 checksum
      ethdev: explicit cast of queue count return
      eal: explicit cast in rwlock functions
      net: explicit cast in L4 checksum
      mbuf: fix type of private size in detach
      mbuf: avoid integer promotion in prepend/adj/chain
      ethdev: fix type and scope of variables in Rx burst

Beilei Xing (2):
      net/i40e: fix link status update
      net/i40e: fix failing to disable FDIR Tx queue

Bruce Richardson (1):
      eal: support strlcpy function

Chas Williams (5):
      net/vmxnet3: set the queue shared buffer at start
      net/bonding: fix setting VLAN ID on slave ports
      net/bonding: clear started state if start fails
      net/vmxnet3: keep link state consistent
      net/bonding: export mode 4 slave info routine

Ciara Loftus (1):
      net/vhost: initialise device as inactive

Daniel Shelepov (1):
      app/testpmd: fix burst stats reporting

David Hunt (3):
      test/distributor: fix return type of thread function
      test/pipeline: fix return type of stub miss
      examples/performance-thread: fix return type of threads

Fan Zhang (1):
      net/i40e: fix link update no wait

Ferruh Yigit (3):
      drivers/net: fix icc deprecated parameter warning
      drivers/net: fix link autoneg value for virtual PMDs
      net/i40e: fix shifts of signed values

Gowrishankar Muthukrishnan (1):
      eal/ppc: remove braces in SMP memory barrier macro

Hyong Youb Kim (1):
      net/enic: allocate stats DMA buffer upfront during probe

Ivan Malov (1):
      ethdev: improve doc for name by port ID API

Jasvinder Singh (1):
      test/pipeline: fix type of table entry parameter

Jerin Jacob (1):
      app/crypto-perf: fix parameters copy

Jianfeng Tan (1):
      net/virtio-user: fix hugepage files enumeration

John Daley (1):
      net/enic: fix crash on MTU update with non-setup queues

Keith Wiles (1):
      kvargs: fix syntax in comments

Lee Roberts (1):
      kni: fix build on RHEL 7.5

Li Han (1):
      ip_frag: fix some debug logs

Matan Azrad (5):
      app/testpmd: fix slave port detection
      app/testpmd: fix valid ports prints
      app/testpmd: fix forward ports update
      app/testpmd: fix forward ports Rx flush
      app/testpmd: fix synchronic port hotplug

Matej Vido (2):
      net/szedata2: fix total stats
      net/szedata2: fix format string for PCI address

Maxime Coquelin (2):
      vhost: fix compilation issue when vhost debug enabled
      vhost: improve dirty pages logging performance

Mohammad Abdul Awal (1):
      ethdev: fix string length in name comparison

Nitin
[dpdk-dev] Regression tests for stable releases from companies involved in DPDK
Hello all,

At this morning's release meeting (minutes coming soon from John), we
briefly discussed the state of the regression testing for stable
releases and agreed we need to formalise the process.

At the moment we have a firm commitment from Intel and Mellanox to test
all stable branches (and if I heard correctly from NXP as well? Please
confirm!). AT&T committed to run regressions on the 16.11 branch.

Here's what we need in order to improve the quality of the stable
releases process:

1) More commitments to help from other companies involved in the DPDK
community. At the cost of re-stating the obvious, improving the quality
of stable releases is for everyone's benefit, as a lot of customers and
projects rely on the stable or LTS releases for their production
environments.

2) A formalised deadline - the current proposal is 10 days from the
"xx.yy patches review and test" email, which was just sent for 16.11.
For the involved companies, please let us know if 10 days is enough. In
terms of scheduling, this period will always start within a week from
the mainline final release. Again, the signal is the "xx.yy patches
review and test" appearing in the inbox, which will detail the deadline.

Comments?

-- 
Kind regards,
Luca Boccassi
[dpdk-dev] [RFC 0/3] Make device mapping more reliable
Currently, memory for device maps is allocated ad hoc, by calculating
the end of the VA space allocated for hugepages and crossing fingers in
hopes that those addresses will be free in both primary and secondary
processes. This leads to situations such as this:

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_178323_8af2229603de4
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device :81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: Cannot mmap device resource file /sys/bus/pci/devices/:81:00.0/resource0 to address: 0x7ff7f580
EAL: Requested device :81:00.0 cannot be used
EAL: Error - exiting with code: 1
  Cause: No Ethernet ports - bye

As can be seen from the above log, the secondary process has initialized
successfully, but device BAR mapping has failed, which resulted in
missing ports in the secondary process.

This patchset is an attempt to fix this problem once and for all, by
using the same method we use for memory to do device mappings as well.
That is, by preallocating all of the device memory in advance, so that
initialization either succeeds and allows for device mappings, or it
fails outright (whereas currently we may be in an in-between kind of
situation, where init has succeeded but device mappings have failed).

This change breaks the ABI, so it is not for this release. However, I'd
like to hear feedback on the approach and whether there are potential
problems with other buses/use cases that I didn't think of.

Anatoly Burakov (3):
  fbarray: allow zero-sized elements
  mem: add device memory reserve/free API
  bus/pci: use the new device memory API for BAR mapping

 drivers/bus/pci/linux/pci_init.h              |   1 -
 drivers/bus/pci/linux/pci_uio.c               |  11 +-
 drivers/bus/pci/linux/pci_vfio.c              |  27 +-
 lib/librte_eal/common/eal_common_fbarray.c    |  10 +-
 lib/librte_eal/common/eal_common_memory.c     | 270 --
 .../common/include/rte_eal_memconfig.h        |  18 ++
 lib/librte_eal/common/include/rte_memory.h    |  40 +++
 lib/librte_pci/Makefile                       |   1 +
 lib/librte_pci/rte_pci.c                      |  20 +-
 9 files changed, 350 insertions(+), 48 deletions(-)

-- 
2.17.0
[dpdk-dev] [RFC 2/3] mem: add device memory reserve/free API
In order for hotplug in multiprocess to work reliably, we will need a common shared memory area that is guaranteed to be accessible to all processes at all times. This is accomplished by pre-reserving memory that will be used for device mappings at startup, and managing it at runtime. Two new API calls are added: alloc and free of device memory. Once allocation is requested, memory is considered to be reserved until it is freed back using the same API. Usage of which blocks are occupied is tracked using shared fbarray. This allows us to give out device memory piecemeal and lessen fragmentation. Naturally, this adds a limitation of how much device memory DPDK can use. This is currently set to 2 gigabytes, but will be adjustable in later revisions. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_memory.c | 270 -- .../common/include/rte_eal_memconfig.h| 18 ++ lib/librte_eal/common/include/rte_memory.h| 40 +++ 3 files changed, 312 insertions(+), 16 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4f0688f9d..8cae9b354 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -33,6 +33,7 @@ */ #define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" +#define DEVICE_MEMORY_NAME "device_memory" static uint64_t baseaddr_offset; static uint64_t system_page_sz; @@ -904,6 +905,227 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg) return ret; } +void * __rte_experimental +rte_mem_dev_memory_alloc(size_t size, size_t align) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + struct rte_fbarray *arr = &mcfg->device_memory.mem_map_arr; + unsigned int n_pages, page_align; + int start_idx, cur_idx; + void *addr = NULL; + + /* check parameters first */ + if (size == 0 || (size & (system_page_sz - 1)) != 0) { + RTE_LOG(ERR, EAL, "%s(): size is not page-aligned\n", + __func__); + rte_errno = EINVAL; + return NULL; + 
} + if ((align & (system_page_sz - 1)) != 0) { + RTE_LOG(ERR, EAL, "%s(): alignment is not page-aligned\n", + __func__); + rte_errno = EINVAL; + return NULL; + } + /* PCI BAR sizes can only be powers of two, but this memory may be used +* for more than just PCI BAR mappings, so only check if alignment is +* power of two. +*/ + if (align != 0 && !rte_is_power_of_2(align)) { + RTE_LOG(ERR, EAL, "%s(): alignment is not a power of two\n", + __func__); + rte_errno = EINVAL; + return NULL; + } + /* check if device memory map is uninitialized. */ + if (mcfg->device_memory.base_va == NULL || arr->len == 0) { + RTE_LOG(ERR, EAL, "%s(): device memory map is not initialized\n", + __func__); + rte_errno = ENODEV; + return NULL; + } + + n_pages = size / system_page_sz; + page_align = align / system_page_sz; + + /* lock the device memory map */ + rte_spinlock_lock(&mcfg->device_memory.lock); + + start_idx = 0; + while (1) { + size_t offset; + int end; + + cur_idx = rte_fbarray_find_next_n_free(arr, start_idx, n_pages); + if (cur_idx < 0) + break; + + /* if there are alignment requirements, check if the offset we +* found is aligned, and if not, align it and check if we still +* have enough space. +*/ + if (page_align != 0 && (cur_idx & (page_align - 1)) != 0) { + unsigned int aligned, len; + + aligned = RTE_ALIGN_CEIL(cur_idx, page_align); + len = rte_fbarray_find_contig_free(arr, aligned); + + /* if there's not enough space, keep looking */ + if (len < n_pages) { + start_idx = aligned + len; + continue; + } + + /* we've found space */ + cur_idx = aligned; + } + end = cur_idx + n_pages; + offset = cur_idx * system_page_sz; + addr = RTE_PTR_ADD(mcfg->device_memory.base_va, + offset); + + /* now, mark all space as occupied */ + for (; cur_idx < end; cur_idx++) + rte_fbarray_set_used(arr, cur_idx); + break; + } + rte_spinlock_unlock(&mcfg->device_memory.lock); + + if (addr != NULL) + RTE_LOG(DEBUG, EAL, "%s(): allocated %p-%p
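The search loop in rte_mem_dev_memory_alloc() above is an aligned first-fit over a page-granular occupancy map. The following self-contained sketch shows the same technique with a plain bool array in place of the shared fbarray. All names here (`N_PAGES`, `page_used`, `contig_free()`, `pages_alloc()`, `pages_free()`) are hypothetical, there is no locking, and alignment is handled by integer rounding rather than the power-of-two mask used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_PAGES 64 /* hypothetical size of the reserved area, in pages */

static bool page_used[N_PAGES];

/* Length of the free run starting at idx. */
static unsigned int
contig_free(unsigned int idx)
{
	unsigned int len = 0;

	while (idx + len < N_PAGES && !page_used[idx + len])
		len++;
	return len;
}

/*
 * Find n_pages contiguous free pages whose start index is a multiple of
 * page_align (0 means no constraint), mark them used and return the
 * start index, or -1 if nothing fits.
 */
static int
pages_alloc(unsigned int n_pages, unsigned int page_align)
{
	unsigned int idx = 0, i;

	if (n_pages == 0 || n_pages > N_PAGES)
		return -1;
	if (page_align == 0)
		page_align = 1;

	while (idx < N_PAGES) {
		unsigned int len;

		/* round the candidate up to the next aligned index */
		idx = (idx + page_align - 1) / page_align * page_align;
		if (idx >= N_PAGES)
			break;

		len = contig_free(idx);
		if (len >= n_pages) {
			for (i = 0; i < n_pages; i++)
				page_used[idx + i] = true;
			return (int)idx;
		}
		/* skip the too-short free run and the used page ending it */
		idx += len + 1;
	}
	return -1;
}

/* Give pages back to the map. */
static void
pages_free(unsigned int idx, unsigned int n_pages)
{
	unsigned int i;

	for (i = 0; i < n_pages && idx + i < N_PAGES; i++)
		page_used[idx + i] = false;
}
```

Multiplying the returned index by the page size and adding it to the base of the reserved area gives the address handed to the caller, which is what the `RTE_PTR_ADD()` call in the patch does.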
[dpdk-dev] [RFC 1/3] fbarray: allow zero-sized elements
We need to keep usage of our memory area indexed, but we don't actually
need to store any data - we need just the indexing capabilities of
fbarray. Yet, it currently disallows zero-sized elements. Fix that by
removing the check for zero-sized elements - the rest will work
correctly already.

Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/common/eal_common_fbarray.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..4a365e7ce 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -391,9 +391,9 @@ set_used(struct rte_fbarray *arr, unsigned int idx, bool used)
 }
 
 static int
-fully_validate(const char *name, unsigned int elt_sz, unsigned int len)
+fully_validate(const char *name, unsigned int len)
 {
-	if (name == NULL || elt_sz == 0 || len == 0 || len > INT_MAX) {
+	if (name == NULL || len == 0 || len > INT_MAX) {
 		rte_errno = EINVAL;
 		return -1;
 	}
@@ -420,7 +420,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
 		return -1;
 	}
 
-	if (fully_validate(name, elt_sz, len))
+	if (fully_validate(name, len))
 		return -1;
 
 	page_sz = sysconf(_SC_PAGESIZE);
@@ -511,7 +511,7 @@ rte_fbarray_attach(struct rte_fbarray *arr)
 	 * the array, so the parts we care about will not race.
 	 */
 
-	if (fully_validate(arr->name, arr->elt_sz, arr->len))
+	if (fully_validate(arr->name, arr->len))
 		return -1;
 
 	page_sz = sysconf(_SC_PAGESIZE);
@@ -858,7 +858,7 @@ rte_fbarray_dump_metadata(struct rte_fbarray *arr, FILE *f)
 		return;
 	}
 
-	if (fully_validate(arr->name, arr->elt_sz, arr->len)) {
+	if (fully_validate(arr->name, arr->len)) {
 		fprintf(f, "Invalid file-backed array\n");
 		goto out;
 	}
-- 
2.17.0
[dpdk-dev] [RFC 3/3] bus/pci: use the new device memory API for BAR mapping
Adjust PCI infrastructure to reserve device memory through the new device memory API. Any hotplug event will reserve memory, any hot-unplug event will release memory back to the system. This allows for more reliable PCI mappings in secondary processes, and will be crucial to support multiprocess hotplug. Signed-off-by: Anatoly Burakov --- drivers/bus/pci/linux/pci_init.h | 1 - drivers/bus/pci/linux/pci_uio.c | 11 +-- drivers/bus/pci/linux/pci_vfio.c | 27 --- lib/librte_pci/Makefile | 1 + lib/librte_pci/rte_pci.c | 20 +++- 5 files changed, 33 insertions(+), 27 deletions(-) diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h index c2e603a37..bc9279c66 100644 --- a/drivers/bus/pci/linux/pci_init.h +++ b/drivers/bus/pci/linux/pci_init.h @@ -14,7 +14,6 @@ /* * Helper function to map PCI resources right after hugepages in virtual memory */ -extern void *pci_map_addr; void *pci_find_max_end_va(void); /* parse one line of the "resource" sysfs file (note that the 'line' diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c index d423e4bb0..dbf108b6f 100644 --- a/drivers/bus/pci/linux/pci_uio.c +++ b/drivers/bus/pci/linux/pci_uio.c @@ -26,8 +26,6 @@ #include "eal_filesystem.h" #include "pci_init.h" -void *pci_map_addr = NULL; - #define OFF_MAX ((uint64_t)(off_t)-1) int @@ -316,19 +314,12 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx, goto error; } - /* try mapping somewhere close to the end of hugepages */ - if (pci_map_addr == NULL) - pci_map_addr = pci_find_max_end_va(); - - mapaddr = pci_map_resource(pci_map_addr, fd, 0, + mapaddr = pci_map_resource(NULL, fd, 0, (size_t)dev->mem_resource[res_idx].len, 0); close(fd); if (mapaddr == MAP_FAILED) goto error; - pci_map_addr = RTE_PTR_ADD(mapaddr, - (size_t)dev->mem_resource[res_idx].len); - maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr; maps[map_idx].size = dev->mem_resource[res_idx].len; maps[map_idx].addr = mapaddr; diff --git 
a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c index aeeaa9ed8..f390ea37a 100644 --- a/drivers/bus/pci/linux/pci_vfio.c +++ b/drivers/bus/pci/linux/pci_vfio.c @@ -324,7 +324,7 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd) static int pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res, - int bar_index, int additional_flags) + int bar_index) { struct memreg { unsigned long offset, size; @@ -371,9 +371,14 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res, memreg[0].size = bar->size; } - /* reserve the address using an inaccessible mapping */ - bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE | - MAP_ANONYMOUS | additional_flags, -1, 0); + if (bar->addr == NULL) { + bar_addr = rte_mem_dev_memory_alloc(bar->size, 0); + if (bar_addr == NULL) { + RTE_LOG(ERR, EAL, "%s(): cannot reserve space for device\n", + __func__); + return -1; + } + } if (bar_addr != MAP_FAILED) { void *map_addr = NULL; if (memreg[0].size) { @@ -469,7 +474,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev) for (i = 0; i < (int) vfio_res->nb_maps; i++) { struct vfio_region_info reg = { .argsz = sizeof(reg) }; - void *bar_addr; reg.index = i; @@ -494,19 +498,12 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev) if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0) continue; - /* try mapping somewhere close to the end of hugepages */ - if (pci_map_addr == NULL) - pci_map_addr = pci_find_max_end_va(); - - bar_addr = pci_map_addr; - pci_map_addr = RTE_PTR_ADD(bar_addr, (size_t) reg.size); - - maps[i].addr = bar_addr; + maps[i].addr = NULL; maps[i].offset = reg.offset; maps[i].size = reg.size; maps[i].path = NULL; /* vfio doesn't have per-resource paths */ - ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0); + ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i); if (ret < 0) { RTE_LOG(ERR, EAL, " %s mapping BAR%i failed: %s\n", pci_addr, i, strerror(errno)); @@ -574,7 +571,7 @@ 
pci_vfio_map_resource_secondary(struct rte_pci_device *dev) maps = vf
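
For context, the hunk removed from pci_vfio_mmap_bar() reserved address space with an inaccessible anonymous mapping before the BAR was mapped on top of it. Below is a minimal standalone sketch of that reserve-then-map pattern, assuming a POSIX system. reserve_va() and map_over_reservation() are hypothetical helper names, and anonymous memory stands in for the device file descriptor. The patch does not show how rte_mem_dev_memory_alloc() reserves memory internally, so this is not a description of it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `len` bytes of address space with no access rights (hypothetical helper). */
static void *reserve_va(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Replace the reservation with a usable mapping at the same address.
 * A real PMD would pass the device fd and region offset here; this
 * sketch uses anonymous memory so that it runs without any hardware.
 */
static void *map_over_reservation(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Because the second mmap() uses MAP_FIXED on a range the process already owns, it cannot land on an unrelated mapping, which is the property that makes the reserved address stable.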
[dpdk-dev] i40evf: Problem with the statistics
Hi,

I am testing a packet-drop scenario by setting the MTU size. My setup uses the i40evf driver. I set the DPDK interface's MTU to 1800 and send 100 packets of 1918 bytes each, so I expect the drop counter to increment.

rte_eth_stats_get() returns ipackets equal to the number of packets I sent, and no drop counter increments. My application also does not receive any packets. Is there some issue with the DPDK statistics?

The xstats output is below. It shows no drops, but rx_good_bytes is incrementing.

NIC extended statistics for port 1
rx_good_packets: 656
tx_good_packets: 556
rx_good_bytes: 225160
tx_good_bytes: 33360
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0packets: 0
rx_q0bytes: 0
rx_q0errors: 0
tx_q0packets: 0
tx_q0bytes: 0
rx_bytes: 225160
rx_unicast_packets: 656
rx_multicast_packets: 0
rx_broadcast_packets: 0
rx_dropped_packets: 0
rx_unknown_protocol_packets: 0
tx_bytes: 33360
tx_unicast_packets: 556
tx_multicast_packets: 0
tx_broadcast_packets: 0
tx_dropped_packets: 0
tx_error_packets: 0

Thanks and Regards,
Mridula
[dpdk-dev] [Bug 58] cppcheck static analyzer warnings
https://dpdk.org/tracker/show_bug.cgi?id=58

Bug ID: 58
Summary: cppcheck static analyzer warnings
Product: DPDK
Version: 18.05
Hardware: All
OS: All
Status: CONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: ferruh.yi...@intel.com
Target Milestone: ---

There was already a mail in mail list to report this issue:
https://dpdk.org/ml/archives/dev/2018-May/101961.html

Some of the issues fixed in v18.05, but some still remain. Following is the list of the remaining issues:

[app/test-pmd/cmdline_mtr.c:115]: (error) Memory leak: dscp_table
[app/test-pmd/flowgen.c:160]: (error) Uninitialized variable: ol_flags
[app/test-pmd/tm.c:594]: (error) Memory leak: tnp.shared_shaper_id
[drivers/bus/dpaa/base/fman/fman.c:557]: (error) Uninitialized variable: __if
[drivers/bus/dpaa/base/qbman/qman.c:1220]: (error) Address of auto-variable 'p->shadow_dqrr[DQRR_PTR2IDX(dq)]' returned
[drivers/bus/ifpga/ifpga_bus.c:436]: (warning) Possible null pointer dereference: c2
[drivers/crypto/ccp/ccp_pci.c:41]: (error) Resource leak: fp
[drivers/crypto/dpaa_sec/dpaa_sec.c:662]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:731]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:826]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:881]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1020]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1132]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1258]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1353]: (error) Address of auto-variable 'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1392]: (error) Address of auto-variable 'ctx->job' returned
[drivers/net/avf/base/avf_adminq.c:301]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:336]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:298]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:333]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_common.c:367]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_common.c:364]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_dev.c:808]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_dev.c] -> [drivers/net/axgbe/axgbe_dev.c]: (error) Invalid value: 0x0204_BUSY_WIDTH
[drivers/net/axgbe/axgbe_ethdev.c] -> [drivers/net/axgbe/axgbe_ethdev.c]: (error) Invalid value: 0x0008_PR_WIDTH
[drivers/net/axgbe/axgbe_i2c.c] -> [drivers/net/axgbe/axgbe_i2c.c]: (error) Invalid value: 0x006c_EN_WIDTH
[drivers/net/axgbe/axgbe_phy_impl.c] -> [drivers/net/axgbe/axgbe_phy_impl.c]: (error) Invalid value: 0x0080_ID_WIDTH
[drivers/net/axgbe/axgbe_rxtx.c:292]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_rxtx.c:592]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_rxtx.c] -> [drivers/net/axgbe/axgbe_rxtx.c]: (error) Invalid value: 0x48_PRXQ_WIDTH
[drivers/net/bnx2x/bnx2x.c:3995]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:4000]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:8729]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:9765]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/elink.c:1042]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/elink.c:2711]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/elink.c:9662]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnx2x/elink.c:10295]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_ethdev.c:598]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_ethdev.c:638]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_rxr.c:486]: (error) Uninitialized variable: ag_cons
[drivers/net/bnxt/bnxt_stats.c:211]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_stats.c:248]: (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
[drivers/net/e1000/base/e1000_8257
Re: [dpdk-dev] cppcheck on dpdk
On 5/16/2018 1:41 PM, Ferruh Yigit wrote:
> Today, after listening to Colin's Static Analysis talk, I ran cppcheck on
> v18.05-rc4 code and it revealed some issues. I am sharing them here for
> anyone interested in fixing them. At the least, I encourage maintainers to
> check their own pieces.
>
> It is really easy to run cppcheck, in dpdk source folder:
> cppcheck --force .
>
> With the above command cppcheck verifies all #ifdef paths. Some of the issues
> below seem related to this, which is why they are not seen in build tests.

Some issues are fixed but more remain. To track them better, I submitted a Bugzilla issue:
https://dpdk.org/tracker/show_bug.cgi?id=58
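
Many of the remaining warnings in the bug report read "Shifting signed 32-bit value by 31 bits is undefined behaviour". Here is a minimal sketch of the usual fix, assuming the flagged code builds bit masks with expressions of the form (1 << 31). The helper name bit32() is hypothetical and is not taken from any of the listed drivers.

```c
#include <stdint.h>

/*
 * (1 << 31) shifts a signed int into its sign bit, which is undefined
 * behaviour in C. Starting from an unsigned 32-bit constant makes the
 * shift well defined for every n in the range 0..31.
 */
static inline uint32_t bit32(unsigned int n)
{
	return UINT32_C(1) << n;
}
```

In macro-heavy base code the equivalent change is typically replacing a literal 1 with 1U (or UINT32_C(1)) in the mask definition.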
Re: [dpdk-dev] Compilation of MLX5 driver
Thanks Shahaf, it worked after removing the options you specified. Regards, Nitin -Original Message- From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] Sent: Thursday, May 31, 2018 3:23 PM To: Nitin Katiyar Cc: Shahaf Shuler ; dev@dpdk.org Subject: Re: [dpdk-dev] Compilation of MLX5 driver On Thu, May 31, 2018 at 09:14:03AM +, Nitin Katiyar wrote: > Yes,I installed it using --dpdk --upstream-libs. What is the way > forward now? In v17.05 MLX5 PMD is still relying on libibverbs and libmlx5, the way Those options you used are necessary to select in their package the installation of libverbs,libmlx5 or rdma-core. Doing this you have selected rdma-core which is not supported in v17.05 DPDK version. You need to install Mellanox OFED without those two options to select libibverbs, libmlx5 to make it work. Regards, > Regards, > Nitin > > -Original Message- > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] > Sent: Thursday, May 31, 2018 1:36 PM > To: Nitin Katiyar > Cc: Shahaf Shuler ; dev@dpdk.org > Subject: Re: [dpdk-dev] Compilation of MLX5 driver > > On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote: > > Hi, > > It has following files: > > > > arch.h ib.h kern-abi.h mlx4dv.h mlx5dv.h opcode.h sa.h > > sa-kern-abi.h verbs.h > > > > I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and > > MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64 > > Did you installed Mellanox OFED with the --dpdk --upstream-libs arguments for > the installation script? > > If it is the case, you should not add them for this version, those options > are for DPDK v17.11 and higher. > > Regards, > > > Regards, > > Nitin > > > > -Original Message- > > From: Shahaf Shuler [mailto:shah...@mellanox.com] > > Sent: Thursday, May 31, 2018 10:51 AM > > To: Nitin Katiyar ; Nélio Laranjeiro > > > > Cc: dev@dpdk.org > > Subject: RE: [dpdk-dev] Compilation of MLX5 driver > > > > Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar: > > > > > > Hi, > > > I was compiling 17.05.02. 
> > > Regards, > > > Nitin > > > > > > -Original Message- > > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] > > > Sent: Wednesday, May 30, 2018 6:42 PM > > > To: Nitin Katiyar > > > Cc: dev@dpdk.org > > > Subject: Re: [dpdk-dev] Compilation of MLX5 driver > > > > > > Hi, > > > > > > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote: > > > > Hi, > > > > I am trying to compile MLX5 PMD driver by setting > > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation > > > error. > > > > > > > > fatal error: infiniband/mlx5_hw.h: No such file or directory > > > > Can you list the files you have under /usr/include/infiniband ? > > > > > > > > > > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 > > > > machine > > > but still hitting the same error. Am I missing some other package? > > > > > > Which version of DPDK are you using (it is important to help)? > > > > > > Regards, > > > > > > -- > > > Nélio Laranjeiro > > > 6WIND > > -- > Nélio Laranjeiro > 6WIND -- Nélio Laranjeiro 6WIND
[dpdk-dev] [PATCH] ethdev: force offloading API rules
The error path was disabled in the previous release to give applications more flexibility. In this release it is enabled: applications have to obey the offload API rules, otherwise they will get errors from the following APIs:
  rte_eth_dev_configure
  rte_eth_rx_queue_setup
  rte_eth_tx_queue_setup

Signed-off-by: Ferruh Yigit
---
Cc: Shahaf Shuler
Cc: Wei Dai
Cc: Qi Zhang
Cc: Andrew Rybchenko
---
 lib/librte_ethdev/rte_ethdev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index cd4bfd3c6..66e311676 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1171,7 +1171,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 				local_conf.rxmode.offloads,
 				dev_info.rx_offload_capa,
 				__func__);
-		/* Will return -EINVAL in the next release */
+		return -EINVAL;
 	}
 	if ((local_conf.txmode.offloads & dev_info.tx_offload_capa) !=
 	     local_conf.txmode.offloads) {
@@ -1182,7 +1182,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 				local_conf.txmode.offloads,
 				dev_info.tx_offload_capa,
 				__func__);
-		/* Will return -EINVAL in the next release */
+		return -EINVAL;
 	}
 
 	/* Check that device supports requested rss hash functions. */
@@ -1580,7 +1580,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
 				local_conf.offloads,
 				dev_info.rx_queue_offload_capa,
 				__func__);
-		/* Will return -EINVAL in the next release */
+		return -EINVAL;
 	}
 
 	ret = (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
@@ -1745,7 +1745,7 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
 				local_conf.offloads,
 				dev_info.tx_queue_offload_capa,
 				__func__);
-		/* Will return -EINVAL in the next release */
+		return -EINVAL;
 	}
 
 	return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
-- 
2.14.3
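
The rule being enforced can be sketched in isolation. The standalone example below mirrors the "(requested & capa) != requested" test visible in the patch context. check_offloads() and unsupported_offloads() are hypothetical helper names, not ethdev functions; an application could run a similar pre-check on its own configuration before calling the APIs listed above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 when every requested offload bit is present in the
 * capability mask, -EINVAL otherwise.
 */
static int check_offloads(uint64_t requested, uint64_t capa)
{
	if ((requested & capa) != requested)
		return -EINVAL;
	return 0;
}

/* Bits that were requested but are not supported; useful for logging. */
static uint64_t unsupported_offloads(uint64_t requested, uint64_t capa)
{
	return requested & ~capa;
}
```

With requested = 0x9 and capa = 0x7, bit 0x8 is not supported, so check_offloads() fails and unsupported_offloads() reports 0x8.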
Re: [dpdk-dev] [PATCH v2 0/2] Vhost: unitfy receive paths
On 05/31/2018 11:55 AM, Wang, Zhihong wrote:
>> -Original Message-
>> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
>> Sent: Tuesday, May 29, 2018 5:45 PM
>> To: dev@dpdk.org; Bie, Tiwei ; Wang, Zhihong
>> Cc: Maxime Coquelin
>> Subject: [PATCH v2 0/2] Vhost: unitfy receive paths
>>
>> Hi,
>>
>> This second version fixes the feature bit check in
>> rxvq_is_mergeable(), and removes "mergeable" from rx funcs
>> names. No difference is seen in the benchmarks.
>>
>> This series is preliminary work to ease the integration of
>> packed ring layout support. But even without packed ring
>> layout, the result is positive.
>>
>> First patch unifies both paths, and second one is a small
>> optimization to avoid copying batch_copy_nb_elems VQ field
>> to/from the stack.
>>
>> With the series applied, I get modest performance gain for
>> both mergeable and non-mergeable cases, and the gain of
>> about 300 LoC is non negligible maintenance-wise.
>>
>> Rx-mrg=off benchmarks:
>>
>> +------------+-------+-------------+-------------+----------+
>> |    Run     |  PVP  | Guest->Host | Host->Guest | Loopback |
>> +------------+-------+-------------+-------------+----------+
>> | v18.05-rc5 | 14.47 |       16.64 |       17.57 |    13.15 |
>> | + series   | 14.87 |       16.86 |       17.70 |    13.30 |
>> +------------+-------+-------------+-------------+----------+
>>
>> Rx-mrg=on benchmarks:
>>
>> +------------+------+-------------+-------------+----------+
>> |    Run     | PVP  | Guest->Host | Host->Guest | Loopback |
>> +------------+------+-------------+-------------+----------+
>> | v18.05-rc5 | 9.38 |       13.78 |       16.70 |    12.79 |
>> | + series   | 9.38 |       13.80 |       17.49 |    13.36 |
>> +------------+------+-------------+-------------+----------+
>>
>> Note: Even without my series, the guest->host benchmark with
>> mergeable buffers enabled looks suspicious as it should in
>> theory be almost identical as when Rx mergeable buffers are
>> disabled. To be investigated...
>>
>> Maxime Coquelin (2):
>>   vhost: unify Rx mergeable and non-mergeable paths
>>   vhost: improve batched copies performance
>>
>>  lib/librte_vhost/virtio_net.c | 376 +-
>>  1 file changed, 37 insertions(+), 339 deletions(-)
>
> Acked-by: Zhihong Wang
>
> Thanks Maxime! This is really great to see. ;) We probably need the
> same improvement for Virtio-pmd.

Yes, probably. I'll have a look at it, or if you have time to look at it, won't blame you! :)

> One comment on Virtio/Vhost performance analysis: No matter what type
> of traffic is used (PVP, or Txonly-Rxonly, Loopback...), we need to be
> clear on who we're testing, and give the other part excessive CPU
> resources, otherwise we'll be testing whoever the slowest.
>
> Since this patch is for Vhost, I suggest to run N (e.g. N = 4) Virtio
> threads on N cores, and the corresponding N Vhost threads on a single
> core, to do performance comparison. Do you think this makes sense?

That's a valid point. I'll try this to find the bottleneck. I'm in the process of setting up an automated test bench; it will help running more and more test cases.

> For Guest -> Host, in my test I see Rx-mrg=on has negative impact on
> Virtio side, probably because Virtio touches something that's not
> touched when Rx-mrg=off.

I get it now. When mrg=off, we use the simple_tx version, whereas we use the full one when mrg is on:

static int
virtio_dev_configure(struct rte_eth_dev *dev)
{
	...
	hw->use_simple_rx = 1;
	hw->use_simple_tx = 1;

#if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM
	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}
#endif
	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}

	if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
			   DEV_RX_OFFLOAD_TCP_CKSUM))
		hw->use_simple_rx = 0;

	return 0;
}

I see two problems here:
1. There should be no reason not to use simple_tx if mrg is on.
2. We should add a test on whether rx and tx offloads have been negotiated, and not use the simple versions if they have been.

Do you agree with these proposed changes? I'll post an RFC for this.

Thanks,
Maxime

> Thanks
> -Zhihong
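
To make the two proposed changes concrete, here is a standalone sketch of the path selection they would lead to. It is only one reading of the proposal, not the RFC itself: struct virtio_paths, select_paths() and the four boolean inputs are hypothetical stand-ins for the hw flags and feature checks in virtio_dev_configure().

```c
#include <stdbool.h>

struct virtio_paths {
	bool use_simple_rx;
	bool use_simple_tx;
};

/*
 * vec_ok:      the CPU supports the vector code (e.g. NEON on ARM)
 * mrg_rxbuf:   VIRTIO_NET_F_MRG_RXBUF has been negotiated
 * rx_offloads: at least one Rx offload has been negotiated
 * tx_offloads: at least one Tx offload has been negotiated
 */
static struct virtio_paths
select_paths(bool vec_ok, bool mrg_rxbuf, bool rx_offloads, bool tx_offloads)
{
	struct virtio_paths p;

	/* Rx: mergeable buffers and Rx offloads both need the full path. */
	p.use_simple_rx = vec_ok && !mrg_rxbuf && !rx_offloads;
	/* Tx: mergeable Rx buffers alone no longer force the full path. */
	p.use_simple_tx = vec_ok && !tx_offloads;
	return p;
}
```

Under this reading, enabling Rx mergeable buffers changes only the Rx path, which would remove the Guest->Host difference discussed above if the simple Tx path is indeed the cause.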
Re: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete expired entries
Hi Alex, > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev > Sent: Wednesday, May 16, 2018 12:04 PM > To: dev@dpdk.org; Burakov, Anatoly > Subject: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete > expired entries > > add new function rte_frag_table_del_expired_entries() > that scans the list of recently used packets and delete the expired ones. > > A fragmented packets is supposed to live no longer than max_cycles, > but the lib deletes an expired packet only occasionally when it scans > a bucket to find an empty slot while adding a new packet. > Therefore a fragment might sit in the table forever. > > Signed-off-by: Alex Kiselev > --- > lib/librte_ip_frag/ip_frag_common.h| 18 > lib/librte_ip_frag/ip_frag_internal.c | 18 > lib/librte_ip_frag/rte_ip_frag.h | 19 +++- > lib/librte_ip_frag/rte_ip_frag_common.c| 46 > ++ > lib/librte_ip_frag/rte_ip_frag_version.map | 6 > 5 files changed, 88 insertions(+), 19 deletions(-) > > diff --git a/lib/librte_ip_frag/ip_frag_common.h > b/lib/librte_ip_frag/ip_frag_common.h > index 197acf8d8..0fdcc7d0f 100644 > --- a/lib/librte_ip_frag/ip_frag_common.h > +++ b/lib/librte_ip_frag/ip_frag_common.h > @@ -25,6 +25,12 @@ > #define IPv6_KEY_BYTES_FMT \ > "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 > > +#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT > +#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v)) > +#else > +#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0) > +#endif /* IP_FRAG_TBL_STAT */ > + > /* internal functions declarations */ > struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp, > struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, > @@ -149,4 +155,16 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms) > fp->frags[IP_FIRST_FRAG_IDX] = zero_frag; > } > > +/* local frag table helper functions */ > +static inline void > +ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row > *dr, > + struct ip_frag_pkt *fp) > +{ > + ip_frag_free(fp, dr); 
> + ip_frag_key_invalidate(&fp->key); > + TAILQ_REMOVE(&tbl->lru, fp, lru); > + tbl->use_entries--; > + IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1); > +} > + > #endif /* _IP_FRAG_COMMON_H_ */ > diff --git a/lib/librte_ip_frag/ip_frag_internal.c > b/lib/librte_ip_frag/ip_frag_internal.c > index 2560c7713..97470a872 100644 > --- a/lib/librte_ip_frag/ip_frag_internal.c > +++ b/lib/librte_ip_frag/ip_frag_internal.c > @@ -14,24 +14,6 @@ > #define IP_FRAG_TBL_POS(tbl, sig) \ > ((tbl)->pkt + ((sig) & (tbl)->entry_mask)) > > -#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT > -#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v)) > -#else > -#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0) > -#endif /* IP_FRAG_TBL_STAT */ > - > -/* local frag table helper functions */ > -static inline void > -ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row > *dr, > - struct ip_frag_pkt *fp) > -{ > - ip_frag_free(fp, dr); > - ip_frag_key_invalidate(&fp->key); > - TAILQ_REMOVE(&tbl->lru, fp, lru); > - tbl->use_entries--; > - IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1); > -} > - > static inline void > ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, > const struct ip_frag_key *key, uint64_t tms) > diff --git a/lib/librte_ip_frag/rte_ip_frag.h > b/lib/librte_ip_frag/rte_ip_frag.h > index b3f3f78df..3c694df92 100644 > --- a/lib/librte_ip_frag/rte_ip_frag.h > +++ b/lib/librte_ip_frag/rte_ip_frag.h > @@ -65,10 +65,13 @@ struct ip_frag_pkt { > > #define IP_FRAG_DEATH_ROW_LEN 32 /**< death row size (in packets) */ > > +/* death row size in mbufs */ > +#define IP_FRAG_DEATH_ROW_MBUF_LEN (IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM > + 1)) > + > /** mbuf death row (packets to be freed) */ > struct rte_ip_frag_death_row { > uint32_t cnt; /**< number of mbufs currently on death row */ > - struct rte_mbuf *row[IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM + 1)]; > + struct rte_mbuf *row[IP_FRAG_DEATH_ROW_MBUF_LEN]; > /**< mbufs to be freed */ > }; > > @@ -325,6 
+328,20 @@ void rte_ip_frag_free_death_row(struct > rte_ip_frag_death_row *dr, > void > rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl > *tbl); > > +/** > + * Delete expired fragments > + * > + * @param tbl > + * Table to delete expired fragments from > + * @param dr > + * Death row to free buffers to > + * @param tms > + * Current timestamp > + */ > +void __rte_experimental > +rte_frag_table_del_expired_entries(struct rte_ip_frag_tbl *tbl, > + struct rte_ip_frag_death_row *dr, uint64_t tms); > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_ip_frag/rte_ip_frag_common.c > b/lib/librte_ip_frag/rte_ip_frag_common.c
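
The body of rte_frag_table_del_expired_entries() is not visible in the quoted part of the patch. As an illustration of the idea in the commit message (scan the list of recently used entries and delete the expired ones), here is a standalone sketch using the BSD TAILQ macros. struct entry, struct table and table_del_expired() are hypothetical names, and the sketch assumes the list is ordered oldest first so the scan can stop at the first live entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

struct entry {
	TAILQ_ENTRY(entry) lru;
	uint64_t start;		/* timestamp of entry creation */
};

TAILQ_HEAD(lru_list, entry);

struct table {
	struct lru_list lru;	/* oldest entry at the head */
	uint64_t max_cycles;	/* entry lifetime */
	uint32_t use_entries;	/* entries currently in use */
};

/* Remove every expired entry from the head of the LRU list. */
static uint32_t table_del_expired(struct table *t, uint64_t now)
{
	uint32_t deleted = 0;
	struct entry *e;

	while ((e = TAILQ_FIRST(&t->lru)) != NULL &&
	       e->start + t->max_cycles < now) {
		TAILQ_REMOVE(&t->lru, e, lru);
		t->use_entries--;
		deleted++;
	}
	return deleted;
}
```

A real implementation would also put the entry's mbufs on the death row, as ip_frag_tbl_del() does in the patch.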
Re: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev > Sent: Wednesday, May 16, 2018 12:04 PM > To: dev@dpdk.org; Burakov, Anatoly > Subject: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter > > add new function rte_frag_table_mbuf_count() that returns > number of mbufs holded in the fragmentation table. > > There might be situations (kind of attack when a lot of > fragmented packets are sent to a dpdk application in order > to flood the fragmentation table) when no additional mbufs > must be added to the fragmentations table since it already > contains to many of them. Currently there is no way to > determine the number of mbufs holded int the fragmentation > table. This patch allows to keep track of the number of mbufs > holded in the fragmentation table. > > Signed-off-by: Alex Kiselev > --- > lib/librte_ip_frag/ip_frag_common.h| 12 +++- > lib/librte_ip_frag/ip_frag_internal.c | 15 +-- > lib/librte_ip_frag/rte_ip_frag.h | 16 > lib/librte_ip_frag/rte_ip_frag_common.c| 1 + > lib/librte_ip_frag/rte_ip_frag_version.map | 1 + > lib/librte_ip_frag/rte_ipv4_reassembly.c | 2 +- > lib/librte_ip_frag/rte_ipv6_reassembly.c | 2 +- > 7 files changed, 36 insertions(+), 13 deletions(-) Do we really need it? It's quite significant code changes and the advantage looks quite small to me... We already have use_entries, right? That can be used to get some estimation for a number of mbufs in the table. 
Konstantin > > diff --git a/lib/librte_ip_frag/ip_frag_common.h > b/lib/librte_ip_frag/ip_frag_common.h > index 0fdcc7d0f..d04e69de6 100644 > --- a/lib/librte_ip_frag/ip_frag_common.h > +++ b/lib/librte_ip_frag/ip_frag_common.h > @@ -32,9 +32,9 @@ > #endif /* IP_FRAG_TBL_STAT */ > > /* internal functions declarations */ > -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp, > - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, > - uint16_t ofs, uint16_t len, uint16_t more_frags); > +struct rte_mbuf * ip_frag_process(struct rte_ip_frag_tbl *tbl, > + struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, > + struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags); > > struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl, > struct rte_ip_frag_death_row *dr, > @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct > ip_frag_key * k2) > > /* put fragment on death row */ > static inline void > -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr) > +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, > + struct rte_ip_frag_death_row *dr) > { > uint32_t i, k; > > @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct > rte_ip_frag_death_row *dr) > if (fp->frags[i].mb != NULL) { > dr->row[k++] = fp->frags[i].mb; > fp->frags[i].mb = NULL; > + tbl->nb_mbufs --; > } > } > > @@ -160,7 +162,7 @@ static inline void > ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row > *dr, > struct ip_frag_pkt *fp) > { > - ip_frag_free(fp, dr); > + ip_frag_free(tbl, fp, dr); > ip_frag_key_invalidate(&fp->key); > TAILQ_REMOVE(&tbl->lru, fp, lru); > tbl->use_entries--; > diff --git a/lib/librte_ip_frag/ip_frag_internal.c > b/lib/librte_ip_frag/ip_frag_internal.c > index 97470a872..eea871b7e 100644 > --- a/lib/librte_ip_frag/ip_frag_internal.c > +++ b/lib/librte_ip_frag/ip_frag_internal.c > @@ -29,14 +29,13 @@ static inline void > ip_frag_tbl_reuse(struct rte_ip_frag_tbl 
*tbl, struct rte_ip_frag_death_row > *dr, > struct ip_frag_pkt *fp, uint64_t tms) > { > - ip_frag_free(fp, dr); > + ip_frag_free(tbl, fp, dr); > ip_frag_reset(fp, tms); > TAILQ_REMOVE(&tbl->lru, fp, lru); > TAILQ_INSERT_TAIL(&tbl->lru, fp, lru); > IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1); > } > > - > static inline void > ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2) > { > @@ -88,8 +87,9 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, > uint32_t *v2) > } > > struct rte_mbuf * > -ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, > - struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags) > +ip_frag_process(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, > + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs, > + uint16_t len, uint16_t more_frags) > { > uint32_t idx; > > @@ -147,7 +147,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct > rte_ip_frag_death_row *dr, > fp->frags[IP_LAST_FRAG_IDX].len); > > /* free all fragments, invalidate the entry. */ > - ip_frag_free(fp, dr); > + ip_fr
[dpdk-dev] [PATCH] ethdev: force RSS offload rules again
PMDs should provide their supported RSS hash functions via the dev_info.flow_type_rss_offloads variable. ethdev checks whether a requested RSS hash function is supported by the PMD.

This check was relaxed in the previous release so that requesting an unsupported hash function did not return an error [1]; this was done to avoid breaking applications.

This patch adds the error return back. PMDs need to provide a correct list of supported hash functions, and applications need to take this information into account before configuring RSS, otherwise they will get an error from the APIs:
  rte_eth_dev_rss_hash_update()
  rte_eth_dev_configure()

[1] af7551e2bfce ("ethdev: remove error return on RSS hash check")

Signed-off-by: Ferruh Yigit
---
Cc: Xueming Li
Cc: Shahaf Shuler
Cc: Wei Dai
Cc: Qi Zhang
Cc: Andrew Rybchenko
---
 lib/librte_ethdev/rte_ethdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 66e311676..a9977df97 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1194,6 +1194,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 			port_id,
 			dev_conf->rx_adv_conf.rss_conf.rss_hf,
 			dev_info.flow_type_rss_offloads);
+		return -EINVAL;
 	}
 
 	/*
@@ -2928,6 +2929,7 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
 			port_id,
 			rss_conf->rss_hf,
 			dev_info.flow_type_rss_offloads);
+		return -EINVAL;
 	}
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rss_hash_update, -ENOTSUP);
 	return eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
-- 
2.14.3
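
One way an application could take the reported capabilities into account is to mask its requested hash functions before configuring the port. The sketch below is a standalone illustration of that masking only: rss_hf_supported_subset() is a hypothetical helper, and the second argument stands for the value an application would read from dev_info.flow_type_rss_offloads. Whether silently dropping unsupported hash types is acceptable is an application policy decision.

```c
#include <stdint.h>

/* Keep only the RSS hash function bits the device reports as supported. */
static uint64_t
rss_hf_supported_subset(uint64_t requested_hf, uint64_t flow_type_rss_offloads)
{
	return requested_hf & flow_type_rss_offloads;
}
```

The masked value can then be stored in the rss_hf field of the RSS configuration, so the check restored by this patch no longer trips.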
Re: [dpdk-dev] [PATCH v2 1/2] net/tap: calculate checksums of multi segs packets
On 4/22/2018 12:30 PM, Ophir Munk wrote:
> Prior to this commit IP/UDP/TCP checksum offload calculations
> were skipped in case of a multi segments packet.
> This commit enables TAP checksum calculations for multi segments
> packets.
> The only restriction is that the first segment must contain
> headers of layers 3 (IP) and 4 (UDP or TCP)
>
> Signed-off-by: Ophir Munk

Hi Ophir,

Can you please rebase the patch on top of latest master? It doesn't apply cleanly.

This is a feature from the previous release; please send updates early so that we can get it into this release early.

Thanks,
ferruh
Re: [dpdk-dev] [PATCH v2 1/2] net/tap: calculate checksums of multi segs packets
On 5/31/2018 2:52 PM, Ferruh Yigit wrote:
> On 4/22/2018 12:30 PM, Ophir Munk wrote:
>> Prior to this commit IP/UDP/TCP checksum offload calculations
>> were skipped in case of a multi segments packet.
>> This commit enables TAP checksum calculations for multi segments
>> packets.
>> The only restriction is that the first segment must contain
>> headers of layers 3 (IP) and 4 (UDP or TCP)
>>
>> Signed-off-by: Ophir Munk
>
> Hi Ophir,
>
> Can you please rebase the patch on top of latest master? It doesn't apply
> cleanly.
>
> This is a feature from the previous release; please send updates early so
> that we can get it into this release early.

Oops, I replied to v2 instead of v3. But I tested the latest version, v3, and it needs a v4.
[dpdk-dev] [PATCH 0/2] Improve service stop support
Existing service functions allow us to stop a service, but doing so doesn't guarantee that the service has finished running on a service core. This patch set introduces a function, rte_service_may_be_active(), to check whether a stopped service is truly stopped.

This is needed for flows that modify a resource that the service is using; for example, when stopping an eventdev, any event adapters and/or scheduler service need to be quiesced first.

This patch set also adds support for the event sw PMD's device stop flush callback, which relies on this new mechanism to ensure that the scheduler service is no longer active.

Gage Eads (2):
  service: add mechanism for quiescing a service
  event/sw: support device stop flush callback

 drivers/event/sw/sw_evdev.c                 | 114 +++-
 drivers/event/sw/sw_evdev_selftest.c        |  81 +++-
 lib/librte_eal/common/include/rte_service.h |  16
 lib/librte_eal/common/rte_service.c         |  31 +++-
 lib/librte_eal/rte_eal_version.map          |   1 +
 test/test/test_service_cores.c              |  43 +++
 6 files changed, 279 insertions(+), 7 deletions(-)

-- 
2.13.6
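
The intended usage (stop the service, then poll until it is no longer active) can be sketched without DPDK by passing the two operations in as function pointers. Everything below is a hypothetical stand-in: quiesce_service() is not a DPDK function, the callback types only imitate the shape of rte_service_runstate_set() and rte_service_may_be_active(), and the fake_* functions exist only so the sketch can run on its own.

```c
#include <errno.h>
#include <stdint.h>

typedef int32_t (*runstate_set_fn)(uint32_t id, uint32_t runstate);
typedef int32_t (*may_be_active_fn)(uint32_t id);

/*
 * Stop a service and wait until no lcore may still be running it.
 * Returns 0 once quiesced, a negative errno reported by the callbacks,
 * or -EBUSY if the service is still active after max_polls checks.
 */
static int quiesce_service(uint32_t id, runstate_set_fn set_runstate,
			   may_be_active_fn may_be_active,
			   unsigned int max_polls)
{
	int32_t ret = set_runstate(id, 0);
	unsigned int i;

	if (ret < 0)
		return ret;

	for (i = 0; i < max_polls; i++) {
		ret = may_be_active(id);
		if (ret <= 0)
			return ret;	/* 0: quiesced, < 0: error */
		/* a real application would pause or delay here */
	}
	return -EBUSY;
}

/* Demo stand-ins: the fake service stays active for fake_remaining polls. */
static int fake_remaining = 3;

static int32_t fake_runstate_set(uint32_t id, uint32_t runstate)
{
	(void)id;
	(void)runstate;
	return 0;
}

static int32_t fake_may_be_active(uint32_t id)
{
	(void)id;
	if (fake_remaining > 0) {
		fake_remaining--;
		return 1;
	}
	return 0;
}
```

The poll bound is a design choice of this sketch; the cover letter only describes polling until the function reports that the service is no longer active.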
[dpdk-dev] [PATCH 1/2] service: add mechanism for quiescing a service
Existing service functions allow us to stop a service, but doing so doesn't guarantee that the service has finished running on a service core. This commit introduces rte_service_may_be_active(), which returns whether the service may be executing on one or more lcores currently, or definitely is not. The service core layer supports this function by setting a flag when a service core is going to execute a service, and unsetting the flag when the core is no longer able to run the service (its runstate becomes stopped or the lcore is no longer mapped). With this new function, applications can set a service's runstate to stopped, then poll rte_service_may_be_active() until it returns false. At that point, the service is quiesced. Signed-off-by: Gage Eads --- lib/librte_eal/common/include/rte_service.h | 16 +++ lib/librte_eal/common/rte_service.c | 31 ++--- lib/librte_eal/rte_eal_version.map | 1 + test/test/test_service_cores.c | 43 + 4 files changed, 87 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/include/rte_service.h b/lib/librte_eal/common/include/rte_service.h index aea4d91b9..27b2dab7c 100644 --- a/lib/librte_eal/common/include/rte_service.h +++ b/lib/librte_eal/common/include/rte_service.h @@ -162,6 +162,22 @@ int32_t rte_service_runstate_set(uint32_t id, uint32_t runstate); int32_t rte_service_runstate_get(uint32_t id); /** + * This function returns whether the service may be currently executing on + * at least one lcore, or definitely is not. This function can be used to + * determine if, after setting the service runstate to stopped, the service + * is still executing an a service lcore. + * + * Care must be taken if calling this function when the service runstate is + * running, since the result of this function may be incorrect by the time the + * function returns due to service cores running in parallel. 
+ * + * @retval 1 Service may be running on one or more lcores + * @retval 0 Service is not running on any lcore + * @retval -EINVAL Invalid service id + */ +int32_t rte_service_may_be_active(uint32_t id); + +/** * Enable or disable the check for a service-core being mapped to the service. * An application can disable the check when takes the responsibility to run a * service itself using *rte_service_run_iter_on_app_lcore*. diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c index 73507aacb..d6c4c6039 100644 --- a/lib/librte_eal/common/rte_service.c +++ b/lib/librte_eal/common/rte_service.c @@ -52,6 +52,7 @@ struct rte_service_spec_impl { rte_atomic32_t num_mapped_cores; uint64_t calls; uint64_t cycles_spent; + uint8_t active_on_lcore[RTE_MAX_LCORE]; } __rte_cache_aligned; /* the internal values of a service core */ @@ -347,15 +348,19 @@ rte_service_runner_do_callback(struct rte_service_spec_impl *s, static inline int32_t -service_run(uint32_t i, struct core_state *cs, uint64_t service_mask) +service_run(uint32_t i, int lcore, struct core_state *cs, uint64_t service_mask) { if (!service_valid(i)) return -EINVAL; struct rte_service_spec_impl *s = &rte_services[i]; if (s->comp_runstate != RUNSTATE_RUNNING || s->app_runstate != RUNSTATE_RUNNING || - !(service_mask & (UINT64_C(1) << i))) + !(service_mask & (UINT64_C(1) << i))) { + s->active_on_lcore[lcore] = 0; return -ENOEXEC; + } + + s->active_on_lcore[lcore] = 1; /* check do we need cmpset, if MT safe or <= 1 core * mapped, atomic ops are not required. 
@@ -374,6 +379,24 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask) return 0; } +int32_t rte_service_may_be_active(uint32_t id) +{ + uint32_t ids[RTE_MAX_LCORE] = {0}; + struct rte_service_spec_impl *s = &rte_services[id]; + int32_t lcore_count = rte_service_lcore_list(ids, RTE_MAX_LCORE); + int i; + + if (!service_valid(id)) + return -EINVAL; + + for (i = 0; i < lcore_count; i++) { + if (s->active_on_lcore[ids[i]]) + return 1; + } + + return 0; +} + int32_t rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t serialize_mt_unsafe) { @@ -398,7 +421,7 @@ int32_t rte_service_run_iter_on_app_lcore(uint32_t id, return -EBUSY; } - int ret = service_run(id, cs, UINT64_MAX); + int ret = service_run(id, rte_lcore_id(), cs, UINT64_MAX); if (serialize_mt_unsafe) rte_atomic32_dec(&s->num_mapped_cores); @@ -419,7 +442,7 @@ rte_service_runner_func(void *arg) for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) { /* return value ignored as no change to code flow */ - service_run(i, cs,
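The flag-based mechanism described in this commit can be modelled outside DPDK. The following is a minimal sketch, not the patch's code: `toy_service`, `toy_service_run()` and `toy_service_may_be_active()` are hypothetical stand-ins for the service core internals, and the array sizes are arbitrary. It is meant to show why the per-lcore flag stays set after the runstate changes, until that lcore next visits the service.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_LCORE    8 /* hypothetical stand-in for RTE_MAX_LCORE */
#define TOY_NUM_SERVICES 4 /* hypothetical stand-in for RTE_SERVICE_NUM_MAX */

/* Toy model of a service: a runstate plus one "active" flag per lcore. */
struct toy_service {
	int runstate; /* 1 = running, 0 = stopped */
	volatile uint8_t active_on_lcore[TOY_MAX_LCORE];
};

static struct toy_service toy_services[TOY_NUM_SERVICES];

/* Called by an lcore each time it considers running the service. */
static int
toy_service_run(uint32_t id, unsigned int lcore)
{
	struct toy_service *s;

	if (id >= TOY_NUM_SERVICES || lcore >= TOY_MAX_LCORE)
		return -EINVAL;

	s = &toy_services[id];
	if (s->runstate != 1) {
		/* the lcore has observed the stop: clear its flag */
		s->active_on_lcore[lcore] = 0;
		return -ENOEXEC;
	}

	s->active_on_lcore[lcore] = 1;
	/* the service callback would be invoked here */
	return 0;
}

/* Returns 1 if any lcore may still be running the service, 0 if none. */
static int
toy_service_may_be_active(uint32_t id)
{
	unsigned int i;

	if (id >= TOY_NUM_SERVICES)
		return -EINVAL;

	for (i = 0; i < TOY_MAX_LCORE; i++) {
		if (toy_services[id].active_on_lcore[i])
			return 1;
	}
	return 0;
}
```

In this model a caller would set `runstate` to 0 and then poll `toy_service_may_be_active()` until it returns 0, mirroring the stop-then-poll sequence the commit message describes. Unlike the diff above, the sketch validates the id before indexing the array.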
[dpdk-dev] [PATCH 2/2] event/sw: support device stop flush callback
This commit also adds a flush callback test to the sw eventdev's selftest suite. Signed-off-by: Gage Eads --- drivers/event/sw/sw_evdev.c | 114 ++- drivers/event/sw/sw_evdev_selftest.c | 81 - 2 files changed, 192 insertions(+), 3 deletions(-) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 10f0e1ad4..95a6f1fda 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -361,9 +361,99 @@ sw_init_qid_iqs(struct sw_evdev *sw) } } +static int +sw_qids_empty(struct sw_evdev *sw) +{ + unsigned int i, j; + + for (i = 0; i < sw->qid_count; i++) { + for (j = 0; j < SW_IQS_MAX; j++) { + if (iq_count(&sw->qids[i].iq[j])) + return 0; + } + } + + return 1; +} + +static int +sw_ports_empty(struct sw_evdev *sw) +{ + unsigned int i; + + for (i = 0; i < sw->port_count; i++) { + if ((rte_event_ring_count(sw->ports[i].rx_worker_ring)) || +rte_event_ring_count(sw->ports[i].cq_worker_ring)) + return 0; + } + + return 1; +} + +static void +sw_drain_ports(struct rte_eventdev *dev) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + eventdev_stop_flush_t flush; + unsigned int i; + uint8_t dev_id; + void *arg; + + flush = dev->dev_ops->dev_stop_flush; + dev_id = dev->data->dev_id; + arg = dev->data->dev_stop_flush_arg; + + for (i = 0; i < sw->port_count; i++) { + struct rte_event ev; + + while (rte_event_dequeue_burst(dev_id, i, &ev, 1, 0)) { + if (flush) + flush(dev_id, ev, arg); + + ev.op = RTE_EVENT_OP_RELEASE; + rte_event_enqueue_burst(dev_id, i, &ev, 1); + } + } +} + +static void +sw_drain_queue(struct rte_eventdev *dev, struct sw_iq *iq) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + eventdev_stop_flush_t flush; + uint8_t dev_id; + void *arg; + + flush = dev->dev_ops->dev_stop_flush; + dev_id = dev->data->dev_id; + arg = dev->data->dev_stop_flush_arg; + + while (iq_count(iq) > 0) { + struct rte_event ev; + + iq_dequeue_burst(sw, iq, &ev, 1); + + if (flush) + flush(dev_id, ev, arg); + } +} + +static void +sw_drain_queues(struct rte_eventdev 
*dev) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + int i, j; + + for (i = 0; i < sw->qid_count; i++) { + for (j = 0; j < SW_IQS_MAX; j++) + sw_drain_queue(dev, &sw->qids[i].iq[j]); + } +} + static void -sw_clean_qid_iqs(struct sw_evdev *sw) +sw_clean_qid_iqs(struct rte_eventdev *dev) { + struct sw_evdev *sw = sw_pmd_priv(dev); int i, j; /* Release the IQ memory of all configured qids */ @@ -729,10 +819,30 @@ static void sw_stop(struct rte_eventdev *dev) { struct sw_evdev *sw = sw_pmd_priv(dev); - sw_clean_qid_iqs(sw); + int32_t runstate; + + /* Stop the scheduler if it's running */ + runstate = rte_service_runstate_get(sw->service_id); + if (runstate == 1) + rte_service_runstate_set(sw->service_id, 0); + + while (rte_service_may_be_active(sw->service_id)) + rte_pause(); + + /* Flush all events out of the device */ + while (!(sw_qids_empty(sw) && sw_ports_empty(sw))) { + sw_event_schedule(dev); + sw_drain_ports(dev); + sw_drain_queues(dev); + } + + sw_clean_qid_iqs(dev); sw_xstats_uninit(sw); sw->started = 0; rte_smp_wmb(); + + if (runstate == 1) + rte_service_runstate_set(sw->service_id, 1); } static int diff --git a/drivers/event/sw/sw_evdev_selftest.c b/drivers/event/sw/sw_evdev_selftest.c index 78d30e07a..c40912db5 100644 --- a/drivers/event/sw/sw_evdev_selftest.c +++ b/drivers/event/sw/sw_evdev_selftest.c @@ -28,6 +28,7 @@ #define MAX_PORTS 16 #define MAX_QIDS 16 #define NUM_PACKETS (1<<18) +#define DEQUEUE_DEPTH 128 static int evdev; @@ -147,7 +148,7 @@ init(struct test *t, int nb_queues, int nb_ports) .nb_event_ports = nb_ports, .nb_event_queue_flows = 1024, .nb_events_limit = 4096, - .nb_event_port_dequeue_depth = 128, + .nb_event_port_dequeue_depth = DEQUEUE_DEPTH, .nb_event_port_enqueue_depth = 128, }; int ret; @@ -2807,6 +2808,78 @@ holb(struct test *t) /* test to check we avoid basic head-of-line blocking */ return -1; } +static void +flush(uint8_t dev_id __rte_unused, struct rte_event event, void *arg) +{ + *((uint
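The drain-and-flush pattern used by sw_drain_queue() in the diff above can be illustrated without the eventdev library. This is a simplified sketch under stated assumptions: `toy_event`, `toy_queue` and `toy_drain_queue()` are hypothetical names, and only the callback shape (device id, event by value, user argument) is taken from the patch. Each remaining event is handed to the optional callback before it is dropped.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_QUEUE_CAP 16 /* arbitrary capacity for the sketch */

struct toy_event {
	uint64_t u64;
};

struct toy_queue {
	struct toy_event ev[TOY_QUEUE_CAP];
	unsigned int count; /* number of events still queued */
};

/* Same argument order as the flush callback invoked in the patch. */
typedef void (*toy_stop_flush_t)(uint8_t dev_id, struct toy_event ev,
		void *arg);

/* Remove every queued event, passing each to flush() if one is set. */
static unsigned int
toy_drain_queue(struct toy_queue *q, uint8_t dev_id, toy_stop_flush_t flush,
		void *arg)
{
	unsigned int drained = 0;

	while (q->count > 0) {
		struct toy_event ev = q->ev[--q->count];

		if (flush != NULL)
			flush(dev_id, ev, arg);
		drained++;
	}
	return drained;
}

/* Example callback: count flushed events through the user argument. */
static void
toy_count_flush(uint8_t dev_id, struct toy_event ev, void *arg)
{
	(void)dev_id;
	(void)ev;
	*(unsigned int *)arg += 1;
}
```

An application callback of this kind would typically release whatever the event points to (for example an mbuf), which is why the events are offered to the application instead of being silently discarded.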
Re: [dpdk-dev] [RFC] net/mvpp2: implement dynamic logging
On 4/26/2018 11:44 AM, Tomasz Duszynski wrote:
> Hello Stephen,
>
> A few nits on this inline.

Hi Tomasz,

This was an RFC targeting your driver. Can you re-spin the patch with
your suggested updates?

>
> On Wed, Apr 25, 2018 at 09:44:54AM -0700, Stephen Hemminger wrote:
>> All DPDK drivers should use dynamic log types, not the default PMD
>> value.
>>
>> This is an RFC not a patch since I don't have libraries or
>> hardware to validate it.
>>
>> Signed-off-by: Stephen Hemminger <...>
Re: [dpdk-dev] [PATCH] net/bonding: update link status on slave add
On 5/9/2018 1:06 PM, Radu Nicolau wrote:
> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.
>
> Signed-off-by: Radu Nicolau

Hi Radu,

Can you please send a new version with updated commit log, with a fix
title, Fixes commit info, and the bugzilla id it is fixing?

Thanks,
ferruh
Re: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter
Hi Konstantin. >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev >> Sent: Wednesday, May 16, 2018 12:04 PM >> To: dev@dpdk.org; Burakov, Anatoly >> Subject: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter >> add new function rte_frag_table_mbuf_count() that returns >> number of mbufs holded in the fragmentation table. >> There might be situations (kind of attack when a lot of >> fragmented packets are sent to a dpdk application in order >> to flood the fragmentation table) when no additional mbufs >> must be added to the fragmentations table since it already >> contains to many of them. Currently there is no way to >> determine the number of mbufs holded int the fragmentation >> table. This patch allows to keep track of the number of mbufs >> holded in the fragmentation table. >> Signed-off-by: Alex Kiselev >> --- >> lib/librte_ip_frag/ip_frag_common.h| 12 +++- >> lib/librte_ip_frag/ip_frag_internal.c | 15 +-- >> lib/librte_ip_frag/rte_ip_frag.h | 16 >> lib/librte_ip_frag/rte_ip_frag_common.c| 1 + >> lib/librte_ip_frag/rte_ip_frag_version.map | 1 + >> lib/librte_ip_frag/rte_ipv4_reassembly.c | 2 +- >> lib/librte_ip_frag/rte_ipv6_reassembly.c | 2 +- >> 7 files changed, 36 insertions(+), 13 deletions(-) > Do we really need it? > It's quite significant code changes and the advantage looks quite small to > me... Most of the changes are just movements of some internal functions in order to reuse them in the new code. Basically, the only change I propose is adding one additional counter. > We already have use_entries, right? Let's say for example that there are 8 fragmentation tables, one table per lcore, since it doesn't support concurrent operations. use_entries variable indicates that 1000 entries are in use. Each entry can hold from 1 to 4 mbufs (RTE_LIBRTE_IP_FRAG_MAX_FRAG). 
So, you can't tell whether a fragmentation table holds 1000 mbufs or 4000, then if we multiply this number to 8 fragmentation tables the estimation would be even more incorrect. That estimation error might be critical under DOS attacks since mbufs is a pretty much limited resource. > That can be used to get some estimation for a number of mbufs in the table. > Konstantin >> diff --git a/lib/librte_ip_frag/ip_frag_common.h >> b/lib/librte_ip_frag/ip_frag_common.h >> index 0fdcc7d0f..d04e69de6 100644 >> --- a/lib/librte_ip_frag/ip_frag_common.h >> +++ b/lib/librte_ip_frag/ip_frag_common.h >> @@ -32,9 +32,9 @@ >> #endif /* IP_FRAG_TBL_STAT */ >> /* internal functions declarations */ >> -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp, >> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, >> - uint16_t ofs, uint16_t len, uint16_t more_frags); >> +struct rte_mbuf * ip_frag_process(struct rte_ip_frag_tbl *tbl, >> + struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, >> + struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags); >> struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl, >> struct rte_ip_frag_death_row *dr, >> @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const >> struct ip_frag_key * k2) >> /* put fragment on death row */ >> static inline void >> -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr) >> +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, >> + struct rte_ip_frag_death_row *dr) >> { >> uint32_t i, k; >> @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct >> rte_ip_frag_death_row *dr) >> if (fp->frags[i].mb != NULL) { >> dr->row[k++] = fp->frags[i].mb; >> fp->frags[i].mb = NULL; >> + tbl->nb_mbufs --; >> } >> } >> @@ -160,7 +162,7 @@ static inline void >> ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row >> *dr, >> struct ip_frag_pkt *fp) >> { >> - ip_frag_free(fp, dr); >> + ip_frag_free(tbl, fp, dr); >> 
ip_frag_key_invalidate(&fp->key); >> TAILQ_REMOVE(&tbl->lru, fp, lru); >> tbl->use_entries--; >> diff --git a/lib/librte_ip_frag/ip_frag_internal.c >> b/lib/librte_ip_frag/ip_frag_internal.c >> index 97470a872..eea871b7e 100644 >> --- a/lib/librte_ip_frag/ip_frag_internal.c >> +++ b/lib/librte_ip_frag/ip_frag_internal.c >> @@ -29,14 +29,13 @@ static inline void >> ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row >> *dr, >> struct ip_frag_pkt *fp, uint64_t tms) >> { >> - ip_frag_free(fp, dr); >> + ip_frag_free(tbl, fp, dr); >> ip_frag_reset(fp, tms); >> TAILQ_REMOVE(&tbl->lru, fp, lru); >> TAILQ_INSERT_TAIL(&tbl->lru, fp, lru); >> IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1); >> } >> - >> static inline void >> ipv4_frag_hash(const struct ip_frag_key *key, uint32_
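The estimation problem raised in the reply above can be put into numbers. The following sketch assumes, as in the example given, that every table reports the same use_entries value and that an entry holds between 1 and 4 mbufs; `mbuf_bounds` and `estimate_mbufs()` are hypothetical helpers, not part of librte_ip_frag.

```c
#include <stdint.h>

/* Upper limit of mbufs per entry, taken from the "1 to 4" in the mail. */
#define TOY_MAX_FRAGS_PER_ENTRY 4

struct mbuf_bounds {
	uint64_t min; /* every entry holds a single mbuf */
	uint64_t max; /* every entry holds the maximum number of mbufs */
};

/* Bounds on held mbufs when only the entry count per table is known. */
static struct mbuf_bounds
estimate_mbufs(uint32_t use_entries, uint32_t nb_tables)
{
	struct mbuf_bounds b;

	b.min = (uint64_t)use_entries * nb_tables;
	b.max = b.min * TOY_MAX_FRAGS_PER_ENTRY;
	return b;
}
```

With 1000 entries in use in each of 8 tables, the bounds are 8000 and 32000 mbufs, so the entry count alone leaves a fourfold uncertainty. An exact per-table counter, as the patch proposes, removes that gap.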
[dpdk-dev] [RFC 01/10] eal: add --no-shared-files option
This command-line option will cause DPDK to not create any shared files at runtime, including any shared configuration or hugetlbfs files. This is useful for debug purposes, as well as for certain use cases like containers. Currently, this option does nothing. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 7 +++ lib/librte_eal/common/eal_internal_cfg.h | 1 + lib/librte_eal/common/eal_options.h| 2 ++ 3 files changed, 10 insertions(+) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index ecebb2923..38df094de 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -66,6 +66,7 @@ eal_long_options[] = { {OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM }, {OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM }, {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM}, + {OPT_NO_SHARED_FILES, 0, NULL, OPT_NO_SHARED_FILES_NUM }, {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM}, {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM}, {OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM}, @@ -1165,6 +1166,10 @@ eal_parse_common_option(int opt, const char *optarg, conf->no_shconf = 1; break; + case OPT_NO_SHARED_FILES_NUM: + conf->no_shared_files = 1; + break; + case OPT_PROC_TYPE_NUM: conf->process_type = eal_parse_proc_type(optarg); break; @@ -1370,6 +1375,8 @@ eal_common_usage(void) " Set specific log level\n" " -v Display version information on startup\n" " -h, --help This help\n" + " --"OPT_NO_SHARED_FILES" Do not create any shared files (config, hugetlbfs, etc.).\n" + " This disables secondary process support\n" "\nEAL options for DEBUG use only:\n" " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n" " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n" diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index c4cbf3acd..3fc71bb49 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ 
-41,6 +41,7 @@ struct internal_config { volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping * instead of native TSC */ volatile unsigned no_shconf; /**< true if there is no shared config */ + volatile unsigned no_shared_files; /**< true if there are no shared files to be created*/ volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */ volatile enum rte_proc_type_t process_type; /**< multi-process proc type */ /** true to try allocating memory on specific sockets */ diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index 211ae06ae..b0d9d6819 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -45,6 +45,8 @@ enum { OPT_NO_PCI_NUM, #define OPT_NO_SHCONF "no-shconf" OPT_NO_SHCONF_NUM, +#define OPT_NO_SHARED_FILES "no-shared-files" + OPT_NO_SHARED_FILES_NUM, #define OPT_SOCKET_MEM"socket-mem" OPT_SOCKET_MEM_NUM, #define OPT_SYSLOG"syslog" -- 2.17.0
[dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint
This patchset takes old debug options "--huge-unlink" and "--no-shconf"
and replaces them both with a new option, "--no-shared-files". This is a
special mode which will disable support for secondary processes, but
which will cause DPDK to not create any shared files while running -
neither hugepages nor any runtime data (everything will be entirely in
memory).

Additionally, on supported kernel/glibc versions (Linux 4.14+, glibc
2.27+), "--no-shared-files" mode will also reserve hugepages using memfd
instead of relying on hugetlbfs mountpoint. This will make it possible
to use DPDK without hugetlbfs mountpoints (e.g. container use cases).

This changes functionality of several command-line switches, so RFC for
now. Maybe we could leave the old switches as they are and deprecate
them in the next release?

Anatoly Burakov (10):
  eal: add --no-shared-files option
  eal: make --no-shconf an alias for --no-shared-files
  eal: make --huge-unlink an alias for --no-shared-files
  fbarray: support no-shared-files mode
  mem: add support for no-shared-files mode
  ipc: add support for no-shared-files mode
  eal: add support for no-shared-files for hugepage info
  eal: add support for no-shared-files in hugepage data file
  eal: do not create runtime dir in no-shared-files mode
  mem: enable memfd-based hugepage allocation

 lib/librte_eal/bsdapp/eal/eal.c               |   7 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c    |  71 +
 lib/librte_eal/common/eal_common_memory.c     |   3 +-
 lib/librte_eal/common/eal_common_options.c    |  25 ++--
 lib/librte_eal/common/eal_common_proc.c       |  25
 lib/librte_eal/common/eal_internal_cfg.h      |   3 +-
 lib/librte_eal/common/eal_options.h           |   7 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  18 ++-
 .../linuxapp/eal/eal_hugepage_info.c          | 140 ++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 126 +++-
 lib/librte_eal/linuxapp/eal/eal_memfd.h       |  28
 lib/librte_eal/linuxapp/eal/eal_memory.c      |  19 ++-
 test/test/test_eal_flags.c                    |  18 +--
 14 files changed, 384 insertions(+), 110 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h

-- 
2.17.0
[dpdk-dev] [RFC 05/10] mem: add support for no-shared-files mode
Unlink hugepages after creating them, to honor the no shared files mode. We cannot resize non-existing files, so make single file segments explicitly unsupported. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal.c | 9 + lib/librte_eal/linuxapp/eal/eal_memalloc.c | 23 +++--- 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 32ca25dc2..7904f813e 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -690,6 +690,15 @@ eal_parse_args(int argc, char **argv) goto out; } + if (internal_config.single_file_segments && + internal_config.no_shared_files) { + RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is " + "incompatible with --"OPT_NO_SHARED_FILES"\n"); + eal_usage(prgname); + ret = -1; + goto out; + } + if (optind >= 0) argv[optind-1] = prgname; ret = optind-1; diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c index 8c11f98c9..f57d307dd 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c +++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c @@ -512,6 +512,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, __func__, strerror(errno)); goto resized; } + if (internal_config.no_shared_files) { + if (unlink(path)) { + RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n", + __func__, strerror(errno)); + goto resized; + } + } } /* @@ -562,8 +569,11 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id, (unsigned int)(alloc_sz >> 20)); goto mapped; } - /* for non-single file segments, we can close fd here */ - if (!internal_config.single_file_segments) + /* for non-single file segments or no shared files mode, we can close fd +* here +*/ + if (!internal_config.single_file_segments || + internal_config.no_shared_files) close(fd); /* we need to trigger a write to the page to enforce page fault and @@ -592,7 +602,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int 
socket_id, /* ignore failure, can't make it any worse */ } else { /* only remove file if we can take out a write lock */ - if (lock(fd, LOCK_EX) == 1) + if (internal_config.no_shared_files == 0 && + lock(fd, LOCK_EX) == 1) unlink(path); close(fd); } @@ -617,6 +628,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi, return -1; } + /* if we're no in shared files mode, nothing needs to be done */ + if (internal_config.no_shared_files) { + memset(ms, 0, sizeof(*ms)); + return 0; + } + /* if we are not in single file segments mode, we're going to unmap the * segment and thus drop the lock on original fd, but hugepage dir is * now locked so we can take out another one without races. -- 2.17.0
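The patch relies on a general POSIX property: once a file has been opened, removing its name does not invalidate the descriptor or any mapping made from it. A minimal sketch of that property follows, using an ordinary temporary file in place of a hugetlbfs file; `map_unlinked_file()` and the `/tmp` template are assumptions made for illustration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for mkstemp() under strict -std= modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map len bytes backed by a file that has no name by the time we return.
 * Returns NULL on failure.
 */
static void *
map_unlinked_file(size_t len)
{
	char path[] = "/tmp/unlinked_map_XXXXXX"; /* hypothetical location */
	void *addr;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return NULL;

	/* drop the name right away; the inode lives on while fd is open */
	if (unlink(path) < 0 || ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* the mapping keeps the backing storage alive after close() */
	close(fd);

	return addr == MAP_FAILED ? NULL : addr;
}
```

The storage is released once the last mapping is removed with munmap(). Because no name remains, no other process can open the file, which is consistent with the commit message's statement that secondary process support is unavailable in this mode.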
[dpdk-dev] [RFC 06/10] ipc: add support for no-shared-files mode
IPC is an inter-process communication mechanism. Since no secondaries can ever be expected to run in no shared files mode, IPC will be useless, so do not enable it in the first place. In the interests of API usage convenience, we will still allow registering callbacks, but obviously they won't ever be triggered. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_proc.c | 25 + 1 file changed, 25 insertions(+) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index 707d8ab30..6cce4e925 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -626,6 +626,14 @@ rte_mp_channel_init(void) int dir_fd; pthread_t mp_handle_tid, async_reply_handle_tid; + /* in no shared files mode, we do not have secondary processes support, +* so no need to initialize IPC. +*/ + if (internal_config.no_shared_files) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n"); + return 0; + } + /* create filter path */ create_socket_path("*", path, sizeof(path)); strlcpy(mp_filter, basename(path), sizeof(mp_filter)); @@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply, if (check_input(req) == false) return -1; + + if (internal_config.no_shared_files) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts, if (check_input(req) == false) return -1; + + if (internal_config.no_shared_files) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + if (gettimeofday(&now, NULL) < 0) { RTE_LOG(ERR, EAL, "Faile to get current time\n"); rte_errno = errno; @@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer) return -1; } + if 
(internal_config.no_shared_files) { + RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n"); + return 0; + } + return mp_send(msg, peer, MP_REP); } -- 2.17.0
[dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode
Now that the rest of the EAL is adjusted to not create any shared files,
prevent runtime directory from ever being created.

Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4dff1804e..3ba2502cc 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shared_files == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 7904f813e..c0b2b1a5a 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -827,7 +827,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shared_files == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
-- 
2.17.0
[dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files
Move all functionality associated with --no-shconf to --no-shared-files, and make the former an alias for the latter. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal.c| 4 ++-- lib/librte_eal/common/eal_common_memory.c | 3 ++- lib/librte_eal/common/eal_common_options.c | 8 ++-- lib/librte_eal/common/eal_internal_cfg.h | 1 - lib/librte_eal/common/eal_options.h| 2 +- lib/librte_eal/linuxapp/eal/eal.c | 6 +++--- test/test/test_eal_flags.c | 18 +- 7 files changed, 19 insertions(+), 23 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index dc279542d..4dff1804e 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -222,7 +222,7 @@ rte_eal_config_create(void) const char *pathname = eal_runtime_config_path(); - if (internal_config.no_shconf) + if (internal_config.no_shared_files) return; if (mem_cfg_fd < 0){ @@ -261,7 +261,7 @@ rte_eal_config_attach(void) void *rte_mem_cfg_addr; const char *pathname = eal_runtime_config_path(); - if (internal_config.no_shconf) + if (internal_config.no_shared_files) return; if (mem_cfg_fd < 0){ diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c index 4f0688f9d..a9c4b9b68 100644 --- a/lib/librte_eal/common/eal_common_memory.c +++ b/lib/librte_eal/common/eal_common_memory.c @@ -938,7 +938,8 @@ rte_eal_memory_init(void) if (retval < 0) goto fail; - if (internal_config.no_shconf == 0 && rte_eal_memdevice_init() < 0) + if (internal_config.no_shared_files == 0 && + rte_eal_memdevice_init() < 0) goto fail; return 0; diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 38df094de..0f3eb928a 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -65,7 +65,7 @@ eal_long_options[] = { {OPT_NO_HPET, 0, NULL, OPT_NO_HPET_NUM }, {OPT_NO_HUGE, 0, NULL, OPT_NO_HUGE_NUM }, {OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM }, - 
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM}, + {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHARED_FILES_NUM }, {OPT_NO_SHARED_FILES, 0, NULL, OPT_NO_SHARED_FILES_NUM }, {OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM}, {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM}, @@ -1162,10 +1162,6 @@ eal_parse_common_option(int opt, const char *optarg, conf->vmware_tsc_map = 1; break; - case OPT_NO_SHCONF_NUM: - conf->no_shconf = 1; - break; - case OPT_NO_SHARED_FILES_NUM: conf->no_shared_files = 1; break; @@ -1382,6 +1378,6 @@ eal_common_usage(void) " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n" " --"OPT_NO_PCI"Disable PCI\n" " --"OPT_NO_HPET" Disable HPET\n" - " --"OPT_NO_SHCONF" No shared config (mmap'd files)\n" + " --"OPT_NO_SHCONF" Deprecated. Alias for --no-shared-files\n" "\n", RTE_MAX_LCORE); } diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index 3fc71bb49..d80bacd4d 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -40,7 +40,6 @@ struct internal_config { volatile unsigned no_hpet;/**< true to disable HPET */ volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping * instead of native TSC */ - volatile unsigned no_shconf; /**< true if there is no shared config */ volatile unsigned no_shared_files; /**< true if there are no shared files to be created*/ volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */ volatile enum rte_proc_type_t process_type; /**< multi-process proc type */ diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index b0d9d6819..6890d4114 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -43,8 +43,8 @@ enum { OPT_NO_HUGE_NUM, #define OPT_NO_PCI"no-pci" OPT_NO_PCI_NUM, +/* no-shconf is an alias for no-shared-files */ #define OPT_NO_SHCONF "no-shconf" - OPT_NO_SHCONF_NUM, #define OPT_NO_SHARED_FILES "no-shared-files" 
OPT_NO_SHARED_FILES_NUM, #define OPT_SOCKET_MEM"socket-mem" diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_e
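The aliasing technique in this patch is to give two long option names the same numeric value, so that a single case in the parser handles both. A standalone sketch with getopt_long() is shown below; `toy_long_options`, `toy_config` and `toy_parse_args()` are hypothetical names and do not reproduce the EAL parser.

```c
#include <getopt.h>
#include <stddef.h>

enum {
	/* start above 255 to stay clear of single-character options */
	TOY_OPT_LONG_MIN = 256,
	TOY_OPT_NO_SHARED_FILES_NUM,
};

/* Both names map to the same value, which makes the first an alias. */
static const struct option toy_long_options[] = {
	{"no-shconf",       no_argument, NULL, TOY_OPT_NO_SHARED_FILES_NUM},
	{"no-shared-files", no_argument, NULL, TOY_OPT_NO_SHARED_FILES_NUM},
	{NULL, 0, NULL, 0},
};

struct toy_config {
	unsigned int no_shared_files;
};

/* Returns 0 on success, -1 if an unknown option is seen. */
static int
toy_parse_args(int argc, char **argv, struct toy_config *conf)
{
	int opt;

	optind = 1; /* restart scanning so the function can be reused */
	opterr = 0; /* keep getopt_long() from printing its own errors */

	while ((opt = getopt_long(argc, argv, "", toy_long_options,
			NULL)) != -1) {
		switch (opt) {
		case TOY_OPT_NO_SHARED_FILES_NUM:
			conf->no_shared_files = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

One consequence of this design is that the parser cannot tell which spelling the user typed, so a deprecation warning for the old name would need a separate value or an extra check.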
[dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info
Do not create any shared hugepage size info files if we were asked to not create any shared files. Signed-off-by: Anatoly Burakov --- lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 4 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 2 files changed, 8 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c index 836feb672..4b2f71c7e 100644 --- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c @@ -101,6 +101,10 @@ eal_hugepage_info_init(void) hpi->num_pages[0] = num_buffers; hpi->lock_descriptor = fd; + /* for no shared files mode, do not create shared memory config */ + if (internal_config.no_shared_files) + return 0; + tmp_hpi = create_shared_memory(eal_hugepage_info_path(), sizeof(internal_config.hugepage_info)); if (tmp_hpi == NULL ) { diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 7eca711ba..02b1c4ff1 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -446,6 +446,10 @@ eal_hugepage_info_init(void) if (hugepage_info_init() < 0) return -1; + /* for no shared files mode, we're done */ + if (internal_config.no_shared_files) + return 0; + hpi = &internal_config.hugepage_info[0]; tmp_hpi = create_shared_memory(eal_hugepage_info_path(), -- 2.17.0
[dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file
Do not create a shared hugepage data file if we were asked to not create
any shared files.

Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5e1810712..d7b43b5c1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
 	void *retval;
-	int fd = open(filename, O_CREAT | O_RDWR, 0666);
+	int fd;
+
+	/* if no shared files mode is used, create anonymous memory instead */
+	if (internal_config.no_shared_files) {
+		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (retval == MAP_FAILED)
+			return NULL;
+		return retval;
+	}
+
+	fd = open(filename, O_CREAT | O_RDWR, 0666);
 	if (fd < 0)
 		return NULL;
 	if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.0
[dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode
When using --no-shared-files option, the expectation is that no multiprocess will be supported as no shared files are created. However, fbarray still creates some shared files that prevent multiple processes with the same prefix from starting. Fix this by avoiding creating shared files whenever noshconf option is specified. Since virtual areas we get from eal_get_virtual_area() are read-only, remap them as writable. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_fbarray.c | 71 +- 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c index 019f84c18..69576c8a8 100644 --- a/lib/librte_eal/common/eal_common_fbarray.c +++ b/lib/librte_eal/common/eal_common_fbarray.c @@ -434,39 +434,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len, if (data == NULL) goto fail; - eal_get_fbarray_path(path, sizeof(path), name); + if (internal_config.no_shared_files) { + /* remap virtual area as writable */ + void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (new_data == MAP_FAILED) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n", + __func__, strerror(errno)); + goto fail; + } + } else { + eal_get_fbarray_path(path, sizeof(path), name); - /* -* Each fbarray is unique to process namespace, i.e. the filename -* depends on process prefix. Try to take out a lock and see if we -* succeed. If we don't, someone else is using it already. -*/ - fd = open(path, O_CREAT | O_RDWR, 0600); - if (fd < 0) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = errno; - goto fail; - } else if (flock(fd, LOCK_EX | LOCK_NB)) { - RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__, - path, strerror(errno)); - rte_errno = EBUSY; - goto fail; - } + /* +* Each fbarray is unique to process namespace, i.e. 
the +* filename depends on process prefix. Try to take out a lock +* and see if we succeed. If we don't, someone else is using it +* already. +*/ + fd = open(path, O_CREAT | O_RDWR, 0600); + if (fd < 0) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = errno; + goto fail; + } else if (flock(fd, LOCK_EX | LOCK_NB)) { + RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", + __func__, path, strerror(errno)); + rte_errno = EBUSY; + goto fail; + } - /* take out a non-exclusive lock, so that other processes could still -* attach to it, but no other process could reinitialize it. -*/ - if (flock(fd, LOCK_SH | LOCK_NB)) { - rte_errno = errno; - goto fail; - } + /* take out a non-exclusive lock, so that other processes could +* still attach to it, but no other process could reinitialize +* it. +*/ + if (flock(fd, LOCK_SH | LOCK_NB)) { + rte_errno = errno; + goto fail; + } - if (resize_and_map(fd, data, mmap_len)) - goto fail; + if (resize_and_map(fd, data, mmap_len)) + goto fail; - /* we've mmap'ed the file, we can now close the fd */ - close(fd); + /* we've mmap'ed the file, we can now close the fd */ + close(fd); + } /* initialize the data */ memset(data, 0, mmap_len); -- 2.17.0
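The remap step in the no-shared-files branch above can be tried outside DPDK. The following is a minimal sketch, assuming a Linux system; reserve_readonly() and remap_writable() are hypothetical helpers written for this note, not EAL functions. The first stands in for the read-only area that, per the commit message, eal_get_virtual_area() returns.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `len` bytes of read-only anonymous
 * address space, standing in for the area handed out by EAL. */
static void *
reserve_readonly(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: replace the reservation in place with a
 * writable anonymous mapping. MAP_FIXED makes the kernel discard
 * whatever was mapped at `addr` and map the new pages there. */
static int
remap_writable(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	assert(p == addr); /* with MAP_FIXED the address does not move */
	return 0;
}
```

Because the new mapping is anonymous and private, nothing appears on the filesystem, which is the point of the branch: no file is left behind for another process with the same prefix to trip over.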
[dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation
This will supplant no-shared-files mode to use memfd-based hugetlbfs allocation instead of hugetlbfs mounts. Due to memfd only being supported kernel 4.14+ and glibc 2.27+, a compile-time check is performed along with runtime checks. Signed-off-by: Anatoly Burakov --- .../linuxapp/eal/eal_hugepage_info.c | 136 ++ lib/librte_eal/linuxapp/eal/eal_memalloc.c| 105 +- lib/librte_eal/linuxapp/eal/eal_memfd.h | 28 lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +- 4 files changed, 234 insertions(+), 39 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 02b1c4ff1..1a80ee0ee 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -30,6 +30,7 @@ #include "eal_internal_cfg.h" #include "eal_hugepages.h" #include "eal_filesystem.h" +#include "eal_memfd.h" static const char sys_dir_path[] = "/sys/kernel/mm/hugepages"; static const char sys_pages_numa_dir_path[] = "/sys/devices/system/node"; @@ -313,11 +314,85 @@ compare_hpi(const void *a, const void *b) return hpi_b->hugepage_sz - hpi_a->hugepage_sz; } +static void +calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent) +{ + uint64_t total_pages = 0; + unsigned int i; + + /* +* first, try to put all hugepages into relevant sockets, but +* if first attempts fails, fall back to collecting all pages +* in one socket and sorting them later +*/ + total_pages = 0; + /* we also don't want to do this for legacy init */ + if (!internal_config.legacy_mem) + for (i = 0; i < rte_socket_count(); i++) { + int socket = rte_socket_id_by_idx(i); + unsigned int num_pages = + get_num_hugepages_on_node( + dirent->d_name, socket); + hpi->num_pages[socket] = num_pages; + total_pages += num_pages; + } + /* +* we failed to sort memory from the get go, so fall +* back to old way +*/ + if (total_pages == 0) { + hpi->num_pages[0] = 
get_num_hugepages(dirent->d_name); + +#ifndef RTE_ARCH_64 + /* for 32-bit systems, limit number of hugepages to +* 1GB per page size */ + hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0], + RTE_PGSIZE_1G / hpi->hugepage_sz); +#endif + } +} + +static int +check_memfd_pagesize_supported(uint64_t page_sz) +{ +#ifdef MEMFD_SUPPORTED + int sz_flag, fd; + + /* first, check if this particular pagesize is supported */ + sz_flag = eal_memalloc_get_memfd_pagesize_flag(page_sz); + if (sz_flag == 0) { + RTE_LOG(ERR, EAL, "Unexpected memfd hugepage size: %" + PRIu64" bytes\n", page_sz); + return 0; + } + + /* does currently running kernel support it? */ + fd = memfd_create("memfd_test", sz_flag | MFD_HUGETLB); + if (fd >= 0) { + /* success */ + close(fd); + return 1; + } + /* creating memfd failed, but if the error wasn't EINVAL, reserving of +* hugepages via memfd is supported by the kernel +*/ + if (errno != EINVAL) { + return 1; + } + RTE_LOG(DEBUG, EAL, "Kernel does not support memfd hugepages of size %" + PRIu64" bytes\n", page_sz); +#else + RTE_LOG(DEBUG, EAL, "Memfd hugepage support not enabled at compile time\n"); + RTE_SET_USED(page_sz); +#endif + return 0; +} + static int hugepage_info_init(void) { const char dirent_start_text[] = "hugepages-"; const size_t dirent_start_len = sizeof(dirent_start_text) - 1; - unsigned int i, total_pages, num_sizes = 0; + unsigned int i, num_sizes = 0; DIR *dir; struct dirent *dirent; @@ -343,6 +418,10 @@ hugepage_info_init(void) hpi->hugepage_sz = rte_str_to_size(&dirent->d_name[dirent_start_len]); + /* by default, memfd_hugepage_supported is 1 */ + memfd_hugepage_supported &= + check_memfd_pagesize_supported(hpi->hugepage_sz); + /* first, check if we have a mountpoint */ if (get_hugepage_dir(hpi->hugepage_sz, hpi->hugedir, sizeof(hpi->hugedir)) < 0) { @@ -355,6 +434,23 @@ hugepage_info_init(void) "%" PRIu64 " reserved, but no mounted " "hugetlbfs found for that size\n", num_pages, hpi->hugepage_sz); + +
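The runtime probe in check_memfd_pagesize_supported() can be approximated in isolation. This is a rough sketch, assuming Linux with a glibc that exposes memfd_create(); memfd_hugepage_probe() is a hypothetical name, and the page-size flag is passed in by the caller because the helper that computes it is not shown in the quoted part of the patch. The fallback define uses the value from the kernel UAPI header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U /* value from the kernel's <linux/memfd.h> */
#endif

/* Hypothetical probe mirroring the logic in the patch: returns 1 if
 * the running kernel appears to accept hugetlb memfds with the given
 * size flag, 0 otherwise. A size flag of 0 means "default hugepage
 * size". */
static int
memfd_hugepage_probe(unsigned int size_flag)
{
	int fd = memfd_create("memfd_probe", size_flag | MFD_HUGETLB);

	if (fd >= 0) {
		/* the kernel created the memfd, so the flag is understood */
		close(fd);
		return 1;
	}
	/* As in the patch, any failure other than EINVAL is read as
	 * "flag understood, but creation failed for another reason". */
	return errno != EINVAL;
}
```

Note that this inherits the patch's reading of errno: a failure such as ENOMEM counts as "supported", so the result only says whether the flag combination is recognized, not whether hugepages are actually available.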
[dpdk-dev] [PATCH] app/testpmd: fix missing count action fields
COUNT action has been modified and has several fields not addressable though testpmd. In addition, as those fields are not definable testpmd is providing an empty configuration which is undefined. Fixes: fb8fd96d4251 ("ethdev: add shared counter to flow API") Cc: declan.dohe...@intel.com Cc: sta...@dpdk.org Signed-off-by: Nelio Laranjeiro --- app/test-pmd/cmdline_flow.c | 29 +++-- lib/librte_ethdev/rte_flow.c | 2 +- 2 files changed, 28 insertions(+), 3 deletions(-) diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c index 9918d7fda..934cf7e90 100644 --- a/app/test-pmd/cmdline_flow.c +++ b/app/test-pmd/cmdline_flow.c @@ -194,6 +194,8 @@ enum index { ACTION_QUEUE_INDEX, ACTION_DROP, ACTION_COUNT, + ACTION_COUNT_SHARED, + ACTION_COUNT_ID, ACTION_RSS, ACTION_RSS_FUNC, ACTION_RSS_LEVEL, @@ -788,6 +790,13 @@ static const enum index action_queue[] = { ZERO, }; +static const enum index action_count[] = { + ACTION_COUNT_ID, + ACTION_COUNT_SHARED, + ACTION_NEXT, + ZERO, +}; + static const enum index action_rss[] = { ACTION_RSS_FUNC, ACTION_RSS_LEVEL, @@ -2022,10 +2031,26 @@ static const struct token token_list[] = { [ACTION_COUNT] = { .name = "count", .help = "enable counters for this rule", - .priv = PRIV_ACTION(COUNT, 0), - .next = NEXT(NEXT_ENTRY(ACTION_NEXT)), + .priv = PRIV_ACTION(COUNT, + sizeof(struct rte_flow_action_count)), + .next = NEXT(action_count), .call = parse_vc, }, + [ACTION_COUNT_ID] = { + .name = "identifier", + .help = "counter identifier to use", + .next = NEXT(action_count, NEXT_ENTRY(UNSIGNED)), + .args = ARGS(ARGS_ENTRY(struct rte_flow_action_count, id)), + .call = parse_vc_conf, + }, + [ACTION_COUNT_SHARED] = { + .name = "shared", + .help = "shared counter", + .next = NEXT(action_count, NEXT_ENTRY(BOOLEAN)), + .args = ARGS(ARGS_ENTRY_BF(struct rte_flow_action_count, + shared, 1)), + .call = parse_vc_conf, + }, [ACTION_RSS] = { .name = "rss", .help = "spread packets among several queues", diff --git a/lib/librte_ethdev/rte_flow.c 
b/lib/librte_ethdev/rte_flow.c index b2afba089..2e87e59f3 100644 --- a/lib/librte_ethdev/rte_flow.c +++ b/lib/librte_ethdev/rte_flow.c @@ -84,7 +84,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = { MK_FLOW_ACTION(FLAG, 0), MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)), MK_FLOW_ACTION(DROP, 0), - MK_FLOW_ACTION(COUNT, 0), + MK_FLOW_ACTION(COUNT, sizeof(struct rte_flow_action_count)), MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)), MK_FLOW_ACTION(PF, 0), MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)), -- 2.17.1
[dpdk-dev] [RFC 03/10] eal: make --huge-unlink an alias for --no-shared-files
Move all functionality associated with --huge-unlink command-line option to --no-shared-files, and make it an alias. Since the new command-line option does things other than just unlinking hugepage files after they've been created, it is no longer incompatible with --no-huge option, so removing that check as well. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/eal_common_options.c | 14 ++ lib/librte_eal/common/eal_internal_cfg.h | 1 - lib/librte_eal/common/eal_options.h| 5 ++--- lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +- 4 files changed, 5 insertions(+), 17 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 0f3eb928a..63e562bdb 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -57,7 +57,7 @@ eal_long_options[] = { {OPT_FILE_PREFIX, 1, NULL, OPT_FILE_PREFIX_NUM }, {OPT_HELP, 0, NULL, OPT_HELP_NUM }, {OPT_HUGE_DIR, 1, NULL, OPT_HUGE_DIR_NUM }, - {OPT_HUGE_UNLINK, 0, NULL, OPT_HUGE_UNLINK_NUM }, + {OPT_HUGE_UNLINK, 0, NULL, OPT_NO_SHARED_FILES_NUM }, {OPT_LCORES,1, NULL, OPT_LCORES_NUM }, {OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM}, {OPT_MASTER_LCORE, 1, NULL, OPT_MASTER_LCORE_NUM }, @@ -1140,10 +1140,6 @@ eal_parse_common_option(int opt, const char *optarg, break; /* long options */ - case OPT_HUGE_UNLINK_NUM: - conf->hugepage_unlink = 1; - break; - case OPT_NO_HUGE_NUM: conf->no_hugetlbfs = 1; /* no-huge is legacy mem */ @@ -1318,12 +1314,6 @@ eal_check_common_options(struct internal_config *internal_cfg) return -1; } - if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) { - RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot " - "be specified together with --"OPT_NO_HUGE"\n"); - return -1; - } - return 0; } @@ -1374,7 +1364,7 @@ eal_common_usage(void) " --"OPT_NO_SHARED_FILES" Do not create any shared files (config, hugetlbfs, etc.).\n" " This disables secondary process support\n" "\nEAL options for DEBUG use 
only:\n" - " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n" + " --"OPT_HUGE_UNLINK" Deprecated. Alias for --no-shared-files\n" " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n" " --"OPT_NO_PCI"Disable PCI\n" " --"OPT_NO_HPET" Disable HPET\n" diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index d80bacd4d..887a6a8e2 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -35,7 +35,6 @@ struct internal_config { volatile unsigned force_nchannel; /**< force number of channels */ volatile unsigned force_nrank;/**< force number of ranks */ volatile unsigned no_hugetlbfs; /**< true to disable hugetlbfs */ - unsigned hugepage_unlink; /**< true to unlink backing files */ volatile unsigned no_pci; /**< true to disable PCI */ volatile unsigned no_hpet;/**< true to disable HPET */ volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h index 6890d4114..aef696c92 100644 --- a/lib/librte_eal/common/eal_options.h +++ b/lib/librte_eal/common/eal_options.h @@ -25,8 +25,6 @@ enum { OPT_FILE_PREFIX_NUM, #define OPT_HUGE_DIR "huge-dir" OPT_HUGE_DIR_NUM, -#define OPT_HUGE_UNLINK "huge-unlink" - OPT_HUGE_UNLINK_NUM, #define OPT_LCORES"lcores" OPT_LCORES_NUM, #define OPT_LOG_LEVEL "log-level" @@ -43,7 +41,8 @@ enum { OPT_NO_HUGE_NUM, #define OPT_NO_PCI"no-pci" OPT_NO_PCI_NUM, -/* no-shconf is an alias for no-shared-files */ +/* huge-unlink and no-shconf are alias for no-shared-files */ +#define OPT_HUGE_UNLINK "huge-unlink" #define OPT_NO_SHCONF "no-shconf" #define OPT_NO_SHARED_FILES "no-shared-files" OPT_NO_SHARED_FILES_NUM, diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index c917de1c2..5e1810712 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -1547,7 +1547,7 @@ 
eal_legacy_hugepage_init(void) } /* free the hugepage backing files */ - if (internal_config.hugepage_unlink && + if (internal_config.no_shared_files && u
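The aliasing trick used in eal_long_options[] above is plain getopt_long() behaviour: several long option names may share one `val`, so they all reach the same `case`. A minimal sketch, assuming glibc-style getopt_long(); the enum values, `struct config` and parse_args() are made up for this example and are not the EAL definitions.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Illustrative option ids, kept above 255 so they cannot collide with
 * short option characters. */
enum {
	OPT_NO_SHARED_FILES_NUM = 256,
	OPT_NO_HUGE_NUM,
};

/* Three spellings map to one id, so they are handled by one case. */
static const struct option long_options[] = {
	{"no-shared-files", no_argument, NULL, OPT_NO_SHARED_FILES_NUM},
	{"no-shconf",       no_argument, NULL, OPT_NO_SHARED_FILES_NUM},
	{"huge-unlink",     no_argument, NULL, OPT_NO_SHARED_FILES_NUM},
	{"no-huge",         no_argument, NULL, OPT_NO_HUGE_NUM},
	{NULL, 0, NULL, 0},
};

struct config {
	int no_shared_files;
	int no_hugetlbfs;
};

/* Returns 0 on success, -1 on an unknown option. */
static int
parse_args(int argc, char **argv, struct config *conf)
{
	int opt;

	assert(conf != NULL);
	conf->no_shared_files = 0;
	conf->no_hugetlbfs = 0;
	optind = 1; /* restart scanning so the function can be re-run */
	opterr = 0; /* keep getopt quiet; the caller reports errors */

	while ((opt = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		switch (opt) {
		case OPT_NO_SHARED_FILES_NUM:
			conf->no_shared_files = 1;
			break;
		case OPT_NO_HUGE_NUM:
			conf->no_hugetlbfs = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With the alias in place there is no separate "huge-unlink" state left to validate, which matches the patch dropping both the hugepage_unlink field and the check against --no-huge.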
[dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add
Add a call to rte_eth_link_get_nowait on every slave to update
the internal link status struct. Otherwise slave add will fail
for mode 4 if the ports are all stopped but only one of them checked.

Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52

Signed-off-by: Radu Nicolau
---
v2: add fix and Bugzilla references

 drivers/net/bonding/rte_eth_bond_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index d558df8..cad08b9 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
 		return -1;
 	}
 
+	rte_eth_link_get_nowait(slave_port_id, &link_props);
+
 	slave_add(internals, slave_eth_dev);
 
 	/* We need to store slaves reta_size to be able to synchronize RETA for all
-- 
2.7.5
Re: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete expired entries
Hi Konstantin. > Hi Alex, >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev >> Sent: Wednesday, May 16, 2018 12:04 PM >> To: dev@dpdk.org; Burakov, Anatoly >> Subject: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete >> expired entries >> add new function rte_frag_table_del_expired_entries() >> that scans the list of recently used packets and delete the expired ones. >> A fragmented packets is supposed to live no longer than max_cycles, >> but the lib deletes an expired packet only occasionally when it scans >> a bucket to find an empty slot while adding a new packet. >> Therefore a fragment might sit in the table forever. >> Signed-off-by: Alex Kiselev >> --- >> lib/librte_ip_frag/ip_frag_common.h| 18 >> lib/librte_ip_frag/ip_frag_internal.c | 18 >> lib/librte_ip_frag/rte_ip_frag.h | 19 +++- >> lib/librte_ip_frag/rte_ip_frag_common.c| 46 >> ++ >> lib/librte_ip_frag/rte_ip_frag_version.map | 6 >> 5 files changed, 88 insertions(+), 19 deletions(-) >> diff --git a/lib/librte_ip_frag/ip_frag_common.h >> b/lib/librte_ip_frag/ip_frag_common.h >> index 197acf8d8..0fdcc7d0f 100644 >> --- a/lib/librte_ip_frag/ip_frag_common.h >> +++ b/lib/librte_ip_frag/ip_frag_common.h >> @@ -25,6 +25,12 @@ >> #define IPv6_KEY_BYTES_FMT \ >> "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 >> +#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT >> +#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v)) >> +#else >> +#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0) >> +#endif /* IP_FRAG_TBL_STAT */ >> + >> /* internal functions declarations */ >> struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp, >> struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, >> @@ -149,4 +155,16 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms) >> fp->frags[IP_FIRST_FRAG_IDX] = zero_frag; >> } >> +/* local frag table helper functions */ >> +static inline void >> +ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row >> *dr, >> + 
struct ip_frag_pkt *fp) >> +{ >> + ip_frag_free(fp, dr); >> + ip_frag_key_invalidate(&fp->key); >> + TAILQ_REMOVE(&tbl->lru, fp, lru); >> + tbl->use_entries--; >> + IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1); >> +} >> + >> #endif /* _IP_FRAG_COMMON_H_ */ >> diff --git a/lib/librte_ip_frag/ip_frag_internal.c >> b/lib/librte_ip_frag/ip_frag_internal.c >> index 2560c7713..97470a872 100644 >> --- a/lib/librte_ip_frag/ip_frag_internal.c >> +++ b/lib/librte_ip_frag/ip_frag_internal.c >> @@ -14,24 +14,6 @@ >> #define IP_FRAG_TBL_POS(tbl, sig) \ >> ((tbl)->pkt + ((sig) & (tbl)->entry_mask)) >> -#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT >> -#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v)) >> -#else >> -#define IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0) >> -#endif /* IP_FRAG_TBL_STAT */ >> - >> -/* local frag table helper functions */ >> -static inline void >> -ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row >> *dr, >> - struct ip_frag_pkt *fp) >> -{ >> - ip_frag_free(fp, dr); >> - ip_frag_key_invalidate(&fp->key); >> - TAILQ_REMOVE(&tbl->lru, fp, lru); >> - tbl->use_entries--; >> - IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1); >> -} >> - >> static inline void >> ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, >> const struct ip_frag_key *key, uint64_t tms) >> diff --git a/lib/librte_ip_frag/rte_ip_frag.h >> b/lib/librte_ip_frag/rte_ip_frag.h >> index b3f3f78df..3c694df92 100644 >> --- a/lib/librte_ip_frag/rte_ip_frag.h >> +++ b/lib/librte_ip_frag/rte_ip_frag.h >> @@ -65,10 +65,13 @@ struct ip_frag_pkt { >> #define IP_FRAG_DEATH_ROW_LEN 32 /**< death row size (in packets) */ >> +/* death row size in mbufs */ >> +#define IP_FRAG_DEATH_ROW_MBUF_LEN (IP_FRAG_DEATH_ROW_LEN * >> (IP_MAX_FRAG_NUM + 1)) >> + >> /** mbuf death row (packets to be freed) */ >> struct rte_ip_frag_death_row { >> uint32_t cnt; /**< number of mbufs currently on death row */ >> - struct rte_mbuf *row[IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM + 1)]; 
>> + struct rte_mbuf *row[IP_FRAG_DEATH_ROW_MBUF_LEN]; >> /**< mbufs to be freed */ >> }; >> @@ -325,6 +328,20 @@ void rte_ip_frag_free_death_row(struct >> rte_ip_frag_death_row *dr, >> void >> rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl >> *tbl); >> +/** >> + * Delete expired fragments >> + * >> + * @param tbl >> + * Table to delete expired fragments from >> + * @param dr >> + * Death row to free buffers to >> + * @param tms >> + * Current timestamp >> + */ >> +void __rte_experimental >> +rte_frag_table_del_expired_entries(struct rte_ip_frag_tbl *tbl, >> + struct rte_ip_frag_death_row *dr, uint64_t tms); >> + >> #ifdef __cplusp
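The body of rte_frag_table_del_expired_entries() is cut off in the quoted text, so the following is only a sketch of the idea in the commit message: walk the list of recently used entries and drop those older than max_cycles. All types and names here are simplified stand-ins, not the librte_ip_frag structures; the death row is left out, and the early exit assumes the list is ordered oldest first, which the quoted patch does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Simplified stand-in for a fragment table entry. */
struct frag_entry {
	TAILQ_ENTRY(frag_entry) lru;
	uint64_t start; /* timestamp at which reassembly began */
	int in_use;
};

TAILQ_HEAD(frag_lru, frag_entry);

/* Simplified stand-in for the fragment table. */
struct frag_table {
	struct frag_lru lru;  /* assumed to be ordered oldest first */
	uint64_t max_cycles;  /* maximum lifetime of an entry */
	uint32_t use_entries; /* entries currently linked in `lru` */
};

/* Remove every entry whose lifetime exceeded max_cycles at time `now`.
 * Returns the number of entries removed. */
static unsigned int
frag_table_del_expired(struct frag_table *tbl, uint64_t now)
{
	unsigned int deleted = 0;
	struct frag_entry *fp;

	assert(tbl != NULL);
	while ((fp = TAILQ_FIRST(&tbl->lru)) != NULL) {
		if (fp->start + tbl->max_cycles >= now)
			break; /* oldest entry still alive: stop here */
		TAILQ_REMOVE(&tbl->lru, fp, lru);
		fp->in_use = 0;
		tbl->use_entries--;
		deleted++;
	}
	return deleted;
}
```

In the real library each removed entry would also hand its mbufs to the death row, as ip_frag_tbl_del() does through ip_frag_free() in the quoted helper.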
Re: [dpdk-dev] [PATCH v1 01/24] net/ena: update ena_com to the newer version
On 5/9/2018 1:45 PM, Michal Krawczyk wrote:
> ena_com is the HAL provided by the vendor and it shouldn't be modified
> by the driver developers.
>
> The PMD and platform file was adjusted for the new version of the
> ena_com:
> * Do not use deprecated meta descriptor fields
> * Add empty AENQ handler structure with unimplemented handlers
> * Add memzone allocations count to ena_ethdev.c file - it was
>   removed from ena_com.c file
> * Add new macros used in new ena_com files
> * Use error code ENA_COM_UNSUPPORTED instead of ENA_COM_PERMISSION
>
> Signed-off-by: Michal Krawczyk
> Signed-off-by: Rafal Kozik

Hi Michał, Marcin, Guy, Evgeny,

Can you please send a new version rebased on top of latest next-net master?

Patchset gives conflicts, it is not hard to resolve but some of them are related
to the removed offload checks and invalidates the patch, it is better to you guys
to decide on it.

Thanks,
ferruh
[dpdk-dev] [PATCH] hash: validate hash bucket entries while compiling
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.

Signed-off-by: Honnappa Nagarahalli
Reviewed-by: Gavin Hu
---
 lib/librte_eal/common/include/rte_common.h | 5 +++++
 lib/librte_hash/rte_cuckoo_hash.c          | 1 -
 lib/librte_hash/rte_cuckoo_hash.h          | 4 ++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_common.h b/lib/librte_eal/common/include/rte_common.h
index 434adfd45..a9df7c161 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -293,6 +293,11 @@ rte_combine64ms1b(register uint64_t v)
 
 /*********** Macros to work with powers of 2 ********/
 
+/**
+ * Macro to return 1 if n is a power of 2, 0 otherwise
+ */
+#define RTE_IS_POWER_OF_2(n) ((n) && !(((n) - 1) & (n)))
+
 /**
  * Returns true if n is a power of 2
  * @param n
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index a07543a29..375e7d208 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -107,7 +107,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
 	/* Check for valid parameters */
 	if ((params->entries > RTE_HASH_ENTRIES_MAX) ||
 			(params->entries < RTE_HASH_BUCKET_ENTRIES) ||
-			!rte_is_power_of_2(RTE_HASH_BUCKET_ENTRIES) ||
 			(params->key_len == 0)) {
 		rte_errno = EINVAL;
 		RTE_LOG(ERR, HASH, "rte_hash_create has invalid parameters\n");
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index 7a54e5557..bd6ad1bd6 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -97,6 +97,10 @@ enum add_key_case {
 /** Number of items per bucket. */
 #define RTE_HASH_BUCKET_ENTRIES 8
 
+#if !RTE_IS_POWER_OF_2(RTE_HASH_BUCKET_ENTRIES)
+#error RTE_HASH_BUCKET_ENTRIES must be a power of 2
+#endif
+
 #define NULL_SIGNATURE 0
 
 #define EMPTY_SLOT 0
-- 
2.14.1
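The compile-time check in this patch works because the macro expands to an integer constant expression that the preprocessor can evaluate in `#if`. Below is a small standalone sketch using the same expression under local names (IS_POWER_OF_2, BUCKET_ENTRIES and power_of_2_examples() are illustrative, so nothing here depends on DPDK headers).

```c
#include <assert.h>

/* Same expression as the RTE_IS_POWER_OF_2 macro added by the patch.
 * (n) rejects zero; !((n - 1) & n) holds only when a single bit is
 * set in n. */
#define IS_POWER_OF_2(n) ((n) && !(((n) - 1) & (n)))

#define BUCKET_ENTRIES 8 /* illustrative constant */

/* Evaluated by the preprocessor: changing BUCKET_ENTRIES to, say, 12
 * stops the build here instead of failing at run time. */
#if !IS_POWER_OF_2(BUCKET_ENTRIES)
#error BUCKET_ENTRIES must be a power of 2
#endif

/* The same macro also works on ordinary run-time values. */
static void
power_of_2_examples(void)
{
	assert(IS_POWER_OF_2(1));
	assert(IS_POWER_OF_2(64));
	assert(!IS_POWER_OF_2(0));
	assert(!IS_POWER_OF_2(12));
}
```

An inline function such as rte_is_power_of_2() cannot be used in `#if`, which is presumably why the patch adds a macro next to it rather than reusing it.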
Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add
On 5/31/2018 3:34 PM, Radu Nicolau wrote:

I can see you just prefix "fix" to the title without updating it :)

What about following one:
"net/bonding: fix slave add for mode 4" ?

> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.

What is the link related expectation from slaves in mode 4?

What does "if the ports are all stopped but only one of them checked" mean, why
checking only one of them?

>
> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52
>
> Signed-off-by: Radu Nicolau
> ---
> v2: add fix and Bugzilla references
>
>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
> index d558df8..cad08b9 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
>  		return -1;
>  	}
>
> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
> +

The error seems in link_properties_valid(), does it make sense to get link info
inside that function before link checks?

>  	slave_add(internals, slave_eth_dev);
>
>  	/* We need to store slaves reta_size to be able to synchronize RETA for all
>
[dpdk-dev] [PATCH] eal: move runtime config file to new location
As per deprecation notice [1], move DPDK runtime config to default DPDK runtime data location. Also, remove the deprecation notice. [1] http://dpdk.org/dev/patchwork/patch/40418/ Signed-off-by: Anatoly Burakov --- doc/guides/rel_notes/deprecation.rst | 10 -- lib/librte_eal/common/eal_filesystem.h | 10 +++--- 2 files changed, 3 insertions(+), 17 deletions(-) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 1ce692eac..ff15baa3f 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -8,16 +8,6 @@ API and ABI deprecation notices are to be posted here. Deprecation Notices --- -* eal: DPDK runtime configuration file (located at - ``/var/run/._config``) will be moved. The new path will be as follows: - - - if DPDK is running as root, path will be set to -``/var/run/dpdk//config`` - - if DPDK is not running as root and $XDG_RUNTIME_DIR is set, path will be set -to ``$XDG_RUNTIME_DIR/dpdk//config`` - - if DPDK is not running as root and $XDG_RUNTIME_DIR is not set, path will be -set to ``/tmp/dpdk//config`` - * eal: both declaring and identifying devices will be streamlined in v18.08. New functions will appear to query a specific port from buses, classes of device and device drivers. Device declaration will be made coherent with the diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h index 364f38d13..de05febf4 100644 --- a/lib/librte_eal/common/eal_filesystem.h +++ b/lib/librte_eal/common/eal_filesystem.h @@ -12,7 +12,6 @@ #define EAL_FILESYSTEM_H /** Path of rte config file. 
*/ -#define RUNTIME_CONFIG_FMT "%s/.%s_config" #include #include @@ -30,17 +29,14 @@ eal_create_runtime_dir(void); const char * eal_get_runtime_dir(void); +#define RUNTIME_CONFIG_FNAME "config" static inline const char * eal_runtime_config_path(void) { static char buffer[PATH_MAX]; /* static so auto-zeroed */ - const char *directory = "/var/run"; - const char *home_dir = getenv("HOME"); - if (getuid() != 0 && home_dir != NULL) - directory = home_dir; - snprintf(buffer, sizeof(buffer) - 1, RUNTIME_CONFIG_FMT, directory, - internal_config.hugefile_prefix); + snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(), + RUNTIME_CONFIG_FNAME); return buffer; } -- 2.17.0
Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add
On 5/31/2018 4:34 PM, Ferruh Yigit wrote:
> On 5/31/2018 3:34 PM, Radu Nicolau wrote:
>
> I can see you just prefix "fix" to the title without updating it :)
>
> What about following one:
> "net/bonding: fix slave add for mode 4" ?
>
>> Add a call to rte_eth_link_get_nowait on every slave to update
>> the internal link status struct. Otherwise slave add will fail
>> for mode 4 if the ports are all stopped but only one of them checked.
>
> What is the link related expectation from slaves in mode 4?
>
> What does "if the ports are all stopped but only one of them checked" mean, why
> checking only one of them?
>
>>
>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
>> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52

Bugzilla ID: 52

btw, can you please send new version as reply to previous version?

>>
>> Signed-off-by: Radu Nicolau
>> ---
>> v2: add fix and Bugzilla references
>>
>>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
>> index d558df8..cad08b9 100644
>> --- a/drivers/net/bonding/rte_eth_bond_api.c
>> +++ b/drivers/net/bonding/rte_eth_bond_api.c
>> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
>>  		return -1;
>>  	}
>>
>> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
>> +
>
> The error seems in link_properties_valid(), does it make sense to get link info
> inside that function before link checks?
>
>>  	slave_add(internals, slave_eth_dev);
>>
>>  	/* We need to store slaves reta_size to be able to synchronize RETA for all
>>
>
Re: [dpdk-dev] [PATCH] net/tap: update tap index to unsgined
On 5/15/2018 1:36 PM, Wiles, Keith wrote:
>
>
>> On May 12, 2018, at 1:30 AM, Vipin Varghese wrote:
>>
>> Updating the logic to reflect unsigned integer as index for TAP PMD.
>>
>> Signed-off-by: Vipin Varghese

<...>

> Acked by Keith Wiles

Repeating ack with "-" to help patchwork:
Acked-by: Keith Wiles

Applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH] mem: mark pages as freeable on exit
When rte_eal_cleanup() is called, it is expected that DPDK will be able
to release all of its memory back to the system. However, if pages are
marked as unfreeable, the pages will not be released back.

Fix this to mark all pages as freeable on calling rte_eal_cleanup(),
but only do it for primary process, as secondaries can come and go.

Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/linuxapp/eal/eal.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..987b57f87 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -1044,9 +1044,26 @@ rte_eal_init(int argc, char **argv)
 	return fctret;
 }
 
+static int
+mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg __rte_unused)
+{
+	/* ms is const, so find this memseg */
+	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+
+	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
+
+	return 0;
+}
+
 int __rte_experimental
 rte_eal_cleanup(void)
 {
+	/* if we're in a primary process, we need to mark hugepages as freeable
+	 * so that finalization can release them back to the system.
+	 */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		rte_memseg_walk(mark_freeable, NULL);
 	rte_service_finalize();
 	return 0;
 }
-- 
2.17.0
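The shape of this fix is a walk callback that clears one flag bit on every segment, gated on the process type. A self-contained sketch of that pattern, with stand-in types: struct seg, seg_walk() and SEG_FLAG_DO_NOT_FREE are invented for this note and only imitate rte_memseg, rte_memseg_walk() and RTE_MEMSEG_FLAG_DO_NOT_FREE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for RTE_MEMSEG_FLAG_DO_NOT_FREE; the bit value is arbitrary. */
#define SEG_FLAG_DO_NOT_FREE (1u << 0)

/* Stand-in for struct rte_memseg, reduced to its flags. */
struct seg {
	unsigned int flags;
};

typedef int (*seg_walk_t)(struct seg *s, void *arg);

/* Stand-in for rte_memseg_walk(): calls `cb` on each segment; in this
 * sketch any nonzero return stops the walk and is passed back. */
static int
seg_walk(struct seg *segs, size_t n, seg_walk_t cb, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = cb(&segs[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Clear only the do-not-free bit and leave every other flag alone. */
static int
mark_freeable(struct seg *s, void *arg)
{
	(void)arg;
	s->flags &= ~SEG_FLAG_DO_NOT_FREE;
	return 0;
}

/* Only the primary process clears the flag, as in the patch. */
static void
cleanup_sketch(struct seg *segs, size_t n, int is_primary)
{
	assert(segs != NULL || n == 0);
	if (is_primary)
		seg_walk(segs, n, mark_freeable, NULL);
}
```

The real callback receives a const memseg and has to look up the writable one through rte_mem_virt2memseg(); that indirection is skipped here because the stand-in walk passes a mutable pointer directly.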
Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add
On 5/31/2018 4:36 PM, Ferruh Yigit wrote:
> On 5/31/2018 4:34 PM, Ferruh Yigit wrote:
>> On 5/31/2018 3:34 PM, Radu Nicolau wrote:
>>
>> I can see you just prefix "fix" to the title without updating it :)
>>
>> What about following one:
>> "net/bonding: fix slave add for mode 4" ?

Great, I'll use it for v3 :)

>>> Add a call to rte_eth_link_get_nowait on every slave to update
>>> the internal link status struct. Otherwise slave add will fail
>>> for mode 4 if the ports are all stopped but only one of them checked.
>>
>> What is the link related expectation from slaves in mode 4?

To be identical across all ports

>> What does "if the ports are all stopped but only one of them checked" mean, why
>> checking only one of them?

This is the behavior of testpmd, stop getting the link status after the
first down port; but this should not affect bonding, so there is no need
to update testpmd.

>>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
>>> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52
>
> Bugzilla ID: 52
>
> btw, can you please send new version as reply to previous version?

Sure.

>>> Signed-off-by: Radu Nicolau
>>> ---
>>> v2: add fix and Bugzilla references
>>>
>>>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
>>> index d558df8..cad08b9 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_api.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_api.c
>>> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
>>>  		return -1;
>>>  	}
>>>
>>> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
>>> +
>>
>> The error seems in link_properties_valid(), does it make sense to get link info
>> inside that function before link checks?

Not really, as one might expect that link_properties_valid will only test
the struct rte_eth_link *slave_link argument, not update it.

>>>  	slave_add(internals, slave_eth_dev);
>>>
>>>  	/* We need to store slaves reta_size to be able to synchronize RETA for all
[dpdk-dev] [PATCH] test/test: properly clean up on exit
The test application didn't call rte_eal_cleanup() on exit, which caused leftover hugepages and memory leaks when running secondary processes. Fix this by calling rte_eal_cleanup() on exit. Signed-off-by: Anatoly Burakov --- test/test/test.c | 33 +++-- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/test/test/test.c b/test/test/test.c index 44dfe20ef..ffa9c3669 100644 --- a/test/test/test.c +++ b/test/test/test.c @@ -84,22 +84,29 @@ main(int argc, char **argv) int ret; ret = rte_eal_init(argc, argv); - if (ret < 0) - return -1; + if (ret < 0) { + ret = -1; + goto out; + } #ifdef RTE_LIBRTE_TIMER rte_timer_subsystem_init(); #endif - if (commands_init() < 0) - return -1; + if (commands_init() < 0) { + ret = -1; + goto out; + } argv += ret; prgname = argv[0]; - if ((recursive_call = getenv(RECURSIVE_ENV_VAR)) != NULL) - return do_recursive_call(); + recursive_call = getenv(RECURSIVE_ENV_VAR); + if (recursive_call != NULL) { + ret = do_recursive_call(); + goto out; + } #ifdef RTE_LIBEAL_USE_HPET if (rte_eal_hpet_init(1) < 0) @@ -111,7 +118,8 @@ main(int argc, char **argv) #ifdef RTE_LIBRTE_CMDLINE cl = cmdline_stdin_new(main_ctx, "RTE>>"); if (cl == NULL) { - return -1; + ret = -1; + goto out; } char *dpdk_test = getenv("DPDK_TEST"); @@ -120,18 +128,23 @@ main(int argc, char **argv) snprintf(buf, sizeof(buf), "%s\n", dpdk_test); if (cmdline_in(cl, buf, strlen(buf)) < 0) { printf("error on cmdline input\n"); - return -1; + ret = -1; + goto out; } cmdline_stdin_exit(cl); - return last_test_result; + ret = last_test_result; + goto out; } /* if no DPDK_TEST env variable, go interactive */ cmdline_interact(cl); cmdline_stdin_exit(cl); #endif + ret = 0; - return 0; +out: + rte_eal_cleanup(); + return ret; } -- 2.17.0
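The patch converts every early `return` in main() into `ret = ...; goto out;` so that one cleanup call covers all exit paths. A reduced sketch of that single-exit pattern, with made-up names: run() and runtime_cleanup() stand in for main() and rte_eal_cleanup(), and the parameters replace the real init calls.

```c
#include <assert.h>

static int cleanup_calls; /* counts cleanup invocations for the demo */

/* Stand-in for rte_eal_cleanup(). */
static void
runtime_cleanup(void)
{
	cleanup_calls++;
}

/* Stand-in for main(): each failure sets `ret` and jumps to the single
 * exit label instead of returning directly. */
static int
run(int init_ok, int commands_ok, int test_result)
{
	int ret;

	if (!init_ok) {
		ret = -1;
		goto out;
	}
	if (!commands_ok) {
		ret = -1;
		goto out;
	}
	ret = test_result;

out:
	runtime_cleanup(); /* reached on every path */
	return ret;
}
```

As in the patch, the cleanup also runs when initialization itself failed, so this pattern relies on the cleanup routine tolerating a partially initialized state.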
[dpdk-dev] [PATCH v3] net/bonding: fix slave add for mode 4
Add a call to rte_eth_link_get_nowait on every slave to update the internal link status struct. Otherwise slave add will fail for mode 4 if the ports are all stopped but only one of them checked. Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions") Bugzilla ID: 52 Signed-off-by: Radu Nicolau --- v3: updated commit msg v2: add fix and Bugzilla references drivers/net/bonding/rte_eth_bond_api.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index d558df8..cad08b9 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id) return -1; } + rte_eth_link_get_nowait(slave_port_id, &link_props); + slave_add(internals, slave_eth_dev); /* We need to store slaves reta_size to be able to synchronize RETA for all -- 2.7.5
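The failure described in the commit message can be illustrated with a small sketch. It assumes, as stated in the thread, that mode 4 expects link properties to be identical across ports. The struct link_info type, its fields, and the two helper functions are hypothetical simplifications, not the real struct rte_eth_link or the bonding driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified link record; not the real struct rte_eth_link. */
struct link_info {
	uint32_t speed;      /* Mbps */
	bool full_duplex;
	bool autoneg;
};

/* Mode 4 style check: a candidate's link properties must equal the
 * reference ones recorded for the bonded device. */
static bool link_props_match(const struct link_info *ref,
			     const struct link_info *cand)
{
	return ref->speed == cand->speed &&
	       ref->full_duplex == cand->full_duplex &&
	       ref->autoneg == cand->autoneg;
}

/* Hypothetical stand-in for a non-blocking link query: copies the
 * current hardware state into the caller's cached record. */
static void refresh_link(const struct link_info *hw, struct link_info *cached)
{
	*cached = *hw;
}
```

With a cached record that was never refreshed (all zeroes, say), the comparison fails even though the hardware state matches; refreshing the record before the check is what the added call is meant to achieve.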
Re: [dpdk-dev] [PATCH] net/thunderx: add support for hardware first skip feature
On 5/30/2018 7:41 AM, Rakesh K wrote: > > > On Monday 28 May 2018 07:14 PM, Ferruh Yigit wrote: >> On 5/28/2018 1:57 PM, rkudurumalla wrote: >>> This feature is used to create a hole between HEADROOM >>> and actual data.Size of hole is specified in bytes as >>> module param to pmd >> >> Can't mbuf private area be used? It is between HEADROOM and mbuf header. > > data inserted in the hole will be part of the packet data. One of the > use cases is inserting VLAN header for each packet received before it is > being forwarded without having to move the packet data

Cc'ed Olivier. Is this something that should be addressed at the mbuf level instead of in the PMD via a devarg?

>> >>> >>> Signed-off-by: Rakesh Kudurumalla >> >> <...> >>
[dpdk-dev] [RFC] net/mlx4: add TSO support
TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP network stack to delegate segmentation of a TCP segment to the NIC, thus saving compute resources.

This RFC proposes to add support for TSO to the MLX4 PMD.

Prerequisites:
In order for the PMD to recognize the TSO capabilities of the device one has to use:
* RDMA-core v18.0 or above.
* Linux kernel 4.16 or above.

Assumptions:
* mlx4 PMD will follow the TSO support implemented in mlx5 PMD.
* PMD is backwards compatible.
** The PMD will continue to work with the kernels and RDMA-core supported by it today.
** The PMD will continue to work with devices not supporting TSO.

Changes proposed in the PMD for implementing TSO:
* At init, query the device for TSO support and the maximum segment size supported. This will also determine if the PMD will advertise support for TSO (dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;)
* Calling create-qp when creating a Tx queue will have to consider the MAX TSO header size when calculating the actual queue buffer size. This may be abstracted by calling ibv_create_qp_ex with IBV_QP_INIT_ATTR_MAX_TSO_HEADER as comp flag rather than ibv_create_qp. If this breaks backwards compatibility then this calculation will be done in the PMD code.
* Modify tx_burst function to:
** Check for TSO flag indication in the packets of the packet burst (buf->ol_flags & PKT_TX_TCP_SEG).
** For a TSO packet, create the WQE appropriate for sending a TSO packet and fill it with packet info and L2/L3/L4 headers.
* Modify Tx completion function to handle releasing of TSO packet buffers that were transmitted.

Concerns:
* Impact of changing Tx send routine on performance.
The performance of the tx_burst routine for non-TSO packets may be affected just by placing the code that handles TSO packets in it, so we may want to consider having a dedicated routine for TSO packets.
* No MAX-TSO parameter.
This is a cross-PMD issue that may need a separate mailing thread to handle.
As of today there is no way for the PMD to advertise the MAX-TSO that it or its HW supports, as is done with other capabilities (the indirection table size, for example; see rte_eth_dev_info.reta_size in rte_ethdev.h). Also there is no DPDK parameter or constant value that the PMD can use in order to know the MAX-TSO the system requires. This prevents applications from determining the MAX-TSO that can be used, leading to configuration mismatches that may cause transmit failures or, in the best case, a less-than-optimal TSO configuration.
I propose to add a max_tso field in rte_eth_dev_info that will allow the PMD to advertise the max TSO it supports. This can be used by DPDK applications to determine what TSO size to use. If this is a major change that cannot fit the 18.08 schedule then I propose to add a MAX_TSO constant in rte_ethdev.h. The PMD will compare this value with its own MAX-TSO and if it cannot meet the defined value it will not advertise that it is a TSO capable device.
* Handling packets longer than MAX-TSO.
In case a PMD is requested to send a TSO packet which is longer than MAX-TSO the PMD send routine should return with an error. A different approach that can be used in the future is to apply GSO to those packets using the GSO lib in DPDK.

I am interested in general design comments and concerns listed above.

Signed-off-by: Moti Haimovsky
--
1.8.3.1
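The last concern, rejecting TSO packets longer than MAX-TSO, might look roughly like the sketch below. Everything in it is an assumption made for illustration: FLAG_TCP_SEG stands in for PKT_TX_TCP_SEG, struct pkt is a cut-down placeholder for an mbuf, struct dev_caps carries the proposed (not yet existing) max_tso value, and MAX-TSO is treated as a byte limit on the whole packet.

```c
#include <stdint.h>

#define FLAG_TCP_SEG (1ULL << 0)  /* hypothetical stand-in for PKT_TX_TCP_SEG */

/* Hypothetical, cut-down packet descriptor. */
struct pkt {
	uint64_t ol_flags;
	uint32_t pkt_len;
};

/* Hypothetical capability record carrying the proposed max_tso value. */
struct dev_caps {
	uint32_t max_tso;
};

/* Returns 0 if the packet may be sent, -1 if it is a TSO packet that
 * exceeds the advertised limit. */
static int check_tso_len(const struct dev_caps *caps, const struct pkt *p)
{
	if (!(p->ol_flags & FLAG_TCP_SEG))
		return 0;  /* non-TSO packets are not subject to the limit */
	return p->pkt_len <= caps->max_tso ? 0 : -1;
}
```

An application that could read such a limit up front could size its TSO requests accordingly instead of discovering the mismatch at transmit time.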
Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add
On 5/31/2018 5:13 PM, Radu Nicolau wrote: > > > On 5/31/2018 4:36 PM, Ferruh Yigit wrote: >> On 5/31/2018 4:34 PM, Ferruh Yigit wrote: >>> On 5/31/2018 3:34 PM, Radu Nicolau wrote: >>> >>> I can see you just prefix "fix" to the title without updating it :) >>> >>> What about following one: >>> "net/bonding: fix slave add for mode 4" ? > Great, I'll use it for v3 :) > >>> Add a call to rte_eth_link_get_nowait on every slave to update the internal link status struct. Otherwise slave add will fail for mode 4 if the ports are all stopped but only one of them checked. >>> What is the link related expectation from slaves in mode 4? > To be identical across all ports >>> >>> What does "if the ports are all stopped but only one of them checked" mean, >>> why >>> checking only one of them? > This is the behavior of testpmd, stop getting the link status after the > first down port; but this should not affect bonding, so there is no need > to update testpmd.

I see. When does this link update happen in the context of this bonding issue? When the bonding device is created? Should we update the testpmd behavior too?

> >>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions") Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52 >> Bugzilla ID: 52 >> >> btw, can you please send new version as reply to previous version? > Sure. 
> >> Signed-off-by: Radu Nicolau --- v2: add fix and Bugzilla references drivers/net/bonding/rte_eth_bond_api.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index d558df8..cad08b9 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id) return -1; } + rte_eth_link_get_nowait(slave_port_id, &link_props); + >>> The error seems in link_properties_valid(), does it make sense to get link >>> info >>> inside that function before link checks? > Not really, as one might expect that link_properties_valid will only > test the struct rte_eth_link *slave_link argument, not update it. Fair enough, I just thought to be sure the tested link is up to date, but that function seems only called by __eth_bond_slave_add_lock_free() which you are updating, so this is ok. > >>> slave_add(internals, slave_eth_dev); /* We need to store slaves reta_size to be able to synchronize RETA for all >
Re: [dpdk-dev] [PATCH 1/2] net/qede: fix to update VF MTU
On 5/23/2018 12:16 AM, Rasesh Mody wrote: > This patch fixes VF MTU update to work without having to restart the > vport and there by not requiring port re-configuration. It adds a > VF MTU Update TLV to achieve the same. Firmware can handle VF MTU update > by just pausing the vport. > > Fixes: dd28bc8c6ef4 ("net/qede: fix VF port creation sequence") > Cc: sta...@dpdk.org > > Signed-off-by: Rasesh Mody Series applied to dpdk-next-net/master, thanks. (for v18.08)
Re: [dpdk-dev] [PATCH] net/cxgbe: report configured link auto-negotiation
On 5/23/2018 7:00 PM, Rahul Lakkireddy wrote: > Report current configured link auto-negotiation. Also initialize > rte_eth_link. > > Coverity issue: 280648 > Fixes: f5b3c7b29357 ("net/cxgbevf: fix inter-VM traffic when physical link > down") > > Signed-off-by: Rahul Lakkireddy Applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH] malloc: fix pad erasing
Previously, when joining adjacent free elements, we were erasing trailer and header, but did not erase the padding. Fix this by accounting for padding on erase, and do not erase padding twice by adjusting data pointer and data len to not include padding. Fixes: bb372060dad4 ("malloc: make heap a doubly-linked list") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/malloc_elem.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c index 9bfe9b9b4..944587bc5 100644 --- a/lib/librte_eal/common/malloc_elem.c +++ b/lib/librte_eal/common/malloc_elem.c @@ -386,16 +386,18 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem) if (elem->next != NULL && elem->next->state == ELEM_FREE && next_elem_is_adjacent(elem)) { void *erase; + size_t erase_len; /* we will want to erase the trailer and header */ erase = RTE_PTR_SUB(elem->next, MALLOC_ELEM_TRAILER_LEN); + erase_len = MALLOC_ELEM_OVERHEAD + elem->next->pad; /* remove from free list, join to this one */ malloc_elem_free_list_remove(elem->next); join_elem(elem, elem->next); - /* erase header and trailer */ - memset(erase, 0, MALLOC_ELEM_OVERHEAD); + /* erase header, trailer and pad */ + memset(erase, 0, erase_len); } /* @@ -406,9 +408,11 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem) prev_elem_is_adjacent(elem)) { struct malloc_elem *new_elem; void *erase; + size_t erase_len; /* we will want to erase trailer and header */ erase = RTE_PTR_SUB(elem, MALLOC_ELEM_TRAILER_LEN); + erase_len = MALLOC_ELEM_OVERHEAD + elem->pad; /* remove from free list, join to this one */ malloc_elem_free_list_remove(elem->prev); @@ -416,8 +420,8 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem) new_elem = elem->prev; join_elem(new_elem, elem); - /* erase header and trailer */ - memset(erase, 0, MALLOC_ELEM_OVERHEAD); + /* erase header, trailer and pad */ + memset(erase, 0, erase_len); elem = new_elem; 
} @@ -436,8 +440,8 @@ malloc_elem_free(struct malloc_elem *elem) void *ptr; size_t data_len; - ptr = RTE_PTR_ADD(elem, sizeof(*elem)); - data_len = elem->size - MALLOC_ELEM_OVERHEAD; + ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad); + data_len = elem->size - elem->pad - MALLOC_ELEM_OVERHEAD; elem = malloc_elem_join_adjacent_free(elem); -- 2.17.0
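The length calculation at the heart of this fix can be sketched on a plain byte buffer. The sketch assumes a simplified layout in which the boundary between two joined elements consists of the first element's trailer, the second element's header, and then the second element's padding; HDR_LEN, TRAILER_LEN and erase_boundary() are hypothetical and do not reproduce the real struct malloc_elem.

```c
#include <stddef.h>
#include <string.h>

#define HDR_LEN 16u      /* hypothetical header size */
#define TRAILER_LEN 8u   /* hypothetical trailer size */

/* Zero the metadata at the boundary between two joined elements:
 * the first element's trailer, the second element's header, and the
 * padding that follows that header. second_off is the byte offset of
 * the second element's header within the heap buffer. */
static void erase_boundary(unsigned char *heap, size_t second_off, size_t pad)
{
	unsigned char *erase = heap + second_off - TRAILER_LEN;
	size_t erase_len = TRAILER_LEN + HDR_LEN + pad;

	memset(erase, 0, erase_len);
}
```

Leaving pad out of erase_len corresponds to the behaviour the commit message describes as the bug: trailer and header are cleared while stale padding bytes remain inside the merged free element.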
[dpdk-dev] DPDK Release Status Meeting 31/05/2018
Minutes from the weekly DPDK Release Status Meeting.

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the status of the master tree and sub-trees, and for project managers to track progress or milestone dates.

The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just send me an email and I will send you the invite.

Minutes 31 May 2018

Agenda:

* DPDK 18.05 release.
* Retrospective on 18.05.
* Testing of stable releases.

Participants:

* Intel
* Cavium
* Mellanox
* NXP
* 6Wind
* RedHat

DPDK 18.05 release.

* DPDK 18.05 "The Venky Release" is out. \o/
* Thanks to all the maintainers and contributors.
* See the release notes for the full stats: http://dpdk.org/ml/archives/announce/2018-May/000204.html

Retrospective on 18.05.

* What went well.
  * Largest DPDK release ever.
  * Lots of contributors from all the main companies involved in networking.
  * Collaboration on major features between companies in the community.
* What didn't go so well.
  * The release was very late and required a lot of release candidates.
  * RC1 was late and low quality.
  * Many major defects found in RC testing.
  * Some reviews were slow or late.
* What can we do differently next time.
  * Should we change the number of releases per year from 4 to 3 or 2?
  * Merge earlier from subtrees: every 7-10 days.
  * Push patches earlier.
  * Review patches more critically. Does every feature/patch need to go in?
  * Use unit tests more.
    * Make them a requirement for any sizeable code.
    * Make them easier to write/use/run.
    * Add a make/meson target for generating coverage results from unit tests. Any volunteers?
  * Hold more strictly to the release milestone dates.

Testing of stable releases.

* Luca Boccassi asked about testing of the stable release.
* All major contributing companies should confirm test results on the stable releases.
* Luca will send an email to the list about it: http://dpdk.org/ml/archives/dev/2018-May/103249.html
[dpdk-dev] [PATCH] eal: add option to limit memory allocation on sockets
Previously, it was possible to limit maximum amount of memory allowed for allocation by creating validator callbacks. Although a powerful tool, it's a bit of a hassle and requires modifying the application for it to work with DPDK example applications. Fix this by adding a new parameter "--socket-limit", with syntax similar to "--socket-mem", which would set per-socket memory allocation limits, and set up a default validator callback to deny all allocations above the limit. This option is incompatible with legacy mode, as validator callbacks are not supported there. Signed-off-by: Anatoly Burakov --- doc/guides/linux_gsg/build_sample_apps.rst| 4 +++ .../prog_guide/env_abstraction_layer.rst | 4 +++ lib/librte_eal/common/eal_common_options.c| 10 ++ lib/librte_eal/common/eal_internal_cfg.h | 2 ++ lib/librte_eal/common/eal_options.h | 2 ++ lib/librte_eal/linuxapp/eal/eal.c | 36 +-- lib/librte_eal/linuxapp/eal/eal_memory.c | 21 +++ 7 files changed, 68 insertions(+), 11 deletions(-) diff --git a/doc/guides/linux_gsg/build_sample_apps.rst b/doc/guides/linux_gsg/build_sample_apps.rst index 3623ddf46..332424e05 100644 --- a/doc/guides/linux_gsg/build_sample_apps.rst +++ b/doc/guides/linux_gsg/build_sample_apps.rst @@ -114,6 +114,10 @@ The EAL options are as follows: this memory will also be pinned (i.e. not released back to the system until application closes). +* ``--socket-limit``: + Limit maximum memory available for allocation on each socket. Does not support + legacy memory mode. + * ``-d``: Add a driver or driver directory to be loaded. 
The application should use this option to load the pmd drivers diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index a22640d29..4c51efd42 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -147,6 +147,10 @@ notified about memory allocations above specified threshold (and have a chance to deny them), allocation validator callbacks are also available via ``rte_mem_alloc_validator_callback_register()`` function. +A default validator callback is provided by EAL, which can be enabled with a +``--socket-limit`` command-line option, for a simple way to limit maximum amount +of memory that can be used by DPDK application. + .. note:: In multiprocess scenario, all related processes (i.e. primary process, and diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index ecebb2923..c720efa86 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -70,6 +70,7 @@ eal_long_options[] = { {OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM}, {OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM}, {OPT_SOCKET_MEM,1, NULL, OPT_SOCKET_MEM_NUM }, + {OPT_SOCKET_LIMIT, 1, NULL, OPT_SOCKET_LIMIT_NUM }, {OPT_SYSLOG,1, NULL, OPT_SYSLOG_NUM }, {OPT_VDEV, 1, NULL, OPT_VDEV_NUM }, {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM}, @@ -179,6 +180,10 @@ eal_reset_internal_config(struct internal_config *internal_cfg) /* zero out the NUMA config */ for (i = 0; i < RTE_MAX_NUMA_NODES; i++) internal_cfg->socket_mem[i] = 0; + internal_cfg->force_socket_limits = 0; + /* zero out the NUMA limits config */ + for (i = 0; i < RTE_MAX_NUMA_NODES; i++) + internal_cfg->socket_limit[i] = 0; /* zero out hugedir descriptors */ for (i = 0; i < MAX_HUGEPAGE_SIZES; i++) { memset(&internal_cfg->hugepage_info[i], 0, @@ -1322,6 +1327,11 @@ eal_check_common_options(struct internal_config *internal_cfg) "be 
specified together with --"OPT_NO_HUGE"\n"); return -1; } + if (internal_config.force_socket_limits && internal_config.legacy_mem) { + RTE_LOG(ERR, EAL, "--" OPT_SOCKET_LIMIT " is only supported in " + "non-legacy memory mode\n"); + return -1; + } return 0; } diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h index c4cbf3acd..d66cd0313 100644 --- a/lib/librte_eal/common/eal_internal_cfg.h +++ b/lib/librte_eal/common/eal_internal_cfg.h @@ -46,6 +46,8 @@ struct internal_config { /** true to try allocating memory on specific sockets */ volatile unsigned force_sockets; volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */ + volatile unsigned force_socket_limits; + volatile uint64_t socket_limit[RTE_MAX_NUMA_NODES]; /**< limit amount of memory per socket */ uintptr_t base_virtaddr; /**< base addre
Re: [dpdk-dev] [RFC] net/mlx4: add TSO support
> On May 31, 2018, at 11:21 AM, Moti Haimovsky wrote: > > TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP > network stack to delegate segmentation of a TCP segment to the NIC, > thus saving compute resources. > > This RFC proposes to add support for TSO to the MLX4 PMD. > > Prerequisites: > In order for the PMD to recognize the TSO capabilities of the device > one has to use: > * RDMA-core v18.0 or above. > * Linux kernel 4.16 or above. > > Assumptions: > * mlx4 PMD will follow the TSO support implemented in mlx5 PMD. > * PMD is backwards compatible. > ** The PMD will continue work with the kernels and RDMA-core > supported by it today. > ** The PMD will continue to work with devices not supporting TSO. > > Changes proposed in the PMD for implementing TSO: > * At init, query the device for TSO support and MAX segment size > being supported. > This will also determine if the PMD will advertise support for TSO > (dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;) > * Calling create-qp when creating a Tx queue will have to consider > the MAX TSO header size when calculating the actual queue buffer > size. This may be abstracted by calling ibv_create_qp_ex with > IBV_QP_INIT_ATTR_MAX_TSO_HEADER as comp flag rather than > ibv_create_qp. > If this breaks backwards compatibility then this calculation will > be done in the PMD code. > * Modify tx_burst function to: > ** Check for TSO flag indication in the packets of the packet burst > (buf->ol_flags & PKT_TX_TCP_SEG). > ** For TSO packet create the WQE appropriate for sending a TSO packet > and fill it with packet info and L2/L3/L4 Headers. > * Modify Tx completion function to handle releasing of TSO packet > buffers that were transmitted. > > Concerns: > * Impact of changing Tx send routine on performance. 
> The performance of the tx_burst routine for non-TSO packets may be > affected just by placing the code that handles TSO packets in it, > so we may want to consider having a dedicated routine for TSO packets.

How much code would be shared between the two APIs if we created a new API just for TSO? My first thought was to create a new API, but would that require my application to know it needs to call the new TSO API instead of the normal tx_burst API? Maybe it does not matter, and a TSO request would never be directed to the normal API; if that is the case, I would like a new API that does not affect the old one.

> * No MAX-TSO parameter. > This is a cross-PMD issue that may need a separate mailing thread to handle. > As for today there is no way for the PMD to advertise the MAX-TSO > it or its HW support as done with other capabilities. > (The indirection table size for example. >see rte_eth_dev_info.reta_size in rte_ethdev.h). > Also there is no DPDK parameter or constant value that the PMD > can use in order to know the MAX-TSO the system requires. > This prevents applications from determining the MAX-TSO that can be > used leading to configuration mismatches that may lead to transmit > failures or to less-than-optimize TSO configuration in the best case. > I propose to add a max_tso field in rte_eth_dev_info that will allow > the PMD to advertise the max tso is supports. This can be used by > DPDK applications to determine what TSO size to use. > If this is a major change that cannot fit the 18.08 schedule then > I propose to add a MAX_TSO constant in rte_ethdev.h, The PMD will > compare this value whit its own MAX-TSO and if it cannot meet the > defined value it will not advertise that it is a TSO capable device. > * Handling packets longer then MAX-TSO > In case a PMD is requested to send a TSO packet which is longer than > MAX-TSO the PMD send routine should return with an error. 
> A different approach that can be used on the future is to apply GSO > to those packets using the GSO lib in DPDK. > > I am interested in general design comments and concerns listed above. > > Signed-off-by: Moti Haimovsky > > -- > 1.8.3.1 > Regards, Keith
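One way to keep a dedicated TSO routine from leaking into the application, sketched here purely as an illustration and not as what the RFC or this reply proposes, is to choose the burst routine once at queue setup and call it through a function pointer. All names below (OFFLOAD_TSO, struct txq, tx_burst_plain, tx_burst_tso, txq_setup) are hypothetical, and the routines only count calls.

```c
#include <stdint.h>

#define OFFLOAD_TSO (1ULL << 0)  /* hypothetical offload bit */

struct txq;
typedef uint16_t (*tx_burst_fn)(struct txq *q, uint16_t n);

/* Hypothetical Tx queue: the burst routine is chosen once at setup. */
struct txq {
	tx_burst_fn burst;
	unsigned int tso_calls;
	unsigned int plain_calls;
};

/* Placeholder routines: they only record which path was taken. */
static uint16_t tx_burst_plain(struct txq *q, uint16_t n)
{
	q->plain_calls++;
	return n;
}

static uint16_t tx_burst_tso(struct txq *q, uint16_t n)
{
	q->tso_calls++;
	return n;
}

/* Select the routine from the offloads requested for this queue. */
static void txq_setup(struct txq *q, uint64_t offloads)
{
	q->tso_calls = 0;
	q->plain_calls = 0;
	q->burst = (offloads & OFFLOAD_TSO) ? tx_burst_tso : tx_burst_plain;
}
```

With this arrangement the caller always invokes q->burst(), and queues configured without TSO never execute the TSO-aware code path; whether that fits the mlx4 PMD is left open here.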
Re: [dpdk-dev] DPDK Release Status Meeting 31/05/2018
> On May 31, 2018, at 12:18 PM, dev-boun...@dpdk.org wrote: > > Minutes from the weekly DPDK Release Status Meeting. > > The DPDK Release Status Meeting is intended for DPDK Committers to discuss > the status of the master tree and sub-trees, and for project managers to > track progress or milestone dates. > > The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just > send me and email and I will send you the invite. > > > Minutes 31 May 2018 > > Agenda: > > * DPDK 18.05 release. > * Retrospective on 18.05. > * Testing of stable releases. > > Participants: > > * Intel > * Cavium > * Mellanox > * NXP > * 6Wind > * RedHat > > > DPDK 18.05 release. > > * DPDK 18.08 "The Venky Release" is out. \o/ > * Thanks to all the maintainers and contributors. > * See the release notes for the full stats: > http://dpdk.org/ml/archives/announce/2018-May/000204.html > > > Retrospective on 18.05. > > * What went well. > > * Largest DPDK release ever. > * Lots of contributors from all the main companies involved in networking. > * Collaboration on major features between companies in the community. > > * What didn't go so well. > > * The release was very late and required a lot of release candidates.

More testing is always better, and sooner rather than later is better.

> * RC1 was late and low quality. > * Many major defects found in RC testing. > * Some reviews were slow or late. > > * What can we do differently next time. > > * Should we change the number of releases per year from 4 to 3 or 2? > * Merge earlier from subtrees: every 7-10 days.

Knowing when a merge is scheduled could help, say if it were every Monday or Wednesday. I would not put it on a Friday, as that tends to force people to work on weekends. Maybe every Wednesday would be best. It lets developers know when patches are applied and allows more testing before the first RC.

> * Push patches earlier. > * Review patches more critically. Does every feature/patch need to go in. 
> * Use unit tests more >* Make them a requirement for any sizeable code. >* Make them easier to write/use/run. >* Add a make/meson target for generating coverage results from units > tests. Any volunteers? > * Hold more strictly to the release milestone dates.

I notice that at times developers need to rebase onto master because of changes. Is this because patches submitted after theirs were applied first, or do we need something in place to eliminate this rework? When someone is going to introduce a big patch that may disturb or change many things and affect other patches, it would be nice to see a schedule from the developer saying when it will be pushed. I hope that would give maintainers more control over scheduling a given patch.

> > > Testing of stable releases. > > * Luca Boccassi asked about testing of the stable release. > * All major contributing companies should confirm test results on the stable > releases. > * Luca will send an email to the list about it: > http://dpdk.org/ml/archives/dev/2018-May/103249.html > Regards, Keith
[dpdk-dev] [PATCH] doc/event: improve eventdev library documentation
Add small amount of additional code, use consistent variable names across code blocks, change the image to represent queues and CPU cores intuitively. These help improve the eventdev library documentation. Signed-off-by: Honnappa Nagarahalli Reviewed-by: Gavin Hu --- doc/guides/prog_guide/eventdev.rst | 55 +- doc/guides/prog_guide/img/eventdev_usage.svg | 1518 +- 2 files changed, 570 insertions(+), 1003 deletions(-) diff --git a/doc/guides/prog_guide/eventdev.rst b/doc/guides/prog_guide/eventdev.rst index ce19997..0203d9e 100644 --- a/doc/guides/prog_guide/eventdev.rst +++ b/doc/guides/prog_guide/eventdev.rst @@ -1,5 +1,6 @@ .. SPDX-License-Identifier: BSD-3-Clause Copyright(c) 2017 Intel Corporation. +Copyright(c) 2018 Arm Limited. Event Device Library @@ -129,7 +130,7 @@ API Walk-through This section will introduce the reader to the eventdev API, showing how to create and configure an eventdev and use it for a two-stage atomic pipeline -with a single core for TX. The diagram below shows the final state of the +with one core each for RX and TX. The diagram below shows the final state of the application after this walk-through: .. _figure_eventdev-usage1: @@ -196,23 +197,29 @@ calling the setup function. 
Repeat this step for each queue, starting from .nb_atomic_flows = 1024, .nb_atomic_order_sequences = 1024, }; +struct rte_event_queue_conf single_link_conf = { +.event_queue_cfg = RTE_EVENT_QUEUE_CFG_SINGLE_LINK, +}; int dev_id = 0; -int queue_id = 0; -int err = rte_event_queue_setup(dev_id, queue_id, &atomic_conf); +int atomic_q_1 = 0; +int atomic_q_2 = 1; +int single_link_q = 2; +int err = rte_event_queue_setup(dev_id, atomic_q_1, &atomic_conf); +int err = rte_event_queue_setup(dev_id, atomic_q_2, &atomic_conf); +int err = rte_event_queue_setup(dev_id, single_link_q, &single_link_conf); -The remainder of this walk-through assumes that the queues are configured as -follows: +As shown above, queue IDs are as follows: * id 0, atomic queue #1 * id 1, atomic queue #2 * id 2, single-link queue +These queues are used for the remainder of this walk-through. + Setting up Ports -Once queues are set up successfully, create the ports as required. Each port -should be set up with its corresponding port_conf type, worker for worker cores, -rx and tx for the RX and TX cores: +Once queues are set up successfully, create the ports as required. .. code-block:: c @@ -232,15 +239,24 @@ rx and tx for the RX and TX cores: .new_event_threshold = 4096, }; int dev_id = 0; -int port_id = 0; -int err = rte_event_port_setup(dev_id, port_id, &CORE_FUNCTION_conf); +int rx_port_id = 0; +int err = rte_event_port_setup(dev_id, rx_port_id, &rx_conf); + +for(int worker_port_id = 1; worker_port_id <= 4; worker_port_id++) { + int err = rte_event_port_setup(dev_id, worker_port_id, &worker_conf); +} -It is now assumed that: +int tx_port_id = 5; + int err = rte_event_port_setup(dev_id, tx_port_id, &tx_conf); + +As shown above: * port 0: RX core * ports 1,2,3,4: Workers * port 5: TX core +These ports are used for the remainder of this walk-through. + Linking Queues and Ports @@ -254,15 +270,14 @@ can be achieved like this: .. 
code-block:: c -uint8_t port_id = 0; +uint8_t rx_port_id = 0; +uint8_t tx_port_id = 5; uint8_t atomic_qs[] = {0, 1}; uint8_t single_link_q = 2; -uint8_t tx_port_id = 5; uin8t_t priority = RTE_EVENT_DEV_PRIORITY_NORMAL; -for(int i = 0; i < 4; i++) { -int worker_port = i + 1; -int links_made = rte_event_port_link(dev_id, worker_port, atomic_qs, NULL, 2); +for(int worker_port_id = 1; worker_port_id <= 4; worker_port_id++) { +int links_made = rte_event_port_link(dev_id, worker_port_id, atomic_qs, NULL, 2); } int links_made = rte_event_port_link(dev_id, tx_port_id, &single_link_q, &priority, 1); @@ -295,14 +310,14 @@ The following code shows how those packets can be enqueued into the eventdev: ev[i].flow_id = mbufs[i]->hash.rss; ev[i].op = RTE_EVENT_OP_NEW; ev[i].sched_type = RTE_SCHED_TYPE_ATOMIC; -ev[i].queue_id = 0; +ev[i].queue_id = atomic_q_1; ev[i].event_type = RTE_EVENT_TYPE_ETHDEV; ev[i].sub_event_type = 0; ev[i].priority = RTE_EVENT_DEV_PRIORITY_NORMAL; ev[i].mbuf = mbufs[i]; } -const int nb_tx = rte_event_enqueue_burst(dev_id, port_id, ev, nb_rx); +const int nb_tx = rte
[dpdk-dev] [PATCH] doc: add template release notes for 18.08
Add template release notes for DPDK 18.08 with inline comments
and explanations of the various sections.

Signed-off-by: Thomas Monjalon
---
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_18_08.rst | 192 +
 2 files changed, 193 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_18_08.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index eb82a0e06..d125342c3 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -9,6 +9,7 @@ Release Notes
     :numbered:
 
     rel_description
+    release_18_08
     release_18_05
     release_18_02
     release_17_11
diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
new file mode 100644
index 0..5bc23c537
--- /dev/null
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -0,0 +1,192 @@
+DPDK Release 18.08
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      xdg-open build/doc/html/guides/rel_notes/release_18_08.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change.
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change
+     that was announced in the previous releases and made in this release.
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release
+   and prepend with a ``+`` sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =========================================================
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_bbdev.so.1
+     librte_bitratestats.so.2
+     librte_bpf.so.1
+     librte_bus_dpaa.so.1
+     librte_bus_fslmc.so.1
+     librte_bus_pci.so.1
+     librte_bus_vdev.so.1
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_common_octeontx.so.1
+     librte_compressdev.so.1
+     librte_cryptodev.so.4
+     librte_distributor.so.1
+     librte_eal.so.7
+     librte_ethdev.so.9
+     librte_eventdev.so.4
+     librte_flow_classify.so.1
+     librte_gro.so.1
+     librte_gso.so.1
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_latencystats.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.4
+     librte_mempool.so.4
+     librte_meter.so.2
+     librte_metrics.so.1
+     librte_net.so.1
+     librte_pci.so.1
+     librte_pdump.so.2
+     librte_pipeline.so.3
+     librte_pmd_bnxt.so.2
+     librte_pmd_bond.so.2
+     librte_pmd_i40e.so.2
Re: [dpdk-dev] [PATCH v3] net/bonding: fix slave add for mode 4
It's not clear to me that the issue here is the bonding slave add.
You can only add started PMDs. When a PMD dev start is complete,
the PMD should have a valid link state and the link properties
should be valid. A few of the PMDs are not very good about this,
particularly the ones with LSC interrupts. Those drivers often wait
for the first link interrupt before setting their link status, so
there is a race where the link state isn't well defined.

And lastly, why do we care what the link state is when adding a slave?
If the link state changes to down, do we remove the slave? If the link
speed of the slave changes, do we remove the slave? So this test doesn't
make much sense. For mode 4, you should be able to add a slave, but if
the link state doesn't match what has been negotiated, then the slave
should fail to activate.

On Thu, May 31, 2018 at 12:10 PM, Radu Nicolau wrote:
>
> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.
>
> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
> Bugzilla ID: 52
>
> Signed-off-by: Radu Nicolau
> ---
> v3: updated commit msg
> v2: add fix and Bugzilla references
>
>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
> index d558df8..cad08b9 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
>  		return -1;
>  	}
>
> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
> +
>  	slave_add(internals, slave_eth_dev);
>
>  	/* We need to store slaves reta_size to be able to synchronize RETA for all
> --
> 2.7.5
>
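The distinction drawn in the reply above, between adding a slave and activating it, can be illustrated with a minimal, self-contained C sketch. This is not the bonding PMD's code: `struct link_props`, `struct bond`, `bond_slave_add` and `bond_slave_link_update` are hypothetical names invented for illustration, and real mode 4 activation also depends on LACP negotiation, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLAVES 8

/* Hypothetical, simplified stand-in for the link fields of struct rte_eth_link. */
struct link_props {
	uint32_t speed_mbps;
	bool full_duplex;
	bool up;
};

/* Hypothetical bonded device state; not the real bonding private data. */
struct bond {
	struct link_props agreed;	/* taken from the first slave that came up */
	bool have_agreed;
	struct link_props slaves[MAX_SLAVES];
	bool active[MAX_SLAVES];
	size_t n_slaves;
};

/* Add a slave without inspecting its link state.
 * Returns the slave index, or -1 if the table is full. */
static int
bond_slave_add(struct bond *b)
{
	assert(b != NULL);
	if (b->n_slaves >= MAX_SLAVES)
		return -1;
	b->slaves[b->n_slaves] = (struct link_props){0};
	b->active[b->n_slaves] = false;
	return (int)b->n_slaves++;
}

/* Record a link update (for example from an LSC event) and decide
 * whether the slave may be active. Returns the new active state.
 * Simplification: the agreed properties are never renegotiated. */
static bool
bond_slave_link_update(struct bond *b, int idx, struct link_props now)
{
	assert(b != NULL && idx >= 0 && (size_t)idx < b->n_slaves);
	b->slaves[idx] = now;
	if (!now.up) {
		b->active[idx] = false;
		return false;
	}
	if (!b->have_agreed) {
		b->agreed = now;
		b->have_agreed = true;
	}
	b->active[idx] = now.speed_mbps == b->agreed.speed_mbps &&
			 now.full_duplex == b->agreed.full_duplex;
	return b->active[idx];
}
```

In this sketch the add succeeds whenever a slot is free, regardless of link state; a slave whose speed or duplex differs from the agreed properties simply stays inactive until a later link update matches.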
Re: [dpdk-dev] [dpdk-stable] Regression tests for stable releases from companies involved in DPDK
On Thu, May 31, 2018 at 12:26 PM, Luca Boccassi wrote:

> Hello all,
>
> At this morning's release meeting (minutes coming soon from John), we
> briefly discussed the state of the regression testing for stable
> releases and agreed we need to formalise the process.
>
> At the moment we have a firm commitment from Intel and Mellanox to test
> all stable branches (and if I heard correctly from NXP as well? Please
> confirm!). AT&T committed to run regressions on the 16.11 branch.
>
> Here's what we need in order to improve the quality of the stable
> releases process:
>
> 1) More commitments to help from other companies involved in the DPDK
> community. At the cost of re-stating the obvious, improving the quality
> of stable releases is for everyone's benefit, as a lot of customers and
> projects rely on the stable or LTS releases for their production
> environments.
>
> 2) A formalised deadline - the current proposal is 10 days from the
> "xx.yy patches review and test" email, which was just sent for 16.11.
> For the involved companies, please let us know if 10 days is enough. In
> terms of scheduling, this period will always start within a week from
> the mainline final release. Again, the signal is the "xx.yy patches
> review and test" appearing in the inbox, which will detail the
> deadline.
>
>

Hi Luca,
I discussed this with Thomas.
I don't know how much extra effort it would be for the stable
maintainers, but I wonder if there could be a XX.YY.z-rc tarball.

That would be
a) a clearer sign of what people are expected to test
b) easier to integrate, as I assume quite a few test setups start
   from tarballs rather than directly from git.

If you think everyone can derive from git easily, I'm fine with that.
I just wondered if a proper -rc tarball might be more comfortable for
the testing entities.

cu
Christian