Re: [dpdk-dev] Compilation of MLX5 driver

2018-05-31 Thread Nitin Katiyar
Hi,
It has the following files:

arch.h  ib.h  kern-abi.h  mlx4dv.h  mlx5dv.h  opcode.h  sa.h  sa-kern-abi.h  
verbs.h

I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and 
MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Regards,
Nitin

-Original Message-
From: Shahaf Shuler [mailto:shah...@mellanox.com] 
Sent: Thursday, May 31, 2018 10:51 AM
To: Nitin Katiyar ; Nélio Laranjeiro 

Cc: dev@dpdk.org
Subject: RE: [dpdk-dev] Compilation of MLX5 driver

Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> 
> Hi,
> I was compiling 17.05.02.
> Regards,
> Nitin
> 
> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Wednesday, May 30, 2018 6:42 PM
> To: Nitin Katiyar 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> 
> Hi,
> 
> On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > Hi,
> > I am trying to compile MLX5 PMD driver by setting
> "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation 
> error.
> >
> > fatal error: infiniband/mlx5_hw.h: No such file or directory

Can you list the files you have under /usr/include/infiniband ? 

> >
> > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 
> > machine
> but still hitting the same error. Am I missing some other package?
> 
> Which version of DPDK are you using (it is important to help)?
> 
> Regards,
> 
> --
> Nélio Laranjeiro
> 6WIND


[dpdk-dev] [PATCH 1/2] log: remove useless intermediate buffer

2018-05-31 Thread David Marchand
Rather than copy the log message, we can use a precision in the format
string given to syslog.

Fixes: af75078fece3 ("first public release")
Signed-off-by: David Marchand 
---
 lib/librte_eal/linuxapp/eal/eal_log.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_log.c 
b/lib/librte_eal/linuxapp/eal/eal_log.c
index ff14588..9d02ddd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_log.c
+++ b/lib/librte_eal/linuxapp/eal/eal_log.c
@@ -25,25 +25,14 @@
 static ssize_t
 console_log_write(__attribute__((unused)) void *c, const char *buf, size_t 
size)
 {
-   char copybuf[BUFSIZ + 1];
ssize_t ret;
-   uint32_t loglevel;
 
/* write on stdout */
ret = fwrite(buf, 1, size, stdout);
fflush(stdout);
 
-   /* truncate message if too big (should not happen) */
-   if (size > BUFSIZ)
-   size = BUFSIZ;
-
/* Syslog error levels are from 0 to 7, so subtract 1 to convert */
-   loglevel = rte_log_cur_msg_loglevel() - 1;
-   memcpy(copybuf, buf, size);
-   copybuf[size] = '\0';
-
-   /* write on syslog too */
-   syslog(loglevel, "%s", copybuf);
+   syslog(rte_log_cur_msg_loglevel() - 1, "%.*s", (int)size, buf);
 
return ret;
 }
-- 
2.7.4
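
The precision technique used in the patch above can be tried in isolation. The sketch below is only an illustration: format_prefix() is a hypothetical helper, not a DPDK function. It relies on the standard printf behaviour that "%.*s" reads at most the given number of bytes, so the source buffer does not need a terminating NUL and no intermediate copy is required.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format at most `len` bytes of `buf` into `out`. `buf` does not need to
 * be NUL-terminated. Returns snprintf()'s result, i.e. the length of the
 * formatted text. Hypothetical helper for illustration only.
 */
int
format_prefix(char *out, size_t out_sz, const char *buf, size_t len)
{
	assert(out != NULL && buf != NULL && out_sz > 0);
	/* "%.*s" takes the precision as an int argument before the string */
	return snprintf(out, out_sz, "%.*s", (int)len, buf);
}
```

Note that the precision argument must be an int, hence the cast. A length that does not fit in an int would need to be clamped by the caller first.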



[dpdk-dev] [PATCH 2/2] cmdline: remove useless intermediate buffer

2018-05-31 Thread David Marchand
Rather than copy the string, we can use a precision in the format string
given to printf.

Signed-off-by: David Marchand 
---
 lib/librte_cmdline/cmdline_parse.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/librte_cmdline/cmdline_parse.c 
b/lib/librte_cmdline/cmdline_parse.c
index 961f9be..9666e90 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -208,9 +208,6 @@ cmdline_parse(struct cmdline *cl, const char * buf)
int err = CMDLINE_PARSE_NOMATCH;
int tok;
cmdline_parse_ctx_t *ctx;
-#ifdef RTE_LIBRTE_CMDLINE_DEBUG
-   char debug_buf[BUFSIZ];
-#endif
char *result_buf = result.buf;
 
if (!cl || !buf)
@@ -250,10 +247,8 @@ cmdline_parse(struct cmdline *cl, const char * buf)
return linelen;
}
 
-#ifdef RTE_LIBRTE_CMDLINE_DEBUG
-   strlcpy(debug_buf, buf, (linelen > 64 ? 64 : linelen));
-   debug_printf("Parse line : len=%d, <%s>\n", linelen, debug_buf);
-#endif
+   debug_printf("Parse line : len=%d, <%.*s>\n",
+linelen, linelen > 64 ? 64 : linelen, buf);
 
/* parse it !! */
inst = ctx[inst_num];
-- 
2.7.4



Re: [dpdk-dev] Compilation of MLX5 driver

2018-05-31 Thread Nélio Laranjeiro
On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> Hi,
> It has following files:
> 
> arch.h  ib.h  kern-abi.h  mlx4dv.h  mlx5dv.h  opcode.h  sa.h
> sa-kern-abi.h  verbs.h
> 
> I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Did you install Mellanox OFED with the --dpdk --upstream-libs
arguments for the installation script?

If so, you should not use them for this version; those
options are for DPDK v17.11 and higher.

Regards,

> Regards,
> Nitin
> 
> -Original Message-
> From: Shahaf Shuler [mailto:shah...@mellanox.com] 
> Sent: Thursday, May 31, 2018 10:51 AM
> To: Nitin Katiyar ; Nélio Laranjeiro 
> 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] Compilation of MLX5 driver
> 
> Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> > 
> > Hi,
> > I was compiling 17.05.02.
> > Regards,
> > Nitin
> > 
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Wednesday, May 30, 2018 6:42 PM
> > To: Nitin Katiyar 
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> > 
> > Hi,
> > 
> > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > Hi,
> > > I am trying to compile MLX5 PMD driver by setting
> > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation 
> > error.
> > >
> > > fatal error: infiniband/mlx5_hw.h: No such file or directory
> 
> Can you list the files you have under /usr/include/infiniband ? 
> 
> > >
> > > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 
> > > machine
> > but still hitting the same error. Am I missing some other package?
> > 
> > Which version of DPDK are you using (it is important to help)?
> > 
> > Regards,
> > 
> > --
> > Nélio Laranjeiro
> > 6WIND

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH v2] net/ixgbe: fix crash on detach

2018-05-31 Thread Pablo de Lara
When detaching a port bound to ixgbe PMD, if the port
does not have any VFs, *vfinfo is not set and there is
a NULL dereference attempt, when calling
rte_eth_switch_domain_free(), which expects VFs to be used,
causing a segmentation fault.

Steps to reproduce:

./testpmd -- -i
testpmd> port stop all
testpmd> port close all
testpmd> port detach 0

Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports")
Cc: sta...@dpdk.org

Reported-by: Anatoly Burakov 
Signed-off-by: Pablo de Lara 
Tested-by: Anatoly Burakov 
Acked-by: Remy Horton 
---

v2:
- Cc stable as this fix is targeting code that was introduced in
  the previous release

 drivers/net/ixgbe/ixgbe_pf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4d199c802..c381acf44 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -135,14 +135,14 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
-   ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
-   if (ret)
-   PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
-
vf_num = dev_num_vf(eth_dev);
if (vf_num == 0)
return;
 
+   ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+   if (ret)
+   PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
rte_free(*vfinfo);
*vfinfo = NULL;
 }
-- 
2.17.0
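
The reordering in the patch above follows a general pattern: test whether the optional state exists before dereferencing it. A minimal sketch of that pattern, using hypothetical stand-in types (struct port and struct vf_info are not ixgbe structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's per-VF data; not ixgbe types. */
struct vf_info {
	int switch_domain_id;
};

struct port {
	unsigned int num_vfs;    /* 0 when the port has no VFs */
	struct vf_info *vfinfo;  /* only allocated when num_vfs > 0 */
	int last_freed_domain;   /* records the domain "freed" on uninit */
};

/* Tear down VF state; returns 0 on success. */
int
port_uninit(struct port *p)
{
	assert(p != NULL);

	/* Check first: without VFs, vfinfo is NULL and must not be touched. */
	if (p->num_vfs == 0)
		return 0;

	/* From here on the dereference is safe. */
	p->last_freed_domain = p->vfinfo->switch_domain_id;

	free(p->vfinfo);
	p->vfinfo = NULL;
	return 0;
}
```

With the check placed after the dereference, as in the code before the patch, the first call in a no-VF configuration would read through a NULL pointer.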



Re: [dpdk-dev] Compilation of MLX5 driver

2018-05-31 Thread Nitin Katiyar
Yes, I installed it using --dpdk --upstream-libs. What is the way forward now?

Regards,
Nitin

-Original Message-
From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] 
Sent: Thursday, May 31, 2018 1:36 PM
To: Nitin Katiyar 
Cc: Shahaf Shuler ; dev@dpdk.org
Subject: Re: [dpdk-dev] Compilation of MLX5 driver

On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> Hi,
> It has following files:
> 
> arch.h  ib.h  kern-abi.h  mlx4dv.h  mlx5dv.h  opcode.h  sa.h 
> sa-kern-abi.h  verbs.h
> 
> I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64

Did you installed Mellanox OFED with the --dpdk --upstream-libs arguments for 
the installation script?

If it is the case, you should not add them for this version, those options are 
for DPDK v17.11 and higher.

Regards,

> Regards,
> Nitin
> 
> -Original Message-
> From: Shahaf Shuler [mailto:shah...@mellanox.com]
> Sent: Thursday, May 31, 2018 10:51 AM
> To: Nitin Katiyar ; Nélio Laranjeiro 
> 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] Compilation of MLX5 driver
> 
> Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> > 
> > Hi,
> > I was compiling 17.05.02.
> > Regards,
> > Nitin
> > 
> > -Original Message-
> > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > Sent: Wednesday, May 30, 2018 6:42 PM
> > To: Nitin Katiyar 
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> > 
> > Hi,
> > 
> > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > Hi,
> > > I am trying to compile MLX5 PMD driver by setting
> > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation 
> > error.
> > >
> > > fatal error: infiniband/mlx5_hw.h: No such file or directory
> 
> Can you list the files you have under /usr/include/infiniband ? 
> 
> > >
> > > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 
> > > machine
> > but still hitting the same error. Am I missing some other package?
> > 
> > Which version of DPDK are you using (it is important to help)?
> > 
> > Regards,
> > 
> > --
> > Nélio Laranjeiro
> > 6WIND

--
Nélio Laranjeiro
6WIND


[dpdk-dev] [Bug 54] i40e port link status no updated for interrupt mode

2018-05-31 Thread bugzilla
https://dpdk.org/tracker/show_bug.cgi?id=54

Fan Zhang (roy.fan.zh...@intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|CONFIRMED                   |RESOLVED
          Component|eventdev                    |ethdev
         Resolution|---                         |FIXED

--- Comment #1 from Fan Zhang (roy.fan.zh...@intel.com) ---
Sent patch to fix and merged.

https://dpdk.org/dev/patchwork/patch/40512/

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [PATCH v6 0/3] Improve zero-length memzone allocation

2018-05-31 Thread Anatoly Burakov
This patchset does two things. First, it enables reserving
zero-length memzones that are IOVA-contiguous. Second,
it fixes a long-standing race condition in reserving
zero-length memzones, where the malloc heap is not locked between
stats collection and reservation; the fix is to allocate the
biggest element on the spot instead.

Some limitations are added, but they are a trade-off between
not having race conditions and user convenience. It would be
possible to lock all heaps during memzone reserve for zero-
length, and that would keep the old behavior, but given how
such allocation (especially when asking for IOVA-contiguous
memory) may take a long time, a design decision was made to
keep things simple, and only check other heaps if the
current one is completely busy.

Ideas on improvement are welcome.

v6:
- Rebase on top of 18.05
- Dropped malloc stats changes as no deprecation notice was
  sent, and i would like to integrate these changes in this
  release :)

v5:
- Use bound length if reserving memzone with zero length

v4:
- Fixes in memzone test
- Account for element padding
- Fix for wrong memzone size being returned
- Documentation fixes

Anatoly Burakov (3):
  malloc: add finding biggest free IOVA-contiguous element
  malloc: allow reserving biggest element
  memzone: improve zero-length memzone reserve

 lib/librte_eal/common/eal_common_memzone.c  |  70 ++---
 lib/librte_eal/common/include/rte_memzone.h |  24 ++-
 lib/librte_eal/common/malloc_elem.c |  79 ++
 lib/librte_eal/common/malloc_elem.h |   6 +
 lib/librte_eal/common/malloc_heap.c | 126 +++
 lib/librte_eal/common/malloc_heap.h |   4 +
 test/test/test_memzone.c| 165 +++-
 7 files changed, 343 insertions(+), 131 deletions(-)

-- 
2.17.0
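
The race described in the cover letter above comes from looking up the biggest free element and reserving it in two separately locked steps. The sketch below shows the alternative of doing both inside one critical section. It is a toy model under stated assumptions: struct toy_heap, its fixed array of sizes, and the C11 atomic_flag spinlock are all hypothetical and stand in for the real malloc heap and rte_spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ELEMS 4

/* Hypothetical toy heap: a fixed set of free element sizes, 0 = in use. */
struct toy_heap {
	atomic_flag lock;
	size_t free_sz[NUM_ELEMS];
};

static void
heap_lock(struct toy_heap *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock,
			memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void
heap_unlock(struct toy_heap *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/*
 * Find and take the biggest free element in one critical section.
 * Returns its size, or 0 if the heap is fully busy.
 */
size_t
toy_heap_take_biggest(struct toy_heap *h)
{
	size_t best = 0, best_idx = 0, i;

	assert(h != NULL);
	heap_lock(h);
	for (i = 0; i < NUM_ELEMS; i++) {
		if (h->free_sz[i] > best) {
			best = h->free_sz[i];
			best_idx = i;
		}
	}
	if (best != 0)
		h->free_sz[best_idx] = 0; /* reserve before unlocking */
	heap_unlock(h);
	return best;
}
```

Because only one heap is locked at a time, this design cannot compare candidates across heaps, which is the trade-off the cover letter mentions.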


[dpdk-dev] [PATCH v6 2/3] malloc: allow reserving biggest element

2018-05-31 Thread Anatoly Burakov
Add an internal-only function to allocate biggest element from
the heap. Nominally, it supports SOCKET_ID_ANY as its socket
argument, but it's essentially useless because other sockets
will only be allocated from if the entire heap on current or
specified socket is busy.

Still, asking to reserve a biggest element will allow fixing
race condition in memzone reserve that has been there for a
long time.

Signed-off-by: Anatoly Burakov 
Acked-by: Remy Horton 
---
 lib/librte_eal/common/malloc_heap.c | 126 
 lib/librte_eal/common/malloc_heap.h |   4 +
 2 files changed, 130 insertions(+)

diff --git a/lib/librte_eal/common/malloc_heap.c 
b/lib/librte_eal/common/malloc_heap.c
index d6cf3af81..12aaf2d72 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -148,6 +148,52 @@ find_suitable_element(struct malloc_heap *heap, size_t 
size,
return NULL;
 }
 
+/*
+ * Iterates through the freelist for a heap to find a free element with the
+ * biggest size and requested alignment. Will also set size to whatever element
+ * size that was found.
+ * Returns null on failure, or pointer to element on success.
+ */
+static struct malloc_elem *
+find_biggest_element(struct malloc_heap *heap, size_t *size,
+   unsigned int flags, size_t align, bool contig)
+{
+   struct malloc_elem *elem, *max_elem = NULL;
+   size_t idx, max_size = 0;
+
+   for (idx = 0; idx < RTE_HEAP_NUM_FREELISTS; idx++) {
+   for (elem = LIST_FIRST(&heap->free_head[idx]);
+   !!elem; elem = LIST_NEXT(elem, free_list)) {
+   size_t cur_size;
+   if (!check_hugepage_sz(flags, elem->msl->page_sz))
+   continue;
+   if (contig) {
+   cur_size =
+   malloc_elem_find_max_iova_contig(elem,
+   align);
+   } else {
+   void *data_start = RTE_PTR_ADD(elem,
+   MALLOC_ELEM_HEADER_LEN);
+   void *data_end = RTE_PTR_ADD(elem, elem->size -
+   MALLOC_ELEM_TRAILER_LEN);
+   void *aligned = RTE_PTR_ALIGN_CEIL(data_start,
+   align);
+   /* check if aligned data start is beyond end */
+   if (aligned >= data_end)
+   continue;
+   cur_size = RTE_PTR_DIFF(data_end, aligned);
+   }
+   if (cur_size > max_size) {
+   max_size = cur_size;
+   max_elem = elem;
+   }
+   }
+   }
+
+   *size = max_size;
+   return max_elem;
+}
+
 /*
  * Main function to allocate a block of memory from the heap.
  * It locks the free list, scans it, and adds a new memseg if the
@@ -174,6 +220,26 @@ heap_alloc(struct malloc_heap *heap, const char *type 
__rte_unused, size_t size,
return elem == NULL ? NULL : (void *)(&elem[1]);
 }
 
+static void *
+heap_alloc_biggest(struct malloc_heap *heap, const char *type __rte_unused,
+   unsigned int flags, size_t align, bool contig)
+{
+   struct malloc_elem *elem;
+   size_t size;
+
+   align = RTE_CACHE_LINE_ROUNDUP(align);
+
+   elem = find_biggest_element(heap, &size, flags, align, contig);
+   if (elem != NULL) {
+   elem = malloc_elem_alloc(elem, size, align, 0, contig);
+
+   /* increase heap's count of allocated elements */
+   heap->alloc_count++;
+   }
+
+   return elem == NULL ? NULL : (void *)(&elem[1]);
+}
+
 /* this function is exposed in malloc_mp.h */
 void
 rollback_expand_heap(struct rte_memseg **ms, int n_segs,
@@ -575,6 +641,66 @@ malloc_heap_alloc(const char *type, size_t size, int 
socket_arg,
return NULL;
 }
 
+static void *
+heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
+   size_t align, bool contig)
+{
+   struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+   struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+   void *ret;
+
+   rte_spinlock_lock(&(heap->lock));
+
+   align = align == 0 ? 1 : align;
+
+   ret = heap_alloc_biggest(heap, type, flags, align, contig);
+
+   rte_spinlock_unlock(&(heap->lock));
+
+   return ret;
+}
+
+void *
+malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
+   size_t align, bool contig)
+{
+   int socket, i, cur_socket;
+   void *ret;
+
+   /* return NULL if align is not power-of-2 */
+   if ((align && !rte_is_power_of_2(align)))

[dpdk-dev] [PATCH v6 1/3] malloc: add finding biggest free IOVA-contiguous element

2018-05-31 Thread Anatoly Burakov
Add an internal-only function to find the biggest free IOVA-contiguous
malloc element. It is not exposed through the external API.
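
As a rough illustration of what "biggest IOVA-contiguous" means, the sketch below scans a list of per-page IO addresses and returns the longest run without gaps. It is a simplification and not the function added by this patch: longest_iova_contig() is a hypothetical name, and the element header, trailer and alignment handling done by the real code are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the length in bytes of the longest run of pages whose IO
 * addresses follow one another without gaps. iova[i] is the IO address
 * of page i; all pages have the same size. Hypothetical helper.
 */
size_t
longest_iova_contig(const uint64_t *iova, size_t n_pages, size_t page_sz)
{
	size_t cur, max, i;

	assert(iova != NULL || n_pages == 0);
	if (n_pages == 0)
		return 0;

	cur = max = page_sz;
	for (i = 1; i < n_pages; i++) {
		if (iova[i] == iova[i - 1] + page_sz)
			cur += page_sz; /* run continues */
		else
			cur = page_sz;  /* gap: a new run starts at this page */
		if (cur > max)
			max = cur;
	}
	return max;
}
```

For five 4 KiB pages at IO addresses 0x1000, 0x2000, 0x8000, 0x9000 and 0xa000, the longest run is the last three pages.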

Signed-off-by: Anatoly Burakov 
Acked-by: Remy Horton 
Acked-by: Shreyansh Jain 
---

Notes:
v6:
- Patch was postponed to 18.08 but i forgot to add
  deprecation notice for the API changes, so
  these external malloc stats API changes have been
  dropped from this patchset

v4:
- Fix comments to be more up to date with v4 code
- Add comments explaining trailer handling

v2:
- Add header to newly recalculated element start

 lib/librte_eal/common/malloc_elem.c | 79 +
 lib/librte_eal/common/malloc_elem.h |  6 +++
 2 files changed, 85 insertions(+)

diff --git a/lib/librte_eal/common/malloc_elem.c 
b/lib/librte_eal/common/malloc_elem.c
index 9bfe9b9b4..f1bb4fee7 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -18,10 +18,89 @@
 #include 
 #include 
 
+#include "eal_internal_cfg.h"
 #include "eal_memalloc.h"
 #include "malloc_elem.h"
 #include "malloc_heap.h"
 
+size_t
+malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
+{
+   void *cur_page, *contig_seg_start, *page_end, *cur_seg_end;
+   void *data_start, *data_end;
+   rte_iova_t expected_iova;
+   struct rte_memseg *ms;
+   size_t page_sz, cur, max;
+
+   page_sz = (size_t)elem->msl->page_sz;
+   data_start = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN);
+   data_end = RTE_PTR_ADD(elem, elem->size - MALLOC_ELEM_TRAILER_LEN);
+   /* segment must start after header and with specified alignment */
+   contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
+
+   /* if we're in IOVA as VA mode, or if we're in legacy mode with
+* hugepages, all elements are IOVA-contiguous.
+*/
+   if (rte_eal_iova_mode() == RTE_IOVA_VA ||
+   (internal_config.legacy_mem && rte_eal_has_hugepages()))
+   return RTE_PTR_DIFF(data_end, contig_seg_start);
+
+   cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
+   ms = rte_mem_virt2memseg(cur_page, elem->msl);
+
+   /* do first iteration outside the loop */
+   page_end = RTE_PTR_ADD(cur_page, page_sz);
+   cur_seg_end = RTE_MIN(page_end, data_end);
+   cur = RTE_PTR_DIFF(cur_seg_end, contig_seg_start) -
+   MALLOC_ELEM_TRAILER_LEN;
+   max = cur;
+   expected_iova = ms->iova + page_sz;
+   /* memsegs are contiguous in memory */
+   ms++;
+
+   cur_page = RTE_PTR_ADD(cur_page, page_sz);
+
+   while (cur_page < data_end) {
+   page_end = RTE_PTR_ADD(cur_page, page_sz);
+   cur_seg_end = RTE_MIN(page_end, data_end);
+
+   /* reset start of contiguous segment if unexpected iova */
+   if (ms->iova != expected_iova) {
+   /* next contiguous segment must start at specified
+* alignment.
+*/
+   contig_seg_start = RTE_PTR_ALIGN(cur_page, align);
+   /* new segment start may be on a different page, so find
+* the page and skip to next iteration to make sure
+* we're not blowing past data end.
+*/
+   ms = rte_mem_virt2memseg(contig_seg_start, elem->msl);
+   cur_page = ms->addr;
+   /* don't trigger another recalculation */
+   expected_iova = ms->iova;
+   continue;
+   }
+   /* cur_seg_end ends on a page boundary or on data end. if we're
+* looking at data end, then malloc trailer is already included
+* in the calculations. if we're looking at page end, then we
+* know there's more data past this page and thus there's space
+* for malloc element trailer, so don't count it here.
+*/
+   cur = RTE_PTR_DIFF(cur_seg_end, contig_seg_start);
+   /* update max if cur value is bigger */
+   if (cur > max)
+   max = cur;
+
+   /* move to next page */
+   cur_page = page_end;
+   expected_iova = ms->iova + page_sz;
+   /* memsegs are contiguous in memory */
+   ms++;
+   }
+
+   return max;
+}
+
 /*
  * Initialize a general malloc_elem header structure
  */
diff --git a/lib/librte_eal/common/malloc_elem.h 
b/lib/librte_eal/common/malloc_elem.h
index 7331af9ca..e2bda4c02 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -179,4 +179,10 @@ malloc_elem_free_list_index(size_t size);
 void
 malloc_elem_free_list_insert(struct malloc_elem *elem);
 
+/*
+ * Find biggest IOV

[dpdk-dev] [PATCH v6 3/3] memzone: improve zero-length memzone reserve

2018-05-31 Thread Anatoly Burakov
Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.
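
The difference between the conservative estimate and an exact one can be seen with plain address arithmetic. The sketch below is an illustration only: the helper names are hypothetical, and the malloc element overhead that the real code also subtracts is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t
align_ceil(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/* Conservative estimate: always give up a full `align` bytes. */
size_t
usable_conservative(uintptr_t start, uintptr_t end, size_t align)
{
	size_t len = end - start;

	return len < align ? 0 : len - align;
}

/* Exact: give up only the bytes skipped to reach the aligned start. */
size_t
usable_exact(uintptr_t start, uintptr_t end, size_t align)
{
	uintptr_t aligned = align_ceil(start, align);

	return aligned >= end ? 0 : end - aligned;
}
```

For a region from 0x1040 to 0x2000 with 64-byte alignment, the start is already aligned, so the exact size is 4032 bytes while the conservative estimate gives 3968.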

Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.

Also, fixup unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")
Cc: sergio.gonzalez.mon...@intel.com

Signed-off-by: Anatoly Burakov 
---

Notes:
v6:
- Rebase on 18.05

v5:
- Use bound len when reserving bounded zero-length memzones

v4:
- Rebased on latest master
- Improved documentation
- Added accounting for element pad [1]
- Fixed max len underflow in test
- Checkpatch fixes

[1] A patch has recently fixed a similar issue:

https://dpdk.org/dev/patchwork/patch/39332/

The accounting for padding is also needed because size of the element
may include not only malloc header overhead, but also the padding if
it has any.

At first glance, it would seem like additional change is needed for
pre-18.05 code as well. However, on closer inspection, the original code
was incorrect because it was comparing requested_len to 0, which is never
zero and is always a minimum of cache line size due to earlier RTE_MAX()
call (or rather, it could be zero, but in that case it would fail earlier).
This downgrades the above quoted bug from "potential memory corruption bug"
to "this bug was never a bug due to another bug".

A proper fix for pre-18.05 would be to remove the check altogether and
always go by requested_len, which is what we use to reserve memzones
in the first place. I will submit it separately.

 lib/librte_eal/common/eal_common_memzone.c  |  70 ++---
 lib/librte_eal/common/include/rte_memzone.h |  24 ++-
 test/test/test_memzone.c| 165 +++-
 3 files changed, 128 insertions(+), 131 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c 
b/lib/librte_eal/common/eal_common_memzone.c
index faa3b0615..7300fe05d 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -52,38 +52,6 @@ memzone_lookup_thread_unsafe(const char *name)
return NULL;
 }
 
-
-/* This function will return the greatest free block if a heap has been
- * specified. If no heap has been specified, it will return the heap and
- * length of the greatest free block available in all heaps */
-static size_t
-find_heap_max_free_elem(int *s, unsigned align)
-{
-   struct rte_mem_config *mcfg;
-   struct rte_malloc_socket_stats stats;
-   int i, socket = *s;
-   size_t len = 0;
-
-   /* get pointer to global configuration */
-   mcfg = rte_eal_get_configuration()->mem_config;
-
-   for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-   if ((socket != SOCKET_ID_ANY) && (socket != i))
-   continue;
-
-   malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
-   if (stats.greatest_free_size > len) {
-   len = stats.greatest_free_size;
-   *s = i;
-   }
-   }
-
-   if (len < MALLOC_ELEM_OVERHEAD + align)
-   return 0;
-
-   return len - MALLOC_ELEM_OVERHEAD - align;
-}
-
 static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
int socket_id, unsigned int flags, unsigned int align,
@@ -92,6 +60,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, 
size_t len,
struct rte_memzone *mz;
struct rte_mem_config *mcfg;
struct rte_fbarray *arr;
+   void *mz_addr;
size_t requested_len;
int mz_idx;
bool contig;
@@ -140,8 +109,7 @@ memzone_reserve_aligned_threa

[dpdk-dev] [PATCH v3] net/ixgbe: fix crash on detach

2018-05-31 Thread Pablo de Lara
When detaching a port bound to ixgbe PMD, if the port
does not have any VFs, *vfinfo is not set and there is
a NULL dereference attempt, when calling
rte_eth_switch_domain_free(), which expects VFs to be used,
causing a segmentation fault.

Steps to reproduce:

./testpmd -- -i
testpmd> port stop all
testpmd> port close all
testpmd> port detach 0

Bugzilla ID: 57
Fixes: cf80ba6e2038 ("net/ixgbe: add support for representor ports")
Cc: sta...@dpdk.org

Reported-by: Anatoly Burakov 
Signed-off-by: Pablo de Lara 
Tested-by: Anatoly Burakov 
Acked-by: Remy Horton 
---

Changes in v3:
- Added Bugzilla ID

Changes in v2:
- CC'd stable list

 drivers/net/ixgbe/ixgbe_pf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 4d199c802..c381acf44 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -135,14 +135,14 @@ void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx = 0;
RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx = 0;
 
-   ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
-   if (ret)
-   PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
-
vf_num = dev_num_vf(eth_dev);
if (vf_num == 0)
return;
 
+   ret = rte_eth_switch_domain_free((*vfinfo)->switch_domain_id);
+   if (ret)
+   PMD_INIT_LOG(WARNING, "failed to free switch domain: %d", ret);
+
rte_free(*vfinfo);
*vfinfo = NULL;
 }
-- 
2.17.0



Re: [dpdk-dev] Compilation of MLX5 driver

2018-05-31 Thread Nélio Laranjeiro
On Thu, May 31, 2018 at 09:14:03AM +, Nitin Katiyar wrote:
> Yes,I installed it using --dpdk --upstream-libs. What is the way
> forward now?

In v17.05 the MLX5 PMD still relies on libibverbs and libmlx5. The
options you used select which libraries the Mellanox package installs:
libibverbs and libmlx5, or rdma-core.
By using them you selected rdma-core, which is not supported by DPDK
v17.05.

You need to install Mellanox OFED without those two options to select
libibverbs, libmlx5 to make it work.
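
A quick way to see which set of headers a machine has is a compile-time probe. The sketch below is an assumption-laden illustration: it relies on the __has_include preprocessor extension (GCC 5 or later, Clang, C23), which the stock compiler on older distributions may not provide, and it assumes that infiniband/mlx5_hw.h comes from the legacy libmlx5 package while infiniband/mlx5dv.h comes from rdma-core, as this thread suggests. verbs_flavor() is a hypothetical name.

```c
/* Fall back to "unknown" when the compiler lacks __has_include. */
#if defined(__has_include)
#  if __has_include(<infiniband/mlx5_hw.h>)
#    define VERBS_FLAVOR "legacy libmlx5 headers (mlx5_hw.h present)"
#  elif __has_include(<infiniband/mlx5dv.h>)
#    define VERBS_FLAVOR "rdma-core headers (mlx5dv.h only)"
#  else
#    define VERBS_FLAVOR "no mlx5 headers found"
#  endif
#else
#  define VERBS_FLAVOR "unknown (compiler lacks __has_include)"
#endif

/* Report which mlx5 header set was visible at compile time. */
const char *
verbs_flavor(void)
{
	return VERBS_FLAVOR;
}
```

Printing the returned string from a small test program shows which case applies on the build machine.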

Regards,
 
> Regards,
> Nitin
> 
> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] 
> Sent: Thursday, May 31, 2018 1:36 PM
> To: Nitin Katiyar 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> 
> On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> > Hi,
> > It has following files:
> > 
> > arch.h  ib.h  kern-abi.h  mlx4dv.h  mlx5dv.h  opcode.h  sa.h 
> > sa-kern-abi.h  verbs.h
> > 
> > I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> > MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64
> 
> Did you installed Mellanox OFED with the --dpdk --upstream-libs arguments for 
> the installation script?
> 
> If it is the case, you should not add them for this version, those options 
> are for DPDK v17.11 and higher.
> 
> Regards,
> 
> > Regards,
> > Nitin
> > 
> > -Original Message-
> > From: Shahaf Shuler [mailto:shah...@mellanox.com]
> > Sent: Thursday, May 31, 2018 10:51 AM
> > To: Nitin Katiyar ; Nélio Laranjeiro 
> > 
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] Compilation of MLX5 driver
> > 
> > Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> > > 
> > > Hi,
> > > I was compiling 17.05.02.
> > > Regards,
> > > Nitin
> > > 
> > > -Original Message-
> > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > > Sent: Wednesday, May 30, 2018 6:42 PM
> > > To: Nitin Katiyar 
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> > > 
> > > Hi,
> > > 
> > > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > > Hi,
> > > > I am trying to compile MLX5 PMD driver by setting
> > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation 
> > > error.
> > > >
> > > > fatal error: infiniband/mlx5_hw.h: No such file or directory
> > 
> > Can you list the files you have under /usr/include/infiniband ? 
> > 
> > > >
> > > > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 
> > > > machine
> > > but still hitting the same error. Am I missing some other package?
> > > 
> > > Which version of DPDK are you using (it is important to help)?
> > > 
> > > Regards,
> > > 
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
> 
> --
> Nélio Laranjeiro
> 6WIND

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v2 0/2] Vhost: unitfy receive paths

2018-05-31 Thread Wang, Zhihong



> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, May 29, 2018 5:45 PM
> To: dev@dpdk.org; Bie, Tiwei ; Wang, Zhihong
> 
> Cc: Maxime Coquelin 
> Subject: [PATCH v2 0/2] Vhost: unitfy receive paths
> 
> Hi,
> 
> This second version fixes the feature bit check in
> rxvq_is_mergeable(), and remove "mergeable" from rx funcs
> names. No difference is seen in the benchmarks
> 
> This series is preliminary work to ease the integration of
> packed ring layout support. But even without packed ring
> layout, the result is positive.
> 
> The first patch unifies both paths, and the second one is a small
> optimization to avoid copying the batch_copy_nb_elems VQ field
> to/from the stack.
> 
> With the series applied, I get a modest performance gain for
> both mergeable and non-mergeable cases, and the gain of
> about 300 LoC is non-negligible maintenance-wise.
> 
> Rx-mrg=off benchmarks:
> 
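> +------------+-------+-------------+-------------+----------+
> |    Run     |  PVP  | Guest->Host | Host->Guest | Loopback |
> +------------+-------+-------------+-------------+----------+
> | v18.05-rc5 | 14.47 |       16.64 |       17.57 |    13.15 |
> | + series   | 14.87 |       16.86 |       17.70 |    13.30 |
> +------------+-------+-------------+-------------+----------+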
> 
> Rx-mrg=on benchmarks:
> 
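> +------------+------+-------------+-------------+----------+
> |    Run     | PVP  | Guest->Host | Host->Guest | Loopback |
> +------------+------+-------------+-------------+----------+
> | v18.05-rc5 | 9.38 |       13.78 |       16.70 |    12.79 |
> | + series   | 9.38 |       13.80 |       17.49 |    13.36 |
> +------------+------+-------------+-------------+----------+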
> 
> Note: Even without my series, the guest->host benchmark with
> mergeable buffers enabled looks suspicious, as it should in
> theory be almost identical to the case where Rx mergeable buffers are
> disabled. To be investigated...
> 
> Maxime Coquelin (2):
>   vhost: unify Rx mergeable and non-mergeable paths
>   vhost: improve batched copies performance
> 
>  lib/librte_vhost/virtio_net.c | 376 
> +-
>  1 file changed, 37 insertions(+), 339 deletions(-)
> 

Acked-by: Zhihong Wang 

Thanks Maxime! This is really great to see. ;) We probably need the
same improvement for Virtio-pmd.

One comment on Virtio/Vhost performance analysis: no matter what type
of traffic is used (PVP, Txonly-Rxonly, Loopback...), we need to
be clear about which side we're testing, and give the other side excess
CPU resources; otherwise we'll be measuring whichever side is slowest.

Since this patch is for Vhost, I suggest running N (e.g. N = 4) Virtio
threads on N cores, and the corresponding N Vhost threads on a single
core, for the performance comparison. Do you think this makes sense?

For Guest -> Host, in my test I see that Rx-mrg=on has a negative impact
on the Virtio side, probably because Virtio touches something that's not
touched when Rx-mrg=off.

Thanks
-Zhihong


[dpdk-dev] i40evf: Problem with the statistics

2018-05-31 Thread Mridula V Gangadharan
Hi,

I am testing a packet-drop scenario by setting the MTU size.
My setup uses the i40evf driver. I set the DPDK interface's MTU size to 1800.
I am sending 100 packets of size 1918 each.
I am expecting the drop counter to increment.
rte_eth_stats_get() returns ipackets equal to the number of packets I sent.
No drop counters are incrementing. Also, my application is not receiving
any packets.
Is there some issue with the DPDK statistics?
The xstats output is as follows. It is not showing any drops, but the
rx_good_bytes counts are incrementing.



NIC extended statistics for port 1

rx_good_packets: 656
tx_good_packets: 556
rx_good_bytes: 225160
tx_good_bytes: 33360
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0packets: 0
rx_q0bytes: 0
rx_q0errors: 0
tx_q0packets: 0
tx_q0bytes: 0
rx_bytes: 225160
rx_unicast_packets: 656
rx_multicast_packets: 0
rx_broadcast_packets: 0
rx_dropped_packets: 0
rx_unknown_protocol_packets: 0
tx_bytes: 33360
tx_unicast_packets: 556
tx_multicast_packets: 0
tx_broadcast_packets: 0
tx_dropped_packets: 0
tx_error_packets: 0

Thanks and Regards,
Mridula



[dpdk-dev] [RFC v2 0/6] Remove IPC threads

2018-05-31 Thread Anatoly Burakov
As previously discussed [1], IPC threads need to be removed and their
workload moved to the interrupt thread.

The transition is complete as far as Linux support is concerned. However,
since there is no interrupt thread on FreeBSD, this patchset effectively
disables IPC on FreeBSD for now (hence it still being an RFC and not a v1).

Work on adding an interrupt thread to FreeBSD is in progress.

[1] http://dpdk.org/dev/patchwork/patch/36579/

Anatoly Burakov (2):
  ipc: remove IPC thread for async requests
  ipc: remove main IPC thread

Jianfeng Tan (4):
  eal/linux: use glibc malloc in alarm
  eal/linux: use glibc malloc in interrupt handling
  eal: bring forward init of interrupt handling
  eal: add IPC type for interrupt thread

 lib/librte_eal/common/eal_common_proc.c   | 233 +++---
 .../common/include/rte_eal_interrupts.h   |   1 +
 lib/librte_eal/linuxapp/eal/eal.c |  10 +-
 lib/librte_eal/linuxapp/eal/eal_alarm.c   |   9 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c  |  19 +-
 test/test/test_interrupts.c   |  29 ++-
 6 files changed, 137 insertions(+), 164 deletions(-)

-- 
2.17.0


[dpdk-dev] [RFC v2 1/6] eal/linux: use glibc malloc in alarm

2018-05-31 Thread Anatoly Burakov
From: Jianfeng Tan 

We will rely on the alarm API for async IPC requests, as the following
patch indicates. rte_malloc could itself require an async IPC request.

To avoid this chicken-and-egg dilemma, we switch to glibc malloc in
the alarm implementation.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_alarm.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c 
b/lib/librte_eal/linuxapp/eal/eal_alarm.c
index c115e823a..391d2a65f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
+++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -91,7 +90,7 @@ eal_alarm_callback(void *arg __rte_unused)
rte_spinlock_lock(&alarm_list_lk);
 
LIST_REMOVE(ap, next);
-   rte_free(ap);
+   free(ap);
}
 
if (!LIST_EMPTY(&alarm_list)) {
@@ -122,7 +121,7 @@ rte_eal_alarm_set(uint64_t us, rte_eal_alarm_callback 
cb_fn, void *cb_arg)
if (us < 1 || us > (UINT64_MAX - US_PER_S) || cb_fn == NULL)
return -EINVAL;
 
-   new_alarm = rte_zmalloc(NULL, sizeof(*new_alarm), 0);
+   new_alarm = calloc(1, sizeof(*new_alarm));
if (new_alarm == NULL)
return -ENOMEM;
 
@@ -196,7 +195,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void 
*cb_arg)
 
if (ap->executing == 0) {
LIST_REMOVE(ap, next);
-   rte_free(ap);
+   free(ap);
count++;
} else {
/* If calling from other context, mark that 
alarm is executing
@@ -220,7 +219,7 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn, void 
*cb_arg)
 
if (ap->executing == 0) {
LIST_REMOVE(ap, next);
-   rte_free(ap);
+   free(ap);
count++;
ap = ap_prev;
} else if (pthread_equal(ap->executing_id, 
pthread_self()) == 0)
-- 
2.17.0
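
The allocation change in the alarm patch above can be illustrated with a
minimal, self-contained sketch. It is not the code from eal_alarm.c: the
struct, the list and the helpers alarm_add()/alarm_cancel() are hypothetical
names chosen for illustration. The only point it makes is that list entries
come from libc calloc()/free(), so the alarm list has no dependency on the
DPDK heap (and therefore none on IPC).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified alarm entry (not the real EAL structure). */
struct alarm_entry {
	LIST_ENTRY(alarm_entry) next;
	uint64_t deadline_us;
	void (*cb_fn)(void *);
	void *cb_arg;
};

LIST_HEAD(alarm_list_head, alarm_entry);
static struct alarm_list_head alarm_list = LIST_HEAD_INITIALIZER(alarm_list);

/* Example callback used only to have something to register. */
static void
noop_cb(void *arg)
{
	(void)arg;
}

/* Add an alarm, keeping the list sorted by deadline. Uses libc calloc(). */
static int
alarm_add(uint64_t deadline_us, void (*cb_fn)(void *), void *cb_arg)
{
	struct alarm_entry *ap, *it, *prev = NULL;

	if (cb_fn == NULL)
		return -1;

	ap = calloc(1, sizeof(*ap));
	if (ap == NULL)
		return -1;

	ap->deadline_us = deadline_us;
	ap->cb_fn = cb_fn;
	ap->cb_arg = cb_arg;

	LIST_FOREACH(it, &alarm_list, next) {
		if (it->deadline_us > deadline_us)
			break;
		prev = it;
	}
	if (prev == NULL)
		LIST_INSERT_HEAD(&alarm_list, ap, next);
	else
		LIST_INSERT_AFTER(prev, ap, next);
	return 0;
}

/* Cancel all alarms matching (cb_fn, cb_arg); returns how many were freed. */
static int
alarm_cancel(void (*cb_fn)(void *), void *cb_arg)
{
	struct alarm_entry *ap = LIST_FIRST(&alarm_list), *nxt;
	int count = 0;

	while (ap != NULL) {
		nxt = LIST_NEXT(ap, next);
		if (ap->cb_fn == cb_fn && ap->cb_arg == cb_arg) {
			LIST_REMOVE(ap, next);
			free(ap); /* libc free(), matching calloc() above */
			count++;
		}
		ap = nxt;
	}
	return count;
}
```

The sketch leaves out locking and the "executing" state that the real code
tracks; it only shows the allocator swap.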


[dpdk-dev] [RFC v2 3/6] eal/linux: use glibc malloc in interrupt handling

2018-05-31 Thread Anatoly Burakov
From: Jianfeng Tan 

We will rely on the interrupt thread to implement IPC, and IPC
initialization happens at a very early stage, when the memory subsystem
is not initialized yet. So we switch to glibc malloc/free.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 056d41c12..180c0378a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -405,8 +404,7 @@ rte_intr_callback_register(const struct rte_intr_handle 
*intr_handle,
}
 
/* allocate a new interrupt callback entity */
-   callback = rte_zmalloc("interrupt callback list",
-   sizeof(*callback), 0);
+   callback = calloc(1, sizeof(*callback));
if (callback == NULL) {
RTE_LOG(ERR, EAL, "Can not allocate memory\n");
return -ENOMEM;
@@ -431,10 +429,10 @@ rte_intr_callback_register(const struct rte_intr_handle 
*intr_handle,
 
/* no existing callbacks for this - add new source */
if (src == NULL) {
-   if ((src = rte_zmalloc("interrupt source list",
-   sizeof(*src), 0)) == NULL) {
+   src = calloc(1, sizeof(*src));
+   if (src == NULL) {
RTE_LOG(ERR, EAL, "Can not allocate memory\n");
-   rte_free(callback);
+   free(callback);
ret = -ENOMEM;
} else {
src->intr_handle = *intr_handle;
@@ -501,7 +499,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle 
*intr_handle,
if (cb->cb_fn == cb_fn && (cb_arg == (void *)-1 ||
cb->cb_arg == cb_arg)) {
TAILQ_REMOVE(&src->callbacks, cb, next);
-   rte_free(cb);
+   free(cb);
ret++;
}
}
@@ -509,7 +507,7 @@ rte_intr_callback_unregister(const struct rte_intr_handle 
*intr_handle,
/* all callbacks for that source are removed. */
if (TAILQ_EMPTY(&src->callbacks)) {
TAILQ_REMOVE(&intr_sources, src, next);
-   rte_free(src);
+   free(src);
}
}
 
-- 
2.17.0


[dpdk-dev] [RFC v2 6/6] ipc: remove main IPC thread

2018-05-31 Thread Anatoly Burakov
Previously, to handle requests from peer(s), or replies for a
request (sync or async) by itself, a dedicated IPC thread was set
up.

Now that every other piece of the puzzle is in place, we can get rid
of the IPC thread, and move waiting for IPC messages entirely into the
interrupt thread.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
Suggested-by: Thomas Monjalon 
---

Notes:
RFC->RFCv2:
- Fixed resource leaks
- Improved readability

 lib/librte_eal/common/eal_common_proc.c | 46 ++---
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 6f3366403..162d67ca5 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -101,6 +102,8 @@ static struct {
/**< used in async requests only */
 };
 
+static struct rte_intr_handle ipc_intr_handle;
+
 /* forward declarations */
 static int
 mp_send(struct rte_mp_msg *msg, const char *peer, int type);
@@ -350,18 +353,17 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un 
*s)
}
 }
 
-static void *
+static void
 mp_handle(void *arg __rte_unused)
 {
struct mp_msg_internal msg;
struct sockaddr_un sa;
 
while (1) {
-   if (read_msg(&msg, &sa) == 0)
-   process_msg(&msg, &sa);
+   if (read_msg(&msg, &sa) < 0)
+   break;
+   process_msg(&msg, &sa);
}
-
-   return NULL;
 }
 
 static int
@@ -570,7 +572,6 @@ rte_mp_channel_init(void)
 {
char path[PATH_MAX];
int dir_fd;
-   pthread_t mp_handle_tid;
 
/* create filter path */
create_socket_path("*", path, sizeof(path));
@@ -585,36 +586,32 @@ rte_mp_channel_init(void)
if (dir_fd < 0) {
RTE_LOG(ERR, EAL, "failed to open %s: %s\n",
mp_dir_path, strerror(errno));
-   return -1;
+   goto fail;
}
 
if (flock(dir_fd, LOCK_EX)) {
RTE_LOG(ERR, EAL, "failed to lock %s: %s\n",
mp_dir_path, strerror(errno));
-   close(dir_fd);
-   return -1;
+   goto fail;
}
 
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
unlink_sockets(mp_filter)) {
RTE_LOG(ERR, EAL, "failed to unlink mp sockets\n");
-   close(dir_fd);
-   return -1;
+   goto fail;
}
 
if (open_socket_fd() < 0) {
-   close(dir_fd);
-   return -1;
+   goto fail;
}
 
-   if (rte_ctrl_thread_create(&mp_handle_tid, "rte_mp_handle",
-   NULL, mp_handle, NULL) < 0) {
-   RTE_LOG(ERR, EAL, "failed to create mp thead: %s\n",
-   strerror(errno));
-   close(mp_fd);
-   close(dir_fd);
-   mp_fd = -1;
-   return -1;
+   ipc_intr_handle.fd = mp_fd;
+   ipc_intr_handle.type = RTE_INTR_HANDLE_IPC;
+
+   if (rte_intr_callback_register(&ipc_intr_handle, mp_handle, NULL) < 0) {
+   RTE_LOG(ERR, EAL, "failed to register IPC interrupt callback: 
%s\n",
+   strerror(errno));
+   goto fail;
}
 
/* unlock the directory */
@@ -622,6 +619,13 @@ rte_mp_channel_init(void)
close(dir_fd);
 
return 0;
+fail:
+   if (dir_fd >= 0)
+   close(dir_fd);
+   if (mp_fd >= 0)
+   close(mp_fd);
+   mp_fd = -1;
+   return -1;
 }
 
 /**
-- 
2.17.0
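
The idea in the patch above (no dedicated thread per socket, one event loop
that calls a registered callback when the fd becomes readable) can be
sketched without DPDK. This is only an illustration under stated assumptions:
the EAL interrupt thread uses epoll, while the sketch uses poll() to stay
short and portable, and event_register(), event_dispatch_once() and
count_cb() are hypothetical names, not EAL API.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SOURCES 8

typedef void (*event_cb_t)(int fd, void *arg);

/* One registered fd and the callback to run when it is readable. */
struct event_source {
	int fd;
	event_cb_t cb;
	void *arg;
};

static struct event_source sources[MAX_SOURCES];
static unsigned int n_sources;

/* Register a callback for an fd; returns 0 on success, -1 on error. */
static int
event_register(int fd, event_cb_t cb, void *arg)
{
	if (fd < 0 || cb == NULL || n_sources == MAX_SOURCES)
		return -1;
	sources[n_sources].fd = fd;
	sources[n_sources].cb = cb;
	sources[n_sources].arg = arg;
	n_sources++;
	return 0;
}

/*
 * One iteration of the shared event loop: wait up to timeout_ms, then call
 * the callback of every readable fd. Returns the number of callbacks run,
 * 0 on timeout, or -1 if poll() failed.
 */
static int
event_dispatch_once(int timeout_ms)
{
	struct pollfd pfds[MAX_SOURCES];
	unsigned int i;
	int ready, called = 0;

	assert(n_sources <= MAX_SOURCES);
	for (i = 0; i < n_sources; i++) {
		pfds[i].fd = sources[i].fd;
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	ready = poll(pfds, n_sources, timeout_ms);
	if (ready <= 0)
		return ready;

	for (i = 0; i < n_sources; i++) {
		if (pfds[i].revents & POLLIN) {
			sources[i].cb(sources[i].fd, sources[i].arg);
			called++;
		}
	}
	return called;
}

/* Example callback: consume one byte and count the event. */
static void
count_cb(int fd, void *arg)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)arg)++;
}
```

With a pipe standing in for the IPC socket, writing one byte to the write end
and calling event_dispatch_once() runs count_cb() exactly once.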


[dpdk-dev] [RFC v2 4/6] eal: bring forward init of interrupt handling

2018-05-31 Thread Anatoly Burakov
From: Jianfeng Tan 

IPC will rely on interrupt handling, so we bring forward the init
of interrupt handling.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..f8a0c06d7 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -839,6 +839,11 @@ rte_eal_init(int argc, char **argv)
 
rte_config_init();
 
+   if (rte_eal_intr_init() < 0) {
+   rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+   return -1;
+   }
+
/* Put mp channel init before bus scan so that we can init the vdev
 * bus through mp channel in the secondary process before the bus scan.
 */
@@ -968,11 +973,6 @@ rte_eal_init(int argc, char **argv)
rte_config.master_lcore, (int)thread_id, cpuset,
ret == 0 ? "" : "...");
 
-   if (rte_eal_intr_init() < 0) {
-   rte_eal_init_alert("Cannot init interrupt-handling thread\n");
-   return -1;
-   }
-
RTE_LCORE_FOREACH_SLAVE(i) {
 
/*
-- 
2.17.0


[dpdk-dev] [RFC v2 2/6] ipc: remove IPC thread for async requests

2018-05-31 Thread Anatoly Burakov
Previously, we were using two IPC threads - one to handle messages
and synchronous requests, and another to handle asynchronous requests.
To handle replies to an async request, rte_mp_handle woke up the
rte_mp_handle_async thread through a pthread_cond variable.

Change it to handle asynchronous messages within the main IPC thread.
To handle timeout events, we set an alarm for each async request that
is sent. If its reply is received before the timeout, we cancel the
alarm when we handle the reply; otherwise, the alarm invokes
async_reply_handle() as its callback.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
Suggested-by: Thomas Monjalon 
---

Notes:
RFC->RFCv2:
- Rebased on latest code
- Implemented comments to the original RFC

 lib/librte_eal/common/eal_common_proc.c | 191 
 1 file changed, 65 insertions(+), 126 deletions(-)
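
The reply-or-timeout bookkeeping described in the commit message can be
modelled in a few lines. The sketch below is a simplified assumption, not the
patch's process_async_request(): struct async_batch and process_one() are
hypothetical. It assumes one request is sent to several peers, each pending
entry is processed exactly once (either when its reply arrives or when its
alarm fires), and the user callback is triggered only by the last one.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

enum async_action {
	ACTION_FREE,   /* free the entry, but don't trigger the callback */
	ACTION_TRIGGER /* trigger the callback, then free the entry */
};

/* Hypothetical per-request bookkeeping shared by all peers of one request. */
struct async_batch {
	int nb_sent;      /* how many peers the request was sent to */
	int nb_received;  /* how many replies arrived in time */
	int nb_processed; /* replies plus timeouts handled so far */
	struct timespec end; /* absolute deadline */
};

static int
timespec_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Called once per pending entry: from the reply path (reply_received is
 * true) or from the alarm callback (reply_received is false and the
 * deadline has passed).
 */
static enum async_action
process_one(struct async_batch *b, bool reply_received,
		const struct timespec *now)
{
	bool timeout = timespec_cmp(&b->end, now) <= 0;

	if (reply_received)
		b->nb_received++;
	else
		assert(timeout); /* the alarm path must only run after the deadline */

	b->nb_processed++;

	/* only the last processed entry triggers the user callback */
	return b->nb_processed == b->nb_sent ? ACTION_TRIGGER : ACTION_FREE;
}
```

Because every entry is handled either by its reply or by its alarm, no
"nothing to do yet" state is needed in this model.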

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 707d8ab30..6f3366403 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -94,11 +95,9 @@ TAILQ_HEAD(pending_request_list, pending_request);
 static struct {
struct pending_request_list requests;
pthread_mutex_t lock;
-   pthread_cond_t async_cond;
 } pending_requests = {
.requests = TAILQ_HEAD_INITIALIZER(pending_requests.requests),
.lock = PTHREAD_MUTEX_INITIALIZER,
-   .async_cond = PTHREAD_COND_INITIALIZER
/**< used in async requests only */
 };
 
@@ -106,6 +105,16 @@ static struct {
 static int
 mp_send(struct rte_mp_msg *msg, const char *peer, int type);
 
+/* for use with alarm callback */
+static void
+async_reply_handle(void *arg);
+
+/* for use with process_msg */
+static struct pending_request *
+async_reply_handle_thread_unsafe(void *arg);
+
+static void
+trigger_async_action(struct pending_request *req);
 
 static struct pending_request *
 find_pending_request(const char *dst, const char *act_name)
@@ -290,6 +299,8 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un 
*s)
RTE_LOG(DEBUG, EAL, "msg: %s\n", msg->name);
 
if (m->type == MP_REP || m->type == MP_IGN) {
+   struct pending_request *req = NULL;
+
pthread_mutex_lock(&pending_requests.lock);
pending_req = find_pending_request(s->sun_path, msg->name);
if (pending_req) {
@@ -301,11 +312,14 @@ process_msg(struct mp_msg_internal *m, struct sockaddr_un 
*s)
if (pending_req->type == REQUEST_TYPE_SYNC)
pthread_cond_signal(&pending_req->sync.cond);
else if (pending_req->type == REQUEST_TYPE_ASYNC)
-   pthread_cond_signal(
-   &pending_requests.async_cond);
+   req = async_reply_handle_thread_unsafe(
+   pending_req);
} else
RTE_LOG(ERR, EAL, "Drop mp reply: %s\n", msg->name);
pthread_mutex_unlock(&pending_requests.lock);
+
+   if (req != NULL)
+   trigger_async_action(req);
return;
}
 
@@ -365,7 +379,6 @@ timespec_cmp(const struct timespec *a, const struct 
timespec *b)
 }
 
 enum async_action {
-   ACTION_NONE, /**< don't do anything */
ACTION_FREE, /**< free the action entry, but don't trigger callback */
ACTION_TRIGGER /**< trigger callback, then free action entry */
 };
@@ -375,7 +388,7 @@ process_async_request(struct pending_request *sr, const 
struct timespec *now)
 {
struct async_request_param *param;
struct rte_mp_reply *reply;
-   bool timeout, received, last_msg;
+   bool timeout, last_msg;
 
param = sr->async.param;
reply = &param->user_reply;
@@ -383,13 +396,6 @@ process_async_request(struct pending_request *sr, const 
struct timespec *now)
/* did we timeout? */
timeout = timespec_cmp(&param->end, now) <= 0;
 
-   /* did we receive a response? */
-   received = sr->reply_received != 0;
-
-   /* if we didn't time out, and we didn't receive a response, ignore */
-   if (!timeout && !received)
-   return ACTION_NONE;
-
/* if we received a response, adjust relevant data and copy mesasge. */
if (sr->reply_received == 1 && sr->reply) {
struct rte_mp_msg *msg, *user_msgs, *tmp;
@@ -448,118 +454,58 @@ trigger_async_action(struct pending_request *sr)
free(sr->async.param->user_reply.msgs);
free(sr->async.param);
free(sr->request);
+   free(sr);
 }
 
 static struct pending_request *
-check_trigger(struct timespec *ts)
+async_reply_handle_thread_unsafe(void *arg)
 {
-   struct pending_request *next

[dpdk-dev] [RFC v2 5/6] eal: add IPC type for interrupt thread

2018-05-31 Thread Anatoly Burakov
From: Jianfeng Tan 

We are going to merge IPC into the interrupt thread. This patch adds
an IPC type for the interrupt thread.

Signed-off-by: Jianfeng Tan 
Signed-off-by: Anatoly Burakov 
---

Notes:
RFC->RFCv2:
- Fixed typo in test app

 .../common/include/rte_eal_interrupts.h   |  1 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c  |  5 
 test/test/test_interrupts.c   | 29 ++-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h 
b/lib/librte_eal/common/include/rte_eal_interrupts.h
index 6eb493271..344db768d 100644
--- a/lib/librte_eal/common/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/common/include/rte_eal_interrupts.h
@@ -35,6 +35,7 @@ enum rte_intr_handle_type {
RTE_INTR_HANDLE_EXT,  /**< external handler */
RTE_INTR_HANDLE_VDEV, /**< virtual device */
RTE_INTR_HANDLE_DEV_EVENT,/**< device event handle */
+   RTE_INTR_HANDLE_IPC,  /**< IPC event handle */
RTE_INTR_HANDLE_MAX   /**< count of elements */
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 180c0378a..390672739 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -560,6 +560,8 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle)
/* not used at this moment */
case RTE_INTR_HANDLE_DEV_EVENT:
return -1;
+   case RTE_INTR_HANDLE_IPC:
+   return -1;
/* unknown handle type */
default:
RTE_LOG(ERR, EAL,
@@ -610,6 +612,8 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle)
/* not used at this moment */
case RTE_INTR_HANDLE_DEV_EVENT:
return -1;
+   case RTE_INTR_HANDLE_IPC:
+   return -1;
/* unknown handle type */
default:
RTE_LOG(ERR, EAL,
@@ -679,6 +683,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int 
nfds)
call = true;
break;
case RTE_INTR_HANDLE_DEV_EVENT:
+   case RTE_INTR_HANDLE_IPC:
bytes_read = 0;
call = true;
break;
diff --git a/test/test/test_interrupts.c b/test/test/test_interrupts.c
index dc19175d3..fa18ddf75 100644
--- a/test/test/test_interrupts.c
+++ b/test/test/test_interrupts.c
@@ -21,6 +21,7 @@ enum test_interrupt_handle_type {
TEST_INTERRUPT_HANDLE_VALID_UIO,
TEST_INTERRUPT_HANDLE_VALID_ALARM,
TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT,
+   TEST_INTERRUPT_HANDLE_VALID_IPC,
TEST_INTERRUPT_HANDLE_CASE1,
TEST_INTERRUPT_HANDLE_MAX
 };
@@ -85,6 +86,10 @@ test_interrupt_init(void)
intr_handles[TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT].type =
RTE_INTR_HANDLE_DEV_EVENT;
 
+   intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC].fd = pfds.readfd;
+   intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC].type =
+   RTE_INTR_HANDLE_IPC;
+
intr_handles[TEST_INTERRUPT_HANDLE_CASE1].fd = pfds.writefd;
intr_handles[TEST_INTERRUPT_HANDLE_CASE1].type = RTE_INTR_HANDLE_UIO;
 
@@ -263,6 +268,14 @@ test_interrupt_enable(void)
return -1;
}
 
+   /* check with specific valid intr_handle */
+   test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC];
+   if (rte_intr_enable(&test_intr_handle) == 0) {
+   printf("unexpectedly enable a specific intr_handle "
+   "successfully\n");
+   return -1;
+   }
+
/* check with valid handler and its type */
test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
if (rte_intr_enable(&test_intr_handle) < 0) {
@@ -327,6 +340,14 @@ test_interrupt_disable(void)
return -1;
}
 
+   /* check with specific valid intr_handle */
+   test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_VALID_IPC];
+   if (rte_intr_disable(&test_intr_handle) == 0) {
+   printf("unexpectedly disable a specific intr_handle "
+   "successfully\n");
+   return -1;
+   }
+
/* check with valid handler and its type */
test_intr_handle = intr_handles[TEST_INTERRUPT_HANDLE_CASE1];
if (rte_intr_disable(&test_intr_handle) < 0) {
@@ -424,7 +445,7 @@ test_interrupt(void)
 
printf("Check valid alarm interrupt full path\n");
if (test_interrupt_full_path_check(
-   TEST_INTERRUPT_HANDLE_VALID_DEV_EVENT) < 0) {
+   TEST_INTERRUPT_HANDLE_VALID_IPC) < 0) {
printf("failure occurred during checking valid alarm "
"interrupt full path\n");
goto out;
@@ -548,

[dpdk-dev] 16.11.7 (LTS) patches review and test

2018-05-31 Thread Luca Boccassi
Hi all,

Here is a list of patches targeted for LTS release 16.11.7. Please
help review and test. The planned date for the final release is Monday,
June the 11th.
Before that, please shout if anyone has objections to these
patches being applied.

Also for the companies committed to running regression tests, please
run the tests and report any issue before the release date.

These patches are located at branch 16.11 of dpdk-stable repo:
https://dpdk.org/browse/dpdk-stable/

Thanks.

Luca Boccassi

---
Ajit Khaparde (6):
  net/bnxt: fix Rx drop setting
  net/bnxt: fix endianness of flag
  net/bnxt: fix Rx checksum flags for tunnel frames
  net/bnxt: avoid freeing memzone multiple times
  net/bnxt: fix mbuf data offset initialization
  net/bnxt: fix Rx checksum flags

Alejandro Lucero (3):
  net/nfp: fix assigning port id in mbuf
  net/nfp: fix barrier location
  net/nfp: fix mbufs releasing when stop or close

Allain Legacy (1):
  ip_frag: fix double free of chained mbufs

Anatoly Burakov (3):
  memzone: fix size on reserving biggest memzone
  eal: remove unused path pattern
  mempool: fix virtual address population

Andrew Rybchenko (2):
  mempool: fix leak when no objects are populated
  test/mempool: fix autotest retry

Andy Green (29):
  eal: explicit cast of builtin for bsf32
  eal: explicit cast of core id when getting index
  eal: declare trace buffer at top of own block
  spinlock/x86: move stack declaration before code
  net: move stack variable at top of VLAN strip function
  ethdev: explicit cast of buffered Tx number
  hash: move stack declaration at top of CRC32c function
  hash: explicit casts for truncation in CRC32c
  net/nfp: fix memcpy out of source range
  net/bnx2x: do not cast function pointers as a policy
  net/bnx2x: fix KR2 device check
  net/bnx2x: fix memzone name overrun
  net/qede: replace strncpy by strlcpy
  net/qede: fix strncpy
  bus/pci: fix size of driver name buffer
  eal: fix casts in random functions
  mbuf: fix reference counter integer promotion
  mbuf: explicit casts of reference counter
  mbuf: explicit cast of headroom on reset
  mbuf: explicit cast of size on detach
  net: explicit cast of multicast bit clearing
  net: explicit cast of IP checksum to 16-bit
  net: explicit cast of protocol in IPv6 checksum
  ethdev: explicit cast of queue count return
  eal: explicit cast in rwlock functions
  net: explicit cast in L4 checksum
  mbuf: fix type of private size in detach
  mbuf: avoid integer promotion in prepend/adj/chain
  ethdev: fix type and scope of variables in Rx burst

Beilei Xing (2):
  net/i40e: fix link status update
  net/i40e: fix failing to disable FDIR Tx queue

Bruce Richardson (1):
  eal: support strlcpy function

Chas Williams (5):
  net/vmxnet3: set the queue shared buffer at start
  net/bonding: fix setting VLAN ID on slave ports
  net/bonding: clear started state if start fails
  net/vmxnet3: keep link state consistent
  net/bonding: export mode 4 slave info routine

Ciara Loftus (1):
  net/vhost: initialise device as inactive

Daniel Shelepov (1):
  app/testpmd: fix burst stats reporting

David Hunt (3):
  test/distributor: fix return type of thread function
  test/pipeline: fix return type of stub miss
  examples/performance-thread: fix return type of threads

Fan Zhang (1):
  net/i40e: fix link update no wait

Ferruh Yigit (3):
  drivers/net: fix icc deprecated parameter warning
  drivers/net: fix link autoneg value for virtual PMDs
  net/i40e: fix shifts of signed values

Gowrishankar Muthukrishnan (1):
  eal/ppc: remove braces in SMP memory barrier macro

Hyong Youb Kim (1):
  net/enic: allocate stats DMA buffer upfront during probe

Ivan Malov (1):
  ethdev: improve doc for name by port ID API

Jasvinder Singh (1):
  test/pipeline: fix type of table entry parameter

Jerin Jacob (1):
  app/crypto-perf: fix parameters copy

Jianfeng Tan (1):
  net/virtio-user: fix hugepage files enumeration

John Daley (1):
  net/enic: fix crash on MTU update with non-setup queues

Keith Wiles (1):
  kvargs: fix syntax in comments

Lee Roberts (1):
  kni: fix build on RHEL 7.5

Li Han (1):
  ip_frag: fix some debug logs

Matan Azrad (5):
  app/testpmd: fix slave port detection
  app/testpmd: fix valid ports prints
  app/testpmd: fix forward ports update
  app/testpmd: fix forward ports Rx flush
  app/testpmd: fix synchronic port hotplug

Matej Vido (2):
  net/szedata2: fix total stats
  net/szedata2: fix format string for PCI address

Maxime Coquelin (2):
  vhost: fix compilation issue when vhost debug enabled
  vhost: improve dirty pages logging performance

Mohammad Abdul Awal (1):
  ethdev: fix string length in name comparison

Nitin

[dpdk-dev] Regression tests for stable releases from companies involved in DPDK

2018-05-31 Thread Luca Boccassi
Hello all,

At this morning's release meeting (minutes coming soon from John), we
briefly discussed the state of the regression testing for stable
releases and agreed we need to formalise the process.

At the moment we have a firm commitment from Intel and Mellanox to test
all stable branches (and if I heard correctly from NXP as well? Please
confirm!). AT&T committed to run regressions on the 16.11 branch.

Here's what we need in order to improve the quality of the stable
releases process:

1) More commitments to help from other companies involved in the DPDK
community. At the risk of restating the obvious, improving the quality
of stable releases is for everyone's benefit, as a lot of customers and
projects rely on the stable or LTS releases for their production
environments.

2) A formalised deadline - the current proposal is 10 days from the
"xx.yy patches review and test" email, which was just sent for 16.11.
For the involved companies, please let us know if 10 days is enough. In
terms of scheduling, this period will always start within a week from
the mainline final release. Again, the signal is the "xx.yy patches
review and test" appearing in the inbox, which will detail the
deadline.

Comments?

-- 
Kind regards,
Luca Boccassi


[dpdk-dev] [RFC 0/3] Make device mapping more reliable

2018-05-31 Thread Anatoly Burakov
Currently, memory for device maps is allocated ad-hoc, by calculating
the end of the VA space allocated for hugepages and crossing fingers in
hopes that those addresses will be free in both primary and secondary
processes. This leads to situations such as this:

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_178323_8af2229603de4
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device :81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: Cannot mmap device resource file 
/sys/bus/pci/devices/:81:00.0/resource0 to address: 0x7ff7f580
EAL: Requested device :81:00.0 cannot be used
EAL: Error - exiting with code: 1
  Cause: No Ethernet ports - bye

As can be seen from the above log, the secondary process has initialized
successfully, but device BAR mapping has failed, which resulted in missing
ports in the secondary process.

This patchset is an attempt to fix this problem once and for all, by using
the same method we use for memory to do device mappings as well. That is,
by preallocating all of the device memory in advance, so that initialization
either succeeds and allows for device mappings, or it fails outright (whereas
currently we may be in an in-between kind of situation, where init has
succeeded but device mappings have failed).

This change breaks the ABI, so it is not for this release. However, I'd like
to hear feedback on the approach and whether there are potential problems with
other buses/use cases that I didn't think of.

Anatoly Burakov (3):
  fbarray: allow zero-sized elements
  mem: add device memory reserve/free API
  bus/pci: use the new device memory API for BAR mapping

 drivers/bus/pci/linux/pci_init.h  |   1 -
 drivers/bus/pci/linux/pci_uio.c   |  11 +-
 drivers/bus/pci/linux/pci_vfio.c  |  27 +-
 lib/librte_eal/common/eal_common_fbarray.c|  10 +-
 lib/librte_eal/common/eal_common_memory.c | 270 --
 .../common/include/rte_eal_memconfig.h|  18 ++
 lib/librte_eal/common/include/rte_memory.h|  40 +++
 lib/librte_pci/Makefile   |   1 +
 lib/librte_pci/rte_pci.c  |  20 +-
 9 files changed, 350 insertions(+), 48 deletions(-)

-- 
2.17.0
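
The "reserve first, map later" idea from the cover letter above can be shown
with plain mmap(). This is a sketch under assumptions, not the patchset's
code: va_reserve() and va_map_at() are hypothetical helpers, and anonymous
memory stands in for the device resource file that a real BAR mapping would
pass as the fd. The reservation uses PROT_NONE so the address range is held
without being usable; a later MAP_FIXED mapping inside that range cannot
collide with an unrelated mapping, because the range already belongs to us.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve len bytes of address space without making them accessible. */
static void *
va_reserve(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Map len usable bytes at base + off, inside a range from va_reserve().
 * off and len are assumed to be page-aligned and within the reservation.
 */
static void *
va_map_at(void *base, size_t off, size_t len)
{
	void *want = (char *)base + off;
	void *p = mmap(want, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

If the reservation itself fails, the failure is visible at init time rather
than later when a device is probed, which is the behaviour the cover letter
argues for.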


[dpdk-dev] [RFC 2/3] mem: add device memory reserve/free API

2018-05-31 Thread Anatoly Burakov
In order for hotplug in multiprocess to work reliably, we will need
a common shared memory area that is guaranteed to be accessible to all
processes at all times. This is accomplished by pre-reserving memory
that will be used for device mappings at startup, and managing it
at runtime.

Two new API calls are added: alloc and free of device memory. Once
allocation is requested, memory is considered to be reserved until it
is freed back using the same API. Which blocks are occupied is
tracked using a shared fbarray. This allows us to give out device memory
piecemeal and lessen fragmentation.

Naturally, this adds a limit on how much device memory DPDK can
use. This is currently set to 2 gigabytes, but will be adjustable in
later revisions.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_memory.c | 270 --
 .../common/include/rte_eal_memconfig.h|  18 ++
 lib/librte_eal/common/include/rte_memory.h|  40 +++
 3 files changed, 312 insertions(+), 16 deletions(-)
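
The page-granular, alignment-aware allocation described above can be
sketched with a plain boolean array standing in for the shared fbarray. This
is an illustration only: N_PAGES, dev_pages_alloc() and dev_pages_free() are
hypothetical, there is no locking, and the search is a simple first-fit that
rounds each candidate start up to the requested alignment (in pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_PAGES 64 /* hypothetical size of the reserved area, in pages */

static bool page_used[N_PAGES];

/* Length of the run of free pages starting at idx. */
static unsigned int
contig_free(unsigned int idx)
{
	unsigned int len = 0;

	while (idx + len < N_PAGES && !page_used[idx + len])
		len++;
	return len;
}

/*
 * Allocate n_pages contiguous pages whose first index is a multiple of
 * page_align (0 means no alignment requirement). Returns the first page
 * index, or -1 if the request is invalid or cannot be satisfied.
 */
static int
dev_pages_alloc(unsigned int n_pages, unsigned int page_align)
{
	unsigned int start = 0, len, i;

	if (n_pages == 0 || n_pages > N_PAGES)
		return -1;
	if (page_align == 0)
		page_align = 1;
	if ((page_align & (page_align - 1)) != 0)
		return -1; /* alignment must be a power of two */

	while (start + n_pages <= N_PAGES) {
		/* round the candidate start up to the alignment */
		start = (start + page_align - 1) & ~(page_align - 1);
		if (start + n_pages > N_PAGES)
			break;

		len = contig_free(start);
		if (len >= n_pages) {
			for (i = 0; i < n_pages; i++)
				page_used[start + i] = true;
			return (int)start;
		}
		/* skip the too-short free run and the used page that ended it */
		start += len + 1;
	}
	return -1;
}

/* Return pages to the pool. */
static void
dev_pages_free(unsigned int start, unsigned int n_pages)
{
	unsigned int i;

	assert(start + n_pages <= N_PAGES);
	for (i = 0; i < n_pages; i++)
		page_used[start + i] = false;
}
```

A returned page index would be turned into an address by multiplying it by
the page size and adding it to the base of the reserved area.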

diff --git a/lib/librte_eal/common/eal_common_memory.c 
b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..8cae9b354 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -33,6 +33,7 @@
  */
 
 #define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i"
+#define DEVICE_MEMORY_NAME "device_memory"
 
 static uint64_t baseaddr_offset;
 static uint64_t system_page_sz;
@@ -904,6 +905,227 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void 
*arg)
return ret;
 }
 
+void * __rte_experimental
+rte_mem_dev_memory_alloc(size_t size, size_t align)
+{
+   struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+   struct rte_fbarray *arr = &mcfg->device_memory.mem_map_arr;
+   unsigned int n_pages, page_align;
+   int start_idx, cur_idx;
+   void *addr = NULL;
+
+   /* check parameters first */
+   if (size == 0 || (size & (system_page_sz - 1)) != 0) {
+   RTE_LOG(ERR, EAL, "%s(): size is not page-aligned\n",
+   __func__);
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   if ((align & (system_page_sz - 1)) != 0) {
+   RTE_LOG(ERR, EAL, "%s(): alignment is not page-aligned\n",
+   __func__);
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   /* PCI BAR sizes can only be powers of two, but this memory may be used
+* for more than just PCI BAR mappings, so only check if alignment is
+* power of two.
+*/
+   if (align != 0 && !rte_is_power_of_2(align)) {
+   RTE_LOG(ERR, EAL, "%s(): alignment is not a power of two\n",
+   __func__);
+   rte_errno = EINVAL;
+   return NULL;
+   }
+   /* check if device memory map is uninitialized. */
+   if (mcfg->device_memory.base_va == NULL || arr->len == 0) {
+   RTE_LOG(ERR, EAL, "%s(): device memory map is not 
initialized\n",
+   __func__);
+   rte_errno = ENODEV;
+   return NULL;
+   }
+
+   n_pages = size / system_page_sz;
+   page_align = align / system_page_sz;
+
+   /* lock the device memory map */
+   rte_spinlock_lock(&mcfg->device_memory.lock);
+
+   start_idx = 0;
+   while (1) {
+   size_t offset;
+   int end;
+
+   cur_idx = rte_fbarray_find_next_n_free(arr, start_idx, n_pages);
+   if (cur_idx < 0)
+   break;
+
+   /* if there are alignment requirements, check if the offset we
+* found is aligned, and if not, align it and check if we still
+* have enough space.
+*/
+   if (page_align != 0 && (cur_idx & (page_align - 1)) != 0) {
+   unsigned int aligned, len;
+
+   aligned = RTE_ALIGN_CEIL(cur_idx, page_align);
+   len = rte_fbarray_find_contig_free(arr, aligned);
+
+   /* if there's not enough space, keep looking */
+   if (len < n_pages) {
+   start_idx = aligned + len;
+   continue;
+   }
+
+   /* we've found space */
+   cur_idx = aligned;
+   }
+   end = cur_idx + n_pages;
+   offset = cur_idx * system_page_sz;
+   addr = RTE_PTR_ADD(mcfg->device_memory.base_va,
+   offset);
+
+   /* now, mark all space as occupied */
+   for (; cur_idx < end; cur_idx++)
+   rte_fbarray_set_used(arr, cur_idx);
+   break;
+   }
+   rte_spinlock_unlock(&mcfg->device_memory.lock);
+
+   if (addr != NULL)
+   RTE_LOG(DEBUG, EAL, "%s(): allocated %p-%p
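
The aligned search in the loop above can be tried outside of EAL. What follows is a minimal standalone sketch, assuming a plain bool array as a stand-in for the fbarray occupancy mask; free_run_len() and find_aligned_free() are hypothetical names, not EAL or fbarray functions, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the free run starting at idx (0 if idx is used or out of range).
 * used[i] == true means page i is taken (stand-in for the fbarray mask). */
static size_t
free_run_len(const bool *used, size_t n, size_t idx)
{
    size_t len = 0;

    while (idx + len < n && !used[idx + len])
        len++;
    return len;
}

/* Return the first index that is a multiple of page_align (a power of two,
 * or 0 for "no alignment") and starts a run of at least n_pages free
 * entries. Returns -1 if no such run exists. */
static long
find_aligned_free(const bool *used, size_t n, size_t n_pages,
        size_t page_align)
{
    size_t idx = 0;

    assert(page_align == 0 || (page_align & (page_align - 1)) == 0);
    if (n_pages == 0)
        return -1;

    while (idx < n) {
        size_t len;

        /* round the candidate up to the next aligned index */
        if (page_align != 0)
            idx = (idx + page_align - 1) & ~(page_align - 1);
        if (idx >= n)
            break;

        len = free_run_len(used, n, idx);
        if (len >= n_pages)
            return (long)idx;

        /* not enough space: skip the run and the used entry after it */
        idx += len + 1;
    }
    return -1;
}
```

With pages 0 and 5 marked used in a 16-page map, a request for 3 pages with a 4-page alignment skips the short run at index 4 and lands on index 8.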

[dpdk-dev] [RFC 1/3] fbarray: allow zero-sized elements

2018-05-31 Thread Anatoly Burakov
We need to keep usage of our memory area indexed, but we don't
actually need to store any data - we need just the indexing
capabilities of fbarray. Yet, it currently disallows zero-sized
elements. Fix that by removing the check for zero-sized elements -
the rest will work correctly already.

Signed-off-by: Anatoly Burakov 
---
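To illustrate the "index only" use case described above, here is a minimal standalone sketch, not fbarray code. It assumes a hypothetical struct index_array that keeps an occupancy mask and allocates element storage only when elt_sz is non-zero; all names are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical index-tracking array: data stays NULL when elt_sz == 0. */
struct index_array {
    unsigned int len;
    unsigned int elt_sz;
    bool *used;
    void *data;
};

static int
index_array_init(struct index_array *a, unsigned int elt_sz, unsigned int len)
{
    /* same shape as the patched validation: no elt_sz == 0 check */
    if (a == NULL || len == 0 || len > (unsigned int)INT_MAX) {
        errno = EINVAL;
        return -1;
    }
    a->used = calloc(len, sizeof(*a->used));
    if (a->used == NULL) {
        errno = ENOMEM;
        return -1;
    }
    a->data = NULL;
    if (elt_sz != 0) {
        a->data = calloc(len, elt_sz);
        if (a->data == NULL) {
            free(a->used);
            a->used = NULL;
            errno = ENOMEM;
            return -1;
        }
    }
    a->len = len;
    a->elt_sz = elt_sz;
    return 0;
}

static void
index_array_set_used(struct index_array *a, unsigned int idx, bool used)
{
    assert(a != NULL && idx < a->len);
    a->used[idx] = used;
}

static void
index_array_free(struct index_array *a)
{
    free(a->used);
    free(a->data);
    a->used = NULL;
    a->data = NULL;
}
```

Initialising with elt_sz = 0 succeeds and leaves data as NULL, while the used/free bookkeeping works as before.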
 lib/librte_eal/common/eal_common_fbarray.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..4a365e7ce 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -391,9 +391,9 @@ set_used(struct rte_fbarray *arr, unsigned int idx, bool used)
 }
 
 static int
-fully_validate(const char *name, unsigned int elt_sz, unsigned int len)
+fully_validate(const char *name, unsigned int len)
 {
-   if (name == NULL || elt_sz == 0 || len == 0 || len > INT_MAX) {
+   if (name == NULL || len == 0 || len > INT_MAX) {
rte_errno = EINVAL;
return -1;
}
@@ -420,7 +420,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
return -1;
}
 
-   if (fully_validate(name, elt_sz, len))
+   if (fully_validate(name, len))
return -1;
 
page_sz = sysconf(_SC_PAGESIZE);
@@ -511,7 +511,7 @@ rte_fbarray_attach(struct rte_fbarray *arr)
 * the array, so the parts we care about will not race.
 */
 
-   if (fully_validate(arr->name, arr->elt_sz, arr->len))
+   if (fully_validate(arr->name, arr->len))
return -1;
 
page_sz = sysconf(_SC_PAGESIZE);
@@ -858,7 +858,7 @@ rte_fbarray_dump_metadata(struct rte_fbarray *arr, FILE *f)
return;
}
 
-   if (fully_validate(arr->name, arr->elt_sz, arr->len)) {
+   if (fully_validate(arr->name, arr->len)) {
fprintf(f, "Invalid file-backed array\n");
goto out;
}
-- 
2.17.0


[dpdk-dev] [RFC 3/3] bus/pci: use the new device memory API for BAR mapping

2018-05-31 Thread Anatoly Burakov
Adjust PCI infrastructure to reserve device memory through the
new device memory API. Any hotplug event will reserve memory, any
hot-unplug event will release memory back to the system.

This allows for more reliable PCI mappings in secondary processes,
and will be crucial to support multiprocess hotplug.

Signed-off-by: Anatoly Burakov 
---
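For background, the reserve-then-map pattern that BAR mapping relies on can be shown with plain mmap(). This is a minimal sketch under stated assumptions: it uses anonymous memory instead of a device file descriptor, and reserve_va() and map_over_reservation() are hypothetical helpers, not part of the device memory API proposed in this series.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve a range of virtual addresses with no access rights. */
static void *
reserve_va(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Replace the reservation with a usable mapping at the same address.
 * Real PCI code would pass the UIO/VFIO fd and MAP_SHARED here; anonymous
 * memory is used only to keep the sketch self-contained. */
static void *
map_over_reservation(void *reserved, size_t len)
{
    void *p;

    assert(reserved != NULL);
    p = mmap(reserved, len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Because the address is fixed by the reservation, a second process that reserves the same range can map the resource at an identical virtual address, which is the property the commit message is after.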
 drivers/bus/pci/linux/pci_init.h |  1 -
 drivers/bus/pci/linux/pci_uio.c  | 11 +--
 drivers/bus/pci/linux/pci_vfio.c | 27 ---
 lib/librte_pci/Makefile  |  1 +
 lib/librte_pci/rte_pci.c | 20 +++-
 5 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index c2e603a37..bc9279c66 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -14,7 +14,6 @@
 /*
  * Helper function to map PCI resources right after hugepages in virtual memory
  */
-extern void *pci_map_addr;
 void *pci_find_max_end_va(void);
 
 /* parse one line of the "resource" sysfs file (note that the 'line'
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d423e4bb0..dbf108b6f 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -26,8 +26,6 @@
 #include "eal_filesystem.h"
 #include "pci_init.h"
 
-void *pci_map_addr = NULL;
-
 #define OFF_MAX  ((uint64_t)(off_t)-1)
 
 int
@@ -316,19 +314,12 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
goto error;
}
 
-   /* try mapping somewhere close to the end of hugepages */
-   if (pci_map_addr == NULL)
-   pci_map_addr = pci_find_max_end_va();
-
-   mapaddr = pci_map_resource(pci_map_addr, fd, 0,
+   mapaddr = pci_map_resource(NULL, fd, 0,
(size_t)dev->mem_resource[res_idx].len, 0);
close(fd);
if (mapaddr == MAP_FAILED)
goto error;
 
-   pci_map_addr = RTE_PTR_ADD(mapaddr,
-   (size_t)dev->mem_resource[res_idx].len);
-
maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
maps[map_idx].size = dev->mem_resource[res_idx].len;
maps[map_idx].addr = mapaddr;
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index aeeaa9ed8..f390ea37a 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -324,7 +324,7 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 
 static int
 pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
-   int bar_index, int additional_flags)
+   int bar_index)
 {
struct memreg {
unsigned long offset, size;
@@ -371,9 +371,14 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
memreg[0].size = bar->size;
}
 
-   /* reserve the address using an inaccessible mapping */
-   bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
-   MAP_ANONYMOUS | additional_flags, -1, 0);
+   if (bar->addr == NULL) {
+   bar_addr = rte_mem_dev_memory_alloc(bar->size, 0);
+   if (bar_addr == NULL) {
+   RTE_LOG(ERR, EAL, "%s(): cannot reserve space for device\n",
+   __func__);
+   return -1;
+   }
+   }
if (bar_addr != MAP_FAILED) {
void *map_addr = NULL;
if (memreg[0].size) {
@@ -469,7 +474,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
for (i = 0; i < (int) vfio_res->nb_maps; i++) {
struct vfio_region_info reg = { .argsz = sizeof(reg) };
-   void *bar_addr;
 
reg.index = i;
 
@@ -494,19 +498,12 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
continue;
 
-   /* try mapping somewhere close to the end of hugepages */
-   if (pci_map_addr == NULL)
-   pci_map_addr = pci_find_max_end_va();
-
-   bar_addr = pci_map_addr;
-   pci_map_addr = RTE_PTR_ADD(bar_addr, (size_t) reg.size);
-
-   maps[i].addr = bar_addr;
+   maps[i].addr = NULL;
maps[i].offset = reg.offset;
maps[i].size = reg.size;
maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-   ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+   ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i);
if (ret < 0) {
RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n",
pci_addr, i, strerror(errno));
@@ -574,7 +571,7 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
maps = vf

[dpdk-dev] i40evf: Problem with the statistics

2018-05-31 Thread Mridula V Gangadharan
Hi,

I am testing a packet-drop scenario by setting the MTU size.
My setup has the i40evf driver. I set the DPDK interface's MTU size to 1800.
I am sending 100 packets of size 1918 each.
I am expecting the drop counter to increment.
rte_eth_stats_get() returns ipackets with the number of packets I sent.
No drop counters are incrementing. Also, my application is not receiving
any packets.
Is there some issue with DPDK statistics?
The xstats output is as follows. It does not show any drops, but the
rx_good_bytes count is incrementing.



NIC extended statistics for port 1

rx_good_packets: 656
tx_good_packets: 556
rx_good_bytes: 225160 
tx_good_bytes: 33360
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0packets: 0
rx_q0bytes: 0
rx_q0errors: 0
tx_q0packets: 0
tx_q0bytes: 0
rx_bytes: 225160
rx_unicast_packets: 656
rx_multicast_packets: 0
rx_broadcast_packets: 0
rx_dropped_packets: 0
rx_unknown_protocol_packets: 0
tx_bytes: 33360
tx_unicast_packets: 556
tx_multicast_packets: 0
tx_broadcast_packets: 0
tx_dropped_packets: 0
tx_error_packets: 0

Thanks and Regards,
Mridula



[dpdk-dev] [Bug 58] cppcheck static analyzer warnings

2018-05-31 Thread bugzilla
https://dpdk.org/tracker/show_bug.cgi?id=58

Bug ID: 58
   Summary: cppcheck static analyzer warnings
   Product: DPDK
   Version: 18.05
  Hardware: All
OS: All
Status: CONFIRMED
  Severity: normal
  Priority: Normal
 Component: core
  Assignee: dev@dpdk.org
  Reporter: ferruh.yi...@intel.com
  Target Milestone: ---

There was already a mail on the mailing list reporting this issue:
https://dpdk.org/ml/archives/dev/2018-May/101961.html

Some of the issues were fixed in v18.05, but some still remain.

The following is the list of the remaining issues:

[app/test-pmd/cmdline_mtr.c:115]: (error) Memory leak: dscp_table
[app/test-pmd/flowgen.c:160]: (error) Uninitialized variable: ol_flags
[app/test-pmd/tm.c:594]: (error) Memory leak: tnp.shared_shaper_id
[drivers/bus/dpaa/base/fman/fman.c:557]: (error) Uninitialized variable: __if
[drivers/bus/dpaa/base/qbman/qman.c:1220]: (error) Address of auto-variable
'p->shadow_dqrr[DQRR_PTR2IDX(dq)]' returned
[drivers/bus/ifpga/ifpga_bus.c:436]: (warning) Possible null pointer
dereference: c2
[drivers/crypto/ccp/ccp_pci.c:41]: (error) Resource leak: fp
[drivers/crypto/dpaa_sec/dpaa_sec.c:662]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:731]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:826]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:881]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1020]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1132]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1258]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1353]: (error) Address of auto-variable
'ctx->job' returned
[drivers/crypto/dpaa_sec/dpaa_sec.c:1392]: (error) Address of auto-variable
'ctx->job' returned
[drivers/net/avf/base/avf_adminq.c:301]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:336]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:298]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_adminq.c:333]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_common.c:367]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/avf/base/avf_common.c:364]: (error) Shifting signed 32-bit value
by 31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_dev.c:808]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/axgbe/axgbe_dev.c] -> [drivers/net/axgbe/axgbe_dev.c]: (error)
Invalid value: 0x0204_BUSY_WIDTH
[drivers/net/axgbe/axgbe_ethdev.c] -> [drivers/net/axgbe/axgbe_ethdev.c]:
(error) Invalid value: 0x0008_PR_WIDTH
[drivers/net/axgbe/axgbe_i2c.c] -> [drivers/net/axgbe/axgbe_i2c.c]: (error)
Invalid value: 0x006c_EN_WIDTH
[drivers/net/axgbe/axgbe_phy_impl.c] -> [drivers/net/axgbe/axgbe_phy_impl.c]:
(error) Invalid value: 0x0080_ID_WIDTH
[drivers/net/axgbe/axgbe_rxtx.c:292]: (error) Shifting signed 32-bit value by
31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_rxtx.c:592]: (error) Shifting signed 32-bit value by
31 bits is undefined behaviour
[drivers/net/axgbe/axgbe_rxtx.c] -> [drivers/net/axgbe/axgbe_rxtx.c]: (error)
Invalid value: 0x48_PRXQ_WIDTH
[drivers/net/bnx2x/bnx2x.c:3995]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:4000]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:8729]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/bnx2x.c:9765]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/elink.c:1042]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/elink.c:2711]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/elink.c:9662]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnx2x/elink.c:10295]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnxt/bnxt_ethdev.c:598]: (error) Shifting signed 32-bit value by
31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_ethdev.c:638]: (error) Shifting signed 32-bit value by
31 bits is undefined behaviour
[drivers/net/bnxt/bnxt_rxr.c:486]: (error) Uninitialized variable: ag_cons
[drivers/net/bnxt/bnxt_stats.c:211]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/bnxt/bnxt_stats.c:248]: (error) Shifting signed 32-bit value by 31
bits is undefined behaviour
[drivers/net/e1000/base/e1000_8257

Re: [dpdk-dev] cppcheck on dpdk

2018-05-31 Thread Ferruh Yigit
On 5/16/2018 1:41 PM, Ferruh Yigit wrote:
> Today, after listening to Colin's Static Analysis talk, I ran cppcheck on the
> v18.05-rc4 code and it revealed some issues. I am sharing them here for anyone
> interested in fixing them. At the least, I encourage maintainers to check
> their own pieces.
> 
> It is really easy to run cppcheck, in the dpdk source folder:
> cppcheck --force .
> 
> With the above command cppcheck verifies all #ifdef paths; some of the issues
> below seem related to this, which is why they are not seen in build tests.

Some issues are fixed but more remain; to track them better, I submitted a
Bugzilla issue:

https://dpdk.org/tracker/show_bug.cgi?id=58
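
Many of the remaining reports are of the "Shifting signed 32-bit value by 31 bits" kind. As a minimal sketch of the usual remedy, assuming the flagged code builds bit masks with a plain int constant such as (1 << 31), the shift can be done on an unsigned constant instead. BIT32() and set_bit32() are hypothetical names for illustration, not DPDK macros.

```c
#include <assert.h>
#include <stdint.h>

/* (1 << 31) shifts a signed int into its sign bit, which is undefined
 * behaviour in C. An unsigned 32-bit constant makes the shift well defined. */
#define BIT32(n) (UINT32_C(1) << (n))

static uint32_t
set_bit32(uint32_t reg, unsigned int n)
{
    assert(n < 32); /* shifting by >= the type width is also undefined */
    return reg | BIT32(n);
}
```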


Re: [dpdk-dev] Compilation of MLX5 driver

2018-05-31 Thread Nitin Katiyar
Thanks Shahaf, it worked after removing the options you specified.

Regards,
Nitin

-Original Message-
From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com] 
Sent: Thursday, May 31, 2018 3:23 PM
To: Nitin Katiyar 
Cc: Shahaf Shuler ; dev@dpdk.org
Subject: Re: [dpdk-dev] Compilation of MLX5 driver

On Thu, May 31, 2018 at 09:14:03AM +, Nitin Katiyar wrote:
> Yes,I installed it using --dpdk --upstream-libs. What is the way 
> forward now?

In v17.05 the MLX5 PMD still relies on libibverbs and libmlx5. The options
you used select which of libibverbs/libmlx5 or rdma-core the package
installs.
With them you have selected rdma-core, which is not supported in DPDK v17.05.

You need to install Mellanox OFED without those two options to select 
libibverbs and libmlx5 to make it work.

Regards,
 
> Regards,
> Nitin
> 
> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Thursday, May 31, 2018 1:36 PM
> To: Nitin Katiyar 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> 
> On Thu, May 31, 2018 at 07:01:17AM +, Nitin Katiyar wrote:
> > Hi,
> > It has following files:
> > 
> > arch.h  ib.h  kern-abi.h  mlx4dv.h  mlx5dv.h  opcode.h  sa.h 
> > sa-kern-abi.h  verbs.h
> > 
> > I tried with both MLNX_OFED_LINUX-4.2-1.0.0.0 and
> > MLNX_OFED_LINUX-4.2-1.2.0.0-ubuntu14.04-x86_64
> 
> Did you install Mellanox OFED with the --dpdk --upstream-libs arguments for 
> the installation script?
> 
> If that is the case, you should not add them for this version; those options 
> are for DPDK v17.11 and higher.
> 
> Regards,
> 
> > Regards,
> > Nitin
> > 
> > -Original Message-
> > From: Shahaf Shuler [mailto:shah...@mellanox.com]
> > Sent: Thursday, May 31, 2018 10:51 AM
> > To: Nitin Katiyar ; Nélio Laranjeiro 
> > 
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] Compilation of MLX5 driver
> > 
> > Wednesday, May 30, 2018 7:45 PM, Nitin Katiyar:
> > > 
> > > Hi,
> > > I was compiling 17.05.02.
> > > Regards,
> > > Nitin
> > > 
> > > -Original Message-
> > > From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> > > Sent: Wednesday, May 30, 2018 6:42 PM
> > > To: Nitin Katiyar 
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] Compilation of MLX5 driver
> > > 
> > > Hi,
> > > 
> > > On Wed, May 30, 2018 at 11:54:31AM +, Nitin Katiyar wrote:
> > > > Hi,
> > > > I am trying to compile MLX5 PMD driver by setting
> > > "CONFIG_RTE_LIBRTE_MLX5_PMD=y" and hitting following compilation 
> > > error.
> > > >
> > > > fatal error: infiniband/mlx5_hw.h: No such file or directory
> > 
> > Can you list the files you have under /usr/include/infiniband ? 
> > 
> > > >
> > > > I have installed MLNX_OFED _LINUX-4.2-1.2.0 on my Ubuntu 14.04 
> > > > machine
> > > but still hitting the same error. Am I missing some other package?
> > > 
> > > Which version of DPDK are you using (it is important to help)?
> > > 
> > > Regards,
> > > 
> > > --
> > > Nélio Laranjeiro
> > > 6WIND
> 
> --
> Nélio Laranjeiro
> 6WIND

--
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH] ethdev: force offloading API rules

2018-05-31 Thread Ferruh Yigit
The error path was disabled in the previous release to give applications
more flexibility.

In this release it is enabled: applications have to obey the offload API
rules, otherwise they will get errors from the following APIs:
rte_eth_dev_configure
rte_eth_rx_queue_setup
rte_eth_tx_queue_setup

Signed-off-by: Ferruh Yigit 
---
Cc: Shahaf Shuler 
Cc: Wei Dai 
Cc: Qi Zhang 
Cc: Andrew Rybchenko 
---
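For readers checking their applications against the rule enforced here, a minimal standalone sketch of the test follows. It assumes offloads and capabilities are plain 64-bit masks; check_offloads() and the values passed to it are hypothetical, not ethdev symbols.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reject a request that contains any bit outside the capability mask,
 * mirroring the "(requested & capa) != requested" test in the hunks below. */
static int
check_offloads(const char *what, uint64_t requested, uint64_t capa)
{
    assert(what != NULL);
    if ((requested & capa) != requested) {
        fprintf(stderr, "%s: requested offloads 0x%" PRIx64
            " not within capabilities 0x%" PRIx64 "\n",
            what, requested, capa);
        return -EINVAL;
    }
    return 0;
}
```

A request of 0x9 against a capability mask of 0x7 fails because bit 3 is not advertised, while 0x3 passes.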
 lib/librte_ethdev/rte_ethdev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index cd4bfd3c6..66e311676 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1171,7 +1171,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
local_conf.rxmode.offloads,
dev_info.rx_offload_capa,
__func__);
-   /* Will return -EINVAL in the next release */
+   return -EINVAL;
}
if ((local_conf.txmode.offloads & dev_info.tx_offload_capa) !=
 local_conf.txmode.offloads) {
@@ -1182,7 +1182,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
local_conf.txmode.offloads,
dev_info.tx_offload_capa,
__func__);
-   /* Will return -EINVAL in the next release */
+   return -EINVAL;
}
 
/* Check that device supports requested rss hash functions. */
@@ -1580,7 +1580,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
local_conf.offloads,
dev_info.rx_queue_offload_capa,
__func__);
-   /* Will return -EINVAL in the next release */
+   return -EINVAL;
}
 
ret = (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
@@ -1745,7 +1745,7 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
local_conf.offloads,
dev_info.tx_queue_offload_capa,
__func__);
-   /* Will return -EINVAL in the next release */
+   return -EINVAL;
}
 
return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
-- 
2.14.3



Re: [dpdk-dev] [PATCH v2 0/2] Vhost: unitfy receive paths

2018-05-31 Thread Maxime Coquelin




On 05/31/2018 11:55 AM, Wang, Zhihong wrote:




-Original Message-
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
Sent: Tuesday, May 29, 2018 5:45 PM
To: dev@dpdk.org; Bie, Tiwei ; Wang, Zhihong

Cc: Maxime Coquelin 
Subject: [PATCH v2 0/2] Vhost: unitfy receive paths

Hi,

This second version fixes the feature bit check in
rxvq_is_mergeable(), and removes "mergeable" from the Rx function
names. No difference is seen in the benchmarks.

This series is preliminary work to ease the integration of
packed ring layout support. But even without packed ring
layout, the result is positive.

The first patch unifies both paths, and the second one is a small
optimization to avoid copying the batch_copy_nb_elems VQ field
to/from the stack.

With the series applied, I get a modest performance gain for
both mergeable and non-mergeable cases, and the removal of
about 300 LoC is non-negligible maintenance-wise.

Rx-mrg=off benchmarks:

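+------------+-------+-------------+-------------+----------+
|    Run     |  PVP  | Guest->Host | Host->Guest | Loopback |
+------------+-------+-------------+-------------+----------+
| v18.05-rc5 | 14.47 |       16.64 |       17.57 |    13.15 |
| + series   | 14.87 |       16.86 |       17.70 |    13.30 |
+------------+-------+-------------+-------------+----------+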

Rx-mrg=on benchmarks:

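+------------+------+-------------+-------------+----------+
|    Run     | PVP  | Guest->Host | Host->Guest | Loopback |
+------------+------+-------------+-------------+----------+
| v18.05-rc5 | 9.38 |       13.78 |       16.70 |    12.79 |
| + series   | 9.38 |       13.80 |       17.49 |    13.36 |
+------------+------+-------------+-------------+----------+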

Note: Even without my series, the guest->host benchmark with
mergeable buffers enabled looks suspicious, as it should in
theory be almost identical to the result when Rx mergeable
buffers are disabled. To be investigated...

Maxime Coquelin (2):
   vhost: unify Rx mergeable and non-mergeable paths
   vhost: improve batched copies performance

  lib/librte_vhost/virtio_net.c | 376 +-
  1 file changed, 37 insertions(+), 339 deletions(-)



Acked-by: Zhihong Wang 

Thanks Maxime! This is really great to see. ;) We probably need the
same improvement for Virtio-pmd.


Yes, probably. I'll have a look at it, or if you have time to look at
it, won't blame you! :)


One comment on Virtio/Vhost performance analysis: No matter what type
of traffic is used (PVP, or Txonly-Rxonly, Loopback...), we need to
be clear on which side we're testing, and give the other side excess CPU
resources, otherwise we'll be testing whichever is the slowest.

Since this patch is for Vhost, I suggest running N (e.g. N = 4) Virtio
threads on N cores, and the corresponding N Vhost threads on a single
core, to do the performance comparison. Do you think this makes sense?


That's a valid point. I'll try this to find the bottleneck.
I'm in the process of setting up an automated test bench; it will help
with running more and more test cases.


For Guest -> Host, in my test I see Rx-mrg=on has negative impact on
Virtio side, probably because Virtio touches something that's not
touched when Rx-mrg=off.


I get it now.
When mrg=off, we use the simple_tx version, whereas we use the full one
when mrg is on:

static int
virtio_dev_configure(struct rte_eth_dev *dev)
{
...
	hw->use_simple_rx = 1;
	hw->use_simple_tx = 1;

#if defined RTE_ARCH_ARM64 || defined RTE_ARCH_ARM
	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}
#endif
	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
		hw->use_simple_rx = 0;
		hw->use_simple_tx = 0;
	}

	if (rx_offloads & (DEV_RX_OFFLOAD_UDP_CKSUM |
			   DEV_RX_OFFLOAD_TCP_CKSUM))
		hw->use_simple_rx = 0;

	return 0;
}

I see two problems here:
1. There should be no reason not to use simple_tx if mrg is on.
2. We should add a test on whether Rx and Tx offloads have been
negotiated, and not use the simple versions if they have.

Do you agree with these proposed changes?
I'll post an RFC for this.

Thanks,
Maxime


Thanks
-Zhihong



Re: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete expired entries

2018-05-31 Thread Ananyev, Konstantin
Hi Alex,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev
> Sent: Wednesday, May 16, 2018 12:04 PM
> To: dev@dpdk.org; Burakov, Anatoly 
> Subject: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete 
> expired entries
> 
> Add a new function, rte_frag_table_del_expired_entries(),
> that scans the list of recently used packets and deletes the expired ones.
> 
> A fragmented packet is supposed to live no longer than max_cycles,
> but the lib deletes an expired packet only occasionally, when it scans
> a bucket to find an empty slot while adding a new packet.
> Therefore a fragment might sit in the table forever.
> 
> Signed-off-by: Alex Kiselev 
> ---
>  lib/librte_ip_frag/ip_frag_common.h| 18 
>  lib/librte_ip_frag/ip_frag_internal.c  | 18 
>  lib/librte_ip_frag/rte_ip_frag.h   | 19 +++-
>  lib/librte_ip_frag/rte_ip_frag_common.c| 46 
> ++
>  lib/librte_ip_frag/rte_ip_frag_version.map |  6 
>  5 files changed, 88 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/librte_ip_frag/ip_frag_common.h 
> b/lib/librte_ip_frag/ip_frag_common.h
> index 197acf8d8..0fdcc7d0f 100644
> --- a/lib/librte_ip_frag/ip_frag_common.h
> +++ b/lib/librte_ip_frag/ip_frag_common.h
> @@ -25,6 +25,12 @@
>  #define IPv6_KEY_BYTES_FMT \
>   "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64
> 
> +#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT
> +#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v))
> +#else
> +#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0)
> +#endif /* IP_FRAG_TBL_STAT */
> +
>  /* internal functions declarations */
>  struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
>   struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
> @@ -149,4 +155,16 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms)
>   fp->frags[IP_FIRST_FRAG_IDX] = zero_frag;
>  }
> 
> +/* local frag table helper functions */
> +static inline void
> +ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
> *dr,
> + struct ip_frag_pkt *fp)
> +{
> + ip_frag_free(fp, dr);
> + ip_frag_key_invalidate(&fp->key);
> + TAILQ_REMOVE(&tbl->lru, fp, lru);
> + tbl->use_entries--;
> + IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1);
> +}
> +
>  #endif /* _IP_FRAG_COMMON_H_ */
> diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
> b/lib/librte_ip_frag/ip_frag_internal.c
> index 2560c7713..97470a872 100644
> --- a/lib/librte_ip_frag/ip_frag_internal.c
> +++ b/lib/librte_ip_frag/ip_frag_internal.c
> @@ -14,24 +14,6 @@
>  #define  IP_FRAG_TBL_POS(tbl, sig)   \
>   ((tbl)->pkt + ((sig) & (tbl)->entry_mask))
> 
> -#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT
> -#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v))
> -#else
> -#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0)
> -#endif /* IP_FRAG_TBL_STAT */
> -
> -/* local frag table helper functions */
> -static inline void
> -ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
> *dr,
> - struct ip_frag_pkt *fp)
> -{
> - ip_frag_free(fp, dr);
> - ip_frag_key_invalidate(&fp->key);
> - TAILQ_REMOVE(&tbl->lru, fp, lru);
> - tbl->use_entries--;
> - IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1);
> -}
> -
>  static inline void
>  ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl,  struct ip_frag_pkt *fp,
>   const struct ip_frag_key *key, uint64_t tms)
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h 
> b/lib/librte_ip_frag/rte_ip_frag.h
> index b3f3f78df..3c694df92 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -65,10 +65,13 @@ struct ip_frag_pkt {
> 
>  #define IP_FRAG_DEATH_ROW_LEN 32 /**< death row size (in packets) */
> 
> +/* death row size in mbufs */
> +#define IP_FRAG_DEATH_ROW_MBUF_LEN (IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM 
> + 1))
> +
>  /** mbuf death row (packets to be freed) */
>  struct rte_ip_frag_death_row {
>   uint32_t cnt;  /**< number of mbufs currently on death row */
> - struct rte_mbuf *row[IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM + 1)];
> + struct rte_mbuf *row[IP_FRAG_DEATH_ROW_MBUF_LEN];
>   /**< mbufs to be freed */
>  };
> 
> @@ -325,6 +328,20 @@ void rte_ip_frag_free_death_row(struct 
> rte_ip_frag_death_row *dr,
>  void
>  rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl 
> *tbl);
> 
> +/**
> + * Delete expired fragments
> + *
> + * @param tbl
> + *   Table to delete expired fragments from
> + * @param dr
> + *   Death row to free buffers to
> + * @param tms
> + *   Current timestamp
> + */
> +void __rte_experimental
> +rte_frag_table_del_expired_entries(struct rte_ip_frag_tbl *tbl,
> + struct rte_ip_frag_death_row *dr, uint64_t tms);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_ip_frag/rte_ip_frag_common.c 
> b/lib/librte_ip_frag/rte_ip_frag_common.c
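
For context, the expiry scan described in the commit message can be sketched in isolation. This is a minimal sketch, assuming an LRU list ordered from oldest to newest and an entry that expires once max_cycles + start < tms; struct frag_entry, struct frag_table and the frag_table_*() helpers are hypothetical, and the death row and mbuf handling of the real library are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct frag_entry {
    TAILQ_ENTRY(frag_entry) lru;
    uint64_t start; /* timestamp of the first fragment */
};
TAILQ_HEAD(frag_lru, frag_entry);

struct frag_table {
    struct frag_lru lru;   /* head is the least recently used entry */
    uint64_t max_cycles;   /* time to live of an entry */
    uint32_t use_entries;
};

static void
frag_table_init(struct frag_table *tbl, uint64_t max_cycles)
{
    TAILQ_INIT(&tbl->lru);
    tbl->max_cycles = max_cycles;
    tbl->use_entries = 0;
}

static struct frag_entry *
frag_table_add(struct frag_table *tbl, uint64_t tms)
{
    struct frag_entry *fp = malloc(sizeof(*fp));

    if (fp == NULL)
        return NULL;
    fp->start = tms;
    TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
    tbl->use_entries++;
    return fp;
}

/* Drop expired entries from the head of the LRU list and return how many
 * were removed. The scan stops at the first entry that is still alive,
 * since everything behind it is newer. */
static uint32_t
frag_table_del_expired(struct frag_table *tbl, uint64_t tms)
{
    struct frag_entry *fp;
    uint32_t deleted = 0;

    assert(tbl != NULL);
    while ((fp = TAILQ_FIRST(&tbl->lru)) != NULL &&
            tbl->max_cycles + fp->start < tms) {
        TAILQ_REMOVE(&tbl->lru, fp, lru);
        free(fp);
        tbl->use_entries--;
        deleted++;
    }
    return deleted;
}
```

An application would call such a function periodically with the current timestamp, so that entries expire even when no new fragments arrive.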

Re: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter

2018-05-31 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev
> Sent: Wednesday, May 16, 2018 12:04 PM
> To: dev@dpdk.org; Burakov, Anatoly 
> Subject: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter
> 
> Add a new function, rte_frag_table_mbuf_count(), that returns the
> number of mbufs held in the fragmentation table.
> 
> There might be situations (a kind of attack where a lot of
> fragmented packets are sent to a dpdk application in order
> to flood the fragmentation table) when no additional mbufs
> must be added to the fragmentation table since it already
> contains too many of them. Currently there is no way to
> determine the number of mbufs held in the fragmentation
> table. This patch allows keeping track of the number of mbufs
> held in the fragmentation table.
> 
> Signed-off-by: Alex Kiselev 
> ---
>  lib/librte_ip_frag/ip_frag_common.h| 12 +++-
>  lib/librte_ip_frag/ip_frag_internal.c  | 15 +--
>  lib/librte_ip_frag/rte_ip_frag.h   | 16 
>  lib/librte_ip_frag/rte_ip_frag_common.c|  1 +
>  lib/librte_ip_frag/rte_ip_frag_version.map |  1 +
>  lib/librte_ip_frag/rte_ipv4_reassembly.c   |  2 +-
>  lib/librte_ip_frag/rte_ipv6_reassembly.c   |  2 +-
>  7 files changed, 36 insertions(+), 13 deletions(-)

Do we really need it?
These are quite significant code changes, and the advantage looks quite small to me...
We already have use_entries, right?
That can be used to estimate the number of mbufs in the table.
Konstantin
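
A rough sketch of such an estimate, assuming each in-use entry can hold at most a fixed number of fragments (the per-entry limit is passed in as a parameter here; frag_tbl_mbuf_upper_bound() is a hypothetical helper, not a library function):

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the mbufs held by the table: every in-use entry holds
 * at most max_frags_per_entry fragments. */
static uint64_t
frag_tbl_mbuf_upper_bound(uint32_t use_entries, uint32_t max_frags_per_entry)
{
    assert(max_frags_per_entry > 0);
    return (uint64_t)use_entries * max_frags_per_entry;
}
```

This only bounds the count from above; it cannot tell how many fragments each entry actually holds.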

> 
> diff --git a/lib/librte_ip_frag/ip_frag_common.h 
> b/lib/librte_ip_frag/ip_frag_common.h
> index 0fdcc7d0f..d04e69de6 100644
> --- a/lib/librte_ip_frag/ip_frag_common.h
> +++ b/lib/librte_ip_frag/ip_frag_common.h
> @@ -32,9 +32,9 @@
>  #endif /* IP_FRAG_TBL_STAT */
> 
>  /* internal functions declarations */
> -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
> - uint16_t ofs, uint16_t len, uint16_t more_frags);
> +struct rte_mbuf * ip_frag_process(struct rte_ip_frag_tbl *tbl,
> + struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
> + struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags);
> 
>  struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
>   struct rte_ip_frag_death_row *dr,
> @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct 
> ip_frag_key * k2)
> 
>  /* put fragment on death row */
>  static inline void
> -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
> +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
> + struct rte_ip_frag_death_row *dr)
>  {
>   uint32_t i, k;
> 
> @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct 
> rte_ip_frag_death_row *dr)
>   if (fp->frags[i].mb != NULL) {
>   dr->row[k++] = fp->frags[i].mb;
>   fp->frags[i].mb = NULL;
> + tbl->nb_mbufs --;
>   }
>   }
> 
> @@ -160,7 +162,7 @@ static inline void
>  ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
> *dr,
>   struct ip_frag_pkt *fp)
>  {
> - ip_frag_free(fp, dr);
> + ip_frag_free(tbl, fp, dr);
>   ip_frag_key_invalidate(&fp->key);
>   TAILQ_REMOVE(&tbl->lru, fp, lru);
>   tbl->use_entries--;
> diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
> b/lib/librte_ip_frag/ip_frag_internal.c
> index 97470a872..eea871b7e 100644
> --- a/lib/librte_ip_frag/ip_frag_internal.c
> +++ b/lib/librte_ip_frag/ip_frag_internal.c
> @@ -29,14 +29,13 @@ static inline void
>  ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
> *dr,
>   struct ip_frag_pkt *fp, uint64_t tms)
>  {
> - ip_frag_free(fp, dr);
> + ip_frag_free(tbl, fp, dr);
>   ip_frag_reset(fp, tms);
>   TAILQ_REMOVE(&tbl->lru, fp, lru);
>   TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
>   IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1);
>  }
> 
> -
>  static inline void
>  ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
>  {
> @@ -88,8 +87,9 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, 
> uint32_t *v2)
>  }
> 
>  struct rte_mbuf *
> -ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
> - struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
> +ip_frag_process(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
> + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs,
> + uint16_t len, uint16_t more_frags)
>  {
>   uint32_t idx;
> 
> @@ -147,7 +147,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct 
> rte_ip_frag_death_row *dr,
>   fp->frags[IP_LAST_FRAG_IDX].len);
> 
>   /* free all fragments, invalidate the entry. */
> - ip_frag_free(fp, dr);
> + ip_fr

[dpdk-dev] [PATCH] ethdev: force RSS offload rules again

2018-05-31 Thread Ferruh Yigit
PMDs should provide supported RSS hash functions via the
dev_info.flow_type_rss_offloads variable.

There is a check in ethdev whether the requested RSS hash function is
supported by the PMD.
This check was relaxed in the previous release to not return an error
when an unsupported hash function is requested [1]; this was done to
avoid breaking applications.

Add the error return back.
PMDs need to provide the correct list of supported hash functions, and
applications need to take this information into account before configuring
RSS, otherwise they will get an error from these APIs:
rte_eth_dev_rss_hash_update()
rte_eth_dev_configure()

[1] af7551e2bfce ("ethdev: remove error return on RSS hash check")

Signed-off-by: Ferruh Yigit 
---
Cc: Xueming Li 
Cc: Shahaf Shuler 
Cc: Wei Dai 
Cc: Qi Zhang 
Cc: Andrew Rybchenko 
---
 lib/librte_ethdev/rte_ethdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 66e311676..a9977df97 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1194,6 +1194,7 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, 
uint16_t nb_tx_q,
port_id,
dev_conf->rx_adv_conf.rss_conf.rss_hf,
dev_info.flow_type_rss_offloads);
+   return -EINVAL;
}
 
/*
@@ -2928,6 +2929,7 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
port_id,
rss_conf->rss_hf,
dev_info.flow_type_rss_offloads);
+   return -EINVAL;
}
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rss_hash_update, -ENOTSUP);
return eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
-- 
2.14.3
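
For applications affected by the restored error return, the check can be mirrored on the application side before calling rte_eth_dev_configure(). The following is only a minimal sketch: the DEMO_RSS_* bits and both helper names are made up for illustration, and in a real application the supported mask would come from dev_info.flow_type_rss_offloads and the requested mask from rss_conf.rss_hf.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ETH_RSS_* bits from rte_ethdev.h. */
#define DEMO_RSS_IP  (UINT64_C(1) << 0)
#define DEMO_RSS_UDP (UINT64_C(1) << 1)
#define DEMO_RSS_TCP (UINT64_C(1) << 2)

/*
 * Same idea as the ethdev check: fail if any requested hash function
 * was not advertised by the PMD.
 */
static int
demo_rss_check(uint64_t requested, uint64_t supported)
{
	if ((requested & supported) != requested)
		return -EINVAL;
	return 0;
}

/*
 * What an application can do to stay clear of the error: drop the
 * unsupported bits before configuring the port.
 */
static uint64_t
demo_rss_mask(uint64_t requested, uint64_t supported)
{
	return requested & supported;
}
```

Masking the request this way silently narrows the hash configuration, so an application that depends on a specific hash function would probably prefer to report the mismatch instead.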



Re: [dpdk-dev] [PATCH v2 1/2] net/tap: calculate checksums of multi segs packets

2018-05-31 Thread Ferruh Yigit
On 4/22/2018 12:30 PM, Ophir Munk wrote:
> Prior to this commit IP/UDP/TCP checksum offload calculations
> were skipped in case of a multi segments packet.
> This commit enables TAP checksum calculations for multi segments
> packets.
> The only restriction is that the first segment must contain
> headers of layers 3 (IP) and 4 (UDP or TCP)
> 
> Signed-off-by: Ophir Munk 

Hi Ophir,

Can you please rebase the patch on top of the latest master? It doesn't apply
cleanly.

This is a feature from the previous release, so please send updates early so
that we can get it into this release in good time.

Thanks,
ferruh


Re: [dpdk-dev] [PATCH v2 1/2] net/tap: calculate checksums of multi segs packets

2018-05-31 Thread Ferruh Yigit
On 5/31/2018 2:52 PM, Ferruh Yigit wrote:
> On 4/22/2018 12:30 PM, Ophir Munk wrote:
>> Prior to this commit IP/UDP/TCP checksum offload calculations
>> were skipped in case of a multi segments packet.
>> This commit enables TAP checksum calculations for multi segments
>> packets.
>> The only restriction is that the first segment must contain
>> headers of layers 3 (IP) and 4 (UDP or TCP)
>>
>> Signed-off-by: Ophir Munk 
> 
> Hi Ophir,
> 
> Can you please rebase the patch on top of the latest master? It doesn't apply
> cleanly.
> 
> This is a feature from the previous release, so please send updates early so
> that we can get it into this release in good time.

Oops, I replied to v2 instead of v3. But I tested the latest version, v3, and a v4 is still needed.



[dpdk-dev] [PATCH 0/2] Improve service stop support

2018-05-31 Thread Gage Eads
Existing service functions allow us to stop a service, but doing so doesn't
guarantee that the service has finished running on a service core. This
patch set introduces a function, rte_service_may_be_active(), to check
whether a stopped service is truly stopped.

This is needed for flows that modify a resource that the service is
using; for example when stopping an eventdev, any event adapters and/or
scheduler service need to be quiesced first.

This patch set also adds support for the event sw PMD's device stop flush
callback, which relies on this new mechanism to ensure that the
scheduler service is no longer active.

Gage Eads (2):
  service: add mechanism for quiescing a service
  event/sw: support device stop flush callback

 drivers/event/sw/sw_evdev.c | 114 +++-
 drivers/event/sw/sw_evdev_selftest.c|  81 +++-
 lib/librte_eal/common/include/rte_service.h |  16 
 lib/librte_eal/common/rte_service.c |  31 +++-
 lib/librte_eal/rte_eal_version.map  |   1 +
 test/test/test_service_cores.c  |  43 +++
 6 files changed, 279 insertions(+), 7 deletions(-)

-- 
2.13.6
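
The stop-then-poll pattern described in this cover letter can be modelled without DPDK at all. The sketch below is a single-threaded toy, not the patch's implementation: struct demo_service and the demo_* functions are hypothetical, and the loop inside demo_service_quiesce() stands in for service cores that would really be running in parallel.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_LCORE 4

/* Hypothetical miniature of a service: a runstate plus one flag per lcore. */
struct demo_service {
	int runstate;                            /* 1 = running, 0 = stopped */
	uint8_t active_on_lcore[DEMO_MAX_LCORE]; /* set while an lcore may run it */
};

/* One pass of a service core over the service. */
static int
demo_service_run(struct demo_service *s, unsigned int lcore)
{
	assert(lcore < DEMO_MAX_LCORE);
	if (s->runstate != 1) {
		/* the core noticed the stop: it can no longer be inside the service */
		s->active_on_lcore[lcore] = 0;
		return -1;
	}
	s->active_on_lcore[lcore] = 1;
	/* ... the service callback would execute here ... */
	return 0;
}

/* Returns 1 if any lcore may still be executing the service, else 0. */
static int
demo_service_may_be_active(const struct demo_service *s)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX_LCORE; i++) {
		if (s->active_on_lcore[i])
			return 1;
	}
	return 0;
}

/*
 * Stop the service, then poll until no lcore can still be inside it.
 * Returns the number of polls that were needed. In this toy model each
 * poll lets every lcore take one more pass.
 */
static unsigned int
demo_service_quiesce(struct demo_service *s)
{
	unsigned int polls = 0, lcore;

	s->runstate = 0;
	while (demo_service_may_be_active(s)) {
		for (lcore = 0; lcore < DEMO_MAX_LCORE; lcore++)
			demo_service_run(s, lcore);
		polls++;
	}
	return polls;
}
```

With real service cores the caller would only do the runstate change and the polling, pausing between polls as sw_stop() does with rte_pause() in patch 2/2.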



[dpdk-dev] [PATCH 1/2] service: add mechanism for quiescing a service

2018-05-31 Thread Gage Eads
Existing service functions allow us to stop a service, but doing so doesn't
guarantee that the service has finished running on a service core. This
commit introduces rte_service_may_be_active(), which returns whether the
service may be executing on one or more lcores currently, or definitely is
not.

The service core layer supports this function by setting a flag when
a service core is going to execute a service, and unsetting the flag when
the core is no longer able to run the service (its runstate becomes stopped
or the lcore is no longer mapped).

With this new function, applications can set a service's runstate to
stopped, then poll rte_service_may_be_active() until it returns false. At
that point, the service is quiesced.

Signed-off-by: Gage Eads 
---
 lib/librte_eal/common/include/rte_service.h | 16 +++
 lib/librte_eal/common/rte_service.c | 31 ++---
 lib/librte_eal/rte_eal_version.map  |  1 +
 test/test/test_service_cores.c  | 43 +
 4 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_service.h 
b/lib/librte_eal/common/include/rte_service.h
index aea4d91b9..27b2dab7c 100644
--- a/lib/librte_eal/common/include/rte_service.h
+++ b/lib/librte_eal/common/include/rte_service.h
@@ -162,6 +162,22 @@ int32_t rte_service_runstate_set(uint32_t id, uint32_t 
runstate);
 int32_t rte_service_runstate_get(uint32_t id);
 
 /**
+ * This function returns whether the service may be currently executing on
+ * at least one lcore, or definitely is not. This function can be used to
+ * determine if, after setting the service runstate to stopped, the service
+ * is still executing on a service lcore.
+ *
+ * Care must be taken if calling this function when the service runstate is
+ * running, since the result of this function may be incorrect by the time the
+ * function returns due to service cores running in parallel.
+ *
+ * @retval 1 Service may be running on one or more lcores
+ * @retval 0 Service is not running on any lcore
+ * @retval -EINVAL Invalid service id
+ */
+int32_t rte_service_may_be_active(uint32_t id);
+
+/**
  * Enable or disable the check for a service-core being mapped to the service.
  * An application can disable the check when takes the responsibility to run a
  * service itself using *rte_service_run_iter_on_app_lcore*.
diff --git a/lib/librte_eal/common/rte_service.c 
b/lib/librte_eal/common/rte_service.c
index 73507aacb..d6c4c6039 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -52,6 +52,7 @@ struct rte_service_spec_impl {
rte_atomic32_t num_mapped_cores;
uint64_t calls;
uint64_t cycles_spent;
+   uint8_t active_on_lcore[RTE_MAX_LCORE];
 } __rte_cache_aligned;
 
 /* the internal values of a service core */
@@ -347,15 +348,19 @@ rte_service_runner_do_callback(struct 
rte_service_spec_impl *s,
 
 
 static inline int32_t
-service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
+service_run(uint32_t i, int lcore, struct core_state *cs, uint64_t 
service_mask)
 {
if (!service_valid(i))
return -EINVAL;
struct rte_service_spec_impl *s = &rte_services[i];
if (s->comp_runstate != RUNSTATE_RUNNING ||
s->app_runstate != RUNSTATE_RUNNING ||
-   !(service_mask & (UINT64_C(1) << i)))
+   !(service_mask & (UINT64_C(1) << i))) {
+   s->active_on_lcore[lcore] = 0;
return -ENOEXEC;
+   }
+
+   s->active_on_lcore[lcore] = 1;
 
/* check do we need cmpset, if MT safe or <= 1 core
 * mapped, atomic ops are not required.
@@ -374,6 +379,24 @@ service_run(uint32_t i, struct core_state *cs, uint64_t 
service_mask)
return 0;
 }
 
+int32_t rte_service_may_be_active(uint32_t id)
+{
+   uint32_t ids[RTE_MAX_LCORE] = {0};
+   struct rte_service_spec_impl *s = &rte_services[id];
+   int32_t lcore_count = rte_service_lcore_list(ids, RTE_MAX_LCORE);
+   int i;
+
+   if (!service_valid(id))
+   return -EINVAL;
+
+   for (i = 0; i < lcore_count; i++) {
+   if (s->active_on_lcore[ids[i]])
+   return 1;
+   }
+
+   return 0;
+}
+
 int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
uint32_t serialize_mt_unsafe)
 {
@@ -398,7 +421,7 @@ int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
return -EBUSY;
}
 
-   int ret = service_run(id, cs, UINT64_MAX);
+   int ret = service_run(id, rte_lcore_id(), cs, UINT64_MAX);
 
if (serialize_mt_unsafe)
rte_atomic32_dec(&s->num_mapped_cores);
@@ -419,7 +442,7 @@ rte_service_runner_func(void *arg)
 
for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
/* return value ignored as no change to code flow */
-   service_run(i, cs,

[dpdk-dev] [PATCH 2/2] event/sw: support device stop flush callback

2018-05-31 Thread Gage Eads
This commit also adds a flush callback test to the sw eventdev's selftest
suite.

Signed-off-by: Gage Eads 
---
 drivers/event/sw/sw_evdev.c  | 114 ++-
 drivers/event/sw/sw_evdev_selftest.c |  81 -
 2 files changed, 192 insertions(+), 3 deletions(-)

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 10f0e1ad4..95a6f1fda 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -361,9 +361,99 @@ sw_init_qid_iqs(struct sw_evdev *sw)
}
 }
 
+static int
+sw_qids_empty(struct sw_evdev *sw)
+{
+   unsigned int i, j;
+
+   for (i = 0; i < sw->qid_count; i++) {
+   for (j = 0; j < SW_IQS_MAX; j++) {
+   if (iq_count(&sw->qids[i].iq[j]))
+   return 0;
+   }
+   }
+
+   return 1;
+}
+
+static int
+sw_ports_empty(struct sw_evdev *sw)
+{
+   unsigned int i;
+
+   for (i = 0; i < sw->port_count; i++) {
+   if ((rte_event_ring_count(sw->ports[i].rx_worker_ring)) ||
+rte_event_ring_count(sw->ports[i].cq_worker_ring))
+   return 0;
+   }
+
+   return 1;
+}
+
+static void
+sw_drain_ports(struct rte_eventdev *dev)
+{
+   struct sw_evdev *sw = sw_pmd_priv(dev);
+   eventdev_stop_flush_t flush;
+   unsigned int i;
+   uint8_t dev_id;
+   void *arg;
+
+   flush = dev->dev_ops->dev_stop_flush;
+   dev_id = dev->data->dev_id;
+   arg = dev->data->dev_stop_flush_arg;
+
+   for (i = 0; i < sw->port_count; i++) {
+   struct rte_event ev;
+
+   while (rte_event_dequeue_burst(dev_id, i, &ev, 1, 0)) {
+   if (flush)
+   flush(dev_id, ev, arg);
+
+   ev.op = RTE_EVENT_OP_RELEASE;
+   rte_event_enqueue_burst(dev_id, i, &ev, 1);
+   }
+   }
+}
+
+static void
+sw_drain_queue(struct rte_eventdev *dev, struct sw_iq *iq)
+{
+   struct sw_evdev *sw = sw_pmd_priv(dev);
+   eventdev_stop_flush_t flush;
+   uint8_t dev_id;
+   void *arg;
+
+   flush = dev->dev_ops->dev_stop_flush;
+   dev_id = dev->data->dev_id;
+   arg = dev->data->dev_stop_flush_arg;
+
+   while (iq_count(iq) > 0) {
+   struct rte_event ev;
+
+   iq_dequeue_burst(sw, iq, &ev, 1);
+
+   if (flush)
+   flush(dev_id, ev, arg);
+   }
+}
+
+static void
+sw_drain_queues(struct rte_eventdev *dev)
+{
+   struct sw_evdev *sw = sw_pmd_priv(dev);
+   int i, j;
+
+   for (i = 0; i < sw->qid_count; i++) {
+   for (j = 0; j < SW_IQS_MAX; j++)
+   sw_drain_queue(dev, &sw->qids[i].iq[j]);
+   }
+}
+
 static void
-sw_clean_qid_iqs(struct sw_evdev *sw)
+sw_clean_qid_iqs(struct rte_eventdev *dev)
 {
+   struct sw_evdev *sw = sw_pmd_priv(dev);
int i, j;
 
/* Release the IQ memory of all configured qids */
@@ -729,10 +819,30 @@ static void
 sw_stop(struct rte_eventdev *dev)
 {
struct sw_evdev *sw = sw_pmd_priv(dev);
-   sw_clean_qid_iqs(sw);
+   int32_t runstate;
+
+   /* Stop the scheduler if it's running */
+   runstate = rte_service_runstate_get(sw->service_id);
+   if (runstate == 1)
+   rte_service_runstate_set(sw->service_id, 0);
+
+   while (rte_service_may_be_active(sw->service_id))
+   rte_pause();
+
+   /* Flush all events out of the device */
+   while (!(sw_qids_empty(sw) && sw_ports_empty(sw))) {
+   sw_event_schedule(dev);
+   sw_drain_ports(dev);
+   sw_drain_queues(dev);
+   }
+
+   sw_clean_qid_iqs(dev);
sw_xstats_uninit(sw);
sw->started = 0;
rte_smp_wmb();
+
+   if (runstate == 1)
+   rte_service_runstate_set(sw->service_id, 1);
 }
 
 static int
diff --git a/drivers/event/sw/sw_evdev_selftest.c 
b/drivers/event/sw/sw_evdev_selftest.c
index 78d30e07a..c40912db5 100644
--- a/drivers/event/sw/sw_evdev_selftest.c
+++ b/drivers/event/sw/sw_evdev_selftest.c
@@ -28,6 +28,7 @@
 #define MAX_PORTS 16
 #define MAX_QIDS 16
 #define NUM_PACKETS (1<<18)
+#define DEQUEUE_DEPTH 128
 
 static int evdev;
 
@@ -147,7 +148,7 @@ init(struct test *t, int nb_queues, int nb_ports)
.nb_event_ports = nb_ports,
.nb_event_queue_flows = 1024,
.nb_events_limit = 4096,
-   .nb_event_port_dequeue_depth = 128,
+   .nb_event_port_dequeue_depth = DEQUEUE_DEPTH,
.nb_event_port_enqueue_depth = 128,
};
int ret;
@@ -2807,6 +2808,78 @@ holb(struct test *t) /* test to check we avoid basic 
head-of-line blocking */
return -1;
 }
 
+static void
+flush(uint8_t dev_id __rte_unused, struct rte_event event, void *arg)
+{
+   *((uint

Re: [dpdk-dev] [RFC] net/mvpp2: implement dynamic logging

2018-05-31 Thread Ferruh Yigit
On 4/26/2018 11:44 AM, Tomasz Duszynski wrote:
> Hello Stephen,
> 
> A few nits on this inline.

Hi Tomasz,

This was an RFC targeting your driver.
Can you re-spin the patch with your suggested updates?


> 
> On Wed, Apr 25, 2018 at 09:44:54AM -0700, Stephen Hemminger wrote:
>> All DPDK drivers should use dynamic log types, not the default PMD
>> value.
>>
>> This is an RFC, not a patch, since I don't have the libraries or
>> hardware to validate it.
>>
>> Signed-off-by: Stephen Hemminger 

<...>


Re: [dpdk-dev] [PATCH] net/bonding: update link status on slave add

2018-05-31 Thread Ferruh Yigit
On 5/9/2018 1:06 PM, Radu Nicolau wrote:
> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.
> 
> Signed-off-by: Radu Nicolau 

Hi Radu,

Can you please send a new version with updated commit log, with a fix title,
Fixes commit info, and the bugzilla id it is fixing?

Thanks,
ferruh


Re: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter

2018-05-31 Thread Alex Kiselev
Hi Konstantin.

>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev
>> Sent: Wednesday, May 16, 2018 12:04 PM
>> To: dev@dpdk.org; Burakov, Anatoly 
>> Subject: [dpdk-dev] [PATCH 2/2] librte_ip_frag: add mbuf counter

>> add new function rte_frag_table_mbuf_count() that returns the
>> number of mbufs held in the fragmentation table.

>> There might be situations (a kind of attack where a lot of
>> fragmented packets are sent to a dpdk application in order
>> to flood the fragmentation table) when no additional mbufs
>> must be added to the fragmentation table since it already
>> contains too many of them. Currently there is no way to
>> determine the number of mbufs held in the fragmentation
>> table. This patch makes it possible to keep track of the
>> number of mbufs held in the fragmentation table.

>> Signed-off-by: Alex Kiselev 
>> ---
>>  lib/librte_ip_frag/ip_frag_common.h| 12 +++-
>>  lib/librte_ip_frag/ip_frag_internal.c  | 15 +--
>>  lib/librte_ip_frag/rte_ip_frag.h   | 16 
>>  lib/librte_ip_frag/rte_ip_frag_common.c|  1 +
>>  lib/librte_ip_frag/rte_ip_frag_version.map |  1 +
>>  lib/librte_ip_frag/rte_ipv4_reassembly.c   |  2 +-
>>  lib/librte_ip_frag/rte_ipv6_reassembly.c   |  2 +-
>>  7 files changed, 36 insertions(+), 13 deletions(-)

> Do we really need it?
> It's quite a significant code change and the advantage looks quite small
> to me...
Most of the changes are just movements of some internal functions in order
to reuse them in the new code. Basically, the only change I propose is adding
one additional counter.

> We already have use_entries, right?
Let's say, for example, that there are 8 fragmentation tables,
one per lcore, since a table doesn't support concurrent operations.
The use_entries variable indicates that 1000 entries are in use in each table.
Each entry can hold from 1 to 4 mbufs (RTE_LIBRTE_IP_FRAG_MAX_FRAG).
So you can't tell whether a fragmentation table holds 1000 mbufs or 4000,
and across 8 fragmentation tables the uncertainty grows to anywhere
between 8000 and 32000. That estimation error might be critical under
DoS attacks, since mbufs are a fairly limited resource.
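
As a rough sketch of that arithmetic (the names below are made up for illustration, and DEMO_MAX_FRAG just stands in for the value 4 used above):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RTE_LIBRTE_IP_FRAG_MAX_FRAG as used in the example above. */
#define DEMO_MAX_FRAG 4

struct demo_bounds {
	uint32_t min_mbufs; /* every used entry holds a single mbuf */
	uint32_t max_mbufs; /* every used entry holds DEMO_MAX_FRAG mbufs */
};

/* Bounds on the mbuf count that use_entries alone can give us. */
static struct demo_bounds
demo_estimate(uint32_t nb_tables, uint32_t use_entries)
{
	struct demo_bounds b;

	assert(nb_tables > 0);
	b.min_mbufs = nb_tables * use_entries;
	b.max_mbufs = nb_tables * use_entries * DEMO_MAX_FRAG;
	return b;
}
```

For 8 tables with 1000 used entries each this gives a range of 8000 to 32000 mbufs, which is the gap an exact per-table counter would close.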

> That can be used to get some estimation for a number of mbufs in the table.
> Konstantin


>> diff --git a/lib/librte_ip_frag/ip_frag_common.h 
>> b/lib/librte_ip_frag/ip_frag_common.h
>> index 0fdcc7d0f..d04e69de6 100644
>> --- a/lib/librte_ip_frag/ip_frag_common.h
>> +++ b/lib/librte_ip_frag/ip_frag_common.h
>> @@ -32,9 +32,9 @@
>>  #endif /* IP_FRAG_TBL_STAT */

>>  /* internal functions declarations */
>> -struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
>> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
>> - uint16_t ofs, uint16_t len, uint16_t more_frags);
>> +struct rte_mbuf * ip_frag_process(struct rte_ip_frag_tbl *tbl,
>> + struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
>> + struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags);

>>  struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl,
>>   struct rte_ip_frag_death_row *dr,
>> @@ -91,7 +91,8 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const 
>> struct ip_frag_key * k2)

>>  /* put fragment on death row */
>>  static inline void
>> -ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr)
>> +ip_frag_free(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp,
>> + struct rte_ip_frag_death_row *dr)
>>  {
>>   uint32_t i, k;

>> @@ -100,6 +101,7 @@ ip_frag_free(struct ip_frag_pkt *fp, struct 
>> rte_ip_frag_death_row *dr)
>>   if (fp->frags[i].mb != NULL) {
>>   dr->row[k++] = fp->frags[i].mb;
>>   fp->frags[i].mb = NULL;
>> + tbl->nb_mbufs --;
>>   }
>>   }

>> @@ -160,7 +162,7 @@ static inline void
>>  ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
>> *dr,
>>   struct ip_frag_pkt *fp)
>>  {
>> - ip_frag_free(fp, dr);
>> + ip_frag_free(tbl, fp, dr);
>>   ip_frag_key_invalidate(&fp->key);
>>   TAILQ_REMOVE(&tbl->lru, fp, lru);
>>   tbl->use_entries--;
>> diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
>> b/lib/librte_ip_frag/ip_frag_internal.c
>> index 97470a872..eea871b7e 100644
>> --- a/lib/librte_ip_frag/ip_frag_internal.c
>> +++ b/lib/librte_ip_frag/ip_frag_internal.c
>> @@ -29,14 +29,13 @@ static inline void
>>  ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
>> *dr,
>>   struct ip_frag_pkt *fp, uint64_t tms)
>>  {
>> - ip_frag_free(fp, dr);
>> + ip_frag_free(tbl, fp, dr);
>>   ip_frag_reset(fp, tms);
>>   TAILQ_REMOVE(&tbl->lru, fp, lru);
>>   TAILQ_INSERT_TAIL(&tbl->lru, fp, lru);
>>   IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, reuse_num, 1);
>>  }

>> -
>>  static inline void
>>  ipv4_frag_hash(const struct ip_frag_key *key, uint32_

[dpdk-dev] [RFC 01/10] eal: add --no-shared-files option

2018-05-31 Thread Anatoly Burakov
This command-line option will cause DPDK to not create any shared
files at runtime, including any shared configuration or hugetlbfs
files. This is useful for debug purposes, as well as for certain
use cases like containers.

Currently, this option does nothing.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_options.c | 7 +++
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 lib/librte_eal/common/eal_options.h| 2 ++
 3 files changed, 10 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index ecebb2923..38df094de 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
+   {OPT_NO_SHARED_FILES,   0, NULL, OPT_NO_SHARED_FILES_NUM  },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM},
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM},
@@ -1165,6 +1166,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_shconf = 1;
break;
 
+   case OPT_NO_SHARED_FILES_NUM:
+   conf->no_shared_files = 1;
+   break;
+
case OPT_PROC_TYPE_NUM:
conf->process_type = eal_parse_proc_type(optarg);
break;
@@ -1370,6 +1375,8 @@ eal_common_usage(void)
   "  Set specific log level\n"
   "  -v  Display version information on startup\n"
   "  -h, --help  This help\n"
+  "  --"OPT_NO_SHARED_FILES"   Do not create any shared files 
(config, hugetlbfs, etc.).\n"
+  "  This disables secondary process support\n"
   "\nEAL options for DEBUG use only:\n"
   "  --"OPT_HUGE_UNLINK"   Unlink hugepage files after init\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index c4cbf3acd..3fc71bb49 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,7 @@ struct internal_config {
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping

* instead of native TSC */
volatile unsigned no_shconf;  /**< true if there is no shared 
config */
+   volatile unsigned no_shared_files; /**< true if there are no shared 
files to be created*/
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices 
*/
volatile enum rte_proc_type_t process_type; /**< multi-process proc 
type */
/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index 211ae06ae..b0d9d6819 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
OPT_NO_PCI_NUM,
 #define OPT_NO_SHCONF "no-shconf"
OPT_NO_SHCONF_NUM,
+#define OPT_NO_SHARED_FILES   "no-shared-files"
+   OPT_NO_SHARED_FILES_NUM,
 #define OPT_SOCKET_MEM"socket-mem"
OPT_SOCKET_MEM_NUM,
 #define OPT_SYSLOG"syslog"
-- 
2.17.0


[dpdk-dev] [RFC 00/10] Support running DPDK without hugetlbfs mountpoint

2018-05-31 Thread Anatoly Burakov
This patchset takes old debug options "--huge-unlink" and
"--no-shconf" and replaces them both with a new option,
"--no-shared-files". This is a special mode which will
disable support for secondary processes, but which will
cause DPDK to not create any shared files while running -
neither hugepages nor any runtime data (everything will
be entirely in memory).

Additionally, on supported kernel/glibc versions (Linux
4.14+, glibc 2.27+), "--no-shared-files" mode will also
reserve hugepages using memfd instead of relying on
hugetlbfs mountpoint. This will make it possible to use
DPDK without hugetlbfs mountpoints (e.g. container use
cases).

This changes functionality of several command-line
switches, so RFC for now. Maybe we could leave the old
switches as they are and deprecate them in the next
release?

Anatoly Burakov (10):
  eal: add --no-shared-files option
  eal: make --no-shconf an alias for --no-shared-files
  eal: make --huge-unlink an alias for --no-shared-files
  fbarray: support no-shared-files mode
  mem: add support for no-shared-files mode
  ipc: add support for no-shared-files mode
  eal: add support for no-shared-files for hugepage info
  eal: add support for no-shared-files in hugepage data file
  eal: do not create runtime dir in no-shared-files mode
  mem: enable memfd-based hugepage allocation

 lib/librte_eal/bsdapp/eal/eal.c   |   7 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c|  71 +
 lib/librte_eal/common/eal_common_memory.c |   3 +-
 lib/librte_eal/common/eal_common_options.c|  25 ++--
 lib/librte_eal/common/eal_common_proc.c   |  25 
 lib/librte_eal/common/eal_internal_cfg.h  |   3 +-
 lib/librte_eal/common/eal_options.h   |   7 +-
 lib/librte_eal/linuxapp/eal/eal.c |  18 ++-
 .../linuxapp/eal/eal_hugepage_info.c  | 140 ++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c| 126 +++-
 lib/librte_eal/linuxapp/eal/eal_memfd.h   |  28 
 lib/librte_eal/linuxapp/eal/eal_memory.c  |  19 ++-
 test/test/test_eal_flags.c|  18 +--
 14 files changed, 384 insertions(+), 110 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h

-- 
2.17.0
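
The aliasing proposed in patches 02 and 03 comes down to pointing several long option names at the same option number. A minimal standalone sketch with getopt_long() follows; it is not the EAL parser, and DEMO_NO_SHARED_FILES_NUM and demo_parse_no_shared_files() are hypothetical names.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option number; anything above the char range works. */
enum { DEMO_NO_SHARED_FILES_NUM = 257 };

/*
 * Returns 1 if any spelling of the option is present, 0 if none is,
 * and -1 if an unknown option is encountered.
 */
static int
demo_parse_no_shared_files(int argc, char **argv)
{
	/* three names, one option number: the old switches become aliases */
	static const struct option long_opts[] = {
		{"no-shared-files", no_argument, NULL, DEMO_NO_SHARED_FILES_NUM},
		{"no-shconf",       no_argument, NULL, DEMO_NO_SHARED_FILES_NUM},
		{"huge-unlink",     no_argument, NULL, DEMO_NO_SHARED_FILES_NUM},
		{NULL, 0, NULL, 0}
	};
	int opt;
	int no_shared_files = 0;

	assert(argc >= 1 && argv != NULL);
	optind = 1; /* restart scanning so the function can be called again */
	opterr = 0; /* keep getopt quiet; the caller decides how to report */

	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt == DEMO_NO_SHARED_FILES_NUM)
			no_shared_files = 1;
		else
			return -1;
	}
	return no_shared_files;
}
```

Because all three names resolve to one case label, keeping the old switches around for a deprecation period would cost only the extra table rows.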


[dpdk-dev] [RFC 05/10] mem: add support for no-shared-files mode

2018-05-31 Thread Anatoly Burakov
Unlink hugepages after creating them, to honor the no shared files mode.
We cannot resize non-existing files, so make single file segments
explicitly unsupported.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c  |  9 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 23 +++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 32ca25dc2..7904f813e 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -690,6 +690,15 @@ eal_parse_args(int argc, char **argv)
goto out;
}
 
+   if (internal_config.single_file_segments &&
+   internal_config.no_shared_files) {
+   RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+   "incompatible with --"OPT_NO_SHARED_FILES"\n");
+   eal_usage(prgname);
+   ret = -1;
+   goto out;
+   }
+
if (optind >= 0)
argv[optind-1] = prgname;
ret = optind-1;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c 
b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 8c11f98c9..f57d307dd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -512,6 +512,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
+   if (internal_config.no_shared_files) {
+   if (unlink(path)) {
+   RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: 
%s\n",
+   __func__, strerror(errno));
+   goto resized;
+   }
+   }
}
 
/*
@@ -562,8 +569,11 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
(unsigned int)(alloc_sz >> 20));
goto mapped;
}
-   /* for non-single file segments, we can close fd here */
-   if (!internal_config.single_file_segments)
+   /* for non-single file segments or no shared files mode, we can close fd
+* here
+*/
+   if (!internal_config.single_file_segments ||
+   internal_config.no_shared_files)
close(fd);
 
/* we need to trigger a write to the page to enforce page fault and
@@ -592,7 +602,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
/* ignore failure, can't make it any worse */
} else {
/* only remove file if we can take out a write lock */
-   if (lock(fd, LOCK_EX) == 1)
+   if (internal_config.no_shared_files == 0 &&
+   lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
}
@@ -617,6 +628,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}
 
+   /* if we're in no shared files mode, nothing needs to be done */
+   if (internal_config.no_shared_files) {
+   memset(ms, 0, sizeof(*ms));
+   return 0;
+   }
+
/* if we are not in single file segments mode, we're going to unmap the
 * segment and thus drop the lock on original fd, but hugepage dir is
 * now locked so we can take out another one without races.
-- 
2.17.0
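
The unlink-after-create approach in this patch relies on ordinary POSIX behaviour: removing a file's name does not release its storage while a descriptor (or mapping) still refers to it. The sketch below shows that behaviour with a regular temporary file instead of a hugetlbfs file; demo_unlinked_file_roundtrip() is a hypothetical helper, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file, unlink it right away, then check that the open
 * descriptor is still fully usable. path_template must end in "XXXXXX".
 * Returns 0 on success, -1 on any failure.
 */
static int
demo_unlinked_file_roundtrip(char *path_template)
{
	static const char msg[] = "still here";
	char buf[sizeof(msg)];
	struct stat st;
	int ret = -1;
	int fd;

	assert(path_template != NULL);
	fd = mkstemp(path_template);
	if (fd < 0)
		return -1;

	if (unlink(path_template) != 0)
		goto out;

	/* the name is gone from the filesystem ... */
	if (stat(path_template, &st) == 0)
		goto out;

	/* ... but the descriptor keeps the backing storage alive */
	if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
		goto out;
	if (lseek(fd, 0, SEEK_SET) != 0)
		goto out;
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;

	ret = memcmp(buf, msg, sizeof(msg)) == 0 ? 0 : -1;
out:
	/* closing the last reference is what finally frees the storage */
	close(fd);
	return ret;
}
```

This is also why no other process can attach afterwards: with the name removed, there is nothing left in the filesystem for a secondary process to open.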


[dpdk-dev] [RFC 06/10] ipc: add support for no-shared-files mode

2018-05-31 Thread Anatoly Burakov
IPC is an inter-process communication mechanism. Since no secondaries
can ever be expected to run in no shared files mode, IPC will be
useless, so do not enable it in the first place. In the interests of
API usage convenience, we will still allow registering callbacks, but
obviously they won't ever be triggered.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_proc.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
index 707d8ab30..6cce4e925 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
int dir_fd;
pthread_t mp_handle_tid, async_reply_handle_tid;
 
+   /* in no shared files mode, we do not have secondary processes support,
+* so no need to initialize IPC.
+*/
+   if (internal_config.no_shared_files) {
+   RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be 
disabled\n");
+   return 0;
+   }
+
/* create filter path */
create_socket_path("*", path, sizeof(path));
strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct 
rte_mp_reply *reply,
 
if (check_input(req) == false)
return -1;
+
+   if (internal_config.no_shared_files) {
+   RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is 
disabled\n");
+   return 0;
+   }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const 
struct timespec *ts,
 
if (check_input(req) == false)
return -1;
+
+   if (internal_config.no_shared_files) {
+   RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is 
disabled\n");
+   return 0;
+   }
+
if (gettimeofday(&now, NULL) < 0) {
RTE_LOG(ERR, EAL, "Faile to get current time\n");
rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
return -1;
}
 
+   if (internal_config.no_shared_files) {
+   RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is 
disabled\n");
+   return 0;
+   }
+
return mp_send(msg, peer, MP_REP);
 }
-- 
2.17.0


[dpdk-dev] [RFC 09/10] eal: do not create runtime dir in no-shared-files mode

2018-05-31 Thread Anatoly Burakov
Now that the rest of the EAL is adjusted to not create any shared
files, prevent runtime directory from ever being created.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4dff1804e..3ba2502cc 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
}
 
/* create runtime data directory */
-   if (eal_create_runtime_dir() < 0) {
+   if (internal_config.no_shared_files == 0 &&
+   eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 7904f813e..c0b2b1a5a 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -827,7 +827,8 @@ rte_eal_init(int argc, char **argv)
}
 
/* create runtime data directory */
-   if (eal_create_runtime_dir() < 0) {
+   if (internal_config.no_shared_files == 0 &&
+   eal_create_runtime_dir() < 0) {
rte_eal_init_alert("Cannot create runtime directory\n");
rte_errno = EACCES;
return -1;
-- 
2.17.0


[dpdk-dev] [RFC 02/10] eal: make --no-shconf an alias for --no-shared-files

2018-05-31 Thread Anatoly Burakov
Move all functionality associated with --no-shconf to
--no-shared-files, and make the former an alias for the latter.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/bsdapp/eal/eal.c|  4 ++--
 lib/librte_eal/common/eal_common_memory.c  |  3 ++-
 lib/librte_eal/common/eal_common_options.c |  8 ++--
 lib/librte_eal/common/eal_internal_cfg.h   |  1 -
 lib/librte_eal/common/eal_options.h|  2 +-
 lib/librte_eal/linuxapp/eal/eal.c  |  6 +++---
 test/test/test_eal_flags.c | 18 +-
 7 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..4dff1804e 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -222,7 +222,7 @@ rte_eal_config_create(void)
 
const char *pathname = eal_runtime_config_path();
 
-   if (internal_config.no_shconf)
+   if (internal_config.no_shared_files)
return;
 
if (mem_cfg_fd < 0){
@@ -261,7 +261,7 @@ rte_eal_config_attach(void)
void *rte_mem_cfg_addr;
const char *pathname = eal_runtime_config_path();
 
-   if (internal_config.no_shconf)
+   if (internal_config.no_shared_files)
return;
 
if (mem_cfg_fd < 0){
diff --git a/lib/librte_eal/common/eal_common_memory.c 
b/lib/librte_eal/common/eal_common_memory.c
index 4f0688f9d..a9c4b9b68 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -938,7 +938,8 @@ rte_eal_memory_init(void)
if (retval < 0)
goto fail;
 
-   if (internal_config.no_shconf == 0 && rte_eal_memdevice_init() < 0)
+   if (internal_config.no_shared_files == 0 &&
+   rte_eal_memdevice_init() < 0)
goto fail;
 
return 0;
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 38df094de..0f3eb928a 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -65,7 +65,7 @@ eal_long_options[] = {
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
-   {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
+   {OPT_NO_SHCONF, 0, NULL, OPT_NO_SHARED_FILES_NUM  },
{OPT_NO_SHARED_FILES,   0, NULL, OPT_NO_SHARED_FILES_NUM  },
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM},
@@ -1162,10 +1162,6 @@ eal_parse_common_option(int opt, const char *optarg,
conf->vmware_tsc_map = 1;
break;
 
-   case OPT_NO_SHCONF_NUM:
-   conf->no_shconf = 1;
-   break;
-
case OPT_NO_SHARED_FILES_NUM:
conf->no_shared_files = 1;
break;
@@ -1382,6 +1378,6 @@ eal_common_usage(void)
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
-  "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
+  "  --"OPT_NO_SHCONF" Deprecated. Alias for 
--no-shared-files\n"
   "\n", RTE_MAX_LCORE);
 }
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index 3fc71bb49..d80bacd4d 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -40,7 +40,6 @@ struct internal_config {
volatile unsigned no_hpet;/**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping

* instead of native TSC */
-   volatile unsigned no_shconf;  /**< true if there is no shared 
config */
volatile unsigned no_shared_files; /**< true if there are no shared 
files to be created*/
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices 
*/
volatile enum rte_proc_type_t process_type; /**< multi-process proc 
type */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index b0d9d6819..6890d4114 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -43,8 +43,8 @@ enum {
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI"no-pci"
OPT_NO_PCI_NUM,
+/* no-shconf is an alias for no-shared-files */
 #define OPT_NO_SHCONF "no-shconf"
-   OPT_NO_SHCONF_NUM,
 #define OPT_NO_SHARED_FILES   "no-shared-files"
OPT_NO_SHARED_FILES_NUM,
 #define OPT_SOCKET_MEM"socket-mem"
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_e
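
For illustration only (not part of the patch): the aliasing above works because two long-option names can share one option id, so a single case in the parser serves both spellings. Below is a minimal standalone sketch using getopt_long(); the id value and the helper name parse_no_shared_files() are assumptions made for this example, not DPDK symbols.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option id; stands in for OPT_NO_SHARED_FILES_NUM. */
enum { OPT_NO_SHARED_FILES_ID = 256 };

/* Both spellings map to the same id, so one branch handles both. */
static const struct option long_opts[] = {
	{"no-shconf",       no_argument, NULL, OPT_NO_SHARED_FILES_ID},
	{"no-shared-files", no_argument, NULL, OPT_NO_SHARED_FILES_ID},
	{NULL, 0, NULL, 0},
};

/* Returns 1 if either spelling was given on the command line, else 0. */
static int
parse_no_shared_files(int argc, char **argv)
{
	int opt, no_shared_files = 0;

	optind = 1; /* restart scanning so the function can be called again */
	opterr = 0; /* keep getopt quiet about unknown options */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt == OPT_NO_SHARED_FILES_ID)
			no_shared_files = 1;
	}
	return no_shared_files;
}
```

With this layout the deprecated name needs no enum entry or config field of its own, which is what the hunks above remove.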

[dpdk-dev] [RFC 07/10] eal: add support for no-shared-files for hugepage info

2018-05-31 Thread Anatoly Burakov
Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c   | 4 
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c 
b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..4b2f71c7e 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
hpi->num_pages[0] = num_buffers;
hpi->lock_descriptor = fd;
 
+   /* for no shared files mode, do not create shared memory config */
+   if (internal_config.no_shared_files)
+   return 0;
+
tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
sizeof(internal_config.hugepage_info));
if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..02b1c4ff1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
if (hugepage_info_init() < 0)
return -1;
 
+   /* for no shared files mode, we're done */
+   if (internal_config.no_shared_files)
+   return 0;
+
hpi = &internal_config.hugepage_info[0];
 
tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
-- 
2.17.0


[dpdk-dev] [RFC 08/10] eal: add support for no-shared-files in hugepage data file

2018-05-31 Thread Anatoly Burakov
Do not create a shared hugepage data file if we were asked to
not create any shared files.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5e1810712..d7b43b5c1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
void *retval;
-   int fd = open(filename, O_CREAT | O_RDWR, 0666);
+   int fd;
+
+   /* if no shared files mode is used, create anonymous memory instead */
+   if (internal_config.no_shared_files) {
+   retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+   if (retval == MAP_FAILED)
+   return NULL;
+   return retval;
+   }
+
+   fd = open(filename, O_CREAT | O_RDWR, 0666);
if (fd < 0)
return NULL;
if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.0
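
For illustration only (not part of the patch): a standalone sketch of the same fallback, where anonymous private memory replaces the file-backed mapping when no shared files may be created. The function name create_memory() and the explicit no_shared_files parameter are assumptions for this example; the patch reads the flag from internal_config instead.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns a zero-filled, writable region of mem_size bytes, or NULL. */
static void *
create_memory(const char *filename, size_t mem_size, int no_shared_files)
{
	void *addr;
	int fd;

	if (no_shared_files) {
		/* nothing touches the filesystem on this path */
		addr = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return addr == MAP_FAILED ? NULL : addr;
	}

	/* file-backed path: other processes can map the same file */
	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the fd is closed */
	return addr == MAP_FAILED ? NULL : addr;
}
```

Since the anonymous mapping is private, a secondary process cannot attach to it, which matches the stated expectation that multiprocess is unavailable in this mode.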


[dpdk-dev] [RFC 04/10] fbarray: support no-shared-files mode

2018-05-31 Thread Anatoly Burakov
When using --no-shared-files option, the expectation is that no
multiprocess will be supported as no shared files are created. However,
fbarray still creates some shared files that prevent multiple processes
with the same prefix from starting.

Fix this by not creating shared files whenever the --no-shared-files option is
specified. Since virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_fbarray.c | 71 +-
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c 
b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..69576c8a8 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -434,39 +434,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char 
*name, unsigned int len,
if (data == NULL)
goto fail;
 
-   eal_get_fbarray_path(path, sizeof(path), name);
+   if (internal_config.no_shared_files) {
+   /* remap virtual area as writable */
+   void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+   if (new_data == MAP_FAILED) {
+   RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous 
memory: %s\n",
+   __func__, strerror(errno));
+   goto fail;
+   }
+   } else {
+   eal_get_fbarray_path(path, sizeof(path), name);
 
-   /*
-* Each fbarray is unique to process namespace, i.e. the filename
-* depends on process prefix. Try to take out a lock and see if we
-* succeed. If we don't, someone else is using it already.
-*/
-   fd = open(path, O_CREAT | O_RDWR, 0600);
-   if (fd < 0) {
-   RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
-   path, strerror(errno));
-   rte_errno = errno;
-   goto fail;
-   } else if (flock(fd, LOCK_EX | LOCK_NB)) {
-   RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
-   path, strerror(errno));
-   rte_errno = EBUSY;
-   goto fail;
-   }
+   /*
+* Each fbarray is unique to process namespace, i.e. the
+* filename depends on process prefix. Try to take out a lock
+* and see if we succeed. If we don't, someone else is using it
+* already.
+*/
+   fd = open(path, O_CREAT | O_RDWR, 0600);
+   if (fd < 0) {
+   RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+   __func__, path, strerror(errno));
+   rte_errno = errno;
+   goto fail;
+   } else if (flock(fd, LOCK_EX | LOCK_NB)) {
+   RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+   __func__, path, strerror(errno));
+   rte_errno = EBUSY;
+   goto fail;
+   }
 
-   /* take out a non-exclusive lock, so that other processes could still
-* attach to it, but no other process could reinitialize it.
-*/
-   if (flock(fd, LOCK_SH | LOCK_NB)) {
-   rte_errno = errno;
-   goto fail;
-   }
+   /* take out a non-exclusive lock, so that other processes could
+* still attach to it, but no other process could reinitialize
+* it.
+*/
+   if (flock(fd, LOCK_SH | LOCK_NB)) {
+   rte_errno = errno;
+   goto fail;
+   }
 
-   if (resize_and_map(fd, data, mmap_len))
-   goto fail;
+   if (resize_and_map(fd, data, mmap_len))
+   goto fail;
 
-   /* we've mmap'ed the file, we can now close the fd */
-   close(fd);
+   /* we've mmap'ed the file, we can now close the fd */
+   close(fd);
+   }
 
/* initialize the data */
memset(data, 0, mmap_len);
-- 
2.17.0
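
For illustration only (not part of the patch): a standalone sketch of the remap step, where a read-only reservation is replaced in place by a writable anonymous mapping via MAP_FIXED. reserve_area() is a hypothetical stand-in for eal_get_virtual_area(), and remap_writable() is a name chosen for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for eal_get_virtual_area(): read-only reservation. */
static void *
reserve_area(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Replace the mapping at addr with a writable anonymous one.
 * MAP_FIXED keeps the address, so pointers into the area stay valid.
 */
static int
remap_writable(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

Without the remap, the memset() that follows in rte_fbarray_init() would fault on the read-only area.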


[dpdk-dev] [RFC 10/10] mem: enable memfd-based hugepage allocation

2018-05-31 Thread Anatoly Burakov
This will supplant no-shared-files mode to use memfd-based hugetlbfs
allocation instead of hugetlbfs mounts. Due to memfd only being
supported kernel 4.14+ and glibc 2.27+, a compile-time check is
performed along with runtime checks.

Signed-off-by: Anatoly Burakov 
---
 .../linuxapp/eal/eal_hugepage_info.c  | 136 ++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c| 105 +-
 lib/librte_eal/linuxapp/eal/eal_memfd.h   |  28 
 lib/librte_eal/linuxapp/eal/eal_memory.c  |   4 +-
 4 files changed, 234 insertions(+), 39 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memfd.h

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 02b1c4ff1..1a80ee0ee 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -30,6 +30,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_hugepages.h"
 #include "eal_filesystem.h"
+#include "eal_memfd.h"
 
 static const char sys_dir_path[] = "/sys/kernel/mm/hugepages";
 static const char sys_pages_numa_dir_path[] = "/sys/devices/system/node";
@@ -313,11 +314,85 @@ compare_hpi(const void *a, const void *b)
return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
 }
 
+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+   uint64_t total_pages = 0;
+   unsigned int i;
+
+   /*
+* first, try to put all hugepages into relevant sockets, but
+* if first attempts fails, fall back to collecting all pages
+* in one socket and sorting them later
+*/
+   total_pages = 0;
+   /* we also don't want to do this for legacy init */
+   if (!internal_config.legacy_mem)
+   for (i = 0; i < rte_socket_count(); i++) {
+   int socket = rte_socket_id_by_idx(i);
+   unsigned int num_pages =
+   get_num_hugepages_on_node(
+   dirent->d_name, socket);
+   hpi->num_pages[socket] = num_pages;
+   total_pages += num_pages;
+   }
+   /*
+* we failed to sort memory from the get go, so fall
+* back to old way
+*/
+   if (total_pages == 0) {
+   hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+   /* for 32-bit systems, limit number of hugepages to
+* 1GB per page size */
+   hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+   RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+   }
+}
+
+static int
+check_memfd_pagesize_supported(uint64_t page_sz)
+{
+#ifdef MEMFD_SUPPORTED
+   int sz_flag, fd;
+
+   /* first, check if this particular pagesize is supported */
+   sz_flag = eal_memalloc_get_memfd_pagesize_flag(page_sz);
+   if (sz_flag == 0) {
+   RTE_LOG(ERR, EAL, "Unexpected memfd hugepage size: %"
+   PRIu64" bytes\n", page_sz);
+   return 0;
+   }
+
+   /* does currently running kernel support it? */
+   fd = memfd_create("memfd_test", sz_flag | MFD_HUGETLB);
+   if (fd >= 0) {
+   /* success */
+   close(fd);
+   return 1;
+   }
+   /* creating memfd failed, but if the error wasn't EINVAL, reserving of
+* hugepages via memfd is supported by the kernel
+*/
+   if (errno != EINVAL) {
+   return 1;
+   }
+   RTE_LOG(DEBUG, EAL, "Kernel does not support memfd hugepages of size %"
+   PRIu64" bytes\n", page_sz);
+#else
+   RTE_LOG(DEBUG, EAL, "Memfd hugepage support not enabled at compile 
time\n");
+   RTE_SET_USED(page_sz);
+#endif
+   return 0;
+}
+
 static int
 hugepage_info_init(void)
 {  const char dirent_start_text[] = "hugepages-";
const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
-   unsigned int i, total_pages, num_sizes = 0;
+   unsigned int i, num_sizes = 0;
DIR *dir;
struct dirent *dirent;
 
@@ -343,6 +418,10 @@ hugepage_info_init(void)
hpi->hugepage_sz =
rte_str_to_size(&dirent->d_name[dirent_start_len]);
 
+   /* by default, memfd_hugepage_supported is 1 */
+   memfd_hugepage_supported &=
+   check_memfd_pagesize_supported(hpi->hugepage_sz);
+
/* first, check if we have a mountpoint */
if (get_hugepage_dir(hpi->hugepage_sz,
hpi->hugedir, sizeof(hpi->hugedir)) < 0) {
@@ -355,6 +434,23 @@ hugepage_info_init(void)
"%" PRIu64 " reserved, but no mounted "
"hugetlbfs found for that size\n",
num_pages, hpi->hugepage_sz);
+
+   

[dpdk-dev] [PATCH] app/testpmd: fix missing count action fields

2018-05-31 Thread Nelio Laranjeiro
The COUNT action has been modified and now has several fields that are not
addressable through testpmd. In addition, as those fields cannot be set,
testpmd provides an empty configuration, which is undefined.

Fixes: fb8fd96d4251 ("ethdev: add shared counter to flow API")
Cc: declan.dohe...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Nelio Laranjeiro 
---
 app/test-pmd/cmdline_flow.c  | 29 +++--
 lib/librte_ethdev/rte_flow.c |  2 +-
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 9918d7fda..934cf7e90 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -194,6 +194,8 @@ enum index {
ACTION_QUEUE_INDEX,
ACTION_DROP,
ACTION_COUNT,
+   ACTION_COUNT_SHARED,
+   ACTION_COUNT_ID,
ACTION_RSS,
ACTION_RSS_FUNC,
ACTION_RSS_LEVEL,
@@ -788,6 +790,13 @@ static const enum index action_queue[] = {
ZERO,
 };
 
+static const enum index action_count[] = {
+   ACTION_COUNT_ID,
+   ACTION_COUNT_SHARED,
+   ACTION_NEXT,
+   ZERO,
+};
+
 static const enum index action_rss[] = {
ACTION_RSS_FUNC,
ACTION_RSS_LEVEL,
@@ -2022,10 +2031,26 @@ static const struct token token_list[] = {
[ACTION_COUNT] = {
.name = "count",
.help = "enable counters for this rule",
-   .priv = PRIV_ACTION(COUNT, 0),
-   .next = NEXT(NEXT_ENTRY(ACTION_NEXT)),
+   .priv = PRIV_ACTION(COUNT,
+   sizeof(struct rte_flow_action_count)),
+   .next = NEXT(action_count),
.call = parse_vc,
},
+   [ACTION_COUNT_ID] = {
+   .name = "identifier",
+   .help = "counter identifier to use",
+   .next = NEXT(action_count, NEXT_ENTRY(UNSIGNED)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_action_count, id)),
+   .call = parse_vc_conf,
+   },
+   [ACTION_COUNT_SHARED] = {
+   .name = "shared",
+   .help = "shared counter",
+   .next = NEXT(action_count, NEXT_ENTRY(BOOLEAN)),
+   .args = ARGS(ARGS_ENTRY_BF(struct rte_flow_action_count,
+  shared, 1)),
+   .call = parse_vc_conf,
+   },
[ACTION_RSS] = {
.name = "rss",
.help = "spread packets among several queues",
diff --git a/lib/librte_ethdev/rte_flow.c b/lib/librte_ethdev/rte_flow.c
index b2afba089..2e87e59f3 100644
--- a/lib/librte_ethdev/rte_flow.c
+++ b/lib/librte_ethdev/rte_flow.c
@@ -84,7 +84,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] 
= {
MK_FLOW_ACTION(FLAG, 0),
MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
MK_FLOW_ACTION(DROP, 0),
-   MK_FLOW_ACTION(COUNT, 0),
+   MK_FLOW_ACTION(COUNT, sizeof(struct rte_flow_action_count)),
MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)),
MK_FLOW_ACTION(PF, 0),
MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
-- 
2.17.1



[dpdk-dev] [RFC 03/10] eal: make --huge-unlink an alias for --no-shared-files

2018-05-31 Thread Anatoly Burakov
Move all functionality associated with --huge-unlink command-line
option to --no-shared-files, and make it an alias. Since the new
command-line option does things other than just unlinking hugepage
files after they've been created, it is no longer incompatible with
--no-huge option, so remove that check as well.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_options.c | 14 ++
 lib/librte_eal/common/eal_internal_cfg.h   |  1 -
 lib/librte_eal/common/eal_options.h|  5 ++---
 lib/librte_eal/linuxapp/eal/eal_memory.c   |  2 +-
 4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 0f3eb928a..63e562bdb 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -57,7 +57,7 @@ eal_long_options[] = {
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR,  1, NULL, OPT_HUGE_DIR_NUM },
-   {OPT_HUGE_UNLINK,   0, NULL, OPT_HUGE_UNLINK_NUM  },
+   {OPT_HUGE_UNLINK,   0, NULL, OPT_NO_SHARED_FILES_NUM  },
{OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
{OPT_MASTER_LCORE,  1, NULL, OPT_MASTER_LCORE_NUM },
@@ -1140,10 +1140,6 @@ eal_parse_common_option(int opt, const char *optarg,
break;
 
/* long options */
-   case OPT_HUGE_UNLINK_NUM:
-   conf->hugepage_unlink = 1;
-   break;
-
case OPT_NO_HUGE_NUM:
conf->no_hugetlbfs = 1;
/* no-huge is legacy mem */
@@ -1318,12 +1314,6 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
return -1;
}
 
-   if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
-   RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
-   "be specified together with --"OPT_NO_HUGE"\n");
-   return -1;
-   }
-
return 0;
 }
 
@@ -1374,7 +1364,7 @@ eal_common_usage(void)
   "  --"OPT_NO_SHARED_FILES"   Do not create any shared files 
(config, hugetlbfs, etc.).\n"
   "  This disables secondary process support\n"
   "\nEAL options for DEBUG use only:\n"
-  "  --"OPT_HUGE_UNLINK"   Unlink hugepage files after init\n"
+  "  --"OPT_HUGE_UNLINK"   Deprecated. Alias for 
--no-shared-files\n"
   "  --"OPT_NO_HUGE"   Use malloc instead of hugetlbfs\n"
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index d80bacd4d..887a6a8e2 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -35,7 +35,6 @@ struct internal_config {
volatile unsigned force_nchannel; /**< force number of channels */
volatile unsigned force_nrank;/**< force number of ranks */
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
-   unsigned hugepage_unlink; /**< true to unlink backing files */
volatile unsigned no_pci; /**< true to disable PCI */
volatile unsigned no_hpet;/**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index 6890d4114..aef696c92 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -25,8 +25,6 @@ enum {
OPT_FILE_PREFIX_NUM,
 #define OPT_HUGE_DIR  "huge-dir"
OPT_HUGE_DIR_NUM,
-#define OPT_HUGE_UNLINK   "huge-unlink"
-   OPT_HUGE_UNLINK_NUM,
 #define OPT_LCORES"lcores"
OPT_LCORES_NUM,
 #define OPT_LOG_LEVEL "log-level"
@@ -43,7 +41,8 @@ enum {
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI"no-pci"
OPT_NO_PCI_NUM,
-/* no-shconf is an alias for no-shared-files */
+/* huge-unlink and no-shconf are alias for no-shared-files */
+#define OPT_HUGE_UNLINK   "huge-unlink"
 #define OPT_NO_SHCONF "no-shconf"
 #define OPT_NO_SHARED_FILES   "no-shared-files"
OPT_NO_SHARED_FILES_NUM,
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..5e1810712 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1547,7 +1547,7 @@ eal_legacy_hugepage_init(void)
}
 
/* free the hugepage backing files */
-   if (internal_config.hugepage_unlink &&
+   if (internal_config.no_shared_files &&
u

[dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add

2018-05-31 Thread Radu Nicolau
Add a call to rte_eth_link_get_nowait on every slave to update
the internal link status struct. Otherwise slave add will fail
for mode 4 if the ports are all stopped but only one of them checked.

Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52

Signed-off-by: Radu Nicolau 
---
v2: add fix and Bugzilla references

 drivers/net/bonding/rte_eth_bond_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index d558df8..cad08b9 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, 
uint16_t slave_port_id)
return -1;
}
 
+   rte_eth_link_get_nowait(slave_port_id, &link_props);
+
slave_add(internals, slave_eth_dev);
 
/* We need to store slaves reta_size to be able to synchronize RETA for 
all
-- 
2.7.5



Re: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete expired entries

2018-05-31 Thread Alex Kiselev
Hi Konstantin.

> Hi Alex,

>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Alex Kiselev
>> Sent: Wednesday, May 16, 2018 12:04 PM
>> To: dev@dpdk.org; Burakov, Anatoly 
>> Subject: [dpdk-dev] [PATCH 1/2] librte_ip_frag: add function to delete 
>> expired entries

>> Add a new function, rte_frag_table_del_expired_entries(),
>> that scans the list of recently used packets and deletes the expired ones.

>> A fragmented packet is supposed to live no longer than max_cycles,
>> but the lib deletes an expired packet only occasionally when it scans
>> a bucket to find an empty slot while adding a new packet.
>> Therefore a fragment might sit in the table forever.

>> Signed-off-by: Alex Kiselev 
>> ---
>>  lib/librte_ip_frag/ip_frag_common.h| 18 
>>  lib/librte_ip_frag/ip_frag_internal.c  | 18 
>>  lib/librte_ip_frag/rte_ip_frag.h   | 19 +++-
>>  lib/librte_ip_frag/rte_ip_frag_common.c| 46 
>> ++
>>  lib/librte_ip_frag/rte_ip_frag_version.map |  6 
>>  5 files changed, 88 insertions(+), 19 deletions(-)

>> diff --git a/lib/librte_ip_frag/ip_frag_common.h 
>> b/lib/librte_ip_frag/ip_frag_common.h
>> index 197acf8d8..0fdcc7d0f 100644
>> --- a/lib/librte_ip_frag/ip_frag_common.h
>> +++ b/lib/librte_ip_frag/ip_frag_common.h
>> @@ -25,6 +25,12 @@
>>  #define IPv6_KEY_BYTES_FMT \
>>   "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64

>> +#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT
>> +#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v))
>> +#else
>> +#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0)
>> +#endif /* IP_FRAG_TBL_STAT */
>> +
>>  /* internal functions declarations */
>>  struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp,
>>   struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
>> @@ -149,4 +155,16 @@ ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms)
>>   fp->frags[IP_FIRST_FRAG_IDX] = zero_frag;
>>  }

>> +/* local frag table helper functions */
>> +static inline void
>> +ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
>> *dr,
>> + struct ip_frag_pkt *fp)
>> +{
>> + ip_frag_free(fp, dr);
>> + ip_frag_key_invalidate(&fp->key);
>> + TAILQ_REMOVE(&tbl->lru, fp, lru);
>> + tbl->use_entries--;
>> + IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1);
>> +}
>> +
>>  #endif /* _IP_FRAG_COMMON_H_ */
>> diff --git a/lib/librte_ip_frag/ip_frag_internal.c 
>> b/lib/librte_ip_frag/ip_frag_internal.c
>> index 2560c7713..97470a872 100644
>> --- a/lib/librte_ip_frag/ip_frag_internal.c
>> +++ b/lib/librte_ip_frag/ip_frag_internal.c
>> @@ -14,24 +14,6 @@
>>  #define  IP_FRAG_TBL_POS(tbl, sig)   \
>>   ((tbl)->pkt + ((sig) & (tbl)->entry_mask))

>> -#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT
>> -#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)((s)->f += (v))
>> -#else
>> -#define  IP_FRAG_TBL_STAT_UPDATE(s, f, v)do {} while (0)
>> -#endif /* IP_FRAG_TBL_STAT */
>> -
>> -/* local frag table helper functions */
>> -static inline void
>> -ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row 
>> *dr,
>> - struct ip_frag_pkt *fp)
>> -{
>> - ip_frag_free(fp, dr);
>> - ip_frag_key_invalidate(&fp->key);
>> - TAILQ_REMOVE(&tbl->lru, fp, lru);
>> - tbl->use_entries--;
>> - IP_FRAG_TBL_STAT_UPDATE(&tbl->stat, del_num, 1);
>> -}
>> -
>>  static inline void
>>  ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl,  struct ip_frag_pkt *fp,
>>   const struct ip_frag_key *key, uint64_t tms)
>> diff --git a/lib/librte_ip_frag/rte_ip_frag.h 
>> b/lib/librte_ip_frag/rte_ip_frag.h
>> index b3f3f78df..3c694df92 100644
>> --- a/lib/librte_ip_frag/rte_ip_frag.h
>> +++ b/lib/librte_ip_frag/rte_ip_frag.h
>> @@ -65,10 +65,13 @@ struct ip_frag_pkt {

>>  #define IP_FRAG_DEATH_ROW_LEN 32 /**< death row size (in packets) */

>> +/* death row size in mbufs */
>> +#define IP_FRAG_DEATH_ROW_MBUF_LEN (IP_FRAG_DEATH_ROW_LEN * 
>> (IP_MAX_FRAG_NUM + 1))
>> +
>>  /** mbuf death row (packets to be freed) */
>>  struct rte_ip_frag_death_row {
>>   uint32_t cnt;  /**< number of mbufs currently on death row */
>> - struct rte_mbuf *row[IP_FRAG_DEATH_ROW_LEN * (IP_MAX_FRAG_NUM + 1)];
>> + struct rte_mbuf *row[IP_FRAG_DEATH_ROW_MBUF_LEN];
>>   /**< mbufs to be freed */
>>  };

>> @@ -325,6 +328,20 @@ void rte_ip_frag_free_death_row(struct 
>> rte_ip_frag_death_row *dr,
>>  void
>>  rte_ip_frag_table_statistics_dump(FILE * f, const struct rte_ip_frag_tbl 
>> *tbl);

>> +/**
>> + * Delete expired fragments
>> + *
>> + * @param tbl
>> + *   Table to delete expired fragments from
>> + * @param dr
>> + *   Death row to free buffers to
>> + * @param tms
>> + *   Current timestamp
>> + */
>> +void __rte_experimental
>> +rte_frag_table_del_expired_entries(struct rte_ip_frag_tbl *tbl,
>> + struct rte_ip_frag_death_row *dr, uint64_t tms);
>> +
>>  #ifdef __cplusp
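
For illustration only (not taken from the patch): a standalone sketch of the expiry scan described above. It assumes the LRU list is ordered oldest first, so the scan can stop at the first entry that is still alive. The types and names here (frag_entry, frag_lru, del_expired) are hypothetical and much simpler than the real rte_ip_frag_tbl.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified table entry: only the fields the scan needs. */
struct frag_entry {
	TAILQ_ENTRY(frag_entry) lru; /* linkage in the LRU list */
	uint64_t start;              /* timestamp of creation */
};
TAILQ_HEAD(frag_lru, frag_entry);

/* Append a new entry at the tail (most recently used end). */
static int
lru_add(struct frag_lru *head, uint64_t start)
{
	struct frag_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -1;
	e->start = start;
	TAILQ_INSERT_TAIL(head, e, lru);
	return 0;
}

/*
 * Delete every entry older than max_cycles at time now.
 * Assumes oldest entries sit at the head, so the loop stops at the
 * first live entry. Returns the number of entries removed.
 */
static unsigned int
del_expired(struct frag_lru *head, uint64_t max_cycles, uint64_t now)
{
	struct frag_entry *e;
	unsigned int n = 0;

	while ((e = TAILQ_FIRST(head)) != NULL &&
			e->start + max_cycles < now) {
		TAILQ_REMOVE(head, e, lru);
		free(e);
		n++;
	}
	return n;
}
```

Calling such a function periodically bounds how long a stale fragment can stay in the table, instead of relying on a later insert happening to scan the same bucket.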

Re: [dpdk-dev] [PATCH v1 01/24] net/ena: update ena_com to the newer version

2018-05-31 Thread Ferruh Yigit
On 5/9/2018 1:45 PM, Michal Krawczyk wrote:
> ena_com is the HAL provided by the vendor and it shouldn't be modified
> by the driver developers.
> 
> The PMD and platform file was adjusted for the new version of the
> ena_com:
> * Do not use deprecated meta descriptor fields
> * Add empty AENQ handler structure with unimplemented handlers
> * Add memzone allocations count to ena_ethdev.c file - it was
>   removed from ena_com.c file
> * Add new macros used in new ena_com files
> * Use error code ENA_COM_UNSUPPORTED instead of ENA_COM_PERMISSION
> 
> Signed-off-by: Michal Krawczyk 
> Signed-off-by: Rafal Kozik 

Hi Michał, Marcin, Guy, Evgeny,

Can you please send a new version rebased on top of latest next-net master?

The patchset has conflicts. They are not hard to resolve, but some of them are
related to the removed offload checks and invalidate the patch, so it is better
for you to decide how to handle them.

Thanks,
ferruh


[dpdk-dev] [PATCH] hash: validate hash bucket entries while compiling

2018-05-31 Thread Honnappa Nagarahalli
Validate RTE_HASH_BUCKET_ENTRIES during compilation instead of
run time.

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
---
 lib/librte_eal/common/include/rte_common.h | 5 +
 lib/librte_hash/rte_cuckoo_hash.c  | 1 -
 lib/librte_hash/rte_cuckoo_hash.h  | 4 
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_common.h 
b/lib/librte_eal/common/include/rte_common.h
index 434adfd45..a9df7c161 100644
--- a/lib/librte_eal/common/include/rte_common.h
+++ b/lib/librte_eal/common/include/rte_common.h
@@ -293,6 +293,11 @@ rte_combine64ms1b(register uint64_t v)
 
 /*** Macros to work with powers of 2 /
 
+/**
+ * Macro to return 1 if n is a power of 2, 0 otherwise
+ */
+#define RTE_IS_POWER_OF_2(n) ((n) && !(((n) - 1) & (n)))
+
 /**
  * Returns true if n is a power of 2
  * @param n
diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index a07543a29..375e7d208 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -107,7 +107,6 @@ rte_hash_create(const struct rte_hash_parameters *params)
/* Check for valid parameters */
if ((params->entries > RTE_HASH_ENTRIES_MAX) ||
(params->entries < RTE_HASH_BUCKET_ENTRIES) ||
-   !rte_is_power_of_2(RTE_HASH_BUCKET_ENTRIES) ||
(params->key_len == 0)) {
rte_errno = EINVAL;
RTE_LOG(ERR, HASH, "rte_hash_create has invalid parameters\n");
diff --git a/lib/librte_hash/rte_cuckoo_hash.h 
b/lib/librte_hash/rte_cuckoo_hash.h
index 7a54e5557..bd6ad1bd6 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -97,6 +97,10 @@ enum add_key_case {
 /** Number of items per bucket. */
 #define RTE_HASH_BUCKET_ENTRIES8
 
+#if !RTE_IS_POWER_OF_2(RTE_HASH_BUCKET_ENTRIES)
+#error RTE_HASH_BUCKET_ENTRIES must be a power of 2
+#endif
+
 #define NULL_SIGNATURE 0
 
 #define EMPTY_SLOT 0
-- 
2.14.1
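
For illustration only (not part of the patch): a standalone sketch of the compile-time check. The macro is written without any function call so the preprocessor can evaluate it inside #if. IS_POWER_OF_2 and BUCKET_ENTRIES are names chosen for this example and mirror RTE_IS_POWER_OF_2 and RTE_HASH_BUCKET_ENTRIES.

```c
/* 1 if n is a power of 2, 0 otherwise (0 itself is rejected). */
#define IS_POWER_OF_2(n) ((n) && !(((n) - 1) & (n)))

/* Example constant; stands in for RTE_HASH_BUCKET_ENTRIES. */
#define BUCKET_ENTRIES 8

/* A bad value now stops the build instead of failing at run time. */
#if !IS_POWER_OF_2(BUCKET_ENTRIES)
#error BUCKET_ENTRIES must be a power of 2
#endif
```

An inline function such as rte_is_power_of_2() cannot be used here, because #if only accepts integer constant expressions built from macros and literals.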



Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add

2018-05-31 Thread Ferruh Yigit
On 5/31/2018 3:34 PM, Radu Nicolau wrote:

I can see you just prefixed "fix" to the title without updating it :)

What about the following one:
"net/bonding: fix slave add for mode 4" ?

> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.

What is the link-related expectation from slaves in mode 4?

What does "if the ports are all stopped but only one of them checked" mean? Why
check only one of them?

> 
> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52
> 
> Signed-off-by: Radu Nicolau 
> ---
> v2: add fix and Bugzilla references
> 
>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
> b/drivers/net/bonding/rte_eth_bond_api.c
> index d558df8..cad08b9 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, 
> uint16_t slave_port_id)
>   return -1;
>   }
>  
> + rte_eth_link_get_nowait(slave_port_id, &link_props);
> +

The error seems to be in link_properties_valid(); does it make sense to get the
link info inside that function, before the link checks?

>   slave_add(internals, slave_eth_dev);
>  
>   /* We need to store slaves reta_size to be able to synchronize RETA for 
> all
> 



[dpdk-dev] [PATCH] eal: move runtime config file to new location

2018-05-31 Thread Anatoly Burakov
As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice.

[1] http://dpdk.org/dev/patchwork/patch/40418/

Signed-off-by: Anatoly Burakov 
---
 doc/guides/rel_notes/deprecation.rst   | 10 --
 lib/librte_eal/common/eal_filesystem.h | 10 +++---
 2 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 1ce692eac..ff15baa3f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,16 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 ---
 
-* eal: DPDK runtime configuration file (located at
-  ``/var/run/._config``) will be moved. The new path will be as 
follows:
-
-  - if DPDK is running as root, path will be set to
-``/var/run/dpdk//config``
-  - if DPDK is not running as root and $XDG_RUNTIME_DIR is set, path will be 
set
-to ``$XDG_RUNTIME_DIR/dpdk//config``
-  - if DPDK is not running as root and $XDG_RUNTIME_DIR is not set, path will 
be
-set to ``/tmp/dpdk//config``
-
 * eal: both declaring and identifying devices will be streamlined in v18.08.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/lib/librte_eal/common/eal_filesystem.h 
b/lib/librte_eal/common/eal_filesystem.h
index 364f38d13..de05febf4 100644
--- a/lib/librte_eal/common/eal_filesystem.h
+++ b/lib/librte_eal/common/eal_filesystem.h
@@ -12,7 +12,6 @@
 #define EAL_FILESYSTEM_H
 
 /** Path of rte config file. */
-#define RUNTIME_CONFIG_FMT "%s/.%s_config"
 
 #include 
 #include 
@@ -30,17 +29,14 @@ eal_create_runtime_dir(void);
 const char *
 eal_get_runtime_dir(void);
 
+#define RUNTIME_CONFIG_FNAME "config"
 static inline const char *
 eal_runtime_config_path(void)
 {
static char buffer[PATH_MAX]; /* static so auto-zeroed */
-   const char *directory = "/var/run";
-   const char *home_dir = getenv("HOME");
 
-   if (getuid() != 0 && home_dir != NULL)
-   directory = home_dir;
-   snprintf(buffer, sizeof(buffer) - 1, RUNTIME_CONFIG_FMT, directory,
-   internal_config.hugefile_prefix);
+   snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(),
+   RUNTIME_CONFIG_FNAME);
return buffer;
 }
 
-- 
2.17.0
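
The path selection spelled out in the removed deprecation notice can be
sketched as below. This is only an illustration, not the EAL code:
sketch_runtime_config_path() and its parameters are hypothetical names, and
the per-instance directory component is assumed to be the hugefile prefix
that the removed code used. In the patch itself the directory comes from
eal_get_runtime_dir().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of EAL: builds "<base>/dpdk/<prefix>/config".
 * The base directory is chosen the way the deprecation notice describes.
 */
static int
sketch_runtime_config_path(char *buf, size_t len, int is_root,
		const char *xdg_runtime_dir, const char *prefix)
{
	const char *base;
	int n;

	assert(buf != NULL && prefix != NULL);

	if (is_root)
		base = "/var/run";          /* running as root */
	else if (xdg_runtime_dir != NULL)
		base = xdg_runtime_dir;     /* non-root, $XDG_RUNTIME_DIR set */
	else
		base = "/tmp";              /* non-root, no $XDG_RUNTIME_DIR */

	n = snprintf(buf, len, "%s/dpdk/%s/config", base, prefix);
	/* report truncation instead of silently returning a cut-off path */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass getuid() == 0 and getenv("XDG_RUNTIME_DIR") for the two
middle arguments; they are kept as parameters here so the selection logic
can be exercised without touching the environment.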


Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add

2018-05-31 Thread Ferruh Yigit
On 5/31/2018 4:34 PM, Ferruh Yigit wrote:
> On 5/31/2018 3:34 PM, Radu Nicolau wrote:
> 
> I can see you just prefix "fix" to the title without updating it :)
> 
> What about following one:
> "net/bonding: fix slave add for mode 4" ?
> 
>> Add a call to rte_eth_link_get_nowait on every slave to update
>> the internal link status struct. Otherwise slave add will fail
>> for mode 4 if the ports are all stopped but only one of them checked.
> 
> What is the link related expectation from slaves in mode 4?
> 
> What does "if the ports are all stopped but only one of them checked" mean, 
> why
> checking only one of them?
> 
>>
>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
>> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52

Bugzilla ID: 52

btw, can you please send the new version as a reply to the previous version?

>>
>> Signed-off-by: Radu Nicolau 
>> ---
>> v2: add fix and Bugzilla references
>>
>>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
>> b/drivers/net/bonding/rte_eth_bond_api.c
>> index d558df8..cad08b9 100644
>> --- a/drivers/net/bonding/rte_eth_bond_api.c
>> +++ b/drivers/net/bonding/rte_eth_bond_api.c
>> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, 
>> uint16_t slave_port_id)
>>  return -1;
>>  }
>>  
>> +rte_eth_link_get_nowait(slave_port_id, &link_props);
>> +
> 
> The error seems in link_properties_valid(), does it make sense to get link 
> info
> inside that function before link checks?
> 
>>  slave_add(internals, slave_eth_dev);
>>  
>>  /* We need to store slaves reta_size to be able to synchronize RETA for 
>> all
>>
> 



Re: [dpdk-dev] [PATCH] net/tap: update tap index to unsgined

2018-05-31 Thread Ferruh Yigit
On 5/15/2018 1:36 PM, Wiles, Keith wrote:
> 
> 
>> On May 12, 2018, at 1:30 AM, Vipin Varghese  wrote:
>>
>> Updating the logic to reflect unsigned integer as index for TAP PMD.
>>
>> Signed-off-by: Vipin Varghese 

<...>

> Acked by Keith Wiles

Repeating ack with "-" to help patchwork:
Acked-by: Keith Wiles 

Applied to dpdk-next-net/master, thanks.


[dpdk-dev] [PATCH] mem: mark pages as freeable on exit

2018-05-31 Thread Anatoly Burakov
When rte_eal_cleanup() is called, it is expected that DPDK will be able to
release all of its memory back to the system. However, if pages are marked
as unfreeable, the pages will not be released back. Fix this to mark all
pages as freeable on calling rte_eal_cleanup(), but only do it for primary
process, as secondaries can come and go.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..987b57f87 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -1044,9 +1044,26 @@ rte_eal_init(int argc, char **argv)
return fctret;
 }
 
+static int
+mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+   void *arg __rte_unused)
+{
+   /* ms is const, so find this memseg */
+   struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+
+   found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
+
+   return 0;
+}
+
 int __rte_experimental
 rte_eal_cleanup(void)
 {
+   /* if we're in a primary process, we need to mark hugepages as freeable
+* so that finalization can release them back to the system.
+*/
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   rte_memseg_walk(mark_freeable, NULL);
rte_service_finalize();
return 0;
 }
-- 
2.17.0
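
The walk-and-clear-flag idea in this patch can be sketched in isolation as
below. Everything here is a stand-in: struct sketch_seg, sketch_walk() and
SKETCH_FLAG_DO_NOT_FREE are hypothetical substitutes for the memseg
structures, rte_memseg_walk() and RTE_MEMSEG_FLAG_DO_NOT_FREE, and the
early-stop behaviour of the walk is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for RTE_MEMSEG_FLAG_DO_NOT_FREE */
#define SKETCH_FLAG_DO_NOT_FREE (1u << 0)

struct sketch_seg {
	unsigned int flags;
};

typedef int (*sketch_walk_cb)(struct sketch_seg *seg, void *arg);

/* Call cb on every segment; in this sketch a non-zero return stops the walk. */
static int
sketch_walk(struct sketch_seg *segs, size_t n, sketch_walk_cb cb, void *arg)
{
	size_t i;

	assert(segs != NULL && cb != NULL);
	for (i = 0; i < n; i++) {
		int ret = cb(&segs[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Clear only the do-not-free bit, leaving all other flags untouched. */
static int
sketch_mark_freeable(struct sketch_seg *seg, void *arg)
{
	(void)arg;
	seg->flags &= ~SKETCH_FLAG_DO_NOT_FREE;
	return 0;
}

/* Only the primary process releases pages; secondaries leave flags alone. */
static int
sketch_cleanup(struct sketch_seg *segs, size_t n, int is_primary)
{
	if (!is_primary)
		return 0;
	return sketch_walk(segs, n, sketch_mark_freeable, NULL);
}
```

The primary-only guard mirrors the reasoning in the commit message:
secondaries can come and go, so they should not change what the primary is
allowed to free.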


Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add

2018-05-31 Thread Radu Nicolau




On 5/31/2018 4:36 PM, Ferruh Yigit wrote:
> On 5/31/2018 4:34 PM, Ferruh Yigit wrote:
>> On 5/31/2018 3:34 PM, Radu Nicolau wrote:
>>
>> I can see you just prefix "fix" to the title without updating it :)
>>
>> What about following one:
>> "net/bonding: fix slave add for mode 4" ?

Great, I'll use it for v3 :)

>>> Add a call to rte_eth_link_get_nowait on every slave to update
>>> the internal link status struct. Otherwise slave add will fail
>>> for mode 4 if the ports are all stopped but only one of them checked.
>>
>> What is the link related expectation from slaves in mode 4?

To be identical across all ports

>> What does "if the ports are all stopped but only one of them checked" mean, why
>> checking only one of them?

This is the behavior of testpmd, stop getting the link status after the 
first down port; but this should not affect bonding, so there is no need 
to update testpmd.

>>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
>>> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52
>
> Bugzilla ID: 52
>
> btw, can you please send new version as reply to previous version?

Sure.

>>> Signed-off-by: Radu Nicolau 
>>> ---
>>> v2: add fix and Bugzilla references
>>>
>>>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
>>> b/drivers/net/bonding/rte_eth_bond_api.c
>>> index d558df8..cad08b9 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_api.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_api.c
>>> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, 
>>> uint16_t slave_port_id)
>>>  		return -1;
>>>  	}
>>>  
>>> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
>>> +
>>
>> The error seems in link_properties_valid(), does it make sense to get link info
>> inside that function before link checks?

Not really, as one might expect that link_properties_valid will only 
test the struct rte_eth_link *slave_link argument, not update it.

>>>  	slave_add(internals, slave_eth_dev);
>>>  
>>>  	/* We need to store slaves reta_size to be able to synchronize RETA for all






[dpdk-dev] [PATCH] test/test: properly clean up on exit

2018-05-31 Thread Anatoly Burakov
The test application didn't call rte_eal_cleanup() on exit, which
caused leftover hugepages and memory leaks when running secondary
processes. Fix this by calling rte_eal_cleanup() on exit.

Signed-off-by: Anatoly Burakov 
---
 test/test/test.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/test/test/test.c b/test/test/test.c
index 44dfe20ef..ffa9c3669 100644
--- a/test/test/test.c
+++ b/test/test/test.c
@@ -84,22 +84,29 @@ main(int argc, char **argv)
int ret;
 
ret = rte_eal_init(argc, argv);
-   if (ret < 0)
-   return -1;
+   if (ret < 0) {
+   ret = -1;
+   goto out;
+   }
 
 #ifdef RTE_LIBRTE_TIMER
rte_timer_subsystem_init();
 #endif
 
-   if (commands_init() < 0)
-   return -1;
+   if (commands_init() < 0) {
+   ret = -1;
+   goto out;
+   }
 
argv += ret;
 
prgname = argv[0];
 
-   if ((recursive_call = getenv(RECURSIVE_ENV_VAR)) != NULL)
-   return do_recursive_call();
+   recursive_call = getenv(RECURSIVE_ENV_VAR);
+   if (recursive_call != NULL) {
+   ret = do_recursive_call();
+   goto out;
+   }
 
 #ifdef RTE_LIBEAL_USE_HPET
if (rte_eal_hpet_init(1) < 0)
@@ -111,7 +118,8 @@ main(int argc, char **argv)
 #ifdef RTE_LIBRTE_CMDLINE
cl = cmdline_stdin_new(main_ctx, "RTE>>");
if (cl == NULL) {
-   return -1;
+   ret = -1;
+   goto out;
}
 
char *dpdk_test = getenv("DPDK_TEST");
@@ -120,18 +128,23 @@ main(int argc, char **argv)
snprintf(buf, sizeof(buf), "%s\n", dpdk_test);
if (cmdline_in(cl, buf, strlen(buf)) < 0) {
printf("error on cmdline input\n");
-   return -1;
+   ret = -1;
+   goto out;
}
 
cmdline_stdin_exit(cl);
-   return last_test_result;
+   ret = last_test_result;
+   goto out;
}
/* if no DPDK_TEST env variable, go interactive */
cmdline_interact(cl);
cmdline_stdin_exit(cl);
 #endif
+   ret = 0;
 
-   return 0;
+out:
+   rte_eal_cleanup();
+   return ret;
 }
 
 
-- 
2.17.0
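
The control-flow change in this patch is the usual single-exit cleanup
idiom. A stripped-down sketch of it follows; sketch_main(),
sketch_cleanup() and the init_ok/cmds_ok parameters are hypothetical
stand-ins for main(), rte_eal_cleanup() and the results of rte_eal_init()
and commands_init().

```c
#include <stddef.h>

static int sketch_cleanup_calls; /* counts how often cleanup ran */

/* stand-in for rte_eal_cleanup() */
static void
sketch_cleanup(void)
{
	sketch_cleanup_calls++;
}

/*
 * Every exit path sets ret and jumps to "out", so cleanup runs exactly
 * once per call regardless of where the function bails out.
 */
static int
sketch_main(int init_ok, int cmds_ok, int test_result)
{
	int ret;

	if (!init_ok) {
		ret = -1;
		goto out;
	}

	if (!cmds_ok) {
		ret = -1;
		goto out;
	}

	ret = test_result;

out:
	sketch_cleanup();
	return ret;
}
```

As in the patch, the cleanup call is also reached when initialization
itself failed; whether that is desirable for rte_eal_cleanup() is a
question for review rather than something this sketch settles.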


[dpdk-dev] [PATCH v3] net/bonding: fix slave add for mode 4

2018-05-31 Thread Radu Nicolau
Add a call to rte_eth_link_get_nowait on every slave to update
the internal link status struct. Otherwise slave add will fail
for mode 4 if the ports are all stopped but only one of them checked.

Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
Bugzilla ID: 52

Signed-off-by: Radu Nicolau 
---
v3: updated commit msg
v2: add fix and Bugzilla references

 drivers/net/bonding/rte_eth_bond_api.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index d558df8..cad08b9 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, 
uint16_t slave_port_id)
return -1;
}
 
+   rte_eth_link_get_nowait(slave_port_id, &link_props);
+
slave_add(internals, slave_eth_dev);
 
/* We need to store slaves reta_size to be able to synchronize RETA for 
all
-- 
2.7.5



Re: [dpdk-dev] [PATCH] net/thunderx: add support for hardware first skip feature

2018-05-31 Thread Ferruh Yigit
On 5/30/2018 7:41 AM, Rakesh K wrote:
> 
> 
> On Monday 28 May 2018 07:14 PM, Ferruh Yigit wrote:
>> On 5/28/2018 1:57 PM, rkudurumalla wrote:
>>> This feature is used to create a hole between HEADROOM
>>> and actual data.Size of hole is specified in bytes as
>>> module param to pmd
>>
>> Can't mbuf private area be used? It is between HEADROOM and mbuf header.
> 
> data inserted in the hole will be part of the packet data. One of the
> use cases is inserting VLAN header for each packet received before it is
> being forwarded without having to move the packet data

Cc'ed Olivier.

Is this something that should be addressed at the mbuf level instead of in the
PMD via a devarg?

>>
>>>
>>> Signed-off-by: Rakesh Kudurumalla 
>>
>> <...>
>>



[dpdk-dev] [RFC] net/mlx4: add TSO support

2018-05-31 Thread Moti Haimovsky
TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the NIC,
thus saving compute resources.

This RFC proposes to add support for TSO to the MLX4 PMD.

Prerequisites:
In order for the PMD to recognize the TSO capabilities of the device
one has to use:
* RDMA-core v18.0 or above.
* Linux kernel 4.16 or above.

Assumptions:
* mlx4 PMD will follow the TSO support implemented in mlx5 PMD. 
* PMD is backwards compatible.
  ** The PMD will continue to work with the kernels and RDMA-core
 supported by it today.
  ** The PMD will continue to work with devices not supporting TSO. 

Changes proposed in the PMD for implementing TSO:
* At init, query the device for TSO support and MAX segment size
  being supported.
  This will also determine if the PMD will advertise support for TSO
  (dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;)
* Calling create-qp when creating a Tx queue will have to consider
  the MAX TSO header size when calculating the actual queue buffer
  size. This may be abstracted by calling ibv_create_qp_ex with
  IBV_QP_INIT_ATTR_MAX_TSO_HEADER as comp flag rather than
  ibv_create_qp.
  If this breaks backwards compatibility then this calculation will
  be done in the PMD code.
* Modify tx_burst function to:
  **  Check for TSO flag indication in the packets of the packet burst
  (buf->ol_flags & PKT_TX_TCP_SEG).
  **  For TSO packet create the WQE appropriate for sending a TSO packet
  and fill it with packet info and  L2/L3/L4 Headers.
* Modify Tx completion function to handle releasing of TSO packet
  buffers that were transmitted.

Concerns:
* Impact of changing Tx send routine on performance.
  The performance of the tx_burst routine for non-TSO packets may be
  affected just by placing the code that handles TSO packets in it,
  so we may want to consider having a dedicated routine for TSO packets.
* No MAX-TSO parameter.
   This is a cross-PMD issue that may need a separate mailing thread to handle.
   As of today there is no way for the PMD to advertise the MAX-TSO that
   it or its HW supports, as is done with other capabilities
   (the indirection table size, for example;
   see rte_eth_dev_info.reta_size in rte_ethdev.h).
   Also, there is no DPDK parameter or constant value that the PMD
   can use in order to know the MAX-TSO the system requires.
   This prevents applications from determining the MAX-TSO that can be
   used, leading to configuration mismatches that may cause transmit
   failures or, in the best case, a less-than-optimal TSO configuration.
   I propose to add a max_tso field in rte_eth_dev_info that will allow
   the PMD to advertise the max TSO it supports. This can be used by
   DPDK applications to determine what TSO size to use.
   If this is a major change that cannot fit the 18.08 schedule, then
   I propose to add a MAX_TSO constant in rte_ethdev.h. The PMD will
   compare this value with its own MAX-TSO and, if it cannot meet the
   defined value, it will not advertise that it is a TSO-capable device.
* Handling packets longer than MAX-TSO
   In case a PMD is requested to send a TSO packet which is longer than
   MAX-TSO, the PMD send routine should return with an error.
   A different approach that can be used in the future is to apply GSO
   to those packets using the GSO lib in DPDK.

I am interested in general design comments and concerns listed above.

Signed-off-by: Moti Haimovsky 

-- 
1.8.3.1
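
To make the tx_burst discussion concrete, here is a minimal sketch of the
per-packet check described above. It is not mlx4 code: struct sketch_pkt,
SKETCH_TX_TCP_SEG (standing in for PKT_TX_TCP_SEG) and the max_tso
parameter are hypothetical, and reporting an over-long TSO packet as a
short return count is an assumption about how the error would surface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stand-in for PKT_TX_TCP_SEG */
#define SKETCH_TX_TCP_SEG (1ULL << 0)

/* hypothetical reduction of an mbuf to the two fields the check needs */
struct sketch_pkt {
	uint64_t ol_flags;
	uint32_t pkt_len;
};

/*
 * Walk a burst and return how many packets were accepted. A TSO packet
 * longer than max_tso stops the burst; *n_tso reports how many accepted
 * packets would need a TSO WQE.
 */
static uint16_t
sketch_tx_burst(const struct sketch_pkt *pkts, uint16_t n, uint32_t max_tso,
		uint16_t *n_tso)
{
	uint16_t i;

	assert(pkts != NULL && n_tso != NULL);
	*n_tso = 0;

	for (i = 0; i < n; i++) {
		if (pkts[i].ol_flags & SKETCH_TX_TCP_SEG) {
			if (pkts[i].pkt_len > max_tso)
				break;      /* too long: caller sees a short count */
			(*n_tso)++;         /* a TSO WQE would be built here */
		}
		/* non-TSO packets would take the regular send path here */
	}
	return i;
}
```

The extra branch per packet is the cost the first concern refers to; a
dedicated TSO routine would remove it from the non-TSO path at the price of
duplicating the surrounding loop.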



Re: [dpdk-dev] [PATCH v2] net/bonding: fix update link status on slave add

2018-05-31 Thread Ferruh Yigit
On 5/31/2018 5:13 PM, Radu Nicolau wrote:
> 
> 
> On 5/31/2018 4:36 PM, Ferruh Yigit wrote:
>> On 5/31/2018 4:34 PM, Ferruh Yigit wrote:
>>> On 5/31/2018 3:34 PM, Radu Nicolau wrote:
>>>
>>> I can see you just prefix "fix" to the title without updating it :)
>>>
>>> What about following one:
>>> "net/bonding: fix slave add for mode 4" ?
> Great, I'll use it for v3 :)
> 
>>>
>>>> Add a call to rte_eth_link_get_nowait on every slave to update
>>>> the internal link status struct. Otherwise slave add will fail
>>>> for mode 4 if the ports are all stopped but only one of them checked.
>>> What is the link related expectation from slaves in mode 4?
> To be identical across all ports
>>>
>>> What does "if the ports are all stopped but only one of them checked" mean, 
>>> why
>>> checking only one of them?
> This is the behavior of testpmd, stop getting the link status after the 
> first down port; but this should not affect bonding, so there is no need 
> to update testpmd.

I see. When does this link update happen in the context of this bonding issue?
When the bonding device is created?

Should we update the testpmd behavior too?

> 
>>>
>>>> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
>>>> Bugzilla entry: https://dpdk.org/tracker/show_bug.cgi?id=52
>> Bugzilla ID: 52
>>
>> btw, can you please send new version as reply to previous version?
> Sure.
> 
>>
>>>> Signed-off-by: Radu Nicolau 
>>>> ---
>>>> v2: add fix and Bugzilla references
>>>>
>>>>   drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
>>>> b/drivers/net/bonding/rte_eth_bond_api.c
>>>> index d558df8..cad08b9 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_api.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_api.c
>>>> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t 
>>>> bonded_port_id, uint16_t slave_port_id)
>>>>  		return -1;
>>>>  	}
>>>>   
>>>> +	rte_eth_link_get_nowait(slave_port_id, &link_props);
>>>> +
>>> The error seems in link_properties_valid(), does it make sense to get link 
>>> info
>>> inside that function before link checks?
> Not really, as one might expect that link_properties_valid will only 
> test the struct rte_eth_link *slave_link argument, not update it.

Fair enough, I just thought to be sure the tested link is up to date, but that
function seems only called by __eth_bond_slave_add_lock_free() which you are
updating, so this is ok.

> 
>>>
>>>>  	slave_add(internals, slave_eth_dev);
>>>>   
>>>>  	/* We need to store slaves reta_size to be able to synchronize RETA for 
>>>> all
>>>>
> 



Re: [dpdk-dev] [PATCH 1/2] net/qede: fix to update VF MTU

2018-05-31 Thread Ferruh Yigit
On 5/23/2018 12:16 AM, Rasesh Mody wrote:
> This patch fixes VF MTU update to work without having to restart the
> vport and there by not requiring port re-configuration. It adds a
> VF MTU Update TLV to achieve the same. Firmware can handle VF MTU update
> by just pausing the vport.
> 
> Fixes: dd28bc8c6ef4 ("net/qede: fix VF port creation sequence")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Rasesh Mody 

Series applied to dpdk-next-net/master, thanks.

(for v18.08)


Re: [dpdk-dev] [PATCH] net/cxgbe: report configured link auto-negotiation

2018-05-31 Thread Ferruh Yigit
On 5/23/2018 7:00 PM, Rahul Lakkireddy wrote:
> Report current configured link auto-negotiation. Also initialize
> rte_eth_link.
> 
> Coverity issue: 280648
> Fixes: f5b3c7b29357 ("net/cxgbevf: fix inter-VM traffic when physical link 
> down")
> 
> Signed-off-by: Rahul Lakkireddy 

Applied to dpdk-next-net/master, thanks.


[dpdk-dev] [PATCH] malloc: fix pad erasing

2018-05-31 Thread Anatoly Burakov
Previously, when joining adjacent free elements, we were erasing
trailer and header, but did not erase the padding. Fix this by
accounting for padding on erase, and do not erase padding twice
by adjusting data pointer and data len to not include padding.

Fixes: bb372060dad4 ("malloc: make heap a doubly-linked list")
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/malloc_elem.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/common/malloc_elem.c 
b/lib/librte_eal/common/malloc_elem.c
index 9bfe9b9b4..944587bc5 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -386,16 +386,18 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
if (elem->next != NULL && elem->next->state == ELEM_FREE &&
next_elem_is_adjacent(elem)) {
void *erase;
+   size_t erase_len;
 
/* we will want to erase the trailer and header */
erase = RTE_PTR_SUB(elem->next, MALLOC_ELEM_TRAILER_LEN);
+   erase_len = MALLOC_ELEM_OVERHEAD + elem->next->pad;
 
/* remove from free list, join to this one */
malloc_elem_free_list_remove(elem->next);
join_elem(elem, elem->next);
 
-   /* erase header and trailer */
-   memset(erase, 0, MALLOC_ELEM_OVERHEAD);
+   /* erase header, trailer and pad */
+   memset(erase, 0, erase_len);
}
 
/*
@@ -406,9 +408,11 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
prev_elem_is_adjacent(elem)) {
struct malloc_elem *new_elem;
void *erase;
+   size_t erase_len;
 
/* we will want to erase trailer and header */
erase = RTE_PTR_SUB(elem, MALLOC_ELEM_TRAILER_LEN);
+   erase_len = MALLOC_ELEM_OVERHEAD + elem->pad;
 
/* remove from free list, join to this one */
malloc_elem_free_list_remove(elem->prev);
@@ -416,8 +420,8 @@ malloc_elem_join_adjacent_free(struct malloc_elem *elem)
new_elem = elem->prev;
join_elem(new_elem, elem);
 
-   /* erase header and trailer */
-   memset(erase, 0, MALLOC_ELEM_OVERHEAD);
+   /* erase header, trailer and pad */
+   memset(erase, 0, erase_len);
 
elem = new_elem;
}
@@ -436,8 +440,8 @@ malloc_elem_free(struct malloc_elem *elem)
void *ptr;
size_t data_len;
 
-   ptr = RTE_PTR_ADD(elem, sizeof(*elem));
-   data_len = elem->size - MALLOC_ELEM_OVERHEAD;
+   ptr = RTE_PTR_ADD(elem, MALLOC_ELEM_HEADER_LEN + elem->pad);
+   data_len = elem->size - elem->pad - MALLOC_ELEM_OVERHEAD;
 
elem = malloc_elem_join_adjacent_free(elem);
 
-- 
2.17.0
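
The arithmetic behind the two fixes can be checked with a small sketch.
The layout assumed here is header, then pad, then data, then trailer,
which is how the patch's own pointer math reads; the SKETCH_* sizes and
both helper functions are hypothetical and only model byte offsets, not
the real struct malloc_elem.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical sizes; the real values come from malloc_elem.h */
#define SKETCH_HEADER_LEN  16u
#define SKETCH_TRAILER_LEN 8u
#define SKETCH_OVERHEAD    (SKETCH_HEADER_LEN + SKETCH_TRAILER_LEN)

struct sketch_span {
	size_t off; /* byte offset of the first byte to erase */
	size_t len; /* number of bytes to erase */
};

/*
 * Region to wipe when the element starting at next_off is joined onto the
 * element before it: the previous trailer, the next header and the next
 * element's padding.
 */
static struct sketch_span
sketch_join_erase_span(size_t next_off, size_t next_pad)
{
	struct sketch_span s;

	assert(next_off >= SKETCH_TRAILER_LEN);
	s.off = next_off - SKETCH_TRAILER_LEN;
	s.len = SKETCH_OVERHEAD + next_pad;
	return s;
}

/*
 * Data region of an element being freed. It starts after the header and
 * the pad, so the pad is left to the join step and is not erased twice.
 */
static struct sketch_span
sketch_free_data_span(size_t elem_off, size_t elem_size, size_t pad)
{
	struct sketch_span s;

	assert(elem_size >= SKETCH_OVERHEAD + pad);
	s.off = elem_off + SKETCH_HEADER_LEN + pad;
	s.len = elem_size - pad - SKETCH_OVERHEAD;
	return s;
}
```

With these sizes, joining an element at offset 128 that carries 32 bytes of
padding erases 56 bytes starting at offset 120, where the pre-patch code
would have erased only the 24 bytes of header and trailer.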


[dpdk-dev] DPDK Release Status Meeting 31/05/2018

2018-05-31 Thread Mcnamara, John
Minutes from the weekly DPDK Release Status Meeting.

The DPDK Release Status Meeting is intended for DPDK Committers to discuss
the status of the master tree and sub-trees, and for project managers to
track progress or milestone dates.

The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend, just
send me an email and I will send you the invite.


Minutes 31 May 2018

Agenda:

* DPDK 18.05 release.
* Retrospective on 18.05.
* Testing of stable releases.

Participants:

* Intel
* Cavium
* Mellanox
* NXP
* 6Wind
* RedHat


DPDK 18.05 release.

* DPDK 18.05 "The Venky Release" is out. \o/
* Thanks to all the maintainers and contributors.
* See the release notes for the full stats:
  http://dpdk.org/ml/archives/announce/2018-May/000204.html


Retrospective on 18.05.

* What went well.

  * Largest DPDK release ever.
  * Lots of contributors from all the main companies involved in networking.
  * Collaboration on major features between companies in the community.

* What didn't go so well.

  * The release was very late and required a lot of release candidates.
  * RC1 was late and low quality.
  * Many major defects found in RC testing.
  * Some reviews were slow or late.

* What can we do differently next time.

  * Should we change the number of releases per year from 4 to 3 or 2?
  * Merge earlier from subtrees: every 7-10 days.
  * Push patches earlier.
  * Review patches more critically. Does every feature/patch need to go in?
  * Use unit tests more.
    * Make them a requirement for any sizeable code.
    * Make them easier to write/use/run.
    * Add a make/meson target for generating coverage results from unit
      tests. Any volunteers?
  * Hold more strictly to the release milestone dates.


Testing of stable releases.

* Luca Boccassi asked about testing of the stable release.
* All major contributing companies should confirm test results on the stable
  releases.
* Luca will send an email to the list about it:
  http://dpdk.org/ml/archives/dev/2018-May/103249.html



[dpdk-dev] [PATCH] eal: add option to limit memory allocation on sockets

2018-05-31 Thread Anatoly Burakov
Previously, it was possible to limit maximum amount of memory
allowed for allocation by creating validator callbacks. Although a
powerful tool, it's a bit of a hassle and requires modifying the
application for it to work with DPDK example applications.

Fix this by adding a new parameter "--socket-limit", with syntax
similar to "--socket-mem", which would set per-socket memory
allocation limits, and set up a default validator callback to deny
all allocations above the limit.

This option is incompatible with legacy mode, as validator callbacks
are not supported there.

Signed-off-by: Anatoly Burakov 
---
 doc/guides/linux_gsg/build_sample_apps.rst|  4 +++
 .../prog_guide/env_abstraction_layer.rst  |  4 +++
 lib/librte_eal/common/eal_common_options.c| 10 ++
 lib/librte_eal/common/eal_internal_cfg.h  |  2 ++
 lib/librte_eal/common/eal_options.h   |  2 ++
 lib/librte_eal/linuxapp/eal/eal.c | 36 +--
 lib/librte_eal/linuxapp/eal/eal_memory.c  | 21 +++
 7 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/doc/guides/linux_gsg/build_sample_apps.rst 
b/doc/guides/linux_gsg/build_sample_apps.rst
index 3623ddf46..332424e05 100644
--- a/doc/guides/linux_gsg/build_sample_apps.rst
+++ b/doc/guides/linux_gsg/build_sample_apps.rst
@@ -114,6 +114,10 @@ The EAL options are as follows:
   this memory will also be pinned (i.e. not released back to the system until
   application closes).
 
+* ``--socket-limit``:
+  Limit maximum memory available for allocation on each socket. Does not 
support
+  legacy memory mode.
+
 * ``-d``:
   Add a driver or driver directory to be loaded.
   The application should use this option to load the pmd drivers
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index a22640d29..4c51efd42 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -147,6 +147,10 @@ notified about memory allocations above specified 
threshold (and have a chance
 to deny them), allocation validator callbacks are also available via
 ``rte_mem_alloc_validator_callback_register()`` function.
 
+A default validator callback is provided by EAL, which can be enabled with a
+``--socket-limit`` command-line option, for a simple way to limit maximum 
amount
+of memory that can be used by DPDK application.
+
 .. note::
 
 In multiprocess scenario, all related processes (i.e. primary process, and
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index ecebb2923..c720efa86 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -70,6 +70,7 @@ eal_long_options[] = {
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM},
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM},
{OPT_SOCKET_MEM,1, NULL, OPT_SOCKET_MEM_NUM   },
+   {OPT_SOCKET_LIMIT,  1, NULL, OPT_SOCKET_LIMIT_NUM },
{OPT_SYSLOG,1, NULL, OPT_SYSLOG_NUM   },
{OPT_VDEV,  1, NULL, OPT_VDEV_NUM },
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
@@ -179,6 +180,10 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
/* zero out the NUMA config */
for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
internal_cfg->socket_mem[i] = 0;
+   internal_cfg->force_socket_limits = 0;
+   /* zero out the NUMA limits config */
+   for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
+   internal_cfg->socket_limit[i] = 0;
/* zero out hugedir descriptors */
for (i = 0; i < MAX_HUGEPAGE_SIZES; i++) {
memset(&internal_cfg->hugepage_info[i], 0,
@@ -1322,6 +1327,11 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
"be specified together with --"OPT_NO_HUGE"\n");
return -1;
}
+   if (internal_config.force_socket_limits && internal_config.legacy_mem) {
+   RTE_LOG(ERR, EAL, "--" OPT_SOCKET_LIMIT " is only supported in "
+   "non-legacy memory mode\n");
+   return -1;
+   }
 
return 0;
 }
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index c4cbf3acd..d66cd0313 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -46,6 +46,8 @@ struct internal_config {
/** true to try allocating memory on specific sockets */
volatile unsigned force_sockets;
volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory 
per socket */
+   volatile unsigned force_socket_limits;
+   volatile uint64_t socket_limit[RTE_MAX_NUMA_NODES]; /**< limit amount 
of memory per socket */
uintptr_t base_virtaddr;  /**< base addre

Re: [dpdk-dev] [RFC] net/mlx4: add TSO support

2018-05-31 Thread Wiles, Keith



> On May 31, 2018, at 11:21 AM, Moti Haimovsky  wrote:
> 
> TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
> network stack to delegate segmentation of a TCP segment to the NIC,
> thus saving compute resources.
> 
> This RFC proposes to add support for TSO to the MLX4 PMD.
> 
> Prerequisites:
> In order for the PMD to recognize the TSO capabilities of the device
> one has to use:
> * RDMA-core v18.0 or above.
> * Linux kernel 4.16 or above.
> 
> Assumptions:
> * mlx4 PMD will follow the TSO support implemented in mlx5 PMD. 
> * PMD is backwards compatible.
>  ** The PMD will continue work with the kernels and RDMA-core
> supported by it today.
>  ** The PMD will continue to work with devices not supporting TSO. 
> 
> Changes proposed in the PMD for implementing TSO:
> * At init, query the device for TSO support and MAX segment size
>  being supported.
>  This will also determine if the PMD will advertise support for TSO
>  (dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;)
> * Calling create-qp when creating a Tx queue will have to consider
>  the MAX TSO header size when calculating the actual queue buffer
>  size. This may be abstracted by calling ibv_create_qp_ex with
>  IBV_QP_INIT_ATTR_MAX_TSO_HEADER as comp flag rather than
>  ibv_create_qp.
>  If this breaks backwards compatibility then this calculation will
>  be done in the PMD code.
> * Modify tx_burst function to:
>  **  Check for TSO flag indication in the packets of the packet burst
>  (buf->ol_flags & PKT_TX_TCP_SEG).
>  **  For TSO packet create the WQE appropriate for sending a TSO packet
>  and fill it with packet info and  L2/L3/L4 Headers.
> * Modify Tx completion function to handle releasing of TSO packet
>  buffers that were transmitted.
> 
> Concerns:
> * Impact of changing Tx send routine on performance.
>  The performance of the tx_burst routine for non-TSO packets may be
>  affected just by placing the code that handles TSO packets in it,
>  so we may want to consider having a dedicated routine for TSO packets.

How much code would be shared between the two APIs if we created a new API just
for TSO?

My first thought was to create a new API, but would that require my application 
to know it needs to call the new TSO API instead of the normal tx_burst API?
Maybe it does not matter and a TSO request would never be directed to the 
normal API. If that is the case, I would like a new API that does not affect 
the old one.

> * No MAX-TSO parameter.
>   This is a cross-PMD issue that may need a separate mailing thread to handle.
>   As for today there is no way for the PMD to advertise the MAX-TSO
>   it or its HW support as done with other capabilities.
>   (The indirection table size for example.
>see rte_eth_dev_info.reta_size in rte_ethdev.h).
>   Also there is no DPDK parameter or constant value that the PMD
>   can use in order to know the MAX-TSO the system requires.
>   This prevents applications from determining the MAX-TSO that can be
>   used leading to configuration mismatches that may lead to transmit
>   failures or to less-than-optimize TSO configuration in the best case.
>   I propose to add a max_tso field in rte_eth_dev_info that will allow
>   the PMD to advertise the max tso is supports. This can be used by
>   DPDK applications to determine what TSO size to use.
>   If this is a major change that cannot fit the 18.08 schedule then
>   I propose to add a MAX_TSO constant in rte_ethdev.h, The PMD will
>   compare this value whit its own MAX-TSO and if it cannot meet the
>   defined value it will not advertise that it is a TSO capable device.
> * Handling packets longer then MAX-TSO
>   In case a PMD is requested to send a TSO packet which is longer than
>   MAX-TSO the PMD send routine should return with an error.
>   A different approach that can be used on the future is to apply GSO
>   to those packets using the GSO lib in DPDK.
> 
> I am interested in general design comments and concerns listed above.
> 
> Signed-off-by: Moti Haimovsky 
> 
> -- 
> 1.8.3.1
> 
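
The MAX-TSO proposal quoted above could look roughly like the following. This is only a sketch under stated assumptions: `dev_info_sketch` and `tso_len_check` are hypothetical names, and `max_tso` is the proposed field, not something that exists in `rte_eth_dev_info` today. It shows the two points from the RFC: the device advertises a limit, and the send path rejects a TSO packet that exceeds it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the capability structure. The max_tso field
 * is the proposal under discussion, not an existing DPDK field. */
struct dev_info_sketch {
	uint32_t max_tso; /* largest TSO packet the device accepts, in bytes */
};

/* Return 0 if a TSO packet of pkt_len bytes may be sent,
 * or -EINVAL if it is longer than the advertised limit. */
static int
tso_len_check(const struct dev_info_sketch *info, uint32_t pkt_len)
{
	assert(info != NULL);
	if (pkt_len > info->max_tso)
		return -EINVAL;
	return 0;
}
```

With `max_tso` set to 65536, a 1500-byte packet passes the check and a 70000-byte packet is rejected with `-EINVAL`. An application could run the same comparison before choosing its TSO size.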

Regards,
Keith



Re: [dpdk-dev] DPDK Release Status Meeting 31/05/2018

2018-05-31 Thread Wiles, Keith



> On May 31, 2018, at 12:18 PM, dev-boun...@dpdk.org wrote:
> 
> Minutes from the weekly DPDK Release Status Meeting.
> 
> The DPDK Release Status Meeting is intended for DPDK Committers to discuss
> the status of the master tree and sub-trees, and for project managers to
> track progress or milestone dates.
> 
> The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just
> send me an email and I will send you the invite.
> 
> 
> Minutes 31 May 2018
> 
> Agenda:
> 
> * DPDK 18.05 release.
> * Retrospective on 18.05.
> * Testing of stable releases.
> 
> Participants:
> 
> * Intel
> * Cavium
> * Mellanox
> * NXP
> * 6Wind
> * RedHat
> 
> 
> DPDK 18.05 release.
> 
> * DPDK 18.05 "The Venky Release" is out. \o/
> * Thanks to all the maintainers and contributors.
> * See the release notes for the full stats:
>  http://dpdk.org/ml/archives/announce/2018-May/000204.html
> 
> 
> Retrospective on 18.05.
> 
> * What went well.
> 
>  * Largest DPDK release ever.
>  * Lots of contributors from all the main companies involved in networking.
>  * Collaboration on major features between companies in the community.
> 
> * What didn't go so well.
> 
>  * The release was very late and required a lot of release candidates.

More testing is always better, and sooner rather than later is better.

>  * RC1 was late and low quality.
>  * Many major defects found in RC testing.
>  * Some reviews were slow or late.
> 
> * What can we do differently next time.
> 
>  * Should we change the number of releases per year from 4 to 3 or 2?
>  * Merge earlier from subtrees: every 7-10 days.

Knowing when a merge is scheduled could help, say if it were every Monday or
Wednesday. I would not put it on a Friday, as that tends to force people to
work on weekends. Maybe every Wednesday would be best. It lets developers
know when patches are applied and allows more testing before the first RC.

>  * Push patches earlier.
>  * Review patches more critically. Does every feature/patch need to go in.
>  * Use unit tests more
>    * Make them a requirement for any sizeable code.
>    * Make them easier to write/use/run.
>    * Add a make/meson target for generating coverage results from unit
>      tests. Any volunteers?
>  * Hold more strictly to the release milestone dates.

I notice that developers at times need to rebase onto master because of
changes. Is this because other patches submitted after theirs were applied
first, or do we need something in place to eliminate this rework?

When someone is going to introduce a big patch that may disturb many things
and affect other patches, it would be nice to have the developer publish a
schedule for when it will be pushed. I hope that would give maintainers more
control over scheduling a given patch.
> 
> 
> Testing of stable releases.
> 
> * Luca Boccassi asked about testing of the stable release.
> * All major contributing companies should confirm test results on the stable
>  releases.
> * Luca will send an email to the list about it:
>  http://dpdk.org/ml/archives/dev/2018-May/103249.html
> 

Regards,
Keith



[dpdk-dev] [PATCH] doc/event: improve eventdev library documentation

2018-05-31 Thread Honnappa Nagarahalli
Add small amount of additional code, use consistent variable names
across code blocks, change the image to represent queues and
CPU cores intuitively. These help improve the eventdev library
documentation.

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
---
 doc/guides/prog_guide/eventdev.rst   |   55 +-
 doc/guides/prog_guide/img/eventdev_usage.svg | 1518 +-
 2 files changed, 570 insertions(+), 1003 deletions(-)

diff --git a/doc/guides/prog_guide/eventdev.rst b/doc/guides/prog_guide/eventdev.rst
index ce19997..0203d9e 100644
--- a/doc/guides/prog_guide/eventdev.rst
+++ b/doc/guides/prog_guide/eventdev.rst
@@ -1,5 +1,6 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
 Copyright(c) 2017 Intel Corporation.
+Copyright(c) 2018 Arm Limited.
 
 Event Device Library
 
@@ -129,7 +130,7 @@ API Walk-through
 
 This section will introduce the reader to the eventdev API, showing how to
 create and configure an eventdev and use it for a two-stage atomic pipeline
-with a single core for TX. The diagram below shows the final state of the
+with one core each for RX and TX. The diagram below shows the final state of the
 application after this walk-through:
 
 .. _figure_eventdev-usage1:
@@ -196,23 +197,29 @@ calling the setup function. Repeat this step for each queue, starting from
 .nb_atomic_flows = 1024,
 .nb_atomic_order_sequences = 1024,
 };
+struct rte_event_queue_conf single_link_conf = {
+.event_queue_cfg = RTE_EVENT_QUEUE_CFG_SINGLE_LINK,
+};
 int dev_id = 0;
-int queue_id = 0;
-int err = rte_event_queue_setup(dev_id, queue_id, &atomic_conf);
+int atomic_q_1 = 0;
+int atomic_q_2 = 1;
+int single_link_q = 2;
+int err = rte_event_queue_setup(dev_id, atomic_q_1, &atomic_conf);
+err = rte_event_queue_setup(dev_id, atomic_q_2, &atomic_conf);
+err = rte_event_queue_setup(dev_id, single_link_q, &single_link_conf);
 
-The remainder of this walk-through assumes that the queues are configured as
-follows:
+As shown above, queue IDs are as follows:
 
  * id 0, atomic queue #1
  * id 1, atomic queue #2
  * id 2, single-link queue
 
+These queues are used for the remainder of this walk-through.
+
 Setting up Ports
 
 
-Once queues are set up successfully, create the ports as required. Each port
-should be set up with its corresponding port_conf type, worker for worker cores,
-rx and tx for the RX and TX cores:
+Once queues are set up successfully, create the ports as required.
 
 .. code-block:: c
 
@@ -232,15 +239,24 @@ rx and tx for the RX and TX cores:
 .new_event_threshold = 4096,
 };
 int dev_id = 0;
-int port_id = 0;
-int err = rte_event_port_setup(dev_id, port_id, &CORE_FUNCTION_conf);
+int rx_port_id = 0;
+int err = rte_event_port_setup(dev_id, rx_port_id, &rx_conf);
+
+for(int worker_port_id = 1; worker_port_id <= 4; worker_port_id++) {
+   int err = rte_event_port_setup(dev_id, worker_port_id, &worker_conf);
+}
 
-It is now assumed that:
+int tx_port_id = 5;
+err = rte_event_port_setup(dev_id, tx_port_id, &tx_conf);
+
+As shown above:
 
  * port 0: RX core
  * ports 1,2,3,4: Workers
  * port 5: TX core
 
+These ports are used for the remainder of this walk-through.
+
 Linking Queues and Ports
 
 
@@ -254,15 +270,14 @@ can be achieved like this:
 
 .. code-block:: c
 
-uint8_t port_id = 0;
+uint8_t rx_port_id = 0;
+uint8_t tx_port_id = 5;
 uint8_t atomic_qs[] = {0, 1};
 uint8_t single_link_q = 2;
-uint8_t tx_port_id = 5;
 uint8_t priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
 
-for(int i = 0; i < 4; i++) {
-int worker_port = i + 1;
-int links_made = rte_event_port_link(dev_id, worker_port, atomic_qs, NULL, 2);
+for(int worker_port_id = 1; worker_port_id <= 4; worker_port_id++) {
+int links_made = rte_event_port_link(dev_id, worker_port_id, atomic_qs, NULL, 2);
 }
 int links_made = rte_event_port_link(dev_id, tx_port_id, &single_link_q, &priority, 1);
 
@@ -295,14 +310,14 @@ The following code shows how those packets can be enqueued into the eventdev:
 ev[i].flow_id = mbufs[i]->hash.rss;
 ev[i].op = RTE_EVENT_OP_NEW;
 ev[i].sched_type = RTE_SCHED_TYPE_ATOMIC;
-ev[i].queue_id = 0;
+ev[i].queue_id = atomic_q_1;
 ev[i].event_type = RTE_EVENT_TYPE_ETHDEV;
 ev[i].sub_event_type = 0;
 ev[i].priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
 ev[i].mbuf = mbufs[i];
 }
 
-const int nb_tx = rte_event_enqueue_burst(dev_id, port_id, ev, nb_rx);
+const int nb_tx = rte
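
The queue setup step in the walk-through above configures three queues one after another. A minimal standalone sketch of that pattern, with one error variable and an early stop on the first failure, might look like this. It deliberately does not call DPDK: `queue_setup_fn`, `setup_queues` and `fake_setup` are hypothetical names used only for illustration, and the callback stands in for a per-queue call such as `rte_event_queue_setup()`, which in reality also takes a configuration pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type standing in for a per-queue setup call. */
typedef int (*queue_setup_fn)(uint8_t dev_id, uint8_t queue_id);

/* Set up every queue listed in ids[]. Stop at the first failure and
 * return its error code, or return 0 when all queues were set up. */
static int
setup_queues(queue_setup_fn setup, uint8_t dev_id,
	     const uint8_t *ids, size_t nb_ids)
{
	int err = 0; /* declared once and reused for every call */

	assert(setup != NULL);
	for (size_t i = 0; i < nb_ids && err == 0; i++)
		err = setup(dev_id, ids[i]);
	return err;
}

/* Demonstration stub: pretends the device only has queue IDs 0..2. */
static int
fake_setup(uint8_t dev_id, uint8_t queue_id)
{
	(void)dev_id;
	return queue_id <= 2 ? 0 : -1;
}
```

Passing the IDs 0, 1 and 2 (atomic queue #1, atomic queue #2 and the single-link queue) to `setup_queues` with the stub returns 0, while a list containing an ID the stub does not accept returns -1 without attempting the remaining queues.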

[dpdk-dev] [PATCH] doc: add template release notes for 18.08

2018-05-31 Thread Thomas Monjalon
Add template release notes for DPDK 18.08 with inline
comments and explanations of the various sections.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/rel_notes/index.rst |   1 +
 doc/guides/rel_notes/release_18_08.rst | 192 +
 2 files changed, 193 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_18_08.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index eb82a0e06..d125342c3 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -9,6 +9,7 @@ Release Notes
 :numbered:
 
 rel_description
+release_18_08
 release_18_05
 release_18_02
 release_17_11
diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
new file mode 100644
index 0..5bc23c537
--- /dev/null
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -0,0 +1,192 @@
+DPDK Release 18.08
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+  make doc-guides-html
+
+  xdg-open build/doc/html/guides/rel_notes/release_18_08.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+ Add a short 1-2 sentence description in the past tense.
+ The description should be enough to allow someone scanning
+ the release notes to understand the new feature.
+
+ If the feature adds a lot of sub-features you can use a bullet list
+ like this:
+
+ * Added feature foo to do something.
+ * Enhanced feature bar to do something else.
+
+ Refer to the previous release notes for examples.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change.
+ Use fixed width quotes for ``function_names`` or ``struct_names``.
+ Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change
+ that was announced in the previous releases and made in this release.
+ Use fixed width quotes for ``function_names`` or ``struct_names``.
+ Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+ in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release
+   and prepend with a ``+`` sign, like this:
+
+ librte_acl.so.2
+   + librte_cfgfile.so.2
+ librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+ librte_acl.so.2
+ librte_bbdev.so.1
+ librte_bitratestats.so.2
+ librte_bpf.so.1
+ librte_bus_dpaa.so.1
+ librte_bus_fslmc.so.1
+ librte_bus_pci.so.1
+ librte_bus_vdev.so.1
+ librte_cfgfile.so.2
+ librte_cmdline.so.2
+ librte_common_octeontx.so.1
+ librte_compressdev.so.1
+ librte_cryptodev.so.4
+ librte_distributor.so.1
+ librte_eal.so.7
+ librte_ethdev.so.9
+ librte_eventdev.so.4
+ librte_flow_classify.so.1
+ librte_gro.so.1
+ librte_gso.so.1
+ librte_hash.so.2
+ librte_ip_frag.so.1
+ librte_jobstats.so.1
+ librte_kni.so.2
+ librte_kvargs.so.1
+ librte_latencystats.so.1
+ librte_lpm.so.2
+ librte_mbuf.so.4
+ librte_mempool.so.4
+ librte_meter.so.2
+ librte_metrics.so.1
+ librte_net.so.1
+ librte_pci.so.1
+ librte_pdump.so.2
+ librte_pipeline.so.3
+ librte_pmd_bnxt.so.2
+ librte_pmd_bond.so.2
+ librte_pmd_i40e.so.2
+

Re: [dpdk-dev] [PATCH v3] net/bonding: fix slave add for mode 4

2018-05-31 Thread Chas Williams
It's not clear to me that the issue here is the bonding slave add.
You can only add started PMDs.  When a PMD dev start is complete,
the PMD should have a valid link state and the link properties should be
valid.  A few of the PMDs are not very good about this, particularly the
ones with LSC interrupts.  Those drivers often wait for the first
link interrupt before setting their link status.  So there is a
race where the link state isn't well defined.

And lastly, why do we care what the link state is when adding a
slave?  If the link state changes to down, do we remove the slave?
If the link speed of the slave changes, do we remove the slave?
So this test doesn't make much sense.  For mode 4, you should be
able to add a slave, but if the link state doesn't match what
has been negotiated, then the slave should fail to activate.
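
The behaviour argued for above (accept the slave at add time, refuse to activate it while its link does not match what was negotiated) could be sketched as follows. This is an illustration under stated assumptions, not the bonding PMD's code: `link_sketch` and `slave_can_activate` are hypothetical names, and the fields only loosely mirror a link status structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified link description. */
struct link_sketch {
	uint32_t speed_mbps;
	bool full_duplex;
	bool up;
};

/* A slave may always be added; it may only be activated when its link is
 * up and its speed and duplex match what the bond has negotiated. */
static bool
slave_can_activate(const struct link_sketch *bond,
		   const struct link_sketch *slave)
{
	assert(bond != NULL && slave != NULL);
	if (!slave->up)
		return false;
	return slave->speed_mbps == bond->speed_mbps &&
	       slave->full_duplex == bond->full_duplex;
}
```

With this split, the link state at add time no longer matters. The check runs whenever the slave's link changes, so a slave whose link comes up later, or comes up at the wrong speed, is handled by the same path.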

On Thu, May 31, 2018 at 12:10 PM, Radu Nicolau wrote:
>
> Add a call to rte_eth_link_get_nowait on every slave to update
> the internal link status struct. Otherwise slave add will fail
> for mode 4 if the ports are all stopped but only one of them checked.
>
> Fixes: b77d21cc2364 ("ethdev: add link status get/set helper functions")
> Bugzilla ID: 52
>
> Signed-off-by: Radu Nicolau 
> ---
> v3: updated commit msg
> v2: add fix and Bugzilla references
>
>  drivers/net/bonding/rte_eth_bond_api.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
> index d558df8..cad08b9 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -296,6 +296,8 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
> return -1;
> }
>
> +   rte_eth_link_get_nowait(slave_port_id, &link_props);
> +
> slave_add(internals, slave_eth_dev);
>
> /* We need to store slaves reta_size to be able to synchronize RETA for all
> --
> 2.7.5
>


Re: [dpdk-dev] [dpdk-stable] Regression tests for stable releases from companies involved in DPDK

2018-05-31 Thread Christian Ehrhardt
On Thu, May 31, 2018 at 12:26 PM, Luca Boccassi  wrote:

> Hello all,
>
> At this morning's release meeting (minutes coming soon from John), we
> briefly discussed the state of the regression testing for stable
> releases and agreed we need to formalise the process.
>
> At the moment we have a firm commitment from Intel and Mellanox to test
> all stable branches (and if I heard correctly from NXP as well? Please
> confirm!). AT&T committed to run regressions on the 16.11 branch.
>
> Here's what we need in order to improve the quality of the stable
> releases process:
>
> 1) More commitments to help from other companies involved in the DPDK
> community. At the cost of re-stating the obvious, improving the quality
> of stable releases is for everyone's benefit, as a lot of customers and
> projects rely on the stable or LTS releases for their production
> environments.
>
> 2) A formalised deadline - the current proposal is 10 days from the
> "xx.yy patches review and test" email, which was just sent for 16.11.
> For the involved companies, please let us know if 10 days is enough. In
> terms of scheduling, this period will always start within a week from
> the mainline final release. Again, the signal is the "xx.yy patches
> review and test" appearing in the inbox, which will detail the
> deadline.
>
>
Hi Luca,
I discussed this with Thomas.
I don't know how much extra effort it would be for the stable maintainers,
but I wonder if there could be an XX.YY.z-rc tarball.
That would be
a) a clearer signal, in the form people are used to testing
b) easier to integrate, as I assume quite a lot of test setups are
based on tarballs rather than directly on git.

If you think everyone can easily work from git I'm fine with that; I just
wondered whether a proper -rc tarball might be more convenient for the
testing entities.

cu
Christian