Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 06 Jun 2017 08:46:12 +0200
> From: Thomas Monjalon 
> To: Jerin Jacob 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets
> 
> 06/06/2017 08:36, Jerin Jacob:
> > Add a hook in generic rte.sdkbuild.mk file
> > to include exec-env specific targets.
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> > Useful in integrating some custom targets in nonstandard execution 
> > environments.
> > For example, a bare-metal-simulator exec execution environment may need
> > a target to run the dpdk applications.
> > ---
> 
> This patch is just including an empty file.

Would you like me to add a check for whether the file is present, and include
the file only if it exists?

> Please explain how it can help with a real example.

We are evaluating running DPDK in a nonstandard execution environment such as
bare metal, where I would like to keep all my execution-environment-specific
changes at the following locations, so that I can easily move between different
versions of DPDK without merge conflicts.

$(RTE_SDK)mk/exec-env/my-exec-env
$(RTE_SDK)lib/librte_eal/my-exec-env

I believe the existing targets, like "exec-env-appinstall" in
mk/exec-env/linuxapp/rte.app.mk, serve the same purpose.


Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets

2017-06-06 Thread Thomas Monjalon
06/06/2017 09:02, Jerin Jacob:
> From: Thomas Monjalon 
> > 06/06/2017 08:36, Jerin Jacob:
> > > Add a hook in generic rte.sdkbuild.mk file
> > > to include exec-env specific targets.
> > > 
> > > Signed-off-by: Jerin Jacob 
> > > ---
> > > Useful in integrating some custom targets in nonstandard execution 
> > > environments.
> > > For example, a bare-metal-simulator exec execution environment may need
> > > a target to run the dpdk applications.
> > > ---
> > 
> > This patch is just including an empty file.
> 
> Do you like to add check for the file is present or not ? and if present,
> invoke the file.

The dash prefix does the check:
-include
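
For instance (editorial sketch; the path is only illustrative):

  # a plain include aborts the build if the file is missing:
  include $(RTE_SDK)/mk/exec-env/$(RTE_EXEC_ENV)/rte.extra.mk

  # -include silently skips a missing file, so no explicit
  # per-environment existence check is needed:
  -include $(RTE_SDK)/mk/exec-env/$(RTE_EXEC_ENV)/rte.extra.mk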

> > Please explain how it can help with a real example.
> 
> We are evaluating on running DPDK on a nonstandard execution environment like
> bare metal where I would to keep all my execution environment specific
> change at following location. So that I can easy move around different
> version of DPDK without merge conflict.
> 
> $(RTE_SDK)mk/exec-env/my-exec-env
> $(RTE_SDK)lib/librte_eal/my-exec-env
> 
> I believe, The existing target like "exec-env-appinstall" in 
> mk/exec-env/linuxapp/rte.app.mk,
> solves the same purpose.

I do not understand.
If you want to add a new environment, why not just add it?



Re: [dpdk-dev] [PATCH] net/e1000: add support 2-tuple filter on i210/i211

2017-06-06 Thread Zhao1, Wei
I'm sorry for the mistake; this patch is fine and I will not deliver a new
version. The earlier mail about a v3 was intended as a reply to a different
thread.

Thank you.

> -Original Message-
> From: Zhao1, Wei
> Sent: Monday, June 5, 2017 2:16 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Subject: RE: [PATCH] net/e1000: add support 2-tuple filter on i210/i211
> 
> Hi ,wenzhuo
> 
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Monday, June 5, 2017 2:15 PM
> > To: Zhao1, Wei ; dev@dpdk.org
> > Subject: RE: [PATCH] net/e1000: add support 2-tuple filter on i210/i211
> >
> > Hi,
> >
> >
> > > -Original Message-
> > > From: Zhao1, Wei
> > > Sent: Monday, June 5, 2017 1:41 PM
> > > To: dev@dpdk.org
> > > Cc: Lu, Wenzhuo; Zhao1, Wei
> > > Subject: [PATCH] net/e1000: add support 2-tuple filter on i210/i211
> > >
> > > Add support of i210 and i211 type nic in 2-tuple filter.
> > >
> > > Signed-off-by: Wei Zhao 
> > Acked-by: Wenzhuo Lu 
> 
> I will commit v3 later  to rework code as your suggestion.


Re: [dpdk-dev] [PATCH 1/3] net/i40e: support flexible payload parsing for FDIR

2017-06-06 Thread Lu, Wenzhuo
Hi Beilei,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Wednesday, May 24, 2017 2:10 PM
> To: Zhang, Helin; Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 1/3] net/i40e: support flexible payload parsing
> for FDIR
> 
> This patch adds flexible payload parsing support for flow director filter.
> 
> Signed-off-by: Beilei Xing 
> ---
>  drivers/net/i40e/i40e_ethdev.h |  23 
>  drivers/net/i40e/i40e_fdir.c   |  19 ---
>  drivers/net/i40e/i40e_flow.c   | 298
> -
>  3 files changed, 317 insertions(+), 23 deletions(-)

> +
> +static int
> +i40e_flow_store_flex_pit(struct i40e_pf *pf,
> +  struct i40e_fdir_flex_pit *flex_pit,
> +  enum i40e_flxpld_layer_idx layer_idx,
> +  uint8_t raw_id)
> +{
> + uint8_t field_idx;
> +
> + field_idx = layer_idx * I40E_MAX_FLXPLD_FIED + raw_id;
> + /* Check if the configuration is conflicted */
> + if (pf->fdir.flex_pit_flag[layer_idx] &&
> + (pf->fdir.flex_set[field_idx].src_offset != flex_pit->src_offset ||
> +  pf->fdir.flex_set[field_idx].size != flex_pit->size ||
> +  pf->fdir.flex_set[field_idx].dst_offset != flex_pit->dst_offset))
> + return -1;
> +
> + if (pf->fdir.flex_pit_flag[layer_idx] &&
> + (pf->fdir.flex_set[field_idx].src_offset == flex_pit->src_offset &&
> +  pf->fdir.flex_set[field_idx].size == flex_pit->size &&
> +  pf->fdir.flex_set[field_idx].dst_offset == flex_pit->dst_offset))
> + return 1;
Is this check necessary? I don't see any specific handling for this return value.
If it is necessary, would you like to add a comment explaining it?

> +
> + pf->fdir.flex_set[field_idx].src_offset =
> + flex_pit->src_offset;
> + pf->fdir.flex_set[field_idx].size =
> + flex_pit->size;
> + pf->fdir.flex_set[field_idx].dst_offset =
> + flex_pit->dst_offset;
> +
> + return 0;
> +}



Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 06 Jun 2017 09:16:34 +0200
> From: Thomas Monjalon 
> To: Jerin Jacob 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets
> 
> 06/06/2017 09:02, Jerin Jacob:
> > From: Thomas Monjalon 
> > > 06/06/2017 08:36, Jerin Jacob:
> > > > Add a hook in generic rte.sdkbuild.mk file
> > > > to include exec-env specific targets.
> > > > 
> > > > Signed-off-by: Jerin Jacob 
> > > > ---
> > > > Useful in integrating some custom targets in nonstandard execution 
> > > > environments.
> > > > For example, a bare-metal-simulator exec execution environment may need
> > > > a target to run the dpdk applications.
> > > > ---
> > > 
> > > This patch is just including an empty file.
> > 
> > Do you like to add check for the file is present or not ? and if present,
> > invoke the file.
> 
> The dash prefixing does the check:
> -include

OK

> 
> > > Please explain how it can help with a real example.
> > 
> > We are evaluating on running DPDK on a nonstandard execution environment 
> > like
> > bare metal where I would to keep all my execution environment specific
> > change at following location. So that I can easy move around different
> > version of DPDK without merge conflict.
> > 
> > $(RTE_SDK)mk/exec-env/my-exec-env
> > $(RTE_SDK)lib/librte_eal/my-exec-env
> > 
> > I believe, The existing target like "exec-env-appinstall" in 
> > mk/exec-env/linuxapp/rte.app.mk,
> > solves the same purpose.
> 
> I do not understand.
> If you want to add a new environment, why not just adding it?

I do not understand it either. In the existing makefile infrastructure, how do
you add exec-environment-specific target(s) without changing the common code?

> 


Re: [dpdk-dev] [PATCH 1/3] net/i40e: support flexible payload parsing for FDIR

2017-06-06 Thread Xing, Beilei
> -Original Message-
> From: Lu, Wenzhuo
> Sent: Tuesday, June 6, 2017 3:46 PM
> To: Xing, Beilei ; Zhang, Helin
> ; Wu, Jingjing 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/3] net/i40e: support flexible payload
> parsing for FDIR
> 
> Hi Beilei,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> > Sent: Wednesday, May 24, 2017 2:10 PM
> > To: Zhang, Helin; Wu, Jingjing
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH 1/3] net/i40e: support flexible payload
> > parsing for FDIR
> >
> > This patch adds flexible payload parsing support for flow director filter.
> >
> > Signed-off-by: Beilei Xing 
> > ---
> >  drivers/net/i40e/i40e_ethdev.h |  23 
> >  drivers/net/i40e/i40e_fdir.c   |  19 ---
> >  drivers/net/i40e/i40e_flow.c   | 298
> > -
> >  3 files changed, 317 insertions(+), 23 deletions(-)
> 
> > +
> > +static int
> > +i40e_flow_store_flex_pit(struct i40e_pf *pf,
> > +struct i40e_fdir_flex_pit *flex_pit,
> > +enum i40e_flxpld_layer_idx layer_idx,
> > +uint8_t raw_id)
> > +{
> > +   uint8_t field_idx;
> > +
> > +   field_idx = layer_idx * I40E_MAX_FLXPLD_FIED + raw_id;
> > +   /* Check if the configuration is conflicted */
> > +   if (pf->fdir.flex_pit_flag[layer_idx] &&
> > +   (pf->fdir.flex_set[field_idx].src_offset != flex_pit->src_offset ||
> > +pf->fdir.flex_set[field_idx].size != flex_pit->size ||
> > +pf->fdir.flex_set[field_idx].dst_offset != flex_pit->dst_offset))
> > +   return -1;
> > +
> > +   if (pf->fdir.flex_pit_flag[layer_idx] &&
> > +   (pf->fdir.flex_set[field_idx].src_offset == flex_pit->src_offset &&
> > +pf->fdir.flex_set[field_idx].size == flex_pit->size &&
> > +pf->fdir.flex_set[field_idx].dst_offset == flex_pit->dst_offset))
> > +   return 1;
> Is this check necessary? Don't find a specific handling for this return value.
> If it's necessary, would you like to add some comments about this check?


Thanks for catching it. I think it can be deleted; I will update it in the next version.

> 
> > +
> > +   pf->fdir.flex_set[field_idx].src_offset =
> > +   flex_pit->src_offset;
> > +   pf->fdir.flex_set[field_idx].size =
> > +   flex_pit->size;
> > +   pf->fdir.flex_set[field_idx].dst_offset =
> > +   flex_pit->dst_offset;
> > +
> > +   return 0;
> > +}



Re: [dpdk-dev] [PATCH 2/3] net/i40e: support input set selection for FDIR

2017-06-06 Thread Lu, Wenzhuo
Hi,


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Wednesday, May 24, 2017 2:10 PM
> To: Zhang, Helin; Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 2/3] net/i40e: support input set selection for
> FDIR
> 
> This patch supports input set selection for flow director filter.
> 
> Signed-off-by: Beilei Xing 
Acked-by: Wenzhuo Lu 


Re: [dpdk-dev] [PATCH] eventdev: remove PCI dependency

2017-06-06 Thread Gaëtan Rivet
On Tue, Jun 06, 2017 at 08:35:48AM +0530, Jerin Jacob wrote:
> -Original Message-
> > Date: Mon, 5 Jun 2017 14:55:55 +0200
> > From: Gaëtan Rivet 
> > To: Jerin Jacob 
> > Cc: dev@dpdk.org, bruce.richard...@intel.com, harry.van.haa...@intel.com,
> >  hemant.agra...@nxp.com, gage.e...@intel.com, nipun.gu...@nxp.com
> > Subject: Re: [dpdk-dev] [PATCH] eventdev: remove PCI dependency
> > User-Agent: Mutt/1.5.23 (2014-03-12)
> > 
> > Hi Jerin,
> 
> Hi Gaëtan,
> 
> > 
> > On Thu, Jun 01, 2017 at 10:11:46PM +0530, Jerin Jacob wrote:
> > > Remove the PCI dependency from generic data structures
> > > and moved the PCI specific code to rte_event_pmd_pci*
> > > 
> > 
> > Thanks for working on this.
> > 
> > Do you plan on removing rte_pci.h in rte_eventdev_pmd.h? Do you think it
> > would be feasible?
> 
> That is for PCI PMD specific probe(rte_event_pmd_pci_probe() and 
> rte_event_pmd_pci_remove()),
> More like, lib/librte_ether/rte_ethdev_pci.h functions in ethdev.
> So, I think, It is OK to keep rte_pci.h for PMD specific functions.
> 
> 

Ok, sure. However, rte_eventdev.c includes both rte_pci.h and
rte_eventdev_pmd.h. Can it be made independent of the PMD-specific
include?

> > 
> > > CC: Gaetan Rivet 
> > > Signed-off-by: Jerin Jacob 
> > > ---
> > >  drivers/event/skeleton/skeleton_eventdev.c | 30 +-
> > >  lib/librte_eventdev/rte_eventdev.c | 37 +++---
> > >  lib/librte_eventdev/rte_eventdev.h |  2 -
> > >  lib/librte_eventdev/rte_eventdev_pmd.h | 63 
> > > --
> > >  4 files changed, 41 insertions(+), 91 deletions(-)
> > > 
> > > diff --git a/drivers/event/skeleton/skeleton_eventdev.c 
> > > b/drivers/event/skeleton/skeleton_eventdev.c
> > > index 800bd76e0..34684aba0 100644
> > > --- a/drivers/event/skeleton/skeleton_eventdev.c
> > > +++ b/drivers/event/skeleton/skeleton_eventdev.c
> > > @@ -427,18 +427,28 @@ static const struct rte_pci_id 
> > > pci_id_skeleton_map[] = {
> > >   },
> > >  };
> > >  
> > > -static struct rte_eventdev_driver pci_eventdev_skeleton_pmd = {
> > > - .pci_drv = {
> > > - .id_table = pci_id_skeleton_map,
> > > - .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> > > - .probe = rte_event_pmd_pci_probe,
> > > - .remove = rte_event_pmd_pci_remove,
> > > - },
> > > - .eventdev_init = skeleton_eventdev_init,
> > > - .dev_private_size = sizeof(struct skeleton_eventdev),
> > > +static int
> > > +event_skeleton_pci_probe(struct rte_pci_driver *pci_drv,
> > > +  struct rte_pci_device *pci_dev)
> > > +{
> > > + return rte_event_pmd_pci_probe(pci_drv, pci_dev,
> > > + sizeof(struct skeleton_eventdev), skeleton_eventdev_init);
> > > +}
> > > +
> > > +static int
> > > +event_skeleton_pci_remove(struct rte_pci_device *pci_dev)
> > > +{
> > > + return rte_event_pmd_pci_remove(pci_dev, NULL);
> > > +}
> > > +
> > > +static struct rte_pci_driver pci_eventdev_skeleton_pmd = {
> > > + .id_table = pci_id_skeleton_map,
> > > + .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> > > + .probe = event_skeleton_pci_probe,
> > > + .remove = event_skeleton_pci_remove,
> > >  };
> > >  
> > > -RTE_PMD_REGISTER_PCI(event_skeleton_pci, 
> > > pci_eventdev_skeleton_pmd.pci_drv);
> > > +RTE_PMD_REGISTER_PCI(event_skeleton_pci, pci_eventdev_skeleton_pmd);
> > >  RTE_PMD_REGISTER_PCI_TABLE(event_skeleton_pci, pci_id_skeleton_map);
> > >  
> > >  /* VDEV based event device */
> > > diff --git a/lib/librte_eventdev/rte_eventdev.c 
> > > b/lib/librte_eventdev/rte_eventdev.c
> > > index 20afc3f0e..91f950666 100644
> > > --- a/lib/librte_eventdev/rte_eventdev.c
> > > +++ b/lib/librte_eventdev/rte_eventdev.c
> > > @@ -126,8 +126,6 @@ rte_event_dev_info_get(uint8_t dev_id, struct 
> > > rte_event_dev_info *dev_info)
> > >   dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;
> > >  
> > >   dev_info->dev = dev->dev;
> > > - if (dev->driver)
> > > - dev_info->driver_name = dev->driver->pci_drv.driver.name;
> > >   return 0;
> > >  }
> > >  
> > > @@ -1250,18 +1248,18 @@ rte_event_pmd_vdev_uninit(const char *name)
> > >  
> > >  int
> > >  rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
> > > - struct rte_pci_device *pci_dev)
> > > + struct rte_pci_device *pci_dev,
> > > + size_t private_data_size,
> > > + eventdev_pmd_pci_callback_t devinit)
> > >  {
> > > - struct rte_eventdev_driver *eventdrv;
> > >   struct rte_eventdev *eventdev;
> > >  
> > >   char eventdev_name[RTE_EVENTDEV_NAME_MAX_LEN];
> > >  
> > >   int retval;
> > >  
> > > - eventdrv = (struct rte_eventdev_driver *)pci_drv;
> > > - if (eventdrv == NULL)
> > > - return -ENODEV;
> > > + if (devinit == NULL)
> > > + return -EINVAL;
> > >  
> > >   rte_pci_device_name(&pci_dev->addr, eventdev_name,
> > >   sizeof(eventdev_name));
> > > @@ -1275,7 +1273,7 @@ rte_event_pmd_pci_probe(struct rte_pci_driver 
> > > *pci_drv,

[dpdk-dev] [PATCH v4 0/2] Balanced allocation of hugepages

2017-06-06 Thread Ilya Maximets
Version 4:
* Fixed work on systems without NUMA by adding check for NUMA
  support in kernel.

Version 3:
* Implemented hybrid schema for allocation.
* Fixed not needed mempolicy change while remapping. (orig = 0)
* Added patch to enable VHOST_NUMA by default.

Version 2:
* rebased (fuzz in Makefile)

Ilya Maximets (2):
  mem: balanced allocation of hugepages
  config: enable vhost numa awareness by default

 config/common_base   |  2 +-
 lib/librte_eal/Makefile  |  2 +
 lib/librte_eal/linuxapp/eal/eal_memory.c | 94 ++--
 mk/rte.app.mk|  1 +
 4 files changed, 94 insertions(+), 5 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v4 1/2] mem: balanced allocation of hugepages

2017-06-06 Thread Ilya Maximets
Currently EAL allocates hugepages one by one, without paying attention
to which NUMA node the allocation comes from.

Such behaviour leads to allocation failures when the number of hugepages
available to the application is limited by cgroups or hugetlbfs and
memory is requested from more than just the first socket.

Example:
# 90 x 1GB hugepages available in a system

cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...

EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
 Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory

This happens because all allocated pages are
on socket 0.

Fix this issue by setting the MPOL_PREFERRED mempolicy for each hugepage,
pointing it to one of the requested nodes, using the following scheme:

1) Allocate the essential hugepages:
1.1) Allocate only as many hugepages from NUMA node N as are needed
     to satisfy the memory requested for that node.
1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
   fashion.
3) Sort the pages and choose the most suitable ones.

In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.

libnuma is added as a general dependency for EAL.
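
(Editorial sketch, not part of the patch: the per-page preferred-node policy
amounts to roughly the following; the helper name is made up.)

  #include <numaif.h>        /* set_mempolicy(), MPOL_PREFERRED */

  /* Prefer allocating the next hugepage from 'node'. */
  static void prefer_node(int node)
  {
          unsigned long nodemask = 1UL << node;

          /* MPOL_PREFERRED still falls back to other nodes if
           * 'node' runs out of free hugepages. */
          set_mempolicy(MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8);
          /* ... mmap() one hugepage here ... */
          set_mempolicy(MPOL_DEFAULT, NULL, 0);  /* restore default */
  }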

Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")

Signed-off-by: Ilya Maximets 
---
 lib/librte_eal/Makefile  |  2 +
 lib/librte_eal/linuxapp/eal/eal_memory.c | 94 ++--
 mk/rte.app.mk|  1 +
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/Makefile b/lib/librte_eal/Makefile
index 5690bb4..0a1af3a 100644
--- a/lib/librte_eal/Makefile
+++ b/lib/librte_eal/Makefile
@@ -37,4 +37,6 @@ DEPDIRS-linuxapp := common
 DIRS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += bsdapp
 DEPDIRS-bsdapp := common
 
+LDLIBS += -lnuma
+
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 9c9baf6..5947434 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -358,6 +359,19 @@ static int huge_wrap_sigsetjmp(void)
return sigsetjmp(huge_jmpenv, 1);
 }
 
+#ifndef ULONG_SIZE
+#define ULONG_SIZE sizeof(unsigned long)
+#endif
+#ifndef ULONG_BITS
+#define ULONG_BITS (ULONG_SIZE * CHAR_BIT)
+#endif
+#ifndef DIV_ROUND_UP
+#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+#endif
+#ifndef BITS_TO_LONGS
+#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, ULONG_SIZE)
+#endif
+
 /*
  * Mmap all hugepages of hugepage table: it first open a file in
  * hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
@@ -366,18 +380,78 @@ static int huge_wrap_sigsetjmp(void)
  * map continguous physical blocks in contiguous virtual blocks.
  */
 static unsigned
-map_all_hugepages(struct hugepage_file *hugepg_tbl,
-   struct hugepage_info *hpi, int orig)
+map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
+ uint64_t *essential_memory, int orig)
 {
int fd;
unsigned i;
void *virtaddr;
void *vma_addr = NULL;
size_t vma_len = 0;
+   unsigned long nodemask[BITS_TO_LONGS(RTE_MAX_NUMA_NODES)] = {0UL};
+   unsigned long maxnode = 0;
+   int node_id = -1;
+   bool numa_available = true;
+
+   /* Check if kernel supports NUMA. */
+   if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && errno == ENOSYS) {
+   RTE_LOG(DEBUG, EAL, "NUMA is not supported.\n");
+   numa_available = false;
+   }
+
+   if (orig && numa_available) {
+   for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
+   if (internal_config.socket_mem[i])
+   maxnode = i + 1;
+   }
 
for (i = 0; i < hpi->num_pages[0]; i++) {
uint64_t hugepage_sz = hpi->hugepage_sz;
 
+   if (maxnode) {
+   unsigned int j;
+
+   for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
+   if (essential_memory[j])
+   break;
+
+   if (j == RTE_MAX_NUMA_NODES) {
+   node_id = (node_id + 1) % RTE_MAX_NUMA_NODES;
+   while (!internal_config.socket_mem[node_id]) {
+   node_id++;
+   node_id %= RTE_MAX_NUMA_NODES;
+   }
+ 

[dpdk-dev] [PATCH v4 2/2] config: enable vhost numa awareness by default

2017-06-06 Thread Ilya Maximets
Since libnuma is added as a general dependency for EAL,
it is safe to enable LIBRTE_VHOST_NUMA by default.

Signed-off-by: Ilya Maximets 
---
 config/common_base | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config/common_base b/config/common_base
index c858769..db4cc1c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -708,7 +708,7 @@ CONFIG_RTE_LIBRTE_PDUMP=y
 # Compile vhost user library
 #
 CONFIG_RTE_LIBRTE_VHOST=n
-CONFIG_RTE_LIBRTE_VHOST_NUMA=n
+CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
-- 
2.7.4



Re: [dpdk-dev] [PATCH 3/3] net/i40e: update supported patterns for FDIR

2017-06-06 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Wednesday, May 24, 2017 2:10 PM
> To: Zhang, Helin; Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 3/3] net/i40e: update supported patterns for
> FDIR
> 
> This patch updates supported patterns for flow director filters.
> 
> Signed-off-by: Beilei Xing 
Acked-by: Wenzhuo Lu 


Re: [dpdk-dev] [PATCH] eventdev: remove PCI dependency

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 6 Jun 2017 10:09:21 +0200
> From: Gaëtan Rivet 
> To: Jerin Jacob 
> Cc: dev@dpdk.org, bruce.richard...@intel.com, harry.van.haa...@intel.com,
>  hemant.agra...@nxp.com, gage.e...@intel.com, nipun.gu...@nxp.com
> Subject: Re: [dpdk-dev] [PATCH] eventdev: remove PCI dependency
> User-Agent: Mutt/1.5.23 (2014-03-12)
> 
> On Tue, Jun 06, 2017 at 08:35:48AM +0530, Jerin Jacob wrote:
> > -Original Message-
> > > Date: Mon, 5 Jun 2017 14:55:55 +0200
> > > From: Gaëtan Rivet 
> > > To: Jerin Jacob 
> > > Cc: dev@dpdk.org, bruce.richard...@intel.com, harry.van.haa...@intel.com,
> > >  hemant.agra...@nxp.com, gage.e...@intel.com, nipun.gu...@nxp.com
> > > Subject: Re: [dpdk-dev] [PATCH] eventdev: remove PCI dependency
> > > User-Agent: Mutt/1.5.23 (2014-03-12)
> > > 
> > > Hi Jerin,
> > 
> > Hi Gaëtan,
> > 
> > > 
> > > On Thu, Jun 01, 2017 at 10:11:46PM +0530, Jerin Jacob wrote:
> > > > Remove the PCI dependency from generic data structures
> > > > and moved the PCI specific code to rte_event_pmd_pci*
> > > > 
> > > 
> > > Thanks for working on this.
> > > 
> > > Do you plan on removing rte_pci.h in rte_eventdev_pmd.h? Do you think it
> > > would be feasible?
> > 
> > That is for PCI PMD specific probe(rte_event_pmd_pci_probe() and 
> > rte_event_pmd_pci_remove()),
> > More like, lib/librte_ether/rte_ethdev_pci.h functions in ethdev.
> > So, I think, It is OK to keep rte_pci.h for PMD specific functions.
> > 
> > 
> 
> Ok, sure. However rte_eventdev.c includes both rte_pci.h and
> rte_eventdev_pmd.h. Can it be made independent from the PMD specific
> include?

Sure. I will remove rte_pci.h from rte_eventdev.c and send the v2.



Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 6 Jun 2017 13:20:42 +0530
> From: Jerin Jacob 
> To: Thomas Monjalon 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets
> User-Agent: Mutt/1.8.3 (2017-05-23)
> 
> -Original Message-
> > Date: Tue, 06 Jun 2017 09:16:34 +0200
> > From: Thomas Monjalon 
> > To: Jerin Jacob 
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets
> > 
> > 06/06/2017 09:02, Jerin Jacob:
> > > From: Thomas Monjalon 
> > > > 06/06/2017 08:36, Jerin Jacob:
> > > > > Add a hook in generic rte.sdkbuild.mk file
> > > > > to include exec-env specific targets.
> > > > > 
> > > > > Signed-off-by: Jerin Jacob 
> > > > > ---
> > > > > Useful in integrating some custom targets in nonstandard execution 
> > > > > environments.
> > > > > For example, a bare-metal-simulator exec execution environment may 
> > > > > need
> > > > > a target to run the dpdk applications.
> > > > > ---
> > > > 
> > > > This patch is just including an empty file.
> > > 
> > > Do you like to add check for the file is present or not ? and if present,
> > > invoke the file.
> > 
> > The dash prefixing does the check:
> > -include
> 
> OK
> 
> > 
> > > > Please explain how it can help with a real example.
> > > 
> > > We are evaluating on running DPDK on a nonstandard execution environment 
> > > like
> > > bare metal where I would to keep all my execution environment specific
> > > change at following location. So that I can easy move around different
> > > version of DPDK without merge conflict.
> > > 
> > > $(RTE_SDK)mk/exec-env/my-exec-env
> > > $(RTE_SDK)lib/librte_eal/my-exec-env
> > > 
> > > I believe, The existing target like "exec-env-appinstall" in 
> > > mk/exec-env/linuxapp/rte.app.mk,
> > > solves the same purpose.
> > 
> > I do not understand.
> > If you want to add a new environment, why not just adding it?
> 
> I do not understand it either. In exiting makefile infrastructure,
> How do you add an exec environment specific target(s) with out changing
> the common code?

As discussed on IRC, I will send a v2 with the following changes:
- Change mk/exec-env/$(RTE_EXEC_ENV)/rte.extra.mk to
mk/exec-env/$(RTE_EXEC_ENV)/rte.custom.mk
- Remove the empty files and include the file through -include

> 
> > 


Re: [dpdk-dev] [PATCH] mk: allow exec-env specific targets

2017-06-06 Thread Thomas Monjalon
06/06/2017 11:05, Jerin Jacob:
> From: Jerin Jacob 
> > From: Thomas Monjalon 
> > > 06/06/2017 09:02, Jerin Jacob:
> > > > From: Thomas Monjalon 
> > > > > Please explain how it can help with a real example.
> > > > 
> > > > We are evaluating on running DPDK on a nonstandard execution 
> > > > environment like
> > > > bare metal where I would to keep all my execution environment specific
> > > > change at following location. So that I can easy move around different
> > > > version of DPDK without merge conflict.
> > > > 
> > > > $(RTE_SDK)mk/exec-env/my-exec-env
> > > > $(RTE_SDK)lib/librte_eal/my-exec-env
> > > > 
> > > > I believe, The existing target like "exec-env-appinstall" in 
> > > > mk/exec-env/linuxapp/rte.app.mk,
> > > > solves the same purpose.
> > > 
> > > I do not understand.
> > > If you want to add a new environment, why not just adding it?
> > 
> > I do not understand it either. In exiting makefile infrastructure,
> > How do you add an exec environment specific target(s) with out changing
> > the common code?
> 
> As disucssed in IRC, I will send the v2 with following changes,
> - Change mk/exec-env/$(RTE_EXEC_ENV)/rte.extra.mk to
> mk/exec-env/$(RTE_EXEC_ENV)/rte.custom.mk
> - Remove empty files and include through -include

It will help in defining some new local environments.
However, in the general case, it is better to upstream environment changes
so that everybody is able to use them.


Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova as va

2017-06-06 Thread Bruce Richardson
On Mon, Jun 05, 2017 at 10:24:11AM +0530, santosh wrote:
> Hi Bruce,
> 
> 
> On Friday 02 June 2017 02:57 PM, Bruce Richardson wrote:
> > On Fri, Jun 02, 2017 at 09:54:46AM +0530, santosh wrote:
> >> Ping?
> >>
> >> On Wednesday 24 May 2017 09:41 PM, Santosh Shukla wrote:
> >>
> >>> Some NPU hardware like OCTEONTX follows push model to get
> >>> the packet from the pktio device. Where packet allocation
> >>> and freeing done by the HW. Since HW can operate only on
> >>> IOVA with help of SMMU/IOMMU, When packet receives from the
> >>> Ethernet device, It is the IOVA address(which is PA in existing scheme).
> >>>
> >>> Mapping IOVA as PA is expensive on those HW, where every
> >>> packet needs to be converted to VA from PA/IOVA.
> >>>
> >>> This patch proposes the scheme where the user can set IOVA
> >>> as VA by using an eal command line argument. That helps to
> >>> avoid costly lookup for VA in SW by leveraging the SMMU
> >>> translation feature.
> >>>
> >>> Signed-off-by: Santosh Shukla 
> >>> ---
> > Hi,
> >
> > I agree this is a problem that needs to be solved, but this doesn't look
> > like a particularly future-proofed solution. Given that we should
> > use the IOMMU on as many platforms as possible for protection, we
> > probably need to find an automatic way for DPDK to use IO addresses
> > correctly. Is this therefore better done as part of the VFIO and
> > UIO-specific code in EAL - as that is the part that knows how the memory
> > mapping is done, and in the VFIO case, what address ranges were
> > programmed in. The mempool driver was something else I considered but it
> > is probably too high a level to implement this.
> 
> The other approach which we evaluated, Its detail:
> 0) Introduce a new bus api whose job is to detect iommu capable devices on 
> that
> bus {/ are those devices bind to iommu capable driver or not?}. Let's call 
> that
> api rte_bus_chk_iommu_dev();
> 
> 1) The scheme is like If _all_ the devices bind to iommu kdrv then return 
> iova=va
> 2) Otherwise switch to default mode i.e.. iova=pa.
> 3) Based on rte_bus_chk_iommu_dev() return value, 
> accordingly program iova=va Or iova=pa in vfio_type1/spapr_map(). 
> 
> 4) User from the command line can always override iova=va, 
> in case if he wants to default scheme( iova=pa mode). For that purpose - 
> Introduce eal
> option something like --iova-pa Or --override-iova Or --iova-default 
> or some better name.
> 
> Proposed API snap:
> 
> enum iova_mode {
> iova_va;
> iova_pa;
> iova_unknown;
> };
> 
> /**
>  * Look for iommu devices on that Bus.
>  * And find out that those devices bind to iommu
>  * capable driver example vfio.
>  *
>  *
>  * @return
>  *  On success return valid iova mode (iova_va or iova_pa)
>  *  On failure return iova_unkown.
>  */
> typedef int (*rte_bus_chk_iommu_dev_t)(void);
> 
> 
> By this approach, 
> - We can automatically detect iova is va or pa
> and then program accordingly. 
> - Also, the user can always switch to default iova mode.
> - Drivers like dpaa2 can use this API to detect iova mode then 
> program dma_map accordingly. Currently they are doing in ifdef-way.
> 
> Comments? thoughts? Or if anyone has better proposal then, please
> suggest.
> 

That sounds like a more complete solution. However, it's probably a lot of
work to implement. :-)

I also wonder if we want to simplify things a little and disallow
mixed-mode operation, i.e. all devices have to use UIO or all use VFIO?
Would that help to allow simplification or other options? Having a whole
new bus type seems strange for this. Can each bus just report whether
its members require physical addresses? Then the EAL can manage a
single flag to report whether we are using VA or PA.

/Bruce


Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

2017-06-06 Thread Ananyev, Konstantin

> >
> >
> >
> > >
> > > The PROD/CONS_ALIGN values on x86-64 are set to 2 cache lines, so members
> > of struct rte_ring are 128 byte aligned,
> > >and therefore the whole struct needs 128-byte alignment according to the 
> > >ABI
> > so that the 128-byte alignment of the fields can be guaranteed.
> >
> > Ah ok, missed the fact that rte_ring is 128B aligned these days.
> > BTW, I probably missed the initial discussion, but what was the reason for 
> > that?
> > Konstantin
> 
> I don't know why PROD_ALIGN/CONS_ALIGN use 128 byte alignment; it seems 
> unnecessary if the cache line is only 64 bytes.  An alternate
> fix would be to just use cache line alignment for these fields (since 
> memzones are already cache line aligned). 

Yes, had the same thought.

> Maybe there is some deeper  reason for the >= 128-byte alignment logic in 
> rte_ring.h?

It might be; it would be good to hear the opinion of the author of that change.
Thanks
Konstantin
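
(For context, an editorial sketch of the layout under discussion on x86-64,
where the cache line is 64 bytes and PROD/CONS_ALIGN are set to 128:)

  struct rte_ring {
          /* ... name, flags, memzone, size, mask ... */
          struct rte_ring_headtail prod __rte_aligned(128);
          struct rte_ring_headtail cons __rte_aligned(128);
          /* ring[] storage follows */
  };

  /* Because the members carry 128-byte alignment, the ABI alignment of
   * the whole struct is 128 bytes, which is why the memzone holding it
   * has to be allocated with matching alignment. */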


Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Van Haaren, Harry


> -Original Message-
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> Sent: Monday, June 5, 2017 8:23 AM
> To: Van Haaren, Harry 
> Cc: dev@dpdk.org; Thomas Monjalon ; Richardson, Bruce
> ; Ananyev, Konstantin 
> ; Wiles, Keith
> 
> Subject: Re: [dpdk-dev] [RFCv2] service core concept




> Looks good to me in general.
> 
> How about an API to query the service function running status?
> bool rte_service_is_running(struct rte_service_spec *service); or something 
> similar.


Good idea - noted and will implement for patchset.


Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova as va

2017-06-06 Thread Gaëtan Rivet
On Tue, Jun 06, 2017 at 10:57:20AM +0100, Bruce Richardson wrote:
> On Mon, Jun 05, 2017 at 10:24:11AM +0530, santosh wrote:
> > Hi Bruce,
> > 
> > 
> > On Friday 02 June 2017 02:57 PM, Bruce Richardson wrote:
> > > On Fri, Jun 02, 2017 at 09:54:46AM +0530, santosh wrote:
> > >> Ping?
> > >>
> > >> On Wednesday 24 May 2017 09:41 PM, Santosh Shukla wrote:
> > >>
> > >>> Some NPU hardware like OCTEONTX follows push model to get
> > >>> the packet from the pktio device. Where packet allocation
> > >>> and freeing done by the HW. Since HW can operate only on
> > >>> IOVA with help of SMMU/IOMMU, When packet receives from the
> > >>> Ethernet device, It is the IOVA address(which is PA in existing scheme).
> > >>>
> > >>> Mapping IOVA as PA is expensive on those HW, where every
> > >>> packet needs to be converted to VA from PA/IOVA.
> > >>>
> > >>> This patch proposes the scheme where the user can set IOVA
> > >>> as VA by using an eal command line argument. That helps to
> > >>> avoid costly lookup for VA in SW by leveraging the SMMU
> > >>> translation feature.
> > >>>
> > >>> Signed-off-by: Santosh Shukla 
> > >>> ---
> > > Hi,
> > >
> > > I agree this is a problem that needs to be solved, but this doesn't look
> > > like a particularly future-proofed solution. Given that we should
> > > use the IOMMU on as many platforms as possible for protection, we
> > > probably need to find an automatic way for DPDK to use IO addresses
> > > correctly. Is this therefore better done as part of the VFIO and
> > > UIO-specific code in EAL - as that is the part that knows how the memory
> > > mapping is done, and in the VFIO case, what address ranges were
> > > programmed in. The mempool driver was something else I considered but it
> > > is probably too high a level to implement this.
> > 
> > The other approach which we evaluated, Its detail:
> > 0) Introduce a new bus api whose job is to detect iommu capable devices on 
> > that
> > bus {/ are those devices bind to iommu capable driver or not?}. Let's call 
> > that
> > api rte_bus_chk_iommu_dev();
> > 
> > 1) The scheme is like If _all_ the devices bind to iommu kdrv then return 
> > iova=va
> > 2) Otherwise switch to default mode i.e.. iova=pa.
> > 3) Based on rte_bus_chk_iommu_dev() return value, 
> > accordingly program iova=va Or iova=pa in vfio_type1/spapr_map(). 
> > 
> > 4) User from the command line can always override iova=va, 
> > in case if he wants to default scheme( iova=pa mode). For that purpose - 
> > Introduce eal
> > option something like --iova-pa Or --override-iova Or --iova-default 
> > or some better name.
> > 
> > Proposed API snap:
> > 
> > enum iova_mode {
> > iova_va;
> > iova_pa;
> > iova_unknown;
> > };
> > 
> > /**
> >  * Look for iommu devices on that Bus.
> >  * And find out that those devices bind to iommu
> >  * capable driver example vfio.
> >  *
> >  *
> >  * @return
> >  *  On success return valid iova mode (iova_va or iova_pa)
> >  *  On failure return iova_unkown.
> >  */
> > typedef int (*rte_bus_chk_iommu_dev_t)(void);
> > 
> > 
> > By this approach, 
> > - We can automatically detect iova is va or pa
> > and then program accordingly. 
> > - Also, the user can always switch to default iova mode.
> > - Drivers like dpaa2 can use this API to detect iova mode then 
> > program dma_map accordingly. Currently they are doing in ifdef-way.
> > 
> > Comments? thoughts? Or if anyone has better proposal then, please
> > suggest.
> > 
> 
> That sounds a more complete solution. However, it's probably a lot of
> work to implement. :-)
> 
> I also wonder if we want to simplify things a little and disallow
> mixed-mode operation i.e. all devices have to use UIO or all use VFIO?
> Would that help to allow simplification or other options. Having a whole
> new bus type seems strange for this. Can each bus just report whether
> it's members require physical addresses. Then the EAL can manage a
> single flag to report whether we are using VA or PA?
> 

Implementing this at the bus level requires all buses to have driver
iterators, which are currently not exposed, or forces all buses to
actively report driver capabilities upon successful probing. The former
is a sizeable evolution, while the latter leads to duplicated code in
every bus->probe() implementation, which seems unsound.

I may be mistaken, but isn't this IOVA mode currently limited to
VFIO? Should this API be made generic for all buses, or is it only
relevant to the PCI bus?

If it can stay specific to the PCI bus, then it should greatly simplify
the implementation.

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH v2] net/i40e: update actions for FDIR

2017-06-06 Thread Ferruh Yigit
On 6/2/2017 3:22 AM, Wu, Jingjing wrote:
> 
> 
>> -Original Message-
>> From: Xing, Beilei
>> Sent: Thursday, June 1, 2017 7:48 AM
>> To: Wu, Jingjing 
>> Cc: dev@dpdk.org
>> Subject: [PATCH v2] net/i40e: update actions for FDIR
>>
>> This commit adds support of FLAG action and PASSTHRU action for flow
>> director.
>>
>> Signed-off-by: Beilei Xing 
> 
> Acked-by: Jingjing Wu 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Van Haaren, Harry
> -Original Message-
> From: Ananyev, Konstantin
> Sent: Saturday, June 3, 2017 11:23 AM
> To: Van Haaren, Harry ; dev@dpdk.org
> Cc: Thomas Monjalon ; Jerin Jacob 
> ;
> Richardson, Bruce ; Wiles, Keith 
> 
> Subject: RE: [dpdk-dev] [RFCv2] service core concept



> > In particular this version of the API enables applications that are not 
> > aware of services to
> > benefit from the services concept, as EAL args can be used to setup 
> > services and service
> cores.
> > With this design, switching to/from SW/HW PMD is transparent to the 
> > application. An example
> > use-case is the Eventdev HW PMD to Eventdev SW PMD that requires a service 
> > core.
> >
> > I have noted the implementation comments that were raised on the v1. For 
> > v2, I think our
> time
> > is better spent looking at the API design, and I will handle implementation 
> > feedback in the
> > follow-up patchset to v2 RFC.
> >
> > Below a summary of what we are trying to achieve, and the current API 
> > design.
> > Have a good weekend! Cheers, -Harry
> 
>
> Looks good to me in general.
> The only comment I have - do we really need to put it into rte_eal_init()
> and a new EAL command-line parameter for it?
> Might be better to leave it to the particular app to decide.


There are a number of options here, each with its own merit:

A) Services/cores config in EAL
Benefit is that service functionality can be transparent to the application. 
Negative is that the complexity is in EAL.

B) Application configures services/cores
Benefit is no added EAL complexity. Negative is that application code has to 
configure cores (duplicated per application).


To answer this question, I think we need to estimate how many applications 
would benefit from EAL integration and balance that against the "complexity 
cost" of doing so. I do like the simplicity of option (B), however if there is 
significant value in total transparency to the application I think (A) is the 
better choice.


Input on A) or B) welcomed! -Harry


Re: [dpdk-dev] [PATCH v3 0/3] ixgbe: enable flex filter for rte_flow

2017-06-06 Thread Ferruh Yigit
On 6/1/2017 6:36 PM, Qi Zhang wrote:
> Enable fdir flex byte support for rte_flow APIs.
> 
> v2:
> - fix couple checkpatch errors.
> 
> v3:
> - fix comment.
> 
> Qi Zhang (3):
>   net/ixgbe: remove reduandent code
>   net/ixgbe: fix fdir mask not be reset
>   net/ixgbe: enable flex bytes for generic flow API

Acked-by: Wenzhuo Lu 

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v2] net/i40e: exclude internal packet's byte count

2017-06-06 Thread Ferruh Yigit
On 6/2/2017 3:10 AM, Xing, Beilei wrote:
> 
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Qi Zhang
>> Sent: Friday, June 2, 2017 1:56 AM
>> To: Wu, Jingjing ; Zhang, Helin
>> 
>> Cc: dev@dpdk.org; Zhang, Qi Z ; sta...@dpdk.org
>> Subject: [dpdk-dev] [PATCH v2] net/i40e: exclude internal packet's byte
>> count
>>
>> Tx/Rx byte counts of internal managed packet should be exluded from the
>> total rx/tx bytes.
>>
>> Fixes: 9aace75fc82e ("i40e: fix statistics")
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Qi Zhang 
> 
> Acked-by: Beilei Xing 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova as va

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 6 Jun 2017 10:57:20 +0100
> From: Bruce Richardson 
> To: santosh 
> CC: tho...@monjalon.net, dev@dpdk.org, jerin.ja...@caviumnetworks.com,
>  hemant.agra...@nxp.com
> Subject: Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova
>  as va
> User-Agent: Mutt/1.8.1 (2017-04-11)
> 
> On Mon, Jun 05, 2017 at 10:24:11AM +0530, santosh wrote:
> > Hi Bruce,
> > 
> > 
> > On Friday 02 June 2017 02:57 PM, Bruce Richardson wrote:
> > > On Fri, Jun 02, 2017 at 09:54:46AM +0530, santosh wrote:
> > >> Ping?
> > >>
> > >> On Wednesday 24 May 2017 09:41 PM, Santosh Shukla wrote:
> > >>
> > >>> Some NPU hardware like OCTEONTX follows push model to get
> > >>> the packet from the pktio device. Where packet allocation
> > >>> and freeing done by the HW. Since HW can operate only on
> > >>> IOVA with help of SMMU/IOMMU, When packet receives from the
> > >>> Ethernet device, It is the IOVA address(which is PA in existing scheme).
> > >>>
> > >>> Mapping IOVA as PA is expensive on those HW, where every
> > >>> packet needs to be converted to VA from PA/IOVA.
> > >>>
> > >>> This patch proposes the scheme where the user can set IOVA
> > >>> as VA by using an eal command line argument. That helps to
> > >>> avoid costly lookup for VA in SW by leveraging the SMMU
> > >>> translation feature.
> > >>>
> > >>> Signed-off-by: Santosh Shukla 
> > >>> ---
> > > Hi,
> > >
> > > I agree this is a problem that needs to be solved, but this doesn't look
> > > like a particularly future-proofed solution. Given that we should
> > > use the IOMMU on as many platforms as possible for protection, we
> > > probably need to find an automatic way for DPDK to use IO addresses
> > > correctly. Is this therefore better done as part of the VFIO and
> > > UIO-specific code in EAL - as that is the part that knows how the memory
> > > mapping is done, and in the VFIO case, what address ranges were
> > > programmed in. The mempool driver was something else I considered but it
> > > is probably too high a level to implement this.
> > 
> > The other approach which we evaluated, Its detail:
> > 0) Introduce a new bus api whose job is to detect iommu capable devices on 
> > that
> > bus {/ are those devices bind to iommu capable driver or not?}. Let's call 
> > that
> > api rte_bus_chk_iommu_dev();
> > 
> > 1) The scheme is like If _all_ the devices bind to iommu kdrv then return 
> > iova=va
> > 2) Otherwise switch to default mode i.e.. iova=pa.
> > 3) Based on rte_bus_chk_iommu_dev() return value, 
> > accordingly program iova=va Or iova=pa in vfio_type1/spapr_map(). 
> > 
> > 4) User from the command line can always override iova=va, 
> > in case if he wants to default scheme( iova=pa mode). For that purpose - 
> > Introduce eal
> > option something like --iova-pa Or --override-iova Or --iova-default 
> > or some better name.
> > 
> > Proposed API snap:
> > 
> > enum iova_mode {
> > iova_va;
> > iova_pa;
> > iova_unknown;
> > };
> > 
> > /**
> >  * Look for iommu devices on that Bus.
> >  * And find out that those devices bind to iommu
> >  * capable driver example vfio.
> >  *
> >  *
> >  * @return
> >  *  On success return valid iova mode (iova_va or iova_pa)
> >  *  On failure return iova_unkown.
> >  */
> > typedef int (*rte_bus_chk_iommu_dev_t)(void);
> > 
> > 
> > By this approach, 
> > - We can automatically detect iova is va or pa
> > and then program accordingly. 
> > - Also, the user can always switch to default iova mode.
> > - Drivers like dpaa2 can use this API to detect iova mode then 
> > program dma_map accordingly. Currently they are doing in ifdef-way.
> > 
> > Comments? thoughts? Or if anyone has better proposal then, please
> > suggest.
> > 
> 
> That sounds a more complete solution. However, it's probably a lot of
> work to implement. :-)
> 
> I also wonder if we want to simplify things a little and disallow
> mixed-mode operation i.e. all devices have to use UIO or all use VFIO?
> Would that help to allow simplification or other options. Having a whole
> new bus type seems strange for this. Can each bus just report whether
> it's members require physical addresses. Then the EAL can manage a
> single flag to report whether we are using VA or PA?

That's the plan. Each bus op can report VA, PA, or don't care (in the
case of vdev), and an rte_bus aggregation function checks all the buses'
preferred addressing schemes and decides the mode of operation. Yes, we
will keep the aggregation logic simple for now: when every bus reports
either VA or don't care, we go with VA, otherwise PA.
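
(A minimal editorial sketch of that aggregation, reusing the iova_mode names
from the proposal above; the per-bus callback and the iteration macro are
hypothetical placeholders, not an existing API:)

  enum iova_mode { iova_pa, iova_va, iova_unknown };

  /* Decide the global IOVA mode from what each bus reports. */
  static enum iova_mode
  eal_get_iova_mode(void)
  {
          struct rte_bus *bus;

          FOREACH_REGISTERED_BUS(bus) {           /* placeholder iterator */
                  if (bus->get_iova_mode == NULL) /* e.g. vdev: don't care */
                          continue;
                  if (bus->get_iova_mode() == iova_pa)
                          return iova_pa;         /* any PA-only bus forces PA */
          }
          return iova_va;         /* all buses report VA or don't care */
  }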

> /Bruce


Re: [dpdk-dev] [PATCH] net/e1000: add support 2-tuple filter on i210/i211

2017-06-06 Thread Ferruh Yigit
On 6/5/2017 7:14 AM, Lu, Wenzhuo wrote:
> Hi,
> 
> 
>> -Original Message-
>> From: Zhao1, Wei
>> Sent: Monday, June 5, 2017 1:41 PM
>> To: dev@dpdk.org
>> Cc: Lu, Wenzhuo; Zhao1, Wei
>> Subject: [PATCH] net/e1000: add support 2-tuple filter on i210/i211
>>
>> Add support of i210 and i211 type nic in 2-tuple filter.
>>
>> Signed-off-by: Wei Zhao 
> Acked-by: Wenzhuo Lu 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova as va

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 6 Jun 2017 12:13:08 +0200
> From: Gaëtan Rivet 
> To: Bruce Richardson 
> Cc: santosh , tho...@monjalon.net,
>  dev@dpdk.org, jerin.ja...@caviumnetworks.com, hemant.agra...@nxp.com
> Subject: Re: [dpdk-dev] [RFC] eal/memory: introducing an option to set iova
>  as va
> User-Agent: Mutt/1.5.23 (2014-03-12)
> 
> > 
> > That sounds a more complete solution. However, it's probably a lot of
> > work to implement. :-)
> > 
> > I also wonder if we want to simplify things a little and disallow
> > mixed-mode operation i.e. all devices have to use UIO or all use VFIO?
> > Would that help to allow simplification or other options. Having a whole
> > new bus type seems strange for this. Can each bus just report whether
> > it's members require physical addresses. Then the EAL can manage a
> > single flag to report whether we are using VA or PA?
> > 
> 
> Implementing this at a bus level requires all buses to have drivers
> iterators, which are currently not exposed, or force all buses to
> actively report drivers capabilities upon successful probing. The former
> is a sizeable evolution while the latter leads to having duplicated code
> in all bus->probe() implementation, which seems unsound.
> 
> I may be mistaken, but is this iova mode not currently limited to
> VFIO? Should this API be made generic for all buses or is it only
> relevant to the PCI bus?
> 
> If it can stay specific to the PCI bus, then it should simplify greatly
> the implementation.

It is not PCI bus specific. We can have a VFIO platform bus too; the NXP bus is
a VFIO platform bus. I think this will help the NXP bus as well, as they are
currently using an #ifdef scheme to select PA vs VA.



Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Ananyev, Konstantin


> -Original Message-
> From: Van Haaren, Harry
> Sent: Tuesday, June 6, 2017 11:26 AM
> To: Ananyev, Konstantin ; dev@dpdk.org
> Cc: Thomas Monjalon ; Jerin Jacob 
> ; Richardson, Bruce
> ; Wiles, Keith 
> Subject: RE: [dpdk-dev] [RFCv2] service core concept
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Saturday, June 3, 2017 11:23 AM
> > To: Van Haaren, Harry ; dev@dpdk.org
> > Cc: Thomas Monjalon ; Jerin Jacob 
> > ;
> > Richardson, Bruce ; Wiles, Keith 
> > 
> > Subject: RE: [dpdk-dev] [RFCv2] service core concept
> 
> 
> 
> > > In particular this version of the API enables applications that are not 
> > > aware of services to
> > > benefit from the services concept, as EAL args can be used to setup 
> > > services and service
> > cores.
> > > With this design, switching to/from SW/HW PMD is transparent to the 
> > > application. An example
> > > use-case is the Eventdev HW PMD to Eventdev SW PMD that requires a 
> > > service core.
> > >
> > > I have noted the implementation comments that were raised on the v1. For 
> > > v2, I think our
> > time
> > > is better spent looking at the API design, and I will handle 
> > > implementation feedback in the
> > > follow-up patchset to v2 RFC.
> > >
> > > Below a summary of what we are trying to achieve, and the current API 
> > > design.
> > > Have a good weekend! Cheers, -Harry
> >
> >
> > Looks good to me in general.
> > The only comment I have - do we really need to put it into rte_eal_init()
> > and a new EAL command-line parameter for it?
> > Might be better to leave it to the particular app to decide.
> 
> 
> There are a number of options here, each with its own merit:
> 
> A) Services/cores config in EAL
> Benefit is that service functionality can be transparent to the application. 
> Negative is that the complexity is in EAL.

It is not only the complexity of EAL; as I understand it, it would also mean
that EAL will have a dependency on this new library.
Konstantin

> 
> B) Application configures services/cores
> Benefit is no added EAL complexity. Negative is that application code has to 
> configure cores (duplicated per application).
> 
> 
> To answer this question, I think we need to estimate how many applications 
> would benefit from EAL integration and balance that against
> the "complexity cost" of doing so. I do like the simplicity of option (B), 
> however if there is significant value in total transparency to the
> application I think (A) is the better choice.
> 
> 
> Input on A) or B) welcomed! -Harry


[dpdk-dev] [PATCH] net/liquidio: fix MTU calculation from port configuration

2017-06-06 Thread Shijith Thotton
The max_rx_pkt_len member of the port RX configuration indicates the maximum
frame length. The Ethernet header and CRC length should be subtracted from it
to find the MTU.
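For example, with the standard maximum Ethernet frame length of 1518 bytes:
1518 - 14 (Ethernet header) - 4 (CRC) = 1500, the standard Ethernet MTU.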

Fixes: 605164c8e79d ("net/liquidio: add API to validate VF MTU")

Signed-off-by: Shijith Thotton 
---
 drivers/net/liquidio/lio_ethdev.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/liquidio/lio_ethdev.c 
b/drivers/net/liquidio/lio_ethdev.c
index 436d25b..61946ac 100644
--- a/drivers/net/liquidio/lio_ethdev.c
+++ b/drivers/net/liquidio/lio_ethdev.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "lio_logs.h"
 #include "lio_23xx_vf.h"
@@ -1348,7 +1349,8 @@ struct rte_lio_xstats_name_off {
 static int
 lio_dev_start(struct rte_eth_dev *eth_dev)
 {
-   uint16_t mtu = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len;
+   uint16_t mtu;
+   uint32_t frame_len = eth_dev->data->dev_conf.rxmode.max_rx_pkt_len;
struct lio_device *lio_dev = LIO_DEV(eth_dev);
uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
int ret = 0;
@@ -1386,12 +1388,29 @@ struct rte_lio_xstats_name_off {
goto dev_mtu_check_error;
}
 
+   if (eth_dev->data->dev_conf.rxmode.jumbo_frame == 1) {
+   if (frame_len <= ETHER_MAX_LEN ||
+   frame_len > LIO_MAX_RX_PKTLEN) {
+   lio_dev_err(lio_dev, "max packet length should be >= %d and < %d when jumbo frame is enabled\n",
+   ETHER_MAX_LEN, LIO_MAX_RX_PKTLEN);
+   ret = -EINVAL;
+   goto dev_mtu_check_error;
+   }
+   mtu = (uint16_t)(frame_len - ETHER_HDR_LEN - ETHER_CRC_LEN);
+   } else {
+   /* default MTU */
+   mtu = ETHER_MTU;
+   eth_dev->data->dev_conf.rxmode.max_rx_pkt_len = ETHER_MAX_LEN;
+   }
+
if (lio_dev->linfo.link.s.mtu != mtu) {
ret = lio_dev_validate_vf_mtu(eth_dev, mtu);
if (ret)
goto dev_mtu_check_error;
}
 
+   eth_dev->data->mtu = mtu;
+
return 0;
 
 dev_mtu_check_error:
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

2017-06-06 Thread Bruce Richardson
On Tue, Jun 06, 2017 at 10:59:59AM +0100, Ananyev, Konstantin wrote:
> 
> > >
> > >
> > >
> > > >
> > > > The PROD/CONS_ALIGN values on x86-64 are set to 2 cache lines, so 
> > > > members
> > > of struct rte_ring are 128 byte aligned,
> > > >and therefore the whole struct needs 128-byte alignment according to the 
> > > >ABI
> > > so that the 128-byte alignment of the fields can be guaranteed.
> > >
> > > Ah ok, missed the fact that rte_ring is 128B aligned these days.
> > > BTW, I probably missed the initial discussion, but what was the reason 
> > > for that?
> > > Konstantin
> > 
> > I don't know why PROD_ALIGN/CONS_ALIGN use 128 byte alignment; it seems 
> > unnecessary if the cache line is only 64 bytes.  An alternate
> > fix would be to just use cache line alignment for these fields (since 
> > memzones are already cache line aligned). 
> 
> Yes, had the same thought.
> 
> > Maybe there is some deeper  reason for the >= 128-byte alignment logic in 
> > rte_ring.h?
> 
> Might be, would be good to hear opinion the author of that change. 

It gives improved performance for core-2-core transfer.

/Bruce


Re: [dpdk-dev] [PATCH v4 12/26] net/bnxt: add support to set MTU

2017-06-06 Thread Ferruh Yigit
On 6/1/2017 6:07 PM, Ajit Khaparde wrote:
> This patch adds support to modify MTU using the set_mtu dev_op.
> To support frames > 2k, the PMD creates an aggregator ring.
> When a frame greater than 2k is received, it is fragmented
> and the resulting fragments are DMA'ed to the aggregator ring.
> Now the driver can support jumbo frames upto 9500 bytes.
> 
> Signed-off-by: Steeven Li 
> Signed-off-by: Ajit Khaparde 
> 
> --
> v1->v2: regroup related patches and incorporate other review comments
> 
> v2->v3:
>   - rebasing to next-net tree
>   - Use net/bnxt instead of just bnxt in patch subject

<...>

> +int bnxt_hwrm_vnic_plcmode_cfg(struct bnxt *bp,
> + struct bnxt_vnic_info *vnic)
> +{
> + int rc = 0;
> + struct hwrm_vnic_plcmodes_cfg_input req = {.req_type = 0 };
> + struct hwrm_vnic_plcmodes_cfg_output *resp = bp->hwrm_cmd_resp_addr;
> + uint16_t size;
> +
> + HWRM_PREP(req, VNIC_PLCMODES_CFG, -1, resp);
> +
> + req.flags = rte_cpu_to_le_32(
> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_REGULAR_PLACEMENT |
> + HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_JUMBO_PLACEMENT);
> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV4 | //TODO
> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV6);

Hi Ajit,

Would you mind if I remove this commented-out code, in this patch and other
patches, while applying?

Of course it would be better if you sent a new version of the patch to fix
them, but I believe I can do this faster. Please just let me know.

Thanks,
ferruh

> + req.enables = rte_cpu_to_le_32(
> + HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_JUMBO_THRESH_VALID);
> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_HDS_THRESHOLD_VALID);
> +
> + size = rte_pktmbuf_data_room_size(bp->rx_queues[0]->mb_pool);
> + size -= RTE_PKTMBUF_HEADROOM;
> +
> + req.jumbo_thresh = rte_cpu_to_le_16(size);
> +//   req.hds_threshold = rte_cpu_to_le_16(size);
> + req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id);
> +
> + rc = bnxt_hwrm_send_message(bp, &req, sizeof(req));
> +
> + HWRM_CHECK_RESULT;
> +
> + return rc;
> +}

<...>



Re: [dpdk-dev] [PATCH v4 19/26] net/bnxt: add support for tx loopback, set vf mac and queues drop

2017-06-06 Thread Ferruh Yigit
On 6/1/2017 6:07 PM, Ajit Khaparde wrote:
> Add functions rte_pmd_bnxt_set_tx_loopback,
> rte_pmd_bnxt_set_all_queues_drop_en and
> rte_pmd_bnxt_set_vf_mac_addr to configure tx_loopback,
> queue_drop and VF MAC address setting in the hardware.
> It also adds the necessary functions to send the HWRM commands
> to the firmware.

From the patch title it is clear that this patch adds three different pieces
of functionality.

For this patchset, since it has already been pending for a few releases, I
don't mind, but in the future please send separate patches for each
individual feature.

Thanks,
ferruh

> 
> Signed-off-by: Steeven Li 
> Signed-off-by: Ajit Khaparde 

<...>


[dpdk-dev] [PATCH v2] mk: allow exec-env specific targets

2017-06-06 Thread Jerin Jacob
Add a hook in generic rte.sdkbuild.mk file
to include exec-env specific targets.

Signed-off-by: Jerin Jacob 
---
Useful for integrating custom targets in nonstandard execution environments.
For example, a bare-metal-simulator execution environment may need
a target to run the DPDK applications.

v2:
- Change mk/exec-env/$(RTE_EXEC_ENV)/rte.extra.mk to
  mk/exec-env/$(RTE_EXEC_ENV)/rte.custom.mk (Thomas)
- Remove empty files and include through -include (Thomas)
---
 mk/rte.sdkbuild.mk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mk/rte.sdkbuild.mk b/mk/rte.sdkbuild.mk
index 0bf909e9e..f6068bb93 100644
--- a/mk/rte.sdkbuild.mk
+++ b/mk/rte.sdkbuild.mk
@@ -38,6 +38,9 @@ else
   include $(RTE_SDK)/mk/rte.vars.mk
 endif
 
+# allow exec-env specific targets
+-include $(RTE_SDK)/mk/exec-env/$(RTE_EXEC_ENV)/rte.custom.mk
+
 buildtools: | lib
 drivers: | lib buildtools
 app: | lib buildtools drivers
-- 
2.13.0



Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

2017-06-06 Thread Ananyev, Konstantin


> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, June 6, 2017 1:42 PM
> To: Ananyev, Konstantin 
> Cc: Verkamp, Daniel ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
> 
> On Tue, Jun 06, 2017 at 10:59:59AM +0100, Ananyev, Konstantin wrote:
> >
> > > >
> > > >
> > > >
> > > > >
> > > > > The PROD/CONS_ALIGN values on x86-64 are set to 2 cache lines, so 
> > > > > members
> > > > of struct rte_ring are 128 byte aligned,
> > > > >and therefore the whole struct needs 128-byte alignment according to 
> > > > >the ABI
> > > > so that the 128-byte alignment of the fields can be guaranteed.
> > > >
> > > > Ah ok, missed the fact that rte_ring is 128B aligned these days.
> > > > BTW, I probably missed the initial discussion, but what was the reason 
> > > > for that?
> > > > Konstantin
> > >
> > > I don't know why PROD_ALIGN/CONS_ALIGN use 128 byte alignment; it seems 
> > > unnecessary if the cache line is only 64 bytes.  An
> alternate
> > > fix would be to just use cache line alignment for these fields (since 
> > > memzones are already cache line aligned).
> >
> > Yes, had the same thought.
> >
> > > Maybe there is some deeper  reason for the >= 128-byte alignment logic in 
> > > rte_ring.h?
> >
> > Might be; it would be good to hear the opinion of the author of that change. 
> 
> It gives improved performance for core-2-core transfer.

You mean empty cache-line(s) after prod/cons, correct?
That's ok but why we can't keep them and whole rte_ring aligned on cache-line 
boundaries?
Something like that:
struct rte_ring {
   ...
   struct rte_ring_headtail prod __rte_cache_aligned;
   EMPTY_CACHE_LINE   __rte_cache_aligned;
   struct rte_ring_headtail cons __rte_cache_aligned;
   EMPTY_CACHE_LINE   __rte_cache_aligned;
};

Konstantin



[dpdk-dev] [PATCH v5 0/2] Balanced allocation of hugepages

2017-06-06 Thread Ilya Maximets
Sorry for respinning the series so frequently.

Version 5:
* Fixed shared build. (The automated build test will fail
  anyway because libnuma-devel is not installed on the build servers.)

Version 4:
* Fixed operation on systems without NUMA by adding a check for NUMA
  support in the kernel.

Version 3:
* Implemented a hybrid scheme for allocation.
* Fixed an unneeded mempolicy change while remapping (orig = 0).
* Added patch to enable VHOST_NUMA by default.

Version 2:
* rebased (fuzz in Makefile)

Ilya Maximets (2):
  mem: balanced allocation of hugepages
  config: enable vhost numa awareness by default

 config/common_base   |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile |  1 +
 lib/librte_eal/linuxapp/eal/eal_memory.c | 94 ++--
 mk/rte.app.mk|  3 +
 4 files changed, 95 insertions(+), 5 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v5 1/2] mem: balanced allocation of hugepages

2017-06-06 Thread Ilya Maximets
Currently EAL allocates hugepages one by one, not paying attention
to which NUMA node the allocation came from.

Such behaviour leads to allocation failures if the number of hugepages
available to the application is limited by cgroups or hugetlbfs and
memory is requested from more than just the first socket.

Example:
# 90 x 1GB hugepages available in the system

cgcreate -g hugetlb:/test
# Limit to 32GB of hugepages
cgset -r hugetlb.1GB.limit_in_bytes=34359738368 test
# Request 4GB from each of 2 sockets
cgexec -g hugetlb:test testpmd --socket-mem=4096,4096 ...

EAL: SIGBUS: Cannot mmap more hugepages of size 1024 MB
EAL: 32 not 90 hugepages of size 1024 MB allocated
EAL: Not enough memory available on socket 1!
 Requested: 4096MB, available: 0MB
PANIC in rte_eal_init():
Cannot init memory

This happens because all the allocated pages are
on socket 0.

Fix this issue by setting the mempolicy MPOL_PREFERRED for each hugepage
to one of the requested nodes, using the following scheme:

1) Allocate essential hugepages:
   1.1) Allocate only as many hugepages from NUMA node N as are needed
        to fit the memory requested for this node.
   1.2) Repeat 1.1 for all NUMA nodes.
2) Try to map all remaining free hugepages in a round-robin
   fashion.
3) Sort pages and choose the most suitable.

In this case all essential memory will be allocated and all remaining
pages will be fairly distributed between all requested nodes.

libnuma is added as a general dependency for EAL.
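
For reference, the core of the change is just wrapping each mmap() in a
preferred mempolicy. A minimal standalone sketch of that pattern (not the
actual patch; the helper name and error handling are illustrative only):

#include <limits.h>      /* CHAR_BIT */
#include <numaif.h>      /* set_mempolicy(), MPOL_* (libnuma) */
#include <sys/mman.h>

/* Map one hugepage from an already-open hugetlbfs fd, preferring 'node'. */
static void *
map_hugepage_on_node(int fd, size_t sz, int node)
{
	unsigned long mask = 1UL << node;
	void *va;

	/* prefer 'node' for the pages faulted in by the mmap() below */
	set_mempolicy(MPOL_PREFERRED, &mask, sizeof(mask) * CHAR_BIT);
	va = mmap(NULL, sz, PROT_READ | PROT_WRITE,
		  MAP_SHARED | MAP_POPULATE, fd, 0);
	/* restore the default policy so later allocations are unaffected */
	set_mempolicy(MPOL_DEFAULT, NULL, 0);

	return va == MAP_FAILED ? NULL : va;
}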

Fixes: 77988fc08dc5 ("mem: fix allocating all free hugepages")

Signed-off-by: Ilya Maximets 
---
 lib/librte_eal/linuxapp/eal/Makefile |  1 +
 lib/librte_eal/linuxapp/eal/eal_memory.c | 94 ++--
 mk/rte.app.mk|  3 +
 3 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 640afd0..1440fc5 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -50,6 +50,7 @@ LDLIBS += -ldl
 LDLIBS += -lpthread
 LDLIBS += -lgcc_s
 LDLIBS += -lrt
+LDLIBS += -lnuma
 
 # specific to linuxapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) := eal.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 9c9baf6..5947434 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -358,6 +359,19 @@ static int huge_wrap_sigsetjmp(void)
return sigsetjmp(huge_jmpenv, 1);
 }
 
+#ifndef ULONG_SIZE
+#define ULONG_SIZE sizeof(unsigned long)
+#endif
+#ifndef ULONG_BITS
+#define ULONG_BITS (ULONG_SIZE * CHAR_BIT)
+#endif
+#ifndef DIV_ROUND_UP
+#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+#endif
+#ifndef BITS_TO_LONGS
+#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, ULONG_SIZE)
+#endif
+
 /*
  * Mmap all hugepages of hugepage table: it first open a file in
  * hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
@@ -366,18 +380,78 @@ static int huge_wrap_sigsetjmp(void)
  * map continguous physical blocks in contiguous virtual blocks.
  */
 static unsigned
-map_all_hugepages(struct hugepage_file *hugepg_tbl,
-   struct hugepage_info *hpi, int orig)
+map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
+ uint64_t *essential_memory, int orig)
 {
int fd;
unsigned i;
void *virtaddr;
void *vma_addr = NULL;
size_t vma_len = 0;
+   unsigned long nodemask[BITS_TO_LONGS(RTE_MAX_NUMA_NODES)] = {0UL};
+   unsigned long maxnode = 0;
+   int node_id = -1;
+   bool numa_available = true;
+
+   /* Check if kernel supports NUMA. */
+   if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && errno == ENOSYS) {
+   RTE_LOG(DEBUG, EAL, "NUMA is not supported.\n");
+   numa_available = false;
+   }
+
+   if (orig && numa_available) {
+   for (i = 0; i < RTE_MAX_NUMA_NODES; i++)
+   if (internal_config.socket_mem[i])
+   maxnode = i + 1;
+   }
 
for (i = 0; i < hpi->num_pages[0]; i++) {
uint64_t hugepage_sz = hpi->hugepage_sz;
 
+   if (maxnode) {
+   unsigned int j;
+
+   for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
+   if (essential_memory[j])
+   break;
+
+   if (j == RTE_MAX_NUMA_NODES) {
+   node_id = (node_id + 1) % RTE_MAX_NUMA_NODES;
+   while (!internal_config.socket_mem[node_id]) {
+   node_id++;
+   node_

[dpdk-dev] [PATCH v5 2/2] config: enable vhost numa awareness by default

2017-06-06 Thread Ilya Maximets
Since libnuma is added as a general dependency for EAL,
it is safe to enable LIBRTE_VHOST_NUMA by default.

Signed-off-by: Ilya Maximets 
---
 config/common_base | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config/common_base b/config/common_base
index c858769..db4cc1c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -708,7 +708,7 @@ CONFIG_RTE_LIBRTE_PDUMP=y
 # Compile vhost user library
 #
 CONFIG_RTE_LIBRTE_VHOST=n
-CONFIG_RTE_LIBRTE_VHOST_NUMA=n
+CONFIG_RTE_LIBRTE_VHOST_NUMA=y
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 
 #
-- 
2.7.4



Re: [dpdk-dev] [PATCH] net/thunderx: manage PCI device mapping for SQS VFs

2017-06-06 Thread Ferruh Yigit
On 6/1/2017 2:05 PM, Jerin Jacob wrote:
> Since the commit e84ad157b7bc ("pci: unmap resources if probe fails"),
> EAL unmaps the PCI device if ethdev probe returns positive or
> negative value.
> 
> nicvf thunderx PMD needs special treatment for Secondary queue set(SQS)
> PCIe VF devices, where, it expects to not unmap or free the memory
> without registering the ethdev subsystem.
> 
> To keep the same behavior, moved the PCI map function inside
> the driver without using the EAL services.

What do you think about adding a flag, something like
RTE_PCI_DRV_FIXED_MAPPING, that does the mapping but skips the unmap on error?
This would be a more generic solution.
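
Something like this in the EAL probe error path (rough sketch only; the flag
name is made up and does not exist today):

	ret = dr->probe(dr, dev);
	if (ret) {
		dev->driver = NULL;
		if ((dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) &&
		    !(dr->drv_flags & RTE_PCI_DRV_FIXED_MAPPING))
			rte_pci_unmap_device(dev);
	}

Then nicvf could keep RTE_PCI_DRV_NEED_MAPPING and just add the new flag.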

I am concerned about a PMD calling an EAL-level API.

> 
> Signed-off-by: Jerin Jacob 
> Signed-off-by: Angela Czubak 
> ---
>  drivers/net/thunderx/nicvf_ethdev.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/thunderx/nicvf_ethdev.c 
> b/drivers/net/thunderx/nicvf_ethdev.c
> index 796701b0f..6ec2f9266 100644
> --- a/drivers/net/thunderx/nicvf_ethdev.c
> +++ b/drivers/net/thunderx/nicvf_ethdev.c
> @@ -2025,6 +2025,13 @@ nicvf_eth_dev_init(struct rte_eth_dev *eth_dev)
>   }
>  
>   pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
> +
> + ret = rte_pci_map_device(pci_dev);
> + if (ret) {
> + PMD_INIT_LOG(ERR, "Failed to map pci device");
> + goto fail;
> + }
> +
>   rte_eth_copy_pci_info(eth_dev, pci_dev);
>  
>   nic->device_id = pci_dev->id.device_id;
> @@ -2171,7 +2178,7 @@ static int nicvf_eth_pci_remove(struct rte_pci_device 
> *pci_dev)
>  
>  static struct rte_pci_driver rte_nicvf_pmd = {
>   .id_table = pci_id_nicvf_map,
> - .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
> + .drv_flags = RTE_PCI_DRV_INTR_LSC,
>   .probe = nicvf_eth_pci_probe,
>   .remove = nicvf_eth_pci_remove,
>  };
> 



Re: [dpdk-dev] [PATCH v4 12/26] net/bnxt: add support to set MTU

2017-06-06 Thread Ajit Khaparde
Ferruh, if it saves time, can you please do that.

Thanks
Ajit

On Tue, Jun 6, 2017 at 7:47 AM, Ferruh Yigit  wrote:

> On 6/1/2017 6:07 PM, Ajit Khaparde wrote:
> > This patch adds support to modify MTU using the set_mtu dev_op.
> > To support frames > 2k, the PMD creates an aggregator ring.
> > When a frame greater than 2k is received, it is fragmented
> > and the resulting fragments are DMA'ed to the aggregator ring.
> > Now the driver can support jumbo frames up to 9500 bytes.
> >
> > Signed-off-by: Steeven Li 
> > Signed-off-by: Ajit Khaparde 
> >
> > --
> > v1->v2: regroup related patches and incorporate other review comments
> >
> > v2->v3:
> >   - rebasing to next-net tree
> >   - Use net/bnxt instead of just bnxt in patch subject
>
> <...>
>
> > +int bnxt_hwrm_vnic_plcmode_cfg(struct bnxt *bp,
> > + struct bnxt_vnic_info *vnic)
> > +{
> > + int rc = 0;
> > + struct hwrm_vnic_plcmodes_cfg_input req = {.req_type = 0 };
> > + struct hwrm_vnic_plcmodes_cfg_output *resp =
> bp->hwrm_cmd_resp_addr;
> > + uint16_t size;
> > +
> > + HWRM_PREP(req, VNIC_PLCMODES_CFG, -1, resp);
> > +
> > + req.flags = rte_cpu_to_le_32(
> > +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_REGULAR_PLACEMENT
> |
> > + HWRM_VNIC_PLCMODES_CFG_INPUT_
> FLAGS_JUMBO_PLACEMENT);
> > +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV4 |
> //TODO
> > +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV6);
>
> Hi Ajit,
>
> Would you mind if I remove these commented code, in this patch and other
> patches, while applying?
>
> Of course it would be better if you send the new version of the patch to
> fix them, but I believe I can do this faster. Just let me know please.
>
> Thanks,
> ferruh
>
> > + req.enables = rte_cpu_to_le_32(
> > + HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_JUMBO_THRESH_VALID);
> > +//   HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_HDS_THRESHOLD_VALID);
> > +
> > + size = rte_pktmbuf_data_room_size(bp->rx_queues[0]->mb_pool);
> > + size -= RTE_PKTMBUF_HEADROOM;
> > +
> > + req.jumbo_thresh = rte_cpu_to_le_16(size);
> > +//   req.hds_threshold = rte_cpu_to_le_16(size);
> > + req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id);
> > +
> > + rc = bnxt_hwrm_send_message(bp, &req, sizeof(req));
> > +
> > + HWRM_CHECK_RESULT;
> > +
> > + return rc;
> > +}
>
> <...>
>
>


Re: [dpdk-dev] [PATCH] net/thunderx: manage PCI device mapping for SQS VFs

2017-06-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 6 Jun 2017 14:36:09 +0100
> From: Ferruh Yigit 
> To: Jerin Jacob , dev@dpdk.org
> CC: Angela Czubak , Thomas Monjalon
>  
> Subject: Re: [dpdk-dev] [PATCH] net/thunderx: manage PCI device mapping for
>  SQS VFs
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.1.1
> 
> On 6/1/2017 2:05 PM, Jerin Jacob wrote:
> > Since the commit e84ad157b7bc ("pci: unmap resources if probe fails"),
> > EAL unmaps the PCI device if ethdev probe returns positive or
> > negative value.
> > 
> > nicvf thunderx PMD needs special treatment for Secondary queue set(SQS)
> > PCIe VF devices, where, it expects to not unmap or free the memory
> > without registering the ethdev subsystem.
> > 
> > To keep the same behavior, moved the PCI map function inside
> > the driver without using the EAL services.
> 
> What do you think about adding a flag, something like
> RTE_PCI_DRV_FIXED_MAPPING, that does the mapping but skips the unmap on error?
> This would be a more generic solution.
> 
> I am concerned about a PMD calling an EAL-level API.

Understood.

Another option is to unmap only on error (i.e. when probe returns a negative value):

ret = dr->probe(dr, dev);
if (ret) { // change to if (ret < 0)
dev->driver = NULL;
if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
rte_pci_unmap_device(dev);
}

I am fine with either way. Let me know what you prefer and I will
change accordingly.

>
> > 
> > Signed-off-by: Jerin Jacob 
> > Signed-off-by: Angela Czubak 
> > ---
> >  drivers/net/thunderx/nicvf_ethdev.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/thunderx/nicvf_ethdev.c 
> > b/drivers/net/thunderx/nicvf_ethdev.c
> > index 796701b0f..6ec2f9266 100644
> > --- a/drivers/net/thunderx/nicvf_ethdev.c
> > +++ b/drivers/net/thunderx/nicvf_ethdev.c
> > @@ -2025,6 +2025,13 @@ nicvf_eth_dev_init(struct rte_eth_dev *eth_dev)
> > }
> >  
> > pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
> > +
> > +   ret = rte_pci_map_device(pci_dev);
> > +   if (ret) {
> > +   PMD_INIT_LOG(ERR, "Failed to map pci device");
> > +   goto fail;
> > +   }
> > +
> > rte_eth_copy_pci_info(eth_dev, pci_dev);
> >  
> > nic->device_id = pci_dev->id.device_id;
> > @@ -2171,7 +2178,7 @@ static int nicvf_eth_pci_remove(struct rte_pci_device 
> > *pci_dev)
> >  
> >  static struct rte_pci_driver rte_nicvf_pmd = {
> > .id_table = pci_id_nicvf_map,
> > -   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
> > +   .drv_flags = RTE_PCI_DRV_INTR_LSC,
> > .probe = nicvf_eth_pci_probe,
> > .remove = nicvf_eth_pci_remove,
> >  };
> > 
> 


Re: [dpdk-dev] [PATCH v4 19/26] net/bnxt: add support for tx loopback, set vf mac and queues drop

2017-06-06 Thread Ajit Khaparde
On Tue, Jun 6, 2017 at 7:53 AM, Ferruh Yigit  wrote:

> On 6/1/2017 6:07 PM, Ajit Khaparde wrote:
> > Add functions rte_pmd_bnxt_set_tx_loopback,
> > rte_pmd_bnxt_set_all_queues_drop_en and
> > rte_pmd_bnxt_set_vf_mac_addr to configure tx_loopback,
> > queue_drop and VF MAC address setting in the hardware.
> > It also adds the necessary functions to send the HWRM commands
> > to the firmware.
>
> From the patch title it is clear that this patch add three different
> functionality.
>
> For this patchset, since it already went for a few releases I wouldn't
> mind, but for future, please send separate patches for each individual
> feature.



Sure, Ferruh.

Thanks
Ajit


[dpdk-dev] [PATCH v2] eventdev: remove PCI dependency

2017-06-06 Thread Jerin Jacob
Remove the PCI dependency from generic data structures
and moved the PCI specific code to rte_event_pmd_pci*

CC: Gaetan Rivet 
Signed-off-by: Jerin Jacob 
---
v2:
- Remove rte_pci.h from rte_eventdev.c(Gaetan)
---
 drivers/event/skeleton/skeleton_eventdev.c | 30 +-
 lib/librte_eventdev/rte_eventdev.c | 38 +++---
 lib/librte_eventdev/rte_eventdev.h |  2 -
 lib/librte_eventdev/rte_eventdev_pmd.h | 63 --
 4 files changed, 41 insertions(+), 92 deletions(-)

diff --git a/drivers/event/skeleton/skeleton_eventdev.c 
b/drivers/event/skeleton/skeleton_eventdev.c
index 800bd76e0..34684aba0 100644
--- a/drivers/event/skeleton/skeleton_eventdev.c
+++ b/drivers/event/skeleton/skeleton_eventdev.c
@@ -427,18 +427,28 @@ static const struct rte_pci_id pci_id_skeleton_map[] = {
},
 };
 
-static struct rte_eventdev_driver pci_eventdev_skeleton_pmd = {
-   .pci_drv = {
-   .id_table = pci_id_skeleton_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
-   .probe = rte_event_pmd_pci_probe,
-   .remove = rte_event_pmd_pci_remove,
-   },
-   .eventdev_init = skeleton_eventdev_init,
-   .dev_private_size = sizeof(struct skeleton_eventdev),
+static int
+event_skeleton_pci_probe(struct rte_pci_driver *pci_drv,
+struct rte_pci_device *pci_dev)
+{
+   return rte_event_pmd_pci_probe(pci_drv, pci_dev,
+   sizeof(struct skeleton_eventdev), skeleton_eventdev_init);
+}
+
+static int
+event_skeleton_pci_remove(struct rte_pci_device *pci_dev)
+{
+   return rte_event_pmd_pci_remove(pci_dev, NULL);
+}
+
+static struct rte_pci_driver pci_eventdev_skeleton_pmd = {
+   .id_table = pci_id_skeleton_map,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .probe = event_skeleton_pci_probe,
+   .remove = event_skeleton_pci_remove,
 };
 
-RTE_PMD_REGISTER_PCI(event_skeleton_pci, pci_eventdev_skeleton_pmd.pci_drv);
+RTE_PMD_REGISTER_PCI(event_skeleton_pci, pci_eventdev_skeleton_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(event_skeleton_pci, pci_id_skeleton_map);
 
 /* VDEV based event device */
diff --git a/lib/librte_eventdev/rte_eventdev.c 
b/lib/librte_eventdev/rte_eventdev.c
index 20afc3f0e..fd0406747 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -45,7 +45,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -126,8 +125,6 @@ rte_event_dev_info_get(uint8_t dev_id, struct 
rte_event_dev_info *dev_info)
dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;
 
dev_info->dev = dev->dev;
-   if (dev->driver)
-   dev_info->driver_name = dev->driver->pci_drv.driver.name;
return 0;
 }
 
@@ -1250,18 +1247,18 @@ rte_event_pmd_vdev_uninit(const char *name)
 
 int
 rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
-   struct rte_pci_device *pci_dev)
+   struct rte_pci_device *pci_dev,
+   size_t private_data_size,
+   eventdev_pmd_pci_callback_t devinit)
 {
-   struct rte_eventdev_driver *eventdrv;
struct rte_eventdev *eventdev;
 
char eventdev_name[RTE_EVENTDEV_NAME_MAX_LEN];
 
int retval;
 
-   eventdrv = (struct rte_eventdev_driver *)pci_drv;
-   if (eventdrv == NULL)
-   return -ENODEV;
+   if (devinit == NULL)
+   return -EINVAL;
 
rte_pci_device_name(&pci_dev->addr, eventdev_name,
sizeof(eventdev_name));
@@ -1275,7 +1272,7 @@ rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
eventdev->data->dev_private =
rte_zmalloc_socket(
"eventdev private structure",
-   eventdrv->dev_private_size,
+   private_data_size,
RTE_CACHE_LINE_SIZE,
rte_socket_id());
 
@@ -1285,10 +1282,9 @@ rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
}
 
eventdev->dev = &pci_dev->device;
-   eventdev->driver = eventdrv;
 
/* Invoke PMD device initialization function */
-   retval = (*eventdrv->eventdev_init)(eventdev);
+   retval = devinit(eventdev);
if (retval == 0)
return 0;
 
@@ -1307,12 +1303,12 @@ rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
 }
 
 int
-rte_event_pmd_pci_remove(struct rte_pci_device *pci_dev)
+rte_event_pmd_pci_remove(struct rte_pci_device *pci_dev,
+eventdev_pmd_pci_callback_t devuninit)
 {
-   const struct rte_eventdev_driver *eventdrv;
struct rte_eventdev *eventdev;
char eventdev_name[RTE_EVENTDEV_NAME_MAX_LEN];
-   int ret;
+   int ret = 0;
 
if (pci_dev

Re: [dpdk-dev] [PATCH v2] Correctly handle malloc_elem resize with padding

2017-06-06 Thread Sergio Gonzalez Monroy

Hi Jamie,

On 31/05/2017 01:16, Jamie Lavigne wrote:

Currently when a malloc_elem is split after resizing, any padding
present in the elem is ignored.  This causes the resized elem to be too
small when padding is present, and user data can overwrite the beginning
of the following malloc_elem.

Solve this by including the size of the padding when computing where to
split the malloc_elem.


Nice catch!

Could you please rework the commit format a bit:
- Add 'mem:' as a prefix in your patch title
- I would mention in the title that this is a fix
- Provide a 'Fixes' line in the commit message


Signed-off-by: Jamie Lavigne 
---
  lib/librte_eal/common/malloc_elem.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/malloc_elem.c 
b/lib/librte_eal/common/malloc_elem.c
index 42568e1..8766fa8 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -333,9 +333,11 @@ malloc_elem_resize(struct malloc_elem *elem, size_t size)
 	elem_free_list_remove(next);
 	join_elem(elem, next);
 
-	if (elem->size - new_size >= MIN_DATA_SIZE + MALLOC_ELEM_OVERHEAD){
+	const size_t new_total_size = new_size + elem->pad;
+
+	if (elem->size - new_total_size >= MIN_DATA_SIZE + MALLOC_ELEM_OVERHEAD) {
 		/* now we have a big block together. Lets cut it down a bit, by splitting */
-		struct malloc_elem *split_pt = RTE_PTR_ADD(elem, new_size);
+		struct malloc_elem *split_pt = RTE_PTR_ADD(elem, new_total_size);
 		split_pt = RTE_PTR_ALIGN_CEIL(split_pt, RTE_CACHE_LINE_SIZE);
 		split_elem(elem, split_pt);
 		malloc_elem_free_list_insert(split_pt);


This indeed fixes the issue you have mentioned. I was thinking of the 
following fix instead:

- Add elem->pad to new_size
- Remove current_size var and instead use elem->size

I think those changes should have the same result while removing a
couple of variables from the function, which I hope makes it easier to read.
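
i.e. roughly (untested; only the affected lines of malloc_elem_resize() shown,
the rest of the function stays as it is):

	/* fold the padding into the requested size up front */
	const size_t new_size = size + elem->pad + MALLOC_ELEM_OVERHEAD;

	/* if we request a smaller size, then always return ok */
	if (elem->size >= new_size)
		return 0;

	/* ... unchanged middle, comparing against elem->size directly ... */

	if (elem->size - new_size >= MIN_DATA_SIZE + MALLOC_ELEM_OVERHEAD) {
		/* the split point now already accounts for the padding */
		struct malloc_elem *split_pt = RTE_PTR_ADD(elem, new_size);

		split_pt = RTE_PTR_ALIGN_CEIL(split_pt, RTE_CACHE_LINE_SIZE);
		split_elem(elem, split_pt);
		malloc_elem_free_list_insert(split_pt);
	}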


What do you think?

Thanks,
Sergio


[dpdk-dev] [PATCH] doc: add VLAN flow limitation on mlx5 PMD

2017-06-06 Thread Shahaf Shuler
On the mlx5 PMD, a flow pattern without any specific VLAN item will match
VLAN packets as well.

Signed-off-by: Shahaf Shuler 
Acked-by: Nelio Laranjeiro nelio.laranje...@6wind.com
---
 doc/guides/rel_notes/release_17_08.rst | 16 
 1 file changed, 16 insertions(+)

diff --git a/doc/guides/rel_notes/release_17_08.rst 
b/doc/guides/rel_notes/release_17_08.rst
index 7f1212094..bd219640c 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -111,6 +111,22 @@ Known Issues
Also, make sure to start the actual text at the margin.
=
 
+* **On mlx5 PMD, Flow pattern without any specific vlan will match for vlan 
packets as well.**
+
+  When VLAN spec is not specified in the pattern, the matching rule will be 
created with VLAN as a wild card.
+  Meaning, the flow rule::
+
+flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...
+
+  Will only match vlan packets with vid=3. and the flow rules::
+
+flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Or::
+
+flow create 0 ingress pattern eth / vlan / ipv4 / end ...
+
+  Will match any ipv4 packet (VLAN included).
 
 API Changes
 ---
-- 
2.12.0



Re: [dpdk-dev] [PATCH v4 12/26] net/bnxt: add support to set MTU

2017-06-06 Thread Ferruh Yigit
On 6/6/2017 3:00 PM, Ajit Khaparde wrote:
> Ferruh, if it saves time, can you please do that.

Done.

> 
> Thanks
> Ajit
> 
> On Tue, Jun 6, 2017 at 7:47 AM, Ferruh Yigit  wrote:
> 
>> On 6/1/2017 6:07 PM, Ajit Khaparde wrote:
>>> This patch adds support to modify MTU using the set_mtu dev_op.
>>> To support frames > 2k, the PMD creates an aggregator ring.
>>> When a frame greater than 2k is received, it is fragmented
>>> and the resulting fragments are DMA'ed to the aggregator ring.
>>> Now the driver can support jumbo frames up to 9500 bytes.
>>>
>>> Signed-off-by: Steeven Li 
>>> Signed-off-by: Ajit Khaparde 
>>>
>>> --
>>> v1->v2: regroup related patches and incorporate other review comments
>>>
>>> v2->v3:
>>>   - rebasing to next-net tree
>>>   - Use net/bnxt instead of just bnxt in patch subject
>>
>> <...>
>>
>>> +int bnxt_hwrm_vnic_plcmode_cfg(struct bnxt *bp,
>>> + struct bnxt_vnic_info *vnic)
>>> +{
>>> + int rc = 0;
>>> + struct hwrm_vnic_plcmodes_cfg_input req = {.req_type = 0 };
>>> + struct hwrm_vnic_plcmodes_cfg_output *resp =
>> bp->hwrm_cmd_resp_addr;
>>> + uint16_t size;
>>> +
>>> + HWRM_PREP(req, VNIC_PLCMODES_CFG, -1, resp);
>>> +
>>> + req.flags = rte_cpu_to_le_32(
>>> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_REGULAR_PLACEMENT
>> |
>>> + HWRM_VNIC_PLCMODES_CFG_INPUT_
>> FLAGS_JUMBO_PLACEMENT);
>>> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV4 |
>> //TODO
>>> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_FLAGS_HDS_IPV6);
>>
>> Hi Ajit,
>>
>> Would you mind if I remove these commented code, in this patch and other
>> patches, while applying?
>>
>> Of course it would be better if you send the new version of the patch to
>> fix them, but I believe I can do this faster. Just let me know please.
>>
>> Thanks,
>> ferruh
>>
>>> + req.enables = rte_cpu_to_le_32(
>>> + HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_JUMBO_THRESH_VALID);
>>> +//   HWRM_VNIC_PLCMODES_CFG_INPUT_ENABLES_HDS_THRESHOLD_VALID);
>>> +
>>> + size = rte_pktmbuf_data_room_size(bp->rx_queues[0]->mb_pool);
>>> + size -= RTE_PKTMBUF_HEADROOM;
>>> +
>>> + req.jumbo_thresh = rte_cpu_to_le_16(size);
>>> +//   req.hds_threshold = rte_cpu_to_le_16(size);
>>> + req.vnic_id = rte_cpu_to_le_32(vnic->fw_vnic_id);
>>> +
>>> + rc = bnxt_hwrm_send_message(bp, &req, sizeof(req));
>>> +
>>> + HWRM_CHECK_RESULT;
>>> +
>>> + return rc;
>>> +}
>>
>> <...>
>>
>>



Re: [dpdk-dev] [PATCH v2] kni: add new mbuf in alloc_q only based on its empty slots

2017-06-06 Thread gowrishankar muthukrishnan

Hi Ferruh,
Just wanted to check with you on the verdict of this patch: are we
waiting for any objection/ack?

Thanks,
Gowrishankar

On Thursday 01 June 2017 02:48 PM, Ferruh Yigit wrote:

On 6/1/2017 6:56 AM, gowrishankar muthukrishnan wrote:

Hi Ferruh,

On Wednesday 31 May 2017 09:51 PM, Ferruh Yigit wrote:


I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server

runs on
remote interface connecting PMD and iperf client runs on KNI interface,
so as to
create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
100MB data. rx and tx stats are from kni app (USR1).

100MB w/o patch 1.28Gbps
rx  txalloc_call  alloc_call_mt1tx freembuf_call
3933 72464 51042  42472  1560540

Some math:

alloc called 51042 times with allocating 32 mbufs each time,
51042 * 32 = 1633344

freed mbufs: 1560540

used mbufs: 1633344 - 1560540 = 72804

72804 =~ 72464, so looks correct.

Which means rte_kni_rx_burst() called 51042 times and 72464 buffers
received.

As you already mentioned, for each call kernel able to put only 1-2
packets into the fifo. This number is close to 3 for my test with KNI PMD.

And for this case, agree your patch looks reasonable.

But what if KNI has more egress traffic and is able to put >= 32 packets
between each rte_kni_rx_burst() call?
For that case this patch introduces the extra cost of getting the allocq_free
count.

Are there case(s) where we see the kernel thread writing to the txq at a rate
higher than the KNI application can dequeue? In my understanding, KNI is
supposed to be a slow path, as it puts packets back into the network stack
(control plane?).

The kernel thread doesn't need to be faster than what the app can dequeue; it
is enough if the kernel thread can put 32 or more packets for this case, but
I see this goes to the same place.

And for kernel multi-thread mode, each kernel thread has more time to
enqueue packets, although I don't have the numbers.


Regards,
Gowrishankar


Overall I do not disagree with the patch, but I am concerned that it could
cause a performance loss in some cases while improving this one. It would
help a lot if KNI users could test and comment.

For me, applying the patch didn't make any difference in the final performance
numbers, but if there is no objection, I am OK with taking this patch.









Re: [dpdk-dev] [PATCH v4 00/26] bnxt patchset

2017-06-06 Thread Ferruh Yigit
On 6/1/2017 6:06 PM, Ajit Khaparde wrote:
> This patchset, amongst other changes, adds support for a few more dev_ops,
> updates HWRM to version 1.7.7, switches to polling stats from the
> hardware, and adds support for jumbo MTU, LRO, etc.
> 
> v1->v2:
>   - Grouped in the end, it also has PMD specific APIs to control VF from PF.
>   - I have updated the release notes and the features file wherever possible.
> 
> v2->v3:
>   - Rebasing to next-net tree
>   - Use net/bnxt instead of just bnxt in patch subject
>   - update testpmd to use the vendor specific APIs
>   - Addressed other review comments as appropriate
> 
> v3->v4:
>   - fix a rebase error
> 
>   --
> 
>   net/bnxt: update to new HWRM version
>   net/bnxt: code reorg to properly allocate resources for PF/VF
>   net/bnxt: handle VF/PF initialization appropriately
>   net/bnxt: support lack of huge pages
>   net/bnxt: add additional HWRM debug info to error messages
>   net/bnxt: add tunneling support
>   net/bnxt: add support for xstats get/reset
>   net/bnxt: add support for VLAN filter and strip
>   net/bnxt: add support for set multicast addr list and MAC addr set
>   doc: update bnxt.ini to document Allmulticast mode
>   net/bnxt: add support to get fw version
>   net/bnxt: add support to set MTU
>   net/bnxt: add support for LRO
>   net/bnxt: add rxq/txq info_get
>   net/bnxt: add code to support VLAN pvid
>   net/bnxt: reorg the query stats code
>   doc: update default.ini to add LED support
>   net/bnxt: add support for led on/off
>   net/bnxt: add support for tx loopback, set vf mac and queues drop
>   net/bnxt: add support for set VF QOS and MAC anti spoof
>   net/bnxt: add support to get and clear VF specific stats
>   net/bnxt: add code to determine the Rx status of VF
>   net/bnxt: add support to add a VF MAC address
>   net/bnxt: add code to configure a default VF VLAN
>   net/bnxt: add support to set VF rxmode
>   doc: update release notes

Series applied to dpdk-next-net/master, thanks.


[dpdk-dev] [PATCH v3] net/mlx4: support user space rxq interrupt event

2017-06-06 Thread Moti Haimovsky
v3:
- Reverted cleanups not part of this commit.
v2:
- Removed unneeded comments.

Moti Haimovsky (1):
  net/mlx4: support user space rxq interrupt event

 doc/guides/nics/features/mlx4.ini  |   1 +
 doc/guides/rel_notes/release_17_08.rst |   5 +
 drivers/net/mlx4/mlx4.c| 207 -
 drivers/net/mlx4/mlx4.h|   1 +
 4 files changed, 213 insertions(+), 1 deletion(-)

-- 
1.8.3.1



[dpdk-dev] [PATCH v3] net/mlx4: support user space rxq interrupt event

2017-06-06 Thread Moti Haimovsky
Implement rxq interrupt callbacks

Signed-off-by: Moti Haimovsky 
---
 doc/guides/nics/features/mlx4.ini  |   1 +
 doc/guides/rel_notes/release_17_08.rst |   5 +
 drivers/net/mlx4/mlx4.c| 207 -
 drivers/net/mlx4/mlx4.h|   1 +
 4 files changed, 213 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features/mlx4.ini 
b/doc/guides/nics/features/mlx4.ini
index 285f0ec..1a5e08b 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status  = Y
 Link status event= Y
+Rx interrupt = Y
 Removal event= Y
 Queue start/stop = Y
 MTU update   = Y
diff --git a/doc/guides/rel_notes/release_17_08.rst 
b/doc/guides/rel_notes/release_17_08.rst
index 74aae10..3bb83db 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -41,6 +41,11 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+   * **Added support for Rx interrupts on mlx4 driver.**
+
+ Rx queues can be armed with an interrupt which will trigger on the
+ next packet arrival.
+
 
 Resolved Issues
 ---
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ec4419a..6239ac3 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Generated configuration header. */
 #include "mlx4_autoconf.h"
@@ -127,6 +128,24 @@ struct mlx4_conf {
NULL,
 };
 
+static int
+mlx4_rx_intr_enable(struct rte_eth_dev *dev, uint16_t idx);
+
+static int
+mlx4_rx_intr_disable(struct rte_eth_dev *dev, uint16_t idx);
+
+static int
+priv_intr_efd_enable(struct priv *priv);
+
+static void
+priv_intr_efd_disable(struct priv *priv);
+
+static int
+priv_create_intr_vec(struct priv *priv);
+
+static void
+priv_destroy_intr_vec(struct priv *priv);
+
 /**
  * Check if running as a secondary process.
  *
@@ -2756,6 +2775,8 @@ struct txq_mp2mr_mbuf_check_data {
}
if (rxq->cq != NULL)
claim_zero(ibv_destroy_cq(rxq->cq));
+   if (rxq->channel != NULL)
+   claim_zero(ibv_destroy_comp_channel(rxq->channel));
if (rxq->rd != NULL) {
struct ibv_exp_destroy_res_domain_attr attr = {
.comp_mask = 0,
@@ -3696,11 +3717,22 @@ struct txq_mp2mr_mbuf_check_data {
  (void *)dev, strerror(ret));
goto error;
}
+   if (dev->data->dev_conf.intr_conf.rxq) {
+   tmpl.channel = ibv_create_comp_channel(priv->ctx);
+   if (tmpl.channel == NULL) {
+   dev->data->dev_conf.intr_conf.rxq = 0;
+   ret = ENOMEM;
+   ERROR("%p: Comp Channel creation failure: %s",
+ (void *)dev, strerror(ret));
+   goto error;
+   }
+   }
attr.cq = (struct ibv_exp_cq_init_attr){
.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
.res_domain = tmpl.rd,
};
-   tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+   tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, tmpl.channel, 0,
+   &attr.cq);
if (tmpl.cq == NULL) {
ret = ENOMEM;
ERROR("%p: CQ creation failure: %s",
@@ -4005,6 +4037,11 @@ struct txq_mp2mr_mbuf_check_data {
 (void *)dev);
goto err;
}
+   if (dev->data->dev_conf.intr_conf.rxq) {
+   ret = priv_intr_efd_enable(priv);
+   if (!ret)
+   ret = priv_create_intr_vec(priv);
+   }
ret = mlx4_priv_flow_start(priv);
if (ret) {
ERROR("%p: flow start failed: %s",
@@ -4197,6 +4234,10 @@ struct txq_mp2mr_mbuf_check_data {
assert(priv->ctx == NULL);
priv_dev_removal_interrupt_handler_uninstall(priv, dev);
priv_dev_link_interrupt_handler_uninstall(priv, dev);
+   if (priv->dev->data->dev_conf.intr_conf.rxq) {
+   priv_destroy_intr_vec(priv);
+   priv_intr_efd_disable(priv);
+   }
priv_unlock(priv);
memset(priv, 0, sizeof(*priv));
 }
@@ -5157,6 +5198,8 @@ struct txq_mp2mr_mbuf_check_data {
.mac_addr_set = mlx4_mac_addr_set,
.mtu_set = mlx4_dev_set_mtu,
.filter_ctrl = mlx4_dev_filter_ctrl,
+   .rx_queue_intr_enable = mlx4_rx_intr_enable,
+   .rx_queue_intr_disable = mlx4_rx_intr_disable,
 };
 
 /**
@@ -5592,6 +5635,168 @@ struct txq_mp2mr_mbuf_check_data {
 }
 
 /**
+ * Fill epoll fd list for rxq interrupts.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success, negative on failure.
+ */
+static int
+priv_intr_efd_enable(struct p

Re: [dpdk-dev] [dpdk-stable] [PATCH] net/i40e: fix VF statistics

2017-06-06 Thread Ferruh Yigit
On 6/5/2017 10:14 PM, Qi Zhang wrote:
> CRC bytes should be excluded, so rx/tx bytes of VF stats is aligned
> with PF stats.
> 
> Fixes: 9aace75fc82e ("i40e: fix statistics")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Qi Zhang 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v2] eventdev: remove PCI dependency

2017-06-06 Thread Gaëtan Rivet
On Tue, Jun 06, 2017 at 07:40:46PM +0530, Jerin Jacob wrote:
> Remove the PCI dependency from generic data structures
> and moved the PCI specific code to rte_event_pmd_pci*
> 
> CC: Gaetan Rivet 
> Signed-off-by: Jerin Jacob 
> ---
> v2:
> - Remove rte_pci.h from rte_eventdev.c(Gaetan)

Unfortunately this is not sufficient. There is still
rte_eventdev_pmd.h including rte_pci.h. As you said, it should be
possible for the PMD API to include rte_pci.h, but only on the
condition that this layer is included only by PMDs, not by the library
itself.

make
rm build/include/rte_pci.h
make lib/librte_eventdev_sub

Should allow you to verify that everything is fine.

> ---
>  drivers/event/skeleton/skeleton_eventdev.c | 30 +-
>  lib/librte_eventdev/rte_eventdev.c | 38 +++---
>  lib/librte_eventdev/rte_eventdev.h |  2 -
>  lib/librte_eventdev/rte_eventdev_pmd.h | 63 
> --
>  4 files changed, 41 insertions(+), 92 deletions(-)
> 
> diff --git a/drivers/event/skeleton/skeleton_eventdev.c 
> b/drivers/event/skeleton/skeleton_eventdev.c
> index 800bd76e0..34684aba0 100644
> --- a/drivers/event/skeleton/skeleton_eventdev.c
> +++ b/drivers/event/skeleton/skeleton_eventdev.c
> @@ -427,18 +427,28 @@ static const struct rte_pci_id pci_id_skeleton_map[] = {
>   },
>  };
>  
> -static struct rte_eventdev_driver pci_eventdev_skeleton_pmd = {
> - .pci_drv = {
> - .id_table = pci_id_skeleton_map,
> - .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> - .probe = rte_event_pmd_pci_probe,
> - .remove = rte_event_pmd_pci_remove,
> - },
> - .eventdev_init = skeleton_eventdev_init,
> - .dev_private_size = sizeof(struct skeleton_eventdev),
> +static int
> +event_skeleton_pci_probe(struct rte_pci_driver *pci_drv,
> +  struct rte_pci_device *pci_dev)
> +{
> + return rte_event_pmd_pci_probe(pci_drv, pci_dev,
> + sizeof(struct skeleton_eventdev), skeleton_eventdev_init);
> +}
> +
> +static int
> +event_skeleton_pci_remove(struct rte_pci_device *pci_dev)
> +{
> + return rte_event_pmd_pci_remove(pci_dev, NULL);
> +}
> +
> +static struct rte_pci_driver pci_eventdev_skeleton_pmd = {
> + .id_table = pci_id_skeleton_map,
> + .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
> + .probe = event_skeleton_pci_probe,
> + .remove = event_skeleton_pci_remove,
>  };
>  
> -RTE_PMD_REGISTER_PCI(event_skeleton_pci, pci_eventdev_skeleton_pmd.pci_drv);
> +RTE_PMD_REGISTER_PCI(event_skeleton_pci, pci_eventdev_skeleton_pmd);
>  RTE_PMD_REGISTER_PCI_TABLE(event_skeleton_pci, pci_id_skeleton_map);
>  
>  /* VDEV based event device */
> diff --git a/lib/librte_eventdev/rte_eventdev.c 
> b/lib/librte_eventdev/rte_eventdev.c
> index 20afc3f0e..fd0406747 100644
> --- a/lib/librte_eventdev/rte_eventdev.c
> +++ b/lib/librte_eventdev/rte_eventdev.c
> @@ -45,7 +45,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -126,8 +125,6 @@ rte_event_dev_info_get(uint8_t dev_id, struct 
> rte_event_dev_info *dev_info)
>   dev_info->dequeue_timeout_ns = dev->data->dev_conf.dequeue_timeout_ns;
>  
>   dev_info->dev = dev->dev;
> - if (dev->driver)
> - dev_info->driver_name = dev->driver->pci_drv.driver.name;
>   return 0;
>  }
>  
> @@ -1250,18 +1247,18 @@ rte_event_pmd_vdev_uninit(const char *name)
>  
>  int
>  rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
> - struct rte_pci_device *pci_dev)
> + struct rte_pci_device *pci_dev,
> + size_t private_data_size,
> + eventdev_pmd_pci_callback_t devinit)
>  {
> - struct rte_eventdev_driver *eventdrv;
>   struct rte_eventdev *eventdev;
>  
>   char eventdev_name[RTE_EVENTDEV_NAME_MAX_LEN];
>  
>   int retval;
>  
> - eventdrv = (struct rte_eventdev_driver *)pci_drv;
> - if (eventdrv == NULL)
> - return -ENODEV;
> + if (devinit == NULL)
> + return -EINVAL;
>  
>   rte_pci_device_name(&pci_dev->addr, eventdev_name,
>   sizeof(eventdev_name));
> @@ -1275,7 +1272,7 @@ rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
>   eventdev->data->dev_private =
>   rte_zmalloc_socket(
>   "eventdev private structure",
> - eventdrv->dev_private_size,
> + private_data_size,
>   RTE_CACHE_LINE_SIZE,
>   rte_socket_id());
>  
> @@ -1285,10 +1282,9 @@ rte_event_pmd_pci_probe(struct rte_pci_driver *pci_drv,
>   }
>  
>   eventdev->dev = &pci_dev->device;
> - eventdev->driver = eventdrv;
>  
>   /* Invoke PMD device initialization function */
> - retval = (*eventdrv->eve

Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Bruce Richardson
On Tue, Jun 06, 2017 at 11:25:57AM +0100, Van Haaren, Harry wrote:
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Saturday, June 3, 2017 11:23 AM
> > To: Van Haaren, Harry ; dev@dpdk.org
> > Cc: Thomas Monjalon ; Jerin Jacob 
> > ;
> > Richardson, Bruce ; Wiles, Keith 
> > 
> > Subject: RE: [dpdk-dev] [RFCv2] service core concept
> 
> 
> 
> > > In particular this version of the API enables applications that are not 
> > > aware of services to
> > > benefit from the services concept, as EAL args can be used to setup 
> > > services and service
> > cores.
> > > With this design, switching to/from SW/HW PMD is transparent to the 
> > > application. An example
> > > use-case is the Eventdev HW PMD to Eventdev SW PMD that requires a 
> > > service core.
> > >
> > > I have noted the implementation comments that were raised on the v1. For 
> > > v2, I think our
> > time
> > > is better spent looking at the API design, and I will handle 
> > > implementation feedback in the
> > > follow-up patchset to v2 RFC.
> > >
> > > Below a summary of what we are trying to achieve, and the current API 
> > > design.
> > > Have a good weekend! Cheers, -Harry
> > 
> >
> > Looks good to me in general.
> > The only comment I have - do we really need to put it into rte_eal_init()
> > and a new EAL command-line parameter for it?
> > Might be better to leave it to the particular app to decide.
> 
> 
> There are a number of options here, each with its own merit:
> 
> A) Services/cores config in EAL
> Benefit is that service functionality can be transparent to the application. 
> Negative is that the complexity is in EAL.
> 
> B) Application configures services/cores
> Benefit is no added EAL complexity. Negative is that application code has to 
> configure cores (duplicated per application).
> 
> 
> To answer this question, I think we need to estimate how many applications 
> would benefit from EAL integration and balance that against the "complexity 
> cost" of doing so. I do like the simplicity of option (B), however if there 
> is significant value in total transparency to the application I think (A) is 
> the better choice.
> 
> 
> Input on A) or B) welcomed! -Harry

I'm definitely in favour of having it in EAL. The whole reason for doing
this work is to make it easy for applications to dedicate cores to
background tasks - including applications written before this
functionality was added. By merging this into EAL, we can have
transparency in the app, as we can have the service cores completely in
the background, and the app can call rte_eal_mp_remote_launch() exactly
as before, without unexpected failures. If we move this externally, the
app needs to be reworked to take account of that fact, and call new,
service-core aware, launch functions instead.
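
For example, an app that today does nothing more than the usual (lcore_main
here being whatever function the app already runs on each worker lcore):

	/* unchanged application code; with the EAL option the service
	 * lcores are simply excluded from the set launched here */
	if (rte_eal_init(argc, argv) < 0)
		rte_panic("Cannot init EAL\n");
	rte_eal_mp_remote_launch(lcore_main, NULL, SKIP_MASTER);

keeps working as-is, with the service lcores running their services in the
background.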

Regards,
/Bruce


Re: [dpdk-dev] [PATCH] net/thunderx: manage PCI device mapping for SQS VFs

2017-06-06 Thread Ferruh Yigit
On 6/6/2017 3:05 PM, Jerin Jacob wrote:
> -Original Message-
>> Date: Tue, 6 Jun 2017 14:36:09 +0100
>> From: Ferruh Yigit 
>> To: Jerin Jacob , dev@dpdk.org
>> CC: Angela Czubak , Thomas Monjalon
>>  
>> Subject: Re: [dpdk-dev] [PATCH] net/thunderx: manage PCI device mapping for
>>  SQS VFs
>> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>>  Thunderbird/52.1.1
>>
>> On 6/1/2017 2:05 PM, Jerin Jacob wrote:
>>> Since the commit e84ad157b7bc ("pci: unmap resources if probe fails"),
>>> EAL unmaps the PCI device if ethdev probe returns positive or
>>> negative value.
>>>
>>> nicvf thunderx PMD needs special treatment for Secondary queue set(SQS)
>>> PCIe VF devices, where, it expects to not unmap or free the memory
>>> without registering the ethdev subsystem.
>>>
>>> To keep the same behavior, moved the PCI map function inside
>>> the driver without using the EAL services.
>>
>> What do you think about adding a flag, something like
>> RTE_PCI_DRV_FIXED_MAPPING, that does the mapping but skips the unmap on error?
>> This would be a more generic solution.
>>
>> I am concerned about a PMD calling an EAL-level API.
> 
> Understood.
> 
> Another option is to unmap only on ERROR(ie, when probe return <0 value)
> 
>   ret = dr->probe(dr, dev);
> if (ret) { // change to if (ret < 0)
> dev->driver = NULL;
> if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
> rte_pci_unmap_device(dev);
> }
> 
> I am fine with either way. Let me know, what you prefer. I will
> change accordingly.

"unmap only on ERROR" looks simpler, but it needs to be documented -in
the code, otherwise easy to miss in the future:

probe() return:
0   : success
< 0 : error, unmap resources
> 0 : error, no unmap

And it requires checking that all existing drivers return a negative error from
probe().


Adding a new flag is more explicit, with no need to worry about what
other PMDs return, but it can be overkill.


I would go with the second option, but I guess both are OK, as long as the
behavior change in the first one is commented in the code. Please pick one.

Thanks,
ferruh

> 
>>
>>>
>>> Signed-off-by: Jerin Jacob 
>>> Signed-off-by: Angela Czubak 
>>> ---
>>>  drivers/net/thunderx/nicvf_ethdev.c | 9 -
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/thunderx/nicvf_ethdev.c 
>>> b/drivers/net/thunderx/nicvf_ethdev.c
>>> index 796701b0f..6ec2f9266 100644
>>> --- a/drivers/net/thunderx/nicvf_ethdev.c
>>> +++ b/drivers/net/thunderx/nicvf_ethdev.c
>>> @@ -2025,6 +2025,13 @@ nicvf_eth_dev_init(struct rte_eth_dev *eth_dev)
>>> }
>>>  
>>> pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
>>> +
>>> +   ret = rte_pci_map_device(pci_dev);
>>> +   if (ret) {
>>> +   PMD_INIT_LOG(ERR, "Failed to map pci device");
>>> +   goto fail;
>>> +   }
>>> +
>>> rte_eth_copy_pci_info(eth_dev, pci_dev);
>>>  
>>> nic->device_id = pci_dev->id.device_id;
>>> @@ -2171,7 +2178,7 @@ static int nicvf_eth_pci_remove(struct rte_pci_device 
>>> *pci_dev)
>>>  
>>>  static struct rte_pci_driver rte_nicvf_pmd = {
>>> .id_table = pci_id_nicvf_map,
>>> -   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
>>> +   .drv_flags = RTE_PCI_DRV_INTR_LSC,
>>> .probe = nicvf_eth_pci_probe,
>>> .remove = nicvf_eth_pci_remove,
>>>  };
>>>
>>



Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation

2017-06-06 Thread Bruce Richardson
On Tue, Jun 06, 2017 at 02:19:21PM +0100, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Tuesday, June 6, 2017 1:42 PM
> > To: Ananyev, Konstantin 
> > Cc: Verkamp, Daniel ; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] ring: use aligned memzone allocation
> > 
> > On Tue, Jun 06, 2017 at 10:59:59AM +0100, Ananyev, Konstantin wrote:
> > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > The PROD/CONS_ALIGN values on x86-64 are set to 2 cache lines, so 
> > > > > > members
> > > > > of struct rte_ring are 128 byte aligned,
> > > > > >and therefore the whole struct needs 128-byte alignment according to 
> > > > > >the ABI
> > > > > so that the 128-byte alignment of the fields can be guaranteed.
> > > > >
> > > > > Ah ok, missed the fact that rte_ring is 128B aligned these days.
> > > > > BTW, I probably missed the initial discussion, but what was the 
> > > > > reason for that?
> > > > > Konstantin
> > > >
> > > > I don't know why PROD_ALIGN/CONS_ALIGN use 128 byte alignment; it seems 
> > > > unnecessary if the cache line is only 64 bytes.  An
> > alternate
> > > > fix would be to just use cache line alignment for these fields (since 
> > > > memzones are already cache line aligned).
> > >
> > > Yes, had the same thought.
> > >
> > > > Maybe there is some deeper  reason for the >= 128-byte alignment logic 
> > > > in rte_ring.h?
> > >
> > > Might be; it would be good to hear the opinion of the author of that change.
> > 
> > It gives improved performance for core-2-core transfer.
> 
> You mean empty cache-line(s) after prod/cons, correct?
> That's ok but why we can't keep them and whole rte_ring aligned on cache-line 
> boundaries?
> Something like that:
> struct rte_ring {
>...
>struct rte_ring_headtail prod __rte_cache_aligned;
>EMPTY_CACHE_LINE   __rte_cache_aligned;
>struct rte_ring_headtail cons __rte_cache_aligned;
>EMPTY_CACHE_LINE   __rte_cache_aligned;
> };
> 
> Konstantin

Sure. That should probably work too. 

/Bruce


[dpdk-dev] [PATCH v2] ethdev: remove driver name from device private data

2017-06-06 Thread Ferruh Yigit
rte_driver->name holds the driver name, and all physical and virtual
devices have access to it.

Previously it was not possible for virtual ethernet devices to access
the rte_driver->name field (because eth_dev used to keep only a pci_dev),
so the driver name had to be saved in the device private struct.

After the re-works on bus and vdev, it is possible for all bus types to
access rte_driver.

It is now possible to remove the driver name from the ethdev device
private data and use eth_dev->device->driver->name instead.

Signed-off-by: Ferruh Yigit 
---
Cc: Gaetan Rivet 
Cc: Jan Blunck 

v2:
* rebase on latest next-net
---
 drivers/net/bnxt/bnxt_ethdev.c | 2 +-
 drivers/net/bonding/rte_eth_bond_api.c | 4 ++--
 drivers/net/cxgbe/sge.c| 6 +++---
 drivers/net/dpaa2/dpaa2_ethdev.c   | 1 -
 drivers/net/i40e/i40e_ethdev.c | 3 +--
 drivers/net/i40e/i40e_fdir.c   | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   | 2 +-
 drivers/net/ring/rte_eth_ring.c| 1 -
 drivers/net/tap/rte_eth_tap.c  | 1 -
 drivers/net/vmxnet3/vmxnet3_ethdev.c   | 2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c  | 1 -
 lib/librte_ether/rte_ethdev.c  | 8 
 lib/librte_ether/rte_ethdev.h  | 1 -
 lib/librte_ether/rte_ethdev_pci.h  | 1 -
 lib/librte_ether/rte_ethdev_vdev.h | 1 -
 15 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index f9bedf7..15195c3 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -1940,7 +1940,7 @@ static struct rte_pci_driver bnxt_rte_pmd = {
 static bool
 is_device_supported(struct rte_eth_dev *dev, struct rte_pci_driver *drv)
 {
-   if (strcmp(dev->data->drv_name, drv->driver.name))
+   if (strcmp(dev->device->driver->name, drv->driver.name))
return false;
 
return true;
diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index 36ec65d..164eb59 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -48,11 +48,11 @@ int
 check_for_bonded_ethdev(const struct rte_eth_dev *eth_dev)
 {
/* Check valid pointer */
-   if (eth_dev->data->drv_name == NULL)
+   if (eth_dev->device->driver->name == NULL)
return -1;
 
/* return 0 if driver name matches */
-   return eth_dev->data->drv_name != pmd_bond_drv.driver.name;
+   return eth_dev->device->driver->name != pmd_bond_drv.driver.name;
 }
 
 int
diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 9cbd4ec..d088065 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -1691,7 +1691,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
iq->size = cxgbe_roundup(iq->size, 16);
 
snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
-eth_dev->data->drv_name,
+eth_dev->device->driver->name,
 fwevtq ? "fwq_ring" : "rx_ring",
 eth_dev->data->port_id, queue_id);
snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
@@ -1745,7 +1745,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
sge_rspq *iq, bool fwevtq,
fl->size = cxgbe_roundup(fl->size, 8);
 
snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
-eth_dev->data->drv_name,
+eth_dev->device->driver->name,
 fwevtq ? "fwq_ring" : "fl_ring",
 eth_dev->data->port_id, queue_id);
snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
@@ -1945,7 +1945,7 @@ int t4_sge_alloc_eth_txq(struct adapter *adap, struct 
sge_eth_txq *txq,
nentries = txq->q.size + s->stat_len / sizeof(struct tx_desc);
 
snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
-eth_dev->data->drv_name, "tx_ring",
+eth_dev->device->driver->name, "tx_ring",
 eth_dev->data->port_id, queue_id);
snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
 
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 4de1e0c..da309ac 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -1464,7 +1464,6 @@ dpaa2_dev_init(struct rte_eth_dev *eth_dev)
}
 
eth_dev->dev_ops = &dpaa2_ethdev_ops;
-   eth_dev->data->drv_name = rte_dpaa2_pmd.driver.name;
 
eth_dev->rx_pkt_burst = dpaa2_dev_prefetch_rx;
eth_dev->tx_pkt_burst = dpaa2_dev_tx;
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 2f1cd85..078a808 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -10738,8 +10738,7 @@ i40e_filter_restore(struct i40e_pf *pf)
 static bool
 is_device_supported(struct rte_eth_dev *dev, struct rte_pci_driver *drv)
 {
-   if (strcmp(dev->data->drv_name,
-

Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Ananyev, Konstantin


> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, June 6, 2017 3:54 PM
> To: Van Haaren, Harry 
> Cc: Ananyev, Konstantin ; dev@dpdk.org; Thomas 
> Monjalon ; Jerin Jacob
> ; Wiles, Keith 
> Subject: Re: [dpdk-dev] [RFCv2] service core concept
> 
> On Tue, Jun 06, 2017 at 11:25:57AM +0100, Van Haaren, Harry wrote:
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Saturday, June 3, 2017 11:23 AM
> > > To: Van Haaren, Harry ; dev@dpdk.org
> > > Cc: Thomas Monjalon ; Jerin Jacob 
> > > ;
> > > Richardson, Bruce ; Wiles, Keith 
> > > 
> > > Subject: RE: [dpdk-dev] [RFCv2] service core concept
> >
> > 
> >
> > > > In particular this version of the API enables applications that are not 
> > > > aware of services to
> > > > benefit from the services concept, as EAL args can be used to setup 
> > > > services and service
> > > cores.
> > > > With this design, switching to/from SW/HW PMD is transparent to the 
> > > > application. An example
> > > > use-case is the Eventdev HW PMD to Eventdev SW PMD that requires a 
> > > > service core.
> > > >
> > > > I have noted the implementation comments that were raised on the v1. 
> > > > For v2, I think our
> > > time
> > > > is better spent looking at the API design, and I will handle 
> > > > implementation feedback in the
> > > > follow-up patchset to v2 RFC.
> > > >
> > > > Below a summary of what we are trying to achieve, and the current API 
> > > > design.
> > > > Have a good weekend! Cheers, -Harry
> > >
> > >
> > > Looks good to me in general.
> > > The only comment I have - do we really need to put it into rte_eal_init()
> > > and a new EAL command-line parameter for it?
> > > Might be better to leave it to the particular app to decide.
> >
> >
> > There are a number of options here, each with its own merit:
> >
> > A) Services/cores config in EAL
> > Benefit is that service functionality can be transparent to the 
> > application. Negative is that the complexity is in EAL.
> >
> > B) Application configures services/cores
> > Benefit is no added EAL complexity. Negative is that application code has 
> > to configure cores (duplicated per application).
> >
> >
> > To answer this question, I think we need to estimate how many applications 
> > would benefit from EAL integration and balance that against
> the "complexity cost" of doing so. I do like the simplicity of option (B), 
> however if there is significant value in total transparency to the
> application I think (A) is the better choice.
> >
> >
> > Input on A) or B) welcomed! -Harry
> 
> I'm definitely in favour of having it in EAL. The whole reason for doing
> this work is to make it easy for applications to dedicate cores to
> background tasks - including applications written before this
> functionality was added. By merging this into EAL, we can have
> transparency in the app, as we can have the service cores completely in
> the background, and the app can call rte_eal_mp_remote_launch() exactly
> as before, without unexpected failures. If we move this externally, the
> app needs to be reworked to take account of that fact, and call new,
> service-core aware, launch functions instead.

Not sure I understood you here:
If the app doesn't plan to use any cores for services, it will for sure still be
able to call rte_eal_mp_remote_launch() as before (the no-services-running case).
From the other side, if the app would like to use services, it would need to
specify which service it wants to run, and for each service provide a coremask,
even if EAL already allocates service cores for it.
Or are you talking about the case when EAL allocates service cores, and then
PMDs themselves (or EAL again) register their services on those cores?
That's probably possible, but how would a PMD know which service core(s) it is
allowed to use?
Another EAL cmd-line parameter(s), an extension of the existing '-w/--vdev', or
something else?
Things might get over-complicated here - in theory there could be multiple PMDs,
each of which can have more than one service, running on multiple sets of cores,
etc.
Konstantin

 



Re: [dpdk-dev] [PATCH v2] ethdev: remove driver name from device private data

2017-06-06 Thread Gaëtan Rivet
Hi Ferruh,

On Tue, Jun 06, 2017 at 04:10:08PM +0100, Ferruh Yigit wrote:
> rte_driver->name has the driver name and all physical and virtual
> devices has access to it.
> 
> Previously it was not possible for virtual ethernet devices to access
> rte_driver->name field (because eth_dev used to keep only pci_dev),
> and it was required to save driver name in the device private struct.
> 
> After re-works on bus and vdev, it is possible for all bus types to
> access rte_driver.
> 
> It is able to remove the driver name from ethdev device private data and
> use eth_dev->device->driver->name.
> 
> Signed-off-by: Ferruh Yigit 
> ---
> Cc: Gaetan Rivet 
> Cc: Jan Blunck 
> 

I'm not sure I am the one who should give his opinion on this, I only
took the work from Jan and ported it for integration.

However, as far as I am concerned, I agree with this change.

> v2:
> * rebase on latest next-net
> ---
>  drivers/net/bnxt/bnxt_ethdev.c | 2 +-
>  drivers/net/bonding/rte_eth_bond_api.c | 4 ++--
>  drivers/net/cxgbe/sge.c| 6 +++---
>  drivers/net/dpaa2/dpaa2_ethdev.c   | 1 -
>  drivers/net/i40e/i40e_ethdev.c | 3 +--
>  drivers/net/i40e/i40e_fdir.c   | 2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c   | 2 +-
>  drivers/net/ring/rte_eth_ring.c| 1 -
>  drivers/net/tap/rte_eth_tap.c  | 1 -
>  drivers/net/vmxnet3/vmxnet3_ethdev.c   | 2 +-
>  drivers/net/xenvirt/rte_eth_xenvirt.c  | 1 -
>  lib/librte_ether/rte_ethdev.c  | 8 
>  lib/librte_ether/rte_ethdev.h  | 1 -
>  lib/librte_ether/rte_ethdev_pci.h  | 1 -
>  lib/librte_ether/rte_ethdev_vdev.h | 1 -
>  15 files changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
> index f9bedf7..15195c3 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -1940,7 +1940,7 @@ static struct rte_pci_driver bnxt_rte_pmd = {
>  static bool
>  is_device_supported(struct rte_eth_dev *dev, struct rte_pci_driver *drv)
>  {
> - if (strcmp(dev->data->drv_name, drv->driver.name))
> + if (strcmp(dev->device->driver->name, drv->driver.name))
>   return false;
>  
>   return true;
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
> b/drivers/net/bonding/rte_eth_bond_api.c
> index 36ec65d..164eb59 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -48,11 +48,11 @@ int
>  check_for_bonded_ethdev(const struct rte_eth_dev *eth_dev)
>  {
>   /* Check valid pointer */
> - if (eth_dev->data->drv_name == NULL)
> + if (eth_dev->device->driver->name == NULL)
>   return -1;
>  
>   /* return 0 if driver name matches */
> - return eth_dev->data->drv_name != pmd_bond_drv.driver.name;
> + return eth_dev->device->driver->name != pmd_bond_drv.driver.name;
>  }
>  
>  int
> diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
> index 9cbd4ec..d088065 100644
> --- a/drivers/net/cxgbe/sge.c
> +++ b/drivers/net/cxgbe/sge.c
> @@ -1691,7 +1691,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
> sge_rspq *iq, bool fwevtq,
>   iq->size = cxgbe_roundup(iq->size, 16);
>  
>   snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
> -  eth_dev->data->drv_name,
> +  eth_dev->device->driver->name,
>fwevtq ? "fwq_ring" : "rx_ring",
>eth_dev->data->port_id, queue_id);
>   snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
> @@ -1745,7 +1745,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct 
> sge_rspq *iq, bool fwevtq,
>   fl->size = cxgbe_roundup(fl->size, 8);
>  
>   snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
> -  eth_dev->data->drv_name,
> +  eth_dev->device->driver->name,
>fwevtq ? "fwq_ring" : "fl_ring",
>eth_dev->data->port_id, queue_id);
>   snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
> @@ -1945,7 +1945,7 @@ int t4_sge_alloc_eth_txq(struct adapter *adap, struct 
> sge_eth_txq *txq,
>   nentries = txq->q.size + s->stat_len / sizeof(struct tx_desc);
>  
>   snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
> -  eth_dev->data->drv_name, "tx_ring",
> +  eth_dev->device->driver->name, "tx_ring",
>eth_dev->data->port_id, queue_id);
>   snprintf(z_name_sw, sizeof(z_name_sw), "%s_sw_ring", z_name);
>  
> diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c 
> b/drivers/net/dpaa2/dpaa2_ethdev.c
> index 4de1e0c..da309ac 100644
> --- a/drivers/net/dpaa2/dpaa2_ethdev.c
> +++ b/drivers/net/dpaa2/dpaa2_ethdev.c
> @@ -1464,7 +1464,6 @@ dpaa2_dev_init(struct rte_eth_dev *eth_dev)
>   }
>  
>   eth_dev->dev_ops = &dpaa2_ethdev_ops;
> - eth_dev->data->drv_name = rte_dpaa2_pmd.driver.name;
>  
>   eth_dev->rx_pkt_burst = dpaa2_dev_prefetch_r

Re: [dpdk-dev] [RFCv2] service core concept

2017-06-06 Thread Van Haaren, Harry
> From: Ananyev, Konstantin
> Sent: Tuesday, June 6, 2017 4:29 PM
> Subject: RE: [dpdk-dev] [RFCv2] service core concept
> 
> 
> > From: Richardson, Bruce
> > Sent: Tuesday, June 6, 2017 3:54 PM
> >
> > On Tue, Jun 06, 2017 at 11:25:57AM +0100, Van Haaren, Harry wrote:
> > > > From: Ananyev, Konstantin
> > > > Sent: Saturday, June 3, 2017 11:23 AM
> > >
> > > 



> > >
> > > There are a number of options here, each with its own merit:
> > >
> > > A) Services/cores config in EAL
> > > Benefit is that service functionality can be transparent to the 
> > > application. Negative is
> that the complexity is in EAL.
> > >
> > > B) Application configures services/cores
> > > Benefit is no added EAL complexity. Negative is that application code has 
> > > to configure
> cores (duplicated per application).
> > >
> > >
> > > To answer this question, I think we need to estimate how many 
> > > applications would benefit
> from EAL integration and balance that against
> > the "complexity cost" of doing so. I do like the simplicity of option (B), 
> > however if there
> is significant value in total transparency to the
> > application I think (A) is the better choice.
> > >
> > >
> > > Input on A) or B) welcomed! -Harry
> >
> > I'm definitely in favour of having it in EAL. The whole reason for doing
> > this work is to make it easy for applications to dedicate cores to
> > background tasks - including applications written before this
> > functionality was added. By merging this into EAL, we can have
> > transparency in the app, as we can have the service cores completely in
> > the background, and the app can call rte_eal_mp_remote_launch() exactly
> > as before, without unexpected failures. If we move this externally, the
> > app needs to be reworked to take account of that fact, and call new,
> > service-core aware, launch functions instead.
> 
> Not sure I understood you here:
> If the app don' plan to use any cores for services, it for sure will be able 
> to call
> rte_eal_mp_remote_launch() as before (no services running case).

Correct - EAL behavior remains unchanged if --service-cores=0xf is not passed
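To make "unchanged" concrete, here is a minimal sketch of the application side,
assuming the --service-cores option proposed in this RFC (name and format not
final); the launch code itself only uses the existing EAL calls:

#include <rte_common.h>
#include <rte_eal.h>
#include <rte_launch.h>

static int
lcore_main(__rte_unused void *arg)
{
	/* normal per-lcore packet processing loop */
	return 0;
}

int
main(int argc, char **argv)
{
	/* e.g. ./app -l 0-7 --service-cores=0xc0  (proposed option, illustrative) */
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* unchanged: EAL simply does not hand the reserved service lcores to the app */
	rte_eal_mp_remote_launch(lcore_main, NULL, CALL_MASTER);
	rte_eal_mp_wait_lcore();
	return 0;
}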


> From other side, if the app would like to use services - it would need to 
> specify
> which service it wants to run, and for each service provide a coremask, even 
> if
> EAL already allocates service cores for it.

See next paragraph


> Or are you talking about the when EAL allocates service cores, and then
> PMDs themselves (or EAL again) register their services on that cores?

EAL could provide sane default behavior: for example, round-robin services over
the available service-cores, while multithread-capable services can be registered
on all service cores. It's not a perfect solution for all service-to-core mapping
problems, but I'd guess about 80% of cases would be covered: using a single
service with a single service core dedicated to it :) A rough sketch of that
round-robin default is below.
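For illustration only - the struct and function names below are made up for this
sketch and are not part of the RFC API:

/* Hypothetical service descriptor, only to illustrate the mapping. */
struct service {
	int (*run)(void *arg);    /* service callback */
	void *arg;
	unsigned int lcore_id;    /* service lcore it gets mapped to */
};

/* Distribute n_services over the available service lcores round-robin. */
static void
map_services_round_robin(struct service *services, unsigned int n_services,
			 const unsigned int *service_lcores,
			 unsigned int n_service_lcores)
{
	unsigned int i;

	if (n_service_lcores == 0)
		return;
	for (i = 0; i < n_services; i++)
		services[i].lcore_id = service_lcores[i % n_service_lcores];
}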


> That's probably possible, but how PMD would know which service core(s) it 
> allowed to use?

The PMD shouldn't be deciding - EAL handles the basic sane config, and the
application handles advanced usage.


> Things might get over-complicated here - in theory there could be multiple 
> PMDs,
> each of them can have more than one service, running on multiple sets of 
> cores, etc.

True - the NxM service:core mapping space can be huge - the API gives the
application that flexibility if it is really required. If the flexibility is
not required, the round-robin 1:1 service:core EAL scheme should cover it?

-Harry


Re: [dpdk-dev] [PATCH v2] net/ixgbe: enable PTYPE offload for x86 vector PMD

2017-06-06 Thread Olivier Matz
Hi Qi,

On Wed, 31 May 2017 19:30:26 -0400, Qi Zhang  wrote:
> Hardware PTYPE in Rx desc will be parsed to fill
> mbuf's packet_type.
> 
> Signed-off-by: Ray Kinsella 
> Signed-off-by: Qi Zhang 
> ---
> 
> v2:
> - replace large macro that parse packet type with inline function
> - fix couple check patch issues.
> 
>  drivers/net/ixgbe/ixgbe_ethdev.c   |   8 +
>  drivers/net/ixgbe/ixgbe_rxtx.c | 623 
> ++---
>  drivers/net/ixgbe/ixgbe_rxtx.h |  92 +
>  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c |  65 
>  4 files changed, 434 insertions(+), 354 deletions(-)
> 

I tried to compile your patch with RTE_MACHINE=default, and I
have the following compilation error:

gcc -Wp,-MD,./.ixgbe_rxtx_vec_sse.o.d.tmp  -m64 -pthread  -march=core2 
-DRTE_MACHINE_CPUFLAG_SSE -DRTE_MACHINE_CPUFLAG_SSE2 -DRTE_MACHINE_CPUFLAG_SSE3 
-DRTE_MACHINE_CPUFLAG_SSSE3  -I/home/user/dpdk.org/build/include -include 
/home/user/dpdk.org/build/include/rte_config.h -O3 -W -Wall -Wstrict-prototypes 
-Wmissing-prototypes -Wmissing-declarations -Wold-style-definition 
-Wpointer-arith -Wcast-align -Wnested-externs -Wcast-qual -Wformat-nonliteral 
-Wformat-security -Wundef -Wwrite-strings -Werror -Wno-deprecated-o 
ixgbe_rxtx_vec_sse.o -c 
/home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c 
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/x86intrin.h:43:0,
 from /home/user/dpdk.org/build/include/rte_vect.h:70,
 from /home/user/dpdk.org/build/include/rte_memcpy.h:46,
 from /home/user/dpdk.org/build/include/rte_ether.h:50,
 from /home/user/dpdk.org/build/include/rte_ethdev.h:185,
 from 
/home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:35:
/home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c: In function 
‘desc_to_ptype_v’:
/usr/lib/gcc/x86_64-linux-gnu/6/include/smmintrin.h:447:1: error: inlining 
failed in call to always_inline ‘_mm_extract_epi32’: target specific option 
mismatch
 _mm_extract_epi32 (__m128i __X, const int __N)
 ^
/home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:300:13: note: called 
from here
  pkt_info = _mm_extract_epi32(ptype0, 3);
 ^~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/x86intrin.h:43:0,
 from /home/user/dpdk.org/build/include/rte_vect.h:70,
 from /home/user/dpdk.org/build/include/rte_memcpy.h:46,
 from /home/user/dpdk.org/build/include/rte_ether.h:50,
 from /home/user/dpdk.org/build/include/rte_ethdev.h:185,
 from 
/home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:35:
[...]


To reproduce:
 make config T=x86_64-native-linuxapp-gcc
 sed -i 's,CONFIG_RTE_MACHINE="native",CONFIG_RTE_MACHINE="default",' 
build/.config
 make

Do we still want to support the core2 target?


Thanks,
Olivier




Re: [dpdk-dev] [PATCH v2] net/ixgbe: enable PTYPE offload for x86 vector PMD

2017-06-06 Thread Zhang, Qi Z
Hi Oliver:

> -Original Message-
> From: Olivier Matz [mailto:olivier.m...@6wind.com]
> Sent: Tuesday, June 6, 2017 11:45 PM
> To: Zhang, Qi Z 
> Cc: Ananyev, Konstantin ; Zhang, Helin
> ; Lu, Wenzhuo ; Kinsella,
> Ray ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: enable PTYPE offload for x86
> vector PMD
> 
> Hi Qi,
> 
> On Wed, 31 May 2017 19:30:26 -0400, Qi Zhang 
> wrote:
> > Hardware PTYPE in Rx desc will be parsed to fill mbuf's packet_type.
> >
> > Signed-off-by: Ray Kinsella 
> > Signed-off-by: Qi Zhang 
> > ---
> >
> > v2:
> > - replace large macro that parse packet type with inline function
> > - fix couple check patch issues.
> >
> >  drivers/net/ixgbe/ixgbe_ethdev.c   |   8 +
> >  drivers/net/ixgbe/ixgbe_rxtx.c | 623
> ++---
> >  drivers/net/ixgbe/ixgbe_rxtx.h |  92 +
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c |  65 
> >  4 files changed, 434 insertions(+), 354 deletions(-)
> >
> 
> I tried to compile your patch with RTE_MACHINE=default, and I have the
> following compilation error:
> 
> gcc -Wp,-MD,./.ixgbe_rxtx_vec_sse.o.d.tmp  -m64 -pthread
> -march=core2 -DRTE_MACHINE_CPUFLAG_SSE
> -DRTE_MACHINE_CPUFLAG_SSE2 -DRTE_MACHINE_CPUFLAG_SSE3
> -DRTE_MACHINE_CPUFLAG_SSSE3  -I/home/user/dpdk.org/build/include
> -include /home/user/dpdk.org/build/include/rte_config.h -O3 -W -Wall
> -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations
> -Wold-style-definition -Wpointer-arith -Wcast-align -Wnested-externs
> -Wcast-qual -Wformat-nonliteral -Wformat-security -Wundef
> -Wwrite-strings -Werror -Wno-deprecated-o ixgbe_rxtx_vec_sse.o -c
> /home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c
> In file included from
> /usr/lib/gcc/x86_64-linux-gnu/6/include/x86intrin.h:43:0,
>  from
> /home/user/dpdk.org/build/include/rte_vect.h:70,
>  from
> /home/user/dpdk.org/build/include/rte_memcpy.h:46,
>  from
> /home/user/dpdk.org/build/include/rte_ether.h:50,
>  from
> /home/user/dpdk.org/build/include/rte_ethdev.h:185,
>  from
> /home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:35:
> /home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c: In function
> ‘desc_to_ptype_v’:
> /usr/lib/gcc/x86_64-linux-gnu/6/include/smmintrin.h:447:1: error: inlining
> failed in call to always_inline ‘_mm_extract_epi32’: target specific option
> mismatch
>  _mm_extract_epi32 (__m128i __X, const int __N)  ^
> /home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:300:13: note:
> called from here
>   pkt_info = _mm_extract_epi32(ptype0, 3);
>  ^~~~ In file included from
> /usr/lib/gcc/x86_64-linux-gnu/6/include/x86intrin.h:43:0,
>  from
> /home/user/dpdk.org/build/include/rte_vect.h:70,
>  from
> /home/user/dpdk.org/build/include/rte_memcpy.h:46,
>  from
> /home/user/dpdk.org/build/include/rte_ether.h:50,
>  from
> /home/user/dpdk.org/build/include/rte_ethdev.h:185,
>  from
> /home/user/dpdk.org/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:35:
> [...]
> 
> 
> To reproduce:
>  make config T=x86_64-native-linuxapp-gcc  sed -i
> 's,CONFIG_RTE_MACHINE="native",CONFIG_RTE_MACHINE="default",'
> build/.config  make
> 
> Do we still want to support the core2 target?

Thanks for catching this.
Will fix it in v3.

Regards.
Qi
> 
> 
> Thanks,
> Olivier
> 



[dpdk-dev] technical board meeting, 2017-06-07

2017-06-06 Thread Stephen Hemminger
Hello everyone,

A meeting of the DPDK technical board will occur this Wednesday, 7th June 2017
at 3pm UTC.

The meeting takes place on the #dpdk-board channel on IRC.
This meeting is public, so anybody can join, see below for the agenda.

Agenda: https://annuel.framapad.org/p/r.0c3cc4d1e011214183872a98f6b5c7db


0. Release blocking issues?
Are there any known urgent blocking issues at this time.

1. Hosting and user repositories
What is policy? Where should it be documented?

2. Bus changes and API breakage.
I haven't been tracking the fine print. How badly does the next
release change the API?

4. Fail-Safe PMD
What is current status? Is it ready?

5. Bugzilla
The last meeting agreed to set up Bugzilla; what is the progress?

6. Governing board status
Are there any AR's for TAB?


[dpdk-dev] [PATCH v2] doc: add new targets to "make help" output

2017-06-06 Thread Gabriel Carrillo
Commit aafaea3d3b70 ("devtools: add tags
and cscope index generation") introduced
new make targets. This change updates the
help target output to reflect the additions.

Signed-off-by: Gabriel Carrillo 
---
 doc/build-sdk-quick.txt | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/doc/build-sdk-quick.txt b/doc/build-sdk-quick.txt
index 8d41052..8ed9d80 100644
--- a/doc/build-sdk-quick.txt
+++ b/doc/build-sdk-quick.txt
@@ -1,16 +1,17 @@
 Basic build
make config T=x86_64-native-linuxapp-gcc && make
 Build commands
-   config           get configuration from target template (T=)
-   all              same as build (default rule)
-   build            build in a configured directory
-   clean            remove files but keep configuration
-   install T=       configure, build and install a target in DESTDIR
-   install          install optionally staged in DESTDIR
-   examples         build examples for given targets (T=)
-   examples_clean   clean examples for given targets (T=)
-   test             compile tests and run basic unit tests
-   test-*           run specific subset of unit tests
+   config                  get configuration from target template (T=)
+   all                     same as build (default rule)
+   build                   build in a configured directory
+   clean                   remove files but keep configuration
+   install T=              configure, build and install a target in DESTDIR
+   install                 install optionally staged in DESTDIR
+   examples                build examples for given targets (T=)
+   examples_clean          clean examples for given targets (T=)
+   test                    compile tests and run basic unit tests
+   test-*                  run specific subset of unit tests
+   tags|gtags|cscope [T=]  generate tags or cscope index
 Build variables
EXTRA_CPPFLAGS   preprocessor options
EXTRA_CFLAGS compiler options
-- 
2.6.4



Re: [dpdk-dev] [RFC 3/3] rte_flow: add new action for traffic metering and policing

2017-06-06 Thread Dumitrescu, Cristian
Hi Adrien,

Thanks for reviewing this proposal.

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Thursday, June 1, 2017 4:14 PM
> To: Dumitrescu, Cristian 
> Cc: dev@dpdk.org; tho...@monjalon.net;
> jerin.ja...@caviumnetworks.com; hemant.agra...@nxp.com; Doherty,
> Declan ; Wiles, Keith 
> Subject: Re: [RFC 3/3] rte_flow: add new action for traffic metering and
> policing
> 
> Hi Cristian,
> 
> On Tue, May 30, 2017 at 05:44:13PM +0100, Cristian Dumitrescu wrote:
> > Signed-off-by: Cristian Dumitrescu 
> > ---
> >  lib/librte_ether/rte_flow.h | 22 ++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
> > index c47edbc..2942ca7 100644
> > --- a/lib/librte_ether/rte_flow.h
> > +++ b/lib/librte_ether/rte_flow.h
> > @@ -881,6 +881,14 @@ enum rte_flow_action_type {
> >  * See struct rte_flow_action_vf.
> >  */
> > RTE_FLOW_ACTION_TYPE_VF,
> > +
> > +   /**
> > +* Traffic metering and policing (MTR).
> > +*
> > +* See struct rte_flow_action_meter.
> > +* See file rte_mtr.h for MTR object configuration.
> > +*/
> > +   RTE_FLOW_ACTION_TYPE_METER,
> >  };
> >
> >  /**
> > @@ -974,6 +982,20 @@ struct rte_flow_action_vf {
> >  };
> >
> >  /**
> > + * RTE_FLOW_ACTION_TYPE_METER
> > + *
> > + * Traffic metering and policing (MTR).
> > + *
> > + * Packets matched by items of this type can be either dropped or passed
> to the
> > + * next item with their color set by the MTR object.
> > + *
> > + * Non-terminating by default.
> > + */
> > +struct rte_flow_action_meter {
> > +   uint32_t mtr_id; /**< MTR object ID created with rte_mtr_create().
> */
> > +};
> > +
> > +/**
> >   * Definition of a single action.
> >   *
> >   * A list of actions is terminated by a END action.
> 
> Assuming this action is provided to the underlying PMD, can you describe
> what happens next; what is a PMD supposed to do when creating the flow
> rule
> and the impact on its data path?
> 

Metering is just another flow action that needs to be supported by the rte_flow API.

Typically, NICs supporting this action have an array of metering & policing 
contexts on their data path, which are abstracted as MTR objects in our API.
- rte_mtr_create() configures an MTR object, with no association to any of the 
known flows yet.
- On NIC side, the driver configures one of the available metering & 
policing contexts.
- rte_flow_create() defines the flow (match rule) and its set of actions, with 
metering & policing as one of the actions.
- On NIC side, the driver configures a flow/filter for traffic 
classification/distribution/bifurcation, with the metering & policing context 
enabled for this flow.

At run-time, any packet matching this flow will execute this action, which 
involves metering (packet is assigned a color) and policing (packet may be 
recolored or dropped, as configured), with stats being updated as well.
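As a rough sketch of the application side (rte_mtr_create() belongs to the
proposed MTR API and its exact signature is outside this thread; the MTR object
with this mtr_id is assumed to have been created already, and the rest uses the
existing rte_flow calls plus the action added by this patch):

#include <stdint.h>
#include <rte_flow.h>

/* Attach an already-created MTR object (identified by the app-chosen mtr_id)
 * to a flow rule; attr and pattern are whatever match rule the app builds.
 */
static struct rte_flow *
flow_with_meter(uint8_t port_id, const struct rte_flow_attr *attr,
		const struct rte_flow_item pattern[], uint32_t mtr_id,
		struct rte_flow_error *error)
{
	struct rte_flow_action_meter meter = { .mtr_id = mtr_id };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_METER, .conf = &meter },
		{ .type = RTE_FLOW_ACTION_TYPE_END, .conf = NULL },
	};

	return rte_flow_create(port_id, attr, pattern, actions, error);
}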

> It looks like mtr_id is arbitrarily set by the user calling
> rte_mtr_create(), which means the PMD has to look up the associated MTR
> context somehow.
> 
> How about making the rte_mtr_create() API return an opaque rte_mtr
> object
> pointer provided back to all API functions as well as through this action
> instead, and not leave it up to the user?
> 

Of course, it can be done this way as well, but IMHO probably not the best idea 
from the application perspective. We had a similar discussion when we defined 
the ethdev traffic management API [1].

Object handles can be integers, void pointers or pointers to opaque structures,
and each of these approaches is allowed and used by DPDK APIs. Here is an
example of why I think using integers for the MTR object handle makes the life
of the application easier (a tiny sketch follows below):
- Let's assume we have several actions for a flow (a1, a2, a3, ...).
- When handles are pointers to opaque structures, the app typically needs to save
all of them in a per-flow data structure: struct a1 *p1, struct a2 *p2, struct
a3 *p3.
    - This results in increased complexity and size for the app tables,
which can be avoided.
- When handles are integers generated by the app as opposed to the driver, the app
can simply use a single index - let's call it flow_id - and register it as the
handle for each of these flow actions.
    - No more fake tables.
    - No more worries about the pointer being valid in one address space
and not valid in another.

There is some handle lookup to be done by the driver, but this is a trivial
task, and checking the validity of the handle (input parameter) is the first
thing done by any API function, regardless of which handle style is used.
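A tiny sketch of the resulting per-flow state, reusing the hypothetical
a1/a2/a3 actions from above:

#include <stdint.h>

struct a1; /* opaque action objects (hypothetical) */
struct a2;
struct a3;

/* Handles as pointers to opaque structures: one pointer saved per action. */
struct flow_state_ptrs {
	struct a1 *p1;
	struct a2 *p2;
	struct a3 *p3;
};

/* Handles as app-generated integers: the flow_id itself is registered as the
 * handle for each action, so no extra per-action table is needed.
 */
struct flow_state_ids {
	uint32_t flow_id;
};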

[1] http://www.dpdk.org/ml/archives/dev/2017-February/057368.html


> --
> Adrien Mazarguil
> 6WIND

Regards,
Cristian



Re: [dpdk-dev] [PATCH 1/4] eal: introduce the rte macro for always inline

2017-06-06 Thread Thomas Monjalon
15/05/2017 10:07, Bruce Richardson:
> On Sat, May 13, 2017 at 02:57:25PM +0530, Jerin Jacob wrote:
> > Different drivers use internal macros like force_inline for compiler
> > always inline feature.
> > Standardizing it through __rte_always_inline macro.
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> 
> Good cleanup.
> 
> Series Acked-by: Bruce Richardson 

Applied, thanks



Re: [dpdk-dev] [PATCH v2 1/2] net/i40e: optimize vxlan parsing function

2017-06-06 Thread Lu, Wenzhuo
Hi,


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Thursday, June 1, 2017 2:57 PM
> To: Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/2] net/i40e: optimize vxlan parsing function
> 
> This commit optimizes vxlan parsing function.
> 
> Signed-off-by: Beilei Xing 
Acked-by: Wenzhuo Lu 


Re: [dpdk-dev] [PATCH v2 1/2] net/i40e: optimize vxlan parsing function

2017-06-06 Thread Yuanhan Liu
On Thu, Jun 01, 2017 at 02:56:30PM +0800, Beilei Xing wrote:
> This commit optimizes vxlan parsing function.

How?

--yliu

> Signed-off-by: Beilei Xing 
> ---
>  drivers/net/i40e/i40e_flow.c | 176 
> ++-
>  1 file changed, 55 insertions(+), 121 deletions(-)


Re: [dpdk-dev] [PATCH v2 1/2] net/i40e: optimize vxlan parsing function

2017-06-06 Thread Xing, Beilei

> -Original Message-
> From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> Sent: Wednesday, June 7, 2017 11:31 AM
> To: Xing, Beilei 
> Cc: Wu, Jingjing ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] net/i40e: optimize vxlan parsing
> function
> 
> On Thu, Jun 01, 2017 at 02:56:30PM +0800, Beilei Xing wrote:
> > This commit optimizes vxlan parsing function.
> 
> How?
> 
>   --yliu

The original parsing function is a little complex and not easy to read when
parsing the filter type; this patch optimizes the function and makes it more
readable.

> 
> > Signed-off-by: Beilei Xing 
> > ---
> >  drivers/net/i40e/i40e_flow.c | 176 ++--
> ---
> >  1 file changed, 55 insertions(+), 121 deletions(-)


Re: [dpdk-dev] [PATCH v2] ethdev: remove driver name from device private data

2017-06-06 Thread Shreyansh Jain

On Tuesday 06 June 2017 08:40 PM, Ferruh Yigit wrote:

rte_driver->name has the driver name and all physical and virtual
devices has access to it.

Previously it was not possible for virtual ethernet devices to access
rte_driver->name field (because eth_dev used to keep only pci_dev),
and it was required to save driver name in the device private struct.

After re-works on bus and vdev, it is possible for all bus types to
access rte_driver.

It is able to remove the driver name from ethdev device private data and
use eth_dev->device->driver->name.

Signed-off-by: Ferruh Yigit 
---
Cc: Gaetan Rivet 
Cc: Jan Blunck 

v2:
* rebase on latest next-net
---
  drivers/net/bnxt/bnxt_ethdev.c | 2 +-
  drivers/net/bonding/rte_eth_bond_api.c | 4 ++--
  drivers/net/cxgbe/sge.c| 6 +++---
  drivers/net/dpaa2/dpaa2_ethdev.c   | 1 -
  drivers/net/i40e/i40e_ethdev.c | 3 +--
  drivers/net/i40e/i40e_fdir.c   | 2 +-
  drivers/net/ixgbe/ixgbe_ethdev.c   | 2 +-
  drivers/net/ring/rte_eth_ring.c| 1 -
  drivers/net/tap/rte_eth_tap.c  | 1 -
  drivers/net/vmxnet3/vmxnet3_ethdev.c   | 2 +-
  drivers/net/xenvirt/rte_eth_xenvirt.c  | 1 -
  lib/librte_ether/rte_ethdev.c  | 8 
  lib/librte_ether/rte_ethdev.h  | 1 -
  lib/librte_ether/rte_ethdev_pci.h  | 1 -
  lib/librte_ether/rte_ethdev_vdev.h | 1 -
  15 files changed, 14 insertions(+), 22 deletions(-)



Apologies for the delay in responding. I am OK with the dpaa2
change, and otherwise as well:


Acked-by: Shreyansh Jain 



[dpdk-dev] [PATCH] dpdk: remove typos using codespell utility

2017-06-06 Thread Jerin Jacob
Fixing typos across the DPDK source code using the codespell utility.
Skipped the ethdev drivers' base code fixes to keep the base
code intact.

Signed-off-by: Jerin Jacob 
---
- This is not a completely automatic process. The tool can do 90% of the job,
but the changes need to be cross-checked manually.
- The patchset does not create any new checkpatch errors. This patch only
fixes the typos, not the existing checkpatch issues in the code.
---
 app/test-pmd/testpmd.c |  2 +-
 devtools/cocci/mtod-offset.cocci   |  2 +-
 devtools/validate-abi.sh   |  4 +--
 doc/guides/nics/sfc_efx.rst|  2 +-
 doc/guides/tools/cryptoperf.rst|  6 ++--
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h|  2 +-
 drivers/bus/fslmc/qbman/qbman_portal.c |  4 +--
 drivers/crypto/qat/qat_crypto.c|  2 +-
 drivers/net/ark/ark_ethdev.c   |  2 +-
 drivers/net/bnx2x/bnx2x.c  |  8 ++---
 drivers/net/bnx2x/bnx2x_stats.c| 10 +++---
 drivers/net/bnx2x/bnx2x_vfpf.h |  2 +-
 drivers/net/bnx2x/ecore_hsi.h  | 12 +++
 drivers/net/bnx2x/ecore_init.h |  2 +-
 drivers/net/bnx2x/ecore_sp.c   |  4 +--
 drivers/net/bnx2x/ecore_sp.h   | 22 ++---
 drivers/net/bnx2x/elink.c  | 16 -
 drivers/net/bonding/rte_eth_bond_8023ad.c  |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  4 +--
 drivers/net/bonding/rte_eth_bond_private.h |  2 +-
 drivers/net/cxgbe/cxgbe_main.c |  2 +-
 drivers/net/cxgbe/sge.c|  4 +--
 drivers/net/enic/enic_main.c   |  2 +-
 drivers/net/i40e/i40e_ethdev.c | 10 +++---
 drivers/net/i40e/i40e_ethdev.h |  2 +-
 drivers/net/i40e/i40e_rxtx.c   |  6 ++--
 drivers/net/ixgbe/ixgbe_ethdev.h   |  2 +-
 drivers/net/ixgbe/ixgbe_fdir.c |  2 +-
 drivers/net/nfp/nfp_net.c  |  2 +-
 drivers/net/qede/qede_rxtx.c   |  6 ++--
 drivers/net/ring/rte_eth_ring.c|  2 +-
 drivers/net/sfc/sfc_rx.c   |  2 +-
 drivers/net/tap/rte_eth_tap.c  |  2 +-
 drivers/net/thunderx/nicvf_ethdev.c|  2 +-
 examples/Makefile  |  2 +-
 examples/bond/main.c   |  2 +-
 examples/cmdline/Makefile  |  2 +-
 examples/distributor/Makefile  |  2 +-
 examples/ethtool/ethtool-app/main.c|  2 +-
 examples/ethtool/lib/rte_ethtool.c |  2 +-
 examples/ethtool/lib/rte_ethtool.h |  4 +--
 examples/exception_path/Makefile   |  2 +-
 examples/helloworld/Makefile   |  2 +-
 examples/ip_fragmentation/Makefile |  2 +-
 examples/ip_fragmentation/main.c   |  2 +-
 examples/ip_reassembly/Makefile|  2 +-
 examples/ipv4_multicast/Makefile   |  2 +-
 examples/kni/Makefile  |  2 +-
 examples/l2fwd/Makefile|  2 +-
 examples/l3fwd-acl/Makefile|  2 +-
 examples/l3fwd-power/Makefile  |  2 +-
 examples/l3fwd-vf/Makefile |  2 +-
 examples/l3fwd/Makefile|  2 +-
 examples/l3fwd/l3fwd_sse.h |  2 +-
 examples/link_status_interrupt/Makefile|  2 +-
 examples/load_balancer/Makefile|  2 +-
 examples/multi_process/Makefile|  2 +-
 examples/multi_process/client_server_mp/Makefile   |  2 +-
 .../client_server_mp/mp_client/Makefile|  2 +-
 .../client_server_mp/mp_server/Makefile|  2 +-
 examples/multi_process/l2fwd_fork/Makefile |  2 +-
 examples/multi_process/l2fwd_fork/flib.h   |  2 +-
 examples/multi_process/simple_mp/Makefile  |  2 +-
 examples/multi_process/symmetric_mp/Makefile   |  2 +-
 examples/netmap_compat/Makefile|  2 +-
 examples/netmap_compat/bridge/Makefile |  2 +-
 examples/netmap_compat/lib/compat_netmap.c |  2 +-
 examples/performance-thread/common/lthread_mutex.c |  2 +-
 examples/performance-thread/l3fwd-thread/main.c|  2 +-
 examples/qos_meter/Makefile|  2 +-
 examples/qos_sched/Makefile|  2 +-
 examples/quota_watermark/Makefile  |  2 +-
 examples/quota_watermark/qw/Makefile   |  2 +-
 examples/quota_watermark/qwctl/Makefile|  2 +-
 examples/timer/Makefile|  2 +-
 examples/vhost/Makefil

Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function

2017-06-06 Thread Lu, Wenzhuo
Hi Beilei,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Thursday, June 1, 2017 2:57 PM
> To: Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function
> 
> This patch adds NVGRE parsing function to support NVGRE classification.
> 
> Signed-off-by: Beilei Xing 
> ---
>  drivers/net/i40e/i40e_flow.c | 271
> ++-
>  1 file changed, 269 insertions(+), 2 deletions(-)

> 
>  /* 1. Last in item should be NULL as range is not supported.
> + * 2. Supported filter types: IMAC_IVLAN_TENID, IMAC_IVLAN,
> + *IMAC_TENID, OMAC_TENID_IMAC and IMAC.
> + * 3. Mask of fields which need to be matched should be
> + *filled with 1.
> + * 4. Mask of fields which needn't to be matched should be
> + *filled with 0.
> + */
> +static int
> +i40e_flow_parse_nvgre_pattern(__rte_unused struct rte_eth_dev *dev,
> +   const struct rte_flow_item *pattern,
> +   struct rte_flow_error *error,
> +   struct i40e_tunnel_filter_conf *filter) {
> + const struct rte_flow_item *item = pattern;
> + const struct rte_flow_item_eth *eth_spec;
> + const struct rte_flow_item_eth *eth_mask;
> + const struct rte_flow_item_nvgre *nvgre_spec;
> + const struct rte_flow_item_nvgre *nvgre_mask;
> + const struct rte_flow_item_vlan *vlan_spec;
> + const struct rte_flow_item_vlan *vlan_mask;
> + enum rte_flow_item_type item_type;
> + uint8_t filter_type = 0;
> + bool is_tni_masked = 0;
> + uint8_t tni_mask[] = {0xFF, 0xFF, 0xFF};
> + bool nvgre_flag = 0;
> + uint32_t tenant_id_be = 0;
> + int ret;
> +
> + for (; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
> + if (item->last) {
> + rte_flow_error_set(error, EINVAL,
> +RTE_FLOW_ERROR_TYPE_ITEM,
> +item,
> +"Not support range");
> + return -rte_errno;
> + }
> + item_type = item->type;
> + switch (item_type) {
> + case RTE_FLOW_ITEM_TYPE_ETH:
> + eth_spec = (const struct rte_flow_item_eth *)item-
> >spec;
> + eth_mask = (const struct rte_flow_item_eth *)item-
> >mask;
> + if ((!eth_spec && eth_mask) ||
> + (eth_spec && !eth_mask)) {
> + rte_flow_error_set(error, EINVAL,
> +
> RTE_FLOW_ERROR_TYPE_ITEM,
> +item,
> +"Invalid ether spec/mask");
> + return -rte_errno;
> + }
> +
> + if (eth_spec && eth_mask) {
> + /* DST address of inner MAC shouldn't be
> masked.
> +  * SRC address of Inner MAC should be
> masked.
> +  */
> + if (!is_broadcast_ether_addr(ð_mask->dst)
> ||
> + !is_zero_ether_addr(ð_mask->src) ||
> + eth_mask->type) {
> + rte_flow_error_set(error, EINVAL,
> +
> RTE_FLOW_ERROR_TYPE_ITEM,
> +item,
> +"Invalid ether spec/mask");
> + return -rte_errno;
> + }
> +
> + if (!nvgre_flag) {
> + rte_memcpy(&filter->outer_mac,
> +ð_spec->dst,
> +ETHER_ADDR_LEN);
> + filter_type |=
> ETH_TUNNEL_FILTER_OMAC;
> + } else {
> + rte_memcpy(&filter->inner_mac,
> +ð_spec->dst,
> +ETHER_ADDR_LEN);
> + filter_type |=
> ETH_TUNNEL_FILTER_IMAC;
> + }
> + }
There is nothing to do if both spec and mask are NULL, right? If so, would you
like to add a comment here?

> +
> + break;
> + case RTE_FLOW_ITEM_TYPE_VLAN:
> + vlan_spec =
> + (const struct rte_flow_item_vlan *)item-
> >spec;
> + vlan_mask =
> + (const struct rte_flow_item_vlan *)item-
> >mask;
> + if (nvgre_flag) {
Why need to check nvgre_flag? Seems VLAN must be after NVGRE, so this flag is 
always 1.

> + if (!(vlan_spec && vlan_mask)) {
>

Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function

2017-06-06 Thread Xing, Beilei


> -Original Message-
> From: Lu, Wenzhuo
> Sent: Wednesday, June 7, 2017 1:46 PM
> To: Xing, Beilei ; Wu, Jingjing 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> function
> 
> Hi Beilei,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> > Sent: Thursday, June 1, 2017 2:57 PM
> > To: Wu, Jingjing
> > Cc: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function
> >
> > This patch adds NVGRE parsing function to support NVGRE classification.
> >
> > Signed-off-by: Beilei Xing 
> > ---
> >  drivers/net/i40e/i40e_flow.c | 271
> > ++-
> >  1 file changed, 269 insertions(+), 2 deletions(-)
> 
> >
> >  /* 1. Last in item should be NULL as range is not supported.
> > + * 2. Supported filter types: IMAC_IVLAN_TENID, IMAC_IVLAN,
> > + *IMAC_TENID, OMAC_TENID_IMAC and IMAC.
> > + * 3. Mask of fields which need to be matched should be
> > + *filled with 1.
> > + * 4. Mask of fields which needn't to be matched should be
> > + *filled with 0.
> > + */
> > +static int
> > +i40e_flow_parse_nvgre_pattern(__rte_unused struct rte_eth_dev *dev,
> > + const struct rte_flow_item *pattern,
> > + struct rte_flow_error *error,
> > + struct i40e_tunnel_filter_conf *filter) {
> > +   const struct rte_flow_item *item = pattern;
> > +   const struct rte_flow_item_eth *eth_spec;
> > +   const struct rte_flow_item_eth *eth_mask;
> > +   const struct rte_flow_item_nvgre *nvgre_spec;
> > +   const struct rte_flow_item_nvgre *nvgre_mask;
> > +   const struct rte_flow_item_vlan *vlan_spec;
> > +   const struct rte_flow_item_vlan *vlan_mask;
> > +   enum rte_flow_item_type item_type;
> > +   uint8_t filter_type = 0;
> > +   bool is_tni_masked = 0;
> > +   uint8_t tni_mask[] = {0xFF, 0xFF, 0xFF};
> > +   bool nvgre_flag = 0;
> > +   uint32_t tenant_id_be = 0;
> > +   int ret;
> > +
> > +   for (; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
> > +   if (item->last) {
> > +   rte_flow_error_set(error, EINVAL,
> > +  RTE_FLOW_ERROR_TYPE_ITEM,
> > +  item,
> > +  "Not support range");
> > +   return -rte_errno;
> > +   }
> > +   item_type = item->type;
> > +   switch (item_type) {
> > +   case RTE_FLOW_ITEM_TYPE_ETH:
> > +   eth_spec = (const struct rte_flow_item_eth *)item-
> > >spec;
> > +   eth_mask = (const struct rte_flow_item_eth *)item-
> > >mask;
> > +   if ((!eth_spec && eth_mask) ||
> > +   (eth_spec && !eth_mask)) {
> > +   rte_flow_error_set(error, EINVAL,
> > +
> > RTE_FLOW_ERROR_TYPE_ITEM,
> > +  item,
> > +  "Invalid ether spec/mask");
> > +   return -rte_errno;
> > +   }
> > +
> > +   if (eth_spec && eth_mask) {
> > +   /* DST address of inner MAC shouldn't be
> > masked.
> > +* SRC address of Inner MAC should be
> > masked.
> > +*/
> > +   if (!is_broadcast_ether_addr(ð_mask-
> >dst)
> > ||
> > +   !is_zero_ether_addr(ð_mask->src) ||
> > +   eth_mask->type) {
> > +   rte_flow_error_set(error, EINVAL,
> > +
> > RTE_FLOW_ERROR_TYPE_ITEM,
> > +  item,
> > +  "Invalid ether spec/mask");
> > +   return -rte_errno;
> > +   }
> > +
> > +   if (!nvgre_flag) {
> > +   rte_memcpy(&filter->outer_mac,
> > +  ð_spec->dst,
> > +  ETHER_ADDR_LEN);
> > +   filter_type |=
> > ETH_TUNNEL_FILTER_OMAC;
> > +   } else {
> > +   rte_memcpy(&filter->inner_mac,
> > +  ð_spec->dst,
> > +  ETHER_ADDR_LEN);
> > +   filter_type |=
> > ETH_TUNNEL_FILTER_IMAC;
> > +   }
> > +   }
> Nothing to do if both spec and mask are NULL, right? If so, would you like to
> add comments here?

OK. Will update in v2.

> 
> > +
> > +   break;
> > +   case RTE_FLOW_ITEM_TYPE_VLAN:
> > +   vlan_spec =
> > +   (const struct rte_flow

Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function

2017-06-06 Thread Lu, Wenzhuo
> -Original Message-
> From: Xing, Beilei
> Sent: Wednesday, June 7, 2017 2:07 PM
> To: Lu, Wenzhuo; Wu, Jingjing
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function
> 
> 
> 
> > -Original Message-
> > From: Lu, Wenzhuo
> > Sent: Wednesday, June 7, 2017 1:46 PM
> > To: Xing, Beilei ; Wu, Jingjing
> > 
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> > function
> >
> > Hi Beilei,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> > > Sent: Thursday, June 1, 2017 2:57 PM
> > > To: Wu, Jingjing
> > > Cc: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> > > function
> > >
> > > This patch adds NVGRE parsing function to support NVGRE classification.
> > >
> > > Signed-off-by: Beilei Xing 
> > > ---
> > >  drivers/net/i40e/i40e_flow.c | 271
> > > ++-
> > >  1 file changed, 269 insertions(+), 2 deletions(-)

> 
> >
> > > +
> > > + break;
> > > + case RTE_FLOW_ITEM_TYPE_VLAN:
> > > + vlan_spec =
> > > + (const struct rte_flow_item_vlan *)item-
> > > >spec;
> > > + vlan_mask =
> > > + (const struct rte_flow_item_vlan *)item-
> > > >mask;
> > > + if (nvgre_flag) {
> > Why need to check nvgre_flag? Seems VLAN must be after NVGRE, so this
> > flag is always 1.
> 
> It's used to  distinguish outer mac or inner mac.
I know you need to add this flag for MAC. But I'm talking about VLAN. There's
only an inner VLAN, so the flag seems useless here.


Re: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing function

2017-06-06 Thread Xing, Beilei
> -Original Message-
> From: Lu, Wenzhuo
> Sent: Wednesday, June 7, 2017 2:12 PM
> To: Xing, Beilei ; Wu, Jingjing 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> function
> 
> > -Original Message-
> > From: Xing, Beilei
> > Sent: Wednesday, June 7, 2017 2:07 PM
> > To: Lu, Wenzhuo; Wu, Jingjing
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> > function
> >
> >
> >
> > > -Original Message-
> > > From: Lu, Wenzhuo
> > > Sent: Wednesday, June 7, 2017 1:46 PM
> > > To: Xing, Beilei ; Wu, Jingjing
> > > 
> > > Cc: dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> > > function
> > >
> > > Hi Beilei,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> > > > Sent: Thursday, June 1, 2017 2:57 PM
> > > > To: Wu, Jingjing
> > > > Cc: dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v2 2/2] net/i40e: add NVGRE parsing
> > > > function
> > > >
> > > > This patch adds NVGRE parsing function to support NVGRE classification.
> > > >
> > > > Signed-off-by: Beilei Xing 
> > > > ---
> > > >  drivers/net/i40e/i40e_flow.c | 271
> > > > ++-
> > > >  1 file changed, 269 insertions(+), 2 deletions(-)
> 
> >
> > >
> > > > +
> > > > +   break;
> > > > +   case RTE_FLOW_ITEM_TYPE_VLAN:
> > > > +   vlan_spec =
> > > > +   (const struct rte_flow_item_vlan *)item-
> > > > >spec;
> > > > +   vlan_mask =
> > > > +   (const struct rte_flow_item_vlan *)item-
> > > > >mask;
> > > > +   if (nvgre_flag) {
> > > Why need to check nvgre_flag? Seems VLAN must be after NVGRE, so
> > > this flag is always 1.
> >
> > It's used to  distinguish outer mac or inner mac.
> I know you need to add this flag for MAC. But I'm talking about VLAN. There's
> only inner VLAN. So, seems it's useless here.

Oh yes, sorry for the misunderstanding. The outer VLAN is not supported here, so
the flag can be removed. Will update in the next version.


[dpdk-dev] [PATCH 1/2] vhost: fix TCP csum not set

2017-06-06 Thread Jianfeng Tan
As the PKT_TX_TCP_SEG flag in mbuf->ol_flags implies PKT_TX_TCP_CKSUM,
applications, e.g. testpmd, don't set PKT_TX_TCP_CKSUM when TSO
is set.

This leads to packets being dropped in the VM TCP stack because
of a bad TCP csum.

To fix this, we make sure the TCP NEEDS_CSUM info is set in the virtio net
header when PKT_TX_TCP_SEG is set, so that the VM TCP stack will not
check the TCP csum.
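For context, a minimal sketch of the application-side TSO setup that triggers
this; the flags and mbuf fields are the existing offload ones, while the helper
itself is only illustrative:

#include <stdint.h>
#include <rte_mbuf.h>

/* Typical TSO setup, as done e.g. by testpmd's csum engine:
 * PKT_TX_TCP_SEG is set while PKT_TX_TCP_CKSUM is not, since TSO implies it.
 */
static void
prepare_tso_mbuf(struct rte_mbuf *m, uint16_t l2_len, uint16_t l3_len,
		 uint16_t l4_len, uint16_t mss)
{
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = mss;
	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
	/* note: no PKT_TX_TCP_CKSUM here, hence the fix below */
}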

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: sta...@dpdk.org

Cc: Yuanhan Liu 
Cc: Maxime Coquelin 
Cc: Jiayu Hu 
Signed-off-by: Jianfeng Tan 
---
 lib/librte_vhost/virtio_net.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 48219e0..0a7e023 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -114,11 +114,16 @@ update_shadow_used_ring(struct vhost_virtqueue *vq,
 static void
 virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
 {
-   if (m_buf->ol_flags & PKT_TX_L4_MASK) {
+   uint64_t csum_l4 = m_buf->ol_flags & PKT_TX_L4_MASK;
+
+   if (m_buf->ol_flags & PKT_TX_TCP_SEG)
+   csum_l4 |= PKT_TX_TCP_CKSUM;
+
+   if (csum_l4) {
net_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
net_hdr->csum_start = m_buf->l2_len + m_buf->l3_len;
 
-   switch (m_buf->ol_flags & PKT_TX_L4_MASK) {
+   switch (csum_l4) {
case PKT_TX_TCP_CKSUM:
net_hdr->csum_offset = (offsetof(struct tcp_hdr,
cksum));
-- 
2.7.4



[dpdk-dev] [PATCH 0/2] fix vhost enqueue offload

2017-06-06 Thread Jianfeng Tan
Patch 1: fix TCP csum not set.
Patch 2: fix IP csum not calculated.

This series makes sure TCP packets (phy NIC -> vhost -> virtio NIC)
can be correctly received by the VM, with phy NIC LRO enabled or software GRO
enabled.

The example setup is:
  ixgbe (LRO enabled) <-> vhost <-> virtio, ixgbe and vhost are driven by
testpmd csum forward engine.

Jianfeng Tan (2):
  vhost: fix TCP csum not set
  vhost: fix IP csum not calculated

 lib/librte_vhost/virtio_net.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 2/2] vhost: fix IP csum not calculated

2017-06-06 Thread Jianfeng Tan
There is no way to bypass IP checksum verification in the Linux
kernel, no matter whether skb->ip_summed is assigned CHECKSUM_UNNECESSARY
or CHECKSUM_PARTIAL.

So any packet with a bad IP checksum will be dropped at the VM IP layer.

To correct this, we check the PKT_TX_IP_CKSUM flag and, when it is set,
calculate the IP csum.

Fixes: 859b480d5afd ("vhost: add guest offload setting")
Cc: sta...@dpdk.org

Cc: Yuanhan Liu 
Cc: Maxime Coquelin 
Cc: Jiayu Hu 
Signed-off-by: Jianfeng Tan 
---
 lib/librte_vhost/virtio_net.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 0a7e023..cf7c5ac 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -143,6 +143,15 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct 
virtio_net_hdr *net_hdr)
ASSIGN_UNLESS_EQUAL(net_hdr->flags, 0);
}
 
+   /* IP cksum verification cannot be bypassed, then calculate here */
+   if (m_buf->ol_flags & PKT_TX_IP_CKSUM) {
+   struct ipv4_hdr *ipv4_hdr;
+
+   ipv4_hdr = rte_pktmbuf_mtod_offset(m_buf, struct ipv4_hdr *,
+  m_buf->l2_len);
+   ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
+   }
+
if (m_buf->ol_flags & PKT_TX_TCP_SEG) {
if (m_buf->ol_flags & PKT_TX_IPV4)
net_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
-- 
2.7.4



[dpdk-dev] [PATCH v3 0/2] net/i40e: extend tunnel filter support

2017-06-06 Thread Beilei Xing
This patchset extends tunnel filter support with a VXLAN parsing function
optimization and an NVGRE parsing function.

v2 changes:
 - Add vxlan parsing function optimization.
 - Optimize NVGRE parsing function.

v3 changes:
 - Polish commit log.
 - Delete redundant if statements.

Beilei Xing (2):
  net/i40e: optimize vxlan parsing function
  net/i40e: add NVGRE parsing function

 drivers/net/i40e/i40e_flow.c | 445 +++
 1 file changed, 319 insertions(+), 126 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH v3 2/2] net/i40e: add NVGRE parsing function

2017-06-06 Thread Beilei Xing
This patch adds NVGRE parsing function to support NVGRE
classification.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 267 ++-
 1 file changed, 265 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index b4ba555..fab4a0d 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -114,6 +114,12 @@ static int i40e_flow_parse_vxlan_filter(struct rte_eth_dev 
*dev,
const struct rte_flow_action actions[],
struct rte_flow_error *error,
union i40e_filter_t *filter);
+static int i40e_flow_parse_nvgre_filter(struct rte_eth_dev *dev,
+   const struct rte_flow_attr *attr,
+   const struct rte_flow_item pattern[],
+   const struct rte_flow_action actions[],
+   struct rte_flow_error *error,
+   union i40e_filter_t *filter);
 static int i40e_flow_parse_mpls_filter(struct rte_eth_dev *dev,
   const struct rte_flow_attr *attr,
   const struct rte_flow_item pattern[],
@@ -296,7 +302,40 @@ static enum rte_flow_item_type pattern_vxlan_4[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
-/* Pattern matched MPLS */
+static enum rte_flow_item_type pattern_nvgre_1[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_NVGRE,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_nvgre_2[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_NVGRE,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_nvgre_3[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV4,
+   RTE_FLOW_ITEM_TYPE_NVGRE,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_VLAN,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
+static enum rte_flow_item_type pattern_nvgre_4[] = {
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_IPV6,
+   RTE_FLOW_ITEM_TYPE_NVGRE,
+   RTE_FLOW_ITEM_TYPE_ETH,
+   RTE_FLOW_ITEM_TYPE_VLAN,
+   RTE_FLOW_ITEM_TYPE_END,
+};
+
 static enum rte_flow_item_type pattern_mpls_1[] = {
RTE_FLOW_ITEM_TYPE_ETH,
RTE_FLOW_ITEM_TYPE_IPV4,
@@ -329,7 +368,6 @@ static enum rte_flow_item_type pattern_mpls_4[] = {
RTE_FLOW_ITEM_TYPE_END,
 };
 
-/* Pattern matched QINQ */
 static enum rte_flow_item_type pattern_qinq_1[] = {
RTE_FLOW_ITEM_TYPE_ETH,
RTE_FLOW_ITEM_TYPE_VLAN,
@@ -362,6 +400,11 @@ static struct i40e_valid_pattern i40e_supported_patterns[] 
= {
{ pattern_vxlan_2, i40e_flow_parse_vxlan_filter },
{ pattern_vxlan_3, i40e_flow_parse_vxlan_filter },
{ pattern_vxlan_4, i40e_flow_parse_vxlan_filter },
+   /* NVGRE */
+   { pattern_nvgre_1, i40e_flow_parse_nvgre_filter },
+   { pattern_nvgre_2, i40e_flow_parse_nvgre_filter },
+   { pattern_nvgre_3, i40e_flow_parse_nvgre_filter },
+   { pattern_nvgre_4, i40e_flow_parse_nvgre_filter },
/* MPLSoUDP & MPLSoGRE */
{ pattern_mpls_1, i40e_flow_parse_mpls_filter },
{ pattern_mpls_2, i40e_flow_parse_mpls_filter },
@@ -1525,6 +1568,226 @@ i40e_flow_parse_vxlan_filter(struct rte_eth_dev *dev,
 }
 
 /* 1. Last in item should be NULL as range is not supported.
+ * 2. Supported filter types: IMAC_IVLAN_TENID, IMAC_IVLAN,
+ *IMAC_TENID, OMAC_TENID_IMAC and IMAC.
+ * 3. Mask of fields which need to be matched should be
+ *filled with 1.
+ * 4. Mask of fields which needn't to be matched should be
+ *filled with 0.
+ */
+static int
+i40e_flow_parse_nvgre_pattern(__rte_unused struct rte_eth_dev *dev,
+ const struct rte_flow_item *pattern,
+ struct rte_flow_error *error,
+ struct i40e_tunnel_filter_conf *filter)
+{
+   const struct rte_flow_item *item = pattern;
+   const struct rte_flow_item_eth *eth_spec;
+   const struct rte_flow_item_eth *eth_mask;
+   const struct rte_flow_item_nvgre *nvgre_spec;
+   const struct rte_flow_item_nvgre *nvgre_mask;
+   const struct rte_flow_item_vlan *vlan_spec;
+   const struct rte_flow_item_vlan *vlan_mask;
+   enum rte_flow_item_type item_type;
+   uint8_t filter_type = 0;
+   bool is_tni_masked = 0;
+   uint8_t tni_mask[] = {0xFF, 0xFF, 0xFF};
+   bool nvgre_flag = 0;
+   uint32_t tenant_id_be = 0;
+   int ret;
+
+   for (; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
+   if (item->last) {
+   rte_flow_error_set(error, EINVAL,
+ 

Re: [dpdk-dev] [PATCH v2 01/11] bus: add bus iterator to find a particular bus

2017-06-06 Thread Shreyansh Jain

Hello Gaetan,

On Wednesday 31 May 2017 06:47 PM, Gaetan Rivet wrote:

From: Jan Blunck 

Signed-off-by: Jan Blunck 
Signed-off-by: Gaetan Rivet 
---
  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
  lib/librte_eal/common/eal_common_bus.c  | 13 ++
  lib/librte_eal/common/include/rte_bus.h | 32 +
  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
  4 files changed, 47 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 2e48a73..ed09ab2 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -162,6 +162,7 @@ DPDK_17.02 {
  DPDK_17.05 {
global:
  
+	rte_bus_find;

rte_cpu_is_supported;
rte_log_dump;
rte_log_register;
diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 8f9baf8..68f70d0 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -145,3 +145,16 @@ rte_bus_dump(FILE *f)
}
}
  }
+
+struct rte_bus *
+rte_bus_find(rte_bus_match_t match, const void *data)
+{
+   struct rte_bus *bus = NULL;
+
+   TAILQ_FOREACH(bus, &rte_bus_list, next) {
+   if (match(bus, data))
+   break;
+   }
+
+   return bus;
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index 7c36969..006feca 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -141,6 +141,38 @@ int rte_bus_probe(void);
  void rte_bus_dump(FILE *f);
  
  /**

+ * Bus match function.
+ *
+ * @param bus
+ * bus under test.
+ *
+ * @param data
+ * data matched
+ *
+ * @return
+ * 0 if the bus does not match.
+ * !0 if the bus matches.


One common match function implementation could simply be to match
a string, and strcmp itself returns '0' for a successful match.
Along the same lines, should this function's return value convention be reversed?
-
0 if match
!0 if not a match
-
That way, people would not have to invert the way strcmp works, for example,
or go against the way various APIs treat '0' as success. (A small sketch
after this comment illustrates the difference.)

same for rte_device_match_t as well. (in next patch)
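A small sketch of the two conventions against the rte_bus_find() prototype
proposed above (the bus 'name' field is assumed here for the comparison):

#include <string.h>
#include <rte_bus.h>

/* With the proposed "!0 means match" convention, every strcmp-based
 * callback has to invert the result:
 */
static int
bus_name_match(const struct rte_bus *bus, const void *name)
{
	return strcmp(bus->name, (const char *)name) == 0;
}

/* With a "0 means match" convention the callback body could simply be
 * "return strcmp(bus->name, (const char *)name);" and usage would stay
 * the same either way:
 *	struct rte_bus *bus = rte_bus_find(bus_name_match, "pci");
 */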


+ */
+typedef int (*rte_bus_match_t)(const struct rte_bus *bus, const void *data);
+
+/**
+ * Bus iterator to find a particular bus.
+ *
+ * If the callback returns non-zero this function will stop iterating over
+ * any more buses.
+ *
+ * @param match
+ *  Callback function to check bus
+ *
+ * @param data
+ *  Data to pass to match callback
+ *
+ * @return
+ *  A pointer to a rte_bus structure or NULL in case no bus matches
+ */
+struct rte_bus *rte_bus_find(rte_bus_match_t match, const void *data);
+
+/**
   * Helper for Bus registration.
   * The constructor has higher priority than PMD constructors.
   */
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index 670bab3..6efa517 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -166,6 +166,7 @@ DPDK_17.02 {
  DPDK_17.05 {
global:
  
+	rte_bus_find;

rte_cpu_is_supported;
rte_intr_free_epoll_fd;
rte_log_dump;



[dpdk-dev] [PATCH v3 1/2] net/i40e: optimize vxlan parsing function

2017-06-06 Thread Beilei Xing
The current VXLAN parsing function is not easy to read when parsing
the filter type; this patch optimizes the function and makes it more
readable.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_flow.c | 196 ++-
 1 file changed, 63 insertions(+), 133 deletions(-)

diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 37b55e7..b4ba555 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -1268,27 +1268,27 @@ i40e_flow_parse_tunnel_action(struct rte_eth_dev *dev,
return 0;
 }
 
+static uint16_t i40e_supported_tunnel_filter_types[] = {
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_TENID |
+   ETH_TUNNEL_FILTER_IVLAN,
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_IVLAN,
+   ETH_TUNNEL_FILTER_IMAC | ETH_TUNNEL_FILTER_TENID,
+   ETH_TUNNEL_FILTER_OMAC | ETH_TUNNEL_FILTER_TENID |
+   ETH_TUNNEL_FILTER_IMAC,
+   ETH_TUNNEL_FILTER_IMAC,
+};
+
 static int
-i40e_check_tenant_id_mask(const uint8_t *mask)
+i40e_check_tunnel_filter_type(uint8_t filter_type)
 {
-   uint32_t j;
-   int is_masked = 0;
-
-   for (j = 0; j < I40E_TENANT_ARRAY_NUM; j++) {
-   if (*(mask + j) == UINT8_MAX) {
-   if (j > 0 && (*(mask + j) != *(mask + j - 1)))
-   return -EINVAL;
-   is_masked = 0;
-   } else if (*(mask + j) == 0) {
-   if (j > 0 && (*(mask + j) != *(mask + j - 1)))
-   return -EINVAL;
-   is_masked = 1;
-   } else {
-   return -EINVAL;
-   }
+   uint8_t i;
+
+   for (i = 0; i < RTE_DIM(i40e_supported_tunnel_filter_types); i++) {
+   if (filter_type == i40e_supported_tunnel_filter_types[i])
+   return 0;
}
 
-   return is_masked;
+   return -1;
 }
 
 /* 1. Last in item should be NULL as range is not supported.
@@ -1308,18 +1308,17 @@ i40e_flow_parse_vxlan_pattern(__rte_unused struct 
rte_eth_dev *dev,
const struct rte_flow_item *item = pattern;
const struct rte_flow_item_eth *eth_spec;
const struct rte_flow_item_eth *eth_mask;
-   const struct rte_flow_item_eth *o_eth_spec = NULL;
-   const struct rte_flow_item_eth *o_eth_mask = NULL;
-   const struct rte_flow_item_vxlan *vxlan_spec = NULL;
-   const struct rte_flow_item_vxlan *vxlan_mask = NULL;
-   const struct rte_flow_item_eth *i_eth_spec = NULL;
-   const struct rte_flow_item_eth *i_eth_mask = NULL;
-   const struct rte_flow_item_vlan *vlan_spec = NULL;
-   const struct rte_flow_item_vlan *vlan_mask = NULL;
+   const struct rte_flow_item_vxlan *vxlan_spec;
+   const struct rte_flow_item_vxlan *vxlan_mask;
+   const struct rte_flow_item_vlan *vlan_spec;
+   const struct rte_flow_item_vlan *vlan_mask;
+   uint8_t filter_type = 0;
bool is_vni_masked = 0;
+   uint8_t vni_mask[] = {0xFF, 0xFF, 0xFF};
enum rte_flow_item_type item_type;
bool vxlan_flag = 0;
uint32_t tenant_id_be = 0;
+   int ret;
 
for (; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
if (item->last) {
@@ -1334,6 +1333,11 @@ i40e_flow_parse_vxlan_pattern(__rte_unused struct 
rte_eth_dev *dev,
case RTE_FLOW_ITEM_TYPE_ETH:
eth_spec = (const struct rte_flow_item_eth *)item->spec;
eth_mask = (const struct rte_flow_item_eth *)item->mask;
+
+   /* Check if ETH item is used for place holder.
+* If yes, both spec and mask should be NULL.
+* If no, both spec and mask shouldn't be NULL.
+*/
if ((!eth_spec && eth_mask) ||
(eth_spec && !eth_mask)) {
rte_flow_error_set(error, EINVAL,
@@ -1357,50 +1361,40 @@ i40e_flow_parse_vxlan_pattern(__rte_unused struct 
rte_eth_dev *dev,
return -rte_errno;
}
 
-   if (!vxlan_flag)
+   if (!vxlan_flag) {
rte_memcpy(&filter->outer_mac,
   ð_spec->dst,
   ETHER_ADDR_LEN);
-   else
+   filter_type |= ETH_TUNNEL_FILTER_OMAC;
+   } else {
rte_memcpy(&filter->inner_mac,
   ð_spec->dst,
   ETHER_ADDR_LEN);
+   filter_type |= ETH_TUNNEL_FILTER_IMAC;
+   }
}
-