Re: [dpdk-dev] [PATCH v4] app/testpmd: enable the heavyweight mode TCP/IPv4 GRO

2017-09-27 Thread Yao, Lei A


> -Original Message-
> From: Hu, Jiayu
> Sent: Tuesday, September 26, 2017 2:27 PM
> To: dev@dpdk.org
> Cc: Yigit, Ferruh ; Tan, Jianfeng
> ; Ananyev, Konstantin
> ; tho...@monjalon.net; Wu, Jingjing
> ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v4] app/testpmd: enable the heavyweight mode TCP/IPv4
> GRO
> 
> The GRO library provides two modes to reassemble packets. Currently, the
> csum forwarding engine has supported to use the lightweight mode to
> reassemble TCP/IPv4 packets. This patch introduces the heavyweight mode
> for TCP/IPv4 GRO in the csum forwarding engine.
> 
> With the command "set port  gro on|off", users can enable
> TCP/IPv4 GRO for a given port. With the command "set gro flush ",
> users can determine when the GROed TCP/IPv4 packets are flushed from
> reassembly tables. With the command "show port  gro", users can
> display GRO configuration.
> 
> The GRO library doesn't re-calculate checksums for merged packets. If
> users want the merged packets to have correct IP and TCP checksums,
> please select HW IP checksum calculation and HW TCP checksum calculation
> for the port which the merged packets are transmitted to.
> 
> Signed-off-by: Jiayu Hu 
> Reviewed-by: Ferruh Yigit 
Tested-by: Yao Lei

This patch has beed tested on my bench. The following
is the performance data got from iperf test with single flow
No GRO: 9.5 Gbps
Kernel GRO: 13.6 Gbps
DPDK GRO with flush cycle=1 : 25.9 Gbps
DPDK GRO with flush cycle=2 : 27.9 Gbps

Note: When use DPDK GRO with flush cycle=2, I set 
the vhost rx_queue_size to 1024, if use default number
256, sometimes I met the date stall. 
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz  

> ---
> changes in v4:
> - fix unchecking the min value of 'cycle' bug in setup_gro_flush_cycles
> - update the context of the testpmd document and commit logs
> changes in v3:
> - remove "heavyweight mode" and "lightweight mode" from GRO
> commands
> - combine two patches into one
> - use consistent help string for GRO commands
> - remove the unnecessary command "gro set (max_flow_num)
>   (max_item_num_per_flow) (port_id)"
> changes in v2:
> - use "set" and "show" as the root level command
> - add a new command to show GRO configuration
> - fix l2_len/l3_len/l4_len unset etc. bugs
> 
>  app/test-pmd/cmdline.c  | 206 
> 
>  app/test-pmd/config.c   |  68 +++--
>  app/test-pmd/csumonly.c |  31 -
>  app/test-pmd/testpmd.c  |  19 ++-
>  app/test-pmd/testpmd.h  |  16 ++-
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  50 +--
>  6 files changed, 270 insertions(+), 120 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index ccdf239..e44c02e 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -423,13 +423,16 @@ static void cmd_help_long_parsed(void
> *parsed_result,
>   "tso show (portid)"
>   "Display the status of TCP Segmentation
> Offload.\n\n"
> 
> - "gro (on|off) (port_id)"
> + "set port (port_id) gro on|off\n"
>   "Enable or disable Generic Receive Offload in"
>   " csum forwarding engine.\n\n"
> 
> - "gro set (max_flow_num)
> (max_item_num_per_flow) (port_id)\n"
> - "Set max flow number and max packet number
> per-flow"
> - " for GRO.\n\n"
> + "show port (port_id) gro\n"
> + "Display GRO configuration.\n\n"
> +
> + "set gro flush (cycles)\n"
> + "Set the cycle to flush GROed packets from"
> + " reassembly tables.\n\n"
> 
>   "set fwd (%s)\n"
>   "Set packet forwarding mode.\n\n"
> @@ -3854,115 +3857,145 @@ cmdline_parse_inst_t cmd_tunnel_tso_show
> = {
>  };
> 
>  /* *** SET GRO FOR A PORT *** */
> -struct cmd_gro_result {
> +struct cmd_gro_enable_result {
> + cmdline_fixed_string_t cmd_set;
> + cmdline_fixed_string_t cmd_port;
>   cmdline_fixed_string_t cmd_keyword;
> - cmdline_fixed_string_t mode;
> - uint8_t port_id;
> + cmdline_fixed_string_t cmd_onoff;
> + uint8_t cmd_pid;
>  };
> 
>  static void
> -cmd_enable_gro_parsed(void *parsed_result,
> +cmd_gro_enable_parsed(void *parsed_result,
>   __attribute__((unused)) struct cmdline *cl,
>   __attribute__((unused)) void *data)
>  {
> - struct cmd_gro_result *res;
> + struct cmd_gro_enable_result *res;
> 
>   res = parsed_result;
> - setup_gro(res->mode, res->port_id);
> -}
> -
> -cmdline_parse_token_string_t cmd_gro_keyword =
> - TOKEN_STRING_INITIALIZER(struct cmd_gro_result,
> + if (!strcmp(res->cmd_keyword, "gro"))
> + setup_gro(res->cmd_onoff, res->cmd_pid);
> +}
> +
> +cm

Re: [dpdk-dev] [PATCH 1/8] app: link the whole rte_cfgfile library

2017-09-27 Thread Tomasz Duszynski
On Tue, Sep 26, 2017 at 03:31:18PM +0100, Bruce Richardson wrote:
> On Tue, Sep 26, 2017 at 11:39:58AM +0200, Tomasz Duszynski wrote:
> > Since MRVL NET PMD needs librte_cfgfile to parse QoS configuration file
> > link it as the whole library.
> >
> > Signed-off-by: Jacek Siuda 
> > Signed-off-by: Tomasz Duszynski 
> > ---
>
> Can you clarify a bit more why this is needed? For a static build, the
> cfgfile should be linked in to the after the PMDs, so the dependencies
> in the driver should be satisfied in the link. For a dynamic build, the
> PMD should depend upon the cfgfile directly, and use it at runtime
> appropriately.
> Am I missing something?
>
> /Bruce
Hi Bruce,

You are correct, all dependencies in the driver will be satisfied.
The reason this change was introduced is that, librte_pmd_mrvl.a
contains undefined symbols from librte_cfgfile.a thus linking
applications under app/ directory will fail as librte_cfgfile.a comes
before librte_pmd_mrvl.a during the linking stage.

Linking librte_cfgfile.a with --whole-archive solves the issue.

--
- Tomasz Duszyński


[dpdk-dev] DPDK Summit Userspace Live Stream - Day 2

2017-09-27 Thread St Leger, Jim
We've made some improvements based on your feedback. Join us online.
https://www.periscope.tv/w/1dRKZnleWbmKB



Re: [dpdk-dev] [PATCH v4 41/41] net/dpaa: support for extended statistics

2017-09-27 Thread Shreyansh Jain

On Thursday 21 September 2017 06:56 PM, Shreyansh Jain wrote:

On Monday 18 September 2017 08:27 PM, Ferruh Yigit wrote:

On 9/9/2017 12:21 PM, Shreyansh Jain wrote:

From: Hemant Agrawal 

Signed-off-by: Hemant Agrawal 


<...>


+static int
+dpaa_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat 
*xstats,

+    unsigned int n)
+{
+    struct dpaa_if *dpaa_intf = dev->data->dev_private;
+    unsigned int i = 0, num = RTE_DIM(dpaa_xstats_strings);
+    uint64_t values[sizeof(struct dpaa_if_stats) / 8];
+
+    if (xstats == NULL)
+    return 0;


This is a little not clear from API definition, but I guess when xstats
is NULL, it should return num of available stats, "num" for this case. I
guess there are PMDs implements both, can you please double check?


Ok. I will check again.


I checked a number of other ethev implementations. Some like i40e/e1000 
also return 0 when xstats is NULL. Others, like bnx2x and qede don't 
handle this situation.
All return "num" when passed argument is larger than number of elements 
in the table.


Though, I think the logic that get_xstats should return its size (num) 
when passed with NULL, looks good to me.

How does one standardize such semantics for existing APIs?

(I can add this info to the API document that you created - but only 
once we know if others will agree to change)







+
+    if (n < num)
+    return num;
+
+    fman_if_stats_get_all(dpaa_intf->fif, values,
+  sizeof(struct dpaa_if_stats) / 8);
+
+    for (i = 0; i < num; i++) {
+    xstats[i].id = i;
+    xstats[i].value = values[dpaa_xstats_strings[i].offset / 8];
+    }
+    return i;
+}


<...>








Re: [dpdk-dev] [PATCH v3 21/40] maintainers: claim ownership of DPAA Mempool driver

2017-09-27 Thread Shreyansh Jain

On Friday 22 September 2017 01:05 PM, Thomas Monjalon wrote:

22/09/2017 09:37, Shreyansh Jain:

On Friday 22 September 2017 12:23 PM, Thomas Monjalon wrote:

22/09/2017 08:47, Shreyansh Jain:

On Friday 22 September 2017 03:26 AM, Thomas Monjalon wrote:

23/08/2017 16:11, Shreyansh Jain:

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -409,6 +409,7 @@ NXP dpaa
M: Hemant Agrawal 
M: Shreyansh Jain 
F: drivers/bus/dpaa/
+F: drivers/mempool/dpaa/
F: doc/guides/nics/dpaa.rst
F: doc/guides/nics/features/dpaa.ini


This kind of patch can be squashed in the first patch introducing
this new directory.


Then the patch script (devtools/check-git-log.sh) reports error - I
think. That is the primary reason I split them across multiple patches.
You sure that doesn't matter?


Which error?

To be clear I suggest to squash with patch 19 where
drivers/mempool/dpaa/Makefile is introduced.


Yes, I understand that.
It would report error that the headline is wrong because I am hitting
different directories - "MAINTAINERS" and "drivers/mempool/*" with the
same patch having headline "mempool/*".


The test you are talking about has this comment:
# check headline prefix when touching only drivers, e.g. net/
If you hit a warning, there is a bug.


Somehow I had the impression it throws an error in such cases. I changed 
as suggested and check-git-log.sh script didn't throw any error - I was 
wrong. Thanks for correcting.


-
Shreyansh



Re: [dpdk-dev] [PATCH v4 02/41] bus/dpaa: introduce NXP DPAA Bus driver skeleton

2017-09-27 Thread Shreyansh Jain

On Monday 25 September 2017 08:41 PM, Ferruh Yigit wrote:

On 9/25/2017 3:32 PM, Shreyansh Jain wrote:

On Tuesday 19 September 2017 06:44 PM, Shreyansh Jain wrote:

Hello Ferruh,

On Monday 18 September 2017 08:17 PM, Ferruh Yigit wrote:

On 9/9/2017 12:20 PM, Shreyansh Jain wrote:

Signed-off-by: Shreyansh Jain 
Signed-off-by: Hemant Agrawal 


<...>


diff --git a/drivers/bus/dpaa/rte_bus_dpaa_version.map
b/drivers/bus/dpaa/rte_bus_dpaa_version.map
new file mode 100644
index 000..d97a009
--- /dev/null
+++ b/drivers/bus/dpaa/rte_bus_dpaa_version.map
@@ -0,0 +1,7 @@
+DPDK_17.11 {
+    global:
+
+    rte_dpaa_driver_register;
+    rte_dpaa_driver_unregister;


"local *;" ?


Agree. I will change this.
Currently rte_dpaa_driver_* functions are being used locally within
bus/dpaa.



Even though I agree earlier that I will change this (append 'local *:'
to the file), probably I will have to skip this.
Further in the patch series, there are some symbols which are added
which are required by the mempool and net drivers (and crypto, in
future). Shared compilation fails for them if I add 'local: *;' here.


It should be OK if this is last item in the first group.

Technically I believe it will be OK to remove that line, but not quite sure.

Lets be consistent with exiting usage and keep it, there are many sample
map files.



I had a look at the various map files in code. There is a mixed usage.
Most don't have 'local' tag in their last blocks which exposes symbols. 
Some, like octeonx, bnxt have both.


I am not very sure of how this changes the scope of variables. So, as of 
now I have made DPAA to have both - global and local in its 17.11 symbol 
block.


[dpdk-dev] [PATCH] ethdev: increase length device internal name

2017-09-27 Thread Stephen Hemminger
Allow sufficicent space for UUID in string form (36+1).
Needed to use UUID with Hyper-V. This was in deprecation notice
for previous release and make it so in 17.11.

Signed-off-by: Stephen Hemminger 
---
 doc/guides/rel_notes/deprecation.rst| 3 +++
 lib/librte_eal/common/include/rte_dev.h | 6 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 3362f33501dd..ba5b68e70098 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -120,3 +120,6 @@ Deprecation Notices
   The non-"do-sig" versions of the hash tables will be removed
   (including the ``signature_offset`` parameter)
   and the "do-sig" versions renamed accordingly.
+
+* ethdev: in 17.11 the size of internal device name is increased
+  to 64 characters to allow for storing longer bus specific name.
diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index 5386d3a28be5..4c4ac7e5d375 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -152,7 +152,11 @@ struct rte_driver {
const char *alias;  /**< Driver alias. */
 };
 
-#define RTE_DEV_NAME_MAX_LEN (32)
+/*
+ * Internal identifier length
+ * Sufficiently large to allow for UUID or PCI address
+ */
+#define RTE_DEV_NAME_MAX_LEN 64
 
 /**
  * A structure describing a generic device.
-- 
2.11.0



Re: [dpdk-dev] [PATCH v3] net/virtio: fix of untrusted scalar value

2017-09-27 Thread Yuanhan Liu
On Fri, Sep 22, 2017 at 05:21:49PM +0200, Daniel Mrzyglod wrote:
> The unscrutinized value may be incorrectly assumed to be within a certain
> range by later operations.
> 
> In vhost_user_read: An unscrutinized value from an untrusted source used
> in a trusted context - the value of sz_payload may be harmfull and we need
> limit them to the max value of payload.
> 
> Coverity issue: 139601
> 
> Fixes: 6a84c37e3975 ("net/virtio-user: add vhost-user adapter layer")
> Cc: jianfeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Daniel Mrzyglod 

FYI, you should put the Ack from Jianfeng here, so that it will be
there when I apply your patch. Otherwise, I have to add it back manually.

But never mind, I have done it this time. So, applied to dpdk-next-virtio.

Thanks.

--yliu

> ---
> v3:
> * there were wrong v2 email adress for stable dpdk mailinglist
> * fix compilation errors
> 
> v2:
> * Add Cc for stable in gitlog massage
> * Add Coverity line
> * v1 was acked by Acked-by: Jianfeng Tan 
> 
> 
>  drivers/net/virtio/virtio_user/vhost_user.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/virtio/virtio_user/vhost_user.c 
> b/drivers/net/virtio/virtio_user/vhost_user.c
> index 4ad7b21..97bd832 100644
> --- a/drivers/net/virtio/virtio_user/vhost_user.c
> +++ b/drivers/net/virtio/virtio_user/vhost_user.c
> @@ -130,6 +130,10 @@ vhost_user_read(int fd, struct vhost_user_msg *msg)
>   }
>  
>   sz_payload = msg->size;
> +
> + if ((size_t)sz_payload > sizeof(msg->payload))
> + goto fail;
> +
>   if (sz_payload) {
>   ret = recv(fd, (void *)((char *)msg + sz_hdr), sz_payload, 0);
>   if (ret < sz_payload) {
> -- 
> 2.7.4


Re: [dpdk-dev] [PATCH 10/12] vhost: support to kick in secondary process

2017-09-27 Thread Yuanhan Liu
On Fri, Sep 22, 2017 at 02:30:21AM +, Tan, Jianfeng wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > Sent: Thursday, September 21, 2017 5:18 PM
> > To: Tan, Jianfeng
> > Cc: dev@dpdk.org; maxime.coque...@redhat.com; mtetsu...@gmail.com
> > Subject: Re: [PATCH 10/12] vhost: support to kick in secondary process
> > 
> > On Thu, Sep 21, 2017 at 03:04:39PM +0800, Tan, Jianfeng wrote:
> > > >On Fri, Aug 25, 2017 at 09:40:50AM +, Jianfeng Tan wrote:
> > > >>To support kick in secondary process, we propose callfd_pri and
> > > >>kickfd_pri to store the value in primary process; and by a new
> > > >>API, rte_vhost_set_vring_effective_fd(), we can set effective
> > > >>callfd and kickfd which can be used by secondary process.
> > > >>
> > > >>Note in this case, either primary process or the secondary process
> > > >>can kick the frontend; that is, they cannot kick a vring at the
> > > >>same time.
> > > >Since only one can work, why not just overwriting the fd? Say, you
> > > >could introudce some APIs like "rte_vhost_set_vring_callfd", then
> > > >you don't need to introduce few more fields like "callfd_pri".
> > >
> > > That cannot address the below case:
> > > 1. Primary starts;
> > > 2. Secondary one starts; (if we overwrite it without storing it in some
> > > other fields)
> > > 3. Secondary one exits;
> > > 4. Secondary two starts. (primary cannot share the fd with this secondary
> > > process now, as this fd does not mean anything to the primary process)
> > 
> > I was thinking that those fds will be retrieved by the primary process
> > once? So thsoe it got at beginning are still valid?
> 
> Yes, the FDs are valid to primary process at step 1. But After overwriting in 
> step 2, those FDs are not valid to primary.

Yes, but the primary process has already got its correct fds saved, right?
With the saved fds, it then could share it with another secondary process.

Actually, the key (and typical) issue of multi-process here is the fds are
process specific, while they are stored in the shared memory. That means
only one will take effect eventually. Worse, the old ones are lost.

So, I think to make it right in this case, you should move the fds from
the shared memory and store them in the memory of the corresponding process.
If that's done, all processes could have its own valid fds, then every
process could do the kick (if that's really necessary).

You could check following commit for more info.
553f45932fb7 ("net/virtio: store PCI operators pointer locally")

--yliu


Re: [dpdk-dev] [PATCH 0/8] add net/crypto mrvl pmd drivers

2017-09-27 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tomasz Duszynski
> Sent: Tuesday, September 26, 2017 10:40 AM
> To: dev@dpdk.org
> Cc: m...@semihalf.com; d...@marvell.com; nsams...@marvell.com;
> Tomasz Duszynski 
> Subject: [dpdk-dev] [PATCH 0/8] add net/crypto mrvl pmd drivers
> 
> Hello,
> 
> This patch series introduces net/crypto drivers for Marvell Armada 7k/8k
> SoCs along with documentation and crypto driver pmd tests.
> 
> Below you can find the list of features which net/crypto pmds support.
> 

Could you split this patchset into two? I think the network and crypto PMDs 
should be separated,
as they are targetting different subtrees.

Thanks,
Pablo


Re: [dpdk-dev] [PATCH 0/8] add net/crypto mrvl pmd drivers

2017-09-27 Thread Tomasz Duszynski
On Wed, Sep 27, 2017 at 09:40:52AM +, De Lara Guarch, Pablo wrote:
>
>
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tomasz Duszynski
> > Sent: Tuesday, September 26, 2017 10:40 AM
> > To: dev@dpdk.org
> > Cc: m...@semihalf.com; d...@marvell.com; nsams...@marvell.com;
> > Tomasz Duszynski 
> > Subject: [dpdk-dev] [PATCH 0/8] add net/crypto mrvl pmd drivers
> >
> > Hello,
> >
> > This patch series introduces net/crypto drivers for Marvell Armada 7k/8k
> > SoCs along with documentation and crypto driver pmd tests.
> >
> > Below you can find the list of features which net/crypto pmds support.
> >
>
> Could you split this patchset into two? I think the network and crypto PMDs 
> should be separated,
> as they are targetting different subtrees.
>
> Thanks,
> Pablo
If that's a hard requirement I am fine with splitting the patch series.
Drivers were put into the same patchset because they have common
CONFIG dependency.

E.g. applying just crypto pmd patches and changing
CONFIG_RTE_LIBRTE_PMD_MRVL_CRYPTO to 'y' would break compilation
as it depends on CONFIG introduced by net pmd patches.

--
- Tomasz Duszyński


Re: [dpdk-dev] [PATCH 1/8] app: link the whole rte_cfgfile library

2017-09-27 Thread Bruce Richardson
On Wed, Sep 27, 2017 at 09:36:03AM +0200, Tomasz Duszynski wrote:
> On Tue, Sep 26, 2017 at 03:31:18PM +0100, Bruce Richardson wrote:
> > On Tue, Sep 26, 2017 at 11:39:58AM +0200, Tomasz Duszynski wrote:
> > > Since MRVL NET PMD needs librte_cfgfile to parse QoS configuration file
> > > link it as the whole library.
> > >
> > > Signed-off-by: Jacek Siuda 
> > > Signed-off-by: Tomasz Duszynski 
> > > ---
> >
> > Can you clarify a bit more why this is needed? For a static build, the
> > cfgfile should be linked in to the after the PMDs, so the dependencies
> > in the driver should be satisfied in the link. For a dynamic build, the
> > PMD should depend upon the cfgfile directly, and use it at runtime
> > appropriately.
> > Am I missing something?
> >
> > /Bruce
> Hi Bruce,
> 
> You are correct, all dependencies in the driver will be satisfied.
> The reason this change was introduced is that, librte_pmd_mrvl.a
> contains undefined symbols from librte_cfgfile.a thus linking
> applications under app/ directory will fail as librte_cfgfile.a comes
> before librte_pmd_mrvl.a during the linking stage.
> 
> Linking librte_cfgfile.a with --whole-archive solves the issue.
> 
> --
Thanks for the explanation.

I believe the proper fix in this case is to ensure that when linking
apps that cfgfile comes after librte_pmd_mrvl. The drivers should always
come first when linking, then the DPDK libs.

/Bruce


Re: [dpdk-dev] [PATCH 06/12] eal: add channel for primary/secondary communication

2017-09-27 Thread Yuanhan Liu
On Fri, Aug 25, 2017 at 09:40:46AM +, Jianfeng Tan wrote:
> Previouly, there is only one way for primary/secondary to exchange
> messages, that is, primary process writes info into some predefind
> file, and secondary process reads info out. That cannot address
> the requirements:
>   a. Secondary wants to send info to primary.
>   b. Sending info at any time, instead of just initialization time.
>   c. Share FD with the other side.
> 
> This patch proposes to create a communication channel (as an unix
> socket connection) for above requirements.

Firstly, I think it's a good idea to have such generic interfaces for
multiple process.

> Three new APIs are added:
> 
>   1. rte_eal_primary_secondary_add_action is used to register an action,

As you have said, it's for registration, why use "add" verb here?
Normally, "register" implies one time action, while "add" means
it could be a repeat action.

Also, the function name is a bit long. Maybe something like
"rte_eal_mp_xxx" is shorter and better.

[...]
> +static struct action_entry *
> +find_action_entry_by_name(const char *name)
> +{
> + int len = strlen(name);
> + struct action_entry *entry;
> +
> + TAILQ_FOREACH(entry, &action_entry_list, next) {
> + if (strncmp(entry->action_name, name, len) == 0)
> + break;

Broken indentation here.

> + }
> +
> + return entry;
> +}
> +
> +int
> +rte_eal_primary_secondary_add_action(const char *action_name,
> +  rte_eal_primary_secondary_t action)
> +{
> + struct action_entry *entry = malloc(sizeof(struct action_entry));
> +
> + if (entry == NULL)
> + return -ENOMEM;
> +
> + strncpy(entry->action_name, action_name, MAX_ACTION_NAME_LEN);
> + entry->action = action;
> + TAILQ_INSERT_TAIL(&action_entry_list, entry, next);

Since you intended to support "one primary process and multiple secondary
process", here we need a lock to protect the list.

Another wonder is do we really need that, I mean 1:N model?

> + return 0;
> +}
> +
> +void
> +rte_eal_primary_secondary_del_action(const char *name)
> +{
> + struct action_entry *entry = find_action_entry_by_name(name);
> +
> + TAILQ_REMOVE(&action_entry_list, entry, next);
> + free(entry);
> +}
> +
> +#define MAX_SECONDARY_PROCS  8
> +
> +static int efd_pri_sec; /* epoll fd for primary/secondary channel thread */

I think it's not a good idea to use "pri". For me, "private" comes to
my mind firstly but not "primary".

> +static int fd_listen;   /* unix listen socket by primary */
> +static int fd_to_pri;   /* only used by secondary process */
> +static int fds_to_sec[MAX_SECONDARY_PROCS];

Too many vars. I'd suggest to use a struct here, which could also make
the naming a bit simpler. For instance,

struct mp_fds {
int efd;

union {
/* fds for primary process */
struct {
int listen;
/* fds used to send msg to secondary process */
int secondaries[...];
};

/* fds for secondary process */
struct {
/* fds used to send msg to the primary process */
int primary;
};
};
};

It also separates the scope well. Note that above field naming does
not like perfect though. Feel free to come up with some better names.

[...]
> +static int
> +process_msg(int fd)
> +{
> + int len;
> + int params_len;
> + char buf[1024];
> + int fds[8]; /* accept at most 8 FDs per message */

Why it's 8? I think you are adding a vhost-user specific limitation to a
generic interface, which isn't quite right.

> + struct msg_hdr *hdr;
> + struct action_entry *entry;
> +
> + len = read_msg(fd, buf, 1024, fds, 8);
> + if (len < 0) {
> + RTE_LOG(ERR, EAL, "failed to read message: %s\n",
> + strerror(errno));
> + return -1;
> + }
> +
> + hdr = (struct msg_hdr *) buf;
^
An extra whitespace.

> +
> + entry = find_action_entry_by_name(hdr->action_name);
> + if (entry == NULL) {
> + RTE_LOG(ERR, EAL, "cannot find action by: %s\n",
> + hdr->action_name);
> + return -1;
> + }
> +
> + params_len = len - sizeof(struct msg_hdr);
> + return entry->action(hdr->params, params_len, fds, hdr->fds_num);
> +}
> +
> +static void *
> +thread_primary(__attribute__((unused)) void *arg)

Use __rte_unused instead, and put it after the var name.

> +{
> + int fd;
> + int i, n;
> + struct epoll_event event;
> + struct epoll_event *events;
> +
> + event.events = EPOLLIN | EPOLLRDHUP;
> + event.data.fd = fd_listen;
> + if (epoll_ctl(efd_pri_sec, EPOLL_CTL_ADD, fd_listen, &event) < 0) {
> + RTE_LOG(ERR, EAL, "failed to epoll_ctl: %s\n",
> + 

[dpdk-dev] [PATCH v4 1/8] mbuf: support GTP in software packet type parser

2017-09-27 Thread Beilei Xing
Add support of GTP-C and GTP-U tunnels in rte_net_get_ptype().

Signed-off-by: Beilei Xing 
Acked-by: Olivier Matz 
---
 lib/librte_mbuf/rte_mbuf_ptype.c |  2 ++
 lib/librte_mbuf/rte_mbuf_ptype.h | 24 
 2 files changed, 26 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf_ptype.c b/lib/librte_mbuf/rte_mbuf_ptype.c
index e5c4fae..a450814 100644
--- a/lib/librte_mbuf/rte_mbuf_ptype.c
+++ b/lib/librte_mbuf/rte_mbuf_ptype.c
@@ -89,6 +89,8 @@ const char *rte_get_ptype_tunnel_name(uint32_t ptype)
case RTE_PTYPE_TUNNEL_NVGRE: return "TUNNEL_NVGRE";
case RTE_PTYPE_TUNNEL_GENEVE: return "TUNNEL_GENEVE";
case RTE_PTYPE_TUNNEL_GRENAT: return "TUNNEL_GRENAT";
+   case RTE_PTYPE_TUNNEL_GTPC: return "TUNNEL_GTPC";
+   case RTE_PTYPE_TUNNEL_GTPU: return "TUNNEL_GTPU";
default: return "TUNNEL_UNKNOWN";
}
 }
diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
index acd70bb..eb7cd2c 100644
--- a/lib/librte_mbuf/rte_mbuf_ptype.h
+++ b/lib/librte_mbuf/rte_mbuf_ptype.h
@@ -383,6 +383,30 @@ extern "C" {
  */
 #define RTE_PTYPE_TUNNEL_GRENAT 0x6000
 /**
+ * GTP-C (GPRS Tunnelling Protocol) control tunneling packet type.
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=2123>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=2123>
+ */
+#define RTE_PTYPE_TUNNEL_GTPC   0x7000
+/**
+ * GTP-U (GPRS Tunnelling Protocol) user data tunneling packet type.
+ * Packet format:
+ * <'ether type'=0x0800
+ * | 'version'=4, 'protocol'=17
+ * | 'destination port'=2152>
+ * or,
+ * <'ether type'=0x86DD
+ * | 'version'=6, 'next header'=17
+ * | 'destination port'=2152>
+ */
+#define RTE_PTYPE_TUNNEL_GTPU   0x8000
+/**
  * Mask of tunneling packet types.
  */
 #define RTE_PTYPE_TUNNEL_MASK   0xf000
-- 
2.5.5



[dpdk-dev] [PATCH v4 2/8] net/i40e: update ptype and pctype info

2017-09-27 Thread Beilei Xing
Update new packet type and new pctype info when downloading
profile.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c  | 312 
 drivers/net/i40e/i40e_ethdev.h  |  24 
 drivers/net/i40e/rte_pmd_i40e.c |   6 +-
 3 files changed, 341 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 720f067..d6b0d50 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -65,6 +65,7 @@
 #include "i40e_rxtx.h"
 #include "i40e_pf.h"
 #include "i40e_regs.h"
+#include "rte_pmd_i40e.h"
 
 #define ETH_I40E_FLOATING_VEB_ARG  "enable_floating_veb"
 #define ETH_I40E_FLOATING_VEB_LIST_ARG "floating_veb_list"
@@ -1036,6 +1037,21 @@ i40e_init_fdir_filter_list(struct rte_eth_dev *dev)
return ret;
 }
 
+static void
+i40e_init_customized_info(struct i40e_pf *pf)
+{
+   int i;
+
+   /* Initialize customized pctype */
+   for (i = I40E_CUSTOMIZED_GTPC; i < I40E_CUSTOMIZED_MAX; i++) {
+   pf->customized_pctype[i].index = i;
+   pf->customized_pctype[i].pctype = I40E_FILTER_PCTYPE_INVALID;
+   pf->customized_pctype[i].valid = false;
+   }
+
+   pf->gtp_support = false;
+}
+
 static int
 eth_i40e_dev_init(struct rte_eth_dev *dev)
 {
@@ -1301,6 +1317,9 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
/* initialize Traffic Manager configuration */
i40e_tm_conf_init(dev);
 
+   /* Initialize customized information */
+   i40e_init_customized_info(pf);
+
ret = i40e_init_ethtype_filter_list(dev);
if (ret < 0)
goto err_init_ethtype_filter_list;
@@ -10893,6 +10912,299 @@ is_i40e_supported(struct rte_eth_dev *dev)
return is_device_supported(dev, &rte_i40e_pmd);
 }
 
+struct i40e_customized_pctype*
+i40e_find_customized_pctype(struct i40e_pf *pf, uint8_t index)
+{
+   int i;
+
+   for (i = 0; i < I40E_CUSTOMIZED_MAX; i++) {
+   if (pf->customized_pctype[i].index == index)
+   return &pf->customized_pctype[i];
+   }
+   return NULL;
+}
+
+static int
+i40e_update_customized_pctype(struct rte_eth_dev *dev, uint8_t *pkg,
+ uint32_t pkg_size, uint32_t proto_num,
+ struct rte_pmd_i40e_proto_info *proto)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   uint32_t pctype_num;
+   struct rte_pmd_i40e_ptype_info *pctype;
+   uint32_t buff_size;
+   struct i40e_customized_pctype *new_pctype = NULL;
+   uint8_t proto_id;
+   uint8_t pctype_value;
+   char name[64];
+   uint32_t i, j, n;
+   int ret;
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&pctype_num, sizeof(pctype_num),
+   RTE_PMD_I40E_PKG_INFO_PCTYPE_NUM);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to get pctype number");
+   return -1;
+   }
+   if (!pctype_num) {
+   PMD_DRV_LOG(INFO, "No new pctype added");
+   return -1;
+   }
+
+   buff_size = pctype_num * sizeof(struct rte_pmd_i40e_proto_info);
+   pctype = rte_zmalloc("new_pctype", buff_size, 0);
+   if (!pctype) {
+   PMD_DRV_LOG(ERR, "Failed to allocate memory");
+   return -1;
+   }
+   /* get information about new pctype list */
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)pctype, buff_size,
+   RTE_PMD_I40E_PKG_INFO_PCTYPE_LIST);
+   if (ret) {
+   PMD_DRV_LOG(ERR, "Failed to get pctype list");
+   rte_free(pctype);
+   return -1;
+   }
+
+   /* Update customized pctype. */
+   for (i = 0; i < pctype_num; i++) {
+   pctype_value = pctype[i].ptype_id;
+   memset(name, 0, sizeof(name));
+   for (j = 0; j < RTE_PMD_I40E_PROTO_NUM; j++) {
+   proto_id = pctype[i].protocols[j];
+   if (proto_id == RTE_PMD_I40E_PROTO_UNUSED)
+   continue;
+   for (n = 0; n < proto_num; n++) {
+   if (proto[n].proto_id != proto_id)
+   continue;
+   strcat(name, proto[n].name);
+   strcat(name, "_");
+   break;
+   }
+   }
+   name[strlen(name) - 1] = '\0';
+   if (!strcmp(name, "GTPC"))
+   new_pctype =
+   i40e_find_customized_pctype(pf,
+ I40E_CUSTOMIZED_GTPC);
+   else if (!strcmp(name, "GTPU_IPV4"))
+   new_pctype =
+   

[dpdk-dev] [PATCH v4 0/8] GPT-C and GTP-U enabling

2017-09-27 Thread Beilei Xing
This patch set enables RSS/FDIR/cloud filter for GPT-C and GTP-U.
It depends on Kirill's patch:
http://dpdk.org/ml/archives/dev/2017-September/076035.html.
However, Kirill's patchset needs to be updated.

v4 changes:
 - Refine fdir related code.
 - Rework profile metadata parsing function.
 - Fix code style.

v3 changes:
 - Rework implementation to support the new profile.
 - Add GTPC and GTPU tunnel type in software packet type parser.
 - Update ptype info when loading profile.
 - Fix bug of updating pctype info.


v2 changes:
 - Enable RSS/FDIR/cloud filter dinamicly by checking profile
 - Add GTPC and GTPU items to distinguish rule for GTP-C or GTP-U
 - Rework FDIR/cloud filter enabling function

Beilei Xing (8):
  mbuf: support GTP in software packet type parser
  net/i40e: update ptype and pctype info
  net/i40e: support RSS for new pctype
  ethdev: add GTP items to support flow API
  net/i40e: finish integration FDIR with generic flow API
  net/i40e: add FDIR support for GTP-C and GTP-U
  net/i40e: add cloud filter parsing function for GTP
  net/i40e: enable cloud filter for GTP-C and GTP-U

 app/test-pmd/cmdline_flow.c |  40 ++
 app/test-pmd/config.c   |   3 +
 doc/guides/prog_guide/rte_flow.rst  |  18 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   4 +
 drivers/net/i40e/i40e_ethdev.c  | 530 +-
 drivers/net/i40e/i40e_ethdev.h  | 156 +++-
 drivers/net/i40e/i40e_fdir.c| 572 +++-
 drivers/net/i40e/i40e_flow.c| 496 
 drivers/net/i40e/rte_pmd_i40e.c |   6 +-
 lib/librte_ether/rte_flow.h |  52 +++
 lib/librte_mbuf/rte_mbuf_ptype.c|   2 +
 lib/librte_mbuf/rte_mbuf_ptype.h|  24 ++
 12 files changed, 1775 insertions(+), 128 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH v4 3/8] net/i40e: support RSS for new pctype

2017-09-27 Thread Beilei Xing
Enable RSS for new pctypes after downloading
new profile.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index d6b0d50..aba35a5 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1928,6 +1928,31 @@ i40e_apply_link_speed(struct rte_eth_dev *dev)
return i40e_phy_conf_link(hw, abilities, speed, true);
 }
 
+static void
+i40e_customized_pctype_hash_set(struct i40e_pf *pf, bool enable)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   uint64_t hena;
+   int i;
+
+   hena = (uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(0));
+   hena |= ((uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1))) << 32;
+
+   for (i = 0; i < I40E_CUSTOMIZED_MAX; i++) {
+   if (pf->customized_pctype[i].valid) {
+   if (enable)
+   hena |= 1ULL << pf->customized_pctype[i].pctype;
+   else
+   hena &= ~(1ULL <<
+ pf->customized_pctype[i].pctype);
+   }
+   }
+
+   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (uint32_t)hena);
+   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (uint32_t)(hena >> 32));
+   I40E_WRITE_FLUSH(hw);
+}
+
 static int
 i40e_dev_start(struct rte_eth_dev *dev)
 {
@@ -2075,6 +2100,8 @@ i40e_dev_start(struct rte_eth_dev *dev)
"please call hierarchy_commit() "
"before starting the port");
 
+   i40e_customized_pctype_hash_set(pf, true);
+
return I40E_SUCCESS;
 
 err_up:
@@ -2155,6 +2182,8 @@ i40e_dev_close(struct rte_eth_dev *dev)
uint32_t reg;
int i;
 
+   i40e_customized_pctype_hash_set(pf, false);
+
PMD_INIT_FUNC_TRACE();
 
i40e_dev_stop(dev);
-- 
2.5.5



[dpdk-dev] [PATCH v4 5/8] net/i40e: finish integration FDIR with generic flow API

2017-09-27 Thread Beilei Xing
rte_eth_fdir_* structures are still used in FDIR functions.
This patch adds i40e private FDIR related structures and
functions to finish integration FDIR with generic flow API.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.h |  83 ++-
 drivers/net/i40e/i40e_fdir.c   | 490 +++--
 drivers/net/i40e/i40e_flow.c   |  76 +++
 3 files changed, 586 insertions(+), 63 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 73fb5c3..4d690a1 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -461,6 +461,80 @@ struct i40e_vmdq_info {
 #define I40E_FDIR_IPv6_TC_OFFSET   20
 
 /*
+ * A union contains the inputs for all types of flow
+ * items in flows need to be in big endian
+ */
+union i40e_fdir_flow {
+   struct rte_eth_l2_flow l2_flow;
+   struct rte_eth_udpv4_flow  udp4_flow;
+   struct rte_eth_tcpv4_flow  tcp4_flow;
+   struct rte_eth_sctpv4_flow sctp4_flow;
+   struct rte_eth_ipv4_flow   ip4_flow;
+   struct rte_eth_udpv6_flow  udp6_flow;
+   struct rte_eth_tcpv6_flow  tcp6_flow;
+   struct rte_eth_sctpv6_flow sctp6_flow;
+   struct rte_eth_ipv6_flow   ipv6_flow;
+};
+
+/* A structure used to contain extend input of flow */
+struct i40e_fdir_flow_ext {
+   uint16_t vlan_tci;
+   uint8_t flexbytes[RTE_ETH_FDIR_MAX_FLEXLEN];
+   /* It is filled by the flexible payload to match. */
+   uint8_t is_vf;   /* 1 for VF, 0 for port dev */
+   uint16_t dst_id; /* VF ID, available when is_vf is 1*/
+};
+
+/* A structure used to define the input for a flow director filter entry */
+struct i40e_fdir_input {
+   enum i40e_filter_pctype pctype;
+   union i40e_fdir_flow flow;
+   /* Flow fields to match, dependent on flow_type */
+   struct i40e_fdir_flow_ext flow_ext;
+   /* Additional fields to match */
+};
+
+/* Behavior will be taken if FDIR match */
+enum i40e_fdir_behavior {
+   I40E_FDIR_ACCEPT = 0,
+   I40E_FDIR_REJECT,
+   I40E_FDIR_PASSTHRU,
+};
+
+/* Flow director report status
+ * It defines what will be reported if FDIR entry is matched.
+ */
+enum i40e_fdir_status {
+   I40E_FDIR_NO_REPORT_STATUS = 0, /* Report nothing. */
+   I40E_FDIR_REPORT_ID,/* Only report FD ID. */
+   I40E_FDIR_REPORT_ID_FLEX_4, /* Report FD ID and 4 flex bytes. */
+   I40E_FDIR_REPORT_FLEX_8,/* Report 8 flex bytes. */
+};
+
+/* A structure used to define an action when match FDIR packet filter. */
+struct i40e_fdir_action {
+   uint16_t rx_queue;/* Queue assigned to if FDIR match. */
+   enum i40e_fdir_behavior behavior; /* Behavior will be taken */
+   enum i40e_fdir_status report_status;  /* Status report option */
+   /* If report_status is I40E_FDIR_REPORT_ID_FLEX_4 or
+* I40E_FDIR_REPORT_FLEX_8, flex_off specifies where the reported
+* flex bytes start from in flexible payload.
+*/
+   uint8_t flex_off;
+};
+
+/* A structure used to define the flow director filter entry by filter_ctrl API
+ * It supports RTE_ETH_FILTER_FDIR with RTE_ETH_FILTER_ADD and
+ * RTE_ETH_FILTER_DELETE operations.
+ */
+struct i40e_fdir_filter_conf {
+   uint32_t soft_id;
+   /* ID, an unique value is required when deal with FDIR entry */
+   struct i40e_fdir_input input;/* Input set */
+   struct i40e_fdir_action action;  /* Action taken when match */
+};
+
+/*
  * Structure to store flex pit for flow diretor.
  */
 struct i40e_fdir_flex_pit {
@@ -483,7 +557,7 @@ struct i40e_fdir_flex_mask {
 
 struct i40e_fdir_filter {
TAILQ_ENTRY(i40e_fdir_filter) rules;
-   struct rte_eth_fdir_filter fdir;
+   struct i40e_fdir_filter_conf fdir;
 };
 
 TAILQ_HEAD(i40e_fdir_filter_list, i40e_fdir_filter);
@@ -907,7 +981,7 @@ extern const struct rte_flow_ops i40e_flow_ops;
 
 union i40e_filter_t {
struct rte_eth_ethertype_filter ethertype_filter;
-   struct rte_eth_fdir_filter fdir_filter;
+   struct i40e_fdir_filter_conf fdir_filter;
struct rte_eth_tunnel_filter_conf tunnel_filter;
struct i40e_tunnel_filter_conf consistent_tunnel_filter;
 };
@@ -981,7 +1055,7 @@ i40e_sw_ethertype_filter_lookup(struct i40e_ethertype_rule 
*ethertype_rule,
 int i40e_sw_ethertype_filter_del(struct i40e_pf *pf,
 struct i40e_ethertype_filter_input *input);
 int i40e_sw_fdir_filter_del(struct i40e_pf *pf,
-   struct rte_eth_fdir_input *input);
+   struct i40e_fdir_input *input);
 struct i40e_tunnel_filter *
 i40e_sw_tunnel_filter_lookup(struct i40e_tunnel_rule *tunnel_rule,
 const struct i40e_tunnel_filter_input *input);
@@ -994,6 +1068,9 @@ int i40e_ethertype_filter_set(struct i40e_pf *pf,
 int i40e_add_del_fdir_filter(struct rte_eth_dev *dev,
 const struct rte_eth_fdir_filter 

[dpdk-dev] [PATCH v4 6/8] net/i40e: add FDIR support for GTP-C and GTP-U

2017-09-27 Thread Beilei Xing
This patch adds FDIR support for GTP-C and GTP-U.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.h |  30 +
 drivers/net/i40e/i40e_fdir.c   | 200 ++-
 drivers/net/i40e/i40e_flow.c   | 263 +++--
 3 files changed, 396 insertions(+), 97 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 4d690a1..502f6c6 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -460,6 +460,25 @@ struct i40e_vmdq_info {
 #define I40E_FLEX_WORD_MASK(off) (0x80 >> (off))
 #define I40E_FDIR_IPv6_TC_OFFSET   20
 
+/* A structure used to define the input for GTP flow */
+struct i40e_gtp_flow {
+   struct rte_eth_udpv4_flow udp; /* IPv4 UDP fields to match. */
+   uint8_t msg_type;  /* Message type. */
+   uint32_t teid; /* TEID in big endian. */
+};
+
+/* A structure used to define the input for GTP IPV4 flow */
+struct i40e_gtp_ipv4_flow {
+   struct i40e_gtp_flow gtp;
+   struct rte_eth_ipv4_flow ip4;
+};
+
+/* A structure used to define the input for GTP IPV6 flow */
+struct i40e_gtp_ipv6_flow {
+   struct i40e_gtp_flow gtp;
+   struct rte_eth_ipv6_flow ip6;
+};
+
 /*
  * A union contains the inputs for all types of flow
  * items in flows need to be in big endian
@@ -474,6 +493,14 @@ union i40e_fdir_flow {
struct rte_eth_tcpv6_flow  tcp6_flow;
struct rte_eth_sctpv6_flow sctp6_flow;
struct rte_eth_ipv6_flow   ipv6_flow;
+   struct i40e_gtp_flow   gtp_flow;
+   struct i40e_gtp_ipv4_flow  gtp_ipv4_flow;
+   struct i40e_gtp_ipv6_flow  gtp_ipv6_flow;
+};
+
+enum i40e_fdir_ip_type {
+   I40E_FDIR_IPTYPE_IPV4,
+   I40E_FDIR_IPTYPE_IPV6,
 };
 
 /* A structure used to contain extend input of flow */
@@ -483,6 +510,9 @@ struct i40e_fdir_flow_ext {
/* It is filled by the flexible payload to match. */
uint8_t is_vf;   /* 1 for VF, 0 for port dev */
uint16_t dst_id; /* VF ID, available when is_vf is 1*/
+   bool inner_ip;   /* If there is inner ip */
+   enum i40e_fdir_ip_type iip_type; /* ip type for inner ip */
+   bool customized_pctype; /* If customized pctype is used */
 };
 
 /* A structure used to define the input for a flow director filter entry */
diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index eb2593b..e374b42 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -71,6 +71,9 @@
 #define I40E_FDIR_IPv6_DEFAULT_HOP_LIMITS   0xFF
 #define I40E_FDIR_IPv6_PAYLOAD_LEN  380
 #define I40E_FDIR_UDP_DEFAULT_LEN   400
+#define I40E_FDIR_GTP_DEFAULT_LEN   384
+#define I40E_FDIR_INNER_IP_DEFAULT_LEN  384
+#define I40E_FDIR_INNER_IPv6_DEFAULT_LEN344
 
 /* Wait time for fdir filter programming */
 #define I40E_FDIR_MAX_WAIT_US 1
@@ -939,16 +942,34 @@ i40e_fdir_construct_pkt(struct i40e_pf *pf,
return 0;
 }
 
+static struct i40e_customized_pctype *
+i40e_flow_fdir_find_customized_pctype(struct i40e_pf *pf, uint8_t pctype)
+{
+   struct i40e_customized_pctype *cus_pctype;
+   enum i40e_new_pctype i = I40E_CUSTOMIZED_GTPC;
+
+   for (; i < I40E_CUSTOMIZED_MAX; i++) {
+   cus_pctype = &pf->customized_pctype[i];
+   if (pctype == cus_pctype->pctype)
+   return cus_pctype;
+   }
+   return NULL;
+}
+
 static inline int
-i40e_flow_fdir_fill_eth_ip_head(const struct i40e_fdir_input *fdir_input,
+i40e_flow_fdir_fill_eth_ip_head(struct i40e_pf *pf,
+   const struct i40e_fdir_input *fdir_input,
unsigned char *raw_pkt,
bool vlan)
 {
+   struct i40e_customized_pctype *cus_pctype = NULL;
static uint8_t vlan_frame[] = {0x81, 0, 0, 0};
uint16_t *ether_type;
uint8_t len = 2 * sizeof(struct ether_addr);
struct ipv4_hdr *ip;
struct ipv6_hdr *ip6;
+   uint8_t pctype = fdir_input->pctype;
+   bool is_customized_pctype = fdir_input->flow_ext.customized_pctype;
static const uint8_t next_proto[] = {
[I40E_FILTER_PCTYPE_FRAG_IPV4] = IPPROTO_IP,
[I40E_FILTER_PCTYPE_NONF_IPV4_TCP] = IPPROTO_TCP,
@@ -975,27 +996,30 @@ i40e_flow_fdir_fill_eth_ip_head(const struct 
i40e_fdir_input *fdir_input,
raw_pkt += sizeof(uint16_t);
len += sizeof(uint16_t);
 
-   switch (fdir_input->pctype) {
-   case I40E_FILTER_PCTYPE_L2_PAYLOAD:
+   if (is_customized_pctype) {
+   cus_pctype = i40e_flow_fdir_find_customized_pctype(pf, pctype);
+   if (!cus_pctype)
+   PMD_DRV_LOG(ERR, "unknown pctype %u.",
+   fdir_input->pctype);
+   }
+
+   if (pctype == I40E_FILTER_PCTYPE_L2_PAYLOAD)
*ether_type = fdir_input->flow.l2_flow.ether_type;
- 

[dpdk-dev] [PATCH v4 4/8] ethdev: add GTP items to support flow API

2017-09-27 Thread Beilei Xing
This patch adds GTP, GTPC and GTPU items for
generic flow API, and also exposes item fields
through the flow command.

Signed-off-by: Beilei Xing 
Acked-by: Adrien Mazarguil 
---
 app/test-pmd/cmdline_flow.c | 40 ++
 app/test-pmd/config.c   |  3 ++
 doc/guides/prog_guide/rte_flow.rst  | 18 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  4 +++
 lib/librte_ether/rte_flow.h | 52 +
 5 files changed, 117 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index a17a004..26c3e4f 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -171,6 +171,10 @@ enum index {
ITEM_GRE_PROTO,
ITEM_FUZZY,
ITEM_FUZZY_THRESH,
+   ITEM_GTP,
+   ITEM_GTP_TEID,
+   ITEM_GTPC,
+   ITEM_GTPU,
 
/* Validate/create actions. */
ACTIONS,
@@ -451,6 +455,9 @@ static const enum index next_item[] = {
ITEM_MPLS,
ITEM_GRE,
ITEM_FUZZY,
+   ITEM_GTP,
+   ITEM_GTPC,
+   ITEM_GTPU,
ZERO,
 };
 
@@ -588,6 +595,12 @@ static const enum index item_gre[] = {
ZERO,
 };
 
+static const enum index item_gtp[] = {
+   ITEM_GTP_TEID,
+   ITEM_NEXT,
+   ZERO,
+};
+
 static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -1421,6 +1434,33 @@ static const struct token token_list[] = {
.args = ARGS(ARGS_ENTRY(struct rte_flow_item_fuzzy,
thresh)),
},
+   [ITEM_GTP] = {
+   .name = "gtp",
+   .help = "match GTP header",
+   .priv = PRIV_ITEM(GTP, sizeof(struct rte_flow_item_gtp)),
+   .next = NEXT(item_gtp),
+   .call = parse_vc,
+   },
+   [ITEM_GTP_TEID] = {
+   .name = "teid",
+   .help = "tunnel endpoint identifier",
+   .next = NEXT(item_gtp, NEXT_ENTRY(UNSIGNED), item_param),
+   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_gtp, teid)),
+   },
+   [ITEM_GTPC] = {
+   .name = "gtpc",
+   .help = "match GTP header",
+   .priv = PRIV_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
+   .next = NEXT(item_gtp),
+   .call = parse_vc,
+   },
+   [ITEM_GTPU] = {
+   .name = "gtpu",
+   .help = "match GTP header",
+   .priv = PRIV_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
+   .next = NEXT(item_gtp),
+   .call = parse_vc,
+   },
 
/* Validate/create actions. */
[ACTIONS] = {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index e8e311c..9b09bbd 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -949,6 +949,9 @@ static const struct {
MK_FLOW_ITEM(MPLS, sizeof(struct rte_flow_item_mpls)),
MK_FLOW_ITEM(GRE, sizeof(struct rte_flow_item_gre)),
MK_FLOW_ITEM(FUZZY, sizeof(struct rte_flow_item_fuzzy)),
+   MK_FLOW_ITEM(GTP, sizeof(struct rte_flow_item_gtp)),
+   MK_FLOW_ITEM(GTPC, sizeof(struct rte_flow_item_gtp)),
+   MK_FLOW_ITEM(GTPU, sizeof(struct rte_flow_item_gtp)),
 };
 
 /** Compute storage space needed by item specification. */
diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index 662a912..1bc8f19 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -955,6 +955,24 @@ Usage example, fuzzy match a TCPv4 packets:
| 4 | END  |
+---+--+
 
+Item: ``GTP``, ``GTPC``, ``GTPU``
+^
+
+Matches a GTP header.
+
+Note: GTP, GTPC and GTPU use the same structure. Since only UDP destination 
port
+is used to distinguish GTP_C (port is 2123) and GTP_U packets (port is 2152),
+GTPC and GTPU item are defined for a user-friendly API when creating GTP-C and
+GTP-U flow.
+
+- ``v_pt_rsv_flags``: version (3b), protocol type (1b), reserved (1b),
+  extension header flag (1b), sequence number flag (1b), N-PDU number
+  flag (1b).
+- ``msg_type``: message type.
+- ``msg_len``: message length.
+- ``teid``: tunnel endpoint identifier.
+- Default ``mask`` matches teid only.
+
 Actions
 ~~~
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 2ed62f5..8cc2399 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -2696,6 +2696,10 @@ This section lists supported pattern items and their 
attributes, if any.
 
   - ``thresh {unsigned}``: accuracy threshold.
 
+- ``gtp``, ``gtpc``, ``gtpu``: match GTP header.
+
+  - ``teid {unsigned}``: tunnel endpoint identifier.
+
 Actions list
 
 
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index bba6169..5da3aff 100644
--- a/lib/librte_ether/rte_flow.h
+++ b

[dpdk-dev] [PATCH v4 7/8] net/i40e: add cloud filter parsing function for GTP

2017-09-27 Thread Beilei Xing
This patch adds i40e_flow_parse_gtp_filter parsing
function for GTP-C and GTP-U.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.h |   2 +
 drivers/net/i40e/i40e_flow.c   | 151 +
 2 files changed, 153 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 502f6c6..436ca2c 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -703,6 +703,8 @@ enum i40e_tunnel_type {
I40E_TUNNEL_TYPE_MPLSoUDP,
I40E_TUNNEL_TYPE_MPLSoGRE,
I40E_TUNNEL_TYPE_QINQ,
+   I40E_TUNNEL_TYPE_GTPC,
+   I40E_TUNNEL_TYPE_GTPU,
I40E_TUNNEL_TYPE_MAX,
 };
 
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index ea81ecb..2bf7098 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -125,6 +125,12 @@ static int i40e_flow_parse_mpls_filter(struct rte_eth_dev 
*dev,
   const struct rte_flow_action actions[],
   struct rte_flow_error *error,
   union i40e_filter_t *filter);
+static int i40e_flow_parse_gtp_filter(struct rte_eth_dev *dev,
+ const struct rte_flow_attr *attr,
+ const struct rte_flow_item pattern[],
+ const struct rte_flow_action actions[],
+ struct rte_flow_error *error,
+ union i40e_filter_t *filter);
 static int i40e_flow_destroy_ethertype_filter(struct i40e_pf *pf,
  struct i40e_ethertype_filter *filter);
 static int i40e_flow_destroy_tunnel_filter(struct i40e_pf *pf,
@@ -1808,6 +1814,11 @@ static struct i40e_valid_pattern 
i40e_supported_patterns[] = {
{ pattern_mpls_2, i40e_flow_parse_mpls_filter },
{ pattern_mpls_3, i40e_flow_parse_mpls_filter },
{ pattern_mpls_4, i40e_flow_parse_mpls_filter },
+   /* GTP-C & GTP-U */
+   { pattern_fdir_ipv4_gtpc, i40e_flow_parse_gtp_filter },
+   { pattern_fdir_ipv4_gtpu, i40e_flow_parse_gtp_filter },
+   { pattern_fdir_ipv6_gtpc, i40e_flow_parse_gtp_filter },
+   { pattern_fdir_ipv6_gtpu, i40e_flow_parse_gtp_filter },
/* QINQ */
{ pattern_qinq_1, i40e_flow_parse_qinq_filter },
 };
@@ -3823,6 +3834,146 @@ i40e_flow_parse_mpls_filter(struct rte_eth_dev *dev,
 }
 
 /* 1. Last in item should be NULL as range is not supported.
+ * 2. Supported filter types: GTP TEID.
+ * 3. Mask of fields which need to be matched should be
+ *filled with 1.
+ * 4. Mask of fields which needn't to be matched should be
+ *filled with 0.
+ */
+static int
+i40e_flow_parse_gtp_pattern(struct rte_eth_dev *dev,
+   const struct rte_flow_item *pattern,
+   struct rte_flow_error *error,
+   struct i40e_tunnel_filter_conf *filter)
+{
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+   const struct rte_flow_item *item = pattern;
+   const struct rte_flow_item_gtp *gtp_spec;
+   const struct rte_flow_item_gtp *gtp_mask;
+   enum rte_flow_item_type item_type;
+
+   if (!pf->gtp_support) {
+   rte_flow_error_set(error, EINVAL,
+  RTE_FLOW_ERROR_TYPE_ITEM,
+  item,
+  "GTP is not supported by default.");
+   return -rte_errno;
+   }
+
+   for (; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
+   if (item->last) {
+   rte_flow_error_set(error, EINVAL,
+  RTE_FLOW_ERROR_TYPE_ITEM,
+  item,
+  "Not support range");
+   return -rte_errno;
+   }
+   item_type = item->type;
+   switch (item_type) {
+   case RTE_FLOW_ITEM_TYPE_ETH:
+   if (item->spec || item->mask) {
+   rte_flow_error_set(error, EINVAL,
+  RTE_FLOW_ERROR_TYPE_ITEM,
+  item,
+  "Invalid ETH item");
+   return -rte_errno;
+   }
+   break;
+   case RTE_FLOW_ITEM_TYPE_IPV4:
+   filter->ip_type = I40E_TUNNEL_IPTYPE_IPV4;
+   /* IPv4 is used to describe protocol,
+* spec and mask should be NULL.
+*/
+   if (item->spec || item->mask) {
+   rte_flow_error_set(error, EINVAL,
+

[dpdk-dev] [PATCH v4 8/8] net/i40e: enable cloud filter for GTP-C and GTP-U

2017-09-27 Thread Beilei Xing
GTP-C & GTP-U are not supported by cloud filter due
to limited resource of HW, this patch enables GTP-C
and GTP-U cloud filter by replacing inner_mac and
TUNNEL_KEY.
This configuration will be set when adding GTP-C or
GTP-U filter rules, and it will be invalid only by
NIC core reset.

Signed-off-by: Beilei Xing 
---
 drivers/net/i40e/i40e_ethdev.c | 189 +
 drivers/net/i40e/i40e_ethdev.h |  17 ++--
 drivers/net/i40e/i40e_flow.c   |  12 +--
 3 files changed, 189 insertions(+), 29 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index aba35a5..7560867 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -7161,7 +7161,7 @@ i40e_status_code i40e_replace_mpls_l1_filter(struct 
i40e_pf *pf)
/* create L1 filter */
filter_replace.old_filter_type =
I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_IMAC;
-   filter_replace.new_filter_type = I40E_AQC_ADD_L1_FILTER_TEID_MPLS;
+   filter_replace.new_filter_type = I40E_AQC_ADD_L1_FILTER_0X11;
filter_replace.tr_bit = 0;
 
/* Prepare the buffer, 3 entries */
@@ -7209,12 +7209,12 @@ i40e_status_code i40e_replace_mpls_cloud_filter(struct 
i40e_pf *pf)
I40E_AQC_MIRROR_CLOUD_FILTER;
filter_replace.old_filter_type = I40E_AQC_ADD_CLOUD_FILTER_IIP;
filter_replace.new_filter_type =
-   I40E_AQC_ADD_CLOUD_FILTER_TEID_MPLSoUDP;
+   I40E_AQC_ADD_CLOUD_FILTER_0X11;
/* Prepare the buffer, 2 entries */
filter_replace_buf.data[0] = I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_STAG;
filter_replace_buf.data[0] |=
I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
-   filter_replace_buf.data[4] = I40E_AQC_ADD_L1_FILTER_TEID_MPLS;
+   filter_replace_buf.data[4] = I40E_AQC_ADD_L1_FILTER_0X11;
filter_replace_buf.data[4] |=
I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
status = i40e_aq_replace_cloud_filters(hw, &filter_replace,
@@ -7232,12 +7232,131 @@ i40e_status_code i40e_replace_mpls_cloud_filter(struct 
i40e_pf *pf)
I40E_AQC_MIRROR_CLOUD_FILTER;
filter_replace.old_filter_type = I40E_AQC_ADD_CLOUD_FILTER_IMAC;
filter_replace.new_filter_type =
-   I40E_AQC_ADD_CLOUD_FILTER_TEID_MPLSoGRE;
+   I40E_AQC_ADD_CLOUD_FILTER_0X12;
/* Prepare the buffer, 2 entries */
filter_replace_buf.data[0] = I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_STAG;
filter_replace_buf.data[0] |=
I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
-   filter_replace_buf.data[4] = I40E_AQC_ADD_L1_FILTER_TEID_MPLS;
+   filter_replace_buf.data[4] = I40E_AQC_ADD_L1_FILTER_0X11;
+   filter_replace_buf.data[4] |=
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
+
+   status = i40e_aq_replace_cloud_filters(hw, &filter_replace,
+  &filter_replace_buf);
+   return status;
+}
+
+static enum i40e_status_code
+i40e_replace_gtp_l1_filter(struct i40e_pf *pf)
+{
+   struct i40e_aqc_replace_cloud_filters_cmd  filter_replace;
+   struct i40e_aqc_replace_cloud_filters_cmd_buf  filter_replace_buf;
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+   enum i40e_status_code status = I40E_SUCCESS;
+
+   /* For GTP-C */
+   memset(&filter_replace, 0,
+  sizeof(struct i40e_aqc_replace_cloud_filters_cmd));
+   memset(&filter_replace_buf, 0,
+  sizeof(struct i40e_aqc_replace_cloud_filters_cmd_buf));
+   /* create L1 filter */
+   filter_replace.old_filter_type =
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_IMAC;
+   filter_replace.new_filter_type = I40E_AQC_ADD_L1_FILTER_0X12;
+   filter_replace.tr_bit = I40E_AQC_NEW_TR_22 |
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
+   /* Prepare the buffer, 2 entries */
+   filter_replace_buf.data[0] =
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_TEID_WORD0;
+   filter_replace_buf.data[0] |=
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
+   filter_replace_buf.data[2] = 0xFF;
+   filter_replace_buf.data[3] = 0xFF;
+   filter_replace_buf.data[4] =
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_FV_TEID_WORD1;
+   filter_replace_buf.data[4] |=
+   I40E_AQC_REPLACE_CLOUD_CMD_INPUT_VALIDATED;
+   filter_replace_buf.data[6] = 0xFF;
+   filter_replace_buf.data[7] = 0xFF;
+   status = i40e_aq_replace_cloud_filters(hw, &filter_replace,
+  &filter_replace_buf);
+   if (status < 0)
+   return status;
+
+   /* for GTP-U */
+   memset(&filter_replace, 0,
+  sizeof(struct i40e_aqc_replace_cloud_filters_cmd));
+   memset(&filter_replace_buf, 0,
+  sizeof(struct i40e_aqc_replace_cloud_filters_cmd_buf));
+   /* create L1 filter */
+   filter_replace.o

[dpdk-dev] [PATCH v4 0/7] Add Membership Library

2017-09-27 Thread Yipeng Wang
DPDK Membership Library provides an API that can be used by many DPDK
applications to conduct one or more set-membership tests (we mention some
possible use cases below, but interested readers can refer to
[1] for a wider survey of use cases).

The basic functionalities of the Membership Library include
inserting a new member, deleting an existing member, and querying the existence
of a member. The query result would indicate with high accuracy which specific
set this member belongs to among a group of sets.

The Membership Library is an extension and generalization of traditional filter
data structures [2,3], which maintain a space-efficient “set-summary”.
There are two advantages of using such a set-summary rather than operating on a
“full-blown” complete list of elements: firstly it has a much smaller storage
requirement than storing the whole list of elements, and secondly set membership
tests (or other operations) is much more efficient than searching through the
complete list of elements.

A membership test for an element will return the set this element belongs to if
found (or return "not-found") with high accuracy. If needed, the accuracy of the
membership tests could be further increased with larger storage space.
Set-summary is a fundamental data aggregation component that can be used in many
network applications. It is a crucial structure to address performance and
scalability issues of diverse applications including overlay networks, wild card
flow classification, web-caches, load balancing, connection tracking,
data-centric networks, flow table summaries, network statistics and traffic
monitoring. Our Proof of Concept (PoC) using set-summary to optimize flow lookup
in Open vSwitch (OvS) shows a speedup of about 2-3X.

This patch set implements two types of set-summaries, i.e., hash-table based
set-summary (HTSS) and Vector Bloom Filter (vBF). HTSS supports both the
non-cache and cache modes. The non-cache mode can incur a small chance of
false-positives which is the case when the set-summary indicates a key belongs
to a given set while actually it is not. The cache mode can also have
false-negatives in addition to false-positives. False-negatives means the case
when the set-summary indicates a key does not belong to a given set while
actually it does. This happens because cache mode allows new key to evict
existing keys. vBF only has false-positives similar to the non-cache HTSS.
However, one can set the false-positive rate arbitrarily. HTSS's
false-positive rate is determined by the hash-table size and the signature size.

[1] A Broder and M Mitzenmacher, “Network Applications of Bloom Filters: A
Survey,” in Internet Mathematics, 2005.

[2] B H Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors,”
Communications of the ACM, 1970.

[3] B Fan, D G Andersen and M Kaminsky, “Cuckoo Filter: Practically Better Than
Bloom,” in Conference on emerging Networking Experiments and Technologies, 2014.


v4:
- member lib: change two hash function macros to be one. crc or jhash will be
  chosen depending on the architecture.
- member lib: use dynamic log type instead of static one by Thomas' comment.
- member lib: code cleanups (remove unnecessary code, change variable name,
  coding style, etc).
- member lib: reorder members in setsummary struct to be more cache friendly.
- member lib: setsummary struct member "name" should be char array rather than
  char pointer. Fixed.
- member lib: add check for two hash seeds to make sure they are not equal.
- member lib: vBF lookup speed optimization.
- member lib: revise comments in various places as Pablo suggested.
- member lib: change "void *" to "struct rte_member_setsum *" in public API as
  Pablo suggested. test case is modified accordingly to directly use the struct.
- member lib: change order in rte_member_version.map as Pablo suggested.
- MAINTAINERS: change the maintainer block order according to Thomas' comment.
- test: add printf to show which section expects error messages. Change set
  count to 16 from 32 for performance test.
- documentation: revise several places according to Pablo's and John's comments.

v3:
- documentation: update documentation to incorporate John's review.

v2:
- test: add bad parameter test functions to test the creation fail case.
- test: add find existing test.
- member lib: Add num_entries check in rte_member_create_ht.
- member lib: Add multiple checks for rte_member_create_vbf to avoid 
divide-by-0.
- member lib: remove unnecessary ret from rte_member.c according to Stephen's
  comment.
- member lib: change the vBF creation function to be not too conservative.
  Previous algorithm is too conservative on false positive.
- member lib: fix uninitialized issue fail gcc-4.8 in rte_member_find_existing.
- member lib: add rte_member_find_existing to version map.
- makefile: lib/librte_member/Makefile: change include order according to
  Luca's comment.
- documentation: update the programmer guide for membership libra

[dpdk-dev] [PATCH v4 1/7] member: implement main API

2017-09-27 Thread Yipeng Wang
Membership library is an extension and generalization of a traditional
filter (for example Bloom Filter) structure. In general, the Membership
library is a data structure that provides a "set-summary" and responds
to set-membership queries of whether a certain element belongs to a
set(s). A membership test for an element will return the set this element
belongs to or not-found if the element is never inserted into the
set-summary.

The results of the membership test are not 100% accurate. Certain
false positive or false negative probability could exist. However,
comparing to a "full-blown" complete list of elements, a "set-summary"
is memory efficient and fast on lookup.

This patch adds the main API definition.

Signed-off-by: Yipeng Wang 
---
 lib/Makefile |   2 +
 lib/librte_member/Makefile   |  49 +++
 lib/librte_member/rte_member.c   | 336 
 lib/librte_member/rte_member.h   | 507 +++
 lib/librte_member/rte_member_version.map |  16 +
 5 files changed, 910 insertions(+)
 create mode 100644 lib/librte_member/Makefile
 create mode 100644 lib/librte_member/rte_member.c
 create mode 100644 lib/librte_member/rte_member.h
 create mode 100644 lib/librte_member/rte_member_version.map

diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..c82033a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_MEMBER) += librte_member
+DEPDIRS-librte_member := librte_eal librte_hash
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
new file mode 100644
index 000..1a79eaa
--- /dev/null
+++ b/lib/librte_member/Makefile
@@ -0,0 +1,49 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_member.a
+
+CFLAGS := -I$(SRCDIR) $(CFLAGS)
+CFLAGS += $(WERROR_FLAGS) -O3
+
+EXPORT_MAP := rte_member_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c
+# install includes
+SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_member/rte_member.c b/lib/librte_member/rte_member.c
new file mode 100644
index 000..fee75a3
--- /dev/null
+++ b/lib/librte_member/rte_member.c
@@ -0,0 +1,336 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ *

[dpdk-dev] [PATCH v4 2/7] member: implement HT mode

2017-09-27 Thread Yipeng Wang
One of the set-summary structures is hash-table based
set-summary (HTSS). One example is cuckoo filter [1].

Comparing to a traditional hash table, HTSS has a much more
compact structure. For each element, only one signature and
its corresponding set ID is stored. No key comparison is required
during lookup. For the table structure, there are multiple entries
in each bucket, and the table is composed of many buckets.

Two modes are supported for HTSS, "cache" and "none-cache" modes.
The non-cache mode is more similar to traditional cuckoo filter.
When a bucket is full, one entry will be evicted to its
alternative bucket to make space for the new key. The table could
be full and then no more keys could be inserted. This mode has
false-positive rate but no false-negative. Multiple entries
with same signature could stay in the same bucket.

The "cache" mode does not evict key to its alternative bucket
when a bucket is full, an existing key will be evicted out of
the table like a cache. Thus, the table will never reject keys when
it is full. Another property is in each bucket, there cannot be
multiple entries with same signature. The mode could have both
false-positive and false-negative probability.

This patch adds the implementation of HTSS.

[1] B Fan, D G Andersen and M Kaminsky, “Cuckoo Filter: Practically
Better Than Bloom,” in Conference on emerging Networking
Experiments and Technologies, 2014.

Signed-off-by: Yipeng Wang 
---
 lib/librte_member/Makefile|   2 +-
 lib/librte_member/rte_member_ht.c | 474 ++
 lib/librte_member/rte_member_ht.h |  94 
 3 files changed, 569 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_member/rte_member_ht.c
 create mode 100644 lib/librte_member/rte_member_ht.h

diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
index 1a79eaa..ad26548 100644
--- a/lib/librte_member/Makefile
+++ b/lib/librte_member/Makefile
@@ -42,7 +42,7 @@ EXPORT_MAP := rte_member_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c rte_member_ht.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h
 
diff --git a/lib/librte_member/rte_member_ht.c 
b/lib/librte_member/rte_member_ht.c
new file mode 100644
index 000..55672a4
--- /dev/null
+++ b/lib/librte_member/rte_member_ht.c
@@ -0,0 +1,474 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rte_member.h"
+#include "rte_member_ht.h"
+
+static inline int
+insert_overwrite_search(uint32_t bucket, member_sig_t tmp_sig,
+   struct member_ht_bucket *buckets,
+   member_set_t set_id)
+{
+   int i;
+   for (i = 0; i < RTE_MEMBER_BUCKET_ENTRIES; i++) {
+   if (buckets[bucket].sigs[i] == tmp_sig) {
+   buckets[bucket].sets[i] = set_id;
+   return 1;
+   }
+   }
+   return 0;
+}
+
+static inline int
+search_bucket_single(uint32_t bucket, member_sig_t tmp_sig,
+   struct member_ht_bucket *buckets,
+   member_set_t *set_id)
+{
+   int iter;
+   for (iter = 0; iter < RTE_MEMBER_BUCKET_ENTRIES; iter++) 

[dpdk-dev] [PATCH v4 3/7] member: implement vBF mode

2017-09-27 Thread Yipeng Wang
Bloom Filter (BF) [1] is a well-known space-efficient
probabilistic data structure that answers set membership queries.
Vector of Bloom Filters (vBF) is an extension to traditional BF
that supports multi-set membership testing. Traditional BF will
return found or not-found for each key. vBF will also return
which set the key belongs to if it is found.

Since each set requires a BF, vBF should be used when set count
is small. vBF's false positive rate could be set appropriately so
that its memory requirement and lookup speed is better in certain
cases comparing to HT based set-summary.

This patch adds the vBF implementation.

[1]B H Bloom, “Space/Time Trade-offs in Hash Coding with Allowable
Errors,” Communications of the ACM, 1970.

Signed-off-by: Yipeng Wang 
---
 lib/librte_member/Makefile |   2 +-
 lib/librte_member/rte_member_vbf.c | 349 +
 lib/librte_member/rte_member_vbf.h |  79 +
 3 files changed, 429 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_member/rte_member_vbf.c
 create mode 100644 lib/librte_member/rte_member_vbf.h

diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
index ad26548..50275ed 100644
--- a/lib/librte_member/Makefile
+++ b/lib/librte_member/Makefile
@@ -42,7 +42,7 @@ EXPORT_MAP := rte_member_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c rte_member_ht.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c rte_member_ht.c 
rte_member_vbf.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h
 
diff --git a/lib/librte_member/rte_member_vbf.c 
b/lib/librte_member/rte_member_vbf.c
new file mode 100644
index 000..cbfa18e
--- /dev/null
+++ b/lib/librte_member/rte_member_vbf.c
@@ -0,0 +1,349 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "rte_member.h"
+#include "rte_member_vbf.h"
+
+/*
+ * vBF currently implemented as a big array.
+ * The BFs have a vertical layout. Bits in same location of all bfs will stay
+ * in the same cache line.
+ * For example, if we have 32 bloom filters, we use a uint32_t array to
+ * represent all of them. array[0] represent the first location of all the
+ * bloom filters, array[1] represents the second location of all the
+ * bloom filters, etc. The advantage of this layout is to minimize the average
+ * number of memory accesses to test all bloom filters.
+ *
+ * Currently the implementation supports vBF containing 1,2,4,8,16,32 BFs.
+ */
+int
+rte_member_create_vbf(struct rte_member_setsum *ss,
+   const struct rte_member_parameters *params)
+{
+
+   if (params->num_set > 32 || !rte_is_power_of_2(params->num_set) ||
+   params->num_keys == 0 ||
+   params->false_positive_rate == 0 ||
+   params->false_positive_rate > 1) {
+   rte_errno = EINVAL;
+   RTE_MEMBER_LOG(ERR, "vBF create with invalid parameters\n");
+   return -EINVAL;
+   }
+
+   /* We assume expected keys evenly distribute to all BFs */
+   uint32_t num_keys_per_bf = 1 + (params->num_keys - 1) / ss->num_set;
+
+   /*
+* Note that the false positive rate is

[dpdk-dev] [PATCH v4 4/7] member: add AVX for HT mode

2017-09-27 Thread Yipeng Wang
For key search, the signatures of all entries are compared against
the signature of the key that is being looked up. Since all
signatures are contiguously put in a bucket, they can be compared
with vector instructions (AVX2), achieving higher lookup performance.

This patch adds AVX2 implementation in a separate header file.

Signed-off-by: Yipeng Wang 
---
 lib/librte_member/rte_member_ht.c  | 145 -
 lib/librte_member/rte_member_x86.h | 107 +++
 2 files changed, 219 insertions(+), 33 deletions(-)
 create mode 100644 lib/librte_member/rte_member_x86.h

diff --git a/lib/librte_member/rte_member_ht.c 
b/lib/librte_member/rte_member_ht.c
index 55672a4..ef1541a 100644
--- a/lib/librte_member/rte_member_ht.c
+++ b/lib/librte_member/rte_member_ht.c
@@ -40,6 +40,10 @@
 #include "rte_member.h"
 #include "rte_member_ht.h"
 
+#if defined(RTE_ARCH_X86)
+#include "rte_member_x86.h"
+#endif
+
 static inline int
 insert_overwrite_search(uint32_t bucket, member_sig_t tmp_sig,
struct member_ht_bucket *buckets,
@@ -132,6 +136,13 @@ rte_member_create_ht(struct rte_member_setsum *ss,
for (j = 0; j < RTE_MEMBER_BUCKET_ENTRIES; j++)
buckets[i].sets[j] = RTE_MEMBER_NO_MATCH;
}
+#if defined(RTE_ARCH_X86)
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) &&
+   RTE_MEMBER_BUCKET_ENTRIES == 16)
+   ss->sig_cmp_fn = RTE_MEMBER_COMPARE_AVX2;
+   else
+#endif
+   ss->sig_cmp_fn = RTE_MEMBER_COMPARE_SCALAR;
 
RTE_MEMBER_LOG(DEBUG, "Hash table based filter created, "
"the table has %u entries, %u buckets\n",
@@ -166,15 +177,26 @@ rte_member_lookup_ht(const struct rte_member_setsum *ss,
member_sig_t tmp_sig;
struct member_ht_bucket *buckets = ss->table;
 
-
*set_id = RTE_MEMBER_NO_MATCH;
get_buckets_index(ss, key, &prim_bucket, &sec_bucket, &tmp_sig);
 
-   if (search_bucket_single(prim_bucket, tmp_sig, buckets,
-   set_id) ||
-   search_bucket_single(sec_bucket, tmp_sig,
-   buckets, set_id))
-   return 1;
+   switch (ss->sig_cmp_fn) {
+#if defined(RTE_ARCH_X86) && defined(RTE_MACHINE_CPUFLAG_AVX2)
+   case RTE_MEMBER_COMPARE_AVX2:
+   if (search_bucket_single_avx(prim_bucket, tmp_sig, buckets,
+   set_id) ||
+   search_bucket_single_avx(sec_bucket, tmp_sig,
+   buckets, set_id))
+   return 1;
+   break;
+#endif
+   default:
+   if (search_bucket_single(prim_bucket, tmp_sig, buckets,
+   set_id) ||
+   search_bucket_single(sec_bucket, tmp_sig,
+   buckets, set_id))
+   return 1;
+   }
 
return 0;
 }
@@ -198,13 +220,27 @@ rte_member_lookup_bulk_ht(const struct rte_member_setsum 
*ss,
}
 
for (i = 0; i < num_keys; i++) {
-   if (search_bucket_single(prim_buckets[i], tmp_sig[i],
-   buckets, &set_id[i]) ||
-   search_bucket_single(sec_buckets[i],
-   tmp_sig[i], buckets, &set_id[i]))
-   ret++;
-   else
-   set_id[i] = RTE_MEMBER_NO_MATCH;
+   switch (ss->sig_cmp_fn) {
+#if defined(RTE_ARCH_X86) && defined(RTE_MACHINE_CPUFLAG_AVX2)
+   case RTE_MEMBER_COMPARE_AVX2:
+   if (search_bucket_single_avx(prim_buckets[i],
+   tmp_sig[i], buckets, &set_id[i]) ||
+   search_bucket_single_avx(sec_buckets[i],
+   tmp_sig[i], buckets, &set_id[i]))
+   ret++;
+   else
+   set_id[i] = RTE_MEMBER_NO_MATCH;
+   break;
+#endif
+   default:
+   if (search_bucket_single(prim_buckets[i], tmp_sig[i],
+   buckets, &set_id[i]) ||
+   search_bucket_single(sec_buckets[i],
+   tmp_sig[i], buckets, &set_id[i]))
+   ret++;
+   else
+   set_id[i] = RTE_MEMBER_NO_MATCH;
+   }
}
return ret;
 }
@@ -221,12 +257,24 @@ rte_member_lookup_multi_ht(const struct rte_member_setsum 
*ss,
 
get_buckets_index(ss, key, &prim_bucket, &sec_bucket, &tmp_sig);
 
-   search_bucket_multi(prim_bucket, tmp_sig, buckets, &ret,
-match_per_key, set_id);
-   if (ret < match_per_key)
-   search_bucket_multi(sec_

[dpdk-dev] [PATCH v4 5/7] member: enable the library

2017-09-27 Thread Yipeng Wang
This patch enables the Membership library.

Signed-off-by: Yipeng Wang 
---
 MAINTAINERS| 8 +++-
 config/common_base | 5 +
 lib/librte_member/Makefile | 2 ++
 mk/rte.app.mk  | 2 ++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a0cd75e..adb8e2c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -726,6 +726,13 @@ F: test/test/test_lpm*
 F: test/test/test_func_reentrancy.c
 F: test/test/test_xmmt_ops.h
 
+Membership - EXPERIMENTAL
+M: Yipeng Wang 
+M: Sameh Gobriel 
+F: lib/librte_member/
+F: doc/guides/prog_guide/member_lib.rst
+F: test/test/test_member*
+
 Traffic metering
 M: Cristian Dumitrescu 
 F: lib/librte_meter/
@@ -734,7 +741,6 @@ F: test/test/test_meter.c
 F: examples/qos_meter/
 F: doc/guides/sample_app_ug/qos_metering.rst
 
-
 Other libraries
 ---
 
diff --git a/config/common_base b/config/common_base
index 5e97a08..5e31ced 100644
--- a/config/common_base
+++ b/config/common_base
@@ -595,6 +595,11 @@ CONFIG_RTE_LIBRTE_HASH_DEBUG=n
 CONFIG_RTE_LIBRTE_EFD=y
 
 #
+# Compile librte_member
+#
+CONFIG_RTE_LIBRTE_MEMBER=y
+
+#
 # Compile librte_jobstats
 #
 CONFIG_RTE_LIBRTE_JOBSTATS=y
diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
index 50275ed..3bac1d0 100644
--- a/lib/librte_member/Makefile
+++ b/lib/librte_member/Makefile
@@ -37,6 +37,8 @@ LIB = librte_member.a
 CFLAGS := -I$(SRCDIR) $(CFLAGS)
 CFLAGS += $(WERROR_FLAGS) -O3
 
+LDLIBS += -lm
+
 EXPORT_MAP := rte_member_version.map
 
 LIBABIVER := 1
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index c25fdd9..c79acf0 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -86,6 +86,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)+= -lrte_cfgfile
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_HASH)   += -lrte_hash
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MEMBER)+= -lrte_member
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lrte_vhost
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KVARGS) += -lrte_kvargs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MBUF)   += -lrte_mbuf
@@ -196,6 +197,7 @@ endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lm
 _LDLIBS-$(CONFIG_RTE_LIBRTE_SCHED)  += -lrt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METER)  += -lm
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MEMBER)  += -lm
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VHOST)  += -lnuma
 endif
-- 
2.7.4



[dpdk-dev] [PATCH v4 6/7] test/member: add functional and perf tests

2017-09-27 Thread Yipeng Wang
This patch adds functional and performance tests for membership
library.

Signed-off-by: Yipeng Wang 
---
 test/test/Makefile   |   3 +
 test/test/test_member.c  | 676 +++
 test/test/test_member_perf.c | 630 
 3 files changed, 1309 insertions(+)
 create mode 100644 test/test/test_member.c
 create mode 100644 test/test/test_member_perf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 42d9a49..b61dde3 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -123,6 +123,9 @@ SRCS-y += test_logs.c
 SRCS-y += test_memcpy.c
 SRCS-y += test_memcpy_perf.c
 
+
+SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) += test_member.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) += test_member_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_EFD) += test_efd.c
 SRCS-$(CONFIG_RTE_LIBRTE_EFD) += test_efd_perf.c
 
diff --git a/test/test/test_member.c b/test/test/test_member.c
new file mode 100644
index 000..46be9bc
--- /dev/null
+++ b/test/test/test_member.c
@@ -0,0 +1,676 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* This test is for membership library's simple feature test */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+struct rte_member_setsum *setsum_ht;
+struct rte_member_setsum *setsum_cache;
+struct rte_member_setsum *setsum_vbf;
+
+/* 5-tuple key type */
+struct flow_key {
+   uint32_t ip_src;
+   uint32_t ip_dst;
+   uint16_t port_src;
+   uint16_t port_dst;
+   uint8_t proto;
+} __attribute__((packed));
+
+/* Keys used by unit test functions */
+static struct flow_key keys[5] = {
+   {
+   .ip_src = IPv4(0x03, 0x02, 0x01, 0x00),
+   .ip_dst = IPv4(0x07, 0x06, 0x05, 0x04),
+   .port_src = 0x0908,
+   .port_dst = 0x0b0a,
+   .proto = 0x0c,
+   },
+   {
+   .ip_src = IPv4(0x13, 0x12, 0x11, 0x10),
+   .ip_dst = IPv4(0x17, 0x16, 0x15, 0x14),
+   .port_src = 0x1918,
+   .port_dst = 0x1b1a,
+   .proto = 0x1c,
+   },
+   {
+   .ip_src = IPv4(0x23, 0x22, 0x21, 0x20),
+   .ip_dst = IPv4(0x27, 0x26, 0x25, 0x24),
+   .port_src = 0x2928,
+   .port_dst = 0x2b2a,
+   .proto = 0x2c,
+   },
+   {
+   .ip_src = IPv4(0x33, 0x32, 0x31, 0x30),
+   .ip_dst = IPv4(0x37, 0x36, 0x35, 0x34),
+   .port_src = 0x3938,
+   .port_dst = 0x3b3a,
+   .proto = 0x3c,
+   },
+   {
+   .ip_src = IPv4(0x43, 0x42, 0x41, 0x40),
+   .ip_dst = IPv4(0x47, 0x46, 0x45, 0x44),
+   .port_src = 0x4948,
+   .port_dst = 0x4b4a,
+   .proto = 0x4c,
+   }
+};
+
+uint32_t test_set[5] = {1, 2, 3, 4, 5};
+
+#define ITERATIONS  3
+#define KEY_SIZE  4
+
+#define MAX_ENTRIES (1 << 16)
+uint8_t gened_keys[MAX_ENTRIES][KEY_SIZE];
+
+static struct rte_member_parameters params = {
+   .num_keys = MAX_ENTRIES,   /* Total hash table entries. */
+   .key_len = KEY_SIZE,   /* Length of hash key. */
+
+   /*num_set and false_positive_rate only relevant to vBF setsum*/
+   .num_set = 32,
+  

Re: [dpdk-dev] [PATCH v5 2/5] ethdev: increase port_id range

2017-09-27 Thread Ferruh Yigit
On 9/26/2017 8:01 AM, Yang, Zhiyong wrote:
> Hi Ferruh, Thomas,
> 
>>> @@ -40,7 +40,7 @@ LIB = librte_pmd_af_packet.a
>>>
>>>  EXPORT_MAP := rte_pmd_af_packet_version.map
>>>
>>> -LIBABIVER := 1
>>> +LIBABIVER := 2
>>
>> I have just recognized this one. This shouldn't be updated. Only LIABIVER of 
>> the
>> libraries that their API modified should be updated.
>>
>> The release notes "Shared Library Versions" updates and Makefile LIBABIVER
>> updates should be compatible. I mean either both needs to be updated or both
>> not updated.
>>
> 
> Ok.  I will cleanup the unnecessary updates. 
> 
> Here are the list I will update .
> 
> - librte_bitratestats.so.1
> + librte_bitratestats.so.2
> - librte_ethdev.so.7
> + librte_ethdev.so.8
> - librte_pdump.so.1
> + librte_pdump.so.2
> - librte_pmd_bond.so.1
> + librte_pmd_bnxt.so.2
> + librte_pmd_bond.so.2
> + librte_pmd_failsafe.so.2
> + librte_pmd_i40e.so.2
> + librte_pmd_ixgbe.so.2
> + librte_pmd_vhost.so.2
> 
> Among them, the following libs are added newly.
> 
> + librte_pmd_bnxt.so.2
> + librte_pmd_failsafe.so.2
> + librte_pmd_i40e.so.2
> + librte_pmd_ixgbe.so.2
> + librte_pmd_vhost.so.2

There is a patch to add missing ones [1], I think it would be better to
add them with a separate patch.

But I missed the "librte_pmd_vhost" one, I will send a new version soon.

Also failsafe don't have any PMD specific API, no need to add it.


[1]
http://dpdk.org/dev/patchwork/patch/28598/


> 
> One  LIBABIVER need to be fixed.
> Could you apply your patch  http://www.dpdk.org/dev/patchwork/patch/28738/  
> in the main tree firstly?
> 
> I hope the patchset can be merged into main tree.  How do you think about it?
> 
>> There is no way to split this patch more without breaking build right?
> 
> Yes. it is too hard to split it.
> 
> thanks
> Zhiyong
> 
>> It would be possible to do it if port_t used, but that has been decided not 
>> to...
>>
>> <...>



[dpdk-dev] [PATCH v2] doc: add shared library versions for missing PMDs

2017-09-27 Thread Ferruh Yigit
Signed-off-by: Ferruh Yigit 
Acked-by: John McNamara 
---
 doc/guides/rel_notes/release_17_11.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_17_11.rst 
b/doc/guides/rel_notes/release_17_11.rst
index bb0ba80c6..de637fb4b 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -198,8 +198,12 @@ The libraries prepended with a plus sign were incremented 
in this version.
  librte_net.so.1
  librte_pdump.so.1
  librte_pipeline.so.3
+ librte_pmd_bnxt.so.1
  librte_pmd_bond.so.1
+ librte_pmd_i40e.so.1
+ librte_pmd_ixgbe.so.1
  librte_pmd_ring.so.2
+ librte_pmd_vhost.so.1
  librte_port.so.3
  librte_power.so.1
  librte_reorder.so.1
-- 
2.13.5



Re: [dpdk-dev] [PATCH v3 1/2] net/i40e: queue region set and flush

2017-09-27 Thread Ferruh Yigit
On 9/26/2017 9:54 AM, Zhao1, Wei wrote:
> Hi,  Ferruh

Hi Wei,

>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Wednesday, September 20, 2017 6:36 PM
>> To: Zhao1, Wei ; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 1/2] net/i40e: queue region set and
> flush
>>
>> On 9/15/2017 4:13 AM, Wei Zhao wrote:
>> > This feature enable queue regions configuration for RSS in PF/VF, so
>> > that different traffic classes or different packet classification
>> > types can be separated to different queues in different queue
>> > regions.This patch can set queue region range, it include queue number
>> > in a region and the index of first queue.
>>
>> Is following correct:
>> So instead of distributing packets to the multiple queues, this will
> distribute
>> packets into queue reqions which may consists of multiple queues.
>>
>> If so, is there a way to control how packets distributed within same queue
>> region to multiple queues?
>>
>> And is this feature only supported with RSS? Can it be part of RSS
>> configuration instead of PMD specific API?
>>
>> > This patch enable mapping between different priorities (UP) and
>>
>> User priorities (UP)
>>
>> > different traffic classes.It also enable mapping between a region
>> > index and a sepcific flowtype(PCTYPE).It also provide the solution of
>> > flush all configuration about queue region the above described.
>> >
>> > Signed-off-by: Wei Zhao  >
>> > ---
>> >  drivers/net/i40e/i40e_ethdev.c    |  19 +-
>> >  drivers/net/i40e/i40e_ethdev.h    |  30 ++
>> >  drivers/net/i40e/rte_pmd_i40e.c   | 482
>> ++
>> >  drivers/net/i40e/rte_pmd_i40e.h   |  38 +++
>> >  drivers/net/i40e/rte_pmd_i40e_version.map |   1 +
>> >  5 files changed, 566 insertions(+), 4 deletions(-)
>> >
>>
>> <...>
>>
>> > +static int
>> > +i40e_vsi_update_queue_region_mapping(struct i40e_hw *hw,
>> > +    struct i40e_pf *pf)
>> > +{
>> > +  uint16_t i;
>> > +  struct i40e_vsi *vsi = pf->main_vsi;
>> > +  uint16_t queue_offset, bsf, tc_index;
>> > +  struct i40e_vsi_context ctxt;
>> > +  struct i40e_aqc_vsi_properties_data *vsi_info;
>> > +  struct i40e_queue_region_info *region_info =
>> > + 
> &pf->queue_region;
>> > +  uint32_t ret = -EINVAL;
>> > +
>> > +  if (!region_info->queue_region_number) {
>  
> 
> 
>> > +int rte_pmd_i40e_queue_region_conf(uint8_t port,
>> > +  struct
> rte_i40e_rss_region_conf *conf_ptr) {
>> > +  struct rte_eth_dev *dev = &rte_eth_devices[port];
>>
>> you need to verify port_id, since this is public API now. Please check
> other
>> APIs in this file.
> 
> I have already "if (!is_i40e_supported(dev))" code in v3 in function
>  rte_pmd_i40e_queue_region_conf.
> 
> So, I do not know what is your meaning.

RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);

Normally we don't need to worry about this in PMD because abstraction
layer APIs do this check already. But since this is public API, user can
give any value as port_id parameter, better to verify it.

> 
>>
>> > +  struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
>> >dev_private);
>> > +  struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
>> >dev_private);
>> > +  enum rte_pmd_i40e_queue_region_op op_type = conf_ptr->op;
>> > +  uint32_t ret;
>>
>> This should be signed variable, since you are using it for return and
> assigning
>> negative values.
>>
>> > +
>> > +  if (!is_i40e_supported(dev))
>> > +  return -ENOTSUP;
>> > +
>> > +  switch (op_type) {
>> > +  case RTE_PMD_I40E_QUEUE_REGION_SET:
>> > +  ret = i40e_set_queue_region(pf, conf_ptr);
>> > +  break;
>>


Re: [dpdk-dev] [PATCH v4 03/41] bus/dpaa: add compatibility and helper macros

2017-09-27 Thread Ferruh Yigit
On 9/26/2017 1:43 PM, Shreyansh Jain wrote:
> On Tuesday 19 September 2017 07:27 PM, Shreyansh Jain wrote:
>> On Tuesday 19 September 2017 07:10 PM, Ferruh Yigit wrote:
>>> On 9/19/2017 2:18 PM, Shreyansh Jain wrote:
 On Monday 18 September 2017 08:19 PM, Ferruh Yigit wrote:
> On 9/9/2017 12:20 PM, Shreyansh Jain wrote:
>> From: Hemant Agrawal 
>>
>> Linked list, bit operations and compatibility macros.
>>
>> Signed-off-by: Geoff Thorpe 
>> Signed-off-by: Hemant Agrawal 
>
> 
> [...]
> 
>> + */
>
>>>
>>> <...>
>>>
>> +
>> +#ifndef __DPAA_LIST_H
>> +#define __DPAA_LIST_H
>> +
>> +//
>> +/* Linked-lists */
>> +//
>
> Do we need to maintain a linked list implementation, why no just use
> sys/queue.h ones as done many places in DPDK?
>
>> +
>> +struct list_head {
>> +    struct list_head *prev;
>> +    struct list_head *next;
>> +};
>> +
>
> <...>
>

 The underlying DPAA infrastructure code is shared between kernel and
 userspace. That is why, changing the internal headers (for example,
 using RTE_* queues) is something I want to avoid until absolutely
 necessary. The outer layers (drivers/*/dpaa/) are something I am
 trying to keep as close to possible to DPDK.
>>>
>>> I understand you want to escape from maintaining a copy of common files
>>> for DPDK, this has been done by many drivers, as not changing "base"
>>> files, this makes sense.
>>>
>>> But for this case, file is "dpaa_list.h" and as far as I can see all it
>>> has is linked list implementation, this looked easy to exclude, but if
>>> not you can ignore the comment.
>>
>> Got your point. I will respin and see how much is the impact.
>> Thanks for inputs.
> 
> I tried to work around the dpaa_list.h use in DPAA code - but, the 
> changes are subtle but large in number - though, restricted only to base 
> framework.
> I would prefer to skip this for a while as the driver is stable now. I 
> would probably do this change in a incremental manner to keep it traceable.
> 
> Ferruh, Is that OK with you?

That is OK, if it is not easy to escape from it.


Re: [dpdk-dev] [PATCH v4 41/41] net/dpaa: support for extended statistics

2017-09-27 Thread Ferruh Yigit
On 9/27/2017 9:26 AM, Shreyansh Jain wrote:
> On Thursday 21 September 2017 06:56 PM, Shreyansh Jain wrote:
>> On Monday 18 September 2017 08:27 PM, Ferruh Yigit wrote:
>>> On 9/9/2017 12:21 PM, Shreyansh Jain wrote:
 From: Hemant Agrawal 

 Signed-off-by: Hemant Agrawal 
>>>
>>> <...>
>>>
 +static int
 +dpaa_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat 
 *xstats,
 +    unsigned int n)
 +{
 +    struct dpaa_if *dpaa_intf = dev->data->dev_private;
 +    unsigned int i = 0, num = RTE_DIM(dpaa_xstats_strings);
 +    uint64_t values[sizeof(struct dpaa_if_stats) / 8];
 +
 +    if (xstats == NULL)
 +    return 0;
>>>
>>> This is a little not clear from API definition, but I guess when xstats
>>> is NULL, it should return num of available stats, "num" for this case. I
>>> guess there are PMDs implements both, can you please double check?
>>
>> Ok. I will check again.
> 
> I checked a number of other ethev implementations. Some like i40e/e1000 
> also return 0 when xstats is NULL. Others, like bnx2x and qede don't 
> handle this situation.
> All return "num" when passed argument is larger than number of elements 
> in the table.
> 
> Though, I think the logic that get_xstats should return its size (num) 
> when passed with NULL, looks good to me.
> How does one standardize such semantics for existing APIs?

Thanks for checking, I guess first we should clarify the API and the
expected behavior [1] and later update required PMDs.

So for now I think PMD is OK as it is.


[1]
I double checked the rte_eth_xstats_get(). It does:

If xstats == NULL
xcount = dev_ops->xstats_get(dev, NULL, 0);
return count + xcount;

Intention looks like to returning number of available stats, otherwise
returning "count + 0" will be useless.

So it looks like expectation from eth_xstats_get_t for that case is
returning xstats size, but this not clear and not documented in API comment.

> 
> (I can add this info to the API document that you created - but only 
> once we know if others will agree to change)
> 
>>
>>>
 +
 +    if (n < num)
 +    return num;
 +
 +    fman_if_stats_get_all(dpaa_intf->fif, values,
 +  sizeof(struct dpaa_if_stats) / 8);
 +
 +    for (i = 0; i < num; i++) {
 +    xstats[i].id = i;
 +    xstats[i].value = values[dpaa_xstats_strings[i].offset / 8];
 +    }
 +    return i;
 +}
>>>
>>> <...>
>>>
>>
>>
> 



Re: [dpdk-dev] [PATCH 2/5] eal/ppc64: define architecture specific rdtsc hz

2017-09-27 Thread Chao Zhu
> -Original Message-
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> Sent: 2017年8月13日 15:04
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; bruce.richard...@intel.com;
> konstantin.anan...@intel.com; vikto...@rehivetech.com;
> jianbo@linaro.org; chao...@linux.vnet.ibm.com; Jerin Jacob
> 
> Subject: [dpdk-dev] [PATCH 2/5] eal/ppc64: define architecture specific
rdtsc hz
> 
> CC: Chao Zhu 
> Signed-off-by: Jerin Jacob 
> ---
>  lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
> b/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
> index 8fa6fc60b..20243fb29 100644
> --- a/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
> +++ b/lib/librte_eal/common/include/arch/ppc_64/rte_cycles.h
> @@ -79,6 +79,19 @@ rte_rdtsc(void)
>   return tsc.tsc_64;
>  }
> 
> +/**
> + * Get the number of rdtsc cycles in one second if the architecture
supports.
> + *
> + * @return
> + *   The number of rdtsc cycles in one second. Return zero if the
architecture
> + *   support is not available.
> + */
> +static inline uint64_t
> +rte_rdtsc_arch_hz(void)
> +{
> + return 0;
> +}
> +
>  static inline uint64_t
>  rte_rdtsc_precise(void)
>  {
> --
> 2.14.0
Acked-by: Chao Zhu 




Re: [dpdk-dev] [PATCH v2] examples/l3fwd: optimised packet processing on powerpc

2017-09-27 Thread Chao Zhu
> -Original Message-
> From: Gowrishankar [mailto:gowrishanka...@linux.vnet.ibm.com]
> Sent: 2017年9月21日 18:05
> To: dev@dpdk.org
> Cc: Chao Zhu ; Tomasz Kantecki
> ; Gowrishankar Muthukrishnan
> 
> Subject: [PATCH v2] examples/l3fwd: optimised packet processing on powerpc
> 
> From: Gowrishankar Muthukrishnan 
> 
> This patch adds altivec support for lpm packet processing in powerpc.
> 
> Signed-off-by: Gowrishankar Muthukrishnan
> 
> ---
> v2:
>  * coding style standards
> 
>  MAINTAINERS|   1 +
>  examples/l3fwd/l3fwd_altivec.h | 284
> +
>  examples/l3fwd/l3fwd_lpm.c |   5 +-
>  examples/l3fwd/l3fwd_lpm_altivec.h | 164 +
>  4 files changed, 453 insertions(+), 1 deletion(-)  create mode 100644
> examples/l3fwd/l3fwd_altivec.h  create mode 100644
> examples/l3fwd/l3fwd_lpm_altivec.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a0cd75e..0010b6c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -168,6 +168,7 @@ M: Chao Zhu 
>  F: lib/librte_eal/common/arch/ppc_64/
>  F: lib/librte_eal/common/include/arch/ppc_64/
>  F: drivers/net/i40e/i40e_rxtx_vec_altivec.c
> +F: examples/l3fwd/*altivec.h
> 
>  Intel x86
>  M: Bruce Richardson  diff --git
> a/examples/l3fwd/l3fwd_altivec.h b/examples/l3fwd/l3fwd_altivec.h new file
> mode 100644 index 000..29c2627
> --- /dev/null
> +++ b/examples/l3fwd/l3fwd_altivec.h
> @@ -0,0 +1,284 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2017 IBM Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> DAMAGE.
> + */
> +
> +
> +#ifndef _L3FWD_ALTIVEC_H_
> +#define _L3FWD_ALTIVEC_H_
> +
> +#include "l3fwd.h"
> +#include "l3fwd_common.h"
> +
> +/*
> + * Update source and destination MAC addresses in the ethernet header.
> + * Perform RFC1812 checks and updates for IPV4 packets.
> + */
> +static inline void
> +processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t
> +dst_port[FWDSTEP]) {
> + vector unsigned int te[FWDSTEP];
> + vector unsigned int ve[FWDSTEP];
> + vector unsigned int *p[FWDSTEP];
> +
> + p[0] = rte_pktmbuf_mtod(pkt[0], vector unsigned int *);
> + p[1] = rte_pktmbuf_mtod(pkt[1], vector unsigned int *);
> + p[2] = rte_pktmbuf_mtod(pkt[2], vector unsigned int *);
> + p[3] = rte_pktmbuf_mtod(pkt[3], vector unsigned int *);
> +
> + ve[0] = (vector unsigned int)val_eth[dst_port[0]];
> + te[0] = *p[0];
> +
> + ve[1] = (vector unsigned int)val_eth[dst_port[1]];
> + te[1] = *p[1];
> +
> + ve[2] = (vector unsigned int)val_eth[dst_port[2]];
> + te[2] = *p[2];
> +
> + ve[3] = (vector unsigned int)val_eth[dst_port[3]];
> + te[3] = *p[3];
> +
> + /* Update first 12 bytes, keep rest bytes intact. */
> + te[0] = (vector unsigned int)vec_sel(
> + (vector unsigned short)ve[0],
> + (vector unsigned short)te[0],
> + (vector unsigned short) {0, 0, 0, 0,
> + 0, 0, 0x, 0x});
> +
> + te[1] = (vector unsigned int)vec_sel(
> + (vector unsigned short)ve[1],
> + (vector unsigned short)te[1],
> + (vector unsigned short) {0, 0, 0, 0,
> +   

Re: [dpdk-dev] [PATCH v3] librte_eal: fix wrong assert for arm and ppc

2017-09-27 Thread Chao Zhu
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Lukasz Majczak
> Sent: 2017年9月25日 15:35
> To: dev@dpdk.org
> Cc: Lukasz Majczak 
> Subject: [dpdk-dev] [PATCH v3] librte_eal: fix wrong assert for arm and
ppc
> 
> The assertion of return value from the open() function is done against 0,
while it
> is a correct value - open() returns -1 in case of an error.
> It causes problems while trying to run as a daemon, in which case, this
call to
> open() will return 0 as a valid descriptor.
> 
> Fixes: b94e5c9406b5 ("eal/arm: add CPU flags for ARMv7")
> Fixes: 97523f822ba9 ("eal/arm: add CPU flags for ARMv8")
> Fixes: 9ae155385686 ("eal/ppc: cpu flag checks for IBM Power")
> 
> Signed-off-by: Lukasz Majczak 
> ---
>  lib/librte_eal/common/arch/arm/rte_cpuflags.c| 2 +-
>  lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
> b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
> index 5636e9c1d..88f1cbe37 100644
> --- a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
> +++ b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
> @@ -137,7 +137,7 @@ rte_cpu_get_features(hwcap_registers_t out)
>   _Elfx_auxv_t auxv;
> 
>   auxv_fd = open("/proc/self/auxv", O_RDONLY);
> - assert(auxv_fd);
> + assert(auxv_fd != -1);
>   while (read(auxv_fd, &auxv, sizeof(auxv)) == sizeof(auxv)) {
>   if (auxv.a_type == AT_HWCAP) {
>   out[REG_HWCAP] = auxv.a_un.a_val;
> diff --git a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
> b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
> index fcf96e045..970a61c5e 100644
> --- a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
> +++ b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
> @@ -108,7 +108,7 @@ rte_cpu_get_features(hwcap_registers_t out)
>   Elf64_auxv_t auxv;
> 
>   auxv_fd = open("/proc/self/auxv", O_RDONLY);
> - assert(auxv_fd);
> + assert(auxv_fd != -1);
>   while (read(auxv_fd, &auxv,
>   sizeof(Elf64_auxv_t)) == sizeof(Elf64_auxv_t)) {
>   if (auxv.a_type == AT_HWCAP)
> --
> 2.14.1
Acked-by: Chao Zhu 




Re: [dpdk-dev] [PATCH v2] doc: add shared library versions for missing PMDs

2017-09-27 Thread Yang, Zhiyong
Hi Ferruh,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, September 28, 2017 2:59 AM
> To: Mcnamara, John 
> Cc: dev@dpdk.org; Thomas Monjalon ; Yigit, Ferruh
> 
> Subject: [dpdk-dev] [PATCH v2] doc: add shared library versions for missing
> PMDs
> 
> Signed-off-by: Ferruh Yigit 
> Acked-by: John McNamara 
> ---

Acked-by: Zhiyong Yang 

>   librte_pipeline.so.3
> + librte_pmd_bnxt.so.1
>   librte_pmd_bond.so.1
> + librte_pmd_i40e.so.1
> + librte_pmd_ixgbe.so.1
>   librte_pmd_ring.so.2
> + librte_pmd_vhost.so.1
>   librte_port.so.3
>   librte_power.so.1
>   librte_reorder.so.1
> --
> 2.13.5



Re: [dpdk-dev] [PATCH v5 2/5] ethdev: increase port_id range

2017-09-27 Thread Yang, Zhiyong
Ferruh,

> There is a patch to add missing ones [1], I think it would be better to add 
> them
> with a separate patch.
> 
> But I missed the "librte_pmd_vhost" one, I will send a new version soon.
> 
> Also failsafe don't have any PMD specific API, no need to add it.
> 
> 
> [1]
> http://dpdk.org/dev/patchwork/patch/28598/
> 

http://dpdk.org/dev/patchwork/patch/28598/
http://www.dpdk.org/dev/patchwork/patch/28738/

Ok,  Once these two patches have been applied , 
I will rebase the increase port_id range patchset to send V6 out.

Thanks
Zhiyong


Re: [dpdk-dev] [PATCH v4 41/41] net/dpaa: support for extended statistics

2017-09-27 Thread Shreyansh Jain
Hi Ferruh,

> -Original Message-
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Thursday, September 28, 2017 5:07 AM
> To: Shreyansh Jain 
> Cc: dev@dpdk.org; Hemant Agrawal 
> Subject: Re: [PATCH v4 41/41] net/dpaa: support for extended statistics
> 
> On 9/27/2017 9:26 AM, Shreyansh Jain wrote:
> > On Thursday 21 September 2017 06:56 PM, Shreyansh Jain wrote:
> >> On Monday 18 September 2017 08:27 PM, Ferruh Yigit wrote:
> >>> On 9/9/2017 12:21 PM, Shreyansh Jain wrote:
>  From: Hemant Agrawal 
> 
>  Signed-off-by: Hemant Agrawal 
> >>>
> >>> <...>
> >>>
>  +static int
>  +dpaa_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat
>  *xstats,
>  +    unsigned int n)
>  +{
>  +    struct dpaa_if *dpaa_intf = dev->data->dev_private;
>  +    unsigned int i = 0, num = RTE_DIM(dpaa_xstats_strings);
>  +    uint64_t values[sizeof(struct dpaa_if_stats) / 8];
>  +
>  +    if (xstats == NULL)
>  +    return 0;
> >>>
> >>> This is a little not clear from API definition, but I guess when xstats
> >>> is NULL, it should return num of available stats, "num" for this case. I
> >>> guess there are PMDs implements both, can you please double check?
> >>
> >> Ok. I will check again.
> >
> > I checked a number of other ethev implementations. Some like i40e/e1000
> > also return 0 when xstats is NULL. Others, like bnx2x and qede don't
> > handle this situation.
> > All return "num" when passed argument is larger than number of elements
> > in the table.
> >
> > Though, I think the logic that get_xstats should return its size (num)
> > when passed with NULL, looks good to me.
> > How does one standardize such semantics for existing APIs?
> 
> Thanks for checking, I guess first we should clarify the API and the
> expected behavior [1] and later update required PMDs.
> 
> So for now I think PMD is OK as it is.
> 
> 
> [1]
> I double checked the rte_eth_xstats_get(). It does:
> 
> If xstats == NULL
>   xcount = dev_ops->xstats_get(dev, NULL, 0);
>   return count + xcount;
> 
> Intention looks like to returning number of available stats, otherwise
> returning "count + 0" will be useless.
 
Makes sense. I missed this and kept looking for implementations.
I will at least fix dpaa code.

> 
> So it looks like expectation from eth_xstats_get_t for that case is
> returning xstats size, but this not clear and not documented in API comment.
> 
> >
> > (I can add this info to the API document that you created - but only
> > once we know if others will agree to change)
 
Probably this info should be in Doxygen APIs [2].

[2] 
http://dpdk.org/doc/api/rte__ethdev_8h.html#adad5c65f659487db1fefba7d7d902973

> >
> >>
> >>>
>  +
>  +    if (n < num)
>  +    return num;
>  +
>  +    fman_if_stats_get_all(dpaa_intf->fif, values,
>  +  sizeof(struct dpaa_if_stats) / 8);
>  +
>  +    for (i = 0; i < num; i++) {
>  +    xstats[i].id = i;
>  +    xstats[i].value = values[dpaa_xstats_strings[i].offset / 8];
>  +    }
>  +    return i;
>  +}
> >>>
> >>> <...>
> >>>
> >>
> >>
> >



[dpdk-dev] [PATCH] examples/l3fwd: pass flow arguments when starting l3fwd

2017-09-27 Thread Xiaoyun Li
When the number of free descriptors goes below the LRXQTRESH, an immediate
interrupt is triggered. And lots of interrupts cause performance drop. This
patch enables to pass flow arguments when starting l3fwd example.

Signed-off-by: Xiaoyun Li 
---
 examples/l3fwd/main.c | 191 +-
 1 file changed, 189 insertions(+), 2 deletions(-)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 81995fd..1a265a8 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -83,6 +83,7 @@
  */
 #define RTE_TEST_RX_DESC_DEFAULT 128
 #define RTE_TEST_TX_DESC_DEFAULT 512
+#define RTE_PMD_PARAM_UNSET -1
 
 #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS
 #define MAX_RX_QUEUE_PER_PORT 128
@@ -92,6 +93,9 @@
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
 static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
+static int16_t rx_free_thresh = RTE_PMD_PARAM_UNSET;
+static int16_t tx_free_thresh = RTE_PMD_PARAM_UNSET;
+static int16_t tx_rs_thresh = RTE_PMD_PARAM_UNSET;
 
 /**< Ports set in promiscuous mode off by default. */
 static int promiscuous_on;
@@ -320,7 +324,12 @@ print_usage(const char *prgname)
" [--no-numa]"
" [--hash-entry-num]"
" [--ipv6]"
-   " [--parse-ptype]\n\n"
+   " [--parse-ptype]"
+   " [--nb-rxd]"
+   " [--nb-txd]"
+   " [--rx-free-thresh]"
+   " [--tx-free-thresh]"
+   " [--tx-rs-thresh]\n\n"
 
"  -p PORTMASK: Hexadecimal bitmask of ports to configure\n"
"  -P : Enable promiscuous mode\n"
@@ -334,7 +343,12 @@ print_usage(const char *prgname)
"  --no-numa: Disable numa awareness\n"
"  --hash-entry-num: Specify the hash entry number in 
hexadecimal to be setup\n"
"  --ipv6: Set if running ipv6 packets\n"
-   "  --parse-ptype: Set to use software to analyze packet 
type\n\n",
+   "  --parse-ptype: Set to use software to analyze packet 
type\n\n"
+   "  --nb-rxd: Set number of descriptors of Rx queue\n"
+   "  --nb-txd: Set number of descriptors of Tx queue\n"
+   "  --rx-free-thresh: Set value of Tx free threshold\n"
+   "  --tx-free-thresh: Set value of Rx free threshold\n"
+   "  --tx-rs-thresh: Set value of Tx RS bit threshold\n\n",
prgname);
 }
 
@@ -389,6 +403,91 @@ parse_hash_entry_number(const char *hash_entry_num)
 }
 
 static int
+parse_nb_rxd(const char *nb_rxd_c)
+{
+   char *end = NULL;
+   unsigned int nb_rxd_t;
+
+   /* parse hexadecimal string */
+   nb_rxd_t = strtoul(nb_rxd_c, &end, 10);
+   if ((nb_rxd_c[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   if (nb_rxd_t == 0)
+   return -1;
+
+   return nb_rxd_t;
+}
+
+static int
+parse_nb_txd(const char *nb_txd_c)
+{
+   char *end = NULL;
+   unsigned int nb_txd_t;
+
+   /* parse hexadecimal string */
+   nb_txd_t = strtoul(nb_txd_c, &end, 10);
+   if ((nb_txd_c[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   if (nb_txd_t == 0)
+   return -1;
+
+   return nb_txd_t;
+}
+
+static int
+parse_rx_free_thresh(const char *rx_free_thresh_c)
+{
+   char *end = NULL;
+   unsigned int rx_free_thresh_t;
+
+   /* parse hexadecimal string */
+   rx_free_thresh_t = strtoul(rx_free_thresh_c, &end, 10);
+   if ((rx_free_thresh_c[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   if (rx_free_thresh_t == 0)
+   return -1;
+
+   return rx_free_thresh_t;
+}
+
+static int
+parse_tx_free_thresh(const char *tx_free_thresh_c)
+{
+   char *end = NULL;
+   unsigned int tx_free_thresh_t;
+
+   /* parse hexadecimal string */
+   tx_free_thresh_t = strtoul(tx_free_thresh_c, &end, 16);
+   if ((tx_free_thresh_c[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   if (tx_free_thresh_t == 0)
+   return -1;
+
+   return tx_free_thresh_t;
+}
+
+static int
+parse_tx_rs_thresh(const char *tx_rs_thresh_c)
+{
+   char *end = NULL;
+   unsigned int tx_rs_thresh_t;
+
+   /* parse hexadecimal string */
+   tx_rs_thresh_t = strtoul(tx_rs_thresh_c, &end, 10);
+   if ((tx_rs_thresh_c[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return -1;
+
+   if (tx_rs_thresh_t == 0)
+   return -1;
+
+   return tx_rs_thresh_t;
+}
+
+static int
 parse_config(const char *q_arg)
 {
char s[256];
@@ -487,6 +586,11 @@ static const char short_options[] =
 #define CMD_LINE_OPT_ENABLE_JUMBO "enable-jumbo"
 #define CMD_LINE_OPT_HASH_ENTRY_NUM "hash-entry-num"
 #define CMD_LINE_OPT_PARSE_PTYPE "parse-ptype"
+#define CMD_LINE_OPT_NB_RXD "nb-rxd"
+#def

Re: [dpdk-dev] [PATCH v3 1/2] net/i40e: queue region set and flush

2017-09-27 Thread Zhao1, Wei
Thank you.

> -Original Message-
> From: Yigit, Ferruh
> Sent: Thursday, September 28, 2017 3:13 AM
> To: Zhao1, Wei ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/2] net/i40e: queue region set and flush
> 
> On 9/26/2017 9:54 AM, Zhao1, Wei wrote:
> > Hi,  Ferruh
> 
> Hi Wei,
> 
> >> -Original Message-
> >> From: Yigit, Ferruh
> >> Sent: Wednesday, September 20, 2017 6:36 PM
> >> To: Zhao1, Wei ; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 1/2] net/i40e: queue region set and
> > flush
> >>
> >> On 9/15/2017 4:13 AM, Wei Zhao wrote:
> >> > This feature enable queue regions configuration for RSS in PF/VF,
> >> > so that different traffic classes or different packet
> >> > classification types can be separated to different queues in
> >> > different queue regions.This patch can set queue region range, it
> >> > include queue number in a region and the index of first queue.
> >>
> >> Is following correct:
> >> So instead of distributing packets to the multiple queues, this will
> > distribute
> >> packets into queue reqions which may consists of multiple queues.
> >>
> >> If so, is there a way to control how packets distributed within same
> >> queue region to multiple queues?
> >>
> >> And is this feature only supported with RSS? Can it be part of RSS
> >> configuration instead of PMD specific API?
> >>
> >> > This patch enable mapping between different priorities (UP) and
> >>
> >> User priorities (UP)
> >>
> >> > different traffic classes.It also enable mapping between a region
> >> > index and a sepcific flowtype(PCTYPE).It also provide the solution
> >> > of flush all configuration about queue region the above described.
> >> >
> >> > Signed-off-by: Wei Zhao  > >
> >> > ---
> >> >  drivers/net/i40e/i40e_ethdev.c    |  19 +-
> >> >  drivers/net/i40e/i40e_ethdev.h    |  30 ++
> >> >  drivers/net/i40e/rte_pmd_i40e.c   | 482
> >> ++
> >> >  drivers/net/i40e/rte_pmd_i40e.h   |  38 +++
> >> >  drivers/net/i40e/rte_pmd_i40e_version.map |   1 +
> >> >  5 files changed, 566 insertions(+), 4 deletions(-)
> >> >
> >>
> >> <...>
> >>
> >> > +static int
> >> > +i40e_vsi_update_queue_region_mapping(struct i40e_hw *hw,
> >> > +    struct i40e_pf *pf) {
> >> > +  uint16_t i;
> >> > +  struct i40e_vsi *vsi = pf->main_vsi;
> >> > +  uint16_t queue_offset, bsf, tc_index;
> >> > +  struct i40e_vsi_context ctxt;
> >> > +  struct i40e_aqc_vsi_properties_data *vsi_info;
> >> > +  struct i40e_queue_region_info *region_info =
> >> > +
> > &pf->queue_region;
> >> > +  uint32_t ret = -EINVAL;
> >> > +
> >> > +  if (!region_info->queue_region_number) {
> >
> > 
> >
> >> > +int rte_pmd_i40e_queue_region_conf(uint8_t port,
> >> > +  struct
> > rte_i40e_rss_region_conf *conf_ptr) {
> >> > +  struct rte_eth_dev *dev = &rte_eth_devices[port];
> >>
> >> you need to verify port_id, since this is public API now. Please
> >> check
> > other
> >> APIs in this file.
> >
> > I have already "if (!is_i40e_supported(dev))" code in v3 in function
> >  rte_pmd_i40e_queue_region_conf.
> >
> > So, I do not know what is your meaning.
> 
> RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> 
> Normally we don't need to worry about this in PMD because abstraction
> layer APIs do this check already. But since this is public API, user can give 
> any
> value as port_id parameter, better to verify it.
> 
> >
> >>
> >> > +  struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data-
> >> >dev_private);
> >> > +  struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> >> >dev_private);
> >> > +  enum rte_pmd_i40e_queue_region_op op_type = conf_ptr->op;
> >> > +  uint32_t ret;
> >>
> >> This should be signed variable, since you are using it for return and
> > assigning
> >> negative values.
> >>
> >> > +
> >> > +  if (!is_i40e_supported(dev))
> >> > +  return -ENOTSUP;
> >> > +
> >> > +  switch (op_type) {
> >> > +  case RTE_PMD_I40E_QUEUE_REGION_SET:
> >> > +  ret = i40e_set_queue_region(pf, conf_ptr);
> >> > +  break;
> >>


[dpdk-dev] [PATCH v2] net/ixgbe: fix VFIO interrupt mapping in VF

2017-09-27 Thread Wei Dai
When a VF port is bound to VFIO-PIC, only miscellaneous interrupt
is mapped to VFIO vector 0 in eth_ixgbevf_dev_init( ).
In ixgbevf_dev_start(), if previous VFIO interrupt mapping set in
eth_ixgbevf_dev_init( ) is not cleard, it will fail when calling
rte_intr_enable( ) tries to map Rx queue interrupt to other VFIO
vectors. This patch clears the VFIO interrupt mappings before
setting both miscellaneous and Rx queue interrupt mappings again
to avoid failure.

Fixes: 77234603fba0 ("net/ixgbe: support VF mailbox interrupt for link up/down")
Cc: sta...@dpdk.org

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 9ca5cbc..f49c616 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5046,6 +5046,15 @@ ixgbevf_dev_start(struct rte_eth_dev *dev)
}
ixgbevf_configure_msix(dev);
 
+   /* When a VF port is bound to VFIO-PCI, only miscellaneous interrupt
+* is mapped to VFIO vector 0 in eth_ixgbevf_dev_init( ).
+* If previous VFIO interrupt mapping setting in eth_ixgbevf_dev_init( )
+* is not cleared, it will fail when following rte_intr_enable( ) tries
+* to map Rx queue interrupt to other VFIO vectors.
+* So clear uio/vfio intr/evevnfd first to avoid failure.
+*/
+   rte_intr_disable(intr_handle);
+
rte_intr_enable(intr_handle);
 
/* Re-enable interrupt for VF */
-- 
2.7.5



Re: [dpdk-dev] [PATCH v4 41/41] net/dpaa: support for extended statistics

2017-09-27 Thread Shreyansh Jain
> -Original Message-
> From: Shreyansh Jain
> Sent: Thursday, September 28, 2017 7:59 AM
> To: 'Ferruh Yigit' 
> Cc: dev@dpdk.org; Hemant Agrawal 
> Subject: RE: [PATCH v4 41/41] net/dpaa: support for extended statistics
> 
> Hi Ferruh,
> 
> > -Original Message-
> > From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> > Sent: Thursday, September 28, 2017 5:07 AM
> > To: Shreyansh Jain 
> > Cc: dev@dpdk.org; Hemant Agrawal 
> > Subject: Re: [PATCH v4 41/41] net/dpaa: support for extended statistics
> >
> > On 9/27/2017 9:26 AM, Shreyansh Jain wrote:
> > > On Thursday 21 September 2017 06:56 PM, Shreyansh Jain wrote:
> > >> On Monday 18 September 2017 08:27 PM, Ferruh Yigit wrote:
> > >>> On 9/9/2017 12:21 PM, Shreyansh Jain wrote:
> >  From: Hemant Agrawal 
> > 
> >  Signed-off-by: Hemant Agrawal 
> > >>>
> > >>> <...>
> > >>>
> >  +static int
> >  +dpaa_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat
> >  *xstats,
> >  +    unsigned int n)
> >  +{
> >  +    struct dpaa_if *dpaa_intf = dev->data->dev_private;
> >  +    unsigned int i = 0, num = RTE_DIM(dpaa_xstats_strings);
> >  +    uint64_t values[sizeof(struct dpaa_if_stats) / 8];
> >  +
> >  +    if (xstats == NULL)
> >  +    return 0;
> > >>>
> > >>> This is a little not clear from API definition, but I guess when xstats
> > >>> is NULL, it should return num of available stats, "num" for this case.
> I
> > >>> guess there are PMDs implements both, can you please double check?
> > >>
> > >> Ok. I will check again.
> > >
> > > I checked a number of other ethev implementations. Some like i40e/e1000
> > > also return 0 when xstats is NULL. Others, like bnx2x and qede don't
> > > handle this situation.
> > > All return "num" when passed argument is larger than number of elements
> > > in the table.
> > >
> > > Though, I think the logic that get_xstats should return its size (num)
> > > when passed with NULL, looks good to me.
> > > How does one standardize such semantics for existing APIs?
> >
> > Thanks for checking, I guess first we should clarify the API and the
> > expected behavior [1] and later update required PMDs.
> >
> > So for now I think PMD is OK as it is.
> >
> >
> > [1]
> > I double checked the rte_eth_xstats_get(). It does:
> >
> > If xstats == NULL
> > xcount = dev_ops->xstats_get(dev, NULL, 0);
> > return count + xcount;
> >
> > Intention looks like to returning number of available stats, otherwise
> > returning "count + 0" will be useless.
> 
> Makes sense. I missed this and kept looking for implementations.
> I will at least fix dpaa code.
 
On a second though: there might be another issue.
Application calls rte_eth_xstats_get_names and finds that 'N' xstats exist.
Thereafter, in a call to rte_eth_xstats_get, xstats==NULL but n=N, so the API 
would return:

if (n < count + xcount || xstats == NULL)   
return count + xcount;

'count' is size of generic stats. If drivers->xstats_get were to return 
xcount='N', the application would think that it has got a positive response.
See the doxygen page [3] - it states:

--
Returns:
* A positive value lower or equal to size: success.
  The return value is the number of entries filled in the
  stats table
--

There might be a case where the generic stats are exactly equal to xstats size 
and application would attempt to access the array.

I am not even sure why the xstats_get is returning (count + xcount) when the 
API definition doesn't say that generic+xstat is returned.
Am I missing something?

[3] 
http://dpdk.org/doc/api/rte__ethdev_8h.html#adad5c65f659487db1fefba7d7d902973

> 
> >
> > So it looks like expectation from eth_xstats_get_t for that case is
> > returning xstats size, but this not clear and not documented in API
> comment.
> >
> > >
> > > (I can add this info to the API document that you created - but only
> > > once we know if others will agree to change)
> 
> Probably this info should be in Doxygen APIs [2].
> 
> [2]
> http://dpdk.org/doc/api/rte__ethdev_8h.html#adad5c65f659487db1fefba7d7d902973
> 



Re: [dpdk-dev] [PATCH 10/12] vhost: support to kick in secondary process

2017-09-27 Thread Tan, Jianfeng


> -Original Message-
> From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> Sent: Wednesday, September 27, 2017 5:36 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; mtetsu...@gmail.com
> Subject: Re: [PATCH 10/12] vhost: support to kick in secondary process
> 
> On Fri, Sep 22, 2017 at 02:30:21AM +, Tan, Jianfeng wrote:
> >
> >
> > > -Original Message-
> > > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > > Sent: Thursday, September 21, 2017 5:18 PM
> > > To: Tan, Jianfeng
> > > Cc: dev@dpdk.org; maxime.coque...@redhat.com;
> mtetsu...@gmail.com
> > > Subject: Re: [PATCH 10/12] vhost: support to kick in secondary process
> > >
> > > On Thu, Sep 21, 2017 at 03:04:39PM +0800, Tan, Jianfeng wrote:
> > > > >On Fri, Aug 25, 2017 at 09:40:50AM +, Jianfeng Tan wrote:
> > > > >>To support kick in secondary process, we propose callfd_pri and
> > > > >>kickfd_pri to store the value in primary process; and by a new
> > > > >>API, rte_vhost_set_vring_effective_fd(), we can set effective
> > > > >>callfd and kickfd which can be used by secondary process.
> > > > >>
> > > > >>Note in this case, either primary process or the secondary process
> > > > >>can kick the frontend; that is, they cannot kick a vring at the
> > > > >>same time.
> > > > >Since only one can work, why not just overwriting the fd? Say, you
> > > > >could introudce some APIs like "rte_vhost_set_vring_callfd", then
> > > > >you don't need to introduce few more fields like "callfd_pri".
> > > >
> > > > That cannot address the below case:
> > > > 1. Primary starts;
> > > > 2. Secondary one starts; (if we overwrite it without storing it in some
> > > > other fields)
> > > > 3. Secondary one exits;
> > > > 4. Secondary two starts. (primary cannot share the fd with this
> secondary
> > > > process now, as this fd does not mean anything to the primary process)
> > >
> > > I was thinking that those fds will be retrieved by the primary process
> > > once? So thsoe it got at beginning are still valid?
> >
> > Yes, the FDs are valid to primary process at step 1. But After overwriting 
> > in
> step 2, those FDs are not valid to primary.
> 
> Yes, but the primary process has already got its correct fds saved, right?
> With the saved fds, it then could share it with another secondary process.
> 
> Actually, the key (and typical) issue of multi-process here is the fds are
> process specific, while they are stored in the shared memory. That means
> only one will take effect eventually. Worse, the old ones are lost.
> 
> So, I think to make it right in this case, you should move the fds from
> the shared memory and store them in the memory of the corresponding
> process.
> If that's done, all processes could have its own valid fds, then every
> process could do the kick (if that's really necessary).

Agreed, that can reduce the application's effort to decide which process should 
set the FDs. Will try to fix that in the coming version.

Thanks,
Jianfeng

> 
> You could check following commit for more info.
> 553f45932fb7 ("net/virtio: store PCI operators pointer locally")
> 
>   --yliu


[dpdk-dev] dpdk was compiled with CONFIG_RTE_LIBRTE_VHOST_NUMA=n , but on ovs-2.8 compilation, "configure: error: unable to find libnuma, install the dependency package "

2017-09-27 Thread Joo Kim
Hello,


My dpdk was compiled with   CONFIG_RTE_LIBRTE_VHOST_NUMA=n  (which is a
default in config/common_base).

But, I see following error when I  compile ovs-2.8 with dpdk enabled. Is
this expected?   Also ignorable error?

"
. . .
checking for library containing get_mempolicy... no
configure: error: unable to find libnuma, install the dependency package
"


[dpdk-dev] [PATCH v4] librte_eal: fix wrong assert for arm and ppc

2017-09-27 Thread Lukasz Majczak
The assertion of return value from the open() function is done against
0, while it is a correct value - open() returns -1 in case of an error.
It causes problems while trying to run as a daemon, in which case, this
call to open() will return 0 as a valid descriptor.

Fixes: b94e5c9406b5 ("eal/arm: add CPU flags for ARMv7")
Fixes: 97523f822ba9 ("eal/arm: add CPU flags for ARMv8")
Fixes: 9ae155385686 ("eal/ppc: cpu flag checks for IBM Power")

Signed-off-by: Lukasz Majczak 
Acked-by: Jan Viktorin 
Acked-by: Chao Zhu 
---
 lib/librte_eal/common/arch/arm/rte_cpuflags.c| 2 +-
 lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/arch/arm/rte_cpuflags.c 
b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
index 5636e9c1d..88f1cbe37 100644
--- a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
@@ -137,7 +137,7 @@ rte_cpu_get_features(hwcap_registers_t out)
_Elfx_auxv_t auxv;
 
auxv_fd = open("/proc/self/auxv", O_RDONLY);
-   assert(auxv_fd);
+   assert(auxv_fd != -1);
while (read(auxv_fd, &auxv, sizeof(auxv)) == sizeof(auxv)) {
if (auxv.a_type == AT_HWCAP) {
out[REG_HWCAP] = auxv.a_un.a_val;
diff --git a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c 
b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
index fcf96e045..970a61c5e 100644
--- a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
@@ -108,7 +108,7 @@ rte_cpu_get_features(hwcap_registers_t out)
Elf64_auxv_t auxv;
 
auxv_fd = open("/proc/self/auxv", O_RDONLY);
-   assert(auxv_fd);
+   assert(auxv_fd != -1);
while (read(auxv_fd, &auxv,
sizeof(Elf64_auxv_t)) == sizeof(Elf64_auxv_t)) {
if (auxv.a_type == AT_HWCAP)
-- 
2.14.1