Re: [dpdk-dev] [PATCH v3 2/2] gro: support VxLAN GRO

2017-12-22 Thread Chen, Junjie J
Hi Jiayu

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, December 22, 2017 3:26 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng ; Chen, Junjie J
> ; Ananyev, Konstantin
> ; step...@networkplumber.org; Yigit,
> Ferruh ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v3 2/2] gro: support VxLAN GRO
> 
> This patch adds a framework that allows GRO on tunneled packets.
> Furthermore, it leverages that framework to provide GRO support for
> VxLAN-encapsulated packets. Supported VxLAN packets must have an outer
> IPv4 header, and contain an inner TCP/IPv4 packet.
> 
> VxLAN GRO doesn't check if input packets have correct checksums and
> doesn't update checksums for output packets. Additionally, it assumes the
> packets are complete (i.e., MF==0 && frag_off==0), when IP fragmentation is
> possible (i.e., DF==0).
> 
> Signed-off-by: Jiayu Hu 
> ---
>  .../prog_guide/generic_receive_offload_lib.rst |  31 +-
>  lib/librte_gro/Makefile|   1 +
>  lib/librte_gro/gro_vxlan_tcp4.c| 515
> +
>  lib/librte_gro/gro_vxlan_tcp4.h| 184 
>  lib/librte_gro/rte_gro.c   | 129 +-
>  lib/librte_gro/rte_gro.h   |   5 +-
>  6 files changed, 837 insertions(+), 28 deletions(-)
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.c
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.h
> 
> diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> index c2d7a41..078bec0 100644
> --- a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> +++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> @@ -57,7 +57,9 @@ assumes the packets are complete (i.e., MF==0 && frag_off==0), when IP
>  fragmentation is possible (i.e., DF==0). Additionally, it complies RFC
>  6864 to process the IPv4 ID field.
> 
> -Currently, the GRO library provides GRO supports for TCP/IPv4 packets.
> +Currently, the GRO library provides GRO supports for TCP/IPv4 packets
> +and VxLAN packets which contain an outer IPv4 header and an inner
> +TCP/IPv4 packet.
> 
>  Two Sets of API
>  ---
> @@ -108,7 +110,8 @@ Reassembly Algorithm
> 
>  The reassembly algorithm is used for reassembling packets. In the GRO
>  library, different GRO types can use different algorithms. In this
> -section, we will introduce an algorithm, which is used by TCP/IPv4 GRO.
> +section, we will introduce an algorithm, which is used by TCP/IPv4 GRO
> +and VxLAN GRO.
> 
>  Challenges
>  ~~
> @@ -185,6 +188,30 @@ Header fields deciding if two packets are neighbors include:
>  - IPv4 ID. The IPv4 ID fields of the packets, whose DF bit is 0, should
>be increased by 1.
> 
> +VxLAN GRO
> +-
> +
> +The table structure used by VxLAN GRO, which is in charge of processing
> +VxLAN packets with an outer IPv4 header and inner TCP/IPv4 packet, is
> +similar with that of TCP/IPv4 GRO. Differently, the header fields used
> +to define a VxLAN flow include:
> +
> +- outer source and destination: Ethernet and IP address, UDP port
> +
> +- VxLAN header (VNI and flag)
> +
> +- inner source and destination: Ethernet and IP address, TCP port
> +
> +Header fields deciding if packets are neighbors include:
> +
> +- outer IPv4 ID. The IPv4 ID fields of the packets, whose DF bit in the
> +  outer IPv4 header is 0, should be increased by 1.
> +
> +- inner TCP sequence number
> +
> +- inner IPv4 ID. The IPv4 ID fields of the packets, whose DF bit in the
> +  inner IPv4 header is 0, should be increased by 1.
> +
>  .. note::
>  We comply RFC 6864 to process the IPv4 ID field. Specifically,
>  we check IPv4 ID fields for the packets whose DF bit is 0 and
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index eb423cc..0110455 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -45,6 +45,7 @@ LIBABIVER := 1
>  # source files
>  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
>  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_vxlan_tcp4.c
> 
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/gro_vxlan_tcp4.c b/lib/librte_gro/gro_vxlan_tcp4.c
> new file mode 100644
> index 000..6567779
> --- /dev/null
> +++ b/lib/librte_gro/gro_vxlan_tcp4.c
> @@ -0,0 +1,515 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the followi

Re: [dpdk-dev] [RFC v2 1/5] ether: add flow action to redirect packet in a switch domain

2017-12-22 Thread Zhang, Qi Z
Hi Alex

> -----Original Message-----
> From: Alex Rosenbaum [mailto:rosenbauma...@gmail.com]
> Sent: Thursday, December 21, 2017 8:37 PM
> To: Zhang, Qi Z 
> Cc: adrien.mazarg...@6wind.com; DPDK ; Doherty, Declan
> 
> Subject: Re: [dpdk-dev] [RFC v2 1/5] ether: add flow action to redirect packet
> in a switch domain
> 
> On Thu, Dec 21, 2017 at 4:35 AM, Qi Zhang  wrote:
> > Add action RTE_FLOW_ACTION_TYPE_SWITCH_PORT, it can be used to
> > redirect
> 
> I guess the word "SWITCH" should be removed from the commit message. You
> don't use it later in the patch.

Yes, it should be corrected.
> 
> 
> >
> > +Action: ``PORT``
> > +
> > +
> > +Redirect packets to an interface that connect to the same switch domain.
> > +
> > +The destination should be managed by a rte_ethdev instance, port_id
> > +is the identification of the destination. A typical use case is to
> > +define a flow that redirect packet to an interface that managed by a
> > +Port Representor.
> 
> 
> A verb would be better suited for an ACTION_TYPE, while ".._TYPE_PORT" is
> a noun.
> Probably ".._TYPE_REDIRECT" would fit better here.
> See man tc-mirred as a reference:
> http://man7.org/linux/man-pages/man8/tc-mirred.8.html

I agree it would be better to use verbs for actions, so we could have
TYPE_REDIRECT_TO_PORT/VF/PF...
But since we already have ACTION_TYPE_VF, ACTION_TYPE_PF, ...
maybe it's better just to follow the same pattern?

> 
> Do we want to distinguish between different destination type?
> The target might be a port (port_id) or potentially other destinations/queues.
> So maybe use ".._TYPE_REDIRECT_TO_PORT"?
> 
> Anyway, I think you should remove the "same switch domain" from docs
> since there is no switch domain yet in DPDK.
> Let's let the PMD decide whether this succeeds or fails, based on the target type
> and other HW limitations, not just based on switch domain.

Yes, it's not necessary to be specific here. The new action just adds the
semantics to support packet redirection between ports that are managed by
ethdevs; the device driver can figure out how to do it or just reject it.
I will capture this in v3.
> 
> PS: I agree switch domain needs to be introduced. I don't think port
> representor is the correct direction.

OK, thanks for sharing. I think this can be discussed further on the Port
Representor mail list.

> 
> Alex

Thanks
Qi


Re: [dpdk-dev] [PATCH v2 18/18] doc: remove devargs deprecation notices

2017-12-22 Thread Gaëtan Rivet
Hi Shreyansh,

On Fri, Dec 22, 2017 at 10:29:14AM +0530, Shreyansh Jain wrote:
> Hello Gaetan,
> 
> On Wednesday 13 December 2017 04:24 PM, Shreyansh Jain wrote:
> > Hello Gaetan,
> > 
> > > -----Original Message-----
> > > From: Gaëtan Rivet [mailto:gaetan.ri...@6wind.com]
> > > Sent: Wednesday, December 13, 2017 3:56 PM
> > > To: Shreyansh Jain 
> > > Cc: dev@dpdk.org
> > > Subject: Re: [PATCH v2 18/18] doc: remove devargs deprecation notices
> > > 
> 
> [...]
> 
> > While reading through the code, I also had the same feeling - there can be 
> > corner cases in the parsing functions which I can't imagine. Anyways, those 
> > need to be runtime-verified - static reviews may not suffice.
> > 
> > > 
> > > I would certainly appreciate if you are able to fix the pci / vdev
> > > limitation in rte_eal_dev_attach, as I am starting to be overwhelmed
> > > with work (trying to finish a lot of things before the holidays).
> > OK.
> > Once you give the devargs a push, I will start work on the PCI removal from 
> > rte_eal_dev_attach. Before that, I just want to be sure of devargs with 
> > non-PCI bus (non hotplug case).
> > 
> > And, thanks for tons of work you are handling. I saw the patches and really 
> > appreciate how you have split things up in sequential manner per-patch. It 
> > is difficult.
> 
> Have you pushed the new version of the devargs patches?
> Just wanted to check in case I have missed it.
> 

No sorry,

I am removing the dependency on the bus control framework from the
devargs patchset.

I finished the patchset otherwise (redid the unit test), but after some
thinking I found that the bus control was maybe not ideal.

I will send the devargs patchset soon, once it is made independent, and
will expose the issue with the bus control.

Best,

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH] ethdev: add notifications for probing and removal

2017-12-22 Thread Thomas Monjalon
22/12/2017 04:17, Ferruh Yigit:
> On 11/28/2017 2:13 PM, Thomas Monjalon wrote:
> > When a PMD finishes probing, it creates the new port by calling
> > the function rte_eth_dev_allocate().
> > A notification of the new port is sent there to the upper layer.
> > 
> > When a PMD finishes removal of a port, it calls the function
> > rte_eth_dev_release_port().
> > A notification of the destroyed port is sent there to the upper layer.
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> Reviewed-by: Ferruh Yigit 
> 
> > ---
> > 
> > This patch depends on:
> > - ethdev: remove useless parameter in callback process
> > - ethdev: free a port by a dedicated API
> 
> What do you think about pulling that patch from the port ownership patchset, which is
> still under discussion, into this one? Is it required for the port ownership one?

It can be used with port ownership, but they are two separate things.


Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Thomas Monjalon
Hi,

22/12/2017 06:57, Hemant Agrawal:
> This patch moves the Linux kernel modules code to a common place.
>  - Separate the kernel module code from user space code.
>  - The GPL-2.0 licensed code is separated from the BSD-3 licensed userspace
>code

What is the benefit of separating things by license?
These modules are Linux modules, so they should be in the linuxapp dir.
There are also some kernel modules in the bsdapp directory.



Re: [dpdk-dev] standardize device identification

2017-12-22 Thread Thomas Monjalon
22/12/2017 08:01, Shreyansh Jain:
> On Thursday 21 December 2017 03:32 AM, Thomas Monjalon wrote:
> > Changing the title and adding more comments inline:
> > 
> > 19/12/2017 00:05, Thomas Monjalon:
> >> Let's summarize and resume this thread.
> >>
> >> We need a generic syntax to describe a device.
> >> This syntax can be used
> >>- before initializing the device (i.e. whitelist/blacklist)
> >>- or after the initialization (e.g. user config)
> >>
> >> We need to answer 4 questions:
> >> 1/ what are the separators (comma, colon, etc)?
> >> 2/ how to distinguish a device identification from a configuration?
> >> 3/ what are the mandatory parts?
> >> 4/ what can be the optional properties?
> >>
> >> 30/11/2017 08:35, Yuanhan Liu:
> >>> What this patch proposes is to use "name[,mac]" syntax. "name" is the
> >>> PCI id for pci device. For vdev, it's the vdev name given by user. The
> >>> reason "mac" is needed is for some devices (say ConnectX-3), 2 ports
> >>> (in a single NIC) have the same PCI id.
> >>
> >> Based on the feedback we had, I suggest a syntax where everything is
> >> optional key/value pairs, and split in 3 categories:
> >>- bus (pci, vdev, vmbus, fslmc, etc)
> >>- class (eth, crypto)
> >>- driver (i40e, mlx5, virtio, etc)
> > 
> > The key/value pair describing the category scope is mandatory
> > and must be the first pair in the category properties.
> > Example: bus=pci, must be placed before id=:01:00.0
> > 
> >> Between categories, the separator is a slash.
> 
> Why is a '/' required as a separator? Are you expecting the key in 
> key,value pair to duplicate across categories?

We need to separate categories because each category is parsed by
a different parser.
The first level parser will get strings for each category and will
call the appropriate parser (by going through different levels of
bus parsers for bus and driver categories).
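
As a rough illustration of that first-level split (not code from any patch in this thread), the sketch below breaks a devargs string into categories on '/', into pairs on ',', and into key and value on '='. The names parse_devargs and struct devarg_pair are hypothetical, and the fixed buffer sizes are arbitrary assumptions; a real implementation would hand each category string to the bus, class or driver parser instead of copying pairs into an array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed key/value pair, tagged with the category it belongs to. */
struct devarg_pair {
	char category[16]; /* first key of the category: bus, class, driver */
	char key[32];
	char value[64];
};

/*
 * Hypothetical first-level parser.
 * Returns the number of pairs stored in out[], or -1 on malformed input
 * (item without '=', too many pairs, or string too long for the buffer).
 */
static int
parse_devargs(const char *devargs, struct devarg_pair *out, int max_pairs)
{
	char buf[256];
	char *cat = buf;
	int n = 0;

	assert(devargs != NULL && out != NULL);
	if (strlen(devargs) >= sizeof(buf))
		return -1;
	strcpy(buf, devargs);

	while (cat != NULL) {
		char *next_cat = strchr(cat, '/'); /* category separator */
		char *pair = cat;
		const char *scope = NULL;

		if (next_cat != NULL)
			*next_cat++ = '\0';

		while (pair != NULL) {
			char *next_pair = strchr(pair, ','); /* pair separator */
			char *eq;

			if (next_pair != NULL)
				*next_pair++ = '\0';
			eq = strchr(pair, '='); /* key/value separator */
			if (eq == NULL || n >= max_pairs)
				return -1;
			*eq = '\0';
			if (scope == NULL)
				scope = pair; /* first key names the category */
			snprintf(out[n].category, sizeof(out[n].category), "%s", scope);
			snprintf(out[n].key, sizeof(out[n].key), "%s", pair);
			snprintf(out[n].value, sizeof(out[n].value), "%s", eq + 1);
			n++;
			pair = next_pair;
		}
		cat = next_cat;
	}
	return n;
}
```

Because the category is taken from the first key after each '/', the same key name (for example id=) can appear in more than one category without ambiguity, which is the reason for having a distinct category separator.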

> >> Inside a category, the separator is a comma.
> >> Inside a key/value pair, the separator is an equal sign.
> 
> sounds reasonable to me.
> 
> >>
> >> It may look like this:
> >> bus=BUS_NAME,id=BUS_ID/class=CLASS_NAME,dev_port=PORT_NUM,mac=MAC_ADDRESS/driver=DRIVER_NAME,driverspecificproperty=VALUE
> 
> If I take cue from fslmc and dpaa bus:
> 
> for fslmc: bus=fslmc,id=dpni.1
> bus=fslmc,id=dpsec.1
> for dpaa: bus=dpaa,id=fm1-mac1
>bus=dpaa,id=dpaa-sec1
> 
> So, at least from fslmc/dpaa perspective, above fits.
> 
> Just want to highlight: in some cases the device names can contain ',' - 
> but then, that can be handled at bus scan level.
> For dpaa bus, the device identified from platform identifiers contains a 
> longer ',' separated string - which is then stripped to form a name like 
> 'fm1-mac1' before being added to device->name.
> 
> >>
> >> A device is identified when all properties are matched.
> 
> So, a ovs-dpdk user would have to know a dpdk bus to identify a device 
> to plug into a OVS bridge?

Yes, the user must know the bus if using a bus id.
But he can use another kind of id like the MAC address.

> >> Before device is probed, only the bus category is relevant.
> >> For the simple PCI whitelist, it means moving from
> >>-w :01:00.0
> >> to
> >>-w bus=pci,id=:01:00.0
> 
> OK
> 
> >> It is possible to mix some settings in this devargs syntax if the keys
> >> are different. Example: mac= is for identification by MAC, whereas
> >> newmac= would be for specifying a MAC address to set.
> >>
> >> Agreement?
> 
> in principle, yes.
> just need clarity on '/' as a separator.

Thanks for reviewing.


Re: [dpdk-dev] [RFC v2 3/5] ether: Add flow timeout support

2017-12-22 Thread Zhang, Qi Z
Alex:

> -----Original Message-----
> From: Alex Rosenbaum [mailto:rosenbauma...@gmail.com]
> Sent: Thursday, December 21, 2017 9:59 PM
> To: Zhang, Qi Z 
> Cc: adrien.mazarg...@6wind.com; DPDK ; Doherty, Declan
> 
> Subject: Re: [dpdk-dev] [RFC v2 3/5] ether: Add flow timeout support
> 
> On Thu, Dec 21, 2017 at 4:35 AM, Qi Zhang  wrote:
> > Add new APIs to support flow timeout, application is able to 1. Setup
> > the time duration of a flow, the flow is expected to be deleted
> > automatically when timeout.
> 
> Can you explain how the application (OVS) is expected to use this API?
> It will help to better understand the motivation here...

I think the purpose of the APIs is to expose the hardware feature that supports
automatic flow deletion after a timeout.
As far as I know, in OVS every flow in the flow table has a time duration.
A flow that is offloaded to hardware still has to be deleted after a specific
time.
I think these APIs help OVS take advantage of the HW feature and simplify
flow aging management.

> 
> Are you trying to move the aging timer from application code into the PMD?
> or can your HW remove/disable/inactivate a flow at certain time semantics
> without software context?

Yes, it is for a hardware feature.

> 
> I would prefer to have the aging timer logic in a centralized location, like the
> application itself or some DPDK library, instead of having each PMD
> implement its own software timers.
> 
> 
> > 3. Register a callback function when a flow is deleted due to timeout.
> 
> Is the application 'struct rte_flow*' handle really deleted? or the flow was
> removed from HW, just in-active at this time?

Here the flow is deleted; the same thing happens as with rte_flow_destroy, and we
need to call rte_flow_create to re-enable the flow.
I will add more explanation in the next release to avoid confusion.

> 
> Can a flow be re-activated? or does this require a call to
> rte_flow_destroy() and rte_flow_create()?
> 
> Alex

Thanks
Qi


Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

2017-12-22 Thread Burakov, Anatoly

On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> On Tue, 2017-12-19 at 11:14 +, Anatoly Burakov wrote:
> > Quick outline of all changes done as part of this patchset:
> >
> >   * Malloc heap adjusted to handle holes in address space
> >   * Single memseg list replaced by multiple expandable memseg lists
> >   * VA space for hugepages is preallocated in advance
> >   * Added dynamic alloc/free for pages, happening as needed on malloc/free

> SPDK will need some way to register for a notification when pages are allocated
> or freed. For storage, the number of requests per second is (relative to
> networking) fairly small (hundreds of thousands per second in a traditional
> block storage stack, or a few million per second with SPDK). Given that, we can
> afford to do a dynamic lookup from va to pa/iova on each request in order to
> greatly simplify our APIs (users can just pass pointers around instead of
> mbufs). DPDK has a way to lookup the pa from a given va, but it does so by
> scanning /proc/self/pagemap and is very slow. SPDK instead handles this by
> implementing a lookup table of va to pa/iova which we populate by scanning
> through the DPDK memory segments at start up, so the lookup in our table is
> sufficiently fast for storage use cases. If the list of memory segments changes,
> we need to know about it in order to update our map.


Hi Benjamin,

So, in other words, we need callbacks on alloc/free. What information
would SPDK need when receiving this notification? Since we can't really
know in advance how many pages we allocate (it may be one, it may be a
thousand) and they are no longer guaranteed to be contiguous, would a
per-page callback be OK? Alternatively, we could have one callback per
operation, but only provide the VA and size of the allocated memory, while
leaving everything else to the user. I do add a virt2memseg() function
which would allow you to look up segment physical addresses more easily, so
you won't have to manually scan memseg lists to get the IOVA for a given VA.

Thanks for your feedback and suggestions!



> Having the map also enables a number of other nice things - for instance we
> allow users to register memory that wasn't allocated through DPDK and use it for
> DMA operations. We keep that va to pa/iova mapping in the same map. I appreciate
> you adding APIs to dynamically register this type of memory with the IOMMU on
> our behalf. That allows us to eliminate a nasty hack where we were looking up
> the vfio file descriptor through sysfs in order to send the registration ioctl.


> >   * Added contiguous memory allocation API's for rte_malloc and rte_memzone
> >   * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
> >     with VFIO



--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] member: fix memory leak on error

2017-12-22 Thread Burakov, Anatoly

On 22-Dec-17 12:01 AM, Wang, Yipeng1 wrote:

> Thank you Anatoly for finding this issue. In the code I tried to reuse the
> rte_member_free function to free memory but it may not execute all the way through.
>
> Because of this, I may not properly release the setsum struct either. I will post
> a fix for both soon.
>
> Thanks


Yep, I can see that now. Didn't think to look inside rte_member_free() :/
However, you're creating a race condition there: you're unlocking a
tailq, and then locking (and unlocking) it again inside
rte_member_free(). It probably needs _thread_unsafe() functions that
you can call from behind the lock.
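
To make the suggestion concrete, here is a minimal sketch of the "locked wrapper plus unlocked internal" pattern. It is not the rte_member code: struct registry, entry_create(), entry_free() and entry_free_unsafe() are hypothetical names, and a pthread mutex stands in for the tailq lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical registry standing in for a tailq protected by a lock. */
struct registry {
	pthread_mutex_t lock;
	int n_entries;
};

/* Internal variant: the caller must already hold reg->lock. */
static void
entry_free_unsafe(struct registry *reg, void *entry)
{
	assert(reg->n_entries > 0);
	free(entry);
	reg->n_entries--;
}

/* Public variant: takes the lock itself, then calls the unsafe variant. */
static void
entry_free(struct registry *reg, void *entry)
{
	pthread_mutex_lock(&reg->lock);
	entry_free_unsafe(reg, entry);
	pthread_mutex_unlock(&reg->lock);
}

/*
 * Create path: the entry is registered and, if a later step fails, removed
 * again without ever dropping the lock, so no other thread can observe the
 * half-initialised entry between the unlock and the re-lock.
 */
static void *
entry_create(struct registry *reg, int fail_late)
{
	void *entry;

	pthread_mutex_lock(&reg->lock);
	entry = malloc(64);
	if (entry == NULL) {
		pthread_mutex_unlock(&reg->lock);
		return NULL;
	}
	reg->n_entries++;
	if (fail_late) {
		/* error path: clean up while still behind the lock */
		entry_free_unsafe(reg, entry);
		entry = NULL;
	}
	pthread_mutex_unlock(&reg->lock);
	return entry;
}
```

The error path never calls the public entry_free(), so the lock is taken exactly once per operation and the cleanup cannot deadlock on a non-recursive lock.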


--
Thanks,
Anatoly


[dpdk-dev] [PATCH] test: add malloc stats dump command

2017-12-22 Thread Anatoly Burakov
This can be useful for checking if an autotest leaks memory after
its execution.

Signed-off-by: Anatoly Burakov 
---
 test/test/commands.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/test/test/commands.c b/test/test/commands.c
index 4097a33..f49f7e8 100644
--- a/test/test/commands.c
+++ b/test/test/commands.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -160,13 +161,20 @@ static void cmd_dump_parsed(void *parsed_result,
rte_eal_devargs_dump(stdout);
else if (!strcmp(res->dump, "dump_log_types"))
rte_log_dump(stdout);
+   else if (!strcmp(res->dump, "dump_malloc_stats"))
+   rte_malloc_dump_stats(stdout, NULL);
 }
 
 cmdline_parse_token_string_t cmd_dump_dump =
TOKEN_STRING_INITIALIZER(struct cmd_dump_result, dump,
-"dump_physmem#dump_memzone#"
-"dump_struct_sizes#dump_ring#dump_mempool#"
-"dump_devargs#dump_log_types");
+"dump_physmem#"
+"dump_memzone#"
+"dump_struct_sizes#"
+"dump_ring#"
+"dump_mempool#"
+"dump_malloc_stats#"
+"dump_devargs#"
+"dump_log_types");
 
 cmdline_parse_inst_t cmd_dump = {
.f = cmd_dump_parsed,  /* function to call */
-- 
2.7.4


Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Hemant Agrawal

On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
> Hi,
>
> 22/12/2017 06:57, Hemant Agrawal:
> > This patch moves the Linux kernel modules code to a common place.
> >  - Separate the kernel module code from user space code.
> >  - The GPL-2.0 licensed code is separated from the BSD-3 licensed userspace
> >    code
>
> What is the benefit of separating things by license?

The separation makes it easy to identify and check the license.

Any patch introducing a new file in the *non-kern* folders shall not be
GPL-2.0 licensed. In other words, the GPL-2.0 license is allowed only in the
kern folder.

> These modules are Linux modules, so they should be in the linuxapp dir.

This is a cleaner separation w.r.t. userspace/kernel space code.
*kern* is a better folder for LKMs.

Also, EAL does not get overloaded.

linuxapp is part of librte_eal. KNI is not related to EAL, but the KNI
kernel code is still added to librte_eal under linuxapp.

> There are also some kernel modules in the bsdapp directory.

We can move them as well.

[dpdk-dev] [PATCH 1/6] test: fix memory leak in bitmap test

2017-12-22 Thread Anatoly Burakov
Fixes: c7e4a134e769 ("test: verify bitmap operations")
Cc: pbhagavat...@caviumnetworks.com
Signed-off-by: Anatoly Burakov 
---
 test/test/test_bitmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/test/test/test_bitmap.c b/test/test/test_bitmap.c
index 5c9eee9..7045d33 100644
--- a/test/test/test_bitmap.c
+++ b/test/test/test_bitmap.c
@@ -186,6 +186,9 @@ test_bitmap(void)
if (test_bitmap_scan_operations(bmp) < 0)
return TEST_FAILED;
 
+   rte_bitmap_free(bmp);
+   rte_free(mem);
+
return TEST_SUCCESS;
 }
 
-- 
2.7.4


[dpdk-dev] [PATCH 2/6] test: fix memory leak in reorder autotest

2017-12-22 Thread Anatoly Burakov
Add a teardown function that frees allocated resources.

Fixes: d0c9b58d7156 ("app/test: new reorder unit test")
Cc: sergio.gonzalez.mon...@intel.com
Cc: reshma.pat...@intek.com
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test_reorder.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/test/test/test_reorder.c b/test/test/test_reorder.c
index 4ec22ac..429f6eb4 100644
--- a/test/test/test_reorder.c
+++ b/test/test/test_reorder.c
@@ -360,9 +360,20 @@ test_setup(void)
return 0;
 }
 
+static void
+test_teardown(void)
+{
+   rte_reorder_free(test_params->b);
+   test_params->b = NULL;
+   rte_mempool_free(test_params->p);
+   test_params->p = NULL;
+}
+
+
 static struct unit_test_suite reorder_test_suite  = {
 
.setup = test_setup,
+   .teardown = test_teardown,
.suite_name = "Reorder Unit Test Suite",
.unit_test_cases = {
TEST_CASE(test_reorder_create),
-- 
2.7.4


[dpdk-dev] [PATCH 3/6] test: fix memory leak in ring autotest

2017-12-22 Thread Anatoly Burakov
Fixes: af75078fece3 ("first public release")
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test_ring.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 5eb40a0..004d67e 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -760,6 +760,7 @@ test_ring_basic_ex(void)
 
ret = 0;
 fail_test:
+   rte_ring_free(rp);
if (obj != NULL)
rte_free(obj);
 
@@ -894,6 +895,8 @@ test_ring(void)
/* dump the ring status */
rte_ring_list_dump(stdout);
 
+   rte_ring_free(r);
+
return 0;
 }
 
-- 
2.7.4


[dpdk-dev] [PATCH 4/6] test: fix memory leak in ring perf autotest

2017-12-22 Thread Anatoly Burakov
Fixes: ac3fb3019c52 ("app: rework ring tests")
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test_ring_perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index 84d2003..b586459 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -420,6 +420,7 @@ test_ring_perf(void)
printf("\n### Testing using two NUMA nodes ###\n");
run_on_core_pair(&cores, enqueue_bulk, dequeue_bulk);
}
+   rte_ring_free(r);
return 0;
 }
 
-- 
2.7.4


[dpdk-dev] [PATCH 6/6] test: fix memory leak in timer perf autotest

2017-12-22 Thread Anatoly Burakov
Fixes: 277afaf3dbcb ("app/test: add timer_perf")
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test_timer_perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/test/test_timer_perf.c b/test/test/test_timer_perf.c
index 467ae13..9804133 100644
--- a/test/test/test_timer_perf.c
+++ b/test/test/test_timer_perf.c
@@ -156,6 +156,7 @@ test_timer_perf(void)
printf("Time per rte_timer_manage with zero callbacks: %"PRIu64" cycles\n",
(end_tsc - start_tsc + iterations/2) / iterations);
 
+   rte_free(tms);
return 0;
 }
 
-- 
2.7.4


[dpdk-dev] [PATCH 5/6] test: fix memory leak in table autotest

2017-12-22 Thread Anatoly Burakov
Always deallocate allocated resources after the test is done.

Fixes: 5205954791cb ("app/test: packet framework unit tests")
Cc: cristian.dumitre...@intel.com
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test_table.c | 44 
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/test/test/test_table.c b/test/test/test_table.c
index db7d4e6..c5a6e00 100644
--- a/test/test/test_table.c
+++ b/test/test/test_table.c
@@ -84,6 +84,14 @@ uint64_t pipeline_test_hash(void *key,
 }
 
 static void
+app_free_resources(void) {
+   int i;
+   for (i = 0; i < N_PORTS; i++)
+   rte_ring_free(rings_rx[i]);
+   rte_mempool_free(pool);
+}
+
+static void
 app_init_mbuf_pools(void)
 {
/* Init the buffer pool */
@@ -142,18 +150,20 @@ app_init_rings(void)
 static int
 test_table(void)
 {
-   int status, failures;
+   int status, ret;
unsigned i;
 
-   failures = 0;
+   ret = TEST_SUCCESS;
 
app_init_rings();
app_init_mbuf_pools();
 
printf("\n\n\n\nPipeline tests\n");
 
-   if (test_table_pipeline() < 0)
-   return -1;
+   if (test_table_pipeline() < 0) {
+   ret = TEST_FAILED;
+   goto end;
+   }
 
printf("\n\n\n\nPort tests\n");
for (i = 0; i < n_port_tests; i++) {
@@ -161,8 +171,8 @@ test_table(void)
if (status < 0) {
printf("\nPort test number %d failed (%d).\n", i,
status);
-   failures++;
-   return -1;
+   ret = TEST_FAILED;
+   goto end;
}
}
 
@@ -172,8 +182,8 @@ test_table(void)
if (status < 0) {
printf("\nTable test number %d failed (%d).\n", i,
status);
-   failures++;
-   return -1;
+   ret = TEST_FAILED;
+   goto end;
}
}
 
@@ -183,21 +193,23 @@ test_table(void)
if (status < 0) {
printf("\nCombined table test number %d failed with "
"reason number %d.\n", i, status);
-   failures++;
-   return -1;
+   ret = TEST_FAILED;
+   goto end;
}
}
 
-   if (failures)
-   return -1;
-
 #ifdef RTE_LIBRTE_ACL
printf("\n\n\n\nACL tests\n");
-   if (test_table_acl() < 0)
-   return -1;
+   if (test_table_acl() < 0) {
+   ret = TEST_FAILED;
+   goto end;
+   }
 #endif
 
-   return 0;
+end:
+   app_free_resources();
+
+   return ret;
 }
 
 REGISTER_TEST_COMMAND(table_autotest, test_table);
-- 
2.7.4
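
The single-exit cleanup structure used in the patch above can be sketched in isolation as follows. This is only an illustration under stated assumptions: res_alloc(), res_free(), run_suite() and the live_allocs counter are hypothetical helpers, and TEST_SUCCESS / TEST_FAILED are defined locally as stand-ins for the test framework's values.

```c
#include <assert.h>
#include <stdlib.h>

#define TEST_SUCCESS 0    /* local stand-ins for the framework's codes */
#define TEST_FAILED  (-1)

static int live_allocs; /* number of resources currently allocated */

static void *
res_alloc(void)
{
	void *p = malloc(16);

	if (p != NULL)
		live_allocs++;
	return p;
}

static void
res_free(void *p)
{
	if (p != NULL) {
		free(p);
		live_allocs--;
	}
}

/* Runs three steps; fails at step fail_at_step (use -1 for no failure). */
static int
run_suite(int fail_at_step)
{
	int ret = TEST_SUCCESS;
	void *ring = res_alloc();
	void *pool = res_alloc();
	int step;

	if (ring == NULL || pool == NULL) {
		ret = TEST_FAILED;
		goto end;
	}

	for (step = 0; step < 3; step++) {
		if (step == fail_at_step) {
			ret = TEST_FAILED;
			goto end; /* never return directly: resources would leak */
		}
	}

end:
	/* every path, success or failure, releases what was allocated */
	res_free(pool);
	res_free(ring);
	assert(live_allocs >= 0);
	return ret;
}
```

Checking that live_allocs is back to zero after both a passing and a failing run is the same kind of check that a malloc statistics dump makes possible for an autotest.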


[dpdk-dev] [PATCH] test: register test as failed if setup failed

2017-12-22 Thread Anatoly Burakov
If test set up couldn't be completed, the test was previously
shown as succeeding, even though setup failed. Fix this to report
test as failed, and count all tests that should've been executed,
as failed as well.

Fixes: ffac67b1f71b ("app/test: new assert macros and test suite runner")
Cc: declan.dohe...@intel.com
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 test/test/test.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/test/test/test.c b/test/test/test.c
index 0e6ff7c..fe41d40 100644
--- a/test/test/test.c
+++ b/test/test/test.c
@@ -162,8 +162,20 @@ unit_test_suite_runner(struct unit_test_suite *suite)
}
 
if (suite->setup)
-   if (suite->setup() != 0)
+   if (suite->setup() != 0) {
+   /*
+* setup failed, so count all enabled tests and mark
+* them as failed
+*/
+   while (suite->unit_test_cases[total].testcase) {
+   if (!suite->unit_test_cases[total].enabled)
+   skipped++;
+   else
+   failed++;
+   total++;
+   }
goto suite_summary;
+   }
 
printf(" + --- 
+\n");
 
-- 
2.7.4
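
The accounting added by this patch can be shown as a standalone sketch. The structures below are simplified, hypothetical stand-ins for the test framework's types (only the testcase and enabled fields seen in the diff are modelled), and count_on_setup_failure() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one entry of a suite's test case list. */
struct test_case {
	int (*testcase)(void); /* NULL terminates the list */
	int enabled;
};

struct suite_result {
	int total;
	int skipped;
	int failed;
};

/* Placeholder test body used to populate example lists. */
static int
dummy_test(void)
{
	return 0;
}

/*
 * When suite setup fails no test can run, so every enabled test is
 * counted as failed and every disabled test as skipped.
 */
static struct suite_result
count_on_setup_failure(const struct test_case *cases)
{
	struct suite_result r = { 0, 0, 0 };

	assert(cases != NULL);
	while (cases[r.total].testcase != NULL) {
		if (!cases[r.total].enabled)
			r.skipped++;
		else
			r.failed++;
		r.total++;
	}
	return r;
}
```

For a list with two enabled tests and one disabled test, this yields total 3, failed 2 and skipped 1, so the summary no longer reports a suite whose setup failed as having zero failures.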


[dpdk-dev] [PATCH] vfio: fix error check when checking if vfio is enabled

2017-12-22 Thread Anatoly Burakov
rte_eal_check_module() might return -1, which would have been a
"not false" condition for mod_available. Fix that to only report
vfio being enabled if rte_eal_check_module() returns 1.

Fixes: 221f7c220d6b ("vfio: move global config out of PCI files")
Cc: vikto...@rehivetech.com
Cc: sta...@dpdk.org
Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 58f0123..fb1a622 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -525,7 +525,7 @@ rte_vfio_enable(const char *modname)
 int
 rte_vfio_is_enabled(const char *modname)
 {
-   const int mod_available = rte_eal_check_module(modname);
+   const int mod_available = rte_eal_check_module(modname) > 0;
return vfio_cfg.vfio_enabled && mod_available;
 }
 
-- 
2.7.4
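
The effect of the one-line change can be reproduced without DPDK. In the sketch below, check_module() is a hypothetical stand-in that simply returns the state it is given, assuming the three outcomes the commit message implies (1 for loaded, 0 for not loaded, -1 for error); the two is_enabled_*() helpers mirror the expression before and after the fix.

```c
#include <assert.h>

/* Stand-in for a module check returning 1 (loaded), 0 (absent), -1 (error). */
static int
check_module(int state)
{
	assert(state >= -1 && state <= 1);
	return state;
}

/* Before the fix: the raw return value is used as a boolean. */
static int
is_enabled_buggy(int vfio_enabled, int state)
{
	const int mod_available = check_module(state);

	return vfio_enabled && mod_available; /* -1 is non-zero, hence true */
}

/* After the fix: only a strictly positive result counts as available. */
static int
is_enabled_fixed(int vfio_enabled, int state)
{
	const int mod_available = check_module(state) > 0;

	return vfio_enabled && mod_available;
}
```

With an error return of -1, the buggy variant reports the module as enabled, while the fixed variant reports it as disabled; both agree for 0 and 1.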


[dpdk-dev] [RFC v4 PATCH 0/8] event: eventdev OPDL PMD

2017-12-22 Thread Liang Ma
The OPDL (Ordered Packet Distribution Library) eventdev is a specific
implementation of the eventdev API. It is particularly suited to packet
processing workloads that have high throughput and low latency
requirements. All packets follow the same path through the device.
The order in which packets follow that path is determined by the order in
which queues are set up. Packets are left on the ring until they are transmitted.
As a result, packets do not go out of order.

Features:

The OPDL eventdev implements a subset of features of the eventdev API;

Queues
 * Atomic
 * Ordered (Parallel is supported as parallel is a subset of Ordered)
 * Single-Link

Ports
 * Load balanced (for Atomic, Ordered, Parallel queues)
 * Single Link (for single-link queues)

Single Port Queue

It is possible to create a Single Port Queue by setting
RTE_EVENT_QUEUE_CFG_SINGLE_LINK. Packets dequeued from this queue do
not need to be re-enqueued (as is the case with an ordered queue). The
purpose of this queue is to allow for asynchronous handling of packets in
the middle of a pipeline. Ordered queues in the middle of a pipeline
cannot delete packets.


Queue Dependencies

As stated, the order in which packets travel through queues is static in
nature. They go through the queues in the order the queues are set up at
initialisation with rte_event_queue_setup(). For example, if an application
sets up 3 queues, Q0, Q1 and Q2, and has 4 associated ports, P0, P1, P2 and
P3, then packets must be

 * Enqueued onto Q0 (typically through P0), then

 * Dequeued from Q0 (typically through P1), then

 * Enqueued onto Q1 (also through P1), then

 * Dequeued from Q1 (typically through P2), then

 * Enqueued onto Q2 (also through P2), then

 * Dequeued from Q2 (typically through P3) and then transmitted on the
   relevant eth port
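
The static ordering described here can be pictured with a small,
self-contained C sketch. It is not taken from the OPDL sources; the helper
names (opdl_sketch_next_queue_ok, opdl_sketch_path_ok) are hypothetical and
only model the rule that an event leaving queue N may next be enqueued onto
queue N + 1, starting from queue 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: in a static pipeline, an event that has just been
 * dequeued from cur_qid may only be enqueued onto the next queue in
 * set-up order.
 */
static bool
opdl_sketch_next_queue_ok(uint8_t cur_qid, uint8_t next_qid, uint8_t nb_queues)
{
	if (cur_qid >= nb_queues || next_qid >= nb_queues)
		return false;
	return next_qid == (uint8_t)(cur_qid + 1);
}

/*
 * Hypothetical helper: check a whole path of queue ids. A valid path
 * starts at queue 0 and visits the queues strictly in order.
 */
static bool
opdl_sketch_path_ok(const uint8_t *path, size_t len, uint8_t nb_queues)
{
	size_t i;

	if (len == 0 || nb_queues == 0 || path[0] != 0)
		return false;
	for (i = 1; i < len; i++) {
		if (!opdl_sketch_next_queue_ok(path[i - 1], path[i], nb_queues))
			return false;
	}
	return true;
}
```

With three queues, the path Q0, Q1, Q2 passes the check, while a path that
skips Q1 or starts at Q1 is rejected.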


Limitations

The opdl implementation has a number of limitations. These limitations are
due to the static nature of the underlying queues. It is because of this
that the implementation can achieve such high throughput and low latency.

The following list is a comprehensive outline of what is supported and
the limitations / restrictions imposed by the opdl pmd:

 - The order in which packets move between queues is static and fixed
   (dynamic scheduling is not supported).

 - The NEW and RELEASE op types are not explicitly supported. RX (first enqueue)
   implicitly adds NEW event types, and TX (last dequeue) implicitly does
   RELEASE event types.

 - All packets follow the same path through device queues.

 - Flows within queues are NOT supported.

 - Event priority is NOT supported.

 - Once the device is stopped all inflight events are lost. Applications should 
   clear all inflight events before stopping it.

 - Each port can only be associated with one queue.

 - Each queue can have multiple ports associated with it.

 - Each worker core has to dequeue the maximum burst size for that port.
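
The port/queue association rules in this list (one queue per port, many
ports per queue) can be illustrated with a minimal C sketch. The structure
and function names below are hypothetical and are not part of the OPDL PMD;
the table size of 16 is only an assumption for the example.

```c
#define SKETCH_MAX_PORTS 16	/* assumed table size for this sketch */
#define SKETCH_UNLINKED  (-1)

/* Hypothetical link table: the queue id each port is linked to. */
struct sketch_links {
	int port_queue[SKETCH_MAX_PORTS];
};

static void
sketch_links_init(struct sketch_links *l)
{
	unsigned int i;

	for (i = 0; i < SKETCH_MAX_PORTS; i++)
		l->port_queue[i] = SKETCH_UNLINKED;
}

/*
 * Link a port to a queue. Several ports may share one queue, but a port
 * that is already linked cannot be linked to a different queue.
 * Returns 0 on success, -1 on failure.
 */
static int
sketch_link(struct sketch_links *l, unsigned int port, int qid)
{
	if (port >= SKETCH_MAX_PORTS || qid < 0)
		return -1;
	if (l->port_queue[port] != SKETCH_UNLINKED &&
	    l->port_queue[port] != qid)
		return -1;
	l->port_queue[port] = qid;
	return 0;
}
```

Linking P0 and P1 to Q0 succeeds in this model, while a later attempt to
link P0 to Q1 as well is refused.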


Reference
General concept of event driven programming model
[http://dpdk.org/doc/guides/eventdevs/index.html]

Original Ordered Pipeline Design slides 
[https://dpdksummit.com/Archive/pdf/2017Asia/DPDK-China2017-Ma-OPDL.pdf]

ChangeLog
[v4]
  1. fix 2 coding style issues
[v3]
  1. add dynamic log support.
  2. update maintainer, release notes.
  3. fix issues with review comments.

[v2]
  1. merge the opdl eventdev unit test code into opdl pmd.
  2. propose three new capability flags for overall eventdev.
  3. remove the opdl pmd example code.
  4. remove the opdl pmd example doc.


Liang Ma (8):
  event/opdl: add the opdl ring infrastructure library
  event/opdl: add the opdl pmd main body and helper function
  eventdev/opdl: opdl eventdev pmd unit test function
  lib/librte_eventdev: extend the eventdev capability flags
  event/*: apply the three new capability flags for sw/dppa2/octeontx
  maintainers: add the opdl pmd maintainer information
  doc:update 18.02 release notes
  doc: add eventdev opdl pmd docuement

 MAINTAINERS                            |    6 +
 config/common_base                     |    6 +
 doc/guides/eventdevs/index.rst         |    1 +
 doc/guides/eventdevs/opdl.rst          |  162 +++
 doc/guides/rel_notes/release_18_02.rst |   11 +
 drivers/event/Makefile                 |    1 +
 drivers/event/dpaa2/dpaa2_eventdev.c   |    6 +-
 drivers/event/octeontx/ssovf_evdev.c   |    6 +-
 drivers/event/opdl/Makefile            |   66 ++
 drivers/event/opdl/opdl_evdev.c        |  794 +
 drivers/event/opdl/opdl_evdev.h        |  342 ++
 drivers/event/opdl/opdl_evdev_init.c   |  963
 drivers/event/opdl/opdl_evdev_xstats.c |  205
 drivers/event/opdl/opdl_log.h          |   59 +
 drivers/event/opdl/opdl_ring.c         | 1252 +
 drivers/event/opdl/opdl_ring.h         |  628 +++
 drivers/event/opdl

[dpdk-dev] [PATCH v4 1/8] event/opdl: add the opdl ring infrastructure library

2017-12-22 Thread Liang Ma
OPDL ring is the core infrastructure of the OPDL PMD. The OPDL ring library
provides the core data structure and core helper function set. The ring
implements a single-ring, multi-port/stage pipelined packet distribution
mechanism. This mechanism has the following characteristics:

• No multiple queue cost, therefore latency is significantly reduced.
• Fixed dependencies between queues/ports are more suitable for complex,
  fixed pipelines of stateless packet processing (static pipeline).
• Has decentralized distribution (no scheduling core).
• Packets remain in order (no reorder core(s)).

This commit also updates the build system to enable compilation.
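
The single-ring, multi-stage idea can be pictured with a toy model in C.
This is only a sketch under stated assumptions, not the code in
opdl_ring.c: the names (sketch_ring, sketch_available, sketch_advance) and
the sizes are hypothetical. Each stage keeps a running count of the slots it
has completed, a stage may only process slots its predecessor has finished,
and the producer (stage 0) may only reuse slots the last stage has released.
No scheduler is involved and slots are never reordered.

```c
#include <stdint.h>

#define SKETCH_RING_SIZE  8u	/* assumed ring capacity (power of two) */
#define SKETCH_MAX_STAGES 4u	/* assumed upper bound on stages */

/* Hypothetical ring state: slots completed so far by each stage. */
struct sketch_ring {
	uint32_t head[SKETCH_MAX_STAGES];
	uint32_t nb_stages;
};

/*
 * Slots that stage s may process right now. Stage 0 is limited by the
 * free space left behind the last stage; every other stage is limited
 * by how far its predecessor has advanced. Unsigned subtraction keeps
 * the arithmetic correct when the counters wrap.
 */
static uint32_t
sketch_available(const struct sketch_ring *r, uint32_t s)
{
	if (s == 0)
		return SKETCH_RING_SIZE -
			(r->head[0] - r->head[r->nb_stages - 1]);
	return r->head[s - 1] - r->head[s];
}

/* Advance stage s by up to n slots; returns the number actually taken. */
static uint32_t
sketch_advance(struct sketch_ring *r, uint32_t s, uint32_t n)
{
	uint32_t avail = sketch_available(r, s);

	if (n > avail)
		n = avail;
	r->head[s] += n;
	return n;
}
```

In this model a downstream stage gets nothing until the producer has
filled some slots, and the producer stalls once the ring is full until the
final stage releases entries.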

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 config/common_base                                |    6 +
 drivers/event/Makefile                            |    1 +
 drivers/event/opdl/Makefile                       |   62 +
 drivers/event/opdl/opdl_log.h                     |   59 +
 drivers/event/opdl/opdl_ring.c                    | 1252 +
 drivers/event/opdl/opdl_ring.h                    |  628 +++
 drivers/event/opdl/rte_pmd_evdev_opdl_version.map |    3 +
 mk/rte.app.mk                                     |    1 +
 mk/toolchain/gcc/rte.toolchain-compat.mk          |    6 +
 mk/toolchain/icc/rte.toolchain-compat.mk          |    6 +
 10 files changed, 2024 insertions(+)
 create mode 100644 drivers/event/opdl/Makefile
 create mode 100644 drivers/event/opdl/opdl_log.h
 create mode 100644 drivers/event/opdl/opdl_ring.c
 create mode 100644 drivers/event/opdl/opdl_ring.h
 create mode 100644 drivers/event/opdl/rte_pmd_evdev_opdl_version.map

diff --git a/config/common_base b/config/common_base
index e74febe..67adaba 100644
--- a/config/common_base
+++ b/config/common_base
@@ -594,6 +594,12 @@ CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF=y
 CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF_DEBUG=n
 
 #
+# Compile PMD for OPDL event device
+#
+CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV=y
+CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV_DEBUG=n
+
+#
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
diff --git a/drivers/event/Makefile b/drivers/event/Makefile
index 1f9c0ba..d62 100644
--- a/drivers/event/Makefile
+++ b/drivers/event/Makefile
@@ -35,5 +35,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV) += skeleton
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF) += octeontx
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_EVENTDEV) += dpaa2
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/event/opdl/Makefile b/drivers/event/opdl/Makefile
new file mode 100644
index 000..8277e25
--- /dev/null
+++ b/drivers/event/opdl/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_pmd_opdl_event.a
+
+# build flags
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+# for older GCC versions, allow us to initialize an event using
+# designated initializers.
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+ifeq ($(shell test $(GCC_VERSION) -le 50 && echo 1), 1)
+CFLAGS += -Wno-missing-field-initializers
+endif
+endif
+
+LDLIBS += -lrte_eal -lrte_eventdev -lrte_kvargs
+LDLIBS += -lrte_bus_vdev -lrte_mbuf -lrte_mempool
+
+# library version
+LIBABIVER := 1
+
+# versioning export map
+EXPORT_MAP := rte_pmd_evdev_opdl_version.map
+
+# library source fil

[dpdk-dev] [PATCH v4 2/8] event/opdl: add the opdl pmd main body and helper function

2017-12-22 Thread Liang Ma
This commit adds an OPDL implementation of the eventdev API. The
implementation here is intended to enable the community to use
the OPDL infrastructure under the eventdev API.

The main components of the implementation are the following files:
  - opdl_evdev.c          creation, configuration, etc.
  - opdl_evdev_xstats.c   helper functions to support stats collection
  - opdl_evdev.h          the main data structures of the opdl device and
                          all the function prototypes that need to be
                          exposed to support the eventdev API
  - opdl_evdev_init.c     all initialization helper functions

This commit only adds the implementation, no existing DPDK files
are modified.

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 drivers/event/opdl/Makefile|   3 +
 drivers/event/opdl/opdl_evdev.c| 791 +++
 drivers/event/opdl/opdl_evdev.h| 341 
 drivers/event/opdl/opdl_evdev_init.c   | 963 +
 drivers/event/opdl/opdl_evdev_xstats.c | 205 +++
 5 files changed, 2303 insertions(+)
 create mode 100644 drivers/event/opdl/opdl_evdev.c
 create mode 100644 drivers/event/opdl/opdl_evdev.h
 create mode 100644 drivers/event/opdl/opdl_evdev_init.c
 create mode 100644 drivers/event/opdl/opdl_evdev_xstats.c

diff --git a/drivers/event/opdl/Makefile b/drivers/event/opdl/Makefile
index 8277e25..473a09c 100644
--- a/drivers/event/opdl/Makefile
+++ b/drivers/event/opdl/Makefile
@@ -55,6 +55,9 @@ EXPORT_MAP := rte_pmd_evdev_opdl_version.map
 
 # library source files
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev_init.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev_xstats.c
 
 # export include files
 SYMLINK-y-include +=
diff --git a/drivers/event/opdl/opdl_evdev.c b/drivers/event/opdl/opdl_evdev.c
new file mode 100644
index 000..f51a174
--- /dev/null
+++ b/drivers/event/opdl/opdl_evdev.c
@@ -0,0 +1,791 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "opdl_evdev.h"
+#include "opdl_ring.h"
+#include "opdl_log.h"
+
+#define EVENTDEV_NAME_OPDL_PMD event_opdl
+#define NUMA_NODE_ARG "numa_node"
+#define DO_VALIDATION_ARG "do_validation"
+#define DO_TEST_ARG "self_test"
+
+
+uint16_t
+opdl_event_enqueue_burst(void *port,
+const struct rte_event ev[],
+uint16_t num)
+{
+   struct opdl_port *p = port;
+
+   if (unlikely(!p->opdl->data->dev_started))
+   return 0;
+
+
+   /* either rx_enqueue or disclaim*/
+   return p->enq(p, ev, num);
+}
+
+uint16_t
+opdl_event_enqueue(void *port, const struct rte_event *ev)
+{
+   struct opdl_port *p = port;
+
+   if (unlikely(!p->opdl->data->dev_started))
+   return 0;
+
+
+   return p->enq(p, ev, 1);
+}
+
+uint16_t
+opdl_event_dequeue_burst(void *port,
+struct rte_event *ev,
+uint16_t num,
+uint64_t wait)
+{
+   struct opdl_port *p = (void *)port;
+
+   RTE_SET_USED(wait);
+
+   if (unlikely(!p->op

[dpdk-dev] [PATCH v4 3/8] eventdev/opdl: opdl eventdev pmd unit test function

2017-12-22 Thread Liang Ma
This commit adds a unit test inside the OPDL PMD. The PMD parameter
"self_test" can be used to trigger the test when the vdev bus probes the
opdl device.

  e.g.

  sudo ./app/test --vdev="event_opdl0,self_test=1"

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 drivers/event/opdl/Makefile     |    1 +
 drivers/event/opdl/opdl_evdev.c |    3 +
 drivers/event/opdl/opdl_evdev.h |    1 +
 drivers/event/opdl/opdl_test.c  | 1080 +++
 4 files changed, 1085 insertions(+)
 create mode 100644 drivers/event/opdl/opdl_test.c

diff --git a/drivers/event/opdl/Makefile b/drivers/event/opdl/Makefile
index 473a09c..62c9e1f 100644
--- a/drivers/event/opdl/Makefile
+++ b/drivers/event/opdl/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_ring.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev_init.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_evdev_xstats.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPDL_EVENTDEV) += opdl_test.c
 
 # export include files
 SYMLINK-y-include +=
diff --git a/drivers/event/opdl/opdl_evdev.c b/drivers/event/opdl/opdl_evdev.c
index f51a174..fca1dc9 100644
--- a/drivers/event/opdl/opdl_evdev.c
+++ b/drivers/event/opdl/opdl_evdev.c
@@ -753,6 +753,9 @@ opdl_probe(struct rte_vdev_device *vdev)
str_len = strlen(name);
memcpy(opdl->service_name, name, str_len);
 
+   if (do_test == 1)
+   test_result =  opdl_selftest();
+
return test_result;
 }
 
diff --git a/drivers/event/opdl/opdl_evdev.h b/drivers/event/opdl/opdl_evdev.h
index 33bc8f2..7849af1 100644
--- a/drivers/event/opdl/opdl_evdev.h
+++ b/drivers/event/opdl/opdl_evdev.h
@@ -337,5 +337,6 @@ int initialise_all_other_ports(struct rte_eventdev *dev);
 int initialise_queue_zero_ports(struct rte_eventdev *dev);
 int assign_internal_queue_ids(struct rte_eventdev *dev);
 void destroy_queues_and_rings(struct rte_eventdev *dev);
+int opdl_selftest(void);
 
 #endif /* _OPDL_EVDEV_H_ */
diff --git a/drivers/event/opdl/opdl_test.c b/drivers/event/opdl/opdl_test.c
new file mode 100644
index 000..bd4083d
--- /dev/null
+++ b/drivers/event/opdl/opdl_test.c
@@ -0,0 +1,1080 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "opdl_evdev.h"
+#include "opdl_log.h"
+
+
+#define MAX_PORTS 16
+#define MAX_QIDS 16
+#define NUM_PACKETS (1<<18)
+#define NUM_EVENTS 256
+#define BURST_SIZE 32
+
+
+
+static int evdev;
+
+struct test {
+   struct rte_mempool *mbuf_pool;
+   uint8_t port[MAX_PORTS];
+   uint8_t qid[MAX_QIDS];
+   int nb_qids;
+};
+
+static struct rte_mempool *eventdev_func_mempool;
+
+static __rte_always_inline struct rte_mbuf *
+rte_gen_arp(int portid, struct rte_mempool *mp)
+{
+   /*
+* len = 14 + 46
+* ARP, Request who-has 10.0.0.1 tell 10.0.0.2, length 46
+*/
+   static const uint8_t arp_request[] = {
+   /*0x:*/ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xec, 0xa8,
+   0x6b, 0xfd, 0x02, 0x29, 0x08, 0x06, 0x00, 0x01,
+   /*0x0010:*/ 0x08, 0

[dpdk-dev] [PATCH v4 5/8] event/*: apply the three new capability flags for sw/dppa2/octeontx

2017-12-22 Thread Liang Ma
Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 drivers/event/dpaa2/dpaa2_eventdev.c | 6 +-
 drivers/event/octeontx/ssovf_evdev.c | 6 +-
 drivers/event/sw/sw_evdev.c  | 5 -
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/event/dpaa2/dpaa2_eventdev.c b/drivers/event/dpaa2/dpaa2_eventdev.c
index 13e7122..e437edc 100644
--- a/drivers/event/dpaa2/dpaa2_eventdev.c
+++ b/drivers/event/dpaa2/dpaa2_eventdev.c
@@ -333,7 +333,11 @@ dpaa2_eventdev_info_get(struct rte_eventdev *dev,
DPAA2_EVENT_MAX_PORT_ENQUEUE_DEPTH;
dev_info->max_num_events = DPAA2_EVENT_MAX_NUM_EVENTS;
dev_info->event_dev_cap = RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED |
-   RTE_EVENT_DEV_CAP_BURST_MODE;
+   RTE_EVENT_DEV_CAP_BURST_MODE|
+   RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK |
+   RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT |
+   RTE_EVENT_DEV_CAP_NONSEQ_MODE;
+
 }
 
 static int
diff --git a/drivers/event/octeontx/ssovf_evdev.c b/drivers/event/octeontx/ssovf_evdev.c
index b80a6c0..d85b4fb 100644
--- a/drivers/event/octeontx/ssovf_evdev.c
+++ b/drivers/event/octeontx/ssovf_evdev.c
@@ -187,7 +187,11 @@ ssovf_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *dev_info)
dev_info->max_num_events =  edev->max_num_events;
dev_info->event_dev_cap = RTE_EVENT_DEV_CAP_QUEUE_QOS |
RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED |
-   RTE_EVENT_DEV_CAP_QUEUE_ALL_TYPES;
+   RTE_EVENT_DEV_CAP_QUEUE_ALL_TYPES|
+   RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK |
+   RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT |
+   RTE_EVENT_DEV_CAP_NONSEQ_MODE;
+
 }
 
 static int
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 1ef6340..aed521b 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -488,7 +488,10 @@ sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
RTE_EVENT_DEV_CAP_QUEUE_QOS |
RTE_EVENT_DEV_CAP_BURST_MODE |
RTE_EVENT_DEV_CAP_EVENT_QOS |
-   RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE),
+   RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE|
+   RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK |
+   RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT |
+   RTE_EVENT_DEV_CAP_NONSEQ_MODE),
};
 
*info = evdev_sw_info;
-- 
2.7.5

--
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.



[dpdk-dev] [PATCH v4 4/8] lib/librte_eventdev: extend the eventdev capability flags

2017-12-22 Thread Liang Ma
This commit adds three new eventdev capability flags.

RTE_EVENT_DEV_CAP_NONSEQ_MODE

Event device is capable of operating in non-sequential mode. The path
of an event does not have to be sequential, and the application can change
the path of an event at runtime. If the flag is set, events may be sent to
queues in any order. If the flag is not set, each event follows a path from
queue 0 to queue 1 to queue 2 etc., and the eventdev returns an error when
the application enqueues an event for a qid which is not the next in the
sequence.

RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK

Event device is capable of configuring the queue/port link at runtime.
If the flag is not set, the eventdev queue/port link can only be
configured during initialization.

RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT

Event device is capable of setting up links between multiple queues
and a single port. If the flag is not set, the eventdev can only map a
single queue to each port or map a single queue to many ports.
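
As a rough illustration of how an application might act on these flags,
the following self-contained C sketch tests a capability bitmask before
choosing a destination queue. The SKETCH_* macros mirror the bit positions
proposed in this patch (6, 7 and 8) and the helper names are hypothetical;
a real application would read event_dev_cap from struct rte_event_dev_info
and use the RTE_EVENT_DEV_CAP_* names instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the flags proposed in this patch (same bit positions). */
#define SKETCH_CAP_NONSEQ_MODE         (1ULL << 6)
#define SKETCH_CAP_RUNTIME_PORT_LINK   (1ULL << 7)
#define SKETCH_CAP_MULTIPLE_QUEUE_PORT (1ULL << 8)

/*
 * Hypothetical check: may an event that is currently at cur_qid be
 * enqueued to dest_qid? Without NONSEQ_MODE only the next queue in
 * sequence is acceptable.
 */
static bool
sketch_enqueue_allowed(uint64_t caps, uint8_t cur_qid, uint8_t dest_qid)
{
	if (caps & SKETCH_CAP_NONSEQ_MODE)
		return true;
	return dest_qid == (uint8_t)(cur_qid + 1);
}

/* Hypothetical check: can queue/port links be changed after start-up? */
static bool
sketch_can_relink_at_runtime(uint64_t caps)
{
	return (caps & SKETCH_CAP_RUNTIME_PORT_LINK) != 0;
}

/* Hypothetical check: may one port be linked to more than one queue? */
static bool
sketch_port_can_serve_queues(uint64_t caps, unsigned int nb_queues)
{
	if (nb_queues <= 1)
		return true;
	return (caps & SKETCH_CAP_MULTIPLE_QUEUE_PORT) != 0;
}
```

A device that reports none of the three flags, which is how the cover
letter describes the OPDL PMD's static behaviour, would only pass the
sequential and single-queue cases.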

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 lib/librte_eventdev/rte_eventdev.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 1bbea57..91fd4ef 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -293,6 +293,28 @@ struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
  * @see rte_event_dequeue_burst() rte_event_enqueue_burst()
  */
 
+#define RTE_EVENT_DEV_CAP_NONSEQ_MODE (1ULL << 6)
+/**< Event device is capable of operating in non-sequential mode. The path
+ * of the event does not have to be sequential. The application can change
+ * the path of an event at runtime. If the flag is not set, then each event
+ * will follow a path from queue 0 to queue 1 to queue 2 etc. If the flag is
+ * set, events may be sent to queues in any order. If the flag is not set, the
+ * eventdev will return an error when the application enqueues an event for a
+ * qid which is not the next in the sequence.
+ */
+
+#define RTE_EVENT_DEV_CAP_RUNTIME_PORT_LINK   (1ULL << 7)
+/**< Event device is capable of configuring the queue/port link at runtime.
+ * If the flag is not set, the eventdev queue/port link can only be
+ * configured during initialization.
+ */
+
+#define RTE_EVENT_DEV_CAP_MULTIPLE_QUEUE_PORT (1ULL << 8)
+/**< Event device is capable of setting up links between multiple queues
+ * and a single port. If the flag is not set, the eventdev can only map a
+ * single queue to each port or map a single queue to many ports.
+ */
+
 /* Event device priority levels */
 #define RTE_EVENT_DEV_PRIORITY_HIGHEST   0
 /**< Highest priority expressed across eventdev subsystem
-- 
2.7.5




[dpdk-dev] [PATCH v4 6/8] maintainers: add the opdl pmd maintainer information

2017-12-22 Thread Liang Ma
Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f0baeb4..1b8d617 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -700,6 +700,12 @@ F: doc/guides/eventdevs/sw.rst
 F: examples/eventdev_pipeline_sw_pmd/
 F: doc/guides/sample_app_ug/eventdev_pipeline_sw_pmd.rst
 
+Software Eventdev PMD
+M: Liang Ma 
+M: Peter Mccarthy 
+F: drivers/event/opdl/
+F: doc/guides/eventdevs/opdl.rst
+
 
 Packet processing
 -
-- 
2.7.5




[dpdk-dev] [PATCH v4 7/8] doc:update 18.02 release notes

2017-12-22 Thread Liang Ma
Add the OPDL PMD description to the 18.02 release notes.

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 doc/guides/rel_notes/release_18_02.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 24b67bb..b2dc39c 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -42,6 +42,17 @@ New Features
  =
 
 
+* **Added New eventdev OPDL PMD**
+  The OPDL (Ordered Packet Distribution Library) eventdev is a specific
+  implementation of the eventdev API. It is particularly suited to packet
+  processing workloads that have high throughput and low latency requirements.
+  All packets follow the same path through the device. The order in which
+  packets follow it is determined by the order in which queues are set up.
+  Events are left on the ring until they are transmitted. As a result packets
+  do not go out of order.
+
+  With this change, applications can use the OPDL PMD via the eventdev API.
+
 API Changes
 ---
 
-- 
2.7.5




[dpdk-dev] [PATCH v4 8/8] doc: add eventdev opdl pmd docuement

2017-12-22 Thread Liang Ma
Add documentation describing the OPDL PMD.

Signed-off-by: Liang Ma 
Signed-off-by: Peter Mccarthy 
---
 doc/guides/eventdevs/index.rst |   1 +
 doc/guides/eventdevs/opdl.rst  | 162 +
 2 files changed, 163 insertions(+)
 create mode 100644 doc/guides/eventdevs/opdl.rst

diff --git a/doc/guides/eventdevs/index.rst b/doc/guides/eventdevs/index.rst
index ba2048c..07a41bc 100644
--- a/doc/guides/eventdevs/index.rst
+++ b/doc/guides/eventdevs/index.rst
@@ -40,3 +40,4 @@ application trough the eventdev API.
 dpaa2
 sw
 octeontx
+opdl
diff --git a/doc/guides/eventdevs/opdl.rst b/doc/guides/eventdevs/opdl.rst
new file mode 100644
index 000..4922eaa
--- /dev/null
+++ b/doc/guides/eventdevs/opdl.rst
@@ -0,0 +1,162 @@
+..  BSD LICENSE
+Copyright(c) 2017 Intel Corporation. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+OPDL Eventdev Poll Mode Driver
+==
+
+The OPDL (Ordered Packet Distribution Library) eventdev is a specific
+implementation of the eventdev API. It is particularly suited to packet
+processing workloads that have high throughput and low latency requirements.
+All packets follow the same path through the device. The order in which
+packets follow it is determined by the order in which queues are set up.
+Events are left on the ring until they are transmitted. As a result packets
+do not go out of order.
+
+
+Features
+
+
+The OPDL eventdev implements a subset of features of the eventdev API:
+
+Queues
+ * Atomic
+ * Ordered (Parallel is supported as parallel is a subset of Ordered)
+ * Single-Link
+
+Ports
+ * Load balanced (for Atomic, Ordered, Parallel queues)
+ * Single Link (for single-link queues)
+
+
+Configuration and Options
+-
+
+The software eventdev is a vdev device, and as such can be created from the
+application code, or from the EAL command line:
+
+* Call ``rte_vdev_init("event_opdl0")`` from the application
+
+* Use ``--vdev="event_opdl0"`` in the EAL options, which will call
+  rte_vdev_init() internally
+
+Example:
+
+.. code-block:: console
+
+./your_eventdev_application --vdev="event_opdl0"
+
+
+Single Port Queue
+~
+
+It is possible to create a Single Port Queue by setting the ``RTE_EVENT_QUEUE_CFG_SINGLE_LINK`` flag.
+Packets dequeued from this queue do not need to be re-enqueued (as is the
+case with an ordered queue). The purpose of this queue is to allow for
+asynchronous handling of packets in the middle of a pipeline. Ordered
+queues in the middle of a pipeline cannot delete packets.
+
+
+Queue Dependencies
+~~
+
+As stated, the order in which packets travel through queues is static in
+nature. They go through the queues in the order the queues are set up at
+initialisation with ``rte_event_queue_setup()``. For example, if an application
+sets up 3 queues, Q0, Q1 and Q2, and has 4 associated ports, P0, P1, P2 and
+P3, then packets must be
+
+ * Enqueued onto Q0 (typically through P0), then
+
+ * Dequeued from Q0 (typically through P1), then
+
+ * Enqueued onto Q1 (also through P1), then
+
+ * Dequeued from Q1 (typically through P2), then
+
+ * Enqueued onto Q2 (also through P2), then
+
+ * Dequeued from Q2 (typically through P3) and then transmitted on the
+   relevant eth port
+
+
+Limitations
+---
+
+The opdl implementation has a number of limitations. These limitations are
+due to the static nature of the underlying queues. It is because of this
+that

[dpdk-dev] [PATCH] mbuf: pktmbuf pool create helper for specific mempool ops

2017-12-22 Thread Hemant Agrawal
Introduce a new helper for pktmbuf pool, which will allow
the application to optionally specify the mempool ops name
as well.
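
The selection rule the new helper applies (use the caller's ops name when
one is given, otherwise fall back to the default) can be sketched in plain
C as below. This is a stand-alone illustration, not the DPDK code:
sketch_default_ops() is a hypothetical placeholder for
rte_eal_mbuf_default_mempool_ops(), and the "ring_mp_mc" string it returns
is only an assumed default for the example.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for rte_eal_mbuf_default_mempool_ops();
 * the returned name is an assumption made for this sketch.
 */
static const char *
sketch_default_ops(void)
{
	return "ring_mp_mc";
}

/*
 * Pick the mempool ops name: an explicit name wins, a NULL name means
 * "use the default", mirroring the NULL check added by this patch.
 */
static const char *
sketch_pick_ops(const char *ops_name)
{
	if (ops_name == NULL)
		return sketch_default_ops();
	return ops_name;
}
```

Existing callers of rte_pktmbuf_pool_create() correspond to the NULL case,
so their behaviour is unchanged, while a caller of the proposed
rte_pktmbuf_pool_create_specific() can pass a name of its own.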

Signed-off-by: Hemant Agrawal 
---
This change was discussed in the 
"doc: announce ABI change for pktmbuf pool create API"
http://dpdk.org/dev/patchwork/patch/32306/

 lib/librte_mbuf/rte_mbuf.c | 24 ++--
 lib/librte_mbuf/rte_mbuf.h | 42 ++
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 7543662..9cc861b 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -148,15 +148,15 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->next = NULL;
 }
 
-/* helper to create a mbuf pool */
+/* helper to create a mbuf pool with given mempool ops*/
 struct rte_mempool *
-rte_pktmbuf_pool_create(const char *name, unsigned n,
-   unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
-   int socket_id)
+rte_pktmbuf_pool_create_specific(const char *name, unsigned int n,
+   unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+   int socket_id, const char *ops_name)
 {
struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
-   const char *mp_ops_name;
+   const char *mp_ops_name = ops_name;
unsigned elt_size;
int ret;
 
@@ -176,7 +176,9 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
if (mp == NULL)
return NULL;
 
-   mp_ops_name = rte_eal_mbuf_default_mempool_ops();
+   if (!mp_ops_name)
+   mp_ops_name = rte_eal_mbuf_default_mempool_ops();
+
ret = rte_mempool_set_ops_byname(mp, mp_ops_name, NULL);
if (ret != 0) {
RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
@@ -198,6 +200,16 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
return mp;
 }
 
+/* helper to create a mbuf pool */
+struct rte_mempool *
+rte_pktmbuf_pool_create(const char *name, unsigned int n,
+   unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+   int socket_id)
+{
+   return rte_pktmbuf_pool_create_specific(name, n, cache_size, priv_size,
+   data_room_size, socket_id, NULL);
+}
+
 /* do some sanity checks on a mbuf: panic if it fails */
 void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ce8a05d..d4681fd 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1081,6 +1081,48 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
int socket_id);
 
 /**
+ * Create a mbuf pool with specific mempool ops
+ *
+ * This function creates and initializes a packet mbuf pool. It is
+ * a wrapper to rte_mempool functions.
+ *
+ * @param name
+ *   The name of the mbuf pool.
+ * @param n
+ *   The number of elements in the mbuf pool. The optimum size (in terms
+ *   of memory usage) for a mempool is when n is a power of two minus one:
+ *   n = (2^q - 1).
+ * @param cache_size
+ *   Size of the per-core object cache. See rte_mempool_create() for
+ *   details.
+ * @param priv_size
+ *   Size of application private area between the rte_mbuf structure
+ *   and the data buffer. This value must be aligned to RTE_MBUF_PRIV_ALIGN.
+ * @param data_room_size
+ *   Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM.
+ * @param socket_id
+ *   The socket identifier where the memory should be allocated. The
+ *   value can be *SOCKET_ID_ANY* if there is no NUMA constraint for the
+ *   reserved zone.
+ * @param ops_name
+ *   The mempool ops name to be used for this mempool instead of
+ *   default mempool. The value can be *NULL* to use default mempool.
+ * @return
+ *   The pointer to the new allocated mempool, on success. NULL on error
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *- E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *- E_RTE_SECONDARY - function was called from a secondary process instance
+ *- EINVAL - cache size provided is too large, or priv_size is not aligned.
+ *- ENOSPC - the maximum number of memzones has already been allocated
+ *- EEXIST - a memzone with the same name already exists
+ *- ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_mempool *
+rte_pktmbuf_pool_create_specific(const char *name, unsigned int n,
+   unsigned int cache_size, uint16_t priv_size, uint16_t data_room_size,
+   int socket_id, const char *ops_name);
+
+/**
  * Get the data room size of mbufs stored in a pktmbuf_pool
  *
  * The data room size is the amount of data that can be stored in a
-- 
2.7.4
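
For readers following the patch, the NULL fallback on ops_name can be sketched
in isolation. This is a standalone model, not the DPDK code:
demo_default_mempool_ops() is a hypothetical stand-in for
rte_eal_mbuf_default_mempool_ops(), and the "ring_mp_mc" string it returns is
only assumed here as a typical default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rte_eal_mbuf_default_mempool_ops(). */
static const char *
demo_default_mempool_ops(void)
{
	return "ring_mp_mc"; /* assumed default name, for illustration only */
}

/*
 * Mirrors the selection logic added by the patch: use the caller's
 * ops name when one is given, otherwise fall back to the default.
 */
static const char *
demo_select_mempool_ops(const char *ops_name)
{
	const char *mp_ops_name = ops_name;

	if (mp_ops_name == NULL)
		mp_ops_name = demo_default_mempool_ops();
	return mp_ops_name;
}
```

With this shape, rte_pktmbuf_pool_create() can stay a thin wrapper that passes
NULL, as the patch does.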



Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Thomas Monjalon
22/12/2017 11:04, Hemant Agrawal:
> On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
> > Hi,
> >
> > 22/12/2017 06:57, Hemant Agrawal:
> >> This patch moves the Linux kernel modules code to a common place.
> >>  - Separate the kernel module code from user space code.
> >>  - The GPL-2.0 licensed code is separated from the BSD-3 licensed userspace
> >>code
> >
> > What is the benefit of separating things by license?
> 
> The separation makes it easy to identify and check the license.
> 
> Any patch introducing new file in *non-kern* folders shall not be 
> GPL-2.0 licensed.  Or GPL-2.0 license is allowed only for kern folder.

The kernel modules are in DPDK only for historical reasons.
We should get rid of them, and rely only on upstream modules.

And it should be allowed to have kernel-related files elsewhere.
Examples: GPL tools or BPF code.

> > These modules are Linux modules, so they should be in the linuxapp dir.
> 
> 
> This is a cleaner separation w.r.t userspace/kernel space code.
> *kern* is a better placeholder for LKMs.

I prefer "kernel" name.

> Also eal is not getting overloaded.
> 
> linuxapp is part of librte_eal.  KNI is not related to EAL, but still 
> the kni kernel code is added to librte_eal under linuxapp.

Yes it makes sense.

More opinions/votes?

> > There are also some kernel modules in the bsdapp directory.
> 
> We can move them as well.



[dpdk-dev] [PATCH] eal: add function to return number of detected sockets

2017-12-22 Thread Anatoly Burakov
During lcore scan, find maximum socket ID and store it.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_lcore.c  | 12 
 lib/librte_eal/common/include/rte_eal.h   |  1 +
 lib/librte_eal/common/include/rte_lcore.h |  8 
 lib/librte_eal/rte_eal_version.map|  6 ++
 4 files changed, 27 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 0db1555..546c802 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -57,6 +57,7 @@ rte_eal_cpu_init(void)
struct rte_config *config = rte_eal_get_configuration();
unsigned lcore_id;
unsigned count = 0;
+   unsigned max_socket_id = 0;
 
/*
 * Parse the maximum set of logical cores, detect the subset of running
@@ -100,6 +101,8 @@ rte_eal_cpu_init(void)
lcore_id, lcore_config[lcore_id].core_id,
lcore_config[lcore_id].socket_id);
count++;
+   max_socket_id = RTE_MAX(max_socket_id,
+   lcore_config[lcore_id].socket_id);
}
/* Set the count of enabled logical cores of the EAL configuration */
config->lcore_count = count;
@@ -108,5 +111,14 @@ rte_eal_cpu_init(void)
RTE_MAX_LCORE);
RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+   config->numa_node_count = max_socket_id + 1;
+   RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
return 0;
 }
+
+unsigned rte_num_sockets(void)
+{
+   const struct rte_config *config = rte_eal_get_configuration();
+   return config->numa_node_count;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 8e4e71c..5b12914 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -83,6 +83,7 @@ enum rte_proc_type_t {
 struct rte_config {
uint32_t master_lcore;   /**< Id of the master lcore */
uint32_t lcore_count;/**< Number of available logical cores. */
+   uint32_t numa_node_count;/**< Number of detected NUMA nodes. */
uint32_t service_lcore_count;/**< Number of available service cores. */
enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index c89e6ba..6a75c9b 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -148,6 +148,14 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets on the system.
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ *
+ */
+unsigned rte_num_sockets(void);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f4f46c1..e086c6e 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -200,6 +200,12 @@ DPDK_17.11 {
 
 } DPDK_17.08;
 
+DPDK_18.02 {
+   global:
+
+   rte_num_sockets;
+} DPDK_17.11;
+
 EXPERIMENTAL {
global:
 
-- 
2.7.4
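
As a side note on what the stored value means, here is a standalone sketch of
the counting rule used in the patch (maximum socket ID seen, plus one).
demo_count_sockets() and its input array are hypothetical; the real code walks
lcore_config[] inside rte_eal_cpu_init().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the patch's rule: the reported count is the
 * highest socket ID found among the scanned lcores, plus one.
 */
static unsigned int
demo_count_sockets(const unsigned int *socket_ids, size_t n)
{
	unsigned int max_socket_id = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (socket_ids[i] > max_socket_id)
			max_socket_id = socket_ids[i];
	}
	return max_socket_id + 1;
}
```

Under this rule a system whose lcores report only sockets 0 and 2 yields 3, so
the value is an upper bound on socket IDs rather than a count of populated
sockets.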


[dpdk-dev] [PATCH v2] eal: add function to return number of detected sockets

2017-12-22 Thread Anatoly Burakov
During lcore scan, find maximum socket ID and store it.

Signed-off-by: Anatoly Burakov 
---

Notes:
v2:
- checkpatch changes
- check socket before deciding if the core is not to be used

 lib/librte_eal/common/eal_common_lcore.c  | 37 +--
 lib/librte_eal/common/include/rte_eal.h   |  1 +
 lib/librte_eal/common/include/rte_lcore.h |  8 +++
 lib/librte_eal/rte_eal_version.map|  6 +
 4 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 0db1555..c9729a0 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -57,6 +57,7 @@ rte_eal_cpu_init(void)
struct rte_config *config = rte_eal_get_configuration();
unsigned lcore_id;
unsigned count = 0;
+   unsigned int socket_id, max_socket_id = 0;
 
/*
 * Parse the maximum set of logical cores, detect the subset of running
@@ -68,6 +69,19 @@ rte_eal_cpu_init(void)
/* init cpuset for per lcore config */
CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+   /* find socket first */
+   socket_id = eal_cpu_socket_id(lcore_id);
+   if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+   socket_id = 0;
+#else
+   RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+   socket_id, RTE_MAX_NUMA_NODES);
+   return -1;
+#endif
+   }
+   max_socket_id = RTE_MAX(max_socket_id, socket_id);
+
/* in 1:1 mapping, record related cpu detected state */
lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
if (lcore_config[lcore_id].detected == 0) {
@@ -83,18 +97,7 @@ rte_eal_cpu_init(void)
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_role = ROLE_RTE;
lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-   lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-   if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-   lcore_config[lcore_id].socket_id = 0;
-#else
-   RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-   "RTE_MAX_NUMA_NODES (%d)\n",
-   lcore_config[lcore_id].socket_id,
-   RTE_MAX_NUMA_NODES);
-   return -1;
-#endif
-   }
+   lcore_config[lcore_id].socket_id = socket_id;
RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
"core %u on socket %u\n",
lcore_id, lcore_config[lcore_id].core_id,
@@ -108,5 +111,15 @@ rte_eal_cpu_init(void)
RTE_MAX_LCORE);
RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+   config->numa_node_count = max_socket_id + 1;
+   RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
return 0;
 }
+
+unsigned int
+rte_num_sockets(void)
+{
+   const struct rte_config *config = rte_eal_get_configuration();
+   return config->numa_node_count;
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 8e4e71c..5b12914 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -83,6 +83,7 @@ enum rte_proc_type_t {
 struct rte_config {
uint32_t master_lcore;   /**< Id of the master lcore */
uint32_t lcore_count;/**< Number of available logical cores. */
+   uint32_t numa_node_count;/**< Number of detected NUMA nodes. */
uint32_t service_lcore_count;/**< Number of available service cores. */
enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index c89e6ba..7c72c9e 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -148,6 +148,14 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets on the system.
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ *
+ */
+unsigned int rte_num_sockets(void);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f4f46c1..e086c6e 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -200,6 +200,12 @@ DPDK_17.11 {
 
 } DPDK_17.08;
 
+DPDK_18.02 {
+   global:
+
+   rte

Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Burakov, Anatoly

On 22-Dec-17 10:04 AM, Hemant Agrawal wrote:

On 12/22/2017 2:13 PM, Thomas Monjalon wrote:

Hi,

22/12/2017 06:57, Hemant Agrawal:

This patch moves the Linux kernel modules code to a common place.
 - Separate the kernel module code from user space code.
 - The GPL-2.0 licensed code is separated from the BSD-3 licensed userspace
   code


What is the benefit of separating things by license?


The separation makes it easy to identify and check the license.

Any patch introducing new file in *non-kern* folders shall not be 
GPL-2.0 licensed.  Or GPL-2.0 license is allowed only for kern folder.


The latter is better since BSD kernel modules are not GPL-licensed. So, 
anything in the kern/kernel dir is not *necessarily* GPL-licensed, but 
anything *outside* kern/kernel dir is *necessarily not* GPL-licensed.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v1 6/6] net: fix rte_ether conflicts with libc

2017-12-22 Thread Olivier MATZ
Hi Adrien,

On Thu, Dec 21, 2017 at 02:00:06PM +0100, Adrien Mazarguil wrote:
> Applications can't combine either net/ethernet.h or netinet/ether.h
> together with rte_ether.h due to the redefinition of struct ether_addr and
> various macros by the latter.
> 
> This patch adapts rte_ether.h to rely on system definitions while
> maintaining DPDK additions.
> 
> An unforeseen consequence of involving more system header files is compilation
> issues with some base drivers (i40e, ixgbe) defining their own conflicting
> types (e.g. __le64). This is addressed by explicitly including rte_ether.h
> where missing to ensure system definitions always come first.
> 
> Signed-off-by: Adrien Mazarguil 
> Cc: Olivier Matz 
> Cc: Bruce Richardson 

[...]

> --- a/drivers/net/qede/base/bcm_osal.h
> +++ b/drivers/net/qede/base/bcm_osal.h
> @@ -334,7 +334,6 @@ u32 qede_find_first_zero_bit(unsigned long *, u32);
>   qede_find_first_zero_bit(bitmap, length)
>  
>  #define OSAL_BUILD_BUG_ON(cond)  nothing
> -#define ETH_ALEN ETHER_ADDR_LEN
>  
>  #define OSAL_BITMAP_WEIGHT(bitmap, count) 0
>  

Not sure we can update code in a 'base' driver as easily.
It should be checked how it can be done with the qede maintainer.


> diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> index 06d7b486c..19e62ea89 100644
> --- a/lib/librte_net/rte_ether.h
> +++ b/lib/librte_net/rte_ether.h
> @@ -44,6 +44,7 @@
>  extern "C" {
>  #endif
>  
> +#include 
>  #include 
>  #include 
>  
> @@ -52,15 +53,7 @@ extern "C" {
>  #include 
>  #include 
>  
> -#define ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
> -#define ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
> -#define ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
> -#define ETHER_HDR_LEN   \
> - (ETHER_ADDR_LEN * 2 + ETHER_TYPE_LEN) /**< Length of Ethernet header. */
> -#define ETHER_MIN_LEN   64/**< Minimum frame len, including CRC. */
> -#define ETHER_MAX_LEN   1518  /**< Maximum frame len, including CRC. */
> -#define ETHER_MTU   \
> - (ETHER_MAX_LEN - ETHER_HDR_LEN - ETHER_CRC_LEN) /**< Ethernet MTU. */
> +#define ETHER_MTU ETHERMTU /**< Deprecated, defined for compatibility. */
>  
>  #define ETHER_MAX_VLAN_FRAME_LEN \
>   (ETHER_MAX_LEN + 4) /**< Maximum VLAN frame length, including CRC. */
> @@ -72,8 +65,11 @@ extern "C" {
>  
>  #define ETHER_MIN_MTU 68 /**< Minimum MTU for IPv4 packets, see RFC 791. */
>  
> +#ifdef __DOXYGEN__
> +
>  /**
> - * Ethernet address:
> + * Ethernet address.
> + *
>   * A universally administered address is uniquely assigned to a device by its
>   * manufacturer. The first three octets (in transmission order) contain the
>   * Organizationally Unique Identifier (OUI). The following three (MAC-48 and
> @@ -82,11 +78,25 @@ extern "C" {
>   * A locally administered address is assigned to a device by a network
>   * administrator and does not contain OUIs.
>   * See http://standards.ieee.org/regauth/groupmac/tutorial.html
> + *
> + * This structure is defined system-wide by "net/ethernet.h", however since
> + * the name of its data field is OS-dependent, a macro named "addr_bytes" is
> + * defined as an alias for the convenience of DPDK applications.
> + *
> + * The following definition is only for documentation purposes.
>   */
>  struct ether_addr {
>   uint8_t addr_bytes[ETHER_ADDR_LEN]; /**< Addr bytes in tx order */
>  } __attribute__((__packed__));
>  
> +#endif /* __DOXYGEN__ */
> +
> +#if defined(__FreeBSD__)
> +#define addr_bytes octet
> +#else
> +#define addr_bytes ether_addr_octet
> +#endif
> +
>  #define ETHER_LOCAL_ADMIN_ADDR 0x02 /**< Locally assigned Eth. address. */
>  #define ETHER_GROUP_ADDR   0x01 /**< Multicast or broadcast Eth. address. */

This kind of #define looks a bit dangerous to me: it can trigger
strange bugs because it will replace all occurrences of addr_bytes
after this header is included.

Wouldn't it be a good opportunity to think about adding the rte_ prefix
to all variables/functions of rte_ether.h?
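
To make the concern concrete, here is a small standalone sketch of how an
object-like macro named addr_bytes leaks into unrelated code. struct demo_hdr
and the DEMO_* helpers are hypothetical; only the #define line is taken from
the patch (non-FreeBSD branch).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* From the patch (non-FreeBSD branch). */
#define addr_bytes ether_addr_octet

/* Two-level stringification, so the argument is macro-expanded first. */
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR(x)

/*
 * An unrelated structure that happens to use the same member name.
 * After the #define above, the member is really called
 * ether_addr_octet, without any diagnostic from the compiler.
 */
struct demo_hdr {
	uint8_t addr_bytes[6];
};

/* The name the preprocessor actually emits for "addr_bytes". */
static const char demo_expanded_name[] = DEMO_XSTR(addr_bytes);
```

The same silent renaming would apply to local variables, function parameters
or structure members in any application code that includes the header, which
is one argument for rte_-prefixed names instead.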


Re: [dpdk-dev] [RFC v3 0/1] Compression API in DPDK

2017-12-22 Thread Trahe, Fiona
Hi Ahmed, 
thanks for your feedback and sorry for the slow response.
Comments below.

> -Original Message-
> From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com]
> Sent: Monday, December 18, 2017 9:07 PM
> To: dev@dpdk.org; shally.ve...@cavium.com; Hemant Agrawal 
> 
> Cc: Hemant Agrawal ; mahipal.cha...@cavium.com; 
> Trahe, Fiona
> ; narayanaprasad.athr...@cavium.com; De Lara Guarch, 
> Pablo
> ; Roy Pledge ; Youri 
> Querry
> 
> Subject: Re: [RFC v3 0/1] Compression API in DPDK
> 
> Hi Fiona,
> 
> On 12/15/2017 11:16 PM, Trahe, Fiona wrote:
> 
> > With the vast amounts of data being transported around networks and stored in
> > storage systems, reducing data size is becoming ever more important. There
> > are both software libraries and hardware devices available that provide
> > compression, but no common API. This RFC proposes a compression API for
> > DPDK to address this need.
> >
> > Features:
> > • Deflate Algorithm (https://tools.ietf.org/html/rfc1951)
> > • LZS algorithm (https://tools.ietf.org/html/rfc2395)
> > • Static and Dynamic Huffman encoding.
> > • Compression levels
> > • Checksum generation
> > • Asynchronous burst API
> > • Session-based (a session contains immutable data only and is useable across devices)
> > • stream-based to maintain state and history data for stateful flows.
> >
> > Note 1: Split of functionality above/below API
> > When considering whether features should be supported on the API or not, the
> > decision was based on the following:
> > The purpose of the API is to decouple the application from the
> > compute-intensive functions needed for compression by abstracting them
> > under a common API. These can then be implemented by either hardware
> > accelerators or optimised software libraries. Where features are not
> > compute-intensive and unlikely to be offloaded or optimised, there’s
> > nothing to be gained by each PMD having to separately implement them, and
> > it makes more sense for them to be done above the API. So the following are
> > not handled on the API and can be done above.
> > • Prepending/appending protocol headers (gzip, zlib)
> 
> Agreed with the notion, however the header and footer handling can be
> added as an option. PMDs can support or not support each format. We
> (NXP) support auto padding of gzip and zlib headers as well as DEFLATE only.
[Fiona] We'd like to stabilise the API with the current functionality before
adding new features. Our focus now is on delivering a v1 of the full code
rather than another iteration of the RFC.
The API will be experimental initially, so I think it should be easy to add
these later.
 
> > • File-handling, breaking files up into packets, reassembling.
> > • Synchronous API
> > • Serialisation of stateful requests
> >
> >
> Is stateful planned for next phase? During design discussions we uncovered
> many API design considerations necessary for stateful use. Chained stateful
> support in the future might not be possible without breaking compatibility.
> 
[Fiona] Stateful is covered in the v3 version. See the thread 
http://dpdk.org/ml/archives/dev/2017-December/084713.html
Have a look at this, though maybe it would be better to hold off on a reply 
until the v2 version of this doc is posted by Shally.


> > Note 2: The tricky question of where the API belongs
> > We considered
> > 1. Extending cryptodev
> > 2. New acceldev APIs for device handling + compressdev APIs for data path
> > 3. New acceldev for all APIs
> > 4. New compressdev API
> > We've gone with option 4, a compressdev API.  See original RFC [1] for
> > reasons.
> > We explored wrapping this around a generic acceldev that would be hidden
> > from the API but could be common to cryptodev, compressdev and other
> > accelerators on the PMD interface, but this added complexity and
> > indirection and didn't add enough value, so we've abandoned it.
> >
> Makes sense. compression is common enough to be attempted to be a
> different device category.
> 
> 
> > Opens:
> >  - Define structures and API for proposed hash functionality
> >  - Agree on stateful behaviour
> 
> What are the current thoughts for stateful behavior?
> 
> >  - Complete capability APIs
> 
> A capability API is very much required as different HW/SW can have
> different capabilities.
[Fiona] Agreed. We have work to do to flesh this out  - but expect to do in v1

Re: [dpdk-dev] [PATCH 01/11] avp: implement dynamic logging

2017-12-22 Thread Olivier MATZ
On Thu, Dec 21, 2017 at 10:02:14AM -0800, Ferruh Yigit wrote:
> On 12/20/2017 10:58 AM, Stephen Hemminger wrote:
> >> [1] something like:
> >>  #define INIT_LOG_VAR_NAME(pmd, type)   logtype_ ## pmd ## _ ## type
> >>  #define INIT_LOG_FUNC_NAME(pmd, type)  log_ ## pmd ## _ ## type
> >>
> >>  #define PMD_INIT_LOG(pmd, type, level)\
> >> int INIT_LOG_VAR_NAME(pmd, type);   \
> >> RTE_INIT(INIT_LOG_FUNC_NAME(pmd, type));\
> >> static void INIT_LOG_FUNC_NAME(pmd, type)(void) \
> >> {   \
> >> INIT_LOG_VAR_NAME(pmd, type) = rte_log_register("pmd." \
> >> RTE_STR(pmd) "." RTE_STR(type)); \
> >> if (INIT_LOG_VAR_NAME(pmd, type) > 0)   \
> >> rte_log_set_level(INIT_LOG_VAR_NAME(pmd, type), \
> >> RTE_LOG_##level); \
> >> }
> > 
> > That macro is a little complex.  Also, for better or worse, the current
> > logging is done on a per driver basis. If we want to do something fancier
> > it should be in common EAL core.
> 
> Of course, my intention was putting it into rte_log.h so updates in each driver
> will be minimal. But this can be done better to cover library updates as well.

It's a good idea.

Below is another proposition (untested) that panics if
rte_log_register() fails, and that defines a static variable with a
predefined name.

 #define RTE_LOG_TYPE_REGISTER(name, level)  \
 static int name##_log_type; \
 __attribute__((constructor, used))  \
 static void rte_log_register_##name(void)   \
 {   \
 name##_log_type = rte_log_register(#name);  \
 RTE_VERIFY(name##_log_type >= 0);   \
 rte_log_set_level(name##_log_type, level);  \
 }
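
A standalone sketch to exercise the macro above outside DPDK (GCC/Clang only,
because of the constructor attribute). rte_log_register(), rte_log_set_level(),
RTE_VERIFY() and RTE_LOG_DEBUG are replaced here by minimal stand-ins with
simplified behaviour; they are assumptions for the demo, not the EAL
implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_LOGTYPES 8
#define RTE_LOG_DEBUG 8U /* stand-in value for the demo */

/* Stand-in registry: name and level per registered log type. */
static const char *demo_names[DEMO_MAX_LOGTYPES];
static uint32_t demo_levels[DEMO_MAX_LOGTYPES];
static int demo_count;

/* Stand-in for rte_log_register(): returns a type id, or -1 when full. */
static int
rte_log_register(const char *name)
{
	if (demo_count >= DEMO_MAX_LOGTYPES)
		return -1;
	demo_names[demo_count] = name;
	return demo_count++;
}

/* Stand-in for rte_log_set_level(). */
static int
rte_log_set_level(uint32_t type, uint32_t level)
{
	if (type >= DEMO_MAX_LOGTYPES)
		return -1;
	demo_levels[type] = level;
	return 0;
}

/* Stand-in for RTE_VERIFY(): abort when the condition does not hold. */
#define RTE_VERIFY(exp) do { if (!(exp)) abort(); } while (0)

/* The macro as proposed in the mail. */
#define RTE_LOG_TYPE_REGISTER(name, level)                 \
	static int name##_log_type;                        \
	__attribute__((constructor, used))                 \
	static void rte_log_register_##name(void)          \
	{                                                  \
		name##_log_type = rte_log_register(#name); \
		RTE_VERIFY(name##_log_type >= 0);          \
		rte_log_set_level(name##_log_type, level); \
	}

/* Registers "pmd_demo" before main() runs. */
RTE_LOG_TYPE_REGISTER(pmd_demo, RTE_LOG_DEBUG)
```

Note that #name yields the C identifier as the log name, so a dotted name such
as "pmd.avp.init" would need a separate string argument.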


Re: [dpdk-dev] [PATCH] mbuf: pktmbuf pool create helper for specific mempool ops

2017-12-22 Thread Wiles, Keith


> On Dec 22, 2017, at 5:30 AM, Hemant Agrawal  wrote:
> 
> Introduce a new helper for pktmbuf pool, which will allow
> the application to optionally specify the mempool ops name
> as well.
> 
> Signed-off-by: Hemant Agrawal 
> ---
> This change was discussed in the 
> "doc: announce ABI change for pktmbuf pool create API"
> http://dpdk.org/dev/patchwork/patch/32306/
> 
> lib/librte_mbuf/rte_mbuf.c | 24 ++--
> lib/librte_mbuf/rte_mbuf.h | 42 ++
> 2 files changed, 60 insertions(+), 6 deletions(-)
> 

This patch looks good to me, but you forgot the MAP file patch to add the new 
API.

Regards,
Keith



Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Wiles, Keith


> On Dec 22, 2017, at 5:38 AM, Thomas Monjalon  wrote:
> 
> 22/12/2017 11:04, Hemant Agrawal:
>> On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
>>> Hi,
>>> 
>>> 22/12/2017 06:57, Hemant Agrawal:
>>>> This patch moves the Linux kernel modules code to a common place.
>>>> - Separate the kernel module code from user space code.
>>>> - The GPL-2.0 licensed code is separated from the BSD-3 licensed userspace
>>>>   code
>>> 
>>> What is the benefit of separating things by license?
>> 
>> The separation makes it easy to identify and check the license.
>> 
>> Any patch introducing new file in *non-kern* folders shall not be 
>> GPL-2.0 licensed.  Or GPL-2.0 license is allowed only for kern folder.
> 
> The kernel modules are in DPDK only for historical reasons.
> We should get rid of them, and rely only on upstream modules.
> 
> And it should be allowed to have kernel-related files elsewhere.
> Examples: GPL tools or BPF code.
> 
>>> These modules are Linux modules, so they should be in the linuxapp dir.
>> 
>> 
>> This is a cleaner separation w.r.t userspace/kernel space code.
>> *kern* is a better placeholder for LKMs.
> 
> I prefer "kernel" name.

The name should be related to Linux in some way, like linux_kern or
linux_kernel or linux_modules (this is the one I prefer); this way it makes it
clear which OS they are designed for.

> 
>> Also eal is not getting overloaded.
>> 
>> linuxapp is part of librte_eal.  KNI is not related to EAL, but still 
>> the kni kernel code is added to librte_eal under linuxapp.
> 
> Yes it makes sense.
> 
> More opinions/votes?
> 
>>> There are also some kernel modules in the bsdapp directory.
>> 
>> We can move them as well.

Regards,
Keith



Re: [dpdk-dev] [PATCH 1/2] mempool: indicate the usages of multi memzones

2017-12-22 Thread Olivier MATZ
On Wed, Dec 20, 2017 at 05:29:59PM +0530, Hemant Agrawal wrote:
> On 12/19/2017 6:38 PM, Hemant Agrawal wrote:
> > 
> > > That's true, I commented too fast :)
> > > And what about using mp->nb_mem_chunks instead? Would it do the job
> > > in your use-case?
> > 
> > It should work.  Let me check it out.
> 
> There is a slight problem with nb_mem_chunks.
> 
> It is getting incremented in the end of "rte_mempool_populate_phys",
> while the elements are getting populated before it in the call of
> mempool_add_elem.
> 
> I can use an nb_mem_chunks == 0 check. However, it can break in future if
> mempool_populate_phys changes.

Sorry, I'm not sure I'm getting what you say.

My question was about using mp->nb_mem_chunks instead of a new flag in the
dpaa driver. Am I missing something?


Re: [dpdk-dev] [RFC v2 3/5] ether: Add flow timeout support

2017-12-22 Thread Wiles, Keith


> On Dec 22, 2017, at 3:03 AM, Zhang, Qi Z  wrote:
> 
> Alex:
> 
>> -Original Message-
>> From: Alex Rosenbaum [mailto:rosenbauma...@gmail.com]
>> Sent: Thursday, December 21, 2017 9:59 PM
>> To: Zhang, Qi Z 
>> Cc: adrien.mazarg...@6wind.com; DPDK ; Doherty, Declan
>> 
>> Subject: Re: [dpdk-dev] [RFC v2 3/5] ether: Add flow timeout support
>> 
>> On Thu, Dec 21, 2017 at 4:35 AM, Qi Zhang  wrote:
>>> Add new APIs to support flow timeout, application is able to 1. Setup
>>> the time duration of a flow, the flow is expected to be deleted
>>> automatically when timeout.
>> 
>> Can you explain how the application (OVS) is expected to use this API?
>> It will help to better understand the motivation here...
> 
> I think the purpose of the APIs is to expose the hardware feature that supports
> automatic flow deletion on timeout.
> As far as I know, in OVS every flow in the flow table has a time duration.
> A flow offloaded to hardware is still required to be deleted at a specific
> time.
> I think these APIs help OVS take advantage of the HW feature and simplify
> flow aging management.
> 
>> 
>> Are you trying to move the aging timer from application code into the PMD?
>> or can your HW remove/disable/inactivate a flow at certain time semantics
>> without software context?
> 
> Yes, it is for a hardware feature.

We also need to support a software timeout feature here and not just a hardware 
one. The reason is to make the APIs consistent across all hardware. If you are 
going to include hardware timeout then we need to add software supported 
timeout at the same time IMO.

> 
>> 
>> I would prefer to have the aging timer logic in a centralized location, like
>> the application itself or some DPDK library, instead of having each PMD
>> implement its own software timers.
>> 
>> 
>>> 3. Register a callback function when a flow is deleted due to timeout.
>> 
>> Is the application 'struct rte_flow*' handle really deleted? or the flow was
>> removed from HW, just in-active at this time?
> 
> Here the flow is deleted, same thing happen as rte_flow_destroy and we need 
> to call
> rte_flow_create to re-enable the flow. 
> I will add more explanation to avoid confusion in next release.

Sorry, I am a little late to this thread, but we cannot have 1000 separate
callbacks, one per timeout; we need to make sure we bunch up a number of
timeouts at a time to make the feature more performant IMO. Maybe that is
discussed or addressed in the code.
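
One possible shape for the batching mentioned above, as a standalone sketch.
Everything here is hypothetical (struct demo_flow, demo_expire_flows(), the
callback type and the batch size); it is not part of the RFC or of rte_flow.
The idea is just one callback per batch of expired flows rather than one per
flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH 32 /* hypothetical maximum flows per callback */

struct demo_flow {
	uint64_t deadline; /* absolute expiry time, arbitrary units */
	int active;
};

/* Called once per batch with the indexes of the expired flows. */
typedef void (*demo_expire_cb)(const size_t *idx, size_t n, void *arg);

/* Marks expired flows inactive; returns the number of callbacks made. */
static size_t
demo_expire_flows(struct demo_flow *flows, size_t nb_flows, uint64_t now,
		demo_expire_cb cb, void *arg)
{
	size_t batch[DEMO_BATCH];
	size_t n = 0, calls = 0, i;

	for (i = 0; i < nb_flows; i++) {
		if (!flows[i].active || flows[i].deadline > now)
			continue;
		flows[i].active = 0;
		batch[n++] = i;
		if (n == DEMO_BATCH) {
			/* Batch is full: report it and start a new one. */
			cb(batch, n, arg);
			calls++;
			n = 0;
		}
	}
	if (n > 0) {
		/* Report the final, partially filled batch. */
		cb(batch, n, arg);
		calls++;
	}
	return calls;
}

/* Example callback: accumulate the number of expired flows. */
static void
demo_count_cb(const size_t *idx, size_t n, void *arg)
{
	(void)idx;
	*(size_t *)arg += n;
}
```

With 70 flows expiring in one scan and a batch size of 32, the application
sees three callbacks instead of 70.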

> 
>> 
>> Can a flow be re-activated? or does this require a call to
>> rte_flow_destroy() and rte_flow_create()?
>> 
>> Alex
> 
> Thanks
> Qi

Regards,
Keith



Re: [dpdk-dev] [RFC v3 1/1] lib: add compressdev API

2017-12-22 Thread Trahe, Fiona
Hi Ahmed,

> -Original Message-
> From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com]
> Sent: Monday, December 18, 2017 9:44 PM
> To: dev@dpdk.org; shally.ve...@cavium.com
> Cc: mahipal.cha...@cavium.com; narayanaprasad.athr...@cavium.com; De Lara 
> Guarch, Pablo
> ; Trahe, Fiona ; Roy 
> Pledge
> ; Youri Querry ; Hemant Agrawal
> 
> Subject: Re: [RFC v3 1/1] lib: add compressdev API
> 
> On 12/15/2017 11:19 PM, Trahe, Fiona wrote:
> .. 
> 
> > +
> > +/** Compression Algorithms */
> > +enum rte_comp_algorithm {
> > +   RTE_COMP_NULL = 0,
> > +   /**< No compression.
> > +* Pass-through, data is copied unchanged from source buffer to
> > +* destination buffer.
> > +*/
> > +   RTE_COMP_DEFLATE,
> > +   /**< DEFLATE compression algorithm
> > +*
> > +* https://tools.ietf.org/html/rfc1951
> > +*/
> > +   RTE_COMP_LZS,
> > +   /**< LZS compression algorithm
> > +*
> > +* https://tools.ietf.org/html/rfc2395
> > +*/
> > +   RTE_COMP_ALGO_LIST_END
> > +};
> > +
> > +/**< Compression Level.
> > + * The number is interpreted by each PMD differently. However, lower numbers
> > + * give fastest compression, at the expense of compression ratio while
> > + * higher numbers may give better compression ratios but are likely slower.
> > + */
> > +#define RTE_COMP_LEVEL_PMD_DEFAULT (-1)
> > +/** Use PMD Default */
> > +#define RTE_COMP_LEVEL_NONE (0)
> > +/** Output uncompressed blocks if supported by the specified algorithm */
> > +#define RTE_COMP_LEVEL_MIN (1)
> > +/** Use minimum compression level supported by the PMD */
> > +#define RTE_COMP_LEVEL_MAX (9)
> > +/** Use maximum compression level supported by the PMD */
> > +
> > +/** Compression checksum types */
> > +enum rte_comp_checksum_type {
> > +   RTE_COMP_NONE,
> > +   /**< No checksum generated */
> > +   RTE_COMP_CRC32,
> > +   /**< Generates a CRC32 checksum, as used by gzip */
> > +   RTE_COMP_ADLER32,
> > +   /**< Generates an Adler-32 checksum, as used by zlib */
> > +   RTE_COMP_CRC32_ADLER32,
> > +   /**< Generates both Adler-32 and CRC32 checksums, concatenated.
> > +* CRC32 is in the lower 32bits, Adler-32 in the upper 32 bits.
> > +*/
> 
> What would be a real life use case for returning both CRC32 and ADLER32?
> Packaging the data once as Gzip and once as zlib?
[Fiona] We've had requests for this from customers.
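
To make the RTE_COMP_CRC32_ADLER32 layout quoted above concrete (CRC32 in the lower 32 bits, Adler-32 in the upper 32 bits), here is a minimal sketch in plain C. It is not part of the RFC: crc32_calc(), adler32_calc() and the comp_checksum_*() helpers are hypothetical names used only for illustration, and a real PMD would normally get both values from hardware rather than compute them in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant used by gzip. */
static uint32_t crc32_calc(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Adler-32 as used by zlib: two running sums modulo 65521. */
static uint32_t adler32_calc(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* Hypothetical helper: concatenate both checksums into one 64-bit value,
 * CRC32 in the lower 32 bits and Adler-32 in the upper 32 bits. */
static uint64_t comp_checksum_pack(uint32_t crc32, uint32_t adler32)
{
	return ((uint64_t)adler32 << 32) | crc32;
}

/* Hypothetical helpers: split the concatenated value again. */
static uint32_t comp_checksum_crc32(uint64_t packed)
{
	return (uint32_t)(packed & 0xFFFFFFFFu);
}

static uint32_t comp_checksum_adler32(uint64_t packed)
{
	return (uint32_t)(packed >> 32);
}
```

A consumer that wants both a gzip and a zlib trailer for the same payload could take the two halves from a single op result instead of running the data through the device twice.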

> 
> > +};
> > +
> > +/*
> > + * enum rte_comp_hash_algo {
> > + *   RTE_COMP_HASH_NONE,
> > + *   RTE_COMP_HASH_SHA1,
> > + *   RTE_COMP_HASH_SHA256,
> > + * };
> > + * Need further input from cavium on this
> > + * xform will need a flag with above enum value
> > + * op will need to provide a virt/phys ptr to a data buffer of appropriate size.
> > + * And via capability PMD can say whether supported or not.
> > + */
> > +
> > +/** Compression Huffman Type - used by DEFLATE algorithm */
> > +enum rte_comp_huffman {
> > +   RTE_COMP_DEFAULT,
> > +   /**< PMD may choose which Huffman codes to use */
> > +   RTE_COMP_FIXED,
> > +   /**< Use Fixed Huffman codes */
> > +   RTE_COMP_DYNAMIC,
> > +   /**< Use Dynamic Huffman codes */
> > +};
> > +
> > +
> > +enum rte_comp_flush_flag {
> > +   RTE_COMP_FLUSH_NONE,
> > +   /**< Data is not flushed. Output may remain in the compressor and be
> > +* processed during a following op. It may not be possible to decompress
> > +* output until a later op with some other flush flag has been sent.
> > +*/
> > +   RTE_COMP_FLUSH_SYNC,
> > +   /**< All data should be flushed to output buffer. Output data can be
> > +* decompressed. However state and history is not cleared, so future
> > +* ops may use history from this op */
> > +   RTE_COMP_FLUSH_FULL,
> > +   /**< All data should be flushed to output buffer. Output data can be
> > +* decompressed. State and history data is cleared, so future
> > +* ops will be independent of ops processed before this.
> > +*/
> > +   RTE_COMP_FLUSH_FINAL
> > +   /**< Same as RTE_COMP_FLUSH_FULL but also bfinal bit is set in last block
> > +*/
> > +/* TODO:
> > + * describe flag meanings for decompression.
> > + * describe behaviour in OUT_OF_SPACE case.
> > + * At least the last flag is specific to deflate algo. Should this be
> > + * called rte_comp_deflate_flush_flag? And should there be
> > + * comp_op_deflate_params in the op? */
> 
> What about Z_BLOCK and Z_TREES? Those are needed for 

Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Thomas Monjalon
22/12/2017 14:59, Wiles, Keith:
> 
> > On Dec 22, 2017, at 5:38 AM, Thomas Monjalon  wrote:
> > 
> > 22/12/2017 11:04, Hemant Agrawal:
> >> On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
> >>> These modules are Linux modules, so they should be in the linuxapp dir.
> >> 
> >> 
> >> This is a cleaner separation w.r.t userspace/kernel space code.
> >> *kern* is a better placefolder for LKMs.
> > 
> > I prefer "kernel" name.
> 
> The name should be related to Linux in some way, like linux_kern or 
> linux_kernel or linux_modules (this is the one I prefer); this way it makes it 
> clear which OS they are designed for.

If such top-level directory is created, the BSD modules must be moved there too.
That's why "kernel/" or "kernel/linux/" is appropriate.

> >> Also eal is not getting overloaded.
> >> 
> >> linuxapp is part of librte_eal.  KNI is not related to EAL, but still 
> >> the kni kernel code is added to librte_eal under linuxapp.
> > 
> > Yes it makes sense.
> > 
> > More opinions/votes?
> > 
> >>> There are also some kernel modules in the bsdapp directory.
> >> 
> >> We can move them as well.



Re: [dpdk-dev] [PATCH v1 6/6] net: fix rte_ether conflicts with libc

2017-12-22 Thread Adrien Mazarguil
Hi Olivier,

On Fri, Dec 22, 2017 at 02:34:21PM +0100, Olivier MATZ wrote:
> Hi Adrien,
> 
> On Thu, Dec 21, 2017 at 02:00:06PM +0100, Adrien Mazarguil wrote:
> > Applications can't combine either net/ethernet.h or netinet/ether.h
> > together with rte_ether.h due to the redefinition of struct ether_addr and
> > various macros by the latter.
> > 
> > This patch adapts rte_ether.h to rely on system definitions while
> > maintaining DPDK additions.
> > 
> > An unforeseen consequence of involving more system header files is compilation
> > issues with some base drivers (i40e, ixgbe) defining their own conflicting
> > types (e.g. __le64). This is addressed by explicitly including rte_ether.h
> > where missing to ensure system definitions always come first.
> > 
> > Signed-off-by: Adrien Mazarguil 
> > Cc: Olivier Matz 
> > Cc: Bruce Richardson 
> 
> [...]
> 
> > --- a/drivers/net/qede/base/bcm_osal.h
> > +++ b/drivers/net/qede/base/bcm_osal.h
> > @@ -334,7 +334,6 @@ u32 qede_find_first_zero_bit(unsigned long *, u32);
> > qede_find_first_zero_bit(bitmap, length)
> >  
> >  #define OSAL_BUILD_BUG_ON(cond) nothing
> > -#define ETH_ALEN   ETHER_ADDR_LEN
> >  
> >  #define OSAL_BITMAP_WEIGHT(bitmap, count) 0
> >  
> 
> Not sure we can update code in a 'base' driver as easily.
> It should be checked how it can be done with the qede maintainer.

Sure, although I have to send an update for this chunk already: ETH_ALEN
seems only defined in Linux, not under FreeBSD. I intend to enclose the
above within #ifdef ETH_ALEN.

Besides, updating this file shouldn't be a problem, it's already tailored
for DPDK as it includes and uses several DPDK headers.

> > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> > index 06d7b486c..19e62ea89 100644
> > --- a/lib/librte_net/rte_ether.h
> > +++ b/lib/librte_net/rte_ether.h
> > @@ -44,6 +44,7 @@
> >  extern "C" {
> >  #endif
> >  
> > +#include 
> >  #include 
> >  #include 
> >  
> > @@ -52,15 +53,7 @@ extern "C" {
> >  #include 
> >  #include 
> >  
> > -#define ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
> > -#define ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
> > -#define ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
> > -#define ETHER_HDR_LEN   \
> > -   (ETHER_ADDR_LEN * 2 + ETHER_TYPE_LEN) /**< Length of Ethernet header. */
> > -#define ETHER_MIN_LEN   64/**< Minimum frame len, including CRC. */
> > -#define ETHER_MAX_LEN   1518  /**< Maximum frame len, including CRC. */
> > -#define ETHER_MTU   \
> > -   (ETHER_MAX_LEN - ETHER_HDR_LEN - ETHER_CRC_LEN) /**< Ethernet MTU. */
> > +#define ETHER_MTU ETHERMTU /**< Deprecated, defined for compatibility. */
> >  
> >  #define ETHER_MAX_VLAN_FRAME_LEN \
> > (ETHER_MAX_LEN + 4) /**< Maximum VLAN frame length, including CRC. */
> > @@ -72,8 +65,11 @@ extern "C" {
> >  
> >  #define ETHER_MIN_MTU 68 /**< Minimum MTU for IPv4 packets, see RFC 791. */
> >  
> > +#ifdef __DOXYGEN__
> > +
> >  /**
> > - * Ethernet address:
> > + * Ethernet address.
> > + *
> >   * A universally administered address is uniquely assigned to a device by its
> >   * manufacturer. The first three octets (in transmission order) contain the
> >   * Organizationally Unique Identifier (OUI). The following three (MAC-48 and
> > @@ -82,11 +78,25 @@ extern "C" {
> >   * A locally administered address is assigned to a device by a network
> >   * administrator and does not contain OUIs.
> >   * See http://standards.ieee.org/regauth/groupmac/tutorial.html
> > + *
> > + * This structure is defined system-wide by "net/ethernet.h", however since
> > + * the name of its data field is OS-dependent, a macro named "addr_bytes" is
> > + * defined as an alias for the convenience of DPDK applications.
> > + *
> > + * The following definition is only for documentation purposes.
> >   */
> >  struct ether_addr {
> > uint8_t addr_bytes[ETHER_ADDR_LEN]; /**< Addr bytes in tx order */
> >  } __attribute__((__packed__));
> >  
> > +#endif /* __DOXYGEN__ */
> > +
> > +#if defined(__FreeBSD__)
> > +#define addr_bytes octet
> > +#else
> > +#define addr_bytes ether_addr_octet
> > +#endif
> > +
> >  #define ETHER_LOCAL_ADMIN_ADDR 0x02 /**< Locally assigned Eth. address. */
> >  #define ETHER_GROUP_ADDR   0x01 /**< Multicast or broadcast Eth. address. */
> 
> This kind of #define looks a bit dangerous to me: it can trigger
> strange bugs because it will replace all occurrences of addr_bytes
> after this header is included.

Understandable, I checked before settling on this macro though, there's no
other usage of addr_bytes inside DPDK.

As for applications, there's no way to be completely sure. If we consider
they have to explicitly include rte_ether.h to get this definition, there
are chances addr_bytes is exclusively used with MAC addresses.
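
For readers following the concern about the addr_bytes macro, the effect can be sketched in a few lines of standalone C. This is only an illustration: struct sys_ether_addr stands in for the system definition on Linux, and struct app_counters is a made-up application structure that happens to reuse the name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the system structure on Linux; per the patch, the
 * FreeBSD field is named "octet" instead. */
struct sys_ether_addr {
	uint8_t ether_addr_octet[6];
};

/* The alias from the patch (non-FreeBSD branch). */
#define addr_bytes ether_addr_octet

/* Hypothetical application structure declared after the header is
 * included. The preprocessor rewrites the member name, so the compiler
 * sees "uint64_t ether_addr_octet;" here. */
struct app_counters {
	uint64_t addr_bytes;
};

#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the identifier the compiler actually sees for addr_bytes. */
static const char *addr_bytes_expansion(void)
{
	return STR(addr_bytes);
}

/* Writes through the alias and reads through the expanded name; both
 * refer to the same member, showing the silent rename. */
static uint64_t counters_roundtrip(uint64_t v)
{
	struct app_counters c;

	c.addr_bytes = v;
	return c.ether_addr_octet;
}

/* Small self-check that ties the two observations together. */
static void addr_bytes_demo(void)
{
	assert(strcmp(addr_bytes_expansion(), "ether_addr_octet") == 0);
	assert(counters_roundtrip(7) == 7);
}
```

As long as every use of the name sits behind the same header the rename is invisible, which is why the risk mostly concerns debug output, generated code, or translation units that see the structure without the macro.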

This change results in an API change (addr_bytes now documented as a
reserved macro) but has no ABI impact. I think it's a rather harmless

Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-22 Thread Neil Horman
On Thu, Dec 21, 2017 at 09:57:43PM +, Matan Azrad wrote:
> > -Original Message-
> > From: Neil Horman [mailto:nhor...@tuxdriver.com]
> > Sent: Thursday, December 21, 2017 10:14 PM
> > To: Matan Azrad 
> > Cc: Thomas Monjalon ; dev@dpdk.org; Bruce
> > Richardson ; Ananyev, Konstantin
> > ; Gaëtan Rivet ;
> > Wu, Jingjing 
> > Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > 
> > On Thu, Dec 21, 2017 at 07:37:06PM +, Matan Azrad wrote:
> > > Hi
> > >
> 
> > > > > > > > I think we need to clearly describe what is the thread-safety
> > > > > > > policy in DPDK (especially in ethdev as a first example).
> > > > > > > Let's start with obvious things:
> > > > > > >
> > > > > > >   1/ A queue is not protected for races with multiple Rx or Tx
> > > > > > > >   - no planned change because of performance purpose
> > > > > > >   2/ The list of devices is racy
> > > > > > >   - to be fixed with atomics
> > > > > > >   3/ The configuration of different devices is thread-safe
> > > > > > >   - the configurations are different per-device
> > > > > > >   4/ The configuration of a given device is racy
> > > > > > >   - can be managed by the owner of the device
> > > > > > >   5/ The device ownership is racy
> > > > > > >   - to be fixed with atomics
> > > > > > >
> > > > > > > What am I missing?
> > > > > > >
> > >
> > > Thank you Thomas for this order.
> > > Actually the port ownership is a good opportunity to redefine the
> > > synchronization rules in ethdev :)
> > >
> > > > > > There is fan out to consider here:
> > > > > >
> > > > > > 1) Is device configuration racy with ownership?  That is to say,
> > > > > > can I change ownership of a device safely while another thread
> > > > > > that currently owns it modifies its configuration?
> > > > >
> > > > > If an entity steals ownership to another one, either it is agreed
> > > > > earlier, or it is done by a central authority.
> > > > > When it is acked that ownership can be moved, there should not be
> > > > > any configuration in progress.
> > > > > So it is more a communication issue than a race.
> > > > >
> > > > But if that's the case (specifically that mutual exclusion between
> > > > port ownership and configuration is an exercise left to an
> > > > application developer), then port ownership itself is largely
> > > > meaningless within the dpdk, because the notion of who owns the port
> > > > needs to be codified within the application anyway.
> > > >
> > >
> > > Bruce, As I understand it, only the dpdk entity who took ownership of a
> > > port successfully can configure the device by default. If other dpdk
> > > entities want to configure it too, they must be synchronized with the port
> > > owner, while it is not recommended after the port ownership integration.
> > >
> > Can you clarify what you mean by "it is not recommended after the port
> > ownership integration"?
> 
> Sure,
> The new definition of ethdev synchronization doesn't recommend managing a port 
> by 2 different dpdk entities; it can be done but is not recommended.
>   
Ok, that's just not what you said above.  Your suggestion made it sound like you
thought that, after the integration of a port ownership model, multiple
dpdk entities should not synchronize with one another, which made no sense.

> >  I think there is consensus that the port owner must
> > be the only entity to operate on a port (be that configuration/frame rx/tx, or
> > some other operation).
> 
> Your question above caused me to think that you don't understand it. How can 
> someone who is not the port owner change the port owner?
> Changing the port owner, like port configuration and port release, must be 
> done by the owner itself, except in the case that the port has no owner.
> See the API rte_eth_dev_owner_remove.
> 
See above, I don't think your phrasing accurately reflected what you meant to
convey. Or at least that's not how I read it.
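
The rule being discussed (ownership can only be taken when the port has no owner, and only the current owner may give it up) can be sketched with C11 atomics. This is an illustration only, not the proposed ethdev code: struct port_sketch, PORT_NO_OWNER, port_owner_set() and port_owner_remove() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_NO_OWNER 0u /* hypothetical "unowned" marker */

/* Hypothetical per-port state holding only the owner identifier. */
struct port_sketch {
	_Atomic uint32_t owner_id;
};

/* Take ownership; succeeds only if the port currently has no owner. */
static bool port_owner_set(struct port_sketch *p, uint32_t new_owner)
{
	uint32_t expected = PORT_NO_OWNER;

	assert(p != NULL);
	if (new_owner == PORT_NO_OWNER)
		return false;
	return atomic_compare_exchange_strong(&p->owner_id, &expected,
					      new_owner);
}

/* Release ownership; succeeds only when called with the current owner. */
static bool port_owner_remove(struct port_sketch *p, uint32_t owner)
{
	uint32_t expected = owner;

	assert(p != NULL);
	if (owner == PORT_NO_OWNER)
		return false;
	return atomic_compare_exchange_strong(&p->owner_id, &expected,
					      PORT_NO_OWNER);
}
```

The compare-and-swap only makes the ownership field itself race-free. It does not stop an owner from releasing the port while one of its own threads is still configuring it, which is the fan-out raised earlier in the thread.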

> > Multithreaded operation on a port always means
> > some level of synchronization between application threads and the dpdk
> > library,
> Yes.
> > but I'm not sure why that would be different if we introduced a more
> > concrete notion of port ownership via a new library.
> >
> 
> What do you mean by "new library"?, port is an ethdev instance and should be 
> managed by ethdev.
> 
I'm referring to the port ownership api that you proposed.  Apologies, I should
not have used the term "new library", but rather "new api".

> > > So, for example, if the dpdk entity is an application, the application should
> > > take ownership of the port and manage the synchronization of this port
> > > configuration between the application threads and its EAL host thread
> > > callbacks, no other dpdk entity should configure the same port because they
> > > should fail when they try to take ownership of the same port too.
> 
> > Well, failing is one good approach, yes, blocking on 

Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Van Haaren, Harry
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Friday, December 22, 2017 11:38 AM
> To: Hemant Agrawal 
> Cc: dev@dpdk.org; Yigit, Ferruh 
> Subject: Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules
> 
> 22/12/2017 11:04, Hemant Agrawal:
> > On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
> > > Hi,
> > >
> > > 22/12/2017 06:57, Hemant Agrawal:
> > >> This patch moves the Linux kernel modules code to a common place.
> > >>  - Separate the kernel module code from user space code.
> > >>  - The GPL-2.0 licensed code is separated from the BSD-3 licensed
> userspace
> > >>code
> > >
> > > What is the benefit of separate things by license?
> >
> > The separation makes it easy to identify and check the license.
> >
> > Any patch introducing new file in *non-kern* folders shall not be
> > GPL-2.0 licensed.  Or GPL-2.0 license is allowed only for kern folder.
> 
> The kernel modules are in DPDK only for historical reasons.
> We should get rid of them, and rely only on upstream modules.
> 
> And it should be allowed to have kernel-related files elsewhere.
> Examples: GPL tools or BPF code.
> 
> > > These modules are Linux modules, so they should be in the linuxapp dir.
> >
> >
> > This is a cleaner separation w.r.t userspace/kernel space code.
> > *kern* is a better placefolder for LKMs.
> 
> I prefer "kernel" name.
> 
> > Also eal is not getting overloaded.
> >
> > linuxapp is part of librte_eal.  KNI is not related to EAL, but still
> > the kni kernel code is added to librte_eal under linuxapp.
> 
> Yes it makes sense.
> 
> More opinions/votes?


No strong opinion on moving source code around here... but:

We should be careful that the build system leaves the .ko and other files in 
the same place as before as moving the build output may break automated 
deployments of other projects that use DPDK.

We've accidentally broken things before, for example moving scripts/ to 
usertools/ broke automation in OpenStack IIRC.


> > > There are also some kernel modules in the bsdapp directory.
> >
> > We can move them as well.



Re: [dpdk-dev] [RFC 0/5] Port Representor for control and monitoring of VF devices

2017-12-22 Thread Mohammad Abdul Awal

Hi Alex,


On 21/12/2017 14:51, Alex Rosenbaum wrote:

Declan, Mohammad,

The submission [1] of steering action between switch ports clearly
requires a switch model in DPDK.
The Port Representor based on a virtual PMD broker on NIC ops
(rte_dev_ops) does not provide the required functionality. Using NIC
terminology and not Switch API's will lead to a dead-end. Moreover, it
does not fit the Kernel design. We need to be careful from this ending
up as two different deployment models for users, which is very bad.
There was a long discussion about this in netdev ML [2], including the
VEPA mode support.

As described in the links Alejandro referenced earlier, each of the
switch ports should be a real PMD, and switch operations should be
applied on these PMD ports.
This includes the steering redirection of traffic between switch ports
[1], port ACL's to block/allow traffic, VST/VGT modes and anti
spoofing, link trust mode [3] for promiscuous configuration, mirroring
of switch port traffic, and Tx and Rx of switch port traffic to/from
VF's port.
I agree that we need a switch_domain parameter. At the moment we do not 
have APIs implemented for all the switch operations you have mentioned 
above. So, we are planning a separate RFC with switch_domain and related 
APIs.




Moreover, building this as real PMD ports of a switch device removes
the need to add a new broker framework altogether.
Each vendor just needs to map additional PMD ports during the probing
stage.
That is very much possible as well. If we agree to probe all the ports 
during the initialization phase, we can have all the representors ready 
without any interaction from application and broker. On the other hand, 
we may require a broker structure to enable hotplug support.



By adding a switchdev_id we can define these are ports
associated to the same switching device, and can allow new port and
inter-port actions.

[1] http://dpdk.org/dev/patchwork/patch/32550/
[2] https://www.spinics.net/lists/netdev/msg467375.html
[3] https://www.systutorials.com/docs/linux/man/8-ip-link/

Alex


Regards,
Awal.


[dpdk-dev] [DPDK 0/5] lib: add Port Representors

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual 
function) ports for the purposes of control and monitoring. Each port 
representor device represents a single VF and is associated with its 
parent physical function (PF) PMD which provides the back-end hooks for 
the representor device ops and defines the control domain to which that 
port belongs. This allows existing DPDK APIs to be used to monitor and 
control the port without the need to create and maintain VF specific APIs.

[Figure: a control plane application uses the eth dev API on the PF0 PMD and
on the Port Rep 0 and Port Rep 1 PMDs, which are backed by the PF0 PMD; two
data plane applications each use the eth dev API on a VF PMD. The hardware
(logical view) contains the PF, VF0, VF1, a VEB and Port 0.]

The figure above shows a deployment where the PF is bound to a DPDK control
plane application which uses representor ports to manage the configuration and
monitoring of its VF ports. Each virtual function is represented in the
application by a representor port PMD which enables control of the corresponding
VF through eth dev APIs on the representor PMD such as:

- void rte_eth_promiscuous_enable(uint8_t port_id);
- void rte_eth_promiscuous_disable(uint8_t port_id);
- void rte_eth_allmulticast_enable(uint8_t port_id);
- void rte_eth_allmulticast_disable(uint8_t port_id);
- int rte_eth_dev_mac_addr_add(uint8_t port, struct ether_addr *mac_addr,
uint32_t pool);
- int rte_eth_dev_set_vlan_offload(uint8_t port_id, int offload_mask);

as well as monitoring through APIs like

- void rte_eth_link_get(uint8_t port_id, struct rte_eth_link *link);
- int rte_eth_stats_get(uint8_t port_id, struct rte_eth_stats *stats);

The port representor infrastructure is enabled through a single common, device
independent, virtual PMD whose context is initialized and enabled through a
broker instance running within the context of the physical function device
driver.

[Figure: two rte_ethdev instances. The Physical Function PMD contains the
Representor Broker, which holds the VF Port Context and the Handler Ops; these
are linked to the dev_data and dev_ops of the Port Representor PMD.]

Creation of representor ports can be achieved either through the --vdev EAL
option or through the rte_vdev_init() API. Each port representor requires the
BDF of its parent PF and the Virtual Function ID of the port which the
representor will support. During initialization of the representor PMD, it calls
the broker API to register itself with the PF PMD and to get its context
configured, which includes the setting up of its context and ops function
handlers.
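
As a rough, self-contained illustration of the registration and dispatch flow described above (and not the code from this patch set), the sketch below models a broker that hands a representor its context and forwards one operation to a PF-supplied handler. All names in it (rep_broker, rep_port, rep_handler_ops, toy_pf and the functions) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler ops supplied by the PF driver. */
struct rep_handler_ops {
	int (*promiscuous_set)(void *pf_priv, uint16_t vf_id, int on);
};

/* Hypothetical per-VF slot kept by the broker. */
struct rep_vport {
	int in_use; /* non-zero once a representor has registered */
};

/* Hypothetical broker living inside the PF PMD. */
struct rep_broker {
	void *pf_priv;
	const struct rep_handler_ops *ops;
	uint16_t nb_vports;
	struct rep_vport *vports;
};

/* Hypothetical representor port: knows its broker and its VF ID. */
struct rep_port {
	struct rep_broker *broker;
	uint16_t vf_id;
};

/* Register a representor for vf_id and hand it its context. */
static int rep_broker_register(struct rep_broker *b, uint16_t vf_id,
			       struct rep_port *rep)
{
	if (b == NULL || rep == NULL)
		return -EINVAL;
	assert(b->vports != NULL);
	if (vf_id >= b->nb_vports)
		return -ENODEV;
	if (b->vports[vf_id].in_use)
		return -EBUSY;
	b->vports[vf_id].in_use = 1;
	rep->broker = b;
	rep->vf_id = vf_id;
	return 0;
}

/* A representor op simply forwards to the PF handler for its VF. */
static int rep_port_promiscuous_set(struct rep_port *rep, int on)
{
	if (rep == NULL || rep->broker == NULL)
		return -ENODEV;
	if (rep->broker->ops == NULL ||
	    rep->broker->ops->promiscuous_set == NULL)
		return -ENOTSUP;
	return rep->broker->ops->promiscuous_set(rep->broker->pf_priv,
						 rep->vf_id, on);
}

/* Toy PF state used only to exercise the sketch: one bit per VF. */
struct toy_pf {
	uint64_t promisc_mask;
};

static int toy_pf_promiscuous_set(void *pf_priv, uint16_t vf_id, int on)
{
	struct toy_pf *pf = pf_priv;

	if (vf_id >= 64)
		return -EINVAL;
	if (on)
		pf->promisc_mask |= UINT64_C(1) << vf_id;
	else
		pf->promisc_mask &= ~(UINT64_C(1) << vf_id);
	return 0;
}

static const struct rep_handler_ops toy_pf_ops = {
	.promiscuous_set = toy_pf_promiscuous_set,
};
```

The point of the indirection is that the application only ever sees a port with the usual ops, while the PF driver decides how each op is applied to the VF.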

As the port representor model is based around the paradigm of using standard
port based APIs, it will allow future expansion of functionality without the
need to add new APIs. For example it should be possible to support configuration
of egress QoS parameters using existing TM APIs by extending the port
representor PMD/broker infrastruc

[dpdk-dev] [DPDK 2/5] eal: add Port Representor command-line option

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF specific APIs.

By default the Port Representor infrastructure is not enabled. This
patch implements the --enable-representor EAL command-line parameter
that activates representation functionality.
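
For reference, the way a flag-style long option such as --enable-representor ends up setting a configuration field can be sketched with getopt_long() outside of EAL. The option name and OPT_ENABLE_REPRESENTOR_NUM mirror the patch below; struct sketch_config, sketch_parse_args() and the value 256 are assumptions made only for this standalone example.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Assumed value for the sketch; in EAL the number comes from an enum. */
enum { OPT_ENABLE_REPRESENTOR_NUM = 256 };

static const struct option sketch_long_options[] = {
	{ "enable-representor", no_argument, NULL, OPT_ENABLE_REPRESENTOR_NUM },
	{ NULL, 0, NULL, 0 },
};

/* Hypothetical stand-in for the relevant part of internal_config. */
struct sketch_config {
	unsigned enable_representor;
};

/* Parse argv; returns 0 on success, -1 on an unknown option. */
static int sketch_parse_args(int argc, char **argv, struct sketch_config *cfg)
{
	int opt;

	assert(cfg != NULL);
	cfg->enable_representor = 0; /* disabled by default */
	optind = 1;                  /* restart scanning from argv[1] */
	opterr = 0;                  /* no error messages from getopt */

	while ((opt = getopt_long(argc, argv, "", sketch_long_options,
				  NULL)) != -1) {
		switch (opt) {
		case OPT_ENABLE_REPRESENTOR_NUM:
			cfg->enable_representor = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

An accessor in the style of rte_representor_enabled() would then just return the stored field.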

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 lib/librte_eal/bsdapp/eal/eal.c| 6 ++
 lib/librte_eal/common/eal_common_options.c | 1 +
 lib/librte_eal/common/eal_internal_cfg.h   | 2 ++
 lib/librte_eal/common/eal_options.h| 2 ++
 lib/librte_eal/common/include/rte_eal.h| 8 
 lib/librte_eal/linuxapp/eal/eal.c  | 9 +
 6 files changed, 28 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 369a682..002200a 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -778,3 +778,9 @@ int rte_vfio_noiommu_is_enabled(void)
 {
return 0;
 }
+
+/* return non-zero if port-representor is enabled. */
+int rte_representor_enabled(void)
+{
+   return internal_config.enable_representor;
+}
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 996a034..6f2cc05 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -78,6 +78,7 @@ const struct option
 eal_long_options[] = {
{OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM},
{OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM   },
+   {OPT_ENABLE_REPRESENTOR, 0, NULL, OPT_ENABLE_REPRESENTOR_NUM   },
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR,  1, NULL, OPT_HUGE_DIR_NUM },
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index fa6ccbe..55cae8c 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -71,6 +71,8 @@ struct internal_config {

* instead of native TSC */
volatile unsigned no_shconf;  /**< true if there is no shared config */
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
+   volatile unsigned enable_representor;
+   /**< true to enable port representor broker for all PFs */
volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
/** true to try allocating memory on specific sockets */
volatile unsigned force_sockets;
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 30e6bb4..c2b2162 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -83,6 +83,8 @@ enum {
OPT_VFIO_INTR_NUM,
 #define OPT_VMWARE_TSC_MAP"vmware-tsc-map"
OPT_VMWARE_TSC_MAP_NUM,
+#define OPT_ENABLE_REPRESENTOR"enable-representor"
+   OPT_ENABLE_REPRESENTOR_NUM,
OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 8e4e71c..c4e61d1 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -335,6 +335,14 @@ enum rte_iova_mode rte_eal_iova_mode(void);
 const char *
 rte_eal_mbuf_default_mempool_ops(void);
 
+/**
+ * Get flag for port representor should be enabled or not.
+ *
+ * @return
+ *   Returns the enable-representor flag.
+ */
+int rte_representor_enabled(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 229eec9..364a8b2 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -612,6 +612,10 @@ eal_parse_args(int argc, char **argv)
internal_config.mbuf_pool_ops_name = optarg;
break;
 
+   case OPT_ENABLE_REPRESENTOR_NUM:
+   internal_config.enable_representor = 1;
+   break;
+
default:
if (opt < OPT_LONG_MIN_NUM && isprint(opt)) {
RTE_LOG(ERR, EAL, "Option %c is not supported "
@@ -1041,3 +1045,8 @@ rte_eal_check_module(const char *module_name)
/* Module has been found */
return 1;
 }
+
+int rte_representor_enabled(void)
+{
+   return internal_config.enable_representor;

[dpdk-dev] [DPDK 1/5] lib: add Port Representor library

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF specific APIs.

The library provides the broker infrastructure to be instantiated by
base driver and corresponding methods to manage the broker
infrastructure. The broker keeps records of list of representor PMDs.
The library also provides methods to manage the representor PMDs by the
broker.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 config/common_base |   5 +
 lib/Makefile   |   3 +
 lib/librte_representor/Makefile|  26 ++
 lib/librte_representor/rte_port_representor.c  | 326 +
 lib/librte_representor/rte_port_representor.h  |  60 
 .../rte_port_representor_driver.h  | 138 +
 .../rte_port_representor_version.map   |   8 +
 mk/rte.app.mk  |   1 +
 8 files changed, 567 insertions(+)
 create mode 100644 lib/librte_representor/Makefile
 create mode 100644 lib/librte_representor/rte_port_representor.c
 create mode 100644 lib/librte_representor/rte_port_representor.h
 create mode 100644 lib/librte_representor/rte_port_representor_driver.h
 create mode 100644 lib/librte_representor/rte_port_representor_version.map

diff --git a/config/common_base b/config/common_base
index e74febe..febb80a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -820,3 +820,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile representor PMD
+#
+CONFIG_RTE_LIBRTE_REPRESENTOR=y
diff --git a/lib/Makefile b/lib/Makefile
index dc4e8df..b9202ff 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -101,6 +101,9 @@ DEPDIRS-librte_distributor := librte_eal librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DEPDIRS-librte_port := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_port += librte_ip_frag librte_sched
+DIRS-$(CONFIG_RTE_LIBRTE_REPRESENTOR) += librte_representor
+DEPDIRS-librte_representor += librte_ether
+
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DEPDIRS-librte_port += librte_kni
 endif
diff --git a/lib/librte_representor/Makefile b/lib/librte_representor/Makefile
new file mode 100644
index 000..4060cc6
--- /dev/null
+++ b/lib/librte_representor/Makefile
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_representor.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_port_representor_version.map
+
+LIBABIVER := 1
+
+SRCS-$(CONFIG_RTE_LIBRTE_REPRESENTOR) += rte_port_representor.c
+
+#
+# Export include files
+#
+SYMLINK-$(CONFIG_RTE_LIBRTE_REPRESENTOR)-include += rte_port_representor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_REPRESENTOR)-include += rte_port_representor_driver.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_representor/rte_port_representor.c b/lib/librte_representor/rte_port_representor.c
new file mode 100644
index 000..69a4bfc
--- /dev/null
+++ b/lib/librte_representor/rte_port_representor.c
@@ -0,0 +1,326 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+
+#include 
+
+TAILQ_HEAD(rte_broker_list, rte_representor_broker);
+
+struct rte_broker_list broker_list =
+   TAILQ_HEAD_INITIALIZER(broker_list);
+
+struct port_rep_parameters {
+   uint64_t vport_mask;
+   struct {
+   char bus[RTE_DEV_NAME_MAX_LEN];
+   char device[RTE_DEV_NAME_MAX_LEN];
+   } parent;
+};
+
+/* Macros to check for valid id */
+#define RTE_VERIFY_OR_ERR_RET(val, retval) do { \
+   if (!(val)) { \
+   RTE_PMD_DEBUG_TRACE("verify failed, ret= %d", (retval)); \
+   return retval; \
+   } \
+} while (0)
+
+#define RTE_VERIFY_OR_RET(val) do { \
+   if (!(val)) { \
+   RTE_PMD_DEBUG_TRACE("verify failed"); \
+   return; \
+   } \
+} while (0)
+
+int
+rte_representor_broker_init(struct rte_representor_broker *broker)
+{
+   RTE_VERIFY_OR_ERR_RET(broker, -ENODEV);
+
+   RTE_VERIFY_OR_ERR_RET(broker->bus && strlen(broker->bus), -ENXIO);
+   RTE_VERIFY_OR_ERR_RET(broker->device && strlen(broker->device), -ENXIO);
+
+   RTE_VERIFY_OR_ERR_RET(broker->nb_virtual_ports > 0, -EINVAL);
+
+   broker->vports = rte_malloc("rte_representor_ports",
+   sizeo

[dpdk-dev] [DPDK 5/5] app/test-pmd: add Port Representor commands

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds the 'add representor' and 'del representor' commands
to test-pmd, which respectively allow the adding and removing of
port representors.

Signed-off-by: Remy Horton 
---
 app/test-pmd/cmdline.c | 88 ++
 1 file changed, 88 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f71d963..1a831ba 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -15535,6 +15536,91 @@ cmdline_parse_inst_t cmd_load_from_file = {
},
 };
 
+struct cmd_add_representor_result {
+   cmdline_fixed_string_t cmd;
+   cmdline_fixed_string_t representor;
+   cmdline_fixed_string_t pf;
+   uint16_t vport;
+};
+
+cmdline_parse_token_string_t cmd_addrepresentor_add =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   cmd, "add");
+cmdline_parse_token_string_t cmd_addrepresentor_del =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   cmd, "del");
+cmdline_parse_token_string_t cmd_addrepresentor_rep =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   representor, "representor");
+cmdline_parse_token_string_t cmd_addrepresentor_pf =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   pf, NULL);
+cmdline_parse_token_num_t cmd_addrepresentor_vport =
+TOKEN_NUM_INITIALIZER(struct cmd_add_representor_result,
+   vport, UINT16);
+
+static void cmd_add_representor_callback(void *parsed_result,
+   __attribute__((unused))  struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_add_representor_result *res = parsed_result;
+   uint16_t port_id;
+   int ret;
+
+   rte_log(RTE_LOG_INFO, RTE_LOGTYPE_USER1, "%s(): addr:%s vport:%i\n",
+   __func__, res->pf, res->vport);
+
+   ret = rte_representor_port_register(res->pf, res->vport, &port_id);
+   if (ret != 0)
+   printf("Registering port representor failed\n");
+   else
+   printf("Port Representor registered with port id %i\n",
+   port_id);
+}
+
+static void cmd_del_representor_callback(void *parsed_result,
+   __attribute__((unused))  struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_add_representor_result *res = parsed_result;
+   int ret;
+
+   rte_log(RTE_LOG_INFO, RTE_LOGTYPE_USER1, "%s(): port:%i\n", __func__,
+   res->vport);
+   ret = rte_representor_port_unregister(res->pf, res->vport);
+   if (ret != 0)
+   printf("Port %i is not a valid port representor.\n",
+   res->vport);
+}
+
+cmdline_parse_inst_t cmd_add_representor = {
+   .f = cmd_add_representor_callback,
+   .help_str = "add representor   "
+   "Add a Port Representor",
+   .data = NULL,
+   .tokens = {
+   (void *)&cmd_addrepresentor_add,
+   (void *)&cmd_addrepresentor_rep,
+   (void *)&cmd_addrepresentor_pf,
+   (void *)&cmd_addrepresentor_vport,
+   NULL
+   }
+};
+
+cmdline_parse_inst_t cmd_del_representor = {
+   .f = cmd_del_representor_callback,
+   .help_str = "del representor   "
+   "Delete a Port Representor",
+   .data = NULL,
+   .tokens = {
+   (void *)&cmd_addrepresentor_del,
+   (void *)&cmd_addrepresentor_rep,
+   (void *)&cmd_addrepresentor_pf,
+   (void *)&cmd_addrepresentor_vport,
+   NULL
+   }
+};
+
 /* 

 */
 
 /* list of instructions */
@@ -15576,6 +15662,8 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *) &cmd_show_bonding_config,
(cmdline_parse_inst_t *) &cmd_set_bonding_primary,
(cmdline_parse_inst_t *) &cmd_add_bonding_slave,
+   (cmdline_parse_inst_t *) &cmd_add_representor,
+   (cmdline_parse_inst_t *) &cmd_del_representor,
(cmdline_parse_inst_t *) &cmd_remove_bonding_slave,
(cmdline_parse_inst_t *) &cmd_create_bonded_device,
(cmdline_parse_inst_t *) &cmd_set_bond_mac_addr,
-- 
2.9.5



[dpdk-dev] [DPDK 4/5] drivers/net/ixgbe: add Port Representor functionality

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds to the ixgbe PMD the functions required to enable
port representor functionality.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 drivers/net/ixgbe/Makefile |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c   |  22 +++-
 drivers/net/ixgbe/ixgbe_ethdev.h   |   5 +
 drivers/net/ixgbe/ixgbe_prep_ops.c | 259 +
 drivers/net/ixgbe/ixgbe_prep_ops.h |  15 +++
 5 files changed, 301 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_prep_ops.c
 create mode 100644 drivers/net/ixgbe/ixgbe_prep_ops.h

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 511a64e..4ec2422 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -130,6 +130,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ipsec.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += rte_pmd_ixgbe.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_prep_ops.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)-include := rte_pmd_ixgbe.h
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ff19a56..a48b783 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -75,6 +75,7 @@
 #include "base/ixgbe_type.h"
 #include "base/ixgbe_phy.h"
 #include "ixgbe_regs.h"
+#include "ixgbe_prep_ops.h"
 
 /*
  * High threshold controlling when to start sending XOFF frames. Must be at
@@ -1138,6 +1139,9 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
uint32_t ctrl_ext;
uint16_t csum;
int diag, i;
+   int ret;
+   struct ixgbe_adapter *eth_adapter =
+   (struct ixgbe_adapter *)eth_dev->data->dev_private;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1261,6 +1265,17 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
return -EIO;
}
 
+   /* Init port representor broker */
+   if (rte_representor_enabled()) {
+   ret = ixgbe_port_representor_broker_init(eth_dev,
+   &eth_adapter->broker, pci_dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Representor broker register failed "
+   "with ret=%d\n", ret);
+   return ret;
+   }
+   }
+
/* Reset the hw statistics */
ixgbe_dev_stats_reset(eth_dev);
 
@@ -1363,6 +1378,8 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev)
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
struct ixgbe_hw *hw;
+   struct ixgbe_adapter *eth_adapter =
+   (struct ixgbe_adapter *)eth_dev->data->dev_private;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1371,6 +1388,9 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev)
 
hw = IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 
+   if (rte_representor_enabled())
+   rte_representor_broker_uninit(eth_adapter->broker);
+
if (hw->adapter_stopped == 0)
ixgbe_dev_close(eth_dev);
 
@@ -3962,7 +3982,7 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed 
*speed,
 }
 
 /* return 0 means link status changed, -1 means not changed */
-static int
+int
 ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
int wait_to_complete, int vf)
 {
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 51ddcfd..4cc2cf0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -499,6 +499,7 @@ struct ixgbe_adapter {
struct rte_timecounter  rx_tstamp_tc;
struct rte_timecounter  tx_tstamp_tc;
struct ixgbe_tm_conf tm_conf;
+   struct rte_representor_broker *broker;
 };
 
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
@@ -673,6 +674,10 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);
 
+int
+ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
+   int wait_to_complete, int vf);
+
 /*
  * misc function prototypes
  */
diff --git a/drivers/net/ixgbe/ixgbe_prep_ops.c 
b/drivers/net/ixgbe/ixgbe_prep_ops.c
new file mode 100644
index 000..a06df27
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_prep_ops.c
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017

[dpdk-dev] [DPDK 3/5] drivers/net/i40e: add Port Representor functionality

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds to the i40e PMD the functions required to enable port
representor functionality.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 drivers/net/i40e/Makefile|   1 +
 drivers/net/i40e/i40e_ethdev.c   |  16 ++
 drivers/net/i40e/i40e_ethdev.h   |   1 +
 drivers/net/i40e/i40e_prep_ops.c | 495 +++
 drivers/net/i40e/i40e_prep_ops.h |  15 ++
 drivers/net/i40e/rte_pmd_i40e.c  |  47 
 drivers/net/i40e/rte_pmd_i40e.h  |  18 ++
 7 files changed, 593 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_prep_ops.c
 create mode 100644 drivers/net/i40e/i40e_prep_ops.h

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 9ab8c84..641bf26 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -113,6 +113,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += rte_pmd_i40e.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_prep_ops.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_I40E_PMD)-include := rte_pmd_i40e.h
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 811cc9f..65bb320 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -67,6 +67,7 @@
 #include "i40e_pf.h"
 #include "i40e_regs.h"
 #include "rte_pmd_i40e.h"
+#include "i40e_prep_ops.h"
 
 #define ETH_I40E_FLOATING_VEB_ARG  "enable_floating_veb"
 #define ETH_I40E_FLOATING_VEB_LIST_ARG "floating_veb_list"
@@ -1122,6 +1123,17 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
hw->bus.func = pci_dev->addr.function;
hw->adapter_stopped = 0;
 
+   /* init representor broker */
+   if (rte_representor_enabled()) {
+   ret = i40e_port_representor_broker_init(dev, &pf->broker,
+   pci_dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Representor broker register failed "
+   "with ret=%d\n", ret);
+   return ret;
+   }
+   }
+
/* Make sure all is clean before doing PF reset */
i40e_clear_hw(hw);
 
@@ -1457,6 +1469,10 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
pci_dev = RTE_ETH_DEV_TO_PCI(dev);
intr_handle = &pci_dev->intr_handle;
 
+   /* free port representor pmds */
+   if (rte_representor_enabled())
+   rte_representor_broker_uninit(pf->broker);
+
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
 
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd67453..9e962eb 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -957,6 +957,7 @@ struct i40e_pf {
bool gtp_replace_flag;   /* 1 - GTP-C/U filter replace is done */
bool qinq_replace_flag;  /* QINQ filter replace is done */
struct i40e_tm_conf tm_conf;
+   struct rte_representor_broker *broker;
 
/* Dynamic Device Personalization */
bool gtp_support; /* 1 - support GTP-C and GTP-U */
diff --git a/drivers/net/i40e/i40e_prep_ops.c b/drivers/net/i40e/i40e_prep_ops.c
new file mode 100644
index 000..41ce4d4
--- /dev/null
+++ b/drivers/net/i40e/i40e_prep_ops.c
@@ -0,0 +1,495 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+
+#include "base/i40e_type.h"
+#include "base/virtchnl.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "rte_pmd_i40e.h"
+
+#include "i40e_prep_ops.h"
+
+struct i40e_representor_private_data {
+   struct rte_eth_dev *pf_ethdev;
+};
+
+static int
+i40e_representor_link_update(struct rte_eth_dev *ethdev, int wait_to_complete)
+{
+   struct rte_representor_port *representor = ethdev->data->dev_private;
+   struct i40e_representor_private_data *i40e_priv_data =
+   representor->priv_data;
+
+   return i40e_dev_link_update(i40e_priv_data->pf_ethdev,
+   wait_to_complete);
+}
+
+static void
+i40e_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+   struct rte_eth_dev_info *dev_info)
+{
+   struct rte_representor_port *representor = ethdev->data->dev_private;
+   struct i40e_representor_private_data *i40e_priv_data =
+   representor->priv_data;
+   stru

Re: [dpdk-dev] [PATCH 1/2] mbuf: update default Mempool ops with HW active pool

2017-12-22 Thread Olivier MATZ
Hi,

On Fri, Dec 15, 2017 at 03:54:42PM +0530, Hemant Agrawal wrote:
> With this patch the specific HW mempool are no longer required to be
> specified in the config file at compile. A default active hw mempool
> can be detected dynamically and published to default mempools ops
> config at run time. Only one type of HW mempool can be active default.
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  lib/librte_mbuf/rte_mbuf.c | 33 -
>  lib/librte_mbuf/rte_mbuf.h | 13 +
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 7543662..e074afa 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -148,6 +148,37 @@ rte_pktmbuf_init(struct rte_mempool *mp,
>   m->next = NULL;
>  }
>  
> +static const char *active_mbuf_pool_ops_name;
> +
> +int
> +rte_pktmbuf_reg_active_mempool_ops(const char *ops_name)

I think active_mempool is not the best name: it is not always active
if the user forces another one.

Since there is only one pool like this, would "platform_mempool" be a
better name?

For naming, I suggest "pktmbuf" can be "mbuf", it's shorter and there is
no need anymore to differentiate with ctrlmbuf, because ctrlmbuf will be
removed soon.  I also think "register" is clearer than "reg".  So, what
about rte_mbuf_register_platform_mempool_ops()?

> +{
> + if (active_mbuf_pool_ops_name == NULL) {
> + active_mbuf_pool_ops_name = ops_name;
> + return 0;
> + }
> + RTE_LOG(ERR, MBUF,
> + "%s is already registered as active pktmbuf pool ops\n",
> + active_mbuf_pool_ops_name);
> + return -EACCES;
> +}
> +
> +/* Return mbuf pool ops name */
> +static const char *
> +rte_pktmbuf_active_mempool_ops(void)
> +{
> + const char *default_ops = rte_eal_mbuf_default_mempool_ops();
> +
> + /* If mbuf default ops is same as compile time default
> +  * Just to be sure that no one has updated it by other means.
> +  */
> + if ((strcmp(default_ops, RTE_MBUF_DEFAULT_MEMPOOL_OPS) == 0) &&
> + (active_mbuf_pool_ops_name != NULL))
> + return active_mbuf_pool_ops_name;
> + else
> + return default_ops;
> +}

The name of this function is confusing because it does not really return
the active mempool. If the user selected a pool with
--mbuf-pool-ops-name, it is returned...

...except if --mbuf-pool-ops-name= was passed,
which I think is also very confusing.
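
To make the concern concrete, here is a minimal standalone sketch of the
selection logic in the quoted function. It is not the patch code:
select_mempool_ops() and COMPILE_TIME_DEFAULT_OPS are hypothetical stand-ins
for rte_pktmbuf_active_mempool_ops() and RTE_MBUF_DEFAULT_MEMPOOL_OPS, and the
ops names used with it are only examples.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for RTE_MBUF_DEFAULT_MEMPOOL_OPS. */
#define COMPILE_TIME_DEFAULT_OPS "ring_mp_mc"

/*
 * Mirrors the quoted logic: the registered platform ops are returned only
 * when the EAL-level name still equals the compile-time default; otherwise
 * the EAL-level (user-selected) name is returned.
 */
static const char *
select_mempool_ops(const char *eal_ops, const char *platform_ops)
{
	if (strcmp(eal_ops, COMPILE_TIME_DEFAULT_OPS) == 0 &&
	    platform_ops != NULL)
		return platform_ops;
	return eal_ops;
}
```

With this logic, a user who explicitly passes the compile-time default name
cannot be told apart from a user who passed nothing, so the platform ops would
be selected in both cases.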


Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Wiles, Keith


> On Dec 22, 2017, at 8:17 AM, Thomas Monjalon  wrote:
> 
> 22/12/2017 14:59, Wiles, Keith:
>> 
>>> On Dec 22, 2017, at 5:38 AM, Thomas Monjalon  wrote:
>>> 
>>> 22/12/2017 11:04, Hemant Agrawal:
 On 12/22/2017 2:13 PM, Thomas Monjalon wrote:
> These modules are Linux modules, so they should be in the linuxapp dir.
 
 
 This is a cleaner separation w.r.t userspace/kernel space code.
 *kern* is a better placefolder for LKMs.
>>> 
>>> I prefer "kernel" name.
>> 
>> The name should be related to Linux in some way, like linux_kern or 
>> linux_kernel or linux_modules (this is the one I prefer) this way it make it 
>> clear which OS they are designed for.
> 
> If such top-level directory is created, the BSD modules must be moved there 
> too.
> That's why "kernel/" or "kernel/linux/" is appropriate.

OK seems reasonable, what about kernel/{freebsd,Linux, …}/modules/(module-name 
e.g. kni, igb_uio, nic_uio, …)

Kernel is misleading IMO, but I can live with it as long as we break down the 
different kernel related items. This is why I add modules in the path, as we 
could have other OSes like Windows with items that are not modules or VMs or 
containers…

I can live with kernel/{freebsd, linux, …}/{igb_uio, kni, nic_uio, ..}  but I 
would like to make sure it does not change in the future with adding windows.

> 
 Also eal is not getting overloaded.
 
 linuxapp is part of librte_eal.  KNI is not related to EAL, but still 
 the kni kernel code is added to librte_eal under linuxapp.
>>> 
>>> Yes it makes sense.
>>> 
>>> More opinions/votes?
>>> 
> There are also some kernel modules in the bsdapp directory.
 
 We can move them as well.
> 

Regards,
Keith



[dpdk-dev] [PATCH v3 0/5] lib: add Port Representors

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual 
function) ports for the purposes of control and monitoring. Each port 
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

[Figure: a control plane application and two data plane applications, each
sitting on the eth dev API. The control plane application drives the PF0 PMD
and the port representor PMDs (Port Rep 0, Port Rep 1), which link back to the
PF0 PMD; the data plane applications drive VF PMDs. The hardware logical view
underneath shows the PF, VF0 and VF1, with a VEB and Port 0 below them.]

The figure above shows a deployment where the PF is bound to a DPDK control
plane application which uses representor ports to manage the configuration and
monitoring of its VF ports. Each virtual function is represented in the
application by a representor port PMD which enables control of the corresponding
VF through eth dev APIs on the representor PMD such as:

- void rte_eth_promiscuous_enable(uint8_t port_id);
- void rte_eth_promiscuous_disable(uint8_t port_id);
- void rte_eth_allmulticast_enable(uint8_t port_id);
- void rte_eth_allmulticast_disable(uint8_t port_id);
- int rte_eth_dev_mac_addr_add(uint8_t port, struct ether_addr *mac_addr,
uint32_t pool);
- int rte_eth_dev_set_vlan_offload(uint8_t port_id, int offload_mask);

as well as monitoring through APIs such as:

- void rte_eth_link_get(uint8_t port_id, struct rte_eth_link *link);
- int rte_eth_stats_get(uint8_t port_id, struct rte_eth_stats *stats);

The port representor infrastructure is enabled through a single common,
device-independent virtual PMD whose context is initialized and enabled
through a broker instance running within the context of the physical function
device driver.

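+-------------------------+       +-------------------------+
|       rte_ethdev        |       |       rte_ethdev        |
+-------------------------+       +-------------------------+
|  Physical Function PMD  |       |  Port Representor PMD   |
|     +-------------+     |       | +---------+ +---------+ |
|     | Representor |     |       | | dev_data| | dev_ops | |
|     |   Broker    |     |       | +----+----+ +----+----+ |
|     | +---------+ |     |       +------|-----------|------+
|     | | VF Port | |     |              |           |
|     | | Context +----------------------+           |
|     | +---------+ |     |                          |
|     | +---------+ |     |                          |
|     | | Handler +----------------------------------+
|     | |   Ops   | |     |
|     | +---------+ |     |
|     +-------------+     |
+-------------------------+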

Creation of representor ports can be achieved either through the --vdev EAL
option or through the rte_vdev_init() API. Each port representor requires the
BDF of its parent PF and the Virtual Function ID of the port which the
representor will support. During initialization, the representor PMD calls
the broker API to register itself with the PF PMD and to have its context
and ops function handlers set up.

As the port representor model is based around the paradigm of using standard
port based APIs, it will allow future expansion of functionality without the
need to add new APIs. For example it should be possible to support configuration
of egress QoS parameters using existing TM APIs by extending the port
representor PMD/broker infrastruc

[dpdk-dev] [PATCH v3 1/5] lib: add Port Representor library

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

The library provides the broker infrastructure, which is instantiated by the
base driver, and the corresponding methods to manage it. The broker keeps a
list of representor PMDs, and the library also provides methods for the
broker to manage those PMDs.
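
As a rough illustration of the bookkeeping described above, the sketch below
keeps brokers on a TAILQ from <sys/queue.h>, the same list macros the patch
uses for its broker_list. Everything prefixed demo_ is hypothetical and is not
the library API; the error codes only mirror the style of the checks in
rte_representor_broker_init().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical, trimmed-down broker record (not struct rte_representor_broker). */
struct demo_broker {
	TAILQ_ENTRY(demo_broker) next;
	const char *device;            /* parent PF device name, e.g. a PCI address */
	unsigned int nb_virtual_ports; /* number of VFs the PF can represent */
};

TAILQ_HEAD(demo_broker_list, demo_broker);

static struct demo_broker_list demo_brokers =
	TAILQ_HEAD_INITIALIZER(demo_brokers);

/* Validate a broker and append it to the global list. */
static int
demo_broker_register(struct demo_broker *b)
{
	if (b == NULL)
		return -ENODEV;
	if (b->device == NULL || b->device[0] == '\0')
		return -ENXIO;
	if (b->nb_virtual_ports == 0)
		return -EINVAL;
	TAILQ_INSERT_TAIL(&demo_brokers, b, next);
	return 0;
}

/* Look up a registered broker by its parent device name. */
static struct demo_broker *
demo_broker_find(const char *device)
{
	struct demo_broker *b;

	TAILQ_FOREACH(b, &demo_brokers, next) {
		if (strcmp(b->device, device) == 0)
			return b;
	}
	return NULL;
}

/* Remove a previously registered broker from the list. */
static void
demo_broker_unregister(struct demo_broker *b)
{
	TAILQ_REMOVE(&demo_brokers, b, next);
}
```

A PF driver would register one such record at device init and remove it at
uninit; a representor would look up its parent by device name before
attaching.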

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 config/common_base |   5 +
 lib/Makefile   |   3 +
 lib/librte_representor/Makefile|  26 ++
 lib/librte_representor/rte_port_representor.c  | 326 +
 lib/librte_representor/rte_port_representor.h  |  60 
 .../rte_port_representor_driver.h  | 138 +
 .../rte_port_representor_version.map   |   8 +
 mk/rte.app.mk  |   1 +
 8 files changed, 567 insertions(+)
 create mode 100644 lib/librte_representor/Makefile
 create mode 100644 lib/librte_representor/rte_port_representor.c
 create mode 100644 lib/librte_representor/rte_port_representor.h
 create mode 100644 lib/librte_representor/rte_port_representor_driver.h
 create mode 100644 lib/librte_representor/rte_port_representor_version.map

diff --git a/config/common_base b/config/common_base
index e74febe..febb80a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -820,3 +820,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile representor PMD
+#
+CONFIG_RTE_LIBRTE_REPRESENTOR=y
diff --git a/lib/Makefile b/lib/Makefile
index dc4e8df..b9202ff 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -101,6 +101,9 @@ DEPDIRS-librte_distributor := librte_eal librte_mbuf 
librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DEPDIRS-librte_port := librte_eal librte_mempool librte_mbuf librte_ether
 DEPDIRS-librte_port += librte_ip_frag librte_sched
+DIRS-$(CONFIG_RTE_LIBRTE_REPRESENTOR) += librte_representor
+DEPDIRS-librte_representor += librte_ether
+
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DEPDIRS-librte_port += librte_kni
 endif
diff --git a/lib/librte_representor/Makefile b/lib/librte_representor/Makefile
new file mode 100644
index 000..4060cc6
--- /dev/null
+++ b/lib/librte_representor/Makefile
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_representor.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_port_representor_version.map
+
+LIBABIVER := 1
+
+SRCS-$(CONFIG_RTE_LIBRTE_REPRESENTOR) += rte_port_representor.c
+
+#
+# Export include files
+#
+SYMLINK-$(CONFIG_RTE_LIBRTE_REPRESENTOR)-include += rte_port_representor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_REPRESENTOR)-include += 
rte_port_representor_driver.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_representor/rte_port_representor.c 
b/lib/librte_representor/rte_port_representor.c
new file mode 100644
index 000..69a4bfc
--- /dev/null
+++ b/lib/librte_representor/rte_port_representor.c
@@ -0,0 +1,326 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+
+#include 
+
+TAILQ_HEAD(rte_broker_list, rte_representor_broker);
+
+struct rte_broker_list broker_list =
+   TAILQ_HEAD_INITIALIZER(broker_list);
+
+struct port_rep_parameters {
+   uint64_t vport_mask;
+   struct {
+   char bus[RTE_DEV_NAME_MAX_LEN];
+   char device[RTE_DEV_NAME_MAX_LEN];
+   } parent;
+};
+
+/* Macros to check for valid id */
+#define RTE_VERIFY_OR_ERR_RET(val, retval) do { \
+   if (!(val)) { \
+   RTE_PMD_DEBUG_TRACE("verify failed, ret= %d", (retval)); \
+   return retval; \
+   } \
+} while (0)
+
+#define RTE_VERIFY_OR_RET(val) do { \
+   if (!(val)) { \
+   RTE_PMD_DEBUG_TRACE("verify failed"); \
+   return; \
+   } \
+} while (0)
+
+int
+rte_representor_broker_init(struct rte_representor_broker *broker)
+{
+   RTE_VERIFY_OR_ERR_RET(broker, -ENODEV);
+
+   RTE_VERIFY_OR_ERR_RET(broker->bus && strlen(broker->bus), -ENXIO);
+   RTE_VERIFY_OR_ERR_RET(broker->device && strlen(broker->device), -ENXIO);
+
+   RTE_VERIFY_OR_ERR_RET(broker->nb_virtual_ports > 0, -EINVAL);
+
+   broker->vports = rte_malloc("rte_representor_ports",
+   sizeo

[dpdk-dev] [PATCH v3 2/5] eal: add Port Representor command-line option

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

By default the Port Representor infrastructure is not enabled. This
patch implements the --enable-representor EAL command-line parameter
that activates representation functionality.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 lib/librte_eal/bsdapp/eal/eal.c| 6 ++
 lib/librte_eal/common/eal_common_options.c | 1 +
 lib/librte_eal/common/eal_internal_cfg.h   | 2 ++
 lib/librte_eal/common/eal_options.h| 2 ++
 lib/librte_eal/common/include/rte_eal.h| 8 
 lib/librte_eal/linuxapp/eal/eal.c  | 9 +
 6 files changed, 28 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 369a682..002200a 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -778,3 +778,9 @@ int rte_vfio_noiommu_is_enabled(void)
 {
return 0;
 }
+
+/* return non-zero if port-representor is enabled. */
+int rte_representor_enabled(void)
+{
+   return internal_config.enable_representor;
+}
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 996a034..6f2cc05 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -78,6 +78,7 @@ const struct option
 eal_long_options[] = {
{OPT_BASE_VIRTADDR, 1, NULL, OPT_BASE_VIRTADDR_NUM},
{OPT_CREATE_UIO_DEV,0, NULL, OPT_CREATE_UIO_DEV_NUM   },
+   {OPT_ENABLE_REPRESENTOR, 0, NULL, OPT_ENABLE_REPRESENTOR_NUM   },
{OPT_FILE_PREFIX,   1, NULL, OPT_FILE_PREFIX_NUM  },
{OPT_HELP,  0, NULL, OPT_HELP_NUM },
{OPT_HUGE_DIR,  1, NULL, OPT_HUGE_DIR_NUM },
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index fa6ccbe..55cae8c 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -71,6 +71,8 @@ struct internal_config {

* instead of native TSC */
volatile unsigned no_shconf;  /**< true if there is no shared 
config */
volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices 
*/
+   volatile unsigned enable_representor;
+   /**< true to enable port representor broker for all PFs */
volatile enum rte_proc_type_t process_type; /**< multi-process proc 
type */
/** true to try allocating memory on specific sockets */
volatile unsigned force_sockets;
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index 30e6bb4..c2b2162 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -83,6 +83,8 @@ enum {
OPT_VFIO_INTR_NUM,
 #define OPT_VMWARE_TSC_MAP"vmware-tsc-map"
OPT_VMWARE_TSC_MAP_NUM,
+#define OPT_ENABLE_REPRESENTOR"enable-representor"
+   OPT_ENABLE_REPRESENTOR_NUM,
OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index 8e4e71c..c4e61d1 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -335,6 +335,14 @@ enum rte_iova_mode rte_eal_iova_mode(void);
 const char *
 rte_eal_mbuf_default_mempool_ops(void);
 
+/**
+ * Get flag for port representor should be enabled or not.
+ *
+ * @return
+ *   Returns the enable-representor flag.
+ */
+int rte_representor_enabled(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 229eec9..364a8b2 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -612,6 +612,10 @@ eal_parse_args(int argc, char **argv)
internal_config.mbuf_pool_ops_name = optarg;
break;
 
+   case OPT_ENABLE_REPRESENTOR_NUM:
+   internal_config.enable_representor = 1;
+   break;
+
default:
if (opt < OPT_LONG_MIN_NUM && isprint(opt)) {
RTE_LOG(ERR, EAL, "Option %c is not supported "
@@ -1041,3 +1045,8 @@ rte_eal_check_module(const char *module_name)
/* Module has been found */
return 1;
 }
+
+int rte_representor_enabled(void)
+{
+   return internal_config.enable_representor;

[dpdk-dev] [PATCH v3 4/5] drivers/net/ixgbe: add Port Representor functionality

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds to the ixgbe PMD the functions required to enable
port representor functionality.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 drivers/net/ixgbe/Makefile |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c   |  22 +++-
 drivers/net/ixgbe/ixgbe_ethdev.h   |   5 +
 drivers/net/ixgbe/ixgbe_prep_ops.c | 259 +
 drivers/net/ixgbe/ixgbe_prep_ops.h |  15 +++
 5 files changed, 301 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ixgbe/ixgbe_prep_ops.c
 create mode 100644 drivers/net/ixgbe/ixgbe_prep_ops.h

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index 511a64e..4ec2422 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -130,6 +130,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_ipsec.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += rte_pmd_ixgbe.c
 SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe_prep_ops.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)-include := rte_pmd_ixgbe.h
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ff19a56..a48b783 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -75,6 +75,7 @@
 #include "base/ixgbe_type.h"
 #include "base/ixgbe_phy.h"
 #include "ixgbe_regs.h"
+#include "ixgbe_prep_ops.h"
 
 /*
  * High threshold controlling when to start sending XOFF frames. Must be at
@@ -1138,6 +1139,9 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
uint32_t ctrl_ext;
uint16_t csum;
int diag, i;
+   int ret;
+   struct ixgbe_adapter *eth_adapter =
+   (struct ixgbe_adapter *)eth_dev->data->dev_private;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1261,6 +1265,17 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
return -EIO;
}
 
+   /* Init port representor broker */
+   if (rte_representor_enabled()) {
+   ret = ixgbe_port_representor_broker_init(eth_dev,
+   &eth_adapter->broker, pci_dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Representor broker register failed "
+   "with ret=%d\n", ret);
+   return ret;
+   }
+   }
+
/* Reset the hw statistics */
ixgbe_dev_stats_reset(eth_dev);
 
@@ -1363,6 +1378,8 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev)
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
struct ixgbe_hw *hw;
+   struct ixgbe_adapter *eth_adapter =
+   (struct ixgbe_adapter *)eth_dev->data->dev_private;
 
PMD_INIT_FUNC_TRACE();
 
@@ -1371,6 +1388,9 @@ eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev)
 
hw = IXGBE_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 
+   if (rte_representor_enabled())
+   rte_representor_broker_uninit(eth_adapter->broker);
+
if (hw->adapter_stopped == 0)
ixgbe_dev_close(eth_dev);
 
@@ -3962,7 +3982,7 @@ ixgbevf_check_link(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 }
 
 /* return 0 means link status changed, -1 means not changed */
-static int
+int
 ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
int wait_to_complete, int vf)
 {
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 51ddcfd..4cc2cf0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -499,6 +499,7 @@ struct ixgbe_adapter {
struct rte_timecounter  rx_tstamp_tc;
struct rte_timecounter  tx_tstamp_tc;
struct ixgbe_tm_conftm_conf;
+   struct rte_representor_broker *broker;
 };
 
 #define IXGBE_DEV_PRIVATE_TO_HW(adapter)\
@@ -673,6 +674,10 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_configure_dcb(struct rte_eth_dev *dev);
 
+int
+ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
+   int wait_to_complete, int vf);
+
 /*
  * misc function prototypes
  */
diff --git a/drivers/net/ixgbe/ixgbe_prep_ops.c b/drivers/net/ixgbe/ixgbe_prep_ops.c
new file mode 100644
index 000..a06df27
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_prep_ops.c
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017

[dpdk-dev] [PATCH v3 3/5] drivers/net/i40e: add Port Representor functionality

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD, which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds to the i40e PMD the functions required to enable port
representor functionality.

Signed-off-by: Declan Doherty 
Signed-off-by: Mohammad Abdul Awal 
Signed-off-by: Remy Horton 
---
 drivers/net/i40e/Makefile|   1 +
 drivers/net/i40e/i40e_ethdev.c   |  16 ++
 drivers/net/i40e/i40e_ethdev.h   |   1 +
 drivers/net/i40e/i40e_prep_ops.c | 495 +++
 drivers/net/i40e/i40e_prep_ops.h |  15 ++
 drivers/net/i40e/rte_pmd_i40e.c  |  47 
 drivers/net/i40e/rte_pmd_i40e.h  |  18 ++
 7 files changed, 593 insertions(+)
 create mode 100644 drivers/net/i40e/i40e_prep_ops.c
 create mode 100644 drivers/net/i40e/i40e_prep_ops.h

diff --git a/drivers/net/i40e/Makefile b/drivers/net/i40e/Makefile
index 9ab8c84..641bf26 100644
--- a/drivers/net/i40e/Makefile
+++ b/drivers/net/i40e/Makefile
@@ -113,6 +113,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_fdir.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += rte_pmd_i40e.c
 SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_tm.c
+SRCS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e_prep_ops.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_I40E_PMD)-include := rte_pmd_i40e.h
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 811cc9f..65bb320 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -67,6 +67,7 @@
 #include "i40e_pf.h"
 #include "i40e_regs.h"
 #include "rte_pmd_i40e.h"
+#include "i40e_prep_ops.h"
 
 #define ETH_I40E_FLOATING_VEB_ARG  "enable_floating_veb"
 #define ETH_I40E_FLOATING_VEB_LIST_ARG "floating_veb_list"
@@ -1122,6 +1123,17 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
hw->bus.func = pci_dev->addr.function;
hw->adapter_stopped = 0;
 
+   /* init representor broker */
+   if (rte_representor_enabled()) {
+   ret = i40e_port_representor_broker_init(dev, &pf->broker,
+   pci_dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Representor broker register failed "
+   "with ret=%d\n", ret);
+   return ret;
+   }
+   }
+
/* Make sure all is clean before doing PF reset */
i40e_clear_hw(hw);
 
@@ -1457,6 +1469,10 @@ eth_i40e_dev_uninit(struct rte_eth_dev *dev)
pci_dev = RTE_ETH_DEV_TO_PCI(dev);
intr_handle = &pci_dev->intr_handle;
 
+   /* free port representor pmds */
+   if (rte_representor_enabled())
+   rte_representor_broker_uninit(pf->broker);
+
if (hw->adapter_stopped == 0)
i40e_dev_close(dev);
 
diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index cd67453..9e962eb 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -957,6 +957,7 @@ struct i40e_pf {
bool gtp_replace_flag;   /* 1 - GTP-C/U filter replace is done */
bool qinq_replace_flag;  /* QINQ filter replace is done */
struct i40e_tm_conf tm_conf;
+   struct rte_representor_broker *broker;
 
/* Dynamic Device Personalization */
bool gtp_support; /* 1 - support GTP-C and GTP-U */
diff --git a/drivers/net/i40e/i40e_prep_ops.c b/drivers/net/i40e/i40e_prep_ops.c
new file mode 100644
index 000..41ce4d4
--- /dev/null
+++ b/drivers/net/i40e/i40e_prep_ops.c
@@ -0,0 +1,495 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+
+#include "base/i40e_type.h"
+#include "base/virtchnl.h"
+#include "i40e_ethdev.h"
+#include "i40e_rxtx.h"
+#include "rte_pmd_i40e.h"
+
+#include "i40e_prep_ops.h"
+
+struct i40e_representor_private_data {
+   struct rte_eth_dev *pf_ethdev;
+};
+
+static int
+i40e_representor_link_update(struct rte_eth_dev *ethdev, int wait_to_complete)
+{
+   struct rte_representor_port *representor = ethdev->data->dev_private;
+   struct i40e_representor_private_data *i40e_priv_data =
+   representor->priv_data;
+
+   return i40e_dev_link_update(i40e_priv_data->pf_ethdev,
+   wait_to_complete);
+}
+
+static void
+i40e_representor_dev_infos_get(struct rte_eth_dev *ethdev,
+   struct rte_eth_dev_info *dev_info)
+{
+   struct rte_representor_port *representor = ethdev->data->dev_private;
+   struct i40e_representor_private_data *i40e_priv_data =
+   representor->priv_data;
+   stru

[dpdk-dev] [PATCH v3 5/5] app/test-pmd: add Port Representor commands

2017-12-22 Thread Remy Horton
Port Representors provide a logical presentation in DPDK of VF (virtual
function) ports for the purposes of control and monitoring. Each port
representor device represents a single VF and is associated with its
parent physical function (PF) PMD, which provides the back-end hooks for
the representor device ops and defines the control domain to which that
port belongs. This allows existing DPDK APIs to be used to monitor and
control the port without the need to create and maintain VF-specific APIs.

This patch adds the 'add representor' and 'del representor' commands
to test-pmd, which respectively allow the adding and removing of
port representors.

Signed-off-by: Remy Horton 
---
 app/test-pmd/cmdline.c | 88 ++
 1 file changed, 88 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f71d963..1a831ba 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -76,6 +76,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -15535,6 +15536,91 @@ cmdline_parse_inst_t cmd_load_from_file = {
},
 };
 
+struct cmd_add_representor_result {
+   cmdline_fixed_string_t cmd;
+   cmdline_fixed_string_t representor;
+   cmdline_fixed_string_t pf;
+   uint16_t vport;
+};
+
+cmdline_parse_token_string_t cmd_addrepresentor_add =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   cmd, "add");
+cmdline_parse_token_string_t cmd_addrepresentor_del =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   cmd, "del");
+cmdline_parse_token_string_t cmd_addrepresentor_rep =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   representor, "representor");
+cmdline_parse_token_string_t cmd_addrepresentor_pf =
+TOKEN_STRING_INITIALIZER(struct cmd_add_representor_result,
+   pf, NULL);
+cmdline_parse_token_num_t cmd_addrepresentor_vport =
+TOKEN_NUM_INITIALIZER(struct cmd_add_representor_result,
+   vport, UINT16);
+
+static void cmd_add_representor_callback(void *parsed_result,
+   __attribute__((unused))  struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_add_representor_result *res = parsed_result;
+   uint16_t port_id;
+   int ret;
+
+   rte_log(RTE_LOG_INFO, RTE_LOGTYPE_USER1, "%s(): addr:%s vport:%i\n",
+   __func__, res->pf, res->vport);
+
+   ret = rte_representor_port_register(res->pf, res->vport, &port_id);
+   if (ret != 0)
+   printf("Registering port representor failed\n");
+   else
+   printf("Port Representor registered with port id %i\n",
+   port_id);
+}
+
+static void cmd_del_representor_callback(void *parsed_result,
+   __attribute__((unused))  struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_add_representor_result *res = parsed_result;
+   int ret;
+
+   rte_log(RTE_LOG_INFO, RTE_LOGTYPE_USER1, "%s(): port:%i\n", __func__,
+   res->vport);
+   ret = rte_representor_port_unregister(res->pf, res->vport);
+   if (ret != 0)
+   printf("Port %i is not a valid port representor.\n",
+   res->vport);
+}
+
+cmdline_parse_inst_t cmd_add_representor = {
+   .f = cmd_add_representor_callback,
+   .help_str = "add representor   "
+   "Add a Port Representor",
+   .data = NULL,
+   .tokens = {
+   (void *)&cmd_addrepresentor_add,
+   (void *)&cmd_addrepresentor_rep,
+   (void *)&cmd_addrepresentor_pf,
+   (void *)&cmd_addrepresentor_vport,
+   NULL
+   }
+};
+
+cmdline_parse_inst_t cmd_del_representor = {
+   .f = cmd_del_representor_callback,
+   .help_str = "del representor   "
+   "Delete a Port Representor",
+   .data = NULL,
+   .tokens = {
+   (void *)&cmd_addrepresentor_del,
+   (void *)&cmd_addrepresentor_rep,
+   (void *)&cmd_addrepresentor_pf,
+   (void *)&cmd_addrepresentor_vport,
+   NULL
+   }
+};
+
 /* 

 */
 
 /* list of instructions */
@@ -15576,6 +15662,8 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *) &cmd_show_bonding_config,
(cmdline_parse_inst_t *) &cmd_set_bonding_primary,
(cmdline_parse_inst_t *) &cmd_add_bonding_slave,
+   (cmdline_parse_inst_t *) &cmd_add_representor,
+   (cmdline_parse_inst_t *) &cmd_del_representor,
(cmdline_parse_inst_t *) &cmd_remove_bonding_slave,
(cmdline_parse_inst_t *) &cmd_create_bonded_device,
(cmdline_parse_inst_t *) &cmd_set_bond_mac_addr,
-- 
2.9.5
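
For readers following the token layout in the patch above, here is a minimal
standalone sketch of the same command shape: a fixed "add" or "del" keyword,
the fixed word "representor", a PF string and a 16-bit vport number. It does
not use the DPDK cmdline library. The struct, the function name and the buffer
sizes are hypothetical, and the PCI-style PF string in the note below is only
an assumed example of what the pf token might carry.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the fields of cmd_add_representor_result; sizes are assumptions. */
struct sketch_representor_cmd {
	char cmd[8];     /* "add" or "del" */
	char pf[64];     /* PF identifier, passed through unchanged */
	uint16_t vport;  /* VF port number */
};

/* Parse "<add|del> representor <pf> <vport>"; 0 on success, -1 otherwise. */
static int
sketch_parse_representor_cmd(const char *line,
			     struct sketch_representor_cmd *out)
{
	char keyword[16];
	unsigned int vport;
	char trailing;

	/* Exactly four fields must convert; a fifth means trailing input. */
	if (sscanf(line, "%7s %15s %63s %u %c", out->cmd, keyword, out->pf,
		   &vport, &trailing) != 4)
		return -1;
	if (strcmp(out->cmd, "add") != 0 && strcmp(out->cmd, "del") != 0)
		return -1;
	if (strcmp(keyword, "representor") != 0)
		return -1;
	/* The patch declares vport as UINT16, so reject larger values. */
	if (vport > UINT16_MAX)
		return -1;
	out->vport = (uint16_t)vport;
	return 0;
}
```

With this sketch, a line such as "add representor 0000:05:00.0 3" would fill
cmd, pf and vport, while a missing vport, an out-of-range vport or trailing
input would be rejected.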



Re: [dpdk-dev] [PATCH 1/2] mbuf: update default Mempool ops with HW active pool

2017-12-22 Thread Olivier MATZ
On Mon, Dec 18, 2017 at 03:06:21PM +0530, Hemant Agrawal wrote:
> On 12/18/2017 2:25 PM, Jerin Jacob wrote:
> > -Original Message-
> > > Date: Fri, 15 Dec 2017 15:54:42 +0530
> > > From: Hemant Agrawal 
> > > To: olivier.m...@6wind.com, santosh.shu...@caviumnetworks.com
> > > CC: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: update default Mempool ops with HW
> > >  active pool
> > > X-Mailer: git-send-email 2.7.4
> > > 
> > > With this patch the specific HW mempool are no longer required to be
> > > specified in the config file at compile. A default active hw mempool
> > > can be detected dynamically and published to default mempools ops
> > > config at run time. Only one type of HW mempool can be active default.
> > 
> > For me, it looks like a very reasonable approach, as it caters to the basic
> > use case without any change in the application or the additional
> > (--mbuf-pool-ops-name) EAL command line scheme to select different mempool ops.
> > Though, this option will not be enough to cater to all the use cases. I think
> > we can have three options, with the following order of precedence, to select
> > the mempool ops:
> > 
> > 1) This patch (update active mempool based on the device probe())
> > 2) Selection of mempool ops through the --mbuf-pool-ops-name= EAL command-line
> > argument, which can override scheme (1)
> > 3) More sophisticated mempool selection based on
> > a) The ethdev PMD capability exposed through the existing
> > rte_eth_dev_pool_ops_supported()
> > b) Add mempool ops option in rte_pktmbuf_pool_create()
> > http://dpdk.org/ml/archives/dev/2017-December/083985.html
> > c) Use (a) and (b) to select and update the mempool ops with
> > some "weight" based algorithm like
> > http://dpdk.org/dev/patchwork/patch/32245/
> > 
> 
> Yes! We need more options to fine-tune control over mempool use,
> especially when dealing with HW mempools.
> 
> Once the above-mentioned mechanisms are in place, it will be much easier
> and more flexible.

I'm in line with this description. It would be great if the same binary could
work on different platforms without configuration.

I just feel it's a bit messy to have:

- rte_eal_mbuf_default_mempool_ops() in eal API
  return user-selected ops if any, or compile-time default

- rte_pktmbuf_active_mempool_ops() in mbuf API
  return platform ops except if a selected user ops != compile default

Thomas suggested somewhere (but I don't remember in which thread) to have
rte_eal_mbuf_default_mempool_ops() in mbuf code, and I think he was right.

I think the whole mbuf pool ops selection mechanism should be at the
same place. It could be in a specific file of librte_mbuf.

The API could be:
- get compile time default ops
- get/set platform ops (NULL if none)
- get/set user ops (NULL if none)
- get preferred ops from currently configured PMD

- get best ops: return user, or pmd-preferred, or platform, or default.

rte_pktmbuf_pool_create() will use "get best ops" if no ops (NULL) is
passed as argument.
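
To make the "get best ops" precedence concrete, here is a minimal sketch,
assuming the four sources are kept as plain name strings. None of these
names exist in DPDK; the struct and the function are hypothetical and only
illustrate the order user, then PMD-preferred, then platform, then
compile-time default.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: none of these names are DPDK APIs.
 * A slot is NULL when nothing has been set for that source.
 */
struct sketch_mbuf_ops_sel {
	const char *user_ops;      /* selected by the user, NULL if none */
	const char *pmd_ops;       /* preferred by the configured PMD, NULL if none */
	const char *platform_ops;  /* published by the platform, NULL if none */
	const char *default_ops;   /* compile-time default, never NULL */
};

/* Return the first non-NULL name in the order user, PMD, platform, default. */
static const char *
sketch_mbuf_best_mempool_ops(const struct sketch_mbuf_ops_sel *sel)
{
	if (sel->user_ops != NULL)
		return sel->user_ops;
	if (sel->pmd_ops != NULL)
		return sel->pmd_ops;
	if (sel->platform_ops != NULL)
		return sel->platform_ops;
	return sel->default_ops;
}
```

Keeping all four slots in one structure is one way to have the whole
selection mechanism in a single place, so a pool-creation helper only needs
to call the one function when the caller passes NULL for the ops name.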


Re: [dpdk-dev] [DPDK 0/5] lib: add Port Representors

2017-12-22 Thread Remy Horton


On 22/12/2017 14:41, Remy Horton wrote:

Port Representors provide a logical presentation in DPDK of VF (virtual

[..]

Remy Horton (5):
  lib: add Port Representor library
  eal: add Port Representor command-line option
  drivers/net/i40e: add Port Representor functionality
  drivers/net/ixgbe: add Port Representor functionality
  app/test-pmd: add Port Representor commands


Oops, misconfigured patch script. Self-NAK on this patchset, and resent
with correct subject headers.


Re: [dpdk-dev] [PATCH 1/2] mempool: notify mempool area after mempool alloc

2017-12-22 Thread Olivier MATZ
On Fri, Dec 15, 2017 at 09:30:30PM +0530, Pavan Nikhilesh wrote:
> Mempool creation needs to be completed first before notifying mempool to
> register the mempool area.
> 
> Signed-off-by: Pavan Nikhilesh 

Looks good to me.

Did you see any issue?
If yes, can you please resubmit the patch
with an updated title "mempool: fix first memory area notification",
a Fixes: and Cc: sta...@dpdk.org?

Thanks,
Olivier


Re: [dpdk-dev] [RFC v1] doc compression API for DPDK

2017-12-22 Thread Trahe, Fiona
Hi Shally,

> -Original Message-
> From: Verma, Shally [mailto:shally.ve...@cavium.com]
> Sent: Friday, December 22, 2017 7:46 AM
> To: Trahe, Fiona ; dev@dpdk.org
> Cc: Athreya, Narayana Prasad ; Gupta, 
> Ashish
> ; Sahu, Sunila ; De Lara 
> Guarch, Pablo
> ; Challa, Mahipal 
> ; Jain, Deepak K
> ; Hemant Agrawal ; Roy Pledge
> ; Youri Querry ; Ahmed Mansour
> ; Trahe, Fiona 
> Subject: RE: [RFC v1] doc compression API for DPDK
> 
> Hi Fiona
> 
> > -Original Message-
> > From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
> > Sent: 20 December 2017 21:03
> > To: Verma, Shally ; dev@dpdk.org
> > Cc: Athreya, Narayana Prasad ;
> > Gupta, Ashish ; Sahu, Sunila
> > ; De Lara Guarch, Pablo
> > ; Challa, Mahipal
> > ; Jain, Deepak K ;
> > Hemant Agrawal ; Roy Pledge
> > ; Youri Querry ; Ahmed
> > Mansour ; Trahe, Fiona
> > 
> > Subject: RE: [RFC v1] doc compression API for DPDK
> >
> > Hi Shally,
> >
> > I think we are almost in sync now - a few comments below with just one
> > open question which I suspect was a typo.
> > If this is ok then no need for a meeting I think.
> > In this case will you issue a v2 of this doc ?
> >
> >
> > > -Original Message-
> > > From: Verma, Shally [mailto:shally.ve...@cavium.com]
> > > Sent: Wednesday, December 20, 2017 7:15 AM
> > > To: Trahe, Fiona ; dev@dpdk.org
> > > Cc: Athreya, Narayana Prasad ;
> > Gupta, Ashish
> > > ; Sahu, Sunila ;
> > De Lara Guarch, Pablo
> > > ; Challa, Mahipal
> > ; Jain, Deepak K
> > > ; Hemant Agrawal
> > ; Roy Pledge
> > > ; Youri Querry ;
> > Ahmed Mansour
> > > 
> > > Subject: RE: [RFC v1] doc compression API for DPDK
> > >
> > > Hi Fiona
> > >
> > > Please refer to my comments below with my understanding on two major
> > points OUT_OF_SPACE and
> > > Stateful Design.
> > > If you believe we still need a meeting to converge on same please share
> > meeting details to me.
> > >
> > >
> > > > -Original Message-
> > > > From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
> > > > Sent: 15 December 2017 23:11
> > > > To: Verma, Shally ; dev@dpdk.org
> > > > Cc: Athreya, Narayana Prasad ;
> > > > Challa, Mahipal ; De Lara Guarch, Pablo
> > > > ; Gupta, Ashish
> > > > ; Sahu, Sunila ;
> > > > Trahe, Fiona ; Jain, Deepak K
> > > > 
> > > > Subject: RE: [RFC v1] doc compression API for DPDK
> > > >
> > > > Hi Shally,
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Verma, Shally [mailto:shally.ve...@cavium.com]
> > > > > Sent: Thursday, December 7, 2017 5:43 AM
> > > > > To: Trahe, Fiona ; dev@dpdk.org
> > > > > Cc: Athreya, Narayana Prasad
> > ;
> > > > Challa, Mahipal
> > > > > ; De Lara Guarch, Pablo
> > > > ; Gupta, Ashish
> > > > > ; Sahu, Sunila
> > 
> > > > > Subject: RE: [RFC v1] doc compression API for DPDK
> > > >
> > > > //snip
> > > >
> > > > > > > > > Please note any time output buffer ran out of space during 
> > > > > > > > > write
> > > > then
> > > > > > > > operation will turn “Stateful”.  See
> > > > > > > > > more on Stateful under respective section.
> > > > > > > > [Fiona] Let's come back to this later. An alternative is that
> > > > > > OUT_OF_SPACE is
> > > > > > > > returned and the  application
> > > > > > > > must treat as a fail and resubmit the operation with a larger
> > > > destination
> > > > > > > > buffer.
> > > > > > >
> > > > > > > [Shally] Then I propose to add a feature flag
> > > > > > "FF_SUPPORT_OUT_OF_SPACE" per xform type for flexible
> > > > > > > PMD design.
> > > > > > > As there're devices which treat it as error on compression but not
> > on
> > > > > > decompression.
> > > > > > > If it is not supported, then it should be treated as failure 
> > > > > > > condition
> > and
> > > > app
> > > > > > can resubmit operation.
> > > > > > > if supported, behaviour *To-be-Defined* under stateful.
> > > > > > [Fiona] Can you explain 'turn stateful' some more?
> > > > > > If compressor runs out of space during stateless operation, either
> > comp
> > > > or
> > > > > > decomp, and turns stateful, how would the app know? And what
> > would
> > > > be in
> > > > > > status, consumed and produced?
> > > > > > Could it return OUT_OF_SPACE, and if both consumed and produced
> > == 0
> > > > >
> > > > > [Shally] If consumed = produced == 0, then it's not OUT_OF_SPACE
> > > > condition.
> > > > >
> > > > > > then the whole op must be resubmitted with a bigger output buffer.
> > But
> > > > if
> > > > > > consumed and produced > 0 then app could take the output and
> > submit
> > > > next
> > > > > > op
> > > > > > continuing from consumed+1.
> > > > > >
> > > > >
> > > > > [Shally] consumed and produced will *always* be > 0 in case of
> > > > OUT_OF_SPACE.
> > > > > OUT_OF_SPACE means output buffer exhausted while writing data into
> > it
> > > > and PMD may have more to
> > > > > write to it. So in such case, PMD should set
> > > > > Produced = complete length of output buffer
> > > > > Status = OUT_OF_SPACE
> > > > > consume, following possibilities here:
> > > > > 1

[dpdk-dev] [PATCH v2 02/12] bus/dpaa: add event dequeue and consumption support

2017-12-22 Thread Sunil Kumar Kori
To receive events from a given event port, add a function that
dequeues events from the portal. Also add a function to consume
received events based on their entry index.

Signed-off-by: Sunil Kumar Kori 
---
 drivers/bus/dpaa/base/qbman/qman.c| 91 +--
 drivers/bus/dpaa/dpaa_bus.c   |  1 +
 drivers/bus/dpaa/include/fsl_qman.h   | 26 +++--
 drivers/bus/dpaa/rte_bus_dpaa_version.map |  5 ++
 drivers/bus/dpaa/rte_dpaa_bus.h   | 14 +
 drivers/net/dpaa/dpaa_rxtx.c  |  1 +
 6 files changed, 129 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/dpaa/base/qbman/qman.c b/drivers/bus/dpaa/base/qbman/qman.c
index 42d509d..532afac 100644
--- a/drivers/bus/dpaa/base/qbman/qman.c
+++ b/drivers/bus/dpaa/base/qbman/qman.c
@@ -41,6 +41,8 @@
 #include "qman.h"
 #include 
 #include 
+#include 
+#include 
 
 /* Compilation constants */
 #define DQRR_MAXFILL   15
@@ -1144,6 +1146,74 @@ unsigned int qman_portal_poll_rx(unsigned int poll_limit,
return limit;
 }
 
+u32 qman_portal_dequeue(struct rte_event ev[], unsigned int poll_limit,
+   void **bufs)
+{
+   const struct qm_dqrr_entry *dq;
+   struct qman_fq *fq;
+   enum qman_cb_dqrr_result res;
+   unsigned int limit = 0;
+   struct qman_portal *p = get_affine_portal();
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   struct qm_dqrr_entry *shadow;
+#endif
+   unsigned int rx_number = 0;
+
+   do {
+   qm_dqrr_pvb_update(&p->p);
+   dq = qm_dqrr_current(&p->p);
+   if (!dq)
+   break;
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   /*
+* If running on an LE system the fields of the
+* dequeue entry must be swapped.  Because the
+* QMan HW will ignore writes the DQRR entry is
+* copied and the index stored within the copy
+*/
+   shadow = &p->shadow_dqrr[DQRR_PTR2IDX(dq)];
+   *shadow = *dq;
+   dq = shadow;
+   shadow->fqid = be32_to_cpu(shadow->fqid);
+   shadow->contextB = be32_to_cpu(shadow->contextB);
+   shadow->seqnum = be16_to_cpu(shadow->seqnum);
+   hw_fd_to_cpu(&shadow->fd);
+#endif
+
+  /* SDQCR: context_b points to the FQ */
+#ifdef CONFIG_FSL_QMAN_FQ_LOOKUP
+   fq = get_fq_table_entry(dq->contextB);
+#else
+   fq = (void *)(uintptr_t)dq->contextB;
+#endif
+   /* Now let the callback do its stuff */
+   res = fq->cb.dqrr_dpdk_cb(&ev[rx_number], p, fq,
+dq, &bufs[rx_number]);
+   rx_number++;
+   /* Interpret 'dq' from a driver perspective. */
+   /*
+* Parking isn't possible unless HELDACTIVE was set. NB,
+* FORCEELIGIBLE implies HELDACTIVE, so we only need to
+* check for HELDACTIVE to cover both.
+*/
+   DPAA_ASSERT((dq->stat & QM_DQRR_STAT_FQ_HELDACTIVE) ||
+   (res != qman_cb_dqrr_park));
+   if (res != qman_cb_dqrr_defer)
+   qm_dqrr_cdc_consume_1ptr(&p->p, dq,
+res == qman_cb_dqrr_park);
+   /* Move forward */
+   qm_dqrr_next(&p->p);
+   /*
+* Entry processed and consumed, increment our counter.  The
+* callback can request that we exit after consuming the
+* entry, and we also exit if we reach our processing limit,
+* so loop back only if neither of these conditions is met.
+*/
+   } while (++limit < poll_limit);
+
+   return limit;
+}
+
 struct qm_dqrr_entry *qman_dequeue(struct qman_fq *fq)
 {
struct qman_portal *p = get_affine_portal();
@@ -1262,13 +1332,20 @@ u32 qman_static_dequeue_get(struct qman_portal *qp)
return p->sdqcr;
 }
 
-void qman_dca(struct qm_dqrr_entry *dq, int park_request)
+void qman_dca(const struct qm_dqrr_entry *dq, int park_request)
 {
struct qman_portal *p = get_affine_portal();
 
qm_dqrr_cdc_consume_1ptr(&p->p, dq, park_request);
 }
 
+void qman_dca_index(u8 index, int park_request)
+{
+   struct qman_portal *p = get_affine_portal();
+
+   qm_dqrr_cdc_consume_1(&p->p, index, park_request);
+}
+
 /* Frame queue API */
 static const char *mcr_result_str(u8 result)
 {
@@ -2116,8 +2193,8 @@ int qman_enqueue(struct qman_fq *fq, const struct qm_fd *fd, u32 flags)
 }
 
 int qman_enqueue_multi(struct qman_fq *fq,
-  const struct qm_fd *fd,
-  int frames_to_send)
+  const struct qm_fd *fd, u32 *flags,
+   int frames_to_send)
 {
struct qman_portal *p = get_affine_portal();
struct qm_portal *portal = &p->

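The shadow-copy step in qman_portal_dequeue() above can be sketched in
isolation: the ring entry is copied first, and only the copy is converted
from big-endian to CPU order, since (per the comment in the patch) writes to
the hardware entry are ignored. This is a sketch under stated assumptions:
the struct is a simplified stand-in for qm_dqrr_entry, and all names are
hypothetical. Unlike the patch, which compiles the swap in only on
little-endian builds, these helpers read the bytes explicitly and so behave
the same on either host byte order.

```c
#include <stdint.h>

/* Simplified stand-in for a dequeue entry; the real one has more fields. */
struct sketch_dqrr_entry {
	uint32_t fqid;
	uint32_t context_b;
	uint16_t seqnum;
};

/* Interpret the stored bytes as big-endian, whatever the host order is. */
static uint32_t
sketch_be32_to_cpu(uint32_t be)
{
	const uint8_t *b = (const uint8_t *)&be;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static uint16_t
sketch_be16_to_cpu(uint16_t be)
{
	const uint8_t *b = (const uint8_t *)&be;

	return (uint16_t)(((unsigned int)b[0] << 8) | b[1]);
}

/*
 * Copy the hardware entry into a software shadow and convert the copy.
 * The hardware entry is only read, never written.
 */
static const struct sketch_dqrr_entry *
sketch_shadow_entry(const struct sketch_dqrr_entry *hw,
		    struct sketch_dqrr_entry *shadow)
{
	*shadow = *hw;
	shadow->fqid = sketch_be32_to_cpu(shadow->fqid);
	shadow->context_b = sketch_be32_to_cpu(shadow->context_b);
	shadow->seqnum = sketch_be16_to_cpu(shadow->seqnum);
	return shadow;
}
```

The caller then works on the returned pointer, in the same way that the patch
reassigns dq to the shadow entry before invoking the callback.
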
[dpdk-dev] [PATCH v2 03/12] bus/dpaa: add dpaa eventdev dynamic log support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/bus/dpaa/dpaa_bus.c   |  5 +
 drivers/bus/dpaa/rte_bus_dpaa_version.map |  1 +
 drivers/bus/dpaa/rte_dpaa_logs.h  | 16 
 3 files changed, 22 insertions(+)

diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index 01b332a..60a1ad5 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -70,6 +70,7 @@
 int dpaa_logtype_bus;
 int dpaa_logtype_mempool;
 int dpaa_logtype_pmd;
+int dpaa_logtype_eventdev;
 
 struct rte_dpaa_bus rte_dpaa_bus;
 struct netcfg_info *dpaa_netcfg;
@@ -562,4 +563,8 @@ dpaa_init_log(void)
dpaa_logtype_pmd = rte_log_register("pmd.dpaa");
if (dpaa_logtype_pmd >= 0)
rte_log_set_level(dpaa_logtype_pmd, RTE_LOG_NOTICE);
+
+   dpaa_logtype_eventdev = rte_log_register("eventdev.dpaa");
+   if (dpaa_logtype_eventdev >= 0)
+   rte_log_set_level(dpaa_logtype_eventdev, RTE_LOG_NOTICE);
 }
diff --git a/drivers/bus/dpaa/rte_bus_dpaa_version.map b/drivers/bus/dpaa/rte_bus_dpaa_version.map
index afc40bc..93cd118 100644
--- a/drivers/bus/dpaa/rte_bus_dpaa_version.map
+++ b/drivers/bus/dpaa/rte_bus_dpaa_version.map
@@ -68,6 +68,7 @@ DPDK_17.11 {
 DPDK_18.02 {
global:
 
+   dpaa_logtype_eventdev;
dpaa_svr_family;
per_lcore_held_bufs;
qm_channel_pool1;
diff --git a/drivers/bus/dpaa/rte_dpaa_logs.h b/drivers/bus/dpaa/rte_dpaa_logs.h
index 037c96b..f36aac1 100644
--- a/drivers/bus/dpaa/rte_dpaa_logs.h
+++ b/drivers/bus/dpaa/rte_dpaa_logs.h
@@ -38,6 +38,7 @@
 extern int dpaa_logtype_bus;
 extern int dpaa_logtype_mempool;
 extern int dpaa_logtype_pmd;
+extern int dpaa_logtype_eventdev;
 
 #define DPAA_BUS_LOG(level, fmt, args...) \
rte_log(RTE_LOG_ ## level, dpaa_logtype_bus, "%s(): " fmt "\n", \
@@ -100,6 +101,21 @@ extern int dpaa_logtype_pmd;
 #define DPAA_PMD_WARN(fmt, args...) \
DPAA_PMD_LOG(WARNING, fmt, ## args)
 
+#define DPAA_EVENTDEV_LOG(level, fmt, args...) \
+   rte_log(RTE_LOG_ ## level, dpaa_logtype_eventdev, "%s(): " fmt "\n", \
+   __func__, ##args)
+
+#define EVENTDEV_INIT_FUNC_TRACE() DPAA_EVENTDEV_LOG(DEBUG, " >>")
+
+#define DPAA_EVENTDEV_DEBUG(fmt, args...) \
+   DPAA_EVENTDEV_LOG(DEBUG, fmt, ## args)
+#define DPAA_EVENTDEV_ERR(fmt, args...) \
+   DPAA_EVENTDEV_LOG(ERR, fmt, ## args)
+#define DPAA_EVENTDEV_INFO(fmt, args...) \
+   DPAA_EVENTDEV_LOG(INFO, fmt, ## args)
+#define DPAA_EVENTDEV_WARN(fmt, args...) \
+   DPAA_EVENTDEV_LOG(WARNING, fmt, ## args)
+
 /* DP Logs, toggled out at compile time if level lower than current level */
 #define DPAA_DP_LOG(level, fmt, args...) \
RTE_LOG_DP(level, PMD, fmt, ## args)
-- 
2.9.3
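
As an illustration of how the DPAA_EVENTDEV_LOG() wrapper above composes its
output, here is a minimal standalone sketch with a stand-in for rte_log().
Everything prefixed sketch_ or SKETCH_ is hypothetical and not part of DPDK.
The sketch uses `...` with `##__VA_ARGS__` instead of the named `args...`
form in the patch; both are GNU extensions accepted by gcc and clang.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins for the log levels and log type; assumptions for this sketch. */
enum { SKETCH_LOG_ERR = 4, SKETCH_LOG_DEBUG = 8 };
static int sketch_logtype_eventdev = 1;
static char sketch_last_line[256];

/* Stand-in for rte_log(): format the line, remember it, print it. */
static int
sketch_log(int level, int logtype, const char *fmt, ...)
{
	va_list ap;
	int n;

	(void)level;
	(void)logtype;
	va_start(ap, fmt);
	n = vsnprintf(sketch_last_line, sizeof(sketch_last_line), fmt, ap);
	va_end(ap);
	fputs(sketch_last_line, stderr);
	return n;
}

/* Same shape as DPAA_EVENTDEV_LOG: prefix the function name, append "\n". */
#define SKETCH_EVENTDEV_LOG(level, fmt, ...) \
	sketch_log(SKETCH_LOG_ ## level, sketch_logtype_eventdev, \
		   "%s(): " fmt "\n", __func__, ##__VA_ARGS__)

#define SKETCH_EVENTDEV_ERR(fmt, ...) \
	SKETCH_EVENTDEV_LOG(ERR, fmt, ##__VA_ARGS__)

/* Example caller: the macro supplies the function name and the newline. */
static void
sketch_attach_queue(int qid)
{
	SKETCH_EVENTDEV_ERR("queue %d attach failed", qid);
}
```

Because the wrapper appends the newline itself, format strings passed to the
DPAA_EVENTDEV_* macros do not need a trailing "\n" of their own.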



[dpdk-dev] [PATCH v2 01/12] config: enabling compilation of DPAA eventdev PMD

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 config/common_base   | 3 +++
 config/defconfig_arm64-dpaa-linuxapp-gcc | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/config/common_base b/config/common_base
index e74febe..d3acd84 100644
--- a/config/common_base
+++ b/config/common_base
@@ -332,6 +332,9 @@ CONFIG_RTE_LIBRTE_DPAA_BUS=n
 CONFIG_RTE_LIBRTE_DPAA_MEMPOOL=n
 CONFIG_RTE_LIBRTE_DPAA_PMD=n
 
+# Compile software NXP DPAA Event Dev PMD
+CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV=n
+
 #
 # Compile burst-oriented Cavium OCTEONTX network PMD driver
 #
diff --git a/config/defconfig_arm64-dpaa-linuxapp-gcc b/config/defconfig_arm64-dpaa-linuxapp-gcc
index e577432..c163f9d 100644
--- a/config/defconfig_arm64-dpaa-linuxapp-gcc
+++ b/config/defconfig_arm64-dpaa-linuxapp-gcc
@@ -58,6 +58,9 @@ CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_OPS="dpaa"
 # Compile software NXP DPAA PMD
 CONFIG_RTE_LIBRTE_DPAA_PMD=y
 
+# Compile software NXP DPAA Event Dev PMD
+CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV=y
+
 #
 # FSL DPAA caam - crypto driver
 #
-- 
2.9.3



[dpdk-dev] [PATCH v2 04/12] net/dpaa: ethdev Rx queue configurations with eventdev

2017-12-22 Thread Sunil Kumar Kori
Ethernet Rx queues can be attached to an event queue in parallel or
atomic mode. This patch implements Rx queue configuration, attachment
to and detachment from a given event queue, and the corresponding
callbacks to handle events from the respective queues.

Signed-off-by: Sunil Kumar Kori 
---
 drivers/net/dpaa/Makefile |   2 +
 drivers/net/dpaa/dpaa_ethdev.c| 110 --
 drivers/net/dpaa/dpaa_ethdev.h|  29 
 drivers/net/dpaa/dpaa_rxtx.c  |  80 +-
 drivers/net/dpaa/rte_pmd_dpaa_version.map |   2 +
 5 files changed, 214 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dpaa/Makefile b/drivers/net/dpaa/Makefile
index a99d1ee..c644353 100644
--- a/drivers/net/dpaa/Makefile
+++ b/drivers/net/dpaa/Makefile
@@ -43,7 +43,9 @@ CFLAGS += -I$(RTE_SDK_DPAA)/
 CFLAGS += -I$(RTE_SDK_DPAA)/include
 CFLAGS += -I$(RTE_SDK)/drivers/bus/dpaa
 CFLAGS += -I$(RTE_SDK)/drivers/bus/dpaa/include/
+CFLAGS += -I$(RTE_SDK)/drivers/bus/dpaa/base/qbman
 CFLAGS += -I$(RTE_SDK)/drivers/mempool/dpaa
+CFLAGS += -I$(RTE_SDK)/drivers/event/dpaa
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal/include
 
diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
index 7798994..457e421 100644
--- a/drivers/net/dpaa/dpaa_ethdev.c
+++ b/drivers/net/dpaa/dpaa_ethdev.c
@@ -121,6 +121,21 @@ static const struct rte_dpaa_xstats_name_off dpaa_xstats_strings[] = {
 
 static struct rte_dpaa_driver rte_dpaa_pmd;
 
+static inline void
+dpaa_poll_queue_default_config(struct qm_mcc_initfq *opts)
+{
+   memset(opts, 0, sizeof(struct qm_mcc_initfq));
+   opts->we_mask = QM_INITFQ_WE_FQCTRL | QM_INITFQ_WE_CONTEXTA;
+   opts->fqd.fq_ctrl = QM_FQCTRL_AVOIDBLOCK | QM_FQCTRL_CTXASTASHING |
+  QM_FQCTRL_PREFERINCACHE;
+   opts->fqd.context_a.stashing.exclusive = 0;
+   if (dpaa_svr_family != SVR_LS1046A_FAMILY)
+   opts->fqd.context_a.stashing.annotation_cl =
+   DPAA_IF_RX_ANNOTATION_STASH;
+   opts->fqd.context_a.stashing.data_cl = DPAA_IF_RX_DATA_STASH;
+   opts->fqd.context_a.stashing.context_cl = DPAA_IF_RX_CONTEXT_STASH;
+}
+
 static int
 dpaa_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
@@ -561,6 +576,92 @@ int dpaa_eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
return 0;
 }
 
+int dpaa_eth_eventq_attach(const struct rte_eth_dev *dev,
+  int eth_rx_queue_id,
+   u16 ch_id,
+   const struct rte_event_eth_rx_adapter_queue_conf *queue_conf)
+{
+   int ret;
+   u32 flags = 0;
+   struct dpaa_if *dpaa_intf = dev->data->dev_private;
+   struct qman_fq *rxq = &dpaa_intf->rx_queues[eth_rx_queue_id];
+   struct qm_mcc_initfq opts = {0};
+
+   dpaa_poll_queue_default_config(&opts);
+
+   switch (queue_conf->ev.sched_type) {
+   case RTE_SCHED_TYPE_ATOMIC:
+   opts.fqd.fq_ctrl |= QM_FQCTRL_HOLDACTIVE;
+   /* Reset FQCTRL_AVOIDBLOCK bit as it is unnecessary
+* configuration with HOLD_ACTIVE setting
+*/
+   opts.fqd.fq_ctrl &= (~QM_FQCTRL_AVOIDBLOCK);
+   rxq->cb.dqrr_dpdk_cb = dpaa_rx_cb_atomic;
+   break;
+   case RTE_SCHED_TYPE_ORDERED:
+   DPAA_PMD_ERR("Ordered queue schedule type is not supported\n");
+   return -1;
+   default:
+   opts.fqd.fq_ctrl |= QM_FQCTRL_AVOIDBLOCK;
+   rxq->cb.dqrr_dpdk_cb = dpaa_rx_cb_parallel;
+   break;
+   }
+
+   opts.we_mask = opts.we_mask | QM_INITFQ_WE_DESTWQ;
+   opts.fqd.dest.channel = ch_id;
+   opts.fqd.dest.wq = queue_conf->ev.priority;
+
+   if (dpaa_intf->cgr_rx) {
+   opts.we_mask |= QM_INITFQ_WE_CGID;
+   opts.fqd.cgid = dpaa_intf->cgr_rx[eth_rx_queue_id].cgrid;
+   opts.fqd.fq_ctrl |= QM_FQCTRL_CGE;
+   }
+
+   flags = QMAN_INITFQ_FLAG_SCHED;
+
+   ret = qman_init_fq(rxq, flags, &opts);
+   if (ret) {
+   DPAA_PMD_ERR("Channel/Queue association failed. fqid %d ret:%d",
+rxq->fqid, ret);
+   return ret;
+   }
+
+   /* copy configuration which needs to be filled during dequeue */
+   memcpy(&rxq->ev, &queue_conf->ev, sizeof(struct rte_event));
+   dev->data->rx_queues[eth_rx_queue_id] = rxq;
+
+   return ret;
+}
+
+int dpaa_eth_eventq_detach(const struct rte_eth_dev *dev,
+  int eth_rx_queue_id)
+{
+   struct qm_mcc_initfq opts;
+   int ret;
+   u32 flags = 0;
+   struct dpaa_if *dpaa_intf = dev->data->dev_private;
+   struct qman_fq *rxq = &dpaa_intf->rx_queues[eth_rx_queue_id];
+
+   dpaa_poll_queue_default_config(&opts);
+
+   if (dpaa_intf->cgr_rx) {
+   opts.we_mask |= 

[dpdk-dev] [PATCH v2 07/12] event/dpaa: add event port config get/set support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/event/dpaa/dpaa_eventdev.c | 105 +
 1 file changed, 105 insertions(+)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index 538ba01..a3e1f7c 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -245,6 +245,106 @@ dpaa_event_queue_release(struct rte_eventdev *dev, 
uint8_t queue_id)
RTE_SET_USED(queue_id);
 }
 
+static void
+dpaa_event_port_default_conf_get(struct rte_eventdev *dev, uint8_t port_id,
+struct rte_event_port_conf *port_conf)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(port_id);
+
+   port_conf->new_event_threshold = DPAA_EVENT_MAX_NUM_EVENTS;
+   port_conf->dequeue_depth = DPAA_EVENT_MAX_PORT_DEQUEUE_DEPTH;
+   port_conf->enqueue_depth = DPAA_EVENT_MAX_PORT_ENQUEUE_DEPTH;
+}
+
+static int
+dpaa_event_port_setup(struct rte_eventdev *dev, uint8_t port_id,
+ const struct rte_event_port_conf *port_conf)
+{
+   struct dpaa_eventdev *eventdev = dev->data->dev_private;
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(port_conf);
+   dev->data->ports[port_id] = &eventdev->ports[port_id];
+
+   return 0;
+}
+
+static void
+dpaa_event_port_release(void *port)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(port);
+}
+
+static int
+dpaa_event_port_link(struct rte_eventdev *dev, void *port,
+const uint8_t queues[], const uint8_t priorities[],
+uint16_t nb_links)
+{
+   struct dpaa_eventdev *priv = dev->data->dev_private;
+   struct dpaa_port *event_port = (struct dpaa_port *)port;
+   struct dpaa_eventq *event_queue;
+   uint8_t eventq_id;
+   int i;
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(priorities);
+
+   /* First check that input configuration are valid */
+   for (i = 0; i < nb_links; i++) {
+   eventq_id = queues[i];
+   event_queue = &priv->evq_info[eventq_id];
+   if ((event_queue->event_queue_cfg
+   & RTE_EVENT_QUEUE_CFG_SINGLE_LINK)
+   && (event_queue->event_port)) {
+   return -EINVAL;
+   }
+   }
+
+   for (i = 0; i < nb_links; i++) {
+   eventq_id = queues[i];
+   event_queue = &priv->evq_info[eventq_id];
+   event_port->evq_info[i].event_queue_id = eventq_id;
+   event_port->evq_info[i].ch_id = event_queue->ch_id;
+   event_queue->event_port = port;
+   }
+
+   event_port->num_linked_evq = event_port->num_linked_evq + i;
+
+   return (int)i;
+}
+
+static int
+dpaa_event_port_unlink(struct rte_eventdev *dev, void *port,
+  uint8_t queues[], uint16_t nb_links)
+{
+   int i;
+   uint8_t eventq_id;
+   struct dpaa_eventq *event_queue;
+   struct dpaa_eventdev *priv = dev->data->dev_private;
+   struct dpaa_port *event_port = (struct dpaa_port *)port;
+
+   if (!event_port->num_linked_evq)
+   return nb_links;
+
+   for (i = 0; i < nb_links; i++) {
+   eventq_id = queues[i];
+   event_port->evq_info[eventq_id].event_queue_id = -1;
+   event_port->evq_info[eventq_id].ch_id = 0;
+   event_queue = &priv->evq_info[eventq_id];
+   event_queue->event_port = NULL;
+   }
+
+   event_port->num_linked_evq = event_port->num_linked_evq - i;
+
+   return (int)i;
+}
+
 static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.dev_infos_get= dpaa_event_dev_info_get,
.dev_configure= dpaa_event_dev_configure,
@@ -254,6 +354,11 @@ static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.queue_def_conf   = dpaa_event_queue_def_conf,
.queue_setup  = dpaa_event_queue_setup,
.queue_release  = dpaa_event_queue_release,
+   .port_def_conf= dpaa_event_port_default_conf_get,
+   .port_setup   = dpaa_event_port_setup,
+   .port_release   = dpaa_event_port_release,
+   .port_link= dpaa_event_port_link,
+   .port_unlink  = dpaa_event_port_unlink,
 };
 
 static int
-- 
2.9.3



[dpdk-dev] [PATCH v2 05/12] event/dpaa: add eventdev PMD

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 MAINTAINERS   |   5 +
 drivers/event/Makefile|   1 +
 drivers/event/dpaa/Makefile   |  37 +++
 drivers/event/dpaa/dpaa_eventdev.c| 267 ++
 drivers/event/dpaa/dpaa_eventdev.h|  81 +++
 drivers/event/dpaa/rte_pmd_dpaa_event_version.map |   4 +
 6 files changed, 395 insertions(+)
 create mode 100644 drivers/event/dpaa/Makefile
 create mode 100644 drivers/event/dpaa/dpaa_eventdev.c
 create mode 100644 drivers/event/dpaa/dpaa_eventdev.h
 create mode 100644 drivers/event/dpaa/rte_pmd_dpaa_event_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index f0baeb4..bf4d0da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -692,6 +692,11 @@ M: Nipun Gupta 
 F: drivers/event/dpaa2/
 F: doc/guides/eventdevs/dpaa2.rst
 
+NXP DPAA eventdev
+M: Hemant Agrawal 
+M: Sunil Kumar Kori 
+F: drivers/event/dpaa/
+
 Software Eventdev PMD
 M: Harry van Haaren 
 F: drivers/event/sw/
diff --git a/drivers/event/Makefile b/drivers/event/Makefile
index 1f9c0ba..c726234 100644
--- a/drivers/event/Makefile
+++ b/drivers/event/Makefile
@@ -35,5 +35,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV) += skeleton
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF) += octeontx
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_EVENTDEV) += dpaa2
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV) += dpaa
 
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/event/dpaa/Makefile b/drivers/event/dpaa/Makefile
new file mode 100644
index 000..bd0b6c9
--- /dev/null
+++ b/drivers/event/dpaa/Makefile
@@ -0,0 +1,37 @@
+#   SPDX-License-Identifier:BSD-3-Clause
+#   Copyright 2017 NXP
+#
+
+include $(RTE_SDK)/mk/rte.vars.mk
+RTE_SDK_DPAA=$(RTE_SDK)/drivers/net/dpaa
+
+#
+# library name
+#
+LIB = librte_pmd_dpaa_event.a
+
+CFLAGS := -I$(SRCDIR) $(CFLAGS)
+CFLAGS += -O3 $(WERROR_FLAGS)
+CFLAGS += -Wno-pointer-arith
+CFLAGS += -I$(RTE_SDK_DPAA)/
+CFLAGS += -I$(RTE_SDK_DPAA)/include
+CFLAGS += -I$(RTE_SDK)/drivers/bus/dpaa
+CFLAGS += -I$(RTE_SDK)/drivers/bus/dpaa/include/
+CFLAGS += -I$(RTE_SDK)/drivers/mempool/dpaa
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal/include
+
+EXPORT_MAP := rte_pmd_dpaa_event_version.map
+
+LIBABIVER := 1
+
+# Interfaces with DPDK
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV) += dpaa_eventdev.c
+
+LDLIBS += -lrte_bus_dpaa
+LDLIBS += -lrte_mempool_dpaa
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
+LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs
+LDLIBS += -lrte_eventdev -lrte_pmd_dpaa -lrte_bus_vdev
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
new file mode 100644
index 000..c4c81c9
--- /dev/null
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -0,0 +1,267 @@
+/*   SPDX-License-Identifier:BSD-3-Clause
+ *   Copyright 2017 NXP
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "dpaa_eventdev.h"
+#include 
+
+/*
+ * Clarifications
+ * Eventdev = Virtual Instance for SoC
+ * Eventport = Portal Instance
+ * Eventqueue = Channel Instance
+ * 1 Eventdev can have N Eventqueue
+ */
+
+static int
+dpaa_event_dequeue_timeout_ticks(struct rte_eventdev *dev, uint64_t ns,
+uint64_t *timeout_ticks)
+{
+   uint64_t cycles_per_second;
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+
+   cycles_per_second = rte_get_timer_hz();
+   *timeout_ticks = ns * (cycles_per_second / NS_PER_S);
+
+   return 0;
+}
+
+static void
+dpaa_event_dev_info_get(struct rte_eventdev *dev,
+   struct rte_event_dev_info *dev_info)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   dev_info->driver_name = "event_dpaa";
+   dev_info->min_dequeue_timeout_ns =
+   DPAA_EVENT_MIN_DEQUEUE_TIMEOUT;
+   dev_info->max_dequeue_timeout_ns =
+   DPAA_EVENT_MAX_DEQUEUE_TIMEOUT;
+   dev_info->dequeue_timeout_ns =
+   DPAA_EVENT_MIN_DEQUEUE_TIMEOUT;
+   dev_info->max_event_queues =
+   DPAA_EVENT_MAX_QUEUES;
+   dev_info->max_event_queue_flows =
+   DPAA_EVENT_MAX_QUEUE_FLOWS;
+   dev_info->max_event_queue_priority_levels =
+   DPAA_EVENT_MAX_QUEUE_PRIORITY_LEVELS;
+   dev_info->max_event_priority_levels =
+   DPAA_EVENT_MAX_EVENT_PRIORITY_LEVELS;
+   dev_info->max_event_ports =
+   DPAA_EVENT_MAX_EVENT_PORT;
+   dev_info->max_event_port_dequeue_depth =
+   DPAA_EVENT_MAX_PORT_DEQUEUE_DEPTH;
+   dev

[dpdk-dev] [PATCH v2 06/12] event/dpaa: add event queue config get/set support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/event/dpaa/dpaa_eventdev.c | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index c4c81c9..538ba01 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -199,7 +199,51 @@ dpaa_event_dev_close(struct rte_eventdev *dev)
return 0;
 }
 
+static void
+dpaa_event_queue_def_conf(struct rte_eventdev *dev, uint8_t queue_id,
+ struct rte_event_queue_conf *queue_conf)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(queue_id);
+
+   memset(queue_conf, 0, sizeof(struct rte_event_queue_conf));
+   queue_conf->schedule_type = RTE_SCHED_TYPE_PARALLEL;
+   queue_conf->priority = RTE_EVENT_DEV_PRIORITY_HIGHEST;
+}
+
+static int
+dpaa_event_queue_setup(struct rte_eventdev *dev, uint8_t queue_id,
+  const struct rte_event_queue_conf *queue_conf)
+{
+   struct dpaa_eventdev *priv = dev->data->dev_private;
+   struct dpaa_eventq *evq_info = &priv->evq_info[queue_id];
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   switch (queue_conf->schedule_type) {
+   case RTE_SCHED_TYPE_PARALLEL:
+   case RTE_SCHED_TYPE_ATOMIC:
+   break;
+   case RTE_SCHED_TYPE_ORDERED:
+   EVENTDEV_DRV_ERR("Schedule type is not supported.");
+   return -1;
+   }
+   evq_info->event_queue_cfg = queue_conf->event_queue_cfg;
+   evq_info->event_queue_id = queue_id;
+
+   return 0;
+}
 
+static void
+dpaa_event_queue_release(struct rte_eventdev *dev, uint8_t queue_id)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(queue_id);
+}
 
 static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.dev_infos_get= dpaa_event_dev_info_get,
@@ -207,6 +251,9 @@ static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.dev_start= dpaa_event_dev_start,
.dev_stop = dpaa_event_dev_stop,
.dev_close= dpaa_event_dev_close,
+   .queue_def_conf   = dpaa_event_queue_def_conf,
+   .queue_setup  = dpaa_event_queue_setup,
+   .queue_release  = dpaa_event_queue_release,
 };
 
 static int
-- 
2.9.3



[dpdk-dev] [PATCH v2 09/12] event/dpaa: add eth rx adapter queue config support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/event/dpaa/dpaa_eventdev.c | 117 +
 1 file changed, 117 insertions(+)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index 7b3d8fb..13345da 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -345,6 +345,118 @@ dpaa_event_port_unlink(struct rte_eventdev *dev, void 
*port,
return (int)i;
 }
 
+static int
+dpaa_event_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
+  const struct rte_eth_dev *eth_dev,
+  uint32_t *caps)
+{
+   const char *ethdev_driver = eth_dev->device->driver->name;
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+
+   if (!strcmp(ethdev_driver, "net_dpaa"))
+   *caps = RTE_EVENT_ETH_RX_ADAPTER_DPAA_CAP;
+   else
+   *caps = RTE_EVENT_ETH_RX_ADAPTER_SW_CAP;
+
+   return 0;
+}
+
+static int
+dpaa_event_eth_rx_adapter_queue_add(
+   const struct rte_eventdev *dev,
+   const struct rte_eth_dev *eth_dev,
+   int32_t rx_queue_id,
+   const struct rte_event_eth_rx_adapter_queue_conf *queue_conf)
+{
+   struct dpaa_eventdev *eventdev = dev->data->dev_private;
+   uint8_t ev_qid = queue_conf->ev.queue_id;
+   u16 ch_id = eventdev->evq_info[ev_qid].ch_id;
+   struct dpaa_if *dpaa_intf = eth_dev->data->dev_private;
+   int ret, i;
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   if (rx_queue_id == -1) {
+   for (i = 0; i < dpaa_intf->nb_rx_queues; i++) {
+   ret = dpaa_eth_eventq_attach(eth_dev, i, ch_id,
+queue_conf);
+   if (ret) {
+   EVENTDEV_DRV_ERR(
+   "Event Queue attach failed:%d\n", ret);
+   goto detach_configured_queues;
+   }
+   }
+   return 0;
+   }
+
+   ret = dpaa_eth_eventq_attach(eth_dev, rx_queue_id, ch_id, queue_conf);
+   if (ret)
+   EVENTDEV_DRV_ERR("dpaa_eth_eventq_attach failed:%d\n", ret);
+   return ret;
+
+detach_configured_queues:
+
+   for (i = (i - 1); i >= 0 ; i--)
+   dpaa_eth_eventq_detach(eth_dev, i);
+
+   return ret;
+}
+
+static int
+dpaa_event_eth_rx_adapter_queue_del(const struct rte_eventdev *dev,
+   const struct rte_eth_dev *eth_dev,
+   int32_t rx_queue_id)
+{
+   int ret, i;
+   struct dpaa_if *dpaa_intf = eth_dev->data->dev_private;
+
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   if (rx_queue_id == -1) {
+   for (i = 0; i < dpaa_intf->nb_rx_queues; i++) {
+   ret = dpaa_eth_eventq_detach(eth_dev, i);
+   if (ret)
+   EVENTDEV_DRV_ERR(
+   "Event Queue detach failed:%d\n", ret);
+   }
+
+   return 0;
+   }
+
+   ret = dpaa_eth_eventq_detach(eth_dev, rx_queue_id);
+   if (ret)
+   EVENTDEV_DRV_ERR("dpaa_eth_eventq_detach failed:%d\n", ret);
+   return ret;
+}
+
+static int
+dpaa_event_eth_rx_adapter_start(const struct rte_eventdev *dev,
+   const struct rte_eth_dev *eth_dev)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(eth_dev);
+
+   return 0;
+}
+
+static int
+dpaa_event_eth_rx_adapter_stop(const struct rte_eventdev *dev,
+  const struct rte_eth_dev *eth_dev)
+{
+   EVENTDEV_DRV_FUNC_TRACE();
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(eth_dev);
+
+   return 0;
+}
+
 static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.dev_infos_get= dpaa_event_dev_info_get,
.dev_configure= dpaa_event_dev_configure,
@@ -360,6 +472,11 @@ static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.port_link= dpaa_event_port_link,
.port_unlink  = dpaa_event_port_unlink,
.timeout_ticks= dpaa_event_dequeue_timeout_ticks,
+   .eth_rx_adapter_caps_get = dpaa_event_eth_rx_adapter_caps_get,
+   .eth_rx_adapter_queue_add = dpaa_event_eth_rx_adapter_queue_add,
+   .eth_rx_adapter_queue_del = dpaa_event_eth_rx_adapter_queue_del,
+   .eth_rx_adapter_start = dpaa_event_eth_rx_adapter_start,
+   .eth_rx_adapter_stop = dpaa_event_eth_rx_adapter_stop,
 };
 
 static int
-- 
2.9.3
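
The queue_add path above attaches every Rx queue and, if one attach fails,
detaches the queues that were already configured. A minimal standalone sketch
of that attach-all-or-roll-back pattern follows. All names (demo_attach,
demo_detach, demo_fail_at) are hypothetical stand-ins for illustration, not
the driver's functions:

```c
#define DEMO_NB_QUEUES 4

/* 1 while the (hypothetical) queue is attached to an event channel. */
static int demo_attached[DEMO_NB_QUEUES];

/* Queue index whose attach should fail; -1 means no failure is injected. */
static int demo_fail_at = -1;

static int demo_attach(int q)
{
	if (q == demo_fail_at)
		return -1;
	demo_attached[q] = 1;
	return 0;
}

static void demo_detach(int q)
{
	demo_attached[q] = 0;
}

/* Attach all queues; on failure undo the ones already attached, newest first. */
static int demo_attach_all(void)
{
	int i, ret;

	for (i = 0; i < DEMO_NB_QUEUES; i++) {
		ret = demo_attach(i);
		if (ret != 0) {
			while (--i >= 0)
				demo_detach(i);
			return ret;
		}
	}
	return 0;
}
```

The point of the rollback is that a caller never sees a partially attached
port: either every queue is attached or none is.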



[dpdk-dev] [PATCH v2 08/12] event/dpaa: add dequeue timeout conversion support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/event/dpaa/dpaa_eventdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index a3e1f7c..7b3d8fb 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -359,6 +359,7 @@ static const struct rte_eventdev_ops dpaa_eventdev_ops = {
.port_release   = dpaa_event_port_release,
.port_link= dpaa_event_port_link,
.port_unlink  = dpaa_event_port_unlink,
+   .timeout_ticks= dpaa_event_dequeue_timeout_ticks,
 };
 
 static int
-- 
2.9.3
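
For reference, the conversion wired up here computes
ns * (cycles_per_second / NS_PER_S). Assuming NS_PER_S is the integer
1000000000, the inner division truncates to zero whenever the timer runs
below 1 GHz. The sketch below contrasts that form with one that splits the
value into whole seconds and a remainder. The demo_* names and the 25 MHz
figure are illustrative assumptions, not part of the patch:

```c
#include <stdint.h>

#define DEMO_NS_PER_S 1000000000ULL

/* Same shape as the expression in the patch: divide first, then multiply. */
static uint64_t demo_ns_to_ticks_truncating(uint64_t ns, uint64_t hz)
{
	return ns * (hz / DEMO_NS_PER_S);
}

/*
 * Convert whole seconds and the sub-second remainder separately, so the
 * intermediate product stays small and no precision is lost to an early
 * integer division.
 */
static uint64_t demo_ns_to_ticks(uint64_t ns, uint64_t hz)
{
	uint64_t secs = ns / DEMO_NS_PER_S;
	uint64_t rem = ns % DEMO_NS_PER_S;

	return secs * hz + (rem * hz) / DEMO_NS_PER_S;
}
```

With a hypothetical 25 MHz timer, 1000 ns maps to 25 ticks in the second
form, while the first form yields 0.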



[dpdk-dev] [PATCH v2 10/12] event/dpaa: add eventdev enqueue/dequeue support

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 drivers/event/dpaa/dpaa_eventdev.c | 114 +
 1 file changed, 114 insertions(+)

diff --git a/drivers/event/dpaa/dpaa_eventdev.c 
b/drivers/event/dpaa/dpaa_eventdev.c
index 13345da..64b9eb4 100644
--- a/drivers/event/dpaa/dpaa_eventdev.c
+++ b/drivers/event/dpaa/dpaa_eventdev.c
@@ -60,6 +60,116 @@ dpaa_event_dequeue_timeout_ticks(struct rte_eventdev *dev, 
uint64_t ns,
 }
 
 static void
+dpaa_eventq_portal_add(u16 ch_id)
+{
+   uint32_t sdqcr;
+
+   sdqcr = QM_SDQCR_CHANNELS_POOL_CONV(ch_id);
+   qman_static_dequeue_add(sdqcr, NULL);
+}
+
+static uint16_t
+dpaa_event_enqueue_burst(void *port, const struct rte_event ev[],
+uint16_t nb_events)
+{
+   uint16_t i;
+   struct rte_mbuf *mbuf;
+
+   RTE_SET_USED(port);
+   /*Release all the contexts saved previously*/
+   for (i = 0; i < nb_events; i++) {
+   switch (ev[i].op) {
+   case RTE_EVENT_OP_RELEASE:
+   qman_dca_index(ev[i].impl_opaque, 0);
+   mbuf = DPAA_PER_LCORE_DQRR_MBUF(i);
+   mbuf->seqn = DPAA_INVALID_MBUF_SEQN;
+   DPAA_PER_LCORE_DQRR_HELD &= ~(1 << i);
+   DPAA_PER_LCORE_DQRR_SIZE--;
+   break;
+   default:
+   break;
+   }
+   }
+
+   return nb_events;
+}
+
+static uint16_t
+dpaa_event_enqueue(void *port, const struct rte_event *ev)
+{
+   return dpaa_event_enqueue_burst(port, ev, 1);
+}
+
+static uint16_t
+dpaa_event_dequeue_burst(void *port, struct rte_event ev[],
+uint16_t nb_events, uint64_t timeout_ticks)
+{
+   int ret;
+   u16 ch_id;
+   void *buffers[8];
+   u32 num_frames, i;
+   uint64_t wait_time, cur_ticks, start_ticks;
+   struct dpaa_port *portal = (struct dpaa_port *)port;
+   struct rte_mbuf *mbuf;
+
+   /* Affine current thread context to a qman portal */
+   ret = rte_dpaa_portal_init((void *)0);
+   if (ret) {
+   DPAA_EVENTDEV_ERR("Unable to initialize portal");
+   return ret;
+   }
+
+   if (unlikely(!portal->is_port_linked)) {
+   /*
+* Affine event queue for current thread context
+* to a qman portal.
+*/
+   for (i = 0; i < portal->num_linked_evq; i++) {
+   ch_id = portal->evq_info[i].ch_id;
+   dpaa_eventq_portal_add(ch_id);
+   }
+   portal->is_port_linked = true;
+   }
+
+   /* Check if there are atomic contexts to be released */
+   i = 0;
+   while (DPAA_PER_LCORE_DQRR_SIZE) {
+   if (DPAA_PER_LCORE_DQRR_HELD & (1 << i)) {
+   qman_dca_index(i, 0);
+   mbuf = DPAA_PER_LCORE_DQRR_MBUF(i);
+   mbuf->seqn = DPAA_INVALID_MBUF_SEQN;
+   DPAA_PER_LCORE_DQRR_HELD &= ~(1 << i);
+   DPAA_PER_LCORE_DQRR_SIZE--;
+   }
+   i++;
+   }
+   DPAA_PER_LCORE_DQRR_HELD = 0;
+
+   if (portal->timeout == DPAA_EVENT_PORT_DEQUEUE_TIMEOUT_INVALID)
+   wait_time = timeout_ticks;
+   else
+   wait_time = portal->timeout;
+
+   /* Lets dequeue the frames */
+   start_ticks = rte_get_timer_cycles();
+   wait_time += start_ticks;
+   do {
+   num_frames = qman_portal_dequeue(ev, nb_events, buffers);
+   if (num_frames != 0)
+   break;
+   cur_ticks = rte_get_timer_cycles();
+   } while (cur_ticks < wait_time);
+
+   return num_frames;
+}
+
+static uint16_t
+dpaa_event_dequeue(void *port, struct rte_event *ev, uint64_t timeout_ticks)
+{
+   return dpaa_event_dequeue_burst(port, ev, 1, timeout_ticks);
+}
+
+static void
 dpaa_event_dev_info_get(struct rte_eventdev *dev,
struct rte_event_dev_info *dev_info)
 {
@@ -494,6 +604,10 @@ dpaa_event_dev_create(const char *name)
}
 
eventdev->dev_ops   = &dpaa_eventdev_ops;
+   eventdev->enqueue   = dpaa_event_enqueue;
+   eventdev->enqueue_burst = dpaa_event_enqueue_burst;
+   eventdev->dequeue   = dpaa_event_dequeue;
+   eventdev->dequeue_burst = dpaa_event_dequeue_burst;
 
/* For secondary processes, the primary has done all the work */
if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-- 
2.9.3



[dpdk-dev] [PATCH v2 12/12] doc: add DPAA eventdev guide

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 MAINTAINERS|   1 +
 doc/guides/eventdevs/dpaa.rst  | 144 +
 doc/guides/eventdevs/index.rst |   1 +
 3 files changed, 146 insertions(+)
 create mode 100644 doc/guides/eventdevs/dpaa.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index bf4d0da..ced725a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -696,6 +696,7 @@ NXP DPAA eventdev
 M: Hemant Agrawal 
 M: Sunil Kumar Kori 
 F: drivers/event/dpaa/
+F: doc/guides/eventdevs/dpaa.rst
 
 Software Eventdev PMD
 M: Harry van Haaren 
diff --git a/doc/guides/eventdevs/dpaa.rst b/doc/guides/eventdevs/dpaa.rst
new file mode 100644
index 000..35e5211
--- /dev/null
+++ b/doc/guides/eventdevs/dpaa.rst
@@ -0,0 +1,144 @@
+.. SPDX-License-Identifier:BSD-3-Clause
+   Copyright 2017 NXP
+
+NXP DPAA Eventdev Driver
+========================
+
+The dpaa eventdev is an implementation of the eventdev API, that provides a
+wide range of the eventdev features. The eventdev relies on a dpaa based
+platform to perform event scheduling.
+
+More information can be found at `NXP Official Website
+`_.
+
+Features
+--------
+
+The DPAA EVENTDEV implements many features in the eventdev API:
+
+- Hardware based event scheduler
+- 4 event ports
+- 4 event queues
+- Parallel flows
+- Atomic flows
+
+Supported DPAA SoCs
+-------------------
+
+- LS1046A
+- LS1043A
+
+Prerequisites
+-------------
+
+The following prerequisites apply when executing EVENTDEV on a DPAA compatible
+platform:
+
+1. **ARM 64 Tool Chain**
+
+  For example, the `*aarch64* Linaro Toolchain 
`_.
+
+2. **Linux Kernel**
+
+   It can be obtained from `NXP's Github hosting 
`_.
+
+3. **Root File System**
+
+   Any *aarch64* supporting filesystem can be used. For example,
+   Ubuntu 15.10 (Wily) or 16.04 LTS (Xenial) userland which can be obtained
+   from `here 
`_.
+
+As an alternative method, DPAA EVENTDEV can also be executed using images provided
+as part of SDK from NXP. The SDK includes all the above prerequisites necessary
+to bring up a DPAA board.
+
+The following dependencies are not part of DPDK and must be installed
+separately:
+
+- **NXP Linux SDK**
+
+  NXP Linux software development kit (SDK) includes support for family
+  of QorIQ® ARM-Architecture-based system on chip (SoC) processors
+  and corresponding boards.
+
+  It includes the Linux board support packages (BSPs) for NXP SoCs,
+  a fully operational tool chain, kernel and board specific modules.
+
+  SDK and related information can be obtained from:  `NXP QorIQ SDK  
`_.
+
+- **DPDK Extra Scripts**
+
+  DPAA based resources can be configured easily with the help of ready to use
+  xml files as provided in the DPDK Extra repository.
+
+  `DPDK Extras Scripts `_.
+
+Currently supported by DPDK:
+
+- NXP SDK **2.0+** or LSDK **17.09+**
+- Supported architectures:  **arm64 LE**.
+
+- Follow the DPDK :ref:`Getting Started Guide for Linux ` to setup 
the basic DPDK environment.
+
+Pre-Installation Configuration
+------------------------------
+
+Config File Options
+~~~~~~~~~~~~~~~~~~~
+
+The following options can be modified in the ``config`` file.
+Please note that enabling debugging options may affect system performance.
+
+- ``CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV`` (default ``y``)
+
+  Toggle compilation of the ``librte_pmd_dpaa_event`` driver.
+
+- ``CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV_DEBUG`` (default ``n``)
+
+  Toggle display of generic debugging messages
+
+Driver Compilation
+~~~~~~~~~~~~~~~~~~
+
+To compile the DPAA EVENTDEV PMD for Linux arm64 gcc target, run the
+following ``make`` command:
+
+.. code-block:: console
+
+   cd 
+   make config T=arm64-dpaa-linuxapp-gcc install
+
+Initialization
+--------------
+
+The dpaa eventdev is exposed as a vdev device which consists of a set of channels
+and queues. On EAL initialization, dpaa components will be
+probed and then vdev device can be created from the application code by
+
+* Invoking ``rte_vdev_init("event_dpaa")`` from the application
+
+* Using ``--vdev="event_dpaa"`` in the EAL options, which will call
+  rte_vdev_init() internally
+
+Example:
+
+.. code-block:: console
+
+./your_eventdev_application --vdev="event_dpaa"
+
+Limitations
+-----------
+
+1. DPAA eventdev cannot work with DPAA PUSH mode queues configured for ethdev.
+   Disable them with ``export DPAA_NUM_PUSH_QUEUES=0``.
+
+Platform Requirement
+
+
+DPAA drivers for DPDK can only work on NXP SoCs as list

[dpdk-dev] [PATCH v2 11/12] config: add eventdev library to application

2017-12-22 Thread Sunil Kumar Kori
Signed-off-by: Sunil Kumar Kori 
---
 mk/rte.app.mk | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 6a6a745..22512fc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -198,6 +198,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV) += 
-lrte_pmd_skeleton_event
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += -lrte_pmd_sw_event
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OCTEONTX_SSOVF) += -lrte_pmd_octeontx_ssovf
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_EVENTDEV) += -lrte_pmd_dpaa2_event
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_DPAA_EVENTDEV) += -lrte_pmd_dpaa_event
 _LDLIBS-$(CONFIG_RTE_LIBRTE_OCTEONTX_MEMPOOL) += -lrte_mempool_octeontx
 _LDLIBS-$(CONFIG_RTE_LIBRTE_OCTEONTX_PMD) += -lrte_pmd_octeontx
 endif # CONFIG_RTE_LIBRTE_EVENTDEV
-- 
2.9.3



Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for pktmbuf pool create API

2017-12-22 Thread Olivier MATZ
On Tue, Dec 19, 2017 at 01:41:05PM +, Wiles, Keith wrote:
> 
> 
> > On Dec 18, 2017, at 11:40 PM, Hemant Agrawal  wrote:
> > 
> > On 12/18/2017 7:21 PM, Wiles, Keith wrote:
> >> 
> >> 
> >>> On Dec 15, 2017, at 4:41 AM, Hemant Agrawal  
> >>> wrote:
> >>> 
> >>> Introduce a new argument ops_name in rte_mempool_set_ops_byname
> >>> for allowing the application to optionally specify the mempool ops.
> >>> 
> >>> Signed-off-by: Hemant Agrawal 
> >>> ---
> >>> v2: fix checkpatch error
> >>> 
> >>> doc/guides/rel_notes/deprecation.rst | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>> 
> >>> diff --git a/doc/guides/rel_notes/deprecation.rst 
> >>> b/doc/guides/rel_notes/deprecation.rst
> >>> index 13e8543..968ca14 100644
> >>> --- a/doc/guides/rel_notes/deprecation.rst
> >>> +++ b/doc/guides/rel_notes/deprecation.rst
> >>> @@ -53,3 +53,6 @@ Deprecation Notices
> >>> 
> >>> * librte_meter: The API will change to accommodate configuration profiles.
> >>>  Most of the API functions will have an additional opaque parameter.
> >>> +
> >>> +* librte_mbuf: a new optional parameter for representing name of 
> >>> mempool_ops
> >>> +  will be added to the API ``rte_pktmbuf_pool_create``.
> >> 
> >> 
> >> Sorry, for the late response I was on vacation.
> >> 
> >> My question is why do we need to change rte_pktmbuf_pool_create ABI yet 
> >> again, why could we not add a new API to just set the name of the pool 
> >> after it is created. This would allow all current applications to work 
> >> without any ABI breakage and only require adding a new API call for anyone 
> >> that wants the name. The rte_pktmbuf_pool_create() routine could assign a 
> >> default name or some incrementing style name as the default. e.g. 
> >> ‘pktmbuf_%d’ with a static incrementing variable or whatever you like.
> >> 
> >> Sorry if this was asked and answered before.
> >> 
> > 
> > I understand the concerns.
> > 
> > However, the new API to just set the name will not work post create.
> > rte_pktmbuf_pool_create is a wrapper API, which complete the mempool 
> > configuration on the basis default mempool_ops.
> 
> Really can not add the name after the fact, I have not looked, but it seem 
> very odd we can not use the mempool pointer and update the ops_name. What is 
> stopping this from working?

Changing the ops name is not possible after the mempool is created.
The ops name defines how the objects are stored, we cannot update it
once the objects are created.

> > The idea proposed is to create pktmbuf pool from a specific mempool 
> > (ops_name).
> > 
> > We can leave "rte_pktmbuf_pool_create" as it is.
> > and create another similar API with e.g. 
> > "rte_pktmbuf_pool_create_specific", which will also take ops_name as 
> > argument.  (We can combine the internal implementation with NULL ops_name 
> > for rte_pktmbuf_pool_create.)
> 
> I would accept this approach over the original patch to change the name of a 
> commonly used API.
> > 
> > This way we will have flexibility for the applications looking for pktmbufs 
> > from a specific mempool.
> > 
> > any thoughts?

What do you think about this proposition?
http://dpdk.org/ml/archives/dev/2017-December/084775.html

The application can do:

  /* override value previously set by eal args, if any */
  rte_mbuf_set_user_pool_ops("my-ops");

  /* as before, no API change */
  rte_pktmbuf_pool_create(...);

With this approach, it is less convenient to create several pools
with different ops, but I'm not sure there is a use case for that.
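
To make the ordering constraint concrete, here is a small standalone model
of the idea: the ops name is read once, when the pool is created, so an
override has to be set beforehand and does not affect pools that already
exist. Everything named demo_* is a hypothetical stand-in, not the DPDK API,
and "ring_mp_mc" is used only as an example default:

```c
#include <stdio.h>
#include <string.h>

#define DEMO_OPS_NAMESIZE 32

/* Hypothetical process-wide override; an empty string means "not set". */
static char demo_user_ops[DEMO_OPS_NAMESIZE];

struct demo_pool {
	/* Chosen at creation time; fixed once objects are stored. */
	char ops_name[DEMO_OPS_NAMESIZE];
};

/* Record the ops name that later pool creations should use. */
static int demo_set_user_pool_ops(const char *name)
{
	if (name == NULL || strlen(name) >= DEMO_OPS_NAMESIZE)
		return -1;
	strcpy(demo_user_ops, name);
	return 0;
}

/* Pool creation keeps its signature; it only consults the override. */
static void demo_pool_create(struct demo_pool *p, const char *default_ops)
{
	const char *ops = demo_user_ops[0] != '\0' ? demo_user_ops : default_ops;

	snprintf(p->ops_name, sizeof(p->ops_name), "%s", ops);
}
```

A pool created before the override keeps its original ops; only pools
created afterwards pick up the new name.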



[dpdk-dev] [PATCH RFC 0/2] vhost: support selective datapath

2017-12-22 Thread Zhihong Wang
This patch set introduces support for selective datapath in DPDK vhost-user
lib to enable acceleration. The default selection is the existing software
implementation, while more options are available when more engines are
present.

vDPA stands for vhost Data Path Acceleration. The idea is to enable various
types of devices to do data transfer with virtio driver directly.

Design details


An engine is a group of devices that support virtio datapath operations,
such as enqueue, dequeue, interrupt, and doorbell. An engine is defined as
follows:

struct rte_vdpa_eng_id {
	union {
		uint8_t __dummy[64];

		struct {
			struct rte_pci_addr pci_addr;
		};
	};
};

struct rte_vdpa_eng_attr {
	char name[MAX_VDPA_NAME_LEN];
	struct rte_vdpa_eng_id *id;
};

struct rte_vdpa_dev_ops {
	vdpa_dev_conf_t        dev_conf;
	vdpa_dev_close_t       dev_close;
	vdpa_vring_state_set_t vring_state_set;
	vdpa_migration_done_t  migration_done;
};

struct rte_vdpa_eng_ops {
	vdpa_eng_init_t   eng_init;
	vdpa_eng_uninit_t eng_uninit;
};

struct rte_vdpa_eng_driver {
	const char *name;
	struct rte_vdpa_eng_ops eng_ops;
	struct rte_vdpa_dev_ops dev_ops;
} __rte_cache_aligned;

struct rte_vdpa_engine {
	struct rte_vdpa_eng_attr eng_attr;
	struct rte_vdpa_eng_driver *eng_drv;
} __rte_cache_aligned;

Changes to the current vhost-user lib are:


 1. Make vhost device capabilities configurable to adopt various engines.
Such capabilities include supported features, protocol features, queue
number. APIs are introduced to let app configure these capabilities.

 2. In addition to the existing vhost framework, a set of callbacks is
added for vhost to call the driver for device operations at the right
time:

 a. dev_conf: Called to configure the actual device when the virtio
device becomes ready

 b. dev_close: Called to close the actual device when the virtio device
is stopped

 c. vring_state_set: Called to change the state of the vring in the
actual device when vring state changes

 d. migration_done: Called to allow the device to respond to RARP
sending

 3. To make vhost aware of its own type, an engine id (eid) and a device
id (did) are added into the vhost data structure, to index the actual
device. APIs are introduced to let app configure it.
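
As a rough illustration of item 2, the sketch below shows one way a vhost layer could call the optional device callbacks. The callback typedefs follow the ones listed in this cover letter; struct dev_ops_sketch, notify_dev_ready(), notify_vring_state() and the stub driver are hypothetical names used only for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Callback types, as listed in the cover letter. */
typedef int (*vdpa_dev_conf_t)(int vid);
typedef int (*vdpa_dev_close_t)(int vid);
typedef int (*vdpa_vring_state_set_t)(int vid, int vring, int state);
typedef int (*vdpa_migration_done_t)(int vid);

/* Hypothetical stand-in for struct rte_vdpa_dev_ops. */
struct dev_ops_sketch {
	vdpa_dev_conf_t        dev_conf;
	vdpa_dev_close_t       dev_close;
	vdpa_vring_state_set_t vring_state_set;
	vdpa_migration_done_t  migration_done;
};

/*
 * Hypothetical hook: the virtio device became ready.
 * With no driver callback installed, nothing is called and 0 is
 * returned, which models staying on the software datapath.
 */
int
notify_dev_ready(const struct dev_ops_sketch *ops, int vid)
{
	assert(vid >= 0);
	if (ops == NULL || ops->dev_conf == NULL)
		return 0;
	return ops->dev_conf(vid);
}

/* Hypothetical hook: a vring was enabled or disabled. */
int
notify_vring_state(const struct dev_ops_sketch *ops, int vid, int vring,
		   int state)
{
	assert(vid >= 0);
	if (ops == NULL || ops->vring_state_set == NULL)
		return 0;
	return ops->vring_state_set(vid, vring, state);
}

/* Stub driver callback, present only to exercise the dispatch. */
int last_conf_vid = -1;

int
stub_dev_conf(int vid)
{
	last_conf_vid = vid;
	return 0;
}
```

Treating a missing callback as a no-op is an assumption of this sketch; it keeps the default software implementation working when a driver implements only part of the ops table.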

Working process:


 1. Register driver during DPDK initialization.

 2. Register the engine with a name and attributes; the name is used to match
the right driver.

 3. For each vhost device creation in app:
 
  a. Reserve device in the engine, so the eid and did are confirmed.

  b. Register vhost-user socket.

  c. Set capabilities of the vhost-user socket.

  d. Register vhost-user callbacks.

  e. Start to wait for connection.

 4. When a connection arrives and the virtio device data structure is created,
set the eid and did in the new_device callback.
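
Steps 1 and 2 above (register a driver, then register an engine whose name selects that driver) can be sketched as follows. This is a simplified model under stated assumptions: the fixed-size tables, struct eng_driver_sketch, struct engine_sketch and both register functions are hypothetical and do not reproduce the code in vdpa.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DRIVERS_SKETCH 8
#define MAX_ENGINES_SKETCH 8
#define NAME_LEN_SKETCH    128

/* Hypothetical, reduced versions of the driver and engine structures. */
struct eng_driver_sketch {
	const char *name;
};

struct engine_sketch {
	char name[NAME_LEN_SKETCH];
	const struct eng_driver_sketch *drv;
};

const struct eng_driver_sketch *driver_table[MAX_DRIVERS_SKETCH];
int nb_drivers_sketch;
struct engine_sketch engine_table[MAX_ENGINES_SKETCH];
int nb_engines_sketch;

/* Step 1: make a driver known. Returns 0 on success, -1 on failure. */
int
register_driver_sketch(const struct eng_driver_sketch *drv)
{
	if (drv == NULL || drv->name == NULL ||
	    nb_drivers_sketch >= MAX_DRIVERS_SKETCH)
		return -1;
	driver_table[nb_drivers_sketch++] = drv;
	return 0;
}

/*
 * Step 2: register an engine by name. The name is compared against the
 * registered drivers; on a match the engine is bound to that driver and
 * its index (the "eid") is returned. Returns -1 if nothing matches.
 */
int
register_engine_sketch(const char *name)
{
	int i;

	if (name == NULL || strlen(name) >= NAME_LEN_SKETCH ||
	    nb_engines_sketch >= MAX_ENGINES_SKETCH)
		return -1;

	for (i = 0; i < nb_drivers_sketch; i++) {
		assert(driver_table[i] != NULL);
		if (strcmp(driver_table[i]->name, name) == 0) {
			struct engine_sketch *eng =
				&engine_table[nb_engines_sketch];

			strcpy(eng->name, name); /* length checked above */
			eng->drv = driver_table[i];
			return nb_engines_sketch++;
		}
	}
	return -1;
}
```

In this model an engine registered before its driver is rejected, which is why the driver registration is listed first.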

Zhihong Wang (2):
  vhost: make capabilities configurable
  vhost: support selective datapath

 lib/librte_vhost/Makefile |   4 +-
 lib/librte_vhost/rte_vdpa.h   | 126 ++
 lib/librte_vhost/rte_vhost.h  |  98 
 lib/librte_vhost/socket.c |  77 ++
 lib/librte_vhost/vdpa.c   | 122 
 lib/librte_vhost/vhost.c  |  53 ++
 lib/librte_vhost/vhost.h  |   7 +++
 lib/librte_vhost/vhost_user.c |  96 +++-
 8 files changed, 566 insertions(+), 17 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vdpa.h
 create mode 100644 lib/librte_vhost/vdpa.c

-- 
2.7.5



[dpdk-dev] [PATCH RFC 1/2] vhost: make capabilities configurable

2017-12-22 Thread Zhihong Wang
This patch makes vhost device capabilities configurable to accommodate various
engines. Such capabilities include supported features, protocol features, and
queue number. APIs are introduced to let the app configure these capabilities.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/rte_vhost.h  | 50 
 lib/librte_vhost/socket.c | 77 +++
 lib/librte_vhost/vhost_user.c | 48 ---
 3 files changed, 164 insertions(+), 11 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index f653644..17b4c6d 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -209,6 +209,56 @@ int rte_vhost_driver_unregister(const char *path);
 int rte_vhost_driver_set_features(const char *path, uint64_t features);
 
 /**
+ * Get the protocol feature bits before feature negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param protocol_features
+ *  A pointer to store the queried protocol feature bits
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_get_protocol_features(const char *path,
+   uint64_t *protocol_features);
+
+/**
+ * Set the protocol feature bits the vhost-user driver supports.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param protocol_features
+ *  Supported protocol features
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_set_protocol_features(const char *path,
+   uint64_t protocol_features);
+
+/**
+ * Get the queue number before feature negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param queue_num
+ *  A pointer to store the queried queue number
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_get_queue_num(const char *path, uint16_t *queue_num);
+
+/**
+ * Set the queue number the vhost-user driver supports.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param queue_num
+ *  Supported queue number
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_set_queue_num(const char *path, uint16_t queue_num);
+
+/**
  * Enable vhost-user driver features.
  *
  * Note that
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 422da00..742f772 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -78,7 +78,10 @@ struct vhost_user_socket {
 * features negotiation.
 */
uint64_t supported_features;
+   uint64_t supported_protocol_features;
uint64_t features;
+   uint64_t protocol_features;
+   uint16_t queue_num;
 
struct vhost_device_ops const *notify_ops;
 };
@@ -613,6 +616,75 @@ rte_vhost_driver_get_features(const char *path, uint64_t 
*features)
}
 }
 
+int rte_vhost_driver_set_protocol_features(const char *path,
+   uint64_t protocol_features)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket) {
+   vsocket->supported_protocol_features = protocol_features;
+   vsocket->protocol_features = protocol_features;
+   }
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return vsocket ? 0 : -1;
+}
+
+int
+rte_vhost_driver_get_protocol_features(const char *path,
+   uint64_t *protocol_features)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   *protocol_features = vsocket->protocol_features;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   if (!vsocket) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "socket file %s is not registered yet.\n", path);
+   return -1;
+   } else {
+   return 0;
+   }
+}
+
+int rte_vhost_driver_set_queue_num(const char *path, uint16_t queue_num)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   vsocket->queue_num = queue_num;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   return vsocket ? 0 : -1;
+}
+
+int rte_vhost_driver_get_queue_num(const char *path, uint16_t *queue_num)
+{
+   struct vhost_user_socket *vsocket;
+
+   pthread_mutex_lock(&vhost_user.mutex);
+   vsocket = find_vhost_user_socket(path);
+   if (vsocket)
+   *queue_num = vsocket->queue_num;
+   pthread_mutex_unlock(&vhost_user.mutex);
+
+   if (!vsocket) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "socket file %s is not registered yet.\n", path);
+   return -1;
+   } else {
+   return 0;
+   }
+}
+
 /*
  * Register a new vhost-user socket; here we could act as server
  * (the default case), or client (when RTE_VHOST

[dpdk-dev] [PATCH RFC 2/2] vhost: support selective datapath

2017-12-22 Thread Zhihong Wang
This patch introduces support for selective datapath in DPDK vhost-user lib
to enable acceleration. The default selection is the existing software
implementation, while more options are available when more engines are
present.

Signed-off-by: Zhihong Wang 
---
 lib/librte_vhost/Makefile |   4 +-
 lib/librte_vhost/rte_vdpa.h   | 126 ++
 lib/librte_vhost/rte_vhost.h  |  48 
 lib/librte_vhost/vdpa.c   | 122 
 lib/librte_vhost/vhost.c  |  53 ++
 lib/librte_vhost/vhost.h  |   7 +++
 lib/librte_vhost/vhost_user.c |  48 ++--
 7 files changed, 402 insertions(+), 6 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vdpa.h
 create mode 100644 lib/librte_vhost/vdpa.c

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index be18279..47930ba 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -49,9 +49,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
-   vhost_user.c virtio_net.c
+   vhost_user.c virtio_net.c vdpa.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
new file mode 100644
index 000..4f9eebd
--- /dev/null
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -0,0 +1,126 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VDPA_H_
+#define _RTE_VDPA_H_
+
+#include 
+#include 
+#include "rte_vhost.h"
+
+/**
+ * @file
+ *
+ * Device specific vhost lib
+ */
+
+#define MAX_VDPA_ENGINE_NUM 128
+#define MAX_VDPA_NAME_LEN 128
+
+
+struct rte_vdpa_eng_id {
+   union {
+   uint8_t __dummy[64];
+
+   struct {
+   struct rte_pci_addr pci_addr;
+   };
+   };
+};
+
+struct rte_vdpa_eng_attr {
+   char name[MAX_VDPA_NAME_LEN];
+   struct rte_vdpa_eng_id *id;
+};
+
+/* register/remove engine and return the engine id */
+typedef int (*vdpa_eng_init_t)(int eid, struct rte_vdpa_eng_id *id);
+typedef int (*vdpa_eng_uninit_t)(int eid);
+
+/* driver configure/close the port based on connection */
+typedef int (*vdpa_dev_conf_t)(int vid);
+typedef int (*vdpa_dev_close_t)(int vid);
+
+/* enable/disable this vring */
+typedef int (*vdpa_vring_state_set_t)(int vid, int vring, int state);
+
+/* set features when changed */
+typedef int (*vdpa_feature_set_t)(int vid);
+
+/* destination operations when migration done, e.g. send rarp */
+typedef int (*vdpa_migration_done_t)(int vid);
+
+/* device ops */
+struct rte_vdpa_dev_ops {
+   vdpa_dev_conf_t    dev_conf;
+   vdpa_dev_close_t   dev_close;
+   vdpa_vring_state_set_t vring_state_set;
+   vdpa_feature_set_t feature_set;
+   vdpa_migration_done_t  migration_done;
+};
+
+/* engine ops */
+struct rte_vdpa_eng_ops {
+   vdpa_eng_init_t eng_init;
+   vdpa_eng_uninit_t eng_uninit;
+};
+
+struct rte_vdpa_eng_driver {
+   const char *name;
+   struct rt

[dpdk-dev] [PATCH v2] net: update licence for network headers

2017-12-22 Thread Olivier Matz
To be compliant with the DPDK licensing guidelines, switch to
BSD-3-Clause. It can be done safely since the BSD headers from which
these files derive also exist as a BSD-3-Clause license in FreeBSD.

Link: 
https://raw.githubusercontent.com/freebsd/freebsd/78a6b0861813af31e1354fa407c5701e8764b4d6/sys/netinet/ip_icmp.h
Link: 
https://raw.githubusercontent.com/freebsd/freebsd/78a6b0861813af31e1354fa407c5701e8764b4d6/sys/netinet/ip.h
Link: 
https://raw.githubusercontent.com/freebsd/freebsd/78a6b0861813af31e1354fa407c5701e8764b4d6/sys/netinet/sctp.h
Link: 
https://raw.githubusercontent.com/freebsd/freebsd/78a6b0861813af31e1354fa407c5701e8764b4d6/sys/netinet/tcp.h
Link: 
https://raw.githubusercontent.com/freebsd/freebsd/78a6b0861813af31e1354fa407c5701e8764b4d6/sys/netinet/udp.h
Signed-off-by: Olivier Matz 
---
 lib/librte_net/rte_icmp.h | 69 ---
 lib/librte_net/rte_ip.h   | 74 ---
 lib/librte_net/rte_sctp.h | 72 -
 lib/librte_net/rte_tcp.h  | 72 -
 lib/librte_net/rte_udp.h  | 72 -
 5 files changed, 26 insertions(+), 333 deletions(-)

diff --git a/lib/librte_net/rte_icmp.h b/lib/librte_net/rte_icmp.h
index 8b287f6d0..053b5f6a4 100644
--- a/lib/librte_net/rte_icmp.h
+++ b/lib/librte_net/rte_icmp.h
@@ -1,67 +1,8 @@
-/*   BSD LICENSE
- *
- *   Copyright(c) 2013 6WIND.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of 6WIND S.A. nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-/*
- * Copyright (c) 1982, 1986, 1990, 1993
- *  The Regents of the University of California.  All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- *notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- *notice, this list of conditions and the following disclaimer in the
- *documentation and/or other materials provided with the distribution.
- * 3. All advertising materials mentioning features or use of this software
- *must display the following acknowledgement:
- *  This product includes software developed by the University of
- *  California, Berkeley and its contributors.
- * 4. Neither the name of the University nor the names of its contributors
- *may be used to endorse or promote products derived from this software
- *without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
- * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- *
- *  @(#)in.h  

Re: [dpdk-dev] [PATCH v4 1/8] event/opdl: add the opdl ring infrastructure library

2017-12-22 Thread Sean Harte
On 22 December 2017 at 11:23, Liang Ma  wrote:
>
> OPDL ring is the core infrastructure of OPDL PMD. OPDL ring library
> provides the core data structure and core helper function set. The Ring
> implements a single ring multi-port/stage pipelined packet distribution
> mechanism. This mechanism has the following characteristics:
>
> • No multiple queue cost; therefore, latency is significantly reduced.
> • Fixed dependencies between queues/ports are more suitable for complex,
>   fixed pipelines of stateless packet processing (static pipeline).
> • Has decentralized distribution (no scheduling core).
> • Packets remain in order (no reorder core(s)).
> * Update build system to enable compilation.
>
> Signed-off-by: Liang Ma 
> Signed-off-by: Peter Mccarthy 
> ---
>  config/common_base|6 +
>  drivers/event/Makefile|1 +
>  drivers/event/opdl/Makefile   |   62 +
>  drivers/event/opdl/opdl_log.h |   59 +
>  drivers/event/opdl/opdl_ring.c| 1252 
> +
>  drivers/event/opdl/opdl_ring.h|  628 +++
>  drivers/event/opdl/rte_pmd_evdev_opdl_version.map |3 +
>  mk/rte.app.mk |1 +
>  mk/toolchain/gcc/rte.toolchain-compat.mk  |6 +
>  mk/toolchain/icc/rte.toolchain-compat.mk  |6 +
>  10 files changed, 2024 insertions(+)
>  create mode 100644 drivers/event/opdl/Makefile
>  create mode 100644 drivers/event/opdl/opdl_log.h
>  create mode 100644 drivers/event/opdl/opdl_ring.c
>  create mode 100644 drivers/event/opdl/opdl_ring.h
>  create mode 100644 drivers/event/opdl/rte_pmd_evdev_opdl_version.map

[...]

Reviewed-by: Seán Harte 


Re: [dpdk-dev] [PATCH 1/2] mempool: indicate the usages of multi memzones

2017-12-22 Thread Hemant Agrawal

On 12/22/2017 7:29 PM, Olivier MATZ wrote:

On Wed, Dec 20, 2017 at 05:29:59PM +0530, Hemant Agrawal wrote:

On 12/19/2017 6:38 PM, Hemant Agrawal wrote:



That's true, I commented too fast :)
And what about using mp->nb_mem_chunks instead? Would it do the job
in your use-case?


It should work.  Let me check it out.


There is a slight problem with nb_mem_chunks.

It is getting incremented at the end of "rte_mempool_populate_phys",
while the elements are getting populated before it in the call of
mempool_add_elem.

I can use nb_mem_chunks as a '0' check. However, it can break in the future if
mempool_populate_phys changes.


Sorry, I'm not sure I'm getting what you say.

My question was about using mp->nb_mem_chunks instead of a new flag in the
dpaa driver. Am I missing something?



mp->nb_mem_chunks gets finalized when the mempool is fully created. Its
value is transient before that, i.e. it keeps changing on every call to
rte_mempool_populate_phys.


However, we need this information on the very first element allocation. 
So, nb_mem_chunks will not work.


Re: [dpdk-dev] [PATCH 3/6] test: fix memory leak in ring autotest

2017-12-22 Thread Olivier MATZ
Hi,

On Fri, Dec 22, 2017 at 10:12:07AM +, Anatoly Burakov wrote:
> Fixes: af75078fece3 ("first public release")

Not sure about this commit id: freeing rings is only possible
since commit 4e32101f9b01 ("ring: support freeing").

[...]

> @@ -894,6 +895,8 @@ test_ring(void)
>   /* dump the ring status */
>   rte_ring_list_dump(stdout);
>  
> + rte_ring_free(r);
> +
>   return 0;

I think this is incorrect: r is a static variable, and if it is
not set to NULL, it will be reused at next call.

Ideally, removing the static variable would be better than just
resetting the value to NULL, but it will require more modifications:
add a ring argument to test function, and change return -1 -> goto fail.
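
A minimal sketch of the simpler option (free the ring and reset the static pointer), assuming the test creates the ring only when the static pointer is NULL. The ring type and the create/free helpers are hypothetical stand-ins for rte_ring_create() and rte_ring_free(), so that the example builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_ring. */
struct ring_sketch {
	int uses;
};

int live_rings_sketch; /* rings currently allocated */

struct ring_sketch *
ring_create_sketch(void)
{
	struct ring_sketch *ring = calloc(1, sizeof(*ring));

	if (ring != NULL)
		live_rings_sketch++;
	return ring;
}

void
ring_free_sketch(struct ring_sketch *ring)
{
	if (ring == NULL)
		return;
	free(ring);
	live_rings_sketch--;
}

/* Static pointer that survives between calls of the test function. */
static struct ring_sketch *ring_under_test;

int
test_ring_sketch(void)
{
	if (ring_under_test == NULL) {
		ring_under_test = ring_create_sketch();
		if (ring_under_test == NULL)
			return -1;
	}

	ring_under_test->uses++; /* placeholder for the real test body */
	assert(ring_under_test->uses == 1);

	ring_free_sketch(ring_under_test);
	/* Without this reset, the next call would reuse a freed pointer. */
	ring_under_test = NULL;
	return 0;
}
```

Running the function twice in a row stays safe only because of the reset; passing the ring as an argument, as suggested above, would remove the hidden state altogether.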


Re: [dpdk-dev] [PATCH] Create kern folder for Linux kernel modules

2017-12-22 Thread Hemant Agrawal

On 12/22/2017 8:21 PM, Wiles, Keith wrote:




On Dec 22, 2017, at 8:17 AM, Thomas Monjalon  wrote:

22/12/2017 14:59, Wiles, Keith:



On Dec 22, 2017, at 5:38 AM, Thomas Monjalon  wrote:

22/12/2017 11:04, Hemant Agrawal:

On 12/22/2017 2:13 PM, Thomas Monjalon wrote:

These modules are Linux modules, so they should be in the linuxapp dir.



This is a cleaner separation w.r.t userspace/kernel space code.
*kern* is a better placeholder for LKMs.


I prefer "kernel" name.


The name should be related to Linux in some way, like linux_kern or
linux_kernel or linux_modules (this is the one I prefer); this way it makes it
clear which OS they are designed for.


If such top-level directory is created, the BSD modules must be moved there too.
That's why "kernel/" or "kernel/linux/" is appropriate.


OK seems reasonable, what about kernel/{freebsd,Linux, …}/modules/(module-name 
e.g. kni, igb_uio, nic_uio, …)

Kernel is misleading IMO, but I can live with it as long as we break down the 
different kernel related items. This is why I add modules in the path, as we 
could have other OSes like Windows with items that are not modules or VMs or 
containers…

I can live with kernel/{freebsd, linux, …}/{igb_uio, kni, nic_uio, ..}  but I 
would like to make sure it does not change in the future with adding windows.


Your suggestion seems reasonable.

I am not sure about Windows.
Maybe someone working on DPDK-on-Windows can comment.






Also eal is not getting overloaded.

linuxapp is part of librte_eal.  KNI is not related to EAL, but still
the kni kernel code is added to librte_eal under linuxapp.


Yes it makes sense.

More opinions/votes?


There are also some kernel modules in the bsdapp directory.


We can move them as well.




Regards,
Keith





Re: [dpdk-dev] [PATCH 4/6] test: fix memory leak in ring perf autotest

2017-12-22 Thread Olivier MATZ
On Fri, Dec 22, 2017 at 10:12:08AM +, Anatoly Burakov wrote:
> Fixes: ac3fb3019c52 ("app: rework ring tests")
> Cc: sta...@dpdk.org
> Signed-off-by: Anatoly Burakov 
> ---
>  test/test/test_ring_perf.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
> index 84d2003..b586459 100644
> --- a/test/test/test_ring_perf.c
> +++ b/test/test/test_ring_perf.c
> @@ -420,6 +420,7 @@ test_ring_perf(void)
>   printf("\n### Testing using two NUMA nodes ###\n");
>   run_on_core_pair(&cores, enqueue_bulk, dequeue_bulk);
>   }
> + rte_ring_free(r);
>   return 0;
>  }

Same comment as for the functional ring test: r is static.


[dpdk-dev] [PATCH] app/testpmd: set metering algorithm to the correct value

2017-12-22 Thread Tomasz Duszynski
No matter what option for the traffic metering algorithm was given
on the testpmd command line, value 0 (RTE_MTR_NONE) was set
and passed to the driver.

Fix that by setting traffic metering algorithm to the proper value.
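
The effect can be reproduced with a small stand-alone model. The enum and structure below are hypothetical mirrors of the rte_mtr definitions (only the fact that 0 means "none" is taken from the description above); they exist so the example compiles without DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the metering algorithm enum; 0 means "none". */
enum mtr_alg_sketch {
	MTR_NONE_SKETCH = 0,
	MTR_SRTCM_RFC2697_SKETCH,
	MTR_TRTCM_RFC2698_SKETCH,
	MTR_TRTCM_RFC4115_SKETCH,
};

/* Hypothetical, reduced meter profile (srTCM parameters only). */
struct meter_profile_sketch {
	enum mtr_alg_sketch alg;
	uint64_t cir;
	uint64_t cbs;
	uint64_t ebs;
};

/* Fills an srTCM profile the way the fixed command handler does. */
void
fill_srtcm_profile_sketch(struct meter_profile_sketch *mp, uint64_t cir,
			  uint64_t cbs, uint64_t ebs)
{
	assert(mp != NULL);

	/* After memset(), alg is 0, i.e. "none". */
	memset(mp, 0, sizeof(*mp));
	/* The fix: name the algorithm explicitly instead of leaving 0. */
	mp->alg = MTR_SRTCM_RFC2697_SKETCH;
	mp->cir = cir;
	mp->cbs = cbs;
	mp->ebs = ebs;
}
```

Because the profile is zeroed first, an explicit "alg = 0" was redundant as well as wrong: the driver always saw "no algorithm" regardless of the command used.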

Fixes: 30ffb4e67ee3 ("app/testpmd: add commands traffic metering and policing")
Cc: cristian.dumitre...@intel.com

Signed-off-by: Tomasz Duszynski 
---
 app/test-pmd/cmdline_mtr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline_mtr.c b/app/test-pmd/cmdline_mtr.c
index d8d806d..8dad8f8 100644
--- a/app/test-pmd/cmdline_mtr.c
+++ b/app/test-pmd/cmdline_mtr.c
@@ -171,7 +171,7 @@ static void cmd_add_port_meter_profile_srtcm_parsed(void 
*parsed_result,
 
/* Private shaper profile params */
memset(&mp, 0, sizeof(struct rte_mtr_meter_profile));
-   mp.alg = 0;
+   mp.alg = RTE_MTR_SRTCM_RFC2697;
mp.srtcm_rfc2697.cir = res->cir;
mp.srtcm_rfc2697.cbs = res->cbs;
mp.srtcm_rfc2697.ebs = res->ebs;
@@ -277,7 +277,7 @@ static void cmd_add_port_meter_profile_trtcm_parsed(void 
*parsed_result,
 
/* Private shaper profile params */
memset(&mp, 0, sizeof(struct rte_mtr_meter_profile));
-   mp.alg = 0;
+   mp.alg = RTE_MTR_TRTCM_RFC2698;
mp.trtcm_rfc2698.cir = res->cir;
mp.trtcm_rfc2698.pir = res->pir;
mp.trtcm_rfc2698.cbs = res->cbs;
@@ -389,7 +389,7 @@ static void cmd_add_port_meter_profile_trtcm_rfc4115_parsed(
 
/* Private shaper profile params */
memset(&mp, 0, sizeof(struct rte_mtr_meter_profile));
-   mp.alg = 0;
+   mp.alg = RTE_MTR_TRTCM_RFC4115;
mp.trtcm_rfc4115.cir = res->cir;
mp.trtcm_rfc4115.eir = res->eir;
mp.trtcm_rfc4115.cbs = res->cbs;
-- 
2.7.4



Re: [dpdk-dev] [DPDK 0/5] lib: add Port Representors

2017-12-22 Thread Neil Horman
On Fri, Dec 22, 2017 at 02:41:16PM +, Remy Horton wrote:
> Port Representors provide a logical presentation in DPDK of VF (virtual
> function) ports for the purposes of control and monitoring. Each port
> representor device represents a single VF and is associated with its
> parent physical function (PF) PMD which provides the back-end hooks for
> the representor device ops and defines the control domain to which that
> port belongs. This allows existing DPDK APIs to be used to monitor and
> control the port without the need to create and maintain VF specific APIs.
> 
> [Diagram: a control plane application uses the eth dev api on top of the
> PF0 PMD and the Port Rep 0 and Port Rep 1 PMDs, with Port Rep 0 pointing
> to the PF0 PMD; two data plane applications each use the eth dev api on
> top of a VF0 PMD. Below them, the HW (logical view) contains the PF, VF0,
> VF1, a VEB and Port 0.]
> 

How does this mesh with the notion of port ownership that we've been discussing
in other threads?  In that thread, we've been discussing the need for a single
execution context to have exclusive access to the hardware for the purposes of
configuration and data i/o, and for the application/execution context to be
responsible for co-ordination of any shared use of a device.  In this feature
however, the notion of a Port Representor creates an alias to the same hardware
function (VF), where both aliases (in the control and data plane) have parallel
access to the hardware, in such a way that co-ordination between the two is
largely impossible (unless you want to make the data plane application explicitly
aware of control plane activity).

Neil



[dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms

2017-12-22 Thread Adrien Mazarguil
Virtual machines hosted by Hyper-V/Azure platforms are fitted with
simplified virtual network devices named NetVSC that are used for fast
communication between VM to VM, VM to hypervisor, and the outside.

They appear as standard system netdevices to user-land applications, the
main difference being they are implemented on top of VMBUS [1] instead of
emulated PCI devices.

While this reads like a case for a standard DPDK PMD, there is more to it.

To accelerate outside communication, NetVSC devices as they appear in a VM
can be paired with physical SR-IOV virtual function (VF) devices owned by
that same VM [2]. Both netdevices share the same MAC address in that case.

When paired, egress and most of the ingress traffic flow through the VF
device, while part of it (e.g. multicasts, hypervisor control data) still
flows through NetVSC. Moreover VF devices are not retained and disappear
during VM migration; from a VM standpoint, they can be hot-plugged anytime
with NetVSC acting as a fallback.

Running DPDK applications in such a context involves driving VF devices
using their dedicated PMDs in a vendor-independent fashion (to benefit from
maximum performance without writing dedicated code) while simultaneously
listening to NetVSC and handling the related hot-plug events.

This new virtual PMD (referred to as "vdev_netvsc" from this point on)
automatically coordinates the Hyper-V/Azure-specific management part
described above by relying on vendor-specific, failsafe and tap PMDs to
expose a single consolidated Ethernet device usable directly by existing
applications.

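         .------------------.
         | DPDK application |
         `--------+---------'
                  |
           .------+------.
           | DPDK ethdev |
           `------+------'       Control
                  |                 |
     .------------+------------.    v    .-----------------.
     |       failsafe PMD      +---------+ vdev_netvsc PMD |
     `--+-------------------+--'         `-----------------'
        |                   |
        |          .........|.........
        |          :        |        :
   .----+----.     :   .----+----.   :
   | tap PMD |     :   | any PMD |   :
   `----+----'     :   `----+----'   : <-- Hot-pluggable
        |          :        |        :
 .------+-------.  :  .-----+-----.  :
 | NetVSC-based |  :  | SR-IOV VF |  :
 |   netdevice  |  :  |   device  |  :
 `--------------'  :  `-----------'  :
                   :.................: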

Note this diagram differs from that of the original RFC [3], with
vdev_netvsc no longer acting as a data plane layer.

This initial version of the driver only works in whitelist mode. Users have
to provide the --vdev net_vdev_netvsc EAL option at least once to trigger
it.

Subsequent work will add support for blacklist mode based on automatic
detection of the host environment.

[1] http://dpdk.org/ml/archives/dev/2017-January/054165.html
[2] 
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v
[3] http://dpdk.org/ml/archives/dev/2017-November/082339.html

v2 changes:

- Renamed driver from "hyperv" to "vdev_netvsc". This change covers
  documentation and symbols prefix.
- Driver is now tagged EXPERIMENTAL.
- Replaced ether_addr_from_str() with a basic sscanf() call.
- Removed debugging code (memset() poisoning).
- Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments.
- Removed hyperv_basename().
- Discarded unused variables through __rte_unused.
- Added separate but necessary free() bugfix for failsafe PMD.
- Added file descriptor input support to failsafe PMD.
- Replaced temporary bash execution; failsafe now reads device definitions
  directly through a pipe without an intermediate bash one-liner.
- Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG().
- Added dynamic log type (pmd.vdev_netvsc).
- Modified initialization code to probe devices immediately during startup.
- Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more
  appropriate than "ret >= sizeof(foo) - 1").
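
The last item can be illustrated with a short example. snprintf() returns the length the full string would have had, not counting the terminating NUL, so the output was truncated exactly when the return value is greater than or equal to the buffer size. The helper name and the "/sys/class/net/%s" format below are hypothetical and chosen only for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats a path into buf. Returns 0 when the whole string (including
 * the terminating NUL) fits, -1 on encoding error or truncation.
 */
int
format_iface_path_sketch(char *buf, size_t size, const char *iface)
{
	int ret;

	assert(buf != NULL && iface != NULL);

	ret = snprintf(buf, size, "/sys/class/net/%s", iface);
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

With a 20-byte buffer, "/sys/class/net/eth0" (19 characters plus the NUL) fits exactly and is accepted; a check written as "ret >= sizeof(foo) - 1" would have rejected it even though nothing was truncated.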

Adrien Mazarguil (5):
  net/failsafe: fix invalid free
  net/failsafe: add "fd" parameter
  net/vdev_netvsc: introduce Hyper-V platform driver
  net/vdev_netvsc: implement core functionality
  net/vdev_netvsc: add "force" parameter

 MAINTAINERS |   6 +
 config/common_base  |   5 +
 config/common_linuxapp  |   1 +
 doc/guides/nics/fail_safe.rst   |   9 +
 doc/guides/nics/features/vdev_netvsc.ini|  12 +
 doc/guides/nics/index.rst   |   1 +
 doc/guides/nics/vdev_netvsc.rst | 116 +++
 drivers/net/Makefile|   1 +
 drivers/net/failsafe/failsafe_args.c|  88 ++-
 drivers/net/failsafe/failsafe_private.h |   3 +
 drivers/net/vdev_netvsc/Makefile|  58 ++
 .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map |   4 +
 drive

[dpdk-dev] [PATCH v2 1/5] net/failsafe: fix invalid free

2017-12-22 Thread Adrien Mazarguil
rte_free() is not supposed to work with pointers returned by calloc().

Fixes: a0194d828100 ("net/failsafe: add flexible device definition")
Cc: sta...@dpdk.org
Cc: Gaetan Rivet 

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/failsafe/failsafe_args.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/failsafe/failsafe_args.c 
b/drivers/net/failsafe/failsafe_args.c
index cfc83e365..ec63ac972 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -407,7 +407,7 @@ failsafe_args_free(struct rte_eth_dev *dev)
uint8_t i;
 
FOREACH_SUBDEV(sdev, i, dev) {
-   rte_free(sdev->cmdline);
+   free(sdev->cmdline);
sdev->cmdline = NULL;
free(sdev->devargs.args);
sdev->devargs.args = NULL;
-- 
2.11.0


[dpdk-dev] [PATCH v2 3/5] net/vdev_netvsc: introduce Hyper-V platform driver

2017-12-22 Thread Adrien Mazarguil
This patch lays the groundwork for this driver (draft documentation,
copyright notices, code base skeleton and build system hooks). While it can
be successfully compiled and invoked, it's an empty shell at this stage.

Signed-off-by: Adrien Mazarguil 
---
 MAINTAINERS |   6 +
 config/common_base  |   5 +
 config/common_linuxapp  |   1 +
 doc/guides/nics/features/vdev_netvsc.ini|  12 ++
 doc/guides/nics/index.rst   |   1 +
 doc/guides/nics/vdev_netvsc.rst |  46 +++
 drivers/net/Makefile|   1 +
 drivers/net/vdev_netvsc/Makefile|  54 
 .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map |   4 +
 drivers/net/vdev_netvsc/vdev_netvsc.c   | 132 +++
 mk/rte.app.mk   |   1 +
 11 files changed, 263 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a63b40c2..2b61c93aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -451,6 +451,12 @@ F: drivers/net/mrvl/
 F: doc/guides/nics/mrvl.rst
 F: doc/guides/nics/features/mrvl.ini
 
+Microsoft vdev-netvsc - EXPERIMENTAL
+M: Adrien Mazarguil 
+F: drivers/net/vdev-netvsc/
+F: doc/guides/nics/vdev-netvsc.rst
+F: doc/guides/nics/features/vdev-netvsc.ini
+
 Netcope szedata2
 M: Matej Vido 
 F: drivers/net/szedata2/
diff --git a/config/common_base b/config/common_base
index b8ee8f91c..ef904dfd5 100644
--- a/config/common_base
+++ b/config/common_base
@@ -280,6 +280,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG=n
 CONFIG_RTE_LIBRTE_MRVL_PMD=n
 
 #
+# Compile virtual device driver for NetVSC on Hyper-V/Azure
+#
+CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
+
+#
 # Compile burst-oriented Broadcom BNXT PMD driver
 #
 CONFIG_RTE_LIBRTE_BNXT_PMD=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74c7d64ec..e04326224 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
+CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y
 CONFIG_RTE_LIBRTE_NFP_PMD=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
diff --git a/doc/guides/nics/features/vdev_netvsc.ini 
b/doc/guides/nics/features/vdev_netvsc.ini
new file mode 100644
index 0..cfc5cb93e
--- /dev/null
+++ b/doc/guides/nics/features/vdev_netvsc.ini
@@ -0,0 +1,12 @@
+;
+; Supported features of the 'vdev_netvsc' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+ARMv7= Y
+ARMv8= Y
+Power8   = Y
+x86-32   = Y
+x86-64   = Y
+Usage doc= Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 23babe933..566604671 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -64,6 +64,7 @@ Network Interface Controller Drivers
 szedata2
 tap
 thunderx
+vdev_netvsc
 virtio
 vhost
 vmxnet3
diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst
new file mode 100644
index 0..be31b6597
--- /dev/null
+++ b/doc/guides/nics/vdev_netvsc.rst
@@ -0,0 +1,46 @@
+..  BSD LICENSE
+Copyright 2017 6WIND S.A.
+Copyright 2017 Mellanox
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of 6WIND S.A. nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+VDEV_NETVSC poll mode driver
+============================
+
+The VDEV_NETVSC PMD (librte_pmd_vdev_netvsc) provi

[dpdk-dev] [PATCH v2 2/5] net/failsafe: add "fd" parameter

2017-12-22 Thread Adrien Mazarguil
This parameter enables applications to provide device definitions through
an arbitrary file descriptor number.

Signed-off-by: Adrien Mazarguil 
Cc: Gaetan Rivet 
---
 doc/guides/nics/fail_safe.rst   |  9 +++
 drivers/net/failsafe/failsafe_args.c| 86 +++-
 drivers/net/failsafe/failsafe_private.h |  3 +
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index c4e3d2e8d..5b1b47e56 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -106,6 +106,15 @@ Fail-safe command line parameters
   All commas within the ``shell command`` are replaced by spaces before
   executing the command. This helps using scripts to specify devices.
 
+- **fd()** parameter
+
+  This parameter reads a device definition from an arbitrary file descriptor
+  number in  format as described above.
+
+  The file descriptor is read in non-blocking mode and is never closed in
+  order to take only the last line into account (unlike ``exec()``) at every
+  probe attempt.
+
 - **mac** parameter [MAC address]
 
   This parameter allows the user to set a default MAC address to the fail-safe
diff --git a/drivers/net/failsafe/failsafe_args.c 
b/drivers/net/failsafe/failsafe_args.c
index ec63ac972..7a8605174 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -31,7 +31,11 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 
 
 #include 
@@ -161,6 +165,73 @@ fs_execute_cmd(struct sub_device *sdev, char *cmdline)
 }
 
 static int
+fs_read_fd(struct sub_device *sdev, char *fd_str)
+{
+   FILE *fp = NULL;
+   int fd = -1;
+   /* store possible newline as well */
+   char output[DEVARGS_MAXLEN + 1];
+   int err = -ENODEV;
+   int ret;
+
+   RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL);
+   if (sdev->fd_str == NULL) {
+   sdev->fd_str = strdup(fd_str);
+   if (sdev->fd_str == NULL) {
+   ERROR("Command line allocation failed");
+   return -ENOMEM;
+   }
+   }
+   errno = 0;
+   fd = strtol(fd_str, &fd_str, 0);
+   if (errno || *fd_str || fd < 0) {
+   ERROR("Parsing FD number failed");
+   goto error;
+   }
+   /* Fiddle with copy of file descriptor */
+   fd = dup(fd);
+   if (fd == -1)
+   goto error;
+   ret = fcntl(fd, F_GETFL);
+   if (ret == -1)
+   goto error;
+   ret = fcntl(fd, F_SETFL, fd | O_NONBLOCK);
+   if (ret == -1)
+   goto error;
+   fp = fdopen(fd, "r");
+   if (!fp)
+   goto error;
+   fd = -1;
+   /* Only take the last line into account */
+   ret = 0;
+   while (fgets(output, sizeof(output), fp))
+   ++ret;
+   if (feof(fp)) {
+   if (!ret)
+   goto error;
+   } else if (ferror(fp)) {
+   if (errno != EAGAIN || !ret)
+   goto error;
+   } else if (!ret) {
+   goto error;
+   }
+   /* Line must end with a newline character */
+   fs_sanitize_cmdline(output);
+   if (output[0] == '\0')
+   goto error;
+   ret = fs_parse_device(sdev, output);
+   if (ret)
+   ERROR("Parsing device '%s' failed", output);
+   err = ret;
+error:
+   if (fp)
+   fclose(fp);
+   if (fd != -1)
+   close(fd);
+   return err;
+}
+
+static int
 fs_parse_device_param(struct rte_eth_dev *dev, const char *param,
uint8_t head)
 {
@@ -202,6 +273,14 @@ fs_parse_device_param(struct rte_eth_dev *dev, const char 
*param,
}
if (ret)
goto free_args;
+   } else if (strncmp(param, "fd", 2) == 0) {
+   ret = fs_read_fd(sdev, args);
+   if (ret == -ENODEV) {
+   DEBUG("Reading device info from FD failed");
+   ret = 0;
+   }
+   if (ret)
+   goto free_args;
} else {
ERROR("Unrecognized device type: %.*s", (int)b, param);
return -EINVAL;
@@ -409,6 +488,8 @@ failsafe_args_free(struct rte_eth_dev *dev)
FOREACH_SUBDEV(sdev, i, dev) {
free(sdev->cmdline);
sdev->cmdline = NULL;
+   free(sdev->fd_str);
+   sdev->fd_str = NULL;
free(sdev->devargs.args);
sdev->devargs.args = NULL;
}
@@ -424,7 +505,8 @@ fs_count_device(struct rte_eth_dev *dev, const char *param,
param[b] != '\0')
b++;
if (strncmp(param, "dev", b) != 0 &&
-   strncmp(param, "exec", b) != 0) {
+   strncmp(param, "exec", b) != 

[dpdk-dev] [PATCH v2 4/5] net/vdev_netvsc: implement core functionality

2017-12-22 Thread Adrien Mazarguil
As described in more detail in the attached documentation (see patch
contents), this virtual device driver manages NetVSC interfaces in virtual
machines hosted by Hyper-V/Azure platforms.

This driver does not manage traffic nor Ethernet devices directly; it acts
as a thin configuration layer that automatically instantiates and controls
fail-safe PMD instances combining tap and PCI sub-devices, so that each
NetVSC interface is exposed as a single consolidated port to DPDK
applications.

PCI sub-devices being hot-pluggable (e.g. during VM migration),
applications automatically benefit from increased throughput when present
and automatic fallback on NetVSC otherwise without interruption thanks to
fail-safe's hot-plug handling.

Once initialized, the sole job of the vdev_netvsc driver is to regularly
scan for PCI devices to associate with NetVSC interfaces and feed their
addresses to corresponding fail-safe instances.

Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/vdev_netvsc.rst   |  65 
 drivers/net/vdev_netvsc/Makefile  |   4 +
 drivers/net/vdev_netvsc/vdev_netvsc.c | 581 -
 3 files changed, 649 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst
index be31b6597..73a63e552 100644
--- a/doc/guides/nics/vdev_netvsc.rst
+++ b/doc/guides/nics/vdev_netvsc.rst
@@ -38,9 +38,74 @@ platforms.
 
 .. _Hyper-V: 
https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v
 
+Implementation details
+----------------------
+
+Each instance of this driver effectively needs to drive two devices: the
+NetVSC interface proper and its SR-IOV VF (referred to as "physical" from
+this point on) counterpart sharing the same MAC address.
+
+Physical devices are part of the host system and cannot be maintained during
+VM migration. From a VM standpoint they appear as hot-plug devices that come
+and go without prior notice.
+
+When the physical device is present, egress and most of the ingress traffic
+flows through it; only multicasts and other hypervisor control still flow
+through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic.
+
+To avoid unnecessary code duplication and ensure maximum performance,
+handling of physical devices is left to their original PMDs; this virtual
+device driver (also known as *vdev*) manages other PMDs as summarized by the
+following block diagram::
+
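+             .------------------.
+             | DPDK application |
+             `--------+---------'
+                      |
+               .------+------.
+               | DPDK ethdev |
+               `------+------'       Control
+                      |                 |
+         .------------+------------.    v    .-----------------.
+         |       failsafe PMD      +---------+ vdev_netvsc PMD |
+         `--+-------------------+--'         `-----------------'
+            |                   |
+            |          .........|.........
+            |          :        |        :
+       .----+----.     :   .----+----.   :
+       | tap PMD |     :   | any PMD |   :
+       `----+----'     :   `----+----'   : <-- Hot-pluggable
+            |          :        |        :
+     .------+-------.  :  .-----+-----.  :
+     | NetVSC-based |  :  | SR-IOV VF |  :
+     |   netdevice  |  :  |   device  |  :
+     `--------------'  :  `-----------'  :
+                       :.................: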
+
 Build options
 -------------
 
 - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``)
 
Toggle compilation of this driver.
+
+Run-time parameters
+-------------------
+
+To invoke this PMD, applications have to explicitly provide the
+``--vdev=net_vdev_netvsc`` EAL option.
+
+The following device parameters are supported:
+
+- ``iface`` [string]
+
+  Provide a specific NetVSC interface (netdevice) name to attach this PMD
+  to. Can be provided multiple times for additional instances.
+
+- ``mac`` [string]
+
+  Same as ``iface`` except a suitable NetVSC interface is located using its
+  MAC address.
+
+Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to
+all NetVSC interfaces found on the system.
diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile
index e53050fe1..3b3fe1c56 100644
--- a/drivers/net/vdev_netvsc/Makefile
+++ b/drivers/net/vdev_netvsc/Makefile
@@ -40,6 +40,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map
 CFLAGS += -O3
 CFLAGS += -g
 CFLAGS += -std=c11 -pedantic -Wall -Wextra
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += -D_BSD_SOURCE
+CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += $(WERROR_FLAGS)
 
 # Dependencies.
@@ -47,6 +50,7 @@ LDLIBS += -lrte_bus_vdev
 LDLIBS += -lrte_eal
 LDLIBS += -lrte_ethdev
 LDLIBS += -lrte_kvargs
+LDLIBS += -lrte_net
 
 # Source files.
 SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c 
b/drivers/net/vdev_netvsc/vdev_netvsc.c
index 3b73482da..738196e75 100644
--- a/drivers/net/vdev_netvsc/vdev_netvsc.c
+++ b/drivers/net/vdev_netvsc/vdev_netvsc.c
@@ -31,1

[dpdk-dev] [PATCH v2 5/5] net/vdev_netvsc: add "force" parameter

2017-12-22 Thread Adrien Mazarguil
This parameter allows specifying any non-NetVSC interface to use with tap
sub-devices for development purposes.

Signed-off-by: Adrien Mazarguil 
---
 doc/guides/nics/vdev_netvsc.rst   |  5 +
 drivers/net/vdev_netvsc/vdev_netvsc.c | 27 +++
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst
index 73a63e552..a0417b5ef 100644
--- a/doc/guides/nics/vdev_netvsc.rst
+++ b/doc/guides/nics/vdev_netvsc.rst
@@ -107,5 +107,10 @@ The following device parameters are supported:
   Same as ``iface`` except a suitable NetVSC interface is located using its
   MAC address.
 
+- ``force`` [int]
+
+  If nonzero, forces the use of specified interfaces even if not detected as
+  NetVSC.
+
 Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to
 all NetVSC interfaces found on the system.
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c 
b/drivers/net/vdev_netvsc/vdev_netvsc.c
index 738196e75..5e426adc0 100644
--- a/drivers/net/vdev_netvsc/vdev_netvsc.c
+++ b/drivers/net/vdev_netvsc/vdev_netvsc.c
@@ -63,6 +63,7 @@
 #define VDEV_NETVSC_DRIVER net_vdev_netvsc
 #define VDEV_NETVSC_ARG_IFACE "iface"
 #define VDEV_NETVSC_ARG_MAC "mac"
+#define VDEV_NETVSC_ARG_FORCE "force"
 #define VDEV_NETVSC_PROBE_MS 1000
 
 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}"
@@ -405,6 +406,9 @@ vdev_netvsc_alarm(__rte_unused void *arg)
  *   - struct rte_kvargs *kvargs:
  * Device arguments provided to current driver instance.
  *
+ *   - int force:
+ * Accept specified interface even if not detected as NetVSC.
+ *
  *   - unsigned int specified:
  * Number of specific netdevices provided as device arguments.
  *
@@ -422,6 +426,7 @@ vdev_netvsc_netvsc_probe(const struct if_nameindex *iface,
 {
const char *name = va_arg(ap, const char *);
struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *);
+   int force = va_arg(ap, int);
unsigned int specified = va_arg(ap, unsigned int);
unsigned int *matched = va_arg(ap, unsigned int *);
unsigned int i;
@@ -480,10 +485,11 @@ vdev_netvsc_netvsc_probe(const struct if_nameindex *iface,
if (!specified)
return 0;
PMD_DRV_LOG(WARNING,
-   "interface \"%s\" (index %u) is not NetVSC,"
-   " skipping",
-   iface->if_name, iface->if_index);
-   return 0;
+   "interface \"%s\" (index %u) is not NetVSC, %s",
+   iface->if_name, iface->if_index,
+   force ? "using anyway (forced)" : "skipping");
+   if (!force)
+   return 0;
}
/* Create interface context. */
ctx = calloc(1, sizeof(*ctx));
@@ -610,6 +616,7 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev)
static const char *const vdev_netvsc_arg[] = {
VDEV_NETVSC_ARG_IFACE,
VDEV_NETVSC_ARG_MAC,
+   VDEV_NETVSC_ARG_FORCE,
NULL,
};
const char *name = rte_vdev_device_name(dev);
@@ -618,6 +625,7 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev)
 vdev_netvsc_arg);
unsigned int specified = 0;
unsigned int matched = 0;
+   int force = 0;
unsigned int i;
int ret;
 
@@ -631,14 +639,16 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev)
for (i = 0; i != kvargs->count; ++i) {
const struct rte_kvargs_pair *pair = &kvargs->pairs[i];
 
-   if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) ||
-   !strcmp(pair->key, VDEV_NETVSC_ARG_MAC))
+   if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE))
+   force = !!atoi(pair->value);
+   else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) ||
+!strcmp(pair->key, VDEV_NETVSC_ARG_MAC))
++specified;
}
rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL);
/* Gather interfaces. */
ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs,
-   specified, &matched);
+   force, specified, &matched);
if (ret < 0)
goto error;
if (matched < specified)
@@ -697,7 +707,8 @@ RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev);
 RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc);
 RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc,
  VDEV_NETVSC_ARG_IFACE "= "
- VDEV_NETVSC_ARG_MAC "=");
+ VDEV_NETVSC_ARG_MAC "= "
+ VDEV_NETVSC_ARG_FORCE "=");
 
 /** Initialize driver log type. */
 static void
-- 
2.11.0


Re: [dpdk-dev] [PATCH] member: fix memory leak on error

2017-12-22 Thread Wang, Yipeng1
>-----Original Message-----
>From: Burakov, Anatoly
>Yep, i can see that now. Didn't think to look inside rte_member_free()
>:/ However, you're creating a race condition there - you're unlocking a
>tailq, and then locking (and unlocking) it again inside
>rte_member_free() - it probably needs _thread_unsafe() functions that
>you can call from behind the lock.
>
>--

Thank you Anatoly,

I realize that rte_member_free does not do anything good here. As a fix, I 
think the following should work. Is there any other concern?

diff --git a/lib/librte_member/rte_member.c b/lib/librte_member/rte_member.c
index cc9ea84..25934e8 100644
--- a/lib/librte_member/rte_member.c
+++ b/lib/librte_member/rte_member.c
@@ -192,7 +192,8 @@ rte_member_create(const struct rte_member_parameters 
*params)

 error_unlock_exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-   rte_member_free(setsum);
+   rte_free(te);
+   rte_free(setsum);
return NULL;
 }

Thank you!
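
For reference, the locked/unlocked split that Anatoly describes can be sketched as below. This is only an illustration with a plain pthread mutex and a counter standing in for the tailq and its rwlock; entry_create(), entry_free() and entry_free_unlocked() are hypothetical names, not librte_member functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical registry guarded by a single lock (stand-in for the tailq). */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int registry_count;

struct entry {
	int registered;
};

/* Caller must already hold registry_lock; this never takes it. */
static void entry_free_unlocked(struct entry *e)
{
	if (e == NULL)
		return;
	if (e->registered) {
		assert(registry_count > 0);
		registry_count--;
	}
	free(e);
}

/* Public variant: takes the lock, then defers to the unlocked helper. */
void entry_free(struct entry *e)
{
	pthread_mutex_lock(&registry_lock);
	entry_free_unlocked(e);
	pthread_mutex_unlock(&registry_lock);
}

/*
 * On failure, cleanup happens while the lock is still held, so there is
 * no window between unlocking and re-locking.
 */
struct entry *entry_create(int simulate_failure)
{
	struct entry *e;

	pthread_mutex_lock(&registry_lock);
	e = calloc(1, sizeof(*e));
	if (e == NULL)
		goto error_unlock;
	e->registered = 1;
	registry_count++;
	if (simulate_failure) {
		entry_free_unlocked(e);
		goto error_unlock;
	}
	pthread_mutex_unlock(&registry_lock);
	return e;

error_unlock:
	pthread_mutex_unlock(&registry_lock);
	return NULL;
}

/* Number of registered entries, read under the lock. */
int registry_size(void)
{
	int n;

	pthread_mutex_lock(&registry_lock);
	n = registry_count;
	pthread_mutex_unlock(&registry_lock);
	return n;
}
```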


Re: [dpdk-dev] [PATCH 1/3] kni: support for MAC addr change

2017-12-22 Thread Ferruh Yigit
On 11/30/2017 3:46 AM, Hemant Agrawal wrote:
> This patch adds following:
> 1. Option to configure the mac address during create. Generate random
>    address only if the user has not provided any valid address.
> 2. Inform userspace, if mac address is being changed in linux.
> 3. Implement default handling of mac address change in the corresponding
>    ethernet device.
>
> Signed-off-by: Hemant Agrawal 

Overall lgtm, there are a few issues commented below.

Thanks,
ferruh

<...>

> @@ -269,11 +275,15 @@ The code for allocating the kernel NIC interfaces for a 
> specific port is as foll
>  conf.addr = dev_info.pci_dev->addr;
>  conf.id = dev_info.pci_dev->id;
>  
> +/* Get the interface default mac address */
> +rte_eth_macaddr_get(port_id, struct ether_addr 
> *)&conf.mac_addr);

A parenthesis is missing. Good to fix, even though this is documentation :)

<...>

> @@ -587,3 +603,26 @@ Currently, setting a new MTU and configuring the network 
> interface (up/ down) ar
>  RTE_LOG(ERR, APP, "Failed to start port %d\n", port_id);
>  return ret;
>  }
> +
> +/* Callback for request of configuring device mac address */
> +
> +static int
> +kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[])
> +{
> +int ret = 0;
> +
> +if (port_id >= rte_eth_dev_count() || port_id >= RTE_MAX_ETHPORTS) {
> +RTE_LOG(ERR, KNI, "Invalid port id %d\n", port_id);
> +return -EINVAL;
> +}
> +
> +RTE_LOG(INFO, KNI, "Configure mac address of %d", port_id);
> +/* Configure network interface mac address */
> +ret = rte_eth_dev_default_mac_addr_set(port_id,
> +   (struct ether_addr 
> *)mac_addr);
> +if (ret < 0)
> +RTE_LOG(ERR, KNI, "Failed to config mac_addr for port %d\n",
> +port_id);
> +
> +return ret;
> +}


It is hard to maintain code in documentation. I am aware that other related code
is already in the document, but what do you think about keeping this minimal, like:

static int
kni_config_mac_address(uint16_t port_id, uint8_t mac_addr[])
{
   
}


<...>

> @@ -559,6 +583,14 @@ rte_kni_handle_request(struct rte_kni *kni)
>   req->result = kni->ops.config_network_if(\
>   kni->ops.port_id, req->if_up);
>   break;
> + case RTE_KNI_REQ_CHANGE_MAC_ADDR: /* Change MAC Address */
> + if (kni->ops.config_mac_address)
> + req->result = kni->ops.config_mac_address(
> + kni->ops.port_id, req->mac_addr);
> + else
> + req->result = kni_config_mac_address(
> + kni->ops.port_id, req->mac_addr);

ops.port_id can be unset if there is no physical device backing the kni
interface. I guess in that case port_id will be 0, and this will corrupt another
interface's data. We need to find a way to handle the case where port_id is not set.

Since the kni sample always creates a KNI interface backed by a physical device,
this is not an issue for the kni sample app, but please think about the kni pmd case.
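
One possible way to handle the unset case, sketched under stated assumptions: reserve a sentinel value for "no backing port" and only fall back to the default handler when a real port is set. PORT_ID_UNSET, struct ops_sketch and handle_mac_change() are hypothetical names for illustration, not existing KNI definitions, and the -ENODEV return value is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no ethdev port backs this interface". */
#define PORT_ID_UNSET UINT16_MAX

/* Simplified stand-in for the ops structure discussed above. */
struct ops_sketch {
	uint16_t port_id;
	int (*config_mac_address)(uint16_t port_id, uint8_t mac_addr[]);
};

/* Stand-in for the library's default handler; always succeeds here. */
static int default_mac_handler(uint16_t port_id, uint8_t mac_addr[])
{
	(void)port_id;
	(void)mac_addr;
	return 0;
}

/*
 * Prefer the application callback. Use the default handler only when a
 * backing port is known, so that port 0 is never touched by accident.
 */
int handle_mac_change(const struct ops_sketch *ops, uint8_t mac_addr[])
{
	assert(ops != NULL);
	if (ops->config_mac_address != NULL)
		return ops->config_mac_address(ops->port_id, mac_addr);
	if (ops->port_id == PORT_ID_UNSET)
		return -ENODEV;
	return default_mac_handler(ops->port_id, mac_addr);
}
```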

<...>

> @@ -87,6 +91,7 @@ struct rte_kni_conf {
>   unsigned mbuf_size; /* mbuf size */
>   struct rte_pci_addr addr;
>   struct rte_pci_id id;
> + char mac_addr[ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
>  
>   __extension__
>   uint8_t force_bind : 1; /* Flag to bind kernel thread */

"struct rte_kni_conf" is a public struct. Adding a variable into the middle of
the struct will break the ABI.
But I think it is OK to add it to the end, as long as the struct is not used in an array.
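
To make the ABI point concrete, here is a small sketch using simplified stand-in structs (not the real rte_kni_conf; the fields and the 6-byte MAC length are assumptions for illustration). It compares the offset of an existing field when a new member is inserted in the middle versus appended at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" layout, standing in for a public config struct. */
struct conf_v1 {
	unsigned int mbuf_size;
	uint32_t id;
	uint8_t flags;
};

/* New member inserted in the middle: 'flags' moves to a new offset. */
struct conf_mid {
	unsigned int mbuf_size;
	uint32_t id;
	char mac_addr[6];
	uint8_t flags;
};

/* New member appended: offsets of existing members are preserved. */
struct conf_end {
	unsigned int mbuf_size;
	uint32_t id;
	uint8_t flags;
	char mac_addr[6];
};

/* Returns 1 if 'flags' keeps its old offset after a mid-struct insert. */
int flags_offset_kept_mid(void)
{
	return offsetof(struct conf_mid, flags) ==
	       offsetof(struct conf_v1, flags);
}

/* Returns 1 if 'flags' keeps its old offset after appending. */
int flags_offset_kept_end(void)
{
	return offsetof(struct conf_end, flags) ==
	       offsetof(struct conf_v1, flags);
}

/* Returns 1 if appending still grows the struct (matters for arrays). */
int size_grew_end(void)
{
	return sizeof(struct conf_end) > sizeof(struct conf_v1);
}
```

Old binaries reading 'flags' at the old offset keep working only in the appended layout. The size still grows in both cases, which is why arrays of the struct remain a problem.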

<...>


Re: [dpdk-dev] [PATCH 2/3] kni: add support for promisc mode set

2017-12-22 Thread Ferruh Yigit
On 11/30/2017 3:46 AM, Hemant Agrawal wrote:
> Inform userspace app about promisc mode change

Same two concerns here as with the previous patch:
- Breaking the ABI.
- Handling the case where ops.port_id is not set by the application.

> 
> Signed-off-by: Hemant Agrawal 

<...>



Re: [dpdk-dev] [PATCH 3/3] kni: set initial value for MTU

2017-12-22 Thread Ferruh Yigit
On 11/30/2017 3:46 AM, Hemant Agrawal wrote:
> Configure the initial application-provided mtu on the KNI interface.
> 
> Signed-off-by: Hemant Agrawal 

<...>

> @@ -95,6 +95,7 @@ struct rte_kni_conf {
>   struct rte_pci_addr addr;
>   struct rte_pci_id id;
>   char mac_addr[ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
> + uint16_t mtu;

Same issue here: this adds a new field into the middle of a public struct.
I think it would be OK to add it to the end, but to be sure, would you please run
the ABI check script (validate-abi.sh) after adding it to the end?

Thanks,
ferruh

>  
>   __extension__
>   uint8_t force_bind : 1; /* Flag to bind kernel thread */
> 



Re: [dpdk-dev] [RFC v2 1/5] ether: add flow action to redirect packet in a switch domain

2017-12-22 Thread Alex Rosenbaum
+Adrien

On Fri, Dec 22, 2017 at 10:20 AM, Zhang, Qi Z  wrote:
>> On Thu, Dec 21, 2017 at 4:35 AM, Qi Zhang  wrote:
>> > Add action RTE_FLOW_ACTION_TYPE_SWITCH_PORT, it can be used to
>> > redirect

>> A verb would be better suited for an ACTION_TYPE, while ".._TYPE_PORT" is
>> a noun.
>> Probably ".._TYPE_REDIRECT" would better fit here.
>
> I agree it will be better to use verbs for action, so we can have 
> TYPE_REDIRECT_TO_PORT/VF/PF...,
> But since we already have ACTION_TYPE_VF, ACTION_TYPE_PF ...
> Maybe it's better just to follow the same pattern?

hemmm, missed these ACTION_TYPE_VF/PF...
Adrien, what do you think about these naming conventions?


Re: [dpdk-dev] [RFC v2 3/5] ether: Add flow timeout support

2017-12-22 Thread Alex Rosenbaum
On Fri, Dec 22, 2017 at 11:03 AM, Zhang, Qi Z  wrote:
>> On Thu, Dec 21, 2017 at 4:35 AM, Qi Zhang  wrote:
>> > Add new APIs to support flow timeout, application is able to 1. Setup
>> > the time duration of a flow, the flow is expected to be deleted
>> > automatically when timeout.
>>
>> Can you explain how the application (OVS) is expected to use this API?
>> It will help to better understand the motivation here...
>
> I think the purpose of the APIs is to expose the hardware feature that supports
> automatic flow deletion after a timeout.
> As far as I know, in OVS every flow in the flow table has a time duration.
> A flow offloaded to hardware is still required to be deleted at a specific
> time.
> I think these APIs help OVS take advantage of the HW feature and simplify flow
> aging management.

Are you sure this will allow OVS to 'fire-and-forget' the rule removal?
Or will OVS do rule cleanup from application tables anyway?

Do you know if OVS flow timers are (or can be) re-armed in different
use cases? e.g. extending the timeout duration if traffic is still
flowing?



>> Are you trying to move the aging timer from application code into the PMD?
>> or can your HW remove/disable/inactivate a flow at certain time semantics
>> without software context?
>
> Yes, it is for a hardware feature.

So if the hardware auto-removes the hardware steering entry, what
software part deletes the rte_flow handle?
What software part triggers the application callback? From what
context? Will locks be required?

How do you prevent races between the application thread and the context
deleting/accessing the rte_flow handle?
I mean cases where the application wants to delete the flow before the
timeout expires, but the hardware deletes it at the same time.
Alex
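
As an illustration of the race in question (not something proposed in this thread), one common pattern is to let both contexts compete for ownership of the release through an atomic compare-and-swap, so that exactly one of them frees the handle. struct flow_sketch and its functions are hypothetical names; a real rte_flow implementation may need something quite different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum flow_state {
	FLOW_ACTIVE,
	FLOW_RELEASED
};

/* Hypothetical flow handle carrying only the state needed for the sketch. */
struct flow_sketch {
	atomic_int state;
};

void flow_init(struct flow_sketch *f)
{
	assert(f != NULL);
	atomic_init(&f->state, FLOW_ACTIVE);
}

/*
 * Called by both the application delete path and the timeout path.
 * Returns 1 for the single caller that wins and must free the handle,
 * 0 for any caller that lost and must not touch it further.
 */
int flow_try_release(struct flow_sketch *f)
{
	int expected = FLOW_ACTIVE;

	assert(f != NULL);
	return atomic_compare_exchange_strong(&f->state, &expected,
					      FLOW_RELEASED);
}
```

The loser still needs a guarantee that the handle memory stays valid while it checks the state, which is where locks or deferred freeing would come in.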

