[dpdk-dev] [dpdk-announce] DPDK hands on lab March 10th, Bangalore India

2018-01-20 Thread Tibrewala, Sujata
DPDK hands on lab March 10th, Bangalore India [1]
Please apply at eventbrite link [2] 

In this hands-on lab you will learn what is new in DPDK and how it is used in
other projects such as FD.io, Kata Containers and Mobile Edge Computing. You
will log in to machines with the latest Intel Xeon processors and follow the
step-by-step instructions given by the instructors. The labs are designed to
give you a head start in the latest virtualized packet processing technology.
Please apply for the lab; we will review your application and let you know
whether we can accommodate you.

For more details on the lab, the updated agenda, and local DPDK and SDN/NFV
events, please join the Bangalore meetup group [3]. Please note that joining the
meetup does not guarantee your place in the lab; you still need to apply at [2].

[1] https://www.theleela.com/en_us/hotels-in-bengaluru/the-leela-palace-hotel-bengaluru/
[2] http://bit.ly/2mU41YZ
[3] https://www.meetup.com/Out-of-the-Box-Network-Developers-Bangalore

Sujata Tibrewala @sujatatibre
Community Development Manager 
Intel Developer Zone
https://software.intel.com/networking
NPG Marketing Training PM (DOT)




Re: [dpdk-dev] Compilation errors in drivers/event/opdl/

2018-01-20 Thread Thomas Monjalon
20/01/2018 06:18, Patil, Harish:
> Hi,
> 
> I am seeing below compilation errors in drivers/event/opdl/, this is with
> cloned latest DPDK (git clone http://dpdk.org/git/dpdk).
> 
> ..
> ..
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c: In function ‘opdl_xstats_get_names’:
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c:89:2: error: ‘for’ loop initial declarations are only allowed in
> C99 mode
>   for (uint32_t j = 0; j < max_num_port_xstat; j++) {
>   ^

My compiler does not raise this error.
What is your compiler?

Could someone fix it QUICKLY please? Today?

Harish, do you think we should revert if not fixed?
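
Note for readers: this diagnostic is what GCC prints when it compiles in a pre-C99 dialect, which was the default (gnu89/gnu90) before GCC 5. A minimal sketch of the portable form follows, with the counter declared before the loop. The helper `count_below` is hypothetical and is not code from the opdl driver.

```c
#include <stdint.h>

/* Hypothetical helper: counts the values below a limit.
 * The loop counter is declared at the top of the block, so the
 * function compiles in C89/gnu89 mode as well as in C99 and later. */
static uint32_t
count_below(const uint32_t *vals, uint32_t n, uint32_t limit)
{
	uint32_t i;        /* declared outside the for statement */
	uint32_t hits = 0;

	for (i = 0; i < n; i++) {
		if (vals[i] < limit)
			hits++;
	}
	return hits;
}
```

The alternative is to build with -std=c99 or -std=gnu99, as the compiler note quoted elsewhere in this thread suggests.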


Re: [dpdk-dev] [PATCH v6 15/23] eventtimer: add buffering of timer expiry events

2018-01-20 Thread Pavan Nikhilesh
On Thu, Jan 18, 2018 at 11:07:52PM +, Carrillo, Erik G wrote:
> > -Original Message-
> > From: Pavan Nikhilesh [mailto:pbhagavat...@caviumnetworks.com]
> > Sent: Thursday, January 11, 2018 6:19 AM
> > To: Carrillo, Erik G ;
> > jerin.ja...@caviumnetworks.com; nipun.gu...@nxp.com;
> > hemant.agra...@nxp.com
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v6 15/23] eventtimer: add buffering of timer expiry
> > events
> >
> > On Wed, Jan 10, 2018 at 06:21:06PM -0600, Erik Gabriel Carrillo wrote:
> > > Buffer timer expiry events generated while walking a "run list"
> > > in rte_timer_manage, and burst enqueue them to an event device to the
> > > extent possible.
> > >
> >
> > IMO in some cases this adds a lot of delay between expiries and events being
> > published to event dev. For example, with a long expiry interval (default
> > 300 seconds for MAC expiry), expired entries would remain in the buffer
> > until 32 other entries expire.
> >
>
> The service function invokes rte_timer_manage to handle expired timers, and 
> as it does so, the buffer will be flushed under two conditions:  the buffer 
> is full of expired timer events, or the buffer is not full but there are no 
> more expired timers to handle for this iteration of the service.  The latter 
> condition will flush the buffer even if only one event has been buffered 
> after walking the list of expired rte_timers.

Ah, I missed the flush call after timer_manage().

>
> So, there could be some delay for the events that got buffered earliest, but 
> it seems like the throughput benefit outweighs the small delay there.  
> Thoughts?
>
> We could also make the buffer size configurable.

Maybe make it compile time configurable i.e. in config/common_base.
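
Note for readers: the two flush conditions discussed above, together with a build-time buffer size, could be sketched roughly as below. This is a toy model and not the patch's code: `EXP_BUF_SIZE`, `struct exp_buf` and the functions are hypothetical names, and the flush only counts events instead of enqueuing them to an event device.

```c
#include <stddef.h>

/* Assumption: the default mirrors the 32-entry buffer mentioned in the
 * thread and can be overridden at build time, e.g. -DEXP_BUF_SIZE=64. */
#ifndef EXP_BUF_SIZE
#define EXP_BUF_SIZE 32
#endif

struct exp_buf {
	int events[EXP_BUF_SIZE];
	size_t count;     /* events currently buffered */
	size_t flushes;   /* number of burst enqueues issued */
	size_t enqueued;  /* total events handed over so far */
};

/* Stand-in for a burst enqueue to the event device. */
static void
exp_buf_flush(struct exp_buf *b)
{
	if (b->count == 0)
		return;
	b->enqueued += b->count;
	b->flushes++;
	b->count = 0;
}

/* Buffer one expiry event; condition 1: flush when the buffer is full. */
static void
exp_buf_add(struct exp_buf *b, int ev)
{
	b->events[b->count++] = ev;
	if (b->count == EXP_BUF_SIZE)
		exp_buf_flush(b);
}

/* One service iteration: buffer every expired timer, then apply
 * condition 2: flush whatever is left, even a single event. */
static void
service_iteration(struct exp_buf *b, const int *expired, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		exp_buf_add(b, expired[i]);
	exp_buf_flush(b);
}
```

In this model a lone expired timer is flushed at the end of the same iteration, so it never waits for the buffer to fill.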

>
> >
> > > Signed-off-by: Erik Gabriel Carrillo 
> > > ---
> > >  lib/librte_eventdev/rte_event_timer_adapter.c | 118
> > > +++---
> > >  1 file changed, 108 insertions(+), 10 deletions(-)
> > >
> > 


Re: [dpdk-dev] Compilation errors in drivers/event/opdl/

2018-01-20 Thread Jerin Jacob
-Original Message-
> Date: Sat, 20 Jan 2018 05:18:30 +
> From: "Patil, Harish" 
> To: "liang.j...@intel.com" ,
>  "peter.mccar...@intel.com" 
> CC: "dev@dpdk.org" 
> Subject: [dpdk-dev] Compilation errors in drivers/event/opdl/
> 
> Hi,
> 
> I am seeing below compilation errors in drivers/event/opdl/, this is with
> cloned latest DPDK (git clone http://dpdk.org/git/dpdk).
> 
> ..
> ..
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c: In function ‘opdl_xstats_get_names’:
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c:89:2: error: ‘for’ loop initial declarations are only allowed in
> C99 mode
>   for (uint32_t j = 0; j < max_num_port_xstat; j++) {
>   ^
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c:89:2: note: use option -std=c99 or -std=gnu99 to compile your code
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c: In function ‘opdl_xstats_get’:
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c:124:2: error: ‘for’ loop initial declarations are only allowed in
> C99 mode
>   for (uint32_t i = 0; i < n; i++) {
>   ^
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c: In function ‘opdl_xstats_get_by_name’:
> /home2/hpatil/e4/jan19-inbox-submit/dpdk/drivers/event/opdl/opdl_evdev_xsta
> ts.c:145:2: error: ‘for’ loop initial declarations are only allowed in
> C99 mode
>   for (uint32_t i = 0; i < max_index; i++) {


Tested with gcc (7.2 and 5.3) and clang (5.0.1). Found no issues.
Which compiler are you using?


> ..
> ..
>   ^
> 
> Thanks,
> Harish
> 
> 
> 
> 
> 


Re: [dpdk-dev] [PATCH v3] net/i40e: fix fdir Rx resource defect

2018-01-20 Thread Zhang, Helin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
> Sent: Friday, January 19, 2018 1:24 PM
> To: Zhang, Qi Z; Wu, Jingjing
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v3] net/i40e: fix fdir Rx resource defect
> 
> The FDIR Rx ring isn't initialized and the Rx queue HW tail isn't updated when
> an error is detected while programming an FDIR flow, which is a potential
> risk. This patch updates the FDIR Rx resource.
> 
> Fixes: a778a1fa2e4e ("i40e: set up and initialize flow director")
> Fixes: 05999aab4ca6 ("i40e: add or delete flow director")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Beilei Xing 
> Acked-by: Jingjing Wu 
Applied to dpdk-next-net-intel, thanks!

/Helin


Re: [dpdk-dev] [PATCH v2] net/ixgbe: check if security capabilities are enabled by HW

2018-01-20 Thread Zhang, Helin


> -Original Message-
> From: Zhang, Helin
> Sent: Thursday, January 18, 2018 8:43 AM
> To: Ananyev, Konstantin; Nicolau, Radu; dev@dpdk.org
> Cc: Yigit, Ferruh; Lu, Wenzhuo; Zhao, XinfengX; De Lara Guarch, Pablo
> Subject: RE: [PATCH v2] net/ixgbe: check if security capabilities are enabled
> by HW
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> > Sent: Thursday, January 18, 2018 6:48 AM
> > To: Nicolau, Radu; dev@dpdk.org
> > Cc: Yigit, Ferruh; Lu, Wenzhuo; Zhao, XinfengX; De Lara Guarch, Pablo
> > Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: check if security
> > capabilities are enabled by HW
> >
> >
> >
> > > -Original Message-
> > > From: Nicolau, Radu
> > > Sent: Wednesday, January 17, 2018 11:55 AM
> > > To: dev@dpdk.org
> > > Cc: Yigit, Ferruh ; Lu, Wenzhuo
> > > ; Ananyev, Konstantin
> > > ; Zhao, XinfengX
> > > ; De Lara Guarch, Pablo
> > > ; Nicolau, Radu
> > > 
> > > Subject: [PATCH v2] net/ixgbe: check if security capabilities are
> > > enabled by HW
> > >
> > > Check if the security enable bits are not fused before setting
> > > offload capabilities for security
> > >
> > > Signed-off-by: Radu Nicolau 
> > > ---
> >
> > Acked-by: Konstantin Ananyev 
> Applied to dpdk-next-net-intel, with minor commit title changes. Thanks!
Removed from dpdk-next-net-intel, as new version was sent out. Thanks!

/Helin
> 
> /Helin


Re: [dpdk-dev] [PATCH v3] net/i40e: fix packet type parser issue

2018-01-20 Thread Zhang, Helin


> -Original Message-
> From: Zhang, Helin
> Sent: Wednesday, January 17, 2018 10:57 PM
> To: Zhang, Qi Z; Xing, Beilei
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH v3] net/i40e: fix packet type parser issue
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z
> > Sent: Tuesday, January 16, 2018 7:56 AM
> > To: Xing, Beilei
> > Cc: dev@dpdk.org; sta...@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3] net/i40e: fix packet type parser
> > issue
> >
> >
> >
> > > -Original Message-
> > > From: Xing, Beilei
> > > Sent: Monday, January 15, 2018 9:51 PM
> > > To: Zhang, Qi Z 
> > > Cc: dev@dpdk.org; sta...@dpdk.org
> > > Subject: [PATCH v3] net/i40e: fix packet type parser issue
> > >
> > > Ptype mapping table will fail to update when loading PPP profile,
> > > fix the issue via modifying metadata and adding check.
> > > This patch also adds parser for IPV4FRAG and IPV6FRAG.
> > >
> > > Fixes: ab2e350c4f4b ("net/i40e: improve packet type parser")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Beilei Xing 
> >
> > Acked-by: Qi Zhang 
> Applied into dpdk-next-net-intel, with minor commit log corrections. Thanks!
Removed from dpdk-next-net-intel, as new version was sent. Thanks!

/Helin
> 
> /Helin


Re: [dpdk-dev] [PATCH] net/i40e/avf/ixgbe: remove unnecessary mbuf field initialization in PMD

2018-01-20 Thread Zhang, Helin
Hi Rosen

You may need to split the patch into 3, one per PMD.
Also, please get it reviewed by the maintainers, and fix any patchwork check
issues.
Thanks!

/Helin

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Rosen Xu
> Sent: Friday, January 19, 2018 11:02 AM
> To: Xing, Beilei; Zhang, Qi Z; Lu, Wenzhuo
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/i40e/avf/ixgbe: remove unnecessary mbuf
> field initialization in PMD
> 
> m->refcnt is already set to 1, m->nb_segs to 1 and m->next to NULL when
> the mbuf is initialized or stored inside the mempool (unused).
> All of these are done in rte_pktmbuf_pool_create() and
> rte_pktmbuf_prefree_seg().
> So we remove the redundant code from the i40e, ixgbe and avf modules.
> 
> Fixes: 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Rosen Xu 
> ---
>  drivers/net/avf/avf_rxtx.c | 6 --
>  drivers/net/i40e/i40e_rxtx.c   | 6 --
>  drivers/net/ixgbe/ixgbe_rxtx.c | 1 -
>  3 files changed, 13 deletions(-)
> 
> diff --git a/drivers/net/avf/avf_rxtx.c b/drivers/net/avf/avf_rxtx.c
> index e0c4583..b9051d6 100644
> --- a/drivers/net/avf/avf_rxtx.c
> +++ b/drivers/net/avf/avf_rxtx.c
> @@ -221,10 +221,7 @@
>   return -ENOMEM;
>   }
> 
> - rte_mbuf_refcnt_set(mbuf, 1);
> - mbuf->next = NULL;
>   mbuf->data_off = RTE_PKTMBUF_HEADROOM;
> - mbuf->nb_segs = 1;
>   mbuf->port = rxq->port_id;
> 
>   dma_addr =
> @@ -1239,10 +1236,7 @@
>   rte_prefetch0(rxep[i + 1]);
> 
>   mb = rxep[i];
> - rte_mbuf_refcnt_set(mb, 1);
> - mb->next = NULL;
>   mb->data_off = RTE_PKTMBUF_HEADROOM;
> - mb->nb_segs = 1;
>   mb->port = rxq->port_id;
>   dma_addr =
> rte_cpu_to_le_64(rte_mbuf_data_iova_default(mb));
>   rxdp[i].read.hdr_addr = 0;
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 23256b7..b578957 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -550,10 +550,7 @@
>   rte_prefetch0(rxep[i + 1].mbuf);
> 
>   mb = rxep[i].mbuf;
> - rte_mbuf_refcnt_set(mb, 1);
> - mb->next = NULL;
>   mb->data_off = RTE_PKTMBUF_HEADROOM;
> - mb->nb_segs = 1;
>   mb->port = rxq->port_id;
>   dma_addr = rte_cpu_to_le_64(\
>   rte_mbuf_data_iova_default(mb));
> @@ -2411,10 +2408,7 @@
>   return -ENOMEM;
>   }
> 
> - rte_mbuf_refcnt_set(mbuf, 1);
> - mbuf->next = NULL;
>   mbuf->data_off = RTE_PKTMBUF_HEADROOM;
> - mbuf->nb_segs = 1;
>   mbuf->port = rxq->port_id;
> 
>   dma_addr =
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 4b38247..72da571 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1629,7 +1629,6 @@ uint16_t ixgbe_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
>   mb->port = rxq->port_id;
>   }
> 
> - rte_mbuf_refcnt_set(mb, 1);
>   mb->data_off = RTE_PKTMBUF_HEADROOM;
> 
>   /* populate the descriptors */
> --
> 1.8.3.1
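
Note for readers: the commit message argues that a free mbuf always sits in the pool with refcnt 1, nb_segs 1 and next NULL, so the Rx path does not need to set them again. A toy model of that invariant follows. `struct toy_mbuf` and both functions are hypothetical and much simpler than the real rte_mbuf code.

```c
#include <stddef.h>

/* Toy stand-in for the three mbuf fields discussed in the patch. */
struct toy_mbuf {
	struct toy_mbuf *next;
	unsigned short refcnt;
	unsigned short nb_segs;
};

/* Runs once per object when the pool is created. */
static void
toy_pool_init_obj(struct toy_mbuf *m)
{
	m->next = NULL;
	m->refcnt = 1;
	m->nb_segs = 1;
}

/* Runs before a segment goes back to the pool.
 * Returns 1 if the segment may be freed, 0 if it is still referenced.
 * When it returns 1, the pool invariant has been restored. */
static int
toy_prefree(struct toy_mbuf *m)
{
	if (m->refcnt > 1) {
		m->refcnt--;
		return 0;
	}
	m->refcnt = 1;
	m->next = NULL;
	m->nb_segs = 1;
	return 1;
}
```

Under this model, any object taken from the pool already carries the three values, which is the reason the patch gives for dropping the per-packet stores.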



Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership

2018-01-20 Thread Ananyev, Konstantin
Hi Neil,

> - Message-
> From: Neil Horman [mailto:nhor...@tuxdriver.com]
> Sent: Friday, January 19, 2018 7:48 PM
> To: Thomas Monjalon 
> Cc: dev@dpdk.org; Matan Azrad ; Richardson, Bruce 
> ; Ananyev, Konstantin
> ; Gaetan Rivet ; Wu, 
> Jingjing 
> Subject: Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
> 
> On Fri, Jan 19, 2018 at 07:12:36PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 18:43, Neil Horman:
> > > On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > > > 19/01/2018 16:27, Neil Horman:
> > > > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > > > 19/01/2018 14:30, Neil Horman:
> > > > > > > So it seems like the real point of contention that we need to
> > > > > > > settle here is, what codifies an 'owner'.  Must it be a specific
> > > > > > > execution context, or can we define any arbitrary section of code
> > > > > > > as being an owner?  I would argue against the latter.
> > > > > >
> > > > > > This is the first thing explained in the cover letter:
> > > > > > "2. The port usage synchronization will be managed by the port
> > > > > > owner."
> > > > > > There is no intent to manage the threads synchronization for a given
> > > > > > port.
> > > > > > It is the responsibility of the owner (a code object) to configure
> > > > > > its port via only one thread.
> > > > > > It is consistent with not trying to manage threads synchronization
> > > > > > for Rx/Tx on a given queue.
> > > > > >
> > > > > >
> > > > > Yes, in his cover letter, and I contend that notion is an invalid
> > > > > design point. By codifying an area of code as an 'owner', rather than
> > > > > an execution context, you're defining the notion of hierarchy, not
> > > > > ownership. That is to say, you want to codify the notion that there
> > > > > are top level ports that the application might see, and some of those
> > > > > top level ports are parents to subordinate ports, which only the
> > > > > parent port should access directly.  If that's all you want to encode,
> > > > > there are far easier ways to do it:
> > > > >
> > > > > struct rte_eth_shared_data {
> > > > >   < existing bits >
> > > > >   struct rte_eth_port_list {
> > > > >   struct rte_eth_port_list *children;
> > > > >   struct rte_eth_port_list *parent;
> > > > >   };
> > > > > };
> > > > >
> > > > >
> > > > > Build an api around a structure like that, so that the parent/child
> > > > > relationship is globally clear, and this would be much easier,
> > > > > especially if you want to continue asserting that the notion of
> > > > > synchronization/exclusion is an exercise left to the application.
> > > >
> > > > Not only Neil.
> > > > An owner can be something else than a port.
> > > > An owner can be an app process (multi-processes).
> > > > An owner can be a library.
> > > > The intent is really to solve the generic problem of which code
> > > > is managing a port.
> > > >
> > > I don't see how this precludes any part of what you just said.  Define the
> > > rte_eth_port_list externally to the shared_data struct and allow any
> > > object you want to allocate it, then anything you want to control a
> > > hierarchy of ports can do so without issue, and the structure is far more
> > > clear than an opaque id that carries subtle semantic ordering with it.
> >
> > Sorry, I don't understand. Please could you rephrase?
> >
> 
> Sure, I'm saying the fact that you want an owner to be an object
> (library/port/process) rather than strictly an execution context
> (process/thread) doesn't preclude what I'm proposing above.  You can create a
> generic version of the structure I propose above like so:
> 
> struct rte_obj_heirarchy {
>   struct rte_obj_heirarchy *children;
>   struct rte_obj_heirarchy *parent;
>   void *owner_data; /* optional */
> };
> 
> And embed that structure in any object you would like to give a representative
> hierarchy to, you then have a fairly simple api:
> 
> struct rte_obj_heirarchy *heirarchy_alloc();
> bool heirarchy_set(struct rte_obj_heirarchy *parent,
>                    struct rte_obj_heirarchy *child);
> void heirarchy_release(struct rte_obj_heirarchy *obj);
> 
> That gives you the privately held list relationship I think you are in part
> looking for (i.e. the ability for a failsafe device to iterate over the ports
> it is in control of), without the awkwardness of the ordinal priority that the
> current implementation imposes.
> 
> In summary, if what you want is ownership in the strictest sense of the word
> (i.e. mutually exclusive access, which I think makes sense), then using a lock
> and flag is really the simplest way to go.  If instead what you want is a
> hierarchical relationship where you can iterate over a limited set of objects
> (the failsafe child port example), then the 
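
Note for readers: the three-function API proposed in the quoted message could be fleshed out roughly as follows. This is only a sketch under assumptions. The names `obj_hier`, `hier_alloc`, `hier_set`, `hier_release` and `hier_child_count` are hypothetical, the `sibling` link is an addition needed so that one parent can hold several children, and no locking or cycle detection is attempted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct obj_hier {
	struct obj_hier *children;  /* head of the child list */
	struct obj_hier *parent;
	struct obj_hier *sibling;   /* assumption: next child of the same parent */
	void *owner_data;           /* optional */
};

static struct obj_hier *
hier_alloc(void)
{
	return calloc(1, sizeof(struct obj_hier));
}

/* Attach child under parent; fails if the child already has a parent. */
static bool
hier_set(struct obj_hier *parent, struct obj_hier *child)
{
	if (parent == NULL || child == NULL || parent == child ||
	    child->parent != NULL)
		return false;
	child->parent = parent;
	child->sibling = parent->children;
	parent->children = child;
	return true;
}

/* Unlink obj from its parent, orphan its children, then free it. */
static void
hier_release(struct obj_hier *obj)
{
	struct obj_hier **pp;
	struct obj_hier *c;

	if (obj == NULL)
		return;
	if (obj->parent != NULL) {
		for (pp = &obj->parent->children; *pp != NULL;
		     pp = &(*pp)->sibling) {
			if (*pp == obj) {
				*pp = obj->sibling;
				break;
			}
		}
	}
	for (c = obj->children; c != NULL; ) {
		struct obj_hier *next = c->sibling;

		c->parent = NULL;
		c->sibling = NULL;
		c = next;
	}
	free(obj);
}

/* Walk the direct children, as a failsafe-like parent would. */
static size_t
hier_child_count(const struct obj_hier *parent)
{
	const struct obj_hier *c;
	size_t n = 0;

	for (c = parent->children; c != NULL; c = c->sibling)
		n++;
	return n;
}
```

A child can have only one parent here, which gives the exclusive parent/child relation described in the message without any ordinal owner id.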

Re: [dpdk-dev] [PATCH v4 0/3] net/i40e: change for ptype parser

2018-01-20 Thread Zhang, Qi Z


> -Original Message-
> From: Xing, Beilei
> Sent: Friday, January 19, 2018 3:50 PM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org; Chilikin, Andrey 
> Subject: [PATCH v4 0/3] net/i40e: change for ptype parser
> 
> This patchset mainly fixes the failure to update the SW ptype table and adds
> a parser for IPV4FRAG and IPV6FRAG.
> 
> v4 changes:
>  - Split patchset and replace strncmp with strncasecmp.
> v3 changes:
>  - Reorder IPv4 case.
> v2 changes:
>  - Add parser for IPV4FRAG and IPV6FRAG.
> 
> Beilei Xing (3):
>   net/i40e: fix fail to update ptype table
>   net/i40e: add parser for IPV4FRAG and IPV6FRAG
>   net/i40e: replace strncmp with strncasecmp
> 
>  drivers/net/i40e/i40e_ethdev.c  | 83
> +
>  drivers/net/i40e/rte_pmd_i40e.c |  6 ++-
>  2 files changed, 54 insertions(+), 35 deletions(-)   
> 
> --
> 2.5.5

Acked-by: Qi Zhang



Re: [dpdk-dev] [PATCH v4 0/3] net/i40e: change for ptype parser

2018-01-20 Thread Zhang, Helin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z
> Sent: Saturday, January 20, 2018 9:19 PM
> To: Xing, Beilei
> Cc: dev@dpdk.org; Chilikin, Andrey
> Subject: Re: [dpdk-dev] [PATCH v4 0/3] net/i40e: change for ptype parser
> 
> 
> 
> > -Original Message-
> > From: Xing, Beilei
> > Sent: Friday, January 19, 2018 3:50 PM
> > To: Zhang, Qi Z 
> > Cc: dev@dpdk.org; Chilikin, Andrey 
> > Subject: [PATCH v4 0/3] net/i40e: change for ptype parser
> >
> > This patchset mainly fixes the failure to update the SW ptype table and
> > adds a parser for IPV4FRAG and IPV6FRAG.
> >
> > v4 changes:
> >  - Split patchset and replace strncmp with strncasecmp.
> > v3 changes:
> >  - Reorder IPv4 case.
> > v2 changes:
> >  - Add parser for IPV4FRAG and IPV6FRAG.
> >
> > Beilei Xing (3):
> >   net/i40e: fix fail to update ptype table
> >   net/i40e: add parser for IPV4FRAG and IPV6FRAG
> >   net/i40e: replace strncmp with strncasecmp
> >
> >  drivers/net/i40e/i40e_ethdev.c  | 83
> > +
> >  drivers/net/i40e/rte_pmd_i40e.c |  6 ++-
> >  2 files changed, 54 insertions(+), 35 deletions(-)
> >
> > --
> > 2.5.5
> 
> Acked-by: Qi Zhang 
Applied the series into dpdk-next-net-intel, thanks!
 
/Helin
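
Note for readers: the third patch in the series replaces strncmp with strncasecmp, presumably so that names in a profile match regardless of letter case. A minimal sketch of the difference follows. The helper `name_has_prefix` is hypothetical and is not taken from the i40e driver.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: true when name starts with prefix, ignoring case. */
static bool
name_has_prefix(const char *name, const char *prefix)
{
	return strncasecmp(name, prefix, strlen(prefix)) == 0;
}
```

With strncmp, "ipv4frag" and "IPV4" would not match on their first four characters; with strncasecmp they do.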


Re: [dpdk-dev] [PATCH v1] testpmd: fix incorrect port_id word size

2018-01-20 Thread Ferruh Yigit
On 1/19/2018 1:27 PM, Remy Horton wrote:

"app/testpmd: fix incorrect port id word size" ?

> The word size of port_id is now 16 bits, but there were parsing directives
> that assumed it was still of type UINT8, resulting in incorrect commandline
> parse results.
> 
> Fixes: f14a210a65fe ("app: fix port id type")

Cc: sta...@dpdk.org

> Signed-off-by: Remy Horton 

Reviewed-by: Ferruh Yigit 
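
Note for readers: the failure mode described in the commit message can be shown with a toy model. This is not the librte_cmdline code. `struct parsed_cmd`, `store_as_u8`, `store_as_u16` and `parse_port` are hypothetical names. The point is that a directive which treats a 16-bit field as 8 bits wide writes only one byte and leaves the other byte stale.

```c
#include <stdint.h>
#include <string.h>

struct parsed_cmd {
	uint16_t port_id;
};

/* Models a directive that believes the field is 8 bits wide:
 * only one byte of the destination is written. */
static void
store_as_u8(void *field, unsigned long value)
{
	uint8_t v = (uint8_t)value;

	memcpy(field, &v, sizeof(v));
}

/* Models the corrected directive: the whole 16-bit field is written. */
static void
store_as_u16(void *field, unsigned long value)
{
	uint16_t v = (uint16_t)value;

	memcpy(field, &v, sizeof(v));
}

/* Parse value into a result struct that starts out full of stale bytes. */
static uint16_t
parse_port(unsigned long value, int use_u8)
{
	struct parsed_cmd res;

	memset(&res, 0xFF, sizeof(res));  /* simulate an uninitialised buffer */
	if (use_u8)
		store_as_u8(&res.port_id, value);
	else
		store_as_u16(&res.port_id, value);
	return res.port_id;
}
```

With the 8-bit store, port 5 does not read back as 5 on either byte order, because the untouched byte still holds 0xFF.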


Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership

2018-01-20 Thread Thomas Monjalon
20/01/2018 13:54, Ananyev, Konstantin:
> Hi Neil,
> 
> > - Message-
> > From: Neil Horman [mailto:nhor...@tuxdriver.com]
> > Sent: Friday, January 19, 2018 7:48 PM
> > To: Thomas Monjalon 
> > Cc: dev@dpdk.org; Matan Azrad ; Richardson, Bruce 
> > ; Ananyev, Konstantin
> > ; Gaetan Rivet ; Wu, 
> > Jingjing 
> > Subject: Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
> > 
> > On Fri, Jan 19, 2018 at 07:12:36PM +0100, Thomas Monjalon wrote:
> > > 19/01/2018 18:43, Neil Horman:
> > > > On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > > > > 19/01/2018 16:27, Neil Horman:
> > > > > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > > > > 19/01/2018 14:30, Neil Horman:
> > > > > > > > So it seems like the real point of contention that we need to
> > > > > > > > settle here is, what codifies an 'owner'.  Must it be a specific
> > > > > > > > execution context, or can we define any arbitrary section of
> > > > > > > > code as being an owner?  I would argue against the latter.
> > > > > > >
> > > > > > > This is the first thing explained in the cover letter:
> > > > > > > "2. The port usage synchronization will be managed by the port
> > > > > > > owner."
> > > > > > > There is no intent to manage the threads synchronization for a
> > > > > > > given port.
> > > > > > > It is the responsibility of the owner (a code object) to configure
> > > > > > > its port via only one thread.
> > > > > > > It is consistent with not trying to manage threads synchronization
> > > > > > > for Rx/Tx on a given queue.
> > > > > > >
> > > > > > >
> > > > > > Yes, in his cover letter, and I contend that notion is an invalid
> > > > > > design point. By codifying an area of code as an 'owner', rather
> > > > > > than an execution context, you're defining the notion of hierarchy,
> > > > > > not ownership. That is to say, you want to codify the notion that
> > > > > > there are top level ports that the application might see, and some
> > > > > > of those top level ports are parents to subordinate ports, which
> > > > > > only the parent port should access directly.  If that's all you want
> > > > > > to encode, there are far easier ways to do it:
> > > > > >
> > > > > > struct rte_eth_shared_data {
> > > > > > < existing bits >
> > > > > > struct rte_eth_port_list {
> > > > > > struct rte_eth_port_list *children;
> > > > > > struct rte_eth_port_list *parent;
> > > > > > };
> > > > > > };
> > > > > >
> > > > > >
> > > > > > Build an api around a structure like that, so that the parent/child
> > > > > > relationship is globally clear, and this would be much easier,
> > > > > > especially if you want to continue asserting that the notion of
> > > > > > synchronization/exclusion is an exercise left to the application.
> > > > >
> > > > > Not only Neil.
> > > > > An owner can be something else than a port.
> > > > > An owner can be an app process (multi-processes).
> > > > > An owner can be a library.
> > > > > The intent is really to solve the generic problem of which code
> > > > > is managing a port.
> > > > >
> > > > I don't see how this precludes any part of what you just said.  Define
> > > > the rte_eth_port_list externally to the shared_data struct and allow any
> > > > object you want to allocate it, then anything you want to control a
> > > > hierarchy of ports can do so without issue, and the structure is far
> > > > more clear than an opaque id that carries subtle semantic ordering with
> > > > it.
> > >
> > > Sorry, I don't understand. Please could you rephrase?
> > >
> > 
> > Sure, I'm saying the fact that you want an owner to be an object
> > (library/port/process) rather than strictly an execution context
> > (process/thread) doesn't preclude what I'm proposing above.  You can create
> > a generic version of the structure I propose above like so:
> > 
> > struct rte_obj_heirarchy {
> > struct rte_obj_heirarchy *children;
> > struct rte_obj_heirarchy *parent;
> > void *owner_data; /* optional */
> > };
> > 
> > And embed that structure in any object you would like to give a
> > representative hierarchy to, you then have a fairly simple api:
> > 
> > struct rte_obj_heirarchy *heirarchy_alloc();
> > bool heirarchy_set(struct rte_obj_heirarchy *parent,
> >                    struct rte_obj_heirarchy *child);
> > void heirarchy_release(struct rte_obj_heirarchy *obj);
> > 
> > That gives you the privately held list relationship I think you are in part
> > looking for (i.e. the ability for a failsafe device to iterate over the
> > ports it is in control of), without the awkwardness of the ordinal priority
> > that the current implementation imposes.
> > 
> > In summary, if what you want is ownership in the strictest sense of the word
> 

Re: [dpdk-dev] [PATCH] app/testpmd: change log level at run time

2018-01-20 Thread Thomas Monjalon
19/01/2018 23:19, Elza Mathew:
> @@ -16026,6 +16078,7 @@ struct cmd_cmdfile_result {
>   (cmdline_parse_inst_t *)&cmd_set_link_down,
>   (cmdline_parse_inst_t *)&cmd_reset,
>   (cmdline_parse_inst_t *)&cmd_set_numbers,
> + (cmdline_parse_inst_t *)&cmd_set_log,
>   (cmdline_parse_inst_t *)&cmd_set_txpkts,
>   (cmdline_parse_inst_t *)&cmd_set_txsplit,
>   (cmdline_parse_inst_t *)&cmd_set_fwd_list,
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index d8c9ef0..3c55eb6 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -505,6 +505,25 @@ When retry is enabled, the transmit delay time and number of retries can also be
>  
> testpmd> set burst tx delay (microseconds) retry (num)
>  
> +set log
> +~~~~~~~
> +
> +Set the log level for a log type::
> +
> + testpmd> set log global|(type) (level)
> +
> +Where:
> +
> +* ``type`` is the log name.
> +
> +* ``level`` is the log level.
> +
> +For example, to change the global log level::
> + testpmd> set log global (level)
> +
> +Regexes can also be used for type. To change log level of user1, user2 and user3::
> + testpmd> set log user[1-3] (level)
> +
>  set txpkts
>  ~~~~~~~~~~

I know that testpmd code and doc are a bit messy,
but maybe we can find a better place for this command
in the doc and in the code.
As it is very generic, I think it should be one of the first commands.
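
Note for readers: the regex form of the proposed command ("set log user[1-3] (level)") amounts to applying a level to every log type whose name matches a pattern. A self-contained sketch using POSIX regex follows. `struct log_type` and `set_level_regex` are hypothetical and do not represent the DPDK log API.

```c
#include <regex.h>
#include <stddef.h>

struct log_type {
	const char *name;
	int level;
};

/* Set level on every entry whose name matches the extended regex.
 * Returns the number of entries updated, or -1 if the pattern does
 * not compile. The match is unanchored, like a plain regex search. */
static int
set_level_regex(struct log_type *types, size_t n,
		const char *pattern, int level)
{
	regex_t re;
	size_t i;
	int updated = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (regexec(&re, types[i].name, 0, NULL, 0) == 0) {
			types[i].level = level;
			updated++;
		}
	}
	regfree(&re);
	return updated;
}
```

Because the search is unanchored, a pattern such as "user" would also match names that merely contain it; anchoring with ^ and $ is left to the caller in this sketch.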


Re: [dpdk-dev] [PATCH v11 5/5] net/virtio: support GUEST ANNOUNCE

2018-01-20 Thread Ferruh Yigit
On 1/19/2018 5:33 PM, Ferruh Yigit wrote:
> On 1/16/2018 9:41 PM, Xiao Wang wrote:
>> When live migration is done, for the backup VM, either the virtio
>> frontend or the vhost backend needs to send out gratuitous RARP packet
>> to announce its new network location.
>>
>> This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
>> migration scenario where the vhost backend doesn't have the ability to
>> generate RARP packet.
>>
>> Brief introduction of the work flow:
>> 1. QEMU finishes live migration, pokes the backup VM with an interrupt.
>> 2. Virtio interrupt handler reads out the interrupt status value, and
>>realizes it needs to send out RARP packet to announce its location.
>> 3. Pause device to stop worker thread touching the queues.
>> 4. Inject a RARP packet into a Tx Queue.
>> 5. Ack the interrupt via control queue.
>> 6. Resume device to continue packet processing.
>>
>> Signed-off-by: Xiao Wang 
>> Reviewed-by: Maxime Coquelin 
> 
> 
> Hi Yuanhan,
> 
> This commit breaks the build!

I swapped the order of two patches and the problem is gone:
first: net: fixup RARP generation
second: net/virtio: support GUEST ANNOUNCE

From my point of view nothing more needs to be done, but can you please
double-check the patches.

Thanks,
ferruh

> 
> As far as I understand, you sent a fix, but it was merged into another patch,
> which leaves this commit still broken.
> 
> What do you think about sending a fix that can be merged into this one, so I
> can squash it on next-net?
> 
> Thanks,
> ferruh
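
Note for readers: step 4 of the quoted work flow injects a RARP packet. A rough sketch of what such a frame looks like on the wire follows, based on the standard Ethernet and RARP layouts (EtherType 0x8035, opcode 3 for a reverse request). The function `build_rarp` and its constants are hypothetical, not the virtio PMD code, and the frame is not padded to the Ethernet minimum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6
#define RARP_FRAME_LEN 42  /* 14-byte Ethernet header + 28-byte ARP body */

static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xFF);
}

/* Build a broadcast RARP request announcing mac.
 * Returns the frame length, or 0 if the buffer is too small. */
static size_t
build_rarp(uint8_t *buf, size_t buflen, const uint8_t mac[MAC_LEN])
{
	if (buflen < RARP_FRAME_LEN)
		return 0;
	memset(buf, 0, RARP_FRAME_LEN);
	memset(buf, 0xFF, MAC_LEN);        /* destination: broadcast */
	memcpy(buf + 6, mac, MAC_LEN);     /* source MAC */
	put_be16(buf + 12, 0x8035);        /* EtherType: RARP */
	put_be16(buf + 14, 1);             /* hardware type: Ethernet */
	put_be16(buf + 16, 0x0800);        /* protocol type: IPv4 */
	buf[18] = MAC_LEN;                 /* hardware address length */
	buf[19] = 4;                       /* protocol address length */
	put_be16(buf + 20, 3);             /* opcode: reverse request */
	memcpy(buf + 22, mac, MAC_LEN);    /* sender hardware address */
	/* sender IP (bytes 28..31) stays zero */
	memcpy(buf + 32, mac, MAC_LEN);    /* target hardware address */
	/* target IP (bytes 38..41) stays zero */
	return RARP_FRAME_LEN;
}
```

Switches that see this broadcast learn the MAC address on the new port, which is the purpose of the announcement after migration.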
> 



Re: [dpdk-dev] [PATCH 2/3] hash: run-time function selection

2018-01-20 Thread Thomas Monjalon
11/12/2017 14:26, Bruce Richardson:
> On Mon, Nov 06, 2017 at 10:04:49AM -0800, Elza Mathew wrote:
> > Compile-time function selection can potentially lead to
> > lower performance on generic builds done by distros.
> > Replaced compile time flag checks with run-time function
> > selection.
> > 
> > Signed-off-by: Elza Mathew 
> > ---
> >  lib/librte_hash/rte_fbk_hash.c | 11 ++-
> >  lib/librte_hash/rte_fbk_hash.h |  8 
> >  2 files changed, 10 insertions(+), 9 deletions(-)
> >
> Title needs an update to indicate this change is for fbk-hash. I suspect
> that can be fixed on apply.
> 
> Acked-by: Bruce Richardson 

Applied, thanks


Re: [dpdk-dev] [PATCH 1/3] hash: run-time function selection

2018-01-20 Thread Thomas Monjalon
11/12/2017 13:52, Bruce Richardson:
> On Mon, Nov 06, 2017 at 10:04:02AM -0800, Elza Mathew wrote:
> > Compile-time function selection can potentially lead to
> > lower performance on generic builds done by distros.
> > Replaced compile time flag checks with run-time function
> > selection.
> > 
> > Signed-off-by: Elza Mathew 
> > ---
> >  lib/librte_hash/rte_cuckoo_hash.c | 10 +-
> >  lib/librte_hash/rte_cuckoo_hash.h |  6 --
> >  2 files changed, 9 insertions(+), 7 deletions(-)
> > 
> 
> Looks good to me.
> 
> Acked-by: Bruce Richardson 

Applied, thanks
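
Note for readers: the idea behind both hash patches, choosing an implementation once at run time instead of with compile-time flags, can be sketched with a function pointer. Everything here is hypothetical: `hash_generic` is a simple FNV-1a style hash used only for illustration, `hash_accel` is a placeholder that reuses it, and the CPU probe is reduced to a flag passed by the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn_t)(const uint8_t *data, size_t len, uint32_t seed);

/* Portable fallback: FNV-1a style hash, for illustration only. */
static uint32_t
hash_generic(const uint8_t *data, size_t len, uint32_t seed)
{
	uint32_t h = seed ^ 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Placeholder for a variant that would use CPU-specific instructions.
 * It reuses the generic code so that the sketch stays portable. */
static uint32_t
hash_accel(const uint8_t *data, size_t len, uint32_t seed)
{
	return hash_generic(data, len, seed);
}

/* Pick the implementation once, e.g. at table creation time.
 * has_accel stands in for a run-time CPU feature probe. */
static hash_fn_t
select_hash(int has_accel)
{
	return has_accel ? hash_accel : hash_generic;
}
```

A generic distro build then carries both variants in the same binary and takes the faster one only on machines that support it.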


Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling

2018-01-20 Thread Thomas Monjalon
19/01/2018 18:37, Maxime Coquelin:
> 
> On 10/31/2017 04:59 PM, Jonas Pfefferle wrote:
> > Check and report errors on open/read in noiommu check.
> > 
> > Signed-off-by: Jonas Pfefferle 
> > ---
> >   lib/librte_eal/linuxapp/eal/eal_vfio.c | 29 +
> >   1 file changed, 21 insertions(+), 8 deletions(-)
> 
> I agree with the fix. Kernels v4.4 and earlier do have vfio, but not
> the noiommu mode, so the file does not exist.
> 
> Acked-by: Maxime Coquelin 

Applied, and added this explanation, thanks
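
Note for readers: a check that reports open and read errors, and treats a missing file as "no-IOMMU mode not available", might look like the sketch below. It is not the patch itself. `noiommu_is_enabled` is a hypothetical name, and both the handling of a missing file and the 'Y' convention are assumptions based on the explanation quoted above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the parameter file starts with 'Y', 0 if it does not or
 * if the file is absent, and -1 on any other open or read error. */
static int
noiommu_is_enabled(const char *path)
{
	char c;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		if (errno == ENOENT)
			return 0;  /* assumption: older kernel, no such mode */
		fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
		return -1;
	}
	n = read(fd, &c, 1);
	if (n != 1) {
		fprintf(stderr, "cannot read %s: %s\n", path,
			n < 0 ? strerror(errno) : "empty file");
		close(fd);
		return -1;
	}
	close(fd);
	return c == 'Y';
}
```

On Linux the path passed in would presumably be the vfio module parameter under /sys/module/vfio/parameters/; the exact file name is left to the caller here.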



[dpdk-dev] [pull-request] next-crypto 18.02 rc1

2018-01-20 Thread Pablo de Lara
The following changes since commit c43cb3b184ca416c2a2ecd8edb797cbcd25c:

  hash: select fbk function at run-time (2018-01-20 15:35:16 +0100)

are available in the Git repository at:

  http://dpdk.org/git/next/dpdk-next-crypto 

for you to fetch changes up to a4063a1d84ff393334c8c5f325529a99586f9751:

  examples/ipsec-secgw: try end in flow actions before fail (2018-01-20 15:10:53 +)


Akhil Goyal (8):
  security: fix enum start value
  examples/ipsec-secgw: add cryptodev mask option
  crypto/dpaa_sec: support ipsec protocol offload
  examples/ipsec-secgw: update mbuf packet type
  examples/ipsec-secgw: update incremental checksum
  crypto/dpaa_sec: rewrite Rx/Tx path
  examples/ipsec-secgw: improve ipsec dequeue logic
  examples/ipsec-secgw: fix corner case for SPI value

Alok Makhariya (1):
  crypto/dpaa_sec: retire fq while detaching with session

Andrea Grandi (2):
  doc: fix lists of supported algorithms
  doc: fix format in OpenSSL installation guide

Anoob Joseph (3):
  examples/ipsec-secgw: fix usage of incorrect port
  security: support userdata retrieval
  examples/ipsec-secgw: support inline protocol

Billy O'Mahony (2):
  doc: fix typo in QAT doc
  cryptodev: extend sym session Doxygen info

Fan Zhang (1):
  crypto/aesni_mb: support AES-CCM

Hemant Agrawal (2):
  crypto/dpaa_sec: optimize virt to phy conversion
  crypto/dpaa_sec: support multiple sessions per qp

Jerin Jacob (1):
  test/crypto: fix missing include

Nélio Laranjeiro (8):
  security: fix device operation type
  crypto: fix pedantic compilation errors
  security: fix pedantic compilation
  examples/ipsec-secgw: fix missing ingress flow attribute
  examples/ipsec-secgw: add target queues in flow actions
  examples/ipsec-secgw: add egress flow actions
  net: fix ESP header byte ordering definition
  examples/ipsec-secgw: fix SPI byte order in flow item

Pablo de Lara (5):
  doc: update IPSec Multi-buffer lib versioning
  cryptodev: add missing CPU flag string
  cryptodev: fix function prototype
  app/crypto-perf: support IMIX
  cryptodev: remove duplicated device name length

Radu Nicolau (4):
  security: add get session size function
  net/ixgbe: implement security session get size
  examples/ipsec_secgw: create session mempools for ethdevs
  examples/ipsec-secgw: try end in flow actions before fail

Tomasz Duszynski (1):
  crypto/mrvl: update MRVL CRYPTO PMD documentation

 app/test-crypto-perf/cperf_ops.c   |  77 +-
 app/test-crypto-perf/cperf_ops.h   |   2 +-
 app/test-crypto-perf/cperf_options.h   |   7 +-
 app/test-crypto-perf/cperf_options_parsing.c   |  61 +-
 app/test-crypto-perf/cperf_test_latency.c  |   4 +-
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c   |   8 +-
 app/test-crypto-perf/cperf_test_throughput.c   |   3 +-
 app/test-crypto-perf/cperf_test_verify.c   |   3 +-
 app/test-crypto-perf/main.c|  89 ++-
 config/common_base |   1 -
 doc/guides/cryptodevs/aesni_gcm.rst|   4 +-
 doc/guides/cryptodevs/aesni_mb.rst |  13 +-
 doc/guides/cryptodevs/features/aesni_mb.ini|   1 +
 doc/guides/cryptodevs/features/dpaa_sec.ini|   1 +
 doc/guides/cryptodevs/mrvl.rst |  31 +-
 doc/guides/cryptodevs/openssl.rst  |  15 +-
 doc/guides/cryptodevs/qat.rst  |   3 +-
 doc/guides/nics/mrvl.rst   |   2 +
 doc/guides/prog_guide/rte_security.rst |  22 +-
 doc/guides/rel_notes/release_18_02.rst |  11 +
 doc/guides/sample_app_ug/ipsec_secgw.rst   |  10 +-
 doc/guides/tools/cryptoperf.rst|  14 +
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |   2 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 145 +++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c |  33 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  12 +-
 drivers/crypto/armv8/rte_armv8_pmd_private.h   |   2 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c | 806 -
 drivers/crypto/dpaa_sec/dpaa_sec.h | 138 +++-
 drivers/crypto/kasumi/rte_kasumi_pmd_private.h |   2 +-
 drivers/crypto/mrvl/rte_mrvl_pmd_ops.c |   2 +-
 drivers/crypto/null/null_crypto_pmd_private.h  |   2 +-
 drivers/crypto/openssl/rte_openssl_pmd_private.h   |   2 +-
 drivers/crypto/snow3g/rte_snow3g_pmd_private.h |   2 +-
 drivers/crypto/zuc/rte_zuc_pmd_private.h   |   2 +-
 drivers/net/ixgbe/ixgbe_ipsec.c|   7 +
 examples/ipsec-secgw/esp.c |   6 +-
 examples/ipsec-secgw/ipip.h|  26 +-
 examples/ipsec-secgw/ipsec-secgw.c

Re: [dpdk-dev] [dpdk-stable] [PATCH v3] bus/pci: forbid VA as IOVA mode if IOMMU address width too small

2018-01-20 Thread Thomas Monjalon
12/01/2018 11:22, Maxime Coquelin:
> Intel VT-d supports different address widths for the IOVAs, from
> 39 bits to 56 bits.
> 
> While recent processors support at least 48 bits, VT-d emulation
> currently only supports 39 bits. This makes DMA mapping fail in that
> case when using VA as IOVA mode, as user-space virtual addresses use
> up to 47 bits (see kernel's Documentation/x86/x86_64/mm.txt).
> 
> This patch parses the VT-d CAP register value available in sysfs, and
> forbids VA as IOVA mode if the GAW is 39 bits or unknown.
> 
> Fixes: f37dfab21c98 ("drivers/net: enable IOVA mode for Intel PMDs")
> 
> Cc: sta...@dpdk.org
> Signed-off-by: Maxime Coquelin 
[...]
> + if (fscanf(fp, "%lx", &vtd_cap_reg) != 1) {

Compilation error on 32-bit. Fix:

-   if (fscanf(fp, "%lx", &vtd_cap_reg) != 1) {
+   if (fscanf(fp, "%" PRIx64, &vtd_cap_reg) != 1) {

[...]
> +#elif defined(RTE_ARCH_PPC_64)
> +static bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{
> +   return false;
> +}
> +#else
> +static bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{
> + return true;
> +}
> +#endif

Compilation error on non-x86. Fix:

 #elif defined(RTE_ARCH_PPC_64)
 static bool
-pci_one_device_iommu_support_va(struct rte_pci_device *dev)
+pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 {
return false;
 }
 #else
 static bool
-pci_one_device_iommu_support_va(struct rte_pci_device *dev)
+pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
 {
return true;
 }

Applied with above fixes, thanks.
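
The check described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the applied patch: the names parse_cap(), iommu_supports_va(), CAP_MGAW_SHIFT, CAP_MGAW_MASK and VA_WIDTH are made up for the example, and the layout (MGAW in bits 21:16 of the CAP register, stored as width minus one) is an assumption taken from the VT-d specification. The fix applied in this thread uses PRIx64; the sketch uses SCNx64, which is the macro <inttypes.h> defines for the scanf family, and it is portable to 32-bit builds in the same way.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: MGAW occupies bits 21:16 of the VT-d CAP register. */
#define CAP_MGAW_SHIFT 16
#define CAP_MGAW_MASK  (UINT64_C(0x3f) << CAP_MGAW_SHIFT)
/* User-space virtual addresses use up to 47 bits on x86_64. */
#define VA_WIDTH 47

/* Parse a hexadecimal register dump; returns 0 on success, -1 otherwise. */
static int
parse_cap(const char *text, uint64_t *cap)
{
	/* SCNx64 expands to the right conversion on 32-bit and 64-bit. */
	return sscanf(text, "%" SCNx64, cap) == 1 ? 0 : -1;
}

/* VA as IOVA is only usable if the IOMMU covers the whole VA range. */
static bool
iommu_supports_va(uint64_t cap)
{
	/* The field stores (address width - 1). */
	unsigned int mgaw =
		(unsigned int)((cap & CAP_MGAW_MASK) >> CAP_MGAW_SHIFT) + 1;

	return mgaw >= VA_WIDTH;
}
```

With this layout, a register whose MGAW field encodes 39 bits is rejected and one encoding 48 bits is accepted; a parse failure would be handled by the caller as "unknown", which the commit message also treats as not supporting VA.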


Re: [dpdk-dev] [PATCH] virtio: add new driver for crypto devices

2018-01-20 Thread Thomas Monjalon
Hi,

28/11/2017 02:27, Jay Zhou:
> For DPDK, I'm a newbie. Thanks for testing and pointing these steps
> out, will fix them in V2.

Any news about this work?

I see there is also a patch from Fan Zhang to support crypto in vhost-user.
Do you work together?


Re: [dpdk-dev] [PATCH 00/12] lib/librte_vhost: introduce new vhost_user crypto

2018-01-20 Thread Thomas Monjalon
18/01/2018 15:59, Yuanhan Liu:
> On Mon, Nov 27, 2017 at 08:01:03PM +, Fan Zhang wrote:
> > This patchset adds crypto backend support to the vhost_user library,
> > including a proof-of-concept sample application. The implementation
> > follows the virtio-crypto specification and has been tested
> > with qemu 2.9.50 (with several patches applied, detailed later)
> > with Fedora 24 running in the frontend.
> > 
> > The vhost_user library acts as a "bridge" that translates
> > the virtio-crypto requests to DPDK crypto operations, so it
> > is a purely software implementation. However, it does require the user
> > to provide the DPDK Cryptodev ID so it knows how to handle the
> > virtio-crypto session creation and deletion messages.
> > 
> > Currently the implementation supports AES-CBC-128 and HMAC-SHA1
> > cipher only/chaining modes and does not support sessionless mode
> > yet. The guest can use the standard virtio-crypto driver to set up
> > a session and send encryption/decryption requests to the backend. The
> > vhost-crypto sample application provided in this patchset will
> > do the actual crypto work.
> > 
> > To make this patchset work, a few tweaks need to be done:
> > 
> > In the host:
> > 1. Download the qemu source code, and apply the patches in:
> > list.nongnu.org/archive/html/qemu-devel/2017-07/msg04664.html.
> 
> I could not open it. What's the status of them now? Have they got
> merged?

As usual, we must wait to have Qemu support ready.

How is this work related to drivers/crypto/virtio/ proposed
by Jay Zhou (Huawei)?



Re: [dpdk-dev] [PATCH] virtio: add new driver for crypto devices

2018-01-20 Thread Thomas Monjalon
+Cc Pablo, maintainer of the crypto tree

20/01/2018 16:50, Thomas Monjalon:
> Hi,
> 
> 28/11/2017 02:27, Jay Zhou:
> > For DPDK, I'm a newbie. Thanks for testing and pointing these steps
> > out, will fix them in V2.
> 
> Any news about this work?
> 
> I see there is also a patch from Fan Zhang to support crypto in vhost-user.
> Do you work together?



Re: [dpdk-dev] [PATCH v5 0/6] TAP RSS eBPF cover letter

2018-01-20 Thread Ferruh Yigit
On 1/19/2018 6:48 AM, Pascal Mazon wrote:
> Hi,
> 
> It seems more logical to me to introduce tap_program (patch 3) before
> its compiled version (patch 2).
> Source code is indeed written down before compiling it.
> 
> The doc section is a good addition.
> I'll be happy to see the upcoming utility for turning eBPF bytecode to C
> arrays.
> I'd have liked to see automation code (in a not-executed Makefile target
> typically) for generating the bytecode.
> I'm being told it should happen in the upcoming series along with the
> aforementioned utility.
> 
> Otherwise code looks good enough (I couldn't see everything for lack of
> time), considering that later patches are expected in next release.
> 
> Acked-by: Pascal Mazon 
> 
> Best regards,
> Pascal
> 
> On 18/01/2018 14:38, Ophir Munk wrote:
>> The patches of TAP RSS eBPF follow the RFC on this issue
>> https://dpdk.org/dev/patchwork/patch/31781/
>>
>> v5 changes with respect to v4
>> =
>> Update TAP document guide with RSS
>>
>> v4 changes with respect to v3
>> =
>> * Code updates based on review comments
>> * New commits organization (2-->5) based on review comments
>>   1. net/tap: support actions for different classifiers (preparations for BPF.
>>      No BPF code yet)
>>   2. net/tap: add eBPF bytes code (BPF bytes code in a separate file)
>>   3. net/tap: add eBPF program file (Program source code of bytes code)
>>   4. net/tap: add eBPF API (BPF API to be used by TAP)
>>   5. net/tap: implement TAP RSS using eBPF
>>
>> v3 changes with respect to v2
>> =
>> * Add support for IPv6 RSS in BPF program
>> * Bug fixes
>> * Updated compatibility to kernel versions:
>>   eBPF requires Linux version 4.9 configured with BPF
>> * New license header (SPDX) for newly added files
>>
>> v2 changes with respect to v1
>> =
>> * v2 has new commits organization (3 --> 2)
>> * BPF program was revised. It is successfully tested on
>>   IPv4 L3 L4 layers (compatible to mlx4 device)
>> * Licensing: no comments received for using "Dual BSD/GPL"
>>   string during BPF program loading to the kernel.
>>   (v1 and v2 are using the same license strings)
>>   Any comments are welcome.
>> * Compatibility to kernel versions:
>>   eBPF requires Linux version 4.2 configured with BPF. TAP PMD will
>>   successfully compile on systems with old or non-BPF configured kernels.
>>   During compilation time the required Linux headers are searched for.
>>   If they are not present missing definitions are locally added
>>   (tap_autoconf.h).
>>   If the kernel cannot support a BPF operation - at runtime it will
>>   gracefully reject the netlink message (with BPF) sent to it.
>>
>> Ophir Munk (6):
>>   net/tap: support actions for different classifiers
>>   net/tap: add eBPF bytes code
>>   net/tap: add eBPF program file
>>   net/tap: add eBPF API
>>   net/tap: implement TAP RSS using eBPF
>>   doc: detail new tap RSS feature in guides

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v6] arch/arm: optimization for memcpy on ARM64

2018-01-20 Thread Thomas Monjalon
19/01/2018 07:10, Herbert Guan:
> This patch provides an option to do rte_memcpy() using 'restrict'
> qualifier, which can induce GCC to do optimizations by using more
> efficient instructions, providing some performance gain over memcpy()
> on some ARM64 platforms/environments.
> 
> The memory copy performance differs between different ARM64
> platforms. And a more recent glibc (e.g. 2.23 or later)
> can provide a better memcpy() performance compared to old glibc
> versions. It's always suggested to use a more recent glibc if
> possible, from which the entire system can get benefit. If for some
> reason an old glibc has to be used, this patch is provided for an
> alternative.
> 
> This implementation can improve memory copy on some ARM64
> platforms, when an old glibc (e.g. 2.19, 2.17...) is being used.
> It is disabled by default and needs "RTE_ARCH_ARM64_MEMCPY"
> defined to activate. It does not always provide better performance
> than memcpy(), so users need to run the DPDK unit test
> "memcpy_perf_autotest" and customize parameters in "customization
> section" in rte_memcpy_64.h for best performance.
> 
> Compiler version will also impact the rte_memcpy() performance.
> It's observed on some platforms and with the same code, GCC 7.2.0
> compiled binary can provide better performance than GCC 4.8.5. It's
> suggested to use GCC 5.4.0 or later.
> 
> Signed-off-by: Herbert Guan 
> Acked-by: Jerin Jacob 

Applied, thanks
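
As a rough illustration of the 'restrict' idea in the commit message (not the rte_memcpy() implementation from the patch), the sketch below shows a byte copy whose pointers are declared non-overlapping. The function name copy_bytes() is hypothetical. Declaring both pointers restrict tells the compiler the buffers do not alias, which can allow it to emit wider load/store instructions; whether that helps depends on the platform, compiler and glibc, as the message notes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy n bytes from src to dst.
 * 'restrict' is a promise by the caller that the two buffers do not
 * overlap; passing overlapping buffers is undefined behaviour.
 */
static inline void
copy_bytes(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}
```

This needs C99 or later for the restrict keyword.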



Re: [dpdk-dev] [PATCH] eal/common: better likely() and unlikely()

2018-01-20 Thread Thomas Monjalon
19/11/2017 23:16, Aleksey Baulin:
> A warning is issued when using an argument to likely() or unlikely()
> builtins which is evaluated to a pointer value, as __builtin_expect()
> expects a 'long int' type for its first argument. With this fix
> a pointer value is converted to an integer with the value of 0 or 1.
> 
> Signed-off-by: Aleksey Baulin 

After philosophical debates,
Applied, thanks :)
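
The conversion described in the commit message can be sketched like this. The macro names my_likely() and my_unlikely() and the helper first_char_or() are made up for the example so they do not clash with the real DPDK macros; the sketch assumes a GCC or Clang compiler, since __builtin_expect() is a compiler builtin. The double negation turns any scalar, including a pointer, into 0 or 1 before it reaches the builtin.

```c
#include <stddef.h>

/* !!(x) maps any non-zero or non-NULL value to 1, everything else to 0. */
#define my_likely(x)   __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: return the first character of s, or fallback. */
static int
first_char_or(const char *s, int fallback)
{
	/* A bare pointer is accepted here because of the !! conversion. */
	if (my_likely(s))
		return s[0];

	return fallback;
}
```

The hint only affects branch layout; the value of the condition is unchanged.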



Re: [dpdk-dev] [PATCH v3 2/7] ethdev: fix used portid allocation

2018-01-20 Thread Matan Azrad
Hi Konstantin

From: Ananyev, Konstantin, Friday, January 19, 2018 2:40 PM
> > -Original Message-
> > From: Matan Azrad [mailto:ma...@mellanox.com]
> > Sent: Thursday, January 18, 2018 4:35 PM
> > To: Thomas Monjalon ; Gaetan Rivet
> > ; Wu, Jingjing 
> > Cc: dev@dpdk.org; Neil Horman ; Richardson,
> > Bruce ; Ananyev, Konstantin
> > ; sta...@dpdk.org
> > Subject: [PATCH v3 2/7] ethdev: fix used portid allocation
> >
> > rte_eth_dev_find_free_port() found a free port by state checking.
> > The state field is in local process memory, so other DPDK processes
> > may get the same port ID because their local states may be different.
> >
> > Replace the state checking by the ethdev port name checking, so, if
> > the name is an empty string the port ID will be detected as unused.
> >
> > Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple
> > process model")
> > Cc: sta...@dpdk.org
> >
> > Suggested-by: Konstantin Ananyev 
> > Signed-off-by: Matan Azrad 
> > ---
> >  lib/librte_ether/rte_ethdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c index 156231c..5d87f72 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -164,7 +164,7 @@ struct rte_eth_dev *
> > unsigned i;
> >
> > for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > -   if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED)
> > +   if (rte_eth_dev_share_data->data[i].name[0] == '\0')
> 
> I know it is not really necessary, but I'd keep both (just in case):
> if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED &&
> rte_eth_dev_share_data->data[i].name[0] == '\0')
> 
Since, like you, I don't think it is necessary (I searched again and didn't
find a reason for it), what about
RTE_ASSERT(rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED);
instead?

> Apart from that: Acked-by: Konstantin Ananyev
> 
> 
> > return i;
> > }
> > return RTE_MAX_ETHPORTS;
> > --
> > 1.8.3.1
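
The lookup discussed in this thread can be sketched in isolation as below. This is a simplified stand-in, not the ethdev code: the types port_data and shared_data, the constants MAX_PORTS and NAME_LEN, and the function find_free_port() are all hypothetical. The point it illustrates is that the free-slot test reads the name stored in the data area shared between processes, so every process sees the same answer, unlike a per-process state field.

```c
#include <stdint.h>

#define MAX_PORTS 4
#define NAME_LEN  64

/* Per-port record that would live in memory shared by all processes. */
struct port_data {
	char name[NAME_LEN];
};

struct shared_data {
	struct port_data data[MAX_PORTS];
};

/* Return the first port whose name is empty, or MAX_PORTS if none. */
static uint16_t
find_free_port(const struct shared_data *shared)
{
	uint16_t i;

	for (i = 0; i < MAX_PORTS; i++) {
		/* An empty name marks the slot as unused. */
		if (shared->data[i].name[0] == '\0')
			return i;
	}

	return MAX_PORTS;
}
```

A caller would claim the slot by writing a non-empty name into it, ideally under whatever lock protects the shared area.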



[dpdk-dev] [PATCH v4] config: sort PMD config options

2018-01-20 Thread Ferruh Yigit
No config option changed, added or removed.
Only reshuffle PMD config options, mostly to show new PMDs where to put
their new config option.

Ordered as physical, paravirtual and virtual groups. Alphabetical order
within a group.

Also tried to group vendor devices together which breaks alphabetical
order in some places.

Signed-off-by: Ferruh Yigit 
---

v2: rebased
v3: rebased
v4: rebased
---
 config/common_base | 208 ++---
 1 file changed, 104 insertions(+), 104 deletions(-)

diff --git a/config/common_base b/config/common_base
index 913af51b0..9ab176766 100644
--- a/config/common_base
+++ b/config/common_base
@@ -162,6 +162,67 @@ CONFIG_RTE_LIBRTE_PCI_BUS=y
 #
 CONFIG_RTE_LIBRTE_VDEV_BUS=y
 
+#
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_PAD_TX=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+#
+# Compile burst-oriented Broadcom PMD driver
+#
+CONFIG_RTE_LIBRTE_BNX2X_PMD=n
+CONFIG_RTE_LIBRTE_BNX2X_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_BNX2X_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_BNX2X_MF_SUPPORT=n
+CONFIG_RTE_LIBRTE_BNX2X_DEBUG_PERIODIC=n
+
+#
+# Compile burst-oriented Broadcom BNXT PMD driver
+#
+CONFIG_RTE_LIBRTE_BNXT_PMD=y
+
+#
+# Compile burst-oriented Chelsio Terminator (CXGBE) PMD
+#
+CONFIG_RTE_LIBRTE_CXGBE_PMD=y
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_CXGBE_TPUT=y
+
+# NXP DPAA Bus
+CONFIG_RTE_LIBRTE_DPAA_BUS=n
+CONFIG_RTE_LIBRTE_DPAA_MEMPOOL=n
+CONFIG_RTE_LIBRTE_DPAA_PMD=n
+
+#
+# Compile NXP DPAA2 FSL-MC Bus
+#
+CONFIG_RTE_LIBRTE_FSLMC_BUS=n
+
+#
+# Compile Support Libraries for NXP DPAA2
+#
+CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL=n
+CONFIG_RTE_LIBRTE_DPAA2_USE_PHYS_IOVA=y
+
+#
+# Compile burst-oriented NXP DPAA2 PMD driver
+#
+CONFIG_RTE_LIBRTE_DPAA2_PMD=n
+CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_DPAA2_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_DPAA2_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_DPAA2_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_DPAA2_DEBUG_TX_FREE=n
+
 #
 # Compile burst-oriented Amazon ENA PMD driver
 #
@@ -171,6 +232,11 @@ CONFIG_RTE_LIBRTE_ENA_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_ENA_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_ENA_COM_DEBUG=n
 
+#
+# Compile burst-oriented Cisco ENIC PMD driver
+#
+CONFIG_RTE_LIBRTE_ENIC_PMD=y
+
 #
 # Compile burst-oriented IGB & EM PMD drivers
 #
@@ -241,31 +307,6 @@ CONFIG_RTE_LIBRTE_MLX5_PMD=n
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
 
-#
-# Compile burst-oriented Broadcom PMD driver
-#
-CONFIG_RTE_LIBRTE_BNX2X_PMD=n
-CONFIG_RTE_LIBRTE_BNX2X_DEBUG_RX=n
-CONFIG_RTE_LIBRTE_BNX2X_DEBUG_TX=n
-CONFIG_RTE_LIBRTE_BNX2X_MF_SUPPORT=n
-CONFIG_RTE_LIBRTE_BNX2X_DEBUG_PERIODIC=n
-
-#
-# Compile burst-oriented Chelsio Terminator (CXGBE) PMD
-#
-CONFIG_RTE_LIBRTE_CXGBE_PMD=y
-CONFIG_RTE_LIBRTE_CXGBE_DEBUG=n
-CONFIG_RTE_LIBRTE_CXGBE_DEBUG_REG=n
-CONFIG_RTE_LIBRTE_CXGBE_DEBUG_MBOX=n
-CONFIG_RTE_LIBRTE_CXGBE_DEBUG_TX=n
-CONFIG_RTE_LIBRTE_CXGBE_DEBUG_RX=n
-CONFIG_RTE_LIBRTE_CXGBE_TPUT=y
-
-#
-# Compile burst-oriented Cisco ENIC PMD driver
-#
-CONFIG_RTE_LIBRTE_ENIC_PMD=y
-
 #
 # Compile burst-oriented Netronome NFP PMD driver
 #
@@ -273,20 +314,15 @@ CONFIG_RTE_LIBRTE_NFP_PMD=n
 CONFIG_RTE_LIBRTE_NFP_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_NFP_DEBUG_RX=n
 
+# QLogic 10G/25G/40G/50G/100G PMD
 #
-# Compile Marvell PMD driver
-#
-CONFIG_RTE_LIBRTE_MRVL_PMD=n
-
-#
-# Compile virtual device driver for NetVSC on Hyper-V/Azure
-#
-CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n
-
-#
-# Compile burst-oriented Broadcom BNXT PMD driver
-#
-CONFIG_RTE_LIBRTE_BNXT_PMD=y
+CONFIG_RTE_LIBRTE_QEDE_PMD=y
+CONFIG_RTE_LIBRTE_QEDE_DEBUG_INFO=n
+CONFIG_RTE_LIBRTE_QEDE_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_QEDE_DEBUG_RX=n
+#Provides abs path/name of the firmware file.
+#Empty string denotes driver will use default firmware
+CONFIG_RTE_LIBRTE_QEDE_FW=""
 
 #
 # Compile burst-oriented Solarflare libefx-based PMD
@@ -294,11 +330,6 @@ CONFIG_RTE_LIBRTE_BNXT_PMD=y
 CONFIG_RTE_LIBRTE_SFC_EFX_PMD=y
 CONFIG_RTE_LIBRTE_SFC_EFX_DEBUG=n
 
-#
-# Compile SOFTNIC PMD
-#
-CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
-
 #
 # Compile software PMD backed by SZEDATA2 device
 #
@@ -325,36 +356,18 @@ CONFIG_RTE_LIBRTE_LIO_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_LIO_DEBUG_MBOX=n
 CONFIG_RTE_LIBRTE_LIO_DEBUG_REGS=n
 
-# NXP DPAA Bus
-CONFIG_RTE_LIBRTE_DPAA_BUS=n
-CONFIG_RTE_LIBRTE_DPAA_MEMPOOL=n
-CONFIG_RTE_LIBRTE_DPAA_PMD=n
-
 #
 # Compile burst-oriented Cavium OCTEONTX network PMD driver
 #
 CONFIG_RTE_LIBRTE_OCTEONTX_PMD=y
 
 #
-# Compile NXP DPAA2 FSL-MC Bus
-#
-CONFIG_RTE_LIBRTE_FSLMC_BUS=n
-
-#
-# Compile Support Libraries for NXP DPAA2
-#
-CONFIG_RTE_LIBRTE_DPAA2_MEMPOOL=n
-CONFIG_RTE_LIBRTE_DPAA2_USE_PHYS_IOVA=y
-
-#
-# Compile burst-oriented NXP DPAA2 PMD driver
+# Compile WRS accelerated virtual port (AVP) guest PMD driver
 #
-CONFIG_RTE_LIBRTE_D

[dpdk-dev] [PATCH v4 1/4] ethdev: separate driver APIs

2018-01-20 Thread Ferruh Yigit
Create a rte_ethdev_driver.h file and move PMD specific APIs here.
Drivers updated to include this new header file.

There is no update in header content and, since ethdev.h is included by
ethdev_driver.h, nothing changes from the driver point of view; this is only
a logical grouping of APIs. From the application point of view, they can't
access driver specific APIs anymore, and they shouldn't.

More PMD specific data structures still remain in ethdev.h because
inline functions in the header use them. Those will be handled separately.

Signed-off-by: Ferruh Yigit 
Acked-by: Shreyansh Jain 
Acked-by: Andrew Rybchenko 
---
v2: use SPDX header
v3: rebased on next-net
v4: rebased on next-net
---
 drivers/bus/dpaa/dpaa_bus.c |   2 +-
 drivers/bus/dpaa/include/fman.h |   2 +-
 drivers/bus/fslmc/fslmc_bus.c   |   2 +-
 drivers/bus/fslmc/fslmc_vfio.c  |   2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c|   2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpci.c|   2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c|   2 +-
 drivers/event/dpaa2/dpaa2_eventdev.c|   2 +-
 drivers/event/dpaa2/dpaa2_hw_dpcon.c|   2 +-
 drivers/event/octeontx/ssovf_evdev.c|   2 +-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c|   2 +-
 drivers/net/af_packet/rte_eth_af_packet.c   |   2 +-
 drivers/net/ark/ark_ethdev_rx.h |   2 +-
 drivers/net/ark/ark_ethdev_tx.h |   2 +-
 drivers/net/ark/ark_ext.h   |   2 +-
 drivers/net/ark/ark_global.h|   2 +-
 drivers/net/ark/ark_pktchkr.c   |   2 +-
 drivers/net/ark/ark_pktgen.c|   2 +-
 drivers/net/avf/avf_ethdev.c|   2 +-
 drivers/net/avf/avf_rxtx.c  |   2 +-
 drivers/net/avf/avf_rxtx_vec_common.h   |   2 +-
 drivers/net/avf/avf_rxtx_vec_sse.c  |   2 +-
 drivers/net/avf/avf_vchnl.c |   2 +-
 drivers/net/avp/avp_ethdev.c|   2 +-
 drivers/net/bnx2x/bnx2x_ethdev.h|   2 +-
 drivers/net/bnxt/bnxt.h |   2 +-
 drivers/net/bnxt/bnxt_ethdev.c  |   2 +-
 drivers/net/bnxt/bnxt_stats.h   |   2 +-
 drivers/net/bnxt/rte_pmd_bnxt.c |   2 +-
 drivers/net/bnxt/rte_pmd_bnxt.h |   2 +-
 drivers/net/bonding/rte_eth_bond_api.c  |   2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c  |   2 +-
 drivers/net/bonding/rte_eth_bond_private.h  |   2 +-
 drivers/net/cxgbe/base/t4_hw.c  |   2 +-
 drivers/net/cxgbe/cxgbe_ethdev.c|   2 +-
 drivers/net/cxgbe/cxgbe_main.c  |   2 +-
 drivers/net/cxgbe/sge.c |   2 +-
 drivers/net/dpaa/dpaa_ethdev.c  |   2 +-
 drivers/net/dpaa/dpaa_ethdev.h  |   2 +-
 drivers/net/dpaa/dpaa_rxtx.c|   2 +-
 drivers/net/dpaa/rte_pmd_dpaa.h |   2 +-
 drivers/net/dpaa2/base/dpaa2_hw_dpni.c  |   2 +-
 drivers/net/dpaa2/dpaa2_ethdev.c|   2 +-
 drivers/net/dpaa2/dpaa2_rxtx.c  |   2 +-
 drivers/net/e1000/em_ethdev.c   |   2 +-
 drivers/net/e1000/em_rxtx.c |   2 +-
 drivers/net/e1000/igb_ethdev.c  |   2 +-
 drivers/net/e1000/igb_flow.c|   2 +-
 drivers/net/e1000/igb_pf.c  |   2 +-
 drivers/net/e1000/igb_rxtx.c|   2 +-
 drivers/net/ena/ena_ethdev.c|   2 +-
 drivers/net/enic/enic_clsf.c|   2 +-
 drivers/net/enic/enic_ethdev.c  |   2 +-
 drivers/net/enic/enic_flow.c|   2 +-
 drivers/net/enic/enic_main.c|   2 +-
 drivers/net/enic/enic_res.c |   2 +-
 drivers/net/enic/enic_rxtx.c|   2 +-
 drivers/net/failsafe/failsafe.c |   2 +-
 drivers/net/failsafe/failsafe_ops.c |   2 +-
 drivers/net/failsafe/failsafe_private.h |   2 +-
 drivers/net/failsafe/failsafe_rxtx.c|   2 +-
 drivers/net/fm10k/fm10k_ethdev.c|   2 +-
 drivers/net/fm10k/fm10k_rxtx.c  |   2 +-
 drivers/net/fm10k/fm10k_rxtx_vec.c  |   2 +-
 drivers/net/i40e/i40e_ethdev.c  |   2 +-
 drivers/net/i40e/i40e_ethdev_vf.c   |   2 +-
 drivers/net/i40e/i40e_fdir.c|   2 +-
 drivers/net/i40e/i40e_flow.c|   2 +-
 drivers/net/i40e/i40e_pf.c  |   2 +-
 drivers/net/i40e/i40e_rxtx.c|   2 +-
 drivers/net/i40e/i40e_rxtx_vec_altivec.c|   2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c   |   2 +-
 drivers/net/i40e/i40e_rxtx_vec_common.h |   2 +-
 drivers/net/i40e/i40e_rxtx_vec_neon.c   |   2 +-
 drivers/net/i40e/i40e_rx

[dpdk-dev] [PATCH v4 2/4] ethdev: separate internal structures into own header

2018-01-20 Thread Ferruh Yigit
rte_ethdev_core.h created. Internal data structures are moved here.

These structures are mostly intended to be used by drivers, but they
need to be in the public header file because of the inline functions
in the ethdev.h header, and those inline functions are preferred to be
kept because of performance concerns.

The accessibility of the data structures is not changed; they are only logically
grouped to show that they are not intended to be used by applications.

Signed-off-by: Ferruh Yigit 
---
 lib/librte_ether/Makefile  |   1 +
 lib/librte_ether/rte_ethdev.h  | 579 +--
 lib/librte_ether/rte_ethdev_core.h | 602 +
 3 files changed, 604 insertions(+), 578 deletions(-)
 create mode 100644 lib/librte_ether/rte_ethdev_core.h

diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 48d84f445..34d014e71 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -28,6 +28,7 @@ SRCS-y += ethdev_profile.c
 #
 SYMLINK-y-include += rte_ethdev.h
 SYMLINK-y-include += rte_ethdev_driver.h
+SYMLINK-y-include += rte_ethdev_core.h
 SYMLINK-y-include += rte_ethdev_pci.h
 SYMLINK-y-include += rte_ethdev_vdev.h
 SYMLINK-y-include += rte_eth_ctrl.h
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 523809c3a..80a9ce6fc 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1153,478 +1153,6 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 /**< l2 tunnel forwarding mask */
 #define ETH_L2_TUNNEL_FORWARDING_MASK   0x0008
 
-/*
- * Definitions of all functions exported by an Ethernet driver through the
- * the generic structure of type *eth_dev_ops* supplied in the *rte_eth_dev*
- * structure associated with an Ethernet device.
- */
-
-typedef int  (*eth_dev_configure_t)(struct rte_eth_dev *dev);
-/**< @internal Ethernet device configuration. */
-
-typedef int  (*eth_dev_start_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to start a configured Ethernet device. */
-
-typedef void (*eth_dev_stop_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to stop a configured Ethernet device. */
-
-typedef int  (*eth_dev_set_link_up_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to link up a configured Ethernet device. */
-
-typedef int  (*eth_dev_set_link_down_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to link down a configured Ethernet device. */
-
-typedef void (*eth_dev_close_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to close a configured Ethernet device. */
-
-typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
-/** <@internal Function used to reset a configured Ethernet device. */
-
-typedef void (*eth_promiscuous_enable_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to enable the RX promiscuous mode of an Ethernet device. */
-
-typedef void (*eth_promiscuous_disable_t)(struct rte_eth_dev *dev);
-/**< @internal Function used to disable the RX promiscuous mode of an Ethernet device. */
-
-typedef void (*eth_allmulticast_enable_t)(struct rte_eth_dev *dev);
-/**< @internal Enable the receipt of all multicast packets by an Ethernet device. */
-
-typedef void (*eth_allmulticast_disable_t)(struct rte_eth_dev *dev);
-/**< @internal Disable the receipt of all multicast packets by an Ethernet device. */
-
-typedef int (*eth_link_update_t)(struct rte_eth_dev *dev,
-   int wait_to_complete);
-/**< @internal Get link speed, duplex mode and state (up/down) of an Ethernet device. */
-
-typedef int (*eth_stats_get_t)(struct rte_eth_dev *dev,
-   struct rte_eth_stats *igb_stats);
-/**< @internal Get global I/O statistics of an Ethernet device. */
-
-typedef void (*eth_stats_reset_t)(struct rte_eth_dev *dev);
-/**< @internal Reset global I/O statistics of an Ethernet device to 0. */
-
-typedef int (*eth_xstats_get_t)(struct rte_eth_dev *dev,
-   struct rte_eth_xstat *stats, unsigned n);
-/**< @internal Get extended stats of an Ethernet device. */
-
-typedef int (*eth_xstats_get_by_id_t)(struct rte_eth_dev *dev,
- const uint64_t *ids,
- uint64_t *values,
- unsigned int n);
-/**< @internal Get extended stats of an Ethernet device. */
-
-typedef void (*eth_xstats_reset_t)(struct rte_eth_dev *dev);
-/**< @internal Reset extended stats of an Ethernet device. */
-
-typedef int (*eth_xstats_get_names_t)(struct rte_eth_dev *dev,
-   struct rte_eth_xstat_name *xstats_names, unsigned size);
-/**< @internal Get names of extended stats of an Ethernet device. */
-
-typedef int (*eth_xstats_get_names_by_id_t)(struct rte_eth_dev *dev,
-   struct rte_eth_xstat_name *xstats_names, const uint64_t *ids,
-   unsigned int size);
-/**< @internal Get names of extended stats of an Ethernet device. */
-
-typedef int (*eth_queue_stats_mapp

[dpdk-dev] [PATCH v4 3/4] ethdev: reorder inline functions

2018-01-20 Thread Ferruh Yigit
Move all inline functions to the end of the ethdev.h header file and move
the ethdev_core.h include to just before the inline functions.

Since inline functions need the data structures in ethdev_core.h, this
reorder groups them together and makes it clear where to put further inline
functions.

Signed-off-by: Ferruh Yigit 
---
 lib/librte_ether/rte_ethdev.h | 2694 +
 1 file changed, 1348 insertions(+), 1346 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 80a9ce6fc..2c90175c6 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1221,8 +1221,6 @@ struct rte_eth_dev_sriov {
 
 #define RTE_ETH_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
 
-#include <rte_ethdev_core.h>
-
 /** Device supports link state interrupt */
 #define RTE_ETH_DEV_INTR_LSC 0x0002
 /** Device is a bonded slave */
@@ -2167,1784 +2165,1788 @@ int rte_eth_dev_get_vlan_offload(uint16_t port_id);
  */
 int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
 
+typedef void (*buffer_tx_error_fn)(struct rte_mbuf **unsent, uint16_t count,
+   void *userdata);
+
 /**
+ * Structure used to buffer packets for future TX
+ * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush
+ */
+struct rte_eth_dev_tx_buffer {
+   buffer_tx_error_fn error_callback;
+   void *error_userdata;
+   uint16_t size;   /**< Size of buffer for buffered tx */
+   uint16_t length; /**< Number of packets in the array */
+   struct rte_mbuf *pkts[];
+   /**< Pending packets to be sent on explicit flush or when full */
+};
+
+/**
+ * Calculate the size of the tx buffer.
  *
- * Retrieve a burst of input packets from a receive queue of an Ethernet
- * device. The retrieved packets are stored in *rte_mbuf* structures whose
- * pointers are supplied in the *rx_pkts* array.
- *
- * The rte_eth_rx_burst() function loops, parsing the RX ring of the
- * receive queue, up to *nb_pkts* packets, and for each completed RX
- * descriptor in the ring, it performs the following operations:
+ * @param sz
+ *   Number of stored packets.
+ */
+#define RTE_ETH_TX_BUFFER_SIZE(sz) \
+   (sizeof(struct rte_eth_dev_tx_buffer) + (sz) * sizeof(struct rte_mbuf *))
+
+/**
+ * Initialize default values for buffered transmitting
  *
- * - Initialize the *rte_mbuf* data structure associated with the
- *   RX descriptor according to the information provided by the NIC into
- *   that RX descriptor.
+ * @param buffer
+ *   Tx buffer to be initialized.
+ * @param size
+ *   Buffer size
+ * @return
+ *   0 if no error
+ */
+int
+rte_eth_tx_buffer_init(struct rte_eth_dev_tx_buffer *buffer, uint16_t size);
+
+/**
+ * Configure a callback for buffered packets which cannot be sent
  *
- * - Store the *rte_mbuf* data structure into the next entry of the
- *   *rx_pkts* array.
+ * Register a specific callback to be called when an attempt is made to send
+ * all packets buffered on an ethernet port, but not all packets can
+ * successfully be sent. The callback registered here will be called only
+ * from calls to rte_eth_tx_buffer() and rte_eth_tx_buffer_flush() APIs.
+ * The default callback configured for each queue by default just frees the
+ * packets back to the calling mempool. If additional behaviour is required,
+ * for example, to count dropped packets, or to retry transmission of packets
+ * which cannot be sent, this function should be used to register a suitable
+ * callback function to implement the desired behaviour.
+ * The example callback "rte_eth_count_unsent_packet_callback()" is also
+ * provided as reference.
  *
- * - Replenish the RX descriptor with a new *rte_mbuf* buffer
- *   allocated from the memory pool associated with the receive queue at
- *   initialization time.
+ * @param buffer
+ *   The port identifier of the Ethernet device.
+ * @param callback
+ *   The function to be used as the callback.
+ * @param userdata
+ *   Arbitrary parameter to be passed to the callback function
+ * @return
+ *   0 on success, or -1 on error with rte_errno set appropriately
+ */
+int
+rte_eth_tx_buffer_set_err_callback(struct rte_eth_dev_tx_buffer *buffer,
+   buffer_tx_error_fn callback, void *userdata);
+
+/**
+ * Callback function for silently dropping unsent buffered packets.
  *
- * When retrieving an input packet that was scattered by the controller
- * into multiple receive descriptors, the rte_eth_rx_burst() function
- * appends the associated *rte_mbuf* buffers to the first buffer of the
- * packet.
+ * This function can be passed to rte_eth_tx_buffer_set_err_callback() to
+ * adjust the default behavior when buffered packets cannot be sent. This
+ * function drops any unsent packets silently and is used by tx buffered
+ * operations as default behavior.
  *
- * The rte_eth_rx_burst() function returns the number of packets
- * actually retrieved, which is the number of *rte_mbuf* data structures
- * effectively supplied into the *rx_pkts* arra
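
The RTE_ETH_TX_BUFFER_SIZE() macro in the hunk above sizes a header plus a
trailing array of mbuf pointers. A minimal standalone sketch of that layout
follows. It does not use the DPDK headers: struct tx_buffer, its size/length
fields, TX_BUFFER_SIZE() and tx_buffer_create() are hypothetical stand-ins,
and only the sizing arithmetic is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct rte_mbuf; only pointers to it are stored. */
struct mbuf;

/* Hypothetical mirror of the buffer layout: a small header followed by a
 * flexible array of packet pointers. The header fields are assumptions. */
struct tx_buffer {
	uint16_t size;        /* capacity, in packets */
	uint16_t length;      /* packets currently buffered */
	struct mbuf *pkts[];  /* flexible array member */
};

/* Same arithmetic as RTE_ETH_TX_BUFFER_SIZE(sz) in the patch. */
#define TX_BUFFER_SIZE(sz) \
	(sizeof(struct tx_buffer) + (sz) * sizeof(struct mbuf *))

/* Allocate a zeroed buffer able to hold 'size' packet pointers. */
static struct tx_buffer *
tx_buffer_create(uint16_t size)
{
	struct tx_buffer *b;

	assert(size > 0);
	b = calloc(1, TX_BUFFER_SIZE(size));
	if (b != NULL)
		b->size = size;
	return b;
}
```

The header and the pointer array live in one allocation, so a single free()
releases the whole buffer.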

[dpdk-dev] [PATCH v4 4/4] ethdev: rename function parameter for consistency

2018-01-20 Thread Ferruh Yigit
Update "port" function argument variable to "port_id" in public
header to be consistent in all APIs.

No functional change.

Signed-off-by: Ferruh Yigit 
---
 lib/librte_ether/rte_ethdev.h | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2c90175c6..6b60262bb 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1159,7 +1159,7 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
  * The callback function is called on RX with a burst of packets that have
  * been received on the given port and queue.
  *
- * @param port
+ * @param port_id
  *   The Ethernet port on which RX is being performed.
  * @param queue
  *   The queue on the Ethernet port which is being used to receive the packets.
@@ -1175,7 +1175,7 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
  * @return
  *   The number of packets returned to the user.
  */
-typedef uint16_t (*rte_rx_callback_fn)(uint16_t port, uint16_t queue,
+typedef uint16_t (*rte_rx_callback_fn)(uint16_t port_id, uint16_t queue,
struct rte_mbuf *pkts[], uint16_t nb_pkts, uint16_t max_pkts,
void *user_param);
 
@@ -1185,7 +1185,7 @@ typedef uint16_t (*rte_rx_callback_fn)(uint16_t port, uint16_t queue,
  * The callback function is called on TX with a burst of packets immediately
  * before the packets are put onto the hardware queue for transmission.
  *
- * @param port
+ * @param port_id
  *   The Ethernet port on which TX is being performed.
  * @param queue
  *   The queue on the Ethernet port which is being used to transmit the packets.
@@ -1199,7 +1199,7 @@ typedef uint16_t (*rte_rx_callback_fn)(uint16_t port, uint16_t queue,
  * @return
  *   The number of packets to be written to the NIC.
  */
-typedef uint16_t (*rte_tx_callback_fn)(uint16_t port, uint16_t queue,
+typedef uint16_t (*rte_tx_callback_fn)(uint16_t port_id, uint16_t queue,
struct rte_mbuf *pkts[], uint16_t nb_pkts, void *user_param);
 
 /**
@@ -2546,7 +2546,7 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
  * Add a MAC address to an internal array of addresses used to enable whitelist
  * filtering to accept packets only if the destination MAC address matches.
  *
- * @param port
+ * @param port_id
  *   The port identifier of the Ethernet device.
  * @param mac_addr
  *   The MAC address to add.
@@ -2554,19 +2554,19 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
  *   VMDq pool index to associate address with (if VMDq is enabled). If VMDq is
  *   not enabled, this should be set to 0.
  * @return
- *   - (0) if successfully added or *mac_addr" was already added.
+ *   - (0) if successfully added or *mac_addr* was already added.
  *   - (-ENOTSUP) if hardware doesn't support this feature.
  *   - (-ENODEV) if *port* is invalid.
  *   - (-ENOSPC) if no more MAC addresses can be added.
  *   - (-EINVAL) if MAC address is invalid.
  */
-int rte_eth_dev_mac_addr_add(uint16_t port, struct ether_addr *mac_addr,
+int rte_eth_dev_mac_addr_add(uint16_t port_id, struct ether_addr *mac_addr,
uint32_t pool);
 
 /**
  * Remove a MAC address from the internal array of addresses.
  *
- * @param port
+ * @param port_id
  *   The port identifier of the Ethernet device.
  * @param mac_addr
  *   MAC address to remove.
@@ -2576,12 +2576,12 @@ int rte_eth_dev_mac_addr_add(uint16_t port, struct ether_addr *mac_addr,
  *   - (-ENODEV) if *port* invalid.
  *   - (-EADDRINUSE) if attempting to remove the default MAC address
  */
-int rte_eth_dev_mac_addr_remove(uint16_t port, struct ether_addr *mac_addr);
+int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct ether_addr *mac_addr);
 
 /**
  * Set the default MAC address.
  *
- * @param port
+ * @param port_id
  *   The port identifier of the Ethernet device.
  * @param mac_addr
  *   New default MAC address.
@@ -2591,13 +2591,13 @@ int rte_eth_dev_mac_addr_remove(uint16_t port, struct ether_addr *mac_addr);
  *   - (-ENODEV) if *port* invalid.
  *   - (-EINVAL) if MAC address is invalid.
  */
-int rte_eth_dev_default_mac_addr_set(uint16_t port,
+int rte_eth_dev_default_mac_addr_set(uint16_t port_id,
struct ether_addr *mac_addr);
 
 /**
  * Update Redirection Table(RETA) of Receive Side Scaling of Ethernet device.
  *
- * @param port
+ * @param port_id
  *   The port identifier of the Ethernet device.
  * @param reta_conf
  *   RETA to update.
@@ -2609,14 +2609,14 @@ int rte_eth_dev_default_mac_addr_set(uint16_t port,
  *   - (-ENOTSUP) if hardware doesn't support.
  *   - (-EINVAL) if bad parameter.
  */
-int rte_eth_dev_rss_reta_update(uint16_t port,
+int rte_eth_dev_rss_reta_update(uint16_t port_id,
struct rte_eth_rss_reta_entry64 *reta_conf,
uint16_t reta_size);
 
  /**
  * Query Redirection Table(RETA) of Receive Side Scaling of Ethernet de

Re: [dpdk-dev] [pull-request] next-crypto 18.02 rc1

2018-01-20 Thread Thomas Monjalon
20/01/2018 16:15, Pablo de Lara:
>   http://dpdk.org/git/next/dpdk-next-crypto 

There is a compilation error with clang:

drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h:46:4: fatal error:
use of undeclared identifier 'AES_CCM'



Re: [dpdk-dev] [PATCH v3 2/7] ethdev: fix used portid allocation

2018-01-20 Thread Ananyev, Konstantin
Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Friday, January 19, 2018 2:40 PM
> > > -Original Message-
> > > From: Matan Azrad [mailto:ma...@mellanox.com]
> > > Sent: Thursday, January 18, 2018 4:35 PM
> > > To: Thomas Monjalon ; Gaetan Rivet
> > > ; Wu, Jingjing 
> > > Cc: dev@dpdk.org; Neil Horman ; Richardson,
> > > Bruce ; Ananyev, Konstantin
> > > ; sta...@dpdk.org
> > > Subject: [PATCH v3 2/7] ethdev: fix used portid allocation
> > >
> > > rte_eth_dev_find_free_port() found a free port by state checking.
> > > The state field is in local process memory, so other DPDK processes
> > > may get the same port ID because their local states may be different.
> > >
> > > Replace the state checking by the ethdev port name checking, so, if
> > > the name is an empty string the port ID will be detected as unused.
> > >
> > > Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple
> > > process model")
> > > Cc: sta...@dpdk.org
> > >
> > > Suggested-by: Konstantin Ananyev 
> > > Signed-off-by: Matan Azrad 
> > > ---
> > >  lib/librte_ether/rte_ethdev.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > b/lib/librte_ether/rte_ethdev.c index 156231c..5d87f72 100644
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -164,7 +164,7 @@ struct rte_eth_dev *
> > >   unsigned i;
> > >
> > >   for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > - if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED)
> > > + if (rte_eth_dev_share_data->data[i].name[0] == '\0')
> >
> > I know it is not really necessary, but I'd keep both (just in case):
> > if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED &&
> > rte_eth_dev_share_data->data[i].name[0] == '\0')
> >
> Like you, I don't think it is necessary; I searched again and found no
> reason for it.
> What about
> RTE_ASSERT(rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED);
> instead?

Sounds ok to me.
Konstantin

> 
> > Apart from that: Acked-by: Konstantin Ananyev
> > 
> >
> > >   return i;
> > >   }
> > >   return RTE_MAX_ETHPORTS;
> > > --
> > > 1.8.3.1
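
The outcome of the exchange above (decide on the shared port name, and only
assert on the per-process state) can be sketched as follows. This is a
standalone model, not the ethdev code: MAX_PORTS, local_devs[], shared[] and
find_free_port() are hypothetical stand-ins for RTE_MAX_ETHPORTS,
rte_eth_devices[], the shared ethdev data and rte_eth_dev_find_free_port().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 4   /* stand-in for RTE_MAX_ETHPORTS */
#define NAME_LEN  64

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Per-process view of a port (stand-in for rte_eth_devices[]). */
struct local_dev {
	enum dev_state state;
};

/* Data visible to every process (stand-in for the shared ethdev data). */
struct shared_data {
	char name[NAME_LEN];
};

static struct local_dev local_devs[MAX_PORTS];
static struct shared_data shared[MAX_PORTS];

/* Return the first port whose shared name is empty, or MAX_PORTS if none. */
static uint16_t
find_free_port(void)
{
	unsigned int i;

	for (i = 0; i < MAX_PORTS; i++) {
		if (shared[i].name[0] == '\0') {
			/* The local state is expected to agree; this is a
			 * debug check, not part of the decision. */
			assert(local_devs[i].state == DEV_UNUSED);
			return (uint16_t)i;
		}
	}
	return MAX_PORTS;
}
```

A port named by another process is skipped even though the local state of
this process still says "unused", which is the mismatch the patch targets.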



Re: [dpdk-dev] [PATCH v3 7/7] app/testpmd: adjust ethdev port ownership

2018-01-20 Thread Matan Azrad
Hi Gaetan

From: Gaëtan Rivet, Friday, January 19, 2018 5:00 PM
> Hi Matan,
> 
> On Fri, Jan 19, 2018 at 01:35:10PM +, Matan Azrad wrote:
> > Hi Konstantin
> >
> > From: Ananyev, Konstantin, Friday, January 19, 2018 3:09 PM
> > > > -Original Message-
> > > > From: Matan Azrad [mailto:ma...@mellanox.com]
> > > > Sent: Friday, January 19, 2018 12:52 PM
> > > > To: Ananyev, Konstantin ; Thomas
> > > > Monjalon ; Gaetan Rivet
> > > ;
> > > > Wu, Jingjing 
> > > > Cc: dev@dpdk.org; Neil Horman ;
> Richardson,
> > > > Bruce 
> > > > Subject: RE: [PATCH v3 7/7] app/testpmd: adjust ethdev port
> > > > ownership
> > > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Friday, January 19, 2018 2:38 PM
> > > > > To: Matan Azrad ; Thomas Monjalon
> > > > > ; Gaetan Rivet ;
> > > Wu,
> > > > > Jingjing 
> > > > > Cc: dev@dpdk.org; Neil Horman ;
> > > > > Richardson, Bruce 
> > > > > Subject: RE: [PATCH v3 7/7] app/testpmd: adjust ethdev port
> > > > > ownership
> > > > >
> > > > > Hi Matan,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Matan Azrad [mailto:ma...@mellanox.com]
> > > > > > Sent: Thursday, January 18, 2018 4:35 PM
> > > > > > To: Thomas Monjalon ; Gaetan Rivet
> > > > > > ; Wu, Jingjing 
> > > > > > Cc: dev@dpdk.org; Neil Horman ;
> > > Richardson,
> > > > > > Bruce ; Ananyev, Konstantin
> > > > > > 
> > > > > > Subject: [PATCH v3 7/7] app/testpmd: adjust ethdev port
> > > > > > ownership
> > > > > >
> > > > > > Testpmd should not use ethdev ports which are managed by other
> > > > > > DPDK entities.
> > > > > >
> > > > > > Set Testpmd ownership to each port which is not used by other
> > > > > > entity and prevent any usage of ethdev ports which are not
> > > > > > owned by
> > > Testpmd.
> > > > > >
> > > > > > Signed-off-by: Matan Azrad 
> > > > > > ---
> > > > > >  app/test-pmd/cmdline.c  | 89 +++-
> 
> > > 
> > > > > -
> > > > > >  app/test-pmd/cmdline_flow.c |  2 +-
> > > > > >  app/test-pmd/config.c   | 37 ++-
> > > > > >  app/test-pmd/parameters.c   |  4 +-
> > > > > >  app/test-pmd/testpmd.c  | 63 --
> --
> > > > > >  app/test-pmd/testpmd.h  |  3 ++
> > > > > >  6 files changed, 103 insertions(+), 95 deletions(-)
> > > > > >
> > > > > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> > > > > > index
> > > > > > 31919ba..6199c64 100644
> > > > > > --- a/app/test-pmd/cmdline.c
> > > > > > +++ b/app/test-pmd/cmdline.c
> > > > > > @@ -1394,7 +1394,7 @@ struct cmd_config_speed_all {
> > > > > > &link_speed) < 0)
> > > > > > return;
> > > > > >
> > > > > > -   RTE_ETH_FOREACH_DEV(pid) {
> > > > > > +   RTE_ETH_FOREACH_DEV_OWNED_BY(pid, my_owner.id) {
> > > > >
> > > > > Why do we need all these changes?
> > > > > As I understand it, you changed the definition of RTE_ETH_FOREACH_DEV(),
> > > > > so now testpmd should work OK by default (no_owner case).
> > > > > Am I missing something here?
> > > >
> > > > Now, after Gaetan's suggestion, RTE_ETH_FOREACH_DEV(pid) will iterate
> > > > over all valid and ownerless ports.
> > >
> > > Yes.
> > >
> > > > Here Testpmd wants to iterate over its owned ports.
> > >
> > > Why? Why can't it just iterate over all valid and ownerless ports?
> > > As I understand it would be enough to fix current problems and would
> > > allow us to avoid any changes in testpmd (which I think is a good thing).
> >
> > Yes, I understand that this big change is daunting, but I think the many
> > current bugs in testpmd (regarding port ownership) are even more
> > daunting.
> >
> > Look,
> > Testpmd initializes some of its internal databases based on a specific
> > port iteration. At some point someone may take ownership of Testpmd
> > ports and testpmd will continue to touch them.
> >
> 
> If I look back on the fail-safe, its sole purpose is to have seamless hotplug
> with existing applications.
> 

Yes.

> Port ownership is a genericization of some functions introduced by the fail-
> safe, that could structure DPDK further.

Not only.
Port ownership is a new concept saying that not all the ports are only for the
application, and it clearly defines the new port usage synchronization rules.

It can be a solution for the fail-safe scenario, but it solves a big generic
problem regardless of fail-safe.

> It should allow applications to have a seamless integration with subsystems 
> using port ownership. Without this, port ownership cannot be used.

I do not think that is accurate.
We can use a different solution to solve the fail-safe case (seamlessly) by
using the DEFERRED state as you did.
Port ownership is not only for the fail-safe case - it is a generic new concept
which, BTW, can fix the fail-safe case (full fix).
So, an application should use port ownership regardless of fail-safe usage, just
to be sure no one touches its ports.

> Testpmd should be fixed, but follow the most common design patterns of
> DPDK applications. Going with port own
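
The two iteration styles debated in this thread (all valid ownerless ports
versus only the ports owned by the caller) can be sketched as follows. This is
a standalone model under assumptions: struct port, NO_OWNER,
find_next_owned_by() and the FOREACH_* macros are hypothetical stand-ins and do
not reproduce the ethdev implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 8
#define NO_OWNER  0   /* hypothetical "nobody owns this port" id */

struct port {
	int valid;          /* non-zero if the port exists */
	uint64_t owner_id;  /* NO_OWNER or the id of the owning entity */
};

static struct port ports[MAX_PORTS];

/* Return the first valid port >= port_id owned by owner_id, or MAX_PORTS. */
static uint16_t
find_next_owned_by(uint16_t port_id, uint64_t owner_id)
{
	assert(port_id <= MAX_PORTS);
	while (port_id < MAX_PORTS &&
	       (!ports[port_id].valid || ports[port_id].owner_id != owner_id))
		port_id++;
	return port_id;
}

/* Iterate over the valid ports owned by owner 'o'. */
#define FOREACH_DEV_OWNED_BY(p, o) \
	for (p = find_next_owned_by(0, o); \
	     p < MAX_PORTS; \
	     p = find_next_owned_by((uint16_t)(p + 1), o))

/* Default iterator: valid ports that nobody owns. */
#define FOREACH_DEV(p) FOREACH_DEV_OWNED_BY(p, NO_OWNER)

/* Count the valid ports owned by owner_id. */
static unsigned int
count_owned_by(uint64_t owner_id)
{
	uint16_t p;
	unsigned int n = 0;

	FOREACH_DEV_OWNED_BY(p, owner_id)
		n++;
	return n;
}
```

With this model, an application that took ownership of its ports iterates
with its own id, while legacy code using the default iterator never sees ports
claimed by someone else.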

Re: [dpdk-dev] [pull-request] next-crypto 18.02 rc1

2018-01-20 Thread De Lara Guarch, Pablo
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Saturday, January 20, 2018 5:00 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [pull-request] next-crypto 18.02 rc1
> 
> 20/01/2018 16:15, Pablo de Lara:
> >   http://dpdk.org/git/next/dpdk-next-crypto
> 
> There is a compilation error with clang:
> 
> drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h:46:4: fatal error:
> use of undeclared identifier 'AES_CCM'

There is a new version of the Multi buffer library. I sent a patch that updated 
the documentation.
You need to get version 0.48.

Thanks,
Pablo


Re: [dpdk-dev] [PATCH v6 4/6] ethdev: adjust APIs removal error report

2018-01-20 Thread Matan Azrad
Hi all

From: Thomas Monjalon, Friday, January 19, 2018 8:17 PM
> 19/01/2018 19:13, Ferruh Yigit:
> > On 1/19/2018 5:54 PM, Thomas Monjalon wrote:
> > > 19/01/2018 17:19, Ferruh Yigit:
> > >> On 1/18/2018 6:10 PM, Matan Azrad wrote:
> > >>> From: Ferruh Yigit, Thursday, January 18, 2018 7:31 PM
> >  This patch updates *all* ethdev public APIs to add if device is
> >  removed check?
> > >>>
> > >>> Yes.
> > >>>
> >  And each check goes to ethdev is_removed() dev_ops to ask if dev
> >  is removed.
> > > >>> Probably, if the REMOVED state is already set it will not call the
> > > >>> device is_removed.
> > >>>
> >  These must be better way of doing this, am I missing something.
> > >>>
> > >>> Suggest.
> > >>
> > >> With a silly analogy, this is like a blind person asking each time
> > >> if he is dead before talking to other person.

Just to refine the analogy :)
This is like a blind person (application, ethdev) using a guide dog (ethdev
device): every time the dog refuses to take action (error occurred), the blind
person asks whether the dog can still be a guide dog (removal error).

> > >> At first glance I can think of a kind of watchdog timer can be
> > >> implemented in ethdev layer. It provides periodic checks and if
> > >> device is dead it calls the registered user callback function.
> > >>
> > >> This method presented as synchronous method but not triggered from
> > >> side where event happens, I mean not triggered from PMD but from
> application.
> > >> So does application doing polling continuously if device is dead?
> > >> Or if application is relying this patch to add a check in each API,
> > >> what happens if device removed during data processing, will app rely on
> asynchronous method?
> > >
> > > We cannot put a mutex on hardware removal :) So we have to live with
> > > errors due to removal.
> > > If we are trying to configure a removed device, the error will be
> > > not related to the root cause.
> > > This patch is just trying to improve the situation by returning an
> > > appropriate error code if removal can be detected.
> > > Note: the check is run only if there is an error and if the removal
> > > is not already detected.
> > >
> > > If I understand well, you prefer relying only on asynchronous
> > > hotplug events? Even if they come really late?
> >
> > I think asynchronous hotplug events are better approach, but if they
> > are late I understand you need a solution.
> >
> > I assume issue is when device is removed, application can make API
> > calls until it detects the removal.
> >
> > For that case what do you think instead of ethdev abstraction layer
> > doing a translation, define a specific error type and return it from
> > PMDs to indicate that error is related to the missing device and
> > application will know device is no more there? And consistently use that
> error type in all PMDs.
> 
> Yes it is also a good solution.
> I think Matan proposed to integrate it in ethdev to avoid duplicating the
> same mechanism in every PMDs.
> 

Yes, as a lot of ethdev API code pieces do.

> > Or what do you think above suggested flexible watchdog timer
> > implementation in ethdev, so applications may be informed about missing
> device faster.
> 
> I think the watchdog would compete with hotplug events.
> So I prefer not integrating one more asynchronous mechanism for the same
> purpose.
> If we want polling, it can be an option to enable in EAL hotplug.

Konstantin wrote in another thread:
>+  RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
>+
>+  dev = &rte_eth_devices[port_id];
>+
>+  RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);

> I'd say these 2 checks have to be swapped.

Konstantin, Please explain why.
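
The behaviour Thomas describes above (ask about removal only after a call has
failed, then report the removal as the root cause) can be sketched as follows.
This is a standalone model: struct dev, dev_is_removed() and report_err() are
hypothetical names, and the use of -EIO as the "device removed" code is an
assumption of this sketch, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model. */
struct dev {
	bool hw_present;      /* what a driver probe would discover */
	unsigned int probes;  /* how many times the hardware was asked */
};

/* Ask the (modelled) hardware whether the device is gone. */
static bool
dev_is_removed(struct dev *d)
{
	assert(d != NULL);
	d->probes++;
	return !d->hw_present;
}

/* Translate the error code of a failed call if the device is gone. */
static int
report_err(struct dev *d, int ret)
{
	if (ret == 0)
		return 0;     /* success path: no removal check at all */
	if (dev_is_removed(d))
		return -EIO;  /* assumed code: the removal is the root cause */
	return ret;           /* genuine error, device still present */
}
```

The success path never touches the hardware, so the extra check only costs
something when an API call has already failed.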



Re: [dpdk-dev] [PATCH v3 02/10] app/testpmd: convert to new Ethdev Rx offloads API

2018-01-20 Thread Shahaf Shuler
Hi Harish, 


Friday, January 19, 2018 9:30 PM, Patil, Harish:
> >
> 
> Hi Shahaf,
> This testpmd change is causing some issues for qede PMD.
> In this patch, rte_eth_dev_configure() and RX/TX queue setup functions are
> called for the second time after applying TX offloads but without calling
> rte_eth_dev_close() before.

This issue is not related to the patch, but rather to how DPDK and the ethdev
layer are defined.
In DPDK, after device probe the device is considered usable from ethdev. Then
from ethdev the device can be configured (both port and queues) as long as it
is not started yet.
The transitions between device start, stop, port config and queue config can
happen multiple times without the need to go through device close.
In fact, the only way to make the device usable again after close is by another
probe.
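
The life cycle described above can be sketched as a small state machine. This
is a standalone model under assumptions: the port_* functions, the three
states and the -EBUSY/-EINVAL return codes are hypothetical and only
illustrate which transitions are allowed, not how ethdev implements them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum port_state { PORT_CLOSED, PORT_STOPPED, PORT_STARTED };

struct port {
	enum port_state state;
	unsigned int nb_configures;  /* how many times configure succeeded */
};

/* Probe makes the port usable (stopped, ready to be configured). */
static void
port_probe(struct port *p)
{
	assert(p != NULL);
	p->state = PORT_STOPPED;
	p->nb_configures = 0;
}

/* Configure is legal any number of times while the port is not started. */
static int
port_configure(struct port *p)
{
	if (p->state != PORT_STOPPED)
		return -EBUSY;
	p->nb_configures++;  /* a driver would release the prior config here */
	return 0;
}

static int
port_start(struct port *p)
{
	if (p->state != PORT_STOPPED)
		return -EINVAL;
	p->state = PORT_STARTED;
	return 0;
}

static void
port_stop(struct port *p)
{
	if (p->state == PORT_STARTED)
		p->state = PORT_STOPPED;
}

/* After close, only another probe makes the port usable again. */
static void
port_close(struct port *p)
{
	p->state = PORT_CLOSED;
}
```

A driver that follows this model has to treat every configure call as a
possible reconfiguration and clean up what the previous call allocated.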

> Also there is no way in the driver to detect that this is a port
> reconfiguration condition in which case it needs to do certain resources
> deallocation/cleanup based on prior configuration.

I am not sure I understand.
Do you mean it is impossible from your side to detect port configuration in
your PMD? Other PMDs do that.

> Ideally, we don’t want to maintain port states in the driver internally. So
> are there any suggestions here?

Generally, I think this is a big issue in the qede PMD. It is not following the
rules of ethdev.
I guess that with this malfunction you will also have bugs with real
applications.

As a temporary workaround you can configure the Tx offloads you want through
the --tx-offloads command line parameter and avoid enabling them using the CLI.
This way the port and queues will be reconfigured only once.

> 
> Thanks,
> Harish
> 
> 
> 
> 
> >



Re: [dpdk-dev] [PATCH v6 4/6] ethdev: adjust APIs removal error report

2018-01-20 Thread Thomas Monjalon
20/01/2018 20:04, Matan Azrad:
> Konstantin wrote in another thread:
> >+RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> >+
> >+dev = &rte_eth_devices[port_id];
> >+
> >+RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> 
> > I'd say these 2 checks have to be swapped.
> 
> Konstantin, Please explain why.

I think he was talking about these 2 tests:

+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
+   if (dev->state == RTE_ETH_DEV_REMOVED)
+   return 1;



Re: [dpdk-dev] [pull-request] next-crypto 18.02 rc1

2018-01-20 Thread Thomas Monjalon
20/01/2018 19:19, De Lara Guarch, Pablo:
> Hi Thomas,
> 
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > 20/01/2018 16:15, Pablo de Lara:
> > >   http://dpdk.org/git/next/dpdk-next-crypto
> > 
> > There is a compilation error with clang:
> > 
> > drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h:46:4: fatal error:
> > use of undeclared identifier 'AES_CCM'
> 
> There is a new version of the Multi buffer library. I sent a patch that 
> updated the documentation.
> You need to get version 0.48.

Thank you, I always forget to update these crypto libraries when pulling.


Re: [dpdk-dev] [PATCH v6 4/6] ethdev: adjust APIs removal error report

2018-01-20 Thread Matan Azrad
Hi Thomas
From: Thomas Monjalon, Saturday, January 20, 2018 10:29 PM
> 20/01/2018 20:04, Matan Azrad:
> > Konstantin wrote in another thread:
> > >+  RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> > >+
> > >+  dev = &rte_eth_devices[port_id];
> > >+
> > >+  RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> >
> > > I'd say these 2 checks have to be swapped.
> >
> > Konstantin, Please explain why.
> 
> I think he was talking about these 2 tests:
> 
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
> + if (dev->state == RTE_ETH_DEV_REMOVED)
> + return 1;

Ahh yes, it makes sense, I will swap them.

Thanks.
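
A minimal sketch of the swapped order agreed on above: the cached REMOVED
state is tested before the optional driver hook, so a device already known to
be removed is reported as such even when the driver provides no is_removed
callback. The types and always_removed() are hypothetical stand-ins, and
caching the state after a positive answer is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum dev_state { DEV_ATTACHED, DEV_REMOVED };

struct dev {
	enum dev_state state;
	int (*is_removed)(struct dev *dev);  /* optional hook, may be NULL */
};

/* Hypothetical driver hook that always reports the device as gone. */
static int
always_removed(struct dev *dev)
{
	(void)dev;
	return 1;
}

/* Return 1 if the device is removed, 0 otherwise. */
static int
dev_is_removed(struct dev *dev)
{
	int ret;

	assert(dev != NULL);
	/* Check 1: removal already detected, no need to ask the driver. */
	if (dev->state == DEV_REMOVED)
		return 1;
	/* Check 2: without a hook, removal cannot be detected. */
	if (dev->is_removed == NULL)
		return 0;
	ret = dev->is_removed(dev);
	if (ret != 0)
		dev->state = DEV_REMOVED;  /* assumed: remember the result */
	return ret;
}
```

With the original order, the first case in the usage below (state already
REMOVED, no hook) would have returned 0.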


[dpdk-dev] [PATCH v6 0/6] TAP RSS eBPF cover letter

2018-01-20 Thread Ophir Munk
The patches of TAP RSS eBPF follow the RFC on this issue
https://dpdk.org/dev/patchwork/patch/31781/

v6 changes with respect to v5
=
1. Reorder the following commits (source file commit before byte code commit)
  net/tap: add eBPF program file
  net/tap: add eBPF bytes code
2. Add acknowledgment to commits 

v5 changes with respect to v4
=
Update TAP document guide with RSS

v4 changes with respect to v3
=
* Code updates based on review comments
* New commits organization (2-->5) based on review comments
  1. net/tap: support actions for different classifiers (preparations for BPF. 
 No BPF code yet)
  2. net/tap: add eBPF bytes code (BPF bytes code in a separate file)
  3. net/tap: add eBPF program file (Program source code of bytes code)
  4. net/tap: add eBPF API (BPF API to be used by TAP)
  5. net/tap: implement TAP RSS using eBPF

v3 changes with respect to v2
=
* Add support for IPv6 RSS in BPF program
* Bug fixes
* Updated compatibility to kernel versions:
  eBPF requires Linux version 4.9 configured with BPF
* New license header (SPDX) for newly added files

v2 changes with respect to v1
=
* v2 has new commits organization (3 --> 2)
* BPF program was revised. It is successfully tested on
  IPv4 L3 L4 layers (compatible to mlx4 device)
* Licensing: no comments received for using "Dual BSD/GPL"
  string during BPF program loading to the kernel.
  (v1 and v2 are using the same license strings)
  Any comments are welcome.
* Compatibility to kernel versions:
  eBPF requires Linux version 4.2 configured with BPF. TAP PMD will
  successfully compile on systems with old or non-BPF configured kernels.
  During compilation time the required Linux headers are searched for.
  If they are not present missing definitions are locally added
  (tap_autoconf.h).
  If the kernel cannot support a BPF operation - at runtime it will
  gracefully reject the netlink message (with BPF) sent to it.
Ophir Munk (6):
  net/tap: support actions for different classifiers
  net/tap: add eBPF program file
  net/tap: add eBPF bytes code
  net/tap: add eBPF API
  net/tap: implement TAP RSS using eBPF
  doc: detail new tap RSS feature in guides

 doc/guides/nics/tap.rst   |   60 ++
 drivers/net/tap/Makefile  |   34 +
 drivers/net/tap/rte_eth_tap.h |9 +-
 drivers/net/tap/tap_bpf.h |  112 +++
 drivers/net/tap/tap_bpf_api.c |  190 +
 drivers/net/tap/tap_bpf_insns.h   | 1693 +
 drivers/net/tap/tap_bpf_program.c |  221 +
 drivers/net/tap/tap_flow.c|  648 +++---
 drivers/net/tap/tap_flow.h|   13 +
 drivers/net/tap/tap_rss.h |   34 +
 drivers/net/tap/tap_tcmsgs.h  |4 +
 11 files changed, 2922 insertions(+), 96 deletions(-)
 create mode 100644 drivers/net/tap/tap_bpf.h
 create mode 100644 drivers/net/tap/tap_bpf_api.c
 create mode 100644 drivers/net/tap/tap_bpf_insns.h
 create mode 100644 drivers/net/tap/tap_bpf_program.c
 create mode 100644 drivers/net/tap/tap_rss.h

-- 
2.7.4



[dpdk-dev] [PATCH v6 1/6] net/tap: support actions for different classifiers

2018-01-20 Thread Ophir Munk
Add a generic TC actions handling for TC actions: "mirred",
"gact", "skbedit". This will be useful when introducing
BPF actions, as it uses TCA_BPF_ACT instead of TCA_FLOWER_ACT

Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/Makefile  |   8 ++
 drivers/net/tap/rte_eth_tap.h |   4 +-
 drivers/net/tap/tap_flow.c| 224 +-
 3 files changed, 145 insertions(+), 91 deletions(-)

diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index fd4195f..fbf84e1 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -12,6 +12,12 @@ EXPORT_MAP := rte_pmd_tap_version.map
 
 LIBABIVER := 1
 
+#
+# TAP_MAX_QUEUES must be a power of 2
+#
+ifeq ($(TAP_MAX_QUEUES),)
+   TAP_MAX_QUEUES = 16
+endif
 CFLAGS += -O3
 CFLAGS += -I$(SRCDIR)
 CFLAGS += -I.
@@ -20,6 +26,8 @@ LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
 LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_hash
 LDLIBS += -lrte_bus_vdev
 
+CFLAGS += -DTAP_MAX_QUEUES=$(TAP_MAX_QUEUES)
+
 #
 # all source are stored in SRCS-y
 #
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 829f32f..202b3cd 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -45,7 +45,7 @@
 #include 
 
 #ifdef IFF_MULTI_QUEUE
-#define RTE_PMD_TAP_MAX_QUEUES 16
+#define RTE_PMD_TAP_MAX_QUEUES TAP_MAX_QUEUES
 #else
 #define RTE_PMD_TAP_MAX_QUEUES 1
 #endif
@@ -90,6 +90,8 @@ struct pmd_internals {
int ioctl_sock;   /* socket for ioctl calls */
int nlsk_fd;  /* Netlink socket fd */
int flow_isolate; /* 1 if flow isolation is enabled */
+   int flower_support;   /* 1 if kernel supports, else 0 */
+   int flower_vlan_support;  /* 1 if kernel supports, else 0 */
LIST_HEAD(tap_flows, rte_flow) flows;/* rte_flow rules */
/* implicit rte_flow rules set when a remote device is active */
LIST_HEAD(tap_implicit_flows, rte_flow) implicit_flows;
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 90b2654..d2a69a7 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -104,6 +105,19 @@ struct remote_rule {
int mirred;
 };
 
+struct action_data {
+   char id[16];
+
+   union {
+   struct tc_gact gact;
+   struct tc_mirred mirred;
+   struct skbedit {
+   struct tc_skbedit skbedit;
+   uint16_t queue;
+   } skbedit;
+   };
+};
+
 static int tap_flow_create_eth(const struct rte_flow_item *item, void *data);
 static int tap_flow_create_vlan(const struct rte_flow_item *item, void *data);
 static int tap_flow_create_ipv4(const struct rte_flow_item *item, void *data);
@@ -819,111 +833,89 @@ tap_flow_item_validate(const struct rte_flow_item *item,
 }
 
 /**
- * Transform a DROP/PASSTHRU action item in the provided flow for TC.
+ * Configure the kernel with a TC action and its configured parameters
+ * Handled actions: "gact", "mirred", "skbedit", "bpf"
  *
- * @param[in, out] flow
- *   Flow to be filled.
- * @param[in] action
- *   Appropriate action to be set in the TCA_GACT_PARMS structure.
+ * @param[in] flow
+ *   Pointer to rte flow containing the netlink message
  *
- * @return
- *   0 if checks are alright, -1 otherwise.
- */
-static int
-add_action_gact(struct rte_flow *flow, int action)
-{
-   struct nlmsg *msg = &flow->msg;
-   size_t act_index = 1;
-   struct tc_gact p = {
-   .action = action
-   };
-
-   if (tap_nlattr_nested_start(msg, TCA_FLOWER_ACT) < 0)
-   return -1;
-   if (tap_nlattr_nested_start(msg, act_index++) < 0)
-   return -1;
-   tap_nlattr_add(&msg->nh, TCA_ACT_KIND, sizeof("gact"), "gact");
-   if (tap_nlattr_nested_start(msg, TCA_ACT_OPTIONS) < 0)
-   return -1;
-   tap_nlattr_add(&msg->nh, TCA_GACT_PARMS, sizeof(p), &p);
-   tap_nlattr_nested_finish(msg); /* nested TCA_ACT_OPTIONS */
-   tap_nlattr_nested_finish(msg); /* nested act_index */
-   tap_nlattr_nested_finish(msg); /* nested TCA_FLOWER_ACT */
-   return 0;
-}
-
-/**
- * Transform a MIRRED action item in the provided flow for TC.
+ * @param[in, out] act_index
+ *   Pointer to action sequence number in the TC command
  *
- * @param[in, out] flow
- *   Flow to be filled.
- * @param[in] ifindex
- *   Netdevice ifindex, where to mirror/redirect packet to.
- * @param[in] action_type
- *   Either TCA_EGRESS_REDIR for redirection or TCA_EGRESS_MIRROR for mirroring.
+ * @param[in] adata
+ *  Pointer to struct holding the action parameters
  *
  * @return
- *   0 if checks are alright, -1 otherwise.
+ *   -1 on failure, 0 on success
  */
 static int
-add_action_mirred(struct rte_flow *flow, uint16_t ifindex, uint16

[dpdk-dev] [PATCH v6 3/6] net/tap: add eBPF bytes code

2018-01-20 Thread Ophir Munk
File tap_bpf_insns.h was added. It includes the eBPF bytes code
which corresponds to source file tap_bpf_program.c
(see "net/tap: add eBPF program file").
The bytes code is in the format of C arrays of struct bpf_insn and
was generated from the C file tap_bpf_program.c
1. The C file was compiled via LLVM into an object file in ELF
format as:
   clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
   -filetype=obj -o 

clang version must be 3.7 or above.
The C functions are under different ELF sections and are considered
different BPF programs to be downloaded to the kernel

2. Using an external tool the ELF sections are parsed and the C arrays
of struct bpf_insn are generated. Each C array (corresponding to a
different function under an ELF section) is downloaded to the kernel
using a BPF system call. The external tool that generates the C arrays
will be added in separate commits.
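
To help read the generated arrays, here is a minimal sketch that decodes the
first field of an entry. It assumes the five initializer values map, in order,
to opcode, destination register, source register, offset and immediate, and
that the low three bits of the opcode select the instruction class, as in the
eBPF encoding. struct insn and the helper names are hypothetical stand-ins
(the real struct bpf_insn comes from the kernel headers), and the immediates
are written as 0 here because they are cut off in the listing of this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in with the same field order as the array initializers. */
struct insn {
	uint8_t code;     /* opcode */
	uint8_t dst_reg;  /* destination register */
	uint8_t src_reg;  /* source register */
	int16_t off;      /* signed offset */
	int32_t imm;      /* signed immediate */
};

/* Instruction classes used below (low 3 bits of the opcode). */
#define CLASS_LDX   0x01  /* load from memory into a register */
#define CLASS_JMP   0x05  /* jumps, calls and exit */
#define CLASS_ALU64 0x07  /* 64-bit arithmetic and moves */

#define OP_EXIT     0x95  /* jump class, exit operation */

/* Return the instruction class of an entry. */
static unsigned int
insn_class(const struct insn *i)
{
	assert(i != NULL);
	return i->code & 0x07;
}

/* Return non-zero if the entry ends the program. */
static int
insn_is_exit(const struct insn *i)
{
	return i->code == OP_EXIT;
}
```

Read this way, the first entry of cls_q_insns (opcode 0x61) is a register load
from memory at offset 52, and the last one (opcode 0x95) is the program exit.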

Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/tap_bpf_insns.h | 1693 +++
 1 file changed, 1693 insertions(+)
 create mode 100644 drivers/net/tap/tap_bpf_insns.h

diff --git a/drivers/net/tap/tap_bpf_insns.h b/drivers/net/tap/tap_bpf_insns.h
new file mode 100644
index 000..c406f78
--- /dev/null
+++ b/drivers/net/tap/tap_bpf_insns.h
@@ -0,0 +1,1693 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#include 
+
+/* bpf_insn array matching cls_q section. See tap_bpf_program.c file */
+struct bpf_insn cls_q_insns[] = {
+   {0x61,1,1,   52, 0x},
+   {0x18,2,0,0, 0xdeadbeef},
+   {0x00,0,0,0, 0x},
+   {0x63,   10,2,   -4, 0x},
+   {0x61,2,   10,   -4, 0x},
+   {0x07,2,0,0, 0x0001},
+   {0x67,2,0,0, 0x0020},
+   {0x77,2,0,0, 0x0020},
+   {0xb7,0,0,0, 0x},
+   {0x1d,1,2,1, 0x},
+   {0xb7,0,0,0, 0x},
+   {0x95,0,0,0, 0x},
+};
+
+/* bpf_insn array matching l3_l4 section. see tap_bpf_program.c file */
+struct bpf_insn l3_l4_hash_insns[] = {
+   {0xbf,7,1,0, 0x},
+   {0x61,8,7,   16, 0x},
+   {0x61,6,7,   76, 0x},
+   {0x61,9,7,   80, 0x},
+   {0x18,1,0,0, 0xdeadbeef},
+   {0x00,0,0,0, 0x},
+   {0x63,   10,1,   -4, 0x},
+   {0xbf,2,   10,0, 0x},
+   {0x07,2,0,0, 0xfffc},
+   {0x18,1,1,0, 0xcafe},
+   {0x00,0,0,0, 0x},
+   {0x85,0,0,0, 0x0001},
+   {0x55,0,0,   21, 0x},
+   {0xb7,1,0,0, 0x0a64},
+   {0x6b,   10,1,  -16, 0x},
+   {0x18,1,0,0, 0x69666e6f},
+   {0x00,0,0,0, 0x65727567},
+   {0x7b,   10,1,  -24, 0x},
+   {0x18,1,0,0, 0x6e207369},
+   {0x00,0,0,0, 0x6320746f},
+   {0x7b,   10,1,  -32, 0x},
+   {0x18,1,0,0, 0x20737372},
+   {0x00,0,0,0, 0x2079656b},
+   {0x7b,   10,1,  -40, 0x},
+   {0x18,1,0,0, 0x68736168},
+   {0x00,0,0,0, 0x203a2928},
+   {0x7b,   10,1,  -48, 0x},
+   {0xb7,7,0,0, 0x},
+   {0x73,   10,7,  -14, 0x},
+   {0xbf,1,   10,0, 0x},
+   {0x07,1,0,0, 0xffd0},
+   {0xb7,2,0,0, 0x0023},
+   {0x85,0,0,0, 0x0006},
+   {0x05,0,0, 1632, 0x},
+   {0xb7,1,0,0, 0x000e},
+   {0x61,2,7,   20, 0x},
+   {0x15,2,0,   10, 0x},
+   {0x61,2,7,   28, 0x},
+   {0x55,2,0,8, 0xa888},
+   {0xbf,2,7,0, 0x},
+   {0xb7,7,0,0, 0x},
+   {0xbf,1,6,0, 0x},
+   {0x07,1,0,0, 0x0012},
+   {0x2d,1,9, 1622, 0x},
+   {0xb7,1,0,0, 0x0012},
+   {0x69,8,6,   16, 0x},
+   {0xbf,7,2,0, 0x},
+   {0x7b,   10,7,  -56, 0x},
+   {0x57,8,0,0, 0x},
+   {0x15,8,0,  409, 0xdd86},
+   {0xb7,7,0,0, 0x0003},
+   {0x55,8,0, 1614, 0x0008},
+   {0x0f,6,1,0, 0x},
+   {0xb7,7,0,0, 0x},
+   {0xbf,1,6,0, 0x0

[dpdk-dev] [PATCH v6 2/6] net/tap: add eBPF program file

2018-01-20 Thread Ophir Munk
File tap_bpf_program.c was added with two ELF sections
corresponding to two BPF programs and one BPF map.

Section cls_q - BPF classifier that directs packets to their
corresponding queue after an RSS hash has been calculated on the packet
and saved in skb->cb[1].
Section l3_l4 - BPF action that calculates the RSS hash on packet
layers 3 and 4.
This file is not part of the DPDK tree compilation.
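
The interplay between the two sections can be sketched in plain C as follows. This is only an illustration, not the eBPF code itself: the demo_* names are hypothetical, and picking the queue as "hash modulo number of queues" is an assumption made for the sketch. The offset of 1 follows the QUEUE_OFFSET comment in the program file, so that a mark of 0 keeps meaning "not processed by the action yet".

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_QUEUE_OFFSET 1u

/*
 * Action side: pick a queue from the hash and store it offset by one.
 * The modulo selection is an assumption of this sketch.
 */
uint32_t demo_mark_queue(uint32_t hash, uint32_t nb_queues)
{
    assert(nb_queues > 0);
    return DEMO_QUEUE_OFFSET + hash % nb_queues;
}

/* Classifier side: does the stored mark select queue q? */
int demo_match_queue(uint32_t mark, uint32_t q)
{
    return mark == DEMO_QUEUE_OFFSET + q;
}
```

One classifier per Rx queue then only has to compare the mark against its own queue number.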

Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/tap_bpf_program.c | 221 ++
 1 file changed, 221 insertions(+)
 create mode 100644 drivers/net/tap/tap_bpf_program.c

diff --git a/drivers/net/tap/tap_bpf_program.c b/drivers/net/tap/tap_bpf_program.c
new file mode 100644
index 000..848c50b
--- /dev/null
+++ b/drivers/net/tap/tap_bpf_program.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "tap_rss.h"
+
+/** Create IPv4 address */
+#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
+   (((b) & 0xff) << 16) | \
+   (((c) & 0xff) << 8)  | \
+   ((d) & 0xff))
+
+#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
+   ((b) & 0xff))
+
+/*
+ * The queue number is offset by 1, to distinguish packets that have
+ * gone through this rule (skb->cb[1] != 0) from others.
+ */
+#define QUEUE_OFFSET   1
+#define PIN_GLOBAL_NS  2
+
+#define KEY_IDX0
+#define BPF_MAP_ID_KEY 1
+
+struct vlan_hdr {
+   __be16 proto;
+   __be16 tci;
+};
+
+struct bpf_elf_map __attribute__((section("maps"), used))
+map_keys = {
+   .type   =   BPF_MAP_TYPE_HASH,
+   .id =   BPF_MAP_ID_KEY,
+   .size_key   =   sizeof(__u32),
+   .size_value =   sizeof(struct rss_key),
+   .max_elem   =   256,
+   .pinning=   PIN_GLOBAL_NS,
+};
+
+__section("cls_q") int
+match_q(struct __sk_buff *skb)
+{
+   __u32 queue = skb->cb[1];
+   volatile __u32 q = 0xdeadbeef;
+   __u32 match_queue = QUEUE_OFFSET + q;
+
+   /* printt("match_q$i() queue = %d\n", queue); */
+
+   if (queue != match_queue)
+   return TC_ACT_OK;
+   return TC_ACT_UNSPEC;
+}
+
+
+struct ipv4_l3_l4_tuple {
+   __u32src_addr;
+   __u32dst_addr;
+   __u16dport;
+   __u16sport;
+} __attribute__((packed));
+
+struct ipv6_l3_l4_tuple {
+   __u8src_addr[16];
+   __u8dst_addr[16];
+   __u16   dport;
+   __u16   sport;
+} __attribute__((packed));
+
+static const __u8 def_rss_key[] = {
+   0xd1, 0x81, 0xc6, 0x2c,
+   0xf7, 0xf4, 0xdb, 0x5b,
+   0x19, 0x83, 0xa2, 0xfc,
+   0x94, 0x3e, 0x1a, 0xdb,
+   0xd9, 0x38, 0x9e, 0x6b,
+   0xd1, 0x03, 0x9c, 0x2c,
+   0xa7, 0x44, 0x99, 0xad,
+   0x59, 0x3d, 0x56, 0xd9,
+   0xf3, 0x25, 0x3c, 0x06,
+   0x2a, 0xdc, 0x1f, 0xfc,
+};
+
+static __u32  __attribute__((always_inline))
+rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
+   __u8 input_len)
+{
+   __u32 i, j, hash = 0;
+#pragma unroll
+   for (j = 0; j < input_len; j++) {
+#pragma unroll
+   for (i = 0; i < 32; i++) {
+   if (input_tuple[j] & (1 << (31 - i))) {
+   hash ^= ((const __u32 *)def_rss_key)[j] << i |
+   (__u32)((uint64_t)
+   (((const __u32 *)def_rss_key)[j + 1])
+   >> (32 - i));
+   }
+   }
+   }
+   return hash;
+}
+
+static int __attribute__((always_inline))
+rss_l3_l4(struct __sk_buff *skb)
+{
+   void *data_end = (void *)(long)skb->data_end;
+   void *data = (void *)(long)skb->data;
+   __u16 proto = (__u16)skb->protocol;
+   __u32 key_idx = 0xdeadbeef;
+   __u32 hash;
+   struct rss_key *rsskey;
+   __u64 off = ETH_HLEN;
+   int j;
+   __u8 *key = 0;
+   __u32 len;
+   __u32 queue = 0;
+
+   rsskey = map_lookup_elem(&map_keys, &key_idx);
+   if (!rsskey) {
+   printt("hash(): rss key is not configured\n");
+   return TC_ACT_OK;
+   }
+   key = (__u8 *)rsskey->key;
+
+   /* Get correct proto for 802.1ad */
+   if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
+   if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
+   sizeof(proto) > data_end)
+   return TC_ACT_OK;
+   proto = *(__u16 *)(data + ETH_ALEN * 2 +
+  sizeof(struct vlan_hdr));
+   off += sizeof(struct vlan_hdr);
+   }
+
+   if (proto == htons(ETH_P_IP)) {
+   if (data + off + sizeof(struc

[dpdk-dev] [PATCH v7 0/6] Fail-safe\ethdev: fix removal handling lack

2018-01-20 Thread Matan Azrad
There is a delay between the physical removal of a device and the moment
sub-device PMDs get an RMV interrupt.
During this window, DPDK PMDs and applications do not yet know about the
removal and may call a sub-device control operation, which should return an
error.

This series adds a new ethdev operation to check device removal, adds support
for it in the mlx PMDs, adjusts the ethdev APIs to return -EIO in case of
removal, and fixes the fail-safe bug in removal error reporting.

V2:
Remove ENODEV definition.
Remove checks from all mlx control commands.
Add new devop - "is_removed".
Implement it in mlx4 and mlx5.
Fix failsafe bug by the new devop.

V3:
Adjust ethdev APIs removal error report.
Change failsafe check to check eth_dev* return values.
Remove backporting of fail-safe patch.

V4:
Improve fail-safe internal API to adjust the actual error value as discussed.
Remove "Fixes" lines from fail-safe patch.
No changes in ethdev\mlx patches.

V5:
Rebase on top of master-net-mlx. 

V6:
Move ethdev new API to be EXPERIMENTAL.

V7:
Fix API return value description.
Swap checks in the new API as Konstantin suggested.
Add comment in the API as Ferruh suggested.

Matan Azrad (6):
  ethdev: add devop to check removal status
  net/mlx4: support a device removal check operation
  net/mlx5: support a device removal check operation
  ethdev: adjust APIs removal error report
  ethdev: adjust flow APIs removal error report
  net/failsafe: fix removed device handling

 drivers/net/failsafe/failsafe_flow.c|  18 ++-
 drivers/net/failsafe/failsafe_ops.c |  35 +++--
 drivers/net/failsafe/failsafe_private.h |  11 ++
 drivers/net/mlx4/mlx4.c |   1 +
 drivers/net/mlx4/mlx4.h |   1 +
 drivers/net/mlx4/mlx4_ethdev.c  |  20 +++
 drivers/net/mlx5/mlx5.c |   2 +
 drivers/net/mlx5/mlx5.h |   1 +
 drivers/net/mlx5/mlx5_ethdev.c  |  20 +++
 lib/librte_ether/rte_ethdev.c   | 219 +---
 lib/librte_ether/rte_ethdev.h   |  71 ++-
 lib/librte_ether/rte_ethdev_version.map |   1 +
 lib/librte_ether/rte_flow.c |  34 -
 lib/librte_ether/rte_flow.h |   2 +
 14 files changed, 336 insertions(+), 100 deletions(-)

-- 
1.8.3.1



[dpdk-dev] [PATCH v6 6/6] doc: detail new tap RSS feature in guides

2018-01-20 Thread Ophir Munk
Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 doc/guides/nics/tap.rst | 60 +
 1 file changed, 60 insertions(+)

diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index 04086b1..dc6f834 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -132,6 +132,7 @@ Supported actions:
 - DROP
 - QUEUE
 - PASSTHRU
+- RSS
 
 It is generally not possible to provide a "last" item. However, if the "last"
 item, once masked, is identical to the masked spec, then it is supported.
@@ -161,6 +162,11 @@ Drop UDP packets in vlan 3::
testpmd> flow create 0 priority 3 ingress pattern eth / vlan vid is 3 / \
 ipv4 proto is 17 / end actions drop / end
 
+Distribute IPv4 TCP packets using RSS to a given MAC address over queues 0-3::
+
+   testpmd> flow create 0 priority 4 ingress pattern eth dst is 0a:0b:0c:0d:0e:0f \
+/ ipv4 / tcp / end actions rss queues 0 1 2 3 end / end
+
 Example
 ---
 
@@ -213,3 +219,57 @@ traffic is being looped back. You can use ``set all size XXX`` to change the
 size of the packets after you stop the traffic. Use pktgen ``help``
 command to see a list of all commands. You can also use the ``-f`` option to
 load commands at startup in command line or Lua script in pktgen.
+
+RSS specifics
+-
+Packet distribution in TAP is done by the kernel which has a default
+distribution. This feature is adding RSS distribution based on eBPF code.
+The default eBPF code calculates RSS hash based on Toeplitz algorithm for
+a fixed RSS key. It is calculated on fixed packet offsets. For IPv4 and IPv6 it
+is calculated over src/dst addresses (8 or 32 bytes for IPv4 or IPv6
+respectively) and src/dst TCP/UDP ports (4 bytes).
+
+The RSS algorithm is written in file ``tap_bpf_program.c`` which
+does not take part in TAP PMD compilation. Instead this file is compiled
+in advance to eBPF object file. The eBPF object file is then parsed and
+translated into eBPF byte code in the format of C arrays of eBPF
+instructions. The C array of eBPF instructions is part of TAP PMD tree and
+is taking part in TAP PMD compilation. At run time the C arrays are uploaded to
+the kernel via BPF system calls and the RSS hash is calculated by the
+kernel.
+
+It is possible to support different RSS hash algorithms by updating file
+``tap_bpf_program.c``. In order to add a new RSS hash algorithm follow these
+steps:
+
+1. Write the new RSS implementation in file ``tap_bpf_program.c``
+
+BPF programs which are uploaded to the kernel correspond to
+C functions under different ELF sections.
+
+2. Install ``LLVM`` library and ``clang`` compiler versions 3.7 and above
+
+3. Compile ``tap_bpf_program.c`` via ``LLVM`` into an object file::
+
+clang -O2 -emit-llvm -c tap_bpf_program.c -o - | llc -march=bpf \
+-filetype=obj -o 
+
+
+4. Use a tool that receives two parameters: an eBPF object file and a section
+name, and prints out the section as a C array of eBPF instructions.
+Embed the C array in your TAP PMD tree.
+
+The C arrays are uploaded to the kernel using BPF system calls.
+
+``tc`` (traffic control) is a well known user space utility program used to
+configure the Linux kernel packet scheduler. It is usually packaged as
+part of the ``iproute2`` package.
+Since commit 11c39b5e9 ("tc: add eBPF support to f_bpf") ``tc`` can be used
+to upload eBPF code to the kernel and can be patched in order to print the
+C arrays of eBPF instructions just before calling the BPF system call.
+Please refer to ``iproute2`` package file ``lib/bpf.c`` function
+``bpf_prog_load()``.
+
+An example utility for eBPF instruction generation in the format of C arrays will
+be added in a future release.
+
-- 
2.7.4



[dpdk-dev] [PATCH v6 4/6] net/tap: add eBPF API

2018-01-20 Thread Ophir Munk
This commit includes the BPF API to be used by TAP.

tap_flow_bpf_cls_q() - download to kernel BPF program that classifies
packets to their matching queues
tap_flow_bpf_calc_l3_l4_hash() - download to kernel BPF program that
calculates per packet layer 3 and layer 4 RSS hash
tap_flow_bpf_rss_map_create() - create BPF RSS map for storing RSS
parameters per RSS rule
tap_flow_bpf_update_rss_elem() - update BPF map entry with RSS rule
parameters
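
As a rough sketch of what "download to kernel" means here, the following shows how a BPF_PROG_LOAD request can be prepared and issued through the bpf system call. It assumes a Linux system whose headers provide linux/bpf.h and __NR_bpf; the demo_* function names are hypothetical and are not the functions added by this patch. Only the fields needed for a plain load are filled in, and the verifier log buffer is left unset.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill the BPF_PROG_LOAD part of union bpf_attr; all other fields are zero. */
void demo_prog_attr_init(union bpf_attr *attr, enum bpf_prog_type type,
                         const struct bpf_insn *insns, size_t insn_cnt,
                         const char *license)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->prog_type = type;
    attr->insns = (uint64_t)(uintptr_t)insns;   /* pointer passed as u64 */
    attr->insn_cnt = (uint32_t)insn_cnt;
    attr->license = (uint64_t)(uintptr_t)license;
}

/* Returns a program file descriptor, or -1 with errno set on failure. */
int demo_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
                   size_t insn_cnt, const char *license)
{
    union bpf_attr attr;

    demo_prog_attr_init(&attr, type, insns, insn_cnt, license);
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

A classifier would be loaded with BPF_PROG_TYPE_SCHED_CLS and an action with BPF_PROG_TYPE_SCHED_ACT. The call normally needs sufficient privileges, so a failure with -1 is the expected result for an unprivileged process.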

Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/Makefile  |   6 ++
 drivers/net/tap/tap_bpf.h | 112 +
 drivers/net/tap/tap_bpf_api.c | 190 ++
 drivers/net/tap/tap_flow.h|   6 ++
 4 files changed, 314 insertions(+)
 create mode 100644 drivers/net/tap/tap_bpf.h
 create mode 100644 drivers/net/tap/tap_bpf_api.c

diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index fbf84e1..fad8a94 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -35,6 +35,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_netlink.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_tcmsgs.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap_bpf_api.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
 
@@ -61,6 +62,11 @@ tap_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
linux/pkt_cls.h \
enum TCA_FLOWER_KEY_VLAN_PRIO \
$(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_BPF_PROG_LOAD \
+   linux/bpf.h \
+   enum BPF_PROG_LOAD \
+   $(AUTOCONF_OUTPUT)
 
 # Create tap_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
new file mode 100644
index 000..30eefb3
--- /dev/null
+++ b/drivers/net/tap/tap_bpf.h
@@ -0,0 +1,112 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#ifndef __TAP_BPF_H__
+#define __TAP_BPF_H__
+
+#include 
+
+#ifdef HAVE_BPF_PROG_LOAD
+#include 
+#else
+/* BPF_MAP_UPDATE_ELEM command flags */
+#defineBPF_ANY 0 /* create a new element or update an existing */
+
+/* BPF architecture instruction struct */
+struct bpf_insn {
+   __u8code;
+   __u8dst_reg:4;
+   __u8src_reg:4;
+   __s16   off;
+   __s32   imm; /* immediate value */
+};
+
+/* BPF program types */
+enum bpf_prog_type {
+   BPF_PROG_TYPE_UNSPEC,
+   BPF_PROG_TYPE_SOCKET_FILTER,
+   BPF_PROG_TYPE_KPROBE,
+   BPF_PROG_TYPE_SCHED_CLS,
+   BPF_PROG_TYPE_SCHED_ACT,
+};
+
+/* BPF commands types */
+enum bpf_cmd {
+   BPF_MAP_CREATE,
+   BPF_MAP_LOOKUP_ELEM,
+   BPF_MAP_UPDATE_ELEM,
+   BPF_MAP_DELETE_ELEM,
+   BPF_MAP_GET_NEXT_KEY,
+   BPF_PROG_LOAD,
+};
+
+/* BPF maps types */
+enum bpf_map_type {
+   BPF_MAP_TYPE_UNSPEC,
+   BPF_MAP_TYPE_HASH,
+};
+
+/* union of anonymous structs used with TAP BPF commands */
+union bpf_attr {
+   /* BPF_MAP_CREATE command */
+   struct {
+   __u32   map_type;
+   __u32   key_size;
+   __u32   value_size;
+   __u32   max_entries;
+   __u32   map_flags;
+   __u32   inner_map_fd;
+   };
+
+   /* BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM commands */
+   struct {
+   __u32   map_fd;
+   __aligned_u64   key;
+   union {
+   __aligned_u64 value;
+   __aligned_u64 next_key;
+   };
+   __u64   flags;
+   };
+
+   /* BPF_PROG_LOAD command */
+   struct {
+   __u32   prog_type;
+   __u32   insn_cnt;
+   __aligned_u64   insns;
+   __aligned_u64   license;
+   __u32   log_level;
+   __u32   log_size;
+   __aligned_u64   log_buf;
+   __u32   kern_version;
+   __u32   prog_flags;
+   };
+} __attribute__((aligned(8)));
+#endif
+
+#ifndef __NR_bpf
+# if defined(__i386__)
+#  define __NR_bpf 357
+# elif defined(__x86_64__)
+#  define __NR_bpf 321
+# elif defined(__aarch64__)
+#  define __NR_bpf 280
+# elif defined(__sparc__)
+#  define __NR_bpf 349
+# elif defined(__s390__)
+#  define __NR_bpf 351
+# else
+#  error __NR_bpf not defined
+# endif
+#endif
+
+enum {
+   BPF_MAP_ID_KEY,
+   BPF_MAP_ID_SIMPLE,
+};
+
+static int bpf_load(enum bpf_prog_type type, const struct bpf_insn *insns,
+   size_t insns_cnt, const char *license);
+
+#endif /* __TAP_BPF_H__ */
diff --git a/drivers/net/tap/tap_bpf_api.c b/drivers/net/tap/tap_bpf_api.c
new file mode 100644
index 000..109a681
--- /dev/null
+++ b/drivers/net/tap/tap_bpf_api.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyri

[dpdk-dev] [PATCH v6 5/6] net/tap: implement TAP RSS using eBPF

2018-01-20 Thread Ophir Munk
The TAP PMD is required to support RSS queue mapping based on the rte_flow
API. An example use of this requirement is fail-safe transparent switching
from a PCI device to a TAP device while continuing to redirect packets to the
same RSS queues on both devices.

TAP RSS implementation is based on eBPF programs sent to Linux kernel
through BPF system calls and using netlink messages to reference the
programs as part of traffic control commands.

TC uses eBPF programs as classifiers and actions.
eBPF classification: packets marked with an RSS queue will be directed
to this queue using TC with "skbedit" action.
BPF classifiers are downloaded to the kernel once on TAP creation for
each TAP Rx queue.

eBPF action: calculate the Toeplitz RSS hash based on L3 addresses and
L4 ports. Mark the packet with the RSS queue according to the resulting
RSS hash, then reclassify the packet.
BPF actions are downloaded to the kernel for each new RSS rule.
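
For readers unfamiliar with the Toeplitz hash, a minimal user-space sketch is shown below. It is not the eBPF action itself: demo_toeplitz() is a hypothetical name, the key is read as big-endian 32-bit words, and the input is assumed to be an array of host-order 32-bit words (for IPv4 with ports: source address, destination address, then both ports in one word). For every input bit that is set, the 32-bit window of the key starting at that bit position is XORed into the hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian 32-bit word from a byte buffer. */
static uint32_t demo_load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Toeplitz hash over n_words input words; the key needs 4 * (n_words + 1) bytes. */
uint32_t demo_toeplitz(const uint32_t *words, size_t n_words,
                       const uint8_t *key, size_t key_len)
{
    uint32_t hash = 0;
    size_t j;
    unsigned int i;

    assert(key_len >= 4 * (n_words + 1));
    for (j = 0; j < n_words; j++) {
        /* 64-bit view of the key starting at input bit 32 * j. */
        uint64_t window = ((uint64_t)demo_load_be32(key + 4 * j) << 32) |
                          demo_load_be32(key + 4 * j + 4);

        for (i = 0; i < 32; i++) {
            if (words[j] & (UINT32_C(1) << (31 - i)))
                hash ^= (uint32_t)(window >> (32 - i));
        }
    }
    return hash;
}
```

Because the hash is a XOR of key windows, it is linear: the hash of a XOR of two inputs equals the XOR of their hashes, which gives a cheap sanity check for any implementation.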

TAP eBPF requires Linux version 4.9 configured with BPF. The TAP PMD will
successfully compile on systems with old or non-BPF configured kernels, but
RSS rule creation on TAP devices will fail.

Signed-off-by: Ophir Munk 
Acked-by: Pascal Mazon 
---
 drivers/net/tap/Makefile  |  20 ++
 drivers/net/tap/rte_eth_tap.h |   5 +
 drivers/net/tap/tap_flow.c| 450 --
 drivers/net/tap/tap_flow.h|   7 +
 drivers/net/tap/tap_rss.h |  34 
 drivers/net/tap/tap_tcmsgs.h  |   4 +
 6 files changed, 502 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/tap/tap_rss.h

diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
index fad8a94..e23c3a2 100644
--- a/drivers/net/tap/Makefile
+++ b/drivers/net/tap/Makefile
@@ -63,6 +63,26 @@ tap_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
enum TCA_FLOWER_KEY_VLAN_PRIO \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
+   HAVE_TC_BPF \
+   linux/pkt_cls.h \
+   enum TCA_BPF_UNSPEC \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_BPF_FD \
+   linux/pkt_cls.h \
+   enum TCA_BPF_FD \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_ACT_BPF \
+   linux/tc_act/tc_bpf.h \
+   enum TCA_ACT_BPF_UNSPEC \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_TC_ACT_BPF_FD \
+   linux/tc_act/tc_bpf.h \
+   enum TCA_ACT_BPF_FD \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
HAVE_BPF_PROG_LOAD \
linux/bpf.h \
enum BPF_PROG_LOAD \
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 202b3cd..c185473 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -92,6 +92,11 @@ struct pmd_internals {
int flow_isolate; /* 1 if flow isolation is enabled */
int flower_support;   /* 1 if kernel supports, else 0 */
int flower_vlan_support;  /* 1 if kernel supports, else 0 */
+   int rss_enabled;  /* 1 if RSS is enabled, else 0 */
+   /* implicit rules set when RSS is enabled */
+   int map_fd;   /* BPF RSS map fd */
+   int bpf_fd[RTE_PMD_TAP_MAX_QUEUES];/* List of bpf fds per queue */
+   LIST_HEAD(tap_rss_flows, rte_flow) rss_flows;
LIST_HEAD(tap_flows, rte_flow) flows;/* rte_flow rules */
/* implicit rte_flow rules set when a remote device is active */
LIST_HEAD(tap_implicit_flows, rte_flow) implicit_flows;
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index d2a69a7..6aa53a7 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -43,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef HAVE_TC_FLOWER
 /*
@@ -82,12 +84,79 @@ enum {
TCA_FLOWER_KEY_VLAN_ETH_TYPE,   /* be16 */
 };
 #endif
+/*
+ * For kernels < 4.2 BPF related enums may not be defined.
+ * Runtime checks will be carried out to gracefully report on TC messages that
+ * are rejected by the kernel. Rejection reasons may be due to:
+ * 1. enum is not defined
+ * 2. enum is defined but kernel is not configured to support BPF system calls,
+ *BPF classifications or BPF actions.
+ */
+#ifndef HAVE_TC_BPF
+enum {
+   TCA_BPF_UNSPEC,
+   TCA_BPF_ACT,
+   TCA_BPF_POLICE,
+   TCA_BPF_CLASSID,
+   TCA_BPF_OPS_LEN,
+   TCA_BPF_OPS,
+};
+#endif
+#ifndef HAVE_TC_BPF_FD
+enum {
+   TCA_BPF_FD = TCA_BPF_OPS + 1,
+   TCA_BPF_NAME,
+};
+#endif
+#ifndef HAVE_TC_ACT_BPF
+#define tc_gen \
+   __u32 index; \
+   __u32 capab; \
+   int   action; \
+   int   refcnt; \
+   int

[dpdk-dev] [PATCH v7 1/6] ethdev: add devop to check removal status

2018-01-20 Thread Matan Azrad
There is a delay between the physical removal of a device and the moment
PMDs get an RMV interrupt. During this window, DPDK PMDs and applications
do not yet know about the removal.

Currently, removal can be detected only by registering for the device RMV
event, and the notification arrives asynchronously. So, there is no option
to detect a device removal synchronously.
Applications and other DPDK entities may want to check for a device removal
synchronously and take an immediate decision accordingly.

Add a new dev op called is_removed to allow DPDK entities to check an
Ethernet device removal status immediately.
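
The behaviour of the new check can be modelled without DPDK as a small state machine. The sketch below is a stand-alone illustration with hypothetical demo_* names, not the ethdev code: once the driver callback reports a removal, the state is latched, so later checks answer from the cached state without querying the hardware again. A device without a callback is reported as not removed.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_UNUSED = 0, DEMO_ATTACHED, DEMO_DEFERRED, DEMO_REMOVED };

struct demo_dev {
    enum demo_state state;
    int (*is_removed)(struct demo_dev *dev); /* optional driver callback */
    int probes;                              /* number of hardware queries */
};

/* Example driver callback: counts the query and reports a removal. */
int demo_probe_gone(struct demo_dev *dev)
{
    dev->probes++;
    return 1;
}

/* Returns 1 when the device is removed, otherwise 0. */
int demo_dev_is_removed(struct demo_dev *dev)
{
    int ret;

    assert(dev != NULL);
    if (dev->state == DEMO_REMOVED)
        return 1;               /* answer from the latched state */
    if (dev->is_removed == NULL)
        return 0;               /* driver offers no removal check */
    ret = dev->is_removed(dev);
    if (ret != 0)
        dev->state = DEMO_REMOVED;
    return ret;
}
```

Latching the state keeps repeated checks cheap, which matters when the check is called from the error path of every control operation.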

Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
---
 lib/librte_ether/rte_ethdev.c   | 29 ++---
 lib/librte_ether/rte_ethdev.h   | 20 
 lib/librte_ether/rte_ethdev_version.map |  1 +
 3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b349599..fd70d10 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -114,7 +114,8 @@ enum {
 rte_eth_find_next(uint16_t port_id)
 {
while (port_id < RTE_MAX_ETHPORTS &&
-  rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED)
+  rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
+  rte_eth_devices[port_id].state != RTE_ETH_DEV_REMOVED)
port_id++;
 
if (port_id >= RTE_MAX_ETHPORTS)
@@ -262,8 +263,7 @@ struct rte_eth_dev *
 rte_eth_dev_is_valid_port(uint16_t port_id)
 {
if (port_id >= RTE_MAX_ETHPORTS ||
-   (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED &&
-rte_eth_devices[port_id].state != RTE_ETH_DEV_DEFERRED))
+   (rte_eth_devices[port_id].state == RTE_ETH_DEV_UNUSED))
return 0;
else
return 1;
@@ -1094,6 +1094,29 @@ struct rte_eth_dev *
 }
 
 int
+rte_eth_dev_is_removed(uint16_t port_id)
+{
+   struct rte_eth_dev *dev;
+   int ret;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+
+   dev = &rte_eth_devices[port_id];
+
+   if (dev->state == RTE_ETH_DEV_REMOVED)
+   return 1;
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->is_removed, 0);
+
+   ret = dev->dev_ops->is_removed(dev);
+   if (ret != 0)
+   /* Device is physically removed. */
+   dev->state = RTE_ETH_DEV_REMOVED;
+
+   return ret;
+}
+
+int
 rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
   uint16_t nb_rx_desc, unsigned int socket_id,
   const struct rte_eth_rxconf *rx_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f0eeefe..ed31a10 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1169,6 +1169,9 @@ struct rte_eth_dcb_info {
 typedef int (*eth_dev_reset_t)(struct rte_eth_dev *dev);
 /** <@internal Function used to reset a configured Ethernet device. */
 
+typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to detect an Ethernet device removal. */
+
 typedef void (*eth_promiscuous_enable_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to enable the RX promiscuous mode of an Ethernet device. */
 
@@ -1498,6 +1501,8 @@ struct eth_dev_ops {
eth_dev_close_tdev_close; /**< Close device. */
eth_dev_reset_tdev_reset; /**< Reset device. */
eth_link_update_t  link_update;   /**< Get device link state. */
+   eth_is_removed_t   is_removed;
+   /**< Check if the device was physically removed. */
 
eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
@@ -1684,6 +1689,7 @@ enum rte_eth_dev_state {
RTE_ETH_DEV_UNUSED = 0,
RTE_ETH_DEV_ATTACHED,
RTE_ETH_DEV_DEFERRED,
+   RTE_ETH_DEV_REMOVED,
 };
 
 /**
@@ -1970,6 +1976,20 @@ int rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_queue,
 void _rte_eth_dev_reset(struct rte_eth_dev *dev);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if an Ethernet device was physically removed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   1 when the Ethernet device is removed, otherwise 0.
+ */
+int
+rte_eth_dev_is_removed(uint16_t port_id);
+
+/**
  * Allocate and set up a receive queue for an Ethernet device.
  *
  * The function allocates a contiguous block of memory for *nb_rx_desc*
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index e9681ac..88b7908 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -201,6 +201,7 @@ DPDK_17.11 {
 EXPERIMENTAL {
global:
 
+   rte_eth_dev_is_removed;
rte_mtr_capabilit

[dpdk-dev] [PATCH v7 3/6] net/mlx5: support a device removal check operation

2018-01-20 Thread Matan Azrad
Add support for getting the removal status of an mlx5 device.
It is not supported in a secondary process.

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx5/mlx5.c|  2 ++
 drivers/net/mlx5/mlx5.h|  1 +
 drivers/net/mlx5/mlx5_ethdev.c | 20 
 3 files changed, 23 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 1c95f35..c13a2d3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -284,6 +284,7 @@
.tx_descriptor_status = mlx5_tx_descriptor_status,
.rx_queue_intr_enable = mlx5_rx_intr_enable,
.rx_queue_intr_disable = mlx5_rx_intr_disable,
+   .is_removed = mlx5_is_removed,
 };
 
 static const struct eth_dev_ops mlx5_dev_sec_ops = {
@@ -331,6 +332,7 @@
.tx_descriptor_status = mlx5_tx_descriptor_status,
.rx_queue_intr_enable = mlx5_rx_intr_enable,
.rx_queue_intr_disable = mlx5_rx_intr_disable,
+   .is_removed = mlx5_is_removed,
 };
 
 static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e740a4e..aaff180 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -237,6 +237,7 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
 void priv_dev_interrupt_handler_install(struct priv *, struct rte_eth_dev *);
 int mlx5_set_link_down(struct rte_eth_dev *dev);
 int mlx5_set_link_up(struct rte_eth_dev *dev);
+int mlx5_is_removed(struct rte_eth_dev *dev);
 eth_tx_burst_t priv_select_tx_function(struct priv *, struct rte_eth_dev *);
 eth_rx_burst_t priv_select_rx_function(struct priv *, struct rte_eth_dev *);
 
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6f78adc..1c067ca 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1453,3 +1453,23 @@ struct ethtool_link_settings {
}
return rx_pkt_burst;
 }
+
+/**
+ * Check if mlx5 device was removed.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   1 when device is removed, otherwise 0.
+ */
+int
+mlx5_is_removed(struct rte_eth_dev *dev)
+{
+   struct ibv_device_attr device_attr;
+   struct priv *priv = dev->data->dev_private;
+
+   if (ibv_query_device(priv->ctx, &device_attr) == EIO)
+   return 1;
+   return 0;
+}
-- 
1.8.3.1



[dpdk-dev] [PATCH v7 2/6] net/mlx4: support a device removal check operation

2018-01-20 Thread Matan Azrad
Add support for getting the removal status of an mlx4 device.

Signed-off-by: Matan Azrad 
---
 drivers/net/mlx4/mlx4.c|  1 +
 drivers/net/mlx4/mlx4.h|  1 +
 drivers/net/mlx4/mlx4_ethdev.c | 20 
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 61c5bf4..703513e 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -256,6 +256,7 @@ struct mlx4_conf {
.filter_ctrl = mlx4_filter_ctrl,
.rx_queue_intr_enable = mlx4_rx_intr_enable,
.rx_queue_intr_disable = mlx4_rx_intr_disable,
+   .is_removed = mlx4_is_removed,
 };
 
 /**
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 99dc335..2ab2988 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -171,6 +171,7 @@ int mlx4_flow_ctrl_get(struct rte_eth_dev *dev,
 int mlx4_flow_ctrl_set(struct rte_eth_dev *dev,
   struct rte_eth_fc_conf *fc_conf);
 const uint32_t *mlx4_dev_supported_ptypes_get(struct rte_eth_dev *dev);
+int mlx4_is_removed(struct rte_eth_dev *dev);
 
 /* mlx4_intr.c */
 
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index c80eab5..5318b56 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -1052,3 +1052,23 @@ enum rxmode_toggle {
}
return NULL;
 }
+
+/**
+ * Check if mlx4 device was removed.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   1 when device is removed, otherwise 0.
+ */
+int
+mlx4_is_removed(struct rte_eth_dev *dev)
+{
+   struct ibv_device_attr device_attr;
+   struct priv *priv = dev->data->dev_private;
+
+   if (ibv_query_device(priv->ctx, &device_attr) == EIO)
+   return 1;
+   return 0;
+}
-- 
1.8.3.1



[dpdk-dev] [PATCH v7 6/6] net/failsafe: fix removed device handling

2018-01-20 Thread Matan Azrad
There is a delay between the physical removal of a device and the moment
sub-device PMDs get an RMV interrupt. During this window, DPDK PMDs and
applications do not yet know about the removal and may call a sub-device
control operation, which should return an error.

In the previous code this error was reported to the application, contrary to
the fail-safe principle that the application should not be aware of device
removal.

Add a removal check in each relevant control command error flow and
prevent an error report to the application when the sub-device is removed.
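
The idea can be sketched with a toy model. The names below are hypothetical and the helper is an assumption about the intended behaviour, not the fs_err() implementation from this patch: an error coming from a removed sub-device is filtered out, so the loop over sub-devices continues and only genuine errors reach the application.

```c
#include <assert.h>
#include <stddef.h>

struct demo_subdev {
    int removed;   /* 1 if the sub-device was physically removed */
    int op_result; /* result of the control operation on this sub-device */
};

/* Filter an error: a removed sub-device must not surface as a failure. */
int demo_fs_err(const struct demo_subdev *sdev, int err)
{
    assert(sdev != NULL);
    return sdev->removed ? 0 : err;
}

/* Run over all sub-devices and stop at the first genuine error. */
int demo_apply_all(const struct demo_subdev *sdevs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ret = demo_fs_err(&sdevs[i], sdevs[i].op_result);

        if (ret != 0)
            return ret;
    }
    return 0;
}
```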

Signed-off-by: Matan Azrad 
Acked-by: Gaetan Rivet 
---
 drivers/net/failsafe/failsafe_flow.c| 18 ++---
 drivers/net/failsafe/failsafe_ops.c | 35 ++---
 drivers/net/failsafe/failsafe_private.h | 11 +++
 3 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_flow.c b/drivers/net/failsafe/failsafe_flow.c
index 153ceee..c072d1e 100644
--- a/drivers/net/failsafe/failsafe_flow.c
+++ b/drivers/net/failsafe/failsafe_flow.c
@@ -87,7 +87,7 @@
DEBUG("Calling rte_flow_validate on sub_device %d", i);
ret = rte_flow_validate(PORT_ID(sdev),
attr, patterns, actions, error);
-   if (ret) {
+   if ((ret = fs_err(sdev, ret))) {
ERROR("Operation rte_flow_validate failed for sub_device %d"
  " with error %d", i, ret);
return ret;
@@ -111,7 +111,7 @@
FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
flow->flows[i] = rte_flow_create(PORT_ID(sdev),
attr, patterns, actions, error);
-   if (flow->flows[i] == NULL) {
+   if (flow->flows[i] == NULL && fs_err(sdev, -rte_errno)) {
ERROR("Failed to create flow on sub_device %d",
i);
goto err;
@@ -150,7 +150,7 @@
continue;
local_ret = rte_flow_destroy(PORT_ID(sdev),
flow->flows[i], error);
-   if (local_ret) {
+   if ((local_ret = fs_err(sdev, local_ret))) {
ERROR("Failed to destroy flow on sub_device %d: %d",
i, local_ret);
if (ret == 0)
@@ -175,7 +175,7 @@
FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
DEBUG("Calling rte_flow_flush on sub_device %d", i);
ret = rte_flow_flush(PORT_ID(sdev), error);
-   if (ret) {
+   if ((ret = fs_err(sdev, ret))) {
ERROR("Operation rte_flow_flush failed for sub_device %d"
  " with error %d", i, ret);
return ret;
@@ -199,8 +199,12 @@
 
sdev = TX_SUBDEV(dev);
if (sdev != NULL) {
-   return rte_flow_query(PORT_ID(sdev),
-   flow->flows[SUB_ID(sdev)], type, arg, error);
+   int ret = rte_flow_query(PORT_ID(sdev),
+flow->flows[SUB_ID(sdev)],
+type, arg, error);
+
+   if ((ret = fs_err(sdev, ret)))
+   return ret;
}
WARN("No active sub_device to query about its flow");
return -1;
@@ -223,7 +227,7 @@
WARN("flow isolation mode of sub_device %d in incoherent state.",
i);
ret = rte_flow_isolate(PORT_ID(sdev), set, error);
-   if (ret) {
+   if ((ret = fs_err(sdev, ret))) {
ERROR("Operation rte_flow_isolate failed for sub_device 
%d"
  " with error %d", i, ret);
return ret;
diff --git a/drivers/net/failsafe/failsafe_ops.c 
b/drivers/net/failsafe/failsafe_ops.c
index fe957ad..0976745 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -121,6 +121,8 @@
dev->data->nb_tx_queues,
&dev->data->dev_conf);
if (ret) {
+   if (!fs_err(sdev, ret))
+   continue;
ERROR("Could not configure sub_device %d", i);
return ret;
}
@@ -163,8 +165,11 @@
continue;
DEBUG("Starting sub_device %d", i);
ret = rte_eth_dev_start(PORT_ID(sdev));
-   if (ret)
+   if (ret) {
+   if (!fs_err(sdev, ret))
+   continue;
return ret;
+   }
sdev->state = DEV_STARTED;
}
if (PRIV(dev)->state < DEV_STARTED)
@@ -196,7 +201,7 @@
FOREACH

[dpdk-dev] [PATCH v7 5/6] ethdev: adjust flow APIs removal error report

2018-01-20 Thread Matan Azrad
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.

When a device removal occurs during flow command execution, many
different errors can be reported to the user.

Adjust all flow APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
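
The adjustment described above can be sketched without DPDK. This is only a
model under stated assumptions: struct model_flow_error, model_errno,
model_port_removed and model_flow_error_set() are hypothetical stand-ins for
struct rte_flow_error, rte_errno, rte_eth_dev_is_removed() and
rte_flow_error_set(); model_flow_err() only mirrors the shape of the
flow_err() helper in the diff below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct rte_flow_error. */
struct model_flow_error {
    int code;            /* positive errno value, 0 when unset */
    const char *message; /* human-readable description */
};

/* Hypothetical stand-ins for the removal check and for rte_errno. */
static bool model_port_removed;
static int model_errno;

/* Record the error and return the negative errno value (model only). */
static int
model_flow_error_set(struct model_flow_error *error, int code,
                     const char *message)
{
    if (error != NULL) {
        error->code = code;
        error->message = message;
    }
    model_errno = code;
    return -code;
}

/* Same shape as flow_err(): success passes through, any failure on a
 * removed port is rewritten to EIO, other failures are left untouched. */
static int
model_flow_err(int ret, struct model_flow_error *error)
{
    if (ret == 0)
        return 0;
    if (model_port_removed)
        return model_flow_error_set(error, EIO, strerror(EIO));
    return ret;
}
```

In this model a pointer-returning operation (the create case) can call
model_flow_err() only for its side effect on the error structure and on
model_errno, then still return NULL.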

Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
---
 lib/librte_ether/rte_flow.c | 34 +++---
 lib/librte_ether/rte_flow.h |  2 ++
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index 913d1a5..a86bfbd 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -107,6 +107,18 @@ struct rte_flow_desc_data {
MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
 };
 
+static int
+flow_err(uint16_t port_id, int ret, struct rte_flow_error *error)
+{
+   if (ret == 0)
+   return 0;
+   if (rte_eth_dev_is_removed(port_id))
+   return rte_flow_error_set(error, EIO,
+ RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+ NULL, rte_strerror(EIO));
+   return ret;
+}
+
 /* Get generic flow operations structure from a port. */
 const struct rte_flow_ops *
 rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error)
@@ -145,7 +157,8 @@ struct rte_flow_desc_data {
if (unlikely(!ops))
return -rte_errno;
if (likely(!!ops->validate))
-   return ops->validate(dev, attr, pattern, actions, error);
+   return flow_err(port_id, ops->validate(dev, attr, pattern,
+  actions, error), error);
return rte_flow_error_set(error, ENOSYS,
  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL, rte_strerror(ENOSYS));
@@ -160,12 +173,17 @@ struct rte_flow *
struct rte_flow_error *error)
 {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   struct rte_flow *flow;
const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 
if (unlikely(!ops))
return NULL;
-   if (likely(!!ops->create))
-   return ops->create(dev, attr, pattern, actions, error);
+   if (likely(!!ops->create)) {
+   flow = ops->create(dev, attr, pattern, actions, error);
+   if (flow == NULL)
+   flow_err(port_id, -rte_errno, error);
+   return flow;
+   }
rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
   NULL, rte_strerror(ENOSYS));
return NULL;
@@ -183,7 +201,8 @@ struct rte_flow *
if (unlikely(!ops))
return -rte_errno;
if (likely(!!ops->destroy))
-   return ops->destroy(dev, flow, error);
+   return flow_err(port_id, ops->destroy(dev, flow, error),
+   error);
return rte_flow_error_set(error, ENOSYS,
  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL, rte_strerror(ENOSYS));
@@ -200,7 +219,7 @@ struct rte_flow *
if (unlikely(!ops))
return -rte_errno;
if (likely(!!ops->flush))
-   return ops->flush(dev, error);
+   return flow_err(port_id, ops->flush(dev, error), error);
return rte_flow_error_set(error, ENOSYS,
  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL, rte_strerror(ENOSYS));
@@ -220,7 +239,8 @@ struct rte_flow *
if (!ops)
return -rte_errno;
if (likely(!!ops->query))
-   return ops->query(dev, flow, action, data, error);
+   return flow_err(port_id, ops->query(dev, flow, action, data,
+   error), error);
return rte_flow_error_set(error, ENOSYS,
  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL, rte_strerror(ENOSYS));
@@ -238,7 +258,7 @@ struct rte_flow *
if (!ops)
return -rte_errno;
if (likely(!!ops->isolate))
-   return ops->isolate(dev, set, error);
+   return flow_err(port_id, ops->isolate(dev, set, error), error);
return rte_flow_error_set(error, ENOSYS,
  RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
  NULL, rte_strerror(ENOSYS));
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index e0402cf..07ec217 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -1267,6 +1267,8 @@ struct rte_flow_error {
  *
  *   -ENOSYS: underlying device does not support this functionality.
  *
+ *   -EIO: underlying device is removed.
+ *
  *   -EINVAL: unknown or invalid rule specification.
 

[dpdk-dev] [PATCH v7 4/6] ethdev: adjust APIs removal error report

2018-01-20 Thread Matan Azrad
rte_eth_dev_is_removed API was added to detect a device removal
synchronously.

When a device removal occurs during control command execution, many
different errors can be reported to the user.

Adjust all ethdev APIs error reports to return -EIO in case of device
removal using rte_eth_dev_is_removed API.
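
A minimal, self-contained sketch of the error adjustment described above. It
does not use DPDK: model_dev_is_removed() and the model_removed[] table are
hypothetical stand-ins for rte_eth_dev_is_removed(), and model_eth_err() only
mirrors the shape of the eth_err() helper added in the diff below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_PORTS 4

/* Hypothetical device state: true means the port has been unplugged. */
static bool model_removed[MODEL_MAX_PORTS];

/* Hypothetical stand-in for rte_eth_dev_is_removed(). */
static bool
model_dev_is_removed(uint16_t port_id)
{
    return port_id < MODEL_MAX_PORTS && model_removed[port_id];
}

/* Same shape as eth_err(): success passes through, any failure on a
 * removed port becomes -EIO, other failures are returned unchanged. */
static int
model_eth_err(uint16_t port_id, int ret)
{
    if (ret == 0)
        return 0;
    if (model_dev_is_removed(port_id))
        return -EIO;
    return ret;
}
```

With this shape, a successful call is never turned into an error, and the
removal check is only paid for on the failure path.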

Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
---
 lib/librte_ether/rte_ethdev.c | 192 +++---
 lib/librte_ether/rte_ethdev.h |  51 ++-
 2 files changed, 170 insertions(+), 73 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index fd70d10..c4ff1b0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -338,6 +338,16 @@ struct rte_eth_dev *
return -ENODEV;
 }
 
+static int
+eth_err(uint16_t port_id, int ret)
+{
+   if (ret == 0)
+   return 0;
+   if (rte_eth_dev_is_removed(port_id))
+   return -EIO;
+   return ret;
+}
+
 /* attach the new device, then store port_id of the device */
 int
 rte_eth_dev_attach(const char *devargs, uint16_t *port_id)
@@ -492,7 +502,8 @@ struct rte_eth_dev *
return 0;
}
 
-   return dev->dev_ops->rx_queue_start(dev, rx_queue_id);
+   return eth_err(port_id, dev->dev_ops->rx_queue_start(dev,
+rx_queue_id));
 
 }
 
@@ -518,7 +529,7 @@ struct rte_eth_dev *
return 0;
}
 
-   return dev->dev_ops->rx_queue_stop(dev, rx_queue_id);
+   return eth_err(port_id, dev->dev_ops->rx_queue_stop(dev, rx_queue_id));
 
 }
 
@@ -544,7 +555,8 @@ struct rte_eth_dev *
return 0;
}
 
-   return dev->dev_ops->tx_queue_start(dev, tx_queue_id);
+   return eth_err(port_id, dev->dev_ops->tx_queue_start(dev,
+tx_queue_id));
 
 }
 
@@ -570,7 +582,7 @@ struct rte_eth_dev *
return 0;
}
 
-   return dev->dev_ops->tx_queue_stop(dev, tx_queue_id);
+   return eth_err(port_id, dev->dev_ops->tx_queue_stop(dev, tx_queue_id));
 
 }
 
@@ -888,7 +900,7 @@ struct rte_eth_dev *
port_id, diag);
rte_eth_dev_rx_queue_config(dev, 0);
rte_eth_dev_tx_queue_config(dev, 0);
-   return diag;
+   return eth_err(port_id, diag);
}
 
/* Initialize Rx profiling if enabled at compilation time. */
@@ -898,7 +910,7 @@ struct rte_eth_dev *
port_id, diag);
rte_eth_dev_rx_queue_config(dev, 0);
rte_eth_dev_tx_queue_config(dev, 0);
-   return diag;
+   return eth_err(port_id, diag);
}
 
return 0;
@@ -998,7 +1010,7 @@ struct rte_eth_dev *
if (diag == 0)
dev->data->dev_started = 1;
else
-   return diag;
+   return eth_err(port_id, diag);
 
rte_eth_dev_config_restore(port_id);
 
@@ -1040,7 +1052,7 @@ struct rte_eth_dev *
dev = &rte_eth_devices[port_id];
 
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_up, -ENOTSUP);
-   return (*dev->dev_ops->dev_set_link_up)(dev);
+   return eth_err(port_id, (*dev->dev_ops->dev_set_link_up)(dev));
 }
 
 int
@@ -1053,7 +1065,7 @@ struct rte_eth_dev *
dev = &rte_eth_devices[port_id];
 
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_down, -ENOTSUP);
-   return (*dev->dev_ops->dev_set_link_down)(dev);
+   return eth_err(port_id, (*dev->dev_ops->dev_set_link_down)(dev));
 }
 
 void
@@ -1090,7 +1102,7 @@ struct rte_eth_dev *
rte_eth_dev_stop(port_id);
ret = dev->dev_ops->dev_reset(dev);
 
-   return ret;
+   return eth_err(port_id, ret);
 }
 
 int
@@ -1215,7 +1227,7 @@ struct rte_eth_dev *
dev->data->min_rx_buf_size = mbp_buf_size;
}
 
-   return ret;
+   return eth_err(port_id, ret);
 }
 
 /**
@@ -1334,8 +1346,8 @@ struct rte_eth_dev *
  &local_conf.offloads);
}
 
-   return (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
-  socket_id, &local_conf);
+   return eth_err(port_id, (*dev->dev_ops->tx_queue_setup)(dev,
+  tx_queue_id, nb_tx_desc, socket_id, &local_conf));
 }
 
 void
@@ -1391,14 +1403,16 @@ struct rte_eth_dev *
 rte_eth_tx_done_cleanup(uint16_t port_id, uint16_t queue_id, uint32_t free_cnt)
 {
struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+   int ret;
 
/* Validate Input Data. Bail if not valid or not supported. */
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_done_cleanup, -ENOTSUP);
 
/* Call driver to free pending mbufs. */
-   return (*dev->dev_ops->tx_done_cleanup)(dev->data->tx_que

[dpdk-dev] [PATCH v4 0/7] Port ownership and syncronization

2018-01-20 Thread Matan Azrad
Add ownership mechanism to DPDK Ethernet devices to avoid multiple management 
of a device by different DPDK entities.
The port ownership mechanism is a good point to redefine the synchronization 
rules in ethdev:

1. The port allocation and port release synchronization will be managed 
by ethdev.
2. The port usage synchronization will be managed by the port owner.
3. The port ownership synchronization will be managed by ethdev.
4. A DPDK entity that wants to use a port safely must take ownership of it
first.


V2:  
Synchronize ethdev port creation.
Synchronize port ownership mechanism.
Rename owner remove API to rte_eth_dev_owner_unset.
Remove "ethdev: free a port by a dedicated API" patch - passed to another 
series.
Add "ethdev: fix port data reset timing" patch.
Change owner get API to return an int value and to pass a copy of the owner
structure.
Adjust testpmd to the improved owner get API.
Adjust documentations.

V3:
Change RTE_ETH_FOREACH_DEV iterator to skip owned ports(Gaetan suggestion).
Prevent goto in set\unset APIs by adding internal API - this also adds reuse of 
code(Konstantin suggestion).
Group all the shared processes variables in one struct to allow easy allocation 
of it(Konstantin suggestion).
Take owner name truncation as warning and not as error(Konstantin suggestion).
Mark the new APIs as EXPERIMENTAL.
Rebase on top of master_net_mlx.
Rebase on top of "[PATCH v6 0/6] Fail-safe\ethdev: fix removal handling lack" 
series.
Rebase on top of "[PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure 
platforms" .
Add "ethdev: fix used portid allocation" patch suggested by Konstantin.

v4:
Share => shared in ethdev patches(Thomas suggestion).
Rephrase some code comments(Thomas suggestion).
Fix compilation issue caused by wrong rebase with "fix used portid allocation" 
patch.
Add assert check for the correct port state to above fix patch.

Matan Azrad (7):
  ethdev: fix port data reset timing
  ethdev: fix used portid allocation
  ethdev: add port ownership
  ethdev: synchronize port allocation
  net/failsafe: free an eth port by a dedicated API
  net/failsafe: use ownership mechanism to own ports
  app/testpmd: adjust ethdev port ownership

 app/test-pmd/cmdline.c  |  89 +---
 app/test-pmd/cmdline_flow.c |   2 +-
 app/test-pmd/config.c   |  37 ++---
 app/test-pmd/parameters.c   |   4 +-
 app/test-pmd/testpmd.c  |  63 +---
 app/test-pmd/testpmd.h  |   3 +
 doc/guides/prog_guide/poll_mode_drv.rst |  14 +-
 drivers/net/failsafe/failsafe.c |   7 +
 drivers/net/failsafe/failsafe_eal.c |   6 +
 drivers/net/failsafe/failsafe_ether.c   |   2 +-
 drivers/net/failsafe/failsafe_private.h |   2 +
 lib/librte_ether/rte_ethdev.c   | 245 +++-
 lib/librte_ether/rte_ethdev.h   | 115 ++-
 lib/librte_ether/rte_ethdev_version.map |   6 +
 14 files changed, 458 insertions(+), 137 deletions(-)

-- 
1.8.3.1



[dpdk-dev] [PATCH v4 3/7] ethdev: add port ownership

2018-01-20 Thread Matan Azrad
The ownership of a port is implicit in DPDK.
Making it explicit is better for the following reasons:
1. It clearly defines who is in charge of the port usage synchronization.
2. A library could work on top of a port.
3. A port can work on top of another port.

Also in the fail-safe case, an issue has been met in testpmd.
We need to check that the application is not trying to use a port which
is already managed by fail-safe.

A port owner is built from an owner id (number) and an owner name (string).
The owner id must be unique, to distinguish between two identical entity
instances; the owner name can be any name.
The name helps different DPDK entities to recognize the owner logically
and eases debugging.
Each DPDK entity can allocate a unique owner identifier and can use it
with its preferred name to own valid ethdev ports.
Each DPDK entity can get any port owner status to decide if it can
manage the port or not.

The mechanism is synchronized for both the primary process threads and
the secondary processes threads to allow secondary process entity to be
a port owner.
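
The owner id / owner name scheme described above can be sketched as follows.
This is a single-threaded model under stated assumptions, not the code of this
patch: every model_* name, the -EPERM/-EINVAL return values and the table sizes
are hypothetical, and the locking that the patch keeps in shared memory is
deliberately left out.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_PORTS 4
#define MODEL_NO_OWNER 0
#define MODEL_OWNER_NAME_LEN 64

/* Hypothetical owner record: unique id plus free-form name. */
struct model_owner {
    uint64_t id;
    char name[MODEL_OWNER_NAME_LEN];
};

static uint64_t model_next_owner_id = MODEL_NO_OWNER + 1;
static struct model_owner model_port_owner[MODEL_MAX_PORTS];

/* Hand out a fresh identifier; two callers never get the same one. */
static int
model_owner_new(uint64_t *owner_id)
{
    *owner_id = model_next_owner_id++;
    return 0;
}

/* Take a port unless another owner already holds it. */
static int
model_owner_set(uint16_t port_id, const struct model_owner *owner)
{
    if (port_id >= MODEL_MAX_PORTS || owner->id == MODEL_NO_OWNER)
        return -EINVAL;
    if (model_port_owner[port_id].id != MODEL_NO_OWNER &&
        model_port_owner[port_id].id != owner->id)
        return -EPERM;
    model_port_owner[port_id] = *owner;
    return 0;
}

/* Give a port back; only the current owner may do so. */
static int
model_owner_unset(uint16_t port_id, uint64_t owner_id)
{
    if (port_id >= MODEL_MAX_PORTS || owner_id == MODEL_NO_OWNER)
        return -EINVAL;
    if (model_port_owner[port_id].id != owner_id)
        return -EPERM;
    memset(&model_port_owner[port_id], 0, sizeof(model_port_owner[port_id]));
    return 0;
}
```

Two entities may pick the same name; in this model only the id decides who
holds the port, which is why the id has to be unique and the name does not.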

Add a synchronized ownership mechanism to DPDK Ethernet devices to
avoid multiple management of a device by different DPDK entities.

The current ethdev internal port management is not affected by this
feature.

Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
Acked-by: Konstantin Ananyev 
---
 doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
 lib/librte_ether/rte_ethdev.c   | 202 
 lib/librte_ether/rte_ethdev.h   | 115 +-
 lib/librte_ether/rte_ethdev_version.map |   6 +
 4 files changed, 306 insertions(+), 31 deletions(-)

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst 
b/doc/guides/prog_guide/poll_mode_drv.rst
index d1d4b1c..d513ee3 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -156,8 +156,8 @@ concurrently on the same tx queue without SW lock. This PMD 
feature found in som
 
 See `Hardware Offload`_ for ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capability probing 
details.
 
-Device Identification and Configuration

+Device Identification, Ownership and Configuration
+--
 
 Device Identification
 ~
@@ -171,6 +171,16 @@ Based on their PCI identifier, NIC ports are assigned two 
other identifiers:
 *   A port name used to designate the port in console messages, for 
administration or debugging purposes.
 For ease of use, the port name includes the port index.
 
+Port Ownership
+~~
+The Ethernet devices ports can be owned by a single DPDK entity (application, 
library, PMD, process, etc).
+The ownership mechanism is controlled by ethdev APIs and allows to 
set/remove/get a port owner by DPDK entities.
+Allowing this should prevent any multiple management of Ethernet port by 
different entities.
+
+.. note::
+
+It is the DPDK entity responsibility to set the port owner before using it 
and to manage the port usage synchronization between different threads or 
processes.
+
 Device Configuration
 
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 3a25a64..af0e072 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -41,7 +41,6 @@
 
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
-static struct rte_eth_dev_data *rte_eth_dev_data;
 static uint8_t eth_dev_last_created_port;
 
 /* spinlock for eth device callbacks */
@@ -59,6 +58,13 @@ struct rte_eth_xstats_name_off {
unsigned offset;
 };
 
+/* Shared memory between primary and secondary processes. */
+static struct {
+   uint64_t next_owner_id;
+   rte_spinlock_t ownership_lock;
+   struct rte_eth_dev_data data[RTE_MAX_ETHPORTS];
+} *rte_eth_dev_shared_data;
+
 static const struct rte_eth_xstats_name_off rte_stats_strings[] = {
{"rx_good_packets", offsetof(struct rte_eth_stats, ipackets)},
{"tx_good_packets", offsetof(struct rte_eth_stats, opackets)},
@@ -125,24 +131,29 @@ enum {
 }
 
 static void
-rte_eth_dev_data_alloc(void)
+rte_eth_dev_shared_data_alloc(void)
 {
const unsigned flags = 0;
const struct rte_memzone *mz;
 
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   /* Allocate shared memory for port data and ownership. */
mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
-   RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
-   rte_socket_id(), flags);
+sizeof(*rte_eth_dev_shared_data),
+rte_socket_id(), flags);
} else
mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
if (mz == NULL)
rte_panic("Cannot allocate memzone for ethernet port

[dpdk-dev] [PATCH v4 4/7] ethdev: synchronize port allocation

2018-01-20 Thread Matan Azrad
Ethernet port allocation was not thread safe: two threads which tried
to allocate a new port at the same time might get an identical port
identifier and cause a memory overwrite.
Actually, none of the port configurations were thread safe from the ethdev
point of view.

The port ownership mechanism added to the ethdev is a good point to
redefine the synchronization rules in ethdev:

1. The port allocation and port release synchronization will be
   managed by ethdev.
2. The port usage synchronization will be managed by the port owner.
3. The port ownership synchronization will be managed by ethdev.

Add port allocation synchronization to complete the new rules.
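
The allocation rule above can be sketched as a lookup and a reservation done
under one lock, with a single unlock path. This is a model under stated
assumptions: it uses a C11 atomic_flag as a stand-in spinlock instead of
rte_spinlock_t, it covers only threads of one process (the patch keeps its
lock in memory shared between processes), and all model_* names are
hypothetical.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PORTS 4
#define MODEL_NAME_LEN 64

/* Stand-in spinlock and port name table (an empty name means free). */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static char model_port_name[MODEL_MAX_PORTS][MODEL_NAME_LEN];

static void
model_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&model_lock,
                                             memory_order_acquire))
        ; /* spin until the previous holder clears the flag */
}

static void
model_lock_release(void)
{
    atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Reserve a slot for name; return its index, or -1 when the name is
 * already taken (or empty) or when no slot is free. */
static int
model_port_allocate(const char *name)
{
    int port_id = -1;
    int i;

    model_lock_acquire();
    for (i = 0; i < MODEL_MAX_PORTS; i++) {
        if (strcmp(model_port_name[i], name) == 0)
            goto unlock; /* duplicate name: leave port_id at -1 */
    }
    for (i = 0; i < MODEL_MAX_PORTS; i++) {
        if (model_port_name[i][0] == '\0') {
            snprintf(model_port_name[i], MODEL_NAME_LEN, "%s", name);
            port_id = i;
            break;
        }
    }
unlock:
    model_lock_release();
    return port_id;
}
```

Keeping the search and the reservation inside the same critical section is
what prevents two threads from being handed the same identifier; the single
exit label makes it hard to return with the lock still held.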

Signed-off-by: Matan Azrad 
Acked-by: Konstantin Ananyev 
---
 lib/librte_ether/rte_ethdev.c | 43 +++
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index af0e072..f616775 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -52,6 +52,9 @@
 /* spinlock for add/remove tx callbacks */
 static rte_spinlock_t rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
 
+/* spinlock for shared data allocation */
+static rte_spinlock_t rte_eth_shared_data_lock = RTE_SPINLOCK_INITIALIZER;
+
 /* store statistics names and its offset in stats structure  */
 struct rte_eth_xstats_name_off {
char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -202,21 +205,27 @@ struct rte_eth_dev *
 rte_eth_dev_allocate(const char *name)
 {
uint16_t port_id;
-   struct rte_eth_dev *eth_dev;
+   struct rte_eth_dev *eth_dev = NULL;
+
+   /* Synchronize local threads to allocate shared data only once. */
+   rte_spinlock_lock(&rte_eth_shared_data_lock);
+   if (rte_eth_dev_shared_data == NULL)
+   rte_eth_dev_shared_data_alloc();
+   rte_spinlock_unlock(&rte_eth_shared_data_lock);
+
+   /* Synchronize port creation between primary and secondary threads. */
+   rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
port_id = rte_eth_dev_find_free_port();
if (port_id == RTE_MAX_ETHPORTS) {
RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet 
ports\n");
-   return NULL;
+   goto unlock;
}
 
-   if (rte_eth_dev_shared_data == NULL)
-   rte_eth_dev_shared_data_alloc();
-
if (rte_eth_dev_allocated(name) != NULL) {
RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already 
allocated!\n",
name);
-   return NULL;
+   goto unlock;
}
 
eth_dev = eth_dev_get(port_id);
@@ -224,7 +233,11 @@ struct rte_eth_dev *
eth_dev->data->port_id = port_id;
eth_dev->data->mtu = ETHER_MTU;
 
-   _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_NEW, NULL);
+unlock:
+   rte_spinlock_unlock(&rte_eth_dev_shared_data->ownership_lock);
+
+   if (eth_dev != NULL)
+   _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_NEW, NULL);
 
return eth_dev;
 }
@@ -238,10 +251,16 @@ struct rte_eth_dev *
 rte_eth_dev_attach_secondary(const char *name)
 {
uint16_t i;
-   struct rte_eth_dev *eth_dev;
+   struct rte_eth_dev *eth_dev = NULL;
 
+   /* Synchronize local threads to attach shared data only once. */
+   rte_spinlock_lock(&rte_eth_shared_data_lock);
if (rte_eth_dev_shared_data == NULL)
rte_eth_dev_shared_data_alloc();
+   rte_spinlock_unlock(&rte_eth_shared_data_lock);
+
+   /* Synchronize port attachment to primary port creation and release. */
+   rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
if (strcmp(rte_eth_dev_shared_data->data[i].name, name) == 0)
@@ -251,12 +270,12 @@ struct rte_eth_dev *
RTE_PMD_DEBUG_TRACE(
"device %s is not driven by the primary process\n",
name);
-   return NULL;
+   } else {
+   eth_dev = eth_dev_get(i);
+   RTE_ASSERT(eth_dev->data->port_id == i);
}
 
-   eth_dev = eth_dev_get(i);
-   RTE_ASSERT(eth_dev->data->port_id == i);
-
+   rte_spinlock_unlock(&rte_eth_dev_shared_data->ownership_lock);
return eth_dev;
 }
 
-- 
1.8.3.1



[dpdk-dev] [PATCH v4 2/7] ethdev: fix used portid allocation

2018-01-20 Thread Matan Azrad
rte_eth_dev_find_free_port() found a free port by state checking.
The state field is in local process memory, so other DPDK processes
may get the same port ID because their local states may be different.

Replace the state checking by the ethdev port name checking,
so, if the name is an empty string the port ID will be detected as
unused.
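
The difference between the two checks can be sketched as follows. This is a
model under stated assumptions: model_local_state[] plays the per-process
state array, model_shared_name[] plays the name field kept in shared memory,
and model_other_process_allocates() is a hypothetical helper that imitates an
allocation done by another process.

```c
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_PORTS 4
#define MODEL_NAME_LEN 64

enum model_state { MODEL_UNUSED = 0, MODEL_ATTACHED };

/* Private to this process: other processes never update it. */
static enum model_state model_local_state[MODEL_MAX_PORTS];
/* Visible to every process in this model. */
static char model_shared_name[MODEL_MAX_PORTS][MODEL_NAME_LEN];

/* Imitate another process taking a port: only shared data changes. */
static void
model_other_process_allocates(uint16_t port_id, const char *name)
{
    snprintf(model_shared_name[port_id], MODEL_NAME_LEN, "%s", name);
}

/* Old rule: trust the local state. */
static uint16_t
model_find_free_by_state(void)
{
    uint16_t i;

    for (i = 0; i < MODEL_MAX_PORTS; i++)
        if (model_local_state[i] == MODEL_UNUSED)
            return i;
    return MODEL_MAX_PORTS;
}

/* New rule: a port is free only if its shared name is empty. */
static uint16_t
model_find_free_by_name(void)
{
    uint16_t i;

    for (i = 0; i < MODEL_MAX_PORTS; i++)
        if (model_shared_name[i][0] == '\0')
            return i;
    return MODEL_MAX_PORTS;
}
```

After another process has taken port 0, the state-based search in this model
still proposes port 0, while the name-based search moves on to port 1.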

Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process 
model")
Cc: sta...@dpdk.org

Suggested-by: Konstantin Ananyev 
Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
Acked-by: Konstantin Ananyev 
---
 lib/librte_ether/rte_ethdev.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 23b7442..3a25a64 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -164,8 +164,12 @@ struct rte_eth_dev *
unsigned i;
 
for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
-   if (rte_eth_devices[i].state == RTE_ETH_DEV_UNUSED)
+   /* Using shared name field to find a free port. */
+   if (rte_eth_dev_data[i].name[0] == '\0') {
+   RTE_ASSERT(rte_eth_devices[i].state ==
+  RTE_ETH_DEV_UNUSED);
return i;
+   }
}
return RTE_MAX_ETHPORTS;
 }
-- 
1.8.3.1



[dpdk-dev] [PATCH v4 1/7] ethdev: fix port data reset timing

2018-01-20 Thread Matan Azrad
The rte_eth_dev_data structure is allocated per ethdev port and can be
used to get the data of the port internally.

rte_eth_dev_attach_secondary tries to find the port identifier by
comparing the rte_eth_dev_data name field, and may get the identifier of
an invalid port if this port was released by the primary process,
because the port release API doesn't reset the port data.

So, it is better to reset the port data at release time instead of
at allocation time.

Move the port data reset to the port release API.
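
Why the reset timing matters can be sketched with a small model. It is not
the ethdev code: struct model_port_data, model_allocate(), model_release()
and model_lookup() are hypothetical, and the reset_on_release flag exists
only to compare the old and the new behaviour side by side.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PORTS 4
#define MODEL_NAME_LEN 64

/* Hypothetical, reduced per-port data. */
struct model_port_data {
    char name[MODEL_NAME_LEN];
    uint16_t port_id;
};

static struct model_port_data model_data[MODEL_MAX_PORTS];

/* Fill the port data, as done at allocation time. */
static void
model_allocate(uint16_t port_id, const char *name)
{
    snprintf(model_data[port_id].name, MODEL_NAME_LEN, "%s", name);
    model_data[port_id].port_id = port_id;
}

/* Release a port; the data is cleared only when reset_on_release is set. */
static void
model_release(uint16_t port_id, int reset_on_release)
{
    if (reset_on_release)
        memset(&model_data[port_id], 0, sizeof(model_data[port_id]));
}

/* Name search as a secondary process would do it; -1 when not found. */
static int
model_lookup(const char *name)
{
    int i;

    for (i = 0; i < MODEL_MAX_PORTS; i++)
        if (strcmp(model_data[i].name, name) == 0)
            return i;
    return -1;
}
```

Without the reset, the lookup still finds the name of a port that no longer
exists; with the reset at release time, the stale entry is gone as soon as
the port is released.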

Fixes: d948f596fee2 ("ethdev: fix port data mismatched in multiple process 
model")
Cc: sta...@dpdk.org

Signed-off-by: Matan Azrad 
Acked-by: Thomas Monjalon 
Acked-by: Konstantin Ananyev 
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index c4ff1b0..23b7442 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -204,7 +204,6 @@ struct rte_eth_dev *
return NULL;
}
 
-   memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
eth_dev = eth_dev_get(port_id);
snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
eth_dev->data->port_id = port_id;
@@ -252,6 +251,7 @@ struct rte_eth_dev *
if (eth_dev == NULL)
return -EINVAL;
 
+   memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
eth_dev->state = RTE_ETH_DEV_UNUSED;
 
_rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL);
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v5 0/6] TAP RSS eBPF cover letter

2018-01-20 Thread Ophir Munk
Hi Ferruh,
Thanks for applying v5 patches while changing the order of commits and adding 
"Acked-by: ..."

I have sent v6 which does the same but also updates the commit messages of the 
switched commits to reflect more accurately the new order. 

Please let me know if you are going to leave v5 as is or replace it with v6.

Regards,
Ophir

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Saturday, January 20, 2018 6:16 PM
> To: Pascal Mazon ; Ophir Munk
> ; dev@dpdk.org
> Cc: Thomas Monjalon ; Olga Shern
> 
> Subject: Re: [dpdk-dev] [PATCH v5 0/6] TAP RSS eBPF cover letter
> 
> On 1/19/2018 6:48 AM, Pascal Mazon wrote:
> > Hi,
> >
> > It seems more logical to me to introduce tap_program (patch 3) before
> > its compiled version (patch 2).
> > Source code is indeed written down before compiling it.
> >
> > The doc section is a good addition.
> > I'll be happy to see the upcoming utility for turning eBPF bytecode to
> > C arrays.
> > I'd have liked to see automation code (in a not-executed Makefile
> > target
> > typically) for generating the bytecode.
> > I'm being told it should happen in the upcoming series along with the
> > aforementioned utility.
> >
> > Otherwise code looks good enough (I couldn't see everything for lack
> > of time), considering that later patches are expected in next release.
> >
> > Acked-by: Pascal Mazon 
> >
> > Best regards,
> > Pascal
> >
> > On 18/01/2018 14:38, Ophir Munk wrote:
> >> The patches of TAP RSS eBPF follow the RFC on this issue
> >>
> >> https://dpdk.org/dev/patchwork/patch/31781/
> >>
> >> v5 changes with respect to v4
> >> =
> >> Update TAP document guide with RSS
> >>
> >> v4 changes with respect to v3
> >> =
> >> * Code updates based on review comments
> >> * New commits organization (2-->5) based on review comments
> >>   1. net/tap: support actions for different classifiers (preparations for 
> >> BPF.
> >>  No BPF code yet)
> >>   2. net/tap: add eBPF bytes code (BPF bytes code in a separate file)
> >>   3. net/tap: add eBPF program file (Program source code of bytes code)
> >>   4. net/tap: add eBPF API (BPF API to be used by TAP)
> >>   5. net/tap: implement TAP RSS using eBPF
> >>
> >> v3 changes with respect to v2
> >> =
> >> * Add support for IPv6 RSS in BPF program
> >> * Bug fixes
> >> * Updated compatibility to kernel versions:
> >>   eBPF requires Linux version 4.9 configured with BPF
> >> * New license header (SPDX) for newly added files
> >>
> >> v2 changes with respect to v1
> >> =
> >> * v2 has new commits organization (3 --> 2)
> >> * BPF program was revised. It is successfully tested on
> >>   IPv4 L3 L4 layers (compatible to mlx4 device)
> >> * Licensing: no comments received for using "Dual BSD/GPL"
> >>   string during BPF program loading to the kernel.
> >>   (v1 and v2 are using the same license strings)
> >>   Any comments are welcome.
> >> * Compatibility to kernel versions:
> >>   eBPF requires Linux version 4.2 configured with BPF. TAP PMD will
> >>   successfully compile on systems with old or non-BPF configured kernels.
> >>   During compilation time the required Linux headers are searched for.
> >>   If they are not present missing definitions are locally added
> >>   (tap_autoconf.h).
> >>   If the kernel cannot support a BPF operation - at runtime it will
> >>   gracefully reject the netlink message (with BPF) sent to it.
> >>
> >> Ophir Munk (6):
> >>   net/tap: support actions for different classifiers
> >>   net/tap: add eBPF bytes code
> >>   net/tap: add eBPF program file
> >>   net/tap: add eBPF API
> >>   net/tap: implement TAP RSS using eBPF
> >>   doc: detail new tap RSS feature in guides
> 
> Series applied to dpdk-next-net/master, thanks.


[dpdk-dev] [PATCH v4 6/7] net/failsafe: use ownership mechanism to own ports

2018-01-20 Thread Matan Azrad
Fail-safe PMD sub devices management is based on ethdev port mechanism.
So, the sub-devices management structures are exposed to other DPDK
entities which may use them in parallel to fail-safe PMD.

Use the new port ownership mechanism to prevent fail-safe PMD sub-devices
from being managed by several entities at once.

Signed-off-by: Matan Azrad 
Acked-by: Gaetan Rivet 
---
 drivers/net/failsafe/failsafe.c | 7 +++
 drivers/net/failsafe/failsafe_eal.c | 6 ++
 drivers/net/failsafe/failsafe_private.h | 2 ++
 3 files changed, 15 insertions(+)

diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index b767352..a1e1c7a 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -196,6 +196,13 @@
ret = failsafe_args_parse(dev, params);
if (ret)
goto free_subs;
+   ret = rte_eth_dev_owner_new(&priv->my_owner.id);
+   if (ret) {
+   ERROR("Failed to get unique owner identifier");
+   goto free_args;
+   }
+   snprintf(priv->my_owner.name, sizeof(priv->my_owner.name),
+FAILSAFE_OWNER_NAME);
ret = failsafe_eal_init(dev);
if (ret)
goto free_args;
diff --git a/drivers/net/failsafe/failsafe_eal.c 
b/drivers/net/failsafe/failsafe_eal.c
index 33a5adf..5f3da06 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -106,6 +106,12 @@
INFO("Taking control of a probed sub device"
  " %d named %s", i, da->name);
}
+   ret = rte_eth_dev_owner_set(pid, &PRIV(dev)->my_owner);
+   if (ret) {
+   INFO("sub_device %d owner set failed (%s),"
+" will try again later", i, strerror(ret));
+   continue;
+   }
ETH(sdev) = &rte_eth_devices[pid];
SUB_ID(sdev) = i;
sdev->fs_dev = dev;
diff --git a/drivers/net/failsafe/failsafe_private.h 
b/drivers/net/failsafe/failsafe_private.h
index 4916365..b377046 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -42,6 +42,7 @@
 #include 
 
 #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
+#define FAILSAFE_OWNER_NAME "Fail-safe"
 
 #define PMD_FAILSAFE_MAC_KVARG "mac"
 #define PMD_FAILSAFE_HOTPLUG_POLL_KVARG "hotplug_poll"
@@ -145,6 +146,7 @@ struct fs_priv {
uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
/* current capabilities */
struct rte_eth_dev_info infos;
+   struct rte_eth_dev_owner my_owner; /* Unique owner. */
/*
 * Fail-safe state machine.
 * This level will be tracking state of the EAL and eth
-- 
1.8.3.1



[dpdk-dev] [PATCH v4 5/7] net/failsafe: free an eth port by a dedicated API

2018-01-20 Thread Matan Azrad
Call the dedicated ethdev API to free the port at removal time, as is done in
other fail-safe places.

Signed-off-by: Matan Azrad 
Acked-by: Gaetan Rivet 
---
 drivers/net/failsafe/failsafe_ether.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/failsafe/failsafe_ether.c 
b/drivers/net/failsafe/failsafe_ether.c
index 8a4cacf..e9b0cfe 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -297,7 +297,7 @@
ERROR("Bus detach failed for sub_device %u",
  SUB_ID(sdev));
} else {
-   ETH(sdev)->state = RTE_ETH_DEV_UNUSED;
+   rte_eth_dev_release_port(ETH(sdev));
}
sdev->state = DEV_PARSED;
/* fallthrough */
-- 
1.8.3.1



[dpdk-dev] [PATCH v4 7/7] app/testpmd: adjust ethdev port ownership

2018-01-20 Thread Matan Azrad
Testpmd should not use ethdev ports which are managed by other DPDK
entities.

Set Testpmd ownership to each port which is not used by other entity and
prevent any usage of ethdev ports which are not owned by Testpmd.
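
The "only touch ports we own" rule can be sketched as an iterator that skips
every port held by someone else. This is a model under stated assumptions:
MODEL_FOREACH_OWNED_BY() and model_find_next_owned_by() are hypothetical names
that only imitate the role of the RTE_ETH_FOREACH_DEV_OWNED_BY() iterator used
in the diff below, and the two arrays stand in for the real port tables.

```c
#include <stdint.h>

#define MODEL_MAX_PORTS 6
#define MODEL_NO_OWNER 0

/* Hypothetical port tables: in-use flag and owner id per port. */
static int model_in_use[MODEL_MAX_PORTS];
static uint64_t model_owner_of[MODEL_MAX_PORTS];

/* Return the first port at or after port_id that is in use and owned by
 * owner_id, or MODEL_MAX_PORTS when there is none. */
static uint16_t
model_find_next_owned_by(uint16_t port_id, uint64_t owner_id)
{
    while (port_id < MODEL_MAX_PORTS &&
           (!model_in_use[port_id] || model_owner_of[port_id] != owner_id))
        port_id++;
    return port_id;
}

/* Loop over the ports owned by o, storing each id in p. */
#define MODEL_FOREACH_OWNED_BY(p, o) \
    for (p = model_find_next_owned_by(0, o); \
         p < MODEL_MAX_PORTS; \
         p = model_find_next_owned_by((uint16_t)(p + 1), o))

/* Example user: count the ports an owner may operate on. */
static unsigned int
model_count_owned(uint64_t owner_id)
{
    uint16_t pid;
    unsigned int count = 0;

    MODEL_FOREACH_OWNED_BY(pid, owner_id)
        count++;
    return count;
}
```

An application written this way never sees the ports held by another entity
in its loops, so it cannot reconfigure them by accident.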

Signed-off-by: Matan Azrad 
---
 app/test-pmd/cmdline.c  | 89 +++--
 app/test-pmd/cmdline_flow.c |  2 +-
 app/test-pmd/config.c   | 37 ++-
 app/test-pmd/parameters.c   |  4 +-
 app/test-pmd/testpmd.c  | 63 
 app/test-pmd/testpmd.h  |  3 ++
 6 files changed, 103 insertions(+), 95 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 31919ba..6199c64 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -1394,7 +1394,7 @@ struct cmd_config_speed_all {
&link_speed) < 0)
return;
 
-   RTE_ETH_FOREACH_DEV(pid) {
+   RTE_ETH_FOREACH_DEV_OWNED_BY(pid, my_owner.id) {
ports[pid].dev_conf.link_speeds = link_speed;
}
 
@@ -1902,7 +1902,7 @@ struct cmd_config_rss {
struct cmd_config_rss *res = parsed_result;
struct rte_eth_rss_conf rss_conf = { .rss_key_len = 0, };
int diag;
-   uint8_t i;
+   uint16_t pid;
 
if (!strcmp(res->value, "all"))
rss_conf.rss_hf = ETH_RSS_IP | ETH_RSS_TCP |
@@ -1936,12 +1936,12 @@ struct cmd_config_rss {
return;
}
rss_conf.rss_key = NULL;
-   for (i = 0; i < rte_eth_dev_count(); i++) {
-   diag = rte_eth_dev_rss_hash_update(i, &rss_conf);
+   RTE_ETH_FOREACH_DEV_OWNED_BY(pid, my_owner.id) {
+   diag = rte_eth_dev_rss_hash_update(pid, &rss_conf);
if (diag < 0)
printf("Configuration of RSS hash at ethernet port %d "
"failed with error (%d): %s.\n",
-   i, -diag, strerror(-diag));
+   pid, -diag, strerror(-diag));
}
 }
 
@@ -3686,10 +3686,9 @@ struct cmd_csum_result {
uint64_t csum_offloads = 0;
struct rte_eth_dev_info dev_info;
 
-   if (port_id_is_invalid(res->port_id, ENABLED_WARN)) {
-   printf("invalid port %d\n", res->port_id);
+   if (port_id_is_invalid(res->port_id, ENABLED_WARN))
return;
-   }
+
if (!port_is_stopped(res->port_id)) {
printf("Please stop port %d first\n", res->port_id);
return;
@@ -4364,8 +4363,8 @@ struct cmd_gso_show_result {
 {
struct cmd_gso_show_result *res = parsed_result;
 
-   if (!rte_eth_dev_is_valid_port(res->cmd_pid)) {
-   printf("invalid port id %u\n", res->cmd_pid);
+   if (port_id_is_invalid(res->cmd_pid, ENABLED_WARN)) {
+   printf("invalid/not owned port id %u\n", res->cmd_pid);
return;
}
if (!strcmp(res->cmd_keyword, "gso")) {
@@ -5375,7 +5374,12 @@ static void cmd_create_bonded_device_parsed(void *parsed_result,
port_id);
 
/* Update number of ports */
-   nb_ports = rte_eth_dev_count();
+   if (rte_eth_dev_owner_set(port_id, &my_owner) != 0) {
+   printf("Error: cannot own new attached port %d\n",
+  port_id);
+   return;
+   }
+   nb_ports++;
reconfig(port_id, res->socket);
rte_eth_promiscuous_enable(port_id);
}
@@ -5484,10 +5488,8 @@ static void cmd_set_bond_mon_period_parsed(void *parsed_result,
struct cmd_set_bond_mon_period_result *res = parsed_result;
int ret;
 
-   if (res->port_num >= nb_ports) {
-   printf("Port id %d must be less than %d\n", res->port_num, nb_ports);
+   if (port_id_is_invalid(res->port_num, ENABLED_WARN))
return;
-   }
 
ret = rte_eth_bond_link_monitoring_set(res->port_num, res->period_ms);
 
@@ -5545,11 +5547,8 @@ struct cmd_set_bonding_agg_mode_policy_result {
struct cmd_set_bonding_agg_mode_policy_result *res = parsed_result;
uint8_t policy = AGG_BANDWIDTH;
 
-   if (res->port_num >= nb_ports) {
-   printf("Port id %d must be less than %d\n",
-   res->port_num, nb_ports);
+   if (port_id_is_invalid(res->port_num, ENABLED_WARN))
return;
-   }
 
if (!strcmp(res->policy, "bandwidth"))
policy = AGG_BANDWIDTH;
@@ -5808,7 +5807,7 @@ static void cmd_set_promisc_mode_parsed(void *parsed_result,
 
/* all ports */
if (allports) {
-   RTE_ETH_FOREACH_DEV(i) {
+   RTE_ETH_FOREACH_DEV_OWNED_BY(i, my_owner.id) {
if (enable)
rte_eth_promiscuous_enable(i);
else
@@ -5888,7 +5887,7 @@ static void cmd_se

Re: [dpdk-dev] [PATCH v11 5/5] net/virtio: support GUEST ANNOUNCE

2018-01-20 Thread Wang, Xiao W


> -Original Message-
> From: Yigit, Ferruh
> Sent: Saturday, January 20, 2018 10:31 PM
> To: Wang, Xiao W ; y...@fridaylinux.org;
> olivier.m...@6wind.com; maxime.coque...@redhat.com; Thomas Monjalon
> 
> Cc: dev@dpdk.org; Bie, Tiwei ;
> step...@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v11 5/5] net/virtio: support GUEST ANNOUNCE
> 
> On 1/19/2018 5:33 PM, Ferruh Yigit wrote:
> > On 1/16/2018 9:41 PM, Xiao Wang wrote:
> >> When live migration is done, for the backup VM, either the virtio
> >> frontend or the vhost backend needs to send out gratuitous RARP packet
> >> to announce its new network location.
> >>
> >> This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support live
> >> migration scenario where the vhost backend doesn't have the ability to
> >> generate RARP packet.
> >>
> >> Brief introduction of the work flow:
> >> 1. QEMU finishes live migration, pokes the backup VM with an interrupt.
> >> 2. Virtio interrupt handler reads out the interrupt status value, and
> >>    realizes it needs to send out RARP packet to announce its location.
> >> 3. Pause device to stop worker thread touching the queues.
> >> 4. Inject a RARP packet into a Tx Queue.
> >> 5. Ack the interrupt via control queue.
> >> 6. Resume device to continue packet processing.
> >>
> >> Signed-off-by: Xiao Wang 
> >> Reviewed-by: Maxime Coquelin 
> >
> >
> > Hi Yuanhan,
> >
> > This commit breaks the build!
> 
> I switched the order of the two patches and the problem is gone, like:
> first: net: fixup RARP generation
> second: net/virtio: support GUEST ANNOUNCE
> 
> From my point of view nothing more needs to be done, but can you please
> double check the patches.

The 2 patches are OK.
Thanks!

BRs,
Xiao
> 
> Thanks,
> ferruh
> 
> >
> > As far as I understand, you sent a fix, but it was merged into another
> > patch, which leaves this commit still broken.
> >
> > What do you think about sending a fix that can be merged into this one,
> > so I can squash it on next-net?
> >
> > Thanks,
> > ferruh
> >



Re: [dpdk-dev] [PATCH] net/mlx5: fix Memory Region lookup

2018-01-20 Thread Shahaf Shuler
Friday, January 19, 2018 10:37 AM, Nélio Laranjeiro:
> On Thu, Jan 18, 2018 at 11:52:55PM -0800, Yongseok Koh wrote:
> > This patch reverts:
> > commit 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration")
> >
> > Although granularity of chunks in a mempool is a cacheline, addresses
> > are extended to align to page boundary for performance reason in
> > device when registering a MR (Memory Region). This could make some
> > regions overlap, then can cause Tx completion error due to incorrect
> > LKEY search. If the error occurs, the Tx queue will get stuck. It is
> > because buffer address is compared against aligned addresses for
> > Memory Region. Saving original addresses of mempool for comparison
> > doesn't create any overlap.
> >
> > Fixes: b0b093845793 ("net/mlx5: use buffer address for LKEY search")
> > Fixes: 3a6f2eb8c5c5 ("net/mlx5: fix Memory Region registration")
> > Cc: sta...@dpdk.org
> >
> > Reported-by: Xueming Li 
> > Signed-off-by: Xueming Li 
> > Signed-off-by: Yongseok Koh 
> Acked-by: Nelio Laranjeiro 

Applied to next-net-mlx, thanks. 

> 
> --
> Nélio Laranjeiro
> 6WIND
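
The overlap described in the commit log can be reproduced with plain
arithmetic. The sketch below is a stand-alone illustration, not the mlx5
code: struct mock_mr, the 4096-byte page size, the example addresses and
both lookup helpers are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE ((uintptr_t)4096)  /* assumed page size */
#define MOCK_LKEY_NONE 0u                 /* "not found" in this sketch */

/* Hypothetical MR record holding the original mempool bounds. */
struct mock_mr {
	uintptr_t start;   /* first byte of the mempool chunk */
	uintptr_t end;     /* one past the last byte */
	uint32_t lkey;
};

static uintptr_t mock_align_floor(uintptr_t addr)
{
	return addr & ~(MOCK_PAGE_SIZE - 1);
}

static uintptr_t mock_align_ceil(uintptr_t addr)
{
	return mock_align_floor(addr + MOCK_PAGE_SIZE - 1);
}

/* Compare against page-aligned bounds: neighbouring regions may overlap. */
static uint32_t mock_lookup_aligned(const struct mock_mr *mr, size_t n,
				    uintptr_t addr)
{
	size_t i;

	assert(mr != NULL);
	for (i = 0; i < n; i++)
		if (addr >= mock_align_floor(mr[i].start) &&
		    addr < mock_align_ceil(mr[i].end))
			return mr[i].lkey;
	return MOCK_LKEY_NONE;
}

/* Compare against the original bounds: no overlap is introduced. */
static uint32_t mock_lookup_original(const struct mock_mr *mr, size_t n,
				     uintptr_t addr)
{
	size_t i;

	assert(mr != NULL);
	for (i = 0; i < n; i++)
		if (addr >= mr[i].start && addr < mr[i].end)
			return mr[i].lkey;
	return MOCK_LKEY_NONE;
}
```

With regions [0x1000, 0x1800) and [0x1840, 0x3000), both aligned ranges
cover 0x1900, so the aligned lookup returns the first region's key even
though the buffer belongs to the second one.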


Re: [dpdk-dev] [dpdk-stable] [PATCH v2 2/2] net/mlx5: fix allocation when no memory on device NUMA node

2018-01-20 Thread Shahaf Shuler
Friday, January 19, 2018 6:25 PM, Olivier Matz:
on the same numa node than the device, it is
> preferable to fallback on another socket instead of failing.
> 
> Fixes: 1e3a39f72d5d ("net/mlx5: allocate verbs object into shared memory")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Olivier Matz 
> Signed-off-by: Nelio Laranjeiro 
> ---
> 
> This new version of the patch was provided by Nelio (thanks), I validated it
> on my platform. I just did minimal changes to fix the checkpatch issues in the
> comments of mlx5.h (/** instead of /*).

Per my understanding, the patch below selects the socket on which to create
the Verbs object based on the ethdev configuration rather than on the PCI
NUMA node.
While it introduces the infrastructure to fall back to another socket id,
that fallback is not yet used.
I think the commit log should be modified to better explain this patch.
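
To make that reading concrete, here is a minimal stand-alone sketch of the
selection logic. The types and names (mock_alloc_ctx, mock_queue_ctrl,
mock_pick_socket, MOCK_SOCKET_ID_ANY) are hypothetical and only mirror the
shape of the quoted diff; no fallback to another socket is attempted,
matching the observation above.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SOCKET_ID_ANY (-1)  /* assumed "no preference" value */

enum mock_alloc_type {
	MOCK_ALLOC_TYPE_NONE,
	MOCK_ALLOC_TYPE_TX_QUEUE,
	MOCK_ALLOC_TYPE_RX_QUEUE,
};

/* Hypothetical queue control block: only the socket matters here. */
struct mock_queue_ctrl {
	int socket;  /* socket requested when the queue was set up */
};

/* Context telling the allocator callback what is being allocated. */
struct mock_alloc_ctx {
	enum mock_alloc_type type;
	const void *obj;
};

/*
 * Pick the socket from the queue configuration when a queue is being
 * created, otherwise express no preference.
 */
static int mock_pick_socket(const struct mock_alloc_ctx *ctx)
{
	const struct mock_queue_ctrl *ctrl;

	assert(ctx != NULL);
	switch (ctx->type) {
	case MOCK_ALLOC_TYPE_TX_QUEUE:
	case MOCK_ALLOC_TYPE_RX_QUEUE:
		ctrl = ctx->obj;
		return ctrl != NULL ? ctrl->socket : MOCK_SOCKET_ID_ANY;
	default:
		return MOCK_SOCKET_ID_ANY;
	}
}
```

The device's own NUMA node never enters the decision in this sketch, which
is the behavioural change the commit log would need to spell out.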

> 
>  drivers/net/mlx5/mlx5.c | 14 --
>  drivers/net/mlx5/mlx5.h | 20 
>  drivers/net/mlx5/mlx5_rxq.c |  4 
>  drivers/net/mlx5/mlx5_txq.c |  4 
>  4 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 1c95f3520..7a04ccf98 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -139,10 +139,20 @@ mlx5_alloc_verbs_buf(size_t size, void *data)
>   struct priv *priv = data;
>   void *ret;
>   size_t alignment = sysconf(_SC_PAGESIZE);
> + unsigned int socket = SOCKET_ID_ANY;
> 
> + if (priv->verbs_alloc_ctx.type == MLX5_VERSB_ALLOC_TYPE_TX_QUEUE) {
> + const struct mlx5_txq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
> +
> + socket = ctrl->socket;
> + } else if (priv->verbs_alloc_ctx.type ==
> +MLX5_VERSB_ALLOC_TYPE_RX_QUEUE) {
> + const struct mlx5_rxq_ctrl *ctrl = priv->verbs_alloc_ctx.obj;
> +
> + socket = ctrl->socket;
> + }
>   assert(data != NULL);
> - ret = rte_malloc_socket(__func__, size, alignment,
> - priv->dev->device->numa_node);
> + ret = rte_malloc_socket(__func__, size, alignment, socket);
>   DEBUG("Extern alloc size: %lu, align: %lu: %p", size, alignment, ret);
>   return ret;
>  }
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index e740a4e77..abcae95b8 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -123,6 +123,24 @@ struct mlx5_dev_config {
>   int inline_max_packet_sz; /* Max packet size for inlining. */
>  };
> 
> +/**
> + * Type of objet being allocated.
> + */
> +enum mlx5_verbs_alloc_type {
> + MLX5_VERSB_ALLOC_TYPE_NONE,
> + MLX5_VERSB_ALLOC_TYPE_TX_QUEUE,
> + MLX5_VERSB_ALLOC_TYPE_RX_QUEUE,
> +};
> +
> +/**
> + * Verbs allocator needs a context to know in the callback which kind of
> + * resources it is allocating.
> + */
> +struct mlx5_verbs_alloc_ctx {
> + enum mlx5_verbs_alloc_type type; /* Kind of object being allocated. */
> + const void *obj; /* Pointer to the DPDK object. */
> +};
> +
>  struct priv {
>   struct rte_eth_dev *dev; /* Ethernet device of master process. */
>   struct ibv_context *ctx; /* Verbs context. */
> @@ -164,6 +182,8 @@ struct priv {
>   int primary_socket; /* Unix socket for primary process. */
>   struct rte_intr_handle intr_handle_socket; /* Interrupt handler. */
>   struct mlx5_dev_config config; /* Device configuration. */
> + struct mlx5_verbs_alloc_ctx verbs_alloc_ctx;
> + /* Context for Verbs allocator. */
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 950472754..a43a67526 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -655,6 +655,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
> 
>   assert(rxq_data);
>   assert(!rxq_ctrl->ibv);
> + priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_RX_QUEUE;
> + priv->verbs_alloc_ctx.obj = rxq_ctrl;
>   tmpl = rte_calloc_socket(__func__, 1, sizeof(*tmpl), 0,
>rxq_ctrl->socket);
>   if (!tmpl) {
> @@ -818,6 +820,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
>   DEBUG("%p: Verbs Rx queue %p: refcnt %d", (void *)priv,
> (void *)tmpl, rte_atomic32_read(&tmpl->refcnt));
>   LIST_INSERT_HEAD(&priv->rxqsibv, tmpl, next);
> + priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>   return tmpl;
>  error:
>   if (tmpl->wq)
> @@ -828,6 +831,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
>   claim_zero(ibv_destroy_comp_channel(tmpl->channel));
>   if (tmpl->mr)
>   priv_mr_release(priv, tmpl->mr);
> + priv->verbs_alloc_ctx.type = MLX5_VERSB_ALLOC_TYPE_NONE;
>   return NULL;
>  }
> 
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 26db15a4f..b43cc9ed0 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net