Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Yuanhan Liu
On Tue, Dec 06, 2016 at 03:53:42PM +, Ferruh Yigit wrote:
> > Please, we need a comment for each driver saying
> > "it is OK, we do not need any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
> >

Sorry for late. For virtio, I think it's not a must. The checksum stuff
has been handled inside the Tx function. However, we may could move it
to tx_prepare, which would actually recover the performance lost
introduced while enabling TSO for the non-TSO case.

--yliu


[dpdk-dev] Does DPDK 1.7.1 support jumboframes for VMXNET3

2016-12-07 Thread Kanika Singhal
Hi,

I am a new to DPDK.

My product is using DPDK 1.6 and it does not support jumbo frames for
VMXNET3.

I have to  upgrade to 1.7.1, but i am not able to find info on whether the
jumbo frame support is being added for VMXNET3 or not.

I can see some jumbo frame related patches being posted for vmxnet3

Do I need to apply the patches in 1.7.1 to make jumbo work/Is there any
DPDK version which supports jumbo?

Thanks,
Kanika


Re: [dpdk-dev] [PATCH 1/8] drivers/common/dpaa2: Run time assembler for Descriptor formation

2016-12-07 Thread Thomas Monjalon
2016-12-07 06:24, Akhil Goyal:
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
> 2016-12-05 18:25, Akhil Goyal:
> > FLib is a library which helps in making the descriptors which is 
> > understood by NXP's SEC hardware.
> > This patch provides header files for command words which can be used 
> > for descritptor formation.
> 
> It seems this code is old. Does it exist as a standalone library somewhere?
> Where was it hosted before duplicating it in DPDK?
> 
> Why do you want to have a common directory drivers/common/dpaa2/flib instead 
> of a sub-directory in the crypto driver?
> 
> [Akhil] This is not really a library. This is a set of header files which is 
> required for compilation. We have 2 other cypto drivers (for different 
> platforms viz: Non-DPAA and DPAA1_QORIQ) which uses the same flib. So we put 
> it in common directory. We plan to send patches for other drivers in the 
> upcoming releases. 

Please Akhil, could you answer to the three questions?


Re: [dpdk-dev] [PATCH 11/32] net/dpaa2: add dpaa2 vfio support

2016-12-07 Thread Thomas Monjalon
2016-12-07 12:30, Hemant Agrawal:
> On 12/7/2016 2:34 AM, Thomas Monjalon wrote:
> > 2016-12-04 23:47, Hemant Agrawal:
> >> Add support for using VFIO for dpaa2 based fsl-mc bus.
> >
> > Why do we need so much special code for interfacing VFIO on fsl-mc?
> > Can you reuse some code from EAL VFIO?
> >
> 
> fsl-mc VFIO scans the objects.  So, it is slightly different.
> 
> Even though I have tried to minimize changes and re-use the code from 
> EAL VFIO. It is still refactoring work, which we will take up little later.

Do you mean you could re-use some EAL code but do not want to do it now?
It sounds like something heard too many times earlier (from others).
The experience tells we must not wait to do things right.


[dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy

2016-12-07 Thread Zhihong Wang
This patch optimizes rte_memcpy for well aligned cases, where both
dst and src addr are aligned to maximum MOV width. It introduces a
dedicated function called rte_memcpy_aligned to handle the aligned
cases with simplified instruction stream. The existing rte_memcpy
is renamed as rte_memcpy_generic. The selection between them 2 is
done at the entry of rte_memcpy.

The existing rte_memcpy is for generic cases, it handles unaligned
copies and make store aligned, it even makes load aligned for micro
architectures like Ivy Bridge. However alignment handling comes at
a price: It adds extra load/store instructions, which can cause
complications sometime.

DPDK Vhost memcpy with Mergeable Rx Buffer feature as an example:
The copy is aligned, and remote, and there is header write along
which is also remote. In this case the memcpy instruction stream
should be simplified, to reduce extra load/store, therefore reduce
the probability of load/store buffer full caused pipeline stall, to
let the actual memcpy instructions be issued and let H/W prefetcher
goes to work as early as possible.

This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
from 64 to 1500 bytes.

The test can also be conducted without NIC, by setting loopback
traffic between Virtio and Vhost. For example, modify the macro
TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
rebuild and start testpmd in both host and guest, then "start" on
one side and "start tx_first 32" on the other.


Signed-off-by: Zhihong Wang 
---
 .../common/include/arch/x86/rte_memcpy.h   | 81 +-
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index b3bfc23..b9785e8 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -69,6 +69,8 @@ rte_memcpy(void *dst, const void *src, size_t n) 
__attribute__((always_inline));
 
 #ifdef RTE_MACHINE_CPUFLAG_AVX512F
 
+#define ALIGNMENT_MASK 0x3F
+
 /**
  * AVX512 implementation below
  */
@@ -189,7 +191,7 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 }
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
uintptr_t dstu = (uintptr_t)dst;
uintptr_t srcu = (uintptr_t)src;
@@ -308,6 +310,8 @@ COPY_BLOCK_128_BACK63:
 
 #elif defined RTE_MACHINE_CPUFLAG_AVX2
 
+#define ALIGNMENT_MASK 0x1F
+
 /**
  * AVX2 implementation below
  */
@@ -387,7 +391,7 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 }
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
uintptr_t dstu = (uintptr_t)dst;
uintptr_t srcu = (uintptr_t)src;
@@ -499,6 +503,8 @@ COPY_BLOCK_128_BACK31:
 
 #else /* RTE_MACHINE_CPUFLAG */
 
+#define ALIGNMENT_MASK 0x0F
+
 /**
  * SSE & AVX implementation below
  */
@@ -677,7 +683,7 @@ __extension__ ({
  \
 })
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
uintptr_t dstu = (uintptr_t)dst;
@@ -821,6 +827,75 @@ COPY_BLOCK_64_BACK15:
 
 #endif /* RTE_MACHINE_CPUFLAG */
 
+static inline void *
+rte_memcpy_aligned(void *dst, const void *src, size_t n)
+{
+   void *ret = dst;
+
+   /* Copy size <= 16 bytes */
+   if (n < 16) {
+   if (n & 0x01) {
+   *(uint8_t *)dst = *(const uint8_t *)src;
+   src = (const uint8_t *)src + 1;
+   dst = (uint8_t *)dst + 1;
+   }
+   if (n & 0x02) {
+   *(uint16_t *)dst = *(const uint16_t *)src;
+   src = (const uint16_t *)src + 1;
+   dst = (uint16_t *)dst + 1;
+   }
+   if (n & 0x04) {
+   *(uint32_t *)dst = *(const uint32_t *)src;
+   src = (const uint32_t *)src + 1;
+   dst = (uint32_t *)dst + 1;
+   }
+   if (n & 0x08)
+   *(uint64_t *)dst = *(const uint64_t *)src;
+
+   return ret;
+   }
+
+   /* Copy 16 <= size <= 32 bytes */
+   if (n <= 32) {
+   rte_mov16((uint8_t *)dst, (const uint8_t *)src);
+   rte_mov16((uint8_t *)dst - 16 + n,
+   (const uint8_t *)src - 16 + n);
+
+   return ret;
+   }
+
+   /* Copy 32 < size <= 64 bytes */
+   if (n <= 64) {
+   rte_mov32((uint8_t *)dst, (const uint8_t *)src);
+   rte_mov32((uint8_t *)dst - 32 + n,
+  

Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset and related test

2016-12-07 Thread Yang, Zhiyong
Hi, Maxime:

> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, December 6, 2016 4:30 PM
> To: Yang, Zhiyong ; dev@dpdk.org
> Cc: yuanhan@linux.intel.com; Richardson, Bruce
> ; Ananyev, Konstantin
> ; Pierre Pfister (ppfister)
> 
> Subject: Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset
> and related test
> 
> 
> 
> On 12/06/2016 07:33 AM, Yang, Zhiyong wrote:
> > Hi, Maxime:
> >
> >> -Original Message-
> >> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> >> Sent: Friday, December 2, 2016 6:01 PM
> >> To: Yang, Zhiyong ; dev@dpdk.org
> >> Cc: yuanhan@linux.intel.com; Richardson, Bruce
> >> ; Ananyev, Konstantin
> >> 
> >> Subject: Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset
> >> and related test
> >>
> >> Hi Zhiyong,
> >>
> >> On 12/05/2016 09:26 AM, Zhiyong Yang wrote:
> >>> DPDK code has met performance drop badly in some case when calling
> >>> glibc function memset. Reference to discussions about memset in
> >>> http://dpdk.org/ml/archives/dev/2016-October/048628.html
> >>> It is necessary to introduce more high efficient function to fix it.
> >>> One important thing about rte_memset is that we can get clear
> >>> control on what instruction flow is used.
> >>>
> >>> This patchset introduces rte_memset to bring more high efficient
> >>> implementation, and will bring obvious perf improvement, especially
> >>> for small N bytes in the most application scenarios.
> >>>
> >>> Patch 1 implements rte_memset in the file rte_memset.h on IA
> >>> platform The file supports three types of instruction sets including
> >>> sse & avx (128bits), avx2(256bits) and avx512(512bits). rte_memset
> >>> makes use of vectorization and inline function to improve the perf
> >>> on IA. In addition, cache line and memory alignment are fully taken
> >>> into
> >> consideration.
> >>>
> >>> Patch 2 implements functional autotest to validates the function
> >>> whether to work in a right way.
> >>>
> >>> Patch 3 implements performance autotest separately in cache and
> memory.
> >>>
> >>> Patch 4 Using rte_memset instead of copy_virtio_net_hdr can bring
> >>> 3%~4% performance improvements on IA platform from virtio/vhost
> >>> non-mergeable loopback testing.
> >>>
> >>> Zhiyong Yang (4):
> >>>   eal/common: introduce rte_memset on IA platform
> >>>   app/test: add functional autotest for rte_memset
> >>>   app/test: add performance autotest for rte_memset
> >>>   lib/librte_vhost: improve vhost perf using rte_memset
> >>>
> >>>  app/test/Makefile  |   3 +
> >>>  app/test/test_memset.c | 158 +
> >>>  app/test/test_memset_perf.c| 348
> +++
> >>>  doc/guides/rel_notes/release_17_02.rst |  11 +
> >>>  .../common/include/arch/x86/rte_memset.h   | 376
> >> +
> >>>  lib/librte_eal/common/include/generic/rte_memset.h |  51 +++
> >>>  lib/librte_vhost/virtio_net.c  |  18 +-
> >>>  7 files changed, 958 insertions(+), 7 deletions(-)  create mode
> >>> 100644 app/test/test_memset.c  create mode 100644
> >>> app/test/test_memset_perf.c  create mode 100644
> >>> lib/librte_eal/common/include/arch/x86/rte_memset.h
> >>>  create mode 100644
> >> lib/librte_eal/common/include/generic/rte_memset.h
> >>>
> >>
> >> Thanks for the series, idea looks good to me.
> >>
> >> Wouldn't be worth to also use rte_memset in Virtio PMD (not
> >> compiled/tested)? :
> >>
> >
> > I think  rte_memset  maybe can bring some benefit here,  but , I'm not
> > clear how to enter the branch and test it. :)
> 
> Indeed, you will need Pierre's patch:
> [dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set
> 
> Thanks,
> Maxime
> >
Thank you Maxime.
I can see a little, but not obviously  performance improvement here.  
You know, memset(hdr, 0, head_size); only consumes  fewer cycles for virtio 
pmd. 
head_size only  10 or 12 bytes.
I optimize rte_memset perf further for N=8~15 bytes.
The main purpose of Introducing rte_memset is that we can use it
to avoid perf drop issue instead of glibc memset on some platform, I think. 

> >
> >> diff --git a/drivers/net/virtio/virtio_rxtx.c
> >> b/drivers/net/virtio/virtio_rxtx.c
> >> index 22d97a4..a5f70c4 100644
> >> --- a/drivers/net/virtio/virtio_rxtx.c
> >> +++ b/drivers/net/virtio/virtio_rxtx.c
> >> @@ -287,7 +287,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq,
> >> struct rte_mbuf *cookie,
> >>  rte_pktmbuf_prepend(cookie, head_size);
> >>  /* if offload disabled, it is not zeroed below, do it now 
> >> */
> >>  if (offload == 0)
> >> -   memset(hdr, 0, head_size);
> >> +   rte_memset(hdr, 0, head_size);
> >>  } else if (use_indirect) {
> >>  /* setup tx ring slot to point to indirect
> >>   * descriptor

Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset and related test

2016-12-07 Thread Yuanhan Liu
On Wed, Dec 07, 2016 at 09:28:17AM +, Yang, Zhiyong wrote:
> > >> Wouldn't be worth to also use rte_memset in Virtio PMD (not
> > >> compiled/tested)? :
> > >>
> > >
> > > I think  rte_memset  maybe can bring some benefit here,  but , I'm not
> > > clear how to enter the branch and test it. :)
> > 
> > Indeed, you will need Pierre's patch:
> > [dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set

I will apply it shortly.

> > Thanks,
> > Maxime
> > >
> Thank you Maxime.
> I can see a little, but not obviously  performance improvement here.  

Are you you have run into that code piece? FYI, you have to enable
virtio 1.0 explicitly, which is disabled by deafault.

> You know, memset(hdr, 0, head_size); only consumes  fewer cycles for virtio 
> pmd. 
> head_size only  10 or 12 bytes.
> I optimize rte_memset perf further for N=8~15 bytes.
> The main purpose of Introducing rte_memset is that we can use it
> to avoid perf drop issue instead of glibc memset on some platform, I think. 

For this case (as well as the 4th patch), it's more about making sure
rte_memset is inlined.

--yliu


Re: [dpdk-dev] [PATCH v2] doc: introduce PVP reference benchmark

2016-12-07 Thread Yuanhan Liu
On Tue, Dec 06, 2016 at 01:24:40PM +0100, Maxime Coquelin wrote:
> Having reference benchmarks is important in order to obtain
> reproducible performance figures.
> 
> This patch describes required steps to configure a PVP setup
> using testpmd in both host and guest.
> 
> Not relying on external vSwitch ease integration in a CI loop by
> not being impacted by DPDK API changes.
> 
> Signed-off-by: Maxime Coquelin 

Reviewed-by: Yuanhan Liu 

Thanks for the doc!

--yliu


Re: [dpdk-dev] [PATCH v2] virtio: tx with can_push when VERSION_1 is set

2016-12-07 Thread Yuanhan Liu
On Wed, Nov 30, 2016 at 09:18:42AM +, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
> 
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
> 
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.
> 
> Signed-off-by: Pierre Pfister 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset and related test

2016-12-07 Thread Yang, Zhiyong
Hi, yuanhan:

> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan@linux.intel.com]
> Sent: Wednesday, December 7, 2016 5:38 PM
> To: Yang, Zhiyong 
> Cc: Maxime Coquelin ; dev@dpdk.org;
> Richardson, Bruce ; Ananyev, Konstantin
> ; Pierre Pfister (ppfister)
> 
> Subject: Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset
> and related test
> 
> On Wed, Dec 07, 2016 at 09:28:17AM +, Yang, Zhiyong wrote:
> > > >> Wouldn't be worth to also use rte_memset in Virtio PMD (not
> > > >> compiled/tested)? :
> > > >>
> > > >
> > > > I think  rte_memset  maybe can bring some benefit here,  but , I'm
> > > > not clear how to enter the branch and test it. :)
> > >
> > > Indeed, you will need Pierre's patch:
> > > [dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set
> 
> I will apply it shortly.
> 
> > > Thanks,
> > > Maxime
> > > >
> > Thank you Maxime.
> > I can see a little, but not obviously  performance improvement here.
> 
> Are you you have run into that code piece? FYI, you have to enable virtio 1.0
> explicitly, which is disabled by deafault.

Yes. I use the patch from Pierre and set offload  = 0 ; 
Thanks
Zhiyong

> 
> > You know, memset(hdr, 0, head_size); only consumes  fewer cycles for
> virtio pmd.
> > head_size only  10 or 12 bytes.
> > I optimize rte_memset perf further for N=8~15 bytes.
> > The main purpose of Introducing rte_memset is that we can use it to
> > avoid perf drop issue instead of glibc memset on some platform, I think.
> 
> For this case (as well as the 4th patch), it's more about making sure
> rte_memset is inlined.
> 
>   --yliu


Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset and related test

2016-12-07 Thread Yuanhan Liu
On Wed, Dec 07, 2016 at 09:43:06AM +, Yang, Zhiyong wrote:
> > On Wed, Dec 07, 2016 at 09:28:17AM +, Yang, Zhiyong wrote:
> > > > >> Wouldn't be worth to also use rte_memset in Virtio PMD (not
> > > > >> compiled/tested)? :
> > > > >>
> > > > >
> > > > > I think  rte_memset  maybe can bring some benefit here,  but , I'm
> > > > > not clear how to enter the branch and test it. :)
> > > >
> > > > Indeed, you will need Pierre's patch:
> > > > [dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set
> > 
> > I will apply it shortly.
> > 
> > > > Thanks,
> > > > Maxime
> > > > >
> > > Thank you Maxime.
> > > I can see a little, but not obviously  performance improvement here.
> > 
> > Are you you have run into that code piece? FYI, you have to enable virtio 
> > 1.0
> > explicitly, which is disabled by deafault.
> 
> Yes. I use the patch from Pierre and set offload  = 0 ; 

I meant virtio 1.0. Have you added following options for the QEMU virtio-net
device?

disable-modern=false

--yliu


Re: [dpdk-dev] [PATCH 00/13] Introducing EAL Bus-Device-Driver Model

2016-12-07 Thread Shreyansh Jain

Hello David,

On Wednesday 07 December 2016 02:22 AM, David Marchand wrote:

"Big patchset and a lot of things to look at.

Here is a first look at it.


Thanks for comments.



On Sun, Dec 4, 2016 at 11:11 AM, Shreyansh Jain  wrote:

In continuation to the RFC posted on 17/Nov [9],
A series of patches is being posted which attempts to create:
 1. A basic bus model
`- define rte_bus and associated methods/helpers
`- test infrastructure to test the Bus infra
 2. Changes in EAL to support PCI as a bus
`- a "pci" bus is registered
`- existing scan/match/probe are modified to allow for bus integration
`- PCI Device and Driver list, which were global entities, have been
   moved to rte_bus->[device/driver]_list

I have sanity tested this patch over a XeonD X552 available with me, as
well as part of PoC for verifying NXP's DPAA2 PMD (being pushed out in a
separate series). Exhaustive testing is still pending.


I saw some checkpatch issues for patch 1 (I would ignore this one) and
4 (please check).


Yes, for Patch 1 I too ignored the warnings.
For Patch 4, there are 3 warnings, of which first 2 were reported by 
checkpatch tool before I sent out. Surprisingly, I didn't see the third 
one when I ran the tool in my environment.


I will fix them (those which I can) in v2.





:: Brief about Patch Layout ::

0001:  Container_of patch from [3]



0002~0003: Introducing the basic Bus model and associated test case
0005:  Support insertion of device rather than addition to tail


Patch 2 and 5 could be squashed.


I deliberately kept them separate. I intent to extend the Patch 5 for 
hotplugging. But, if I don't end up adding support for that in this 
series, I will merge these two.



The constructor priority stuff seems unneeded as long as we use
explicit reference to a global (or local, did not check) bus symbol
rather than a runtime lookup.


I didn't understand your point here.
IMO, constructor priority (or some other way to handle this) is 
important. I faced this issue while verifying it at my end when the 
drivers were getting registered before the bus.


Can you elaborate more on '..use explicit reference to a global...'?





0004:  Add scan and match callbacks for the Bus and updated test case


Why do you push back the bus object in the 'scan' method ?
This method is bus specific which means that the code "knows" the
object registered with the callback.


This 'knows' is the grey area for me.
The bus (for example, PCI) after scanning needs to call 
rte_eal_bus_add_device() to link the device in bus's device_list.


Two options:
1. Have a global reference to "pci" bus (rte_bus) somewhere in eal_pci.c
2. Call rte_eal_get_bus() every time someone needs the reference.
3. C++ style, 'this->'.

I have taken the 3rd path. It simplifies my code to not assume a handle 
as well as not allow for reference fetch calls every now and then.


As a disadvantage: it means passing this as argument - and some cases 
maintaining it as __rte_unused.


Taking (1) or (2) is not advantageous than this approach.


Is is that you want to have a single scan method used by multiple buses ?


Yes, but only as a use case. For example, platform devices are of 
various types - what if we have a south-bound bus over a platform bus. 
In which case, a hierarchical bus layout is possible.

But, this is far-fetched idea for now.





0006:  Integrate bus scan/match with EAL, without any effective driver


Hard to find a right balance in patch splittng, but patch 4 and 6 are
linked, I would squash them into one.


Yes, it is hard and sometimes there is simply no strong rationale for 
splitting or merging. This is one of those cases.
My idea was that one patch _only_ introduces Bus services (structures, 
functions etc) and another should enable the calls to it from EAL.
In that sense, I still think 4 and 6 should remain separate, may be 
consecutive, though.






0007:  rte_pci_driver->probe replaced with rte_driver->probe


This patch is too big, please separate in two patches: eal changes
then ethdev/driver changes.


I don't think that can be done. One change is incomplete without the other.

Changes to all files are only for rte_pci_driver->probe to 
rte_driver->probe movement. EAL changes is to allow 
rte_eth_dev_pci_probe function after such a change as rte_driver->probe 
has different arguments as compared to rte_pci_driver->probe. The 
patches won't compile if I split.


Am I missing something?


I almost missed that mlx4 has been broken : you moved the drv_flags
from the mlx4 pci driver to rte_driver.


Oh! This is indeed my mistake. I will fix this right away.



Why do you push back the driver object in the 'probe' method ? (idem
rte_bus->scan).


I am assuming you are referring to rte_driver->probe().
This is being done so that implementations (specific to drivers on a 
particular bus) can start extracting the rte_xxx_driver, if need be.


For example, for e1000/em_ethdev.c, rte_driver

Re: [dpdk-dev] [PATCH 00/32] NXP DPAA2 PMD

2016-12-07 Thread Hemant Agrawal

On 12/7/2016 1:18 AM, Ferruh Yigit wrote:

On 12/4/2016 6:16 PM, Hemant Agrawal wrote:

The patch series adds NXP’s QorIQ-Layerscape DPAA2 Architecture based
network SoC PMD.  This version of the driver supports NXP LS208xA,
LS204xA and LS108x families Network SoCs.

DPAA2, or Data Path Acceleration Architecture, is a hardware architecture
designed for high-speed network packet processing. It uses a bus name
‘fsl-mc’, part of Linux Kernel Staging tree [2], for resource management.

A brief description of architecture is given below; detailed description
is part of the documentation in the patches itself.

DPAA2 contains hardware component called the Management Complex (or MC).
It manages the DPAA2 hardware resources.  The MC provides an object-based
abstraction for software drivers to use the DPAA2 hardware.

Some of the key objects are:
- DPNI, which refers to the network interface object.
- DPBP, which refers to HW based memory pool object
- DPIO, refers to processing context for accessing QBMAN

Besides the MC, DPAA2 also includes a Hardware based Queue and Buffer Manager
called QBMAN. Prime responsibility of QBMAN is to allow lockless access to
software/user-space to the queues and buffers implemented in the hardware.

The patch series could be logically structured into following sub-areas:
1. (Patch 0001) DPAA2 Architecture overview document
2. (Patches 0002-0007) Common dpaa2 hw accelerator drivers for MC and QBMAN.
3. (Patch 0008) Enabling crc in armv8 core machine type
4. (Patch 0009) Adding rte_device in rte_eth_dev
5. (Patches 0010-0013) introduce DPAA2 bus and VFIO routines
6. (Patches 0014-0017) dpio and dpbp object drivers
7. (Patches 0018-0027) Support for DPAA2 Ethernet Device (ethdev)
8. (Patches 0028-0032) Additional functionality in DPAA2 ethdev.

The following design decisions are made during development:

1. DPAA2 implements a new bus called "dpaa2" and some common accelerator 
drivers.
   These drivers will be shared with dpaa2 based crypto drivers.
 - For this, patch series from Shreyansh [1] has been used for creating a
   bus handler.
 - For the purpose of this bus, rte_dpaa2_device/rte_dpaa2_driver might
   also be required but they are not part of the first patch series.
   Currently, rte_device/driver are being directly used as a PoC.

2. DPAA2 implements the HW mempool offload with DPBP object.
 - The new pool is being configured using compile time option and pool name
   as "dpaa2".

3. It maintains per lcore DPIO objects and affine the DPIO instance to the
   processing threads accessing the QBMAN HW.

Prerequisites:
 - For running the PMD, NXP's SoC (board) and SDK (software/BSP) is required.
   Information about obtaining relevant software is available in the docs
   as part of the patch.
 - At present the series has limited support for Ethernet functions. But,
   more functionality would be made available in a phased manner.
 - This PMD has been validated over the Bus Model [1] and SoC Patchset [3]


Just to clarify this patchset depends other patchset, although mentioned
above, it is good to have links for dependent patches:

Dependencies:

- [4]

[4] http://dpdk.org/dev/patchwork/patch/17620/


Thanks for pointing. I missed it.






[1] http://dpdk.org/ml/archives/dev/2016-December/051349.html
[2] https://www.kernel.org/doc/readme/drivers-staging-fsl-mc-README.txt
[3] http://dpdk.org/ml/archives/dev/2016-October/048949.html


Hemant Agrawal (32):
  doc: add dpaa2 nic details
  drivers/common: introducing dpaa2 mc driver
  drivers/common/dpaa2: add mc dpni object support
  drivers/common/dpaa2: add mc dpio object support
  drivers/common/dpaa2: add mc dpbp object support
  drivers/common/dpaa2: add mc dpseci object support
  drivers/common/dpaa2: adding qbman driver
  mk/dpaa2: add the crc support to the machine type
  lib/ether: add rte_device in rte_eth_dev
  net/dpaa2: introducing dpaa2 bus driver for fsl-mc bus
  net/dpaa2: add dpaa2 vfio support
  net/dpaa2: vfio scan for net and sec device
  net/dpaa2: add debug log macros
  net/dpaa2: dpio object driver
  net/dpaa2: dpio routine to affine to crypto threads
  net/dpaa2: dpio add support to check SOC type
  net/dpaa2: dpbp based mempool hw offload driver
  net/dpaa2: introducing dpaa2 pmd driver
  net/dpaa2: adding eth ops to dpaa2
  net/dpaa2: add queue configuration support
  net/dpaa2: add rss flow distribution
  net/dpaa2: configure mac address at init
  net/dpaa2: attach the buffer pool to dpni
  net/dpaa2: add support for l3 and l4 checksum offload
  net/dpaa2: add support for promiscuous mode
  net/dpaa2: add mtu config support
  net/dpaa2: add packet rx and tx support
  net/dpaa2: add support for physical address usages
  net/dpaa2: rx packet parsing and packet type support
  net/dpaa2: frame queue based dq storage alloc
  net/dpaa2: add support for non hw buffer pool packet transmit
  net/dpaa2: enable stashing for LS2088A devices

 config/defconfig_arm64-dpaa2-linuxapp-gcc  |   15 +-
 d

Re: [dpdk-dev] [PATCH 18/32] net/dpaa2: introducing dpaa2 pmd driver

2016-12-07 Thread Hemant Agrawal

On 12/7/2016 2:38 AM, Thomas Monjalon wrote:

2016-12-06 19:49, Ferruh Yigit:

On 12/4/2016 6:17 PM, Hemant Agrawal wrote:

+   if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+   eth_dev->data->dev_private = rte_zmalloc(
+   "ethdev private structure",
+   eth_drv->dev_private_size,
+   RTE_CACHE_LINE_SIZE);
+   if (eth_dev->data->dev_private == NULL)
+   rte_panic("Cannot allocate memzone for private port"
+ " data\n");


Should this error kill all app, or return an error for this PMD that is
probed.


It cannot be a question :)
rte_panic() inside libs or drivers is forbidden (and existing ones must
be removed).


I will change it to return an error.



Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Ferruh Yigit
On 12/6/2016 6:25 PM, Yong Wang wrote:
>> -Original Message-
>> From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
>> Sent: Sunday, December 4, 2016 4:11 AM
>> To: Yong Wang ; Thomas Monjalon
>> 
>> Cc: Harish Patil ; dev@dpdk.org; Rahul Lakkireddy
>> ; Stephen Hurd
>> ; Jan Medala ; Jakub
>> Palider ; John Daley ; Adrien
>> Mazarguil ; Alejandro Lucero
>> ; Rasesh Mody
>> ; Jacob, Jerin ;
>> Yuanhan Liu ; Kulasek, TomaszX
>> ; olivier.m...@6wind.com
>> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
>>
>> Hi
>>
>>
>>

>>
 2016-11-30 17:42, Ananyev, Konstantin:
>>
>>> Please, we need a comment for each driver saying
>>
>>> "it is OK, we do not need any checksum preparation for TSO"
>>
>>> or
>>
>>> "yes we have to implement tx_prepare or TSO will not work in this
>>
 mode"
>>
>>>
>>
>>
>>
>> qede PMD doesn’t currently support TSO yet, it only supports Tx
>>
 TCP/UDP/IP
>>
>> csum offloads.
>>
>> So Tx preparation isn’t applicable. So as of now -
>>
>> "it is OK, we do not need any checksum preparation for TSO"
>>
>
>>
> Thanks for the answer.
>>
> Though please note that it not only for TSO.
>>

>>
 Oh yes, sorry, my wording was incorrect.
>>
 We need to know if any checksum preparation is needed prior
>>
 offloading its final computation to the hardware or driver.
>>
 So the question applies to TSO and simple checksum offload.
>>

>>
 We are still waiting answers for
>>
bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
>>
>>>
>>
>>> The case for a virtual device is a little bit more complicated as packets
>> offloaded from a virtual device can eventually be delivered to
>>
>>> another virtual NIC or different physical NICs that have different offload
>> requirements.  In ESX, the hypervisor will enforce that the packets
>>
>>> offloaded will be something that the hardware expects.  The contract for
>> vmxnet3 is that the guest needs to fill in pseudo header checksum
>>
>>> for both l4 checksum only and TSO + l4 checksum offload cases.
>>
>>
>>
>> Ok, so at first glance that looks to me very similar to Intel HW 
>> requirements.
>>
>> Could you confirm would rte_net_intel_cksum_prepare()
>>
>> also work for vmxnet3 or some extra modifications are required?
>>
>> You can look at it here: https://urldefense.proofpoint.com/v2/url?u=http-
>> 3A__dpdk.org_dev_patchwork_patch_17184_&d=DgIGaQ&c=uilaK90D4TOV
>> oH58JNXRgQ&r=v4BBYIqiDq552fkYnKKFBFyqvMXOR3UXSdFO2plFD1s&m=NS
>> 4zOl2je_tyGhnOJMSnu37HmJxOZf-1KLYcVsu8iYY&s=dL-NOC-
>> 18HclXUURQzuyW5Udw4NY13pKMndYvfgCfbA&e= .
>>
>> Note that for Intel HW the rules for pseudo-header csum calculation
>>
>> differ for TSO and non-TSO case.
>>
>> For TSO length inside pseudo-header are set to 0, while for non-tso case
>>
>> It should be set to L3 payload length.
>>
>> Is it the same for vmxnet3 or no?
>>
>> Thanks
>>
>> Konstantin
>>
> 
> Yes and this is the same for vmxnet3.
> 

This means vmxnet3 PMD also should be updated, right? Should that update
be part of tx_prep patchset? Or separate patch?

>>>
>>
> This is for any TX offload for which the upper layer SW would have
>>
> to modify the contents of the packet.
>>
> Though as I can see for qede neither PKT_TX_IP_CKSUM or
>>
 PKT_TX_TCP_CKSUM
>>
> exhibits any extra requirements for the user.
>>
> Is that correct?
>>
>>
> 



Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Ananyev, Konstantin

Hi Ferruh,

> 
> On 12/6/2016 6:25 PM, Yong Wang wrote:
> >> -Original Message-
> >> From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> >> Sent: Sunday, December 4, 2016 4:11 AM
> >> To: Yong Wang ; Thomas Monjalon
> >> 
> >> Cc: Harish Patil ; dev@dpdk.org; Rahul Lakkireddy
> >> ; Stephen Hurd
> >> ; Jan Medala ; Jakub
> >> Palider ; John Daley ; Adrien
> >> Mazarguil ; Alejandro Lucero
> >> ; Rasesh Mody
> >> ; Jacob, Jerin ;
> >> Yuanhan Liu ; Kulasek, TomaszX
> >> ; olivier.m...@6wind.com
> >> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> >>
> >> Hi
> >>
> >>
> >>
> 
> >>
>  2016-11-30 17:42, Ananyev, Konstantin:
> >>
> >>> Please, we need a comment for each driver saying
> >>
> >>> "it is OK, we do not need any checksum preparation for TSO"
> >>
> >>> or
> >>
> >>> "yes we have to implement tx_prepare or TSO will not work in this
> >>
>  mode"
> >>
> >>>
> >>
> >>
> >>
> >> qede PMD doesn’t currently support TSO yet, it only supports Tx
> >>
>  TCP/UDP/IP
> >>
> >> csum offloads.
> >>
> >> So Tx preparation isn’t applicable. So as of now -
> >>
> >> "it is OK, we do not need any checksum preparation for TSO"
> >>
> >
> >>
> > Thanks for the answer.
> >>
> > Though please note that it not only for TSO.
> >>
> 
> >>
>  Oh yes, sorry, my wording was incorrect.
> >>
>  We need to know if any checksum preparation is needed prior
> >>
>  offloading its final computation to the hardware or driver.
> >>
>  So the question applies to TSO and simple checksum offload.
> >>
> 
> >>
>  We are still waiting answers for
> >>
>   bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> >>
> >>>
> >>
> >>> The case for a virtual device is a little bit more complicated as packets
> >> offloaded from a virtual device can eventually be delivered to
> >>
> >>> another virtual NIC or different physical NICs that have different offload
> >> requirements.  In ESX, the hypervisor will enforce that the packets
> >>
> >>> offloaded will be something that the hardware expects.  The contract for
> >> vmxnet3 is that the guest needs to fill in pseudo header checksum
> >>
> >>> for both l4 checksum only and TSO + l4 checksum offload cases.
> >>
> >>
> >>
> >> Ok, so at first glance that looks to me very similar to Intel HW 
> >> requirements.
> >>
> >> Could you confirm would rte_net_intel_cksum_prepare()
> >>
> >> also work for vmxnet3 or some extra modifications are required?
> >>
> >> You can look at it here: https://urldefense.proofpoint.com/v2/url?u=http-
> >> 3A__dpdk.org_dev_patchwork_patch_17184_&d=DgIGaQ&c=uilaK90D4TOV
> >> oH58JNXRgQ&r=v4BBYIqiDq552fkYnKKFBFyqvMXOR3UXSdFO2plFD1s&m=NS
> >> 4zOl2je_tyGhnOJMSnu37HmJxOZf-1KLYcVsu8iYY&s=dL-NOC-
> >> 18HclXUURQzuyW5Udw4NY13pKMndYvfgCfbA&e= .
> >>
> >> Note that for Intel HW the rules for pseudo-header csum calculation
> >>
> >> differ for TSO and non-TSO case.
> >>
> >> For TSO length inside pseudo-header are set to 0, while for non-tso case
> >>
> >> It should be set to L3 payload length.
> >>
> >> Is it the same for vmxnet3 or no?
> >>
> >> Thanks
> >>
> >> Konstantin
> >>
> >
> > Yes and this is the same for vmxnet3.
> >
> 
> This means vmxnet3 PMD also should be updated, right? 

Yes, that's right.

>Should that update
> be part of tx_prep patchset? Or separate patch?

Another question I suppose is who will do the actual patch for vmxnet3.
Yong, are you ok to do the patch for vmxnet3, or prefer us to do that?
Please note, that in both cases will need your help in testing/reviewing it.
Konstantin

> 
> >>>
> >>
> > This is for any TX offload for which the upper layer SW would have
> >>
> > to modify the contents of the packet.
> >>
> > Though as I can see for qede neither PKT_TX_IP_CKSUM or
> >>
>  PKT_TX_TCP_CKSUM
> >>
> > exhibits any extra requirements for the user.
> >>
> > Is that correct?
> >>
> >>
> >



Re: [dpdk-dev] [PATCH 11/32] net/dpaa2: add dpaa2 vfio support

2016-12-07 Thread Hemant Agrawal

On 12/7/2016 2:08 PM, Thomas Monjalon wrote:

2016-12-07 12:30, Hemant Agrawal:

On 12/7/2016 2:34 AM, Thomas Monjalon wrote:

2016-12-04 23:47, Hemant Agrawal:

Add support for using VFIO for dpaa2 based fsl-mc bus.


Why do we need so much special code for interfacing VFIO on fsl-mc?
Can you reuse some code from EAL VFIO?



fsl-mc VFIO scans the objects.  So, it is slightly different.

Even though I have tried to minimize changes and re-use the code from
EAL VFIO. It is still refactoring work, which we will take up little later.


Do you mean you could re-use some EAL code but do not want to do it now?
It sounds like something heard too many times earlier (from others).
The experience tells we must not wait to do things right.

I meant that I was able to use some of the EAL VFIO functions. There are 
some changes which I am planning to propose in eal vfio and in our driver.


I will do it in subsequent patchsets of current series shortly.





Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Adrien Mazarguil
On Tue, Dec 06, 2016 at 08:31:35PM +, Ananyev, Konstantin wrote:
> > Hi Konstantin,
> > 
> > On Tue, Dec 06, 2016 at 10:56:26AM +, Ananyev, Konstantin wrote:
> > >
> > > Hi Adrien,
> > >
> > > >
> > > > On Mon, Dec 05, 2016 at 04:43:52PM +, Ananyev, Konstantin wrote:
> > > > [...]
> > > > > > On Fri, Dec 02, 2016 at 01:00:55AM +, Ananyev, Konstantin wrote:
> > > > > > [...]
> > > > > > > > On Wed, Nov 30, 2016 at 10:54:50AM +, Ananyev, Konstantin 
> > > > > > > > wrote:
> > > > > > > > [...]
> > > > > > > > > Do you have anything particular in mind here?
> > > > > > > >
> > > > > > > > Nothing in particular, so for the sake of the argument, let's 
> > > > > > > > suppose that I
> > > > > > > > would like to add a field to expose some limitation that only 
> > > > > > > > applies to my
> > > > > > > > PMD during TX but looks generic enough to make sense, e.g. 
> > > > > > > > maximum packet
> > > > > > > > size when VLAN tagging is requested.
> > > > > > >
> > > > > > > Hmm, I didn't hear about such limitations so far, but if it is 
> > > > > > > real case -
> > > > > > > sure, feel free to submit the patch.
> > > > > >
> > > > > > I won't, that was hypothetical.
> > > > >
> > > > > Then why we discussing it? :)
> > > >
> > > > Just to make a point, which is that new limitations may appear anytime 
> > > > and
> > > > tx_prepare() can now be used to check for them. First patch of the 
> > > > series
> > > > does it:
> > > >
> > > >  +   uint16_t nb_seg_max; /**< Max number of segments per whole 
> > > > packet. */
> > > >  +   uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
> > > >
> > > > And states that:
> > > >
> > > >  + * For each packet to send, the rte_eth_tx_prepare() function performs
> > > >  + * the following operations:
> > > >  + *
> > > >  + * - Check if packet meets devices requirements for tx offloads.
> > > >  + *
> > > >  + * - Check limitations about number of segments.
> > > >  + *
> > > >  + * - Check additional requirements when debug is enabled.
> > > >  + *
> > > >  + * - Update and/or reset required checksums when tx offload is set 
> > > > for packet.
> > >
> > > I think I already explained in my previous email why I think that
> > > nb_seg_max and nb_mtu_seg_max are not redundant because of tx_prepare().
> > > From my point they are complement to tx_prepare():
> > > Even if people do use tx_prepare() they still should take this 
> > > information into account.
> > > As an example ixgbe can't TX packets with then 40 segments.
> > > tx_prepare() for ixgbe will flag that issue, but it can't make a decision 
> > > on user behalf
> > > what to do in that case: drop the packet, try to coalesce it into the 
> > > packet with less
> > > number of segments, split the packet into several smaller, etc.
> > > That's up to user to make such decision, and to make it, user might need 
> > > this information.
> > 
> > Yet tx_prepare() has already the ability to update mbuf contents, issue is
> > what will this function do in the future, where will it stop? It is defined
> > in a way that each PMD does what it wants to make mbufs edible for
> > tx_burst(), because of this applications will just always call it to be on
> > the safe side.
> > 
> > > > It's like making this function mandatory IMO.
> > >
> > > That's probably where confusion starts: I don't think that
> > > tx_prepare() should be mandatory for the user to call.
> > > Yes, it should be a recommended way.
> > > But the user still should have the ability to by-pass it,
> > > if he believes there is no need for it, or he prefers to implement
> > > the same functionality on his own.
> > > As an example,  if the user knows that he is going to send  a group
> > > of one-segment packets that don't require any tx offloads, he can safely 
> > > skip
> > > tx_prepare() for them.
> > 
> > I understand your point, and agree with the example you provide. Many
> > applications do not know what's inside mbufs though, except perhaps that
> > they contain TCP and may want to perform TSO because of that. Those will
> > have to call tx_prepare() to be future-proof.
> > 
> > > > > > > > PMDs are free to set that field to some
> > > > > > > > special value (say, 0) if they do not care.
> > > > > > > >
> > > > > > > > Since that field exists however, conscious applications should 
> > > > > > > > check its
> > > > > > > > value for each packet that needs to be transmitted. This extra 
> > > > > > > > code causes a
> > > > > > > > slowdown just by sitting in the data path. Since it is not the 
> > > > > > > > only field in
> > > > > > > > that structure, the performance impact can be significant.
> > >
> > > Conscious user will  probably use this information at the stage of packet 
> > > formation.
> > > He probably has to do this sort of things for large packets anyway:
> > > Check what is the underlying mtu, to decide does he need to split the 
> > > packet,
> > > or enable tso for it, etc.
> > 
> > There are alr

Re: [dpdk-dev] [PATCH 10/32] net/dpaa2: introducing dpaa2 bus driver for fsl-mc bus

2016-12-07 Thread Shreyansh Jain

+ CC: David Marchand

On Sunday 04 December 2016 11:47 PM, Hemant Agrawal wrote:

The DPAA2 bus driver is a rte_bus driver which scans the fsl-mc bus.

Signed-off-by: Hemant Agrawal 
---
 drivers/net/Makefile|   2 +-
 drivers/net/dpaa2/Makefile  |  60 ++
 drivers/net/dpaa2/dpaa2_bus.c   |  99 +++
 drivers/net/dpaa2/rte_dpaa2.h   | 121 
 drivers/net/dpaa2/rte_pmd_dpaa2_version.map |   4 +
 mk/rte.app.mk   |   1 +
 6 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/dpaa2/Makefile
 create mode 100644 drivers/net/dpaa2/dpaa2_bus.c
 create mode 100644 drivers/net/dpaa2/rte_dpaa2.h
 create mode 100644 drivers/net/dpaa2/rte_pmd_dpaa2_version.map

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index bc93230..2bcf67b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -55,7 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
-
+DIRS-$(CONFIG_RTE_LIBRTE_DPAA2_PMD) += dpaa2
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
new file mode 100644
index 000..a99ce22
--- /dev/null
+++ b/drivers/net/dpaa2/Makefile
@@ -0,0 +1,60 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2016 NXP. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of NXP nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_dpaa2.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/
+CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
+
+# versioning export map
+EXPORT_MAP := rte_pmd_dpaa2_version.map
+
+# library version
+LIBABIVER := 1
+
+
+# Interfaces with DPDK
+SRCS-$(CONFIG_RTE_LIBRTE_DPAA2_PMD) += dpaa2_bus.c
+
+# library dependencies
+DEPDIRS-y += lib/librte_eal
+DEPDIRS-y += drivers/common/dpaa/mc
+DEPDIRS-y += drivers/common/dpaa/qbman
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/dpaa2/dpaa2_bus.c b/drivers/net/dpaa2/dpaa2_bus.c
new file mode 100644
index 000..571066c
--- /dev/null
+++ b/drivers/net/dpaa2/dpaa2_bus.c
@@ -0,0 +1,99 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2016 NXP. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of NXP nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WAR

Re: [dpdk-dev] [PATCH] vhost: allow for many vhost user ports

2016-12-07 Thread Yuanhan Liu
On Thu, Dec 01, 2016 at 04:26:50PM +0100, Jan Wickbom wrote:
>  static int
> -fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
> +fdset_fill(struct pollfd *rwfds, struct fdset *pfdset)
>  {
>   struct fdentry *pfdentry;
> - int i, maxfds = -1;
> - int num = MAX_FDS;
> -
> - if (pfdset == NULL)
> - return -1;
> + int i;
> + int num;
>  
> - for (i = 0; i < num; i++) {
> + for (i = 0, num = pfdset->num; i < num; i++) {
>   pfdentry = &pfdset->fd[i];
> - if (pfdentry->fd != -1) {
> - int added = 0;
> - if (pfdentry->rcb && rfset) {
> - FD_SET(pfdentry->fd, rfset);
> - added = 1;
> - }
> - if (pfdentry->wcb && wfset) {
> - FD_SET(pfdentry->fd, wfset);
> - added = 1;
> - }
> - if (added)
> - maxfds = pfdentry->fd < maxfds ?
> - maxfds : pfdentry->fd;
> +
> + if (pfdentry->fd < 0) {
> + /* Hole in the list. Move the last one here */
> +
> + *pfdentry = pfdset->fd[num - 1];
> + pfdset->fd[num - 1].fd = -1;
> + num = fdset_adjust_num(pfdset);
>   }
> + rwfds[i].fd = pfdentry->fd;
> + rwfds[i].events = pfdentry->rcb ? POLLIN : 0;
> + rwfds[i].events |= pfdentry->wcb ? POLLOUT : 0;

Another thing is we don't have to re-init this rwfds array again
and again. Instead, we could 

- set it up correctly when fdset_add is invoked: set the fd and
  events.

- reset revents when it's been handled at fdset_event_dispatch().

- swap with the last one and shrink the array on fd delete

Could you make a follow up patch for that?

Thanks.

--yliu


Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Ananyev, Konstantin

Hi Yliu,

> 
> On Tue, Dec 06, 2016 at 03:53:42PM +, Ferruh Yigit wrote:
> > > Please, we need a comment for each driver saying
> > > "it is OK, we do not need any checksum preparation for TSO"
> > > or
> > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > >
> 
> Sorry for late. For virtio, I think it's not a must. The checksum stuff
> has been handled inside the Tx function. However, we may could move it
> to tx_prepare, which would actually recover the performance lost
> introduced while enabling TSO for the non-TSO case.
> 

So would you like to provide a patch for it,
Or would you like to keep tx_prepare() for virtio as NOP for now?
Thanks
Konstantin


Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Yuanhan Liu
On Wed, Dec 07, 2016 at 10:13:14AM +, Ananyev, Konstantin wrote:
> 
> Hi Yliu,
> 
> > 
> > On Tue, Dec 06, 2016 at 03:53:42PM +, Ferruh Yigit wrote:
> > > > Please, we need a comment for each driver saying
> > > > "it is OK, we do not need any checksum preparation for TSO"
> > > > or
> > > > "yes we have to implement tx_prepare or TSO will not work in this mode"
> > > >
> > 
> > Sorry for late. For virtio, I think it's not a must. The checksum stuff
> > has been handled inside the Tx function. However, we may could move it
> > to tx_prepare, which would actually recover the performance lost
> > introduced while enabling TSO for the non-TSO case.
> > 
> 
> So would you like to provide a patch for it,
> Or would you like to keep tx_prepare() for virtio as NOP for now?

Hi Konstantin,

I'd keep it as it is for now. It should be a trivial patch after all, that
I could provide it when everything are settled down.

--yliu


Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Ananyev, Konstantin


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan@linux.intel.com]
> Sent: Wednesday, December 7, 2016 10:19 AM
> To: Ananyev, Konstantin 
> Cc: Yigit, Ferruh ; Olivier Matz 
> ; Thomas Monjalon ;
> dev@dpdk.org; Jan Medala ; Jakub Palider 
> ; Netanel Belgazal ; Evgeny
> Schemeilin ; Alejandro Lucero 
> ; Yong Wang ;
> Andrew Rybchenko ; Hemant Agrawal 
> ; Kulasek, TomaszX
> 
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> On Wed, Dec 07, 2016 at 10:13:14AM +, Ananyev, Konstantin wrote:
> >
> > Hi Yliu,
> >
> > >
> > > On Tue, Dec 06, 2016 at 03:53:42PM +, Ferruh Yigit wrote:
> > > > > Please, we need a comment for each driver saying
> > > > > "it is OK, we do not need any checksum preparation for TSO"
> > > > > or
> > > > > "yes we have to implement tx_prepare or TSO will not work in this 
> > > > > mode"
> > > > >
> > >
> > > Sorry for late. For virtio, I think it's not a must. The checksum stuff
> > > has been handled inside the Tx function. However, we may could move it
> > > to tx_prepare, which would actually recover the performance lost
> > > introduced while enabling TSO for the non-TSO case.
> > >
> >
> > So would you like to provide a patch for it,
> > Or would you like to keep tx_prepare() for virtio as NOP for now?
> 
> Hi Konstantin,
> 
> I'd keep it as it is for now. It should be a trivial patch after all, that
> I could provide it when everything are settled down.

Ok, thanks for clarification.
Konstantin

> 
>   --yliu


Re: [dpdk-dev] [PATCH 10/32] net/dpaa2: introducing dpaa2 bus driver for fsl-mc bus

2016-12-07 Thread Thomas Monjalon
2016-12-07 15:43, Shreyansh Jain:
> IMO, the way Bus is kept is debatable.
>   - should it be in EAL (lib/librte_eal/linuxapp/eal_pci.c like Bus 
> patches) [1]?
>   - Should it a 'handler/driver' parallel to device drivers?
> 
> I personally prefer a clean layer for buses with:
> 
>   - RTE_SDK/drivers/net/dpaa2/
>   - RTE_SDK/drivers/bus
>   - RTE_SDK/drivers/bus/dpaa2/
>   - RTE_SDK/drivers/bus/dpaa2/dpaa2_bus.c etc.

I agree, it is a good idea.

> For PCI, which is generic (or for other similar generic buses, like 
> platform), we can keep the implementation within lib/librte_eal/linuxapp/*.

I would be in favor of moving PCI and vdev code from EAL to drivers/bus/.
We can keep the API in EAL and implement the buses as drivers.

Other opinions?


[dpdk-dev] [PATCH] crypto/openssl: fix extra bytes being written at end of data

2016-12-07 Thread Piotr Azarewicz
Extra bytes are being written at end of data while process standard
openssl cipher encryption. This behaviour is unexpected.

This patch disable the padding feature in openssl library, which is
causing the problem.

Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")

Signed-off-by: Piotr Azarewicz 
---
 drivers/crypto/openssl/rte_openssl_pmd.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c 
b/drivers/crypto/openssl/rte_openssl_pmd.c
index 5f8fa33..832ea1d 100644
--- a/drivers/crypto/openssl/rte_openssl_pmd.c
+++ b/drivers/crypto/openssl/rte_openssl_pmd.c
@@ -496,6 +496,8 @@
if (EVP_EncryptInit_ex(ctx, algo, NULL, key, iv) <= 0)
goto process_cipher_encrypt_err;
 
+   EVP_CIPHER_CTX_set_padding(ctx, 0);
+
if (EVP_EncryptUpdate(ctx, dst, &dstlen, src, srclen) <= 0)
goto process_cipher_encrypt_err;
 
-- 
1.7.9.5



Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-07 Thread Van Haaren, Harry
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]

Hi Jerin,

Re v2 rte_event struct, there seems to be some changes in the struct layout and 
field sizes. I've investigated them, and would like to propose some changes to 
balance the byte-alignment and accessing of the fields.

These changes target only the first 64 bits of the rte_event struct. I've left 
the current v2 code for reference, please find my proposed changes below.

> +struct rte_event {
> + /** WORD0 */
> + RTE_STD_C11
> + union {
> + uint64_t event;
> + /** Event attributes for dequeue or enqueue operation */
> + struct {
> + uint64_t flow_id:20;
> + /**< Targeted flow identifier for the enqueue and
> +  * dequeue operation.
> +  * The value must be in the range of
> +  * [0, nb_event_queue_flows - 1] which
> +  * previously supplied to rte_event_dev_configure().
> +  */
> + uint64_t sub_event_type:8;
> + /**< Sub-event types based on the event source.
> +  * @see RTE_EVENT_TYPE_CPU
> +  */
> + uint64_t event_type:4;
> + /**< Event type to classify the event source.
> +  * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> +  */
> + uint64_t sched_type:2;
> + /**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> +  * associated with flow id on a given event queue
> +  * for the enqueue and dequeue operation.
> +  */
> + uint64_t queue_id:8;
> + /**< Targeted event queue identifier for the enqueue or
> +  * dequeue operation.
> +  * The value must be in the range of
> +  * [0, nb_event_queues - 1] which previously supplied to
> +  * rte_event_dev_configure().
> +  */
> + uint64_t priority:8;
> + /**< Event priority relative to other events in the
> +  * event queue. The requested priority should in the
> +  * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> +  * RTE_EVENT_DEV_PRIORITY_LOWEST].
> +  * The implementation shall normalize the requested
> +  * priority to supported priority value.
> +  * Valid when the device has
> +  * RTE_EVENT_DEV_CAP_FLAG_EVENT_QOS capability.
> +  */
> + uint64_t op:2;
> + /**< The type of event enqueue operation - new/forward/
> +  * etc.This field is not preserved across an instance
> +  * and is undefined on dequeue.
> +  *  @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> +  */
> + uint64_t impl_opaque:12;
> + /**< Implementation specific opaque value.
> +  * An implementation may use this field to hold
> +  * implementation specific value to share between
> +  * dequeue and enqueue operation.
> +  * The application should not modify this field.
> +  */
> + };
> + };

struct rte_event {
/** WORD0 */
RTE_STD_C11
union {
uint64_t event;
struct {
uint32_t flow_id: 24;
uint32_t impl_opaque : 8; /* not defined on deq */

uint8_t queue_id;
uint8_t priority;

uint8_t operation  : 4; /* new fwd drop */
uint8_t sched_type : 4;

uint8_t event_type : 4;
uint8_t sub_event_type : 4;
};
};
/** word 1 */



The changes made are as follows:
* Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
* Add impl_opaque to the remaining 8 bits of those 32 bits (previous size was 
12 bits)

* QueueID and Priority remain 8 bit integers - but now accessible as 8 bit ints.

* Operation and sched_type *increased* to 4 bits each (from previous value of 
2) to allow future expansion without ABI changes

* Event type remains constant at 4 bits
* sub-event-type reduced to 4 bits (previous value was 8 bits). Can we think of 
situations where 16 values for application specified identifiers of each 
event-type is genuinely not enough?

In my opinion this structure layout is more balanced, and will perform better 
due to less loads that will need masking to access the required value.


Feedback and impro

Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

2016-12-07 Thread Alan Robertson
Hi Cristian,

Looking at points 10 and 11 it's good to hear nodes can be dynamically added.

We've been trying to decide the best way to do this for support of qos on 
tunnels for
some time now and the existing implementation doesn't allow this so effectively 
ruled
out hierarchical queueing for tunnel targets on the output interface.

Having said that, has thought been given to separating the queueing from being 
so closely
tied to the Ethernet transmit process ?   When queueing on a tunnel for example 
we may
be working with encryption.   When running with an anti-reply window it is 
really much
better to do the QOS (packet reordering) before the encryption.  To support 
this would
it be possible to have a separate scheduler structure which can be passed into 
the
scheduling API ?  This means the calling code can hang the structure of 
whatever entity
it wishes to perform qos on, and we get dynamic target support 
(sessions/tunnels etc).

Regarding the structure allocation, would it be possible to make the number of 
queues
associated with a TC a compile time option which the scheduler would 
accommodate ?
We frequently only use one queue per tc which means 75% of the space allocated 
at
the queueing layer for that tc is never used.  This may be specific to our 
implementation
but if other implementations do the same if folks could say we may get a better 
idea
if this is a common case.

Whilst touching on the scheduler, the token replenishment works using a 
division and
multiplication obviously to cater for the fact that it may be run after several 
tc windows
have passed.  The most commonly used industrial scheduler simply does a lapsed 
on the tc
and then adds the bc.   This relies on the scheduler being called within the tc 
window
though.  It would be nice to have this as a configurable option since it's much 
for efficient
assuming the infra code from which it's called can guarantee the calling 
frequency.

I hope you'll consider these points for inclusion into a future road map.  
Hopefully in the
future my employer will increase the priority of some of the tasks and a PR may 
appear
on the mailing list.

Thanks,
Alan.

Subject:

[dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

Date:

Wed, 30 Nov 2016 18:16:50 +

From:

Cristian Dumitrescu 


To:

dev@dpdk.org

CC:

cristian.dumitre...@intel.com



This RFC proposes an ethdev-based abstraction layer for Quality of Service (QoS)

hierarchical scheduler. The goal of the abstraction layer is to provide a simple

generic API that is agnostic of the underlying HW, SW or mixed HW-SW complex

implementation.



Q1: What is the benefit for having an abstraction layer for QoS hierarchical

layer?

A1: There is growing interest in the industry for handling various HW-based,

SW-based or mixed hierarchical scheduler implementations using a unified DPDK

API.



Q2: Which devices are targeted by this abstraction layer?

A2: All current and future devices that expose a hierarchical scheduler feature

under DPDK, including NICs, FPGAs, ASICs, SOCs, SW libraries.



Q3: Which scheduler hierarchies are supported by the API?

A3: Hopefully any scheduler hierarchy can be described and covered by the

current API. Of course, functional correctness, accuracy and performance levels

depend on the specific implementations of this API.



Q4: Why have this abstraction layer into ethdev as opposed to a new type of

device (e.g. scheddev) similar to ethdev, cryptodev, eventdev, etc?

A4: Packets are sent to the Ethernet device using the ethdev API

rte_eth_tx_burst() function, with the hierarchical scheduling taking place

automatically (i.e. no SW intervention) in HW implementations. Basically, the

hierarchical scheduler is done as part of packet TX operation.

The hierarchical scheduler is typically the last stage before packet TX and it

is tightly integrated with the TX stage. The hierarchical scheduler is just

another offload feature of the Ethernet device, which needs to be accommodated

by the ethdev API similar to any other offload feature (such as RSS, DCB,

flow director, etc).

Once the decision to schedule a specific packet has been taken, this packet

cannot be dropped and it has to be sent over the wire as is, otherwise what

takes place on the wire is not what was planned at scheduling time, so the

scheduling is not accurate (Note: there are some devices which allow prepending

headers to the packet after the scheduling stage at the expense of sending

correction requests back to the scheduler, but this only strengthens the bond

between scheduling and TX).



Q5: Given that the packet scheduling takes place automatically for pure HW

implementations, how does packet scheduling take place for poll-mode SW

implementations?

A5: The API provided function rte_sched_run() is designed to take care of this.

For HW implementations, t

Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-07 Thread Bruce Richardson
On Tue, Dec 06, 2016 at 09:22:15AM +0530, Jerin Jacob wrote:
> In a polling model, lcores poll ethdev ports and associated
> rx queues directly to look for packet. In an event driven model,
> by contrast, lcores call the scheduler that selects packets for
> them based on programmer-specified criteria. Eventdev library
> adds support for event driven programming model, which offer
> applications automatic multicore scaling, dynamic load balancing,
> pipelining, packet ingress order maintenance and
> synchronization services to simplify application packet processing.
> 
> By introducing event driven programming model, DPDK can support
> both polling and event driven programming models for packet processing,
> and applications are free to choose whatever model
> (or combination of the two) that best suits their needs.
> 
> This patch adds the eventdev specification header file.
> 
> Signed-off-by: Jerin Jacob 
> ---
>  MAINTAINERS|3 +
>  doc/api/doxy-api-index.md  |1 +
>  doc/api/doxy-api.conf  |1 +
>  lib/librte_eventdev/rte_eventdev.h | 1274 
> 
>  4 files changed, 1279 insertions(+)

> +
> +/** Structure to hold the queue to port link establishment attributes */
> +struct rte_event_queue_link {
> + uint8_t queue_id;
> + /**< Event queue identifier to select the source queue to link */
> + uint8_t priority;
> + /**< The priority of the event queue for this event port.
> +  * The priority defines the event port's servicing priority for
> +  * event queue, which may be ignored by an implementation.
> +  * The requested priority should in the range of
> +  * [RTE_EVENT_DEV_PRIORITY_HIGHEST, RTE_EVENT_DEV_PRIORITY_LOWEST].
> +  * The implementation shall normalize the requested priority to
> +  * implementation supported priority value.
> +  */
> +};
> +
> +/**
> + * Link multiple source event queues supplied in *rte_event_queue_link*
> + * structure as *queue_id* to the destination event port designated by its
> + * *port_id* on the event device designated by its *dev_id*.
> + *
> + * The link establishment shall enable the event port *port_id* from
> + * receiving events from the specified event queue *queue_id*
> + *
> + * An event queue may link to one or more event ports.
> + * The number of links can be established from an event queue to event port 
> is
> + * implementation defined.
> + *
> + * Event queue(s) to event port link establishment can be changed at runtime
> + * without re-configuring the device to support scaling and to reduce the
> + * latency of critical work by establishing the link with more event ports
> + * at runtime.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @param port_id
> + *   Event port identifier to select the destination port to link.
> + *
> + * @param link
> + *   Points to an array of *nb_links* objects of type *rte_event_queue_link*
> + *   structure which contain the event queue to event port link establishment
> + *   attributes.
> + *   NULL value is allowed, in which case this function links all the 
> configured
> + *   event queues *nb_event_queues* which previously supplied to
> + *   rte_event_dev_configure() to the event port *port_id* with normal 
> servicing
> + *   priority(RTE_EVENT_DEV_PRIORITY_NORMAL).
> + *
> + * @param nb_links
> + *   The number of links to establish
> + *
> + * @return
> + * The number of links actually established. The return value can be less 
> than
> + * the value of the *nb_links* parameter when the implementation has the
> + * limitation on specific queue to port link establishment or if invalid
> + * parameters are specified in a *rte_event_queue_link*.
> + * If the return value is less than *nb_links*, the remaining links at the 
> end
> + * of link[] are not established, and the caller has to take care of them.
> + * If return value is less than *nb_links* then implementation shall update 
> the
> + * rte_errno accordingly, Possible rte_errno values are
> + * (-EDQUOT) Quota exceeded(Application tried to link the queue configured 
> with
> + *  RTE_EVENT_QUEUE_CFG_FLAG_SINGLE_LINK to more than one event ports)
> + * (-EINVAL) Invalid parameter
> + *
> + */
> +int
> +rte_event_port_link(uint8_t dev_id, uint8_t port_id,
> + const struct rte_event_queue_link link[],
> + uint16_t nb_links);
> +

Hi again Jerin,

another small suggestion here. I'm not a big fan of using small
structures to pass parameters into functions, especially when not all
fields are always going to be used. Rather than use the event queue link
structure, can we just pass in two array parameters here - the list of
QIDs, and the list of priorities. In cases where the eventdev
implementation does not support link prioritization, or where the app
does not want different priority mappings , then the second
array can be null [implying NORMAL priority for the don't care 

Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine

2016-12-07 Thread Ferruh Yigit
On 11/23/2016 5:36 PM, Tomasz Kulasek wrote:
> Added "csum txprep (on|off)" command which allows to switch to the
> tx path using Tx preparation API.
> 
> By default unchanged implementation is used.
> 
> Using Tx preparation path, pseudo header calculation for udp/tcp/tso
> packets from application, and used Tx preparation API for
> packet preparation and verification.
> 
> Adding additional step to the csum engine costs about 3-4% of performance
> drop, on my setup with ixgbe driver. It's caused mostly by the need
> of reaccessing and modification of packet data.
> 
> Signed-off-by: Tomasz Kulasek 
> Acked-by: Konstantin Ananyev 
> ---
>  app/test-pmd/cmdline.c  |   49 
> +++
>  app/test-pmd/csumonly.c |   33 ---
>  app/test-pmd/testpmd.c  |5 +
>  app/test-pmd/testpmd.h  |2 ++
>  4 files changed, 82 insertions(+), 7 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 63b55dc..373fc59 100644
<...>
> +cmdline_parse_inst_t cmd_csum_txprep = {
> + .f = cmd_csum_txprep_parsed,
> + .data = NULL,
> + .help_str = "enable/disable tx preparation path for csum engine: "
> + "csum txprep on|off",

Can you please format help string as:
"cmd fixed_string fixed|string|options : Description"
see commit 26faac80327f

above becomes:
"csum txprep on|off: Enable/Disable tx preparation path for csum engine"

<...>


[dpdk-dev] [PATCH] Force python scripts to run with python2

2016-12-07 Thread Martin Kletzander
With python3 being the default in some distributions/installations,
shebang with just "python" will make the script run under version of
python that the scripts are not written for.  In order to fix that and
mitigate future errors, use shebang properly and specify the python
version.  That way the scripts will run in any distro/install, no
matter what python version is set as the default.

Signed-off-by: Martin Kletzander 
---
 app/cmdline_test/cmdline_test.py| 2 +-
 app/cmdline_test/cmdline_test_data.py   | 2 +-
 app/test/autotest.py| 2 +-
 app/test/autotest_data.py   | 2 +-
 app/test/autotest_runner.py | 2 +-
 app/test/autotest_test_funcs.py | 2 +-
 examples/ip_pipeline/config/diagram-generator.py| 2 +-
 examples/ip_pipeline/config/pipeline-to-core-mapping.py | 2 +-
 tools/cpu_layout.py | 2 +-
 tools/dpdk-devbind.py   | 2 +-
 tools/dpdk-pmdinfo.py   | 2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/app/cmdline_test/cmdline_test.py b/app/cmdline_test/cmdline_test.py
index 8efc5ead4439..3a3937abc8fb 100755
--- a/app/cmdline_test/cmdline_test.py
+++ b/app/cmdline_test/cmdline_test.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/app/cmdline_test/cmdline_test_data.py 
b/app/cmdline_test/cmdline_test_data.py
index b1945a579f24..882b7355f83d 100644
--- a/app/cmdline_test/cmdline_test_data.py
+++ b/app/cmdline_test/cmdline_test_data.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/app/test/autotest.py b/app/test/autotest.py
index b9fd6b6f5b21..4891410dbbd6 100644
--- a/app/test/autotest.py
+++ b/app/test/autotest.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 9e8fd946a063..6e84eccc6a2e 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/app/test/autotest_runner.py b/app/test/autotest_runner.py
index 21d3be2cb2cd..1eb64d3bedd5 100644
--- a/app/test/autotest_runner.py
+++ b/app/test/autotest_runner.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/app/test/autotest_test_funcs.py b/app/test/autotest_test_funcs.py
index 14cffd014565..8ee9b280706b 100644
--- a/app/test/autotest_test_funcs.py
+++ b/app/test/autotest_test_funcs.py
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/examples/ip_pipeline/config/diagram-generator.py 
b/examples/ip_pipeline/config/diagram-generator.py
index 6b7170b00486..7a2eb3fd6319 100755
--- a/examples/ip_pipeline/config/diagram-generator.py
+++ b/examples/ip_pipeline/config/diagram-generator.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/examples/ip_pipeline/config/pipeline-to-core-mapping.py 
b/examples/ip_pipeline/config/pipeline-to-core-mapping.py
index c2050b82a9fb..355061405d9d 100755
--- a/examples/ip_pipeline/config/pipeline-to-core-mapping.py
+++ b/examples/ip_pipeline/config/pipeline-to-core-mapping.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python2

 #   BSD LICENSE
 #
diff --git a/tools/cpu_layout.py b/tools/cpu_layout.py
index d38d0b5a0cfe..97038b37aa28 100755
--- a/tools/cpu_layout.py
+++ b/tools/cpu_layout.py
@@ -1,4 +1,4 @@
-#! /usr/bin/python
+#!/usr/bin/env python2
 #
 #   BSD LICENSE
 #
diff --git a/tools/dpdk-devbind.py b/tools/dpdk-devbind.py
index f1d374d6b08c..f520f9a99d5b 100755
--- a/tools/dpdk-devbind.py
+++ b/tools/dpdk-devbind.py
@@ -1,4 +1,4 @@
-#! /usr/bin/python
+#!/usr/bin/env python2
 #
 #   BSD LICENSE
 #
diff --git a/tools/dpdk-pmdinfo.py b/tools/dpdk-pmdinfo.py
index 3db9819c61d3..0a8fba01f0ff 100755
--- a/tools/dpdk-pmdinfo.py
+++ b/tools/dpdk-pmdinfo.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python2
 # -
 #
 # Utility to dump PMD_INFO_STRING support from an object file
-- 
2.11.0



Re: [dpdk-dev] [PATCH 1/8] drivers/common/dpaa2: Run time assembler for Descriptor formation

2016-12-07 Thread Akhil Goyal


-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Wednesday, December 07, 2016 2:03 PM
To: Akhil Goyal 
Cc: dev@dpdk.org; declan.dohe...@intel.com; pablo.de.lara.gua...@intel.com; 
Hemant Agrawal ; Horia Geantă 
Subject: Re: [PATCH 1/8] drivers/common/dpaa2: Run time assembler for 
Descriptor formation

2016-12-07 06:24, Akhil Goyal:
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> 2016-12-05 18:25, Akhil Goyal:
> > FLib is a library which helps in making the descriptors which is 
> > understood by NXP's SEC hardware.
> > This patch provides header files for command words which can be used 
> > for descritptor formation.
> 
> It seems this code is old. Does it exist as a standalone library somewhere?
[Akhil] Let me correct here. This is not a library. This is a set of header 
files.
Yes this is an old code. This is generally shipped with NXP SDK.

> Where was it hosted before duplicating it in DPDK?
[Akhil] This is part of NXP SDK and also available at git.freescale.com.

> 
> Why do you want to have a common directory drivers/common/dpaa2/flib instead 
> of a sub-directory in the crypto driver?
[Akhil] I agree with your suggestion. This can be maintained within 
drivers/crypto as a common header files set for different NXP Architecture 
crypto drivers.

> 
> [Akhil] This is not really a library. This is a set of header files which is 
> required for compilation. We have 2 other cypto drivers (for different 
> platforms viz: Non-DPAA and DPAA1_QORIQ) which uses the same flib. So we put 
> it in common directory. We plan to send patches for other drivers in the 
> upcoming releases. 

Please Akhil, could you answer to the three questions?


Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine

2016-12-07 Thread Kulasek, TomaszX
Hi,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 12:14
> To: Kulasek, TomaszX ; dev@dpdk.org
> Cc: Ananyev, Konstantin ;
> olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> On 11/23/2016 5:36 PM, Tomasz Kulasek wrote:
> > Added "csum txprep (on|off)" command which allows to switch to the tx
> > path using Tx preparation API.
> >
> > By default unchanged implementation is used.
> >
> > Using Tx preparation path, pseudo header calculation for udp/tcp/tso
> > packets from application, and used Tx preparation API for packet
> > preparation and verification.
> >
> > Adding additional step to the csum engine costs about 3-4% of
> > performance drop, on my setup with ixgbe driver. It's caused mostly by
> > the need of reaccessing and modification of packet data.
> >
> > Signed-off-by: Tomasz Kulasek 
> > Acked-by: Konstantin Ananyev 
> > ---
> >  app/test-pmd/cmdline.c  |   49
> +++
> >  app/test-pmd/csumonly.c |   33 ---
> >  app/test-pmd/testpmd.c  |5 +
> >  app/test-pmd/testpmd.h  |2 ++
> >  4 files changed, 82 insertions(+), 7 deletions(-)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > 63b55dc..373fc59 100644
> <...>
> > +cmdline_parse_inst_t cmd_csum_txprep = {
> > +   .f = cmd_csum_txprep_parsed,
> > +   .data = NULL,
> > +   .help_str = "enable/disable tx preparation path for csum engine: "
> > +   "csum txprep on|off",
> 
> Can you please format help string as:
> "cmd fixed_string fixed|string|options : Description"
> see commit 26faac80327f
> 
> above becomes:
> "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> 
> <...>

Sure, thanks.

Tomasz


Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine

2016-12-07 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Wednesday, December 7, 2016 11:14 AM
> To: Kulasek, TomaszX ; dev@dpdk.org
> Cc: Ananyev, Konstantin ;
> olivier.m...@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> ...
> <...>
> > +cmdline_parse_inst_t cmd_csum_txprep = {
> > +   .f = cmd_csum_txprep_parsed,
> > +   .data = NULL,
> > +   .help_str = "enable/disable tx preparation path for csum engine: "
> > +   "csum txprep on|off",
> 
> Can you please format help string as:
> "cmd fixed_string fixed|string|options : Description"
> see commit 26faac80327f
> 
> above becomes:
> "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> 


Also, does this require an update to the testpmd docs?

 


Re: [dpdk-dev] [PATCH] Force python scripts to run with python2

2016-12-07 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Martin Kletzander
> Sent: Wednesday, December 7, 2016 10:17 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] Force python scripts to run with python2
> 
> With python3 being the default in some distributions/installations,
> shebang with just "python" will make the script run under version of
> python that the scripts are not written for.  In order to fix that and
> mitigate future errors, use shebang properly and specify the python
> version.  That way the scripts will run in any distro/install, no matter
> what python version is set as the default.

I think a better approach would be to make the scripts Python 2 and Python 3 
compatible.

Some of the new ones already are.





Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine

2016-12-07 Thread Kulasek, TomaszX
Hi John,

> -Original Message-
> From: Mcnamara, John
> Sent: Wednesday, December 7, 2016 13:01
> To: Yigit, Ferruh ; Kulasek, TomaszX
> ; dev@dpdk.org
> Cc: Ananyev, Konstantin ;
> olivier.m...@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> csum engine
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Wednesday, December 7, 2016 11:14 AM
> > To: Kulasek, TomaszX ; dev@dpdk.org
> > Cc: Ananyev, Konstantin ;
> > olivier.m...@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > csum engine
> >
> > ...
> > <...>
> > > +cmdline_parse_inst_t cmd_csum_txprep = {
> > > + .f = cmd_csum_txprep_parsed,
> > > + .data = NULL,
> > > + .help_str = "enable/disable tx preparation path for csum engine: "
> > > + "csum txprep on|off",
> >
> > Can you please format help string as:
> > "cmd fixed_string fixed|string|options : Description"
> > see commit 26faac80327f
> >
> > above becomes:
> > "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> >
> 
> 
> Also, does this require an update to the testpmd docs?
> 
> 

Yes, I think. I will add adequate description to the docs.

Tomasz


Re: [dpdk-dev] [PATCH 00/13] Introducing EAL Bus-Device-Driver Model

2016-12-07 Thread David Marchand
Hello Shreyansh,

On Wed, Dec 7, 2016 at 10:55 AM, Shreyansh Jain  wrote:
> On Wednesday 07 December 2016 02:22 AM, David Marchand wrote:
>>> 0002~0003: Introducing the basic Bus model and associated test case
>>> 0005:  Support insertion of device rather than addition to tail
>>
>>
>> Patch 2 and 5 could be squashed.
>
>
> I deliberately kept them separate. I intent to extend the Patch 5 for
> hotplugging. But, if I don't end up adding support for that in this series,
> I will merge these two.

Fine.


>> The constructor priority stuff seems unneeded as long as we use
>> explicit reference to a global (or local, did not check) bus symbol
>> rather than a runtime lookup.
>
>
> I didn't understand your point here.
> IMO, constructor priority (or some other way to handle this) is important. I
> faced this issue while verifying it at my end when the drivers were getting
> registered before the bus.
>
> Can you elaborate more on '..use explicit reference to a global...'?

The drivers register themselves to a bus using this bus specific api.

For pci, this is rte_eal_pci_register().
The pci_bus object must be moved to eal_common_pci.c (we can stil
internally expose for bsd / linux specific implementations).
Then, rte_eal_pci_register() can add the pci driver to the pci_bus
drivers list even if this pci_bus object is not registered yet to the
buses list.

And no constructor order issue ?


>>
>>
>>> 0004:  Add scan and match callbacks for the Bus and updated test case
>>
>>
>> Why do you push back the bus object in the 'scan' method ?
>> This method is bus specific which means that the code "knows" the
>> object registered with the callback.
>
>
> This 'knows' is the grey area for me.
> The bus (for example, PCI) after scanning needs to call
> rte_eal_bus_add_device() to link the device in bus's device_list.
>
> Two options:
> 1. Have a global reference to "pci" bus (rte_bus) somewhere in eal_pci.c
> 2. Call rte_eal_get_bus() every time someone needs the reference.
> 3. C++ style, 'this->'.
>
> I have taken the 3rd path. It simplifies my code to not assume a handle as
> well as not allow for reference fetch calls every now and then.
>
> As a disadvantage: it means passing this as argument - and some cases
> maintaining it as __rte_unused.
>
> Taking (1) or (2) is not advantageous than this approach.

1) is the simplest one.

When you write a pci_scan method and embed it in you pci_bus object,
but this pci_scan method still wonders which bus object it is supposed
to work on, this is a bit like Schizophrenia ;-).


>> Is is that you want to have a single scan method used by multiple buses ?
>
>
> Yes, but only as a use case. For example, platform devices are of various
> types - what if we have a south-bound bus over a platform bus. In which
> case, a hierarchical bus layout is possible.
> But, this is far-fetched idea for now.

Well, if you have no usecase at the moment, let's keep it simple, please.


>>
>>> 0006:  Integrate bus scan/match with EAL, without any effective
>>> driver
>>
>>
>> Hard to find a right balance in patch splittng, but patch 4 and 6 are
>> linked, I would squash them into one.
>
>
> Yes, it is hard and sometimes there is simply no strong rationale for
> splitting or merging. This is one of those cases.
> My idea was that one patch _only_ introduces Bus services (structures,
> functions etc) and another should enable the calls to it from EAL.
> In that sense, I still think 4 and 6 should remain separate, may be
> consecutive, though.

Ok, will see in next version of the patchset.


>>
>>> 0007:  rte_pci_driver->probe replaced with rte_driver->probe
>>
>>
>> This patch is too big, please separate in two patches: eal changes
>> then ethdev/driver changes.
>
>
> I don't think that can be done. One change is incomplete without the other.
>
> Changes to all files are only for rte_pci_driver->probe to rte_driver->probe
> movement. EAL changes is to allow rte_eth_dev_pci_probe function after such
> a change as rte_driver->probe has different arguments as compared to
> rte_pci_driver->probe. The patches won't compile if I split.
>
> Am I missing something?
>>
>> Why do you push back the driver object in the 'probe' method ? (idem
>> rte_bus->scan).
>
>
> I am assuming you are referring to rte_driver->probe().
> This is being done so that implementations (specific to drivers on a
> particular bus) can start extracting the rte_xxx_driver, if need be.
>
> For example, for e1000/em_ethdev.c, rte_driver->probe() have been set to
> rte_eth_dev_pci_probe() which requires rte_pci_driver to work with. In
> absence of the rte_driver object, this function cannot call
> rte_pci_driver->probe (for example) for driver specific operations.

Sorry, I am thinking a step ahead with eth_driver out of the picture.
But once eth_driver disappears, I can see no reason to keep this
driver in the probe method (Schizophrenia again).



-- 
David Marchand


Re: [dpdk-dev] [PATCH 10/32] net/dpaa2: introducing dpaa2 bus driver for fsl-mc bus

2016-12-07 Thread David Marchand
On Wed, Dec 7, 2016 at 11:40 AM, Thomas Monjalon
 wrote:
> 2016-12-07 15:43, Shreyansh Jain:
>> IMO, the way Bus is kept is debatable.
>>   - should it be in EAL (lib/librte_eal/linuxapp/eal_pci.c like Bus
>> patches) [1]?
>>   - Should it a 'handler/driver' parallel to device drivers?
>>
>> I personally prefer a clean layer for buses with:
>>
>>   - RTE_SDK/drivers/net/dpaa2/
>>   - RTE_SDK/drivers/bus
>>   - RTE_SDK/drivers/bus/dpaa2/
>>   - RTE_SDK/drivers/bus/dpaa2/dpaa2_bus.c etc.
>
> I agree, it is a good idea.

Indeed.

>> For PCI, which is generic (or for other similar generic buses, like
>> platform), we can keep the implementation within lib/librte_eal/linuxapp/*.
>
> I would be in favor of moving PCI and vdev code from EAL to drivers/bus/.
> We can keep the API in EAL and implement the buses as drivers.
>
> Other opinions?

The only issue I see for now is how to pass the configuration to these
drivers, like vdev args or the pci blacklist/whitelist.


-- 
David Marchand


Re: [dpdk-dev] [PATCH 10/32] net/dpaa2: introducing dpaa2 bus driver for fsl-mc bus

2016-12-07 Thread Hemant Agrawal
> -Original Message-
> From: David Marchand [mailto:david.march...@6wind.com]
> Sent: Wednesday, December 07, 2016 5:52 PM
> To: Thomas Monjalon 
> Cc: Shreyansh Jain ; Hemant Agrawal
> ; dev@dpdk.org; Richardson, Bruce
> 
> Subject: Re: [PATCH 10/32] net/dpaa2: introducing dpaa2 bus driver for fsl-mc
> bus
> 
> On Wed, Dec 7, 2016 at 11:40 AM, Thomas Monjalon
>  wrote:
> > 2016-12-07 15:43, Shreyansh Jain:
> >> IMO, the way Bus is kept is debatable.
> >>   - should it be in EAL (lib/librte_eal/linuxapp/eal_pci.c like Bus
> >> patches) [1]?
> >>   - Should it a 'handler/driver' parallel to device drivers?
> >>
> >> I personally prefer a clean layer for buses with:
> >>
> >>   - RTE_SDK/drivers/net/dpaa2/
> >>   - RTE_SDK/drivers/bus
> >>   - RTE_SDK/drivers/bus/dpaa2/
> >>   - RTE_SDK/drivers/bus/dpaa2/dpaa2_bus.c etc.
> >
> > I agree, it is a good idea.
> 
> Indeed.

[Hemant]  I  will fix it in v2. 
> 
> >> For PCI, which is generic (or for other similar generic buses, like
> >> platform), we can keep the implementation within lib/librte_eal/linuxapp/*.
> >
> > I would be in favor of moving PCI and vdev code from EAL to drivers/bus/.
> > We can keep the API in EAL and implement the buses as drivers.
> >
> > Other opinions?
> 
> The only issue I see for now is how to pass the configuration to these 
> drivers,
> like vdev args or the pci blacklist/whitelist.
> 
> 
> --
> David Marchand


Re: [dpdk-dev] [PATCH v2 02/32] net/i40e: add callback to user on VF to PF mbox msg

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> The callback asks the user application if it is allowed to
> perform the mailbox messages.
> 
> If the return value from user is RTE_PMD_I40E_MB_EVENT_PROCEED
> then continue. If ACK or NACK, do nothing and send
> not_supported to VF.
> 
> Signed-off-by: Wenzhuo Lu 
> ---
>  drivers/net/i40e/i40e_pf.c  | 230 
> ++--
>  drivers/net/i40e/rte_pmd_i40e.h |  21 
>  2 files changed, 216 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
> index f70712b..8b8a14f 100644
> --- a/drivers/net/i40e/i40e_pf.c
> +++ b/drivers/net/i40e/i40e_pf.c
> @@ -55,6 +55,7 @@
>  #include "i40e_ethdev.h"
>  #include "i40e_rxtx.h"
>  #include "i40e_pf.h"
> +#include "rte_pmd_i40e.h"
>  
>  #define I40E_CFG_CRCSTRIP_DEFAULT 1
>  
> @@ -272,14 +273,23 @@
>  }
>  
>  static void
> -i40e_pf_host_process_cmd_version(struct i40e_pf_vf *vf)
> +i40e_pf_host_process_cmd_version(struct i40e_pf_vf *vf, bool b_op)
>  {
>   struct i40e_virtchnl_version_info info;
>  
>   info.major = I40E_DPDK_VERSION_MAJOR;
>   info.minor = I40E_DPDK_VERSION_MINOR;
> - i40e_pf_host_send_msg_to_vf(vf, I40E_VIRTCHNL_OP_VERSION,
> - I40E_SUCCESS, (uint8_t *)&info, sizeof(info));
> +
> + if (b_op)
> + i40e_pf_host_send_msg_to_vf(vf, I40E_VIRTCHNL_OP_VERSION,
> + I40E_SUCCESS,
> + (uint8_t *)&info,
> + sizeof(info));
> + else
> + i40e_pf_host_send_msg_to_vf(vf, I40E_VIRTCHNL_OP_VERSION,
> + I40E_NOT_SUPPORTED,
> + (uint8_t *)&info,
> + sizeof(info));

Even I40E_NOT_SUPPORTED returned to the VF, it seems VF is ignoring this
value, is it something should be fixed as part of this patch?

This path is a little complex, I may be missing something but stack
trace is:

VF:
i40evf_check_api_version()
  i40evf_execute_vf_cmd(vfd_cmd_info arg)
i40e_aq_send_msg_to_pf(arg->op, retval, arg->in_msg)
  desc <- op, retval
  msg_in <- arg_in_msg
  i40e_asq_send_command(desc, msg_in)

PF:
i40e_pf_host_handle_vf_msg(op, msg_in)
  i40e_pf_host_process_cmd_version()
i40e_pf_host_send_msg_to_vf(op, retval, msg_out)
  i40e_aq_send_msg_to_vf(op, retval, msg_out)
desc <- op, retval
i40e_asq_send_command(desc, msg_out)

VF:
data <- arg->out_xxx
i40evf_read_pfmsg(data)
  event <- data->out_msg
  op <-
  retval <-
  i40e_clean_arq_element(event)
 event->desc <- desc
 event->msg <- msg_out
 data->result = retval<
   return 0;
 ver = arg->out_msg
 return 0;


So, as far as I can see I40E_NOT_SUPPORTED is somewhere in the stack but
not reached to the final VF function, is this OK?


>  }
>  
>  static int
> @@ -292,13 +302,20 @@
>  }
>  
>  static int
> -i40e_pf_host_process_cmd_get_vf_resource(struct i40e_pf_vf *vf)
> +i40e_pf_host_process_cmd_get_vf_resource(struct i40e_pf_vf *vf, bool b_op)
>  {
>   struct i40e_virtchnl_vf_resource *vf_res = NULL;
>   struct i40e_hw *hw = I40E_PF_TO_HW(vf->pf);
>   uint32_t len = 0;
>   int ret = I40E_SUCCESS;
>  
> + if (!b_op) {
> + i40e_pf_host_send_msg_to_vf(vf,
> + I40E_VIRTCHNL_OP_GET_VF_RESOURCES,
> + I40E_NOT_SUPPORTED, NULL, 0);
> + return ret;
> + }
> +
>   /* only have 1 VSI by default */
>   len =  sizeof(struct i40e_virtchnl_vf_resource) +
>   I40E_DEFAULT_VF_VSI_NUM *
> @@ -423,7 +440,8 @@
>  static int
>  i40e_pf_host_process_cmd_config_vsi_queues(struct i40e_pf_vf *vf,
>  uint8_t *msg,
> -uint16_t msglen)
> +uint16_t msglen,
> +bool b_op)
>  {
>   struct i40e_hw *hw = I40E_PF_TO_HW(vf->pf);
>   struct i40e_vsi *vsi = vf->vsi;
> @@ -432,6 +450,13 @@
>   struct i40e_virtchnl_queue_pair_info *vc_qpi;
>   int i, ret = I40E_SUCCESS;
>  
> + if (!b_op) {
> + i40e_pf_host_send_msg_to_vf(vf,
> + I40E_VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> + I40E_NOT_SUPPORTED, NULL, 0);
> + return ret;
> + }
> +
>   if (!msg || vc_vqci->num_queue_pairs > vsi->nb_qps ||
>   vc_vqci->num_queue_pairs > I40E_MAX_VSI_QP ||
>   msglen < I40E_VIRTCHNL_CONFIG_VSI_QUEUES_SIZE(vc_vqci,
> @@ -482,7 +507,8 @@
>  static int
>  i40e_pf_host_process_cmd_config_vsi_queues_ext(struct i40e_pf_vf *vf,
>  uint8_t *msg,
> -  

[dpdk-dev] [PATCH v2] vhost: allow for many vhost user ports

2016-12-07 Thread Jan Wickbom
Currently select() is used to monitor file descriptors for vhostuser
ports. This limits the number of ports possible to create since the
fd number is used as index in the fd_set and we have seen fds > 1023.
This patch changes select() to poll(). This way we can keep an
packed (pollfd) array for the fds, e.g. as many fds as the size of
the array.

Also see:
http://dpdk.org/ml/archives/dev/2016-April/037024.html

Signed-off-by: Jan Wickbom 
Reported-by: Patrik Andersson 
---

v2:
* removed unnecessary casts
* static array replacing allocated memory

 lib/librte_vhost/fd_man.c | 142 +++---
 lib/librte_vhost/fd_man.h |   2 +-
 2 files changed, 85 insertions(+), 59 deletions(-)

diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c
index 2d3eeb7..3c743e3 100644
--- a/lib/librte_vhost/fd_man.c
+++ b/lib/librte_vhost/fd_man.c
@@ -35,16 +35,43 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #include 
+#include 
 #include 
 
 #include "fd_man.h"
 
+#define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL)
+
+
+static struct pollfd rwfds[MAX_FDS];
+
+/**
+ * Adjusts the highest index populated in the array of fds
+ * @return
+ *   The new size of fdset.
+ */
+static int
+fdset_shrink(struct fdset *pfdset)
+{
+   int idx;
+
+   for (idx = pfdset->num - 1;
+idx >= 0 && pfdset->fd[idx].fd == -1;
+idx--)
+   ;
+
+   pfdset->num = idx + 1;
+
+   return pfdset->num;
+}
+
 /**
  * Returns the index in the fdset for a given fd.
  * If fd is -1, it means to search for a free entry.
@@ -56,72 +83,62 @@
 {
int i;
 
-   if (pfdset == NULL)
-   return -1;
-
-   for (i = 0; i < MAX_FDS && pfdset->fd[i].fd != fd; i++)
+   for (i = 0; i < pfdset->num && pfdset->fd[i].fd != fd; i++)
;
 
-   return i ==  MAX_FDS ? -1 : i;
+   return i == pfdset->num ? -1 : i;
 }
 
 static int
 fdset_find_free_slot(struct fdset *pfdset)
 {
-   return fdset_find_fd(pfdset, -1);
+   if (pfdset->num < MAX_FDS)
+   return pfdset->num;
+   else
+   return fdset_find_fd(pfdset, -1);
 }
 
-static int
-fdset_add_fd(struct fdset  *pfdset, int idx, int fd,
+static void
+fdset_add_fd(struct fdset *pfdset, int idx, int fd,
fd_cb rcb, fd_cb wcb, void *dat)
 {
struct fdentry *pfdentry;
 
-   if (pfdset == NULL || idx >= MAX_FDS || fd >= FD_SETSIZE)
-   return -1;
-
pfdentry = &pfdset->fd[idx];
pfdentry->fd = fd;
pfdentry->rcb = rcb;
pfdentry->wcb = wcb;
pfdentry->dat = dat;
-
-   return 0;
 }
 
 /**
- * Fill the read/write fd_set with the fds in the fdset.
+ * Compact the fdset and fill the read/write fds with the fds in the fdset.
  * @return
- *  the maximum fds filled in the read/write fd_set.
+ *  the number of fds filled in the read/write fds.
  */
 static int
-fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
+fdset_fill(struct pollfd *rwfds, struct fdset *pfdset)
 {
struct fdentry *pfdentry;
-   int i, maxfds = -1;
-   int num = MAX_FDS;
-
-   if (pfdset == NULL)
-   return -1;
+   int i;
+   int num;
 
-   for (i = 0; i < num; i++) {
+   for (i = 0, num = pfdset->num; i < num; i++) {
pfdentry = &pfdset->fd[i];
-   if (pfdentry->fd != -1) {
-   int added = 0;
-   if (pfdentry->rcb && rfset) {
-   FD_SET(pfdentry->fd, rfset);
-   added = 1;
-   }
-   if (pfdentry->wcb && wfset) {
-   FD_SET(pfdentry->fd, wfset);
-   added = 1;
-   }
-   if (added)
-   maxfds = pfdentry->fd < maxfds ?
-   maxfds : pfdentry->fd;
+
+   if (pfdentry->fd < 0) {
+   /* Hole in the list. Move the last one here */
+
+   *pfdentry = pfdset->fd[num - 1];
+   pfdset->fd[num - 1].fd = -1;
+   num = fdset_shrink(pfdset);
}
+   rwfds[i].fd = pfdentry->fd;
+   rwfds[i].events = pfdentry->rcb ? POLLIN : 0;
+   rwfds[i].events |= pfdentry->wcb ? POLLOUT : 0;
}
-   return maxfds;
+
+   return i;
 }
 
 void
@@ -132,6 +149,8 @@
if (pfdset == NULL)
return;
 
+   pthread_mutex_init(&pfdset->fd_mutex, NULL);
+
for (i = 0; i < MAX_FDS; i++) {
pfdset->fd[i].fd = -1;
pfdset->fd[i].dat = NULL;
@@ -152,14 +171,15 @@
 
pthread_mutex_lock(&pfdset->fd_mutex);
 
-   /* Find a free slot in the list. */
i = fdset_find_free_slot(pfdset);
-   if (i == -1 || fdset_add_fd(p

Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto pmd

2016-12-07 Thread Declan Doherty

On 05/12/16 15:12, Neil Horman wrote:

On Fri, Dec 02, 2016 at 04:22:16PM +, Declan Doherty wrote:

On 02/12/16 14:57, Bruce Richardson wrote:

On Fri, Dec 02, 2016 at 03:31:24PM +0100, Thomas Monjalon wrote:

2016-12-02 14:15, Fan Zhang:

This patch provides the initial implementation of the scheduler poll mode
driver using DPDK cryptodev framework.

Scheduler PMD is used to schedule and enqueue the crypto ops to the
hardware and/or software crypto devices attached to it (slaves). The
dequeue operation from the slave(s), and the possible dequeued crypto op
reordering, are then carried out by the scheduler.

The scheduler PMD can be used to fill the throughput gap between the
physical core and the existing cryptodevs to increase the overall
performance. For example, if a physical core has higher crypto op
processing rate than a cryptodev, the scheduler PMD can be introduced to
attach more than one cryptodevs.

This initial implementation is limited to supporting the following
scheduling modes:

- CRYPTO_SCHED_SW_ROUND_ROBIN_MODE (round robin amongst attached software
slave cryptodevs, to set this mode, the scheduler should have been
attached 1 or more software cryptodevs.

- CRYPTO_SCHED_HW_ROUND_ROBIN_MODE (round robin amongst attached hardware
slave cryptodevs (QAT), to set this mode, the scheduler should have
been attached 1 or more QATs.


Could it be implemented on top of the eventdev API?


Not really. The eventdev API is for different types of scheduling
between multiple sources that are all polling for packets, compared to
this, which is more analgous - as I understand it - to the bonding PMD
for ethdev.

To make something like this work with an eventdev API you would need to
use one of the following models:
* have worker cores for offloading packets to the different crypto
  blocks pulling from the eventdev APIs. This would make it difficult to
  do any "smart" scheduling of crypto operations between the blocks,
  e.g. that one crypto instance may be better at certain types of
  operations than another.
* move the logic in this driver into an existing eventdev instance,
  which uses the eventdev api rather than the crypto APIs and so has an
  extra level of "structure abstraction" that has to be worked though.
  It's just not really a good fit.

So for this workload, I believe the pseudo-cryptodev instance is the
best way to go.

/Bruce




As Bruce says this is much more analogous to the ethdev bonding driver, the
main idea is to allow different crypto op scheduling mechanisms to be
defined transparently to an application. This could be load-balancing across
multiple hw crypto devices, or having a software crypto device to act as a
backup device for a hw accelerator if it becomes oversubscribed. I think the
main advantage of a crypto-scheduler approach means that the data path of
the application doesn't need to have any knowledge that scheduling is
happening at all, it is just using a different crypto device id, which is
then manages the distribution of crypto work.




This is a good deal like the bonding pmd, and so from a certain standpoint it
makes sense to do this, but whereas the bonding pmd is meant to create a single
path to a logical network over several physical networks, this pmd really only
focuses on maximizing througput, and for that we already have tools.  As Thomas
mentions, there is the eventdev library, but from my view the distributor
library already fits this bill.  It already is a basic framework to process
mbufs in parallel according to whatever policy you want to implement, which
sounds like exactly what the goal of this pmd is.

Neil




Hey Neil,

this is actually intended to act and look a good deal like the ethernet 
bonding device but to handling the crypto scheduling use cases.


For example, take the case where multiple hw accelerators may be 
available. We want to provide user applications with a mechanism to 
transparently balance work across all devices without having to manage 
the load balancing details or the guaranteeing of ordering of the 
processed ops on the dequeue_burst side. In this case the application 
would just use the crypto dev_id of the scheduler and it would look 
after balancing the workload across the available hw accelerators.



+---+
|  Crypto Sch PMD   |
|   |
| ORDERING / RR SCH |
+---+
^ ^ ^
| | |
  +-+ | +---+
  |   +---+ |
  |   | |
  V   V V
+---+ +---+ +---+
| Crypto HW PMD | | Crypto HW PMD | | Crypto HW PMD |
+---+ +---+ +---+

Another use case we hope to support is migration of processing from one 
device to another where a hw and sw crypto pmd can be bound to the same 
crypto scheduler and the crypto processing could be  transparently 
migrate

Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum engine

2016-12-07 Thread Ananyev, Konstantin
Hi everyone,

> -Original Message-
> From: Kulasek, TomaszX
> Sent: Wednesday, December 7, 2016 12:12 PM
> To: Mcnamara, John ; Yigit, Ferruh 
> ; dev@dpdk.org
> Cc: Ananyev, Konstantin ; olivier.m...@6wind.com
> Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in csum 
> engine
> 
> Hi John,
> 
> > -Original Message-
> > From: Mcnamara, John
> > Sent: Wednesday, December 7, 2016 13:01
> > To: Yigit, Ferruh ; Kulasek, TomaszX
> > ; dev@dpdk.org
> > Cc: Ananyev, Konstantin ;
> > olivier.m...@6wind.com
> > Subject: RE: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > csum engine
> >
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> > > Sent: Wednesday, December 7, 2016 11:14 AM
> > > To: Kulasek, TomaszX ; dev@dpdk.org
> > > Cc: Ananyev, Konstantin ;
> > > olivier.m...@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v12 6/6] testpmd: use Tx preparation in
> > > csum engine
> > >
> > > ...
> > > <...>
> > > > +cmdline_parse_inst_t cmd_csum_txprep = {
> > > > +   .f = cmd_csum_txprep_parsed,
> > > > +   .data = NULL,
> > > > +   .help_str = "enable/disable tx preparation path for csum 
> > > > engine: "
> > > > +   "csum txprep on|off",
> > >
> > > Can you please format help string as:
> > > "cmd fixed_string fixed|string|options : Description"
> > > see commit 26faac80327f
> > >
> > > above becomes:
> > > "csum txprep on|off: Enable/Disable tx preparation path for csum engine"
> > >
> >
> >
> > Also, does this require an update to the testpmd docs?
> >
> >
> 
> Yes, I think. I will add adequate description to the docs.

I suppose if we'll get tx_prepare() addressed on all PMDs DPDK support
we can safely remove that command.
Konstantin

> 
> Tomasz


Re: [dpdk-dev] [PATCH v2] doc: introduce PVP reference benchmark

2016-12-07 Thread Kevin Traynor
On 12/06/2016 12:24 PM, Maxime Coquelin wrote:
> Having reference benchmarks is important in order to obtain
> reproducible performance figures.
> 
> This patch describes required steps to configure a PVP setup
> using testpmd in both host and guest.
> 
> Not relying on external vSwitch ease integration in a CI loop by
> not being impacted by DPDK API changes.
> 
> Signed-off-by: Maxime Coquelin 
> ---
> 
> Thanks to all the reviewers, this v2 should take all remarks into account.
> 
>  -- Maxime

Acked-by: Kevin Traynor 



Re: [dpdk-dev] [PATCH v2 05/32] net/i40e: set TX loopback from PF

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> Support enabling/disabling TX loopback from PF.
> User can call the API on PF to enable/disable TX loopback
> for all the PF and VFs.
> 
> Signed-off-by: Wenzhuo Lu 
> ---
>  drivers/net/i40e/i40e_ethdev.c| 219 
> ++
>  drivers/net/i40e/rte_pmd_i40e.h   |  16 +++
>  drivers/net/i40e/rte_pmd_i40e_version.map |   1 +
>  3 files changed, 236 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index ec863b9..8bd0d70 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -9938,3 +9938,222 @@ static void i40e_set_default_mac_addr(struct 
> rte_eth_dev *dev,
>  
>   return ret;
>  }
> +

<...>

> +
> +static int
> +i40e_vsi_set_tx_loopback(struct i40e_vsi *vsi, uint8_t on)
> +{
> + struct i40e_vsi_context ctxt;
> + struct i40e_hw *hw;
> + int ret;
> +
> + hw = I40E_VSI_TO_HW(vsi);
> +
> + /* Use the FW API if FW >= v5.0 */
> + if (hw->aq.fw_maj_ver < 5) {
> + PMD_INIT_LOG(ERR, "FW < v5.0, cannot enable loopback");
> + return -ENOTSUP;
> + }
> +
> + /* Check if it has been already on or off */
> + if (vsi->info.valid_sections &
> + rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID)) {
> + if (on) {
> + if ((vsi->info.switch_id &
> +  I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB) ==
> + I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB)
> + return 0; /* already on */
> + } else {
> + if ((vsi->info.switch_id &
> +  I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB) == 0)
> + return 0; /* already off */
> + }
> + }
> +
> + /* remove all the MACs first */
> + ret = i40e_vsi_rm_mac_filter(vsi);
> + if (ret)
> + return ret;
> +
> + vsi->info.valid_sections = cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
> + if (on)
> + vsi->info.switch_id |= I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB;
> + else
> + vsi->info.switch_id &= ~I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB;
> +

>

> + memset(&ctxt, 0, sizeof(ctxt));
> + (void)rte_memcpy(&ctxt.info, &vsi->info, sizeof(vsi->info));
> + ctxt.seid = vsi->seid;
> +
> + ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> + if (ret != I40E_SUCCESS) {
> + PMD_DRV_LOG(ERR, "Failed to update VSI params");
> + return ret;
> + }

<

This part is now duplicated in a few functions, does it make sense to
make it separate function, in the first patch it appeared 3/32 ?

> +
> + /* add all the MACs back */
> + ret = i40e_vsi_restore_mac_filter(vsi);
> + if (ret)
> + return ret;
> +
> + return ret;
> +}
> +
> +int
> +rte_pmd_i40e_set_tx_loopback(uint8_t port, uint8_t on)
> +{
> + struct rte_eth_dev *dev;
> + struct i40e_pf *pf;
> + struct i40e_pf_vf *vf;
> + struct i40e_vsi *vsi;
> + uint16_t vf_id;
> + int ret;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> +
> + dev = &rte_eth_devices[port];
> +
> + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +
> + /* setup PF TX loopback */
> + vsi = pf->main_vsi;
> + ret = i40e_vsi_set_tx_loopback(vsi, on);
> + if (ret)
> + return ret;
> +
> + /* setup TX loopback for all the VFs */
> + if (!pf->vfs) {
> + PMD_DRV_LOG(ERR, "Invalid argument.");
> + return -EINVAL;
> + }
> +
> + for (vf_id = 0; vf_id < pf->vf_num; vf_id++) {
> + vf = &pf->vfs[vf_id];
> + vsi = vf->vsi;
> +
> + ret = i40e_vsi_set_tx_loopback(vsi, on);
> + if (ret)
> + return ret;
> + }
> +
> + return ret;
> +}
> diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
> index c8736c8..3c65be4 100644
> --- a/drivers/net/i40e/rte_pmd_i40e.h
> +++ b/drivers/net/i40e/rte_pmd_i40e.h
> @@ -114,4 +114,20 @@ int rte_pmd_i40e_set_vf_vlan_anti_spoof(uint8_t port,
>   uint16_t vf_id,
>   uint8_t on);
>  
> +/**
> + * Enable/Disable TX loopback on all the PF and VFs.
> + *
> + * @param port
> + *The port identifier of the Ethernet device.
> + * @param on
> + *1 - Enable TX loopback.
> + *0 - Disable TX loopback.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if *port* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +int rte_pmd_i40e_set_tx_loopback(uint8_t port,
> +  uint8_t on);
> +
>  #endif /* _PMD_I40E_H_ */
> diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map 
> b/drivers/net/i40e/rte_pmd_i40e_version.map
> index fff6cf9..3da04d3 100644
> --- a/drivers/net/i40e/rte_pmd_i40e_version.map
> +++ b/drivers/net/i40e/rte_pmd_i40e_version

Re: [dpdk-dev] [PATCH 00/13] Introducing EAL Bus-Device-Driver Model

2016-12-07 Thread Shreyansh Jain

On Wednesday 07 December 2016 05:47 PM, David Marchand wrote:

Hello Shreyansh,

On Wed, Dec 7, 2016 at 10:55 AM, Shreyansh Jain  wrote:

On Wednesday 07 December 2016 02:22 AM, David Marchand wrote:

0002~0003: Introducing the basic Bus model and associated test case
0005:  Support insertion of device rather than addition to tail



Patch 2 and 5 could be squashed.



I deliberately kept them separate. I intent to extend the Patch 5 for
hotplugging. But, if I don't end up adding support for that in this series,
I will merge these two.


Fine.



The constructor priority stuff seems unneeded as long as we use
explicit reference to a global (or local, did not check) bus symbol
rather than a runtime lookup.



I didn't understand your point here.
IMO, constructor priority (or some other way to handle this) is important. I
faced this issue while verifying it at my end when the drivers were getting
registered before the bus.

Can you elaborate more on '..use explicit reference to a global...'?


The drivers register themselves to a bus using this bus specific api.

For pci, this is rte_eal_pci_register().
The pci_bus object must be moved to eal_common_pci.c (we can stil
internally expose for bsd / linux specific implementations).
Then, rte_eal_pci_register() can add the pci driver to the pci_bus
drivers list even if this pci_bus object is not registered yet to the
buses list.


So, in eal_common_bus.c

--->8---

struct rte_bus *global_ptr_to_pci_bus = NULL;

struct rte_bus pci_bus = { ... };

rte_eal_pci_register() {
if (global_ptr_to_pci_bus == NULL)
rte_eal_bus_register(&pci_bus)
else
   // continue as if PCI bus is registered
}

--->8---

so, no RTE_REGISTER_BUS()?

If yes, then RTE_REGISTER_BUS() should also check for an existing 
registration for duplication.


I was banking on a model where bus handlers (or bus drivers) are 
independent entities, just like PMDs. So, we have a bus XYZ without any 
drivers necessarily based on it.


By making registration dependent on driver registration, it becomes 
implicit that buses don't exist without drivers.
I am not in favor of this - or maybe I lack enough reason for this 
(about how it will make framework/PMD life better).




And no constructor order issue ?






0004:  Add scan and match callbacks for the Bus and updated test case



Why do you push back the bus object in the 'scan' method ?
This method is bus specific which means that the code "knows" the
object registered with the callback.



This 'knows' is the grey area for me.
The bus (for example, PCI) after scanning needs to call
rte_eal_bus_add_device() to link the device in bus's device_list.

Two options:
1. Have a global reference to "pci" bus (rte_bus) somewhere in eal_pci.c
2. Call rte_eal_get_bus() every time someone needs the reference.
3. C++ style, 'this->'.

I have taken the 3rd path. It simplifies my code to not assume a handle as
well as not allow for reference fetch calls every now and then.

As a disadvantage: it means passing this as argument - and some cases
maintaining it as __rte_unused.

Taking (1) or (2) is not advantageous than this approach.


1) is the simplest one.

When you write a pci_scan method and embed it in you pci_bus object,
but this pci_scan method still wonders which bus object it is supposed
to work on, this is a bit like Schizophrenia ;-).


:)
This now is linked to the above issue of constructor priority and having 
a global bus reference. I don't personally prefer it.

I will still give this a serious thought, though.





Is is that you want to have a single scan method used by multiple buses ?



Yes, but only as a use case. For example, platform devices are of various
types - what if we have a south-bound bus over a platform bus. In which
case, a hierarchical bus layout is possible.
But, this is far-fetched idea for now.


Well, if you have no usecase at the moment, let's keep it simple, please.



Ok.






0006:  Integrate bus scan/match with EAL, without any effective
driver



Hard to find a right balance in patch splittng, but patch 4 and 6 are
linked, I would squash them into one.



Yes, it is hard and sometimes there is simply no strong rationale for
splitting or merging. This is one of those cases.
My idea was that one patch _only_ introduces Bus services (structures,
functions etc) and another should enable the calls to it from EAL.
In that sense, I still think 4 and 6 should remain separate, may be
consecutive, though.


Ok, will see in next version of the patchset.


Is there anything specific that you are looking for in patchset v2?
I was thinking of:
0. fixing BSD compilation issue reported by CI
1. improving the test_pci.c
2. hotplugging
3. trying to move PCI to drives/bus/pci/linux/* and resolving how 
drivers link to it, and how EAL resources like devargs are consumed.


Anything else?







0007:  rte_pci_driver->probe replaced with rte_driver->probe



This patch is too big, please separate in two patche

Re: [dpdk-dev] [PATCH 1/8] drivers/common/dpaa2: Run time assembler for Descriptor formation

2016-12-07 Thread Thomas Monjalon
2016-12-07 11:44, Akhil Goyal:
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
> 2016-12-07 06:24, Akhil Goyal:
> > From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> > 2016-12-05 18:25, Akhil Goyal:
> > > FLib is a library which helps in making the descriptors which is 
> > > understood by NXP's SEC hardware.
> > > This patch provides header files for command words which can be used 
> > > for descritptor formation.
> > 
> > It seems this code is old. Does it exist as a standalone library somewhere?
> [Akhil] Let me correct here. This is not a library. This is a set of header 
> files.

Use the name you want but please be consistent. You said above:
"FLib is a library"
and now
"This is not a library"
Funny :)

> Yes this is an old code. This is generally shipped with NXP SDK.
> 
> > Where was it hosted before duplicating it in DPDK?
> [Akhil] This is part of NXP SDK and also available at git.freescale.com.

So it is not a DPDK code.
It should be a standalone library (or set of headers) and packaged as well.

> > Why do you want to have a common directory drivers/common/dpaa2/flib 
> > instead of a sub-directory in the crypto driver?
> [Akhil] I agree with your suggestion. This can be maintained within 
> drivers/crypto as a common header files set for different NXP Architecture 
> crypto drivers.

Yes it can be in drivers/crypto/ or not in DPDK at all.


Re: [dpdk-dev] [PATCH 00/13] Introducing EAL Bus-Device-Driver Model

2016-12-07 Thread Thomas Monjalon
2016-12-07 18:40, Shreyansh Jain:
> Is there anything specific that you are looking for in patchset v2?
> I was thinking of:
> 0. fixing BSD compilation issue reported by CI
> 1. improving the test_pci.c
> 2. hotplugging
> 3. trying to move PCI to drives/bus/pci/linux/* and resolving how 
> drivers link to it, and how EAL resources like devargs are consumed.

I am concerned about the time needed for all these changes.
Please let's make sure that the basic parts are well done and pushed, first.
That's why I suggest to postpone 1, 2 and 3 to next release if possible.
The priority is to have a clean bus model,
and if time permits, integrate the NXP driver.


Re: [dpdk-dev] [PATCH] vhost: allow for many vhost user ports

2016-12-07 Thread Jan Wickbom


> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan@linux.intel.com]
> Sent: den 7 december 2016 11:13
> To: Jan Wickbom 
> Cc: dev@dpdk.org; Patrik Andersson R 
> Subject: Re: [PATCH] vhost: allow for many vhost user ports
> 
> On Thu, Dec 01, 2016 at 04:26:50PM +0100, Jan Wickbom wrote:
> >  static int
> > -fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
> > +fdset_fill(struct pollfd *rwfds, struct fdset *pfdset)
> >  {
> > struct fdentry *pfdentry;
> > -   int i, maxfds = -1;
> > -   int num = MAX_FDS;
> > -
> > -   if (pfdset == NULL)
> > -   return -1;
> > +   int i;
> > +   int num;
> >
> > -   for (i = 0; i < num; i++) {
> > +   for (i = 0, num = pfdset->num; i < num; i++) {
> > pfdentry = &pfdset->fd[i];
> > -   if (pfdentry->fd != -1) {
> > -   int added = 0;
> > -   if (pfdentry->rcb && rfset) {
> > -   FD_SET(pfdentry-
> >fd, rfset);
> > -   added = 1;
> > -   }
> > -   if (pfdentry->wcb && wfset) {
> > -   FD_SET(pfdentry-
> >fd, wfset);
> > -   added = 1;
> > -   }
> > -   if (added)
> > -   maxfds = pfdentry-
> >fd < maxfds ?
> > -
>   maxfds : pfdentry->fd;
> > +
> > +   if (pfdentry->fd < 0) {
> > +   /* Hole in the list. Move the last
> one here */
> > +
> > +   *pfdentry = pfdset->fd[num - 1];
> > +   pfdset->fd[num - 1].fd = -1;
> > +   num =
> fdset_adjust_num(pfdset);
> > }
> > +   rwfds[i].fd = pfdentry->fd;
> > +   rwfds[i].events = pfdentry->rcb ? POLLIN : 0;
> > +   rwfds[i].events |= pfdentry->wcb ? POLLOUT :
> 0;
> 
> Another thing is we don't have to re-init this rwfds array again
> and again. Instead, we could
> 
> - set it up correctly when fdset_add is invoked: set the fd and
>   events.
> 
> - reset revents when it's been handled at fdset_event_dispatch().
> 
> - swap with the last one and shrink the array on fd delete
> 
> Could you make a follow up patch for that?

I don't see how that could easily be done. The loop index, i, is a direct 
reference between
an entry in the rwfds array and an entry in the pfdset array. It should stay 
like that while we are
hanging in the poll(). If  an entry in the pfdset array is removed while we are 
hanging in the poll()
and we then immediately replaces it with the last entry in the array we will 
end up in trouble if the
revent gets set for the "replaced" index. The direct reference is gone.
Or am I missing something?
/jaw
> 
> Thanks.
> 
>   --yliu


Re: [dpdk-dev] [PATCH v2 09/32] rte: add APIs for VF stats get/reset

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> This patch add below two APIs so that VF statistics
> can be get/clear from PF side.
> rte_eth_vf_stats_get.
> rte_eth_vf_stats_reset.

patch subject can have " ... from PF" both to be consistent with other
patches and to clarify what it does: add APIS to get/reset VF stats from PF?

> 
> Signed-off-by: Qi Zhang 
> ---

<...>

> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 9678179..8b564ee 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1271,6 +1271,15 @@ typedef int (*eth_set_vf_vlan_filter_t)(struct 
> rte_eth_dev *dev,
> uint8_t vlan_on);
>  /**< @internal Set VF VLAN pool filter */
>  
> +typedef int (*eth_vf_stats_get)(struct rte_eth_dev *dev,
> + uint16_t vf,
> + struct rte_eth_stats *stats);
> +/**< @internal Get VF statistics */
> +
> +typedef int (*eth_vf_stats_reset)(struct rte_eth_dev *dev,
> +   uint16_t vf);
> +/**< @internal Clear VF statistics */
> +
>  typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
>   uint16_t queue_idx,
>   uint16_t tx_rate);
> @@ -1483,6 +1492,8 @@ struct eth_dev_ops {
>   eth_set_vf_rx_tset_vf_rx;  /**< enable/disable a VF receive 
> */
>   eth_set_vf_tx_tset_vf_tx;  /**< enable/disable a VF 
> transmit */
>   eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
> */
> + eth_vf_stats_get   vf_stats_get; /**< Get VF's statistics */
> + eth_vf_stats_reset vf_stats_reset; /**< Reset VF's statistics */

Do we really want to add more ops to the eth_dev_ops?

Although vf_stats_get & vf_stats_reset sounds generic, why not implement
these first in PMD specific manner, and more PMDs implement these, move
to the generic eth_dev_ops layer?

CC: Thomas

<...>


Re: [dpdk-dev] [PATCH v2 10/32] net/i40e: implement ops for VF stats get/reset

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> This patch implement vf_stats_get and vf_stats_reset ops for i40e.
> 
> Signed-off-by: Qi Zhang 
> ---
<...>

> +static int
> +i40e_vf_stats_get(struct rte_eth_dev *dev,
> +   uint16_t vf,
> +   struct rte_eth_stats *stats)
> +{
> + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> + struct i40e_vsi *vsi;
> +
> + int ret = 0;
> +
> + if (pf->vf_num <= vf) {
> + PMD_DRV_LOG(ERR, "Invalid VF id %d\n", vf);
> + return -EINVAL;
> + }

Do we need following check as it has been done in prev patches:

+   rte_eth_dev_info_get(port, &dev_info);
+
+   if (vf_id >= dev_info.max_vfs)
+   return -EINVAL;



Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto pmd

2016-12-07 Thread Neil Horman
On Wed, Dec 07, 2016 at 12:42:15PM +, Declan Doherty wrote:
> On 05/12/16 15:12, Neil Horman wrote:
> > On Fri, Dec 02, 2016 at 04:22:16PM +, Declan Doherty wrote:
> > > On 02/12/16 14:57, Bruce Richardson wrote:
> > > > On Fri, Dec 02, 2016 at 03:31:24PM +0100, Thomas Monjalon wrote:
> > > > > 2016-12-02 14:15, Fan Zhang:
> > > > > > This patch provides the initial implementation of the scheduler 
> > > > > > poll mode
> > > > > > driver using DPDK cryptodev framework.
> > > > > > 
> > > > > > Scheduler PMD is used to schedule and enqueue the crypto ops to the
> > > > > > hardware and/or software crypto devices attached to it (slaves). The
> > > > > > dequeue operation from the slave(s), and the possible dequeued 
> > > > > > crypto op
> > > > > > reordering, are then carried out by the scheduler.
> > > > > > 
> > > > > > The scheduler PMD can be used to fill the throughput gap between the
> > > > > > physical core and the existing cryptodevs to increase the overall
> > > > > > performance. For example, if a physical core has higher crypto op
> > > > > > processing rate than a cryptodev, the scheduler PMD can be 
> > > > > > introduced to
> > > > > > attach more than one cryptodevs.
> > > > > > 
> > > > > > This initial implementation is limited to supporting the following
> > > > > > scheduling modes:
> > > > > > 
> > > > > > - CRYPTO_SCHED_SW_ROUND_ROBIN_MODE (round robin amongst attached 
> > > > > > software
> > > > > > slave cryptodevs, to set this mode, the scheduler should have 
> > > > > > been
> > > > > > attached 1 or more software cryptodevs.
> > > > > > 
> > > > > > - CRYPTO_SCHED_HW_ROUND_ROBIN_MODE (round robin amongst attached 
> > > > > > hardware
> > > > > > slave cryptodevs (QAT), to set this mode, the scheduler should 
> > > > > > have
> > > > > > been attached 1 or more QATs.
> > > > > 
> > > > > Could it be implemented on top of the eventdev API?
> > > > > 
> > > > Not really. The eventdev API is for different types of scheduling
> > > > between multiple sources that are all polling for packets, compared to
> > > > this, which is more analgous - as I understand it - to the bonding PMD
> > > > for ethdev.
> > > > 
> > > > To make something like this work with an eventdev API you would need to
> > > > use one of the following models:
> > > > * have worker cores for offloading packets to the different crypto
> > > >   blocks pulling from the eventdev APIs. This would make it difficult to
> > > >   do any "smart" scheduling of crypto operations between the blocks,
> > > >   e.g. that one crypto instance may be better at certain types of
> > > >   operations than another.
> > > > * move the logic in this driver into an existing eventdev instance,
> > > >   which uses the eventdev api rather than the crypto APIs and so has an
> > > >   extra level of "structure abstraction" that has to be worked though.
> > > >   It's just not really a good fit.
> > > > 
> > > > So for this workload, I believe the pseudo-cryptodev instance is the
> > > > best way to go.
> > > > 
> > > > /Bruce
> > > > 
> > > 
> > > 
> > > As Bruce says this is much more analogous to the ethdev bonding driver, 
> > > the
> > > main idea is to allow different crypto op scheduling mechanisms to be
> > > defined transparently to an application. This could be load-balancing 
> > > across
> > > multiple hw crypto devices, or having a software crypto device to act as a
> > > backup device for a hw accelerator if it becomes oversubscribed. I think 
> > > the
> > > main advantage of a crypto-scheduler approach means that the data path of
> > > the application doesn't need to have any knowledge that scheduling is
> > > happening at all, it is just using a different crypto device id, which is
> > > then manages the distribution of crypto work.
> > > 
> > > 
> > > 
> > This is a good deal like the bonding pmd, and so from a certain standpoint 
> > it
> > makes sense to do this, but whereas the bonding pmd is meant to create a 
> > single
> > path to a logical network over several physical networks, this pmd really 
> > only
> > focuses on maximizing througput, and for that we already have tools.  As 
> > Thomas
> > mentions, there is the eventdev library, but from my view the distributor
> > library already fits this bill.  It already is a basic framework to process
> > mbufs in parallel according to whatever policy you want to implement, which
> > sounds like exactly what the goal of this pmd is.
> > 
> > Neil
> > 
> > 
> 
> Hey Neil,
> 
> this is actually intended to act and look a good deal like the ethernet
> bonding device but to handling the crypto scheduling use cases.
> 
> For example, take the case where multiple hw accelerators may be available.
> We want to provide user applications with a mechanism to transparently
> balance work across all devices without having to manage the load balancing
> details or the guaranteeing of ordering of the processed ops on the
> dequeue_burst side. In this case the a

Re: [dpdk-dev] [PATCH v2 15/32] net/i40e: add VF vlan strip func

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> Add a function to configure vlan strip enable/disable for specific
> SRIOV VF device.
> 
> Signed-off-by: Chen Jing D(Mark) 
> ---

<...>

> +
> +/* Set vlan strip on/off for specific VF from host */
> +int
> +rte_pmd_i40e_set_vf_vlan_stripq(uint8_t port, uint16_t vf_id, uint8_t on)
> +{
> + struct rte_eth_dev *dev;
> + struct i40e_pf *pf;
> + struct i40e_vsi *vsi;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> +
> + dev = &rte_eth_devices[port];
> + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +
> + if (vf_id > pf->vf_num - 1 || !pf->vfs) {
> + PMD_DRV_LOG(ERR, "Invalid argument.");
> + return -EINVAL;
> + }
> +
> + vsi = pf->vfs[vf_id].vsi;
> +
> + if (vsi)
> + return i40e_vsi_config_vlan_stripping(vsi, !!on);
> + else

if vd_if is valid, can vsi be NULL? If so this check may be required in
some prev patches too.

> + return -EINVAL;
> +}
> diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
> index ca5e05a..043ae62 100644
> --- a/drivers/net/i40e/rte_pmd_i40e.h
> +++ b/drivers/net/i40e/rte_pmd_i40e.h
> @@ -187,4 +187,24 @@ int rte_pmd_i40e_set_vf_multicast_promisc(uint8_t port,
>  int rte_pmd_i40e_set_vf_mac_addr(uint8_t port, uint16_t vf_id,
>struct ether_addr *mac_addr);
>  
> +/**
> + * Enable/Disable vf vlan strip for all queues in a pool
> + *
> + * @param port
> + *The port identifier of the Ethernet device.
> + * @param vf
> + *ID specifying VF.
> + * @param on
> + *1 - Enable VF's vlan strip on RX queues.
> + *0 - Disable VF's vlan strip on RX queues.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support this feature.

Is this error type returned?

> + *   - (-ENODEV) if *port* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
<...>


Re: [dpdk-dev] [PATCH v2 16/32] net/i40e: add set VF VLAN insert function

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> Support inserting VF VLAN id from PF.
> User can call the API on PF to insert a VLAN id to a
> specific VF.

Same comment with prev patch, does it make sense to insert " from PF" to
patch title?

> 
> Signed-off-by: Bernard Iremonger 
> ---
>  drivers/net/i40e/i40e_ethdev.c| 53 
> +++
>  drivers/net/i40e/rte_pmd_i40e.h   | 19 +++
>  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
>  3 files changed, 73 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 253209b..c571d8b 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -10361,3 +10361,56 @@ static void i40e_set_default_mac_addr(struct 
> rte_eth_dev *dev,
>   else
>   return -EINVAL;
>  }
> +
> +int rte_pmd_i40e_set_vf_vlan_insert(uint8_t port, uint16_t vf_id,
> + uint16_t vlan_id)
> +{
> + struct rte_eth_dev *dev;
> + struct rte_eth_dev_info dev_info;
> + struct i40e_pf *pf;
> + struct i40e_pf_vf *vf;
> + struct i40e_hw *hw;
> + struct i40e_vsi *vsi;
> + struct i40e_vsi_context ctxt;
> + int ret;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> +
> + dev = &rte_eth_devices[port];
> + rte_eth_dev_info_get(port, &dev_info);
> +
> + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> + hw = I40E_PF_TO_HW(pf);
> +
> + /**
> +  * return -ENODEV if SRIOV not enabled, VF number not configured
> +  * or no queue assigned.
> +  */
> + if (!hw->func_caps.sr_iov_1_1 || pf->vf_num == 0 ||
> + pf->vf_nb_qps == 0)
> + return -ENODEV;
> +
> + if (vf_id > pf->vf_num)

This check was [1] in prev patches:
[1]
if (vf_id > pf->vf_num - 1 || !pf->vfs)

> + return -EINVAL;
> +
> + if (vlan_id > 4095)

Can there be any define in base driver for this? Or ETH_VLAN_ID_MAX perhaps?

> + return -EINVAL;
> +
> + vf = &pf->vfs[vf_id];
> + vsi = vf->vsi;
> +
> + vsi->info.valid_sections = cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID);
> + vsi->info.port_vlan_flags |= I40E_AQ_VSI_PVLAN_INSERT_PVID;
> + vsi->info.pvid = vlan_id;
> +

--->
> + memset(&ctxt, 0, sizeof(ctxt));
> + (void)rte_memcpy(&ctxt.info, &vsi->info, sizeof(vsi->info));
> + ctxt.seid = vsi->seid;
> +
> + hw = I40E_VSI_TO_HW(vsi);
> + ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> + if (ret != I40E_SUCCESS)
> + PMD_DRV_LOG(ERR, "Failed to update VSI params");
<-

If Wenzhuo prefers to extract this part into a function, it can be
re-used here too.

<...>



Re: [dpdk-dev] [PATCH v2 12/32] net/i40e: set VF MAC from PF support

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> Support setting VF MAC address from PF.
> User can call the API on PF to set a specific
> VF's MAC address.
> 
> This will reset the VF.
> 
> Signed-off-by: Ferruh Yigit 
> ---

<...>

>  
> +/**
> + * Set the VF MAC address.
> + *
> + * This will reset the vf.

It may be good if I add a comment that this also will remove all
existing mac filters. Same to commit log perhaps.

> + *
> + * @param port
> + *   The port identifier of the Ethernet device.
> + * @param vf_id
> + *   VF id.
> + * @param mac_addr
> + *   VF MAC address.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if *port* invalid.
> + *   - (-EINVAL) if *vf* or *mac_addr* is invalid.
> + */
> +int rte_pmd_i40e_set_vf_mac_addr(uint8_t port, uint16_t vf_id,
> +  struct ether_addr *mac_addr);
> +

<...>



Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-07 Thread Alejandro Lucero
For NFP, we do not have TSO support yet, although it is coming and
hopefully it will be within next release.

Regarding this email thread, it is "it is OK, we do not need any checksum
preparation for TSO"

On Wed, Dec 7, 2016 at 10:03 AM, Ananyev, Konstantin <
konstantin.anan...@intel.com> wrote:

>
> Hi Ferruh,
>
> >
> > On 12/6/2016 6:25 PM, Yong Wang wrote:
> > >> -Original Message-
> > >> From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> > >> Sent: Sunday, December 4, 2016 4:11 AM
> > >> To: Yong Wang ; Thomas Monjalon
> > >> 
> > >> Cc: Harish Patil ; dev@dpdk.org; Rahul
> Lakkireddy
> > >> ; Stephen Hurd
> > >> ; Jan Medala ; Jakub
> > >> Palider ; John Daley ; Adrien
> > >> Mazarguil ; Alejandro Lucero
> > >> ; Rasesh Mody
> > >> ; Jacob, Jerin ;
> > >> Yuanhan Liu ; Kulasek, TomaszX
> > >> ; olivier.m...@6wind.com
> > >> Subject: RE: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> > >>
> > >> Hi
> > >>
> > >>
> > >>
> > 
> > >>
> >  2016-11-30 17:42, Ananyev, Konstantin:
> > >>
> > >>> Please, we need a comment for each driver saying
> > >>
> > >>> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >>> or
> > >>
> > >>> "yes we have to implement tx_prepare or TSO will not work in this
> > >>
> >  mode"
> > >>
> > >>>
> > >>
> > >>
> > >>
> > >> qede PMD doesn’t currently support TSO yet, it only supports Tx
> > >>
> >  TCP/UDP/IP
> > >>
> > >> csum offloads.
> > >>
> > >> So Tx preparation isn’t applicable. So as of now -
> > >>
> > >> "it is OK, we do not need any checksum preparation for TSO"
> > >>
> > >
> > >>
> > > Thanks for the answer.
> > >>
> > > Though please note that it not only for TSO.
> > >>
> > 
> > >>
> >  Oh yes, sorry, my wording was incorrect.
> > >>
> >  We need to know if any checksum preparation is needed prior
> > >>
> >  offloading its final computation to the hardware or driver.
> > >>
> >  So the question applies to TSO and simple checksum offload.
> > >>
> > 
> > >>
> >  We are still waiting answers for
> > >>
> >   bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.
> > >>
> > >>>
> > >>
> > >>> The case for a virtual device is a little bit more complicated as
> packets
> > >> offloaded from a virtual device can eventually be delivered to
> > >>
> > >>> another virtual NIC or different physical NICs that have different
> offload
> > >> requirements.  In ESX, the hypervisor will enforce that the packets
> > >>
> > >>> offloaded will be something that the hardware expects.  The contract
> for
> > >> vmxnet3 is that the guest needs to fill in pseudo header checksum
> > >>
> > >>> for both l4 checksum only and TSO + l4 checksum offload cases.
> > >>
> > >>
> > >>
> > >> Ok, so at first glance that looks to me very similar to Intel HW
> requirements.
> > >>
> > >> Could you confirm would rte_net_intel_cksum_prepare()
> > >>
> > >> also work for vmxnet3 or some extra modifications are required?
> > >>
> > >> You can look at it here: https://urldefense.proofpoint.
> com/v2/url?u=http-
> > >> 3A__dpdk.org_dev_patchwork_patch_17184_&d=DgIGaQ&c=uilaK90D4TOV
> > >> oH58JNXRgQ&r=v4BBYIqiDq552fkYnKKFBFyqvMXOR3UXSdFO2plFD1s&m=NS
> > >> 4zOl2je_tyGhnOJMSnu37HmJxOZf-1KLYcVsu8iYY&s=dL-NOC-
> > >> 18HclXUURQzuyW5Udw4NY13pKMndYvfgCfbA&e= .
> > >>
> > >> Note that for Intel HW the rules for pseudo-header csum calculation
> > >>
> > >> differ for TSO and non-TSO case.
> > >>
> > >> For TSO length inside pseudo-header are set to 0, while for non-tso
> case
> > >>
> > >> It should be set to L3 payload length.
> > >>
> > >> Is it the same for vmxnet3 or no?
> > >>
> > >> Thanks
> > >>
> > >> Konstantin
> > >>
> > >
> > > Yes and this is the same for vmxnet3.
> > >
> >
> > This means vmxnet3 PMD also should be updated, right?
>
> Yes, that's right.
>
> >Should that update
> > be part of tx_prep patchset? Or separate patch?
>
> Another question I suppose is who will do the actual patch for vmxnet3.
> Yong, are you ok to do the patch for vmxnet3, or prefer us to do that?
> Please note, that in both cases will need your help in testing/reviewing
> it.
> Konstantin
>
> >
> > >>>
> > >>
> > > This is for any TX offload for which the upper layer SW would have
> > >>
> > > to modify the contents of the packet.
> > >>
> > > Though as I can see for qede neither PKT_TX_IP_CKSUM or
> > >>
> >  PKT_TX_TCP_CKSUM
> > >>
> > > exhibits any extra requirements for the user.
> > >>
> > > Is that correct?
> > >>
> > >>
> > >
>
>


Re: [dpdk-dev] [PATCH v2 17/32] net/i40e: set VF broadcast mode from PF

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> Support enabling/disabling VF broadcast mode from PF.
> User can call the API on PF to enable/disable a specific
> VF's broadcast mode.
> 
> Signed-off-by: Bernard Iremonger 

<...>

> +int rte_pmd_i40e_set_vf_broadcast(uint8_t port, uint16_t vf_id, uint8_t on)
> +{
> + struct rte_eth_dev *dev;
> + struct rte_eth_dev_info dev_info;
> + struct i40e_pf *pf;
> + struct i40e_pf_vf *vf;
> + struct i40e_hw *hw;
> + int ret;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> +
> + dev = &rte_eth_devices[port];
> + rte_eth_dev_info_get(port, &dev_info);
> +
> + if (vf_id >= dev_info.max_vfs)
> + return -EINVAL;
> +
> + if (on > 1)
> + return -EINVAL;
> +
> + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> + hw = I40E_PF_TO_HW(pf);
> +
> + if (vf_id > pf->vf_num)

if (vf_id > pf->vf_num - 1 || !pf->vfs)

> + return -EINVAL;
> +

<...>


Re: [dpdk-dev] [PATCH v2 16/32] net/i40e: add set VF VLAN insert function

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 2:26 PM, Ferruh Yigit wrote:
> On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
>> Support inserting VF VLAN id from PF.
>> User can call the API on PF to insert a VLAN id to a
>> specific VF.
> 
> Same comment with prev patch, does it make sense to insert " from PF" to
> patch title?
> 
>>
>> Signed-off-by: Bernard Iremonger 
>> ---
>>  drivers/net/i40e/i40e_ethdev.c| 53 
>> +++
>>  drivers/net/i40e/rte_pmd_i40e.h   | 19 +++
>>  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
>>  3 files changed, 73 insertions(+)
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>> index 253209b..c571d8b 100644
>> --- a/drivers/net/i40e/i40e_ethdev.c
>> +++ b/drivers/net/i40e/i40e_ethdev.c
>> @@ -10361,3 +10361,56 @@ static void i40e_set_default_mac_addr(struct 
>> rte_eth_dev *dev,
>>  else
>>  return -EINVAL;
>>  }
>> +
>> +int rte_pmd_i40e_set_vf_vlan_insert(uint8_t port, uint16_t vf_id,
>> +uint16_t vlan_id)
>> +{
>> +struct rte_eth_dev *dev;
>> +struct rte_eth_dev_info dev_info;
>> +struct i40e_pf *pf;
>> +struct i40e_pf_vf *vf;
>> +struct i40e_hw *hw;
>> +struct i40e_vsi *vsi;
>> +struct i40e_vsi_context ctxt;
>> +int ret;
>> +
>> +RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
>> +
>> +dev = &rte_eth_devices[port];
>> +rte_eth_dev_info_get(port, &dev_info);
>> +
>> +pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>> +hw = I40E_PF_TO_HW(pf);
>> +
>> +/**
>> + * return -ENODEV if SRIOV not enabled, VF number not configured
>> + * or no queue assigned.
>> + */
>> +if (!hw->func_caps.sr_iov_1_1 || pf->vf_num == 0 ||
>> +pf->vf_nb_qps == 0)
>> +return -ENODEV;
>> +
>> +if (vf_id > pf->vf_num)
> 
> This check was [1] in prev patches:
> [1]
>   if (vf_id > pf->vf_num - 1 || !pf->vfs)
> 
>> +return -EINVAL;
>> +
>> +if (vlan_id > 4095)
> 
> Can there be any define in base driver for this? Or ETH_VLAN_ID_MAX perhaps?

Answer was in next patches, it seems we have two options (which is bad)

lib/librte_net/rte_ether.h
#define ETHER_MAX_VLAN_ID  4095 /**< Maximum VLAN ID. */

lib/librte_ether/rte_ethdev.h
#define ETH_VLAN_ID_MAX   0x0FFF /**< VLAN ID is in lower 12 bits*/

> 
>> +return -EINVAL;
>> +
>> +vf = &pf->vfs[vf_id];
>> +vsi = vf->vsi;
>> +
>> +vsi->info.valid_sections = cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID);
>> +vsi->info.port_vlan_flags |= I40E_AQ_VSI_PVLAN_INSERT_PVID;
>> +vsi->info.pvid = vlan_id;
>> +
> 
> --->
>> +memset(&ctxt, 0, sizeof(ctxt));
>> +(void)rte_memcpy(&ctxt.info, &vsi->info, sizeof(vsi->info));
>> +ctxt.seid = vsi->seid;
>> +
>> +hw = I40E_VSI_TO_HW(vsi);
>> +ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
>> +if (ret != I40E_SUCCESS)
>> +PMD_DRV_LOG(ERR, "Failed to update VSI params");
> <-
> 
> If Wenzhuo prefers to extract this part into a function, it can be
> re-used here too.
> 
> <...>
> 



Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto pmd

2016-12-07 Thread Richardson, Bruce


> -Original Message-
> From: Neil Horman [mailto:nhor...@tuxdriver.com]
> Sent: Wednesday, December 7, 2016 2:17 PM
> To: Doherty, Declan 
> Cc: Richardson, Bruce ; Thomas Monjalon
> ; Zhang, Roy Fan ;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto
> pmd
> 
> On Wed, Dec 07, 2016 at 12:42:15PM +, Declan Doherty wrote:
> > On 05/12/16 15:12, Neil Horman wrote:
> > > On Fri, Dec 02, 2016 at 04:22:16PM +, Declan Doherty wrote:
> > > > On 02/12/16 14:57, Bruce Richardson wrote:
> > > > > On Fri, Dec 02, 2016 at 03:31:24PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-12-02 14:15, Fan Zhang:
> > > > > > > This patch provides the initial implementation of the
> > > > > > > scheduler poll mode driver using DPDK cryptodev framework.
> > > > > > >
> > > > > > > Scheduler PMD is used to schedule and enqueue the crypto ops
> > > > > > > to the hardware and/or software crypto devices attached to
> > > > > > > it (slaves). The dequeue operation from the slave(s), and
> > > > > > > the possible dequeued crypto op reordering, are then carried
> out by the scheduler.
> > > > > > >
> > > > > > > The scheduler PMD can be used to fill the throughput gap
> > > > > > > between the physical core and the existing cryptodevs to
> > > > > > > increase the overall performance. For example, if a physical
> > > > > > > core has higher crypto op processing rate than a cryptodev,
> > > > > > > the scheduler PMD can be introduced to attach more than one
> cryptodevs.
> > > > > > >
> > > > > > > This initial implementation is limited to supporting the
> > > > > > > following scheduling modes:
> > > > > > >
> > > > > > > - CRYPTO_SCHED_SW_ROUND_ROBIN_MODE (round robin amongst
> attached software
> > > > > > > slave cryptodevs, to set this mode, the scheduler should
> have been
> > > > > > > attached 1 or more software cryptodevs.
> > > > > > >
> > > > > > > - CRYPTO_SCHED_HW_ROUND_ROBIN_MODE (round robin amongst
> attached hardware
> > > > > > > slave cryptodevs (QAT), to set this mode, the scheduler
> should have
> > > > > > > been attached 1 or more QATs.
> > > > > >
> > > > > > Could it be implemented on top of the eventdev API?
> > > > > >
> > > > > Not really. The eventdev API is for different types of
> > > > > scheduling between multiple sources that are all polling for
> > > > > packets, compared to this, which is more analgous - as I
> > > > > understand it - to the bonding PMD for ethdev.
> > > > >
> > > > > To make something like this work with an eventdev API you would
> > > > > need to use one of the following models:
> > > > > * have worker cores for offloading packets to the different crypto
> > > > >   blocks pulling from the eventdev APIs. This would make it
> difficult to
> > > > >   do any "smart" scheduling of crypto operations between the
> blocks,
> > > > >   e.g. that one crypto instance may be better at certain types of
> > > > >   operations than another.
> > > > > * move the logic in this driver into an existing eventdev
> instance,
> > > > >   which uses the eventdev api rather than the crypto APIs and so
> has an
> > > > >   extra level of "structure abstraction" that has to be worked
> though.
> > > > >   It's just not really a good fit.
> > > > >
> > > > > So for this workload, I believe the pseudo-cryptodev instance is
> > > > > the best way to go.
> > > > >
> > > > > /Bruce
> > > > >
> > > >
> > > >
> > > > As Bruce says this is much more analogous to the ethdev bonding
> > > > driver, the main idea is to allow different crypto op scheduling
> > > > mechanisms to be defined transparently to an application. This
> > > > could be load-balancing across multiple hw crypto devices, or
> > > > having a software crypto device to act as a backup device for a hw
> > > > accelerator if it becomes oversubscribed. I think the main
> > > > advantage of a crypto-scheduler approach means that the data path
> > > > of the application doesn't need to have any knowledge that
> > > > scheduling is happening at all, it is just using a different crypto
> device id, which is then manages the distribution of crypto work.
> > > >
> > > >
> > > >
> > > This is a good deal like the bonding pmd, and so from a certain
> > > standpoint it makes sense to do this, but whereas the bonding pmd is
> > > meant to create a single path to a logical network over several
> > > physical networks, this pmd really only focuses on maximizing
> > > througput, and for that we already have tools.  As Thomas mentions,
> > > there is the eventdev library, but from my view the distributor
> > > library already fits this bill.  It already is a basic framework to
> > > process mbufs in parallel according to whatever policy you want to
> implement, which sounds like exactly what the goal of this pmd is.
> > >
> > > Neil
> > >
> > >
> >
> > Hey Neil,
> >
> > this is actually intended to act and look a good deal like the
> > ethernet bonding device but to handling the crypto schedu

Re: [dpdk-dev] [PATCH v2 19/32] net/i40e: set VF VLAN filter from PF

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> add rte_pmd_i40e_set_vf_vlan_filter API.
> User can call the API on PF to enable/disable
> a set of VF's VLAN filters.
> 
> Signed-off-by: Bernard Iremonger 
> ---
>  drivers/net/i40e/i40e_ethdev.c| 50 
> +++
>  drivers/net/i40e/rte_pmd_i40e.h   | 22 ++
>  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
>  3 files changed, 73 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 601e933..bc96698 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -10516,3 +10516,53 @@ int rte_pmd_i40e_set_vf_vlan_tag(uint8_t port, 
> uint16_t vf_id, uint8_t on)
>  
>   return ret;
>  }
> +
> +int rte_pmd_i40e_set_vf_vlan_filter(uint8_t port, uint16_t vlan_id,
> + uint64_t vf_mask, uint8_t on)
> +{
<...>
> +
> + for (pool_idx = 0;
> +  pool_idx < ETH_64_POOLS && pool_idx < pf->nb_cfg_vmdq_vsi;
> +  pool_idx++) {
> + if (vf_mask & ((uint64_t)(1ULL << pool_idx))) {
> + if (on)
> + ret = i40e_vsi_add_vlan(pf->vmdq[pool_idx].vsi,
> + vlan_id);
> + else
> + ret = i40e_vsi_delete_vlan(
> + pf->vmdq[pool_idx].vsi, vlan_id);
> + }
> + }
> +
> + if (ret != I40E_SUCCESS)
> + PMD_DRV_LOG(ERR, "Failed to set VF VLAN filter, on = %d", on);

Since loop not break on error, this will only log the last one, if the
error is in the middle, it is missed.

<...>


Re: [dpdk-dev] [PATCH] Force python scripts to run with python2

2016-12-07 Thread Thomas Monjalon
2016-12-07 12:04, Mcnamara, John:
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Martin Kletzander
> > Sent: Wednesday, December 7, 2016 10:17 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] Force python scripts to run with python2
> > 
> > With python3 being the default in some distributions/installations,
> > shebang with just "python" will make the script run under version of
> > python that the scripts are not written for.  In order to fix that and
> > mitigate future errors, use shebang properly and specify the python
> > version.  That way the scripts will run in any distro/install, no matter
> > what python version is set as the default.
> 
> I think a better approach would be to make the scripts Python 2 and Python 3 
> compatible.
> 
> Some of the new ones already are.

Yes

The "solution" using python2 in the shebang does not work everywhere
because python2 can be an unknown command.


Re: [dpdk-dev] [PATCH v2 03/12] crypto/armv8: Add core crypto operations for ARMv8

2016-12-07 Thread Thomas Monjalon
2016-12-07 04:54, Jerin Jacob:
> On Tue, Dec 06, 2016 at 02:41:01PM -0800, Thomas Monjalon wrote:
> > 2016-12-07 03:35, Jerin Jacob:
> > > On Tue, Dec 06, 2016 at 10:42:51PM +0100, Thomas Monjalon wrote:
> > > > 2016-12-07 02:48, Jerin Jacob:
> > > > > On Tue, Dec 06, 2016 at 09:29:25PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-12-06 18:32, zbigniew.bo...@caviumnetworks.com:
> > > > > > > From: Zbigniew Bodek 
> > > > > > > 
> > > > > > > This patch adds core low-level crypto operations
> > > > > > > for ARMv8 processors. The assembly code is a base
> > > > > > > for an optimized PMD and is currently excluded
> > > > > > > from the build.
> > > > > > 
> > > > > > It's a bit sad that you cannot achieve the same performance with
> > > > > > C code and a good compiler.
> > > > > > Have you tried it? How much is the difference?
> > > > > 
> > > > > Like AES-NI on IA side(exposed as separate PMD in dpdk),
> > > > > armv8 has special dedicated instructions for crypto operation using 
> > > > > SIMD.
> > > > > This patch is using the "dedicated" armv8 crypto instructions and SIMD
> > > > > operation to achieve better performance.
> > > > 
> > > > It does not justify to have all the code in asm.
> > > 
> > > Why ? if we can have separate dpdk pmd for AES-NI on IA . Why not for ARM?
> > 
> > Jerin, you or me is not understanding the other.
> > It is perfectly fine to have a separate PMD.
> > I am just talking about the language C vs ASM.
> 
> Hmm. Both are bit connected topic :-)
> 
> If you check the AES-NI PMD installation guide, We need to download the
> "ASM" optimized AES-NI library and build with yasm.
> We all uses fine grained ASM code such work.
> So AES-NI case those are still ASM code but reside in some other
> library.

Yes

> http://dpdk.org/doc/guides/cryptodevs/aesni_mb.html(Check Installation 
> section)
> https://downloadcenter.intel.com/download/22972
> 
> Even linux kernel use, hardcore ASM for crypto work.
> https://github.com/torvalds/linux/blob/master/arch/arm/crypto/aes-ce-core.S

Yes

> > > > > We had compared with openssl implementation.Here is the performance
> > > > > improvement for chained crypto operations case WRT openssl pmd
> > > > > 
> > > > > Buffer
> > > > > Size(B)   OPS(M)  Throughput(Gbps)
> > > > > 64729 %742 %
> > > > > 128   577 %592 %
> > > > > 256   483 %476 %
> > > > > 512   336 %351 %
> > > > > 768   300 %286 %
> > > > > 1024  263 %250 %
> > > > > 1280  225 %229 %
> > > > > 1536  214 %213 %
> > > > > 1792  186 %203 %
> > > > > 2048  200 %193 %
> > > > 
> > > > OK but what is the performance difference between this asm code
> > > > and a C equivalent?
> > > 
> > > Do you you want compare against the scalar version of C code? its not
> > > even worth to think about it. The vector version will use
> > > dedicated armv8 instruction for crypto so its not portable anyway.
> > > We would like to asm code so that we can have better control on what we do
> > > and we cant rely compiler for that.
> > 
> > No I'm talking about comparing a PMD written in C vs this one in ASM.
> 
> Only fast stuff written in ASM. Remaining pmd is written in C.
> Look  "crypto/armv8: add PMD optimized for ARMv8 processors"
> 
> > It"s just harder to read ASM. Most of DPDK code is in C.
> > And only some small functions are written in ASM.
> > The vector instructions use some C intrinsics.
> > Do you mean that the instructions that you are using have no intrinsics
> > equivalent? Nobody made it into GCC?
> 
> There is intrinsic equivalent for crypto but that will work only on
> armv8. If we start using the arch specific intrinsic then it better to
> plain ASM code, it is clean and we all do similar scheme for core crypto
> work(like AES-NI library, linux etc)
> 
> We did a lot of effort to make clean armv8 ASM code _optimized_ for DPDK 
> workload.
> Just because someone doesn't familiar with armv8 Assembly its not fair to
> say write it in C.

I'm just saying it is sad, as it is sad for AES-NI or Linux code.
Please read again my questions:
Have you tried it? How much is the difference?

I'm not saying it should not enter in DPDK, I'm just asking some basic
questions to better understand the motivations and the status of ARM crypto
in general.
You did not answer for comparing with a C implementation, so I guess you
have implemented it in ASM without even trying to do it in C.
The conclusion: we will never know what is the real gain of coding this in ASM.


Re: [dpdk-dev] [PATCH v2 22/32] app/testpmd: use multicast promiscuous mode on i40e

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> Add testpmd CLI to set VF multicast promiscuous mode on i40e.
> 
> Signed-off-by: Wenzhuo Lu 
> ---
>  app/test-pmd/cmdline.c  | 86 
> +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  8 +++
>  2 files changed, 94 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index d39712e..7e7a016 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -407,6 +407,9 @@ static void cmd_help_long_parsed(void *parsed_result,
>  #ifdef RTE_LIBRTE_I40E_PMD
>   "set vf unicast-promisc (port_id) (vf_id) (on|off)\n"
>   "Set unicast promiscuous mode for a VF from the 
> PF.\n\n"
> +
> + "set vf multicast-promisc (port_id) (vf_id) (on|off)\n"
> + "Set multicast promiscuous mode for a VF from the 
> PF.\n\n"

Why not "allmulti" instead of multicast-promisc?

Also same comments as previous patch for help_str and documentation.

<...>


Re: [dpdk-dev] [PATCH v2 26/32] app/testpmd: initialize receive mode for VMDq

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> Initialise VMDq in the init_port_config function in a similar
> way to how it is done in the VMDq sample application.

What is the effect of doing existing initialization?

> 
> Signed-off-by: Bernard Iremonger 
> ---

<...>


Re: [dpdk-dev] [PATCH v2 27/32] net/i40e: change version number to support Linux VF

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> i40e PF host only support to work with DPDK VF driver, Linux
> VF driver is not supported. This change will enhance in version
> number returned.
> 
> Current version info returned won't be able to be recognized
> by Linux VF driver, change to values that both DPDK VF and Linux
> driver can recognize.
> 
> The expense is original DPDK host specific feature like
> CFG_VLAN_PVID and CONFIG_VSI_QUEUES_EXT will not available.
> 
> DPDK VF also can't identify host driver by version number returned.
> It always assume talking with Linux PF.

I guess you mention from following code [1], should it be also updated
to prevent it giving wrong information:

[1] i40e_ethdev_vf.c
if (vf->version_major == I40E_DPDK_VERSION_MAJOR)
PMD_DRV_LOG(INFO, "Peer is DPDK PF host");
else if ((vf->version_major == I40E_VIRTCHNL_VERSION_MAJOR) &&
(vf->version_minor <= I40E_VIRTCHNL_VERSION_MINOR))
PMD_DRV_LOG(INFO, "Peer is Linux PF host");
else {

> 
> Signed-off-by: Chen Jing D(Mark) 
> ---

<...>


Re: [dpdk-dev] [PATCH v2 28/32] net/i40e: return correct vsi_id

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> PF host didn't return correct VSI id to VF.
> This change fix it.

This looks like a fix for current code,
can you please update commit title and log to reflect the fix?

> 
> Signed-off-by: Chen Jing D(Mark) 
> ---
>  drivers/net/i40e/i40e_pf.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
> index 0f582ed..8319c2c 100644
> --- a/drivers/net/i40e/i40e_pf.c
> +++ b/drivers/net/i40e/i40e_pf.c
> @@ -351,8 +351,7 @@
>  
>   /* Change below setting if PF host can support more VSIs for VF */
>   vf_res->vsi_res[0].vsi_type = I40E_VSI_SRIOV;
> - /* As assume Vf only has single VSI now, always return 0 */
> - vf_res->vsi_res[0].vsi_id = 0;
> + vf_res->vsi_res[0].vsi_id = vf->vsi->vsi_id;
>   vf_res->vsi_res[0].num_queue_pairs = vf->vsi->nb_qps;
>   ether_addr_copy(&vf->mac_addr,
>   (struct ether_addr *)vf_res->vsi_res[0].default_mac_addr);
> 



Re: [dpdk-dev] [PATCH v2 21/32] app/testpmd: use unicast promiscuous mode on i40e

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> Add testpmd CLI to set VF unicast promiscuous mode on i40e.
> 
> Signed-off-by: Wenzhuo Lu 
> ---
>  app/test-pmd/cmdline.c  | 92 
> +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  8 +++
>  2 files changed, 100 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 12126ce..d39712e 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -404,6 +404,11 @@ static void cmd_help_long_parsed(void *parsed_result,
>   "set allmulti (port_id|all) (on|off)\n"
>   "Set the allmulti mode on port_id, or all.\n\n"
>  
> +#ifdef RTE_LIBRTE_I40E_PMD
> + "set vf unicast-promisc (port_id) (vf_id) (on|off)\n"

Previous usages are all "promisc" instead of "unicals-promisc". Is this
to promisc mode for multicast packets? If so testpmd calls them
"allmulti" I guess, so they won't cause trouble.

Can we keep using command: "promisc"?

<...>

> +
> +cmdline_parse_inst_t cmd_set_vf_unicast_promisc = {
> + .f = cmd_set_vf_unicast_promisc_parsed,
> + .data = NULL,
> + .help_str = "set vf unicast promiscuous port_id vf_id on|off",

Can you please differentiate the keyword and variable by wrapping
variables with <>? Like:
"set vf unicast-promiscuous   on|off"

<...>

>  
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index f1c269a..e17e3d5 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -820,6 +820,14 @@ Set the allmulti mode for a port or for all ports::
>  
>  Same as the ifconfig (8) option. Controls how multicast packets are handled.
>  
> +set unicast promisc (for VF)
> +

Should we mention this is PMD specific feature and only enabled with
some PMDs?

> +
> +Set the unicast promiscuous mode for a VF from PF.
> +In promiscuous mode packets are not dropped if they aren't for the specified 
> MAC address::
> +
> +   testpmd> set vf unicast-promisc (port_id) (vf_id) (on|off)
> +
>  set flow_ctrl rx
>  
>  
> 



Re: [dpdk-dev] [PATCH v2 29/32] net/i40e: parse more VF parameter and configure

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> When VF requested to configure TX queue, a few parameters are
> missed to be configured in PF host. This change have more
> fields parsed and configured for TX context.

What is the effect of missing Tx paramters configured?

If this cause a bug, this patch should be a fix.

> 
> Signed-off-by: Chen Jing D(Mark) 
> ---
>  drivers/net/i40e/i40e_pf.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
> index 8319c2c..1ad5ed1 100644
> --- a/drivers/net/i40e/i40e_pf.c
> +++ b/drivers/net/i40e/i40e_pf.c
> @@ -422,10 +422,12 @@
>  
>   /* clear the context structure first */
>   memset(&tx_ctx, 0, sizeof(tx_ctx));
> - tx_ctx.new_context = 1;
>   tx_ctx.base = txq->dma_ring_addr / I40E_QUEUE_BASE_ADDR_UNIT;
>   tx_ctx.qlen = txq->ring_len;
>   tx_ctx.rdylist = rte_le_to_cpu_16(vf->vsi->info.qs_handle[0]);
> + tx_ctx.head_wb_ena = txq->headwb_enabled;
> + tx_ctx.head_wb_addr = txq->dma_headwb_addr;
> +
>   err = i40e_clear_lan_tx_queue_context(hw, abs_queue_id);
>   if (err != I40E_SUCCESS)
>   return err;
> 



Re: [dpdk-dev] [PATCH v2 29/32] net/i40e: parse more VF parameter and configure

2016-12-07 Thread Chen, Jing D
Hi, Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 11:19 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Cc: Chen, Jing D 
> Subject: Re: [dpdk-dev] [PATCH v2 29/32] net/i40e: parse more VF parameter
> and configure
> 
> On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> > When VF requested to configure TX queue, a few parameters are missed
> > to be configured in PF host. This change have more fields parsed and
> > configured for TX context.
> 
> What is the effect of missing Tx paramters configured?
> 
> If this cause a bug, this patch should be a fix.
> 
This need to analyze case by case. If PF driver is serving a DPDK VF client, 
the missing part is OK. If serving a Linux VF client, the missing part will 
cause some unexpected TX queue behaviors. 

So, this is an enhancement to support Linux VF client. 

> >
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> >  drivers/net/i40e/i40e_pf.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
> > index 8319c2c..1ad5ed1 100644
> > --- a/drivers/net/i40e/i40e_pf.c
> > +++ b/drivers/net/i40e/i40e_pf.c
> > @@ -422,10 +422,12 @@
> >
> > /* clear the context structure first */
> > memset(&tx_ctx, 0, sizeof(tx_ctx));
> > -   tx_ctx.new_context = 1;
> > tx_ctx.base = txq->dma_ring_addr / I40E_QUEUE_BASE_ADDR_UNIT;
> > tx_ctx.qlen = txq->ring_len;
> > tx_ctx.rdylist = rte_le_to_cpu_16(vf->vsi->info.qs_handle[0]);
> > +   tx_ctx.head_wb_ena = txq->headwb_enabled;
> > +   tx_ctx.head_wb_addr = txq->dma_headwb_addr;
> > +
> > err = i40e_clear_lan_tx_queue_context(hw, abs_queue_id);
> > if (err != I40E_SUCCESS)
> > return err;
> >



Re: [dpdk-dev] [PATCH v2 28/32] net/i40e: return correct vsi_id

2016-12-07 Thread Chen, Jing D
Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 11:16 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Cc: Chen, Jing D 
> Subject: Re: [dpdk-dev] [PATCH v2 28/32] net/i40e: return correct vsi_id
> 
> On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> > PF host didn't return correct VSI id to VF.
> > This change fix it.
> 
> This looks like a fix for current code,
> can you please update commit title and log to reflect the fix?
> 

This is similar patch to support Linux VF client. DPDK VF client needn't
Vsi ID.

> >
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> >  drivers/net/i40e/i40e_pf.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_pf.c b/drivers/net/i40e/i40e_pf.c
> > index 0f582ed..8319c2c 100644
> > --- a/drivers/net/i40e/i40e_pf.c
> > +++ b/drivers/net/i40e/i40e_pf.c
> > @@ -351,8 +351,7 @@
> >
> > /* Change below setting if PF host can support more VSIs for VF */
> > vf_res->vsi_res[0].vsi_type = I40E_VSI_SRIOV;
> > -   /* As assume Vf only has single VSI now, always return 0 */
> > -   vf_res->vsi_res[0].vsi_id = 0;
> > +   vf_res->vsi_res[0].vsi_id = vf->vsi->vsi_id;
> > vf_res->vsi_res[0].num_queue_pairs = vf->vsi->nb_qps;
> > ether_addr_copy(&vf->mac_addr,
> > (struct ether_addr *)vf_res->vsi_res[0].default_mac_addr);
> >



Re: [dpdk-dev] [PATCH v2 32/32] app/testpmd: fix invalid port ID

2016-12-07 Thread Ferruh Yigit
On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> Some CLIs don't check the input port ID, it
> may cause segmentation fault (core dumped).

Are these functions (that we are adding extra check) added with this
patchset? If so why not add these checks where function implemented.

If these functions are already implemented before this patchset, this
patch can be standalone patch, instead of being part of this patchset,
and can be before this patchset so that it can be easily backported to
stable trees.

> 
> Fixes: 425781ff5afe ("app/testpmd: add ixgbe VF management")
> 
> Signed-off-by: Wenzhuo Lu 
> Signed-off-by: Chen Jing D(Mark) 
> ---

<...>



Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto pmd

2016-12-07 Thread Declan Doherty

On 07/12/16 14:46, Richardson, Bruce wrote:




-Original Message-
From: Neil Horman [mailto:nhor...@tuxdriver.com]
Sent: Wednesday, December 7, 2016 2:17 PM
To: Doherty, Declan 
Cc: Richardson, Bruce ; Thomas Monjalon
; Zhang, Roy Fan ;
dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] Scheduler: add driver for scheduler crypto
pmd

On Wed, Dec 07, 2016 at 12:42:15PM +, Declan Doherty wrote:

On 05/12/16 15:12, Neil Horman wrote:

On Fri, Dec 02, 2016 at 04:22:16PM +, Declan Doherty wrote:

On 02/12/16 14:57, Bruce Richardson wrote:

On Fri, Dec 02, 2016 at 03:31:24PM +0100, Thomas Monjalon wrote:

2016-12-02 14:15, Fan Zhang:

This patch provides the initial implementation of the
scheduler poll mode driver using DPDK cryptodev framework.

Scheduler PMD is used to schedule and enqueue the crypto ops
to the hardware and/or software crypto devices attached to
it (slaves). The dequeue operation from the slave(s), and
the possible dequeued crypto op reordering, are then carried

out by the scheduler.


The scheduler PMD can be used to fill the throughput gap
between the physical core and the existing cryptodevs to
increase the overall performance. For example, if a physical
core has higher crypto op processing rate than a cryptodev,
the scheduler PMD can be introduced to attach more than one

cryptodevs.


This initial implementation is limited to supporting the
following scheduling modes:

- CRYPTO_SCHED_SW_ROUND_ROBIN_MODE (round robin amongst

attached software

slave cryptodevs, to set this mode, the scheduler should

have been

attached 1 or more software cryptodevs.

- CRYPTO_SCHED_HW_ROUND_ROBIN_MODE (round robin amongst

attached hardware

slave cryptodevs (QAT), to set this mode, the scheduler

should have

been attached 1 or more QATs.


Could it be implemented on top of the eventdev API?


Not really. The eventdev API is for different types of
scheduling between multiple sources that are all polling for
packets, compared to this, which is more analgous - as I
understand it - to the bonding PMD for ethdev.

To make something like this work with an eventdev API you would
need to use one of the following models:
* have worker cores for offloading packets to the different crypto
  blocks pulling from the eventdev APIs. This would make it

difficult to

  do any "smart" scheduling of crypto operations between the

blocks,

  e.g. that one crypto instance may be better at certain types of
  operations than another.
* move the logic in this driver into an existing eventdev

instance,

  which uses the eventdev api rather than the crypto APIs and so

has an

  extra level of "structure abstraction" that has to be worked

though.

  It's just not really a good fit.

So for this workload, I believe the pseudo-cryptodev instance is
the best way to go.

/Bruce




As Bruce says this is much more analogous to the ethdev bonding
driver, the main idea is to allow different crypto op scheduling
mechanisms to be defined transparently to an application. This
could be load-balancing across multiple hw crypto devices, or
having a software crypto device to act as a backup device for a hw
accelerator if it becomes oversubscribed. I think the main
advantage of a crypto-scheduler approach means that the data path
of the application doesn't need to have any knowledge that
scheduling is happening at all, it is just using a different crypto

device id, which is then manages the distribution of crypto work.





This is a good deal like the bonding pmd, and so from a certain
standpoint it makes sense to do this, but whereas the bonding pmd is
meant to create a single path to a logical network over several
physical networks, this pmd really only focuses on maximizing
througput, and for that we already have tools.  As Thomas mentions,
there is the eventdev library, but from my view the distributor
library already fits this bill.  It already is a basic framework to
process mbufs in parallel according to whatever policy you want to

implement, which sounds like exactly what the goal of this pmd is.


Neil




Hey Neil,

this is actually intended to act and look a good deal like the
ethernet bonding device but to handling the crypto scheduling use cases.

For example, take the case where multiple hw accelerators may be

available.

We want to provide user applications with a mechanism to transparently
balance work across all devices without having to manage the load
balancing details or the guaranteeing of ordering of the processed ops
on the dequeue_burst side. In this case the application would just use
the crypto dev_id of the scheduler and it would look after balancing
the workload across the available hw accelerators.


+---+
|  Crypto Sch PMD   |
|   |
| ORDERING / RR SCH |
+---+
^ ^ ^
| | |
  +-+ | +---+
  |   +---+ |
  |   | |
  V

Re: [dpdk-dev] [PATCH v2 03/12] crypto/armv8: Add core crypto operations for ARMv8

2016-12-07 Thread Jerin Jacob
On Wed, Dec 07, 2016 at 04:00:07PM +0100, Thomas Monjalon wrote:
> 2016-12-07 04:54, Jerin Jacob:
> > On Tue, Dec 06, 2016 at 02:41:01PM -0800, Thomas Monjalon wrote:
> > > 2016-12-07 03:35, Jerin Jacob:
> > > > On Tue, Dec 06, 2016 at 10:42:51PM +0100, Thomas Monjalon wrote:
> > > > > 2016-12-07 02:48, Jerin Jacob:
> > > > > > On Tue, Dec 06, 2016 at 09:29:25PM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-12-06 18:32, zbigniew.bo...@caviumnetworks.com:
> > > > > > > > From: Zbigniew Bodek 
> > > > > > > > 
> > > > > > > > This patch adds core low-level crypto operations
> > > > > > > > for ARMv8 processors. The assembly code is a base
> > > > > > > > for an optimized PMD and is currently excluded
> > > > > > > > from the build.
> > > > > > > 
> > > > > > > It's a bit sad that you cannot achieve the same performance with
> > > > > > > C code and a good compiler.
> > > > > > > Have you tried it? How much is the difference?
> > > > > > 
> > > > > > Like AES-NI on IA side(exposed as separate PMD in dpdk),
> > > > > > armv8 has special dedicated instructions for crypto operation using 
> > > > > > SIMD.
> > > > > > This patch is using the "dedicated" armv8 crypto instructions and 
> > > > > > SIMD
> > > > > > operation to achieve better performance.
> > > > > 
> > > > > It does not justify to have all the code in asm.
> > > > 
> > > > Why ? if we can have separate dpdk pmd for AES-NI on IA . Why not for 
> > > > ARM?
> > > 
> > > Jerin, you or me is not understanding the other.
> > > It is perfectly fine to have a separate PMD.
> > > I am just talking about the language C vs ASM.
> > 
> > Hmm. Both are bit connected topic :-)
> > 
> > If you check the AES-NI PMD installation guide, We need to download the
> > "ASM" optimized AES-NI library and build with yasm.
> > We all uses fine grained ASM code such work.
> > So AES-NI case those are still ASM code but reside in some other
> > library.
> 
> Yes
> 
> > http://dpdk.org/doc/guides/cryptodevs/aesni_mb.html(Check Installation 
> > section)
> > https://downloadcenter.intel.com/download/22972
> > 
> > Even linux kernel use, hardcore ASM for crypto work.
> > https://github.com/torvalds/linux/blob/master/arch/arm/crypto/aes-ce-core.S
> 
> Yes
> 
> > > > > > We had compared with openssl implementation.Here is the performance
> > > > > > improvement for chained crypto operations case WRT openssl pmd
> > > > > > 
> > > > > > Buffer
> > > > > > Size(B)   OPS(M)  Throughput(Gbps)
> > > > > > 64729 %742 %
> > > > > > 128   577 %592 %
> > > > > > 256   483 %476 %
> > > > > > 512   336 %351 %
> > > > > > 768   300 %286 %
> > > > > > 1024  263 %250 %
> > > > > > 1280  225 %229 %
> > > > > > 1536  214 %213 %
> > > > > > 1792  186 %203 %
> > > > > > 2048  200 %193 %
> > > > > 
> > > > > OK but what is the performance difference between this asm code
> > > > > and a C equivalent?
> > > > 
> > > > Do you you want compare against the scalar version of C code? its not
> > > > even worth to think about it. The vector version will use
> > > > dedicated armv8 instruction for crypto so its not portable anyway.
> > > > We would like to asm code so that we can have better control on what we 
> > > > do
> > > > and we cant rely compiler for that.
> > > 
> > > No I'm talking about comparing a PMD written in C vs this one in ASM.
> > 
> > Only fast stuff written in ASM. Remaining pmd is written in C.
> > Look  "crypto/armv8: add PMD optimized for ARMv8 processors"
> > 
> > > It"s just harder to read ASM. Most of DPDK code is in C.
> > > And only some small functions are written in ASM.
> > > The vector instructions use some C intrinsics.
> > > Do you mean that the instructions that you are using have no intrinsics
> > > equivalent? Nobody made it into GCC?
> > 
> > There is intrinsic equivalent for crypto but that will work only on
> > armv8. If we start using the arch specific intrinsic then it better to
> > plain ASM code, it is clean and we all do similar scheme for core crypto
> > work(like AES-NI library, linux etc)
> > 
> > We did a lot of effort to make clean armv8 ASM code _optimized_ for DPDK 
> > workload.
> > Just because someone doesn't familiar with armv8 Assembly its not fair to
> > say write it in C.
> 
> I'm just saying it is sad, as it is sad for AES-NI or Linux code.
> Please read again my questions:
> Have you tried it? How much is the difference?

We haven't tried due to following reasons,
1) It is a norm in the industry to write such things in ASM.So we have to do it 
anyway.
2) It really takes a lot of R&D cycles first to write it in C and then ASM. So
skipped the R&D part and moved to ASM directly as we need to write in
ASM anyway.

> I'm not saying it should not enter in DPDK, I'm just asking some basic
> questions to better understand the motivations and the status of ARM crypto
> in general.

OK

> You did not answer for comparing wit

Re: [dpdk-dev] [PATCH v2 26/32] app/testpmd: initialize receive mode for VMDq

2016-12-07 Thread Iremonger, Bernard
Hi Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 3:07 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Cc: Iremonger, Bernard 
> Subject: Re: [dpdk-dev] [PATCH v2 26/32] app/testpmd: initialize receive
> mode for VMDq
> 
> On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> > Initialise VMDq in the init_port_config function in a similar way to
> > how it is done in the VMDq sample application.
> 
> What is the effect of doing existing initialization?

This patch results in the function i40e_vmdq_setup() function being called  in 
i40e_ethdev.c at
Line 1307: ret = i40e_vmdq_setup(dev);

We are seeing unexpected side effects from this patch. 
It will be dropped from the v3 of this patch set, pending further investigation.

Regards,

Bernard.



Re: [dpdk-dev] [PATCH v2 3/6] eventdev: implement the northbound APIs

2016-12-07 Thread Jerin Jacob
On Tue, Dec 06, 2016 at 05:17:12PM +, Bruce Richardson wrote:
> On Tue, Dec 06, 2016 at 09:22:17AM +0530, Jerin Jacob wrote:
> > This patch implements northbound eventdev API interface using
> > southbond driver interface
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> > +   /* Re allocate memory to store queue priority */
> > +   queues_prio = dev->data->queues_prio;
> > +   queues_prio = rte_realloc(queues_prio,
> > +   sizeof(queues_prio[0]) * nb_queues,
> > +   RTE_CACHE_LINE_SIZE);
> > +   if (queues_prio == NULL) {
> > +   RTE_EDEV_LOG_ERR("failed to realloc queue priority,"
> > +   " nb_queues %u", nb_queues);
> > +   return -(ENOMEM);
> > +   }
> > +   dev->data->queues_prio = queues_prio;
> > +
> > +   if (nb_queues > old_nb_queues) {
> > +   uint8_t new_qs = nb_queues - old_nb_queues;
> > +
> > +   memset(queues + old_nb_queues, 0,
> > +   sizeof(queues[0]) * new_qs);
> > +   memset(queues_prio + old_nb_queues, 0,
> > +   sizeof(queues_prio[0]) * new_qs);
> > +   }
> > +   } else if (dev->data->queues != NULL && nb_queues == 0) {
> > +   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_release, -ENOTSUP);
> > +
> > +   queues = dev->data->queues;
> > +   for (i = nb_queues; i < old_nb_queues; i++)
> > +   (*dev->dev_ops->queue_release)(queues[i]);
> > +   }
> > +
> > +   dev->data->nb_queues = nb_queues;
> > +   return 0;
> > +}
> > +
> While the ports array makes sense to have available at the top level of
> the API and allocated from rte_eventdev.c, I'm not seeing what the value
> of having the queues allocated at that level is. The only time the queue
> array is indexed by eventdev layer is when releasing a queue. Therefore,
> I suggest just saving the number of queues for sanity checking and let
> the queue array allocation and freeing be handled entirely in the
> drivers themselves.

I thought it would be useful for other drivers. I agree, If something is not
common across all the driver lets remove it from common code.
I will remove it in v3

> 
> /Bruce


[dpdk-dev] [PATCH] net/ixgbe: fix typo in comment

2016-12-07 Thread Ferruh Yigit
Fixes: c03fcee9abbd ("ixgbe: remove CRC size from byte counters")

Signed-off-by: Ferruh Yigit 
---
CC: sta...@dpdk.org
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index edc9b22..4626d01 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2564,9 +2564,9 @@ ixgbe_read_stats_registers(struct ixgbe_hw *hw,
uint32_t delta_gprc = 0;
unsigned i;
/* Workaround for RX byte count not including CRC bytes when CRC
-+   * strip is enabled. CRC bytes are removed from counters when crc_strip
+* strip is enabled. CRC bytes are removed from counters when crc_strip
 * is disabled.
-+   */
+*/
int crc_strip = (IXGBE_READ_REG(hw, IXGBE_HLREG0) &
IXGBE_HLREG0_RXCRCSTRP);
 
-- 
2.9.3



Re: [dpdk-dev] [PATCH] config: remove insecure warnings

2016-12-07 Thread Thomas Monjalon
2016-12-05 10:27, Bruce Richardson:
> On Sun, Dec 04, 2016 at 11:17:06PM +0100, Thomas Monjalon wrote:
> > There was an option CONFIG_RTE_INSECURE_FUNCTION_WARNING (disabled by
> > default), which prevents from using some libc functions:
> > sprintf, snprintf, vsnprintf, strcpy, strncpy, strcat, strncat, sscanf,
> > strtok, strsep and strlen.
> > 
> > It's all about using them at the right place with the right precautions.
> > However, it is neither really possible nor a good advice to disable them.
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> Agreed. That option has been in DPDK a long time and I suspect is never
> used.
> 
> Acked-by: Bruce Richardson 

Applied


Re: [dpdk-dev] [PATCH v2 16/32] net/i40e: add set VF VLAN insert function

2016-12-07 Thread Iremonger, Bernard
Hi Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 2:26 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Cc: Iremonger, Bernard 
> Subject: Re: [dpdk-dev] [PATCH v2 16/32] net/i40e: add set VF VLAN insert
> function
> 
> On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> > Support inserting VF VLAN id from PF.
> > User can call the API on PF to insert a VLAN id to a specific VF.
> 
> Same comment with prev patch, does it make sense to insert " from PF" to
> patch title?

Consistency in the patch titles makes sense.
"set VF VLAN filter from PF"  might be better. 
 
> >
> > Signed-off-by: Bernard Iremonger 
> > ---
> >  drivers/net/i40e/i40e_ethdev.c| 53
> +++
> >  drivers/net/i40e/rte_pmd_i40e.h   | 19 +++
> >  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
> >  3 files changed, 73 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > b/drivers/net/i40e/i40e_ethdev.c index 253209b..c571d8b 100644
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -10361,3 +10361,56 @@ static void i40e_set_default_mac_addr(struct
> rte_eth_dev *dev,
> > else
> > return -EINVAL;
> >  }
> > +
> > +int rte_pmd_i40e_set_vf_vlan_insert(uint8_t port, uint16_t vf_id,
> > +   uint16_t vlan_id)
> > +{
> > +   struct rte_eth_dev *dev;
> > +   struct rte_eth_dev_info dev_info;
> > +   struct i40e_pf *pf;
> > +   struct i40e_pf_vf *vf;
> > +   struct i40e_hw *hw;
> > +   struct i40e_vsi *vsi;
> > +   struct i40e_vsi_context ctxt;
> > +   int ret;
> > +
> > +   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> > +
> > +   dev = &rte_eth_devices[port];
> > +   rte_eth_dev_info_get(port, &dev_info);
> > +
> > +   pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> > +   hw = I40E_PF_TO_HW(pf);
> > +
> > +   /**
> > +* return -ENODEV if SRIOV not enabled, VF number not configured
> > +* or no queue assigned.
> > +*/
> > +   if (!hw->func_caps.sr_iov_1_1 || pf->vf_num == 0 ||
> > +   pf->vf_nb_qps == 0)
> > +   return -ENODEV;
> > +
> > +   if (vf_id > pf->vf_num)

if (vf_id >= pf->vf_num)  

would be better.
 
> This check was [1] in prev patches:
> [1]
>   if (vf_id > pf->vf_num - 1 || !pf->vfs)

if (vf_id >= pf->vf_num  || !pf->vfs)

would be better.

> 
> > +   return -EINVAL;
> > +
> > +   if (vlan_id > 4095)
> 
> Can there be any define in base driver for this? Or ETH_VLAN_ID_MAX
> perhaps?

ETHER_MAX_VLAN_ID in rte_ether.h should be used .
 
> > +   return -EINVAL;
> > +
> > +   vf = &pf->vfs[vf_id];
> > +   vsi = vf->vsi;
> > +
> > +   vsi->info.valid_sections =
> cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID);
> > +   vsi->info.port_vlan_flags |= I40E_AQ_VSI_PVLAN_INSERT_PVID;
> > +   vsi->info.pvid = vlan_id;
> > +
> 
> --->
> > +   memset(&ctxt, 0, sizeof(ctxt));
> > +   (void)rte_memcpy(&ctxt.info, &vsi->info, sizeof(vsi->info));
> > +   ctxt.seid = vsi->seid;
> > +
> > +   hw = I40E_VSI_TO_HW(vsi);
> > +   ret = i40e_aq_update_vsi_params(hw, &ctxt, NULL);
> > +   if (ret != I40E_SUCCESS)
> > +   PMD_DRV_LOG(ERR, "Failed to update VSI params");
> <-
> 
> If Wenzhuo prefers to extract this part into a function, it can be re-used 
> here
> too.
> 
> <...>



Re: [dpdk-dev] [PATCH v3] kni: avoid using lsb_release script

2016-12-07 Thread Thomas Monjalon
2016-11-25 12:24, Ferruh Yigit:
> On 11/25/2016 12:22 PM, Robin Jarry wrote:
> > The lsb_release script is part of an optional package which is not
> > always installed. On the other hand, /etc/lsb-release is always present
> > even on minimal Ubuntu installations.
> > 
> > root@ubuntu1604:~# dpkg -S /etc/lsb-release
> > base-files: /etc/lsb-release
> > 
> > Read the file if present and use the variables defined in it.
> > 
> > Signed-off-by: Robin Jarry 
> 
> Acked-by: Ferruh Yigit 

Applied, thanks


Re: [dpdk-dev] [PATCH v2 19/32] net/i40e: set VF VLAN filter from PF

2016-12-07 Thread Iremonger, Bernard
Hi Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 2:46 PM
> To: Lu, Wenzhuo ; dev@dpdk.org
> Cc: Iremonger, Bernard 
> Subject: Re: [dpdk-dev] [PATCH v2 19/32] net/i40e: set VF VLAN filter from
> PF
> 
> On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> > add rte_pmd_i40e_set_vf_vlan_filter API.
> > User can call the API on PF to enable/disable a set of VF's VLAN
> > filters.
> >
> > Signed-off-by: Bernard Iremonger 
> > ---
> >  drivers/net/i40e/i40e_ethdev.c| 50
> +++
> >  drivers/net/i40e/rte_pmd_i40e.h   | 22 ++
> >  drivers/net/i40e/rte_pmd_i40e_version.map |  1 +
> >  3 files changed, 73 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > b/drivers/net/i40e/i40e_ethdev.c index 601e933..bc96698 100644
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -10516,3 +10516,53 @@ int rte_pmd_i40e_set_vf_vlan_tag(uint8_t
> > port, uint16_t vf_id, uint8_t on)
> >
> > return ret;
> >  }
> > +
> > +int rte_pmd_i40e_set_vf_vlan_filter(uint8_t port, uint16_t vlan_id,
> > +   uint64_t vf_mask, uint8_t on) {
> <...>
> > +
> > +   for (pool_idx = 0;
> > +pool_idx < ETH_64_POOLS && pool_idx < pf->nb_cfg_vmdq_vsi;
> > +pool_idx++) {
> > +   if (vf_mask & ((uint64_t)(1ULL << pool_idx))) {
> > +   if (on)
> > +   ret = i40e_vsi_add_vlan(pf-
> >vmdq[pool_idx].vsi,
> > +   vlan_id);
> > +   else
> > +   ret = i40e_vsi_delete_vlan(
> > +   pf->vmdq[pool_idx].vsi, vlan_id);
> > +   }
> > +   }
> > +
> > +   if (ret != I40E_SUCCESS)
> > +   PMD_DRV_LOG(ERR, "Failed to set VF VLAN filter, on = %d",
> on);
> 
> Since loop not break on error, this will only log the last one, if the error 
> is in
> the middle, it is missed.

It would be better to break out of the loop on error.

> 
> <...>

Regards,

Bernard.



[dpdk-dev] [RFC] pci: remove unused UNBIND support

2016-12-07 Thread Stephen Hemminger
No device driver sets the unbind flag in current public code base.
Therefore it is good time to remove the unused dead code.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  9 
 lib/librte_eal/common/eal_common_pci.c  |  5 -
 lib/librte_eal/common/include/rte_pci.h |  2 --
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 39 -
 4 files changed, 55 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 8b3ed88..3a5c315 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -87,15 +87,6 @@
  * enabling bus master.
  */
 
-/* unbind kernel driver for this device */
-int
-pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused)
-{
-   RTE_LOG(ERR, EAL, "RTE_PCI_DRV_FORCE_UNBIND flag is not implemented "
-   "for BSD\n");
-   return -ENOTSUP;
-}
-
 /* Map pci device */
 int
 rte_eal_pci_map_device(struct rte_pci_device *dev)
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 6bff675..33485bc 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -203,11 +203,6 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
-   } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
-   rte_eal_process_type() == RTE_PROC_PRIMARY) {
-   /* unbind current driver */
-   if (pci_unbind_kernel_driver(dev) < 0)
-   return -1;
}
 
/* reference driver structure */
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 9ce8847..fbc4311 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -208,8 +208,6 @@ struct rte_pci_driver {
 
 /** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
 #define RTE_PCI_DRV_NEED_MAPPING 0x0001
-/** Device needs to be unbound even if no module is provided */
-#define RTE_PCI_DRV_FORCE_UNBIND 0x0004
 /** Device driver supports link state interrupt */
 #define RTE_PCI_DRV_INTR_LSC   0x0008
 /** Device driver supports detaching capability */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 876ba38..4350134 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -54,45 +54,6 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-/* unbind kernel driver for this device */
-int
-pci_unbind_kernel_driver(struct rte_pci_device *dev)
-{
-   int n;
-   FILE *f;
-   char filename[PATH_MAX];
-   char buf[BUFSIZ];
-   struct rte_pci_addr *loc = &dev->addr;
-
-   /* open /sys/bus/pci/devices/:BB:CC.D/driver */
-   snprintf(filename, sizeof(filename),
-   "%s/" PCI_PRI_FMT "/driver/unbind", pci_get_sysfs_path(),
-   loc->domain, loc->bus, loc->devid, loc->function);
-
-   f = fopen(filename, "w");
-   if (f == NULL) /* device was not bound */
-   return 0;
-
-   n = snprintf(buf, sizeof(buf), PCI_PRI_FMT "\n",
-loc->domain, loc->bus, loc->devid, loc->function);
-   if ((n < 0) || (n >= (int)sizeof(buf))) {
-   RTE_LOG(ERR, EAL, "%s(): snprintf failed\n", __func__);
-   goto error;
-   }
-   if (fwrite(buf, n, 1, f) == 0) {
-   RTE_LOG(ERR, EAL, "%s(): could not write to %s\n", __func__,
-   filename);
-   goto error;
-   }
-
-   fclose(f);
-   return 0;
-
-error:
-   fclose(f);
-   return -1;
-}
-
 static int
 pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
 {
-- 
2.10.2



Re: [dpdk-dev] [PATCH] ethdev: don't look for devices if none were found

2016-12-07 Thread Thomas Monjalon
2016-11-19 13:10, Anatoly Burakov:
> Aside from avoiding doing useless work, this also fixes a segfault
> when calling rte_eth_dev_get_port_by_name() whenever no devices
> were found yet, and therefore rte_eth_dev_data wasn't yet allocated.
> 
> Fixes: 9c5b8d8b9feb ("ethdev: clean port id retrieval when attaching")
> 
> Signed-off-by: Anatoly Burakov 

Acked-by: Thomas Monjalon 

Title reworded: "ethdev: fix port lookup if none"

Applied, thanks

PS: your patches are not in patchwork because a strange character in
an email header and a bug in patchwork that I thought was fixed.


Re: [dpdk-dev] [PATCH v3] net/i40evf: fix reporting of imissed packets

2016-12-07 Thread Ferruh Yigit
On 12/6/2016 8:16 PM, Tom Crugnale wrote:
> Missed packets on RX were erroneously being assigned to the ierrors
> struct member. Change it to be assigned to imissed.
> 
> Fixes: 4861cde4 ("i40e: new poll mode driver")
> 
> Signed-off-by: Tom Crugnale 
> ---
> v3:
> * Fixed coding style issues
> v2:
> * Fixed line breaks in original patch submission
> ---
> 
>  drivers/net/i40e/i40e_ethdev_vf.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
> b/drivers/net/i40e/i40e_ethdev_vf.c
> index aa306d6..90876c8 100644
> --- a/drivers/net/i40e/i40e_ethdev_vf.c
> +++ b/drivers/net/i40e/i40e_ethdev_vf.c
> @@ -952,7 +952,7 @@ struct rte_i40evf_xstats_name_off {
>  }
>  
>  static int
> -i40evf_get_statics(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +i40evf_get_stats(struct rte_eth_dev *dev, struct rte_eth_stats *stats)

I aware this is simple modification, but if you can, can you please make
this separate patch?
Stable trees may not interested in internal function name changes but
may want to get stats counter fix.

>  {
>   int ret;
>   struct i40e_eth_stats *pstats = NULL;
> @@ -965,7 +965,7 @@ struct rte_i40evf_xstats_name_off {
>   pstats->rx_broadcast;
>   stats->opackets = pstats->tx_broadcast + pstats->tx_multicast +
>   pstats->tx_unicast;
> - stats->ierrors = pstats->rx_discards;
> + stats->imissed = pstats->rx_discards;

so won't VF driver report any rx_error?

>   stats->oerrors = pstats->tx_errors + pstats->tx_discards;

This line is not part of the patch, but what do you think about this
one, should oerrors contain tx_discards values?

>   stats->ibytes = pstats->rx_bytes;
>   stats->obytes = pstats->tx_bytes;
> @@ -2277,8 +2277,8 @@ static int i40evf_dev_xstats_get(struct rte_eth_dev 
> *dev,
>  static void
>  i40evf_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>  {
> - if (i40evf_get_statics(dev, stats))
> - PMD_DRV_LOG(ERR, "Get statics failed");
> + if (i40evf_get_stats(dev, stats))
> + PMD_DRV_LOG(ERR, "Get stats failed");
>  }
>  
>  static void
> 



Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-07 Thread Jerin Jacob
On Tue, Dec 06, 2016 at 04:51:19PM +, Bruce Richardson wrote:
> On Tue, Dec 06, 2016 at 09:22:15AM +0530, Jerin Jacob wrote:
> > In a polling model, lcores poll ethdev ports and associated
> > rx queues directly to look for packet. In an event driven model,
> > by contrast, lcores call the scheduler that selects packets for
> > them based on programmer-specified criteria. Eventdev library
> > adds support for event driven programming model, which offer
> > applications automatic multicore scaling, dynamic load balancing,
> > pipelining, packet ingress order maintenance and
> > synchronization services to simplify application packet processing.
> > 
> > By introducing event driven programming model, DPDK can support
> > both polling and event driven programming models for packet processing,
> > and applications are free to choose whatever model
> > (or combination of the two) that best suits their needs.
> > 
> > This patch adds the eventdev specification header file.
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> > +   /** WORD1 */
> > +   RTE_STD_C11
> > +   union {
> > +   uint64_t u64;
> > +   /**< Opaque 64-bit value */
> > +   uintptr_t event_ptr;
> > +   /**< Opaque event pointer */
> 
> Since we have a uint64_t member of the union, might this be better as a
> void * rather than uintptr_t?

No strong opinion here. For me, uintptr_t looks clean.
But, It is OK to change to void* as per your input.

> 
> > +   struct rte_mbuf *mbuf;
> > +   /**< mbuf pointer if dequeued event is associated with mbuf */
> > +   };
> > +};
> > +
> 
> > +/**
> > + * Link multiple source event queues supplied in *rte_event_queue_link*
> > + * structure as *queue_id* to the destination event port designated by its
> > + * *port_id* on the event device designated by its *dev_id*.
> > + *
> > + * The link establishment shall enable the event port *port_id* from
> > + * receiving events from the specified event queue *queue_id*
> > + *
> > + * An event queue may link to one or more event ports.
> > + * The number of links can be established from an event queue to event 
> > port is
> > + * implementation defined.
> > + *
> > + * Event queue(s) to event port link establishment can be changed at 
> > runtime
> > + * without re-configuring the device to support scaling and to reduce the
> > + * latency of critical work by establishing the link with more event ports
> > + * at runtime.
> 
> I think this might need to be clarified. The device doesn't need to be
> reconfigured, but does it need to be stopped? In SW implementation, this
> affects how much we have to make things thread-safe. At minimum I think
> we should limit this to having only one thread call the function at a
> time, but we may allow enqueue dequeue ops from the data plane to run
> in parallel.

Cavium implementation can change it at runtime without re-configuring or 
stopping
the device to support runtime load balancing from the application perspective.

AFAIK, link establishment is _NOT_ fast path API. But the application
can invoke it from worker thread whenever there is a need for re-wiring
the queue to port connection for better explicit load balancing. IMO, A
software implementation with lock is fine here as we don't use this in
fastpath.

Thoughts?
>
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + *
> > + */
> > +int
> > +rte_event_port_link(uint8_t dev_id, uint8_t port_id,
> > +   const struct rte_event_queue_link link[],
> > +   uint16_t nb_links);
> > +
> > +/**
> > + * Unlink multiple source event queues supplied in *queues* from the 
> > destination
> > + * event port designated by its *port_id* on the event device designated
> > + * by its *dev_id*.
> > + *
> > + * The unlink establishment shall disable the event port *port_id* from
> > + * receiving events from the specified event queue *queue_id*
> > + *
> > + * Event queue(s) to event port unlink establishment can be changed at 
> > runtime
> > + * without re-configuring the device.
> 
> Clarify, as above with link call.

Same as above.

> 
> > + *


Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

2016-12-07 Thread Dumitrescu, Cristian
Hi Steve,

Thanks for your comments!


> This seems to be more of an abstraction of existing QoS.

Really? Why exactly do you say this, any particular examples?

I think the current proposal provides an abstraction for far more features than 
librte_sched provides. The goal for this API is to be able to describe 
virtually any hierarchy that could be implemented in HW and/or SW, not just 
what is currently provided by librte_sched.

If your statement is true, then I failed in my mission, and hopefully I didn't 
:)


> Why not something like Linux Qdisc or FreeBSD DummyNet/PF/ALTQ where the Qos 
> components are stackable objects? 

After designing Packet Framework, I don't think anybody could suspect me of not 
being a fan of stackable objects ;). Not sure why you say this either, as 
basically current proposal builds the hierarchy out of inter-connected nodes 
sitting on top of shapers and WRED contexts. To me, this is a decent stack?

I don't think this proposal is that far away from Linux qdisc: qdisc classes 
are nodes, shapers are present, WRED contexts as well. Any particular qdisc 
feature you see missing?

Of course, Linux qdisc also includes classification, policing, marking, etc 
which are outside of the hierarchical scheduling that is targeted by this 
proposal. But this is an interesting thought: designing a qdisc-like layer 
within DPDK that binds together classification, policing, filters, scheduling.


> And why not make it the same as existing OS abstractions?

Do you mean using the Linux qdisc API and implementation as is? Of course, this 
is GPL licensed code and we cannot do this in DPDK.

Do you mean having a Linux qdisc-like API? I largely agree with this, and I 
think the current proposal is very much inline with this; if you think 
otherwise, again, specific examples of what's missing would help a lot. I can 
also take a look at DummyNet to make sure there is nothing left behind.


> Rather than reinventing wheel which seems to be DPDK Standard Procedure, 
> could an existing abstraction be used?

I thought we are just trying to create a car instead of a faster horse :)


Regards,
Cristian


Re: [dpdk-dev] [PATCH v2 02/12] lib: add cryptodev type for the upcoming ARMv8 PMD

2016-12-07 Thread Zbigniew Bodek

On 06.12.2016 21:27, Thomas Monjalon wrote:

2016-12-06 18:32, zbigniew.bo...@caviumnetworks.com:

From: Zbigniew Bodek 

Add type and name for ARMv8 crypto PMD

Signed-off-by: Zbigniew Bodek 

[...]

--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -66,6 +66,8 @@
 /**< KASUMI PMD device name */
 #define CRYPTODEV_NAME_ZUC_PMD crypto_zuc
 /**< KASUMI PMD device name */
+#define CRYPTODEV_NAME_ARMV8_PMD   crypto_armv8
+/**< ARMv8 CM device name */

 /** Crypto device type */
 enum rte_cryptodev_type {
@@ -77,6 +79,7 @@ enum rte_cryptodev_type {
RTE_CRYPTODEV_KASUMI_PMD,   /**< KASUMI PMD */
RTE_CRYPTODEV_ZUC_PMD,  /**< ZUC PMD */
RTE_CRYPTODEV_OPENSSL_PMD,/**<  OpenSSL PMD */
+   RTE_CRYPTODEV_ARMV8_PMD,/**< ARMv8 crypto PMD */
 };


Can we remove all these types and names in the generic crypto API?



Hello Thomas,

I added another PMD type and therefore we need new, unique number for 
it. I'm not sure if I understand correctly what you mean here, so please 
elaborate.


Kind regards
Zbigniew


[dpdk-dev] [PATCH v2] maintainers: update for qede PMD and bnx2x PMD

2016-12-07 Thread Rasesh Mody
Following Cavium's acquisition of QLogic we need to update all the
qlogic PMD maintainer's entries to point to our new e-mail addresses.
Update driver's maintainers as they are no longer working for Cavium.

Thanks to Sony Chacko for his support and development of our various
dpdk drivers.

Signed-off-by: Rasesh Mody 
---
 MAINTAINERS |   10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d915426..1c170d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -351,16 +351,14 @@ F: drivers/net/nfp/
 F: doc/guides/nics/nfp.rst
 
 QLogic bnx2x
-M: Sony Chacko 
-M: Harish Patil 
-M: Rasesh Mody 
+M: Harish Patil 
+M: Rasesh Mody 
 F: drivers/net/bnx2x/
 F: doc/guides/nics/bnx2x.rst
 
 QLogic qede PMD
-M: Harish Patil 
-M: Rasesh Mody 
-M: Sony Chacko 
+M: Rasesh Mody 
+M: Harish Patil 
 F: drivers/net/qede/
 F: doc/guides/nics/qede.rst
 
-- 
1.7.10.3



[dpdk-dev] [PATCH v10] drivers/net:new TUN/TAP device PMD

2016-12-07 Thread Keith Wiles
The PMD allows for DPDK and the host to communicate using a raw
device interface on the host and in the DPDK application. The device
created is a Tap device with a L2 packet header.

v10- Change the string name used to allow for multiple devices.
v9 - Fix up the docs to use correct syntax
v8 - Fix issue with tap_tx_queue_setup() not return zero on success.
v7 - Reword the comment in common_base and fix the data->name issue
v6 - fixed the checkpatch issues
v5 - merge in changes from list review see related emails
 fixed many minor edits
v4 - merge with latest driver changes
v3 - fix includes by removing ifdef for other type besides Linux
 Fix the copyright notice in the Makefile
v2 - merge all of the patches into one patch
 Fix a typo on naming the tap device
 Update the maintainers list

Signed-off-by: Keith Wiles 
---
 MAINTAINERS |   5 +
 config/common_base  |   9 +
 config/common_linuxapp  |   1 +
 doc/guides/nics/tap.rst | 136 ++
 drivers/net/Makefile|   1 +
 drivers/net/tap/Makefile|  57 +++
 drivers/net/tap/rte_eth_tap.c   | 765 
 drivers/net/tap/rte_pmd_tap_version.map |   4 +
 mk/rte.app.mk   |   1 +
 9 files changed, 979 insertions(+)
 create mode 100644 doc/guides/nics/tap.rst
 create mode 100644 drivers/net/tap/Makefile
 create mode 100644 drivers/net/tap/rte_eth_tap.c
 create mode 100644 drivers/net/tap/rte_pmd_tap_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 26d9590..842fb6d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -398,6 +398,11 @@ F: doc/guides/nics/pcap_ring.rst
 F: app/test/test_pmd_ring.c
 F: app/test/test_pmd_ring_perf.c
 
+Tap PMD
+M: Keith Wiles 
+F: drivers/net/tap
+F: doc/guides/nics/tap.rst
+
 Null Networking PMD
 M: Tetsuya Mukawa 
 F: drivers/net/null/
diff --git a/config/common_base b/config/common_base
index 652a839..eb51cdb 100644
--- a/config/common_base
+++ b/config/common_base
@@ -590,3 +590,12 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
 CONFIG_RTE_TEST_PMD=y
 CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
 CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
+
+#
+# Compile the TAP PMD
+#
+# The TAP PMD is currently only built for Linux and the
+# option is enabled by default in common_linuxapp file,
+# set to 'n' in the common_base file.
+#
+CONFIG_RTE_LIBRTE_PMD_TAP=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2483dfa..782b503 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -44,3 +44,4 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
+CONFIG_RTE_LIBRTE_PMD_TAP=y
diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
new file mode 100644
index 000..622b9e7
--- /dev/null
+++ b/doc/guides/nics/tap.rst
@@ -0,0 +1,136 @@
+..  BSD LICENSE
+Copyright(c) 2016 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Tun/Tap Poll Mode Driver
+
+
+The ``rte_eth_tap.c`` PMD creates a device using TUN/TAP interfaces on the
+local host. The PMD allows for DPDK and the host to communicate using a raw
+device interface on the host and in the DPDK application.
+
+The device created is a TAP device, which sends/receives packet in a raw
+format with a L2 header. The usage for a TAP PMD is for connectivity to the
+local host us

Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

2016-12-07 Thread Dumitrescu, Cristian
Hi Alan,

Thanks for your comments!


> Hi Cristian,

> Looking at points 10 and 11 it's good to hear nodes can be dynamically added.

Yes, many implementations allow on-the-fly remapping a node from one parent to 
another
one, or simply adding more nodes post-initialization, so it is natural for the 
API to provide this.


> We've been trying to decide the best way to do this for support of qos on 
> tunnels for
> some time now and the existing implementation doesn't allow this so 
> effectively ruled
> out hierarchical queueing for tunnel targets on the output interface.

> Having said that, has thought been given to separating the queueing from 
> being so closely
> tied to the Ethernet transmit process ?   When queueing on a tunnel for 
> example we may
> be working with encryption.   When running with an anti-reply window it is 
> really much
> better to do the QOS (packet reordering) before the encryption.  To support 
> this would
> it be possible to have a separate scheduler structure which can be passed 
> into the
> scheduling API ?  This means the calling code can hang the structure of 
> whatever entity
> it wishes to perform qos on, and we get dynamic target support 
> (sessions/tunnels etc).

Yes, this is one point where we need to look for a better solution. Current 
proposal attaches
the hierarchical scheduler function to an ethdev, so scheduling traffic for 
tunnels that have a
pre-defined bandwidth is not supported nicely. This question was also raised in 
VPP, but
there tunnels are supported as a type of output interfaces, so attaching 
scheduling to an
output interface also covers the tunnels case.

Looks to me that nice tunnel abstractions are a gap in DPDK as well. Any 
thoughts about
how tunnels should be supported in DPDK? What do other people think about this?


> Regarding the structure allocation, would it be possible to make the number 
> of queues
> associated with a TC a compile time option which the scheduler would 
> accommodate ?
> We frequently only use one queue per tc which means 75% of the space 
> allocated at
> the queueing layer for that tc is never used.  This may be specific to our 
> implementation
> but if other implementations do the same if folks could say we may get a 
> better idea
> if this is a common case.

> Whilst touching on the scheduler, the token replenishment works using a 
> division and
> multiplication obviously to cater for the fact that it may be run after 
> several tc windows
> have passed.  The most commonly used industrial scheduler simply does a 
> lapsed on the tc
> and then adds the bc.   This relies on the scheduler being called within the 
> tc window
> though.  It would be nice to have this as a configurable option since it's 
> much for efficient
> assuming the infra code from which it's called can guarantee the calling 
> frequency.

This is probably feedback for librte_sched as opposed to the current API 
proposal, as the
Latter is intended to be generic/implementation-agnostic and therefor its scope 
far
exceeds the existing set of librte_sched features.

Btw, we do plan using the librte_sched feature as the default fall-back when 
the HW
ethdev is not scheduler-enabled, as well as the implementation of choice for a 
lot of
use-cases where it fits really well, so we do have to continue evolve and 
improve
librte_sched feature-wise and performance-wise.


> I hope you'll consider these points for inclusion into a future road map.  
> Hopefully in the
> future my employer will increase the priority of some of the tasks and a PR 
> may appear
> on the mailing list.

> Thanks,
> Alan.



Re: [dpdk-dev] [PATCH v2 02/12] lib: add cryptodev type for the upcoming ARMv8 PMD

2016-12-07 Thread Thomas Monjalon
2016-12-07 20:04, Zbigniew Bodek:
> On 06.12.2016 21:27, Thomas Monjalon wrote:
> > 2016-12-06 18:32, zbigniew.bo...@caviumnetworks.com:
> >> From: Zbigniew Bodek 
> >>
> >> Add type and name for ARMv8 crypto PMD
> >>
> >> Signed-off-by: Zbigniew Bodek 
> > [...]
> >> --- a/lib/librte_cryptodev/rte_cryptodev.h
> >> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> >> @@ -66,6 +66,8 @@
> >>  /**< KASUMI PMD device name */
> >>  #define CRYPTODEV_NAME_ZUC_PMDcrypto_zuc
> >>  /**< KASUMI PMD device name */
> >> +#define CRYPTODEV_NAME_ARMV8_PMD  crypto_armv8
> >> +/**< ARMv8 CM device name */
> >>
> >>  /** Crypto device type */
> >>  enum rte_cryptodev_type {
> >> @@ -77,6 +79,7 @@ enum rte_cryptodev_type {
> >>RTE_CRYPTODEV_KASUMI_PMD,   /**< KASUMI PMD */
> >>RTE_CRYPTODEV_ZUC_PMD,  /**< ZUC PMD */
> >>RTE_CRYPTODEV_OPENSSL_PMD,/**<  OpenSSL PMD */
> >> +  RTE_CRYPTODEV_ARMV8_PMD,/**< ARMv8 crypto PMD */
> >>  };
> >
> > Can we remove all these types and names in the generic crypto API?
> >
> 
> Hello Thomas,
> 
> I added another PMD type and therefore we need new, unique number for 
> it. I'm not sure if I understand correctly what you mean here, so please 
> elaborate.

My comment is not specific to your PMD.
I think there is something wrong in the design of cryptodev if we need
to update rte_cryptodev.h each time a new driver is added.
There is no such thing in ethdev.


Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

2016-12-07 Thread Dumitrescu, Cristian


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com]
> Sent: Tuesday, December 6, 2016 10:14 PM
> To: Stephen Hemminger 
> Cc: dev@dpdk.org; Dumitrescu, Cristian 
> Subject: Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical
> scheduler
> 
> 2016-12-06 11:51, Stephen Hemminger:
> > Rather than reinventing wheel which seems to be DPDK Standard
> > Procedure, could an existing abstraction be used?
> 
> Stephen, you know that the DPDK standard procedure is to consider
> reviews and good comments ;)

Thomas, sorry for not CC-ing you (as the librte_ether maintainer) when
the initial RFC was sent out, comments from you would be welcomed and
much appreciated as well!

Also same kind reminder for other people in the community as well.
(CC-ing just a few). Some of you have expressed interest in the past for
building an abstraction layer for QoS hierarchical scheduler, so here is the
RFC for it which we can evolve together. Let's make sure we build a robust
layer that fits well for all devices and vendors that are part of the DPDK
community!



Re: [dpdk-dev] [PATCH] vdev: fix missing alias check on uninit

2016-12-07 Thread Thomas Monjalon
2016-11-18 16:32, Anatoly Burakov:
> Fixes: d63eed6b2dca ("eal: add driver name alias")
> 
> Signed-off-by: Anatoly Burakov 

CC: sta...@dpdk.org

Applied, thanks


Re: [dpdk-dev] [PATCH v10] drivers/net:new TUN/TAP device PMD

2016-12-07 Thread Aws Ismail
On Wed, Dec 7, 2016 at 2:38 PM, Keith Wiles  wrote:
>
> Signed-off-by: Keith Wiles 

Tested-by: Aws Ismail 


Re: [dpdk-dev] [PATCH v2] doc: introduce PVP reference benchmark

2016-12-07 Thread Mcnamara, John
> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, December 6, 2016 12:25 PM
> To: yuanhan@linux.intel.com; thomas.monja...@6wind.com; Mcnamara, John
> ; Yang, Zhiyong ;
> ktray...@redhat.com; dev@dpdk.org
> Cc: Maxime Coquelin 
> Subject: [PATCH v2] doc: introduce PVP reference benchmark
> 
> Having reference benchmarks is important in order to obtain reproducible
> performance figures.
> 
> This patch describes required steps to configure a PVP setup using testpmd
> in both host and guest.
> 
> Not relying on external vSwitch ease integration in a CI loop by not being
> impacted by DPDK API changes.
> 
> Signed-off-by: Maxime Coquelin 

There is one trailing whitespace warning but apart from that:

Acked-by: John McNamara 



> ---
> 
> Thanks to all the reviewers, this v2 should take all remarks into account.
> 
>  -- Maxime
> 
>  doc/guides/howto/img/pvp_2nics.svg   | 556
> +++
>  doc/guides/howto/index.rst   |   1 +
>  doc/guides/howto/pvp_reference_benchmark.rst | 395 +++
>  3 files changed, 952 insertions(+)
>  create mode 100644 doc/guides/howto/img/pvp_2nics.svg
>  create mode 100644 doc/guides/howto/pvp_reference_benchmark.rst
> 
> diff --git a/doc/guides/howto/img/pvp_2nics.svg
> b/doc/guides/howto/img/pvp_2nics.svg
> new file mode 100644
> index 000..517a800
> --- /dev/null
> +++ b/doc/guides/howto/img/pvp_2nics.svg
> @@ -0,0 +1,556 @@
> +
> +
> +
> + +   xmlns:dc="http://purl.org/dc/elements/1.1/";
> +   xmlns:cc="http://creativecommons.org/ns#";
> +   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#";
> +   xmlns:svg="http://www.w3.org/2000/svg";
> +   xmlns="http://www.w3.org/2000/svg";
> +   xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd";
> +   xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape";
> +   width="127.46428mm"
> +   height="139.41411mm"
> +   viewBox="0 0 451.64508 493.987"
> +   id="svg2"
> +   version="1.1"
> +   inkscape:version="0.92pre2 r"
> +   sodipodi:docname="pvp_2nics.svg"
> +   inkscape:export-filename="/home/max/Pictures/dpdk/pvp/pvp.png"
> +   inkscape:export-xdpi="90"
> +   inkscape:export-ydpi="90">
> +   + id="defs4">
> + +   inkscape:stockid="Arrow1Lend"
> +   orient="auto"
> +   refY="0"
> +   refX="0"
> +   id="marker4760"
> +   style="overflow:visible"
> +   inkscape:isstock="true">
> +   + inkscape:connector-curvature="0"
> + id="path4762"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#ff;fill-opacity:1;fill-
> rule:evenodd;stroke:#ff;stroke-width:1.0003pt;stroke-opacity:1"
> + transform="matrix(-0.8,0,0,-0.8,-10,0)" />
> +
> + +   inkscape:stockid="Arrow1Lend"
> +   orient="auto"
> +   refY="0"
> +   refX="0"
> +   id="marker4642"
> +   style="overflow:visible"
> +   inkscape:isstock="true">
> +   + inkscape:connector-curvature="0"
> + id="path4644"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#ff;fill-opacity:1;fill-
> rule:evenodd;stroke:#ff;stroke-width:1.0003pt;stroke-opacity:1"
> + transform="matrix(-0.8,0,0,-0.8,-10,0)" />
> +
> + +   inkscape:stockid="Arrow1Lstart"
> +   orient="auto"
> +   refY="0"
> +   refX="0"
> +   id="marker10370"
> +   style="overflow:visible"
> +   inkscape:isstock="true">
> +   + id="path10372"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + style="fill:#ff;fill-opacity:1;fill-
> rule:evenodd;stroke:#ff;stroke-width:1.0003pt;stroke-opacity:1"
> + transform="matrix(0.8,0,0,0.8,10,0)"
> + inkscape:connector-curvature="0" />
> +
> + +   inkscape:isstock="true"
> +   style="overflow:visible"
> +   id="marker10306"
> +   refX="0"
> +   refY="0"
> +   orient="auto"
> +   inkscape:stockid="Arrow1Lend">
> +   + transform="matrix(-0.8,0,0,-0.8,-10,0)"
> + style="fill:#ff;fill-opacity:1;fill-
> rule:evenodd;stroke:#ff;stroke-width:1.0003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path10308"
> + inkscape:connector-curvature="0" />
> +
> + +   inkscape:isstock="true"
> +   style="overflow:visible"
> +   id="marker9757"
> +   refX="0"
> +   refY="0"
> +   orient="auto"
> +   inkscape:stockid="Arrow1Lend"
> +   inkscape:collect="always">
> +   + transform="matrix(-0.8,0,0,-0.8,-10,0)"
> + style="fill:#ff;fill-opacity:1;fill-
> rule:evenodd;stroke:#ff;stroke-width:1.0003pt;stroke-opacity:1"
> + d="M 0,0 5,-5 -12.5,0 5,5 Z"
> + id="path9759"
> + inkscape:connector-curvature="0" />
> +
> + +   inkscape:stockid="Arrow1Lstart"
> +   orient="auto"
> +   refY="0"
> +   refX="0"
> +   id="A

Re: [dpdk-dev] [PATCH v2 09/12] doc/armv8: update documentation about crypto PMD

2016-12-07 Thread Mcnamara, John


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of
> zbigniew.bo...@caviumnetworks.com
> Sent: Wednesday, December 7, 2016 2:33 AM
> To: De Lara Guarch, Pablo ;
> jerin.ja...@caviumnetworks.com
> Cc: dev@dpdk.org; Zbigniew Bodek 
> Subject: [dpdk-dev] [PATCH v2 09/12] doc/armv8: update documentation about
> crypto PMD
> 
> From: Zbigniew Bodek 
> 
> Add documentation about the driver and update release notes.
> 
> Signed-off-by: Zbigniew Bodek 


Acked-by: John McNamara 





Re: [dpdk-dev] [PATCH v2 32/32] app/testpmd: fix invalid port ID

2016-12-07 Thread Lu, Wenzhuo
Hi Ferruh,


> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 11:49 PM
> To: Lu, Wenzhuo; dev@dpdk.org
> Cc: Chen, Jing D
> Subject: Re: [dpdk-dev] [PATCH v2 32/32] app/testpmd: fix invalid port ID
> 
> On 12/7/2016 3:32 AM, Wenzhuo Lu wrote:
> > Some CLIs don't check the input port ID, it may cause segmentation
> > fault (core dumped).
> 
> Are these functions (that we are adding extra check) added with this 
> patchset? If
> so why not add these checks where function implemented.
> 
> If these functions are already implemented before this patchset, this patch 
> can
> be standalone patch, instead of being part of this patchset, and can be before
> this patchset so that it can be easily backported to stable trees.
This patch is for the existing code. I think you're right. I'll remove it from 
this patch set and send an isolate patch for it.

> 
> >
> > Fixes: 425781ff5afe ("app/testpmd: add ixgbe VF management")
> >
> > Signed-off-by: Wenzhuo Lu 
> > Signed-off-by: Chen Jing D(Mark) 
> > ---
> 
> <...>



Re: [dpdk-dev] [PATCH] net/ixgbe: fix typo in comment

2016-12-07 Thread Lu, Wenzhuo
Hi,


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, December 8, 2016 1:20 AM
> To: dev@dpdk.org
> Cc: Zhang, Helin; Ananyev, Konstantin; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix typo in comment
> 
> Fixes: c03fcee9abbd ("ixgbe: remove CRC size from byte counters")
> 
> Signed-off-by: Ferruh Yigit 
Acked-by: Wenzhuo Lu 



Re: [dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy

2016-12-07 Thread Yao, Lei A
Tested-by: Lei Yao 
- Apply patch to v16.11
I have tested the loopback performance for this patch on 3 following settings:
CPU: IVB
Ubutnu16.04
Kernal:  4.4.0
gcc : 5.4.0

CPU: HSW
Fedora 21
Kernal: 4.1.13
gcc: 4.9.2

CPU:BDW
Ubutnu16.04
Kernal:  4.4.0
gcc : 5.4.0

I can see 10%~20% performance gain for different packet size on mergeable path. 
Only on IVB + gcc5.4.0, slight performance drop(~4%) on vector path for packet 
size 128 ,260. It's may related to gcc version as this performance drop not see 
with gcc 6+.


-Original Message-
From: Wang, Zhihong 
Sent: Wednesday, December 7, 2016 9:31 AM
To: dev@dpdk.org
Cc: yuanhan@linux.intel.com; thomas.monja...@6wind.com; Yao, Lei A 
; Wang, Zhihong 
Subject: [PATCH v2] eal: optimize aligned rte_memcpy

This patch optimizes rte_memcpy for well aligned cases, where both
dst and src addr are aligned to maximum MOV width. It introduces a
dedicated function called rte_memcpy_aligned to handle the aligned
cases with simplified instruction stream. The existing rte_memcpy
is renamed as rte_memcpy_generic. The selection between them 2 is
done at the entry of rte_memcpy.

The existing rte_memcpy is for generic cases, it handles unaligned
copies and make store aligned, it even makes load aligned for micro
architectures like Ivy Bridge. However alignment handling comes at
a price: It adds extra load/store instructions, which can cause
complications sometime.

DPDK Vhost memcpy with Mergeable Rx Buffer feature as an example:
The copy is aligned, and remote, and there is header write along
which is also remote. In this case the memcpy instruction stream
should be simplified, to reduce extra load/store, therefore reduce
the probability of load/store buffer full caused pipeline stall, to
let the actual memcpy instructions be issued and let H/W prefetcher
goes to work as early as possible.

This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
from 64 to 1500 bytes.

The test can also be conducted without NIC, by setting loopback
traffic between Virtio and Vhost. For example, modify the macro
TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
rebuild and start testpmd in both host and guest, then "start" on
one side and "start tx_first 32" on the other.


Signed-off-by: Zhihong Wang 
---
 .../common/include/arch/x86/rte_memcpy.h   | 81 +-
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index b3bfc23..b9785e8 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -69,6 +69,8 @@ rte_memcpy(void *dst, const void *src, size_t n) 
__attribute__((always_inline));
 
 #ifdef RTE_MACHINE_CPUFLAG_AVX512F
 
+#define ALIGNMENT_MASK 0x3F
+
 /**
  * AVX512 implementation below
  */
@@ -189,7 +191,7 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 }
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
uintptr_t dstu = (uintptr_t)dst;
uintptr_t srcu = (uintptr_t)src;
@@ -308,6 +310,8 @@ COPY_BLOCK_128_BACK63:
 
 #elif defined RTE_MACHINE_CPUFLAG_AVX2
 
+#define ALIGNMENT_MASK 0x1F
+
 /**
  * AVX2 implementation below
  */
@@ -387,7 +391,7 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 }
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
uintptr_t dstu = (uintptr_t)dst;
uintptr_t srcu = (uintptr_t)src;
@@ -499,6 +503,8 @@ COPY_BLOCK_128_BACK31:
 
 #else /* RTE_MACHINE_CPUFLAG */
 
+#define ALIGNMENT_MASK 0x0F
+
 /**
  * SSE & AVX implementation below
  */
@@ -677,7 +683,7 @@ __extension__ ({
  \
 })
 
 static inline void *
-rte_memcpy(void *dst, const void *src, size_t n)
+rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
uintptr_t dstu = (uintptr_t)dst;
@@ -821,6 +827,75 @@ COPY_BLOCK_64_BACK15:
 
 #endif /* RTE_MACHINE_CPUFLAG */
 
+static inline void *
+rte_memcpy_aligned(void *dst, const void *src, size_t n)
+{
+   void *ret = dst;
+
+   /* Copy size <= 16 bytes */
+   if (n < 16) {
+   if (n & 0x01) {
+   *(uint8_t *)dst = *(const uint8_t *)src;
+   src = (const uint8_t *)src + 1;
+   dst = (uint8_t *)dst + 1;
+   }
+   if (n & 0x02) {
+   *(uint16_t *)dst = *(const uint16_t *)src;
+   src = (const uint16_t *)src + 1;
+   dst = (uint16_t *)dst + 1;
+   }
+   if (n & 0x04) {
+   *(uint32_

Re: [dpdk-dev] [PATCH v2 12/32] net/i40e: set VF MAC from PF support

2016-12-07 Thread Lu, Wenzhuo
Hi Ferruh,

> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, December 7, 2016 10:30 PM
> To: Lu, Wenzhuo; dev@dpdk.org
> Subject: Re: [PATCH v2 12/32] net/i40e: set VF MAC from PF support
> 
> On 12/7/2016 3:31 AM, Wenzhuo Lu wrote:
> > Support setting VF MAC address from PF.
> > User can call the API on PF to set a specific VF's MAC address.
> >
> > This will reset the VF.
> >
> > Signed-off-by: Ferruh Yigit 
> > ---
> 
> <...>
> 
> >
> > +/**
> > + * Set the VF MAC address.
> > + *
> > + * This will reset the vf.
> 
> It may be good if I add a comment that this also will remove all existing mac
> filters. Same to commit log perhaps.
Agree. Please:)



Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-07 Thread Jerin Jacob
On Wed, Dec 07, 2016 at 10:57:13AM +, Van Haaren, Harry wrote:
> > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> 
> Hi Jerin,

Hi Harry,

> 
> Re v2 rte_event struct, there seems to be some changes in the struct layout 
> and field sizes. I've investigated them, and would like to propose some 
> changes to balance the byte-alignment and accessing of the fields.

OK. Looks like balanced byte-alignment makes more sense on IA.We will go with 
that then.
Few comments below,


> 
> These changes target only the first 64 bits of the rte_event struct. I've 
> left the current v2 code for reference, please find my proposed changes below.
> 
> > +struct rte_event {
> > +   /** WORD0 */
> > +   RTE_STD_C11
> > +   union {
> > +   uint64_t event;
> > +   /** Event attributes for dequeue or enqueue operation */
> > +   struct {
> > +   uint64_t flow_id:20;
> > +   /**< Targeted flow identifier for the enqueue and
> > +* dequeue operation.
> > +* The value must be in the range of
> > +* [0, nb_event_queue_flows - 1] which
> > +* previously supplied to rte_event_dev_configure().
> > +*/
> > +   uint64_t sub_event_type:8;
> > +   /**< Sub-event types based on the event source.
> > +* @see RTE_EVENT_TYPE_CPU
> > +*/
> > +   uint64_t event_type:4;
> > +   /**< Event type to classify the event source.
> > +* @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > +*/
> > +   uint64_t sched_type:2;
> > +   /**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> > +* associated with flow id on a given event queue
> > +* for the enqueue and dequeue operation.
> > +*/
> > +   uint64_t queue_id:8;
> > +   /**< Targeted event queue identifier for the enqueue or
> > +* dequeue operation.
> > +* The value must be in the range of
> > +* [0, nb_event_queues - 1] which previously supplied to
> > +* rte_event_dev_configure().
> > +*/
> > +   uint64_t priority:8;
> > +   /**< Event priority relative to other events in the
> > +* event queue. The requested priority should in the
> > +* range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > +* RTE_EVENT_DEV_PRIORITY_LOWEST].
> > +* The implementation shall normalize the requested
> > +* priority to supported priority value.
> > +* Valid when the device has
> > +* RTE_EVENT_DEV_CAP_FLAG_EVENT_QOS capability.
> > +*/
> > +   uint64_t op:2;
> > +   /**< The type of event enqueue operation - new/forward/
> > +* etc.This field is not preserved across an instance
> > +* and is undefined on dequeue.
> > +*  @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > +*/
> > +   uint64_t impl_opaque:12;
> > +   /**< Implementation specific opaque value.
> > +* An implementation may use this field to hold
> > +* implementation specific value to share between
> > +* dequeue and enqueue operation.
> > +* The application should not modify this field.
> > +*/
> > +   };
> > +   };
> 
> struct rte_event {
>   /** WORD0 */
>   RTE_STD_C11
>   union {
>   uint64_t event;
>   struct {
>   uint32_t flow_id: 24;
>   uint32_t impl_opaque : 8; /* not defined on deq */
> 
>   uint8_t queue_id;
>   uint8_t priority;
> 
>   uint8_t operation  : 4; /* new fwd drop */
>   uint8_t sched_type : 4;
> 
>   uint8_t event_type : 4;
>   uint8_t sub_event_type : 4;
>   };
>   };
>   /** word 1 */
> 
> 
> 
> The changes made are as follows:
> * Add impl_opaque to the remaining 8 bits of those 32 bits (previous size was 
> 12 bits)
OK
> 
> * QueueID and Priority remain 8 bit integers - but now accessible as 8 bit 
> ints.
OK
> 
> * Operation and sched_type *increased* to 4 bits each (from previous value of 
> 2) to allow future expansion without ABI changes

Anyway it will break ABI if we add new operation. I would propose to keep 4bit
reserved and add it when required.

> 
> * Event type remains constant at 4 bits

OK

> * Restore flow_id to 24 bits of a 32 bit int (previ

[dpdk-dev] [PATCH 0/2] minor cleanups

2016-12-07 Thread Stephen Hemminger
Saw this while reviewing other changes

Stephen Hemminger (2):
  eth: get rid of goto's in rte_eth_dev_detach
  sched/malloc: remove unnecesary return statements

 drivers/net/bonding/rte_eth_bond_pmd.c |  2 +-
 lib/librte_eal/common/rte_malloc.c |  1 -
 lib/librte_ether/rte_ethdev.c  | 21 +++--
 lib/librte_sched/rte_bitmap.h  |  2 --
 4 files changed, 8 insertions(+), 18 deletions(-)

-- 
2.10.2



  1   2   >