Re: [dpdk-dev] [PATCH v7 8/8] mempool: notify memory area to pool

2017-10-02 Thread santosh
Hi Olivier,


On Sunday 01 October 2017 02:59 PM, Santosh Shukla wrote:
> HW pool manager e.g. Octeontx SoC demands s/w to program start and end
> address of pool. Currently, there is no such api in external mempool.
> Introducing rte_mempool_ops_register_memory_area api which will let HW(pool
> manager) to know when common layer selects hugepage:
> For each hugepage - Notify its start/end address to HW pool manager.
>
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> ---

Ping, required for -rc1. Thanks.
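
A minimal sketch of the bookkeeping the commit message describes: for each hugepage-backed area the common layer selects, the range known to the HW pool manager is widened to cover it. The struct and helper names are hypothetical and are not part of DPDK; treating the end address as inclusive is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the start/end range programmed into a HW pool. */
struct hw_pool_range {
	uintptr_t start; /* lowest address notified so far */
	uintptr_t end;   /* highest address notified so far (inclusive, assumed) */
	int valid;       /* 0 until the first area has been notified */
};

/* Hypothetical callback body: widen the range to cover [vaddr, vaddr + len). */
static void
hw_pool_note_area(struct hw_pool_range *r, uintptr_t vaddr, size_t len)
{
	uintptr_t end;

	assert(r != NULL && len > 0);
	end = vaddr + len - 1;

	if (!r->valid) {
		r->start = vaddr;
		r->end = end;
		r->valid = 1;
		return;
	}
	if (vaddr < r->start)
		r->start = vaddr;
	if (end > r->end)
		r->end = end;
}
```

Calling the helper once per hugepage leaves the lowest start and highest end address in the range, which is what would then be written to the hardware.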




Re: [dpdk-dev] [PATCH v7 7/8] mempool: introduce block size align flag

2017-10-02 Thread santosh
Hi Olivier,

On Sunday 01 October 2017 02:59 PM, Santosh Shukla wrote:
> Some mempool HW, such as the octeontx/fpa block, demands that the object
> start address be aligned to the block size (total_elt_sz).
>
> Introducing an MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag.
> If this flag is set:
> - Align object start address(vaddr) to a multiple of total_elt_sz.
> - Allocate one additional object. Additional object is needed to make
>   sure that requested 'n' object gets correctly populated.
>
> Example:
> - Let's say that we get 'x' size of memory chunk from memzone.
> - And application has requested 'n' object from mempool.
> - Ideally, the n objects would occupy addresses 0 to (x - block_sz).
> - However, the first object address (0) is not necessarily aligned to
>   block_sz.
> - So we derive an offset value 'off' for block_sz alignment.
> - 'off' makes sure that the start address of each object is block_sz
>   aligned.
> - Applying 'off' may sacrifice the first block_sz area of memzone
>   area x, so the total number of objects that fit in the pool area
>   is n-1, which is incorrect behavior.
>
> Therefore we request one additional object (/block_sz area) from memzone
> when MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag is set.
>
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> Tested-by: Hemant Agrawal 
> ---

Early ping, since we need this for -rc1. Thanks.
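
The arithmetic in the example above can be sketched as follows. The helper names are hypothetical and this is not the mempool code itself; it only illustrates why a chunk sized for n objects may hold n-1 once the start address is aligned to total_elt_sz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes to skip so that (vaddr + off) is a
 * multiple of total_elt_sz. */
static uint64_t
blk_align_off(uint64_t vaddr, uint64_t total_elt_sz)
{
	uint64_t rem;

	assert(total_elt_sz != 0);
	rem = vaddr % total_elt_sz;
	return rem == 0 ? 0 : total_elt_sz - rem;
}

/* Hypothetical helper: number of whole objects that fit in a chunk of
 * 'len' bytes starting at 'vaddr' once the first object is aligned. */
static uint64_t
objs_that_fit(uint64_t vaddr, uint64_t len, uint64_t total_elt_sz)
{
	uint64_t off = blk_align_off(vaddr, total_elt_sz);

	if (off >= len)
		return 0;
	return (len - off) / total_elt_sz;
}
```

For instance, with a chunk starting at 0x1040 and total_elt_sz = 192, the offset is 64 bytes: a chunk of 8 * 192 bytes then holds only 7 objects, while a chunk of 9 * 192 bytes holds the requested 8.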



Re: [dpdk-dev] [PATCH v5 0/2] Dynamically configure mempool handle

2017-10-02 Thread santosh

On Sunday 01 October 2017 02:44 PM, Santosh Shukla wrote:
> v5:
> - Includes v4 minor review comment.
>   Patches rebased on upstream tip / commit id:5dce9fcdb2

ping, Thanks.



Re: [dpdk-dev] [Suspected-Phishing]Re: [PATCH v2] net/bonding: support bifurcated driver in eal cli using --vdev

2017-10-02 Thread Raslan Darawsheh
Hi Guys,
This is a gentle reminder about this patch.
Do we have any updates on it?

Kindest regards
Raslan Darawsheh

-----Original Message-----
From: gowrishankar muthukrishnan [mailto:gowrishanka...@linux.vnet.ibm.com] 
Sent: Wednesday, September 6, 2017 11:59 AM
To: Thomas Monjalon 
Cc: dev@dpdk.org; Gaëtan Rivet ; Declan Doherty 
; Ferruh Yigit ; Raslan 
Darawsheh 
Subject: [Suspected-Phishing]Re: [dpdk-dev] [PATCH v2] net/bonding: support 
bifurcated driver in eal cli using --vdev

Hi Thomas,
I will rework my patch with these suggestions and send a new version.
Thanks Declan and Gaëtan. Thank you too, Thomas, for reminding me.

Regards,
Gowrishankar

On Tuesday 05 September 2017 02:43 PM, Thomas Monjalon wrote:
> Ping - any news?
>
> 31/07/2017 16:34, Gaëtan Rivet:
>> Hi Gowrishankar, Declan,
>>
>> On Mon, Jul 10, 2017 at 12:02:24PM +0530, gowrishankar muthukrishnan wrote:
>>> On Friday 07 July 2017 09:08 PM, Declan Doherty wrote:
>>>> On 04/07/2017 12:57 PM, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan 
> 
>
> At present, creating bonding devices using --vdev is broken for 
> PMD like
> mlx5 as it is neither UIO nor VFIO based and hence PMD driver is 
> unknown to find_port_id_by_pci_addr(), as below.
>
> testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'
>
> PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port 
> value () specified
> EAL: Failed to parse slave ports for bonded device net_bonding0
>
> This patch fixes parsing PCI ID from bonding device params by 
> verifying it in RTE PCI bus, rather than checking dev->kdrv.
>
> Changes:
>   v2 - revisit fix by iterating rte_pci_bus
>
> Signed-off-by: Gowrishankar Muthukrishnan 
> 
> ---
>>>> ...
>>>> Hey Gowrishankar,
>>>>
>>>> I was having a look at this patch and there is the following
>>>> checkpatch error.
>>>>
>>>> _coding style issues_
>>>>
>>>>
>>>> WARNING:AVOID_EXTERNS: externs should be avoided in .c files
>>>> #48: FILE: drivers/net/bonding/rte_eth_bond_args.c:43:
>>>> +extern struct rte_pci_bus rte_pci_bus;

>>> Hi Declan,
>>> Thank you for your review.
>>> Yes, but I also saw some references like above in older code.
>>>
>>>> Looking a bit closer at the issue, I think there is a simpler
>>>> solution: the bonding driver really shouldn't be parsing the PCI
>>>> bus directly, and since PCI devices use the PCI BDF as their name
>>>> we can simply replace all the scanning code with a simple call
>>>> to the rte_eth_dev_get_port_by_name API.

>> I agree that it would be better to be able to use the ether API for 
>> this.
>>
>> The issue is that PCI devices are inconsistent regarding their names. 
>> The possibility is given to the user to employ the simplified BDF 
>> format for PCI device name, instead of the DomBDF format.
>>
>> Unfortunately, the default device name for a PCI device is in the 
>> DomBDF format. This means that the name won't match if the device was 
>> probed by using the PCI blacklist mode (the default PCI mode).
>>
>> The matching must be refined.
>>
>>> But you are removing the option to specify ports by PCI address,
>>> right (as I see parse_port_id() completely removed in your patch)?
>>> IMO, we just need to check whether the given eth PCI ID (in case we
>>> specify ports by PCI ID) is one of those EAL scanned on the PCI bus.
>>> Also, slaves should not come from any blacklisted PCI IDs (as we test
>>> with -b or -w).
>>>
>> Declan is right about the iteration of PCI devices. The device list 
>> for the PCI bus is private, the extern declaration to the rte_pci_bus 
>> is the telltale sign that there is something wrong in the approach here.
>>
>> In order to respect the new rte_bus logic, I think what you want to 
>> achieve can be done by using the rte_bus->find_device with the 
>> correct device comparison function.
>>
>> static int
>> pci_addr_cmp(const struct rte_device *dev, const void *_pci_addr)
>> {
>> 	struct rte_pci_device *pdev;
>> 	const char *addr = _pci_addr; /* keep the const qualifier */
>> 	struct rte_pci_addr paddr;
>> 	static struct rte_bus *pci_bus = NULL;
>>
>> 	if (pci_bus == NULL)
>> 		pci_bus = rte_bus_find_by_name("pci");
>> 	if (pci_bus == NULL)
>> 		return -1; /* PCI bus not registered. */
>>
>> 	if (pci_bus->parse(addr, &paddr) != 0) {
>> 		/* Invalid PCI addr given as input. */
>> 		return -1;
>> 	}
>> 	pdev = RTE_DEV_TO_PCI(dev);
>> 	return rte_eal_compare_pci_addr(&pdev->addr, &paddr);
>> }
>>
>> Then verify that you are able to get a device by using it as follows:
>>
>> {
>> 	struct rte_bus *pci_bus;
>> 	struct rte_device *dev;
>>
>> 	pci_bus = rte_bus_find_by_name("pci");
>> 	if (pci_bus == NULL) {
>> 		RTE_LOG(ERR, PMD, "Unable to find PCI bus\n");
>> 		return -1;
>> 	}
>> 	dev = pci_bus->find_device(NULL, pci_addr_cmp, devname);
>> 	if (dev == NULL) {
>> 		RTE_LOG(ERR, PMD, "Unable to find the device %s to enslave.\n",
>> 			devname);
>> 		return -EINVAL;
>>

Re: [dpdk-dev] [Suspected-Phishing]Re: [PATCH v2] net/bonding: support bifurcated driver in eal cli using --vdev

2017-10-02 Thread gowrishankar muthukrishnan

Hi Raslan,
I had submitted newer version and waiting for ack/merge.

dpdk.org/dev/patchwork/patch/29039/

Thanks,
Gowrishankar

On Monday 02 October 2017 02:11 PM, Raslan Darawsheh wrote:

Hi Guys,
This is a gentle reminder about this patch.
Do we have any updates on it?

Kindest regards
Raslan Darawsheh

-----Original Message-----
From: gowrishankar muthukrishnan [mailto:gowrishanka...@linux.vnet.ibm.com]
Sent: Wednesday, September 6, 2017 11:59 AM
To: Thomas Monjalon 
Cc: dev@dpdk.org; Gaëtan Rivet ; Declan Doherty 
; Ferruh Yigit ; Raslan Darawsheh 

Subject: [Suspected-Phishing]Re: [dpdk-dev] [PATCH v2] net/bonding: support 
bifurcated driver in eal cli using --vdev

Hi Thomas,
I will rework my patch with these suggestions and send a new version.
Thanks Declan and Gaëtan. Thank you too, Thomas, for reminding me.

Regards,
Gowrishankar

On Tuesday 05 September 2017 02:43 PM, Thomas Monjalon wrote:

Ping - any news?

31/07/2017 16:34, Gaëtan Rivet:

Hi Gowrishankar, Declan,

On Mon, Jul 10, 2017 at 12:02:24PM +0530, gowrishankar muthukrishnan wrote:

On Friday 07 July 2017 09:08 PM, Declan Doherty wrote:

On 04/07/2017 12:57 PM, Gowrishankar wrote:

From: Gowrishankar Muthukrishnan


At present, creating bonding devices using --vdev is broken for
PMD like
mlx5 as it is neither UIO nor VFIO based and hence PMD driver is
unknown to find_port_id_by_pci_addr(), as below.

testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'

PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port
value () specified
EAL: Failed to parse slave ports for bonded device net_bonding0

This patch fixes parsing PCI ID from bonding device params by
verifying it in RTE PCI bus, rather than checking dev->kdrv.

Changes:
   v2 - revisit fix by iterating rte_pci_bus

Signed-off-by: Gowrishankar Muthukrishnan

---

...
Hey Gowrishankar,

I was having a look at this patch and there is the following
checkpatch error.

_coding style issues_


WARNING:AVOID_EXTERNS: externs should be avoided in .c files
#48: FILE: drivers/net/bonding/rte_eth_bond_args.c:43:
+extern struct rte_pci_bus rte_pci_bus;


Hi Declan,
Thank you for your review.
Yes, but I also saw some references like above in older code.


Looking a bit closer at the issue, I think there is a simpler
solution: the bonding driver really shouldn't be parsing the PCI
bus directly, and since PCI devices use the PCI BDF as their name
we can simply replace all the scanning code with a simple call
to the rte_eth_dev_get_port_by_name API.


I agree that it would be better to be able to use the ether API for
this.

The issue is that PCI devices are inconsistent regarding their names.
The possibility is given to the user to employ the simplified BDF
format for PCI device name, instead of the DomBDF format.

Unfortunately, the default device name for a PCI device is in the
DomBDF format. This means that the name won't match if the device was
probed by using the PCI blacklist mode (the default PCI mode).

The matching must be refined.
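
One way the name matching could be refined is to normalize both names to the DomBDF form before comparing them. The helper below is a hypothetical, self-contained sketch (it is not an EAL function) and assumes a missing domain means domain 0000:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "BB:DD.F" or "DDDD:BB:DD.F" into the
 * DomBDF form "DDDD:BB:DD.F". Returns 0 on success, -1 on bad input. */
static int
pci_name_to_dombdf(const char *name, char *out, size_t outlen)
{
	unsigned int dom = 0, bus = 0, dev = 0, fn = 0;
	int used = 0;
	int n;

	assert(name != NULL && out != NULL);

	if (sscanf(name, "%x:%x:%x.%x%n", &dom, &bus, &dev, &fn, &used) == 4 &&
	    name[used] == '\0') {
		/* Already in DomBDF form. */
	} else {
		used = 0;
		if (sscanf(name, "%x:%x.%x%n", &bus, &dev, &fn, &used) != 3 ||
		    name[used] != '\0')
			return -1;
		dom = 0; /* simplified BDF: domain assumed to be 0000 */
	}
	if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 7)
		return -1;

	n = snprintf(out, outlen, "%04x:%02x:%02x.%x", dom, bus, dev, fn);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this, "04:00.0" and "0000:04:00.0" both normalize to "0000:04:00.0", so a plain strcmp() on the normalized names would match regardless of which form the user typed.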


But you are removing the option to specify ports by PCI address,
right (as I see parse_port_id() completely removed in your patch)?
IMO, we just need to check whether the given eth PCI ID (in case we
specify ports by PCI ID) is one of those EAL scanned on the PCI bus.
Also, slaves should not come from any blacklisted PCI IDs (as we test
with -b or -w).


Declan is right about the iteration of PCI devices. The device list
for the PCI bus is private, the extern declaration to the rte_pci_bus
is the telltale sign that there is something wrong in the approach here.

In order to respect the new rte_bus logic, I think what you want to
achieve can be done by using the rte_bus->find_device with the
correct device comparison function.

static int
pci_addr_cmp(const struct rte_device *dev, const void *_pci_addr)
{
	struct rte_pci_device *pdev;
	const char *addr = _pci_addr; /* keep the const qualifier */
	struct rte_pci_addr paddr;
	static struct rte_bus *pci_bus = NULL;

	if (pci_bus == NULL)
		pci_bus = rte_bus_find_by_name("pci");
	if (pci_bus == NULL)
		return -1; /* PCI bus not registered. */

	if (pci_bus->parse(addr, &paddr) != 0) {
		/* Invalid PCI addr given as input. */
		return -1;
	}
	pdev = RTE_DEV_TO_PCI(dev);
	return rte_eal_compare_pci_addr(&pdev->addr, &paddr);
}

Then verify that you are able to get a device by using it as follows:

{
	struct rte_bus *pci_bus;
	struct rte_device *dev;

	pci_bus = rte_bus_find_by_name("pci");
	if (pci_bus == NULL) {
		RTE_LOG(ERR, PMD, "Unable to find PCI bus\n");
		return -1;
	}
	dev = pci_bus->find_device(NULL, pci_addr_cmp, devname);
	if (dev == NULL) {
		RTE_LOG(ERR, PMD, "Unable to find the device %s to enslave.\n",
			devname);
		return -EINVAL;
	}
}

I hope it's clear enough. You can find examples of use for this API
in lib/librte_eal/common/eal_common_dev.c

It's a quick implementation to outline the possible direction, I
haven't compiled it. It should be refined.

For e

Re: [dpdk-dev] [PATCH v4 1/4] eventdev: Add caps API and PMD callbacks for rte_event_eth_rx_adapter

2017-10-02 Thread Jerin Jacob
-----Original Message-----
> Date: Sun, 24 Sep 2017 17:44:06 +0530
> From: "Rao, Nikhil" 
> To: Jerin Jacob 
> CC: bruce.richard...@intel.com, gage.e...@intel.com, dev@dpdk.org,
>  tho...@monjalon.net, harry.van.haa...@intel.com, hemant.agra...@nxp.com,
>  nipun.gu...@nxp.com, narender.vang...@intel.com,
>  erik.g.carri...@intel.com, abhinandan.guj...@intel.com,
>  santosh.shu...@caviumnetworks.com
> Subject: Re: [PATCH v4 1/4] eventdev: Add caps API and PMD callbacks for
>  rte_event_eth_rx_adapter
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.3.0
> 
> On 9/21/2017 9:16 PM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Fri, 22 Sep 2017 02:47:11 +0530
> > > From: Nikhil Rao 
> > > To: jerin.ja...@caviumnetworks.com, bruce.richard...@intel.com
> > > CC: gage.e...@intel.com, dev@dpdk.org, tho...@monjalon.net,
> > >   harry.van.haa...@intel.com, hemant.agra...@nxp.com, nipun.gu...@nxp.com,
> > >   narender.vang...@intel.com, erik.g.carri...@intel.com,
> > >   abhinandan.guj...@intel.com, santosh.shu...@caviumnetworks.com
> > > Subject: [PATCH v4 1/4] eventdev: Add caps API and PMD callbacks for
> > >   rte_event_eth_rx_adapter
> > > X-Mailer: git-send-email 2.7.4
> > > +/* Ethdev Rx adapter capability bitmap flags */
> > > +#define RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT   0x1
> > > +/**< Eventdev can send packets to ethdev using internal event port */
> > > +#define RTE_EVENT_ETH_RX_ADAPTER_CAP_SINGLE_EVENTQ   0x2
> > > +/**< Ethdev Rx queues can be connected to single event queue */
> > 
> > I think it is more of a limitation, yet we are expressing it as a
> > capability. How about changing it to
> > RTE_EVENT_ETH_RX_ADAPTER_CAP_MULTI_EVENTQ
> > (same as the existing !RTE_EVENT_ETH_RX_ADAPTER_CAP_SINGLE_EVENTQ; the SW
> > driver has this capability),
> > i.e. Ethdev Rx queues of a single ethdev port can be connected to multiple
> > event queues.
> > 
> OK. I agree that the MULTI_EVENTQ is better suited to be expressed as a
> capability.
> 
> > > +#define RTE_EVENT_ETH_RX_ADAPTER_CAP_FLOW_ID 0x4
> > > +/**< Ethdev Rx adapter can set flow ID for event queue, if this flag
> > > + * is unset, the application needs to provide a flow id when adding
> > 
> > Looking at the implementation, if I understand it correctly, it is not that
> > the application "needs" to provide it; rather, the application can provide
> > it. If so, I think naming it RTE_EVENT_ETH_RX_ADAPTER_CAP_SET_FLOW_ID or
> > RTE_EVENT_ETH_RX_ADAPTER_CAP_OVERRIDE_FLOW_ID makes more sense.
> > 
> If the FLOW_ID cap is not set, it is required for the application to provide
> it, else the application optionally can provide it but the feature of the
> application being able to provide (override) the flag should be a separate
> flag.
> 
> If it's only the override behavior that is required, we can rename the flag
> to OVERRIDE_FLOW_ID.

Yes. OVERRIDE_FLOW_ID behavior makes sense to me. Please update the
doxygen comments as well.
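
To illustrate how an application might consume the renamed capability bits, here is a minimal, self-contained sketch. The short macro names, the helper functions, and the reuse of bit 0x2 for MULTI_EVENTQ are assumptions made for this example; the final flag values and semantics are whatever the next patch version defines.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the flags discussed above; values are assumptions. */
#define CAP_INTERNAL_PORT    0x1u
#define CAP_MULTI_EVENTQ     0x2u
#define CAP_OVERRIDE_FLOW_ID 0x4u

/* Hypothetical check: may the Rx queues of one ethdev port be spread
 * over nb_event_queues event queues? */
static int
eventq_count_allowed(uint32_t caps, unsigned int nb_event_queues)
{
	assert(nb_event_queues > 0);
	if (nb_event_queues == 1)
		return 1;
	return (caps & CAP_MULTI_EVENTQ) != 0;
}

/* Hypothetical selection: use the application's flow id only when the
 * adapter reports the override capability and the application set one. */
static uint32_t
pick_flow_id(uint32_t caps, int app_has_flow_id, uint32_t app_flow_id,
	     uint32_t adapter_flow_id)
{
	if ((caps & CAP_OVERRIDE_FLOW_ID) != 0 && app_has_flow_id)
		return app_flow_id;
	return adapter_flow_id;
}
```

Expressing both features as positive capabilities keeps the application-side checks to a simple bit test, which is the point of the renaming suggested in this thread.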



[dpdk-dev] [PATCH v7 1/4] librte_flow_classify: add librte_flow_classify library

2017-10-02 Thread Bernard Iremonger
From: Ferruh Yigit 

The following library APIs are implemented:
rte_flow_classify_create
rte_flow_classify_validate
rte_flow_classify_destroy
rte_flow_classify_query

The following librte_table ACL APIs are used:
f_create to create a table ACL.
f_add to add an ACL rule to the table.
f_del to delete an ACL rule from the table.
f_lookup to match packets with the ACL rules.

use f_add entry data for matching

The library supports counting of IPv4 five tuple packets only,
i.e. IPv4 UDP, TCP and SCTP packets.

updated MAINTAINERS file

Signed-off-by: Ferruh Yigit 
Signed-off-by: Bernard Iremonger 
---
 MAINTAINERS|   7 +
 config/common_base |   6 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf  |   1 +
 lib/Makefile   |   3 +
 lib/librte_eal/common/include/rte_log.h|   1 +
 lib/librte_flow_classify/Makefile  |  51 ++
 lib/librte_flow_classify/rte_flow_classify.c   | 460 +
 lib/librte_flow_classify/rte_flow_classify.h   | 207 
 lib/librte_flow_classify/rte_flow_classify_parse.c | 546 +
 lib/librte_flow_classify/rte_flow_classify_parse.h |  74 +++
 .../rte_flow_classify_version.map  |  10 +
 mk/rte.app.mk  |   2 +-
 13 files changed, 1368 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_flow_classify/Makefile
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.c
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.h
 create mode 100644 lib/librte_flow_classify/rte_flow_classify_parse.c
 create mode 100644 lib/librte_flow_classify/rte_flow_classify_parse.h
 create mode 100644 lib/librte_flow_classify/rte_flow_classify_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df2a7f..4b875ad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -677,6 +677,13 @@ F: doc/guides/prog_guide/pdump_lib.rst
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Flow classify
+M: Bernard Iremonger 
+F: lib/librte_flow_classify/
+F: test/test/test_flow_classify*
+F: examples/flow_classify/
+F: doc/guides/sample_app_ug/flow_classify.rst
+F: doc/guides/prog_guide/flow_classify_lib.rst
 
 Packet Framework
 
diff --git a/config/common_base b/config/common_base
index 12f6be9..0638a37 100644
--- a/config/common_base
+++ b/config/common_base
@@ -658,6 +658,12 @@ CONFIG_RTE_LIBRTE_GRO=y
 CONFIG_RTE_LIBRTE_METER=y
 
 #
+# Compile librte_classify
+#
+CONFIG_RTE_LIBRTE_FLOW_CLASSIFY=y
+CONFIG_RTE_LIBRTE_CLASSIFY_DEBUG=n
+
+#
 # Compile librte_sched
 #
 CONFIG_RTE_LIBRTE_SCHED=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 19e0d4f..a2fa281 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -105,6 +105,7 @@ The public API headers are grouped by topics:
   [LPM IPv4 route] (@ref rte_lpm.h),
   [LPM IPv6 route] (@ref rte_lpm6.h),
   [ACL](@ref rte_acl.h),
+  [flow_classify]  (@ref rte_flow_classify.h),
   [EFD](@ref rte_efd.h)
 
 - **QoS**:
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 823554f..4e43a66 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -46,6 +46,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_efd \
   lib/librte_ether \
   lib/librte_eventdev \
+  lib/librte_flow_classify \
   lib/librte_gro \
   lib/librte_hash \
   lib/librte_ip_frag \
diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..21fc3b0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -82,6 +82,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DEPDIRS-librte_power := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DEPDIRS-librte_meter := librte_eal
+DIRS-$(CONFIG_RTE_LIBRTE_FLOW_CLASSIFY) += librte_flow_classify
+DEPDIRS-librte_flow_classify := librte_eal librte_ether librte_net
+DEPDIRS-librte_flow_classify += librte_table librte_acl librte_port
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
 DEPDIRS-librte_sched := librte_eal librte_mempool librte_mbuf librte_net
 DEPDIRS-librte_sched += librte_timer
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index ec8dba7..f975bde 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -87,6 +87,7 @@ struct rte_logs {
 #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
 #define RTE_LOGTYPE_EFD   18 /**< Log related to EFD. */
 #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
+#define RTE_LOGTYPE_CLASSIFY  20 /**< Log related to flow classify. */
 
 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USE

[dpdk-dev] [PATCH v7 0/4] flow classification library

2017-10-02 Thread Bernard Iremonger
DPDK works with packets, but some network administration tools work based on
flow information.

This library is suggested to provide a helper API to convert packet based 
information to the flow records.

Basically the library consists of APIs to validate, create and destroy a rule
and to query the stats.
Application should call the query API for all received packets.

The library header file has more comments on how library works and provided 
APIs.

Packet to flow conversion will cause a performance drop, which is why the
conversion is done on demand by an API call
provided by this library.

The initial implementation is to provide counting of IPv4 five tuple packets 
for UDP, TCP and SCTP but the
library is planned to be as generic as possible.
The flow information provided by this library is not yet sufficient to
implement full IPFIX features,
but this is planned to be the initial step.
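
As a rough illustration of what counting IPv4 five tuple packets means, the sketch below counts the packets in a burst that exactly match one rule. The types and function names are hypothetical; the library itself matches through the librte_table ACL (which also supports masks and ranges), not through this exact-match loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA protocol numbers for the three supported L4 protocols. */
enum { PROTO_TCP = 6, PROTO_UDP = 17, PROTO_SCTP = 132 };

/* Hypothetical five tuple key (host byte order assumed). */
struct five_tuple {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
};

/* Hypothetical flow record: one rule and its packet counter. */
struct flow_counter {
	struct five_tuple key;
	uint64_t packets;
};

static int
five_tuple_equal(const struct five_tuple *a, const struct five_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Add the number of matching packets in the burst to the flow counter
 * and return the number of matches found in this burst. */
static unsigned int
flow_count_burst(struct flow_counter *fc, const struct five_tuple *pkts,
		 size_t nb_pkts)
{
	unsigned int hits = 0;
	size_t i;

	assert(fc != NULL && (pkts != NULL || nb_pkts == 0));
	for (i = 0; i < nb_pkts; i++)
		if (five_tuple_equal(&fc->key, &pkts[i]))
			hits++;
	fc->packets += hits;
	return hits;
}
```

Calling such a function on every received burst corresponds to the on-demand query described above: the per-flow counter only advances when the application asks for it.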

Flows are defined using rte_flow, also measurements (actions) are provided by 
rte_flow.
To support more IPFIX measurements, the implementation may require extending 
rte_flow in addition to
extending this library.

The library uses both flows and actions defined by rte_flow.h so this library 
has a dependency on rte_flow.h.

For further steps, this library can be expanded to benefit from hardware 
filters for better performance.

It will be more beneficial to shape this library to cover more use cases,
please feel free to comment on possible other use cases and desired 
functionalities.

Changes in v7:
Fix rte_flow_classify_version.map file.
Fix checkpatch warnings.

Changes in v6:
Dropped two librte_table patches (patches 1 and 2 in v5 patch set).
Revised librte_flow_classify patch to use librte_table API's correctly.

Changes in v5:
Added tests for TCP and SCTP traffic to unit test code.
Added patch to packet_burst_generator code to add functions for TCP and SCTP 
protocols.

Changes in v4:
Replaced GET_CB_FIELD macro with get_cb_field function in the flow classify 
sample application
to fix checkpatch warning.
Fixed checkpatch warnings in test_flow_classify.c

Changes in v3:
Patch 3 from the v2 patch set has been dropped,"librte_ether: initialise IPv4 
protocol mask for rte_flow".
The flow_classify sample application is now using an input file of IPv4 five 
tuple rules instead of
hardcoded values.
A minor fix to the rte_flow_classify_create() function.

Changes in v2:
Patch 1, librte_table: move structure to header file, has been dropped.
The code has been reworked to not access struct rte_table_acl directly.
An entry_size parameter has been added to the rte_flow_classify_create function.
The f_lookup function is now called instead of the rte_acl_classify function.

Patch 2, librte_table: fix acl lookup function,  has been added.

Changes in v1, since RFC v3:
added rte_flow_classify_validate API.
librte_table ACL is used for packet matching.
a table_acl parameter has been added to all of the APIs
an error parameter has been added to all of the APIs


Bernard Iremonger (3):
  examples/flow_classify: flow classify sample application
  test: add packet burst generator functions
  test: flow classify library unit tests

Ferruh Yigit (1):
  librte_flow_classify: add librte_flow_classify library

 MAINTAINERS|   7 +
 config/common_base |   6 +
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf  |   1 +
 examples/flow_classify/Makefile|  57 ++
 examples/flow_classify/flow_classify.c | 897 +
 examples/flow_classify/ipv4_rules_file.txt |  14 +
 lib/Makefile   |   3 +
 lib/librte_eal/common/include/rte_log.h|   1 +
 lib/librte_flow_classify/Makefile  |  51 ++
 lib/librte_flow_classify/rte_flow_classify.c   | 460 +++
 lib/librte_flow_classify/rte_flow_classify.h   | 207 +
 lib/librte_flow_classify/rte_flow_classify_parse.c | 546 +
 lib/librte_flow_classify/rte_flow_classify_parse.h |  74 ++
 .../rte_flow_classify_version.map  |  10 +
 mk/rte.app.mk  |   2 +-
 test/test/Makefile |   1 +
 test/test/packet_burst_generator.c | 191 +
 test/test/packet_burst_generator.h |  22 +-
 test/test/test_flow_classify.c | 698 
 test/test/test_flow_classify.h | 240 ++
 21 files changed, 3486 insertions(+), 3 deletions(-)
 create mode 100644 examples/flow_classify/Makefile
 create mode 100644 examples/flow_classify/flow_classify.c
 create mode 100644 examples/flow_classify/ipv4_rules_file.txt
 create mode 100644 lib/librte_flow_classify/Makefile
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.c
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.h
 create mode 100644 lib

[dpdk-dev] [PATCH v7 4/4] test: flow classify library unit tests

2017-10-02 Thread Bernard Iremonger
Add flow_classify_autotest program.

Set up IPv4 ACL field definitions.
Create table_acl for use by librte_flow_classify API's.
Create an mbuf pool for use by rte_flow_classify_query.

For each of the librte_flow_classify API's:
add bad parameter tests
add bad pattern tests
add bad action tests
add good parameter tests

Initialise ipv4 udp traffic for use by the udp test for
rte_flow_classify_query.

Initialise ipv4 tcp traffic for use by the tcp test for
rte_flow_classify_query.

Initialise ipv4 sctp traffic for use by the sctp test for
rte_flow_classify_query.

Signed-off-by: Bernard Iremonger 
---
 test/test/Makefile |   1 +
 test/test/test_flow_classify.c | 698 +
 test/test/test_flow_classify.h | 240 ++
 3 files changed, 939 insertions(+)
 create mode 100644 test/test/test_flow_classify.c
 create mode 100644 test/test/test_flow_classify.h

diff --git a/test/test/Makefile b/test/test/Makefile
index 42d9a49..073e1ed 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -106,6 +106,7 @@ SRCS-y += test_table_tables.c
 SRCS-y += test_table_ports.c
 SRCS-y += test_table_combined.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_table_acl.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_flow_classify.c
 endif
 
 SRCS-y += test_rwlock.c
diff --git a/test/test/test_flow_classify.c b/test/test/test_flow_classify.c
new file mode 100644
index 000..e7fbe73
--- /dev/null
+++ b/test/test/test_flow_classify.c
@@ -0,0 +1,698 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include "test.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "packet_burst_generator.h"
+#include "test_flow_classify.h"
+
+
+#define FLOW_CLASSIFY_MAX_RULE_NUM 100
+static void *table_acl;
+static uint32_t entry_size;
+
+/*
+ * test functions by passing invalid or
+ * non-workable parameters.
+ */
+static int
+test_invalid_parameters(void)
+{
+   struct rte_flow_classify *classify;
+   int ret;
+
+   ret = rte_flow_classify_validate(NULL, NULL, NULL, NULL, NULL);
+   if (!ret) {
+   printf("Line %i: flow_classify_validate", __LINE__);
+   printf(" with NULL param should have failed!\n");
+   return -1;
+   }
+
+   classify = rte_flow_classify_create(NULL, 0, NULL, NULL, NULL, NULL);
+   if (classify) {
+   printf("Line %i: flow_classify_create", __LINE__);
+   printf(" with NULL param should have failed!\n");
+   return -1;
+   }
+
+   ret = rte_flow_classify_destroy(NULL, NULL, NULL);
+   if (!ret) {
+   printf("Line %i: flow_classify_destroy", __LINE__);
+   printf(" with NULL param should have failed!\n");
+   return -1;
+   }
+
+   ret = rte_flow_classify_query(NULL, NULL, NULL, 0, NULL, NULL);
+   if (!ret) {
+   printf("Line %i: flow_classify_query", __LINE__);
+   printf(" with NULL param should have failed!\n");
+   return -1;
+   }
+
+   ret = rte_flow_classify_validate(NULL, NULL, NULL, NULL, &error);
+   if (!ret) {
+   printf("Line %i: flow_classify_validate", __LINE__);
+   printf(" with NULL param should have failed!\n

[dpdk-dev] [PATCH v7 2/4] examples/flow_classify: flow classify sample application

2017-10-02 Thread Bernard Iremonger
The flow_classify sample application exercises the following
librte_flow_classify API's:

rte_flow_classify_create
rte_flow_classify_validate
rte_flow_classify_destroy
rte_flow_classify_query

It sets up the IPv4 ACL field definitions.
It creates table_acl and adds and deletes rules using the
librte_table API.

It uses a file of IPv4 five tuple rules for input.

Signed-off-by: Bernard Iremonger 
---
 examples/flow_classify/Makefile|  57 ++
 examples/flow_classify/flow_classify.c | 897 +
 examples/flow_classify/ipv4_rules_file.txt |  14 +
 3 files changed, 968 insertions(+)
 create mode 100644 examples/flow_classify/Makefile
 create mode 100644 examples/flow_classify/flow_classify.c
 create mode 100644 examples/flow_classify/ipv4_rules_file.txt

diff --git a/examples/flow_classify/Makefile b/examples/flow_classify/Makefile
new file mode 100644
index 000..eecdde1
--- /dev/null
+++ b/examples/flow_classify/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = flow_classify
+
+
+# all source are stored in SRCS-y
+SRCS-y := flow_classify.c
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/flow_classify/flow_classify.c b/examples/flow_classify/flow_classify.c
new file mode 100644
index 0000000..651fa8f
--- /dev/null
+++ b/examples/flow_classify/flow_classify.c
@@ -0,0 +1,897 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF 

[dpdk-dev] [PATCH v7 3/4] test: add packet burst generator functions

2017-10-02 Thread Bernard Iremonger
add initialize_tcp_header function
add initialize_sctp_header function
add initialize_ipv4_header_proto function
add generate_packet_burst_proto function

Signed-off-by: Bernard Iremonger 
---
 test/test/packet_burst_generator.c | 191 +
 test/test/packet_burst_generator.h |  22 -
 2 files changed, 211 insertions(+), 2 deletions(-)

diff --git a/test/test/packet_burst_generator.c b/test/test/packet_burst_generator.c
index a93c3b5..8f4ddcc 100644
--- a/test/test/packet_burst_generator.c
+++ b/test/test/packet_burst_generator.c
@@ -134,6 +134,36 @@
return pkt_len;
 }
 
+uint16_t
+initialize_tcp_header(struct tcp_hdr *tcp_hdr, uint16_t src_port,
+   uint16_t dst_port, uint16_t pkt_data_len)
+{
+   uint16_t pkt_len;
+
+   pkt_len = (uint16_t) (pkt_data_len + sizeof(struct tcp_hdr));
+
+   memset(tcp_hdr, 0, sizeof(struct tcp_hdr));
+   tcp_hdr->src_port = rte_cpu_to_be_16(src_port);
+   tcp_hdr->dst_port = rte_cpu_to_be_16(dst_port);
+
+   return pkt_len;
+}
+
+uint16_t
+initialize_sctp_header(struct sctp_hdr *sctp_hdr, uint16_t src_port,
+   uint16_t dst_port, uint16_t pkt_data_len)
+{
+   uint16_t pkt_len;
+
+   pkt_len = (uint16_t) (pkt_data_len + sizeof(struct sctp_hdr));
+
+   sctp_hdr->src_port = rte_cpu_to_be_16(src_port);
+   sctp_hdr->dst_port = rte_cpu_to_be_16(dst_port);
+   sctp_hdr->tag = 0;
+   sctp_hdr->cksum = 0; /* No SCTP checksum. */
+
+   return pkt_len;
+}
 
 uint16_t
 initialize_ipv6_header(struct ipv6_hdr *ip_hdr, uint8_t *src_addr,
@@ -198,7 +228,53 @@
return pkt_len;
 }
 
+uint16_t
+initialize_ipv4_header_proto(struct ipv4_hdr *ip_hdr, uint32_t src_addr,
+   uint32_t dst_addr, uint16_t pkt_data_len, uint8_t proto)
+{
+   uint16_t pkt_len;
+   unaligned_uint16_t *ptr16;
+   uint32_t ip_cksum;
+
+   /*
+* Initialize IP header.
+*/
+   pkt_len = (uint16_t) (pkt_data_len + sizeof(struct ipv4_hdr));
+
+   ip_hdr->version_ihl   = IP_VHL_DEF;
+   ip_hdr->type_of_service   = 0;
+   ip_hdr->fragment_offset = 0;
+   ip_hdr->time_to_live   = IP_DEFTTL;
+   ip_hdr->next_proto_id = proto;
+   ip_hdr->packet_id = 0;
+   ip_hdr->total_length   = rte_cpu_to_be_16(pkt_len);
+   ip_hdr->src_addr = rte_cpu_to_be_32(src_addr);
+   ip_hdr->dst_addr = rte_cpu_to_be_32(dst_addr);
+
+   /*
+* Compute IP header checksum.
+*/
+   ptr16 = (unaligned_uint16_t *)ip_hdr;
+   ip_cksum = 0;
+   ip_cksum += ptr16[0]; ip_cksum += ptr16[1];
+   ip_cksum += ptr16[2]; ip_cksum += ptr16[3];
+   ip_cksum += ptr16[4];
+   ip_cksum += ptr16[6]; ip_cksum += ptr16[7];
+   ip_cksum += ptr16[8]; ip_cksum += ptr16[9];
 
+   /*
+* Reduce 32 bit checksum to 16 bits and complement it.
+*/
+   ip_cksum = ((ip_cksum & 0xFFFF0000) >> 16) +
+   (ip_cksum & 0x0000FFFF);
+   ip_cksum %= 65536;
+   ip_cksum = (~ip_cksum) & 0x0000FFFF;
+   if (ip_cksum == 0)
+   ip_cksum = 0xFFFF;
+   ip_hdr->hdr_checksum = (uint16_t) ip_cksum;
+
+   return pkt_len;
+}
 
 /*
  * The maximum number of segments per packet is used when creating
@@ -283,3 +359,118 @@
 
return nb_pkt;
 }
+
+int
+generate_packet_burst_proto(struct rte_mempool *mp,
+   struct rte_mbuf **pkts_burst,
+   struct ether_hdr *eth_hdr, uint8_t vlan_enabled, void *ip_hdr,
+   uint8_t ipv4, uint8_t proto, void *proto_hdr,
+   int nb_pkt_per_burst, uint8_t pkt_len, uint8_t nb_pkt_segs)
+{
+   int i, nb_pkt = 0;
+   size_t eth_hdr_size;
+
+   struct rte_mbuf *pkt_seg;
+   struct rte_mbuf *pkt;
+
+   for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
+   pkt = rte_pktmbuf_alloc(mp);
+   if (pkt == NULL) {
+nomore_mbuf:
+   if (nb_pkt == 0)
+   return -1;
+   break;
+   }
+
+   pkt->data_len = pkt_len;
+   pkt_seg = pkt;
+   for (i = 1; i < nb_pkt_segs; i++) {
+   pkt_seg->next = rte_pktmbuf_alloc(mp);
+   if (pkt_seg->next == NULL) {
+   pkt->nb_segs = i;
+   rte_pktmbuf_free(pkt);
+   goto nomore_mbuf;
+   }
+   pkt_seg = pkt_seg->next;
+   pkt_seg->data_len = pkt_len;
+   }
+   pkt_seg->next = NULL; /* Last segment of packet. */
+
+   /*
+* Copy headers in first packet segment(s).
+*/
+   if (vlan_enabled)
+   eth_hdr_size = sizeof(struct ether_hdr) +
+   sizeof(struct vlan_hdr);
+   else
+   eth_hdr_size = sizeo

Re: [dpdk-dev] [PATCH v3 3/3] efd: run-time dispatch over x86 EFD functions

2017-10-02 Thread Ananyev, Konstantin

> 
> >
> > This patch dynamically selects x86 EFD functions at run-time.
> 
> I don't think it really does.
> In fact, I am not sure that we need to touch EFD at all here -
> from what I can see, it already does dynamic selection properly.

Actually I was wrong here - in some cases it doesn't work properly.
As far as I can see, for the default target the proper AVX2 code wouldn't be compiled.
So some work is still needed here - same as for memcpy().
Konstantin


> Konstantin
> 
> > This patch uses function pointer and binds it to the relative
> > function based on CPU flags at constructor time.
> >
> > Signed-off-by: Xiaoyun Li 
> > ---
> >  lib/librte_efd/rte_efd_x86.h | 41 ++---
> >  1 file changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_efd/rte_efd_x86.h b/lib/librte_efd/rte_efd_x86.h
> > index 34f37d7..93b6743 100644
> > --- a/lib/librte_efd/rte_efd_x86.h
> > +++ b/lib/librte_efd/rte_efd_x86.h
> > @@ -43,12 +43,29 @@
> >  #define EFD_LOAD_SI128(val) _mm_lddqu_si128(val)
> >  #endif
> >
> > +typedef efd_value_t
> > +(*efd_lookup_internal_avx2_t)(const efd_hashfunc_t *group_hash_idx,
> > +   const efd_lookuptbl_t *group_lookup_table,
> > +   const uint32_t hash_val_a, const uint32_t hash_val_b);
> > +
> > +static efd_lookup_internal_avx2_t efd_lookup_internal_avx2_ptr;
> > +
> >  static inline efd_value_t
> >  efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
> > const efd_lookuptbl_t *group_lookup_table,
> > const uint32_t hash_val_a, const uint32_t hash_val_b)
> >  {
> > -#ifdef RTE_MACHINE_CPUFLAG_AVX2
> > +   return (*efd_lookup_internal_avx2_ptr)(group_hash_idx,
> > +  group_lookup_table,
> > +  hash_val_a, hash_val_b);
> > +}
> > +
> > +#ifdef CC_SUPPORT_AVX2
> > +static inline efd_value_t
> > +efd_lookup_internal_avx2_AVX2(const efd_hashfunc_t *group_hash_idx,
> > +   const efd_lookuptbl_t *group_lookup_table,
> > +   const uint32_t hash_val_a, const uint32_t hash_val_b)
> > +{
> > efd_value_t value = 0;
> > uint32_t i = 0;
> > __m256i vhash_val_a = _mm256_set1_epi32(hash_val_a);
> > @@ -74,13 +91,31 @@ efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
> > }
> >
> > return value;
> > -#else
> > +}
> > +#endif
> > +
> > +static inline efd_value_t
> > +efd_lookup_internal_avx2_DEFAULT(const efd_hashfunc_t *group_hash_idx,
> > +   const efd_lookuptbl_t *group_lookup_table,
> > +   const uint32_t hash_val_a, const uint32_t hash_val_b)
> > +{
> > RTE_SET_USED(group_hash_idx);
> > RTE_SET_USED(group_lookup_table);
> > RTE_SET_USED(hash_val_a);
> > RTE_SET_USED(hash_val_b);
> > /* Return dummy value, only to avoid compilation breakage */
> > return 0;
> > -#endif
> > +}
> >
> > +static void __attribute__((constructor))
> > +rte_efd_x86_init(void)
> > +{
> > +#ifdef CC_SUPPORT_AVX2
> > +   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> > +   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_AVX2;
> > +   else
> > +   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
> > +#else
> > +   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
> > +#endif
> >  }
> > --
> > 2.7.4
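
For readers following the thread, the constructor-time dispatch pattern under discussion can be sketched roughly as below. This is only an illustration, not the EFD code: lookup_fn_t, lookup_scalar, lookup_avx2, pick_lookup and lookup are hypothetical names, the "AVX2" variant contains no intrinsics, and the GCC/clang builtin __builtin_cpu_supports() stands in for rte_cpu_get_flag_enabled().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup signature, standing in for the EFD lookup function. */
typedef uint32_t (*lookup_fn_t)(uint32_t hash_a, uint32_t hash_b);

/* Portable fallback; always safe to call. */
static uint32_t
lookup_scalar(uint32_t hash_a, uint32_t hash_b)
{
	return hash_a ^ hash_b;
}

/*
 * Stand-in for the AVX2 variant. A real one would have to be built with
 * AVX2 enabled (even for the default target) and use _mm256_* intrinsics;
 * here it only has to return the same result as the fallback.
 */
static uint32_t
lookup_avx2(uint32_t hash_a, uint32_t hash_b)
{
	return hash_a ^ hash_b;
}

/* Pure selection step, kept separate so it is easy to check. */
static lookup_fn_t
pick_lookup(int has_avx2)
{
	return has_avx2 ? lookup_avx2 : lookup_scalar;
}

/* Starts at the fallback, so a call before the constructor runs is safe. */
static lookup_fn_t lookup_ptr = lookup_scalar;

/* Runs before main(): bind the pointer once, based on the running CPU. */
static void __attribute__((constructor))
lookup_init(void)
{
	int has_avx2 = 0;
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	has_avx2 = __builtin_cpu_supports("avx2");
#endif
	lookup_ptr = pick_lookup(has_avx2);
}

/* Public entry point: one indirect call per lookup. */
static inline uint32_t
lookup(uint32_t hash_a, uint32_t hash_b)
{
	assert(lookup_ptr != NULL);
	return lookup_ptr(hash_a, hash_b);
}
```

The point raised above still applies to a sketch like this: the run-time check only helps if the AVX2 body was actually compiled with AVX2 support, which is a build-system question rather than a dispatch question.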



[dpdk-dev] [PATCH v7 1/2] net/i40e: get information about protocols defined in ddp profile

2017-10-02 Thread Kirill Rybalchenko
This patch adds new package info types to get the lists of protocols,
pctypes and ptypes defined in a profile.

---
v3
info_size parameter always represents size of the info buffer in bytes

v6
fix bug with wrong usage of info_size parameter

v7
change misleading variable names, change order of checking variable
for zero value

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/rte_pmd_i40e.c | 172 
 drivers/net/i40e/rte_pmd_i40e.h |  25 ++
 2 files changed, 197 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index c08e07a..8289a43 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -1706,6 +1706,26 @@ rte_pmd_i40e_process_ddp_package(uint8_t port, uint8_t *buff,
return status;
 }
 
+/* Get number of tlv records in the section */
+static unsigned int
+i40e_get_tlv_section_size(struct i40e_profile_section_header *sec)
+{
+   unsigned int i, nb_rec, nb_tlv = 0;
+   struct i40e_profile_tlv_section_record *tlv;
+
+   if (!sec)
+   return nb_tlv;
+
+   /* get number of records in the section */
+   nb_rec = sec->section.size / sizeof(struct i40e_profile_tlv_section_record);
+   for (i = 0; i < nb_rec; ) {
+   tlv = (struct i40e_profile_tlv_section_record *)&sec[1 + i];
+   i += tlv->len;
+   nb_tlv++;
+   }
+   return nb_tlv;
+}
+
 int rte_pmd_i40e_get_ddp_info(uint8_t *pkg_buff, uint32_t pkg_size,
uint8_t *info_buff, uint32_t info_size,
enum rte_pmd_i40e_package_info type)
@@ -1860,6 +1880,158 @@ int rte_pmd_i40e_get_ddp_info(uint8_t *pkg_buff, uint32_t pkg_size,
return I40E_SUCCESS;
}
 
+   /* get number of protocols */
+   if (type == RTE_PMD_I40E_PKG_INFO_PROTOCOL_NUM) {
+   struct i40e_profile_section_header *proto;
+
+   if (info_size < sizeof(uint32_t)) {
+   PMD_DRV_LOG(ERR, "Invalid information buffer size");
+   return -EINVAL;
+   }
+   proto = i40e_find_section_in_profile(SECTION_TYPE_PROTO,
+(struct i40e_profile_segment *)
+i40e_seg_hdr);
+   *(uint32_t *)info_buff = i40e_get_tlv_section_size(proto);
+   return I40E_SUCCESS;
+   }
+
+   /* get list of protocols */
+   if (type == RTE_PMD_I40E_PKG_INFO_PROTOCOL_LIST) {
+   uint32_t i, j, nb_tlv, nb_rec, nb_proto_info;
+   struct rte_pmd_i40e_proto_info *pinfo;
+   struct i40e_profile_section_header *proto;
+   struct i40e_profile_tlv_section_record *tlv;
+
+   pinfo = (struct rte_pmd_i40e_proto_info *)info_buff;
+   nb_proto_info = info_size / sizeof(struct rte_pmd_i40e_proto_info);
+   for (i = 0; i < nb_proto_info; i++) {
+   pinfo[i].proto_id = RTE_PMD_I40E_PROTO_UNUSED;
+   memset(pinfo[i].name, 0, RTE_PMD_I40E_DDP_NAME_SIZE);
+   }
+   proto = i40e_find_section_in_profile(SECTION_TYPE_PROTO,
+(struct i40e_profile_segment *)
+i40e_seg_hdr);
+   nb_tlv = i40e_get_tlv_section_size(proto);
+   if (nb_tlv == 0)
+   return I40E_SUCCESS;
+   if (nb_proto_info < nb_tlv) {
+   PMD_DRV_LOG(ERR, "Invalid information buffer size");
+   return -EINVAL;
+   }
+   /* get number of records in the section */
+   nb_rec = proto->section.size /
+   sizeof(struct i40e_profile_tlv_section_record);
+   tlv = (struct i40e_profile_tlv_section_record *)&proto[1];
+   for (i = j = 0; i < nb_rec; j++) {
+   pinfo[j].proto_id = tlv->data[0];
+   strncpy(pinfo[j].name, (const char *)&tlv->data[1],
+   I40E_DDP_NAME_SIZE);
+   i += tlv->len;
+   tlv = &tlv[tlv->len];
+   }
+   return I40E_SUCCESS;
+   }
+
+   /* get number of packet classification types */
+   if (type == RTE_PMD_I40E_PKG_INFO_PCTYPE_NUM) {
+   struct i40e_profile_section_header *pctype;
+
+   if (info_size < sizeof(uint32_t)) {
+   PMD_DRV_LOG(ERR, "Invalid information buffer size");
+   return -EINVAL;
+   }
+   pctype = i40e_find_section_in_profile(SECTION_TYPE_PCTYPE,
+ (struct i40e_profile_segment *)
+ i40e_seg_hdr);
+   *(uint32_t *)info_

[dpdk-dev] [PATCH v7 0/2] net/i40e: get information about protocols defined in ddp profile

2017-10-02 Thread Kirill Rybalchenko
This patch adds the ability to request information about the protocols
defined in a dynamic device personalization (DDP) profile.

v2:
Some code style warnings were removed

v3:
info_size parameter always represents size of the info buffer in bytes;
fix code style;

v4:
more code style fixes

v5:
in testpmd buff_size parameter in rte_pmd_i40e_get_ddp_info function call
always represents buffer size in bytes

v6:
fix bug with wrong usage of buff_size parameter

v7:
change misleading variable names, change order of checking variable
for zero value
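
The "size in bytes" convention from the v3/v5/v6 notes can be illustrated with a small two-step query: ask for the record count first, then pass a list buffer whose size is given in bytes. This is a sketch under stated assumptions, not the driver code: get_info, query_protocols, struct proto_info, enum info_type and the profile[] table are hypothetical stand-ins for rte_pmd_i40e_get_ddp_info and its types.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type and query kinds, for illustration only. */
struct proto_info {
	uint8_t proto_id;
	char name[32];
};

enum info_type { INFO_PROTO_NUM, INFO_PROTO_LIST };

/* Hypothetical data source standing in for a parsed profile. */
static const struct proto_info profile[] = {
	{ 17, "UDP" }, { 6, "TCP" }, { 132, "SCTP" },
};
#define PROFILE_LEN (sizeof(profile) / sizeof(profile[0]))

/* info_size is always the size of info_buff in bytes, whatever the type. */
static int
get_info(uint8_t *info_buff, uint32_t info_size, enum info_type type)
{
	if (type == INFO_PROTO_NUM) {
		uint32_t n = PROFILE_LEN;

		if (info_size < sizeof(n))
			return -EINVAL;
		memcpy(info_buff, &n, sizeof(n));
		return 0;
	}

	/* INFO_PROTO_LIST: convert bytes to a record count before comparing. */
	if (info_size / sizeof(struct proto_info) < PROFILE_LEN)
		return -EINVAL;
	memcpy(info_buff, profile, sizeof(profile));
	return 0;
}

/* Caller side: ask for the count, then size the list buffer in bytes. */
static int
query_protocols(struct proto_info *out, uint32_t max_records, uint32_t *count)
{
	int ret;

	ret = get_info((uint8_t *)count, sizeof(*count), INFO_PROTO_NUM);
	if (ret != 0)
		return ret;
	if (*count > max_records)
		return -ENOSPC;
	return get_info((uint8_t *)out,
			(uint32_t)(*count * sizeof(*out)), INFO_PROTO_LIST);
}
```

Mixing up "number of records" and "number of bytes" on either side of such a call is the kind of mistake the byte-size rule is meant to rule out.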

Kirill Rybalchenko (2):
  net/i40e: get information about protocols defined in ddp profile
  app/testpmd: get information about protocols defined in ddp profile

 app/test-pmd/cmdline.c  | 120 ++--
 drivers/net/i40e/rte_pmd_i40e.c | 172 
 drivers/net/i40e/rte_pmd_i40e.h |  25 ++
 3 files changed, 310 insertions(+), 7 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH v7 2/2] app/testpmd: get information about protocols defined in ddp profile

2017-10-02 Thread Kirill Rybalchenko
Update 'ddp get info' command to display protocols defined in a profile

v5
buff_size parameter in rte_pmd_i40e_get_ddp_info function call
always represents buffer size in bytes

v6
fix bug with wrong usage of buff_size parameter

Signed-off-by: Kirill Rybalchenko 
---
 app/test-pmd/cmdline.c | 120 ++---
 1 file changed, 113 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f2d731..dfca164 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -13427,12 +13427,20 @@ cmd_ddp_info_parsed(
uint32_t pkg_size;
int ret = -ENOTSUP;
 #ifdef RTE_LIBRTE_I40E_PMD
-   uint32_t i;
+   uint32_t i, j, n;
uint8_t *buff;
-   uint32_t buff_size;
+   uint32_t buff_size = 0;
struct rte_pmd_i40e_profile_info info;
-   uint32_t dev_num;
+   uint32_t dev_num = 0;
struct rte_pmd_i40e_ddp_device_id *devs;
+   uint32_t proto_num = 0;
+   struct rte_pmd_i40e_proto_info *proto;
+   uint32_t pctype_num = 0;
+   struct rte_pmd_i40e_ptype_info *pctype;
+   uint32_t ptype_num = 0;
+   struct rte_pmd_i40e_ptype_info *ptype;
+   uint8_t proto_id;
+
 #endif
 
pkg = open_ddp_package_file(res->filepath, &pkg_size);
@@ -13485,12 +13493,11 @@ cmd_ddp_info_parsed(
(uint8_t *)&dev_num, sizeof(dev_num),
RTE_PMD_I40E_PKG_INFO_DEVID_NUM);
if (!ret && dev_num) {
-   devs = (struct rte_pmd_i40e_ddp_device_id *)malloc(dev_num *
-   sizeof(struct rte_pmd_i40e_ddp_device_id));
+   buff_size = dev_num * sizeof(struct rte_pmd_i40e_ddp_device_id);
+   devs = (struct rte_pmd_i40e_ddp_device_id *)malloc(buff_size);
if (devs) {
ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
-   (uint8_t *)devs, dev_num *
-   sizeof(struct rte_pmd_i40e_ddp_device_id),
+   (uint8_t *)devs, buff_size,

RTE_PMD_I40E_PKG_INFO_DEVID_LIST);
if (!ret) {
printf("List of supported devices:\n");
@@ -13506,8 +13513,107 @@ cmd_ddp_info_parsed(
free(devs);
}
}
+
+   /* get information about protocols and packet types */
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&proto_num, sizeof(proto_num),
+   RTE_PMD_I40E_PKG_INFO_PROTOCOL_NUM);
+   if (ret || !proto_num)
+   goto no_print_return;
+
+   buff_size = proto_num * sizeof(struct rte_pmd_i40e_proto_info);
+   proto = (struct rte_pmd_i40e_proto_info *)malloc(buff_size);
+   if (!proto)
+   goto no_print_return;
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size, (uint8_t *)proto, buff_size,
+   RTE_PMD_I40E_PKG_INFO_PROTOCOL_LIST);
+   if (!ret) {
+   printf("List of used protocols:\n");
+   for (i = 0; i < proto_num; i++)
+   printf("  %2u: %s\n", proto[i].proto_id,
+  proto[i].name);
+   printf("\n");
+   }
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size,
+   (uint8_t *)&pctype_num, sizeof(pctype_num),
+   RTE_PMD_I40E_PKG_INFO_PCTYPE_NUM);
+   if (ret || !pctype_num)
+   goto no_print_pctypes;
+
+   buff_size = pctype_num * sizeof(struct rte_pmd_i40e_ptype_info);
+   pctype = (struct rte_pmd_i40e_ptype_info *)malloc(buff_size);
+   if (!pctype)
+   goto no_print_pctypes;
+
+   ret = rte_pmd_i40e_get_ddp_info(pkg, pkg_size, (uint8_t *)pctype, buff_size,
+   RTE_PMD_I40E_PKG_INFO_PCTYPE_LIST);
+   if (ret) {
+   free(pctype);
+   goto no_print_pctypes;
+   }
+
+   printf("List of defined packet classification types:\n");
+   for (i = 0; i < pctype_num; i++) {
+   printf("  %2u:", pctype[i].ptype_id);
+   for (j = 0; j < RTE_PMD_I40E_PROTO_NUM; j++) {
+   proto_id = pctype[i].protocols[j];
+   if (proto_id != RTE_PMD_I40E_PROTO_UNUSED) {
+   for (n = 0; n < proto_num; n++) {
+   if (proto[n].proto_id == proto_id) {
+   printf(" %s", proto[n].name);
+   break;
+   }
+   }
+   }
+   }
+   printf("\n");
+   }
+   printf("\n");
+   free(pctype);
+
+no_print_pctypes:
+
+   ret = rte_pmd_i40e_get

[dpdk-dev] [PATCH v2] doc: add use of mlockall to programmers guide

2017-10-02 Thread Eelco Chaudron
When I was adding mlockall() to the testpmd application it was
suggested to add a reference to the use case of mlockall(). This patch
adds it.

Signed-off-by: Eelco Chaudron 
---
 doc/guides/prog_guide/writing_efficient_code.rst | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
index 8223aceea..d7ac6778b 100644
--- a/doc/guides/prog_guide/writing_efficient_code.rst
+++ b/doc/guides/prog_guide/writing_efficient_code.rst
@@ -105,6 +105,21 @@ meaning that if all memory access operations are done on the first channel only,
 
 By default, the  :ref:`Mempool Library ` spreads the 
addresses of objects among memory channels.
 
+Locking memory pages
+--------------------
+
+The underlying operating system is allowed to load/unload memory pages at its own discretion.
+These page loads could impact the performance, as the process is on hold when the kernel fetches them.
+
+To avoid these you could pre-load, and lock them into memory with the ``mlockall()`` call.
+
+.. code-block:: c
+
+    if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+        RTE_LOG(NOTICE, USER1, "mlockall() failed with error \"%s\"\n",
+                strerror(errno));
+    }
+
 Communication Between lcores
 ----------------------------
 
-- 
2.13.6



Re: [dpdk-dev] [PATCH] checkpatch: re-enable warnings about split long strings

2017-10-02 Thread Luca Boccassi
On Fri, 2017-09-29 at 08:37 -0700, Stephen Hemminger wrote:
> The Linux kernel style policy about strings is that strings should
> be always put on one line. This makes sense since a typical use
> case is for a user to type the error message into a search engine
> or grep, and it won't be found if split across lines.
> This patch just re-enables that check.
> 
> Yes, lots of DPDK code now splits strings, that doesn't
> make it right.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  devtools/checkpatches.sh | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index a56c41a301c0..3e6081dd673e 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -44,7 +44,6 @@ options="$options --show-types"
>  options="$options --ignore=LINUX_VERSION_CODE,FILE_PATH_CHANGES,\
>  VOLATILE,PREFER_PACKED,PREFER_ALIGNED,PREFER_PRINTF,\
>  PREFER_KERNEL_TYPES,BIT_MACRO,CONST_STRUCT,\
> -SPLIT_STRING,LONG_LINE_STRING,\
>  LINE_SPACING,PARENTHESIS_ALIGNMENT,NETWORKING_BLOCK_COMMENT_STYLE,\
>  NEW_TYPEDEFS,COMPARISON_TO_NULL"
>  

Acked-by: Luca Boccassi 

+1 - being able to reliably Google/grep across such a large code base
is extremely useful
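
A small illustration of why this matters, as a sketch only (the message text and the function names msg_split, msg_single and messages_match are made up for the example): the compiler joins adjacent string literals, so the user sees one message, but only the single-line form can be found by searching the source for that message.

```c
#include <string.h>

/* Literal split across source lines: the compiler joins the pieces. */
static const char *
msg_split(void)
{
	return "port %u: failed to "
	       "allocate rx queue\n";
}

/* Same literal kept on one line, even though the line runs long. */
static const char *
msg_single(void)
{
	return "port %u: failed to allocate rx queue\n";
}

/* Returns 1: at run time the two messages are indistinguishable. */
static int
messages_match(void)
{
	return strcmp(msg_split(), msg_single()) == 0;
}
```

A grep for "failed to allocate rx queue" would match only msg_single() in the source, even though both functions produce exactly that text.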

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH v4 1/7] member: implement main API

2017-10-02 Thread De Lara Guarch, Pablo
Hi Yipeng,

> -Original Message-
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:40 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tai, Charlie ; Gobriel,
> Sameh ; De Lara Guarch, Pablo
> ; Mcnamara, John
> ; Wang, Yipeng1 
> Subject: [PATCH v4 1/7] member: implement main API
> 
> Membership library is an extension and generalization of a traditional filter
> (for example Bloom Filter) structure. In general, the Membership library is a
> data structure that provides a "set-summary" and responds to set-
> membership queries of whether a certain element belongs to a set(s). A
> membership test for an element will return the set this element belongs to
> or not-found if the element is never inserted into the set-summary.
> 
> The results of the membership test are not 100% accurate. Certain false
> positive or false negative probability could exist. However, comparing to a
> "full-blown" complete list of elements, a "set-summary"
> is memory efficient and fast on lookup.
> 
> This patch adds the main API definition.
> 
> Signed-off-by: Yipeng Wang 

A few comments on changes that you didn't make in the v4.

Thanks,
Pablo

...

> +
> +struct rte_member_setsum *
> +rte_member_create(const struct rte_member_parameters *params) {
> + struct rte_tailq_entry *te;
> + struct rte_member_list *member_list;
> + struct rte_member_setsum *setsum;
> + int ret;
> +
> + if (params == NULL) {
> + rte_errno = EINVAL;
> + return NULL;
> + }
> +
> + if (params->key_len == 0 ||
> + params->prim_hash_seed == params->sec_hash_seed) {
> + rte_errno = EINVAL;
> + RTE_MEMBER_LOG(ERR, "Memship create with invalid parameters\n");

Do not use "Memship". Change to "rte_member_create has invalid parameters"?
Or something else that you want, but not Memship.

> + return NULL;
> + }
> +

...

> +struct rte_member_parameters {
> + const char *name;   /**< Name of the hash. */
> +
> + /**
> +  * User to specify the type of the setsummary from one of
> +  * rte_member_setsum_type.
> +  *
> +  * HT based setsummary is implemented like a hash table. User should use
> +  * this type when there are many sets.
> +  *
> +  * vBF setsummary is a vector of bloom filters. It is used when number
> +  * of sets is not big (less than 32 for current implementation).
> +  */
> + enum rte_member_setsum_type type;
> +
> + /**
> +  * If it is HT based setsummary, user to specify the subtype or mode
> +  * of the setsummary. It could be cache, or non-cache mode.
> +  * Set iscache to be 1 if to use as cache mode.

Change to "is_cache".

> +  *
> +  * For cache mode, keys can be evicted out of the HT setsummary. Keys
> +  * with the same signature and map to the same bucket
> +  * will overwrite each other in the setsummary table.
> +  * This mode is useful for the case that the set-summary only
> +  * needs to keep record of the recently inserted keys. Both
> +  * false-negative and false-positive could happen.
> +  *
> +  * For non-cache mode, keys cannot be evicted out of the cache. So for
> +  * this mode the setsummary will become full eventually. Keys with the
> +  * same signature but map to the same bucket will still occupy multiple
> +  * entries. This mode does not give false-negative result.
> +  */
> + uint8_t is_cache;
> +
> + /**
> +  * For HT setsummary, num_keys equals to the number of entries of the
> +  * table. When the number of keys that inserted in the HT setsummary

"number of keys inserted in the HT summary". Or "were inserted".

> +  * approaches this number, eviction could happen. For cache mode,
> +  * keys could be evicted out of the table. For non-cache mode, keys will

...

> + /**
> +  * false_positive_rate is only relevant to vBF based setsummary.
> +  * false_positive_rate is the user-defined false positive rate
> +  * given expected number of inserted keys (num_keys). It is used to
> +  * calculate the total number of bits for each BF, and the number of
> +  * hash values used during lookup and insertion. For details please
> +  * refer to vBF implementation and membership library documentation.
> +  * Note that this parameter is not directly set by users for HT mode.
> +  *
> +  * HT setsummary's false positive rate is in the order of:
> +  * false_pos = (1/bucket_count)*(1/2^16), since we use 16-bit signature.
> +  * This is because two keys needs to map to same bucket and same
> +  * signature to have a collision (false positive). bucket_count is equal
> +  * to number of entries (num_keys) divided by entry count per bucket
> +  * (RTE_MEMBER_BUCKET_ENTRIES). Thus, the false_positive_rate is not
> +  * directly set by users.

Unless I und

Re: [dpdk-dev] [PATCH v2 07/12] eal: add channel for primary/secondary communication

2017-10-02 Thread Burakov, Anatoly

On 30-Sep-17 5:07 AM, Tan, Jianfeng wrote:



On 9/29/2017 6:00 PM, Burakov, Anatoly wrote:

On 29-Sep-17 2:03 AM, Tan, Jianfeng wrote:

+ Reshma and Jan.


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Burakov, Anatoly
Sent: Thursday, September 28, 2017 11:30 PM
To: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 07/12] eal: add channel for
primary/secondary communication

On 28-Sep-17 4:01 PM, Ananyev, Konstantin wrote:

Hi Jianfeng,



-Original Message-
From: Tan, Jianfeng
Sent: Thursday, September 28, 2017 2:56 PM
To: dev@dpdk.org
Cc: Richardson, Bruce ; Ananyev,

Konstantin ; De Lara Guarch, Pablo

; tho...@monjalon.net;

y...@fridaylinux.org; maxime.coque...@redhat.com; mtetsu...@gmail.com;

Yigit, Ferruh ; Tan, Jianfeng



Subject: [PATCH v2 07/12] eal: add channel for primary/secondary

communication


Previously, there was only one way for primary/secondary to exchange
messages: the primary process writes info into some predefined
file, and the secondary process reads the info out. That cannot address
these requirements:
    a. Secondary wants to send info to primary, for example, secondary
       would like to send a request (about some specific vdev) to primary.
    b. Sending info at any time, instead of just at initialization time.
    c. Sharing FDs with the other side; for a vdev like vhost, related FDs
       (memory region, kick) should be shared.

This patch proposes to create a communication channel, as a unix
socket connection, for the above requirements. Primary will listen on
the unix socket; secondary will connect to this socket to talk.

Three new APIs are added:

    1. rte_eal_mp_action_register is used to register an action,
       indexed by a string, if the calling component wants to
       respond to the messages from the corresponding component in
       its primary process or secondary processes.
    2. rte_eal_mp_action_unregister is used to unregister the action
       if the calling component does not want to respond to the messages.
    3. rte_eal_mp_sendmsg is used to send a message.


I think we already have a similar channel in librte_pdump().
Also it seems like eal_vfio also has its own socket to communicate
between mp/sp.
Could we probably make it generic - so the same code (and socket) can be
used by all such places.

Konstantin



Agreed, however looking at this, it's already a generic-enough solution,
and other places could be fixed to use this in later releases.


Yes, to provide a generic communication way, instead of one channel 
for each subsystem, is the goal of this patch.


Reshma and Jan, can I ask comment from you to have a look if the way 
of this patch is generic enough to cover pdump and vfio-sync's 
requirement?


Possible limitation of this patch (so far) is that it only provides 
an async way for request/response, do we need synchronous way?


That said, i believe this particular part of the patchset should go 
in as a

separate patchset and more design consideration and input from others.


OK, let's collect more info here, and then take out this patch out as 
a separate patch.


Thanks,
Jianfeng


Hi Jianfeng,

Yes, i believe VFIO does need synchronous communication, because it 
relies on passing fd's from primary to secondary, on request. It could 
be rewritten to be asynchronous, similarly to how you handle vdevs 
here, but as it stands, it assumes synchronous communication.




Good to know, thanks Anatoly. Even if it can be rewritten to work in an
async way, do we need to propose a sync way now?


Thanks,
Jianfeng



I believe that we do, because we can't assume that everything can be 
rewritten to be asynchronous. I'm open to other opinions though :)
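
To make "synchronous" concrete, here is a rough sketch of a blocking request/response over a unix socket pair. It is not the proposed rte_eal_mp_* API and it does not pass file descriptors (that would need SCM_RIGHTS ancillary data); sync_request and demo_roundtrip are hypothetical names, and the demo runs both ends in one process by queueing the reply in advance.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send a request and block until the peer's reply arrives.
 * Returns the reply length, or -1 on error.
 */
static ssize_t
sync_request(int fd, const void *req, size_t req_len,
	     void *resp, size_t resp_len)
{
	if (send(fd, req, req_len, 0) != (ssize_t)req_len)
		return -1;
	/* Blocking receive: the caller cannot continue without the answer. */
	return recv(fd, resp, resp_len, 0);
}

/*
 * Single-process demo: the "primary" end queues its reply ahead of time so
 * the blocking recv() in sync_request() returns at once.
 * Returns 0 on success, -1 otherwise.
 */
static int
demo_roundtrip(void)
{
	int sv[2];
	char req[16] = {0}, resp[16] = {0};
	int ok;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* primary side (sv[1]): reply queued in advance, 8 chars plus NUL */
	ok = send(sv[1], "fd-ready", 9, 0) == 9;
	/* secondary side (sv[0]): send request, then wait for the reply */
	ok = ok && sync_request(sv[0], "get-fd", 7, resp, sizeof(resp)) == 9;
	/* primary side picks up the request */
	ok = ok && recv(sv[1], req, sizeof(req), 0) == 7;
	ok = ok && strcmp(req, "get-fd") == 0 && strcmp(resp, "fd-ready") == 0;

	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

An asynchronous design would return right after send() and deliver the reply later through a registered callback, which is the part existing callers such as VFIO would have to be restructured around.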


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v2] eal/x86: implement x86 specific tsc hz

2017-10-02 Thread Sergio Gonzalez Monroy
First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
Clock Information Leaf to determine the tsc hz on platforms that
support it (does not require privileged user).

If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).

Default to the tsc hz estimation if both methods fail.
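
As a rough sketch of the first step, the TSC frequency follows from the three values CPUID leaf 0x15 returns: crystal Hz (ECX) times the TSC/crystal ratio (EBX/EAX). The helper name tsc_hz_from_leaf15 is hypothetical, and multiplying before dividing is a choice made in this sketch (it differs from the integer b / a division in the patch excerpt) so that a non-integer ratio is not truncated.

```c
#include <stdint.h>

/*
 * TSC frequency from CPUID leaf 0x15 register values:
 *   eax = denominator of the TSC/crystal ratio
 *   ebx = numerator of the TSC/crystal ratio
 *   ecx = nominal crystal clock frequency in Hz
 * Returns 0 when the leaf does not enumerate a usable value, so the caller
 * can fall through to the next method (MSR read, then estimation).
 */
static uint64_t
tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;

	/* Multiply in 64 bits first so a non-integer ratio is not truncated. */
	return ((uint64_t)ecx * ebx) / eax;
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2112000000 Hz; with a ratio of 250/3 the sketch gives 2000000000 Hz, whereas dividing first would give 1992000000 Hz.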

Signed-off-by: Sergio Gonzalez Monroy 
---
DEPENDS on:
http://dpdk.org/dev/patchwork/patch/29086/

v2:
 - fix misspelled word in commit message
 - address comment for clearer code

 lib/librte_eal/common/arch/x86/rte_cycles.c| 142 +
 .../common/include/arch/x86/rte_cycles.h   |   7 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 3 files changed, 145 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/common/arch/x86/rte_cycles.c

diff --git a/lib/librte_eal/common/arch/x86/rte_cycles.c b/lib/librte_eal/common/arch/x86/rte_cycles.c
new file mode 100644
index 0000000..7cf6093
--- /dev/null
+++ b/lib/librte_eal/common/arch/x86/rte_cycles.c
@@ -0,0 +1,142 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static unsigned int
+rte_cpu_get_model(uint32_t fam_mod_step)
+{
+   uint32_t family, model, ext_model;
+
+   family = (fam_mod_step >> 8) & 0xf;
+   model = (fam_mod_step >> 4) & 0xf;
+
+   if (family == 6 || family == 15) {
+   ext_model = (fam_mod_step >> 16) & 0xf;
+   model += (ext_model << 4);
+   }
+
+   return model;
+}
+
+static int32_t
+rdmsr(int msr, uint64_t *val)
+{
+   int fd;
+   int ret;
+
+   fd = open("/dev/cpu/0/msr", O_RDONLY);
+   if (fd < 0)
+   return fd;
+
+   ret = pread(fd, val, sizeof(uint64_t), msr);
+
+   close(fd);
+
+   return ret;
+}
+
+static uint32_t
+check_model_wsm_nhm(uint8_t model)
+{
+   switch (model) {
+   /* Westmere */
+   case 0x25:
+   case 0x2C:
+   case 0x2F:
+   /* Nehalem */
+   case 0x1E:
+   case 0x1F:
+   case 0x1A:
+   case 0x2E:
+   return 1;
+   }
+
+   return 0;
+}
+
+static uint32_t
+check_model_gdm_dnv(uint8_t model)
+{
+   switch (model) {
+   /* Goldmont */
+   case 0x5C:
+   /* Denverton */
+   case 0x5F:
+   return 1;
+   }
+
+   return 0;
+}
+
+uint64_t
+rte_rdtsc_arch_hz(void)
+{
+   uint64_t tsc_hz = 0;
+   uint32_t a, b, c, d, maxleaf;
+   uint8_t mult, model;
+   int32_t ret;
+
+   /*
+* Time Stamp Counter and Nominal Core Crystal Clock
+* Information Leaf
+*/
+   maxleaf = __get_cpuid_max(0, NULL);
+
+   if (maxleaf >= 0x15) {
+   __cpuid(0x15, a, b, c, d);
+
+   /* EBX : TSC/Crystal ratio, ECX : Crystal Hz */
+   if (b && c)
+   return c * (b / a);
+   }
+
+   __cpuid(0x1, a, b, c, d);
+   model = rte_cpu_get_model(a);
+
+   if (check_model_wsm_nhm(model))
+   mult = 133;
+   else if ((c & bit_AVX) || check_model_gdm_dnv(model))
+   mult = 100;
+   else
+   return 0;
+
+   ret = rdmsr(0xCE, &tsc_hz);
+   if (ret < 0)
+   return 0;
+
+   return ((tsc_hz >> 8) & 
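
The fallback path above reads MSR 0xCE and combines a field of it with the bus clock multiplier chosen per CPU model (133 for Nehalem/Westmere, 100 otherwise). A minimal sketch of that arithmetic follows, assuming the ratio sits in bits 15:8 of the MSR value (as Intel documents for MSR_PLATFORM_INFO) and that the multiplier is in MHz. The helper name is hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): derive a TSC frequency in Hz
 * from a raw MSR 0xCE value and a bus clock in MHz.
 * Assumption: bits 15:8 hold the maximum non-turbo ratio.
 */
static uint64_t
platform_info_to_tsc_hz(uint64_t msr_val, uint32_t bus_mhz)
{
	uint64_t ratio = (msr_val >> 8) & 0xff;	/* bits 15:8 */

	return ratio * bus_mhz * 1000000ULL;
}
```

With a ratio of 26 and a 100 MHz bus clock this yields 2.6 GHz. Note that a 133 multiplier slightly underestimates a 133.33 MHz bus clock.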

Re: [dpdk-dev] [PATCH v2] doc: add use of mlockall to programmers guide

2017-10-02 Thread Mcnamara, John


> -Original Message-
> From: Eelco Chaudron [mailto:echau...@redhat.com]
> Sent: Monday, October 2, 2017 11:02 AM
> To: Mcnamara, John 
> Cc: dev@dpdk.org
> Subject: [PATCH v2] doc: add use of mlockall to programmers guide
> 
> When I was adding mlockall() to the testpmd application it was suggested
> to add a reference to the use case of mlockall(). This patch adds it.
> 
> Signed-off-by: Eelco Chaudron 

Acked-by: John McNamara 




Re: [dpdk-dev] [PATCH 01/38] eal: add support for 24 40 and 48 bit operations

2017-10-02 Thread Avi Kivity



On 06/16/2017 08:40 AM, Shreyansh Jain wrote:

From: Hemant Agrawal 

Bit Swap and LE<=>BE conversions for 24, 40 and 48 bit width

Signed-off-by: Hemant Agrawal 
---
  .../common/include/generic/rte_byteorder.h | 78 ++
  1 file changed, 78 insertions(+)

diff --git a/lib/librte_eal/common/include/generic/rte_byteorder.h b/lib/librte_eal/common/include/generic/rte_byteorder.h
index e00bccb..8903ff6 100644
--- a/lib/librte_eal/common/include/generic/rte_byteorder.h
+++ b/lib/librte_eal/common/include/generic/rte_byteorder.h
@@ -122,6 +122,84 @@ rte_constant_bswap64(uint64_t x)
((x & 0xff00000000000000ULL) >> 56);
  }
  
+/*
+ * An internal function to swap bytes of a 48-bit value.
+ */
+static inline uint64_t
+rte_constant_bswap48(uint64_t x)
+{
+   return  ((x & 0x0000000000ffULL) << 40) |
+           ((x & 0x00000000ff00ULL) << 24) |
+           ((x & 0x000000ff0000ULL) <<  8) |
+           ((x & 0x0000ff000000ULL) >>  8) |
+           ((x & 0x00ff00000000ULL) >> 24) |
+           ((x & 0xff0000000000ULL) >> 40);
+}
+


Won't something like bswap64(x << 16) be much more efficient? Two 
instructions for the non-constant case, compared to 15-20 here.
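
The suggestion can be sketched as follows. This assumes a GCC/Clang style `__builtin_bswap64` and uses hypothetical function names. It only illustrates that shifting left by 16 and then swapping all eight bytes gives the same result as the six mask-and-shift terms.

```c
#include <stdint.h>

/* Mask-and-shift form: six terms, one per byte of the 48-bit value. */
static inline uint64_t
bswap48_masks(uint64_t x)
{
	return ((x & 0x0000000000ffULL) << 40) |
	       ((x & 0x00000000ff00ULL) << 24) |
	       ((x & 0x000000ff0000ULL) <<  8) |
	       ((x & 0x0000ff000000ULL) >>  8) |
	       ((x & 0x00ff00000000ULL) >> 24) |
	       ((x & 0xff0000000000ULL) >> 40);
}

/*
 * Shift-then-swap form: move the 48-bit value into the top six bytes,
 * then reverse all eight bytes. The two zero bytes end up on top.
 */
static inline uint64_t
bswap48_shift(uint64_t x)
{
	return __builtin_bswap64(x << 16);
}
```

Both forms ignore any bits above bit 47 of the input, so they agree for every 64-bit argument.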




Re: [dpdk-dev] [PATCH v4 3/4] eventdev: Add eventdev ethernet Rx adapter

2017-10-02 Thread Rao, Nikhil

On 9/22/2017 11:38 AM, santosh wrote:





In general api comment: Fix missing param definition like *service_id* above
and pl. remove other unnecessary params description from api above.


OK.



+static inline int
+valid_id(uint8_t id)
+{
+   return id < RTE_MAX_EVENT_ETH_RX_ADAPTER_INSTANCE;
+}
+
+#define RTE_EVENT_ETH_RX_ADAPTER_ID_VALID_OR_ERR_RET(id, retval) do { \
+   if (!valid_id(id)) { \
+   RTE_EDEV_LOG_ERR("Invalid eth Rx adapter id = %d\n", id); \
+   return retval; \
+   } \
+} while (0)
+


Worth moving this macro to rte_eventdev_pmd.h.
Or how about reusing the existing one, i.e. RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET?

I didn't see a reason for this macro to be used elsewhere apart from
rte_event_eth_rx_adapter.c.
Also, the check is different from the one in
RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET.



+
+static inline void
+mtoip(struct rte_mbuf *m, struct ipv4_hdr **ipv4_hdr,
+   struct ipv6_hdr **ipv6_hdr)
+{


mtoip(), imo is more of global api, likely other modules may use in future..
perhaps move to rte_io.h Or more correct place.. thought?



Good suggestion, Will post a separate patch for this in the future.

Nikhil



Re: [dpdk-dev] [PATCH] doc: Adds reference to use mlockall() in the Programmer's guide

2017-10-02 Thread Eelco Chaudron

On 29/09/17 17:44, Mcnamara, John wrote:

Hi Eelco,

Thanks for that. It is always good to get doc improvements.
Some minor comments below.


Thanks for pointing out the issues below. I have sent out a V2.

The title should be lowercase (except for known acronyms), <= 50 characters
and the verb should be in the imperative. This is explained in the
Contributor's Guide:

http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line

Also you can check with the DPDK check-git-log.sh tool:

 $ devtools/check-git-log.sh
 Wrong headline uppercase:
 doc: Adds reference to use mlockall() in the Programmer's guide
 Headline too long:
 doc: Adds reference to use mlockall() in the Programmer's guide

I'd suggest a title like:

 doc: add use of mlockall to programmers guide


diff --git a/doc/guides/prog_guide/writing_efficient_code.rst b/doc/guides/prog_guide/writing_efficient_code.rst
index 8223aceea..3975026ce 100644
--- a/doc/guides/prog_guide/writing_efficient_code.rst
+++ b/doc/guides/prog_guide/writing_efficient_code.rst
@@ -105,6 +105,20 @@ meaning that if all memory access operations are done
on the first channel only,

  By default, the  :ref:`Mempool Library ` spreads the
addresses of objects among memory channels.

+Locking memory pages
+

Add a blank line after a header+underline.


+The underlying operating system is allowed to load/unload memory pages at
its own discretion.
+These page loads could impact the performance, as the process is on hold
when the kernel fetches them.
+
+To avoid these you could pre-load, and lock them into memory with the
mlockall() call.

Include the function call in backquotes: ``mlockall()``.

Thanks,

John

Reviewed-by: John McNamara 
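
For illustration, a minimal sketch of the call the documentation patch refers to. It assumes a POSIX system that provides `mlockall()` in `<sys/mman.h>`. The wrapper name is hypothetical, and the choice of flags is an assumption, not something stated in the patch.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper: lock all pages mapped now (MCL_CURRENT) and all
 * pages mapped later (MCL_FUTURE) into RAM so they are not paged out.
 * Returns 0 on success, -1 on failure (for example when RLIMIT_MEMLOCK
 * is too low or the process lacks the needed privilege).
 */
static int
lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

An application would typically call this once, early in startup, and decide for itself whether a failure is fatal or only worth a warning.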






Re: [dpdk-dev] [PATCH v4 3/4] eventdev: Add eventdev ethernet Rx adapter

2017-10-02 Thread Rao, Nikhil

On 9/25/2017 8:29 AM, Rao, Nikhil wrote:

On 9/24/2017 11:46 PM, Rao, Nikhil wrote:

On 9/22/2017 2:40 PM, Jerin Jacob wrote:

When we worked on a prototype, we figured out that we need a separate event type
for RX adapter. Probably RTE_EVENT_TYPE_ETHDEV_RX_ADAPTER?
The Reason is:
- In the HW based Rx adapter case, the packet are coming directly to
eventdev once it is configured.
- So on a HW implementation of the event dequeue(), CPU needs to convert
HW specific metadata to mbuf
- The event dequeue() is used in two cases
a) octeontx eventdev driver used with any external NIC
b) octeontx eventdev driver used with integrated NIC(without service
core to inject the packet)
We need some identifier to understand case (a) and (b).So, in dequeue(), if the
packet is from RTE_EVENT_TYPE_ETHDEV then we can do "HW specific metadata" to
mbuf conversion and in another case (!RTE_EVENT_TYPE_ETHDEV) result in no mbuf
conversion.

Application can check if it is an Ethernet type event by
ev.event_type == RTE_EVENT_TYPE_ETHDEV || ev.event_type ==
RTE_EVENT_TYPE_ETHDEV_RX_ADAPTER



As per my understanding, the case (a) uses an in built port
Is it possible for the eventdev PMD to do the conversion based off the
eventdev port ?




I realized the dequeue wouldn't have knowledge of the port the event was 
injected from, the application shouldn't have to see the difference 
between case (a) & (b).


Would it be possible to use the impl_opaque field within struct rte_event ?

Nikhil


Hi Jerin,

Any further thoughts on this ?

Nikhil



Re: [dpdk-dev] [PATCH v4 4/4] eventdev: Add tests for event eth Rx adapter APIs

2017-10-02 Thread Jerin Jacob
-Original Message-
> Date: Sun, 24 Sep 2017 23:54:38 +0530
> From: "Rao, Nikhil" 
> To: Jerin Jacob 
> CC: bruce.richard...@intel.com, gage.e...@intel.com, dev@dpdk.org,
>  tho...@monjalon.net, harry.van.haa...@intel.com, hemant.agra...@nxp.com,
>  nipun.gu...@nxp.com, narender.vang...@intel.com,
>  erik.g.carri...@intel.com, abhinandan.guj...@intel.com,
>  santosh.shu...@caviumnetworks.com
> Subject: Re: [PATCH v4 4/4] eventdev: Add tests for event eth Rx adapter
>  APIs
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.3.0
> 
> On 9/22/2017 5:42 PM, Jerin Jacob wrote:
> > -Original Message-
> > > Date: Fri, 22 Sep 2017 02:47:14 +0530
> > > From: Nikhil Rao 
> > > To: jerin.ja...@caviumnetworks.com, bruce.richard...@intel.com
> > > CC: gage.e...@intel.com, dev@dpdk.org, tho...@monjalon.net,
> > >   harry.van.haa...@intel.com, hemant.agra...@nxp.com, nipun.gu...@nxp.com,
> > >   narender.vang...@intel.com, erik.g.carri...@intel.com,
> > >   abhinandan.guj...@intel.com, santosh.shu...@caviumnetworks.com
> > > Subject: [PATCH v4 4/4] eventdev: Add tests for event eth Rx adapter APIs
> > > X-Mailer: git-send-email 2.7.4
> > > 
> > > Add unit tests for rte_event_eth_rx_adapter_xxx() APIs
> > 
> 
> 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include 
> > > +
> > > +#include "test.h"
> > > +
> > > +/* i40e limits max to 64 */
> > 
> > This comment could be removed.
> > 
> OK, I am documenting why the code doesn't just use dev_info.max_rx_queues,
> won't the comment be useful to retain ?

OK. If dev_info.max_rx_queues for i40e is not 64 as expected then we
could fix the i40e driver as well.


> 


Re: [dpdk-dev] [PATCH v4 3/4] eventdev: Add eventdev ethernet Rx adapter

2017-10-02 Thread Jerin Jacob
-Original Message-
> Date: Mon, 2 Oct 2017 15:58:56 +0530
> From: "Rao, Nikhil" 
> To: Jerin Jacob 
> CC: bruce.richard...@intel.com, gage.e...@intel.com, dev@dpdk.org,
>  tho...@monjalon.net, harry.van.haa...@intel.com, hemant.agra...@nxp.com,
>  nipun.gu...@nxp.com, narender.vang...@intel.com,
>  erik.g.carri...@intel.com, abhinandan.guj...@intel.com,
>  santosh.shu...@caviumnetworks.com
> Subject: Re: [PATCH v4 3/4] eventdev: Add eventdev ethernet Rx adapter
> User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
>  Thunderbird/52.3.0
> 
> On 9/25/2017 8:29 AM, Rao, Nikhil wrote:
> > On 9/24/2017 11:46 PM, Rao, Nikhil wrote:
> > > On 9/22/2017 2:40 PM, Jerin Jacob wrote:
> > > 
> > > > When we worked on a prototype, we figured out that we need a
> > > > separate event type
> > > > for RX adapter. Probably RTE_EVENT_TYPE_ETHDEV_RX_ADAPTER?
> > > > The Reason is:
> > > > - In the HW based Rx adapter case, the packet are coming
> > > > directly to eventdev once it is configured.
> > > > - So on a HW implementation of the event dequeue(), CPU needs to
> > > > convert HW specific
> > > > metadata to mbuf
> > > > - The event dequeue() is used in two cases
> > > > a) octeontx eventdev driver used with any external NIC
> > > > b) octeontx eventdev driver used with integrated NIC(without service
> > > > core to inject the packet)
> > > > We need some identifier to understand case (a) and (b).So, in
> > > > dequeue(), if the
> > > > packet is from RTE_EVENT_TYPE_ETHDEV then we can do "HW specific
> > > > metadata" to mbuf
> > > > conversion and in another case (!RTE_EVENT_TYPE_ETHDEV) result
> > > > in no mbuf
> > > > conversion.
> > > > 
> > > > Application can check if it is an Ethernet type event by
> > > > ev.event_type == RTE_EVENT_TYPE_ETHDEV || ev.event_type ==
> > > > RTE_EVENT_TYPE_ETHDEV_RX_ADAPTER
> > > > 
> > > 
> > > As per my understanding, the case (a) uses an in built port
> > > Is it possible for the eventdev PMD to do the conversion based off
> > > the eventdev port ?
> > > 
> > 
> > I realized the dequeue wouldn't have knowledge of the port the event was
> > injected from, the application shouldn't have to see the difference
> > between case (a) & (b).
> > 
> > Would it be possible to use the impl_opaque field within struct rte_event ?
> > 
> > Nikhil
> 
> Hi Jerin,
> 
> Any further thoughts on this ?

The impl_opaque field could be one option, but I think the NXP driver is
using it for internal operation, so overriding it from the Rx adapter would
cause issues. How about adding a new event type? It would get its own name
space, so there would be no collision.

➜ [master][dpdk-next-eventdev] $ git diff
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index ec7aabd9a..b33423c7e 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -878,6 +878,8 @@ rte_event_dev_close(uint8_t dev_id);
 /**< The event generated from cpu for pipelining.
  * Application may use *sub_event_type* to further classify the event
  */
+#define RTE_EVENT_TYPE_ETHDEV_ADAPTER   0x4
+/**< The event generated from ethdev Rx adapter */
 #define RTE_EVENT_TYPE_MAX  0x10
 /**< Maximum number of event types */
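
A minimal sketch of how the two checks discussed in this thread could look with a separate event type. All names and the struct here are stand-ins for illustration only. The 0x4 value is taken from the diff above, while the 0x0 value for the existing ethdev type is an assumption.

```c
#include <stdint.h>

/* Stand-in constants, not the real rte_eventdev.h definitions. */
#define SKETCH_EVENT_TYPE_ETHDEV          0x0 /* assumed existing value */
#define SKETCH_EVENT_TYPE_ETHDEV_ADAPTER  0x4 /* value proposed in the diff */

/* Stand-in for the event_type field of struct rte_event. */
struct sketch_event {
	uint8_t event_type;
};

/* Application side: both types carry Ethernet packets. */
static int
sketch_is_ethernet_event(const struct sketch_event *ev)
{
	return ev->event_type == SKETCH_EVENT_TYPE_ETHDEV ||
	       ev->event_type == SKETCH_EVENT_TYPE_ETHDEV_ADAPTER;
}

/*
 * Driver side: only events injected directly by HW need the
 * "HW specific metadata" to mbuf conversion in dequeue().
 */
static int
sketch_needs_hw_to_mbuf(const struct sketch_event *ev)
{
	return ev->event_type == SKETCH_EVENT_TYPE_ETHDEV;
}
```

Events injected by the service core based adapter would already carry an mbuf, so the driver can skip the conversion for them while the application still treats both types as Ethernet events.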


Re: [dpdk-dev] [PATCH v3] net/bonding: support bifurcated driver in eal cli using --vdev

2017-10-02 Thread Doherty, Declan

On 20/09/2017 7:04 PM, Gowrishankar wrote:

From: Gowrishankar Muthukrishnan 

At present, creating bonding devices using --vdev is broken for PMD like
mlx5 as it is neither UIO nor VFIO based and hence PMD driver is unknown
to find_port_id_by_pci_addr(), as below.

testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'

PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
  () specified
EAL: Failed to parse slave ports for bonded device net_bonding0

This patch fixes parsing PCI ID from bonding device params by verifying
it in RTE PCI bus, rather than checking dev->kdrv.

Fixes: eac901ce ("ethdev: decouple from PCI device")
Signed-off-by: Gowrishankar Muthukrishnan 
---
v3:
  - adapt rte_bus API (with suggestions from Declan and Gaëtan)


...




Acked-by: Declan Doherty 


Re: [dpdk-dev] [PATCH v2 2/4] net/mrvl: add mrvl net pmd driver

2017-10-02 Thread Bruce Richardson
On Fri, Sep 29, 2017 at 08:38:00AM -0700, Stephen Hemminger wrote:
> On Thu, 28 Sep 2017 12:22:36 +0200
> Tomasz Duszynski  wrote:
> 
> > +
> > +struct mrvl_rxq;
> > +struct mrvl_txq;
> 
> These forward declarations should not be necessary
> > +static inline int
> > +mrvl_get_bpool_size(intpp2_id, int pool_id)
>  No tab here please
> 
> Why does this need to be inlined?  Is it in critical path?
> 
> 
> > +{
> > +   int i;
> > +   int size = 0;
> > +
> > +   for (i = mrvl_lcore_first; i <= mrvl_lcore_last; i++)
> > +   size += mrvl_port_bpool_size[pp2_id][pool_id][i];
> > +
> > +   return size;
> > +}
> > +
> 
> Also, I prefer that the following restrictions from the kernel be
> also applied to DPDK code.
> 

+1 for LINE_SPACING (multiple blank lines), MULTILINE_DEREFERENCE and
SPLIT_STRING. [Is split string not already enforced?]

I'm a bit ambivalent about forcing a blank line after definitions. I
think it's a good idea in most cases, but I'm not sure it needs to be
enforced in all cases.

/Bruce

> ### [dpdk-dev] [PATCH v2 2/4] net/mrvl: add mrvl net pmd driver
> 
> CHECK:LINE_SPACING: Please don't use multiple blank lines
> #452: FILE: drivers/net/mrvl/mrvl_ethdev.c:180:
> +
> +
> 
> WARNING:LINE_SPACING: Missing a blank line after declarations
> #457: FILE: drivers/net/mrvl/mrvl_ethdev.c:185:
> + int n = sizeof(*bitmap) * 8 - __builtin_clz(*bitmap);
> + if (n >= max)
> 
> CHECK:LINE_SPACING: Please don't use multiple blank lines
> #562: FILE: drivers/net/mrvl/mrvl_ethdev.c:290:
> +
> +
> 
> WARNING:MULTILINE_DEREFERENCE: Avoid multiple line dereference - prefer 
> 'priv->ppio_params.inqs_params.tcs_params[i].inqs_params'
> #880: FILE: drivers/net/mrvl/mrvl_ethdev.c:608:
> + rte_free(priv->ppio_params.inqs_params.
> + tcs_params[i].inqs_params);
> 
> WARNING:MULTILINE_DEREFERENCE: Avoid multiple line dereference - prefer 
> 'priv->ppio_params.inqs_params.tcs_params[i].inqs_params'
> #882: FILE: drivers/net/mrvl/mrvl_ethdev.c:610:
> + priv->ppio_params.inqs_params.
> + tcs_params[i].inqs_params = NULL;
> 
> WARNING:MULTILINE_DEREFERENCE: Avoid multiple line dereference - prefer 
> 'priv->ppio_params.inqs_params.tcs_params[tc].inqs_params[inq].size'
> #1330: FILE: drivers/net/mrvl/mrvl_ethdev.c:1058:
> + qinfo->nb_desc = priv->ppio_params.inqs_params.
> +  tcs_params[tc].inqs_params[inq].size;
> 
> WARNING:SPLIT_STRING: quoted string split across lines
> #1476: FILE: drivers/net/mrvl/mrvl_ethdev.c:1204:
> + RTE_LOG(ERR, PMD, "Mbuf size must be increased to %u bytes"
> +   " to hold up to %u bytes of data.\n",
> 
> WARNING:MULTILINE_DEREFERENCE: Avoid multiple line dereference - prefer 
> 'priv->ppio_params.inqs_params.tcs_params[priv->rxq_map[rxq->queue_id].tc'
> #1500: FILE: drivers/net/mrvl/mrvl_ethdev.c:1228:
> + priv->ppio_params.inqs_params.
> + tcs_params[priv->rxq_map[rxq->queue_id].tc].
> 
> WARNING:MULTILINE_DEREFERENCE: Avoid multiple line dereference - prefer 
> 'q->priv->ppio_params.inqs_params.tcs_params[q->priv->rxq_map[q->queue_id].tc'
> #1532: FILE: drivers/net/mrvl/mrvl_ethdev.c:1260:
> + num = q->priv->ppio_params.inqs_params.
> + tcs_params[q->priv->rxq_map[q->queue_id].tc].
> 
> WARNING:SPLIT_STRING: quoted string split across lines
> #1902: FILE: drivers/net/mrvl/mrvl_ethdev.c:1630:
> + RTE_LOG(DEBUG, PMD, "\nport-%d:%d: bpool %d oversize -"
> + " remove %d buffers (pool size: %d -> %d)\n",
> 
> WARNING:SPLIT_STRING: quoted string split across lines
> #2094: FILE: drivers/net/mrvl/mrvl_ethdev.c:1822:
> + "No room in shadow queue for %d packets!!!"
> + "%d packets will be sent.\n",
> 
> CHECK:LINE_SPACING: Please don't use multiple blank lines
> #2294: FILE: drivers/net/mrvl/mrvl_ethdev.c:2022:
> +
> +
> 
> CHECK:LINE_SPACING: Please don't use multiple blank lines
> #2595: FILE: drivers/net/mrvl/mrvl_ethdev.h:40:
> +
> +
> 
> WARNING:LINE_SPACING: Missing a blank line after declarations
> #3253: FILE: drivers/net/mrvl/mrvl_qos.c:577:
> + uint8_t idx = port_cfg->tc[tc].inq[i];
> + priv->rxq_map[idx].tc = tc;
> 


[dpdk-dev] [PATCH v3] eal/x86: implement x86 specific tsc hz

2017-10-02 Thread Sergio Gonzalez Monroy
First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
Clock Information Leaf to determine the tsc hz on platforms that
support it (does not require privileged user).

If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).

Default to the tsc hz estimation if both methods fail.

Signed-off-by: Sergio Gonzalez Monroy 
Acked-by: Harry van Haaren 
Tested-by: Bruce Richardson 
---
DEPENDS on:
http://dpdk.org/dev/patchwork/patch/29086/

v3:
 - acked-by and tested-by tags

v2:
 - fix misspelled word in commit message
 - address comment for more clear code

 lib/librte_eal/common/arch/x86/rte_cycles.c| 142 +
 .../common/include/arch/x86/rte_cycles.h   |   7 +-
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 3 files changed, 145 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/common/arch/x86/rte_cycles.c

diff --git a/lib/librte_eal/common/arch/x86/rte_cycles.c b/lib/librte_eal/common/arch/x86/rte_cycles.c
new file mode 100644
index 000..7cf6093
--- /dev/null
+++ b/lib/librte_eal/common/arch/x86/rte_cycles.c
@@ -0,0 +1,142 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static unsigned int
+rte_cpu_get_model(uint32_t fam_mod_step)
+{
+   uint32_t family, model, ext_model;
+
+   family = (fam_mod_step >> 8) & 0xf;
+   model = (fam_mod_step >> 4) & 0xf;
+
+   if (family == 6 || family == 15) {
+   ext_model = (fam_mod_step >> 16) & 0xf;
+   model += (ext_model << 4);
+   }
+
+   return model;
+}
+
+static int32_t
+rdmsr(int msr, uint64_t *val)
+{
+   int fd;
+   int ret;
+
+   fd = open("/dev/cpu/0/msr", O_RDONLY);
+   if (fd < 0)
+   return fd;
+
+   ret = pread(fd, val, sizeof(uint64_t), msr);
+
+   close(fd);
+
+   return ret;
+}
+
+static uint32_t
+check_model_wsm_nhm(uint8_t model)
+{
+   switch (model) {
+   /* Westmere */
+   case 0x25:
+   case 0x2C:
+   case 0x2F:
+   /* Nehalem */
+   case 0x1E:
+   case 0x1F:
+   case 0x1A:
+   case 0x2E:
+   return 1;
+   }
+
+   return 0;
+}
+
+static uint32_t
+check_model_gdm_dnv(uint8_t model)
+{
+   switch (model) {
+   /* Goldmont */
+   case 0x5C:
+   /* Denverton */
+   case 0x5F:
+   return 1;
+   }
+
+   return 0;
+}
+
+uint64_t
+rte_rdtsc_arch_hz(void)
+{
+   uint64_t tsc_hz = 0;
+   uint32_t a, b, c, d, maxleaf;
+   uint8_t mult, model;
+   int32_t ret;
+
+   /*
+* Time Stamp Counter and Nominal Core Crystal Clock
+* Information Leaf
+*/
+   maxleaf = __get_cpuid_max(0, NULL);
+
+   if (maxleaf >= 0x15) {
+   __cpuid(0x15, a, b, c, d);
+
+   /* EBX : TSC/Crystal ratio, ECX : Crystal Hz */
+   if (b && c)
+   return c * (b / a);
+   }
+
+   __cpuid(0x1, a, b, c, d);
+   model = rte_cpu_get_model(a);
+
+   if (check_model_wsm_nhm(model))
+   mult = 133;
+   else if ((c & bit_AVX) || check_model_gdm_dnv(model))
+   mult = 100;
+   else
+   return 0;
+
+   ret = rdmsr(0xCE
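
A minimal sketch of the CPUID leaf 0x15 arithmetic, written as a pure function so it does not depend on `<cpuid.h>`. The function name is hypothetical. Unlike the `c * (b / a)` expression in the patch, this variant multiplies before dividing, uses 64-bit arithmetic, and also rejects a zero denominator. Those are choices made for this sketch, not something the patch does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the TSC frequency in Hz from the three
 * registers returned by CPUID leaf 0x15.
 *   eax: denominator of the TSC/crystal ratio
 *   ebx: numerator of the TSC/crystal ratio
 *   ecx: nominal crystal clock frequency in Hz
 * Returns 0 when the leaf does not report usable values.
 */
static uint64_t
tsc_hz_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;

	/* Multiply first so a non-integer ratio is not truncated early. */
	return (uint64_t)ecx * ebx / eax;
}
```

For a 24 MHz crystal and a ratio of 168/2 this gives 2016 MHz. For a ratio such as 250/3 it gives exactly 2000 MHz, where dividing first would give 1992 MHz.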

Re: [dpdk-dev] [PATCH v3] eal/x86: implement x86 specific tsc hz

2017-10-02 Thread Jerin Jacob
-Original Message-
> Date: Mon, 2 Oct 2017 12:17:38 +0100
> From: Sergio Gonzalez Monroy 
> To: dev@dpdk.org
> CC: harry.van.haa...@intel.com, bruce.richard...@intel.com
> Subject: [dpdk-dev] [PATCH v3] eal/x86: implement x86 specific tsc hz
> X-Mailer: git-send-email 2.9.5
> 
> First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
> Clock Information Leaf to determine the tsc hz on platforms that
> support it (does not require privileged user).
> 
> If the CPUID leaf is not available, then try to determine the tsc hz by
> reading the MSR 0xCE (requires privileged user).
> 
> Default to the tsc hz estimation if both methods fail.
> 
> Signed-off-by: Sergio Gonzalez Monroy 
> Acked-by: Harry van Haaren 
> Tested-by: Bruce Richardson 
> ---
> DEPENDS on:
> http://dpdk.org/dev/patchwork/patch/29086/
> 
> v3:
>  - acked-by and tested-by tags
> 
> v2:
>  - fix misspelled word in commit message
>  - address comment for more clear code
> 
>  lib/librte_eal/common/arch/x86/rte_cycles.c| 142 +
>  .../common/include/arch/x86/rte_cycles.h   |   7 +-
>  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
>  3 files changed, 145 insertions(+), 5 deletions(-)
>  create mode 100644 lib/librte_eal/common/arch/x86/rte_cycles.c
> +
> +static uint32_t
> +check_model_wsm_nhm(uint8_t model)
> +{
> + switch (model) {
> + /* Westmere */
> + case 0x25:
> + case 0x2C:
> + case 0x2F:
> + /* Nehalem */
> + case 0x1E:
> + case 0x1F:
> + case 0x1A:

See next comment.

> + case 0x2E:
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> +static uint32_t
> +check_model_gdm_dnv(uint8_t model)
> +{
> + switch (model) {
> + /* Goldmont */
> + case 0x5C:
> + /* Denverton */

Not adding "/* fall-through */" may break gcc 7 build.

> + case 0x5F:
> + return 1;
> + }
> +
> + return 0;
> +}
> +


Re: [dpdk-dev] [PATCH v3] eal/x86: implement x86 specific tsc hz

2017-10-02 Thread Sergio Gonzalez Monroy

On 02/10/2017 12:24, Jerin Jacob wrote:

-Original Message-

Date: Mon, 2 Oct 2017 12:17:38 +0100
From: Sergio Gonzalez Monroy 
To: dev@dpdk.org
CC: harry.van.haa...@intel.com, bruce.richard...@intel.com
Subject: [dpdk-dev] [PATCH v3] eal/x86: implement x86 specific tsc hz
X-Mailer: git-send-email 2.9.5

First, try to use CPUID Time Stamp Counter and Nominal Core Crystal
Clock Information Leaf to determine the tsc hz on platforms that
support it (does not require privileged user).

If the CPUID leaf is not available, then try to determine the tsc hz by
reading the MSR 0xCE (requires privileged user).

Default to the tsc hz estimation if both methods fail.

Signed-off-by: Sergio Gonzalez Monroy 
Acked-by: Harry van Haaren 
Tested-by: Bruce Richardson 
---
DEPENDS on:
http://dpdk.org/dev/patchwork/patch/29086/

v3:
  - acked-by and tested-by tags

v2:
  - fix misspelled word in commit message
  - address comment for more clear code

  lib/librte_eal/common/arch/x86/rte_cycles.c| 142 +
  .../common/include/arch/x86/rte_cycles.h   |   7 +-
  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
  3 files changed, 145 insertions(+), 5 deletions(-)
  create mode 100644 lib/librte_eal/common/arch/x86/rte_cycles.c
+
+static uint32_t
+check_model_wsm_nhm(uint8_t model)
+{
+   switch (model) {
+   /* Westmere */
+   case 0x25:
+   case 0x2C:
+   case 0x2F:
+   /* Nehalem */
+   case 0x1E:
+   case 0x1F:
+   case 0x1A:

See next comment.


+   case 0x2E:
+   return 1;
+   }
+
+   return 0;
+}
+
+static uint32_t
+check_model_gdm_dnv(uint8_t model)
+{
+   switch (model) {
+   /* Goldmont */
+   case 0x5C:
+   /* Denverton */

Not adding "/* fall-through */" may break gcc 7 build.


See Bruce's comment on:
http://dpdk.org/ml/archives/dev/2017-September/074259.html

Thanks,
Sergio
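
For context, a small sketch of the pattern in question, with a hypothetical function name. As far as GCC's documentation describes it, `-Wimplicit-fallthrough` warns only when a statement is followed by a fall into the next label, so stacked case labels with nothing between them should not need a fall-through comment.

```c
#include <stdint.h>

/*
 * Stacked case labels: the first label has no statements of its own,
 * it simply shares the body of the label below it.
 */
static int
is_goldmont_or_denverton(uint8_t model)
{
	switch (model) {
	case 0x5C: /* Goldmont */
	case 0x5F: /* Denverton */
		return 1;
	default:
		return 0;
	}
}
```

A fall-through annotation would only become relevant if a statement were added under the first label without a `break` or `return`.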


Re: [dpdk-dev] [PATCH] checkpatch: re-enable warnings about split long strings

2017-10-02 Thread Adrien Mazarguil
Hi Stephen,

On Fri, Sep 29, 2017 at 08:37:49AM -0700, Stephen Hemminger wrote:
> The Linux kernel style policy about strings is that strings should
> be always put on one line. This makes sense since a typical use
> case is for a user to type the error message into a search engine
> or grep, and it won't be found if split across lines.
> This patch just re-enables that check.
> 
> Yes, lots of DPDK code now splits strings, that doesn't
> make it right.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  devtools/checkpatches.sh | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index a56c41a301c0..3e6081dd673e 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -44,7 +44,6 @@ options="$options --show-types"
>  options="$options --ignore=LINUX_VERSION_CODE,FILE_PATH_CHANGES,\
>  VOLATILE,PREFER_PACKED,PREFER_ALIGNED,PREFER_PRINTF,\
>  PREFER_KERNEL_TYPES,BIT_MACRO,CONST_STRUCT,\
> -SPLIT_STRING,LONG_LINE_STRING,\
>  LINE_SPACING,PARENTHESIS_ALIGNMENT,NETWORKING_BLOCK_COMMENT_STYLE,\
>  NEW_TYPEDEFS,COMPARISON_TO_NULL"

I'm not sure. Given that the main reason for splitting strings in the first
place is to avoid LONG_LINE_STRING warnings, I think we must choose between
the two options. If split strings are not allowed, then long lines must be.

Since checkpatches.sh is used by various automated scripts to complain
loudly about problems in submissions, the above change prevents maintainers
from writing long strings at all (can't split and can't go past 80 columns).

As a result, they will be tempted to cripple their code with nasty
workarounds to shut up checkpatches.sh; we don't want that to happen.

Also I think the reasons stated by original commit cf75514c8e2e are still
relevant. My vote would be to keep things as is.
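
The trade-off can be sketched with one of the messages flagged in the mrvl review. The variable and function names are hypothetical. Both forms produce the same string at run time, since adjacent literals are concatenated by the compiler. Only the source layout differs.

```c
#include <string.h>

/*
 * Split form: every source line fits in 80 columns, but searching the
 * source for "bytes to hold up to" finds nothing (SPLIT_STRING).
 */
static const char msg_split[] = "Mbuf size must be increased to %u bytes"
	" to hold up to %u bytes of data.\n";

/* Single-line form: searchable, but past 80 columns (LONG_LINE_STRING). */
static const char msg_single[] = "Mbuf size must be increased to %u bytes to hold up to %u bytes of data.\n";

/* Returns 1 when both literals yield the same run-time message. */
static int
same_message(void)
{
	return strcmp(msg_split, msg_single) == 0;
}
```

Ignoring only one of the two warning types is what lets a long message be written at all without a workaround.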

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v3] net/bonding: support bifurcated driver in eal cli using --vdev

2017-10-02 Thread Gaëtan Rivet
Hi Gowrishankar,

There will be a trivial conflict with my PCI patchset on the
pci_addr_cmp function. I don't know the best way to solve this.

It depends on my patchset being accepted as-is or not, and which
address namespace has precedence over the other.

You could rename pci_addr_cmp with a reference to the bonding namespace,
it is always nice when debugging to know that we are within the bonding
PMD, even if the function is static inlined...

Anyway, in principle the code seems OK to me, so

Reviewed-by: Gaetan Rivet 

On Wed, Sep 20, 2017 at 11:34:58PM +0530, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan 
> 
> At present, creating bonding devices using --vdev is broken for PMD like
> mlx5 as it is neither UIO nor VFIO based and hence PMD driver is unknown
> to find_port_id_by_pci_addr(), as below.
> 
> testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'
> 
> PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
>  () specified
> EAL: Failed to parse slave ports for bonded device net_bonding0
> 
> This patch fixes parsing PCI ID from bonding device params by verifying
> it in RTE PCI bus, rather than checking dev->kdrv.
> 
> Fixes: eac901ce ("ethdev: decouple from PCI device")
> Signed-off-by: Gowrishankar Muthukrishnan 
> ---
> v3:
>  - adapt rte_bus API (with suggestions from Declan and Gaëtan)
> 
>  drivers/net/bonding/rte_eth_bond_args.c | 35 ++---
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_args.c b/drivers/net/bonding/rte_eth_bond_args.c
> index bb634c6..7c65dda 100644
> --- a/drivers/net/bonding/rte_eth_bond_args.c
> +++ b/drivers/net/bonding/rte_eth_bond_args.c
> @@ -61,16 +61,6 @@
>   unsigned i;
>  
>   for (i = 0; i < rte_eth_dev_count(); i++) {
> -
> - /* Currently populated by rte_eth_copy_pci_info().
> -  *
> -  * TODO: Once the PCI bus has arrived we should have a better
> -  * way to test for being a PCI device or not.
> -  */
> - if (rte_eth_devices[i].data->kdrv == RTE_KDRV_UNKNOWN ||
> - rte_eth_devices[i].data->kdrv == RTE_KDRV_NONE)
> - continue;
> -
>   pci_dev = RTE_ETH_DEV_TO_PCI(&rte_eth_devices[i]);
>   eth_pci_addr = &pci_dev->addr;
>  
> @@ -98,6 +88,16 @@
>   return -1;
>  }
>  
> +static inline int
> +pci_addr_cmp(const struct rte_device *dev, const void *_pci_addr)
> +{
> + struct rte_pci_device *pdev;
> + const struct rte_pci_addr *paddr = _pci_addr;
> +
> + pdev = RTE_DEV_TO_PCI(*(struct rte_device **)(void *)&dev);
> + return rte_eal_compare_pci_addr(&pdev->addr, paddr);
> +}
> +
>  /**
>   * Parses a port identifier string to a port id by pci address, then by name,
>   * and finally port id.
> @@ -106,10 +106,23 @@
>  parse_port_id(const char *port_str)
>  {
>   struct rte_pci_addr dev_addr;
> + struct rte_bus *pci_bus;
> + struct rte_device *dev;
>   int port_id;
>  
> + pci_bus = rte_bus_find_by_name("pci");
> + if (pci_bus == NULL) {
> + RTE_LOG(ERR, PMD, "unable to find PCI bus\n");
> + return -1;
> + }
> +
>   /* try parsing as pci address, physical devices */
> - if (eal_parse_pci_DomBDF(port_str, &dev_addr) == 0) {
> + if (pci_bus->parse(port_str, &dev_addr) == 0) {
> + dev = pci_bus->find_device(NULL, pci_addr_cmp, &dev_addr);
> + if (dev == NULL) {
> + RTE_LOG(ERR, PMD, "unable to find PCI device\n");
> + return -1;
> + }
>   port_id = find_port_id_by_pci_addr(&dev_addr);
>   if (port_id < 0)
>   return -1;
> -- 
> 1.9.1
> 

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH v6 4/8] ethdev: add GTP items to support flow API

2017-10-02 Thread Adrien Mazarguil
On Fri, Sep 29, 2017 at 10:29:55AM +0100, Sean Harte wrote:
> On 29 September 2017 at 09:54, Xing, Beilei  wrote:

> >> >  /**
> >> > + * RTE_FLOW_ITEM_TYPE_GTP.
> >> > + *
> >> > + * Matches a GTPv1 header.
> >> > + */
> >> > +struct rte_flow_item_gtp {
> >> > +   /**
> >> > +* Version (3b), protocol type (1b), reserved (1b),
> >> > +* Extension header flag (1b),
> >> > +* Sequence number flag (1b),
> >> > +* N-PDU number flag (1b).
> >> > +*/
> >> > +   uint8_t v_pt_rsv_flags;
> >> > +   uint8_t msg_type; /**< Message type. */
> >> > +   rte_be16_t msg_len; /**< Message length. */
> >> > +   rte_be32_t teid; /**< Tunnel endpoint identifier. */
> >> > +};
> >>
> >> In future, you might add support for GTPv2 (which is used since LTE).
> >> Maybe this structure should have v1 in its name to avoid confusion?
> >
> > I considered it before. But I think we can modify it when we support GTPv2 
> > in future, and keep concise 'GTP' currently:)  since I have described it 
> > matches v1 header.
> >
> 
> You could rename v_pt_rsv_flags to version_flags to avoid some future
> code changes to support GTPv2. There's still the issue that not all
> GTPv2 messages have a TEID though.

Although they have the same size, the header of these two protocols
obviously differs. My suggestion would be to go with a separate GTPv2
pattern item using its own dedicated structure instead.
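
To illustrate why a shared structure gets awkward, here is a minimal sketch of
decoding the first octet as GTPv1 lays it out (version in the top three bits,
then PT, a reserved bit, E, S and PN). The struct and helper names below are
hypothetical and not part of the patch; a GTPv2 item would need its own
decoder because the same bit positions carry different flags there.

```c
#include <stdint.h>

/* Hypothetical decoded view of the GTPv1 flags octet (v_pt_rsv_flags). */
struct gtpv1_flags {
	uint8_t version; /* 3 bits, expected to be 1 for GTPv1 */
	uint8_t pt;      /* protocol type bit */
	uint8_t e;       /* extension header flag */
	uint8_t s;       /* sequence number flag */
	uint8_t pn;      /* N-PDU number flag */
};

/* Split the first header octet into its GTPv1 fields. */
static inline struct gtpv1_flags
gtpv1_decode_flags(uint8_t v_pt_rsv_flags)
{
	struct gtpv1_flags f;

	f.version = (v_pt_rsv_flags >> 5) & 0x7;
	f.pt = (v_pt_rsv_flags >> 4) & 0x1;
	/* bit 3 is reserved and ignored here */
	f.e = (v_pt_rsv_flags >> 2) & 0x1;
	f.s = (v_pt_rsv_flags >> 1) & 0x1;
	f.pn = v_pt_rsv_flags & 0x1;
	return f;
}

/* Only the version field can be read the same way for both protocols. */
static inline int
gtp_is_v1(uint8_t first_octet)
{
	return (first_octet >> 5) == 1;
}
```

A matcher could call gtp_is_v1() first and only then interpret the remaining
bits with the v1 layout.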

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v3 1/3] eal/x86: run-time dispatch over memcpy

2017-10-02 Thread Konstantin Ananyev
Hi Xiaoyun,
Just to be a bit more specific about what I suggest -
here is a draft patch below.
It still needs more testing and probably polishing,
but I suppose gives you an idea.
Konstantin
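
Reduced to a standalone sketch, the idea is a function pointer that a
constructor fills in once, based on what the CPU reports at run time. All
names here (copy_generic, copy_wide, cpu_has_wide_vectors, dispatch_copy) are
placeholders for illustration and both "implementations" simply call memcpy;
the compiler builtin used for probing is an assumption that only applies to
GCC/clang on x86 and stands in for rte_cpu_get_flag_enabled().

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Placeholder implementations; real ones would be built per ISA. */
static void *
copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *
copy_wide(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Hypothetical run-time probe; returns non-zero if AVX2 is usable. */
static int
cpu_has_wide_vectors(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx2");
#else
	return 0;
#endif
}

static copy_fn copy_ptr;

/* Runs before main(): pick the best implementation exactly once. */
static void __attribute__((constructor))
copy_init(void)
{
	copy_ptr = cpu_has_wide_vectors() ? copy_wide : copy_generic;
}

/* Callers go through the pointer and never test CPU flags themselves. */
static inline void *
dispatch_copy(void *dst, const void *src, size_t n)
{
	return copy_ptr(dst, src, n);
}
```

The cost of this approach is one indirect call per copy, which is the
trade-off being discussed in this thread.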


---
 lib/librte_eal/bsdapp/eal/Makefile |  20 +
 lib/librte_eal/common/arch/x86/rte_memcpy.c|  58 ++
 lib/librte_eal/common/arch/x86/rte_memcpy_avx2.c   |  44 +
 .../common/arch/x86/rte_memcpy_avx512f.c   |  44 +
 lib/librte_eal/common/arch/x86/rte_memcpy_sse.c|  40 +
 .../common/include/arch/x86/rte_memcpy.h   | 854 +--
 .../common/include/arch/x86/rte_memcpy_internal.h  | 904 +
 lib/librte_eal/linuxapp/eal/Makefile   |  20 +
 8 files changed, 1149 insertions(+), 835 deletions(-)
 create mode 100644 lib/librte_eal/common/arch/x86/rte_memcpy.c
 create mode 100644 lib/librte_eal/common/arch/x86/rte_memcpy_avx2.c
 create mode 100644 lib/librte_eal/common/arch/x86/rte_memcpy_avx512f.c
 create mode 100644 lib/librte_eal/common/arch/x86/rte_memcpy_sse.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 005019e..32d025b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -93,6 +93,26 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_service.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_cpuflags.c
 SRCS-$(CONFIG_RTE_ARCH_X86) += rte_spinlock.c
 
+#memcpy dynamic stuff
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy.c
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_sse.c
+
+CC_SUPPORT_AVX2 := $(shell $(CC) -march=core-avx2 -dM -E - < /dev/null 2>&1 | grep -q AVX2 && echo 1)
+ifeq ($(CC_SUPPORT_AVX2),1)
+CFLAGS_rte_memcpy.o += -DCC_SUPPORT_AVX2
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx2.c
+CFLAGS_rte_memcpy_avx2.o += -mavx2
+CFLAGS_rte_memcpy_avx2.o += -DRTE_MACHINE_CPUFLAG_AVX2
+endif
+
+CC_SUPPORT_AVX512F := $(shell $(CC) -mavx512f -dM -E - < /dev/null 2>&1 | grep -q AVX512F && echo 1)
+ifeq ($(CC_SUPPORT_AVX512F),1)
+CFLAGS_rte_memcpy.o += -DCC_SUPPORT_AVX512F
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx512f.c
+CFLAGS_rte_memcpy_avx512f.o += -mavx512f
+CFLAGS_rte_memcpy_avx512f.o += -DRTE_MACHINE_CPUFLAG_AVX512F
+endif
+
 CFLAGS_eal_common_cpuflags.o := $(CPUFLAGS_LIST)
 
 CFLAGS_eal.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/common/arch/x86/rte_memcpy.c b/lib/librte_eal/common/arch/x86/rte_memcpy.c
new file mode 100644
index 000..9feb2b5
--- /dev/null
+++ b/lib/librte_eal/common/arch/x86/rte_memcpy.c
@@ -0,0 +1,58 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+void *(*rte_memcpy_ptr)(void *dst, const void *src, size_t n) = NULL;
+
+static void __attribute__((constructor))
+rte_memcpy_init(void)
+{
+#ifdef CC_SUPPORT_AVX512F
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) {
+   rte_memcpy_ptr = rte_memcpy_avx512f;
+   printf("%s: AVX512 is using!\n", __func__);
+   return;
+   }
+#endif
+#ifdef CC_SUPPORT_AVX2
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) {
+   rte_memcpy_ptr = rte_memcpy_avx2;
+   printf("%s: AVX2 is using!\n", __func__);
+   return;
+   }
+#endif
+   rte_memcpy_ptr = rte_me

[dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread David Hunt
allows vm_power_manager example to be built against shared libraries

Signed-off-by: David Hunt 
---
 lib/librte_power/rte_power_version.map | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_power/rte_power_version.map 
b/lib/librte_power/rte_power_version.map
index db75ff3..9a2bb36 100644
--- a/lib/librte_power/rte_power_version.map
+++ b/lib/librte_power/rte_power_version.map
@@ -16,3 +16,12 @@ DPDK_2.0 {
 
local: *;
 };
+
+DPDK_17.11 {
+   global:
+
+   rte_power_acpi_turbo_status;
+   rte_power_freq_disable_turbo;
+   rte_power_freq_enable_turbo;
+};
+
-- 
2.7.4



Re: [dpdk-dev] [PATCH v4 2/7] member: implement HT mode

2017-10-02 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:40 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tai, Charlie ; Gobriel,
> Sameh ; De Lara Guarch, Pablo
> ; Mcnamara, John
> ; Wang, Yipeng1 
> Subject: [PATCH v4 2/7] member: implement HT mode
> 

...

> diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile
> index 1a79eaa..ad26548 100644
> --- a/lib/librte_member/Makefile
> +++ b/lib/librte_member/Makefile
> @@ -42,7 +42,7 @@ EXPORT_MAP := rte_member_version.map
> LIBABIVER := 1
> 
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) +=  rte_member.c rte_member_ht.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h
> 
> diff --git a/lib/librte_member/rte_member_ht.c b/lib/librte_member/rte_member_ht.c
> new file mode 100644
> index 000..55672a4
> --- /dev/null
> +++ b/lib/librte_member/rte_member_ht.c

...

> +
> +static inline int
> +insert_overwrite_search(uint32_t bucket, member_sig_t tmp_sig,
> + struct member_ht_bucket *buckets,
> + member_set_t set_id)

I would call "bucket", "bucket_id", for better understanding.
This comment also applies to other parts of the code (e.g. prim_buckets).

> +{
> + int i;
> + for (i = 0; i < RTE_MEMBER_BUCKET_ENTRIES; i++) {
> + if (buckets[bucket].sigs[i] == tmp_sig) {
> + buckets[bucket].sets[i] = set_id;
> + return 1;
> + }

Is this function used to update an existing entry?
At first, I thought that this was evicting another entry when the bucket was
full, but it looks like it is updating the set_id of an existing entry.
If this is the case, I would change the function name and add a comment
explaining this.
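
Something along these lines is what I have in mind. This is only a sketch:
the function name update_entry_search, the BUCKET_ENTRIES value and the
typedef widths are my assumptions for illustration, not taken from the patch.

```c
#include <stdint.h>

#define BUCKET_ENTRIES 16 /* assumed stand-in for RTE_MEMBER_BUCKET_ENTRIES */

typedef uint16_t member_sig_t; /* assumed width */
typedef uint16_t member_set_t; /* assumed width */

struct member_ht_bucket {
	member_sig_t sigs[BUCKET_ENTRIES];
	member_set_t sets[BUCKET_ENTRIES];
};

/*
 * Update the set ID of an entry whose signature is already present in
 * the given bucket. No entry is evicted or added.
 * Returns 1 if an entry was updated, 0 if the signature was not found.
 */
static inline int
update_entry_search(uint32_t bucket_id, member_sig_t tmp_sig,
		struct member_ht_bucket *buckets, member_set_t set_id)
{
	unsigned int i;

	for (i = 0; i < BUCKET_ENTRIES; i++) {
		if (buckets[bucket_id].sigs[i] == tmp_sig) {
			buckets[bucket_id].sets[i] = set_id;
			return 1;
		}
	}
	return 0;
}
```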

> + }
> + return 0;
> +}
> +
> +static inline int
> +search_bucket_single(uint32_t bucket, member_sig_t tmp_sig,
> + struct member_ht_bucket *buckets,
> + member_set_t *set_id)
> +{
> + int iter;

Iter should be "unsigned int" (or similar, maybe "uint8_t")

> + for (iter = 0; iter < RTE_MEMBER_BUCKET_ENTRIES; iter++) {
> + if (tmp_sig == buckets[bucket].sigs[iter] &&
> + buckets[bucket].sets[iter] !=
> + RTE_MEMBER_NO_MATCH) {
> + *set_id = buckets[bucket].sets[iter];
> + return 1;
> + }
> + }
> + return 0;
> +}
> +
> +static inline void
> +search_bucket_multi(uint32_t bucket, member_sig_t tmp_sig,
> + struct member_ht_bucket *buckets,
> + uint32_t *counter,
> + uint32_t match_per_key,

Better change to "matches_per_key".

> + member_set_t *set_id)
> +{
> + int iter;
> + for (iter = 0; iter < RTE_MEMBER_BUCKET_ENTRIES; iter++) {
> + if (tmp_sig == buckets[bucket].sigs[iter] &&
> + buckets[bucket].sets[iter] !=
> + RTE_MEMBER_NO_MATCH) {
> + set_id[*counter] = buckets[bucket].sets[iter];
> + (*counter)++;
> + if (*counter >= match_per_key)
> + return;
> + }
> + }
> +}
> +
> +int
> +rte_member_create_ht(struct rte_member_setsum *ss,
> + const struct rte_member_parameters *params) {

...

> +
> + RTE_MEMBER_LOG(DEBUG, "Hash table based filter created, "
> + "the table has %u entries, %u buckets\n",
> + num_buckets,
> + num_buckets / RTE_MEMBER_BUCKET_ENTRIES);

Shouldn't this be "num_buckets * RTE_MEMBER_BUCKET_ENTRIES" and "num_buckets"?

> + return 0;
> +}
> +
> +static inline
> +void get_buckets_index(const struct rte_member_setsum *ss, const void
> *key,
> + uint32_t *prim_bkt, uint32_t *sec_bkt, member_sig_t
> *sig) {

"static inline void" should be on the same line.

> + uint32_t first_hash = MEMBER_HASH_FUNC(key, ss->key_len,
> + ss->prim_hash_seed);
> + uint32_t sec_hash = MEMBER_HASH_FUNC(&first_hash,
> sizeof(uint32_t),
> + ss->sec_hash_seed);
> + *sig = first_hash;
> + if (ss->cache) {
> + *prim_bkt = sec_hash & ss->bucket_mask;

Is this correct? Using the secondary hash to access the primary bucket?
Why is the cache case different from the non-cache case?
I think this function deserves some comments to explain all these calculations.
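
For the non-cache branch, my guess (an assumption on my side, not something
the patch states) is that the XOR is there so that either bucket index can be
derived from the other one plus the signature. A small sketch of that
property, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Derive the alternative bucket from a bucket index and a signature.
 * bkt is expected to be already masked with bucket_mask.
 */
static inline uint32_t
alt_bucket(uint32_t bkt, uint32_t sig, uint32_t bucket_mask)
{
	return (bkt ^ sig) & bucket_mask;
}
```

Applying alt_bucket() twice with the same signature returns the starting
bucket, because the signature bits under the mask cancel out. If that is the
intent, a comment saying so in get_buckets_index() would help.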

> + *sec_bkt =  (sec_hash >> 16) & ss->bucket_mask;
> + } else {
> + *prim_bkt = sec_hash & ss->bucket_mask;
> + *sec_bkt =  (*prim_bkt ^ *sig) & ss->bucket_mask;
> + }
> +}
> +
> +int
> +rte_member_lookup_ht(const struct rte_member_setsum *ss,
> + const void *key, member_set_t *set_id) {
> + uint32_t prim_bucket, s

Re: [dpdk-dev] [PATCH] checkpatch: re-enable warnings about split long strings

2017-10-02 Thread Bruce Richardson
On Mon, Oct 02, 2017 at 01:53:17PM +0200, Adrien Mazarguil wrote:
> Hi Stephen,
> 
> On Fri, Sep 29, 2017 at 08:37:49AM -0700, Stephen Hemminger wrote:
> > The Linux kernel style policy about strings is that strings should
> > be always put on one line. This makes sense since a typical use case
> > is for a user to type the error message into a search engine or
> > grep, and it won't be found if split across lines.  This patch just
> > re-enables that check.
> > 
> > Yes, lots of DPDK code now splits strings, that doesn't make it
> > right.
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  devtools/checkpatches.sh | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> > index a56c41a301c0..3e6081dd673e 100755
> > --- a/devtools/checkpatches.sh
> > +++ b/devtools/checkpatches.sh
> > @@ -44,7 +44,6 @@ options="$options --show-types"
> >  options="$options --ignore=LINUX_VERSION_CODE,FILE_PATH_CHANGES,\
> >  VOLATILE,PREFER_PACKED,PREFER_ALIGNED,PREFER_PRINTF,\
> >  PREFER_KERNEL_TYPES,BIT_MACRO,CONST_STRUCT,\
> > -SPLIT_STRING,LONG_LINE_STRING,\
> >  LINE_SPACING,PARENTHESIS_ALIGNMENT,NETWORKING_BLOCK_COMMENT_STYLE,\
> >  NEW_TYPEDEFS,COMPARISON_TO_NULL"
> 
> I'm not sure, given that the main reason for splitting strings in the
> first place is to avoid LONG_LINE_STRING warnings, I think we must
> choose between the two options. If split strings are not allowed, then
> long lines must be.
> 
> Since checkpatches.sh is used by various automated scripts to complain
> loudly about problems in submissions, the above change prevents
> maintainers from writing long string at all (can't split and can't go
> past 80 columns).
> 
> As a result, they will be tempted to cripple their code with nasty
> workarounds to shut up checkpatches.sh, we don't want that to happen.
> 
> Also I think the reasons stated by original commit cf75514c8e2e are
> still relevant. My vote would be to keep things as is.
> 
In my experience, checkpatch is smart enough to recognise when a long
line overflows the 80 character limit because of a single long string,
so the two options are not mutually exclusive. In other words, long
lines are not allowed except in the case where shortening the line
involves splitting a string. There may be a small amount of work in
getting checkpatch happy, i.e. by putting the string on a line of its
own, but we can indeed have our cake and eat it too in this case.
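
As a rough illustration of what I mean (the function name and the message
text are made up for this example, not taken from any DPDK file): the format
string sits alone on its line and is kept whole, so a grep for the message
finds it, while the surrounding code stays within the column limit.

```c
#include <stdio.h>

/* Format an error message; the string is deliberately not split. */
static int
format_port_error(char *buf, size_t len, unsigned int port_id)
{
	return snprintf(buf, len,
		"port %u: unable to allocate receive queue memory from the requested socket\n",
		port_id);
}
```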

/Bruce


Re: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, September 22, 2017 3:08 PM
> To: y...@fridaylinux.org; maxime.coque...@redhat.com
> Cc: dev@dpdk.org; Jain, Deepak K ; Piasecki,
> JacekX ; Liu, Changpeng
> ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated
> 
> From: Jacek Piasecki 
> 
> Fix size of buffer in strcpy. It was possible to get
> an unterminated string after the copy operation.
> 
> Coverity issue: 158631
> Fixes: db75c7af19bb ("examples/vhost_scsi: introduce a new sample app")
> Cc: changpeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jacek Piasecki 
> ---
>  examples/vhost_scsi/scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/vhost_scsi/scsi.c b/examples/vhost_scsi/scsi.c
> index 54d3104..de9639a 100644
> --- a/examples/vhost_scsi/scsi.c
> +++ b/examples/vhost_scsi/scsi.c
> @@ -307,7 +307,8 @@
>   strncpy((char *)inqdata->t10_vendor_id, "INTEL", 8);
> 
>   /* PRODUCT IDENTIFICATION */
> - strncpy((char *)inqdata->product_id, bdev->product_name,
> 16);
> + strncpy((char *)inqdata->product_id, bdev->product_name,
> + ARRAY_SIZE(inqdata->product_id) - 1);
> 
>   /* PRODUCT REVISION LEVEL */
>   strncpy((char *)inqdata->product_rev, "0001", 4);
> --
> 1.9.1

Hi Yu / Maxime,
I would like to ask for feedback regarding the proposed fix.
If everything is ok with it, please send acked-by.

Best regards
Michal.


Re: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, September 22, 2017 3:10 PM
> To: y...@fridaylinux.org; maxime.coque...@redhat.com
> Cc: dev@dpdk.org; Jain, Deepak K ; Piasecki,
> JacekX ; Liu, Changpeng
> ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated
> 
> From: Jacek Piasecki 
> 
> Fix size of buffer in strcpy. It was possible to get
> an unterminated string after the copy operation.
> 
> Coverity issue: 158629
> Fixes: db75c7af19bb ("examples/vhost_scsi: introduce a new sample app")
> Cc: changpeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jacek Piasecki 
> ---
>  examples/vhost_scsi/vhost_scsi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/vhost_scsi/vhost_scsi.c
> b/examples/vhost_scsi/vhost_scsi.c
> index b4f1f8d..b1a8c93 100644
> --- a/examples/vhost_scsi/vhost_scsi.c
> +++ b/examples/vhost_scsi/vhost_scsi.c
> @@ -186,8 +186,9 @@ static uint64_t gpa_to_vva(int vid, uint64_t gpa)
>   if (!bdev)
>   return NULL;
> 
> - strncpy(bdev->name, bdev_name, sizeof(bdev->name));
> - strncpy(bdev->product_name, bdev_serial, sizeof(bdev-
> >product_name));
> + strncpy(bdev->name, bdev_name, sizeof(bdev->name) - 1);
> + strncpy(bdev->product_name, bdev_serial,
> + sizeof(bdev->product_name) - 1);
>   bdev->blocklen = blk_size;
>   bdev->blockcnt = blk_cnt;
>   bdev->write_cache = wce_enable;
> --
> 1.9.1

Hi Yu / Maxime,
I would like to ask for feedback regarding the proposed fix.
If everything is ok with it, please send acked-by.
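
For reference, the pattern the fix relies on can be sketched as below. Note
that strncpy() does not write a terminator when the source is at least as
long as the count, so copying size - 1 bytes only yields a terminated string
if the last byte is already zero. The helper name copy_terminated is
hypothetical and it sets the terminator explicitly, which is an extra step
beyond what the patch does (the patch assumes a zeroed destination).

```c
#include <string.h>

/* Copy src into dst, truncating if needed, and always terminate dst. */
static void
copy_terminated(char *dst, size_t dst_size, const char *src)
{
	if (dst_size == 0)
		return;
	strncpy(dst, src, dst_size - 1);
	dst[dst_size - 1] = '\0';
}
```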

Best regards
Michal.


Re: [dpdk-dev] [PATCH v2] net/vmxnet3: fix dereference before null check

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: Jastrzebski, MichalX K
> Sent: Friday, September 29, 2017 3:04 PM
> To: skh...@vmware.com
> Cc: dev@dpdk.org; Jain, Deepak K ; Yigit, Ferruh
> ; Jastrzebski, MichalX K
> ; yongw...@vmware.com;
> sta...@dpdk.org; Kulasek, TomaszX 
> Subject: [PATCH v2] net/vmxnet3: fix dereference before null check
> 
> Coverity reports check_after_deref:
> Null-checking rq suggests that it may be null, but it
> has already been dereferenced on all paths leading to
> the check.
> This patch removes NULL checking of "rq" from function
> vmxnet3_dev_rx_queue_reset as it is already checked against NULL
> one level up the callstack (function vmxnet3_dev_clear_queues).
> 
> Coverity issue: 143468
> Fixes: 5aecdc17a97d ("vmxnet3: fix stop/restart")
> Cc: yongw...@vmware.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Tomasz Kulasek 
> Signed-off-by: Michal Jastrzebski 
> ---
>  drivers/net/vmxnet3/vmxnet3_rxtx.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> index d9cf437..0f8cfff 100644
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -265,11 +265,9 @@ vmxnet3_dev_rx_queue_reset(void *rxq)
>   struct vmxnet3_rx_data_ring *data_ring = &rq->data_ring;
>   int size;
> 
> - if (rq != NULL) {
> - /* Release both the cmd_rings mbufs */
> - for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++)
> - vmxnet3_rx_cmd_ring_release_mbufs(&rq-
> >cmd_ring[i]);
> - }
> + /* Release both the cmd_rings mbufs */
> + for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++)
> + vmxnet3_rx_cmd_ring_release_mbufs(&rq->cmd_ring[i]);
> 
>   ring0 = &rq->cmd_ring[0];
>   ring1 = &rq->cmd_ring[1];
> --
> 2.7.4

Hi Shrikrishna Khare,
I would like to ask for feedback regarding the proposed fix.
If everything is ok with it, please send acked-by.

Best regards
Michal.


[dpdk-dev] [PATCH 1/3] ethdev: add Rx HW timestamp capability

2017-10-02 Thread Raslan Darawsheh
Add a new offload capability flag for Rx HW timestamp and allow
enabling/disabling it via rte_eth_rxmode.

Signed-off-by: Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
This patch should be applied after this series:
http://dpdk.org/dev/patchwork/patch/29368/
---
 doc/guides/nics/features.rst  | 11 +++
 lib/librte_ether/rte_ethdev.c |  6 ++
 lib/librte_ether/rte_ethdev.h |  5 -
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index ba0d19f..fbdd6eb 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -566,6 +566,17 @@ Supports L4 checksum offload.
 * **[provides] rte_eth_dev_info**: 
``rx_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``,
   
``tx_offload_capa:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
 
+.. _nic_features_hw_timestamp:
+
+Timestamp offload
+-
+
+Supports Timestamp.
+
+* **[uses] rte_eth_rxconf,rte_eth_rxmode**: 
``offloads:DEV_RX_OFFLOAD_TIMESTAMP``.
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_TIMESTAMP``.
+  ``mbuf.timestamp``.
+  **[provides] rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_TIMESTAMP``
 
 .. _nic_features_macsec_offload:
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9b73d23..c5c5164 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -715,6 +715,8 @@ rte_eth_convert_rx_offload_bitfield(const struct 
rte_eth_rxmode *rxmode,
offloads |= DEV_RX_OFFLOAD_SCATTER;
if (rxmode->enable_lro == 1)
offloads |= DEV_RX_OFFLOAD_TCP_LRO;
+   if (rxmode->hw_timestamp == 1)
+   offloads |= DEV_RX_OFFLOAD_TIMESTAMP;
 
*rx_offloads = offloads;
 }
@@ -763,6 +765,10 @@ rte_eth_convert_rx_offloads(const uint64_t rx_offloads,
rxmode->enable_lro = 1;
else
rxmode->enable_lro = 0;
+   if (rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP)
+   rxmode->hw_timestamp = 1;
+   else
+   rxmode->hw_timestamp = 0;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ffd2ee5..bd63730 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -368,7 +368,8 @@ struct rte_eth_rxmode {
jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
enable_scatter   : 1, /**< Enable scatter packets rx handler */
-   enable_lro   : 1; /**< Enable LRO */
+   enable_lro   : 1, /**< Enable LRO */
+   hw_timestamp : 1; /**< Enable HW timestamp */
 };
 
 /**
@@ -924,6 +925,8 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
 #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040
 #define DEV_RX_OFFLOAD_MACSEC_STRIP 0x0080
+#define DEV_RX_OFFLOAD_TIMESTAMP 0x0100
+/**< Device delivers timestamp of packet arrival. */
 
 /**
  * TX offload capabilities of a device.
-- 
2.7.4
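
The two conversion helpers touched above keep the legacy rxmode bit and the
new offload flag in sync. A minimal standalone sketch of that round trip
follows; the struct, the function names and the flag values are hypothetical
stand-ins chosen for illustration and do not match the real ethdev
definitions.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define RX_OFFLOAD_TCP_LRO   (UINT64_C(1) << 0)
#define RX_OFFLOAD_TIMESTAMP (UINT64_C(1) << 1)

/* Reduced stand-in for the rxmode bit-field. */
struct rxmode_bits {
	unsigned int enable_lro   : 1;
	unsigned int hw_timestamp : 1;
};

/* Legacy bit-field to offload mask. */
static uint64_t
rxmode_to_offloads(const struct rxmode_bits *rxmode)
{
	uint64_t offloads = 0;

	if (rxmode->enable_lro == 1)
		offloads |= RX_OFFLOAD_TCP_LRO;
	if (rxmode->hw_timestamp == 1)
		offloads |= RX_OFFLOAD_TIMESTAMP;
	return offloads;
}

/* Offload mask back to the legacy bit-field. */
static void
offloads_to_rxmode(uint64_t offloads, struct rxmode_bits *rxmode)
{
	rxmode->enable_lro = (offloads & RX_OFFLOAD_TCP_LRO) ? 1 : 0;
	rxmode->hw_timestamp = (offloads & RX_OFFLOAD_TIMESTAMP) ? 1 : 0;
}
```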



[dpdk-dev] [PATCH 2/3] app/testpmd: add Rx HW timestamp

2017-10-02 Thread Raslan Darawsheh
Add enabling/disabling Rx HW timestamp from command line and parameter.

Signed-off-by: Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
 app/test-pmd/cmdline.c| 15 ---
 app/test-pmd/config.c |  8 
 app/test-pmd/parameters.c |  5 +
 app/test-pmd/rxonly.c |  2 ++
 app/test-pmd/testpmd.c|  1 +
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f2d731..80a249e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -675,7 +675,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"port config all max-pkt-len (value)\n"
"Set the max packet length.\n\n"
 
-   "port config all (crc-strip|scatter|rx-cksum|hw-vlan|hw-vlan-filter|"
+   "port config all (crc-strip|scatter|rx-cksum|rx-timestamp|hw-vlan|hw-vlan-filter|"
"hw-vlan-strip|hw-vlan-extend|drop-en)"
" (on|off)\n"
"Set crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
@@ -1588,6 +1588,15 @@ cmd_config_rx_mode_flag_parsed(void *parsed_result,
printf("Unknown parameter\n");
return;
}
+   } else if (!strcmp(res->name, "rx-timestamp")) {
+   if (!strcmp(res->value, "on"))
+   rx_mode.hw_timestamp = 1;
+   else if (!strcmp(res->value, "off"))
+   rx_mode.hw_timestamp = 0;
+   else {
+   printf("Unknown parameter\n");
+   return;
+   }
} else if (!strcmp(res->name, "hw-vlan")) {
if (!strcmp(res->value, "on")) {
rx_mode.hw_vlan_filter = 1;
@@ -1656,7 +1665,7 @@ cmdline_parse_token_string_t cmd_config_rx_mode_flag_all =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, all, "all");
 cmdline_parse_token_string_t cmd_config_rx_mode_flag_name =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, name,
-   "crc-strip#scatter#rx-cksum#hw-vlan#"
+   
"crc-strip#scatter#rx-cksum#rx-timestamp#hw-vlan#"

"hw-vlan-filter#hw-vlan-strip#hw-vlan-extend");
 cmdline_parse_token_string_t cmd_config_rx_mode_flag_value =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, value,
@@ -1665,7 +1674,7 @@ cmdline_parse_token_string_t 
cmd_config_rx_mode_flag_value =
 cmdline_parse_inst_t cmd_config_rx_mode_flag = {
.f = cmd_config_rx_mode_flag_parsed,
.data = NULL,
-   .help_str = "port config all crc-strip|scatter|rx-cksum|hw-vlan|"
+   .help_str = "port config all 
crc-strip|scatter|rx-cksum|rx-timestamp|hw-vlan|"
"hw-vlan-filter|hw-vlan-strip|hw-vlan-extend on|off",
.tokens = {
(void *)&cmd_config_rx_mode_flag_port,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index e8e311c..76addf3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -600,6 +600,14 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP) {
+   printf("HW timestamp:  ");
+   if (dev->data->dev_conf.rxmode.hw_timestamp)
+   printf("on\n");
+   else
+   printf("off\n");
+   }
+
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_QINQ_INSERT) {
printf("Double VLANs insert:   ");
if (ports[port_id].tx_ol_flags &
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2f7f70f..602d98d 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -162,6 +162,7 @@ usage(char* progname)
printf("  --disable-crc-strip: disable CRC stripping by hardware.\n");
printf("  --enable-lro: enable large receive offload.\n");
printf("  --enable-rx-cksum: enable rx hardware checksum offload.\n");
+   printf("  --enable-rx-timestamp: enable rx hardware timestamp 
offload.\n");
printf("  --disable-hw-vlan: disable hardware vlan.\n");
printf("  --disable-hw-vlan-filter: disable hardware vlan filter.\n");
printf("  --disable-hw-vlan-strip: disable hardware vlan strip.\n");
@@ -601,6 +602,7 @@ launch_args_parse(int argc, char** argv)
{ "disable-crc-strip",  0, 0, 0 },
{ "enable-lro", 0, 0, 0 },
{ "enable-rx-cksum",0, 0, 0 },
+   { "enable-rx-timestamp",0, 0, 0 },
{ "enable-scatter", 0, 0, 0 },
{ "disable-hw-vlan",0, 0, 0 },
{ "disable-hw-vlan-filter", 0, 0, 0 },
@@ -899,6 +901,9 

[dpdk-dev] [PATCH 3/3] net/mlx5: add Rx HW timestamp

2017-10-02 Thread Raslan Darawsheh
Expose Rx HW timestamp to packet mbufs.

Signed-off-by :Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
 drivers/net/mlx5/mlx5_ethdev.c   |  3 ++-
 drivers/net/mlx5/mlx5_rxq.c  |  6 +-
 drivers/net/mlx5/mlx5_rxtx.c |  5 +
 drivers/net/mlx5/mlx5_rxtx.h |  3 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 13 -
 5 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d8bcef4..892c2cc 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -686,7 +686,8 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
  DEV_RX_OFFLOAD_UDP_CKSUM |
  DEV_RX_OFFLOAD_TCP_CKSUM) :
 0) |
-   (priv->hw_vlan_strip ? DEV_RX_OFFLOAD_VLAN_STRIP : 0);
+   (priv->hw_vlan_strip ? DEV_RX_OFFLOAD_VLAN_STRIP : 0) |
+   DEV_RX_OFFLOAD_TIMESTAMP;
if (!priv->mps)
info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
if (priv->hw_csum)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 9bb6a29..e48e240 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -609,7 +609,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
attr.cq.mlx5 = (struct mlx5dv_cq_init_attr){
.comp_mask = 0,
};
-   if (priv->cqe_comp) {
+   if (priv->cqe_comp && !rxq_data->hw_timestamp) {
attr.cq.ibv.comp_mask |= IBV_CQ_INIT_ATTR_MASK_FLAGS;
attr.cq.mlx5.comp_mask |=
MLX5DV_CQ_INIT_ATTR_MASK_COMPRESSED_CQE;
@@ -620,6 +620,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 */
if (rxq_check_vec_support(rxq_data) < 0)
cqe_n *= 2;
+   } else if (priv->cqe_comp && rxq_data->hw_timestamp) {
+   DEBUG("Rx CQE compression is disabled for HW timestamp");
}
tmpl->cq = ibv_cq_ex_to_cq(mlx5dv_create_cq(priv->ctx, &attr.cq.ibv,
&attr.cq.mlx5));
@@ -936,6 +938,8 @@ mlx5_priv_rxq_new(struct priv *priv, uint16_t idx, uint16_t 
desc,
if (priv->hw_csum_l2tun)
tmpl->rxq.csum_l2tun =
!!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   tmpl->rxq.hw_timestamp =
+   !!dev->data->dev_conf.rxmode.hw_timestamp;
/* Configure VLAN stripping. */
tmpl->rxq.vlan_strip = (priv->hw_vlan_strip &&
   !!dev->data->dev_conf.rxmode.hw_vlan_strip);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 275cd6a..961967b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1887,6 +1887,11 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
pkt->vlan_tci =
rte_be_to_cpu_16(cqe->vlan_info);
}
+   if (rxq->hw_timestamp) {
+   pkt->timestamp =
+   rte_be_to_cpu_64(cqe->timestamp);
+   pkt->ol_flags |= PKT_RX_TIMESTAMP;
+   }
if (rxq->crc_present)
len -= ETHER_CRC_LEN;
PKT_LEN(pkt) = len;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8470a55..c207a8b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -106,6 +106,7 @@ struct rxq_zip {
 struct mlx5_rxq_data {
unsigned int csum:1; /* Enable checksum offloading. */
unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
+   unsigned int hw_timestamp:1; /* Enable HW timestamp. */
unsigned int vlan_strip:1; /* Enable VLAN stripping. */
unsigned int crc_present:1; /* CRC must be subtracted. */
unsigned int sges_n:2; /* Log 2 of SGEs (max buffers per packet). */
@@ -115,7 +116,7 @@ struct mlx5_rxq_data {
unsigned int rss_hash:1; /* RSS hash result is enabled. */
unsigned int mark:1; /* Marked flow available on the queue. */
unsigned int pending_err:1; /* CQE error needs to be handled. */
-   unsigned int :7; /* Remaining bits. */
+   unsigned int :6; /* Remaining bits. */
volatile uint32_t *rq_db;
volatile uint32_t *cq_db;
uint16_t rq_ci;
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h 
b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index c2142d7..e9819b7 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -545,7 +545,8 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i 
cqes[4],
 {
__m128i pinfo0, pinfo1;
__m128i pinfo, ptype;
-   __m128i ol_flags = _mm_set1_epi32(rxq->rss_hash * PKT

Re: [dpdk-dev] [PATCH] examples/performance-thread: fix out-of-bounds read

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Slawomir
> Mrozowicz
> Sent: Wednesday, September 20, 2017 9:48 AM
> To: Mcnamara, John 
> Cc: dev@dpdk.org; Mrozowicz, SlawomirX
> ; ian.be...@intel.com; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/performance-thread: fix out-of-
> bounds read
> 
> Overrunning array schedcore of 128 8-byte elements at element index 128
> using index lcore_id.
> Fixed by correcting the lcore_id index check and
> changing the type of lcore_id to unsigned.
> 
> Coverity issue: 143459
> Fixes: 116819b9ed0d ("examples/performance-thread: add lthread
> subsystem")
> Cc: ian.be...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Slawomir Mrozowicz 
> ---
>  examples/performance-thread/common/lthread.h   |  2 +-
>  examples/performance-thread/common/lthread_sched.c | 11 +++
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/examples/performance-thread/common/lthread.h
> b/examples/performance-thread/common/lthread.h
> index 5c2c1a5f0..0cde5919b 100644
> --- a/examples/performance-thread/common/lthread.h
> +++ b/examples/performance-thread/common/lthread.h
> @@ -87,7 +87,7 @@ int _lthread_desched_sleep(struct lthread *lt);
> 
>  void _lthread_free(struct lthread *lt);
> 
> -struct lthread_sched *_lthread_sched_get(int lcore_id);
> +struct lthread_sched *_lthread_sched_get(unsigned int lcore_id);
> 
>  struct lthread_stack *_stack_alloc(void);
> 
> diff --git a/examples/performance-thread/common/lthread_sched.c
> b/examples/performance-thread/common/lthread_sched.c
> index 98291478e..3484387b4 100644
> --- a/examples/performance-thread/common/lthread_sched.c
> +++ b/examples/performance-thread/common/lthread_sched.c
> @@ -562,11 +562,14 @@ void lthread_run(void)
>   * Return the scheduler for this lcore
>   *
>   */
> -struct lthread_sched *_lthread_sched_get(int lcore_id)
> +struct lthread_sched *_lthread_sched_get(unsigned int lcore_id)
>  {
> - if (lcore_id > LTHREAD_MAX_LCORES)
> - return NULL;
> - return schedcore[lcore_id];
> + struct lthread_sched *res = NULL;
> +
> + if (lcore_id < LTHREAD_MAX_LCORES)
> + res = schedcore[lcore_id];
> +
> + return res;
>  }
> 
>  /*
> --
> 2.11.0

Hi John, 
Here are four fixes for coverity issues in lthread code:
http://dpdk.org/dev/patchwork/patch/28979/
http://dpdk.org/dev/patchwork/patch/28977/
http://dpdk.org/dev/patchwork/patch/28976/
http://dpdk.org/dev/patchwork/patch/28975/

I would like to ask for Your feedback about these fix proposals.
If everything is ok with them, please send acked-by.

Best regards
Michal.
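
For reference, the off-by-one behind Coverity issue 143459 can be reproduced in isolation. The sketch below is not the lthread code: MAX_SLOTS, slots, slot_get() and old_check_accepts() are made-up stand-ins for LTHREAD_MAX_LCORES, schedcore and _lthread_sched_get(), and it only illustrates the shape of the check.

```c
#include <stddef.h>

#define MAX_SLOTS 128 /* stand-in for LTHREAD_MAX_LCORES */

/* Stand-in for schedcore[]: valid indexes are 0 .. MAX_SLOTS - 1. */
static void *slots[MAX_SLOTS];

/*
 * Shape of the fixed check: an unsigned index cannot be negative, and
 * "<" rejects idx == MAX_SLOTS, which the old ">" test let through.
 */
static void *slot_get(unsigned int idx)
{
	void *res = NULL;

	if (idx < MAX_SLOTS)
		res = slots[idx];

	return res;
}

/* The old predicate, kept only to show which indexes slip through. */
static int old_check_accepts(int idx)
{
	return !(idx > MAX_SLOTS);
}
```

With the old predicate both idx == MAX_SLOTS and a negative idx are accepted; slot_get() returns NULL for MAX_SLOTS and for any value that was negative before the conversion to unsigned.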


Re: [dpdk-dev] [PATCH] acl: fix unchecked return value

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Kuba Kozak
> Sent: Wednesday, September 20, 2017 12:02 PM
> To: Ananyev, Konstantin 
> Cc: dev@dpdk.org; Kozak, KubaX ;
> sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] acl: fix unchecked return value
> 
> Add return value check and error handling for fseek call.
> 
> Coverity issue: 143435
> Fixes: 361b2e9559fc ("acl: new sample l3fwd-acl")
> Cc: konstantin.anan...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kuba Kozak 
> ---
>  examples/l3fwd-acl/main.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
> index 8eff4de..708e9eb 100644
> --- a/examples/l3fwd-acl/main.c
> +++ b/examples/l3fwd-acl/main.c
> @@ -1026,6 +1026,7 @@ add_rules(const char *rule_path,
>   char buff[LINE_MAX];
>   FILE *fh = fopen(rule_path, "rb");
>   unsigned int i = 0;
> + int val;
> 
>   if (fh == NULL)
>   rte_exit(EXIT_FAILURE, "%s: Open %s failed\n", __func__,
> @@ -1042,7 +1043,11 @@ add_rules(const char *rule_path,
>   rte_exit(EXIT_FAILURE, "Not find any route entries in %s!\n",
>   rule_path);
> 
> - fseek(fh, 0, SEEK_SET);
> + val = fseek(fh, 0, SEEK_SET);
> + if (val < 0) {
> + rte_exit(EXIT_FAILURE, "%s: File seek operation failed\n",
> + __func__);
> + }
> 
>   acl_rules = calloc(acl_num, rule_size);
> 
> --
> 2.7.4

Hi Konstantin,
I would like to ask for feedback regarding the proposed fix.
If everything is ok with it, please send acked-by.

Best regards Michal.


Re: [dpdk-dev] [PATCH v2] eal: fix resource leak

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Daniel Mrzyglod
> Sent: Friday, September 22, 2017 4:48 PM
> To: tho...@monjalon.net
> Cc: dev@dpdk.org; Mrzyglod, DanielX T 
> Subject: [dpdk-dev] [PATCH v2] eal: fix resource leak
> 
> Memory allocated by strdup is not freed.
> 
> Coverity issue: 143257
> Fixes: d8a2bc71dfc2 ("log: remove app path from syslog id")
> Cc: tho...@monjalon.net
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
> v2:
> * Fix due to compilation errors
> 
>  lib/librte_eal/linuxapp/eal/eal.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 48f12f4..a7df566 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -751,7 +751,7 @@ rte_eal_init(int argc, char **argv)
>   int i, fctret, ret;
>   pthread_t thread_id;
>   static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0);
> - const char *logid;
> + char *logid;
>   char cpuset[RTE_CPU_AFFINITY_STR_LEN];
>   char thread_name[RTE_MAX_THREAD_NAME_LEN];
> 
> @@ -781,6 +781,7 @@ rte_eal_init(int argc, char **argv)
>   if (rte_eal_cpu_init() < 0) {
>   rte_eal_init_alert("Cannot detect lcores.");
>   rte_errno = ENOTSUP;
> + free(logid);
>   return -1;
>   }
> 
> @@ -789,6 +790,7 @@ rte_eal_init(int argc, char **argv)
>   rte_eal_init_alert("Invalid 'command line' arguments.");
>   rte_errno = EINVAL;
>   rte_atomic32_clear(&run_once);
> + free(logid);
>   return -1;
>   }
> 
> @@ -799,6 +801,7 @@ rte_eal_init(int argc, char **argv)
>   rte_eal_init_alert("Cannot get hugepage information.");
>   rte_errno = EACCES;
>   rte_atomic32_clear(&run_once);
> + free(logid);
>   return -1;
>   }
> 
> @@ -826,6 +829,7 @@ rte_eal_init(int argc, char **argv)
>   rte_eal_init_alert("Cannot init logging.");
>   rte_errno = ENOMEM;
>   rte_atomic32_clear(&run_once);
> + free(logid);
>   return -1;
>   }
> 
> @@ -834,6 +838,7 @@ rte_eal_init(int argc, char **argv)
>   rte_eal_init_alert("Cannot init VFIO\n");
>   rte_errno = EAGAIN;
>   rte_atomic32_clear(&run_once);
> + free(logid);
>   return -1;
>   }
>  #endif
> @@ -841,6 +846,7 @@ rte_eal_init(int argc, char **argv)
>   if (rte_eal_memory_init() < 0) {
>   rte_eal_init_alert("Cannot init memory\n");
>   rte_errno = ENOMEM;
> + free(logid);
>   return -1;
>   }
> 
> @@ -850,24 +856,28 @@ rte_eal_init(int argc, char **argv)
>   if (rte_eal_memzone_init() < 0) {
>   rte_eal_init_alert("Cannot init memzone\n");
>   rte_errno = ENODEV;
> + free(logid);
>   return -1;
>   }
> 
>   if (rte_eal_tailqs_init() < 0) {
>   rte_eal_init_alert("Cannot init tail queues for objects\n");
>   rte_errno = EFAULT;
> + free(logid);
>   return -1;
>   }
> 
>   if (rte_eal_alarm_init() < 0) {
>   rte_eal_init_alert("Cannot init interrupt-handling
> thread\n");
>   /* rte_eal_alarm_init sets rte_errno on failure. */
> + free(logid);
>   return -1;
>   }
> 
>   if (rte_eal_timer_init() < 0) {
>   rte_eal_init_alert("Cannot init HPET or TSC timers\n");
>   rte_errno = ENOTSUP;
> + free(logid);
>   return -1;
>   }
> 
> @@ -886,17 +896,20 @@ rte_eal_init(int argc, char **argv)
> 
>   if (rte_eal_intr_init() < 0) {
>   rte_eal_init_alert("Cannot init interrupt-handling
> thread\n");
> + free(logid);
>   return -1;
>   }
> 
>   if (eal_option_device_parse()) {
>   rte_errno = ENODEV;
> + free(logid);
>   return -1;
>   }
> 
>   if (rte_bus_scan()) {
>   rte_eal_init_alert("Cannot scan the buses for devices\n");
>   rte_errno = ENODEV;
> + free(logid);
>   return -1;
>   }
> 
> @@ -941,6 +954,7 @@ rte_eal_init(int argc, char **argv)
>   if (ret) {
>   rte_eal_init_alert("rte_service_init() failed\n");
>   rte_errno = ENOEXEC;
> + free(logid);
>   return -1;
>   }
> 
> @@ -948,6 +962,7 @@ rte_eal_init(int argc, char **argv)
>   if (rte_bus_probe()) {
>   rte_eal_init_alert("Cannot probe devices\n");
>   rte_errno = ENOTSUP;
> + free(logid);
>   return -1;
>   }
> 
> @@ -957,6 +972,7 @@ rte_eal_init(int argc, char **argv)
>   ret = rte_service_start_with_defaults();
>   if (ret

Re: [dpdk-dev] [PATCH] vhost: fix unchecked return value

2017-10-02 Thread Jastrzebski, MichalX K
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Kuba Kozak
> Sent: Friday, September 22, 2017 2:18 PM
> To: y...@fridaylinux.org; maxime.coque...@redhat.com
> Cc: dev@dpdk.org; Kozak, KubaX ;
> jan.wick...@ericsson.com; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH] vhost: fix unchecked return value
> 
> Add return value check for poll() call.
> 
> Coverity issue: 140740
> Fixes: 59317cef249c ("vhost: allow many vhost-user ports")
> Cc: jan.wick...@ericsson.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kuba Kozak 
> ---
>  lib/librte_vhost/fd_man.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c
> index 2ceacc9..4c6fed4 100644
> --- a/lib/librte_vhost/fd_man.c
> +++ b/lib/librte_vhost/fd_man.c
> @@ -222,6 +222,7 @@ fdset_event_dispatch(void *arg)
>   int remove1, remove2;
>   int need_shrink;
>   struct fdset *pfdset = arg;
> + int val;
> 
>   if (pfdset == NULL)
>   return NULL;
> @@ -239,7 +240,9 @@ fdset_event_dispatch(void *arg)
>   numfds = pfdset->num;
>   pthread_mutex_unlock(&pfdset->fd_mutex);
> 
> - poll(pfdset->rwfds, numfds, 1000 /* millisecs */);
> + val = poll(pfdset->rwfds, numfds, 1000 /* millisecs */);
> + if (val < 0)
> + continue;
> 
>   need_shrink = 0;
>   for (i = 0; i < numfds; i++) {
> --
> 2.7.4

Hi Yu / Maxime,
I would like to ask for feedback regarding the proposed fix.
If everything is ok with it, please send acked-by.

Best regards
Michal.
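
As a standalone illustration of checking the poll() result, here is a minimal sketch. wait_ready() is a hypothetical helper, not part of fd_man.c, and treating EINTR as "nothing ready, retry" is an assumption of this example rather than something the patch does (the patch simply continues the loop on any negative value).

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait on fds and report how many are ready.
 * Returns -1 on a real error and 0 on timeout or EINTR, so the caller
 * only walks the revents fields when the return value is positive.
 */
static int wait_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int val = poll(fds, nfds, timeout_ms);

	if (val < 0)
		return errno == EINTR ? 0 : -1;

	return val;
}
```

The point is the same as in the patch: when poll() fails, the revents fields should not be relied on, so the dispatch loop should skip them for that iteration.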


Re: [dpdk-dev] [PATCH] acl: fix unchecked return value

2017-10-02 Thread Ananyev, Konstantin


> -Original Message-
> From: Kozak, KubaX
> Sent: Wednesday, September 20, 2017 11:02 AM
> To: Ananyev, Konstantin 
> Cc: dev@dpdk.org; Kozak, KubaX ; sta...@dpdk.org
> Subject: [PATCH] acl: fix unchecked return value
> 
> Add return value check and error handling for fseek call.
> 
> Coverity issue: 143435
> Fixes: 361b2e9559fc ("acl: new sample l3fwd-acl")
> Cc: konstantin.anan...@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kuba Kozak 
> ---
>  examples/l3fwd-acl/main.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c
> index 8eff4de..708e9eb 100644
> --- a/examples/l3fwd-acl/main.c
> +++ b/examples/l3fwd-acl/main.c
> @@ -1026,6 +1026,7 @@ add_rules(const char *rule_path,
>   char buff[LINE_MAX];
>   FILE *fh = fopen(rule_path, "rb");
>   unsigned int i = 0;
> + int val;
> 
>   if (fh == NULL)
>   rte_exit(EXIT_FAILURE, "%s: Open %s failed\n", __func__,
> @@ -1042,7 +1043,11 @@ add_rules(const char *rule_path,
>   rte_exit(EXIT_FAILURE, "Not find any route entries in %s!\n",
>   rule_path);
> 
> - fseek(fh, 0, SEEK_SET);
> + val = fseek(fh, 0, SEEK_SET);
> + if (val < 0) {
> + rte_exit(EXIT_FAILURE, "%s: File seek operation failed\n",
> + __func__);
> + }
> 
>   acl_rules = calloc(acl_num, rule_size);
> 
> --
> 2.7.4

Acked-by: Konstantin Ananyev 

BTW, I think it should be l3fwd-acl inside the subject line.
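
For readers following along, the pattern being fixed (read a file once to count entries, then rewind for a second pass) can be sketched on its own. count_lines_and_rewind() is a hypothetical helper and not the l3fwd-acl code; it returns an error instead of calling rte_exit() so that it builds without DPDK.

```c
#include <stdio.h>

/*
 * Hypothetical helper: count the lines in fh, then rewind for a second pass.
 * Returns the line count, or -1 if the stream cannot be repositioned.
 */
static long count_lines_and_rewind(FILE *fh)
{
	long lines = 0;
	int c;

	while ((c = fgetc(fh)) != EOF)
		if (c == '\n')
			lines++;

	/* fseek() returns 0 on success and -1 on failure. */
	if (fseek(fh, 0, SEEK_SET) != 0)
		return -1;

	return lines;
}
```

If the return value of fseek() is ignored and the seek fails, the second pass starts at end-of-file and reads nothing, which is why the check matters here.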



Re: [dpdk-dev] [PATCH v7 7/8] mempool: introduce block size align flag

2017-10-02 Thread Olivier MATZ
On Sun, Oct 01, 2017 at 02:59:01PM +0530, Santosh Shukla wrote:
> Some mempool hw like octeontx/fpa block, demands block size
> (/total_elem_sz) aligned object start address.
> 
> Introducing an MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag.
> If this flag is set:
> - Align object start address(vaddr) to a multiple of total_elt_sz.
> - Allocate one additional object. Additional object is needed to make
>   sure that requested 'n' object gets correctly populated.
> 
> Example:
> - Let's say that we get 'x' size of memory chunk from memzone.
> - And application has requested 'n' object from mempool.
> - Ideally, we start using objects at start address 0 to...(x-block_sz)
>   for n obj.
> - Not necessarily first object address i.e. 0 is aligned to block_sz.
> - So we derive 'offset' value for block_sz alignment purpose i.e..'off'.
> - That 'off' makes sure that start address of object is blk_sz aligned.
> - Calculating 'off' may end up sacrificing first block_sz area of
>   memzone area x. So total number of the object which can fit in the
>   pool area is n-1, Which is incorrect behavior.
> 
> Therefore we request one additional object (/block_sz area) from memzone
> when MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS flag is set.
> 
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> Tested-by: Hemant Agrawal 

Acked-by: Olivier Matz 
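
The arithmetic in the example from the commit log can be checked with a small standalone sketch. This is not the mempool code: align_off() and objs_that_fit() are hypothetical helpers, and the start address and element size used below are arbitrary illustration values.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes to skip so that (start + off) is a multiple of elt_sz. */
static size_t align_off(uintptr_t start, size_t elt_sz)
{
	size_t rem = start % elt_sz;

	return rem ? elt_sz - rem : 0;
}

/* Number of elt_sz-sized objects that fit in [start, start + len) once aligned. */
static size_t objs_that_fit(uintptr_t start, size_t len, size_t elt_sz)
{
	size_t off = align_off(start, elt_sz);

	return off > len ? 0 : (len - off) / elt_sz;
}
```

With elt_sz = 192 and a chunk starting at address 4160, the offset is 64 bytes. A chunk sized for exactly n = 4 objects (768 bytes) then holds only 3 aligned objects, while a chunk sized for n + 1 objects (960 bytes) holds the requested 4. That is the n-1 shortfall the commit log describes and the reason for requesting one additional object.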


Re: [dpdk-dev] [PATCH v7 8/8] mempool: notify memory area to pool

2017-10-02 Thread Olivier MATZ
On Sun, Oct 01, 2017 at 02:59:02PM +0530, Santosh Shukla wrote:
> HW pool manager e.g. Octeontx SoC demands s/w to program start and end
> address of pool. Currently, there is no such api in external mempool.
> Introducing rte_mempool_ops_register_memory_area api which will let HW(pool
> manager) to know when common layer selects hugepage:
> For each hugepage - Notify its start/end address to HW pool manager.
> 
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH v5 1/2] eal: allow user to override default pool handle

2017-10-02 Thread Olivier MATZ
On Sun, Oct 01, 2017 at 02:44:39PM +0530, Santosh Shukla wrote:
> DPDK has support for both sw and hw mempool and
> currently user is limited to use ring_mp_mc pool.
> In case user want to use other pool handle,
> need to update config RTE_MEMPOOL_OPS_DEFAULT, then
> build and run with desired pool handle.
> 
> Introducing eal option to override default pool handle.
> 
> Now user can override the RTE_MEMPOOL_OPS_DEFAULT by passing
> pool handle to eal `--mbuf-pool-ops-name=""`.
> 
> Signed-off-by: Santosh Shukla 
> Acked-by: Hemant Agrawal 

Acked-by: Olivier Matz 


Re: [dpdk-dev] [PATCH v5 2/2] ethdev: get the supported pool for a port

2017-10-02 Thread Olivier MATZ
On Sun, Oct 01, 2017 at 02:44:40PM +0530, Santosh Shukla wrote:
> Now that dpdk supports more than one mempool drivers and
> each mempool driver works best for specific PMD, example:
> - sw ring based mempool for Intel PMD drivers.
> - dpaa2 HW mempool manager for dpaa2 PMD driver.
> - fpa HW mempool manager for Octeontx PMD driver.
> 
> Application would like to know the best mempool handle
> for any port.
> 
> Introducing rte_eth_dev_pool_ops_supported() API,
> which allows PMD driver to advertise
> his supported pool capability to the application.
> 
> Supported pools are categorized in below priority:-
> - Best mempool handle for this port (Highest priority '0')
> - Port supports this mempool handle (Priority '1')
> 
> Signed-off-by: Santosh Shukla 

Acked-by: Olivier Matz 


[dpdk-dev] [PATCH 1/3] ethdev: add Rx HW timestamp capability

2017-10-02 Thread Raslan Darawsheh
Add a new offload capability flag for Rx HW
timestamp and enabling/disabling this via rte_eth_rxmode.

Signed-off-by: Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
This patch should be applied after this series:
http://dpdk.org/dev/patchwork/patch/29368/
---
 doc/guides/nics/features.rst  | 11 +++
 lib/librte_ether/rte_ethdev.c |  6 ++
 lib/librte_ether/rte_ethdev.h |  5 -
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index ba0d19f..fbdd6eb 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -566,6 +566,17 @@ Supports L4 checksum offload.
 * **[provides] rte_eth_dev_info**: 
``rx_offload_capa:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``,
   
``tx_offload_capa:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
 
+.. _nic_features_hw_timestamp:
+
+Timestamp offload
+-----------------
+
+Supports Timestamp.
+
+* **[uses] rte_eth_rxconf,rte_eth_rxmode**: 
``offloads:DEV_RX_OFFLOAD_TIMESTAMP``.
+* **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_TIMESTAMP``.
+  ``mbuf.timestamp``.
+  **[provides] rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_TIMESTAMP``
 
 .. _nic_features_macsec_offload:
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9b73d23..c5c5164 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -715,6 +715,8 @@ rte_eth_convert_rx_offload_bitfield(const struct 
rte_eth_rxmode *rxmode,
offloads |= DEV_RX_OFFLOAD_SCATTER;
if (rxmode->enable_lro == 1)
offloads |= DEV_RX_OFFLOAD_TCP_LRO;
+   if (rxmode->hw_timestamp == 1)
+   offloads |= DEV_RX_OFFLOAD_TIMESTAMP;
 
*rx_offloads = offloads;
 }
@@ -763,6 +765,10 @@ rte_eth_convert_rx_offloads(const uint64_t rx_offloads,
rxmode->enable_lro = 1;
else
rxmode->enable_lro = 0;
+   if (rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP)
+   rxmode->hw_timestamp = 1;
+   else
+   rxmode->hw_timestamp = 0;
 }
 
 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ffd2ee5..bd63730 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -368,7 +368,8 @@ struct rte_eth_rxmode {
jumbo_frame  : 1, /**< Jumbo Frame Receipt enable. */
hw_strip_crc : 1, /**< Enable CRC stripping by hardware. */
enable_scatter   : 1, /**< Enable scatter packets rx handler */
-   enable_lro   : 1; /**< Enable LRO */
+   enable_lro   : 1, /**< Enable LRO */
+   hw_timestamp : 1; /**< Enable HW timestamp */
 };
 
 /**
@@ -924,6 +925,8 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
 #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040
 #define DEV_RX_OFFLOAD_MACSEC_STRIP 0x0080
+#define DEV_RX_OFFLOAD_TIMESTAMP 0x0100
+/**< Device delivers timestamp of packet arrival. */
 
 /**
  * TX offload capabilities of a device.
-- 
2.7.4



[dpdk-dev] [PATCH 2/3] app/testpmd: add Rx HW timestamp

2017-10-02 Thread Raslan Darawsheh
Add enabling/disabling Rx HW timestamp from
command line and parameter.

Signed-off-by: Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
 app/test-pmd/cmdline.c| 15 ---
 app/test-pmd/config.c |  8 
 app/test-pmd/parameters.c |  5 +
 app/test-pmd/rxonly.c |  2 ++
 app/test-pmd/testpmd.c|  1 +
 5 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f2d731..80a249e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -675,7 +675,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"port config all max-pkt-len (value)\n"
"Set the max packet length.\n\n"
 
-   "port config all 
(crc-strip|scatter|rx-cksum|hw-vlan|hw-vlan-filter|"
+   "port config all 
(crc-strip|scatter|rx-cksum|rx-timestamp|hw-vlan|hw-vlan-filter|"
"hw-vlan-strip|hw-vlan-extend|drop-en)"
" (on|off)\n"
"Set 
crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
@@ -1588,6 +1588,15 @@ cmd_config_rx_mode_flag_parsed(void *parsed_result,
printf("Unknown parameter\n");
return;
}
+   } else if (!strcmp(res->name, "rx-timestamp")) {
+   if (!strcmp(res->value, "on"))
+   rx_mode.hw_timestamp = 1;
+   else if (!strcmp(res->value, "off"))
+   rx_mode.hw_timestamp = 0;
+   else {
+   printf("Unknown parameter\n");
+   return;
+   }
} else if (!strcmp(res->name, "hw-vlan")) {
if (!strcmp(res->value, "on")) {
rx_mode.hw_vlan_filter = 1;
@@ -1656,7 +1665,7 @@ cmdline_parse_token_string_t cmd_config_rx_mode_flag_all =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, all, "all");
 cmdline_parse_token_string_t cmd_config_rx_mode_flag_name =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, name,
-   "crc-strip#scatter#rx-cksum#hw-vlan#"
+   
"crc-strip#scatter#rx-cksum#rx-timestamp#hw-vlan#"

"hw-vlan-filter#hw-vlan-strip#hw-vlan-extend");
 cmdline_parse_token_string_t cmd_config_rx_mode_flag_value =
TOKEN_STRING_INITIALIZER(struct cmd_config_rx_mode_flag, value,
@@ -1665,7 +1674,7 @@ cmdline_parse_token_string_t 
cmd_config_rx_mode_flag_value =
 cmdline_parse_inst_t cmd_config_rx_mode_flag = {
.f = cmd_config_rx_mode_flag_parsed,
.data = NULL,
-   .help_str = "port config all crc-strip|scatter|rx-cksum|hw-vlan|"
+   .help_str = "port config all 
crc-strip|scatter|rx-cksum|rx-timestamp|hw-vlan|"
"hw-vlan-filter|hw-vlan-strip|hw-vlan-extend on|off",
.tokens = {
(void *)&cmd_config_rx_mode_flag_port,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index e8e311c..76addf3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -600,6 +600,14 @@ port_offload_cap_display(portid_t port_id)
printf("off\n");
}
 
+   if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP) {
+   printf("HW timestamp:  ");
+   if (dev->data->dev_conf.rxmode.hw_timestamp)
+   printf("on\n");
+   else
+   printf("off\n");
+   }
+
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_QINQ_INSERT) {
printf("Double VLANs insert:   ");
if (ports[port_id].tx_ol_flags &
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 2f7f70f..602d98d 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -162,6 +162,7 @@ usage(char* progname)
printf("  --disable-crc-strip: disable CRC stripping by hardware.\n");
printf("  --enable-lro: enable large receive offload.\n");
printf("  --enable-rx-cksum: enable rx hardware checksum offload.\n");
+   printf("  --enable-rx-timestamp: enable rx hardware timestamp 
offload.\n");
printf("  --disable-hw-vlan: disable hardware vlan.\n");
printf("  --disable-hw-vlan-filter: disable hardware vlan filter.\n");
printf("  --disable-hw-vlan-strip: disable hardware vlan strip.\n");
@@ -601,6 +602,7 @@ launch_args_parse(int argc, char** argv)
{ "disable-crc-strip",  0, 0, 0 },
{ "enable-lro", 0, 0, 0 },
{ "enable-rx-cksum",0, 0, 0 },
+   { "enable-rx-timestamp",0, 0, 0 },
{ "enable-scatter", 0, 0, 0 },
{ "disable-hw-vlan",0, 0, 0 },
{ "disable-hw-vlan-filter", 0, 0, 0 },
@@ -899,6 +901,9 

[dpdk-dev] [PATCH 3/3] net/mlx5: add Rx HW timestamp

2017-10-02 Thread Raslan Darawsheh
Expose Rx HW timestamp to packet mbufs.

Signed-off-by: Raslan Darawsheh 
Acked-by: Yongseok Koh 
---
 drivers/net/mlx5/mlx5_ethdev.c   |  3 ++-
 drivers/net/mlx5/mlx5_rxq.c  |  6 +-
 drivers/net/mlx5/mlx5_rxtx.c |  5 +
 drivers/net/mlx5/mlx5_rxtx.h |  3 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 13 -
 5 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d8bcef4..892c2cc 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -686,7 +686,8 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *info)
  DEV_RX_OFFLOAD_UDP_CKSUM |
  DEV_RX_OFFLOAD_TCP_CKSUM) :
 0) |
-   (priv->hw_vlan_strip ? DEV_RX_OFFLOAD_VLAN_STRIP : 0);
+   (priv->hw_vlan_strip ? DEV_RX_OFFLOAD_VLAN_STRIP : 0) |
+   DEV_RX_OFFLOAD_TIMESTAMP;
if (!priv->mps)
info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT;
if (priv->hw_csum)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 9bb6a29..e48e240 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -609,7 +609,7 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
attr.cq.mlx5 = (struct mlx5dv_cq_init_attr){
.comp_mask = 0,
};
-   if (priv->cqe_comp) {
+   if (priv->cqe_comp && !rxq_data->hw_timestamp) {
attr.cq.ibv.comp_mask |= IBV_CQ_INIT_ATTR_MASK_FLAGS;
attr.cq.mlx5.comp_mask |=
MLX5DV_CQ_INIT_ATTR_MASK_COMPRESSED_CQE;
@@ -620,6 +620,8 @@ mlx5_priv_rxq_ibv_new(struct priv *priv, uint16_t idx)
 */
if (rxq_check_vec_support(rxq_data) < 0)
cqe_n *= 2;
+   } else if (priv->cqe_comp && rxq_data->hw_timestamp) {
+   DEBUG("Rx CQE compression is disabled for HW timestamp");
}
tmpl->cq = ibv_cq_ex_to_cq(mlx5dv_create_cq(priv->ctx, &attr.cq.ibv,
&attr.cq.mlx5));
@@ -936,6 +938,8 @@ mlx5_priv_rxq_new(struct priv *priv, uint16_t idx, uint16_t 
desc,
if (priv->hw_csum_l2tun)
tmpl->rxq.csum_l2tun =
!!dev->data->dev_conf.rxmode.hw_ip_checksum;
+   tmpl->rxq.hw_timestamp =
+   !!dev->data->dev_conf.rxmode.hw_timestamp;
/* Configure VLAN stripping. */
tmpl->rxq.vlan_strip = (priv->hw_vlan_strip &&
   !!dev->data->dev_conf.rxmode.hw_vlan_strip);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 275cd6a..961967b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1887,6 +1887,11 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
pkt->vlan_tci =
rte_be_to_cpu_16(cqe->vlan_info);
}
+   if (rxq->hw_timestamp) {
+   pkt->timestamp =
+   rte_be_to_cpu_64(cqe->timestamp);
+   pkt->ol_flags |= PKT_RX_TIMESTAMP;
+   }
if (rxq->crc_present)
len -= ETHER_CRC_LEN;
PKT_LEN(pkt) = len;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8470a55..c207a8b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -106,6 +106,7 @@ struct rxq_zip {
 struct mlx5_rxq_data {
unsigned int csum:1; /* Enable checksum offloading. */
unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
+   unsigned int hw_timestamp:1; /* Enable HW timestamp. */
unsigned int vlan_strip:1; /* Enable VLAN stripping. */
unsigned int crc_present:1; /* CRC must be subtracted. */
unsigned int sges_n:2; /* Log 2 of SGEs (max buffers per packet). */
@@ -115,7 +116,7 @@ struct mlx5_rxq_data {
unsigned int rss_hash:1; /* RSS hash result is enabled. */
unsigned int mark:1; /* Marked flow available on the queue. */
unsigned int pending_err:1; /* CQE error needs to be handled. */
-   unsigned int :7; /* Remaining bits. */
+   unsigned int :6; /* Remaining bits. */
volatile uint32_t *rq_db;
volatile uint32_t *cq_db;
uint16_t rq_ci;
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h 
b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index c2142d7..e9819b7 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -545,7 +545,8 @@ rxq_cq_to_ptype_oflags_v(struct mlx5_rxq_data *rxq, __m128i 
cqes[4],
 {
__m128i pinfo0, pinfo1;
__m128i pinfo, ptype;
-   __m128i ol_flags = _mm_set1_epi32(rxq->rss_hash * PKT

Re: [dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread Thomas Monjalon
Hi,
I have some comments about the API scope and some formatting.

Suggested title:
power: add turbo functions to map file

02/10/2017 14:20, David Hunt:
> allows vm_power_manager example to be built against shared libraries

Fixes: 94608a0f7f45 ("power: add per-core turbo boost API")

> Signed-off-by: David Hunt 
[...]
> +DPDK_17.11 {
> + global:
> +
> + rte_power_acpi_turbo_status;

Is it really the function you want to expose?
rte_power_turbo_status seems more generic.

More comments about what is part of the API:
If you do not want to expose ACPI and VM implementations,
it should not be part of the rte_* include files.

> + rte_power_freq_disable_turbo;
> + rte_power_freq_enable_turbo;
> +};
> +

This is a trailing new line.




[dpdk-dev] [PATCH] eal: add doc for constructor macros

2017-10-02 Thread Thomas Monjalon
It is a reminder that the constructors without priority
get the lowest priority.

Signed-off-by: Thomas Monjalon 
---
 lib/librte_eal/common/include/rte_eal.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index 0e7363d77..559d2308e 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -287,9 +287,26 @@ static inline int rte_gettid(void)
return RTE_PER_LCORE(_thread_id);
 }
 
+/**
+ * Run function before main() with low priority.
+ *
+ * The constructor will be run after prioritized constructors.
+ *
+ * @param func
+ *   Constructor function.
+ */
 #define RTE_INIT(func) \
 static void __attribute__((constructor, used)) func(void)
 
+/**
+ * Run function before main() with high priority.
+ *
+ * @param func
+ *   Constructor function.
+ * @param prio
+ *   Priority number must be above 100.
+ *   Lowest number is the first to run.
+ */
 #define RTE_INIT_PRIO(func, prio) \
 static void __attribute__((constructor(prio), used)) func(void)
 
-- 
2.14.1
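
To illustrate the ordering the new comments describe, here is a minimal sketch that builds with GCC or clang without DPDK. MY_INIT and MY_INIT_PRIO are local copies of the two macros under made-up names, and order[] / n_calls exist only to record which constructor ran first.

```c
/* Local copies of the two macros, renamed so the sketch needs no DPDK headers. */
#define MY_INIT(func) \
static void __attribute__((constructor, used)) func(void)

#define MY_INIT_PRIO(func, prio) \
static void __attribute__((constructor(prio), used)) func(void)

static int order[2]; /* records the order in which constructors ran */
static int n_calls;

/* No priority given: runs after every prioritized constructor. */
MY_INIT(init_default)
{
	order[n_calls++] = 0;
}

/* Priority 101: the lowest number allowed above 100, so it runs first. */
MY_INIT_PRIO(init_early, 101)
{
	order[n_calls++] = 101;
}
```

By the time main() starts, order[0] holds 101 and order[1] holds 0, even though init_default is defined first in the file.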



Re: [dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread Hunt, David

Hi Thomas


On 2/10/2017 3:55 PM, Thomas Monjalon wrote:
> Hi,
> I have some comments about the API scope and some formatting.
>
> Suggested title:
> power: add turbo functions to map file
>
> 02/10/2017 14:20, David Hunt:
>> allows vm_power_manager example to be built against shared libraries
>
> Fixes: 94608a0f7f45 ("power: add per-core turbo boost API")

Sure, I'll address this in next version.

>> Signed-off-by: David Hunt
> [...]
>> +DPDK_17.11 {
>> +   global:
>> +
>> +   rte_power_acpi_turbo_status;
>
> Is it really the function you want to expose?
> rte_power_turbo_status seems more generic.

Not really, it was in there for completeness, but users should be able
to keep track of the turbo'd cores, so not really needed.

> More comments about what is part of the API:
> If you do not want to expose ACPI and VM implementations,
> it should not be part of the rte_* include files.
>
>> +   rte_power_freq_disable_turbo;
>> +   rte_power_freq_enable_turbo;
>> +};
>> +
>
> This is a trailing new line.

I'll address the above comments in the next version.

Regards,
Dave.





Re: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated

2017-10-02 Thread Maxime Coquelin



On 10/02/2017 03:50 PM, Jastrzebski, MichalX K wrote:
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Michal Jastrzebski
>> Sent: Friday, September 22, 2017 3:08 PM
>> To: y...@fridaylinux.org; maxime.coque...@redhat.com
>> Cc: dev@dpdk.org; Jain, Deepak K ; Piasecki,
>> JacekX ; Liu, Changpeng
>> ; sta...@dpdk.org
>> Subject: [dpdk-dev] [PATCH] examples/vhost_scsi: fix buffer not terminated
>>
>> From: Jacek Piasecki
>>
>> Fix size of buffer in strcpy. There was possible to get
>> not terminated string after copy operation.
>>
>> Coverity issue: 158631
>> Fixes: db75c7af19bb ("examples/vhost_scsi: introduce a new sample app")
>> Cc: changpeng@intel.com
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Jacek Piasecki
>> ---
>>   examples/vhost_scsi/scsi.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/examples/vhost_scsi/scsi.c b/examples/vhost_scsi/scsi.c
>> index 54d3104..de9639a 100644
>> --- a/examples/vhost_scsi/scsi.c
>> +++ b/examples/vhost_scsi/scsi.c
>> @@ -307,7 +307,8 @@
>> strncpy((char *)inqdata->t10_vendor_id, "INTEL", 8);
>>
>> /* PRODUCT IDENTIFICATION */
>> -   strncpy((char *)inqdata->product_id, bdev->product_name,
>> 16);
>> +   strncpy((char *)inqdata->product_id, bdev->product_name,
>> +   ARRAY_SIZE(inqdata->product_id) - 1);

Does it assume that product_id is memzero'ed before?
IIUC strncpy manpage, it wouldn't protect against non-null terminated
strings if it is not the case:

"
   A simple implementation of strncpy() might be:

   char *
   strncpy(char *dest, const char *src, size_t n)
   {
   size_t i;

   for (i = 0; i < n && src[i] != '\0'; i++)
   dest[i] = src[i];
   for ( ; i < n; i++)
   dest[i] = '\0';

   return dest;
   }
"

Cheers,
Maxime

>> /* PRODUCT REVISION LEVEL */
>> strncpy((char *)inqdata->product_rev, "0001", 4);
>> --
>> 1.9.1
>
> Hi Yu / Maxime,
> I would like to ask for a feedback regarding proposed fix.
> If everything is ok with it, please send acked-by.
>
> Best regards
> Michal.
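
The concern raised above can be shown with a short standalone sketch. copy_terminated() is a hypothetical helper, not code from scsi.c; it only illustrates that strncpy() with n = size - 1 leaves the last byte of the destination untouched, so termination depends on that byte already being zero unless it is written explicitly.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into a dst_sz-byte buffer and always
 * NUL-terminate it, truncating src if needed. dst_sz must be > 0.
 */
static void copy_terminated(char *dst, size_t dst_sz, const char *src)
{
	strncpy(dst, src, dst_sz - 1);
	dst[dst_sz - 1] = '\0';
}
```

Whether product_id needs this depends on whether the inquiry data is zeroed before it is filled in, which is exactly the open question in this thread.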



[dpdk-dev] [PATCH v4 1/5] net/i40e: remove unnecessary bit operations

2017-10-02 Thread Kirill Rybalchenko
Remove unnecessary bit operations in I40E_PFQF_HENA
and I40E_VFQF_HENA registers

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_ethdev.c| 21 +++--
 drivers/net/i40e/i40e_ethdev_vf.c | 22 --
 2 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index acdf0de..41c4033 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -6717,16 +6717,9 @@ static void
 i40e_pf_disable_rss(struct i40e_pf *pf)
 {
struct i40e_hw *hw = I40E_PF_TO_HW(pf);
-   uint64_t hena;
 
-   hena = (uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(0));
-   hena |= ((uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1))) << 32;
-   if (hw->mac.type == I40E_MAC_X722)
-   hena &= ~I40E_RSS_HENA_ALL_X722;
-   else
-   hena &= ~I40E_RSS_HENA_ALL;
-   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (uint32_t)hena);
-   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (uint32_t)(hena >> 32));
+   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), 0);
+   i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), 0);
I40E_WRITE_FLUSH(hw);
 }
 
@@ -6798,7 +6791,6 @@ static int
 i40e_hw_rss_hash_set(struct i40e_pf *pf, struct rte_eth_rss_conf *rss_conf)
 {
struct i40e_hw *hw = I40E_PF_TO_HW(pf);
-   uint64_t rss_hf;
uint64_t hena;
int ret;
 
@@ -6807,14 +6799,7 @@ i40e_hw_rss_hash_set(struct i40e_pf *pf, struct rte_eth_rss_conf *rss_conf)
if (ret)
return ret;
 
-   rss_hf = rss_conf->rss_hf;
-   hena = (uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(0));
-   hena |= ((uint64_t)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1))) << 32;
-   if (hw->mac.type == I40E_MAC_X722)
-   hena &= ~I40E_RSS_HENA_ALL_X722;
-   else
-   hena &= ~I40E_RSS_HENA_ALL;
-   hena |= i40e_config_hena(rss_hf, hw->mac.type);
+   hena = i40e_config_hena(rss_conf->rss_hf, hw->mac.type);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (uint32_t)hena);
i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (uint32_t)(hena >> 32));
I40E_WRITE_FLUSH(hw);
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index b35011a..a903deb 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -2498,7 +2498,7 @@ static int
 i40evf_hw_rss_hash_set(struct i40e_vf *vf, struct rte_eth_rss_conf *rss_conf)
 {
struct i40e_hw *hw = I40E_VF_TO_HW(vf);
-   uint64_t rss_hf, hena;
+   uint64_t hena;
int ret;
 
ret = i40evf_set_rss_key(&vf->vsi, rss_conf->rss_key,
@@ -2506,14 +2506,7 @@ i40evf_hw_rss_hash_set(struct i40e_vf *vf, struct rte_eth_rss_conf *rss_conf)
if (ret)
return ret;
 
-   rss_hf = rss_conf->rss_hf;
-   hena = (uint64_t)i40e_read_rx_ctl(hw, I40E_VFQF_HENA(0));
-   hena |= ((uint64_t)i40e_read_rx_ctl(hw, I40E_VFQF_HENA(1))) << 32;
-   if (hw->mac.type == I40E_MAC_X722)
-   hena &= ~I40E_RSS_HENA_ALL_X722;
-   else
-   hena &= ~I40E_RSS_HENA_ALL;
-   hena |= i40e_config_hena(rss_hf, hw->mac.type);
+   hena = i40e_config_hena(rss_conf->rss_hf, hw->mac.type);
i40e_write_rx_ctl(hw, I40E_VFQF_HENA(0), (uint32_t)hena);
i40e_write_rx_ctl(hw, I40E_VFQF_HENA(1), (uint32_t)(hena >> 32));
I40EVF_WRITE_FLUSH(hw);
@@ -2525,16 +2518,9 @@ static void
 i40evf_disable_rss(struct i40e_vf *vf)
 {
struct i40e_hw *hw = I40E_VF_TO_HW(vf);
-   uint64_t hena;
 
-   hena = (uint64_t)i40e_read_rx_ctl(hw, I40E_VFQF_HENA(0));
-   hena |= ((uint64_t)i40e_read_rx_ctl(hw, I40E_VFQF_HENA(1))) << 32;
-   if (hw->mac.type == I40E_MAC_X722)
-   hena &= ~I40E_RSS_HENA_ALL_X722;
-   else
-   hena &= ~I40E_RSS_HENA_ALL;
-   i40e_write_rx_ctl(hw, I40E_VFQF_HENA(0), (uint32_t)hena);
-   i40e_write_rx_ctl(hw, I40E_VFQF_HENA(1), (uint32_t)(hena >> 32));
+   i40e_write_rx_ctl(hw, I40E_VFQF_HENA(0), 0);
+   i40e_write_rx_ctl(hw, I40E_VFQF_HENA(1), 0);
I40EVF_WRITE_FLUSH(hw);
 }
 
-- 
2.5.5



[dpdk-dev] [PATCH v4 0/5] net/i40e: implement dynamic mapping of flow types to pctypes

2017-10-02 Thread Kirill Rybalchenko
Implement dynamic mapping of software flow types to hardware pctypes.
This allows new flow types to be mapped to pctypes without changing
the API of the driver.

v2:
Remove unnecessary check for new flow types.
Re-arrange patchset to avoid compilation errors.
Remove unnecessary usage of statically defined flow types and pctypes.

v3:
Remove unnecessary bit operations in I40E_PFQF_HENA and I40E_VFQF_HENA
registers.
Add new definition in enum i40e_filter_pctype for invalid pctype.
Fix bugs in i40e_pctype_to_flowtype and i40e_flowtype_to_pctype functions.
Function rte_pmd_i40e_flow_type_mapping_get now returns the full mapping table.
testpmd: changed command syntax from 'pctype mapping...' to
'port config pctype mapping...' and 'show port pctype mapping'
Various small modifications in code style after reviewing.

v4:
Change prototypes of some static functions.
Move declaration of automatic variables to beginning of function.
Move declaration of I40E_FILTER_PCTYPE_INVALID to i40e_ethdev.h
Fix some typos in source files and documentation.

Kirill Rybalchenko (5):
  net/i40e: remove unnecessary bit operations
  net/i40e: implement dynamic mapping of sw flow types to hw pctypes
  net/i40e: add new functions to manipulate with pctype mapping table
  app/testpmd: add new commands to manipulate with pctype mapping
  ethdev: remove unnecessary check for new flow type

 app/test-pmd/cmdline.c  | 336 -
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  46 
 drivers/net/i40e/i40e_ethdev.c  | 363 
 drivers/net/i40e/i40e_ethdev.h  |  22 +-
 drivers/net/i40e/i40e_ethdev_vf.c   |  36 +--
 drivers/net/i40e/i40e_fdir.c|  54 ++---
 drivers/net/i40e/i40e_flow.c|   5 +-
 drivers/net/i40e/i40e_rxtx.c|  57 +
 drivers/net/i40e/i40e_rxtx.h|   1 +
 drivers/net/i40e/rte_pmd_i40e.c |  90 +++
 drivers/net/i40e/rte_pmd_i40e.h |  55 +
 drivers/net/i40e/rte_pmd_i40e_version.map   |   3 +
 lib/librte_ether/rte_ethdev.c   |   8 -
 13 files changed, 737 insertions(+), 339 deletions(-)

-- 
2.5.5



[dpdk-dev] [PATCH v4 2/5] net/i40e: implement dynamic mapping of sw flow types to hw pctypes

2017-10-02 Thread Kirill Rybalchenko
Implement dynamic mapping of software flow types to hardware pctypes.
This allows new flow types and pctypes to be added for DDP without changing
the API of the driver. The mapping table is located in the private
data area of each network adapter and can be modified individually
with a set of dedicated functions.
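The table lookup this enables can be sketched as follows. This is a simplified stand-alone model, not the driver code: FLOW_TYPE_MAX, struct adapter_model and config_hena() are illustrative names assumed for this sketch, and any table contents used with it are made-up values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_TYPE_MAX 64 /* assumed table size for this sketch */

/* Simplified stand-in for the per-adapter private data. */
struct adapter_model {
	/* pctypes_tbl[i] is a bitmask of hardware pctypes for flow type i. */
	uint64_t pctypes_tbl[FLOW_TYPE_MAX];
};

/* Translate a bitmask of flow types into a bitmask of pctypes. */
static uint64_t
config_hena(const struct adapter_model *ad, uint64_t flags)
{
	uint64_t hena = 0;
	int i;

	assert(ad != NULL);

	/* Index 0 is skipped, mirroring the "unknown" flow type. */
	for (i = 1; i < FLOW_TYPE_MAX; i++) {
		if (flags & (1ULL << i))
			hena |= ad->pctypes_tbl[i];
	}
	return hena;
}
```

Because the translation is data rather than code, a one-to-many mapping (one flow type enabling several pctypes) needs no special case in the lookup.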

v2:
Re-arrange patchset to avoid compilation errors.
Remove usage of statically defined flow types and pctypes.

v3:
Change prototypes of some static functions.
Fix bugs in i40e_pctype_to_flowtype and i40e_flowtype_to_pctype
functions.
Various small modifications after reviewing.

v4:
Change prototypes of some static functions.
Move declaration of automatic variables to beginning of function.
Move declaration of I40E_FILTER_PCTYPE_INVALID to i40e_ethdev.h

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/i40e_ethdev.c| 344 --
 drivers/net/i40e/i40e_ethdev.h|  22 ++-
 drivers/net/i40e/i40e_ethdev_vf.c |  16 +-
 drivers/net/i40e/i40e_fdir.c  |  54 +++---
 drivers/net/i40e/i40e_flow.c  |   5 +-
 drivers/net/i40e/i40e_rxtx.c  |  57 +++
 drivers/net/i40e/i40e_rxtx.h  |   1 +
 7 files changed, 212 insertions(+), 287 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 41c4033..6443702 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1070,6 +1070,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
return 0;
}
i40e_set_default_ptype_table(dev);
+   i40e_set_default_pctype_table(dev);
pci_dev = RTE_ETH_DEV_TO_PCI(dev);
intr_handle = &pci_dev->intr_handle;
 
@@ -3020,7 +3021,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
sizeof(uint32_t);
dev_info->reta_size = pf->hash_lut_size;
-   dev_info->flow_type_rss_offloads = I40E_RSS_OFFLOAD_ALL;
+   dev_info->flow_type_rss_offloads = pf->adapter->flow_types_mask;
 
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
@@ -6611,104 +6612,36 @@ i40e_vsi_delete_mac(struct i40e_vsi *vsi, struct ether_addr *addr)
 
 /* Configure hash enable flags for RSS */
 uint64_t
-i40e_config_hena(uint64_t flags, enum i40e_mac_type type)
+i40e_config_hena(const struct i40e_adapter *adapter, uint64_t flags)
 {
uint64_t hena = 0;
+   int i;
 
if (!flags)
return hena;
 
-   if (flags & ETH_RSS_FRAG_IPV4)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_FRAG_IPV4;
-   if (flags & ETH_RSS_NONFRAG_IPV4_TCP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_TCP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV4_UDP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV4_SCTP)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_SCTP;
-   if (flags & ETH_RSS_NONFRAG_IPV4_OTHER)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV4_OTHER;
-   if (flags & ETH_RSS_FRAG_IPV6)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_FRAG_IPV6;
-   if (flags & ETH_RSS_NONFRAG_IPV6_TCP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_TCP;
+   for (i = RTE_ETH_FLOW_UNKNOWN + 1; i < I40E_FLOW_TYPE_MAX; i++) {
+   if (flags & (1ULL << i))
+   hena |= adapter->pctypes_tbl[i];
}
-   if (flags & ETH_RSS_NONFRAG_IPV6_UDP) {
-   if (type == I40E_MAC_X722) {
-   hena |= (1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
-(1ULL << I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
-   } else
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_UDP;
-   }
-   if (flags & ETH_RSS_NONFRAG_IPV6_SCTP)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_SCTP;
-   if (flags & ETH_RSS_NONFRAG_IPV6_OTHER)
-   hena |= 1ULL << I40E_FILTER_PCTYPE_NONF_IPV6_OTHER;
-   if (fl

[dpdk-dev] [PATCH v4 4/5] app/testpmd: add new commands to manipulate with pctype mapping

2017-10-02 Thread Kirill Rybalchenko
Add new commands to manipulate the dynamic flow type to
pctype mapping table in the i40e PMD.
The commands allow printing the table, modifying it and resetting it to
the default values.

v3:
changed command syntax from 'pctype mapping...' to
'port config pctype mapping...' and 'show port pctype mapping'

v4:
Fix typos in cmdline.c and documentation.
Move variable declaration to the beginning of function.

Signed-off-by: Kirill Rybalchenko 
---
 app/test-pmd/cmdline.c  | 336 +++-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  46 
 2 files changed, 372 insertions(+), 10 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f2d731..83baae3 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -230,6 +230,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 
"clear vf stats (port_id) (vf_id)\n"
"Reset a VF's statistics.\n\n"
+
+   "show port (port_id) pctype mapping\n"
+   "Get flow ptype to pctype mapping on a port\n\n"
+
);
}
 
@@ -681,7 +685,8 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set 
crc-strip/scatter/rx-checksum/hardware-vlan/drop_en"
" for ports.\n\n"
 
-   "port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none)\n"
+   "port config all rss 
(all|ip|tcp|udp|sctp|ether|port|vxlan|"
+   "geneve|nvgre|none|)\n"
"Set the RSS mode.\n\n"
 
"port config port-id rss reta 
(hash,queue)[,(hash,queue)]\n"
@@ -716,6 +721,13 @@ static void cmd_help_long_parsed(void *parsed_result,
"port config (port_id|all) l2-tunnel E-tag"
" (enable|disable)\n"
"Enable/disable the E-tag support.\n\n"
+
+   "port config (port_id) pctype mapping reset\n"
+   "Reset flow type to pctype mapping on a port\n\n"
+
+   "port config (port_id) pctype mapping update"
+   " (pctype_id_0[,pctype_id_1]*) (flow_type_id)\n"
+   "Update a flow type to pctype mapping item on a 
port\n\n"
);
}
 
@@ -878,8 +890,8 @@ static void cmd_help_long_parsed(void *parsed_result,
"set_hash_input_set (port_id) (ipv4|ipv4-frag|"
"ipv4-tcp|ipv4-udp|ipv4-sctp|ipv4-other|ipv6|"
"ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|ipv6-other|"
-   "l2_payload) (ovlan|ivlan|src-ipv4|dst-ipv4|src-ipv6|"
-   "dst-ipv6|ipv4-tos|ipv4-proto|ipv6-tc|"
+   "l2_payload|) 
(ovlan|ivlan|src-ipv4|dst-ipv4|"
+   "src-ipv6|dst-ipv6|ipv4-tos|ipv4-proto|ipv6-tc|"
"ipv6-next-header|udp-src-port|udp-dst-port|"
"tcp-src-port|tcp-dst-port|sctp-src-port|"
"sctp-dst-port|sctp-veri-tag|udp-key|gre-key|fld-1st|"
@@ -1720,6 +1732,8 @@ cmd_config_rss_parsed(void *parsed_result,
rss_conf.rss_hf = ETH_RSS_NVGRE;
else if (!strcmp(res->value, "none"))
rss_conf.rss_hf = 0;
+   else if (isdigit(res->value[0]) && atoi(res->value) > 0 && atoi(res->value) < 64)
+   rss_conf.rss_hf = 1ULL << atoi(res->value);
else {
printf("Unknown parameter\n");
return;
@@ -1743,14 +1757,13 @@ cmdline_parse_token_string_t cmd_config_rss_all =
 cmdline_parse_token_string_t cmd_config_rss_name =
TOKEN_STRING_INITIALIZER(struct cmd_config_rss, name, "rss");
 cmdline_parse_token_string_t cmd_config_rss_value =
-   TOKEN_STRING_INITIALIZER(struct cmd_config_rss, value,
-   "all#ip#tcp#udp#sctp#ether#port#vxlan#geneve#nvgre#none");
+   TOKEN_STRING_INITIALIZER(struct cmd_config_rss, value, NULL);
 
 cmdline_parse_inst_t cmd_config_rss = {
.f = cmd_config_rss_parsed,
.data = NULL,
.help_str = "port config all rss "
-   "all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none",
+   
"all|ip|tcp|udp|sctp|ether|port|vxlan|geneve|nvgre|none|",
.tokens = {
(void *)&cmd_config_rss_port,
(void *)&cmd_config_rss_keyword,
@@ -8991,6 +9004,10 @@ str2flowtype(char *string)
if (!strcmp(flowtype_str[i].str, string))
return flowtype_str[i].type;
}
+
+   if (isdigit(string[0]) && atoi(string) > 0 && atoi(string) < 64)
+   return (uint16_t)atoi(string);
+
return RTE_ETH_FLOW_UNKNOWN;
 }
 
@@ -10467,9 +10484,7 @@ cmdline_parse_token_num_t 
cmd_set_hash_input_set_port_id =
port_id, UINT8);
 cmdline_parse_token_string_t cmd_set_hash_input_set_flow_t

[dpdk-dev] [PATCH v4 5/5] ethdev: remove unnecessary check for new flow type

2017-10-02 Thread Kirill Rybalchenko
Remove unnecessary check for new flow type for rss hash
filter update.

Signed-off-by: Kirill Rybalchenko 
---
 lib/librte_ether/rte_ethdev.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 1849a3b..f3bf3e5 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2291,16 +2291,8 @@ int
 rte_eth_dev_rss_hash_update(uint8_t port_id, struct rte_eth_rss_conf *rss_conf)
 {
struct rte_eth_dev *dev;
-   uint16_t rss_hash_protos;
 
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-   rss_hash_protos = rss_conf->rss_hf;
-   if ((rss_hash_protos != 0) &&
-   ((rss_hash_protos & ETH_RSS_PROTO_MASK) == 0)) {
-   RTE_PMD_DEBUG_TRACE("Invalid rss_hash_protos=0x%x\n",
-   rss_hash_protos);
-   return -EINVAL;
-   }
dev = &rte_eth_devices[port_id];
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rss_hash_update, -ENOTSUP);
return (*dev->dev_ops->rss_hash_update)(dev, rss_conf);
-- 
2.5.5



[dpdk-dev] [PATCH v4 3/5] net/i40e: add new functions to manipulate with pctype mapping table

2017-10-02 Thread Kirill Rybalchenko
Add new functions to modify, return, or reset to default
the contents of the dynamic flow type to pctype mapping table.
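The update semantics, including the exclusive flag, can be sketched with a small stand-alone model. The names struct map_model, struct map_item, MAP_FLOW_TYPE_MAX and map_update() are assumptions for this sketch only; the real functions operate on the adapter's private data and also reject the invalid pctype bit, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_FLOW_TYPE_MAX 64 /* assumed table size for this sketch */

struct map_item {
	uint16_t flow_type; /* software flow type, 1..MAP_FLOW_TYPE_MAX-1 */
	uint64_t pctype;    /* bitmask of hardware pctypes */
};

struct map_model {
	uint64_t pctypes_tbl[MAP_FLOW_TYPE_MAX];
	uint64_t flow_types_mask; /* flow types with a non-empty mapping */
	uint64_t pctypes_mask;    /* union of all mapped pctypes */
};

static int
map_update(struct map_model *ad, const struct map_item *items,
	   uint16_t count, int exclusive)
{
	int i;

	assert(ad != NULL);
	if (count > MAP_FLOW_TYPE_MAX)
		return -1;

	/* Validate every item before touching the table. */
	for (i = 0; i < count; i++)
		if (items[i].flow_type == 0 ||
		    items[i].flow_type >= MAP_FLOW_TYPE_MAX)
			return -1;

	/* Exclusive mode: drop all existing mappings first. */
	if (exclusive) {
		for (i = 0; i < MAP_FLOW_TYPE_MAX; i++)
			ad->pctypes_tbl[i] = 0;
		ad->flow_types_mask = 0;
	}

	for (i = 0; i < count; i++) {
		uint16_t ft = items[i].flow_type;

		ad->pctypes_tbl[ft] = items[i].pctype;
		if (items[i].pctype)
			ad->flow_types_mask |= 1ULL << ft;
		else
			ad->flow_types_mask &= ~(1ULL << ft);
	}

	/* Recompute the union of mapped pctypes from scratch. */
	ad->pctypes_mask = 0;
	for (i = 0; i < MAP_FLOW_TYPE_MAX; i++)
		ad->pctypes_mask |= ad->pctypes_tbl[i];

	return 0;
}
```

Validating all items up front means a rejected call leaves the table untouched, which matters most in exclusive mode where the table would otherwise already be cleared.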

v3:
Function rte_pmd_i40e_flow_type_mapping_get now returns the full
mapping table.

v4:
Fix typo in rte_pmd_i40e_version.map file.

Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/rte_pmd_i40e.c   | 90 +++
 drivers/net/i40e/rte_pmd_i40e.h   | 55 +++
 drivers/net/i40e/rte_pmd_i40e_version.map |  3 ++
 3 files changed, 148 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index c08e07a..7cf56db 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -2161,3 +2161,93 @@ rte_pmd_i40e_add_vf_mac_addr(uint8_t port, uint16_t vf_id,
 
return 0;
 }
+
+int rte_pmd_i40e_flow_type_mapping_reset(uint8_t port)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   i40e_set_default_pctype_table(dev);
+
+   return 0;
+}
+
+int rte_pmd_i40e_flow_type_mapping_get(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items)
+{
+   struct rte_eth_dev *dev;
+   struct i40e_adapter *ad;
+   uint16_t i;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   ad = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++) {
+   mapping_items[i].flow_type = i;
+   mapping_items[i].pctype = ad->pctypes_tbl[i];
+   }
+
+   return 0;
+}
+
+int
+rte_pmd_i40e_flow_type_mapping_update(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items,
+   uint16_t count,
+   uint8_t exclusive)
+{
+   struct rte_eth_dev *dev;
+   struct i40e_adapter *ad;
+   int i;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+
+   if (!is_i40e_supported(dev))
+   return -ENOTSUP;
+
+   if (count > I40E_FLOW_TYPE_MAX)
+   return -EINVAL;
+
+   for (i = 0; i < count; i++)
+   if (mapping_items[i].flow_type >= I40E_FLOW_TYPE_MAX ||
+   mapping_items[i].flow_type == RTE_ETH_FLOW_UNKNOWN ||
+   (mapping_items[i].pctype & (1ULL << I40E_FILTER_PCTYPE_INVALID)))
+   return -EINVAL;
+
+   ad = I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+
+   if (exclusive) {
+   for (i = 0; i < I40E_FLOW_TYPE_MAX; i++)
+   ad->pctypes_tbl[i] = 0ULL;
+   ad->flow_types_mask = 0ULL;
+   }
+
+   for (i = 0; i < count; i++) {
+   ad->pctypes_tbl[mapping_items[i].flow_type] = mapping_items[i].pctype;
+   if (mapping_items[i].pctype)
+   ad->flow_types_mask |= (1ULL << mapping_items[i].flow_type);
+   else
+   ad->flow_types_mask &= ~(1ULL << mapping_items[i].flow_type);
+   }
+
+   for (i = 0, ad->pctypes_mask = 0ULL; i < I40E_FLOW_TYPE_MAX; i++)
+   ad->pctypes_mask |= ad->pctypes_tbl[i];
+
+   return 0;
+}
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index 155b7e8..004a8a5 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -657,4 +657,59 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
 int rte_pmd_i40e_add_vf_mac_addr(uint8_t port, uint16_t vf_id,
 struct ether_addr *mac_addr);
 
+#define RTE_PMD_I40E_PCTYPE_MAX 64
+#define RTE_PMD_I40E_FLOW_TYPE_MAX 64
+
+struct rte_pmd_i40e_flow_type_mapping {
+   uint16_t flow_type; /**< software defined flow type*/
+   uint64_t pctype;/**< hardware defined pctype */
+};
+
+/**
+ * Update hardware defined pctype to software defined flow type
+ * mapping table.
+ *
+ * @param port
+ *pointer to port identifier of the device.
+ * @param mapping_items
+ *the base address of the mapping items array.
+ * @param count
+ *number of mapping items.
+ * @param exclusive
+ *the flag indicate different pctype mapping update method.
+ *-(0) only overwrite referred PCTYPE mapping,
+ * keep other PCTYPEs mapping unchanged.
+ *-(!0) overwrite referred PCTYPE mapping,
+ * set other PCTYPEs maps to PCTYPE_INVALID.
+ */
+int rte_pmd_i40e_flow_type_mapping_update(
+   uint8_t port,
+   struct rte_pmd_i40e_flow_type_mapping *mapping_items,
+   uint16_t count,
+   uint8_t exclusive);
+
+/**
+ * Get softwar

[dpdk-dev] [PATCH v2] power: add turbo functions to map file

2017-10-02 Thread David Hunt
Fixes: 94608a0f7f45 ("power: add per-core turbo boost API")

Signed-off-by: David Hunt 
---
 lib/librte_power/rte_power_version.map | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
index db75ff3..ec512ea 100644
--- a/lib/librte_power/rte_power_version.map
+++ b/lib/librte_power/rte_power_version.map
@@ -16,3 +16,10 @@ DPDK_2.0 {
 
local: *;
 };
+
+DPDK_17.11 {
+   global:
+
+   rte_power_freq_disable_turbo;
+   rte_power_freq_enable_turbo;
+} DPDK_2.0;
\ No newline at end of file
-- 
2.7.4



Re: [dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread Thomas Monjalon
02/10/2017 17:06, Hunt, David:
> On 2/10/2017 3:55 PM, Thomas Monjalon wrote:
> >> +DPDK_17.11 {
> >> +  global:
> >> +
> >> +  rte_power_acpi_turbo_status;
> > Is it really the function you want to expose?
> > rte_power_turbo_status seems more generic.
> 
> Not really, it was in there for completeness, but users should be able 
> to keep track of the turbo'd cores, so not really needed.
> 
> > More comments about what is part of the API:
> > If you do not want to expose ACPI and VM implementations,
> > it should not be part of the rte_* include files.
> 
> I'll address the above comments in the next version.

You did not address the comment about what belongs in rte_*.h files.
If you do not want to expose everything, you should move the rest to
another .h file.

Files starting with rte_ are included in doxygen API doc.
Only rte_power.h is installed.
The installed include, the doxygen doc and the map file
should all expose the same API consistently.

I think a cleanup is needed.


Re: [dpdk-dev] [PATCH v4 3/7] member: implement vBF mode

2017-10-02 Thread De Lara Guarch, Pablo


> -----Original Message-----
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:41 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tai, Charlie ; Gobriel,
> Sameh ; De Lara Guarch, Pablo
> ; Mcnamara, John
> ; Wang, Yipeng1 
> Subject: [PATCH v4 3/7] member: implement vBF mode
> 
> Bloom Filter (BF) [1] is a well-known space-efficient probabilistic data
> structure that answers set membership queries.
> Vector of Bloom Filters (vBF) is an extension to traditional BF that supports
> multi-set membership testing. Traditional BF will return found or not-found
> for each key. vBF will also return which set the key belongs to if it is 
> found.
> 
> Since each set requires a BF, vBF should be used when set count is small.
> vBF's false positive rate could be set appropriately so that its memory
> requirement and lookup speed are better in certain cases compared to an
> HT-based set-summary.
> 
> This patch adds the vBF implementation.
> 
> [1]B H Bloom, “Space/Time Trade-offs in Hash Coding with Allowable
> Errors,” Communications of the ACM, 1970.
> 
> Signed-off-by: Yipeng Wang 

...

> diff --git a/lib/librte_member/rte_member_vbf.c
> b/lib/librte_member/rte_member_vbf.c

...

> +int
> +rte_member_create_vbf(struct rte_member_setsum *ss,
> + const struct rte_member_parameters *params) {
> +
> + if (params->num_set > 32 || !rte_is_power_of_2(params->num_set) ||

Magic number. Define a macro instead.

> + params->num_keys == 0 ||
> + params->false_positive_rate == 0 ||
> + params->false_positive_rate > 1) {
> + rte_errno = EINVAL;
> + RTE_MEMBER_LOG(ERR, "vBF create with invalid parameters\n");
> + return -EINVAL;

...

> +
> + /*
> +  * reduce hash function count, until we approach the user specified
> +  * false-positive rate. otherwise it is too conservative

Watch out for capital letters at the start of the comment and after a full stop.

> +  */
> + int tmp_num_hash = ss->num_hashes;
> +
> + while (tmp_num_hash > 1) {
> + float tmp_fp = new_fp;
> +
> + tmp_num_hash--;
> + new_fp = pow((1 - pow((1 - 1.0 / ss->bits), num_keys_per_bf *
> + tmp_num_hash)), tmp_num_hash);
> + new_fp = 1 - pow((1 - new_fp), ss->num_set);
> +
> + if (new_fp > params->false_positive_rate) {
> + new_fp = tmp_fp;
> + tmp_num_hash++;
> + break;
> + }
> + }
> +
> + ss->num_hashes = tmp_num_hash;
> +
> + RTE_MEMBER_LOG(DEBUG, "vector bloom filter created, "
> + "each bloom filter expects %u keys, needs %u bits, %u hashes, "
> + "with false positive rate set as %.5f, "
> + "The new calculated vBF false positive rate is %.5f\n",
> + num_keys_per_bf, ss->bits, ss->num_hashes, x, new_fp);

Use a more descriptive variable name for "x".

> +
> + ss->table = rte_zmalloc_socket(NULL, ss->num_set * (ss->bits >> 3),
> + RTE_CACHE_LINE_SIZE, ss->socket_id);
> +
> + /*
> +  * To avoid multiplication and division:
> +  * mul_shift is used for multiplication shift during bit test
> +  * div_shift is used for division shift, to be divided by number of bits
> +  * represented by a uint32_t variable
> +  */
> + ss->mul_shift = __builtin_ctzl(ss->num_set);
> + ss->div_shift = __builtin_ctzl(32 >> ss->mul_shift);
> +
> + if (ss->table == NULL)
> + return -ENOMEM;

I would move this check just after the malloc call.
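Something along these lines, as a stand-alone sketch only: calloc() stands in for rte_zmalloc_socket(), and struct setsum_model and setsum_alloc() are placeholder names rather than the library's types. The shift computation is copied from the patch; the point is just the ordering of the NULL check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder for the fields of the set-summary used here. */
struct setsum_model {
	uint32_t num_set;   /* power of two, at most 32 */
	uint32_t bits;      /* bits per bloom filter */
	uint32_t mul_shift;
	uint32_t div_shift;
	uint32_t *table;
};

static int
setsum_alloc(struct setsum_model *ss)
{
	assert(ss != NULL && ss->num_set != 0);

	/* calloc() is used here in place of rte_zmalloc_socket(). */
	ss->table = calloc((size_t)ss->num_set * (ss->bits >> 3), 1);

	/* Fail immediately, before deriving anything else from ss. */
	if (ss->table == NULL)
		return -ENOMEM;

	ss->mul_shift = __builtin_ctzl(ss->num_set);
	ss->div_shift = __builtin_ctzl(32 >> ss->mul_shift);

	return 0;
}
```

Keeping the check next to the allocation makes the error path obvious and avoids doing work that is thrown away on failure.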

> +
> + return 0;
> +}
> +
> +static inline uint32_t
> +test_bit(uint32_t bit_loc, const struct rte_member_setsum *ss) {
> + uint32_t *vbf = ss->table;
> + uint32_t n = ss->num_set;
> + uint32_t div_shift = ss->div_shift;
> + uint32_t mul_shift = ss->mul_shift;
> + /*
> +  * a is how many bits in one BF are represented by one 32bit
> +  * variable.
> +  */
> + uint32_t a = 32 >> mul_shift;
> + /*
> +  * x>>b is the divide, x & (a-1) is the mod, & (1<<n-1) to mask out bits
> +  * we do not need
> +  */
> + return (vbf[bit_loc>>div_shift] >> ((bit_loc & (a - 1)) << mul_shift))

Add spaces around ">>".

> &
> + ((1ULL << n) - 1);
> +}
> +
> +static inline void
> +set_bit(uint32_t bit_loc, const struct rte_member_setsum *ss, int32_t
> +set) {
> + uint32_t *vbf = ss->table;
> + uint32_t div_shift = ss->div_shift;
> + uint32_t mul_shift = ss->mul_shift;
> + uint32_t a = 32 >> mul_shift;
> +
> + vbf[bit_loc>>div_shift] |= 1U << (((bit_loc & (a - 1)) << mul_shift) +
> + set - 1);

Same as above.

> +}
> +
> +int
> +rte_member_lookup_vbf(const struct rte_member_setsum *ss, const void *key,
> + membe

Re: [dpdk-dev] [PATCH v4 5/7] member: enable the library

2017-10-02 Thread De Lara Guarch, Pablo


> -----Original Message-----
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:41 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tai, Charlie ; Gobriel,
> Sameh ; De Lara Guarch, Pablo
> ; Mcnamara, John
> ; Wang, Yipeng1 
> Subject: [PATCH v4 5/7] member: enable the library
> 
> This patch enables the Membership library.
> 
> Signed-off-by: Yipeng Wang 
> ---
>  MAINTAINERS| 8 +++-
>  config/common_base | 5 +
>  lib/librte_member/Makefile | 2 ++
>  mk/rte.app.mk  | 2 ++
>  4 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a0cd75e..adb8e2c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -726,6 +726,13 @@ F: test/test/test_lpm*
>  F: test/test/test_func_reentrancy.c
>  F: test/test/test_xmmt_ops.h
> 
> +Membership - EXPERIMENTAL
> +M: Yipeng Wang 
> +M: Sameh Gobriel 
> +F: lib/librte_member/
> +F: doc/guides/prog_guide/member_lib.rst
> +F: test/test/test_member*
> +

Add these last two items in patches 6 and 7, where you are adding the files.




[dpdk-dev] [PATCH v4 0/3] run-time Linking support

2017-10-02 Thread Xiaoyun Li
This patchset dynamically selects functions at run-time based on the CPU flags
that the current machine supports. It modifies memcpy, the memcpy perf
test and x86 EFD to use function pointers that are bound at constructor time.
In a cloud environment, users can then compile once for the minimum target,
such as 'haswell' (not 'native'), run on different platforms (haswell or
newer) and still get ISA optimizations based on the running CPU.
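The general technique can be sketched in a few lines of stand-alone C. This is an illustration under assumptions, not the code in this patchset: copy_baseline(), copy_avx2(), copy_select() and copy_dispatch() are made-up names, both "implementations" simply call memcpy(), and the CPU check uses the GCC/clang builtin __builtin_cpu_supports() rather than the DPDK CPU flag API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/*
 * Stand-ins for ISA-specific implementations. A real build would compile
 * each variant in its own file with its own -m flags.
 */
static void *
copy_baseline(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static void *
copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Default to the baseline so the pointer is valid before selection runs. */
static copy_fn_t copy_ptr = copy_baseline;

/* Runs once before main() and binds the pointer for this CPU. */
__attribute__((constructor))
static void
copy_select(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		copy_ptr = copy_avx2;
#endif
}

/* Callers go through the pointer; the choice was made once at start-up. */
static inline void *
copy_dispatch(void *dst, const void *src, size_t n)
{
	assert(copy_ptr != NULL);
	return copy_ptr(dst, src, n);
}
```

The cost of this design is one indirect call per copy, which is why a separate inline path for small sizes can still be worthwhile.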

Xiaoyun Li (3):
  eal/x86: run-time dispatch over memcpy
  app/test: run-time dispatch over memcpy perf test
  efd: run-time dispatch over x86 EFD functions

---
v2
* Use gcc function multi-versioning to avoid compilation issues.
* Add macros for AVX512 and AVX2. The AVX512 code is compiled only if users
enable AVX512 and the compiler supports it. The AVX2 code is compiled only
if the compiler supports AVX2.

v3
* Reduce function calls by keeping only rte_memcpy_xxx.
* Add conditions so that small copies use the inline code path and larger
copies use the dynamic code path.
* To support attribute target, clang version must be greater than 3.7.
Otherwise, the SSE/AVX code path is chosen, the same as before.
* Move two macro functions to the top of the code since they are
used in both the inline and the dynamic SSE/AVX code.

v4
* Split rte_memcpy.h into several .c files and modify makefiles to compile
AVX2 and AVX512 files.

 lib/librte_eal/bsdapp/eal/Makefile |  17 +
 .../common/include/arch/x86/rte_memcpy.c   |  59 ++
 .../common/include/arch/x86/rte_memcpy.h   | 861 +--
 .../common/include/arch/x86/rte_memcpy_avx2.c  | 291 +++
 .../common/include/arch/x86/rte_memcpy_avx512f.c   | 316 +++
 .../common/include/arch/x86/rte_memcpy_internal.h  | 909 +
 .../common/include/arch/x86/rte_memcpy_sse.c   | 585 +
 lib/librte_eal/linuxapp/eal/Makefile   |  17 +
 lib/librte_efd/rte_efd_x86.h   |  41 +-
 mk/rte.cpuflags.mk |  14 +
 test/test/test_memcpy_perf.c   |  40 +-
 11 files changed, 2288 insertions(+), 862 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx2.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx512f.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_sse.c

-- 
2.7.4



[dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy

2017-10-02 Thread Xiaoyun Li
This patch dynamically selects memcpy functions at run-time based
on the CPU flags that the current machine supports. It uses function
pointers which are bound to the relevant functions at constructor time.
In addition, the AVX512 instruction set is compiled only if users
enable it in the config and the compiler supports it.

Signed-off-by: Xiaoyun Li 
---
v2
* Use gcc function multi-versioning to avoid compilation issues.
* Add macros for AVX512 and AVX2. The AVX512 code is compiled only if users
enable AVX512 and the compiler supports it. The AVX2 code is compiled only
if the compiler supports AVX2.

v3
* Reduce function calls by keeping only rte_memcpy_xxx.
* Add conditions so that small copies use the inline code path and larger
copies use the dynamic code path.
* To support attribute target, clang version must be greater than 3.7.
Otherwise, the SSE/AVX code path is chosen, the same as before.
* Move two macro functions to the top of the code since they are
used in both the inline and the dynamic SSE/AVX code.

v4
* Split rte_memcpy.h into several .c files and modify makefiles to compile
AVX2 and AVX512 files.

 lib/librte_eal/bsdapp/eal/Makefile |  17 +
 .../common/include/arch/x86/rte_memcpy.c   |  59 ++
 .../common/include/arch/x86/rte_memcpy.h   | 861 +--
 .../common/include/arch/x86/rte_memcpy_avx2.c  | 291 +++
 .../common/include/arch/x86/rte_memcpy_avx512f.c   | 316 +++
 .../common/include/arch/x86/rte_memcpy_internal.h  | 909 +
 .../common/include/arch/x86/rte_memcpy_sse.c   | 585 +
 lib/librte_eal/linuxapp/eal/Makefile   |  17 +
 mk/rte.cpuflags.mk |  14 +
 9 files changed, 2223 insertions(+), 846 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx2.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx512f.c
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_sse.c

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 005019e..27023c6 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -36,6 +36,7 @@ LIB = librte_eal.a
 ARCH_DIR ?= $(RTE_ARCH)
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
+VPATH += $(RTE_SDK)/lib/librte_eal/common/include/arch/$(ARCH_DIR)
 
 CFLAGS += -I$(SRCDIR)/include
 CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
@@ -93,6 +94,22 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_service.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_cpuflags.c
 SRCS-$(CONFIG_RTE_ARCH_X86) += rte_spinlock.c
 
+# for run-time dispatch of memcpy
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy.c
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_sse.c
+
+# if the compiler supports AVX512, add avx512 file
+ifneq ($(filter $(MACHINE_CFLAGS),CC_SUPPORT_AVX512F),)
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx512f.c
+CFLAGS_rte_memcpy_avx512f.o += -mavx512f
+endif
+
+# if the compiler supports AVX2, add avx2 file
+ifneq ($(filter $(MACHINE_CFLAGS),CC_SUPPORT_AVX2),)
+SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx2.c
+CFLAGS_rte_memcpy_avx2.o += -mavx2
+endif
+
 CFLAGS_eal_common_cpuflags.o := $(CPUFLAGS_LIST)
 
 CFLAGS_eal.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.c b/lib/librte_eal/common/include/arch/x86/rte_memcpy.c
new file mode 100644
index 000..74ae702
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.c
@@ -0,0 +1,59 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, IND

[dpdk-dev] [PATCH v4 2/3] app/test: run-time dispatch over memcpy perf test

2017-10-02 Thread Xiaoyun Li
This patch modifies assignment of alignment unit from build-time
to run-time based on CPU flags that machine supports.

Signed-off-by: Xiaoyun Li 
---
 test/test/test_memcpy_perf.c | 40 +++-
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/test/test/test_memcpy_perf.c b/test/test/test_memcpy_perf.c
index ff3..33def3b 100644
--- a/test/test/test_memcpy_perf.c
+++ b/test/test/test_memcpy_perf.c
@@ -79,13 +79,7 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
 #define TEST_BATCH_SIZE 100
 
 /* Data is aligned on this many bytes (power of 2) */
-#ifdef RTE_MACHINE_CPUFLAG_AVX512F
-#define ALIGNMENT_UNIT  64
-#elif defined RTE_MACHINE_CPUFLAG_AVX2
-#define ALIGNMENT_UNIT  32
-#else /* RTE_MACHINE_CPUFLAG */
-#define ALIGNMENT_UNIT  16
-#endif /* RTE_MACHINE_CPUFLAG */
+static uint8_t alignment_unit = 16;
 
 /*
  * Pointers used in performance tests. The two large buffers are for uncached
@@ -100,20 +94,39 @@ static int
 init_buffers(void)
 {
unsigned i;
+#ifdef CC_SUPPORT_AVX512F
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
+   alignment_unit = 64;
+   else
+#endif
+#ifdef CC_SUPPORT_AVX2
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+   alignment_unit = 32;
+   else
+#endif
+   alignment_unit = 16;
 
-   large_buf_read = rte_malloc("memcpy", LARGE_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
+   large_buf_read = rte_malloc("memcpy",
+   LARGE_BUFFER_SIZE + alignment_unit,
+   alignment_unit);
if (large_buf_read == NULL)
goto error_large_buf_read;
 
-   large_buf_write = rte_malloc("memcpy", LARGE_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
+   large_buf_write = rte_malloc("memcpy",
+LARGE_BUFFER_SIZE + alignment_unit,
+alignment_unit);
if (large_buf_write == NULL)
goto error_large_buf_write;
 
-   small_buf_read = rte_malloc("memcpy", SMALL_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
+   small_buf_read = rte_malloc("memcpy",
+   SMALL_BUFFER_SIZE + alignment_unit,
+   alignment_unit);
if (small_buf_read == NULL)
goto error_small_buf_read;
 
-   small_buf_write = rte_malloc("memcpy", SMALL_BUFFER_SIZE + ALIGNMENT_UNIT, ALIGNMENT_UNIT);
+   small_buf_write = rte_malloc("memcpy",
+SMALL_BUFFER_SIZE + alignment_unit,
+alignment_unit);
if (small_buf_write == NULL)
goto error_small_buf_write;
 
@@ -153,7 +166,7 @@ static inline size_t
 get_rand_offset(size_t uoffset)
 {
return ((rte_rand() % (LARGE_BUFFER_SIZE - SMALL_BUFFER_SIZE)) &
-   ~(ALIGNMENT_UNIT - 1)) + uoffset;
+   ~(alignment_unit - 1)) + uoffset;
 }
 
 /* Fill in source and destination addresses. */
@@ -321,7 +334,8 @@ perf_test(void)
   "(bytes)(ticks)(ticks)(ticks)
(ticks)\n"
   "--- -- -- -- 
--");
 
-   printf("\n== %2dB aligned 
", ALIGNMENT_UNIT);
+   printf("\n= %2dB aligned 
",
+   alignment_unit);
/* Do aligned tests where size is a variable */
perf_test_variable_aligned();
printf("\n--- -- -- -- 
--");
-- 
2.7.4
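
The run-time selection that this patch moves into init_buffers() can be
sketched roughly as below. This is only an illustration: struct
cpu_features, pick_alignment_unit() and align_offset_down() are
hypothetical names standing in for the rte_cpu_get_flag_enabled() checks
and for the masking done in get_rand_offset(); they are not part of DPDK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU feature query; not a DPDK API. */
struct cpu_features {
	bool avx512f;
	bool avx2;
};

/* Pick the buffer alignment (a power of 2) from what the CPU reports. */
static size_t
pick_alignment_unit(const struct cpu_features *f)
{
	if (f->avx512f)
		return 64;
	if (f->avx2)
		return 32;
	return 16; /* SSE baseline */
}

/* Round an offset down to a multiple of the (power of 2) unit. */
static size_t
align_offset_down(size_t offset, size_t unit)
{
	return offset & ~(unit - 1);
}
```

The sketch keeps the unit as a size_t so that the mask has the same
width as the offset it is applied to.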



[dpdk-dev] [PATCH v4 3/3] efd: run-time dispatch over x86 EFD functions

2017-10-02 Thread Xiaoyun Li
This patch dynamically selects x86 EFD functions at run-time.
This patch uses function pointer and binds it to the relative
function based on CPU flags at constructor time.

Signed-off-by: Xiaoyun Li 
---
 lib/librte_efd/rte_efd_x86.h | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/lib/librte_efd/rte_efd_x86.h b/lib/librte_efd/rte_efd_x86.h
index 34f37d7..93b6743 100644
--- a/lib/librte_efd/rte_efd_x86.h
+++ b/lib/librte_efd/rte_efd_x86.h
@@ -43,12 +43,29 @@
 #define EFD_LOAD_SI128(val) _mm_lddqu_si128(val)
 #endif
 
+typedef efd_value_t
+(*efd_lookup_internal_avx2_t)(const efd_hashfunc_t *group_hash_idx,
+   const efd_lookuptbl_t *group_lookup_table,
+   const uint32_t hash_val_a, const uint32_t hash_val_b);
+
+static efd_lookup_internal_avx2_t efd_lookup_internal_avx2_ptr;
+
 static inline efd_value_t
 efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
const efd_lookuptbl_t *group_lookup_table,
const uint32_t hash_val_a, const uint32_t hash_val_b)
 {
-#ifdef RTE_MACHINE_CPUFLAG_AVX2
+   return (*efd_lookup_internal_avx2_ptr)(group_hash_idx,
+  group_lookup_table,
+  hash_val_a, hash_val_b);
+}
+
+#ifdef CC_SUPPORT_AVX2
+static inline efd_value_t
+efd_lookup_internal_avx2_AVX2(const efd_hashfunc_t *group_hash_idx,
+   const efd_lookuptbl_t *group_lookup_table,
+   const uint32_t hash_val_a, const uint32_t hash_val_b)
+{
efd_value_t value = 0;
uint32_t i = 0;
__m256i vhash_val_a = _mm256_set1_epi32(hash_val_a);
@@ -74,13 +91,31 @@ efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
}
 
return value;
-#else
+}
+#endif
+
+static inline efd_value_t
+efd_lookup_internal_avx2_DEFAULT(const efd_hashfunc_t *group_hash_idx,
+   const efd_lookuptbl_t *group_lookup_table,
+   const uint32_t hash_val_a, const uint32_t hash_val_b)
+{
RTE_SET_USED(group_hash_idx);
RTE_SET_USED(group_lookup_table);
RTE_SET_USED(hash_val_a);
RTE_SET_USED(hash_val_b);
/* Return dummy value, only to avoid compilation breakage */
return 0;
-#endif
+}
 
+static void __attribute__((constructor))
+rte_efd_x86_init(void)
+{
+#ifdef CC_SUPPORT_AVX2
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
+   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_AVX2;
+   else
+   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
+#else
+   efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
+#endif
 }
-- 
2.7.4
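
The constructor-time binding used in this patch follows a common
pattern, sketched below outside of DPDK. All names here (lookup_fn_t,
lookup_fast, lookup_default, cpu_has_fast_path) are hypothetical, and
the feature probe is just an environment-variable check standing in for
rte_cpu_get_flag_enabled(). Unlike the patch, both implementations
return the same result here, so the choice only affects how the value is
computed.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t (*lookup_fn_t)(uint32_t a, uint32_t b);

/* Baseline implementation. */
static uint32_t
lookup_default(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/* Alternative implementation; same result, different formulation. */
static uint32_t
lookup_fast(uint32_t a, uint32_t b)
{
	return (a | b) & ~(a & b);
}

/* Bound once, before main() runs. */
static lookup_fn_t lookup_ptr;

/* Hypothetical feature probe; a real one would query the CPU. */
static int
cpu_has_fast_path(void)
{
	return getenv("USE_FAST_PATH") != NULL;
}

static void __attribute__((constructor))
lookup_init(void)
{
	lookup_ptr = cpu_has_fast_path() ? lookup_fast : lookup_default;
}

/* Callers go through the pointer and never name an implementation. */
static inline uint32_t
lookup(uint32_t a, uint32_t b)
{
	return lookup_ptr(a, b);
}
```

Note that a static function pointer defined in a header, as in the
patch, gives every translation unit that includes it its own pointer and
its own constructor.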



Re: [dpdk-dev] [PATCH] checkpatch: re-enable warnings about split long strings

2017-10-02 Thread Adrien Mazarguil
On Mon, Oct 02, 2017 at 02:46:24PM +0100, Bruce Richardson wrote:
> On Mon, Oct 02, 2017 at 01:53:17PM +0200, Adrien Mazarguil wrote:
> > Hi Stephen,
> > 
> > On Fri, Sep 29, 2017 at 08:37:49AM -0700, Stephen Hemminger wrote:
> > > The Linux kernel style policy about strings is that strings should
> > > be always put on one line. This makes sense since a typical use case
> > > is for a user to type the error message into a search engine or
> > > grep, and it won't be found if split across lines.  This patch just
> > > re-enables that check.
> > > 
> > > Yes, lots of DPDK code now splits strings, that doesn't make it
> > > right.
> > > 
> > > Signed-off-by: Stephen Hemminger  ---
> > > devtools/checkpatches.sh | 1 - 1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> > > index a56c41a301c0..3e6081dd673e 100755 ---
> > > a/devtools/checkpatches.sh +++ b/devtools/checkpatches.sh @@ -44,7
> > > +44,6 @@ options="$options --show-types" options="$options
> > > --ignore=LINUX_VERSION_CODE,FILE_PATH_CHANGES,\
> > > VOLATILE,PREFER_PACKED,PREFER_ALIGNED,PREFER_PRINTF,\
> > > PREFER_KERNEL_TYPES,BIT_MACRO,CONST_STRUCT,\
> > > -SPLIT_STRING,LONG_LINE_STRING,\
> > > LINE_SPACING,PARENTHESIS_ALIGNMENT,NETWORKING_BLOCK_COMMENT_STYLE,\
> > > NEW_TYPEDEFS,COMPARISON_TO_NULL"
> > 
> > I'm not sure, given that the main reason for splitting strings in the
> > first place is to avoid LONG_LINE_STRING warnings, I think we must
> > choose between the two options. If split strings are not allowed, then
> > long lines must be.
> > 
> > Since checkpatches.sh is used by various automated scripts to complain
> > loudly about problems in submissions, the above change prevents
> > maintainers from writing long strings at all (can't split and can't go
> > past 80 columns).
> > 
> > As a result, they will be tempted to cripple their code with nasty
> > workarounds to shut up checkpatches.sh; we don't want that to happen.
> > 
> > Also I think the reasons stated by original commit cf75514c8e2e are
> > still relevant. My vote would be to keep things as is.
> > 
> In my experience, checkpatch is smart enough to recognise when a long
> line overflows the 80 character limit because of a single long string,
> so the two options are not mutually exclusive. In other words, long
> lines are not allowed except in the case where shortening the line
> involves splitting a string. There may be a small amount of work in
> getting checkpatch happy, i.e. by putting the string on a line on its
> own, but we can indeed have our cake and eat it too in this case.

I can't seem to get around warnings without ignoring either SPLIT_STRING or
LONG_LINE_STRING as of Linux v4.14-rc3's checkpatch.pl. I think you can only
get around them by fooling it somehow. You really need to ignore at least
LONG_LINE_STRING to meet the requirements of the commit log.

However SPLIT_STRING still looks necessary to address part of cf75514c8e2e
("devtools: ignore warning on long log string"):

 "...lines that make use of PRIx64 with string concatenation will still be
  flagged if the beginning of the last string fragment begins after the 80
  character threshold."

It's not all that uncommon in my opinion.

-- 
Adrien Mazarguil
6WIND
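
For reference, the PRIx64 case quoted above looks like the sketch below.
The helper name and the message text are made up for illustration; the
point is only that PRIx64 expands to a string literal, so the format
string is always built from adjacent fragments.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a log line that embeds a 64-bit value. */
static int
format_addr_msg(char *buf, size_t len, uint64_t addr)
{
	/*
	 * The format below is three adjacent string fragments. On a
	 * deeply indented or longer line, the fragment that follows
	 * PRIx64 can start past column 80.
	 */
	return snprintf(buf, len,
			"cannot map address 0x%" PRIx64 " into the region\n",
			addr);
}
```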


Re: [dpdk-dev] [PATCH v4 6/7] test/member: add functional and perf tests

2017-10-02 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Wang, Yipeng1
> Sent: Wednesday, September 27, 2017 6:41 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Tai, Charlie ; Gobriel,
> Sameh ; De Lara Guarch, Pablo
> ; Mcnamara, John
> ; Wang, Yipeng1 
> Subject: [PATCH v4 6/7] test/member: add functional and perf tests
> 
> This patch adds functional and performance tests for membership library.
> 
> Signed-off-by: Yipeng Wang 

...

> +++ b/test/test/test_member.c

...

> +
> +#define MAX_ENTRIES (1 << 16)
> +uint8_t gened_keys[MAX_ENTRIES][KEY_SIZE];

Gened? Is this "generated"? Maybe it is worth using the full word.


> +
> +static struct rte_member_parameters params = {
> + .num_keys = MAX_ENTRIES,   /* Total hash table entries.
> */
> + .key_len = KEY_SIZE,   /* Length of hash key. */

Align comments with tabs.

> +
> + /*num_set and false_positive_rate only relevant to vBF
> setsum*/

Add whitespaces around comment.

> + .num_set = 32,
> + .false_positive_rate = 0.03,
> + .prim_hash_seed = 1,
> + .sec_hash_seed = 11,
> + .socket_id = 0  /* NUMA Socket ID for memory. */
> +};


> +static int test_member_insert(void)
> +{
> + int ret_ht, ret_cache, ret_vbf, i;
> +
> + for (i = 0; i < 5; i++) {

Use macro for the value 5, used here and other functions.

> + ret_ht = rte_member_add(setsum_ht, &keys[i], test_set[i]);
> + ret_cache = rte_member_add(setsum_cache, &keys[i],
> + test_set[i]);
> + ret_vbf = rte_member_add(setsum_vbf, &keys[i],
> test_set[i]);
> + TEST_ASSERT(ret_ht >= 0 && ret_cache >= 0 && ret_vbf >=
> 0,
> + "insert error");
> + }
> + printf("insert key success\n");
> + return 0;
> +}

...

> +static int test_member_multimatch(void) {
> + int ret_ht, ret_vbf, ret_cache;
> + member_set_t set_ids_ht[32] = {0};

Same comment about the value 5 applies here, for the value 32.

> + member_set_t set_ids_vbf[32] = {0};
> + member_set_t set_ids_cache[32] = {0};
> +
> + member_set_t set_ids_ht_m[5][32] = {{0} };
> + member_set_t set_ids_vbf_m[5][32] = {{0} };
> + member_set_t set_ids_cache_m[5][32] = {{0} };
> +
> + uint32_t match_count_ht[5];
> + uint32_t match_count_vbf[5];
> + uint32_t match_count_cache[5];
> +
> + uint32_t num_key_ht = 5;
> + uint32_t num_key_vbf = 5;
> + uint32_t num_key_cache = 5;
> +
> + const void *key_array[5];
> +
> + uint32_t i, j;
> +
> + /* same key at most inserted 2*entry_per_bucket times for HT
> mode */
> + for (i = 1; i < 33; i++) {

This 33 can be expressed as 32 + 1 (using macro for 32).
Also, add a comment explaining why you are skipping value 0.

...

> --- /dev/null
> +++ b/test/test/test_member_perf.c
> @@ -0,0 +1,630 @@

...

> +static int
> +timed_lookups_bulk(struct member_perf_params *params, int type) {
> + unsigned int i, j, k;
> + member_set_t result[BURST_SIZE] = {0};
> + const void *keys_burst[BURST_SIZE];
> + int ret;
> +
> + false_data_bulk[type][params->cycle] = 0;
> +
> + const uint64_t start_tsc = rte_rdtsc();
> +
> + for (i = 0; i < NUM_LOOKUPS / KEYS_TO_ADD; i++) {
> + for (j = 0; j < KEYS_TO_ADD / BURST_SIZE; j++) {
> + for (k = 0; k < BURST_SIZE; k++)
> + keys_burst[k] = keys[j * BURST_SIZE + k];
> +
> + ret = rte_member_lookup_bulk(params-
> >setsum[type],
> + &keys_burst[0],

Using keys_burst directly is equivalent to this, right?

...

> +static int
> +timed_lookups_multimatch(struct member_perf_params *params, int
> type) {
> + unsigned int i, j;
> + member_set_t result[RTE_MEMBER_BUCKET_ENTRIES] = {0};
> + int ret;
> + false_data_multi[type][params->cycle] = 0;
> +
> + const uint64_t start_tsc = rte_rdtsc();
> +
> + for (i = 0; i < NUM_LOOKUPS / KEYS_TO_ADD; i++) {
> + for (j = 0; j < KEYS_TO_ADD; j++) {
> + ret = rte_member_lookup_multi(params-
> >setsum[type],
> + &keys[j], RTE_MEMBER_BUCKET_ENTRIES,
> result);
> + if (type != CACHE && ret <= 0) {
> + printf("lookup multi has wrong return value
> %d,"
> + "type %d\n", ret, type);
> + }
> + if (result[0] != data[type][j])

Why always use result[0]? A comment would be good.
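
To illustrate two of the comments above (named constants instead of the
literals 5 and 32, and passing the array name instead of &array[0]),
here is a small standalone sketch. The macro names are only suggestions,
and the remark that set ID 0 means "no match" is my assumption about why
the loop starts at 1.

```c
#include <stddef.h>

#define NUM_SAMPLES       5   /* hypothetical name for the literal 5 */
#define MAX_MATCH_PER_KEY 32  /* hypothetical name for the literal 32 */
#define BURST_SIZE        64  /* placeholder value for this sketch */

/* Sum the per-key match counts; the array is sized by the macro. */
static unsigned int
total_matches(const unsigned int match_count[NUM_SAMPLES])
{
	unsigned int i, total = 0;

	for (i = 0; i < NUM_SAMPLES; i++)
		total += match_count[i];
	return total;
}

/* Count loop iterations when "33" is written as MAX_MATCH_PER_KEY + 1. */
static unsigned int
count_set_ids(void)
{
	unsigned int i, n = 0;

	/* Starts at 1, assuming set ID 0 is reserved for "no match". */
	for (i = 1; i < MAX_MATCH_PER_KEY + 1; i++)
		n++;
	return n;
}

/* Returns 1: an array name decays to a pointer to its first element. */
static int
burst_args_equivalent(void)
{
	const void *keys_burst[BURST_SIZE] = { NULL };
	const void **a = &keys_burst[0];
	const void **b = keys_burst;

	return a == b;
}
```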



Re: [dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread Hunt, David

Hi Thomas,


On 2/10/2017 4:39 PM, Thomas Monjalon wrote:

02/10/2017 17:06, Hunt, David:

On 2/10/2017 3:55 PM, Thomas Monjalon wrote:

+DPDK_17.11 {

+   global:
+
+   rte_power_acpi_turbo_status;

Is it really the function you want to expose?
rte_power_turbo_status seems more generic.

Not really, it was in there for completeness, but users should be able
to keep track of the turbo'd cores, so not really needed.


More comments about what is part of the API:
If you do not want to expose ACPI and VM implementations,
it should not be part of the rte_* include files.

I'll address the above comments in the next version.

You did not address the comment about what is rte_*.h.
If you do not want to expose everything, you should move it to
another .h file.

Files starting with rte_ are included in doxygen API doc.
Only rte_power.h is installed.
The installed include, the doxygen doc and the map file
should all expose the same API consistently.

I think a cleanup is needed.


While I agree a cleanup is needed, this small patch is only intended to 
fix the priority issue of the shared library builds, which are broken at 
the moment.
The initial patch should have had rte_power_turbo_status, not 
rte_power_acpi_turbo_status.
Rather than moving code around at this stage, I propose having the three 
exposed functions in the map file (with the correct names).
Then, later on, I can do an ABI breakage notification for the next 
release to rename all the other rte*.h files, as some consumers of DPDK 
may be using those directly, at which stage we will be down to just 
exporting the functions in rte_power.h.

Does that sound OK with you?
Regards,
Dave.




Re: [dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy

2017-10-02 Thread Ananyev, Konstantin


> -Original Message-
> From: Li, Xiaoyun
> Sent: Monday, October 2, 2017 5:13 PM
> To: Ananyev, Konstantin ; Richardson, Bruce 
> 
> Cc: Lu, Wenzhuo ; Zhang, Helin ; 
> dev@dpdk.org; Li, Xiaoyun 
> Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> 
> This patch dynamically selects functions of memcpy at run-time based
> on CPU flags that current machine supports. This patch uses function
> pointers which are bound to the relevant functions at constructor time.
> In addition, AVX512 instructions set would be compiled only if users
> config it enabled and the compiler supports it.
> 
> Signed-off-by: Xiaoyun Li 
> ---
> v2
> * Use gcc function multi-versioning to avoid compilation issues.
> * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the
> compiler supports it, the AVX512 codes would be compiled. Only if the
> compiler supports AVX2, the AVX2 codes would be compiled.
> 
> v3
> * Reduce function calls via only keep rte_memcpy_xxx.
> * Add conditions that when copy size is small, use inline code path.
> Otherwise, use dynamic code path.
> * To support attribute target, clang version must be greater than 3.7.
> Otherwise, would choose SSE/AVX code path, the same as before.
> * Move two macro functions to the top of the code since they would be
> used in inline SSE/AVX and dynamic SSE/AVX codes.
> 
> v4
> * Modify rte_memcpy.h to several .c files and modify makefiles to compile
> AVX2 and AVX512 files.

Could you explain why, instead of reusing the existing rte_memcpy() code
to generate the _sse/_avx2/_avx512f flavors, you keep pushing changes with
3 separate implementations?
Obviously that is much more expensive in terms of maintenance and doesn't
look like a feasible solution to me.
Is the existing rte_memcpy() implementation not good enough in terms of
functionality and/or performance?
If so, can you outline these problems and try to fix them first?
Konstantin
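
One possible way to generate several flavors from a single body is
sketched below, purely as an illustration. copy_generic() and
DEFINE_COPY_FLAVOR() are hypothetical names, the block size is the only
thing that differs between flavors here, and in a real build each flavor
would presumably also be compiled with its own target flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One shared body: copy whole blocks of 'blk' bytes, then the tail. */
static inline void *
copy_generic(void *dst, const void *src, size_t n, size_t blk)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	while (n >= blk) {
		memcpy(d, s, blk);
		d += blk;
		s += blk;
		n -= blk;
	}
	if (n > 0)
		memcpy(d, s, n);
	return dst;
}

/* Stamp out one named flavor per block size from the shared body. */
#define DEFINE_COPY_FLAVOR(name, blk)				\
static void *							\
copy_##name(void *dst, const void *src, size_t n)		\
{								\
	return copy_generic(dst, src, n, (blk));		\
}

DEFINE_COPY_FLAVOR(sse, 16)
DEFINE_COPY_FLAVOR(avx2, 32)
DEFINE_COPY_FLAVOR(avx512f, 64)
```

With this layout a fix to copy_generic() reaches all three flavors at
once, which is the maintenance argument made above.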

> 
>  lib/librte_eal/bsdapp/eal/Makefile |  17 +
>  .../common/include/arch/x86/rte_memcpy.c   |  59 ++
>  .../common/include/arch/x86/rte_memcpy.h   | 861 +--
>  .../common/include/arch/x86/rte_memcpy_avx2.c  | 291 +++
>  .../common/include/arch/x86/rte_memcpy_avx512f.c   | 316 +++
>  .../common/include/arch/x86/rte_memcpy_internal.h  | 909 
> +
>  .../common/include/arch/x86/rte_memcpy_sse.c   | 585 +
>  lib/librte_eal/linuxapp/eal/Makefile   |  17 +
>  mk/rte.cpuflags.mk |  14 +
>  9 files changed, 2223 insertions(+), 846 deletions(-)
>  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy.c
>  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_avx2.c
>  create mode 100644 
> lib/librte_eal/common/include/arch/x86/rte_memcpy_avx512f.c
>  create mode 100644 
> lib/librte_eal/common/include/arch/x86/rte_memcpy_internal.h
>  create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcpy_sse.c
> 
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
> b/lib/librte_eal/bsdapp/eal/Makefile
> index 005019e..27023c6 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -36,6 +36,7 @@ LIB = librte_eal.a
>  ARCH_DIR ?= $(RTE_ARCH)
>  VPATH += $(RTE_SDK)/lib/librte_eal/common
>  VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
> +VPATH += $(RTE_SDK)/lib/librte_eal/common/include/arch/$(ARCH_DIR)
> 
>  CFLAGS += -I$(SRCDIR)/include
>  CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common
> @@ -93,6 +94,22 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_service.c
>  SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_cpuflags.c
>  SRCS-$(CONFIG_RTE_ARCH_X86) += rte_spinlock.c
> 
> +# for run-time dispatch of memcpy
> +SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy.c
> +SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_sse.c
> +
> +# if the compiler supports AVX512, add avx512 file
> +ifneq ($(filter $(MACHINE_CFLAGS),CC_SUPPORT_AVX512F),)
> +SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx512f.c
> +CFLAGS_rte_memcpy_avx512f.o += -mavx512f
> +endif
> +
> +# if the compiler supports AVX2, add avx2 file
> +ifneq ($(filter $(MACHINE_CFLAGS),CC_SUPPORT_AVX2),)
> +SRCS-$(CONFIG_RTE_ARCH_X86) += rte_memcpy_avx2.c
> +CFLAGS_rte_memcpy_avx2.o += -mavx2
> +endif
> +
>  CFLAGS_eal_common_cpuflags.o := $(CPUFLAGS_LIST)
> 
>  CFLAGS_eal.o := -D_GNU_SOURCE
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.c 
> b/lib/librte_eal/common/include/arch/x86/rte_memcpy.c
> new file mode 100644
> index 000..74ae702
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.c
> @@ -0,0 +1,59 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributi

[dpdk-dev] [PATCH v6 0/6] Support TCP/IPv4, VxLAN, and GRE GSO in DPDK

2017-10-02 Thread Mark Kavanagh
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. This patch adds GSO support to DPDK for specific
packet types: specifically, TCP/IPv4, VxLAN, and GRE.

The first patch introduces the GSO API framework. The second patch
adds GSO support for TCP/IPv4 packets (containing an optional VLAN
tag). The third patch adds GSO support for VxLAN packets that contain
outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or 
outer VLAN tags). The fourth patch adds GSO support for GRE packets
that contain outer IPv4, and inner TCP/IPv4 headers (with optional 
outer VLAN tag). The fifth patch in the series enables TCP/IPv4, VxLAN,
and GRE GSO in testpmd's checksum forwarding engine. The final patch
in the series adds GSO documentation to the programmer's guide.

Performance Testing
===================
The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using
iperf. Setup for the test is described as follows:

a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same
   machine, together physically.
b. Launch testpmd with P0 and a vhost-user port, and use csum
   forwarding engine with "retry".
c. Select IP and TCP HW checksum calculation for P0; select TCP HW
   checksum calculation for vhost-user port.
d. Launch a VM with csum and tso offloading enabled.
e. Run iperf-client on the virtio-net port in the VM to send TCP packets.
   With csum and tso enabled, the VM can send large TCP/IPv4 packets
   (mss is up to 64KB).
f. Assign P1 to the Linux kernel with kernel GRO enabled. Run
   iperf-server on P1.

We conduct three iperf tests:

test-1: enable GSO for P0 in testpmd, and set max GSO segment length
to 1518B. Run two iperf-client in the VM.
test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run
two iperf-client in the VM.
test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM.

Throughput of the above three tests:

test-1: 9.4Gbps
test-2: 9.5Gbps
test-3: 3Mbps

Functional Testing
==================
Unlike TCP packets, VMs can't send large VxLAN or GRE packets. The max
length of tunneled packets from VMs is 1514B. So the current experiment
method can't be used to measure VxLAN and GRE GSO performance; instead, we
simply test the functionality by setting a small GSO segment length (e.g. 500B).

VxLAN
-----
To test VxLAN GSO functionality, we use the following setup:

a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same
   machine, together physically.
b. Launch testpmd with P0 and a vhost-user port, and use csum forwarding
   engine with "retry".
c. Testpmd commands:
- csum parse_tunnel on "P0"
- csum parse_tunnel on "vhost-user port"
- csum set outer-ip hw "P0"
- csum set ip hw "P0"
- csum set tcp hw "P0"
- csum set tcp hw "vhost-user port"
- set port "P0" gso on
- set gso segsz 500
d. Launch a VM with csum and tso offloading enabled.
e. Create a vxlan port for the virtio-net port in the VM. Run iperf-client
   on the VxLAN port, so TCP packets are VxLAN encapsulated. However, the
   max packet length is 1514B.
f. P1 is assigned to linux kernel and kernel GRO is disabled. Similarly,
   create a VxLAN port for P1, and run iperf-server on the VxLAN port.

In testpmd, we can see that the length of every packet sent from P0 is
smaller than or equal to 500B. Additionally, the packets arriving at P1 are
encapsulated and are smaller than or equal to 500B.
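
As a rough check of what to expect on the wire, the number of output
packets for a given segment size can be estimated as below. This is a
sketch under the assumption that the segment size bounds the whole
output packet (headers plus payload), which matches the "smaller than or
equal to 500B" observation; struct gso_layout and gso_compute_layout()
are hypothetical names, not GSO library API.

```c
#include <stddef.h>

struct gso_layout {
	size_t nb_segs;      /* number of output packets */
	size_t last_seg_len; /* total length of the final packet */
};

/*
 * Estimate how a packet of pkt_len bytes, of which hdr_len bytes are
 * headers copied into every output packet, splits when no output
 * packet may exceed seg_sz bytes.
 */
static struct gso_layout
gso_compute_layout(size_t pkt_len, size_t hdr_len, size_t seg_sz)
{
	struct gso_layout l = { 0, 0 };
	size_t payload, per_seg, rem;

	/* Nothing sensible to do without room for payload. */
	if (hdr_len >= seg_sz || pkt_len <= hdr_len)
		return l;

	/* Already small enough: passes through as a single packet. */
	if (pkt_len <= seg_sz) {
		l.nb_segs = 1;
		l.last_seg_len = pkt_len;
		return l;
	}

	payload = pkt_len - hdr_len;
	per_seg = seg_sz - hdr_len;
	l.nb_segs = (payload + per_seg - 1) / per_seg; /* round up */
	rem = payload % per_seg;
	l.last_seg_len = hdr_len + (rem != 0 ? rem : per_seg);
	return l;
}
```

For example, a 1514B packet with 54B of Ethernet/IPv4/TCP headers and a
500B segment size gives three 500B packets followed by one 176B packet.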

GRE
---
The same process may be used to test GRE functionality, with the exception that
the tunnel type created for both the guest's virtio-net, and the host's kernel
interfaces is GRE:
   `ip tunnel add  mode gre remote  local `

As in the VxLAN testcase, the length of packets sent from P0, and received on
P1, is less than or equal to 500B.

Change log
==========
v6:
- rebase to HEAD of master (i5dce9fcA)
- remove 'l3_offset' parameter from 'update_ipv4_tcp_headers'

v5:
- add GSO section to the programmer's guide.
- use MF or (previously 'and') offset to check if a packet is IP
  fragmented.
- move 'update_header' helper functions to gso_common.h.
- move tcp/ipv4 'update_header' function to gso_tcp4.c.
- move tunnel 'update_header' function to gso_tunnel_tcp4.c.
- add offset parameter to 'update_header' functions.
- combine GRE and VxLAN tunnel header update functions into a single
  function.
- correct typos and errors in comments/commit messages.

v4:
- use ol_flags instead of packet_type to decide which segmentation
  function to use.
- use MF and offset to check if a packet is IP fragmented, instead of
  using DF.
- remove ETHER_CRC_LEN from gso segment payload length calculation.
- refactor internal header update and other functions.
- remove RTE_GSO_IPID_INCREASE.
- add some 

[dpdk-dev] [PATCH v6 1/6] gso: add Generic Segmentation Offload API framework

2017-10-02 Thread Mark Kavanagh
From: Jiayu Hu 

Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. Segmenting a packet requires two steps. The first
is to set the proper flags in mbuf->ol_flags, where the flags are the same
as those of TSO. The second is to call the segmentation API,
rte_gso_segment(). This patch introduces the GSO API framework to DPDK.

rte_gso_segment() splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment has multiple MBUFs (i.e. 2 MBUFs),
the driver of the interface to which the GSO segments are sent must
support transmitting multi-segment packets.

The GSO framework clears the PKT_TX_TCP_SEG flag for both the input
packet, and all produced GSO segments in the event of success, since
segmentation in hardware is no longer required at that point.

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 config/common_base |   5 ++
 doc/api/doxy-api-index.md  |   1 +
 doc/api/doxy-api.conf  |   1 +
 doc/guides/rel_notes/release_17_11.rst |   1 +
 lib/Makefile   |   2 +
 lib/librte_gso/Makefile|  49 +++
 lib/librte_gso/rte_gso.c   |  52 
 lib/librte_gso/rte_gso.h   | 145 +
 lib/librte_gso/rte_gso_version.map |   7 ++
 mk/rte.app.mk  |   1 +
 10 files changed, 264 insertions(+)
 create mode 100644 lib/librte_gso/Makefile
 create mode 100644 lib/librte_gso/rte_gso.c
 create mode 100644 lib/librte_gso/rte_gso.h
 create mode 100644 lib/librte_gso/rte_gso_version.map

diff --git a/config/common_base b/config/common_base
index 12f6be9..58ca5c0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -653,6 +653,11 @@ CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n
 CONFIG_RTE_LIBRTE_GRO=y
 
 #
+# Compile GSO library
+#
+CONFIG_RTE_LIBRTE_GSO=y
+
+#
 # Compile librte_meter
 #
 CONFIG_RTE_LIBRTE_METER=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 19e0d4f..6512918 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -101,6 +101,7 @@ The public API headers are grouped by topics:
   [TCP](@ref rte_tcp.h),
   [UDP](@ref rte_udp.h),
   [GRO](@ref rte_gro.h),
+  [GSO](@ref rte_gso.h),
   [frag/reass] (@ref rte_ip_frag.h),
   [LPM IPv4 route] (@ref rte_lpm.h),
   [LPM IPv6 route] (@ref rte_lpm6.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 823554f..408f2e6 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -47,6 +47,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_ether \
   lib/librte_eventdev \
   lib/librte_gro \
+  lib/librte_gso \
   lib/librte_hash \
   lib/librte_ip_frag \
   lib/librte_jobstats \
diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 8bf91bd..7508be7 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -174,6 +174,7 @@ The libraries prepended with a plus sign were incremented in this version.
  librte_ethdev.so.7
  librte_eventdev.so.2
  librte_gro.so.1
+   + librte_gso.so.1
  librte_hash.so.2
  librte_ip_frag.so.1
  librte_jobstats.so.1
diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..3d123f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
+DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
new file mode 100644
index 000..aeaacbc
--- /dev/null
+++ b/lib/libr

[dpdk-dev] [PATCH v6 2/6] gso: add TCP/IPv4 GSO support

2017-10-02 Thread Mark Kavanagh
From: Jiayu Hu 

This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input
packets have correct checksums, and doesn't update checksums for
output packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
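
A packet can be recognised as an IP fragment from the flags/offset field
of the IPv4 header, roughly as sketched below. The macro and function
names are hypothetical, and the field is assumed to have been converted
to host byte order already.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout of the 16-bit IPv4 flags/fragment-offset field. */
#define IPV4_DF_FLAG     0x4000 /* don't fragment */
#define IPV4_MF_FLAG     0x2000 /* more fragments follow */
#define IPV4_OFFSET_MASK 0x1fff /* fragment offset, in 8-byte units */

/*
 * A packet is a fragment if more fragments follow it or if it does
 * not start at offset 0. The DF bit alone says nothing about this.
 */
static bool
ipv4_is_fragment(uint16_t frag_field)
{
	return (frag_field & (IPV4_MF_FLAG | IPV4_OFFSET_MASK)) != 0;
}
```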

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSO'd, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
Tested-by: Lei Yao 
---
 doc/guides/rel_notes/release_17_11.rst  |  12 +++
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile |   2 +
 lib/librte_gso/gso_common.c | 153 
 lib/librte_gso/gso_common.h | 141 +
 lib/librte_gso/gso_tcp4.c   | 104 ++
 lib/librte_gso/gso_tcp4.h   |  74 +++
 lib/librte_gso/rte_gso.c|  52 ++-
 8 files changed, 536 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp4.c
 create mode 100644 lib/librte_gso/gso_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 7508be7..c414f73 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -41,6 +41,18 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added the Generic Segmentation Offload Library.**
+
+  Added the Generic Segmentation Offload (GSO) library to enable
+  applications to split large packets (e.g. MTU is 64KB) into small
+  ones (e.g. MTU is 1500B). Supported packet types are:
+
+  * TCP/IPv4 packets, which may include a single VLAN tag.
+
+  The GSO library doesn't check if the input packets have correct
+  checksums, and doesn't update checksums for output packets.
+  Additionally, the GSO library doesn't process IP fragmented packets.
+
 
 Resolved Issues
 ---
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index ec8dba7..2fa1199 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -87,6 +87,7 @@ struct rte_logs {
 #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
 #define RTE_LOGTYPE_EFD   18 /**< Log related to EFD. */
 #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
+#define RTE_LOGTYPE_GSO   20 /**< Log related to GSO. */
 
 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index aeaacbc..2be64d1 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -42,6 +42,8 @@ LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
new file mode 100644
index 000..ee75d4c
--- /dev/null
+++ b/lib/librte_gso/gso_common.c
@@ -0,0 +1,153 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF

[dpdk-dev] [PATCH v6 3/6] gso: add VxLAN GSO support

2017-10-02 Thread Mark Kavanagh
This patch adds a framework that allows GSO on tunneled packets.
Furthermore, it leverages that framework to provide GSO support for
VxLAN-encapsulated packets.

Supported VxLAN packets must have an outer IPv4 header (prepended by an
optional VLAN tag), and contain an inner TCP/IPv4 packet (with an optional
inner VLAN tag).

VxLAN GSO doesn't check if input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.

As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSO'd segments
are freed, the packet is freed automatically.
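
For VxLAN, each output segment also needs its outer UDP length field
rewritten after the split. A minimal sketch of that step on a raw byte
buffer, in plain C, follows. The function names are hypothetical stand-ins
and do not use the mbuf API. The sketch assumes the buffer holds the whole
segment contiguously and relies on the UDP length field sitting at bytes 4-5
of the UDP header.

```c
#include <stdint.h>

/* Store a 16-bit value in network (big-endian) byte order. */
static void
ex_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Hypothetical stand-in for the UDP header update: the datagram length
 * covers everything from the start of the UDP header to the end of the
 * segment, i.e. seg_len - udp_offset. */
static void
ex_set_udp_len(uint8_t *seg, uint16_t seg_len, uint16_t udp_offset)
{
	ex_put_be16(seg + udp_offset + 4, (uint16_t)(seg_len - udp_offset));
}
```

For a 100-byte segment whose UDP header starts at offset 34, the length
field becomes 66 (0x0042).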

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 doc/guides/rel_notes/release_17_11.rst |   3 +
 lib/librte_gso/Makefile|   1 +
 lib/librte_gso/gso_common.h|  25 +++
 lib/librte_gso/gso_tunnel_tcp4.c   | 123 +
 lib/librte_gso/gso_tunnel_tcp4.h   |  75 
 lib/librte_gso/rte_gso.c   |  13 +++-
 6 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gso/gso_tunnel_tcp4.c
 create mode 100644 lib/librte_gso/gso_tunnel_tcp4.h

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index c414f73..25b8a78 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -48,6 +48,9 @@ New Features
   ones (e.g. MTU is 1500B). Supported packet types are:
 
   * TCP/IPv4 packets, which may include a single VLAN tag.
+  * VxLAN packets, which must have an outer IPv4 header (prepended by
+an optional VLAN tag), and contain an inner TCP/IPv4 packet (with
+an optional VLAN tag).
 
   The GSO library doesn't check if the input packets have correct
   checksums, and doesn't update checksums for output packets.
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index 2be64d1..e6d41df 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -44,6 +44,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel_tcp4.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index 8d9b94e..c051295 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IS_FRAGMENTED(frag_off) (((frag_off) & IPV4_HDR_OFFSET_MASK) != 0 \
|| ((frag_off) & IPV4_HDR_MF_FLAG) == IPV4_HDR_MF_FLAG)
@@ -49,6 +50,30 @@
 #define IS_IPV4_TCP(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4)) == \
(PKT_TX_TCP_SEG | PKT_TX_IPV4))
 
+#define IS_IPV4_VXLAN_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4 | \
+   PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_VXLAN)) == \
+   (PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
+PKT_TX_TUNNEL_VXLAN))
+
+/**
+ * Internal function which updates the UDP header of a packet, following
+ * segmentation. This is required to update the header's datagram length field.
+ *
+ * @param pkt
+ *  The packet containing the UDP header.
+ * @param udp_offset
+ *  The offset of the UDP header from the start of the packet.
+ */
+static inline void
+update_udp_header(struct rte_mbuf *pkt, uint16_t udp_offset)
+{
+   struct udp_hdr *udp_hdr;
+
+   udp_hdr = (struct udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   udp_offset);
+   udp_hdr->dgram_len = rte_cpu_to_be_16(pkt->pkt_len - udp_offset);
+}
+
 /**
  * Internal function which updates the TCP header of a packet, following
  * segmentation. This is required to update the header's 'sent' sequence
diff --git a/lib/librte_gso/gso_tunnel_tcp4.c b/lib/librte_gso/gso_tunnel_tcp4.c
new file mode 100644
index 000..34bbbd7
--- /dev/null
+++ b/lib/librte_gso/gso_tunnel_tcp4.c
@@ -0,0 +1,123 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corpo

[dpdk-dev] [PATCH v6 4/6] gso: add GRE GSO support

2017-10-02 Thread Mark Kavanagh
This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO doesn't check if all
input packets have correct checksums and doesn't update checksums for
output packets. Additionally, it doesn't process IP fragmented packets.

As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.
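
The packet-type test and the "UDP header only for VxLAN" rule can be
sketched as follows. This is an illustration only: the EX_* bit values are
made up for the example (the real PKT_TX_* flags are defined in rte_mbuf.h
and use different values), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative bit values only, not the real PKT_TX_* definitions. */
#define EX_TX_TCP_SEG      (1ULL << 0)
#define EX_TX_IPV4         (1ULL << 1)
#define EX_TX_OUTER_IPV4   (1ULL << 2)
#define EX_TX_TUNNEL_VXLAN (1ULL << 3)
#define EX_TX_TUNNEL_GRE   (1ULL << 4)

#define EX_GRE_TCP4_MASK (EX_TX_TCP_SEG | EX_TX_IPV4 | \
		EX_TX_OUTER_IPV4 | EX_TX_TUNNEL_GRE)

/* True only if every flag in the mask is set; extra flags are ignored. */
static int
ex_is_ipv4_gre_tcp4(uint64_t flags)
{
	return (flags & EX_GRE_TCP4_MASK) == EX_GRE_TCP4_MASK;
}

/* GRE carries no UDP header, so only VxLAN segments need the UDP
 * datagram length rewritten after segmentation. */
static int
ex_needs_udp_update(uint64_t flags)
{
	return (flags & EX_TX_TUNNEL_VXLAN) != 0;
}
```

Comparing the masked value against the full mask (rather than testing for
non-zero) is what makes a packet with only some of the required flags fall
through to the "unsupported" path.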

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 doc/guides/rel_notes/release_17_11.rst |  3 +++
 lib/librte_gso/gso_common.h|  5 +
 lib/librte_gso/gso_tunnel_tcp4.c   | 14 ++
 lib/librte_gso/rte_gso.c   |  8 +---
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 25b8a78..808f537 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -51,6 +51,9 @@ New Features
   * VxLAN packets, which must have an outer IPv4 header (prepended by
 an optional VLAN tag), and contain an inner TCP/IPv4 packet (with
 an optional VLAN tag).
+  * GRE packets, which must contain an outer IPv4 header (prepended by
+an optional VLAN tag), and inner TCP/IPv4 headers (with an optional
+VLAN tag).
 
   The GSO library doesn't check if the input packets have correct
   checksums, and doesn't update checksums for output packets.
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index c051295..1e99cc0 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -55,6 +55,11 @@
(PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
 PKT_TX_TUNNEL_VXLAN))
 
+#define IS_IPV4_GRE_TCP4(flag) (((flag) & (PKT_TX_TCP_SEG | PKT_TX_IPV4 | \
+   PKT_TX_OUTER_IPV4 | PKT_TX_TUNNEL_GRE)) == \
+   (PKT_TX_TCP_SEG | PKT_TX_IPV4 | PKT_TX_OUTER_IPV4 | \
+PKT_TX_TUNNEL_GRE))
+
 /**
  * Internal function which updates the UDP header of a packet, following
  * segmentation. This is required to update the header's datagram length field.
diff --git a/lib/librte_gso/gso_tunnel_tcp4.c b/lib/librte_gso/gso_tunnel_tcp4.c
index 34bbbd7..d79fc6b 100644
--- a/lib/librte_gso/gso_tunnel_tcp4.c
+++ b/lib/librte_gso/gso_tunnel_tcp4.c
@@ -42,11 +42,13 @@
struct tcp_hdr *tcp_hdr;
uint32_t sent_seq;
uint16_t outer_id, inner_id, tail_idx, i;
-   uint16_t outer_ipv4_offset, inner_ipv4_offset, udp_offset, tcp_offset;
+   uint16_t outer_ipv4_offset, inner_ipv4_offset;
+   uint16_t udp_gre_offset, tcp_offset;
+   uint8_t update_udp_hdr;
 
outer_ipv4_offset = pkt->outer_l2_len;
-   udp_offset = outer_ipv4_offset + pkt->outer_l3_len;
-   inner_ipv4_offset = udp_offset + pkt->l2_len;
+   udp_gre_offset = outer_ipv4_offset + pkt->outer_l3_len;
+   inner_ipv4_offset = udp_gre_offset + pkt->l2_len;
tcp_offset = inner_ipv4_offset + pkt->l3_len;
 
/* Outer IPv4 header. */
@@ -63,9 +65,13 @@
sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
tail_idx = nb_segs - 1;
 
+   /* Only update UDP header for VxLAN packets. */
+   update_udp_hdr = (pkt->ol_flags & PKT_TX_TUNNEL_VXLAN) ? 1 : 0;
+
for (i = 0; i < nb_segs; i++) {
update_ipv4_header(segs[i], outer_ipv4_offset, outer_id);
-   update_udp_header(segs[i], udp_offset);
+   if (update_udp_hdr)
+   update_udp_header(segs[i], udp_gre_offset);
update_ipv4_header(segs[i], inner_ipv4_offset, inner_id);
update_tcp_header(segs[i], tcp_offset, sent_seq, i < tail_idx);
outer_id++;
diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
index 6095689..b748ab1 100644
--- a/lib/librte_gso/rte_gso.c
+++ b/lib/librte_gso/rte_gso.c
@@ -60,8 +60,9 @@
 
if ((gso_ctx->gso_size >= pkt->pkt_len) || (gso_ctx->gso_types &
(DEV_TX_OFFLOAD_TCP_TSO |
-DEV_TX_OFFLOAD_VXLAN_TNL_TSO)) !=
-   gso_ctx->gso_types) {
+DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+DEV_TX_OFFLOAD_GRE_TNL_TSO)) !=
+gso_ctx->gso_types) {
pkt->ol_flags &= (~PKT_TX_TCP_SEG);
pkts_out[0] = pkt;
return 1;
@@ -73,7 +74,8 @@
ipid_delta = (gso_ctx->ipid_flag != RTE_GSO_IPID_FIXED);
ol_flags = pkt->ol_flags;
 
-   if (IS_IPV4_VXLAN_TCP4(pkt->ol_flags)) {
+   if (IS_IPV4_VXLAN_TCP4(pkt->ol_flags) ||
+   IS_IP

[dpdk-dev] [PATCH v6 5/6] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO

2017-10-02 Thread Mark Kavanagh
From: Jiayu Hu 

This patch adds GSO support to the csum forwarding engine. Oversized
packets transmitted over a GSO-enabled port will undergo segmentation
(with the exception of packet-types unsupported by the GSO library).
GSO support is disabled by default.

GSO support may be toggled on a per-port basis, using the command:

"set port <port_id> gso on|off"

The maximum packet length (including the packet header and payload) for
GSO segments may be set with the command:

"set gso segsz <length>"

Show GSO configuration for a given port with the command:

"show port <port_id> gso"

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 app/test-pmd/cmdline.c  | 178 
 app/test-pmd/config.c   |  24 
 app/test-pmd/csumonly.c |  69 ++-
 app/test-pmd/testpmd.c  |  13 ++
 app/test-pmd/testpmd.h  |  10 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  46 +++
 6 files changed, 335 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ccdf239..05b0ce8 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -431,6 +431,17 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set max flow number and max packet number per-flow"
" for GRO.\n\n"
 
+   "set port (port_id) gso (on|off)\n"
+   "Enable or disable Generic Segmentation Offload in"
+   " csum forwarding engine.\n\n"
+
+   "set gso segsz (length)\n"
+   "Set max packet length for output GSO segments,"
+   " including packet header and payload.\n\n"
+
+   "show port (port_id) gso\n"
+   "Show GSO configuration.\n\n"
+
"set fwd (%s)\n"
"Set packet forwarding mode.\n\n"
 
@@ -3967,6 +3978,170 @@ struct cmd_gro_set_result {
},
 };
 
+/* *** ENABLE/DISABLE GSO *** */
+struct cmd_gso_enable_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_port;
+   cmdline_fixed_string_t cmd_keyword;
+   cmdline_fixed_string_t cmd_mode;
+   uint8_t cmd_pid;
+};
+
+static void
+cmd_gso_enable_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_gso_enable_result *res;
+
+   res = parsed_result;
+   if (!strcmp(res->cmd_keyword, "gso"))
+   setup_gso(res->cmd_mode, res->cmd_pid);
+}
+
+cmdline_parse_token_string_t cmd_gso_enable_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_set, "set");
+cmdline_parse_token_string_t cmd_gso_enable_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_port, "port");
+cmdline_parse_token_string_t cmd_gso_enable_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_keyword, "gso");
+cmdline_parse_token_string_t cmd_gso_enable_mode =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_mode, "on#off");
+cmdline_parse_token_num_t cmd_gso_enable_pid =
+   TOKEN_NUM_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_pid, UINT8);
+
+cmdline_parse_inst_t cmd_gso_enable = {
+   .f = cmd_gso_enable_parsed,
+   .data = NULL,
+   .help_str = "set port <port_id> gso on|off",
+   .tokens = {
+   (void *)&cmd_gso_enable_set,
+   (void *)&cmd_gso_enable_port,
+   (void *)&cmd_gso_enable_pid,
+   (void *)&cmd_gso_enable_keyword,
+   (void *)&cmd_gso_enable_mode,
+   NULL,
+   },
+};
+
+/* *** SET MAX PACKET LENGTH FOR GSO SEGMENTS *** */
+struct cmd_gso_size_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_keyword;
+   cmdline_fixed_string_t cmd_segsz;
+   uint16_t cmd_size;
+};
+
+static void
+cmd_gso_size_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_gso_size_result *res = parsed_result;
+
+   if (test_done == 0) {
+   printf("Before setting GSO segsz, please first stop forwarding\n");
+   return;
+   }
+
+   if (!strcmp(res->cmd_keyword, "gso") &&
+   !strcmp(res->cmd_segsz, "segsz")) {
+   if (res->cmd_size == 0) {
+   printf("gso_size should be larger than 0."
+   " Please input a legal value\n");
+   } else
+   gso_max_segment_size = res->cmd_size;
+   }
+}
+
+cmdline_parse_token_string_t cmd_gso_size_s

[dpdk-dev] [PATCH v6 6/6] doc: add GSO programmer's guide

2017-10-02 Thread Mark Kavanagh
Add programmer's guide doc to explain the design and use of the
GSO library.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 MAINTAINERS|   6 +
 .../generic_segmentation_offload_lib.rst   | 256 +++
 .../prog_guide/img/gso-output-segment-format.svg   | 313 ++
 doc/guides/prog_guide/img/gso-three-seg-mbuf.svg   | 477 +
 doc/guides/prog_guide/index.rst|   1 +
 5 files changed, 1053 insertions(+)
 create mode 100644 doc/guides/prog_guide/generic_segmentation_offload_lib.rst
 create mode 100644 doc/guides/prog_guide/img/gso-output-segment-format.svg
 create mode 100644 doc/guides/prog_guide/img/gso-three-seg-mbuf.svg

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df2a7f..8f0a4bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -644,6 +644,12 @@ M: Jiayu Hu 
 F: lib/librte_gro/
 F: doc/guides/prog_guide/generic_receive_offload_lib.rst
 
+Generic Segmentation Offload
+M: Jiayu Hu 
+M: Mark Kavanagh 
+F: lib/librte_gso/
+F: doc/guides/prog_guide/generic_segmentation_offload_lib.rst
+
 Distributor
 M: Bruce Richardson 
 M: David Hunt 
diff --git a/doc/guides/prog_guide/generic_segmentation_offload_lib.rst b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
new file mode 100644
index 000..5e78f16
--- /dev/null
+++ b/doc/guides/prog_guide/generic_segmentation_offload_lib.rst
@@ -0,0 +1,256 @@
+..  BSD LICENSE
+Copyright(c) 2017 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Generic Segmentation Offload Library
+
+
+Overview
+
+Generic Segmentation Offload (GSO) is a widely used software implementation of
+TCP Segmentation Offload (TSO), which reduces per-packet processing overhead.
+Much like TSO, GSO gains performance by enabling upper layer applications to
+process a smaller number of large packets (e.g. MTU size of 64KB), instead of
+processing higher numbers of small packets (e.g. MTU size of 1500B), thus
+reducing per-packet overhead.
+
+For example, GSO allows guest kernel stacks to transmit over-sized TCP segments
+that far exceed the kernel interface's MTU; this eliminates the need to segment
+packets within the guest, and improves the data-to-overhead ratio of both the
+guest-host link, and PCI bus. The expectation of the guest network stack in this
+scenario is that segmentation of egress frames will take place either in the NIC
+HW, or where that hardware capability is unavailable, either in the host
+application, or network stack.
+
+Bearing that in mind, the GSO library enables DPDK applications to segment
+packets in software. Note however, that GSO is implemented as a standalone
+library, and not via a 'fallback' mechanism (i.e. for when TSO is unsupported
+in the underlying hardware); that is, applications must explicitly invoke the
+GSO library to segment packets. The size of GSO segments ``(segsz)`` is
+configurable by the application.
+
+Limitations
+---
+
+#. The GSO library doesn't check if input packets have correct checksums.
+
+#. In addition, the GSO library doesn't re-calculate checksums for segmented
+   packets (that task is left to the application).
+
+#. IP fragments are unsupported by the GSO library.
+
+#. The egress interface's driver must support multi-segment packets.
+
+#. Currently, the GSO library supports the following IPv4 packet types:
+
+ - TCP
+ - VxLAN
+ - G

Re: [dpdk-dev] [PATCH] lib/power: add turbo functions to version.map

2017-10-02 Thread Thomas Monjalon
02/10/2017 18:25, Hunt, David:
> Hi Thomas,
> 
> 
> On 2/10/2017 4:39 PM, Thomas Monjalon wrote:
> > 02/10/2017 17:06, Hunt, David:
> >> On 2/10/2017 3:55 PM, Thomas Monjalon wrote:
> >>> +DPDK_17.11 {
>  +global:
>  +
>  +rte_power_acpi_turbo_status;
> >>> Is it really the function you want to expose?
> >>> rte_power_turbo_status seems more generic.
> >> Not really, it was in there for completeness, but users should be able
> >> to keep track of the turbo'd cores, so not really needed.
> >>
> >>> More comments about what is part of the API:
> >>> If you do not want to expose ACPI and VM implementations,
> >>> it should not be part of the rte_* include files.
> >> I'll address the above comments in the next version.
> > 
> > You did not address the comment about what is rte_*.h.
> > If you do not want to expose everything, you should move it to
> > another .h file.
> >
> > Files starting with rte_ are included in doxygen API doc.
> > Only rte_power.h is installed.
> > The installed include, the doxygen doc and the map file
> > should all expose the same API consistently.
> >
> > I think a cleanup is needed.
> 
> While I agree a cleanup is needed, this small patch is only intended to 
> fix the priority issue of the shared library builds, which are broken at 
> the moment.
> The initial patch should have had rte_power_turbo_status, not 
> rte_power_acpi_turbo_status.
> Rather than moving code around at this stage, I propose having the three 
> exposed functions in the map file (with the correct names).

OK, so we need a v3 (v2 has only 2 functions).

> Then, later on, I can do an ABI breakage notification for the next 
> release to rename all the other rte*.h files, as some consumers of DPDK 
> may be using those directly, at which stage we will be down to just 
> exporting the functions in rte_power.h.
> Does that sound OK with you?

OK, thanks


Re: [dpdk-dev] [PATCH v4 3/3] efd: run-time dispatch over x86 EFD functions

2017-10-02 Thread Ananyev, Konstantin


> -Original Message-
> From: Li, Xiaoyun
> Sent: Monday, October 2, 2017 5:13 PM
> To: Ananyev, Konstantin ; Richardson, Bruce 
> 
> Cc: Lu, Wenzhuo ; Zhang, Helin ; 
> dev@dpdk.org; Li, Xiaoyun 
> Subject: [PATCH v4 3/3] efd: run-time dispatch over x86 EFD functions
> 
> This patch dynamically selects x86 EFD functions at run-time.
> This patch uses a function pointer and binds it to the relevant
> function based on CPU flags at constructor time.
> 
> Signed-off-by: Xiaoyun Li 
> ---
>  lib/librte_efd/rte_efd_x86.h | 41 ++---
>  1 file changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_efd/rte_efd_x86.h b/lib/librte_efd/rte_efd_x86.h
> index 34f37d7..93b6743 100644
> --- a/lib/librte_efd/rte_efd_x86.h
> +++ b/lib/librte_efd/rte_efd_x86.h
> @@ -43,12 +43,29 @@
>  #define EFD_LOAD_SI128(val) _mm_lddqu_si128(val)
>  #endif
> 
> +typedef efd_value_t
> +(*efd_lookup_internal_avx2_t)(const efd_hashfunc_t *group_hash_idx,
> + const efd_lookuptbl_t *group_lookup_table,
> + const uint32_t hash_val_a, const uint32_t hash_val_b);
> +
> +static efd_lookup_internal_avx2_t efd_lookup_internal_avx2_ptr;
> +
>  static inline efd_value_t
>  efd_lookup_internal_avx2(const efd_hashfunc_t *group_hash_idx,
>   const efd_lookuptbl_t *group_lookup_table,
>   const uint32_t hash_val_a, const uint32_t hash_val_b)
>  {
> -#ifdef RTE_MACHINE_CPUFLAG_AVX2
> + return (*efd_lookup_internal_avx2_ptr)(group_hash_idx,
> +group_lookup_table,
> +hash_val_a, hash_val_b);

I don't think you need all that.
All you need is to build a proper AVX2 function even if the current HW doesn't support it.
The existing runtime selection here seems ok already.
Konstantin

> +}
> +
> +#ifdef CC_SUPPORT_AVX2
> +static inline efd_value_t
> +efd_lookup_internal_avx2_AVX2(const efd_hashfunc_t *group_hash_idx,
> + const efd_lookuptbl_t *group_lookup_table,
> + const uint32_t hash_val_a, const uint32_t hash_val_b)
> +{
>   efd_value_t value = 0;
>   uint32_t i = 0;
>   __m256i vhash_val_a = _mm256_set1_epi32(hash_val_a);
> @@ -74,13 +91,31 @@ efd_lookup_internal_avx2(const efd_hashfunc_t 
> *group_hash_idx,
>   }
> 
>   return value;
> -#else
> +}
> +#endif
> +
> +static inline efd_value_t
> +efd_lookup_internal_avx2_DEFAULT(const efd_hashfunc_t *group_hash_idx,
> + const efd_lookuptbl_t *group_lookup_table,
> + const uint32_t hash_val_a, const uint32_t hash_val_b)
> +{
>   RTE_SET_USED(group_hash_idx);
>   RTE_SET_USED(group_lookup_table);
>   RTE_SET_USED(hash_val_a);
>   RTE_SET_USED(hash_val_b);
>   /* Return dummy value, only to avoid compilation breakage */
>   return 0;
> -#endif
> +}
> 
> +static void __attribute__((constructor))
> +rte_efd_x86_init(void)
> +{
> +#ifdef CC_SUPPORT_AVX2
> + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> + efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_AVX2;
> + else
> + efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
> +#else
> + efd_lookup_internal_avx2_ptr = efd_lookup_internal_avx2_DEFAULT;
> +#endif
>  }
> --
> 2.7.4



Re: [dpdk-dev] [PATCH] igb_uio: remove PCI reset during uio device open

2017-10-02 Thread Shijith Thotton
On Fri, Sep 29, 2017 at 12:57:22PM +, Wu, Jingjing wrote:
> Hi, Shijith
> 
> Only removing the PCI reset in uio device open function is not enough.
> 
> We faced an issue like:
> 
> 1. Here is a FVL NIC, generate VF on one port, and then pass-through the VF 
> by vfio-pci to VM:
> For example:
> echo 1 > /sys/bus/pci/devices/\:07\:00.1/sriov_numvfs
> modprobe vfio-pci
> echo "8086 154c" > /sys/bus/pci/drivers/vfio-pci/new_id
> echo :07:0a.0 > /sys/bus/pci/devices/\:07\:0a.0/driver/unbind
> echo :07:0a.0 > /sys/bus/pci/drivers/vfio-pci/bind
> 
> 2. Start the VM (with QEMU), and in the VM, bind the passthrough VF to the
> igb_uio driver
> 3.Check the MSIX status of that VF, you can see the MSIX is enabled both in 
> guest and host.
> For example:
> root@ubuntu-4:~ # lspci -vv -s 00:04.0 | grep MSI
> Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 
> [root@dpdk2]# lspci -vv -s 07:0a.0 | grep MSI
> Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 
> 4. start dpdk example (e.g. testpmd)
> 5. quit the dpdk example
> 6. Check the MSIX status of that VF, you can see the MSIX is enabled in 
> Guest, but disabled on host
> 
> Such like:
> root@ubuntu-4:~ # lspci -vv -s 00:04.0 | grep MSI
> Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 
> [root@dpdk2 dpdk.org]# lspci -vv -s 07:0a.0 | grep MSI
>Capabilities: [70] MSI-X: Enable- Count=5 Masked-
> Capabilities: [a0] Express (v2) Endpoint, MSI 00
> 
> 7. if restart dpdk application again, DPDK in VM cannot get any interrupts on 
> that VF.
> 
> 
> After investigating, I found that current Qemu does not handle
> pci_reset_function well if MSI-X is enabled on that VF, because when
> pci_reset_function is used to reset the VF inside the VM, Qemu captures
> the control register reads and writes.
> 
> pci_reset_function first reads the PCI configuration, sets the FLR reset,
> and then writes the PCI configuration back to restore it. But not all of
> those writes reach the host.
> If you look into the vfio-pci driver, you will find that the read/write
> functions differ per PCI CAP ID. The PCI MSI-X capability cannot be written
> through to the host VF. I think that is because vfio already provides ioctl
> ops to deal with the MSI-X cap.
> 
> So I think it is a common issue, not only for intel NICs.
> 
> There may be some ways to fix that:
> 
> 1. fix Qemu to capture the FLR writing, and sync the Qemu's status on MSIX.
> 2. revert the patch in DPDK which introduced "pci_reset_function".
> 3. move the pci_reset_function from open/release func to igb_uio probe/remove 
> func.
> 4. move the enable/disable MSIX from probe/remove to open/release func.
> 
> Any opinions?
> 

Hi Jingjing,

Thanks for finding the root cause. I am in favor of reverting the patch (as
there is a chance of further issues in future), even though option 4 can fix
the issue on both sides. If there is no expert opinion on this, please proceed
with the best option.

Shijith

> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Shijith Thotton
> > Sent: Tuesday, September 19, 2017 6:24 PM
> > To: dev@dpdk.org
> > Cc: Yigit, Ferruh ; Thomas Monjalon 
> > ;
> > Yang, Qiming ; Patil, Harish 
> > ; Zhang,
> > Helin ; Gregory Etelson ; Tan, 
> > Jianfeng
> > ; Hu, Xuekun ; Li, Xiaoyun
> > ; Thotton, Shijith ;
> > sta...@dpdk.org
> > Subject: [dpdk-dev] [PATCH] igb_uio: remove PCI reset during uio device open
> > 
> > Issuing reset during uio device open caused PMD init failure for some
> > NIC VFs (i40, ixgbe, qede) in host. So this initial reset is removed.
> > Bus master enable is kept as part of open since we disable it in uio
> > device release.
> > 
> > Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device 
> > file")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Shijith Thotton 
> > ---
> >  lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > index 07a19a3..a6c2996 100644
> > --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
> > @@ -179,9 +179,7 @@ struct rte_uio_pci_dev {
> > struct rte_uio_pci_dev *udev = info->priv;
> > struct pci_dev *dev = udev->pdev;
> > 
> > -   pci_reset_function(dev);
> > -
> > -   /* set bus master, which was cleared by the reset function */
> > +   /* enable bus mastering on the device */
> > pci_set_master(dev);
> > 
> > return 0;
> > --
> > 1.8.3.1
> 


Re: [dpdk-dev] [PATCH 1/3] ethdev: add Rx HW timestamp capability

2017-10-02 Thread Ferruh Yigit
On 10/2/2017 3:50 PM, Raslan Darawsheh wrote:
> Add a new offload capability flag for Rx HW
> timestamp and enabling/disabling this via rte_eth_rxmode.
> 
> Signed-off-by: Raslan Darawsheh 
> Acked-by: Yongseok Koh 

Hi Raslan,

Is this v4? Two versions were sent today without version
information, which is confusing.

Can you please send the latest version again with version information (v4 or
v5) so it will be possible to figure out which is the latest?

Thanks,
ferruh


Re: [dpdk-dev] [PATCH] eal: add doc for constructor macros

2017-10-02 Thread Ferruh Yigit
On 10/2/2017 3:59 PM, Thomas Monjalon wrote:
> It is a reminder that the constructors without priority
> get the lowest priority.
> 
> Signed-off-by: Thomas Monjalon 

Reviewed-by: Ferruh Yigit 
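
For illustration only, a minimal sketch of the ordering this reminder refers to,
assuming GCC or Clang with a GNU-style linker. The function names and the
priorities 101/102 are hypothetical and are not the DPDK macros touched by the
patch. Constructors with an explicit priority run first (lower number first);
the one without a priority runs last.

```c
#include <assert.h>
#include <stddef.h>

/* Records the order in which the constructors below ran.
 * Static storage is zero-initialized, so the buffer stays NUL-terminated. */
static char ctor_order[4];
static size_t ctor_count;

static void record(char tag)
{
	assert(ctor_count < sizeof(ctor_order) - 1);
	ctor_order[ctor_count++] = tag;
}

/* No explicit priority: runs after every prioritized constructor. */
__attribute__((constructor))
static void init_default(void)
{
	record('D');
}

/* Priority 102 runs after priority 101 (lower number runs first). */
__attribute__((constructor(102)))
static void init_second(void)
{
	record('2');
}

__attribute__((constructor(101)))
static void init_first(void)
{
	record('1');
}

/* Returns the recorded order as a NUL-terminated string. */
const char *ctor_order_get(void)
{
	return ctor_order;
}
```

Priorities 0 to 100 are reserved for the implementation, which is why the
sketch starts at 101.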


[dpdk-dev] Inter-PMD dependencies when building shared libraries

2017-10-02 Thread Eads, Gage
I believe I've spotted an issue in the way inter-PMD dependencies are handled 
when building shared libraries. The depdirs_rule in mk/rte.subdir.mk relies on 
DEPDIRS-xyz containing the names of subdirectories that xyz depends on. In 
mk/rte.lib.mk, these DEPDIRS are converted into LDLIBS. This works when the 
subdirectory names match the library names (i.e. any of the libraries under 
lib/). However when the dependency is on a PMD, the subdirectory and library 
names don't match.

This is a problem, for example, in a patch for the net/octeontx PMD, which has 
a dependency on the event/octeontx PMD: 
http://dpdk.org/ml/archives/dev/2017-August/073983.html

I've reproduced this with a contrived example, by making the failsafe PMD 
depend on the NULL PMD in drivers/net/Makefile:
-DEPDIRS-failsafe = $(core-libs)
+DEPDIRS-failsafe = $(core-libs) librte_pmd_null

You can reproduce the build failure by running this command:
./devtools/test-build.sh x86_64-native-linuxapp-gcc+CONFIG_RTE_BUILD_SHARED_LIB

I'm no expert on DPDK's dependency handling code, but one option is to modify 
rte.lib.mk like so:
-LDLIBS += $(subst lib,-l,$(_LDDIRS))
+LDLIBS += $(subst lib,-l,$(filter lib%,$(_LDDIRS)))

Then you could put the PMD's directory name in DEPDIRs, and specify the 
depended-on library in the PMD's LDLIBS (as is done in the aforementioned 
net/octeontx PMD).

Thoughts?

Thanks,
Gage


Re: [dpdk-dev] [PATCH v4 00/24] bnxt patchset

2017-10-02 Thread Ferruh Yigit
On 9/28/2017 10:43 PM, Ajit Khaparde wrote:
> This patch set includes some bug fixes and also adds
> support for new dev_ops like rx_queue_count,
> rx/tx_descriptor_status, get/set_eeprom
> and rx_queue_intr_enable/disable.
> It also adds support for the flow_filter function to add
> Flow API functionality.
> 
> Please apply.
> 
> Ajit Khaparde (22):
>   net/bnxt: fix HWRM_*() macros and locking
>   net/bnxt: use 64-bits of address for vlan_table
>   net/bnxt: fix an issue with group id calculation
>   net/bnxt: fix calculation of number of pools
>   net/bnxt: handle multi queue mode properly
>   net/bnxt: fix rx handling and buffer allocation logic
>   net/bnxt: fix an issue with broadcast traffic
>   net/bnxt: fix usage of ETH_VMDQ_* flags
>   net/bnxt: set checksum offload flags correctly
>   net/bnxt: update status of Rx IP/L4 CKSUM
>   net/bnxt: add support for xstats get by id
>   net/bnxt: fix config rss update
>   net/bnxt: set the hash_key_size
>   net/bnxt: add support for rx_queue_count
>   net/bnxt: add support for rx_descriptor_status
>   net/bnxt: add support for tx_descriptor_status
>   net/bnxt: add new HWRM structs to support flow filtering
>   net/bnxt: add support for flow filter ops
>   doc: update release notes
>   net/bnxt: fix per queue stats display in xstats
>   net/bnxt: prevent interrupt handler from accessing freed memory
>   net/bnxt: add dev_supported_ptypes_get dev_op
> 
> Somnath Kotur (2):
>   net/bnxt: add support for get/set EEPROM
>   net/bnxt: add support for rx_queue_intr_enable/disable APIs

Series applied to dpdk-next-net/master, thanks.

Welcome Somnath!

(I have updated some patches to update bnxt.ini file, can you please
confirm final bnxt.ini file)


Re: [dpdk-dev] [PATCH] mk: add silvermont to replace atom as a target

2017-10-02 Thread Ferruh Yigit
On 9/8/2017 10:07 AM, Bruce Richardson wrote:
> On Fri, Sep 08, 2017 at 11:28:52AM +0800, Xiaoyun Li wrote:
>> The -march=atom flag is for older Atom CPUs and doesn't support SSE4, which
>> is the minimum requirement for DPDK. And in fact, the current Atom CPUs
>> support SSE4. So this patch removes atom as a target for DPDK builds and
>> adds a silvermont replacement instead.
>>
>> Signed-off-by: Xiaoyun Li 

<...>

> Acked-by: Bruce Richardson 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v2] net/vmxnet3: fix dereference before null check

2017-10-02 Thread Ferruh Yigit
On 9/29/2017 2:04 PM, Michal Jastrzebski wrote:
> Coverity reports check_after_deref:
> Null-checking rq suggests that it may be null, but it
> has already been dereferenced on all paths leading to
> the check.
> This patch removes NULL checking of "rq" from function
> vmxnet3_dev_rx_queue_reset as it is already checked against NULL
> one level up the callstack (function vmxnet3_dev_clear_queues).
> 
> Coverity issue: 143468
> Fixes: 5aecdc17a97d ("vmxnet3: fix stop/restart")
> Cc: yongw...@vmware.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Tomasz Kulasek 
> Signed-off-by: Michal Jastrzebski 

Reviewed-by: Ferruh Yigit 
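
For readers unfamiliar with the Coverity check_after_deref report, a minimal
sketch of the shape of the fix follows. The struct and function names are
hypothetical stand-ins, not the vmxnet3 code: the caller filters out NULL
queues, so the callee dereferences without a redundant check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Rx queue; not the vmxnet3 structure. */
struct demo_rxq {
	unsigned int next2fill;
	unsigned int next2comp;
};

/* Callee: the caller guarantees rq is non-NULL, so there is no NULL check
 * here. A check placed after the dereference is what Coverity flags. */
static void demo_rxq_reset(struct demo_rxq *rq)
{
	rq->next2fill = 0;
	rq->next2comp = 0;
}

/* Caller: the single place where NULL is checked, one level up.
 * Returns the number of queues that were reset. */
unsigned int demo_clear_queues(struct demo_rxq **rxqs, unsigned int nb)
{
	unsigned int i, done = 0;

	assert(rxqs != NULL);
	for (i = 0; i < nb; i++) {
		if (rxqs[i] != NULL) {
			demo_rxq_reset(rxqs[i]);
			done++;
		}
	}
	return done;
}
```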


Re: [dpdk-dev] [dpdk-stable] [PATCH v2] net/vmxnet3: fix dereference before null check

2017-10-02 Thread Ferruh Yigit
On 10/2/2017 10:39 PM, Ferruh Yigit wrote:
> On 9/29/2017 2:04 PM, Michal Jastrzebski wrote:
>> Coverity reports check_after_deref:
>> Null-checking rq suggests that it may be null, but it
>> has already been dereferenced on all paths leading to
>> the check.
>> This patch removes NULL checking of "rq" from function
>> vmxnet3_dev_rx_queue_reset as it is already checked against NULL
>> one level up the callstack (function vmxnet3_dev_clear_queues).
>>
>> Coverity issue: 143468
>> Fixes: 5aecdc17a97d ("vmxnet3: fix stop/restart")
>> Cc: yongw...@vmware.com
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Tomasz Kulasek 
>> Signed-off-by: Michal Jastrzebski 
> 
> Reviewed-by: Ferruh Yigit 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH v6 00/40] Introduce NXP DPAA Bus, Mempool and PMD

2017-10-02 Thread Ferruh Yigit
On 9/28/2017 1:29 PM, Shreyansh Jain wrote:
> Change Log:
> 
> 
> v6:
>  - rebased over net-next/master (9d660ac) 
>  - fixed mk/rte.app.mk (Thomas's comment). It had incorrect
>style of adding library linking
>  - changed from manual memcpy of etheraddr to ether_addr_copy
>as suggested by Ferruh
>  (these were minor changes missed in v5)
> 
>  v5:
>  - rebased over net-next/master (9d660ac) 
>  - restructuring debugging macros. Removed a few and combined
>others. DPAA now reflects the dynamic logging with segregated
>DP logging
>  - updated documentation for missing configuration option
>  - fixed map file; shared build was broken earlier
>  - other minor fixes from review comments
> 
> v4:
>  - Some checkpatch fixes which were reported by checkpatch@dpdk
>  - adding extra stats feature patch (patch 41)
> 
> v3:
>  - Rebasing over 17.11-rc0 (85238f50)
>  - Checkpatch fixes
>(There are still 2 errors which I think are false positives)
>  - Implement rte_bus.find_device() interface
>  - Various other minor updates/cleanups
> 
> v2:
>  - Fixing various comments from Ferruh, but broadly:
>   -) Logging has been changed to reflect rte_log_register
>   -) Logs across Bus, Mempool and PMD updated
>   -) fixed incorrect feature claimed in dpaa.ini
>  - Removed 24/40/48 bit swapping macro from EAL.
>These are defined in dpaa/bus now (compat.h)
>  - Added missing memory cleanup operation
>  - Updated documentation with some missing information
> 
> Introduction
> 
> 
> RFC was posted here -> [R3]
> V5 was posted here  -> [R8]
> 
> This patch series adds NXP's QorIQ-Layerscape DPAA Architecture based
> bus driver, mempool driver and PMD. This version of driver supports NXP
> LS1043A/LS1023A, LS1046A/LS1026A family of network SoCs. [R1]
> 
> DPAA, or Datapath Acceleration Architecture [R2], is a set of hardware
> components designed for high-speed network packet processing. This
> architecture provides the infrastructure to support simplified sharing of
> networking interfaces and accelerators by multiple CPU cores, and the
> accelerators themselves.
> 
> This patchset introduces the following:
> 1. DPAA Bus (drivers/bus/dpaa)
>  The core of DPAA bus is implemented using 3 main hardware blocks: QMan,
>  or Queue Manager; BMan, or Buffer Manager and FMan, or Frame Manager.
>  The patches introduce necessary layers to expose the DPAA hardware
>  blocks for interfacing with RTE framework.
> 
> 2. DPAA Mempool (drivers/mempool/dpaa)
>  BMan, or Buffer Manager, block of DPAA features a hardware offloaded
>  mempool. These patches add support for a driver to manage the BMan
>  block. This driver allows for mempool creation, deletion, buffer
>  acquire and release, as per the RTE APIs.
> 
> 3. DPAA PMD (drivers/net/dpaa)
>  The Poll Mode Driver for DPAA NIC Interfaces.
> 
> Patch Layout
> 
> 
> 01: Add DPAA SoC build configuration
> 02~16: Add DPAA Bus support and features, incrementally
> 17: Add Documentation
> 18~21: Add DPAA Mempool support
> 22~40: Add PMD and its various features, incrementally
> 
> References
> ==
> 
> [R1] 
> http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/qoriq-layerscape-arm-processors:QORIQ-ARM
> [R2] http://www.nxp.com/assets/documents/data/en/white-papers/QORIQDPAAWP.pdf
> [R3] RFC: http://dpdk.org/ml/archives/dev/2017-May/066675.html
> [R4] v1: http://dpdk.org/ml/archives/dev/2017-June/068020.html
> [R5] v2: http://dpdk.org/ml/archives/dev/2017-July/070113.html
> [R6] v3: http://dpdk.org/ml/archives/dev/2017-August/073269.html
> [R7] v4: http://dpdk.org/ml/archives/dev/2017-September/074936.html
> [R8] v5: http://dpdk.org/dev/patchwork/patch/29245/
> 
> Hemant Agrawal (3):
>   bus/dpaa: add compatibility and helper macros
>   net/dpaa: support firmware version get API
>   net/dpaa: support extended statistics
> 
> Shreyansh Jain (37):
>   config: add NXP DPAA SoC build configuration
>   bus/dpaa: introduce NXP DPAA Bus driver skeleton
>   bus/dpaa: add OF parser for device scanning
>   bus/dpaa: introducing FMan configurations
>   bus/dpaa: add FMan hardware operations
>   bus/dpaa: enable DPAA IOCTL portal driver
>   bus/dpaa: add layer for interrupt emulation using pthread
>   bus/dpaa: add routines for managing a RB tree
>   bus/dpaa: add QMAN interface driver
>   bus/dpaa: add QMan driver core routines
>   bus/dpaa: add BMAN driver core
>   bus/dpaa: support FMAN frame queue lookup
>   bus/dpaa: add BMan hardware interfaces
>   bus/dpaa: add fman flow control threshold setting
>   bus/dpaa: integrate DPAA Bus with hardware blocks
>   doc: add NXP DPAA PMD documentation
>   bus/dpaa: add DPAA mempool logging macros
>   mempool/dpaa: support NXP DPAA Mempool
>   config: enable compilation of DPAA Mempool driver
>   bus/dpaa: add DPAA PMD logging macros
>   net/dpaa: add NXP DPAA PMD driver skeleton
>   config: enable NXP DPAA PMD compilation
>   net/dpaa: support Tx and Rx queue setup
>   net/dp

Re: [dpdk-dev] [PATCH 1/2] bus/dpaa: fix incorrect ccsr mem allocation

2017-10-02 Thread Ferruh Yigit
On 9/28/2017 3:10 PM, Shreyansh Jain wrote:
> Fixes: 5ad2d123be48 ("bus/dpaa: introducing FMan configurations")
> 
> Signed-off-by: Shreyansh Jain 

Series squashed into relevant commit in next-net, thanks.


Re: [dpdk-dev] [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy

2017-10-02 Thread Li, Xiaoyun
Hi

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Tuesday, October 3, 2017 00:39
> To: Li, Xiaoyun ; Richardson, Bruce
> 
> Cc: Lu, Wenzhuo ; Zhang, Helin
> ; dev@dpdk.org
> Subject: RE: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> 
> 
> 
> > -Original Message-
> > From: Li, Xiaoyun
> > Sent: Monday, October 2, 2017 5:13 PM
> > To: Ananyev, Konstantin ; Richardson,
> Bruce 
> > Cc: Lu, Wenzhuo ; Zhang, Helin
> ; dev@dpdk.org; Li, Xiaoyun 
> > Subject: [PATCH v4 1/3] eal/x86: run-time dispatch over memcpy
> >
> > This patch dynamically selects memcpy functions at run-time based
> > on the CPU flags that the current machine supports. This patch uses function
> > pointers which are bound to the respective functions at constructor time.
> > In addition, the AVX512 instruction set is compiled only if the user
> > enables it in the config and the compiler supports it.
> >
> > Signed-off-by: Xiaoyun Li 
> > ---
> > v2
> > * Use gcc function multi-versioning to avoid compilation issues.
> > * Add macros for AVX512 and AVX2. Only if users enable AVX512 and the
> > compiler supports it, the AVX512 codes would be compiled. Only if the
> > compiler supports AVX2, the AVX2 codes would be compiled.
> >
> > v3
> > * Reduce function calls by keeping only rte_memcpy_xxx.
> > * Add conditions so that when the copy size is small, the inline code path
> > is used. Otherwise, the dynamic code path is used.
> > * To support attribute target, clang version must be greater than 3.7.
> > Otherwise, the SSE/AVX code path is chosen, the same as before.
> > * Move two macro functions to the top of the code since they are
> > used in both inline SSE/AVX and dynamic SSE/AVX code.
> >
> > v4
> > * Modify rte_memcpy.h to several .c files and modify makefiles to compile
> > AVX2 and AVX512 files.
> 
> Could you explain to me why, instead of reusing the existing rte_memcpy() code
> to generate _sse/_avx2/_avx512f flavors, you keep pushing changes with 3
> separate implementations?
> Obviously that is much more expensive in terms of maintenance and doesn't
> look like
> a feasible solution to me.
> Is the existing rte_memcpy() implementation not good enough in terms of
> functionality and/or performance?
> If so, can you outline these problems and try to fix them first?
> Konstantin
> 

I just merged many small functions into one function in each of those 3
separate implementations.
This is because the existing code is entirely inline, including rte_memcpy()
itself, so the compiler turns every rte_memcpy() call into basic
instructions like xmm0=xxx.

The existing code is fine when used inline that way. But with run-time
dispatch, it would introduce lots of function calls
and cause a perf drop.


Best Regards,
Xiaoyun Li
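
For readers following the thread, a minimal sketch of the dispatch scheme the
patch description outlines: a function pointer bound at constructor time from
CPU flags, with small copies kept on an inline path. This is not the patch
code. All names and the 128-byte threshold are assumptions, the "avx2" variant
is a plain memcpy stand-in, and the detection relies on the GCC/Clang
__builtin_cpu_supports() builtin on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define SKETCH_X86_DETECT 1
#endif

typedef void *(*sketch_copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant, always available. */
static void *sketch_copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

#ifdef SKETCH_X86_DETECT
/* Stand-in for a variant that would be built with AVX2 enabled. */
static void *sketch_copy_avx2(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}
#endif

/* Bound once at load time; defaults to the baseline variant. */
static sketch_copy_fn sketch_copy_impl = sketch_copy_generic;

__attribute__((constructor))
static void sketch_copy_select(void)
{
#ifdef SKETCH_X86_DETECT
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx2"))
		sketch_copy_impl = sketch_copy_avx2;
#endif
}

/* Assumed threshold: copies at or below it stay on the inline path. */
#define SKETCH_INLINE_MAX 128

static inline void *sketch_copy(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	if (n <= SKETCH_INLINE_MAX)
		return memcpy(dst, src, n);      /* no indirect call */
	return sketch_copy_impl(dst, src, n);    /* one indirect call */
}
```

The point of the single wrapper is that a large copy pays for exactly one
indirect call, rather than one per small helper.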

 


Re: [dpdk-dev] [PATCH v3] net/bonding: support bifurcated driver in eal cli using --vdev

2017-10-02 Thread Ferruh Yigit
On 10/2/2017 12:06 PM, Doherty, Declan wrote:
> On 20/09/2017 7:04 PM, Gowrishankar wrote:
>> From: Gowrishankar Muthukrishnan 
>>
>> At present, creating bonding devices using --vdev is broken for PMDs like
>> mlx5, as it is neither UIO nor VFIO based and hence the PMD driver is unknown
>> to find_port_id_by_pci_addr(), as shown below.
>>
>> testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'
>>
>> PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
>>   () specified
>> EAL: Failed to parse slave ports for bonded device net_bonding0
>>
>> This patch fixes parsing PCI ID from bonding device params by verifying
>> it in RTE PCI bus, rather than checking dev->kdrv.
>>
>> Fixes: eac901ce ("ethdev: decouple from PCI device")
>> Signed-off-by: Gowrishankar Muthukrishnan 
>> ---
<...>
> Acked-by: Declan Doherty 

Reviewed-by: Gaetan Rivet 

Applied to dpdk-next-net/master, thanks.

(Gaetan, we may ask your help during merge on this.)

