Re: [dpdk-dev] [PATCH 12/21] vhost: introduce guest IOVA to backend VA helper

2017-09-05 Thread Maxime Coquelin



On 09/05/2017 06:14 AM, Tiwei Bie wrote:

On Thu, Aug 31, 2017 at 11:50:14AM +0200, Maxime Coquelin wrote:

This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual addresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If the translation is missing from the IOTLB cache, an
IOTLB_MISS request is sent to QEMU, and the IOTLB cache is queried
again upon IOTLB event notification.

When IOMMU is disabled, the passed address is a guest physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.
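
For illustration, a minimal sketch of that lookup flow, assuming the IOTLB
cache and pending-list helpers introduced elsewhere in this series (names may
differ from the final code):

/*
 * Hedged sketch only: translate a guest IOVA to a backend VA.
 * Returns 0 when the translation is not (yet) available.
 */
static uint64_t
iova_to_vva_sketch(struct virtio_net *dev, struct vhost_virtqueue *vq,
		uint64_t iova, uint64_t size, uint8_t perm)
{
	uint64_t vva;

	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
		/* No vIOMMU: the address is a guest physical address. */
		return rte_vhost_gpa_to_vva(dev->mem, iova);

	vva = vhost_user_iotlb_cache_find(vq, iova, &size, perm);
	if (vva)
		return vva;

	/* Cache miss: record it and ask QEMU for the translation. */
	if (!vhost_user_iotlb_pending_miss(vq, iova, perm)) {
		vhost_user_iotlb_pending_insert(vq, iova, perm);
		vhost_user_iotlb_miss(dev, iova, perm);
	}

	return 0;
}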

Signed-off-by: Maxime Coquelin 
---
  lib/librte_vhost/vhost.c | 27 +++
  lib/librte_vhost/vhost.h |  3 +++
  2 files changed, 30 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index bae98b02d..0e8c0386a 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -48,9 +48,11 @@
  #include 
  #include 
  #include 
+#include 


This header isn't needed.

Right, I'll remove it.

Thanks,
Maxime


Best regards,
Tiwei Bie



Re: [dpdk-dev] [RFC PATCH 0/4] ethdev new offloads API

2017-09-05 Thread Jerin Jacob
-Original Message-
> Date: Mon, 28 Aug 2017 12:57:13 +0200
> From: Thomas Monjalon 
> To: Jerin Jacob 
> Cc: dev@dpdk.org, Shahaf Shuler 
> Subject: Re: [dpdk-dev] [RFC PATCH 0/4] ethdev new offloads API
> 
> 28/08/2017 07:00, Jerin Jacob:
> > From: Shahaf Shuler 
> > > Friday, August 25, 2017 1:32 PM, Jerin Jacob:
> > > > >
> > > > > The new API does not have an equivalent for the below Tx flags:
> > > > >
> > > > > * ETH_TXQ_FLAGS_NOREFCOUNT
> > > > > * ETH_TXQ_FLAGS_NOMULTMEMP
> > > > 
> > > > IMO, it makes sense to keep those flags as a PMD optimization if an
> > > > application does not need reference counting or multiple mempools.
> > > > As an example, a non-trivial application like l3fwd does not need
> > > > either of them.
> > > 
> > > The l3fwd application is yet another simple example from the DPDK tree.
> > > I am not sure that a complete vRouter/vSwitch implementation has the same
> > > characteristics.
> > 
> > But not all DPDK applications are complete vRouter/vSwitch implementations.
> > 
> > > Moreover, I think the fact that there is an application able to use it
> > > is not enough. IMO there needs to be some basic functionality always
> > > provided by the PMDs and not controlled by flags.
> > > For example, let's say we have an application which always sends the
> > > mbufs with the same ol_flags, or even with the same length.
> > 
> > Does ETH_TXQ_FLAGS_NOREFCOUNT come in the same category as mbuf->ol_flags?
> > 
> > > Will it make sense to add more flags to control it?
> > > Will it make sense to run an RFC2544 benchmark with testpmd io forwarding
> > > with those flags?
> > > 
> > > If the answer is yes, maybe those flags (and others to follow) belong in
> > > a different location in ethdev. However, for sure they are not offloads.
> > 
> > I am not sure about the reason for opting out the mempool-related flags.
> > In the context of HW-assisted external mempool managers, enabling
> > reference counting is an offload from the Ethernet device.
> > For example, with an external HW-assisted mempool, the ethdev driver needs
> > a different way of forming the TXQ descriptor if reference counting is
> > enabled (as, in the case of HW-assisted mempool managers, by default the
> > HW frees the packet on send).
> > 
> > I am fine with moving the flags somewhere else if that makes sense to you.
> > But from a PMD optimization perspective, I think it is important to have
> > these flags.
> 
> Why not.

> We can have a function to enable such optimizations.

OK

> However I am not sure ethdev is the right place as these hints apply
> to any mbuf.

I think we are talking about the mbuf behavior when working with an ethdev TXQ
here, right? IMO, it makes sense in the ethdev layer.

We pulled in the ETH_TXQ_FLAGS_NOMULTSEGS flag for the rework even though it
is related to the mbuf. If you think ETH_TXQ_FLAGS_NOREFCOUNT and
ETH_TXQ_FLAGS_NOMULTMEMP should NOT be per-TXQ configurable flags, then they
can be moved to port-level TXQ configuration at

function: rte_eth_dev_configure()
struct rte_eth_conf::struct rte_eth_txmode:
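
As an illustration only, such a port-level knob could look roughly like the
sketch below; the no_refcount and single_mempool field names are hypothetical
and do not exist in ethdev today:

/* Hypothetical sketch: per-port TX hints given at configure time.
 * The two txmode fields below are illustrative only. */
struct rte_eth_conf conf = {
	.txmode = {
		.no_refcount = 1,    /* app never holds extra mbuf references */
		.single_mempool = 1, /* all TX mbufs come from one mempool */
	},
};

rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues, &conf);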

I think it is important for the ARM64 architecture to optimize for
applications, like l3fwd, that do not need reference counting and use a
single-pool configuration.

Reasons:
- NPU-class ethdev hardware is tightly coupled with external mempool ops
  with HW offload, and there is provision to utilize these features.
- From a general-purpose arm64 perspective, at least for low-end systems,
  the cache hierarchy is quite different from x86, so it is costly to
  dereference the mbuf area (which stores the mempool handler) after the
  Tx free. We can support both use cases; it just needs to be configurable
  by the application through the flags.

> Please Jerin, could you work on moving these settings in a new API?

Sure. Once the generic code is in place, we are committed to fixing the
PMDs by 18.02.

Jerin


Re: [dpdk-dev] [PATCH 08/21] vhost: iotlb: add pending miss request list and helpers

2017-09-05 Thread Tiwei Bie
On Thu, Aug 31, 2017 at 11:50:10AM +0200, Maxime Coquelin wrote:
> In order to be able to handle other ports or queues while waiting
> for an IOTLB miss reply, a pending list is created so that the waiter
> can return and retry later by sending the miss request again.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/librte_vhost/iotlb.c | 88 ++--
>  lib/librte_vhost/iotlb.h |  4 +++
>  lib/librte_vhost/vhost.h |  1 +
>  3 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
> index 1b739dae5..d014bfe98 100644
> --- a/lib/librte_vhost/iotlb.c
> +++ b/lib/librte_vhost/iotlb.c
> @@ -49,7 +49,86 @@ struct vhost_iotlb_entry {
>   uint8_t perm;
>  };
>  
> -#define IOTLB_CACHE_SIZE 1024
> +#define IOTLB_CACHE_SIZE 2048
> +
> +static void vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
> +{
> + struct vhost_iotlb_entry *node, *temp_node;
> +
> + rte_rwlock_write_lock(&vq->iotlb_lock);
> +
> + TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
> + TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
> + rte_mempool_put(vq->iotlb_pool, node);
> + }
> +
> + rte_rwlock_write_unlock(&vq->iotlb_lock);
> +}
> +
> +int vhost_user_iotlb_pending_miss(struct vhost_virtqueue *vq, uint64_t iova,
> + uint8_t perm)
> +{
> + struct vhost_iotlb_entry *node;
> + int found = 0;
> +

The return value of this function is a boolean, so it's better
to return bool instead of int.

> + rte_rwlock_read_lock(&vq->iotlb_lock);
> +
> + TAILQ_FOREACH(node, &vq->iotlb_pending_list, next) {
> + if ((node->iova == iova) && (node->perm == perm)) {
> + found = 1;
> + break;
> + }
> + }
> +
> + rte_rwlock_read_unlock(&vq->iotlb_lock);
> +
> + return found;
> +}
> +
> +void vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
> + uint64_t iova, uint8_t perm)
> +{
> + struct vhost_iotlb_entry *node;
> + int ret;
> +
> + ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
> + if (ret) {
> + RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool empty, invalidate cache\n");

I think the log level should be INFO or the like, not ERR.

> + vhost_user_iotlb_pending_remove_all(vq);
> + ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
> + if (ret) {
> + RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
> + return;
> + }
> + }
> +
> + node->iova = iova;
> + node->perm = perm;
> +
> + rte_rwlock_write_lock(&vq->iotlb_lock);
> +
> + TAILQ_INSERT_TAIL(&vq->iotlb_pending_list, node, next);
> +
> + rte_rwlock_write_unlock(&vq->iotlb_lock);
> +}
> +
> +static void vhost_user_iotlb_pending_remove(struct vhost_virtqueue *vq,
> + uint64_t iova, uint64_t size, uint8_t perm)
> +{
> + struct vhost_iotlb_entry *node, *temp_node;
> +
> + /* .iotlb_lock already locked by the caller */
> + TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list, next, temp_node) {
> + if (node->iova < iova)
> + continue;
> + if (node->iova >= iova + size)
> + continue;
> + if ((node->perm & perm) != node->perm)
> + continue;
> + TAILQ_REMOVE(&vq->iotlb_pending_list, node, next);
> + rte_mempool_put(vq->iotlb_pool, node);
> + }
> +}
>  
>  static void vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
>  {
> @@ -106,7 +185,10 @@ void vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
>   TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
>  
>  unlock:
> + vhost_user_iotlb_pending_remove(vq, iova, size, perm);
> +
>   rte_rwlock_write_unlock(&vq->iotlb_lock);
> +

This empty line should be removed.

Best regards,
Tiwei Bie

>  }
>  
>  void vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,


Re: [dpdk-dev] [PATCH v2] librte_mbuf: modify port initialization value

2017-09-05 Thread Thomas Monjalon
05/09/2017 07:13, Zhiyong Yang:
> In order to support more than 256 virtual ports, the field "port"
> in rte_mbuf has been increased to 16 bits. The initialization/reset
> value of the field "port" should be changed from 0xff to 0xffff
> accordingly.

This patch should be merged with the range increase.


Re: [dpdk-dev] [PATCH 5/6] eal: remove xen dom0 support

2017-09-05 Thread Thomas Monjalon
05/09/2017 05:41, Tan, Jianfeng:
> From: Richardson, Bruce
> > 
> > Reading the contributors guide section on ABI, specifically
> > http://dpdk.org/doc/guides/contributing/versioning.html#deprecating-an-
> > entire-abi-version
> > it seems like we should collapse down the versions to a single one
> > following the function removal, and also increment the whole library's
> > .so version.
> 
> So for lib/librte_eal/linuxapp/eal/rte_eal_version.map, we should change it
> in the below way?
> 
> DPDK_2.1 {
> {APIs in DPDK_2.0 except xen APIs}
> ...
> };
> 
> DPDK_16.04 {
> {APIs in DPDK_2.1 except xen APIs}
> ...
> } DPDK_2.1;

No, you don't need to collapse. You can just remove Xen functions.



Re: [dpdk-dev] [PATCH v3 0/2] Dynamically configure mempool handle

2017-09-05 Thread Olivier MATZ
On Mon, Sep 04, 2017 at 03:24:38PM +0100, Sergio Gonzalez Monroy wrote:
> Hi Olivier,
> 
> On 04/09/2017 14:34, Olivier MATZ wrote:
> > Hi Sergio,
> > 
> > On Mon, Sep 04, 2017 at 10:41:56AM +0100, Sergio Gonzalez Monroy wrote:
> > > On 15/08/2017 09:07, Santosh Shukla wrote:
> > > > * Application programming sequencing would be
> > > >   char pref_mempool[RTE_MEMPOOL_OPS_NAMESIZE];
> > > >   rte_eth_dev_get_preferred_pool_ops(ethdev_port_id, pref_mempool 
> > > > /* out */);
> > > >   rte_mempool_create_empty();
> > > >   rte_mempool_set_ops_byname( , pref_memppol, );
> > > >   rte_mempool_populate_default();
> > > What about introducing an API like:
> > > rte_pktmbuf_pool_create_with_ops (..., ops_name, config_pool);
> > > 
> > > I think that API would help for the case where the application wants an
> > > mbuf pool with, e.g., the stack handler.
> > > Sure, we can do the empty/set_ops/populate sequence, but the only thing we
> > > want to change from the default pktmbuf_pool_create API is the pool handler.
> > > 
> > > The application just needs to decide which ops handler to use, either the
> > > default or one suggested by the PMD?
> > > 
> > > I think ideally we would have similar APIs:
> > > - rte_mempool_create_with_ops (...)
> > > - rte_mempool_xmem_create_with_ops (...)
> > Today, we may only want to change the mempool handler, but if we
> > need to change something else tomorrow, we would need to add another
> > parameter again, breaking the ABI.
> > 
> > If we pass a config structure, adding a new field in it would also break the
> > ABI, except if the structure is opaque, with accessors. These accessors 
> > would be
> > functions (ex: mempool_cfg_new, mempool_cfg_set_pool_ops, ...). This is not 
> > so
> > much different than what we have now.
> > 
> > The advantage I can see of working on a config structure instead of 
> > directly on
> > a mempool is that the api can be reused to build a default config.
> > 
> > That said, I think it's quite orthogonal to this patch since we still 
> > require
> > the ethdev api.
> 
> Fair enough.
> 
> Just to double check that we are on the same page:
> - rte_mempool_create is just there for backwards compatibility, and a
> sequence of create_empty -> set_ops (optional) -> init -> populate_default ->
> obj_iter (optional) is the recommended way of creating mempools.

Yes, I think rte_mempool_create() has too many arguments.
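
As a concrete illustration, a minimal sketch of that recommended sequence for
an mbuf pool, using the rte_eth_dev_get_preferred_pool_ops() helper proposed
in this series (port_id and NB_MBUF are assumed to be defined by the
application; error handling is omitted):

char ops_name[RTE_MEMPOOL_OPS_NAMESIZE];
struct rte_mempool *mp;

/* ask the port which mempool ops it prefers (API proposed in this series) */
rte_eth_dev_get_preferred_pool_ops(port_id, ops_name);

mp = rte_mempool_create_empty("mbuf_pool", NB_MBUF,
		sizeof(struct rte_mbuf) + RTE_MBUF_DEFAULT_BUF_SIZE,
		256, sizeof(struct rte_pktmbuf_pool_private),
		SOCKET_ID_ANY, 0);
rte_mempool_set_ops_byname(mp, ops_name, NULL);
rte_pktmbuf_pool_init(mp, NULL);
rte_mempool_populate_default(mp);
rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);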

> - if an application wants to use rte_mempool_xmem_create with a different
> pool handler, it needs to replicate the function and add the set_ops step.

Yes. And now that xen support is going to be removed, I'm wondering if
this function is still required, given the API is not that easy to
use. Calling rte_mempool_populate_phys() several times looks more
flexible. But this is also another topic.

> Now, if rte_pktmbuf_pool_create is still the preferred mechanism for
> applications to create mbuf pools, wouldn't it make sense to offer the
> option of either passing the ops_name as suggested before, or letting the
> application just set a different pool handler? I understand that it is
> either breaking the API or introducing a new API, but is the solution to
> basically "copy" the whole function in the application and add an optional
> step (set_ops)?

I was quite reticent about introducing
rte_pktmbuf_pool_create_with_ops() because, for the same reasons, we
would also want to introduce functions to create a mempool that uses a
different pktmbuf_init() or pool_init() callback, or to create the pool
in external memory, ... and we would end up with functions with too
many arguments.

I like the approach of having several simple functions, because it's
easier to read (even if the code is longer), and it's easily extensible.

Now if we feel that the mempool ops is more important than the other
parameters, we can consider to add it in rte_pktmbuf_pool_create().

> With these patches we can:
> - change the default pool handler with an EAL option, which does *not* allow
> for pktmbuf_pool_create with different handlers.
> - check the PMD's preferred/supported handlers, but then we need to implement
> a function with the whole sequence and cannot use pktmbuf_pool_create.
> 
> It looks to me then that any application that wants to use a pool handler
> other than default/ring needs to implement its own create_pool with
> set_ops.



Re: [dpdk-dev] [PATCH 4/4] ethdev: add helpers to move to the new offloads API

2017-09-05 Thread Thomas Monjalon
04/09/2017 16:18, Ananyev, Konstantin:
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > 04/09/2017 15:25, Ananyev, Konstantin:
> > > Hi Shahaf,
> > >
> > > > +/**
> > > > + * A conversion function from rxmode offloads API to rte_eth_rxq_conf
> > > > + * offloads API.
> > > > + */
> > > > +static void
> > > > +rte_eth_convert_rxmode_offloads(struct rte_eth_rxmode *rxmode,
> > > > +   struct rte_eth_rxq_conf *rxq_conf)
> > > > +{
> > > > +   if (rxmode->header_split == 1)
> > > > +   rxq_conf->offloads |= DEV_RX_OFFLOAD_HEADER_SPLIT;
> > > > +   if (rxmode->hw_ip_checksum == 1)
> > > > +   rxq_conf->offloads |= DEV_RX_OFFLOAD_CHECKSUM;
> > > > +   if (rxmode->hw_vlan_filter == 1)
> > > > +   rxq_conf->offloads |= DEV_RX_OFFLOAD_VLAN_FILTER;
> > >
> > > Thinking on it a bit more:
> > > VLAN_FILTER is definitely one per device, as it would affect VFs also.
> > > At least that's what we have for Intel devices (ixgbe, i40e) right now.
> > > For Intel devices VLAN_STRIP is also per device and
> > > will also be  applied to all corresponding VFs.
> > > In fact, right now it is possible to query/change these 3 VLAN offload
> > > flags on the fly (after dev_start) on a port basis by the
> > > rte_eth_dev_(get|set)_vlan_offload API.
> > > So, I think at least these 3 flags need to remain on a port basis.
> > 
> > I don't understand how it helps to be able to configure the same thing
> > in 2 places.
> 
> Because some offloads are per device, others per queue.
> Configuring on a device basis would allow most users to configure all
> queues in the same manner by default.
> Those users who need a more fine-grained setup (per queue)
> will be able to override it via rx_queue_setup().

Those users can set the same config for all queues.
>  
> > I think you are just describing a limitation of these HW: some offloads
> > must be the same for all queues.
> 
> As I said above - on some devices some offloads might also affect queues
> that belong to VFs (to other ports, in DPDK terms).
> You might never invoke rx_queue_setup() for these queues from your app.
> But you still want to enable this offload on that device.

You are advocating for per-port configuration API because
some settings must be the same on all the ports of your hardware?
So there is a big trouble. You don't need per-port settings,
but per-hw-device settings.
Or would you accept more fine-grained per-port settings?
If yes, you can accept even finer grained per-queues settings.
> 
> > It does not prevent from configuring them in the per-queue setup.
> > 
> > > In fact, why can't we have both per port and per queue RX offload:
> > > - dev_configure() will accept RX_OFFLOAD_* flags and apply them on a port 
> > > basis.
> > > - rx_queue_setup() will also accept RX_OFFLOAD_* flags and apply them on 
> > > a queue basis.
> > > - if particular RX_OFFLOAD flag for that device couldn't be setup on a 
> > > queue basis  -
> > >rx_queue_setup() will return an error.
> > 
> > The queue setup can work while the value is the same for every queues.
> 
> Ok, and how would people know that?
> That for device N offload X has to be the same for all queues,
> while for device M offload X can differ between queues.

We can know the hardware limitations by filling this information
at PMD init.

> Again, if we don't allow enabling/disabling offloads for a particular queue,
> why bother with updating the rx_queue_setup() API at all?

I do not understand this question.

> > > - rte_eth_rxq_info can be extended to provide information which 
> > > RX_OFFLOADs
> > >   can be configured on a per queue basis.
> > 
> > Yes the PMD should advertise its limitations like being forced to
> > apply the same configuration to all its queues.
> 
> Didn't get your last sentence.

I agree that the hardware limitations must be written in an ethdev structure.
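
To make the combined proposal above concrete, a hedged sketch of how an
application could mix port-level and queue-level Rx offloads under the new
API (struct and field names follow the direction discussed in this thread
and may differ in the final version):

/* Port-level defaults, applied to every queue of the port. */
struct rte_eth_conf port_conf = { 0 };
port_conf.rxmode.offloads = DEV_RX_OFFLOAD_VLAN_FILTER |
			    DEV_RX_OFFLOAD_CHECKSUM;
rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);

/* Per-queue override, assuming the PMD reports it as configurable. */
struct rte_eth_rxq_conf rxq_conf = { 0 };
rxq_conf.offloads = DEV_RX_OFFLOAD_CHECKSUM;
if (rte_eth_rx_queue_setup(port_id, 0, nb_rxd, socket_id,
			&rxq_conf, mbuf_pool) != 0)
	/* the device cannot set up this offload on a per-queue basis */
	rte_exit(EXIT_FAILURE, "per-queue Rx offload not supported\n");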


[dpdk-dev] [PATCH v2 0/5] Support TCP/IPv4, VxLAN and GRE GSO in DPDK

2017-09-05 Thread Jiayu Hu
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. This patch adds GSO support to DPDK for specific
packet types: specifically, TCP/IPv4, VxLAN, and GRE.

The first patch introduces the GSO API framework. The second patch
adds GSO support for TCP/IPv4 packets (containing an optional VLAN
tag). The third patch adds GSO support for VxLAN packets that contain
outer IPv4, and inner TCP/IPv4 headers (plus optional inner and/or 
outer VLAN tags). The fourth patch adds GSO support for GRE packets
that contain outer IPv4, and inner TCP/IPv4 headers (with optional 
outer VLAN tag). The last patch in the series enables TCP/IPv4, VxLAN,
and GRE GSO in testpmd's checksum forwarding engine.

The performance of TCP/IPv4 GSO on a 10Gbps link is demonstrated using
iperf. Setup for the test is described as follows:

a. Connect 2 x 10Gbps physical ports (P0, P1), which are in the same
   machine, together physically.
b. Launch testpmd with P0 and a vhost-user port, and use csum
   forwarding engine with "retry".
c. Select IP and TCP HW checksum calculation for P0; select TCP HW
   checksum calculation for vhost-user port.
d. Launch a VM with csum and tso offloading enabled.
e. Run iperf-client on virtio-net port in the VM to send TCP packets.
   With csum and tso enabled, the VM can send large TCP/IPv4 packets
   (mss up to 64KB).
f. P1 is assigned to the Linux kernel with kernel GRO enabled. Run
   iperf-server on P1.

We conduct three iperf tests:

test-1: enable GSO for P0 in testpmd, and set max GSO segment length
to 1518B. Run two iperf-client in the VM.
test-2: enable TSO for P0 in testpmd, and set TSO segsz to 1518B. Run
two iperf-client in the VM.
test-3: disable GSO and TSO in testpmd. Run two iperf-client in the VM.

Throughput of the above three tests:

test-1: ~9Gbps
test-2: 9.5Gbps
test-3: 3Mbps

The experimental data of VxLAN and GRE will be shown later.

Change log
==========
v2:
- merge data segments whose data_len is less than mss into a large data
  segment in gso_do_segment().
- use mbuf->packet_type/l2_len/l3_len etc. instead of parsing the packet
  header in rte_gso_segment().
- provide IP id macros for applications to select fixed or incremental IP
  ids.
- change the definition of gso_types in struct rte_gso_ctx.
- replace rte_pktmbuf_detach() with rte_pktmbuf_free().
- refactor gso_update_pkt_headers().
- change the return value of rte_gso_segment().
- remove parameter checks in rte_gso_segment().
- use rte_net_get_ptype() in app/test-pmd/csumonly.c to fill
  mbuf->packet_type.
- add a new GSO command in testpmd to show GSO configuration for ports.
- misc: fix typo and optimize function description.

Jiayu Hu (3):
  gso: add Generic Segmentation Offload API framework
  gso: add TCP/IPv4 GSO support
  app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO

Mark Kavanagh (2):
  gso: add VxLAN GSO support
  gso: add GRE GSO support

 app/test-pmd/cmdline.c  | 178 
 app/test-pmd/config.c   |  24 +++
 app/test-pmd/csumonly.c |  60 ++-
 app/test-pmd/testpmd.c  |  15 ++
 app/test-pmd/testpmd.h  |  10 ++
 config/common_base  |   5 +
 lib/Makefile|   2 +
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile |  52 ++
 lib/librte_gso/gso_common.c | 281 
 lib/librte_gso/gso_common.h | 156 ++
 lib/librte_gso/gso_tcp.c|  83 ++
 lib/librte_gso/gso_tcp.h|  76 +
 lib/librte_gso/gso_tunnel.c |  61 +++
 lib/librte_gso/gso_tunnel.h |  75 +
 lib/librte_gso/rte_gso.c|  99 +++
 lib/librte_gso/rte_gso.h| 133 +++
 lib/librte_gso/rte_gso_version.map  |   7 +
 mk/rte.app.mk   |   1 +
 19 files changed, 1315 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_gso/Makefile
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp.c
 create mode 100644 lib/librte_gso/gso_tcp.h
 create mode 100644 lib/librte_gso/gso_tunnel.c
 create mode 100644 lib/librte_gso/gso_tunnel.h
 create mode 100644 lib/librte_gso/rte_gso.c
 create mode 100644 lib/librte_gso/rte_gso.h
 create mode 100644 lib/librte_gso/rte_gso_version.map

-- 
2.7.4



[dpdk-dev] [PATCH v2 2/5] gso: add TCP/IPv4 GSO support

2017-09-05 Thread Jiayu Hu
This patch adds GSO support for TCP/IPv4 packets. Supported packets
may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
packets have correct checksums, and doesn't update checksums for output
packets (the responsibility for this lies with the application).
Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.

TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indirect
MBUF, to organize an output packet. Note that we refer to these two
chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
header, while the indirect mbuf simply points to a location within the
original packet's payload. Consequently, use of the GSO library requires
multi-segment MBUF support in the TX functions of the NIC driver.

If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
result, when all of its GSOed segments are freed, the packet is freed
automatically.
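
A rough sketch of how one such two-segment output MBUF could be assembled
with the standard mbuf APIs (pkt, direct_pool and indirect_pool are assumed
to be in scope, and hdr_len, seg_off and seg_len describe the header size
and the payload slice; the real code in this series differs in details):

/* Direct mbuf carrying a copy of the packet header. */
struct rte_mbuf *hdr = rte_pktmbuf_alloc(direct_pool);
rte_memcpy(rte_pktmbuf_append(hdr, hdr_len),
		rte_pktmbuf_mtod(pkt, void *), hdr_len);

/* Indirect mbuf pointing into the original packet's payload. */
struct rte_mbuf *payload = rte_pktmbuf_alloc(indirect_pool);
rte_pktmbuf_attach(payload, pkt);	/* bumps pkt's refcnt */
rte_pktmbuf_adj(payload, hdr_len + seg_off);
payload->data_len = seg_len;
payload->pkt_len = seg_len;

/* Chain them into one two-segment packet. */
hdr->next = payload;
hdr->nb_segs = 2;
hdr->pkt_len = hdr->data_len + payload->data_len;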

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 lib/librte_eal/common/include/rte_log.h |   1 +
 lib/librte_gso/Makefile |   2 +
 lib/librte_gso/gso_common.c | 207 
 lib/librte_gso/gso_common.h | 107 +
 lib/librte_gso/gso_tcp.c|  83 +
 lib/librte_gso/gso_tcp.h|  76 
 lib/librte_gso/rte_gso.c|  46 ++-
 7 files changed, 519 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_gso/gso_common.c
 create mode 100644 lib/librte_gso/gso_common.h
 create mode 100644 lib/librte_gso/gso_tcp.c
 create mode 100644 lib/librte_gso/gso_tcp.h

diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index ec8dba7..2fa1199 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
 #define RTE_LOGTYPE_EFD   18 /**< Log related to EFD. */
 #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
+#define RTE_LOGTYPE_GSO   20 /**< Log related to GSO. */
 
 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index aeaacbc..0f8e38f 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -42,6 +42,8 @@ LIBABIVER := 1
 
 #source files
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
new file mode 100644
index 000..4d4c3fd
--- /dev/null
+++ b/lib/librte_gso/gso_common.c
@@ -0,0 +1,207 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include "gso_common.h"
+
+static inline void
+hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
+   uint16_t pkt_hdr_offset)
+{
+   /* copy mbuf metadata */
+   hdr_segment->nb_segs = 1;
+   hdr_segment->port = pkt->port;
+   hdr_segment->ol_flags = pkt->ol_flags;
+ 

[dpdk-dev] [PATCH v2 3/5] gso: add VxLAN GSO support

2017-09-05 Thread Jiayu Hu
From: Mark Kavanagh 

This patch adds GSO support for VxLAN-encapsulated packets. Supported
VxLAN packets must have an outer IPv4 header (prepended by an optional
VLAN tag), and contain an inner TCP/IPv4 packet (with an optional inner
VLAN tag).

VxLAN GSO assumes that all input packets have correct checksums and
doesn't update checksums for output packets. Additionally, it doesn't
process IP fragmented packets.

As with TCP/IPv4 GSO, VxLAN GSO uses a two-segment MBUF to organize each
output packet, which mandates support for multi-segment mbufs in the TX
functions of the NIC driver. Also, if a packet is GSOed, VxLAN GSO
reduces its MBUF refcnt by 1. As a result, when all of its GSOed
segments are freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 lib/librte_gso/Makefile |  1 +
 lib/librte_gso/gso_common.c | 50 ++
 lib/librte_gso/gso_common.h | 36 +-
 lib/librte_gso/gso_tunnel.c | 61 
 lib/librte_gso/gso_tunnel.h | 75 +
 lib/librte_gso/rte_gso.c|  9 ++
 6 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_gso/gso_tunnel.c
 create mode 100644 lib/librte_gso/gso_tunnel.h

diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
index 0f8e38f..a4d1a81 100644
--- a/lib/librte_gso/Makefile
+++ b/lib/librte_gso/Makefile
@@ -44,6 +44,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp.c
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tunnel.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
index 4d4c3fd..1e16c9c 100644
--- a/lib/librte_gso/gso_common.c
+++ b/lib/librte_gso/gso_common.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "gso_common.h"
 
@@ -194,11 +195,60 @@ update_inner_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
}
 }
 
+static inline void
+update_outer_ipv4_header(struct rte_mbuf *pkt, uint16_t id)
+{
+   struct ipv4_hdr *ipv4_hdr;
+
+   ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   pkt->outer_l2_len);
+   ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
+   pkt->outer_l2_len);
+   ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
+}
+
+static inline void
+update_outer_udp_header(struct rte_mbuf *pkt)
+{
+   struct udp_hdr *udp_hdr;
+   uint16_t length;
+
+   length = pkt->outer_l2_len + pkt->outer_l3_len;
+   udp_hdr = (struct udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   length);
+   udp_hdr->dgram_len = rte_cpu_to_be_16(pkt->pkt_len - length);
+}
+
+static inline void
+update_ipv4_vxlan_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
+   struct rte_mbuf **segs, uint16_t nb_segs)
+{
+   struct ipv4_hdr *ipv4_hdr;
+   uint16_t i, id;
+
+   ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   pkt->outer_l2_len);
+   id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+   for (i = 0; i < nb_segs; i++) {
+   update_outer_ipv4_header(segs[i], id);
+   id += ipid_delta;
+   update_outer_udp_header(segs[i]);
+   }
+   /* update inner TCP/IPv4 headers */
+   update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
+}
+
 void
 gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
struct rte_mbuf **segs, uint16_t nb_segs)
 {
switch (pkt->packet_type) {
+   case ETHER_VLAN_IPv4_UDP_VXLAN_VLAN_IPv4_TCP_PKT:
+   case ETHER_VLAN_IPv4_UDP_VXLAN_IPv4_TCP_PKT:
+   case ETHER_IPv4_UDP_VXLAN_VLAN_IPv4_TCP_PKT:
+   case ETHER_IPv4_UDP_VXLAN_IPv4_TCP_PKT:
+   update_ipv4_vxlan_tcp4_header(pkt, ipid_delta, segs, nb_segs);
+   break;
case ETHER_VLAN_IPv4_TCP_PKT:
case ETHER_IPv4_TCP_PKT:
update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index ce3b955..3f76fd1 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -44,6 +44,13 @@
 #define TCP_HDR_FIN_MASK ((uint8_t)0x01)
 
 #define ETHER_IPv4_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4)
+#define INNER_ETHER_IPv4_TCP_PKT (RTE_PTYPE_INNER_L2_ETHER | \
+   RTE_PTYPE_INNER_L3_IPV4 | \
+   RTE_PTYPE_INNER_L4_TCP)
+#define INNER_ETHER_VLAN_IPv4_TCP_PKT (RTE_PTYPE_INNER_L2_ETHER_VLAN | \
+   RTE_PTYPE_INNER_L3_IPV4 | \
+   RTE_PTYPE_INNER_L4_TCP)
+
 /* TCP/IPv4 packet. */
 #define ETHER_IPv4_TCP_PKT (ETHER_IPv4_PKT | RTE_PTYPE_L4_TCP)
 
@@ -51,6 +58,33 @@
 #define ETHER_VLAN_IPv4_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | \
RTE

[dpdk-dev] [PATCH v2 1/5] gso: add Generic Segmentation Offload API framework

2017-09-05 Thread Jiayu Hu
Generic Segmentation Offload (GSO) is a SW technique to split large
packets into small ones. Akin to TSO, GSO enables applications to
operate on large packets, thus reducing per-packet processing overhead.

To enable more flexibility to applications, DPDK GSO is implemented
as a standalone library. Applications explicitly use the GSO library
to segment packets. This patch introduces the GSO API framework to DPDK.

The GSO library provides a segmentation API, rte_gso_segment(), for
applications. It splits an input packet into small ones in each
invocation. The GSO library refers to these small packets generated
by rte_gso_segment() as GSO segments. Each of the newly-created GSO
segments is organized as a two-segment MBUF, where the first segment is a
standard MBUF, which stores a copy of packet header, and the second is an
indirect MBUF which points to a section of data in the input packet.
rte_gso_segment() reduces the refcnt of the input packet by 1. Therefore,
when all GSO segments are freed, the input packet is freed automatically.
Additionally, since each GSO segment has multiple MBUFs (i.e. 2 MBUFs),
the driver of the interface which the GSO segments are sent to should
support transmitting multi-segment packets.
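
As a usage illustration, a hedged sketch of segmenting one large TCP/IPv4
packet before transmission (the rte_gso_ctx field names and the
rte_gso_segment() signature follow this series and may differ between
revisions; MAX_SEGS, direct_mp and indirect_mp are assumed to be defined by
the application):

struct rte_gso_ctx ctx = {
	.direct_pool = direct_mp,	/* pool for the copied headers */
	.indirect_pool = indirect_mp,	/* pool for the indirect payload mbufs */
	.gso_types = DEV_TX_OFFLOAD_TCP_TSO,
	.gso_size = 1518,		/* max length of each output segment */
};
struct rte_mbuf *segs[MAX_SEGS];
int nb_segs;

nb_segs = rte_gso_segment(pkt, &ctx, segs, MAX_SEGS);
if (nb_segs > 0)
	/* transmit whatever came out: the GSO segments, or the
	 * original packet if it was already below gso_size */
	rte_eth_tx_burst(port_id, queue_id, segs, nb_segs);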

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 config/common_base |   5 ++
 lib/Makefile   |   2 +
 lib/librte_gso/Makefile|  49 ++
 lib/librte_gso/rte_gso.c   |  48 +
 lib/librte_gso/rte_gso.h   | 133 +
 lib/librte_gso/rte_gso_version.map |   7 ++
 mk/rte.app.mk  |   1 +
 7 files changed, 245 insertions(+)
 create mode 100644 lib/librte_gso/Makefile
 create mode 100644 lib/librte_gso/rte_gso.c
 create mode 100644 lib/librte_gso/rte_gso.h
 create mode 100644 lib/librte_gso/rte_gso_version.map

diff --git a/config/common_base b/config/common_base
index 5e97a08..603e340 100644
--- a/config/common_base
+++ b/config/common_base
@@ -652,6 +652,11 @@ CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n
 CONFIG_RTE_LIBRTE_GRO=y
 
 #
+# Compile GSO library
+#
+CONFIG_RTE_LIBRTE_GSO=y
+
+#
 # Compile librte_meter
 #
 CONFIG_RTE_LIBRTE_METER=y
diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..3d123f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
+DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
new file mode 100644
index 000..aeaacbc
--- /dev/null
+++ b/lib/librte_gso/Makefile
@@ -0,0 +1,49 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gso.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+
+EXPORT_MAP := rte_gso_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h
+
+include $(RTE_SDK)/mk/rte.li

[dpdk-dev] [PATCH v2 5/5] app/testpmd: enable TCP/IPv4, VxLAN and GRE GSO

2017-09-05 Thread Jiayu Hu
This patch adds GSO support to the csum forwarding engine. Oversized
packets transmitted over a GSO-enabled port will undergo segmentation
(with the exception of packet-types unsupported by the GSO library).
GSO support is disabled by default.

GSO support may be toggled on a per-port basis, using the command:

"set port  gso on|off"

The maximum packet length (including the packet header and payload) for
GSO segments may be set with the command:

"set gso segsz "

Show GSO configuration for a given port with the command:

"show port  gso"

Signed-off-by: Jiayu Hu 
Signed-off-by: Mark Kavanagh 
---
 app/test-pmd/cmdline.c  | 178 
 app/test-pmd/config.c   |  24 +++
 app/test-pmd/csumonly.c |  60 ++--
 app/test-pmd/testpmd.c  |  15 
 app/test-pmd/testpmd.h  |  10 +++
 5 files changed, 283 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cd8c358..03b98a3 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -431,6 +431,17 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set max flow number and max packet number per-flow"
" for GRO.\n\n"
 
+   "set port (port_id) gso (on|off)"
+   "Enable or disable Generic Segmentation Offload in"
+   " csum forwarding engine.\n\n"
+
+   "set gso segsz (length)\n"
+   "Set max packet length for output GSO segments,"
+   " including packet header and payload.\n\n"
+
+   "show port (port_id) gso\n"
+   "Show GSO configuration.\n\n"
+
"set fwd (%s)\n"
"Set packet forwarding mode.\n\n"
 
@@ -3963,6 +3974,170 @@ cmdline_parse_inst_t cmd_gro_set = {
},
 };
 
+/* *** ENABLE/DISABLE GSO *** */
+struct cmd_gso_enable_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_port;
+   cmdline_fixed_string_t cmd_keyword;
+   cmdline_fixed_string_t cmd_mode;
+   uint8_t cmd_pid;
+};
+
+static void
+cmd_gso_enable_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_gso_enable_result *res;
+
+   res = parsed_result;
+   if (!strcmp(res->cmd_keyword, "gso"))
+   setup_gso(res->cmd_mode, res->cmd_pid);
+}
+
+cmdline_parse_token_string_t cmd_gso_enable_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_set, "set");
+cmdline_parse_token_string_t cmd_gso_enable_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_port, "port");
+cmdline_parse_token_string_t cmd_gso_enable_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_keyword, "gso");
+cmdline_parse_token_string_t cmd_gso_enable_mode =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_mode, "on#off");
+cmdline_parse_token_num_t cmd_gso_enable_pid =
+   TOKEN_NUM_INITIALIZER(struct cmd_gso_enable_result,
+   cmd_pid, UINT8);
+
+cmdline_parse_inst_t cmd_gso_enable = {
+   .f = cmd_gso_enable_parsed,
+   .data = NULL,
+   .help_str = "set port  gso on|off",
+   .tokens = {
+   (void *)&cmd_gso_enable_set,
+   (void *)&cmd_gso_enable_port,
+   (void *)&cmd_gso_enable_pid,
+   (void *)&cmd_gso_enable_keyword,
+   (void *)&cmd_gso_enable_mode,
+   NULL,
+   },
+};
+
+/* *** SET MAX PACKET LENGTH FOR GSO SEGMENTS *** */
+struct cmd_gso_size_result {
+   cmdline_fixed_string_t cmd_set;
+   cmdline_fixed_string_t cmd_keyword;
+   cmdline_fixed_string_t cmd_segsz;
+   uint16_t cmd_size;
+};
+
+static void
+cmd_gso_size_parsed(void *parsed_result,
+  __attribute__((unused)) struct cmdline *cl,
+  __attribute__((unused)) void *data)
+{
+   struct cmd_gso_size_result *res = parsed_result;
+
+   if (test_done == 0) {
+   printf("Before setting GSO segsz, please stop forwarding first\n");
+   return;
+   }
+
+   if (!strcmp(res->cmd_keyword, "gso") &&
+   !strcmp(res->cmd_segsz, "segsz")) {
+   if (res->cmd_size == 0) {
+   printf("gso_size should be larger than 0."
+   " Please input a legal value\n");
+   } else
+   gso_max_segment_size = res->cmd_size;
+   }
+}
+
+cmdline_parse_token_string_t cmd_gso_size_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_gso_size_result,
+   cmd_set, "set");
+cmdline_parse_token_strin

[dpdk-dev] [PATCH v2 4/5] gso: add GRE GSO support

2017-09-05 Thread Jiayu Hu
From: Mark Kavanagh 

This patch adds GSO support for GRE-tunneled packets. Supported GRE
packets must contain an outer IPv4 header, and inner TCP/IPv4 headers.
They may also contain a single VLAN tag. GRE GSO assumes that all input
packets have correct checksums and doesn't update checksums for output
packets. Additionally, it doesn't process IP fragmented packets.

As with VxLAN GSO, GRE GSO uses a two-segment MBUF to organize each
output packet, which requires multi-segment mbuf support in the TX
functions of the NIC driver. Also, if a packet is GSOed, GRE GSO reduces
its MBUF refcnt by 1. As a result, when all of its GSOed segments are
freed, the packet is freed automatically.

Signed-off-by: Mark Kavanagh 
Signed-off-by: Jiayu Hu 
---
 lib/librte_gso/gso_common.c | 24 
 lib/librte_gso/gso_common.h | 15 +++
 lib/librte_gso/rte_gso.c|  2 ++
 3 files changed, 41 insertions(+)

diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
index 1e16c9c..668d2d0 100644
--- a/lib/librte_gso/gso_common.c
+++ b/lib/librte_gso/gso_common.c
@@ -37,6 +37,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -238,6 +239,25 @@ update_ipv4_vxlan_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
 }
 
+static inline void
+update_ipv4_gre_tcp4_header(struct rte_mbuf *pkt, uint8_t ipid_delta,
+   struct rte_mbuf **segs, uint16_t nb_segs)
+{
+   struct ipv4_hdr *ipv4_hdr;
+   uint16_t i, id;
+
+   ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+   pkt->outer_l2_len);
+   id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
+   for (i = 0; i < nb_segs; i++) {
+   update_outer_ipv4_header(segs[i], id);
+   id += ipid_delta;
+   }
+
+   /* update inner TCP/IPv4 headers */
+   update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
+}
+
 void
 gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
struct rte_mbuf **segs, uint16_t nb_segs)
@@ -249,6 +269,10 @@ gso_update_pkt_headers(struct rte_mbuf *pkt, uint8_t ipid_delta,
case ETHER_IPv4_UDP_VXLAN_IPv4_TCP_PKT:
update_ipv4_vxlan_tcp4_header(pkt, ipid_delta, segs, nb_segs);
break;
+   case ETHER_VLAN_IPv4_GRE_IPv4_TCP_PKT:
+   case ETHER_IPv4_GRE_IPv4_TCP_PKT:
+   update_ipv4_gre_tcp4_header(pkt, ipid_delta, segs, nb_segs);
+   break;
case ETHER_VLAN_IPv4_TCP_PKT:
case ETHER_IPv4_TCP_PKT:
update_inner_tcp4_header(pkt, ipid_delta, segs, nb_segs);
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index 3f76fd1..bd53bde 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -85,6 +85,21 @@
RTE_PTYPE_TUNNEL_VXLAN | \
INNER_ETHER_VLAN_IPv4_TCP_PKT)
 
+/* GRE packet. */
+#define ETHER_IPv4_GRE_IPv4_TCP_PKT (\
+   ETHER_IPv4_PKT  | \
+   RTE_PTYPE_TUNNEL_GRE| \
+   RTE_PTYPE_INNER_L3_IPV4 | \
+   RTE_PTYPE_INNER_L4_TCP)
+
+/* GRE packet with VLAN tag. */
+#define ETHER_VLAN_IPv4_GRE_IPv4_TCP_PKT (\
+   RTE_PTYPE_L2_ETHER_VLAN | \
+   RTE_PTYPE_L3_IPV4   | \
+   RTE_PTYPE_TUNNEL_GRE| \
+   RTE_PTYPE_INNER_L3_IPV4 | \
+   RTE_PTYPE_INNER_L4_TCP)
+
 /**
  * Internal function which updates relevant packet headers, following
  * segmentation. This is required to update, for example, the IPv4
diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
index 0170abc..d40fda9 100644
--- a/lib/librte_gso/rte_gso.c
+++ b/lib/librte_gso/rte_gso.c
@@ -76,6 +76,8 @@ rte_gso_segment(struct rte_mbuf *pkt,
case ETHER_VLAN_IPv4_UDP_VXLAN_IPv4_TCP_PKT:
case ETHER_IPv4_UDP_VXLAN_VLAN_IPv4_TCP_PKT:
case ETHER_IPv4_UDP_VXLAN_IPv4_TCP_PKT:
+   case ETHER_VLAN_IPv4_GRE_IPv4_TCP_PKT:
+   case ETHER_IPv4_GRE_IPv4_TCP_PKT:
ret = gso_tunnel_segment(pkt, gso_size, ipid_delta,
direct_pool, indirect_pool,
pkts_out, nb_pkts_out);
-- 
2.7.4



Re: [dpdk-dev] [PATCH 5/6] eal: remove xen dom0 support

2017-09-05 Thread Tan, Jianfeng


> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, September 5, 2017 3:31 PM
> To: Tan, Jianfeng
> Cc: Richardson, Bruce; dev@dpdk.org; xen-de...@lists.xenproject.org;
> Mcnamara, John; joao.m.mart...@oracle.com;
> jerin.ja...@caviumnetworks.com; shah...@mellanox.com
> Subject: Re: [dpdk-dev] [PATCH 5/6] eal: remove xen dom0 support
> 
> 05/09/2017 05:41, Tan, Jianfeng:
> > From: Richardson, Bruce
> > >
> > > Reading the contributors guide section on ABI, specifically
> > > http://dpdk.org/doc/guides/contributing/versioning.html#deprecating-
> an-
> > > entire-abi-version
> > > it seems like we should collapse down the versions to a single one
> > > following the function removal, and also increment the whole library's
> > > .so version.
> >
> > So for lib/librte_eal/linuxapp/eal/rte_eal_version.map, we should change
> it in below way?
> >
> > DPDK_2.1 {
> > {APIs in DPDK_2.0 except xen APIs}
> > ...
> > };
> >
> > DPDK_16.04 {
> > {APIs in DPDK_2.1 except xen APIs}
> > ...
> > } DPDK_2.1;
> 
> No, you don't need to collapse. You can just remove Xen functions.

Thanks.

Besides, two more things:
1. Shall we increase the .so version?
2. This patch is about 8K lines long, do we need to split it?


Re: [dpdk-dev] [PATCH 4/4] ethdev: add helpers to move to the new offloads API

2017-09-05 Thread Ananyev, Konstantin


> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, September 5, 2017 8:48 AM
> To: Ananyev, Konstantin 
> Cc: Shahaf Shuler ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/4] ethdev: add helpers to move to the new 
> offloads API
> 
> 04/09/2017 16:18, Ananyev, Konstantin:
> > From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > > 04/09/2017 15:25, Ananyev, Konstantin:
> > > > Hi Shahaf,
> > > >
> > > > > +/**
> > > > > + * A conversion function from rxmode offloads API to rte_eth_rxq_conf
> > > > > + * offloads API.
> > > > > + */
> > > > > +static void
> > > > > +rte_eth_convert_rxmode_offloads(struct rte_eth_rxmode *rxmode,
> > > > > + struct rte_eth_rxq_conf *rxq_conf)
> > > > > +{
> > > > > + if (rxmode->header_split == 1)
> > > > > + rxq_conf->offloads |= DEV_RX_OFFLOAD_HEADER_SPLIT;
> > > > > + if (rxmode->hw_ip_checksum == 1)
> > > > > + rxq_conf->offloads |= DEV_RX_OFFLOAD_CHECKSUM;
> > > > > + if (rxmode->hw_vlan_filter == 1)
> > > > > + rxq_conf->offloads |= DEV_RX_OFFLOAD_VLAN_FILTER;
> > > >
> > > > Thinking on it a bit more:
> > > > VLAN_FILTER is definitely one per device, as it would affect VFs also.
> > > > At least that's what we have for Intel devices (ixgbe, i40e) right now.
> > > > For Intel devices VLAN_STRIP is also per device and
> > > > will also be  applied to all corresponding VFs.
> > > > In fact, right now it is possible to query/change these 3 VLAN offload
> > > > flags on the fly (after dev_start) on a port basis by the
> > > > rte_eth_dev_(get|set)_vlan_offload API.
> > > > So, I think at least these 3 flags need to remain on a port basis.
> > >
> > > I don't understand how it helps to be able to configure the same thing
> > > in 2 places.
> >
> > Because some offloads are per device, others per queue.
> > Configuring on a device basis would allow most users to configure all
> > queues in the same manner by default.
> > Those users who need a more fine-grained setup (per queue)
> > will be able to override it via rx_queue_setup().
> 
> Those users can set the same config for all queues.
> >
> > > I think you are just describing a limitation of these HW: some offloads
> > > must be the same for all queues.
> >
> > As I said above - on some devices some offloads might also affect queues
> > that belong to VFs (to other ports, in DPDK terms).
> > You might never invoke rx_queue_setup() for these queues from your app.
> > But you still want to enable this offload on that device.

I am ok with having per-port and per-queue offload configuration.
My concern is that after that patch only per-queue offload configuration will 
remain.
I think we need both.
Konstantin

> 
> You are advocating for per-port configuration API because
> some settings must be the same on all the ports of your hardware?
> So there is a big trouble. You don't need per-port settings,
> but per-hw-device settings.
> Or would you accept more fine-grained per-port settings?
> If yes, you can accept even finer grained per-queues settings.
> >
> > > It does not prevent from configuring them in the per-queue setup.
> > >
> > > > In fact, why can't we have both per port and per queue RX offload:
> > > > - dev_configure() will accept RX_OFFLOAD_* flags and apply them on a 
> > > > port basis.
> > > > - rx_queue_setup() will also accept RX_OFFLOAD_* flags and apply them 
> > > > on a queue basis.
> > > > - if particular RX_OFFLOAD flag for that device couldn't be setup on a 
> > > > queue basis  -
> > > >rx_queue_setup() will return an error.
> > >
> > > The queue setup can work while the value is the same for every queues.
> >
> > Ok, and how would people know that?
> > That for device N offload X has to be the same for all queues,
> > while for device M offload X can differ between queues.
> 
> We can know the hardware limitations by filling this information
> at PMD init.
> 
> > Again, if we don't allow enabling/disabling offloads for a particular queue,
> > why bother with updating the rx_queue_setup() API at all?
> 
> I do not understand this question.
> 
> > > > - rte_eth_rxq_info can be extended to provide information which 
> > > > RX_OFFLOADs
> > > >   can be configured on a per queue basis.
> > >
> > > Yes the PMD should advertise its limitations like being forced to
> > > apply the same configuration to all its queues.
> >
> > Didn't get your last sentence.
> 
> I agree that the hardware limitations must be written in an ethdev structure.


Re: [dpdk-dev] [PATCH v3 0/2] Dynamically configure mempool handle

2017-09-05 Thread Jerin Jacob
-Original Message-
> Date: Tue, 5 Sep 2017 09:47:26 +0200
> From: Olivier MATZ 
> To: Sergio Gonzalez Monroy 
> CC: Santosh Shukla , dev@dpdk.org,
>  tho...@monjalon.net, jerin.ja...@caviumnetworks.com,
>  hemant.agra...@nxp.com
> Subject: Re: [dpdk-dev] [PATCH v3 0/2] Dynamically configure mempool handle
> User-Agent: NeoMutt/20170113 (1.7.2)
> 
> On Mon, Sep 04, 2017 at 03:24:38PM +0100, Sergio Gonzalez Monroy wrote:
> > Hi Olivier,
> > 
> > On 04/09/2017 14:34, Olivier MATZ wrote:
> > > Hi Sergio,
> > > 
> > > On Mon, Sep 04, 2017 at 10:41:56AM +0100, Sergio Gonzalez Monroy wrote:
> > > > On 15/08/2017 09:07, Santosh Shukla wrote:
> > > > > * Application programming sequencing would be
> > > > >   char pref_mempool[RTE_MEMPOOL_OPS_NAMESIZE];
> > > > >   rte_eth_dev_get_preferred_pool_ops(ethdev_port_id, pref_mempool 
> > > > > /* out */);
> > > > >   rte_mempool_create_empty();
> > > > >   rte_mempool_set_ops_byname( , pref_mempool, );
> > > > >   rte_mempool_populate_default();
> > > > What about introducing an API like:
> > > > rte_pktmbuf_pool_create_with_ops (..., ops_name, config_pool);
> > > > 
> > > > I think that API would help for the case the application wants an mbuf 
> > > > pool
> > > > with ie. stack handler.
> > > > Sure we can do the empty/set_ops/populate sequence, but the only thing 
> > > > we
> > > > want to change from default pktmbuf_pool_create API is the pool handler.
> > > > 
> > > > Application just needs to decide the ops handler to use, either default 
> > > > or
> > > > one suggested by PMD?
> > > > 
> > > > I think ideally we would have similar APIs:
> > > > - rte_mempool_create_with_ops (...)
> > > > - rte_mempool_xmem_create_with_ops (...)
> > > Today, we may only want to change the mempool handler, but if we
> > > need to change something else tomorrow, we would need to add another
> > > parameter again, breaking the ABI.
> > > 
> > > If we pass a config structure, adding a new field in it would also break 
> > > the
> > > ABI, except if the structure is opaque, with accessors. These accessors 
> > > would be
> > > functions (ex: mempool_cfg_new, mempool_cfg_set_pool_ops, ...). This is 
> > > not so
> > > much different than what we have now.
> > > 
> > > The advantage I can see of working on a config structure instead of 
> > > directly on
> > > a mempool is that the api can be reused to build a default config.
> > > 
> > > That said, I think it's quite orthogonal to this patch since we still 
> > > require
> > > the ethdev api.
> > 
> > Fair enough.
> > 
> > Just to double check that we are on the same page:
> > - rte_mempool_create is just there for backwards compatibility and a
> > sequence of create_empty -> set_ops (optional) ->init -> populate_default ->
> > obj_iter (optional) is the recommended way of creating mempools.
> 
> Yes, I think rte_mempool_create() has too many arguments.
> 
> > - if an application wants to use rte_mempool_xmem_create with a different
> > pool handler, it needs to replicate the function and add the set_ops step.
> 
> Yes. And now that xen support is going to be removed, I'm wondering if
> this function is still required, given the API is not that easy to
> use. Calling rte_mempool_populate_phys() several times looks more
> flexible. But this is also another topic.
> 
> > Now if rte_pktmbuf_pool_create is still the preferred mechanism for
> > applications to create mbuf pools, wouldn't it make sense to offer the
> > option of either pass the ops_name as suggested before or for the
> > application to just set a different pool handler? I understand the it is
> > either breaking API or introducing a new API, but is the solution to
> > basically "copy" the whole function in the application and add an optional
> > step (set_ops)?
> 
> I was quite reticent about introducing
> rte_pktmbuf_pool_create_with_ops() because for the same reasons, we
> would also want to introduce functions to create a mempool that use a
> different pktmbuf_init() or pool_init() callback, or to create the pool
> in external memory, ... and we would end up with functions with too
> many arguments.
> 
> I like the approach of having several simple functions, because it's
> easier to read (even if the code is longer), and it's easily extensible.
> 
> Now if we feel that the mempool ops is more important than the other
> parameters, we can consider to add it in rte_pktmbuf_pool_create().

Yes. I agree with Sergio here.

Something we could target for v18.02.


Re: [dpdk-dev] [PATCH 05/21] vhost: add support to slave requests channel

2017-09-05 Thread Maxime Coquelin



On 09/05/2017 06:19 AM, Tiwei Bie wrote:

On Thu, Aug 31, 2017 at 11:50:07AM +0200, Maxime Coquelin wrote:

Currently, only QEMU sends requests and the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, such as IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.
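
For illustration, a hedged sketch of how the backend could send an IOTLB miss
request over this new channel (the message layout follows the vhost-user
protocol and the helpers in this series; exact struct and field names may
differ):

/* Sketch only: report an IOTLB miss to QEMU via the slave channel. */
static int
send_iotlb_miss_sketch(struct virtio_net *dev, uint64_t iova, uint8_t perm)
{
	struct VhostUserMsg msg = {
		.request = VHOST_USER_SLAVE_IOTLB_MSG,
		.flags = VHOST_USER_VERSION,
		.size = sizeof(msg.payload.iotlb),
		.payload.iotlb = {
			.iova = iova,
			.perm = perm,
			.type = VHOST_IOTLB_MISS,
		},
	};

	return send_vhost_message(dev->slave_req_fd, &msg);
}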

Signed-off-by: Maxime Coquelin 
---
  lib/librte_vhost/vhost.h  |  2 ++
  lib/librte_vhost/vhost_user.c | 27 +++
  lib/librte_vhost/vhost_user.h | 10 +-
  3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 18ad69c85..2340b0c2a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -196,6 +196,8 @@ struct virtio_net {
uint32_tnr_guest_pages;
uint32_tmax_guest_pages;
struct guest_page   *guest_pages;
+
+   int slave_req_fd;
  } __rte_cache_aligned;
  
  
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c

index 8984dcb6a..7b3c2f32a 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
+   [VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
  };
  
  static uint64_t

@@ -122,6 +123,11 @@ vhost_backend_cleanup(struct virtio_net *dev)
munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
dev->log_addr = 0;
}
+
+   if (dev->slave_req_fd >= 0) {
+   close(dev->slave_req_fd);
+   dev->slave_req_fd = -1;


The slave_req_fd should also be initialized to -1 when allocating
the virtio_net structure. Currently, it's missing.


Good catch, thanks for spotting this.

Maxime

Best regards,
Tiwei Bie
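
For illustration, the missing initialization Tiwei points out is a one-liner at allocation time, so that vhost_backend_cleanup() can safely test slave_req_fd >= 0. A minimal sketch, with the structure reduced to the relevant field and a made-up allocation helper:

#include <stdlib.h>

/* Sketch only: mark the slave request channel as "unset" when the device
 * structure is allocated. The real code would do this in the virtio_net
 * allocation path. */
struct virtio_net_stub {
	int slave_req_fd;
	/* ... the other struct virtio_net fields ... */
};

static struct virtio_net_stub *
alloc_device(void)
{
	struct virtio_net_stub *dev = calloc(1, sizeof(*dev));

	if (dev == NULL)
		return NULL;
	dev->slave_req_fd = -1;	/* no slave channel until SET_SLAVE_REQ_FD */
	return dev;
}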



Re: [dpdk-dev] [PATCH v2 00/15] devargs fixes

2017-09-05 Thread Gaëtan Rivet
Hi Ferruh,

On Mon, Sep 04, 2017 at 05:04:57PM +0100, Ferruh Yigit wrote:
> On 7/14/2017 10:11 PM, Jan Blunck wrote:
> > The changes to enum rte_devtype that got merged into 17.08-rc1 are breaking
> > API without prior notice. This series is reworking the rte_devargs changes
> > in a way hopefully compliant to the new failover PMD and still keeping API
> > compatible with earlier releases.
> 
> This patchset seems target 17.08, but 17.08 already released and some of
> the patches in this patchset seems included in the release.
> 
> Patchset needs to be rebased on top of latest HEAD.
> 

The relevant fixes in this patchset were included. Other "fixes" that
tried to impose a certain API and one way of doing things were not.

These evolutions should have been proposed within the proposal window.
I am currently working on a series addressing a few of those elements.

> > 
> > The introduced changes to 17.08-rc1 are trading the tightly coupling of
> > struct rte_devargs to the PCI and vdev bus against the struct rte_bus.
> > The changes proposed in this series decouple struct rte_devargs from
> > the new dependencies.
> > 
> > Changes since v1:
> > - explicitly pass busname to rte_eal_devargs_parse() and validate it
> > - better explain why changes are done
> > 
> > Jan Blunck (15):
> >   Revert "devargs: make device types generic"
> >   devargs: fix unittest
> >   devargs: extend unittest
> >   devargs: deprecate enum rte_devtype based functions
> >   pci: use scan_mode configuration
> >   bus: add configuration interface for buses
> >   devargs: use bus configuration interface to set scanning mode
> >   devargs: use existing functions in rte_eal_devargs_parse()
> >   devargs: add busname string field
> >   devargs: use busname
> >   pci: use busname
> >   vdev: use busname
> >   devargs: pass busname argument when parsing
> >   devargs: remove type field
> >   devargs: remove bus field
> 
> <...>
> 

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH v2] crypto/openssl: add openssl path for cross compile

2017-09-05 Thread De Lara Guarch, Pablo
Hi Akhil,

> -Original Message-
> From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> Sent: Tuesday, August 29, 2017 8:02 AM
> To: dev@dpdk.org; De Lara Guarch, Pablo
> 
> Cc: hemant.agra...@nxp.com; Doherty, Declan
> ; Akhil Goyal 
> Subject: [PATCH v2] crypto/openssl: add openssl path for cross compile
> 
> OPENSSL_PATH should be defined in case openssl driver is cross compiled
> 
> Signed-off-by: Akhil Goyal 
> ---

...

> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -151,7 +151,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)
> += -lrte_pmd_aesni_mb
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)+= -
> L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -
> lrte_pmd_aesni_gcm
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -
> L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
> -_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -lrte_pmd_openssl -
> lcrypto
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -
> L${OPENSSL_PATH}/lib -lrte_pmd_openssl -lcrypto

I am getting the following messages when compiling:

/usr/bin/ld: skipping incompatible /lib/libcrypto.so when searching for -lcrypto
/usr/bin/ld: skipping incompatible /lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /lib/libm.so when searching for -lm

Since, OPENSSL_PATH is not defined in my system, it is trying to link against 
libraries in /lib/.
I suggest adding a condition to add the openssl directory only if OPENSSL_PATH 
is defined:

+ifeq ($(OPENSSL_PATH),)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -lrte_pmd_openssl -lcrypto
+else
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -L${OPENSSL_PATH}/lib 
-lrte_pmd_openssl -lcrypto
+endif

Would this work for you?

Thanks,
Pablo


Re: [dpdk-dev] [PATCH 5/6] eal: remove xen dom0 support

2017-09-05 Thread Thomas Monjalon
05/09/2017 10:07, Tan, Jianfeng:
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> > 05/09/2017 05:41, Tan, Jianfeng:
> > > From: Richardson, Bruce
> > > >
> > > > Reading the contributors guide section on ABI, specifically
> > > > http://dpdk.org/doc/guides/contributing/versioning.html#deprecating-
> > an-
> > > > entire-abi-version
> > > > it seems like we should collapse down the versions to a single one
> > > > following the function removal, and also increment the whole library so
> > > > version.
> > >
> > > So for lib/librte_eal/linuxapp/eal/rte_eal_version.map, we should change
> > it in below way?
> > >
> > > DPDK_2.1 {
> > > {APIs in DPDK_2.0 except xen APIs}
> > > ...
> > > };
> > >
> > > DPDK_16.04 {
> > > {APIs in DPDK_2.1 except xen APIs}
> > > ...
> > > } DPDK_2.1;
> > 
> > No, you don't need to collapse. You can just remove Xen functions.
> 
> Thanks.
> 
> Besides, two more things: 
> 1. Shall we increase the so version?

Yes

> 2. This patch is about 8K lines long, do we need to split?

I do not know how you can split a removal.
If you have ideas, feel free.



Re: [dpdk-dev] [PATCH v2] crypto/openssl: add openssl path for cross compile

2017-09-05 Thread Akhil Goyal

Hi Pablo,
On 9/5/2017 1:52 PM, De Lara Guarch, Pablo wrote:

Hi Akhil,


-Original Message-
From: Akhil Goyal [mailto:akhil.go...@nxp.com]
Sent: Tuesday, August 29, 2017 8:02 AM
To: dev@dpdk.org; De Lara Guarch, Pablo

Cc: hemant.agra...@nxp.com; Doherty, Declan
; Akhil Goyal 
Subject: [PATCH v2] crypto/openssl: add openssl path for cross compile

OPENSSL_PATH should be defined in case openssl driver is cross compiled

Signed-off-by: Akhil Goyal 
---


...


--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -151,7 +151,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)
+= -lrte_pmd_aesni_mb
  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)+= -
L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -
lrte_pmd_aesni_gcm
  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -
L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
-_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -lrte_pmd_openssl -
lcrypto
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -
L${OPENSSL_PATH}/lib -lrte_pmd_openssl -lcrypto


I am getting the following messages when compiling:

/usr/bin/ld: skipping incompatible /lib/libcrypto.so when searching for -lcrypto
/usr/bin/ld: skipping incompatible /lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /lib/libm.so when searching for -lm

Since, OPENSSL_PATH is not defined in my system, it is trying to link against 
libraries in /lib/.
I suggest adding a condition to add the openssl directory only if OPENSSL_PATH 
is defined:

+ifeq ($(OPENSSL_PATH),)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -lrte_pmd_openssl -lcrypto
+else
  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -L${OPENSSL_PATH}/lib 
-lrte_pmd_openssl -lcrypto
+endif

Would this work for you?


Thanks for the suggestion.
yes this would be fine. I will update the patch accordingly.

-Akhil


Re: [dpdk-dev] [PATCH v7 3/9] linuxapp/eal_pci: get iommu class

2017-09-05 Thread santosh
Hi Anatoly,


On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
>> From: Santosh Shukla [mailto:santosh.shu...@caviumnetworks.com]
>> Sent: Thursday, August 31, 2017 4:26 AM
>> To: dev@dpdk.org
>> Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
>> hemant.agra...@nxp.com; olivier.m...@6wind.com;
>> maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
>> ; Richardson, Bruce
>> ; shreyansh.j...@nxp.com;
>> gaetan.ri...@6wind.com; Burakov, Anatoly ;
>> step...@networkplumber.org; acon...@redhat.com; Santosh Shukla
>> 
>> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
>>
>> Get iommu class of PCI device on the bus and returns preferred iova
>> mapping mode for that bus.
>>
>> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
>> Flag used when driver needs to operate in iova=va mode.
>>
>> Algorithm for iova scheme selection for PCI bus:
>> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
>> goto 1).
>> 1. Look for device attached to vfio kdrv and has .drv_flag set to
>> RTE_PCI_DRV_IOVA_AS_VA.
>> 2. Look for any device attached to UIO class of driver.
>> 3. Check for vfio-noiommu mode enabled.
>>
>> If 2) & 3) is false and 1) is true then select mapping scheme as RTE_IOVA_VA.
>> Otherwise use default mapping scheme (RTE_IOVA_PA).
>>
>> Signed-off-by: Santosh Shukla 
>> Signed-off-by: Jerin Jacob 
>> Reviewed-by: Maxime Coquelin 
>> Acked-by: Hemant Agrawal 
>> ---
>> v6 --> v7:
>> - squashed v6 series patch no [01/12] & [05/12]..
>>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
>> comment).
>>
>>  lib/librte_eal/common/include/rte_pci.h |  2 +
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 95
>> +
>>  lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +
>>  lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
>>  5 files changed, 121 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/include/rte_pci.h
>> b/lib/librte_eal/common/include/rte_pci.h
>> index 0e36de093..a67d77f22 100644
>> --- a/lib/librte_eal/common/include/rte_pci.h
>> +++ b/lib/librte_eal/common/include/rte_pci.h
>> @@ -202,6 +202,8 @@ struct rte_pci_bus {  #define
>> RTE_PCI_DRV_INTR_RMV 0x0010
>>  /** Device driver needs to keep mapped resources if unsupported dev
>> detected */  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
>> +/** Device driver supports iova as va */ #define
>> RTE_PCI_DRV_IOVA_AS_VA
>> +0X0040
>>
>>  /**
>>   * A structure describing a PCI mapping.
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> index 8951ce742..9725fd493 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>> @@ -45,6 +45,7 @@
>>  #include "eal_filesystem.h"
>>  #include "eal_private.h"
>>  #include "eal_pci_init.h"
>> +#include "eal_vfio.h"
>>
>>  /**
>>   * @file
>> @@ -487,6 +488,100 @@ rte_pci_scan(void)
>>  return -1;
>>  }
>>
>> +/*
>> + * Is pci device bound to any kdrv
>> + */
>> +static inline int
>> +pci_device_is_bound(void)
>> +{
>> +struct rte_pci_device *dev = NULL;
>> +int ret = 0;
>> +
>> +FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +if (dev->kdrv == RTE_KDRV_UNKNOWN ||
>> +dev->kdrv == RTE_KDRV_NONE) {
>> +continue;
>> +} else {
>> +ret = 1;
>> +break;
>> +}
>> +}
>> +return ret;
>> +}
>> +
>> +/*
>> + * Any one of the device bound to uio
>> + */
>> +static inline int
>> +pci_device_bound_uio(void)
>> +{
>> +struct rte_pci_device *dev = NULL;
>> +
>> +FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +if (dev->kdrv == RTE_KDRV_IGB_UIO ||
>> +   dev->kdrv == RTE_KDRV_UIO_GENERIC) {
>> +return 1;
>> +}
>> +}
>> +return 0;
>> +}
>> +
>> +/*
>> + * Any one of the device has iova as va  */ static inline int
>> +pci_device_has_iova_va(void)
>> +{
>> +struct rte_pci_device *dev = NULL;
>> +struct rte_pci_driver *drv = NULL;
>> +
>> +FOREACH_DRIVER_ON_PCIBUS(drv) {
>> +if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
>> +FOREACH_DEVICE_ON_PCIBUS(dev) {
>> +if (dev->kdrv == RTE_KDRV_VFIO &&
>> +rte_pci_match(drv, dev))
>> +return 1;
>> +}
>> +}
>> +}
>> +return 0;
>> +}
>> +
>> +/*
>> + * Get iommu class of PCI devices on the bus.
>> + */
>> +enum rte_iova_mode
>> +rte_pci_get_iommu_class(void)
>> +{
>> +bool is_bound;
>> +bool is_vfio_noiommu_enabled = true;
>> +bool has_iova_va;
>> +bool is_bound_uio;
>> +
>> +is_bound = pci_device_is_bound();
>> +if (!is_bound)
>> +return RTE_IOVA_DC;
>> +
>> +has_iova_va = pci_device_has_iova_va();
>> +is_bound_uio = pci_device

Re: [dpdk-dev] [PATCH 1/8] crypto/aesni_gcm: do not append digest

2017-09-05 Thread De Lara Guarch, Pablo
Hi Fan,

> -Original Message-
> From: Zhang, Roy Fan
> Sent: Monday, September 4, 2017 11:08 AM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Zhang, Roy Fan 
> Subject: RE: [dpdk-dev] [PATCH 1/8] crypto/aesni_gcm: do not append
> digest
> 
> Hi Pablo,
> 
> Thanks for the patch. It is a very good idea to allocate only the necessary
> buffer for digests in the operation.
> Comments inline:
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Pablo de Lara
> > Sent: Friday, August 18, 2017 8:21 AM
> > To: Doherty, Declan ;
> > jerin.ja...@caviumnetworks.com
> > Cc: dev@dpdk.org; De Lara Guarch, Pablo
> > 
> > Subject: [dpdk-dev] [PATCH 1/8] crypto/aesni_gcm: do not append digest
> >
> > When performing an authentication verification, the PMD was using
> > memory at the end of the input buffer, to store temporarily the digest.
> > This operation requires the buffer to have enough tailroom unnecessarily.
> > Instead, memory is allocated for each queue pair, to store temporarily
> > the digest generated by the driver, so it can be compared with the one
> > provided in the crypto operation, without needing to touch the input
> buffer.
> >
> > Signed-off-by: Pablo de Lara 
> > ---
> >  drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 31 +
> ---
> >  drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h |  7 ++
> >  2 files changed, 13 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> > b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> > index d9c91d0..ae670a7 100644
> > --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> > +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> > @@ -298,14 +298,7 @@ process_gcm_crypto_op(struct aesni_gcm_qp
> *qp,
> > struct rte_crypto_op *op,
> > sym_op->aead.digest.data,
> > (uint64_t)session->digest_length);
> > } else if (session->op ==
> > AESNI_GCM_OP_AUTHENTICATED_DECRYPTION) {
> > -   uint8_t *auth_tag = (uint8_t
> > *)rte_pktmbuf_append(sym_op->m_dst ?
> > -   sym_op->m_dst : sym_op->m_src,
> > -   session->digest_length);
> > -
> > -   if (!auth_tag) {
> > -   GCM_LOG_ERR("auth_tag");
> > -   return -1;
> > -   }
> 
> qp->tmp_digest already has the data type uint8_t *, so the casting is not
> necessary here. Also, the "&" here seems to be wrong: auth_tag doesn't
> point to qp->tmp_digest but to a buffer whose address is &qp->tmp_digest.

Thanks for spotting this! Will send a v2 soon.

Pablo
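
To illustrate the approach discussed above — scratch digest space kept per queue pair instead of appending to the mbuf — here is a minimal sketch. The structure layout and function below are simplified stand-ins, not the actual aesni_gcm code; the real driver may hold the scratch space differently.

#include <stdint.h>
#include <string.h>

#define MAX_DIGEST_LEN 64

/* Hypothetical queue-pair layout: per-qp scratch space for computed digests,
 * so verification never needs tailroom in the source mbuf. */
struct gcm_qp_stub {
	uint8_t tmp_digest[MAX_DIGEST_LEN];
	/* ... rings, stats, session data ... */
};

/* Verification path sketch: the computed tag is written into qp->tmp_digest
 * (the array itself, not &qp->tmp_digest as flagged in the review), then
 * compared with the digest provided in the crypto operation. */
static int
verify_digest(struct gcm_qp_stub *qp,
	      void (*finalize)(uint8_t *tag, size_t len),
	      const uint8_t *op_digest, size_t len)
{
	uint8_t *auth_tag = qp->tmp_digest;	/* no rte_pktmbuf_append() */

	finalize(auth_tag, len);		/* GCM tag generation stub */
	return memcmp(auth_tag, op_digest, len) == 0 ? 0 : -1;
}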


[dpdk-dev] [PATCH] service: fix service lcore stop function

2017-09-05 Thread Guduri Prathyusha
lcore_states stores the state of each lcore. Fix the invalid
indexing of lcore_states, which used the service number instead of
the lcore id.

Fixes: 21698354c832 ("service: introduce service cores concept")

Signed-off-by: Guduri Prathyusha 
---
 lib/librte_eal/common/rte_service.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_service.c 
b/lib/librte_eal/common/rte_service.c
index 7efb76dc8..2ac77cc2a 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -609,7 +609,7 @@ rte_service_lcore_stop(uint32_t lcore)
uint32_t i;
for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
int32_t enabled =
-   lcore_states[i].service_mask & (UINT64_C(1) << i);
+   lcore_states[lcore].service_mask & (UINT64_C(1) << i);
int32_t service_running = rte_services[i].runstate !=
RUNSTATE_STOPPED;
int32_t only_core = rte_services[i].num_mapped_cores == 1;
--
2.14.1



Re: [dpdk-dev] [PATCH v7 3/9] linuxapp/eal_pci: get iommu class

2017-09-05 Thread Burakov, Anatoly
> From: santosh [mailto:santosh.shu...@caviumnetworks.com]
> Sent: Tuesday, September 5, 2017 9:48 AM
> To: Burakov, Anatoly ; dev@dpdk.org
> Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
> hemant.agra...@nxp.com; olivier.m...@6wind.com;
> maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
> ; Richardson, Bruce
> ; shreyansh.j...@nxp.com;
> gaetan.ri...@6wind.com; step...@networkplumber.org;
> acon...@redhat.com
> Subject: Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> 
> Hi Anatoly,
> 
> 
> On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
> >> From: Santosh Shukla [mailto:santosh.shu...@caviumnetworks.com]
> >> Sent: Thursday, August 31, 2017 4:26 AM
> >> To: dev@dpdk.org
> >> Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
> >> hemant.agra...@nxp.com; olivier.m...@6wind.com;
> >> maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
> >> ; Richardson, Bruce
> >> ; shreyansh.j...@nxp.com;
> >> gaetan.ri...@6wind.com; Burakov, Anatoly
> ;
> >> step...@networkplumber.org; acon...@redhat.com; Santosh Shukla
> >> 
> >> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> >>
> >> Get iommu class of PCI device on the bus and returns preferred iova
> >> mapping mode for that bus.
> >>
> >> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> >> Flag used when driver needs to operate in iova=va mode.
> >>
> >> Algorithm for iova scheme selection for PCI bus:
> >> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
> >> goto 1).
> >> 1. Look for device attached to vfio kdrv and has .drv_flag set to
> >> RTE_PCI_DRV_IOVA_AS_VA.
> >> 2. Look for any device attached to UIO class of driver.
> >> 3. Check for vfio-noiommu mode enabled.
> >>
> >> If 2) & 3) is false and 1) is true then select mapping scheme as
> RTE_IOVA_VA.
> >> Otherwise use default mapping scheme (RTE_IOVA_PA).
> >>
> >> Signed-off-by: Santosh Shukla 
> >> Signed-off-by: Jerin Jacob 
> >> Reviewed-by: Maxime Coquelin 
> >> Acked-by: Hemant Agrawal 
> >> ---
> >> v6 --> v7:
> >> - squashed v6 series patch no [01/12] & [05/12]..
> >>   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
> >> comment).
> >>
> >>  lib/librte_eal/common/include/rte_pci.h |  2 +
> >>  lib/librte_eal/linuxapp/eal/eal_pci.c   | 95
> >> +
> >>  lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +
> >>  lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
> >>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
> >>  5 files changed, 121 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/common/include/rte_pci.h
> >> b/lib/librte_eal/common/include/rte_pci.h
> >> index 0e36de093..a67d77f22 100644
> >> --- a/lib/librte_eal/common/include/rte_pci.h
> >> +++ b/lib/librte_eal/common/include/rte_pci.h
> >> @@ -202,6 +202,8 @@ struct rte_pci_bus {  #define
> >> RTE_PCI_DRV_INTR_RMV 0x0010
> >>  /** Device driver needs to keep mapped resources if unsupported dev
> >> detected */  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
> >> +/** Device driver supports iova as va */ #define
> >> RTE_PCI_DRV_IOVA_AS_VA
> >> +0X0040
> >>
> >>  /**
> >>   * A structure describing a PCI mapping.
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> b/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> index 8951ce742..9725fd493 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> >> @@ -45,6 +45,7 @@
> >>  #include "eal_filesystem.h"
> >>  #include "eal_private.h"
> >>  #include "eal_pci_init.h"
> >> +#include "eal_vfio.h"
> >>
> >>  /**
> >>   * @file
> >> @@ -487,6 +488,100 @@ rte_pci_scan(void)
> >>return -1;
> >>  }
> >>
> >> +/*
> >> + * Is pci device bound to any kdrv
> >> + */
> >> +static inline int
> >> +pci_device_is_bound(void)
> >> +{
> >> +  struct rte_pci_device *dev = NULL;
> >> +  int ret = 0;
> >> +
> >> +  FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +  if (dev->kdrv == RTE_KDRV_UNKNOWN ||
> >> +  dev->kdrv == RTE_KDRV_NONE) {
> >> +  continue;
> >> +  } else {
> >> +  ret = 1;
> >> +  break;
> >> +  }
> >> +  }
> >> +  return ret;
> >> +}
> >> +
> >> +/*
> >> + * Any one of the device bound to uio  */ static inline int
> >> +pci_device_bound_uio(void)
> >> +{
> >> +  struct rte_pci_device *dev = NULL;
> >> +
> >> +  FOREACH_DEVICE_ON_PCIBUS(dev) {
> >> +  if (dev->kdrv == RTE_KDRV_IGB_UIO ||
> >> + dev->kdrv == RTE_KDRV_UIO_GENERIC) {
> >> +  return 1;
> >> +  }
> >> +  }
> >> +  return 0;
> >> +}
> >> +
> >> +/*
> >> + * Any one of the device has iova as va  */ static inline int
> >> +pci_device_has_iova_va(void)
> >> +{
> >> +  struct rte_pci_device *dev = NULL;
> >> +  struct rte_pci_driver *drv = NULL;
> >> +
> >> +  FOREACH_DRIVER_ON_PCIBUS(drv) {
> >> +  if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
> >> +  FOREACH_DEVICE_ON_PCIBUS(dev) {
> 

Re: [dpdk-dev] [PATCH v7 3/9] linuxapp/eal_pci: get iommu class

2017-09-05 Thread santosh
Hi Anatoly,


On Tuesday 05 September 2017 02:25 PM, Burakov, Anatoly wrote:
>> From: santosh [mailto:santosh.shu...@caviumnetworks.com]
>> Sent: Tuesday, September 5, 2017 9:48 AM
>> To: Burakov, Anatoly ; dev@dpdk.org
>> Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
>> hemant.agra...@nxp.com; olivier.m...@6wind.com;
>> maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
>> ; Richardson, Bruce
>> ; shreyansh.j...@nxp.com;
>> gaetan.ri...@6wind.com; step...@networkplumber.org;
>> acon...@redhat.com
>> Subject: Re: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
>>
>> Hi Anatoly,
>>
>>
>> On Monday 04 September 2017 08:38 PM, Burakov, Anatoly wrote:
 From: Santosh Shukla [mailto:santosh.shu...@caviumnetworks.com]
 Sent: Thursday, August 31, 2017 4:26 AM
 To: dev@dpdk.org
 Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
 hemant.agra...@nxp.com; olivier.m...@6wind.com;
 maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
 ; Richardson, Bruce
 ; shreyansh.j...@nxp.com;
 gaetan.ri...@6wind.com; Burakov, Anatoly
>> ;
 step...@networkplumber.org; acon...@redhat.com; Santosh Shukla
 
 Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class

 Get iommu class of PCI device on the bus and returns preferred iova
 mapping mode for that bus.

 Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
 Flag used when driver needs to operate in iova=va mode.

 Algorithm for iova scheme selection for PCI bus:
 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
 goto 1).
 1. Look for device attached to vfio kdrv and has .drv_flag set to
 RTE_PCI_DRV_IOVA_AS_VA.
 2. Look for any device attached to UIO class of driver.
 3. Check for vfio-noiommu mode enabled.

 If 2) & 3) is false and 1) is true then select mapping scheme as
>> RTE_IOVA_VA.
 Otherwise use default mapping scheme (RTE_IOVA_PA).

 Signed-off-by: Santosh Shukla 
 Signed-off-by: Jerin Jacob 
 Reviewed-by: Maxime Coquelin 
 Acked-by: Hemant Agrawal 
 ---
 v6 --> v7:
 - squashed v6 series patch no [01/12] & [05/12]..
   i.e.. moved RTE_PCI_DRV_IOVA_AS_VA flag into this patch. (Aaron
 comment).

  lib/librte_eal/common/include/rte_pci.h |  2 +
  lib/librte_eal/linuxapp/eal/eal_pci.c   | 95
 +
  lib/librte_eal/linuxapp/eal/eal_vfio.c  | 19 +
  lib/librte_eal/linuxapp/eal/eal_vfio.h  |  4 ++
  lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
  5 files changed, 121 insertions(+)

 diff --git a/lib/librte_eal/common/include/rte_pci.h
 b/lib/librte_eal/common/include/rte_pci.h
 index 0e36de093..a67d77f22 100644
 --- a/lib/librte_eal/common/include/rte_pci.h
 +++ b/lib/librte_eal/common/include/rte_pci.h
 @@ -202,6 +202,8 @@ struct rte_pci_bus {  #define
 RTE_PCI_DRV_INTR_RMV 0x0010
  /** Device driver needs to keep mapped resources if unsupported dev
 detected */  #define RTE_PCI_DRV_KEEP_MAPPED_RES 0x0020
 +/** Device driver supports iova as va */ #define
 RTE_PCI_DRV_IOVA_AS_VA
 +0X0040

  /**
   * A structure describing a PCI mapping.
 diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c
 b/lib/librte_eal/linuxapp/eal/eal_pci.c
 index 8951ce742..9725fd493 100644
 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
 +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
 @@ -45,6 +45,7 @@
  #include "eal_filesystem.h"
  #include "eal_private.h"
  #include "eal_pci_init.h"
 +#include "eal_vfio.h"

  /**
   * @file
 @@ -487,6 +488,100 @@ rte_pci_scan(void)
return -1;
  }

 +/*
 + * Is pci device bound to any kdrv
 + */
 +static inline int
 +pci_device_is_bound(void)
 +{
 +  struct rte_pci_device *dev = NULL;
 +  int ret = 0;
 +
 +  FOREACH_DEVICE_ON_PCIBUS(dev) {
 +  if (dev->kdrv == RTE_KDRV_UNKNOWN ||
 +  dev->kdrv == RTE_KDRV_NONE) {
 +  continue;
 +  } else {
 +  ret = 1;
 +  break;
 +  }
 +  }
 +  return ret;
 +}
 +
 +/*
 + * Any one of the device bound to uio  */ static inline int
 +pci_device_bound_uio(void)
 +{
 +  struct rte_pci_device *dev = NULL;
 +
 +  FOREACH_DEVICE_ON_PCIBUS(dev) {
 +  if (dev->kdrv == RTE_KDRV_IGB_UIO ||
 + dev->kdrv == RTE_KDRV_UIO_GENERIC) {
 +  return 1;
 +  }
 +  }
 +  return 0;
 +}
 +
 +/*
 + * Any one of the device has iova as va  */ static inline int
 +pci_device_has_iova_va(void)
 +{
 +  struct rte_pci_device *dev = NULL;
 +  struct rte_pci_driver *drv = NULL;
 +
 +  FOREACH_DRIVER_ON_PCIBUS(drv) {
 +  if (drv && drv->drv_

Re: [dpdk-dev] [PATCH v7 3/9] linuxapp/eal_pci: get iommu class

2017-09-05 Thread Burakov, Anatoly
> From: Santosh Shukla [mailto:santosh.shu...@caviumnetworks.com]
> Sent: Thursday, August 31, 2017 4:26 AM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; jerin.ja...@caviumnetworks.com;
> hemant.agra...@nxp.com; olivier.m...@6wind.com;
> maxime.coque...@redhat.com; Gonzalez Monroy, Sergio
> ; Richardson, Bruce
> ; shreyansh.j...@nxp.com;
> gaetan.ri...@6wind.com; Burakov, Anatoly ;
> step...@networkplumber.org; acon...@redhat.com; Santosh Shukla
> 
> Subject: [PATCH v7 3/9] linuxapp/eal_pci: get iommu class
> 
> Get iommu class of PCI device on the bus and returns preferred iova
> mapping mode for that bus.
> 
> Patch also introduces RTE_PCI_DRV_IOVA_AS_VA drv flag.
> Flag used when driver needs to operate in iova=va mode.
> 
> Algorithm for iova scheme selection for PCI bus:
> 0. If no device bound then return with RTE_IOVA_DC mapping mode, else
> goto 1).
> 1. Look for device attached to vfio kdrv and has .drv_flag set to
> RTE_PCI_DRV_IOVA_AS_VA.
> 2. Look for any device attached to UIO class of driver.
> 3. Check for vfio-noiommu mode enabled.
> 
> If 2) & 3) is false and 1) is true then select mapping scheme as RTE_IOVA_VA.
> Otherwise use default mapping scheme (RTE_IOVA_PA).
> 
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> Reviewed-by: Maxime Coquelin 
> Acked-by: Hemant Agrawal 
> ---

Reviewed-by: Anatoly Burakov 
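
Condensed, the selection described in the commit message amounts to the sketch below. The boolean names follow the patch quoted earlier in this thread; this is an illustration of the decision only, not the eal_pci.c code, and the mode names are local stand-ins for RTE_IOVA_DC/PA/VA.

#include <stdbool.h>

enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Sketch of the IOVA mode decision from the commit message above. */
static enum iova_mode
iova_mode_decision(bool is_bound, bool has_iova_va,
		   bool is_bound_uio, bool is_vfio_noiommu_enabled)
{
	if (!is_bound)				/* 0. no device bound */
		return IOVA_DC;
	if (has_iova_va && !is_bound_uio &&	/* 1. true, 2. false */
	    !is_vfio_noiommu_enabled)		/*          3. false */
		return IOVA_VA;
	return IOVA_PA;				/* default mapping scheme */
}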





[dpdk-dev] [PATCH] net/cxgbe: fix memory leak

2017-09-05 Thread Congwen Zhang
In function t4_wr_mbox_meat_timeout(), the dynamic memory stored
in 'temp' is not freed when the function returns early, which is a
possible memory leak.

Signed-off-by: Congwen Zhang 
---
 drivers/net/cxgbe/base/t4_hw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/cxgbe/base/t4_hw.c b/drivers/net/cxgbe/base/t4_hw.c
index a8ccea0..013d996 100644
--- a/drivers/net/cxgbe/base/t4_hw.c
+++ b/drivers/net/cxgbe/base/t4_hw.c
@@ -403,6 +403,7 @@ int t4_wr_mbox_meat_timeout(struct adapter *adap, int mbox,
t4_os_atomic_list_del(&entry, &adap->mbox_list,
  &adap->mbox_lock);
t4_report_fw_error(adap);
+   free(temp);
return (pcie_fw & F_PCIE_FW_ERR) ? -ENXIO : -EBUSY;
}
 
@@ -446,6 +447,7 @@ int t4_wr_mbox_meat_timeout(struct adapter *adap, int mbox,
 &adap->mbox_list,
 &adap->mbox_lock));
t4_report_fw_error(adap);
+   free(temp);
return (v == X_MBOWNER_FW ? -EBUSY : -ETIMEDOUT);
}
 
@@ -546,6 +548,7 @@ int t4_wr_mbox_meat_timeout(struct adapter *adap, int mbox,
T4_OS_MBOX_LOCKING(
t4_os_atomic_list_del(&entry, &adap->mbox_list,
  &adap->mbox_lock));
+   free(temp);
return -G_FW_CMD_RETVAL((int)res);
}
}
-- 
1.8.3.1
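
As a side note, an alternative to repeating free(temp) on every early-return path is a single exit label. A hedged sketch of that pattern, illustrative only and not the submitted fix:

#include <errno.h>
#include <stdlib.h>

/* Illustration: funnel every exit through one label so the scratch buffer
 * is released exactly once, whatever path the function takes. */
static int
do_mbox_work(int fail_early)
{
	void *temp = malloc(256);
	int ret = 0;

	if (temp == NULL)
		return -ENOMEM;

	if (fail_early) {
		ret = -EBUSY;		/* early error path */
		goto out;
	}

	/* ... normal mailbox processing would go here ... */

out:
	free(temp);			/* single cleanup site */
	return ret;
}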



Re: [dpdk-dev] [PATCH 3/3] net/mlx5: fix interrupt enable return value

2017-09-05 Thread Shachar Beiser
OK,
I will change the Fixes tag to commit e1016cb733 ("net/mlx5: fix Rx interrupts management").


> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Monday, September 4, 2017 6:24 PM
> To: Shachar Beiser 
> Cc: dev@dpdk.org; Adrien Mazarguil ;
> sta...@dpdk.org
> Subject: Re: [PATCH 3/3] net/mlx5: fix interrupt enable return value
> 
> On Mon, Sep 04, 2017 at 11:48:47AM +, Shachar Beiser wrote:
> > Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt
> > event")
> 
> It should fix commit e1016cb733 ("net/mlx5: fix Rx interrupts management")
> 
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Shachar Beiser 
> > ---
> >  drivers/net/mlx5/mlx5_rxq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 437dc02..24887fb 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -1330,7 +1330,7 @@
> > struct priv *priv = mlx5_get_priv(dev);
> > struct rxq *rxq = (*priv->rxqs)[rx_queue_id];
> > struct rxq_ctrl *rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
> > -   int ret;
> > +   int ret = 0;
> >
> > if (!rxq || !rxq_ctrl->channel) {
> > ret = EINVAL;
> > --
> > 1.8.3.1
> >
> 
> --
> Nélio Laranjeiro
> 6WIND


Re: [dpdk-dev] [PATCH 1/3] net/mlx5: replace network to host macros

2017-09-05 Thread Shachar Beiser
Is it OK to change it to
"Fixes: 1be17b6a5539 ("eal: introduce big and little endian types")"?

> -Original Message-
> From: Nélio Laranjeiro [mailto:nelio.laranje...@6wind.com]
> Sent: Monday, September 4, 2017 6:15 PM
> To: Shachar Beiser 
> Cc: dev@dpdk.org; Adrien Mazarguil 
> Subject: Re: [PATCH 1/3] net/mlx5: replace network to host macros
> 
> The title is a little wrong, it also replaces the host-to-network conversions.
> 
> On Mon, Sep 04, 2017 at 11:48:45AM +, Shachar Beiser wrote:
> > Fixes: 8bb5119634b7 ("net/mlx5: replace network byte order macro")
> 
> This commit does not exists, are you sure of it?
> 
> > Cc: shacha...@mellanox.com
> >
> > Signed-off-by: Shachar Beiser 
> 
> --
> Nélio Laranjeiro
> 6WIND


[dpdk-dev] [PATCH v3] crypto/openssl: add openssl path for cross compile

2017-09-05 Thread Akhil Goyal
OPENSSL_PATH should be defined in case the openssl
driver is cross compiled.

Signed-off-by: Akhil Goyal 
---
Changes in v3:
make OPENSSL_PATH usage conditional, so it is used only when set, as suggested by Pablo
 doc/guides/cryptodevs/openssl.rst | 4 
 drivers/crypto/openssl/Makefile   | 7 +++
 mk/rte.app.mk | 4 
 3 files changed, 15 insertions(+)

diff --git a/doc/guides/cryptodevs/openssl.rst 
b/doc/guides/cryptodevs/openssl.rst
index f18a456..08cc9ba 100644
--- a/doc/guides/cryptodevs/openssl.rst
+++ b/doc/guides/cryptodevs/openssl.rst
@@ -88,6 +88,10 @@ sudo apt-get install libc6-dev-i386 (for 
i686-native-linuxapp-gcc target)
 This code was also verified on Fedora 24.
 This code was NOT yet verified on FreeBSD.
 
+In case openssl is cross compiled, openssl can be installed separately
+and path for openssl install directory can be given as
+export OPENSSL_PATH=
+
 Initialization
 --
 
diff --git a/drivers/crypto/openssl/Makefile b/drivers/crypto/openssl/Makefile
index e5fdfb5..a6f13e0 100644
--- a/drivers/crypto/openssl/Makefile
+++ b/drivers/crypto/openssl/Makefile
@@ -35,6 +35,9 @@ LIB = librte_pmd_openssl.a
 
 # build flags
 CFLAGS += -O3
+ifneq ($(OPENSSL_PATH),)
+CFLAGS += -I${OPENSSL_PATH}/include/
+endif
 CFLAGS += $(WERROR_FLAGS)
 
 # library version
@@ -44,7 +47,11 @@ LIBABIVER := 1
 EXPORT_MAP := rte_pmd_openssl_version.map
 
 # external library dependencies
+ifneq ($(OPENSSL_PATH),)
+LDLIBS += -L${OPENSSL_PATH}/lib/ -lcrypto
+else
 LDLIBS += -lcrypto
+endif
 
 # library source files
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += rte_openssl_pmd.c
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index c25fdd9..799aa99 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -151,7 +151,11 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)+= 
-lrte_pmd_aesni_mb
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)+= 
-L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += -lrte_pmd_aesni_gcm
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_GCM)   += 
-L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
+ifeq ($(OPENSSL_PATH),)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -lrte_pmd_openssl -lcrypto
+else
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_OPENSSL) += -L${OPENSSL_PATH}/lib 
-lrte_pmd_openssl -lcrypto
+endif # ($(OPENSSL_PATH),)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO) += -lrte_pmd_null_crypto
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += -lrte_pmd_qat -lcrypto
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)  += -lrte_pmd_snow3g
-- 
2.9.3



Re: [dpdk-dev] [PATCH v2] net/bonding: support bifurcated driver in eal cli using --vdev

2017-09-05 Thread Thomas Monjalon
Ping - any news?

31/07/2017 16:34, Gaëtan Rivet:
> Hi Gowrishankar, Declan,
> 
> On Mon, Jul 10, 2017 at 12:02:24PM +0530, gowrishankar muthukrishnan wrote:
> > On Friday 07 July 2017 09:08 PM, Declan Doherty wrote:
> > >On 04/07/2017 12:57 PM, Gowrishankar wrote:
> > >>From: Gowrishankar Muthukrishnan 
> > >>
> > >>At present, creating bonding devices using --vdev is broken for PMD like
> > >>mlx5 as it is neither UIO nor VFIO based and hence PMD driver is unknown
> > >>to find_port_id_by_pci_addr(), as below.
> > >>
> > >>testpmd  --vdev 'net_bonding0,mode=1,slave=,socket_id=0'
> > >>
> > >>PMD: bond_ethdev_parse_slave_port_kvarg(150) - Invalid slave port value
> > >> () specified
> > >>EAL: Failed to parse slave ports for bonded device net_bonding0
> > >>
> > >>This patch fixes parsing PCI ID from bonding device params by verifying
> > >>it in RTE PCI bus, rather than checking dev->kdrv.
> > >>
> > >>Changes:
> > >>  v2 - revisit fix by iterating rte_pci_bus
> > >>
> > >>Signed-off-by: Gowrishankar Muthukrishnan
> > >>
> > >>---
> > >...
> > >>
> > >
> > >Hey Gowrishankar,
> > >
> > >I was having a look at this patch and there is the following checkpatch
> > >error.
> > >
> > >_coding style issues_
> > >
> > >
> > >WARNING:AVOID_EXTERNS: externs should be avoided in .c files
> > >#48: FILE: drivers/net/bonding/rte_eth_bond_args.c:43:
> > >+extern struct rte_pci_bus rte_pci_bus;
> > >
> > Hi Declan,
> > Thank you for your review.
> > Yes, but I also saw some references like above in older code.
> > 
> > >
> > >Looking at bit closer at the issue I think there is a simpler solution,
> > >the bonding driver really shouldn't be parsing the PCI bus directly, and
> > >since PCI devices use the PCI DBF as their name we can simply replace the
> > >all the scanning code with a simple call to rte_eth_dev_get_port_by_name
> > >API.
> > >
> 
> I agree that it would be better to be able to use the ether API for
> this.
> 
> The issue is that PCI devices are inconsistent regarding their names. The
> possibility is given to the user to employ the simplified BDF format for
> PCI device name, instead of the DomBDF format.
> 
> Unfortunately, the default device name for a PCI device is in the DomBDF
> format. This means that the name won't match if the device was probed by
> using the PCI blacklist mode (the default PCI mode).
> 
> The matching must be refined.
> 
> > 
> > But you are removing the option to refer to ports by PCI address, right (as
> > I see parse_port_id() is completely removed in your patch)?
> > IMO, we just need to check that a given eth PCI ID (in case we refer to ports
> > by PCI ID) is one of those EAL scanned on the PCI bus. Also, slaves should not
> > be from any blacklisted PCI IDs (as we test with -b or -w).
> > 
> 
> Declan is right about the iteration of PCI devices. The device list for
> the PCI bus is private, the extern declaration to the rte_pci_bus is the
> telltale sign that there is something wrong in the approach here.
> 
> In order to respect the new rte_bus logic, I think what you want to
> achieve can be done by using the rte_bus->find_device with the correct
> device comparison function.
> 
> static int
> pci_addr_cmp(const struct rte_device *dev, const void *_pci_addr)
> {
> struct rte_pci_device *pdev;
> const char *addr = _pci_addr;
> struct rte_pci_addr paddr;
> static struct rte_bus *pci_bus = NULL;
> 
> if (pci_bus == NULL)
> pci_bus = rte_bus_find_by_name("pci");
> 
> if (pci_bus->parse(addr, &paddr) != 0) {
> /* Invalid PCI addr given as input. */
> return -1;
> }
> pdev = RTE_DEV_TO_PCI(dev);
> return rte_eal_compare_pci_addr(&pdev->addr, &paddr);
> }
> 
> Then verify that you are able to get a device by using it as follows:
> 
> {
> struct rte_bus *pci_bus;
> struct rte_device *dev;
> 
> pci_bus = rte_bus_find_by_name("pci");
> if (pci_bus == NULL) {
> RTE_LOG(ERR, PMD, "Unable to find PCI bus\n");
> return -1;
> }
> dev = pci_bus->find_device(NULL, pci_addr_cmp, devname);
> if (dev == NULL) {
> RTE_LOG(ERR, PMD, "Unable to find the device %s to enslave.\n",
> devname);
> return -EINVAL;
> }
> }
> 
> I hope it's clear enough. You can find examples of use for this API in
> lib/librte_eal/common/eal_common_dev.c
> 
> It's a quick implementation to outline the possible direction, I
> haven't compiled it. It should be refined.
> 
> For example, the PCI address validation should not be happening in the
> comparison function, the pci_bus could be matched once instead of twice,
> etc...
> 
> But the logic should work.
> 
> Best regards,
> 





Re: [dpdk-dev] [PATCH v3] net/af_packet: support Tx scattered mbuf input

2017-09-05 Thread Ferruh Yigit
On 8/18/2017 3:10 PM, Ferruh Yigit wrote:
> On 8/7/2017 10:45 AM, Wenfeng Liu wrote:
>> Signed-off-by: Wenfeng Liu 
> 
> Reviewed-by: Ferruh Yigit 

Applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH 03/21] vhost: protect virtio_net device struct

2017-09-05 Thread Maxime Coquelin



On 09/05/2017 06:45 AM, Tiwei Bie wrote:

On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:

virtio_net device might be accessed while being reallocated
in case of NUMA awareness. This case might be theoretical,
but it will be needed anyway to protect vrings pages against
invalidation.

The virtio_net devs are now protected with a readers/writers
lock, so that before reallocating the device, it is ensured
that it is not being referenced by the processing threads.


[...]
  
+struct virtio_net *

+get_device(int vid)
+{
+   struct virtio_net *dev;
+
+   rte_rwlock_read_lock(&vhost_devices[vid].lock);
+
+   dev = __get_device(vid);
+   if (unlikely(!dev))
+   rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+
+   return dev;
+}
+
+void
+put_device(int vid)
+{
+   rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+}
+


This patch introduced a per-device rwlock which needs to be acquired
unconditionally in the data path. So for each vhost device, the IO
threads of different queues will need to acquire/release this lock
during each enqueue and dequeue operation, which will cause cache
contention when multiple queues are enabled and handled by different
cores. With this patch alone, I saw ~7% performance drop when enabling
6 queues to do 64bytes iofwd loopback test. Is there any way to avoid
introducing this lock to the data path?


First, I'd like to thank you for running the MQ test.
I agree it may have a performance impact in this case.

This lock has currently two purposes:
1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
2. Protect vring pages against invalidation.

For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
case in my early prototypes that had per device IOTLB cache).

For 1., this is an existing problem, so we might consider it is
acceptable to keep current state. Maybe it could be improved by only
reallocating in case VQ0 is not on the right NUMA node, the other VQs
not being initialized at this point.

If we do this we might be able to get rid of this lock, I need some more
time though to ensure I'm not missing something.

What do you think?

Thanks,
Maxime



Best regards,
Tiwei Bie
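
To make option 2 above concrete: the translation could take a per-virtqueue reader lock only around the IOTLB lookup instead of a per-device lock around the whole burst. A rough sketch, where the vq structure, lock field and lookup callback are illustrative assumptions and not the actual vhost code:

#include <stdint.h>
#include <rte_rwlock.h>

/* Hypothetical per-virtqueue state: only the lock matters for this sketch. */
struct vq_stub {
	rte_rwlock_t iotlb_lock;
	/* ... descriptor ring, IOTLB cache entries, ... */
};

/* Protect just the IOVA->VVA lookup with a per-vq reader lock, so queues
 * served by different cores do not contend on a single per-device lock. */
static inline uint64_t
translate_locked(struct vq_stub *vq, uint64_t iova,
		 uint64_t (*lookup)(struct vq_stub *, uint64_t))
{
	uint64_t vva;

	rte_rwlock_read_lock(&vq->iotlb_lock);
	vva = lookup(vq, iova);		/* IOTLB cache lookup */
	rte_rwlock_read_unlock(&vq->iotlb_lock);
	return vva;
}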



Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event

2017-09-05 Thread Adrien Mazarguil
Hi Matan,

On Mon, Sep 04, 2017 at 05:52:55PM +, Matan Azrad wrote:
> Hi Adrien,
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Monday, September 4, 2017 6:33 PM
> > To: Matan Azrad 
> > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event
> > 
> > Hi Matan,
> > 
> > One comment I have is, while this patch adds support for RMV, it also 
> > silently
> > addresses a bug (see large comment you added to
> > priv_link_status_update()).
> > 
> > This should be split in two commits, with the fix part coming first and CC
> > sta...@dpdk.org, and a second commit adding RMV support proper.
> > 
> 
> Actually, the mlx4 bug did not appear in the previous mlx5 code,
> probably because the RMV interrupt was not implemented in mlx5 before this
> patch.

Good point, no RMV could occur before it is implemented, however a dedicated
commit for the fix itself (i.e. alarm callback not supposed to end up
calling ibv_get_async_event()) might better explain the logic behind these
changes. What I mean is, if there was no problem, you wouldn't need to make
priv_link_status_update() a separate function, right?

> The big comment just explains the link inconsistency issue and was added
> here since Nelio and I think the new function, priv_link_status_update(),
> justifies this comment for future review.

I understand, this could also have been part of the commit log of the
dedicated commit.

Thanks.

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH] net/ixgbe: fix adding multiple mirror type in a rule

2017-09-05 Thread Wei Dai
The mirror rule_type can be a bitwise OR of multiple mirror types for a
single rule. Before the commit which introduced this issue, the code
supported adding multiple mirror types in one rule.

Fixes: 7ba29a76b196 ("ethdev: rename and extend the mirror type")
Cc: sta...@dpdk.org

Signed-off-by: Wei Dai 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 22171d8..858230d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -5450,13 +5450,13 @@ ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
IXGBE_WRITE_REG(hw, IXGBE_MRCTL(rule_id), mr_ctl);
 
/* write pool mirrror control  register */
-   if (mirror_conf->rule_type == ETH_MIRROR_VIRTUAL_POOL_UP) {
+   if (mirror_conf->rule_type & ETH_MIRROR_VIRTUAL_POOL_UP) {
IXGBE_WRITE_REG(hw, IXGBE_VMRVM(rule_id), mp_lsb);
IXGBE_WRITE_REG(hw, IXGBE_VMRVM(rule_id + rule_mr_offset),
mp_msb);
}
/* write VLAN mirrror control  register */
-   if (mirror_conf->rule_type == ETH_MIRROR_VLAN) {
+   if (mirror_conf->rule_type & ETH_MIRROR_VLAN) {
IXGBE_WRITE_REG(hw, IXGBE_VMRVLAN(rule_id), mv_lsb);
IXGBE_WRITE_REG(hw, IXGBE_VMRVLAN(rule_id + rule_mr_offset),
mv_msb);
-- 
2.7.5
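
The one-character change above matters because rule_type may carry several OR'ed type bits at once. A tiny standalone illustration; the flag values are local to this example, not the real ETH_MIRROR_* constants:

#include <stdint.h>
#include <stdio.h>

#define MIRROR_VIRTUAL_POOL_UP 0x01	/* example-local values */
#define MIRROR_VLAN            0x04

int
main(void)
{
	uint8_t rule_type = MIRROR_VIRTUAL_POOL_UP | MIRROR_VLAN;

	/* '==' only matches a rule carrying exactly one type... */
	printf("== matches VLAN-only rule: %d\n", rule_type == MIRROR_VLAN);
	/* ...while '&' matches any rule that includes that type. */
	printf("&  matches the VLAN bit  : %d\n", !!(rule_type & MIRROR_VLAN));
	return 0;
}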



Re: [dpdk-dev] [PATCH] doc: add i40e firmware upgrade guide

2017-09-05 Thread Ferruh Yigit
On 9/4/2017 5:35 PM, Mcnamara, John wrote:
> 
> 
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Qiming Yang
>> Sent: Tuesday, August 15, 2017 4:27 AM
>> To: dev@dpdk.org
>> Cc: Wu, Jingjing ; Xing, Beilei
>> ; Yang, Qiming 
>> Subject: [dpdk-dev] [PATCH] doc: add i40e firmware upgrade guide
>>
>> This patch adds one link to DPDK i40e doc, which is for users on how to
>> upgrade firmware.
>>
>> Signed-off-by: Qiming Yang 

> 
> Slightly better would be:
> 
> s/follow/following/
> s/if need/if needed/
> 
> But these changes could be make inline during commit so:
> 
> Acked-by: John McNamara 

Applied to dpdk-next-net/master, thanks.

(suggested changes done while applying.)


Re: [dpdk-dev] [PATCH v2] kni: fix build on SLE12 SP3

2017-09-05 Thread Nirmoy Das


On 09/04/2017 11:46 AM, Ferruh Yigit wrote:
> On 8/29/2017 4:06 PM, Nirmoy Das wrote:
>> compilation error:
>> build/lib/librte_eal/linuxapp/kni/kni_net.c:215:5: error:
>> ‘struct net_device’ has no member named ‘trans_start’
>>   dev->trans_start = jiffies;
>>
>> Signed-off-by: Nirmoy Das 
>> ---
>>  lib/librte_eal/linuxapp/kni/compat.h | 32 +++-
>>  1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
>> b/lib/librte_eal/linuxapp/kni/compat.h
>> index 6a1587b4e..19f8e96ce 100644
>> --- a/lib/librte_eal/linuxapp/kni/compat.h
>> +++ b/lib/librte_eal/linuxapp/kni/compat.h
>> @@ -8,6 +8,34 @@
>>  #define RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))
>>  #endif
>>  
>> +/* SuSE version macro is the same as Linux kernel version */
>> +#ifndef SLE_VERSION
>> +#define SLE_VERSION(a, b, c) KERNEL_VERSION(a, b, c)
>> +#endif
>> +#ifdef CONFIG_SUSE_KERNEL
>> +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 57))
>> +/* SLES12SP3 is at least 4.4.57+ based */
>> +#define SLE_VERSION_CODE SLE_VERSION(12, 3, 0)
> Just to double check, is there a macro set in SUSE that we can use here,
> instead of defining here ourselves, like RHEL_RELEASE_CODE?
Unfortunately SUSE doesn't have such logic/macro.
>
>> +#elif (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 12, 28))
>> +/* SLES12 is at least 3.12.28+ based */
>> +#define SLE_VERSION_CODE SLE_VERSION(12, 0, 0)
>> +#elif ((LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 61)) && \
>> +   (LINUX_VERSION_CODE < KERNEL_VERSION(3, 1, 0)))
> This line gives following checkpatch warning:
> WARNING:LEADING_SPACE: please, no spaces at the start of a line
>
>> +/* SLES11 SP3 is at least 3.0.61+ based */
>> +#define SLE_VERSION_CODE SLE_VERSION(11, 3, 0)
>> +#elif (LINUX_VERSION_CODE == KERNEL_VERSION(2, 6, 32))
>> +/* SLES11 SP1 is 2.6.32 based */
>> +#define SLE_VERSION_CODE SLE_VERSION(11, 1, 0)
>> +#elif (LINUX_VERSION_CODE == KERNEL_VERSION(2, 6, 27))
>> +/* SLES11 GA is 2.6.27 based */
>> +#define SLE_VERSION_CODE SLE_VERSION(11, 0, 0)
>> +#endif /* LINUX_VERSION_CODE == KERNEL_VERSION(x,y,z) */
>> +#endif /* CONFIG_SUSE_KERNEL */
>> +#ifndef SLE_VERSION_CODE
>> +#define SLE_VERSION_CODE 0
> [1] see  below.
>
>> +#endif /* SLE_VERSION_CODE */
>> +
>> +
>>  #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 39) && \
>>  (!(defined(RHEL_RELEASE_CODE) && \
>> RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(6, 4)))
>> @@ -55,7 +83,9 @@
>>  
>>  #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0) || \
>>  (defined(RHEL_RELEASE_CODE) && \
>> - RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 4))
>> + RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 4)) || \
>> + (defined(SLE_VERSION_CODE) && \
> defined check is not required, since SLE_VERSION_CODE always defined [1].
>
> It can be either:
> a) (SLE_VERSION_CODE && SLE_VERSION_CODE == SLE_VERSION(12, 3, 0))
> or directly check:
> b) (SLE_VERSION_CODE == SLE_VERSION(12, 3, 0))
Thanks for your comments, I will modify the patch.
>
>> +  SLE_VERSION_CODE == SLE_VERSION(12, 3, 0))
>>  #define HAVE_TRANS_START_HELPER
>>  #endif
>>  
>>
Regards,
Nirmoy




[dpdk-dev] [PATCH] net/failsafe: stat support enhancement

2017-09-05 Thread Matan Azrad
The previous stats code returned only the stats of the current TX
sub-device.

This enhancement extends it to return the sum of all sub-device stats,
including the history of removed sub-devices.

A dedicated stats accumulator saves the stats history across all
sub-device removal events.

Each failsafe sub-device keeps the last stats read by the user and adds
them to the accumulator at removal time.

I would eventually like to implement a final stats snapshot at removal
time. The stats_get API would need to be changed to return an error when
it is too late to retrieve statistics. That way, failsafe could take a
stats snapshot in the removal interrupt callback for each PMD that can
still provide stats after a removal event.

Signed-off-by: Matan Azrad 
---
 drivers/net/failsafe/failsafe_ether.c   | 33 +
 drivers/net/failsafe/failsafe_ops.c | 16 
 drivers/net/failsafe/failsafe_private.h |  5 +
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/drivers/net/failsafe/failsafe_ether.c 
b/drivers/net/failsafe/failsafe_ether.c
index a3a8cce..133080d 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
return ret;
 }
 
+void
+fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats *to)
+{
+   uint8_t i;
+
+   RTE_ASSERT(from != NULL && to != NULL);
+   to->ipackets += from->ipackets;
+   to->opackets += from->opackets;
+   to->ibytes += from->ibytes;
+   to->obytes += from->obytes;
+   to->imissed += from->imissed;
+   to->ierrors += from->ierrors;
+   to->oerrors += from->oerrors;
+   to->rx_nombuf += from->rx_nombuf;
+   for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
+   to->q_ipackets[i] += from->q_ipackets[i];
+   to->q_opackets[i] += from->q_opackets[i];
+   to->q_ibytes[i] += from->q_ibytes[i];
+   to->q_obytes[i] += from->q_obytes[i];
+   to->q_errors[i] += from->q_errors[i];
+   }
+}
+
+void
+fs_increment_stats_accumulator(struct sub_device *sdev)
+{
+   fs_increment_stats(&sdev->stats_snapshot,
+   &PRIV(sdev->fs_dev)->stats_accumulator);
+   memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
+}
+
 int
 failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
enum rte_eth_event_type event __rte_unused,
@@ -410,6 +441,8 @@ failsafe_eth_rmv_event_callback(uint8_t port_id 
__rte_unused,
fs_switch_dev(sdev->fs_dev, sdev);
/* Use safe bursts in any case. */
set_burst_fn(sdev->fs_dev, 1);
+   /* Increment the stats accumulator by the last stats snapshot. */
+   fs_increment_stats_accumulator(sdev);
/*
 * Async removal, the sub-PMD will try to unregister
 * the callback at the source of the current thread context.
diff --git a/drivers/net/failsafe/failsafe_ops.c 
b/drivers/net/failsafe/failsafe_ops.c
index ff9ad15..e47cc85 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -586,9 +586,14 @@ static void
 fs_stats_get(struct rte_eth_dev *dev,
 struct rte_eth_stats *stats)
 {
-   if (TX_SUBDEV(dev) == NULL)
-   return;
-   rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
+   struct sub_device *sdev;
+   uint8_t i;
+
+   memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
+   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
+   rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
+   fs_increment_stats(&sdev->stats_snapshot, stats);
+   }
 }
 
 static void
@@ -597,8 +602,11 @@ fs_stats_reset(struct rte_eth_dev *dev)
struct sub_device *sdev;
uint8_t i;
 
-   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
+   FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
rte_eth_stats_reset(PORT_ID(sdev));
+   memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
+   }
+   memset(&PRIV(dev)->stats_accumulator, 0, sizeof(struct rte_eth_stats));
 }
 
 /**
diff --git a/drivers/net/failsafe/failsafe_private.h 
b/drivers/net/failsafe/failsafe_private.h
index 0361cf4..267c749 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -102,6 +102,8 @@ struct sub_device {
uint8_t sid;
/* Device state machine */
enum dev_state state;
+   /* Last stats snapshot passed to user */
+   struct rte_eth_stats stats_snapshot;
/* Some device are defined as a command line */
char *cmdline;
/* fail-safe device backreference */
@@ -140,6 +142,7 @@ struct fs_priv {
 * synchronized state.
 */
enum dev_state state;
+   struct rte_eth_stats stats_accumulator;
unsigned int pending_alarm:1; /* An alarm is pending */
/* flow isolat

Re: [dpdk-dev] [PATCH v2 00/51] net/mlx4: trim and refactor entire PMD

2017-09-05 Thread Ferruh Yigit
On 9/1/2017 9:06 AM, Adrien Mazarguil wrote:
> The main purpose of this large series is to relieve the mlx4 PMD from its
> dependency on Mellanox OFED to instead rely on the standard rdma-core
> package provided by Linux distributions.
> 
> While compatibility with Mellanox OFED is preserved, all nonstandard
> functionality has to be stripped from the PMD in order to re-implement it
> through an approach compatible with rdma-core.
> 
> Due to the amount of changes necessary to achieve this goal, this rework
> starts off by removing extraneous code to simplify the PMD as much as
> possible before either replacing or dismantling functionality that relies on
> nonstandard Verbs.
> 
> What remains after applying this series is single-segment Tx/Rx support,
> without offloads nor RSS, on the default MAC address (which cannot be
> configured). Support for multiple queues and the flow API (minus the RSS
> action) are also preserved.
> 
> Missing functionality that needs substantial work will be restored later by
> subsequent series.
> 
> Also because the mlx4 PMD is mostly contained in a single very large source
> file of 6400+ lines (mlx4.c) which has become extremely difficult to
> maintain, this rework is used as an opportunity to finally group functions
> into separate files, as in mlx5.
> 
> This rework targets DPDK 17.11.
> 
> Changes since v1:
> 
> - Rebased series on top of the latest upstream fixes.
> 
> - Cleaned up remaining typos and coding style issues.
> 
> - "net/mlx4: check max number of ports dynamically":
>   Removed extra loop and added error message on maximum number of ports
>   according to Allain's suggestion.
> 
> - "net/mlx4: drop scatter/gather support":
>   Additionally removed unnecessary mbuf pool from rxq_alloc_elts().
> 
> - "net/mlx4: simplify Rx buffer handling":
>   New patch removing unnecessary code from the simplified Rx path.
> 
> - "net/mlx4: remove isolated mode constraint":
>   New patch removing needless constraint for isolated mode, which can now
>   be toggled anytime.
> 
> - "net/mlx4: rely on ethdev for Tx/Rx queue arrays":
>   New patch refactoring duplicated information from ethdev.
> 
> Adrien Mazarguil (51):
>   net/mlx4: add consistency to copyright notices
>   net/mlx4: remove limitation on number of instances
>   net/mlx4: check max number of ports dynamically
>   net/mlx4: remove useless compilation checks
>   net/mlx4: remove secondary process support
>   net/mlx4: remove useless code
>   net/mlx4: remove soft counters compilation option
>   net/mlx4: remove scatter mode compilation option
>   net/mlx4: remove Tx inline compilation option
>   net/mlx4: remove allmulti and promisc support
>   net/mlx4: remove VLAN filter support
>   net/mlx4: remove MAC address configuration support
>   net/mlx4: drop MAC flows affecting all Rx queues
>   net/mlx4: revert flow API RSS support
>   net/mlx4: revert RSS parent queue refactoring
>   net/mlx4: drop RSS support
>   net/mlx4: drop checksum offloads support
>   net/mlx4: drop packet type recognition support
>   net/mlx4: drop scatter/gather support
>   net/mlx4: drop inline receive support
>   net/mlx4: use standard QP attributes
>   net/mlx4: revert resource domain support
>   net/mlx4: revert multicast echo prevention
>   net/mlx4: revert fast Verbs interface for Tx
>   net/mlx4: revert fast Verbs interface for Rx
>   net/mlx4: simplify Rx buffer handling
>   net/mlx4: simplify link update function
>   net/mlx4: standardize on negative errno values
>   net/mlx4: clean up coding style inconsistencies
>   net/mlx4: remove control path locks
>   net/mlx4: remove unnecessary wrapper functions
>   net/mlx4: remove mbuf macro definitions
>   net/mlx4: use standard macro to get array size
>   net/mlx4: separate debugging macros
>   net/mlx4: use a single interrupt handle
>   net/mlx4: rename alarm field
>   net/mlx4: refactor interrupt FD settings
>   net/mlx4: clean up interrupt functions prototypes
>   net/mlx4: compact interrupt functions
>   net/mlx4: separate interrupt handling
>   net/mlx4: separate Rx/Tx definitions
>   net/mlx4: separate Rx/Tx functions
>   net/mlx4: separate device control functions
>   net/mlx4: separate Tx configuration functions
>   net/mlx4: separate Rx configuration functions
>   net/mlx4: group flow API handlers in common file
>   net/mlx4: rename private functions in flow API
>   net/mlx4: separate memory management functions
>   net/mlx4: clean up includes and comments
>   net/mlx4: remove isolated mode constraint
>   net/mlx4: rely on ethdev for Tx/Rx queue arrays

Series applied to dpdk-next-net/master, thanks.


Re: [dpdk-dev] [PATCH 03/21] vhost: protect virtio_net device struct

2017-09-05 Thread Tiwei Bie
On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:
> On 09/05/2017 06:45 AM, Tiwei Bie wrote:
> > On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:
> > > virtio_net device might be accessed while being reallocated
> > > in case of NUMA awareness. This case might be theoretical,
> > > but it will be needed anyway to protect vrings pages against
> > > invalidation.
> > > 
> > > The virtio_net devs are now protected with a readers/writers
> > > lock, so that before reallocating the device, it is ensured
> > > that it is not being referenced by the processing threads.
> > > 
> > [...]
> > > +struct virtio_net *
> > > +get_device(int vid)
> > > +{
> > > + struct virtio_net *dev;
> > > +
> > > + rte_rwlock_read_lock(&vhost_devices[vid].lock);
> > > +
> > > + dev = __get_device(vid);
> > > + if (unlikely(!dev))
> > > + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > +
> > > + return dev;
> > > +}
> > > +
> > > +void
> > > +put_device(int vid)
> > > +{
> > > + rte_rwlock_read_unlock(&vhost_devices[vid].lock);
> > > +}
> > > +
> > 
> > This patch introduced a per-device rwlock which needs to be acquired
> > unconditionally in the data path. So for each vhost device, the IO
> > threads of different queues will need to acquire/release this lock
> > during each enqueue and dequeue operation, which will cause cache
> > contention when multiple queues are enabled and handled by different
> > cores. With this patch alone, I saw ~7% performance drop when enabling
> > 6 queues to do 64bytes iofwd loopback test. Is there any way to avoid
> > introducing this lock to the data path?
> 
> First, I'd like to thank you for running the MQ test.
> I agree it may have a performance impact in this case.
> 
> This lock has currently two purposes:
> 1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
> 2. Protect vring pages against invalidation.
> 
> For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
> case in my early prototypes that had per device IOTLB cache).
> 
> For 1., this is an existing problem, so we might consider it is
> acceptable to keep current state. Maybe it could be improved by only
> reallocating in case VQ0 is not on the right NUMA node, the other VQs
> not being initialized at this point.
> 
> If we do this we might be able to get rid of this lock, I need some more
> time though to ensure I'm not missing something.
> 
> What do you think?
> 

Cool. So it's possible that the lock in the data path will be
acquired only when the IOMMU feature is enabled. It will be
great!

Besides, I just did a very simple MQ test to verify my thoughts.
Lei (CC'ed in this mail) may do a thorough performance test for
this patch set to evaluate the performance impacts.
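
A rough sketch of where the new lock sits on the data path, with helper
names taken from the quoted patch and the actual enqueue logic elided
(illustration only, not the real implementation):

  uint16_t
  rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
                  struct rte_mbuf **pkts, uint16_t count)
  {
          /* Read-locks vhost_devices[vid].lock on every burst... */
          struct virtio_net *dev = get_device(vid);
          uint16_t nb_tx = 0;

          if (!dev)
                  return 0;

          /* ... existing enqueue processing of pkts/count elided ... */

          /* ...and releases it, so queues handled by different cores
           * keep hitting the same lock cache line. */
          put_device(vid);

          return nb_tx;
  }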

Best regards,
Tiwei Bie


[dpdk-dev] [PATCH v2 0/8] Remove temporary digest allocation

2017-09-05 Thread Pablo de Lara
When performing authentication verification,
some crypto PMDs require extra memory where the generated
digest can be placed.
Currently, these PMDs take that memory from the end
of the source mbuf, which might fail if there is not enough
tailroom.

To avoid this situation, memory is now allocated
in each queue pair of the device to temporarily store
these digests.
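
A sketch of the pattern the series applies (illustrative only: temp_digest
and DIGEST_LENGTH_MAX follow the diffs in this series, while example_qp and
generate_digest() are hypothetical placeholders):

  #include <string.h>
  #include <rte_crypto.h>

  /* Large enough for SHA-512, per the openssl patch in this series. */
  #define DIGEST_LENGTH_MAX 64

  /* The scratch buffer now lives in the queue pair, not in the mbuf. */
  struct example_qp {
          uint8_t temp_digest[DIGEST_LENGTH_MAX];
          /* ... other queue pair fields ... */
  };

  /* Hypothetical placeholder for the PMD-specific digest computation. */
  static void generate_digest(struct rte_crypto_op *op, uint8_t *dst);

  static void
  verify_digest(struct example_qp *qp, struct rte_crypto_op *op,
                  uint16_t digest_len)
  {
          uint8_t *dst = qp->temp_digest;

          generate_digest(op, dst);

          /* Compare against the digest supplied with the operation;
           * no rte_pktmbuf_append()/rte_pktmbuf_trim() on the mbuf. */
          if (memcmp(dst, op->sym->auth.digest.data, digest_len) != 0)
                  op->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
  }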

Changes in v2:
- Removed incorrect indirection when getting the memory
  to store the generated digest (i.e. removed "&" in &temp_digest) 

Pablo de Lara (8):
  crypto/aesni_gcm: do not append digest
  crypto/armv8: do not append digest
  crypto/openssl: do not append digest
  crypto/kasumi: do not append digest
  crypto/snow3g: do not append digest
  crypto/zuc: do not append digest
  crypto/aesni_mb: do not append digest
  test/crypto: do not allocate extra memory for digest

 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c   | 31 ---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |  7 +
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 36 +++---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c |  5 +++
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h | 12 +++-
 drivers/crypto/armv8/rte_armv8_pmd.c   | 14 +++--
 drivers/crypto/armv8/rte_armv8_pmd_private.h   |  8 +
 drivers/crypto/kasumi/rte_kasumi_pmd.c | 22 +
 drivers/crypto/kasumi/rte_kasumi_pmd_private.h |  7 +
 drivers/crypto/openssl/rte_openssl_pmd.c   | 19 +---
 drivers/crypto/openssl/rte_openssl_pmd_private.h   |  7 +
 drivers/crypto/snow3g/rte_snow3g_pmd.c | 22 +
 drivers/crypto/snow3g/rte_snow3g_pmd_private.h |  7 +
 drivers/crypto/zuc/rte_zuc_pmd.c   | 16 +++---
 drivers/crypto/zuc/rte_zuc_pmd_private.h   |  7 +
 test/test/test_cryptodev_blockcipher.c | 29 ++---
 16 files changed, 112 insertions(+), 137 deletions(-)

-- 
2.9.4



[dpdk-dev] [PATCH v2 2/8] crypto/armv8: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/armv8/rte_armv8_pmd.c | 14 +-
 drivers/crypto/armv8/rte_armv8_pmd_private.h |  8 
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/armv8/rte_armv8_pmd.c 
b/drivers/crypto/armv8/rte_armv8_pmd.c
index a5c39c9..d5da02b 100644
--- a/drivers/crypto/armv8/rte_armv8_pmd.c
+++ b/drivers/crypto/armv8/rte_armv8_pmd.c
@@ -575,8 +575,8 @@ get_session(struct armv8_crypto_qp *qp, struct 
rte_crypto_op *op)
 
 /** Process cipher operation */
 static inline void
-process_armv8_chained_op
-   (struct rte_crypto_op *op, struct armv8_crypto_session *sess,
+process_armv8_chained_op(struct armv8_crypto_qp *qp, struct rte_crypto_op *op,
+   struct armv8_crypto_session *sess,
struct rte_mbuf *mbuf_src, struct rte_mbuf *mbuf_dst)
 {
crypto_func_t crypto_func;
@@ -633,8 +633,7 @@ process_armv8_chained_op
op->sym->auth.data.length);
}
} else {
-   adst = (uint8_t *)rte_pktmbuf_append(m_asrc,
-   sess->auth.digest_length);
+   adst = qp->temp_digest;
}
 
arg.cipher.iv = rte_crypto_op_ctod_offset(op, uint8_t *,
@@ -655,15 +654,12 @@ process_armv8_chained_op
sess->auth.digest_length) != 0) {
op->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
}
-   /* Trim area used for digest from mbuf. */
-   rte_pktmbuf_trim(m_asrc,
-   sess->auth.digest_length);
}
 }
 
 /** Process crypto operation for mbuf */
 static inline int
-process_op(const struct armv8_crypto_qp *qp, struct rte_crypto_op *op,
+process_op(struct armv8_crypto_qp *qp, struct rte_crypto_op *op,
struct armv8_crypto_session *sess)
 {
struct rte_mbuf *msrc, *mdst;
@@ -676,7 +672,7 @@ process_op(const struct armv8_crypto_qp *qp, struct 
rte_crypto_op *op,
switch (sess->chain_order) {
case ARMV8_CRYPTO_CHAIN_CIPHER_AUTH:
case ARMV8_CRYPTO_CHAIN_AUTH_CIPHER: /* Fall through */
-   process_armv8_chained_op(op, sess, msrc, mdst);
+   process_armv8_chained_op(qp, op, sess, msrc, mdst);
break;
default:
op->status = RTE_CRYPTO_OP_STATUS_ERROR;
diff --git a/drivers/crypto/armv8/rte_armv8_pmd_private.h 
b/drivers/crypto/armv8/rte_armv8_pmd_private.h
index d02992a..fa31f0a 100644
--- a/drivers/crypto/armv8/rte_armv8_pmd_private.h
+++ b/drivers/crypto/armv8/rte_armv8_pmd_private.h
@@ -69,6 +69,9 @@ do {  
\
 #define NBBY   8   /* Number of bits in a byte */
 #define BYTE_LENGTH(x) ((x) / NBBY)/* Number of bytes in x (round down) */
 
+/* Maximum length for digest (SHA-256 needs 32 bytes) */
+#define DIGEST_LENGTH_MAX 32
+
 /** ARMv8 operation order mode enumerator */
 enum armv8_crypto_chain_order {
ARMV8_CRYPTO_CHAIN_CIPHER_AUTH,
@@ -147,6 +150,11 @@ struct armv8_crypto_qp {
/**< Queue pair statistics */
char name[RTE_CRYPTODEV_NAME_LEN];
/**< Unique Queue Pair Name */
+   uint8_t temp_digest[DIGEST_LENGTH_MAX];
+   /**< Buffer used to store the digest generated
+* by the driver when verifying a digest provided
+* by the user (using authentication verify operation)
+*/
 } __rte_cache_aligned;
 
 /** ARMv8 crypto private session structure */
-- 
2.9.4



[dpdk-dev] [PATCH v2 1/8] crypto/aesni_gcm: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 31 +---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h |  7 ++
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c 
b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index d9c91d0..8c9c211 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -298,14 +298,7 @@ process_gcm_crypto_op(struct aesni_gcm_qp *qp, struct 
rte_crypto_op *op,
sym_op->aead.digest.data,
(uint64_t)session->digest_length);
} else if (session->op == AESNI_GCM_OP_AUTHENTICATED_DECRYPTION) {
-   uint8_t *auth_tag = (uint8_t *)rte_pktmbuf_append(sym_op->m_dst 
?
-   sym_op->m_dst : sym_op->m_src,
-   session->digest_length);
-
-   if (!auth_tag) {
-   GCM_LOG_ERR("auth_tag");
-   return -1;
-   }
+   uint8_t *auth_tag = qp->temp_digest;
 
qp->ops[session->key].init(&session->gdata_key,
&qp->gdata_ctx,
@@ -350,14 +343,7 @@ process_gcm_crypto_op(struct aesni_gcm_qp *qp, struct 
rte_crypto_op *op,
sym_op->auth.digest.data,
(uint64_t)session->digest_length);
} else { /* AESNI_GMAC_OP_VERIFY */
-   uint8_t *auth_tag = (uint8_t *)rte_pktmbuf_append(sym_op->m_dst 
?
-   sym_op->m_dst : sym_op->m_src,
-   session->digest_length);
-
-   if (!auth_tag) {
-   GCM_LOG_ERR("auth_tag");
-   return -1;
-   }
+   uint8_t *auth_tag = qp->temp_digest;
 
qp->ops[session->key].init(&session->gdata_key,
&qp->gdata_ctx,
@@ -385,11 +371,10 @@ process_gcm_crypto_op(struct aesni_gcm_qp *qp, struct 
rte_crypto_op *op,
  * - Returns NULL on invalid job
  */
 static void
-post_process_gcm_crypto_op(struct rte_crypto_op *op,
+post_process_gcm_crypto_op(struct aesni_gcm_qp *qp,
+   struct rte_crypto_op *op,
struct aesni_gcm_session *session)
 {
-   struct rte_mbuf *m = op->sym->m_dst ? op->sym->m_dst : op->sym->m_src;
-
op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 
/* Verify digest if required */
@@ -397,8 +382,7 @@ post_process_gcm_crypto_op(struct rte_crypto_op *op,
session->op == AESNI_GMAC_OP_VERIFY) {
uint8_t *digest;
 
-   uint8_t *tag = rte_pktmbuf_mtod_offset(m, uint8_t *,
-   m->data_len - session->digest_length);
+   uint8_t *tag = (uint8_t *)&qp->temp_digest;
 
if (session->op == AESNI_GMAC_OP_VERIFY)
digest = op->sym->auth.digest.data;
@@ -414,9 +398,6 @@ post_process_gcm_crypto_op(struct rte_crypto_op *op,
 
if (memcmp(tag, digest, session->digest_length) != 0)
op->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
-
-   /* trim area used for digest from mbuf */
-   rte_pktmbuf_trim(m, session->digest_length);
}
 }
 
@@ -435,7 +416,7 @@ handle_completed_gcm_crypto_op(struct aesni_gcm_qp *qp,
struct rte_crypto_op *op,
struct aesni_gcm_session *sess)
 {
-   post_process_gcm_crypto_op(op, sess);
+   post_process_gcm_crypto_op(qp, op, sess);
 
/* Free session if a session-less crypto op */
if (op->sess_type == RTE_CRYPTO_OP_SESSIONLESS) {
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h 
b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
index 7e15572..1c8835b 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
@@ -58,6 +58,8 @@
 #define GCM_LOG_DBG(fmt, args...)
 #endif
 
+/* Maximum length for digest */
+#define DIGEST_LENGTH_MAX 16
 
 /** private data structure for each virtual AESNI GCM device */
 struct aesni_gcm_private {
@@ -84,6 +86,11 @@ struct aesni_gcm_qp {
/**< Queue Pair Identifier */
char name[RTE_CRYPTODEV_NAME_LEN];
/**< Unique Queue Pair Name */
+   uint8_t temp_digest[DIGEST_LENGTH_MAX];
+   /**< Buffer used to store the digest generated
+* by the driver when verifying a digest provided

[dpdk-dev] [PATCH v2 4/8] crypto/kasumi: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/kasumi/rte_kasumi_pmd.c | 22 --
 drivers/crypto/kasumi/rte_kasumi_pmd_private.h |  7 +++
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/kasumi/rte_kasumi_pmd.c 
b/drivers/crypto/kasumi/rte_kasumi_pmd.c
index 38cd8a9..4177c3a 100644
--- a/drivers/crypto/kasumi/rte_kasumi_pmd.c
+++ b/drivers/crypto/kasumi/rte_kasumi_pmd.c
@@ -44,7 +44,6 @@
 
 #define KASUMI_KEY_LENGTH 16
 #define KASUMI_IV_LENGTH 8
-#define KASUMI_DIGEST_LENGTH 4
 #define KASUMI_MAX_BURST 4
 #define BYTE_LEN 8
 
@@ -261,7 +260,7 @@ process_kasumi_cipher_op_bit(struct rte_crypto_op *op,
 
 /** Generate/verify hash from mbufs with same hash key. */
 static int
-process_kasumi_hash_op(struct rte_crypto_op **ops,
+process_kasumi_hash_op(struct kasumi_qp *qp, struct rte_crypto_op **ops,
struct kasumi_session *session,
uint8_t num_ops)
 {
@@ -287,8 +286,7 @@ process_kasumi_hash_op(struct rte_crypto_op **ops,
num_bytes = length_in_bits >> 3;
 
if (session->auth_op == RTE_CRYPTO_AUTH_OP_VERIFY) {
-   dst = (uint8_t *)rte_pktmbuf_append(ops[i]->sym->m_src,
-   KASUMI_DIGEST_LENGTH);
+   dst = qp->temp_digest;
sso_kasumi_f9_1_buffer(&session->pKeySched_hash, src,
num_bytes, dst);
 
@@ -296,10 +294,6 @@ process_kasumi_hash_op(struct rte_crypto_op **ops,
if (memcmp(dst, ops[i]->sym->auth.digest.data,
KASUMI_DIGEST_LENGTH) != 0)
ops[i]->status = 
RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
-
-   /* Trim area used for digest from mbuf. */
-   rte_pktmbuf_trim(ops[i]->sym->m_src,
-   KASUMI_DIGEST_LENGTH);
} else  {
dst = ops[i]->sym->auth.digest.data;
 
@@ -327,16 +321,16 @@ process_ops(struct rte_crypto_op **ops, struct 
kasumi_session *session,
session, num_ops);
break;
case KASUMI_OP_ONLY_AUTH:
-   processed_ops = process_kasumi_hash_op(ops, session,
+   processed_ops = process_kasumi_hash_op(qp, ops, session,
num_ops);
break;
case KASUMI_OP_CIPHER_AUTH:
processed_ops = process_kasumi_cipher_op(ops, session,
num_ops);
-   process_kasumi_hash_op(ops, session, processed_ops);
+   process_kasumi_hash_op(qp, ops, session, processed_ops);
break;
case KASUMI_OP_AUTH_CIPHER:
-   processed_ops = process_kasumi_hash_op(ops, session,
+   processed_ops = process_kasumi_hash_op(qp, ops, session,
num_ops);
process_kasumi_cipher_op(ops, session, processed_ops);
break;
@@ -384,15 +378,15 @@ process_op_bit(struct rte_crypto_op *op, struct 
kasumi_session *session,
session);
break;
case KASUMI_OP_ONLY_AUTH:
-   processed_op = process_kasumi_hash_op(&op, session, 1);
+   processed_op = process_kasumi_hash_op(qp, &op, session, 1);
break;
case KASUMI_OP_CIPHER_AUTH:
processed_op = process_kasumi_cipher_op_bit(op, session);
if (processed_op == 1)
-   process_kasumi_hash_op(&op, session, 1);
+   process_kasumi_hash_op(qp, &op, session, 1);
break;
case KASUMI_OP_AUTH_CIPHER:
-   processed_op = process_kasumi_hash_op(&op, session, 1);
+   processed_op = process_kasumi_hash_op(qp, &op, session, 1);
if (processed_op == 1)
process_kasumi_cipher_op_bit(op, session);
break;
diff --git a/drivers/crypto/kasumi/rte_kasumi_pmd_private.h 
b/drivers/crypto/kasumi/rte_kasumi_pmd_private.h
index 0ce2a2e..5f7044b 100644
--- a/drivers/crypto/kasumi/rte_kasumi_pmd_private.h
+++ b/drivers/crypto/kasumi/rte_kasumi_pmd_private.h
@@ -58,6 +58,8 @@
 #define KASUMI_LOG_DBG(fmt, args...)
 #endif
 
+#define KASUMI_DIGEST_LENGTH 4
+
 /** private data structure for each virtual KASUMI device */
 struct kasumi_private {
unsigned max_nb_queue_pairs;
@@ -78,6 +80,11 @@ str

[dpdk-dev] [PATCH v2 5/8] crypto/snow3g: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/snow3g/rte_snow3g_pmd.c | 22 --
 drivers/crypto/snow3g/rte_snow3g_pmd_private.h |  7 +++
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/snow3g/rte_snow3g_pmd.c 
b/drivers/crypto/snow3g/rte_snow3g_pmd.c
index dad4506..15f9381 100644
--- a/drivers/crypto/snow3g/rte_snow3g_pmd.c
+++ b/drivers/crypto/snow3g/rte_snow3g_pmd.c
@@ -43,7 +43,6 @@
 #include "rte_snow3g_pmd_private.h"
 
 #define SNOW3G_IV_LENGTH 16
-#define SNOW3G_DIGEST_LENGTH 4
 #define SNOW3G_MAX_BURST 8
 #define BYTE_LEN 8
 
@@ -263,7 +262,7 @@ process_snow3g_cipher_op_bit(struct rte_crypto_op *op,
 
 /** Generate/verify hash from mbufs with same hash key. */
 static int
-process_snow3g_hash_op(struct rte_crypto_op **ops,
+process_snow3g_hash_op(struct snow3g_qp *qp, struct rte_crypto_op **ops,
struct snow3g_session *session,
uint8_t num_ops)
 {
@@ -289,8 +288,7 @@ process_snow3g_hash_op(struct rte_crypto_op **ops,
session->auth_iv_offset);
 
if (session->auth_op == RTE_CRYPTO_AUTH_OP_VERIFY) {
-   dst = (uint8_t *)rte_pktmbuf_append(ops[i]->sym->m_src,
-   SNOW3G_DIGEST_LENGTH);
+   dst = qp->temp_digest;
 
sso_snow3g_f9_1_buffer(&session->pKeySched_hash,
iv, src,
@@ -299,10 +297,6 @@ process_snow3g_hash_op(struct rte_crypto_op **ops,
if (memcmp(dst, ops[i]->sym->auth.digest.data,
SNOW3G_DIGEST_LENGTH) != 0)
ops[i]->status = 
RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
-
-   /* Trim area used for digest from mbuf. */
-   rte_pktmbuf_trim(ops[i]->sym->m_src,
-   SNOW3G_DIGEST_LENGTH);
} else  {
dst = ops[i]->sym->auth.digest.data;
 
@@ -346,16 +340,16 @@ process_ops(struct rte_crypto_op **ops, struct 
snow3g_session *session,
session, num_ops);
break;
case SNOW3G_OP_ONLY_AUTH:
-   processed_ops = process_snow3g_hash_op(ops, session,
+   processed_ops = process_snow3g_hash_op(qp, ops, session,
num_ops);
break;
case SNOW3G_OP_CIPHER_AUTH:
processed_ops = process_snow3g_cipher_op(ops, session,
num_ops);
-   process_snow3g_hash_op(ops, session, processed_ops);
+   process_snow3g_hash_op(qp, ops, session, processed_ops);
break;
case SNOW3G_OP_AUTH_CIPHER:
-   processed_ops = process_snow3g_hash_op(ops, session,
+   processed_ops = process_snow3g_hash_op(qp, ops, session,
num_ops);
process_snow3g_cipher_op(ops, session, processed_ops);
break;
@@ -403,15 +397,15 @@ process_op_bit(struct rte_crypto_op *op, struct 
snow3g_session *session,
session);
break;
case SNOW3G_OP_ONLY_AUTH:
-   processed_op = process_snow3g_hash_op(&op, session, 1);
+   processed_op = process_snow3g_hash_op(qp, &op, session, 1);
break;
case SNOW3G_OP_CIPHER_AUTH:
processed_op = process_snow3g_cipher_op_bit(op, session);
if (processed_op == 1)
-   process_snow3g_hash_op(&op, session, 1);
+   process_snow3g_hash_op(qp, &op, session, 1);
break;
case SNOW3G_OP_AUTH_CIPHER:
-   processed_op = process_snow3g_hash_op(&op, session, 1);
+   processed_op = process_snow3g_hash_op(qp, &op, session, 1);
if (processed_op == 1)
process_snow3g_cipher_op_bit(op, session);
break;
diff --git a/drivers/crypto/snow3g/rte_snow3g_pmd_private.h 
b/drivers/crypto/snow3g/rte_snow3g_pmd_private.h
index fba3cb8..7b9729f 100644
--- a/drivers/crypto/snow3g/rte_snow3g_pmd_private.h
+++ b/drivers/crypto/snow3g/rte_snow3g_pmd_private.h
@@ -58,6 +58,8 @@
 #define SNOW3G_LOG_DBG(fmt, args...)
 #endif
 
+#define SNOW3G_DIGEST_LENGTH 4
+
 /** private data structure for each virtual SNOW 3G device */
 struct snow3g_private {
unsigned max_nb_queue_pairs;
@@ -78,6 +80,11 @

[dpdk-dev] [PATCH v2 3/8] crypto/openssl: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/openssl/rte_openssl_pmd.c | 19 ---
 drivers/crypto/openssl/rte_openssl_pmd_private.h |  7 +++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c 
b/drivers/crypto/openssl/rte_openssl_pmd.c
index 0bd5f98..6e01669 100644
--- a/drivers/crypto/openssl/rte_openssl_pmd.c
+++ b/drivers/crypto/openssl/rte_openssl_pmd.c
@@ -1237,9 +1237,9 @@ process_openssl_docsis_bpi_op(struct rte_crypto_op *op,
 
 /** Process auth operation */
 static void
-process_openssl_auth_op
-   (struct rte_crypto_op *op, struct openssl_session *sess,
-   struct rte_mbuf *mbuf_src, struct rte_mbuf *mbuf_dst)
+process_openssl_auth_op(struct openssl_qp *qp, struct rte_crypto_op *op,
+   struct openssl_session *sess, struct rte_mbuf *mbuf_src,
+   struct rte_mbuf *mbuf_dst)
 {
uint8_t *dst;
int srclen, status;
@@ -1247,8 +1247,7 @@ process_openssl_auth_op
srclen = op->sym->auth.data.length;
 
if (sess->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY)
-   dst = (uint8_t *)rte_pktmbuf_append(mbuf_src,
-   sess->auth.digest_length);
+   dst = qp->temp_digest;
else {
dst = op->sym->auth.digest.data;
if (dst == NULL)
@@ -1279,8 +1278,6 @@ process_openssl_auth_op
sess->auth.digest_length) != 0) {
op->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
}
-   /* Trim area used for digest from mbuf. */
-   rte_pktmbuf_trim(mbuf_src, sess->auth.digest_length);
}
 
if (status != 0)
@@ -1289,7 +1286,7 @@ process_openssl_auth_op
 
 /** Process crypto operation for mbuf */
 static int
-process_op(const struct openssl_qp *qp, struct rte_crypto_op *op,
+process_op(struct openssl_qp *qp, struct rte_crypto_op *op,
struct openssl_session *sess)
 {
struct rte_mbuf *msrc, *mdst;
@@ -1305,14 +1302,14 @@ process_op(const struct openssl_qp *qp, struct 
rte_crypto_op *op,
process_openssl_cipher_op(op, sess, msrc, mdst);
break;
case OPENSSL_CHAIN_ONLY_AUTH:
-   process_openssl_auth_op(op, sess, msrc, mdst);
+   process_openssl_auth_op(qp, op, sess, msrc, mdst);
break;
case OPENSSL_CHAIN_CIPHER_AUTH:
process_openssl_cipher_op(op, sess, msrc, mdst);
-   process_openssl_auth_op(op, sess, mdst, mdst);
+   process_openssl_auth_op(qp, op, sess, mdst, mdst);
break;
case OPENSSL_CHAIN_AUTH_CIPHER:
-   process_openssl_auth_op(op, sess, msrc, mdst);
+   process_openssl_auth_op(qp, op, sess, msrc, mdst);
process_openssl_cipher_op(op, sess, msrc, mdst);
break;
case OPENSSL_CHAIN_COMBINED:
diff --git a/drivers/crypto/openssl/rte_openssl_pmd_private.h 
b/drivers/crypto/openssl/rte_openssl_pmd_private.h
index b7f7475..93937d5 100644
--- a/drivers/crypto/openssl/rte_openssl_pmd_private.h
+++ b/drivers/crypto/openssl/rte_openssl_pmd_private.h
@@ -59,6 +59,8 @@
 #define OPENSSL_LOG_DBG(fmt, args...)
 #endif
 
+/* Maximum length for digest (SHA-512 needs 64 bytes) */
+#define DIGEST_LENGTH_MAX 64
 
 /** OPENSSL operation order mode enumerator */
 enum openssl_chain_order {
@@ -103,6 +105,11 @@ struct openssl_qp {
/**< Session Mempool */
struct rte_cryptodev_stats stats;
/**< Queue pair statistics */
+   uint8_t temp_digest[DIGEST_LENGTH_MAX];
+   /**< Buffer used to store the digest generated
+* by the driver when verifying a digest provided
+* by the user (using authentication verify operation)
+*/
 } __rte_cache_aligned;
 
 /** OPENSSL crypto private session structure */
-- 
2.9.4



[dpdk-dev] [PATCH v2 6/8] crypto/zuc: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/zuc/rte_zuc_pmd.c | 16 +---
 drivers/crypto/zuc/rte_zuc_pmd_private.h |  7 +++
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/zuc/rte_zuc_pmd.c b/drivers/crypto/zuc/rte_zuc_pmd.c
index b301711..dd882c4 100644
--- a/drivers/crypto/zuc/rte_zuc_pmd.c
+++ b/drivers/crypto/zuc/rte_zuc_pmd.c
@@ -42,7 +42,6 @@
 
 #include "rte_zuc_pmd_private.h"
 
-#define ZUC_DIGEST_LENGTH 4
 #define ZUC_MAX_BURST 8
 #define BYTE_LEN 8
 
@@ -258,7 +257,7 @@ process_zuc_cipher_op(struct rte_crypto_op **ops,
 
 /** Generate/verify hash from mbufs with same hash key. */
 static int
-process_zuc_hash_op(struct rte_crypto_op **ops,
+process_zuc_hash_op(struct zuc_qp *qp, struct rte_crypto_op **ops,
struct zuc_session *session,
uint8_t num_ops)
 {
@@ -285,8 +284,7 @@ process_zuc_hash_op(struct rte_crypto_op **ops,
session->auth_iv_offset);
 
if (session->auth_op == RTE_CRYPTO_AUTH_OP_VERIFY) {
-   dst = (uint32_t *)rte_pktmbuf_append(ops[i]->sym->m_src,
-   ZUC_DIGEST_LENGTH);
+   dst = (uint32_t *)qp->temp_digest;
 
sso_zuc_eia3_1_buffer(session->pKey_hash,
iv, src,
@@ -295,10 +293,6 @@ process_zuc_hash_op(struct rte_crypto_op **ops,
if (memcmp(dst, ops[i]->sym->auth.digest.data,
ZUC_DIGEST_LENGTH) != 0)
ops[i]->status = 
RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
-
-   /* Trim area used for digest from mbuf. */
-   rte_pktmbuf_trim(ops[i]->sym->m_src,
-   ZUC_DIGEST_LENGTH);
} else  {
dst = (uint32_t *)ops[i]->sym->auth.digest.data;
 
@@ -327,16 +321,16 @@ process_ops(struct rte_crypto_op **ops, struct 
zuc_session *session,
session, num_ops);
break;
case ZUC_OP_ONLY_AUTH:
-   processed_ops = process_zuc_hash_op(ops, session,
+   processed_ops = process_zuc_hash_op(qp, ops, session,
num_ops);
break;
case ZUC_OP_CIPHER_AUTH:
processed_ops = process_zuc_cipher_op(ops, session,
num_ops);
-   process_zuc_hash_op(ops, session, processed_ops);
+   process_zuc_hash_op(qp, ops, session, processed_ops);
break;
case ZUC_OP_AUTH_CIPHER:
-   processed_ops = process_zuc_hash_op(ops, session,
+   processed_ops = process_zuc_hash_op(qp, ops, session,
num_ops);
process_zuc_cipher_op(ops, session, processed_ops);
break;
diff --git a/drivers/crypto/zuc/rte_zuc_pmd_private.h 
b/drivers/crypto/zuc/rte_zuc_pmd_private.h
index b706e0a..a57b8cd 100644
--- a/drivers/crypto/zuc/rte_zuc_pmd_private.h
+++ b/drivers/crypto/zuc/rte_zuc_pmd_private.h
@@ -59,6 +59,8 @@
 #endif
 
 #define ZUC_IV_KEY_LENGTH 16
+#define ZUC_DIGEST_LENGTH 4
+
 /** private data structure for each virtual ZUC device */
 struct zuc_private {
unsigned max_nb_queue_pairs;
@@ -79,6 +81,11 @@ struct zuc_qp {
/**< Session Mempool */
struct rte_cryptodev_stats qp_stats;
/**< Queue pair statistics */
+   uint8_t temp_digest[ZUC_DIGEST_LENGTH];
+   /**< Buffer used to store the digest generated
+* by the driver when verifying a digest provided
+* by the user (using authentication verify operation)
+*/
 } __rte_cache_aligned;
 
 enum zuc_operation {
-- 
2.9.4



[dpdk-dev] [PATCH v2 7/8] crypto/aesni_mb: do not append digest

2017-09-05 Thread Pablo de Lara
When performing an authentication verification,
the PMD was using memory at the end of the input buffer
to temporarily store the digest.
This unnecessarily requires the input buffer to have
enough tailroom.
Instead, memory is now allocated in each queue pair to
temporarily store the digest generated by the driver, so it
can be compared with the one provided in the crypto operation
without needing to touch the input buffer.

Signed-off-by: Pablo de Lara 
---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 36 +++---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c |  5 +++
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h | 12 +++-
 3 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c 
b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index 16e1451..529f469 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -407,7 +407,7 @@ get_session(struct aesni_mb_qp *qp, struct rte_crypto_op 
*op)
  */
 static inline int
 set_mb_job_params(JOB_AES_HMAC *job, struct aesni_mb_qp *qp,
-   struct rte_crypto_op *op)
+   struct rte_crypto_op *op, uint8_t *digest_idx)
 {
struct rte_mbuf *m_src = op->sym->m_src, *m_dst;
struct aesni_mb_session *session;
@@ -466,19 +466,8 @@ set_mb_job_params(JOB_AES_HMAC *job, struct aesni_mb_qp 
*qp,
/* Set digest output location */
if (job->hash_alg != NULL_HASH &&
session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
-   job->auth_tag_output = (uint8_t *)rte_pktmbuf_append(m_dst,
-   get_digest_byte_length(job->hash_alg));
-
-   if (job->auth_tag_output == NULL) {
-   MB_LOG_ERR("failed to allocate space in output mbuf "
-   "for temp digest");
-   op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-   return -1;
-   }
-
-   memset(job->auth_tag_output, 0,
-   sizeof(get_digest_byte_length(job->hash_alg)));
-
+   job->auth_tag_output = qp->temp_digests[*digest_idx];
+   *digest_idx = (*digest_idx + 1) % MAX_JOBS;
} else {
job->auth_tag_output = op->sym->auth.digest.data;
}
@@ -507,22 +496,17 @@ set_mb_job_params(JOB_AES_HMAC *job, struct aesni_mb_qp 
*qp,
 
/* Set user data to be crypto operation data struct */
job->user_data = op;
-   job->user_data2 = m_dst;
 
return 0;
 }
 
 static inline void
-verify_digest(JOB_AES_HMAC *job, struct rte_crypto_op *op) {
-   struct rte_mbuf *m_dst = (struct rte_mbuf *)job->user_data2;
-
+verify_digest(struct aesni_mb_qp *qp __rte_unused, JOB_AES_HMAC *job,
+   struct rte_crypto_op *op) {
/* Verify digest if required */
if (memcmp(job->auth_tag_output, op->sym->auth.digest.data,
job->auth_tag_output_len_in_bytes) != 0)
op->status = RTE_CRYPTO_OP_STATUS_AUTH_FAILED;
-
-   /* trim area used for digest from mbuf */
-   rte_pktmbuf_trim(m_dst, get_digest_byte_length(job->hash_alg));
 }
 
 /**
@@ -532,8 +516,7 @@ verify_digest(JOB_AES_HMAC *job, struct rte_crypto_op *op) {
  * @param job  JOB_AES_HMAC job to process
  *
  * @return
- * - Returns processed crypto operation which mbuf is trimmed of output digest
- *   used in verification of supplied digest.
+ * - Returns processed crypto operation.
  * - Returns NULL on invalid job
  */
 static inline struct rte_crypto_op *
@@ -552,7 +535,7 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC 
*job)
if (job->hash_alg != NULL_HASH) {
if (sess->auth.operation ==
RTE_CRYPTO_AUTH_OP_VERIFY)
-   verify_digest(job, op);
+   verify_digest(qp, job, op);
}
break;
default:
@@ -650,6 +633,7 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct 
rte_crypto_op **ops,
if (unlikely(nb_ops == 0))
return 0;
 
+   uint8_t digest_idx = qp->digest_idx;
do {
/* Get next operation to process from ingress queue */
retval = rte_ring_dequeue(qp->ingress_queue, (void **)&op);
@@ -667,7 +651,7 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct 
rte_crypto_op **ops,
job = (*qp->op_fns->job.get_next)(&qp->mb_mgr);
}
 
-   retval = set_mb_job_params(job, qp, op);
+   retval = set_mb_job_params(job, qp, op, &digest_idx);
if (unlikely(retval != 0)) {
qp->stats.dequeue_err_count++;
set_job_null_op(job);
@@ -687,6 +671,8 @@ aesni_mb_pmd_de

[dpdk-dev] [PATCH v2 8/8] test/crypto: do not allocate extra memory for digest

2017-09-05 Thread Pablo de Lara
Now that PMDs do not need extra space in the mbuf
to temporarily store the digest when verifying
an authentication tag, it is no longer necessary to allocate
extra memory in the mbufs passed to cryptodev.

Signed-off-by: Pablo de Lara 
---
 test/test/test_cryptodev_blockcipher.c | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/test/test/test_cryptodev_blockcipher.c 
b/test/test/test_cryptodev_blockcipher.c
index 6089af4..f8222bd 100644
--- a/test/test/test_cryptodev_blockcipher.c
+++ b/test/test/test_cryptodev_blockcipher.c
@@ -452,25 +452,13 @@ test_blockcipher_one_case(const struct 
blockcipher_test_case *t,
if (t->feature_mask & BLOCKCIPHER_TEST_FEATURE_OOP) {
struct rte_mbuf *mbuf;
uint8_t value;
-   uint32_t head_unchanged_len = 0, changed_len = 0;
+   uint32_t head_unchanged_len, changed_len = 0;
uint32_t i;
 
mbuf = sym_op->m_src;
-   if (t->op_mask & BLOCKCIPHER_TEST_OP_AUTH_VERIFY) {
-   /* white-box test: PMDs use some of the
-* tailroom as temp storage in verify case
-*/
-   head_unchanged_len = rte_pktmbuf_headroom(mbuf)
-   + rte_pktmbuf_data_len(mbuf);
-   changed_len = digest_len;
-   } else {
-   head_unchanged_len = mbuf->buf_len;
-   changed_len = 0;
-   }
+   head_unchanged_len = mbuf->buf_len;
 
for (i = 0; i < mbuf->buf_len; i++) {
-   if (i == head_unchanged_len)
-   i += changed_len;
value = *((uint8_t *)(mbuf->buf_addr)+i);
if (value != tmp_src_buf[i]) {
snprintf(test_msg, BLOCKCIPHER_TEST_MSG_LEN,
@@ -531,19 +519,6 @@ test_blockcipher_one_case(const struct 
blockcipher_test_case *t,
if (t->op_mask & BLOCKCIPHER_TEST_OP_AUTH_GEN)
changed_len += digest_len;
 
-   if (t->op_mask & BLOCKCIPHER_TEST_OP_AUTH_VERIFY) {
-   /* white-box test: PMDs use some of the
-* tailroom as temp storage in verify case
-*/
-   if (t->op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
-   /* This is simplified, not checking digest*/
-   changed_len += digest_len*2;
-   } else {
-   head_unchanged_len += digest_len;
-   changed_len += digest_len;
-   }
-   }
-
for (i = 0; i < mbuf->buf_len; i++) {
if (i == head_unchanged_len)
i += changed_len;
-- 
2.9.4



[dpdk-dev] [PATCH v2 0/5] make dpdk iova aware

2017-09-05 Thread Santosh Shukla
v2:
Includes build fixes reported in Patchwork.

Changeset based on deprecation notice [1], planned for the v17.11 release.
Patches are based on commit:
(c42021fe56 : ethdev: rename map file to match library name)

Summary:
Renaming memory address translation APIs/datatypes
and memory struct members to iova types.

1st patch : rename phys_addr_t to iova_addr_t
2nd patch : rename dma var mainly buf_physaddr to buf_iovaaddr
3rd patch : rename rte_memseg {.phys_addr} to {.iova_addr}.
4th patch : rename memory translation APIs to _iova types.
5th patch : remove deprecation notice for dpdk iova aware.
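
For illustration, the kind of change drivers see (hypothetical snippet,
names taken from the patches below):

  /* 17.08 */
  phys_addr_t pa = rte_mem_virt2phy(vaddr);
  rxq->rx_ring[idx] = mbuf->buf_physaddr;

  /* after this series */
  iova_addr_t pa = rte_mem_virt2iova(vaddr);
  rxq->rx_ring[idx] = mbuf->buf_iovaaddr;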

v1 --> v2:
- Includes build fixes reported in v1 [2]
- Added separate patch for rte_memseg's phys_addr to iova_addr renaming.

Checkpatch warning:
- Noticed warnings in the changeset coming from legacy code, unrelated
  to the iova changes.

Thanks.

[1]
http://dpdk.org/browse/dpdk/commit/doc/guides/rel_notes?id=caa570db61307e07efc461cf558ec291a3e71b29
[2] http://dpdk.org/ml/archives/test-report/2017-August/027020.html

Santosh Shukla (5):
  eal: rename phys_addr_t to iova_addr_t
  eal/memory: rename buf_physaddr to buf_iovaaddr
  eal/memory: rename memseg member phys to iova addr
  eal/memory: rename memory api to iova types
  doc: remove dpdk iova aware notice

 app/proc_info/main.c   |  2 +-
 app/test-crypto-perf/cperf_test_vector_parsing.c   |  4 +--
 app/test-crypto-perf/cperf_test_vectors.c  |  6 ++--
 app/test-crypto-perf/cperf_test_vectors.h  |  4 +--
 app/test-pmd/cmdline.c |  2 +-
 doc/guides/contributing/documentation.rst  |  4 +--
 doc/guides/prog_guide/cryptodev_lib.rst|  6 ++--
 doc/guides/prog_guide/img/mbuf1.svg|  2 +-
 doc/guides/rel_notes/deprecation.rst   |  7 
 doc/guides/rel_notes/release_17_11.rst | 27 ++
 drivers/bus/fslmc/fslmc_vfio.c |  2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h| 20 +--
 drivers/crypto/qat/qat_adf/qat_algs.h  |  6 ++--
 drivers/crypto/qat/qat_crypto.h|  2 +-
 drivers/crypto/qat/qat_qp.c|  2 +-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.h   |  2 +-
 drivers/net/ark/ark_ddm.c  |  2 +-
 drivers/net/ark/ark_ddm.h  |  4 +--
 drivers/net/ark/ark_ethdev_rx.c| 24 ++---
 drivers/net/ark/ark_ethdev_tx.c|  6 ++--
 drivers/net/ark/ark_mpu.c  |  2 +-
 drivers/net/ark/ark_mpu.h  |  4 +--
 drivers/net/ark/ark_udm.c  |  2 +-
 drivers/net/ark/ark_udm.h  |  4 +--
 drivers/net/avp/avp_ethdev.c   |  2 +-
 drivers/net/avp/rte_avp_common.h   | 20 +--
 drivers/net/bnx2x/bnx2x.c  | 40 ++---
 drivers/net/bnx2x/bnx2x.h  | 22 ++--
 drivers/net/bnx2x/bnx2x_rxtx.c |  8 ++---
 drivers/net/bnx2x/bnx2x_stats.c|  2 +-
 drivers/net/bnx2x/bnx2x_vfpf.c |  2 +-
 drivers/net/bnx2x/ecore_sp.h   |  2 +-
 drivers/net/bnxt/bnxt.h| 10 +++---
 drivers/net/bnxt/bnxt_cpr.h|  4 +--
 drivers/net/bnxt/bnxt_ethdev.c | 10 +++---
 drivers/net/bnxt/bnxt_hwrm.c   | 14 
 drivers/net/bnxt/bnxt_ring.c   |  6 ++--
 drivers/net/bnxt/bnxt_ring.h   |  4 +--
 drivers/net/bnxt/bnxt_rxr.h|  4 +--
 drivers/net/bnxt/bnxt_txr.h|  2 +-
 drivers/net/bnxt/bnxt_vnic.c   |  6 ++--
 drivers/net/bnxt/bnxt_vnic.h   |  6 ++--
 drivers/net/cxgbe/sge.c|  4 +--
 drivers/net/e1000/em_rxtx.c|  4 +--
 drivers/net/e1000/igb_rxtx.c   |  4 +--
 drivers/net/ena/ena_ethdev.c   |  6 ++--
 drivers/net/enic/enic_main.c   |  2 +-
 drivers/net/enic/enic_rxtx.c   |  6 ++--
 drivers/net/fm10k/fm10k.h  |  4 +--
 drivers/net/fm10k/fm10k_ethdev.c   |  4 +--
 drivers/net/fm10k/fm10k_rxtx_vec.c |  4 +--
 drivers/net/i40e/i40e_ethdev.c |  2 +-
 drivers/net/i40e/i40e_fdir.c   |  2 +-
 drivers/net/i40e/i40e_rxtx.c   |  8 ++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c   |  4 +--
 drivers/net/i40e/i40e_rxtx_vec_neon.c  |  6 ++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c   |  6 ++--
 drivers/net/ixgbe/ixgbe_rxtx.c |  4 +--
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c|  6 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c |  6 ++--

[dpdk-dev] [PATCH v2 1/5] eal: rename phys_addr_t to iova_addr_t

2017-09-05 Thread Santosh Shukla
Renamed data type from phys_addr_t to iova_addr_t.

Signed-off-by: Santosh Shukla 
---
v1 --> v2:
- clang build fix for v1 for linuxapp.

v1 note:
- As the changes percolate to almost every DPDK subsystem,
  it is difficult to tag the subject with one common title;
  since the core of the changeset is at the EAL layer, 'eal:'
  is kept as the title.

 app/test-crypto-perf/cperf_test_vectors.h  |  4 +--
 doc/guides/contributing/documentation.rst  |  4 +--
 doc/guides/prog_guide/cryptodev_lib.rst|  6 ++--
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h| 14 
 drivers/crypto/qat/qat_adf/qat_algs.h  |  6 ++--
 drivers/crypto/qat/qat_crypto.h|  2 +-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.h   |  2 +-
 drivers/net/ark/ark_ddm.c  |  2 +-
 drivers/net/ark/ark_ddm.h  |  4 +--
 drivers/net/ark/ark_ethdev_rx.c| 12 +++
 drivers/net/ark/ark_ethdev_tx.c|  2 +-
 drivers/net/ark/ark_mpu.c  |  2 +-
 drivers/net/ark/ark_mpu.h  |  4 +--
 drivers/net/ark/ark_udm.c  |  2 +-
 drivers/net/ark/ark_udm.h  |  4 +--
 drivers/net/avp/avp_ethdev.c   |  2 +-
 drivers/net/avp/rte_avp_common.h   | 20 +--
 drivers/net/bnx2x/bnx2x.c  | 40 +++---
 drivers/net/bnx2x/bnx2x.h  | 22 ++--
 drivers/net/bnx2x/bnx2x_rxtx.c |  4 +--
 drivers/net/bnx2x/bnx2x_stats.c|  2 +-
 drivers/net/bnx2x/bnx2x_vfpf.c |  2 +-
 drivers/net/bnx2x/ecore_sp.h   |  2 +-
 drivers/net/bnxt/bnxt.h| 10 +++---
 drivers/net/bnxt/bnxt_cpr.h|  4 +--
 drivers/net/bnxt/bnxt_ethdev.c |  2 +-
 drivers/net/bnxt/bnxt_ring.c   |  2 +-
 drivers/net/bnxt/bnxt_ring.h   |  2 +-
 drivers/net/bnxt/bnxt_rxr.h|  4 +--
 drivers/net/bnxt/bnxt_txr.h|  2 +-
 drivers/net/bnxt/bnxt_vnic.c   |  2 +-
 drivers/net/bnxt/bnxt_vnic.h   |  6 ++--
 drivers/net/liquidio/lio_rxtx.c|  2 +-
 drivers/net/liquidio/lio_rxtx.h|  4 +--
 drivers/net/qede/base/bcm_osal.h   |  2 +-
 drivers/net/sfc/efsys.h|  2 +-
 drivers/net/sfc/sfc_ef10_rx.c  |  2 +-
 drivers/net/sfc/sfc_ef10_tx.c  |  4 +--
 drivers/net/thunderx/base/nicvf_hw.c   |  2 +-
 drivers/net/thunderx/base/nicvf_hw.h   |  2 +-
 drivers/net/thunderx/base/nicvf_hw_defs.h  |  6 ++--
 drivers/net/thunderx/nicvf_ethdev.c|  4 +--
 drivers/net/thunderx/nicvf_ethdev.h|  4 +--
 drivers/net/thunderx/nicvf_struct.h|  6 ++--
 drivers/net/virtio/virtio_rxtx.h   |  4 +--
 drivers/net/virtio/virtqueue.h |  2 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c  |  2 +-
 drivers/net/xenvirt/rte_mempool_gntalloc.c |  6 ++--
 drivers/net/xenvirt/rte_xen_lib.c  |  6 ++--
 drivers/net/xenvirt/rte_xen_lib.h  |  8 ++---
 examples/l2fwd-crypto/main.c   |  2 +-
 lib/librte_cryptodev/rte_crypto.h  |  2 +-
 lib/librte_cryptodev/rte_crypto_sym.h  |  6 ++--
 lib/librte_cryptodev/rte_cryptodev.h   |  2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c |  4 +--
 lib/librte_eal/common/include/rte_malloc.h |  2 +-
 lib/librte_eal/common/include/rte_memory.h | 18 +-
 lib/librte_eal/common/include/rte_memzone.h|  2 +-
 lib/librte_eal/common/rte_malloc.c |  2 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c   |  8 ++---
 lib/librte_eal/linuxapp/eal/eal_xen_memory.c   |  4 +--
 .../linuxapp/eal/include/exec-env/rte_kni_common.h | 20 ++-
 lib/librte_mbuf/rte_mbuf.h |  8 ++---
 lib/librte_mempool/rte_mempool.c   | 18 +-
 lib/librte_mempool/rte_mempool.h   | 14 
 lib/librte_vhost/vhost.h   |  2 +-
 test/test/test_cryptodev.h |  2 +-
 test/test/test_memzone.c   |  8 ++---
 68 files changed, 197 insertions(+), 193 deletions(-)

diff --git a/app/test-crypto-perf/cperf_test_vectors.h 
b/app/test-crypto-perf/cperf_test_vectors.h
index 85955703c..a203272cf 100644
--- a/app/test-crypto-perf/cperf_test_vectors.h
+++ b/app/test-crypto-perf/cperf_test_vectors.h
@@ -78,13 +78,13 @@ struct cperf_test_vector {
 
struct {
uint8_t *data;
-   phys_addr_t phys_addr;
+   iova_addr_t phy

[dpdk-dev] [PATCH v2 2/5] eal/memory: rename buf_physaddr to buf_iovaaddr

2017-09-05 Thread Santosh Shukla
Signed-off-by: Santosh Shukla 
---
v1 notes:
 Since the crux of the change is in the eal/memory area, that is
 used as the title.

 doc/guides/prog_guide/img/mbuf1.svg|  2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h|  2 +-
 drivers/net/ark/ark_ethdev_rx.c|  8 
 drivers/net/bnx2x/bnx2x_rxtx.c |  4 ++--
 drivers/net/bnxt/bnxt_ring.h   |  2 +-
 drivers/net/cxgbe/sge.c|  4 ++--
 drivers/net/ena/ena_ethdev.c   |  6 +++---
 drivers/net/enic/enic_main.c   |  2 +-
 drivers/net/enic/enic_rxtx.c   |  6 +++---
 drivers/net/fm10k/fm10k.h  |  4 ++--
 drivers/net/fm10k/fm10k_rxtx_vec.c |  4 ++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c   |  4 ++--
 drivers/net/i40e/i40e_rxtx_vec_neon.c  |  6 +++---
 drivers/net/i40e/i40e_rxtx_vec_sse.c   |  6 +++---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c|  6 +++---
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c |  6 +++---
 drivers/net/nfp/nfp_net.c  |  2 +-
 drivers/net/virtio/virtio_ethdev.c |  2 +-
 drivers/net/virtio/virtqueue.h |  2 +-
 .../linuxapp/eal/include/exec-env/rte_kni_common.h |  2 +-
 lib/librte_eal/linuxapp/kni/kni_net.c  |  6 +++---
 lib/librte_kni/rte_kni.c   |  2 +-
 lib/librte_mbuf/rte_mbuf.c |  6 +++---
 lib/librte_mbuf/rte_mbuf.h | 14 +++---
 lib/librte_vhost/virtio_net.c  |  2 +-
 test/test/test_mbuf.c  |  2 +-
 26 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/doc/guides/prog_guide/img/mbuf1.svg 
b/doc/guides/prog_guide/img/mbuf1.svg
index 5bd84d1bf..2f856bfd9 100644
--- a/doc/guides/prog_guide/img/mbuf1.svg
+++ b/doc/guides/prog_guide/img/mbuf1.svg
@@ -482,7 +482,7 @@
  sodipodi:role="line"
  x="187.85715"
  y="347.7193"
- id="tspan5240">(m->buf_physaddr is the(m->buf_iovaaddr is thebuf_physaddr)
+#define DPAA2_MBUF_VADDR_TO_IOVA(mbuf) ((mbuf)->buf_iovaaddr)
 #define DPAA2_OP_VADDR_TO_IOVA(op) (op->phys_addr)
 
 /**
diff --git a/drivers/net/ark/ark_ethdev_rx.c b/drivers/net/ark/ark_ethdev_rx.c
index 1cbda01a7..90cf304c0 100644
--- a/drivers/net/ark/ark_ethdev_rx.c
+++ b/drivers/net/ark/ark_ethdev_rx.c
@@ -500,22 +500,22 @@ eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue)
case 0:
while (count != nb) {
queue->paddress_q[seed_m++] =
-   (*mbufs++)->buf_physaddr;
+   (*mbufs++)->buf_iovaaddr;
count++;
/* FALLTHROUGH */
case 3:
queue->paddress_q[seed_m++] =
-   (*mbufs++)->buf_physaddr;
+   (*mbufs++)->buf_iovaaddr;
count++;
/* FALLTHROUGH */
case 2:
queue->paddress_q[seed_m++] =
-   (*mbufs++)->buf_physaddr;
+   (*mbufs++)->buf_iovaaddr;
count++;
/* FALLTHROUGH */
case 1:
queue->paddress_q[seed_m++] =
-   (*mbufs++)->buf_physaddr;
+   (*mbufs++)->buf_iovaaddr;
count++;
/* FALLTHROUGH */
 
diff --git a/drivers/net/bnx2x/bnx2x_rxtx.c b/drivers/net/bnx2x/bnx2x_rxtx.c
index 7336124fc..e558bb12c 100644
--- a/drivers/net/bnx2x/bnx2x_rxtx.c
+++ b/drivers/net/bnx2x/bnx2x_rxtx.c
@@ -140,7 +140,7 @@ bnx2x_dev_rx_queue_setup(struct rte_eth_dev *dev,
return -ENOMEM;
}
rxq->sw_ring[idx] = mbuf;
-   rxq->rx_ring[idx] = mbuf->buf_physaddr;
+   rxq->rx_ring[idx] = mbuf->buf_iovaaddr;
}
rxq->pkt_first_seg = NULL;
rxq->pkt_last_seg = NULL;
@@ -400,7 +400,7 @@ bnx2x_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
 
rx_mb = rxq->sw_ring[bd_cons];
rxq->sw_ring[bd_cons] = new_mb;
-   rxq->rx_ring[bd_prod] = new_mb->buf_physaddr;
+   rxq->rx_ring[bd_prod] = new_mb->buf_iovaaddr;
 
rx_pref = NEXT_RX_BD(bd_cons) & MAX_RX_BD(rxq);
rte_prefetch0(rxq->sw_ring[rx_pref]);
diff --git a/drivers/net/bnxt/bnxt_ring.h b/drivers/net/bnxt/bnxt_ring.h
index 09042cb80..79504af24 100644
--- a/drivers/net/bnxt/bnxt_ring.h
+++ b/drivers/net/bnxt/bnxt_ring.h
@@ -41,7 +41,7 @@
 #define RING_NEXT(ring, idx)   (((idx) + 1) & (ring)->ring_mask)
 
 #define RTE_MBUF_DAT

[dpdk-dev] [PATCH v2 4/5] eal/memory: rename memory api to iova types

2017-09-05 Thread Santosh Shukla
Renamed the memory translation APIs to _iova types.
The following APIs were renamed from:

rte_mempool_populate_phys()
rte_mempool_populate_phys_tab()
rte_eal_using_phys_addrs()
rte_mem_virt2phy()
rte_dump_physmem_layout()
rte_eal_get_physmem_layout()
rte_eal_get_physmem_size()
rte_malloc_virt2phy()
rte_mem_phy2mch()

To the following iova-type APIs:

rte_mempool_populate_iova()
rte_mempool_populate_iova_tab()
rte_eal_using_iova_addrs()
rte_mem_virt2iova()
rte_dump_iovamem_layout()
rte_eal_get_iovamem_layout()
rte_eal_get_iovamem_size()
rte_malloc_virt2iova()
rte_mem_phy2iova()

Signed-off-by: Santosh Shukla 
---
 app/proc_info/main.c |  2 +-
 app/test-crypto-perf/cperf_test_vector_parsing.c |  4 ++--
 app/test-crypto-perf/cperf_test_vectors.c|  6 +++---
 app/test-pmd/cmdline.c   |  2 +-
 drivers/bus/fslmc/fslmc_vfio.c   |  2 +-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h  |  4 ++--
 drivers/crypto/qat/qat_qp.c  |  2 +-
 drivers/net/ark/ark_ethdev_rx.c  |  4 ++--
 drivers/net/ark/ark_ethdev_tx.c  |  4 ++--
 drivers/net/bnxt/bnxt_ethdev.c   |  8 
 drivers/net/bnxt/bnxt_hwrm.c | 14 +++---
 drivers/net/bnxt/bnxt_ring.c |  4 ++--
 drivers/net/bnxt/bnxt_vnic.c |  4 ++--
 drivers/net/e1000/em_rxtx.c  |  4 ++--
 drivers/net/e1000/igb_rxtx.c |  4 ++--
 drivers/net/fm10k/fm10k_ethdev.c |  4 ++--
 drivers/net/i40e/i40e_ethdev.c   |  2 +-
 drivers/net/i40e/i40e_fdir.c |  2 +-
 drivers/net/i40e/i40e_rxtx.c |  8 
 drivers/net/ixgbe/ixgbe_rxtx.c   |  4 ++--
 drivers/net/liquidio/lio_rxtx.c  |  2 +-
 drivers/net/mlx4/mlx4.c  |  2 +-
 drivers/net/mlx5/mlx5_mr.c   |  2 +-
 drivers/net/sfc/sfc.c|  2 +-
 drivers/net/sfc/sfc_tso.c|  2 +-
 examples/l2fwd-crypto/main.c |  2 +-
 lib/librte_cryptodev/rte_cryptodev.c |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c  |  2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c   |  2 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map| 12 ++--
 lib/librte_eal/common/eal_common_memory.c|  6 +++---
 lib/librte_eal/common/eal_common_memzone.c   |  4 ++--
 lib/librte_eal/common/eal_private.h  |  2 +-
 lib/librte_eal/common/include/rte_malloc.h   |  2 +-
 lib/librte_eal/common/include/rte_memory.h   | 12 ++--
 lib/librte_eal/common/rte_malloc.c   |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c|  2 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c |  8 
 lib/librte_eal/linuxapp/eal/eal_pci.c|  4 ++--
 lib/librte_eal/linuxapp/eal/eal_vfio.c   |  6 +++---
 lib/librte_eal/linuxapp/eal/rte_eal_version.map  | 12 ++--
 lib/librte_mempool/rte_mempool.c | 24 
 lib/librte_mempool/rte_mempool.h |  4 ++--
 lib/librte_mempool/rte_mempool_version.map   |  4 ++--
 lib/librte_vhost/vhost_user.c|  4 ++--
 test/test/commands.c |  2 +-
 test/test/test_malloc.c  |  4 ++--
 test/test/test_memory.c  |  6 +++---
 test/test/test_mempool.c |  4 ++--
 test/test/test_memzone.c | 10 +-
 50 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 8b753a2ee..16df6d4b1 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -297,7 +297,7 @@ static void
 meminfo_display(void)
 {
printf("--- MEMORY_SEGMENTS ---\n");
-   rte_dump_physmem_layout(stdout);
+   rte_dump_iovamem_layout(stdout);
printf("- END_MEMORY_SEGMENTS -\n");
 
printf(" MEMORY_ZONES -\n");
diff --git a/app/test-crypto-perf/cperf_test_vector_parsing.c 
b/app/test-crypto-perf/cperf_test_vector_parsing.c
index 148a60414..2e4e10a85 100644
--- a/app/test-crypto-perf/cperf_test_vector_parsing.c
+++ b/app/test-crypto-perf/cperf_test_vector_parsing.c
@@ -390,7 +390,7 @@ parse_entry(char *entry, struct cperf_test_vector *vector,
} else if (strstr(key_token, "aad")) {
rte_free(vector->aad.data);
vector->aad.data = data;
-   vector->aad.phys_addr = rte_malloc_virt2phy(vector->aad.data);
+   vector->aad.phys_addr = rte_malloc_virt2iova(vector->aad.data);
if (tc_found)
vector->aad.length = data_length;
else {
@@ -405,7 +405,7 @@ parse_entry(char *entry, struct cperf_test_vector *vector,
   

[dpdk-dev] [PATCH v2 3/5] eal/memory: rename memseg member phys to iova addr

2017-09-05 Thread Santosh Shukla
Renaming rte_memseg {.phys_addr} to {.iova_addr}

Signed-off-by: Santosh Shukla 
---
v1 --> v2:
- Includes FreeBSD v1 build fixes.

 lib/librte_eal/bsdapp/eal/eal_memory.c | 4 ++--
 lib/librte_eal/common/eal_common_memory.c  | 2 +-
 lib/librte_eal/common/include/rte_memory.h | 4 ++--
 lib/librte_eal/common/rte_malloc.c | 5 +++--
 lib/librte_eal/linuxapp/eal/eal_memory.c   | 8 
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 4 ++--
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c 
b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 10c2e121f..d8882dcef 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -73,7 +73,7 @@ rte_eal_hugepage_init(void)
/* for debug purposes, hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
addr = malloc(internal_config.memory);
-   mcfg->memseg[0].phys_addr = (iova_addr_t)(uintptr_t)addr;
+   mcfg->memseg[0].iova_addr = (iova_addr_t)(uintptr_t)addr;
mcfg->memseg[0].addr = addr;
mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
mcfg->memseg[0].len = internal_config.memory;
@@ -114,7 +114,7 @@ rte_eal_hugepage_init(void)
 
seg = &mcfg->memseg[seg_idx++];
seg->addr = addr;
-   seg->phys_addr = physaddr;
+   seg->iova_addr = (iova_addr_t)physaddr;
seg->hugepage_sz = hpi->hugepage_sz;
seg->len = hpi->hugepage_sz;
seg->nchannel = mcfg->nchannel;
diff --git a/lib/librte_eal/common/eal_common_memory.c 
b/lib/librte_eal/common/eal_common_memory.c
index 996877ef5..5ed83d20a 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -100,7 +100,7 @@ rte_dump_physmem_layout(FILE *f)
   "virt:%p, socket_id:%"PRId32", "
   "hugepage_sz:%"PRIu64", nchannel:%"PRIx32", "
   "nrank:%"PRIx32"\n", i,
-  mcfg->memseg[i].phys_addr,
+  mcfg->memseg[i].iova_addr,
   mcfg->memseg[i].len,
   mcfg->memseg[i].addr,
   mcfg->memseg[i].socket_id,
diff --git a/lib/librte_eal/common/include/rte_memory.h 
b/lib/librte_eal/common/include/rte_memory.h
index 5face8c86..6b148ba8e 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -98,14 +98,14 @@ enum rte_page_sizes {
  */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
 
-typedef uint64_t iova_addr_t; /**< Physical address definition. */
+typedef uint64_t iova_addr_t; /**< Iova address definition. */
 #define RTE_BAD_PHYS_ADDR ((iova_addr_t)-1)
 
 /**
  * Physical memory segment descriptor.
  */
 struct rte_memseg {
-   iova_addr_t phys_addr;  /**< Start physical address. */
+   iova_addr_t iova_addr;  /**< Start iova(_pa or _va) address. */
RTE_STD_C11
union {
void *addr; /**< Start virtual address. */
diff --git a/lib/librte_eal/common/rte_malloc.c 
b/lib/librte_eal/common/rte_malloc.c
index 3ce6034bf..b65a06f9d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -254,7 +254,8 @@ rte_malloc_virt2phy(const void *addr)
const struct malloc_elem *elem = malloc_elem_from_data(addr);
if (elem == NULL)
return RTE_BAD_PHYS_ADDR;
-   if (elem->ms->phys_addr == RTE_BAD_PHYS_ADDR)
+   if (elem->ms->iova_addr == RTE_BAD_PHYS_ADDR)
return RTE_BAD_PHYS_ADDR;
-   return elem->ms->phys_addr + ((uintptr_t)addr - 
(uintptr_t)elem->ms->addr);
+   return elem->ms->iova_addr +
+   ((uintptr_t)addr - (uintptr_t)elem->ms->addr);
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 405c15bcd..5d9702c72 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -154,7 +154,7 @@ rte_mem_virt2phy(const void *virtaddr)
if (virtaddr > memseg->addr &&
virtaddr < RTE_PTR_ADD(memseg->addr,
memseg->len)) {
-   return memseg->phys_addr +
+   return memseg->iova_addr +
RTE_PTR_DIFF(virtaddr, memseg->addr);
}
}
@@ -1059,7 +1059,7 @@ rte_eal_hugepage_init(void)
strerror(errno));
return -1;
}
-   mcfg->memseg[0].phys_addr = RTE_BAD_PHYS_ADDR;
+   mcfg->memseg[0].iova_addr = RTE_BAD_PHYS_ADDR;
mcfg->memseg[0

[dpdk-dev] [PATCH v2 5/5] doc: remove dpdk iova aware notice

2017-09-05 Thread Santosh Shukla
Removed the DPDK iova-aware ABI deprecation notice
and updated the ABI change details in release_17_11.rst.

Signed-off-by: Santosh Shukla 
---
 doc/guides/rel_notes/deprecation.rst   |  7 ---
 doc/guides/rel_notes/release_17_11.rst | 27 +++
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 3362f3350..6482363bf 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -32,13 +32,6 @@ Deprecation Notices
 * eal: the support of Xen dom0 will be removed from EAL in 17.11; and with
   that, drivers/net/xenvirt and examples/vhost_xen will also be removed.
 
-* eal: An ABI change is planned for 17.11 to make DPDK aware of IOVA address
-  translation scheme.
-  Reference to phys address in EAL data-structure or functions may change to
-  IOVA address or more appropriate name.
-  The change will be only for the name.
-  Functional aspects of the API or data-structure will remain same.
-
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
diff --git a/doc/guides/rel_notes/release_17_11.rst 
b/doc/guides/rel_notes/release_17_11.rst
index 170f4f916..30d0c0229 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -124,7 +124,34 @@ ABI Changes
Also, make sure to start the actual text at the margin.
=
 
+* **Following datatypes, structure member and function renamed to iova type.**
 
+  * Renamed ``phys_addr_t`` to ``iova_addr_t``.
+  * Renamed ``buf_physaddr`` to ``buf_iovaaddr`` for struct rte_mbuf.
+  * Renamed ``phys_addr`` to ``iova_addr`` for struct rte_memseg.
+  * The Following memory translation api renamed from:
+
+* ``rte_mempool_populate_phys()``
+* ``rte_mempool_populate_phys_tab()``
+* ``rte_eal_using_phys_addrs()``
+* ``rte_mem_virt2phy()``
+* ``rte_dump_physmem_layout()``
+* ``rte_eal_get_physmem_layout()``
+* ``rte_eal_get_physmem_size()``
+* ``rte_malloc_virt2phy()``
+* ``rte_mem_phy2mch()``
+
+  * To the following iova types api:
+
+* ``rte_mempool_populate_iova()``
+* ``rte_mempool_populate_iova_tab()``
+* ``rte_eal_using_iova_addrs()``
+* ``rte_mem_virt2iova()``
+* ``rte_dump_iovamem_layout()``
+* ``rte_eal_get_iovamem_layout()``
+* ``rte_eal_get_iovamem_size()``
+* ``rte_malloc_virt2iova()``
+* ``rte_mem_phy2iova()``
 
 Shared Library Versions
 ---
-- 
2.11.0



Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event

2017-09-05 Thread Matan Azrad
Hi Adrien

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Tuesday, September 5, 2017 12:28 PM
> To: Matan Azrad 
> Cc: Nélio Laranjeiro ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event
> 
> Hi Matan,
> 
> On Mon, Sep 04, 2017 at 05:52:55PM +, Matan Azrad wrote:
> > Hi Adrien,
> >
> > > -Original Message-
> > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > Sent: Monday, September 4, 2017 6:33 PM
> > > To: Matan Azrad 
> > > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal
> > > event
> > >
> > > Hi Matan,
> > >
> > > One comment I have is, while this patch adds support for RMV, it
> > > also silently addresses a bug (see large comment you added to
> > > priv_link_status_update()).
> > >
> > > This should be split in two commits, with the fix part coming first
> > > and CC sta...@dpdk.org, and a second commit adding RMV support
> proper.
> > >
> >
> > Actually, the mlx4 bug did not appear in the previous mlx5 code,
> > probably because the RMV interrupt was not implemented in mlx5 before
> > this patch.
> 
> Good point, no RMV could occur before it is implemented, however a
> dedicated commit for the fix itself (i.e. alarm callback not supposed to end 
> up
> calling ibv_get_async_event()) might better explain the logic behind these
> changes. What I mean is, if there was no problem, you wouldn't need to
> make
> priv_link_status_update() a separate function, right?
> 

The separation was done mainly because of the new interrupt implementation;
otherwise, there would have been a bug here.
The unnecessary ibv_get_async_event() call from the alarm callback was
harmless in the previous code.
I get your point about explaining the logic behind these changes, and I can
add it to this patch's commit log to be clearer, something like:
The link update operation was separated from the interrupt callback
to avoid RMV interrupt disregard and unnecessary event acknowledgment
caused by the inconsistent link status alarm callback.

> > The big comment just explains the link inconsistent issue and was
> > added here since Nelio and I think the new function,
> > priv_link_status_update(), justifies this comment for future review.
> 
> I understand, this could also have been part of the commit log of the
> dedicated commit.
> 
Are you sure we need to describe the reason for the code comment in the commit log?

> Thanks.
> 
> --
> Adrien Mazarguil
> 6WIND


[dpdk-dev] [PATCH v4 1/3] eal: introduce integer divide through reciprocal

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

In some use cases of integer division, denominator remains constant and
numerator varies. It is possible to optimize division for such specific
scenarios.

The librte_sched uses rte_reciprocal to optimize division so, moving it to
eal/common would allow other libraries and applications to use it.

Signed-off-by: Pavan Nikhilesh 
Reviewed-by: Anatoly Burakov 
---
v4 changes:
 - minor fix for test cases
 - fix u32 divisor generation

v3 changes:
 - fix x86_32 compilation issue
 - fix improper licence in test

v2 changes:
 - fix compilation issues with .map files
 - add test cases for correctness and performance
 - remove extra licence inclusion
 - fix coding style issues
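
As a usage note (illustrative only, not part of the patch): once the header
is exported by EAL, any component that repeatedly divides by the same
constant can precompute the reciprocal once and reuse it, for example:

#include <stdio.h>
#include <inttypes.h>
#include <rte_reciprocal.h>

int
main(void)
{
	const uint32_t divisor = 1000000;	/* constant denominator */
	struct rte_reciprocal r = rte_reciprocal_value(divisor);
	uint32_t a[] = { 123456789, 2000000000u, 999999 };
	unsigned int i;

	for (i = 0; i < 3; i++)
		/* same result as a[i] / divisor, without a hardware divide */
		printf("%" PRIu32 " / %" PRIu32 " = %" PRIu32 "\n",
			a[i], divisor, rte_reciprocal_divide(a[i], r));
	return 0;
}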

 lib/librte_eal/bsdapp/eal/Makefile   | 1 +
 lib/librte_eal/bsdapp/eal/rte_eal_version.map| 7 +++
 lib/librte_eal/common/Makefile   | 1 +
 lib/{librte_sched => librte_eal/common/include}/rte_reciprocal.h | 6 --
 lib/{librte_sched => librte_eal/common}/rte_reciprocal.c | 6 --
 lib/librte_eal/linuxapp/eal/Makefile | 1 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map  | 7 +++
 lib/librte_sched/Makefile| 2 --
 lib/librte_sched/rte_sched.c | 2 +-
 9 files changed, 26 insertions(+), 7 deletions(-)
 rename lib/{librte_sched => librte_eal/common/include}/rte_reciprocal.h (87%)
 rename lib/{librte_sched => librte_eal/common}/rte_reciprocal.c (96%)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index 005019e..56f9804 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -88,6 +88,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_keepalive.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_service.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_reciprocal.c

 # from arch dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_cpuflags.c
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index aac6fd7..90d7258 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -237,3 +237,10 @@ EXPERIMENTAL {
rte_service_unregister;

 } DPDK_17.08;
+
+DPDK_17.11 {
+   global:
+
+   rte_reciprocal_value;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index e8fd67a..a680b2d 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -42,6 +42,7 @@ INC += rte_hexdump.h rte_devargs.h rte_bus.h rte_dev.h 
rte_vdev.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 INC += rte_malloc.h rte_keepalive.h rte_time.h
 INC += rte_service.h rte_service_component.h
+INC += rte_reciprocal.h

 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h
diff --git a/lib/librte_sched/rte_reciprocal.h 
b/lib/librte_eal/common/include/rte_reciprocal.h
similarity index 87%
rename from lib/librte_sched/rte_reciprocal.h
rename to lib/librte_eal/common/include/rte_reciprocal.h
index 5e21f09..b6d752f 100644
--- a/lib/librte_sched/rte_reciprocal.h
+++ b/lib/librte_eal/common/include/rte_reciprocal.h
@@ -29,13 +29,15 @@ struct rte_reciprocal {
uint8_t sh1, sh2;
 };

-static inline uint32_t rte_reciprocal_divide(uint32_t a, struct rte_reciprocal 
R)
+static inline uint32_t
+rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
 {
uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);

return (t + ((a - t) >> R.sh1)) >> R.sh2;
 }

-struct rte_reciprocal rte_reciprocal_value(uint32_t d);
+struct rte_reciprocal
+rte_reciprocal_value(uint32_t d);

 #endif /* _RTE_RECIPROCAL_H_ */
diff --git a/lib/librte_sched/rte_reciprocal.c 
b/lib/librte_eal/common/rte_reciprocal.c
similarity index 96%
rename from lib/librte_sched/rte_reciprocal.c
rename to lib/librte_eal/common/rte_reciprocal.c
index 652f023..7ab99b4 100644
--- a/lib/librte_sched/rte_reciprocal.c
+++ b/lib/librte_eal/common/rte_reciprocal.c
@@ -41,7 +41,8 @@
 /* find largest set bit.
  * portable and slow but does not matter for this usage.
  */
-static inline int fls(uint32_t x)
+static inline int
+fls(uint32_t x)
 {
int b;

@@ -53,7 +54,8 @@ static inline int fls(uint32_t x)
return 0;
 }

-struct rte_reciprocal rte_reciprocal_value(uint32_t d)
+struct rte_reciprocal
+rte_reciprocal_value(uint32_t d)
 {
struct rte_reciprocal R;
uint64_t m;
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 90bca4d..98f3b8e 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -100,6 +100,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_

[dpdk-dev] [PATCH v4 2/3] eal: add u64 bit variant for reciprocal

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

Currently, rte_reciprocal only supports unsigned 32bit divisors. This
commit adds support for unsigned 64bit divisors.

Rename unsigned 32bit specific functions appropriately and update
librte_sched accordingly.

Signed-off-by: Pavan Nikhilesh 
---
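A small illustrative example of the new 64-bit variant (not part of the
patch); note that the divide helpers now take the precomputed structure by
pointer:

#include <stdio.h>
#include <inttypes.h>
#include <rte_reciprocal.h>

int
main(void)
{
	const uint64_t tsc_hz = 2300000000ULL;	/* constant divisor */
	struct rte_reciprocal_u64 r = rte_reciprocal_value_u64(tsc_hz);
	uint64_t tsc_delta = 123456789012ULL;

	/* same result as tsc_delta / tsc_hz, without a 64-bit hardware divide */
	printf("%" PRIu64 " seconds\n",
		rte_reciprocal_divide_u64(tsc_delta, &r));
	return 0;
}
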
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   3 +-
 lib/librte_eal/common/include/rte_reciprocal.h  | 111 +--
 lib/librte_eal/common/rte_reciprocal.c  | 116 +---
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   3 +-
 lib/librte_sched/Makefile   |   4 +-
 lib/librte_sched/rte_sched.c|   9 +-
 6 files changed, 220 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 90d7258..59a85bb 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -241,6 +241,7 @@ EXPERIMENTAL {
 DPDK_17.11 {
global:
 
-   rte_reciprocal_value;
+   rte_reciprocal_value_u32;
+   rte_reciprocal_value_u64;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_reciprocal.h 
b/lib/librte_eal/common/include/rte_reciprocal.h
index b6d752f..801d1c8 100644
--- a/lib/librte_eal/common/include/rte_reciprocal.h
+++ b/lib/librte_eal/common/include/rte_reciprocal.h
@@ -22,22 +22,117 @@
 #ifndef _RTE_RECIPROCAL_H_
 #define _RTE_RECIPROCAL_H_
 
-#include 
+#include 
 
-struct rte_reciprocal {
+/**
+ * Unsigned 32-bit divisor structure.
+ */
+struct rte_reciprocal_u32 {
uint32_t m;
uint8_t sh1, sh2;
-};
+} __rte_cache_aligned;
+
+/**
+ * Unsigned 64-bit divisor structure.
+ */
+struct rte_reciprocal_u64 {
+   uint64_t m;
+   uint8_t sh1;
+} __rte_cache_aligned;
 
+/**
+ * Divide given unsigned 32-bit integer with pre calculated divisor.
+ *
+ * @param a
+ *   The 32-bit dividend.
+ * @param R
+ *   The pointer to pre calculated divisor reciprocal structure.
+ *
+ * @return
+ *   The result of the division
+ */
 static inline uint32_t
-rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
+rte_reciprocal_divide_u32(uint32_t a, struct rte_reciprocal_u32 *R)
+{
+   uint32_t t = (((uint64_t)a * R->m) >> 32);
+
+   return (t + ((a - t) >> R->sh1)) >> R->sh2;
+}
+
+static inline uint64_t
+mullhi_u64(uint64_t x, uint64_t y)
+{
+#ifdef __SIZEOF_INT128__
+   __uint128_t xl = x;
+   __uint128_t rl = xl * y;
+
+   return (rl >> 64);
+#else
+   uint64_t u0, u1, v0, v1, k, t;
+   uint64_t w1, w2;
+   uint64_t whi;
+
+   u1 = x >> 32; u0 = x & 0x;
+   v1 = y >> 32; v0 = y & 0x;
+
+   t = u0*v0;
+   k = t >> 32;
+
+   t = u1*v0 + k;
+   w1 = t & 0x;
+   w2 = t >> 32;
+
+   t = u0*v1 + w1;
+   k = t >> 32;
+
+   whi = u1*v1 + w2 + k;
+
+   return whi;
+#endif
+}
+
+/**
+ * Divide given unsigned 64-bit integer with pre calculated divisor.
+ *
+ * @param a
+ *   The 64-bit dividend.
+ * @param R
+ *   The pointer to pre calculated divisor reciprocal structure.
+ *
+ * @return
+ *   The result of the division
+ */
+static inline uint64_t
+rte_reciprocal_divide_u64(uint64_t a, struct rte_reciprocal_u64 *R)
 {
-   uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);
+   uint64_t q = mullhi_u64(R->m, a);
+   uint64_t t = ((a - q) >> 1) + q;
 
-   return (t + ((a - t) >> R.sh1)) >> R.sh2;
+   return t >> R->sh1;
 }
 
-struct rte_reciprocal
-rte_reciprocal_value(uint32_t d);
+/**
+ * Generate pre calculated divisor structure.
+ *
+ * @param d
+ *   The unsigned 32-bit divisor.
+ *
+ * @return
+ *   Divisor structure.
+ */
+struct rte_reciprocal_u32
+rte_reciprocal_value_u32(uint32_t d);
+
+/**
+ * Generate pre calculated divisor structure.
+ *
+ * @param d
+ *   The unsigned 64-bit divisor.
+ *
+ * @return
+ *   Divisor structure.
+ */
+struct rte_reciprocal_u64
+rte_reciprocal_value_u64(uint64_t d);
 
 #endif /* _RTE_RECIPROCAL_H_ */
diff --git a/lib/librte_eal/common/rte_reciprocal.c 
b/lib/librte_eal/common/rte_reciprocal.c
index 7ab99b4..2024e62 100644
--- a/lib/librte_eal/common/rte_reciprocal.c
+++ b/lib/librte_eal/common/rte_reciprocal.c
@@ -31,18 +31,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include 
-#include 
-
-#include 
-
-#include "rte_reciprocal.h"
+#include 
 
 /* find largest set bit.
  * portable and slow but does not matter for this usage.
  */
 static inline int
-fls(uint32_t x)
+fls_u32(uint32_t x)
 {
int b;
 
@@ -54,14 +49,14 @@ fls(uint32_t x)
return 0;
 }
 
-struct rte_reciprocal
-rte_reciprocal_value(uint32_t d)
+struct rte_reciprocal_u32
+rte_reciprocal_value_u32(uint32_t d)
 {
-   struct rte_reciprocal R;
+   struct rte_reciprocal_u32 R;
uint64_t m;
int l;
 
-   l = fls(d - 1);
+   l = fls_u32(d - 1);
m = ((1ULL << 32) * ((1ULL << l) - d));
m /= d;
 

[dpdk-dev] [PATCH v4 3/3] test: add tests for reciprocal based division

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

This commit provides a set of tests for verifying the correctness and
performance of both unsigned 32 and 64bit reciprocal based division.

Signed-off-by: Pavan Nikhilesh 
---
 test/test/Makefile|   2 +
 test/test/test_reciprocal_division.c  | 109 +
 test/test/test_reciprocal_division_perf.c | 193 ++
 3 files changed, 304 insertions(+)
 create mode 100644 test/test/test_reciprocal_division.c
 create mode 100644 test/test/test_reciprocal_division_perf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 42d9a49..6017862 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -94,6 +94,8 @@ SRCS-y += test_cycles.c
 SRCS-y += test_spinlock.c
 SRCS-y += test_memory.c
 SRCS-y += test_memzone.c
+SRCS-y += test_reciprocal_division.c
+SRCS-y += test_reciprocal_division_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/test_reciprocal_division.c 
b/test/test/test_reciprocal_division.c
new file mode 100644
index 000..771ea64
--- /dev/null
+++ b/test/test/test_reciprocal_division.c
@@ -0,0 +1,109 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) Cavium, Inc. 2017.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Cavium, Inc nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define MAX_ITERATIONS 100
+#define DIVIDE_ITER 100
+
+static int
+test_reciprocal_division(void)
+{
+   int i;
+   int result = 0;
+   uint32_t divisor_u32 = 0;
+   uint32_t dividend_u32;
+   uint32_t nresult_u32;
+   uint32_t rresult_u32;
+   uint64_t divisor_u64 = 0;
+   uint64_t dividend_u64;
+   uint64_t nresult_u64;
+   uint64_t rresult_u64;
+   struct rte_reciprocal_u32 reci_u32;
+   struct rte_reciprocal_u64 reci_u64;
+
+   rte_srand(rte_rdtsc());
+   printf("Validating unsigned 32bit division.\n");
+   for (i = 0; i < MAX_ITERATIONS; i++) {
+   /* Change divisor every DIVIDE_ITER iterations. */
+   if (i % DIVIDE_ITER == 0) {
+   divisor_u32 = rte_rand();
+   reci_u32 = rte_reciprocal_value_u32(divisor_u32);
+   }
+
+   dividend_u32 = rte_rand();
+   nresult_u32 = dividend_u32 / divisor_u32;
+   rresult_u32 = rte_reciprocal_divide_u32(dividend_u32,
+   &reci_u32);
+   if (nresult_u32 != rresult_u32) {
+   printf("Division failed, expected %"PRIu32" "
+  "result %"PRIu32"",
+   nresult_u32, rresult_u32);
+   result = 1;
+   break;
+   }
+   }
+
+   printf("Validating unsigned 64bit division.\n");
+   for (i = 0; i < MAX_ITERATIONS; i++) {
+   /* Change divisor every DIVIDE_ITER iterations. */
+   if (i % DIVIDE_ITER == 0) {
+   divisor_u64 = rte_rand();
+   reci_u64 = rte_reciprocal_value_u64(divisor_u64);
+   }
+
+   dividend_u64 = rte_rand();
+   nresult_u64 = dividend_u64 / divisor_u64;
+   rresult_u64 = rte_reciprocal_divide_u64(dividend_u64,
+   &reci_u64);
+   if (nresult_u64 != rresult_u64) {
+

Re: [dpdk-dev] [PATCH 4/4] ethdev: add helpers to move to the new offloads API

2017-09-05 Thread Shahaf Shuler
Tuesday, September 5, 2017 11:10 AM, Ananyev, Konstantin:

> > > > > In fact, right now it is possible to query/change these 3 vlan
> > > > > offload flags on the fly (after dev_start) on a port basis by the
> > > > > rte_eth_dev_(get|set)_vlan_offload API.

Regarding this API from ethdev.

So this seems like a hack on ethdev. Currently there are 2 ways for the user
to set Rx VLAN offloads.
One is through dev_configure, which requires the ports to be stopped. The
other is this API, which can set them even if the port is started.

We should have only one place where the application sets offloads; this is
currently dev_configure,
and in the future it will be rx_queue_setup.

I would say that this API should be removed as well.
An application which wants to change those offloads will stop the ports and
reconfigure the PMD.
I am quite sure that there are PMDs which need to re-create the Rx queue when
VLAN offloads change, and this cannot be done while traffic flows.


> > > > > So, I think at least these 3 flags need to be remained on a port 
> > > > > basis.
> > > >
> > > > I don't understand how it helps to be able to configure the same
> > > > thing in 2 places.
> > >
> > > Because some offloads are per device, another - per queue.
> > > Configuring on a device basis would allow most users to conjure all
> > > queues in the same manner by default.
> > > Those users who would  need more fine-grained setup (per queue) will
> > > be able to overwrite it by rx_queue_setup().
> >
> > Those users can set the same config for all queues.
> > >
> > > > I think you are just describing a limitation of these HW: some
> > > > offloads must be the same for all queues.
> > >
> > > As I said above - on some devices some offloads might also affect
> > > queues that belong to VFs (to another ports in DPDK words).
> > > You might never invoke rx_queue_setup() for these queues per your
> app.
> > > But you still want to enable this offload on that device.
> 
> I am ok with having per-port and per-queue offload configuration.
> My concern is that after that patch only per-queue offload configuration will
> remain.
> I think we need both.

So it looks like we all agree that PMDs should report, as part of
rte_eth_dev_info_get, which offloads are per port and which are per queue.

Regarding the offloads configuration by the application, I see 2 options:
1. Have an API to set offloads per port as part of device configure, and an
API to set offloads per queue as part of queue setup.
2. Set all offloads as part of queue configuration (per-port offloads will be
set equally for all queues). In case of a mixed configuration for per-port
offloads, the PMD will return an error. Such an error can be reported on
device start, when the PMD traverses the queues and checks for conflicts.

I will focus on the cons, since both achieve the goal:

Cons of #1:
- Two places to configure offloads.
- Like Thomas mentioned - what about offloads per device? This direction leads 
to more places to configure the offloads.

Cons of #2:
- Late error reporting - on device start and not on queue setup.

I would go with #2 (see the sketch appended after this message).

> Konstantin
> 
> >
> > You are advocating for per-port configuration API because some
> > settings must be the same on all the ports of your hardware?
> > So there is a big trouble. You don't need per-port settings, but
> > per-hw-device settings.
> > Or would you accept more fine-grained per-port settings?
> > If yes, you can accept even finer grained per-queues settings.
> > >
> > > > It does not prevent from configuring them in the per-queue setup.
> > > >
> > > > > In fact, why can't we have both per port and per queue RX offload:
> > > > > - dev_configure() will accept RX_OFFLOAD_* flags and apply them on
> a port basis.
> > > > > - rx_queue_setup() will also accept RX_OFFLOAD_* flags and apply
> them on a queue basis.
> > > > > - if particular RX_OFFLOAD flag for that device couldn't be setup on a
> queue basis  -
> > > > >rx_queue_setup() will return an error.
> > > >
> > > > The queue setup can work while the value is the same for every
> queues.
> > >
> > > Ok, and how people would know that?
> > > That for device N offload X has to be the same for all queues, and
> > > for device M offload X can be differs for different queues.
> >
> > We can know the hardware limitations by filling this information at
> > PMD init.
> >
> > > Again, if we don't allow to enable/disable offloads for particular
> > > queue, why to bother with updating rx_queue_setup() API at all?
> >
> > I do not understand this question.
> >
> > > > > - rte_eth_rxq_info can be extended to provide information which
> RX_OFFLOADs
> > > > >   can be configured on a per queue basis.
> > > >
> > > > Yes the PMD should advertise its limitations like being forced to
> > > > apply the same configuration to all its queues.
> > >
> > > Didn't get your last sentence.
> >
> > I agree that the hardware limitations must be written in an ethdev
> structure.


Re: [dpdk-dev] [PATCH 03/21] vhost: protect virtio_net device struct

2017-09-05 Thread Maxime Coquelin



On 09/05/2017 12:07 PM, Tiwei Bie wrote:

On Tue, Sep 05, 2017 at 11:24:14AM +0200, Maxime Coquelin wrote:

On 09/05/2017 06:45 AM, Tiwei Bie wrote:

On Thu, Aug 31, 2017 at 11:50:05AM +0200, Maxime Coquelin wrote:

virtio_net device might be accessed while being reallocated
in case of NUMA awareness. This case might be theoretical,
but it will be needed anyway to protect vrings pages against
invalidation.

The virtio_net devs are now protected with a readers/writers
lock, so that before reallocating the device, it is ensured
that it is not being referenced by the processing threads.


[...]

+struct virtio_net *
+get_device(int vid)
+{
+   struct virtio_net *dev;
+
+   rte_rwlock_read_lock(&vhost_devices[vid].lock);
+
+   dev = __get_device(vid);
+   if (unlikely(!dev))
+   rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+
+   return dev;
+}
+
+void
+put_device(int vid)
+{
+   rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+}
+


This patch introduced a per-device rwlock which needs to be acquired
unconditionally in the data path. So for each vhost device, the IO
threads of different queues will need to acquire/release this lock
during each enqueue and dequeue operation, which will cause cache
contention when multiple queues are enabled and handled by different
cores. With this patch alone, I saw a ~7% performance drop when enabling
6 queues in a 64-byte iofwd loopback test. Is there any way to avoid
introducing this lock into the data path?


First, I'd like to thank you for running the MQ test.
I agree it may have a performance impact in this case.

This lock has currently two purposes:
1. Prevent referencing freed virtio_dev struct in case of numa_realloc.
2. Protect vring pages against invalidation.

For 2., it can be fixed by using the per-vq IOTLB lock (it was not the
case in my early prototypes that had per device IOTLB cache).

For 1., this is an existing problem, so we might consider it acceptable
to keep the current state. Maybe it could be improved by only
reallocating in case VQ0 is not on the right NUMA node, the other VQs
not being initialized at this point.

If we do this we might be able to get rid of this lock, I need some more
time though to ensure I'm not missing something.

What do you think?



Cool. So it's possible that the lock in the data path will be
acquired only when the IOMMU feature is enabled. It will be
great!

Besides, I just did a very simple MQ test to verify my thoughts.
Lei (CC'ed in this mail) may do a thorough performance test for
this patch set to evaluate the performance impacts.


I'll try to post v2 this week including the proposed change.
Maybe it would be better if Lei waits for the v2.

Thanks,
Maxime


Best regards,
Tiwei Bie
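
For context on the contention discussed above, a minimal hypothetical sketch
(not the actual rte_vhost code) of how a data-path entry point brackets
device access with the get_device()/put_device() helpers introduced by patch
03/21; every enqueue or dequeue burst on every queue pays one read-lock and
unlock of the per-device rwlock:

/* Sketch only: struct virtio_net, get_device() and put_device() are the
 * vhost-internal definitions from the patch, not a public API. */
static uint16_t
example_enqueue_burst(int vid, uint16_t queue_id,
		      struct rte_mbuf **pkts, uint16_t count)
{
	struct virtio_net *dev;
	uint16_t nb_tx = 0;

	dev = get_device(vid);	/* takes vhost_devices[vid].lock for read */
	if (dev == NULL)
		return 0;	/* the read lock is already released on failure */

	/* ... look up the virtqueue, translate descriptors, copy packets ... */
	(void)queue_id;
	(void)pkts;
	(void)count;

	put_device(vid);	/* releases the read lock */
	return nb_tx;
}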



Re: [dpdk-dev] [PATCH v2] app/crypto-perf: fix uninitialized errno value

2017-09-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Hemant Agrawal
> Sent: Tuesday, September 5, 2017 7:17 AM
> To: dev@dpdk.org
> Cc: Doherty, Declan ; De Lara Guarch, Pablo
> ; akhil.go...@nxp.com
> Subject: [dpdk-dev] [PATCH v2] app/crypto-perf: fix uninitialized errno value
> 
> errno should be initialized to 0 before calling strtol
> 
> Fixes: f6cefe253cc8 ("app/crypto-perf: add range/list of sizes")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Hemant Agrawal 
> Reviewed-by: Kirill Rybalchenko 

Applied to dpdk-next-crypto.
Thanks,

Pablo
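
The pattern behind this fix (and the identical l2fwd-crypto fix below), as a
standalone illustration rather than the patch code itself: errno must be
cleared before strtol(), otherwise a stale value left by an earlier call can
be misread as a conversion error.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

static int
parse_long(const char *arg, long *out)
{
	char *end = NULL;

	errno = 0;			/* clear any stale errno before strtol() */
	*out = strtol(arg, &end, 10);
	if (errno == ERANGE || end == arg || *end != '\0')
		return -1;		/* overflow, empty string or trailing junk */
	return 0;
}

int
main(void)
{
	long val;

	if (parse_long("1024", &val) == 0)
		printf("parsed %ld\n", val);
	return 0;
}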


Re: [dpdk-dev] [PATCH] examples/l2fwd-crypto: fix uninitialized errno value

2017-09-05 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Hemant Agrawal [mailto:hemant.agra...@nxp.com]
> Sent: Wednesday, August 23, 2017 1:24 PM
> To: dev@dpdk.org
> Cc: Doherty, Declan ; De Lara Guarch, Pablo
> ; akhil.go...@nxp.com
> Subject: [PATCH] examples/l2fwd-crypto: fix uninitialized errno value
> 
> errno should be initialized to 0 before calling strtol
> 
> Fixes: 1df9c0109f4c ("examples/l2fwd-crypto: parse key parameters")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Hemant Agrawal 

Applied to dpdk-next-crypto.
Thanks,

Pablo


[dpdk-dev] [PATCH v5 1/3] eal: introduce integer divide through reciprocal

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

In some use cases of integer division, denominator remains constant and
numerator varies. It is possible to optimize division for such specific
scenarios.

The librte_sched uses rte_reciprocal to optimize division so, moving it to
eal/common would allow other libraries and applications to use it.

Signed-off-by: Pavan Nikhilesh 
Reviewed-by: Anatoly Burakov 
---
v5 changes:
 - fix test print strings

v4 changes:
 - minor fix for test cases
 - fix u32 divisor generation

v3 changes:
 - fix x86_32 compilation issue
 - fix improper licence in test

v2 changes:
 - fix compilation issues with .map files
 - add test cases for correctness and performance
 - remove extra licence inclusion
 - fix coding style issues

 lib/librte_eal/bsdapp/eal/Makefile   | 1 +
 lib/librte_eal/bsdapp/eal/rte_eal_version.map| 7 +++
 lib/librte_eal/common/Makefile   | 1 +
 lib/{librte_sched => librte_eal/common/include}/rte_reciprocal.h | 6 --
 lib/{librte_sched => librte_eal/common}/rte_reciprocal.c | 6 --
 lib/librte_eal/linuxapp/eal/Makefile | 1 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map  | 7 +++
 lib/librte_sched/Makefile| 2 --
 lib/librte_sched/rte_sched.c | 2 +-
 9 files changed, 26 insertions(+), 7 deletions(-)
 rename lib/{librte_sched => librte_eal/common/include}/rte_reciprocal.h (87%)
 rename lib/{librte_sched => librte_eal/common}/rte_reciprocal.c (96%)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index 005019e..56f9804 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -88,6 +88,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_keepalive.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_service.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_reciprocal.c

 # from arch dir
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_cpuflags.c
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 79e7d31..d0bda66 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -238,3 +238,10 @@ EXPERIMENTAL {
rte_service_unregister;

 } DPDK_17.08;
+
+DPDK_17.11 {
+   global:
+
+   rte_reciprocal_value;
+
+} DPDK_17.08;
diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index e8fd67a..a680b2d 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -42,6 +42,7 @@ INC += rte_hexdump.h rte_devargs.h rte_bus.h rte_dev.h 
rte_vdev.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 INC += rte_malloc.h rte_keepalive.h rte_time.h
 INC += rte_service.h rte_service_component.h
+INC += rte_reciprocal.h

 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h
diff --git a/lib/librte_sched/rte_reciprocal.h 
b/lib/librte_eal/common/include/rte_reciprocal.h
similarity index 87%
rename from lib/librte_sched/rte_reciprocal.h
rename to lib/librte_eal/common/include/rte_reciprocal.h
index 5e21f09..b6d752f 100644
--- a/lib/librte_sched/rte_reciprocal.h
+++ b/lib/librte_eal/common/include/rte_reciprocal.h
@@ -29,13 +29,15 @@ struct rte_reciprocal {
uint8_t sh1, sh2;
 };

-static inline uint32_t rte_reciprocal_divide(uint32_t a, struct rte_reciprocal 
R)
+static inline uint32_t
+rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
 {
uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);

return (t + ((a - t) >> R.sh1)) >> R.sh2;
 }

-struct rte_reciprocal rte_reciprocal_value(uint32_t d);
+struct rte_reciprocal
+rte_reciprocal_value(uint32_t d);

 #endif /* _RTE_RECIPROCAL_H_ */
diff --git a/lib/librte_sched/rte_reciprocal.c 
b/lib/librte_eal/common/rte_reciprocal.c
similarity index 96%
rename from lib/librte_sched/rte_reciprocal.c
rename to lib/librte_eal/common/rte_reciprocal.c
index 652f023..7ab99b4 100644
--- a/lib/librte_sched/rte_reciprocal.c
+++ b/lib/librte_eal/common/rte_reciprocal.c
@@ -41,7 +41,8 @@
 /* find largest set bit.
  * portable and slow but does not matter for this usage.
  */
-static inline int fls(uint32_t x)
+static inline int
+fls(uint32_t x)
 {
int b;

@@ -53,7 +54,8 @@ static inline int fls(uint32_t x)
return 0;
 }

-struct rte_reciprocal rte_reciprocal_value(uint32_t d)
+struct rte_reciprocal
+rte_reciprocal_value(uint32_t d)
 {
struct rte_reciprocal R;
uint64_t m;
diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 90bca4d..98f3b8e 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -100,

[dpdk-dev] [PATCH v5 2/3] eal: add u64 bit variant for reciprocal

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

Currently, rte_reciprocal only supports unsigned 32bit divisors. This
commit adds support for unsigned 64bit divisors.

Rename unsigned 32bit specific functions appropriately and update
librte_sched accordingly.

Signed-off-by: Pavan Nikhilesh 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   3 +-
 lib/librte_eal/common/include/rte_reciprocal.h  | 111 +--
 lib/librte_eal/common/rte_reciprocal.c  | 116 +---
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   3 +-
 lib/librte_sched/Makefile   |   4 +-
 lib/librte_sched/rte_sched.c|   9 +-
 6 files changed, 220 insertions(+), 26 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index d0bda66..5fd6101 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -242,6 +242,7 @@ EXPERIMENTAL {
 DPDK_17.11 {
global:
 
-   rte_reciprocal_value;
+   rte_reciprocal_value_u32;
+   rte_reciprocal_value_u64;
 
 } DPDK_17.08;
diff --git a/lib/librte_eal/common/include/rte_reciprocal.h 
b/lib/librte_eal/common/include/rte_reciprocal.h
index b6d752f..801d1c8 100644
--- a/lib/librte_eal/common/include/rte_reciprocal.h
+++ b/lib/librte_eal/common/include/rte_reciprocal.h
@@ -22,22 +22,117 @@
 #ifndef _RTE_RECIPROCAL_H_
 #define _RTE_RECIPROCAL_H_
 
-#include 
+#include 
 
-struct rte_reciprocal {
+/**
+ * Unsigned 32-bit divisor structure.
+ */
+struct rte_reciprocal_u32 {
uint32_t m;
uint8_t sh1, sh2;
-};
+} __rte_cache_aligned;
+
+/**
+ * Unsigned 64-bit divisor structure.
+ */
+struct rte_reciprocal_u64 {
+   uint64_t m;
+   uint8_t sh1;
+} __rte_cache_aligned;
 
+/**
+ * Divide given unsigned 32-bit integer with pre calculated divisor.
+ *
+ * @param a
+ *   The 32-bit dividend.
+ * @param R
+ *   The pointer to pre calculated divisor reciprocal structure.
+ *
+ * @return
+ *   The result of the division
+ */
 static inline uint32_t
-rte_reciprocal_divide(uint32_t a, struct rte_reciprocal R)
+rte_reciprocal_divide_u32(uint32_t a, struct rte_reciprocal_u32 *R)
+{
+   uint32_t t = (((uint64_t)a * R->m) >> 32);
+
+   return (t + ((a - t) >> R->sh1)) >> R->sh2;
+}
+
+static inline uint64_t
+mullhi_u64(uint64_t x, uint64_t y)
+{
+#ifdef __SIZEOF_INT128__
+   __uint128_t xl = x;
+   __uint128_t rl = xl * y;
+
+   return (rl >> 64);
+#else
+   uint64_t u0, u1, v0, v1, k, t;
+   uint64_t w1, w2;
+   uint64_t whi;
+
+   u1 = x >> 32; u0 = x & 0x;
+   v1 = y >> 32; v0 = y & 0x;
+
+   t = u0*v0;
+   k = t >> 32;
+
+   t = u1*v0 + k;
+   w1 = t & 0x;
+   w2 = t >> 32;
+
+   t = u0*v1 + w1;
+   k = t >> 32;
+
+   whi = u1*v1 + w2 + k;
+
+   return whi;
+#endif
+}
+
+/**
+ * Divide given unsigned 64-bit integer with pre calculated divisor.
+ *
+ * @param a
+ *   The 64-bit dividend.
+ * @param R
+ *   The pointer to pre calculated divisor reciprocal structure.
+ *
+ * @return
+ *   The result of the division
+ */
+static inline uint64_t
+rte_reciprocal_divide_u64(uint64_t a, struct rte_reciprocal_u64 *R)
 {
-   uint32_t t = (uint32_t)(((uint64_t)a * R.m) >> 32);
+   uint64_t q = mullhi_u64(R->m, a);
+   uint64_t t = ((a - q) >> 1) + q;
 
-   return (t + ((a - t) >> R.sh1)) >> R.sh2;
+   return t >> R->sh1;
 }
 
-struct rte_reciprocal
-rte_reciprocal_value(uint32_t d);
+/**
+ * Generate pre calculated divisor structure.
+ *
+ * @param d
+ *   The unsigned 32-bit divisor.
+ *
+ * @return
+ *   Divisor structure.
+ */
+struct rte_reciprocal_u32
+rte_reciprocal_value_u32(uint32_t d);
+
+/**
+ * Generate pre calculated divisor structure.
+ *
+ * @param d
+ *   The unsigned 64-bit divisor.
+ *
+ * @return
+ *   Divisor structure.
+ */
+struct rte_reciprocal_u64
+rte_reciprocal_value_u64(uint64_t d);
 
 #endif /* _RTE_RECIPROCAL_H_ */
diff --git a/lib/librte_eal/common/rte_reciprocal.c 
b/lib/librte_eal/common/rte_reciprocal.c
index 7ab99b4..2024e62 100644
--- a/lib/librte_eal/common/rte_reciprocal.c
+++ b/lib/librte_eal/common/rte_reciprocal.c
@@ -31,18 +31,13 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include 
-#include 
-
-#include 
-
-#include "rte_reciprocal.h"
+#include 
 
 /* find largest set bit.
  * portable and slow but does not matter for this usage.
  */
 static inline int
-fls(uint32_t x)
+fls_u32(uint32_t x)
 {
int b;
 
@@ -54,14 +49,14 @@ fls(uint32_t x)
return 0;
 }
 
-struct rte_reciprocal
-rte_reciprocal_value(uint32_t d)
+struct rte_reciprocal_u32
+rte_reciprocal_value_u32(uint32_t d)
 {
-   struct rte_reciprocal R;
+   struct rte_reciprocal_u32 R;
uint64_t m;
int l;
 
-   l = fls(d - 1);
+   l = fls_u32(d - 1);
m = ((1ULL << 32) * ((1ULL << l) - d));
m /= d;
 

[dpdk-dev] [PATCH v5 3/3] test: add tests for reciprocal based division

2017-09-05 Thread Pavan Nikhilesh
From: Pavan Bhagavatula 

This commit provides a set of tests for verifying the correctness and
performance of both unsigned 32 and 64bit reciprocal based division.

Signed-off-by: Pavan Nikhilesh 
---
 test/test/Makefile|   2 +
 test/test/test_reciprocal_division.c  | 109 +
 test/test/test_reciprocal_division_perf.c | 193 ++
 3 files changed, 304 insertions(+)
 create mode 100644 test/test/test_reciprocal_division.c
 create mode 100644 test/test/test_reciprocal_division_perf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 42d9a49..6017862 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -94,6 +94,8 @@ SRCS-y += test_cycles.c
 SRCS-y += test_spinlock.c
 SRCS-y += test_memory.c
 SRCS-y += test_memzone.c
+SRCS-y += test_reciprocal_division.c
+SRCS-y += test_reciprocal_division_perf.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/test_reciprocal_division.c 
b/test/test/test_reciprocal_division.c
new file mode 100644
index 000..771ea64
--- /dev/null
+++ b/test/test/test_reciprocal_division.c
@@ -0,0 +1,109 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) Cavium, Inc. 2017.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Cavium, Inc nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "test.h"
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define MAX_ITERATIONS 100
+#define DIVIDE_ITER 100
+
+static int
+test_reciprocal_division(void)
+{
+   int i;
+   int result = 0;
+   uint32_t divisor_u32 = 0;
+   uint32_t dividend_u32;
+   uint32_t nresult_u32;
+   uint32_t rresult_u32;
+   uint64_t divisor_u64 = 0;
+   uint64_t dividend_u64;
+   uint64_t nresult_u64;
+   uint64_t rresult_u64;
+   struct rte_reciprocal_u32 reci_u32;
+   struct rte_reciprocal_u64 reci_u64;
+
+   rte_srand(rte_rdtsc());
+   printf("Validating unsigned 32bit division.\n");
+   for (i = 0; i < MAX_ITERATIONS; i++) {
+   /* Change divisor every DIVIDE_ITER iterations. */
+   if (i % DIVIDE_ITER == 0) {
+   divisor_u32 = rte_rand();
+   reci_u32 = rte_reciprocal_value_u32(divisor_u32);
+   }
+
+   dividend_u32 = rte_rand();
+   nresult_u32 = dividend_u32 / divisor_u32;
+   rresult_u32 = rte_reciprocal_divide_u32(dividend_u32,
+   &reci_u32);
+   if (nresult_u32 != rresult_u32) {
+   printf("Division failed, expected %"PRIu32" "
+  "result %"PRIu32"",
+   nresult_u32, rresult_u32);
+   result = 1;
+   break;
+   }
+   }
+
+   printf("Validating unsigned 64bit division.\n");
+   for (i = 0; i < MAX_ITERATIONS; i++) {
+   /* Change divisor every DIVIDE_ITER iterations. */
+   if (i % DIVIDE_ITER == 0) {
+   divisor_u64 = rte_rand();
+   reci_u64 = rte_reciprocal_value_u64(divisor_u64);
+   }
+
+   dividend_u64 = rte_rand();
+   nresult_u64 = dividend_u64 / divisor_u64;
+   rresult_u64 = rte_reciprocal_divide_u64(dividend_u64,
+   &reci_u64);
+   if (nresult_u64 != rresult_u64) {
+

Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event

2017-09-05 Thread Adrien Mazarguil
Hi Matan,

On Tue, Sep 05, 2017 at 10:38:21AM +, Matan Azrad wrote:
> Hi Adrien
> 
> > -Original Message-
> > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > Sent: Tuesday, September 5, 2017 12:28 PM
> > To: Matan Azrad 
> > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event
> > 
> > Hi Matan,
> > 
> > On Mon, Sep 04, 2017 at 05:52:55PM +, Matan Azrad wrote:
> > > Hi Adrien,
> > >
> > > > -Original Message-
> > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > > Sent: Monday, September 4, 2017 6:33 PM
> > > > To: Matan Azrad 
> > > > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal
> > > > event
> > > >
> > > > Hi Matan,
> > > >
> > > > One comment I have is, while this patch adds support for RMV, it
> > > > also silently addresses a bug (see large comment you added to
> > > > priv_link_status_update()).
> > > >
> > > > This should be split in two commits, with the fix part coming first
> > > > and CC sta...@dpdk.org, and a second commit adding RMV support
> > proper.
> > > >
> > >
> > > Actually, the mlx4 bug did not appear in the previous mlx5 code,
> > > probably because the RMV interrupt was not implemented in mlx5 before
> > > this patch.
> > 
> > Good point, no RMV could occur before it is implemented, however a
> > dedicated commit for the fix itself (i.e. alarm callback not supposed to 
> > end up
> > calling ibv_get_async_event()) might better explain the logic behind these
> > changes. What I mean is, if there was no problem, you wouldn't need to
> > make
> > priv_link_status_update() a separate function, right?
> > 
> 
> The separation was done mainly because of the new interrupt implementation;
> otherwise, there would have been a bug here.
> The unnecessary ibv_get_async_event() call from the alarm callback was
> harmless in the previous code.
> I get your point about explaining the logic behind these changes, and I can
> add it to this patch's commit log to be clearer, something like:
> The link update operation was separated from the interrupt callback
> to avoid RMV interrupt disregard and unnecessary event acknowledgment
> caused by the inconsistent link status alarm callback.

Yes, it's better to explain why you did this in the commit log, but see
below.

> > > The big comment just explains the link inconsistent issue and was
> > > added here since Nelio and I think the new function,
> > > priv_link_status_update(), justifies this comment for future review.
> > 
> > I understand, this could also have been part of the commit log of the
> > dedicated commit.
> > 
> Are you sure we need to describe the reason for the code comment in the commit log?

It's a change you made to address a possible bug, so we have to; however,
remember that a commit should, as much as possible, do exactly one
thing. If you need to explain that you did this in order to do that, "this"
and "that" can often be identified as two separate commits. Doing so makes
it much easier for reviewers to understand the reasoning behind changes and
leads to quicker reviews (makes instant-acks even possible).

I'd still like a separate commit if you don't mind.

-- 
Adrien Mazarguil
6WIND


[dpdk-dev] [PATCH v7 3/6] igb_uio: fix MSI-X IRQ assignment with new IRQ function

2017-09-05 Thread Markus Theil
The patch which introduced the usage of pci_alloc_irq_vectors
came after the patch which switched to a non-threaded ISR (f0d1896fa1),
but did not use the non-threaded ISR when pci_alloc_irq_vectors
is used.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for
MSI-X")
Cc: nicolas.dich...@6wind.com

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 93bb71d..6885e72 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -331,6 +331,7 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 #else
if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSIX) == 1) 
{
dev_dbg(&udev->pdev->dev, "using MSI-X");
+   udev->info.irq_flags = IRQF_NO_THREAD;
udev->info.irq = pci_irq_vector(udev->pdev, 0);
udev->mode = RTE_INTR_MODE_MSIX;
break;
-- 
2.7.4



[dpdk-dev] [PATCH v7 4/6] igb_uio: release in exact reverse order

2017-09-05 Thread Markus Theil
For better readability throughout the module, the destruction
order is changed to the exact inverse of the construction order.

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 6885e72..c570eed 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -482,7 +482,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
 
err = sysfs_create_group(&dev->dev.kobj, &dev_attr_grp);
if (err != 0)
-   goto fail_release_iomem;
+   goto fail_disable_interrupts;
 
/* register uio driver */
err = uio_register_device(&dev->dev, &udev->info);
@@ -519,9 +519,10 @@ igbuio_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
 
 fail_remove_group:
sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
+fail_disable_interrupts:
+   igbuio_pci_disable_interrupts(udev);
 fail_release_iomem:
igbuio_pci_release_iomem(&udev->info);
-   igbuio_pci_disable_interrupts(udev);
pci_disable_device(dev);
 fail_free:
kfree(udev);
@@ -536,8 +537,8 @@ igbuio_pci_remove(struct pci_dev *dev)
 
sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
uio_unregister_device(&udev->info);
-   igbuio_pci_release_iomem(&udev->info);
igbuio_pci_disable_interrupts(udev);
+   igbuio_pci_release_iomem(&udev->info);
pci_disable_device(dev);
pci_set_drvdata(dev, NULL);
kfree(udev);
-- 
2.7.4



[dpdk-dev] [PATCH v7 1/6] igb_uio: refactor irq enable/disable into own functions

2017-09-05 Thread Markus Theil
Interrupt setup code in igb_uio has to deal with multiple
types of interrupts and kernel versions. This patch moves
the setup and teardown code into separate functions, to make
it more readable.

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 112 +-
 1 file changed, 64 insertions(+), 48 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 07a19a3..e2e9263 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -309,6 +309,66 @@ igbuio_pci_release_iomem(struct uio_info *info)
 }
 
 static int
+igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
+{
+   int err = 0;
+#ifdef HAVE_PCI_ENABLE_MSIX
+   struct msix_entry msix_entry;
+#endif
+
+   switch (igbuio_intr_mode_preferred) {
+   case RTE_INTR_MODE_MSIX:
+   /* Only 1 msi-x vector needed */
+#ifdef HAVE_PCI_ENABLE_MSIX
+   msix_entry.entry = 0;
+   if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
+   dev_dbg(&udev->pdev->dev, "using MSI-X");
+   udev->info.irq_flags = IRQF_NO_THREAD;
+   udev->info.irq = msix_entry.vector;
+   udev->mode = RTE_INTR_MODE_MSIX;
+   break;
+   }
+#else
+   if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSIX) == 1) 
{
+   dev_dbg(&udev->pdev->dev, "using MSI-X");
+   udev->info.irq = pci_irq_vector(udev->pdev, 0);
+   udev->mode = RTE_INTR_MODE_MSIX;
+   break;
+   }
+#endif
+   /* fall back to INTX */
+   case RTE_INTR_MODE_LEGACY:
+   if (pci_intx_mask_supported(udev->pdev)) {
+   dev_dbg(&udev->pdev->dev, "using INTX");
+   udev->info.irq_flags = IRQF_SHARED | IRQF_NO_THREAD;
+   udev->info.irq = udev->pdev->irq;
+   udev->mode = RTE_INTR_MODE_LEGACY;
+   break;
+   }
+   dev_notice(&udev->pdev->dev, "PCI INTX mask not supported\n");
+   /* fall back to no IRQ */
+   case RTE_INTR_MODE_NONE:
+   udev->mode = RTE_INTR_MODE_NONE;
+   udev->info.irq = 0;
+   break;
+
+   default:
+   dev_err(&udev->pdev->dev, "invalid IRQ mode %u",
+   igbuio_intr_mode_preferred);
+   err = -EINVAL;
+   }
+
+   return err;
+}
+
+static void
+igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
+{
+   if (udev->mode == RTE_INTR_MODE_MSIX)
+   pci_disable_msix(udev->pdev);
+}
+
+static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
int i, iom, iop, ret;
@@ -356,9 +416,6 @@ static int
 igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
struct rte_uio_pci_dev *udev;
-#ifdef HAVE_PCI_ENABLE_MSIX
-   struct msix_entry msix_entry;
-#endif
dma_addr_t map_dma_addr;
void *map_addr;
int err;
@@ -413,48 +470,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
udev->info.priv = udev;
udev->pdev = dev;
 
-   switch (igbuio_intr_mode_preferred) {
-   case RTE_INTR_MODE_MSIX:
-   /* Only 1 msi-x vector needed */
-#ifdef HAVE_PCI_ENABLE_MSIX
-   msix_entry.entry = 0;
-   if (pci_enable_msix(dev, &msix_entry, 1) == 0) {
-   dev_dbg(&dev->dev, "using MSI-X");
-   udev->info.irq_flags = IRQF_NO_THREAD;
-   udev->info.irq = msix_entry.vector;
-   udev->mode = RTE_INTR_MODE_MSIX;
-   break;
-   }
-#else
-   if (pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_MSIX) == 1) {
-   dev_dbg(&dev->dev, "using MSI-X");
-   udev->info.irq = pci_irq_vector(dev, 0);
-   udev->mode = RTE_INTR_MODE_MSIX;
-   break;
-   }
-#endif
-   /* fall back to INTX */
-   case RTE_INTR_MODE_LEGACY:
-   if (pci_intx_mask_supported(dev)) {
-   dev_dbg(&dev->dev, "using INTX");
-   udev->info.irq_flags = IRQF_SHARED | IRQF_NO_THREAD;
-   udev->info.irq = dev->irq;
-   udev->mode = RTE_INTR_MODE_LEGACY;
-   break;
-   }
-   dev_notice(&dev->dev, "PCI INTX mask not supported\n");
-   /* fall back to no IRQ */
-   case RTE_INTR_MODE_NONE:
-   udev->mode = RTE_INTR_MODE_NONE;
-   udev->info.irq = 0;
-   break;
-
-   default:
-   dev_err(&dev->dev, "invalid IRQ mode %u",
- 

[dpdk-dev] [PATCH v7 6/6] igb_uio: MSI IRQ mode

2017-09-05 Thread Markus Theil
This patch adds MSI IRQ mode in a way that should
also work on older kernel versions. The base for my patch
was an attempt to do this in cf705bc36c which was later
reverted in d8ee82745a. Compilation was tested on Linux 3.2,
4.10 and 4.12.

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 32 ---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index e4ef817..b578c4a 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -119,7 +119,7 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state)
 
pci_cfg_access_lock(pdev);
 
-   if (udev->mode == RTE_INTR_MODE_MSIX) {
+   if (udev->mode == RTE_INTR_MODE_MSIX || udev->mode == 
RTE_INTR_MODE_MSI) {
 #ifdef HAVE_PCI_MSI_MASK_IRQ
if (irq_state == 1)
pci_msi_unmask_irq(irq);
@@ -326,6 +326,25 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
break;
}
 #endif
+   /* fall back to MSI */
+   case RTE_INTR_MODE_MSI:
+#ifndef HAVE_ALLOC_IRQ_VECTORS
+   if (pci_enable_msi(udev->pdev) == 0) {
+   dev_dbg(&udev->pdev->dev, "using MSI");
+   udev->info.irq_flags = IRQF_NO_THREAD;
+   udev->info.irq = udev->pdev->irq;
+   udev->mode = RTE_INTR_MODE_MSI;
+   break;
+   }
+#else
+   if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSI) == 1) {
+   dev_dbg(&udev->pdev->dev, "using MSI");
+   udev->info.irq_flags = IRQF_NO_THREAD;
+   udev->info.irq = pci_irq_vector(udev->pdev, 0);
+   udev->mode = RTE_INTR_MODE_MSI;
+   break;
+   }
+#endif
/* fall back to INTX */
case RTE_INTR_MODE_LEGACY:
if (pci_intx_mask_supported(udev->pdev)) {
@@ -336,7 +355,7 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
break;
}
dev_notice(&udev->pdev->dev, "PCI INTX mask not supported\n");
-   /* fall back to no IRQ */
+   /* fall back to no IRQ */
case RTE_INTR_MODE_NONE:
udev->mode = RTE_INTR_MODE_NONE;
udev->info.irq = 0;
@@ -357,8 +376,11 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #ifndef HAVE_ALLOC_IRQ_VECTORS
if (udev->mode == RTE_INTR_MODE_MSIX)
pci_disable_msix(udev->pdev);
+   if (udev->mode == RTE_INTR_MODE_MSI)
+   pci_disable_msi(udev->pdev);
 #else
-   if (udev->mode == RTE_INTR_MODE_MSIX)
+   if (udev->mode == RTE_INTR_MODE_MSIX ||
+   udev->mode == RTE_INTR_MODE_MSI)
pci_free_irq_vectors(udev->pdev);
 #endif
 }
@@ -544,6 +566,9 @@ igbuio_config_intr_mode(char *intr_str)
if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
pr_info("Use MSIX interrupt\n");
+   } else if (!strcmp(intr_str, RTE_INTR_MODE_MSI_NAME)) {
+   igbuio_intr_mode_preferred = RTE_INTR_MODE_MSI;
+   pr_info("Use MSI interrupt\n");
} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
pr_info("Use legacy interrupt\n");
@@ -587,6 +612,7 @@ module_param(intr_mode, charp, S_IRUGO);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
 "" RTE_INTR_MODE_MSIX_NAME "   Use MSIX interrupt\n"
+"" RTE_INTR_MODE_MSI_NAME "Use MSI interrupt\n"
 "" RTE_INTR_MODE_LEGACY_NAME " Use Legacy interrupt\n"
 "\n");
 
-- 
2.7.4



[dpdk-dev] [PATCH v7 2/6] igb_uio: fix irq disable on recent kernels

2017-09-05 Thread Markus Theil
igb_uio already allocates irqs using pci_alloc_irq_vectors on
recent kernels >= 4.8. Before this fix, the interrupt disable code was
not using the corresponding pci_free_irq_vectors, but the likewise
deprecated pci_disable_msix.

Fixes: 99bb58f3adc7 ("igb_uio: switch to new irq function for MSI-X")
Cc: nicolas.dich...@6wind.com

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/compat.h  | 4 ++--
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 9 +++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h 
b/lib/librte_eal/linuxapp/igb_uio/compat.h
index b800a53..3825933 100644
--- a/lib/librte_eal/linuxapp/igb_uio/compat.h
+++ b/lib/librte_eal/linuxapp/igb_uio/compat.h
@@ -124,6 +124,6 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
 
 #endif /* < 3.3.0 */
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 8, 0)
-#define HAVE_PCI_ENABLE_MSIX
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 8, 0)
+#define HAVE_ALLOC_IRQ_VECTORS 1
 #endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index e2e9263..93bb71d 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -312,14 +312,14 @@ static int
 igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 {
int err = 0;
-#ifdef HAVE_PCI_ENABLE_MSIX
+#ifndef HAVE_ALLOC_IRQ_VECTORS
struct msix_entry msix_entry;
 #endif
 
switch (igbuio_intr_mode_preferred) {
case RTE_INTR_MODE_MSIX:
/* Only 1 msi-x vector needed */
-#ifdef HAVE_PCI_ENABLE_MSIX
+#ifndef HAVE_ALLOC_IRQ_VECTORS
msix_entry.entry = 0;
if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) {
dev_dbg(&udev->pdev->dev, "using MSI-X");
@@ -364,8 +364,13 @@ igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev)
 static void
 igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 {
+#ifndef HAVE_ALLOC_IRQ_VECTORS
if (udev->mode == RTE_INTR_MODE_MSIX)
pci_disable_msix(udev->pdev);
+#else
+   if (udev->mode == RTE_INTR_MODE_MSIX)
+   pci_free_irq_vectors(udev->pdev);
+#endif
 }
 
 static int
-- 
2.7.4



[dpdk-dev] [PATCH v7 5/6] igb_uio: use kernel functions for masking MSI-X

2017-09-05 Thread Markus Theil
This patch removes the custom MSI-X mask/unmask code and
uses existing kernel functions instead.

Signed-off-by: Markus Theil 
---
 lib/librte_eal/linuxapp/igb_uio/compat.h  | 26 +---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 51 ---
 2 files changed, 28 insertions(+), 49 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h 
b/lib/librte_eal/linuxapp/igb_uio/compat.h
index 3825933..67a7ab3 100644
--- a/lib/librte_eal/linuxapp/igb_uio/compat.h
+++ b/lib/librte_eal/linuxapp/igb_uio/compat.h
@@ -15,24 +15,6 @@
 #define HAVE_PTE_MASK_PAGE_IOMAP
 #endif
 
-#ifndef PCI_MSIX_ENTRY_SIZE
-#define PCI_MSIX_ENTRY_SIZE 16
-#define  PCI_MSIX_ENTRY_LOWER_ADDR  0
-#define  PCI_MSIX_ENTRY_UPPER_ADDR  4
-#define  PCI_MSIX_ENTRY_DATA8
-#define  PCI_MSIX_ENTRY_VECTOR_CTRL 12
-#define   PCI_MSIX_ENTRY_CTRL_MASKBIT   1
-#endif
-
-/*
- * for kernels < 2.6.38 and backported patch that moves MSI-X entry definition
- * to pci_regs.h Those kernels has PCI_MSIX_ENTRY_SIZE defined but not
- * PCI_MSIX_ENTRY_CTRL_MASKBIT
- */
-#ifndef PCI_MSIX_ENTRY_CTRL_MASKBIT
-#define PCI_MSIX_ENTRY_CTRL_MASKBIT1
-#endif
-
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 34) && \
(!(defined(RHEL_RELEASE_CODE) && \
 RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(5, 9)))
@@ -127,3 +109,11 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev)
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 8, 0)
 #define HAVE_ALLOC_IRQ_VECTORS 1
 #endif
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 19, 0)
+#define HAVE_PCI_MSI_MASK_IRQ 1
+#endif
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 37)
+#define HAVE_IRQ_DATA 1
+#endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c 
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index c570eed..e4ef817 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -91,27 +91,6 @@ static struct attribute *dev_attrs[] = {
 static const struct attribute_group dev_attr_grp = {
.attrs = dev_attrs,
 };
-/*
- * It masks the msix on/off of generating MSI-X messages.
- */
-static void
-igbuio_msix_mask_irq(struct msi_desc *desc, int32_t state)
-{
-   u32 mask_bits = desc->masked;
-   unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
-   PCI_MSIX_ENTRY_VECTOR_CTRL;
-
-   if (state != 0)
-   mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
-   else
-   mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-
-   if (mask_bits != desc->masked) {
-   writel(mask_bits, desc->mask_base + offset);
-   readl(desc->mask_base);
-   desc->masked = mask_bits;
-   }
-}
 
 /**
  * This is the irqcontrol callback to be registered to uio_info.
@@ -132,21 +111,31 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 
irq_state)
struct rte_uio_pci_dev *udev = info->priv;
struct pci_dev *pdev = udev->pdev;
 
-   pci_cfg_access_lock(pdev);
-   if (udev->mode == RTE_INTR_MODE_LEGACY)
-   pci_intx(pdev, !!irq_state);
+#ifdef HAVE_IRQ_DATA
+   struct irq_data *irq = irq_get_irq_data(udev->info.irq);
+#else
+   unsigned int irq = udev->info.irq;
+#endif
 
-   else if (udev->mode == RTE_INTR_MODE_MSIX) {
-   struct msi_desc *desc;
+   pci_cfg_access_lock(pdev);
 
-#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 3, 0))
-   list_for_each_entry(desc, &pdev->msi_list, list)
-   igbuio_msix_mask_irq(desc, irq_state);
+   if (udev->mode == RTE_INTR_MODE_MSIX) {
+#ifdef HAVE_PCI_MSI_MASK_IRQ
+   if (irq_state == 1)
+   pci_msi_unmask_irq(irq);
+   else
+   pci_msi_mask_irq(irq);
 #else
-   list_for_each_entry(desc, &pdev->dev.msi_list, list)
-   igbuio_msix_mask_irq(desc, irq_state);
+   if (irq_state == 1)
+   unmask_msi_irq(irq);
+   else
+   mask_msi_irq(irq);
 #endif
}
+
+   if (udev->mode == RTE_INTR_MODE_LEGACY)
+   pci_intx(pdev, !!irq_state);
+
pci_cfg_access_unlock(pdev);
 
return 0;
-- 
2.7.4



Re: [dpdk-dev] [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus

2017-09-05 Thread Hemant Agrawal

Tested-by: Hemant Agrawal 

On 8/31/2017 8:56 AM, Santosh Shukla wrote:

v7:
Includes no major changes; minor changes detailed below:
- patch squashing (Aaron's suggestion)
- added run_once for device_parse() and bus_scan() in eal init
(Aaron's suggestion)
- Moved rte_eal_device_parse() up in eal initialization order.
- Patches rebased on top of version: 17.11-rc0
For v6 info refer [11].

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introducing the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of IOVA-as-VA mapping.
If a PCI driver demands the IOVA-as-VA scheme, it can set this flag in its
PCI driver registration function.

Algorithm to select IOVA as VA for PCI bus case:
 0. If no device is bound, return RTE_IOVA_DC mapping mode;
 else go to 1).
 1. Look for a device attached to the vfio kernel driver with .drv_flag set
 to RTE_PCI_DRV_IOVA_AS_VA.
 2. Look for any device attached to a UIO class of driver.
 3. Check whether vfio-noiommu mode is enabled.

 If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
 mapping scheme. Otherwise use the default
 mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the IOVA mapping mode for a device
or a set of devices; a minimal sketch of this selection logic follows below.
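
The sketch below assumes the three probing checks have been reduced to plain
flags; the enum values mirror the names used in this cover letter and the
real definitions may differ:

enum rte_iova_mode { RTE_IOVA_DC, RTE_IOVA_PA, RTE_IOVA_VA };

static enum rte_iova_mode
pci_select_iova_mode(int has_bound_device, int has_iova_va_drv,
		     int has_uio_device, int vfio_noiommu_enabled)
{
	if (!has_bound_device)
		return RTE_IOVA_DC;		/* step 0: nothing bound yet */
	if (has_iova_va_drv &&			/* step 1 */
	    !has_uio_device &&			/* step 2 */
	    !vfio_noiommu_enabled)		/* step 3 */
		return RTE_IOVA_VA;
	return RTE_IOVA_PA;			/* default mapping scheme */
}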

v6 --> v7:
- Patches squashed per v6.
- Added run_once in eal per v6.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added API info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the override eal option, i.e. (--iova-mode=<>), because we have means to
   truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match API in pci header. Required for autodetection in
follow-up patches.
2) 2nd - 3rd - 4th: autodetection mapping infrastructure for Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
 - rte_mem_virt2phy
 - rte_malloc_virt2phy

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC integrated NIC for both modes.
- Tested for arm64/Octeontx integrated NICs for only
   IOVA_VA mode (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, refer [10].
for v6, refer [11].

Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html


Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c | 33 ++---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c  | 23 ++
 lib/librte_eal/common/eal_common_pci.c  | 11 +--
 lib/librte_eal/common/include/rte_bus.h | 35 +
 lib/librte_eal/common/include/rte_eal.h | 12 
 lib/librte_eal/common/include/rte_pci.h | 28 
 lib/librte_eal/common/rte_malloc.c  |  9 ++-
 lib/librte_eal/linuxapp/eal/eal.c   | 33 ++---
 lib/librte_eal/linuxapp/eal/eal_memory.c|  3 +
 lib/librte_eal/li

Re: [dpdk-dev] [PATCH v2] librte_mbuf: modify port initialization value

2017-09-05 Thread Yang, Zhiyong
> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, September 5, 2017 3:28 PM
> To: Yang, Zhiyong 
> Cc: dev@dpdk.org; Yigit, Ferruh ;
> step...@networkplumber.org
> Subject: Re: [PATCH v2] librte_mbuf: modify port initialization value
> 
> 05/09/2017 07:13, Zhiyong Yang:
> > In order to support more than 256 virtual ports, the field "port"
> > in rte_mbuf has been increased to 16 bits. The initialization/reset
> > value of the field "port" should be changed from 0xff to 0x
> > accordingly.
> 
> This patch should be merged with the range increase.

OK, it will be merged with the range increase patchset.
Thanks,
Zhiyong


Re: [dpdk-dev] [PATCH v7 0/9] Infrastructure to detect iova mapping on the bus

2017-09-05 Thread Hemant Agrawal

Please note that this series breaks the DPAA2 bus.
The following patch series (from Shreyansh) is required to keep the DPAA2 bus
working with this patch series:


http://dpdk.org/dev/patchwork/patch/27950/


On 9/5/2017 5:58 PM, Hemant Agrawal wrote:

Tested-by: Hemant Agrawal 

On 8/31/2017 8:56 AM, Santosh Shukla wrote:

v7:
Includes no major changes; minor changes detailed below:
- patch squashing (Aaron's suggestion)
- added run_once for device_parse() and bus_scan() in eal init
(Aaron's suggestion)
- Moved rte_eal_device_parse() up in eal initialization order.
- Patches rebased on top of version: 17.11-rc0
For v6 info refer [11].

v6:
Sending v5 series rebased on top of version: 17.11-rc0.

v5:
Introducing the RTE_PCI_DRV_IOVA_AS_VA flag for autodetection of IOVA-as-VA
mapping.
If a PCI driver demands the IOVA-as-VA scheme, it can set this flag in its
PCI driver registration function.

Algorithm to select IOVA as VA for PCI bus case:
 0. If no device is bound, return RTE_IOVA_DC mapping mode;
 else go to 1).
 1. Look for a device attached to the vfio kernel driver with .drv_flag set
 to RTE_PCI_DRV_IOVA_AS_VA.
 2. Look for any device attached to a UIO class of driver.
 3. Check whether vfio-noiommu mode is enabled.

 If 2) and 3) are false and 1) is true, select RTE_IOVA_VA as the
 mapping scheme. Otherwise use the default
 mapping scheme (RTE_IOVA_PA).

That way, the bus can truly autodetect the IOVA mapping mode for
a device or a set of devices.

v6 --> v7:
- Patches squashed per v6.
- Added run_once in eal per v6.
- Moved rte_eal_device_parse() up in eal init order.

v5 --> v6:
- Added API info in eal's version.map (release DPDK_v17.11).

v4 --> v5:
- Change DPDK_17.08 to DPDK_17.11 in _version.map.
- Reworded bus api description (suggested by Hemant).
- Added reviewed-by from Maxime in v5.
- Added acked-by from Hemant for pci and bus patches.

v3 --> v4:
- Re-introduced RTE_IOVA_DC mode (Suggested by Hemant [5]).
- Renamed flag to RTE_PCI_DRV_IOVA_AS_VA (Suggested by Maxime).
- Reworded WARNING message (suggested by Maxime [7]).
- Created a separate patch for rte_pci_get_iommu_class (suggested by
Maxime[]).
- Added VFIO_PRESENT ifdef build fix.

v2 --> v3:
- Removed rte_mempool_virt2phy (suggested by Olivier [4])

v1 --> v2:
- Removed the override eal option, i.e. (--iova-mode=<>), because we have
means to
   truly autodetect the iova mode.
- Introduced RTE_PCI_DRV_NEED_IOVA_VA drv_flag (Suggested by Maxime [3]).
- Using NEED_IOVA_VA drv_flag in autodetection logic.
- Removed Linux version check macro in vfio code, As per Maxime feedback.
- Moved rte_pci_match API from local to global.

Patch Summary:
1) 1st: declare rte_pci_match API in pci header. Required for
autodetection in
follow-up patches.
2) 2nd - 3rd - 4th: autodetection mapping infrastructure for
Linux/bsdapp.
3) 5th: iova mode helper API.
4) 6th: Infra to detect iova mode.
5) 7th: make vfio mapping iova aware.
6) 8th - 9th : Check for IOVA_VA mode in below APIs
 - rte_mem_virt2phy
 - rte_malloc_virt2phy

Test History:
- Tested for x86/XL710 40G NIC card for both modes (iova_va/pa).
- Tested for arm64/thunderx vNIC integrated NIC for both modes.
- Tested for arm64/Octeontx integrated NICs for only
   IOVA_VA mode (it supports only one mode).
- Ran standalone tests like mempool_autotest, mbuf_autotest.
- Verified for Doxygen.

Work History:
For v1, Refer [1].
For v2, Refer [2].
For v3, Refer [9].
For v4, refer [10].
for v6, refer [11].

Checkpatch result:
* Debug message - WARNING: line over 80 characters

Thanks,
[1] https://www.mail-archive.com/dev@dpdk.org/msg67438.html
[2] https://www.mail-archive.com/dev@dpdk.org/msg70674.html
[3] https://www.mail-archive.com/dev@dpdk.org/msg70279.html
[4] https://www.mail-archive.com/dev@dpdk.org/msg70692.html
[5] http://dpdk.org/ml/archives/dev/2017-July/071282.html
[6] http://dpdk.org/ml/archives/dev/2017-July/070951.html
[7] http://dpdk.org/ml/archives/dev/2017-July/070941.html
[8] http://dpdk.org/ml/archives/dev/2017-July/070952.html
[9] http://dpdk.org/ml/archives/dev/2017-July/070918.html
[10] http://dpdk.org/ml/archives/dev/2017-July/071754.html
[11] http://dpdk.org/ml/archives/dev/2017-August/072871.html


Santosh Shukla (9):
  eal/pci: export match function
  eal/pci: get iommu class
  linuxapp/eal_pci: get iommu class
  bus: get iommu class
  eal: introduce iova mode helper api
  eal: auto detect iova mode
  linuxapp/eal_vfio: honor iova mode before mapping
  linuxapp/eal_memory: honor iova mode in virt2phy
  eal/rte_malloc: honor iova mode in virt2phy

 lib/librte_eal/bsdapp/eal/eal.c | 33 ++---
 lib/librte_eal/bsdapp/eal/eal_pci.c | 10 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   | 10 +++
 lib/librte_eal/common/eal_common_bus.c  | 23 ++
 lib/librte_eal/common/eal_common_pci.c  | 11 +--
 lib/librte_eal/common/include/rte_bus.h | 35 +
 lib/librte_eal/common/include/rte_eal.h | 12 
 lib/librte_eal/common/in

Re: [dpdk-dev] [PATCH v2 0/8] Remove temporary digest allocation

2017-09-05 Thread Zhang, Roy Fan
Hi Pablo,

Thanks, looks great!

Regards,
Fan

> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Tuesday, September 5, 2017 3:20 AM
> To: Doherty, Declan ; Zhang, Roy Fan
> ; jerin.ja...@caviumnetworks.com
> Cc: dev@dpdk.org; De Lara Guarch, Pablo 
> Subject: [PATCH v2 0/8] Remove temporary digest allocation
> 
> When performing authentication verification, some crypto PMDs require
> extra memory where the generated digest can be placed.
> Currently, these PMDs are getting the memory from the end of the source
> mbuf, which might fail if there is not enough tailroom.
> 
> To avoid this situation, some memory is allocated in each queue pair of the
> device, to store temporarily these digests.
> 
> Changes in v2:
> - Removed incorrect indirection when getting the memory
>   to store the generated digest (i.e. removed "&" in &temp_digest)

Series Acked-by: Fan Zhang 
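
A minimal sketch of the mechanism described above, i.e. a scratch digest
buffer owned by the queue pair instead of borrowing mbuf tailroom; struct
and function names here are assumptions for illustration, not the API of
the series:

#include <stdint.h>
#include <string.h>

#define EXAMPLE_DIGEST_LEN_MAX 64	/* enough for SHA-512 */

struct example_qp {
	uint8_t temp_digest[EXAMPLE_DIGEST_LEN_MAX];
	/* ... rest of the queue pair state ... */
};

/* Write the computed digest into the per-qp scratch area and verify it. */
static int
example_auth_verify(struct example_qp *qp, const uint8_t *computed,
		    const uint8_t *expected, size_t digest_len)
{
	memcpy(qp->temp_digest, computed, digest_len);
	return memcmp(qp->temp_digest, expected, digest_len) == 0 ? 0 : -1;
}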


[dpdk-dev] [PATCH v1 0/3] net/mlx4: additional interrupt handling fixes

2017-09-05 Thread Adrien Mazarguil
While the previous interrupt rework improved the situation, it failed to
address one last issue pointed out by Matan: RMV/LSC events may be missed
and not reported to the application.

Adrien Mazarguil (3):
  net/mlx4: fix unhandled event debug message
  net/mlx4: fix rescheduled link status check
  net/mlx4: merge interrupt collector function

 drivers/net/mlx4/mlx4_intr.c | 150 ++
 1 file changed, 71 insertions(+), 79 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH v1 2/3] net/mlx4: fix rescheduled link status check

2017-09-05 Thread Adrien Mazarguil
Link status is sometimes inconsistent during a LSC event. When it occurs,
the PMD refrains from immediately notifying the application; instead, an
alarm is scheduled to check link status later and notify the application
once it has settled.

The problem is that subsequent link status checks are only performed if
additional LSC events occur in the meantime, which is not always the case.

Worse, since support for removal events was added, rescheduled link status
checks may consume them as well without notifying the application. With the
right timing, a link loss occurring just before a device removal event may
hide it from the application.

Fixes: 6dd7b7056d7f ("net/mlx4: support device removal event")
Fixes: 2d449f7c52de ("net/mlx4: fix assertion failure on link update")
Cc: sta...@dpdk.org
Cc: Gaetan Rivet 

Reported-by: Matan Azrad 
Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4_intr.c | 71 +--
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index d7f1098..e1e6c05 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -59,7 +59,7 @@
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
 
-static void mlx4_link_status_alarm(struct priv *priv);
+static int mlx4_link_status_check(struct priv *priv);
 
 /**
  * Clean up Rx interrupts handler.
@@ -149,8 +149,6 @@ static int
 mlx4_collect_interrupt_events(struct priv *priv, uint32_t *events)
 {
struct ibv_async_event event;
-   int port_change = 0;
-   struct rte_eth_link *link = &priv->dev->data->dev_link;
const struct rte_intr_conf *const intr_conf =
&priv->dev->data->dev_conf.intr_conf;
int ret = 0;
@@ -163,9 +161,9 @@ mlx4_collect_interrupt_events(struct priv *priv, uint32_t 
*events)
switch (event.event_type) {
case IBV_EVENT_PORT_ACTIVE:
case IBV_EVENT_PORT_ERR:
-   if (!intr_conf->lsc)
+   if (!intr_conf->lsc || mlx4_link_status_check(priv))
break;
-   port_change = 1;
+   *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
ret++;
break;
case IBV_EVENT_DEVICE_FATAL:
@@ -180,47 +178,70 @@ mlx4_collect_interrupt_events(struct priv *priv, uint32_t 
*events)
}
ibv_ack_async_event(&event);
}
-   if (!port_change)
-   return ret;
-   mlx4_link_update(priv->dev, 0);
-   if (((link->link_speed == 0) && link->link_status) ||
-   ((link->link_speed != 0) && !link->link_status)) {
-   if (!priv->intr_alarm) {
-   /* Inconsistent status, check again later. */
-   priv->intr_alarm = 1;
-   rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
- (void (*)(void *))
- mlx4_link_status_alarm,
- priv);
-   }
-   } else {
-   *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
-   }
return ret;
 }
 
 /**
  * Process scheduled link status check.
  *
+ * If LSC interrupts are requested, process related callback.
+ *
  * @param priv
  *   Pointer to private structure.
  */
 static void
 mlx4_link_status_alarm(struct priv *priv)
 {
-   uint32_t events;
-   int ret;
+   const struct rte_intr_conf *const intr_conf =
+   &priv->dev->data->dev_conf.intr_conf;
 
assert(priv->intr_alarm == 1);
priv->intr_alarm = 0;
-   ret = mlx4_collect_interrupt_events(priv, &events);
-   if (ret > 0 && events & (1 << RTE_ETH_EVENT_INTR_LSC))
+   if (intr_conf->lsc && !mlx4_link_status_check(priv))
_rte_eth_dev_callback_process(priv->dev,
  RTE_ETH_EVENT_INTR_LSC,
  NULL, NULL);
 }
 
 /**
+ * Check link status.
+ *
+ * In case of inconsistency, another check is scheduled.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   0 on success (link status is consistent), negative errno value
+ *   otherwise and rte_errno is set.
+ */
+static int
+mlx4_link_status_check(struct priv *priv)
+{
+   struct rte_eth_link *link = &priv->dev->data->dev_link;
+   int ret = mlx4_link_update(priv->dev, 0);
+
+   if (ret)
+   return ret;
+   if ((!link->link_speed && link->link_status) ||
+   (link->link_speed && !link->link_status)) {
+   if (!priv->intr_alarm) {
+   /* Inconsistent status, check again later. */
+   ret = rte_eal_alarm_set(MLX4_INTR_ALARM_TIMEOUT,
+   (void (*)(void *))
+   

[dpdk-dev] [PATCH v1 3/3] net/mlx4: merge interrupt collector function

2017-09-05 Thread Adrien Mazarguil
Since interrupt handler is the only function relying on it, merging them
simplifies the code as there is no need for an API to return collected
events.

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4_intr.c | 94 +--
 1 file changed, 30 insertions(+), 64 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index e1e6c05..3806322 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -135,53 +135,6 @@ mlx4_rx_intr_vec_enable(struct priv *priv)
 }
 
 /**
- * Collect interrupt events.
- *
- * @param priv
- *   Pointer to private structure.
- * @param events
- *   Pointer to event flags holder.
- *
- * @return
- *   Number of events.
- */
-static int
-mlx4_collect_interrupt_events(struct priv *priv, uint32_t *events)
-{
-   struct ibv_async_event event;
-   const struct rte_intr_conf *const intr_conf =
-   &priv->dev->data->dev_conf.intr_conf;
-   int ret = 0;
-
-   *events = 0;
-   /* Read all message and acknowledge them. */
-   for (;;) {
-   if (ibv_get_async_event(priv->ctx, &event))
-   break;
-   switch (event.event_type) {
-   case IBV_EVENT_PORT_ACTIVE:
-   case IBV_EVENT_PORT_ERR:
-   if (!intr_conf->lsc || mlx4_link_status_check(priv))
-   break;
-   *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
-   ret++;
-   break;
-   case IBV_EVENT_DEVICE_FATAL:
-   if (!intr_conf->rmv)
-   break;
-   *events |= (1 << RTE_ETH_EVENT_INTR_RMV);
-   ret++;
-   break;
-   default:
-   DEBUG("event type %d on port %d not handled",
- event.event_type, event.element.port_num);
-   }
-   ibv_ack_async_event(&event);
-   }
-   return ret;
-}
-
-/**
  * Process scheduled link status check.
  *
  * If LSC interrupts are requested, process related callback.
@@ -250,26 +203,39 @@ mlx4_link_status_check(struct priv *priv)
 static void
 mlx4_interrupt_handler(struct priv *priv)
 {
-   int ret;
-   uint32_t ev;
-   int i;
+   enum { LSC, RMV, };
+   static const enum rte_eth_event_type type[] = {
+   [LSC] = RTE_ETH_EVENT_INTR_LSC,
+   [RMV] = RTE_ETH_EVENT_INTR_RMV,
+   };
+   uint32_t caught[RTE_DIM(type)] = { 0 };
+   struct ibv_async_event event;
+   const struct rte_intr_conf *const intr_conf =
+   &priv->dev->data->dev_conf.intr_conf;
+   unsigned int i;
 
-   ret = mlx4_collect_interrupt_events(priv, &ev);
-   if (ret > 0) {
-   for (i = RTE_ETH_EVENT_UNKNOWN;
-i < RTE_ETH_EVENT_MAX;
-i++) {
-   if (ev & (1 << i)) {
-   ev &= ~(1 << i);
-   _rte_eth_dev_callback_process(priv->dev, i,
- NULL, NULL);
-   ret--;
-   }
+   /* Read all message and acknowledge them. */
+   while (!ibv_get_async_event(priv->ctx, &event)) {
+   switch (event.event_type) {
+   case IBV_EVENT_PORT_ACTIVE:
+   case IBV_EVENT_PORT_ERR:
+   if (intr_conf->lsc && !mlx4_link_status_check(priv))
+   ++caught[LSC];
+   break;
+   case IBV_EVENT_DEVICE_FATAL:
+   if (intr_conf->rmv)
+   ++caught[RMV];
+   break;
+   default:
+   DEBUG("event type %d on physical port %d not handled",
+ event.event_type, event.element.port_num);
}
-   if (ret)
-   WARN("%d event%s not processed", ret,
-(ret > 1 ? "s were" : " was"));
+   ibv_ack_async_event(&event);
}
+   for (i = 0; i != RTE_DIM(caught); ++i)
+   if (caught[i])
+   _rte_eth_dev_callback_process(priv->dev, type[i],
+ NULL, NULL);
 }
 
 /**
-- 
2.1.4



[dpdk-dev] [PATCH v1 1/3] net/mlx4: fix unhandled event debug message

2017-09-05 Thread Adrien Mazarguil
When LSC or RMV events are received by the PMD but are not requested by the
application, a misleading debugging message implying the PMD does not
support them is shown.

Fixes: 6dd7b7056d7f ("net/mlx4: support device removal event")
Cc: Gaetan Rivet 
Cc: sta...@dpdk.org

Signed-off-by: Adrien Mazarguil 
---
 drivers/net/mlx4/mlx4_intr.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
index e3449ee..d7f1098 100644
--- a/drivers/net/mlx4/mlx4_intr.c
+++ b/drivers/net/mlx4/mlx4_intr.c
@@ -160,16 +160,21 @@ mlx4_collect_interrupt_events(struct priv *priv, uint32_t 
*events)
for (;;) {
if (ibv_get_async_event(priv->ctx, &event))
break;
-   if ((event.event_type == IBV_EVENT_PORT_ACTIVE ||
-event.event_type == IBV_EVENT_PORT_ERR) &&
-   intr_conf->lsc) {
+   switch (event.event_type) {
+   case IBV_EVENT_PORT_ACTIVE:
+   case IBV_EVENT_PORT_ERR:
+   if (!intr_conf->lsc)
+   break;
port_change = 1;
ret++;
-   } else if (event.event_type == IBV_EVENT_DEVICE_FATAL &&
-  intr_conf->rmv) {
+   break;
+   case IBV_EVENT_DEVICE_FATAL:
+   if (!intr_conf->rmv)
+   break;
*events |= (1 << RTE_ETH_EVENT_INTR_RMV);
ret++;
-   } else {
+   break;
+   default:
DEBUG("event type %d on port %d not handled",
  event.event_type, event.element.port_num);
}
-- 
2.1.4



[dpdk-dev] [PATCH] buildtools: zero elf info variable in pmdinfogen

2017-09-05 Thread Harry van Haaren
This commit zeros out the elf_info struct at startup of the
pmdinfogen code. If it is not zeroed, gcc later reports
"may be used uninitialized" warnings. Clang does not report any
issue.

This commit enables a simplification in the meson build
system, removing the requirement for "-Wno-maybe-uninitialized".

Signed-off-by: Harry van Haaren 
---

 buildtools/pmdinfogen/pmdinfogen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/buildtools/pmdinfogen/pmdinfogen.c 
b/buildtools/pmdinfogen/pmdinfogen.c
index ba1a12e..e73fc76 100644
--- a/buildtools/pmdinfogen/pmdinfogen.c
+++ b/buildtools/pmdinfogen/pmdinfogen.c
@@ -397,7 +397,7 @@ static void output_pmd_info_string(struct elf_info *info, 
char *outfile)
 
 int main(int argc, char **argv)
 {
-   struct elf_info info;
+   struct elf_info info = {0};
int rc = 1;
 
if (argc < 3) {
-- 
2.7.4



[dpdk-dev] [PATCH v2 1/3] net/mlx5: replace network to host macros

2017-09-05 Thread Shachar Beiser
Signed-off-by: Shachar Beiser 
---
 drivers/net/mlx5/mlx5_mac.c  |   8 ++-
 drivers/net/mlx5/mlx5_mr.c   |   2 +-
 drivers/net/mlx5/mlx5_rxmode.c   |   8 ++-
 drivers/net/mlx5/mlx5_rxq.c  |   9 +--
 drivers/net/mlx5/mlx5_rxtx.c | 131 +++
 drivers/net/mlx5/mlx5_rxtx.h |  12 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c |  12 ++--
 7 files changed, 102 insertions(+), 80 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 45d23e4..b3c3fa2 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -263,11 +263,15 @@
(*mac)[0], (*mac)[1], (*mac)[2],
(*mac)[3], (*mac)[4], (*mac)[5]
},
-   .vlan_tag = (vlan_enabled ? htons(vlan_id) : 0),
+   .vlan_tag = (vlan_enabled ?
+rte_cpu_to_be_16(vlan_id)
+: 0),
},
.mask = {
.dst_mac = "\xff\xff\xff\xff\xff\xff",
-   .vlan_tag = (vlan_enabled ? htons(0xfff) : 0),
+   .vlan_tag = (vlan_enabled ?
+rte_cpu_to_be_16(0xfff) :
+0),
},
};
DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 9593830..9a9f73a 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -203,7 +203,7 @@ struct ibv_mr *
txq_ctrl->txq.mp2mr[idx].start = (uintptr_t)mr->addr;
txq_ctrl->txq.mp2mr[idx].end = (uintptr_t)mr->addr + mr->length;
txq_ctrl->txq.mp2mr[idx].mr = mr;
-   txq_ctrl->txq.mp2mr[idx].lkey = htonl(mr->lkey);
+   txq_ctrl->txq.mp2mr[idx].lkey = rte_cpu_to_be_32(mr->lkey);
DEBUG("%p: new MR lkey for MP \"%s\" (%p): 0x%08" PRIu32,
  (void *)txq_ctrl, mp->name, (void *)mp,
  txq_ctrl->txq.mp2mr[idx].lkey);
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 4a51e47..db2e05b 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -159,14 +159,18 @@
mac[0], mac[1], mac[2],
mac[3], mac[4], mac[5],
},
-   .vlan_tag = (vlan_enabled ? htons(vlan_id) : 0),
+   .vlan_tag = (vlan_enabled ?
+rte_cpu_to_be_16(vlan_id) :
+0),
},
.mask = {
.dst_mac = {
mask[0], mask[1], mask[2],
mask[3], mask[4], mask[5],
},
-   .vlan_tag = (vlan_enabled ? htons(0xfff) : 0),
+   .vlan_tag = (vlan_enabled ?
+rte_cpu_to_be_16(0xfff) :
+0),
},
};
 
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 35c5cb4..437dc02 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -672,9 +672,10 @@
/* scat->addr must be able to store a pointer. */
assert(sizeof(scat->addr) >= sizeof(uintptr_t));
*scat = (struct mlx5_wqe_data_seg){
-   .addr = htonll(rte_pktmbuf_mtod(buf, uintptr_t)),
-   .byte_count = htonl(DATA_LEN(buf)),
-   .lkey = htonl(rxq_ctrl->mr->lkey),
+   .addr =
+   rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t)),
+   .byte_count = rte_cpu_to_be_32(DATA_LEN(buf)),
+   .lkey = rte_cpu_to_be_32(rxq_ctrl->mr->lkey),
};
(*rxq_ctrl->rxq.elts)[i] = buf;
}
@@ -1077,7 +1078,7 @@
/* Update doorbell counter. */
rxq_ctrl->rxq.rq_ci = desc >> rxq_ctrl->rxq.sges_n;
rte_wmb();
-   *rxq_ctrl->rxq.rq_db = htonl(rxq_ctrl->rxq.rq_ci);
+   *rxq_ctrl->rxq.rq_db = rte_cpu_to_be_32(rxq_ctrl->rxq.rq_ci);
DEBUG("%p: rxq updated with %p", (void *)rxq_ctrl, (void *)&tmpl);
assert(ret == 0);
return 0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index fe9e7ea..e1a35a3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -306,7 +306,7 @@
 
op_own = cqe->op_own;
if (MLX5_CQE_FORMAT(op_own) == MLX5_COMPRESSED)
-   n = ntohl(cqe->byte_cnt);
+   n = rte_be_to_cpu_32(cqe->byte_cnt);
else
n = 1;
cq_ci += n;
@@ -434,7 +434,8 @@

[dpdk-dev] [PATCH v2 3/3] net/mlx5: fix interrupt enable return value

2017-09-05 Thread Shachar Beiser
The return value is sometimes returned uninitialized.

Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
Fixes: b18042fb8f49 ("net/mlx5: fix misplaced Rx interrupts functions")
Cc: adrien.mazarg...@6wind.com
Cc: sta...@dpdk.org

Signed-off-by: Shachar Beiser 
---
 drivers/net/mlx5/mlx5_rxq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 437dc02..24887fb 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1330,7 +1330,7 @@
struct priv *priv = mlx5_get_priv(dev);
struct rxq *rxq = (*priv->rxqs)[rx_queue_id];
struct rxq_ctrl *rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
-   int ret;
+   int ret = 0;
 
if (!rxq || !rxq_ctrl->channel) {
ret = EINVAL;
-- 
1.8.3.1



[dpdk-dev] [PATCH v2 2/3] net/mlx5: fix TSO MLNX OFED 3.3 verification

2017-09-05 Thread Shachar Beiser
Fixes: 3cf87e68d97b ("net/mlx5: remove old MLNX OFED 3.3 verification")
Cc: sta...@dpdk.org

Signed-off-by: Shachar Beiser 
---
 drivers/net/mlx5/mlx5_prm.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 608072f..8b82b5e 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -89,9 +89,6 @@
 /* Default max packet length to be inlined. */
 #define MLX5_EMPW_MAX_INLINE_LEN (4U * MLX5_WQE_SIZE)
 
-#ifndef HAVE_VERBS_MLX5_OPCODE_TSO
-#define MLX5_OPCODE_TSO MLX5_OPCODE_LSO_MPW /* Compat with OFED 3.3. */
-#endif
 
 #define MLX5_OPC_MOD_ENHANCED_MPSW 0
 #define MLX5_OPCODE_ENHANCED_MPSW 0x29
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH] buildtools: zero elf info variable in pmdinfogen

2017-09-05 Thread Bruce Richardson
On Tue, Sep 05, 2017 at 02:03:33PM +0100, Harry van Haaren wrote:
> This commit zeros out the elf_info struct at startup of the
> pmdinfogen code. If it is not zeroed, gcc later reports
> "may be used uninitialized" warnings. Clang does not report any
> issue.
> 
> This commit enables a simplification in the meson build
> system, removing the requirement for "-Wno-maybe-uninitialized".
> 
> Signed-off-by: Harry van Haaren 

It's worth adding to the commit message that this error only shows up in
optimized builds, which is why the warning is not disabled by default in
the existing makefile.

> ---
> 
>  buildtools/pmdinfogen/pmdinfogen.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/buildtools/pmdinfogen/pmdinfogen.c 
> b/buildtools/pmdinfogen/pmdinfogen.c
> index ba1a12e..e73fc76 100644
> --- a/buildtools/pmdinfogen/pmdinfogen.c
> +++ b/buildtools/pmdinfogen/pmdinfogen.c
> @@ -397,7 +397,7 @@ static void output_pmd_info_string(struct elf_info *info, 
> char *outfile)
>  
>  int main(int argc, char **argv)
>  {
> - struct elf_info info;
> + struct elf_info info = {0};
>   int rc = 1;
>  
>   if (argc < 3) {
> -- 
> 2.7.4
> 


[dpdk-dev] [PATCH v3] net/mlx5: support upstream rdma-core

2017-09-05 Thread Shachar Beiser
This removes the dependency on specific Mellanox OFED libraries by
using the upstream rdma-core and linux upstream community code.

---
a. Compile with rdma-core commit f11292efd541 ("Merge pull request #202")
b. Tested with linux kernel 4.13-rc4
c. For performance testing recommended to wait till kernel 4.14

Signed-off-by: Shachar Beiser 
---
 doc/guides/nics/mlx5.rst |  32 +++--
 drivers/net/mlx5/Makefile|  23 ++--
 drivers/net/mlx5/mlx5.c  |  97 ---
 drivers/net/mlx5/mlx5.h  |   4 +-
 drivers/net/mlx5/mlx5_ethdev.c   |   4 +-
 drivers/net/mlx5/mlx5_fdir.c | 103 
 drivers/net/mlx5/mlx5_flow.c | 230 +--
 drivers/net/mlx5/mlx5_mac.c  |  18 +--
 drivers/net/mlx5/mlx5_prm.h  |  42 ++-
 drivers/net/mlx5/mlx5_rxmode.c   |  18 +--
 drivers/net/mlx5/mlx5_rxq.c  | 226 +++---
 drivers/net/mlx5/mlx5_rxtx.c |   3 +-
 drivers/net/mlx5/mlx5_rxtx.h |  33 ++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c |   3 +-
 drivers/net/mlx5/mlx5_txq.c  |  73 ++-
 drivers/net/mlx5/mlx5_vlan.c |  13 +-
 mk/rte.app.mk|   2 +-
 17 files changed, 503 insertions(+), 421 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index f4cb18b..ffa20a2 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -293,8 +293,9 @@ DPDK and must be installed separately:
   This library basically implements send/receive calls to the hardware
   queues.
 
-- **Kernel modules** (mlnx-ofed-kernel)
+- **Kernel modules** (mlnx-ofed-kernel or linux upstream)
 
+  DPDK 17.11 supports rdma-corev16 , linux upstream kernel 4.14.
   They provide the kernel-side Verbs API and low level device drivers that
   manage actual hardware initialization and resources sharing with user
   space processes.
@@ -320,9 +321,29 @@ DPDK and must be installed separately:
Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
licensed.
 
+- **installation options**
+
+  In order to install the above , Mellanox supports two options:
+
+rmda-core + upstream kernel (recommened)
+
+
+Currently supported by DPDK:
+
+  - minimal kernel version : 4.13-rc4
+  - minimal rdma-core version: v15
+
+installation instructions can be found in :
+
+  - https://github.com/linux-rdma/rdma-core
+  - https://github.com/Mellanox/linux
+
+Mellanox OFED
+~
+
 Currently supported by DPDK:
 
-- Mellanox OFED version: **4.1**.
+- Mellanox OFED version: **4.2**.
 - firmware version:
 
   - ConnectX-4: **12.20.1010** and above.
@@ -330,9 +351,6 @@ Currently supported by DPDK:
   - ConnectX-5: **16.20.1010** and above.
   - ConnectX-5 Ex: **16.20.1010** and above.
 
-Getting Mellanox OFED
-~
-
 While these libraries and kernel modules are available on OpenFabrics
 Alliance's `website `__ and provided by package
 managers on most distributions, this PMD requires Ethernet extensions that
@@ -373,8 +391,8 @@ Supported NICs
 * Mellanox(R) ConnectX(R)-5 100G MCX556A-ECAT (2x100G)
 * Mellanox(R) ConnectX(R)-5 Ex EN 100G MCX516A-CDAT (2x100G)
 
-Quick Start Guide
--
+Quick Start Guide on OFED
+--
 
 1. Download latest Mellanox OFED. For more info check the  `prerequisites`_.
 
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 14b739a..d9c42b5 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -104,19 +104,19 @@ mlx5_autoconf.h.new: FORCE
 mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_IBV_EXP_CQ_COMPRESSED_CQE \
-   infiniband/verbs_exp.h \
-   enum IBV_EXP_CQ_COMPRESSED_CQE \
+   HAVE_IBV_DEVICE_VXLAN_SUPPORT \
+   infiniband/verbs.h \
+   enum IBV_DEVICE_VXLAN_SUPPORT \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_MLX5_ETH_VLAN_INLINE_HEADER_SIZE \
-   infiniband/mlx5_hw.h \
-   enum MLX5_ETH_VLAN_INLINE_HEADER_SIZE \
+   HAVE_IBV_WQ_FLAG_RX_END_PADDING \
+   infiniband/verbs.h \
+   enum IBV_WQ_FLAG_RX_END_PADDING \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
-   HAVE_VERBS_MLX5_OPCODE_TSO \
-   infiniband/mlx5_hw.h \
-   enum MLX5_OPCODE_TSO \
+   HAVE_IBV_MLX5_MOD_MPW \
+   infiniband/mlx5dv.h \
+   enum MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
HAVE_ETHTOOL_LINK_MODE_25G \
@@ -133,11 +133,6 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
/usr/include/linux/ethtool.h \
 

Re: [dpdk-dev] [PATCH] net/failsafe: stat support enhancement

2017-09-05 Thread Gaëtan Rivet
Hi Matan,

On Tue, Sep 05, 2017 at 12:56:34PM +0300, Matan Azrad wrote:
> The previous stats code returned only the current TX sub
> device stats.
> 
> This enhancement extends it to return the sum of all sub
> devices stats with history of removed sub-devices.
> 
> Dedicated stats accumulator saves the stat history of all
> sub device remove events.
> 
> Each failsafe sub device contains the last stats asked by
> the user and updates the accumulator in removal time.
> 
> I would like to implement ultimate snapshot on removal time.
> The stats_get API needs to be changed to return error in the
> case it is too late to retrieve statistics.

You need to have this API evolution first. It is not assured to be
accepted by the community.

As it is, this version is incorrect, because the only available stats
for a removed slave is the last snapshot.

Thus it complicates the rules while still being incorrect. Even if you
were able to push for this API evolution, PMDs with hard counters would
be the ones to return an error code on stat read while removed.
You may be lucky at the moment because MLX drivers do not support hard
counters, but this won't always be true (and it will be false soon
enough). You will thus be back at square one, with a new useless API and
still incorrect stats in the fail-safe. On the other hand, the fail-safe
should strive to be as easy to use as possible with most drivers, and
not cater specifically to soft-counters ones.

So, my take on this: I understand that aggregated stats would be
interesting. Keep it simple & stupid: simple aggregation on stats_get.
You will have a rollback at some point on device removal, but this is
still easier to detect / deal with than partially / sometimes incorrect
stats.
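
A minimal sketch of that alternative, reusing the fail-safe helpers visible
in the patch quoted below (an illustration of the suggestion, not the
submitted code):

static void
fs_stats_get_simple(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
{
	struct sub_device *sdev;
	struct rte_eth_stats tmp;
	uint8_t i;

	memset(stats, 0, sizeof(*stats));
	/* No removal-time snapshot: just sum the currently active sub-devices. */
	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
		rte_eth_stats_get(PORT_ID(sdev), &tmp);
		fs_increment_stats(&tmp, stats);
	}
}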

> By this way, failsafe can get stats snapshot in removal interrupt
> callback for each PMD which can give stats after removal event.
> 
> Signed-off-by: Matan Azrad 
> ---
>  drivers/net/failsafe/failsafe_ether.c   | 33 
> +
>  drivers/net/failsafe/failsafe_ops.c | 16 
>  drivers/net/failsafe/failsafe_private.h |  5 +
>  3 files changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c 
> b/drivers/net/failsafe/failsafe_ether.c
> index a3a8cce..133080d 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
>   return ret;
>  }
>  
> +void
> +fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats *to)
> +{
> + uint8_t i;
> +
> + RTE_ASSERT(from != NULL && to != NULL);
> + to->ipackets += from->ipackets;
> + to->opackets += from->opackets;
> + to->ibytes += from->ibytes;
> + to->obytes += from->obytes;
> + to->imissed += from->imissed;
> + to->ierrors += from->ierrors;
> + to->oerrors += from->oerrors;
> + to->rx_nombuf += from->rx_nombuf;
> + for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
> + to->q_ipackets[i] += from->q_ipackets[i];
> + to->q_opackets[i] += from->q_opackets[i];
> + to->q_ibytes[i] += from->q_ibytes[i];
> + to->q_obytes[i] += from->q_obytes[i];
> + to->q_errors[i] += from->q_errors[i];
> + }
> +}
> +
> +void
> +fs_increment_stats_accumulator(struct sub_device *sdev)
> +{
> + fs_increment_stats(&sdev->stats_snapshot,
> + &PRIV(sdev->fs_dev)->stats_accumulator);
> + memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
> +}
> +
>  int
>  failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
>   enum rte_eth_event_type event __rte_unused,
> @@ -410,6 +441,8 @@ failsafe_eth_rmv_event_callback(uint8_t port_id 
> __rte_unused,
>   fs_switch_dev(sdev->fs_dev, sdev);
>   /* Use safe bursts in any case. */
>   set_burst_fn(sdev->fs_dev, 1);
> + /* Increment the stats accumulator by the last stats snapshot. */
> + fs_increment_stats_accumulator(sdev);
>   /*
>* Async removal, the sub-PMD will try to unregister
>* the callback at the source of the current thread context.
> diff --git a/drivers/net/failsafe/failsafe_ops.c 
> b/drivers/net/failsafe/failsafe_ops.c
> index ff9ad15..e47cc85 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -586,9 +586,14 @@ static void
>  fs_stats_get(struct rte_eth_dev *dev,
>struct rte_eth_stats *stats)
>  {
> - if (TX_SUBDEV(dev) == NULL)
> - return;
> - rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> + struct sub_device *sdev;
> + uint8_t i;
> +
> + memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
> + FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> + rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> + fs_increment_stats(&sdev->stats_snapshot, stats);
>

Re: [dpdk-dev] [PATCH v2 3/3] net/mlx5: fix interrupt enable return value

2017-09-05 Thread Adrien Mazarguil
Hi Shachar,

On Tue, Sep 05, 2017 at 01:04:38PM +, Shachar Beiser wrote:
> return value is sometimes returned uninitialized
> 
> Fixes: e1016cb73383 ("net/mlx5: fix Rx interrupts management")
> Fixes: b18042fb8f49 ("net/mlx5: fix misplaced Rx interrupts functions")
> 
> Cc: adrien.mazarg...@6wind.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Shachar Beiser 

Looks like in both commits, ret is properly initialized so I'm wondering if
the fixes line is right? Did you even get a compilation error?

Otherwise, you should drop this patch from the series.

> ---
>  drivers/net/mlx5/mlx5_rxq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 437dc02..24887fb 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -1330,7 +1330,7 @@
>   struct priv *priv = mlx5_get_priv(dev);
>   struct rxq *rxq = (*priv->rxqs)[rx_queue_id];
>   struct rxq_ctrl *rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
> - int ret;
> + int ret = 0;
>  
>   if (!rxq || !rxq_ctrl->channel) {
>   ret = EINVAL;
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH v1 0/3] net/mlx4: additional interrupt handling fixes

2017-09-05 Thread Gaëtan Rivet
On Tue, Sep 05, 2017 at 02:56:36PM +0200, Adrien Mazarguil wrote:
> While the previous interrupt rework improved the situation, it failed to
> address one last issue pointed out by Matan: RMV/LSC events may be missed
> and not reported to the application.
> 
> Adrien Mazarguil (3):
>   net/mlx4: fix unhandled event debug message
>   net/mlx4: fix rescheduled link status check
>   net/mlx4: merge interrupt collector function
> 
>  drivers/net/mlx4/mlx4_intr.c | 150 ++
>  1 file changed, 71 insertions(+), 79 deletions(-)
> 
> -- 
> 2.1.4
> 

For the series:

Acked-by: Gaetan Rivet 

-- 
Gaëtan Rivet
6WIND


[dpdk-dev] [PATCH v2] buildtools: zero elf info variable in pmdinfogen

2017-09-05 Thread Harry van Haaren
This commit zeros out the elf_info struct at startup of the
pmdinfogen code. If it is not zeroed, gcc later reports
"may be used uninitialized" warnings. Clang does not report any
issue.

This issue is only observed when compiling pmdinfogen as an
optimized build, hence this warning is not disabled in the
existing Makefile.

This commit enables a simplification in the meson build
system, removing the requirement for "-Wno-maybe-uninitialized".

Signed-off-by: Harry van Haaren 

---

v2:
- Added note to commit message about optimized compiles (Bruce)

---
 buildtools/pmdinfogen/pmdinfogen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/buildtools/pmdinfogen/pmdinfogen.c 
b/buildtools/pmdinfogen/pmdinfogen.c
index ba1a12e..e73fc76 100644
--- a/buildtools/pmdinfogen/pmdinfogen.c
+++ b/buildtools/pmdinfogen/pmdinfogen.c
@@ -397,7 +397,7 @@ static void output_pmd_info_string(struct elf_info *info, 
char *outfile)
 
 int main(int argc, char **argv)
 {
-   struct elf_info info;
+   struct elf_info info = {0};
int rc = 1;
 
if (argc < 3) {
-- 
2.7.4



Re: [dpdk-dev] [PATCH v1 3/3] net/mlx4: merge interrupt collector function

2017-09-05 Thread Gaëtan Rivet
On Tue, Sep 05, 2017 at 02:56:39PM +0200, Adrien Mazarguil wrote:
> Since interrupt handler is the only function relying on it, merging them
> simplifies the code as there is no need for an API to return collected
> events.
> 
> Signed-off-by: Adrien Mazarguil 
> ---
>  drivers/net/mlx4/mlx4_intr.c | 94 +--
>  1 file changed, 30 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/net/mlx4/mlx4_intr.c b/drivers/net/mlx4/mlx4_intr.c
> index e1e6c05..3806322 100644
> --- a/drivers/net/mlx4/mlx4_intr.c
> +++ b/drivers/net/mlx4/mlx4_intr.c
> @@ -135,53 +135,6 @@ mlx4_rx_intr_vec_enable(struct priv *priv)
>  }
>  
>  /**
> - * Collect interrupt events.
> - *
> - * @param priv
> - *   Pointer to private structure.
> - * @param events
> - *   Pointer to event flags holder.
> - *
> - * @return
> - *   Number of events.
> - */
> -static int
> -mlx4_collect_interrupt_events(struct priv *priv, uint32_t *events)
> -{
> - struct ibv_async_event event;
> - const struct rte_intr_conf *const intr_conf =
> - &priv->dev->data->dev_conf.intr_conf;
> - int ret = 0;
> -
> - *events = 0;
> - /* Read all message and acknowledge them. */
> - for (;;) {
> - if (ibv_get_async_event(priv->ctx, &event))
> - break;
> - switch (event.event_type) {
> - case IBV_EVENT_PORT_ACTIVE:
> - case IBV_EVENT_PORT_ERR:
> - if (!intr_conf->lsc || mlx4_link_status_check(priv))
> - break;
> - *events |= (1 << RTE_ETH_EVENT_INTR_LSC);
> - ret++;
> - break;
> - case IBV_EVENT_DEVICE_FATAL:
> - if (!intr_conf->rmv)
> - break;
> - *events |= (1 << RTE_ETH_EVENT_INTR_RMV);
> - ret++;
> - break;
> - default:
> - DEBUG("event type %d on port %d not handled",
> -   event.event_type, event.element.port_num);
> - }
> - ibv_ack_async_event(&event);
> - }
> - return ret;
> -}
> -
> -/**
>   * Process scheduled link status check.
>   *
>   * If LSC interrupts are requested, process related callback.
> @@ -250,26 +203,39 @@ mlx4_link_status_check(struct priv *priv)
>  static void
>  mlx4_interrupt_handler(struct priv *priv)
>  {
> - int ret;
> - uint32_t ev;
> - int i;
> + enum { LSC, RMV, };
> + static const enum rte_eth_event_type type[] = {
> + [LSC] = RTE_ETH_EVENT_INTR_LSC,
> + [RMV] = RTE_ETH_EVENT_INTR_RMV,
> + };
> + uint32_t caught[RTE_DIM(type)] = { 0 };

This is nicely written

> + struct ibv_async_event event;
> + const struct rte_intr_conf *const intr_conf =
> + &priv->dev->data->dev_conf.intr_conf;
> + unsigned int i;
>  
> - ret = mlx4_collect_interrupt_events(priv, &ev);
> - if (ret > 0) {
> - for (i = RTE_ETH_EVENT_UNKNOWN;
> -  i < RTE_ETH_EVENT_MAX;
> -  i++) {
> - if (ev & (1 << i)) {
> - ev &= ~(1 << i);
> - _rte_eth_dev_callback_process(priv->dev, i,
> -   NULL, NULL);
> - ret--;
> - }
> + /* Read all message and acknowledge them. */
> + while (!ibv_get_async_event(priv->ctx, &event)) {
> + switch (event.event_type) {
> + case IBV_EVENT_PORT_ACTIVE:
> + case IBV_EVENT_PORT_ERR:
> + if (intr_conf->lsc && !mlx4_link_status_check(priv))
> + ++caught[LSC];
> + break;
> + case IBV_EVENT_DEVICE_FATAL:
> + if (intr_conf->rmv)
> + ++caught[RMV];
> + break;
> + default:
> + DEBUG("event type %d on physical port %d not handled",
> +   event.event_type, event.element.port_num);
>   }
> - if (ret)
> - WARN("%d event%s not processed", ret,
> -  (ret > 1 ? "s were" : " was"));
> + ibv_ack_async_event(&event);
>   }
> + for (i = 0; i != RTE_DIM(caught); ++i)
> + if (caught[i])
> + _rte_eth_dev_callback_process(priv->dev, type[i],
> +   NULL, NULL);
>  }
>  
>  /**
> -- 
> 2.1.4
> 

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event

2017-09-05 Thread Matan Azrad
Hi Adrien

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Tuesday, September 5, 2017 3:02 PM
> To: Matan Azrad 
> Cc: Nélio Laranjeiro ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal event
> 
> Hi Matan,
> 
> On Tue, Sep 05, 2017 at 10:38:21AM +, Matan Azrad wrote:
> > Hi Adrien
> >
> > > -Original Message-
> > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > Sent: Tuesday, September 5, 2017 12:28 PM
> > > To: Matan Azrad 
> > > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device removal
> > > event
> > >
> > > Hi Matan,
> > >
> > > On Mon, Sep 04, 2017 at 05:52:55PM +, Matan Azrad wrote:
> > > > Hi Adrien,
> > > >
> > > > > -Original Message-
> > > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> > > > > Sent: Monday, September 4, 2017 6:33 PM
> > > > > To: Matan Azrad 
> > > > > Cc: Nélio Laranjeiro ; dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: support device
> > > > > removal event
> > > > >
> > > > > Hi Matan,
> > > > >
> > > > > One comment I have is, while this patch adds support for RMV, it
> > > > > also silently addresses a bug (see large comment you added to
> > > > > priv_link_status_update()).
> > > > >
> > > > > This should be split in two commits, with the fix part coming
> > > > > first and CC sta...@dpdk.org, and a second commit adding RMV
> > > > > support
> > > proper.
> > > > >
> > > >
> > > > Actually, the mlx4 bug was not appeared in the mlx5 previous code,
> > > > Probably because the RMV interrupt was not implemented in mlx5
> > > > before
> > > this patch.
> > >
> > > Good point, no RMV could occur before it is implemented, however a
> > > dedicated commit for the fix itself (i.e. alarm callback not
> > > supposed to end up calling ibv_get_async_event()) might better
> > > explain the logic behind these changes. What I mean is, if there was
> > > no problem, you wouldn't need to make
> > > priv_link_status_update() a separate function, right?
> > >
> >
> > The separation was done mainly because of the new interrupt
> > implementation, else, there was bug here.
> > The unnecessary  alarm ibv_get_async_event calling was harmless in the
> > previous code.
> > I gets your point for the logic explanation behind these changes and I
> > can add it in this patch commit log to be clearer, something like:
> > The link update operation was separated from the interrupt callback to
> > avoid RMV interrupt disregard and unnecessary event acknowledgment
> > caused by the inconsistent link status alarm callback.
> 
> Yes, it's better to explain why you did this in the commit log, but see below.
> 
> > > > The big comment just explains the link inconsistent issue and was
> > > > added here since Nelio and I think the new function,
> > > > priv_link_status_update(), justifies this comment for future review.
> > >
> > > I understand, this could also have been part of the commit log of
> > > the dedicated commit.
> > >
> > Are you sure we need to describe the code comment reason in the commit
> log?
> 
> It's a change you did to address a possible bug otherwise so we have to,
> however remember that a commit should, as much as possible, do exactly
> one thing. If you need to explain that you did this in order to do that, 
> "this"
> and "that" can often be identified as two separate commits. Doing so makes
> it much easier for reviewers to understand the reasoning behind changes
> and leads to quicker reviews (makes instant-acks even possible).
> 
> It'd still like a separate commit if you don't mind.

Sorry, but I think this could go on indefinitely.
I have just added the RMV interrupt; I did a lot of things in this patch for it.
I don't think I need to separate out each thing done for this support.
I prefer to keep it in one patch, if you don't mind.
 
> 
> --
> Adrien Mazarguil
> 6WIND


Re: [dpdk-dev] [PATCH v2 2/3] net/mlx5: fix TSO MLNX OFED 3.3 verification

2017-09-05 Thread Nélio Laranjeiro
On Tue, Sep 05, 2017 at 01:04:37PM +, Shachar Beiser wrote:
> Fixes: 3cf87e68d97b ("net/mlx5: remove old MLNX OFED 3.3 verification")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Shachar Beiser 
> ---
>  drivers/net/mlx5/mlx5_prm.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
> index 608072f..8b82b5e 100644
> --- a/drivers/net/mlx5/mlx5_prm.h
> +++ b/drivers/net/mlx5/mlx5_prm.h
> @@ -89,9 +89,6 @@
>  /* Default max packet length to be inlined. */
>  #define MLX5_EMPW_MAX_INLINE_LEN (4U * MLX5_WQE_SIZE)
>  
> -#ifndef HAVE_VERBS_MLX5_OPCODE_TSO
> -#define MLX5_OPCODE_TSO MLX5_OPCODE_LSO_MPW /* Compat with OFED 3.3. */
> -#endif
>  
>  #define MLX5_OPC_MOD_ENHANCED_MPSW 0
>  #define MLX5_OPCODE_ENHANCED_MPSW 0x29
> -- 
> 1.8.3.1
> 


Acked-by: Nelio Laranjeiro 

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v2 1/3] net/mlx5: replace network to host macros

2017-09-05 Thread Nélio Laranjeiro
On Tue, Sep 05, 2017 at 01:04:36PM +, Shachar Beiser wrote:
> Signed-off-by: Shachar Beiser 
Acked-by: Nelio Laranjeiro 

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v3] net/mlx5: support upstream rdma-core

2017-09-05 Thread Nélio Laranjeiro
On Tue, Sep 05, 2017 at 01:19:08PM +, Shachar Beiser wrote:
> This removes the dependency on specific Mellanox OFED libraries by
> using the upstream rdma-core and linux upstream community code.
> 
> ---
> a. Compile with rdma-core commit f11292efd541 ("Merge pull request #202")
> b. Tested with linux kernel 4.13-rc4
> c. For performance testing recommended to wait till kernel 4.14
> 
> Signed-off-by: Shachar Beiser 
Acked-by: Nelio Laranjeiro 

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH 4/4] ethdev: add helpers to move to the new offloads API

2017-09-05 Thread Thomas Monjalon
05/09/2017 12:51, Shahaf Shuler:
> So looks like we all agree PMDs should report as part of the 
> rte_eth_dev_info_get which offloads are per port and which are per queue.
> 
> Regarding the offloads configuration by application I see 2 options:
> 1. have an API to set offloads per port as part of device configure and API 
> to set offloads per queue as part of queue setup
> 2. set all offloads as part of queue configuration (per port offloads will be 
> set equally for all queues). In case of a mixed configuration for port 
> offloads PMD will return error.
> Such error can be reported on device start. The PMD will traverse the 
> queues and check for conflicts.
> 
> I will focus on the cons, since both achieve the goal:
> 
> Cons of #1:
> - Two places to configure offloads.
> - Like Thomas mentioned - what about offloads per device? This direction 
> leads to more places to configure the offloads.
> 
> Cons of #2:
> - Late error reporting - on device start and not on queue setup.

Why not report the error on queue setup?

> I would go with #2.

I vote also for #2
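
A minimal sketch of the per-queue conflict check option #2 implies, so the
error surfaces at queue setup rather than at device start (names and the
per-port mask below are assumptions for illustration, not the final ethdev
API):

#include <errno.h>
#include <stdint.h>

#define EXAMPLE_PER_PORT_OFFLOADS 0x30ULL /* hypothetical per-port bits */

static int
example_check_per_port_offloads(const uint64_t *queue_offloads,
				uint16_t nb_configured_queues,
				uint64_t new_queue_offloads)
{
	uint64_t per_port = new_queue_offloads & EXAMPLE_PER_PORT_OFFLOADS;
	uint16_t i;

	/* Per-port offloads must match on every queue configured so far. */
	for (i = 0; i < nb_configured_queues; i++)
		if ((queue_offloads[i] & EXAMPLE_PER_PORT_OFFLOADS) != per_port)
			return -EINVAL;
	return 0;
}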


[dpdk-dev] [PATCH 1/2] service: fix service lcore stop function

2017-09-05 Thread Guduri Prathyusha
lcore_states stores the state of each lcore. Fix the invalid
indexing of lcore_states with the service number instead of the lcore id.

Fixes: 21698354c832 ("service: introduce service cores concept")

Signed-off-by: Guduri Prathyusha 
---
 lib/librte_eal/common/rte_service.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/rte_service.c 
b/lib/librte_eal/common/rte_service.c
index 7efb76dc8..2ac77cc2a 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -609,7 +609,7 @@ rte_service_lcore_stop(uint32_t lcore)
uint32_t i;
for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
int32_t enabled =
-   lcore_states[i].service_mask & (UINT64_C(1) << i);
+   lcore_states[lcore].service_mask & (UINT64_C(1) << i);
int32_t service_running = rte_services[i].runstate !=
RUNSTATE_STOPPED;
int32_t only_core = rte_services[i].num_mapped_cores == 1;
-- 
2.14.1



[dpdk-dev] [PATCH 2/2] service: fix service lcore start stop unit test

2017-09-05 Thread Guduri Prathyusha
Unit test case service_lcore_start_stop fails since the service core was
stopped without stopping the service.

This commit fixes the test by adding negative and positive cases of
stopping the service lcore before and after stopping the service
respectively.

Fixes: f038a81e1c56 ("service: add unit tests")

Signed-off-by: Guduri Prathyusha 
---
 test/test/test_service_cores.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/test/test/test_service_cores.c b/test/test/test_service_cores.c
index 88fac8f44..b043397ef 100644
--- a/test/test/test_service_cores.c
+++ b/test/test/test_service_cores.c
@@ -553,6 +553,10 @@ service_lcore_start_stop(void)
"Service core expected to poll service but it didn't");

/* core stop */
+   TEST_ASSERT_EQUAL(-EBUSY, rte_service_lcore_stop(slcore_id),
+   "Service core running a service should return -EBUSY");
+   TEST_ASSERT_EQUAL(0, rte_service_stop(s),
+   "Stopping valid service failed");
TEST_ASSERT_EQUAL(-EINVAL, rte_service_lcore_stop(10),
"Invalid Service core stop should return -EINVAL");
TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
--
2.14.1


