Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in merging un-transmitted packets

2018-08-20 Thread Jia Yu
Hi Chas,

Could you help to review the v2 of the patch? I have addressed the “Fixes” 
requirement in commit message, and modified slave_tx_fail_count’s definition.

Thanks,
Jia

From: Chas Williams <3ch...@gmail.com>
Date: Sunday, August 19, 2018 at 5:07 PM
To: Jia Yu 
Cc: "dev@dpdk.org" , Declan Doherty , 
Chas Williams 
Subject: Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in merging 
un-transmitted packets



On Sun, Aug 19, 2018 at 6:19 PM Jia Yu 
mailto:j...@vmware.com>> wrote:
Hi Chas,

Thanks for reviewing the change.

Our application crashed after upgrading to DPDK 18.02, when packet rate is high 
and bond is configured. It happened because txq contains invalid mbuf addresses 
after rte_eth_tx_burst call (for example, 0x1 repeated 13 times in one 
core dump). It seems that the invalid addresses came from buf shift code below, 
so I changed this part of code to earlier version. We don’t see crash after the 
fix. Please let us know if the fix is reasonable.

With respect to correctness, the code you are fixing does seem broken.
See inline below.


m_table = {0x7fdf23a68ec0, 0x7fdf23a68400, 0x7fdf23a66e80, 0x7fdf23a65900,
0x7fdf23960e00, 0x1 , 0x7fdf23978640, 
0x7fdf23977b80, 0x7fdf239745c0, 0x7fdf23a4d600, 0x7fdf23a4cb40,
0x7fdf23a75b00, 0x7fdf23a74580, 0x7fdf23972580, 0x7fdf239a76c0, 
0x7fdf239a5680, 0x7fdf23a7a640, 0x7fdf23a79b80, 0x7fdf23a77b40,
0x7fdf2395b800}

/* Send packet burst on each slave device */
for (i = 0; i < slave_count; i++) {
if (slave_nb_bufs[i] == 0)
continue;

slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
bd_tx_q->queue_id, slave_bufs[i],
slave_nb_bufs[i]);

A typical failure here would be to transmit no packets.  slave_tx_count = 0.




total_tx_count += slave_tx_count;

/* If tx burst fails move packets to end of bufs */
if (unlikely(slave_tx_count < slave_nb_bufs[i])) {
slave_tx_fail_count[i] = slave_nb_bufs[i] -
slave_tx_count;
total_tx_fail_count += slave_tx_fail_count[i];

/*
 * Shift bufs to beginning of array to allow reordering
 * later
 */
for (j = 0; j < slave_tx_fail_count[i]; j++) {
slave_bufs[i][j] =
slave_bufs[i][(slave_tx_count - 1) + j]; < what if 
slave_tx_count == 0, j == 0

As you correctly point out, slave_tx_count - 1 would be -1.  Bad news.
So, yes, I agree with your fix.  In your fix, you might notice that
slave_tx_fail_count doesn't need to be an array.  It's local to
if (unlikely(slave_tx_count < slave_nb_bufs[i])) now.


}
}
}

Thanks,
Jia

From: Chas Williams <3ch...@gmail.com>
Date: Saturday, August 18, 2018 at 4:50 PM
To: Jia Yu mailto:j...@vmware.com>>
Cc: "dev@dpdk.org" mailto:dev@dpdk.org>>, 
Declan Doherty mailto:declan.dohe...@intel.com>>, 
Chas Williams mailto:ch...@att.com>>
Subject: Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in merging 
un-transmitted packets



On Fri, Aug 17, 2018 at 9:46 PM Jia Yu 
mailto:j...@vmware.com>> wrote:
When bond slave devices cannot transmit all packets in bufs array,
tx_burst callback shall merge the un-transmitted packets back to
bufs array. Recent merge logic introduced a bug which causes
invalid mbuf addresses being written to bufs array.

Can you expand on this a bit?  What was the commit?


When caller frees the un-transmitted packets, due to invalid addresses,
application will crash.

The fix is avoid shifting mbufs, and directly write un-transmitted
packets back to bufs array.

Signed-off-by: Jia Yu mailto:j...@vmware.com>>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 98 +-
 1 file changed, 14 insertions(+), 84 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 58f7377..cce973a 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -303,7 +303,7 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue, struct 
rte_mbuf **bufs,
uint16_t slave_tx_count, slave_tx_fail_count[RTE_MAX_ETHPORTS] = { 0 };
uint16_t total_tx_count = 0, total_tx_fail_count = 0;

-   uint16_t i, j;
+   uint16_t i;

if (unlikely(nb_bufs == 0))
return 0;
@@ -361,31 +361,9 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue, struct 
rte_mbuf **bufs,
slave_tx_fail_count[i] = slave_nb_bufs[i] -
slave_tx_count;
total_tx_fail_count += slave_tx_fail_count[i];
-
-   /*
-* Shift bufs to beginning of array to allow reordering
-* later
-*/
-   for (j = 0; j < slave_tx_fail_count[i]; j++) {
-

Re: [dpdk-dev] dpaa2: building with EXTRA_CFLAGS="-g -O0" and shared libs link error

2018-08-20 Thread Hemant Agrawal
Hi Keith,
I am able to reproduce the issue. We will send the fix asap.

Regards,
Hemant

-Original Message-
From: dev  On Behalf Of Wiles, Keith
Sent: Sunday, August 19, 2018 8:25 PM
To: DPDK 
Subject: [dpdk-dev] dpaa2: building with EXTRA_CFLAGS="-g -O0" and shared libs 
link error


I am building on Ubuntu 18.04.1 LTS, gcc 7.3.0-16ubuntu3 and with 
EXTRA_CFLAGS=“-g -O0” with shared libs enabled.

I get an undefined reference to rte_dpaa2_memsegs when building 
portal/dpaa2_hw_dpio.c in the link phase.

The odd part is this does not happen unless you are building with EXTRA_CFLAGS 
and with shared libs enabled.

It seems to have circular dependency on drivers/mempool/dpaa2 and 
drivers/bus/fslmc/portal

This breaks being able to build a debug version of DPDK.


Regards,
Keith



Re: [dpdk-dev] [PATCH 2/2] virtio: fix PCI config err handling

2018-08-20 Thread Tiwei Bie
On Thu, Aug 16, 2018 at 07:49:43PM +0100, Luca Boccassi wrote:
> On Thu, 2018-08-16 at 19:47 +0100, Luca Boccassi wrote:
> > From: Brian Russell 
> > 
> > In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config
> > returns
> > the number of bytes read from PCI config or < 0 on error.
> > If less than the expected number of bytes are read then log the
> > failure and return rather than carrying on with garbage.
> > 
> > Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0")
> > 
> > Signed-off-by: Brian Russell 
> > Signed-off-by: Luca Boccassi 
> > ---
> > v2: handle additional rte_pci_read_config incomplete reads
> > 
> >  drivers/net/virtio/virtio_pci.c | 35 +
> > 
> >  1 file changed, 22 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/net/virtio/virtio_pci.c
> > b/drivers/net/virtio/virtio_pci.c
> > index 6bd22e54a6..ff6f96f361 100644
> > --- a/drivers/net/virtio/virtio_pci.c
> > +++ b/drivers/net/virtio/virtio_pci.c
> ...
> > @@ -610,9 +613,13 @@ virtio_read_caps(struct rte_pci_device *dev,
> > struct virtio_hw *hw)
> >     hw->common_cfg = get_cfg_addr(dev, &cap);
> >     break;
> >     case VIRTIO_PCI_CAP_NOTIFY_CFG:
> > -   rte_pci_read_config(dev, &hw-
> > >notify_off_multiplier,
> > -   4, pos + sizeof(cap));
> > -   hw->notify_base = get_cfg_addr(dev, &cap);
> > +   /* Avoid half-reads into hw */
> > +   ret = rte_pci_read_config(dev, &multiplier,
> > 4,
> > +   pos + sizeof(cap));
> > +   if (ret == 4) {
> > +   hw->notify_off_multiplier =
> > multiplier;
> > +   hw->notify_base = get_cfg_addr(dev,
> > &cap);
> > +   }
> >     break;
> >     case VIRTIO_PCI_CAP_DEVICE_CFG:
> >     hw->dev_cfg = get_cfg_addr(dev, &cap);
> 
> Tiwei: not 100% sure what's the best way to handle a failure here, this
> will avoid writing to *hw in case of errors. Let me know if this is OK.

My concern is about reading the virtio capability directly.
With this patch, it will give up reading other capabilities
when failed to read a full virtio PCI capability structure
(the returned bytes are less than the expected bytes) even
though when the capability it's reading isn't a virtio vendor
specific capability. I'm not quite sure whether it will bring
any bad consequence. How about just reading the first two
bytes first? Something like this:

https://github.com/freebsd/freebsd/blob/72445a41b3ff/sys/dev/pci/pci.c#L1491-L1497

Thanks


Re: [dpdk-dev] [dpdk-stable] 16.11.8 (LTS) patches review and test

2018-08-20 Thread Luca Boccassi
Great, thanks for checking!

On Mon, 2018-08-20 at 08:46 +0200, Marco Varlese wrote:
> Hi Luca & all,
> 
> I have gone through the usual smoke tests using test_pmd and OvS-
> DPDK.
> I have not experienced any issues with the -rc1 release.
> 
> 
> Thanks,
> Marco
> 
> On Mon, 2018-08-13 at 19:21 +0100, luca.bocca...@gmail.com wrote:
> > Hi all,
> > 
> > Here is a list of patches targeted for LTS release 16.11.8. Please
> > help review and test. The planned date for the final release is
> > August
> > the 23rd.
> > Before that, please shout if anyone has objections with these
> > patches being applied.
> > 
> > Also for the companies committed to running regression tests,
> > please run the tests and report any issue before the release date.
> > 
> > A release candidate tarball can be found at:
> > 
> > https://dpdk.org/browse/dpdk-stable/tag/?id=v16.11.8-rc1
> > 
> > These patches are located at branch 16.11 of dpdk-stable repo:
> > https://dpdk.org/browse/dpdk-stable/
> > 
> > Thanks.
> > 
> > Luca Boccassi
> > 
> > ---
> > Adrien Mazarguil (1):
> >   maintainers: update for Mellanox PMDs
> > 
> > Ajit Khaparde (6):
> >   net/bnxt: fix HW Tx checksum offload check
> >   net/bnxt: fix incorrect IO address handling in Tx
> >   net/bnxt: fix Rx ring count limitation
> >   net/bnxt: check access denied for HWRM commands
> >   net/bnxt: fix RETA size
> >   net/bnxt: fix close operation
> > 
> > Alejandro Lucero (1):
> >   net/nfp: fix field initialization in Tx descriptor
> > 
> > Anatoly Burakov (2):
> >   eal/linux: fix invalid syntax in interrupts
> >   test: fix EAL flags autotest on FreeBSD
> > 
> > Beilei Xing (1):
> >   net/i40e: fix shifts of 32-bit value
> > 
> > Bruce Richardson (2):
> >   examples/exception_path: fix out-of-bounds read
> >   mk: fix permissions when using make install
> > 
> > Chas Williams (1):
> >   net/bonding: do not clear active slave count
> > 
> > Damjan Marion (1):
> >   net/i40e: do not reset device info data
> > 
> > Dan Gora (1):
> >   kni: fix crash with null name
> > 
> > Daria Kolistratova (1):
> >   net/ena: fix SIGFPE with 0 Rx queue
> > 
> > Dariusz Stojaczyk (1):
> >   eal: fix return codes on thread naming failure
> > 
> > Drocula Lambda (1):
> >   kni: fix build on RHEL 7.5
> > 
> > Emma Kenny (1):
> >   examples/multi_process: build l2fwd_fork app
> > 
> > Ferruh Yigit (2):
> >   kni: fix build with gcc 8.1
> >   net/thunderx: fix build with gcc optimization on
> > 
> > Fiona Trahe (1):
> >   crypto/qat: fix checks for 3GPP algo bit params
> > 
> > Gage Eads (1):
> >   net: rename u16 to fix shadowed declaration
> > 
> > Gavin Hu (1):
> >   maintainers: claim maintainership for ARM v7 and v8
> > 
> > Haiyue Wang (2):
> >   mbuf: fix typo in IPv6 macro comment
> >   net/i40e: workaround performance degradation
> > 
> > Hemant Agrawal (1):
> >   test/crypto: fix device id when stopping port
> > 
> > Hyong Youb Kim (1):
> >   net/enic: do not overwrite admin Tx queue limit
> > 
> > Ido Goshen (1):
> >   net/pcap: fix multiple queues
> > 
> > Jerin Jacob (2):
> >   ethdev: fix queue statistics mapping documentation
> >   eal: fix bitmap documentation
> > 
> > Kiran Kumar (3):
> >   net/bonding: fix MAC address reset
> >   ethdev: check queue stats mapping input arguments
> >   net/thunderx: avoid sq door bell write on zero packet
> > 
> > Konstantin Ananyev (3):
> >   examples/ipsec-secgw: fix IPv4 checksum at Tx
> >   examples/ipsec-secgw: fix bypass rule processing
> >   app/testpmd: fix DCB config
> > 
> > Maxime Coquelin (1):
> >   vhost: fix missing increment of log cache count
> > 
> > Pablo de Lara (3):
> >   test/hash: fix multiwriter with non consecutive cores
> >   test/hash: fix potential memory leak
> >   hash: fix doxygen of return values
> > 
> > Radu Nicolau (2):
> >   test: fix uninitialized port configuration
> >   net/bonding: fix race condition
> > 
> > Rafal Kozik (4):
> >   net/ena: check pointer before memset
> >   net/ena: change memory type
> >   net/ena: fix GENMASK_ULL macro
> >   net/ena: set link speed as none
> > 
> > Rahul Lakkireddy (2):
> >   net/cxgbe/base: update flash part information
> >   net/cxgbe: fix init failure due to new flash parts
> > 
> > Rami Rosen (2):
> >   examples/l3fwd: remove useless include
> >   ethdev: fix a doxygen comment for port allocation
> > 
> > Rasesh Mody (3):
> >   net/qede: fix default extended VLAN offload config
> >   net/qede/base: fix GRC attention callback
> >   net/bnx2x: fix FW command timeout during stop
> > 
> > Shahed Shaikh (1):
> >   net/qede: fix MAC address removal failure message
> > 
> > Shreyansh Jain (1):
> >   doc: fix bonding command in testpmd
> > 
> > Wei Zhao (6):
> >   net/ixgbe: fix tunnel id format error for FDIR
> >   net/ixgbe: fix t

[dpdk-dev] Minimum kernel version for DPDK

2018-08-20 Thread Kevin Wilson
Hi all,
Maybe this is a silly question, please bear with me.

According to the “Getting Started Guide for Linux” in
http://doc.dpdk.org/guides/linux_gsg/sys_reqs.html,
the kernel version should be >= 3.2.
Accrding to my understanding, there are only two kernel modules in DPDK,
namely rte_kni.ko and igb_iuo.ko.
Instead of igb_uio.ko I intend to use uio_pci_generic.ko for binding the device.

I want to work with DPDK with kernel 3.1, without using rte_kni and no igb_uio
as mentioned.

Won't the DPDK code compile for such a kernel ? won't it work ?

Regards,
Kevin


Re: [dpdk-dev] 16.11.8 (LTS) patches review and test

2018-08-20 Thread Luca Boccassi
On Mon, 2018-08-13 at 19:21 +0100, luca.bocca...@gmail.com wrote:
> Hi all,
> 
> Here is a list of patches targeted for LTS release 16.11.8. Please
> help review and test. The planned date for the final release is
> August
> the 23rd.
> Before that, please shout if anyone has objections with these
> patches being applied.
> 
> Also for the companies committed to running regression tests,
> please run the tests and report any issue before the release date.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v16.11.8-rc1
> 
> These patches are located at branch 16.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Luca Boccassi

Regression tests from AT&T have come out clear.

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH v15 1/7] ethdev: add function to release port in secondary process

2018-08-20 Thread Andrew Rybchenko

On 16.08.2018 06:04, Qi Zhang wrote:

Add driver API rte_eth_release_port_secondary to support the
case when an ethdev need to be detached on a secondary process.
Local state is set to unused and shared data will not be reset
so the primary process can still use it.


There are few questions below, but in general I'm really puzzled
after looking at variety of release, destroy etc functions and how
call chains look like.

IMHO, introduction of the function is really wrong direction.
It duplicates part of rte_eth_dev_release_port() functionality,
it will complicate maintenance since it will be required to remember
to find and update both. Also it looks like it already has bugs
(missing init of shared data, missing lock).

I would prefer to update rte_eth_dev_release_port() to make it
secondary process aware.


Signed-off-by: Qi Zhang 
---
  lib/librte_ethdev/rte_ethdev.c   | 17 +++--
  lib/librte_ethdev/rte_ethdev_driver.h| 16 +++-
  lib/librte_ethdev/rte_ethdev_pci.h   | 10 --
  lib/librte_ethdev/rte_ethdev_version.map |  7 +++
  4 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 4c3202505..1a1cc1125 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -359,6 +359,18 @@ rte_eth_dev_attach_secondary(const char *name)
  }
  
  int

+rte_eth_dev_release_port_secondary(struct rte_eth_dev *eth_dev)
+{
+   if (eth_dev == NULL)
+   return -EINVAL;
+
+   _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_DESTROY, NULL);
+   eth_dev->state = RTE_ETH_DEV_UNUSED;


rte_eth_dev_release_port() does it under ownership lock.
Why is lock not required here?


+
+   return 0;
+}
+
+int
  rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
  {
if (eth_dev == NULL)
@@ -3532,9 +3544,10 @@ rte_eth_dev_destroy(struct rte_eth_dev *ethdev,
return ret;
}
  
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)

-   rte_free(ethdev->data->dev_private);
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return rte_eth_dev_release_port_secondary(ethdev);
  
+	rte_free(ethdev->data->dev_private);

ethdev->data->dev_private = NULL;
  
  	return rte_eth_dev_release_port(ethdev);

diff --git a/lib/librte_ethdev/rte_ethdev_driver.h 
b/lib/librte_ethdev/rte_ethdev_driver.h
index c6d9bc1a3..8fe82d2ab 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -61,7 +61,7 @@ struct rte_eth_dev *rte_eth_dev_attach_secondary(const char 
*name);
   * Release the specified ethdev port.
   *
   * @param eth_dev
- * The *eth_dev* pointer is the address of the *rte_eth_dev* structure.
+ * Device to be detached.
   * @return
   *   - 0 on success, negative on error
   */
@@ -69,6 +69,20 @@ int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev);
  
  /**

   * @internal
+ * Release the specified ethdev port in the local process.
+ * Only set ethdev state to unused, but not reset shared data since
+ * it assume other processes is still using it. typically it is
+ * called by a secondary process.
+ *
+ * @param eth_dev
+ * Device to be detached.
+ * @return
+ *   - 0 on success, negative on error
+ */
+int rte_eth_dev_release_port_secondary(struct rte_eth_dev *eth_dev);
+
+/**
+ * @internal
   * Release device queues and clear its configuration to force the user
   * application to reconfigure it. It is for internal use only.
   *
diff --git a/lib/librte_ethdev/rte_ethdev_pci.h 
b/lib/librte_ethdev/rte_ethdev_pci.h
index f652596f4..70d2d2503 100644
--- a/lib/librte_ethdev/rte_ethdev_pci.h
+++ b/lib/librte_ethdev/rte_ethdev_pci.h
@@ -135,9 +135,15 @@ rte_eth_dev_pci_allocate(struct rte_pci_device *dev, 
size_t private_data_size)
  static inline void
  rte_eth_dev_pci_release(struct rte_eth_dev *eth_dev)
  {
-   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-   rte_free(eth_dev->data->dev_private);
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   eth_dev->device = NULL;
+   eth_dev->intr_handle = NULL;


Why are above two assignments done here in the PCI device release,
but not included in rte_eth_dev_release_port_secondary()?


+   rte_eth_dev_release_port_secondary(eth_dev);
+   return;
+   }
  
+	/* primary process */

+   rte_free(eth_dev->data->dev_private);
eth_dev->data->dev_private = NULL;
  
  	/*

diff --git a/lib/librte_ethdev/rte_ethdev_version.map 
b/lib/librte_ethdev/rte_ethdev_version.map
index 38f117f01..acc407f86 100644
--- a/lib/librte_ethdev/rte_ethdev_version.map
+++ b/lib/librte_ethdev/rte_ethdev_version.map
@@ -220,6 +220,13 @@ DPDK_18.08 {
  
  } DPDK_18.05;
  
+DPDK_18.11 {

+   global:
+
+   rte_eth_dev_release_port_secondary;
+
+} DPDK_18.08;


Shouldn't it be experimental?


+
  EXPERIMENTAL {
global:
  




Re: [dpdk-dev] [PATCH v1 0/5] Enable hotplug in vfio

2018-08-20 Thread Gaëtan Rivet
Hi Jeff,

On Fri, Aug 17, 2018 at 06:51:26PM +0800, Jeff Guo wrote:
> As we may know that the process of hotplug is different between igb_uio
> and vfio. For igb_uio, it could use uevent notification and memory
> failure handle mechanism for hotplug. But for vfio, when device is be
> hotplug-out, the uevent can not be detected immediately, because of the
> vfio kernel module will use a special mechanism to guaranty the pci
> device would not be deleted until the user space release the resources,
> so it will use another event “req notifier” at first to notify user space
> to release resources for hotplug.
> 
> This patch will add a new interrupt type of req notifier in eal interrupt,
> and add the new interrupt handler in pci device to handle the req device
> event. When the req notifier be detected, it can trigger the device event
> callback process to process for hotplug. With this mechanism, hotplug
> could be enable in vfio.
> 

The REQ event seems VFIO specific.
Why not leave it within the PCI bus, around the vfio glue, and leave the
EAL unmodified?

> Jeff Guo (5):
>   eal: add a new req notifier to eal interrupt
>   eal: add a new req event to device event
>   eal: modify device event callback process func
>   pci: add req handler field to generic pci device
>   vfio: enable vfio hotplug by req notifier handler
> 
>  drivers/bus/pci/linux/pci_vfio.c   | 104 
> +
>  drivers/bus/pci/pci_common.c   |  10 ++
>  drivers/bus/pci/rte_bus_pci.h  |   1 +
>  lib/librte_eal/common/eal_common_dev.c |   5 +-
>  lib/librte_eal/common/eal_private.h|  12 ---
>  lib/librte_eal/common/include/rte_dev.h|  20 +++-
>  lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
>  lib/librte_eal/linuxapp/eal/eal_dev.c  |   2 +-
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  71 ++
>  lib/librte_ethdev/rte_ethdev.c |   3 +-
>  10 files changed, 212 insertions(+), 17 deletions(-)
> 
> -- 
> 2.7.4
> 

-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] 17.11.4 patches review and test

2018-08-20 Thread Marco Varlese
Hi,

The code in 17.11.4-rc1 does not compile for me. 
Please, see below what I get:

[  100s] gcc -m64 -DVERSION="17.11.4" -L/home/abuild/rpmbuild/BUILD/dpdk-stable-
17.11.4-rc1/x86_64-native-linuxapp-gcc-default/lib -Wl,--version-
script=/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
rc1/drivers/net/bnx2x/rte_pmd_bnx2x_version.map  -shared bnx2x.o bnx2x_rxtx.o
bnx2x_stats.o bnx2x_ethdev.o ecore_sp.o elink.o bnx2x_vfpf.o -z defs -lz
-lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring -lrte_ethdev -lrte_net
-lrte_kvargs -lrte_bus_pci -Wl,-soname,librte_pmd_bnx2x.so.17.11.4 -o
librte_pmd_bnx2x.so.17.11.4 
[  100s] bnx2x_ethdev.o: In function `bnx2x_periodic_start':
[  100s] bnx2x_ethdev.c:(.text+0x5ce): undefined reference to
`rte_eth_linkstatus_set'
[  100s] bnx2x_ethdev.o: In function `bnx2x_dev_link_update':
[  100s] bnx2x_ethdev.c:(.text+0x94f): undefined reference to
`rte_eth_linkstatus_set'
[  100s] bnx2x_ethdev.o: In function `bnx2xvf_dev_link_update':
[  100s] bnx2x_ethdev.c:(.text+0x9f0): undefined reference to
`rte_eth_linkstatus_set'
[  100s] bnx2x_ethdev.o: In function `bnx2x_interrupt_handler':
[  100s] bnx2x_ethdev.c:(.text+0xb1b): undefined reference to
`rte_eth_linkstatus_set'
[  100s] collect2: error: ld returned 1 exit status
[  100s] make[5]: *** [/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
rc1/mk/rte.lib.mk:128: librte_pmd_bnx2x.so.17.11.4] Error 1
[  100s] make[4]: *** [/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
rc1/mk/rte.subdir.mk:65: bnx2x] Error 2
[  100s] make[4]: *** Waiting for unfinished jobs


Cheers,
Marco

On Thu, 2018-08-16 at 11:18 -0700, Yongseok Koh wrote:
> Hi all,
> 
> Here is a list of patches targeted for LTS release 17.11.4. Please help review
> and test. The planned date for the final release is August 23. Before that,
> please shout if anyone has objections with these patches being applied.
> 
> Also for the companies committed to running regression tests, please run the
> tests and report any issue before the release date.
> 
> A release candidate tarball can be found at:
> 
> https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
> 
> These patches are located at branch 17.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
> 
> Thanks,
> Yongseok
> 
> ---
> Adrien Mazarguil (2):
>   maintainers: update for Mellanox PMDs
>   net/mlx4: fix minor resource leak during init
> 
> Ajit Khaparde (7):
>   net/bnxt: fix HW Tx checksum offload check
>   net/bnxt: fix set MTU
>   net/bnxt: fix Rx ring count limitation
>   net/bnxt: fix memory leaks in NVM commands
>   net/bnxt: fix lock release on NVM write failure
>   net/bnxt: check access denied for HWRM commands
>   net/bnxt: fix RETA size
> 
> Alejandro Lucero (1):
>   net/nfp: fix field initialization in Tx descriptor
> 
> Alok Makhariya (1):
>   bus/dpaa: fix phandle support for Linux 4.16
> 
> Anatoly Burakov (8):
>   eal/linux: fix invalid syntax in interrupts
>   eal/linux: fix uninitialized value
>   test: fix EAL flags autotest on FreeBSD
>   test: fix result printing
>   test: fix code on report
>   test: make autotest runner python 2/3 compliant
>   test: print autotest categories
>   test: improve filtering
> 
> Andrew Rybchenko (2):
>   net/sfc: cut non VLAN ID bits from TCI
>   net/sfc: fix assert in set multicast address list
> 
> Andy Green (1):
>   ring: fix sign conversion warning
> 
> Beilei Xing (3):
>   net/i40e: fix shifts of 32-bit value
>   net/i40e: fix packet type parsing with DDP
>   net/i40e: fix setting TPID with AQ command
> 
> Bruce Richardson (2):
>   examples/exception_path: fix out-of-bounds read
>   mk: fix permissions when using make install
> 
> Chas Williams (2):
>   net/bonding: always update bonding link status
>   net/bonding: do not clear active slave count
> 
> Dan Gora (1):
>   kni: fix crash with null name
> 
> Daria Kolistratova (1):
>   net/ena: fix SIGFPE with 0 Rx queue
> 
> Dariusz Stojaczyk (1):
>   eal: fix return codes on thread naming failure
> 
> David Marchand (1):
>   net/bnxt: add missing ids in xstats
> 
> Drocula Lambda (1):
>   kni: fix build on RHEL 7.5
> 
> Ferruh Yigit (2):
>   kni: fix build with gcc 8.1
>   net/thunderx: fix build with gcc optimization on
> 
> Gavin Hu (3):
>   mk: fix cross build
>   net/dpaa2: remove loop for unused pool entries
>   maintainers: claim maintainership for ARM v7 and v8
> 
> Haiyue Wang (1):
>   net/i40e: workaround performance degradation
> 
> Harry van Haaren (1):
>   event: fix ring init failure handling
> 
> Hemant Agrawal (2):
>   test/crypto: fix device id when stopping port
>   bus/dpaa: fix buffer offset setting in FMAN
> 
> Hyong Youb Kim (1):
>   net/enic: do not overwrite admin Tx queue limit
> 
> Ido Goshen (1):
>   net/pcap: fix multiple queues
> 
> Jananee Parthasarathy (1):
>   mk: update targets 

[dpdk-dev] [PATCH] bus/fslmc: fix the undefined ref of rte dpaa2 memsegs

2018-08-20 Thread Hemant Agrawal
This patch fix the undefined reference issue with rte_dpaa2_memsegs
when compiled in shared lib mode with EXTRA_CFLAGS="-g -O0"

Fixes: 365fb925d3b3 ("bus/fslmc: optimize physical to virtual address search")
Cc: sta...@dpdk.org

Signed-off-by: Hemant Agrawal 
Reported-by: Keith Wiles 
---
 drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c| 7 +++
 drivers/bus/fslmc/rte_bus_fslmc_version.map | 1 +
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c| 7 ---
 drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map | 1 -
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c 
b/drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c
index 39c5adf..db49d63 100644
--- a/drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c
+++ b/drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c
@@ -28,6 +28,13 @@
 #include "portal/dpaa2_hw_pvt.h"
 #include "portal/dpaa2_hw_dpio.h"
 
+/* List of all the memseg information locally maintained in dpaa2 driver. This
+ * is to optimize the PA_to_VA searches until a better mechanism (algo) is
+ * available.
+ */
+struct dpaa2_memseg_list rte_dpaa2_memsegs
+   = TAILQ_HEAD_INITIALIZER(rte_dpaa2_memsegs);
+
 TAILQ_HEAD(dpbp_dev_list, dpaa2_dpbp_dev);
 static struct dpbp_dev_list dpbp_dev_list
= TAILQ_HEAD_INITIALIZER(dpbp_dev_list); /*!< DPBP device list */
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map 
b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index fe45a11..b4a8817 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -114,5 +114,6 @@ DPDK_18.05 {
dpdmai_open;
dpdmai_set_rx_queue;
rte_dpaa2_free_dpci_dev;
+   rte_dpaa2_memsegs;
 
 } DPDK_18.02;
diff --git a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c 
b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
index 7d0435f..84ff128 100644
--- a/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
+++ b/drivers/mempool/dpaa2/dpaa2_hw_mempool.c
@@ -33,13 +33,6 @@
 struct dpaa2_bp_info rte_dpaa2_bpid_info[MAX_BPID];
 static struct dpaa2_bp_list *h_bp_list;
 
-/* List of all the memseg information locally maintained in dpaa2 driver. This
- * is to optimize the PA_to_VA searches until a better mechanism (algo) is
- * available.
- */
-struct dpaa2_memseg_list rte_dpaa2_memsegs
-   = TAILQ_HEAD_INITIALIZER(rte_dpaa2_memsegs);
-
 /* Dynamic logging identified for mempool */
 int dpaa2_logtype_mempool;
 
diff --git a/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map 
b/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map
index b9d996a..b45e7a9 100644
--- a/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map
+++ b/drivers/mempool/dpaa2/rte_mempool_dpaa2_version.map
@@ -3,7 +3,6 @@ DPDK_17.05 {
 
rte_dpaa2_bpid_info;
rte_dpaa2_mbuf_alloc_bulk;
-   rte_dpaa2_memsegs;
 
local: *;
 };
-- 
2.7.4



Re: [dpdk-dev] [PATCH v1 2/5] eal: add a new req event to device event

2018-08-20 Thread Andrew Rybchenko

On 17.08.2018 13:51, Jeff Guo wrote:

Add a new req event in eal device event for vfio hotplug. When the req
request send from the vfio kernel module be detected, vfio userpace
driver could use this event to notify the app to handler it.

Signed-off-by: Jeff Guo 
---
  lib/librte_eal/common/include/rte_dev.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h
index ff580a0..0324c84 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -30,6 +30,7 @@ extern "C" {
  enum rte_dev_event_type {
RTE_DEV_EVENT_ADD,  /**< device being added */
RTE_DEV_EVENT_REMOVE,   /**< device being removed */
+   RTE_DEV_EVENT_REQ,  /**< device being removed */


Comment is the copy of previous one.


RTE_DEV_EVENT_MAX   /**< max value of this enum */
  };
  




Re: [dpdk-dev] [PATCH 0/5] AESNI MB PMD changes

2018-08-20 Thread Kovacevic, Marko
> The Multi-buffer library supports full digest sizes for the HMAC algorithms
> (except for MD5), from 0.50 version.
> Also, since 0.50, keys larger than the algorithm block size can be used for
> HMAC algorithms, performing a hash on the key.
> 
> Therefore, the AESNI MB PMD now supports any key size for the HMAC
> algorithms and any truncated digest size for any SHAx-HMAC and AES-CMAC
> algorithm.
> 
> Pablo de Lara (5):
>   crypto/aesni_mb: support all truncated HMAC digest sizes
>   crypto/aesni_mb: check for invalid digest size
>   crypto/aesni_mb: fix truncated digest size for CMAC
>   crypto/aesni_mb: support all truncated CMAC digest sizes
>   crypto/aesni_mb: support large HMAC key sizes
> 
>  drivers/crypto/aesni_mb/aesni_mb_ops.h|  61 ++
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c| 180 +++---
>  .../crypto/aesni_mb/rte_aesni_mb_pmd_ops.c|  60 +-
>  .../aesni_mb/rte_aesni_mb_pmd_private.h   |  24 +--
>  4 files changed, 285 insertions(+), 40 deletions(-)
> 

Series-Acked-by: Marko Kovacevic 


[dpdk-dev] [RFC] ethdev: flow counters batch query

2018-08-20 Thread viacheslavo
There is a demand to perform query operations on the set of counters
generally belonging to the different RTE flows. The counter queries
is the very effective way for application to know in detail what
is going on into the network and to take some actions basing on this
knowledge. As example, the modern vSwitch implementation supporting
full hardware offload may need a huge number of counters and query
these ones periodically for configuration aging check purposes.

For some devices the counter query can be very expensive operation
and can take a significant amount of time - often it involves the
calls of kernel mode components and performing requests to hardware.
If application wants to query multiple counters, belonging to the
different RTE flows, it has to execute queries in one-by-one fashion
due to the current RTE API implementation that allows to query
counters belonging to the same flow only. The proposed RTE API
extension introduces the compatible and more performance way
to allow applications to query the counter values as a batch.

At the creation the counter can be optionally assigned with the
new introduced 'group_id' identifier (batch identifier). All
counters with the same group_id form the new introduced entity
- 'counter batch'. The group_id specifies the unique batch
within device with given port_id. The same group_id may specify
the different batches on different devices.

The new method 'rte_flow_query_update()' is proposed to perform the
counter batch query from the device as a single call, this eliminates
a much of overhead involved with multiple quieries for the counters
belonging to the different flows. The rte_flow_query_update() fetches
the actual counter values from the device using underlying software
and/or hardware and stores obtained data into the counter objects
within PMD.

The application can get the stored data by invoking the existing
rte_flow_query() method with specifying the new introduced flag
'read_cached'. Such approach is compatible with the current
implementation, improves the performance and requires the minor
changes from applications.

Let's assume we have an array of flows and attached counters.
If application wants to read them it does something like that:

foreach(flow) { // compatible mode
rte_flow_query(flow, flow_counters[]);  // no read_cached set
}   // doing as previously

With counter batch query implemented application can do the following:

// new query mode
rte_flow_query_update(group_id of interest);// actual query here
foreach(flow) { // as single call
rte_flow_query(flow, flow_counters[]);  // read_cached flag set
}   // read stored data
// no underlying calls

For some devices implementation of rte_flow_query_update() may require
a lot of preparations before performing the actual query. If batch is
permanent and assumed not to be changed frequently the preparations
can be cached internally by implementation. By setting the
RTE_FLOW_QUERY_FLAG_PERMANENT flag application gives the hint to PMD
that batch is assumed to be long-term and allows to optimize the
succesive calls of rte_flow_query_update() for the same group_id
on given device.

If permanent batch is subject to change (occurs an adding or removing
the counter with specified batch id) the PMD should free all
resources, internally allocated for batch query optimization.

If RTE_FLOW_QUERY_FLAG_PERMANENT is not set, rte_flow_query_update()
should free the resources allocated (including ones done in previous
calls, if any) for batch query optimization with given group_id.

Signed-off-by: Viacheslav Ovsiienko 
---
 doc/guides/prog_guide/rte_flow.rst | 76 --
 lib/librte_ethdev/rte_flow.h   | 59 -
 2 files changed, 113 insertions(+), 22 deletions(-)

diff --git a/doc/guides/prog_guide/rte_flow.rst 
b/doc/guides/prog_guide/rte_flow.rst
index b305a72..84b3b67 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1500,6 +1500,10 @@ action must specify a unique id.
 Counters can be retrieved and reset through ``rte_flow_query()``, see
 ``struct rte_flow_query_count``.
 
+Counters can be assigned with group_id, all counters with matched group_id
+on the same port are grouped into batch and can be queried from device
+using the single call of rte_flow_query_update()
+
 The shared flag indicates whether the counter is unique to the flow rule the
 action is specified with, or whether it is a shared counter.
 
@@ -1515,13 +1519,15 @@ to all ports within that switch domain.
 
 .. table:: COUNT
 
-   ++-+
-   | Field  | Value   |
-   ++=+
-   | ``shared`` | s

[dpdk-dev] [PATCH] port: add sym crypto port

2018-08-20 Thread Zhang, Roy Fan
This patch adds the symmetric crypto support to port library.
The crypto port acts as a shim layer to DPDK cryptodev library and
supports in-place crypto workload processing.

Change-Id: I8e9c87bdaaa252b39873596cbfaa2de2324e765c
Signed-off-by: Zhang, Roy Fan 
---
 lib/librte_port/Makefile  |   2 +
 lib/librte_port/rte_port_sym_crypto.c | 552 ++
 lib/librte_port/rte_port_sym_crypto.h |  93 ++
 3 files changed, 647 insertions(+)
 create mode 100644 lib/librte_port/rte_port_sym_crypto.c
 create mode 100644 lib/librte_port/rte_port_sym_crypto.h

diff --git a/lib/librte_port/Makefile b/lib/librte_port/Makefile
index 8df4864ed..30113f790 100644
--- a/lib/librte_port/Makefile
+++ b/lib/librte_port/Makefile
@@ -38,6 +38,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_kni.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_source_sink.c
+SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_sym_crypto.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port.h
@@ -53,5 +54,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_kni.h
 endif
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_source_sink.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_sym_crypto.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_port/rte_port_sym_crypto.c 
b/lib/librte_port/rte_port_sym_crypto.c
new file mode 100644
index 0..1cd242e22
--- /dev/null
+++ b/lib/librte_port/rte_port_sym_crypto.c
@@ -0,0 +1,552 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include 
+
+#include 
+#include 
+
+#include "rte_port_sym_crypto.h"
+
+/*
+ * Port Crypto Reader
+ */
+#ifdef RTE_PORT_STATS_COLLECT
+
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_IN_ADD(port, val) \
+   port->stats.n_pkts_in += val
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_DROP_ADD(port, val) \
+   port->stats.n_pkts_drop += val
+
+#else
+
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_IN_ADD(port, val)
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_DROP_ADD(port, val)
+
+#endif
+
+struct rte_port_sym_crypto_reader {
+   struct rte_port_in_stats stats;
+
+   uint8_t cryptodev_id;
+   uint16_t queue_id;
+   struct rte_crypto_op *ops[RTE_PORT_IN_BURST_SIZE_MAX];
+   rte_port_sym_crypto_reader_callback_fn f_callback;
+   void *arg_callback;
+};
+
+static void *
+rte_port_sym_crypto_reader_create(void *params, int socket_id)
+{
+   struct rte_port_sym_crypto_reader_params *conf =
+   params;
+   struct rte_port_sym_crypto_reader *port;
+
+   /* Check input parameters */
+   if (conf == NULL) {
+   RTE_LOG(ERR, PORT, "%s: params is NULL\n", __func__);
+   return NULL;
+   }
+
+   /* Memory allocation */
+   port = rte_zmalloc_socket("PORT", sizeof(*port),
+   RTE_CACHE_LINE_SIZE, socket_id);
+   if (port == NULL) {
+   RTE_LOG(ERR, PORT, "%s: Failed to allocate port\n", __func__);
+   return NULL;
+   }
+
+   /* Initialization */
+   port->cryptodev_id = conf->cryptodev_id;
+   port->queue_id = conf->queue_id;
+   port->f_callback = conf->f_callback;
+   port->arg_callback = conf->arg_callback;
+
+   return port;
+}
+
+static int
+rte_port_sym_crypto_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t 
n_pkts)
+{
+   struct rte_port_sym_crypto_reader *p =
+   port;
+   uint16_t rx_ops_cnt, i, n = 0;
+
+   rx_ops_cnt = rte_cryptodev_dequeue_burst(p->cryptodev_id, p->queue_id,
+   p->ops, n_pkts);
+
+   for (i = 0; i < rx_ops_cnt; i++) {
+   struct rte_crypto_op *op = p->ops[i];
+
+   /** Drop failed pkts */
+   if (unlikely(op->status != RTE_CRYPTO_OP_STATUS_SUCCESS)) {
+   rte_pktmbuf_free(op->sym->m_src);
+   continue;
+   }
+
+   pkts[n++] = op->sym->m_src;
+   }
+
+   if (p->f_callback)
+   (*p->f_callback)(pkts, n, p->arg_callback);
+
+   RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_IN_ADD(p, n);
+   RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_DROP_ADD(p, rx_pkt_cnt - n);
+
+   return n;
+}
+
+static int
+rte_port_sym_crypto_reader_free(void *port)
+{
+   if (port == NULL) {
+   RTE_LOG(ERR, PORT, "%s: port is NULL\n", __func__);
+   return -EINVAL;
+   }
+
+   rte_free(port);
+
+   return 0;
+}
+
+static int rte_port_sym_crypto_reader_stats_read(void *port,
+   struct rte_port_in_stats *stats, int clear)
+{
+   struct rte_port_sym_crypto_reader *p =
+   port;
+
+   if (stats != NULL)
+   memcpy(stats, &p->stats, sizeof(p->stats));
+
+   if (clear)
+   memset(&p->stats, 0, sizeof(p->stats));
+
+   return 0;
+}
+
+/*
+ * Port cr

Re: [dpdk-dev] dpaa2: building with EXTRA_CFLAGS="-g -O0" and shared libs link error

2018-08-20 Thread Wiles, Keith


> On Aug 20, 2018, at 2:33 AM, Hemant Agrawal  wrote:
> 
> Hi Keith,
>   I am able to reproduce the issue. We will send the fix asap.

Thanks Hemant.
> 
> Regards,
> Hemant
> 
> -Original Message-
> From: dev  On Behalf Of Wiles, Keith
> Sent: Sunday, August 19, 2018 8:25 PM
> To: DPDK 
> Subject: [dpdk-dev] dpaa2: building with EXTRA_CFLAGS="-g -O0" and shared 
> libs link error
> 
> 
> I am building on Ubuntu 18.04.1 LTS, gcc 7.3.0-16ubuntu3 and with 
> EXTRA_CFLAGS=“-g -O0” with shared libs enabled.
> 
> I get an undefined reference to rte_dpaa2_memsegs when building 
> portal/dpaa2_hw_dpio.c in the link phase.
> 
> The odd part is this does not happen unless you are building with 
> EXTRA_CFLAGS and with shared libs enabled.
> 
> It seems to have circular dependency on drivers/mempool/dpaa2 and 
> drivers/bus/fslmc/portal
> 
> This breaks being able to build a debug version of DPDK.
> 
> 
> Regards,
> Keith
> 

Regards,
Keith



Re: [dpdk-dev] Minimum kernel version for DPDK

2018-08-20 Thread Wiles, Keith


> On Aug 20, 2018, at 3:46 AM, Kevin Wilson  wrote:
> 
> Hi all,
> Maybe this is a silly question, please bear with me.
> 
> According to the “Getting Started Guide for Linux” in
> http://doc.dpdk.org/guides/linux_gsg/sys_reqs.html,
> the kernel version should be >= 3.2.
> Accrding to my understanding, there are only two kernel modules in DPDK,
> namely rte_kni.ko and igb_iuo.ko.
> Instead of igb_uio.ko I intend to use uio_pci_generic.ko for binding the 
> device.
> 
> I want to work with DPDK with kernel 3.1, without using rte_kni and no igb_uio
> as mentioned.
> 
> Won't the DPDK code compile for such a kernel ? won't it work ?

I would assume so as long as the kernel has huge page support. If I remember 
correctly that was one of the big issues with older kernels.

> 
> Regards,
> Kevin

Regards,
Keith



Re: [dpdk-dev] [RFC] ethdev: flow counters batch query

2018-08-20 Thread Slava Ovsiienko
Adding relevant maintainers.

> -Original Message-
> From: Slava Ovsiienko
> Sent: Monday, August 20, 2018 14:39
> To: dev@dpdk.org
> Cc: Shahaf Shuler ; Slava Ovsiienko
> 
> Subject: [RFC] ethdev: flow counters batch query
> 
> There is a demand to perform query operations on the set of counters
> generally belonging to the different RTE flows. The counter queries is the
> very effective way for application to know in detail what is going on into the
> network and to take some actions basing on this knowledge. As example, the
> modern vSwitch implementation supporting full hardware offload may need
> a huge number of counters and query these ones periodically for
> configuration aging check purposes.
> 
> For some devices the counter query can be very expensive operation and
> can take a significant amount of time - often it involves the calls of kernel
> mode components and performing requests to hardware.
> If application wants to query multiple counters, belonging to the different
> RTE flows, it has to execute queries in one-by-one fashion due to the current
> RTE API implementation that allows to query counters belonging to the same
> flow only. The proposed RTE API extension introduces the compatible and
> more performance way to allow applications to query the counter values as a
> batch.
> 
> At the creation the counter can be optionally assigned with the new
> introduced 'group_id' identifier (batch identifier). All counters with the 
> same
> group_id form the new introduced entity
> - 'counter batch'. The group_id specifies the unique batch within device with
> given port_id. The same group_id may specify the different batches on
> different devices.
> 
> The new method 'rte_flow_query_update()' is proposed to perform the
> counter batch query from the device as a single call, this eliminates a much 
> of
> overhead involved with multiple quieries for the counters belonging to the
> different flows. The rte_flow_query_update() fetches the actual counter
> values from the device using underlying software and/or hardware and
> stores obtained data into the counter objects within PMD.
> 
> The application can get the stored data by invoking the existing
> rte_flow_query() method with specifying the new introduced flag
> 'read_cached'. Such approach is compatible with the current
> implementation, improves the performance and requires the minor changes
> from applications.
> 
> Let's assume we have an array of flows and attached counters.
> If application wants to read them it does something like that:
> 
> foreach(flow) { // compatible mode
> rte_flow_query(flow, flow_counters[]);  // no read_cached set
> }   // doing as previously
> 
> With counter batch query implemented application can do the following:
> 
> // new query mode
> rte_flow_query_update(group_id of interest);// actual query here
> foreach(flow) { // as single call
> rte_flow_query(flow, flow_counters[]);  // read_cached flag set
> }   // read stored data
> // no underlying calls
> 
> For some devices implementation of rte_flow_query_update() may require
> a lot of preparations before performing the actual query. If batch is
> permanent and assumed not to be changed frequently the preparations can
> be cached internally by implementation. By setting the
> RTE_FLOW_QUERY_FLAG_PERMANENT flag application gives the hint to
> PMD that batch is assumed to be long-term and allows to optimize the
> succesive calls of rte_flow_query_update() for the same group_id on given
> device.
> 
> If permanent batch is subject to change (occurs an adding or removing the
> counter with specified batch id) the PMD should free all resources, internally
> allocated for batch query optimization.
> 
> If RTE_FLOW_QUERY_FLAG_PERMANENT is not set,
> rte_flow_query_update() should free the resources allocated (including
> ones done in previous calls, if any) for batch query optimization with given
> group_id.
> 
> Signed-off-by: Viacheslav Ovsiienko 
> ---
>  doc/guides/prog_guide/rte_flow.rst | 76
> --
>  lib/librte_ethdev/rte_flow.h   | 59 -
>  2 files changed, 113 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index b305a72..84b3b67 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1500,6 +1500,10 @@ action must specify a unique id.
>  Counters can be retrieved and reset through ``rte_flow_query()``, see
> ``struct rte_flow_query_count``.
> 
> +Counters can be assigned with group_id, all counters with matched
> +group_id on the same port are grouped into batch and can be queried
> +from device us

[dpdk-dev] [RFC] mlx5: flow counters support on the linux-rdma v19 base

2018-08-20 Thread viacheslavo
Mellanox mlx5 PMD supports Flow Counters via Verbs library.
The current implementation is based on the Mellanox proprietary
Verbs library included in MLNX OFED packages. The Flow Counter
support is recently added into linux-rdma release (v19),
so the mlx5 PMD update is needed to provide Counter feature
on the base of linux-rdma.

mlx5 PMD can be compiled with MLNX OFED or linux-rdma v19
and provide flow counters for both.

Signed-off-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/Makefile| 10 +++
 drivers/net/mlx5/mlx5.c  |  6 
 drivers/net/mlx5/mlx5_flow.c | 67 
 drivers/net/mlx5/mlx5_glue.c | 41 +++
 drivers/net/mlx5/mlx5_glue.h | 16 +++
 5 files changed, 128 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 2e70dec..ca4c143 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -155,6 +155,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
type 'struct ibv_counter_set_init_attr' \
$(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
+   HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT \
+   infiniband/verbs.h \
+   type 'struct ibv_counters_init_attr' \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
+   HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT_V45 \
+   infiniband/verbs.h \
+   type 'struct ibv_counters_init_attr' \
+   $(AUTOCONF_OUTPUT)
+   $Q sh -- '$<' '$@' \
HAVE_RDMA_NL_NLDEV \
rdma/rdma_netlink.h \
enum RDMA_NL_NLDEV \
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6..ad289f0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -730,8 +730,10 @@
unsigned int mprq_min_stride_num_n = 0;
unsigned int mprq_max_stride_num_n = 0;
 #ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT_V45
struct ibv_counter_set_description cs_desc = { .counter_type = 0 };
 #endif
+#endif
struct ether_addr mac;
char name[RTE_ETH_NAME_MAX_LEN];
int own_domain_id = 0;
@@ -1001,11 +1003,15 @@
DRV_LOG(DEBUG, "checksum offloading is %ssupported",
(config.hw_csum ? "" : "not "));
 #ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
+#ifndef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT_V45
config.flow_counter_en = !!attr.max_counter_sets;
mlx5_glue->describe_counter_set(ctx, 0, &cs_desc);
DRV_LOG(DEBUG, "counter type = %d, num of cs = %ld, attributes = %d",
cs_desc.counter_type, cs_desc.num_of_cs,
cs_desc.attributes);
+#else
+   config.flow_counter_en = 1;
+#endif
 #endif
config.ind_table_max_size =
attr.rss_caps.max_rwq_indirection_table_size;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index ca4625b..4d762dd 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -290,7 +290,11 @@ struct mlx5_flow_counter {
uint32_t shared:1; /**< Share counter ID with other flow rules. */
uint32_t ref_cnt:31; /**< Reference counter. */
uint32_t id; /**< Counter ID. */
-   struct ibv_counter_set *cs; /**< Holds the counters for the rule. */
+#ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT_V45
+struct ibv_counters *cs; /**< Holds the counters for the rule. */
+#else
+struct ibv_counter_set *cs; /**< Holds the counters for the rule. */
+#endif
uint64_t hits; /**< Number of packets matched by the rule. */
uint64_t bytes; /**< Number of bytes matched by the rule. */
 };
@@ -519,27 +523,32 @@ struct mlx5_flow_tunnel_info {
 static struct mlx5_flow_counter *
 mlx5_flow_counter_new(struct rte_eth_dev *dev, uint32_t shared, uint32_t id)
 {
+#ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
struct priv *priv = dev->data->dev_private;
struct mlx5_flow_counter *cnt;
 
-   LIST_FOREACH(cnt, &priv->flow_counters, next) {
-   if (!cnt->shared || cnt->shared != shared)
-   continue;
-   if (cnt->id != id)
-   continue;
-   cnt->ref_cnt++;
-   return cnt;
+   if (shared) {
+   LIST_FOREACH(cnt, &priv->flow_counters, next)
+   if (cnt->shared && cnt->id == id) {
+   cnt->ref_cnt++;
+   return cnt;
+   }
}
-#ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT
 
struct mlx5_flow_counter tmpl = {
.shared = shared,
.id = id,
+#ifdef HAVE_IBV_DEVICE_COUNTERS_SET_SUPPORT_V45
+   .cs = mlx5_glue->create_counter_set
+   (priv->ctx,
+&(struct ibv_counters_init_attr){0}),
+#else
.cs = mlx5_glue->create_counter_set

Re: [dpdk-dev] [PATCH v2] net/bonding: fix buf corruption in merging un-transmitted packets

2018-08-20 Thread Chas Williams
On Mon, Aug 20, 2018 at 2:54 AM Jia Yu  wrote:

> When bond slave devices cannot transmit all packets in bufs array,
> tx_burst callback shall merge the un-transmitted packets back to
> bufs array. Recent merge logic introduced a bug which causes
> invalid mbuf addresses being written to bufs array.
> When caller frees the un-transmitted packets, due to invalid addresses,
> application will crash.
>
> The fix is avoid shifting mbufs, and directly write un-transmitted
> packets back to bufs array.
>
> Fixes: 09150784a776 ("net/bonding: burst mode hash calculation")
> Cc: sta...@dpdk.org
> Signed-off-by: Jia Yu 
>

Acked-by: Chas Williams 



> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 116
> +++--
>  1 file changed, 23 insertions(+), 93 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 58f7377..c745f31 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -300,10 +300,10 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue,
> struct rte_mbuf **bufs,
> /* Mapping array generated by hash function to map mbufs to slaves
> */
> uint16_t bufs_slave_port_idxs[RTE_MAX_ETHPORTS] = { 0 };
>
> -   uint16_t slave_tx_count, slave_tx_fail_count[RTE_MAX_ETHPORTS] = {
> 0 };
> +   uint16_t slave_tx_count;
> uint16_t total_tx_count = 0, total_tx_fail_count = 0;
>
> -   uint16_t i, j;
> +   uint16_t i;
>
> if (unlikely(nb_bufs == 0))
> return 0;
> @@ -358,34 +358,12 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue,
> struct rte_mbuf **bufs,
>
> /* If tx burst fails move packets to end of bufs */
> if (unlikely(slave_tx_count < slave_nb_bufs[i])) {
> -   slave_tx_fail_count[i] = slave_nb_bufs[i] -
> +   int slave_tx_fail_count = slave_nb_bufs[i] -
> slave_tx_count;
> -   total_tx_fail_count += slave_tx_fail_count[i];
> -
> -   /*
> -* Shift bufs to beginning of array to allow
> reordering
> -* later
> -*/
> -   for (j = 0; j < slave_tx_fail_count[i]; j++) {
> -   slave_bufs[i][j] =
> -   slave_bufs[i][(slave_tx_count - 1)
> + j];
> -   }
> -   }
> -   }
> -
> -   /*
> -* If there are tx burst failures we move packets to end of bufs to
> -* preserve expected PMD behaviour of all failed transmitted being
> -* at the end of the input mbuf array
> -*/
> -   if (unlikely(total_tx_fail_count > 0)) {
> -   int bufs_idx = nb_bufs - total_tx_fail_count - 1;
> -
> -   for (i = 0; i < slave_count; i++) {
> -   if (slave_tx_fail_count[i] > 0) {
> -   for (j = 0; j < slave_tx_fail_count[i];
> j++)
> -   bufs[bufs_idx++] =
> slave_bufs[i][j];
> -   }
> +   total_tx_fail_count += slave_tx_fail_count;
> +   memcpy(&bufs[nb_bufs - total_tx_fail_count],
> +  &slave_bufs[i][slave_tx_count],
> +  slave_tx_fail_count * sizeof(bufs[0]));
> }
> }
>
> @@ -715,8 +693,8 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct
> rte_mbuf **bufs,
> tx_fail_total += tx_fail_slave;
>
> memcpy(&bufs[nb_pkts - tx_fail_total],
> -
>  &slave_bufs[i][num_tx_slave],
> -   tx_fail_slave *
> sizeof(bufs[0]));
> +  &slave_bufs[i][num_tx_slave],
> +  tx_fail_slave * sizeof(bufs[0]));
> }
> num_tx_total += num_tx_slave;
> }
> @@ -1221,10 +1199,10 @@ bond_ethdev_tx_burst_balance(void *queue, struct
> rte_mbuf **bufs,
> /* Mapping array generated by hash function to map mbufs to slaves
> */
> uint16_t bufs_slave_port_idxs[nb_bufs];
>
> -   uint16_t slave_tx_count, slave_tx_fail_count[RTE_MAX_ETHPORTS] = {
> 0 };
> +   uint16_t slave_tx_count;
> uint16_t total_tx_count = 0, total_tx_fail_count = 0;
>
> -   uint16_t i, j;
> +   uint16_t i;
>
> if (unlikely(nb_bufs == 0))
> return 0;
> @@ -1265,34 +1243,12 @@ bond_ethdev_tx_burst_balance(void *queue, struct
> rte_mbuf **bufs,
>
> /* If tx burst fails move packets to end of bufs */
> if (unlikely(slave_tx_count < slave_nb_bufs[i])) {
> -   slave_tx_fail_count[i] = slave_nb_bufs[i] -
> +   int slave_tx_fail_co

Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in merging un-transmitted packets

2018-08-20 Thread Chas Williams
On Mon, Aug 20, 2018 at 3:02 AM Jia Yu  wrote:

> Hi Chas,
>
>
>
> Could you help to review the v2 of the patch? I have addressed the “Fixes”
> requirement in commit message, and modified slave_tx_fail_count’s
> definition.
>

Looks fine to me.  Thanks for the patch!


>
>
> Thanks,
>
> Jia
>
>
>
> *From: *Chas Williams <3ch...@gmail.com>
> *Date: *Sunday, August 19, 2018 at 5:07 PM
> *To: *Jia Yu 
> *Cc: *"dev@dpdk.org" , Declan Doherty <
> declan.dohe...@intel.com>, Chas Williams 
> *Subject: *Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in
> merging un-transmitted packets
>
>
>
>
>
>
>
> On Sun, Aug 19, 2018 at 6:19 PM Jia Yu  wrote:
>
> Hi Chas,
>
>
>
> Thanks for reviewing the change.
>
>
>
> Our application crashed after upgrading to DPDK 18.02, when packet rate is
> high and bond is configured. It happened because txq contains invalid mbuf
> addresses after rte_eth_tx_burst call (for example, 0x1 repeated 13
> times in one core dump). It seems that the invalid addresses came from buf
> shift code below, so I changed this part of code to earlier version. We
> don’t see crash after the fix. Please let us know if the fix is reasonable.
>
>
>
> With respect to correctness, the code you are fixing does seem broken.
>
> See inline below.
>
>
>
>
>
> m_table = {0x7fdf23a68ec0, 0x7fdf23a68400, 0x7fdf23a66e80, 0x7fdf23a65900,
> 0x7fdf23960e00, 0x1 , 0x7fdf23978640,
> 0x7fdf23977b80, 0x7fdf239745c0, 0x7fdf23a4d600, 0x7fdf23a4cb40,
> 0x7fdf23a75b00, 0x7fdf23a74580, 0x7fdf23972580, 0x7fdf239a76c0,
> 0x7fdf239a5680, 0x7fdf23a7a640, 0x7fdf23a79b80, 0x7fdf23a77b40,
> 0x7fdf2395b800}
>
>
>
> /* Send packet burst on each slave device */
> for (i = 0; i < slave_count; i++) {
> if (slave_nb_bufs[i] == 0)
> continue;
>
>
> slave_tx_count = rte_eth_tx_burst(slave_port_ids[i],
> bd_tx_q->queue_id, slave_bufs[i],
> slave_nb_bufs[i]);
>
>
>
> A typical failure here would be to transmit no packets.  slave_tx_count =
> 0.
>
>
>
>
>
>
>
>
> total_tx_count += slave_tx_count;
>
>
> /* If tx burst fails move packets to end of bufs */
> if (unlikely(slave_tx_count < slave_nb_bufs[i])) {
> slave_tx_fail_count[i] = slave_nb_bufs[i] -
> slave_tx_count;
> total_tx_fail_count += slave_tx_fail_count[i];
>
>
> /*
>  * Shift bufs to beginning of array to allow reordering
>  * later
>  */
> for (j = 0; j < slave_tx_fail_count[i]; j++) {
> slave_bufs[i][j] =
> slave_bufs[i][(slave_tx_count - 1) + j]; < what if
> slave_tx_count == 0, j == 0
>
>
>
> As you correctly point out, slave_tx_count - 1 would be -1.  Bad news.
>
> So, yes, I agree with your fix.  In your fix, you might notice that
>
> slave_tx_fail_count doesn't need to be an array.  It's local to
>
> if (unlikely(slave_tx_count < slave_nb_bufs[i])) now.
>
>
>
>
> }
> }
> }
>
>
>
> Thanks,
>
> Jia
>
>
>
> *From: *Chas Williams <3ch...@gmail.com>
> *Date: *Saturday, August 18, 2018 at 4:50 PM
> *To: *Jia Yu 
> *Cc: *"dev@dpdk.org" , Declan Doherty <
> declan.dohe...@intel.com>, Chas Williams 
> *Subject: *Re: [dpdk-dev] [PATCH] net/bonding: fix buf corruption in
> merging un-transmitted packets
>
>
>
>
>
>
>
> On Fri, Aug 17, 2018 at 9:46 PM Jia Yu  wrote:
>
> When bond slave devices cannot transmit all packets in bufs array,
> tx_burst callback shall merge the un-transmitted packets back to
> bufs array. Recent merge logic introduced a bug which causes
> invalid mbuf addresses being written to bufs array.
>
>
>
> Can you expand on this a bit?  What was the commit?
>
>
>
>
>
> When caller frees the un-transmitted packets, due to invalid addresses,
> application will crash.
>
> The fix is avoid shifting mbufs, and directly write un-transmitted
> packets back to bufs array.
>
> Signed-off-by: Jia Yu 
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 98
> +-
>  1 file changed, 14 insertions(+), 84 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 58f7377..cce973a 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -303,7 +303,7 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue,
> struct rte_mbuf **bufs,
> uint16_t slave_tx_count, slave_tx_fail_count[RTE_MAX_ETHPORTS] = {
> 0 };
> uint16_t total_tx_count = 0, total_tx_fail_count = 0;
>
> -   uint16_t i, j;
> +   uint16_t i;
>
> if (unlikely(nb_bufs == 0))
> return 0;
> @@ -361,31 +361,9 @@ bond_ethdev_tx_burst_8023ad_fast_queue(void *queue,
> struct rte_mbuf **bufs,
> slave_tx_fail_count[i] = slave_nb_bufs[i] -
> slave_tx_count;
> total_tx_fail_count += slave

Re: [dpdk-dev] [PATCH v4] net/bonding: per-slave intermediate rx ring

2018-08-20 Thread Chas Williams
This will need to be implemented for some of the other RX
burst methods at some point for other modes to see this
performance improvement (with the exception of active-backup).

On Thu, Aug 16, 2018 at 9:32 AM Luca Boccassi  wrote:

> During bond 802.3ad receive, a burst of packets is fetched from
> each slave into a local array and appended to per-slave ring buffer.
> Packets are taken from the head of the ring buffer and returned to
> the caller.  The number of mbufs provided to each slave is sufficient
> to meet the requirements of the ixgbe vector receive.
>
> This small change improves performances of the bonding driver
> considerably. Vyatta has been using it for years in production.
>
> Cc: sta...@dpdk.org
>
> Signed-off-by: Eric Kinzie 
> Signed-off-by: Luca Boccassi 
>

Acked-by: Chas Williams 



> ---
> v2 and v3: fix checkpatch warnings
> v4: add Eric's original signed-off-by from the Vyatta internal repo
>
>  drivers/net/bonding/rte_eth_bond_api.c | 13 +++
>  drivers/net/bonding/rte_eth_bond_pmd.c | 98 +-
>  drivers/net/bonding/rte_eth_bond_private.h |  4 +
>  3 files changed, 95 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c
> b/drivers/net/bonding/rte_eth_bond_api.c
> index 8bc04cfd11..3d22203e91 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -524,6 +524,10 @@ __eth_bond_slave_remove_lock_free(uint16_t
> bonded_port_id,
>
> sizeof(*(rte_eth_devices[bonded_port_id].data->mac_addrs)));
> }
> if (internals->slave_count == 0) {
> +   /* Remove any remaining packets in the receive ring */
> +   struct rte_mbuf *bufs[PMD_BOND_RECV_PKTS_PER_SLAVE];
> +   unsigned int j, count;
> +
> internals->rx_offload_capa = 0;
> internals->tx_offload_capa = 0;
> internals->rx_queue_offload_capa = 0;
> @@ -532,6 +536,15 @@ __eth_bond_slave_remove_lock_free(uint16_t
> bonded_port_id,
> internals->reta_size = 0;
> internals->candidate_max_rx_pktlen = 0;
> internals->max_rx_pktlen = 0;
> +
> +   do {
> +   count = rte_ring_dequeue_burst(internals->rx_ring,
> +   (void **)bufs,
> +   PMD_BOND_RECV_PKTS_PER_SLAVE,
> +   NULL);
> +   for  (j = 0; j < count; j++)
> +   rte_pktmbuf_free(bufs[j]);
> +   } while (count > 0);
> }
> return 0;
>  }
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> b/drivers/net/bonding/rte_eth_bond_pmd.c
> index 58f7377c60..ae756c4e3a 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -18,6 +18,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #include "rte_eth_bond.h"
>  #include "rte_eth_bond_private.h"
> @@ -402,10 +404,15 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct
> rte_mbuf **bufs,
> struct bond_dev_private *internals = bd_rx_q->dev_private;
> struct ether_addr bond_mac;
>
> +   unsigned int rx_ring_avail =
> rte_ring_free_count(internals->rx_ring);
> +   struct rte_mbuf *mbuf_bounce[PMD_BOND_RECV_PKTS_PER_SLAVE];
> +
> struct ether_hdr *hdr;
>
> const uint16_t ether_type_slow_be =
> rte_be_to_cpu_16(ETHER_TYPE_SLOW);
> uint16_t num_rx_total = 0;  /* Total number of received
> packets */
> +   uint16_t num_rx_slave;
> +   uint16_t num_enq_slave;
> uint16_t slaves[RTE_MAX_ETHPORTS];
> uint16_t slave_count, idx;
>
> @@ -414,6 +421,9 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct
> rte_mbuf **bufs,
> uint8_t i, j, k;
> uint8_t subtype;
>
> +   if (rx_ring_avail < PMD_BOND_RECV_PKTS_PER_SLAVE)
> +   goto dequeue;
> +
> rte_eth_macaddr_get(internals->port_id, &bond_mac);
> /* Copy slave list to protect against slave up/down changes during
> tx
>  * bursting */
> @@ -426,62 +436,96 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct
> rte_mbuf **bufs,
> internals->active_slave = 0;
> idx = 0;
> }
> -   for (i = 0; i < slave_count && num_rx_total < nb_pkts; i++) {
> -   j = num_rx_total;
> +   for (i = 0; i < slave_count && num_rx_total < rx_ring_avail; i++) {
> +   j = 0;
> collecting = ACTOR_STATE(&mode_8023ad_ports[slaves[idx]],
>  COLLECTING);
>
> /* Read packets from this slave */
> -   num_rx_total += rte_eth_rx_burst(slaves[idx],
> bd_rx_q->queue_id,
> -   &bufs[num_rx_total], nb_pkts -
> num_rx_total);
> +   if (unlikely(rx_ring_avail - num_rx_total <
> +   PM

Re: [dpdk-dev] Dpdk eventdev performance testing

2018-08-20 Thread Ferruh Yigit
On 8/18/2018 4:12 PM, Guru Prasad wrote:
> Hi All,
> 
> While testing performance with eventdev, I am seeing a crash in dpdk
> service core. Test scenario is given below
> 1) DPDK 1805
> 2 Eventdev ordered Queues & 2 Ports
> 2) 2 Workers threads + 1 Service core for running scheduling.
> 3) 1st worker is getting packet from NIC and enqueueing to Q1 * [PLEASE
> FILL THIS]*
> 4) Packet rate of 5.6Mpps (Pkt size 64 bytes)

Thanks Guru for reporting.

Would you mind submitting this to https://bugs.dpdk.org/, it helps us trace it
better.

Thanks,
ferruh


[dpdk-dev] [PATCH v2] port: add sym crypto port

2018-08-20 Thread Zhang, Roy Fan
This patch adds the symmetric crypto support to port library.
The crypto port acts as a shim layer to DPDK cryptodev library and
supports in-place crypto workload processing.

Signed-off-by: Fan Zhang 
Acked-by: Dumitrescu, Cristian 
---
v2:
- added experimental flags
- enabled meson build

 lib/librte_port/Makefile  |   2 +
 lib/librte_port/meson.build   |   8 +-
 lib/librte_port/rte_port_sym_crypto.c | 553 ++
 lib/librte_port/rte_port_sym_crypto.h |  97 ++
 lib/librte_port/rte_port_version.map  |   8 +
 5 files changed, 665 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_port/rte_port_sym_crypto.c
 create mode 100644 lib/librte_port/rte_port_sym_crypto.h

diff --git a/lib/librte_port/Makefile b/lib/librte_port/Makefile
index 8df4864ed..30113f790 100644
--- a/lib/librte_port/Makefile
+++ b/lib/librte_port/Makefile
@@ -38,6 +38,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_kni.c
 endif
 SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_source_sink.c
+SRCS-$(CONFIG_RTE_LIBRTE_PORT) += rte_port_sym_crypto.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port.h
@@ -53,5 +54,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_kni.h
 endif
 SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_source_sink.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_PORT)-include += rte_port_sym_crypto.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_port/meson.build b/lib/librte_port/meson.build
index f3d8b4434..0d11456f0 100644
--- a/lib/librte_port/meson.build
+++ b/lib/librte_port/meson.build
@@ -9,7 +9,8 @@ sources = files(
'rte_port_ras.c',
'rte_port_ring.c',
'rte_port_sched.c',
-   'rte_port_source_sink.c')
+   'rte_port_source_sink.c',
+   'rte_port_sym_crypto.c')
 headers = files(
'rte_port_ethdev.h',
'rte_port_fd.h',
@@ -18,8 +19,9 @@ headers = files(
'rte_port.h',
'rte_port_ring.h',
'rte_port_sched.h',
-   'rte_port_source_sink.h')
-deps += ['ethdev', 'sched', 'ip_frag']
+   'rte_port_source_sink.h',
+   'rte_port_sym_crypto.h')
+deps += ['ethdev', 'sched', 'ip_frag', 'cryptodev']
 
 if dpdk_conf.has('RTE_LIBRTE_KNI')
sources += files('rte_port_kni.c')
diff --git a/lib/librte_port/rte_port_sym_crypto.c 
b/lib/librte_port/rte_port_sym_crypto.c
new file mode 100644
index 0..aa3de4fdf
--- /dev/null
+++ b/lib/librte_port/rte_port_sym_crypto.c
@@ -0,0 +1,553 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include 
+
+#include 
+#include 
+
+#include "rte_port_sym_crypto.h"
+
+/*
+ * Port Crypto Reader
+ */
+#ifdef RTE_PORT_STATS_COLLECT
+
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_IN_ADD(port, val) \
+   port->stats.n_pkts_in += val
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_DROP_ADD(port, val) \
+   port->stats.n_pkts_drop += val
+
+#else
+
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_IN_ADD(port, val)
+#define RTE_PORT_SYM_CRYPTO_READER_STATS_PKTS_DROP_ADD(port, val)
+
+#endif
+
+struct rte_port_sym_crypto_reader {
+   struct rte_port_in_stats stats;
+
+   uint8_t cryptodev_id;
+   uint16_t queue_id;
+   struct rte_crypto_op *ops[RTE_PORT_IN_BURST_SIZE_MAX];
+   rte_port_sym_crypto_reader_callback_fn f_callback;
+   void *arg_callback;
+};
+
+static void *
+rte_port_sym_crypto_reader_create(void *params, int socket_id)
+{
+   struct rte_port_sym_crypto_reader_params *conf =
+   params;
+   struct rte_port_sym_crypto_reader *port;
+
+   /* Check input parameters */
+   if (conf == NULL) {
+   RTE_LOG(ERR, PORT, "%s: params is NULL\n", __func__);
+   return NULL;
+   }
+
+   /* Memory allocation */
+   port = rte_zmalloc_socket("PORT", sizeof(*port),
+   RTE_CACHE_LINE_SIZE, socket_id);
+   if (port == NULL) {
+   RTE_LOG(ERR, PORT, "%s: Failed to allocate port\n", __func__);
+   return NULL;
+   }
+
+   /* Initialization */
+   port->cryptodev_id = conf->cryptodev_id;
+   port->queue_id = conf->queue_id;
+   port->f_callback = conf->f_callback;
+   port->arg_callback = conf->arg_callback;
+
+   return port;
+}
+
+static int
+rte_port_sym_crypto_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t 
n_pkts)
+{
+   struct rte_port_sym_crypto_reader *p =
+   port;
+   uint16_t rx_ops_cnt, i, n = 0;
+
+   rx_ops_cnt = rte_cryptodev_dequeue_burst(p->cryptodev_id, p->queue_id,
+   p->ops, n_pkts);
+
+   for (i = 0; i < rx_ops_cnt; i++) {
+   struct rte_crypto_op *op = p->ops[i];
+
+   /** Drop failed pkts */
+   if (unlikely(op->status != RTE_CRYPTO_OP_STATUS_SUCCESS)) {
+   rte_pktmbuf_free(op->sym->m_src);
+ 

Re: [dpdk-dev] Multi-thread mempool usage

2018-08-20 Thread Matteo Lanzuisi

Hello Olivier,

Il 13/08/2018 23:54, Olivier Matz ha scritto:

Hello Matteo,

On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:

Any suggestion? any idea about this behaviour?

Il 08/08/2018 11:56, Matteo Lanzuisi ha scritto:

Hi all,

recently I began using "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
RedHat 7.5 kernel 3.10.0-862.6.3.el7.x86_64 as a porting of an
application from RH6 to RH7. On RH6 I used dpdk-2.2.0.

This application is made up by one or more threads (each one on a
different logical core) reading packets from i40e interfaces.

Each thread can call the following code lines when receiving a specific
packet:

RTE_LCORE_FOREACH(lcore_id)
{
     result =
rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool, (VOID_P
*) &new_work);        // mempools are created one for each logical core
     if (((uint64_t)(new_work)) < 0x7f00)
     printf("Result %d, lcore di partenza %u, lcore di ricezione
%u, pointer %p\n", result, rte_lcore_id(), lcore_id, new_work);    //
debug print, on my server it should never happen but with multi-thread
happens always on the last logical core

Here, checking the value of new_work looks wrong to me, before
ensuring that result == 0. At least, new_work should be set to
NULL before calling rte_mempool_get().
I put the check after result == 0, and just before the rte_mempool_get() 
I set new_work to NULL, but nothing changed.

The first time something goes wrong the print is

Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, 
pointer 0x880002


Sorry for the italian language print :) it means that application is 
sending a message from the logical core 1 to the logical core 2, it's 
the 635th time, the result is 0 and the pointer is 0x880002 while all 
pointers before were 0x7ffxx.
One strange thing is that this behaviour happens always from the logical 
core 1 to the logical core 2 when the counter is 635!!! (Sending 
messages from 2 to 1 or 1 to 1 or 2 to 2 is all ok)
Another strange thing is that pointers from counter 636 to 640 are NULL, 
and from 641 begin again to be good... as you can see here following (I 
attached the result of a test without the "if" of the check on the value 
of new_work, and only for messages from the lcore 1 to lcore 2)


Result 0, lcore di partenza 1, lcore di ricezione 2, counter 627, 
pointer 0x7ffe8a261880
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 628, 
pointer 0x7ffe8a261900
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 629, 
pointer 0x7ffe8a261980
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 630, 
pointer 0x7ffe8a261a00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 631, 
pointer 0x7ffe8a261a80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 632, 
pointer 0x7ffe8a261b00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 633, 
pointer 0x7ffe8a261b80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 634, 
pointer 0x7ffe8a261c00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, 
pointer 0x880002
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 636, 
pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 637, 
pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 638, 
pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 639, 
pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 640, 
pointer (nil)
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 641, 
pointer 0x7ffe8a262b00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 642, 
pointer 0x7ffe8a262b80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 643, 
pointer 0x7ffe8a262d00
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 644, 
pointer 0x7ffe8a262d80
Result 0, lcore di partenza 1, lcore di ricezione 2, counter 645, 
pointer 0x7ffe8a262e00





     if (result == 0)
     {
     new_work->command = command; // usage of the memory gotten
from the mempool... <- here is where the application crashes

Do you know why it crashes? Is it that new_work is NULL?
The pointer is not NULL but is not sequential to the others (0x880002 as 
written before in this email). It seems to be in a memory zone not in 
DPDK hugepages or something similar.

If I use this pointer the application crashes.


Can you check how the mempool is initialized? It should be in multi
consumer and depending on your use case, single or multi producer.

Here is the initialization of this mempool

cea_main_cmd_pool[i] = rte_mempool_create(pool_name,
    (unsigned int) (ikco_cmd_buffers - 1), // 65536 - 1 in this 
case

    sizeof (CEA_DECODE_CMD_T), // 24 bytes
    0, 0,
    rte_pktmbuf_pool_init, NULL,
    rte_pktmbuf_init, NULL,
    rte_socket_id(), 0);


Another thing that could be checked: at all the places where you
return 

Re: [dpdk-dev] Minimum kernel version for DPDK

2018-08-20 Thread Stephen Hemminger
On Mon, 20 Aug 2018 11:46:25 +0300
Kevin Wilson  wrote:

> Hi all,
> Maybe this is a silly question, please bear with me.
> 
> According to the “Getting Started Guide for Linux” in
> http://doc.dpdk.org/guides/linux_gsg/sys_reqs.html,
> the kernel version should be >= 3.2.
> Accrding to my understanding, there are only two kernel modules in DPDK,
> namely rte_kni.ko and igb_iuo.ko.
> Instead of igb_uio.ko I intend to use uio_pci_generic.ko for binding the 
> device.
> 
> I want to work with DPDK with kernel 3.1, without using rte_kni and no igb_uio
> as mentioned.
> 
> Won't the DPDK code compile for such a kernel ? won't it work ?
> 
> Regards,
> Kevin

I would never build and deploy an application a kernel older than
the current Long Term Supported kernel (3.16). If you are  an unsupported
kernel, you put yourself at risk of the next big security attack.
If you are selling a product, then using older kernels is just
asking it to be pawned.



Re: [dpdk-dev] [RFC v2] ethdev: add flow metadata

2018-08-20 Thread Ferruh Yigit
On 6/26/2018 9:54 AM, Xueming Li wrote:
> Currently, rte_flow pattern only match packet header fields.
> This patch adds additional data to match the packet.
> 
> For example, in egress direction, to do an action depending on the VM
> id, the application needs to configure rte_flow rule with the new
> metadata pattern:
> pattern meta data is {vm} / end action encap …
> Then the PMD will send VM id as metadata associated in mbuf to NIC,
> then egress flow on NIC match metadata as other regular packet headers,
> the appropriate encapsulation is done according to the VM id metadata.
> 
> Metadata could be used on ingress as well to save useful info before
> flow modification (not defined yet) or decapsulation action. PMD is
> responsible to save metadata into mbuf field. The application must get
> metadata from the mbuf.
> 
> Cc: Thomas Monjalon 
> Cc: Olivier Matz 
> Cc: Shahaf Shuler 
> Signed-off-by: Xueming Li 
> ---
>  doc/guides/prog_guide/rte_flow.rst |  7 +++
>  lib/librte_ethdev/rte_flow.c   |  1 +
>  lib/librte_ethdev/rte_flow.h   | 28 
>  3 files changed, 36 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst 
> b/doc/guides/prog_guide/rte_flow.rst
> index b305a72a5..7989e5856 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -1191,6 +1191,13 @@ Normally preceded by any of:
>  - `Item: ICMP6_ND_NS`_
>  - `Item: ICMP6_ND_OPT`_
>  
> +Item: ``META``
> +^^
> +
> +Matches a metadata variable.
> +
> +- ``data``: 64 bit value.
> +

Isn't this "user data" more than metadata?

And there is only one userdata can be provided for a flow.
Can there be a need to provide multiple userdata? Should we support it?
For your example, on egress what if you want to do specific action on a flow for
both VM id and process id, you can only pass single user data?
What about passing a key-value pair?



Re: [dpdk-dev] Multi-thread mempool usage

2018-08-20 Thread Wiles, Keith



> On Aug 20, 2018, at 9:47 AM, Matteo Lanzuisi  wrote:
> 
> Hello Olivier,
> 
> Il 13/08/2018 23:54, Olivier Matz ha scritto:
>> Hello Matteo,
>> 
>> On Mon, Aug 13, 2018 at 03:20:44PM +0200, Matteo Lanzuisi wrote:
>>> Any suggestion? any idea about this behaviour?
>>> 
>>> Il 08/08/2018 11:56, Matteo Lanzuisi ha scritto:
 Hi all,
 
 recently I began using "dpdk-17.11-11.el7.x86_64" rpm (RedHat rpm) on
 RedHat 7.5 kernel 3.10.0-862.6.3.el7.x86_64 as a porting of an
 application from RH6 to RH7. On RH6 I used dpdk-2.2.0.
 
 This application is made up by one or more threads (each one on a
 different logical core) reading packets from i40e interfaces.
 
 Each thread can call the following code lines when receiving a specific
 packet:
 
 RTE_LCORE_FOREACH(lcore_id)
 {
 result =
 rte_mempool_get(cea_main_lcore_conf[lcore_id].de_conf.cmd_pool, (VOID_P
 *) &new_work);// mempools are created one for each logical core
 if (((uint64_t)(new_work)) < 0x7f00)
 printf("Result %d, lcore di partenza %u, lcore di ricezione
 %u, pointer %p\n", result, rte_lcore_id(), lcore_id, new_work);//
 debug print, on my server it should never happen but with multi-thread
 happens always on the last logical core
>> Here, checking the value of new_work looks wrong to me, before
>> ensuring that result == 0. At least, new_work should be set to
>> NULL before calling rte_mempool_get().
> I put the check after result == 0, and just before the rte_mempool_get() I 
> set new_work to NULL, but nothing changed.
> The first time something goes wrong the print is
> 
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 
> 0x880002
> 
> Sorry for the italian language print :) it means that application is sending 
> a message from the logical core 1 to the logical core 2, it's the 635th time, 
> the result is 0 and the pointer is 0x880002 while all pointers before were 
> 0x7ffxx.
> One strange thing is that this behaviour happens always from the logical core 
> 1 to the logical core 2 when the counter is 635!!! (Sending messages from 2 
> to 1 or 1 to 1 or 2 to 2 is all ok)
> Another strange thing is that pointers from counter 636 to 640 are NULL, and 
> from 641 begin again to be good... as you can see here following (I attached 
> the result of a test without the "if" of the check on the value of new_work, 
> and only for messages from the lcore 1 to lcore 2)
> 
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 627, pointer 
> 0x7ffe8a261880
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 628, pointer 
> 0x7ffe8a261900
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 629, pointer 
> 0x7ffe8a261980
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 630, pointer 
> 0x7ffe8a261a00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 631, pointer 
> 0x7ffe8a261a80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 632, pointer 
> 0x7ffe8a261b00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 633, pointer 
> 0x7ffe8a261b80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 634, pointer 
> 0x7ffe8a261c00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 635, pointer 
> 0x880002
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 636, pointer 
> (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 637, pointer 
> (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 638, pointer 
> (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 639, pointer 
> (nil)
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 640, pointer 
> (nil)

This sure does seem like a memory over write problem, with maybe a memset(0) in 
the mix as well. Have you tried using hardware break points with the 0x880002 
or 0x00 being written into this range?

> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 641, pointer 
> 0x7ffe8a262b00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 642, pointer 
> 0x7ffe8a262b80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 643, pointer 
> 0x7ffe8a262d00
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 644, pointer 
> 0x7ffe8a262d80
> Result 0, lcore di partenza 1, lcore di ricezione 2, counter 645, pointer 
> 0x7ffe8a262e00
> 
>> 
 if (result == 0)
 {
 new_work->command = command; // usage of the memory gotten
 from the mempool... <- here is where the application crashes
>> Do you know why it crashes? Is it that new_work is NULL?
> The pointer is not NULL but is not sequential to the others (0x880002 as 
> written before in this email). It seems to be in a memory zone not in DPDK 
> hugepages or something similar.
> If I use this pointer the application crashes.
>> 
>> Can you check how the

Re: [dpdk-dev] [RFC v2] ethdev: add flow metadata

2018-08-20 Thread Ferruh Yigit
On 8/20/2018 4:45 PM, Ferruh Yigit wrote:
> On 6/26/2018 9:54 AM, Xueming Li wrote:
>> Currently, rte_flow pattern only match packet header fields.
>> This patch adds additional data to match the packet.
>>
>> For example, in egress direction, to do an action depending on the VM
>> id, the application needs to configure rte_flow rule with the new
>> metadata pattern:
>> pattern meta data is {vm} / end action encap …
>> Then the PMD will send VM id as metadata associated in mbuf to NIC,
>> then egress flow on NIC match metadata as other regular packet headers,
>> the appropriate encapsulation is done according to the VM id metadata.
>>
>> Metadata could be used on ingress as well to save useful info before
>> flow modification (not defined yet) or decapsulation action. PMD is
>> responsible to save metadata into mbuf field. The application must get
>> metadata from the mbuf.
>>
>> Cc: Thomas Monjalon 
>> Cc: Olivier Matz 
>> Cc: Shahaf Shuler 
>> Signed-off-by: Xueming Li 
>> ---
>>  doc/guides/prog_guide/rte_flow.rst |  7 +++
>>  lib/librte_ethdev/rte_flow.c   |  1 +
>>  lib/librte_ethdev/rte_flow.h   | 28 
>>  3 files changed, 36 insertions(+)
>>
>> diff --git a/doc/guides/prog_guide/rte_flow.rst 
>> b/doc/guides/prog_guide/rte_flow.rst
>> index b305a72a5..7989e5856 100644
>> --- a/doc/guides/prog_guide/rte_flow.rst
>> +++ b/doc/guides/prog_guide/rte_flow.rst
>> @@ -1191,6 +1191,13 @@ Normally preceded by any of:
>>  - `Item: ICMP6_ND_NS`_
>>  - `Item: ICMP6_ND_OPT`_
>>  
>> +Item: ``META``
>> +^^
>> +
>> +Matches a metadata variable.
>> +
>> +- ``data``: 64 bit value.
>> +
> 
> Isn't this "user data" more than metadata?
> 
> And there is only one userdata can be provided for a flow.
> Can there be a need to provide multiple userdata? Should we support it?
> For your example, on egress what if you want to do specific action on a flow 
> for
> both VM id and process id, you can only pass single user data?
> What about passing a key-value pair?

Looks like a there is a new version of the patch:
https://patches.dpdk.org/patch/43684/

I will update status of this one as "Superseded" and continue with new one.



[dpdk-dev] [PATCH v3 1/2] bus/pci: harmonize and document rte_pci_read_config return value

2018-08-20 Thread Luca Boccassi
On Linux, rte_pci_read_config on success returns the number of read
bytes, but on BSD it returns 0.
Document the return values, and have BSD behave as Linux does.

At least one case (bnx2x PMD) treats 0 as an error, so the change
makes sense also for that.

Signed-off-by: Luca Boccassi 
---
 drivers/bus/pci/bsd/pci.c | 4 +++-
 drivers/bus/pci/rte_bus_pci.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b7e4..175d83cf1b 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -439,6 +439,8 @@ int rte_pci_read_config(const struct rte_pci_device *dev,
 {
int fd = -1;
int size;
+   /* Copy Linux implementation's behaviour */
+   const int return_len = len;
struct pci_io pi = {
.pi_sel = {
.pc_domain = dev->addr.domain,
@@ -469,7 +471,7 @@ int rte_pci_read_config(const struct rte_pci_device *dev,
}
close(fd);
 
-   return 0;
+   return return_len;
 
  error:
if (fd >= 0)
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index 0d1955ffe0..df8f64798d 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -219,6 +219,8 @@ void rte_pci_unregister(struct rte_pci_driver *driver);
  *   The length of the data buffer.
  * @param offset
  *   The offset into PCI config space
+ * @return
+ *  Number of bytes read on success, negative on error.
  */
 int rte_pci_read_config(const struct rte_pci_device *device,
void *buf, size_t len, off_t offset);
-- 
2.18.0



[dpdk-dev] [PATCH v3 2/2] virtio: fix PCI config err handling

2018-08-20 Thread Luca Boccassi
From: Brian Russell 

In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config returns
the number of bytes read from PCI config or < 0 on error.
If less than the expected number of bytes are read then log the
failure and return rather than carrying on with garbage.

Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0")

Signed-off-by: Brian Russell 
Signed-off-by: Luca Boccassi 
---
v2: handle additional rte_pci_read_config incomplete reads
v3: do not handle rte_pci_read_config of virtio cap, added in v2,
as it's less clear what the right thing to do there is

 drivers/net/virtio/virtio_pci.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 6bd22e54a6..e1df2c3b4d 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct 
virtio_hw *hw)
}
 
ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
-   if (ret < 0) {
-   PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+   if (ret != 1) {
+   PMD_INIT_LOG(DEBUG,
+"failed to read pci capability list, ret %d", ret);
return -1;
}
 
while (pos) {
ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
-   if (ret < 0) {
-   PMD_INIT_LOG(ERR,
-   "failed to read pci cap at pos: %x", pos);
+   if (ret != sizeof(cap)) {
+   PMD_INIT_LOG(DEBUG,
+"failed to read pci cap at pos: %x ret %d",
+pos, ret);
break;
}
 
@@ -693,16 +695,18 @@ vtpci_msix_detect(struct rte_pci_device *dev)
int ret;
 
ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
-   if (ret < 0) {
-   PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+   if (ret != 1) {
+   PMD_INIT_LOG(DEBUG,
+"failed to read pci capability list, ret %d", ret);
return VIRTIO_MSIX_NONE;
}
 
while (pos) {
ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
-   if (ret < 0) {
-   PMD_INIT_LOG(ERR,
-   "failed to read pci cap at pos: %x", pos);
+   if (ret != sizeof(cap)) {
+   PMD_INIT_LOG(DEBUG,
+"failed to read pci cap at pos: %x ret %d",
+pos, ret);
break;
}
 
-- 
2.18.0



Re: [dpdk-dev] [PATCH 2/2] virtio: fix PCI config err handling

2018-08-20 Thread Luca Boccassi
On Mon, 2018-08-20 at 16:18 +0800, Tiwei Bie wrote:
> On Thu, Aug 16, 2018 at 07:49:43PM +0100, Luca Boccassi wrote:
> > On Thu, 2018-08-16 at 19:47 +0100, Luca Boccassi wrote:
> > > From: Brian Russell 
> > > 
> > > In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config
> > > returns
> > > the number of bytes read from PCI config or < 0 on error.
> > > If less than the expected number of bytes are read then log the
> > > failure and return rather than carrying on with garbage.
> > > 
> > > Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0")
> > > 
> > > Signed-off-by: Brian Russell 
> > > Signed-off-by: Luca Boccassi 
> > > ---
> > > v2: handle additional rte_pci_read_config incomplete reads
> > > 
> > >  drivers/net/virtio/virtio_pci.c | 35 +
> > > 
> > > 
> > >  1 file changed, 22 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio/virtio_pci.c
> > > b/drivers/net/virtio/virtio_pci.c
> > > index 6bd22e54a6..ff6f96f361 100644
> > > --- a/drivers/net/virtio/virtio_pci.c
> > > +++ b/drivers/net/virtio/virtio_pci.c
> > 
> > ...
> > > @@ -610,9 +613,13 @@ virtio_read_caps(struct rte_pci_device *dev,
> > > struct virtio_hw *hw)
> > >   hw->common_cfg = get_cfg_addr(dev,
> > > &cap);
> > >   break;
> > >   case VIRTIO_PCI_CAP_NOTIFY_CFG:
> > > - rte_pci_read_config(dev, &hw-
> > > > notify_off_multiplier,
> > > 
> > > - 4, pos + sizeof(cap));
> > > - hw->notify_base = get_cfg_addr(dev,
> > > &cap);
> > > + /* Avoid half-reads into hw */
> > > + ret = rte_pci_read_config(dev,
> > > &multiplier,
> > > 4,
> > > + pos + sizeof(cap));
> > > + if (ret == 4) {
> > > + hw->notify_off_multiplier =
> > > multiplier;
> > > + hw->notify_base =
> > > get_cfg_addr(dev,
> > > &cap);
> > > + }
> > >   break;
> > >   case VIRTIO_PCI_CAP_DEVICE_CFG:
> > >   hw->dev_cfg = get_cfg_addr(dev, &cap);
> > 
> > Tiwei: not 100% sure what's the best way to handle a failure here,
> > this
> > will avoid writing to *hw in case of errors. Let me know if this is
> > OK.
> 
> My concern is about reading the virtio capability directly.
> With this patch, it will give up reading other capabilities
> when failed to read a full virtio PCI capability structure
> (the returned bytes are less than the expected bytes) even
> though when the capability it's reading isn't a virtio vendor
> specific capability. I'm not quite sure whether it will bring
> any bad consequence. How about just reading the first two
> bytes first? Something like this:
> 
> https://github.com/freebsd/freebsd/blob/72445a41b3ff/sys/dev/pci/pci.
> c#L1491-L1497

I'm not sure, to be honest. That bit didn't give me trouble at all, so
at this point I'd much rather leave it alone so that the maintainers
can take care of it how they see fit, if necessary :-)

I've sent a v3 that removes that individual change.

-- 
Kind regards,
Luca Boccassi


Re: [dpdk-dev] [PATCH v3] ip_frag: extend rte_ipv6_frag_get_ipv6_fragment_header()

2018-08-20 Thread Cody Doucette
Re-upping for review.

Thanks,
Cody

On Fri, Jul 27, 2018 at 9:52 AM, Cody Doucette  wrote:

> Extend rte_ipv6_frag_get_ipv6_fragment_header() to skip over any
> other IPv6 extension headers when finding the fragment header.
>
> According to RFC 8200, there is no guarantee that the IPv6
> Fragment extension header will come before any other extension
> header, even though it is recommended.
>
> Signed-off-by: Cody Doucette 
> Signed-off-by: Qiaobin Fu 
> Reviewed-by: Michel Machado 
> ---
> v3:
> * Removed compilation flag D_XOPEN_SOURCE=700 from the
>   failsafe driver to allow compilation on freebsd.
>
> v2:
> * Moved IPv6 extension header definitions to lib_net.
>
>  drivers/net/failsafe/Makefile   |  1 -
>  drivers/net/failsafe/meson.build|  1 -
>  examples/ip_reassembly/main.c   |  6 ++--
>  lib/librte_ip_frag/rte_ip_frag.h| 23 ++---
>  lib/librte_ip_frag/rte_ip_frag_version.map  |  1 +
>  lib/librte_ip_frag/rte_ipv6_fragmentation.c | 38 +
>  lib/librte_ip_frag/rte_ipv6_reassembly.c|  4 +--
>  lib/librte_net/rte_ip.h | 27 +++
>  lib/librte_port/rte_port_ras.c  |  6 ++--
>  9 files changed, 86 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
> index 81802d092..eb9292425 100644
> --- a/drivers/net/failsafe/Makefile
> +++ b/drivers/net/failsafe/Makefile
> @@ -34,7 +34,6 @@ CFLAGS += -std=gnu99 -Wextra
>  CFLAGS += -O3
>  CFLAGS += -I.
>  CFLAGS += -D_DEFAULT_SOURCE
> -CFLAGS += -D_XOPEN_SOURCE=700
>  CFLAGS += $(WERROR_FLAGS)
>  CFLAGS += -Wno-strict-prototypes
>  CFLAGS += -pedantic
> diff --git a/drivers/net/failsafe/meson.build
> b/drivers/net/failsafe/meson.build
> index a249ff4af..d47a52acd 100644
> --- a/drivers/net/failsafe/meson.build
> +++ b/drivers/net/failsafe/meson.build
> @@ -3,7 +3,6 @@
>
>  cflags += '-std=gnu99'
>  cflags += '-D_DEFAULT_SOURCE'
> -cflags += '-D_XOPEN_SOURCE=700'
>  cflags += '-pedantic'
>  if host_machine.system() == 'linux'
> cflags += '-DLINUX'
> diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
> index b830f67a5..58c388708 100644
> --- a/examples/ip_reassembly/main.c
> +++ b/examples/ip_reassembly/main.c
> @@ -366,12 +366,14 @@ reassemble(struct rte_mbuf *m, uint16_t portid,
> uint32_t queue,
> eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
> } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
> /* if packet is IPv6 */
> -   struct ipv6_extension_fragment *frag_hdr;
> +   const struct ipv6_extension_fragment *frag_hdr;
> +   struct ipv6_extension_fragment frag_hdr_buf;
> struct ipv6_hdr *ip_hdr;
>
> ip_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
>
> -   frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(ip_hdr);
> +   frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(m,
> +   ip_hdr, &frag_hdr_buf);
>
> if (frag_hdr != NULL) {
> struct rte_mbuf *mo;
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_
> frag.h
> index b3f3f78df..bd6f02a16 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -208,28 +208,25 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
>  struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl
> *tbl,
> struct rte_ip_frag_death_row *dr,
> struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
> -   struct ipv6_extension_fragment *frag_hdr);
> +   const struct ipv6_extension_fragment *frag_hdr);
>
>  /**
>   * Return a pointer to the packet's fragment header, if found.
> - * It only looks at the extension header that's right after the fixed IPv6
> - * header, and doesn't follow the whole chain of extension headers.
>   *
> - * @param hdr
> + * @param pkt
> + *   Pointer to the mbuf of the packet.
> + * @param ip_hdr
>   *   Pointer to the IPv6 header.
> + * @param frag_hdr
> + *   A pointer to the buffer where the fragment header
> + *   will be copied if it is not contiguous in mbuf data.
>   * @return
>   *   Pointer to the IPv6 fragment extension header, or NULL if it's not
>   *   present.
>   */
> -static inline struct ipv6_extension_fragment *
> -rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
> -{
> -   if (hdr->proto == IPPROTO_FRAGMENT) {
> -   return (struct ipv6_extension_fragment *) ++hdr;
> -   }
> -   else
> -   return NULL;
> -}
> +const struct ipv6_extension_fragment *rte_ipv6_frag_get_ipv6_
> fragment_header(
> +   struct rte_mbuf *pkt, const struct ipv6_hdr *ip_hdr,
> +   struct ipv6_extension_fragment *frag_hdr);
>
>  /**
>   * IPv4 fragmentation.
> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map
> b/lib/

Re: [dpdk-dev] 17.11.4 patches review and test

2018-08-20 Thread Mody, Rasesh
>From: dev  On Behalf Of Marco Varlese
>Sent: Monday, August 20, 2018 2:21 AM
>
>Hi,
>
>The code in 17.11.4-rc1 does not compile for me.

This needs to be fixed for 17.11.4 as rte_eth_linkstatus_set() is not available 
in DPDK 17.11.x. Will send a patch with fix.

Thanks!
-Rasesh

>Please, see below what I get:
>
>[  100s] gcc -m64 -DVERSION="17.11.4" -
>L/home/abuild/rpmbuild/BUILD/dpdk-stable-
>17.11.4-rc1/x86_64-native-linuxapp-gcc-default/lib -Wl,--version-
>script=/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
>rc1/drivers/net/bnx2x/rte_pmd_bnx2x_version.map  -shared bnx2x.o
>bnx2x_rxtx.o bnx2x_stats.o bnx2x_ethdev.o ecore_sp.o elink.o bnx2x_vfpf.o
>-z defs -lz -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring -lrte_ethdev -
>lrte_net -lrte_kvargs -lrte_bus_pci -Wl,-soname,librte_pmd_bnx2x.so.17.11.4
>-o
>librte_pmd_bnx2x.so.17.11.4
>[  100s] bnx2x_ethdev.o: In function `bnx2x_periodic_start':
>[  100s] bnx2x_ethdev.c:(.text+0x5ce): undefined reference to
>`rte_eth_linkstatus_set'
>[  100s] bnx2x_ethdev.o: In function `bnx2x_dev_link_update':
>[  100s] bnx2x_ethdev.c:(.text+0x94f): undefined reference to
>`rte_eth_linkstatus_set'
>[  100s] bnx2x_ethdev.o: In function `bnx2xvf_dev_link_update':
>[  100s] bnx2x_ethdev.c:(.text+0x9f0): undefined reference to
>`rte_eth_linkstatus_set'
>[  100s] bnx2x_ethdev.o: In function `bnx2x_interrupt_handler':
>[  100s] bnx2x_ethdev.c:(.text+0xb1b): undefined reference to
>`rte_eth_linkstatus_set'
>[  100s] collect2: error: ld returned 1 exit status [  100s] make[5]: ***
>[/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
>rc1/mk/rte.lib.mk:128: librte_pmd_bnx2x.so.17.11.4] Error 1 [  100s] make[4]:
>*** [/home/abuild/rpmbuild/BUILD/dpdk-stable-17.11.4-
>rc1/mk/rte.subdir.mk:65: bnx2x] Error 2
>[  100s] make[4]: *** Waiting for unfinished jobs
>
>
>Cheers,
>Marco
>
>On Thu, 2018-08-16 at 11:18 -0700, Yongseok Koh wrote:
>> Hi all,
>>
>> Here is a list of patches targeted for LTS release 17.11.4. Please
>> help review and test. The planned date for the final release is August
>> 23. Before that, please shout if anyone has objections with these patches
>being applied.
>>
>> Also for the companies committed to running regression tests, please
>> run the tests and report any issue before the release date.
>>
>> A release candidate tarball can be found at:
>>
>> https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
>>
>> These patches are located at branch 17.11 of dpdk-stable repo:
>> https://dpdk.org/browse/dpdk-stable/
>>
>> Thanks,
>> Yongseok
>>
>> ---
>> Adrien Mazarguil (2):
>>   maintainers: update for Mellanox PMDs
>>   net/mlx4: fix minor resource leak during init
>>
>> Ajit Khaparde (7):
>>   net/bnxt: fix HW Tx checksum offload check
>>   net/bnxt: fix set MTU
>>   net/bnxt: fix Rx ring count limitation
>>   net/bnxt: fix memory leaks in NVM commands
>>   net/bnxt: fix lock release on NVM write failure
>>   net/bnxt: check access denied for HWRM commands
>>   net/bnxt: fix RETA size
>>
>> Alejandro Lucero (1):
>>   net/nfp: fix field initialization in Tx descriptor
>>
>> Alok Makhariya (1):
>>   bus/dpaa: fix phandle support for Linux 4.16
>>
>> Anatoly Burakov (8):
>>   eal/linux: fix invalid syntax in interrupts
>>   eal/linux: fix uninitialized value
>>   test: fix EAL flags autotest on FreeBSD
>>   test: fix result printing
>>   test: fix code on report
>>   test: make autotest runner python 2/3 compliant
>>   test: print autotest categories
>>   test: improve filtering
>>
>> Andrew Rybchenko (2):
>>   net/sfc: cut non VLAN ID bits from TCI
>>   net/sfc: fix assert in set multicast address list
>>
>> Andy Green (1):
>>   ring: fix sign conversion warning
>>
>> Beilei Xing (3):
>>   net/i40e: fix shifts of 32-bit value
>>   net/i40e: fix packet type parsing with DDP
>>   net/i40e: fix setting TPID with AQ command
>>
>> Bruce Richardson (2):
>>   examples/exception_path: fix out-of-bounds read
>>   mk: fix permissions when using make install
>>
>> Chas Williams (2):
>>   net/bonding: always update bonding link status
>>   net/bonding: do not clear active slave count
>>
>> Dan Gora (1):
>>   kni: fix crash with null name
>>
>> Daria Kolistratova (1):
>>   net/ena: fix SIGFPE with 0 Rx queue
>>
>> Dariusz Stojaczyk (1):
>>   eal: fix return codes on thread naming failure
>>
>> David Marchand (1):
>>   net/bnxt: add missing ids in xstats
>>
>> Drocula Lambda (1):
>>   kni: fix build on RHEL 7.5
>>
>> Ferruh Yigit (2):
>>   kni: fix build with gcc 8.1
>>   net/thunderx: fix build with gcc optimization on
>>
>> Gavin Hu (3):
>>   mk: fix cross build
>>   net/dpaa2: remove loop for unused pool entries
>>   maintainers: claim maintainership for ARM v7 and v8
>>
>> Haiyue Wang (1):
>>   net/i40e: workaround performance degradation
>>
>> Harry van Haaren (1):
>>   event: fix ring init failure h

Re: [dpdk-dev] [PATCH 2/2] virtio: fix PCI config err handling

2018-08-20 Thread Tiwei Bie
On Mon, Aug 20, 2018 at 05:45:35PM +0100, Luca Boccassi wrote:
> On Mon, 2018-08-20 at 16:18 +0800, Tiwei Bie wrote:
> > On Thu, Aug 16, 2018 at 07:49:43PM +0100, Luca Boccassi wrote:
> > > On Thu, 2018-08-16 at 19:47 +0100, Luca Boccassi wrote:
> > > > From: Brian Russell 
> > > > 
> > > > In virtio_read_caps and vtpci_msix_detect, rte_pci_read_config
> > > > returns
> > > > the number of bytes read from PCI config or < 0 on error.
> > > > If less than the expected number of bytes are read then log the
> > > > failure and return rather than carrying on with garbage.
> > > > 
> > > > Fixes: 6ba1f63b5ab0 ("virtio: support specification 1.0")
> > > > 
> > > > Signed-off-by: Brian Russell 
> > > > Signed-off-by: Luca Boccassi 
> > > > ---
> > > > v2: handle additional rte_pci_read_config incomplete reads
> > > > 
> > > >  drivers/net/virtio/virtio_pci.c | 35 +
> > > > 
> > > > 
> > > >  1 file changed, 22 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/virtio/virtio_pci.c
> > > > b/drivers/net/virtio/virtio_pci.c
> > > > index 6bd22e54a6..ff6f96f361 100644
> > > > --- a/drivers/net/virtio/virtio_pci.c
> > > > +++ b/drivers/net/virtio/virtio_pci.c
> > > 
> > > ...
> > > > @@ -610,9 +613,13 @@ virtio_read_caps(struct rte_pci_device *dev,
> > > > struct virtio_hw *hw)
> > > >     hw->common_cfg = get_cfg_addr(dev,
> > > > &cap);
> > > >     break;
> > > >     case VIRTIO_PCI_CAP_NOTIFY_CFG:
> > > > -   rte_pci_read_config(dev, &hw-
> > > > > notify_off_multiplier,
> > > > 
> > > > -   4, pos + sizeof(cap));
> > > > -   hw->notify_base = get_cfg_addr(dev,
> > > > &cap);
> > > > +   /* Avoid half-reads into hw */
> > > > +   ret = rte_pci_read_config(dev,
> > > > &multiplier,
> > > > 4,
> > > > +   pos + sizeof(cap));
> > > > +   if (ret == 4) {
> > > > +   hw->notify_off_multiplier =
> > > > multiplier;
> > > > +   hw->notify_base =
> > > > get_cfg_addr(dev,
> > > > &cap);
> > > > +   }
> > > >     break;
> > > >     case VIRTIO_PCI_CAP_DEVICE_CFG:
> > > >     hw->dev_cfg = get_cfg_addr(dev, &cap);
> > > 
> > > Tiwei: not 100% sure what's the best way to handle a failure here,
> > > this
> > > will avoid writing to *hw in case of errors. Let me know if this is
> > > OK.
> > 
> > My concern is about reading the virtio capability directly.
> > With this patch, it will give up reading other capabilities
> > when failed to read a full virtio PCI capability structure
> > (the returned bytes are less than the expected bytes) even
> > though when the capability it's reading isn't a virtio vendor
> > specific capability. I'm not quite sure whether it will bring
> > any bad consequence. How about just reading the first two
> > bytes first? Something like this:
> > 
> > https://github.com/freebsd/freebsd/blob/72445a41b3ff/sys/dev/pci/pci.
> > c#L1491-L1497
> 
> I'm not sure, to be honest. That bit didn't give me trouble at all, so
> at this point I'd much rather leave it alone so that the maintainers
> can take care of it how they see fit, if necessary :-)
> 
> I've sent a v3 that removes that individual change.

My concern isn't about the above change (which handled the
errors when reading multiplier). In fact, above change looks
good to me! :-)  I mean the below changes in this patch:

while (pos) {
ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
-   if (ret < 0) {
-   PMD_INIT_LOG(ERR,
-   "failed to read pci cap at pos: %x", pos);
+   if (ret != sizeof(cap)) {
+   PMD_INIT_LOG(DEBUG,
+"failed to read pci cap at pos: %x ret %d",
+pos, ret);
break;
}

With this change, it will give up reading other capabilities
when failed to read a full virtio PCI capability structure
(the returned bytes are less than the expected bytes) even
though when the capability it's reading isn't a virtio vendor
specific capability. Maybe it would be better to read the
first two bytes first and check whether it's the capability
we want to parse (i.e. vendor capability and MSIX capability).
Something like this:

https://github.com/freebsd/freebsd/blob/72445a41b3ff/sys/dev/pci/pci.c#L1491-L1497

How do you think?

Thanks


Re: [dpdk-dev] [PATCH v2] hash table: add an iterator over conflicting entries

2018-08-20 Thread Honnappa Nagarahalli


-Original Message-
From: Michel Machado  
Sent: Saturday, August 18, 2018 6:08 PM
To: Honnappa Nagarahalli ; Fu, Qiaobin 
; Richardson, Bruce ; De Lara 
Guarch, Pablo 
Cc: dev@dpdk.org; Doucette, Cody, Joseph ; Wang, Yipeng1 
; Wiles, Keith ; Gobriel, Sameh 
; Tai, Charlie ; Stephen 
Hemminger ; nd 
Subject: Re: [dpdk-dev] [PATCH v2] hash table: add an iterator over conflicting 
entries

On 08/17/2018 03:41 PM, Honnappa Nagarahalli wrote:
> Can you elaborate more on using ' struct rte_conflict_iterator_state' as the 
> argument for the API?
> 
> If the API signature is changed to: rte_hash_iterate_conflict_entries (const 
> struct rte_hash *h, void **key, void **data, const hash_sig_t sig, struct 
> rte_conflict_iterator_state *state) - it will be inline with the existing 
> APIs. Contents of 'state' must be initialized to 0 for the first call. This 
> will also avoid creating 'rte_hash_iterator_conflict_entries_init' API.

Testing `state' every time rte_hash_iterate_conflict_entries() is called to 
find out if it's the first call of the iterator will possibly add some small, 
but unnecessary, overhead on
rte_hash_iterate_conflict_entries() and constraints on struct 
rte_conflict_iterator_state. Moreover,
rte_hash_iterator_conflict_entries_init() enables one to easily add variations 
of the init function to initialize the state (e.g. using a key instead of a 
sig) and still use the exactly same iterator.

IMO, I think, this over-head will be trivial. Looking at the function 
'rte_hash_iterate_conflict_entries' the check for '(__state->vnext < 
RTE_HASH_BUCKET_ENTRIES * 2)' already exists. If the primary/secondary bucket 
indices are calculated as well in 'rte_hash_iterate_conflict_entries' API 
('rte_hash_iterate' API does such calculations), storing them in the state can 
be avoided. I am wondering if it makes sense to benchmark with these changes 
and then take a decision?
 
[ ]'s
Michel Machado


Re: [dpdk-dev] [PATCH v1 0/5] Enable hotplug in vfio

2018-08-20 Thread Jeff Guo

hi, gaetan


On 8/20/2018 5:15 PM, Gaëtan Rivet wrote:

Hi Jeff,

On Fri, Aug 17, 2018 at 06:51:26PM +0800, Jeff Guo wrote:

As we may know that the process of hotplug is different between igb_uio
and vfio. For igb_uio, it could use uevent notification and memory
failure handle mechanism for hotplug. But for vfio, when device is be
hotplug-out, the uevent can not be detected immediately, because of the
vfio kernel module will use a special mechanism to guaranty the pci
device would not be deleted until the user space release the resources,
so it will use another event “req notifier” at first to notify user space
to release resources for hotplug.

This patch will add a new interrupt type of req notifier in eal interrupt,
and add the new interrupt handler in pci device to handle the req device
event. When the req notifier be detected, it can trigger the device event
callback process to process for hotplug. With this mechanism, hotplug
could be enable in vfio.


The REQ event seems VFIO specific.
Why not leave it within the PCI bus, around the vfio glue, and leave the
EAL unmodified?


Sorry i don' t see if there are any problem. Firstly, as we can see the 
eal interrupt type, it cover the ext/uio/vfio/dev event types, which are 
not general for all platform/bus/driver type.
so i think base on the current framework, the interrupt type structure 
should be considerate as combined set, it could extend for other adding 
interrupts case by case. And secondly, i don't know what
is your way about leave it within the PCI bus, because i need to use the 
eal interrupt epoll to process this req interrupt which is familiar with 
other interrupts. What do you think about that, if you still

have better suggestion, please detail it for clarify.


Jeff Guo (5):
   eal: add a new req notifier to eal interrupt
   eal: add a new req event to device event
   eal: modify device event callback process func
   pci: add req handler field to generic pci device
   vfio: enable vfio hotplug by req notifier handler

  drivers/bus/pci/linux/pci_vfio.c   | 104 +
  drivers/bus/pci/pci_common.c   |  10 ++
  drivers/bus/pci/rte_bus_pci.h  |   1 +
  lib/librte_eal/common/eal_common_dev.c |   5 +-
  lib/librte_eal/common/eal_private.h|  12 ---
  lib/librte_eal/common/include/rte_dev.h|  20 +++-
  lib/librte_eal/common/include/rte_eal_interrupts.h |   1 +
  lib/librte_eal/linuxapp/eal/eal_dev.c  |   2 +-
  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  71 ++
  lib/librte_ethdev/rte_ethdev.c |   3 +-
  10 files changed, 212 insertions(+), 17 deletions(-)

--
2.7.4





Re: [dpdk-dev] [PATCH v1 2/5] eal: add a new req event to device event

2018-08-20 Thread Jeff Guo

hi, andrew


On 8/20/2018 6:37 PM, Andrew Rybchenko wrote:

On 17.08.2018 13:51, Jeff Guo wrote:

Add a new req event in eal device event for vfio hotplug. When the req
request send from the vfio kernel module be detected, vfio userpace
driver could use this event to notify the app to handler it.

Signed-off-by: Jeff Guo 
---
  lib/librte_eal/common/include/rte_dev.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/include/rte_dev.h 
b/lib/librte_eal/common/include/rte_dev.h

index ff580a0..0324c84 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -30,6 +30,7 @@ extern "C" {
  enum rte_dev_event_type {
  RTE_DEV_EVENT_ADD,/**< device being added */
  RTE_DEV_EVENT_REMOVE,/**< device being removed */
+RTE_DEV_EVENT_REQ,/**< device being removed */


Comment is the copy of previous one.



You are right here, even we process these type in the same way but 
should be considered it as the different type if we exactly want to add 
new one.
so base on the interpret from the vfio kernel driver, this req event is 
used to require the user space to release the device resources, so it 
should be
interpret it by "device release request". If you object and have other 
better idea, let me know.



  RTE_DEV_EVENT_MAX/**< max value of this enum */
  };