Re: [dpdk-dev] dmadev discussion summary

2021-07-04 Thread Jerin Jacob
On Sat, Jul 3, 2021 at 5:30 PM Morten Brørup  wrote:
>
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of fengchengwen
> > Sent: Saturday, 3 July 2021 11.45
> >
> > On 2021/7/3 16:53, Morten Brørup wrote:
> > >> From: fengchengwen [mailto:fengcheng...@huawei.com]
> > >> Sent: Saturday, 3 July 2021 02.32
> > >>
> > >> On 2021/7/2 22:57, Morten Brørup wrote:
> >  In the DPDK framework, many data-plane API names contain queues.
> > >> e.g.
> >  eventdev/crypto..
> >  The concept of virt queues has continuity.
> > >>>
> > >>> I was also wondering about the name "virtual queue".
> > >>>
> > >>> Usually, something "virtual" would be an abstraction of something
> > >> physical, e.g. a software layer on top of something physical.
> > >>>
> > >>> Back in the days, a "DMA channel" used to mean a DMA engine on a
> > CPU.
> > >> If a CPU had 2 DMA channels, they could both be set up
> > simultaneously.
> > >>>
> > >>> The current design has the "dmadev" representing a CPU or other
> > chip,
> > >> which has one or more "HW-queues" representing DMA channels (of the
> > >> same type), and then "virt-queue" as a software abstraction on top,
> > for
> > >> using a DMA channel in different ways through individually
> > configured
> > >> contexts (virt-queues).
> > >>>
> > >>> It makes sense to me, although I would consider renaming "HW-queue"
> > >> to "channel" and perhaps "virt-queue" to "queue".
> > >>
> > >> The 'DMA channel' is more used than 'DMA queue', at least google
> > show
> > >> that there are at least 20+ times more.
> > >>
> > >> It's a good idea build the abstraction layer: queue <> channel <>
> > dma-
> > >> controller.
> > >> In this way, the meaning of each layer is relatively easy to
> > >> distinguish literally.
> > >>
> > >> will fix in V2
> > >>
> > >
> > > After re-reading all the mails in this thread, I have found one more
> > important high level detail still not decided:
> > >
> > > Bruce had suggested flattening the DMA channels, so each dmadev
> > represents a DMA channel. And DMA controllers with multiple DMA
> > channels will have to instantiate multiple dmadevs, one for each DMA
> > channel.
> >
> > The dpaa2_qdma support multiple DMA channels, I looked into the
> > dpaa2_qdma
> > and found the control-plane interacts with the kernel, so if we use the
> > flattening model, there maybe interactions between dmadevs.
>
> It is perfectly acceptable for the control-plane DMA controller functions to 
> interact across multiple dmadevs, not being thread safe and using locks etc. 
> to protect critical regions accessing shared registers.
>
> The key question is: Can the data-plane dmadev functions (rte_dma_copy() 
> etc.) be implemented to be thread safe, so multiple threads can use 
> data-plane dmadev functions simultaneously?

It can . if we need it in that way.

>
> >
> > >
> > > Just like a four port NIC instantiates four ethdevs.
> > >
> > > Then, like ethdevs, there would only be two abstraction layers:
> > dmadev <> queue, where a dmadev is a DMA channel on a DMA controller.
> >
> > the dmadev <> channel <> queue model, there are three abstraction
> > layers,
> > and two abstraction layers.
> >
> > >
> > > However, this assumes that the fast path functions on the individual
> > DMA channels of a DMA controller can be accessed completely
> > independently and simultaneously by multiple threads. (Otherwise, the
> > driver would need to implement critical regions or locking around
> > accessing the common registers in the DMA controller shared by the DMA
> > channels.)
> >
> > Yes, this scheme has a big implicit dependency, that is, the channels
> > are
> > independent of each other.
> >
> > >
> > > Unless any of the DMA controller vendors claim that this assumption
> > about independence of the DMA channels is wrong, I strongly support
> > Bruce's flattening suggestion.
> > >
> > > -Morten
> > >
> >
>


Re: [dpdk-dev] dmadev discussion summary

2021-07-04 Thread Jerin Jacob
On Sat, Jul 3, 2021 at 5:54 PM Morten Brørup  wrote:
>
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Saturday, 3 July 2021 11.09
> >
> > On Sat, Jul 3, 2021 at 2:23 PM Morten Brørup 
> > wrote:
> > >
> > > > From: fengchengwen [mailto:fengcheng...@huawei.com]
> > > > Sent: Saturday, 3 July 2021 02.32
> > > >
> > > > On 2021/7/2 22:57, Morten Brørup wrote:
> > > > >> In the DPDK framework, many data-plane API names contain queues.
> > > > e.g.
> > > > >> eventdev/crypto..
> > > > >> The concept of virt queues has continuity.
> > > > >
> > > > > I was also wondering about the name "virtual queue".
> > > > >
> > > > > Usually, something "virtual" would be an abstraction of something
> > > > physical, e.g. a software layer on top of something physical.
> > > > >
> > > > > Back in the days, a "DMA channel" used to mean a DMA engine on a
> > CPU.
> > > > If a CPU had 2 DMA channels, they could both be set up
> > simultaneously.
> > > > >
> > > > > The current design has the "dmadev" representing a CPU or other
> > chip,
> > > > which has one or more "HW-queues" representing DMA channels (of the
> > > > same type), and then "virt-queue" as a software abstraction on top,
> > for
> > > > using a DMA channel in different ways through individually
> > configured
> > > > contexts (virt-queues).
> > > > >
> > > > > It makes sense to me, although I would consider renaming "HW-
> > queue"
> > > > to "channel" and perhaps "virt-queue" to "queue".
> > > >
> > > > The 'DMA channel' is more used than 'DMA queue', at least google
> > show
> > > > that there are at least 20+ times more.
> > > >
> > > > It's a good idea build the abstraction layer: queue <> channel <>
> > dma-
> > > > controller.
> > > > In this way, the meaning of each layer is relatively easy to
> > > > distinguish literally.
> > > >
> > > > will fix in V2
> > > >
> > >
> > > After re-reading all the mails in this thread, I have found one more
> > important high level detail still not decided:
> > >
> > > Bruce had suggested flattening the DMA channels, so each dmadev
> > represents a DMA channel. And DMA controllers with multiple DMA
> > channels will have to instantiate multiple dmadevs, one for each DMA
> > channel.
> > >
> > > Just like a four port NIC instantiates four ethdevs.
> > >
> > > Then, like ethdevs, there would only be two abstraction layers:
> > dmadev <> queue, where a dmadev is a DMA channel on a DMA controller.
> > >
> > > However, this assumes that the fast path functions on the individual
> > DMA channels of a DMA controller can be accessed completely
> > independently and simultaneously by multiple threads. (Otherwise, the
> > driver would need to implement critical regions or locking around
> > accessing the common registers in the DMA controller shared by the DMA
> > channels.)
> > >
> > > Unless any of the DMA controller vendors claim that this assumption
> > about independence of the DMA channels is wrong, I strongly support
> > Bruce's flattening suggestion.
> >
> > It is wrong from alteast octeontx2_dma PoV.
> >
> > # The PCI device is DMA controller where the driver/device is
> > mapped.(As device driver is based on PCI bus, We dont want to have
> > vdev for this)
> > # The PCI device has HW queue(s)
> > # Each HW queue has different channels.
> >
> > In the current configuration, we have only one queue per device and it
> > has 4 channels. 4 channels are not threaded safe as it is based on
> > single queue.
>
> Please clarify "current configuration": Is that a configuration modifiable by 
> changing some software/driver, or is it the chip that was built that way in 
> the RTL code?

We have 8 queues per SoC, Based on some of HW versions it can be
configured as (a) or (b) using FW settings.
a) One PCI devices with 8 Queues
b) 8 PCI devices with each one has one queue.

Everyone is using mode (b) as it helps 8 different applications to use
DMA as if one application binds the PCI device other applications can
not use the same PCI device.
If one application needs 8 queues, it is possible that 8 dmadevice can
be bound to a single application with mode (b).


I think, in above way we can flatten to  <> 

>
> >
> > I think, if we need to flatten it, I think, it makes sense to have
> > dmadev <> channel (and each channel can have thread-safe capability
> > based on how it mapped on HW queues based on the device driver
> > capability).
>
> The key question is how many threads can independently call data-plane dmadev 
> functions (rte_dma_copy() etc.) simultaneously. If I understand your 
> explanation correctly, only one - because you only have one DMA device, and 
> all access to it goes through a single hardware queue.
>
> I just realized that although you only have one DMA Controller with only one 
> HW queue, your four DMA channels allows four sequentially initiated 
> transactions to be running simultaneously. Does the application have any 
> benefit by knowing that the dmadev can have multiple ongoin

Re: [dpdk-dev] NUMA node/socket

2021-07-04 Thread Andrew Rybchenko
On 7/4/21 4:53 AM, Thomas Monjalon wrote:
> 04/07/2021 03:38, Thomas Monjalon:
>> There are some mix between NUMA node and socket IDs in DPDK.
>> Examples:
>>  * rte_eth_dev_socket_id() returns the NUMA node.
>>  * rte_malloc use sockets to allocate the memory
>>
>> Is it critical?
> 
> There is a function, implemented for Windows only,
> which distinguishes clearly node and socket
> but it assumes there is only 1 node per socket:
> 
> unsigned int
> eal_socket_numa_node(unsigned int socket_id)
> {
> return cpu_map.sockets[socket_id].node_id;
> }
> 
> Reminder: AMD can be configured to have multiple nodes per socket.

Taking the reminder into account the topic definitely
requires improvements.

I apologize for my ignorance, but
Is socket ID really interesting to anybody in DPDK?
If no, we should just clarify terminology and switch
to NUMA node everywhere.


Re: [dpdk-dev] NUMA node/socket

2021-07-04 Thread Thomas Monjalon
04/07/2021 10:27, Andrew Rybchenko:
> On 7/4/21 4:53 AM, Thomas Monjalon wrote:
> > 04/07/2021 03:38, Thomas Monjalon:
> >> There are some mix between NUMA node and socket IDs in DPDK.
> >> Examples:
> >>* rte_eth_dev_socket_id() returns the NUMA node.
> >>* rte_malloc use sockets to allocate the memory
> >>
> >> Is it critical?
> > 
> > There is a function, implemented for Windows only,
> > which distinguishes clearly node and socket
> > but it assumes there is only 1 node per socket:
> > 
> > unsigned int
> > eal_socket_numa_node(unsigned int socket_id)
> > {
> > return cpu_map.sockets[socket_id].node_id;
> > }
> > 
> > Reminder: AMD can be configured to have multiple nodes per socket.
> 
> Taking the reminder into account the topic definitely
> requires improvements.
> 
> I apologize for my ignorance, but
> Is socket ID really interesting to anybody in DPDK?

I think the socket ID might be interesting for the threads,
but not for memory or devices.

> If no, we should just clarify terminology and switch
> to NUMA node everywhere.

I have the same opinion as Andrew.
If socket ID is required, it could be deduced from the NUMA node
or from the CPU core.




Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

2021-07-04 Thread Jerin Jacob
On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng  wrote:
>
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng 

Thanks for v1.

I would suggest finalizing  lib/dmadev/rte_dmadev.h before doing the
implementation so that you don't need
to waste time on rewoking the implementation.

Comments inline.

> ---
>  MAINTAINERS  |   4 +
>  config/rte_config.h  |   3 +
>  lib/dmadev/meson.build   |   6 +
>  lib/dmadev/rte_dmadev.c  | 438 +
>  lib/dmadev/rte_dmadev.h  | 919 
> +++
>  lib/dmadev/rte_dmadev_core.h |  98 +
>  lib/dmadev/rte_dmadev_pmd.h  | 210 ++
>  lib/dmadev/version.map   |  32 ++

Missed to update doxygen. See doc/api/doxy-api.conf.in
Use meson  -Denable_docs=true to verify the generated doxgen doc.

>  lib/meson.build  |   1 +
>  9 files changed, 1711 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..2019783 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>

Add EXPERIMENTAL

> +Dma device API
> +M: Chengwen Feng 
> +F: lib/dmadev/
> +
>

> new file mode 100644
> index 000..a94e839
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,438 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Sort in alphabetical order.

> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];

# Please check have you missed any multiprocess angle.
lib/regexdev/rte_regexdev.c is latest device class implemented in dpdk and
please check *rte_regexdev_shared_data scheme.


# Missing dynamic log for this library.


> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 000..f74fc6a
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,919 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.

It would be nice to add other companies' names who have contributed to
the specification.

> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The generic DMA device diagram:
> + *
> + * 
> + *| HW-queue | | HW-queue |
> + * 
> + *   \/
> + *\  /
> + * \/
> + *  
> + *  |dma-controller|
> + *  
> + *
> + *   The DMA could have multiple HW-queues, each HW-queue could have multiple
> + *   capabilities, e.g. whether to support fill operation, supported DMA
> + *   transfter direction and etc.

typo

> + *
> + * The DMA framework is built on the following abstraction model:
> + *
> + * 
> + * |virt-queue||virt-queue|
> + * 
> + *\   /
> + * \ /
> + *  \   /
> + * 
> + *| HW-queue | | HW-queue |
> + * 
> + *   \/
> + *\  /
> + * \/
> + * --
> + * | dmadev |
> + * --

Continuing the discussion with @Morten Brørup , I think, we need to
finalize the model.

> + *   a) The DMA operation request must be submitted to the virt queue, virt
> + *  queues must be created based on HW queues, the DMA device could have
> + *  multiple HW queues.
> + *   b) The virt queues on the same HW-queue could represent different 
> contexts,
> + *  e.g. user could create virt-queue-0 on HW-queue-0 for mem-to-mem
> + *  transfer scenario, and create virt-queue-1 on the same HW-queue for
> + *  mem-to-dev transfer scenario.
> + *   NOTE: user could also create multiple virt queues for mem-to-mem 
> transfer
> + * scenario as long as the corresponding driver supports.
> + *
> + * The control plane APIs include configu

Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

2021-07-04 Thread Andrew Rybchenko
On 7/2/21 4:18 PM, Chengwen Feng wrote:
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.

"This patch introduces ... " -> "Introduce ..."

> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng 

[snip]

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..2019783 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>  
> +Dma device API

Dma -> DMA

> +M: Chengwen Feng 
> +F: lib/dmadev/
> +
>  
>  Memory Pool Drivers
>  ---

[snip]

> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> new file mode 100644
> index 000..a94e839
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,438 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> +
> +uint16_t
> +rte_dmadev_count(void)
> +{
> + uint16_t count = 0;
> + uint16_t i;
> +
> + for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> + if (rte_dmadevices[i].attached)
> + count++;
> + }
> +
> + return count;
> +}
> +
> +int
> +rte_dmadev_get_dev_id(const char *name)
> +{
> + uint16_t i;
> +
> + if (name == NULL)
> + return -EINVAL;
> +
> + for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++)
> + if ((strcmp(rte_dmadevices[i].name, name) == 0) &&
> + (rte_dmadevices[i].attached == RTE_DMADEV_ATTACHED))
> + return i;
> +
> + return -ENODEV;
> +}
> +
> +int
> +rte_dmadev_socket_id(uint16_t dev_id)
> +{
> + struct rte_dmadev *dev;
> +
> + RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> + dev = &rte_dmadevices[dev_id];
> +
> + return dev->socket_id;
> +}
> +
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
> +{
> + struct rte_dmadev *dev;
> + int diag;
> +
> + RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> + RTE_FUNC_PTR_OR_ERR_RET(dev_info, -EINVAL);
> +
> + dev = &rte_dmadevices[dev_id];
> +
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -ENOTSUP);
> +
> + memset(dev_info, 0, sizeof(struct rte_dmadev_info));
> + diag = (*dev->dev_ops->dev_info_get)(dev, dev_info);
> + if (diag != 0)
> + return diag;
> +
> + dev_info->device = dev->device;
> + dev_info->driver_name = dev->driver_name;
> + dev_info->socket_id = dev->socket_id;
> +
> + return 0;
> +}
> +
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf)
> +{
> + struct rte_dmadev *dev;
> + int diag;
> +
> + RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> + RTE_FUNC_PTR_OR_ERR_RET(dev_conf, -EINVAL);
> +
> + dev = &rte_dmadevices[dev_id];
> +
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP);
> +
> + if (dev->started) {
> + RTE_DMADEV_LOG(ERR,
> +"device %u must be stopped to allow configuration", dev_id);
> + return -EBUSY;
> + }
> +
> + diag = (*dev->dev_ops->dev_configure)(dev, dev_conf);
> + if (diag != 0)
> + RTE_DMADEV_LOG(ERR, "device %u dev_configure failed, ret = %d",
> +dev_id, diag);
> + else
> + dev->attached = 1;
> +
> + return diag;
> +}
> +
> +int
> +rte_dmadev_start(uint16_t dev_id)
> +{
> + struct rte_dmadev *dev;
> + int diag;
> +
> + RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +
> + dev = &rte_dmadevices[dev_id];
> + if (dev->started != 0) {
> + RTE_DMADEV_LOG(ERR, "device %u already started", dev_id);
> + return 0;
> + }
> +
> + if (dev->dev_ops->dev_start == NULL)
> + goto mark_started;
> +
> + diag = (*dev->dev_ops->dev_start)(dev);
> + if (diag != 0)
> + return diag;
> +
> +mark_started:
> + dev->started = 1;
> + return 0;
> +}
> +
> +int
> +rte_dmadev_stop(uint16_t dev_id)
> +{
> + struct rte_dmadev *dev;
> + int diag;
> +
> + RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +
> + dev = &rte_dmadevices[dev_id];
> +
> + if (dev->started == 0) {
> + RTE_DMADEV_LOG(ERR, "device %u already stopped", dev_id);
> + return 0;
> + }
> +
> + if (dev->dev_ops->dev_stop == NULL)
> + goto mark_stopped;
> +
> + diag = (*dev->dev_ops->dev_stop)(dev);
> + if (diag != 0)
> + return diag;
> +
> +mark_stopped:
> + dev->started = 0;
> + return 0;
> +}
> +
> +int
> +rte_dm

Re: [dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5

2021-07-04 Thread Andrew Rybchenko
I guess review from net/mlx5 maintainers would be very useful
here. Adding in Cc.

On 6/22/21 12:25 PM, Martin Havlik wrote:
> This patchset fixes the inability to use dedicated queues
> on mlx5 PMD due to RTE Flow rule attempted creation prior to
> starting the device.
> Missing return value check and copy paste error near the rule
> creation have also been fixed.
> 
> Cc: Chas Williams 
> Cc: "Min Hu (Connor)" 
> Cc: Declan Doherty 
> Cc: Tomasz Kulasek 
> Cc: Jan Viktorin 
> 
> Martin Havlik (3):
>   net/bonding: fix proper return value check and correct log message
>   net/bonding: fix not checked return value
>   net/bonding: start ethdev prior to setting 8023ad flow
> 
>  drivers/net/bonding/rte_eth_bond_pmd.c | 33 +++---
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 



Re: [dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow

2021-07-04 Thread Matan Azrad


From: Havlík Martin
> Dne 2021-06-23 09:04, Min Hu (Connor) napsal:
> > 在 2021/6/22 17:25, Martin Havlik 写道:
> >> When dedicated queues are enabled, mlx5 PMD fails to install RTE
> >> Flows if the underlying ethdev is not started:
> >> bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set:
> port
> >> not started (slave_port=0 queue_id=1)
> >>
> > Why mlx5 PMD doing flow create relys on port started ?
> > I noticed other PMDs did not has that reliance.
> >
> After looking into it, I really can't answer this mlx5 centered question.
> Closest related info we found so far is the 5th point in
> https://doc.dpdk.org/guides/prog_guide/rte_flow.html#caveats
> but it only specifies it's the application's responsibility and that flow 
> rules are
> assumed invalid after port stop/close/restart but doesn't say anything about
>  vs  where 
> the
> former is the point of failure on mlx5.
> I'm addressing the maintainers for mlx5, who might know a bit more on the
> topic.

From rte_ethdev.h

* Please note that some configuration is not stored between calls to
 * rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration will
 * be retained:
 *
 * - MTU
 * - flow control settings
 * - receive mode configuration (promiscuous mode, all-multicast mode,
 *   hardware checksum mode, RSS/VMDQ settings etc.)
 * - VLAN filtering configuration
 * - default MAC address
 * - MAC addresses supplied to MAC address array
 * - flow director filtering mode (but not filtering rules)
 * - NIC queue statistics mappings


Mlx5 assumes flows are allowed to be configured only after rte_eth_dev_start().
Before start \ after stop - no flow is valid anymore.

Matan

> >> Signed-off-by: Martin Havlik 
> >> Cc: Jan Viktorin 
> >> ---
> >>   drivers/net/bonding/rte_eth_bond_pmd.c | 26
> >> ++
> >>   1 file changed, 18 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index a6755661c..fea3bc537 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev
> >> *bonded_eth_dev,
> >>  rte_flow_destroy(slave_eth_dev->data->port_id,
> >>
> >>  internals-
> >mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
> >>  &flow_error);
> >> +}
> >>   +  /* Start device */
> >> +errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> >> +if (errval != 0) {
> >> +RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> >> +slave_eth_dev->data->port_id, errval);
> >> +return -1;
> >> +}
> >> +
> >> +if (internals->mode == BONDING_MODE_8023AD &&
> >> +internals->mode4.dedicated_queues.enabled == 1)
> >> + {
> >>  errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
> >>  slave_eth_dev->data->port_id);
> >>  if (errval != 0) {
> >>  RTE_BOND_LOG(ERR,
> >>  "bond_ethdev_8023ad_flow_set: port=%d, err 
> >> (%d)",
> >>  slave_eth_dev->data->port_id, errval);
> >> +
> >> +errval = 
> >> rte_eth_dev_stop(slave_eth_dev->data->port_id);
> >> +if (errval < 0) {
> >> +RTE_BOND_LOG(ERR,
> >> +"rte_eth_dev_stop: port=%d, err (%d)",
> >> +slave_eth_dev->data->port_id, errval);
> >> +}
> >>  return errval;
> >>  }
> >>  }
> >>   -  /* Start device */
> >> -errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
> >> -if (errval != 0) {
> >> -RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
> >> -slave_eth_dev->data->port_id, errval);
> >> -return -1;
> >> -}
> >> -
> >>  /* If RSS is enabled for bonding, synchronize RETA */
> >>  if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode &
> >> ETH_MQ_RX_RSS) {
> >>  int i;
> >>


Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

2021-07-04 Thread Matan Azrad



From: Chengwen Feng
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
Did you consider RTE_COMP_ALGO_NULL xform in compressdev library?

> Signed-off-by: Chengwen Feng 
> ---
>  MAINTAINERS  |   4 +
>  config/rte_config.h  |   3 +
>  lib/dmadev/meson.build   |   6 +
>  lib/dmadev/rte_dmadev.c  | 438 +
>  lib/dmadev/rte_dmadev.h  | 919
> +++
>  lib/dmadev/rte_dmadev_core.h |  98 +
>  lib/dmadev/rte_dmadev_pmd.h  | 210 ++
>  lib/dmadev/version.map   |  32 ++
>  lib/meson.build  |   1 +
>  9 files changed, 1711 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..2019783 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
> 
> +Dma device API
> +M: Chengwen Feng 
> +F: lib/dmadev/
> +
> 
>  Memory Pool Drivers
>  ---
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c..331a431 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -81,6 +81,9 @@
>  /* rawdev defines */
>  #define RTE_RAWDEV_MAX_DEVS 64
> 
> +/* dmadev defines */
> +#define RTE_DMADEV_MAX_DEVS 64
> +
>  /* ip_fragmentation defines */
>  #define RTE_LIBRTE_IP_FRAG_MAX_FRAG 4
>  #undef RTE_LIBRTE_IP_FRAG_TBL_STAT
> diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
> new file mode 100644
> index 000..c918dae
> --- /dev/null
> +++ b/lib/dmadev/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 HiSilicon Limited.
> +
> +sources = files('rte_dmadev.c')
> +headers = files('rte_dmadev.h', 'rte_dmadev_pmd.h')
> +indirect_headers += files('rte_dmadev_core.h')
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> new file mode 100644
> index 000..a94e839
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,438 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> +
> +uint16_t
> +rte_dmadev_count(void)
> +{
> +   uint16_t count = 0;
> +   uint16_t i;
> +
> +   for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +   if (rte_dmadevices[i].attached)
> +   count++;
> +   }
> +
> +   return count;
> +}
> +
> +int
> +rte_dmadev_get_dev_id(const char *name)
> +{
> +   uint16_t i;
> +
> +   if (name == NULL)
> +   return -EINVAL;
> +
> +   for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++)
> +   if ((strcmp(rte_dmadevices[i].name, name) == 0) &&
> +   (rte_dmadevices[i].attached == RTE_DMADEV_ATTACHED))
> +   return i;
> +
> +   return -ENODEV;
> +}
> +
> +int
> +rte_dmadev_socket_id(uint16_t dev_id)
> +{
> +   struct rte_dmadev *dev;
> +
> +   RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +   dev = &rte_dmadevices[dev_id];
> +
> +   return dev->socket_id;
> +}
> +
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
> +{
> +   struct rte_dmadev *dev;
> +   int diag;
> +
> +   RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +   RTE_FUNC_PTR_OR_ERR_RET(dev_info, -EINVAL);
> +
> +   dev = &rte_dmadevices[dev_id];
> +
> +   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -
> ENOTSUP);
> +
> +   memset(dev_info, 0, sizeof(struct rte_dmadev_info));
> +   diag = (*dev->dev_ops->dev_info_get)(dev, dev_info);
> +   if (diag != 0)
> +   return diag;
> +
> +   dev_info->device = dev->device;
> +   dev_info->driver_name = dev->driver_name;
> +   dev_info->socket_id = dev->socket_id;
> +
> +   return 0;
> +}
> +
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf
> *dev_conf)
> +{
> +   struct rte_dmadev *dev;
> +   int diag;
> +
> +   RTE_DMADEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +   RTE_FUNC_PTR_OR_ERR_RET(dev_conf, -EINVAL);
> +
> +   dev = &rte_dmadevices[dev_id];
> +
> +   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -
> ENOTSUP);
> +
> +   if (dev->started) {
> +   RTE_DMADEV_LOG(ERR,
> +  "dev

Re: [dpdk-dev] [PATCH v2] add testing requirement for new PMDs

2021-07-04 Thread Andrew Rybchenko
On 7/1/21 9:49 AM, Mcnamara, John wrote:
>> -Original Message-
>> From: Yigit, Ferruh 
>> Sent: Tuesday, June 29, 2021 9:35 AM
>> To: w...@dpdk.org
>> Cc: dev@dpdk.org; Thomas Monjalon ; Mcnamara, John
>> ; Dmitry Kozlyuk 
>> Subject: [PATCH v2] add testing requirement for new PMDs
>>
>> As discussed in the technical board meeting
>> https://mails.dpdk.org/archives/dev/2021-February/200012.html
>>
>> This is to record that new upstreamed devices tested adequately.
> 
> Acked-by: John McNamara 
> 

Acked-by: Andrew Rybchenko 



Re: [dpdk-dev] [PATCH v6 1/2] devargs: add common key definition

2021-07-04 Thread Andrew Rybchenko
On 6/25/21 2:47 PM, Xueming Li wrote:
> Adds common devargs key definition for "bus", "class" and "driver".
> 
> Acked-by: Thomas Monjalon 
> Signed-off-by: Xueming Li 

Acked-by: Andrew Rybchenko 



[dpdk-dev] [PATCH v4 1/2] kni: rework rte_kni_update_link using ioctl

2021-07-04 Thread Igor Ryzhov
Current implementation doesn't allow us to update KNI carrier if the
interface is not yet UP in kernel. It means that we can't use it in the
same thread which is processing rte_kni_ops.config_network_if, which is
very convenient, because it allows us to have correct carrier status
of the interface right after we enabled it and we don't have to use any
additional thread to track link status.

Propagating speed/duplex/autoneg to the kernel module also allows us to
implement ethtool_ops.get_link_ksettings callback.

Suggested-by: Dan Gora 
Signed-off-by: Igor Ryzhov 
---
 app/test/test_kni.c | 32 ++---
 examples/kni/main.c |  2 +-
 kernel/linux/kni/compat.h   |  4 
 kernel/linux/kni/kni_dev.h  |  5 
 kernel/linux/kni/kni_misc.c | 47 +
 kernel/linux/kni/kni_net.c  | 15 
 lib/kni/rte_kni.c   | 38 --
 lib/kni/rte_kni.h   | 10 
 lib/kni/rte_kni_common.h|  9 +++
 9 files changed, 89 insertions(+), 73 deletions(-)

diff --git a/app/test/test_kni.c b/app/test/test_kni.c
index 96733554b6c4..f9300552b3f6 100644
--- a/app/test/test_kni.c
+++ b/app/test/test_kni.c
@@ -125,6 +125,7 @@ kni_change_mtu(uint16_t port_id, unsigned int new_mtu)
 static int
 test_kni_link_change(void)
 {
+   struct rte_eth_link link;
int ret;
int pid;
 
@@ -135,42 +136,35 @@ test_kni_link_change(void)
}
 
if (pid == 0) {
+   link.link_speed = ETH_SPEED_NUM_10G;
+   link.link_duplex = ETH_LINK_FULL_DUPLEX;
+   link.link_autoneg = ETH_LINK_AUTONEG;
+
printf("Starting KNI Link status change tests.\n");
if (system(IFCONFIG TEST_KNI_PORT" up") == -1) {
ret = -1;
goto error;
}
 
-   ret = rte_kni_update_link(test_kni_ctx, 1);
+   link.link_status = ETH_LINK_UP;
+   ret = rte_kni_update_link(test_kni_ctx, &link);
if (ret < 0) {
printf("Failed to change link state to Up ret=%d.\n",
ret);
goto error;
}
rte_delay_ms(1000);
-   printf("KNI: Set LINKUP, previous state=%d\n", ret);
-
-   ret = rte_kni_update_link(test_kni_ctx, 0);
-   if (ret != 1) {
-   printf(
-   "Failed! Previous link state should be 1, returned %d.\n",
-   ret);
-   goto error;
-   }
-   rte_delay_ms(1000);
-   printf("KNI: Set LINKDOWN, previous state=%d\n", ret);
+   printf("KNI: Set LINKUP\n");
 
-   ret = rte_kni_update_link(test_kni_ctx, 1);
-   if (ret != 0) {
-   printf(
-   "Failed! Previous link state should be 0, returned %d.\n",
+   link.link_status = ETH_LINK_DOWN;
+   ret = rte_kni_update_link(test_kni_ctx, &link);
+   if (ret < 0) {
+   printf("Failed to change link state to Down ret=%d.\n",
ret);
goto error;
}
-   printf("KNI: Set LINKUP, previous state=%d\n", ret);
-
-   ret = 0;
rte_delay_ms(1000);
+   printf("KNI: Set LINKDOWN\n");
 
 error:
if (system(IFCONFIG TEST_KNI_PORT" down") == -1)
diff --git a/examples/kni/main.c b/examples/kni/main.c
index beabb3c848aa..5833037cf1c9 100644
--- a/examples/kni/main.c
+++ b/examples/kni/main.c
@@ -755,7 +755,7 @@ monitor_all_ports_link_status(void *arg)
}
for (i = 0; i < p[portid]->nb_kni; i++) {
prev = rte_kni_update_link(p[portid]->kni[i],
-   link.link_status);
+   &link);
log_link_state(p[portid]->kni[i], prev, &link);
}
}
diff --git a/kernel/linux/kni/compat.h b/kernel/linux/kni/compat.h
index 5f65640d5ed2..995d109c275a 100644
--- a/kernel/linux/kni/compat.h
+++ b/kernel/linux/kni/compat.h
@@ -61,10 +61,6 @@
 #define kni_sock_map_fd(s) sock_map_fd(s, 0)
 #endif
 
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)
-#define HAVE_CHANGE_CARRIER_CB
-#endif
-
 #if LINUX_VERSION_CODE < KERNEL_VERSION(3, 14, 0)
 #define ether_addr_copy(dst, src) memcpy(dst, src, ETH_ALEN)
 #endif
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311ba25..969108cc30f8 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -88,6 +88,11 @@ struct kni_dev {
void *alloc_va[MBUF_BURST_SZ];
 
struct task_struct *usr_tsk;
+
+   /* correct when netif_carrier_ok */
+  

[dpdk-dev] [PATCH v4 2/2] kni: implement basic get_link_ksettings callback

2021-07-04 Thread Igor Ryzhov
Signed-off-by: Igor Ryzhov 
---
 kernel/linux/kni/kni_net.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 99da8d37dd6b..84357bec1341 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -805,9 +805,34 @@ static void kni_get_drvinfo(struct net_device *dev,
strlcpy(info->driver, "kni", sizeof(info->driver));
 }
 
+#ifdef ETHTOOL_GLINKSETTINGS
+static int kni_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *settings)
+{
+   struct kni_dev *kni = netdev_priv(dev);
+
+   settings->base.port = PORT_OTHER;
+
+   if (netif_carrier_ok(dev)) {
+   settings->base.speed = kni->speed;
+   settings->base.duplex = kni->duplex;
+   settings->base.autoneg = kni->autoneg;
+   } else {
+   settings->base.speed = SPEED_UNKNOWN;
+   settings->base.duplex = DUPLEX_UNKNOWN;
+   settings->base.autoneg = AUTONEG_ENABLE;
+   }
+
+   return 0;
+}
+#endif
+
 static const struct ethtool_ops kni_net_ethtool_ops = {
.get_drvinfo= kni_get_drvinfo,
.get_link   = ethtool_op_get_link,
+#ifdef ETHTOOL_GLINKSETTINGS
+   .get_link_ksettings = kni_get_link_ksettings,
+#endif
 };
 
 void
-- 
2.32.0



Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus

2021-07-04 Thread Andrew Rybchenko
On 6/25/21 2:47 PM, Xueming Li wrote:
> Auxiliary bus [1] provides a way to split function into child-devices
> representing sub-domains of functionality. Each auxiliary device
> represents a part of its parent functionality.
> 
> Auxiliary device is identified by unique device name, sysfs path:
>   /sys/bus/auxiliary/devices/
> 
> Devargs legacy syntax ofauxiliary device:

Missing space after 'of'

>   -a auxiliary:[,args...]
> Devargs generic syntax of auxiliary device:
>   -a bus=auxiliary,name=,,/class=,,/driver=,,

Are two commas above intentionall? What for?

> 
> [1] kernel auxiliary bus document:
> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> 
> Signed-off-by: Xueming Li 

With my below notes fixed:

Acked-by: Andrew Rybchenko 

> Cc: Wang Haiyue 
> Cc: Thomas Monjalon 
> Cc: Kinsella Ray 
> ---
>  MAINTAINERS   |   5 +
>  doc/guides/rel_notes/release_21_08.rst|   6 +
>  drivers/bus/auxiliary/auxiliary_common.c  | 411 ++
>  drivers/bus/auxiliary/auxiliary_params.c  |  59 
>  drivers/bus/auxiliary/linux/auxiliary.c   | 141 
>  drivers/bus/auxiliary/meson.build |  16 +
>  drivers/bus/auxiliary/private.h   |  74 
>  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++
>  drivers/bus/auxiliary/version.map |   7 +
>  drivers/bus/meson.build   |   1 +
>  10 files changed, 921 insertions(+)
>  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
>  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
>  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
>  create mode 100644 drivers/bus/auxiliary/meson.build
>  create mode 100644 drivers/bus/auxiliary/private.h
>  create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
>  create mode 100644 drivers/bus/auxiliary/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..eaf691ca6a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
>  Bus Drivers
>  ---
>  
> +Auxiliary bus driver

Shouldn't it be EXPERIMENTAL?

> +M: Parav Pandit 
> +M: Xueming Li 
> +F: drivers/bus/auxiliary/
> +
>  Intel FPGA bus
>  M: Rosen Xu 
>  F: drivers/bus/ifpga/
> diff --git a/doc/guides/rel_notes/release_21_08.rst 
> b/doc/guides/rel_notes/release_21_08.rst
> index a6ecfdf3ce..e7ef4c8a05 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -55,6 +55,12 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
>  
> +* **Added auxiliary bus support.**
> +
> +  Auxiliary bus provides a way to split function into child-devices
> +  representing sub-domains of functionality. Each auxiliary device
> +  represents a part of its parent functionality.
> +
>  
>  Removed Items
>  -
> diff --git a/drivers/bus/auxiliary/auxiliary_common.c 
> b/drivers/bus/auxiliary/auxiliary_common.c
> new file mode 100644
> index 00..8a75306da5
> --- /dev/null
> +++ b/drivers/bus/auxiliary/auxiliary_common.c
> @@ -0,0 +1,411 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "private.h"
> +#include "rte_bus_auxiliary.h"
> +
> +static struct rte_devargs *
> +auxiliary_devargs_lookup(const char *name)
> +{
> + struct rte_devargs *devargs;
> +
> + RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
> + if (strcmp(devargs->name, name) == 0)
> + return devargs;
> + }
> + return NULL;
> +}
> +
> +/*
> + * Test whether the auxiliary device exist

Missing full stop above.

> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak bool
> +auxiliary_dev_exists(const char *name)
> +{
> + RTE_SET_USED(name);
> + return false;
> +}
> +
> +/*
> + * Scan the devices in the auxiliary bus.
> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak int
> +auxiliary_scan(void)
> +{
> + return 0;
> +}
> +
> +/*
> + * Update a device's devargs being scanned.
> + *
> + * @param aux_dev
> + *   AUXILIARY device.
> + */
> +void
> +auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
> +{
> + aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
> +}
> +
> +/*
> + * Match the auxiliary driver and device using driver function.
> + */
> +bool
> +auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> + const struct rte_auxiliary_device *aux_dev)
> +{
> + if (aux_drv->match == NULL)
> + return false;
> + return aux_drv->match(aux_dev->name);
> +}
> +
> +/*
> + * Call th

Re: [dpdk-dev] NUMA node/socket

2021-07-04 Thread Stephen Hemminger
On Sun, 04 Jul 2021 11:28:23 +0200
Thomas Monjalon  wrote:

> 04/07/2021 10:27, Andrew Rybchenko:
> > On 7/4/21 4:53 AM, Thomas Monjalon wrote:  
> > > 04/07/2021 03:38, Thomas Monjalon:  
> > >> There are some mix between NUMA node and socket IDs in DPDK.
> > >> Examples:
> > >>  * rte_eth_dev_socket_id() returns the NUMA node.
> > >>  * rte_malloc use sockets to allocate the memory
> > >>
> > >> Is it critical?  
> > > 
> > > There is a function, implemented for Windows only,
> > > which distinguishes clearly node and socket
> > > but it assumes there is only 1 node per socket:
> > > 
> > > unsigned int
> > > eal_socket_numa_node(unsigned int socket_id)
> > > {
> > > return cpu_map.sockets[socket_id].node_id;
> > > }
> > > 
> > > Reminder: AMD can be configured to have multiple nodes per socket.  
> > 
> > Taking the reminder into account the topic definitely
> > requires improvements.
> > 
> > I apologize for my ignorance, but
> > Is socket ID really interesting to anybody in DPDK?  
> 
> I think the socket ID might be interesting for the threads,
> but not for memory or devices.
> 
> > If no, we should just clarify terminology and switch
> > to NUMA node everywhere.  
> 
> I have the same opinion as Andrew.
> If socket ID is required, it could be deduced from the NUMA node
> or from the CPU core.
> 
> 

Agree that NUMA node should be used everywhere.
There can be some confusion because some architectures define NUMA
for lcore's that have slower interconnect even if they are on the same chip.
Also Dell used to have different terminology for socket number on some machines.
The socket id reported by BIOS started with 1!


Re: [dpdk-dev] NUMA node/socket

2021-07-04 Thread Dmitry Kozlyuk
2021-07-04 11:28 (UTC+0200), Thomas Monjalon:
> 04/07/2021 10:27, Andrew Rybchenko:
> > On 7/4/21 4:53 AM, Thomas Monjalon wrote:  
> > > 04/07/2021 03:38, Thomas Monjalon:  
> > >> There are some mix between NUMA node and socket IDs in DPDK.
> > >> Examples:
> > >>  * rte_eth_dev_socket_id() returns the NUMA node.
> > >>  * rte_malloc use sockets to allocate the memory
> > >>
> > >> Is it critical?  
> > > 
> > > There is a function, implemented for Windows only,
> > > which distinguishes clearly node and socket
> > > but it assumes there is only 1 node per socket:
> > > 
> > > unsigned int
> > > eal_socket_numa_node(unsigned int socket_id)
> > > {
> > > return cpu_map.sockets[socket_id].node_id;
> > > }
> > > 
> > > Reminder: AMD can be configured to have multiple nodes per socket.
> >
> > Taking the reminder into account the topic definitely
> > requires improvements.
> > 
> > I apologize for my ignorance, but
> > Is socket ID really interesting to anybody in DPDK?  
> 
> I think the socket ID might be interesting for the threads,
> but not for memory or devices.
> 
> > If no, we should just clarify terminology and switch
> > to NUMA node everywhere.  
> 
> I have the same opinion as Andrew.
> If socket ID is required, it could be deduced from the NUMA node
> or from the CPU core.

I agree with renaming too.
Everywhere in DPDK "socket ID" really means "NUMA node".

I don't see how exactly socket ID can be deduced from NUMA node or CPU core
(assuming rte_socket_id becomes rte_numa_node_id), but I also can't imagine
why an app would need it. EAL could use NUMA distance info for better memory
management: currently SOCKET_ID_ANY means "current NUMA node or the first one
with enough memory available" while it could be "or the closest one".


Re: [dpdk-dev] [PATCH v2] doc: add link status event pre-conditions

2021-07-04 Thread Thomas Monjalon
02/07/2021 14:58, Andrew Rybchenko:
> On 7/2/21 3:35 PM, Thomas Monjalon wrote:
> > 02/07/2021 10:55, Andrew Rybchenko:
> >> On 6/30/21 4:56 AM, Min Hu (Connor) wrote:
> >>> From: Chengwen Feng 
> >>> +Firmware 1.8.0.0 and later versions support reporting link changes to 
> >>> the PF.
> >>> +Therefore, to use the LSC for the PF driver, ensure that the firmware 
> >>> version
> >>> +also supports reporting link changes.
> >>> +If the VF driver needs to support LSC, special patch must be added:
> >>> +``_.
> >>> +Note: The patch has been uploaded to 5.13 of the Linux kernel mainline.
> > [...]
> >> @Thomas I'm not 100% sure about such links in the
> >> documentation. Please, double-check.
> > 
> > Not sure to understand what you are not 100% sure.
> > Please could you elaborate?
> > 
> 
> I think you replied my question. My goal was just
> to draw your attention to it. Sometimes it is
> undesirable to have links to the external resources
> which could disappear and it would make the
> documentation broken. I hope it is not the case.

In this case, there is no better source than kernel.org :)




Re: [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules

2021-07-04 Thread Thomas Monjalon
02/07/2021 14:53, Andrew Rybchenko:
> On 7/2/21 3:30 PM, Thomas Monjalon wrote:
> > 02/07/2021 10:43, Andrew Rybchenko:
> >> On 7/1/21 4:05 PM, Andrew Rybchenko wrote:
> >>> On 7/1/21 3:34 PM, David Marchand wrote:
>  On Thu, Jul 1, 2021 at 11:22 AM Andrew Rybchenko
>   wrote:
> > The build works fine for me on FC34, but it has
> > libatomic-11.1.1-3.fc34.x86_64 installed.
> 
>  I first produced the issue on my "old" FC32.
>  Afaics, for FC33 and later, gcc now depends on libatomic and the
>  problem won't be noticed.
>  FC32 and before are EOL, but I then reproduced the issue on RHEL 8
>  (and Intel CI reported it on Centos 8 too).
> >>>
> >>> I see. Thanks for the clarification.
> >>>
> >
> > I'd like to understand what we're trying to solve here.
> > Are we trying to make meson to report the missing library
> > correctly?
> >
> > If so, I think I can do simple check using cc.links()
> > which will fail if the library is not found. I'll
> > test that it works as expected if the library is not
> > completely installed.
> >
> 
>  I tried below diff, and it works for me.
>  "works" as in net/sfc gets disabled without libatomic installed:
> > [...]
>   # for gcc compiles we need -latomic for 128-bit atomic ops
>   if cc.get_id() == 'gcc'
>  +code = '''#include 
>  +void main() { printf("Atomilink me.\n"); }
>  +'''
>  +if not cc.links(code, args: '-latomic', name: 'libatomic link 
>  check')
>  +build = false
>  +reason = 'missing dependency, "libatomic"'
>  +subdir_done()
>  +endif
>   ext_deps += cc.find_library('atomic')
>   endif
> >>>
> >>> Many thanks, LGTM. I'll pick it up and add comments why
> >>> it is checked this way.
> >>>
> >>
> >> I've send v4 with the problem fixed. However, I'm afraid
> >> build test systems should be updated to have libatomic
> >> correctly installed. Otherwise, they do not really check
> >> net/sfc build.
> > 
> > When testing on old systems, sfc won't be tested anymore after this 
> > patchset.
> > On recent systems, sfc should be enabled I guess.
> > I don't see how to manage better, sorry.
> > 
> 
> I see. I thought that it is possible to install missing
> package on corresponding systems to make build coverage
> better.
> 
> Now I automatically test build on problematic distros
> with previously missing packages installed. So I have
> internal build coverage anyway.

David asked for installing libatomic:
https://inbox.dpdk.org/ci/CAJFAV8xCNBL4yEZU0c=djgys+13qm7uz7e2qnukmum7eakk...@mail.gmail.com/

We should wait for it to be installed otherwise ABI check will fail.





Re: [dpdk-dev] [RFC] lib/ethdev: add dev configured flag

2021-07-04 Thread Thomas Monjalon
08/05/2021 10:00, Huisong Li:
> Currently, if dev_configure is not invoked or fails to be invoked, users
> can still invoke dev_start successfully. This patch adds a "dev_configured"
> flag in "rte_eth_dev_data" to control whether dev_start can be invoked.
[...]
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -167,7 +167,11 @@ struct rte_eth_dev_data {
>   scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
> OFF(0) */
>   all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>   dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
> */
> - lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> + lro : 1,  /**< RX LRO is ON(1) / OFF(0) */
> + dev_configured : 1;
> + /**< Device configuration state:
> +  * CONFIGURED(1) / NOT CONFIGURED(0).
> +  */

Why not using "enum rte_eth_dev_state"?
Because rte_eth_dev.state is not shared between processes?





Re: [dpdk-dev] [PATCH 2/2] vhost: add thread unsafe async registration functions

2021-07-04 Thread Hu, Jiayu


> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, July 2, 2021 3:41 PM
> To: Hu, Jiayu ; dev@dpdk.org
> Cc: Xia, Chenbo ; Wang, Yinan
> 
> Subject: Re: [PATCH 2/2] vhost: add thread unsafe async registration
> functions
> 
> 
> 
> On 5/28/21 10:11 AM, Jiayu Hu wrote:
> > This patch is to add thread unsafe version for async register and
> 
> s/is to add/adds/
> 
> > unregister functions.
> >
> > Signed-off-by: Jiayu Hu 
> > ---
> >  doc/guides/prog_guide/vhost_lib.rst |  12 +++
> >  lib/vhost/rte_vhost_async.h |  42 ++
> >  lib/vhost/version.map   |   4 +
> >  lib/vhost/vhost.c   | 161 
> > +++-
> >  4 files changed, 178 insertions(+), 41 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst
> > index d18fb98..6b2745f 100644
> > --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -253,6 +253,12 @@ The following is an overview of some key Vhost
> API functions:
> >  vhost invokes this function to get the copy data completed by async
> >  devices.
> >
> > +* ``rte_vhost_async_channel_register_thread_unsafe(vid, queue_id,
> > +features, ops)``
> > +  Register a vhost queue with async copy device channel without
> > +  performing any locking.
> > +
> > +  This function is only safe to call from within vhost callback functions.
> 
> I'm not sure with this, AFAICS .new_device() is called without the lock held,
> so is that safe? Since .new_device() is the first time the app gets required
> device info to use the Vhost API, that might not be an issue, though.

I think so. Although new_device() is called without lock, no data plane threads
can operate vrings if device is not created.

Thanks,
Jiayu


Re: [dpdk-dev] [PATCH v3 1/2] net/ice: factorize firmware loading

2021-07-04 Thread Wang, Haiyue
Hi David,

> -Original Message-
> From: dev  On Behalf Of David Marchand
> Sent: Tuesday, June 29, 2021 16:07
> To: dev@dpdk.org
> Cc: Yang, Qiming ; Zhang, Qi Z 
> Subject: [dpdk-dev] [PATCH v3 1/2] net/ice: factorize firmware loading
> 
> Both "normal" and "dcf" inits have their copy of some firmware loading
> code.
> 
> The DSN query is moved in specific parts for the "normal" and "dcf" init.
> 
> A common helper ice_load_pkg is then introduced and takes an adapter
> pointer as its main input.
> 
> This helper takes care of finding the right firmware file and loading
> it.
> The adapter active_pkg_type field is set by this helper.
> 
> The ice_access macro is removed from the osdep.h header: osdep.h should
> only hosts wrappers for base driver code.
> 
> Signed-off-by: David Marchand 
> ---
>  drivers/net/ice/base/ice_osdep.h |   6 --
>  drivers/net/ice/ice_dcf_parent.c |  97 ++-
>  drivers/net/ice/ice_ethdev.c | 161 +++
>  drivers/net/ice/ice_ethdev.h |   3 +-
>  4 files changed, 88 insertions(+), 179 deletions(-)
> 


> + if (!use_dsn)
> + goto no_dsn;
> +
> + memset(opt_ddp_filename, 0, ICE_MAX_PKG_FILENAME_SIZE);
> + snprintf(opt_ddp_filename, ICE_MAX_PKG_FILENAME_SIZE,
> + "ice-%016" PRIx64 ".pkg", dsn);
> + strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_UPDATES,
> + ICE_MAX_PKG_FILENAME_SIZE);
> + if (!ice_access(strcat(pkg_file, opt_ddp_filename), 0))
> + goto load_fw;
> +
> + strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_DEFAULT,
> + ICE_MAX_PKG_FILENAME_SIZE);
> + if (!ice_access(strcat(pkg_file, opt_ddp_filename), 0))
> + goto load_fw;
> +
> +no_dsn:
> + strncpy(pkg_file, ICE_PKG_FILE_UPDATES, ICE_MAX_PKG_FILENAME_SIZE);
> + if (!ice_access(pkg_file, 0))
> + goto load_fw;
> + strncpy(pkg_file, ICE_PKG_FILE_DEFAULT, ICE_MAX_PKG_FILENAME_SIZE);
> + if (ice_access(pkg_file, 0)) {
>   PMD_INIT_LOG(ERR, "failed to search file path\n");
> - return err;
> + return -1;
>   }
> 
> +load_fw:
>   file = fopen(pkg_file, "rb");
>   if (!file)  {
>   PMD_INIT_LOG(ERR, "failed to open file: %s\n", pkg_file);
>   return -1;
>   }
> 

I'm wondering what's full name for ice firmware in F34, has any *.xz
postfix ? If so, the search method will also needs to be updated, since
we will check each file can be accessed: 

#define ICE_PKG_FILE_DEFAULT "/lib/firmware/intel/ice/ddp/ice.pkg"
#define ICE_PKG_FILE_UPDATES "/lib/firmware/updates/intel/ice/ddp/ice.pkg"


> 2.23.0



Re: [dpdk-dev] [PATCH] ethdev: keep count of allocated and used representor ranges

2021-07-04 Thread Xueming(Steven) Li



> -Original Message-
> From: Viacheslav Galaktionov 
> Sent: Saturday, July 3, 2021 6:01 PM
> To: Xueming(Steven) Li 
> Cc: Andrew Rybchenko ; Matan Azrad 
> ; Shahaf Shuler
> ; Slava Ovsiienko ; 
> NBU-Contact-Thomas Monjalon ;
> Ferruh Yigit ; dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [PATCH] ethdev: keep count of allocated and used representor 
> ranges
> 
> Hello!
> 
> On 2021-07-03 04:32, Xueming(Steven) Li wrote:
> >> +  if (i == n_entries)
> >> +  break;
> >>}
> >>  out:
> >> +  info->nb_ranges = i;
> >
> > Here info maybe NULL.
> 
> Good catch, thanks for noticing!
> 
> >> faf3bd901d..d2b27c351f 100644
> >> --- a/lib/ethdev/rte_ethdev.h
> >> +++ b/lib/ethdev/rte_ethdev.h
> >> @@ -4856,6 +4856,8 @@ struct rte_eth_representor_range {  struct
> >> rte_eth_representor_info {
> >>uint16_t controller; /**< Controller ID of caller device. */
> >>uint16_t pf; /**< Physical function ID of caller device. */
> >> +  uint32_t nb_ranges_alloc; /**< Size of the ranges array. */
> >> +  uint32_t nb_ranges; /**< Number of initialized ranges. */
> >
> > How about rte_eth_representor_info_get(info) return max ranges size if
> > info is NULL, return real initialized ranges if info not NULL?
> 
> I'm not sure how I feel about it. I think it'd be best if the function 
> returned just one thing.
> 
> Moreover, there are drivers that don't have a fixed structure for representor 
> IDs, e.g. net/sfc, where every range will contain a single
> ID. If a new representor range is created between invocations of this 
> function, there probably should be a way for the caller to know
> about this.
> 
> Perhaps we should move the total number of representors to an out parameter 
> and use the return value for the number of initialized
> ranges. What do you think about this?

No big difference, I'm ok with your original design.


Re: [dpdk-dev] [PATCH v6 01/19] net/ngbe: add build and doc infrastructure

2021-07-04 Thread Jiawen Wu
On July 2, 2021 9:08 PM, Andrew Rybchenko wrote:
> On 6/17/21 1:59 PM, Jiawen Wu wrote:
> > Adding bare minimum PMD library and doc build infrastructure and claim
> > the maintainership for ngbe PMD.
> >
> > Signed-off-by: Jiawen Wu 
> 
> Just one nit below.
> 
> [snip]
> 
> > diff --git a/drivers/net/ngbe/ngbe_ethdev.c
> > b/drivers/net/ngbe/ngbe_ethdev.c new file mode 100644 index
> > 00..f8e19066de
> > --- /dev/null
> > +++ b/drivers/net/ngbe/ngbe_ethdev.c
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018-2020 Beijing WangXun Technology Co., Ltd.
> > + * Copyright(c) 2010-2017 Intel Corporation  */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +static int
> > +eth_ngbe_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
> > +   struct rte_pci_device *pci_dev)
> > +{
> > +   RTE_SET_USED(pci_dev);
> > +   return -EINVAL;
> > +}
> > +
> > +static int eth_ngbe_pci_remove(struct rte_pci_device *pci_dev) {
> > +   RTE_SET_USED(pci_dev);
> > +   return -EINVAL;
> > +}
> 
> Why is different style of unused suppression is used
> above: __rte_unused vs RTE_SET_USED'?
> 
> [snip]

I guess, 'pci_drv' will not be used in future implement the probe function.
So I just gave '__rte_unused' when I separated the patches.
Does this have to be corrected?






Re: [dpdk-dev] [RFC] lib/ethdev: add dev configured flag

2021-07-04 Thread Huisong Li



在 2021/7/3 19:04, Ananyev, Konstantin 写道:

在 2021/7/2 21:23, Ananyev, Konstantin 写道:

On 7/2/2021 12:08 PM, Andrew Rybchenko wrote:

@Thomas, @Ferruh, I tend to accept it (with minor style fixes),
but I need your opinion on it before doing it.


I guess we were relying on the user/application to have correct order up until
now, it can be good to add this into the API. OK to add it for me.

I don't know do we really need that flag in dev_data or not,
but if we do - probably better to reset it at dev_confgure() straight before
we start to make any changes in dev_data.

Sorry, I don't get you. Some fields in rte_eth_dev_data are initialized
firstly in the probe phase.

Do you mean to add clear this flag at the beginning of dev_configure()?

Yes, just before we start to modify things.


In this patch, this flag has been cleared for all scenarios where the 
rte_eth_dev_data modification fails in the dev_configure().


And it is set to 1 when dev_configure() is configured successfully.

Please check the rollback. Thanks😁





That way SP can also figure out that device is not configured yet, etc.


Thanks,
Andrew.

On 6/29/21 5:27 AM, Huisong Li wrote:

在 2021/6/14 23:37, Andrew Rybchenko 写道:

Summary should start from "ethdev: "

Don't forget to include all maintainers in Cc the next time.
Just use --cc-cmd or --to-cmd options.

ok, thanks!

Adding Thomas.

On 5/8/21 11:00 AM, Huisong Li wrote:

Currently, if dev_configure is not invoked or fails to be invoked, users
can still invoke dev_start successfully. This patch adds a
"dev_configured"
flag in "rte_eth_dev_data" to control whether dev_start can be invoked.

In theory there is an indirect condition. If number of configured Tx
*and* Rx queues is 0, device is not configured.

That's true. If the framework doesn't have this check, each driver needs
to do this.

But it's a common thing, and it's probably more reasonable to put it in
the ethdev layer.


I have no strong opinion on the topic. Extra flag requires
extra housekeeping. Indirect conditions are not always good
and could be a subject to change.


Signed-off-by: Huisong Li 
---
    lib/ethdev/rte_ethdev.c  | 11 +++
    lib/ethdev/rte_ethdev_core.h |  6 +-
    2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index a187976..7d74b17 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1604,6 +1604,8 @@ rte_eth_dev_configure(uint16_t port_id,
uint16_t nb_rx_q, uint16_t nb_tx_q,
    }
      rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q,
dev_conf, 0);
+    dev->data->dev_configured = 1;
+
    return 0;
    reset_queues:
    eth_dev_rx_queue_config(dev, 0);
@@ -1614,6 +1616,8 @@ rte_eth_dev_configure(uint16_t port_id,
uint16_t nb_rx_q, uint16_t nb_tx_q,
    dev->data->mtu = old_mtu;
      rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q,
dev_conf, ret);
+    dev->data->dev_configured = 0;
+

I would move it before trace function.


    return ret;
    }
    @@ -1749,6 +1753,13 @@ rte_eth_dev_start(uint16_t port_id)
      RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_start, -ENOTSUP);
    +    if (dev->data->dev_configured == 0) {
+    RTE_ETHDEV_LOG(INFO,
+    "Device with port_id=%"PRIu16" is not configured.\n",
+    port_id);

Should log type be warning/error?


+    return -EINVAL;
+    }
+
    if (dev->data->dev_started != 0) {
    RTE_ETHDEV_LOG(INFO,
    "Device with port_id=%"PRIu16" already started\n",
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index 4679d94..b508769 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -167,7 +167,11 @@ struct rte_eth_dev_data {
    scattered_rx : 1,  /**< RX of scattered packets is ON(1) /
OFF(0) */
    all_multicast : 1, /**< RX all multicast mode ON(1) /
OFF(0). */
    dev_started : 1,   /**< Device state: STARTED(1) /
STOPPED(0). */
-    lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
+    lro : 1,  /**< RX LRO is ON(1) / OFF(0) */
+    dev_configured : 1;
+    /**< Device configuration state:
+ * CONFIGURED(1) / NOT CONFIGURED(0).
+ */
    uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
    /**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
    uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];


.


Re: [dpdk-dev] [RFC] lib/ethdev: add dev configured flag

2021-07-04 Thread Huisong Li



在 2021/7/5 4:05, Thomas Monjalon 写道:

08/05/2021 10:00, Huisong Li:

Currently, if dev_configure is not invoked or fails to be invoked, users
can still invoke dev_start successfully. This patch adds a "dev_configured"
flag in "rte_eth_dev_data" to control whether dev_start can be invoked.

[...]

--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -167,7 +167,11 @@ struct rte_eth_dev_data {
scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
OFF(0) */
all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
*/
-   lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
+   lro : 1,  /**< RX LRO is ON(1) / OFF(0) */
+   dev_configured : 1;
+   /**< Device configuration state:
+* CONFIGURED(1) / NOT CONFIGURED(0).
+*/

Why not using "enum rte_eth_dev_state"?
Because rte_eth_dev.state is not shared between processes?


It doesn't feel right. "enum rte_eth_dev_state" is private to the 
primary and secondary processes and can be independently controlled.


However, the secondary process does not make resource allocations and 
does not call dev_configure().


These are done by the primary process and  can be obtained or used by 
the secondary process.


Like "dev_started" in struct rte_eth_dev_data.





.


Re: [dpdk-dev] [PATCH v3 1/2] net/ice: factorize firmware loading

2021-07-04 Thread Wang, Haiyue
Hi David & Qi,

> -Original Message-
> From: Wang, Haiyue
> Sent: Monday, July 5, 2021 09:43
> To: David Marchand ; dev@dpdk.org
> Cc: Yang, Qiming ; Zhang, Qi Z 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] net/ice: factorize firmware loading
> 
> Hi David,
> 
> > -Original Message-
> > From: dev  On Behalf Of David Marchand
> > Sent: Tuesday, June 29, 2021 16:07
> > To: dev@dpdk.org
> > Cc: Yang, Qiming ; Zhang, Qi Z 
> > Subject: [dpdk-dev] [PATCH v3 1/2] net/ice: factorize firmware loading
> >
> > Both "normal" and "dcf" inits have their copy of some firmware loading
> > code.
> >
> > The DSN query is moved in specific parts for the "normal" and "dcf" init.
> >
> > A common helper ice_load_pkg is then introduced and takes an adapter
> > pointer as its main input.
> >
> > This helper takes care of finding the right firmware file and loading
> > it.
> > The adapter active_pkg_type field is set by this helper.
> >
> > The ice_access macro is removed from the osdep.h header: osdep.h should
> > only hosts wrappers for base driver code.
> >
> > Signed-off-by: David Marchand 
> > ---
> >  drivers/net/ice/base/ice_osdep.h |   6 --
> >  drivers/net/ice/ice_dcf_parent.c |  97 ++-
> >  drivers/net/ice/ice_ethdev.c | 161 +++
> >  drivers/net/ice/ice_ethdev.h |   3 +-
> >  4 files changed, 88 insertions(+), 179 deletions(-)
> >
> 
> 
> > +   if (!use_dsn)
> > +   goto no_dsn;
> > +
> > +   memset(opt_ddp_filename, 0, ICE_MAX_PKG_FILENAME_SIZE);
> > +   snprintf(opt_ddp_filename, ICE_MAX_PKG_FILENAME_SIZE,
> > +   "ice-%016" PRIx64 ".pkg", dsn);
> > +   strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_UPDATES,
> > +   ICE_MAX_PKG_FILENAME_SIZE);
> > +   if (!ice_access(strcat(pkg_file, opt_ddp_filename), 0))
> > +   goto load_fw;
> > +
> > +   strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_DEFAULT,
> > +   ICE_MAX_PKG_FILENAME_SIZE);
> > +   if (!ice_access(strcat(pkg_file, opt_ddp_filename), 0))
> > +   goto load_fw;
> > +
> > +no_dsn:
> > +   strncpy(pkg_file, ICE_PKG_FILE_UPDATES, ICE_MAX_PKG_FILENAME_SIZE);
> > +   if (!ice_access(pkg_file, 0))
> > +   goto load_fw;
> > +   strncpy(pkg_file, ICE_PKG_FILE_DEFAULT, ICE_MAX_PKG_FILENAME_SIZE);
> > +   if (ice_access(pkg_file, 0)) {
> > PMD_INIT_LOG(ERR, "failed to search file path\n");
> > -   return err;
> > +   return -1;
> > }
> >
> > +load_fw:
> > file = fopen(pkg_file, "rb");
> > if (!file)  {
> > PMD_INIT_LOG(ERR, "failed to open file: %s\n", pkg_file);
> > return -1;
> > }
> >
> 
> I'm wondering what's full name for ice firmware in F34, has any *.xz
> postfix ? If so, the search method will also needs to be updated, since
> we will check each file can be accessed:
> 
> #define ICE_PKG_FILE_DEFAULT "/lib/firmware/intel/ice/ddp/ice.pkg"
> #define ICE_PKG_FILE_UPDATES "/lib/firmware/updates/intel/ice/ddp/ice.pkg"
> 

We need to update the default/comms search method:

I try the F34:

tree /lib/firmware/intel/ice/
/lib/firmware/intel/ice/
├── ddp
│   ├── ice-1.3.16.0.pkg.xz
│   └── ice.pkg.xz -> ice-1.3.16.0.pkg.xz
└── ddp-comms
└── ice_comms-1.3.20.0.pkg.xz

It matches the upstream version:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/intel/ice

d-  ddp-comms   50  logstatsplain
d-  ddp 44  logstatsplain


> 
> > 2.23.0



Re: [dpdk-dev] [PATCH v2] net/i40e: add logic of processing continuous DD bits for Arm

2021-07-04 Thread Joyce Kong



> -Original Message-
> From: Honnappa Nagarahalli 
> Sent: Wednesday, June 30, 2021 9:15 AM
> To: Joyce Kong ; beilei.x...@intel.com;
> qi.z.zh...@intel.com; Ruifeng Wang ;
> bruce.richard...@intel.com; helin.zh...@intel.com
> Cc: dev@dpdk.org; sta...@dpdk.org; nd ; Honnappa
> Nagarahalli ; nd 
> Subject: RE: [PATCH v2] net/i40e: add logic of processing continuous DD bits
> for Arm
> 
> 
> >
> > For Arm platforms, reading descs can get re-ordered, then the status
> > of DD bits will be discontinuous, so add the logic to only process
> > continuous descs by checking DD bits.
> >
> > Fixes: 4861cde46116 ("i40e: new poll mode driver")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Joyce Kong 
> > Reviewed-by: Ruifeng Wang 
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 19 +++
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > b/drivers/net/i40e/i40e_rxtx.c index 6c58decec..86e2f083e 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -452,7 +452,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > uint16_t pkt_len;
> > uint64_t qword1;
> > uint32_t rx_status;
> > -   int32_t s[I40E_LOOK_AHEAD], nb_dd;
> > +   int32_t s[I40E_LOOK_AHEAD], var, nb_dd;
> > int32_t i, j, nb_rx = 0;
> > uint64_t pkt_flags;
> > uint32_t *ptype_tbl = rxq->vsi->adapter->ptype_tbl; @@ -482,11
> > +482,22 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > I40E_RXD_QW1_STATUS_SHIFT;
> > }
> >
> > -   rte_smp_rmb();
> > +   /* This barrier is to order loads of different words in the
> > descriptor */
> > +   rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> I think this should go into a separate commit as the following change is
> unrelated.

Will separate the two changes in v3.

> 
> >
> > /* Compute how many status bits were set */
> > -   for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
> > -   nb_dd += s[j] & (1 <<
> > I40E_RX_DESC_STATUS_DD_SHIFT);
> > +   for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) {
> > +   var = s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> > #ifdef
> > +RTE_ARCH_ARM
> > +   /* For Arm platforms, only compute continuous
> > status bits */
> > +   if (var)
> > +   nb_dd += 1;
> > +   else
> > +   break;
> > +#else
> > +   nb_dd += var;
> > +#endif
> > +   }
> >
> > nb_rx += nb_dd;
> >
> > --
> > 2.17.1



[dpdk-dev] [PATCH 1/2] common/cnxk: add support for rte flow item raw

2021-07-04 Thread psatheesh
From: Satheesh Paul 

Add roc API for rte_flow_item_raw to parse custom L2 and L3 protocols.

Signed-off-by: Satheesh Paul 
Reviewed-by: Kiran Kumar Kokkilagadda 
---
 drivers/common/cnxk/roc_mbox.h  | 24 ++--
 drivers/common/cnxk/roc_nix_ops.c   | 14 +
 drivers/common/cnxk/roc_npc.h   | 11 
 drivers/common/cnxk/roc_npc_parse.c | 86 +++--
 drivers/common/cnxk/roc_npc_priv.h  |  9 +--
 drivers/common/cnxk/roc_npc_utils.c |  6 +-
 drivers/common/cnxk/roc_utils.c |  2 +-
 7 files changed, 136 insertions(+), 16 deletions(-)

diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index 9c529d754..b254f005a 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -278,14 +278,28 @@ struct ready_msg_rsp {
uint16_t __io rclk_freq; /* RCLK frequency */
 };
 
+enum npc_pkind_type {
+   NPC_RX_VLAN_EXDSA_PKIND = 56ULL,
+   NPC_RX_CHLEN24B_PKIND,
+   NPC_RX_CPT_HDR_PKIND,
+   NPC_RX_CHLEN90B_PKIND,
+   NPC_TX_HIGIG_PKIND,
+   NPC_RX_HIGIG_PKIND,
+   NPC_RX_EXDSA_PKIND,
+   NPC_RX_EDSA_PKIND,
+   NPC_TX_DEF_PKIND,
+};
+
 /* Struct to set pkind */
 struct npc_set_pkind {
struct mbox_msghdr hdr;
-#define ROC_PRIV_FLAGS_DEFAULT BIT_ULL(0)
-#define ROC_PRIV_FLAGS_EDSABIT_ULL(1)
-#define ROC_PRIV_FLAGS_HIGIG   BIT_ULL(2)
-#define ROC_PRIV_FLAGS_LEN_90B BIT_ULL(3)
-#define ROC_PRIV_FLAGS_CUSTOM  BIT_ULL(63)
+#define ROC_PRIV_FLAGS_DEFAULT   BIT_ULL(0)
+#define ROC_PRIV_FLAGS_EDSA  BIT_ULL(1)
+#define ROC_PRIV_FLAGS_HIGIG BIT_ULL(2)
+#define ROC_PRIV_FLAGS_LEN_90B   BIT_ULL(3)
+#define ROC_PRIV_FLAGS_EXDSA BIT_ULL(4)
+#define ROC_PRIV_FLAGS_VLAN_EXDSA BIT_ULL(5)
+#define ROC_PRIV_FLAGS_CUSTOMBIT_ULL(63)
uint64_t __io mode;
 #define PKIND_TX BIT_ULL(0)
 #define PKIND_RX BIT_ULL(1)
diff --git a/drivers/common/cnxk/roc_nix_ops.c 
b/drivers/common/cnxk/roc_nix_ops.c
index eeb85a54f..0e28302e7 100644
--- a/drivers/common/cnxk/roc_nix_ops.c
+++ b/drivers/common/cnxk/roc_nix_ops.c
@@ -378,6 +378,8 @@ roc_nix_switch_hdr_set(struct roc_nix *roc_nix, uint64_t 
switch_header_type)
switch_header_type != ROC_PRIV_FLAGS_EDSA &&
switch_header_type != ROC_PRIV_FLAGS_HIGIG &&
switch_header_type != ROC_PRIV_FLAGS_LEN_90B &&
+   switch_header_type != ROC_PRIV_FLAGS_EXDSA &&
+   switch_header_type != ROC_PRIV_FLAGS_VLAN_EXDSA &&
switch_header_type != ROC_PRIV_FLAGS_CUSTOM) {
plt_err("switch header type is not supported");
return NIX_ERR_PARAM;
@@ -399,6 +401,18 @@ roc_nix_switch_hdr_set(struct roc_nix *roc_nix, uint64_t 
switch_header_type)
if (req == NULL)
return rc;
req->mode = switch_header_type;
+
+   if (switch_header_type == ROC_PRIV_FLAGS_LEN_90B) {
+   req->mode = ROC_PRIV_FLAGS_CUSTOM;
+   req->pkind = NPC_RX_CHLEN90B_PKIND;
+   } else if (switch_header_type == ROC_PRIV_FLAGS_EXDSA) {
+   req->mode = ROC_PRIV_FLAGS_CUSTOM;
+   req->pkind = NPC_RX_EXDSA_PKIND;
+   } else if (switch_header_type == ROC_PRIV_FLAGS_VLAN_EXDSA) {
+   req->mode = ROC_PRIV_FLAGS_CUSTOM;
+   req->pkind = NPC_RX_VLAN_EXDSA_PKIND;
+   }
+
req->dir = PKIND_RX;
rc = mbox_process_msg(mbox, (void *)&rsp);
if (rc)
diff --git a/drivers/common/cnxk/roc_npc.h b/drivers/common/cnxk/roc_npc.h
index 2c0a536c9..bab25fd72 100644
--- a/drivers/common/cnxk/roc_npc.h
+++ b/drivers/common/cnxk/roc_npc.h
@@ -36,6 +36,7 @@ enum roc_npc_item_type {
ROC_NPC_ITEM_TYPE_CPT_HDR,
ROC_NPC_ITEM_TYPE_L3_CUSTOM,
ROC_NPC_ITEM_TYPE_QINQ,
+   ROC_NPC_ITEM_TYPE_RAW,
ROC_NPC_ITEM_TYPE_END,
 };
 
@@ -47,6 +48,16 @@ struct roc_npc_item_info {
const void *last; /* For range */
 };
 
+struct roc_npc_flow_item_raw {
+   uint32_t relative : 1; /**< Look for pattern after the previous item. */
+   uint32_t search : 1;   /**< Search pattern from offset. */
+   uint32_t reserved : 30; /**< Reserved, must be set to zero. */
+   int32_t offset; /**< Absolute or relative offset for pattern. */
+   uint16_t limit; /**< Search area limit for start of pattern. */
+   uint16_t length;/**< Pattern length. */
+   const uint8_t *pattern; /**< Byte string to look for. */
+};
+
 #define ROC_NPC_MAX_ACTION_COUNT 12
 
 enum roc_npc_action_type {
diff --git a/drivers/common/cnxk/roc_npc_parse.c 
b/drivers/common/cnxk/roc_npc_parse.c
index d07f91db3..8125035dd 100644
--- a/drivers/common/cnxk/roc_npc_parse.c
+++ b/drivers/common/cnxk/roc_npc_parse.c
@@ -136,14 +136,46 @@ npc_parse_la(struct npc_parse_state *pst)
return npc_update_parse_state(pst, &info, lid, lt, 0);
 }
 
+static int
+npc_flow_raw_item_prepare(const struct roc_npc_flow_item_raw *raw_spec,
+ const struct roc_npc_flow_item_

[dpdk-dev] [PATCH 2/2] net/cnxk: add support for rte flow item raw

2021-07-04 Thread psatheesh
From: Satheesh Paul 

Add support for rte_flow_item_raw to parse custom L2 and L3 protocols.

Signed-off-by: Satheesh Paul 
Reviewed-by: Kiran Kumar Kokkilagadda 
---
 doc/guides/nics/cnxk.rst   | 169 +++--
 drivers/net/cnxk/cnxk_ethdev_devargs.c |   7 +
 drivers/net/cnxk/cnxk_rte_flow.c   |  12 +-
 3 files changed, 172 insertions(+), 16 deletions(-)

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index cb2a51e1d..a7b59a48e 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -104,7 +104,7 @@ Runtime Config Options
 
 - ``Rx&Tx scalar mode enable`` (default ``0``)
 
-   PMD supports both scalar and vector mode, it may be selected at runtime
+   Ethdev supports both scalar and vector mode, it may be selected at runtime
using ``scalar_enable`` ``devargs`` parameter.
 
 - ``RSS reta size`` (default ``64``)
@@ -151,7 +151,7 @@ Runtime Config Options
 
   -a 0002:02:00.0,max_sqb_count=64
 
-   With the above configuration, each send queue's descriptor buffer count is
+   With the above configuration, each send queue's decscriptor buffer count is
limited to a maximum of 64 buffers.
 
 - ``Switch header enable`` (default ``none``)
@@ -165,7 +165,7 @@ Runtime Config Options
 
With the above configuration, higig2 will be enabled on that port and the
traffic on this port should be higig2 traffic only. Supported switch header
-   types are "higig2", "dsa", "chlen90b" and "chlen24b".
+   types are "chlen24b", "chlen90b", "dsa", "exdsa", "higig2" and "vlan_exdsa".
 
 - ``RSS tag as XOR`` (default ``0``)
 
@@ -186,6 +186,7 @@ Runtime Config Options
   -a 0002:02:00.0,tag_as_xor=1
 
 
+
 .. note::
 
Above devarg parameters are configurable per device, user needs to pass the
@@ -196,7 +197,7 @@ Limitations
 ---
 
 ``mempool_cnxk`` external mempool handler dependency
-
+~
 
 The OCTEON CN9K/CN10K SoC family NIC has inbuilt HW assisted external mempool 
manager.
 ``net_cnxk`` pmd only works with ``mempool_cnxk`` mempool handler
@@ -209,12 +210,6 @@ CRC stripping
 The OCTEON CN9K/CN10K SoC family NICs strip the CRC for every packet being 
received by
 the host interface irrespective of the offload configuration.
 
-RTE flow GRE support
-
-
-- ``RTE_FLOW_ITEM_TYPE_GRE_KEY`` works only when checksum and routing
-  bits in the GRE header are equal to 0.
-
 Debugging Options
 -
 
@@ -229,3 +224,157 @@ Debugging Options
+---++---+
| 2 | NPC| --log-level='pmd\.net.cnxk\.flow,8'   |
+---++---+
+
+RTE Flow Support
+
+
+The OCTEON CN9K/CN10K SoC family NIC has support for the following patterns and
+actions.
+
+Patterns:
+
+.. _table_cnxk_supported_flow_item_types:
+
+.. table:: Item types
+
+   +++
+   | #  | Pattern Type   |
+   +++
+   | 1  | RTE_FLOW_ITEM_TYPE_ETH |
+   +++
+   | 2  | RTE_FLOW_ITEM_TYPE_VLAN|
+   +++
+   | 3  | RTE_FLOW_ITEM_TYPE_E_TAG   |
+   +++
+   | 4  | RTE_FLOW_ITEM_TYPE_IPV4|
+   +++
+   | 5  | RTE_FLOW_ITEM_TYPE_IPV6|
+   +++
+   | 6  | RTE_FLOW_ITEM_TYPE_ARP_ETH_IPV4|
+   +++
+   | 7  | RTE_FLOW_ITEM_TYPE_MPLS|
+   +++
+   | 8  | RTE_FLOW_ITEM_TYPE_ICMP|
+   +++
+   | 9  | RTE_FLOW_ITEM_TYPE_UDP |
+   +++
+   | 10 | RTE_FLOW_ITEM_TYPE_TCP |
+   +++
+   | 11 | RTE_FLOW_ITEM_TYPE_SCTP|
+   +++
+   | 12 | RTE_FLOW_ITEM_TYPE_ESP |
+   +++
+   | 13 | RTE_FLOW_ITEM_TYPE_GRE |
+   +++
+   | 14 | RTE_FLOW_ITEM_TYPE_NVGRE   |
+   +++
+   | 15 | RTE_FLOW_ITEM_TYPE_VXLAN   |
+   +++
+   | 16 | RTE_FLOW_ITEM_TYPE_GTPC|
+   +++
+   | 17 | RTE_FLOW_ITEM_TYPE_GTPU|
+   +++
+   | 18 | RTE_FLOW_ITEM_TYPE_GENEVE  |
+   +++
+   | 19 | RTE_FLOW_ITEM_TYPE_VXLAN_GPE   |
+   +++
+   | 20 | RTE_FLOW_ITEM_TYPE_IPV6_EXT|
+   +++
+   | 21 | RTE_FLOW_ITEM_TYPE_VOID|
+   ++-

Re: [dpdk-dev] 19.11.9 patches review and test

2021-07-04 Thread Pei Zhang
Hello Christian,

The testing with dpdk 19.11.9-rc3 from Red Hat looks good. We tested below
16 scenarios and all got PASS on RHEL8:

(1)Guest with device assignment(PF) throughput testing(1G hugepage size):
PASS
(2)Guest with device assignment(PF) throughput testing(2M hugepage size) :
PASS
(3)Guest with device assignment(VF) throughput testing: PASS
(4)PVP (host dpdk testpmd as vswitch) 1Q: throughput testing: PASS
(5)PVP vhost-user 2Q throughput testing: PASS
(6)PVP vhost-user 1Q - cross numa node throughput testing: PASS
(7)Guest with vhost-user 2 queues throughput testing: PASS
(8)vhost-user reconnect with dpdk-client, qemu-server: qemu reconnect: PASS
(9)vhost-user reconnect with dpdk-client, qemu-server: ovs reconnect: PASS
(10)PVP 1Q live migration testing: PASS
(11)PVP 1Q cross numa node live migration testing: PASS
(12)Guest with ovs+dpdk+vhost-user 1Q live migration testing: PASS
(13)Guest with ovs+dpdk+vhost-user 1Q live migration testing (2M): PASS
(14)Guest with ovs+dpdk+vhost-user 2Q live migration testing: PASS
(15)Host PF + DPDK testing: PASS
(16)Host VF + DPDK testing: PASS

Versions:

kernel 4.18
qemu 6.0

dpdk: git://dpdk.org/dpdk-stable
# git log -1
commit e5f56f22a82a1df49c224a8af13e769ff4b04cd8 (HEAD, tag: v19.11.9-rc3,
origin/19.11)
Author: Christian Ehrhardt 
Date:   Thu Jun 17 08:25:20 2021 +0200

version: 19.11.9-rc3

Signed-off-by: Christian Ehrhardt 



# git branch
remotes/origin/19.11

NICs: X540-AT2 NIC(ixgbe, 10G)

Best regards,

Pei

On Thu, Jun 17, 2021 at 2:38 PM Christian Ehrhardt <
christian.ehrha...@canonical.com> wrote:

> Hi all,
>
> Here is a list of patches targeted for stable release 19.11.9.
>
> The planned date for the final release is 2nd July.
>
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
>
> A renewed release candidate tarball can be found at:
>
> https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.9-rc3
>
> These patches are located at branch 19.11 of dpdk-stable repo:
> https://dpdk.org/browse/dpdk-stable/
>
> Thanks.
>
> Christian Ehrhardt 
>
> ---
> Adam Dybkowski (3):
>   common/qat: increase IM buffer size for GEN3
>   compress/qat: enable compression on GEN3
>   crypto/qat: fix null authentication request
>
> Ajit Khaparde (3):
>   net/bnxt: fix RSS context cleanup
>   net/bnxt: fix mismatched type comparison in MAC restore
>   net/bnxt: check PCI config read
>
> Alvin Zhang (7):
>   net/ice: fix VLAN filter with PF
>   net/i40e: fix input set field mask
>   net/e1000: fix Rx error counter for bad length
>   net/e1000: fix max Rx packet size
>   net/ice: fix fast mbuf freeing
>   net/iavf: fix VF to PF command failure handling
>   net/i40e: fix VF RSS configuration
>
> Anatoly Burakov (3):
>   fbarray: fix log message on truncation error
>   power: do not skip saving original P-state governor
>   power: save original ACPI governor always
>
> Andrew Rybchenko (2):
>   net/failsafe: fix RSS hash offload reporting
>   net/failsafe: report minimum and maximum MTU
>
> Andy Moreton (1):
>   common/sfc_efx/base: limit reported MCDI response length
>
> Apeksha Gupta (1):
>   examples/l2fwd-crypto: skip masked devices
>
> Arek Kusztal (1):
>   crypto/qat: fix offset for out-of-place scatter-gather
>
> Beilei Xing (2):
>   net/i40evf: fix packet loss for X722
>   net/iavf: fix Tx context descriptor
>
> Bruce Richardson (1):
>   build: exclude meson files from examples installation
>
> Chaoyong He (1):
>   doc: fix multiport syntax in nfp guide
>
> Chenbo Xia (1):
>   examples/vhost: check memory table query
>
> Chengchang Tang (13):
>   ethdev: validate input in module EEPROM dump
>   ethdev: validate input in register info
>   ethdev: validate input in EEPROM info
>   net/hns3: fix rollback after setting PVID failure
>   examples: add eal cleanup to examples
>   net/bonding: fix adding itself as its slave
>   app/testpmd: fix max queue number for Tx offloads
>   net/tap: fix interrupt vector array size
>   net/bonding: fix socket ID check
>   net/tap: check ioctl on restore
>   net/hns3: fix HW buffer size on MTU update
>   net/hns3: fix processing Tx offload flags
>   examples/timer: fix time interval
>
> Chengwen Feng (32):
>   net/hns3: fix flow counter value
>   net/hns3: fix VF mailbox head field
>   net/hns3: support get device version when dump register
>   test: check thread creation
>   common/dpaax: fix possible null pointer access
>   examples/ethtool: remove unused parsing
>   net/e1000/base: fix timeout for shadow RAM write
>   mbuf: check shared memory before dumping dynamic space
>   eventdev: remove redundant thread name setting
>   eventdev: fix memory leakage on thre

Re: [dpdk-dev] DTS improvement WG - Debate/triag inputs - Extending for next 5 weeks

2021-07-04 Thread Tu, Lijuan
Hi Honnappa,

Shall we cancel this week as I have to leave for urgent family affairs?

thanks

-Original Appointment-
From: Honnappa Nagarahalli 
Sent: 2021年5月18日 12:37
To: Honnappa Nagarahalli; Chen, Zhaoyan; Buckley, Daniel M; 
juraj.lin...@pantheon.tech; Tu, Lijuan; jer...@marvell.com; Lincoln Lavoie; 
Aaron Conole; tho...@monjalon.net; dev@dpdk.org; Ashwin Sekhar Thalakalath 
Kottilveetil
Cc: nd
Subject: DTS improvement WG - Debate/triag inputs - Extending for next 5 weeks
When: 2021年7月7日星期三 8:00-9:00 (UTC-06:00) Central Time (US & Canada).
Where: https://meet.jit.si/dpdk


Hello,

We will continuing debating/triaging the inputs from [1]. As discussed, I am 
blocking the calendar for next 4 weeks.



Thank you,

Honnappa



[1] 
https://docs.google.com/document/d/1c5S0_mZzFvzZfYkqyORLT2-qNvUb-fBdjA6DGusy4yM/edit#


Re: [dpdk-dev] 19.11.9 patches review and test

2021-07-04 Thread Christian Ehrhardt
On Mon, Jul 5, 2021 at 6:14 AM Pei Zhang  wrote:
>
> Hello Christian,
>
> The testing with dpdk 19.11.9-rc3 from Red Hat looks good. We tested below 16 
> scenarios and all got PASS on RHEL8:
>
> (1)Guest with device assignment(PF) throughput testing(1G hugepage size): PASS
> (2)Guest with device assignment(PF) throughput testing(2M hugepage size) : 
> PASS
> (3)Guest with device assignment(VF) throughput testing: PASS
> (4)PVP (host dpdk testpmd as vswitch) 1Q: throughput testing: PASS
> (5)PVP vhost-user 2Q throughput testing: PASS
> (6)PVP vhost-user 1Q - cross numa node throughput testing: PASS
> (7)Guest with vhost-user 2 queues throughput testing: PASS
> (8)vhost-user reconnect with dpdk-client, qemu-server: qemu reconnect: PASS
> (9)vhost-user reconnect with dpdk-client, qemu-server: ovs reconnect: PASS
> (10)PVP 1Q live migration testing: PASS
> (11)PVP 1Q cross numa node live migration testing: PASS
> (12)Guest with ovs+dpdk+vhost-user 1Q live migration testing: PASS
> (13)Guest with ovs+dpdk+vhost-user 1Q live migration testing (2M): PASS
> (14)Guest with ovs+dpdk+vhost-user 2Q live migration testing: PASS
> (15)Host PF + DPDK testing: PASS
> (16)Host VF + DPDK testing: PASS
>
> Versions:
>
> kernel 4.18
> qemu 6.0
>
> dpdk: git://dpdk.org/dpdk-stable
> # git log -1
> commit e5f56f22a82a1df49c224a8af13e769ff4b04cd8 (HEAD, tag: v19.11.9-rc3, 
> origin/19.11)
> Author: Christian Ehrhardt 
> Date:   Thu Jun 17 08:25:20 2021 +0200
>
> version: 19.11.9-rc3

Thanks,
I updated your test results from -rc2 to -rc3 for the release notes.

> Signed-off-by: Christian Ehrhardt 
>
>
>
> # git branch
> remotes/origin/19.11
>
> NICs: X540-AT2 NIC(ixgbe, 10G)
>
> Best regards,
>
> Pei
>
> On Thu, Jun 17, 2021 at 2:38 PM Christian Ehrhardt 
>  wrote:
>>
>> Hi all,
>>
>> Here is a list of patches targeted for stable release 19.11.9.
>>
>> The planned date for the final release is 2nd July.
>>
>> Please help with testing and validation of your use cases and report
>> any issues/results with reply-all to this mail. For the final release
>> the fixes and reported validations will be added to the release notes.
>>
>> A renewed release candidate tarball can be found at:
>>
>> https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.9-rc3
>>
>> These patches are located at branch 19.11 of dpdk-stable repo:
>> https://dpdk.org/browse/dpdk-stable/
>>
>> Thanks.
>>
>> Christian Ehrhardt 
>>
>> ---
>> Adam Dybkowski (3):
>>   common/qat: increase IM buffer size for GEN3
>>   compress/qat: enable compression on GEN3
>>   crypto/qat: fix null authentication request
>>
>> Ajit Khaparde (3):
>>   net/bnxt: fix RSS context cleanup
>>   net/bnxt: fix mismatched type comparison in MAC restore
>>   net/bnxt: check PCI config read
>>
>> Alvin Zhang (7):
>>   net/ice: fix VLAN filter with PF
>>   net/i40e: fix input set field mask
>>   net/e1000: fix Rx error counter for bad length
>>   net/e1000: fix max Rx packet size
>>   net/ice: fix fast mbuf freeing
>>   net/iavf: fix VF to PF command failure handling
>>   net/i40e: fix VF RSS configuration
>>
>> Anatoly Burakov (3):
>>   fbarray: fix log message on truncation error
>>   power: do not skip saving original P-state governor
>>   power: save original ACPI governor always
>>
>> Andrew Rybchenko (2):
>>   net/failsafe: fix RSS hash offload reporting
>>   net/failsafe: report minimum and maximum MTU
>>
>> Andy Moreton (1):
>>   common/sfc_efx/base: limit reported MCDI response length
>>
>> Apeksha Gupta (1):
>>   examples/l2fwd-crypto: skip masked devices
>>
>> Arek Kusztal (1):
>>   crypto/qat: fix offset for out-of-place scatter-gather
>>
>> Beilei Xing (2):
>>   net/i40evf: fix packet loss for X722
>>   net/iavf: fix Tx context descriptor
>>
>> Bruce Richardson (1):
>>   build: exclude meson files from examples installation
>>
>> Chaoyong He (1):
>>   doc: fix multiport syntax in nfp guide
>>
>> Chenbo Xia (1):
>>   examples/vhost: check memory table query
>>
>> Chengchang Tang (13):
>>   ethdev: validate input in module EEPROM dump
>>   ethdev: validate input in register info
>>   ethdev: validate input in EEPROM info
>>   net/hns3: fix rollback after setting PVID failure
>>   examples: add eal cleanup to examples
>>   net/bonding: fix adding itself as its slave
>>   app/testpmd: fix max queue number for Tx offloads
>>   net/tap: fix interrupt vector array size
>>   net/bonding: fix socket ID check
>>   net/tap: check ioctl on restore
>>   net/hns3: fix HW buffer size on MTU update
>>   net/hns3: fix processing Tx offload flags
>>   examples/timer: fix time interval
>>
>> Chengwen Feng (32):
>>   net/hns3: fix flow counter value
>>   net/hns3: fix VF mailbox head field
>>   net/hns3: support get device version when dump register
>>   test: check thread creation
>>   common/dpaax: fix possible null 

[dpdk-dev] [PATCH_v2 1/3] regex/mlx5: fix memory region unregistration

2021-07-04 Thread Michael Baum
The issue can cause illegal physical address access while a huge-page A
is released and huge-page B is allocated on the same virtual address.
The old MR can be matched using the virtual address of huge-page B but
the HW will access the physical address of huge-page A which is no more
part of the DPDK process.

Register a driver callback for memory event in order to free out all the
MRs of memory that is going to be freed from the dpdk process.

Fixes: cda883bbb655 ("regex/mlx5: add dynamic memory registration to datapath")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
v2:
- Initialize pointer of global generation number.
- Add global generation number checking in indirect mkey creation.

 drivers/regex/mlx5/mlx5_regex.c  | 55 
 drivers/regex/mlx5/mlx5_regex.h  |  2 +
 drivers/regex/mlx5/mlx5_regex_control.c  |  2 +
 drivers/regex/mlx5/mlx5_regex_fastpath.c | 50 +++--
 4 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index dcb2ced88e..0f12d94d7e 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -11,6 +11,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,6 +25,10 @@
 
 int mlx5_regex_logtype;
 
+TAILQ_HEAD(regex_mem_event, mlx5_regex_priv) mlx5_mem_event_list =
+   TAILQ_HEAD_INITIALIZER(mlx5_mem_event_list);
+static pthread_mutex_t mem_event_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
 const struct rte_regexdev_ops mlx5_regexdev_ops = {
.dev_info_get = mlx5_regex_info_get,
.dev_configure = mlx5_regex_configure,
@@ -82,6 +87,40 @@ mlx5_regex_get_name(char *name, struct rte_pci_device 
*pci_dev __rte_unused)
pci_dev->addr.devid, pci_dev->addr.function);
 }
 
+/**
+ * Callback for memory event.
+ *
+ * @param event_type
+ *   Memory event type.
+ * @param addr
+ *   Address of memory.
+ * @param len
+ *   Size of memory.
+ */
+static void
+mlx5_regex_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
+  size_t len, void *arg __rte_unused)
+{
+   struct mlx5_regex_priv *priv;
+
+   /* Must be called from the primary process. */
+   MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
+   switch (event_type) {
+   case RTE_MEM_EVENT_FREE:
+   pthread_mutex_lock(&mem_event_list_lock);
+   /* Iterate all the existing mlx5 devices. */
+   TAILQ_FOREACH(priv, &mlx5_mem_event_list, mem_event_cb)
+   mlx5_free_mr_by_addr(&priv->mr_scache,
+priv->ctx->device->name,
+addr, len);
+   pthread_mutex_unlock(&mem_event_list_lock);
+   break;
+   case RTE_MEM_EVENT_ALLOC:
+   default:
+   break;
+   }
+}
+
 static int
 mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 struct rte_pci_device *pci_dev)
@@ -193,6 +232,15 @@ mlx5_regex_pci_probe(struct rte_pci_driver *pci_drv 
__rte_unused,
rte_errno = ENOMEM;
goto error;
}
+   /* Register callback function for global shared MR cache management. */
+   if (TAILQ_EMPTY(&mlx5_mem_event_list))
+   rte_mem_event_callback_register("MLX5_MEM_EVENT_CB",
+   mlx5_regex_mr_mem_event_cb,
+   NULL);
+   /* Add device to memory callback list. */
+   pthread_mutex_lock(&mem_event_list_lock);
+   TAILQ_INSERT_TAIL(&mlx5_mem_event_list, priv, mem_event_cb);
+   pthread_mutex_unlock(&mem_event_list_lock);
DRV_LOG(INFO, "RegEx GGA is %s.",
priv->has_umr ? "supported" : "unsupported");
return 0;
@@ -225,6 +273,13 @@ mlx5_regex_pci_remove(struct rte_pci_device *pci_dev)
return 0;
priv = dev->data->dev_private;
if (priv) {
+   /* Remove from memory callback device list. */
+   pthread_mutex_lock(&mem_event_list_lock);
+   TAILQ_REMOVE(&mlx5_mem_event_list, priv, mem_event_cb);
+   pthread_mutex_unlock(&mem_event_list_lock);
+   if (TAILQ_EMPTY(&mlx5_mem_event_list))
+   rte_mem_event_callback_unregister("MLX5_MEM_EVENT_CB",
+ NULL);
if (priv->pd)
mlx5_glue->dealloc_pd(priv->pd);
if (priv->uar)
diff --git a/drivers/regex/mlx5/mlx5_regex.h b/drivers/regex/mlx5/mlx5_regex.h
index 51a2101e53..61f59ba873 100644
--- a/drivers/regex/mlx5/mlx5_regex.h
+++ b/drivers/regex/mlx5/mlx5_regex.h
@@ -70,6 +70,8 @@ struct mlx5_regex_priv {
uint32_t nb_engines; /* Number of RegEx engines. */
struct mlx5dv_devx_uar *uar; /* UAR object. */
   

[dpdk-dev] [PATCH_v2 2/3] regex/mlx5: fix leak in PCI remove function

2021-07-04 Thread Michael Baum
In the PCI removal function, PMD releases all driver resources allocated
in the probe function.

The MR btree memory is allocated in the probe function, but it is not
freed in remove function what caused a memory leak.

Release it.

Fixes: cda883bbb655 ("regex/mlx5: add dynamic memory registration to datapath")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/regex/mlx5/mlx5_regex.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index 0f12d94d7e..f64dc2824c 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -280,6 +280,8 @@ mlx5_regex_pci_remove(struct rte_pci_device *pci_dev)
if (TAILQ_EMPTY(&mlx5_mem_event_list))
rte_mem_event_callback_unregister("MLX5_MEM_EVENT_CB",
  NULL);
+   if (priv->mr_scache.cache.table)
+   mlx5_mr_release_cache(&priv->mr_scache);
if (priv->pd)
mlx5_glue->dealloc_pd(priv->pd);
if (priv->uar)
-- 
2.25.1



[dpdk-dev] [PATCH_v2 3/3] regex/mlx5: fix redundancy in PCI remove function

2021-07-04 Thread Michael Baum
In the PCI removal function, PMD releases all driver resources and
cancels the regexdev registry.

However, regexdev registration is accidentally canceled twice.

Remove one of them.

Fixes: b34d816363b5 ("regex/mlx5: support rules import")
Cc: sta...@dpdk.org

Signed-off-by: Michael Baum 
---
 drivers/regex/mlx5/mlx5_regex.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/regex/mlx5/mlx5_regex.c b/drivers/regex/mlx5/mlx5_regex.c
index f64dc2824c..1c5bf930ad 100644
--- a/drivers/regex/mlx5/mlx5_regex.c
+++ b/drivers/regex/mlx5/mlx5_regex.c
@@ -290,8 +290,6 @@ mlx5_regex_pci_remove(struct rte_pci_device *pci_dev)
rte_regexdev_unregister(priv->regexdev);
if (priv->ctx)
mlx5_glue->close_device(priv->ctx);
-   if (priv->regexdev)
-   rte_regexdev_unregister(priv->regexdev);
rte_free(priv);
}
return 0;
-- 
2.25.1



[dpdk-dev] [PATCH v7 1/2] devargs: add common key definition

2021-07-04 Thread Xueming Li
Adds common devargs key definition for "bus", "class" and "driver".

Acked-by: Thomas Monjalon 
Signed-off-by: Xueming Li 
---
 drivers/common/mlx5/mlx5_common.h   |  2 --
 drivers/common/mlx5/mlx5_common_pci.c   |  2 +-
 drivers/common/sfc_efx/sfc_efx.c|  7 +++
 drivers/common/sfc_efx/sfc_efx.h|  2 --
 drivers/net/bonding/rte_eth_bond_args.c |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c   |  5 ++---
 drivers/net/iavf/iavf_ethdev.c  |  5 ++---
 drivers/net/mlx5/mlx5.c |  4 ++--
 drivers/net/sfc/sfc_kvargs.c|  2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  2 +-
 lib/eal/common/eal_common_devargs.c | 12 ++--
 lib/eal/include/rte_devargs.h   | 24 
 12 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index 1fbefe0fa6..306f2f1ab7 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -208,8 +208,6 @@ __rte_internal
 int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
 
-#define MLX5_CLASS_ARG_NAME "class"
-
 enum mlx5_class {
MLX5_CLASS_INVALID,
MLX5_CLASS_NET = RTE_BIT64(0),
diff --git a/drivers/common/mlx5/mlx5_common_pci.c 
b/drivers/common/mlx5/mlx5_common_pci.c
index 3f16cd21cf..34747c4e07 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -118,7 +118,7 @@ bus_cmdline_options_handler(__rte_unused const char *key,
 static int
 parse_class_options(const struct rte_devargs *devargs)
 {
-   const char *key = MLX5_CLASS_ARG_NAME;
+   const char *key = RTE_DEVARGS_KEY_CLASS;
struct rte_kvargs *kvlist;
int ret = 0;
 
diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index 0b78933d9f..2dc5545760 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -42,7 +42,6 @@ enum sfc_efx_dev_class
 sfc_efx_dev_class_get(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvargs;
-   const char *key = SFC_EFX_KVARG_DEV_CLASS;
enum sfc_efx_dev_class dev_class = SFC_EFX_DEV_CLASS_NET;
 
if (devargs == NULL)
@@ -52,9 +51,9 @@ sfc_efx_dev_class_get(struct rte_devargs *devargs)
if (kvargs == NULL)
return dev_class;
 
-   if (rte_kvargs_count(kvargs, key) != 0) {
-   rte_kvargs_process(kvargs, key, sfc_efx_kvarg_dev_class_handler,
-  &dev_class);
+   if (rte_kvargs_count(kvargs, RTE_DEVARGS_KEY_CLASS) != 0) {
+   rte_kvargs_process(kvargs, RTE_DEVARGS_KEY_CLASS,
+  sfc_efx_kvarg_dev_class_handler, &dev_class);
}
 
rte_kvargs_free(kvargs);
diff --git a/drivers/common/sfc_efx/sfc_efx.h b/drivers/common/sfc_efx/sfc_efx.h
index 6b6164cb1f..c16eca60f3 100644
--- a/drivers/common/sfc_efx/sfc_efx.h
+++ b/drivers/common/sfc_efx/sfc_efx.h
@@ -19,8 +19,6 @@
 extern "C" {
 #endif
 
-#define SFC_EFX_KVARG_DEV_CLASS"class"
-
 enum sfc_efx_dev_class {
SFC_EFX_DEV_CLASS_INVALID = 0,
SFC_EFX_DEV_CLASS_NET,
diff --git a/drivers/net/bonding/rte_eth_bond_args.c 
b/drivers/net/bonding/rte_eth_bond_args.c
index 764b1b8c8e..5406e1c934 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -18,7 +18,7 @@ const char *pmd_bond_init_valid_arguments[] = {
PMD_BOND_SOCKET_ID_KVARG,
PMD_BOND_MAC_ADDR_KVARG,
PMD_BOND_AGG_MODE_KVARG,
-   "driver",
+   RTE_DEVARGS_KEY_DRIVER,
NULL
 };
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 385ebedcd3..0cfe13b7b2 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1660,7 +1660,6 @@ static int
 i40evf_driver_selected(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvlist;
-   const char *key = "driver";
int ret = 0;
 
if (devargs == NULL)
@@ -1670,13 +1669,13 @@ i40evf_driver_selected(struct rte_devargs *devargs)
if (kvlist == NULL)
return 0;
 
-   if (!rte_kvargs_count(kvlist, key))
+   if (!rte_kvargs_count(kvlist, RTE_DEVARGS_KEY_DRIVER))
goto exit;
 
/* i40evf driver selected when there's a key-value pair:
 * driver=i40evf
 */
-   if (rte_kvargs_process(kvlist, key,
+   if (rte_kvargs_process(kvlist, RTE_DEVARGS_KEY_DRIVER,
   i40evf_check_driver_handler, NULL) < 0)
goto exit;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 5290588b17..472538181e 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2448,7 +2448,6 @@ static int
 iavf_drv_i40evf_selected(struct rte_devargs *devargs, uint16_t device_id)
 {
struct rte_kvargs *kvlist;
-   const char *key = 

[dpdk-dev] [PATCH v7 2/2] bus/auxiliary: introduce auxiliary bus

2021-07-04 Thread Xueming Li
Auxiliary bus [1] provides a way to split function into child-devices
representing sub-domains of functionality. Each auxiliary device
represents a part of its parent functionality.

Auxiliary device is identified by unique device name, sysfs path:
  /sys/bus/auxiliary/devices/

Devargs legacy syntax of auxiliary device:
  -a auxiliary:[,args...]
Devargs generic syntax of auxiliary device:
  -a bus=auxiliary,name=/class=/driver=[,args...]

[1] kernel auxiliary bus document:
https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html

Signed-off-by: Xueming Li 
Cc: Wang Haiyue 
Cc: Thomas Monjalon 
Cc: Kinsella Ray 
Acked-by: Andrew Rybchenko 
---
 MAINTAINERS   |   5 +
 doc/guides/rel_notes/release_21_08.rst|   6 +
 drivers/bus/auxiliary/auxiliary_common.c  | 411 ++
 drivers/bus/auxiliary/auxiliary_params.c  |  59 
 drivers/bus/auxiliary/linux/auxiliary.c   | 141 
 drivers/bus/auxiliary/meson.build |  16 +
 drivers/bus/auxiliary/private.h   |  74 
 drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++
 drivers/bus/auxiliary/version.map |   7 +
 drivers/bus/meson.build   |   1 +
 10 files changed, 921 insertions(+)
 create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
 create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
 create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
 create mode 100644 drivers/bus/auxiliary/meson.build
 create mode 100644 drivers/bus/auxiliary/private.h
 create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
 create mode 100644 drivers/bus/auxiliary/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..9b0d0d4348 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
 Bus Drivers
 ---
 
+Auxiliary bus driver - EXPERIMENTAL
+M: Parav Pandit 
+M: Xueming Li 
+F: drivers/bus/auxiliary/
+
 Intel FPGA bus
 M: Rosen Xu 
 F: drivers/bus/ifpga/
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..e7ef4c8a05 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added auxiliary bus support.**
+
+  Auxiliary bus provides a way to split function into child-devices
+  representing sub-domains of functionality. Each auxiliary device
+  represents a part of its parent functionality.
+
 
 Removed Items
 -
diff --git a/drivers/bus/auxiliary/auxiliary_common.c 
b/drivers/bus/auxiliary/auxiliary_common.c
new file mode 100644
index 00..8a75306da5
--- /dev/null
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -0,0 +1,411 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "private.h"
+#include "rte_bus_auxiliary.h"
+
+static struct rte_devargs *
+auxiliary_devargs_lookup(const char *name)
+{
+   struct rte_devargs *devargs;
+
+   RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
+   if (strcmp(devargs->name, name) == 0)
+   return devargs;
+   }
+   return NULL;
+}
+
+/*
+ * Test whether the auxiliary device exist
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak bool
+auxiliary_dev_exists(const char *name)
+{
+   RTE_SET_USED(name);
+   return false;
+}
+
+/*
+ * Scan the devices in the auxiliary bus.
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak int
+auxiliary_scan(void)
+{
+   return 0;
+}
+
+/*
+ * Update a device's devargs being scanned.
+ *
+ * @param aux_dev
+ * AUXILIARY device.
+ */
+void
+auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
+{
+   aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
+}
+
+/*
+ * Match the auxiliary driver and device using driver function.
+ */
+bool
+auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
+   const struct rte_auxiliary_device *aux_dev)
+{
+   if (aux_drv->match == NULL)
+   return false;
+   return aux_drv->match(aux_dev->name);
+}
+
+/*
+ * Call the probe() function of the driver.
+ */
+static int
+rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *drv,
+  struct rte_auxiliary_device *dev)
+{
+   enum rte_iova_mode iova_mode;
+   int ret;
+
+   if ((drv == NULL) || (dev == NULL))
+   return -EINVAL;
+
+   /* Check if driver supports it. */
+   if (!auxiliary_match(drv, dev))
+   /* Match of device and driver failed */
+   return 1;

Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus

2021-07-04 Thread Xueming(Steven) Li
Hi Andrew,

Thanks very much all the good suggestions, v7 posted.

Best Regards,
Xueming

> -Original Message-
> From: Andrew Rybchenko 
> Sent: Monday, July 5, 2021 12:13 AM
> To: Xueming(Steven) Li 
> Cc: dev@dpdk.org; Wang Haiyue ; NBU-Contact-Thomas 
> Monjalon ; Kinsella Ray
> ; Neil Horman 
> Subject: Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
> 
> On 6/25/21 2:47 PM, Xueming Li wrote:
> > Auxiliary bus [1] provides a way to split function into child-devices
> > representing sub-domains of functionality. Each auxiliary device
> > represents a part of its parent functionality.
> >
> > Auxiliary device is identified by unique device name, sysfs path:
> >   /sys/bus/auxiliary/devices/
> >
> > Devargs legacy syntax ofauxiliary device:
> 
> Missing space after 'of'
> 
> >   -a auxiliary:[,args...]
> > Devargs generic syntax of auxiliary device:
> >   -a bus=auxiliary,name=,,/class=,,/driver=,,
> 
> Are two commas above intentionall? What for?
> 
> >
> > [1] kernel auxiliary bus document:
> > https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> >
> > Signed-off-by: Xueming Li 
> 
> With my below notes fixed:
> 
> Acked-by: Andrew Rybchenko 
> 
> > Cc: Wang Haiyue 
> > Cc: Thomas Monjalon 
> > Cc: Kinsella Ray 
> > ---
> >  MAINTAINERS   |   5 +
> >  doc/guides/rel_notes/release_21_08.rst|   6 +
> >  drivers/bus/auxiliary/auxiliary_common.c  | 411
> > ++  drivers/bus/auxiliary/auxiliary_params.c  |  59 
> >  drivers/bus/auxiliary/linux/auxiliary.c   | 141 
> >  drivers/bus/auxiliary/meson.build |  16 +
> >  drivers/bus/auxiliary/private.h   |  74 
> >  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++
> >  drivers/bus/auxiliary/version.map |   7 +
> >  drivers/bus/meson.build   |   1 +
> >  10 files changed, 921 insertions(+)
> >  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
> >  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
> >  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
> >  create mode 100644 drivers/bus/auxiliary/meson.build  create mode
> > 100644 drivers/bus/auxiliary/private.h  create mode 100644
> > drivers/bus/auxiliary/rte_bus_auxiliary.h
> >  create mode 100644 drivers/bus/auxiliary/version.map
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS index 5877a16971..eaf691ca6a
> > 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst  Bus Drivers
> >  ---
> >
> > +Auxiliary bus driver
> 
> Shouldn't it be EXPERIMENTAL?
> 
> > +M: Parav Pandit 
> > +M: Xueming Li 
> > +F: drivers/bus/auxiliary/
> > +
> >  Intel FPGA bus
> >  M: Rosen Xu 
> >  F: drivers/bus/ifpga/
> > diff --git a/doc/guides/rel_notes/release_21_08.rst
> > b/doc/guides/rel_notes/release_21_08.rst
> > index a6ecfdf3ce..e7ef4c8a05 100644
> > --- a/doc/guides/rel_notes/release_21_08.rst
> > +++ b/doc/guides/rel_notes/release_21_08.rst
> > @@ -55,6 +55,12 @@ New Features
> >   Also, make sure to start the actual text at the margin.
> >   ===
> >
> > +* **Added auxiliary bus support.**
> > +
> > +  Auxiliary bus provides a way to split function into child-devices
> > + representing sub-domains of functionality. Each auxiliary device
> > + represents a part of its parent functionality.
> > +
> >
> >  Removed Items
> >  -
> > diff --git a/drivers/bus/auxiliary/auxiliary_common.c
> > b/drivers/bus/auxiliary/auxiliary_common.c
> > new file mode 100644
> > index 00..8a75306da5
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/auxiliary_common.c
> > @@ -0,0 +1,411 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates  */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "private.h"
> > +#include "rte_bus_auxiliary.h"
> > +
> > +static struct rte_devargs *
> > +auxiliary_devargs_lookup(const char *name) {
> > +   struct rte_devargs *devargs;
> > +
> > +   RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
> > +   if (strcmp(devargs->name, name) == 0)
> > +   return devargs;
> > +   }
> > +   return NULL;
> > +}
> > +
> > +/*
> > + * Test whether the auxiliary device exist
> 
> Missing full stop above.
> 
> > + *
> > + * Stub for OS not supporting auxiliary bus.
> > + */
> > +__rte_weak bool
> > +auxiliary_dev_exists(const char *name) {
> > +   RTE_SET_USED(name);
> > +   return false;
> > +}
> > +
> > +/*
> > + * Scan the devices in the auxiliary bus.
> > + *
> > + * Stub for OS not supporting auxiliary bus.
> > + */
> > +__rte_weak int
> > +auxiliary_scan(void)
> > +{
>

Re: [dpdk-dev] [RFC] lib/ethdev: add dev configured flag

2021-07-04 Thread Thomas Monjalon
05/07/2021 05:18, Huisong Li:
> 在 2021/7/5 4:05, Thomas Monjalon 写道:
> > 08/05/2021 10:00, Huisong Li:
> >> Currently, if dev_configure is not invoked or fails to be invoked, users
> >> can still invoke dev_start successfully. This patch adds a "dev_configured"
> >> flag in "rte_eth_dev_data" to control whether dev_start can be invoked.
> > [...]
> >> --- a/lib/ethdev/rte_ethdev_core.h
> >> +++ b/lib/ethdev/rte_ethdev_core.h
> >> @@ -167,7 +167,11 @@ struct rte_eth_dev_data {
> >>scattered_rx : 1,  /**< RX of scattered packets is ON(1) / 
> >> OFF(0) */
> >>all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
> >>dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). 
> >> */
> >> -  lro : 1;   /**< RX LRO is ON(1) / OFF(0) */
> >> +  lro : 1,  /**< RX LRO is ON(1) / OFF(0) */
> >> +  dev_configured : 1;
> >> +  /**< Device configuration state:
> >> +   * CONFIGURED(1) / NOT CONFIGURED(0).
> >> +   */
> > Why not using "enum rte_eth_dev_state"?
> > Because rte_eth_dev.state is not shared between processes?
> 
> It doesn't feel right. "enum rte_eth_dev_state" is private to the 
> primary and secondary processes and can be independently controlled.
> 
> However, the secondary process does not make resource allocations and 
> does not call dev_configure().
> 
> These are done by the primary process and  can be obtained or used by 
> the secondary process.
> 
> Like "dev_started" in struct rte_eth_dev_data.

That's a good reason, thanks.




[dpdk-dev] [PATCH v5 0/4] support async dequeue for split ring

2021-07-04 Thread Wenwu Ma
This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with
offloading large copies to the DMA engine, thus saving precious CPU
cycles.

v5:
- DMA address use IOVA instead of VA.

v4:
- Fix wrong packet index issue in async dequeue improve
  the performance of small packet copies.

v3:
- Fix compilation warning and error in arm platform.
- Restore the removed function virtio_dev_pktmbuf_alloc,
  async dequeue allocate packets in separate.

v2:
- Refactor vhost datapath as preliminary patch for this series.
- The change of using new API in examples/vhost is put into a
  dedicated patch.
- Check queue_id value before using it.
- Async dequeue performance enhancement. 160% performance improvement
  for v2 vs. v1.
- Async dequeue API name change from rte_vhost_try_dequeue_burst to
  rte_vhost_async_try_dequeue_burst.
- The completed package updates the used ring directly.

Wenwu Ma (3):
  examples/vhost: refactor vhost enqueue and dequeue datapaths.
  examples/vhost: use a new API to query remaining ring space
  examples/vhost: support vhost async dequeue data path

Yuan Wang (1):
  vhost: support async dequeue for split ring

 doc/guides/prog_guide/vhost_lib.rst |  10 +
 doc/guides/sample_app_ug/vhost.rst  |   9 +-
 examples/vhost/ioat.c   |  67 +++-
 examples/vhost/ioat.h   |  25 ++
 examples/vhost/main.c   | 224 +++
 examples/vhost/main.h   |  33 +-
 examples/vhost/virtio_net.c |  16 +-
 lib/vhost/rte_vhost_async.h |  44 +-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 601 
 10 files changed, 924 insertions(+), 108 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v5 1/4] examples/vhost: refactor vhost enqueue and dequeue datapaths.

2021-07-04 Thread Wenwu Ma
Previously, by judging the flag, we call different enqueue/dequeue
functions in data path.

Now, we use an ops that was initialized when Vhost was created,
so that we can call ops directly in Vhost data path without any more
flag judgment.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/main.c   | 112 
 examples/vhost/main.h   |  33 +--
 examples/vhost/virtio_net.c |  16 +-
 3 files changed, 105 insertions(+), 56 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d2179eadb9..aebdc3a566 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
 /* empty vmdq configuration structure. Filled in programatically */
 static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -885,27 +887,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-   if (builtin_net_driver) {
-   ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[nr_xmit];
-
-   complete_async_pkts(vdev);
-   ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit, m_cpu_cpl, &cpu_cpl_nr);
-
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = nr_xmit - ret;
-   if (enqueue_fail)
-   free_pkts(&m[ret], nr_xmit - ret);
-   } else {
-   ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit);
-   }
+   ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, m, nr_xmit);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1184,6 +1167,36 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   uint16_t enqueue_count;
+   uint32_t cpu_cpl_nr = 0;
+   uint16_t enqueue_fail = 0;
+   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
+
+   complete_async_pkts(vdev);
+   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+   queue_id, pkts, rx_count,
+   m_cpu_cpl, &cpu_cpl_nr);
+   if (cpu_cpl_nr)
+   free_pkts(m_cpu_cpl, cpu_cpl_nr);
+
+   enqueue_fail = rx_count - enqueue_count;
+   if (enqueue_fail)
+   free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+   return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1214,29 +1227,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
 
-   if (builtin_net_driver) {
-   enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-   pkts, rx_count);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
-
-   complete_async_pkts(vdev);
-   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-   VIRTIO_RXQ, pkts, rx_count,
-   m_cpu_cpl, &cpu_cpl_nr);
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = rx_count - enqueue_count;
-   if (enqueue_fail)
-   free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-   } else {
-   enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   pkts, rx_count);
-   }
+   enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, pkts, rx_count);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1249,6 +1241,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
 }
 
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool,
+   

[dpdk-dev] [PATCH v5 2/4] examples/vhost: use a new API to query remaining ring space

2021-07-04 Thread Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so we use the new one instead of the old one.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/ioat.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 2a2c2d7202..bf4e033bdb 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
-   unsigned short ioat_space;
 };
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
-   cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
-   if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+   if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
-   cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
 
-   cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
 
read = cb_tracker[dev_id].next_read;
-- 
2.25.1



[dpdk-dev] [PATCH v5 3/4] vhost: support async dequeue for split ring

2021-07-04 Thread Wenwu Ma
From: Yuan Wang 

This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest with
offloading large copies to the DMA engine, thus saving precious CPU
cycles.

Signed-off-by: Yuan Wang 
Signed-off-by: Jiayu Hu 
Signed-off-by: Wenwu Ma 
---
 doc/guides/prog_guide/vhost_lib.rst |  10 +
 lib/vhost/rte_vhost_async.h |  44 +-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 601 
 4 files changed, 655 insertions(+), 3 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst 
b/doc/guides/prog_guide/vhost_lib.rst
index d18fb98910..05c42c9b11 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -281,6 +281,16 @@ The following is an overview of some key Vhost API 
functions:
   Poll enqueue completion status from async data path. Completed packets
   are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, 
nr_inflight)``
+
+  Try to receive packets from the guest with offloading large packets
+  to the DMA engine. Successfully dequeued packets are transfer
+  completed and returned in ``pkts``. But there may be other packets
+  that are sent from the guest but being transferred by the DMA engine,
+  called in-flight packets. This function will return in-flight packets
+  only after the DMA engine finishes transferring. The amount of
+  in-flight packets by now is returned in ``nr_inflight``.
+
 Vhost-user Implementations
 --
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 6faa31f5ad..58019408f1 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -84,13 +84,21 @@ struct rte_vhost_async_channel_ops {
 };
 
 /**
- * inflight async packet information
+ * in-flight async packet information
  */
+struct async_nethdr {
+   struct virtio_net_hdr hdr;
+   bool valid;
+};
+
 struct async_inflight_info {
struct rte_mbuf *mbuf;
-   uint16_t descs; /* num of descs inflight */
+   union {
+   uint16_t descs; /* num of descs in-flight */
+   struct async_nethdr nethdr;
+   };
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
-};
+} __rte_cache_aligned;
 
 /**
  *  dma channel feature bit definition
@@ -193,4 +201,34 @@ __rte_experimental
 uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * large copies to the DMA engine. Successfully dequeued packets are
+ * transfer completed, either by the CPU or the DMA engine, and they are
+ * returned in "pkts". There may be other packets that are sent from
+ * the guest but being transferred by the DMA engine, called in-flight
+ * packets. The amount of in-flight packets by now is returned in
+ * "nr_inflight". This function will return in-flight packets only after
+ * the DMA engine finishes transferring.
+ *
+ * @param vid
+ *  id of vhost device to dequeue data
+ * @param queue_id
+ *  queue id to dequeue data
+ * @param pkts
+ *  blank array to keep successfully dequeued packets
+ * @param count
+ *  size of the packet array
+ * @param nr_inflight
+ *  the amount of in-flight packets by now. If error occurred, its
+ *  value is set to -1.
+ * @return
+ *  num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+   int *nr_inflight);
+
 #endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 9103a23cd4..a320f889cd 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -79,4 +79,7 @@ EXPERIMENTAL {
 
# added in 21.05
rte_vhost_get_negotiated_protocol_features;
+
+   # added in 21.08
+   rte_vhost_async_try_dequeue_burst;
 };
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index b93482587c..52237e8600 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2673,6 +2673,32 @@ virtio_dev_pktmbuf_prep(struct virtio_net *dev, struct 
rte_mbuf *pkt,
return -1;
 }
 
+/*
+ * Allocate a host supported pktmbuf.
+ */
+static __rte_always_inline struct rte_mbuf *
+virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
+uint32_t data_len)
+{
+   struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
+
+   if (unlikely(pkt == NULL)) {
+   VHOST_LOG_DATA(ERR,
+   "Failed to allocate memory for mbuf.\n");
+   return NULL;
+   }
+
+   if (virtio_dev_pktmbuf_prep(dev, pkt, data_len)) {
+   /* Data doesn

[dpdk-dev] [PATCH v5 4/4] examples/vhost: support vhost async dequeue data path

2021-07-04 Thread Wenwu Ma
This patch is to add vhost async dequeue data-path in vhost sample.
vswitch can leverage IOAT to accelerate vhost async dequeue data-path.

Signed-off-by: Wenwu Ma 
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/ioat.c  |  61 ++---
 examples/vhost/ioat.h  |  25 ++
 examples/vhost/main.c  | 140 -
 4 files changed, 177 insertions(+), 58 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst 
b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in 
combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index bf4e033bdb..a305100b47 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
 
+int vid2socketid[MAX_VHOST_DEVICE];
+
 int
 open_ioat(const char *value)
 {
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
-   int64_t vid, vring_id;
+   int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,8 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+   char *txd, *rxd;
+   bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +72,38 @@ open_ioat(const char *value)
goto out;
}
 
-   start = strstr(ptrs[0], "txd");
-   if (start == NULL) {
+   int async_flag;
+   txd = strstr(ptrs[0], "txd");
+   rxd = strstr(ptrs[0], "rxd");
+   if (txd == NULL && rxd == NULL) {
ret = -1;
goto out;
+   } else if (txd) {
+   is_txd = true;
+   start = txd;
+   async_flag = ASYNC_RX_VHOST;
+   } else {
+   is_txd = false;
+   start = rxd;
+   async_flag = ASYNC_TX_VHOST;
}
 
start += 3;
-   vid = strtol(start, &end, 0);
+   socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
 
-   vring_id = 0 + VIRTIO_RXQ;
+   vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
-   &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+   &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
 
-   rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+   rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
 
-   (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
-   (dma_info + vid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+   (dma_info + socketid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost

Re: [dpdk-dev] [PATCH] test: fix crypto_op length for sessionless case

2021-07-04 Thread Gujjar, Abhinandan S
Hi Jerin/Akhil,

Could you please review the patch?

Regards
Abhinandan

> -Original Message-
> From: Yigit, Ferruh 
> Sent: Saturday, July 3, 2021 4:56 AM
> To: Gujjar, Abhinandan S ; dev@dpdk.org;
> jer...@marvell.com; dpdk...@iol.unh.edu; acon...@redhat.com
> Cc: gak...@marvell.com; Power, Ciara ; Ali Alnubani
> 
> Subject: Re: [PATCH] test: fix crypto_op length for sessionless case
> 
> On 7/2/2021 7:08 PM, Gujjar, Abhinandan S wrote:
> > Hi Aaron/dpdklab,
> >
> > This patch's CI seems to have lot of false positive!
> > Ferruh triggered the re-test sometime back. Now, it is reporting less.
> > Could you please check from your end? Thanks!
> >
> 
> Only a malloc related unit test is still failing, which seems unrelated with 
> the
> patch. I am triggering it one more time, third time lucky.
> 
> Also after re-run, some tests passing now still shown as fail in the patchwork
> checks table. Isn't re-run sending the patchwork test status again?
> 
> > Regards
> > Abhinandan
> >
> >
> >> -Original Message-
> >> From: Gujjar, Abhinandan S 
> >> Sent: Wednesday, June 30, 2021 6:17 PM
> >> To: dev@dpdk.org; jer...@marvell.com
> >> Cc: gak...@marvell.com; Gujjar, Abhinandan S
> >> ; Power, Ciara 
> >> Subject: [PATCH] test: fix crypto_op length for sessionless case
> >>
> >> Currently, private_data_offset for the sessionless is computed
> >> wrongly which includes extra bytes added because of using
> >> sizeof(struct
> >> rte_crypto_sym_xform) * 2) instead of (sizeof(union
> >> rte_event_crypto_metadata)). Due to this buffer overflow, the
> >> corruption was leading to test application crash while freeing the ops
> mempool.
> >>
> >> Fixes: 3c2c535ecfc0 ("test: add event crypto adapter auto-test")
> >> Reported-by: ciara.po...@intel.com
> >>
> >> Signed-off-by: Abhinandan Gujjar 
> >> ---
> >>  app/test/test_event_crypto_adapter.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/app/test/test_event_crypto_adapter.c
> >> b/app/test/test_event_crypto_adapter.c
> >> index f689bc1f2..688ac0b2f 100644
> >> --- a/app/test/test_event_crypto_adapter.c
> >> +++ b/app/test/test_event_crypto_adapter.c
> >> @@ -229,7 +229,7 @@ test_op_forward_mode(uint8_t session_less)
> >>   first_xform = &cipher_xform;
> >>   sym_op->xform = first_xform;
> >>   uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
> >> - (sizeof(struct rte_crypto_sym_xform) * 2);
> >> + (sizeof(union
> >> + rte_event_crypto_metadata));
> >>   op->private_data_offset = len;
> >>   /* Fill in private data information */
> >>   rte_memcpy(&m_data.response_info, &response_info, @@ -
> >> 424,7 +424,7 @@ test_op_new_mode(uint8_t session_less)
> >>   first_xform = &cipher_xform;
> >>   sym_op->xform = first_xform;
> >>   uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
> >> - (sizeof(struct rte_crypto_sym_xform) * 2);
> >> + (sizeof(union
> >> + rte_event_crypto_metadata));
> >>   op->private_data_offset = len;
> >>   /* Fill in private data information */
> >>   rte_memcpy(&m_data.response_info, &response_info,
> >> --
> >> 2.25.1
> >



Re: [dpdk-dev] 19.11.9 patches review and test

2021-07-04 Thread Ali Alnubani
Hi Christian,
Sorry for the delay.

> -Original Message-
> From: Christian Ehrhardt 
> Sent: Friday, July 2, 2021 10:42 AM
> To: Ali Alnubani 
> Cc: sta...@dpdk.org; dev@dpdk.org; Abhishek Marathe
> ; Akhil Goyal ;
> benjamin.wal...@intel.com; David Christensen ;
> hariprasad.govindhara...@intel.com; Hemant Agrawal
> ; Ian Stokes ; Jerin
> Jacob ; John McNamara ;
> Ju-Hyoung Lee ; Kevin Traynor
> ; Luca Boccassi ; Pei Zhang
> ; pingx...@intel.com; qian.q...@intel.com; Raslan
> Darawsheh ; NBU-Contact-Thomas Monjalon
> ; yuan.p...@intel.com; zhaoyan.c...@intel.com
> Subject: Re: 19.11.9 patches review and test
> 
> On Thu, Jul 1, 2021 at 5:50 PM Ali Alnubani  wrote:
> >
> > Hi,
> >
> > > -Original Message-
> > > From: Christian Ehrhardt 
> > > Sent: Thursday, June 17, 2021 9:38 AM
> > > To: sta...@dpdk.org
> > > Cc: dev@dpdk.org; Abhishek Marathe
> ;
> > > Akhil Goyal ; Ali Alnubani
> > > ; benjamin.wal...@intel.com; David Christensen
> > > ; hariprasad.govindhara...@intel.com;
> Hemant
> > > Agrawal ; Ian Stokes
> ;
> > > Jerin Jacob ; John McNamara
> > > ; Ju-Hyoung Lee ;
> > > Kevin Traynor ; Luca Boccassi
> > > ; Pei Zhang ;
> > > pingx...@intel.com; qian.q...@intel.com; Raslan Darawsheh
> > > ; NBU-Contact-Thomas Monjalon
> > > ; yuan.p...@intel.com;
> zhaoyan.c...@intel.com
> > > Subject: 19.11.9 patches review and test
> > >
> > > Hi all,
> > >
> > > Here is a list of patches targeted for stable release 19.11.9.
> > >
> > > The planned date for the final release is 2nd July.
> > >
> > > Please help with testing and validation of your use cases and report
> > > any issues/results with reply-all to this mail. For the final
> > > release the fixes and reported validations will be added to the release
> notes.
> > >
> > > A renewed release candidate tarball can be found at:
> > >
> > > https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.9-rc3
> > >
> > > These patches are located at branch 19.11 of dpdk-stable repo:
> > > https://dpdk.org/browse/dpdk-stable/
> > >
> > > Thanks.
> > >
> > > Christian Ehrhardt 
> > >
> >
> > The following covers the functional tests that we ran on Mellanox
> hardware for this release:
> > - Basic functionality:
> >   Send and receive multiple types of traffic.
> > - testpmd xstats counter test.
> > - testpmd timestamp test.
> > - Changing/checking link status through testpmd.
> > - RTE flow tests:
> >   Items: eth / vlan / ipv4 / ipv6 / tcp / udp / icmp / gre / nvgre / vxlan 
> > / ip in
> ip / mplsoudp / mplsogre
> >   Actions: drop / queue / rss / mark / flag / jump / count / raw_encap
> > / raw_decap / vxlan_encap / vxlan_decap / NAT / dec_ttl
> > - Some RSS tests.
> > - VLAN filtering, stripping and insertion tests.
> > - Checksum and TSO tests.
> > - ptype tests.
> > - link_status_interrupt example application tests.
> > - l3fwd-power example application tests.
> > - Multi-process example applications tests.
> >
> > Functional tests ran on:
> > - NIC: ConnectX-4 Lx / OS: Ubuntu 20.04 LTS / Driver:
> > MLNX_OFED_LINUX-5.3-1.0.0.1 / Firmware: 14.30.1004
> > - NIC: ConnectX-5 / OS: Ubuntu 20.04 LTS / Driver:
> > MLNX_OFED_LINUX-5.3-1.0.0.1 / Firmware: 16.30.1004
> >
> > We discovered 2 new issues due to environment changes:
> > - can't create some rules with count action, fixed by
> https://inbox.dpdk.org/stable/20210621145105.963179-1-
> lmarga...@nvidia.com/T/#u.
> 
> As discussed in the release meeting yesterday, that is a regression and needs
> to be added.
> While I feel slightly tired of new respins this is what we do the validation 
> for.
> So thanks you (all!) for the testing!
> 
> I have now applied the referred fix to 19.11.9

Thank you.

> 
> FYI: I see this came up ~5 days into the -rc3 testing, but I missed it so I
> checked why.
> It was breaking the usual subject syntax (double colon) and not flagged for
> 19.11.9 (as requested in backport mails and the common
> style) in subject or body.
> 
> > - rte_flow hit counter doesn't increment, still discussing this internally.
> 
> For this issue please tell me until Monday what you expect.
> Does it look close to being resolved (then I'll hold -rc4 back a bit
> longer) or should I go on planning to release 19.11.9 without it (then I'd 
> tag -
> rc4 on Monday).

We are checking if we can resolve this in the next few hours. Hope that's ok 
with you.

> 
> Generally on -rc4 I think it would be sufficient if you would re-run your
> testing as your tests cover the only two things we have touched in -rc4 (clang
> 12 and the flow issue).
> I'm not "against" more tests, just saying that those should be sufficient in
> case no others come back then.
> Because other than the issues reported in here all other tests already came
> back fully green for -rc3.
> 
> FYI the usual non-official WIP repo on
> https://github.com/cpaelzer/dpdk-stable-queue/commits/19.11 holds what
> 19.11.9-rc4 would become if you want to pre-test anything.
> 
> > Compilation tests with multiple configurations in the followi

Re: [dpdk-dev] [PATCH v3 2/2] eal: handle compressed firmwares

2021-07-04 Thread Wang, Haiyue
> -Original Message-
> From: dev  On Behalf Of David Marchand
> Sent: Tuesday, June 29, 2021 16:07
> To: dev@dpdk.org
> Cc: Igor Russkikh ; Aaron Conole ; 
> Michael Santana
> ; Richardson, Bruce ; 
> Rasesh Mody
> ; Shahed Shaikh ; Yang, Qiming 
> ; Zhang,
> Qi Z ; Heinrich Kuhn ; 
> Devendra Singh Rawat
> ; Ray Kinsella ; Neil Horman 
> ; Dmitry
> Kozlyuk ; Narcisa Ana Maria Vasile 
> ; Dmitry
> Malloy ; Kadam, Pallavi 
> Subject: [dpdk-dev] [PATCH v3 2/2] eal: handle compressed firmwares
> 
> Introduce an internal firmware loading helper to remove code duplication
> in our drivers and handle xz compressed firmwares by calling libarchive.
> 
> This helper tries to look for .xz suffixes so that drivers are not aware
> the firmwares have been compressed.
> 
> libarchive is set as an optional dependency: without libarchive, a
> runtime warning is emitted so that users know there is a compressed
> firmware.
> 
> Windows implementation is left as an empty stub.
> 
> Signed-off-by: David Marchand 
> Reviewed-by: Igor Russkikh 
> ---
> Changes since v2:
> - added a comment on libarchive link dependency,
> 
> Changes since v1:
> - used pkg-config for libarchive detection,
> - updated doxygen annotations,
> - added internal helpers in eal_firmware.c to enhance readability,
> - dropped whitespace damage in version.map,
> 
> ---
>  .github/workflows/build.yml|   1 +
>  .travis.yml|   1 +
>  config/meson.build |  10 +++
>  drivers/net/bnx2x/bnx2x.c  |  35 +++-
>  drivers/net/ice/ice_ethdev.c   |  60 +++--
>  drivers/net/nfp/nfp_net.c  |  57 +++--
>  drivers/net/qede/qede_main.c   |  45 --
>  lib/eal/include/rte_firmware.h |  32 +++
>  lib/eal/unix/eal_firmware.c| 149 +
>  lib/eal/unix/meson.build   |   1 +
>  lib/eal/version.map|   1 +
>  lib/eal/windows/eal.c  |   9 ++
>  12 files changed, 259 insertions(+), 142 deletions(-)
>  create mode 100644 lib/eal/include/rte_firmware.h
>  create mode 100644 lib/eal/unix/eal_firmware.c
> 


> +int
> +rte_firmware_read(const char *name, void **buf, size_t *bufsz)
> +{
> + char path[PATH_MAX];
> + int ret;
> +
> + ret = firmware_read(name, buf, bufsz);
> + if (ret < 0) {
> + snprintf(path, sizeof(path), "%s.xz", name);
> + path[PATH_MAX - 1] = '\0';
> +#ifndef RTE_HAS_LIBARCHIVE
> + if (access(path, F_OK) == 0) {
> + RTE_LOG(WARNING, EAL, "libarchive not available, %s 
> cannot be decompressed\n",
> + path);
> + }
> +#else
> + ret = firmware_read(path, buf, bufsz);
> +#endif
> + }
> + return ret;
> +}


Since ice PMD needs to check if the firmware file with different name can be 
accessed
by some kind of order, before doing the final firmware selection. Should we 
also add
the firmware access API for handling this ?

bool
rte_firmware_access(const char *name)
{
char path[PATH_MAX];

if (access(name, F_OK) == 0)
return true;

snprintf(path, sizeof(path), "%s.xz", name);
path[PATH_MAX - 1] = '\0';
if (access(path, F_OK) == 0)
return true;

return false;
}


> --
> 2.23.0



[dpdk-dev] [PATCH v8 1/2] devargs: add common key definition

2021-07-04 Thread Xueming Li
Adds common devargs key definition for "bus", "class" and "driver".

Acked-by: Thomas Monjalon 
Signed-off-by: Xueming Li 
---
 drivers/common/mlx5/mlx5_common.h   |  2 --
 drivers/common/mlx5/mlx5_common_pci.c   |  2 +-
 drivers/common/sfc_efx/sfc_efx.c|  7 +++
 drivers/common/sfc_efx/sfc_efx.h|  2 --
 drivers/net/bonding/rte_eth_bond_args.c |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c   |  5 ++---
 drivers/net/iavf/iavf_ethdev.c  |  5 ++---
 drivers/net/mlx5/mlx5.c |  4 ++--
 drivers/net/sfc/sfc_kvargs.c|  2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  2 +-
 lib/eal/common/eal_common_devargs.c | 12 ++--
 lib/eal/include/rte_devargs.h   | 24 
 12 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index 1fbefe0fa6..306f2f1ab7 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -208,8 +208,6 @@ __rte_internal
 int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
 
-#define MLX5_CLASS_ARG_NAME "class"
-
 enum mlx5_class {
MLX5_CLASS_INVALID,
MLX5_CLASS_NET = RTE_BIT64(0),
diff --git a/drivers/common/mlx5/mlx5_common_pci.c 
b/drivers/common/mlx5/mlx5_common_pci.c
index 3f16cd21cf..34747c4e07 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -118,7 +118,7 @@ bus_cmdline_options_handler(__rte_unused const char *key,
 static int
 parse_class_options(const struct rte_devargs *devargs)
 {
-   const char *key = MLX5_CLASS_ARG_NAME;
+   const char *key = RTE_DEVARGS_KEY_CLASS;
struct rte_kvargs *kvlist;
int ret = 0;
 
diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index 0b78933d9f..2dc5545760 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -42,7 +42,6 @@ enum sfc_efx_dev_class
 sfc_efx_dev_class_get(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvargs;
-   const char *key = SFC_EFX_KVARG_DEV_CLASS;
enum sfc_efx_dev_class dev_class = SFC_EFX_DEV_CLASS_NET;
 
if (devargs == NULL)
@@ -52,9 +51,9 @@ sfc_efx_dev_class_get(struct rte_devargs *devargs)
if (kvargs == NULL)
return dev_class;
 
-   if (rte_kvargs_count(kvargs, key) != 0) {
-   rte_kvargs_process(kvargs, key, sfc_efx_kvarg_dev_class_handler,
-  &dev_class);
+   if (rte_kvargs_count(kvargs, RTE_DEVARGS_KEY_CLASS) != 0) {
+   rte_kvargs_process(kvargs, RTE_DEVARGS_KEY_CLASS,
+  sfc_efx_kvarg_dev_class_handler, &dev_class);
}
 
rte_kvargs_free(kvargs);
diff --git a/drivers/common/sfc_efx/sfc_efx.h b/drivers/common/sfc_efx/sfc_efx.h
index 6b6164cb1f..c16eca60f3 100644
--- a/drivers/common/sfc_efx/sfc_efx.h
+++ b/drivers/common/sfc_efx/sfc_efx.h
@@ -19,8 +19,6 @@
 extern "C" {
 #endif
 
-#define SFC_EFX_KVARG_DEV_CLASS"class"
-
 enum sfc_efx_dev_class {
SFC_EFX_DEV_CLASS_INVALID = 0,
SFC_EFX_DEV_CLASS_NET,
diff --git a/drivers/net/bonding/rte_eth_bond_args.c 
b/drivers/net/bonding/rte_eth_bond_args.c
index 764b1b8c8e..5406e1c934 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -18,7 +18,7 @@ const char *pmd_bond_init_valid_arguments[] = {
PMD_BOND_SOCKET_ID_KVARG,
PMD_BOND_MAC_ADDR_KVARG,
PMD_BOND_AGG_MODE_KVARG,
-   "driver",
+   RTE_DEVARGS_KEY_DRIVER,
NULL
 };
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 385ebedcd3..0cfe13b7b2 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1660,7 +1660,6 @@ static int
 i40evf_driver_selected(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvlist;
-   const char *key = "driver";
int ret = 0;
 
if (devargs == NULL)
@@ -1670,13 +1669,13 @@ i40evf_driver_selected(struct rte_devargs *devargs)
if (kvlist == NULL)
return 0;
 
-   if (!rte_kvargs_count(kvlist, key))
+   if (!rte_kvargs_count(kvlist, RTE_DEVARGS_KEY_DRIVER))
goto exit;
 
/* i40evf driver selected when there's a key-value pair:
 * driver=i40evf
 */
-   if (rte_kvargs_process(kvlist, key,
+   if (rte_kvargs_process(kvlist, RTE_DEVARGS_KEY_DRIVER,
   i40evf_check_driver_handler, NULL) < 0)
goto exit;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index 5290588b17..472538181e 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2448,7 +2448,6 @@ static int
 iavf_drv_i40evf_selected(struct rte_devargs *devargs, uint16_t device_id)
 {
struct rte_kvargs *kvlist;
-   const char *key = 

[dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus

2021-07-04 Thread Xueming Li
Auxiliary bus [1] provides a way to split function into child-devices
representing sub-domains of functionality. Each auxiliary device
represents a part of its parent functionality.

Auxiliary device is identified by unique device name, sysfs path:
  /sys/bus/auxiliary/devices/

Devargs legacy syntax of auxiliary device:
  -a auxiliary:[,args...]
Devargs generic syntax of auxiliary device:
  -a bus=auxiliary,name=/class=/driver=[,args...]

[1] kernel auxiliary bus document:
https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html

Signed-off-by: Xueming Li 
Cc: Wang Haiyue 
Cc: Thomas Monjalon 
Cc: Kinsella Ray 
Acked-by: Andrew Rybchenko 
---
 MAINTAINERS   |   5 +
 doc/guides/rel_notes/release_21_08.rst|   6 +
 drivers/bus/auxiliary/auxiliary_common.c  | 411 ++
 drivers/bus/auxiliary/auxiliary_params.c  |  59 
 drivers/bus/auxiliary/linux/auxiliary.c   | 141 
 drivers/bus/auxiliary/meson.build |  16 +
 drivers/bus/auxiliary/private.h   |  91 +
 drivers/bus/auxiliary/rte_bus_auxiliary.h | 193 ++
 drivers/bus/auxiliary/version.map |   7 +
 drivers/bus/meson.build   |   1 +
 10 files changed, 930 insertions(+)
 create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
 create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
 create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
 create mode 100644 drivers/bus/auxiliary/meson.build
 create mode 100644 drivers/bus/auxiliary/private.h
 create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
 create mode 100644 drivers/bus/auxiliary/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..9b0d0d4348 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
 Bus Drivers
 ---
 
+Auxiliary bus driver - EXPERIMENTAL
+M: Parav Pandit 
+M: Xueming Li 
+F: drivers/bus/auxiliary/
+
 Intel FPGA bus
 M: Rosen Xu 
 F: drivers/bus/ifpga/
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..e7ef4c8a05 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added auxiliary bus support.**
+
+  Auxiliary bus provides a way to split function into child-devices
+  representing sub-domains of functionality. Each auxiliary device
+  represents a part of its parent functionality.
+
 
 Removed Items
 -
diff --git a/drivers/bus/auxiliary/auxiliary_common.c 
b/drivers/bus/auxiliary/auxiliary_common.c
new file mode 100644
index 00..abe6aef098
--- /dev/null
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -0,0 +1,411 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "private.h"
+#include "rte_bus_auxiliary.h"
+
+static struct rte_devargs *
+auxiliary_devargs_lookup(const char *name)
+{
+   struct rte_devargs *devargs;
+
+   RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
+   if (strcmp(devargs->name, name) == 0)
+   return devargs;
+   }
+   return NULL;
+}
+
+/*
+ * Test whether the auxiliary device exist.
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak bool
+auxiliary_dev_exists(const char *name)
+{
+   RTE_SET_USED(name);
+   return false;
+}
+
+/*
+ * Scan the devices in the auxiliary bus.
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak int
+auxiliary_scan(void)
+{
+   return 0;
+}
+
+/*
+ * Update a device's devargs being scanned.
+ *
+ * @param aux_dev
+ * AUXILIARY device.
+ */
+void
+auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
+{
+   aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
+}
+
+/*
+ * Match the auxiliary driver and device using driver function.
+ */
+bool
+auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
+   const struct rte_auxiliary_device *aux_dev)
+{
+   if (aux_drv->match == NULL)
+   return false;
+   return aux_drv->match(aux_dev->name);
+}
+
+/*
+ * Call the probe() function of the driver.
+ */
+static int
+rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *drv,
+  struct rte_auxiliary_device *dev)
+{
+   enum rte_iova_mode iova_mode;
+   int ret;
+
+   if (drv == NULL || dev == NULL)
+   return -EINVAL;
+
+   /* Check if driver supports it. */
+   if (!auxiliary_match(drv, dev))
+   /* Match of device and driver failed */
+   return 1;
+
+

Re: [dpdk-dev] [PATCH v7 2/2] bus/auxiliary: introduce auxiliary bus

2021-07-04 Thread Xueming(Steven) Li
Made changes and forgot to commit :(  please ignore and refer to new v8.

> -Original Message-
> From: Xueming(Steven) Li 
> Sent: Monday, July 5, 2021 1:37 PM
> Cc: dev@dpdk.org; Xueming(Steven) Li ; Wang Haiyue 
> ; NBU-Contact-Thomas
> Monjalon ; Kinsella Ray ; Andrew 
> Rybchenko ; Parav
> Pandit ; Neil Horman 
> Subject: [PATCH v7 2/2] bus/auxiliary: introduce auxiliary bus
> 
> Auxiliary bus [1] provides a way to split function into child-devices 
> representing sub-domains of functionality. Each auxiliary device
> represents a part of its parent functionality.
> 
> Auxiliary device is identified by unique device name, sysfs path:
>   /sys/bus/auxiliary/devices/
> 
> Devargs legacy syntax of auxiliary device:
>   -a auxiliary:[,args...]
> Devargs generic syntax of auxiliary device:
>   -a bus=auxiliary,name=/class=/driver=[,args...]
> 
> [1] kernel auxiliary bus document:
> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> 
> Signed-off-by: Xueming Li 
> Cc: Wang Haiyue 
> Cc: Thomas Monjalon 
> Cc: Kinsella Ray 
> Acked-by: Andrew Rybchenko 
> ---
>  MAINTAINERS   |   5 +
>  doc/guides/rel_notes/release_21_08.rst|   6 +
>  drivers/bus/auxiliary/auxiliary_common.c  | 411 ++  
> drivers/bus/auxiliary/auxiliary_params.c  |  59 
>  drivers/bus/auxiliary/linux/auxiliary.c   | 141 
>  drivers/bus/auxiliary/meson.build |  16 +
>  drivers/bus/auxiliary/private.h   |  74 
>  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++
>  drivers/bus/auxiliary/version.map |   7 +
>  drivers/bus/meson.build   |   1 +
>  10 files changed, 921 insertions(+)
>  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
>  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
>  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
>  create mode 100644 drivers/bus/auxiliary/meson.build  create mode 100644 
> drivers/bus/auxiliary/private.h  create mode 100644
> drivers/bus/auxiliary/rte_bus_auxiliary.h
>  create mode 100644 drivers/bus/auxiliary/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..9b0d0d4348 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst  Bus Drivers
>  ---
> 
> +Auxiliary bus driver - EXPERIMENTAL
> +M: Parav Pandit 
> +M: Xueming Li 
> +F: drivers/bus/auxiliary/
> +
>  Intel FPGA bus
>  M: Rosen Xu 
>  F: drivers/bus/ifpga/
> diff --git a/doc/guides/rel_notes/release_21_08.rst 
> b/doc/guides/rel_notes/release_21_08.rst
> index a6ecfdf3ce..e7ef4c8a05 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -55,6 +55,12 @@ New Features
>   Also, make sure to start the actual text at the margin.
>   ===
> 
> +* **Added auxiliary bus support.**
> +
> +  Auxiliary bus provides a way to split function into child-devices
> + representing sub-domains of functionality. Each auxiliary device
> + represents a part of its parent functionality.
> +
> 
>  Removed Items
>  -
> diff --git a/drivers/bus/auxiliary/auxiliary_common.c 
> b/drivers/bus/auxiliary/auxiliary_common.c
> new file mode 100644
> index 00..8a75306da5
> --- /dev/null
> +++ b/drivers/bus/auxiliary/auxiliary_common.c
> @@ -0,0 +1,411 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates  */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "private.h"
> +#include "rte_bus_auxiliary.h"
> +
> +static struct rte_devargs *
> +auxiliary_devargs_lookup(const char *name) {
> + struct rte_devargs *devargs;
> +
> + RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
> + if (strcmp(devargs->name, name) == 0)
> + return devargs;
> + }
> + return NULL;
> +}
> +
> +/*
> + * Test whether the auxiliary device exist
> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak bool
> +auxiliary_dev_exists(const char *name)
> +{
> + RTE_SET_USED(name);
> + return false;
> +}
> +
> +/*
> + * Scan the devices in the auxiliary bus.
> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak int
> +auxiliary_scan(void)
> +{
> + return 0;
> +}
> +
> +/*
> + * Update a device's devargs being scanned.
> + *
> + * @param aux_dev
> + *   AUXILIARY device.
> + */
> +void
> +auxiliary_on_scan(struct rte_auxiliary_device *aux_dev) {
> + aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
> +}
> +
> +/*
> + * Match the auxiliary driver and device using driver function.
> + */
> +bool
> +auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> + 

Re: [dpdk-dev] [PATCH v3 2/2] eal: handle compressed firmwares

2021-07-04 Thread David Marchand
On Mon, Jul 5, 2021 at 8:35 AM Wang, Haiyue  wrote:
> > +int
> > +rte_firmware_read(const char *name, void **buf, size_t *bufsz)
> > +{
> > + char path[PATH_MAX];
> > + int ret;
> > +
> > + ret = firmware_read(name, buf, bufsz);
> > + if (ret < 0) {
> > + snprintf(path, sizeof(path), "%s.xz", name);
> > + path[PATH_MAX - 1] = '\0';
> > +#ifndef RTE_HAS_LIBARCHIVE
> > + if (access(path, F_OK) == 0) {
> > + RTE_LOG(WARNING, EAL, "libarchive not available, %s 
> > cannot be decompressed\n",
> > + path);
> > + }
> > +#else
> > + ret = firmware_read(path, buf, bufsz);
> > +#endif
> > + }
> > + return ret;
> > +}
>
>
> Since ice PMD needs to check if the firmware file with different name can be 
> accessed
> by some kind of order, before doing the final firmware selection. Should we 
> also add
> the firmware access API for handling this ?

I don't see the need.
Is the behavior changed for net/ice with this patch?


-- 
David Marchand



Re: [dpdk-dev] [PATCH] ethdev: keep count of allocated and used representor ranges

2021-07-04 Thread Viacheslav Galaktionov

Hello!

On 2021-07-03 04:32, Xueming(Steven) Li wrote:

+   if (i == n_entries)
+   break;
}
 out:
+   info->nb_ranges = i;


Here info maybe NULL.


Good catch, thanks for noticing!


faf3bd901d..d2b27c351f 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4856,6 +4856,8 @@ struct rte_eth_representor_range {  struct 
rte_eth_representor_info {

uint16_t controller; /**< Controller ID of caller device. */
uint16_t pf; /**< Physical function ID of caller device. */
+   uint32_t nb_ranges_alloc; /**< Size of the ranges array. */
+   uint32_t nb_ranges; /**< Number of initialized ranges. */


How about rte_eth_representor_info_get(info) return max ranges size if
info is NULL,
return real initialized ranges if info not NULL?


I'm not sure how I feel about it. I think it'd be best if the function 
returned

just one thing.

Moreover, there are drivers that don't have a fixed structure for 
representor

IDs, e.g. net/sfc, where every range will contain a single ID. If a new
representor range is created between invocations of this function, there
probably should be a way for the caller to know about this.

Perhaps we should move the total number of representors to an out 
parameter and
use the return value for the number of initialized ranges. What do you 
think

about this?