[dpdk-dev] [PATCH v4] vhost: check header for legacy dequeue offload

2021-06-15 Thread Xiao Wang
When parsing the virtio net header and packet header for dequeue offload,
we need to perform a sanity check on the packet header to ensure:
  - No out-of-boundary memory access.
  - The packet header and virtio_net header are valid and aligned.

Fixes: d0cf91303d73 ("vhost: add Tx offload capabilities")
Cc: sta...@dpdk.org

Signed-off-by: Xiao Wang 
---
v4:
- Rebase on head of main branch.
- Allow empty L4 payload in GSO.

v3:
- Check data_len before calling rte_pktmbuf_mtod. (David)

v2:
- Allow empty L4 payload for cksum offload. (Konstantin)
---
 lib/vhost/virtio_net.c | 52 +++---
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 8da8a86a10..351ff0a841 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2259,44 +2259,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
return false;
 }
 
-static void
-parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+static int
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
+   uint16_t *len)
 {
struct rte_ipv4_hdr *ipv4_hdr;
struct rte_ipv6_hdr *ipv6_hdr;
void *l3_hdr = NULL;
struct rte_ether_hdr *eth_hdr;
uint16_t ethertype;
+   uint16_t data_len = m->data_len;
+
+   if (data_len <= sizeof(struct rte_ether_hdr))
+   return -EINVAL;
 
eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
 
m->l2_len = sizeof(struct rte_ether_hdr);
ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+   data_len -= sizeof(struct rte_ether_hdr);
 
if (ethertype == RTE_ETHER_TYPE_VLAN) {
+   if (data_len <= sizeof(struct rte_vlan_hdr))
+   return -EINVAL;
+
struct rte_vlan_hdr *vlan_hdr =
(struct rte_vlan_hdr *)(eth_hdr + 1);
 
m->l2_len += sizeof(struct rte_vlan_hdr);
ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+   data_len -= sizeof(struct rte_vlan_hdr);
}
 
l3_hdr = (char *)eth_hdr + m->l2_len;
 
switch (ethertype) {
case RTE_ETHER_TYPE_IPV4:
+   if (data_len <= sizeof(struct rte_ipv4_hdr))
+   return -EINVAL;
ipv4_hdr = l3_hdr;
*l4_proto = ipv4_hdr->next_proto_id;
m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
+   if (data_len <= m->l3_len) {
+   m->l3_len = 0;
+   return -EINVAL;
+   }
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV4;
+   data_len -= m->l3_len;
break;
case RTE_ETHER_TYPE_IPV6:
+   if (data_len <= sizeof(struct rte_ipv6_hdr))
+   return -EINVAL;
ipv6_hdr = l3_hdr;
*l4_proto = ipv6_hdr->proto;
m->l3_len = sizeof(struct rte_ipv6_hdr);
*l4_hdr = (char *)l3_hdr + m->l3_len;
m->ol_flags |= PKT_TX_IPV6;
+   data_len -= m->l3_len;
break;
default:
m->l3_len = 0;
@@ -2304,6 +2324,9 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, 
void **l4_hdr)
*l4_hdr = NULL;
break;
}
+
+   *len = data_len;
+   return 0;
 }
 
 static __rte_always_inline void
@@ -2312,21 +2335,27 @@ vhost_dequeue_offload_legacy(struct virtio_net_hdr 
*hdr, struct rte_mbuf *m)
uint16_t l4_proto = 0;
void *l4_hdr = NULL;
struct rte_tcp_hdr *tcp_hdr = NULL;
+   uint16_t len = 0, tcp_len;
+
+   if (parse_ethernet(m, &l4_proto, &l4_hdr, &len) < 0)
+   return;
 
-   parse_ethernet(m, &l4_proto, &l4_hdr);
if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
if (hdr->csum_start == (m->l2_len + m->l3_len)) {
switch (hdr->csum_offset) {
case (offsetof(struct rte_tcp_hdr, cksum)):
-   if (l4_proto == IPPROTO_TCP)
+   if (l4_proto == IPPROTO_TCP &&
+   len >= sizeof(struct rte_tcp_hdr))
m->ol_flags |= PKT_TX_TCP_CKSUM;
break;
case (offsetof(struct rte_udp_hdr, dgram_cksum)):
-   if (l4_proto == IPPROTO_UDP)
+   if (l4_proto == IPPROTO_UDP &&
+   len >= sizeof(struct rte_udp_hdr))
m->ol_flags |= PKT_TX_UDP_CKSUM;
break;
case (offsetof(struct rte_sctp_hdr, cksum)):
-   if (l4_proto == IPPROTO_SCTP)
+   if (l4_pr

Re: [dpdk-dev] 19.11.9 patches review and test - V2

2021-06-15 Thread Christian Ehrhardt
On Tue, Jun 15, 2021 at 5:17 AM Pei Zhang  wrote:
>
> Hi Christian,
>
> The testing with dpdk 19.11.9-rc2 from Red Hat looks good. We tested the 16
> scenarios below and all got PASS on RHEL8:

Thank you so much!

I'm rather confident now that -rc3 will then also be a smooth ride.

> (1)Guest with device assignment(PF) throughput testing(1G hugepage size): PASS
> (2)Guest with device assignment(PF) throughput testing(2M hugepage size) : 
> PASS
> (3)Guest with device assignment(VF) throughput testing: PASS
> (4)PVP (host dpdk testpmd as vswitch) 1Q: throughput testing: PASS
> (5)PVP vhost-user 2Q throughput testing: PASS
> (6)PVP vhost-user 1Q - cross numa node throughput testing: PASS
> (7)Guest with vhost-user 2 queues throughput testing: PASS
> (8)vhost-user reconnect with dpdk-client, qemu-server: qemu reconnect: PASS
> (9)vhost-user reconnect with dpdk-client, qemu-server: ovs reconnect: PASS
> (10)PVP 1Q live migration testing: PASS
> (11)PVP 1Q cross numa node live migration testing: PASS
> (12)Guest with ovs+dpdk+vhost-user 1Q live migration testing: PASS
> (13)Guest with ovs+dpdk+vhost-user 1Q live migration testing (2M): PASS
> (14)Guest with ovs+dpdk+vhost-user 2Q live migration testing: PASS
> (15)Host PF + DPDK testing: PASS
> (16)Host VF + DPDK testing: PASS
>
> Versions:
>
> kernel 4.18
> qemu 6.0
>
> dpdk: git://dpdk.org/dpdk-stable
>
> # git branch
> remotes/origin/19.11
>
> # git log -1
> commit bb144e7a1c5e7709c74b3096179f6296e77923da (HEAD, tag: v19.11.9-rc2, 
> origin/19.11)
> Author: Christian Ehrhardt 
> Date:   Fri Jun 4 07:46:13 2021 +0200
>
> version: 19.11.9-rc2
>
> Signed-off-by: Christian Ehrhardt 
>
>
>
> NICs: X540-AT2 NIC(ixgbe, 10G)
>
> Best regards,
>
> Pei
>
> On Fri, Jun 4, 2021 at 1:52 PM Christian Ehrhardt 
>  wrote:
>>
>> Hi all,
>>
>> Here is version 2 of the list of patches targeted for stable release 19.11.9.
>> Thanks to plenty of helpful developers we've collected a few more backports
>> by now and sorted out a few rare compile time issues that were found with 
>> -rc1.
>>
>> The planned date for the final release of 19.11.9 is now 18th of June.
>>
>> Please help with testing and validation of your use cases and report
>> any issues/results with reply-all to this mail. For the final release
>> the fixes and reported validations will be added to the release notes.
>>
>> A release candidate tarball can be found at:
>>
>> https://dpdk.org/browse/dpdk-stable/tag/?id=v19.11.9-rc2
>>
>> These patches are located at branch 19.11 of dpdk-stable repo:
>> https://dpdk.org/browse/dpdk-stable/
>>
>> Thanks.
>>
>> Christian Ehrhardt 
>>
>> ---
>> Adam Dybkowski (2):
>>   common/qat: increase IM buffer size for GEN3
>>   compress/qat: enable compression on GEN3
>>
>> Ajit Khaparde (3):
>>   net/bnxt: fix RSS context cleanup
>>   net/bnxt: fix mismatched type comparison in MAC restore
>>   net/bnxt: check PCI config read
>>
>> Alvin Zhang (6):
>>   net/ice: fix VLAN filter with PF
>>   net/i40e: fix input set field mask
>>   net/e1000: fix Rx error counter for bad length
>>   net/e1000: fix max Rx packet size
>>   net/ice: fix fast mbuf freeing
>>   net/iavf: fix VF to PF command failure handling
>>
>> Anatoly Burakov (3):
>>   fbarray: fix log message on truncation error
>>   power: do not skip saving original P-state governor
>>   power: save original ACPI governor always
>>
>> Andrew Rybchenko (2):
>>   net/failsafe: fix RSS hash offload reporting
>>   net/failsafe: report minimum and maximum MTU
>>
>> Apeksha Gupta (1):
>>   examples/l2fwd-crypto: skip masked devices
>>
>> Arek Kusztal (1):
>>   crypto/qat: fix offset for out-of-place scatter-gather
>>
>> Beilei Xing (1):
>>   net/i40evf: fix packet loss for X722
>>
>> Bruce Richardson (1):
>>   build: exclude meson files from examples installation
>>
>> Chaoyong He (1):
>>   doc: fix multiport syntax in nfp guide
>>
>> Chenbo Xia (1):
>>   examples/vhost: check memory table query
>>
>> Chengchang Tang (12):
>>   ethdev: validate input in module EEPROM dump
>>   ethdev: validate input in register info
>>   ethdev: validate input in EEPROM info
>>   net/hns3: fix rollback after setting PVID failure
>>   examples: add eal cleanup to examples
>>   net/bonding: fix adding itself as its slave
>>   app/testpmd: fix max queue number for Tx offloads
>>   net/tap: fix interrupt vector array size
>>   net/bonding: fix socket ID check
>>   net/tap: check ioctl on restore
>>   net/hns3: fix HW buffer size on MTU update
>>   net/hns3: fix processing Tx offload flags
>>
>> Chengwen Feng (24):
>>   net/hns3: fix flow counter value
>>   net/hns3: fix VF mailbox head field
>>   net/hns3: support get device version when dump register
>>   test: check thread creation
>>   common/dpaax: fix possible null pointer access
>>   examples/ethtool: remove unused parsing
>>   n

Re: [dpdk-dev] [PATCH v5 05/24] net/ngbe: add log type and error type

2021-06-15 Thread Jiawen Wu
On Tuesday, June 15, 2021 1:55 AM, Andrew Rybchenko wrote:
> On 6/2/21 12:40 PM, Jiawen Wu wrote:
> > Add log type and error type to trace functions.
> >
> > Signed-off-by: Jiawen Wu 
> > ---
> >   doc/guides/nics/ngbe.rst|  20 +
> >   drivers/net/ngbe/base/ngbe_status.h | 125
> 
> >   drivers/net/ngbe/base/ngbe_type.h   |   1 +
> >   drivers/net/ngbe/ngbe_ethdev.c  |  16 
> >   drivers/net/ngbe/ngbe_logs.h|  46 ++
> >   5 files changed, 208 insertions(+)
> >   create mode 100644 drivers/net/ngbe/base/ngbe_status.h
> >   create mode 100644 drivers/net/ngbe/ngbe_logs.h
> >
> > diff --git a/doc/guides/nics/ngbe.rst b/doc/guides/nics/ngbe.rst index
> > 4ec2623a05..c274a15aab 100644
> > --- a/doc/guides/nics/ngbe.rst
> > +++ b/doc/guides/nics/ngbe.rst
> > @@ -15,6 +15,26 @@ Prerequisites
> >
> >   - Follow the DPDK :ref:`Getting Started Guide for Linux ` to
> setup the basic DPDK environment.
> >
> 
> It should be two empty lines before section start.
> 
> > +Pre-Installation Configuration
> > +--
> > +
> > +Dynamic Logging Parameters
> > +~~
> > +
> > +One may leverage EAL option "--log-level" to change default levels
> > +for the log types supported by the driver. The option is used with an
> > +argument typically consisting of two parts separated by a colon.
> > +
> > +NGBE PMD provides the following log types available for control:
> > +
> > +- ``pmd.net.ngbe.driver`` (default level is **notice**)
> > +
> > +  Affects driver-wide messages unrelated to any particular devices.
> > +
> > +- ``pmd.net.ngbe.init`` (default level is **notice**)
> > +
> > +  Extra logging of the messages during PMD initialization.
> > +
> 
> Same here.
> 
> >   Driver compilation and testing
> >   --
> >
> > diff --git a/drivers/net/ngbe/base/ngbe_status.h
> > b/drivers/net/ngbe/base/ngbe_status.h
> > new file mode 100644
> > index 00..b1836c6479
> > --- /dev/null
> > +++ b/drivers/net/ngbe/base/ngbe_status.h
> > @@ -0,0 +1,125 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018-2020 Beijing WangXun Technology Co., Ltd.
> > + * Copyright(c) 2010-2017 Intel Corporation  */
> > +
> > +#ifndef _NGBE_STATUS_H_
> > +#define _NGBE_STATUS_H_
> > +
> > +/* Error Codes:
> > + * common error
> > + * module error(simple)
> > + * module error(detailed)
> > + *
> > + * (-256, 256): reserved for non-ngbe defined error code  */ #define
> > +TERR_BASE (0x100) enum ngbe_error {
> > +   TERR_NULL = TERR_BASE,
> > +   TERR_ANY,
> > +   TERR_NOSUPP,
> > +   TERR_NOIMPL,
> > +   TERR_NOMEM,
> > +   TERR_NOSPACE,
> > +   TERR_NOENTRY,
> > +   TERR_CONFIG,
> > +   TERR_ARGS,
> > +   TERR_PARAM,
> > +   TERR_INVALID,
> > +   TERR_TIMEOUT,
> > +   TERR_VERSION,
> > +   TERR_REGISTER,
> > +   TERR_FEATURE,
> > +   TERR_RESET,
> > +   TERR_AUTONEG,
> > +   TERR_MBX,
> > +   TERR_I2C,
> > +   TERR_FC,
> > +   TERR_FLASH,
> > +   TERR_DEVICE,
> > +   TERR_HOSTIF,
> > +   TERR_SRAM,
> > +   TERR_EEPROM,
> > +   TERR_EEPROM_CHECKSUM,
> > +   TERR_EEPROM_PROTECT,
> > +   TERR_EEPROM_VERSION,
> > +   TERR_MAC,
> > +   TERR_MAC_ADDR,
> > +   TERR_SFP,
> > +   TERR_SFP_INITSEQ,
> > +   TERR_SFP_PRESENT,
> > +   TERR_SFP_SUPPORT,
> > +   TERR_SFP_SETUP,
> > +   TERR_PHY,
> > +   TERR_PHY_ADDR,
> > +   TERR_PHY_INIT,
> > +   TERR_FDIR_CMD,
> > +   TERR_FDIR_REINIT,
> > +   TERR_SWFW_SYNC,
> > +   TERR_SWFW_COMMAND,
> > +   TERR_FC_CFG,
> > +   TERR_FC_NEGO,
> > +   TERR_LINK_SETUP,
> > +   TERR_PCIE_PENDING,
> > +   TERR_PBA_SECTION,
> > +   TERR_OVERTEMP,
> > +   TERR_UNDERTEMP,
> > +   TERR_XPCS_POWERUP,
> > +};
> > +
> > +/* WARNING: just for legacy compatibility */ #define
> > +NGBE_NOT_IMPLEMENTED 0x7FFF
> > +#define NGBE_ERR_OPS_DUMMY   0x3FFF
> > +
> > +/* Error Codes */
> > +#define NGBE_ERR_EEPROM-(TERR_BASE + 1)
> > +#define NGBE_ERR_EEPROM_CHECKSUM   -(TERR_BASE + 2)
> > +#define NGBE_ERR_PHY   -(TERR_BASE + 3)
> > +#define NGBE_ERR_CONFIG-(TERR_BASE + 4)
> > +#define NGBE_ERR_PARAM -(TERR_BASE + 5)
> > +#define NGBE_ERR_MAC_TYPE  -(TERR_BASE + 6)
> > +#define NGBE_ERR_UNKNOWN_PHY   -(TERR_BASE + 7)
> > +#define NGBE_ERR_LINK_SETUP-(TERR_BASE + 8)
> > +#define NGBE_ERR_ADAPTER_STOPPED   -(TERR_BASE + 9)
> > +#define NGBE_ERR_INVALID_MAC_ADDR  -(TERR_BASE + 10)
> > +#define NGBE_ERR_DEVICE_NOT_SUPPORTED  -(TERR_BASE + 11)
> > +#define NGBE_ERR_MASTER_REQUESTS_PENDING   -(TERR_BASE + 12)
> > +#define NGBE_ERR_INVALID_LINK_SETTINGS -(TERR_BASE + 13)
> > +#define NGBE_ERR_AUTONEG_NOT_COMPLETE  -(TERR_BASE + 14)
> > +#define NGBE_ERR_RESET_FAILED  -(TERR_BASE + 15)
> > +#define NGBE_ERR_SWFW_SYNC -(TERR_BASE + 16)

Re: [dpdk-dev] Memory leak in rte_pci_scan

2021-06-15 Thread David Marchand
Hi Owen,

On Mon, Jun 14, 2021 at 10:42 PM Owen Hilyard  wrote:
>
> From what I've seen so far, that fixes the PCI leak. That just leaves a few 
> other places. I'll try to get the complete list to you tomorrow, since 
> running the full set of unit tests takes quite a few hours when ASAN is 
> involved.

Ok, thanks for the test.
I'll send a patch on the pci bus.

Just odd that it took hours in your case.
On my f32 system, it took 3-4 minutes.
It is probably worth looking into why there is such a difference.
We can't have asan enabled at UNH otherwise.


-- 
David Marchand



Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK

2021-06-15 Thread Thomas Monjalon
15/06/2021 04:49, Xia, Chenbo:
> From: Thomas Monjalon 
> > 01/06/2021 05:06, Chenbo Xia:
> > > Hi everyone,
> > >
> > > This is a draft implementation of the mdev (Mediated device [1])
> > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > there could be different types of mdev devices (e.g. vfio-pci).
> > 
> > Please could you illustrate with an usage of mdev in DPDK?
> > What does it enable which is not possible today?
> 
> The main purpose is for DPDK to drive mdev-based devices, which is not
> possible today.
> 
> I'd take PCI devices as an example. Currently DPDK can only drive devices
> of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> to APP in that way.
> 
> But there are PCI devices using vfio-mdev as a software framework to expose
> Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> themselves to let multiple APPs share one physical device. For example, Intel
> Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> IOV enabled devices (and Intel net/crypto/raw devices support this tech). For
> those mdev-based devices, DPDK needs support on the bus layer to 
> scan/plug/probe/..
> them, which is the main effort this patchset does. There are also other 
> devices
> using the vfio-mdev framework, AFAIK, Nvidia's GPU is the first one using mdev
> and Intel's GPU virtualization also uses it.

Yes mdev was designed for virtualization I think.
The use of mdev for Scalable IOV without virtualization
may be seen as an abuse by Linux maintainers,
as they currently seem to prefer the auxiliary bus (which is a real bus).

Mellanox got a push back when trying to use mdev for the same purpose
(Scalable Function, also called Sub-Function) in the kernel.
The Linux community decided to use the auxiliary bus.

Any other feedback on the choice mdev vs aux?
Is there any kernel code supporting this mdev model for Intel devices?

> > > In this patchset, the PCI bus driver is extended to support scanning
> > > and probing the mdev devices whose device-api is "vfio-pci".
> > >
> > >  +-+
> > >  | PCI bus |
> > >  +++
> > >   |
> > >  ++---+---++
> > >  ||   ||
> > >   Physical PCI devices ...   Mediated PCI devices ...
> > >
> > > The first four patches in this patchset are mainly preparation of mdev
> > > bus support. The left two patches are the key implementation of mdev bus.
> > >
> > > The implementation of mdev bus in DPDK has several options:
> > >
> > > 1: Embed mdev bus in current pci bus
> > >
> > >This patchset takes this option for an example. Mdev has several
> > >device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > >pci devices in all mdev device types so we could embed the mdev bus
> > >into current pci bus. Then pci bus with mdev support will scan/plug/
> > >unplug/.. not only normal pci devices but also mediated pci devices.
> > 
> > I think it is a different bus.
> > It would be cleaner to not touch the PCI bus.
> > Having a separate bus will allow an easy way to identify a device
> > with the new generic devargs syntax, example:
> > bus=mdev,uuid=XXX
> > or more complex:
> > bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
> 
> OK. Agree on cleaner to not touch PCI bus. And there may also be a 'type=pci'
> as mdev has several types in its definition (pci/ap/platform/ccw/...).
> 
> > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver 
> > > to
> > >plug-in pci devices to pci bus
> > >
> > >If we took this option, a new mdev bus will be implemented to scan
> > >mediated pci devices and a new mdev driver for pci devices will be
> > >implemented in pci bus to plug-in mediated pci devices to pci bus.
> > >
> > >Our RFC v1 takes this option:
> > >http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-
> > tiwei@intel.com/
> > >
> > >Note that: for either option 1 or 2, device drivers do not know the
> > >implementation difference but only use structs/functions exposed by
> > >pci bus. Mediated pci devices are different from normal pci devices
> > >on: 1. Mediated pci devices use UUID as address but normal ones use 
> > > BDF.
> > >2. Mediated pci devices may have some capabilities that normal pci
> > >devices do not have. For example, mediated pci devices could have
> > >regions that have sparse mmap capability, which allows a region to have
> > >multiple mmap areas. Another example is mediated pci devices may have
> > >regions/part of regions not mmaped but need to access them. Above
> > >difference will change the current ABI (i.e., struct rte_pci_device).
> > >Please check 5th and 6th patch for details.
> > >
> > 

Re: [dpdk-dev] [PATCH v5 15/24] net/ngbe: add Rx queue setup and release

2021-06-15 Thread Jiawen Wu
On Tuesday, June 15, 2021 2:53 AM, Andrew Rybchenko wrote:
> On 6/2/21 12:40 PM, Jiawen Wu wrote:
> > Setup device Rx queue and release Rx queue.
> >
> > Signed-off-by: Jiawen Wu 
> > ---
> >   drivers/net/ngbe/ngbe_ethdev.c |   9 +
> >   drivers/net/ngbe/ngbe_ethdev.h |   8 +
> >   drivers/net/ngbe/ngbe_rxtx.c   | 305
> +
> >   drivers/net/ngbe/ngbe_rxtx.h   |  90 ++
> >   4 files changed, 412 insertions(+)
> >
> > diff --git a/drivers/net/ngbe/ngbe_rxtx.h
> > b/drivers/net/ngbe/ngbe_rxtx.h index 39011ee286..e1676a53b4 100644
> > --- a/drivers/net/ngbe/ngbe_rxtx.h
> > +++ b/drivers/net/ngbe/ngbe_rxtx.h
> > @@ -6,7 +6,97 @@
> >   #ifndef _NGBE_RXTX_H_
> >   #define _NGBE_RXTX_H_
> >
> > +/*
> > + * Receive Descriptor
> > + 
> > */
> > +struct ngbe_rx_desc {
> > +   struct {
> > +   union {
> > +   __le32 dw0;
> 
> rte_* types shuld be used

I don't quite understand, should '__le32' be changed to 'rte_*' type?

> 
> > +   struct {
> > +   __le16 pkt;
> > +   __le16 hdr;
> > +   } lo;
> > +   };
> > +   union {
> > +   __le32 dw1;
> > +   struct {
> > +   __le16 ipid;
> > +   __le16 csum;
> > +   } hi;
> > +   };
> > +   } qw0; /* also as r.pkt_addr */
> > +   struct {
> > +   union {
> > +   __le32 dw2;
> > +   struct {
> > +   __le32 status;
> > +   } lo;
> > +   };
> > +   union {
> > +   __le32 dw3;
> > +   struct {
> > +   __le16 len;
> > +   __le16 tag;
> > +   } hi;
> > +   };
> > +   } qw1; /* also as r.hdr_addr */
> > +};
> > +






Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, 15 June 2021 08.48
> 
> 14/06/2021 17:48, Morten Brørup:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> Monjalon
> > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> something big enough to hold a sufficiently large array. And possibly
> add an rte_max_ethports variable to indicate the number of populated
> entries in the array, for use when iterating over the array.
> >
> > Can we come up with another example than RTE_MAX_ETHPORTS where this
> library provides a better benefit?
> 
> What is big enough?
> Is 640KB enough for RAM? ;)

Good point!

I think we agree that:
- The cost of this library is some added complexity, i.e. working with a 
dynamically sized array through a library instead of just indexing into a 
compile time fixed size array.
- The main benefit of this library is saving some RAM (and still allowing a 
potentially very high number of ports.)

My point was: The amount of RAM we are saving is a key parameter for the 
cost/benefit analysis. And since I don't think the rte_eth_devices[] array uses 
a significant amount of memory, I was asking for some other array using more 
memory, where the cost/benefit analysis would come out more advantageous to 
your proposed parray library.

> 
> When dealing with microservices switching, the numbers can increase
> very fast.

Yes, I strongly supported increasing the port_id type from 8 to 16 bits for 
this reason, when it was discussed at the DPDK Userspace a few years ago in 
Dublin. And with large RTE_MAX_QUEUES_PER_PORT values, the rte_eth_dev 
structure uses quite a lot of space for the rx/tx callback arrays. But the 
memory usage of rte_eth_devices[] is still relatively insignificant in a system 
wide context.

If the main purpose is to optimize the rte_eth_devices[] array, I think there are 
better alternatives than this library. Bruce and Konstantin already threw a few 
ideas on the table.



Re: [dpdk-dev] [PATCH v4] vhost: check header for legacy dequeue offload

2021-06-15 Thread David Marchand
On Tue, Jun 15, 2021 at 9:06 AM Xiao Wang  wrote:
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 8da8a86a10..351ff0a841 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -2259,44 +2259,64 @@ virtio_net_with_host_offload(struct virtio_net *dev)
> return false;
>  }
>
> -static void
> -parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
> +static int
> +parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr,
> +   uint16_t *len)
>  {


This function name is misleading, name could be parse_headers().
Its semantic gets more and more confusing with those l4_hdr and len pointers.

This function fills ->lX_len in the mbuf, everything is available for caller.

Caller can check that rte_pktmbuf_data_len() is >= m->l2_len +
m->l3_len + somesize.
=> no need for len.

l4_hdr can simply be deduced with rte_pktmbuf_mtod_offset(m, struct
somestruct *, m->l2_len + m->l3_len).
=> no need for l4_hdr.
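
For illustration, a minimal sketch of the caller-side shape being suggested
here (names like parse_headers() and get_tcp_hdr() are illustrative only,
they are not in the patch):

#include <rte_mbuf.h>
#include <rte_tcp.h>

/* parse_headers() would only fill m->l2_len/m->l3_len; the caller then
 * checks the length and derives the L4 header pointer itself.
 */
static struct rte_tcp_hdr *
get_tcp_hdr(struct rte_mbuf *m)
{
	if (rte_pktmbuf_data_len(m) < m->l2_len + m->l3_len +
			sizeof(struct rte_tcp_hdr))
		return NULL;

	return rte_pktmbuf_mtod_offset(m, struct rte_tcp_hdr *,
			m->l2_len + m->l3_len);
}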


> struct rte_ipv4_hdr *ipv4_hdr;
> struct rte_ipv6_hdr *ipv6_hdr;
> void *l3_hdr = NULL;

No need for l3_hdr.


> struct rte_ether_hdr *eth_hdr;
> uint16_t ethertype;
> +   uint16_t data_len = m->data_len;

Avoid direct access to mbuf internals, we have inline helpers:
rte_pktmbuf_data_len(m).


> +
> +   if (data_len <= sizeof(struct rte_ether_hdr))

Strictly speaking, < is enough.


> +   return -EINVAL;
>
> eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>
> m->l2_len = sizeof(struct rte_ether_hdr);
> ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
> +   data_len -= sizeof(struct rte_ether_hdr);

No need to decrement data_len if checks below are all done for absolute value.
See suggestions below.


>
> if (ethertype == RTE_ETHER_TYPE_VLAN) {
> +   if (data_len <= sizeof(struct rte_vlan_hdr))
> +   return -EINVAL;

if (data_len < sizeof(rte_ether_hdr) + sizeof(struct rte_vlan_hdr))


> +
> struct rte_vlan_hdr *vlan_hdr =
> (struct rte_vlan_hdr *)(eth_hdr + 1);
>
> m->l2_len += sizeof(struct rte_vlan_hdr);
> ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
> +   data_len -= sizeof(struct rte_vlan_hdr);

Idem.


> }
>
> l3_hdr = (char *)eth_hdr + m->l2_len;
>
> switch (ethertype) {
> case RTE_ETHER_TYPE_IPV4:
> +   if (data_len <= sizeof(struct rte_ipv4_hdr))
> +   return -EINVAL;

if (data_len < m->l2_len + sizeof(struct rte_ipv4_hdr))


> ipv4_hdr = l3_hdr;

ipv4_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *, m->l2_len);


> *l4_proto = ipv4_hdr->next_proto_id;
> m->l3_len = rte_ipv4_hdr_len(ipv4_hdr);
> +   if (data_len <= m->l3_len) {

if (data_len < m->l2_len + m->l3_len)


> +   m->l3_len = 0;
> +   return -EINVAL;

Returning here leaves m->l2_len set.


> +   }
> *l4_hdr = (char *)l3_hdr + m->l3_len;
> m->ol_flags |= PKT_TX_IPV4;
> +   data_len -= m->l3_len;
> break;
> case RTE_ETHER_TYPE_IPV6:
> +   if (data_len <= sizeof(struct rte_ipv6_hdr))
> +   return -EINVAL;

if (data_len < m->l2_len + sizeof(struct rte_ipv6_hdr))
Returning here leaves m->l2_len set.


> ipv6_hdr = l3_hdr;

ipv6_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv6_hdr *, m->l2_len);


> *l4_proto = ipv6_hdr->proto;
> m->l3_len = sizeof(struct rte_ipv6_hdr);
> *l4_hdr = (char *)l3_hdr + m->l3_len;
> m->ol_flags |= PKT_TX_IPV6;
> +   data_len -= m->l3_len;
> break;
> default:
> m->l3_len = 0;
> @@ -2304,6 +2324,9 @@ parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, 
> void **l4_hdr)
> *l4_hdr = NULL;
> break;
> }
> +
> +   *len = data_len;
> +   return 0;
>  }
>
>  static __rte_always_inline void
> @@ -2312,21 +2335,27 @@ vhost_dequeue_offload_legacy(struct virtio_net_hdr 
> *hdr, struct rte_mbuf *m)
> uint16_t l4_proto = 0;
> void *l4_hdr = NULL;
> struct rte_tcp_hdr *tcp_hdr = NULL;
> +   uint16_t len = 0, tcp_len;
> +
> +   if (parse_ethernet(m, &l4_proto, &l4_hdr, &len) < 0)
> +   return;
>
> -   parse_ethernet(m, &l4_proto, &l4_hdr);
> if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
> if (hdr->csum_start == (m->l2_len + m->l3_len)) {
> switch (hdr->csum_offset) {
> case (offsetof(struct rte_tcp_hdr, cksum)):
> -   if (l4_proto == IPPROTO_TCP)
> +   if (l4_proto == IPPROTO_TCP &&
> +   

Re: [dpdk-dev] [PATCH] app/testpmd: send failure logs to stderr

2021-06-15 Thread Ferruh Yigit
On 6/14/2021 5:56 PM, Andrew Rybchenko wrote:
> On 6/11/21 1:35 PM, Ferruh Yigit wrote:
>> On 6/11/2021 10:19 AM, Andrew Rybchenko wrote:
>>> On 6/11/21 5:06 AM, Li, Xiaoyun wrote:
 Hi
 -Original Message-
 From: Andrew Rybchenko 
 Sent: Friday, May 28, 2021 00:25
 To: Li, Xiaoyun 
 Cc: dev@dpdk.org
 Subject: [PATCH] app/testpmd: send failure logs to stderr

 Running with stdout suppressed or redirected for further processing is very
 confusing in the case of errors.

 Signed-off-by: Andrew Rybchenko 
 ---

 This patch looks good to me.
 But what do you think about make it as a fix and backport to stable 
 branches?
 Anyway works for me.
>>>
>>> I have no strong opinion on the topic.
>>>
>>> @Ferruh, what do you think?
>>>
>>
>> Same here, no strong opinion.
>> Sending errors to 'stderr' looks correct thing to do, but changing behavior 
>> in
>> the LTS may cause some unexpected side effect, if it is scripted and testpmd
>> output is parsed etc... For this possibility I would wait for the next LTS.
> 
> So, I guess all agree that backporting to LTS is a bad idea because of
> behaviour change.
> 
> As I said in a sub-thread I tend to apply in v21.08 since it is a right
> thing to do like a fix, but the fix is not that required to be
> backported to change behaviour of LTS releases.
> 
>> And because of same reason perhaps a release note can be added.
> 
> I'll make v2 with release notes added. Good point.
> 
>> Also there is 'TESTPMD_LOG' macro for logs in testpmd, (as well as 'RTE_LOG'
>> macro), I don't know if we should switch all logs, including 'printf', to
>> 'TESTPMD_LOG' macro?
>> Later stdout/stderr can be managed in rte_log level, instead of any specific
>> logic for the testpmd.
> 
> I think fprintf() is a better option for debug tool, since its
> messages should not go to syslog etc. It should go to stdout/stderr
> regardless of logging configuration and log level settings.
> 

Why should the application not take log configuration and log level settings into
account? I think this is a feature we can benefit from.

And for not logging to syslog, I think it is a DPDK-wide concern, not specific to
testpmd; we should have a way to say don't log to syslog, or only log errors to
syslog, etc. When that is done, using 'TESTPMD_LOG' enables benefiting from it.

>> Even there was a defect for this in the rte_log level, that logs should go to
>> stderr: https://bugs.dpdk.org/show_bug.cgi?id=8
>>
>>
 Acked-by: Xiaoyun Li 
>>>
> 



Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Jerin Jacob
On Tue, Jun 15, 2021 at 12:22 PM Thomas Monjalon  wrote:
>
> 14/06/2021 17:48, Jerin Jacob:
> > On Mon, Jun 14, 2021 at 8:29 PM Ananyev, Konstantin
> >  wrote:
> > > > 14/06/2021 15:15, Bruce Richardson:
> > > > > On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> > > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas 
> > > > > > > Monjalon
> > > > > > > Sent: Monday, 14 June 2021 12.59
> > > > > > >
> > > > > > > Performance of access in a fixed-size array is very good
> > > > > > > because of cache locality
> > > > > > > and because there is a single pointer to dereference.
> > > > > > > The only drawback is the lack of flexibility:
> > > > > > > the size of such an array cannot be increase at runtime.
> > > > > > >
> > > > > > > An approach to this problem is to allocate the array at runtime,
> > > > > > > being as efficient as static arrays, but still limited to a 
> > > > > > > maximum.
> > > > > > >
> > > > > > > That's why the API rte_parray is introduced,
> > > > > > > allowing to declare an array of pointer which can be resized
> > > > > > > dynamically
> > > > > > > and automatically at runtime while keeping a good read 
> > > > > > > performance.
> > > > > > >
> > > > > > > After resize, the previous array is kept until the next resize
> > > > > > > to avoid crashs during a read without any lock.
> > > > > > >
> > > > > > > Each element is a pointer to a memory chunk dynamically allocated.
> > > > > > > This is not good for cache locality but it allows to keep the same
> > > > > > > memory per element, no matter how the array is resized.
> > > > > > > Cache locality could be improved with mempools.
> > > > > > > The other drawback is having to dereference one more pointer
> > > > > > > to read an element.
> > > > > > >
> > > > > > > There is not much locks, so the API is for internal use only.
> > > > > > > This API may be used to completely remove some compilation-time
> > > > > > > maximums.
> > > > > >
> > > > > > I get the purpose and overall intention of this library.
> > > > > >
> > > > > > I probably already mentioned that I prefer "embedded style 
> > > > > > programming" with fixed size arrays, rather than runtime 
> > > > > > configurability. It's
> > > > my personal opinion, and the DPDK Tech Board clearly prefers reducing 
> > > > the amount of compile time configurability, so there is no way for
> > > > me to stop this progress, and I do not intend to oppose to this 
> > > > library. :-)
> > > > > >
> > > > > > This library is likely to become a core library of DPDK, so I think 
> > > > > > it is important getting it right. Could you please mention a few 
> > > > > > examples
> > > > where you think this internal library should be used, and where it 
> > > > should not be used. Then it is easier to discuss if the border line 
> > > > between
> > > > control path and data plane is correct. E.g. this library is not 
> > > > intended to be used for dynamically sized packet queues that grow and 
> > > > shrink in
> > > > the fast path.
> > > > > >
> > > > > > If the library becomes a core DPDK library, it should probably be 
> > > > > > public instead of internal. E.g. if the library is used to make
> > > > RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some 
> > > > applications might also need dynamically sized arrays for their
> > > > application specific per-port runtime data, and this library could 
> > > > serve that purpose too.
> > > > > >
> > > > >
> > > > > Thanks Thomas for starting this discussion and Morten for follow-up.
> > > > >
> > > > > My thinking is as follows, and I'm particularly keeping in mind the 
> > > > > cases
> > > > > of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> > > > >
> > > > > While I dislike the hard-coded limits in DPDK, I'm also not convinced 
> > > > > that
> > > > > we should switch away from the flat arrays or that we need fully 
> > > > > dynamic
> > > > > arrays that grow/shrink at runtime for ethdevs. I would suggest a 
> > > > > half-way
> > > > > house here, where we keep the ethdevs as an array, but one 
> > > > > allocated/sized
> > > > > at runtime rather than statically. This would allow us to have a
> > > > > compile-time default value, but, for use cases that need it, allow 
> > > > > use of a
> > > > > flag e.g.  "max-ethdevs" to change the size of the parameter given to 
> > > > > the
> > > > > malloc call for the array.  This max limit could then be provided to 
> > > > > apps
> > > > > too if they want to match any array sizes. [Alternatively those apps 
> > > > > could
> > > > > check the provided size and error out if the size has been increased 
> > > > > beyond
> > > > > what the app is designed to use?]. There would be no extra 
> > > > > dereferences per
> > > > > rx/tx burst call in this scenario so performance should be the same as
> > > > > before (potentially better if array is in hugepage memory, I suppose).
> > > >
> > > > I think we need some benchmarks to decide 

Re: [dpdk-dev] [PATCH v5 15/24] net/ngbe: add Rx queue setup and release

2021-06-15 Thread Andrew Rybchenko
On 6/15/21 10:50 AM, Jiawen Wu wrote:
> On Tuesday, June 15, 2021 2:53 AM, Andrew Rybchenko wrote:
>> On 6/2/21 12:40 PM, Jiawen Wu wrote:
>>> Setup device Rx queue and release Rx queue.
>>>
>>> Signed-off-by: Jiawen Wu 
>>> ---
>>>   drivers/net/ngbe/ngbe_ethdev.c |   9 +
>>>   drivers/net/ngbe/ngbe_ethdev.h |   8 +
>>>   drivers/net/ngbe/ngbe_rxtx.c   | 305
>> +
>>>   drivers/net/ngbe/ngbe_rxtx.h   |  90 ++
>>>   4 files changed, 412 insertions(+)
>>>
>>> diff --git a/drivers/net/ngbe/ngbe_rxtx.h
>>> b/drivers/net/ngbe/ngbe_rxtx.h index 39011ee286..e1676a53b4 100644
>>> --- a/drivers/net/ngbe/ngbe_rxtx.h
>>> +++ b/drivers/net/ngbe/ngbe_rxtx.h
>>> @@ -6,7 +6,97 @@
>>>   #ifndef _NGBE_RXTX_H_
>>>   #define _NGBE_RXTX_H_
>>>
>>> +/*
>>> + * Receive Descriptor
>>> + 
>>> */
>>> +struct ngbe_rx_desc {
>>> +   struct {
>>> +   union {
>>> +   __le32 dw0;
>>
>> rte_* types shuld be used
> 
> I don't quite understand, should '__le32' be changed to 'rte_*' type?

Yes, since it is native DPDK code, it should use native
DPDK data types. In this particular case it is rte_le32.
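
For example, a minimal illustration of what that could look like (assuming
the rte_le16_t/rte_le32_t typedefs from rte_byteorder.h are what is meant
here; field names copied from the posted descriptor):

#include <rte_byteorder.h>

struct ngbe_rx_desc_qw0 {
	union {
		rte_le32_t dw0;
		struct {
			rte_le16_t pkt;
			rte_le16_t hdr;
		} lo;
	};
	union {
		rte_le32_t dw1;
		struct {
			rte_le16_t ipid;
			rte_le16_t csum;
		} hi;
	};
};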

> 
>>
>>> +   struct {
>>> +   __le16 pkt;
>>> +   __le16 hdr;
>>> +   } lo;
>>> +   };
>>> +   union {
>>> +   __le32 dw1;
>>> +   struct {
>>> +   __le16 ipid;
>>> +   __le16 csum;
>>> +   } hi;
>>> +   };
>>> +   } qw0; /* also as r.pkt_addr */
>>> +   struct {
>>> +   union {
>>> +   __le32 dw2;
>>> +   struct {
>>> +   __le32 status;
>>> +   } lo;
>>> +   };
>>> +   union {
>>> +   __le32 dw3;
>>> +   struct {
>>> +   __le16 len;
>>> +   __le16 tag;
>>> +   } hi;
>>> +   };
>>> +   } qw1; /* also as r.hdr_addr */
>>> +};
>>> +
> 
> 
> 



Re: [dpdk-dev] [PATCH] app/testpmd: send failure logs to stderr

2021-06-15 Thread Andrew Rybchenko
On 6/15/21 10:59 AM, Ferruh Yigit wrote:
> On 6/14/2021 5:56 PM, Andrew Rybchenko wrote:
>> On 6/11/21 1:35 PM, Ferruh Yigit wrote:
>>> On 6/11/2021 10:19 AM, Andrew Rybchenko wrote:
 On 6/11/21 5:06 AM, Li, Xiaoyun wrote:
> Hi
> -Original Message-
> From: Andrew Rybchenko 
> Sent: Friday, May 28, 2021 00:25
> To: Li, Xiaoyun 
> Cc: dev@dpdk.org
> Subject: [PATCH] app/testpmd: send failure logs to stderr
>
> Running with stdout suppressed or redirected for further processing is 
> very
> confusing in the case of errors.
>
> Signed-off-by: Andrew Rybchenko 
> ---
>
> This patch looks good to me.
> But what do you think about make it as a fix and backport to stable 
> branches?
> Anyway works for me.

 I have no strong opinion on the topic.

 @Ferruh, what do you think?

>>>
>>> Same here, no strong opinion.
>>> Sending errors to 'stderr' looks correct thing to do, but changing behavior 
>>> in
>>> the LTS may cause some unexpected side effect, if it is scripted and testpmd
>>> output is parsed etc... For this possibility I would wait for the next LTS.
>>
>> So, I guess all agree that backporting to LTS is a bad idea because of
>> behaviour change.
>>
>> As I said in a sub-thread I tend to apply in v21.08 since it is a right
>> thing to do like a fix, but the fix is not that required to be
>> backported to change behaviour of LTS releases.
>>
>>> And because of same reason perhaps a release note can be added.
>>
>> I'll make v2 with release notes added. Good point.
>>
>>> Also there is 'TESTPMD_LOG' macro for logs in testpmd, (as well as 'RTE_LOG'
>>> macro), I don't know if we should switch all logs, including 'printf', to
>>> 'TESTPMD_LOG' macro?
>>> Later stdout/stderr can be managed in rte_log level, instead of any specific
>>> logic for the testpmd.
>>
>> I think fprintf() is a better option for debug tool, since its
>> messages should not go to syslog etc. It should go to stdout/stderr
>> regardless of logging configuration and log level settings.
>>
> 
> Why should the application not take log configuration and log level settings into
> account? I think this is a feature we can benefit from.

For me it sounds like an extra way to shoot yourself in the foot.

> And for not logging to syslog, I think it is a DPDK-wide concern, not specific
> to
> testpmd; we should have a way to say don't log to syslog, or only log errors to
> syslog, etc. When that is done, using 'TESTPMD_LOG' enables benefiting from
> it.

Logging configuration should be flexible to support various
logging backends. IMHO, we don't need the flexibility here.
testpmd is a command-line test application and errors should
simply go to stderr. That's it. Since the result is the
same in both ways, my opinion is not strong, I'm just trying
to explain why I slightly prefer the suggested way.

I can switch to TESTPMD_LOG() (or define TESTPMD_ERR() and use
it) easily. I just need maintainers decision on it.

>>> Even there was a defect for this in the rte_log level, that logs should go 
>>> to
>>> stderr: https://bugs.dpdk.org/show_bug.cgi?id=8
>>>
>>>
> Acked-by: Xiaoyun Li 

>>



[dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-15 Thread zhihongx . peng
From: Zhihong Peng 

AddressSanitizer (ASan) is a standard memory error detection
tool from Google. It helps detect use-after-free and
{heap,stack,global}-buffer-overflow bugs in C/C++ programs,
prints detailed error information when an error happens, and
greatly improves debugging efficiency.

By referring to its implementation algorithm
(https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
we ported the heap-buffer-overflow and use-after-free functions to DPDK.

Here is an example of heap-buffer-overflow bug:
..
char *p = rte_zmalloc(NULL, 7, 0);
p[7] = 'a';
..

Here is an example of use-after-free bug:
..
char *p = rte_zmalloc(NULL, 7, 0);
rte_free(p);
*p = 'a';
..
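
Under the hood this relies on the compiler-rt poisoning interface; a rough
illustration of the idea (not the patch's actual asan_* helpers):

#include <stddef.h>
#include <sanitizer/asan_interface.h>

/* Poisoned bytes become unaddressable: any later load/store to them makes
 * ASan print a detailed report (stack trace, allocation site, ...).
 */
static void mark_unusable(void *addr, size_t len)
{
	ASAN_POISON_MEMORY_REGION(addr, len);   /* e.g. freed element, redzone */
}

static void mark_usable(void *addr, size_t len)
{
	ASAN_UNPOISON_MEMORY_REGION(addr, len); /* e.g. newly allocated zone */
}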

If you want to use this feature,
you need to use the following compilation options:
-Db_lundef=false -Db_sanitize=address

Signed-off-by: Xueqin Lin 
Signed-off-by: Zhihong Peng 
---
 lib/eal/common/malloc_elem.c |  26 +-
 lib/eal/common/malloc_elem.h | 159 ++-
 lib/eal/common/malloc_heap.c |  12 +++
 lib/eal/common/meson.build   |   4 +
 lib/eal/common/rte_malloc.c  |   7 ++
 5 files changed, 206 insertions(+), 2 deletions(-)

diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c
index c2c9461f1..bdd20a162 100644
--- a/lib/eal/common/malloc_elem.c
+++ b/lib/eal/common/malloc_elem.c
@@ -446,6 +446,8 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, 
unsigned align,
struct malloc_elem *new_free_elem =
RTE_PTR_ADD(new_elem, size + 
MALLOC_ELEM_OVERHEAD);
 
+   asan_clear_split_alloczone(new_free_elem);
+
split_elem(elem, new_free_elem);
malloc_elem_free_list_insert(new_free_elem);
 
@@ -458,6 +460,8 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, 
unsigned align,
elem->state = ELEM_BUSY;
elem->pad = old_elem_size;
 
+   asan_clear_alloczone(elem);
+
/* put a dummy header in padding, to point to real element 
header */
if (elem->pad > 0) { /* pad will be at least 64-bytes, as 
everything
 * is cache-line aligned */
@@ -470,12 +474,18 @@ malloc_elem_alloc(struct malloc_elem *elem, size_t size, 
unsigned align,
return new_elem;
}
 
+   asan_clear_split_alloczone(new_elem);
+
/* we are going to split the element in two. The original element
 * remains free, and the new element is the one allocated.
 * Re-insert original element, in case its new size makes it
 * belong on a different list.
 */
+
split_elem(elem, new_elem);
+
+   asan_clear_alloczone(new_elem);
+
new_elem->state = ELEM_BUSY;
malloc_elem_free_list_insert(elem);
 
@@ -601,6 +611,8 @@ malloc_elem_hide_region(struct malloc_elem *elem, void 
*start, size_t len)
if (next && next_elem_is_adjacent(elem)) {
len_after = RTE_PTR_DIFF(next, hide_end);
if (len_after >= MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+   asan_clear_split_alloczone(hide_end);
+
/* split after */
split_elem(elem, hide_end);
 
@@ -615,6 +627,8 @@ malloc_elem_hide_region(struct malloc_elem *elem, void 
*start, size_t len)
if (prev && prev_elem_is_adjacent(elem)) {
len_before = RTE_PTR_DIFF(hide_start, elem);
if (len_before >= MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
+   asan_clear_split_alloczone(hide_start);
+
/* split before */
split_elem(elem, hide_start);
 
@@ -628,6 +642,8 @@ malloc_elem_hide_region(struct malloc_elem *elem, void 
*start, size_t len)
}
}
 
+   asan_clear_alloczone(elem);
+
remove_elem(elem);
 }
 
@@ -641,8 +657,10 @@ malloc_elem_resize(struct malloc_elem *elem, size_t size)
const size_t new_size = size + elem->pad + MALLOC_ELEM_OVERHEAD;
 
/* if we request a smaller size, then always return ok */
-   if (elem->size >= new_size)
+   if (elem->size >= new_size) {
+   asan_clear_alloczone(elem);
return 0;
+   }
 
/* check if there is a next element, it's free and adjacent */
if (!elem->next || elem->next->state != ELEM_FREE ||
@@ -661,9 +679,15 @@ malloc_elem_resize(struct malloc_elem *elem, size_t size)
/* now we have a big block together. Lets cut it down a bit, by 
splitting */
struct malloc_elem *split_pt = RTE_PTR_ADD(elem, new_size);
split_pt = RTE_PTR_ALIGN_CEIL(split_pt, RTE_CACHE_LINE_SIZE);
+
+   asan_clear_split_alloczone(split_pt);
+
split_elem(elem, split_pt);
malloc_elem_free_list_insert(split_pt);
}
+
+   asan_clear_allocz

[dpdk-dev] [PATCH v1] net/i40e: clear FDIR SW input set when destroy rules

2021-06-15 Thread Lingyu Liu
When a FDIR rule is destroyed, the corresponding input set needs
to be cleared.

Signed-off-by: Lingyu Liu 
---
 drivers/net/i40e/i40e_fdir.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index ac0e09bfdd..e679324c20 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -1845,6 +1845,10 @@ i40e_flow_add_del_fdir_filter(struct rte_eth_dev *dev,
return -EINVAL;
}
 
+   /* clear input_set flag */
+   pf->fdir.input_set[pctype] = 0;
+   pf->fdir.inset_flag[pctype] = 0;
+
pf->fdir.flex_mask_flag[pctype] = 0;
 
if (fdir_info->fdir_invalprio == 1)
-- 
2.25.1



[dpdk-dev] [PATCH v3] ethdev: add IPv4 and L4 checksum RSS offload types

2021-06-15 Thread Alvin Zhang
This patch defines new RSS offload types for IPv4 and L4 checksum,
which are required when users want to distribute packets based on the
IPv4 or L4 checksum field.

For example "flow create 0 ingress pattern eth / ipv4 / end
actions rss types ipv4-chksum end queues end / end", this flow
causes all matching packets to be distributed to queues on
basis of IPv4 checksum.
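
Programmatically, the same selection could look roughly like this (a sketch
assuming the ETH_RSS_IPV4_CHKSUM define added by this patch; the helper name
is illustrative):

#include <rte_ethdev.h>

/* Redistribute ingress traffic on the IPv4 checksum field at runtime. */
static int
enable_ipv4_cksum_rss(uint16_t port_id)
{
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = NULL,                /* keep the currently programmed key */
		.rss_hf  = ETH_RSS_IPV4_CHKSUM, /* new type added by this patch */
	};

	return rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}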

Signed-off-by: Alvin Zhang 
Reviewed-by: Andrew Rybchenko 
Acked-by: Ajit Khaparde 
---

v3: Add L4 checksum RSS offload type
---
 app/test-pmd/cmdline.c  | 4 
 app/test-pmd/config.c   | 2 ++
 lib/ethdev/rte_ethdev.h | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0268b18..6148d84 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -2254,6 +2254,10 @@ struct cmd_config_rss {
rss_conf.rss_hf = ETH_RSS_ECPRI;
else if (!strcmp(res->value, "mpls"))
rss_conf.rss_hf = ETH_RSS_MPLS;
+   else if (!strcmp(res->value, "ipv4-chksum"))
+   rss_conf.rss_hf = ETH_RSS_IPV4_CHKSUM;
+   else if (!strcmp(res->value, "l4-chksum"))
+   rss_conf.rss_hf = ETH_RSS_L4_CHKSUM;
else if (!strcmp(res->value, "none"))
rss_conf.rss_hf = 0;
else if (!strcmp(res->value, "level-default")) {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 43c79b5..14968bf 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -140,6 +140,8 @@
{ "gtpu", ETH_RSS_GTPU },
{ "ecpri", ETH_RSS_ECPRI },
{ "mpls", ETH_RSS_MPLS },
+   { "ipv4-chksum", ETH_RSS_IPV4_CHKSUM },
+   { "l4-chksum", ETH_RSS_L4_CHKSUM },
{ NULL, 0 },
 };
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index faf3bd9..1268729 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -537,6 +537,8 @@ struct rte_eth_rss_conf {
 #define ETH_RSS_PPPOE (1ULL << 31)
 #define ETH_RSS_ECPRI (1ULL << 32)
 #define ETH_RSS_MPLS  (1ULL << 33)
+#define ETH_RSS_IPV4_CHKSUM   (1ULL << 34)
+#define ETH_RSS_L4_CHKSUM (1ULL << 35)
 
 /*
  * We use the following macros to combine with above ETH_RSS_* for
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v3] ethdev: add IPv4 and L4 checksum RSS offload types

2021-06-15 Thread Jerin Jacob
On Tue, Jun 15, 2021 at 1:50 PM Alvin Zhang  wrote:
>
> This patch defines new RSS offload types for IPv4 and L4 checksum,
> which are required when users want to distribute packets based on the
> IPv4 or L4 checksum field.

What is the use case for distribution based on the L4/IPv4 checksum?
Is it something like the HW has the feature so expose it, or is there some
real use case for this in applications?



> For example "flow create 0 ingress pattern eth / ipv4 / end
> actions rss types ipv4-chksum end queues end / end", this flow
> causes all matching packets to be distributed to queues on
> basis of IPv4 checksum.
>
> Signed-off-by: Alvin Zhang 
> Reviewed-by: Andrew Rybchenko 
> Acked-by: Ajit Khaparde 
> ---
>
> v3: Add L4 checksum RSS offload type
> ---
>  app/test-pmd/cmdline.c  | 4 
>  app/test-pmd/config.c   | 2 ++
>  lib/ethdev/rte_ethdev.h | 2 ++
>  3 files changed, 8 insertions(+)
>
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 0268b18..6148d84 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -2254,6 +2254,10 @@ struct cmd_config_rss {
> rss_conf.rss_hf = ETH_RSS_ECPRI;
> else if (!strcmp(res->value, "mpls"))
> rss_conf.rss_hf = ETH_RSS_MPLS;
> +   else if (!strcmp(res->value, "ipv4-chksum"))
> +   rss_conf.rss_hf = ETH_RSS_IPV4_CHKSUM;
> +   else if (!strcmp(res->value, "l4-chksum"))
> +   rss_conf.rss_hf = ETH_RSS_L4_CHKSUM;
> else if (!strcmp(res->value, "none"))
> rss_conf.rss_hf = 0;
> else if (!strcmp(res->value, "level-default")) {
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 43c79b5..14968bf 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -140,6 +140,8 @@
> { "gtpu", ETH_RSS_GTPU },
> { "ecpri", ETH_RSS_ECPRI },
> { "mpls", ETH_RSS_MPLS },
> +   { "ipv4-chksum", ETH_RSS_IPV4_CHKSUM },
> +   { "l4-chksum", ETH_RSS_L4_CHKSUM },
> { NULL, 0 },
>  };
>
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index faf3bd9..1268729 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -537,6 +537,8 @@ struct rte_eth_rss_conf {
>  #define ETH_RSS_PPPOE (1ULL << 31)
>  #define ETH_RSS_ECPRI (1ULL << 32)
>  #define ETH_RSS_MPLS  (1ULL << 33)
> +#define ETH_RSS_IPV4_CHKSUM   (1ULL << 34)
> +#define ETH_RSS_L4_CHKSUM (1ULL << 35)
>
>  /*
>   * We use the following macros to combine with above ETH_RSS_* for
> --
> 1.8.3.1
>


[dpdk-dev] [PATCH 0/6] vhost: Fix and improve NUMA reallocation

2021-06-15 Thread Maxime Coquelin
This patch series first fixes missing reallocations of some
Virtqueue and device metadata.

Then, it improves the numa_realloc function by using the
rte_realloc_socket API, which takes care of the memcpy &
freeing. The VQs' NUMA IDs are also saved in the VQ metadata
and used for every allocation, so that allocations done before
the NUMA realloc are on the same node as the VQ, and later
ones are allocated on the proper node.

Finally, inflight feature metadata are converted from calloc()
to rte_zmalloc_socket() and their reallocation is handled
in numa_realloc().
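
As a rough illustration of the patch-4 pattern (simplified, not the actual
code below), rte_realloc_socket() replaces the malloc/copy/free sequence:

#include <stddef.h>
#include <rte_malloc.h>

/* rte_realloc_socket() copies the data to the target NUMA node and frees
 * the old buffer in one call, instead of rte_malloc_socket() + memcpy() +
 * rte_free(). On failure it returns NULL and leaves the old buffer valid.
 */
static void *
move_to_node(void *old, size_t size, int node)
{
	void *p = rte_realloc_socket(old, size, 0, node);

	return p == NULL ? old : p;
}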

Maxime Coquelin (6):
  vhost: fix missing memory table NUMA realloc
  vhost: fix missing guest pages table NUMA realloc
  vhost: fix missing cache logging NUMA realloc
  vhost: improve NUMA reallocation
  vhost: allocate all data on same node as virtqueue
  vhost: convert inflight data to DPDK allocation API

 lib/vhost/vhost.c  |  38 ---
 lib/vhost/vhost.h  |   1 +
 lib/vhost/vhost_user.c | 232 -
 3 files changed, 155 insertions(+), 116 deletions(-)

-- 
2.31.1



[dpdk-dev] [PATCH 1/6] vhost: fix missing memory table NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: 552e8fd3d2b4 ("vhost: simplify memory regions handling")
Cc: sta...@dpdk.org

Reported-by: David Marchand 
Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8f0eba6412..031e3bfa2f 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -557,6 +557,9 @@ numa_realloc(struct virtio_net *dev, int index)
goto out;
}
if (oldnode != newnode) {
+   struct rte_vhost_memory *old_mem;
+   ssize_t mem_size;
+
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
oldnode, newnode);
@@ -568,6 +571,18 @@ numa_realloc(struct virtio_net *dev, int index)
 
memcpy(dev, old_dev, sizeof(*dev));
rte_free(old_dev);
+
+   mem_size = sizeof(struct rte_vhost_memory) +
+   sizeof(struct rte_vhost_mem_region) * 
dev->mem->nregions;
+   old_mem = dev->mem;
+   dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
+   if (!dev->mem) {
+   dev->mem = old_mem;
+   goto out;
+   }
+
+   memcpy(dev->mem, old_mem, mem_size);
+   rte_free(old_mem);
}
 
 out:
-- 
2.31.1



[dpdk-dev] [PATCH 2/6] vhost: fix missing guest pages table NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 031e3bfa2f..cbfdf1b4d8 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -558,7 +558,8 @@ numa_realloc(struct virtio_net *dev, int index)
}
if (oldnode != newnode) {
struct rte_vhost_memory *old_mem;
-   ssize_t mem_size;
+   struct guest_page *old_gp;
+   ssize_t mem_size, gp_size;
 
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
@@ -583,6 +584,17 @@ numa_realloc(struct virtio_net *dev, int index)
 
memcpy(dev->mem, old_mem, mem_size);
rte_free(old_mem);
+
+   gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
+   old_gp = dev->guest_pages;
+   dev->guest_pages = rte_malloc_socket(NULL, gp_size, 
RTE_CACHE_LINE_SIZE, newnode);
+   if (!dev->guest_pages) {
+   dev->guest_pages = old_gp;
+   goto out;
+   }
+
+   memcpy(dev->guest_pages, old_gp, gp_size);
+   rte_free(old_gp);
}
 
 out:
-- 
2.31.1



[dpdk-dev] [PATCH 3/6] vhost: fix missing cache logging NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the log cache on the new NUMA node was
not done. This patch fixes this by reallocating it if it has
been allocated already, which means a live-migration is
on-going.

Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index cbfdf1b4d8..0e9e26ebe0 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
vq->batch_copy_elems = new_batch_copy_elems;
}
 
+   if (vq->log_cache) {
+   struct log_cache_entry *log_cache;
+
+   log_cache = rte_realloc_socket(vq->log_cache,
+   sizeof(struct log_cache_entry) * 
VHOST_LOG_CACHE_NR,
+   0, newnode);
+   if (log_cache)
+   vq->log_cache = log_cache;
+   }
+
rte_free(old_vq);
}
 
-- 
2.31.1



[dpdk-dev] [PATCH 4/6] vhost: improve NUMA reallocation

2021-06-15 Thread Maxime Coquelin
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.

Suggested-by: David Marchand 
Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 177 +
 1 file changed, 73 insertions(+), 104 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 0e9e26ebe0..b298312db6 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -480,139 +480,108 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
 static struct virtio_net*
 numa_realloc(struct virtio_net *dev, int index)
 {
-   int oldnode, newnode;
+   int node;
struct virtio_net *old_dev;
-   struct vhost_virtqueue *old_vq, *vq;
-   struct vring_used_elem *new_shadow_used_split;
-   struct vring_used_elem_packed *new_shadow_used_packed;
-   struct batch_copy_elem *new_batch_copy_elems;
+   struct vhost_virtqueue *vq;
+   struct batch_copy_elem *bce;
+   struct guest_page *gp;
+   struct rte_vhost_memory *mem;
+   size_t mem_size;
int ret;
 
if (dev->flags & VIRTIO_DEV_RUNNING)
return dev;
 
old_dev = dev;
-   vq = old_vq = dev->virtqueue[index];
-
-   ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
-   MPOL_F_NODE | MPOL_F_ADDR);
+   vq = dev->virtqueue[index];
 
-   /* check if we need to reallocate vq */
-   ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
-MPOL_F_NODE | MPOL_F_ADDR);
+   ret = get_mempolicy(&node, NULL, 0, vq->desc, MPOL_F_NODE | 
MPOL_F_ADDR);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "Unable to get vq numa information.\n");
+   VHOST_LOG_CONFIG(ERR, "Unable to get virtqueue %d numa 
information.\n", index);
return dev;
}
-   if (oldnode != newnode) {
-   VHOST_LOG_CONFIG(INFO,
-   "reallocate vq from %d to %d node\n", oldnode, newnode);
-   vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
-   if (!vq)
-   return dev;
 
-   memcpy(vq, old_vq, sizeof(*vq));
+   vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
+   if (!vq) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node 
%d\n",
+   index, node);
+   return dev;
+   }
 
-   if (vq_is_packed(dev)) {
-   new_shadow_used_packed = rte_malloc_socket(NULL,
-   vq->size *
-   sizeof(struct vring_used_elem_packed),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_shadow_used_packed) {
-   rte_free(vq->shadow_used_packed);
-   vq->shadow_used_packed = new_shadow_used_packed;
-   }
-   } else {
-   new_shadow_used_split = rte_malloc_socket(NULL,
-   vq->size *
-   sizeof(struct vring_used_elem),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_shadow_used_split) {
-   rte_free(vq->shadow_used_split);
-   vq->shadow_used_split = new_shadow_used_split;
-   }
-   }
+   if (vq != dev->virtqueue[index]) {
+   VHOST_LOG_CONFIG(INFO, "reallocated virtqueue on node %d\n", 
node);
+   dev->virtqueue[index] = vq;
+   vhost_user_iotlb_init(dev, index);
+   }
 
-   new_batch_copy_elems = rte_malloc_socket(NULL,
-   vq->size * sizeof(struct batch_copy_elem),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_batch_copy_elems) {
-   rte_free(vq->batch_copy_elems);
-   vq->batch_copy_elems = new_batch_copy_elems;
+   if (vq_is_packed(dev)) {
+   struct vring_used_elem_packed *sup;
+
+   sup = rte_realloc_socket(vq->shadow_used_packed, vq->size * 
sizeof(*sup),
+   RTE_CACHE_LINE_SIZE, node);
+   if (!sup) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow packed 
on node %d\n", node);
+   return dev;
}
+   vq->shadow_used_packed = sup;
 
-   if (vq->log_cache) {
-   struct log_cache_entry *log_cache;
+   } else {
+   struct vring_used_elem *sus;
 
-   log_cache = rte_realloc_socket(vq->log_cache,
-   

[dpdk-dev] [PATCH 5/6] vhost: allocate all data on same node as virtqueue

2021-06-15 Thread Maxime Coquelin
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.

While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.

For the virtio device metadata, we decide to allocate them
on the same node as the VQ 0.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c  | 34 --
 lib/vhost/vhost.h  |  1 +
 lib/vhost/vhost_user.c | 40 +++-
 3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index c96f6335c8..cd3297 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -261,7 +261,7 @@ vhost_alloc_copy_ind_table(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint64_t src, dst;
uint64_t len, remain = desc_len;
 
-   idesc = rte_malloc(__func__, desc_len, 0);
+   idesc = rte_malloc_socket(__func__, desc_len, 0, vq->numa_node);
if (unlikely(!idesc))
return NULL;
 
@@ -549,6 +549,7 @@ static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
struct vhost_virtqueue *vq;
+   int numa_node = SOCKET_ID_ANY;
 
if (vring_idx >= VHOST_MAX_VRING) {
VHOST_LOG_CONFIG(ERR,
@@ -570,6 +571,15 @@ init_vring_queue(struct virtio_net *dev, uint32_t 
vring_idx)
vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
vq->notif_enable = VIRTIO_UNINITIALIZED_NOTIF;
 
+#ifdef RTE_LIBRTE_VHOST_NUMA
+   if (get_mempolicy(&numa_node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
+   VHOST_LOG_CONFIG(ERR, "(%d) failed to query numa node: %s\n",
+   dev->vid, rte_strerror(errno));
+   numa_node = SOCKET_ID_ANY;
+   }
+#endif
+   vq->numa_node = numa_node;
+
vhost_user_iotlb_init(dev, vring_idx);
 }
 
@@ -1616,7 +1626,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
struct vhost_virtqueue *vq;
struct virtio_net *dev = get_device(vid);
struct rte_vhost_async_features f;
-   int node;
 
if (dev == NULL || ops == NULL)
return -1;
@@ -1651,20 +1660,9 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
goto reg_out;
}
 
-#ifdef RTE_LIBRTE_VHOST_NUMA
-   if (get_mempolicy(&node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
-   VHOST_LOG_CONFIG(ERR,
-   "unable to get numa information in async register. "
-   "allocating async buffer memory on the caller thread 
node\n");
-   node = SOCKET_ID_ANY;
-   }
-#else
-   node = SOCKET_ID_ANY;
-#endif
-
vq->async_pkts_info = rte_malloc_socket(NULL,
vq->size * sizeof(struct async_inflight_info),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_pkts_info) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1675,7 +1673,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
 
vq->it_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->it_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1686,7 +1684,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
 
vq->vec_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->vec_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1698,7 +1696,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
if (vq_is_packed(dev)) {
vq->async_buffers_packed = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem_packed),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_buffers_packed) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1709,7 +1707,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
} else {
vq->async_descs_split = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_descs_split) {
vhost_free_async_me

[dpdk-dev] [PATCH 6/6] vhost: convert inflight data to DPDK allocation API

2021-06-15 Thread Maxime Coquelin
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c  |  4 ++--
 lib/vhost/vhost_user.c | 42 +++---
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index cd3297..53a470f547 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -312,10 +312,10 @@ cleanup_vq_inflight(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
 
if (vq->resubmit_inflight) {
if (vq->resubmit_inflight->resubmit_list) {
-   free(vq->resubmit_inflight->resubmit_list);
+   rte_free(vq->resubmit_inflight->resubmit_list);
vq->resubmit_inflight->resubmit_list = NULL;
}
-   free(vq->resubmit_inflight);
+   rte_free(vq->resubmit_inflight);
vq->resubmit_inflight = NULL;
}
 }
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 72879a36c8..5342edf769 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -188,7 +188,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
dev->inflight_info->fd = -1;
}
 
-   free(dev->inflight_info);
+   rte_free(dev->inflight_info);
dev->inflight_info = NULL;
}
 
@@ -1476,6 +1476,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
uint16_t num_queues, queue_size;
struct virtio_net *dev = *pdev;
int fd, i, j;
+   int numa_node = SOCKET_ID_ANY;
void *addr;
 
if (msg->size != sizeof(msg->payload.inflight)) {
@@ -1485,9 +1486,16 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_ERR;
}
 
+   /*
+* If VQ 0 has already been allocated, try to allocate on the same
+* NUMA node. It can be reallocated later in numa_realloc().
+*/
+   if (dev->nr_vring > 0)
+   numa_node = dev->virtqueue[0]->numa_node;
+
if (dev->inflight_info == NULL) {
-   dev->inflight_info = calloc(1,
-   sizeof(struct inflight_mem_info));
+   dev->inflight_info = rte_zmalloc_socket("inflight_info",
+   sizeof(struct inflight_mem_info), 0, numa_node);
if (!dev->inflight_info) {
VHOST_LOG_CONFIG(ERR,
"failed to alloc dev inflight area\n");
@@ -1570,6 +1578,7 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, 
VhostUserMsg *msg,
struct vhost_virtqueue *vq;
void *addr;
int fd, i;
+   int numa_node = SOCKET_ID_ANY;
 
fd = msg->fds[0];
if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
@@ -1603,9 +1612,16 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, 
VhostUserMsg *msg,
"set_inflight_fd pervq_inflight_size: %d\n",
pervq_inflight_size);
 
+   /*
+* If VQ 0 has already been allocated, try to allocate on the same
+* NUMA node. It can be reallocated later in numa_realloc().
+*/
+   if (dev->nr_vring > 0)
+   numa_node = dev->virtqueue[0]->numa_node;
+
if (!dev->inflight_info) {
-   dev->inflight_info = calloc(1,
-   sizeof(struct inflight_mem_info));
+   dev->inflight_info = rte_zmalloc_socket("inflight_info",
+   sizeof(struct inflight_mem_info), 0, numa_node);
if (dev->inflight_info == NULL) {
VHOST_LOG_CONFIG(ERR,
"failed to alloc dev inflight area\n");
@@ -1764,15 +1780,17 @@ vhost_check_queue_inflights_split(struct virtio_net 
*dev,
vq->last_avail_idx += resubmit_num;
 
if (resubmit_num) {
-   resubmit  = calloc(1, sizeof(struct rte_vhost_resubmit_info));
+   resubmit  = rte_zmalloc_socket("resubmit", sizeof(struct 
rte_vhost_resubmit_info),
+   0, vq->numa_node);
if (!resubmit) {
VHOST_LOG_CONFIG(ERR,
"failed to allocate memory for resubmit 
info.\n");
return RTE_VHOST_MSG_RESULT_ERR;
}
 
-   resubmit->resubmit_list = calloc(resubmit_num,
-   sizeof(struct rte_vhost_resubmit_desc));
+   resubmit->resubmit_list = rte_zmalloc_socket("resubmit_list",
+   resubmit_num * sizeof(struct 
rte_vhost_resubmit_desc),
+   0, vq->numa_node);
if (!resubmit->resubmit_list) {
VHOST_LOG_CONFIG(ERR,
"fail

Re: [dpdk-dev] [PATCH v1] net/i40e: clear FDIR SW input set when destroy rules

2021-06-15 Thread Xing, Beilei



> -Original Message-
> From: Liu, Lingyu 
> Sent: Tuesday, June 15, 2021 10:53 PM
> To: dev@dpdk.org; Zhang, Qi Z ; Xing, Beilei
> 
> Cc: Liu, Lingyu 
> Subject: [PATCH v1] net/i40e: clear FDIR SW input set when destroy rules
> 
> When a FDIR rule is destroyed, the corresponding input set needs to be
> cleared.
> 
The fix should be: if all the rules of some PCTYPE are deleted, then the input set
needs to be reset.

The Fixes: tag is also missing.

> Signed-off-by: Lingyu Liu 
> ---
>  drivers/net/i40e/i40e_fdir.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c index
> ac0e09bfdd..e679324c20 100644
> --- a/drivers/net/i40e/i40e_fdir.c
> +++ b/drivers/net/i40e/i40e_fdir.c
> @@ -1845,6 +1845,10 @@ i40e_flow_add_del_fdir_filter(struct rte_eth_dev
> *dev,
>   return -EINVAL;
>   }
> 
> + /* clear input_set flag */
> + pf->fdir.input_set[pctype] = 0;
> + pf->fdir.inset_flag[pctype] = 0;
> +

So it should check whether it is the last rule of the PCTYPE.

>   pf->fdir.flex_mask_flag[pctype] = 0;
> 
>   if (fdir_info->fdir_invalprio == 1)
> --
> 2.25.1
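
For clarity, a minimal sketch of the suggested approach: only reset the input set
once the last rule of the PCTYPE has been removed. This assumes the function's
add/delete flag and a hypothetical per-PCTYPE rule counter (pf->fdir.rule_cnt[]);
the actual driver may track this differently.

        /* Sketch only: delete path of i40e_flow_add_del_fdir_filter(). */
        if (!add) {
                if (pf->fdir.rule_cnt[pctype] > 0)
                        pf->fdir.rule_cnt[pctype]--;

                /* Reset the input set only when the last rule of this PCTYPE
                 * is gone; otherwise the remaining rules would still need it.
                 */
                if (pf->fdir.rule_cnt[pctype] == 0) {
                        pf->fdir.input_set[pctype] = 0;
                        pf->fdir.inset_flag[pctype] = 0;
                }
        }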



Re: [dpdk-dev] [RFC v2] porting AddressSanitizer feature to DPDK

2021-06-15 Thread Jerin Jacob
On Tue, Jun 15, 2021 at 1:46 PM  wrote:
>
> From: Zhihong Peng 
>
> AddressSanitizer (ASan) is a standard memory error detection tool
> from Google. It helps to detect use-after-free and
> {heap,stack,global}-buffer overflow bugs in C/C++ programs, prints
> detailed error information when an error happens, and greatly
> improves debugging efficiency.
>
> By referring to its implementation algorithm
> (https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm),
> the heap-buffer-overflow and use-after-free functions have been ported to DPDK.
>
> Here is an example of heap-buffer-overflow bug:
> ..
> char *p = rte_zmalloc(NULL, 7, 0);
> p[7] = 'a';
> ..
>
> Here is an example of use-after-free bug:
> ..
> char *p = rte_zmalloc(NULL, 7, 0);
> rte_free(p);
> *p = 'a';
> ..
>
> If you want to use this feature,
> you need to use the following compilation options:
> -Db_lundef=false -Db_sanitize=address

# Thanks for this patch. It is a useful item.

# Subject could be changed
from:
porting AddressSanitizer feature to DPDK
to
eal: support for AddressSanitizer
or so

# Could you add a section in the documentation for Sanitizers to
document the build time option and other points that users need to
know.
We can add other sanitizers such as UBSan etc in the future here

# Add a UT test case to make sure it is working in app/test or so.

# Also, Please update the release note for this feature.
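
For reference, the quoted build options would typically be passed like this
(the build directory name is only an example):

    meson setup build -Db_lundef=false -Db_sanitize=address
    ninja -C build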


[dpdk-dev] [PATCH v2 0/6] vhost: Fix and improve NUMA reallocation

2021-06-15 Thread Maxime Coquelin
This patch series first fixes missing reallocations of some
Virtqueue and device metadata.

Then, it improves the numa_realloc function by using the
rte_realloc_socket API, which takes care of the memcpy &
freeing. The VQs' NUMA node IDs are also saved in the VQ metadata
and used for every allocation, so that allocations done before the
NUMA realloc end up on the same node as the VQ, and later ones are
directly allocated on the proper node.
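
For reference, a minimal sketch of the rte_realloc_socket() pattern this series
relies on (illustrative and simplified from the actual numa_realloc() code,
assuming the internal vhost_virtqueue definition is available): on success the
data is moved to the target NUMA node and the old buffer is freed; on failure
the old pointer stays valid.

        #include <rte_malloc.h>

        static struct vhost_virtqueue *
        vq_realloc_on_node(struct vhost_virtqueue *vq, int node)
        {
                struct vhost_virtqueue *new_vq;

                new_vq = rte_realloc_socket(vq, sizeof(*new_vq), 0, node);
                if (new_vq == NULL)
                        return vq;      /* keep using the old allocation */
                return new_vq;
        }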

Finally, the inflight feature metadata are converted from calloc()
to rte_zmalloc_socket() and their reallocation is handled
in numa_realloc().

Changes in v2:
==
- Add missing NUMA realloc in patch 6

Maxime Coquelin (6):
  vhost: fix missing memory table NUMA realloc
  vhost: fix missing guest pages table NUMA realloc
  vhost: fix missing cache logging NUMA realloc
  vhost: improve NUMA reallocation
  vhost: allocate all data on same node as virtqueue
  vhost: convert inflight data to DPDK allocation API

 lib/vhost/vhost.c  |  38 +++---
 lib/vhost/vhost.h  |   1 +
 lib/vhost/vhost_user.c | 255 ++---
 3 files changed, 179 insertions(+), 115 deletions(-)

-- 
2.31.1



[dpdk-dev] [PATCH v2 1/6] vhost: fix missing memory table NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the Vhost memory table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: 552e8fd3d2b4 ("vhost: simplify memory regions handling")
Cc: sta...@dpdk.org

Reported-by: David Marchand 
Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 8f0eba6412..031e3bfa2f 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -557,6 +557,9 @@ numa_realloc(struct virtio_net *dev, int index)
goto out;
}
if (oldnode != newnode) {
+   struct rte_vhost_memory *old_mem;
+   ssize_t mem_size;
+
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
oldnode, newnode);
@@ -568,6 +571,18 @@ numa_realloc(struct virtio_net *dev, int index)
 
memcpy(dev, old_dev, sizeof(*dev));
rte_free(old_dev);
+
+   mem_size = sizeof(struct rte_vhost_memory) +
+   sizeof(struct rte_vhost_mem_region) * 
dev->mem->nregions;
+   old_mem = dev->mem;
+   dev->mem = rte_malloc_socket(NULL, mem_size, 0, newnode);
+   if (!dev->mem) {
+   dev->mem = old_mem;
+   goto out;
+   }
+
+   memcpy(dev->mem, old_mem, mem_size);
+   rte_free(old_mem);
}
 
 out:
-- 
2.31.1



[dpdk-dev] [PATCH v2 2/6] vhost: fix missing guest pages table NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the guest pages table was missing, which
likely causes at least one cross-NUMA access for every burst
of packets.

This patch reallocates this table on the same NUMA node as the
other metadata.

Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 031e3bfa2f..cbfdf1b4d8 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -558,7 +558,8 @@ numa_realloc(struct virtio_net *dev, int index)
}
if (oldnode != newnode) {
struct rte_vhost_memory *old_mem;
-   ssize_t mem_size;
+   struct guest_page *old_gp;
+   ssize_t mem_size, gp_size;
 
VHOST_LOG_CONFIG(INFO,
"reallocate dev from %d to %d node\n",
@@ -583,6 +584,17 @@ numa_realloc(struct virtio_net *dev, int index)
 
memcpy(dev->mem, old_mem, mem_size);
rte_free(old_mem);
+
+   gp_size = dev->max_guest_pages * sizeof(*dev->guest_pages);
+   old_gp = dev->guest_pages;
+   dev->guest_pages = rte_malloc_socket(NULL, gp_size, 
RTE_CACHE_LINE_SIZE, newnode);
+   if (!dev->guest_pages) {
+   dev->guest_pages = old_gp;
+   goto out;
+   }
+
+   memcpy(dev->guest_pages, old_gp, gp_size);
+   rte_free(old_gp);
}
 
 out:
-- 
2.31.1



[dpdk-dev] [PATCH v2 3/6] vhost: fix missing cache logging NUMA realloc

2021-06-15 Thread Maxime Coquelin
When the guest allocates virtqueues on a different NUMA node
than the one the Vhost metadata are allocated on, both the Vhost
device struct and the virtqueues struct are reallocated.

However, reallocating the log cache on the new NUMA node was
not done. This patch fixes this by reallocating it if it has
been allocated already, which means a live-migration is
on-going.

Fixes: 1818a63147fb ("vhost: move dirty logging cache out of virtqueue")

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index cbfdf1b4d8..0e9e26ebe0 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -545,6 +545,16 @@ numa_realloc(struct virtio_net *dev, int index)
vq->batch_copy_elems = new_batch_copy_elems;
}
 
+   if (vq->log_cache) {
+   struct log_cache_entry *log_cache;
+
+   log_cache = rte_realloc_socket(vq->log_cache,
+   sizeof(struct log_cache_entry) * 
VHOST_LOG_CACHE_NR,
+   0, newnode);
+   if (log_cache)
+   vq->log_cache = log_cache;
+   }
+
rte_free(old_vq);
}
 
-- 
2.31.1



[dpdk-dev] [PATCH v2 4/6] vhost: improve NUMA reallocation

2021-06-15 Thread Maxime Coquelin
This patch improves the numa_realloc() function by making use
of rte_realloc_socket(), which takes care of the memory copy
and freeing of the old data.

Suggested-by: David Marchand 
Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost_user.c | 177 +
 1 file changed, 73 insertions(+), 104 deletions(-)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 0e9e26ebe0..b298312db6 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -480,139 +480,108 @@ vhost_user_set_vring_num(struct virtio_net **pdev,
 static struct virtio_net*
 numa_realloc(struct virtio_net *dev, int index)
 {
-   int oldnode, newnode;
+   int node;
struct virtio_net *old_dev;
-   struct vhost_virtqueue *old_vq, *vq;
-   struct vring_used_elem *new_shadow_used_split;
-   struct vring_used_elem_packed *new_shadow_used_packed;
-   struct batch_copy_elem *new_batch_copy_elems;
+   struct vhost_virtqueue *vq;
+   struct batch_copy_elem *bce;
+   struct guest_page *gp;
+   struct rte_vhost_memory *mem;
+   size_t mem_size;
int ret;
 
if (dev->flags & VIRTIO_DEV_RUNNING)
return dev;
 
old_dev = dev;
-   vq = old_vq = dev->virtqueue[index];
-
-   ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
-   MPOL_F_NODE | MPOL_F_ADDR);
+   vq = dev->virtqueue[index];
 
-   /* check if we need to reallocate vq */
-   ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
-MPOL_F_NODE | MPOL_F_ADDR);
+   ret = get_mempolicy(&node, NULL, 0, vq->desc, MPOL_F_NODE | 
MPOL_F_ADDR);
if (ret) {
-   VHOST_LOG_CONFIG(ERR,
-   "Unable to get vq numa information.\n");
+   VHOST_LOG_CONFIG(ERR, "Unable to get virtqueue %d numa 
information.\n", index);
return dev;
}
-   if (oldnode != newnode) {
-   VHOST_LOG_CONFIG(INFO,
-   "reallocate vq from %d to %d node\n", oldnode, newnode);
-   vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
-   if (!vq)
-   return dev;
 
-   memcpy(vq, old_vq, sizeof(*vq));
+   vq = rte_realloc_socket(vq, sizeof(*vq), 0, node);
+   if (!vq) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc virtqueue %d on node 
%d\n",
+   index, node);
+   return dev;
+   }
 
-   if (vq_is_packed(dev)) {
-   new_shadow_used_packed = rte_malloc_socket(NULL,
-   vq->size *
-   sizeof(struct vring_used_elem_packed),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_shadow_used_packed) {
-   rte_free(vq->shadow_used_packed);
-   vq->shadow_used_packed = new_shadow_used_packed;
-   }
-   } else {
-   new_shadow_used_split = rte_malloc_socket(NULL,
-   vq->size *
-   sizeof(struct vring_used_elem),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_shadow_used_split) {
-   rte_free(vq->shadow_used_split);
-   vq->shadow_used_split = new_shadow_used_split;
-   }
-   }
+   if (vq != dev->virtqueue[index]) {
+   VHOST_LOG_CONFIG(INFO, "reallocated virtqueue on node %d\n", 
node);
+   dev->virtqueue[index] = vq;
+   vhost_user_iotlb_init(dev, index);
+   }
 
-   new_batch_copy_elems = rte_malloc_socket(NULL,
-   vq->size * sizeof(struct batch_copy_elem),
-   RTE_CACHE_LINE_SIZE,
-   newnode);
-   if (new_batch_copy_elems) {
-   rte_free(vq->batch_copy_elems);
-   vq->batch_copy_elems = new_batch_copy_elems;
+   if (vq_is_packed(dev)) {
+   struct vring_used_elem_packed *sup;
+
+   sup = rte_realloc_socket(vq->shadow_used_packed, vq->size * 
sizeof(*sup),
+   RTE_CACHE_LINE_SIZE, node);
+   if (!sup) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc shadow packed 
on node %d\n", node);
+   return dev;
}
+   vq->shadow_used_packed = sup;
 
-   if (vq->log_cache) {
-   struct log_cache_entry *log_cache;
+   } else {
+   struct vring_used_elem *sus;
 
-   log_cache = rte_realloc_socket(vq->log_cache,
-   

[dpdk-dev] [PATCH v2 5/6] vhost: allocate all data on same node as virtqueue

2021-06-15 Thread Maxime Coquelin
This patch saves the NUMA node the virtqueue is allocated
on at init time, in order to allocate all other data on the
same node.

While most of the data are allocated before numa_realloc()
is called and so the data will be reallocated properly, some
data like the log cache are most likely allocated after.

For the virtio device metadata, we decide to allocate them
on the same node as the VQ 0.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c  | 34 --
 lib/vhost/vhost.h  |  1 +
 lib/vhost/vhost_user.c | 40 +++-
 3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index c96f6335c8..cd3297 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -261,7 +261,7 @@ vhost_alloc_copy_ind_table(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
uint64_t src, dst;
uint64_t len, remain = desc_len;
 
-   idesc = rte_malloc(__func__, desc_len, 0);
+   idesc = rte_malloc_socket(__func__, desc_len, 0, vq->numa_node);
if (unlikely(!idesc))
return NULL;
 
@@ -549,6 +549,7 @@ static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
struct vhost_virtqueue *vq;
+   int numa_node = SOCKET_ID_ANY;
 
if (vring_idx >= VHOST_MAX_VRING) {
VHOST_LOG_CONFIG(ERR,
@@ -570,6 +571,15 @@ init_vring_queue(struct virtio_net *dev, uint32_t 
vring_idx)
vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
vq->notif_enable = VIRTIO_UNINITIALIZED_NOTIF;
 
+#ifdef RTE_LIBRTE_VHOST_NUMA
+   if (get_mempolicy(&numa_node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
+   VHOST_LOG_CONFIG(ERR, "(%d) failed to query numa node: %s\n",
+   dev->vid, rte_strerror(errno));
+   numa_node = SOCKET_ID_ANY;
+   }
+#endif
+   vq->numa_node = numa_node;
+
vhost_user_iotlb_init(dev, vring_idx);
 }
 
@@ -1616,7 +1626,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
struct vhost_virtqueue *vq;
struct virtio_net *dev = get_device(vid);
struct rte_vhost_async_features f;
-   int node;
 
if (dev == NULL || ops == NULL)
return -1;
@@ -1651,20 +1660,9 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
goto reg_out;
}
 
-#ifdef RTE_LIBRTE_VHOST_NUMA
-   if (get_mempolicy(&node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
-   VHOST_LOG_CONFIG(ERR,
-   "unable to get numa information in async register. "
-   "allocating async buffer memory on the caller thread 
node\n");
-   node = SOCKET_ID_ANY;
-   }
-#else
-   node = SOCKET_ID_ANY;
-#endif
-
vq->async_pkts_info = rte_malloc_socket(NULL,
vq->size * sizeof(struct async_inflight_info),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_pkts_info) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1675,7 +1673,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
 
vq->it_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->it_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1686,7 +1684,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
 
vq->vec_pool = rte_malloc_socket(NULL,
VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->vec_pool) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1698,7 +1696,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
if (vq_is_packed(dev)) {
vq->async_buffers_packed = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem_packed),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_buffers_packed) {
vhost_free_async_mem(vq);
VHOST_LOG_CONFIG(ERR,
@@ -1709,7 +1707,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t 
queue_id,
} else {
vq->async_descs_split = rte_malloc_socket(NULL,
vq->size * sizeof(struct vring_used_elem),
-   RTE_CACHE_LINE_SIZE, node);
+   RTE_CACHE_LINE_SIZE, vq->numa_node);
if (!vq->async_descs_split) {
vhost_free_async_me

[dpdk-dev] [PATCH v2 6/6] vhost: convert inflight data to DPDK allocation API

2021-06-15 Thread Maxime Coquelin
Inflight metadata are allocated using glibc's calloc.
This patch converts them to rte_zmalloc_socket to take
care of the NUMA affinity.

Signed-off-by: Maxime Coquelin 
---
 lib/vhost/vhost.c  |  4 +--
 lib/vhost/vhost_user.c | 67 +++---
 2 files changed, 58 insertions(+), 13 deletions(-)

diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index cd3297..53a470f547 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -312,10 +312,10 @@ cleanup_vq_inflight(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
 
if (vq->resubmit_inflight) {
if (vq->resubmit_inflight->resubmit_list) {
-   free(vq->resubmit_inflight->resubmit_list);
+   rte_free(vq->resubmit_inflight->resubmit_list);
vq->resubmit_inflight->resubmit_list = NULL;
}
-   free(vq->resubmit_inflight);
+   rte_free(vq->resubmit_inflight);
vq->resubmit_inflight = NULL;
}
 }
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 72879a36c8..ba5dbd14d3 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -188,7 +188,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
dev->inflight_info->fd = -1;
}
 
-   free(dev->inflight_info);
+   rte_free(dev->inflight_info);
dev->inflight_info = NULL;
}
 
@@ -559,6 +559,31 @@ numa_realloc(struct virtio_net *dev, int index)
vq->log_cache = lc;
}
 
+   if (vq->resubmit_inflight) {
+   struct rte_vhost_resubmit_info *ri;
+
+   ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 0, 
node);
+   if (!ri) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit 
inflight on node %d\n",
+   node);
+   return dev;
+   }
+   vq->resubmit_inflight = ri;
+
+   if (vq->resubmit_inflight) {
+   struct rte_vhost_resubmit_desc *rd;
+
+   rd = rte_realloc_socket(ri->resubmit_list, sizeof(*rd) 
* ri->resubmit_num,
+   0, node);
+   if (!ri) {
+   VHOST_LOG_CONFIG(ERR, "Failed to realloc 
resubmit list on node %d\n",
+   node);
+   return dev;
+   }
+   ri->resubmit_list = rd;
+   }
+   }
+
vq->numa_node = node;
 
 out_dev_realloc:
@@ -1476,6 +1501,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
uint16_t num_queues, queue_size;
struct virtio_net *dev = *pdev;
int fd, i, j;
+   int numa_node = SOCKET_ID_ANY;
void *addr;
 
if (msg->size != sizeof(msg->payload.inflight)) {
@@ -1485,9 +1511,16 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
return RTE_VHOST_MSG_RESULT_ERR;
}
 
+   /*
+* If VQ 0 has already been allocated, try to allocate on the same
+* NUMA node. It can be reallocated later in numa_realloc().
+*/
+   if (dev->nr_vring > 0)
+   numa_node = dev->virtqueue[0]->numa_node;
+
if (dev->inflight_info == NULL) {
-   dev->inflight_info = calloc(1,
-   sizeof(struct inflight_mem_info));
+   dev->inflight_info = rte_zmalloc_socket("inflight_info",
+   sizeof(struct inflight_mem_info), 0, numa_node);
if (!dev->inflight_info) {
VHOST_LOG_CONFIG(ERR,
"failed to alloc dev inflight area\n");
@@ -1570,6 +1603,7 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, 
VhostUserMsg *msg,
struct vhost_virtqueue *vq;
void *addr;
int fd, i;
+   int numa_node = SOCKET_ID_ANY;
 
fd = msg->fds[0];
if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
@@ -1603,9 +1637,16 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev, 
VhostUserMsg *msg,
"set_inflight_fd pervq_inflight_size: %d\n",
pervq_inflight_size);
 
+   /*
+* If VQ 0 has already been allocated, try to allocate on the same
+* NUMA node. It can be reallocated later in numa_realloc().
+*/
+   if (dev->nr_vring > 0)
+   numa_node = dev->virtqueue[0]->numa_node;
+
if (!dev->inflight_info) {
-   dev->inflight_info = calloc(1,
-   sizeof(struct inflight_mem_info));
+   dev->inflight_info = rte_zmalloc_socket("inflight_info",
+   sizeof(struct inflight_mem_info), 0, numa_node);
if (dev->inflight_info == NULL) {
  

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Bruce Richardson
On Tue, Jun 15, 2021 at 09:53:33AM +0200, Morten Brørup wrote:
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > Sent: Tuesday, 15 June 2021 08.48
> > 
> > 14/06/2021 17:48, Morten Brørup:
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > something big enough to hold a sufficiently large array. And possibly
> > add an rte_max_ethports variable to indicate the number of populated
> > entries in the array, for use when iterating over the array.
> > >
> > > Can we come up with another example than RTE_MAX_ETHPORTS where this
> > library provides a better benefit?
> > 
> > What is big enough?
> > Is 640KB enough for RAM? ;)
> 
> Good point!
> 
> I think we agree that:
> - The cost of this library is some added complexity, i.e. working with a 
> dynamically sized array through a library instead of just indexing into a 
> compile time fixed size array.
> - The main benefit of this library is saving some RAM (and still allowing a 
> potentially very high number of ports.)
> 
> My point was: The amount of RAM we are saving is a key parameter for the 
> cost/benefit analysis. And since I don't think the rte_eth_devices[] array 
> uses a significant amount of memory, I was asking for some other array using 
> more memory, where the cost/benefit analysis would come out more advantageous 
> to your proposed parray library.
> 
> > 
> > When dealing with microservices switching, the numbers can increase
> > very fast.
> 
> Yes, I strongly supported increasing the port_id type from 8 to 16 bits for 
> this reason, when it was discussed at the DPDK Userspace a few years ago in 
> Dublin. And with large RTE_MAX_QUEUES_PER_PORT values, the rte_eth_dev 
> structure uses quite a lot of space for the rx/tx callback arrays. But the 
> memory usage of rte_eth_devices[] is still relatively insignificant in a 
> system wide context.
> 
> If main purpose is to optimize the rte_eth_devices[] array, I think there are 
> better alternatives than this library. Bruce and Konstantin already threw a 
> few ideas on the table.
>

Yes, though I think we need to be clear on what problems we are trying to
solve here. A generic resizable array may be a useful library for DPDK in
its own right, but for the ethdev (and other devs) arrays I think my
understanding of the problem is that we want:

* scalability of ethdevs list to large numbers of ports, e.g. 2k
* while not paying a large memory footprint penalty for those apps which
  only need a small number of ports, e.g. 2 or 4.

Is that a fair summary?

/Bruce
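
A minimal sketch of that half-way house, i.e. a flat array allocated once at
init time and sized by a runtime parameter instead of a compile-time constant
(names here are illustrative, not an actual DPDK API):

        #include <stdlib.h>

        struct dev_entry { void *data; };       /* stand-in for struct rte_eth_dev */

        static struct dev_entry *dev_array;     /* fast path: one dereference + index */
        static unsigned int dev_array_size;

        /* Called once at init; max_devs would come from e.g. a hypothetical
         * --max-ethdevs option, defaulting to the compile-time value.
         */
        static int
        dev_array_init(unsigned int max_devs)
        {
                dev_array = calloc(max_devs, sizeof(*dev_array));
                if (dev_array == NULL)
                        return -1;
                dev_array_size = max_devs;
                return 0;
        }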


Re: [dpdk-dev] [PATCH] app/testpmd: send failure logs to stderr

2021-06-15 Thread Ferruh Yigit
On 6/15/2021 9:14 AM, Andrew Rybchenko wrote:
> On 6/15/21 10:59 AM, Ferruh Yigit wrote:
>> On 6/14/2021 5:56 PM, Andrew Rybchenko wrote:
>>> On 6/11/21 1:35 PM, Ferruh Yigit wrote:
 On 6/11/2021 10:19 AM, Andrew Rybchenko wrote:
> On 6/11/21 5:06 AM, Li, Xiaoyun wrote:
>> Hi
>> -Original Message-
>> From: Andrew Rybchenko 
>> Sent: Friday, May 28, 2021 00:25
>> To: Li, Xiaoyun 
>> Cc: dev@dpdk.org
>> Subject: [PATCH] app/testpmd: send failure logs to stderr
>>
>> Running with stdout suppressed or redirected for further processing is 
>> very
>> confusing in the case of errors.
>>
>> Signed-off-by: Andrew Rybchenko 
>> ---
>>
>> This patch looks good to me.
>> But what do you think about make it as a fix and backport to stable 
>> branches?
>> Anyway works for me.
>
> I have no strong opinion on the topic.
>
> @Ferruh, what do you think?
>

 Same here, no strong opinion.
 Sending errors to 'stderr' looks like the correct thing to do, but changing 
 behavior in
 the LTS may cause some unexpected side effects, if it is scripted and 
 testpmd
 output is parsed etc... For this possibility I would wait for the next LTS.
>>>
>>> So, I guess all agree that backporting to LTS is a bad idea because of
>>> behaviour change.
>>>
>>> As I said in a sub-thread I tend to apply in v21.08 since it is a right
>>> thing to do like a fix, but the fix is not that required to be
>>> backported to change behaviour of LTS releases.
>>>
 And because of same reason perhaps a release note can be added.
>>>
>>> I'll make v2 with release notes added. Good point.
>>>
 Also there is 'TESTPMD_LOG' macro for logs in testpmd, (as well as 
 'RTE_LOG'
 macro), I don't know if we should switch all logs, including 'printf', to
 'TESTPMD_LOG' macro?
 Later stdout/sderr can be managed in rte_log level, instead of any specific
 logic for the testpmd.
>>>
>>> I think fprintf() is a better option for debug tool, since its
>>> messages should not go to syslog etc. It should go to stdout/stderr
>>> regardless of logging configuration and log level settings.
>>>
>>
>> Why should the application not take log configuration and log level settings into
>> account? I think this is a feature we can benefit from.
> 
> For me it sounds like an extra way to shoot yourself in the foot.
> 

Please explain, what are the cons of it?

>> And for not logging to syslog, I think it is DPDK wide concern, not specific 
>> to
>> testpmd, we should have way to say don't log to syslog or only log error to
>> syslog etc.. When it is done, using 'TESTPMD_LOG' enables benefiting from 
>> that.
> 
> Logging configuration should be flexible to support various
> logging backends. IMHO, we don't need the flexibility here.
> testpmd is a command-line test application and errors should
> simply go to stderr. That's it. Since the result is the
> same in both ways, my opinion is not strong, I'm just trying
> to explain why I slightly prefer suggested way.
> 

The ability to make it more or less verbose seems like a good point to me; plain
printf/fprintf doesn't enable it.

And testpmd sometimes used in non-interactive mode for functional testing,
flexible logging can help there too, I think at least.

> I can switch to TESTPMD_LOG() (or define TESTPMD_ERR() and use
> it) easily. I just need maintainers decision on it.
> 
 Even there was a defect for this in the rte_log level, that logs should go 
 to
 stderr: https://bugs.dpdk.org/show_bug.cgi?id=8


>> Acked-by: Xiaoyun Li 
>
>>>
> 



Re: [dpdk-dev] [PATCH] app/testpmd: send failure logs to stderr

2021-06-15 Thread Andrew Rybchenko
On 6/15/21 11:52 AM, Ferruh Yigit wrote:
> On 6/15/2021 9:14 AM, Andrew Rybchenko wrote:
>> On 6/15/21 10:59 AM, Ferruh Yigit wrote:
>>> On 6/14/2021 5:56 PM, Andrew Rybchenko wrote:
 On 6/11/21 1:35 PM, Ferruh Yigit wrote:
> On 6/11/2021 10:19 AM, Andrew Rybchenko wrote:
>> On 6/11/21 5:06 AM, Li, Xiaoyun wrote:
>>> Hi
>>> -Original Message-
>>> From: Andrew Rybchenko 
>>> Sent: Friday, May 28, 2021 00:25
>>> To: Li, Xiaoyun 
>>> Cc: dev@dpdk.org
>>> Subject: [PATCH] app/testpmd: send failure logs to stderr
>>>
>>> Running with stdout suppressed or redirected for further processing is 
>>> very
>>> confusing in the case of errors.
>>>
>>> Signed-off-by: Andrew Rybchenko 
>>> ---
>>>
>>> This patch looks good to me.
>>> But what do you think about make it as a fix and backport to stable 
>>> branches?
>>> Anyway works for me.
>> I have no strong opinion on the topic.
>>
>> @Ferruh, what do you think?
>>
> Same here, no strong opinion.
> Sending errors to 'stderr' looks like the correct thing to do, but changing 
> behavior in
> the LTS may cause some unexpected side effects, if it is scripted and 
> testpmd
> output is parsed etc... For this possibility I would wait for the next 
> LTS.
 So, I guess all agree that backporting to LTS is a bad idea because of
 behaviour change.

 As I said in a sub-thread I tend to apply in v21.08 since it is a right
 thing to do like a fix, but the fix is not that required to be
 backported to change behaviour of LTS releases.

> And because of same reason perhaps a release note can be added.
 I'll make v2 with release notes added. Good point.

> Also there is 'TESTPMD_LOG' macro for logs in testpmd, (as well as 
> 'RTE_LOG'
> macro), I don't know if we should switch all logs, including 'printf', to
> 'TESTPMD_LOG' macro?
> Later stdout/sderr can be managed in rte_log level, instead of any 
> specific
> logic for the testpmd.
 I think fprintf() is a better option for debug tool, since its
 messages should not go to syslog etc. It should go to stdout/stderr
 regardless of logging configuration and log level settings.

>>> Why should the application not take log configuration and log level settings 
>>> into
>>> account? I think this is a feature we can benefit from.
>> For me it sounds like an extra way to shoot yourself in the foot.
>>
> Please explain, what are the cons of it?

The possibility to silence error logs for test tools.

Maybe it is just my paranoia.

>>> And for not logging to syslog, I think it is DPDK wide concern, not 
>>> specific to
>>> testpmd, we should have way to say don't log to syslog or only log error to
>>> syslog etc.. When it is done, using 'TESTPMD_LOG' enables benefiting from 
>>> that.
>> Logging configuration should be flexible to support various
>> logging backends. IMHO, we don't need the flexibility here.
>> testpmd is a command-line test application and errors should
>> simply go to stderr. That's it. Since the result is the
>> same in both ways, my opinion is not strong, I'm just trying
>> to explain why I slightly prefer suggested way.
>>
> The ability to make it more or less verbose seems like a good point to me; plain
> printf/fprintf doesn't enable it.

Yes, for helper tracing and extra information logging. IMHO, no for
error logs.

So, the strong argument here is to use uniform logging in the
code everywhere and hope that if somebody disables/redirects
errors it is really intended.

But in this case all printf's should be converted as well.

> And testpmd sometimes used in non-interactive mode for functional testing,
> flexible logging can help there too, I think at least.

In this case stdout/stderr are simply intercepted and processed.
Yes, it could be a bit easier if we redirect it to an interface which
natively provides message boundaries - a bit easier to parse/match.

>> I can switch to TESTPMD_LOG() (or define TESTPMD_ERR() and use
>> it) easily. I just need maintainers decision on it.
>>
> Even there was a defect for this in the rte_log level, that logs should 
> go to
> stderr: https://bugs.dpdk.org/show_bug.cgi?id=8
>
>
>>> Acked-by: Xiaoyun Li 



[dpdk-dev] [RFC PATCH v2 0/3] Add PIE support for HQoS library

2021-06-15 Thread Liguzinski, WojciechX
DPDK sched library is equipped with a mechanism that protects it from the
bufferbloat problem, which is a situation when excess buffers in the network
cause high latency and latency variation. Currently, it supports RED for
active queue management (which is designed to control the queue length but
does not control latency directly and is now becoming obsolete). However,
more advanced queue management is required to address this problem and
provide the desired quality of service to users.

This solution (RFC) proposes the use of a new algorithm called "PIE"
(Proportional Integral controller Enhanced) that can effectively and
directly control queuing latency to address the bufferbloat problem.

The implementation of the mentioned functionality includes modifying
existing data structures, adding a new set of data structures to the
library, and adding PIE-related APIs. This affects structures in the
public API/ABI. That is why a deprecation notice is going to be prepared
and sent.
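
For background, the core of PIE (RFC 8033) is a periodic drop-probability
update driven by the estimated queuing delay. A simplified sketch of that
control step (illustrative only; it omits the RFC's auto-tuning and burst
allowance and is not the proposed library code):

        /* drop_prob and qdelay_old persist between calls; invoked every
         * dp_update_interval.
         */
        static void
        pie_drop_prob_update(double qdelay, double qdelay_ref,
                             double alpha, double beta,
                             double *drop_prob, double *qdelay_old)
        {
                double p = alpha * (qdelay - qdelay_ref) +
                           beta * (qdelay - *qdelay_old);

                *drop_prob += p;
                if (*drop_prob < 0.0)
                        *drop_prob = 0.0;
                else if (*drop_prob > 1.0)
                        *drop_prob = 1.0;

                *qdelay_old = qdelay;
        }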

Liguzinski, WojciechX (3):
  sched: add PIE based congestion management
  example/qos_sched: add PIE support
  example/ip_pipeline: add PIE support

 config/rte_config.h  |   1 -
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 examples/ip_pipeline/tmgr.c  |   6 +-
 examples/qos_sched/app_thread.c  |   1 -
 examples/qos_sched/cfg_file.c|  82 -
 examples/qos_sched/init.c|   7 +-
 examples/qos_sched/profile.cfg   | 196 
 lib/sched/meson.build|  10 +-
 lib/sched/rte_pie.c  |  78 +
 lib/sched/rte_pie.h  | 389 +++
 lib/sched/rte_sched.c| 229 +
 lib/sched/rte_sched.h|  53 ++-
 12 files changed, 877 insertions(+), 181 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

-- 
2.17.1



[dpdk-dev] [RFC PATCH v2 1/3] sched: add PIE based congestion management

2021-06-15 Thread Liguzinski, WojciechX
Implement PIE-based congestion management as described in RFC 8033.

Signed-off-by: Liguzinski, WojciechX 
---
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 lib/sched/meson.build|  10 +-
 lib/sched/rte_pie.c  |  78 +
 lib/sched/rte_pie.h  | 389 +++
 lib/sched/rte_sched.c| 229 +
 lib/sched/rte_sched.h|  53 ++-
 6 files changed, 674 insertions(+), 91 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

diff --git a/drivers/net/softnic/rte_eth_softnic_tm.c 
b/drivers/net/softnic/rte_eth_softnic_tm.c
index 90baba15ce..5b6c4e6d4b 100644
--- a/drivers/net/softnic/rte_eth_softnic_tm.c
+++ b/drivers/net/softnic/rte_eth_softnic_tm.c
@@ -420,7 +420,7 @@ pmd_tm_node_type_get(struct rte_eth_dev *dev,
return 0;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_AQM
 #define WRED_SUPPORTED 1
 #else
 #define WRED_SUPPORTED 0
@@ -2306,7 +2306,7 @@ tm_tc_wred_profile_get(struct rte_eth_dev *dev, uint32_t 
tc_id)
return NULL;
 }
 
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_AQM
 
 static void
 wred_profiles_set(struct rte_eth_dev *dev, uint32_t subport_id)
@@ -2321,7 +2321,7 @@ wred_profiles_set(struct rte_eth_dev *dev, uint32_t 
subport_id)
for (tc_id = 0; tc_id < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; tc_id++)
for (color = RTE_COLOR_GREEN; color < RTE_COLORS; color++) {
struct rte_red_params *dst =
-   &pp->red_params[tc_id][color];
+   &pp->wred_params[tc_id][color];
struct tm_wred_profile *src_wp =
tm_tc_wred_profile_get(dev, tc_id);
struct rte_tm_red_params *src =
diff --git a/lib/sched/meson.build b/lib/sched/meson.build
index b24f7b8775..e7ae9bcf19 100644
--- a/lib/sched/meson.build
+++ b/lib/sched/meson.build
@@ -1,11 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c')
-headers = files(
-'rte_approx.h',
-'rte_red.h',
-'rte_sched.h',
-'rte_sched_common.h',
-)
+sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c', 'rte_pie.c')
+headers = files('rte_sched.h', 'rte_sched_common.h',
+   'rte_red.h', 'rte_approx.h', 'rte_pie.h')
 deps += ['mbuf', 'meter']
diff --git a/lib/sched/rte_pie.c b/lib/sched/rte_pie.c
new file mode 100644
index 00..f538dda21d
--- /dev/null
+++ b/lib/sched/rte_pie.c
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include 
+
+#include "rte_pie.h"
+#include 
+#include 
+
+#ifdef __INTEL_COMPILER
+#pragma warning(disable:2259) /* conversion may lose significant bits */
+#endif
+
+int
+rte_pie_rt_data_init(struct rte_pie *pie)
+{
+   if (pie == NULL)
+   return -1;
+
+   pie->active = 0;
+   pie->in_measurement = 0;
+   pie->departed_bytes_count = 0;
+   pie->start_measurement = 0;
+   pie->last_measurement = 0;
+   pie->qlen = 0;
+   pie->avg_dq_time = 0;
+   pie->burst_allowance = 0;
+   pie->qdelay_old = 0;
+   pie->drop_prob = 0;
+   pie->accu_prob = 0;
+
+   return 0;
+}
+
+int
+rte_pie_config_init(struct rte_pie_config *pie_cfg,
+   const uint16_t qdelay_ref,
+   const uint16_t dp_update_interval,
+   const uint16_t max_burst,
+   const uint16_t tailq_th)
+{
+   uint64_t tsc_hz = rte_get_tsc_hz();
+
+   if (pie_cfg == NULL)
+   return -1;
+
+   if (qdelay_ref <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for qdelay_ref\n", __func__);
+   return -EINVAL;
+   }
+
+   if (dp_update_interval <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for dp_update_interval\n", 
__func__);
+   return -EINVAL;
+   }
+
+   if (max_burst <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for max_burst\n", __func__);
+   return -EINVAL;
+   }
+
+   if (tailq_th <= 0) {
+   RTE_LOG(ERR, SCHED,
+   "%s: Incorrect value for tailq_th\n", __func__);
+   return -EINVAL;
+   }
+
+   pie_cfg->qdelay_ref = (tsc_hz * qdelay_ref) / 1000;
+   pie_cfg->dp_update_interval = (tsc_hz * dp_update_interval) / 1000;
+   pie_cfg->max_burst = (tsc_hz * max_burst) / 1000;
+   pie_cfg->tailq_th = tailq_th;
+
+   return 0;
+}
diff --git a/lib/sched/rte_pie.h b/lib/sched/rte_pie.h
new file mode 100644
index 00..a0059aad04
--- /dev/null
+++ b/lib/sched/rte_pie.h
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020

[dpdk-dev] [RFC PATCH v2 2/3] example/qos_sched: add PIE support

2021-06-15 Thread Liguzinski, WojciechX
This patch adds support for enabling PIE or RED by
parsing the config file.

Signed-off-by: Liguzinski, WojciechX 
---
 config/rte_config.h |   1 -
 examples/qos_sched/app_thread.c |   1 -
 examples/qos_sched/cfg_file.c   |  82 ++---
 examples/qos_sched/init.c   |   7 +-
 examples/qos_sched/profile.cfg  | 196 +---
 5 files changed, 200 insertions(+), 87 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index 590903c07d..48132f27df 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -89,7 +89,6 @@
 #define RTE_MAX_LCORE_FREQS 64
 
 /* rte_sched defines */
-#undef RTE_SCHED_RED
 #undef RTE_SCHED_COLLECT_STATS
 #undef RTE_SCHED_SUBPORT_TC_OV
 #define RTE_SCHED_PORT_N_GRINDERS 8
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index dbc878b553..895c0d3592 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -205,7 +205,6 @@ app_worker_thread(struct thread_conf **confs)
if (likely(nb_pkt)) {
int nb_sent = rte_sched_port_enqueue(conf->sched_port, 
mbufs,
nb_pkt);
-
APP_STATS_ADD(conf->stat.nb_drop, nb_pkt - nb_sent);
APP_STATS_ADD(conf->stat.nb_rx, nb_pkt);
}
diff --git a/examples/qos_sched/cfg_file.c b/examples/qos_sched/cfg_file.c
index cd167bd8e6..657763ca90 100644
--- a/examples/qos_sched/cfg_file.c
+++ b/examples/qos_sched/cfg_file.c
@@ -242,20 +242,20 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
memset(active_queues, 0, sizeof(active_queues));
n_active_queues = 0;
 
-#ifdef RTE_SCHED_RED
-   char sec_name[CFG_NAME_LEN];
-   struct rte_red_params 
red_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE][RTE_COLORS];
+#ifdef RTE_SCHED_AQM
+   enum rte_sched_aqm_mode aqm_mode;
 
-   snprintf(sec_name, sizeof(sec_name), "red");
+   struct rte_red_params 
red_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE][RTE_COLORS];
 
-   if (rte_cfgfile_has_section(cfg, sec_name)) {
+   if (rte_cfgfile_has_section(cfg, "red")) {
+   aqm_mode = RTE_SCHED_AQM_WRED;
 
for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
char str[32];
 
/* Parse WRED min thresholds */
snprintf(str, sizeof(str), "tc %d wred min", i);
-   entry = rte_cfgfile_get_entry(cfg, sec_name, str);
+   entry = rte_cfgfile_get_entry(cfg, "red", str);
if (entry) {
char *next;
/* for each packet colour (green, yellow, red) 
*/
@@ -315,7 +315,42 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
}
}
}
-#endif /* RTE_SCHED_RED */
+
+   struct rte_pie_params pie_params[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
+
+   if (rte_cfgfile_has_section(cfg, "pie")) {
+   aqm_mode = RTE_SCHED_AQM_PIE;
+
+   for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
+   char str[32];
+
+   /* Parse Queue Delay Ref value */
+   snprintf(str, sizeof(str), "tc %d qdelay ref", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].qdelay_ref = (uint16_t) 
atoi(entry);
+
+   /* Parse Max Burst value */
+   snprintf(str, sizeof(str), "tc %d max burst", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].max_burst = (uint16_t) 
atoi(entry);
+
+   /* Parse Update Interval Value */
+   snprintf(str, sizeof(str), "tc %d update interval", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].dp_update_interval = (uint16_t) 
atoi(entry);
+
+   /* Parse Tailq Threshold Value */
+   snprintf(str, sizeof(str), "tc %d tailq th", i);
+   entry = rte_cfgfile_get_entry(cfg, "pie", str);
+   if (entry)
+   pie_params[i].tailq_th = (uint16_t) atoi(entry);
+
+   }
+   }
+#endif /* RTE_SCHED_AQM */
 
for (i = 0; i < MAX_SCHED_SUBPORTS; i++) {
char sec_name[CFG_NAME_LEN];
@@ -393,17 +428,30 @@ cfg_load_subport(struct rte_cfgfile *cfg, struct 
rte_sched_subport_params *subpo
}
}
}
-#ifdef RTE_SCHED_RED
+#ifdef RTE_SCHED_AQM
+ 
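
An illustrative profile.cfg fragment matching the keys parsed above; the
values are only examples (delays in milliseconds, tail queue threshold in
packets), shown for traffic class 0:

        ; illustrative only - selects PIE instead of RED
        [pie]
        tc 0 qdelay ref = 15
        tc 0 max burst = 150
        tc 0 update interval = 15
        tc 0 tailq th = 64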

[dpdk-dev] [RFC PATCH v2 3/3] example/ip_pipeline: add PIE support

2021-06-15 Thread Liguzinski, WojciechX
Adding the PIE support for IP Pipeline

Signed-off-by: Liguzinski, WojciechX 
---
 examples/ip_pipeline/tmgr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/ip_pipeline/tmgr.c b/examples/ip_pipeline/tmgr.c
index e4e364cbc0..73da2da870 100644
--- a/examples/ip_pipeline/tmgr.c
+++ b/examples/ip_pipeline/tmgr.c
@@ -25,8 +25,8 @@ static const struct rte_sched_subport_params 
subport_params_default = {
.pipe_profiles = pipe_profile,
.n_pipe_profiles = 0, /* filled at run time */
.n_max_pipe_profiles = RTE_DIM(pipe_profile),
-#ifdef RTE_SCHED_RED
-.red_params = {
+#ifdef RTE_SCHED_AQM
+.wred_params = {
/* Traffic Class 0 Colors Green / Yellow / Red */
[0][0] = {.min_th = 48, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
[0][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
@@ -92,7 +92,7 @@ static const struct rte_sched_subport_params 
subport_params_default = {
[12][1] = {.min_th = 40, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
[12][2] = {.min_th = 32, .max_th = 64, .maxp_inv = 10, .wq_log2 = 9},
},
-#endif /* RTE_SCHED_RED */
+#endif /* RTE_SCHED_AQM */
 };
 
 static struct tmgr_port_list tmgr_port_list;
-- 
2.17.1



Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Thomas Monjalon
15/06/2021 10:00, Jerin Jacob:
> On Tue, Jun 15, 2021 at 12:22 PM Thomas Monjalon  wrote:
> > 14/06/2021 17:48, Jerin Jacob:
> > > On Mon, Jun 14, 2021 at 8:29 PM Ananyev, Konstantin
> > >  wrote:
> > > > > 14/06/2021 15:15, Bruce Richardson:
> > > > > > While I dislike the hard-coded limits in DPDK, I'm also not 
> > > > > > convinced that
> > > > > > we should switch away from the flat arrays or that we need fully 
> > > > > > dynamic
> > > > > > arrays that grow/shrink at runtime for ethdevs. I would suggest a 
> > > > > > half-way
> > > > > > house here, where we keep the ethdevs as an array, but one 
> > > > > > allocated/sized
> > > > > > at runtime rather than statically. This would allow us to have a
> > > > > > compile-time default value, but, for use cases that need it, allow 
> > > > > > use of a
> > > > > > flag e.g.  "max-ethdevs" to change the size of the parameter given 
> > > > > > to the
> > > > > > malloc call for the array.  This max limit could then be provided 
> > > > > > to apps
> > > > > > too if they want to match any array sizes. [Alternatively those 
> > > > > > apps could
> > > > > > check the provided size and error out if the size has been 
> > > > > > increased beyond
> > > > > > what the app is designed to use?]. There would be no extra 
> > > > > > dereferences per
> > > > > > rx/tx burst call in this scenario so performance should be the same 
> > > > > > as
> > > > > > before (potentially better if array is in hugepage memory, I 
> > > > > > suppose).
> > > > >
> > > > > I think we need some benchmarks to decide what is the best tradeoff.
> > > > > I spent time on this implementation, but sorry I won't have time for 
> > > > > benchmarks.
> > > > > Volunteers?
> > > >
> > > > I had only a quick look at your approach so far.
> > > > But from what I can read, in MT environment your suggestion will require
> > > > extra synchronization for each read-write access to such parray element 
> > > > (lock, rcu, ...).
> > > > I think what Bruce suggests will be much ligther, easier to implement 
> > > > and less error prone.
> > > > At least for rte_ethdevs[] and friends.
> > >
> > > +1
> >
> > Please could you have a deeper look and tell me why we need more locks?
> 
> We don't need more locks (It is fat mutex) now in the implementation.
> 
> If it needs to be used in the fast path, we need more state-of-the-art
> synchronization like RCU.
> 
> Also, you can take look at VPP dynamic array implementation which is
> used in fastpath.
> 
> https://docs.fd.io/vpp/21.10/db/d65/vec_8h.html
> 
> So the question is the use case for this API. Is it for slowpath item
> like ethdev[] memory
> or fastpath items like holding an array of mbuf etc.

As I replied to Morten, it is for read in fast path
and alloc/free in slow path.
I should highlight this in the commit log if there is a v2.
That's why there is a mutex in alloc/free and nothing in read access.

> > The element pointers doesn't change.
> > Only the array pointer change at resize,
> > but the old one is still usable until the next resize.
> > I think we don't need more.
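
To make that read/resize contract concrete, a minimal sketch of the pattern
being described, keeping the previous array alive until the next resize so
lock-less readers stay valid (illustrative layout, not the proposed
rte_parray API; memory-ordering details omitted):

        #include <stdlib.h>
        #include <string.h>

        struct parray {
                void **cur;             /* current array of element pointers */
                void **old;             /* previous array, freed at the *next* resize */
                unsigned int len;
        };

        /* Fast path: one snapshot of the array pointer, no lock. Even if a
         * resize happens concurrently, the snapshot stays valid because the
         * old array is not freed until the following resize.
         */
        static void *
        parray_read(const struct parray *pa, unsigned int idx)
        {
                void **snap = pa->cur;

                return snap[idx];
        }

        /* Slow path, assumed to run under the library mutex. */
        static int
        parray_resize(struct parray *pa, unsigned int new_len)
        {
                void **tmp = calloc(new_len, sizeof(*tmp));

                if (tmp == NULL)
                        return -1;
                memcpy(tmp, pa->cur, pa->len * sizeof(*tmp));
                free(pa->old);          /* old-old array is no longer referenced */
                pa->old = pa->cur;
                pa->cur = tmp;
                pa->len = new_len;
                return 0;
        }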





Re: [dpdk-dev] [PATCH v2 6/6] vhost: convert inflight data to DPDK allocation API

2021-06-15 Thread David Marchand
On Tue, Jun 15, 2021 at 10:43 AM Maxime Coquelin
 wrote:
> @@ -559,6 +559,31 @@ numa_realloc(struct virtio_net *dev, int index)
> vq->log_cache = lc;
> }
>
> +   if (vq->resubmit_inflight) {
> +   struct rte_vhost_resubmit_info *ri;
> +
> +   ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 
> 0, node);
> +   if (!ri) {
> +   VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit 
> inflight on node %d\n",
> +   node);
> +   return dev;
> +   }
> +   vq->resubmit_inflight = ri;
> +
> +   if (vq->resubmit_inflight) {

Quick first pass, I'll review more thoroughly the whole series later.

I suppose you want to test ri->resubmit_list != NULL (else, this test
is unnecessary since we made sure ri != NULL earlier).

> +   struct rte_vhost_resubmit_desc *rd;
> +
> +   rd = rte_realloc_socket(ri->resubmit_list, 
> sizeof(*rd) * ri->resubmit_num,
> +   0, node);
> +   if (!ri) {
> +   VHOST_LOG_CONFIG(ERR, "Failed to realloc 
> resubmit list on node %d\n",
> +   node);
> +   return dev;
> +   }
> +   ri->resubmit_list = rd;
> +   }
> +   }
> +
> vq->numa_node = node;



-- 
David Marchand
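
For clarity, a corrected sketch of that hunk along the lines suggested above:
test ri->resubmit_list before reallocating it, and check the right pointer
(rd, not ri) after the inner realloc. This is only a sketch of the fix, not
the patch that will be applied.

        if (vq->resubmit_inflight) {
                struct rte_vhost_resubmit_info *ri;

                ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 0, node);
                if (!ri) {
                        VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit inflight on node %d\n",
                                        node);
                        return dev;
                }
                vq->resubmit_inflight = ri;

                if (ri->resubmit_list) {
                        struct rte_vhost_resubmit_desc *rd;

                        rd = rte_realloc_socket(ri->resubmit_list,
                                        sizeof(*rd) * ri->resubmit_num, 0, node);
                        if (!rd) {
                                VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit list on node %d\n",
                                                node);
                                return dev;
                        }
                        ri->resubmit_list = rd;
                }
        }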



Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Thomas Monjalon
15/06/2021 10:44, Bruce Richardson:
> On Tue, Jun 15, 2021 at 09:53:33AM +0200, Morten Brørup wrote:
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > > Sent: Tuesday, 15 June 2021 08.48
> > > 
> > > 14/06/2021 17:48, Morten Brørup:
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas
> > > Monjalon
> > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > > something big enough to hold a sufficiently large array. And possibly
> > > add an rte_max_ethports variable to indicate the number of populated
> > > entries in the array, for use when iterating over the array.
> > > >
> > > > Can we come up with another example than RTE_MAX_ETHPORTS where this
> > > library provides a better benefit?
> > > 
> > > What is big enough?
> > > Is 640KB enough for RAM? ;)
> > 
> > Good point!
> > 
> > I think we agree that:
> > - The cost of this library is some added complexity, i.e. working with a 
> > dynamically sized array through a library instead of just indexing into a 
> > compile time fixed size array.
> > - The main benefit of this library is saving some RAM (and still allowing a 
> > potentially very high number of ports.)
> > 
> > My point was: The amount of RAM we are saving is a key parameter for the 
> > cost/benefit analysis. And since I don't think the rte_eth_devices[] array 
> > uses a significant amount of memory, I was asking for some other array 
> > using more memory, where the cost/benefit analysis would come out more 
> > advantageous to your proposed parray library.
> > 
> > > 
> > > When dealing with microservices switching, the numbers can increase
> > > very fast.
> > 
> > Yes, I strongly supported increasing the port_id type from 8 to 16 bits for 
> > this reason, when it was discussed at the DPDK Userspace a few years ago in 
> > Dublin. And with large RTE_MAX_QUEUES_PER_PORT values, the rte_eth_dev 
> > structure uses quite a lot of space for the rx/tx callback arrays. But the 
> > memory usage of rte_eth_devices[] is still relatively insignificant in a 
> > system wide context.
> > 
> > If main purpose is to optimize the rte_eth_devices[] array, I think there 
> > are better alternatives than this library. Bruce and Konstantin already 
> > threw a few ideas on the table.
> >
> 
> Yes, though I think we need to be clear on what problems we are trying to
> solve here. A generic resizable array may be a useful library for DPDK in
> its own right, but for the ethdev (and other devs) arrays I think my
> understanding of the problem is that we want:
> 
> * scalability of ethdevs list to large numbers of ports, e.g. 2k
> * while not paying a large memory footprint penalty for those apps which
>   only need a small number of ports, e.g. 2 or 4.
> 
> Is that a fair summary?

Yes.

We must take into account two related issues:
- the app and libs could allocate some data per device,
increasing the bill.
- per-device allocation may be more efficient
if allocated on the NUMA node of the device
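
For illustration, per-device data can already be placed on the device's NUMA
node with the existing allocation helpers (the application structure below is
an assumed example):

	/* assumed application-private per-port data */
	struct app_port_data {
		uint64_t rx_pkts;
		/* ... */
	} *pd;

	pd = rte_zmalloc_socket("app_port_data", sizeof(*pd),
			RTE_CACHE_LINE_SIZE,
			rte_eth_dev_socket_id(port_id));
	if (pd == NULL)
		rte_exit(EXIT_FAILURE,
			 "cannot allocate port data on the port's NUMA node\n");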




Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Ananyev, Konstantin


> 14/06/2021 17:48, Jerin Jacob:
> > On Mon, Jun 14, 2021 at 8:29 PM Ananyev, Konstantin
> >  wrote:
> > > > 14/06/2021 15:15, Bruce Richardson:
> > > > > On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> > > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas 
> > > > > > > Monjalon
> > > > > > > Sent: Monday, 14 June 2021 12.59
> > > > > > >
> > > > > > > Performance of access in a fixed-size array is very good
> > > > > > > because of cache locality
> > > > > > > and because there is a single pointer to dereference.
> > > > > > > The only drawback is the lack of flexibility:
> > > > > > > the size of such an array cannot be increased at runtime.
> > > > > > >
> > > > > > > An approach to this problem is to allocate the array at runtime,
> > > > > > > being as efficient as static arrays, but still limited to a 
> > > > > > > maximum.
> > > > > > >
> > > > > > > That's why the API rte_parray is introduced,
> > > > > > > allowing one to declare an array of pointers which can be resized
> > > > > > > dynamically
> > > > > > > and automatically at runtime while keeping a good read 
> > > > > > > performance.
> > > > > > >
> > > > > > > After resize, the previous array is kept until the next resize
> > > > > > > to avoid crashes during a read without any lock.
> > > > > > >
> > > > > > > Each element is a pointer to a memory chunk dynamically allocated.
> > > > > > > This is not good for cache locality but it allows keeping the same
> > > > > > > memory per element, no matter how the array is resized.
> > > > > > > Cache locality could be improved with mempools.
> > > > > > > The other drawback is having to dereference one more pointer
> > > > > > > to read an element.
> > > > > > >
> > > > > > > There are not many locks, so the API is for internal use only.
> > > > > > > This API may be used to completely remove some compilation-time
> > > > > > > maximums.
> > > > > >
> > > > > > I get the purpose and overall intention of this library.
> > > > > >
> > > > > > I probably already mentioned that I prefer "embedded style
> > > > > > programming" with fixed size arrays, rather than runtime
> > > > > > configurability. It's my personal opinion, and the DPDK Tech Board
> > > > > > clearly prefers reducing the amount of compile time configurability,
> > > > > > so there is no way for me to stop this progress, and I do not intend
> > > > > > to oppose to this library. :-)
> > > > > >
> > > > > > This library is likely to become a core library of DPDK, so I think
> > > > > > it is important getting it right. Could you please mention a few
> > > > > > examples where you think this internal library should be used, and
> > > > > > where it should not be used. Then it is easier to discuss if the
> > > > > > border line between control path and data plane is correct. E.g.
> > > > > > this library is not intended to be used for dynamically sized packet
> > > > > > queues that grow and shrink in the fast path.
> > > > > >
> > > > > > If the library becomes a core DPDK library, it should probably be
> > > > > > public instead of internal. E.g. if the library is used to make
> > > > > > RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some
> > > > > > applications might also need dynamically sized arrays for their
> > > > > > application specific per-port runtime data, and this library could
> > > > > > serve that purpose too.
> > > > > >
> > > > >
> > > > > Thanks Thomas for starting this discussion and Morten for follow-up.
> > > > >
> > > > > My thinking is as follows, and I'm particularly keeping in mind the 
> > > > > cases
> > > > > of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> > > > >
> > > > > While I dislike the hard-coded limits in DPDK, I'm also not convinced 
> > > > > that
> > > > > we should switch away from the flat arrays or that we need fully 
> > > > > dynamic
> > > > > arrays that grow/shrink at runtime for ethdevs. I would suggest a 
> > > > > half-way
> > > > > house here, where we keep the ethdevs as an array, but one 
> > > > > allocated/sized
> > > > > at runtime rather than statically. This would allow us to have a
> > > > > compile-time default value, but, for use cases that need it, allow 
> > > > > use of a
> > > > > flag e.g.  "max-ethdevs" to change the size of the parameter given to 
> > > > > the
> > > > > malloc call for the array.  This max limit could then be provided to 
> > > > > apps
> > > > > too if they want to match any array sizes. [Alternatively those apps 
> > > > > could
> > > > > check the provided size and error out if the size has been increased 
> > > > > beyond
> > > > > what the app is designed to use?]. There would be no extra 
> > > > > dereferences per
> > > > > rx/tx burst call in this scenario so performance should be the same as
> > > > > before (potentially better if array is in hugepage memory, I suppose).
> > > >
> > > > I think we need some benchmarks to decide what is the best tradeoff.
> > > > I spent time on this implementation, but sorry I won't h
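
Bruce's "half-way house" above could look roughly like this (names assumed,
not an actual patch): keep a flat array, but size it once at EAL init time
from a runtime parameter instead of the compile-time constant.

	/* sketch only: a runtime-sized flat ethdev array */
	static struct rte_eth_dev *rte_eth_devices; /* no longer a fixed-size array */
	static uint16_t max_ethports;               /* e.g. from a --max-ethdevs option */

	static int
	eth_dev_data_alloc(uint16_t requested_max)
	{
		max_ethports = requested_max ? requested_max : RTE_MAX_ETHPORTS;
		rte_eth_devices = rte_zmalloc("rte_eth_devices",
				sizeof(*rte_eth_devices) * max_ethports,
				RTE_CACHE_LINE_SIZE);
		return rte_eth_devices == NULL ? -ENOMEM : 0;
	}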

Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Thomas Monjalon
15/06/2021 11:33, Ananyev, Konstantin:
> > 14/06/2021 17:48, Jerin Jacob:
> > > On Mon, Jun 14, 2021 at 8:29 PM Ananyev, Konstantin
> > >  wrote:
> > > > I had only a quick look at your approach so far.
> > > > But from what I can read, in MT environment your suggestion will require
> > > > extra synchronization for each read-write access to such parray element 
> > > > (lock, rcu, ...).
> > > > I think what Bruce suggests will be much lighter, easier to implement 
> > > > and less error prone.
> > > > At least for rte_ethdevs[] and friends.
> > >
> > > +1
> > 
> > Please could you have a deeper look and tell me why we need more locks?
> > The element pointers don't change.
> > Only the array pointer changes at resize,
> 
> Yes, array pointer changes at resize, and reader has to read that value
> to access elements in the parray. Which means that we need some sync
> between readers and updaters to avoid reader using stale pointer 
> (ref-counter, rcu, etc.).

No
The old array is still there, so we don't need sync.

> I.E. updater can free old array pointer *only* when it can guarantee that 
> there are no
> readers that still use it.

No
Reading an element is OK because the pointer to the element is not changed.
Getting the pointer to an element from the index is the only thing
which is blocking the freeing of an array,
and I see no reason why dereferencing an index would be longer
than 2 consecutive resizes of the array.

> > but the old one is still usable until the next resize.
> 
> Ok, but what is the guarantee that reader would *always* finish till next 
> resize?
> As an example of such race condition:
> 
> /* global one */
>   struct rte_parray pa;
> 
> /* thread #1, tries to read elem from the array */ 
>   
>   int **x = pa->array;

We should not save the array pointer.
Each index must be dereferenced with the macro
getting the current array pointer.
So the interrupt is during dereference of a single index.
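
For illustration, such an accessor (hypothetical, not the actual patch code)
re-reads the array pointer on every access, so the caller never keeps a copy:

	/* hypothetical accessor: the array pointer is reloaded on each use */
	#define PARRAY_GET(pa, idx) \
		(__atomic_load_n(&(pa)->array, __ATOMIC_ACQUIRE)[(idx)])

	/* reader thread: no saved copy of pa.array */
	int *p = PARRAY_GET(&pa, 0);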

> /* thread # 1 get suspended for a while  at that point */
> 
> /* meanwhile thread #2 does: */
>   
>   /* causes first resize(), x still valid, points to pa->old_array */ 
>   rte_parray_alloc(&pa, ...); 
>   .
>   /* causes second resize(), x now points to freed memory */
>   rte_parray_alloc(&pa, ...);
>   ...

2 resizes is a very long time, it is at minimum 33 allocations!

> /* at that point thread #1 resumes: */
> 
>   /* contents of x[0] are undefined, 'p' could point anywhere,
>might cause segfault or silent memory corruption */  
>   int *p = x[0];
> 
> 
> Yes probability of such situation is quite small.
> But it is still possible.

In device probing, I don't see how it is realistically possible:
33 device allocations during 1 device index being dereferenced.
I agree it is tricky, but that's the whole point of finding tricks
to keep fast code.

> > I think we don't need more.





Re: [dpdk-dev] [PATCH] app/testpmd: send failure logs to stderr

2021-06-15 Thread Ferruh Yigit
On 6/15/2021 10:00 AM, Andrew Rybchenko wrote:
> On 6/15/21 11:52 AM, Ferruh Yigit wrote:
>> On 6/15/2021 9:14 AM, Andrew Rybchenko wrote:
>>> On 6/15/21 10:59 AM, Ferruh Yigit wrote:
 On 6/14/2021 5:56 PM, Andrew Rybchenko wrote:
> On 6/11/21 1:35 PM, Ferruh Yigit wrote:
>> On 6/11/2021 10:19 AM, Andrew Rybchenko wrote:
>>> On 6/11/21 5:06 AM, Li, Xiaoyun wrote:
 Hi
 -Original Message-
 From: Andrew Rybchenko 
 Sent: Friday, May 28, 2021 00:25
 To: Li, Xiaoyun 
 Cc: dev@dpdk.org
 Subject: [PATCH] app/testpmd: send failure logs to stderr

 Running with stdout suppressed or redirected for further processing is 
 very
 confusing in the case of errors.

 Signed-off-by: Andrew Rybchenko 
 ---

 This patch looks good to me.
 But what do you think about make it as a fix and backport to stable 
 branches?
 Anyway works for me.
>>> I have no strong opinion on the topic.
>>>
>>> @Ferruh, what do you think?
>>>
>> Same here, no strong opinion.
>> Sending errors to 'stderr' looks correct thing to do, but changing 
>> behavior in
>> the LTS may cause some unexpected side affect, if it is scripted and 
>> testpmd
>> output is parsed etc... For this possibility I would wait for the next 
>> LTS.
> So, I guess all agree that backporting to LTS is a bad idea because of
> behaviour change.
>
> As I said in a sub-thread I tend to apply in v21.08 since it is a right
> thing to do like a fix, but the fix is not that required to be
> backported to change behaviour of LTS releases.
>
>> And because of same reason perhaps a release note can be added.
> I'll make v2 with release notes added. Good point.
>
>> Also there is 'TESTPMD_LOG' macro for logs in testpmd, (as well as 
>> 'RTE_LOG'
>> macro), I don't know if we should switch all logs, including 'printf', to
>> 'TESTPMD_LOG' macro?
>> Later stdout/sderr can be managed in rte_log level, instead of any 
>> specific
>> logic for the testpmd.
> I think fprintf() is a better option for debug tool, since its
> messages should not go to syslog etc. It should go to stdout/stderr
> regardless of logging configuration and log level settings.
>
 Why application should not take log configuration and log level settings 
 into
 account? I think this is a feature we can benefit.
>>> For me it sounds like an extra way to shoot its own leg.
>>>
>> Please explain what is the cons of it?
> 
> Possibility to silent error logs for test tools.
> 

Got your concern, those can do '2> /dev/null' too :)

I was thinking flexibility to increase verbosity and enable debug options, not
for disabling error/warnings.

> May be it is just mine paranoia.
> 
 And for not logging to syslog, I think it is DPDK wide concern, not 
 specific to
 testpmd, we should have way to say don't log to syslog or only log error to
 syslog etc.. When it is done, using 'TESTPMD_LOG' enables benefiting from 
 that.
>>> Logging configuration should be flexible to support various
>>> logging backends. IMHO, we don't need the flexibility here.
>>> testpmd is a command-line test application and errors should
>>> simply go to stderr. That's it. Since the result is the
>>> same in both ways, my opinion is not strong, I'm just trying
>>> to explain why I slightly prefer suggested way.
>>>
>> Ability to make it less or more verbose seems good opinion to me, just
>> printf/fprintf doesn't enable it.
> 
> Yes, for helper tracing and extra information logging. IMHO, no for
> error logs.
> 
> So, the strong argument here is to use uniform logging in the
> code everywhere and hope that if somebody disables/redirects
> errors it is really intended.
> 
> But in this case all printf's should be converted as well.
> 

Yes, I was asking if we should do this. It doesn't need to be in a single big
patch but can be done gradually, and by adding new logs as TESTPMD_LOG.
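
For example, a conversion of that kind would look roughly like this (the
message text is illustrative only):

	/* before: goes to stdout, ignores log configuration */
	printf("Invalid port id %u\n", port_id);

	/* after: honours testpmd's log type and level settings */
	TESTPMD_LOG(ERR, "Invalid port id %u\n", port_id);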

>> And testpmd sometimes used in non-interactive mode for functional testing,
>> flexible logging can help there too, I think at least.
> 
> in this case stdout/stderr are simply intercepted and processed.
> Yes, it could be a bit easier if we redirect it to an interface which
> natively provides messages boundaries - a bit easier to parse/match.
> 
>>> I can switch to TESTPMD_LOG() (or define TESTPMD_ERR() and use
>>> it) easily. I just need maintainers decision on it.
>>>
>> Even there was a defect for this in the rte_log level, that logs should 
>> go to
>> stderr: https://bugs.dpdk.org/show_bug.cgi?id=8
>>
>>
 Acked-by: Xiaoyun Li 
> 



[dpdk-dev] [PATCH] net/ice: fix integer overflow when computing max_pkt_len

2021-06-15 Thread Tudor Cornea
Greetings,

Please review the following patch for the dpdk-next-net-intel branch.

The len variable, used in the computation of max_pkt_len, could overflow
when storing the result of the following computation:

ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len

Since the mbuf size could be defined with a large value (e.g. 13312),
and ICE_SUPPORT_CHAIN_NUM is defined as 5, the computation mentioned
above could potentially result in a value bigger than what a uint16_t
can hold (65535).

The result is that Jumbo Frames will not work properly.
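
For illustration with the numbers above:

	/*
	 * ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len = 5 * 13312 = 66560,
	 * which exceeds UINT16_MAX (65535), so a uint16_t 'len' wraps to
	 * 66560 - 65536 = 1024, and RTE_MIN(len, max_rx_pkt_len) then
	 * silently caps the maximum packet length at 1024 bytes.
	 * Doing the computation in 32 bits, as in the diff below, keeps
	 * the full 66560 value.
	 */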

Signed-off-by: Tudor Cornea 
---
 drivers/net/ice/ice_dcf_ethdev.c | 7 ---
 drivers/net/ice/ice_rxtx.c   | 6 +++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ice/ice_dcf_ethdev.c
b/drivers/net/ice/ice_dcf_ethdev.c
index b937cbb..f73dc80 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -54,13 +54,14 @@ ice_dcf_init_rxq(struct rte_eth_dev *dev, struct
ice_rx_queue *rxq)
struct ice_dcf_adapter *dcf_ad = dev->data->dev_private;
struct rte_eth_dev_data *dev_data = dev->data;
struct iavf_hw *hw = &dcf_ad->real_hw.avf;
-   uint16_t buf_size, max_pkt_len, len;
+   uint16_t buf_size, max_pkt_len;

buf_size = rte_pktmbuf_data_room_size(rxq->mp) -
RTE_PKTMBUF_HEADROOM;
rxq->rx_hdr_len = 0;
rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
-   len = ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len;
-   max_pkt_len = RTE_MIN(len,
dev->data->dev_conf.rxmode.max_rx_pkt_len);
+   max_pkt_len = RTE_MIN((uint32_t)
+ ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,
+ dev->data->dev_conf.rxmode.max_rx_pkt_len);

/* Check if the jumbo frame and maximum packet length are set
 * correctly.
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a..20352b0 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -258,7 +258,7 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
struct rte_eth_dev_data *dev_data = rxq->vsi->adapter->pf.dev_data;
struct ice_rlan_ctx rx_ctx;
enum ice_status err;
-   uint16_t buf_size, len;
+   uint16_t buf_size;
struct rte_eth_rxmode *rxmode = &dev_data->dev_conf.rxmode;
uint32_t rxdid = ICE_RXDID_COMMS_OVS;
uint32_t regval;
@@ -268,8 +268,8 @@ ice_program_hw_rx_queue(struct ice_rx_queue *rxq)
  RTE_PKTMBUF_HEADROOM);
rxq->rx_hdr_len = 0;
rxq->rx_buf_len = RTE_ALIGN(buf_size, (1 << ICE_RLAN_CTX_DBUF_S));
-   len = ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len;
-   rxq->max_pkt_len = RTE_MIN(len,
+   rxq->max_pkt_len = RTE_MIN((uint32_t)
+  ICE_SUPPORT_CHAIN_NUM * rxq->rx_buf_len,

 dev_data->dev_conf.rxmode.max_rx_pkt_len);

if (rxmode->offloads & DEV_RX_OFFLOAD_JUMBO_FRAME) {
-- 
2.7.4


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-15 Thread Ananyev, Konstantin


> 
> 15/06/2021 11:33, Ananyev, Konstantin:
> > > 14/06/2021 17:48, Jerin Jacob:
> > > > On Mon, Jun 14, 2021 at 8:29 PM Ananyev, Konstantin
> > > >  wrote:
> > > > > I had only a quick look at your approach so far.
> > > > > But from what I can read, in MT environment your suggestion will 
> > > > > require
> > > > > extra synchronization for each read-write access to such parray 
> > > > > element (lock, rcu, ...).
> > > > > I think what Bruce suggests will be much lighter, easier to implement 
> > > > > and less error prone.
> > > > > At least for rte_ethdevs[] and friends.
> > > >
> > > > +1
> > >
> > > Please could you have a deeper look and tell me why we need more locks?
> > > The element pointers don't change.
> > > Only the array pointer changes at resize,
> >
> > Yes, array pointer changes at resize, and reader has to read that value
> > to access elements in the parray. Which means that we need some sync
> > between readers and updaters to avoid reader using stale pointer 
> > (ref-counter, rcu, etc.).
> 
> No
> The old array is still there, so we don't need sync.
> 
> > I.E. updater can free old array pointer *only* when it can guarantee that 
> > there are no
> > readers that still use it.
> 
> No
> Reading an element is OK because the pointer to the element is not changed.
> Getting the pointer to an element from the index is the only thing
> which is blocking the freeing of an array,
> and I see no reason why dereferencing an index would be longer
> than 2 consecutive resizes of the array.

In general, your thread can be switched off the CPU at any moment,
and you don't know for sure when it will be scheduled back.

> 
> > > but the old one is still usable until the next resize.
> >
> > Ok, but what is the guarantee that reader would *always* finish till next 
> > resize?
> > As an example of such race condition:
> >
> > /* global one */
> > struct rte_parray pa;
> >
> > /* thread #1, tries to read elem from the array */
> > 
> > int **x = pa->array;
> 
> We should not save the array pointer.
> Each index must be dereferenced with the macro
> getting the current array pointer.
> So the interrupt is during dereference of a single index.

You still need to read your pa->array somewhere (let's say into a register).
Straight after that your thread can be interrupted.
Then when it is scheduled back to the CPU, that value (in a register) might be a
stale one.
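
I.e., something on the updater side along these lines would be needed (a rough
sketch assuming DPDK's rte_rcu_qsbr library; not something the current
proposal does):

	/* qsbr, pa and new_array are assumed to exist in the surrounding code */
	void **retired = pa->old_array;

	pa->old_array = pa->array;
	pa->array = new_array;

	/* block until every registered reader reports a quiescent state */
	rte_rcu_qsbr_synchronize(qsbr, RTE_QSBR_THRID_INVALID);

	free(retired);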

> 
> > /* thread # 1 get suspended for a while  at that point */
> >
> > /* meanwhile thread #2 does: */
> > 
> > /* causes first resize(), x still valid, points to pa->old_array */
> > rte_parray_alloc(&pa, ...);
> > .
> > /* causes second resize(), x now points to freed memory */
> > rte_parray_alloc(&pa, ...);
> > ...
> 
> 2 resizes is a very long time, it is at minimum 33 allocations!
> 
> > /* at that point thread #1 resumes: */
> >
> > /* contents of x[0] are undefined, 'p' could point anywhere,
> >  might cause segfault or silent memory corruption */
> > int *p = x[0];
> >
> >
> > Yes probability of such situation is quite small.
> > But it is still possible.
> 
> In device probing, I don't see how it is realistically possible:
> 33 device allocations during 1 device index being dereferenced.

Yeah, it would work fine 1M times, but sometimes it will crash,
which will make it even harder to reproduce, debug and fix.
I think that when introducing a new generic library into DPDK,
we should avoid making such assumptions.

> I agree it is tricky, but that's the whole point of finding tricks
> to keep fast code.

It is not tricky, it is buggy 😊
You are introducing a race condition into the new core generic library by design,
and trying to convince people that it is *OK*.
Sorry, but NACK from me until that issue is addressed.


> 
> > > I think we don't need more.
> 
> 



[dpdk-dev] [PATCH] app/eventdev: add option to enable per port pool

2021-06-15 Thread pbhagavatula
From: Pavan Nikhilesh 

Add option to configure unique mempool for each ethernet device
port. Can be used with `pipeline_atq` and `pipeline_queue` tests.

Signed-off-by: Pavan Nikhilesh 
---
 app/test-eventdev/evt_common.h   |  1 +
 app/test-eventdev/evt_options.c  |  9 
 app/test-eventdev/evt_options.h  |  1 +
 app/test-eventdev/test_pipeline_common.c | 52 +---
 app/test-eventdev/test_pipeline_common.h |  2 +-
 doc/guides/tools/testeventdev.rst|  8 
 6 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/app/test-eventdev/evt_common.h b/app/test-eventdev/evt_common.h
index 0e228258e7..28afb114b3 100644
--- a/app/test-eventdev/evt_common.h
+++ b/app/test-eventdev/evt_common.h
@@ -55,6 +55,7 @@ struct evt_options {
uint8_t timdev_cnt;
uint8_t nb_timer_adptrs;
uint8_t timdev_use_burst;
+   uint8_t per_port_pool;
uint8_t sched_type_list[EVT_MAX_STAGES];
uint16_t mbuf_sz;
uint16_t wkr_deq_dep;
diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 061b63e12e..bfa3840dbc 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -297,6 +297,12 @@ evt_parse_eth_queues(struct evt_options *opt, const char 
*arg)
return ret;
 }
 
+evt_parse_per_port_pool(struct evt_options *opt, const char *arg __rte_unused)
+{
+   opt->per_port_pool = 1;
+   return 0;
+}
+
 static void
 usage(char *program)
 {
@@ -333,6 +339,7 @@ usage(char *program)
"\t--enable_vector: enable event vectorization.\n"
"\t--vector_size  : Max vector size.\n"
"\t--vector_tmo_ns: Max vector timeout in nanoseconds\n"
+   "\t--per_port_pool: Configure unique pool per ethdev port\n"
);
printf("available tests:\n");
evt_test_dump_names();
@@ -408,6 +415,7 @@ static struct option lgopts[] = {
{ EVT_ENA_VECTOR,  0, 0, 0 },
{ EVT_VECTOR_SZ,   1, 0, 0 },
{ EVT_VECTOR_TMO,  1, 0, 0 },
+   { EVT_PER_PORT_POOL,   0, 0, 0 },
{ EVT_HELP,0, 0, 0 },
{ NULL,0, 0, 0 }
 };
@@ -446,6 +454,7 @@ evt_opts_parse_long(int opt_idx, struct evt_options *opt)
{ EVT_ENA_VECTOR, evt_parse_ena_vector},
{ EVT_VECTOR_SZ, evt_parse_vector_size},
{ EVT_VECTOR_TMO, evt_parse_vector_tmo_ns},
+   { EVT_PER_PORT_POOL, evt_parse_per_port_pool},
};
 
for (i = 0; i < RTE_DIM(parsermap); i++) {
diff --git a/app/test-eventdev/evt_options.h b/app/test-eventdev/evt_options.h
index 1cea2a3e11..6436200b40 100644
--- a/app/test-eventdev/evt_options.h
+++ b/app/test-eventdev/evt_options.h
@@ -46,6 +46,7 @@
 #define EVT_ENA_VECTOR   ("enable_vector")
 #define EVT_VECTOR_SZ("vector_size")
 #define EVT_VECTOR_TMO   ("vector_tmo_ns")
+#define EVT_PER_PORT_POOL   ("per_port_pool")
 #define EVT_HELP ("help")
 
 void evt_options_default(struct evt_options *opt);
diff --git a/app/test-eventdev/test_pipeline_common.c 
b/app/test-eventdev/test_pipeline_common.c
index d5ef90500f..6ee530d4cd 100644
--- a/app/test-eventdev/test_pipeline_common.c
+++ b/app/test-eventdev/test_pipeline_common.c
@@ -259,9 +259,10 @@ pipeline_ethdev_setup(struct evt_test *test, struct 
evt_options *opt)
}
 
for (j = 0; j < opt->eth_queues; j++) {
-   if (rte_eth_rx_queue_setup(i, j, NB_RX_DESC,
-  rte_socket_id(), &rx_conf,
-  t->pool) < 0) {
+   if (rte_eth_rx_queue_setup(
+   i, j, NB_RX_DESC, rte_socket_id(), &rx_conf,
+   opt->per_port_pool ? t->pool[i] :
+ t->pool[0]) < 0) {
evt_err("Failed to setup eth port [%d] 
rx_queue: %d.",
i, 0);
return -EINVAL;
@@ -569,18 +570,35 @@ pipeline_mempool_setup(struct evt_test *test, struct 
evt_options *opt)
if (data_size  > opt->mbuf_sz)
opt->mbuf_sz = data_size;
}
+   if (opt->per_port_pool) {
+   char name[RTE_MEMPOOL_NAMESIZE];
+
+   snprintf(name, RTE_MEMPOOL_NAMESIZE, "%s-%d",
+test->name, i);
+   t->pool[i] = rte_pktmbuf_pool_create(
+   name, /* mempool name */
+   opt->pool_sz, /* number of elements*/
+   0,/* cache size*/
+   0, opt->mbuf_sz, opt->socket_id); /* flags 

Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK

2021-06-15 Thread Xia, Chenbo
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon 
> Sent: Tuesday, June 15, 2021 3:48 PM
> To: Xia, Chenbo 
> Cc: dev@dpdk.org; Liang, Cunming ; Wu, Jingjing
> ; Burakov, Anatoly ; Yigit,
> Ferruh ; m...@ashroe.eu; nhor...@tuxdriver.com;
> Richardson, Bruce ; david.march...@redhat.com;
> step...@networkplumber.org; Ananyev, Konstantin 
> ;
> j...@nvidia.com; pa...@nvidia.com; xuemi...@nvidia.com
> Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in
> DPDK
> 
> 15/06/2021 04:49, Xia, Chenbo:
> > From: Thomas Monjalon 
> > > 01/06/2021 05:06, Chenbo Xia:
> > > > Hi everyone,
> > > >
> > > > This is a draft implementation of the mdev (Mediated device [1])
> > > > support in DPDK PCI bus driver. Mdev is a way to virtualize devices
> > > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > > there could be different types of mdev devices (e.g. vfio-pci).
> > >
> > > Please could you illustrate with an usage of mdev in DPDK?
> > > What does it enable which is not possible today?
> >
> > The main purpose is for DPDK to drive mdev-based devices, which is not
> > possible today.
> >
> > I'd take PCI devices as an example. Currently DPDK can only drive devices
> > on the physical pci bus under /sys/bus/pci, and the kernel exposes the pci
> > devices to APP in that way.
> >
> > But there are PCI devices using vfio-mdev as a software framework to expose
> > Mdev to APP under /sys/bus/mdev. Devices could choose this way of
> > virtualizing themselves to let multiple APPs share one physical device. For
> > example, Intel Scalable IOV technology is known to use vfio-mdev as the SW
> > framework for Scalable IOV enabled devices (and Intel net/crypto/raw devices
> > support this tech). For those mdev-based devices, DPDK needs support on the
> > bus layer to scan/plug/probe/.. them, which is the main effort of this
> > patchset. There are also other devices using the vfio-mdev framework; AFAIK,
> > Nvidia's GPU is the first one using mdev and Intel's GPU virtualization also
> > uses it.
> 
> Yes mdev was designed for virtualization I think.
> The use of mdev for Scalable IOV without virtualization
> may be seen as an abuse by Linux maintainers,
> as they currently seem to prefer the auxiliary bus (which is a real bus).
> 
> Mellanox got a push back when trying to use mdev for the same purpose
> (Scalable Function, also called Sub-Function) in the kernel.
> The Linux community decided to use the auxiliary bus.
> 
> Any other feedback on the choice mdev vs aux?

OK. Thanks for the info. Much appreciated.

I could investigate a bit about the choice and later come back to you.

> Is there any kernel code supporting this mdev model for Intel devices?

Now there's only the Intel GPU. But I think you care more about devices that
DPDK could drive: a DMA device (named ioat in DPDK, under raw/ioat) is on its
way to being upstreamed (https://www.spinics.net/lists/kvm/msg244417.html).

Thanks,
Chenbo

> 
> > > > In this patchset, the PCI bus driver is extended to support scanning
> > > > and probing the mdev devices whose device-api is "vfio-pci".
> > > >
> > > >  +-+
> > > >  | PCI bus |
> > > >  +++
> > > >   |
> > > >  ++---+---++
> > > >  ||   ||
> > > >   Physical PCI devices ...   Mediated PCI devices ...
> > > >
> > > > The first four patches in this patchset are mainly preparation of mdev
> > > > bus support. The left two patches are the key implementation of mdev 
> > > > bus.
> > > >
> > > > The implementation of mdev bus in DPDK has several options:
> > > >
> > > > 1: Embed mdev bus in current pci bus
> > > >
> > > >This patchset takes this option for an example. Mdev has several
> > > >device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > > >pci devices in all mdev device types so we could embed the mdev bus
> > > >into current pci bus. Then pci bus with mdev support will scan/plug/
> > > >unplug/.. not only normal pci devices but also mediated pci devices.
> > >
> > > I think it is a different bus.
> > > It would be cleaner to not touch the PCI bus.
> > > Having a separate bus will allow an easy way to identify a device
> > > with the new generic devargs syntax, example:
> > >   bus=mdev,uuid=XXX
> > > or more complex:
> > >   bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
> >
> > OK. Agree on cleaner to not touch PCI bus. And there may also be a
> > 'type=pci' as mdev has several types in its definition
> > (pci/ap/platform/ccw/...).
> >
> > > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver
> > > >to plug-in pci devices to pci bus
> > > >
> > > >If we took this option, a new mdev bus will be implemented to scan
> > > >mediated pci devices and a new mdev driver for pci devices will be
> > > >implemented in pci bus to plug-in mediated pci de

[dpdk-dev] [PATCH] bus: clarify log for non-NUMA-aware devices

2021-06-15 Thread Dmitry Kozlyuk
PCI and vmbus drivers printed a warning
when the NUMA node had been reported as (-1) or not reported by the OS:

EAL:   Invalid NUMA socket, default to 0

This message and its level might confuse users, because the configuration
is valid and nothing happens that requires attention or intervention.

Reduce level to INFO and reword the message.

Fixes: f0e0e86aa35d ("pci: move NUMA node check from scan to probe")
Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
Cc: sta...@dpdk.org

Signed-off-by: Dmitry Kozlyuk 
Reviewed-by: Slava Ovsiienko 
---
Hi Xueming,
Please align logging in the pending bus/auxiliary patch.

 doc/guides/nics/ena.rst  | 2 +-
 drivers/bus/pci/pci_common.c | 2 +-
 drivers/bus/vmbus/vmbus_common.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 0f1f63f722..694ce1da74 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -234,7 +234,7 @@ Example output:
 
[...]
EAL: PCI device :00:06.0 on NUMA socket -1
-   EAL:   Invalid NUMA socket, default to 0
+   EAL:   Device is not NUMA-aware, defaulting socket to 0
EAL:   probe driver: 1d0f:ec20 net_ena
 
Interactive-mode selected
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 35d7d092d1..bf06f81229 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -190,7 +190,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
}
 
if (dev->device.numa_node < 0) {
-   RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
+   RTE_LOG(INFO, EAL, "  Device is not NUMA-aware, defaulting 
socket to 0\n");
dev->device.numa_node = 0;
}
 
diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
index d25fd14ef5..ef23af90ec 100644
--- a/drivers/bus/vmbus/vmbus_common.c
+++ b/drivers/bus/vmbus/vmbus_common.c
@@ -112,7 +112,7 @@ vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
dev->driver = dr;
 
if (dev->device.numa_node < 0) {
-   VMBUS_LOG(WARNING, "  Invalid NUMA socket, default to 0");
+   VMBUS_LOG(INFO, "  Device is not NUMA-aware, defaulting socket 
to 0\n");
dev->device.numa_node = 0;
}
 
-- 
2.18.2



Re: [dpdk-dev] [PATCH 2/2] devtools: auto detect branch to search fix patches

2021-06-15 Thread Xueming(Steven) Li


> -Original Message-
> From: Christian Ehrhardt 
> Sent: Monday, June 14, 2021 10:16 PM
> To: Xueming(Steven) Li 
> Cc: dev ; Luca Boccassi ; NBU-Contact-Thomas 
> Monjalon ; dpdk stable
> ; Yuanhan Liu 
> Subject: Re: [PATCH 2/2] devtools: auto detect branch to search fix patches
> 
> On Sat, Jun 12, 2021 at 3:57 PM Xueming Li  wrote:
> >
> > Current fix scan scripts scanned specified range in current(HEAD)
> > branch. When users run it in a earlier branch, few patches were
> 
> ^^ typo missing "an" (if you care)
> 
> > scanned.
> >
> > This patch auto etects branch to scan from range.
> 
> ^^ typo missing "d" (if you care)

Thanks :)

> 
> >
> > Fixes: 752d8e097ec1 ("scripts: show fixes with release version of
> > bug")
> > Cc: Thomas Monjalon 
> > Cc: sta...@dpdk.org
> > Signed-off-by: Xueming Li 
> > ---
> >  devtools/git-log-fixes.sh | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/devtools/git-log-fixes.sh b/devtools/git-log-fixes.sh
> > index 5fc57da913..9a8a9d6739 100755
> > --- a/devtools/git-log-fixes.sh
> > +++ b/devtools/git-log-fixes.sh
> > @@ -34,13 +34,15 @@ done
> >  shift $(($OPTIND - 1))
> >  [ $# -ge 1 ] || usage_error 'range argument required'
> >  range="$*"
> > +range_last=$(git log --oneline v21.05-rc3..v21.05 |head -n1|cut -d' '
> > +-f1)
> 
> Instead of these values that would need to be dynamic to be generally 
> reliable right?
> Everyone might need something different.
> 
> I thought about the same and wondered if this script should get a new 
> optional argument.
> If passed it will use this new argument instead of $refbranch
> 
> That would allow any user today to be able to continue to use it as-is and 
> anyone else can for reliable behavior define the branch to
> look in.

Looks good. There are two scenarios for this script:
    Called from check-git-log.sh: if the version is not found, default to the VERSION file.
    Standalone running with a range: I don't think a default is required.
So the default version is only valid when the "branch" argument is "HEAD".
 
> 
> > +# use first branch
> > +refbranch=$(git branch --contains $range_last -r --sort=-authordate
> > +|head -n1)
> >
> >  # get major release version of a commit  commit_version () # 
> > {
> > local VER="v*.*"
> > # use current branch as history reference
> > -   local refbranch=$(git rev-parse --abbrev-ref HEAD)
> > local tag=$( (git tag -l $VER --contains $1 --sort=creatordate 
> > --merged $refbranch 2>&- ||
> > # tag --merged option has been introduced in git 2.7.0
> > # below is a fallback in case of old git version
> > --
> > 2.25.1
> >
> 
> 
> --
> Christian Ehrhardt
> Staff Engineer, Ubuntu Server
> Canonical Ltd


Re: [dpdk-dev] [PATCH] net/mlx5: fix switchdev mode recognition

2021-06-15 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Slava Ovsiienko 
> Sent: Friday, June 11, 2021 6:37 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh ; Matan Azrad
> ; sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix switchdev mode recognition
> 
> New kernels might add the switch_id attribute to the Netlink replies and
> this caused wrong recognition of the E-Switch presence. The single uplink
> device was erroneously recognized as master, which caused extending the
> match for the source vport index on all installed flows, including the
> default ones, and adding extra hops in the steering engine, which affected
> the maximal throughput packet rate.
> 
> An extra check is added for the new device name format (which implies a new
> kernel) and for the device being the only one. If this check succeeds, the
> E-Switch presence is considered wrongly detected and overridden.
> 
> Fixes: 30a86157f6d5 ("net/mlx5: support PF representor")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


Re: [dpdk-dev] [PATCH] net/mlx5: fix receiving queue timestamp format

2021-06-15 Thread Raslan Darawsheh
Hi,

> -Original Message-
> From: Slava Ovsiienko 
> Sent: Monday, June 14, 2021 4:53 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh ; Matan Azrad
> ; christian.ehrha...@canonical.com; Xueming(Steven)
> Li ; sta...@dpdk.org
> Subject: [PATCH] net/mlx5: fix receiving queue timestamp format
> 
> The timestamp format was not configured correctly for the receiving queues
> created via DevX calls. It caused non-UTC timestamps in CQEs for real-time
> configurations.
> 
> Fixes: d61381ad46d0 ("net/mlx5: support timestamp format")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko 
> ---

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


[dpdk-dev] [PATCH v2 00/32] add support for baseband phy

2021-06-15 Thread Tomasz Duszynski
This series adds initial support for the baseband PHY available on SoCs
belonging to the Fusion family. BPHY is a hardware block comprising
accelerators and DSPs specifically tailored for 5G/LTE inline use cases.

This series introduces two rawdev PMDs along with low-level common code.

The CGX/RPM PMD allows one to configure Ethernet I/O interfaces attached to
BPHY via standard enqueue/dequeue operations.

The BPHY PMD provides out-of-band access to PCI device BARs and a set of
experimental APIs allowing one to set up custom IRQ handlers. This
functionality is backed by a kernel module using the ioctl() mechanism. The
series has nothing to do with 5G/LTE baseband protocol processing.

v2:
- change some errors to more relevant ones (-EINVAL/-ENODEV)
- fix MAINTAINERS styling issues
- fix dpdk-devbind.py
- fix meson.build styling issues
- fix warning related to possibly uninitialized scr0 variable
- fix warning related to unused function
- improve documentation
- improve enums items naming
- spread documentation across relevant patches

Tomasz Duszynski (28):
  common/cnxk: add bphy cgx/rpm initialization and cleanup
  common/cnxk: support for communication with atf
  common/cnxk: support for getting link information
  common/cnxk: support for changing internal loopback
  common/cnxk: support for changing ptp mode
  common/cnxk: support for setting link mode
  common/cnxk: support for changing link state
  common/cnxk: support for lmac start/stop
  raw/cnxk_bphy: add bphy cgx/rpm skeleton driver
  raw/cnxk_bphy: support for reading queue configuration
  raw/cnxk_bphy: support for reading queue count
  raw/cnxk_bphy: support for enqueue operation
  raw/cnxk_bphy: support for dequeue operation
  raw/cnxk_bphy: support for performing selftest
  common/cnxk: support for device init and fini
  common/cnxk: support for baseband PHY irq setup
  common/cnxk: support for checking irq availability
  common/cnxk: support for retrieving irq stack
  common/cnxk: support for removing irq stack
  common/cnxk: support for setting bphy irq handler
  common/cnxk: support for clearing bphy irq handler
  common/cnxk: support for registering bphy irq
  raw/cnxk_bphy: add baseband PHY skeleton driver
  raw/cnxk_bphy: support for reading bphy queue configuration
  raw/cnxk_bphy: support for reading bphy queue count
  raw/cnxk_bphy: support for bphy enqueue operation
  raw/cnxk_bphy: support for bphy dequeue operation
  raw/cnxk_bphy: support for interrupt init and cleanup
  raw/cnxk_bphy: support for reading number of bphy irqs
  raw/cnxk_bphy: support for retrieving bphy device memory
  raw/cnxk_bphy: support for registering bphy irq handlers
  raw/cnxk_bphy: support for bphy selftest

 MAINTAINERS|   7 +-
 doc/guides/rawdevs/cnxk_bphy.rst   | 154 
 doc/guides/rawdevs/index.rst   |   1 +
 doc/guides/rel_notes/release_21_08.rst |  13 +
 drivers/common/cnxk/meson.build|   3 +
 drivers/common/cnxk/roc_api.h  |   7 +
 drivers/common/cnxk/roc_bphy.c |  40 ++
 drivers/common/cnxk/roc_bphy.h |  17 +
 drivers/common/cnxk/roc_bphy_cgx.c | 396 +++
 drivers/common/cnxk/roc_bphy_cgx.h | 120 ++
 drivers/common/cnxk/roc_bphy_cgx_priv.h| 131 +++
 drivers/common/cnxk/roc_bphy_irq.c | 422 +
 drivers/common/cnxk/roc_bphy_irq.h |  49 +++
 drivers/common/cnxk/roc_idev.c |   1 +
 drivers/common/cnxk/roc_idev_priv.h|   2 +
 drivers/common/cnxk/roc_io.h   |   9 +
 drivers/common/cnxk/roc_io_generic.h   |   5 +
 drivers/common/cnxk/roc_priv.h |   3 +
 drivers/common/cnxk/version.map|  22 ++
 drivers/raw/cnxk_bphy/cnxk_bphy.c  | 329 
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c  | 321 
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h  |  10 +
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c | 206 ++
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c  | 100 +
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h  |  41 ++
 drivers/raw/cnxk_bphy/meson.build  |  12 +
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h   | 233 
 drivers/raw/cnxk_bphy/version.map  |   3 +
 drivers/raw/meson.build|   1 +
 usertools/dpdk-devbind.py  |   6 +-
 30 files changed, 2662 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/rawdevs/cnxk_bphy.rst
 create mode 100644 drivers/common/cnxk/roc_bphy.c
 create mode 100644 drivers/common/cnxk/roc_bphy.h
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx.c
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx.h
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx_priv.h
 create mode 100644 drivers/common/cnxk/roc_bphy_irq.c
 create mode 100644 drivers/common/cnxk/roc_bphy_irq.h
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy.c
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
 create mode 100644 drivers/raw/

[dpdk-dev] [PATCH v2 01/32] common/cnxk: add bphy cgx/rpm initialization and cleanup

2021-06-15 Thread Tomasz Duszynski
Add support for low-level initialization and cleanup of the baseband
PHY CGX/RPM blocks.

Initialization and cleanup are related, hence they are in the same patch.

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/meson.build|  1 +
 drivers/common/cnxk/roc_api.h  |  3 ++
 drivers/common/cnxk/roc_bphy_cgx.c | 62 ++
 drivers/common/cnxk/roc_bphy_cgx.h | 20 ++
 drivers/common/cnxk/version.map|  2 +
 5 files changed, 88 insertions(+)
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx.c
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx.h

diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 178bce7ab..59975fd34 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -11,6 +11,7 @@ endif
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 deps = ['eal', 'pci', 'bus_pci', 'mbuf']
 sources = files(
+'roc_bphy_cgx.c',
 'roc_dev.c',
 'roc_idev.c',
 'roc_irq.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 67f5d13f0..256d8c68d 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -100,4 +100,7 @@
 /* Idev */
 #include "roc_idev.h"
 
+/* Baseband phy cgx */
+#include "roc_bphy_cgx.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
new file mode 100644
index 0..029d4102e
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "roc_api.h"
+
+/*
+ * CN10K stores the number of lmacs in a 4 bit field
+ * in contrast to CN9K which uses only 3 bits.
+ *
+ * In theory masks should differ yet on CN9K
+ * bits beyond specified range contain zeros.
+ *
+ * Hence common longer mask may be used.
+ */
+#define CGX_CMRX_RX_LMACS  0x128
+#define CGX_CMRX_RX_LMACS_LMACS GENMASK_ULL(3, 0)
+
+static uint64_t
+roc_bphy_cgx_read(struct roc_bphy_cgx *roc_cgx, uint64_t lmac, uint64_t offset)
+{
+   int shift = roc_model_is_cn10k() ? 20 : 18;
+   uint64_t base = (uint64_t)roc_cgx->bar0_va;
+
+   return plt_read64(base + (lmac << shift) + offset);
+}
+
+static unsigned int
+roc_bphy_cgx_dev_id(struct roc_bphy_cgx *roc_cgx)
+{
+   uint64_t cgx_id = roc_model_is_cn10k() ? GENMASK_ULL(26, 24) :
+GENMASK_ULL(25, 24);
+
+   return FIELD_GET(cgx_id, roc_cgx->bar0_pa);
+}
+
+int
+roc_bphy_cgx_dev_init(struct roc_bphy_cgx *roc_cgx)
+{
+   uint64_t val;
+
+   if (!roc_cgx || !roc_cgx->bar0_va || !roc_cgx->bar0_pa)
+   return -EINVAL;
+
+   val = roc_bphy_cgx_read(roc_cgx, 0, CGX_CMRX_RX_LMACS);
+   val = FIELD_GET(CGX_CMRX_RX_LMACS_LMACS, val);
+   if (roc_model_is_cn9k())
+   val = GENMASK_ULL(val - 1, 0);
+   roc_cgx->lmac_bmap = val;
+   roc_cgx->id = roc_bphy_cgx_dev_id(roc_cgx);
+
+   return 0;
+}
+
+int
+roc_bphy_cgx_dev_fini(struct roc_bphy_cgx *roc_cgx)
+{
+   if (!roc_cgx)
+   return -EINVAL;
+
+   return 0;
+}
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
new file mode 100644
index 0..aac2c262c
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#ifndef _ROC_BPHY_CGX_H_
+#define _ROC_BPHY_CGX_H_
+
+#include "roc_api.h"
+
+struct roc_bphy_cgx {
+   uint64_t bar0_pa;
+   void *bar0_va;
+   uint64_t lmac_bmap;
+   unsigned int id;
+} __plt_cache_aligned;
+
+__roc_api int roc_bphy_cgx_dev_init(struct roc_bphy_cgx *roc_cgx);
+__roc_api int roc_bphy_cgx_dev_fini(struct roc_bphy_cgx *roc_cgx);
+
+#endif /* _ROC_BPHY_CGX_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 8e67c83a6..1db4d104a 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -9,6 +9,8 @@ INTERNAL {
cnxk_logtype_sso;
cnxk_logtype_tim;
cnxk_logtype_tm;
+   roc_bphy_cgx_dev_fini;
+   roc_bphy_cgx_dev_init;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 02/32] common/cnxk: support for communication with atf

2021-06-15 Thread Tomasz Duszynski
Messages can be exchanged between userspace software and firmware
via a set of two dedicated registers, namely scratch1 and scratch0.

scratch1 acts as a command register, i.e. a message is sent to firmware,
while scratch0 holds the response to the previously sent message.

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 145 
 drivers/common/cnxk/roc_bphy_cgx.h  |   4 +
 drivers/common/cnxk/roc_bphy_cgx_priv.h |  54 +
 drivers/common/cnxk/roc_priv.h  |   3 +
 4 files changed, 206 insertions(+)
 create mode 100644 drivers/common/cnxk/roc_bphy_cgx_priv.h

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 029d4102e..8549145a1 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -2,8 +2,13 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include 
+
 #include "roc_api.h"
+#include "roc_priv.h"
 
+#define CGX_CMRX_INT  0x40
+#define CGX_CMRX_INT_OVERFLW  BIT_ULL(1)
 /*
  * CN10K stores the number of lmacs in a 4 bit field
  * in contrast to CN9K which uses only 3 bits.
@@ -15,6 +20,8 @@
  */
 #define CGX_CMRX_RX_LMACS  0x128
 #define CGX_CMRX_RX_LMACS_LMACS GENMASK_ULL(3, 0)
+#define CGX_CMRX_SCRATCH0  0x1050
+#define CGX_CMRX_SCRATCH1  0x1058
 
 static uint64_t
 roc_bphy_cgx_read(struct roc_bphy_cgx *roc_cgx, uint64_t lmac, uint64_t offset)
@@ -25,6 +32,137 @@ roc_bphy_cgx_read(struct roc_bphy_cgx *roc_cgx, uint64_t 
lmac, uint64_t offset)
return plt_read64(base + (lmac << shift) + offset);
 }
 
+static void
+roc_bphy_cgx_write(struct roc_bphy_cgx *roc_cgx, uint64_t lmac, uint64_t 
offset,
+  uint64_t value)
+{
+   int shift = roc_model_is_cn10k() ? 20 : 18;
+   uint64_t base = (uint64_t)roc_cgx->bar0_va;
+
+   plt_write64(value, base + (lmac << shift) + offset);
+}
+
+static void
+roc_bphy_cgx_ack(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+uint64_t *scr0)
+{
+   uint64_t val;
+
+   /* clear interrupt */
+   val = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_INT);
+   val |= FIELD_PREP(CGX_CMRX_INT_OVERFLW, 1);
+   roc_bphy_cgx_write(roc_cgx, lmac, CGX_CMRX_INT, val);
+
+   /* ack fw response */
+   *scr0 &= ~SCR0_ETH_EVT_STS_S_ACK;
+   roc_bphy_cgx_write(roc_cgx, lmac, CGX_CMRX_SCRATCH0, *scr0);
+}
+
+static int
+roc_bphy_cgx_wait_for_ownership(struct roc_bphy_cgx *roc_cgx, unsigned int 
lmac,
+   uint64_t *scr0)
+{
+   int tries = 5000;
+   uint64_t scr1;
+
+   do {
+   *scr0 = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_SCRATCH0);
+   scr1 = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_SCRATCH1);
+
+   if (FIELD_GET(SCR1_OWN_STATUS, scr1) == ETH_OWN_NON_SECURE_SW &&
+   FIELD_GET(SCR0_ETH_EVT_STS_S_ACK, *scr0) == 0)
+   break;
+
+   /* clear async events if any */
+   if (FIELD_GET(SCR0_ETH_EVT_STS_S_EVT_TYPE, *scr0) == 
ETH_EVT_ASYNC &&
+   FIELD_GET(SCR0_ETH_EVT_STS_S_ACK, *scr0))
+   roc_bphy_cgx_ack(roc_cgx, lmac, scr0);
+
+   plt_delay_ms(1);
+   } while (--tries);
+
+   return tries ? 0 : -ETIMEDOUT;
+}
+
+static int
+roc_bphy_cgx_wait_for_ack(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+ uint64_t *scr0)
+{
+   int tries = 5000;
+   uint64_t scr1;
+
+   do {
+   *scr0 = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_SCRATCH0);
+   scr1 = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_SCRATCH1);
+
+   if (FIELD_GET(SCR1_OWN_STATUS, scr1) == ETH_OWN_NON_SECURE_SW &&
+   FIELD_GET(SCR0_ETH_EVT_STS_S_ACK, *scr0))
+   break;
+
+   plt_delay_ms(1);
+   } while (--tries);
+
+   return tries ? 0 : -ETIMEDOUT;
+}
+
+static int __rte_unused
+roc_bphy_cgx_intf_req(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+ uint64_t scr1, uint64_t *scr0)
+{
+   uint8_t cmd_id = FIELD_GET(SCR1_ETH_CMD_ID, scr1);
+   int ret;
+
+   pthread_mutex_lock(&roc_cgx->lock);
+
+   /* wait for ownership */
+   ret = roc_bphy_cgx_wait_for_ownership(roc_cgx, lmac, scr0);
+   if (ret) {
+   plt_err("timed out waiting for ownership");
+   goto out;
+   }
+
+   /* write command */
+   scr1 |= FIELD_PREP(SCR1_OWN_STATUS, ETH_OWN_FIRMWARE);
+   roc_bphy_cgx_write(roc_cgx, lmac, CGX_CMRX_SCRATCH1, scr1);
+
+   /* wait for command ack */
+   ret = roc_bphy_cgx_wait_for_ack(roc_cgx, lmac, scr0);
+   if (ret) {
+   plt_err("timed out waiting for response");
+   goto out;
+   }
+
+   if (cmd_id == ETH_CMD_INTF_SHUTDOWN)
+   goto out;
+
+   if (FIELD_GET(SCR0_ETH_EVT_STS_S_EVT_TYPE, *scr0) != ETH_EVT_CMD_RESP) {
+

[dpdk-dev] [PATCH v2 03/32] common/cnxk: support for getting link information

2021-06-15 Thread Tomasz Duszynski
Add support for retrieving link information.

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 40 +-
 drivers/common/cnxk/roc_bphy_cgx.h  | 70 +
 drivers/common/cnxk/roc_bphy_cgx_priv.h |  9 
 drivers/common/cnxk/version.map |  1 +
 4 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 8549145a1..6279345c9 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -105,7 +105,7 @@ roc_bphy_cgx_wait_for_ack(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
return tries ? 0 : -ETIMEDOUT;
 }
 
-static int __rte_unused
+static int
 roc_bphy_cgx_intf_req(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
  uint64_t scr1, uint64_t *scr0)
 {
@@ -205,3 +205,41 @@ roc_bphy_cgx_dev_fini(struct roc_bphy_cgx *roc_cgx)
 
return 0;
 }
+
+static bool
+roc_bphy_cgx_lmac_exists(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return (lmac < MAX_LMACS_PER_CGX) &&
+  (roc_cgx->lmac_bmap & BIT_ULL(lmac));
+}
+
+int
+roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+ struct roc_bphy_cgx_link_info *info)
+{
+   uint64_t scr1, scr0;
+   int ret;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   if (!info)
+   return -EINVAL;
+
+   scr1 = FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_GET_LINK_STS);
+   ret = roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
+   if (ret)
+   return ret;
+
+   info->link_up = FIELD_GET(SCR0_ETH_LNK_STS_S_LINK_UP, scr0);
+   info->full_duplex = FIELD_GET(SCR0_ETH_LNK_STS_S_FULL_DUPLEX, scr0);
+   info->speed = FIELD_GET(SCR0_ETH_LNK_STS_S_SPEED, scr0);
+   info->an = FIELD_GET(SCR0_ETH_LNK_STS_S_AN, scr0);
+   info->fec = FIELD_GET(SCR0_ETH_LNK_STS_S_FEC, scr0);
+   info->mode = FIELD_GET(SCR0_ETH_LNK_STS_S_MODE, scr0);
+
+   return 0;
+}
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index 37b5c2742..641650d66 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -9,6 +9,8 @@
 
 #include "roc_api.h"
 
+#define MAX_LMACS_PER_CGX 4
+
 struct roc_bphy_cgx {
uint64_t bar0_pa;
void *bar0_va;
@@ -18,7 +20,75 @@ struct roc_bphy_cgx {
pthread_mutex_t lock;
 } __plt_cache_aligned;
 
+enum roc_bphy_cgx_eth_link_speed {
+   ROC_BPHY_CGX_ETH_LINK_SPEED_NONE,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_10M,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_100M,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_1G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_2HG,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_5G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_10G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_20G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_25G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_40G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_50G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_80G,
+   ROC_BPHY_CGX_ETH_LINK_SPEED_100G,
+   __ROC_BPHY_CGX_ETH_LINK_SPEED_MAX
+};
+
+enum roc_bphy_cgx_eth_link_fec {
+   ROC_BPHY_CGX_ETH_LINK_FEC_NONE,
+   ROC_BPHY_CGX_ETH_LINK_FEC_BASE_R,
+   ROC_BPHY_CGX_ETH_LINK_FEC_RS,
+   __ROC_BPHY_CGX_ETH_LINK_FEC_MAX
+};
+
+enum roc_bphy_cgx_eth_link_mode {
+   ROC_BPHY_CGX_ETH_LINK_MODE_SGMII_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_1000_BASEX_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_QSGMII_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_10G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_10G_C2M_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_10G_KR_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_20G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_25G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_25G_C2M_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_25G_2_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_25G_CR_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_25G_KR_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_40G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_40G_C2M_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_40G_CR4_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_40G_KR4_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_40GAUI_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_50G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_50G_C2M_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_50G_4_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_50G_CR_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_50G_KR_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_80GAUI_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_100G_C2C_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_100G_C2M_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_100G_CR4_BIT,
+   ROC_BPHY_CGX_ETH_LINK_MODE_100G_KR4_BIT,
+   __ROC_BPHY_CGX_ETH_LINK_MODE_MAX
+};
+
+struct roc_bphy_cgx_link_info {
+   bool link_up;
+   bool full_duplex;
+   enum roc_bphy_cgx_eth_link_speed speed;
+   bool an;
+   enum roc_bphy

[dpdk-dev] [PATCH v2 04/32] common/cnxk: support for changing internal loopback

2021-06-15 Thread Tomasz Duszynski
Add support for enabling or disabling internal loopback.

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 30 +
 drivers/common/cnxk/roc_bphy_cgx.h  |  4 
 drivers/common/cnxk/roc_bphy_cgx_priv.h |  4 
 drivers/common/cnxk/version.map |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 6279345c9..004323968 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -213,6 +213,24 @@ roc_bphy_cgx_lmac_exists(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac)
   (roc_cgx->lmac_bmap & BIT_ULL(lmac));
 }
 
+static int
+roc_bphy_cgx_intlbk_ena_dis(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+   bool enable)
+{
+   uint64_t scr1, scr0;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   scr1 = FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_INTERNAL_LBK) |
+  FIELD_PREP(SCR1_ETH_CTL_ARGS_ENABLE, enable);
+
+   return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
+}
+
 int
 roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
  struct roc_bphy_cgx_link_info *info)
@@ -243,3 +261,15 @@ roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
 
return 0;
 }
+
+int
+roc_bphy_cgx_intlbk_enable(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_intlbk_ena_dis(roc_cgx, lmac, true);
+}
+
+int
+roc_bphy_cgx_intlbk_disable(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_intlbk_ena_dis(roc_cgx, lmac, false);
+}
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index 641650d66..970122845 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -90,5 +90,9 @@ __roc_api int roc_bphy_cgx_dev_fini(struct roc_bphy_cgx 
*roc_cgx);
 __roc_api int roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx,
unsigned int lmac,
struct roc_bphy_cgx_link_info *info);
+__roc_api int roc_bphy_cgx_intlbk_enable(struct roc_bphy_cgx *roc_cgx,
+unsigned int lmac);
+__roc_api int roc_bphy_cgx_intlbk_disable(struct roc_bphy_cgx *roc_cgx,
+ unsigned int lmac);
 
 #endif /* _ROC_BPHY_CGX_H_ */
diff --git a/drivers/common/cnxk/roc_bphy_cgx_priv.h 
b/drivers/common/cnxk/roc_bphy_cgx_priv.h
index c0550ae87..cb59cac09 100644
--- a/drivers/common/cnxk/roc_bphy_cgx_priv.h
+++ b/drivers/common/cnxk/roc_bphy_cgx_priv.h
@@ -8,6 +8,7 @@
 /* REQUEST ID types. Input to firmware */
 enum eth_cmd_id {
ETH_CMD_GET_LINK_STS = 4,
+   ETH_CMD_INTERNAL_LBK = 7,
ETH_CMD_INTF_SHUTDOWN = 12,
 };
 
@@ -58,6 +59,9 @@ enum eth_cmd_own {
 /* struct eth_cmd */
 #define SCR1_ETH_CMD_ID GENMASK_ULL(7, 2)
 
+/* struct eth_ctl_args */
+#define SCR1_ETH_CTL_ARGS_ENABLE BIT_ULL(8)
+
 #define SCR1_OWN_STATUS GENMASK_ULL(1, 0)
 
 #endif /* _ROC_BPHY_CGX_PRIV_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 466207f9d..71437a6c5 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -12,6 +12,8 @@ INTERNAL {
roc_bphy_cgx_dev_fini;
roc_bphy_cgx_dev_init;
roc_bphy_cgx_get_linkinfo;
+   roc_bphy_cgx_intlbk_disable;
+   roc_bphy_cgx_intlbk_enable;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 05/32] common/cnxk: support for changing ptp mode

2021-06-15 Thread Tomasz Duszynski
Add support for enabling or disabling ptp mode.
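
A minimal sketch of toggling ptp rx mode (roc_cgx and lmac are placeholders;
the helpers return -ENOTSUP on cn10k where this command is not exposed):

#include <stdbool.h>

#include "roc_api.h"

static int
set_ptp_rx(struct roc_bphy_cgx *roc_cgx, unsigned int lmac, bool enable)
{
        /* -ENOTSUP on cn10k, -ENODEV when the lmac does not exist */
        if (enable)
                return roc_bphy_cgx_ptp_rx_enable(roc_cgx, lmac);

        return roc_bphy_cgx_ptp_rx_disable(roc_cgx, lmac);
}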

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 33 +
 drivers/common/cnxk/roc_bphy_cgx.h  |  5 
 drivers/common/cnxk/roc_bphy_cgx_priv.h |  1 +
 drivers/common/cnxk/version.map |  2 ++
 4 files changed, 41 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 004323968..ba86a7dab 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -231,6 +231,27 @@ roc_bphy_cgx_intlbk_ena_dis(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
 }
 
+static int
+roc_bphy_cgx_ptp_rx_ena_dis(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+   bool enable)
+{
+   uint64_t scr1, scr0;
+
+   if (roc_model_is_cn10k())
+   return -ENOTSUP;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   scr1 = FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_SET_PTP_MODE) |
+  FIELD_PREP(SCR1_ETH_CTL_ARGS_ENABLE, enable);
+
+   return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
+}
+
 int
 roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
  struct roc_bphy_cgx_link_info *info)
@@ -273,3 +294,15 @@ roc_bphy_cgx_intlbk_disable(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac)
 {
return roc_bphy_cgx_intlbk_ena_dis(roc_cgx, lmac, false);
 }
+
+int
+roc_bphy_cgx_ptp_rx_enable(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_ptp_rx_ena_dis(roc_cgx, lmac, true);
+}
+
+int
+roc_bphy_cgx_ptp_rx_disable(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_ptp_rx_ena_dis(roc_cgx, lmac, false);
+}
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index 970122845..992e2d3ed 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -94,5 +94,10 @@ __roc_api int roc_bphy_cgx_intlbk_enable(struct roc_bphy_cgx 
*roc_cgx,
 unsigned int lmac);
 __roc_api int roc_bphy_cgx_intlbk_disable(struct roc_bphy_cgx *roc_cgx,
  unsigned int lmac);
+__roc_api int roc_bphy_cgx_ptp_rx_enable(struct roc_bphy_cgx *roc_cgx,
+unsigned int lmac);
+__roc_api int roc_bphy_cgx_ptp_rx_disable(struct roc_bphy_cgx *roc_cgx,
+ unsigned int lmac);
+
 
 #endif /* _ROC_BPHY_CGX_H_ */
diff --git a/drivers/common/cnxk/roc_bphy_cgx_priv.h 
b/drivers/common/cnxk/roc_bphy_cgx_priv.h
index cb59cac09..4e86ae4ea 100644
--- a/drivers/common/cnxk/roc_bphy_cgx_priv.h
+++ b/drivers/common/cnxk/roc_bphy_cgx_priv.h
@@ -10,6 +10,7 @@ enum eth_cmd_id {
ETH_CMD_GET_LINK_STS = 4,
ETH_CMD_INTERNAL_LBK = 7,
ETH_CMD_INTF_SHUTDOWN = 12,
+   ETH_CMD_SET_PTP_MODE = 34,
 };
 
 /* event types - cause of interrupt */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 71437a6c5..205a0602b 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -14,6 +14,8 @@ INTERNAL {
roc_bphy_cgx_get_linkinfo;
roc_bphy_cgx_intlbk_disable;
roc_bphy_cgx_intlbk_enable;
+   roc_bphy_cgx_ptp_rx_disable;
+   roc_bphy_cgx_ptp_rx_enable;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 06/32] common/cnxk: support for setting link mode

2021-06-15 Thread Tomasz Duszynski
Add support for setting link mode.
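
A minimal sketch of requesting a new link mode (roc_cgx, lmac, port and the
speed enumerator are placeholders chosen by the caller; only the structure
layout and the API below come from this patch):

#include <stdbool.h>

#include "roc_api.h"

static int
request_10g_kr(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
               enum roc_bphy_cgx_eth_link_speed speed)
{
        struct roc_bphy_cgx_link_mode mode = {
                .full_duplex = true,
                .an = false,    /* autonegotiation off */
                .port = 0,      /* placeholder port index */
                .speed = speed, /* caller picks the matching 10G enumerator */
                .mode = ROC_BPHY_CGX_ETH_LINK_MODE_10G_KR_BIT,
        };

        return roc_bphy_cgx_set_link_mode(roc_cgx, lmac, &mode);
}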

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 28 
 drivers/common/cnxk/roc_bphy_cgx.h  | 11 +
 drivers/common/cnxk/roc_bphy_cgx_priv.h | 61 +
 drivers/common/cnxk/version.map |  1 +
 4 files changed, 101 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index ba86a7dab..3aaf22ec9 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -283,6 +283,34 @@ roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
return 0;
 }
 
+int
+roc_bphy_cgx_set_link_mode(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+  struct roc_bphy_cgx_link_mode *mode)
+{
+   uint64_t scr1, scr0;
+
+   if (roc_model_is_cn10k())
+   return -ENOTSUP;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   if (!mode)
+   return -EINVAL;
+
+   scr1 = FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_MODE_CHANGE) |
+  FIELD_PREP(SCR1_ETH_MODE_CHANGE_ARGS_SPEED, mode->speed) |
+  FIELD_PREP(SCR1_ETH_MODE_CHANGE_ARGS_DUPLEX, mode->full_duplex) |
+  FIELD_PREP(SCR1_ETH_MODE_CHANGE_ARGS_AN, mode->an) |
+  FIELD_PREP(SCR1_ETH_MODE_CHANGE_ARGS_PORT, mode->port) |
+  FIELD_PREP(SCR1_ETH_MODE_CHANGE_ARGS_MODE, BIT_ULL(mode->mode));
+
+   return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
+}
+
 int
 roc_bphy_cgx_intlbk_enable(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
 {
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index 992e2d3ed..b9a6e0be0 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -75,6 +75,14 @@ enum roc_bphy_cgx_eth_link_mode {
__ROC_BPHY_CGX_ETH_LINK_MODE_MAX
 };
 
+struct roc_bphy_cgx_link_mode {
+   bool full_duplex;
+   bool an;
+   unsigned int port;
+   enum roc_bphy_cgx_eth_link_speed speed;
+   enum roc_bphy_cgx_eth_link_mode mode;
+};
+
 struct roc_bphy_cgx_link_info {
bool link_up;
bool full_duplex;
@@ -90,6 +98,9 @@ __roc_api int roc_bphy_cgx_dev_fini(struct roc_bphy_cgx 
*roc_cgx);
 __roc_api int roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx,
unsigned int lmac,
struct roc_bphy_cgx_link_info *info);
+__roc_api int roc_bphy_cgx_set_link_mode(struct roc_bphy_cgx *roc_cgx,
+unsigned int lmac,
+struct roc_bphy_cgx_link_mode *mode);
 __roc_api int roc_bphy_cgx_intlbk_enable(struct roc_bphy_cgx *roc_cgx,
 unsigned int lmac);
 __roc_api int roc_bphy_cgx_intlbk_disable(struct roc_bphy_cgx *roc_cgx,
diff --git a/drivers/common/cnxk/roc_bphy_cgx_priv.h 
b/drivers/common/cnxk/roc_bphy_cgx_priv.h
index 4e86ae4ea..ee7578423 100644
--- a/drivers/common/cnxk/roc_bphy_cgx_priv.h
+++ b/drivers/common/cnxk/roc_bphy_cgx_priv.h
@@ -5,10 +5,64 @@
 #ifndef _ROC_BPHY_CGX_PRIV_H_
 #define _ROC_BPHY_CGX_PRIV_H_
 
+/* LINK speed types */
+enum eth_link_speed {
+   ETH_LINK_NONE,
+   ETH_LINK_10M,
+   ETH_LINK_100M,
+   ETH_LINK_1G,
+   ETH_LINK_2HG, /* 2.5 Gbps */
+   ETH_LINK_5G,
+   ETH_LINK_10G,
+   ETH_LINK_20G,
+   ETH_LINK_25G,
+   ETH_LINK_40G,
+   ETH_LINK_50G,
+   ETH_LINK_80G,
+   ETH_LINK_100G,
+   ETH_LINK_MAX,
+};
+
+/* Supported LINK MODE enums
+ * Each link mode is a bit mask of these
+ * enums which are represented as bits
+ */
+enum eth_mode {
+   ETH_MODE_SGMII_BIT = 0,
+   ETH_MODE_1000_BASEX_BIT,
+   ETH_MODE_QSGMII_BIT,
+   ETH_MODE_10G_C2C_BIT,
+   ETH_MODE_10G_C2M_BIT,
+   ETH_MODE_10G_KR_BIT, /* = 5 */
+   ETH_MODE_20G_C2C_BIT,
+   ETH_MODE_25G_C2C_BIT,
+   ETH_MODE_25G_C2M_BIT,
+   ETH_MODE_25G_2_C2C_BIT,
+   ETH_MODE_25G_CR_BIT, /* = 10 */
+   ETH_MODE_25G_KR_BIT,
+   ETH_MODE_40G_C2C_BIT,
+   ETH_MODE_40G_C2M_BIT,
+   ETH_MODE_40G_CR4_BIT,
+   ETH_MODE_40G_KR4_BIT, /* = 15 */
+   ETH_MODE_40GAUI_C2C_BIT,
+   ETH_MODE_50G_C2C_BIT,
+   ETH_MODE_50G_C2M_BIT,
+   ETH_MODE_50G_4_C2C_BIT,
+   ETH_MODE_50G_CR_BIT, /* = 20 */
+   ETH_MODE_50G_KR_BIT,
+   ETH_MODE_80GAUI_C2C_BIT,
+   ETH_MODE_100G_C2C_BIT,
+   ETH_MODE_100G_C2M_BIT,
+   ETH_MODE_100G_CR4_BIT, /* = 25 */
+   ETH_MODE_100G_KR4_BIT,
+   ETH_MODE_MAX_BIT /* = 27 */
+};
+
 /* REQUEST ID types. Input to firmware */
 enum eth_cmd_id {
ETH_CMD_GET_LINK_STS = 4,
ETH_CMD_INTERNAL_LBK = 7,
+   ETH_CMD_MODE_CHANGE = 11, /* hot plug support */
ETH_CMD_INTF_SHUTDOWN = 12,
   

[dpdk-dev] [PATCH v2 07/32] common/cnxk: support for changing link state

2021-06-15 Thread Tomasz Duszynski
Add support for setting link up or down.
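
A minimal sketch of bouncing the link on one lmac (placeholder arguments,
error handling kept to the bare minimum):

#include "roc_api.h"

static int
bounce_link(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
{
        int rc;

        rc = roc_bphy_cgx_set_link_state(roc_cgx, lmac, false);
        if (rc)
                return rc;

        return roc_bphy_cgx_set_link_state(roc_cgx, lmac, true);
}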

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c  | 18 ++
 drivers/common/cnxk/roc_bphy_cgx.h  |  2 ++
 drivers/common/cnxk/roc_bphy_cgx_priv.h |  2 ++
 drivers/common/cnxk/version.map |  1 +
 4 files changed, 23 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 3aaf22ec9..9665bafc9 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -252,6 +252,24 @@ roc_bphy_cgx_ptp_rx_ena_dis(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
 }
 
+int
+roc_bphy_cgx_set_link_state(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+   bool state)
+{
+   uint64_t scr1, scr0;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   scr1 = state ? FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_LINK_BRING_UP) :
+  FIELD_PREP(SCR1_ETH_CMD_ID, ETH_CMD_LINK_BRING_DOWN);
+
+   return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
+}
+
 int
 roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
  struct roc_bphy_cgx_link_info *info)
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index b9a6e0be0..ab6239202 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -95,6 +95,8 @@ struct roc_bphy_cgx_link_info {
 __roc_api int roc_bphy_cgx_dev_init(struct roc_bphy_cgx *roc_cgx);
 __roc_api int roc_bphy_cgx_dev_fini(struct roc_bphy_cgx *roc_cgx);
 
+__roc_api int roc_bphy_cgx_set_link_state(struct roc_bphy_cgx *roc_cgx,
+ unsigned int lmac, bool state);
 __roc_api int roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx,
unsigned int lmac,
struct roc_bphy_cgx_link_info *info);
diff --git a/drivers/common/cnxk/roc_bphy_cgx_priv.h 
b/drivers/common/cnxk/roc_bphy_cgx_priv.h
index ee7578423..71a277fff 100644
--- a/drivers/common/cnxk/roc_bphy_cgx_priv.h
+++ b/drivers/common/cnxk/roc_bphy_cgx_priv.h
@@ -61,6 +61,8 @@ enum eth_mode {
 /* REQUEST ID types. Input to firmware */
 enum eth_cmd_id {
ETH_CMD_GET_LINK_STS = 4,
+   ETH_CMD_LINK_BRING_UP = 5,
+   ETH_CMD_LINK_BRING_DOWN = 6,
ETH_CMD_INTERNAL_LBK = 7,
ETH_CMD_MODE_CHANGE = 11, /* hot plug support */
ETH_CMD_INTF_SHUTDOWN = 12,
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 15a6d3a3b..7766f52e0 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -17,6 +17,7 @@ INTERNAL {
roc_bphy_cgx_ptp_rx_disable;
roc_bphy_cgx_ptp_rx_enable;
roc_bphy_cgx_set_link_mode;
+   roc_bphy_cgx_set_link_state;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 08/32] common/cnxk: support for lmac start/stop

2021-06-15 Thread Tomasz Duszynski
Add support for starting or stopping a specific lmac. Start enables rx/tx
traffic while stop does the opposite.
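
A minimal sketch of the intended usage (placeholder arguments; the
reconfiguration step stands for whatever the caller needs to do while
traffic is stopped):

#include "roc_api.h"

static int
restart_traffic(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
{
        int rc;

        rc = roc_bphy_cgx_stop_rxtx(roc_cgx, lmac);
        if (rc)
                return rc;

        /* reconfigure the lmac here while rx/tx is disabled */

        return roc_bphy_cgx_start_rxtx(roc_cgx, lmac);
}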

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/common/cnxk/roc_bphy_cgx.c | 42 ++
 drivers/common/cnxk/roc_bphy_cgx.h |  4 +++
 drivers/common/cnxk/version.map|  2 ++
 3 files changed, 48 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_cgx.c 
b/drivers/common/cnxk/roc_bphy_cgx.c
index 9665bafc9..47e9d8f47 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.c
+++ b/drivers/common/cnxk/roc_bphy_cgx.c
@@ -7,6 +7,9 @@
 #include "roc_api.h"
 #include "roc_priv.h"
 
+#define CGX_CMRX_CONFIG   0x00
+#define CGX_CMRX_CONFIG_DATA_PKT_RX_EN BIT_ULL(54)
+#define CGX_CMRX_CONFIG_DATA_PKT_TX_EN BIT_ULL(53)
 #define CGX_CMRX_INT  0x40
 #define CGX_CMRX_INT_OVERFLW  BIT_ULL(1)
 /*
@@ -213,6 +216,33 @@ roc_bphy_cgx_lmac_exists(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac)
   (roc_cgx->lmac_bmap & BIT_ULL(lmac));
 }
 
+static int
+roc_bphy_cgx_start_stop_rxtx(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
+bool start)
+{
+   uint64_t val;
+
+   if (!roc_cgx)
+   return -EINVAL;
+
+   if (!roc_bphy_cgx_lmac_exists(roc_cgx, lmac))
+   return -ENODEV;
+
+   pthread_mutex_lock(&roc_cgx->lock);
+   val = roc_bphy_cgx_read(roc_cgx, lmac, CGX_CMRX_CONFIG);
+   val &= ~(CGX_CMRX_CONFIG_DATA_PKT_RX_EN |
+CGX_CMRX_CONFIG_DATA_PKT_TX_EN);
+
+   if (start)
+   val |= FIELD_PREP(CGX_CMRX_CONFIG_DATA_PKT_RX_EN, 1) |
+  FIELD_PREP(CGX_CMRX_CONFIG_DATA_PKT_TX_EN, 1);
+
+   roc_bphy_cgx_write(roc_cgx, lmac, CGX_CMRX_CONFIG, val);
+   pthread_mutex_unlock(&roc_cgx->lock);
+
+   return 0;
+}
+
 static int
 roc_bphy_cgx_intlbk_ena_dis(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
bool enable)
@@ -252,6 +282,18 @@ roc_bphy_cgx_ptp_rx_ena_dis(struct roc_bphy_cgx *roc_cgx, 
unsigned int lmac,
return roc_bphy_cgx_intf_req(roc_cgx, lmac, scr1, &scr0);
 }
 
+int
+roc_bphy_cgx_start_rxtx(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_start_stop_rxtx(roc_cgx, lmac, true);
+}
+
+int
+roc_bphy_cgx_stop_rxtx(struct roc_bphy_cgx *roc_cgx, unsigned int lmac)
+{
+   return roc_bphy_cgx_start_stop_rxtx(roc_cgx, lmac, false);
+}
+
 int
 roc_bphy_cgx_set_link_state(struct roc_bphy_cgx *roc_cgx, unsigned int lmac,
bool state)
diff --git a/drivers/common/cnxk/roc_bphy_cgx.h 
b/drivers/common/cnxk/roc_bphy_cgx.h
index ab6239202..49c35a1e6 100644
--- a/drivers/common/cnxk/roc_bphy_cgx.h
+++ b/drivers/common/cnxk/roc_bphy_cgx.h
@@ -95,6 +95,10 @@ struct roc_bphy_cgx_link_info {
 __roc_api int roc_bphy_cgx_dev_init(struct roc_bphy_cgx *roc_cgx);
 __roc_api int roc_bphy_cgx_dev_fini(struct roc_bphy_cgx *roc_cgx);
 
+__roc_api int roc_bphy_cgx_start_rxtx(struct roc_bphy_cgx *roc_cgx,
+ unsigned int lmac);
+__roc_api int roc_bphy_cgx_stop_rxtx(struct roc_bphy_cgx *roc_cgx,
+unsigned int lmac);
 __roc_api int roc_bphy_cgx_set_link_state(struct roc_bphy_cgx *roc_cgx,
  unsigned int lmac, bool state);
 __roc_api int roc_bphy_cgx_get_linkinfo(struct roc_bphy_cgx *roc_cgx,
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 7766f52e0..0ad805dba 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -18,6 +18,8 @@ INTERNAL {
roc_bphy_cgx_ptp_rx_enable;
roc_bphy_cgx_set_link_mode;
roc_bphy_cgx_set_link_state;
+   roc_bphy_cgx_start_rxtx;
+   roc_bphy_cgx_stop_rxtx;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 09/32] raw/cnxk_bphy: add bphy cgx/rpm skeleton driver

2021-06-15 Thread Tomasz Duszynski
Add a baseband phy cgx/rpm skeleton driver which merely probes a matching
device. CGX/RPM are Ethernet MACs hardwired to the baseband subsystem.

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 MAINTAINERS|   7 +-
 doc/guides/rawdevs/cnxk_bphy.rst   |  23 
 doc/guides/rawdevs/index.rst   |   1 +
 doc/guides/rel_notes/release_21_08.rst |   7 ++
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c  | 151 +
 drivers/raw/cnxk_bphy/meson.build  |   8 ++
 drivers/raw/cnxk_bphy/version.map  |   3 +
 drivers/raw/meson.build|   1 +
 usertools/dpdk-devbind.py  |   4 +-
 9 files changed, 203 insertions(+), 2 deletions(-)
 create mode 100644 doc/guides/rawdevs/cnxk_bphy.rst
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
 create mode 100644 drivers/raw/cnxk_bphy/meson.build
 create mode 100644 drivers/raw/cnxk_bphy/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a1697..4f533fcdb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1287,6 +1287,12 @@ M: Nipun Gupta 
 F: drivers/raw/dpaa2_cmdif/
 F: doc/guides/rawdevs/dpaa2_cmdif.rst
 
+Marvell CNXK BPHY
+M: Jakub Palider 
+M: Tomasz Duszynski 
+F: doc/guides/rawdevs/cnxk_bphy.rst
+F: drivers/raw/cnxk_bphy/
+
 Marvell OCTEON TX2 DMA
 M: Radha Mohan Chintakuntla 
 M: Veerasenareddy Burru 
@@ -1307,7 +1313,6 @@ F: doc/guides/rawdevs/ntb.rst
 F: examples/ntb/
 F: doc/guides/sample_app_ug/ntb.rst
 
-
 Packet processing
 -
 
diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
new file mode 100644
index 0..96ab68435
--- /dev/null
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -0,0 +1,23 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(c) 2021 Marvell.
+
+Marvell CNXK BPHY Driver
+
+
+CN10K/CN9K Fusion product families offer an internal BPHY unit which provides a
+set of hardware accelerators for performing baseband-related operations.
+Connectivity to the outside world happens through a block called RFOE which is
+backed by an Ethernet I/O block called CGX or RPM (depending on the chip version).
+RFOE stands for Radio Frequency Over Ethernet and provides support for
+IEEE 1904.3 (RoE) standard.
+
+Device Setup
+
+
+The BPHY CGX/RPM devices will need to be bound to a user-space IO driver for
+use. The ``dpdk-devbind.py`` script included with DPDK can be used to
+view the state of the devices and to bind them to a suitable DPDK-supported
+kernel driver. When querying the status of the devices, they will appear under
+the category of "Misc (rawdev) devices", i.e. the command
+``dpdk-devbind.py --status-dev misc`` can be used to see the state of those
+devices alone.
diff --git a/doc/guides/rawdevs/index.rst b/doc/guides/rawdevs/index.rst
index f64ec4427..7fbae40ea 100644
--- a/doc/guides/rawdevs/index.rst
+++ b/doc/guides/rawdevs/index.rst
@@ -11,6 +11,7 @@ application through rawdev API.
 :maxdepth: 2
 :numbered:
 
+cnxk_bphy
 dpaa2_cmdif
 dpaa2_qdma
 ifpga
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3c..ae70e15d1 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,13 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added Baseband phy CNXK PMD.**
+
+  Added new Baseband phy CGX/RPM PMD which is used for configuring Ethernet I/O
+  interfaces hardwired to Baseband phy subsystem. Configuration happens via
+  standard rawdev enq/deq operations. See the :doc:`../rawdevs/cnxk_bphy`
+  rawdev guide for more details on this driver.
+
 
 Removed Items
 -
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
new file mode 100644
index 0..e537888f9
--- /dev/null
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -0,0 +1,151 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include 
+#include 
+#include 
+
+#include 
+
+struct cnxk_bphy_cgx_queue {
+   unsigned int lmac;
+   /* queue holds up to one response */
+   void *rsp;
+};
+
+struct cnxk_bphy_cgx {
+   struct roc_bphy_cgx *rcgx;
+   struct cnxk_bphy_cgx_queue queues[MAX_LMACS_PER_CGX];
+   unsigned int num_queues;
+};
+
+static void
+cnxk_bphy_cgx_format_name(char *name, unsigned int len,
+ struct rte_pci_device *pci_dev)
+{
+   snprintf(name, len, "BPHY_CGX:%x:%02x.%x", pci_dev->addr.bus,
+pci_dev->addr.devid, pci_dev->addr.function);
+}
+
+static const struct rte_rawdev_ops cnxk_bphy_cgx_rawdev_ops = {
+};
+
+static void
+cnxk_bphy_cgx_init_queues(struct cnxk_bphy_cgx *cgx)
+{
+   struct roc_bphy_cgx *rcgx = cgx->rcgx;
+   unsigned int i;
+
+   for (i = 0; i < RTE_DIM(cgx->queues); i++) {
+   if (!(r

[dpdk-dev] [PATCH v2 10/32] raw/cnxk_bphy: support for reading queue configuration

2021-06-15 Thread Tomasz Duszynski
Add support for reading queue configuration. A single queue represents
a logical mac available on rpm/cgx.
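
A minimal sketch of reading that configuration through the rawdev API (dev_id
is a placeholder; for this PMD the per-queue configuration is a single
unsigned int holding the queue capacity):

#include <rte_rawdev.h>

static int
query_queue_capacity(uint16_t dev_id, uint16_t queue_id, unsigned int *capacity)
{
        return rte_rawdev_queue_conf_get(dev_id, queue_id, capacity,
                                         sizeof(*capacity));
}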

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index e537888f9..016f9f02c 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -27,7 +27,27 @@ cnxk_bphy_cgx_format_name(char *name, unsigned int len,
 pci_dev->addr.devid, pci_dev->addr.function);
 }
 
+static int
+cnxk_bphy_cgx_queue_def_conf(struct rte_rawdev *dev, uint16_t queue_id,
+rte_rawdev_obj_t queue_conf,
+size_t queue_conf_size)
+{
+   unsigned int *conf;
+
+   RTE_SET_USED(dev);
+   RTE_SET_USED(queue_id);
+
+   if (queue_conf_size != sizeof(*conf))
+   return -EINVAL;
+
+   conf = (unsigned int *)queue_conf;
+   *conf = 1;
+
+   return 0;
+}
+
 static const struct rte_rawdev_ops cnxk_bphy_cgx_rawdev_ops = {
+   .queue_def_conf = cnxk_bphy_cgx_queue_def_conf,
 };
 
 static void
-- 
2.25.1



[dpdk-dev] [PATCH v2 11/32] raw/cnxk_bphy: support for reading queue count

2021-06-15 Thread Tomasz Duszynski
Add support for reading the number of available queues, i.e. the number
of available logical macs (LMACs).
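
A minimal sketch of iterating over the queues reported by the PMD (dev_id is
a placeholder):

#include <rte_rawdev.h>

static void
walk_queues(uint16_t dev_id)
{
        uint16_t i, nb_queues = rte_rawdev_queue_count(dev_id);

        for (i = 0; i < nb_queues; i++) {
                /* one queue per lmac; per-queue setup goes here */
        }
}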

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 doc/guides/rawdevs/cnxk_bphy.rst  | 4 
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 9 +
 2 files changed, 13 insertions(+)

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index 96ab68435..d6803e527 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -21,3 +21,7 @@ kernel driver. When querying the status of the devices, they 
will appear under
 the category of "Misc (rawdev) devices", i.e. the command
 ``dpdk-devbind.py --status-dev misc`` can be used to see the state of those
 devices alone.
+
+Before performing actual data transfer one needs to first retrieve number of
+available queues with ``rte_rawdev_queue_count()`` and capacity of each
+using ``rte_rawdev_queue_conf_get()``.
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index 016f9f02c..da4372642 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -46,8 +46,17 @@ cnxk_bphy_cgx_queue_def_conf(struct rte_rawdev *dev, 
uint16_t queue_id,
return 0;
 }
 
+static uint16_t
+cnxk_bphy_cgx_queue_count(struct rte_rawdev *dev)
+{
+   struct cnxk_bphy_cgx *cgx = dev->dev_private;
+
+   return cgx->num_queues;
+}
+
 static const struct rte_rawdev_ops cnxk_bphy_cgx_rawdev_ops = {
.queue_def_conf = cnxk_bphy_cgx_queue_def_conf,
+   .queue_count = cnxk_bphy_cgx_queue_count,
 };
 
 static void
-- 
2.25.1



[dpdk-dev] [PATCH v2 12/32] raw/cnxk_bphy: support for enqueue operation

2021-06-15 Thread Tomasz Duszynski
Add support for enqueueing messages.
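
A minimal sketch of enqueueing a single message (dev_id and queue are
placeholders; the queue index travels through the context argument, matching
the convention used by this PMD):

#include <errno.h>

#include <rte_rawdev.h>

#include "rte_pmd_bphy.h"

static int
send_get_linkinfo(uint16_t dev_id, unsigned int queue)
{
        struct cnxk_bphy_cgx_msg msg = {
                .type = CNXK_BPHY_CGX_MSG_TYPE_GET_LINKINFO,
        };
        struct rte_rawdev_buf buf = { .buf_addr = &msg };
        struct rte_rawdev_buf *bufs[1] = { &buf };
        int ret;

        ret = rte_rawdev_enqueue_buffers(dev_id, bufs, 1,
                                         (void *)(size_t)queue);
        if (ret < 0)
                return ret;

        return ret == 1 ? 0 : -EIO;
}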

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 doc/guides/rawdevs/cnxk_bphy.rst  |  68 
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 112 ++
 drivers/raw/cnxk_bphy/meson.build |   1 +
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h  | 104 
 4 files changed, 285 insertions(+)
 create mode 100644 drivers/raw/cnxk_bphy/rte_pmd_bphy.h

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index d6803e527..4015740f7 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -11,6 +11,13 @@ backed by ethernet I/O block called CGX or RPM (depending on 
the chip version).
 RFOE stands for Radio Frequency Over Ethernet and provides support for
 IEEE 1904.3 (RoE) standard.
 
+Features
+
+
+The BPHY CGX/RPM implements the following features in the rawdev API:
+
+- Access to BPHY CGX/RPM via a set of predefined messages
+
 Device Setup
 
 
@@ -25,3 +32,64 @@ devices alone.
 Before performing actual data transfer one needs to first retrieve number of
 available queues with ``rte_rawdev_queue_count()`` and capacity of each
 using ``rte_rawdev_queue_conf_get()``.
+
+To perform data transfer use standard ``rte_rawdev_enqueue_buffers()`` and
+``rte_rawdev_dequeue_buffers()`` APIs. Not all messages produce sensible
+responses hence dequeueing is not always necessary.
+
+BPHY CGX/RPM PMD accepts ``struct cnxk_bphy_cgx_msg`` messages which differ by 
type and payload.
+Message types along with description are listed below.
+
+Get link information
+
+
+Message is used to get information about link state.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_GET_LINKINFO``. In 
response one will
+get message containing payload i.e ``struct cnxk_bphy_cgx_msg_link_info`` 
filled with information
+about current link state.
+
+Change internal loopback state
+~~
+
+Message is used to enable or disable internal loopback.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_INTLBK_ENABLE`` or
+``CNXK_BPHY_CGX_MSG_TYPE_INTLBK_DISABLE``. Former will activate internal 
loopback while the latter
+will do the opposite.
+
+Change PTP RX state
+~~~
+
+Message is used to enable or disable PTP mode.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_PTP_RX_ENABLE`` or
+``CNXK_BPHY_CGX_MSG_TYPE_PTP_RX_DISABLE``. Former will enable PTP while the 
latter will do the
+opposite.
+
+Set link mode
+~
+
+Message is used to change link mode.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_SET_LINK_MODE``. Prior 
to sending actual
+message payload i.e ``struct cnxk_bphy_cgx_msg_link_mode`` needs to be filled 
with relevant
+information.
+
+Change link state
+~
+
+Message is used to set link up or down.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_SET_LINK_STATE``. Prior 
to sending actual
+message payload i.e ``struct cnxk_bphy_cgx_msg_set_link_state`` needs to be 
filled with relevant
+information.
+
+Start or stop RX/TX
+~~~
+
+Message is used to start or stop accepting traffic.
+
+Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_START_RXTX`` or
+``CNXK_BPHY_CGX_MSG_TYPE_STOP_RXTX``. Former will enable traffic while the 
latter will
+do the opposite.
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index da4372642..637514406 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -1,12 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(C) 2021 Marvell.
  */
+#include 
+
 #include 
 #include 
 #include 
 
 #include 
 
+#include "rte_pmd_bphy.h"
+
 struct cnxk_bphy_cgx_queue {
unsigned int lmac;
/* queue holds up to one response */
@@ -46,6 +50,113 @@ cnxk_bphy_cgx_queue_def_conf(struct rte_rawdev *dev, 
uint16_t queue_id,
return 0;
 }
 
+static int
+cnxk_bphy_cgx_process_buf(struct cnxk_bphy_cgx *cgx, unsigned int queue,
+ struct rte_rawdev_buf *buf)
+{
+   struct cnxk_bphy_cgx_queue *qp = &cgx->queues[queue];
+   struct cnxk_bphy_cgx_msg_set_link_state *link_state;
+   struct cnxk_bphy_cgx_msg *msg = buf->buf_addr;
+   struct cnxk_bphy_cgx_msg_link_mode *link_mode;
+   struct cnxk_bphy_cgx_msg_link_info *link_info;
+   struct roc_bphy_cgx_link_info rlink_info;
+   struct roc_bphy_cgx_link_mode rlink_mode;
+   unsigned int lmac = qp->lmac;
+   void *rsp = NULL;
+   int ret;
+
+   switch (msg->type) {
+   case CNXK_BPHY_CGX_MSG_TYPE_GET_LINKINFO:
+   memset(&rlink_info, 0, sizeof(rlink_info));
+   ret = roc_bphy_cgx_get_linkinfo(cgx->rcgx, lmac, &rlink_info);
+   if (ret)
+   break;
+
+   link_info = rte_zmalloc(NULL, sizeof(*link_info), 0);
+   if 

[dpdk-dev] [PATCH v2 13/32] raw/cnxk_bphy: support for dequeue operation

2021-06-15 Thread Tomasz Duszynski
Add support for dequeueing responses to previously
enqueued messages.
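
A minimal sketch of draining a response (dev_id and queue are placeholders;
responses are allocated by the PMD, so the caller is expected to free them
with rte_free() once consumed):

#include <rte_rawdev.h>

static int
recv_one(uint16_t dev_id, unsigned int queue, void **rsp)
{
        struct rte_rawdev_buf buf;
        struct rte_rawdev_buf *bufs[1] = { &buf };
        int ret;

        ret = rte_rawdev_dequeue_buffers(dev_id, bufs, 1,
                                         (void *)(size_t)queue);
        if (ret == 1)
                *rsp = buf.buf_addr;

        /* 1 - got a response, 0 - queue empty, negative - error */
        return ret;
}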

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index 637514406..a8eafae1b 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -157,6 +157,32 @@ cnxk_bphy_cgx_enqueue_bufs(struct rte_rawdev *dev,
return 1;
 }
 
+static int
+cnxk_bphy_cgx_dequeue_bufs(struct rte_rawdev *dev,
+  struct rte_rawdev_buf **buffers, unsigned int count,
+  rte_rawdev_obj_t context)
+{
+   struct cnxk_bphy_cgx *cgx = dev->dev_private;
+   unsigned int queue = (size_t)context;
+   struct cnxk_bphy_cgx_queue *qp;
+
+   if (queue >= cgx->num_queues)
+   return -EINVAL;
+
+   if (count == 0)
+   return 0;
+
+   qp = &cgx->queues[queue];
+   if (qp->rsp) {
+   buffers[0]->buf_addr = qp->rsp;
+   qp->rsp = NULL;
+
+   return 1;
+   }
+
+   return 0;
+}
+
 static uint16_t
 cnxk_bphy_cgx_queue_count(struct rte_rawdev *dev)
 {
@@ -168,6 +194,7 @@ cnxk_bphy_cgx_queue_count(struct rte_rawdev *dev)
 static const struct rte_rawdev_ops cnxk_bphy_cgx_rawdev_ops = {
.queue_def_conf = cnxk_bphy_cgx_queue_def_conf,
.enqueue_bufs = cnxk_bphy_cgx_enqueue_bufs,
+   .dequeue_bufs = cnxk_bphy_cgx_dequeue_bufs,
.queue_count = cnxk_bphy_cgx_queue_count,
 };
 
-- 
2.25.1



[dpdk-dev] [PATCH v2 14/32] raw/cnxk_bphy: support for performing selftest

2021-06-15 Thread Tomasz Duszynski
Add support for performing selftest operation.
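
A minimal sketch of invoking the selftest from an application (the BDF in the
device name is a placeholder; error handling is omitted for brevity):

#include <rte_rawdev.h>

static int
run_bphy_cgx_selftest(void)
{
        uint16_t dev_id = rte_rawdev_get_dev_id("BPHY_CGX:2:00.0");

        return rte_rawdev_selftest(dev_id);
}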

Signed-off-by: Tomasz Duszynski 
Signed-off-by: Jakub Palider 
---
 doc/guides/rawdevs/cnxk_bphy.rst   |  19 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c  |   2 +
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h  |  10 +
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c | 206 +
 drivers/raw/cnxk_bphy/meson.build  |   1 +
 5 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index 4015740f7..2edd814da 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -38,7 +38,8 @@ To perform data transfer use standard 
``rte_rawdev_enqueue_buffers()`` and
 responses hence dequeueing is not always necessary.
 
 BPHY CGX/RPM PMD accepts ``struct cnxk_bphy_cgx_msg`` messages which differ by 
type and payload.
-Message types along with description are listed below.
+Message types along with description are listed below. As for the usage 
examples please refer to
+``cnxk_bphy_cgx_dev_selftest()``.
 
 Get link information
 
@@ -93,3 +94,19 @@ Message is used to start or stop accepting traffic.
 Message must have type set to ``CNXK_BPHY_CGX_MSG_TYPE_START_RXTX`` or
 ``CNXK_BPHY_CGX_MSG_TYPE_STOP_RXTX``. Former will enable traffic while the 
latter will
 do the opposite.
+
+Self test
+-
+
+On EAL initialization, BPHY CGX/RPM devices will be probed and populated into
+the raw devices. The rawdev ID of the device can be obtained using invocation
+of ``rte_rawdev_get_dev_id("NAME:x")`` from the test application, where:
+
+- NAME is the desired subsystem: use "BPHY_CGX" for
+  RFOE module,
+- x is the device's bus id specified in "bus:device.func" (BDF) format.
+
+Use this identifier for further rawdev function calls.
+
+The driver's selftest rawdev API can be used to verify the BPHY CGX/RPM
+functionality.
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index a8eafae1b..3da224414 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -9,6 +9,7 @@
 
 #include 
 
+#include "cnxk_bphy_cgx.h"
 #include "rte_pmd_bphy.h"
 
 struct cnxk_bphy_cgx_queue {
@@ -196,6 +197,7 @@ static const struct rte_rawdev_ops cnxk_bphy_cgx_rawdev_ops 
= {
.enqueue_bufs = cnxk_bphy_cgx_enqueue_bufs,
.dequeue_bufs = cnxk_bphy_cgx_dequeue_bufs,
.queue_count = cnxk_bphy_cgx_queue_count,
+   .dev_selftest = cnxk_bphy_cgx_dev_selftest,
 };
 
 static void
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h
new file mode 100644
index 0..fb6b31bf4
--- /dev/null
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#ifndef _CNXK_BPHY_CGX_H_
+#define _CNXK_BPHY_CGX_H_
+
+int cnxk_bphy_cgx_dev_selftest(uint16_t dev_id);
+
+#endif /* _CNXK_BPHY_CGX_H_ */
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c
new file mode 100644
index 0..cb4dd4b22
--- /dev/null
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "cnxk_bphy_cgx.h"
+#include "rte_pmd_bphy.h"
+
+static int
+cnxk_bphy_cgx_enq_msg(uint16_t dev_id, unsigned int queue, void *msg)
+{
+   struct rte_rawdev_buf *bufs[1];
+   struct rte_rawdev_buf buf;
+   void *q;
+   int ret;
+
+   q = (void *)(size_t)queue;
+   buf.buf_addr = msg;
+   bufs[0] = &buf;
+
+   ret = rte_rawdev_enqueue_buffers(dev_id, bufs, 1, q);
+   if (ret < 0)
+   return ret;
+   if (ret != 1)
+   return -EIO;
+
+   return 0;
+}
+
+static int
+cnxk_bphy_cgx_deq_msg(uint16_t dev_id, unsigned int queue, void **msg)
+{
+   struct rte_rawdev_buf *bufs[1];
+   struct rte_rawdev_buf buf;
+   void *q;
+   int ret;
+
+   q = (void *)(size_t)queue;
+   bufs[0] = &buf;
+
+   ret = rte_rawdev_dequeue_buffers(dev_id, bufs, 1, q);
+   if (ret < 0)
+   return ret;
+   if (ret != 1)
+   return -EIO;
+
+   *msg = buf.buf_addr;
+
+   return 0;
+}
+
+static int
+cnxk_bphy_cgx_link_cond(uint16_t dev_id, unsigned int queue, int cond)
+{
+   int tries = 10, ret;
+
+   do {
+   struct cnxk_bphy_cgx_msg_link_info *link_info = NULL;
+   struct cnxk_bphy_cgx_msg msg;
+
+   msg.type = CNXK_BPHY_CGX_MSG_TYPE_GET_LINKINFO;
+   ret = cnxk_bphy_cgx_enq_msg(dev_id, queue, &msg);
+   if (ret)
+   return ret;
+
+   ret = cnxk_bphy_cgx_deq_msg(dev_id, queue,

[dpdk-dev] [PATCH v2 15/32] common/cnxk: support for device init and fini

2021-06-15 Thread Tomasz Duszynski
Add support for device init and fini. It merely saves the
baseband phy state container in a globally accessible
resource chest.
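
A minimal sketch of attaching a probed device (pci_dev is the bus-provided
placeholder; roc_bphy is caller-allocated state):

#include "roc_api.h"

static int
bphy_attach(struct roc_bphy *roc_bphy, struct plt_pci_device *pci_dev)
{
        roc_bphy->pci_dev = pci_dev;

        return roc_bphy_dev_init(roc_bphy);
}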

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/meson.build |  1 +
 drivers/common/cnxk/roc_api.h   |  4 +++
 drivers/common/cnxk/roc_bphy.c  | 40 +
 drivers/common/cnxk/roc_bphy.h  | 17 
 drivers/common/cnxk/roc_idev.c  |  1 +
 drivers/common/cnxk/roc_idev_priv.h |  2 ++
 drivers/common/cnxk/version.map |  2 ++
 7 files changed, 67 insertions(+)
 create mode 100644 drivers/common/cnxk/roc_bphy.c
 create mode 100644 drivers/common/cnxk/roc_bphy.h

diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 59975fd34..946b98f46 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -11,6 +11,7 @@ endif
 config_flag_fmt = 'RTE_LIBRTE_@0@_COMMON'
 deps = ['eal', 'pci', 'bus_pci', 'mbuf']
 sources = files(
+'roc_bphy.c',
 'roc_bphy_cgx.c',
 'roc_dev.c',
 'roc_idev.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 256d8c68d..dd0047873 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -50,6 +50,7 @@
 #define PCI_DEVID_CNXK_EP_VF 0xB203
 #define PCI_DEVID_CNXK_RVU_SDP_PF 0xA0f6
 #define PCI_DEVID_CNXK_RVU_SDP_VF 0xA0f7
+#define PCI_DEVID_CNXK_BPHY  0xA089
 
 #define PCI_DEVID_CN9K_CGX  0xA059
 #define PCI_DEVID_CN10K_RPM 0xA060
@@ -103,4 +104,7 @@
 /* Baseband phy cgx */
 #include "roc_bphy_cgx.h"
 
+/* Baseband phy */
+#include "roc_bphy.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_bphy.c b/drivers/common/cnxk/roc_bphy.c
new file mode 100644
index 0..77606d646
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy.c
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+int
+roc_bphy_dev_init(struct roc_bphy *roc_bphy)
+{
+   struct idev_cfg *idev;
+
+   idev = idev_get_cfg();
+   if (!idev)
+   return -ENODEV;
+
+   if (!roc_bphy || !roc_bphy->pci_dev)
+   return -EINVAL;
+
+   idev->bphy = roc_bphy;
+
+   return 0;
+}
+
+int
+roc_bphy_dev_fini(struct roc_bphy *roc_bphy)
+{
+   struct idev_cfg *idev;
+
+   idev = idev_get_cfg();
+   if (!idev)
+   return -ENODEV;
+
+   if (!roc_bphy)
+   return -EINVAL;
+
+   idev->bphy = NULL;
+
+   return 0;
+}
diff --git a/drivers/common/cnxk/roc_bphy.h b/drivers/common/cnxk/roc_bphy.h
new file mode 100644
index 0..0579c6c44
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _ROC_BPHY_
+#define _ROC_BPHY_
+
+#include "roc_api.h"
+
+struct roc_bphy {
+   struct plt_pci_device *pci_dev;
+} __plt_cache_aligned;
+
+int __roc_api roc_bphy_dev_init(struct roc_bphy *roc_bphy);
+int __roc_api roc_bphy_dev_fini(struct roc_bphy *roc_bphy);
+
+#endif /* _ROC_BPHY_ */
diff --git a/drivers/common/cnxk/roc_idev.c b/drivers/common/cnxk/roc_idev.c
index 63cc04044..4d7b53422 100644
--- a/drivers/common/cnxk/roc_idev.c
+++ b/drivers/common/cnxk/roc_idev.c
@@ -36,6 +36,7 @@ idev_set_defaults(struct idev_cfg *idev)
idev->lmt_pf_func = 0;
idev->lmt_base_addr = 0;
idev->num_lmtlines = 0;
+   idev->bphy = NULL;
__atomic_store_n(&idev->npa_refcnt, 0, __ATOMIC_RELEASE);
 }
 
diff --git a/drivers/common/cnxk/roc_idev_priv.h 
b/drivers/common/cnxk/roc_idev_priv.h
index ff10a905c..384f667ea 100644
--- a/drivers/common/cnxk/roc_idev_priv.h
+++ b/drivers/common/cnxk/roc_idev_priv.h
@@ -7,6 +7,7 @@
 
 /* Intra device related functions */
 struct npa_lf;
+struct roc_bphy;
 struct idev_cfg {
uint16_t sso_pf_func;
uint16_t npa_pf_func;
@@ -16,6 +17,7 @@ struct idev_cfg {
uint16_t lmt_pf_func;
uint16_t num_lmtlines;
uint64_t lmt_base_addr;
+   struct roc_bphy *bphy;
 };
 
 /* Generic */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 0ad805dba..25083d9d4 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -20,6 +20,8 @@ INTERNAL {
roc_bphy_cgx_set_link_state;
roc_bphy_cgx_start_rxtx;
roc_bphy_cgx_stop_rxtx;
+   roc_bphy_dev_fini;
+   roc_bphy_dev_init;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 16/32] common/cnxk: support for baseband PHY irq setup

2021-06-15 Thread Tomasz Duszynski
Add support for initializing baseband phy irqs. While at it
also add support for reverting back to the default state.
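
A minimal sketch of bringing the irq chip up and tearing it down again
(requires access to /dev/otx-bphy-ctr, as the implementation below shows):

#include <errno.h>

#include "roc_api.h"
#include "roc_bphy_irq.h"

static int
irq_chip_roundtrip(void)
{
        struct roc_bphy_irq_chip *chip = roc_bphy_intr_init();

        if (chip == NULL)
                return -ENODEV;

        /* interrupt handlers would be installed here */

        roc_bphy_intr_fini(chip);

        return 0;
}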

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/meson.build|  1 +
 drivers/common/cnxk/roc_bphy_irq.c | 96 ++
 drivers/common/cnxk/roc_bphy_irq.h | 27 +
 drivers/common/cnxk/version.map|  2 +
 4 files changed, 126 insertions(+)
 create mode 100644 drivers/common/cnxk/roc_bphy_irq.c
 create mode 100644 drivers/common/cnxk/roc_bphy_irq.h

diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 946b98f46..c0ec54932 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -13,6 +13,7 @@ deps = ['eal', 'pci', 'bus_pci', 'mbuf']
 sources = files(
 'roc_bphy.c',
 'roc_bphy_cgx.c',
+'roc_bphy_irq.c',
 'roc_dev.c',
 'roc_idev.c',
 'roc_irq.c',
diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
new file mode 100644
index 0..c57506542
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -0,0 +1,96 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include 
+#include 
+#include 
+
+#include "roc_api.h"
+#include "roc_bphy_irq.h"
+
+#define ROC_BPHY_MEMZONE_NAME "roc_bphy_mz"
+#define ROC_BPHY_CTR_DEV_PATH "/dev/otx-bphy-ctr"
+
+#define ROC_BPHY_IOC_MAGIC 0xF3
+#define ROC_BPHY_IOC_GET_BPHY_MAX_IRQ  _IOR(ROC_BPHY_IOC_MAGIC, 3, uint64_t)
+#define ROC_BPHY_IOC_GET_BPHY_BMASK_IRQ _IOR(ROC_BPHY_IOC_MAGIC, 4, uint64_t)
+
+struct roc_bphy_irq_chip *
+roc_bphy_intr_init(void)
+{
+   struct roc_bphy_irq_chip *irq_chip;
+   uint64_t max_irq, i, avail_irqs;
+   int fd, ret;
+
+   fd = open(ROC_BPHY_CTR_DEV_PATH, O_RDWR | O_SYNC);
+   if (fd < 0) {
+   plt_err("Failed to open %s", ROC_BPHY_CTR_DEV_PATH);
+   return NULL;
+   }
+
+   ret = ioctl(fd, ROC_BPHY_IOC_GET_BPHY_MAX_IRQ, &max_irq);
+   if (ret < 0) {
+   plt_err("Failed to get max irq number via ioctl");
+   goto err_ioctl;
+   }
+
+   ret = ioctl(fd, ROC_BPHY_IOC_GET_BPHY_BMASK_IRQ, &avail_irqs);
+   if (ret < 0) {
+   plt_err("Failed to get available irqs bitmask via ioctl");
+   goto err_ioctl;
+   }
+
+   irq_chip = plt_zmalloc(sizeof(*irq_chip), 0);
+   if (irq_chip == NULL) {
+   plt_err("Failed to alloc irq_chip");
+   goto err_alloc_chip;
+   }
+
+   irq_chip->intfd = fd;
+   irq_chip->max_irq = max_irq;
+   irq_chip->avail_irq_bmask = avail_irqs;
+   irq_chip->irq_vecs =
+   plt_zmalloc(irq_chip->max_irq * sizeof(*irq_chip->irq_vecs), 0);
+   if (irq_chip->irq_vecs == NULL) {
+   plt_err("Failed to alloc irq_chip irq_vecs");
+   goto err_alloc_irq;
+   }
+
+   irq_chip->mz_name = plt_zmalloc(strlen(ROC_BPHY_MEMZONE_NAME) + 1, 0);
+   if (irq_chip->mz_name == NULL) {
+   plt_err("Failed to alloc irq_chip name");
+   goto err_alloc_name;
+   }
+   plt_strlcpy(irq_chip->mz_name, ROC_BPHY_MEMZONE_NAME,
+   strlen(ROC_BPHY_MEMZONE_NAME) + 1);
+
+   for (i = 0; i < irq_chip->max_irq; i++) {
+   irq_chip->irq_vecs[i].fd = -1;
+   irq_chip->irq_vecs[i].handler_cpu = -1;
+   }
+
+   return irq_chip;
+
+err_alloc_name:
+   plt_free(irq_chip->irq_vecs);
+
+err_alloc_irq:
+   plt_free(irq_chip);
+
+err_ioctl:
+err_alloc_chip:
+   close(fd);
+   return NULL;
+}
+
+void
+roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip)
+{
+   if (irq_chip == NULL)
+   return;
+
+   close(irq_chip->intfd);
+   plt_free(irq_chip->mz_name);
+   plt_free(irq_chip->irq_vecs);
+   plt_free(irq_chip);
+}
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
new file mode 100644
index 0..b5200786b
--- /dev/null
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#ifndef _ROC_BPHY_IRQ_
+#define _ROC_BPHY_IRQ_
+
+struct roc_bphy_irq_vec {
+   int fd;
+   int handler_cpu;
+   void (*handler)(int irq_num, void *isr_data);
+   void *isr_data;
+};
+
+struct roc_bphy_irq_chip {
+   struct roc_bphy_irq_vec *irq_vecs;
+   uint64_t max_irq;
+   uint64_t avail_irq_bmask;
+   int intfd;
+   int n_handlers;
+   char *mz_name;
+};
+
+__roc_api struct roc_bphy_irq_chip *roc_bphy_intr_init(void);
+__roc_api void roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip);
+
+#endif /* _ROC_BPHY_IRQ_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 25083d9d4..483e52018 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -22,6 +22,8 @@ INTERNAL {
 

[dpdk-dev] [PATCH v2 17/32] common/cnxk: support for checking irq availability

2021-06-15 Thread Tomasz Duszynski
Add support for checking whether a given irq is available.
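
A minimal sketch of probing which interrupt lines can actually be used
(chip comes from roc_bphy_intr_init()):

#include "roc_api.h"
#include "roc_bphy_irq.h"

static unsigned int
count_available_irqs(struct roc_bphy_irq_chip *chip)
{
        unsigned int i, n = 0;

        for (i = 0; i < chip->max_irq; i++)
                if (roc_bphy_intr_available(chip, i))
                        n++;

        return n;
}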

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c | 9 +
 drivers/common/cnxk/roc_bphy_irq.h | 2 ++
 drivers/common/cnxk/version.map| 1 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index c57506542..bea2b7f73 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -94,3 +94,12 @@ roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip)
plt_free(irq_chip->irq_vecs);
plt_free(irq_chip);
 }
+
+bool
+roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip, int irq_num)
+{
+   if (irq_num < 0 || (uint64_t)irq_num >= irq_chip->max_irq)
+   return false;
+
+   return irq_chip->avail_irq_bmask & BIT(irq_num);
+}
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
index b5200786b..f481f4456 100644
--- a/drivers/common/cnxk/roc_bphy_irq.h
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -23,5 +23,7 @@ struct roc_bphy_irq_chip {
 
 __roc_api struct roc_bphy_irq_chip *roc_bphy_intr_init(void);
 __roc_api void roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip);
+__roc_api bool roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip,
+  int irq_num);
 
 #endif /* _ROC_BPHY_IRQ_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 483e52018..427321c41 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -22,6 +22,7 @@ INTERNAL {
roc_bphy_cgx_stop_rxtx;
roc_bphy_dev_fini;
roc_bphy_dev_init;
+   roc_bphy_intr_available;
roc_bphy_intr_fini;
roc_bphy_intr_init;
roc_clk_freq_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 18/32] common/cnxk: support for retrieving irq stack

2021-06-15 Thread Tomasz Duszynski
Add support for retrieving an irq stack. If the stack does not exist
then it gets created.
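
A minimal sketch of taking a per-cpu stack reference (cpu is a placeholder;
the returned pointer is later handed to the kernel driver together with the
interrupt handler):

#include "roc_api.h"
#include "roc_bphy_irq.h"

static void *
get_isr_stack(int cpu)
{
        void *sp = roc_bphy_irq_stack_get(cpu);

        if (sp == NULL)
                plt_err("No irq stack for cpu %d", cpu);

        return sp;
}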

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c | 62 ++
 drivers/common/cnxk/roc_bphy_irq.h |  1 +
 drivers/common/cnxk/version.map|  1 +
 3 files changed, 64 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index bea2b7f73..04ad129ac 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -2,12 +2,21 @@
  * Copyright(C) 2021 Marvell.
  */
 #include 
+#include 
 #include 
+#include 
 #include 
 
 #include "roc_api.h"
 #include "roc_bphy_irq.h"
 
+struct roc_bphy_irq_stack {
+   STAILQ_ENTRY(roc_bphy_irq_stack) entries;
+   void *sp_buffer;
+   int cpu;
+   int inuse;
+};
+
 #define ROC_BPHY_MEMZONE_NAME "roc_bphy_mz"
 #define ROC_BPHY_CTR_DEV_PATH "/dev/otx-bphy-ctr"
 
@@ -15,6 +24,12 @@
 #define ROC_BPHY_IOC_GET_BPHY_MAX_IRQ  _IOR(ROC_BPHY_IOC_MAGIC, 3, uint64_t)
 #define ROC_BPHY_IOC_GET_BPHY_BMASK_IRQ _IOR(ROC_BPHY_IOC_MAGIC, 4, uint64_t)
 
+static STAILQ_HEAD(slisthead, roc_bphy_irq_stack)
+   irq_stacks = STAILQ_HEAD_INITIALIZER(irq_stacks);
+
+/* Note: it is assumed that as for now there is no multiprocess support */
+static pthread_mutex_t stacks_mutex = PTHREAD_MUTEX_INITIALIZER;
+
 struct roc_bphy_irq_chip *
 roc_bphy_intr_init(void)
 {
@@ -95,6 +110,53 @@ roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip)
plt_free(irq_chip);
 }
 
+void *
+roc_bphy_irq_stack_get(int cpu)
+{
+#define ARM_STACK_ALIGNMENT (2 * sizeof(void *))
+#define IRQ_ISR_STACK_SIZE  0x20
+
+   struct roc_bphy_irq_stack *curr_stack;
+   void *retval = NULL;
+
+   if (pthread_mutex_lock(&stacks_mutex))
+   return NULL;
+
+   STAILQ_FOREACH(curr_stack, &irq_stacks, entries) {
+   if (curr_stack->cpu == cpu) {
+   curr_stack->inuse++;
+   retval = ((char *)curr_stack->sp_buffer) +
+IRQ_ISR_STACK_SIZE;
+   goto found_stack;
+   }
+   }
+
+   curr_stack = plt_zmalloc(sizeof(struct roc_bphy_irq_stack), 0);
+   if (curr_stack == NULL)
+   goto err_stack;
+
+   curr_stack->sp_buffer =
+   plt_zmalloc(IRQ_ISR_STACK_SIZE * 2, ARM_STACK_ALIGNMENT);
+   if (curr_stack->sp_buffer == NULL)
+   goto err_buffer;
+
+   curr_stack->cpu = cpu;
+   curr_stack->inuse = 0;
+   STAILQ_INSERT_TAIL(&irq_stacks, curr_stack, entries);
+   retval = ((char *)curr_stack->sp_buffer) + IRQ_ISR_STACK_SIZE;
+
+found_stack:
+   pthread_mutex_unlock(&stacks_mutex);
+   return retval;
+
+err_buffer:
+   plt_free(curr_stack);
+
+err_stack:
+   pthread_mutex_unlock(&stacks_mutex);
+   return NULL;
+}
+
 bool
 roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip, int irq_num)
 {
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
index f481f4456..e66b2aa7c 100644
--- a/drivers/common/cnxk/roc_bphy_irq.h
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -23,6 +23,7 @@ struct roc_bphy_irq_chip {
 
 __roc_api struct roc_bphy_irq_chip *roc_bphy_intr_init(void);
 __roc_api void roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip);
+__roc_api void *roc_bphy_irq_stack_get(int cpu);
 __roc_api bool roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip,
   int irq_num);
 
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 427321c41..542364926 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -25,6 +25,7 @@ INTERNAL {
roc_bphy_intr_available;
roc_bphy_intr_fini;
roc_bphy_intr_init;
+   roc_bphy_irq_stack_get;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 19/32] common/cnxk: support for removing irq stack

2021-06-15 Thread Tomasz Duszynski
Add support for removing an existing irq stack.
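
A minimal sketch of dropping the reference taken earlier; the backing buffer
is freed once the last user on that cpu releases it:

#include "roc_api.h"
#include "roc_bphy_irq.h"

static void
put_isr_stack(int cpu)
{
        roc_bphy_irq_stack_remove(cpu);
}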

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c | 30 ++
 drivers/common/cnxk/roc_bphy_irq.h |  1 +
 drivers/common/cnxk/version.map|  1 +
 3 files changed, 32 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index 04ad129ac..a90c055ff 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -110,6 +110,36 @@ roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip)
plt_free(irq_chip);
 }
 
+void
+roc_bphy_irq_stack_remove(int cpu)
+{
+   struct roc_bphy_irq_stack *curr_stack;
+
+   if (pthread_mutex_lock(&stacks_mutex))
+   return;
+
+   STAILQ_FOREACH(curr_stack, &irq_stacks, entries) {
+   if (curr_stack->cpu == cpu)
+   break;
+   }
+
+   if (curr_stack == NULL)
+   goto leave;
+
+   if (curr_stack->inuse > 0)
+   curr_stack->inuse--;
+
+   if (curr_stack->inuse == 0) {
+   STAILQ_REMOVE(&irq_stacks, curr_stack, roc_bphy_irq_stack,
+ entries);
+   plt_free(curr_stack->sp_buffer);
+   plt_free(curr_stack);
+   }
+
+leave:
+   pthread_mutex_unlock(&stacks_mutex);
+}
+
 void *
 roc_bphy_irq_stack_get(int cpu)
 {
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
index e66b2aa7c..549a84a7d 100644
--- a/drivers/common/cnxk/roc_bphy_irq.h
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -23,6 +23,7 @@ struct roc_bphy_irq_chip {
 
 __roc_api struct roc_bphy_irq_chip *roc_bphy_intr_init(void);
 __roc_api void roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip);
+__roc_api void roc_bphy_irq_stack_remove(int cpu);
 __roc_api void *roc_bphy_irq_stack_get(int cpu);
 __roc_api bool roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip,
   int irq_num);
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 542364926..78601fe31 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -26,6 +26,7 @@ INTERNAL {
roc_bphy_intr_fini;
roc_bphy_intr_init;
roc_bphy_irq_stack_get;
+   roc_bphy_irq_stack_remove;
roc_clk_freq_get;
roc_error_msg_get;
roc_idev_lmt_base_addr_get;
-- 
2.25.1



[dpdk-dev] [PATCH v2 20/32] common/cnxk: support for setting bphy irq handler

2021-06-15 Thread Tomasz Duszynski
Add support for setting a custom baseband phy irq handler.
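
A minimal sketch of installing a handler (irq_num and isr_data are
placeholders; the handler runs on the dedicated stack set up by the preceding
patches, so it should stay short):

#include <errno.h>

#include "roc_api.h"
#include "roc_bphy_irq.h"

static void
example_isr(int irq_num, void *isr_data)
{
        /* minimal work only; isr_data carries user context */
        (void)irq_num;
        (void)isr_data;
}

static int
install_isr(struct roc_bphy_irq_chip *chip, int irq_num, void *isr_data)
{
        if (!roc_bphy_intr_available(chip, irq_num))
                return -ENOTSUP;

        return roc_bphy_irq_handler_set(chip, irq_num, example_isr, isr_data);
}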

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c   | 121 +++
 drivers/common/cnxk/roc_bphy_irq.h   |   5 ++
 drivers/common/cnxk/roc_io.h |   9 ++
 drivers/common/cnxk/roc_io_generic.h |   5 ++
 drivers/common/cnxk/version.map  |   2 +
 5 files changed, 142 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index a90c055ff..f988abf51 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -4,12 +4,22 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include "roc_api.h"
 #include "roc_bphy_irq.h"
 
+#define roc_cpuset_t cpu_set_t
+
+struct roc_bphy_irq_usr_data {
+   uint64_t isr_base;
+   uint64_t sp;
+   uint64_t cpu;
+   uint64_t irq_num;
+};
+
 struct roc_bphy_irq_stack {
STAILQ_ENTRY(roc_bphy_irq_stack) entries;
void *sp_buffer;
@@ -21,6 +31,8 @@ struct roc_bphy_irq_stack {
 #define ROC_BPHY_CTR_DEV_PATH "/dev/otx-bphy-ctr"
 
 #define ROC_BPHY_IOC_MAGIC 0xF3
+#define ROC_BPHY_IOC_SET_BPHY_HANDLER  
\
+   _IOW(ROC_BPHY_IOC_MAGIC, 1, struct roc_bphy_irq_usr_data)
 #define ROC_BPHY_IOC_GET_BPHY_MAX_IRQ  _IOR(ROC_BPHY_IOC_MAGIC, 3, uint64_t)
 #define ROC_BPHY_IOC_GET_BPHY_BMASK_IRQ _IOR(ROC_BPHY_IOC_MAGIC, 4, uint64_t)
 
@@ -187,6 +199,115 @@ roc_bphy_irq_stack_get(int cpu)
return NULL;
 }
 
+void
+roc_bphy_intr_handler(unsigned int irq_num)
+{
+   struct roc_bphy_irq_chip *irq_chip;
+   const struct plt_memzone *mz;
+
+   mz = plt_memzone_lookup(ROC_BPHY_MEMZONE_NAME);
+   if (mz == NULL)
+   return;
+
+   irq_chip = *(struct roc_bphy_irq_chip **)mz->addr;
+   if (irq_chip == NULL)
+   return;
+
+   if (irq_chip->irq_vecs[irq_num].handler != NULL)
+   irq_chip->irq_vecs[irq_num].handler(
+   (int)irq_num, irq_chip->irq_vecs[irq_num].isr_data);
+
+   roc_atf_ret();
+}
+
+int
+roc_bphy_irq_handler_set(struct roc_bphy_irq_chip *chip, int irq_num,
+void (*isr)(int irq_num, void *isr_data),
+void *isr_data)
+{
+   roc_cpuset_t orig_cpuset, intr_cpuset;
+   struct roc_bphy_irq_usr_data irq_usr;
+   const struct plt_memzone *mz;
+   int i, retval, curr_cpu, rc;
+   char *env;
+
+   mz = plt_memzone_lookup(chip->mz_name);
+   if (mz == NULL) {
+   /* what we want is just a pointer to chip, not object itself */
+   mz = plt_memzone_reserve_cache_align(chip->mz_name,
+sizeof(chip));
+   if (mz == NULL)
+   return -ENOMEM;
+   }
+
+   if (chip->irq_vecs[irq_num].handler != NULL)
+   return -EINVAL;
+
+   rc = pthread_getaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (rc < 0) {
+   plt_err("Failed to get affinity mask");
+   return rc;
+   }
+
+   for (curr_cpu = -1, i = 0; i < CPU_SETSIZE; i++)
+   if (CPU_ISSET(i, &orig_cpuset))
+   curr_cpu = i;
+   if (curr_cpu < 0)
+   return -ENOENT;
+
+   CPU_ZERO(&intr_cpuset);
+   CPU_SET(curr_cpu, &intr_cpuset);
+   retval = pthread_setaffinity_np(pthread_self(), sizeof(intr_cpuset),
+   &intr_cpuset);
+   if (rc < 0) {
+   plt_err("Failed to set affinity mask");
+   return rc;
+   }
+
+   irq_usr.isr_base = (uint64_t)roc_bphy_intr_handler;
+   irq_usr.sp = (uint64_t)roc_bphy_irq_stack_get(curr_cpu);
+   irq_usr.cpu = curr_cpu;
+   if (irq_usr.sp == 0) {
+   rc = pthread_setaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (rc < 0)
+   plt_err("Failed to restore affinity mask");
+   return rc;
+   }
+
+   /* On simulator memory locking operation takes much time. We want
+* to skip this when running in such an environment.
+*/
+   env = getenv("BPHY_INTR_MLOCK_DISABLE");
+   if (env == NULL) {
+   rc = mlockall(MCL_CURRENT | MCL_FUTURE);
+   if (rc < 0)
+   plt_warn("Failed to lock memory into RAM");
+   }
+
+   *((struct roc_bphy_irq_chip **)(mz->addr)) = chip;
+   irq_usr.irq_num = irq_num;
+   chip->irq_vecs[irq_num].handler_cpu = curr_cpu;
+   chip->irq_vecs[irq_num].handler = isr;
+   chip->irq_vecs[irq_num].isr_data = isr_data;
+   retval = ioctl(chip->intfd, ROC_BPHY_IOC_SET_BPHY_HANDLER, &irq_usr);
+   if (retval != 0) {
+   roc_bphy_irq_stack_remove(curr_cpu);
+

[dpdk-dev] [PATCH v2 21/32] common/cnxk: support for clearing bphy irq handler

2021-06-15 Thread Tomasz Duszynski
Add support for clearing a previously registered baseband phy irq handler.
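
A minimal sketch of the reverse operation for a handler installed earlier
(chip and irq_num are placeholders):

#include "roc_api.h"
#include "roc_bphy_irq.h"

static int
remove_isr(struct roc_bphy_irq_chip *chip, int irq_num)
{
        return roc_bphy_handler_clear(chip, irq_num);
}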

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c | 66 ++
 drivers/common/cnxk/roc_bphy_irq.h |  2 +
 drivers/common/cnxk/version.map|  1 +
 3 files changed, 69 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index f988abf51..4b87fc801 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -33,6 +33,7 @@ struct roc_bphy_irq_stack {
 #define ROC_BPHY_IOC_MAGIC 0xF3
 #define ROC_BPHY_IOC_SET_BPHY_HANDLER  
\
_IOW(ROC_BPHY_IOC_MAGIC, 1, struct roc_bphy_irq_usr_data)
+#define ROC_BPHY_IOC_CLR_BPHY_HANDLER  _IO(ROC_BPHY_IOC_MAGIC, 2)
 #define ROC_BPHY_IOC_GET_BPHY_MAX_IRQ  _IOR(ROC_BPHY_IOC_MAGIC, 3, uint64_t)
 #define ROC_BPHY_IOC_GET_BPHY_BMASK_IRQ _IOR(ROC_BPHY_IOC_MAGIC, 4, uint64_t)
 
@@ -316,3 +317,68 @@ roc_bphy_intr_available(struct roc_bphy_irq_chip 
*irq_chip, int irq_num)
 
return irq_chip->avail_irq_bmask & BIT(irq_num);
 }
+
+int
+roc_bphy_handler_clear(struct roc_bphy_irq_chip *chip, int irq_num)
+{
+   roc_cpuset_t orig_cpuset, intr_cpuset;
+   const struct plt_memzone *mz;
+   int retval;
+
+   if (chip == NULL)
+   return -EINVAL;
+   if ((uint64_t)irq_num >= chip->max_irq || irq_num < 0)
+   return -EINVAL;
+   if (!roc_bphy_intr_available(chip, irq_num))
+   return -ENOTSUP;
+   if (chip->irq_vecs[irq_num].handler == NULL)
+   return -EINVAL;
+   mz = plt_memzone_lookup(chip->mz_name);
+   if (mz == NULL)
+   return -ENXIO;
+
+   retval = pthread_getaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (retval < 0) {
+   plt_warn("Failed to get affinity mask");
+   CPU_ZERO(&orig_cpuset);
+   CPU_SET(0, &orig_cpuset);
+   }
+
+   CPU_ZERO(&intr_cpuset);
+   CPU_SET(chip->irq_vecs[irq_num].handler_cpu, &intr_cpuset);
+   retval = pthread_setaffinity_np(pthread_self(), sizeof(intr_cpuset),
+   &intr_cpuset);
+   if (retval < 0) {
+   plt_warn("Failed to set affinity mask");
+   CPU_ZERO(&orig_cpuset);
+   CPU_SET(0, &orig_cpuset);
+   }
+
+   retval = ioctl(chip->intfd, ROC_BPHY_IOC_CLR_BPHY_HANDLER, irq_num);
+   if (retval == 0) {
+   roc_bphy_irq_stack_remove(chip->irq_vecs[irq_num].handler_cpu);
+   chip->n_handlers--;
+   chip->irq_vecs[irq_num].isr_data = NULL;
+   chip->irq_vecs[irq_num].handler = NULL;
+   chip->irq_vecs[irq_num].handler_cpu = -1;
+   if (chip->n_handlers == 0) {
+   retval = plt_memzone_free(mz);
+   if (retval < 0)
+   plt_err("Failed to free memzone: irq %d",
+   irq_num);
+   }
+   } else {
+   plt_err("Failed to clear bphy interrupt handler");
+   }
+
+   retval = pthread_setaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (retval < 0) {
+   plt_warn("Failed to restore affinity mask");
+   CPU_ZERO(&orig_cpuset);
+   CPU_SET(0, &orig_cpuset);
+   }
+
+   return retval;
+}
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
index 7dd23f4ab..778764f68 100644
--- a/drivers/common/cnxk/roc_bphy_irq.h
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -32,5 +32,7 @@ roc_bphy_irq_handler_set(struct roc_bphy_irq_chip *chip, int 
irq_num,
 void *isr_data);
 __roc_api bool roc_bphy_intr_available(struct roc_bphy_irq_chip *irq_chip,
   int irq_num);
+__roc_api int roc_bphy_handler_clear(struct roc_bphy_irq_chip *chip,
+int irq_num);
 
 #endif /* _ROC_BPHY_IRQ_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 861a97cc0..941055ba0 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -22,6 +22,7 @@ INTERNAL {
roc_bphy_cgx_stop_rxtx;
roc_bphy_dev_fini;
roc_bphy_dev_init;
+   roc_bphy_handler_clear;
roc_bphy_intr_available;
roc_bphy_intr_fini;
roc_bphy_intr_handler;
-- 
2.25.1



[dpdk-dev] [PATCH v2 22/32] common/cnxk: support for registering bphy irq

2021-06-15 Thread Tomasz Duszynski
Add support for registering a user-supplied baseband phy irq handler.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/common/cnxk/roc_bphy_irq.c | 38 ++
 drivers/common/cnxk/roc_bphy_irq.h | 11 +
 drivers/common/cnxk/version.map|  1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/common/cnxk/roc_bphy_irq.c 
b/drivers/common/cnxk/roc_bphy_irq.c
index 4b87fc801..882066ef3 100644
--- a/drivers/common/cnxk/roc_bphy_irq.c
+++ b/drivers/common/cnxk/roc_bphy_irq.c
@@ -382,3 +382,41 @@ roc_bphy_handler_clear(struct roc_bphy_irq_chip *chip, int 
irq_num)
 
return retval;
 }
+
+int
+roc_bphy_intr_register(struct roc_bphy_irq_chip *irq_chip,
+  struct roc_bphy_intr *intr)
+{
+   roc_cpuset_t orig_cpuset, intr_cpuset;
+   int retval;
+   int ret;
+
+   if (!roc_bphy_intr_available(irq_chip, intr->irq_num))
+   return -ENOTSUP;
+
+   retval = pthread_getaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (retval < 0) {
+   plt_err("Failed to get affinity mask");
+   return retval;
+   }
+
+   CPU_ZERO(&intr_cpuset);
+   CPU_SET(intr->cpu, &intr_cpuset);
+   retval = pthread_setaffinity_np(pthread_self(), sizeof(intr_cpuset),
+   &intr_cpuset);
+   if (retval < 0) {
+   plt_err("Failed to set affinity mask");
+   return retval;
+   }
+
+   ret = roc_bphy_irq_handler_set(irq_chip, intr->irq_num,
+  intr->intr_handler, intr->isr_data);
+
+   retval = pthread_setaffinity_np(pthread_self(), sizeof(orig_cpuset),
+   &orig_cpuset);
+   if (retval < 0)
+   plt_warn("Failed to restore affinity mask");
+
+   return ret;
+}
diff --git a/drivers/common/cnxk/roc_bphy_irq.h 
b/drivers/common/cnxk/roc_bphy_irq.h
index 778764f68..19ec5fdc4 100644
--- a/drivers/common/cnxk/roc_bphy_irq.h
+++ b/drivers/common/cnxk/roc_bphy_irq.h
@@ -21,6 +21,15 @@ struct roc_bphy_irq_chip {
char *mz_name;
 };
 
+struct roc_bphy_intr {
+   int irq_num;
+   void (*intr_handler)(int irq_num, void *isr_data);
+   void *isr_data;
+   int cpu;
+   /* stack for this interrupt, not supplied by a user */
+   uint8_t *sp;
+};
+
 __roc_api struct roc_bphy_irq_chip *roc_bphy_intr_init(void);
 __roc_api void roc_bphy_intr_fini(struct roc_bphy_irq_chip *irq_chip);
 __roc_api void roc_bphy_irq_stack_remove(int cpu);
@@ -34,5 +43,7 @@ __roc_api bool roc_bphy_intr_available(struct 
roc_bphy_irq_chip *irq_chip,
   int irq_num);
 __roc_api int roc_bphy_handler_clear(struct roc_bphy_irq_chip *chip,
 int irq_num);
+__roc_api int roc_bphy_intr_register(struct roc_bphy_irq_chip *irq_chip,
+struct roc_bphy_intr *intr);
 
 #endif /* _ROC_BPHY_IRQ_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 941055ba0..e24766c05 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -27,6 +27,7 @@ INTERNAL {
roc_bphy_intr_fini;
roc_bphy_intr_handler;
roc_bphy_intr_init;
+   roc_bphy_intr_register;
roc_bphy_irq_handler_set;
roc_bphy_irq_stack_get;
roc_bphy_irq_stack_remove;
-- 
2.25.1
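
For reference, a minimal sketch of how a consumer of the ROC layer might pair
the register call added here with the clear call from the previous patch. The
include path, irq number and cpu number are assumptions, not code taken from
the series:

#include <errno.h>

#include "roc_api.h"

static void
my_isr(int irq_num, void *isr_data)
{
	/* count interrupts in caller-provided storage */
	(*(volatile int *)isr_data)++;
	(void)irq_num;
}

static int
bphy_irq_example(void)
{
	struct roc_bphy_irq_chip *chip;
	static int hits;
	struct roc_bphy_intr intr = {
		.irq_num = 0,		/* placeholder irq number */
		.intr_handler = my_isr,
		.isr_data = &hits,
		.cpu = 0,		/* placeholder cpu to handle the irq */
	};
	int rc;

	chip = roc_bphy_intr_init();
	if (chip == NULL)
		return -ENOMEM;

	if (!roc_bphy_intr_available(chip, intr.irq_num)) {
		roc_bphy_intr_fini(chip);
		return -ENOTSUP;
	}

	rc = roc_bphy_intr_register(chip, &intr);
	if (rc == 0) {
		/* ... interrupts are now delivered to my_isr() ... */
		roc_bphy_handler_clear(chip, intr.irq_num);
	}

	roc_bphy_intr_fini(chip);
	return rc;
}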



[dpdk-dev] [PATCH v2 23/32] raw/cnxk_bphy: add baseband PHY skeleton driver

2021-06-15 Thread Tomasz Duszynski
Add a baseband phy skeleton driver. Baseband phy is a hardware subsystem
accelerating 5G/LTE related tasks. Note this driver isn't involved in
any sort of baseband protocol processing. Instead it just provides the
means for configuring the hardware.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 doc/guides/rel_notes/release_21_08.rst |   6 ++
 drivers/raw/cnxk_bphy/cnxk_bphy.c  | 113 +
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h  |  23 +
 drivers/raw/cnxk_bphy/meson.build  |   1 +
 usertools/dpdk-devbind.py  |   4 +-
 5 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy.c
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index ae70e15d1..b3829bd30 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -62,6 +62,12 @@ New Features
   standard rawdev enq/deq operations. See the :doc:`../rawdevs/cnxk_bphy`
   rawdev guide for more details on this driver.
 
+  Added new Baseband phy PMD which provides means for configuring baseband 
hardware via
+  standard rawdev enq/deq operations. Baseband phy is a hardware subsystem 
accelerating
+  5G/LTE related tasks.
+
+  Both BPHY and BPHY CGX/RPM drivers are related hence kept together to ease 
maintenance.
+
 
 Removed Items
 -
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
new file mode 100644
index 0..cd26b9717
--- /dev/null
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -0,0 +1,113 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "cnxk_bphy_irq.h"
+
+static const struct rte_pci_id pci_bphy_map[] = {
+   {RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CNXK_BPHY)},
+   {
+   .vendor_id = 0,
+   },
+};
+
+static void
+bphy_rawdev_get_name(char *name, struct rte_pci_device *pci_dev)
+{
+   snprintf(name, RTE_RAWDEV_NAME_MAX_LEN, "BPHY:%x:%02x.%x",
+pci_dev->addr.bus, pci_dev->addr.devid,
+pci_dev->addr.function);
+}
+
+static const struct rte_rawdev_ops bphy_rawdev_ops = {
+};
+
+static int
+bphy_rawdev_probe(struct rte_pci_driver *pci_drv,
+ struct rte_pci_device *pci_dev)
+{
+   struct bphy_device *bphy_dev = NULL;
+   char name[RTE_RAWDEV_NAME_MAX_LEN];
+   struct rte_rawdev *bphy_rawdev;
+   int ret;
+
+   RTE_SET_USED(pci_drv);
+
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
+
+   if (!pci_dev->mem_resource[0].addr) {
+   plt_err("BARs have invalid values: BAR0 %p\n BAR2 %p",
+   pci_dev->mem_resource[0].addr,
+   pci_dev->mem_resource[2].addr);
+   return -ENODEV;
+   }
+
+   ret = roc_plt_init();
+   if (ret)
+   return ret;
+
+   bphy_rawdev_get_name(name, pci_dev);
+   bphy_rawdev = rte_rawdev_pmd_allocate(name, sizeof(*bphy_dev),
+ rte_socket_id());
+   if (bphy_rawdev == NULL) {
+   plt_err("Failed to allocate rawdev");
+   return -ENOMEM;
+   }
+
+   bphy_rawdev->dev_ops = &bphy_rawdev_ops;
+   bphy_rawdev->device = &pci_dev->device;
+   bphy_rawdev->driver_name = pci_dev->driver->driver.name;
+
+   bphy_dev = (struct bphy_device *)bphy_rawdev->dev_private;
+   bphy_dev->mem.res0 = pci_dev->mem_resource[0];
+   bphy_dev->mem.res2 = pci_dev->mem_resource[2];
+
+   return 0;
+}
+
+static int
+bphy_rawdev_remove(struct rte_pci_device *pci_dev)
+{
+   char name[RTE_RAWDEV_NAME_MAX_LEN];
+   struct rte_rawdev *rawdev;
+
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+   return 0;
+
+   if (pci_dev == NULL) {
+   plt_err("invalid pci_dev");
+   return -EINVAL;
+   }
+
+   bphy_rawdev_get_name(name, pci_dev);
+
+   rawdev = rte_rawdev_pmd_get_named_dev(name);
+   if (rawdev == NULL) {
+   plt_err("invalid device name (%s)", name);
+   return -EINVAL;
+   }
+
+   return rte_rawdev_pmd_release(rawdev);
+}
+
+static struct rte_pci_driver cnxk_bphy_rawdev_pmd = {
+   .id_table = pci_bphy_map,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+   .probe = bphy_rawdev_probe,
+   .remove = bphy_rawdev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(bphy_rawdev_pci_driver, cnxk_bphy_rawdev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(bphy_rawdev_pci_driver, pci_bphy_map);
+RTE_PMD_REGISTER_KMOD_DEP(bphy_rawdev_pci_driver, "vfio-pci");
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
new file mode 100644
index 0..77169b1b7
--- /dev/

[dpdk-dev] [PATCH v2 24/32] raw/cnxk_bphy: support for reading bphy queue configuration

2021-06-15 Thread Tomasz Duszynski
Add support for reading baseband phy queue configuration.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index cd26b9717..00b6c5035 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -29,7 +29,24 @@ bphy_rawdev_get_name(char *name, struct rte_pci_device 
*pci_dev)
 pci_dev->addr.function);
 }
 
+static int
+cnxk_bphy_irq_queue_def_conf(struct rte_rawdev *dev, uint16_t queue_id,
+rte_rawdev_obj_t queue_conf,
+size_t queue_conf_size)
+{
+   RTE_SET_USED(dev);
+   RTE_SET_USED(queue_id);
+
+   if (queue_conf_size != sizeof(unsigned int))
+   return -EINVAL;
+
+   *(unsigned int *)queue_conf = 1;
+
+   return 0;
+}
+
 static const struct rte_rawdev_ops bphy_rawdev_ops = {
+   .queue_def_conf = cnxk_bphy_irq_queue_def_conf,
 };
 
 static int
-- 
2.25.1
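
A minimal sketch of how an application might read this default queue
configuration through the generic rawdev API; the rawdev id is assumed to be
valid and, as shown above, the driver reports one descriptor for its single
queue:

#include <errno.h>
#include <stdint.h>

#include <rte_rawdev.h>

static int
bphy_query_queue(uint16_t dev_id)
{
	unsigned int descs = 0;
	int ret;

	/* queue 0 is the only queue exposed by this PMD */
	ret = rte_rawdev_queue_conf_get(dev_id, 0, &descs, sizeof(descs));
	if (ret)
		return ret;

	/* the driver writes 1 into the supplied unsigned int */
	return descs == 1 ? 0 : -EINVAL;
}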



[dpdk-dev] [PATCH v2 25/32] raw/cnxk_bphy: support for reading bphy queue count

2021-06-15 Thread Tomasz Duszynski
Add support for reading the number of available queues from the
baseband phy. Currently only a single queue is supported.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 9 +
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h | 7 +++
 2 files changed, 16 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 00b6c5035..04e822586 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -29,6 +29,14 @@ bphy_rawdev_get_name(char *name, struct rte_pci_device 
*pci_dev)
 pci_dev->addr.function);
 }
 
+static uint16_t
+cnxk_bphy_irq_queue_count(struct rte_rawdev *dev)
+{
+   struct bphy_device *bphy_dev = (struct bphy_device *)dev->dev_private;
+
+   return RTE_DIM(bphy_dev->queues);
+}
+
 static int
 cnxk_bphy_irq_queue_def_conf(struct rte_rawdev *dev, uint16_t queue_id,
 rte_rawdev_obj_t queue_conf,
@@ -47,6 +55,7 @@ cnxk_bphy_irq_queue_def_conf(struct rte_rawdev *dev, uint16_t 
queue_id,
 
 static const struct rte_rawdev_ops bphy_rawdev_ops = {
.queue_def_conf = cnxk_bphy_irq_queue_def_conf,
+   .queue_count = cnxk_bphy_irq_queue_count,
 };
 
 static int
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
index 77169b1b7..16243efc9 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
@@ -15,9 +15,16 @@ struct bphy_mem {
struct rte_mem_resource res2;
 };
 
+struct bphy_irq_queue {
+   /* queue holds up to one response */
+   void *rsp;
+};
+
 struct bphy_device {
struct roc_bphy_irq_chip *irq_chip;
struct bphy_mem mem;
+   /* bphy irq interface supports single queue only */
+   struct bphy_irq_queue queues[1];
 };
 
 #endif /* _CNXK_BPHY_IRQ_ */
-- 
2.25.1



[dpdk-dev] [PATCH v2 26/32] raw/cnxk_bphy: support for bphy enqueue operation

2021-06-15 Thread Tomasz Duszynski
Add preliminary support for enqueue operation.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/raw/cnxk_bphy/cnxk_bphy.c| 26 ++
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h | 13 +
 2 files changed, 39 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 04e822586..2949bf02a 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -13,6 +13,7 @@
 #include 
 
 #include "cnxk_bphy_irq.h"
+#include "rte_pmd_bphy.h"
 
 static const struct rte_pci_id pci_bphy_map[] = {
{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CNXK_BPHY)},
@@ -29,6 +30,30 @@ bphy_rawdev_get_name(char *name, struct rte_pci_device 
*pci_dev)
 pci_dev->addr.function);
 }
 
+static int
+cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
+  struct rte_rawdev_buf **buffers, unsigned int count,
+  rte_rawdev_obj_t context)
+{
+   struct bphy_device *bphy_dev = (struct bphy_device *)dev->dev_private;
+   struct cnxk_bphy_irq_msg *msg = buffers[0]->buf_addr;
+   unsigned int queue = (size_t)context;
+   int ret = 0;
+
+   if (queue >= RTE_DIM(bphy_dev->queues))
+   return -EINVAL;
+
+   if (count == 0)
+   return 0;
+
+   switch (msg->type) {
+   default:
+   ret = -EINVAL;
+   }
+
+   return ret;
+}
+
 static uint16_t
 cnxk_bphy_irq_queue_count(struct rte_rawdev *dev)
 {
@@ -55,6 +80,7 @@ cnxk_bphy_irq_queue_def_conf(struct rte_rawdev *dev, uint16_t 
queue_id,
 
 static const struct rte_rawdev_ops bphy_rawdev_ops = {
.queue_def_conf = cnxk_bphy_irq_queue_def_conf,
+   .enqueue_bufs = cnxk_bphy_irq_enqueue_bufs,
.queue_count = cnxk_bphy_irq_queue_count,
 };
 
diff --git a/drivers/raw/cnxk_bphy/rte_pmd_bphy.h 
b/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
index fed7916fe..eb39654f1 100644
--- a/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
+++ b/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
@@ -101,4 +101,17 @@ struct cnxk_bphy_cgx_msg {
void *data;
 };
 
+enum cnxk_bphy_irq_msg_type {
+   CNXK_BPHY_IRQ_MSG_TYPE_INIT,
+   CNXK_BPHY_IRQ_MSG_TYPE_FINI,
+   CNXK_BPHY_IRQ_MSG_TYPE_REGISTER,
+   CNXK_BPHY_IRQ_MSG_TYPE_UNREGISTER,
+   CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET,
+};
+
+struct cnxk_bphy_irq_msg {
+   enum cnxk_bphy_irq_msg_type type;
+   void *data;
+};
+
 #endif /* _CNXK_BPHY_H_ */
-- 
2.25.1
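
The convenience helpers added later in the series all follow this enqueue
pattern. A minimal sketch of pushing one message from an application, assuming
a valid rawdev id and the single queue 0; the helper name is illustrative only:

#include <stdint.h>

#include <rte_rawdev.h>

#include "rte_pmd_bphy.h"

static int
bphy_send_msg(uint16_t dev_id, enum cnxk_bphy_irq_msg_type type, void *data)
{
	struct cnxk_bphy_irq_msg msg = {
		.type = type,
		.data = data,
	};
	struct rte_rawdev_buf buf = { .buf_addr = &msg };
	struct rte_rawdev_buf *bufs[1] = { &buf };

	/* context 0 selects the single queue exposed by the BPHY PMD */
	return rte_rawdev_enqueue_buffers(dev_id, bufs, 1, 0);
}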



[dpdk-dev] [PATCH v2 27/32] raw/cnxk_bphy: support for bphy dequeue operation

2021-06-15 Thread Tomasz Duszynski
Add support for dequeueing responses to previously
enqueued messages.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 2949bf02a..7e541bac4 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -54,6 +54,25 @@ cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
return ret;
 }
 
+static int
+cnxk_bphy_irq_dequeue_bufs(struct rte_rawdev *dev,
+  struct rte_rawdev_buf **buffers, unsigned int count,
+  rte_rawdev_obj_t context)
+{
+   struct bphy_device *bphy_dev = (struct bphy_device *)dev->dev_private;
+   unsigned int queue = (size_t)context;
+
+   if (queue >= RTE_DIM(bphy_dev->queues))
+   return -EINVAL;
+
+   if (count == 0)
+   return 0;
+
+   buffers[0]->buf_addr = bphy_dev->queues[queue].rsp;
+
+   return 0;
+}
+
 static uint16_t
 cnxk_bphy_irq_queue_count(struct rte_rawdev *dev)
 {
@@ -81,6 +100,7 @@ cnxk_bphy_irq_queue_def_conf(struct rte_rawdev *dev, 
uint16_t queue_id,
 static const struct rte_rawdev_ops bphy_rawdev_ops = {
.queue_def_conf = cnxk_bphy_irq_queue_def_conf,
.enqueue_bufs = cnxk_bphy_irq_enqueue_bufs,
+   .dequeue_bufs = cnxk_bphy_irq_dequeue_bufs,
.queue_count = cnxk_bphy_irq_queue_count,
 };
 
-- 
2.25.1



[dpdk-dev] [PATCH v2 28/32] raw/cnxk_bphy: support for interrupt init and cleanup

2021-06-15 Thread Tomasz Duszynski
Add support for interrupt initialization and cleanup. Internally
interrupt initialization performs low level setup that allows
custom interrupt handler registration later on.

Interrupt initialization and cleanup are related hence they
are in the same patch.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 doc/guides/rawdevs/cnxk_bphy.rst  | 20 
 drivers/raw/cnxk_bphy/cnxk_bphy.c |  6 
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c | 47 +++
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h |  5 +++
 drivers/raw/cnxk_bphy/meson.build |  1 +
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h  | 41 +++
 6 files changed, 120 insertions(+)
 create mode 100644 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index 2edd814da..00edca4be 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -37,6 +37,9 @@ To perform data transfer use standard 
``rte_rawdev_enqueue_buffers()`` and
 ``rte_rawdev_dequeue_buffers()`` APIs. Not all messages produce sensible
 responses hence dequeueing is not always necessary.
 
+BPHY CGX/RPM PMD
+
+
 BPHY CGX/RPM PMD accepts ``struct cnxk_bphy_cgx_msg`` messages which differ by 
type and payload.
 Message types along with description are listed below. As for the usage 
examples please refer to
 ``cnxk_bphy_cgx_dev_selftest()``.
@@ -95,6 +98,23 @@ Message must have type set to 
``CNXK_BPHY_CGX_MSG_TYPE_START_RXTX`` or
 ``CNXK_BPHY_CGX_MSG_TYPE_STOP_RXTX``. Former will enable traffic while the 
latter will
 do the opposite.
 
+BPHY PMD
+
+
+BPHY PMD accepts ``struct cnxk_bphy_irq_msg`` messages which differ by type 
and payload.
+Message types along with description are listed below. For some usage examples 
please refer to
+``bphy_rawdev_selftest()``.
+
+Initialize or finalize interrupt handling
+~
+
+This message is used to set up low-level interrupt handling.
+
+Message must have type set to ``CNXK_BPHY_IRQ_MSG_TYPE_INIT`` or 
``CNXK_BPHY_IRQ_MSG_TYPE_FINI``.
+The former will set up low-level interrupt handling while the latter will tear 
everything down. There
+are also two convenience functions namely ``rte_pmd_bphy_intr_init()`` and
+``rte_pmd_bphy_intr_fini()`` that take care of all details.
+
 Self test
 -
 
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 7e541bac4..3f8679534 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -47,6 +47,12 @@ cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
return 0;
 
switch (msg->type) {
+   case CNXK_BPHY_IRQ_MSG_TYPE_INIT:
+   ret = cnxk_bphy_intr_init(dev->dev_id);
+   break;
+   case CNXK_BPHY_IRQ_MSG_TYPE_FINI:
+   cnxk_bphy_intr_fini(dev->dev_id);
+   break;
default:
ret = -EINVAL;
}
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
new file mode 100644
index 0..c4df539cd
--- /dev/null
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "cnxk_bphy_irq.h"
+
+static struct bphy_device *
+cnxk_bphy_get_bphy_dev_by_dev_id(uint16_t dev_id)
+{
+   struct rte_rawdev *rawdev;
+
+   if (!rte_rawdev_pmd_is_valid_dev(dev_id))
+   return NULL;
+
+   rawdev = &rte_rawdevs[dev_id];
+
+   return (struct bphy_device *)rawdev->dev_private;
+}
+
+int
+cnxk_bphy_intr_init(uint16_t dev_id)
+{
+   struct bphy_device *bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+
+   bphy_dev->irq_chip = roc_bphy_intr_init();
+   if (bphy_dev->irq_chip == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+void
+cnxk_bphy_intr_fini(uint16_t dev_id)
+{
+   struct bphy_device *bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+   struct roc_bphy_irq_chip *irq_chip = bphy_dev->irq_chip;
+
+   roc_bphy_intr_fini(irq_chip);
+   bphy_dev->irq_chip = NULL;
+}
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
index 16243efc9..3acc47fe8 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
@@ -10,6 +10,8 @@
 
 #include 
 
+typedef void (*cnxk_bphy_intr_handler_t)(int irq_num, void *isr_data);
+
 struct bphy_mem {
struct rte_mem_resource res0;
struct rte_mem_resource res2;
@@ -27,4 +29,7 @@ struct bphy_device {
struct bphy_irq_queue queues[1];
 };
 
+int cnxk_bphy_intr_init(uint16_t dev_id);
+void cnxk_bphy_intr_fini(uint16_t dev_id);
+
 #endif /* _CNXK_BPHY_IRQ_ */
diff --git a/drivers/raw/cnxk_bphy/meson.build 
b/drivers/raw/cnxk_bphy/meson.build
index f2868fd68..14147feaf 

[dpdk-dev] [PATCH v2 29/32] raw/cnxk_bphy: support for reading number of bphy irqs

2021-06-15 Thread Tomasz Duszynski
Add support for retrieving the maximum number of interrupts.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c | 12 
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
index c4df539cd..991c2d7ab 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
@@ -24,6 +24,18 @@ cnxk_bphy_get_bphy_dev_by_dev_id(uint16_t dev_id)
return (struct bphy_device *)rawdev->dev_private;
 }
 
+uint64_t
+cnxk_bphy_irq_max_get(uint16_t dev_id)
+{
+   struct roc_bphy_irq_chip *irq_chip;
+   struct bphy_device *bphy_dev;
+
+   bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+   irq_chip = bphy_dev->irq_chip;
+
+   return irq_chip->max_irq;
+}
+
 int
 cnxk_bphy_intr_init(uint16_t dev_id)
 {
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
index 3acc47fe8..6b59218af 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
@@ -31,5 +31,6 @@ struct bphy_device {
 
 int cnxk_bphy_intr_init(uint16_t dev_id);
 void cnxk_bphy_intr_fini(uint16_t dev_id);
+uint64_t cnxk_bphy_irq_max_get(uint16_t dev_id);
 
 #endif /* _CNXK_BPHY_IRQ_ */
-- 
2.25.1



[dpdk-dev] [PATCH v2 30/32] raw/cnxk_bphy: support for retrieving bphy device memory

2021-06-15 Thread Tomasz Duszynski
Allow the user to retrieve the baseband phy memory resources.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 doc/guides/rawdevs/cnxk_bphy.rst  | 10 +
 drivers/raw/cnxk_bphy/cnxk_bphy.c |  3 +++
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c |  8 +++
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h |  1 +
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h  | 30 +++
 5 files changed, 52 insertions(+)

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index 00edca4be..e031f4148 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -17,6 +17,7 @@ Features
 The BPHY CGX/RPM implements following features in the rawdev API:
 
 - Access to BPHY CGX/RPM via a set of predefined messages
+- Access to BPHY memory
 
 Device Setup
 
@@ -115,6 +116,15 @@ The former will setup low level interrupt handling while 
the latter will tear ev
 are also two convenience functions namely ``rte_pmd_bphy_intr_init()`` and
 ``rte_pmd_bphy_intr_fini()`` that take care of all details.
 
+
+Get device memory
+~
+
+This message is used to read the device MMIO address.
+
+Message must have type set to ``CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET``. There's a 
convenience function
+``rte_pmd_bphy_intr_mem_get()`` available that takes care of retrieving that 
address.
+
 Self test
 -
 
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 3f8679534..278e26af0 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -53,6 +53,9 @@ cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
case CNXK_BPHY_IRQ_MSG_TYPE_FINI:
cnxk_bphy_intr_fini(dev->dev_id);
break;
+   case CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET:
+   bphy_dev->queues[queue].rsp = &bphy_dev->mem;
+   break;
default:
ret = -EINVAL;
}
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
index 991c2d7ab..13a0d8ad1 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
@@ -57,3 +57,11 @@ cnxk_bphy_intr_fini(uint16_t dev_id)
roc_bphy_intr_fini(irq_chip);
bphy_dev->irq_chip = NULL;
 }
+
+struct bphy_mem *
+cnxk_bphy_mem_get(uint16_t dev_id)
+{
+   struct bphy_device *bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+
+   return &bphy_dev->mem;
+}
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
index 6b59218af..5f87143a0 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
@@ -31,6 +31,7 @@ struct bphy_device {
 
 int cnxk_bphy_intr_init(uint16_t dev_id);
 void cnxk_bphy_intr_fini(uint16_t dev_id);
+struct bphy_mem *cnxk_bphy_mem_get(uint16_t dev_id);
 uint64_t cnxk_bphy_irq_max_get(uint16_t dev_id);
 
 #endif /* _CNXK_BPHY_IRQ_ */
diff --git a/drivers/raw/cnxk_bphy/rte_pmd_bphy.h 
b/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
index c667d984e..d08b14b57 100644
--- a/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
+++ b/drivers/raw/cnxk_bphy/rte_pmd_bphy.h
@@ -103,6 +103,7 @@ struct cnxk_bphy_cgx_msg {
void *data;
 };
 
+#define cnxk_bphy_mem  bphy_mem
 #define CNXK_BPHY_DEF_QUEUE 0
 
 enum cnxk_bphy_irq_msg_type {
@@ -115,6 +116,11 @@ enum cnxk_bphy_irq_msg_type {
 
 struct cnxk_bphy_irq_msg {
enum cnxk_bphy_irq_msg_type type;
+   /*
+* The data field, depending on message type, may point to
+* - (deq) struct cnxk_bphy_mem for memory range request response
+* - (xxx) NULL
+*/
void *data;
 };
 
@@ -155,4 +161,28 @@ rte_pmd_bphy_intr_fini(uint16_t dev_id)
rte_rawdev_enqueue_buffers(dev_id, bufs, 1, CNXK_BPHY_DEF_QUEUE);
 }
 
+static __rte_always_inline struct cnxk_bphy_mem *
+rte_pmd_bphy_intr_mem_get(uint16_t dev_id)
+{
+   struct cnxk_bphy_irq_msg msg = {
+   .type = CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET,
+   };
+   struct rte_rawdev_buf *bufs[1];
+   struct rte_rawdev_buf buf;
+   int ret;
+
+   buf.buf_addr = &msg;
+   bufs[0] = &buf;
+
+   ret = rte_rawdev_enqueue_buffers(dev_id, bufs, 1, CNXK_BPHY_DEF_QUEUE);
+   if (ret)
+   return NULL;
+
+   ret = rte_rawdev_dequeue_buffers(dev_id, bufs, 1, CNXK_BPHY_DEF_QUEUE);
+   if (ret)
+   return NULL;
+
+   return buf.buf_addr;
+}
+
 #endif /* _CNXK_BPHY_H_ */
-- 
2.25.1
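
A minimal sketch of the application-level flow this enables: initialize
interrupt handling, fetch the memory ranges and tear everything down. It
assumes a valid rawdev id and that the application can see the struct bphy_mem
definition (cnxk_bphy_irq.h at this point in the series); it is not code from
the series itself:

#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_rawdev.h>

#include "rte_pmd_bphy.h"

static int
bphy_dump_bars(uint16_t dev_id)
{
	struct cnxk_bphy_mem *mem;
	int ret;

	ret = rte_pmd_bphy_intr_init(dev_id);
	if (ret)
		return ret;

	mem = rte_pmd_bphy_intr_mem_get(dev_id);
	if (mem == NULL) {
		rte_pmd_bphy_intr_fini(dev_id);
		return -EIO;
	}

	/* BAR0 and BAR2 resources captured at probe time */
	printf("BAR0 %p len %" PRIu64 ", BAR2 %p len %" PRIu64 "\n",
	       mem->res0.addr, mem->res0.len,
	       mem->res2.addr, mem->res2.len);

	rte_pmd_bphy_intr_fini(dev_id);
	return 0;
}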



[dpdk-dev] [PATCH v2 31/32] raw/cnxk_bphy: support for registering bphy irq handlers

2021-06-15 Thread Tomasz Duszynski
Custom irq handlers may be registered/removed on demand.
Since registration and removal are related they are in the
same patch.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 doc/guides/rawdevs/cnxk_bphy.rst  | 13 
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 11 +++
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.c | 33 
 drivers/raw/cnxk_bphy/cnxk_bphy_irq.h |  4 +++
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h  | 45 +++
 5 files changed, 106 insertions(+)

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index e031f4148..899557df0 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -18,6 +18,7 @@ The BPHY CGX/RPM implements following features in the rawdev 
API:
 
 - Access to BPHY CGX/RPM via a set of predefined messages
 - Access to BPHY memory
+- Custom interrupt handlers
 
 Device Setup
 
@@ -117,6 +118,18 @@ are also two convenience functions namely 
``rte_pmd_bphy_intr_init()`` and
 ``rte_pmd_bphy_intr_fini()`` that take care of all details.
 
 
+Register or remove interrupt handler
+
+
+This message is used to set up a custom interrupt handler.
+
+Message must have type set to ``CNXK_BPHY_IRQ_MSG_TYPE_REGISTER`` or
+``CNXK_BPHY_IRQ_MSG_TYPE_UNREGISTER``. The former will register an interrupt 
handler while the
+latter will remove it. Prior sending actual message payload i.e ``struct 
cnxk_bphy_irq_info`` needs
+to be filled with relevant information. There are also two convenience 
functions namely
+``rte_pmd_bphy_intr_register()`` and ``rte_pmd_bphy_intr_unregister()`` that 
take care of all
+details.
+
 Get device memory
 ~
 
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 278e26af0..2a516ae73 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -38,6 +38,7 @@ cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
struct bphy_device *bphy_dev = (struct bphy_device *)dev->dev_private;
struct cnxk_bphy_irq_msg *msg = buffers[0]->buf_addr;
unsigned int queue = (size_t)context;
+   struct cnxk_bphy_irq_info *info;
int ret = 0;
 
if (queue >= RTE_DIM(bphy_dev->queues))
@@ -53,6 +54,16 @@ cnxk_bphy_irq_enqueue_bufs(struct rte_rawdev *dev,
case CNXK_BPHY_IRQ_MSG_TYPE_FINI:
cnxk_bphy_intr_fini(dev->dev_id);
break;
+   case CNXK_BPHY_IRQ_MSG_TYPE_REGISTER:
+   info = (struct cnxk_bphy_irq_info *)msg->data;
+   ret = cnxk_bphy_intr_register(dev->dev_id, info->irq_num,
+ info->handler, info->data,
+ info->cpu);
+   break;
+   case CNXK_BPHY_IRQ_MSG_TYPE_UNREGISTER:
+   info = (struct cnxk_bphy_irq_info *)msg->data;
+   cnxk_bphy_intr_unregister(dev->dev_id, info->irq_num);
+   break;
case CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET:
bphy_dev->queues[queue].rsp = &bphy_dev->mem;
break;
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
index 13a0d8ad1..bbcc285a7 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.c
@@ -58,6 +58,39 @@ cnxk_bphy_intr_fini(uint16_t dev_id)
bphy_dev->irq_chip = NULL;
 }
 
+int
+cnxk_bphy_intr_register(uint16_t dev_id, int irq_num,
+   cnxk_bphy_intr_handler_t handler, void *data, int cpu)
+{
+   struct roc_bphy_intr intr = {
+   .irq_num = irq_num,
+   .intr_handler = handler,
+   .isr_data = data,
+   .cpu = cpu
+   };
+
+   struct bphy_device *bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+   struct roc_bphy_irq_chip *irq_chip = bphy_dev->irq_chip;
+
+   if (!irq_chip)
+   return -ENODEV;
+   if (!handler || !data)
+   return -EINVAL;
+
+   return roc_bphy_intr_register(irq_chip, &intr);
+}
+
+void
+cnxk_bphy_intr_unregister(uint16_t dev_id, int irq_num)
+{
+   struct bphy_device *bphy_dev = cnxk_bphy_get_bphy_dev_by_dev_id(dev_id);
+
+   if (bphy_dev->irq_chip)
+   roc_bphy_handler_clear(bphy_dev->irq_chip, irq_num);
+   else
+   plt_err("Missing irq chip");
+}
+
 struct bphy_mem *
 cnxk_bphy_mem_get(uint16_t dev_id)
 {
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h 
b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
index 5f87143a0..b55147b93 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_irq.h
@@ -32,6 +32,10 @@ struct bphy_device {
 int cnxk_bphy_intr_init(uint16_t dev_id);
 void cnxk_bphy_intr_fini(uint16_t dev_id);
 struct bphy_mem *cnxk_bphy_mem_get(uint16_t dev_id);
+int cnxk_bphy_intr_register(uint16_t dev_id, int irq_num,
+   cnxk_bphy

[dpdk-dev] [PATCH v2 32/32] raw/cnxk_bphy: support for bphy selftest

2021-06-15 Thread Tomasz Duszynski
Add support for performing selftest.

Signed-off-by: Jakub Palider 
Signed-off-by: Tomasz Duszynski 
---
 doc/guides/rawdevs/cnxk_bphy.rst  |   7 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 124 ++
 2 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rawdevs/cnxk_bphy.rst b/doc/guides/rawdevs/cnxk_bphy.rst
index 899557df0..f5be5f62d 100644
--- a/doc/guides/rawdevs/cnxk_bphy.rst
+++ b/doc/guides/rawdevs/cnxk_bphy.rst
@@ -141,15 +141,14 @@ Message must have type set to 
``CNXK_BPHY_IRQ_MSG_TYPE_MEM_GET``. There's a conv
 Self test
 -
 
-On EAL initialization, BPHY CGX/RPM devices will be probed and populated into
+On EAL initialization BPHY and BPHY CGX/RPM devices will be probed and 
populated into
 the raw devices. The rawdev ID of the device can be obtained using invocation
 of ``rte_rawdev_get_dev_id("NAME:x")`` from the test application, where:
 
-- NAME is the desired subsystem: use "BPHY_CGX" for
+- NAME is the desired subsystem: use "BPHY" for regular, and "BPHY_CGX" for
   RFOE module,
 - x is the device's bus id specified in "bus:device.func" (BDF) format.
 
 Use this identifier for further rawdev function calls.
 
-The driver's selftest rawdev API can be used to verify the BPHY CGX/RPM
-functionality.
+Selftest rawdev API can be used to verify the BPHY and BPHY CGX/RPM 
functionality.
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c 
b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index 2a516ae73..9cb3f8d33 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -11,6 +11,7 @@
 #include 
 
 #include 
+#include 
 
 #include "cnxk_bphy_irq.h"
 #include "rte_pmd_bphy.h"
@@ -22,6 +23,128 @@ static const struct rte_pci_id pci_bphy_map[] = {
},
 };
 
+struct bphy_test {
+   int irq_num;
+   cnxk_bphy_intr_handler_t handler;
+   void *data;
+   int cpu;
+   bool handled_intr;
+   int handled_data;
+   int test_data;
+};
+
+static struct bphy_test *test;
+
+static void
+bphy_test_handler_fn(int irq_num, void *isr_data)
+{
+   test[irq_num].handled_intr = true;
+   test[irq_num].handled_data = *((int *)isr_data);
+}
+
+static int
+bphy_rawdev_selftest(uint16_t dev_id)
+{
+   unsigned int i, queues, descs;
+   uint64_t max_irq;
+   int ret;
+
+   queues = rte_rawdev_queue_count(dev_id);
+   if (queues == 0)
+   return -ENODEV;
+
+   ret = rte_rawdev_start(dev_id);
+   if (ret)
+   return ret;
+
+   ret = rte_rawdev_queue_conf_get(dev_id, CNXK_BPHY_DEF_QUEUE, &descs,
+   sizeof(descs));
+   if (ret)
+   goto err_desc;
+   if (descs != 1) {
+   ret = -ENODEV;
+   plt_err("Wrong number of descs reported\n");
+   goto err_desc;
+   }
+
+   ret = rte_pmd_bphy_intr_init(dev_id);
+   if (ret) {
+   plt_err("intr init failed");
+   return ret;
+   }
+
+   max_irq = cnxk_bphy_irq_max_get(dev_id);
+
+   test = rte_zmalloc("BPHY", max_irq * sizeof(*test), 0);
+   if (test == NULL) {
+   plt_err("intr alloc failed");
+   ret = -ENOMEM;
+   goto err_alloc;
+   }
+
+   for (i = 0; i < max_irq; i++) {
+   test[i].test_data = i;
+   test[i].irq_num = i;
+   test[i].handler = bphy_test_handler_fn;
+   test[i].data = &test[i].test_data;
+   }
+
+   for (i = 0; i < max_irq; i++) {
+   ret = rte_pmd_bphy_intr_register(dev_id, test[i].irq_num,
+test[i].handler, test[i].data,
+0);
+   if (ret == -ENOTSUP) {
+   /* In the test we iterate over all irq numbers
+* so if some of them are not supported by given
+* platform we treat respective results as valid
+* ones. This way they have no impact on overall
+* test results.
+*/
+   test[i].handled_intr = true;
+   test[i].handled_data = test[i].test_data;
+   ret = 0;
+   continue;
+   }
+
+   if (ret) {
+   plt_err("intr register failed at irq %d", i);
+   goto err_register;
+   }
+   }
+
+   for (i = 0; i < max_irq; i++)
+   roc_bphy_intr_handler(i);
+
+   for (i = 0; i < max_irq; i++) {
+   if (!test[i].handled_intr) {
+   plt_err("intr %u not handled", i);
+   ret = -1;
+   break;
+   }
+   if (test[i].handled_data != test[i].test_data) {
+   plt_err("intr %u has wrong handler", i);
+   ret = -1;
+   break;
+   

Re: [dpdk-dev] [PATCH] bus: clarify log for non-NUMA-aware devices

2021-06-15 Thread Xueming(Steven) Li



> -Original Message-
> From: Dmitry Kozlyuk 
> Sent: Tuesday, June 15, 2021 6:44 PM
> To: dev@dpdk.org
> Cc: Xueming(Steven) Li ; sta...@dpdk.org; Marcin Wojtas 
> ; Michal Krawczyk
> ; Guy Tzalik ; Evgeny Schemeilin 
> ; Igor Chauskin
> ; Stephen Hemminger ; 
> NBU-Contact-longli ; Sergio
> Gonzalez Monroy 
> Subject: [PATCH] bus: clarify log for non-NUMA-aware devices
> 
> PCI and vmbus drivers printed a warning
> when NUMA node had beed reported as (-1) or not reported by OS:
> 
> EAL:   Invalid NUMA socket, default to 0
> 
> This message and its level might confuse users, because configuration is 
> valid and nothing happens that requires attention or
> intervention.
> 
> Reduce level to INFO and reword the message.
> 
> Fixes: f0e0e86aa35d ("pci: move NUMA node check from scan to probe")
> Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Dmitry Kozlyuk 
> Reviewed-by: Slava Ovsiienko 
> ---
> Hi Xueming,
> Please align logging in the pending bus/auxiliary patch.

LGTM, updated.

> 
>  doc/guides/nics/ena.rst  | 2 +-
>  drivers/bus/pci/pci_common.c | 2 +-
>  drivers/bus/vmbus/vmbus_common.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst index 
> 0f1f63f722..694ce1da74 100644
> --- a/doc/guides/nics/ena.rst
> +++ b/doc/guides/nics/ena.rst
> @@ -234,7 +234,7 @@ Example output:
> 
> [...]
> EAL: PCI device :00:06.0 on NUMA socket -1
> -   EAL:   Invalid NUMA socket, default to 0
> +   EAL:   Device is not NUMA-aware, defaulting socket to 0
> EAL:   probe driver: 1d0f:ec20 net_ena
> 
> Interactive-mode selected
> diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c 
> index 35d7d092d1..bf06f81229 100644
> --- a/drivers/bus/pci/pci_common.c
> +++ b/drivers/bus/pci/pci_common.c
> @@ -190,7 +190,7 @@ rte_pci_probe_one_driver(struct rte_pci_driver *dr,
>   }
> 
>   if (dev->device.numa_node < 0) {
> - RTE_LOG(WARNING, EAL, "  Invalid NUMA socket, default to 0\n");
> + RTE_LOG(INFO, EAL, "  Device is not NUMA-aware, defaulting 
> socket to
> +0\n");
>   dev->device.numa_node = 0;
>   }
> 
> diff --git a/drivers/bus/vmbus/vmbus_common.c 
> b/drivers/bus/vmbus/vmbus_common.c
> index d25fd14ef5..ef23af90ec 100644
> --- a/drivers/bus/vmbus/vmbus_common.c
> +++ b/drivers/bus/vmbus/vmbus_common.c
> @@ -112,7 +112,7 @@ vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
>   dev->driver = dr;
> 
>   if (dev->device.numa_node < 0) {
> - VMBUS_LOG(WARNING, "  Invalid NUMA socket, default to 0");
> + VMBUS_LOG(INFO, "  Device is not NUMA-aware, defaulting socket 
> to
> +0\n");
>   dev->device.numa_node = 0;
>   }
> 
> --
> 2.18.2

Reviewed-by: Xueming(Steven) Li 


Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction

2021-06-15 Thread Xia, Chenbo
Hi Parav,

> -Original Message-
> From: Parav Pandit 
> Sent: Tuesday, June 15, 2021 1:43 PM
> To: Xia, Chenbo ; NBU-Contact-Thomas Monjalon
> ; Yigit, Ferruh 
> Cc: Xueming(Steven) Li ; Slava Ovsiienko
> ; dev@dpdk.org; Jason Gunthorpe 
> Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> 
> 
> 
> > From: Xia, Chenbo 
> > Sent: Tuesday, June 15, 2021 11:03 AM
> >
> > Hi Parav,
> >
> > > -Original Message-
> > > From: Parav Pandit 
> > > Sent: Tuesday, June 15, 2021 12:05 PM
> > > To: Xia, Chenbo ; NBU-Contact-Thomas Monjalon
> > > ; Yigit, Ferruh 
> > > Cc: Xueming(Steven) Li ; Slava Ovsiienko
> > > ; dev@dpdk.org; Jason Gunthorpe
> > > 
> > > Subject: RE: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction
> > >
> > > Hi Chenbo,
> > >
> > > > From: Xia, Chenbo 
> > > > Sent: Tuesday, June 15, 2021 7:41 AM
> > > >
> > > > Hi Thomas,
> > > >
> > > > > From: Thomas Monjalon 
> > > > > Sent: Friday, June 11, 2021 3:54 PM
> > > [..]
> > >
> > > >
> > > > Yes. In our term it's called Assignable Device Interface (ADI)
> > > > introduced in Intel Scalable IOV
> > > > (https://01.org/blogs/2019/assignable-interfaces-intel-
> > > > scalable-i/o-virtualization-linux)
> > > >
> > > > And vfio-mdev is chosen to be the software framework for it. I start
> > > > to
> > > realize
> > > > there is difference between SF and ADI: SF considers multi-function
> > > > devices which may include net/regex/vdpa/...
> > > Yes. net, rdma, vdpa, regex ++.
> > > And eventually vfio_device to map to VM too.
> > >
> > > Non mdev framework is chosen so that all the use cases of kernel only,
> > > or user only or mix modes can be supported.
> >
> > OK. Got it.
> >
> > >
> > > > But ADI only focuses on the
> > > > virtualization of the devices and splitting devices to logic parts
> > > > and
> > > providing
> > > > huge number of interfaces to host APP. I think SF also considers
> > > > this but is mainly used for multi-function devices (like DPU in your
> term?
> > > > Correct me if I'm wrong).
> > > >
> > > SF also supports DPU mode too but it is in addition to above use cases.
> > > SF will expose mdev (or a vfio_device) to map to a VM.
> >
> > So your SW actually supports vfio-mdev? I suppose the device-specific mdev
> > Kernel module is out-of-tree?
> >
> mlx5 driver doesn't support vfio_device for SFs.
> Kernel plumbing for PASID assignment to SF is WIP currently kernel community.
> We do not have any out-of-tree kernel module.
> 
> > Just FYI:
> >
> > We are introducing a new mdev bus for DPDK:
> > http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-
> > chenbo@intel.com/
> >
> I am yet to read about it. But I am not sure what value does it add.
> A user can open a vfio device using vfio subsystem and operate on it.
> A vfio device can be a create as a result of binding PCI VF/PF to vfio-pci
> driver or a SF by binding SF to vfio_foo driver.

Yes, in general it is the way. For vfio-mdev, it works as binding the vfio-mdev
to parent device and echo uuid to create a virtual device. VFIO APP like DPDK,
as you said, should work similar with VFIO UAPI for vfio-pci devices or 
mdev-based
devices. But currently DPDK only cares about vfio-pci devices and does not care
things for other cases like mdev-based pci devices. For example, it does not 
scan
/sys/bus/mdev and it always uses pci bdf as device address, which mdev-based pci
devices do not have. Therefore I sent that patchset.

> There is kernel work in progress to use vfio core as library.

OK. Could you share me some link to it? Much appreciated.

> So we do not anticipate to use add mdev layer and uuid to create a vfio device
> for a SF.

OK. For now, we are following the vfio-mdev standard, using UUID to create vfio
devices.

> 
> For Intel, ADI will never has any netdevs or rdma dev?

I think technically it could have. But for some devices like our dma devices, 
it's
just using mdev:

https://www.spinics.net/lists/kvm/msg244417.html

Thanks,
Chenbo



Re: [dpdk-dev] [PATCH v2 6/6] vhost: convert inflight data to DPDK allocation API

2021-06-15 Thread Maxime Coquelin



On 6/15/21 11:25 AM, David Marchand wrote:
> On Tue, Jun 15, 2021 at 10:43 AM Maxime Coquelin
>  wrote:
>> @@ -559,6 +559,31 @@ numa_realloc(struct virtio_net *dev, int index)
>> vq->log_cache = lc;
>> }
>>
>> +   if (vq->resubmit_inflight) {
>> +   struct rte_vhost_resubmit_info *ri;
>> +
>> +   ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 
>> 0, node);
>> +   if (!ri) {
>> +   VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit 
>> inflight on node %d\n",
>> +   node);
>> +   return dev;
>> +   }
>> +   vq->resubmit_inflight = ri;
>> +
>> +   if (vq->resubmit_inflight) {
> 
> Quick first pass, I'll review more thoroughly the whole series later.
> 
> I suppose you want to test ri->resubmit_list != NULL (else, this test
> is unnecessary since we made sure ri != NULL earlier).

Thanks for catching it, I screwed up my copy/paste...

this check should be about !ri->resubmit_list indeed, and below one
about !rd.

It will be fixed in v3, but I'll let time for review on this revision.

Thanks,
Maxime

>> +   struct rte_vhost_resubmit_desc *rd;
>> +
>> +   rd = rte_realloc_socket(ri->resubmit_list, 
>> sizeof(*rd) * ri->resubmit_num,
>> +   0, node);
>> +   if (!ri) {
>> +   VHOST_LOG_CONFIG(ERR, "Failed to realloc 
>> resubmit list on node %d\n",
>> +   node);
>> +   return dev;
>> +   }
>> +   ri->resubmit_list = rd;
>> +   }
>> +   }
>> +
>> vq->numa_node = node;
> 
> 
> 

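For clarity, a sketch of how the hunk reads with both checks corrected as
discussed (the second test on ri->resubmit_list and the inner one on rd);
this is only a reconstruction from the review comments, not the actual v3:

	if (vq->resubmit_inflight) {
		struct rte_vhost_resubmit_info *ri;

		ri = rte_realloc_socket(vq->resubmit_inflight, sizeof(*ri), 0, node);
		if (!ri) {
			VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit inflight on node %d\n",
					node);
			return dev;
		}
		vq->resubmit_inflight = ri;

		if (ri->resubmit_list) {
			struct rte_vhost_resubmit_desc *rd;

			rd = rte_realloc_socket(ri->resubmit_list,
					sizeof(*rd) * ri->resubmit_num, 0, node);
			if (!rd) {
				VHOST_LOG_CONFIG(ERR, "Failed to realloc resubmit list on node %d\n",
						node);
				return dev;
			}
			ri->resubmit_list = rd;
		}
	}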


Re: [dpdk-dev] [PATCH v13] app/testpmd: support multi-process

2021-06-15 Thread Min Hu (Connor)

Hi, Andrew,
see replies below; the comments without a reply will be fixed in v14.

在 2021/6/8 16:42, Andrew Rybchenko 写道:

@Thomas, @Ferruh, please, see question below.

On 4/22/21 4:18 AM, Min Hu (Connor) wrote:

This patch adds multi-process support for testpmd.
The test cmd example as follows:
the primary cmd:
./dpdk-testpmd -a xxx --proc-type=auto -l 0-1 -- -i \
--rxq=4 --txq=4 --num-procs=2 --proc-id=0

the secondary cmd:
./dpdk-testpmd -a xxx --proc-type=auto -l 2-3 -- -i \
--rxq=4 --txq=4 --num-procs=2 --proc-id=1

Signed-off-by: Min Hu (Connor) 
Signed-off-by: Lijun Ou 
Acked-by: Xiaoyun Li 
Acked-by: Ajit Khaparde 
Reviewed-by: Ferruh Yigit 
---
v13:
* Modified the doc syntax.

v12:
* Updated doc info.

v11:
* Fixed some minor syntax.

v10:
* Hid process type checks behind new functions.
* Added comments.

v9:
* Updated release notes and rst doc.
* Deleted deprecated codes.
* move macro and variable.

v8:
* Added warning info about queue numbers and process numbers.

v7:
* Fixed compiling error for unexpected unindent.

v6:
* Add rte flow description for multiple process.

v5:
* Fixed run_app.rst for multiple process description.
* Fix compiling error.

v4:
* Fixed minimum value of Rxq or Txq in doc.

v3:
* Fixed compiling error using gcc10.0.

v2:
* Added document for this patch.
---
  app/test-pmd/cmdline.c |   6 ++
  app/test-pmd/config.c  |  21 +-
  app/test-pmd/parameters.c  |  11 +++
  app/test-pmd/testpmd.c | 129 ++---
  app/test-pmd/testpmd.h |   9 +++
  doc/guides/rel_notes/release_21_05.rst |   1 +
  doc/guides/testpmd_app_ug/run_app.rst  |  70 ++
  7 files changed, 220 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 12efbc0..f0fa6e8 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5450,6 +5450,12 @@ cmd_set_flush_rx_parsed(void *parsed_result,
__rte_unused void *data)
  {
struct cmd_set_flush_rx *res = parsed_result;
+
+   if (num_procs > 1 && (strcmp(res->mode, "on") == 0)) {
+   printf("multi-process doesn't support to flush rx queues.\n");


rx -> Rx


+   return;
+   }
+
no_flush_rx = (uint8_t)((strcmp(res->mode, "on") == 0) ? 0 : 1);
  }
  
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c

index e189062..9eb1fa7 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2971,6 +2971,8 @@ rss_fwd_config_setup(void)
queueid_t  rxq;
queueid_t  nb_q;
streamid_t  sm_id;
+   int start;
+   int end;
  
  	nb_q = nb_rxq;

if (nb_q > nb_txq)
@@ -2988,7 +2990,22 @@ rss_fwd_config_setup(void)
init_fwd_streams();
  
  	setup_fwd_config_of_each_lcore(&cur_fwd_config);

-   rxp = 0; rxq = 0;
+
+   if (proc_id > 0 && nb_q % num_procs)


Please, compare result with 0 explicitly.


+   printf("Warning! queue numbers should be multiple of "
+   "processes, or packet loss will happen.\n");


Do not split format string across multiple lines.

Frankly speaking I don't understand why. Why is it impossible to
serve 2 queues in the first process and 1 queue in the second
process if 3 queues and 2 processes are configured?
I think the RSS redirection table can perfectly do it.


Well, currently my patch is one design implementation. I think
this can be addressed as a later improvement.

+
+   /**
+* In multi-process, All queues are allocated to different
+* processes based on num_procs and proc_id. For example:
+* if supports 4 queues(nb_q), 2 processes(num_procs),
+* the 0~1 queue for primary process.
+* the 2~3 queue for secondary process.
+*/
+   start = proc_id * nb_q / num_procs;
+   end = start + nb_q / num_procs;
+   rxp = 0;
+   rxq = start;
for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
struct fwd_stream *fs;
  
@@ -3005,6 +3022,8 @@ rss_fwd_config_setup(void)

continue;
rxp = 0;
rxq++;
+   if (rxq >= end)
+   rxq = start;
}
  }
  
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c

index f3954c1..ece05c1 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -508,6 +508,9 @@ parse_link_speed(int n)
  void
  launch_args_parse(int argc, char** argv)
  {
+#define PARAM_PROC_ID "proc-id"
+#define PARAM_NUM_PROCS "num-procs"
+
int n, opt;
char **argvopt;
int opt_idx;
@@ -625,6 +628,8 @@ launch_args_parse(int argc, char** argv)
{ "rx-mq-mode", 1, 0, 0 },
{ "record-core-cycles", 0, 0, 0 },
{ "record-burst-stats", 0, 0, 0 },
+   { PARAM_NUM_PROCS,  1, 0, 0 },
+   { PARAM_PR

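A standalone illustration of the queue split discussed above, using the
3-queue/2-process case from the review; it only reproduces the start/end
arithmetic, not testpmd code:

#include <stdio.h>

int
main(void)
{
	const int nb_q = 3, num_procs = 2;	/* the case questioned above */
	int proc_id;

	for (proc_id = 0; proc_id < num_procs; proc_id++) {
		int start = proc_id * nb_q / num_procs;
		int end = start + nb_q / num_procs;

		printf("proc %d polls rxq %d..%d\n", proc_id, start, end - 1);
	}
	/*
	 * Output: proc 0 polls rxq 0..0, proc 1 polls rxq 1..1.
	 * Queue 2 is never polled, which is why the patch warns when the
	 * queue count is not a multiple of the process count.
	 */
	return 0;
}
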
[dpdk-dev] mbuf next field belongs in the first cacheline

2021-06-15 Thread Morten Brørup
MBUF and MLX5 maintainers,

I'm picking up an old discussion, which you might consider pursuing. Feel free 
to ignore, if you consider this discussion irrelevant or already closed and 
done with.

The Techboard has previously discussed the organization of the mbuf fields. 
Ref: http://mails.dpdk.org/archives/dev/2020-November/191859.html

It was concluded that there was no measured performance difference if the 
"pool" or "next" field was in the first cacheline, so it was decided to put the 
"pool" field in the first cacheline. And further optimizing the mbuf field 
organization could be reconsidered later.

I have been looking at it. In theory it should not be required to touch the 
"pool" field at RX. But the "next" field must be written for segmented packets.

I think you could achieve an RX performance gain in the MLX5 driver if the mbuf 
structure was changed so the "next" and "pool" fields were swapped (i.e. 
putting "next" in the first cacheline), and /drivers/net/mlx5/mlx5_rx.c line 
821 was modified to replace "rep = rte_mbuf_raw_alloc(seg->pool)" with 
something conceptually like "rep = rte_mbuf_raw_alloc(rxq->pool)". Then you 
don't have to touch the mbuf's "pool" field (residing in the second cacheline 
with this change) during RX. This way, you would only touch the mbuf's first 
cacheline during RX.

My suggested optimization might be purely theoretical: Many applications touch 
the mbuf's second cacheline shortly after RX anyway.

If you don't pursue this mbuf reorganization, the comment to the mbuf's 
cacheline1 field is incorrect and should be updated:
- /* second cache line - fields only used in slow path or on TX */
+ /* second cache line - fields mainly used in slow path or on TX */

-Morten
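
A conceptual sketch of the suggested mlx5 change, purely illustrative: the
per-queue mempool pointer below is hypothetical, and the point is only that
replenishment would read the Rx queue structure instead of the old mbuf's
second cacheline:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* "rxq_pool" stands for a mempool pointer cached in the Rx queue structure;
 * allocating from it avoids dereferencing seg->pool in the 2nd cacheline.
 */
static inline struct rte_mbuf *
rx_replenish(struct rte_mempool *rxq_pool)
{
	return rte_mbuf_raw_alloc(rxq_pool);
}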



Re: [dpdk-dev] [RFC PATCH v2 2/3] example/qos_sched: add PIE support

2021-06-15 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Liguzinski,
> WojciechX
> Sent: Tuesday, 15 June 2021 11.02

[snip]

> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c07d..48132f27df 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -89,7 +89,6 @@
>  #define RTE_MAX_LCORE_FREQS 64
> 
>  /* rte_sched defines */
> -#undef RTE_SCHED_RED

Should the above be removed, or replaced with:
#undef RTE_SCHED_AQM

>  #undef RTE_SCHED_COLLECT_STATS
>  #undef RTE_SCHED_SUBPORT_TC_OV
>  #define RTE_SCHED_PORT_N_GRINDERS 8



[dpdk-dev] [PATCH v14] app/testpmd: support multi-process

2021-06-15 Thread Min Hu (Connor)
This patch adds multi-process support for testpmd.
The test cmd example as follows:
the primary cmd:
./dpdk-testpmd -a xxx --proc-type=auto -l 0-1 -- -i \
--rxq=4 --txq=4 --num-procs=2 --proc-id=0

the secondary cmd:
./dpdk-testpmd -a xxx --proc-type=auto -l 2-3 -- -i \
--rxq=4 --txq=4 --num-procs=2 --proc-id=1

Signed-off-by: Min Hu (Connor) 
Signed-off-by: Lijun Ou 
Acked-by: Xiaoyun Li 
Acked-by: Ajit Khaparde 
Reviewed-by: Ferruh Yigit 
---
v14:
* Fixed comments by Andrew Rybchenko.

v13:
* Modified the doc syntax.

v12:
* Updated doc info.

v11:
* Fixed some minor syntax.

v10:
* Hid process type checks behind new functions.
* Added comments.

v9:
* Updated release notes and rst doc.
* Deleted deprecated codes.
* move macro and variable.

v8:
* Added warning info about queue numbers and process numbers.

v7:
* Fixed compiling error for unexpected unindent.

v6:
* Add rte flow description for multiple process.

v5:
* Fixed run_app.rst for multiple process description.
* Fix compiling error.

v4:
* Fixed minimum value of Rxq or Txq in doc.

v3:
* Fixed compiling error using gcc10.0.

v2:
* Added document for this patch.
---
 app/test-pmd/cmdline.c |   6 ++
 app/test-pmd/config.c  |  20 +-
 app/test-pmd/parameters.c  |   9 +++
 app/test-pmd/testpmd.c | 121 ++---
 app/test-pmd/testpmd.h |   9 +++
 doc/guides/rel_notes/release_21_08.rst |   2 +-
 doc/guides/testpmd_app_ug/run_app.rst  |  70 +++
 7 files changed, 212 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0268b18..a215b12 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5463,6 +5463,12 @@ cmd_set_flush_rx_parsed(void *parsed_result,
__rte_unused void *data)
 {
struct cmd_set_flush_rx *res = parsed_result;
+
+   if (num_procs > 1 && (strcmp(res->mode, "on") == 0)) {
+   printf("multi-process doesn't support to flush Rx queues.\n");
+   return;
+   }
+
no_flush_rx = (uint8_t)((strcmp(res->mode, "on") == 0) ? 0 : 1);
 }
 
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 43c79b5..a0c24bb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2976,6 +2976,8 @@ rss_fwd_config_setup(void)
queueid_t  rxq;
queueid_t  nb_q;
streamid_t  sm_id;
+   int start;
+   int end;
 
nb_q = nb_rxq;
if (nb_q > nb_txq)
@@ -2993,7 +2995,21 @@ rss_fwd_config_setup(void)
init_fwd_streams();
 
setup_fwd_config_of_each_lcore(&cur_fwd_config);
-   rxp = 0; rxq = 0;
+
+   if (proc_id > 0 && nb_q % num_procs != 0)
+   printf("Warning! queue numbers should be multiple of processes, 
or packet loss will happen.\n");
+
+   /**
+* In multi-process mode, all queues are distributed across the
+* processes based on num_procs and proc_id. For example, with
+* 4 queues (nb_q) and 2 processes (num_procs):
+* queues 0~1 go to the primary process,
+* queues 2~3 go to the secondary process.
+*/
+   start = proc_id * nb_q / num_procs;
+   end = start + nb_q / num_procs;
+   rxp = 0;
+   rxq = start;
for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) {
struct fwd_stream *fs;
 
@@ -3010,6 +3026,8 @@ rss_fwd_config_setup(void)
continue;
rxp = 0;
rxq++;
+   if (rxq >= end)
+   rxq = start;
}
 }
 
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index f3954c1..0f09841 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -508,6 +508,9 @@ parse_link_speed(int n)
 void
 launch_args_parse(int argc, char** argv)
 {
+#define PARAM_PROC_ID "proc-id"
+#define PARAM_NUM_PROCS "num-procs"
+
int n, opt;
char **argvopt;
int opt_idx;
@@ -625,6 +628,8 @@ launch_args_parse(int argc, char** argv)
{ "rx-mq-mode", 1, 0, 0 },
{ "record-core-cycles", 0, 0, 0 },
{ "record-burst-stats", 0, 0, 0 },
+   { PARAM_NUM_PROCS,  1, 0, 0 },
+   { PARAM_PROC_ID,1, 0, 0 },
{ 0, 0, 0, 0 },
};
 
@@ -1391,6 +1396,10 @@ launch_args_parse(int argc, char** argv)
record_core_cycles = 1;
if (!strcmp(lgopts[opt_idx].name, "record-burst-stats"))
record_burst_stats = 1;
+   if (!strcmp(lgopts[opt_idx].name, PARAM_NUM_PROCS))
+   num_procs = atoi(optarg);
+   if (!strcmp(lgopts[opt_idx].name, PARAM_PROC_ID))
+   proc_id = atoi(optarg);
break;
case 'h':
   

Re: [dpdk-dev] [PATCH v2 10/62] net/cnxk: add platform specific probe and remove

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:34 PM Nithin Dabilpuram
 wrote:
>
> Add platform specific probe and remove callbacks for CN9K
> and CN10K which use common probe and remove functions.
> Register ethdev driver for CN9K and CN10K.
>
> Signed-off-by: Nithin Dabilpuram 

Reviewed-by: Jerin Jacob 


> ---
>  drivers/net/cnxk/cn10k_ethdev.c | 64 
>  drivers/net/cnxk/cn10k_ethdev.h |  9 +
>  drivers/net/cnxk/cn9k_ethdev.c  | 82 
> +
>  drivers/net/cnxk/cn9k_ethdev.h  |  9 +
>  drivers/net/cnxk/cnxk_ethdev.c  | 42 +
>  drivers/net/cnxk/cnxk_ethdev.h  | 19 ++
>  drivers/net/cnxk/meson.build|  5 +++
>  7 files changed, 230 insertions(+)
>  create mode 100644 drivers/net/cnxk/cn10k_ethdev.c
>  create mode 100644 drivers/net/cnxk/cn10k_ethdev.h
>  create mode 100644 drivers/net/cnxk/cn9k_ethdev.c
>  create mode 100644 drivers/net/cnxk/cn9k_ethdev.h
>
> diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
> new file mode 100644
> index 000..ff8ce31
> --- /dev/null
> +++ b/drivers/net/cnxk/cn10k_ethdev.c
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#include "cn10k_ethdev.h"
> +
> +static int
> +cn10k_nix_remove(struct rte_pci_device *pci_dev)
> +{
> +   return cnxk_nix_remove(pci_dev);
> +}
> +
> +static int
> +cn10k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device 
> *pci_dev)
> +{
> +   struct rte_eth_dev *eth_dev;
> +   int rc;
> +
> +   if (RTE_CACHE_LINE_SIZE != 64) {
> +   plt_err("Driver not compiled for CN10K");
> +   return -EFAULT;
> +   }
> +
> +   rc = roc_plt_init();
> +   if (rc) {
> +   plt_err("Failed to initialize platform model, rc=%d", rc);
> +   return rc;
> +   }
> +
> +   /* Common probe */
> +   rc = cnxk_nix_probe(pci_drv, pci_dev);
> +   if (rc)
> +   return rc;
> +
> +   if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +   eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
> +   if (!eth_dev)
> +   return -ENOENT;
> +   }
> +   return 0;
> +}
> +
> +static const struct rte_pci_id cn10k_pci_nix_map[] = {
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_PF),
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_PF),
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_VF),
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_VF),
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_AF_VF),
> +   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_AF_VF),
> +   {
> +   .vendor_id = 0,
> +   },
> +};
> +
> +static struct rte_pci_driver cn10k_pci_nix = {
> +   .id_table = cn10k_pci_nix_map,
> +   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA |
> +RTE_PCI_DRV_INTR_LSC,
> +   .probe = cn10k_nix_probe,
> +   .remove = cn10k_nix_remove,
> +};
> +
> +RTE_PMD_REGISTER_PCI(net_cn10k, cn10k_pci_nix);
> +RTE_PMD_REGISTER_PCI_TABLE(net_cn10k, cn10k_pci_nix_map);
> +RTE_PMD_REGISTER_KMOD_DEP(net_cn10k, "vfio-pci");
> diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h
> new file mode 100644
> index 000..1bf4a65
> --- /dev/null
> +++ b/drivers/net/cnxk/cn10k_ethdev.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#ifndef __CN10K_ETHDEV_H__
> +#define __CN10K_ETHDEV_H__
> +
> +#include 
> +
> +#endif /* __CN10K_ETHDEV_H__ */
> diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
> new file mode 100644
> index 000..701dc12
> --- /dev/null
> +++ b/drivers/net/cnxk/cn9k_ethdev.c
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +#include "cn9k_ethdev.h"
> +
> +static int
> +cn9k_nix_remove(struct rte_pci_device *pci_dev)
> +{
> +   return cnxk_nix_remove(pci_dev);
> +}
> +
> +static int
> +cn9k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device 
> *pci_dev)
> +{
> +   struct rte_eth_dev *eth_dev;
> +   struct cnxk_eth_dev *dev;
> +   int rc;
> +
> +   if (RTE_CACHE_LINE_SIZE != 128) {
> +   plt_err("Driver not compiled for CN9K");
> +   return -EFAULT;
> +   }
> +
> +   rc = roc_plt_init();
> +   if (rc) {
> +   plt_err("Failed to initialize platform model, rc=%d", rc);
> +   return rc;
> +   }
> +
> +   /* Common probe */
> +   rc = cnxk_nix_probe(pci_drv, pci_dev);
> +   if (rc)
> +   return rc;
> +
> +   /* Find eth dev allocated */
> +   eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
> +   if (!eth_dev)
> +   return -ENOENT;
> +
> +   dev = cnxk_

Re: [dpdk-dev] [PATCH v2 13/62] net/cnxk: add device configuration operation

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:34 PM Nithin Dabilpuram
 wrote:
>
> Add the device configuration op for CN9K and CN10K. Most of the
> device configuration is common between the two platforms, except for
> some supported offloads.
>
> Signed-off-by: Nithin Dabilpuram 
> +static int
> +nix_restore_queue_cfg(struct rte_eth_dev *eth_dev)
> +{
> +   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
> +   const struct eth_dev_ops *dev_ops = eth_dev->dev_ops;
> +   struct cnxk_eth_qconf *tx_qconf = dev->tx_qconf;
> +   struct cnxk_eth_qconf *rx_qconf = dev->rx_qconf;
> +   int rc, i, nb_rxq, nb_txq;
> +   void **txq, **rxq;
> +
> +   nb_rxq = RTE_MIN(dev->nb_rxq, eth_dev->data->nb_rx_queues);
> +   nb_txq = RTE_MIN(dev->nb_txq, eth_dev->data->nb_tx_queues);
> +
> +   rc = -ENOMEM;
> +   /* Setup tx & rx queues with the previous configuration so
> +* that the queues can remain functional in cases where ports
> +* are started without reconfiguring the queues.
> +*
> +* The usual reconfiguration sequence is as below:
> +* port_configure() {
> +*  if(reconfigure) {
> +*  queue_release()
> +*  queue_setup()
> +*  }
> +*  queue_configure() {
> +*  queue_release()
> +*  queue_setup()
> +*  }
> +* }
> +* port_start()


This logic is no longer required, as the bug in the KNI application has been fixed.

> +*
> +* In some applications' control paths, queue_configure() would
> +* NOT be invoked for TXQs/RXQs in port_configure().
> +* In such cases, the queues can be functional after start as
> +* they are already set up in port_configure().
> +*/
> +   for (i = 0; i < nb_txq; i++) {
> +   if (!tx_qconf[i].valid)
> +   continue;
> +   rc = dev_ops->tx_queue_setup(eth_dev, i, tx_qconf[i].nb_desc, 
> 0,
> +&tx_qconf[i].conf.tx);
> +   if (rc) {
> +   plt_err("Failed to setup tx queue rc=%d", rc);
> +   txq = eth_dev->data->tx_queues;
> +   for (i -= 1; i >= 0; i--)
> +   dev_ops->tx_queue_release(txq[i]);
> +   goto fail;
> +   }
> +   }
> +
> +   free(tx_qconf);
> +   tx_qconf = NULL;
> +
> +   for (i = 0; i < nb_rxq; i++) {
> +   if (!rx_qconf[i].valid)
> +   continue;
> +   rc = dev_ops->rx_queue_setup(eth_dev, i, rx_qconf[i].nb_desc, 
> 0,
> +&rx_qconf[i].conf.rx,
> +rx_qconf[i].mp);
> +   if (rc) {
> +   plt_err("Failed to setup rx queue rc=%d", rc);
> +   rxq = eth_dev->data->rx_queues;
> +   for (i -= 1; i >= 0; i--)
> +   dev_ops->rx_queue_release(rxq[i]);
> +   goto tx_queue_release;
> +   }
> +   }
> +
> +   free(rx_qconf);
> +   rx_qconf = NULL;
> +
> +   return 0;
> +
> +tx_queue_release:
> +   txq = eth_dev->data->tx_queues;
> +   for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> +   dev_ops->tx_queue_release(txq[i]);
> +fail:
> +   if (tx_qconf)
> +   free(tx_qconf);
> +   if (rx_qconf)
> +   free(rx_qconf);
> +
> +   return rc;
> +}
> +
>


Re: [dpdk-dev] [PATCH v2 14/62] net/cnxk: add link status update support

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:35 PM Nithin Dabilpuram
 wrote:
>
> Add link status update callback to get current
> link status.
>
> Signed-off-by: Nithin Dabilpuram 
> ---
>  doc/guides/nics/cnxk.rst  |   1 +
>  doc/guides/nics/features/cnxk.ini |   2 +
>  doc/guides/nics/features/cnxk_vec.ini |   2 +
>  doc/guides/nics/features/cnxk_vf.ini  |   2 +
>  drivers/net/cnxk/cnxk_ethdev.c|   7 +++
>  drivers/net/cnxk/cnxk_ethdev.h|   8 +++
>  drivers/net/cnxk/cnxk_link.c  | 102 
> ++
>  drivers/net/cnxk/meson.build  |   3 +-
>  8 files changed, 126 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/cnxk/cnxk_link.c
>
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2021 Marvell.
> + */
> +
> +#include "cnxk_ethdev.h"
> +
> +static inline int
> +nix_wait_for_link_cfg(struct cnxk_eth_dev *dev)
> +{
> +   uint16_t wait = 1000;
> +
> +   do {
> +   rte_rmb();

Please use gcc atomics instead.

With above change:

Reviewed-by: Jerin Jacob 

> +   if (!(dev->flags & CNXK_LINK_CFG_IN_PROGRESS_F))
> +   break;
> +   wait--;
> +   rte_delay_ms(1);
> +   } while (wait);
> +
> +   return wait ? 0 : -1;
> +}
>
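
For reference, a rough sketch of the same busy-wait using the GCC builtin
atomics (the flags field and macro names are taken from the patch above; the
acquire ordering is only illustrative, not a reviewed choice):

static inline int
nix_wait_for_link_cfg(struct cnxk_eth_dev *dev)
{
        uint16_t wait = 1000;

        do {
                /* Acquire-load the flags; this pairs with the release
                 * store that clears CNXK_LINK_CFG_IN_PROGRESS_F.
                 */
                if (!(__atomic_load_n(&dev->flags, __ATOMIC_ACQUIRE) &
                      CNXK_LINK_CFG_IN_PROGRESS_F))
                        break;
                wait--;
                rte_delay_ms(1);
        } while (wait);

        return wait ? 0 : -1;
}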


Re: [dpdk-dev] [PATCH v2 38/62] net/cnxk: add flow ctrl set/get ops

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:37 PM Nithin Dabilpuram
 wrote:
>
> From: Sunil Kumar Kori 
>
> Patch implements set and get operations for flow control.
>
> Signed-off-by: Sunil Kumar Kori 
> ---

> +
> +   /* Check if TX pause frame is already enabled or not */
> +   if (fc->tx_pause ^ tx_pause) {
> +   if (roc_model_is_cn96_Ax() && data->dev_started) {
> +   /* On Ax, CQ should be in disabled state
> +* while setting flow control configuration.
> +*/
> +   plt_info("Stop the port=%d for setting flow control",
> +data->port_id);
> +   return 0;
> +   }
> +
> +   for (i = 0; i < data->nb_rx_queues; i++) {

From here:

> +   memset(&fc_cfg, 0, sizeof(struct roc_nix_fc_cfg));
> +   rxq = ((struct cnxk_eth_rxq_sp *)
> +   data->rx_queues[i]) - 1;
> +   cq = &dev->cqs[rxq->qid];
> +   fc_cfg.cq_cfg_valid = true;
> +   fc_cfg.cq_cfg.enable = tx_pause;
> +   fc_cfg.cq_cfg.rq = rxq->qid;
> +   fc_cfg.cq_cfg.cq_drop = cq->drop_thresh;
> +   rc = roc_nix_fc_config_set(nix, &fc_cfg);
> +   if (rc)
> +   return rc;

Better to move this into a separate static function.
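
Something like the rough sketch below (types and field names are taken from
the hunk above; the extra nix parameter is only there to keep the sketch
self-contained):

static int
nix_fc_cq_config_set(struct cnxk_eth_dev *dev, struct roc_nix *nix,
                     uint16_t qid, uint8_t tx_pause)
{
        struct roc_nix_fc_cfg fc_cfg;

        /* Enable/disable flow control on the CQ backing this Rx queue */
        memset(&fc_cfg, 0, sizeof(struct roc_nix_fc_cfg));
        fc_cfg.cq_cfg_valid = true;
        fc_cfg.cq_cfg.enable = tx_pause;
        fc_cfg.cq_cfg.rq = qid;
        fc_cfg.cq_cfg.cq_drop = dev->cqs[qid].drop_thresh;

        return roc_nix_fc_config_set(nix, &fc_cfg);
}

so that the loop above reduces to a single call per Rx queue.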

> +   }
> +   }
> +
> +   rc = roc_nix_fc_mode_set(nix, mode_map[fc_conf->mode]);
> +   if (rc)
> +   return rc;
> +
> +   fc->rx_pause = rx_pause;
> +   fc->tx_pause = tx_pause;
> +   fc->mode = fc_conf->mode;
> +
> +   return rc;
> +}
> +
> +int
>  cnxk_nix_mac_addr_set(struct rte_eth_dev *eth_dev, struct rte_ether_addr 
> *addr)
>  {
> struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
> --
> 2.8.4
>


Re: [dpdk-dev] [PATCH v2 49/62] net/cnxk: add initial version of rte flow support

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:39 PM Nithin Dabilpuram
 wrote:
>
> From: Kiran Kumar K 
>
> Add the initial version of rte_flow support for the cnxk family of devices.
> Supported rte_flow ops are flow_validate, flow_create, flow_destroy,
> flow_flush, flow_query and flow_isolate.
>
> Signed-off-by: Kiran Kumar K 
> ---
>  doc/guides/nics/cnxk.rst  | 118 
>  doc/guides/nics/features/cnxk.ini |  42 ++
>  drivers/net/cnxk/cnxk_rte_flow.c  | 282 
> ++
>  drivers/net/cnxk/cnxk_rte_flow.h  |  69 ++
>  drivers/net/cnxk/meson.build  |   1 +
>  5 files changed, 512 insertions(+)
>  create mode 100644 drivers/net/cnxk/cnxk_rte_flow.c
>  create mode 100644 drivers/net/cnxk/cnxk_rte_flow.h
>
> diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
> index c2a6fbb..87401f0 100644
> --- a/doc/guides/nics/cnxk.rst
> +++ b/doc/guides/nics/cnxk.rst
> @@ -24,6 +24,7 @@ Features of the CNXK Ethdev PMD are:
>  - Multiple queues for TX and RX
>  - Receiver Side Scaling (RSS)
>  - MAC filtering
> +- Generic flow API
>  - Inner and Outer Checksum offload
>  - Port hardware statistics
>  - Link state information
> @@ -222,3 +223,120 @@ Debugging Options
> +---++---+
> | 2 | NPC| --log-level='pmd\.net.cnxk\.flow,8'   |
> +---++---+
> +
> +RTE Flow Support
> +
> +
> +The OCTEON CN9K/CN10K SoC family NIC has support for the following patterns 
> and
> +actions.
> +
> +Patterns:
> +
> +.. _table_cnxk_supported_flow_item_types:
> +
> +.. table:: Item types
> +
> +   +++
> +   | #  | Pattern Type   |
> +   +++
> +   | 1  | RTE_FLOW_ITEM_TYPE_ETH |
> +   +++
> +   | 2  | RTE_FLOW_ITEM_TYPE_VLAN|
> +   +++
> +   | 3  | RTE_FLOW_ITEM_TYPE_E_TAG   |
> +   +++
> +   | 4  | RTE_FLOW_ITEM_TYPE_IPV4|
> +   +++
> +   | 5  | RTE_FLOW_ITEM_TYPE_IPV6|
> +   +++
> +   | 6  | RTE_FLOW_ITEM_TYPE_ARP_ETH_IPV4|
> +   +++
> +   | 7  | RTE_FLOW_ITEM_TYPE_MPLS|
> +   +++
> +   | 8  | RTE_FLOW_ITEM_TYPE_ICMP|
> +   +++
> +   | 9  | RTE_FLOW_ITEM_TYPE_UDP |
> +   +++
> +   | 10 | RTE_FLOW_ITEM_TYPE_TCP |
> +   +++
> +   | 11 | RTE_FLOW_ITEM_TYPE_SCTP|
> +   +++
> +   | 12 | RTE_FLOW_ITEM_TYPE_ESP |
> +   +++
> +   | 13 | RTE_FLOW_ITEM_TYPE_GRE |
> +   +++
> +   | 14 | RTE_FLOW_ITEM_TYPE_NVGRE   |
> +   +++
> +   | 15 | RTE_FLOW_ITEM_TYPE_VXLAN   |
> +   +++
> +   | 16 | RTE_FLOW_ITEM_TYPE_GTPC|
> +   +++
> +   | 17 | RTE_FLOW_ITEM_TYPE_GTPU|
> +   +++
> +   | 18 | RTE_FLOW_ITEM_TYPE_GENEVE  |
> +   +++
> +   | 19 | RTE_FLOW_ITEM_TYPE_VXLAN_GPE   |
> +   +++
> +   | 20 | RTE_FLOW_ITEM_TYPE_IPV6_EXT|
> +   +++
> +   | 21 | RTE_FLOW_ITEM_TYPE_VOID|
> +   +++
> +   | 22 | RTE_FLOW_ITEM_TYPE_ANY |
> +   +++
> +   | 23 | RTE_FLOW_ITEM_TYPE_GRE_KEY |
> +   +++
> +   | 24 | RTE_FLOW_ITEM_TYPE_HIGIG2  |
> +   +++
> +
> +.. note::
> +
> +   ``RTE_FLOW_ITEM_TYPE_GRE_KEY`` works only when checksum and routing
> +   bits in the GRE header are equal to 0.
> +
> +Actions:
> +
> +.. _table_cnxk_supported_ingress_action_types:
> +
> +.. table:: Ingress action types
> +
> +   ++-+
> +   | #  | Action Type |
> +   ++=+
> +   | 1  | RTE_FLOW_ACTION_TYPE_VOID   |
> +   ++-+
> +   | 2  | RTE_FLOW_ACTION_TYPE_MARK   |
> +   ++-+
> +   | 3  | RTE_FLOW_ACTION_TYPE_FLAG   |
> +   ++-+
> +   | 4  | RTE_FLOW_ACTION_TYPE_COUNT  |
> +   ++-+
> +   | 5  | RTE_FLOW_ACTION_TYPE_DROP   |
> +   ++--

Re: [dpdk-dev] [RFC 00/14] mlx5: support SubFunction

2021-06-15 Thread Parav Pandit


> From: Xia, Chenbo 
> Sent: Tuesday, June 15, 2021 4:49 PM
> 
> >
> > > Just FYI:
> > >
> > > We are introducing a new mdev bus for DPDK:
> > > http://patchwork.dpdk.org/project/dpdk/cover/20210601030644.3318-1-
> > > chenbo@intel.com/
> > >
> > I have yet to read about it, but I am not sure what value it adds.
> > A user can open a vfio device using the vfio subsystem and operate on it.
> > A vfio device can be created as a result of binding a PCI VF/PF to the
> > vfio-pci driver, or for an SF by binding the SF to a vfio_foo driver.
> 
> Yes, in general that is the way. For vfio-mdev, it works by binding the
> vfio-mdev driver to the parent device and echoing a uuid to create a virtual
> device. A VFIO application like DPDK, as you said, should work similarly with
> the VFIO UAPI for vfio-pci devices or mdev-based devices. But currently DPDK
> only handles vfio-pci devices and does not handle other cases such as
> mdev-based pci devices. For example, it does not scan /sys/bus/mdev, and it
> always uses the PCI BDF as the device address, which mdev-based pci devices
> do not have. Therefore I sent that patchset.
An mdev device resides on the mdev bus, so DPDK should identify the mdev
object by specifying bus type = mdev and device id = uuid.
There should not be any attachment to PCI, as Thomas said.

> 
> > There is kernel work in progress to use the vfio core as a library.
> 
> OK. Could you share a link to it? Much appreciated.
> 
[1] https://lore.kernel.org/kvm/20210603160809.15845-1-mgurto...@nvidia.com/

> > So we do not anticipate adding an mdev layer and uuid to create a
> > vfio device for an SF.
> 
> OK. For now, we are following the vfio-mdev standard, using UUID to create
> vfio devices.
> 
If this layer is going to work on top of VFIO devices, does it really care
whether it is an mdev?
Can it identify the vfio device through the vfio device node and its UAPI in a
uniform way, such as open("/dev/vfio/98", ...)?


> >
> > For Intel, ADI will never have any netdevs or rdma devs?
> 
> I think technically it could have. 
Unlikely. As I explained in a previous email, creating netdev and rdma devices
on the mdev bus was already rejected for my previous patches.
And we moved forward with the auxiliary bus.

> But for some devices like our dma devices,
> it's just using mdev:
> 
> https://www.spinics.net/lists/kvm/msg244417.html
Possibly yes. Some devices might live on the mdev bus.
You should wait for the kernel patches to be merged, as Jason said.

I still think that identifying the vfio device by its /dev/vfio/ node will go
a long way, regardless of its bus type.
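
Roughly (a minimal sketch using only the standard VFIO UAPI from linux/vfio.h;
the group number in the path is just the example above, and error-path cleanup
is trimmed):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int
open_vfio_group(const char *path)   /* e.g. "/dev/vfio/98" */
{
        struct vfio_group_status status = { .argsz = sizeof(status) };
        int container, group;

        container = open("/dev/vfio/vfio", O_RDWR);
        if (container < 0 ||
            ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
                return -1;

        /* Same flow whether the group belongs to a PCI PF/VF, an SF or
         * an mdev; the bus the device lives on does not matter here.
         */
        group = open(path, O_RDWR);
        if (group < 0)
                return -1;

        if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) ||
            !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
                return -1;

        return group;
}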


Re: [dpdk-dev] [PATCH v2 51/62] net/cnxk: add ethdev firmware version get

2021-06-15 Thread Jerin Jacob
On Mon, Jun 7, 2021 at 11:39 PM Nithin Dabilpuram
 wrote:
>
> From: Satha Rao 
>
> Add callback to get ethdev firmware version.
>
> Signed-off-by: Satha Rao 

> +int
> +cnxk_nix_fw_version_get(struct rte_eth_dev *eth_dev, char *fw_version,
> +   size_t fw_size)
> +{
> +   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
> +   const char *str = roc_npc_profile_name_get(&dev->npc);
> +   uint32_t size = strlen(str) + 1;
> +
> +   if (fw_size > size)
> +   fw_size = size;
> +
> +   strlcpy(fw_version, str, fw_size);

use rte_strlcpy instead.
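
i.e. roughly (a sketch only; rte_strlcpy() comes from rte_string_fns.h, bounds
the copy to fw_size and NUL-terminates, so the explicit fw_size clamp above
also becomes unnecessary):

        rte_strlcpy(fw_version, str, fw_size);

        if (fw_size < size)
                return size;    /* buffer too small, report required size */

        return 0;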

> +
> +   if (fw_size < size)
> +   return size;
> +
> +   return 0;
> +}
> +
>  void
>  cnxk_nix_rxq_info_get(struct rte_eth_dev *eth_dev, uint16_t qid,
>   struct rte_eth_rxq_info *qinfo)
> --
> 2.8.4
>

