Re: [dpdk-dev] Recent change to make rte_cryptodev_pmd.h internal prevents some important functionality

2021-10-04 Thread Zhang, Roy Fan
Hi Akhil,

This isn’t what our concern was. Our concern was that rte_cryptodev_close() may not
free the memory as completely as rte_cryptodev_pmd_destroy() did.
Our investigation looked at whether the PMD could be more thorough, so that
rte_cryptodev_close() works the same as rte_cryptodev_pmd_destroy().

Paul’s concern is valid: we DO NOT have a way to release a queue pair
manually anymore, and to me releasing a queue pair should not be blocked from
public API access.
To resolve this problem we should have a public queue_pair_release() function
in cryptodev. If you are ok with that, I can send a patch for 21.11 right away.
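
As a rough sketch of the shape such a public wrapper could take (this is not
the actual patch): the mock types and the names ex_dev, ex_dev_ops and
ex_queue_pair_release() below are hypothetical stand-ins, not cryptodev
definitions. The only point is that a public entry point validates its
arguments and then dispatches to the driver's queue_pair_release callback, so
an application would not need the internal header. The rule that release is
refused while the device is started is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver-facing structures; these are NOT
 * the real cryptodev definitions. */
struct ex_dev;

struct ex_dev_ops {
	/* Driver callback that frees one queue pair. */
	int (*queue_pair_release)(struct ex_dev *dev, uint16_t qp_id);
};

struct ex_dev {
	const struct ex_dev_ops *ops;
	void **queue_pairs;
	uint16_t nb_queue_pairs;
	int started;
};

/* Hypothetical public wrapper: validate, then call into the driver. */
int
ex_queue_pair_release(struct ex_dev *dev, uint16_t qp_id)
{
	if (dev == NULL || qp_id >= dev->nb_queue_pairs)
		return -EINVAL;
	if (dev->started)
		return -EBUSY; /* assumption: only legal on a stopped device */
	if (dev->ops == NULL || dev->ops->queue_pair_release == NULL)
		return -ENOTSUP;
	return dev->ops->queue_pair_release(dev, qp_id);
}

/* Example driver callback: free the queue pair memory and clear the slot. */
int
ex_drv_qp_release(struct ex_dev *dev, uint16_t qp_id)
{
	free(dev->queue_pairs[qp_id]);
	dev->queue_pairs[qp_id] = NULL;
	return 0;
}
```

With something like this, the application would stop the device, release the
queue pairs it no longer needs one by one, and leave the others configured.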

Regards,
Fan


From: Akhil Goyal 
Sent: Monday, October 4, 2021 7:45 AM
To: Luse, Paul E ; dev@dpdk.org; Zhang, Roy Fan 

Cc: ma...@nvidia.com; hemant.agra...@nxp.com
Subject: RE: Recent change to make rte_cryptodev_pmd.h internal prevents some 
important functionality

Hi Paul,

A similar comment was discussed on the ML for the fips_validation app.
https://mails.dpdk.org/archives/dev/2021-August/217781.html

I believe Fan is working on a fix for the issue.
Fan, could you please share an update?

Regards,
Akhil

Note: Please CC maintainers for a prompt response, or else mails can be skipped.

From: Luse, Paul E <paul.e.l...@intel.com>
Sent: Sunday, October 3, 2021 3:14 AM
To: dev@dpdk.org
Cc: Akhil Goyal <gak...@marvell.com>; ma...@nvidia.com; Zhang, Roy Fan
<roy.fan.zh...@intel.com>; hemant.agra...@nxp.com
Subject: [EXT] Recent change to make rte_cryptodev_pmd.h internal prevents some 
important functionality

Hi Everyone,

I sent this last week and haven’t heard back. Apologies if I missed the
response, but if not, here it is again…

We use cryptodev in SPDK and included rte_cryptodev_pmd.h so that we may 
release qpair memory that was allocated when we called 
rte_cryptodev_queue_pair_setup().  We’d do so by calling the function pointer 
queue_pair_release() which I believe is the prescribed way to do this.

The DPDK change in question is here: 
https://github.com/DPDK/dpdk/commit/af668035f7f492424b2e199f155690815944a8ca

Question: Is there another way for us to release this memory? I’ve looked 
through the public API and nothing stands out.

Thanks
Paul



Re: [dpdk-dev] [PATCH v2] net: introduce IPv4 ihl and version fields

2021-10-04 Thread Olivier Matz
On Fri, Sep 03, 2021 at 10:30:03AM +0300, getelson wrote:
> From: Gregory Etelson 
> 
> RTE IPv4 header definition combines the `version' and `ihl'  fields
> into a single structure member.
> This patch introduces dedicated structure members for both `version'
> and `ihl' IPv4 fields. Separated header fields definitions allow to
> create simplified code to match on the IHL value in a flow rule.
> The original `version_ihl' structure member is kept for backward
> compatibility.
> 
> Signed-off-by: Gregory Etelson 
> 
> Depends-on: f7383e7c7ec1 ("net: announce changes in IPv4 header access")

Acked-by: Olivier Matz 

> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -38,7 +38,21 @@ extern "C" {
>   * IPv4 Header
>   */
>  struct rte_ipv4_hdr {
> - uint8_t  version_ihl;   /**< version and header length */
> + __extension__
> + union {
> + uint8_t version_ihl;/**< version and header length */
> + struct {
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> + uint8_t ihl:4;
> + uint8_t version:4;
> +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> + uint8_t version:4;
> + uint8_t ihl:4;

nit: although it's obvious, we may want to add /**< IP version */ and
/**< header length */ for these new fields, for consistency with the
rest of the structure.
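
For illustration, a cut-down sketch of the proposed layout with those comments
added. It is not the patch itself: the struct name ex_ipv4_first_byte and the
helper ex_ipv4_hdr_len() are made up for the example, and the compiler's
__BYTE_ORDER__ macros stand in for RTE_BYTE_ORDER so it builds without DPDK
headers. Bit-field ordering is implementation-defined; the sketch assumes the
usual GCC/Clang behaviour.

```c
#include <stdint.h>

/* Only the first byte of the IPv4 header is modelled here. */
struct ex_ipv4_first_byte {
	__extension__
	union {
		uint8_t version_ihl;       /**< version and header length */
		struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
			uint8_t ihl:4;     /**< header length */
			uint8_t version:4; /**< IP version */
#else
			uint8_t version:4; /**< IP version */
			uint8_t ihl:4;     /**< header length */
#endif
		};
	};
};

/* Header length in bytes: the IHL field counts 32-bit words. */
static inline unsigned int
ex_ipv4_hdr_len(const struct ex_ipv4_first_byte *h)
{
	return h->ihl * 4u;
}
```

Writing 0x45 to version_ihl should then read back as version 4 and ihl 5,
i.e. a 20-byte header, through the new members.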


Re: [dpdk-dev] Recent change to make rte_cryptodev_pmd.h internal prevents some important functionality

2021-10-04 Thread Akhil Goyal
Ahh, yes!!!
queue_pair_release is not a public API. It was declared in
rte_cryptodev_pmd.h, and the top of that file said NOT to use it
directly in the application.

Could you please describe the use case in which this memory needs to be released
before the device is stopped or closed? Or you can send the patch to introduce
it and explain the requirement of the use case in the patch description.

Regards,
Akhil




Re: [dpdk-dev] [PATCH 1/3] mbuf: remove deprecated offload flags

2021-10-04 Thread David Marchand
On Wed, Sep 29, 2021 at 11:50 PM Olivier Matz  wrote:
>
> The flags PKT_TX_VLAN_PKT, PKT_TX_QINQ_PKT, PKT_RX_EIP_CKSUM_BAD are
> marked as deprecated since commit 380a7aab1ae2 ("mbuf: rename deprecated
> VLAN flags") (2017). Remove their definitions from rte_mbuf_core.h,
> and replace their usages.

The patch lgtm except the removal of some "bad checksum" flags, see below.

[snip]


> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 05fc2fdee7..549e9416c4 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -159,11 +159,6 @@ Deprecation Notices
>will be limited to maximum 256 queues.
>Also compile time flag ``RTE_ETHDEV_QUEUE_STAT_CNTRS`` will be removed.
>
> -* ethdev: The offload flag ``PKT_RX_EIP_CKSUM_BAD`` will be removed and
> -  replaced by the new flag ``PKT_RX_OUTER_IP_CKSUM_BAD``. The new name is more
> -  consistent with existing outer header checksum status flag naming, which
> -  should help in reducing confusion about its usage.
> -
>  * i40e: As there are both i40evf and iavf pmd, the functions of them are
>duplicated. And now more and more advanced features are developed on iavf.
>To keep consistent with kernel driver's name

Those 3 flags are easy to replace, but some projects are still using them.

$ git grep-all -El '\<(PKT_TX_VLAN_PKT|PKT_TX_QINQ_PKT|PKT_RX_EIP_CKSUM_BAD)\>' | grep -v \\.patch$
DPVS/src/netif.c
DPVS/src/vlan.c
FD.io-VPP/src/plugins/dpdk/device/format.c
gatekeeper/bpf/bpf_mbuf.h
lagopus/src/dataplane/dpdk/worker.c
packet-journey/app/main.c
Trex/src/pal/common/common_mbuf.h
Trex/src/pal/linux/mbuf.h

Please update the release notes to announce this API update.


[snip]

> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index 9d8e3ddc86..93db9292c0 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -55,37 +55,12 @@ extern "C" {
>   /** RX packet with FDIR match indicate. */
>  #define PKT_RX_FDIR  (1ULL << 2)
>
> -/**
> - * Deprecated.
> - * Checking this flag alone is deprecated: check the 2 bits of
> - * PKT_RX_L4_CKSUM_MASK.
> - * This flag was set when the L4 checksum of a packet was detected as
> - * wrong by the hardware.
> - */
> -#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
> -
> -/**
> - * Deprecated.
> - * Checking this flag alone is deprecated: check the 2 bits of
> - * PKT_RX_IP_CKSUM_MASK.
> - * This flag was set when the IP checksum of a packet was detected as
> - * wrong by the hardware.
> - */
> -#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
> -

You did not mention PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD in the
commitlog.
There was no deprecation notice, and those flags were not marked
RTE_DEPRECATED (there are still many projects referencing them).

Is this removal intended?
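
For projects still testing the single bit, the check described in the removed
comment ("check the 2 bits of PKT_RX_L4_CKSUM_MASK") could look roughly like
the sketch below. The EX_ macros are local copies for illustration only: the
BAD bit (1ULL << 3) is taken from the quoted diff, while the position of the
GOOD bit and the meaning of the other two states are assumptions of this
example.

```c
#include <stdint.h>

/* Local copies for illustration. The BAD bit appears in the quoted diff;
 * the GOOD bit position is an assumption of this sketch. */
#define EX_RX_L4_CKSUM_BAD     (1ULL << 3)
#define EX_RX_L4_CKSUM_GOOD    (1ULL << 8)
#define EX_RX_L4_CKSUM_MASK    (EX_RX_L4_CKSUM_BAD | EX_RX_L4_CKSUM_GOOD)
#define EX_RX_L4_CKSUM_UNKNOWN 0ULL
#define EX_RX_L4_CKSUM_NONE    EX_RX_L4_CKSUM_MASK

/* Returns 1 only when the two-bit field says "bad". Testing the BAD bit
 * alone would also match the NONE state, where both bits are set. */
static inline int
ex_l4_cksum_is_bad(uint64_t ol_flags)
{
	return (ol_flags & EX_RX_L4_CKSUM_MASK) == EX_RX_L4_CKSUM_BAD;
}
```

The IP checksum status would follow the same pattern with its own mask.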


-- 
David Marchand



[dpdk-dev] [PATCH 1/3] event/cnxk: fix packet Tx overflow

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

The transmit loop incorrectly assumes that nb_mbufs is always
a multiple of 4 when transmitting an event vector. A vector can
be pushed out early on timeout, before it reaches its maximum
size, so the count need not be a multiple of 4.
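
A minimal sketch of the split this implies, assuming 4 descriptors per vector
iteration: the count is divided into a part the 4-wide loop may consume and a
remainder that has to be sent one packet at a time. The names EX_DESCS_PER_LOOP,
ex_vector_part() and ex_scalar_part() are illustrative only.

```c
#include <stdint.h>

#define EX_DESCS_PER_LOOP 4 /* must be a power of two */

/* Number of packets the 4-wide vector loop may consume. */
static inline uint16_t
ex_vector_part(uint16_t nb_mbufs)
{
	return nb_mbufs & (uint16_t)~(EX_DESCS_PER_LOOP - 1);
}

/* Leftover packets that must go through a one-at-a-time path. */
static inline uint16_t
ex_scalar_part(uint16_t nb_mbufs)
{
	return nb_mbufs & (EX_DESCS_PER_LOOP - 1);
}
```

For a vector of 10 mbufs this gives 8 for the vector loop and 2 for the scalar
path; without the split, the loop would read mbufs[10] and mbufs[11].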

Fixes: 761a321acf91 ("event/cnxk: support vectorized Tx event fast path")

Signed-off-by: Pavan Nikhilesh 
---
 Depends-on: series-18614 ("add SSO XAQ pool create and free")

 drivers/event/cnxk/cn10k_worker.h | 180 +-
 1 file changed, 77 insertions(+), 103 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 1255662b6c..657ab91ac8 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -7,10 +7,10 @@

 #include 

+#include "cn10k_cryptodev_ops.h"
 #include "cnxk_ethdev.h"
 #include "cnxk_eventdev.h"
 #include "cnxk_worker.h"
-#include "cn10k_cryptodev_ops.h"

 #include "cn10k_ethdev.h"
 #include "cn10k_rx.h"
@@ -237,18 +237,16 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,

cq_w1 = *(uint64_t *)(gw.u64[1] + 8);

-   sa_base = cnxk_nix_sa_base_get(port,
-  lookup_mem);
+   sa_base =
+   cnxk_nix_sa_base_get(port, lookup_mem);
sa_base &= ~(ROC_NIX_INL_SA_BASE_ALIGN - 1);

-   mbuf = (uint64_t)nix_sec_meta_to_mbuf_sc(cq_w1,
-   sa_base, (uintptr_t)&iova,
-   &loff, (struct rte_mbuf *)mbuf,
-   d_off);
+   mbuf = (uint64_t)nix_sec_meta_to_mbuf_sc(
+   cq_w1, sa_base, (uintptr_t)&iova, &loff,
+   (struct rte_mbuf *)mbuf, d_off);
if (loff)
roc_npa_aura_op_free(m->pool->pool_id,
 0, iova);
-
}

gw.u64[0] = CNXK_CLR_SUB_EVENT(gw.u64[0]);
@@ -396,6 +394,56 @@ cn10k_sso_hws_xtract_meta(struct rte_mbuf *m,
txq_data[m->port][rte_event_eth_tx_adapter_txq_get(m)];
 }

+static __rte_always_inline void
+cn10k_sso_tx_one(struct rte_mbuf *m, uint64_t *cmd, uint16_t lmt_id,
+uintptr_t lmt_addr, uint8_t sched_type, uintptr_t base,
+const uint64_t txq_data[][RTE_MAX_QUEUES_PER_PORT],
+const uint32_t flags)
+{
+   uint8_t lnum = 0, loff = 0, shft = 0;
+   struct cn10k_eth_txq *txq;
+   uintptr_t laddr;
+   uint16_t segdw;
+   uintptr_t pa;
+   bool sec;
+
+   txq = cn10k_sso_hws_xtract_meta(m, txq_data);
+   cn10k_nix_tx_skeleton(txq, cmd, flags);
+   /* Perform header writes before barrier
+* for TSO
+*/
+   if (flags & NIX_TX_OFFLOAD_TSO_F)
+   cn10k_nix_xmit_prepare_tso(m, flags);
+
+   cn10k_nix_xmit_prepare(m, cmd, flags, txq->lso_tun_fmt, &sec);
+
+   laddr = lmt_addr;
+   /* Prepare CPT instruction and get nixtx addr if
+* it is for CPT on same lmtline.
+*/
+   if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+   cn10k_nix_prep_sec(m, cmd, &laddr, lmt_addr, &lnum, &loff,
+  &shft, txq->sa_base, flags);
+
+   /* Move NIX desc to LMT/NIXTX area */
+   cn10k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+
+   if (flags & NIX_TX_MULTI_SEG_F)
+   segdw = cn10k_nix_prepare_mseg(m, (uint64_t *)laddr, flags);
+   else
+   segdw = cn10k_nix_tx_ext_subs(flags) + 2;
+
+   if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+   pa = txq->cpt_io_addr | 3 << 4;
+   else
+   pa = txq->io_addr | ((segdw - 1) << 4);
+
+   if (!sched_type)
+   roc_sso_hws_head_wait(base + SSOW_LF_GWS_TAG);
+
+   roc_lmt_submit_steorl(lmt_id, pa);
+}
+
 static __rte_always_inline void
 cn10k_sso_vwqe_split_tx(struct rte_mbuf **mbufs, uint16_t nb_mbufs,
uint64_t *cmd, uint16_t lmt_id, uintptr_t lmt_addr,
@@ -404,11 +452,13 @@ cn10k_sso_vwqe_split_tx(struct rte_mbuf **mbufs, uint16_t nb_mbufs,
const uint32_t flags)
 {
uint16_t port[4], queue[4];
+   uint16_t i, j, pkts, scalar;
struct cn10k_eth_txq *txq;
-   uint16_t i, j;
-   uintptr_t pa;

-   for (i = 0; i < nb_mbufs; i += 4) {
+   scalar = nb_mbufs & (NIX_DESCS_PER_LOOP - 1);
+   pkts = RTE_ALIGN_FLOOR(nb_mbufs, NIX_DESCS_PER_LOOP);
+
+   for (i = 0; i < pkts; i += NIX_DESCS_PER_LOOP) {
port[0] = mbufs[i]->port;
port[1] = mbufs[i + 1]->port;
port[2] = mbufs[i + 2]->port;
@@ -421,66 +4

[dpdk-dev] [PATCH 2/3] event/cnxk: reduce workslot memory consumption

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

SSO group base addresses are always contiguous, so we need not
store all the base addresses in workslot memory. Instead, just
store the first group's base address and compute the group
address offset when required.
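
The address computation can be sketched as below. The helper name
ex_grp_addr() and the macro EX_GRP_SHIFT are made up for the example; the
4 KiB stride is read off the `grp << 12` term in the diff.

```c
#include <stdint.h>

/* Each group's register window is assumed to sit 4 KiB (1 << 12) after the
 * previous one, matching the `grp << 12` term in the patch. */
#define EX_GRP_SHIFT 12

/* Address of group `grp`, derived from the base address of group 0. */
static inline uintptr_t
ex_grp_addr(uintptr_t grp_base, uint16_t grp)
{
	return grp_base + ((uintptr_t)grp << EX_GRP_SHIFT);
}
```

This trades a per-workslot array of CNXK_SSO_MAX_HWGRP pointers for one shift
and one add on the enqueue path.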

Signed-off-by: Pavan Nikhilesh 
---
 drivers/event/cnxk/cn10k_eventdev.c |  5 ++---
 drivers/event/cnxk/cn10k_worker.h   |  3 ++-
 drivers/event/cnxk/cn9k_eventdev.c  |  8 +++-
 drivers/event/cnxk/cn9k_worker.h|  6 --
 drivers/event/cnxk/cnxk_eventdev.c  | 15 ++-
 drivers/event/cnxk/cnxk_eventdev.h  |  8 
 6 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c
index c2729a2c48..49bdd14208 100644
--- a/drivers/event/cnxk/cn10k_eventdev.c
+++ b/drivers/event/cnxk/cn10k_eventdev.c
@@ -91,14 +91,13 @@ cn10k_sso_hws_unlink(void *arg, void *port, uint16_t *map, uint16_t nb_link)
 }
 
 static void
-cn10k_sso_hws_setup(void *arg, void *hws, uintptr_t *grps_base)
+cn10k_sso_hws_setup(void *arg, void *hws, uintptr_t grp_base)
 {
struct cnxk_sso_evdev *dev = arg;
struct cn10k_sso_hws *ws = hws;
uint64_t val;
 
-   rte_memcpy(ws->grps_base, grps_base,
-  sizeof(uintptr_t) * CNXK_SSO_MAX_HWGRP);
+   ws->grp_base = grp_base;
ws->fc_mem = (uint64_t *)dev->fc_iova;
ws->xaq_lmt = dev->xaq_lmt;
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 657ab91ac8..f8331e88d7 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -30,7 +30,8 @@ cn10k_sso_hws_new_event(struct cn10k_sso_hws *ws, const struct rte_event *ev)
if (ws->xaq_lmt <= *ws->fc_mem)
return 0;
 
-   cnxk_sso_hws_add_work(event_ptr, tag, new_tt, ws->grps_base[grp]);
+   cnxk_sso_hws_add_work(event_ptr, tag, new_tt,
+ ws->grp_base + (grp << 12));
return 1;
 }
 
diff --git a/drivers/event/cnxk/cn9k_eventdev.c b/drivers/event/cnxk/cn9k_eventdev.c
index 3a20b099ae..9886720310 100644
--- a/drivers/event/cnxk/cn9k_eventdev.c
+++ b/drivers/event/cnxk/cn9k_eventdev.c
@@ -87,7 +87,7 @@ cn9k_sso_hws_unlink(void *arg, void *port, uint16_t *map, uint16_t nb_link)
 }
 
 static void
-cn9k_sso_hws_setup(void *arg, void *hws, uintptr_t *grps_base)
+cn9k_sso_hws_setup(void *arg, void *hws, uintptr_t grp_base)
 {
struct cnxk_sso_evdev *dev = arg;
struct cn9k_sso_hws_dual *dws;
@@ -98,8 +98,7 @@ cn9k_sso_hws_setup(void *arg, void *hws, uintptr_t *grps_base)
val = NSEC2USEC(dev->deq_tmo_ns) - 1;
if (dev->dual_ws) {
dws = hws;
-   rte_memcpy(dws->grps_base, grps_base,
-  sizeof(uintptr_t) * CNXK_SSO_MAX_HWGRP);
+   dws->grp_base = grp_base;
dws->fc_mem = (uint64_t *)dev->fc_iova;
dws->xaq_lmt = dev->xaq_lmt;
 
@@ -107,8 +106,7 @@ cn9k_sso_hws_setup(void *arg, void *hws, uintptr_t *grps_base)
plt_write64(val, dws->base[1] + SSOW_LF_GWS_NW_TIM);
} else {
ws = hws;
-   rte_memcpy(ws->grps_base, grps_base,
-  sizeof(uintptr_t) * CNXK_SSO_MAX_HWGRP);
+   ws->grp_base = grp_base;
ws->fc_mem = (uint64_t *)dev->fc_iova;
ws->xaq_lmt = dev->xaq_lmt;
 
diff --git a/drivers/event/cnxk/cn9k_worker.h b/drivers/event/cnxk/cn9k_worker.h
index 6be9be0b47..320e39da7b 100644
--- a/drivers/event/cnxk/cn9k_worker.h
+++ b/drivers/event/cnxk/cn9k_worker.h
@@ -31,7 +31,8 @@ cn9k_sso_hws_new_event(struct cn9k_sso_hws *ws, const struct rte_event *ev)
if (ws->xaq_lmt <= *ws->fc_mem)
return 0;
 
-   cnxk_sso_hws_add_work(event_ptr, tag, new_tt, ws->grps_base[grp]);
+   cnxk_sso_hws_add_work(event_ptr, tag, new_tt,
+ ws->grp_base + (grp << 12));
return 1;
 }
 
@@ -108,7 +109,8 @@ cn9k_sso_hws_dual_new_event(struct cn9k_sso_hws_dual *dws,
if (dws->xaq_lmt <= *dws->fc_mem)
return 0;
 
-   cnxk_sso_hws_add_work(event_ptr, tag, new_tt, dws->grps_base[grp]);
+   cnxk_sso_hws_add_work(event_ptr, tag, new_tt,
+ dws->grp_base + (grp << 12));
return 1;
 }
 
diff --git a/drivers/event/cnxk/cnxk_eventdev.c b/drivers/event/cnxk/cnxk_eventdev.c
index 84bf8cb6d1..c127034d37 100644
--- a/drivers/event/cnxk/cnxk_eventdev.c
+++ b/drivers/event/cnxk/cnxk_eventdev.c
@@ -332,8 +332,7 @@ cnxk_sso_port_setup(struct rte_eventdev *event_dev, uint8_t port_id,
cnxk_sso_hws_setup_t hws_setup_fn)
 {
struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
-   uintptr_t grps_base[CNXK_SSO_MAX_HWGRP] = {0};
-   uint16_t q;
+   uintptr_t grp_base = 0;
 
plt_sso_dbg("Port=%d", port_id);
if (event_dev->data->ports[port_id] == NULL) {
@@ -341,15 +340,13 @@ cnx

[dpdk-dev] [PATCH 3/3] event/cnxk: rework enqueue path

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

Rework the SSO enqueue path for CN9K to make it similar to the
CN10K enqueue interface.

Signed-off-by: Pavan Nikhilesh 
---
 drivers/event/cnxk/cn9k_eventdev.c| 28 ++-
 drivers/event/cnxk/cn9k_worker.c  | 21 ++---
 drivers/event/cnxk/cn9k_worker.h  | 78 +--
 drivers/event/cnxk/cn9k_worker_deq.c  |  4 +-
 drivers/event/cnxk/cn9k_worker_deq_ca.c   |  4 +-
 drivers/event/cnxk/cn9k_worker_deq_tmo.c  |  4 +-
 drivers/event/cnxk/cn9k_worker_dual_deq.c | 16 ++--
 drivers/event/cnxk/cn9k_worker_dual_deq_ca.c  | 19 +++--
 drivers/event/cnxk/cn9k_worker_dual_deq_tmo.c | 26 +++
 drivers/event/cnxk/cnxk_eventdev.h| 25 +-
 10 files changed, 96 insertions(+), 129 deletions(-)

diff --git a/drivers/event/cnxk/cn9k_eventdev.c b/drivers/event/cnxk/cn9k_eventdev.c
index 9886720310..a09722b717 100644
--- a/drivers/event/cnxk/cn9k_eventdev.c
+++ b/drivers/event/cnxk/cn9k_eventdev.c
@@ -27,17 +27,6 @@
[!!(dev->tx_offloads & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)] \
[!!(dev->tx_offloads & NIX_TX_OFFLOAD_L3_L4_CSUM_F)])
 
-static void
-cn9k_init_hws_ops(struct cn9k_sso_hws_state *ws, uintptr_t base)
-{
-   ws->tag_op = base + SSOW_LF_GWS_TAG;
-   ws->wqp_op = base + SSOW_LF_GWS_WQP;
-   ws->getwrk_op = base + SSOW_LF_GWS_OP_GET_WORK0;
-   ws->swtag_flush_op = base + SSOW_LF_GWS_OP_SWTAG_FLUSH;
-   ws->swtag_norm_op = base + SSOW_LF_GWS_OP_SWTAG_NORM;
-   ws->swtag_desched_op = base + SSOW_LF_GWS_OP_SWTAG_DESCHED;
-}
-
 static int
 cn9k_sso_hws_link(void *arg, void *port, uint16_t *map, uint16_t nb_link)
 {
@@ -95,7 +84,7 @@ cn9k_sso_hws_setup(void *arg, void *hws, uintptr_t grp_base)
uint64_t val;
 
/* Set get_work tmo for HWS */
-   val = NSEC2USEC(dev->deq_tmo_ns) - 1;
+   val = dev->deq_tmo_ns ? NSEC2USEC(dev->deq_tmo_ns) - 1 : 0;
if (dev->dual_ws) {
dws = hws;
dws->grp_base = grp_base;
@@ -148,7 +137,6 @@ cn9k_sso_hws_flush_events(void *hws, uint8_t queue_id, uintptr_t base,
 {
struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(arg);
struct cn9k_sso_hws_dual *dws;
-   struct cn9k_sso_hws_state *st;
struct cn9k_sso_hws *ws;
uint64_t cq_ds_cnt = 1;
uint64_t aq_cnt = 1;
@@ -170,22 +158,21 @@ cn9k_sso_hws_flush_events(void *hws, uint8_t queue_id, uintptr_t base,
 
if (dev->dual_ws) {
dws = hws;
-   st = &dws->ws_state[0];
ws_base = dws->base[0];
} else {
ws = hws;
-   st = (struct cn9k_sso_hws_state *)ws;
ws_base = ws->base;
}
 
while (aq_cnt || cq_ds_cnt || ds_cnt) {
-   plt_write64(req, st->getwrk_op);
-   cn9k_sso_hws_get_work_empty(st, &ev);
+   plt_write64(req, ws_base + SSOW_LF_GWS_OP_GET_WORK0);
+   cn9k_sso_hws_get_work_empty(ws_base, &ev);
if (fn != NULL && ev.u64 != 0)
fn(arg, ev);
if (ev.sched_type != SSO_TT_EMPTY)
-   cnxk_sso_hws_swtag_flush(st->tag_op,
-st->swtag_flush_op);
+   cnxk_sso_hws_swtag_flush(
+   ws_base + SSOW_LF_GWS_TAG,
+   ws_base + SSOW_LF_GWS_OP_SWTAG_FLUSH);
do {
val = plt_read64(ws_base + SSOW_LF_GWS_PENDSTATE);
} while (val & BIT_ULL(56));
@@ -674,8 +661,6 @@ cn9k_sso_init_hws_mem(void *arg, uint8_t port_id)
&dev->sso, CN9K_DUAL_WS_PAIR_ID(port_id, 0));
dws->base[1] = roc_sso_hws_base_get(
&dev->sso, CN9K_DUAL_WS_PAIR_ID(port_id, 1));
-   cn9k_init_hws_ops(&dws->ws_state[0], dws->base[0]);
-   cn9k_init_hws_ops(&dws->ws_state[1], dws->base[1]);
dws->hws_id = port_id;
dws->swtag_req = 0;
dws->vws = 0;
@@ -695,7 +680,6 @@ cn9k_sso_init_hws_mem(void *arg, uint8_t port_id)
/* First cache line is reserved for cookie */
ws = RTE_PTR_ADD(ws, sizeof(struct cnxk_sso_hws_cookie));
ws->base = roc_sso_hws_base_get(&dev->sso, port_id);
-   cn9k_init_hws_ops((struct cn9k_sso_hws_state *)ws, ws->base);
ws->hws_id = port_id;
ws->swtag_req = 0;
 
diff --git a/drivers/event/cnxk/cn9k_worker.c b/drivers/event/cnxk/cn9k_worker.c
index 32f7cc0343..a981bc986f 100644
--- a/drivers/event/cnxk/cn9k_worker.c
+++ b/drivers/event/cnxk/cn9k_worker.c
@@ -19,7 +19,8 @@ cn9k_sso_hws_enq(void *port, const struct rte_event *ev)
cn9k_sso_hws_forward_event(ws, ev);
break;
case RTE_EVENT_OP_RELEASE:
-   cnxk_sso_hws_swtag_flush(ws->tag_op, ws->swtag_flush_op);
+

Re: [dpdk-dev] [PATCH v3 4/7] ethdev: make burst functions to use new flat array

2021-10-04 Thread Ferruh Yigit
On 10/1/2021 6:40 PM, Ananyev, Konstantin wrote:
> 
> 
>> On 10/1/2021 3:02 PM, Konstantin Ananyev wrote:
>>> Rework 'fast' burst functions to use rte_eth_fp_ops[].
>>> While it is an API/ABI breakage, this change is intended to be
>>> transparent for both users (no changes in user app is required) and
>>> PMD developers (no changes in PMD is required).
>>> One extra thing to note - RX/TX callback invocation will cause extra
>>> function call with these changes. That might cause some insignificant
>>> slowdown for code-path where RX/TX callbacks are heavily involved.
>>>
>>> Signed-off-by: Konstantin Ananyev 
>>
>> <...>
>>
>>>  static inline int
>>>  rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
>>>  {
>>> -   struct rte_eth_dev *dev;
>>> +   struct rte_eth_fp_ops *p;
>>> +   void *qd;
>>> +
>>> +   if (port_id >= RTE_MAX_ETHPORTS ||
>>> +   queue_id >= RTE_MAX_QUEUES_PER_PORT) {
>>> +   RTE_ETHDEV_LOG(ERR,
>>> +   "Invalid port_id=%u or queue_id=%u\n",
>>> +   port_id, queue_id);
>>> +   return -EINVAL;
>>> +   }
>>
>> Should the checks be wrapped with '#ifdef RTE_ETHDEV_DEBUG_RX' like the others?
> 
> The original rte_eth_rx_queue_count() always had similar checks enabled,
> which is why I also kept them 'always on'.
> 
>>
>> <...>
>>
>>> +++ b/lib/ethdev/version.map
>>> @@ -247,11 +247,16 @@ EXPERIMENTAL {
>>> rte_mtr_meter_policy_delete;
>>> rte_mtr_meter_policy_update;
>>> rte_mtr_meter_policy_validate;
>>> +
>>> +   # added in 21.05
>>
>> s/21.05/21.11/
>>
>>> +   __rte_eth_rx_epilog;
>>> +   __rte_eth_tx_prolog;
>>
>> These are called directly by the application and must be part of the ABI, but
>> they are marked 'internal' and have the '__rte' prefix to highlight it; this
>> may be confusing.
>> What about making them a proper, non-internal API?
> 
> Hmm, I'm not sure what you are suggesting here.
> We don't want users to call them explicitly.
> They are sort of helpers for rte_eth_rx_burst/rte_eth_tx_burst.
> So I did what I thought is our usual policy for such semi-internal things:
> have '@internal' in comments, but in version.map put them under the
> EXPERIMENTAL/global section.
> 
> What do you think it should be instead?
>  

Make them public API (basically just remove the '__' prefix and the @internal comment).

This way an application can use them to run custom callbacks (not only the
registered ones), though I am not sure whether this can be dangerous.

We need to track the ABI for these functions, and making them public makes that clear.

Also, the comment can be updated to describe the intended usage instead of marking
them internal; applications can use these whether or not we mark them internal.
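
To make the discussion concrete, here is a toy model of the pattern (not the
ethdev code): an inline burst wrapper indexes a flat per-port ops array, calls
the driver, and only makes the extra out-of-line call when a callback is
installed. All ex_ names, the table layout and the callback signature are
invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_PORTS 4

typedef uint16_t (*ex_rx_burst_t)(void *rxq, void **pkts, uint16_t n);
typedef uint16_t (*ex_rx_cb_t)(uint16_t port, void **pkts, uint16_t n,
			       void *arg);

/* One entry per port: everything the fast path needs, in a flat array. */
struct ex_fp_ops {
	ex_rx_burst_t rx_burst;
	void *rxq;
	ex_rx_cb_t rx_cb; /* NULL when no callback is installed */
	void *rx_cb_arg;
};

struct ex_fp_ops ex_fp_ops_tbl[EX_MAX_PORTS];

/* Out-of-line helper: only reached when a callback is installed. This is
 * the symbol that has to be exported, because the inline wrapper below is
 * compiled into the application. */
uint16_t
ex_rx_epilog(uint16_t port, void **pkts, uint16_t n)
{
	const struct ex_fp_ops *p = &ex_fp_ops_tbl[port];

	return p->rx_cb(port, pkts, n, p->rx_cb_arg);
}

/* Inline fast-path wrapper, as it would appear in a public header. */
static inline uint16_t
ex_rx_burst(uint16_t port, void **pkts, uint16_t n)
{
	const struct ex_fp_ops *p;
	uint16_t nb;

	if (port >= EX_MAX_PORTS || ex_fp_ops_tbl[port].rx_burst == NULL)
		return 0;
	p = &ex_fp_ops_tbl[port];
	nb = p->rx_burst(p->rxq, pkts, n);
	if (p->rx_cb != NULL)
		nb = ex_rx_epilog(port, pkts, nb);
	return nb;
}

/* Toy driver: pretends every requested packet was received. */
uint16_t
ex_dummy_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	return n;
}

/* Toy callback: keeps only half of the burst. */
uint16_t
ex_drop_half(uint16_t port, void **pkts, uint16_t n, void *arg)
{
	(void)port;
	(void)pkts;
	(void)arg;
	return n / 2;
}
```

Because the wrapper is inlined into the application, both the table and the
helper are reachable from application code whatever the header comment says,
which is the ABI point above.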


>>>  };
>>>
>>>  INTERNAL {
>>> global:
>>>
>>> +   rte_eth_fp_ops;
>>
>> This variable is accessed in an inline function, so it is accessed by the
>> application. I am not sure it suits the 'internal' object definition;
>> internal should be only for objects accessed by other parts of DPDK.
>> I think this can be added to 'DPDK_22'.
>>
>>> rte_eth_dev_allocate;
>>> rte_eth_dev_allocated;
>>> rte_eth_dev_attach_secondary;
>>>
> 



Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-04 Thread Harman Kalra
Hi David,

Thanks for the review.
Please see my comments inline.


> -Original Message-
> From: David Marchand 
> Sent: Tuesday, September 28, 2021 9:17 PM
> To: Harman Kalra 
> Cc: dev ; Ray Kinsella 
> Subject: [EXT] Re: [dpdk-dev] [PATCH v1 2/7] eal/interrupts: implement get
> set APIs
> 
> On Fri, Sep 3, 2021 at 2:42 PM Harman Kalra  wrote:
> >
> > Implementing get set APIs for interrupt handle fields.
> > To make any change to the interrupt handle fields, one should make use
> > of these APIs.
> 

> Some global comments.
> 
> - Please merge API prototype (from patch 1) and actual implementation in a
> single patch.

  Sure, will do.

> - rte_intr_handle_ seems a rather long prefix, does it really matter to have
> the _handle part?

 Will fix the API names.


> - what part of this API needs to be exported to applications? Let's hide as
> much as we can with __rte_internal.

 I will mark all the APIs (new and some old) that are not used in the test suite
or example apps as __rte_internal.

> 
> 
> >
> > Signed-off-by: Harman Kalra 
> > Acked-by: Ray Kinsella 
> > ---
> >  lib/eal/common/eal_common_interrupts.c | 506
> +
> >  lib/eal/common/meson.build |   2 +
> >  lib/eal/include/rte_eal_interrupts.h   |   6 +-
> >  lib/eal/version.map|  30 ++
> >  4 files changed, 543 insertions(+), 1 deletion(-)  create mode 100644
> > lib/eal/common/eal_common_interrupts.c
> >
> > diff --git a/lib/eal/common/eal_common_interrupts.c
> > b/lib/eal/common/eal_common_interrupts.c
> > new file mode 100644
> > index 00..2e4fed96f0
> > --- /dev/null
> > +++ b/lib/eal/common/eal_common_interrupts.c
> > @@ -0,0 +1,506 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2021 Marvell.
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +
> > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int size,
> > +  bool
> > +from_hugepage) {
> > +   struct rte_intr_handle *intr_handle;
> > +   int i;
> > +
> > +   if (from_hugepage)
> > +   intr_handle = rte_zmalloc(NULL,
> > + size * sizeof(struct 
> > rte_intr_handle),
> > + 0);
> > +   else
> > +   intr_handle = calloc(1, size * sizeof(struct
> > + rte_intr_handle));
> 
> We can call DPDK allocator in all cases.
> That would avoid headaches on why multiprocess does not work in some
> rarely tested cases.
> Wdyt?
> 
> Plus "from_hugepage" is misleading, you could be in --no-huge mode,
> rte_zmalloc still works.

 In the mlx5 driver, the interrupt handle instance is freed in the destructor
"mlx5_pmd_interrupt_handler_uninstall()", while the DPDK memory allocators
have already been cleaned up in "rte_eal_cleanup". Hence I allocated interrupt
instances for such cases from the normal heap. There could be other such cases,
so I think it's ok to keep this support.

Regarding the name, I will change "from_hugepage" to "dpdk_allocator".

As per the suggestion from Dmitry, I will replace the bool arg with a flags
variable, to support more such configurations in the future.
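
A rough sketch of what the flags-based variant could look like. Nothing here
is the final API: the flag name EX_INTR_ALLOC_DPDK_ALLOCATOR, the struct and
both functions are placeholders, and the two ex_dpdk_* helpers only stand in
for the DPDK allocator so that the example builds on its own.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag: take memory from the DPDK allocator rather than the
 * normal heap. Further bits can be added later without changing the
 * prototype. */
#define EX_INTR_ALLOC_DPDK_ALLOCATOR (1u << 0)

struct ex_intr_handle {
	uint32_t alloc_flags; /* remembered so free picks the matching path */
};

/* Stand-ins so the sketch builds without DPDK; a real implementation
 * would call the DPDK allocator in these two helpers. */
static void *ex_dpdk_zmalloc(size_t sz) { return calloc(1, sz); }
static void ex_dpdk_free(void *p) { free(p); }

struct ex_intr_handle *
ex_intr_instance_alloc(uint32_t flags)
{
	struct ex_intr_handle *h;

	if (flags & EX_INTR_ALLOC_DPDK_ALLOCATOR)
		h = ex_dpdk_zmalloc(sizeof(*h));
	else
		h = calloc(1, sizeof(*h));
	if (h == NULL)
		return NULL;
	h->alloc_flags = flags;
	return h;
}

void
ex_intr_instance_free(struct ex_intr_handle *h)
{
	if (h == NULL)
		return;
	if (h->alloc_flags & EX_INTR_ALLOC_DPDK_ALLOCATOR)
		ex_dpdk_free(h);
	else
		free(h);
}
```

A driver that frees its handle from a destructor, after the DPDK allocators
are gone, would pass 0 and stay on the normal heap.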


> 
> 
> > +   if (!intr_handle) {
> > +   RTE_LOG(ERR, EAL, "Fail to allocate intr_handle\n");
> > +   rte_errno = ENOMEM;
> > +   return NULL;
> > +   }
> > +
> > +   for (i = 0; i < size; i++) {
> > +   intr_handle[i].nb_intr = RTE_MAX_RXTX_INTR_VEC_ID;
> > +   intr_handle[i].alloc_from_hugepage = from_hugepage;
> > +   }
> > +
> > +   return intr_handle;
> > +}
> > +
> > +struct rte_intr_handle *rte_intr_handle_instance_index_get(
> > +   struct rte_intr_handle *intr_handle,
> > +int index) {
> > +   if (intr_handle == NULL) {
> > +   RTE_LOG(ERR, EAL, "Interrupt instance unallocated\n");
> > +   rte_errno = ENOMEM;
> > +   return NULL;
> > +   }
> > +
> > +   return &intr_handle[index];
> > +}
> > +
> > +int rte_intr_handle_instance_index_set(struct rte_intr_handle
> *intr_handle,
> > +  const struct rte_intr_handle *src,
> > +  int index) {
> > +   if (intr_handle == NULL) {
> > +   RTE_LOG(ERR, EAL, "Interrupt instance unallocated\n");
> > +   rte_errno = ENOTSUP;
> > +   goto fail;
> > +   }
> > +
> > +   if (src == NULL) {
> > +   RTE_LOG(ERR, EAL, "Source interrupt instance 
> > unallocated\n");
> > +   rte_errno = EINVAL;
> > +   goto fail;
> > +   }
> > +
> > +   if (index < 0) {
> > +   RTE_LOG(ERR, EAL, "Index cany be negative");
> > +   rte_errno = EINVAL;
> > +   goto fail;
> >

Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
> From: Pavan Nikhilesh 
> 
> Due to Linux kernel AF(Admin function) driver dependency, only enable
> build for 64-bit Linux.
> 

Hi Pavan,

Is it possible to point to a commit log on the kernel side, or similar, that lets
others verify why only 64-bit is supported? If someone wants to support 32-bit,
that would help them investigate the source of the restriction.

> Signed-off-by: Pavan Nikhilesh 
> Acked-by: Jerin Jacob 
> ---
>  v5 Changes
>  - s/fuction/function.
> 
>  v4 Changes:
>  - Update commit message regarding dependency on AF driver.
> 
>  drivers/net/thunderx/meson.build | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/thunderx/meson.build b/drivers/net/thunderx/meson.build
> index 4bbcea7f93..da665bd76f 100644
> --- a/drivers/net/thunderx/meson.build
> +++ b/drivers/net/thunderx/meson.build
> @@ -1,9 +1,9 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2017 Cavium, Inc
> 
> -if is_windows
> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>  build = false
> -reason = 'not supported on Windows'
> +reason = 'only supported on 64-bit Linux'
>  subdir_done()
>  endif
> 
> --
> 2.17.1
> 



Re: [dpdk-dev] [PATCH v3 2/6] ethdev: move jumbo frame offload check to library

2021-10-04 Thread Somnath Kotur
On Fri, Oct 1, 2021 at 8:06 PM Ferruh Yigit  wrote:
>
> Setting MTU bigger than RTE_ETHER_MTU requires the jumbo frame support,
> and application should enable the jumbo frame offload support for it.
>
> When jumbo frame offload is not enabled by application, but MTU bigger
> than RTE_ETHER_MTU is requested there are two options, either fail or
> enable jumbo frame offload implicitly.
>
> Enabling jumbo frame offload implicitly is selected by many drivers
> since setting a big MTU value already implies it, and this increases
> usability.
>
> This patch moves this logic from drivers to the library, both to reduce
> the duplicated code in the drivers and to make behaviour more visible.
>
> Signed-off-by: Ferruh Yigit 
> Reviewed-by: Andrew Rybchenko 
> Reviewed-by: Rosen Xu 
> Acked-by: Ajit Khaparde 
> ---
>  drivers/net/axgbe/axgbe_ethdev.c|  9 ++---
>  drivers/net/bnxt/bnxt_ethdev.c  |  9 ++---
>  drivers/net/cnxk/cnxk_ethdev_ops.c  |  5 -
>  drivers/net/cxgbe/cxgbe_ethdev.c|  8 
>  drivers/net/dpaa/dpaa_ethdev.c  |  7 ---
>  drivers/net/dpaa2/dpaa2_ethdev.c|  7 ---
>  drivers/net/e1000/em_ethdev.c   |  9 ++---
>  drivers/net/e1000/igb_ethdev.c  |  9 ++---
>  drivers/net/enetc/enetc_ethdev.c|  7 ---
>  drivers/net/hinic/hinic_pmd_ethdev.c|  7 ---
>  drivers/net/hns3/hns3_ethdev.c  |  8 
>  drivers/net/hns3/hns3_ethdev_vf.c   |  6 --
>  drivers/net/i40e/i40e_ethdev.c  |  5 -
>  drivers/net/iavf/iavf_ethdev.c  |  7 ---
>  drivers/net/ice/ice_ethdev.c|  5 -
>  drivers/net/igc/igc_ethdev.c|  9 ++---
>  drivers/net/ipn3ke/ipn3ke_representor.c |  5 -
>  drivers/net/ixgbe/ixgbe_ethdev.c|  7 ++-
>  drivers/net/liquidio/lio_ethdev.c   |  7 ---
>  drivers/net/nfp/nfp_common.c|  6 --
>  drivers/net/octeontx/octeontx_ethdev.c  |  5 -
>  drivers/net/octeontx2/otx2_ethdev_ops.c |  5 -
>  drivers/net/qede/qede_ethdev.c  |  4 
>  drivers/net/sfc/sfc_ethdev.c|  9 -
>  drivers/net/thunderx/nicvf_ethdev.c |  6 --
>  drivers/net/txgbe/txgbe_ethdev.c|  6 --
>  lib/ethdev/rte_ethdev.c | 18 +-
>  27 files changed, 29 insertions(+), 166 deletions(-)
>
> diff --git a/drivers/net/axgbe/axgbe_ethdev.c 
> b/drivers/net/axgbe/axgbe_ethdev.c
> index 76aeec077f2b..2960834b4539 100644
> --- a/drivers/net/axgbe/axgbe_ethdev.c
> +++ b/drivers/net/axgbe/axgbe_ethdev.c
> @@ -1492,15 +1492,10 @@ static int axgb_mtu_set(struct rte_eth_dev *dev, 
> uint16_t mtu)
> dev->data->port_id);
> return -EBUSY;
> }
> -   if (mtu > RTE_ETHER_MTU) {
> -   dev->data->dev_conf.rxmode.offloads |=
> -   DEV_RX_OFFLOAD_JUMBO_FRAME;
> +   if (mtu > RTE_ETHER_MTU)
> val = 1;
> -   } else {
> -   dev->data->dev_conf.rxmode.offloads &=
> -   ~DEV_RX_OFFLOAD_JUMBO_FRAME;
> +   else
> val = 0;
> -   }
> AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val);
> return 0;
>  }
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
> index 8c6f20b75aed..07ee19938930 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -3052,15 +3052,10 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, 
> uint16_t new_mtu)
> return -EINVAL;
> }
>
> -   if (new_mtu > RTE_ETHER_MTU) {
> +   if (new_mtu > RTE_ETHER_MTU)
> bp->flags |= BNXT_FLAG_JUMBO;
> -   bp->eth_dev->data->dev_conf.rxmode.offloads |=
> -   DEV_RX_OFFLOAD_JUMBO_FRAME;
> -   } else {
> -   bp->eth_dev->data->dev_conf.rxmode.offloads &=
> -   ~DEV_RX_OFFLOAD_JUMBO_FRAME;
> +   else
> bp->flags &= ~BNXT_FLAG_JUMBO;
> -   }
>
Acked-by: Somnath kotur 
> /* Is there a change in mtu setting? */
> if (eth_dev->data->mtu == new_mtu)
> diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c 
> b/drivers/net/cnxk/cnxk_ethdev_ops.c
> index 695d0d6fd3e2..349896f6a1bf 100644
> --- a/drivers/net/cnxk/cnxk_ethdev_ops.c
> +++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
> @@ -439,11 +439,6 @@ cnxk_nix_mtu_set(struct rte_eth_dev *eth_dev, uint16_t 
> mtu)
> plt_err("Failed to max Rx frame length, rc=%d", rc);
> goto exit;
> }
> -
> -   if (mtu > RTE_ETHER_MTU)
> -   dev->rx_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
> -   else
> -   dev->rx_offloads &= ~DEV_RX_OFFLOAD_JUMBO_FRAME;
>  exit:
> return rc;
>  }
> diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c 
> b/drivers/net/cxgbe/cxgbe_ethdev.c
> index 8cf61f12a8d6..0c9cc2f5bb3f 100644
> --- a/drivers/net/cxgbe/cxgbe_eth
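
The commit message quoted above describes toggling the jumbo frame offload
implicitly from the requested MTU, in one place instead of in every driver.
A minimal sketch of that rule follows. It is not the code from the patch:
the EX_* names, the struct, the helper and the offload bit position are all
hypothetical stand-ins, chosen so the example builds without DPDK headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for RTE_ETHER_MTU and the jumbo offload flag. */
#define EX_ETHER_MTU              1500u
#define EX_RX_OFFLOAD_JUMBO_FRAME (UINT64_C(1) << 0) /* bit chosen arbitrarily */

/* Hypothetical, reduced stand-in for the Rx mode configuration. */
struct ex_rxmode {
	uint64_t offloads;
};

/*
 * Set the jumbo frame offload when the MTU exceeds the standard Ethernet
 * MTU, and clear it otherwise. This is the same if/else that the quoted
 * diff removes from the individual drivers.
 */
static void
ex_update_jumbo_offload(struct ex_rxmode *rxmode, uint16_t mtu)
{
	if (mtu > EX_ETHER_MTU)
		rxmode->offloads |= EX_RX_OFFLOAD_JUMBO_FRAME;
	else
		rxmode->offloads &= ~EX_RX_OFFLOAD_JUMBO_FRAME;
}
```

With the rule kept in a single helper, a driver's mtu_set callback only has
to program its hardware, as the axgbe and bnxt hunks above show.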

Re: [dpdk-dev] [PATCH v3 3/6] ethdev: move check to library for MTU set

2021-10-04 Thread Somnath Kotur
On Fri, Oct 1, 2021 at 8:07 PM Ferruh Yigit  wrote:
>
> Move requested MTU value check to the API to prevent the duplicated
> code.
>
> Signed-off-by: Ferruh Yigit 
> Reviewed-by: Andrew Rybchenko 
> Reviewed-by: Rosen Xu 
> ---
>  drivers/net/axgbe/axgbe_ethdev.c| 15 ---
>  drivers/net/bnxt/bnxt_ethdev.c  |  2 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c| 13 +
>  drivers/net/dpaa/dpaa_ethdev.c  |  2 --
>  drivers/net/dpaa2/dpaa2_ethdev.c|  4 
>  drivers/net/e1000/em_ethdev.c   | 10 --
>  drivers/net/e1000/igb_ethdev.c  | 11 ---
>  drivers/net/enetc/enetc_ethdev.c|  4 
>  drivers/net/hinic/hinic_pmd_ethdev.c|  8 +---
>  drivers/net/i40e/i40e_ethdev.c  | 17 -
>  drivers/net/iavf/iavf_ethdev.c  | 10 ++
>  drivers/net/ice/ice_ethdev.c| 14 +++---
>  drivers/net/igc/igc_ethdev.c|  5 -
>  drivers/net/ipn3ke/ipn3ke_representor.c |  6 --
>  drivers/net/liquidio/lio_ethdev.c   | 10 --
>  drivers/net/nfp/nfp_common.c|  4 
>  drivers/net/octeontx/octeontx_ethdev.c  |  4 
>  drivers/net/octeontx2/otx2_ethdev_ops.c |  4 
>  drivers/net/qede/qede_ethdev.c  | 12 
>  drivers/net/thunderx/nicvf_ethdev.c |  6 --
>  drivers/net/txgbe/txgbe_ethdev.c| 10 --
>  lib/ethdev/rte_ethdev.c |  9 +
>  22 files changed, 25 insertions(+), 155 deletions(-)
>
> diff --git a/drivers/net/axgbe/axgbe_ethdev.c 
> b/drivers/net/axgbe/axgbe_ethdev.c
> index 2960834b4539..c36cd7b1d2f0 100644
> --- a/drivers/net/axgbe/axgbe_ethdev.c
> +++ b/drivers/net/axgbe/axgbe_ethdev.c
> @@ -1478,25 +1478,18 @@ axgbe_dev_supported_ptypes_get(struct rte_eth_dev 
> *dev)
>
>  static int axgb_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>  {
> -   struct rte_eth_dev_info dev_info;
> struct axgbe_port *pdata = dev->data->dev_private;
> -   uint32_t frame_size = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
> -   unsigned int val = 0;
> -   axgbe_dev_info_get(dev, &dev_info);
> -   /* check that mtu is within the allowed range */
> -   if (mtu < RTE_ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen)
> -   return -EINVAL;
> +   unsigned int val;
> +
> /* mtu setting is forbidden if port is start */
> if (dev->data->dev_started) {
> PMD_DRV_LOG(ERR, "port %d must be stopped before 
> configuration",
> dev->data->port_id);
> return -EBUSY;
> }
> -   if (mtu > RTE_ETHER_MTU)
> -   val = 1;
> -   else
> -   val = 0;
> +   val = mtu > RTE_ETHER_MTU ? 1 : 0;
> AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val);
> +
> return 0;
>  }
>
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
> index 07ee19938930..dc33b961320a 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -3025,7 +3025,7 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, 
> uint16_t new_mtu)
> uint32_t overhead = BNXT_MAX_PKT_LEN - BNXT_MAX_MTU;
> struct bnxt *bp = eth_dev->data->dev_private;
> uint32_t new_pkt_size;
> -   uint32_t rc = 0;
> +   uint32_t rc;
> uint32_t i;
>
> rc = is_bnxt_in_error(bp);
Acked-by:  Somnath Kotur 
> diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c 
> b/drivers/net/cxgbe/cxgbe_ethdev.c
> index 0c9cc2f5bb3f..70b879fed100 100644
> --- a/drivers/net/cxgbe/cxgbe_ethdev.c
> +++ b/drivers/net/cxgbe/cxgbe_ethdev.c
> @@ -301,21 +301,10 @@ int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, 
> uint16_t mtu)
>  {
> struct port_info *pi = eth_dev->data->dev_private;
> struct adapter *adapter = pi->adapter;
> -   struct rte_eth_dev_info dev_info;
> -   int err;
> uint16_t new_mtu = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
>
> -   err = cxgbe_dev_info_get(eth_dev, &dev_info);
> -   if (err != 0)
> -   return err;
> -
> -   /* Must accommodate at least RTE_ETHER_MIN_MTU */
> -   if (mtu < RTE_ETHER_MIN_MTU || new_mtu > dev_info.max_rx_pktlen)
> -   return -EINVAL;
> -
> -   err = t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1,
> +   return t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, 
> -1,
> -1, -1, true);
> -   return err;
>  }
>
>  /*
> diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
> index 57b09f16ba44..3172e3b2de87 100644
> --- a/drivers/net/dpaa/dpaa_ethdev.c
> +++ b/drivers/net/dpaa/dpaa_ethdev.c
> @@ -167,8 +167,6 @@ dpaa_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
>
> PMD_INIT_FUNC_TRACE();
>
> -   if (mtu < RTE_ETHER_MIN_MTU || frame_size > DPAA_MAX_RX_PKT_LEN)
> -   return -EINVAL;
> /*
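
The range check that the quoted hunks delete from axgbe, cxgbe and dpaa has
the same shape in each driver. A sketch of it as one shared helper is below.
This only mirrors the removed driver code; the check actually added to
lib/ethdev/rte_ethdev.c is not visible in the quoted part of the patch and
may differ. The EX_* constants and the function name are assumptions made
for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values standing in for RTE_ETHER_MIN_MTU, RTE_ETHER_HDR_LEN
 * and RTE_ETHER_CRC_LEN. */
#define EX_ETHER_MIN_MTU 68u
#define EX_ETHER_HDR_LEN 14u
#define EX_ETHER_CRC_LEN 4u

/*
 * Hypothetical helper: reject an MTU that is below the Ethernet minimum or
 * whose resulting frame would exceed the device's maximum Rx packet length.
 * Returns 0 when the MTU is acceptable, -EINVAL otherwise.
 */
static int
ex_validate_mtu(uint16_t mtu, uint32_t max_rx_pktlen)
{
	/* Frame size is MTU plus L2 header and CRC, as in the removed code. */
	uint32_t frame_size = (uint32_t)mtu + EX_ETHER_HDR_LEN + EX_ETHER_CRC_LEN;

	if (mtu < EX_ETHER_MIN_MTU || frame_size > max_rx_pktlen)
		return -EINVAL;

	return 0;
}
```

For a device whose max_rx_pktlen is 1518, an MTU of 1500 passes and 1501 is
rejected, since 1501 + 14 + 4 = 1519.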

Re: [dpdk-dev] [PATCH v3 0/5] A means to negotiate delivery of Rx meta data

2021-10-04 Thread Andrew Rybchenko
On 10/1/21 3:10 PM, Thomas Monjalon wrote:
> 01/10/2021 12:15, Andrew Rybchenko:
>> On 10/1/21 12:48 PM, Thomas Monjalon wrote:
>>> 01/10/2021 10:55, Ivan Malov:
 On 01/10/2021 11:11, Thomas Monjalon wrote:
> 01/10/2021 08:47, Andrew Rybchenko:
>> On 9/30/21 10:30 PM, Ivan Malov wrote:
>>> On 30/09/2021 19:18, Thomas Monjalon wrote:
 23/09/2021 13:20, Ivan Malov:
> Patch [1/5] of this series adds a generic API to let applications
> negotiate delivery of Rx meta data during initialisation period.
>
> What is a metadata?
> Do you mean RTE_FLOW_ITEM_TYPE_META and RTE_FLOW_ITEM_TYPE_MARK?
> Metadata word could cover any field in the mbuf struct so it is vague.

 Metadata here is *any* additional information provided by the NIC for 
 each received packet. For example, Rx flag, Rx mark, RSS hash, packet 
 classification info, you name it. I'd like to stress out that the 
 suggested API comes with flags each of which is crystal clear on what 
 concrete kind of metadata it covers, eg. Rx mark.
>>>
>>> I missed the flags.
>>> You mean these 3 flags?
>>
>> Yes
>>
>>> +/** The ethdev sees flagged packets if there are flows with action FLAG. */
>>> +#define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
>>> +
>>> +/** The ethdev sees mark IDs in packets if there are flows with action 
>>> MARK. */
>>> +#define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
>>> +
>>> +/** The ethdev detects missed packets if there are "tunnel_set" flows in 
>>> use. */
>>> +#define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)
>>>
>>> It is not crystal clear because it does not reference the API,
>>> like RTE_FLOW_ACTION_TYPE_MARK.
>>
>> Thanks, it is easy to fix. Please, note that there is no action
>> for tunnel ID case.
> 
> I don't understand the tunnel ID meta.
> Is it an existing offload? API?

rte_flow_tunnel_*() API and "Tunneled traffic offload" in flow
API documentation.

> 
>>> And it covers a limited set of metadata.
>>
>> Yes which are not covered by offloads, packet classification
>> etc. Anything else?
>>
>>> Do you intend to extend to all mbuf metadata?
>>
>> No. It should be discussed case-by-case separately.
> 
> Ah, it makes the intent clearer.
> Why not planning to do something truly generic?

IMHO, it is generic enough for the purpose.

> 
> This way, an application knows right from the start which parts
> of Rx meta data won't be delivered. Hence, no necessity to try
> inserting flows requesting such data and handle the failures.

 Sorry I don't understand the problem you want to solve.
 And sorry for not noticing earlier.
>>>
>>> No worries. *Some* PMDs do not enable delivery of, say, Rx mark with the
>>> packets by default (for performance reasons). If the application tries
>>> to insert a flow with action MARK, the PMD may not be able to enable
>>> delivery of Rx mark without the need to re-start Rx sub-system. And
>>> that's fraught with traffic disruption and similar bad consequences. In
>>> order to address it, we need to let the application express its interest
>>> in receiving mark with packets as early as possible. This way, the PMD
>>> can enable Rx mark delivery in advance. And, as an additional benefit,
>>> the application can learn *from the very beginning* whether it will be
>>> possible to use the feature or not. If this API tells the application
>>> that no mark delivery will be enabled, then the application can just
>>> skip many unnecessary attempts to insert wittingly unsupported flows
>>> during runtime.
>
> I'm puzzled, because we could have the same reasoning for any offload.

 We're not discussing *offloads*. An offload is when NIC *computes 
 something* and *delivers* it. We are discussing precisely *delivery*.
>>>
>>> OK but still, there are a lot more mbuf metadata delivered.
>>
>> Yes, and some are not controlled yet early enough, and
>> we do here.
>>
>>>
> I don't understand why we are focusing on mark only

 We are not focusing on mark on purpose. It's just how our discussion 
 goes. I chose mark (could've chosen flag or anything else) just to show 
 you an example.

> I would prefer we find a generic solution using the rte_flow API.
> Can we make rte_flow_validate() working before port start?
> If validating a fake rule doesn't make sense,
> why not having a new function accepting a single action as parameter?

 A noble idea, but if we feed the entire flow rule to the driver for 
 validation, then the driver must not look specifically for actions FLAG 
 or MARK in it (to enable or disable metadata delivery). This way, the 
 driver is obliged to also validate match criteria, attributes, etc. And, 
 if something is unsupported (say, some specific item), the driver will 
 have to reject the rule as a whole thus leaving th
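
The negotiation idea discussed in this thread can be pictured as a bitmask
exchange: the application states which Rx metadata it wants, and the driver
answers with the subset it will deliver. The sketch below is an assumption
about the general shape only, not the API proposed in the series. The EX_*
flags reuse the bit positions of the three quoted RTE_ETH_RX_META_* defines,
and ex_rx_meta_negotiate() with its driver_caps parameter is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit positions as the flags quoted in the thread. */
#define EX_RX_META_USER_FLAG (UINT64_C(1) << 0)
#define EX_RX_META_USER_MARK (UINT64_C(1) << 1)
#define EX_RX_META_TUNNEL_ID (UINT64_C(1) << 2)

/*
 * Hypothetical negotiation helper.
 * driver_caps: metadata kinds the driver is able to deliver.
 * features:    in:  metadata kinds the application is interested in;
 *              out: the subset that will actually be delivered.
 * Returns 0 on success, -1 if features is NULL.
 */
static int
ex_rx_meta_negotiate(uint64_t driver_caps, uint64_t *features)
{
	if (features == NULL)
		return -1;

	/* Keep only the requested bits that the driver can honour. */
	*features &= driver_caps;

	return 0;
}
```

An application that finds EX_RX_META_USER_MARK cleared after the call would
know before port start that flows with action MARK are not worth inserting,
which is the benefit Ivan describes above.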

Re: [dpdk-dev] [PATCH v3 4/7] ethdev: make burst functions to use new flat array

2021-10-04 Thread Ananyev, Konstantin

> >>
> >>>  static inline int
> >>>  rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
> >>>  {
> >>> - struct rte_eth_dev *dev;
> >>> + struct rte_eth_fp_ops *p;
> >>> + void *qd;
> >>> +
> >>> + if (port_id >= RTE_MAX_ETHPORTS ||
> >>> + queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> >>> + RTE_ETHDEV_LOG(ERR,
> >>> + "Invalid port_id=%u or queue_id=%u\n",
> >>> + port_id, queue_id);
> >>> + return -EINVAL;
> >>> + }
> >>
> >> Should the checkes wrapped with '#ifdef RTE_ETHDEV_DEBUG_RX' like others?
> >
> > Original rte_eth_rx_queue_count() always have similar checks enabled,
> > that's why I also kept them 'always on'.
> >
> >>
> >> <...>
> >>
> >>> +++ b/lib/ethdev/version.map
> >>> @@ -247,11 +247,16 @@ EXPERIMENTAL {
> >>>   rte_mtr_meter_policy_delete;
> >>>   rte_mtr_meter_policy_update;
> >>>   rte_mtr_meter_policy_validate;
> >>> +
> >>> + # added in 21.05
> >>
> >> s/21.05/21.11/
> >>
> >>> + __rte_eth_rx_epilog;
> >>> + __rte_eth_tx_prolog;
> >>
> >> These are directly called by application and must be part of ABI, but 
> >> marked as
> >> 'internal' and has '__rte' prefix to highligh it, this may be confusing.
> >> What about making them proper, non-internal, API?
> >
> > Hmm not sure what do you suggest here.
> > We don't want users to call them explicitly.
> > They are sort of helpers for rte_eth_rx_burst/rte_eth_tx_burst.
> > So I did what I thought is our usual policy for such semi-internal thigns:
> > have '@intenal' in comments, but in version.map put them under 
> > EXPERIMETAL/global
> > section.
> >
> > What do you think it should be instead?
> >
> 
> Make them public API. (Basically just remove '__' prefix and @internal 
> comment).
> 
> This way application can use them to run custom callback(s) (not only the
> registered ones), not sure if this can be dangerous though.

Hmm, as I said above, I don't want users to call them explicitly.
Do you have any good reason to allow it?

> 
> We need to trace the ABI for these functions, making them public clarifies it.

We do have plenty of semi-internal functions right now,
why would adding that one be a problem?
On the other hand, if we declare it public, we will be obliged to support it
in future releases, and it might encourage users to call it on its own.
To me that sounds like extra headache without any gain in return.

> Also comment can be updated to describe intended usage instead of marking them
> internal, and applications can use these anyway if we mark them internal or 
> not.



[dpdk-dev] [PATCH v2] lib/ring: remove experimental tag from functions

2021-10-04 Thread Sean Morrissey
These methods were introduced in 20.05.
There have been no changes in their public API since then.
They seem mature enough to remove the experimental tag.

Signed-off-by: Sean Morrissey 
Acked-by: Konstantin Ananyev 
---
 lib/ring/rte_ring_core.h|  2 --
 lib/ring/rte_ring_elem.h| 12 
 lib/ring/rte_ring_hts.h |  9 -
 lib/ring/rte_ring_peek.h| 13 -
 lib/ring/rte_ring_peek_zc.h | 13 -
 lib/ring/rte_ring_rts.h | 13 -
 6 files changed, 62 deletions(-)

diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 16718ca7f1..4f80c91b72 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -57,10 +57,8 @@ enum rte_ring_queue_behavior {
 enum rte_ring_sync_type {
RTE_RING_SYNC_MT, /**< multi-thread safe (default mode) */
RTE_RING_SYNC_ST, /**< single thread only */
-#ifdef ALLOW_EXPERIMENTAL_API
RTE_RING_SYNC_MT_RTS, /**< multi-thread relaxed tail sync */
RTE_RING_SYNC_MT_HTS, /**< multi-thread head/tail sync */
-#endif
 };
 
 /**
diff --git a/lib/ring/rte_ring_elem.h b/lib/ring/rte_ring_elem.h
index 98c5495e02..4bd016c110 100644
--- a/lib/ring/rte_ring_elem.h
+++ b/lib/ring/rte_ring_elem.h
@@ -165,10 +165,8 @@ rte_ring_sp_enqueue_bulk_elem(struct rte_ring *r, const 
void *obj_table,
RTE_RING_QUEUE_FIXED, RTE_RING_SYNC_ST, free_space);
 }
 
-#ifdef ALLOW_EXPERIMENTAL_API
 #include 
 #include 
-#endif
 
 /**
  * Enqueue several objects on a ring.
@@ -204,14 +202,12 @@ rte_ring_enqueue_bulk_elem(struct rte_ring *r, const void 
*obj_table,
case RTE_RING_SYNC_ST:
return rte_ring_sp_enqueue_bulk_elem(r, obj_table, esize, n,
free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
case RTE_RING_SYNC_MT_RTS:
return rte_ring_mp_rts_enqueue_bulk_elem(r, obj_table, esize, n,
free_space);
case RTE_RING_SYNC_MT_HTS:
return rte_ring_mp_hts_enqueue_bulk_elem(r, obj_table, esize, n,
free_space);
-#endif
}
 
/* valid ring should never reach this point */
@@ -388,14 +384,12 @@ rte_ring_dequeue_bulk_elem(struct rte_ring *r, void 
*obj_table,
case RTE_RING_SYNC_ST:
return rte_ring_sc_dequeue_bulk_elem(r, obj_table, esize, n,
available);
-#ifdef ALLOW_EXPERIMENTAL_API
case RTE_RING_SYNC_MT_RTS:
return rte_ring_mc_rts_dequeue_bulk_elem(r, obj_table, esize,
n, available);
case RTE_RING_SYNC_MT_HTS:
return rte_ring_mc_hts_dequeue_bulk_elem(r, obj_table, esize,
n, available);
-#endif
}
 
/* valid ring should never reach this point */
@@ -576,14 +570,12 @@ rte_ring_enqueue_burst_elem(struct rte_ring *r, const 
void *obj_table,
case RTE_RING_SYNC_ST:
return rte_ring_sp_enqueue_burst_elem(r, obj_table, esize, n,
free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
case RTE_RING_SYNC_MT_RTS:
return rte_ring_mp_rts_enqueue_burst_elem(r, obj_table, esize,
n, free_space);
case RTE_RING_SYNC_MT_HTS:
return rte_ring_mp_hts_enqueue_burst_elem(r, obj_table, esize,
n, free_space);
-#endif
}
 
/* valid ring should never reach this point */
@@ -688,14 +680,12 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void 
*obj_table,
case RTE_RING_SYNC_ST:
return rte_ring_sc_dequeue_burst_elem(r, obj_table, esize, n,
available);
-#ifdef ALLOW_EXPERIMENTAL_API
case RTE_RING_SYNC_MT_RTS:
return rte_ring_mc_rts_dequeue_burst_elem(r, obj_table, esize,
n, available);
case RTE_RING_SYNC_MT_HTS:
return rte_ring_mc_hts_dequeue_burst_elem(r, obj_table, esize,
n, available);
-#endif
}
 
/* valid ring should never reach this point */
@@ -705,10 +695,8 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void 
*obj_table,
return 0;
 }
 
-#ifdef ALLOW_EXPERIMENTAL_API
 #include 
 #include 
-#endif
 
 #include 
 
diff --git a/lib/ring/rte_ring_hts.h b/lib/ring/rte_ring_hts.h
index a9342083f4..9a5938ac58 100644
--- a/lib/ring/rte_ring_hts.h
+++ b/lib/ring/rte_ring_hts.h
@@ -12,7 +12,6 @@
 
 /**
  * @file rte_ring_hts.h
- * @b EXPERIMENTAL: this API may change without prior notice
  * It is not recommended to include this file directly.
  * Please include  instead.
  *
@@ -50,7 +49,6 @@ extern "C" {
  * @return
  *   The number of objects enqueued, either 0 or n
  */
-__rte_experimental
 static __rte_always_inline unsigned int
 rte_ring_mp_hts_enqueue_bulk_elem(struct rte_ring *r, const void *obj_table,
unsigned int esize, unsigned int n, unsigned int *free_space)
@@ -78,7 +76,6 @@ rte_ring

Re: [dpdk-dev] [PATCH 1/3] mbuf: remove deprecated offload flags

2021-10-04 Thread Olivier Matz
Hi David,

Thank you for the review, my comments below.

On Mon, Oct 04, 2021 at 10:29:36AM +0200, David Marchand wrote:
> On Wed, Sep 29, 2021 at 11:50 PM Olivier Matz  wrote:
> >
> > The flags PKT_TX_VLAN_PKT, PKT_TX_QINQ_PKT, PKT_RX_EIP_CKSUM_BAD are
> > marked as deprecated since commit 380a7aab1ae2 ("mbuf: rename deprecated
> > VLAN flags") (2017). Remove their definitions from rte_mbuf_core.h,
> > and replace their usages.
> 
> The patch lgtm except the removal of some "bad checksum" flags, see below.
>
> [snip]
> 
> 
> > diff --git a/doc/guides/rel_notes/deprecation.rst 
> > b/doc/guides/rel_notes/deprecation.rst
> > index 05fc2fdee7..549e9416c4 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -159,11 +159,6 @@ Deprecation Notices
> >will be limited to maximum 256 queues.
> >Also compile time flag ``RTE_ETHDEV_QUEUE_STAT_CNTRS`` will be removed.
> >
> > -* ethdev: The offload flag ``PKT_RX_EIP_CKSUM_BAD`` will be removed and
> > -  replaced by the new flag ``PKT_RX_OUTER_IP_CKSUM_BAD``. The new name is 
> > more
> > -  consistent with existing outer header checksum status flag naming, which
> > -  should help in reducing confusion about its usage.
> > -
> >  * i40e: As there are both i40evf and iavf pmd, the functions of them are
> >duplicated. And now more and more advanced features are developed on 
> > iavf.
> >To keep consistent with kernel driver's name
> 
> Those 3 flags are easy to replace, but some projects are still using them.
> 
> $ git grep-all -El
> '\<(PKT_TX_VLAN_PKT|PKT_TX_QINQ_PKT|PKT_RX_EIP_CKSUM_BAD)\>' |grep -v
> \\.patch$
> DPVS/src/netif.c
> DPVS/src/vlan.c
> FD.io-VPP/src/plugins/dpdk/device/format.c
> gatekeeper/bpf/bpf_mbuf.h
> lagopus/src/dataplane/dpdk/worker.c
> packet-journey/app/main.c
> Trex/src/pal/common/common_mbuf.h
> Trex/src/pal/linux/mbuf.h
> 
> Please update the release notes to announce this API update.

I will add a note in the release notes.

FYI, the flags PKT_TX_VLAN_PKT and PKT_TX_QINQ_PKT were not marked
RTE_DEPRECATED because their deprecation is older than this macro. If needed, I
can keep them for one more version in the header file.

> [snip]
> 
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index 9d8e3ddc86..93db9292c0 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -55,37 +55,12 @@ extern "C" {
> >   /** RX packet with FDIR match indicate. */
> >  #define PKT_RX_FDIR  (1ULL << 2)
> >
> > -/**
> > - * Deprecated.
> > - * Checking this flag alone is deprecated: check the 2 bits of
> > - * PKT_RX_L4_CKSUM_MASK.
> > - * This flag was set when the L4 checksum of a packet was detected as
> > - * wrong by the hardware.
> > - */
> > -#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
> > -
> > -/**
> > - * Deprecated.
> > - * Checking this flag alone is deprecated: check the 2 bits of
> > - * PKT_RX_IP_CKSUM_MASK.
> > - * This flag was set when the IP checksum of a packet was detected as
> > - * wrong by the hardware.
> > - */
> > -#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
> > -
> 
> You did not mention PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD in the
> commitlog.
> There was no deprecation notice, and those flags were not marked
> RTE_DEPRECATED (there are still many projects referencing them).
> 
> Is this removal intended?

Yes, actually these flags were defined twice at different places. I'm just
removing one occurrence, and the other remains.

Thanks,
Olivier
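
The removed comment says to check the 2 bits of PKT_RX_L4_CKSUM_MASK rather
than the BAD flag alone. A sketch of why that matters is below, assuming a
four-state encoding over the two bits. Only the BAD bit position (bit 3)
comes from the quoted diff; the GOOD bit position, the meaning given to
"both bits set", and all EX_* names are assumptions for illustration.

```c
#include <stdint.h>

#define EX_RX_L4_CKSUM_BAD  (UINT64_C(1) << 3) /* position from the quoted diff */
#define EX_RX_L4_CKSUM_GOOD (UINT64_C(1) << 8) /* assumed position */
#define EX_RX_L4_CKSUM_MASK (EX_RX_L4_CKSUM_BAD | EX_RX_L4_CKSUM_GOOD)

/* Hypothetical decoded status of the two-bit field. */
enum ex_cksum_status {
	EX_CKSUM_UNKNOWN, /* neither bit set: hardware gave no information */
	EX_CKSUM_BAD,     /* only the BAD bit set */
	EX_CKSUM_GOOD,    /* only the GOOD bit set */
	EX_CKSUM_NONE,    /* both bits set: assumed to be a distinct state */
};

/* Decode the L4 checksum status by looking at both bits together. */
static enum ex_cksum_status
ex_l4_cksum_status(uint64_t ol_flags)
{
	switch (ol_flags & EX_RX_L4_CKSUM_MASK) {
	case 0:
		return EX_CKSUM_UNKNOWN;
	case EX_RX_L4_CKSUM_BAD:
		return EX_CKSUM_BAD;
	case EX_RX_L4_CKSUM_GOOD:
		return EX_CKSUM_GOOD;
	default:
		return EX_CKSUM_NONE;
	}
}
```

Testing only (ol_flags & EX_RX_L4_CKSUM_BAD) would report the "both bits
set" state as a bad checksum, which is the mistake the mask-based check
avoids.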


Re: [dpdk-dev] [PATCH v1 02/12] ethdev: add eswitch port item to flow API

2021-10-04 Thread Ori Kam
Hi Ivan,

> -Original Message-
> From: Ivan Malov 
> Sent: Sunday, October 3, 2021 9:11 PM
> Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow API
> 
> 
> 
> On 03/10/2021 15:40, Ori Kam wrote:
> > Hi Andrew and Ivan,
> >
> >> -Original Message-
> >> From: Andrew Rybchenko 
> >> Sent: Friday, October 1, 2021 4:47 PM
> >> Subject: [PATCH v1 02/12] ethdev: add eswitch port item to flow API
> >>
> >> From: Ivan Malov 
> >>
> >> For use with "transfer" flows. Supposed to match traffic entering the
> >> e-switch from the external world (network, guests) via the port which
> >> is logically connected with the given ethdev.
> >>
> >> Must not be combined with attributes "ingress" / "egress".
> >>
> >> This item is meant to use the same structure as ethdev item.
> >>
> >
> > In case the app is not working with representors, meaning each switch
> > port is mapped to ethdev.
> > both items (ethdev and eswitch port ) have the same meaning?
> 
> No. Ethdev means ethdev, and e-switch port is the point where this ethdev
> is plugged to. For example, "transfer + ESWITCH_PORT" for a regular PF
> ethdev typically means the network port (maybe you can recall the idea that
> a PF ethdev "represents" the network port it's associated with).
> 
> I believe, that diagrams which these patches add to
> "doc/guides/prog_guide/rte_flow.rst" may come in handy to understand the
> meaning. Also, you can take a look at our larger diagram from the Sep 14
> gathering.
> 

Let's look at the following system:
E-Switch has 3 ports - PF, VF1, VF2
The ports are distributed as follows:
DPDK application:
ethdev(0) pf,
ethdev(1) representor to VF1
ethdev(2) representor to VF2
ethdev(3) VF1

VM:
VF2

As we know, all representors are really connected to the PF (at least in this
example).

So matching on ethdev(3) means matching on traffic sent from DPDK port 3, right?
And matching on eswitch_port(3) means matching on traffic that goes into VF1,
which is the same traffic as ethdev(3), right?

Matching on ethdev(1) means matching on the PF port in the E-Switch, but with
some metadata that marks the traffic as coming from DPDK port 1 and not from
the VF1 E-Switch port, right?

While matching on eswitch_port(2) means matching on traffic coming from the VM,
right?
 
> >
> >> Signed-off-by: Ivan Malov 
> >> Signed-off-by: Andrew Rybchenko 
> >> ---
> >>   app/test-pmd/cmdline_flow.c | 27 +
> >>   doc/guides/prog_guide/rte_flow.rst  | 22 +
> >>   doc/guides/rel_notes/release_21_11.rst  |  2 +-
> >>   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  4 +++
> >>   lib/ethdev/rte_flow.c   |  1 +
> >>   lib/ethdev/rte_flow.h   | 12 -
> >>   6 files changed, 66 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/app/test-pmd/cmdline_flow.c b/app/test-
> pmd/cmdline_flow.c
> >> index e05b0d83d2..188d0ee39d 100644
> >> --- a/app/test-pmd/cmdline_flow.c
> >> +++ b/app/test-pmd/cmdline_flow.c
> >> @@ -308,6 +308,8 @@ enum index {
> >>ITEM_POL_POLICY,
> >>ITEM_ETHDEV,
> >>ITEM_ETHDEV_ID,
> >> +  ITEM_ESWITCH_PORT,
> >> +  ITEM_ESWITCH_PORT_ETHDEV_ID,
> >
> > Like my comment from previous patch, I'm not sure the correct
> > term for ETHDEV is ID is should be port.
> 
> Please see my reply in the previous thread. "ethdev" here is an
> "anchor", a "beacon" of sorts which allows either to refer namely to
> this ethdev or to the e-switch port associated with it.
> 
> >
> >>
> >>/* Validate/create actions. */
> >>ACTIONS,
> >> @@ -1003,6 +1005,7 @@ static const enum index next_item[] = {
> >>ITEM_INTEGRITY,
> >>ITEM_CONNTRACK,
> >>ITEM_ETHDEV,
> >> +  ITEM_ESWITCH_PORT,
> >>END_SET,
> >>ZERO,
> >>   };
> >> @@ -1377,6 +1380,12 @@ static const enum index item_ethdev[] = {
> >>ZERO,
> >>   };
> >>
> >> +static const enum index item_eswitch_port[] = {
> >> +  ITEM_ESWITCH_PORT_ETHDEV_ID,
> >> +  ITEM_NEXT,
> >> +  ZERO,
> >> +};
> >> +
> >>   static const enum index next_action[] = {
> >>ACTION_END,
> >>ACTION_VOID,
> >> @@ -3632,6 +3641,21 @@ static const struct token token_list[] = {
> >> item_param),
> >>.args = ARGS(ARGS_ENTRY(struct rte_flow_item_ethdev,
> id)),
> >>},
> >> +  [ITEM_ESWITCH_PORT] = {
> >> +  .name = "eswitch_port",
> >> +  .help = "match traffic at e-switch going from the external port
> >> associated with the given ethdev",
> >
> > Missing the word logically since if we are talking about representor the
> connected port
> > is the PF while we want to match traffic on one of the FVs.
> 
> Doesn't the word "external" say it all?
> 
> Representor Ethdev <--> Admin ethdev's PF <--> E-Switch <--> VF
> (external / the most remote endpoint).
> 

Until the last comment, "external" had a totally different meaning to me.
I think you should document the meaning of "external" somewhere, or use
"the most remote endpoint" instead.

> >
> >> +  

Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-04 Thread David Marchand
On Mon, Oct 4, 2021 at 10:51 AM Harman Kalra  wrote:
> > > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int size,
> > > +  bool
> > > +from_hugepage) {
> > > +   struct rte_intr_handle *intr_handle;
> > > +   int i;
> > > +
> > > +   if (from_hugepage)
> > > +   intr_handle = rte_zmalloc(NULL,
> > > + size * sizeof(struct 
> > > rte_intr_handle),
> > > + 0);
> > > +   else
> > > +   intr_handle = calloc(1, size * sizeof(struct
> > > + rte_intr_handle));
> >
> > We can call DPDK allocator in all cases.
> > That would avoid headaches on why multiprocess does not work in some
> > rarely tested cases.
> > Wdyt?
> >
> > Plus "from_hugepage" is misleading, you could be in --no-huge mode,
> > rte_zmalloc still works.
>
>  In mellanox 5 driver interrupt handle instance is freed in destructor
> " mlx5_pmd_interrupt_handler_uninstall()" while DPDK memory allocators
> are already cleaned up in "rte_eal_cleanup". Hence I allocated interrupt
> instances for such cases from normal heap. There could be other such cases
> so I think its ok to keep this support.

This is surprising.
Why would the mlx5 driver wait to release in a destructor?
It should be done once no interrupt handler is necessary (like when
stopping all ports), and that would be before rte_eal_cleanup().


-- 
David Marchand



Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Pavan Nikhilesh Bhagavatula
>On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>> From: Pavan Nikhilesh 
>>
>> Due to Linux kernel AF(Admin function) driver dependency, only
>enable
>> build for 64-bit Linux.
>>
>
>Hi Pavan,
>
>Isn't it possible to provide a commit log in the kernel side etc, that let
>others to verify why only 64 bit is required, or if someone want to
>support
>32bit that may help them to investigate the source of the restriction.

Arch 32 support is not implemented on ThunderX, so 32bit will not run.

>
>> Signed-off-by: Pavan Nikhilesh 
>> Acked-by: Jerin Jacob 
>> ---
>>  v5 Changes
>>  - s/fuction/function.
>>
>>  v4 Changes:
>>  - Update commit message regarding dependency on AF driver.
>>
>>  drivers/net/thunderx/meson.build | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/thunderx/meson.build
>b/drivers/net/thunderx/meson.build
>> index 4bbcea7f93..da665bd76f 100644
>> --- a/drivers/net/thunderx/meson.build
>> +++ b/drivers/net/thunderx/meson.build
>> @@ -1,9 +1,9 @@
>>  # SPDX-License-Identifier: BSD-3-Clause
>>  # Copyright(c) 2017 Cavium, Inc
>>
>> -if is_windows
>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>  build = false
>> -reason = 'not supported on Windows'
>> +reason = 'only supported on 64-bit Linux'
>>  subdir_done()
>>  endif
>>
>> --
>> 2.17.1
>>



[dpdk-dev] [RFC] eal/arm: remove CASP constraints for GCC

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

GCC now assigns even register pairs for CASP, the fix has also been
backported to all stable releases of older GCC versions.
Removing the manual register allocation allows GCC to inline the
functions and pick optimal registers for performing CASP.

Signed-off-by: Pavan Nikhilesh 
---
 lib/eal/arm/include/rte_atomic_64.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/eal/arm/include/rte_atomic_64.h 
b/lib/eal/arm/include/rte_atomic_64.h
index fa6f334c0d..f6f31ae777 100644
--- a/lib/eal/arm/include/rte_atomic_64.h
+++ b/lib/eal/arm/include/rte_atomic_64.h
@@ -52,6 +52,7 @@ rte_atomic_thread_fence(int memorder)
 #define __LSE_PREAMBLE ""
 #endif
 
+#if defined(__clang__)
 #define __ATOMIC128_CAS_OP(cas_op_name, op_string)  \
 static __rte_noinline void  \
 cas_op_name(rte_int128_t *dst, rte_int128_t *old, rte_int128_t updated) \
@@ -76,6 +77,19 @@ cas_op_name(rte_int128_t *dst, rte_int128_t *old, 
rte_int128_t updated) \
old->val[0] = x0;   \
old->val[1] = x1;   \
 }
+#else
+#define __ATOMIC128_CAS_OP(cas_op_name, op_string)  \
+static __rte_always_inline void \
+cas_op_name(rte_int128_t *dst, rte_int128_t *old, rte_int128_t updated) \
+{   \
+   asm volatile(   \
+   __LSE_PREAMBLE  \
+   op_string " %[old], %H[old], %[upd], %H[upd], [%[dst]]" \
+   : [old] "+r"(old->int128)   \
+   : [upd] "r"(updated.int128), [dst] "r"(dst) \
+   : "memory");\
+}
+#endif
 
 __ATOMIC128_CAS_OP(__cas_128_relaxed, "casp")
 __ATOMIC128_CAS_OP(__cas_128_acquire, "caspa")
-- 
2.17.1



[dpdk-dev] [PATCH v1 0/5] introduce IWYU

2021-10-04 Thread Sean Morrissey
This patchset introduces the include-what-you-use script which removes
unused header includes. IWYU GitHub:

https://github.com/include-what-you-use/include-what-you-use

Along with the script there are some patches which make a start on
removing unneeded headers.

Sean Morrissey (5):
  devtools: script to remove unused headers includes
  lib/telemetry: remove unneeded header includes
  lib/ring: remove unneeded header includes
  lib/kvargs: remove unneeded header includes
  lib/eal: remove unneeded header includes

 devtools/process_iwyu.py   | 109 +
 lib/eal/common/eal_common_dev.c|   5 --
 lib/eal/common/eal_common_devargs.c|   1 -
 lib/eal/common/eal_common_errno.c  |   4 -
 lib/eal/common/eal_common_fbarray.c|   3 -
 lib/eal/common/eal_common_hexdump.c|   3 -
 lib/eal/common/eal_common_launch.c |   6 --
 lib/eal/common/eal_common_lcore.c  |   6 --
 lib/eal/common/eal_common_log.c|   2 -
 lib/eal/common/eal_common_memalloc.c   |   3 -
 lib/eal/common/eal_common_memory.c |   5 --
 lib/eal/common/eal_common_memzone.c|   4 -
 lib/eal/common/eal_common_options.c|   2 -
 lib/eal/common/eal_common_proc.c   |   2 -
 lib/eal/common/eal_common_string_fns.c |   2 -
 lib/eal/common/eal_common_tailqs.c |  11 ---
 lib/eal/common/eal_common_thread.c |   3 -
 lib/eal/common/eal_common_timer.c  |   6 --
 lib/eal/common/eal_common_trace.c  |   1 -
 lib/eal/common/hotplug_mp.h|   1 -
 lib/eal/common/malloc_elem.c   |   6 --
 lib/eal/common/malloc_heap.c   |   5 --
 lib/eal/common/malloc_mp.c |   1 -
 lib/eal/common/malloc_mp.h |   2 -
 lib/eal/common/rte_malloc.c|   6 --
 lib/eal/common/rte_random.c|   3 -
 lib/eal/common/rte_service.c   |   6 --
 lib/eal/include/rte_version.h  |   2 -
 lib/eal/linux/eal.c|  10 ---
 lib/eal/linux/eal_alarm.c  |   7 --
 lib/eal/linux/eal_cpuflags.c   |   2 -
 lib/eal/linux/eal_debug.c  |   5 --
 lib/eal/linux/eal_dev.c|   4 -
 lib/eal/linux/eal_hugepage_info.c  |   8 --
 lib/eal/linux/eal_interrupts.c |   8 --
 lib/eal/linux/eal_lcore.c  |   7 --
 lib/eal/linux/eal_log.c|  11 +--
 lib/eal/linux/eal_memalloc.c   |   8 --
 lib/eal/linux/eal_memory.c |   9 --
 lib/eal/linux/eal_thread.c |   5 --
 lib/eal/linux/eal_timer.c  |  15 
 lib/eal/linux/eal_vfio_mp_sync.c   |   1 -
 lib/eal/unix/eal_file.c|   1 -
 lib/eal/unix/rte_thread.c  |   1 -
 lib/eal/x86/rte_cycles.c   |   1 -
 lib/kvargs/rte_kvargs.c|   1 -
 lib/ring/rte_ring.c|   7 --
 lib/telemetry/telemetry.c  |   1 -
 lib/telemetry/telemetry_data.h |   1 -
 49 files changed, 110 insertions(+), 213 deletions(-)
 create mode 100755 devtools/process_iwyu.py

-- 
2.25.1



[dpdk-dev] [PATCH v1 1/5] devtools: script to remove unused headers includes

2021-10-04 Thread Sean Morrissey
This script can be used for removing headers flagged for removal by the
include-what-you-use (IWYU) tool. The script has the ability to remove
headers from specified sub-directories or dpdk as a whole.

example usages:

Remove headers flagged by iwyu_tool output file
$ ./devtools/process_iwyu.py iwyu.out -b build

Remove headers flagged by iwyu_tool output file from sub-directory
$ ./devtools/process_iwyu.py iwyu.out -b build -d lib/kvargs

Remove headers directly piped from the iwyu_tool
$ iwyu_tool -p build | ./devtools/process_iwyu.py - -b build

Signed-off-by: Sean Morrissey 
Signed-off-by: Conor Fogarty 
---
 devtools/process_iwyu.py | 109 +++
 1 file changed, 109 insertions(+)
 create mode 100755 devtools/process_iwyu.py

diff --git a/devtools/process_iwyu.py b/devtools/process_iwyu.py
new file mode 100755
index 00..ddc4ceafa4
--- /dev/null
+++ b/devtools/process_iwyu.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+#
+
+import argparse
+import fileinput
+import sys
+from os.path import abspath, relpath, join
+from pathlib import Path
+from mesonbuild import mesonmain
+
+def args_parse():
+parser = argparse.ArgumentParser(description='This script can be used to 
remove includes which are not in use\n')
+parser.add_argument('-b', '--build_dir', type=str, help='Name of the build 
directory in which the IWYU tool was used.', default="build")
+parser.add_argument('-d', '--sub_dir', type=str, help='The sub-directory 
to remove headers from.', default="")
+parser.add_argument('file', type=Path, help='The path to the IWYU log file 
or output from stdin.')
+
+args = parser.parse_args()
+
+return args
+
+
+def run_meson(args):
+"Runs a meson command logging output to process.log"
+with open('process.log', 'a') as sys.stdout:
+ret = mesonmain.run(args, abspath('meson'))
+sys.stdout = sys.__stdout__
+return ret
+
+
+def remove_includes(filename, include, dpdk_dir, build_dir):
+# Load in file - readlines()
+# loop through list once in mem -> make cpy of list with line removed
+# write cpy  -> stored in memory so write cpy to file then check
+# run test build -> call ninja on the build folder, ninja -C build, 
subprocess
+# if fails -> write original back to file otherwise continue on
+# newlist = [ln for ln in lines if not ln.startswith(...)] filters out one 
element
+filepath = filename
+
+with open(filepath, 'r+') as f:
+lines = f.readlines()  # Read lines when file is opened
+
+with open(filepath, 'w') as f:
+for ln in lines:  # Removes the include passed in
+if ln.strip("\n") != include:
+f.write(ln)
+
+ret = run_meson(['compile', '-C', join(dpdk_dir, build_dir)])
+if (ret == 0):  # Include is not needed -> build is successful
+print('SUCCESS')
+else:
+# failed, catch the error
+# return file to original state
+with open(filepath, 'w') as f:
+f.writelines(lines)
+print('FAILED')
+
+
+def get_build_config(builddir, condition):
+"returns contents of rte_build_config.h"
+with open(join(builddir, 'rte_build_config.h')) as f:
+return [ln for ln in f.readlines() if condition(ln)]
+
+
+def uses_libbsd(builddir):
+"return whether the build uses libbsd or not"
+return bool(get_build_config(builddir, lambda ln: 'RTE_USE_LIBBSD' in ln))
+
+
+def process(args):
+filename = None
+build_dir = args.build_dir
+dpdk_dir = abspath(__file__).split('/devtools')[0]
+directory = args.sub_dir
+# Use stdin if no iwyu_tool out file given
+logfile = abspath(args.file) if str(args.file) != '-' else args.file
+
+keep_str_fns = uses_libbsd(join(dpdk_dir, build_dir)) # check for libbsd
+if keep_str_fns:
+print('Warning: libbsd is present, build will fail to detect incorrect 
removal of rte_string_fns.h',
+  file=sys.stderr)
+run_meson(['configure', dpdk_dir + "/" + build_dir, '-Dwerror=true'])  # 
turn on werror
+
+for line in fileinput.input(logfile):
+if 'should remove' in line:
+# If the file path in the iwyu_tool output is an absolute path
+# it means the file is outside of the dpdk directory, therefore 
ignore it
+# Also check to see if the file is within the specified sub 
directory
+if line.split()[0] != abspath(line.split()[0]) and directory in 
line.split()[0]:
+filename = relpath(join(build_dir, line.split()[0]))
+elif line.startswith('-') and filename:
+include = '#include ' + line.split()[2]
+print(f"Remove {include} from {filename} ... ", end='', flush=True)
+if keep_str_fns and '<rte_string_fns.h>' in include:
+print('skipped')
+continue
+remove_includes(filename, include, dpdk_dir, build

[dpdk-dev] [PATCH v1 2/5] lib/telemetry: remove unneeded header includes

2021-10-04 Thread Sean Morrissey
These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey 
---
 lib/telemetry/telemetry.c  | 1 -
 lib/telemetry/telemetry_data.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 8665db8d03..fb3b93ca7e 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -8,7 +8,6 @@
 #include 
 #include 
 #include 
-#include 
 #endif /* !RTE_EXEC_ENV_WINDOWS */
 
 /* we won't link against libbsd, so just always use DPDKs-specific strlcpy */
diff --git a/lib/telemetry/telemetry_data.h b/lib/telemetry/telemetry_data.h
index adb84a09f1..26aa28e72c 100644
--- a/lib/telemetry/telemetry_data.h
+++ b/lib/telemetry/telemetry_data.h
@@ -5,7 +5,6 @@
 #ifndef _TELEMETRY_DATA_H_
 #define _TELEMETRY_DATA_H_
 
-#include 
 #include "rte_telemetry.h"
 
 enum tel_container_types {
-- 
2.25.1



[dpdk-dev] [PATCH v1 3/5] lib/ring: remove unneeded header includes

2021-10-04 Thread Sean Morrissey
These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey 
---
 lib/ring/rte_ring.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/lib/ring/rte_ring.c b/lib/ring/rte_ring.c
index f17bd966be..bb95962b0c 100644
--- a/lib/ring/rte_ring.c
+++ b/lib/ring/rte_ring.c
@@ -17,16 +17,9 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
 #include 
-- 
2.25.1



[dpdk-dev] [PATCH v1 4/5] lib/kvargs: remove unneeded header includes

2021-10-04 Thread Sean Morrissey
These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey 
---
 lib/kvargs/rte_kvargs.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/kvargs/rte_kvargs.c b/lib/kvargs/rte_kvargs.c
index 38e9d5c1ca..4cce8e953b 100644
--- a/lib/kvargs/rte_kvargs.c
+++ b/lib/kvargs/rte_kvargs.c
@@ -7,7 +7,6 @@
 #include 
 #include 
 
-#include 
 #include 
 
 #include "rte_kvargs.h"
-- 
2.25.1



[dpdk-dev] [PATCH v1 5/5] lib/eal: remove unneeded header includes

2021-10-04 Thread Sean Morrissey
These header includes have been flagged by the iwyu_tool
and removed.

Signed-off-by: Sean Morrissey 
---
 lib/eal/common/eal_common_dev.c|  5 -
 lib/eal/common/eal_common_devargs.c|  1 -
 lib/eal/common/eal_common_errno.c  |  4 
 lib/eal/common/eal_common_fbarray.c|  3 ---
 lib/eal/common/eal_common_hexdump.c|  3 ---
 lib/eal/common/eal_common_launch.c |  6 --
 lib/eal/common/eal_common_lcore.c  |  6 --
 lib/eal/common/eal_common_log.c|  2 --
 lib/eal/common/eal_common_memalloc.c   |  3 ---
 lib/eal/common/eal_common_memory.c |  5 -
 lib/eal/common/eal_common_memzone.c|  4 
 lib/eal/common/eal_common_options.c|  2 --
 lib/eal/common/eal_common_proc.c   |  2 --
 lib/eal/common/eal_common_string_fns.c |  2 --
 lib/eal/common/eal_common_tailqs.c | 11 ---
 lib/eal/common/eal_common_thread.c |  3 ---
 lib/eal/common/eal_common_timer.c  |  6 --
 lib/eal/common/eal_common_trace.c  |  1 -
 lib/eal/common/hotplug_mp.h|  1 -
 lib/eal/common/malloc_elem.c   |  6 --
 lib/eal/common/malloc_heap.c   |  5 -
 lib/eal/common/malloc_mp.c |  1 -
 lib/eal/common/malloc_mp.h |  2 --
 lib/eal/common/rte_malloc.c|  6 --
 lib/eal/common/rte_random.c|  3 ---
 lib/eal/common/rte_service.c   |  6 --
 lib/eal/include/rte_version.h  |  2 --
 lib/eal/linux/eal.c| 10 --
 lib/eal/linux/eal_alarm.c  |  7 ---
 lib/eal/linux/eal_cpuflags.c   |  2 --
 lib/eal/linux/eal_debug.c  |  5 -
 lib/eal/linux/eal_dev.c|  4 
 lib/eal/linux/eal_hugepage_info.c  |  8 
 lib/eal/linux/eal_interrupts.c |  8 
 lib/eal/linux/eal_lcore.c  |  7 ---
 lib/eal/linux/eal_log.c| 11 +--
 lib/eal/linux/eal_memalloc.c   |  8 
 lib/eal/linux/eal_memory.c |  9 -
 lib/eal/linux/eal_thread.c |  5 -
 lib/eal/linux/eal_timer.c  | 15 ---
 lib/eal/linux/eal_vfio_mp_sync.c   |  1 -
 lib/eal/unix/eal_file.c|  1 -
 lib/eal/unix/rte_thread.c  |  1 -
 lib/eal/x86/rte_cycles.c   |  1 -
 44 files changed, 1 insertion(+), 203 deletions(-)

diff --git a/lib/eal/common/eal_common_dev.c b/lib/eal/common/eal_common_dev.c
index 148a23830a..12bf3d3c22 100644
--- a/lib/eal/common/eal_common_dev.c
+++ b/lib/eal/common/eal_common_dev.c
@@ -5,20 +5,15 @@
 
 #include 
 #include 
-#include 
 #include 
 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 
 #include "eal_private.h"
diff --git a/lib/eal/common/eal_common_devargs.c 
b/lib/eal/common/eal_common_devargs.c
index 7ab9e71b2a..f4beb35591 100644
--- a/lib/eal/common/eal_common_devargs.c
+++ b/lib/eal/common/eal_common_devargs.c
@@ -12,7 +12,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/lib/eal/common/eal_common_errno.c 
b/lib/eal/common/eal_common_errno.c
index f86802705a..1091065568 100644
--- a/lib/eal/common/eal_common_errno.c
+++ b/lib/eal/common/eal_common_errno.c
@@ -5,15 +5,11 @@
 /* Use XSI-compliant portable version of strerror_r() */
 #undef _GNU_SOURCE
 
-#include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
-#include 
 
 #ifdef RTE_EXEC_ENV_WINDOWS
 #define strerror_r(errnum, buf, buflen) strerror_s(buf, buflen, errnum)
diff --git a/lib/eal/common/eal_common_fbarray.c 
b/lib/eal/common/eal_common_fbarray.c
index 3a28a53247..f11f87979f 100644
--- a/lib/eal/common/eal_common_fbarray.c
+++ b/lib/eal/common/eal_common_fbarray.c
@@ -2,7 +2,6 @@
  * Copyright(c) 2017-2018 Intel Corporation
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -14,9 +13,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 #include "eal_filesystem.h"
 #include "eal_private.h"
diff --git a/lib/eal/common/eal_common_hexdump.c 
b/lib/eal/common/eal_common_hexdump.c
index 2d2179d411..63bbbdcf0a 100644
--- a/lib/eal/common/eal_common_hexdump.c
+++ b/lib/eal/common/eal_common_hexdump.c
@@ -1,10 +1,7 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
  */
-#include 
 #include 
-#include 
-#include 
 #include 
 #include 
 
diff --git a/lib/eal/common/eal_common_launch.c 
b/lib/eal/common/eal_common_launch.c
index 34f854ad80..2a20b32a77 100644
--- a/lib/eal/common/eal_common_launch.c
+++ b/lib/eal/common/eal_common_launch.c
@@ -3,16 +3,10 @@
  */
 
 #include 
-#include 
-#include 
-#include 
 
 #include 
-#include 
-#include 
 #include 
 #include 
-#include 
 #include 
 
 #include "eal_private.h"
diff --git a/lib/eal/common/eal_common_lcore.c 
b/lib/eal/common/eal_common_lcore.c
index 66d6bad1a7..4307a13190 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/ea

Re: [dpdk-dev] [PATCH v3 4/7] ethdev: make burst functions to use new flat array

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 10:20 AM, Ananyev, Konstantin wrote:
> 

>  static inline int
>  rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
>  {
> - struct rte_eth_dev *dev;
> + struct rte_eth_fp_ops *p;
> + void *qd;
> +
> + if (port_id >= RTE_MAX_ETHPORTS ||
> + queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> + RTE_ETHDEV_LOG(ERR,
> + "Invalid port_id=%u or queue_id=%u\n",
> + port_id, queue_id);
> + return -EINVAL;
> + }

 Should the checks be wrapped with '#ifdef RTE_ETHDEV_DEBUG_RX' like the others?
>>>
>>> The original rte_eth_rx_queue_count() always has similar checks enabled,
>>> that's why I also kept them 'always on'.
>>>

 <...>

> +++ b/lib/ethdev/version.map
> @@ -247,11 +247,16 @@ EXPERIMENTAL {
>   rte_mtr_meter_policy_delete;
>   rte_mtr_meter_policy_update;
>   rte_mtr_meter_policy_validate;
> +
> + # added in 21.05

 s/21.05/21.11/

> + __rte_eth_rx_epilog;
> + __rte_eth_tx_prolog;

 These are directly called by the application and must be part of the ABI, but
 they are marked as 'internal' and have a '__rte' prefix to highlight it; this
 may be confusing.
 What about making them proper, non-internal, API?
>>>
>>> Hmm, not sure what you suggest here.
>>> We don't want users to call them explicitly.
>>> They are sort of helpers for rte_eth_rx_burst/rte_eth_tx_burst.
>>> So I did what I thought is our usual policy for such semi-internal things:
>>> have '@internal' in comments, but in version.map put them under the
>>> EXPERIMENTAL/global section.
>>>
>>> What do you think it should be instead?
>>>
>>
>> Make them public API. (Basically just remove '__' prefix and @internal 
>> comment).
>>
>> This way application can use them to run custom callback(s) (not only the
>> registered ones), not sure if this can be dangerous though.
> 
> Hmm, as I said above, I don't want users to call them explicitly.
> Do you have any good reason to allow it?
> 

Just to get rid of this state where internal APIs are exposed to the application.

>>
>> We need to trace the ABI for these functions, making them public clarifies 
>> it.
> 
> We do have plenty of semi-internal functions right now,
> why adding that one will be a problem?

As far as I remember existing ones are 'static inline' functions, and we don't
have an ABI concern with them. But these are actual functions called by 
application.

> From other side - if we'll declare it public, we will have obligations to 
> support it
> in future releases, plus it might encourage users to use it on its own.
> To me that sounds like extra headache without any gain in return.
> 

If having those two as public API doesn't make sense, I agree with you.

>> Also comment can be updated to describe intended usage instead of marking 
>> them
>> internal, and applications can use these anyway if we mark them internal or 
>> not.
> 



Re: [dpdk-dev] [PATCH v1 5/5] lib/eal: remove unneeded header includes

2021-10-04 Thread Van Haaren, Harry
> -Original Message-
> From: Morrissey, Sean 
> Sent: Monday, October 4, 2021 11:11 AM
> To: Burakov, Anatoly ; Jerin Jacob
> ; Sunil Kumar Kori ; mattias.ronnblom
> ; Van Haaren, Harry
> ; Harman Kalra ;
> Richardson, Bruce ; Ananyev, Konstantin
> 
> Cc: dev@dpdk.org; Morrissey, Sean 
> Subject: [PATCH v1 5/5] lib/eal: remove unneeded header includes
> 
> These header includes have been flagged by the iwyu_tool
> and removed.
> 
> Signed-off-by: Sean Morrissey 



For lib/eal/common/rte_service.c;
Reviewed-by: Harry van Haaren 




Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-04 Thread Harman Kalra
Hi Dmitry,

Thanks for reviewing the series.
Please find my comments inline.

> -Original Message-
> From: Dmitry Kozlyuk 
> Sent: Sunday, October 3, 2021 11:35 PM
> To: Harman Kalra 
> Cc: dev@dpdk.org; Ray Kinsella 
> Subject: [EXT] Re: [dpdk-dev] [PATCH v1 2/7] eal/interrupts: implement get
> set APIs
> 
> External Email
> 
> --
> 2021-09-03 18:10 (UTC+0530), Harman Kalra:
> > [...]
> > diff --git a/lib/eal/common/eal_common_interrupts.c
> > b/lib/eal/common/eal_common_interrupts.c
> > new file mode 100644
> > index 00..2e4fed96f0
> > --- /dev/null
> > +++ b/lib/eal/common/eal_common_interrupts.c
> > @@ -0,0 +1,506 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(C) 2021 Marvell.
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +
> > +struct rte_intr_handle *rte_intr_handle_instance_alloc(int size,
> > +  bool from_hugepage)
> 
> Since the purpose of the series is to reduce future ABI breakages, how about
> making the second parameter "flags" to have some spare bits?
> (If not removing it completely per David's suggestion.)
> 

 Having a second parameter "flags" is a good suggestion; I will include it.

> > +{
> > +   struct rte_intr_handle *intr_handle;
> > +   int i;
> > +
> > +   if (from_hugepage)
> > +   intr_handle = rte_zmalloc(NULL,
> > + size * sizeof(struct rte_intr_handle),
> > + 0);
> > +   else
> > +   intr_handle = calloc(1, size * sizeof(struct rte_intr_handle));
> > +   if (!intr_handle) {
> > +   RTE_LOG(ERR, EAL, "Fail to allocate intr_handle\n");
> > +   rte_errno = ENOMEM;
> > +   return NULL;
> > +   }
> > +
> > +   for (i = 0; i < size; i++) {
> > +   intr_handle[i].nb_intr = RTE_MAX_RXTX_INTR_VEC_ID;
> > +   intr_handle[i].alloc_from_hugepage = from_hugepage;
> > +   }
> > +
> > +   return intr_handle;
> > +}
> > +
> > +struct rte_intr_handle *rte_intr_handle_instance_index_get(
> > +   struct rte_intr_handle *intr_handle, int
> index)
> 
> If rte_intr_handle_instance_alloc() returns a pointer to an array, this 
> function
> is useless since the user can simply manipulate a pointer.

 The user won't be able to manipulate the pointer, as they are not aware of the
size of struct rte_intr_handle.
They will observe a "dereferencing pointer to incomplete type" compilation error.

> If we want to make a distinction between a single struct rte_intr_handle and
> a commonly allocated bunch of such (but why?), then they should be
> represented by distinct types.

 Do you mean we should have separate APIs for single allocation and batch
allocation? The get API will be useful only in the case of batch allocation.
Currently the interrupt autotests and the ifpga_rawdev driver make batch
allocations.
I think a common API for single and batch allocation is fine; the get API is
required for returning a particular intr_handle instance.
But one problem I see in the current implementation is that there should be an
upper limit check for the index in the get/set APIs, which I will fix.
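
A minimal sketch of the kind of upper-limit check mentioned above, assuming the
allocator records the element count next to the array. The names demo_handle,
demo_handle_array, demo_handle_array_alloc() and demo_handle_index_get() are
hypothetical and are not part of the EAL API; plain errno stands in for
rte_errno so the snippet builds without DPDK headers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opaque interrupt handle. */
struct demo_handle {
	int fd;
};

/* The element count is stored with the array so that getters can check it. */
struct demo_handle_array {
	size_t count;
	struct demo_handle items[];
};

static struct demo_handle_array *
demo_handle_array_alloc(size_t count)
{
	struct demo_handle_array *arr;

	/* Reject empty requests and sizes that would overflow the allocation. */
	if (count == 0 ||
	    count > (SIZE_MAX - sizeof(*arr)) / sizeof(arr->items[0])) {
		errno = EINVAL;
		return NULL;
	}

	arr = calloc(1, sizeof(*arr) + count * sizeof(arr->items[0]));
	if (arr == NULL) {
		errno = ENOMEM;
		return NULL;
	}
	arr->count = count;
	return arr;
}

static struct demo_handle *
demo_handle_index_get(struct demo_handle_array *arr, size_t index)
{
	/* An unsigned index needs no "negative" check, only the upper bound. */
	if (arr == NULL || index >= arr->count) {
		errno = EINVAL;
		return NULL;
	}
	return &arr->items[index];
}
```

Using size_t for the index, as suggested in the review, leaves only the upper
bound to verify, and the same errno value is reported for every invalid-argument
case.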

> 
> > +{
> > +   if (intr_handle == NULL) {
> > +   RTE_LOG(ERR, EAL, "Interrupt instance unallocated\n");
> > +   rte_errno = ENOMEM;
> 
> Why it's sometimes ENOMEM and sometimes ENOTSUP when the handle is
> not allocated?

 I will fix this and make it consistent across the APIs.

> 
> > +   return NULL;
> > +   }
> > +
> > +   return &intr_handle[index];
> > +}
> > +
> > +int rte_intr_handle_instance_index_set(struct rte_intr_handle
> *intr_handle,
> > +  const struct rte_intr_handle *src,
> > +  int index)
> 
> See above regarding the "index" parameter. If it can be removed, a better
> name for this function would be rte_intr_handle_copy().

 I think get API is required.

> 
> > +{
> > +   if (intr_handle == NULL) {
> > +   RTE_LOG(ERR, EAL, "Interrupt instance unallocated\n");
> > +   rte_errno = ENOTSUP;
> > +   goto fail;
> > +   }
> > +
> > +   if (src == NULL) {
> > +   RTE_LOG(ERR, EAL, "Source interrupt instance
> unallocated\n");
> > +   rte_errno = EINVAL;
> > +   goto fail;
> > +   }
> > +
> > +   if (index < 0) {
> > +   RTE_LOG(ERR, EAL, "Index can't be negative\n");
> > +   rte_errno = EINVAL;
> > +   goto fail;
> > +   }
> 
> How about making this parameter "size_t"?

 You mean index? It can be size_t.

> 
> > +
> > +   intr_handle[index].fd = src->fd;
> > +   intr_handle[index].vfio_dev_fd = src->vfio_dev_fd;
> > +   intr_handle[index].type = src->type;
> > +   intr_handle[index].max_intr = src->max_intr;
> > +   intr_handle[index].nb_efd = src->nb_efd;
> > +   intr_handle[index].efd_counter_size = src->efd_counter_size;
> > +
> > +   memcpy(intr_

[dpdk-dev] [PATCH] cryptodev: extend data-unit length field

2021-10-04 Thread Matan Azrad
As described in [1] and as announced in [2], the field ``dataunit_len``
of the ``struct rte_crypto_cipher_xform`` is moved to the end of the
structure and extended to ``uint32_t``.

This way, data-unit lengths of 64K bytes and above can be supported.
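
To illustrate the width limit, here is a small sketch that is not taken from
the patch: a 16-bit field silently wraps a 1 MiB data-unit length to 0, while a
32-bit field keeps it. The helpers narrow_to_u16() and store_dataunit_len() are
hypothetical; the second one shows one way a caller might range-check a parsed
value before narrowing it to uint32_t.

```c
#include <stdint.h>

/* Narrow a length to the old 16-bit field width; values >= 64K wrap. */
static uint16_t
narrow_to_u16(uint64_t len)
{
	return (uint16_t)len;
}

/*
 * Hypothetical helper: store a parsed length into a 32-bit field,
 * rejecting negative values and values that do not fit in uint32_t.
 */
static int
store_dataunit_len(int64_t val, uint32_t *out)
{
	if (val < 0 || val > (int64_t)UINT32_MAX)
		return -1;
	*out = (uint32_t)val;
	return 0;
}
```

With these helpers, narrow_to_u16(1 << 20) yields 0, whereas
store_dataunit_len(1 << 20, &len) succeeds and leaves len at 1048576.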

[1] commit d014dddb2d69 ("cryptodev: support multiple cipher
data-units")
[2] commit 9a5c09211b3a ("doc: announce extension of crypto data-unit
length")

Signed-off-by: Matan Azrad 
---
 app/test/test_cryptodev_blockcipher.h  |  2 +-
 doc/guides/rel_notes/deprecation.rst   |  4 ---
 doc/guides/rel_notes/release_21_11.rst |  3 +++
 examples/l2fwd-crypto/main.c   |  6 ++---
 lib/cryptodev/rte_crypto_sym.h | 36 +-
 5 files changed, 19 insertions(+), 32 deletions(-)

diff --git a/app/test/test_cryptodev_blockcipher.h 
b/app/test/test_cryptodev_blockcipher.h
index dcaa08ae22..84f5d57787 100644
--- a/app/test/test_cryptodev_blockcipher.h
+++ b/app/test/test_cryptodev_blockcipher.h
@@ -97,7 +97,7 @@ struct blockcipher_test_data {
 
unsigned int cipher_offset;
unsigned int auth_offset;
-   uint16_t xts_dataunit_len;
+   uint32_t xts_dataunit_len;
bool wrapped_key;
 };
 
diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 05fc2fdee7..8b54088a39 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -202,10 +202,6 @@ Deprecation Notices
 * cryptodev: ``min`` and ``max`` fields of ``rte_crypto_param_range`` structure
   will be renamed in DPDK 21.11 to avoid conflict with Windows Sockets headers.
 
-* cryptodev: The field ``dataunit_len`` of the ``struct 
rte_crypto_cipher_xform``
-  has a limited size ``uint16_t``.
-  It will be moved and extended as ``uint32_t`` in DPDK 21.11.
-
 * cryptodev: The structure ``rte_crypto_sym_vec`` would be updated to add
   ``dest_sgl`` to support out of place processing.
   This field will be null for inplace processing.
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 37dc1a7786..4a9d1dedd8 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -190,6 +190,9 @@ ABI Changes
  Use fixed width quotes for ``function_names`` or ``struct_names``.
  Use the past tense.
 
+* cryptodev: The field ``dataunit_len`` of the ``struct 
rte_crypto_cipher_xform``
+  moved to the end of the structure and extended to ``uint32_t``.
+
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
===
diff --git a/examples/l2fwd-crypto/main.c b/examples/l2fwd-crypto/main.c
index 66d1491bf7..78844cee18 100644
--- a/examples/l2fwd-crypto/main.c
+++ b/examples/l2fwd-crypto/main.c
@@ -182,7 +182,7 @@ struct l2fwd_crypto_params {
unsigned digest_length;
unsigned block_size;
 
-   uint16_t cipher_dataunit_len;
+   uint32_t cipher_dataunit_len;
 
struct l2fwd_iv cipher_iv;
struct l2fwd_iv auth_iv;
@@ -1269,9 +1269,9 @@ l2fwd_crypto_parse_args_long_options(struct 
l2fwd_crypto_options *options,
 
else if (strcmp(lgopts[option_index].name, "cipher_dataunit_len") == 0) 
{
retval = parse_size(&val, optarg);
-   if (retval == 0 && val >= 0 && val <= UINT16_MAX) {
+   if (retval == 0 && val >= 0) {
options->cipher_xform.cipher.dataunit_len =
-   (uint16_t)val;
+   (uint32_t)val;
return 0;
} else
return -1;
diff --git a/lib/cryptodev/rte_crypto_sym.h b/lib/cryptodev/rte_crypto_sym.h
index 58c0724743..1106ad6201 100644
--- a/lib/cryptodev/rte_crypto_sym.h
+++ b/lib/cryptodev/rte_crypto_sym.h
@@ -195,9 +195,6 @@ struct rte_crypto_cipher_xform {
enum rte_crypto_cipher_algorithm algo;
/**< Cipher algorithm */
 
-   RTE_STD_C11
-   union { /* temporary anonymous union for ABI compatibility */
-
struct {
const uint8_t *data;/**< pointer to key data */
uint16_t length;/**< key length in bytes */
@@ -233,27 +230,6 @@ struct rte_crypto_cipher_xform {
 *  - Each key can be either 128 bits (16 bytes) or 256 bits (32 bytes).
 *  - Both keys must have the same size.
 **/
-
-   RTE_STD_C11
-   struct { /* temporary anonymous struct for ABI compatibility */
-   const uint8_t *_key_data; /* reserved for key.data union */
-   uint16_t _key_length; /* reserved for key.length union */
-   /* next field can fill the padding hole */
-
-   uint16_t dataunit_len;
-   /**< When RTE_CRYPTODEV_FF_CIPHER_MULTIPLE_DATA_UNITS is enabled,
-* this is the data-unit length 

Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct mode bonding

2021-10-04 Thread Slava Ovsiienko
> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, September 30, 2021 0:58
> To: Matan Azrad ; Slava Ovsiienko
> ; Rongwei Liu 
> Cc: Ori Kam ; dev@dpdk.org; Raslan Darawsheh
> 
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] net/mlx5: support socket direct
> mode bonding
> 
> 28/09/2021 10:50, Rongwei Liu:
> > In socket direct mode, it's possible to bind any two (maybe four in
> > future) PCIe devices with IDs like :xx:xx.x and :yy:yy.y.
> > Bonding member interfaces no longer need to have the same PCIe
> > domain/bus/device ID.
> >
> > Kernel driver uses "system_image_guid" to identify if devices can be
> > bound together or not. Sysfs "phys_switch_id" is used to get
> > "system_image_guid" of each network interface.
> >
> > OFED 5.4+ is required to support "phys_switch_id".
> > Centos 8.1 needs to enable switch_dev mode first.
> >
> > Signed-off-by: Rongwei Liu 
> > Acked-by: Viacheslav Ovsiienko 
> > ---
> >  drivers/net/mlx5/linux/mlx5_os.c | 43
> > +---
> >  1 file changed, 34 insertions(+), 9 deletions(-)
> 
> Does it deserve a line in the release notes?
Not sure, it is a minor update.


Re: [dpdk-dev] [PATCH v1 01/12] ethdev: add ethdev item to flow API

2021-10-04 Thread Andrew Rybchenko
On 10/4/21 3:00 AM, Ivan Malov wrote:
> Hi Ori,
> 
> On 04/10/2021 00:09, Ori Kam wrote:
>> Hi Ivan,
>>
>>> -Original Message-
>>> From: Ivan Malov 
>>> Subject: Re: [PATCH v1 01/12] ethdev: add ethdev item to flow API
>>>
>>> Hi Ori,
>>>
>>> On 03/10/2021 14:52, Ori Kam wrote:
 Hi Andrew and Ivan,

> -Original Message-
> From: Andrew Rybchenko 
> Sent: Friday, October 1, 2021 4:47 PM
> Subject: [PATCH v1 01/12] ethdev: add ethdev item to flow API
>
> From: Ivan Malov 
>
> For use with "transfer" flows. Supposed to match traffic transmitted
> by the DPDK application via the specified ethdev, at e-switch level.
>
> Must not be combined with attributes "ingress" / "egress".
>
> Signed-off-by: Ivan Malov 
> Signed-off-by: Andrew Rybchenko 
> ---

 [Snip]

>    /** Generate flow_action[] entry. */ diff --git
> a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> 7b1ed7f110..880502098e 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -574,6 +574,15 @@ enum rte_flow_item_type {
>     * @see struct rte_flow_item_conntrack.
>     */
>    RTE_FLOW_ITEM_TYPE_CONNTRACK,
> +
> +    /**
> + * [META]
> + *
> + * Matches traffic at e-switch going from (sent by) the given
> ethdev.
> + *
> + * @see struct rte_flow_item_ethdev
> + */
> +    RTE_FLOW_ITEM_TYPE_ETHDEV,
>    };
>
>    /**
> @@ -1799,6 +1808,24 @@ static const struct rte_flow_item_conntrack
> rte_flow_item_conntrack_mask = {  };  #endif
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
> + * Provides an ethdev ID for use with items which are as follows:
> + * RTE_FLOW_ITEM_TYPE_ETHDEV.
> + */
> +struct rte_flow_item_ethdev {
> +    uint16_t id; /**< Ethdev ID */

 True for all above uses,
 should this be renamed to port?
>>>
>>> I'd not rename it to "port". The very idea of this series is to
>>> disambiguate
>>> things. This structure is shared between primitives ETHDEV and
>>> ESWITCH_PORT. If we go for "port", then in conjunction with ESWITCH_PORT
>>> the structure name may trick readers into thinking that the ID in
>>> question is
>>> the own ID of the e-switch port itself. But in fact this ID is an
>>> ethdev ID which
>>> is associated with the e-switch port.
>>>
>>> Should you wish to elaborate on your concerns with regard to naming,
>>> please
>>> do so. I'm all ears.
>>>
>> Fully understand and agree that the idea is to clear the ambiguity.
>> My concern is that we don't use an ethdev ID; from the application's side an
>> ethdev has only ports, so what is the ID? (If we keep this, we should
>> document that the ID is the port.)
>> What about ETHDEV_PORT and ESWITCH_PORT?
> 
> I understand that, technically, the only ports which the application can
> really interface with are ethdevs. So, terms "ethdev" and "port" may
> appear synonymous to the application - you are right on that. But, given
> the fact that we have some primitives like PHY_PORT and the likes, which
> also have "PORT" in their names, I'd rather go for "ethdev" as more
> precise term.
> 
> But let me assure you: I'm not saying that my opinion should prevail.
> I'm giving more thoughts to this in the background. Maybe Andrew can
> join this conversation as well.

As far as I can see ethdev API uses 'port_id' everywhere to
refer to ethdev port by its number. So, I suggest

struct rte_flow_item_ethdev {
uint16_t port_id; /**< ethdev port ID */
};

Basically I agree with Ori, that just "id" is a bit confusing
even when it is a member of the _ethdev structure, but I'd
prepend "port_" to the field name to sync with ethdev API which
uses port_id. So, we have ethdev->port_id.

Andrew.


Re: [dpdk-dev] [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta data

2021-10-04 Thread Ori Kam
Hi Ivan,

> -Original Message-
> From: Ivan Malov 
> Sent: Monday, October 4, 2021 2:50 AM
> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
> data
> 
> Hi Ori,
> 
> On 04/10/2021 00:04, Ori Kam wrote:
> > Hi Ivan,
> >
> > Sorry for the long review.
> >
> >> -Original Message-
> >> From: Ivan Malov 
> >> Sent: Sunday, October 3, 2021 8:30 PM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
> >> Rx meta data
> >>
> >> Hi Ori,
> >>
> >> On 03/10/2021 14:01, Ori Kam wrote:
> >>> Hi Ivan,
> >>>
>  -Original Message-
>  From: Ivan Malov 
>  Sent: Sunday, October 3, 2021 12:30 PM
> 
>  Hi Ori,
> 
>  Thanks for reviewing this.
> 
> >>>
> >>> No problem.
> >>>
>  On 03/10/2021 10:42, Ori Kam wrote:
> > Hi Andrew and Ivan,
> >
> >
> >> -Original Message-
> >> From: Andrew Rybchenko 
> >> Sent: Friday, October 1, 2021 9:50 AM
> >> Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
> >> of Rx meta data
> >>
> >> On 9/30/21 10:07 PM, Ivan Malov wrote:
> >>> Hi Ori,
> >>>
> >>> On 30/09/2021 17:59, Ori Kam wrote:
>  Hi Ivan,

[Snip]

> >>> Good point so why not use the same logic as the metadata and register
> it?
> >>> Since in any case, this is something in the mbuf so maybe this
> >>> should be the
> >> answer?
> >>
> >> I didn't catch your thought. Could you please elaborate on it?
> >
> > The metadata action just like the mark or flag is used to give
> > application data that was set by a flow rule.
> > To enable the metadata the application must register the metadata field.
> > Since this happens during the creation of the mbuf it means that it
> > must be created before the device start.
> >
> > I understand that the mark and flag don't need to be registered in the
> > mbuf since they have saved space but from application point of view
> > there is no difference between the metadata and mark, so why doesn't the
> > negotiate function handle the metadata?
> >
> > I hope this is clearer.
> 
> Thank you. That's a lot clearer.
> 
> I inspected struct rte_flow_action_set_meta as well as
> rte_flow_dynf_metadata_register(). The latter API doesn't require that
> applications invoke it precisely before adapter start. It says "must be called
> prior to use SET_META action", and the comment before the structure says
> just "in advance". So, at a bare minimum, the API contract could've been
> made more strict with this respect. However, far more important points are
> as follows:
> 

Agreed, the doc should be updated, but by definition this must be set before mbuf
creation, which means before device start.

> 1) This API enables delivery of this "custom" metadata between the PMD
> and the application, whilst the API under review, as I noted before,
> negotiates delivery of various kinds of metadata between the NIC and the
> PMD. These are two completely different (albeit adjacent) stages of packet
> delivery process.
>
They are exactly alike. Also in the metadata case, the registration does two things:
it saves a place for the info in the mbuf, and it tells the PMD that it should
configure the NIC to supply this information upon request.
Even in your PMD, assuming that it can support the metadata, you will need to
configure it; otherwise, when the application requests this data using a rule, you
will be in the same spot you are now with the mark.

> 2) This API doesn't negotiate anything with the PMD. It doesn't interact with
> the PMD at all. It just reserves extra room in mbufs for the metadata field
> and exits.
> 
> 3) As a consequence of (2), the PMD can't immediately learn about this field
> being enabled. It's forced to face this fact at some deferred point. If the
> PMD, for instance, learns about that during adapter start and if it for some
> reason decides to deny the use of this field, it won't be able to convey its
> decision to the application. As a result, the application will live in the 
> wrong
> assumption that it has successfully enabled the feature.
>
> 4) Even if we add similar APIs to "register" more kinds of metadata (flag,
> mark, tunnel ID, etc) and re-define the meaning of all these APIs to say that
> not only they enable delivery of the metadata between the PMD and the
> application but also enable the HW transport to get the metadata delivered
> from the NIC to the PMD itself, we won't be able to use this set of APIs to
> actually *negotiate* something. The order of invocations will be confusing to
> the application. If the PMD can't combine some of these features, it won't be
> able to communicate this clearly to the application. It will have to silently
> disregard some of the "registered" features. And this is something that we
> probably want to avoid. Right?
> 
> But I tend to agree that the API under review could have one more (4th) flag
> to negotiate delivery of this "custom" metadata from the NI
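
The argument in point 4 above, that a single negotiation call lets the driver report exactly which combination it accepts, can be sketched in a few lines. This is only an illustration under stated assumptions: the `DEMO_*` bits and `demo_rx_metadata_negotiate()` are hypothetical stand-ins rather than ethdev definitions, and the rule that MARK and TUNNEL_ID cannot be combined is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; these are not the real ethdev definitions. */
#define DEMO_RX_META_FLAG      (UINT64_C(1) << 0)
#define DEMO_RX_META_MARK      (UINT64_C(1) << 1)
#define DEMO_RX_META_TUNNEL_ID (UINT64_C(1) << 2)

/* Everything this imaginary driver can deliver from the NIC to the PMD. */
#define DEMO_PMD_SUPPORTED \
	(DEMO_RX_META_FLAG | DEMO_RX_META_MARK | DEMO_RX_META_TUNNEL_ID)

/*
 * Negotiate in one call: the caller passes the features it wants and
 * reads back the subset the driver agreed to deliver.
 *
 * Assumed restriction (invented for this sketch): MARK and TUNNEL_ID
 * cannot be enabled together, and the driver resolves the conflict by
 * dropping TUNNEL_ID.
 */
static int
demo_rx_metadata_negotiate(uint64_t *features)
{
	if (features == NULL)
		return -EINVAL;

	/* Drop anything the driver does not know about. */
	*features &= DEMO_PMD_SUPPORTED;

	/* Resolve the assumed conflict and report it through the mask. */
	if ((*features & DEMO_RX_META_MARK) &&
	    (*features & DEMO_RX_META_TUNNEL_ID))
		*features &= ~DEMO_RX_META_TUNNEL_ID;

	return 0;
}
```

Because the result comes back through the same mask, the application learns immediately which features were refused, which separate per-feature "register" calls could not express.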

Re: [dpdk-dev] [PATCH v1 02/12] ethdev: add eswitch port item to flow API

2021-10-04 Thread Ivan Malov

Hi Ori,

On 04/10/2021 08:45, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Sunday, October 3, 2021 9:11 PM
Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow API



On 03/10/2021 15:40, Ori Kam wrote:

Hi Andrew and Ivan,


-Original Message-
From: Andrew Rybchenko 
Sent: Friday, October 1, 2021 4:47 PM
Subject: [PATCH v1 02/12] ethdev: add eswitch port item to flow API

From: Ivan Malov 

For use with "transfer" flows. Supposed to match traffic entering the
e-switch from the external world (network, guests) via the port which
is logically connected with the given ethdev.

Must not be combined with attributes "ingress" / "egress".

This item is meant to use the same structure as ethdev item.



In case the app is not working with representors, meaning each switch
port is mapped to an ethdev,
do both items (ethdev and eswitch port) have the same meaning?


No. Ethdev means ethdev, and e-switch port is the point where this ethdev
is plugged to. For example, "transfer + ESWITCH_PORT" for a regular PF
ethdev typically means the network port (maybe you can recall the idea that
a PF ethdev "represents" the network port it's associated with).

I believe, that diagrams which these patches add to
"doc/guides/prog_guide/rte_flow.rst" may come in handy to understand the
meaning. Also, you can take a look at our larger diagram from the Sep 14
gathering.



Let's look at the following system:
E-Switch has 3 ports - PF, VF1, VF2
The ports are distributed as follows:
DPDK application:
ethdev(0) pf,
ethdev(1) representor to VF1
ethdev(2) representor to VF2
ethdev(3) VF1

VM:
VF2

As we know, all representors are really connected to the PF (at least in this
example)


This example tries to say that the e-switch has 3 ports in total, and, 
given your explanation, one may indeed agree that *in this example* 
representors re-use e-switch port of ethdev=0 (with some metadata to 
distinguish packets, etc.). But one can hardly assume that *all* 
representors with any vendor's NIC are connected to the e-switch the 
same way. It's vendor specific. Well, at least, applications don't have 
this knowledge and don't need to.




So matching on ethdev(3)  means matching on traffic sent from DPDK port 3 right?


Item ETHDEV (ethdev_id=3) matches traffic sent by DPDK port 3. Looks 
like we're on the same page here.



And matching on eswitch_port(3) means matching on traffic that goes into VF1,
which is the same traffic as ethdev(3), right?


I didn't catch the thought about "the same traffic". Direction is not 
the same. Item ESWITCH_PORT (ethdev_id=3) matches traffic sent by DPDK 
port 1.


Yes, in this case neither of the ports (1, 3) is truly "external" (they 
both interface the DPDK application), but, the thing is, they're 
"external" *to each other* in the sense that they sit at the opposite 
ends of the wire.




Matching on ethdev(1) means matching on the PF port in the E-Switch but with some
metadata that marks the traffic as coming from DPDK port 1 and not from the VF1
E-Switch port, right?


That's vendor specific. The application doesn't have to know how exactly 
this particular ethdev is connected to the e-switch - whether it re-uses 
the PF's e-switch port or has its own. The e-switch port that connects 
the ethdev with the e-switch is just assumed to exist logically.




While matching on eswitch_port(2) means matching on traffic coming from the VM,
right?


Right.
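
The answers above can be condensed into a small lookup for the example system (ethdev 0 = PF, 1 = representor of VF1, 2 = representor of VF2, 3 = VF1, VM owns VF2). This is a sketch only: all `demo_*` names are hypothetical, the PF entry follows the "typically means the network port" remark, and the ESWITCH_PORT entry for ethdev 1 is inferred by symmetry rather than stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Traffic sources in the example system (illustrative names). */
enum demo_source {
	DEMO_SRC_NETWORK,      /* the wire behind the PF */
	DEMO_SRC_VM,           /* the guest that owns VF2 */
	DEMO_SRC_DPDK_PORT_0,
	DEMO_SRC_DPDK_PORT_1,
	DEMO_SRC_DPDK_PORT_2,
	DEMO_SRC_DPDK_PORT_3,
	DEMO_SRC_UNKNOWN,
};

enum demo_item {
	DEMO_ITEM_ETHDEV,
	DEMO_ITEM_ESWITCH_PORT,
};

/* Which traffic source a "transfer" rule with the given item matches. */
static enum demo_source
demo_matched_source(enum demo_item item, uint16_t ethdev_id)
{
	/* ETHDEV: traffic sent by the given ethdev itself. */
	static const enum demo_source by_ethdev[] = {
		DEMO_SRC_DPDK_PORT_0,
		DEMO_SRC_DPDK_PORT_1,
		DEMO_SRC_DPDK_PORT_2,
		DEMO_SRC_DPDK_PORT_3,
	};
	/* ESWITCH_PORT: traffic from the opposite end of the "wire". */
	static const enum demo_source by_eswitch_port[] = {
		DEMO_SRC_NETWORK,      /* ethdev 0, PF: typical case */
		DEMO_SRC_DPDK_PORT_3,  /* ethdev 1: inferred by symmetry */
		DEMO_SRC_VM,           /* ethdev 2: confirmed in the thread */
		DEMO_SRC_DPDK_PORT_1,  /* ethdev 3: confirmed in the thread */
	};

	if (ethdev_id >= 4)
		return DEMO_SRC_UNKNOWN;

	return item == DEMO_ITEM_ETHDEV ?
		by_ethdev[ethdev_id] : by_eswitch_port[ethdev_id];
}
```

How a representor is physically wired to the e-switch stays out of the table on purpose, since the thread stresses that this is vendor specific.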

  



Signed-off-by: Ivan Malov 
Signed-off-by: Andrew Rybchenko 
---
   app/test-pmd/cmdline_flow.c | 27 +
   doc/guides/prog_guide/rte_flow.rst  | 22 +
   doc/guides/rel_notes/release_21_11.rst  |  2 +-
   doc/guides/testpmd_app_ug/testpmd_funcs.rst |  4 +++
   lib/ethdev/rte_flow.c   |  1 +
   lib/ethdev/rte_flow.h   | 12 -
   6 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-

pmd/cmdline_flow.c

index e05b0d83d2..188d0ee39d 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -308,6 +308,8 @@ enum index {
ITEM_POL_POLICY,
ITEM_ETHDEV,
ITEM_ETHDEV_ID,
+   ITEM_ESWITCH_PORT,
+   ITEM_ESWITCH_PORT_ETHDEV_ID,


Like my comment on the previous patch, I'm not sure the correct
term for ETHDEV is ID; it should be port.


Please see my reply in the previous thread. "ethdev" here is an
"anchor", a "beacon" of sorts which allows either to refer namely to
this ethdev or to the e-switch port associated with it.





/* Validate/create actions. */
ACTIONS,
@@ -1003,6 +1005,7 @@ static const enum index next_item[] = {
ITEM_INTEGRITY,
ITEM_CONNTRACK,
ITEM_ETHDEV,
+   ITEM_ESWITCH_PORT,
END_SET,
ZERO,
   };
@@ -1377,6 +1380,12 @@ static const enum index item_ethdev[] = {
ZERO,
   };

+static const enum index item_eswitch_port[] = {
+   ITEM_

Re: [dpdk-dev] [PATCH v1 01/12] ethdev: add ethdev item to flow API

2021-10-04 Thread Ivan Malov

Hi Andrew, Ori,

On 04/10/2021 13:47, Andrew Rybchenko wrote:

On 10/4/21 3:00 AM, Ivan Malov wrote:

Hi Ori,

On 04/10/2021 00:09, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Subject: Re: [PATCH v1 01/12] ethdev: add ethdev item to flow API

Hi Ori,

On 03/10/2021 14:52, Ori Kam wrote:

Hi Andrew and Ivan,


-Original Message-
From: Andrew Rybchenko 
Sent: Friday, October 1, 2021 4:47 PM
Subject: [PATCH v1 01/12] ethdev: add ethdev item to flow API

From: Ivan Malov 

For use with "transfer" flows. Supposed to match traffic transmitted
by the DPDK application via the specified ethdev, at e-switch level.

Must not be combined with attributes "ingress" / "egress".

Signed-off-by: Ivan Malov 
Signed-off-by: Andrew Rybchenko 
---


[Snip]


    /** Generate flow_action[] entry. */ diff --git
a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
7b1ed7f110..880502098e 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -574,6 +574,15 @@ enum rte_flow_item_type {
     * @see struct rte_flow_item_conntrack.
     */
    RTE_FLOW_ITEM_TYPE_CONNTRACK,
+
+    /**
+ * [META]
+ *
+ * Matches traffic at e-switch going from (sent by) the given
ethdev.
+ *
+ * @see struct rte_flow_item_ethdev
+ */
+    RTE_FLOW_ITEM_TYPE_ETHDEV,
    };

    /**
@@ -1799,6 +1808,24 @@ static const struct rte_flow_item_conntrack
rte_flow_item_conntrack_mask = {  };  #endif

+/**
+ * @warning
+ * @b EXPERIMENTAL: this structure may change without prior notice
+ *
+ * Provides an ethdev ID for use with items which are as follows:
+ * RTE_FLOW_ITEM_TYPE_ETHDEV.
+ */
+struct rte_flow_item_ethdev {
+    uint16_t id; /**< Ethdev ID */


True for all above uses,
should this be renamed to port?


I'd not rename it to "port". The very idea of this series is to
disambiguate
things. This structure is shared between primitives ETHDEV and
ESWITCH_PORT. If we go for "port", then in conjunction with ESWITCH_PORT
the structure name may trick readers into thinking that the ID in
question is
the own ID of the e-switch port itself. But in fact this ID is an
ethdev ID which
is associated with the e-switch port.

Should you wish to elaborate on your concerns with regard to naming,
please
do so. I'm all ears.


Fully understand and agree that the idea is to clear the ambiguity.
My concern is that we don't use ethdev id, from application ethdev has
only
ports, so what is the id? (if we keep this, we should document that
the id is the
port)
What about ETHDEV_PORT and ESWITCH_PORT?


I understand that, technically, the only ports which the application can
really interface with are ethdevs. So, terms "ethdev" and "port" may
appear synonymous to the application - you are right on that. But, given
the fact that we have some primitives like PHY_PORT and the likes, which
also have "PORT" in their names, I'd rather go for "ethdev" as more
precise term.

But let me assure you: I'm not saying that my opinion should prevail.
I'm giving more thoughts to this in the background. Maybe Andrew can
join this conversation as well.


As far as I can see ethdev API uses 'port_id' everywhere to
refer to ethdev port by its number. So, I suggest

struct rte_flow_item_ethdev {
 uint16_t port_id; /**< ethdev port ID */
};

Basically I agree with Ori, that just "id" is a bit confusing
even when it is a member of the _ethdev structure, but I'd
prepend "port_" to the field name to sync with ethdev API which
uses port_id. So, we have ethdev->port_id.


Ack



Andrew.



--
Ivan M


Re: [dpdk-dev] [EXT] [PATCH v1 11/12] net/octeontx2: support ethdev flow action

2021-10-04 Thread Kiran Kumar Kokkilagadda



> -Original Message-
> From: Andrew Rybchenko 
> Sent: Friday, October 1, 2021 7:17 PM
> To: Jerin Jacob Kollanukkaran ; Nithin Kumar Dabilpuram
> ; Kiran Kumar Kokkilagadda
> 
> Cc: dev@dpdk.org; Ori Kam ; Thomas Monjalon
> ; Ferruh Yigit ; Ivan Malov
> 
> Subject: [EXT] [PATCH v1 11/12] net/octeontx2: support ethdev flow action
> 
> External Email
> 
> --
> PORT_ID action implementation works for ingress only and has the same
> semantics as ETHDEV action.

Please update the documentation also.


> 
> Signed-off-by: Andrew Rybchenko 
> ---
>  drivers/net/octeontx2/otx2_flow_parse.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/octeontx2/otx2_flow_parse.c
> b/drivers/net/octeontx2/otx2_flow_parse.c
> index 63a33142a5..5dd8464ec9 100644
> --- a/drivers/net/octeontx2/otx2_flow_parse.c
> +++ b/drivers/net/octeontx2/otx2_flow_parse.c
> @@ -900,7 +900,6 @@ otx2_flow_parse_actions(struct rte_eth_dev *dev,  {
>   struct otx2_eth_dev *hw = dev->data->dev_private;
>   struct otx2_npc_flow_info *npc = &hw->npc_flow;
> - const struct rte_flow_action_port_id *port_act;
>   const struct rte_flow_action_count *act_count;
>   const struct rte_flow_action_mark *act_mark;
>   const struct rte_flow_action_queue *act_q; @@ -987,9 +986,18 @@
> otx2_flow_parse_actions(struct rte_eth_dev *dev,
>   break;
> 
>   case RTE_FLOW_ACTION_TYPE_PORT_ID:
> - port_act = (const struct rte_flow_action_port_id *)
> - actions->conf;
> - port_id = port_act->id;
> + case RTE_FLOW_ACTION_TYPE_ETHDEV:
> + if (actions->type ==
> RTE_FLOW_ACTION_TYPE_PORT_ID) {
> + const struct rte_flow_action_port_id
> *port_act;
> +
> + port_act = actions->conf;
> + port_id = port_act->id;
> + } else {
> + const struct rte_flow_action_ethdev
> *ethdev_act;
> +
> + ethdev_act = actions->conf;
> + port_id = ethdev_act->id;
> + }
>   if (rte_eth_dev_get_name_by_port(port_id, if_name)) {
>   errmsg = "Name not found for output port id";
>   errcode = EINVAL;
> --
> 2.30.2



Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-04 Thread Dmitry Kozlyuk
2021-10-04 10:37 (UTC+), Harman Kalra:
> [...]
> > > +struct rte_intr_handle *rte_intr_handle_instance_index_get(
> > > + struct rte_intr_handle *intr_handle, int  
> > index)
> > 
> > If rte_intr_handle_instance_alloc() returns a pointer to an array, this 
> > function
> > is useless since the user can simply manipulate a pointer.  
> 
>  User won't be able to manipulate the pointer as he is not aware of the size
> of struct rte_intr_handle.
> He will observe "dereferencing pointer to incomplete type" compilation error.

Sorry, my bad.

> > If we want to make a distinction between a single struct rte_intr_handle and
> > a commonly allocated bunch of such (but why?), then they should be
> > represented by distinct types.  
> 
>  Do you mean we should have separate APIs for single allocation and
> batch allocation? The get API will be useful only in the case of batch
> allocation. Currently the interrupt autotests and the ifpga_rawdev driver
> make batch allocations.
> I think a common API for single and batch is fine; the get API is required for
> returning a particular intr_handle instance.
> But one problem I see in the current implementation is that there should be an
> upper limit check for index in the get/set API, which I will fix.

I don't think we need different APIs, I was asking if it was your intention.
Now I understand it and agree with you.

> > > +int rte_intr_handle_instance_index_set(struct rte_intr_handle  
> > *intr_handle,  
> > > +const struct rte_intr_handle *src,
> > > +int index)  
> > 
> > See above regarding the "index" parameter. If it can be removed, a better
> > name for this function would be rte_intr_handle_copy().  
> 
>  I think get API is required.

Maybe index is still not needed: "intr_handle" can just be a pointer to the
right item obtained with rte_intr_handle_instance_index_get(). This way you
also don't need to duplicate the index-checking logic.
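
A minimal sketch of that suggestion, assuming a batch allocation that records its own size: the index is validated once in the getter, and the copy takes the already-resolved item. The `demo_*` names and the struct layout are hypothetical; in a real library the struct definition would sit in a private source file so that callers only see an incomplete type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Private in a real library; callers would only see a declaration. */
struct demo_intr_handle {
	int fd;
	size_t nb_instances; /* set on instance 0 only, 0 elsewhere */
};

/* Allocate 'count' handles as one batch and remember the batch size. */
static struct demo_intr_handle *
demo_intr_handle_instance_alloc(size_t count)
{
	struct demo_intr_handle *h;

	if (count == 0)
		return NULL;

	h = calloc(count, sizeof(*h));
	if (h == NULL)
		return NULL;

	h[0].nb_instances = count;
	for (size_t i = 0; i < count; i++)
		h[i].fd = -1;

	return h;
}

/* The only place where the index is checked; 'base' must be instance 0. */
static struct demo_intr_handle *
demo_intr_handle_instance_index_get(struct demo_intr_handle *base,
				    size_t index)
{
	if (base == NULL || index >= base->nb_instances)
		return NULL;

	return &base[index];
}

/* No index parameter: 'dst' is already the right item. */
static int
demo_intr_handle_copy(struct demo_intr_handle *dst,
		      const struct demo_intr_handle *src)
{
	if (dst == NULL || src == NULL)
		return -1;

	dst->fd = src->fd; /* batch bookkeeping in dst is left untouched */
	return 0;
}
```

A caller would resolve the item with the getter first and pass the result to the copy, so an out-of-range index surfaces as a NULL pointer before any write happens.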


Re: [dpdk-dev] [PATCH v4 6/6] net/iavf: add watchdog for VFLR

2021-10-04 Thread Nicolau, Radu



On 10/4/2021 3:15 AM, Wu, Jingjing wrote:



-Original Message-
From: Nicolau, Radu 
Sent: Friday, October 1, 2021 5:52 PM
To: Wu, Jingjing ; Xing, Beilei 
Cc: dev@dpdk.org; Doherty, Declan ; Sinha, Abhijit
; Zhang, Qi Z ; Richardson, Bruce
; Ananyev, Konstantin 
;
Nicolau, Radu 
Subject: [PATCH v4 6/6] net/iavf: add watchdog for VFLR

Add a watchdog to the iAVF PMD which supports monitoring the VFLR register. If
the device is not already in reset and a VF reset in progress is
detected, then notify the user through a callback and enter the reset state.
If the device is already in reset then poll for completion of reset.

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
  drivers/net/iavf/iavf.h|  6 +++
  drivers/net/iavf/iavf_ethdev.c | 97 ++
  2 files changed, 103 insertions(+)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index d5f574b4b3..4481d2e134 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -212,6 +212,12 @@ struct iavf_info {
int cmd_retval; /* return value of the cmd response from PF */
uint8_t *aq_resp; /* buffer to store the adminq response from PF */

+   struct {
+   uint8_t enabled:1;
+   uint64_t period_us;
+   } watchdog;
+   /** iAVF watchdog configuration */
+
/* Event from pf */
bool dev_closed;
bool link_up;
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index aad6a28585..d02aa9c1c5 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
+#include 

  #include "iavf.h"
  #include "iavf_rxtx.h"
@@ -239,6 +240,94 @@ iavf_tm_ops_get(struct rte_eth_dev *dev __rte_unused,
return 0;
  }

+
+static int
+iavf_vfr_inprogress(struct iavf_hw *hw)
+{
+   int inprogress = 0;
+
+   if ((IAVF_READ_REG(hw, IAVF_VFGEN_RSTAT) &
+   IAVF_VFGEN_RSTAT_VFR_STATE_MASK) ==
+   VIRTCHNL_VFR_INPROGRESS)
+   inprogress = 1;
+
+   if (inprogress)
+   PMD_DRV_LOG(INFO, "Watchdog detected VFR in progress");
+
+   return inprogress;
+}
+
+static void
+iavf_dev_watchdog(void *cb_arg)
+{
+   struct iavf_adapter *adapter = cb_arg;
+   struct iavf_hw *hw = IAVF_DEV_PRIVATE_TO_HW(adapter);
+   int vfr_inprogress = 0, rc = 0;
+
+   /* check if watchdog has been disabled since last call */
+   if (!adapter->vf.watchdog.enabled)
+   return;
+
+   /* If in reset then poll vfr_inprogress register for completion */
+   if (adapter->vf.vf_reset) {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (!vfr_inprogress) {
+   PMD_DRV_LOG(INFO, "VF \"%s\" reset has completed",
+   adapter->eth_dev->data->name);
+   adapter->vf.vf_reset = false;
+   }
+   /* If not in reset then poll vfr_inprogress register for VFLR event */
+   } else {
+   vfr_inprogress = iavf_vfr_inprogress(hw);
+
+   if (vfr_inprogress) {
+   PMD_DRV_LOG(INFO,
+   "VF \"%s\" reset event has been detected by watchdog",
+   adapter->eth_dev->data->name);
+
+   /* enter reset state with VFLR event */
+   adapter->vf.vf_reset = true;
+
+   rte_eth_dev_callback_process(adapter->eth_dev,
+   RTE_ETH_EVENT_INTR_RESET, NULL);
+   }
+   }
+
+   /* re-alarm watchdog */
+   rc = rte_eal_alarm_set(adapter->vf.watchdog.period_us,
+   &iavf_dev_watchdog, cb_arg);
+
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed \"%s\" to reset device watchdog alarm",
+   adapter->eth_dev->data->name);
+}
+
+static void
+iavf_dev_watchdog_enable(struct iavf_adapter *adapter, uint64_t period_us)
+{
+   int rc;
+
+   PMD_DRV_LOG(INFO, "Enabling device watchdog");
+
+   adapter->vf.watchdog.enabled = 1;
+   adapter->vf.watchdog.period_us = period_us;
+
+   rc = rte_eal_alarm_set(adapter->vf.watchdog.period_us,
+   &iavf_dev_watchdog, (void *)adapter);
+   if (rc)
+   PMD_DRV_LOG(ERR, "Failed to enabled device watchdog");
+}
+
+static void
+iavf_dev_watchdog_disable(struct iavf_adapter *adapter)
+{
+   PMD_DRV_LOG(INFO, "Disabling device watchdog");
+
+   adapter->vf.watchdog.enabled = 0;
+   adapter->vf.watchdog.period_us = 0;
+}
+
  static int
  iavf_set_mc_addr_list(struct rte_eth_dev *dev,
struct rte_ether_addr *mc_addrs,
@@ -2448,6 +2537,11 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)

iavf_default_rss_disable(adapter);

+
+   /* Start device watchdog, set polling period to 500us */
+   iavf_dev_watchdog_enable(adapter, 500);
+
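
The state handling in iavf_dev_watchdog() above can be isolated from the hardware and alarm plumbing. The sketch below models one watchdog tick under stated assumptions: `demo_vf_state` and `demo_watchdog_tick()` are hypothetical names, the boolean argument stands in for the value read from the reset status register, and a counter replaces the user callback.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_vf_state {
	bool vf_reset;         /* device is currently considered in reset */
	unsigned int notified; /* reset events raised to the user so far */
};

/*
 * One watchdog tick. 'vfr_inprogress' stands in for the value read
 * from the hardware reset status register.
 */
static void
demo_watchdog_tick(struct demo_vf_state *st, bool vfr_inprogress)
{
	if (st->vf_reset) {
		/* Already in reset: wait for hardware to report completion. */
		if (!vfr_inprogress)
			st->vf_reset = false;
	} else if (vfr_inprogress) {
		/* New reset detected: enter reset state and notify once. */
		st->vf_reset = true;
		st->notified++;
	}
}
```

Feeding the sequence idle, in progress, in progress, idle through this function raises exactly one notification, which matches the description in the commit message.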


Re: [dpdk-dev] [PATCH v3 4/7] ethdev: make burst functions to use new flat array

2021-10-04 Thread Ananyev, Konstantin


> 
> On 10/4/2021 10:20 AM, Ananyev, Konstantin wrote:
> >
> 
> >  static inline int
> >  rte_eth_rx_queue_count(uint16_t port_id, uint16_t queue_id)
> >  {
> > -   struct rte_eth_dev *dev;
> > +   struct rte_eth_fp_ops *p;
> > +   void *qd;
> > +
> > +   if (port_id >= RTE_MAX_ETHPORTS ||
> > +   queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> > +   RTE_ETHDEV_LOG(ERR,
> > +   "Invalid port_id=%u or queue_id=%u\n",
> > +   port_id, queue_id);
> > +   return -EINVAL;
> > +   }
> 
>  Should the checks be wrapped with '#ifdef RTE_ETHDEV_DEBUG_RX' like others?
> >>>
> >>> Original rte_eth_rx_queue_count() always have similar checks enabled,
> >>> that's why I also kept them 'always on'.
> >>>
> 
>  <...>
> 
> > +++ b/lib/ethdev/version.map
> > @@ -247,11 +247,16 @@ EXPERIMENTAL {
> > rte_mtr_meter_policy_delete;
> > rte_mtr_meter_policy_update;
> > rte_mtr_meter_policy_validate;
> > +
> > +   # added in 21.05
> 
>  s/21.05/21.11/
> 
> > +   __rte_eth_rx_epilog;
> > +   __rte_eth_tx_prolog;
> 
>  These are directly called by the application and must be part of the ABI, but
>  are marked as 'internal' and have the '__rte' prefix to highlight it; this may
>  be confusing.
>  What about making them proper, non-internal, API?
> >>>
> >>> Hmm not sure what do you suggest here.
> >>> We don't want users to call them explicitly.
> >>> They are sort of helpers for rte_eth_rx_burst/rte_eth_tx_burst.
> >>> So I did what I thought is our usual policy for such semi-internal things:
> >>> have '@internal' in comments, but in version.map put them under the
> >>> EXPERIMENTAL/global section.
> >>>
> >>> What do you think it should be instead?
> >>>
> >>
> >> Make them public API. (Basically just remove '__' prefix and @internal 
> >> comment).
> >>
> >> This way application can use them to run custom callback(s) (not only the
> >> registered ones), not sure if this can be dangerous though.
> >
> > Hmm, as I said above, I don't want users to call them explicitly.
> > Do you have any good reason to allow it?
> >
> 
> Just to get rid of this state where internal APIs are exposed to the application.
> 
> >>
> >> We need to trace the ABI for these functions, making them public clarifies 
> >> it.
> >
> > We do have plenty of semi-internal functions right now,
> > why adding that one will be a problem?
> 
> As far as I remember existing ones are 'static inline' functions, and we don't
> have an ABI concern with them. But these are actual functions called by 
> application.

Not always.
As an example of internal but not static ones:
rte_mempool_check_cookies
rte_mempool_contig_blocks_check_cookies
rte_mempool_op_calc_mem_size_helper
_rte_pktmbuf_read

> 
> > On the other hand, if we declare it public, we will have an obligation to
> > support it in future releases, plus it might encourage users to use it on
> > its own.
> > To me that sounds like extra headache without any gain in return.
> >
> 
> If having those two as public API doesn't make sense, I agree with you.
> 
> >> Also comment can be updated to describe intended usage instead of marking 
> >> them
> >> internal, and applications can use these anyway if we mark them internal 
> >> or not.
> >



Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add an API to get device configuration info

2021-10-04 Thread Ferruh Yigit
On 9/27/2021 8:56 AM, Thomas Monjalon wrote:
> 27/09/2021 09:21, Wang, Jie1X:
>> From: Thomas Monjalon 
>>> 26/09/2021 11:20, Jie Wang:
 This patch adds a new API "rte_eth_dev_conf_info_get()" to help users
 get device configuration info.
>>> [...]
 + * Retrieve the configuration of an Ethernet device.
 + *
 + * @param port_id
 + *   The port identifier of the Ethernet device.
 + * @param dev_conf_info
 + *   A pointer to a structure of type *rte_eth_conf* to be filled with
 + *   the configuration of the Ethernet device.
 + *   And the memory of the structure should be allocated by the caller.
 + * @return
 + *   - (0) if successful.
 + *   - (-ENODEV) if *port_id* invalid.
 + *   - (-EINVAL) if bad parameter.
 + */
 +__rte_experimental
 +int rte_eth_dev_conf_info_get(uint16_t port_id,
 +  struct rte_eth_conf *dev_conf_info);
>>>
>>> It does not make sense to me.
>>> rte_eth_conf is passed by the app to rte_eth_dev_configure.
>>> Why the app would need to get the same info back?
>>>
>>>
>>
>> In rte_eth_dev_configure, dev->data->dev_conf copies the info from
>> port->dev_conf, and then the driver updates it. It is no longer the same as
>> port->dev_conf.
>> We need to get the updated device configuration.
> 
> OK I see.
> Please update the commit log to explain this.
> 

Also, either an application needs to keep a copy of the configuration (like testpmd
does), or it won't have any way to know device configuration details.
And for the apps that keep the configuration, there is a risk that the application
copy and the device copy of the configuration diverge, as Jie mentioned.

I think it makes sense to have a way to get the configuration from device, small
applications can rely on it without keeping a copy of a config at all.

And for testpmd, we have aligned with Xiaoyun to rely on the device
configuration more, in a way:
- When to display a config, use device copy as much as possible
- Use app copy of config to accumulate user config change requests to apply them
later, sync app config with device config after config applied.
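
The divergence between the application copy and the device copy can be shown with a small mock. This is only a sketch under stated assumptions: `demo_conf`, `demo_dev_configure()` and `demo_dev_conf_get()` are hypothetical, the offload mask is invented, and the return codes simply mirror the ones documented in the proposed API comment (-ENODEV, -EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4
/* Offload bits the imaginary driver supports (invented for the sketch). */
#define DEMO_SUPPORTED_RX_OFFLOADS UINT64_C(0x3)

struct demo_conf {
	uint32_t mtu;
	uint64_t rx_offloads;
};

struct demo_dev {
	int attached;
	struct demo_conf conf; /* device-side copy, owned by the driver */
};

/* Only port 0 exists in this mock. */
static struct demo_dev demo_devices[DEMO_MAX_PORTS] = {
	[0] = { .attached = 1 },
};

/* Store the request, then let the "driver" adjust its own copy. */
static int
demo_dev_configure(uint16_t port_id, const struct demo_conf *req)
{
	if (port_id >= DEMO_MAX_PORTS || !demo_devices[port_id].attached)
		return -ENODEV;
	if (req == NULL)
		return -EINVAL;

	demo_devices[port_id].conf = *req;
	/* Assumed driver behaviour: silently drop unsupported offloads. */
	demo_devices[port_id].conf.rx_offloads &= DEMO_SUPPORTED_RX_OFFLOADS;
	return 0;
}

/* Copy the device-side configuration into caller-allocated memory. */
static int
demo_dev_conf_get(uint16_t port_id, struct demo_conf *out)
{
	if (port_id >= DEMO_MAX_PORTS || !demo_devices[port_id].attached)
		return -ENODEV;
	if (out == NULL)
		return -EINVAL;

	*out = demo_devices[port_id].conf;
	return 0;
}
```

An application that requests offloads 0x7 here reads back 0x3 from the device, so displaying the device copy (as proposed for testpmd) shows what is actually in effect.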



Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add an API to get device configuration info

2021-10-04 Thread Ferruh Yigit
On 9/26/2021 10:20 AM, Jie Wang wrote:
> This patch adds a new API "rte_eth_dev_conf_info_get()" to help users get
> device configuration info.
> 
> Cc: sta...@dpdk.org
> 

Since this is a new API, I think we can request it to be backported.

> Signed-off-by: Jie Wang 

<...>

> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>   rte_mtr_meter_policy_delete;
>   rte_mtr_meter_policy_update;
>   rte_mtr_meter_policy_validate;
> +
> + # added in 21.11
> + rte_eth_dev_conf_info_get;

Not sure about the 'info' part in the API, what about 'rte_eth_dev_conf_get()'?


Re: [dpdk-dev] [PATCH v9 2/3] doc: update release notes for new API

2021-10-04 Thread Ferruh Yigit
On 9/26/2021 10:20 AM, Jie Wang wrote:
> Add information about new ethdev API.
> 
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jie Wang 
> ---
>  doc/guides/rel_notes/release_21_11.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst 
> b/doc/guides/rel_notes/release_21_11.rst
> index dcff939ae8..95e569f51c 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -111,6 +111,10 @@ New Features
>Added command-line options to specify total number of processes and
>current process ID. Each process owns subset of Rx and Tx queues.
>  
> +* **Added support for users get device configuration.**
> +  Added an API which can help users get device configuration.
> +  The declarations for the API's can be found in ``rte_ethdev.h``.
> +
>  

No need to have a separate patch for release notes update, can you please merge
this one with 1/3 patch?


Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add an API to get device configuration info

2021-10-04 Thread Thomas Monjalon
04/10/2021 13:20, Ferruh Yigit:
> On 9/27/2021 8:56 AM, Thomas Monjalon wrote:
> > 27/09/2021 09:21, Wang, Jie1X:
> >> From: Thomas Monjalon 
> >>> 26/09/2021 11:20, Jie Wang:
>  This patch adds a new API "rte_eth_dev_conf_info_get()" to help users
>  get device configuration info.
> >>> [...]
>  + * Retrieve the configuration of an Ethernet device.
>  + *
>  + * @param port_id
>  + *   The port identifier of the Ethernet device.
>  + * @param dev_conf_info
>  + *   A pointer to a structure of type *rte_eth_conf* to be filled with
>  + *   the configuration of the Ethernet device.
>  + *   And the memory of the structure should be allocated by the caller.
>  + * @return
>  + *   - (0) if successful.
>  + *   - (-ENODEV) if *port_id* invalid.
>  + *   - (-EINVAL) if bad parameter.
>  + */
>  +__rte_experimental
>  +int rte_eth_dev_conf_info_get(uint16_t port_id,
>  +struct rte_eth_conf *dev_conf_info);
> >>>
> >>> It does not make sense to me.
> >>> rte_eth_conf is passed by the app to rte_eth_dev_configure.
> >>> Why the app would need to get the same info back?
> >>>
> >>>
> >>
> >> In rte_eth_dev_configure, dev->data->dev_conf copies the info from
> >> port->dev_conf, and then the driver updates it. It is no longer the same as
> >> port->dev_conf.
> >> We need to get the updated device configuration.
> > 
> > OK I see.
> > Please update the commit log to explain this.
> > 
> 
> Also, either an application needs to keep a copy of the configuration (like
> testpmd does), or it won't have any way to know device configuration details.
> And for the apps that keep the configuration, there is a risk that the
> application copy and the device copy of the configuration diverge, as Jie
> mentioned.
> 
> I think it makes sense to have a way to get the configuration from device, 
> small
> applications can rely on it without keeping a copy of a config at all.
> 
> And for testpmd, we have aligned with Xiaoyun to rely on the device
> configuration more, in a way:
> - When to display a config, use device copy as much as possible
> - Use app copy of config to accumulate user config change requests to apply 
> them
> later, sync app config with device config after config applied.

Makes sense, thanks.





Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add an API to get device configuration info

2021-10-04 Thread Thomas Monjalon
04/10/2021 13:22, Ferruh Yigit:
> On 9/26/2021 10:20 AM, Jie Wang wrote:
> > This patch adds a new API "rte_eth_dev_conf_info_get()" to help users get
> > device configuration info.
> > 
> > Cc: sta...@dpdk.org
> > 
> 
> Since this is a new API, I think we can request it to be backported.

We cannot.

> > Signed-off-by: Jie Wang 
> 
> <...>
> 
> > @@ -247,6 +247,9 @@ EXPERIMENTAL {
> > rte_mtr_meter_policy_delete;
> > rte_mtr_meter_policy_update;
> > rte_mtr_meter_policy_validate;
> > +
> > +   # added in 21.11
> > +   rte_eth_dev_conf_info_get;
> 
> Not sure about the 'info' part in the API, what about 
> 'rte_eth_dev_conf_get()'?

+1




Re: [dpdk-dev] [PATCH v9 2/3] doc: update release notes for new API

2021-10-04 Thread Thomas Monjalon
04/10/2021 13:22, Ferruh Yigit:
> On 9/26/2021 10:20 AM, Jie Wang wrote:
> > Add information about new ethdev API.
> > 
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Jie Wang 
> > ---
> >  doc/guides/rel_notes/release_21_11.rst | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/release_21_11.rst 
> > b/doc/guides/rel_notes/release_21_11.rst
> > index dcff939ae8..95e569f51c 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -111,6 +111,10 @@ New Features
> >Added command-line options to specify total number of processes and
> >current process ID. Each process owns subset of Rx and Tx queues.
> >  
> > +* **Added support for users get device configuration.**
> > +  Added an API which can help users get device configuration.
> > +  The declarations for the API's can be found in ``rte_ethdev.h``.
> > +
> >  
> 
> No need to have a separate patch for release notes update, can you please 
> merge
> this one with 1/3 patch?

*not* have a separate patch




Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>>> From: Pavan Nikhilesh 
>>>
>>> Due to Linux kernel AF(Admin function) driver dependency, only
>> enable
>>> build for 64-bit Linux.
>>>
>>
>> Hi Pavan,
>>
>> Isn't it possible to provide a commit log in the kernel side etc, that let
>> others to verify why only 64 bit is required, or if someone want to
>> support
>> 32bit that may help them to investigate the source of the restriction.
> 
> Arch 32 support is not implemented on ThunderX, so 32bit will not run.
> 

I see, is the following correct:
All thunderx, octeontx & octeontx2 drivers only support VF in DPDK, and PF is
supported by the Linux kernel driver. And the Linux kernel driver doesn't
support arch32.

Has something changed on the kernel driver side to drop the 32bit support?
If it was never supported at all, what is the motivation to disable the DPDK
drivers now?

>>
>>> Signed-off-by: Pavan Nikhilesh 
>>> Acked-by: Jerin Jacob 
>>> ---
>>>  v5 Changes
>>>  - s/fuction/function.
>>>
>>>  v4 Changes:
>>>  - Update commit message regarding dependency on AF driver.
>>>
>>>  drivers/net/thunderx/meson.build | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/thunderx/meson.build
>> b/drivers/net/thunderx/meson.build
>>> index 4bbcea7f93..da665bd76f 100644
>>> --- a/drivers/net/thunderx/meson.build
>>> +++ b/drivers/net/thunderx/meson.build
>>> @@ -1,9 +1,9 @@
>>>  # SPDX-License-Identifier: BSD-3-Clause
>>>  # Copyright(c) 2017 Cavium, Inc
>>>
>>> -if is_windows
>>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>>  build = false
>>> -reason = 'not supported on Windows'
>>> +reason = 'only supported on 64-bit Linux'
>>>  subdir_done()
>>>  endif
>>>
>>> --
>>> 2.17.1
>>>
> 



Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Pavan Nikhilesh Bhagavatula
>On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>>>> From: Pavan Nikhilesh
>>>>
>>>> Due to Linux kernel AF(Admin function) driver dependency, only
>>> enable
>>>> build for 64-bit Linux.
>>>>
>>>
>>> Hi Pavan,
>>>
>>> Isn't it possible to provide a commit log in the kernel side etc, that let
>>> others to verify why only 64 bit is required, or if someone want to
>>> support
>>> 32bit that may help them to investigate the source of the restriction.
>>
>> Arch 32 support is not implemented on ThunderX, so 32bit will not
>run.
>>
>
>I see, is following correct:
>All thunderx, octeonx & octeontx2 only supports VF in the DPDK, and PF
>is
>supported by Linux kernel driver. And Linux kernel driver doesn't
>support arch32.

AF != PF, AF is something that manages all the shared resources between PF/VF.

>
>Is something changed in kernel driver side to drop the 32bit support?
>If it was not supported at all, what is the motivation to disable the DPDK
>drivers now?
>

It was never supported to begin with; the motivation is that the build will
fail if we try to compile for 32b.

>>>
>>>> Signed-off-by: Pavan Nikhilesh
>>>> Acked-by: Jerin Jacob
>>>> ---
>>>>  v5 Changes
>>>>  - s/fuction/function.
>>>>
>>>>  v4 Changes:
>>>>  - Update commit message regarding dependency on AF driver.
>>>>
>>>>  drivers/net/thunderx/meson.build | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/thunderx/meson.build
>>> b/drivers/net/thunderx/meson.build
>>>> index 4bbcea7f93..da665bd76f 100644
>>>> --- a/drivers/net/thunderx/meson.build
>>>> +++ b/drivers/net/thunderx/meson.build
>>>> @@ -1,9 +1,9 @@
>>>>  # SPDX-License-Identifier: BSD-3-Clause
>>>>  # Copyright(c) 2017 Cavium, Inc
>>>>
>>>> -if is_windows
>>>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>>>  build = false
>>>> -reason = 'not supported on Windows'
>>>> +reason = 'only supported on 64-bit Linux'
>>>>  subdir_done()
>>>>  endif
>>>>
>>>> --
>>>> 2.17.1
>>>>
>>



Re: [dpdk-dev] [PATCH v9 1/3] ethdev: add an API to get device configuration info

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 12:26 PM, Thomas Monjalon wrote:
> 04/10/2021 13:22, Ferruh Yigit:
>> On 9/26/2021 10:20 AM, Jie Wang wrote:
>>> This patch adds a new API "rte_eth_dev_conf_info_get()" to help users get
>>> device configuration info.
>>>
>>> Cc: sta...@dpdk.org
>>>
>>
>> Since this is a new API, I think we can request it to be backported.
> 
> We cannot.
> 

Of course, it was a typo on my end; I meant "we can NOT request ..."

>>> Signed-off-by: Jie Wang 
>>
>> <...>
>>
>>> @@ -247,6 +247,9 @@ EXPERIMENTAL {
>>> rte_mtr_meter_policy_delete;
>>> rte_mtr_meter_policy_update;
>>> rte_mtr_meter_policy_validate;
>>> +
>>> +   # added in 21.11
>>> +   rte_eth_dev_conf_info_get;
>>
>> Not sure about the 'info' part in the API, what about 
>> 'rte_eth_dev_conf_get()'?
> 
> +1
> 
> 



Re: [dpdk-dev] [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta data

2021-10-04 Thread Ivan Malov

Hi Ori,

On 04/10/2021 09:56, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Monday, October 4, 2021 2:50 AM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta
data

Hi Ori,

On 04/10/2021 00:04, Ori Kam wrote:

Hi Ivan,

Sorry for the long review.


-Original Message-
From: Ivan Malov 
Sent: Sunday, October 3, 2021 8:30 PM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery of
Rx meta data

Hi Ori,

On 03/10/2021 14:01, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Sunday, October 3, 2021 12:30 PM

Hi Ori,

Thanks for reviewing this.



No problem.


On 03/10/2021 10:42, Ori Kam wrote:

Hi Andrew and Ivan,



-Original Message-
From: Andrew Rybchenko 
Sent: Friday, October 1, 2021 9:50 AM
Subject: Re: [PATCH v3 1/5] ethdev: add API to negotiate delivery
of Rx meta data

On 9/30/21 10:07 PM, Ivan Malov wrote:

Hi Ori,

On 30/09/2021 17:59, Ori Kam wrote:

Hi Ivan,


[Snip]


Good point so why not use the same logic as the metadata and register it?
Since in any case, this is something in the mbuf so maybe this should be the
answer?

I didn't catch your thought. Could you please elaborate on it?


The metadata action just like the mark or flag is used to give
application data that was set by a flow rule.
To enable the metadata the application must register the metadata field.
Since this happens during the creation of the mbuf it means that it
must be created before the device start.

I understand that the mark and flag don't need to be registered in the
mbuf since they have reserved space, but from the application's point of view
there is no difference between the metadata and mark, so why doesn't the
negotiate function handle the metadata?

I hope this is clearer.


Thank you. That's a lot clearer.

I inspected struct rte_flow_action_set_meta as well as
rte_flow_dynf_metadata_register(). The latter API doesn't require that
applications invoke it precisely before adapter start. It says "must be called
prior to use SET_META action", and the comment before the structure says
just "in advance". So, at a bare minimum, the API contract could've been
made more strict with this respect. However, far more important points are
as follows:



Agree, that doc should be updated, but by definition this must be set before
mbuf creation, which means before device start.


1) This API enables delivery of this "custom" metadata between the PMD
and the application, whilst the API under review, as I noted before,
negotiates delivery of various kinds of metadata between the NIC and the
PMD. These are two completely different (albeit adjacent) stages of packet
delivery process.


They are exactly alike. In the metadata case too, the registration does two
things: it saves a place for the info in the mbuf and tells the PMD that it
should configure the NIC to supply this information upon request.


Looking at rte_flow_dynf_metadata_register() implementation, it doesn't 
seem to notify the PMD of the new field directly. Yes, the PMD will 
finally know, but at that point it won't be able to reject the field. 
It's one-sided communication in fact.



Even in your PMD, assuming that it can support the metadata, you will need to
configure it; otherwise, when the application requests this data using a rule,
you will be in the same spot you are now with the mark.


Right, but as I said, the primary concern is to configure delivery of 
metadata from the NIC HW to the PMD. It's not about mbuf dynfields.





2) This API doesn't negotiate anything with the PMD. It doesn't interact with
the PMD at all. It just reserves extra room in mbufs for the metadata field
and exits.

3) As a consequence of (2), the PMD can't immediately learn about this field
being enabled. It's forced to face this fact at some deferred point. If the
PMD, for instance, learns about that during adapter start and if it for some
reason decides to deny the use of this field, it won't be able to convey its
decision to the application. As a result, the application will live in the wrong
assumption that it has successfully enabled the feature.

4) Even if we add similar APIs to "register" more kinds of metadata (flag,
mark, tunnel ID, etc) and re-define the meaning of all these APIs to say that
not only they enable delivery of the metadata between the PMD and the
application but also enable the HW transport to get the metadata delivered
from the NIC to the PMD itself, we won't be able to use this set of APIs to
actually *negotiate* something. The order of invocations will be confusing to
the application. If the PMD can't combine some of these features, it won't be
able to communicate this clearly to the application. It will have to silently
disregard some of the "registered" features. And this is something that we
probably want to avoid. Right?

But I tend to agree that the API under review could have one more (4th) flag
to negotiate delivery of this "custom" met

Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 12:34 PM, Pavan Nikhilesh Bhagavatula wrote:
>> On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>>>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>>>>> From: Pavan Nikhilesh
>>>>>
>>>>> Due to Linux kernel AF(Admin function) driver dependency, only
>>>> enable
>>>>> build for 64-bit Linux.
>>>>>
>>>>
>>>> Hi Pavan,
>>>>
>>>> Isn't it possible to provide a commit log in the kernel side etc, that let
>>>> others to verify why only 64 bit is required, or if someone want to
>>>> support
>>>> 32bit that may help them to investigate the source of the restriction.
>>>
>>> Arch 32 support is not implemented on ThunderX, so 32bit will not
>> run.
>>>
>>
>> I see, is following correct:
>> All thunderx, octeonx & octeontx2 only supports VF in the DPDK, and PF
>> is
>> supported by Linux kernel driver. And Linux kernel driver doesn't
>> support arch32.
> 
> AF != PF, AF is something that manages all the shared resources between PF/VF.
> 

I see, I thought AF was part of the PF functionality. Are there two different
kernel modules for PF and AF?

So can the DPDK driver drive the PF? That is, PF by DPDK, VF by DPDK, AF by the
Linux kernel driver.

>>
>> Is something changed in kernel driver side to drop the 32bit support?
>> If it was not supported at all, what is the motivation to disable the DPDK
>> drivers now?
>>
> 
> It was never supported to begin with, motivation is that build will fail if 
> we try to 
> compile with 32b.
> 

If there is no plan to support 32bit on the kernel side, it is reasonable to
disable the 32bit build; please provide the above details in the commit log.

That said, how much would it cost to maintain 32bit support? If the build
error is in a logging format such as "%lx" (as most of our 32bit build errors
are), it is better to fix it using 'PRIx64', which is the more proper way
anyway. If there is a more fundamental issue with 32bit pointers, I agree with
you to disable it.
Can you please provide the build error in the commit log as a record?


>>>>> Signed-off-by: Pavan Nikhilesh
>>>>> Acked-by: Jerin Jacob
>>>>> ---
>>>>>  v5 Changes
>>>>>  - s/fuction/function.
>>>>>
>>>>>  v4 Changes:
>>>>>  - Update commit message regarding dependency on AF driver.
>>>>>
>>>>>  drivers/net/thunderx/meson.build | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/thunderx/meson.build
>>>> b/drivers/net/thunderx/meson.build
>>>>> index 4bbcea7f93..da665bd76f 100644
>>>>> --- a/drivers/net/thunderx/meson.build
>>>>> +++ b/drivers/net/thunderx/meson.build
>>>>> @@ -1,9 +1,9 @@
>>>>>  # SPDX-License-Identifier: BSD-3-Clause
>>>>>  # Copyright(c) 2017 Cavium, Inc
>>>>>
>>>>> -if is_windows
>>>>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>>>>  build = false
>>>>> -reason = 'not supported on Windows'
>>>>> +reason = 'only supported on 64-bit Linux'
>>>>>  subdir_done()
>>>>>  endif
>>>>>
>>>>> --
>>>>> 2.17.1
>>>>>
>>>
> 



Re: [dpdk-dev] [PATCH v3 4/6] ethdev: remove jumbo offload flag

2021-10-04 Thread Somnath Kotur
On Mon, Oct 4, 2021 at 10:42 AM Somnath Kotur
 wrote:
>
> On Fri, Oct 1, 2021 at 8:07 PM Ferruh Yigit  wrote:
> >
> > Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag.
> >
> > Instead of drivers announce this capability, application can deduct the
> > capability by checking reported 'dev_info.max_mtu' or
> > 'dev_info.max_rx_pktlen'.
> >
> > And instead of application explicitly set this flag to enable jumbo
> application setting this flag explicitly sounds better?
> > frames, this can be deducted by driver by comparing requested 'mtu' to
> typo, think you meant 'deduced' ? :)
>
> > 'RTE_ETHER_MTU'.
> >
> > Removing this additional configuration for simplification.
> >
> > Suggested-by: Konstantin Ananyev 
> > Signed-off-by: Ferruh Yigit 
> > Acked-by: Andrew Rybchenko 
> > Reviewed-by: Rosen Xu 
> > ---
> >  app/test-eventdev/test_pipeline_common.c  |  2 -
> >  app/test-pmd/cmdline.c|  2 +-
> >  app/test-pmd/config.c | 25 +-
> >  app/test-pmd/testpmd.c| 48 +--
> >  app/test-pmd/testpmd.h|  2 +-
> >  doc/guides/howto/debug_troubleshoot.rst   |  2 -
> >  doc/guides/nics/bnxt.rst  |  1 -
> >  doc/guides/nics/features.rst  |  3 +-
> >  drivers/net/atlantic/atl_ethdev.c |  1 -
> >  drivers/net/axgbe/axgbe_ethdev.c  |  1 -
> >  drivers/net/bnx2x/bnx2x_ethdev.c  |  1 -
> >  drivers/net/bnxt/bnxt.h   |  1 -
> >  drivers/net/bnxt/bnxt_ethdev.c| 10 +---
> >  drivers/net/bonding/rte_eth_bond_pmd.c|  8 
> >  drivers/net/cnxk/cnxk_ethdev.h|  5 +-
> >  drivers/net/cnxk/cnxk_ethdev_ops.c|  1 -
> >  drivers/net/cxgbe/cxgbe.h |  1 -
> >  drivers/net/cxgbe/cxgbe_ethdev.c  |  8 
> >  drivers/net/cxgbe/sge.c   |  5 +-
> >  drivers/net/dpaa/dpaa_ethdev.c|  2 -
> >  drivers/net/dpaa2/dpaa2_ethdev.c  |  2 -
> >  drivers/net/e1000/e1000_ethdev.h  |  4 +-
> >  drivers/net/e1000/em_ethdev.c |  4 +-
> >  drivers/net/e1000/em_rxtx.c   | 19 +++-
> >  drivers/net/e1000/igb_rxtx.c  |  3 +-
> >  drivers/net/ena/ena_ethdev.c  |  1 -
> >  drivers/net/enetc/enetc_ethdev.c  |  3 +-
> >  drivers/net/enic/enic_res.c   |  1 -
> >  drivers/net/failsafe/failsafe_ops.c   |  2 -
> >  drivers/net/fm10k/fm10k_ethdev.c  |  1 -
> >  drivers/net/hinic/hinic_pmd_ethdev.c  |  1 -
> >  drivers/net/hns3/hns3_ethdev.c|  1 -
> >  drivers/net/hns3/hns3_ethdev_vf.c |  1 -
> >  drivers/net/i40e/i40e_ethdev.c|  1 -
> >  drivers/net/i40e/i40e_rxtx.c  |  2 +-
> >  drivers/net/iavf/iavf_ethdev.c|  3 +-
> >  drivers/net/ice/ice_dcf_ethdev.c  |  3 +-
> >  drivers/net/ice/ice_dcf_vf_representor.c  |  1 -
> >  drivers/net/ice/ice_ethdev.c  |  1 -
> >  drivers/net/ice/ice_rxtx.c|  3 +-
> >  drivers/net/igc/igc_ethdev.h  |  1 -
> >  drivers/net/igc/igc_txrx.c|  2 +-
> >  drivers/net/ionic/ionic_ethdev.c  |  1 -
> >  drivers/net/ipn3ke/ipn3ke_representor.c   |  3 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c  |  5 +-
> >  drivers/net/ixgbe/ixgbe_pf.c  |  9 +---
> >  drivers/net/ixgbe/ixgbe_rxtx.c|  3 +-
> >  drivers/net/mlx4/mlx4_rxq.c   |  1 -
> >  drivers/net/mlx5/mlx5_rxq.c   |  1 -
> >  drivers/net/mvneta/mvneta_ethdev.h|  3 +-
> >  drivers/net/mvpp2/mrvl_ethdev.c   |  1 -
> >  drivers/net/nfp/nfp_common.c  |  6 +--
> >  drivers/net/octeontx/octeontx_ethdev.h|  1 -
> >  drivers/net/octeontx2/otx2_ethdev.h   |  1 -
> >  drivers/net/octeontx_ep/otx_ep_ethdev.c   |  3 +-
> >  drivers/net/octeontx_ep/otx_ep_rxtx.c |  6 ---
> >  drivers/net/qede/qede_ethdev.c|  1 -
> >  drivers/net/sfc/sfc_rx.c  |  2 -
> >  drivers/net/thunderx/nicvf_ethdev.h   |  1 -
> >  drivers/net/txgbe/txgbe_rxtx.c|  1 -
> >  drivers/net/virtio/virtio_ethdev.c|  1 -
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c  |  1 -
> >  examples/ip_fragmentation/main.c  |  3 +-
> >  examples/ip_reassembly/main.c |  3 +-
> >  examples/ipsec-secgw/ipsec-secgw.c|  2 -
> >  examples/ipv4_multicast/main.c|  1 -
> >  examples/kni/main.c   |  5 --
> >  examples/l3fwd-acl/main.c |  4 +-
> >  examples/l3fwd-graph/main.c   |  4 +-
> >  examples/l3fwd-power/main.c   |  4 +-
> >  examples/l3fwd/main.c

Re: [dpdk-dev] [PATCH v1 02/12] ethdev: add eswitch port item to flow API

2021-10-04 Thread Ivan Malov

Hi Ori,

On 04/10/2021 14:37, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Monday, October 4, 2021 2:06 PM
Cc: dev@dpdk.org
Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow API

Hi Ori,

On 04/10/2021 08:45, Ori Kam wrote:

Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Sunday, October 3, 2021 9:11 PM
Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow
API



On 03/10/2021 15:40, Ori Kam wrote:

Hi Andrew and Ivan,


-Original Message-
From: Andrew Rybchenko 
Sent: Friday, October 1, 2021 4:47 PM
Subject: [PATCH v1 02/12] ethdev: add eswitch port item to flow API

From: Ivan Malov 

For use with "transfer" flows. Supposed to match traffic entering
the e-switch from the external world (network, guests) via the port
which is logically connected with the given ethdev.

Must not be combined with attributes "ingress" / "egress".

This item is meant to use the same structure as ethdev item.



In case the app is not working with representors, meaning each
switch port is mapped to an ethdev,
do both items (ethdev and eswitch port) have the same meaning?


No. Ethdev means ethdev, and e-switch port is the point where this
ethdev is plugged to. For example, "transfer + ESWITCH_PORT" for a
regular PF ethdev typically means the network port (maybe you can
recall the idea that a PF ethdev "represents" the network port it's
associated with).


I believe, that diagrams which these patches add to
"doc/guides/prog_guide/rte_flow.rst" may come in handy to understand
the meaning. Also, you can take a look at our larger diagram from the
Sep 14 gathering.



Let's look at the following system:
E-Switch has 3 ports - PF, VF1, VF2
The ports are distributed as follows:
DPDK application:
ethdev(0) pf,
ethdev(1) representor to VF1
ethdev(2) representor to VF2
ethdev(3) VF1

VM:
VF2

As we know, all representors are really connected to the PF (at least in
this example)


This example tries to say that the e-switch has 3 ports in total, and, given
your explanation, one may indeed agree that *in this example* representors
re-use e-switch port of ethdev=0 (with some metadata to distinguish
packets, etc.). But one can hardly assume that *all* representors with any
vendor's NIC are connected to the e-switch the same way. It's vendor
specific. Well, at least, applications don't have this knowledge and don't need
to.



So matching on ethdev(3) means matching on traffic sent from DPDK port 3,
right?

Item ETHDEV (ethdev_id=3) matches traffic sent by DPDK port 3. Looks like
we're on the same page here.



Good.


And matching on eswitch_port(3) means matching in traffic that goes
into VF1 which is the same traffic as ethdev(3) right?


I didn't catch the thought about "the same traffic". Direction is not the same.
Item ESWITCH_PORT (ethdev_id=3) matches traffic sent by DPDK port 1.


This is the critical part for my understanding.
Matching on ethdev_id(3) means matching on traffic that is coming from DPDK
port3.


Right.


So from E-Switch view point it is traffic that goes into VF1?


No. Above you clearly say "coming from DPDK port3". That is, from the 
VF1. *Not* going into it. Port 3 (ethdev_id=3) *is* VF1.



While matching on E-Switch_port(3) means matching on traffic coming from VF1?


No. It means matching on traffic coming from ethdev 1. From the VF1's 
representor.




And by the same logic, matching on ethdev_id(1) means matching on traffic that
was sent from DPDK port 1, and matching on E-Switch_port(1) means matching on
traffic coming from VF1


In this case, you've got this right. But please see my above notes. By 
the looks of it, you might have run into confusion over there.




So in this case eswitch_port(3) is equal to eswitch_port(1), right?
While ethdev(1) is not equal to ethdev(3)


No.

Item ETHDEV (ethdev_id=1) equals item ESWITCH_PORT (ethdev_id=3).
Item ETHDEV (ethdev_id=3) equals item ESWITCH_PORT (ethdev_id=1).



And just to complete the picture, matching on ethdev(2) will result in traffic
coming from the dpdk port and matching on eswitch_port(2) will match
on traffic coming from VF2


Exactly.


But, Ori, let me draw your attention to the following issue. In order to 
simplify understanding, I suggest that we refrain from saying "traffic 
that GOES TO". Where it goes depends on default rules that are supposed 
to be maintained by the PMD when ports get plugged / unplugged.


The flow items ETHDEV and ESWITCH_PORT define the SOURCE of traffic. 
That's it. They define where the traffic "goes FROM".


Say, the DPDK application sends a packet from ethdev 0. This packet 
enters the e-switch. Match engine sits in the e-switch and intercepts 
the packet. It doesn't care where the packet *would go* if it wasn't 
intercepted. It cares about where the packet comes from. And it comes 
from ethdev 0. So, in the focus, we have the SOURCE of the packet.






Yes, in this case neither of the ports (1, 3) is truly "external" (they both
interf

Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Pavan Nikhilesh Bhagavatula
>On 10/4/2021 12:34 PM, Pavan Nikhilesh Bhagavatula wrote:
>>> On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>>>>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>>>>>> From: Pavan Nikhilesh
>>>>>>
>>>>>> Due to Linux kernel AF(Admin function) driver dependency, only
>>>>> enable
>>>>>> build for 64-bit Linux.
>>>>>>
>>>>>
>>>>> Hi Pavan,
>>>>>
>>>>> Isn't it possible to provide a commit log in the kernel side etc, that
>let
>>>>> others to verify why only 64 bit is required, or if someone want to
>>>>> support
>>>>> 32bit that may help them to investigate the source of the
>restriction.
>>>>
>>>> Arch 32 support is not implemented on ThunderX, so 32bit will not
>>> run.
>>>>
>>>
>>> I see, is following correct:
>>> All thunderx, octeonx & octeontx2 only supports VF in the DPDK,
>and PF
>>> is
>>> supported by Linux kernel driver. And Linux kernel driver doesn't
>>> support arch32.
>>
>> AF != PF, AF is something that manages all the shared resources
>between PF/VF.
>>
>
>I see, I though AF is part of PF functionality. Are there two different
>kernel
>modules for PF and AF?
>
>So can DPDK driver drive PF? In a way, PF by DPDK, VF by DPDK, AF by
>Linux
>kernel driver.

Yup that’s correct.

>
>>>
>>> Is something changed in kernel driver side to drop the 32bit support?
>>> If it was not supported at all, what is the motivation to disable the
>DPDK
>>> drivers now?
>>>
>>
>> It was never supported to begin with, motivation is that build will fail if
>we try to
>> compile with 32b.
>>
>
>If there is no plan to support 32bit in the kernel side, that is reasonable
>to
>disable 32bit build, please provide above details in the commit log.
>
>And after above said, how much maintenance cost to support 32bit, if
>the build
>error is on the logging format "%lx" etc .. (as we mostly have 32bit build
>errors), it is better to fix them using 'PRIx64' which is more proper way
>anyway. If there is more logical issue with 32bit pointers, I agree with
>you to
>disable it.
>Can you please provided the build error in the commit log as record?
>

Apologies, I meant that all the functions that don’t fall under 64b are stubbed
out, so the driver wouldn’t work.

>
>>>>>> Signed-off-by: Pavan Nikhilesh
>>>>>> Acked-by: Jerin Jacob
>>>>>> ---
>>>>>>  v5 Changes
>>>>>>  - s/fuction/function.
>>>>>>
>>>>>>  v4 Changes:
>>>>>>  - Update commit message regarding dependency on AF driver.
>>>>>>
>>>>>>  drivers/net/thunderx/meson.build | 4 ++--
>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/thunderx/meson.build
>>>>> b/drivers/net/thunderx/meson.build
>>>>>> index 4bbcea7f93..da665bd76f 100644
>>>>>> --- a/drivers/net/thunderx/meson.build
>>>>>> +++ b/drivers/net/thunderx/meson.build
>>>>>> @@ -1,9 +1,9 @@
>>>>>>  # SPDX-License-Identifier: BSD-3-Clause
>>>>>>  # Copyright(c) 2017 Cavium, Inc
>>>>>>
>>>>>> -if is_windows
>>>>>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>>>>>  build = false
>>>>>> -reason = 'not supported on Windows'
>>>>>> +reason = 'only supported on 64-bit Linux'
>>>>>>  subdir_done()
>>>>>>  endif
>>>>>>
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>
>>



Re: [dpdk-dev] [PATCH] net/memif: allocate socket hash on any NUMA socket

2021-10-04 Thread Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)


> On 9/28/2021 2:51 PM, Junxiao Shi wrote:
> > Previously, memif socket hash is always allocated on NUMA socket 0.
> > If the application is entirely running on another NUMA socket and EAL
> > --socket-limit prevents memory allocation on NUMA socket 0, memif
> > creation fails with "HASH: memory allocation failed" error.
> >
> > This patch allows allocating memif socket hash on any NUMA socket.
> >
> > Signed-off-by: Junxiao Shi 

Looks ok. Thanks for the patch!


Re: [dpdk-dev] [EXT] [PATCH v1 11/12] net/octeontx2: support ethdev flow action

2021-10-04 Thread Andrew Rybchenko
On 10/4/21 2:13 PM, Kiran Kumar Kokkilagadda wrote:
> 
> 
>> -Original Message-
>> From: Andrew Rybchenko 
>> Sent: Friday, October 1, 2021 7:17 PM
>> To: Jerin Jacob Kollanukkaran ; Nithin Kumar Dabilpuram
>> ; Kiran Kumar Kokkilagadda
>> 
>> Cc: dev@dpdk.org; Ori Kam ; Thomas Monjalon
>> ; Ferruh Yigit ; Ivan Malov
>> 
>> Subject: [EXT] [PATCH v1 11/12] net/octeontx2: support ethdev flow action
>>
>> External Email
>>
>> --
>> PORT_ID action implementation works for ingress only and has the same
>> semantics as ETHDEV action.
> 
> Please update the documentation also.

Thanks, will do.



Re: [dpdk-dev] [PATCH v3 06/10] drivers/crypto: move snow3g PMD to IPsec-mb framework

2021-10-04 Thread De Lara Guarch, Pablo
Hi Ciara,

> -Original Message-
> From: Power, Ciara 
> Sent: Wednesday, September 29, 2021 5:31 PM
> To: dev@dpdk.org
> Cc: Zhang, Roy Fan ; Bronowski, PiotrX
> ; gak...@marvell.com; Power, Ciara
> ; Thomas Monjalon ; De Lara
> Guarch, Pablo ; Ray Kinsella
> 
> Subject: [PATCH v3 06/10] drivers/crypto: move snow3g PMD to IPsec-mb
> framework
> 
> From: Piotr Bronowski 
> 
> This patch removes the crypto/snow3g folder and gathers all snow3g PMD
> implementation specific details into a single file, pmd_snow3g.c in the
> crypto/ipsec_mb folder.
> 
> Signed-off-by: Piotr Bronowski 
> Signed-off-by: Ciara Power 
> 
> ---
> v3: Removed extra empty lines.
> v2: Updated maintainers file.
> ---
>  MAINTAINERS   |   8 +-
>  doc/guides/cryptodevs/snow3g.rst  |   3 +-
>  drivers/crypto/ipsec_mb/meson.build   |   3 +-
>  .../pmd_snow3g.c} | 457 --
>  .../ipsec_mb/rte_ipsec_mb_pmd_private.h   |   7 +
>  drivers/crypto/meson.build|   1 -
>  drivers/crypto/snow3g/meson.build |  24 -
>  drivers/crypto/snow3g/rte_snow3g_pmd_ops.c| 323 -
>  drivers/crypto/snow3g/snow3g_pmd_private.h|  84 
>  drivers/crypto/snow3g/version.map |   3 -
>  10 files changed, 205 insertions(+), 708 deletions(-)  rename
> drivers/crypto/{snow3g/rte_snow3g_pmd.c => ipsec_mb/pmd_snow3g.c} (57%)
> delete mode 100644 drivers/crypto/snow3g/meson.build  delete mode 100644
> drivers/crypto/snow3g/rte_snow3g_pmd_ops.c
>  delete mode 100644 drivers/crypto/snow3g/snow3g_pmd_private.h
>  delete mode 100644 drivers/crypto/snow3g/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 794bad11c2..28855222d6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS

...

> - case SNOW3G_OP_AUTH_CIPHER:
> + case IPSEC_MB_OP_HASH_VERIFY_THEN_DECRYPT:
> + case IPSEC_MB_OP_HASH_GEN_THEN_ENCRYPT:
>   processed_ops = process_snow3g_hash_op(qp, ops, session,
>   num_ops);
>   process_snow3g_cipher_op(qp, ops, session, processed_ops);
> @@ -358,9 +343,9 @@ process_ops(struct rte_crypto_op **ops, struct
> snow3g_session *session,
>   }
>   }
> 
> - enqueued_ops = rte_ring_enqueue_burst(qp->processed_ops,
> + enqueued_ops = rte_ring_enqueue_burst(qp->ingress_queue,
>   (void **)ops, processed_ops, NULL);

Looks like there is a bug here. We don't need to enqueue operations back in the
ring here.
We used to enqueue in the ring when crypto processing was done in enqueue, but
now this is part of dequeue and we already dequeued the operations from the ring.
As far as I know, the only enqueue operation in the ring should be done in
enqueue_burst.

Thanks,
Pablo


[dpdk-dev] [PATCH v3] net: introduce IPv4 ihl and version fields

2021-10-04 Thread Gregory Etelson
The RTE IPv4 header definition combines the `version' and `ihl' fields
into a single structure member.
This patch introduces dedicated structure members for both the `version'
and `ihl' IPv4 fields. Separate header field definitions allow
simpler code to match on the IHL value in a flow rule.
The original `version_ihl' structure member is kept for backward
compatibility.

Signed-off-by: Gregory Etelson 

Depends-on: f7383e7c7ec1 ("net: announce changes in IPv4 header access")

Acked-by: Olivier Matz 

---
v2: Add dependency.
v3: Add comments.
---
 app/test/test_flow_classify.c |  8 
 lib/net/rte_ip.h  | 16 +++-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/app/test/test_flow_classify.c b/app/test/test_flow_classify.c
index 951606f248..4f64be5357 100644
--- a/app/test/test_flow_classify.c
+++ b/app/test/test_flow_classify.c
@@ -95,7 +95,7 @@ static struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] = {
  *  dst mask 255.255.255.00 / udp src is 32 dst is 33 / end"
  */
 static struct rte_flow_item_ipv4 ipv4_udp_spec_1 = {
-   { 0, 0, 0, 0, 0, 0, IPPROTO_UDP, 0,
+   { { .version_ihl = 0}, 0, 0, 0, 0, 0, IPPROTO_UDP, 0,
  RTE_IPV4(2, 2, 2, 3), RTE_IPV4(2, 2, 2, 7)}
 };
 static const struct rte_flow_item_ipv4 ipv4_mask_24 = {
@@ -131,7 +131,7 @@ static struct rte_flow_item  end_item = { 
RTE_FLOW_ITEM_TYPE_END,
  *  dst mask 255.255.255.00 / tcp src is 16 dst is 17 / end"
  */
 static struct rte_flow_item_ipv4 ipv4_tcp_spec_1 = {
-   { 0, 0, 0, 0, 0, 0, IPPROTO_TCP, 0,
+   { { .version_ihl = 0}, 0, 0, 0, 0, 0, IPPROTO_TCP, 0,
  RTE_IPV4(1, 2, 3, 4), RTE_IPV4(5, 6, 7, 8)}
 };
 
@@ -150,8 +150,8 @@ static struct rte_flow_item  tcp_item_1 = { 
RTE_FLOW_ITEM_TYPE_TCP,
  *  dst mask 255.255.255.00 / sctp src is 16 dst is 17/ end"
  */
 static struct rte_flow_item_ipv4 ipv4_sctp_spec_1 = {
-   { 0, 0, 0, 0, 0, 0, IPPROTO_SCTP, 0, RTE_IPV4(11, 12, 13, 14),
-   RTE_IPV4(15, 16, 17, 18)}
+   { { .version_ihl = 0}, 0, 0, 0, 0, 0, IPPROTO_SCTP, 0,
+   RTE_IPV4(11, 12, 13, 14), RTE_IPV4(15, 16, 17, 18)}
 };
 
 static struct rte_flow_item_sctp sctp_spec_1 = {
diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index 05948b69b7..89a68d9433 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -38,7 +38,21 @@ extern "C" {
  * IPv4 Header
  */
 struct rte_ipv4_hdr {
-   uint8_t  version_ihl;   /**< version and header length */
+   __extension__
+   union {
+   uint8_t version_ihl;/**< version and header length */
+   struct {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint8_t ihl:4; /**< header length */
+   uint8_t version:4; /**< version */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+   uint8_t version:4; /**< version */
+   uint8_t ihl:4; /**< header length */
+#else
+#error "setup endian definition"
+#endif
+   };
+   };
uint8_t  type_of_service;   /**< type of service */
rte_be16_t total_length;/**< length of packet */
rte_be16_t packet_id;   /**< packet ID */
-- 
2.33.0
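
For illustration, the effect of the new union can be sketched outside of DPDK as follows. This is a stand-alone sketch, not the rte_ip.h definition: the type ipv4_first_byte and both helpers are hypothetical, and the compiler's __BYTE_ORDER__ macro stands in for RTE_BYTE_ORDER. It assumes GCC or Clang, where bit-field allocation order follows the target byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for the first byte of struct rte_ipv4_hdr. */
struct ipv4_first_byte {
	__extension__
	union {
		uint8_t version_ihl; /* legacy combined field */
		struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
			uint8_t ihl:4;     /* header length in 32-bit words */
			uint8_t version:4; /* IP version */
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
			uint8_t version:4; /* IP version */
			uint8_t ihl:4;     /* header length in 32-bit words */
#else
#error "unknown byte order"
#endif
		};
	};
};

/* Write through the legacy field, read the version bit-field. */
static inline unsigned int
ipv4_version(uint8_t version_ihl)
{
	struct ipv4_first_byte b;

	b.version_ihl = version_ihl;
	return b.version;
}

/* Write through the legacy field, return the header length in bytes. */
static inline unsigned int
ipv4_hdr_len_bytes(uint8_t version_ihl)
{
	struct ipv4_first_byte b;

	b.version_ihl = version_ihl;
	return b.ihl * 4u;
}
```

Both views alias the same byte, so code that still writes version_ihl and code that reads ihl or version see consistent values.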



Re: [dpdk-dev] [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 1:01 PM, Pavan Nikhilesh Bhagavatula wrote:
>> On 10/4/2021 12:34 PM, Pavan Nikhilesh Bhagavatula wrote:
 On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
>>> From: Pavan Nikhilesh 
>>>
>>> Due to Linux kernel AF(Admin function) driver dependency, only
>> enable
>>> build for 64-bit Linux.
>>>
>>
>> Hi Pavan,
>>
>> Isn't it possible to provide a commit log in the kernel side etc, that
>> let
>> others to verify why only 64 bit is required, or if someone want to
>> support
>> 32bit that may help them to investigate the source of the
>> restriction.
>
> Arch 32 support is not implemented on ThunderX, so 32bit will not
 run.
>

 I see, is following correct:
 All thunderx, octeonx & octeontx2 only supports VF in the DPDK,
>> and PF
 is
 supported by Linux kernel driver. And Linux kernel driver doesn't
 support arch32.
>>>
>>> AF != PF, AF is something that manages all the shared resources
>> between PF/VF.
>>>
>>
>> I see, I though AF is part of PF functionality. Are there two different
>> kernel
>> modules for PF and AF?
>>
>> So can DPDK driver drive PF? In a way, PF by DPDK, VF by DPDK, AF by
>> Linux
>> kernel driver.
> 
> Yup that’s correct.
> 
>>

 Is something changed in kernel driver side to drop the 32bit support?
 If it was not supported at all, what is the motivation to disable the
>> DPDK
 drivers now?

>>>
>>> It was never supported to begin with, motivation is that build will fail if
>> we try to
>>> compile with 32b.
>>>
>>
>> If there is no plan to support 32bit in the kernel side, that is reasonable
>> to
>> disable 32bit build, please provide above details in the commit log.
>>
>> And after above said, how much maintenance cost to support 32bit, if
>> the build
>> error is on the logging format "%lx" etc .. (as we mostly have 32bit build
>> errors), it is better to fix them using 'PRIx64' which is more proper way
>> anyway. If there is more logical issue with 32bit pointers, I agree with
>> you to
>> disable it.
>> Can you please provided the build error in the commit log as record?
>>
> 
> Apologies, I meant that all the functions that don’t fall under 64b are 
> stubbed out
> so the driver wouldn’t work.
> 

So is there a build error or not?
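
For reference, the PRIx64 suggestion quoted above can be sketched as follows. This is a generic C illustration, not code from the thunderx driver, and format_reg() is a hypothetical helper. "%lx" assumes that long is 64 bits wide, which does not hold on typical 32-bit targets, whereas PRIx64 expands to the matching conversion for uint64_t.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value portably; returns snprintf()'s result. */
static int
format_reg(char *buf, size_t len, uint64_t val)
{
	/* "%lx" breaks where long is 32 bits; PRIx64 matches uint64_t. */
	return snprintf(buf, len, "0x%" PRIx64, val);
}
```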

>>
>>> Signed-off-by: Pavan Nikhilesh 
>>> Acked-by: Jerin Jacob 
>>> ---
>>>  v5 Changes
>>>  - s/fuction/function.
>>>
>>>  v4 Changes:
>>>  - Update commit message regarding dependency on AF driver.
>>>
>>>  drivers/net/thunderx/meson.build | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/thunderx/meson.build
>> b/drivers/net/thunderx/meson.build
>>> index 4bbcea7f93..da665bd76f 100644
>>> --- a/drivers/net/thunderx/meson.build
>>> +++ b/drivers/net/thunderx/meson.build
>>> @@ -1,9 +1,9 @@
>>>  # SPDX-License-Identifier: BSD-3-Clause
>>>  # Copyright(c) 2017 Cavium, Inc
>>>
>>> -if is_windows
>>> +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
>>>  build = false
>>> -reason = 'not supported on Windows'
>>> +reason = 'only supported on 64-bit Linux'
>>>  subdir_done()
>>>  endif
>>>
>>> --
>>> 2.17.1
>>>
>
>>>
> 



Re: [dpdk-dev] [EXT] Re: [PATCH v5 1/3] net/thunderx: enable build only on 64-bit Linux

2021-10-04 Thread Pavan Nikhilesh Bhagavatula
>On 10/4/2021 1:01 PM, Pavan Nikhilesh Bhagavatula wrote:
>>> On 10/4/2021 12:34 PM, Pavan Nikhilesh Bhagavatula wrote:
> On 10/4/2021 11:02 AM, Pavan Nikhilesh Bhagavatula wrote:
>>> On 10/4/2021 6:56 AM, pbhagavat...@marvell.com wrote:
 From: Pavan Nikhilesh 

 Due to Linux kernel AF(Admin function) driver dependency,
>only
>>> enable
 build for 64-bit Linux.

>>>
>>> Hi Pavan,
>>>
>>> Isn't it possible to provide a commit log in the kernel side etc,
>that
>>> let
>>> others to verify why only 64 bit is required, or if someone want
>to
>>> support
>>> 32bit that may help them to investigate the source of the
>>> restriction.
>>
>> Arch 32 support is not implemented on ThunderX, so 32bit will
>not
> run.
>>
>
> I see, is following correct:
> All thunderx, octeonx & octeontx2 only supports VF in the DPDK,
>>> and PF
> is
> supported by Linux kernel driver. And Linux kernel driver doesn't
> support arch32.

 AF != PF, AF is something that manages all the shared resources
>>> between PF/VF.

>>>
>>> I see, I though AF is part of PF functionality. Are there two different
>>> kernel
>>> modules for PF and AF?
>>>
>>> So can DPDK driver drive PF? In a way, PF by DPDK, VF by DPDK, AF
>by
>>> Linux
>>> kernel driver.
>>
>> Yup that’s correct.
>>
>>>
>
> Is something changed in kernel driver side to drop the 32bit
>support?
> If it was not supported at all, what is the motivation to disable the
>>> DPDK
> drivers now?
>

 It was never supported to begin with, motivation is that build will
>fail if
>>> we try to
 compile with 32b.

>>>
>>> If there is no plan to support 32bit in the kernel side, that is
>reasonable
>>> to
>>> disable 32bit build, please provide above details in the commit log.
>>>
>>> And after above said, how much maintenance cost to support 32bit,
>if
>>> the build
>>> error is on the logging format "%lx" etc .. (as we mostly have 32bit
>build
>>> errors), it is better to fix them using 'PRIx64' which is more proper
>way
>>> anyway. If there is more logical issue with 32bit pointers, I agree with
>>> you to
>>> disable it.
>>> Can you please provided the build error in the commit log as record?
>>>
>>
>> Apologies, I meant that all the functions that don’t fall under 64b are
>stubbed out
>> so the driver wouldn’t work.
>>
>
>so is there build error or not?
>

No build error.

>>>
 Signed-off-by: Pavan Nikhilesh 
 Acked-by: Jerin Jacob 
 ---
  v5 Changes
  - s/fuction/function.

  v4 Changes:
  - Update commit message regarding dependency on AF
>driver.

  drivers/net/thunderx/meson.build | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/drivers/net/thunderx/meson.build
>>> b/drivers/net/thunderx/meson.build
 index 4bbcea7f93..da665bd76f 100644
 --- a/drivers/net/thunderx/meson.build
 +++ b/drivers/net/thunderx/meson.build
 @@ -1,9 +1,9 @@
  # SPDX-License-Identifier: BSD-3-Clause
  # Copyright(c) 2017 Cavium, Inc

 -if is_windows
 +if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
  build = false
 -reason = 'not supported on Windows'
 +reason = 'only supported on 64-bit Linux'
  subdir_done()
  endif

 --
 2.17.1

>>

>>



[dpdk-dev] [PATCH v1] ci: update machine meson option to platform

2021-10-04 Thread Juraj Linkeš
The way we build DPDK in CI, with -Dmachine=default, was not updated when
the option was replaced, in order to preserve a backwards-compatible build
call that facilitates ABI verification between DPDK versions. Update the
call to use -Dplatform=generic, the most up-to-date way to execute the
same build, which is now present in all DPDK versions the ABI check
verifies.

Signed-off-by: Juraj Linkeš 
---
 .ci/linux-build.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 91e43a975b..f8710e3ad4 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -77,7 +77,7 @@ else
 OPTS="$OPTS -Dexamples=all"
 fi
 
-OPTS="$OPTS -Dmachine=default"
+OPTS="$OPTS -platform=generic"
 OPTS="$OPTS --default-library=$DEF_LIB"
 OPTS="$OPTS --buildtype=debugoptimized"
 OPTS="$OPTS -Dcheck_includes=true"
-- 
2.20.1



Re: [dpdk-dev] [PATCH v2] net: introduce IPv4 ihl and version fields

2021-10-04 Thread Gregory Etelson
Hello Olivier,

[:snip:]
> 
> nit: although it's obvious, we may want to add
> /**< IP version */ and
> /**< header length */ for these new fields, for
> consistency with the
> rest of the structure.

I added comments to v3.

Regards,
Gregory


Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Ferruh Yigit
On 9/24/2021 11:54 AM, Elad Nachman wrote:
> Fix lack of multiple KNI requests handling support by introducing a
> request in progress flag which will fail additional requests with
> EAGAIN return code if the original request has not been processed
> by user-space.
> 
> Bugzilla ID: 809

Hi Eric,

Can you please test this patch, if it solves the issue you reported?

>  
> Signed-off-by: Elad Nachman 
> ---
>  kernel/linux/kni/kni_net.c | 9 +
>  lib/kni/rte_kni.c  | 2 ++
>  lib/kni/rte_kni_common.h   | 1 +
>  3 files changed, 12 insertions(+)
> 

<...>

> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev, struct 
> rte_kni_request *req)
>  
>   mutex_lock(&kni->sync_lock);
>  
> + /* Check that existing request has been processed: */
> + cur_req = (struct rte_kni_request *)kni->sync_kva;
> + if (cur_req->req_in_progress) {
> + ret = -EAGAIN;

The overall logic in the KNI looks good to me; this helps to serialize the
requests, even the async ones.

But can you please clarify how the kernel side behaves with an '-EAGAIN'
return value? Will Linux call the ndo again, or will it just fail?

If it just fails, should we handle the retry on '-EAGAIN' within the kni module?



Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Elad Nachman
Hi,

EAGAIN is propagated back to the kernel and to the caller.

We cannot retry from the kni kernel module since we hold the rtnl lock.
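
To illustrate the behaviour discussed in this thread, here is a minimal user-space sketch, assuming a simplified request slot. The names toy_request, toy_submit and toy_complete are hypothetical and only mirror the idea of the req_in_progress flag from the patch; this is not the KNI kernel code, and it omits the locking.

```c
#include <errno.h>

/* Hypothetical stand-in for the shared request slot. */
struct toy_request {
	int req_id;
	int req_in_progress; /* set on submit, cleared once handled */
};

/* Reject a new request while the previous one is still pending. */
static int
toy_submit(struct toy_request *slot, int req_id)
{
	if (slot->req_in_progress)
		return -EAGAIN; /* reported to the caller, no retry here */

	slot->req_id = req_id;
	slot->req_in_progress = 1;
	return 0;
}

/* Called once the pending request has been handled. */
static void
toy_complete(struct toy_request *slot)
{
	slot->req_in_progress = 0;
}
```

Since the error is returned to the caller, any retry would have to happen there, after the pending request has completed.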

FYI,

Elad

On Mon, Oct 4, 2021, 16:05, Ferruh Yigit <
ferruh.yi...@intel.com> wrote:

> On 9/24/2021 11:54 AM, Elad Nachman wrote:
> > Fix lack of multiple KNI requests handling support by introducing a
> > request in progress flag which will fail additional requests with
> > EAGAIN return code if the original request has not been processed
> > by user-space.
> >
> > Bugzilla ID: 809
>
> Hi Eric,
>
> Can you please test this patch, if it solves the issue you reported?
>
> >
> > Signed-off-by: Elad Nachman 
> > ---
> >  kernel/linux/kni/kni_net.c | 9 +
> >  lib/kni/rte_kni.c  | 2 ++
> >  lib/kni/rte_kni_common.h   | 1 +
> >  3 files changed, 12 insertions(+)
> >
>
> <...>
>
> > @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
> struct rte_kni_request *req)
> >
> >   mutex_lock(&kni->sync_lock);
> >
> > + /* Check that existing request has been processed: */
> > + cur_req = (struct rte_kni_request *)kni->sync_kva;
> > + if (cur_req->req_in_progress) {
> > + ret = -EAGAIN;
>
> Overall logic in the KNI looks good to me, this helps to serialize the
> requests
> even for async ones.
>
> But can you please clarify how it behaves in the kernel side with '-EAGAIN'
> return type? Will linux call the ndo again, or will it just fail.
>
> If it just fails should we handle the re-try on '-EAGAIN' within the kni
> module?
>
>


[dpdk-dev] [PATCH v2] ci: update machine meson option to platform

2021-10-04 Thread Juraj Linkeš
The way we build DPDK in CI, with -Dmachine=default, was not updated when
the option was replaced, in order to preserve a backwards-compatible build
call that facilitates ABI verification between DPDK versions. Update the
call to use -Dplatform=generic, the most up-to-date way to execute the
same build, which is now present in all DPDK versions the ABI check
verifies.

Signed-off-by: Juraj Linkeš 
---
 .ci/linux-build.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 91e43a975b..06aaa79100 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -77,7 +77,7 @@ else
 OPTS="$OPTS -Dexamples=all"
 fi
 
-OPTS="$OPTS -Dmachine=default"
+OPTS="$OPTS -Dplatform=generic"
 OPTS="$OPTS --default-library=$DEF_LIB"
 OPTS="$OPTS --buildtype=debugoptimized"
 OPTS="$OPTS -Dcheck_includes=true"
-- 
2.20.1



[dpdk-dev] [PATCH v2] test: add reassembly perf test

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

Add a reassembly perf autotest for both IPv4 and IPv6 reassembly.
Each test is performed with a variable number of fragments per flow,
with either ordered or unordered fragments, and with interleaved flows.

Signed-off-by: Pavan Nikhilesh 
---
 v2 Changes
 - Rebase to master, reduce memory consumption, set default mempool ops
 to ring_mp_mc.

 app/test/meson.build|   2 +
 app/test/test_reassembly_perf.c | 991 
 2 files changed, 993 insertions(+)
 create mode 100644 app/test/test_reassembly_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed..f1957dc5b5 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -109,6 +109,7 @@ test_sources = files(
 'test_rawdev.c',
 'test_rcu_qsbr.c',
 'test_rcu_qsbr_perf.c',
+'test_reassembly_perf.c',
 'test_reciprocal_division.c',
 'test_reciprocal_division_perf.c',
 'test_red.c',
@@ -315,6 +316,7 @@ perf_test_names = [
 'hash_readwrite_lf_perf_autotest',
 'trace_perf_autotest',
 'ipsec_perf_autotest',
+'reassembly_perf_autotest',
 ]

 driver_test_names = [
diff --git a/app/test/test_reassembly_perf.c b/app/test/test_reassembly_perf.c
new file mode 100644
index 00..da60c8bd7a
--- /dev/null
+++ b/app/test/test_reassembly_perf.c
@@ -0,0 +1,991 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Marvell.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MAX_FLOWS  1024 * 32
+#define MAX_BKTS   MAX_FLOWS
+#define MAX_ENTRIES_PER_BKT 16
+#define MAX_FRAGMENTS  RTE_LIBRTE_IP_FRAG_MAX_FRAG
+#define MIN_FRAGMENTS  2
+#define MAX_PKTS   MAX_FLOWS *MAX_FRAGMENTS
+
+#define MAX_PKT_LEN 2048
+#define MAX_TTL_MS  5 * MS_PER_S
+
+/* use RFC863 Discard Protocol */
+#define UDP_SRC_PORT 9
+#define UDP_DST_PORT 9
+
+/* use RFC5735 / RFC2544 reserved network test addresses */
+#define IP_SRC_ADDR(x) (198U << 24) | (18 << 16) | (0 << 8) | x
+#define IP_DST_ADDR(x) (198U << 24) | (18 << 16) | (1 << 8) | x
+
+/* 2001:0200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180) */
+static uint8_t ip6_addr[16] = {32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
+#define IP6_VERSION 6
+
+#define IP_DEFTTL 64 /* from RFC 1340. */
+
+static struct rte_ip_frag_tbl *frag_tbl;
+static struct rte_mempool *pkt_pool;
+static struct rte_mbuf *mbufs[MAX_FLOWS][MAX_FRAGMENTS];
+static uint8_t frag_per_flow[MAX_FLOWS];
+static uint32_t flow_cnt;
+
+#define FILL_MODE_LINEAR  0
+#define FILL_MODE_RANDOM  1
+#define FILL_MODE_INTERLEAVED 2
+
+static int
+reassembly_test_setup(void)
+{
+   uint64_t max_ttl_cyc = (MAX_TTL_MS * rte_get_timer_hz()) / 1E3;
+
+   frag_tbl = rte_ip_frag_table_create(MAX_FLOWS, MAX_ENTRIES_PER_BKT,
+   MAX_FLOWS * MAX_ENTRIES_PER_BKT,
+   max_ttl_cyc, rte_socket_id());
+   if (frag_tbl == NULL)
+   return TEST_FAILED;
+
+   rte_mbuf_set_user_mempool_ops("ring_mp_mc");
+   pkt_pool = rte_pktmbuf_pool_create(
+   "reassembly_perf_pool", MAX_FLOWS * MAX_FRAGMENTS, 0, 0,
+   RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+   if (pkt_pool == NULL) {
+   printf("[%s] Failed to create pkt pool\n", __func__);
+   rte_ip_frag_table_destroy(frag_tbl);
+   return TEST_FAILED;
+   }
+
+   return TEST_SUCCESS;
+}
+
+static void
+reassembly_test_teardown(void)
+{
+   if (frag_tbl != NULL)
+   rte_ip_frag_table_destroy(frag_tbl);
+
+   if (pkt_pool != NULL)
+   rte_mempool_free(pkt_pool);
+}
+
+static void
+randomize_array_positions(void **array, uint8_t sz)
+{
+   void *tmp;
+   int i, j;
+
+   if (sz == 2) {
+   tmp = array[0];
+   array[0] = array[1];
+   array[1] = tmp;
+   } else {
+   for (i = sz - 1; i > 0; i--) {
+   j = rte_rand_max(i + 1);
+   tmp = array[i];
+   array[i] = array[j];
+   array[j] = tmp;
+   }
+   }
+}
+
+static void
+reassembly_print_banner(const char *proto_str)
+{
+   printf("+=="
+  "+\n");
+   printf("| %-32s| %-3s : %-58d|\n", proto_str, "Flow Count", MAX_FLOWS);
+   printf("+++=+=+"
+  "+===+\n");
+   printf("%-17s%-17s%-14s%-14s%-25s%-20s\n", "| Fragment Order",
+  "| Fragments/Flow", "| Outstanding", "| Cycles/Flow",
+  "| Cycles/Fragment insert", "| Cycles/Reassembly |");
+   printf("+
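
The randomize_array_positions() helper in the patch above is a Fisher-Yates shuffle, with a special case that always swaps when there are exactly two entries. A stand-alone sketch of the general loop follows; shuffle_ptrs() is a hypothetical name, and rand() stands in for rte_rand_max(), so the random source and its slight modulo bias are assumptions of this sketch rather than properties of the test.

```c
#include <stdlib.h>

/* Fisher-Yates shuffle of an array of pointers.
 * Walks from the last element down and swaps each element with a
 * randomly chosen element at or below its own position. */
static void
shuffle_ptrs(void **array, unsigned int sz)
{
	unsigned int i, j;
	void *tmp;

	if (sz < 2)
		return;

	for (i = sz - 1; i > 0; i--) {
		j = (unsigned int)rand() % (i + 1); /* 0 <= j <= i */
		tmp = array[i];
		array[i] = array[j];
		array[j] = tmp;
	}
}
```

The result is always a permutation of the input: every pointer is still present exactly once, only the order changes.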

Re: [dpdk-dev] [PATCH v2 0/7] Removal of PCI bus ABIs

2021-10-04 Thread David Marchand
On Thu, Sep 30, 2021 at 10:45 AM David Marchand
 wrote:
> On Wed, Sep 29, 2021 at 9:38 AM Xia, Chenbo  wrote:
> > @David, could you help me understand what is the compile error in Fedora 31?
> > DPDK_compile_spdk failure is expected as the header name for SPDK is 
> > changed,
> > I am not sure if it's the same error...
>
> The error log is odd (no compilation "backtrace").
> You'll need to test spdk manually I guess.

I tried your series with SPDK (w/o and w/ enable_driver_sdk).
I agree: the error is likely due to the file rename.

$ make
  CC lib/env_dpdk/env.o
In file included from env.c:39:0:
env_internal.h:64:25: error: field ‘driver’ has incomplete type
  struct rte_pci_driver  driver;
 ^
env_internal.h:75:59: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 int pci_device_init(struct rte_pci_driver *driver, struct
rte_pci_device *device);
   ^
env_internal.h:75:59: warning: its scope is only this definition or
declaration, which is probably not what you want [enabled by default]
env_internal.h:76:28: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 int pci_device_fini(struct rte_pci_device *device);
^
env_internal.h:89:38: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 void vtophys_pci_device_added(struct rte_pci_device *pci_device);
  ^
env_internal.h:96:40: warning: ‘struct rte_pci_device’ declared inside
parameter list [enabled by default]
 void vtophys_pci_device_removed(struct rte_pci_device *pci_device);
^
make[2]: *** [env.o] Error 1
make[1]: *** [env_dpdk] Error 2
make: *** [lib] Error 2



So basically, SPDK needs some updates since it has its own pci drivers.
I copied some SPDK folks for info.

*Disclaimer* I only checked it links fine against my 21.11 dpdk env,
and did not test the other cases:

diff --git a/dpdkbuild/Makefile b/dpdkbuild/Makefile
index d51b1a6e5..0e666735d 100644
--- a/dpdkbuild/Makefile
+++ b/dpdkbuild/Makefile
@@ -166,6 +166,7 @@ all: $(SPDK_ROOT_DIR)/dpdk/build-tmp
 $(SPDK_ROOT_DIR)/dpdk/build-tmp: $(SPDK_ROOT_DIR)/mk/cc.mk
$(SPDK_ROOT_DIR)/include/spdk/config.h
$(Q)rm -rf $(SPDK_ROOT_DIR)/dpdk/build $(SPDK_ROOT_DIR)/dpdk/build-tmp
$(Q)cd "$(SPDK_ROOT_DIR)/dpdk"; CC="$(SUB_CC)" meson
--prefix="$(MESON_PREFIX)" --libdir lib -Dc_args="$(DPDK_CFLAGS)"
-Dc_link_args="$(DPDK_LDFLAGS)" $(DPDK_OPTS)
-Ddisable_drivers="$(shell echo $(DPDK_DISABLED_DRVERS) | sed -E "s/
+/,/g")" build-tmp
+   $(Q)! meson configure build-tmp | grep -qw enable_driver_sdk
|| meson configure build-tmp -Denable_driver_sdk=true
$(Q)sed $(SED_INPLACE_FLAG) 's/#define RTE_EAL_PMD_PATH
.*/#define RTE_EAL_PMD_PATH ""/g'
$(SPDK_ROOT_DIR)/dpdk/build-tmp/rte_build_config.h
$(Q) \
# TODO Meson build adds libbsd dependency when it's available.
This means any app will be \
diff --git a/lib/env_dpdk/env.mk b/lib/env_dpdk/env.mk
index cc7db8aab..e24c6942f 100644
--- a/lib/env_dpdk/env.mk
+++ b/lib/env_dpdk/env.mk
@@ -172,6 +172,12 @@ DPDK_PRIVATE_LINKER_ARGS += -lnuma
 endif
 endif

+ifneq (,$(wildcard $(DPDK_INC_DIR)/rte_build_config.h))
+ifneq (,$(shell grep -e "define RTE_HAS_LIBARCHIVE 1"
$(DPDK_INC_DIR)/rte_build_config.h))
+DPDK_PRIVATE_LINKER_ARGS += -larchive
+endif
+endif
+
 ifeq ($(OS),Linux)
 DPDK_PRIVATE_LINKER_ARGS += -ldl
 endif
diff --git a/lib/env_dpdk/env_internal.h b/lib/env_dpdk/env_internal.h
index 2303f432c..24b377545 100644
--- a/lib/env_dpdk/env_internal.h
+++ b/lib/env_dpdk/env_internal.h
@@ -43,13 +43,18 @@
 #include 
 #include 
 #include 
-#include 
 #include 

 #if RTE_VERSION < RTE_VERSION_NUM(19, 11, 0, 0)
 #error RTE_VERSION is too old! Minimum 19.11 is required.
 #endif

+#if RTE_VERSION < RTE_VERSION_NUM(21, 11, 0, 0)
+#include 
+#else
+#include 
+#endif
+
 /* x86-64 and ARM userspace virtual addresses use only the low 48 bits [0..47],
  * which is enough to cover 256 TB.
  */



-- 
David Marchand



Re: [dpdk-dev] [PATCH 1/3] ethdev: update modify field flow action

2021-10-04 Thread Ori Kam
Hi Slava,

> -Original Message-
> From: Slava Ovsiienko 
> Sent: Friday, October 1, 2021 10:52 PM
> Subject: [PATCH 1/3] ethdev: update modify field flow action
> 
> The generic modify field flow action introduced in [1] has some issues related
> to the immediate source operand:
> 
>   - immediate source can be presented either as an unsigned
> 64-bit integer or pointer to data pattern in memory.
> There was no explicit pointer field defined in the union
> 
>   - the byte ordering for 64-bit integer was not specified.
> Many fields have lesser lengths and byte ordering
> is crucial.
> 
>   - how the bit offset is applied to the immediate source
> field was not defined and documented
> 
>   - 64-bit integer size is not enough to provide MAC and

I think for mac it is enough.

> IPv6 addresses
> 
> In order to cover the issues and exclude any ambiguities the following is
> done:
> 
>   - introduce the explicit pointer field
> in rte_flow_action_modify_data structure
> 
>   - replace the 64-bit unsigned integer with 16-byte array
> 
>   - update the modify field flow action documentation
> 
> [1] commit 73b68f4c54a0 ("ethdev: introduce generic modify flow action")
> 
> Signed-off-by: Viacheslav Ovsiienko 
> ---
>  doc/guides/prog_guide/rte_flow.rst |  8 
>  doc/guides/rel_notes/release_21_11.rst |  7 +++
>  lib/ethdev/rte_flow.h  | 15 ---
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index 2b42d5ec8c..a54760a7b4 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -2835,6 +2835,14 @@ a packet to any other part of it.
>  ``value`` sets an immediate value to be used as a source or points to a
> location of the value in memory. It is used instead of ``level`` and 
> ``offset``
> for ``RTE_FLOW_FIELD_VALUE`` and ``RTE_FLOW_FIELD_POINTER``
> respectively.
> +The data in memory should be presented exactly in the same byte order
> +and length as in the relevant flow item, i.e. data for field with type
> +RTE_FLOW_FIELD_MAC_DST should follow the conventions of dst field in
> +rte_flow_item_eth structure, with type RTE_FLOW_FIELD_IPV6_SRC -
> +rte_flow_item_ipv6 conventions, and so on. The bitfield exatracted from
> +the memory being applied as second operation parameter is defined by
> +width and the destination field offset. If the field size is large than
> +16 bytes the pattern can be provided as pointer only.
> 
You should specify where the offset of the source is taken from.
Per your example, if the application wants to change the second byte of the
source MAC, it should give a 6-byte immediate value with the second byte as
the new value to set. So where does it take the offset from, since offset is
not valid in the case of an immediate value?
I assume it is based on the offset of the destination.
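
To make the question concrete, here is a minimal sketch of an immediate operand as described in the commit message (a 16-byte array plus an explicit pointer). The final structure layout is not shown in the quoted part of the patch, so toy_modify_src and toy_set_mac_immediate() are hypothetical. The sketch only shows a full 6-byte MAC stored in wire order; how the offset is applied is the open question here and is not answered by it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical source operand: a 16-byte immediate or a pointer. */
struct toy_modify_src {
	union {
		uint8_t value[16];  /* immediate, in the flow item's byte order */
		const void *pvalue; /* pattern in memory for larger fields */
	};
};

/* Store a 6-byte MAC address as an immediate, in wire order. */
static void
toy_set_mac_immediate(struct toy_modify_src *src, const uint8_t mac[6])
{
	memset(src->value, 0, sizeof(src->value));
	memcpy(src->value, mac, 6);
}
```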

>  .. _table_rte_flow_action_modify_field:
> 
> diff --git a/doc/guides/rel_notes/release_21_11.rst
> b/doc/guides/rel_notes/release_21_11.rst
> index 73e377a007..7db6cccab0 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -170,6 +170,10 @@ API Changes
>the crypto/security operation. This field will be used to communicate
>events such as soft expiry with IPsec in lookaside mode.
> 
> +* ethdev: ``rte_flow_action_modify_data`` structure udpdated, immediate
> +data
> +  array is extended, data pointer field is explicitly added to union,
> +the
> +  action behavior is defined in more strict fashion and documentation
> uddated.
> +
Uddated ->updated?
I think it is important to document here that the behavior has changed:
from setting only the relevant value to update, to setting the whole field,
with the masking done internally.

> 
>  ABI Changes
>  ---
> @@ -206,6 +210,9 @@ ABI Changes
>and hard expiry limits. Limits can be either in number of packets or bytes.
> 
> 
> +* ethdev: ``rte_flow_action_modify_data`` structure udpdated.
> +
> +
>  Known Issues
>  
> 
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index
> 7b1ed7f110..af4c693ead 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3204,6 +3204,9 @@ enum rte_flow_field_id {  };
> 
>  /**
> + * @warning
> + * @b EXPERIMENTAL: this structure may change without prior notice
> + *
>   * Field description for MODIFY_FIELD action.
>   */
>  struct rte_flow_action_modify_data {
> @@ -3217,10 +3220,16 @@ struct rte_flow_action_modify_data {
>   uint32_t offset;
>   };
>   /**
> -  * Immediate value for RTE_FLOW_FIELD_VALUE or
> -  * memory address for RTE_FLOW_FIELD_POINTER.
> +  * Immediate value for RTE_FLOW_FIELD_VALUE, presented
> in the
> +  * same byte order and length as in relevant
> rte_flow_item_xxx.

Please see my comment about how to get the offs

Re: [dpdk-dev] [PATCH v3 1/5] ethdev: add API to negotiate delivery of Rx meta data

2021-10-04 Thread Andrew Rybchenko
On 10/4/21 2:39 PM, Ivan Malov wrote:
> On 04/10/2021 09:56, Ori Kam wrote:
>>> On 04/10/2021 00:04, Ori Kam wrote:
 I understand that you are only talking about enabling the action,
 meaning to let the PMD know that at some point there will be a rule
 that will use the mark action for example.
 Is my understanding correct?
>>>
>>> Not really. The causal relationships are as follows. The application
>>> comes to
>>> realise that it will need to use, say, action MARK in flows.
>>> This, in turn, means that, in order to be able to actually see the
>>> mark in
>>> received packets, the application needs to ensure that a) the NIC
>>> will be able
>>> to deliver the mark to the PMD and b) that the PMD will be able to
>>> deliver
>>> the mark to the application. In particular, in the case of Rx mark,
>>> (b) doesn't
>>> need to be negotiated = field "mark" is anyway provisioned in the mbuf
>>> structure, so no need to enable it. But (a) needs to be negotiated.
>>> Hence this
>>> API.
>>>
>> Please see my above comment I think we both agree.
> 
> Agree to have the 4-th flag in the new API to cover this "custom / raw
> metdata" delivery? Personally, I tend to agree, but maybe Andrew can
> express his opinion, too.

Of course, it could be added, but we're not going to support it
in net/sfc. So, I think the flag should be added when a PMD
is going to support it (e.g. net/mlx5).


[dpdk-dev] [PATCH v4 2/7] ethdev: change input parameters for rx_queue_count

2021-10-04 Thread Konstantin Ananyev
Currently the majority of 'fast' ethdev ops take pointers to internal
queue data structures as an input parameter, while eth_rx_queue_count()
takes a pointer to rte_eth_dev and a queue index.
For future work to hide rte_eth_devices[] and friends, it would be
sensible to unify the parameter lists of all 'fast' ethdev ops.
This patch changes eth_rx_queue_count() to accept a pointer to internal
queue data as its input parameter.
While this change is transparent to the user, it still counts as an ABI
change, as eth_rx_queue_count_t is used by the ethdev public inline
function rte_eth_rx_queue_count().

Signed-off-by: Konstantin Ananyev 
---
 doc/guides/rel_notes/release_21_11.rst  |  6 ++
 drivers/net/ark/ark_ethdev_rx.c |  4 ++--
 drivers/net/ark/ark_ethdev_rx.h |  3 +--
 drivers/net/atlantic/atl_ethdev.h   |  2 +-
 drivers/net/atlantic/atl_rxtx.c |  9 ++---
 drivers/net/bnxt/bnxt_ethdev.c  |  8 +---
 drivers/net/dpaa/dpaa_ethdev.c  |  9 -
 drivers/net/dpaa2/dpaa2_ethdev.c|  9 -
 drivers/net/e1000/e1000_ethdev.h|  6 ++
 drivers/net/e1000/em_rxtx.c |  4 ++--
 drivers/net/e1000/igb_rxtx.c|  4 ++--
 drivers/net/enic/enic_ethdev.c  | 12 ++--
 drivers/net/fm10k/fm10k.h   |  2 +-
 drivers/net/fm10k/fm10k_rxtx.c  |  4 ++--
 drivers/net/hns3/hns3_rxtx.c|  7 +--
 drivers/net/hns3/hns3_rxtx.h|  2 +-
 drivers/net/i40e/i40e_rxtx.c|  4 ++--
 drivers/net/i40e/i40e_rxtx.h|  3 +--
 drivers/net/iavf/iavf_rxtx.c|  4 ++--
 drivers/net/iavf/iavf_rxtx.h|  2 +-
 drivers/net/ice/ice_rxtx.c  |  4 ++--
 drivers/net/ice/ice_rxtx.h  |  2 +-
 drivers/net/igc/igc_txrx.c  |  5 ++---
 drivers/net/igc/igc_txrx.h  |  3 +--
 drivers/net/ixgbe/ixgbe_ethdev.h|  3 +--
 drivers/net/ixgbe/ixgbe_rxtx.c  |  4 ++--
 drivers/net/mlx5/mlx5_rx.c  | 26 -
 drivers/net/mlx5/mlx5_rx.h  |  2 +-
 drivers/net/netvsc/hn_rxtx.c|  4 ++--
 drivers/net/netvsc/hn_var.h |  2 +-
 drivers/net/nfp/nfp_rxtx.c  |  4 ++--
 drivers/net/nfp/nfp_rxtx.h  |  3 +--
 drivers/net/octeontx2/otx2_ethdev.h |  2 +-
 drivers/net/octeontx2/otx2_ethdev_ops.c |  8 
 drivers/net/sfc/sfc_ethdev.c| 12 ++--
 drivers/net/thunderx/nicvf_ethdev.c |  3 +--
 drivers/net/thunderx/nicvf_rxtx.c   |  4 ++--
 drivers/net/thunderx/nicvf_rxtx.h   |  2 +-
 drivers/net/txgbe/txgbe_ethdev.h|  3 +--
 drivers/net/txgbe/txgbe_rxtx.c  |  4 ++--
 drivers/net/vhost/rte_eth_vhost.c   |  4 ++--
 lib/ethdev/rte_ethdev.h |  2 +-
 lib/ethdev/rte_ethdev_core.h|  3 +--
 43 files changed, 103 insertions(+), 110 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 37dc1a7786..fd80538b6c 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -213,6 +213,12 @@ ABI Changes
   ``rte_security_ipsec_xform`` to allow applications to configure SA soft
   and hard expiry limits. Limits can be either in number of packets or bytes.
 
+* ethdev: Input parameters for ``eth_rx_queue_count_t`` was changed.
+  Instead of pointer to ``rte_eth_dev`` and queue index, now it accepts pointer
+  to internal queue data as input parameter. While this change is transparent
+  to user, it still counts as an ABI change, as ``eth_rx_queue_count_t``
+  is used by  public inline function ``rte_eth_rx_queue_count``.
+
 
 Known Issues
 
diff --git a/drivers/net/ark/ark_ethdev_rx.c b/drivers/net/ark/ark_ethdev_rx.c
index d255f0177b..98658ce621 100644
--- a/drivers/net/ark/ark_ethdev_rx.c
+++ b/drivers/net/ark/ark_ethdev_rx.c
@@ -388,11 +388,11 @@ eth_ark_rx_queue_drain(struct ark_rx_queue *queue)
 }
 
 uint32_t
-eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev, uint16_t queue_id)
+eth_ark_dev_rx_queue_count(void *rx_queue)
 {
struct ark_rx_queue *queue;
 
-   queue = dev->data->rx_queues[queue_id];
+   queue = rx_queue;
return (queue->prod_index - queue->cons_index); /* mod arith */
 }
 
diff --git a/drivers/net/ark/ark_ethdev_rx.h b/drivers/net/ark/ark_ethdev_rx.h
index c8dc340a8a..859fcf1e6f 100644
--- a/drivers/net/ark/ark_ethdev_rx.h
+++ b/drivers/net/ark/ark_ethdev_rx.h
@@ -17,8 +17,7 @@ int eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev,
   unsigned int socket_id,
   const struct rte_eth_rxconf *rx_conf,
   struct rte_mempool *mp);
-uint32_t eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev,
-   uint16_t rx_queue_id);
+uint32_t eth_ark_dev_rx_queue_count(void *rx_queue);
 int eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id);
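
The ark change above can be read as a general pattern for this callback. The following is a minimal sketch of the before and after shapes, not DPDK code: `sketch_rx_queue`, `sketch_dev` and both functions are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not DPDK types. */
struct sketch_rx_queue {
	uint32_t prod_index;
	uint32_t cons_index;
};

struct sketch_dev {
	void *rx_queues[4];
};

/* Old shape: the driver callback looks the queue up from the device. */
uint32_t
sketch_count_old(struct sketch_dev *dev, uint16_t queue_id)
{
	struct sketch_rx_queue *q = dev->rx_queues[queue_id];

	return q->prod_index - q->cons_index; /* modular arithmetic */
}

/* New shape: the caller hands over the queue data pointer directly. */
uint32_t
sketch_count_new(void *rx_queue)
{
	struct sketch_rx_queue *q = rx_queue;

	return q->prod_index - q->cons_index; /* modular arithmetic */
}
```

In the new shape the driver no longer needs the device structure at all, which is presumably what lets the public inline wrapper work from queue data alone.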

[dpdk-dev] [PATCH v4 0/7] hide eth dev related structures

2021-10-04 Thread Konstantin Ananyev
v4 changes:
 - Fix secondary process attach (Pavan)
 - Fix build failure (Ferruh)
 - Update lib/ethdev/version.map (Ferruh)
   Note that moving newly added symbols from EXPERIMENTAL to DPDK_22
   section makes checkpatch.sh complain.

v3 changes:
 - Changes in public struct naming (Jerin/Haiyue)
 - Split patches
 - Update docs
 - Shamelessly included Andrew's patch:
   https://patches.dpdk.org/project/dpdk/patch/20210928154856.1015020-1-andrew.rybche...@oktetlabs.ru/
   into this series.
   I had to do a similar thing here, so decided to avoid duplicated effort.

The aim of this patch series is to make rte_ethdev core data structures
(rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback, etc.) internal to
DPDK and not visible to the user.
That should allow future possible changes to core ethdev related structures
to be transparent to the user and help to improve ABI/API stability.
Note that current ethdev API is preserved, but it is a formal ABI break.

The work is based on previous discussions at:
https://www.mail-archive.com/dev@dpdk.org/msg211405.html
https://www.mail-archive.com/dev@dpdk.org/msg216685.html
and consists of the following main points:
1. Copy public 'fast' function pointers (rx_pkt_burst(), etc.) and
   related data pointer from rte_eth_dev into a separate flat array.
   We keep it public to still be able to use inline functions for these
   'fast' calls (like rte_eth_rx_burst(), etc.) to avoid/minimize slowdown.
   Note that apart from the function pointers themselves, each element of
   this flat array also contains two opaque pointers for each ethdev:
   1) a pointer to an array of internal queue data pointers
   2) a pointer to an array of queue callback data pointers.
   Note that exposing this extra information allows us to avoid extra
   changes inside PMD level, plus should help to avoid possible
   performance degradation.
2. Change implementation of 'fast' inline ethdev functions
   (rte_eth_rx_burst(), etc.) to use new public flat array.
   While it is an ABI breakage, this change is intended to be transparent
   for both users (no changes in user app is required) and PMD developers
   (no changes in PMD is required).
   One extra note: with the new implementation, RX/TX callback invocation
   costs one extra function call. That might cause some slowdown for
   code paths where RX/TX callbacks are heavily involved.
   Hope such a trade-off is acceptable for the community.
3. Move rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback and related
   things into internal header: <ethdev_driver.h>.

That approach was selected to:
  - Avoid(/minimize) possible performance losses.
  - Minimize required changes inside PMDs.
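
Points 1 and 2 above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the series: every `sketch_` name is hypothetical, packets are plain `void *`, and the NULL queue check is unconditional here whereas the patch only performs it in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4

/* Hypothetical burst callback: takes opaque queue data, fills pkts[]. */
typedef uint16_t (*sketch_rx_burst_t)(void *rxq, void **pkts, uint16_t nb_pkts);

/* One flat-array element per port: fast-path pointer plus opaque data. */
struct sketch_fp_ops {
	sketch_rx_burst_t rx_pkt_burst;
	void **rxq_data; /* array of per-queue driver data pointers */
	void **rxq_clbk; /* array of per-queue callback list heads (unused here) */
};

struct sketch_fp_ops sketch_fp_ops[SKETCH_MAX_PORTS];

/* Public inline wrapper: one array index, no access to the device struct. */
static inline uint16_t
sketch_rx_burst(uint16_t port_id, uint16_t queue_id, void **pkts,
		uint16_t nb_pkts)
{
	const struct sketch_fp_ops *p = &sketch_fp_ops[port_id];
	void *qd = p->rxq_data[queue_id];

	if (qd == NULL)
		return 0; /* queue not set up */
	return p->rx_pkt_burst(qd, pkts, nb_pkts);
}

/* Toy driver: "receives" as many packets as the queue has pending. */
struct sketch_queue {
	uint16_t pending;
};

uint16_t
sketch_drv_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	struct sketch_queue *q = rxq;
	uint16_t n = q->pending < nb_pkts ? q->pending : nb_pkts;
	uint16_t i;

	for (i = 0; i < n; i++)
		pkts[i] = q; /* placeholder payload */
	q->pending = (uint16_t)(q->pending - n);
	return n;
}
```

The application only ever sees the flat array element and opaque pointers, so the layout of the device structure behind it can change without touching the inline wrapper.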
 
Performance testing results (ICX 2.0GHz, E810 (ice)):
 - testpmd macswap fwd mode, plus
   a) no RX/TX callbacks:
  no actual slowdown observed
   b) bpf-load rx 0 0 JM ./dpdk.org/examples/bpf/t3.o:
  ~2% slowdown
 - l3fwd: no actual slowdown observed

Would like to thank everyone who already reviewed and tested previous
versions of this series. All other interested parties, please don't be shy
and provide your feedback.

Konstantin Ananyev (7):
  ethdev: allocate max space for internal queue array
  ethdev: change input parameters for rx_queue_count
  ethdev: copy ethdev 'fast' API into separate structure
  ethdev: make burst functions to use new flat array
  ethdev: add API to retrieve multiple ethernet addresses
  ethdev: remove legacy Rx descriptor done API
  ethdev: hide eth dev related structures

 app/test-pmd/config.c |  23 +-
 doc/guides/nics/features.rst  |   6 +-
 doc/guides/rel_notes/deprecation.rst  |   5 -
 doc/guides/rel_notes/release_21_11.rst|  21 ++
 drivers/common/octeontx2/otx2_sec_idev.c  |   2 +-
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |   2 +-
 drivers/net/ark/ark_ethdev_rx.c   |   4 +-
 drivers/net/ark/ark_ethdev_rx.h   |   3 +-
 drivers/net/atlantic/atl_ethdev.h |   2 +-
 drivers/net/atlantic/atl_rxtx.c   |   9 +-
 drivers/net/bnxt/bnxt_ethdev.c|   8 +-
 drivers/net/cxgbe/base/adapter.h  |   2 +-
 drivers/net/dpaa/dpaa_ethdev.c|   9 +-
 drivers/net/dpaa2/dpaa2_ethdev.c  |   9 +-
 drivers/net/dpaa2/dpaa2_ptp.c |   2 +-
 drivers/net/e1000/e1000_ethdev.h  |  10 +-
 drivers/net/e1000/em_ethdev.c |   1 -
 drivers/net/e1000/em_rxtx.c   |  21 +-
 drivers/net/e1000/igb_ethdev.c|   2 -
 drivers/net/e1000/igb_rxtx.c  |  21 +-
 drivers/net/enic/enic_ethdev.c|  12 +-
 drivers/net/fm10k/fm10k.h |   5 +-
 drivers/net/fm10k/fm10k_ethdev.c  |   1 -
 drivers/net/fm10k/fm10k_rxtx.c|  29 +-
 drivers/net/hns3/hns3_rxtx.c  |   7 +-
 drivers/net/hns3/hns3_rxtx.h  |   2 +-
 drivers/net/i40e/i40e_ethdev.c   

[dpdk-dev] [PATCH v4 4/7] ethdev: make burst functions to use new flat array

2021-10-04 Thread Konstantin Ananyev
Rework 'fast' burst functions to use rte_eth_fp_ops[].
While it is an API/ABI breakage, this change is intended to be
transparent for both users (no changes in user app is required) and
PMD developers (no changes in PMD is required).
One extra thing to note: RX/TX callback invocation will incur an extra
function call with these changes. That might cause some insignificant
slowdown for code paths where RX/TX callbacks are heavily involved.
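
To make the extra function call concrete, here is a small self-contained sketch of an out-of-line helper that walks a callback list after the driver burst returns. It mirrors the shape of the epilog helper in this patch but uses hypothetical `sketch_` types and plain `void *` packets, so it is an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX callback signature, modelled loosely on the patch. */
typedef uint16_t (*sketch_rx_cb_fn)(uint16_t port_id, uint16_t queue_id,
		void **pkts, uint16_t nb_rx, uint16_t max_pkts, void *param);

struct sketch_cb {
	struct sketch_cb *next;
	sketch_rx_cb_fn fn;
	void *param;
};

/* Out-of-line helper: walks the list; each callback may change the count. */
uint16_t
sketch_rx_epilog(uint16_t port_id, uint16_t queue_id, void **pkts,
		uint16_t nb_rx, uint16_t max_pkts, void *opaque)
{
	const struct sketch_cb *cb = opaque;

	while (cb != NULL) {
		nb_rx = cb->fn(port_id, queue_id, pkts, nb_rx, max_pkts,
				cb->param);
		cb = cb->next;
	}
	return nb_rx;
}

/* Example callback: adds nb_rx to *param, then drops the last packet. */
uint16_t
sketch_drop_last(uint16_t port_id, uint16_t queue_id, void **pkts,
		uint16_t nb_rx, uint16_t max_pkts, void *param)
{
	(void)port_id;
	(void)queue_id;
	(void)pkts;
	(void)max_pkts;
	*(uint32_t *)param += nb_rx;
	return nb_rx > 0 ? (uint16_t)(nb_rx - 1) : 0;
}
```

With no callbacks installed the list head is NULL and the helper returns the count unchanged, so the cost is confined to queues that actually use callbacks.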

Signed-off-by: Konstantin Ananyev 
---
 lib/ethdev/ethdev_private.c |  31 +
 lib/ethdev/rte_ethdev.h | 242 ++--
 lib/ethdev/version.map  |   5 +
 3 files changed, 210 insertions(+), 68 deletions(-)

diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 3eeda6e9f9..27d29b2ac6 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -226,3 +226,34 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
fpo->txq.data = dev->data->tx_queues;
fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
 }
+
+uint16_t
+__rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id,
+   struct rte_mbuf **rx_pkts, uint16_t nb_rx, uint16_t nb_pkts,
+   void *opaque)
+{
+   const struct rte_eth_rxtx_callback *cb = opaque;
+
+   while (cb != NULL) {
+   nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
+   nb_pkts, cb->param);
+   cb = cb->next;
+   }
+
+   return nb_rx;
+}
+
+uint16_t
+__rte_eth_tx_prolog(uint16_t port_id, uint16_t queue_id,
+   struct rte_mbuf **tx_pkts, uint16_t nb_pkts, void *opaque)
+{
+   const struct rte_eth_rxtx_callback *cb = opaque;
+
+   while (cb != NULL) {
+   nb_pkts = cb->fn.tx(port_id, queue_id, tx_pkts, nb_pkts,
+   cb->param);
+   cb = cb->next;
+   }
+
+   return nb_pkts;
+}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 9642b7c00f..7f68be406e 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4904,6 +4904,33 @@ int rte_eth_representor_info_get(uint16_t port_id,
 
 #include 
 
+/**
+ * @internal
+ * Helper routine for eth driver rx_burst API.
+ * Should be called at exit from PMD's rte_eth_rx_bulk implementation.
+ * Does necessary post-processing - invokes RX callbacks if any, etc.
+ *
+ * @param port_id
+ *  The port identifier of the Ethernet device.
+ * @param queue_id
+ *  The index of the receive queue from which to retrieve input packets.
+ * @param rx_pkts
+ *   The address of an array of pointers to *rte_mbuf* structures that
+ *   have been retrieved from the device.
+ * @param nb_rx
+ *   The number of packets that were retrieved from the device.
+ * @param nb_pkts
+ *   The number of elements in *rx_pkts* array.
+ * @param opaque
+ *   Opaque pointer of RX queue callback related data.
+ *
+ * @return
+ *  The number of packets effectively supplied to the *rx_pkts* array.
+ */
+uint16_t __rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id,
+   struct rte_mbuf **rx_pkts, uint16_t nb_rx, uint16_t nb_pkts,
+   void *opaque);
+
 /**
  *
  * Retrieve a burst of input packets from a receive queue of an Ethernet
@@ -4995,23 +5022,37 @@ static inline uint16_t
 rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
 {
-   struct rte_eth_dev *dev = &rte_eth_devices[port_id];
uint16_t nb_rx;
+   struct rte_eth_fp_ops *p;
+   void *cb, *qd;
+
+#ifdef RTE_ETHDEV_DEBUG_RX
+   if (port_id >= RTE_MAX_ETHPORTS ||
+   queue_id >= RTE_MAX_QUEUES_PER_PORT) {
+   RTE_ETHDEV_LOG(ERR,
+   "Invalid port_id=%u or queue_id=%u\n",
+   port_id, queue_id);
+   return 0;
+   }
+#endif
+
+   /* fetch pointer to queue data */
+   p = &rte_eth_fp_ops[port_id];
+   qd = p->rxq.data[queue_id];
 
 #ifdef RTE_ETHDEV_DEBUG_RX
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
-   RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, 0);
 
-   if (queue_id >= dev->data->nb_rx_queues) {
-   RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
+   if (qd == NULL) {
+   RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u for port_id=%u\n",
+   queue_id, port_id);
return 0;
}
 #endif
-   nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
-rx_pkts, nb_pkts);
+
+   nb_rx = p->rx_pkt_burst(qd, rx_pkts, nb_pkts);
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
-   struct rte_eth_rxtx_callback *cb;
 
/* __ATOMIC_RELEASE memory order was used when the
 * call back was inserted into the list.
@@ -5019,16 +5060,10 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
 * not required.
   

[dpdk-dev] [PATCH v4 7/7] ethdev: hide eth dev related structures

2021-10-04 Thread Konstantin Ananyev
Move rte_eth_dev, rte_eth_dev_data, rte_eth_rxtx_callback and related
data into private header (ethdev_driver.h).
Few minor changes to keep DPDK building after that.

Signed-off-by: Konstantin Ananyev 
---
 doc/guides/rel_notes/release_21_11.rst|   6 +
 drivers/common/octeontx2/otx2_sec_idev.c  |   2 +-
 drivers/crypto/octeontx2/otx2_cryptodev_ops.c |   2 +-
 drivers/net/cxgbe/base/adapter.h  |   2 +-
 drivers/net/dpaa2/dpaa2_ptp.c |   2 +-
 drivers/net/netvsc/hn_var.h   |   1 +
 lib/ethdev/ethdev_driver.h| 149 ++
 lib/ethdev/rte_ethdev_core.h  | 143 -
 lib/ethdev/version.map|   2 +-
 lib/eventdev/rte_event_eth_rx_adapter.c   |   2 +-
 lib/eventdev/rte_event_eth_tx_adapter.c   |   2 +-
 lib/eventdev/rte_eventdev.c   |   2 +-
 lib/metrics/rte_metrics_telemetry.c   |   2 +-
 13 files changed, 165 insertions(+), 152 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 601443..2944149943 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -228,6 +228,12 @@ ABI Changes
   to user, it still counts as an ABI change, as ``eth_rx_queue_count_t``
   is used by  public inline function ``rte_eth_rx_queue_count``.
 
+* ethdev: Made ``rte_eth_dev``, ``rte_eth_dev_data``, ``rte_eth_rxtx_callback``
+  private data structures. ``rte_eth_devices[]`` can no longer be accessed
+  directly by the user. While it is an ABI breakage, this change is intended
+  to be transparent for both users (no changes in user apps are required) and
+  PMD developers (no changes in PMDs are required).
+
 
 Known Issues
 
diff --git a/drivers/common/octeontx2/otx2_sec_idev.c 
b/drivers/common/octeontx2/otx2_sec_idev.c
index 6e9643c383..b561b67174 100644
--- a/drivers/common/octeontx2/otx2_sec_idev.c
+++ b/drivers/common/octeontx2/otx2_sec_idev.c
@@ -4,7 +4,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include "otx2_common.h"
diff --git a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c 
b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
index 37fad11d91..f0b72e05c2 100644
--- a/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
+++ b/drivers/crypto/octeontx2/otx2_cryptodev_ops.c
@@ -6,7 +6,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include "otx2_cryptodev.h"
diff --git a/drivers/net/cxgbe/base/adapter.h b/drivers/net/cxgbe/base/adapter.h
index 01a2a9d147..1c7c8afe16 100644
--- a/drivers/net/cxgbe/base/adapter.h
+++ b/drivers/net/cxgbe/base/adapter.h
@@ -12,7 +12,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "../cxgbe_compat.h"
 #include "../cxgbe_ofld.h"
diff --git a/drivers/net/dpaa2/dpaa2_ptp.c b/drivers/net/dpaa2/dpaa2_ptp.c
index 899dd5d442..8d79e39244 100644
--- a/drivers/net/dpaa2/dpaa2_ptp.c
+++ b/drivers/net/dpaa2/dpaa2_ptp.c
@@ -10,7 +10,7 @@
 #include 
 #include 
 
-#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index 2a2bac9338..74e6e6010d 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 
 /*
  * Tunable ethdev params
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index cc2c75261c..63b04dce32 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -17,6 +17,155 @@
 
 #include 
 
+/**
+ * @internal
+ * Structure used to hold information about the callbacks to be called for a
+ * queue on RX and TX.
+ */
+struct rte_eth_rxtx_callback {
+   struct rte_eth_rxtx_callback *next;
+   union{
+   rte_rx_callback_fn rx;
+   rte_tx_callback_fn tx;
+   } fn;
+   void *param;
+};
+
+/**
+ * @internal
+ * The generic data structure associated with each ethernet device.
+ *
+ * Pointers to burst-oriented packet receive and transmit functions are
+ * located at the beginning of the structure, along with the pointer to
+ * where all the data elements for the particular device are stored in shared
+ * memory. This split allows the function pointer and driver data to be per-
+ * process, while the actual configuration data for the device is shared.
+ */
+struct rte_eth_dev {
+   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
+   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
+   eth_tx_prep_t tx_pkt_prepare;
+   /**< Pointer to PMD transmit prepare function. */
+   eth_rx_queue_count_t rx_queue_count;
+   /**< Get the number of used RX descriptors. */
+   eth_rx_descriptor_status_t rx_descriptor_status;
+   /**< Check the status of a Rx descriptor. */
+   eth_tx_descriptor_status_t tx_descriptor_status;
+   /**< Check the status of a Tx descriptor. */
+
+   /**
+* Next two fiel

[dpdk-dev] [PATCH v4 5/7] ethdev: add API to retrieve multiple ethernet addresses

2021-10-04 Thread Konstantin Ananyev
Introduce rte_eth_macaddrs_get() to allow user to retrieve all ethernet
addresses assigned to given port.
Change testpmd to use this new function and avoid referencing directly
rte_eth_devices[].
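
The clamp-and-copy behaviour described here can be sketched without DPDK as below. This is an assumption-labelled illustration: `sketch_port`, `sketch_ether_addr` and both functions are hypothetical stand-ins, and the real function takes a port id rather than a port pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real API uses a port id and DPDK types. */
struct sketch_ether_addr {
	uint8_t addr_bytes[6];
};

struct sketch_port {
	const struct sketch_ether_addr *mac_addrs;
	uint32_t max_mac_addrs;
};

/* Copy at most num addresses; return how many were copied. */
int
sketch_macaddrs_get(const struct sketch_port *port,
		struct sketch_ether_addr ma[], uint32_t num)
{
	if (port == NULL)
		return -ENODEV;
	if (ma == NULL)
		return -EINVAL;
	if (num > port->max_mac_addrs)
		num = port->max_mac_addrs;
	memcpy(ma, port->mac_addrs, num * sizeof(ma[0]));
	return (int)num;
}

/* Count non-zero entries, as a caller printing the table might do. */
uint32_t
sketch_count_nonzero(const struct sketch_ether_addr ma[], int n)
{
	static const struct sketch_ether_addr zero;
	uint32_t used = 0;
	int i;

	for (i = 0; i < n; i++)
		if (memcmp(&ma[i], &zero, sizeof(zero)) != 0)
			used++;
	return used;
}
```

Because the return value is the number of entries copied, the caller can loop over exactly that many entries and skip the all-zero slots.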

Signed-off-by: Konstantin Ananyev 
---
 app/test-pmd/config.c  | 23 +++
 doc/guides/rel_notes/release_21_11.rst |  5 +
 lib/ethdev/rte_ethdev.c| 25 +
 lib/ethdev/rte_ethdev.h| 19 +++
 lib/ethdev/version.map |  3 +++
 5 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 9c66329e96..7221644230 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -5215,20 +5215,20 @@ show_macs(portid_t port_id)
 {
char buf[RTE_ETHER_ADDR_FMT_SIZE];
struct rte_eth_dev_info dev_info;
-   struct rte_ether_addr *addr;
-   uint32_t i, num_macs = 0;
-   struct rte_eth_dev *dev;
-
-   dev = &rte_eth_devices[port_id];
+   int32_t i, rc, num_macs = 0;
 
if (eth_dev_info_get_print_err(port_id, &dev_info))
return;
 
-   for (i = 0; i < dev_info.max_mac_addrs; i++) {
-   addr = &dev->data->mac_addrs[i];
+   struct rte_ether_addr addr[dev_info.max_mac_addrs];
+   rc = rte_eth_macaddrs_get(port_id, addr, dev_info.max_mac_addrs);
+   if (rc < 0)
+   return;
+
+   for (i = 0; i < rc; i++) {
 
/* skip zero address */
-   if (rte_is_zero_ether_addr(addr))
+   if (rte_is_zero_ether_addr(&addr[i]))
continue;
 
num_macs++;
@@ -5236,14 +5236,13 @@ show_macs(portid_t port_id)
 
printf("Number of MAC address added: %d\n", num_macs);
 
-   for (i = 0; i < dev_info.max_mac_addrs; i++) {
-   addr = &dev->data->mac_addrs[i];
+   for (i = 0; i < rc; i++) {
 
/* skip zero address */
-   if (rte_is_zero_ether_addr(addr))
+   if (rte_is_zero_ether_addr(&addr[i]))
continue;
 
-   rte_ether_format_addr(buf, RTE_ETHER_ADDR_FMT_SIZE, addr);
+   rte_ether_format_addr(buf, RTE_ETHER_ADDR_FMT_SIZE, &addr[i]);
printf("  %s\n", buf);
}
 }
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index fd80538b6c..91c392c14e 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -125,6 +125,11 @@ New Features
   * Added tests to validate packets hard expiry.
   * Added tests to verify tunnel header verification in IPsec inbound.
 
+* **Add new function into ethdev lib.**
+
+  * Added ``rte_eth_macaddrs_get`` to allow user to retrieve all Ethernet
+addresses assigned to given ethernet port.
+
 
 Removed Items
 -
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 036c82cbfb..b051eff70e 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -3574,6 +3574,31 @@ rte_eth_dev_set_ptypes(uint16_t port_id, uint32_t 
ptype_mask,
return ret;
 }
 
+int
+rte_eth_macaddrs_get(uint16_t port_id, struct rte_ether_addr ma[], uint32_t 
num)
+{
+   int32_t ret;
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   ret = rte_eth_dev_info_get(port_id, &dev_info);
+   if (ret != 0)
+   return ret;
+
+   if (ma == NULL) {
+   RTE_ETHDEV_LOG(ERR, "%s: invalid parameters\n", __func__);
+   return -EINVAL;
+   }
+
+   num = RTE_MIN(dev_info.max_mac_addrs, num);
+   memcpy(ma, dev->data->mac_addrs, num * sizeof(ma[0]));
+
+   return num;
+}
+
 int
 rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr)
 {
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 7f68be406e..047f7c9c5a 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -3037,6 +3037,25 @@ int rte_eth_dev_set_rx_queue_stats_mapping(uint16_t 
port_id,
  */
 int rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr);
 
+/**
+ * Retrieve the Ethernet addresses of an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param ma
+ *   A pointer to an array of structures of type *ether_addr* to be filled with
+ *   the Ethernet addresses of the Ethernet device.
+ * @param num
+ *   Number of elements in the *ma* array.
+ * @return
+ *   - number of retrieved addresses if successful
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_macaddrs_get(uint16_t port_id, struct rte_ether_addr ma[],
+   uint32_t num);
+
 /**
  * Retrieve the contextual information of an Ethernet device.
  *
diff --git a/lib/ethdev/version.map b/lib/ethde

[dpdk-dev] [PATCH v4 6/7] ethdev: remove legacy Rx descriptor done API

2021-10-04 Thread Konstantin Ananyev
rte_eth_rx_descriptor_status() should be used as a replacement.

Signed-off-by: Andrew Rybchenko 
Reviewed-by: Ferruh Yigit 
Acked-by: Konstantin Ananyev 
---
 doc/guides/nics/features.rst|  6 +-
 doc/guides/rel_notes/deprecation.rst|  5 -
 doc/guides/rel_notes/release_21_11.rst  |  4 
 drivers/net/e1000/e1000_ethdev.h|  4 
 drivers/net/e1000/em_ethdev.c   |  1 -
 drivers/net/e1000/em_rxtx.c | 17 
 drivers/net/e1000/igb_ethdev.c  |  2 --
 drivers/net/e1000/igb_rxtx.c| 17 
 drivers/net/fm10k/fm10k.h   |  3 ---
 drivers/net/fm10k/fm10k_ethdev.c|  1 -
 drivers/net/fm10k/fm10k_rxtx.c  | 25 
 drivers/net/i40e/i40e_ethdev.c  |  1 -
 drivers/net/i40e/i40e_ethdev_vf.c   |  1 -
 drivers/net/i40e/i40e_rxtx.c| 26 -
 drivers/net/i40e/i40e_rxtx.h|  1 -
 drivers/net/igc/igc_ethdev.c|  1 -
 drivers/net/igc/igc_txrx.c  | 18 -
 drivers/net/igc/igc_txrx.h  |  2 --
 drivers/net/ixgbe/ixgbe_ethdev.c|  2 --
 drivers/net/ixgbe/ixgbe_ethdev.h|  2 --
 drivers/net/ixgbe/ixgbe_rxtx.c  | 18 -
 drivers/net/octeontx2/otx2_ethdev.c |  1 -
 drivers/net/octeontx2/otx2_ethdev.h |  1 -
 drivers/net/octeontx2/otx2_ethdev_ops.c | 12 
 drivers/net/sfc/sfc_ethdev.c| 17 
 drivers/net/virtio/virtio_ethdev.c  |  1 -
 lib/ethdev/rte_ethdev.c |  1 -
 lib/ethdev/rte_ethdev.h | 25 
 lib/ethdev/rte_ethdev_core.h|  4 
 29 files changed, 5 insertions(+), 214 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 4fce8cd1c9..a02ef25409 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -662,14 +662,10 @@ Rx descriptor status
 
 
 Supports check the status of a Rx descriptor. When ``rx_descriptor_status`` is
-used, status can be "Available", "Done" or "Unavailable". When
-``rx_descriptor_done`` is used, status can be "DD bit is set" or "DD bit is
-not set".
+used, status can be "Available", "Done" or "Unavailable".
 
 * **[implements] rte_eth_dev**: ``rx_descriptor_status``.
 * **[related]API**: ``rte_eth_rx_descriptor_status()``.
-* **[implements] rte_eth_dev**: ``rx_descriptor_done``.
-* **[related]API**: ``rte_eth_rx_descriptor_done()``.
 
 
 .. _nic_features_tx_descriptor_status:
diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 05fc2fdee7..82e843a0b3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -106,11 +106,6 @@ Deprecation Notices
   the device packet overhead can be calculated as:
   ``(struct rte_eth_dev_info).max_rx_pktlen - (struct 
rte_eth_dev_info).max_mtu``
 
-* ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
-  will be removed in 21.11.
-  Existing ``rte_eth_rx_descriptor_status`` and 
``rte_eth_tx_descriptor_status``
-  APIs can be used as replacement.
-
 * ethdev: The port mirroring API can be replaced with a more fine grain flow 
API.
   The structs ``rte_eth_mirror_conf``, ``rte_eth_vlan_mirror`` and the 
functions
   ``rte_eth_mirror_rule_set``, ``rte_eth_mirror_rule_reset`` will be marked
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index 91c392c14e..601443 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -150,6 +150,10 @@ Removed Items
   blacklist/whitelist are removed. Users must use the new
   block/allow list arguments.
 
+* ethdev: Removed ``rx_descriptor_done`` dev_ops and
+  ``rte_eth_rx_descriptor_done``.  Existing ``rte_eth_rx_descriptor_status``
+  APIs can be used as a replacement.
+
 
 API Changes
 ---
diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 460e130a83..fff52958df 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -401,8 +401,6 @@ int eth_igb_rx_queue_setup(struct rte_eth_dev *dev, 
uint16_t rx_queue_id,
 
 uint32_t eth_igb_rx_queue_count(void *rx_queue);
 
-int eth_igb_rx_descriptor_done(void *rx_queue, uint16_t offset);
-
 int eth_igb_rx_descriptor_status(void *rx_queue, uint16_t offset);
 int eth_igb_tx_descriptor_status(void *tx_queue, uint16_t offset);
 
@@ -477,8 +475,6 @@ int eth_em_rx_queue_setup(struct rte_eth_dev *dev, uint16_t 
rx_queue_id,
 
 uint32_t eth_em_rx_queue_count(void *rx_queue);
 
-int eth_em_rx_descriptor_done(void *rx_queue, uint16_t offset);
-
 int eth_em_rx_descriptor_status(void *rx_queue, uint16_t offset);
 int eth_em_tx_descriptor_status(void *tx_queue, uint16_t offset);
 
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a0c

Re: [dpdk-dev] [EXT] Re: [PATCH v1 2/7] eal/interrupts: implement get set APIs

2021-10-04 Thread Harman Kalra



> -Original Message-
> From: Dmitry Kozlyuk 
> Sent: Monday, October 4, 2021 4:48 PM
> To: Harman Kalra 
> Cc: dev@dpdk.org; Ray Kinsella ; David Marchand
> 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v1 2/7] eal/interrupts: implement
> get set APIs
> 
> 2021-10-04 10:37 (UTC+), Harman Kalra:
> > [...]
> > > > +struct rte_intr_handle *rte_intr_handle_instance_index_get(
> > > > +   struct rte_intr_handle *intr_handle, int
> > > index)
> > >
> > > If rte_intr_handle_instance_alloc() returns a pointer to an array,
> > > this function is useless since the user can simply manipulate a pointer.
> >
> >  The user won't be able to manipulate the pointer, as they are not aware
> > of the size of struct rte_intr_handle.
> > They will observe a "dereferencing pointer to incomplete type" compilation
> > error.
> 
> Sorry, my bad.
> 
> > > If we want to make a distinction between a single struct
> > > rte_intr_handle and a commonly allocated bunch of such (but why?),
> > > then they should be represented by distinct types.
> >
> >  Do you mean, we should have separate APIs for single allocation
> > and batch allocation? As get API will be useful only in case of batch
> > allocation. Currently interrupt autotests and ifpga_rawdev driver makes
> batch allocation.
> > I think common API for single and batch is fine, get API is required for
> returning a particular intr_handle instance.
> > But one problem I see in current implementation is there should be
> > upper limit check for index in get/set API, which I will fix.
> 
> I don't think we need different APIs, I was asking if it was your intention.
> Now I understand it and agree with you.
> 
> > > > +int rte_intr_handle_instance_index_set(struct rte_intr_handle
> > > *intr_handle,
> > > > +  const struct rte_intr_handle 
> > > > *src,
> > > > +  int index)
> > >
> > > See above regarding the "index" parameter. If it can be removed, a
> > > better name for this function would be rte_intr_handle_copy().
> >
> >  I think get API is required.
> 
> Maybe index is still not needed: "intr_handle" can just be a pointer to the
> right item obtained with rte_intr_handle_instance_index_get(). This way you
> also don't need to duplicate the index-checking logic.

In the current implementation, batch allocation of interrupt handles may lead
to a memory leak when freeing the efds and elist arrays: I am only freeing
efds/elist for intr_handle[0] in rte_intr_handle_instance_free().
To free efds/elist of all the intr_handles[], I would have to cache the size
parameter passed during alloc. But where should I store it? In the first
instance of struct rte_intr_handle? I don't think that would be a good idea.

Since batch allocation is only done in the test suite and ifpga_rawdev.c, to
keep things simpler let's restrict rte_intr_handle_instance_alloc() to single
instance allocation; the user can call this API in a loop and maintain the
array of handles locally.
With this approach the get_index API is not required and the set_index API can
be renamed to rte_intr_handle_copy().

Thoughts?

Thanks
Harman



Re: [dpdk-dev] [EXT] Re: [PATCH v1 6/7] eal/interrupts: make interrupt handle structure opaque

2021-10-04 Thread Harman Kalra
Hi Dmitry,

Please find my comments inline.

> -Original Message-
> From: Dmitry Kozlyuk 
> Sent: Sunday, October 3, 2021 11:46 PM
> To: Harman Kalra 
> Cc: dev@dpdk.org; Anatoly Burakov 
> Subject: [EXT] Re: [dpdk-dev] [PATCH v1 6/7] eal/interrupts: make interrupt
> handle structure opaque
> 
> External Email
> 
> --
> 2021-09-03 18:11 (UTC+0530), Harman Kalra:
> > [...]
> > @@ -31,11 +54,40 @@ struct rte_intr_handle
> *rte_intr_handle_instance_alloc(int size,
> > }
> >
> > for (i = 0; i < size; i++) {
> > +   if (from_hugepage)
> > +   intr_handle[i].efds = rte_zmalloc(NULL,
> > +   RTE_MAX_RXTX_INTR_VEC_ID *
> sizeof(uint32_t), 0);
> > +   else
> > +   intr_handle[i].efds = calloc(1,
> > +  RTE_MAX_RXTX_INTR_VEC_ID *
> sizeof(uint32_t));
> > +   if (!intr_handle[i].efds) {
> > +   RTE_LOG(ERR, EAL, "Fail to allocate event fd list\n");
> > +   rte_errno = ENOMEM;
> > +   goto fail;
> > +   }
> > +
> > +   if (from_hugepage)
> > +   intr_handle[i].elist = rte_zmalloc(NULL,
> > +   RTE_MAX_RXTX_INTR_VEC_ID *
> > +   sizeof(struct rte_epoll_event), 0);
> > +   else
> > +   intr_handle[i].elist = calloc(1,
> > +   RTE_MAX_RXTX_INTR_VEC_ID *
> > +   sizeof(struct rte_epoll_event));
> > +   if (!intr_handle[i].elist) {
> > +   RTE_LOG(ERR, EAL, "fail to allocate event fd list\n");
> > +   rte_errno = ENOMEM;
> > +   goto fail;
> > +   }
> > intr_handle[i].nb_intr = RTE_MAX_RXTX_INTR_VEC_ID;
> > intr_handle[i].alloc_from_hugepage = from_hugepage;
> > }
> >
> > return intr_handle;
> > +fail:
> > +   free(intr_handle->efds);
> > +   free(intr_handle);
> > +   return NULL;
> 
> This is incorrect if "from_hugepage" is set.

 Ack, will fix it.


> 
> >  }
> >
> >  struct rte_intr_handle *rte_intr_handle_instance_index_get(
> > @@ -73,12 +125,48 @@ int rte_intr_handle_instance_index_set(struct
> rte_intr_handle *intr_handle,
> > }
> >
> > intr_handle[index].fd = src->fd;
> > -   intr_handle[index].vfio_dev_fd = src->vfio_dev_fd;
> > +   intr_handle[index].dev_fd = src->dev_fd;
> > +
> > intr_handle[index].type = src->type;
> > intr_handle[index].max_intr = src->max_intr;
> > intr_handle[index].nb_efd = src->nb_efd;
> > intr_handle[index].efd_counter_size = src->efd_counter_size;
> >
> > +   if (intr_handle[index].nb_intr != src->nb_intr) {
> > +   if (src->alloc_from_hugepage)
> > +   intr_handle[index].efds =
> > +   rte_realloc(intr_handle[index].efds,
> > +   src->nb_intr *
> > +   sizeof(uint32_t), 0);
> > +   else
> > +   intr_handle[index].efds =
> > +   realloc(intr_handle[index].efds,
> > +   src->nb_intr * sizeof(uint32_t));
> > +   if (intr_handle[index].efds == NULL) {
> > +   RTE_LOG(ERR, EAL, "Failed to realloc the efds list");
> > +   rte_errno = ENOMEM;
> > +   goto fail;
> > +   }
> > +
> > +   if (src->alloc_from_hugepage)
> > +   intr_handle[index].elist =
> > +   rte_realloc(intr_handle[index].elist,
> > +   src->nb_intr *
> > +   sizeof(struct rte_epoll_event), 0);
> > +   else
> > +   intr_handle[index].elist =
> > +   realloc(intr_handle[index].elist,
> > +   src->nb_intr *
> > +   sizeof(struct rte_epoll_event));
> > +   if (intr_handle[index].elist == NULL) {
> > +   RTE_LOG(ERR, EAL, "Failed to realloc the event list");
> > +   rte_errno = ENOMEM;
> > +   goto fail;
> > +   }
> > +
> > +   intr_handle[index].nb_intr = src->nb_intr;
> > +   }
> > +
> 
> This implementation leaves "intr_handle" in an invalid state and leaks
> memory on error paths.

 Yes, I will get the reallocated pointer in a tmp variable and will update
intr_handle[index].elist/efds only after all error paths are cleared.
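
One way to get that all-or-nothing behaviour is sketched below. Note the swapped-in technique: instead of realloc() into temporaries, this sketch allocates two fresh buffers, copies, and only then swaps, so a failure of either allocation leaves the handle untouched. The `sketch_` types are hypothetical, and only the plain heap path is shown (no hugepage allocator).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the handle and its per-vector arrays. */
struct sketch_event {
	int fd;
	void *data;
};

struct sketch_handle {
	uint32_t *efds;
	struct sketch_event *elist;
	size_t nb_intr;
};

/* Resize both arrays together; on any failure the handle is unchanged. */
int
sketch_handle_resize(struct sketch_handle *h, size_t new_nb)
{
	uint32_t *efds;
	struct sketch_event *elist;
	size_t keep;

	if (h == NULL || new_nb == 0)
		return -EINVAL;

	efds = calloc(new_nb, sizeof(*efds));
	elist = calloc(new_nb, sizeof(*elist));
	if (efds == NULL || elist == NULL) {
		free(efds);
		free(elist);
		return -ENOMEM; /* h still points at the old, valid arrays */
	}

	/* Preserve as many existing entries as fit in the new arrays. */
	keep = h->nb_intr < new_nb ? h->nb_intr : new_nb;
	if (keep > 0) {
		memcpy(efds, h->efds, keep * sizeof(*efds));
		memcpy(elist, h->elist, keep * sizeof(*elist));
	}

	/* Commit: nothing can fail past this point. */
	free(h->efds);
	free(h->elist);
	h->efds = efds;
	h->elist = elist;
	h->nb_intr = new_nb;
	return 0;
}
```

The sizes passed to memcpy() are element counts multiplied by the element size, which is worth keeping in mind when copying the arrays between handles as well.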

> 
> > memcpy(intr_handle[index].efds, src->efds, src->nb_intr);
> > memcpy(intr_handle[index].elist, src->elist, src->nb_intr);
> >
> > @@ -87,6 +175,45 @@ int rte_intr_handle_instance_index_set(struct
> rte_intr_handle *intr_handle,
> > return rte_errno;
> >  }
> >
> > +int rte_intr_handle_event_list_

Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 2:09 PM, Elad Nachman wrote:
> Hi,
> 
> EAGAIN is propagated back to the kernel and to the caller.
> 

So will the user get an error, or it will be handled by the kernel and retried?

> We cannot retry from the kni kernel module since we hold the rtnl lock.
> 

Why not? We are already waiting until a command times out; for example,
'kni_net_open()' can retry if 'kni_net_process_request()' returns '-EAGAIN'.
And we can limit the number of retries for safety.
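
A bounded retry of that kind could look roughly like the sketch below. It is plain user-space C for illustration only, not kernel code: `sketch_retry_on_eagain` and `sketch_busy_then_ok` are hypothetical names, and whether such a loop is acceptable while holding the rtnl lock is exactly the open question in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request function: returns 0, -EAGAIN, or another error. */
typedef int (*sketch_request_fn)(void *ctx);

/* Call fn once, then retry up to max_retries more times on -EAGAIN. */
int
sketch_retry_on_eagain(sketch_request_fn fn, void *ctx,
		unsigned int max_retries)
{
	unsigned int attempt;
	int ret = -EAGAIN;

	for (attempt = 0; attempt <= max_retries; attempt++) {
		ret = fn(ctx);
		if (ret != -EAGAIN)
			break;
		/* A real caller would sleep or wait for completion here. */
	}
	return ret;
}

/* Toy request: busy for the first *ctx calls, then succeeds. */
int
sketch_busy_then_ok(void *ctx)
{
	int *busy_calls = ctx;

	if (*busy_calls > 0) {
		(*busy_calls)--;
		return -EAGAIN;
	}
	return 0;
}
```

If the limit is reached the caller still sees -EAGAIN, so the failure mode is unchanged; the retry only hides short-lived contention.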

> FYI,
> 
> Elad
> 
> On Mon, Oct 4, 2021 at 16:05, Ferruh Yigit <
> ferruh.yi...@intel.com> wrote:
> 
>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
>>> Fix lack of multiple KNI requests handling support by introducing a
>>> request in progress flag which will fail additional requests with
>>> EAGAIN return code if the original request has not been processed
>>> by user-space.
>>>
>>> Bugzilla ID: 809
>>
>> Hi Eric,
>>
>> Can you please test this patch, if it solves the issue you reported?
>>
>>>
>>> Signed-off-by: Elad Nachman 
>>> ---
>>>  kernel/linux/kni/kni_net.c | 9 +
>>>  lib/kni/rte_kni.c  | 2 ++
>>>  lib/kni/rte_kni_common.h   | 1 +
>>>  3 files changed, 12 insertions(+)
>>>
>>
>> <...>
>>
>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
>> struct rte_kni_request *req)
>>>
>>>   mutex_lock(&kni->sync_lock);
>>>
>>> + /* Check that existing request has been processed: */
>>> + cur_req = (struct rte_kni_request *)kni->sync_kva;
>>> + if (cur_req->req_in_progress) {
>>> + ret = -EAGAIN;
>>
>> Overall logic in the KNI looks good to me, this helps to serialize the
>> requests
>> even for async ones.
>>
>> But can you please clarify how it behaves in the kernel side with '-EAGAIN'
>> return type? Will linux call the ndo again, or will it just fail.
>>
>> If it just fails should we handle the re-try on '-EAGAIN' within the kni
>> module?
>>
>>



Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Eric Christian
Adding Sahithi.

I believe adding the -EAGAIN method puts the responsibility on the
application/caller.  Take changing the MAC address as an example: most
application code just does this kind of check:

ret = ioctl(sockfd, SIOCSIFHWADDR, &ifr);

if (ret < 0) {
PMD_LOG_ERRNO(ERR, "ioctl(SIOCSIFHWADDR) failed");
return -EINVAL;
}

So the existing application code will treat the -EAGAIN as a failure and
not retry.  Unless it is expected that the IOCTL can return -EAGAIN and the
application decides to keep retrying?
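
For illustration, a rough sketch of what such a caller-side retry could look
like. The helper retry_on_eagain() and the demo operation busy_then_ok() are
hypothetical names, not existing application or DPDK code; in a real program
the operation would wrap the ioctl() call and there would probably be a
short sleep between attempts:

```c
#include <errno.h>

/*
 * Hypothetical helper: call op(arg) until it stops failing with EAGAIN,
 * giving up after max_tries attempts. Returns the last return value of op.
 */
static int
retry_on_eagain(int (*op)(void *), void *arg, unsigned int max_tries)
{
	int ret = -1;
	unsigned int i;

	if (max_tries == 0) {
		errno = EINVAL;
		return -1;
	}

	for (i = 0; i < max_tries; i++) {
		errno = 0;
		ret = op(arg);
		/* stop on success or on any error other than EAGAIN */
		if (ret >= 0 || errno != EAGAIN)
			break;
	}
	return ret;
}

/* Demo operation: fails with EAGAIN while *remaining is above zero. */
static int
busy_then_ok(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

Existing applications would still need to be changed to call something like
this, which is the concern raised above.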

We can try this, but we have temporarily patched out the async changes in
our code as it was blocking QA due to
https://bugs.dpdk.org/show_bug.cgi?id=816

Eric


On Mon, Oct 4, 2021 at 9:05 AM Ferruh Yigit  wrote:

> On 9/24/2021 11:54 AM, Elad Nachman wrote:
> > Fix lack of multiple KNI requests handling support by introducing a
> > request in progress flag which will fail additional requests with
> > EAGAIN return code if the original request has not been processed
> > by user-space.
> >
> > Bugzilla ID: 809
>
> Hi Eric,
>
> Can you please test this patch, if it solves the issue you reported?
>
> >
> > Signed-off-by: Elad Nachman 
> > ---
> >  kernel/linux/kni/kni_net.c | 9 +
> >  lib/kni/rte_kni.c  | 2 ++
> >  lib/kni/rte_kni_common.h   | 1 +
> >  3 files changed, 12 insertions(+)
> >
>
> <...>
>
> > @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
> struct rte_kni_request *req)
> >
> >   mutex_lock(&kni->sync_lock);
> >
> > + /* Check that existing request has been processed: */
> > + cur_req = (struct rte_kni_request *)kni->sync_kva;
> > + if (cur_req->req_in_progress) {
> > + ret = -EAGAIN;
>
> Overall logic in the KNI looks good to me, this helps to serialize the
> requests
> even for async ones.
>
> But can you please clarify how it behaves in the kernel side with '-EAGAIN'
> return type? Will linux call the ndo again, or will it just fail.
>
> If it just fails should we handle the re-try on '-EAGAIN' within the kni
> module?
>
>


[dpdk-dev] [PATCH v4 1/7] ethdev: allocate max space for internal queue array

2021-10-04 Thread Konstantin Ananyev
At queue configure stage always allocate space for maximum possible
number (RTE_MAX_QUEUES_PER_PORT) of queue pointers.
That will allow 'fast' inline functions (eth_rx_burst, etc.) to refer
pointer to internal queue data without extra checking of current number
of configured queues.
That would help in future to hide rte_eth_dev and related structures.
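
A simplified sketch of the idea, outside of DPDK: the pointer array is sized
once for the maximum queue count, so reconfiguration only clears slots and
the fast path can index it without consulting the configured count.
MAX_QUEUES, struct port and both functions are illustrative assumptions,
not the ethdev code itself:

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_QUEUES 1024 /* stands in for RTE_MAX_QUEUES_PER_PORT */

struct port {
	void **queues;      /* always MAX_QUEUES slots once allocated */
	uint16_t nb_queues; /* currently configured queues */
};

static int
port_queue_config(struct port *p, uint16_t nb_queues)
{
	uint16_t i;

	if (nb_queues > MAX_QUEUES)
		return -1;

	if (p->queues == NULL) {
		/* first configuration: allocate the maximum, zeroed */
		p->queues = calloc(MAX_QUEUES, sizeof(p->queues[0]));
		if (p->queues == NULL)
			return -1;
	}

	/* shrinking: a driver would release these queues first */
	for (i = nb_queues; i < p->nb_queues; i++)
		p->queues[i] = NULL;

	p->nb_queues = nb_queues;
	return 0;
}

/* Fast path: any id below MAX_QUEUES is safe to load, unused slots are NULL. */
static inline void *
port_queue_get(const struct port *p, uint16_t qid)
{
	return p->queues[qid];
}
```

The trade-off is a fixed memory cost per port in exchange for no realloc and
no bounds check against the configured count on the hot path.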

Signed-off-by: Konstantin Ananyev 
---
 lib/ethdev/rte_ethdev.c | 36 +---
 1 file changed, 9 insertions(+), 27 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index daf5ca9242..424bc260fa 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -898,7 +898,8 @@ eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t 
nb_queues)
 
if (dev->data->rx_queues == NULL && nb_queues != 0) { /* first time 
configuration */
dev->data->rx_queues = rte_zmalloc("ethdev->rx_queues",
-   sizeof(dev->data->rx_queues[0]) * nb_queues,
+   sizeof(dev->data->rx_queues[0]) *
+   RTE_MAX_QUEUES_PER_PORT,
RTE_CACHE_LINE_SIZE);
if (dev->data->rx_queues == NULL) {
dev->data->nb_rx_queues = 0;
@@ -909,21 +910,11 @@ eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t 
nb_queues)
 
rxq = dev->data->rx_queues;
 
-   for (i = nb_queues; i < old_nb_queues; i++)
+   for (i = nb_queues; i < old_nb_queues; i++) {
(*dev->dev_ops->rx_queue_release)(rxq[i]);
-   rxq = rte_realloc(rxq, sizeof(rxq[0]) * nb_queues,
-   RTE_CACHE_LINE_SIZE);
-   if (rxq == NULL)
-   return -(ENOMEM);
-   if (nb_queues > old_nb_queues) {
-   uint16_t new_qs = nb_queues - old_nb_queues;
-
-   memset(rxq + old_nb_queues, 0,
-   sizeof(rxq[0]) * new_qs);
+   rxq[i] = NULL;
}
 
-   dev->data->rx_queues = rxq;
-
} else if (dev->data->rx_queues != NULL && nb_queues == 0) {
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, 
-ENOTSUP);
 
@@ -1138,8 +1129,9 @@ eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t 
nb_queues)
 
if (dev->data->tx_queues == NULL && nb_queues != 0) { /* first time 
configuration */
dev->data->tx_queues = rte_zmalloc("ethdev->tx_queues",
-  
sizeof(dev->data->tx_queues[0]) * nb_queues,
-  RTE_CACHE_LINE_SIZE);
+   sizeof(dev->data->tx_queues[0]) *
+   RTE_MAX_QUEUES_PER_PORT,
+   RTE_CACHE_LINE_SIZE);
if (dev->data->tx_queues == NULL) {
dev->data->nb_tx_queues = 0;
return -(ENOMEM);
@@ -1149,21 +1141,11 @@ eth_dev_tx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
 
txq = dev->data->tx_queues;
 
-   for (i = nb_queues; i < old_nb_queues; i++)
+   for (i = nb_queues; i < old_nb_queues; i++) {
(*dev->dev_ops->tx_queue_release)(txq[i]);
-   txq = rte_realloc(txq, sizeof(txq[0]) * nb_queues,
- RTE_CACHE_LINE_SIZE);
-   if (txq == NULL)
-   return -ENOMEM;
-   if (nb_queues > old_nb_queues) {
-   uint16_t new_qs = nb_queues - old_nb_queues;
-
-   memset(txq + old_nb_queues, 0,
-  sizeof(txq[0]) * new_qs);
+   txq[i] = NULL;
}
 
-   dev->data->tx_queues = txq;
-
} else if (dev->data->tx_queues != NULL && nb_queues == 0) {
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, 
-ENOTSUP);
 
-- 
2.26.3



[dpdk-dev] [PATCH v3] test: add reassembly perf test

2021-10-04 Thread pbhagavatula
From: Pavan Nikhilesh 

Add reassembly perf autotest for both ipv4 and ipv6 reassembly.
Each test is performed with variable number of fragments per flow,
either ordered or unordered fragments and interleaved flows.

Signed-off-by: Pavan Nikhilesh 
---
 v3 Changes:
 - Fix checkpatch issues.
 v2 Changes
 - Rebase to master, reduce memory consumption, set default mempool ops
 to ring_mp_mc.

 app/test/meson.build|   2 +
 app/test/test_reassembly_perf.c | 991 
 2 files changed, 993 insertions(+)
 create mode 100644 app/test/test_reassembly_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed..f1957dc5b5 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -109,6 +109,7 @@ test_sources = files(
 'test_rawdev.c',
 'test_rcu_qsbr.c',
 'test_rcu_qsbr_perf.c',
+'test_reassembly_perf.c',
 'test_reciprocal_division.c',
 'test_reciprocal_division_perf.c',
 'test_red.c',
@@ -315,6 +316,7 @@ perf_test_names = [
 'hash_readwrite_lf_perf_autotest',
 'trace_perf_autotest',
 'ipsec_perf_autotest',
+'reassembly_perf_autotest',
 ]

 driver_test_names = [
diff --git a/app/test/test_reassembly_perf.c b/app/test/test_reassembly_perf.c
new file mode 100644
index 00..833f50ff6f
--- /dev/null
+++ b/app/test/test_reassembly_perf.c
@@ -0,0 +1,991 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Marvell.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MAX_FLOWS  (1024 * 32)
+#define MAX_BKTS   MAX_FLOWS
+#define MAX_ENTRIES_PER_BKT 16
+#define MAX_FRAGMENTS  RTE_LIBRTE_IP_FRAG_MAX_FRAG
+#define MIN_FRAGMENTS  2
+#define MAX_PKTS   (MAX_FLOWS * MAX_FRAGMENTS)
+
+#define MAX_PKT_LEN 2048
+#define MAX_TTL_MS  (5 * MS_PER_S)
+
+/* use RFC863 Discard Protocol */
+#define UDP_SRC_PORT 9
+#define UDP_DST_PORT 9
+
+/* use RFC5735 / RFC2544 reserved network test addresses */
+#define IP_SRC_ADDR(x) ((198U << 24) | (18 << 16) | (0 << 8) | (x))
+#define IP_DST_ADDR(x) ((198U << 24) | (18 << 16) | (1 << 8) | (x))
+
+/* 2001:0200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180) */
+static uint8_t ip6_addr[16] = {32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0};
+#define IP6_VERSION 6
+
+#define IP_DEFTTL 64 /* from RFC 1340. */
+
+static struct rte_ip_frag_tbl *frag_tbl;
+static struct rte_mempool *pkt_pool;
+static struct rte_mbuf *mbufs[MAX_FLOWS][MAX_FRAGMENTS];
+static uint8_t frag_per_flow[MAX_FLOWS];
+static uint32_t flow_cnt;
+
+#define FILL_MODE_LINEAR  0
+#define FILL_MODE_RANDOM  1
+#define FILL_MODE_INTERLEAVED 2
+
+static int
+reassembly_test_setup(void)
+{
+   uint64_t max_ttl_cyc = (MAX_TTL_MS * rte_get_timer_hz()) / 1E3;
+
+   frag_tbl = rte_ip_frag_table_create(MAX_FLOWS, MAX_ENTRIES_PER_BKT,
+   MAX_FLOWS * MAX_ENTRIES_PER_BKT,
+   max_ttl_cyc, rte_socket_id());
+   if (frag_tbl == NULL)
+   return TEST_FAILED;
+
+   rte_mbuf_set_user_mempool_ops("ring_mp_mc");
+   pkt_pool = rte_pktmbuf_pool_create(
+   "reassembly_perf_pool", MAX_FLOWS * MAX_FRAGMENTS, 0, 0,
+   RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+   if (pkt_pool == NULL) {
+   printf("[%s] Failed to create pkt pool\n", __func__);
+   rte_ip_frag_table_destroy(frag_tbl);
+   return TEST_FAILED;
+   }
+
+   return TEST_SUCCESS;
+}
+
+static void
+reassembly_test_teardown(void)
+{
+   if (frag_tbl != NULL)
+   rte_ip_frag_table_destroy(frag_tbl);
+
+   if (pkt_pool != NULL)
+   rte_mempool_free(pkt_pool);
+}
+
+static void
+randomize_array_positions(void **array, uint8_t sz)
+{
+   void *tmp;
+   int i, j;
+
+   if (sz == 2) {
+   tmp = array[0];
+   array[0] = array[1];
+   array[1] = tmp;
+   } else {
+   for (i = sz - 1; i > 0; i--) {
+   j = rte_rand_max(i + 1);
+   tmp = array[i];
+   array[i] = array[j];
+   array[j] = tmp;
+   }
+   }
+}
+
+static void
+reassembly_print_banner(const char *proto_str)
+{
+   printf("+=="
+  "+\n");
+   printf("| %-32s| %-3s : %-58d|\n", proto_str, "Flow Count", MAX_FLOWS);
+   printf("+++=+=+"
+  "+===+\n");
+   printf("%-17s%-17s%-14s%-14s%-25s%-20s\n", "| Fragment Order",
+  "| Fragments/Flow", "| Outstanding", "| Cycles/Flow",
+  "| Cycles/Fragme

Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Elad Nachman
1. Userspace will get an error
2. Waiting with rtnl locked causes a deadlock; waiting with rtnl unlocked
for interface down command causes a crash because of a race condition in
the device delete/unregister list in the kernel.

FYI,

Elad.

On Mon, Oct 4, 2021 at 17:13, Ferruh Yigit <
ferruh.yi...@intel.com> wrote:

> On 10/4/2021 2:09 PM, Elad Nachman wrote:
> > Hi,
> >
> > EAGAIN is propagated back to the kernel and to the caller.
> >
>
> So will the user get an error, or it will be handled by the kernel and
> retried?
>
> > We cannot retry from the kni kernel module since we hold the rtnl lock.
> >
>
> Why not? We are already waiting until a command time out, like
> 'kni_net_open()'
> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can
> limit the
> number of retry for safety.
>
> > FYI,
> >
> > Elad
> >
> > On Mon, Oct 4, 2021 at 16:05, Ferruh Yigit <
> > ferruh.yi...@intel.com> wrote:
> >
> >> On 9/24/2021 11:54 AM, Elad Nachman wrote:
> >>> Fix lack of multiple KNI requests handling support by introducing a
> >>> request in progress flag which will fail additional requests with
> >>> EAGAIN return code if the original request has not been processed
> >>> by user-space.
> >>>
> >>> Bugzilla ID: 809
> >>
> >> Hi Eric,
> >>
> >> Can you please test this patch, if it solves the issue you reported?
> >>
> >>>
> >>> Signed-off-by: Elad Nachman 
> >>> ---
> >>>  kernel/linux/kni/kni_net.c | 9 +
> >>>  lib/kni/rte_kni.c  | 2 ++
> >>>  lib/kni/rte_kni_common.h   | 1 +
> >>>  3 files changed, 12 insertions(+)
> >>>
> >>
> >> <...>
> >>
> >>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
> >> struct rte_kni_request *req)
> >>>
> >>>   mutex_lock(&kni->sync_lock);
> >>>
> >>> + /* Check that existing request has been processed: */
> >>> + cur_req = (struct rte_kni_request *)kni->sync_kva;
> >>> + if (cur_req->req_in_progress) {
> >>> + ret = -EAGAIN;
> >>
> >> Overall logic in the KNI looks good to me, this helps to serialize the
> >> requests
> >> even for async ones.
> >>
> >> But can you please clarify how it behaves in the kernel side with
> '-EAGAIN'
> >> return type? Will linux call the ndo again, or will it just fail.
> >>
> >> If it just fails should we handle the re-try on '-EAGAIN' within the kni
> >> module?
> >>
> >>
>
>


[dpdk-dev] [PATCH v4 0/5] Virtio PMD RSS support & RSS fixes

2021-10-04 Thread Maxime Coquelin
This series is mainly adding support for RSS to Virtio PMD
driver. The two last patches are fixing an issue in testpmd
that could cause out of bounds access, and fix
an issue spotted in the mlx5 driver while looking for
inspiration.

The first motivation for this series is to eventually
support RSS down to the Vhost-user library, so that OVS can
benefit from it. But it will be also useful with vDPA
devices in the future.

Regarding the testing, I have tested it with qemu v5.2 from
Fedora 34. Since libvirt does not yet support enabling the RSS
feature in the Qemu virtio-net device, and this feature is
disabled by default, the tester can either rebuild the qemu
package to enable it by default or use the qemu cmdline to
do the same.

The tester can use testpmd in icmpecho mode in the guest
and scapy on the host to inject random traffic on the tap
interface, e.g.:
sendp(Ether(src=RandMAC()) / IP(src=RandIP(), dst='192.168.123.9') / 
UDP(sport=RandShort(), dport=RandShort()), loop=True, iface='vnet7')

Then it can play with RSS config in testpmd to change the
RETA, or hash type and see traffic being steered
accordingly by checking the Rx xstats.

Changes in v4:
==
- s/GPTU/GTPU/ (Xiaoyun)

Changes in v3:
==
- Add applying user-specified RSS conf at device config time (Andrew)
- Remove useless checks (Chenbo)
- Clean control message payload dlen variable (Chenbo)
- Add GTPU offload type (Xiaoyun)
- Add missing types to str2flowtype() (Xiaoyun)

Changes in v2:
==
- Rework patch 2 to keep old behaviour, but fix possible out of bounds due to 
key length (Andrew/Nelio/Xiaoyun)
- s/reta/RETA/ (Andrew)
- Applied A-by on patch 3 (Slava)
- Fix display of configured hash types
- Add missing flow types definition to testpmd's port info command

Maxime Coquelin (5):
  net/virtio: add initial RSS support
  app/testpmd: fix RSS key length
  app/testpmd: fix RSS type display
  net/mlx5: fix RSS RETA update
  app/testpmd: add missing flow types in port info

 app/test-pmd/cmdline.c |   4 +
 app/test-pmd/config.c  |  11 +-
 doc/guides/nics/features/virtio.ini|   3 +
 doc/guides/nics/virtio.rst |   3 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/net/mlx5/mlx5_rss.c|   2 +-
 drivers/net/virtio/virtio.h|  31 +-
 drivers/net/virtio/virtio_ethdev.c | 394 -
 drivers/net/virtio/virtio_ethdev.h |   3 +-
 drivers/net/virtio/virtqueue.h |  21 ++
 10 files changed, 466 insertions(+), 12 deletions(-)

-- 
2.31.1



[dpdk-dev] [PATCH v4 1/5] net/virtio: add initial RSS support

2021-10-04 Thread Maxime Coquelin
Provide the capability to update the hash key, hash types
and RETA table on the fly (without needing to stop/start
the device). However, the key length and the number of RETA
entries are fixed to 40B and 128 entries respectively. This
is done in order to simplify the design, but may be
revisited later as the Virtio spec provides this
flexibility.

Note that only VIRTIO_NET_F_RSS support is implemented,
VIRTIO_NET_F_HASH_REPORT, which would enable reporting the
packet RSS hash calculated by the device into mbuf.rss, is
not yet supported.

Regarding the default RSS configuration, it has been
chosen to use the default Intel ixgbe key as default key,
and default RETA is a simple modulo between the hash and
the number of Rx queues.
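
As a rough illustration of the default RETA described above (not the driver
code): a 128-entry table filled with the entry index modulo the number of Rx
queues. The function names are hypothetical, and the lookup by
"hash % RETA_SIZE" is an assumption about how a device might index the
table:

```c
#include <stdint.h>

#define RETA_SIZE 128 /* fixed table size described above */

/* Fill the table so that entries cycle over the Rx queues. */
static int
reta_fill_default(uint16_t reta[RETA_SIZE], uint16_t nb_rx_queues)
{
	uint16_t i;

	if (nb_rx_queues == 0)
		return -1; /* avoid a modulo by zero */

	for (i = 0; i < RETA_SIZE; i++)
		reta[i] = i % nb_rx_queues;
	return 0;
}

/* Assumed lookup: the low part of the hash selects a table entry. */
static uint16_t
reta_lookup(const uint16_t reta[RETA_SIZE], uint32_t hash)
{
	return reta[hash % RETA_SIZE];
}
```

With 3 Rx queues the table reads 0, 1, 2, 0, 1, 2, ... so flows spread
evenly as long as the hash values do.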

Signed-off-by: Maxime Coquelin 
---
 doc/guides/nics/features/virtio.ini|   3 +
 doc/guides/nics/virtio.rst |   3 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/net/virtio/virtio.h|  31 +-
 drivers/net/virtio/virtio_ethdev.c | 394 -
 drivers/net/virtio/virtio_ethdev.h |   3 +-
 drivers/net/virtio/virtqueue.h |  21 ++
 7 files changed, 452 insertions(+), 9 deletions(-)

diff --git a/doc/guides/nics/features/virtio.ini 
b/doc/guides/nics/features/virtio.ini
index 48f6f393b1..a5eab4932f 100644
--- a/doc/guides/nics/features/virtio.ini
+++ b/doc/guides/nics/features/virtio.ini
@@ -14,6 +14,9 @@ Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
 Multicast MAC filter = Y
+RSS hash = P
+RSS key update   = Y
+RSS reta update  = Y
 VLAN filter  = Y
 Basic stats  = Y
 Stats per queue  = Y
diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
index 82ce7399ce..98e0d012b7 100644
--- a/doc/guides/nics/virtio.rst
+++ b/doc/guides/nics/virtio.rst
@@ -73,6 +73,9 @@ In this release, the virtio PMD driver provides the basic 
functionality of packe
 
 *   Virtio supports using port IO to get PCI resource when UIO module is not 
available.
 
+*   Virtio supports RSS Rx mode with 40B configurable hash key length, 128
+configurable RETA entries and configurable hash types.
+
 Prerequisites
 -
 
diff --git a/doc/guides/rel_notes/release_21_11.rst 
b/doc/guides/rel_notes/release_21_11.rst
index f099b1cca2..006f3d9c5f 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -129,6 +129,12 @@ New Features
   * Added tests to validate packets hard expiry.
   * Added tests to verify tunnel header verification in IPsec inbound.
 
+* **Added initial RSS support to Virtio PMD.**
+
+  Initial support for RSS receive mode has been added to the Virtio PMD,
+  with the capability for the application to configure the hash key, the
+  RETA and the hash types. Virtio hash reporting is yet to be added.
+
 
 Removed Items
 -
diff --git a/drivers/net/virtio/virtio.h b/drivers/net/virtio/virtio.h
index e78b2e429e..7118e5d24c 100644
--- a/drivers/net/virtio/virtio.h
+++ b/drivers/net/virtio/virtio.h
@@ -30,6 +30,7 @@
 #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the 
network */
 #define VIRTIO_NET_F_MQ 22  /* Device supports Receive Flow 
Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
+#define VIRTIO_NET_F_RSS   60  /* RSS supported */
 
 /*
  * Do we get callbacks when the ring is completely used,
@@ -100,6 +101,29 @@
  */
 #define VIRTIO_MAX_INDIRECT ((int)(rte_mem_page_size() / 16))
 
+/*  Virtio RSS hash types */
+#define VIRTIO_NET_HASH_TYPE_IPV4  (1 << 0)
+#define VIRTIO_NET_HASH_TYPE_TCPV4 (1 << 1)
+#define VIRTIO_NET_HASH_TYPE_UDPV4 (1 << 2)
+#define VIRTIO_NET_HASH_TYPE_IPV6  (1 << 3)
+#define VIRTIO_NET_HASH_TYPE_TCPV6 (1 << 4)
+#define VIRTIO_NET_HASH_TYPE_UDPV6 (1 << 5)
+#define VIRTIO_NET_HASH_TYPE_IP_EX (1 << 6)
+#define VIRTIO_NET_HASH_TYPE_TCP_EX (1 << 7)
+#define VIRTIO_NET_HASH_TYPE_UDP_EX (1 << 8)
+
+#define VIRTIO_NET_HASH_TYPE_MASK ( \
+   VIRTIO_NET_HASH_TYPE_IPV4 | \
+   VIRTIO_NET_HASH_TYPE_TCPV4 | \
+   VIRTIO_NET_HASH_TYPE_UDPV4 | \
+   VIRTIO_NET_HASH_TYPE_IPV6 | \
+   VIRTIO_NET_HASH_TYPE_TCPV6 | \
+   VIRTIO_NET_HASH_TYPE_UDPV6 | \
+   VIRTIO_NET_HASH_TYPE_IP_EX | \
+   VIRTIO_NET_HASH_TYPE_TCP_EX | \
+   VIRTIO_NET_HASH_TYPE_UDP_EX)
+
+
 /*
  * Maximum number of virtqueues per device.
  */
@@ -157,7 +181,9 @@ struct virtio_net_config {
 * Any other value stands for unknown.
 */
uint8_t duplex;
-
+   uint8_t rss_max_key_size;
+   uint16_t rss_max_indirection_table_length;
+   uint32_t supported_hash_types;
 } __rte_packed;
 
 struct virtio_hw {
@@ -190,6 +216,9 @@ struct virtio_hw {
rte_spinlock_t state_lock;
struct rte_mbuf **inject_pkts;
uint16_t max_queue_pairs;
+   uint32_t rss_hash_types;
+   uint16_t *rss_reta;
+   uint8_t *rss_key;
uin

[dpdk-dev] [PATCH v4 2/5] app/testpmd: fix RSS key length

2021-10-04 Thread Maxime Coquelin
port_rss_hash_key_update() initializes rss_conf with the
RSS key configuration provided by the user, but it calls
rte_eth_dev_rss_hash_conf_get() before calling
rte_eth_dev_rss_hash_update(), which overrides the parsed
RSS config.

While the RSS key value is set again afterwards, this is not
the case for the key length. It could cause out of bounds
access if the key length parsed is smaller than the one
read from rte_eth_dev_rss_hash_conf_get().

This patch restores the key length before the
rte_eth_dev_rss_hash_update() call to ensure the RSS key
value/length pair is consistent.
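
The ordering problem can be sketched in isolation as follows. struct
rss_conf, mock_conf_get() and DEV_KEY_LEN are hypothetical stand-ins; the
mock getter simply overwrites the length the way the commit message
describes the real query doing:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of an RSS configuration. */
struct rss_conf {
	uint8_t *key;
	uint8_t key_len;
	uint64_t hf;
};

#define DEV_KEY_LEN 52 /* hypothetical key length reported by the device */

/* Mock getter: overwrites key_len and hf with the device's current values. */
static int
mock_conf_get(struct rss_conf *c)
{
	c->key_len = DEV_KEY_LEN;
	c->hf = 0x1;
	return 0;
}

/* Build the config to pass to the update call. */
static int
key_update_prepare(struct rss_conf *c, uint8_t *user_key, uint8_t user_len)
{
	c->key = NULL; /* no buffer: only hf is wanted from the getter */
	c->key_len = 0;
	c->hf = 0;

	if (mock_conf_get(c) != 0)
		return -1;

	c->key = user_key;
	/* restore the user's length; otherwise it would stay DEV_KEY_LEN */
	c->key_len = user_len;
	return 0;
}
```

Without the last assignment, a 40-byte user key would be passed along with a
52-byte length, which is the out of bounds read the patch avoids.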

Fixes: 8205e241b2b0 ("app/testpmd: add missing type to RSS hash commands")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
Acked-by: Xiaoyun Li 
Reviewed-by: Chenbo Xia 
---
 app/test-pmd/config.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 9c66329e96..611965769c 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2854,7 +2854,7 @@ port_rss_hash_key_update(portid_t port_id, char 
rss_type[], uint8_t *hash_key,
unsigned int i;
 
rss_conf.rss_key = NULL;
-   rss_conf.rss_key_len = hash_key_len;
+   rss_conf.rss_key_len = 0;
rss_conf.rss_hf = 0;
for (i = 0; rss_type_table[i].str; i++) {
if (!strcmp(rss_type_table[i].str, rss_type))
@@ -2863,6 +2863,7 @@ port_rss_hash_key_update(portid_t port_id, char 
rss_type[], uint8_t *hash_key,
diag = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
if (diag == 0) {
rss_conf.rss_key = hash_key;
+   rss_conf.rss_key_len = hash_key_len;
diag = rte_eth_dev_rss_hash_update(port_id, &rss_conf);
}
if (diag == 0)
-- 
2.31.1



[dpdk-dev] [PATCH v4 3/5] app/testpmd: fix RSS type display

2021-10-04 Thread Maxime Coquelin
This patch fixes the display of the RSS hash types
configured in the port, which displayed "all" even
if only a single type was configured.
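
A small standalone sketch of why the comparison matters. The table and bit
values below are made up for illustration and do not match testpmd's
rss_type_table; the point is that a composite entry such as "all" shares
bits with every single type, so testing "hf & bits" alone matches it too:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define T_IPV4 (1ULL << 0)
#define T_TCP4 (1ULL << 1)
#define T_UDP4 (1ULL << 2)

struct type_name {
	const char *str;
	uint64_t bits;
};

/* Illustrative table: "all" is a composite mask, "none" is empty. */
static const struct type_name type_table[] = {
	{"all", T_IPV4 | T_TCP4 | T_UDP4},
	{"none", 0},
	{"ipv4", T_IPV4},
	{"tcp", T_TCP4},
	{"udp", T_UDP4},
	{NULL, 0},
};

/* Write the names of fully enabled types into buf; return how many. */
static int
format_types(uint64_t hf, char *buf, size_t len)
{
	int n = 0;
	size_t i;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; type_table[i].str != NULL; i++) {
		uint64_t bits = type_table[i].bits;

		if (bits == 0)
			continue; /* an empty mask would always match */
		if ((hf & bits) != bits)
			continue; /* require every bit of the entry */
		if (n > 0)
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, type_table[i].str, len - strlen(buf) - 1);
		n++;
	}
	return n;
}
```

With only T_TCP4 set, the full-mask comparison yields just "tcp", whereas a
plain "hf & bits" test would also have printed "all".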

Fixes: 3c90743dd3b9 ("app/testpmd: support more types for flow RSS")
Cc: sta...@dpdk.org

Signed-off-by: Maxime Coquelin 
Acked-by: Xiaoyun Li 
Reviewed-by: Chenbo Xia 
---
 app/test-pmd/config.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 611965769c..9a4a0c232b 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2833,7 +2833,9 @@ port_rss_hash_conf_show(portid_t port_id, int 
show_rss_key)
}
printf("RSS functions:\n ");
for (i = 0; rss_type_table[i].str; i++) {
-   if (rss_hf & rss_type_table[i].rss_type)
+   if (rss_type_table[i].rss_type == 0)
+   continue;
+   if ((rss_hf & rss_type_table[i].rss_type) == 
rss_type_table[i].rss_type)
printf("%s ", rss_type_table[i].str);
}
printf("\n");
-- 
2.31.1



[dpdk-dev] [PATCH v4 4/5] net/mlx5: fix RSS RETA update

2021-10-04 Thread Maxime Coquelin
This patch fixes RETA updating for entries above 64.
Without that, these entries are never updated as
calculated mask value will always be 0.
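
To make the indexing concrete, here is a reduced sketch of the corrected
loop. struct reta_group and reta_apply() are hypothetical stand-ins for the
ethdev RETA entry type and the mlx5 function; GROUP_SIZE plays the role of
RTE_RETA_GROUP_SIZE:

```c
#include <stdint.h>

#define GROUP_SIZE 64 /* entries covered by one 64-bit mask */

/* Hypothetical stand-in for one group of RETA entries. */
struct reta_group {
	uint64_t mask;             /* bit n set: reta[n] is to be applied */
	uint16_t reta[GROUP_SIZE];
};

/* Apply the masked entries to table; return how many were updated. */
static unsigned int
reta_apply(uint16_t *table, const struct reta_group *conf,
	   unsigned int reta_size)
{
	unsigned int i, updated = 0;

	for (i = 0; i < reta_size; i++) {
		unsigned int idx = i / GROUP_SIZE; /* which group */
		unsigned int pos = i % GROUP_SIZE; /* bit inside the group */

		/* shift by pos (0..63), never by i, which can exceed 63 */
		if (((conf[idx].mask >> pos) & 0x1) == 0)
			continue;
		table[i] = conf[idx].reta[pos];
		updated++;
	}
	return updated;
}
```

Entry 70, for example, lives in group 1 at bit 6, so it is selected by
conf[1].mask bit 6.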

Fixes: 634efbc2c8c0 ("mlx5: support RETA query and update")
Cc: sta...@dpdk.org
Cc: nelio.laranje...@6wind.com

Signed-off-by: Maxime Coquelin 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_rss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rss.c b/drivers/net/mlx5/mlx5_rss.c
index c32129cdc2..6dc52acee0 100644
--- a/drivers/net/mlx5/mlx5_rss.c
+++ b/drivers/net/mlx5/mlx5_rss.c
@@ -211,7 +211,7 @@ mlx5_dev_rss_reta_update(struct rte_eth_dev *dev,
for (idx = 0, i = 0; (i != reta_size); ++i) {
idx = i / RTE_RETA_GROUP_SIZE;
pos = i % RTE_RETA_GROUP_SIZE;
-   if (((reta_conf[idx].mask >> i) & 0x1) == 0)
+   if (((reta_conf[idx].mask >> pos) & 0x1) == 0)
continue;
MLX5_ASSERT(reta_conf[idx].reta[pos] < priv->rxqs_n);
(*priv->reta_idx)[i] = reta_conf[idx].reta[pos];
-- 
2.31.1



[dpdk-dev] [PATCH v4 5/5] app/testpmd: add missing flow types in port info

2021-10-04 Thread Maxime Coquelin
This patch adds missing IPv6-Ex and GTPU flow types to port
info command. It also adds the same definitions to
str2flowtype(), used to configure flow director.

Signed-off-by: Maxime Coquelin 
---
 app/test-pmd/cmdline.c | 4 
 app/test-pmd/config.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index a9efd027c3..2fb94df88e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -10408,6 +10408,10 @@ str2flowtype(char *string)
{"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP},
{"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER},
{"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD},
+   {"ipv6-ex", RTE_ETH_FLOW_IPV6_EX},
+   {"ipv6-tcp-ex", RTE_ETH_FLOW_IPV6_TCP_EX},
+   {"ipv6-udp-ex", RTE_ETH_FLOW_IPV6_UDP_EX},
+   {"gtpu", RTE_ETH_FLOW_GTPU},
};
 
for (i = 0; i < RTE_DIM(flowtype_str); i++) {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 9a4a0c232b..dbad470bcd 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -4528,11 +4528,15 @@ flowtype_to_str(uint16_t flow_type)
{"ipv6-sctp", RTE_ETH_FLOW_NONFRAG_IPV6_SCTP},
{"ipv6-other", RTE_ETH_FLOW_NONFRAG_IPV6_OTHER},
{"l2_payload", RTE_ETH_FLOW_L2_PAYLOAD},
+   {"ipv6-ex", RTE_ETH_FLOW_IPV6_EX},
+   {"ipv6-tcp-ex", RTE_ETH_FLOW_IPV6_TCP_EX},
+   {"ipv6-udp-ex", RTE_ETH_FLOW_IPV6_UDP_EX},
{"port", RTE_ETH_FLOW_PORT},
{"vxlan", RTE_ETH_FLOW_VXLAN},
{"geneve", RTE_ETH_FLOW_GENEVE},
{"nvgre", RTE_ETH_FLOW_NVGRE},
{"vxlan-gpe", RTE_ETH_FLOW_VXLAN_GPE},
+   {"gtpu", RTE_ETH_FLOW_GTPU},
};
 
for (i = 0; i < RTE_DIM(flowtype_str_table); i++) {
-- 
2.31.1



[dpdk-dev] [PATCH v4 3/7] ethdev: copy ethdev 'fast' API into separate structure

2021-10-04 Thread Konstantin Ananyev
Copy public function pointers (rx_pkt_burst(), etc.) and related
pointers to internal data from rte_eth_dev structure into a
separate flat array. That array will remain in a public header.
The intention here is to make rte_eth_dev and related structures internal.
That should allow future possible changes to core eth_dev structures
to be transparent to the user and help to avoid ABI/API breakages.
The plan is to keep minimal part of data from rte_eth_dev public,
so we still can use inline functions for 'fast' calls
(like rte_eth_rx_burst(), etc.) to avoid/minimize slowdown.
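
The shape of this design, reduced to a toy example. All types and names
below (struct fp_ops, struct dev, fp_ops_tbl and so on) are simplified
assumptions for illustration; the real structures carry more function
pointers and the callback arrays:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4
#define MAX_QUEUES 8

typedef uint16_t (*rx_burst_t)(void *rxq, void **pkts, uint16_t n);

/* Public, flat 'fast path' view of a port. */
struct fp_ops {
	rx_burst_t rx_burst;
	void **rxq_data;
};

/* Private device structure; applications never dereference it. */
struct dev {
	rx_burst_t rx_burst;
	void **rx_queues;
};

static struct fp_ops fp_ops_tbl[MAX_PORTS];
static void *dummy_queues[MAX_QUEUES];

/* Dummy burst for stopped or unconfigured ports: receives nothing. */
static uint16_t
dummy_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0;
}

/* Demo driver burst: pretends every requested packet was received. */
static uint16_t
demo_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	return n;
}

static void
fp_ops_reset(struct fp_ops *f)
{
	f->rx_burst = dummy_rx;
	f->rxq_data = dummy_queues;
}

static void
fp_ops_setup(struct fp_ops *f, const struct dev *d)
{
	f->rx_burst = d->rx_burst;
	f->rxq_data = d->rx_queues;
}

/* Inline fast path: touches only the flat public array. */
static inline uint16_t
rx_burst(uint16_t port, uint16_t queue, void **pkts, uint16_t n)
{
	const struct fp_ops *f = &fp_ops_tbl[port];

	return f->rx_burst(f->rxq_data[queue], pkts, n);
}
```

Because the inline call only reads fp_ops_tbl, struct dev can change layout
without affecting already compiled applications.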

Signed-off-by: Konstantin Ananyev 
---
 lib/ethdev/ethdev_private.c  | 52 
 lib/ethdev/ethdev_private.h  |  7 +
 lib/ethdev/rte_ethdev.c  | 27 +++
 lib/ethdev/rte_ethdev_core.h | 45 +++
 4 files changed, 131 insertions(+)

diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 012cf73ca2..3eeda6e9f9 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -174,3 +174,55 @@ rte_eth_devargs_parse_representor_ports(char *str, void 
*data)
RTE_LOG(ERR, EAL, "wrong representor format: %s\n", str);
return str == NULL ? -1 : 0;
 }
+
+static uint16_t
+dummy_eth_rx_burst(__rte_unused void *rxq,
+   __rte_unused struct rte_mbuf **rx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   RTE_ETHDEV_LOG(ERR, "rx_pkt_burst for unconfigured port\n");
+   rte_errno = ENOTSUP;
+   return 0;
+}
+
+static uint16_t
+dummy_eth_tx_burst(__rte_unused void *txq,
+   __rte_unused struct rte_mbuf **tx_pkts,
+   __rte_unused uint16_t nb_pkts)
+{
+   RTE_ETHDEV_LOG(ERR, "tx_pkt_burst for unconfigured port\n");
+   rte_errno = ENOTSUP;
+   return 0;
+}
+
+void
+eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
+{
+   static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
+   static const struct rte_eth_fp_ops dummy_ops = {
+   .rx_pkt_burst = dummy_eth_rx_burst,
+   .tx_pkt_burst = dummy_eth_tx_burst,
+   .rxq = {.data = dummy_data, .clbk = dummy_data,},
+   .txq = {.data = dummy_data, .clbk = dummy_data,},
+   };
+
+   *fpo = dummy_ops;
+}
+
+void
+eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
+   const struct rte_eth_dev *dev)
+{
+   fpo->rx_pkt_burst = dev->rx_pkt_burst;
+   fpo->tx_pkt_burst = dev->tx_pkt_burst;
+   fpo->tx_pkt_prepare = dev->tx_pkt_prepare;
+   fpo->rx_queue_count = dev->rx_queue_count;
+   fpo->rx_descriptor_status = dev->rx_descriptor_status;
+   fpo->tx_descriptor_status = dev->tx_descriptor_status;
+
+   fpo->rxq.data = dev->data->rx_queues;
+   fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
+
+   fpo->txq.data = dev->data->tx_queues;
+   fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
+}
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index 3724429577..40333e7651 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -26,4 +26,11 @@ eth_find_device(const struct rte_eth_dev *_start, 
rte_eth_cmp_t cmp,
 /* Parse devargs value for representor parameter. */
 int rte_eth_devargs_parse_representor_ports(char *str, void *data);
 
+/* reset eth 'fast' API to dummy values */
+void eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo);
+
+/* setup eth 'fast' API to ethdev values */
+void eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
+   const struct rte_eth_dev *dev);
+
 #endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 424bc260fa..036c82cbfb 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -44,6 +44,9 @@
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
 
+/* public 'fast' API */
+struct rte_eth_fp_ops rte_eth_fp_ops[RTE_MAX_ETHPORTS];
+
 /* spinlock for eth device callbacks */
 static rte_spinlock_t eth_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
 
@@ -578,6 +581,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
rte_eth_dev_callback_process(eth_dev,
RTE_ETH_EVENT_DESTROY, NULL);
 
+   eth_dev_fp_ops_reset(rte_eth_fp_ops + eth_dev->data->port_id);
+
rte_spinlock_lock(&eth_dev_shared_data->ownership_lock);
 
eth_dev->state = RTE_ETH_DEV_UNUSED;
@@ -1788,6 +1793,9 @@ rte_eth_dev_start(uint16_t port_id)
(*dev->dev_ops->link_update)(dev, 0);
}
 
+   /* expose selection of PMD rx/tx function */
+   eth_dev_fp_ops_setup(rte_eth_fp_ops + port_id, dev);
+
rte_ethdev_trace_start(port_id);
return 0;
 }
@@ -1810,6 +1818,9 @@ rte_eth_dev_stop(uint16_t port_id)
return 0;
}
 
+   /* point rx/tx functions to dummy ones */
+   eth_dev_fp_ops_reset(rte_eth_fp_ops + port_id);
+

Re: [dpdk-dev] [PATCH v1 5/5] lib/eal: remove unneeded header includes

2021-10-04 Thread Mattias Rönnblom
On 2021-10-04 12:23, Van Haaren, Harry wrote:
>> -Original Message-
>> From: Morrissey, Sean 
>> Sent: Monday, October 4, 2021 11:11 AM
>> To: Burakov, Anatoly ; Jerin Jacob
>> ; Sunil Kumar Kori ; mattias.ronnblom
>> ; Van Haaren, Harry
>> ; Harman Kalra ;
>> Richardson, Bruce ; Ananyev, Konstantin
>> 
>> Cc: dev@dpdk.org; Morrissey, Sean 
>> Subject: [PATCH v1 5/5] lib/eal: remove unneeded header includes
>>
>> These header includes have been flagged by the iwyu_tool
>> and removed.
>>
>> Signed-off-by: Sean Morrissey 
> 
>
> For lib/eal/common/rte_service.c;
> Reviewed-by: Harry van Haaren 

For lib/eal/common/rte_random.c:
Reviewed-by: Mattias Rönnblom 



Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 3:25 PM, Elad Nachman wrote:

Can you please try not to top-post? It makes it impossible to follow this
discussion later from the mail archives.

> 1. Userspace will get an error

So there is nothing special about returning '-EAGAIN'; the user will only
observe an error.
Wasn't the initial intention to use '-EAGAIN' to try the request again?

> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl unlocked
> for interface down command causes a crash because of a race condition in
> the device delete/unregister list in the kernel.
> 

Why does waiting with the rtnl lock held cause a deadlock? As said below, we
are already doing it; why is it different with retry logic?

I agree to not wait with rtnl unlocked.

> FYI,
> 
> Elad.
> 
> On Mon, 4 Oct 2021 at 17:13, Ferruh Yigit <
> ferruh.yi...@intel.com>:
> 
>> On 10/4/2021 2:09 PM, Elad Nachman wrote:
>>> Hi,
>>>
>>> EAGAIN is propogated back to the kernel and to the caller.
>>>
>>
>> So will the user get an error, or it will be handled by the kernel and
>> retried?
>>
>>> We cannot retry from the kni kernel module since we hold the rtnl lock.
>>>
>>
>> Why not? We are already waiting until a command time out, like
>> 'kni_net_open()'
>> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can
>> limit the
>> number of retry for safety.
>>
>>> FYI,
>>>
>>> Elad
>>>
>>> On Mon, 4 Oct 2021 at 16:05, Ferruh Yigit <
>>> ferruh.yi...@intel.com>:
>>>
 On 9/24/2021 11:54 AM, Elad Nachman wrote:
> Fix lack of multiple KNI requests handling support by introducing a
> request in progress flag which will fail additional requests with
> EAGAIN return code if the original request has not been processed
> by user-space.
>
> Bugzilla ID: 809

 Hi Eric,

 Can you please test this patch, if it solves the issue you reported?

>
> Signed-off-by: Elad Nachman 
> ---
>  kernel/linux/kni/kni_net.c | 9 +
>  lib/kni/rte_kni.c  | 2 ++
>  lib/kni/rte_kni_common.h   | 1 +
>  3 files changed, 12 insertions(+)
>

 <...>

> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
 struct rte_kni_request *req)
>
>   mutex_lock(&kni->sync_lock);
>
> + /* Check that existing request has been processed: */
> + cur_req = (struct rte_kni_request *)kni->sync_kva;
> + if (cur_req->req_in_progress) {
> + ret = -EAGAIN;

 Overall logic in the KNI looks good to me, this helps to serialize the
 requests
 even for async ones.

 But can you please clarify how it behaves in the kernel side with
>> '-EAGAIN'
 return type? Will linux call the ndo again, or will it just fail.

 If it just fails should we handle the re-try on '-EAGAIN' within the kni
 module?


>>
>>



Re: [dpdk-dev] [PATCH v4 6/6] net/iavf: add watchdog for VFLR

2021-10-04 Thread Nicolau, Radu



On 10/4/2021 12:18 PM, Nicolau, Radu wrote:


On 10/4/2021 3:15 AM, Wu, Jingjing wrote:



-----Original Message-----
From: Nicolau, Radu 
Sent: Friday, October 1, 2021 5:52 PM
To: Wu, Jingjing ; Xing, Beilei 

Cc: dev@dpdk.org; Doherty, Declan ; Sinha, 
Abhijit
; Zhang, Qi Z ; 
Richardson, Bruce
; Ananyev, Konstantin 
;

Nicolau, Radu 
Subject: [PATCH v4 6/6] net/iavf: add watchdog for VFLR

Add a watchdog to the iAVF PMD which supports monitoring the VFLR register.
If the device is not already in reset and a VF reset in progress is
detected, notify the user through a callback and set the device into the
reset state. If the device is already in reset, poll for completion of the
reset.

Signed-off-by: Declan Doherty 
Signed-off-by: Radu Nicolau 
---
  drivers/net/iavf/iavf.h    |  6 +++
  drivers/net/iavf/iavf_ethdev.c | 97 
++

  2 files changed, 103 insertions(+)

...

Besides checking VFGEN_RSTAT, there is already a process to handle 
VIRTCHNL_OP_EVENT from the PF. What is this change for? Is there any 
scenario that VIRTCHNL_OP_EVENT doesn't cover?

And how was the 500us determined?


Hi Jingjing, thanks for reviewing. I think this can be handled with 
VIRTCHNL_OP_EVENT, with no need for a watchdog alarm; I will rework 
the patch.


Hi Jingjing, I went over this with Declan. The reason it was added is 
that we can actually have a hardware-initiated reset that may not 
trigger an event; the kernel driver also implements a similar 
mechanism. The 500us does indeed seem excessive, so I will update the 
patch to use a configurable value with a default of 5ms, as the kernel 
driver does.




Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Elad Nachman
On Mon, 4 Oct 2021 at 17:51, Ferruh Yigit <
ferruh.yi...@intel.com>:

> On 10/4/2021 3:25 PM, Elad Nachman wrote:
>
> Can you please try to not top post, it will make impossible to follow this
> discussion later from the mail archives.
>
> > 1. Userspace will get an error
>
> So there is nothing special with returning '-EAGAIN', user will only
> observe an
> error.
> Wasn't initial intention to use '-EAGAIN' to try request again?
>
> To signal user-space to retry the operation.

>
> > 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl unlocked
> > for interface down command causes a crash because of a race condition in
> > the device delete/unregister list in the kernel.
> >
>
> Why waiting with rthnl lock causes a deadlock? As said below we are already
> doing it, why it is different with retry logic?
>
> Because it can be interface down request.


> I agree to not wait with rtnl unlocked.
>
> > FYI,
> >
> > Elad.
> >
> > On Mon, 4 Oct 2021 at 17:13, Ferruh Yigit <
> > ferruh.yi...@intel.com>:
> >
> >> On 10/4/2021 2:09 PM, Elad Nachman wrote:
> >>> Hi,
> >>>
> >>> EAGAIN is propogated back to the kernel and to the caller.
> >>>
> >>
> >> So will the user get an error, or it will be handled by the kernel and
> >> retried?
> >>
> >>> We cannot retry from the kni kernel module since we hold the rtnl lock.
> >>>
> >>
> >> Why not? We are already waiting until a command time out, like
> >> 'kni_net_open()'
> >> can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can
> >> limit the
> >> number of retry for safety.
> >>
> >>> FYI,
> >>>
> >>> Elad
> >>>
> >>> On Mon, 4 Oct 2021 at 16:05, Ferruh Yigit <
> >>> ferruh.yi...@intel.com>:
> >>>
>  On 9/24/2021 11:54 AM, Elad Nachman wrote:
> > Fix lack of multiple KNI requests handling support by introducing a
> > request in progress flag which will fail additional requests with
> > EAGAIN return code if the original request has not been processed
> > by user-space.
> >
> > Bugzilla ID: 809
> 
>  Hi Eric,
> 
>  Can you please test this patch, if it solves the issue you reported?
> 
> >
> > Signed-off-by: Elad Nachman 
> > ---
> >  kernel/linux/kni/kni_net.c | 9 +
> >  lib/kni/rte_kni.c  | 2 ++
> >  lib/kni/rte_kni_common.h   | 1 +
> >  3 files changed, 12 insertions(+)
> >
> 
>  <...>
> 
> > @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
>  struct rte_kni_request *req)
> >
> >   mutex_lock(&kni->sync_lock);
> >
> > + /* Check that existing request has been processed: */
> > + cur_req = (struct rte_kni_request *)kni->sync_kva;
> > + if (cur_req->req_in_progress) {
> > + ret = -EAGAIN;
> 
>  Overall logic in the KNI looks good to me, this helps to serialize the
>  requests
>  even for async ones.
> 
>  But can you please clarify how it behaves in the kernel side with
> >> '-EAGAIN'
>  return type? Will linux call the ndo again, or will it just fail.
> 
>  If it just fails should we handle the re-try on '-EAGAIN' within the
> kni
>  module?
> 
> 
> >>
> >>
>
> Elad.


Re: [dpdk-dev] [PATCH v1 1/5] devtools: script to remove unused headers includes

2021-10-04 Thread Bruce Richardson
On Mon, Oct 04, 2021 at 10:10:54AM +, Sean Morrissey wrote:
> This script can be used for removing headers flagged for removal by the
> include-what-you-use (IWYU) tool. The script has the ability to remove
> headers from specified sub-directories or dpdk as a whole.
> 
Since it also is importing meson and calling "meson compile" it appears to
be testing the build after each removal too. I think this should be called
out, to make it clear it's not a "blind" removal of headers.

Further review comments inline below.

/Bruce

> example usages:
> 
> Remove headers flagged by iwyu_tool output file
> $ ./devtools/process_iwyu.py iwyu.out -b build
> 
> Remove headers flagged by iwyu_tool output file from sub-directory
> $ ./devtools/process_iwyu.py iwyu.out -b build -d lib/kvargs
> 
> Remove headers directly piped from the iwyu_tool
> $ iwyu_tool -p build | ./devtools/process_iwyu.py - -b build
> 
> Signed-off-by: Sean Morrissey 
> Signed-off-by: Conor Fogarty 
> ---
>  devtools/process_iwyu.py | 109 +++
>  1 file changed, 109 insertions(+)
>  create mode 100755 devtools/process_iwyu.py
> 
> diff --git a/devtools/process_iwyu.py b/devtools/process_iwyu.py
> new file mode 100755
> index 00..ddc4ceafa4
> --- /dev/null
> +++ b/devtools/process_iwyu.py
> @@ -0,0 +1,109 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 Intel Corporation
> +#
> +
> +import argparse
> +import fileinput
> +import sys
> +from os.path import abspath, relpath, join
> +from pathlib import Path
> +from mesonbuild import mesonmain
> +
> +def args_parse():
> +parser = argparse.ArgumentParser(description='This script can be used to 
> remove includes which are not in use\n')
> +parser.add_argument('-b', '--build_dir', type=str, help='Name of the 
> build directory in which the IWYU tool was used in.', default="build")
> +parser.add_argument('-d', '--sub_dir', type=str, help='The sub-directory 
> to remove headers from.', default="")
> +parser.add_argument('file', type=Path, help='The path to the IWYU log 
> file or output from stdin.')
> +

These lines are all very long. While the text strings shouldn't be split
across lines, you can break the line across multiple ones between
parameters. I suggest checking this whole script using "flake8" to check
for style errors.

> +args = parser.parse_args()
> +
> +return args
> +

"args" is unneeded here. "return parser.parse_args()" is shorter. :-)
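
A minimal sketch of the two suggestions above (breaking the long calls
between parameters and returning the parse result directly). The option
names and help strings follow the quoted patch; the optional "argv"
parameter is an assumption added here so the function can be exercised
without touching sys.argv:

```python
import argparse
from pathlib import Path


def args_parse(argv=None):
    """Parse command-line arguments for the header-removal script."""
    parser = argparse.ArgumentParser(
        description='Remove includes which are not in use')
    # Each call is split between parameters; the strings stay on one line.
    parser.add_argument(
        '-b', '--build_dir', type=str, default='build',
        help='Name of the build directory in which the IWYU tool was used.')
    parser.add_argument(
        '-d', '--sub_dir', type=str, default='',
        help='The sub-directory to remove headers from.')
    parser.add_argument(
        'file', type=Path,
        help='The path to the IWYU log file or output from stdin.')
    # No intermediate "args" variable is needed.
    return parser.parse_args(argv)
```

With argv left as None, argparse falls back to sys.argv[1:], so the
command-line behaviour is unchanged.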

> +
> +def run_meson(args):
> +"Runs a meson command logging output to process.log"
> +with open('process.log', 'a') as sys.stdout:
> +ret = mesonmain.run(args, abspath('meson'))
> +sys.stdout = sys.__stdout__
> +return ret
> +

I think process.log should be renamed to "process_iwyu.log" to match the
script name.
Also, it's nice to see a few functions like this with a docstring at the
start. It would be good to have a one-line summary at the start of every
fn in this file.

> +
> +def remove_includes(filename, include, dpdk_dir, build_dir):
> +# Load in file - readlines()
> +# loop through list once in mem -> make cpy of list with line removed
> +# write cpy  -> stored in memory so write cpy to file then check
> +# run test build -> call ninja on the build folder, ninja -C build, 
> subprocess
You actually call "meson compile" rather than ninja -C build. Please make
sure the comments match the code as the code is reworked.

If you take the approach of adding a one-line string at the start of each
function, I'd suggest splitting up this comment into smaller comments
spread throughout the code, explaining each short block as it appears.
[Though I see it's reasonably well commented below as-is]

> +# if fails -> write original back to file otherwise continue on
> +# newlist = [ln for ln in lines if not ln.startswith(...)] filters out 
> one element
> +filepath = filename
> +
> +with open(filepath, 'r+') as f:
> +lines = f.readlines()  # Read lines when file is opened
> +
> +with open(filepath, 'w') as f:
> +for ln in lines:  # Removes the include passed in
> +if ln.strip("\n") != include:
Strip without any parameters removes all whitespace, which is probably ok
here, so drop the explicit "\n".
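
To illustrate the difference, a small sketch; the helper name
"drop_include" and the sample lines are hypothetical and not part of the
patch:

```python
def drop_include(lines, include):
    """Return lines without any line that equals include once stripped."""
    # str.strip() with no argument trims all leading and trailing
    # whitespace, so a trailing space or "\r\n" ending still matches.
    return [ln for ln in lines if ln.strip() != include]


lines = ['#include <stdio.h> \n', '#include <rte_log.h>\n', 'int x;\n']
kept = drop_include(lines, '#include <stdio.h>')
```

With strip("\n") the first sample line would keep its trailing space and
would not compare equal to the include, so it would be left in the file.
Note that strip() also removes leading whitespace, so an indented include
would match as well.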

> +f.write(ln)
> +
> +ret = run_meson(['compile', '-C', join(dpdk_dir, build_dir)])
> +if (ret == 0):  # Include is not needed -> build is successful
> +print('SUCCESS')
> +else:
> +# failed, catch the error
> +# return file to original state
> +with open(filepath, 'w') as f:
> +f.writelines(lines)
> +print('FAILED')
> +
> +
> +def get_build_config(builddir, condition):
> +"returns contents of rte_build_config.h"
> +with open(join(builddir, 'rte_build_config.h')) as f:
> +return [ln for ln in f.readlines() if condition(ln)]
> +
> +
> +def uses_libbsd(builddir):
> +"retu

Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 3:14 PM, Eric Christian wrote:
> Adding Sahithi.
> 
> I believe adding the -EAGAIN method puts the responsibility on the
> application/caller.  If we take the change MAC address as an example.  Most
> application code just does this kind of check:
> 
> ret = ioctl(sockfd, SIOCSIFHWADDR, &ifr);
> 
> if (ret < 0) {
> PMD_LOG_ERRNO(ERR, "ioctl(SIOCSIFHWADDR) failed");
> return -EINVAL;
> }
> 

I am not sure '-EAGAIN' should be handled by the userspace code. I assumed that
the kernel netdev layer would try again if the ndo returns '-EAGAIN', but that
seems not to be the case, so perhaps we can retry in the KNI kernel module. That
way the issue can be handled within the KNI module, transparently to the user
application.

> So the existing application code will treat the -EAGAIN as a failure and
> not retry.  Unless it is expected that the IOCTL can return -EAGAIN and the
> application decides to keep retrying?
> 
> We can try this, but we have temporarily patched out the async changes in
> our code as it was blocking QA due to
> https://bugs.dpdk.org/show_bug.cgi?id=816
> 
> Eric
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Mon, Oct 4, 2021 at 9:05 AM Ferruh Yigit  wrote:
> 
>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
>>> Fix lack of multiple KNI requests handling support by introducing a
>>> request in progress flag which will fail additional requests with
>>> EAGAIN return code if the original request has not been processed
>>> by user-space.
>>>
>>> Bugzilla ID: 809
>>
>> Hi Eric,
>>
>> Can you please test this patch, if it solves the issue you reported?
>>
>>>
>>> Signed-off-by: Elad Nachman 
>>> ---
>>>  kernel/linux/kni/kni_net.c | 9 +
>>>  lib/kni/rte_kni.c  | 2 ++
>>>  lib/kni/rte_kni_common.h   | 1 +
>>>  3 files changed, 12 insertions(+)
>>>
>>
>> <...>
>>
>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
>> struct rte_kni_request *req)
>>>
>>>   mutex_lock(&kni->sync_lock);
>>>
>>> + /* Check that existing request has been processed: */
>>> + cur_req = (struct rte_kni_request *)kni->sync_kva;
>>> + if (cur_req->req_in_progress) {
>>> + ret = -EAGAIN;
>>
>> Overall logic in the KNI looks good to me, this helps to serialize the
>> requests
>> even for async ones.
>>
>> But can you please clarify how it behaves in the kernel side with '-EAGAIN'
>> return type? Will linux call the ndo again, or will it just fail.
>>
>> If it just fails should we handle the re-try on '-EAGAIN' within the kni
>> module?
>>
>>



Re: [dpdk-dev] [PATCH v1 02/12] ethdev: add eswitch port item to flow API

2021-10-04 Thread Ori Kam
Hi Ivan,

> -----Original Message-----
> From: Ivan Malov 
> Sent: Monday, October 4, 2021 2:06 PM
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow API
> 
> Hi Ori,
> 
> On 04/10/2021 08:45, Ori Kam wrote:
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov 
> >> Sent: Sunday, October 3, 2021 9:11 PM
> >> Subject: Re: [PATCH v1 02/12] ethdev: add eswitch port item to flow
> >> API
> >>
> >>
> >>
> >> On 03/10/2021 15:40, Ori Kam wrote:
> >>> Hi Andrew and Ivan,
> >>>
>  -----Original Message-----
>  From: Andrew Rybchenko 
>  Sent: Friday, October 1, 2021 4:47 PM
>  Subject: [PATCH v1 02/12] ethdev: add eswitch port item to flow API
> 
>  From: Ivan Malov 
> 
>  For use with "transfer" flows. Supposed to match traffic entering
>  the e-switch from the external world (network, guests) via the port
>  which is logically connected with the given ethdev.
> 
>  Must not be combined with attributes "ingress" / "egress".
> 
>  This item is meant to use the same structure as ethdev item.
> 
> >>>
> >>> In case the app is not working with representors, meaning each
> >>> switch port is mapped to ethdev.
> >>> both items (ethdev and eswitch port ) have the same meaning?
> >>
> >> No. Ethdev means ethdev, and e-switch port is the point where this
> >> ethdev is plugged to. For example, "transfer + ESWITCH_PORT" for a
> >> regular PF ethdev typically means the network port (maybe you can
> >> recall the idea that a PF ethdev "represents" the network port it's
> associated with).
> >>
> >> I believe, that diagrams which these patches add to
> >> "doc/guides/prog_guide/rte_flow.rst" may come in handy to understand
> >> the meaning. Also, you can take a look at our larger diagram from the
> >> Sep 14 gathering.
> >>
> >
> > Lets look at the following system:
> > E-Switch has 3 ports - PF, VF1, VF2
> > The ports are distributed as follows:
> > DPDK application:
> > ethdev(0) pf,
> > ethdev(1) representor to VF1
> > ethdev(2) representor to VF2
> > ethdev(3) VF1
> >
> > VM:
> > VF2
> >
> > As we know all representors are realy connected to the PF(at least in
> > this example)
> 
> This example tries to say that the e-switch has 3 ports in total, and, given
> your explanation, one may indeed agree that *in this example* representors
> re-use e-switch port of ethdev=0 (with some metadata to distinguish
> packets, etc.). But one can hardly assume that *all* representors with any
> vendor's NIC are connected to the e-switch the same way. It's vendor
> specific. Well, at least, applications don't have this knowledge and don't 
> need
> to.
> 
> >
> > So matching on ethdev(3)  means matching on traffic sent from DPDK port
> 3 right?
> 
> Item ETHDEV (ethdev_id=3) matches traffic sent by DPDK port 3. Looks like
> we're on the same page here.
> 

Good.

> > And matching on eswitch_port(3) means matching in traffic that goes
> > into VF1 which is the same traffic as ethdev(3) right?
> 
> I didn't catch the thought about "the same traffic". Direction is not the 
> same.
> Item ESWITCH_PORT (ethdev_id=3) matches traffic sent by DPDK port 1.
> 
This is the critical part for my understanding.
Matching on ethdev_id(3) means matching on traffic that is coming from DPDK
port 3.
So from the E-Switch viewpoint it is traffic that goes into VF1?
While matching on E-Switch_port(3) means matching on traffic coming from VF1?

And by the same logic, matching on ethdev_id(1) means matching on traffic that
was sent from DPDK port 1, and matching on E-Switch_port(1) means matching on
traffic coming from VF1.

So in this case eswitch_port(3) is equal to eswitch_port(1), right?
While ethdev(1) is not equal to ethdev(3).

And just to complete the picture, matching on ethdev(2) will result in traffic
coming from the DPDK port, and matching on eswitch_port(2) will match
on traffic coming from VF2.

> Yes, in this case neither of the ports (1, 3) is truly "external" (they both
> interface the DPDK application), but, the thing is, they're "external" *to 
> each
> other* in the sense that they sit at the opposite ends of the wire.
> 
> >
> > Matching on ethdev(1) means matching on the PF port in the E-Switch but
> with some
> > metadata that marks the traffic as coming from DPDK port 1 and not from
> VF1 E-Switch
> > port right?
> 
> That's vendor specific. The application doesn't have to know how exactly
> this particular ethdev is connected to the e-switch - whether it re-uses
> the PF's e-switch port or has its own. The e-switch port that connects
> the ethdev with the e-switch is just assumed to exist logically.
> 
> >
> > While matching on eswitch_port(2) means matching on traffic coming from
> the VM right?
> 
> Right.
> 

I think my question above will clear everything up for me.

> >
> >>>
>  Signed-off-by: Ivan Malov 
>  Signed-off-by: Andrew Rybchenko 
>  ---
> app/test-pmd/cmdline_flow.c | 27
> ++

Re: [dpdk-dev] [PATCH 1/3] examples/l3fwd: increase number of routes

2021-10-04 Thread Stephen Hemminger
On Mon, 4 Oct 2021 01:41:08 +0530
 wrote:

> From: Pavan Nikhilesh 
> 
> Increase the number of routes from 8 to 16 that are statically added for
> lpm and em mode as most of the SoCs support more than 8 interfaces.
> The number of routes added is equal to the number of ethernet devices
> ports enabled through port mask.
> 
> Signed-off-by: Pavan Nikhilesh 
> ---
>  v3 Changes: (Finally!)
>  - Add FIB to the list.
>  - Update release notes.
>  - Update EM route addition routine and use the correct IP addresses
>DTS need not be updated as EM test doesn't use IP addresses defined
>in l3fwd.
> 
>  v2 Changes:
>  - Fixup for EM mode.
> 
>  examples/l3fwd/l3fwd_route.h |  4 ++--
>  examples/l3fwd/main.c| 20 ++--
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_route.h b/examples/l3fwd/l3fwd_route.h
> index 89f8634443..c7eba06c4d 100644
> --- a/examples/l3fwd/l3fwd_route.h
> +++ b/examples/l3fwd/l3fwd_route.h
> @@ -14,6 +14,6 @@ struct ipv6_l3fwd_route {
>   uint8_t if_out;
>  };
> 
> -extern const struct ipv4_l3fwd_route ipv4_l3fwd_route_array[8];
> +extern const struct ipv4_l3fwd_route ipv4_l3fwd_route_array[16];
> 
> -extern const struct ipv6_l3fwd_route ipv6_l3fwd_route_array[8];
> +extern const struct ipv6_l3fwd_route ipv6_l3fwd_route_array[16];
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 00ac267af1..194f6ac1a4 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -179,7 +179,7 @@ static struct l3fwd_lkp_mode l3fwd_fib_lkp = {
> 
>  /*
>   * 198.18.0.0/16 are set aside for RFC2544 benchmarking (RFC5735).
> - * 198.18.{0-7}.0/24 = Port {0-7}
> + * 198.18.{0-15}.0/24 = Port {0-15}
>   */
>  const struct ipv4_l3fwd_route ipv4_l3fwd_route_array[] = {
>   {RTE_IPV4(198, 18, 0, 0), 24, 0},
> @@ -190,11 +190,19 @@ const struct ipv4_l3fwd_route ipv4_l3fwd_route_array[] 
> = {
>   {RTE_IPV4(198, 18, 5, 0), 24, 5},
>   {RTE_IPV4(198, 18, 6, 0), 24, 6},
>   {RTE_IPV4(198, 18, 7, 0), 24, 7},
> + {RTE_IPV4(198, 18, 8, 0), 24, 8},
> + {RTE_IPV4(198, 18, 9, 0), 24, 9},
> + {RTE_IPV4(198, 18, 10, 0), 24, 10},
> + {RTE_IPV4(198, 18, 11, 0), 24, 11},
> + {RTE_IPV4(198, 18, 12, 0), 24, 12},
> + {RTE_IPV4(198, 18, 13, 0), 24, 13},
> + {RTE_IPV4(198, 18, 14, 0), 24, 14},
> + {RTE_IPV4(198, 18, 15, 0), 24, 15},
>  };
> 
>  /*
>   * 2001:200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180).
> - * 2001:200:0:{0-7}::/64 = Port {0-7}
> + * 2001:200:0:{0-15}::/64 = Port {0-15}
>   */
>  const struct ipv6_l3fwd_route ipv6_l3fwd_route_array[] = {
>   {{32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 0},
> @@ -205,6 +213,14 @@ const struct ipv6_l3fwd_route ipv6_l3fwd_route_array[] = 
> {
>   {{32, 1, 2, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 5},
>   {{32, 1, 2, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 6},
>   {{32, 1, 2, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 7},
> + {{32, 1, 2, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 8},
> + {{32, 1, 2, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 9},
> + {{32, 1, 2, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 10},
> + {{32, 1, 2, 0, 0, 0, 0, 11, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 11},
> + {{32, 1, 2, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 12},
> + {{32, 1, 2, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 13},
> + {{32, 1, 2, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 14},
> + {{32, 1, 2, 0, 0, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0}, 64, 15},
>  };
> 
>  /*
> --
> 2.33.0
> 

Maybe the table should be generated or take an input file generated by a script.
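
A rough sketch of what such a generator script could look like, assuming
the entry layout shown in the patch above (198.18.{i}.0/24 and
2001:200:0:{i}::/64 mapped to port i). The function names are
hypothetical and nothing like this exists in the tree:

```python
def ipv4_routes(n_ports):
    """Return C initializer lines mapping 198.18.{i}.0/24 to port i."""
    return ['\t{RTE_IPV4(198, 18, %d, 0), 24, %d},' % (i, i)
            for i in range(n_ports)]


def ipv6_routes(n_ports):
    """Return C initializer lines mapping 2001:200:0:{i}::/64 to port i."""
    lines = []
    for i in range(n_ports):
        # 16 address bytes: 0x20 0x01 0x02 0x00, two zero bytes, then the
        # port number as a 16-bit group, then the zero interface id.
        addr = [32, 1, 2, 0, 0, 0, i >> 8, i & 0xff] + [0] * 8
        body = ', '.join(str(b) for b in addr)
        lines.append('\t{{' + body + '}, 64, ' + str(i) + '},')
    return lines


if __name__ == '__main__':
    print('\n'.join(ipv4_routes(16)))
    print('\n'.join(ipv6_routes(16)))
```

The output would still need to be wrapped in the array definitions, and
the array sizes declared in l3fwd_route.h would have to follow the
generated count.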


Re: [dpdk-dev] [PATCH] common/cnxk: fix incorrect free of MCAM counter

2021-10-04 Thread Jerin Jacob
On Fri, Sep 17, 2021 at 10:08 AM  wrote:
>
> From: Satheesh Paul 
>
> Upon MCAM allocation failure, free counters only if counters
> were allocated earlier for the flow rule.
>
> Fixes: f9af9080746 ("common/cnxk: add mcam utility API")
>
> Signed-off-by: Satheesh Paul 

Acked-by: Jerin Jacob 
Applied to dpdk-next-net-mrvl/for-next-net. Thanks
> ---
>  drivers/common/cnxk/roc_npc_mcam.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/common/cnxk/roc_npc_mcam.c 
> b/drivers/common/cnxk/roc_npc_mcam.c
> index 8ccaaad0af..7d9b0ed3e3 100644
> --- a/drivers/common/cnxk/roc_npc_mcam.c
> +++ b/drivers/common/cnxk/roc_npc_mcam.c
> @@ -519,7 +519,8 @@ npc_mcam_alloc_and_write(struct npc *npc, struct 
> roc_npc_flow *flow,
>
> entry = npc_check_preallocated_entry_cache(mbox, flow, npc);
> if (entry < 0) {
> -   npc_mcam_free_counter(npc, ctr);
> +   if (use_ctr)
> +   npc_mcam_free_counter(npc, ctr);
> return NPC_ERR_MCAM_ALLOC;
> }
>
> --
> 2.25.4
>


Re: [dpdk-dev] [PATCH v1 0/5] introduce IWYU

2021-10-04 Thread Stephen Hemminger
On Mon,  4 Oct 2021 10:10:53 +
Sean Morrissey  wrote:

> This patchset introduces the include-what-you-use script which removes
> unused header includes. IWYU GitHub:
> 
> https://github.com/include-what-you-use/include-what-you-use
> 
> Along with the script there are some patches which make a start on
> removing unneeded headers.
> 
> Sean Morrissey (5):
>   devtools: script to remove unused headers includes
>   lib/telemetry: remove unneeded header includes
>   lib/ring: remove unneeded header includes
>   lib/kvargs: remove unneeded header includes
>   lib/eal: remove unneeded header includes
> 
>  devtools/process_iwyu.py   | 109 +
>  lib/eal/common/eal_common_dev.c|   5 --
>  lib/eal/common/eal_common_devargs.c|   1 -
>  lib/eal/common/eal_common_errno.c  |   4 -
>  lib/eal/common/eal_common_fbarray.c|   3 -
>  lib/eal/common/eal_common_hexdump.c|   3 -
>  lib/eal/common/eal_common_launch.c |   6 --
>  lib/eal/common/eal_common_lcore.c  |   6 --
>  lib/eal/common/eal_common_log.c|   2 -
>  lib/eal/common/eal_common_memalloc.c   |   3 -
>  lib/eal/common/eal_common_memory.c |   5 --
>  lib/eal/common/eal_common_memzone.c|   4 -
>  lib/eal/common/eal_common_options.c|   2 -
>  lib/eal/common/eal_common_proc.c   |   2 -
>  lib/eal/common/eal_common_string_fns.c |   2 -
>  lib/eal/common/eal_common_tailqs.c |  11 ---
>  lib/eal/common/eal_common_thread.c |   3 -
>  lib/eal/common/eal_common_timer.c  |   6 --
>  lib/eal/common/eal_common_trace.c  |   1 -
>  lib/eal/common/hotplug_mp.h|   1 -
>  lib/eal/common/malloc_elem.c   |   6 --
>  lib/eal/common/malloc_heap.c   |   5 --
>  lib/eal/common/malloc_mp.c |   1 -
>  lib/eal/common/malloc_mp.h |   2 -
>  lib/eal/common/rte_malloc.c|   6 --
>  lib/eal/common/rte_random.c|   3 -
>  lib/eal/common/rte_service.c   |   6 --
>  lib/eal/include/rte_version.h  |   2 -
>  lib/eal/linux/eal.c|  10 ---
>  lib/eal/linux/eal_alarm.c  |   7 --
>  lib/eal/linux/eal_cpuflags.c   |   2 -
>  lib/eal/linux/eal_debug.c  |   5 --
>  lib/eal/linux/eal_dev.c|   4 -
>  lib/eal/linux/eal_hugepage_info.c  |   8 --
>  lib/eal/linux/eal_interrupts.c |   8 --
>  lib/eal/linux/eal_lcore.c  |   7 --
>  lib/eal/linux/eal_log.c|  11 +--
>  lib/eal/linux/eal_memalloc.c   |   8 --
>  lib/eal/linux/eal_memory.c |   9 --
>  lib/eal/linux/eal_thread.c |   5 --
>  lib/eal/linux/eal_timer.c  |  15 
>  lib/eal/linux/eal_vfio_mp_sync.c   |   1 -
>  lib/eal/unix/eal_file.c|   1 -
>  lib/eal/unix/rte_thread.c  |   1 -
>  lib/eal/x86/rte_cycles.c   |   1 -
>  lib/kvargs/rte_kvargs.c|   1 -
>  lib/ring/rte_ring.c|   7 --
>  lib/telemetry/telemetry.c  |   1 -
>  lib/telemetry/telemetry_data.h |   1 -
>  49 files changed, 110 insertions(+), 213 deletions(-)
>  create mode 100755 devtools/process_iwyu.py
> 

There is a risk of breaking builds on other platforms.
How can you be sure the include files (especially the auto-generated list)
are the same on Linux, FreeBSD and Windows, as well as with the special
versions of libc (musl etc.)?





Re: [dpdk-dev] [PATCH v4] net: introduce IPv4 ihl and version fields

2021-10-04 Thread Stephen Hemminger
On Mon, 4 Oct 2021 15:13:22 +0300
Gregory Etelson  wrote:

> diff --git a/app/test/test_flow_classify.c b/app/test/test_flow_classify.c
> index 951606f248..4f64be5357 100644
> --- a/app/test/test_flow_classify.c
> +++ b/app/test/test_flow_classify.c
> @@ -95,7 +95,7 @@ static struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] 
> = {
>   *  dst mask 255.255.255.00 / udp src is 32 dst is 33 / end"
>   */
>  static struct rte_flow_item_ipv4 ipv4_udp_spec_1 = {
> - { 0, 0, 0, 0, 0, 0, IPPROTO_UDP, 0,
> + { { .version_ihl = 0}, 0, 0, 0, 0, 0, IPPROTO_UDP, 0,
> RTE_IPV4(2, 2, 2, 3), RTE_IPV4(2, 2, 2, 7)}
>  };

This ends up being an API change which was not announced.


Re: [dpdk-dev] [PATCH v2] kni: Fix request overwritten

2021-10-04 Thread Ferruh Yigit
On 10/4/2021 3:58 PM, Elad Nachman wrote:
> On Mon, 4 Oct 2021 at 17:51, Ferruh Yigit <
> ferruh.yi...@intel.com>:
> 
>> On 10/4/2021 3:25 PM, Elad Nachman wrote:
>>
>> Can you please try to not top post, it will make impossible to follow this
>> discussion later from the mail archives.
>>
>>> 1. Userspace will get an error
>>
>> So there is nothing special with returning '-EAGAIN', user will only
>> observe an
>> error.
>> Wasn't initial intention to use '-EAGAIN' to try request again?
>>
> To signal user-space to retry the operation.
>

I am not sure it will reach the end user. If the user is calling "ifconfig 
down", it will just fail, right? The user won't recognize the error type.

Unless this is common usage in the Linux network drivers, having it in
KNI won't help much. I am for handling this on the kernel side if we can.

>>
>>> 2. Waiting with rtnl locked causes a deadlock; waiting with rtnl unlocked
>>> for interface down command causes a crash because of a race condition in
>>> the device delete/unregister list in the kernel.
>>>
>>
>> Why waiting with rthnl lock causes a deadlock? As said below we are already
>> doing it, why it is different with retry logic?
>>
> Because it can be interface down request.
> 

(sure you like short answers)

Please help me see why "interface down" is special. Isn't the point of your
patch to wait for the request to be executed in userspace even if it is an
async request?

And yet again, the number of retries can be limited.
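
To make the bounded-retry idea concrete, here is a toy model of the logic
only. It is written in Python for brevity and is not kernel code; the
names "retry_on_eagain", "max_retries" and "delay_s" are hypothetical, and
"request" stands in for a call such as 'kni_net_process_request()':

```python
import errno
import time


def retry_on_eagain(request, max_retries=3, delay_s=0.0):
    """Call request() again while it returns -EAGAIN, at most max_retries times."""
    ret = request()
    attempts = 0
    while ret == -errno.EAGAIN and attempts < max_retries:
        time.sleep(delay_s)  # give the pending request time to complete
        ret = request()
        attempts += 1
    # Either a final result or -EAGAIN once the retry budget is exhausted.
    return ret
```

The caller still has to cope with -EAGAIN when the retry budget runs out,
so this limits the wait rather than removing the error path.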


> 
>> I agree to not wait with rtnl unlocked.
>>
>>> FYI,
>>>
>>> Elad.
>>>
>>> On Mon, 4 Oct 2021 at 17:13, Ferruh Yigit <
>>> ferruh.yi...@intel.com>:
>>>
 On 10/4/2021 2:09 PM, Elad Nachman wrote:
> Hi,
>
> EAGAIN is propogated back to the kernel and to the caller.
>

 So will the user get an error, or it will be handled by the kernel and
 retried?

> We cannot retry from the kni kernel module since we hold the rtnl lock.
>

 Why not? We are already waiting until a command time out, like
 'kni_net_open()'
 can retry if 'kni_net_process_request()' returns '-EAGAIN'. And we can
 limit the
 number of retry for safety.

> FYI,
>
> Elad
>
> On Mon, 4 Oct 2021 at 16:05, Ferruh Yigit <
> ferruh.yi...@intel.com>:
>
>> On 9/24/2021 11:54 AM, Elad Nachman wrote:
>>> Fix lack of multiple KNI requests handling support by introducing a
>>> request in progress flag which will fail additional requests with
>>> EAGAIN return code if the original request has not been processed
>>> by user-space.
>>>
>>> Bugzilla ID: 809
>>
>> Hi Eric,
>>
>> Can you please test this patch, if it solves the issue you reported?
>>
>>>
>>> Signed-off-by: Elad Nachman 
>>> ---
>>>  kernel/linux/kni/kni_net.c | 9 +
>>>  lib/kni/rte_kni.c  | 2 ++
>>>  lib/kni/rte_kni_common.h   | 1 +
>>>  3 files changed, 12 insertions(+)
>>>
>>
>> <...>
>>
>>> @@ -123,7 +124,15 @@ kni_net_process_request(struct net_device *dev,
>> struct rte_kni_request *req)
>>>
>>>   mutex_lock(&kni->sync_lock);
>>>
>>> + /* Check that existing request has been processed: */
>>> + cur_req = (struct rte_kni_request *)kni->sync_kva;
>>> + if (cur_req->req_in_progress) {
>>> + ret = -EAGAIN;
>>
>> Overall logic in the KNI looks good to me, this helps to serialize the
>> requests
>> even for async ones.
>>
>> But can you please clarify how it behaves in the kernel side with
 '-EAGAIN'
>> return type? Will linux call the ndo again, or will it just fail.
>>
>> If it just fails should we handle the re-try on '-EAGAIN' within the
>> kni
>> module?
>>
>>


>>
>> Elad.


