RE: [EXT] [PATCH] drivers/net/bnx2x : Offer maintainership for bnx2x

2023-05-30 Thread Akhil Goyal
> From: Julien Aube 
> 
> Signed-off-by: Julien Aube 
> ---
++ Alok


Re: [PATCH v2 4/4] app: add testgraph application

2023-05-30 Thread Jerin Jacob
On Mon, May 22, 2023 at 12:37 PM Vamsi Krishna Attunuru
 wrote:

> > > +static int
> > > +link_graph_nodes(uint64_t valid_nodes, uint32_t lcore_id)
> > > +{
> > > +   int ret = 0;
> > > +
> > > +   num_patterns = 0;
> > > +
> > > +   if (valid_nodes == (TEST_GRAPH_ETHDEV_TX_NODE |


I think if we need to extend the C code for each new use case, then
it will not scale.
IMO, we should look at a more runtime- and file-based interface.
Something like
https://github.com/DPDK/dpdk/blob/main/examples/ip_pipeline/examples/l2fwd.cli
In a nutshell:
1) File-based interface to kick-start the valid use case enablement
2) Less logic in C code; everything should be driven from the config file
3) Allow runtime changes. examples/ip_pipeline provides a telnet
interface to update . A similar concept can be followed.

I think we should push the app to the next release, not this one.
Sorry for reviewing late.


RE: [PATCH v3 1/2] ip_frag: optimize key compare and hash generation

2023-05-30 Thread Ruifeng Wang
> -Original Message-
> From: pbhagavat...@marvell.com 
> Sent: Monday, May 29, 2023 10:55 PM
> To: jer...@marvell.com; Ruifeng Wang ; Yipeng Wang
> ; Sameh Gobriel ; Bruce 
> Richardson
> ; Vladimir Medvedkin 
> ;
> Konstantin Ananyev 
> Cc: dev@dpdk.org; Pavan Nikhilesh 
> Subject: [PATCH v3 1/2] ip_frag: optimize key compare and hash generation
> 
> From: Pavan Nikhilesh 
> 
> Use optimized rte_hash_k32_cmp_eq routine for key comparison for
> x86 and ARM64.
> Use CRC instructions for hash generation on ARM64.
> 
> Signed-off-by: Pavan Nikhilesh 
> ---
> On Neoverse-N2, performance improved by 10% when measured with 
> examples/ip_reassembly.
> 
>  v3 Changes:
>  - Drop NEON patch.
>  v2 Changes:
>  - Fix compilation failure with non ARM64/x86 targets
> 
>  lib/hash/rte_cmp_arm64.h   | 16 
>  lib/hash/rte_cmp_x86.h | 16 
>  lib/ip_frag/ip_frag_common.h   | 14 +-
>  lib/ip_frag/ip_frag_internal.c |  4 ++--
>  4 files changed, 31 insertions(+), 19 deletions(-)
> 
Reviewed-by: Ruifeng Wang 
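
The diff body is not reproduced above (only the diffstat), but DPDK's
rte_hash_crc() helper is the usual route to CRC-based hashing: it maps to the
ARMv8 CRC32-C instructions on ARM64 and to SSE4.2 CRC on x86. A minimal
illustrative sketch, not the actual patch; the key layout and seed below are
assumptions:

#include <rte_hash_crc.h>

/* Illustrative only: hash an IPv6 fragment key laid out as five
 * 64-bit words (40 bytes) using CRC instructions where available. */
static inline uint32_t
frag_key_hash(const uint64_t key[5], uint32_t seed)
{
	return rte_hash_crc(key, 5 * sizeof(uint64_t), seed);
}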



[PATCH] app/testpmd: fix IPv6 tunnel packet checksum error

2023-05-30 Thread Shiyang He
In checksum forwarding mode, the checksum of a tunnel packet is calculated
incorrectly when the outer header is IPv6.

This patch fixes the issue by setting the L4 checksum flag.

Fixes: daa02b5cddbb ("mbuf: add namespace to offload flags")
Cc: sta...@dpdk.org

Signed-off-by: Shiyang He 
---
 app/test-pmd/csumonly.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index fc85c22a77..bd2fccc458 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -582,7 +582,7 @@ process_outer_cksums(void *outer_l3_hdr, struct 
testpmd_offload_info *info,
else
ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
} else
-   ol_flags |= RTE_MBUF_F_TX_OUTER_IPV6;
+   ol_flags |= RTE_MBUF_F_TX_OUTER_IPV6 | RTE_MBUF_F_TX_L4_MASK;
 
if (info->outer_l4_proto != IPPROTO_UDP)
return ol_flags;
-- 
2.37.2



Re: Hugepage migration

2023-05-30 Thread Bruce Richardson
On Sun, May 28, 2023 at 11:07:40PM +0300, Baruch Even wrote:
>Hi,
>We found an issue with newer kernels (5.13+) that are found on newer
>OSes (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that
>was allocated for DPDK was migrated (moved into another physical page)
>when a 1G page was allocated.
>From our reading of the kernel commits this started with commit
>ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>mm: make alloc_contig_range handle in-use hugetlb pages
>This caused what looked like memory corruptions to us and cases where
>the rings were moved from their physical location and communication was
>no longer possible.
>I wanted to ask if anyone else hit this issue and what mitigations are
>available?
>We are currently looking at using a kernel driver to pin the pages but
>I expect that this issue will affect others and that a more general
>approach is needed.
>Thanks,
>Baruch
>--

Hi,

what kernel driver was being used for the device I/O part? Was it a UIO
based driver or "vfio-pci"? When using vfio-pci and configuring IOMMU
mappings, the pages mapped should be pinned by the kernel, I would have
thought, since the kernel knows they are being used by devices.

/Bruce
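
For reference, the pinning Bruce describes works roughly as below: a VFIO DMA
mapping pins the backing pages for as long as the mapping exists, so the
kernel cannot migrate them. A minimal sketch, assuming an already-configured
VFIO container fd and omitting error handling:

#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <stdint.h>

/* Sketch: map 'len' bytes at 'vaddr' for device DMA at 'iova'. The
 * kernel pins the pages behind this range while the mapping is live. */
static int
map_and_pin(int container_fd, void *vaddr, uint64_t iova, uint64_t len)
{
	struct vfio_iommu_type1_dma_map dma_map = {
		.argsz = sizeof(dma_map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova = iova,
		.size = len,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
}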



Re: [PATCH v2 1/4] node: add pkt punt to kernel node

2023-05-30 Thread Nithin Dabilpuram
On Tue, Apr 25, 2023 at 6:45 PM Vamsi Attunuru  wrote:
>
> Patch adds a node to punt packets to the kernel over
> a raw socket.
>
> Signed-off-by: Vamsi Attunuru 
> ---
>  doc/guides/prog_guide/graph_lib.rst |  10 +++
>  lib/node/meson.build|   1 +
>  lib/node/punt_kernel.c  | 125 
>  lib/node/punt_kernel_priv.h |  36 
>  4 files changed, 172 insertions(+)
>
> diff --git a/doc/guides/prog_guide/graph_lib.rst 
> b/doc/guides/prog_guide/graph_lib.rst
> index 1cfdc86433..b3b5b14827 100644
> --- a/doc/guides/prog_guide/graph_lib.rst
> +++ b/doc/guides/prog_guide/graph_lib.rst
> @@ -392,3 +392,13 @@ null
>  
>  This node ignores the set of objects passed to it and reports that all are
>  processed.
> +
> +punt_kernel
> +~~~
> +This node punts packets to the kernel over a raw socket. For each received
> +packet, the node fills a sockaddr_in structure with the packet's destination
> +IP address and uses the ``sendto`` function to send the data on the raw
> +socket.
> +
> +After sending the burst of packets to the kernel, this node redirects the
> +same objects to the pkt_drop node to free up the packet buffers.
> diff --git a/lib/node/meson.build b/lib/node/meson.build
> index dbdf673c86..48c2da73f7 100644
> --- a/lib/node/meson.build
> +++ b/lib/node/meson.build
> @@ -17,6 +17,7 @@ sources = files(
>  'null.c',
>  'pkt_cls.c',
>  'pkt_drop.c',
> +'punt_kernel.c',
>  )
>  headers = files('rte_node_ip4_api.h', 'rte_node_eth_api.h')
>  # Strict-aliasing rules are violated by uint8_t[] to context size casts.
> diff --git a/lib/node/punt_kernel.c b/lib/node/punt_kernel.c
> new file mode 100644
> index 00..e5dd15b759
> --- /dev/null
> +++ b/lib/node/punt_kernel.c
> @@ -0,0 +1,125 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Marvell International Ltd.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "node_private.h"
> +#include "punt_kernel_priv.h"
> +
> +static __rte_always_inline void
> +punt_kernel_process_mbuf(struct rte_node *node, struct rte_mbuf **mbufs, 
> uint16_t cnt)
> +{
> +   punt_kernel_node_ctx_t *ctx = (punt_kernel_node_ctx_t *)node->ctx;
> +   struct sockaddr_in sin = {0};
> +   struct rte_ipv4_hdr *ip4;
> +   size_t len;
> +   char *buf;
> +   int i;
> +
> +   for (i = 0; i < cnt; i++) {
> +   ip4 = rte_pktmbuf_mtod(mbufs[i], struct rte_ipv4_hdr *);
> +   len = rte_pktmbuf_data_len(mbufs[i]);
> +   buf = (char *)ip4;
> +
> +   sin.sin_family = AF_INET;
> +   sin.sin_port = 0;
> +   sin.sin_addr.s_addr = ip4->dst_addr;
> +
> +   if (sendto(ctx->sock, buf, len, 0, (struct sockaddr *)&sin, 
> sizeof(sin)) < 0)
> +   node_err("punt_kernel", "Unable to send packets: 
> %s\n", strerror(errno));
> +   }
> +}
> +
> +static uint16_t
> +punt_kernel_node_process(struct rte_graph *graph __rte_unused, struct 
> rte_node *node, void **objs,
> +uint16_t nb_objs)
> +{
> +   struct rte_mbuf **pkts = (struct rte_mbuf **)objs;
> +   uint16_t obj_left = nb_objs;
> +
> +#define PREFETCH_CNT 4
> +
> +   while (obj_left >= 12) {
> +   /* Prefetch next-next mbufs */
> +   rte_prefetch0(pkts[8]);
> +   rte_prefetch0(pkts[9]);
> +   rte_prefetch0(pkts[10]);
> +   rte_prefetch0(pkts[11]);
> +
> +   /* Prefetch next mbuf data */
> +   rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[4], void *, 
> pkts[4]->l2_len));
> +   rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[5], void *, 
> pkts[5]->l2_len));
> +   rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[6], void *, 
> pkts[6]->l2_len));
> +   rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[7], void *, 
> pkts[7]->l2_len));
> +
> +   punt_kernel_process_mbuf(node, pkts, PREFETCH_CNT);
> +
> +   obj_left -= PREFETCH_CNT;
> +   pkts += PREFETCH_CNT;
> +   }
> +
> +   while (obj_left > 0) {
> +   punt_kernel_process_mbuf(node, pkts, 1);
> +
> +   obj_left--;
> +   pkts++;
> +   }
> +
> +   rte_node_next_stream_move(graph, node, PUNT_KERNEL_NEXT_PKT_DROP);

The packet drop node signifies that a packet was dropped for some reason and
not consumed. Since here the packet is not really dropped but consumed by the
kernel, can we avoid using the pkt_drop node and instead free the pkts
directly?
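
A minimal sketch of that suggestion, using the existing
rte_pktmbuf_free_bulk() helper (illustrative, not part of the patch):

/* Free the mbufs consumed by the kernel directly instead of
 * redirecting them to the pkt_drop node. */
rte_pktmbuf_free_bulk((struct rte_mbuf **)objs, nb_objs);
return nb_objs;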

> +
> +   return nb_objs;
> +}
> +
> +static int
> +punt_kernel_node_init(const struct rte_graph *graph __rte_unused, struct 
> rte_node *node)
> +{
> +   punt_kernel_node_ctx_t *ctx = (punt_kernel_node_ctx_t *)node->ctx;
> +
> +   ctx->sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
>

RE: [PATCH v5 01/21] net: add PDCP header

2023-05-30 Thread Akhil Goyal
> Subject: [PATCH v5 01/21] net: add PDCP header
> 
> From: Volodymyr Fialko 
> 
> Add PDCP protocol header to be used for supporting PDCP protocol
> processing.
> 
> Signed-off-by: Anoob Joseph 
> Signed-off-by: Kiran Kumar K 
> Signed-off-by: Volodymyr Fialko 
> Acked-by: Akhil Goyal 
> ---
Olivier,
Could you please review and ack this patch?

Thomas,
Can this patch be applied to next-crypto?





Re: [PATCH v2 1/3] lib: add IPv6 lookup node

2023-05-30 Thread Nithin Dabilpuram
On Thu, May 18, 2023 at 9:27 PM Amit Prakash Shukla
 wrote:
>
> From: Sunil Kumar Kori 
>
> Similar to IPv4 lookup node, patch adds IPv6 lookup
> node.
>
> Signed-off-by: Sunil Kumar Kori 
> Signed-off-by: Amit Prakash Shukla 
> ---
> v2:
> - Performance related changes
>
>  doc/guides/prog_guide/graph_lib.rst |  13 +
>  lib/node/ip6_lookup.c   | 374 
>  lib/node/meson.build|   3 +-
>  lib/node/node_private.h |   3 +-
>  lib/node/pkt_cls.c  |  14 ++
>  lib/node/pkt_cls_priv.h |   1 +
>  lib/node/rte_node_ip6_api.h |  80 ++
>  lib/node/version.map|   2 +
>  8 files changed, 488 insertions(+), 2 deletions(-)
>  create mode 100644 lib/node/ip6_lookup.c
>  create mode 100644 lib/node/rte_node_ip6_api.h
>
> diff --git a/doc/guides/prog_guide/graph_lib.rst 
> b/doc/guides/prog_guide/graph_lib.rst
> index 1cfdc86433..1f70d63628 100644
> --- a/doc/guides/prog_guide/graph_lib.rst
> +++ b/doc/guides/prog_guide/graph_lib.rst
> @@ -388,6 +388,19 @@ to determine the L2 header to be written to the packet 
> before sending
>  the packet out to a particular ethdev_tx node.
>  ``rte_node_ip4_rewrite_add()`` is control path API to add next-hop info.
>
> +ip6_lookup
> +~~
> +This node is an intermediate node that does an LPM lookup for the received
> +IPv6 packets; the result determines each packet's next node.
> +
> +On successful LPM lookup, the result contains the ``next_node`` id and
> +``next-hop`` id with which the packet needs to be further processed.
> +
> +On LPM lookup failure, objects are redirected to pkt_drop node.
> +``rte_node_ip6_route_add()`` is control path API to add ipv6 routes.
> +To achieve a home run, the node uses ``rte_node_stream_move()`` as
> +mentioned in the above sections.
> +
>  null
>  
>  This node ignores the set of objects passed to it and reports that all are
> diff --git a/lib/node/ip6_lookup.c b/lib/node/ip6_lookup.c
> new file mode 100644
> index 00..a377c06072
> --- /dev/null
> +++ b/lib/node/ip6_lookup.c
> @@ -0,0 +1,374 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Marvell.
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "rte_node_ip6_api.h"
> +
> +#include "node_private.h"
> +
> +#define IPV6_L3FWD_LPM_MAX_RULES 1024
> +#define IPV6_L3FWD_LPM_NUMBER_TBL8S (1 << 8)
> +
> +/* IP6 Lookup global data struct */
> +struct ip6_lookup_node_main {
> +   struct rte_lpm6 *lpm_tbl[RTE_MAX_NUMA_NODES];
> +};
> +
> +struct ip6_lookup_node_ctx {
> +   /* Socket's LPM table */
> +   struct rte_lpm6 *lpm6;
> +   /* Dynamic offset to mbuf priv1 */
> +   int mbuf_priv1_off;
> +};
> +
> +int ip6_node_mbuf_priv1_dynfield_offset = -1;
> +
> +static struct ip6_lookup_node_main ip6_lookup_nm;
> +
> +#define IP6_LOOKUP_NODE_LPM(ctx) \
> +   (((struct ip6_lookup_node_ctx *)ctx)->lpm6)
> +
> +#define IP6_LOOKUP_NODE_PRIV1_OFF(ctx) \
> +   (((struct ip6_lookup_node_ctx *)ctx)->mbuf_priv1_off)
> +
> +static uint16_t
> +ip6_lookup_node_process_scalar(struct rte_graph *graph, struct rte_node 
> *node,
> +   void **objs, uint16_t nb_objs)
> +{
> +   struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3, **pkts;
> +   struct rte_lpm6 *lpm6 = IP6_LOOKUP_NODE_LPM(node->ctx);
> +   const int dyn = IP6_LOOKUP_NODE_PRIV1_OFF(node->ctx);
> +   struct rte_ipv6_hdr *ipv6_hdr;
> +   void **to_next, **from;
> +   uint16_t last_spec = 0;
> +   rte_edge_t next_index;
> +   uint16_t n_left_from;
> +   uint16_t held = 0;
> +   uint32_t drop_nh;
> +   int i, rc;
> +
> +   /* Speculative next */
> +   next_index = RTE_NODE_IP6_LOOKUP_NEXT_REWRITE;
> +   /* Drop node */
> +   drop_nh = ((uint32_t)RTE_NODE_IP6_LOOKUP_NEXT_PKT_DROP) << 16;
> +
> +   pkts = (struct rte_mbuf **)objs;
> +   from = objs;
> +   n_left_from = nb_objs;
> +
> +   for (i = OBJS_PER_CLINE; i < RTE_GRAPH_BURST_SIZE; i += 
> OBJS_PER_CLINE)
> +   rte_prefetch0(&objs[i]);
> +
> +   for (i = 0; i < 4 && i < n_left_from; i++)
> +   rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[i], void *,
> +   sizeof(struct 
> rte_ether_hdr)));
> +
> +   /* Get stream for the speculated next node */
> +   to_next = rte_node_next_stream_get(graph, node, next_index, nb_objs);
> +   while (n_left_from >= 4) {
> +#if RTE_GRAPH_BURST_SIZE > 64
> +   /* Prefetch next-next mbufs */
> +   if (likely(n_left_from > 11)) {
> +   rte_prefetch0(pkts[8]);
> +   rte_prefetch0(pkts[9]);
> +   rte_prefetch0(pkts[10]);
> +   rte_prefetch0(pkts[11]);
> +   }
> +#endif
> +   /* Prefetch next mbuf data */
> +   if (likely(n_lef
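
As context for the documentation added above, control-path usage of the new
node would look roughly like the sketch below. It is based on
rte_node_ip6_route_add() as declared in this patch; the address, depth and
next-hop values are made up and may differ in later revisions:

#include <rte_node_ip6_api.h>

/* Sketch: steer 2001:db8::/32 to next-hop id 1 via the rewrite node. */
static int
add_example_route(void)
{
	static const uint8_t ip6[16] = { 0x20, 0x01, 0x0d, 0xb8 };

	return rte_node_ip6_route_add(ip6, 32, 1,
				      RTE_NODE_IP6_LOOKUP_NEXT_REWRITE);
}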

RE: [PATCH] examples/ptpclient: add signal handler for cleanup

2023-05-30 Thread Rahul Bhansali
Hi Kirill,

This patch has been pending review for a long time.
Please let me know if there are any comments on this patch; otherwise I will
request to merge it.

Regards,
Rahul

> -Original Message-
> From: Rahul Bhansali
> Sent: Monday, May 15, 2023 4:29 PM
> To: 'Kirill Rybalchenko' ; Thomas Monjalon
> 
> Cc: 'dev@dpdk.org' 
> Subject: RE: [PATCH] examples/ptpclient: add signal handler for cleanup
> 
> Ping.
> 
> > -Original Message-
> > From: Rahul Bhansali
> > Sent: Friday, January 20, 2023 11:26 AM
> > To: 'dev@dpdk.org' ; 'Kirill Rybalchenko'
> > 
> > Subject: RE: [PATCH] examples/ptpclient: add signal handler for
> > cleanup
> >
> > Ping.
> >
> > > -Original Message-
> > > From: Rahul Bhansali
> > > Sent: Wednesday, November 2, 2022 10:21 PM
> > > To: dev@dpdk.org; Kirill Rybalchenko 
> > > Subject: RE: [PATCH] examples/ptpclient: add signal handler for
> > > cleanup
> > >
> > > Ping.
> > >
> > > > -Original Message-
> > > > From: Rahul Bhansali 
> > > > Sent: Wednesday, August 31, 2022 12:19 PM
> > > > To: dev@dpdk.org; Kirill Rybalchenko
> > > > 
> > > > Cc: Rahul Bhansali 
> > > > Subject: [PATCH] examples/ptpclient: add signal handler for
> > > > cleanup
> > > >
> > > > This adds the signal handler for SIGINT, SIGTERM.
> > > > Also, this will come out from infinite loop and do cleanup once it
> > > > receives any of the registered signal.
> > > >
> > > > Signed-off-by: Rahul Bhansali 
> > > > ---
> > > >  examples/ptpclient/ptpclient.c | 32
> > > > ++--
> > > >  1 file changed, 30 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/examples/ptpclient/ptpclient.c
> > > > b/examples/ptpclient/ptpclient.c index 1f1c9c9c52..8b69716be1
> > > > 100644
> > > > --- a/examples/ptpclient/ptpclient.c
> > > > +++ b/examples/ptpclient/ptpclient.c
> > > > @@ -19,6 +19,9 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > > +
> > > > +static volatile bool force_quit;
> > > >
> > > >  #define RX_RING_SIZE 1024
> > > >  #define TX_RING_SIZE 1024
> > > > @@ -609,7 +612,7 @@ parse_ptp_frames(uint16_t portid, struct
> > > > rte_mbuf
> > > > *m) {
> > > >   * The lcore main. This is the main thread that does the work,
> > > > reading from
> > an
> > > >   * input port and writing to an output port.
> > > >   */
> > > > -static __rte_noreturn void
> > > > +static void
> > > >  lcore_main(void)
> > > >  {
> > > > uint16_t portid;
> > > > @@ -621,7 +624,7 @@ lcore_main(void)
> > > >
> > > > /* Run until the application is quit or killed. */
> > > >
> > > > -   while (1) {
> > > > +   while (!force_quit) {
> > > > /* Read packet from RX queues. 8< */
> > > > for (portid = 0; portid < ptp_enabled_port_nb; 
> > > > portid++) {
> > > >
> > > > @@ -734,6 +737,13 @@ ptp_parse_args(int argc, char **argv)
> > > > return 0;
> > > >  }
> > > >
> > > > +static void
> > > > +signal_handler(int signum)
> > > > +{
> > > > +   if (signum == SIGINT || signum == SIGTERM)
> > > > +   force_quit = true;
> > > > +}
> > > > +
> > > >  /*
> > > >   * The main function, which does initialization and calls the per-lcore
> > > >   * functions.
> > > > @@ -758,6 +768,10 @@ main(int argc, char *argv[])
> > > > argc -= ret;
> > > > argv += ret;
> > > >
> > > > +   force_quit = false;
> > > > +   signal(SIGINT, signal_handler);
> > > > +   signal(SIGTERM, signal_handler);
> > > > +
> > > > ret = ptp_parse_args(argc, argv);
> > > > if (ret < 0)
> > > > rte_exit(EXIT_FAILURE, "Error with PTP 
> > > > initialization\n"); @@ -
> > > > 802,6 +816,20 @@ main(int argc, char *argv[])
> > > > /* Call lcore_main on the main core only. */
> > > > lcore_main();
> > > >
> > > > +   RTE_ETH_FOREACH_DEV(portid) {
> > > > +   if ((ptp_enabled_port_mask & (1 << portid)) == 0)
> > > > +   continue;
> > > > +
> > > > +   /* Disable timesync timestamping for the Ethernet 
> > > > device */
> > > > +   rte_eth_timesync_disable(portid);
> > > > +
> > > > +   ret = rte_eth_dev_stop(portid);
> > > > +   if (ret != 0)
> > > > +   printf("rte_eth_dev_stop: err=%d, port=%d\n", 
> > > > ret,
> > > > portid);
> > > > +
> > > > +   rte_eth_dev_close(portid);
> > > > +   }
> > > > +
> > > > /* clean up the EAL */
> > > > rte_eal_cleanup();
> > > >
> > > > --
> > > > 2.25.1



[PATCH 00/10] support telemetry query ethdev info

2023-05-30 Thread Jie Hai
This patchset supports querying information about ethdev.
The information includes MAC addresses, RxTx offload, flow ctrl,
Rx|Tx queue, firmware version, DCB, RSS, FEC, VLAN, etc.


Dengdui Huang (1):
  ethdev: support telemetry query MAC addresses

Jie Hai (9):
  ethdev: support RxTx offload display
  ethdev: support telemetry query flow ctrl info
  ethdev: support telemetry query Rx queue info
  ethdev: support telemetry query Tx queue info
  ethdev: add firmware version in telemetry info command
  ethdev: support telemetry query DCB info
  ethdev: support telemetry query RSS info
  ethdev: support telemetry query FEC info
  ethdev: support telemetry query VLAN info

 lib/ethdev/rte_ethdev.c | 775 +++-
 1 file changed, 765 insertions(+), 10 deletions(-)

-- 
2.33.0



[PATCH 01/10] ethdev: support telemetry query MAC addresses

2023-05-30 Thread Jie Hai
From: Dengdui Huang 

This patch supports querying the MAC addresses of a specific port via
telemetry.

The command is like:
--> /ethdev/macs,0
{
  "/ethdev/macs": [
"00:18:2D:00:00:79",
"00:18:2D:00:00:78",
"00:18:2D:00:00:77"
  ]
}

Signed-off-by: Dengdui Huang 
---
 lib/ethdev/rte_ethdev.c | 44 +
 1 file changed, 44 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index d46e74504e64..65e0101fc0eb 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7032,6 +7032,48 @@ int rte_eth_dev_map_aggr_tx_affinity(uint16_t port_id, 
uint16_t tx_queue_id,
return ret;
 }
 
+static int
+eth_dev_handle_port_macs(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   char mac_addr[RTE_ETHER_ADDR_FMT_SIZE];
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_dev *eth_dev;
+   unsigned long port_id;
+   char *end_param;
+   uint32_t i;
+   int ret;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring");
+
+   if (port_id >= UINT16_MAX)
+   return -EINVAL;
+
+   ret = rte_eth_dev_info_get(port_id, &dev_info);
+   if (ret != 0)
+   return ret;
+
+   eth_dev = &rte_eth_devices[port_id];
+   rte_tel_data_start_array(d, RTE_TEL_STRING_VAL);
+   for (i = 0; i < dev_info.max_mac_addrs; i++) {
+   if (rte_is_zero_ether_addr(&eth_dev->data->mac_addrs[i]))
+   continue;
+
+   rte_ether_format_addr(mac_addr, sizeof(mac_addr),
+   &eth_dev->data->mac_addrs[i]);
+   rte_tel_data_add_array_string(d, mac_addr);
+   }
+
+   return 0;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
@@ -7053,4 +7095,6 @@ RTE_INIT(ethdev_init_telemetry)
"Returns the device info for a port. Parameters: int 
port_id");
rte_telemetry_register_cmd("/ethdev/module_eeprom", 
eth_dev_handle_port_module_eeprom,
"Returns module EEPROM info with SFF specs. Parameters: 
int port_id");
+   rte_telemetry_register_cmd("/ethdev/macs", eth_dev_handle_port_macs,
+   "Returns the MAC addresses for a port. Parameters: int 
port_id");
 }
-- 
2.33.0



[PATCH 02/10] ethdev: support RxTx offload display

2023-05-30 Thread Jie Hai
Currently, Rx/Tx offloads are displayed in numeric format,
which is not easy to understand. This patch displays them by name instead.

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 67 +++--
 1 file changed, 57 insertions(+), 10 deletions(-)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 65e0101fc0eb..3207b3177256 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6607,16 +6607,44 @@ eth_dev_handle_port_link_status(const char *cmd 
__rte_unused,
return 0;
 }
 
+static void
+eth_dev_parse_rx_offloads(uint64_t offload, struct rte_tel_data *d)
+{
+   uint32_t i;
+
+   rte_tel_data_start_array(d, RTE_TEL_STRING_VAL);
+   for (i = 0; i < RTE_DIM(eth_dev_rx_offload_names); i++) {
+   if ((offload & eth_dev_rx_offload_names[i].offload) != 0)
+   rte_tel_data_add_array_string(d,
+   eth_dev_rx_offload_names[i].name);
+   }
+}
+
+static void
+eth_dev_parse_tx_offloads(uint64_t offload, struct rte_tel_data *d)
+{
+   uint32_t i;
+
+   rte_tel_data_start_array(d, RTE_TEL_STRING_VAL);
+   for (i = 0; i < RTE_DIM(eth_dev_tx_offload_names); i++) {
+   if ((offload & eth_dev_tx_offload_names[i].offload) != 0)
+   rte_tel_data_add_array_string(d,
+   eth_dev_tx_offload_names[i].name);
+   }
+}
+
 static int
 eth_dev_handle_port_info(const char *cmd __rte_unused,
const char *params,
struct rte_tel_data *d)
 {
+   struct rte_tel_data *rx_offload, *tx_offload;
struct rte_tel_data *rxq_state, *txq_state;
char mac_addr[RTE_ETHER_ADDR_FMT_SIZE];
struct rte_eth_dev *eth_dev;
char *end_param;
-   int port_id, i;
+   int port_id;
+   uint32_t i;
 
if (params == NULL || strlen(params) == 0 || !isdigit(*params))
return -1;
@@ -6632,14 +6660,20 @@ eth_dev_handle_port_info(const char *cmd __rte_unused,
eth_dev = &rte_eth_devices[port_id];
 
rxq_state = rte_tel_data_alloc();
-   if (!rxq_state)
+   if (rxq_state == NULL)
return -ENOMEM;
 
txq_state = rte_tel_data_alloc();
-   if (!txq_state) {
-   rte_tel_data_free(rxq_state);
-   return -ENOMEM;
-   }
+   if (txq_state == NULL)
+   goto free_rxq_state;
+
+   rx_offload = rte_tel_data_alloc();
+   if (rx_offload == NULL)
+   goto free_txq_state;
+
+   tx_offload = rte_tel_data_alloc();
+   if (tx_offload == NULL)
+   goto free_rx_offload;
 
rte_tel_data_start_dict(d);
rte_tel_data_add_dict_string(d, "name", eth_dev->data->name);
@@ -6681,14 +6715,27 @@ eth_dev_handle_port_info(const char *cmd __rte_unused,
rte_tel_data_add_dict_int(d, "numa_node", eth_dev->data->numa_node);
rte_tel_data_add_dict_uint_hex(d, "dev_flags",
eth_dev->data->dev_flags, 0);
-   rte_tel_data_add_dict_uint_hex(d, "rx_offloads",
-   eth_dev->data->dev_conf.rxmode.offloads, 0);
-   rte_tel_data_add_dict_uint_hex(d, "tx_offloads",
-   eth_dev->data->dev_conf.txmode.offloads, 0);
+
+   eth_dev_parse_rx_offloads(eth_dev->data->dev_conf.rxmode.offloads,
+   rx_offload);
+   rte_tel_data_add_dict_container(d, "rx_offloads", rx_offload, 0);
+   eth_dev_parse_tx_offloads(eth_dev->data->dev_conf.txmode.offloads,
+   tx_offload);
+   rte_tel_data_add_dict_container(d, "tx_offloads", tx_offload, 0);
+
rte_tel_data_add_dict_uint_hex(d, "ethdev_rss_hf",
eth_dev->data->dev_conf.rx_adv_conf.rss_conf.rss_hf, 0);
 
return 0;
+
+free_rx_offload:
+   rte_tel_data_free(rx_offload);
+free_txq_state:
+   rte_tel_data_free(txq_state);
+free_rxq_state:
+   rte_tel_data_free(rxq_state);
+
+   return -ENOMEM;
 }
 
 int
-- 
2.33.0



[PATCH 04/10] ethdev: support telemetry query Rx queue info

2023-05-30 Thread Jie Hai
This patch supports querying information about Rx queues.
The command is like:
--> /ethdev/rx_queue,0,0
{
  "/ethdev/rx_queue": {
"mempool_name": "mb_pool_0",
"socket_id": 0,
"host_threshold": 0,
"prefetch_threshold": 0,
"writeback_threshold": 0,
"free_threshold": 32,
"rx_drop_en": "on",
"deferred_start": "off",
"rx_nseg": 0,
"share_group": 0,
"share_qid": 0,
"offloads": [
  "RSS_HASH"
],
"rx_nmempool": 0,
"scattered_rx": "off",
"queue_state": 1,
"nb_desc": 1024,
"rx_buf_size": 2048,
"avail_thresh": 0,
"burst_flags": 0,
"burst_mode": "Vector Neon"
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 127 
 1 file changed, 127 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 5cdb310ca979..35c13df1c110 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7170,6 +7170,131 @@ eth_dev_handle_port_flow_ctrl(const char *cmd 
__rte_unused,
return 0;
 }
 
+static int
+parse_queue_params(const char *params, bool is_rx,
+   unsigned long *port_id, unsigned long *queue_id)
+{
+   struct rte_eth_dev *dev;
+   const char *qid_param;
+   uint16_t nb_queues;
+   char *end_param;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   *port_id = strtoul(params, &end_param, 0);
+   if (*port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(*port_id))
+   return -EINVAL;
+
+   dev = &rte_eth_devices[*port_id];
+   nb_queues = is_rx ? dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+   if (nb_queues == 1 && *end_param == '\0')
+   *queue_id = 0;
+   else {
+   qid_param = strtok(end_param, ",");
+   if (!qid_param || strlen(qid_param) == 0 || 
!isdigit(*qid_param))
+   return -EINVAL;
+
+   *queue_id = strtoul(qid_param, &end_param, 0);
+   }
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (*queue_id >= UINT16_MAX)
+   return -EINVAL;
+
+   return 0;
+}
+
+static int
+eth_dev_add_burst_mode(unsigned long port_id, unsigned long queue_id,
+   bool is_rx, struct rte_tel_data *d)
+{
+   struct rte_eth_burst_mode mode;
+   int ret;
+
+   if (is_rx)
+   ret = rte_eth_rx_burst_mode_get(port_id, queue_id, &mode);
+   else
+   ret = rte_eth_tx_burst_mode_get(port_id, queue_id, &mode);
+
+   if (ret == -ENOTSUP)
+   return 0;
+
+   if (ret != 0) {
+   RTE_ETHDEV_LOG(ERR,
+   "Failed to get burst mode for port %lu\n", port_id);
+   return ret;
+   }
+
+   rte_tel_data_add_dict_uint(d, "burst_flags", mode.flags);
+   rte_tel_data_add_dict_string(d, "burst_mode", mode.info);
+   return 0;
+}
+
+static int
+eth_dev_handle_port_rxq(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_eth_thresh *rx_thresh;
+   unsigned long port_id, queue_id;
+   struct rte_eth_rxconf *rxconf;
+   struct rte_eth_rxq_info qinfo;
+   struct rte_tel_data *offload;
+   int ret;
+
+   ret = parse_queue_params(params, true, &port_id, &queue_id);
+   if (ret != 0)
+   return ret;
+
+   ret = rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo);
+   if (ret != 0)
+   return ret;
+
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_string(d, "mempool_name", qinfo.mp->name);
+   rte_tel_data_add_dict_uint(d, "socket_id", qinfo.mp->socket_id);
+
+   rx_thresh = &qinfo.conf.rx_thresh;
+   rte_tel_data_add_dict_uint(d, "host_threshold", rx_thresh->hthresh);
+   rte_tel_data_add_dict_uint(d, "prefetch_threshold", rx_thresh->pthresh);
+   rte_tel_data_add_dict_uint(d, "writeback_threshold", 
rx_thresh->wthresh);
+
+   rxconf = &qinfo.conf;
+   rte_tel_data_add_dict_uint(d, "free_threshold", rxconf->rx_free_thresh);
+   rte_tel_data_add_dict_string(d, "rx_drop_en",
+   rxconf->rx_drop_en == 0 ? "off" : "on");
+   rte_tel_data_add_dict_string(d, "deferred_start",
+   rxconf->rx_deferred_start == 0 ? "off" : "on");
+   rte_tel_data_add_dict_uint(d, "rx_nseg", rxconf->rx_nseg);
+   rte_tel_data_add_dict_uint(d, "share_group", rxconf->share_group);
+   rte_tel_data_add_dict_uint(d, "share_qid", rxconf->share_qid);
+
+   offload = rte_tel_data_alloc();
+   if (offload == NULL)
+   return -ENOMEM;
+
+   eth_dev_parse_rx_offloads(rxconf->offloads, offload);
+   rte_tel_data_add_dict_container(d, "offloads", offload, 0);
+
+   rte_tel_data_add_dict_uint(d, "

[PATCH 03/10] ethdev: support telemetry query flow ctrl info

2023-05-30 Thread Jie Hai
This patch supports querying flow control info via telemetry.
The command is like:
--> /ethdev/flow_ctrl,0
{
  "/ethdev/flow_ctrl": {
"high_waterline": "0x0",
"low_waterline": "0x0",
"pause_time": "0x",
"send_xon": "off",
"mac_ctrl_frame_fwd": "off",
"rx_pause": "off",
"tx_pause": "off",
"autoneg": "off"
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 51 +
 1 file changed, 51 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 3207b3177256..5cdb310ca979 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7121,6 +7121,55 @@ eth_dev_handle_port_macs(const char *cmd __rte_unused,
return 0;
 }
 
+static int
+eth_dev_handle_port_flow_ctrl(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_eth_fc_conf fc_conf;
+   unsigned long port_id;
+   char *end_param;
+   bool rx_fc_en;
+   bool tx_fc_en;
+   int ret;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(port_id))
+   return -EINVAL;
+
+   ret = rte_eth_dev_flow_ctrl_get(port_id, &fc_conf);
+   if (ret != 0) {
+   RTE_ETHDEV_LOG(ERR,
+   "Failed to get flow ctrl info, ret = %d\n", ret);
+   return ret;
+   }
+
+   rx_fc_en = fc_conf.mode == RTE_ETH_FC_RX_PAUSE ||
+  fc_conf.mode == RTE_ETH_FC_FULL;
+   tx_fc_en = fc_conf.mode == RTE_ETH_FC_TX_PAUSE ||
+  fc_conf.mode == RTE_ETH_FC_FULL;
+
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_uint_hex(d, "high_waterline", fc_conf.high_water, 
0);
+   rte_tel_data_add_dict_uint_hex(d, "low_waterline", fc_conf.low_water, 
0);
+   rte_tel_data_add_dict_uint_hex(d, "pause_time", fc_conf.pause_time, 0);
+   rte_tel_data_add_dict_string(d, "send_xon", fc_conf.send_xon ? "on" : 
"off");
+   rte_tel_data_add_dict_string(d, "mac_ctrl_frame_fwd",
+   fc_conf.mac_ctrl_frame_fwd ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "rx_pause", rx_fc_en ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "tx_pause", tx_fc_en ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "autoneg", fc_conf.autoneg ? "on" : 
"off");
+
+   return 0;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
@@ -7144,4 +7193,6 @@ RTE_INIT(ethdev_init_telemetry)
"Returns module EEPROM info with SFF specs. Parameters: 
int port_id");
rte_telemetry_register_cmd("/ethdev/macs", eth_dev_handle_port_macs,
"Returns the MAC addresses for a port. Parameters: int 
port_id");
+   rte_telemetry_register_cmd("/ethdev/flow_ctrl", 
eth_dev_handle_port_flow_ctrl,
+   "Returns flow ctrl info for a port. Parameters: 
unsigned port_id");
 }
-- 
2.33.0



[PATCH 05/10] ethdev: support telemetry query Tx queue info

2023-05-30 Thread Jie Hai
This patch supports querying information about Tx queues.
The command is like:
--> /ethdev/tx_queue,0,0
{
  "/ethdev/tx_queue": {
"host_threshold": 0,
"prefetch_threshold": 0,
"writeback_threshold": 0,
"rs_threshold": 32,
"free_threshold": 928,
"deferred_start": "off",
"offloads": [
  "MBUF_FAST_FREE"
],
"queue_state": 1,
"nb_desc": 1024,
"burst_flags": 0,
"burst_mode": "Vector Neon"
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 50 +
 1 file changed, 50 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 35c13df1c110..315334321cb3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7295,6 +7295,54 @@ eth_dev_handle_port_rxq(const char *cmd __rte_unused,
return ret;
 }
 
+static int
+eth_dev_handle_port_txq(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_eth_thresh *tx_thresh;
+   unsigned long port_id, queue_id;
+   struct rte_eth_txconf *txconf;
+   struct rte_eth_txq_info qinfo;
+   struct rte_tel_data *offload;
+   int ret;
+
+   ret = parse_queue_params(params, false, &port_id, &queue_id);
+   if (ret != 0)
+   return ret;
+
+   ret = rte_eth_tx_queue_info_get(port_id, queue_id, &qinfo);
+   if (ret != 0)
+   return ret;
+
+   rte_tel_data_start_dict(d);
+   tx_thresh = &qinfo.conf.tx_thresh;
+   txconf = &qinfo.conf;
+   rte_tel_data_add_dict_uint(d, "host_threshold", tx_thresh->hthresh);
+   rte_tel_data_add_dict_uint(d, "prefetch_threshold", tx_thresh->pthresh);
+   rte_tel_data_add_dict_uint(d, "writeback_threshold", 
tx_thresh->wthresh);
+   rte_tel_data_add_dict_uint(d, "rs_threshold", txconf->tx_rs_thresh);
+   rte_tel_data_add_dict_uint(d, "free_threshold", txconf->tx_free_thresh);
+   rte_tel_data_add_dict_string(d, "deferred_start",
+   txconf->tx_deferred_start == 0 ? "off" : "on");
+
+   offload = rte_tel_data_alloc();
+   if (offload == NULL)
+   return -ENOMEM;
+
+   eth_dev_parse_tx_offloads(txconf->offloads, offload);
+   rte_tel_data_add_dict_container(d, "offloads", offload, 0);
+
+   rte_tel_data_add_dict_uint(d, "queue_state", qinfo.queue_state);
+   rte_tel_data_add_dict_uint(d, "nb_desc", qinfo.nb_desc);
+
+   ret = eth_dev_add_burst_mode(port_id, queue_id, false, d);
+   if (ret != 0)
+   rte_tel_data_free(offload);
+
+   return 0;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
@@ -7322,4 +7370,6 @@ RTE_INIT(ethdev_init_telemetry)
"Returns flow ctrl info for a port. Parameters: 
unsigned port_id");
rte_telemetry_register_cmd("/ethdev/rx_queue", eth_dev_handle_port_rxq,
"Returns Rx queue info for a port. Parameters: unsigned 
port_id, unsigned queue_id (Optional if only one queue)");
+   rte_telemetry_register_cmd("/ethdev/tx_queue", eth_dev_handle_port_txq,
+   "Returns Tx queue info for a port. Parameters: unsigned 
port_id, unsigned queue_id (Optional if only one queue)");
 }
-- 
2.33.0



[PATCH 06/10] ethdev: add firmware version in telemetry info command

2023-05-30 Thread Jie Hai
This patch adds the firmware version to the telemetry info command.
An example is like:
--> /ethdev/info,0
{
  "/ethdev/info": {
"name": ":bd:00.0",
"fw_version": "1.20.0.17",

   }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 315334321cb3..d906cc66d2f9 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6640,6 +6640,7 @@ eth_dev_handle_port_info(const char *cmd __rte_unused,
 {
struct rte_tel_data *rx_offload, *tx_offload;
struct rte_tel_data *rxq_state, *txq_state;
+   char fw_version[RTE_TEL_MAX_STRING_LEN];
char mac_addr[RTE_ETHER_ADDR_FMT_SIZE];
struct rte_eth_dev *eth_dev;
char *end_param;
@@ -6677,6 +6678,11 @@ eth_dev_handle_port_info(const char *cmd __rte_unused,
 
rte_tel_data_start_dict(d);
rte_tel_data_add_dict_string(d, "name", eth_dev->data->name);
+
+   if (rte_eth_dev_fw_version_get(port_id, fw_version,
+RTE_TEL_MAX_STRING_LEN) == 0)
+   rte_tel_data_add_dict_string(d, "fw_version", fw_version);
+
rte_tel_data_add_dict_int(d, "state", eth_dev->state);
rte_tel_data_add_dict_int(d, "nb_rx_queues",
eth_dev->data->nb_rx_queues);
-- 
2.33.0



[PATCH 07/10] ethdev: support telemetry query DCB info

2023-05-30 Thread Jie Hai
This patch supports querying DCB info.

The command is like:
--> /ethdev/dcb,0
{
  "/ethdev/dcb": {
"tc_num": 1,
"tc0": {
  "priority": 0,
  "bw_percent": "100%",
  "rxq_base": 0,
  "txq_base": 0,
  "nb_rxq": 4,
  "nb_txq": 4
}
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 85 +
 1 file changed, 85 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index d906cc66d2f9..dad9c5538149 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7349,6 +7349,89 @@ eth_dev_handle_port_txq(const char *cmd __rte_unused,
return 0;
 }
 
+static int
+eth_dev_add_dcb_tc(struct rte_eth_dcb_info *dcb_info, struct rte_tel_data *d)
+{
+   struct rte_tel_data *tcds[RTE_ETH_DCB_NUM_TCS] = {NULL};
+   struct rte_eth_dcb_tc_queue_mapping *tcq;
+   char bw_percent[RTE_TEL_MAX_STRING_LEN];
+   char name[RTE_TEL_MAX_STRING_LEN];
+   struct rte_tel_data *tcd;
+   uint32_t i;
+
+   for (i = 0; i < dcb_info->nb_tcs; i++) {
+   tcd = rte_tel_data_alloc();
+   if (tcd == NULL) {
+   while (i-- > 0)
+   rte_tel_data_free(tcds[i]);
+   return -ENOMEM;
+   }
+
+   tcds[i] = tcd;
+   rte_tel_data_start_dict(tcd);
+
+   rte_tel_data_add_dict_uint(tcd, "priority", 
dcb_info->prio_tc[i]);
+   snprintf(bw_percent, RTE_TEL_MAX_STRING_LEN,
+   "%u%%", dcb_info->tc_bws[i]);
+   rte_tel_data_add_dict_string(tcd, "bw_percent", bw_percent);
+
+   tcq = &dcb_info->tc_queue;
+   rte_tel_data_add_dict_uint(tcd, "rxq_base", 
tcq->tc_rxq[0][i].base);
+   rte_tel_data_add_dict_uint(tcd, "txq_base", 
tcq->tc_txq[0][i].base);
+   rte_tel_data_add_dict_uint(tcd, "nb_rxq", 
tcq->tc_rxq[0][i].nb_queue);
+   rte_tel_data_add_dict_uint(tcd, "nb_txq", 
tcq->tc_txq[0][i].nb_queue);
+
+   snprintf(name, RTE_TEL_MAX_STRING_LEN, "tc%u", i);
+   rte_tel_data_add_dict_container(d, name, tcd, 0);
+   }
+
+   return 0;
+}
+
+static int
+eth_dev_add_dcb_info(uint16_t port_id, struct rte_tel_data *d)
+{
+   struct rte_eth_dcb_info dcb_info;
+   int ret;
+
+   ret = rte_eth_dev_get_dcb_info(port_id, &dcb_info);
+   if (ret != 0) {
+   RTE_ETHDEV_LOG(ERR,
+   "Failed to get dcb info, ret = %d\n", ret);
+   return ret;
+   }
+
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_uint(d, "tc_num", dcb_info.nb_tcs);
+
+   if (dcb_info.nb_tcs > 0)
+   return eth_dev_add_dcb_tc(&dcb_info, d);
+
+   return 0;
+}
+
+static int
+eth_dev_handle_port_dcb(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   unsigned long port_id;
+   char *end_param;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(port_id))
+   return -EINVAL;
+
+   return eth_dev_add_dcb_info(port_id, d);
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
@@ -7378,4 +7461,6 @@ RTE_INIT(ethdev_init_telemetry)
"Returns Rx queue info for a port. Parameters: unsigned 
port_id, unsigned queue_id (Optional if only one queue)");
rte_telemetry_register_cmd("/ethdev/tx_queue", eth_dev_handle_port_txq,
"Returns Tx queue info for a port. Parameters: unsigned 
port_id, unsigned queue_id (Optional if only one queue)");
+   rte_telemetry_register_cmd("/ethdev/dcb", eth_dev_handle_port_dcb,
+   "Returns DCB info for a port. Parameters: unsigned 
port_id");
 }
-- 
2.33.0



[PATCH 08/10] ethdev: support telemetry query RSS info

2023-05-30 Thread Jie Hai
This patch supports querying RSS info by telemetry command.
The command is like:
-->  /ethdev/rss_info,0
{
  "/ethdev/rss_info": {
"rss_hf": "0x238c",
"rss_key_len": 40,
"rss_key": "6d5a56da255b0ec24167253d43a38fb0d0ca2b\
cbae7b30b477cb2da38030f20c6a42b73bbeac01fa"
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 86 +
 1 file changed, 86 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index dad9c5538149..6699b40d5e15 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7432,6 +7432,90 @@ eth_dev_handle_port_dcb(const char *cmd __rte_unused,
return eth_dev_add_dcb_info(port_id, d);
 }
 
+static int
+eth_dev_add_rss_info(struct rte_eth_rss_conf *rss_conf, struct rte_tel_data *d)
+{
+   const uint32_t key_len = rss_conf->rss_key_len * 2 + 1;
+   char *rss_key;
+   char *key;
+   uint32_t i;
+   int ret;
+
+   key = rte_malloc(NULL, key_len, 0);
+   if (key == NULL)
+   return -ENOMEM;
+
+   rss_key = rte_malloc(NULL, key_len, 0);
+   if (rss_key == NULL) {
+   ret = -ENOMEM;
+   goto free_key;
+   }
+
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_uint_hex(d, "rss_hf", rss_conf->rss_hf, 0);
+   rte_tel_data_add_dict_uint(d, "rss_key_len", rss_conf->rss_key_len);
+
+   memset(rss_key, 0, key_len);
+   for (i = 0; i < rss_conf->rss_key_len; i++) {
+   ret = snprintf(key, key_len, "%02x", rss_conf->rss_key[i]);
+   if (ret < 0)
+   goto free_rss_key;
+   strlcat(rss_key, key, key_len);
+   }
+   ret = rte_tel_data_add_dict_string(d, "rss_key", rss_key);
+
+free_rss_key:
+   rte_free(rss_key);
+free_key:
+   rte_free(key);
+   return ret;
+}
+
+static int
+eth_dev_handle_port_rss_info(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_rss_conf rss_conf;
+   unsigned long port_id;
+   char *end_param;
+   int ret;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(port_id))
+   return -EINVAL;
+
+   ret = rte_eth_dev_info_get(port_id, &dev_info);
+   if (ret != 0) {
+   RTE_ETHDEV_LOG(ERR,
+   "Failed to get device info, ret = %d\n", ret);
+   return ret;
+   }
+
+   rss_conf.rss_key_len = dev_info.hash_key_size;
+   rss_conf.rss_key = rte_malloc(NULL, dev_info.hash_key_size, 0);
+   if (rss_conf.rss_key == NULL)
+   return -ENOMEM;
+
+   ret = rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);
+   if (ret != 0) {
+   rte_free(rss_conf.rss_key);
+   return ret;
+   }
+
+   ret = eth_dev_add_rss_info(&rss_conf, d);
+   rte_free(rss_conf.rss_key);
+   return ret;
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_eth_dev_logtype, INFO);
 
 RTE_INIT(ethdev_init_telemetry)
@@ -7463,4 +7547,6 @@ RTE_INIT(ethdev_init_telemetry)
"Returns Tx queue info for a port. Parameters: unsigned 
port_id, unsigned queue_id (Optional if only one queue)");
rte_telemetry_register_cmd("/ethdev/dcb", eth_dev_handle_port_dcb,
"Returns DCB info for a port. Parameters: unsigned 
port_id");
+   rte_telemetry_register_cmd("/ethdev/rss_info", 
eth_dev_handle_port_rss_info,
+   "Returns RSS info for a port. Parameters: unsigned 
port_id");
 }
-- 
2.33.0



[PATCH 09/10] ethdev: support telemetry query FEC info

2023-05-30 Thread Jie Hai
This patch supports getting FEC information by telemetry.
The command is like:
--> /ethdev/fec,0
{
  "/ethdev/fec": {
"fec_mode": "off",
"fec_capability": {
  "10_Gbps": "off auto baser"
}
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 145 
 1 file changed, 145 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 6699b40d5e15..f7a84ae6c35d 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -160,6 +160,16 @@ enum {
STAT_QMAP_RX
 };
 
+static const struct {
+   uint32_t capa;
+   const char *name;
+} rte_eth_fec_capa_name[] = {
+   { RTE_ETH_FEC_MODE_CAPA_MASK(NOFEC),"off"   },
+   { RTE_ETH_FEC_MODE_CAPA_MASK(AUTO), "auto"  },
+   { RTE_ETH_FEC_MODE_CAPA_MASK(BASER),"baser" },
+   { RTE_ETH_FEC_MODE_CAPA_MASK(RS),   "rs"},
+};
+
 int
 rte_eth_iterator_init(struct rte_dev_iterator *iter, const char *devargs_str)
 {
@@ -7516,6 +7526,139 @@ eth_dev_handle_port_rss_info(const char *cmd 
__rte_unused,
return ret;
 }
 
+static const char *
+eth_dev_fec_capa_to_string(uint32_t fec_capa)
+{
+   uint32_t i;
+
+   for (i = 0; i < RTE_DIM(rte_eth_fec_capa_name); i++) {
+   if ((fec_capa & rte_eth_fec_capa_name[i].capa) != 0)
+   return rte_eth_fec_capa_name[i].name;
+   }
+
+   return "unknown";
+}
+
+static void
+eth_dev_fec_capas_to_string(uint32_t fec_capa, char *fec_name, uint32_t len)
+{
+   bool valid = false;
+   size_t count = 0;
+   uint32_t i;
+
+   for (i = 0; i < RTE_DIM(rte_eth_fec_capa_name); i++) {
+   if ((fec_capa & rte_eth_fec_capa_name[i].capa) != 0) {
+   strlcat(fec_name, rte_eth_fec_capa_name[i].name, len);
+   count = strlcat(fec_name, " ", len);
+   valid = true;
+   }
+   }
+
+   if (!valid)
+   count = snprintf(fec_name, len, "unknown ");
+
+   if (count >= len) {
+   RTE_ETHDEV_LOG(WARNING, "FEC capa names may be truncated\n");
+   count = len;
+   }
+
+   fec_name[count - 1] = '\0';
+}
+
+static int
+eth_dev_get_fec_capability(uint16_t port_id, struct rte_tel_data *d)
+{
+   struct rte_eth_fec_capa *speed_fec_capa;
+   char fec_name[RTE_TEL_MAX_STRING_LEN];
+   char speed[RTE_TEL_MAX_STRING_LEN];
+   uint32_t capa_num;
+   uint32_t i, j;
+   int ret;
+
+   ret = rte_eth_fec_get_capability(port_id, NULL, 0);
+   if (ret <= 0)
+   return ret == 0 ? -EINVAL : ret;
+
+   capa_num = ret;
+   speed_fec_capa = calloc(capa_num, sizeof(struct rte_eth_fec_capa));
+   if (speed_fec_capa == NULL)
+   return -ENOMEM;
+
+   ret = rte_eth_fec_get_capability(port_id, speed_fec_capa, capa_num);
+   if (ret <= 0) {
+   ret = ret == 0 ? -EINVAL : ret;
+   goto out;
+   }
+
+   for (i = 0; i < capa_num; i++) {
+   memset(fec_name, 0, RTE_TEL_MAX_STRING_LEN);
+   eth_dev_fec_capas_to_string(speed_fec_capa[i].capa, fec_name,
+   RTE_TEL_MAX_STRING_LEN);
+
+   memset(speed, 0, RTE_TEL_MAX_STRING_LEN);
+   ret = snprintf(speed, RTE_TEL_MAX_STRING_LEN, "%s",
+   rte_eth_link_speed_to_str(speed_fec_capa[i].speed));
+   if (ret < 0)
+   goto out;
+
+   for (j = 0; j < strlen(speed); j++) {
+   if (speed[j] == ' ')
+   speed[j] = '_';
+   }
+
+   rte_tel_data_add_dict_string(d, speed, fec_name);
+   }
+
+out:
+   free(speed_fec_capa);
+   return ret > 0 ? 0 : ret;
+}
+
+static int
+eth_dev_handle_port_fec(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_tel_data *fec_capas;
+   unsigned long port_id;
+   uint32_t fec_mode;
+   char *end_param;
+   int ret;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(port_id))
+   return -EINVAL;
+
+   ret = rte_eth_fec_get(port_id, &fec_mode);
+   if (ret != 0)
+   return ret;
+
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_string(d, "fec_mode",
+eth_dev_fec_capa_to_string(fec_mode));
+
+   fec_capas = rte_tel_data_alloc();
+   if (fec_capas == NULL)
+   return -ENOMEM;
+
+   rte_tel_data_start_dict(fec_capas);
+   ret = eth_

[PATCH 10/10] ethdev: support telemetry query VLAN info

2023-05-30 Thread Jie Hai
This patch supports querying VLAN information by telemetry.
The command is like:
--> /ethdev/vlan,0
{
  "/ethdev/vlan": {
"pvid": 0,
"hw_vlan_reject_tagged": 0,
"hw_vlan_reject_untagged": 0,
"hw_vlan_insert_pvid": 0,
"VLAN_STRIP": "off",
"VLAN_EXTEND": "off",
"QINQ_STRIP": "off",
"VLAN_FILTER": "on",
"vlan_num": 3,
"vlan_ids": {
  "vlan_0_to_63": [
1,
20
  ],
  "vlan_192_to_255": [
200
  ]
}
  }
}

Signed-off-by: Jie Hai 
---
 lib/ethdev/rte_ethdev.c | 114 
 1 file changed, 114 insertions(+)

diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f7a84ae6c35d..ba3484d8e870 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -7659,6 +7659,118 @@ eth_dev_handle_port_fec(const char *cmd __rte_unused,
return 0;
 }
 
+static int
+eth_dev_add_vlan_id(int port_id, struct rte_tel_data *d)
+{
+   struct rte_tel_data *vlan_blks[64] = {NULL};
+   uint16_t vlan_num, vidx, vbit, num_blks;
+   char blk_name[RTE_TEL_MAX_STRING_LEN];
+   struct rte_vlan_filter_conf *vfc;
+   struct rte_tel_data *vlan_blk;
+   struct rte_tel_data *vd;
+   uint64_t bit_width;
+   uint64_t vlan_id;
+
+   vd = rte_tel_data_alloc();
+   if (vd == NULL)
+   return -ENOMEM;
+
+   vfc = &rte_eth_devices[port_id].data->vlan_filter_conf;
+   bit_width = CHAR_BIT * sizeof(uint64_t);
+   vlan_num = 0;
+   num_blks = 0;
+
+   rte_tel_data_start_dict(vd);
+   for (vidx = 0; vidx < RTE_DIM(vfc->ids); vidx++) {
+   if (vfc->ids[vidx] == 0)
+   continue;
+
+   vlan_blk = rte_tel_data_alloc();
+   if (vlan_blk == NULL)
+   goto free_all;
+
+   vlan_blks[num_blks] = vlan_blk;
+   num_blks++;
+   snprintf(blk_name, RTE_TEL_MAX_STRING_LEN, "vlan_%lu_to_%lu",
+bit_width * vidx, bit_width * (vidx + 1) - 1);
+   rte_tel_data_start_array(vlan_blk, RTE_TEL_UINT_VAL);
+   rte_tel_data_add_dict_container(vd, blk_name, vlan_blk, 0);
+
+   for (vbit = 0; vbit < bit_width; vbit++) {
+   if ((vfc->ids[vidx] & RTE_BIT64(vbit)) == 0)
+   continue;
+
+   vlan_id = bit_width * vidx + vbit;
+   rte_tel_data_add_array_uint(vlan_blk, vlan_id);
+   vlan_num++;
+   }
+   }
+
+   rte_tel_data_add_dict_uint(d, "vlan_num", vlan_num);
+   rte_tel_data_add_dict_container(d, "vlan_ids", vd, 0);
+
+   return 0;
+
+free_all:
+   while (num_blks-- > 0)
+   rte_tel_data_free(vlan_blks[num_blks]);
+
+   rte_tel_data_free(vd);
+   return -ENOMEM;
+}
+
+static int
+eth_dev_handle_port_vlan(const char *cmd __rte_unused,
+   const char *params,
+   struct rte_tel_data *d)
+{
+   struct rte_eth_txmode *txmode;
+   struct rte_eth_conf dev_conf;
+   unsigned long port_id;
+   int offload, ret;
+   char *end_param;
+
+   if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+   return -EINVAL;
+
+   port_id = strtoul(params, &end_param, 0);
+   if (*end_param != '\0')
+   RTE_ETHDEV_LOG(NOTICE,
+   "Extra parameters passed to ethdev telemetry command, 
ignoring\n");
+
+   if (port_id >= UINT16_MAX || !rte_eth_dev_is_valid_port(port_id))
+   return -EINVAL;
+
+   ret = rte_eth_dev_conf_get(port_id, &dev_conf);
+   if (ret != 0) {
+   RTE_ETHDEV_LOG(ERR,
+   "Failed to get device configuration, ret = %d\n", ret);
+   return ret;
+   }
+
+   txmode = &dev_conf.txmode;
+   rte_tel_data_start_dict(d);
+   rte_tel_data_add_dict_uint(d, "pvid", txmode->pvid);
+   rte_tel_data_add_dict_uint(d, "hw_vlan_reject_tagged",
+   txmode->hw_vlan_reject_tagged);
+   rte_tel_data_add_dict_uint(d, "hw_vlan_reject_untagged",
+   txmode->hw_vlan_reject_untagged);
+   rte_tel_data_add_dict_uint(d, "hw_vlan_insert_pvid",
+   txmode->hw_vlan_insert_pvid);
+
+   offload = rte_eth_dev_get_vlan_offload(port_id);
+   rte_tel_data_add_dict_string(d, "VLAN_STRIP",
+   ((offload & RTE_ETH_VLAN_STRIP_OFFLOAD) != 0) ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "VLAN_EXTEND",
+   ((offload & RTE_ETH_VLAN_EXTEND_OFFLOAD) != 0) ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "QINQ_STRIP",
+   ((offload & RTE_ETH_QINQ_STRIP_OFFLOAD) != 0) ? "on" : "off");
+   rte_tel_data_add_dict_string(d, "VLAN_FILTER",
+   ((offload & RTE_ETH_VLAN_FILTER_OFFLOAD) != 0) ? "on" : "off");
+
+   return eth_dev_add_vlan_id(port_id, d);
+}
+
 RTE_LOG_REGISTER_DEFAULT(rte_

[PATCH v3] common/cnxk: add new APIs for batch operations

2023-05-30 Thread Ashwin Sekhar T K
Add new APIs for counting and extracting allocated objects
from a single cache line in the batch alloc memory.

Signed-off-by: Ashwin Sekhar T K 
---
 drivers/common/cnxk/roc_npa.h | 78 ++-
 1 file changed, 67 insertions(+), 11 deletions(-)

diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
index e1e164499e..4ad5f044b5 100644
--- a/drivers/common/cnxk/roc_npa.h
+++ b/drivers/common/cnxk/roc_npa.h
@@ -209,7 +209,6 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
uint64_t *buf,
   unsigned int num, const int dis_wait,
   const int drop)
 {
-   unsigned int i;
int64_t *addr;
uint64_t res;
union {
@@ -220,10 +219,6 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
uint64_t *buf,
if (num > ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS)
return -1;
 
-   /* Zero first word of every cache line */
-   for (i = 0; i < num; i += (ROC_ALIGN / sizeof(uint64_t)))
-   buf[i] = 0;
-
addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
   NPA_LF_AURA_BATCH_ALLOC);
cmp.u = 0;
@@ -240,6 +235,9 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
uint64_t *buf,
return 0;
 }
 
+/*
+ * Wait for a batch alloc operation on a cache line to complete.
+ */
 static inline void
 roc_npa_batch_alloc_wait(uint64_t *cache_line, unsigned int wait_us)
 {
@@ -255,6 +253,23 @@ roc_npa_batch_alloc_wait(uint64_t *cache_line, unsigned 
int wait_us)
break;
 }
 
+/*
+ * Count the number of pointers in a single batch alloc cache line.
+ */
+static inline unsigned int
+roc_npa_aura_batch_alloc_count_line(uint64_t *line, unsigned int wait_us)
+{
+   struct npa_batch_alloc_status_s *status;
+
+   status = (struct npa_batch_alloc_status_s *)line;
+   roc_npa_batch_alloc_wait(line, wait_us);
+
+   return status->count;
+}
+
+/*
+ * Count the number of pointers in a sequence of batch alloc cache lines.
+ */
 static inline unsigned int
 roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, unsigned int num,
   unsigned int wait_us)
@@ -279,6 +294,40 @@ roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, 
unsigned int num,
return count;
 }
 
+/*
+ * Extract allocated pointers from a single batch alloc cache line. This API
+ * only extracts the required number of pointers from the cache line and
+ * adjusts status->count so that a subsequent call to this API can
+ * extract the remaining pointers in the cache line appropriately.
+ */
+static inline unsigned int
+roc_npa_aura_batch_alloc_extract_line(uint64_t *buf, uint64_t *line,
+ unsigned int num, unsigned int *rem)
+{
+   struct npa_batch_alloc_status_s *status;
+   unsigned int avail;
+
+   status = (struct npa_batch_alloc_status_s *)line;
+   roc_npa_batch_alloc_wait(line, 0);
+   avail = status->count;
+   num = avail > num ? num : avail;
+   if (num)
+   memcpy(buf, &line[avail - num], num * sizeof(uint64_t));
+   avail -= num;
+   if (avail == 0) {
+   /* Clear the lowest 7 bits of the first pointer */
+   buf[0] &= ~0x7FUL;
+   status->ccode = 0;
+   }
+   status->count = avail;
+   *rem = avail;
+
+   return num;
+}
+
+/*
+ * Extract all allocated pointers from a sequence of batch alloc cache lines.
+ */
 static inline unsigned int
 roc_npa_aura_batch_alloc_extract(uint64_t *buf, uint64_t *aligned_buf,
 unsigned int num)
@@ -330,11 +379,15 @@ roc_npa_aura_op_bulk_free(uint64_t aura_handle, uint64_t 
const *buf,
}
 }
 
+/*
+ * Issue a batch alloc operation on a sequence of cache lines, wait for the
+ * batch alloc to complete and copy the pointers out into the user buffer.
+ */
 static inline unsigned int
 roc_npa_aura_op_batch_alloc(uint64_t aura_handle, uint64_t *buf,
-   uint64_t *aligned_buf, unsigned int num,
-   const int dis_wait, const int drop,
-   const int partial)
+   unsigned int num, uint64_t *aligned_buf,
+   unsigned int aligned_buf_sz, const int dis_wait,
+   const int drop, const int partial)
 {
unsigned int count, chunk, num_alloc;
 
@@ -344,9 +397,12 @@ roc_npa_aura_op_batch_alloc(uint64_t aura_handle, uint64_t 
*buf,
 
count = 0;
while (num) {
-   chunk = (num > ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS) ?
- ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS :
- num;
+   /* Make sure that the pointers allocated fit into the cache
+* lines reserved.
+*/
+   chunk = aligned_buf_sz / sizeof(uint64_
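
To illustrate how the new per-line helpers are meant to compose, a rough
caller-side sketch (illustrative only, not from the patch itself; aura setup,
buffer zeroing and error handling are assumed to be done elsewhere, and the
wait time is arbitrary):

uint64_t line_buf[ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS] __rte_cache_aligned;
uint64_t ptrs[32];
unsigned int got, rem;

if (roc_npa_aura_batch_alloc_issue(aura_handle, line_buf, 64, 0, 0))
	return;
/* Wait for the first cache line and see how many pointers landed */
got = roc_npa_aura_batch_alloc_count_line(line_buf, 100);
/* Pull out at most 32 of them; 'rem' reports what is left in the line */
got = roc_npa_aura_batch_alloc_extract_line(ptrs, line_buf, 32, &rem);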

Re: [PATCH 1/3] security: introduce out of place support for inline ingress

2023-05-30 Thread Jerin Jacob
> > > > > +  */
> > > > > + uint32_t ingress_oop : 1;
> > > > > +
> > > > >   /** Reserved bit fields for future extension
> > > > >*
> > > > >* User should ensure reserved_opts is cleared as it may change 
> > > > > in
> > > > > @@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
> > > > >*
> > > > >* Note: Reduce number of bits in reserved_opts for every new 
> > > > > option.
> > > > >*/
> > > > > - uint32_t reserved_opts : 17;
> > > > > + uint32_t reserved_opts : 16;
> > > > >  };
> > > >
> > > > NAK
> > > > Let me repeat the reserved bit rant. YAGNI
> > > >
> > > > Reserved space is not usable without ABI breakage unless the existing
> > > > code enforces that reserved space has to be zero.
> > > >
> > > > Just saying "User should ensure reserved_opts is cleared" is not enough.
> > >
> > > Yes. I think, we need to enforce having _init functions for the
> > > structures which are using reserved fields.
> > >
> > > On the same note on YAGNI, I am wondering why NOT introduce an
> > > RTE_NEXT_ABI macro kind of scheme to compile out ABI-breaking changes.
> > > By keeping RTE_NEXT_ABI disabled by default and enabling it explicitly
> > > if the user wants it, we avoid waiting a year for any ABI-breaking
> > > changes. There are a lot of "fixed appliance" customers (not OS
> > > distribution driven customers) who are willing to recompile DPDK for a
> > > new feature. What are we losing with this scheme?
> >
> > RTE_NEXT_ABI is described in the ABI policy.
> > We are not doing it currently, but I think we could
> > when it is not too much complicate in the code.
> >
> > The only problems I see are:
> > - more #ifdef clutter
> > - 2 binary versions to test
> > - CI and checks must handle RTE_NEXT_ABI version
>
> I think, we have two buckets of ABI breakages via RTE_NEXT_ABI
>
> 1) Changes that introduce compilation failures, like adding a new
> argument to an API or changing an API name, etc.
> 2) Structure size changes which won't affect the compilation but break
> the ABI for shared library usage.
>
> I think, (1) is very disruptive, and I have not seen such changes
> recently. I think, we should avoid (1) for non-XX.11 releases (or two-
> or three-year cycles if we decide on that path).
>
> The (2) cases are very common due to the fact that HW features are
> evolving. I think, to address (2), we have two options:
> a) Have reserved fields and _init() functions to initialize the structures
> b) Follow YAGNI style and introduce RTE_NEXT_ABI for structure size changes.
>
> The above concerns[1] can greatly reduce with option b OR option a.
>
> [1]
>  1) more #ifdef clutter
> For option (a) this is not needed; for option (b) the clutter will be
> limited: it will be around the structure which adds the new field and
> around the FULL block where new functions are added (not inside the
> functions)
>
> 2) 2 binary versions to test
> For option (a) this is not needed; for option (b) it is limited, as only
> new features require testing the second binary (rather than NOT
> adding a new feature).
>
>  3) CI and checks must handle RTE_NEXT_ABI version
>
> I think, it is cheap to add this, at least for compilation test.
>
> IMO, we need to move the API-breaking release to something like a
> three-year time frame to have a very good end-user experience,
> allow ABI-related changes to get in every release, and force a
> _rebuild_ of shared objects in the major LTS release.
>
> I think, if we can decide (a) vs (b) in this major LTS version (23.11),
> then we can align the code accordingly; especially for (a) we need to add
> _init() functions.
>
> Thoughts?

Not much input from the mailing list. Can we discuss this at the next TB
meeting? Especially how to align with the next LTS release on:
- YAGNI vs reserved fields with _init()
- What it takes to extend the API-breaking release beyond one year as a
first step.
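
For reference, a minimal sketch of what option (a) could look like; the names
here are illustrative, not an existing DPDK API:

/* Reserved space is only safe to repurpose later if the library
 * guarantees it starts out zeroed; an _init() helper enforces that. */
struct rte_feature_conf {
	uint32_t enable : 1;
	uint32_t reserved_opts : 31; /* must be zero today */
};

static inline void
rte_feature_conf_init(struct rte_feature_conf *conf)
{
	memset(conf, 0, sizeof(*conf)); /* clears reserved_opts */
	conf->enable = 1;               /* defaults set explicitly */
}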


Re: [PATCH v6 0/3] add telemetry cmds for ring

2023-05-30 Thread Jie Hai

Hi, Thomas and all maintainers,
Kindly ping for comments, thanks.

On 2023/5/9 17:24, Jie Hai wrote:

This patch set supports telemetry cmd to list rings and dump information
of a ring by its name.

v1->v2:
1. Add space after "switch".
2. Fix wrong strlen parameter.

v2->v3:
1. Remove prefix "rte_" for static function.
2. Add Acked-by Konstantin Ananyev for PATCH 1.
3. Introduce functions to return strings instead copy strings.
4. Check pointer to memzone of ring.
5. Remove redundant variable.
6. Hold lock when access ring data.

v3->v4:
1. Update changelog according to reviews of Honnappa Nagarahalli.
2. Add Reviewed-by Honnappa Nagarahalli.
3. Correct grammar in help information.
4. Correct spell warning on "te" reported by checkpatch.pl.
5. Use ring_walk() to query ring info instead of rte_ring_lookup().
6. Fix the type definition of the flag field of rte_ring that does not match the usage.
7. Use rte_tel_data_add_dict_uint_hex instead of rte_tel_data_add_dict_u64
for mask and flags.

v4->v5:
1. Add Acked-by Konstantin Ananyev and Chengwen Feng.
2. Add ABI change explanation for commit message of patch 1/3.

v5->v6:
1. Add Acked-by Morten Brørup.
2. Fix incorrect reference of commit.

Jie Hai (3):
   ring: fix unmatched type definition and usage
   ring: add telemetry cmd to list rings
   ring: add telemetry cmd for ring info

  lib/ring/meson.build |   1 +
  lib/ring/rte_ring.c  | 139 +++
  lib/ring/rte_ring_core.h |   2 +-
  3 files changed, 141 insertions(+), 1 deletion(-)



[PATCH] event/cnxk: add wmb after steorl for event mode

2023-05-30 Thread Srujana Challa
From: Srujana Challa 

LMTST area can be overwritten before being read by HW between two
consecutive steorl operations. Hence, add wmb() after the steorl op to
make sure the lmtst operation is complete.

Signed-off-by: Srujana Challa 
---
 drivers/event/cnxk/cn10k_tx_worker.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_tx_worker.h 
b/drivers/event/cnxk/cn10k_tx_worker.h
index c18786a14c..81fe31c4b9 100644
--- a/drivers/event/cnxk/cn10k_tx_worker.h
+++ b/drivers/event/cnxk/cn10k_tx_worker.h
@@ -43,7 +43,6 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
*m, uint64_t *cmd,
 const uint64_t *txq_data, const uint32_t flags)
 {
uint8_t lnum = 0, loff = 0, shft = 0;
-   uint16_t ref_cnt = m->refcnt;
struct cn10k_eth_txq *txq;
uintptr_t laddr;
uint16_t segdw;
@@ -98,10 +97,9 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
*m, uint64_t *cmd,
 
roc_lmt_submit_steorl(lmt_id, pa);
 
-   if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
-   if (ref_cnt > 1)
-   rte_io_wmb();
-   }
+   /* Memory barrier to make sure lmtst store completes */
+   rte_io_wmb();
+
return 1;
 }
 
-- 
2.25.1



Re: dpdk: Inquiry about vring cleanup during packets transmission

2023-05-30 Thread Maxime Coquelin

Hello,

On 5/27/23 09:08, wangzengyuan wrote:

Hi,

  I am writing to inquire about the vring cleanup process during 
packet transmission.


In the virtio_xmit_pkts function, there is the following code:

  nb_used = virtqueue_nused(vq);

  if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh))

    virtio_xmit_cleanup(vq, nb_used);

In other words, cleaning is performed when the number of items used in 
the vring exceeds (vq->vq_nentries - vq->vq_free_thresh). In the case of 
a vring size of 4096, at least (4096-32) items need to be cleaned at 
once, which will take a considerable amount of time.


I'm curious why not clean up fewer items each time to avoid taking up 
too much CPU time in one transmission. Because during the debugging 
process, I found that cleaning up thousands of items at once takes up a 
considerable amount of time.


As I am not familiar with this process, I would appreciate it if you 
could provide me with some information on what its purpose is.


Both the Tx and Rx queue free thresholds are configurable via the ethdev
APIs:

int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
uint16_t nb_tx_desc, unsigned int socket_id,
const struct rte_eth_txconf *tx_conf);

int rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
uint16_t nb_rx_desc, unsigned int socket_id,
const struct rte_eth_rxconf *rx_conf,
struct rte_mempool *mb_pool);

As you are using large rings, your application may use the above APIs to set 
more appropriate values.
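
For example, a sketch of lowering the Tx free threshold on a 4096-entry ring
so completions are reclaimed in smaller batches (values are illustrative;
check the limits the driver reports in dev_info):

struct rte_eth_dev_info dev_info;
struct rte_eth_txconf txconf;

rte_eth_dev_info_get(port_id, &dev_info);
txconf = dev_info.default_txconf;
txconf.tx_free_thresh = 256; /* reclaim sooner than the default */
rte_eth_tx_queue_setup(port_id, 0, 4096, rte_socket_id(), &txconf);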


Regards,
Maxime


Best regards,

Zengyuan Wang





Re: [PATCH] event/cnxk: add wmb after steorl for event mode

2023-05-30 Thread Jerin Jacob
On Tue, May 30, 2023 at 3:12 PM Srujana Challa  wrote:
>
> From: Srujana Challa 
>
> LMTST area can be overwritten before being read by HW between two
> consecutive steorl operations. Hence, add wmb() after the steorl op to
> make sure the lmtst operation is complete.

lmtst -> LMTST

Change the subject to "fix "
And add Fixes: tag


>
> Signed-off-by: Srujana Challa 
> ---
>  drivers/event/cnxk/cn10k_tx_worker.h | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/event/cnxk/cn10k_tx_worker.h 
> b/drivers/event/cnxk/cn10k_tx_worker.h
> index c18786a14c..81fe31c4b9 100644
> --- a/drivers/event/cnxk/cn10k_tx_worker.h
> +++ b/drivers/event/cnxk/cn10k_tx_worker.h
> @@ -43,7 +43,6 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
> *m, uint64_t *cmd,
>  const uint64_t *txq_data, const uint32_t flags)
>  {
> uint8_t lnum = 0, loff = 0, shft = 0;
> -   uint16_t ref_cnt = m->refcnt;
> struct cn10k_eth_txq *txq;
> uintptr_t laddr;
> uint16_t segdw;
> @@ -98,10 +97,9 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
> *m, uint64_t *cmd,
>
> roc_lmt_submit_steorl(lmt_id, pa);
>
> -   if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
> -   if (ref_cnt > 1)
> -   rte_io_wmb();
> -   }
> +   /* Memory barrier to make sure lmtst store completes */
> +   rte_io_wmb();
> +
> return 1;
>  }
>
> --
> 2.25.1
>


[PATCH v6 00/21] lib: add pdcp protocol

2023-05-30 Thread Anoob Joseph
Add Packet Data Convergence Protocol (PDCP) processing library.

The library is similar to lib_ipsec which provides IPsec processing
capabilities in DPDK.

PDCP would involve roughly the following operations,
1. Transfer of user plane data
2. Transfer of control plane data
3. Header compression
4. Uplink data compression
5. Ciphering and integrity protection

The PDCP library provides the following control path APIs that are used to
configure various PDCP entities:
1. rte_pdcp_entity_establish()
2. rte_pdcp_entity_suspend()
3. rte_pdcp_entity_release()

PDCP process is split into 2 parts. One before crypto processing
(rte_pdcp_pkt_pre_process()) and one after crypto processing
(rte_pdcp_pkt_post_process()). Since cryptodev dequeue can return crypto
operations belonging to multiple entities, rte_pdcp_pkt_crypto_group()
is added to help group crypto operations belonging to the same entity.

Similar to lib IPsec, lib PDCP would allow the application to use the same API
sequence while leveraging protocol offload features enabled by the rte_security
library. Lib PDCP would internally change the handles registered for
*pre_process* and *post_process* based on features enabled in the entity.

Lib PDCP would create the required sessions on the device provided in the
entity to minimize the application requirements. The crypto_op allocation and
free would also be done internally by lib PDCP to allow the library to create
crypto ops as required for the input packets. For example, when control PDUs are
received, no cryptodev enqueue-dequeue is expected for the same and lib PDCP
is expected to handle it differently.

Lib PDCP utilizes the reorder library for implementing in-order delivery. It
utilizes the bitmap library for implementing status reports and for tracking
the COUNT value of the packets received. To allow the application to choose a
timer implementation of its choice, lib PDCP allows the application to
configure handles that can be used for starting & stopping timers. Upon
expiry, the application can call the corresponding PDCP API
(``rte_pdcp_t_reordering_expiry_handle``) to handle the event. Unit tests are
added to verify both rte_timer based timers as well as rte_eventdev based
timers.

PDCP tracks the sequence numbers of the received packets, and during events
such as re-establishment it is required to generate status reports and
transmit them to the peer. This series introduces ``rte_pdcp_control_pdu_create`` for handling
control PDU generation.

Changes in v6:
- Rebased
- Minor updates to documentation (Akhil)

Changes in v5:
- Deferred patch adding thread safe processing.
- Updated release notes & MAINTAINERS file.

Changes in v4:
- Disabled 'annotate locks' with lib PDCP
- Enable PDCP autotest only when lib is enabled
- Use rwlock instead of spinlock
- Avoid per packet checks for thread safety (Stephen)
- In DL path, save count determined during pre-process in mbuf and
  use the same in post-process. Determining count again may lead to
  errors
- Simplified DL path threads to allow more common code between SN 12
  & 18


Changes in v3:
- Addressed review comments (Akhil)
- Addressed build failure in CI (tests with lib eventdev disabled)
- Addressed checkpatch issues
- Set only positive values to rte_errno (Akhil)

Changes in v2:
- Added control PDU handling
- Added t-Reordering timer
- Added in-order delivery
- Added status PDU generation
- Rebased on top of new features added in reorder library
- Split base patch
- Increased test coverage
- Improved thread safety

Changes from RFC
- Implementation for all APIs covering basic control plane & user plane packets
- Unit test leveraging existing PDCP vectors available in test_cryptodev
- Unit test performing both UL & DL operations to verify various protocol
  features
- Updated documentation

Sample application sequence:

struct rte_mbuf **out_mb, *pkts[MAX_BURST_SIZE];
struct rte_crypto_op *cop[MAX_BURST_SIZE];
struct rte_pdcp_group grp[MAX_BURST_SIZE];
struct rte_pdcp_entity *pdcp_entity;
int nb_max_out_mb, ret, nb_grp;

/* Create PDCP entity */
pdcp_entity = rte_pdcp_entity_establish(&conf);

/**
 * Allocate buffer for holding mbufs returned during PDCP suspend,
 * release & post-process APIs.
 */

/* Max packets that can be cached in entity + burst size */
nb_max_out_mb = pdcp_entity->max_pkt_cache + 1;
out_mb = rte_malloc(NULL, nb_max_out_mb * sizeof(uintptr_t), 0);
if (out_mb == NULL) {
/* Handle error */
}

while (1) {
/* Receive packet and form mbuf */

/**
 * Prepare packets for crypto operation. Following operations
 * would be done,
 *
 * Transmitting entity/UL (only data PDUs):
 *  - Perform compression
 *  - Assign sequence number
 *  - Add PDCP header
 *  - Create & prepare crypto_op
   

[PATCH v6 01/21] net: add PDCP header

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add PDCP protocol header to be used for supporting PDCP protocol
processing.

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
Acked-by: Akhil Goyal 
---
 doc/api/doxy-api-index.md |   3 +-
 lib/net/meson.build   |   1 +
 lib/net/rte_pdcp_hdr.h| 147 ++
 3 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 lib/net/rte_pdcp_hdr.h

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index c709fd48ad..debbe4134f 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -127,7 +127,8 @@ The public API headers are grouped by topics:
   [Geneve](@ref rte_geneve.h),
   [eCPRI](@ref rte_ecpri.h),
   [L2TPv2](@ref rte_l2tpv2.h),
-  [PPP](@ref rte_ppp.h)
+  [PPP](@ref rte_ppp.h),
+  [PDCP hdr](@ref rte_pdcp_hdr.h)
 
 - **QoS**:
   [metering](@ref rte_meter.h),
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 379d161ee0..bd56f91c22 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -22,6 +22,7 @@ headers = files(
 'rte_geneve.h',
 'rte_l2tpv2.h',
 'rte_ppp.h',
+'rte_pdcp_hdr.h',
 )
 
 sources = files(
diff --git a/lib/net/rte_pdcp_hdr.h b/lib/net/rte_pdcp_hdr.h
new file mode 100644
index 00..72ae9a66cb
--- /dev/null
+++ b/lib/net/rte_pdcp_hdr.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#ifndef RTE_PDCP_HDR_H
+#define RTE_PDCP_HDR_H
+
+/**
+ * @file
+ *
+ * PDCP-related defines
+ *
+ * Based on - ETSI TS 138 323 V17.1.0 (2022-08)
+ * 
https://www.etsi.org/deliver/etsi_ts/138300_138399/138323/17.01.00_60/ts_138323v170100p.pdf
+ */
+
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * 4.3.1
+ *
+ * Indicate the maximum supported size of a PDCP Control PDU.
+ */
+#define RTE_PDCP_CTRL_PDU_SIZE_MAX 9000u
+
+/**
+ * 6.3.4 MAC-I
+ *
+ * Indicate the size of MAC-I in PDCP PDU.
+ */
+#define RTE_PDCP_MAC_I_LEN 4
+
+/**
+ * Indicate type of control information included in the corresponding PDCP
+ * Control PDU.
+ */
+enum rte_pdcp_ctrl_pdu_type {
+   RTE_PDCP_CTRL_PDU_TYPE_STATUS_REPORT = 0,
+   RTE_PDCP_CTRL_PDU_TYPE_ROHC_FEEDBACK = 1,
+   RTE_PDCP_CTRL_PDU_TYPE_EHC_FEEDBACK = 2,
+   RTE_PDCP_CRTL_PDU_TYPE_UDC_FEEDBACK = 3,
+};
+
+/**
+ * 6.3.7 D/C
+ *
+ * This field indicates whether the corresponding PDCP PDU is a
+ * PDCP Data PDU or a PDCP Control PDU.
+ */
+enum rte_pdcp_pdu_type {
+   RTE_PDCP_PDU_TYPE_CTRL = 0,
+   RTE_PDCP_PDU_TYPE_DATA = 1,
+};
+
+/**
+ * 6.2.2.1 Data PDU for SRBs
+ */
+__extension__
+struct rte_pdcp_cp_data_pdu_sn_12_hdr {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint8_t sn_11_8 : 4;/**< Sequence number bits 8-11 */
+   uint8_t r : 4;  /**< Reserved */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+   uint8_t r : 4;  /**< Reserved */
+   uint8_t sn_11_8 : 4;/**< Sequence number bits 8-11 */
+#endif
+   uint8_t sn_7_0; /**< Sequence number bits 0-7 */
+} __rte_packed;
+
+/**
+ * 6.2.2.2 Data PDU for DRBs and MRBs with 12 bits PDCP SN
+ */
+__extension__
+struct rte_pdcp_up_data_pdu_sn_12_hdr {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint8_t sn_11_8 : 4;/**< Sequence number bits 8-11 */
+   uint8_t r : 3;  /**< Reserved */
+   uint8_t d_c : 1;/**< D/C bit */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+   uint8_t d_c : 1;/**< D/C bit */
+   uint8_t r : 3;  /**< Reserved */
+   uint8_t sn_11_8 : 4;/**< Sequence number bits 8-11 */
+#endif
+   uint8_t sn_7_0; /**< Sequence number bits 0-7 */
+} __rte_packed;
+
+/**
+ * 6.2.2.3 Data PDU for DRBs and MRBs with 18 bits PDCP SN
+ */
+__extension__
+struct rte_pdcp_up_data_pdu_sn_18_hdr {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint8_t sn_17_16 : 2;   /**< Sequence number bits 16-17 */
+   uint8_t r : 5;  /**< Reserved */
+   uint8_t d_c : 1;/**< D/C bit */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+   uint8_t d_c : 1;/**< D/C bit */
+   uint8_t r : 5;  /**< Reserved */
+   uint8_t sn_17_16 : 2;   /**< Sequence number bits 16-17 */
+#endif
+   uint8_t sn_15_8;/**< Sequence number bits 8-15 */
+   uint8_t sn_7_0; /**< Sequence number bits 0-7 */
+} __rte_packed;
+
+/**
+ * 6.2.3.1 Control PDU for PDCP status report
+ */
+__extension__
+struct rte_pdcp_up_ctrl_pdu_hdr {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint8_t r : 4;  /**< Reserved */
+   uint8_t pdu_type : 3;   /**< Control PDU type */
+   uint8_t d_c : 1;/**< D/C bit */
+#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
+   uint8_t d_c : 1;/**< D/C bit */
+   uint8_t pdu_type : 3;   /**< Control PDU type */
+   uint8_t r : 4;  /**< Reserved */
+#endif
+   /**
+* 6.3.9 FMC
+*
+* First Missing COUNT. This fiel

[PATCH v6 02/21] lib: add pdcp protocol

2023-05-30 Thread Anoob Joseph
Add Packet Data Convergence Protocol (PDCP) processing library.

The library is similar to lib_ipsec which provides IPsec processing
capabilities in DPDK.

PDCP would involve roughly the following operations,
1. Transfer of user plane data
2. Transfer of control plane data
3. Header compression
4. Uplink data compression
5. Ciphering and integrity protection

The PDCP library provides the following control path APIs that are used to
configure various PDCP entities:
1. rte_pdcp_entity_establish()
2. rte_pdcp_entity_suspend()
3. rte_pdcp_entity_release()

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
---
 MAINTAINERS   |   6 ++
 doc/api/doxy-api-index.md |   3 +-
 doc/api/doxy-api.conf.in  |   1 +
 lib/meson.build   |   1 +
 lib/pdcp/meson.build  |  17 
 lib/pdcp/pdcp_crypto.c|  21 +
 lib/pdcp/pdcp_crypto.h|  15 
 lib/pdcp/pdcp_entity.h| 113 ++
 lib/pdcp/pdcp_process.c   | 138 +++
 lib/pdcp/pdcp_process.h   |  13 +++
 lib/pdcp/rte_pdcp.c   | 141 
 lib/pdcp/rte_pdcp.h   | 166 ++
 lib/pdcp/version.map  |  10 +++
 13 files changed, 644 insertions(+), 1 deletion(-)
 create mode 100644 lib/pdcp/meson.build
 create mode 100644 lib/pdcp/pdcp_crypto.c
 create mode 100644 lib/pdcp/pdcp_crypto.h
 create mode 100644 lib/pdcp/pdcp_entity.h
 create mode 100644 lib/pdcp/pdcp_process.c
 create mode 100644 lib/pdcp/pdcp_process.h
 create mode 100644 lib/pdcp/rte_pdcp.c
 create mode 100644 lib/pdcp/rte_pdcp.h
 create mode 100644 lib/pdcp/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index a5219926ab..82f490c5c0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1551,6 +1551,12 @@ F: doc/guides/tools/pdump.rst
 F: app/dumpcap/
 F: doc/guides/tools/dumpcap.rst
 
+PDCP - EXPERIMENTAL
+M: Anoob Joseph 
+M: Volodymyr Fialko 
+T: git://dpdk.org/next/dpdk-next-crypto
+F: lib/pdcp/
+
 
 Packet Framework
 
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index debbe4134f..cd7a6cae44 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -128,7 +128,8 @@ The public API headers are grouped by topics:
   [eCPRI](@ref rte_ecpri.h),
   [L2TPv2](@ref rte_l2tpv2.h),
   [PPP](@ref rte_ppp.h),
-  [PDCP hdr](@ref rte_pdcp_hdr.h)
+  [PDCP hdr](@ref rte_pdcp_hdr.h),
+  [PDCP](@ref rte_pdcp.h)
 
 - **QoS**:
   [metering](@ref rte_meter.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index d230a19e1f..58789308a9 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -62,6 +62,7 @@ INPUT   = @TOPDIR@/doc/api/doxy-api-index.md \
   @TOPDIR@/lib/net \
   @TOPDIR@/lib/pcapng \
   @TOPDIR@/lib/pci \
+  @TOPDIR@/lib/pdcp \
   @TOPDIR@/lib/pdump \
   @TOPDIR@/lib/pipeline \
   @TOPDIR@/lib/port \
diff --git a/lib/meson.build b/lib/meson.build
index dc8aa4ac84..a6a54c196c 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -64,6 +64,7 @@ libraries = [
 'flow_classify', # flow_classify lib depends on pkt framework table lib
 'graph',
 'node',
+'pdcp', # pdcp lib depends on crypto and security
 ]
 
 optional_libs = [
diff --git a/lib/pdcp/meson.build b/lib/pdcp/meson.build
new file mode 100644
index 00..ccaf426240
--- /dev/null
+++ b/lib/pdcp/meson.build
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2023 Marvell.
+
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
+sources = files(
+'pdcp_crypto.c',
+'pdcp_process.c',
+'rte_pdcp.c',
+)
+headers = files('rte_pdcp.h')
+
+deps += ['mbuf', 'net', 'cryptodev', 'security']
diff --git a/lib/pdcp/pdcp_crypto.c b/lib/pdcp/pdcp_crypto.c
new file mode 100644
index 00..755e27ec9e
--- /dev/null
+++ b/lib/pdcp/pdcp_crypto.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include 
+
+#include "pdcp_crypto.h"
+
+int
+pdcp_crypto_sess_create(struct rte_pdcp_entity *entity, const struct 
rte_pdcp_entity_conf *conf)
+{
+   RTE_SET_USED(entity);
+   RTE_SET_USED(conf);
+   return 0;
+}
+
+void
+pdcp_crypto_sess_destroy(struct rte_pdcp_entity *entity)
+{
+   RTE_SET_USED(entity);
+}
diff --git a/lib/pdcp/pdcp_crypto.h b/lib/pdcp/pdcp_crypto.h
new file mode 100644
index 00..6563331d37
--- /dev/null
+++ b/lib/pdcp/pdcp_crypto.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#ifndef PDCP_CRYPTO_H
+#define PDCP_CRYPTO_H
+
+#include 
+
+int pdcp_crypto_sess_create(struct rte_pdcp_entity *entity,
+   const struct rte_pdcp_entity_conf *conf);
+
+
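
To see how the three control-path APIs fit together, a condensed usage sketch
(illustrative only; conf fields and error handling elided, signatures per
rte_pdcp.h above):

struct rte_pdcp_entity_conf conf = { 0 }; /* fill fields per rte_pdcp.h */
struct rte_pdcp_entity *entity;

entity = rte_pdcp_entity_establish(&conf);
if (entity == NULL)
	return -rte_errno; /* establish failed; rte_errno has the cause */

/* ... datapath via the pre/post process APIs ... */

rte_pdcp_entity_release(entity, out_mb); /* returns buffered packets */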

[PATCH v6 03/21] pdcp: add pre and post-process

2023-05-30 Thread Anoob Joseph
PDCP process is split into 2 parts. One before crypto processing
(rte_pdcp_pkt_pre_process()) and one after crypto processing
(rte_pdcp_pkt_post_process()). Functionality of pre-process &
post-process varies based on the type of entity. Registration of entity-
specific function pointers allows skipping multiple checks that would
otherwise be needed in the datapath.

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
Acked-by: Akhil Goyal 
---
 lib/pdcp/rte_pdcp.h  | 96 
 lib/pdcp/version.map |  3 ++
 2 files changed, 99 insertions(+)

diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index 4489b9526b..d225a6dc5c 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -22,6 +22,21 @@
 extern "C" {
 #endif
 
+/* Forward declarations. */
+struct rte_pdcp_entity;
+
+/* PDCP pre-process function based on entity configuration. */
+typedef uint16_t (*rte_pdcp_pre_p_t)(const struct rte_pdcp_entity *entity,
+struct rte_mbuf *mb[],
+struct rte_crypto_op *cop[],
+uint16_t num, uint16_t *nb_err);
+
+/* PDCP post-process function based on entity configuration. */
+typedef uint16_t (*rte_pdcp_post_p_t)(const struct rte_pdcp_entity *entity,
+ struct rte_mbuf *in_mb[],
+ struct rte_mbuf *out_mb[],
+ uint16_t num, uint16_t *nb_err);
+
 /**
  * PDCP entity.
  *
@@ -34,6 +49,10 @@ extern "C" {
  * depending on which radio bearer it is carrying data for.
  */
 struct rte_pdcp_entity {
+   /** Entity specific pre-process handle. */
+   rte_pdcp_pre_p_t pre_process;
+   /** Entity specific post-process handle. */
+   rte_pdcp_post_p_t post_process;
/**
 * PDCP entities may hold packets for purposes of in-order delivery
 * (in case of receiving PDCP entity) and re-transmission
@@ -159,6 +178,83 @@ int
 rte_pdcp_entity_suspend(struct rte_pdcp_entity *pdcp_entity,
struct rte_mbuf *out_mb[]);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * For input mbufs and given PDCP entity pre-process the mbufs and prepare
+ * crypto ops that can be enqueued to the cryptodev associated with given
+ * session. Only error packets would be returned in the input buffer,
+ * *mb*, and it is the responsibility of the application to free the same.
+ *
+ * @param entity
+ *   Pointer to the *rte_pdcp_entity* object the packets belong to.
+ * @param[in, out] mb
+ *   The address of an array of *num* pointers to *rte_mbuf* structures
+ *   which contain the input packets.
+ *   Any error packets would be returned in the same buffer.
+ * @param[out] cop
+ *   The address of an array that can hold up to *num* pointers to
+ *   *rte_crypto_op* structures. Crypto ops would be allocated by
+ *   ``rte_pdcp_pkt_pre_process`` API.
+ * @param num
+ *   The maximum number of packets to process.
+ * @param[out] nb_err
+ *   Pointer to return the number of error packets returned in *mb*.
+ * @return
+ *   Count of crypto_ops prepared.
+ */
+__rte_experimental
+static inline uint16_t
+rte_pdcp_pkt_pre_process(const struct rte_pdcp_entity *entity,
+struct rte_mbuf *mb[], struct rte_crypto_op *cop[],
+uint16_t num, uint16_t *nb_err)
+{
+   return entity->pre_process(entity, mb, cop, num, nb_err);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * For input mbufs and given PDCP entity, perform PDCP post-processing of the 
mbufs.
+ *
+ * Input mbufs are the ones retrieved from rte_crypto_ops dequeued from 
cryptodev
+ * and grouped by *rte_pdcp_pkt_crypto_group()*.
+ *
+ * The post-processed packets would be returned in the *out_mb* buffer.
+ * The resultant mbufs would be grouped into success packets and error packets.
+ * Error packets would be grouped at the end of the array and
+ * it is the responsibility of the application to handle the same.
+ *
+ * When in-order delivery is enabled, PDCP entity may buffer packets and would
+ * deliver packets only when all prior packets have been post-processed.
+ * That would result in returning more/less packets than enqueued.
+ *
+ * @param entity
+ *   Pointer to the *rte_pdcp_entity* object the packets belong to.
+ * @param in_mb
+ *   The address of an array of *num* pointers to *rte_mbuf* structures.
+ * @param[out] out_mb
+ *   The address of an array of *num* pointers to *rte_mbuf* structures
+ *   to output packets after PDCP post-processing.
+ * @param num
+ *   The maximum number of packets to process.
+ * @param[out] nb_err
+ *   The number of error packets returned in *out_mb* buffer.
+ * @return
+ *   Count of packets returned in *out_mb* buffer.
+ */
+__rte_experimental
+static inline uint16_t
+rte_pdcp_pkt_post_pro
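
In a polling loop the two hooks are intended to bracket the cryptodev calls
roughly as follows (a sketch, not from the patch; dev_id/qp_id and the burst
size are placeholders):

uint16_t nb_cop, nb_err, nb_enq;

nb_cop = rte_pdcp_pkt_pre_process(entity, mb, cop, nb_rx, &nb_err);
/* the trailing nb_err entries of mb[] failed pre-processing; free them */
nb_enq = rte_cryptodev_enqueue_burst(dev_id, qp_id, cop, nb_cop);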

[PATCH v6 04/21] pdcp: add packet group

2023-05-30 Thread Anoob Joseph
Crypto processing in PDCP is performed asynchronously by
rte_cryptodev_enqueue_burst() and rte_cryptodev_dequeue_burst(). Since
cryptodev dequeue can return crypto operations belonging to multiple
entities, rte_pdcp_pkt_crypto_group() is added to help group crypto
operations belonging to the same entity.

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/meson.build  |   1 +
 lib/pdcp/rte_pdcp.h   |   6 ++
 lib/pdcp/rte_pdcp_group.h | 131 ++
 lib/pdcp/version.map  |   3 +
 4 files changed, 141 insertions(+)
 create mode 100644 lib/pdcp/rte_pdcp_group.h

diff --git a/lib/pdcp/meson.build b/lib/pdcp/meson.build
index ccaf426240..08679b743a 100644
--- a/lib/pdcp/meson.build
+++ b/lib/pdcp/meson.build
@@ -13,5 +13,6 @@ sources = files(
 'rte_pdcp.c',
 )
 headers = files('rte_pdcp.h')
+indirect_headers += files('rte_pdcp_group.h')
 
 deps += ['mbuf', 'net', 'cryptodev', 'security']
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index d225a6dc5c..e63946aa08 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -255,6 +255,12 @@ rte_pdcp_pkt_post_process(const struct rte_pdcp_entity 
*entity,
return entity->post_process(entity, in_mb, out_mb, num, nb_err);
 }
 
+/**
+ * The header 'rte_pdcp_group.h' depends on defines in 'rte_pdcp.h'. So include
+ * in the end.
+ */
+#include <rte_pdcp_group.h>
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdcp/rte_pdcp_group.h b/lib/pdcp/rte_pdcp_group.h
new file mode 100644
index 00..ece3e8c0ff
--- /dev/null
+++ b/lib/pdcp/rte_pdcp_group.h
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#ifndef RTE_PDCP_GROUP_H
+#define RTE_PDCP_GROUP_H
+
+/**
+ * @file rte_pdcp_group.h
+ *
+ * RTE PDCP grouping support.
+ * It is not recommended to include this file directly; include <rte_pdcp.h>
+ * instead.
+ * Provides helper functions to process completed crypto-ops and group related
+ * packets by sessions they belong to.
+ */
+
+#include 
+#include 
+#include 
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Group packets belonging to same PDCP entity.
+ */
+struct rte_pdcp_group {
+   union {
+   uint64_t val;
+   void *ptr;
+   } id; /**< Grouped by value */
+   struct rte_mbuf **m;  /**< Start of the group */
+   uint32_t cnt; /**< Number of entries in the group */
+   int32_t rc;   /**< Status code associated with the group */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Take a crypto-op as input and extract a pointer to the related PDCP entity.
+ * @param cop
+ *   The address of an input *rte_crypto_op* structure.
+ * @return
+ *   The pointer to the related *rte_pdcp_entity* structure.
+ */
+static inline struct rte_pdcp_entity *
+rte_pdcp_en_from_cop(const struct rte_crypto_op *cop)
+{
+   void *sess = cop->sym[0].session;
+
+   return (struct rte_pdcp_entity *)(uintptr_t)
+   rte_cryptodev_sym_session_opaque_data_get(sess);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Take as input completed crypto ops, extract related mbufs and group them by
+ * *rte_pdcp_entity* they belong to. Mbufs for which the crypto operation has
+ * failed would be flagged using the *RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED* flag
+ * in rte_mbuf.ol_flags. The crypto_ops would be freed after the grouping.
+ *
+ * Note that the application must ensure only crypto-ops prepared by lib_pdcp
+ * are provided back to @see rte_pdcp_pkt_crypto_group().
+ *
+ * @param cop
+ *   The address of an array of *num* pointers to the input *rte_crypto_op*
+ *   structures.
+ * @param[out] mb
+ *   The address of an array of *num* pointers to output *rte_mbuf* structures.
+ * @param[out] grp
+ *   The address of an array of *num* to output *rte_pdcp_group* structures.
+ * @param num
+ *   The maximum number of crypto-ops to process.
+ * @return
+ *   Number of filled elements in *grp* array.
+ *
+ */
+static inline uint16_t
+rte_pdcp_pkt_crypto_group(struct rte_crypto_op *cop[], struct rte_mbuf *mb[],
+ struct rte_pdcp_group grp[], uint16_t num)
+{
+   uint32_t i, j = 0, n = 0;
+   void *ns, *ps = NULL;
+   struct rte_mbuf *m;
+
+   for (i = 0; i != num; i++) {
+   m = cop[i]->sym[0].m_src;
+   ns = cop[i]->sym[0].session;
+
+   m->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD;
+   if (cop[i]->status != RTE_CRYPTO_OP_STATUS_SUCCESS)
+   m->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+
+   /* Different entity */
+   if (ps != ns) {
+
+   /* Finalize open group and start a new one */
+   if (ps != NULL) {
+   grp[n].cnt = mb + j - grp[n].m;
+   n++;
+   }
+
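
Continuing the datapath sketch from the pre-process patch, the grouping
helper slots in between dequeue and post-processing (again illustrative;
dev_id/qp_id and MAX_BURST are placeholders):

uint16_t nb_deq, nb_grp, nb_out, nb_err;
int i;

nb_deq = rte_cryptodev_dequeue_burst(dev_id, qp_id, cop, MAX_BURST);
nb_grp = rte_pdcp_pkt_crypto_group(cop, mb, grp, nb_deq);
for (i = 0; i < nb_grp; i++) {
	nb_out = rte_pdcp_pkt_post_process(grp[i].id.ptr, grp[i].m,
					   out_mb, grp[i].cnt, &nb_err);
	/* the last nb_err of the nb_out packets in out_mb[] are errors */
}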

[PATCH v6 05/21] pdcp: add crypto session create and destroy

2023-05-30 Thread Anoob Joseph
Add routines to create & destroy sessions. PDCP lib would take
crypto transforms as input and create the session on the corresponding
device after verifying capabilities.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
Acked-by: Akhil Goyal 
---
 lib/pdcp/pdcp_crypto.c | 223 -
 lib/pdcp/pdcp_crypto.h |   5 +
 2 files changed, 225 insertions(+), 3 deletions(-)

diff --git a/lib/pdcp/pdcp_crypto.c b/lib/pdcp/pdcp_crypto.c
index 755e27ec9e..6d2a85dc7d 100644
--- a/lib/pdcp/pdcp_crypto.c
+++ b/lib/pdcp/pdcp_crypto.c
@@ -2,20 +2,237 @@
  * Copyright(C) 2023 Marvell.
  */
 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
 
 #include "pdcp_crypto.h"
+#include "pdcp_entity.h"
+
+static int
+pdcp_crypto_caps_cipher_verify(uint8_t dev_id, const struct 
rte_crypto_sym_xform *c_xfrm)
+{
+   const struct rte_cryptodev_symmetric_capability *cap;
+   struct rte_cryptodev_sym_capability_idx cap_idx;
+   int ret;
+
+   cap_idx.type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+   cap_idx.algo.cipher = c_xfrm->cipher.algo;
+
+   cap = rte_cryptodev_sym_capability_get(dev_id, &cap_idx);
+   if (cap == NULL)
+   return -1;
+
+   ret = rte_cryptodev_sym_capability_check_cipher(cap, 
c_xfrm->cipher.key.length,
+   
c_xfrm->cipher.iv.length);
+
+   return ret;
+}
+
+static int
+pdcp_crypto_caps_auth_verify(uint8_t dev_id, const struct rte_crypto_sym_xform 
*a_xfrm)
+{
+   const struct rte_cryptodev_symmetric_capability *cap;
+   struct rte_cryptodev_sym_capability_idx cap_idx;
+   int ret;
+
+   cap_idx.type = RTE_CRYPTO_SYM_XFORM_AUTH;
+   cap_idx.algo.auth = a_xfrm->auth.algo;
+
+   cap = rte_cryptodev_sym_capability_get(dev_id, &cap_idx);
+   if (cap == NULL)
+   return -1;
+
+   ret = rte_cryptodev_sym_capability_check_auth(cap, 
a_xfrm->auth.key.length,
+ 
a_xfrm->auth.digest_length,
+ a_xfrm->auth.iv.length);
+
+   return ret;
+}
+
+static int
+pdcp_crypto_xfrm_validate(const struct rte_pdcp_entity_conf *conf,
+const struct rte_crypto_sym_xform *c_xfrm,
+const struct rte_crypto_sym_xform *a_xfrm,
+bool is_auth_then_cipher)
+{
+   uint16_t cipher_iv_len, auth_digest_len, auth_iv_len;
+   int ret;
+
+   /*
+* Uplink means PDCP entity is configured for transmit. Downlink means 
PDCP entity is
+* configured for receive. When integrity protection is enabled, PDCP 
always performs
+* digest-encrypted or auth-gen-encrypt for uplink (and 
decrypt-auth-verify for downlink).
+* So for uplink, crypto chain would be auth-cipher while for downlink 
it would be
+* cipher-auth.
+*
+* When integrity protection is not required, xform would be cipher 
only.
+*/
+
+   if (c_xfrm == NULL)
+   return -EINVAL;
+
+   if (conf->pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_UPLINK) {
+
+   /* With UPLINK, if auth is enabled, it should be before cipher 
*/
+   if (a_xfrm != NULL && !is_auth_then_cipher)
+   return -EINVAL;
+
+   /* With UPLINK, cipher operation must be encrypt */
+   if (c_xfrm->cipher.op != RTE_CRYPTO_CIPHER_OP_ENCRYPT)
+   return -EINVAL;
+
+   /* With UPLINK, auth operation (if present) must be generate */
+   if (a_xfrm != NULL && a_xfrm->auth.op != 
RTE_CRYPTO_AUTH_OP_GENERATE)
+   return -EINVAL;
+
+   } else if (conf->pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_DOWNLINK) {
+
+   /* With DOWNLINK, if auth is enabled, it should be after cipher 
*/
+   if (a_xfrm != NULL && is_auth_then_cipher)
+   return -EINVAL;
+
+   /* With DOWNLINK, cipher operation must be decrypt */
+   if (c_xfrm->cipher.op != RTE_CRYPTO_CIPHER_OP_DECRYPT)
+   return -EINVAL;
+
+   /* With DOWNLINK, auth operation (if present) must be verify */
+   if (a_xfrm != NULL && a_xfrm->auth.op != 
RTE_CRYPTO_AUTH_OP_VERIFY)
+   return -EINVAL;
+
+   } else {
+   return -EINVAL;
+   }
+
+   if ((c_xfrm->cipher.algo != RTE_CRYPTO_CIPHER_NULL) &&
+   (c_xfrm->cipher.algo != RTE_CRYPTO_CIPHER_AES_CTR) &&
+   (c_xfrm->cipher.algo != RTE_CRYPTO_CIPHER_ZUC_EEA3) &&
+   (c_xfrm->cipher.algo != RTE_CRYPTO_CIPHER_SNOW3G_UEA2))
+   return -EINVAL;
+
+   if (c_xfrm->cipher.algo == RTE_CRYPTO_CIPHER_NULL)
+   cipher_iv_len = 0;
+   else
+   cipher_iv_len = PDCP_IV_LEN;
+
+   if (cipher_iv_len != c_xfrm->cipher.iv.length)
+
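
As an illustration of a transform chain that passes the uplink validation
above (a sketch using SNOW3G; keys, IV lengths and other fields elided):

struct rte_crypto_sym_xform c_xfrm = {
	.type = RTE_CRYPTO_SYM_XFORM_CIPHER,
	.cipher = {
		.op = RTE_CRYPTO_CIPHER_OP_ENCRYPT,
		.algo = RTE_CRYPTO_CIPHER_SNOW3G_UEA2,
	},
};
struct rte_crypto_sym_xform a_xfrm = {
	.type = RTE_CRYPTO_SYM_XFORM_AUTH,
	.auth = {
		.op = RTE_CRYPTO_AUTH_OP_GENERATE,
		.algo = RTE_CRYPTO_AUTH_SNOW3G_UIA2,
		.digest_length = RTE_PDCP_MAC_I_LEN,
	},
	.next = &c_xfrm, /* UPLINK: auth first, then cipher */
};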

[PATCH v6 06/21] pdcp: add pre and post process for UL

2023-05-30 Thread Anoob Joseph
Add routines to perform pre & post processing based on the type of
entity. To avoid checks in the datapath, different function
pointers are registered based on the following:
1. Control plane v/s user plane
2. 12 bit v/s 18 bit SN

For control plane only 12 bit SN need to be supported (as per PDCP
specification).

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
Acked-by: Akhil Goyal 
---
 lib/pdcp/pdcp_entity.h  |  24 +++
 lib/pdcp/pdcp_process.c | 334 
 2 files changed, 358 insertions(+)

diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index 000297588f..23628ebad4 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -92,22 +92,46 @@ pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
return RTE_ALIGN_MUL_CEIL(sn_size, 8) / 8;
 }
 
+static inline uint32_t
+pdcp_window_size_get(enum rte_security_pdcp_sn_size sn_size)
+{
+   return 1 << (sn_size - 1);
+}
+
 static inline uint32_t
 pdcp_sn_mask_get(enum rte_security_pdcp_sn_size sn_size)
 {
return (1 << sn_size) - 1;
 }
 
+static inline uint32_t
+pdcp_sn_from_count_get(uint32_t count, enum rte_security_pdcp_sn_size sn_size)
+{
+   return (count & pdcp_sn_mask_get(sn_size));
+}
+
 static inline uint32_t
 pdcp_hfn_mask_get(enum rte_security_pdcp_sn_size sn_size)
 {
return ~pdcp_sn_mask_get(sn_size);
 }
 
+static inline uint32_t
+pdcp_hfn_from_count_get(uint32_t count, enum rte_security_pdcp_sn_size sn_size)
+{
+   return (count & pdcp_hfn_mask_get(sn_size)) >> sn_size;
+}
+
 static inline uint32_t
 pdcp_count_from_hfn_sn_get(uint32_t hfn, uint32_t sn, enum 
rte_security_pdcp_sn_size sn_size)
 {
return (((hfn << sn_size) & pdcp_hfn_mask_get(sn_size)) | (sn & 
pdcp_sn_mask_get(sn_size)));
 }
 
+static inline uint32_t
+pdcp_hfn_max(enum rte_security_pdcp_sn_size sn_size)
+{
+   return (1 << (32 - sn_size)) - 1;
+}
+
 #endif /* PDCP_ENTITY_H */
diff --git a/lib/pdcp/pdcp_process.c b/lib/pdcp/pdcp_process.c
index 79f5dce5db..9b7de39db6 100644
--- a/lib/pdcp/pdcp_process.c
+++ b/lib/pdcp/pdcp_process.c
@@ -36,6 +36,336 @@ pdcp_crypto_xfrm_get(const struct rte_pdcp_entity_conf 
*conf, struct rte_crypto_
return 0;
 }
 
+static inline void
+cop_prepare(const struct entity_priv *en_priv, struct rte_mbuf *mb, struct 
rte_crypto_op *cop,
+   uint8_t data_offset, uint32_t count, const bool is_auth)
+{
+   const struct rte_crypto_op cop_init = {
+   .type = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   .status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED,
+   .sess_type = RTE_CRYPTO_OP_WITH_SESSION,
+   };
+   struct rte_crypto_sym_op *op;
+   uint32_t pkt_len;
+
+   const uint8_t cipher_shift = 3 * en_priv->flags.is_cipher_in_bits;
+   const uint8_t auth_shift = 3 * en_priv->flags.is_auth_in_bits;
+
+   op = cop->sym;
+   cop->raw = cop_init.raw;
+   op->m_src = mb;
+   op->m_dst = mb;
+
+   /* Set IV */
+   en_priv->iv_gen(cop, en_priv, count);
+
+   /* Prepare op */
+   pkt_len = rte_pktmbuf_pkt_len(mb);
+   op->cipher.data.offset = data_offset << cipher_shift;
+   op->cipher.data.length = (pkt_len - data_offset) << cipher_shift;
+
+   if (is_auth) {
+   op->auth.data.offset = 0;
+   op->auth.data.length = (pkt_len - RTE_PDCP_MAC_I_LEN) << 
auth_shift;
+   op->auth.digest.data = rte_pktmbuf_mtod_offset(mb, uint8_t *,
+  (pkt_len - 
RTE_PDCP_MAC_I_LEN));
+   }
+
+   __rte_crypto_sym_op_attach_sym_session(op, en_priv->crypto_sess);
+}
+
+static inline bool
+pdcp_pre_process_uplane_sn_12_ul_set_sn(struct entity_priv *en_priv, struct 
rte_mbuf *mb,
+   uint32_t *count)
+{
+   struct rte_pdcp_up_data_pdu_sn_12_hdr *pdu_hdr;
+   const uint8_t hdr_sz = en_priv->hdr_sz;
+   uint32_t sn;
+
+   /* Prepend PDU header */
+   pdu_hdr = (struct rte_pdcp_up_data_pdu_sn_12_hdr 
*)rte_pktmbuf_prepend(mb, hdr_sz);
+   if (unlikely(pdu_hdr == NULL))
+   return false;
+
+   /* Update sequence num in the PDU header */
+   *count = en_priv->state.tx_next++;
+   sn = pdcp_sn_from_count_get(*count, RTE_SECURITY_PDCP_SN_SIZE_12);
+
+   pdu_hdr->d_c = RTE_PDCP_PDU_TYPE_DATA;
+   pdu_hdr->sn_11_8 = ((sn & 0xf00) >> 8);
+   pdu_hdr->sn_7_0 = (sn & 0xff);
+   pdu_hdr->r = 0;
+   return true;
+}
+
+static uint16_t
+pdcp_pre_process_uplane_sn_12_ul(const struct rte_pdcp_entity *entity, struct 
rte_mbuf *in_mb[],
+struct rte_crypto_op *cop[], uint16_t num, 
uint16_t *nb_err_ret)
+{
+   struct entity_priv *en_priv = entity_priv_get(entity);
+   uint16_t nb_cop, nb_prep = 0, nb_err = 0;
+   struct rte_mbuf *mb;
+   uint32_t count;
+   uint8_t *mac_i;
+   int i;
+
+   const uint8_
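
A quick worked example of the COUNT helpers above, for SN size 12: COUNT is a
32-bit value composed of the HFN in the upper 20 bits and the SN in the lower
12 bits. For count = 0x00012345, pdcp_sn_from_count_get() yields 0x345,
pdcp_hfn_from_count_get() yields 0x12, pdcp_count_from_hfn_sn_get(0x12,
0x345, 12) recomposes 0x00012345, and pdcp_window_size_get() gives
1 << 11 = 2048.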

[PATCH v6 07/21] pdcp: add pre and post process for DL

2023-05-30 Thread Anoob Joseph
Add routines to perform pre & post processing for downlink entities.

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/pdcp_entity.h  |   2 +
 lib/pdcp/pdcp_process.c | 384 
 lib/pdcp/pdcp_process.h |  11 ++
 lib/pdcp/rte_pdcp.c |  23 +++
 4 files changed, 420 insertions(+)

diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index 23628ebad4..1d4a43a3bc 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -13,6 +13,8 @@
 
 struct entity_priv;
 
+#define PDCP_HFN_MIN 0
+
 /* IV generation function based on the entity configuration */
 typedef void (*iv_gen_t)(struct rte_crypto_op *cop, const struct entity_priv 
*en_priv,
 uint32_t count);
diff --git a/lib/pdcp/pdcp_process.c b/lib/pdcp/pdcp_process.c
index 9b7de39db6..bd75e6f802 100644
--- a/lib/pdcp/pdcp_process.c
+++ b/lib/pdcp/pdcp_process.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -333,9 +334,353 @@ pdcp_post_process_ul(const struct rte_pdcp_entity *entity,
return nb_success;
 }
 
+static inline int
+pdcp_sn_count_get(const uint32_t rx_deliv, int32_t rsn, uint32_t *count,
+ const enum rte_security_pdcp_sn_size sn_size)
+{
+   const uint32_t rx_deliv_sn = pdcp_sn_from_count_get(rx_deliv, sn_size);
+   const uint32_t window_sz = pdcp_window_size_get(sn_size);
+   uint32_t rhfn;
+
+   rhfn = pdcp_hfn_from_count_get(rx_deliv, sn_size);
+
+   if (rsn < (int32_t)(rx_deliv_sn - window_sz)) {
+   if (unlikely(rhfn == pdcp_hfn_max(sn_size)))
+   return -ERANGE;
+   rhfn += 1;
+   } else if ((uint32_t)rsn >= (rx_deliv_sn + window_sz)) {
+   if (unlikely(rhfn == PDCP_HFN_MIN))
+   return -ERANGE;
+   rhfn -= 1;
+   }
+
+   *count = pdcp_count_from_hfn_sn_get(rhfn, rsn, sn_size);
+
+   return 0;
+}
+
+static inline uint16_t
+pdcp_pre_process_uplane_sn_12_dl_flags(const struct rte_pdcp_entity *entity,
+  struct rte_mbuf *in_mb[], struct 
rte_crypto_op *cop[],
+  uint16_t num, uint16_t *nb_err_ret,
+  const bool is_integ_protected)
+{
+   struct entity_priv *en_priv = entity_priv_get(entity);
+   struct rte_pdcp_up_data_pdu_sn_12_hdr *pdu_hdr;
+   uint16_t nb_cop, nb_prep = 0, nb_err = 0;
+   rte_pdcp_dynfield_t *mb_dynfield;
+   struct rte_mbuf *mb;
+   int32_t rsn = 0;
+   uint32_t count;
+   int i;
+
+   const uint8_t data_offset = en_priv->hdr_sz + en_priv->aad_sz;
+
+   nb_cop = rte_crypto_op_bulk_alloc(en_priv->cop_pool, 
RTE_CRYPTO_OP_TYPE_SYMMETRIC, cop,
+ num);
+
+   const uint32_t rx_deliv = en_priv->state.rx_deliv;
+
+   for (i = 0; i < nb_cop; i++) {
+   mb = in_mb[i];
+   pdu_hdr = rte_pktmbuf_mtod(mb, struct 
rte_pdcp_up_data_pdu_sn_12_hdr *);
+
+   /* Check for PDU type */
+   if (likely(pdu_hdr->d_c == RTE_PDCP_PDU_TYPE_DATA)) {
+   rsn = ((pdu_hdr->sn_11_8 << 8) | (pdu_hdr->sn_7_0));
+   } else {
+   /** NOTE: Control PDU not handled.*/
+   in_mb[nb_err++] = mb;
+   continue;
+   }
+
+   if (unlikely(pdcp_sn_count_get(rx_deliv, rsn, &count,
+  RTE_SECURITY_PDCP_SN_SIZE_12))) {
+   in_mb[nb_err++] = mb;
+   continue;
+   }
+
+   cop_prepare(en_priv, mb, cop[nb_prep++], data_offset, count, 
is_integ_protected);
+
+   mb_dynfield = pdcp_dynfield(mb);
+   *mb_dynfield = count;
+   }
+
+   if (unlikely(nb_err))
+   rte_mempool_put_bulk(en_priv->cop_pool, (void *)&cop[nb_prep], 
nb_cop - nb_prep);
+
+   *nb_err_ret = num - nb_prep;
+
+   return nb_prep;
+}
+
+static uint16_t
+pdcp_pre_process_uplane_sn_12_dl_ip(const struct rte_pdcp_entity *entity, 
struct rte_mbuf *mb[],
+   struct rte_crypto_op *cop[], uint16_t num, 
uint16_t *nb_err)
+{
+   return pdcp_pre_process_uplane_sn_12_dl_flags(entity, mb, cop, num, 
nb_err, true);
+}
+
+static uint16_t
+pdcp_pre_process_uplane_sn_12_dl(const struct rte_pdcp_entity *entity, struct 
rte_mbuf *mb[],
+struct rte_crypto_op *cop[], uint16_t num, 
uint16_t *nb_err)
+{
+   return pdcp_pre_process_uplane_sn_12_dl_flags(entity, mb, cop, num, 
nb_err, false);
+}
+
+static inline uint16_t
+pdcp_pre_process_uplane_sn_18_dl_flags(const struct rte_pdcp_entity *entity,
+  struct rte_mbuf *in_mb[], struct 
rte_crypto_op *cop[],
+  uint16_
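
A worked example of the window logic in pdcp_sn_count_get(), with SN size 12
(window 2048): suppose rx_deliv = 0x1F00, i.e. HFN 1 and SN 3840. A received
SN of 100 is below 3840 - 2048 = 1792, so the HFN is advanced to 2 and
COUNT = (2 << 12) | 100 = 0x2064; a received SN of 4000 lies inside the
window (4000 < 3840 + 2048), so COUNT = (1 << 12) | 4000 = 0x1FA0.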

[PATCH v6 08/21] pdcp: add IV generation routines

2023-05-30 Thread Anoob Joseph
For PDCP, the IV generated has varying formats depending on the ciphering and
authentication algorithms used. Add routines to populate the IV accordingly.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/pdcp_entity.h  |  87 
 lib/pdcp/pdcp_process.c | 284 
 2 files changed, 371 insertions(+)

diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index 1d4a43a3bc..10a72faae1 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -26,6 +26,89 @@ struct entity_state {
uint32_t rx_reord;
 };
 
+union auth_iv_partial {
+   /* For AES-CMAC, there is no IV, but message gets prepended */
+   struct {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint64_t count : 32;
+   uint64_t zero_38_39 : 2;
+   uint64_t direction : 1;
+   uint64_t bearer : 5;
+   uint64_t zero_40_63 : 24;
+#else
+   uint64_t count : 32;
+   uint64_t bearer : 5;
+   uint64_t direction : 1;
+   uint64_t zero_38_39 : 2;
+   uint64_t zero_40_63 : 24;
+#endif
+   } aes_cmac;
+   struct {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint64_t count : 32;
+   uint64_t zero_37_39 : 3;
+   uint64_t bearer : 5;
+   uint64_t zero_40_63 : 24;
+
+   uint64_t rsvd_65_71 : 7;
+   uint64_t direction_64 : 1;
+   uint64_t rsvd_72_111 : 40;
+   uint64_t rsvd_113_119 : 7;
+   uint64_t direction_112 : 1;
+   uint64_t rsvd_120_127 : 8;
+#else
+   uint64_t count : 32;
+   uint64_t bearer : 5;
+   uint64_t zero_37_39 : 3;
+   uint64_t zero_40_63 : 24;
+
+   uint64_t direction_64 : 1;
+   uint64_t rsvd_65_71 : 7;
+   uint64_t rsvd_72_111 : 40;
+   uint64_t direction_112 : 1;
+   uint64_t rsvd_113_119 : 7;
+   uint64_t rsvd_120_127 : 8;
+#endif
+   } zs;
+   uint64_t u64[2];
+};
+
+union cipher_iv_partial {
+   struct {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint64_t count : 32;
+   uint64_t zero_38_39 : 2;
+   uint64_t direction : 1;
+   uint64_t bearer : 5;
+   uint64_t zero_40_63 : 24;
+#else
+   uint64_t count : 32;
+   uint64_t bearer : 5;
+   uint64_t direction : 1;
+   uint64_t zero_38_39 : 2;
+   uint64_t zero_40_63 : 24;
+#endif
+   uint64_t zero_64_127;
+   } aes_ctr;
+   struct {
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+   uint64_t count : 32;
+   uint64_t zero_38_39 : 2;
+   uint64_t direction : 1;
+   uint64_t bearer : 5;
+   uint64_t zero_40_63 : 24;
+#else
+   uint64_t count : 32;
+   uint64_t bearer : 5;
+   uint64_t direction : 1;
+   uint64_t zero_38_39 : 2;
+   uint64_t zero_40_63 : 24;
+#endif
+   uint64_t rsvd_64_127;
+   } zs;
+   uint64_t u64[2];
+};
+
 /*
  * Layout of PDCP entity: [rte_pdcp_entity] [entity_priv] [entity_dl/ul]
  */
@@ -35,6 +118,10 @@ struct entity_priv {
struct rte_cryptodev_sym_session *crypto_sess;
/** Entity specific IV generation function. */
iv_gen_t iv_gen;
+   /** Pre-prepared auth IV. */
+   union auth_iv_partial auth_iv_part;
+   /** Pre-prepared cipher IV. */
+   union cipher_iv_partial cipher_iv_part;
/** Entity state variables. */
struct entity_state state;
/** Flags. */
diff --git a/lib/pdcp/pdcp_process.c b/lib/pdcp/pdcp_process.c
index bd75e6f802..28ac4102da 100644
--- a/lib/pdcp/pdcp_process.c
+++ b/lib/pdcp/pdcp_process.c
@@ -14,6 +14,181 @@
 #include "pdcp_entity.h"
 #include "pdcp_process.h"
 
+/* Enum of supported algorithms for ciphering */
+enum pdcp_cipher_algo {
+   PDCP_CIPHER_ALGO_NULL,
+   PDCP_CIPHER_ALGO_AES,
+   PDCP_CIPHER_ALGO_ZUC,
+   PDCP_CIPHER_ALGO_SNOW3G,
+   PDCP_CIPHER_ALGO_MAX
+};
+
+/* Enum of supported algorithms for integrity */
+enum pdcp_auth_algo {
+   PDCP_AUTH_ALGO_NULL,
+   PDCP_AUTH_ALGO_AES,
+   PDCP_AUTH_ALGO_ZUC,
+   PDCP_AUTH_ALGO_SNOW3G,
+   PDCP_AUTH_ALGO_MAX
+};
+
+/* IV generation functions based on type of operation (cipher - auth) */
+
+static void
+pdcp_iv_gen_null_null(struct rte_crypto_op *cop, const struct entity_priv 
*en_priv, uint32_t count)
+{
+   /* No IV required for NULL cipher + NULL auth */
+   RTE_SET_USED(cop);
+   RTE_SET_USED(en_priv);
+   RTE_SET_USED(count);
+}
+
+static void
+pdcp_iv_gen_null_aes_cmac(struct rte_crypto_op *cop, const struct entity_priv 
*en_priv,
+ uint32_t count)
+{
+   struct rte_crypto_sym_op *op = cop->sym;
+   struct rt
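
For orientation, assuming the standard 3GPP construction, the AES-CTR IV
assembled from cipher_iv_partial is 128 bits laid out as COUNT (32 bits,
placed big-endian) | BEARER (5 bits) | DIRECTION (1 bit) | 26 zero bits,
followed by 64 zero bits. So for count 0x12345678, bearer 3 and downlink
(direction 0), the 16-byte IV would start 12 34 56 78 18 00 00 00, with the
remaining 8 bytes zero.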

[PATCH v6 09/21] app/test: add lib pdcp tests

2023-05-30 Thread Anoob Joseph
Add tests to verify lib PDCP operations. Tests leverage existing PDCP
test vectors.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 MAINTAINERS  |   1 +
 app/test/meson.build |   5 +
 app/test/test_pdcp.c | 732 +++
 3 files changed, 738 insertions(+)
 create mode 100644 app/test/test_pdcp.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 82f490c5c0..ca684dde83 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1556,6 +1556,7 @@ M: Anoob Joseph 
 M: Volodymyr Fialko 
 T: git://dpdk.org/next/dpdk-next-crypto
 F: lib/pdcp/
+F: app/test/test_pdcp*
 
 
 Packet Framework
diff --git a/app/test/meson.build b/app/test/meson.build
index d96ae7a961..8eab3ea8b2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -435,6 +435,11 @@ if dpdk_conf.has('RTE_HAS_LIBPCAP')
 endif
 endif
 
+if dpdk_conf.has('RTE_LIB_PDCP')
+test_sources += 'test_pdcp.c'
+fast_tests += [['pdcp_autotest', false, true]]
+endif
+
 if cc.has_argument('-Wno-format-truncation')
 cflags += '-Wno-format-truncation'
 endif
diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
new file mode 100644
index 00..34b759eaef
--- /dev/null
+++ b/app/test/test_pdcp.c
@@ -0,0 +1,732 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+#include "test_cryptodev.h"
+#include "test_cryptodev_security_pdcp_test_vectors.h"
+
+#define NB_DESC 1024
+#define CDEV_INVALID_ID UINT8_MAX
+#define NB_TESTS RTE_DIM(pdcp_test_params)
+#define PDCP_IV_LEN 16
+
+struct pdcp_testsuite_params {
+   struct rte_mempool *mbuf_pool;
+   struct rte_mempool *cop_pool;
+   struct rte_mempool *sess_pool;
+   bool cdevs_used[RTE_CRYPTO_MAX_DEVS];
+};
+
+static struct pdcp_testsuite_params testsuite_params;
+
+struct pdcp_test_conf {
+   struct rte_pdcp_entity_conf entity;
+   struct rte_crypto_sym_xform c_xfrm;
+   struct rte_crypto_sym_xform a_xfrm;
+   bool is_integrity_protected;
+   uint8_t input[RTE_PDCP_CTRL_PDU_SIZE_MAX];
+   uint32_t input_len;
+   uint8_t output[RTE_PDCP_CTRL_PDU_SIZE_MAX];
+   uint32_t output_len;
+};
+
+static inline int
+pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
+{
+   return RTE_ALIGN_MUL_CEIL(sn_size, 8) / 8;
+}
+
+static int
+cryptodev_init(int dev_id)
+{
+   struct pdcp_testsuite_params *ts_params = &testsuite_params;
+   struct rte_cryptodev_qp_conf qp_conf;
+   struct rte_cryptodev_info dev_info;
+   struct rte_cryptodev_config config;
+   int ret, socket_id;
+
+   /* Check if device was already initialized */
+   if (ts_params->cdevs_used[dev_id])
+   return 0;
+
+   rte_cryptodev_info_get(dev_id, &dev_info);
+
+   if (dev_info.max_nb_queue_pairs < 1) {
+   RTE_LOG(ERR, USER1, "Cryptodev doesn't have sufficient queue 
pairs available\n");
+   return -ENODEV;
+   }
+
+   socket_id = rte_socket_id();
+
+   memset(&config, 0, sizeof(config));
+   config.nb_queue_pairs = 1;
+   config.socket_id = socket_id;
+
+   ret = rte_cryptodev_configure(dev_id, &config);
+   if (ret < 0) {
+   RTE_LOG(ERR, USER1, "Could not configure cryptodev - %d\n", 
dev_id);
+   return -ENODEV;
+   }
+
+   memset(&qp_conf, 0, sizeof(qp_conf));
+   qp_conf.nb_descriptors = NB_DESC;
+
+   ret = rte_cryptodev_queue_pair_setup(dev_id, 0, &qp_conf, socket_id);
+   if (ret < 0) {
+   RTE_LOG(ERR, USER1, "Could not configure queue pair\n");
+   return -ENODEV;
+   }
+
+   ret = rte_cryptodev_start(dev_id);
+   if (ret < 0) {
+   RTE_LOG(ERR, USER1, "Could not start cryptodev\n");
+   return -ENODEV;
+   }
+
+   /* Mark device as initialized */
+   ts_params->cdevs_used[dev_id] = true;
+
+   return 0;
+}
+
+static void
+cryptodev_fini(int dev_id)
+{
+   rte_cryptodev_stop(dev_id);
+}
+
+static unsigned int
+cryptodev_sess_priv_max_req_get(void)
+{
+   struct rte_cryptodev_info info;
+   unsigned int sess_priv_sz;
+   int i, nb_dev;
+   void *sec_ctx;
+
+   nb_dev = rte_cryptodev_count();
+
+   sess_priv_sz = 0;
+
+   for (i = 0; i < nb_dev; i++) {
+   rte_cryptodev_info_get(i, &info);
+   sess_priv_sz = RTE_MAX(sess_priv_sz, 
rte_cryptodev_sym_get_private_session_size(i));
+   if (info.feature_flags & RTE_CRYPTODEV_FF_SECURITY) {
+   sec_ctx = rte_cryptodev_get_sec_ctx(i);
+   sess_priv_sz = RTE_MAX(sess_priv_sz,
+  
rte_security_session_get_size(sec_ctx));
+   }
+   }
+
+   return sess_priv_sz;
+}
+
+static int
+testsuite_setup(void)
+{
+   struct pdcp_testsuite_params *ts_params = &testsuite_params;
+   int nb_cdev, sess_priv_size, nb_ses

[PATCH v6 10/21] test/pdcp: pdcp HFN tests in combined mode

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add tests to verify HFN/SN behaviour.
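
For reference, the tests exercise the COUNT composition sketched below (a
worked example; the helper name is illustrative and not part of the patch):

/* COUNT = [ HFN | SN ], 32 bits in total; SN occupies sn_size bits. */
static inline uint32_t
count_from_hfn_sn(uint32_t hfn, uint32_t sn, int sn_size)
{
	return (hfn << sn_size) | (sn & ((1u << sn_size) - 1));
}

/*
 * Example with 12-bit SN:
 *   HFN = 2, SN = 5  =>  COUNT = (2 << 12) | 5 = 8197
 *   Window_Size = 2^(12 - 1) = 2048 (formula 7.2.a, as used below)
 */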

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 app/test/test_pdcp.c | 302 ++-
 1 file changed, 299 insertions(+), 3 deletions(-)

diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
index 34b759eaef..cfe2ec6aa9 100644
--- a/app/test/test_pdcp.c
+++ b/app/test/test_pdcp.c
@@ -16,6 +16,9 @@
 #define NB_TESTS RTE_DIM(pdcp_test_params)
 #define PDCP_IV_LEN 16
 
+/* According to formula(7.2.a Window_Size) */
+#define PDCP_WINDOW_SIZE(sn_size) (1 << (sn_size - 1))
+
 struct pdcp_testsuite_params {
struct rte_mempool *mbuf_pool;
struct rte_mempool *cop_pool;
@@ -36,12 +39,69 @@ struct pdcp_test_conf {
uint32_t output_len;
 };
 
+static int create_test_conf_from_index(const int index, struct pdcp_test_conf 
*conf);
+
+typedef int (*test_with_conf_t)(struct pdcp_test_conf *conf);
+
+static int
+run_test_foreach_known_vec(test_with_conf_t test, bool stop_on_first_pass)
+{
+   struct pdcp_test_conf test_conf;
+   bool all_tests_skipped = true;
+   uint32_t i;
+   int ret;
+
+   for (i = 0; i < NB_TESTS; i++) {
+   create_test_conf_from_index(i, &test_conf);
+   ret = test(&test_conf);
+
+   if (ret == TEST_FAILED) {
+   printf("[%03i] - %s - failed\n", i, 
pdcp_test_params[i].name);
+   return TEST_FAILED;
+   }
+
+   if ((ret == TEST_SKIPPED) || (ret == -ENOTSUP))
+   continue;
+
+   if (stop_on_first_pass)
+   return TEST_SUCCESS;
+
+   all_tests_skipped = false;
+   }
+
+   if (all_tests_skipped)
+   return TEST_SKIPPED;
+
+   return TEST_SUCCESS;
+}
+
+static int
+run_test_with_all_known_vec(const void *args)
+{
+   test_with_conf_t test = args;
+
+   return run_test_foreach_known_vec(test, false);
+}
+
 static inline int
 pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
 {
return RTE_ALIGN_MUL_CEIL(sn_size, 8) / 8;
 }
 
+static int
+pktmbuf_read_into(const struct rte_mbuf *m, void *buf, size_t buf_len)
+{
+   if (m->pkt_len > buf_len)
+   return -ENOMEM;
+
+   const void *read = rte_pktmbuf_read(m, 0, m->pkt_len, buf);
+   if (read != NULL && read != buf)
+   memcpy(buf, read, m->pkt_len);
+
+   return 0;
+}
+
 static int
 cryptodev_init(int dev_id)
 {
@@ -326,6 +386,21 @@ pdcp_sn_from_raw_get(const void *data, enum 
rte_security_pdcp_sn_size size)
return sn;
 }
 
+static void
+pdcp_sn_to_raw_set(void *data, uint32_t sn, int size)
+{
+   if (size == RTE_SECURITY_PDCP_SN_SIZE_12) {
+   struct rte_pdcp_up_data_pdu_sn_12_hdr *pdu_hdr = data;
+   pdu_hdr->sn_11_8 = ((sn & 0xf00) >> 8);
+   pdu_hdr->sn_7_0 = (sn & 0xff);
+   } else if (size == RTE_SECURITY_PDCP_SN_SIZE_18) {
+   struct rte_pdcp_up_data_pdu_sn_18_hdr *pdu_hdr = data;
+   pdu_hdr->sn_17_16 = ((sn & 0x3) >> 16);
+   pdu_hdr->sn_15_8 = ((sn & 0xff00) >> 8);
+   pdu_hdr->sn_7_0 = (sn & 0xff);
+   }
+}
+
 static int
 create_test_conf_from_index(const int index, struct pdcp_test_conf *conf)
 {
@@ -648,9 +723,17 @@ test_attempt_single(struct pdcp_test_conf *t_conf)
goto mbuf_free;
}
 
-   ret = pdcp_known_vec_verify(mbuf, t_conf->output, t_conf->output_len);
-   if (ret)
-   goto mbuf_free;
+   /* If expected output provided - verify, else - store for future use */
+   if (t_conf->output_len) {
+   ret = pdcp_known_vec_verify(mbuf, t_conf->output, 
t_conf->output_len);
+   if (ret)
+   goto mbuf_free;
+   } else {
+   ret = pktmbuf_read_into(mbuf, t_conf->output, 
RTE_PDCP_CTRL_PDU_SIZE_MAX);
+   if (ret)
+   goto mbuf_free;
+   t_conf->output_len = mbuf->pkt_len;
+   }
 
ret = rte_pdcp_entity_suspend(pdcp_entity, out_mb);
if (ret) {
@@ -667,6 +750,193 @@ test_attempt_single(struct pdcp_test_conf *t_conf)
return ret;
 }
 
+static void
+uplink_to_downlink_convert(const struct pdcp_test_conf *ul_cfg,
+  struct pdcp_test_conf *dl_cfg)
+{
+   assert(ul_cfg->entity.pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_UPLINK);
+
+   memcpy(dl_cfg, ul_cfg, sizeof(*dl_cfg));
+   dl_cfg->entity.pdcp_xfrm.pkt_dir = RTE_SECURITY_PDCP_DOWNLINK;
+   dl_cfg->entity.reverse_iv_direction = false;
+
+   if (dl_cfg->is_integrity_protected) {
+   dl_cfg->entity.crypto_xfrm = &dl_cfg->c_xfrm;
+
+   dl_cfg->c_xfrm.cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
+   dl_cfg->c_xfrm.next = &dl_cfg->a_xfrm;
+
+   dl_cfg->a_xfrm.auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+   dl_cfg->a_xfrm.next = NULL;

[PATCH v6 11/21] doc: add PDCP library guide

2023-05-30 Thread Anoob Joseph
Add guide for PDCP library.

Signed-off-by: Anoob Joseph 
Signed-off-by: Kiran Kumar K 
Signed-off-by: Volodymyr Fialko 
---
 MAINTAINERS   |   1 +
 .../img/pdcp_functional_overview.svg  |   1 +
 doc/guides/prog_guide/index.rst   |   1 +
 doc/guides/prog_guide/pdcp_lib.rst| 254 ++
 doc/guides/rel_notes/release_23_07.rst|  12 +
 5 files changed, 269 insertions(+)
 create mode 100644 doc/guides/prog_guide/img/pdcp_functional_overview.svg
 create mode 100644 doc/guides/prog_guide/pdcp_lib.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ca684dde83..11ecb153bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1557,6 +1557,7 @@ M: Volodymyr Fialko 
 T: git://dpdk.org/next/dpdk-next-crypto
 F: lib/pdcp/
 F: app/test/test_pdcp*
+F: doc/guides/prog_guide/pdcp_lib.rst
 
 
 Packet Framework
diff --git a/doc/guides/prog_guide/img/pdcp_functional_overview.svg 
b/doc/guides/prog_guide/img/pdcp_functional_overview.svg
new file mode 100644
index 00..287daafc21
--- /dev/null
+++ b/doc/guides/prog_guide/img/pdcp_functional_overview.svg
@@ -0,0 +1 @@
+http://www.w3.org/2000/svg"; 
xmlns:xlink="http://www.w3.org/1999/xlink"; overflow="hidden">Radio 
Interface (Uu/PC5)UE/NG-RAN/UE 
ANG-RAN/UE/UE BTransmitting PDCP entityReceiving PDCP 
entityTransmission buffer:SequencenumberingHeader or 
uplink dataCompressionHeader or 
uplink dataDecompressionRouting / 
DuplicationAdd PDCP 
headerCipheringIntegrity 
protectionPackets associated to a PDCP SDUPackets not associated to a 
PDCP 
SDURemove PDCP 
HeaderDecipheringIntegrity 
VerificationReception 
buffer:ReorderingDuplicate discardingPackets associated to 
a PDCP SDUPackets not associated to a 
PDCP SDU
\ No newline at end of file
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 87333ee84a..6099ff63cd 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -77,4 +77,5 @@ Programmer's Guide
 lto
 profile_app
 asan
+pdcp_lib
 glossary
diff --git a/doc/guides/prog_guide/pdcp_lib.rst 
b/doc/guides/prog_guide/pdcp_lib.rst
new file mode 100644
index 00..2eefabf45c
--- /dev/null
+++ b/doc/guides/prog_guide/pdcp_lib.rst
@@ -0,0 +1,254 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(C) 2023 Marvell.
+
+PDCP Protocol Processing Library
+================================
+
+DPDK provides a library for PDCP protocol processing.
+The library utilizes other DPDK libraries such as cryptodev, reorder, etc.,
+to provide the application with a transparent and
+high-performance PDCP protocol processing library.
+
+The library abstracts complete PDCP protocol processing conforming to
+``ETSI TS 138 323 V17.1.0 (2022-08)``.
+https://www.etsi.org/deliver/etsi_ts/138300_138399/138323/17.01.00_60/ts_138323v170100p.pdf
+
+PDCP involves the following operations:
+
+1. Transfer of user plane data
+2. Transfer of control plane data
+3. Header compression
+4. Uplink data compression
+5. Ciphering and integrity protection
+
+.. _figure_pdcp_functional_overview:
+
+.. figure:: img/pdcp_functional_overview.*
+
+   PDCP functional overview
+
+The PDCP library abstracts the protocol offload features of the cryptodev and
+provides a uniform interface and consistent API usage to work with
+cryptodev irrespective of the protocol offload features supported.
+
+PDCP entity API
+---------------
+
+The PDCP library provides the following control path APIs that are used to
+configure various PDCP entities:
+
+1. ``rte_pdcp_entity_establish()``
+2. ``rte_pdcp_entity_suspend()``
+3. ``rte_pdcp_entity_release()``
+
+A PDCP entity would translate to one ``rte_cryptodev_sym_session`` or
+``rte_security_session`` based on the config. The sessions would be created/
+destroyed while corresponding PDCP entity operations are performed.
+
+When upper layers request a PDCP entity suspend 
(``rte_pdcp_entity_suspend()``),
+all cached packets are flushed out and
+internal state variables are updated as described in 5.1.4.
+
+When upper layers request a PDCP entity release 
(``rte_pdcp_entity_release()``),
+all cached packets are flushed out and all memory associated with the entity
+is released. Any crypto/security sessions created are freed internally.
+All procedures mentioned in 5.1.3 would be performed.
+
+PDCP PDU (Protocol Data Unit) API
+---------------------------------
+
+PDCP PDUs can be categorized as:
+
+1. Control PDU
+2. Data PDU
+
+Control PDUs are used for signalling between entities on either end and
+can be one of the following:
+
+1. PDCP status report
+2. ROHC feedback
+3. EHC feedback
+
+Control PDUs are not ciphered or authenticated, and so such packets are not
+submitted to cryptodev for processing.
+
+Data PDUs are regular packets submitted by upper layers for transmission to
+the other end. Such packets would need to be ciphered and authenticated based on
+the entity conf

[PATCH v6 12/21] pdcp: add control PDU handling for status report

2023-05-30 Thread Anoob Joseph
Add control PDU handling and implement status report generation. Status
report generation works only when RX_DELIV = RX_NEXT.
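
For context, a minimal usage sketch of the new API (pdcp_entity is an
established receiving entity; error handling elided; the PDU-type argument
is assumed to take the control PDU type enum added in this patch):

	struct rte_mbuf *m;

	m = rte_pdcp_control_pdu_create(pdcp_entity,
					RTE_PDCP_CTRL_PDU_TYPE_STATUS_REPORT);
	if (m == NULL)
		return; /* mempool empty or report not possible yet */
	/* m carries the generated status report; hand it to the Tx path. */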

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 doc/guides/prog_guide/pdcp_lib.rst |  9 ++
 lib/pdcp/meson.build   |  2 ++
 lib/pdcp/pdcp_cnt.c| 29 ++
 lib/pdcp/pdcp_cnt.h| 14 +
 lib/pdcp/pdcp_ctrl_pdu.c   | 46 +
 lib/pdcp/pdcp_ctrl_pdu.h   | 15 ++
 lib/pdcp/pdcp_entity.h | 15 --
 lib/pdcp/pdcp_process.c| 13 +
 lib/pdcp/rte_pdcp.c| 47 +-
 lib/pdcp/rte_pdcp.h| 33 +
 lib/pdcp/version.map   |  2 ++
 11 files changed, 222 insertions(+), 3 deletions(-)
 create mode 100644 lib/pdcp/pdcp_cnt.c
 create mode 100644 lib/pdcp/pdcp_cnt.h
 create mode 100644 lib/pdcp/pdcp_ctrl_pdu.c
 create mode 100644 lib/pdcp/pdcp_ctrl_pdu.h

diff --git a/doc/guides/prog_guide/pdcp_lib.rst 
b/doc/guides/prog_guide/pdcp_lib.rst
index 2eefabf45c..a925aa7f14 100644
--- a/doc/guides/prog_guide/pdcp_lib.rst
+++ b/doc/guides/prog_guide/pdcp_lib.rst
@@ -76,6 +76,15 @@ Data PDUs are regular packets submitted by upper layers for 
transmission to
 the other end. Such packets would need to be ciphered and authenticated based on
 the entity configuration.
 
+PDCP packet processing API for control PDU
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Control PDUs are used in PDCP as a communication channel between transmitting
+and receiving entities. When upper layers request operations such as
+re-establishment, the receiving PDCP entity needs to prepare a status report
+and send it to the other end. The API ``rte_pdcp_control_pdu_create()`` allows
+the application to request the same.
+
 PDCP packet processing API for data PDU
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/pdcp/meson.build b/lib/pdcp/meson.build
index 08679b743a..75d476bf6d 100644
--- a/lib/pdcp/meson.build
+++ b/lib/pdcp/meson.build
@@ -8,7 +8,9 @@ if is_windows
 endif
 
 sources = files(
+'pdcp_cnt.c',
 'pdcp_crypto.c',
+'pdcp_ctrl_pdu.c',
 'pdcp_process.c',
 'rte_pdcp.c',
 )
diff --git a/lib/pdcp/pdcp_cnt.c b/lib/pdcp/pdcp_cnt.c
new file mode 100644
index 00..c9b952184b
--- /dev/null
+++ b/lib/pdcp/pdcp_cnt.c
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include 
+
+#include "pdcp_cnt.h"
+#include "pdcp_entity.h"
+
+int
+pdcp_cnt_ring_create(struct rte_pdcp_entity *en, const struct 
rte_pdcp_entity_conf *conf)
+{
+   struct entity_priv_dl_part *en_priv_dl;
+   uint32_t window_sz;
+
+   if (en == NULL || conf == NULL)
+   return -EINVAL;
+
+   if (conf->pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_UPLINK)
+   return 0;
+
+   en_priv_dl = entity_dl_part_get(en);
+   window_sz = pdcp_window_size_get(conf->pdcp_xfrm.sn_size);
+
+   RTE_SET_USED(window_sz);
+   RTE_SET_USED(en_priv_dl);
+
+   return 0;
+}
diff --git a/lib/pdcp/pdcp_cnt.h b/lib/pdcp/pdcp_cnt.h
new file mode 100644
index 00..bbda478b55
--- /dev/null
+++ b/lib/pdcp/pdcp_cnt.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#ifndef PDCP_CNT_H
+#define PDCP_CNT_H
+
+#include 
+
+#include "pdcp_entity.h"
+
+int pdcp_cnt_ring_create(struct rte_pdcp_entity *en, const struct 
rte_pdcp_entity_conf *conf);
+
+#endif /* PDCP_CNT_H */
diff --git a/lib/pdcp/pdcp_ctrl_pdu.c b/lib/pdcp/pdcp_ctrl_pdu.c
new file mode 100644
index 00..feb05fd863
--- /dev/null
+++ b/lib/pdcp/pdcp_ctrl_pdu.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include 
+#include 
+#include 
+
+#include "pdcp_ctrl_pdu.h"
+#include "pdcp_entity.h"
+
+static __rte_always_inline void
+pdcp_hdr_fill(struct rte_pdcp_up_ctrl_pdu_hdr *pdu_hdr, uint32_t rx_deliv)
+{
+   pdu_hdr->d_c = RTE_PDCP_PDU_TYPE_CTRL;
+   pdu_hdr->pdu_type = RTE_PDCP_CTRL_PDU_TYPE_STATUS_REPORT;
+   pdu_hdr->r = 0;
+   pdu_hdr->fmc = rte_cpu_to_be_32(rx_deliv);
+}
+
+int
+pdcp_ctrl_pdu_status_gen(struct entity_priv *en_priv, struct rte_mbuf *m)
+{
+   struct rte_pdcp_up_ctrl_pdu_hdr *pdu_hdr;
+   uint32_t rx_deliv;
+   int pdu_sz;
+
+   if (!en_priv->flags.is_status_report_required)
+   return -EINVAL;
+
+   pdu_sz = sizeof(struct rte_pdcp_up_ctrl_pdu_hdr);
+
+   rx_deliv = en_priv->state.rx_deliv;
+
+   /* Zero missing PDUs - status report contains only FMC */
+   if (rx_deliv >= en_priv->state.rx_next) {
+   pdu_hdr = (struct rte_pdcp_up_ctrl_pdu_hdr 
*)rte_pktmbuf_append(m, pdu_sz);
+   if (pdu_hdr == NULL)
+   return -ENOMEM;
+   pdcp_hdr_fill(pdu_hdr, rx_deliv);
+
+   return 0;
+   }
+

[PATCH v6 13/21] pdcp: implement t-Reordering and packet buffering

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add in-order delivery of packets in PDCP. Delivery of packets in-order
relies on t-Reordering timer.

When 'out-of-order delivery' is disabled, PDCP will buffer all received
packets that are out of order. The t-Reordering timer determines the
time period for which these packets are held in the buffer, waiting for any
missing packets to arrive.

Introduce packet buffering and state variables which indicate status of
the timer.
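
To illustrate the intended behaviour, consider the sketch below, where
pdcp_process() is a hypothetical application helper wrapping the library's
pre-process, cryptodev enqueue/dequeue and post-process steps:

	/* Downlink entity; SN 0 and SN 2 arrive, SN 1 is missing. */
	nb = pdcp_process(entity, &pkt_sn0, 1, out); /* 1: delivered in order */
	nb = pdcp_process(entity, &pkt_sn2, 1, out); /* 0: buffered,
						      * t-Reordering started */

	/* The missing packet arrives before the timer expires. */
	nb = pdcp_process(entity, &pkt_sn1, 1, out); /* 2: SN 1 plus the
						      * buffered SN 2, in order;
						      * t-Reordering stopped */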

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/meson.build|   3 +-
 lib/pdcp/pdcp_entity.h  |  19 +++
 lib/pdcp/pdcp_process.c | 117 ++--
 lib/pdcp/pdcp_reorder.c |  27 ++
 lib/pdcp/pdcp_reorder.h |  62 +
 lib/pdcp/rte_pdcp.c |  53 --
 lib/pdcp/rte_pdcp.h |   6 ++-
 7 files changed, 252 insertions(+), 35 deletions(-)
 create mode 100644 lib/pdcp/pdcp_reorder.c
 create mode 100644 lib/pdcp/pdcp_reorder.h

diff --git a/lib/pdcp/meson.build b/lib/pdcp/meson.build
index 75d476bf6d..f4f9246bcb 100644
--- a/lib/pdcp/meson.build
+++ b/lib/pdcp/meson.build
@@ -12,9 +12,10 @@ sources = files(
 'pdcp_crypto.c',
 'pdcp_ctrl_pdu.c',
 'pdcp_process.c',
+'pdcp_reorder.c',
 'rte_pdcp.c',
 )
 headers = files('rte_pdcp.h')
 indirect_headers += files('rte_pdcp_group.h')
 
-deps += ['mbuf', 'net', 'cryptodev', 'security']
+deps += ['mbuf', 'net', 'cryptodev', 'security', 'reorder']
diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index 28691a504b..34341cdc11 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#include "pdcp_reorder.h"
+
 struct entity_priv;
 
 #define PDCP_HFN_MIN 0
@@ -109,6 +111,17 @@ union cipher_iv_partial {
uint64_t u64[2];
 };
 
+enum timer_state {
+   TIMER_STOP,
+   TIMER_RUNNING,
+   TIMER_EXPIRED,
+};
+
+struct pdcp_t_reordering {
+   /** Represent timer state */
+   enum timer_state state;
+};
+
 struct pdcp_cnt_bitmap {
/** Number of entries that can be stored. */
uint32_t size;
@@ -145,6 +158,8 @@ struct entity_priv {
uint64_t is_null_auth : 1;
/** Is status report required.*/
uint64_t is_status_report_required : 1;
+   /** Is out-of-order delivery enabled */
+   uint64_t is_out_of_order_delivery : 1;
} flags;
/** Crypto op pool. */
struct rte_mempool *cop_pool;
@@ -161,6 +176,10 @@ struct entity_priv {
 struct entity_priv_dl_part {
/** PDCP would need to track the count values that are already 
received.*/
struct pdcp_cnt_bitmap bitmap;
+   /** t-Reordering handles */
+   struct pdcp_t_reordering t_reorder;
+   /** Reorder packet buffer */
+   struct pdcp_reorder reorder;
 };
 
 struct entity_priv_ul_part {
diff --git a/lib/pdcp/pdcp_process.c b/lib/pdcp/pdcp_process.c
index ed1413db6d..84a0f3a43f 100644
--- a/lib/pdcp/pdcp_process.c
+++ b/lib/pdcp/pdcp_process.c
@@ -837,25 +837,88 @@ pdcp_packet_strip(struct rte_mbuf *mb, const uint32_t 
hdr_trim_sz, const bool tr
}
 }
 
-static inline bool
+static inline int
 pdcp_post_process_update_entity_state(const struct rte_pdcp_entity *entity,
- const uint32_t count)
+ const uint32_t count, struct rte_mbuf *mb,
+ struct rte_mbuf *out_mb[],
+ const bool trim_mac)
 {
struct entity_priv *en_priv = entity_priv_get(entity);
+   struct pdcp_t_reordering *t_reorder;
+   struct pdcp_reorder *reorder;
+   uint16_t processed = 0;
 
-   if (count < en_priv->state.rx_deliv)
-   return false;
+   struct entity_priv_dl_part *dl = entity_dl_part_get(entity);
+   const uint32_t hdr_trim_sz = en_priv->hdr_sz + en_priv->aad_sz;
 
-   /* t-Reordering timer is not supported - SDU will be delivered 
immediately.
-* Update RX_DELIV to the COUNT value of the first PDCP SDU which has 
not
-* been delivered to upper layers
-*/
-   en_priv->state.rx_next = count + 1;
+   if (count < en_priv->state.rx_deliv)
+   return -EINVAL;
 
if (count >= en_priv->state.rx_next)
en_priv->state.rx_next = count + 1;
 
-   return true;
+   pdcp_packet_strip(mb, hdr_trim_sz, trim_mac);
+
+   if (en_priv->flags.is_out_of_order_delivery) {
+   out_mb[0] = mb;
+   en_priv->state.rx_deliv = count + 1;
+
+   return 1;
+   }
+
+   reorder = &dl->reorder;
+   t_reorder = &dl->t_reorder;
+
+   if (count == en_priv->state.rx_deliv) {
+   if (reorder->is_active) {
+   /*
+* This insert used only to increment reorder->min_seqn
+* To remove it - min_seqn_s

[PATCH v6 14/21] test/pdcp: add in-order delivery cases

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add test cases to verify behaviour when in-order delivery is enabled and
packets arrive out of order. The PDCP library is expected to buffer the
packets and return them in order when the missing packet arrives.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 app/test/test_pdcp.c | 223 +++
 1 file changed, 223 insertions(+)

diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
index cfe2ec6aa9..24d7826bc2 100644
--- a/app/test/test_pdcp.c
+++ b/app/test/test_pdcp.c
@@ -16,6 +16,15 @@
 #define NB_TESTS RTE_DIM(pdcp_test_params)
 #define PDCP_IV_LEN 16
 
+/* Assert that condition is true, or goto the mark */
+#define ASSERT_TRUE_OR_GOTO(cond, mark, ...) do {\
+   if (!(cond)) { \
+   RTE_LOG(ERR, USER1, "Error at: %s:%d\n", __func__, __LINE__); \
+   RTE_LOG(ERR, USER1, __VA_ARGS__); \
+   goto mark; \
+   } \
+} while (0)
+
 /* According to formula(7.2.a Window_Size) */
 #define PDCP_WINDOW_SIZE(sn_size) (1 << (sn_size - 1))
 
@@ -83,6 +92,38 @@ run_test_with_all_known_vec(const void *args)
return run_test_foreach_known_vec(test, false);
 }
 
+static int
+run_test_with_all_known_vec_until_first_pass(const void *args)
+{
+   test_with_conf_t test = args;
+
+   return run_test_foreach_known_vec(test, true);
+}
+
+static inline uint32_t
+pdcp_sn_mask_get(enum rte_security_pdcp_sn_size sn_size)
+{
+   return (1 << sn_size) - 1;
+}
+
+static inline uint32_t
+pdcp_sn_from_count_get(uint32_t count, enum rte_security_pdcp_sn_size sn_size)
+{
+   return (count & pdcp_sn_mask_get(sn_size));
+}
+
+static inline uint32_t
+pdcp_hfn_mask_get(enum rte_security_pdcp_sn_size sn_size)
+{
+   return ~pdcp_sn_mask_get(sn_size);
+}
+
+static inline uint32_t
+pdcp_hfn_from_count_get(uint32_t count, enum rte_security_pdcp_sn_size sn_size)
+{
+   return (count & pdcp_hfn_mask_get(sn_size)) >> sn_size;
+}
+
 static inline int
 pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
 {
@@ -416,6 +457,7 @@ create_test_conf_from_index(const int index, struct 
pdcp_test_conf *conf)
 
conf->entity.sess_mpool = ts_params->sess_pool;
conf->entity.cop_pool = ts_params->cop_pool;
+   conf->entity.ctrl_pdu_pool = ts_params->mbuf_pool;
conf->entity.pdcp_xfrm.bearer = pdcp_test_bearer[index];
conf->entity.pdcp_xfrm.en_ordering = 0;
conf->entity.pdcp_xfrm.remove_duplicates = 0;
@@ -868,6 +910,7 @@ test_sn_range_type(enum sn_range_type type, struct 
pdcp_test_conf *conf)
 
/* Configure Uplink to generate expected, encrypted packet */
pdcp_sn_to_raw_set(conf->input, new_sn, conf->entity.pdcp_xfrm.sn_size);
+   conf->entity.out_of_order_delivery = true;
conf->entity.reverse_iv_direction = true;
conf->entity.pdcp_xfrm.hfn = new_hfn;
conf->entity.sn = new_sn;
@@ -915,6 +958,171 @@ test_sn_minus_outside(struct pdcp_test_conf *t_conf)
return test_sn_range_type(SN_RANGE_MINUS_OUTSIDE, t_conf);
 }
 
+static struct rte_mbuf *
+generate_packet_for_dl_with_sn(struct pdcp_test_conf ul_conf, uint32_t count)
+{
+   enum rte_security_pdcp_sn_size sn_size = 
ul_conf.entity.pdcp_xfrm.sn_size;
+   int ret;
+
+   ul_conf.entity.pdcp_xfrm.hfn = pdcp_hfn_from_count_get(count, sn_size);
+   ul_conf.entity.sn = pdcp_sn_from_count_get(count, sn_size);
+   ul_conf.entity.out_of_order_delivery = true;
+   ul_conf.entity.reverse_iv_direction = true;
+   ul_conf.output_len = 0;
+
+   ret = test_attempt_single(&ul_conf);
+   if (ret != TEST_SUCCESS)
+   return NULL;
+
+   return mbuf_from_data_create(ul_conf.output, ul_conf.output_len);
+}
+
+static bool
+array_asc_sorted_check(struct rte_mbuf *m[], uint32_t len, enum 
rte_security_pdcp_sn_size sn_size)
+{
+   uint32_t i;
+
+   if (len < 2)
+   return true;
+
+   for (i = 0; i < (len - 1); i++) {
+   if (pdcp_sn_from_raw_get(rte_pktmbuf_mtod(m[i], void *), 
sn_size) >
+   pdcp_sn_from_raw_get(rte_pktmbuf_mtod(m[i + 1], void *), 
sn_size))
+   return false;
+   }
+
+   return true;
+}
+
+static int
+test_reorder_gap_fill(struct pdcp_test_conf *ul_conf)
+{
+   const enum rte_security_pdcp_sn_size sn_size = 
ul_conf->entity.pdcp_xfrm.sn_size;
+   struct rte_mbuf *m0 = NULL, *m1 = NULL, *out_mb[2] = {0};
+   uint16_t nb_success = 0, nb_err = 0;
+   struct rte_pdcp_entity *pdcp_entity;
+   struct pdcp_test_conf dl_conf;
+   int ret = TEST_FAILED, nb_out;
+   uint8_t cdev_id;
+
+   const int start_count = 0;
+
+   if (ul_conf->entity.pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_DOWNLINK)
+   return TEST_SKIPPED;
+
+   /* Create configuration for actual testing */
+   uplink_to_downlink_convert(ul_conf, &dl_conf);
+   dl_conf.entity.pdcp_xfrm.hfn = pdcp_hfn_from_count_get(start_count, 
sn_

[PATCH v6 15/21] pdcp: add timer callback handlers

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

PDCP has a windowing mechanism which accepts only packets that fall within
the reception window. The pivot point for this window is RX_REORD, which
happens to be the first missing or next expected packet. If the missing
packet is not received after a specified time, then the RX_REORD state
variable needs to be moved up to slide the reception window. PDCP relies
on timers for such operations.

The timer needs to be armed when the PDCP library doesn't receive all
packets in-order and starts buffering packets that arrived after a
missing packet. The timer needs to be cancelled when a missing packet
is received.

To avoid dependency on particular timer implementation, PDCP library
allows application to register two callbacks, timer_start() and
timer_stop() that will be called later by library.
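
A minimal sketch of wiring these callbacks to an rte_timer instance
(TIMER_TICKS, expiry_cb, app_tim and app_ctx are application-defined
assumptions; app_tim must have been initialized with rte_timer_init()):

static void
app_timer_start_cb(void *timer, void *args)
{
	/* Single-shot timer; expiry_cb() performs the PDCP expiry
	 * handling using the context passed via args. */
	rte_timer_reset_sync(timer, TIMER_TICKS, SINGLE, rte_lcore_id(),
			     expiry_cb, args);
}

static void
app_timer_stop_cb(void *timer, void *args)
{
	RTE_SET_USED(args);
	rte_timer_stop_sync(timer);
}

/* During entity establishment: */
conf.t_reordering = (struct rte_pdcp_t_reordering) {
	.timer = &app_tim,
	.args = &app_ctx,
	.start = app_timer_start_cb,
	.stop = app_timer_stop_cb,
};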

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/pdcp_entity.h  |  2 ++
 lib/pdcp/pdcp_process.c |  2 ++
 lib/pdcp/rte_pdcp.c |  1 +
 lib/pdcp/rte_pdcp.h | 47 +
 4 files changed, 52 insertions(+)

diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index 34341cdc11..efc74ba9b9 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -120,6 +120,8 @@ enum timer_state {
 struct pdcp_t_reordering {
/** Represent timer state */
enum timer_state state;
+   /** User defined callback handles */
+   struct rte_pdcp_t_reordering handle;
 };
 
 struct pdcp_cnt_bitmap {
diff --git a/lib/pdcp/pdcp_process.c b/lib/pdcp/pdcp_process.c
index 84a0f3a43f..daf2c27363 100644
--- a/lib/pdcp/pdcp_process.c
+++ b/lib/pdcp/pdcp_process.c
@@ -902,6 +902,7 @@ pdcp_post_process_update_entity_state(const struct 
rte_pdcp_entity *entity,
if (t_reorder->state == TIMER_RUNNING &&
en_priv->state.rx_deliv >= en_priv->state.rx_reord) {
t_reorder->state = TIMER_STOP;
+   t_reorder->handle.stop(t_reorder->handle.timer, 
t_reorder->handle.args);
/* Stop reorder buffer, only if it's empty */
if (en_priv->state.rx_deliv == en_priv->state.rx_next)
pdcp_reorder_stop(reorder);
@@ -916,6 +917,7 @@ pdcp_post_process_update_entity_state(const struct 
rte_pdcp_entity *entity,
en_priv->state.rx_reord = en_priv->state.rx_next;
/* Start t-Reordering */
t_reorder->state = TIMER_RUNNING;
+   t_reorder->handle.start(t_reorder->handle.timer, 
t_reorder->handle.args);
}
 
return processed;
diff --git a/lib/pdcp/rte_pdcp.c b/lib/pdcp/rte_pdcp.c
index be37ff392c..a0558b99ae 100644
--- a/lib/pdcp/rte_pdcp.c
+++ b/lib/pdcp/rte_pdcp.c
@@ -56,6 +56,7 @@ pdcp_dl_establish(struct rte_pdcp_entity *entity, const 
struct rte_pdcp_entity_c
struct entity_priv_dl_part *dl = entity_dl_part_get(entity);
 
entity->max_pkt_cache = RTE_MAX(entity->max_pkt_cache, window_size);
+   dl->t_reorder.handle = conf->t_reordering;
 
return pdcp_reorder_create(&dl->reorder, window_size);
 }
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index 9c4d06962a..9cdce7d3a4 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -68,6 +68,51 @@ struct rte_pdcp_entity {
uint32_t max_pkt_cache;
 } __rte_cache_aligned;
 
+/**
+ * Callback function type for t-Reordering timer start, set during PDCP entity 
establish.
+ * This callback is invoked by PDCP library, during t-Reordering timer start 
event.
+ * Only one t-Reordering per receiving PDCP entity would be running at a given 
time.
+ *
+ * @see struct rte_pdcp_timer
+ * @see rte_pdcp_entity_establish()
+ *
+ * @param timer
+ *   Pointer to timer.
+ * @param args
+ *   Pointer to timer arguments.
+ */
+typedef void (*rte_pdcp_t_reordering_start_cb_t)(void *timer, void *args);
+
+/**
+ * Callback function type for t-Reordering timer stop, set during PDCP entity 
establish.
+ * This callback will be invoked by PDCP library, during t-Reordering timer 
stop event.
+ *
+ * @see struct rte_pdcp_timer
+ * @see rte_pdcp_entity_establish()
+ *
+ * @param timer
+ *   Pointer to timer.
+ * @param args
+ *   Pointer to timer arguments.
+ */
+typedef void (*rte_pdcp_t_reordering_stop_cb_t)(void *timer, void *args);
+
+/**
+ * PDCP t-Reordering timer interface
+ *
+ * Configuration provided by user, that PDCP library will invoke according to 
timer behaviour.
+ */
+struct rte_pdcp_t_reordering {
+   /** Timer pointer, to be used in callback functions. */
+   void *timer;
+   /** Timer arguments, to be used in callback functions. */
+   void *args;
+   /** Timer start callback handle. */
+   rte_pdcp_t_reordering_start_cb_t start;
+   /** Timer stop callback handle. */
+   rte_pdcp_t_reordering_stop_cb_t stop;
+};
+
 /**
  * PDCP entity configuration to be used for establishing an entity.
  */
@@ -112,6 +157,8 @@ struct rte_pdcp_entity_conf {
bool status_report_requir

[PATCH v6 16/21] pdcp: add timer expiry handle

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

The PDCP protocol requires usage of timers to keep track of how long
an out-of-order packet should be buffered while waiting for missing
packets. Applications can register a desired timer implementation with the
PDCP library. Once the timer expires, the application will be notified, and
further handling of the event will be performed in the PDCP library.

When the timer expires, the PDCP library will return the cached packets,
and PDCP internal state variables (RX_REORD, RX_DELIV, etc.) will be
updated accordingly.
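
Handling the expiry event then reduces to a single call (sketch; MAX_BURST
and deliver_to_upper_layer() are application-defined, with MAX_BURST assumed
to be at least entity->max_pkt_cache):

	struct rte_mbuf *out_mb[MAX_BURST];
	uint16_t nb_out, i;

	/* Invoked from the application's timer expiry callback. */
	nb_out = rte_pdcp_t_reordering_expiry_handle(entity, out_mb);
	for (i = 0; i < nb_out; i++)
		deliver_to_upper_layer(out_mb[i]);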

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 doc/guides/prog_guide/pdcp_lib.rst | 31 +++
 lib/pdcp/rte_pdcp.c| 49 ++
 lib/pdcp/rte_pdcp.h| 34 +++--
 lib/pdcp/version.map   |  2 ++
 4 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/pdcp_lib.rst 
b/doc/guides/prog_guide/pdcp_lib.rst
index a925aa7f14..61242edf92 100644
--- a/doc/guides/prog_guide/pdcp_lib.rst
+++ b/doc/guides/prog_guide/pdcp_lib.rst
@@ -130,6 +130,37 @@ Supported integrity protection algorithms
 - RTE_CRYPTO_AUTH_SNOW3G_UIA2
 - RTE_CRYPTO_AUTH_ZUC_EIA3
 
+Timers
+------
+
+PDCP utilizes a reception window mechanism to limit the bits of COUNT value
+transmitted in the packet. It utilizes state variables such as RX_REORD,
+RX_DELIV to define the window and uses RX_DELIV as the lower pivot point of the
+window.
+
+RX_DELIV would be updated only when packets are received in-order.
+Any missing packet would mean RX_DELIV won't be updated.
+A timer, t-Reordering, helps PDCP to slide the window
+if the missing packet is not received in a specified time duration.
+
+While starting and stopping the timer is done by the PDCP library,
+the application can register its own timer implementation.
+This makes sure the application can choose between timers
+such as rte_timer and rte_event based timers.
+Starting and stopping of the timer happens during the pre and post process APIs.
+
+When the t-Reordering timer expires, the application receives the expiry event.
+To perform the PDCP handling of the expiry event,
+``rte_pdcp_t_reordering_expiry_handle`` can be used.
+Expiry handling involves sliding the window by updating state variables and
+passing the expired packets to the application.
+
+.. literalinclude:: ../../../lib/pdcp/rte_pdcp.h
+   :language: c
+   :start-after: Structure rte_pdcp_t_reordering 8<
+   :end-before: >8 End of structure rte_pdcp_t_reordering.
+
+
 Sample API usage
 
 
diff --git a/lib/pdcp/rte_pdcp.c b/lib/pdcp/rte_pdcp.c
index a0558b99ae..819c66bd08 100644
--- a/lib/pdcp/rte_pdcp.c
+++ b/lib/pdcp/rte_pdcp.c
@@ -251,3 +251,52 @@ rte_pdcp_control_pdu_create(struct rte_pdcp_entity 
*pdcp_entity,
 
return m;
 }
+
+uint16_t
+rte_pdcp_t_reordering_expiry_handle(const struct rte_pdcp_entity *entity, 
struct rte_mbuf *out_mb[])
+{
+   struct entity_priv_dl_part *dl = entity_dl_part_get(entity);
+   struct entity_priv *en_priv = entity_priv_get(entity);
+   uint16_t capacity = entity->max_pkt_cache;
+   uint16_t nb_out, nb_seq;
+
+   /* 5.2.2.2 Actions when a t-Reordering expires */
+
+   /*
+* - deliver to upper layers in ascending order of the associated COUNT 
value after
+*   performing header decompression, if not decompressed before:
+*/
+
+   /*   - all stored PDCP SDU(s) with associated COUNT value(s) < 
RX_REORD; */
+   nb_out = pdcp_reorder_up_to_get(&dl->reorder, out_mb, capacity, 
en_priv->state.rx_reord);
+   capacity -= nb_out;
+   out_mb = &out_mb[nb_out];
+
+   /*
+*   - all stored PDCP SDU(s) with consecutively associated COUNT 
value(s) starting from
+* RX_REORD;
+*/
+   nb_seq = pdcp_reorder_get_sequential(&dl->reorder, out_mb, capacity);
+   nb_out += nb_seq;
+
+   /*
+* - update RX_DELIV to the COUNT value of the first PDCP SDU which has 
not been delivered
+*   to upper layers, with COUNT value >= RX_REORD;
+*/
+   en_priv->state.rx_deliv = en_priv->state.rx_reord + nb_seq;
+
+   /*
+* - if RX_DELIV < RX_NEXT:
+*   - update RX_REORD to RX_NEXT;
+*   - start t-Reordering.
+*/
+   if (en_priv->state.rx_deliv < en_priv->state.rx_next) {
+   en_priv->state.rx_reord = en_priv->state.rx_next;
+   dl->t_reorder.state = TIMER_RUNNING;
+   dl->t_reorder.handle.start(dl->t_reorder.handle.timer, 
dl->t_reorder.handle.args);
+   } else {
+   dl->t_reorder.state = TIMER_EXPIRED;
+   }
+
+   return nb_out;
+}
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index 9cdce7d3a4..c5b99dfa2b 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -102,6 +102,7 @@ typedef void (*rte_pdcp_t_reordering_stop_cb_t)(void 
*timer, void *args);
  *
  * Configuration provided by use

[PATCH v6 17/21] test/pdcp: add timer expiry cases

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add test cases for handling the expiry with rte_timer and rte_event_timer.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 app/test/test_pdcp.c | 350 +++
 1 file changed, 350 insertions(+)

diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
index 24d7826bc2..25729b2bdd 100644
--- a/app/test/test_pdcp.c
+++ b/app/test/test_pdcp.c
@@ -3,15 +3,24 @@
  */
 
 #include 
+#ifdef RTE_LIB_EVENTDEV
+#include 
+#include 
+#endif /* RTE_LIB_EVENTDEV */
 #include 
 #include 
 #include 
+#include 
 
 #include "test.h"
 #include "test_cryptodev.h"
 #include "test_cryptodev_security_pdcp_test_vectors.h"
 
+#define NSECPERSEC 1E9
 #define NB_DESC 1024
+#define TIMER_ADAPTER_ID 0
+#define TEST_EV_QUEUE_ID 0
+#define TEST_EV_PORT_ID 0
 #define CDEV_INVALID_ID UINT8_MAX
 #define NB_TESTS RTE_DIM(pdcp_test_params)
 #define PDCP_IV_LEN 16
@@ -33,10 +42,21 @@ struct pdcp_testsuite_params {
struct rte_mempool *cop_pool;
struct rte_mempool *sess_pool;
bool cdevs_used[RTE_CRYPTO_MAX_DEVS];
+   int evdev;
+#ifdef RTE_LIB_EVENTDEV
+   struct rte_event_timer_adapter *timdev;
+#endif /* RTE_LIB_EVENTDEV */
+   bool timer_is_running;
+   uint64_t min_resolution_ns;
 };
 
 static struct pdcp_testsuite_params testsuite_params;
 
+struct test_rte_timer_args {
+   int status;
+   struct rte_pdcp_entity *pdcp_entity;
+};
+
 struct pdcp_test_conf {
struct rte_pdcp_entity_conf entity;
struct rte_crypto_sym_xform c_xfrm;
@@ -124,6 +144,30 @@ pdcp_hfn_from_count_get(uint32_t count, enum 
rte_security_pdcp_sn_size sn_size)
return (count & pdcp_hfn_mask_get(sn_size)) >> sn_size;
 }
 
+static void
+pdcp_timer_start_cb(void *timer, void *args)
+{
+   bool *is_timer_running = timer;
+
+   RTE_SET_USED(args);
+   *is_timer_running = true;
+}
+
+static void
+pdcp_timer_stop_cb(void *timer, void *args)
+{
+   bool *is_timer_running = timer;
+
+   RTE_SET_USED(args);
+   *is_timer_running = false;
+}
+
+static struct rte_pdcp_t_reordering t_reorder_timer = {
+   .timer = &testsuite_params.timer_is_running,
+   .start = pdcp_timer_start_cb,
+   .stop = pdcp_timer_stop_cb,
+};
+
 static inline int
 pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
 {
@@ -462,6 +506,7 @@ create_test_conf_from_index(const int index, struct 
pdcp_test_conf *conf)
conf->entity.pdcp_xfrm.en_ordering = 0;
conf->entity.pdcp_xfrm.remove_duplicates = 0;
conf->entity.pdcp_xfrm.domain = pdcp_test_params[index].domain;
+   conf->entity.t_reordering = t_reorder_timer;
 
if (pdcp_test_packet_direction[index] == PDCP_DIR_UPLINK)
conf->entity.pdcp_xfrm.pkt_dir = RTE_SECURITY_PDCP_UPLINK;
@@ -1048,6 +1093,8 @@ test_reorder_gap_fill(struct pdcp_test_conf *ul_conf)
/* Check that packets in correct order */
ASSERT_TRUE_OR_GOTO(array_asc_sorted_check(out_mb, nb_success, 
sn_size), exit,
"Error occurred during packet drain\n");
+   ASSERT_TRUE_OR_GOTO(testsuite_params.timer_is_running == false, exit,
+   "Timer should be stopped after full drain\n");
 
ret = TEST_SUCCESS;
 exit:
@@ -1123,6 +1170,181 @@ test_reorder_buffer_full_window_size_sn_12(const struct 
pdcp_test_conf *ul_conf)
return ret;
 }
 
+#ifdef RTE_LIB_EVENTDEV
+static void
+event_timer_start_cb(void *timer, void *args)
+{
+   struct rte_event_timer *evtims = args;
+   int ret = 0;
+
+   ret = rte_event_timer_arm_burst(timer, &evtims, 1);
+   assert(ret == 1);
+}
+#endif /* RTE_LIB_EVENTDEV */
+
+static int
+test_expiry_with_event_timer(const struct pdcp_test_conf *ul_conf)
+{
+#ifdef RTE_LIB_EVENTDEV
+   const enum rte_security_pdcp_sn_size sn_size = 
ul_conf->entity.pdcp_xfrm.sn_size;
+   struct rte_mbuf *m1 = NULL, *out_mb[1] = {0};
+   uint16_t n = 0, nb_err = 0, nb_try = 5;
+   struct rte_pdcp_entity *pdcp_entity;
+   struct pdcp_test_conf dl_conf;
+   int ret = TEST_FAILED, nb_out;
+   struct rte_event event;
+
+   const int start_count = 0;
+   struct rte_event_timer evtim = {
+   .ev.op = RTE_EVENT_OP_NEW,
+   .ev.queue_id = TEST_EV_QUEUE_ID,
+   .ev.sched_type = RTE_SCHED_TYPE_ATOMIC,
+   .ev.priority = RTE_EVENT_DEV_PRIORITY_NORMAL,
+   .ev.event_type =  RTE_EVENT_TYPE_TIMER,
+   .state = RTE_EVENT_TIMER_NOT_ARMED,
+   .timeout_ticks = 1,
+   };
+
+   if (ul_conf->entity.pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_DOWNLINK)
+   return TEST_SKIPPED;
+
+   /* Create configuration for actual testing */
+   uplink_to_downlink_convert(ul_conf, &dl_conf);
+   dl_conf.entity.pdcp_xfrm.hfn = pdcp_hfn_from_count_get(start_count, 
sn_size);
+   dl_conf.entity.sn = pdcp_sn_from_count_get(start_count, sn_size);
+   dl_conf.entity.t_r

[PATCH v6 19/21] pdcp: add support for status report

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Implement status report generation for PDCP entity.
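
For reference, the generated report follows the layout of struct
rte_pdcp_up_ctrl_pdu_hdr: an FMC field, optionally followed by a bitmap in
which each set bit marks a COUNT greater than FMC that was received out of
order (values below are illustrative):

/*
 * Example: RX_DELIV = 100; COUNTs 102 and 104 arrived out of order.
 *   - FMC (first missing COUNT)      = 100
 *   - bitmap bits for COUNT 102, 104 = 1 (received)
 *   - bitmap bits for COUNT 101, 103 = 0 (still missing)
 * When RX_DELIV == RX_NEXT nothing is missing and the report carries
 * only the FMC field.
 */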

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/pdcp_cnt.c  | 158 ---
 lib/pdcp/pdcp_cnt.h  |  11 ++-
 lib/pdcp/pdcp_ctrl_pdu.c |  34 -
 lib/pdcp/pdcp_ctrl_pdu.h |   3 +-
 lib/pdcp/pdcp_entity.h   |   2 +
 lib/pdcp/pdcp_process.c  |   9 ++-
 lib/pdcp/pdcp_process.h  |  13 
 lib/pdcp/rte_pdcp.c  |  34 ++---
 8 files changed, 236 insertions(+), 28 deletions(-)

diff --git a/lib/pdcp/pdcp_cnt.c b/lib/pdcp/pdcp_cnt.c
index c9b952184b..af027b00d3 100644
--- a/lib/pdcp/pdcp_cnt.c
+++ b/lib/pdcp/pdcp_cnt.c
@@ -2,28 +2,164 @@
  * Copyright(C) 2023 Marvell.
  */
 
+#include 
 #include 
 
 #include "pdcp_cnt.h"
+#include "pdcp_ctrl_pdu.h"
 #include "pdcp_entity.h"
 
+#define SLAB_BYTE_SIZE (RTE_BITMAP_SLAB_BIT_SIZE / 8)
+
+uint32_t
+pdcp_cnt_bitmap_get_memory_footprint(const struct rte_pdcp_entity_conf *conf)
+{
+   uint32_t n_bits = pdcp_window_size_get(conf->pdcp_xfrm.sn_size);
+
+   return rte_bitmap_get_memory_footprint(n_bits);
+}
+
 int
-pdcp_cnt_ring_create(struct rte_pdcp_entity *en, const struct 
rte_pdcp_entity_conf *conf)
+pdcp_cnt_bitmap_create(struct entity_priv_dl_part *dl, void *bitmap_mem, 
uint32_t window_size)
 {
-   struct entity_priv_dl_part *en_priv_dl;
-   uint32_t window_sz;
+   uint32_t mem_size = rte_bitmap_get_memory_footprint(window_size);
 
-   if (en == NULL || conf == NULL)
+   dl->bitmap.bmp = rte_bitmap_init(window_size, bitmap_mem, mem_size);
+   if (dl->bitmap.bmp == NULL)
return -EINVAL;
 
-   if (conf->pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_UPLINK)
-   return 0;
+   dl->bitmap.size = window_size;
 
-   en_priv_dl = entity_dl_part_get(en);
-   window_sz = pdcp_window_size_get(conf->pdcp_xfrm.sn_size);
+   return 0;
+}
 
-   RTE_SET_USED(window_sz);
-   RTE_SET_USED(en_priv_dl);
+void
+pdcp_cnt_bitmap_set(struct pdcp_cnt_bitmap bitmap, uint32_t count)
+{
+   rte_bitmap_set(bitmap.bmp, count % bitmap.size);
+}
 
-   return 0;
+bool
+pdcp_cnt_bitmap_is_set(struct pdcp_cnt_bitmap bitmap, uint32_t count)
+{
+   return rte_bitmap_get(bitmap.bmp, count % bitmap.size);
+}
+
+void
+pdcp_cnt_bitmap_range_clear(struct pdcp_cnt_bitmap bitmap, uint32_t start, 
uint32_t stop)
+{
+   uint32_t i;
+
+   for (i = start; i < stop; i++)
+   rte_bitmap_clear(bitmap.bmp, i % bitmap.size);
+}
+
+uint16_t
+pdcp_cnt_get_bitmap_size(uint32_t pending_bytes)
+{
+   /*
+* Round up bitmap size to slab size to operate only on slabs sizes, 
instead of individual
+* bytes
+*/
+   return RTE_ALIGN_MUL_CEIL(pending_bytes, SLAB_BYTE_SIZE);
+}
+
+static __rte_always_inline uint64_t
+leftover_get(uint64_t slab, uint32_t shift, uint64_t mask)
+{
+   return (slab & mask) << shift;
+}
+
+void
+pdcp_cnt_report_fill(struct pdcp_cnt_bitmap bitmap, struct entity_state state,
+uint8_t *data, uint16_t data_len)
+{
+   uint64_t slab = 0, next_slab = 0, leftover;
+   uint32_t zeros, report_len, diff;
+   uint32_t slab_id, next_slab_id;
+   uint32_t pos = 0, next_pos = 0;
+
+   const uint32_t start_count = state.rx_deliv + 1;
+   const uint32_t nb_slabs = bitmap.size / RTE_BITMAP_SLAB_BIT_SIZE;
+   const uint32_t nb_data_slabs = data_len / SLAB_BYTE_SIZE;
+   const uint32_t start_slab_id = start_count / RTE_BITMAP_SLAB_BIT_SIZE;
+   const uint32_t stop_slab_id = (start_slab_id + nb_data_slabs) % 
nb_slabs;
+   const uint32_t shift = start_count % RTE_BITMAP_SLAB_BIT_SIZE;
+   const uint32_t leftover_shift = shift ? RTE_BITMAP_SLAB_BIT_SIZE - 
shift : 0;
+   const uint8_t *data_end = RTE_PTR_ADD(data, data_len + SLAB_BYTE_SIZE);
+
+   /* NOTE: Mask required to workaround case - when shift is not needed */
+   const uint64_t leftover_mask = shift ? ~0 : 0;
+
+   /* NOTE: implement scan init to set a custom position */
+   __rte_bitmap_scan_init(bitmap.bmp);
+   while (true) {
+   assert(rte_bitmap_scan(bitmap.bmp, &pos, &slab) == 1);
+   slab_id = pos / RTE_BITMAP_SLAB_BIT_SIZE;
+   if (slab_id >= start_slab_id)
+   break;
+   }
+
+   report_len = nb_data_slabs;
+
+   if (slab_id > start_slab_id) {
+   /* Zero slabs at beginning */
+   zeros = (slab_id - start_slab_id - 1) * SLAB_BYTE_SIZE;
+   memset(data, 0, zeros);
+   data = RTE_PTR_ADD(data, zeros);
+   leftover = leftover_get(slab, leftover_shift, leftover_mask);
+   memcpy(data, &leftover, SLAB_BYTE_SIZE);
+   data = RTE_PTR_ADD(data, SLAB_BYTE_SIZE);
+   report_len -= (slab_id - start_slab_id);
+   }
+
+   while (report_len) {
+   rte_bitmap_scan(bitmap.bmp, &next_pos, &next_slab);
+

[PATCH v6 18/21] test/pdcp: add timer restart case

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Add a test to cover the case where the t-Reordering timer should be
restarted on the same packet.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 app/test/test_pdcp.c | 68 
 1 file changed, 68 insertions(+)

diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
index 25729b2bdd..82cc25ec7a 100644
--- a/app/test/test_pdcp.c
+++ b/app/test/test_pdcp.c
@@ -1106,6 +1106,71 @@ test_reorder_gap_fill(struct pdcp_test_conf *ul_conf)
return ret;
 }
 
+static int
+test_reorder_gap_in_reorder_buffer(const struct pdcp_test_conf *ul_conf)
+{
+   const enum rte_security_pdcp_sn_size sn_size = 
ul_conf->entity.pdcp_xfrm.sn_size;
+   struct rte_mbuf *m = NULL, *out_mb[2] = {0};
+   uint16_t nb_success = 0, nb_err = 0;
+   struct rte_pdcp_entity *pdcp_entity;
+   int ret = TEST_FAILED, nb_out, i;
+   struct pdcp_test_conf dl_conf;
+   uint8_t cdev_id;
+
+   const int start_count = 0;
+
+   if (ul_conf->entity.pdcp_xfrm.pkt_dir == RTE_SECURITY_PDCP_DOWNLINK)
+   return TEST_SKIPPED;
+
+   /* Create configuration for actual testing */
+   uplink_to_downlink_convert(ul_conf, &dl_conf);
+   dl_conf.entity.pdcp_xfrm.hfn = pdcp_hfn_from_count_get(start_count, 
sn_size);
+   dl_conf.entity.sn = pdcp_sn_from_count_get(start_count, sn_size);
+   pdcp_entity = test_entity_create(&dl_conf, &ret);
+   if (pdcp_entity == NULL)
+   return ret;
+
+   cdev_id = dl_conf.entity.dev_id;
+
+   /* Create two gaps [NULL, m1, NULL, m3]*/
+   for (i = 0; i < 2; i++) {
+   m = generate_packet_for_dl_with_sn(*ul_conf, start_count + 2 * 
i + 1);
+   ASSERT_TRUE_OR_GOTO(m != NULL, exit, "Could not allocate buffer 
for packet\n");
+   nb_success = test_process_packets(pdcp_entity, cdev_id, &m, 1, 
out_mb, &nb_err);
+   ASSERT_TRUE_OR_GOTO(nb_err == 0, exit, "Error occurred during 
packet process\n");
+   ASSERT_TRUE_OR_GOTO(nb_success == 0, exit, "Packet was not 
buffered as expected\n");
+   m = NULL; /* Packet was moved to PDCP lib */
+   }
+
+   /* Generate packet to fill the first gap */
+   m = generate_packet_for_dl_with_sn(*ul_conf, start_count);
+   ASSERT_TRUE_OR_GOTO(m != NULL, exit, "Could not allocate buffer for 
packet\n");
+
+   /*
+* Buffered packets after insert [m0, m1, NULL, m3]
+* Only first gap should be filled, timer should be restarted for 
second gap
+*/
+   nb_success = test_process_packets(pdcp_entity, cdev_id, &m, 1, out_mb, 
&nb_err);
+   ASSERT_TRUE_OR_GOTO(nb_err == 0, exit, "Error occurred during packet 
process\n");
+   ASSERT_TRUE_OR_GOTO(nb_success == 2, exit,
+   "Packet count mismatch (received: %i, expected: 2)\n", 
nb_success);
+   m = NULL;
+   /* Check that packets in correct order */
+   ASSERT_TRUE_OR_GOTO(array_asc_sorted_check(out_mb, nb_success, sn_size),
+   exit, "Error occurred during packet drain\n");
+   ASSERT_TRUE_OR_GOTO(testsuite_params.timer_is_running == true, exit,
+   "Timer should be restarted after partial drain");
+
+
+   ret = TEST_SUCCESS;
+exit:
+   rte_pktmbuf_free(m);
+   rte_pktmbuf_free_bulk(out_mb, nb_success);
+   nb_out = rte_pdcp_entity_release(pdcp_entity, out_mb);
+   rte_pktmbuf_free_bulk(out_mb, nb_out);
+   return ret;
+}
+
 static int
 test_reorder_buffer_full_window_size_sn_12(const struct pdcp_test_conf 
*ul_conf)
 {
@@ -1527,6 +1592,9 @@ static struct unit_test_suite reorder_test_cases  = {
TEST_CASE_NAMED_WITH_DATA("test_reorder_gap_fill",
ut_setup_pdcp, ut_teardown_pdcp,
run_test_with_all_known_vec, test_reorder_gap_fill),
+   TEST_CASE_NAMED_WITH_DATA("test_reorder_gap_in_reorder_buffer",
+   ut_setup_pdcp, ut_teardown_pdcp,
+   run_test_with_all_known_vec, 
test_reorder_gap_in_reorder_buffer),

TEST_CASE_NAMED_WITH_DATA("test_reorder_buffer_full_window_size_sn_12",
ut_setup_pdcp, ut_teardown_pdcp,
run_test_with_all_known_vec_until_first_pass,
-- 
2.25.1



[PATCH v6 20/21] pdcp: allocate reorder buffer alongside with entity

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Instead of allocating the reorder buffer separately on the heap, allocate
memory for it together with the rest of the entity, and then only initialize
the buffer via `rte_reorder_init()`.
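
The resulting single allocation is laid out roughly as below (a sketch of
the offset computation; cache-line alignment is omitted for brevity):

	/* [rte_pdcp_entity][entity_priv][entity_dl/ul][reorder][bitmap] */
	size = sizeof(struct rte_pdcp_entity) + sizeof(struct entity_priv) +
	       sizeof(struct entity_priv_dl_part);
	layout->reorder_buf_offset = size;
	size += pdcp_reorder_memory_footprint_get(window_size);
	layout->bitmap_offset = size;
	size += pdcp_cnt_bitmap_get_memory_footprint(conf);
	layout->total_size = size;
	/* One allocation of total_size; rte_reorder_init() and
	 * rte_bitmap_init() are then pointed at the embedded regions. */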

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 lib/pdcp/pdcp_cnt.c |  9 +++
 lib/pdcp/pdcp_cnt.h |  3 ++-
 lib/pdcp/pdcp_entity.h  |  2 +-
 lib/pdcp/pdcp_reorder.c | 11 ++--
 lib/pdcp/pdcp_reorder.h | 12 ++---
 lib/pdcp/rte_pdcp.c | 58 ++---
 6 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/lib/pdcp/pdcp_cnt.c b/lib/pdcp/pdcp_cnt.c
index af027b00d3..e1d0634b4d 100644
--- a/lib/pdcp/pdcp_cnt.c
+++ b/lib/pdcp/pdcp_cnt.c
@@ -20,15 +20,14 @@ pdcp_cnt_bitmap_get_memory_footprint(const struct 
rte_pdcp_entity_conf *conf)
 }
 
 int
-pdcp_cnt_bitmap_create(struct entity_priv_dl_part *dl, void *bitmap_mem, 
uint32_t window_size)
+pdcp_cnt_bitmap_create(struct entity_priv_dl_part *dl, uint32_t nb_elem,
+  void *bitmap_mem, uint32_t mem_size)
 {
-   uint32_t mem_size = rte_bitmap_get_memory_footprint(window_size);
-
-   dl->bitmap.bmp = rte_bitmap_init(window_size, bitmap_mem, mem_size);
+   dl->bitmap.bmp = rte_bitmap_init(nb_elem, bitmap_mem, mem_size);
if (dl->bitmap.bmp == NULL)
return -EINVAL;
 
-   dl->bitmap.size = window_size;
+   dl->bitmap.size = nb_elem;
 
return 0;
 }
diff --git a/lib/pdcp/pdcp_cnt.h b/lib/pdcp/pdcp_cnt.h
index 5941b7a406..87b011f9dc 100644
--- a/lib/pdcp/pdcp_cnt.h
+++ b/lib/pdcp/pdcp_cnt.h
@@ -10,7 +10,8 @@
 #include "pdcp_entity.h"
 
 uint32_t pdcp_cnt_bitmap_get_memory_footprint(const struct 
rte_pdcp_entity_conf *conf);
-int pdcp_cnt_bitmap_create(struct entity_priv_dl_part *dl, void *bitmap_mem, 
uint32_t window_size);
+int pdcp_cnt_bitmap_create(struct entity_priv_dl_part *dl, uint32_t nb_elem,
+  void *bitmap_mem, uint32_t mem_size);
 
 void pdcp_cnt_bitmap_set(struct pdcp_cnt_bitmap bitmap, uint32_t count);
 bool pdcp_cnt_bitmap_is_set(struct pdcp_cnt_bitmap bitmap, uint32_t count);
diff --git a/lib/pdcp/pdcp_entity.h b/lib/pdcp/pdcp_entity.h
index a9b1428c7a..9f74b5d0e5 100644
--- a/lib/pdcp/pdcp_entity.h
+++ b/lib/pdcp/pdcp_entity.h
@@ -132,7 +132,7 @@ struct pdcp_cnt_bitmap {
 };
 
 /*
- * Layout of PDCP entity: [rte_pdcp_entity] [entity_priv] [entity_dl/ul]
+ * Layout of PDCP entity: [rte_pdcp_entity] [entity_priv] [entity_dl/ul] 
[reorder/bitmap]
  */
 
 struct entity_priv {
diff --git a/lib/pdcp/pdcp_reorder.c b/lib/pdcp/pdcp_reorder.c
index 5399f0dc28..bc45f2e19b 100644
--- a/lib/pdcp/pdcp_reorder.c
+++ b/lib/pdcp/pdcp_reorder.c
@@ -8,20 +8,13 @@
 #include "pdcp_reorder.h"
 
 int
-pdcp_reorder_create(struct pdcp_reorder *reorder, uint32_t window_size)
+pdcp_reorder_create(struct pdcp_reorder *reorder, size_t nb_elem, void *mem, 
size_t mem_size)
 {
-   reorder->buf = rte_reorder_create("reorder_buffer", SOCKET_ID_ANY, 
window_size);
+   reorder->buf = rte_reorder_init(mem, mem_size, "reorder_buffer", 
nb_elem);
if (reorder->buf == NULL)
return -rte_errno;
 
-   reorder->window_size = window_size;
reorder->is_active = false;
 
return 0;
 }
-
-void
-pdcp_reorder_destroy(const struct pdcp_reorder *reorder)
-{
-   rte_reorder_free(reorder->buf);
-}
diff --git a/lib/pdcp/pdcp_reorder.h b/lib/pdcp/pdcp_reorder.h
index 6a2f61d6ae..7e4f079d4b 100644
--- a/lib/pdcp/pdcp_reorder.h
+++ b/lib/pdcp/pdcp_reorder.h
@@ -9,12 +9,18 @@
 
 struct pdcp_reorder {
struct rte_reorder_buffer *buf;
-   uint32_t window_size;
bool is_active;
 };
 
-int pdcp_reorder_create(struct pdcp_reorder *reorder, uint32_t window_size);
-void pdcp_reorder_destroy(const struct pdcp_reorder *reorder);
+int pdcp_reorder_create(struct pdcp_reorder *reorder, size_t nb_elem, void 
*mem, size_t mem_size);
+
+/* NOTE: replace with `rte_reorder_memory_footprint_get` after DPDK 23.07 */
+#define SIZE_OF_REORDER_BUFFER (4 * RTE_CACHE_LINE_SIZE)
+static inline size_t
+pdcp_reorder_memory_footprint_get(size_t nb_elem)
+{
+   return SIZE_OF_REORDER_BUFFER + (2 * nb_elem * sizeof(struct rte_mbuf 
*));
+}
 
 static inline uint32_t
 pdcp_reorder_get_sequential(struct pdcp_reorder *reorder, struct rte_mbuf 
**mbufs,
diff --git a/lib/pdcp/rte_pdcp.c b/lib/pdcp/rte_pdcp.c
index 9865c620b7..1c6d2466b2 100644
--- a/lib/pdcp/rte_pdcp.c
+++ b/lib/pdcp/rte_pdcp.c
@@ -14,7 +14,15 @@
 
 #define RTE_PDCP_DYNFIELD_NAME "rte_pdcp_dynfield"
 
-static int bitmap_mem_offset;
+struct entity_layout {
+   size_t bitmap_offset;
+   size_t bitmap_size;
+
+   size_t reorder_buf_offset;
+   size_t reorder_buf_size;
+
+   size_t total_size;
+};
 
 int rte_pdcp_dynfield_offset = -1;
 
@@ -35,46 +43,54 @@ pdcp_dynfield_register(void)
 }
 
 static int
-pdcp_entity_size_get(const struct rte_pdcp_entity_conf *conf)
+pdcp_entity_layout_get(const struct rte_pdcp_entity_conf *conf, s

[PATCH v6 21/21] test/pdcp: add PDCP status report cases

2023-05-30 Thread Anoob Joseph
From: Volodymyr Fialko 

Test PDCP status report generation.

Signed-off-by: Anoob Joseph 
Signed-off-by: Volodymyr Fialko 
---
 app/test/test_pdcp.c | 312 +++
 1 file changed, 312 insertions(+)

diff --git a/app/test/test_pdcp.c b/app/test/test_pdcp.c
index 82cc25ec7a..423526380f 100644
--- a/app/test/test_pdcp.c
+++ b/app/test/test_pdcp.c
@@ -2,6 +2,7 @@
  * Copyright(C) 2023 Marvell.
  */
 
+#include 
 #include 
 #ifdef RTE_LIB_EVENTDEV
 #include 
@@ -48,6 +49,9 @@ struct pdcp_testsuite_params {
 #endif /* RTE_LIB_EVENTDEV */
bool timer_is_running;
uint64_t min_resolution_ns;
+   struct rte_pdcp_up_ctrl_pdu_hdr *status_report;
+   uint32_t status_report_bitmask_capacity;
+   uint8_t *ctrl_pdu_buf;
 };
 
 static struct pdcp_testsuite_params testsuite_params;
@@ -168,6 +172,18 @@ static struct rte_pdcp_t_reordering t_reorder_timer = {
.stop = pdcp_timer_stop_cb,
 };
 
+static inline void
+bitmask_set_bit(uint8_t *mask, uint32_t bit)
+{
+   mask[bit / 8] |= (1 << bit % 8);
+}
+
+static inline bool
+bitmask_is_bit_set(const uint8_t *mask, uint32_t bit)
+{
+   return mask[bit / 8] & (1 << (bit % 8));
+}
+
 static inline int
 pdcp_hdr_size_get(enum rte_security_pdcp_sn_size sn_size)
 {
@@ -314,6 +330,21 @@ testsuite_setup(void)
goto cop_pool_free;
}
 
+   /* Allocate memory for longest possible status report */
+   ts_params->status_report_bitmask_capacity = RTE_PDCP_CTRL_PDU_SIZE_MAX -
+   sizeof(struct rte_pdcp_up_ctrl_pdu_hdr);
+   ts_params->status_report = rte_zmalloc(NULL, 
RTE_PDCP_CTRL_PDU_SIZE_MAX, 0);
+   if (ts_params->status_report == NULL) {
+   RTE_LOG(ERR, USER1, "Could not allocate status report\n");
+   goto cop_pool_free;
+   }
+
+   ts_params->ctrl_pdu_buf = rte_zmalloc(NULL, RTE_PDCP_CTRL_PDU_SIZE_MAX, 
0);
+   if (ts_params->ctrl_pdu_buf == NULL) {
+   RTE_LOG(ERR, USER1, "Could not allocate status report data\n");
+   goto cop_pool_free;
+   }
+
return 0;
 
 cop_pool_free:
@@ -322,6 +353,8 @@ testsuite_setup(void)
 mbuf_pool_free:
rte_mempool_free(ts_params->mbuf_pool);
ts_params->mbuf_pool = NULL;
+   rte_free(ts_params->status_report);
+   rte_free(ts_params->ctrl_pdu_buf);
return TEST_FAILED;
 }
 
@@ -344,6 +377,9 @@ testsuite_teardown(void)
 
rte_mempool_free(ts_params->mbuf_pool);
ts_params->mbuf_pool = NULL;
+
+   rte_free(ts_params->status_report);
+   rte_free(ts_params->ctrl_pdu_buf);
 }
 
 static int
@@ -1410,6 +1446,246 @@ test_expiry_with_rte_timer(const struct pdcp_test_conf 
*ul_conf)
return ret;
 }
 
+static struct rte_pdcp_up_ctrl_pdu_hdr *
+pdcp_status_report_init(uint32_t fmc)
+{
+   struct rte_pdcp_up_ctrl_pdu_hdr *hdr = testsuite_params.status_report;
+
+   hdr->d_c = RTE_PDCP_PDU_TYPE_CTRL;
+   hdr->pdu_type = RTE_PDCP_CTRL_PDU_TYPE_STATUS_REPORT;
+   hdr->fmc = rte_cpu_to_be_32(fmc);
+   hdr->r = 0;
+   memset(hdr->bitmap, 0, testsuite_params.status_report_bitmask_capacity);
+
+   return hdr;
+}
+
+static uint32_t
+pdcp_status_report_len(void)
+{
+   struct rte_pdcp_up_ctrl_pdu_hdr *hdr = testsuite_params.status_report;
+   uint32_t i;
+
+   for (i = testsuite_params.status_report_bitmask_capacity; i != 0; i--) {
+   if (hdr->bitmap[i - 1])
+   return i;
+   }
+
+   return 0;
+}
+
+static int
+pdcp_status_report_verify(struct rte_mbuf *status_report,
+const struct rte_pdcp_up_ctrl_pdu_hdr *expected_hdr, 
uint32_t expected_len)
+{
+   uint32_t received_len = rte_pktmbuf_pkt_len(status_report);
+   uint8_t *received_buf = testsuite_params.ctrl_pdu_buf;
+   int ret;
+
+   ret = pktmbuf_read_into(status_report, received_buf, 
RTE_PDCP_CTRL_PDU_SIZE_MAX);
+   TEST_ASSERT_SUCCESS(ret, "Failed to copy status report pkt into 
continuous buffer");
+
+   debug_hexdump(stdout, "Received:", received_buf, received_len);
+   debug_hexdump(stdout, "Expected:", expected_hdr, expected_len);
+
+   TEST_ASSERT_EQUAL(expected_len, received_len,
+ "Mismatch in packet lengths [expected: %d, received: 
%d]",
+ expected_len, received_len);
+
+   TEST_ASSERT_BUFFERS_ARE_EQUAL(received_buf, expected_hdr, expected_len,
+"Generated packet not as expected");
+
+   return 0;
+}
+
+static int
+test_status_report_gen(const struct pdcp_test_conf *ul_conf,
+  const struct rte_pdcp_up_ctrl_pdu_hdr *hdr,
+  uint32_t bitmap_len)
+{
+   const enum rte_security_pdcp_sn_size sn_size = 
ul_conf->entity.pdcp_xfrm.sn_size;
+   struct rte_mbuf *status_report = NULL, **out_mb, *m;
+   uint16_t nb_success = 0, nb_err = 0;
+   struct rte_pdcp_entity *pdcp_entity;
+  

[PATCH] net/e1000: support device I219

2023-05-30 Thread Qiming Yang
Support devices I219 LM22, V22, LM23 and V23.

Signed-off-by: Qiming Yang 
---
 drivers/net/e1000/em_ethdev.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 8ee9be12ad..0afedcd00c 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -155,6 +155,10 @@ static const struct rte_pci_id pci_id_em_map[] = {
{ RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_CNP_I219_V6) },
{ RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_CNP_I219_LM7) 
},
{ RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_CNP_I219_V7) },
+   { RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_RPL_I219_LM22) 
},
+   { RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_RPL_I219_V22) 
},
+   { RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_RPL_I219_LM23) 
},
+   { RTE_PCI_DEVICE(E1000_INTEL_VENDOR_ID, E1000_DEV_ID_PCH_RPL_I219_V23) 
},
{ .vendor_id = 0, /* sentinel */ },
 };
 
@@ -227,6 +231,11 @@ eth_em_dev_is_ich8(struct e1000_hw *hw)
case E1000_DEV_ID_PCH_CNP_I219_V6:
case E1000_DEV_ID_PCH_CNP_I219_LM7:
case E1000_DEV_ID_PCH_CNP_I219_V7:
+   case E1000_DEV_ID_PCH_RPL_I219_LM22:
+   case E1000_DEV_ID_PCH_RPL_I219_V22:
+   case E1000_DEV_ID_PCH_RPL_I219_LM23:
+   case E1000_DEV_ID_PCH_RPL_I219_V23:
+
return 1;
default:
return 0;
@@ -482,6 +491,8 @@ em_set_pba(struct e1000_hw *hw)
case e1000_pch_lpt:
case e1000_pch_spt:
case e1000_pch_cnp:
+   case e1000_pch_adp:
+   case e1000_pch_tgp:
pba = E1000_PBA_26K;
break;
default:
@@ -852,7 +863,9 @@ em_hardware_init(struct e1000_hw *hw)
hw->fc.refresh_time = 0x0400;
} else if (hw->mac.type == e1000_pch_lpt ||
   hw->mac.type == e1000_pch_spt ||
-  hw->mac.type == e1000_pch_cnp) {
+  hw->mac.type == e1000_pch_cnp ||
+  hw->mac.type == e1000_pch_adp ||
+  hw->mac.type == e1000_pch_tgp) {
hw->fc.requested_mode = e1000_fc_full;
}
 
@@ -1033,6 +1046,8 @@ em_get_max_pktlen(struct rte_eth_dev *dev)
case e1000_pch_lpt:
case e1000_pch_spt:
case e1000_pch_cnp:
+   case e1000_pch_adp:
+   case e1000_pch_tgp:
case e1000_82574:
case e1000_80003es2lan: /* 9K Jumbo Frame size */
case e1000_82583:
-- 
2.25.1



RE: [PATCH v1] power: support amd-pstate cpufreq driver

2023-05-30 Thread Tummala, Sivaprasad

Hi Thomas,

> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, May 25, 2023 12:17 AM
> To: Tummala, Sivaprasad 
> Cc: david.h...@intel.com; dev@dpdk.org; Yigit, Ferruh ;
> anatoly.bura...@intel.com; Laatz, Kevin 
> Subject: Re: [PATCH v1] power: support amd-pstate cpufreq driver
>
>
> 12/04/2023 11:52, Sivaprasad Tummala:
> > amd-pstate introduces a new CPU frequency control mechanism for AMD
> > processors using the ACPI Collaborative Performance Power Control
> > feature for a finer grained frequency management.
> >
> > Patch to add support for amd-pstate driver.
> >
> > Signed-off-by: Sivaprasad Tummala 
> > ---
> >  app/test/test_power.c  |   1 +
> >  app/test/test_power_cpufreq.c  |   5 +-
> >  doc/guides/rel_notes/release_23_07.rst |   3 +
> >  examples/l3fwd-power/main.c|   1 +
> >  lib/power/meson.build  |   1 +
> >  lib/power/power_amd_pstate_cpufreq.c   | 698
> +
> >  lib/power/power_amd_pstate_cpufreq.h   | 219 
> >  lib/power/rte_power.c  |  26 +
> >  lib/power/rte_power.h  |   3 +-
> >  lib/power/rte_power_pmd_mgmt.c |   6 +-
> >  10 files changed, 958 insertions(+), 5 deletions(-)
>
> I'm not comfortable to merge this patch without a word from David Hunt.
> Given there is 0 review, what do we do?
Yes, we are waiting for feedback from the community on this patch.
>
>
> >   Also, make sure to start the actual text at the margin.
> >
> 
> ===
> >
> > +   * **Added amd-pstate driver support to power management library.**
> > +
> > +Added support for amd-pstate driver which works on AMD Zen 
> > processors.
>
> Looks like the indent is not correct.
Sure, will fix it in v2 patch
>
> >  'power_pstate_cpufreq.c',
> > +'power_amd_pstate_cpufreq.c',
>
> Can you say briefly why AMD has a different pstate?
> Does it mean power_pstate_cpufreq.c should be renamed
> power_intel_pstate_cpufreq.c?
amd-pstate is the new AMD CPU performance scaling driver based on CPPC.
It is implemented as a new kernel driver, so we need a different library from
power_pstate_cpufreq. Yes, it makes sense to rename this, in line with the
Linux kernel as indicated below:
linux/latest/source/drivers/cpufreq
 - amd-pstate.c
 - cppc_cpufreq.c
 - intel_pstate.c
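
For context, a minimal user-space check of which scaling driver the kernel
exposes (the sysfs path is the standard cpufreq one; the helper itself is
illustrative and not part of this patch):

/* Illustrative helper, not part of the patch: detect the active cpufreq
 * scaling driver for CPU 0 by reading sysfs. */
#include <stdio.h>
#include <string.h>

static int
cpu0_uses_amd_pstate(void)
{
	char drv[64] = "";
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver", "r");

	if (f == NULL)
		return 0;
	if (fgets(drv, sizeof(drv), f) == NULL)
		drv[0] = '\0';
	fclose(f);
	return strncmp(drv, "amd-pstate", 10) == 0;
}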
>
> > +++ b/lib/power/power_amd_pstate_cpufreq.c
> > @@ -0,0 +1,698 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2021 Intel Corporation
> > + * Copyright(c) 2021 Arm Limited
>
> Why is there copyright for Intel and Arm?
> Does it mean you copied some code and did not try to keep common code in a
> common place?
Yes, a few internal functions follow a similar notion as power_cppc_cpufreq.c.
>
> > + * Copyright(c) 2023 Amd Limited
>
>

Thanks & Regards,
Sivaprasad


RE: [EXT] [PATCH v2 0/4] Replace obsolete test cases.

2023-05-30 Thread Akhil Goyal
> This patchset removes obsolete test cases for RSA, MOD EXP, MOD INV.
> Doing that, a new way of handling ut_setup and ut_teardown was proposed.
> Now both behave like constructors/destructors to the unit tests.
> It frees particular algorithm functions from any kind of responsibility
> to free resources.
> The functionality of the tests was extended, but the number of lines of
> code was reduced by ~600.
> 
> v2:
> - fixed build problem with non compile-time constant
> 
> Arkadiusz Kusztal (4):
>   app/test: remove testsuite calls from ut setup
>   app/test: refactor mod exp test case
>   app/test: refactor mod inv tests
>   app/test: add rsa kat and pwct tests
> 
>  app/test/test_cryptodev_asym.c | 1610 +++-
>  app/test/test_cryptodev_asym_util.h|   28 -
>  app/test/test_cryptodev_mod_test_vectors.h |  631 +---
>  app/test/test_cryptodev_rsa_test_vectors.h |  600 
>  4 files changed, 852 insertions(+), 2017 deletions(-)
> 
Fix Doc build.


RE: [EXT] [PATCH v3 2/2] test: add reassembly perf test

2023-05-30 Thread Amit Prakash Shukla



> -Original Message-
> From: pbhagavat...@marvell.com 
> Sent: Monday, May 29, 2023 8:25 PM
> To: Jerin Jacob Kollanukkaran 
> Cc: dev@dpdk.org; Pavan Nikhilesh Bhagavatula
> 
> Subject: [EXT] [PATCH v3 2/2] test: add reassembly perf test
> 
> From: Pavan Nikhilesh 
> 
> Add reassembly perf autotest for both ipv4 and ipv6 reassembly.
> Each test is performed with variable number of fragments per flow, either
> ordered or unordered fragments and interleaved flows.
> 
> Signed-off-by: Pavan Nikhilesh 
> ---
>  app/test/meson.build|2 +
>  app/test/test_reassembly_perf.c | 1001
> +++
>  2 files changed, 1003 insertions(+)
>  create mode 100644 app/test/test_reassembly_perf.c
> 
> diff --git a/app/test/meson.build b/app/test/meson.build index
> d96ae7a961..70f320f388 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -108,6 +108,7 @@ test_sources = files(
>  'test_rawdev.c',
>  'test_rcu_qsbr.c',
>  'test_rcu_qsbr_perf.c',
> +'test_reassembly_perf.c',
>  'test_reciprocal_division.c',
>  'test_reciprocal_division_perf.c',
>  'test_red.c',
> @@ -297,6 +298,7 @@ perf_test_names = [
>  'trace_perf_autotest',
>  'ipsec_perf_autotest',
>  'thash_perf_autotest',
> +'reassembly_perf_autotest',
>  ]
> 
>  driver_test_names = [
> diff --git a/app/test/test_reassembly_perf.c
> b/app/test/test_reassembly_perf.c new file mode 100644 index
> 00..850485a9c5
> --- /dev/null
> +++ b/app/test/test_reassembly_perf.c
> @@ -0,0 +1,1001 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Marvell.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "test.h"
> +
> +#define MAX_FLOWS	    (1024 * 32)
> +#define MAX_BKTS	    MAX_FLOWS
> +#define MAX_ENTRIES_PER_BKT 16
> +#define MAX_FRAGMENTS	    RTE_LIBRTE_IP_FRAG_MAX_FRAG
> +#define MIN_FRAGMENTS	    2
> +#define MAX_PKTS	    (MAX_FLOWS * MAX_FRAGMENTS)
> +
> +#define MAX_PKT_LEN 2048
> +#define MAX_TTL_MS  (5 * MS_PER_S)
> +
> +/* use RFC863 Discard Protocol */
> +#define UDP_SRC_PORT 9
> +#define UDP_DST_PORT 9
> +
> +/* use RFC5735 / RFC2544 reserved network test addresses */
> +#define IP_SRC_ADDR(x) ((198U << 24) | (18 << 16) | (0 << 8) | (x))
> +#define IP_DST_ADDR(x) ((198U << 24) | (18 << 16) | (1 << 8) | (x))
> +
> +/* 2001:0200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180) */
> +static uint8_t ip6_addr[16] = {32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> +#define IP6_VERSION 6
> +
> +#define IP_DEFTTL 64 /* from RFC 1340. */
> +
> +static struct rte_ip_frag_tbl *frag_tbl;
> +static struct rte_mempool *pkt_pool;
> +static struct rte_mbuf *mbufs[MAX_FLOWS][MAX_FRAGMENTS];
> +static uint8_t frag_per_flow[MAX_FLOWS];
> +static uint32_t flow_cnt;
> +
> +#define FILL_MODE_LINEAR  0
> +#define FILL_MODE_RANDOM  1
> +#define FILL_MODE_INTERLEAVED 2
> +
> +static int
> +reassembly_test_setup(void)
> +{
> + uint64_t max_ttl_cyc = (MAX_TTL_MS * rte_get_timer_hz()) / 1E3;
> +
> + frag_tbl = rte_ip_frag_table_create(MAX_FLOWS,

I see MAX_BKTS and MAX_FLOWS are the same in this application. Just for code
readability, please use MAX_BKTS.

> MAX_ENTRIES_PER_BKT,
> + MAX_FLOWS *
> MAX_ENTRIES_PER_BKT,
> + max_ttl_cyc, rte_socket_id());
> + if (frag_tbl == NULL)
> + return TEST_FAILED;
> +
> + rte_mbuf_set_user_mempool_ops("ring_mp_mc");
> +	pkt_pool = rte_pktmbuf_pool_create(
> +		"reassembly_perf_pool", MAX_FLOWS * MAX_FRAGMENTS, 0, 0,
> +		RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
> + if (pkt_pool == NULL) {
> + printf("[%s] Failed to create pkt pool\n", __func__);
> + rte_ip_frag_table_destroy(frag_tbl);
> + return TEST_FAILED;
> + }
> +
> + return TEST_SUCCESS;
> +}
> +
> +static void
> +reassembly_test_teardown(void)
> +{
> + if (frag_tbl != NULL)
> + rte_ip_frag_table_destroy(frag_tbl);
> +
> + if (pkt_pool != NULL)
> + rte_mempool_free(pkt_pool);
> +}
> +



> +static void
> +ipv4_frag_fill_data(struct rte_mbuf **mbuf, uint8_t nb_frags, uint32_t flow_id,
> +		    uint8_t fill_mode)
> +{
> + struct rte_ether_hdr *eth_hdr;
> + struct rte_ipv4_hdr *ip_hdr;
> + struct rte_udp_hdr *udp_hdr;
> + uint16_t frag_len;
> + uint8_t i;
> +
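> +	/* IPv4 fragment offsets are expressed in 8-byte units, so all but
> +	 * the last fragment must carry a multiple of 8 payload bytes. */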
> + frag_len = MAX_PKT_LEN / nb_frags;
> + if (frag_len % 8)
> + frag_len = RTE_ALIGN_MUL_CEIL(frag_len, 8);
> +
> + for (i = 0; i < nb_frags; i++) {
> + struct rte_mbuf *frag = mbuf[i];
> + uint16_t frag_offset = 0;
> +   

RE: [PATCH v4] lib: set/get max memzone segments

2023-05-30 Thread Ophir Munk


> -Original Message-
> Subject: Re: [PATCH v4] lib: set/get max memzone segments
> 
> On 5/24/2023 11:25 PM, Ophir Munk wrote:
> > Currently, the max memzones count constant (RTE_MAX_MEMZONE) is used to
> > decide how many memzones a DPDK application can have. This value could
> > technically be changed by manually editing `rte_config.h` before
> > compilation, but if DPDK is already compiled, that option is not useful.
> > There are certain use cases that would benefit from making this value
> > configurable.
> >
> > This commit addresses the issue by adding a new API to set the max
> > number of memzones before EAL initialization (while using the old
> > constant as default value), as well as an API to get current maximum
> > number of memzones.
> >
> > Signed-off-by: Ophir Munk 
> > Acked-by: Morten Brørup 
> > ---
> 
> > +
> > +int
> > +rte_memzone_max_set(size_t max)
> > +{
> > +   struct rte_mem_config *mcfg;
> > +
> > +   if (eal_get_internal_configuration()->init_complete > 0)
> > +   return -1;
> > +
> > +   mcfg = rte_eal_get_configuration()->mem_config;
> > +   if (!mcfg)
> > +   return -1;
> > +
> > +   mcfg->max_memzone = max;
> > +
> > +   return 0;
> > +}
> 
> Would this even work? 

Yes. It's working. 
I successfully ran the following test:

int max = rte_memzone_max_get();
printf("Max memzone before eal_init: %d\n", max); // Prints the default max value of 2560
rte_memzone_max_set(1000);
max = rte_memzone_max_get();
printf("Max memzone before eal_init after set to 1000: %d\n", max); // Prints the new max value of 1000
rte_eal_init(argc, argv);
rte_memzone_max_set(2000); // Fails with -1 since we set after EAL init
max = rte_memzone_max_get();
printf("Max memzone after eal_init and after set to 2000: %d\n", max); // Prints the correct max value of 1000

> AFAIR mem_config is only available some time during
> EAL init, not before (mem_config pointer will be NULL at that point).
> 

Please note that DPDK supports early memory config 
(lib/eal/common/eal_common_config.c):

/* early configuration structure, when memory config is not mmapped */
static struct rte_mem_config early_mem_config;

So the max memzone count is saved in the early memory config, and later this
memory becomes mapped (shared).

Having said that - I think the current patch is correct.
Please confirm.

> I suggest the following flow:
> 
> set():
> 
> if init_complete => return -1

Also if mem_config is NULL => return -1

> else => set local static value
> 

> get():
> 
> if init_complete => return memzones.count
> else => return local static value (set to our default)
> 
> That way we don't actually need the memconfig, and multiprocess will work
> because memzones.count is shared between primary and secondary anyway.
> 
> --
> Thanks,
> Anatoly
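
A minimal sketch of that suggested flow (illustrative and untested; it reuses
DEFAULT_MAX_MEMZONE from the patch and follows the outline above, it is not
the actual implementation):

/* Hypothetical sketch of the suggested set()/get() flow, not the real patch. */
static size_t memzone_max = DEFAULT_MAX_MEMZONE; /* compile-time default */

int
rte_memzone_max_set(size_t max)
{
	if (eal_get_internal_configuration()->init_complete > 0)
		return -1; /* too late: EAL is already initialized */
	memzone_max = max;
	return 0;
}

size_t
rte_memzone_max_get(void)
{
	/* After init, the shared fbarray is authoritative; this also works
	 * for secondary processes, as memzones.count is shared. */
	if (eal_get_internal_configuration()->init_complete > 0)
		return rte_eal_get_configuration()->mem_config->memzones.count;
	return memzone_max;
}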



[PATCH v1] bus/pci: get PCI address from rte_device

2023-05-30 Thread eagostini
From: Elena Agostini 

In DPDK 22.11 the PCI bus related structures have been hidden internally,
so the application no longer has direct access to that information.

This patch introduces a get function to retrieve a PCI address
from an rte_device handle.

Signed-off-by: Elena Agostini 
---
 drivers/bus/pci/pci_common.c  | 15 +++
 drivers/bus/pci/rte_bus_pci.h | 13 +
 2 files changed, 28 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..9ab5256543 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -884,6 +884,21 @@ rte_pci_set_bus_master(struct rte_pci_device *dev, bool 
enable)
return 0;
 }
 
+const struct rte_pci_addr *
+rte_pci_get_addr(const struct rte_device *dev)
+{
+   const struct rte_pci_device *pci_dev;
+
+   if (!dev) {
+   rte_errno = EINVAL;
+   return NULL;
+   }
+
+   pci_dev = RTE_DEV_TO_PCI_CONST(dev);
+
+   return &pci_dev->addr;
+}
+
 struct rte_pci_bus rte_pci_bus = {
.bus = {
.scan = rte_pci_scan,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..e18ddb7fd7 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -68,6 +68,19 @@ void rte_pci_unmap_device(struct rte_pci_device *dev);
  */
 void rte_pci_dump(FILE *f);
 
+/**
+ * Return PCI device address of an rte_device
+ *
+ * @param dev
+ *   A pointer to a rte_device structure describing the device
+ *   to use
+ *
+ * @return
+ *   PCI address of the device on success, NULL if no driver
+ *   is found for the device.
+ */
+const struct rte_pci_addr * rte_pci_get_addr(const struct rte_device *dev);
+
 /**
  * Find device's extended PCI capability.
  *
-- 
2.34.1
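
A usage sketch of the proposed API (illustrative; assumes the patch above is
applied and that the port is backed by a PCI device):

/* Print the PCI address behind an ethdev port using the proposed helper. */
#include <stdio.h>
#include <rte_bus_pci.h>
#include <rte_ethdev.h>

static void
print_port_pci_addr(uint16_t port_id)
{
	struct rte_eth_dev_info info;
	const struct rte_pci_addr *addr;

	if (rte_eth_dev_info_get(port_id, &info) != 0 || info.device == NULL)
		return;
	addr = rte_pci_get_addr(info.device); /* proposed API */
	if (addr != NULL)
		printf(PCI_PRI_FMT "\n", addr->domain, addr->bus,
		       addr->devid, addr->function);
}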



[PATCH] event/cnxk: add wmb after steorl for event mode

2023-05-30 Thread Srujana Challa
LMTST area can be overwritten before being read by HW between two consecutive
steorl operations. Hence, add wmb() after the steorl op to make sure
the lmtst operation is complete.

Signed-off-by: Srujana Challa 
---
 drivers/event/cnxk/cn10k_tx_worker.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_tx_worker.h 
b/drivers/event/cnxk/cn10k_tx_worker.h
index c18786a14c..81fe31c4b9 100644
--- a/drivers/event/cnxk/cn10k_tx_worker.h
+++ b/drivers/event/cnxk/cn10k_tx_worker.h
@@ -43,7 +43,6 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
*m, uint64_t *cmd,
 const uint64_t *txq_data, const uint32_t flags)
 {
uint8_t lnum = 0, loff = 0, shft = 0;
-   uint16_t ref_cnt = m->refcnt;
struct cn10k_eth_txq *txq;
uintptr_t laddr;
uint16_t segdw;
@@ -98,10 +97,9 @@ cn10k_sso_tx_one(struct cn10k_sso_hws *ws, struct rte_mbuf 
*m, uint64_t *cmd,
 
roc_lmt_submit_steorl(lmt_id, pa);
 
-   if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
-   if (ref_cnt > 1)
-   rte_io_wmb();
-   }
+   /* Memory barrier to make sure lmtst store completes */
+   rte_io_wmb();
+
return 1;
 }
 
-- 
2.25.1



Re: [PATCH v3 2/4] vhost: make the guest_notifications statistic counter atomic

2023-05-30 Thread Maxime Coquelin




On 5/17/23 11:08, Eelco Chaudron wrote:

Making the guest_notifications statistic counter atomic allows
it to be safely incremented while holding the read access_lock.

Signed-off-by: Eelco Chaudron 
---
  lib/vhost/vhost.c |8 
  lib/vhost/vhost.h |9 ++---
  2 files changed, 14 insertions(+), 3 deletions(-)



Reviewed-by: Maxime Coquelin 

Thanks,
Maxime



Re: [PATCH v3 3/4] vhost: fix invalid call FD handling

2023-05-30 Thread Maxime Coquelin




On 5/17/23 11:09, Eelco Chaudron wrote:

This patch fixes cases where IRQ injection is tried while
the call FD is not valid, which should not happen.

Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification 
suppression")

Signed-off-by: Maxime Coquelin 
Signed-off-by: Eelco Chaudron 
---
  lib/vhost/vhost.h |8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 37609c7c8d..23a4e2b1a7 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -903,9 +903,9 @@ vhost_vring_call_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
"%s: used_event_idx=%d, old=%d, new=%d\n",
__func__, vhost_used_event(vq), old, new);
  
-		if ((vhost_need_event(vhost_used_event(vq), new, old) &&

-   (vq->callfd >= 0)) ||
-   unlikely(!signalled_used_valid)) {
+   if ((vhost_need_event(vhost_used_event(vq), new, old) ||
+   unlikely(!signalled_used_valid)) &&
+   vq->callfd >= 0) {
eventfd_write(vq->callfd, (eventfd_t) 1);
if (dev->flags & VIRTIO_DEV_STATS_ENABLED)

__atomic_fetch_add(&vq->stats.guest_notifications,
@@ -974,7 +974,7 @@ vhost_vring_call_packed(struct virtio_net *dev, struct 
vhost_virtqueue *vq)
if (vhost_need_event(off, new, old))
kick = true;
  kick:
-   if (kick) {
+   if (kick && vq->callfd >= 0) {
eventfd_write(vq->callfd, (eventfd_t)1);
if (dev->notify_ops->guest_notified)
dev->notify_ops->guest_notified(dev->vid);



Reporting Chenbo's R-by, from the VDUSE series RFC:

Reviewed-by: Chenbo Xia 



Re: [PATCH v3 4/4] vhost: add device op to offload the interrupt kick

2023-05-30 Thread Maxime Coquelin




On 5/17/23 11:09, Eelco Chaudron wrote:

This patch adds an operation callback which gets called every time the
library wants to call eventfd_write(). This eventfd_write() call could
result in a system call, which could potentially block the PMD thread.

The callback function can decide whether it's ok to handle the
eventfd_write() now or have the newly introduced function,
rte_vhost_notify_guest(), called at a later time.

This can be used by 3rd party applications, like OVS, to avoid system
calls being called as part of the PMD threads.

Signed-off-by: Eelco Chaudron 
---
  lib/vhost/meson.build |2 ++
  lib/vhost/rte_vhost.h |   23 +-
  lib/vhost/socket.c|   63 ++---
  lib/vhost/version.map |9 +++
  lib/vhost/vhost.c |   38 ++
  lib/vhost/vhost.h |   58 -
  6 files changed, 171 insertions(+), 22 deletions(-)




The patch looks good to me, but that's the first time we use function
versioning in Vhost library, so I'd like another pair of eyes to be sure
I don't miss anything.

Reviewed-by: Maxime Coquelin 

Thomas, do we need to mention it somewhere in the release note?

Thanks,
Maxime
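
To make the intended usage concrete, a rough application-side sketch
(assumptions: per the patch description, the callback returns true when it
takes over the kick and will call rte_vhost_notify_guest() later from a
non-PMD thread; kick_ring and both helpers are invented for illustration):

#include <stdbool.h>
#include <stdlib.h>
#include <rte_ring.h>
#include <rte_vhost.h>

struct kick_req { int vid; uint16_t qid; };
static struct rte_ring *kick_ring; /* created elsewhere; illustrative */

static bool
app_guest_notify(int vid, uint16_t queue_id)
{
	struct kick_req *req = malloc(sizeof(*req));

	if (req == NULL)
		return false; /* let vhost perform eventfd_write() itself */
	req->vid = vid;
	req->qid = queue_id;
	if (rte_ring_enqueue(kick_ring, req) != 0) {
		free(req);
		return false;
	}
	return true; /* deferred: a dedicated thread drains kick_ring */
}

static void
drain_kicks(void) /* run from a non-PMD thread */
{
	struct kick_req *req;

	while (rte_ring_dequeue(kick_ring, (void **)&req) == 0) {
		rte_vhost_notify_guest(req->vid, req->qid);
		free(req);
	}
}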



Re: [PATCH v3 4/4] vhost: add device op to offload the interrupt kick

2023-05-30 Thread Thomas Monjalon
30/05/2023 15:02, Maxime Coquelin:
> 
> On 5/17/23 11:09, Eelco Chaudron wrote:
> > This patch adds an operation callback which gets called every time the
> > library wants to call eventfd_write(). This eventfd_write() call could
> > result in a system call, which could potentially block the PMD thread.
> > 
> > The callback function can decide whether it's ok to handle the
> > eventfd_write() now or have the newly introduced function,
> > rte_vhost_notify_guest(), called at a later time.
> > 
> > This can be used by 3rd party applications, like OVS, to avoid system
> > calls being called as part of the PMD threads.
> > 
> > Signed-off-by: Eelco Chaudron 
> > ---
> >   lib/vhost/meson.build |2 ++
> >   lib/vhost/rte_vhost.h |   23 +-
> >   lib/vhost/socket.c|   63 
> > ++---
> >   lib/vhost/version.map |9 +++
> >   lib/vhost/vhost.c |   38 ++
> >   lib/vhost/vhost.h |   58 -
> >   6 files changed, 171 insertions(+), 22 deletions(-)
> > 
> 
> 
> The patch looks good to me, but that's the first time we use function
> versioning in Vhost library, so I'd like another pair of eyes to be sure
> I don't miss anything.
> 
> Reviewed-by: Maxime Coquelin 
> 
> Thomas, do we need to mention it somewhere in the release note?

If compatibility is kept, I think we don't need to mention it.




Re: [PATCH v4] lib: set/get max memzone segments

2023-05-30 Thread Thomas Monjalon
25/05/2023 00:25, Ophir Munk:
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> -#define RTE_MAX_MEMZONE 2560

Good to be able to remove this compilation-time configuration.


> --- a/lib/eal/common/eal_common_memzone.c
> +++ b/lib/eal/common/eal_common_memzone.c
> +#define DEFAULT_MAX_MEMZONE 2560

Maybe add "_COUNT" at the end to make clear it is not about the size of a 
memzone.
We should add a comment here to explain the meaning of this default:
used until the "set" function is called.


> - "%s(): Number of requested memzone segments exceeds 
> RTE_MAX_MEMZONE\n",
> - __func__);
> + "%s(): Number of requested memzone segments exceeds "
> + "maximum %u\n", __func__, arr->len);

We should keep "maximum" on the first line to ease "grep" in the code.

> +int
> +rte_memzone_max_set(size_t max)
> +{
> + struct rte_mem_config *mcfg;
> +
> + if (eal_get_internal_configuration()->init_complete > 0)
> + return -1;

An error log would be needed here I think.

> +
> + mcfg = rte_eal_get_configuration()->mem_config;
> + if (!mcfg)

Better to use "== NULL" for pointers.

> + return -1;

Do we need an error log as well?

> +
> + mcfg->max_memzone = max;
> +
> + return 0;
> +}
> +
> +size_t
> +rte_memzone_max_get(void)
> +{
> + struct rte_mem_config *mcfg;
> +
> + mcfg = rte_eal_get_configuration()->mem_config;
> + if (!mcfg || !mcfg->max_memzone)

Same comment as above: don't use boolean operator for pointer or value.
 
> + return DEFAULT_MAX_MEMZONE;
> +
> + return mcfg->max_memzone;
> +}
> diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h
> index ea013a9..183bb25 100644
> --- a/lib/eal/common/eal_memcfg.h
> +++ b/lib/eal/common/eal_memcfg.h
> @@ -75,6 +75,8 @@ struct rte_mem_config {
>   /**< TSC rate */
>  
>   uint8_t dma_maskbits; /**< Keeps the more restricted dma mask. */
> +
> + size_t max_memzone; /**< maximum allowed allocated memzones. */

Uppercase for first word, and we may remove "allowed"?
Suggestion: "Maximum number of allocated memzones."

[...]
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set max memzone value

Add a dot at the end.
Instead of "value", we should mention "number" or "count".

> + *
> + * This function can only be called prior to rte_eal_init().
> + *
> + * @param max
> + *   Maximum number of memzones
> + * @return
> + *  0 on success, -1 otherwise
> + */
> +__rte_experimental
> +int rte_memzone_max_set(size_t max);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Get the maximum number of memzones.
> + *
> + * @note: The maximum value will not change after calling rte_eal_init().
> + *
> + * @return
> + *   Maximum number of memzones
> + */
> +__rte_experimental
> +size_t rte_memzone_max_get(void);

Good, thank you.




Re: [PATCH v1] bus/pci: get PCI address from rte_device

2023-05-30 Thread Thomas Monjalon
30/05/2023 13:42, eagost...@nvidia.com:
> This patch introduces a get function to retrieve a PCI address
> from an rte_device handler.
[...]
> +const struct rte_pci_addr *
> +rte_pci_get_addr(const struct rte_device *dev)
> +{
> + const struct rte_pci_device *pci_dev;
> +
> + if (!dev) {

Please compare pointer with == NULL

> + rte_errno = EINVAL;
> + return NULL;
> + }

Can we check the bus type here?

> +
> + pci_dev = RTE_DEV_TO_PCI_CONST(dev);
> +
> + return &pci_dev->addr;
> +}
[...]
> +/**
> + * Return PCI device address of an rte_device

You can replace rte_device with "generic device" and add a dot :)

> + *
> + * @param dev
> + *   A pointer to a rte_device structure describing the device
> + *   to use

Do it simpler: pointer to the generic device structure.

> + *
> + * @return
> + *   PCI address of the device on success, NULL if no driver
> + *   is found for the device.

Not exactly, it can return NULL if the device is not PCI.

> + */
> +const struct rte_pci_addr * rte_pci_get_addr(const struct rte_device *dev);





Re: [PATCH 1/3] security: introduce out of place support for inline ingress

2023-05-30 Thread Thomas Monjalon
30/05/2023 11:23, Jerin Jacob:
> > > > > > +  */
> > > > > > + uint32_t ingress_oop : 1;
> > > > > > +
> > > > > >   /** Reserved bit fields for future extension
> > > > > >*
> > > > > >* User should ensure reserved_opts is cleared as it may 
> > > > > > change in
> > > > > > @@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
> > > > > >*
> > > > > >* Note: Reduce number of bits in reserved_opts for every new 
> > > > > > option.
> > > > > >*/
> > > > > > - uint32_t reserved_opts : 17;
> > > > > > + uint32_t reserved_opts : 16;
> > > > > >  };
> > > > >
> > > > > NAK
> > > > > Let me repeat the reserved bit rant. YAGNI
> > > > >
> > > > > Reserved space is not usable without ABI breakage unless the existing
> > > > > code enforces that reserved space has to be zero.
> > > > >
> > > > > Just saying "User should ensure reserved_opts is cleared" is not 
> > > > > enough.
> > > >
> > > > Yes. I think, we need to enforce to have _init functions for the
> > > > structures which is using reserved filed.
> > > >
> > > > On the same note on YAGNI, I am wondering why NOT introduce
> > > > RTE_NEXT_ABI marco kind of scheme to compile out ABI breaking changes.
> > > > By keeping RTE_NEXT_ABI disable by default, enable explicitly if user
> > > > wants it to avoid waiting for one year any ABI breaking changes.
> > There are a lot of "fixed appliance" customers (not OS-distribution
> > driven customers) who are willing to recompile DPDK for new features.
> > What are we losing with this scheme?
> > >
> > > RTE_NEXT_ABI is described in the ABI policy.
> > > We are not doing it currently, but I think we could
> > > when it is not too much complicate in the code.
> > >
> > > The only problems I see are:
> > > - more #ifdef clutter
> > > - 2 binary versions to test
> > > - CI and checks must handle RTE_NEXT_ABI version
> >
> > I think, we have two buckets of ABI breakages via RTE_NEXT_ABI
> >
> > 1) Changes that introduces compilation failures like adding new
> > argument to API or change API name etc
> > 2) Structure size change which won't affect the compilation but breaks
> > the ABI for shared library usage.
> >
> > I think, (1) is very disruptive, and I don't see such changes
> > recently. I think, we should avoid (1) for non-XX.11 releases (or two-
> > or three-year cycles if we decide on that path).
> >
> > The (2) cases are very common due to the fact that HW features are
> > evolving. I think, to address (2), we have two options:
> > a) Have reserved fields and have _init() function to initialize the 
> > structures
> > b) Follow YAGNI style and introduce RTE_NEXT_ABI for structure size change.
> >
> > The above concerns[1] can greatly reduce with option b OR option a.
> >
> > [1]
> >  1) more #ifdef clutter
> > For option (a) this is not needed or option (b) the clutter will be
> > limited, it will be around structure which add the new filed and
> > around the FULL block where new functions are added (not inside the
> > functions)
> >
> > 2) 2 binary versions to test
> > For option (a) this is not needed, for option (b) it is limited as for
> > new features only one needs to test another binary (rather than NOT
> > adding a new feature).
> >
> >  3) CI and checks must handle RTE_NEXT_ABI version
> >
> > I think, it is cheap to add this, at least for compilation test.
> >
> > IMO, we need to change the API-break release to a three-year kind of
> > time frame to have a very good end-user experience,
> > and allow ABI-related changes to get in every release and force
> > a _rebuild_ of shared objects in the major LTS release.
> >
> > I think, in this major LTS version (23.11), if we can decide (a) vs (b)
> > then we can align the code accordingly; especially for (a) we need to
> > add _init() functions.
> >
> > Thoughts?
> 
> Not much input from mailing list. Can we discuss this next TB meeting?
> Especially how to align with next LTS release on
> - YAGNI vs reserved fields with _init()
> - What it takes to extend the API-breaking release cadence beyond one
> year as a first step.

Yes, I agree it should be discussed interactively in the techboard meeting.
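
As a neutral illustration of option (a), reserved fields kept usable via a
mandatory _init(), here is a generic sketch (the struct and function names
are invented, not an existing DPDK API):

#include <string.h>

/* Invented example struct; not an existing DPDK API. */
struct feature_conf {
	uint32_t opt_a : 1;
	uint32_t reserved_opts : 31; /* must stay zero for future extension */
};

/* Mandatory constructor: guarantees reserved bits start at zero, so a
 * later release can assign them meaning without breaking old binaries. */
static inline void
feature_conf_init(struct feature_conf *conf)
{
	memset(conf, 0, sizeof(*conf));
}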




Re: Hugepage migration

2023-05-30 Thread Baruch Even
I have tested MAP_LOCKED; it doesn't help in this case. I do intend to report
it to the kernel but was wondering if others have hit this first.

On Tue, May 30, 2023 at 4:35 AM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Sun, 28 May 2023 23:07:40 +0300
> Baruch Even  wrote:
>
> > Hi,
> >
> > We found an issue with newer kernels (5.13+) that are found on newer OSes
> > (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that was
> > allocated for DPDK was migrated (moved into another physical page) when a
> > 1G page was allocated.
> >
> > From our reading of the kernel commits this started with commit
> > ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> > mm: make alloc_contig_range handle in-use hugetlb pages
> >
> > This caused what looked like memory corruptions to us and cases where the
> > rings were moved from their physical location and communication was no
> > longer possible.
> >
> > I wanted to ask if anyone else hit this issue and what mitigations are
> > available?
> >
> > We are currently looking at using a kernel driver to pin the pages but I
> > expect that this issue will affect others and that a more general
> approach
> > is needed.
> >
> > Thanks,
> > Baruch
> >
>
> Fix might be as simple as asking kernel to lock the mmap().
>
> diff --git a/lib/eal/linux/eal_hugepage_info.c
> b/lib/eal/linux/eal_hugepage_info.c
> index 581d9dfc91eb..989c69387233 100644
> --- a/lib/eal/linux/eal_hugepage_info.c
> +++ b/lib/eal/linux/eal_hugepage_info.c
> @@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t
> mem_size, int flags)
> return NULL;
> }
> retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
> -   MAP_SHARED, fd, 0);
> +   MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
> +
> close(fd);
> return retval == MAP_FAILED ? NULL : retval;
>  }
>


-- 
Baruch Even
Platform Technical Lead,  WEKA
E bar...@weka.io  W www.weka.io



Re: Hugepage migration

2023-05-30 Thread Baruch Even
On Tue, May 30, 2023 at 11:04 AM Bruce Richardson <
bruce.richard...@intel.com> wrote:

> On Sun, May 28, 2023 at 11:07:40PM +0300, Baruch Even wrote:
> >Hi,
> >We found an issue with newer kernels (5.13+) that are found on newer
> >OSes (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page
> that
> >was allocated for DPDK was migrated (moved into another physical page)
> >when a 1G page was allocated.
> >From our reading of the kernel commits this started with commit
> >ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> >mm: make alloc_contig_range handle in-use hugetlb pages
> >This caused what looked like memory corruptions to us and cases where
> >the rings were moved from their physical location and communication
> was
> >no longer possible.
> >I wanted to ask if anyone else hit this issue and what mitigations are
> >available?
> >We are currently looking at using a kernel driver to pin the pages but
> >I expect that this issue will affect others and that a more general
> >approach is needed.
> >Thanks,
> >Baruch
> >--
>
> Hi,
>
> what kernel driver was being used for the device I/O part? Was it a UIO
> based driver or "vfio-pci"? When using vfio-pci and configuring IOMMU
> mappings, the pages mapped should be pinned by the kernel, I would have
> thought, since the kernel knows they are being used by devices.
>
> /Bruce
>

This was using igb_uio on an AWS instance with their ena driver.

Baruch

-- 
Baruch Even
Platform Technical Lead,  WEKA
E bar...@weka.io  W www.weka.io



Minutes of Technical Board Meeting, 2023-05-17

2023-05-30 Thread Thomas Monjalon
Members Attending: 10/10
- Aaron Conole
- Bruce Richardson
- Hemant Agrawal
- Honnappa Nagarahalli
- Jerin Jacob
- Kevin Traynor
- Konstantin Ananyev
- Maxime Coquelin
- Stephen Hemminger
- Thomas Monjalon (Chair)

NOTE: The Technical Board meetings take place every second Wednesday
on https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes

NOTE: Next meeting will be on Wednesday 2023-05-31 @3pm UTC,
and should be chaired by Aaron.


1/ Marketing

Our new marketing manager (Benjamin Thomas) is planning to do
more communication on a regular basis.

He needs help with content creation,
especially from board members and maintainers.

Ben will do remote interviews during the summit in September.

There is a channel on Slack to help coordinate the marketing effort.
Everybody is welcome to join #marketing.


2/ Summit

The summit will take place in Dublin, Gibson hotel on 12-13 September.

We discussed a social evening event
and a board dinner on a different day.

Nathan gave additional information about flight & baggage
reimbursements for board members.

Virtual speakers are allowed but in-person attendance is preferred.

The CFP and dedicated website should be online soon.


3/ Closed minutes of previous meetings


4/ Bugzilla maintenance

We will ask for volunteers to maintain a good status of open bugs.




RE: [EXT] [PATCH v5 1/2] cryptodev: support SM3_HMAC,SM4_CFB and SM4_OFB

2023-05-30 Thread Akhil Goyal
> Add SM3_HMAC/SM4_CFB/SM4_OFB support in DPDK.
> 
> Signed-off-by: Sunyang Wu 
Series Acked-by: Akhil Goyal 

Applied to dpdk-next-crypto



RE: [PATCH] lib/cryptodev: set minimal output buffer size for RSA op

2023-05-30 Thread Akhil Goyal
> > Subject: [PATCH] lib/cryptodev: set minimal output buffer size for RSA op
> >
> > Depending on the RSA op, the output buffer size could be set to the
> > minimal expected memory, rather than 0 as today. This will allow the
> > PMD to do any validation on the size, in case an application
> > did not allocate enough memory, or even in case of a memory
> > fault.
> >
> > Signed-off-by: Gowrishankar Muthukrishnan 
> Acked-by: Akhil Goyal 
Applied to dpdk-next-crypto



RE: [EXT] [PATCH] drivers/net/bnx2x : Offer maintainership for bnx2x

2023-05-30 Thread Alok Prasad



> -Original Message-
> From: Akhil Goyal 
> Sent: 30 May 2023 12:37
> To: julien_d...@jaube.fr; dev@dpdk.org
> Cc: Alok Prasad 
> Subject: RE: [EXT] [PATCH] drivers/net/bnx2x : Offer maintainership for bnx2x
> 
> > From: Julien Aube 
> >
> > Signed-off-by: Julien Aube 
> > ---
> ++ Alok

Hi Julien, can you remove Rasesh and Shahed from the maintainer list and send
an updated patch with only your name as maintainer?

Thanks,
Alok
 


[PATCH 1/1] net/mlx5: fix device removal event handling

2023-05-30 Thread Viacheslav Ovsiienko
On device removal, the kernel notifies the user space application
by queueing the IBV_DEVICE_FATAL_EVENT and triggering the appropriate
file descriptor. The Mellanox kernel driver stack emits this event
twice, from different layers (mlx5 and uverbs). The IB port index
is not applicable in the event structure and should be ignored
for IBV_DEVICE_FATAL_EVENT events.

Also, on older kernels (at least from OFED 4.9) there might be
race conditions causing the event queue to close before the application
fetches the IBV_DEVICE_FATAL_EVENT message with the ibv_get_async_event()
API.

To provide reliable device removal event detection, the patch:

  - ignores the IB port index for the IBV_DEVICE_FATAL_EVENT
  - introduces the flag to notify PMD about removal only once
  - acks event with ibv_ack_async_event after actual handling
  - checks for EIO error, making sure queue is not closed yet

Fixes: 40d9f906f4e2 ("net/mlx5: fix device removal handler for multiport")
Cc: sta...@dpdk.org

Signed-off-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 34 +
 drivers/net/mlx5/mlx5.h |  1 +
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c 
b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 55801534d1..639e629fe4 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -746,6 +746,7 @@ mlx5_dev_interrupt_device_fatal(struct mlx5_dev_ctx_shared 
*sh)
 
for (i = 0; i < sh->max_port; ++i) {
struct rte_eth_dev *dev;
+   struct mlx5_priv *priv;
 
if (sh->port[i].ih_port_id >= RTE_MAX_ETHPORTS) {
/*
@@ -756,9 +757,14 @@ mlx5_dev_interrupt_device_fatal(struct mlx5_dev_ctx_shared 
*sh)
}
dev = &rte_eth_devices[sh->port[i].ih_port_id];
MLX5_ASSERT(dev);
-   if (dev->data->dev_conf.intr_conf.rmv)
+   priv = dev->data->dev_private;
+   MLX5_ASSERT(priv);
+   if (!priv->rmv_notified && dev->data->dev_conf.intr_conf.rmv) {
+   /* Notify driver about removal only once. */
+   priv->rmv_notified = 1;
rte_eth_dev_callback_process
(dev, RTE_ETH_EVENT_INTR_RMV, NULL);
+   }
}
 }
 
@@ -830,21 +836,29 @@ mlx5_dev_interrupt_handler(void *cb_arg)
struct rte_eth_dev *dev;
uint32_t tmp;
 
-   if (mlx5_glue->get_async_event(sh->cdev->ctx, &event))
+   if (mlx5_glue->get_async_event(sh->cdev->ctx, &event)) {
+   if (errno == EIO) {
+   DRV_LOG(DEBUG,
+   "IBV async event queue closed on: %s",
+   sh->ibdev_name);
+   mlx5_dev_interrupt_device_fatal(sh);
+   }
break;
-   /* Retrieve and check IB port index. */
-   tmp = (uint32_t)event.element.port_num;
-   if (!tmp && event.event_type == IBV_EVENT_DEVICE_FATAL) {
+   }
+   if (event.event_type == IBV_EVENT_DEVICE_FATAL) {
/*
-* The DEVICE_FATAL event is called once for
-* entire device without port specifying.
-* We should notify all existing ports.
+* The DEVICE_FATAL event can be called by kernel
+* twice - from mlx5 and uverbs layers, and port
+* index is not applicable. We should notify all
+* existing ports.
 */
-   mlx5_glue->ack_async_event(&event);
mlx5_dev_interrupt_device_fatal(sh);
+   mlx5_glue->ack_async_event(&event);
continue;
}
-   MLX5_ASSERT(tmp && (tmp <= sh->max_port));
+   /* Retrieve and check IB port index. */
+   tmp = (uint32_t)event.element.port_num;
+   MLX5_ASSERT(tmp <= sh->max_port);
if (!tmp) {
/* Unsupported device level event. */
mlx5_glue->ack_async_event(&event);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 021049ad2b..6aae8fe3f4 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1743,6 +1743,7 @@ struct mlx5_priv {
unsigned int mtr_en:1; /* Whether support meter. */
unsigned int mtr_reg_share:1; /* Whether support meter REG_C share. */
unsigned int lb_used:1; /* Loopback queue is referred to. */
	unsigned int rmv_notified:1; /* Notified about removal event. */
uint32_t mark_enabled:1; /* If mark action is enabled on rxqs. */
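
For completeness, an application-side sketch of consuming the removal event
that this patch makes reliable (standard ethdev callback API; error handling
omitted):

#include <stdio.h>
#include <rte_ethdev.h>

static int
rmv_event_cb(uint16_t port_id, enum rte_eth_event_type type,
	     void *cb_arg, void *ret_param)
{
	RTE_SET_USED(cb_arg);
	RTE_SET_USED(ret_param);
	if (type == RTE_ETH_EVENT_INTR_RMV)
		printf("port %u: device removed, schedule detach\n", port_id);
	return 0;
}

static void
register_rmv_cb(uint16_t port_id)
{
	rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RMV,
				      rmv_event_cb, NULL);
}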

Re: [PATCH v3 4/4] vhost: add device op to offload the interrupt kick

2023-05-30 Thread Maxime Coquelin




On 5/30/23 15:16, Thomas Monjalon wrote:

30/05/2023 15:02, Maxime Coquelin:


On 5/17/23 11:09, Eelco Chaudron wrote:

This patch adds an operation callback which gets called every time the
library wants to call eventfd_write(). This eventfd_write() call could
result in a system call, which could potentially block the PMD thread.

The callback function can decide whether it's ok to handle the
eventfd_write() now or have the newly introduced function,
rte_vhost_notify_guest(), called at a later time.

This can be used by 3rd party applications, like OVS, to avoid system
calls being called as part of the PMD threads.

Signed-off-by: Eelco Chaudron 
---
   lib/vhost/meson.build |2 ++
   lib/vhost/rte_vhost.h |   23 +-
   lib/vhost/socket.c|   63 
++---
   lib/vhost/version.map |9 +++
   lib/vhost/vhost.c |   38 ++
   lib/vhost/vhost.h |   58 -
   6 files changed, 171 insertions(+), 22 deletions(-)




The patch looks good to me, but that's the first time we use function
versioning in Vhost library, so I'd like another pair of eyes to be sure
I don't miss anything.

Reviewed-by: Maxime Coquelin 

Thomas, do we need to mention it somewhere in the release note?


If compatibility is kept, I think we don't need to mention it.




Thanks Thomas for the information.

Maxime



Re: Hugepage migration

2023-05-30 Thread Stephen Hemminger
On Tue, 30 May 2023 16:53:14 +0300
Baruch Even  wrote:

> > what kernel driver was being used for the device I/O part? Was it a UIO
> > based driver or "vfio-pci"? When using vfio-pci and configuring IOMMU
> > mappings, the pages mapped should be pinned by the kernel, I would have
> > thought, since the kernel knows they are being used by devices.
> >
> > /Bruce
> >  
> 
> This was using igb_uio on an AWS instance with their ena driver.
> 
> Baruch

Try VFIO; using igb_uio is effectively an out-of-tree driver and the kernel
maintainers are unlikely to give you much support.


dpdk: Inquiry about vring cleanup during packets transmission

2023-05-30 Thread wangzengyuan
Hi,
	I am writing to inquire about the vring cleanup process during packet
transmission.
In the virtio_xmit_pkts function, there is the following code:
 nb_used = virtqueue_nused(vq);

 if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh))
   virtio_xmit_cleanup(vq, nb_used);
In other words, cleaning is performed when the number of items used in the
vring exceeds (vq->vq_nentries - vq->vq_free_thresh). In the case of a vring
size of 4096, at least (4096-32) items need to be cleaned at once, which will
take a considerable amount of time.
I'm curious why we don't clean up fewer items each time, to avoid taking up
too much CPU time in one transmission. During debugging, I found that cleaning
up thousands of items at once takes a considerable amount of time.
As I am not familiar with this process, I would appreciate it if you could 
provide me with some information on what its purpose is.
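
One possible mitigation, sketched below only for illustration (this is not
the actual virtio PMD code, and VQ_CLEANUP_BATCH is a made-up tunable), would
be to bound the batch:

/* Illustrative: clean at most a fixed batch per transmit call. */
#define VQ_CLEANUP_BATCH 64

	nb_used = virtqueue_nused(vq);
	if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh))
		virtio_xmit_cleanup(vq, RTE_MIN(nb_used, (uint16_t)VQ_CLEANUP_BATCH));

The trade-off is that smaller batches bound the per-call latency but make the
cleanup condition trigger more often.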

Best regards,

Zengyuan Wang


[PATCH] drivers: fix vmxnet3 return wrong error code in initializing

2023-05-30 Thread root
From: Kaijun Zeng 

In vmxnet3_dev_rxtx_init(), a wrong error code may be returned after it invokes
vmxnet3_post_rx_bufs(), because it negates the error code before returning it.
For example, a failure with a negative errno value such as -EINVAL would be
turned into a positive one. This causes rte_eth_dev_start() to give a positive
number to the invoker, but it should be a negative number, as described in the
comments.

Bugzilla ID: 1239

Signed-off-by: Kaijun Zeng 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index a875ffec07..73ec1e4727 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -1315,7 +1315,7 @@ vmxnet3_dev_rxtx_init(struct rte_eth_dev *dev)
PMD_INIT_LOG(ERR,
 "ERROR: Posting Rxq: %d buffers 
ring: %d",
 i, j);
-   return -ret;
+   return ret;
}
/*
 * Updating device with the index:next2fill to fill the
-- 
2.30.2



📣 Last Call for Content

2023-05-30 Thread Ben Thomas
Hello everyone

Please email me (btho...@linuxfoundation.org) with any content you would
like featured in our new monthly newsletter, and shared across DPDK and
Linux Foundation social accounts.

Examples of content:

   - Send links to tech updates, news events, meetings etc. related to DPDK
   - Submit blogs about DPDK projects and/or other initiatives here
   - Include engineers in the Dev spotlight blog series here
   - Send case studies, white papers & user stories

We would also appreciate any engagement with our content:
   - Tag @DPDKproject #DPDKtech on your socials for us to share
   - Like posts
   - Leave a comment
   - Repost to your network
   - Follow our LinkedIn and Twitter


Thank you,
Ben Thomas


[PATCH] pci: fix comment referencing renamed function

2023-05-30 Thread Thomas Monjalon
When renaming functions eal_parse_pci_*,
a referencing comment was missed in the function rte_pci_device_name().

Fixes: ca52fccbb3b9 ("pci: remove deprecated functions")
Cc: sta...@dpdk.org

Signed-off-by: Thomas Monjalon 
---
 lib/pci/rte_pci.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index 5088157e74..aab761b918 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -104,8 +104,7 @@ struct rte_pci_addr {
 
 /**
  * Utility function to write a pci device name, this device name can later be
- * used to retrieve the corresponding rte_pci_addr using eal_parse_pci_*
- * BDF helpers.
+ * used to retrieve the corresponding rte_pci_addr using rte_pci_addr_parse().
  *
  * @param addr
  * The PCI Bus-Device-Function address
-- 
2.40.1
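
As a quick illustration of the round trip the fixed comment describes (both
helpers are existing rte_pci API; the address value is arbitrary):

#include <rte_pci.h>

static void
pci_name_roundtrip(void)
{
	struct rte_pci_addr addr;
	char name[PCI_PRI_STR_SIZE];

	if (rte_pci_addr_parse("0000:3b:00.0", &addr) == 0) {
		rte_pci_device_name(&addr, name, sizeof(name));
		/* name now holds "0000:3b:00.0" again */
	}
}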



Re: [PATCH v3] common/cnxk: add new APIs for batch operations

2023-05-30 Thread Jerin Jacob
On Tue, May 30, 2023 at 2:43 PM Ashwin Sekhar T K  wrote:
>
> Add new APIs for counting and extracting allocated objects
> from a single cache line in the batch alloc memory.
>
> Signed-off-by: Ashwin Sekhar T K 

Applied to dpdk-next-net-mrvl/for-next-net. Thanks


> ---
>  drivers/common/cnxk/roc_npa.h | 78 ++-
>  1 file changed, 67 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
> index e1e164499e..4ad5f044b5 100644
> --- a/drivers/common/cnxk/roc_npa.h
> +++ b/drivers/common/cnxk/roc_npa.h
> @@ -209,7 +209,6 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
> uint64_t *buf,
>unsigned int num, const int dis_wait,
>const int drop)
>  {
> -   unsigned int i;
> int64_t *addr;
> uint64_t res;
> union {
> @@ -220,10 +219,6 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
> uint64_t *buf,
> if (num > ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS)
> return -1;
>
> -   /* Zero first word of every cache line */
> -   for (i = 0; i < num; i += (ROC_ALIGN / sizeof(uint64_t)))
> -   buf[i] = 0;
> -
> addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
>NPA_LF_AURA_BATCH_ALLOC);
> cmp.u = 0;
> @@ -240,6 +235,9 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
> uint64_t *buf,
> return 0;
>  }
>
> +/*
> + * Wait for a batch alloc operation on a cache line to complete.
> + */
>  static inline void
>  roc_npa_batch_alloc_wait(uint64_t *cache_line, unsigned int wait_us)
>  {
> @@ -255,6 +253,23 @@ roc_npa_batch_alloc_wait(uint64_t *cache_line, unsigned 
> int wait_us)
> break;
>  }
>
> +/*
> + * Count the number of pointers in a single batch alloc cache line.
> + */
> +static inline unsigned int
> +roc_npa_aura_batch_alloc_count_line(uint64_t *line, unsigned int wait_us)
> +{
> +   struct npa_batch_alloc_status_s *status;
> +
> +   status = (struct npa_batch_alloc_status_s *)line;
> +   roc_npa_batch_alloc_wait(line, wait_us);
> +
> +   return status->count;
> +}
> +
> +/*
> + * Count the number of pointers in a sequence of batch alloc cache lines.
> + */
>  static inline unsigned int
>  roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, unsigned int num,
>unsigned int wait_us)
> @@ -279,6 +294,40 @@ roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, 
> unsigned int num,
> return count;
>  }
>
> +/*
> + * Extract allocated pointers from a single batch alloc cache line. This api
> + * only extracts the required number of pointers from the cache line and it
> + * adjusts the statsus->count so that a subsequent call to this api can
> + * extract the remaining pointers in the cache line appropriately.
> + */
> +static inline unsigned int
> +roc_npa_aura_batch_alloc_extract_line(uint64_t *buf, uint64_t *line,
> + unsigned int num, unsigned int *rem)
> +{
> +   struct npa_batch_alloc_status_s *status;
> +   unsigned int avail;
> +
> +   status = (struct npa_batch_alloc_status_s *)line;
> +   roc_npa_batch_alloc_wait(line, 0);
> +   avail = status->count;
> +   num = avail > num ? num : avail;
> +   if (num)
> +   memcpy(buf, &line[avail - num], num * sizeof(uint64_t));
> +   avail -= num;
> +   if (avail == 0) {
> +   /* Clear the lowest 7 bits of the first pointer */
> +   buf[0] &= ~0x7FUL;
> +   status->ccode = 0;
> +   }
> +   status->count = avail;
> +   *rem = avail;
> +
> +   return num;
> +}
> +
> +/*
> + * Extract all allocated pointers from a sequence of batch alloc cache lines.
> + */
>  static inline unsigned int
>  roc_npa_aura_batch_alloc_extract(uint64_t *buf, uint64_t *aligned_buf,
>  unsigned int num)
> @@ -330,11 +379,15 @@ roc_npa_aura_op_bulk_free(uint64_t aura_handle, 
> uint64_t const *buf,
> }
>  }
>
> +/*
> + * Issue a batch alloc operation on a sequence of cache lines, wait for the
> + * batch alloc to complete and copy the pointers out into the user buffer.
> + */
>  static inline unsigned int
>  roc_npa_aura_op_batch_alloc(uint64_t aura_handle, uint64_t *buf,
> -   uint64_t *aligned_buf, unsigned int num,
> -   const int dis_wait, const int drop,
> -   const int partial)
> +   unsigned int num, uint64_t *aligned_buf,
> +   unsigned int aligned_buf_sz, const int dis_wait,
> +   const int drop, const int partial)
>  {
> unsigned int count, chunk, num_alloc;
>
> @@ -344,9 +397,12 @@ roc_npa_aura_op_batch_alloc(uint64_t aura_handle, 
> uint64_t *buf,
>
> count = 0;
> while (num) {
> -

Re: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH

2023-05-30 Thread Ferruh Yigit
On 5/30/2023 4:06 AM, Dong Zhou wrote:
> IB(InfiniBand) is one type of networking used in high-performance
> computing with high throughput and low latency. Like Ethernet,
> IB defines a layered protocol (Physical, Link, Network, Transport
> Layers). IB provides native support for RDMA(Remote DMA), an
> extension of the DMA that allows direct access to remote host
> memory without CPU intervention. IB network requires NICs and
> switches to support the IB protocol.
> 
> RoCE(RDMA over Converged Ethernet) is a network protocol that
> allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
> Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
> Ethernet link layer protocol, IB packets are encapsulated in the
> Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
> layer protocol, IB packets are encapsulated in UDP payload and
> use a destination port 4791, The format of the RoCEv2 packet is
> as follows:
>   ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
> 
> BTH(Base Transport Header) is the IB transport layer header, RoCEv1
> and RoCEv2 both contain this header. This patch introduces a new
> RTE item to match the IB BTH in RoCE packets. One use of this match
> is that the user can monitor RoCEv2's CNP(Congestion Notification
> Packet) by matching BTH opcode 0x81.
> 
> This patch also adds the testpmd command line to match the RoCEv2
> BTH. Usage example:
> 
>   testpmd> flow create 0 group 1 ingress pattern
>eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
>dst_qp is 0xd3 / end actions queue index 0 / end
> 
> Signed-off-by: Dong Zhou 
> Acked-by: Ori Kam 
> Acked-by: Andrew Rybchenko 
> 
> v2:
>  - Change "ethernet" name to "Ethernet" in the commit log.
>  - Add "RoCE" and "IB" 2 words to words-case.txt.
>  - Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
>  - Add "Acked-by" labels in the first ethdev patch.
> 
> v3:
>  - Do rebase to fix the patch apply failure.
>  - Add "Acked-by" label in the second net/mlx5 patch.
> 
> v4:
>  - Split this series of patches, only keep the first ethdev patch.
>

Patch looks good, can you please add a release notes update too?
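
For reference, a C-level sketch equivalent to the quoted testpmd rule (item
and field names are as introduced by this patch; the action list and error
handling are omitted, and this is untested):

/* Match RoCEv2 CNP: outer ETH/IPv4/UDP, then BTH opcode 0x81. */
struct rte_flow_item_ib_bth bth_spec = { .hdr.opcode = 0x81 };
struct rte_flow_item_ib_bth bth_mask = { .hdr.opcode = 0xff };
struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_UDP },
	{ .type = RTE_FLOW_ITEM_TYPE_IB_BTH,
	  .spec = &bth_spec, .mask = &bth_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};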


RE: [EXT] Re: [PATCH v3 1/2] ip_frag: optimize key compare and hash generation

2023-05-30 Thread Pavan Nikhilesh Bhagavatula
> On Mon, 29 May 2023 20:25:01 +0530
>  wrote:
> 
> > +	return (k1->id_key_len != k2->id_key_len) ||
> > +	       (k1->key_len == IPV4_KEYLEN ? k1->src_dst[0] != k2->src_dst[0] :
> > +					     rte_hash_k32_cmp_eq(k1, k2, 32));
> 
> If you make another version, one small comment.
> Breaking this into a couple of if statements would make reading easier
> for human readers. Compiler doesn't care.

I have modified the above code to 

   if (k1->id_key_len != k2->id_key_len)
   return 1;
   if (k1->key_len == IPV4_KEYLEN)
   return k1->src_dst[0] != k2->src_dst[0];
   else
   return rte_hash_k32_cmp_eq(k1, k2, 32);

But upon remeasuring performance I see a performance loss of 1.2%
Compiler(GCC 10) generates additional branches with the above code.

I have also profiled the ip_reassembly application with and without the changes
and see a lot of additional branch misses.


Current implementation:

==
Branch Metrics
==
Branch MPKI  : 0.159
 
Branch PKI   : 156.566  
 
Branch Mis-prediction Rate   : 0.101
 

INST_RETIRED   : ▇▇ 9.493B
BR_RETIRED : ▇▇▇ 1.486B
BR_MIS_PRED_RETIRED: ▏ 1.508M
BR_IMMED_SPEC  : ▇▇▇ 1.395B
BR_RETURN_SPEC : ▏ 105.203M
BR_INDIRECT_SPEC   : ▏ 106.044M

Modified implementation:

==
Branch Metrics
==
Branch MPKI  : 0.282
 
Branch PKI   : 156.566  
 
Branch Mis-prediction Rate   : 0.180
 

INST_RETIRED   : ▇▇ 9.444B
BR_RETIRED : ▇▇▇ 1.479B
BR_MIS_PRED_RETIRED: ▏ 2.662M
BR_IMMED_SPEC  : ▇▇▇ 1.388B
BR_RETURN_SPEC : ▏ 104.518M
BR_INDIRECT_SPEC   : ▏ 105.354M


I will retain the current implementation in the next patch.

Thanks,
Pavan.



[RFT] graph: fix pcapng file support

2023-05-30 Thread Stephen Hemminger
The interface to rte_pcapng changed in the last release,
so the interfaces used need to be added to the pcapng
file via the API. If this step is missing, the pcapng
file will not be valid and can't be read by Wireshark etc.

I don't have a setup to test graph, so this needs a validation test.

Fixes: d1da6d0d04c7 ("pcapng: require per-interface information")
Signed-off-by: Stephen Hemminger 
---
 lib/graph/graph_pcap.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c433300290b..eff7b2d060ed 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -80,7 +81,8 @@ graph_pcap_default_path_get(char **dir_path)
 int
 graph_pcap_file_open(const char *filename)
 {
-   int fd;
+   int fd, ret;
+   uint16_t portid;
char file_name[RTE_GRAPH_PCAP_FILE_SZ];
char *pcap_dir;
 
@@ -114,6 +116,18 @@ graph_pcap_file_open(const char *filename)
return -1;
}
 
+   /* Add the configured interfaces as possible capture ports */
+   RTE_ETH_FOREACH_DEV(portid) {
+   ret = rte_pcapng_add_interface(pcapng_fd, portid,
+  NULL, NULL, NULL);
+   if (ret < 0) {
+   graph_err("Graph rte_pcapng_add_interface failed: %d",
+ret);
+   close(fd);
+   return -1;
+   }
+   }
+
 done:
return 0;
 }
-- 
2.39.2
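
For reference, the registration API the fix relies on, as declared in
rte_pcapng.h (reproduced from memory, so check the header for the
authoritative signature; NULL is accepted for the optional
name/description/filter arguments, as the patch above uses):

/* Each port that may appear in the capture must be added up front. */
int rte_pcapng_add_interface(rte_pcapng_t *self, uint16_t port,
			     const char *ifname, const char *ifdescr,
			     const char *filter);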



ring name length simplification in ipsec_mb_qp_create_processed_ops_ring

2023-05-30 Thread Stephen Hemminger
I was looking at places in DPDK that are using rte_strlcpy which should be
using strlcpy directly instead. Looking at this code in ipsec_mb, the copy via
rte_strlcpy is actually not needed at all.

/** Create a ring to place processed operations on */
static struct rte_ring
*ipsec_mb_qp_create_processed_ops_ring(
struct ipsec_mb_qp *qp, unsigned int ring_size, int socket_id)
{
struct rte_ring *r;
char ring_name[RTE_CRYPTODEV_NAME_MAX_LEN];

unsigned int n = rte_strlcpy(ring_name, qp->name, sizeof(ring_name));

if (n >= sizeof(ring_name))
return NULL;

r = rte_ring_lookup(ring_name);

1. The maximum length name allowed for rte_ring is 30 characters, which comes
   from RTE_MEMZONE_NAMESIZE - sizeof(RTE_RING_MZ_PREFIX) + 1 = 32 - 3 + 1 = 30

2. RTE_CRYPTODEV_NAME_MAX_LEN is 64; qp->name in struct ipsec_mb_qp is
   always the same size.

3. Ring create already does a copy of name, so making a copy here is not needed.

Therefore copying the name is never going to catch any errors. And if qp->name
is too long, it won't fail until ring_create().

Would be better to just do something simpler like:

diff --git a/drivers/crypto/ipsec_mb/ipsec_mb_ops.c 
b/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
index 3e52f9567401..4af6592f12c5 100644
--- a/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
+++ b/drivers/crypto/ipsec_mb/ipsec_mb_ops.c
@@ -185,12 +185,7 @@ static struct rte_ring
struct ipsec_mb_qp *qp, unsigned int ring_size, int socket_id)
 {
struct rte_ring *r;
-   char ring_name[RTE_CRYPTODEV_NAME_MAX_LEN];
-
-   unsigned int n = rte_strlcpy(ring_name, qp->name, sizeof(ring_name));
-
-   if (n >= sizeof(ring_name))
-   return NULL;
+   const char *ring_name = qp->name;
 
r = rte_ring_lookup(ring_name);
if (r) {


Re: [dpdk-dev] [PATCH v1] drivers/cnxk: improve the build time for non arm64 build

2023-05-30 Thread Nithin Kumar Dabilpuram

Acked-by: Nithin Dabilpuram 


From: jer...@marvell.com 
Sent: Monday, May 29, 2023 2:53 PM
To: dev@dpdk.org ; Pavan Nikhilesh Bhagavatula 
; Shijith Thotton ; Nithin 
Kumar Dabilpuram ; Kiran Kumar Kokkilagadda 
; Sunil Kumar Kori ; Satha 
Koteswara Rao Kottidi 
Cc: Jerin Jacob Kollanukkaran 
Subject: [dpdk-dev] [PATCH v1] drivers/cnxk: improve the build time for non 
arm64 build

From: Jerin Jacob 

Specialized fast-path routines are not applicable to non-arm64
builds; remove the files containing those functions
to improve the build time of non-arm64 builds.

Signed-off-by: Jerin Jacob 
---
 drivers/event/cnxk/cn10k_eventdev.c |  5 +
 drivers/event/cnxk/cn9k_eventdev.c  |  4 
 drivers/event/cnxk/meson.build  | 10 ++
 drivers/net/cnxk/cn10k_rx_select.c  |  6 +-
 drivers/net/cnxk/cn10k_tx_select.c  |  6 +-
 drivers/net/cnxk/cn9k_rx_select.c   |  6 +-
 drivers/net/cnxk/cn9k_tx_select.c   |  6 +-
 drivers/net/cnxk/meson.build|  4 
 8 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/drivers/event/cnxk/cn10k_eventdev.c 
b/drivers/event/cnxk/cn10k_eventdev.c
index fd71ff15ca..a7534efad6 100644
--- a/drivers/event/cnxk/cn10k_eventdev.c
+++ b/drivers/event/cnxk/cn10k_eventdev.c
@@ -297,7 +297,9 @@ cn10k_sso_updt_tx_adptr_data(const struct rte_eventdev 
*event_dev)
 static void
 cn10k_sso_fp_fns_set(struct rte_eventdev *event_dev)
 {
+#if defined(RTE_ARCH_ARM64)
 struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
+
 struct roc_cpt *cpt = roc_idev_cpt_get();
 const event_dequeue_t sso_hws_deq[NIX_RX_OFFLOAD_MAX] = {
 #define R(name, flags)[flags] = cn10k_sso_hws_deq_##name,
@@ -614,6 +616,9 @@ cn10k_sso_fp_fns_set(struct rte_eventdev *event_dev)
 CN10K_SET_EVDEV_ENQ_OP(dev, event_dev->txa_enqueue, 
sso_hws_tx_adptr_enq);

 event_dev->txa_enqueue_same_dest = event_dev->txa_enqueue;
+#else
+   RTE_SET_USED(event_dev);
+#endif
 }

 static void
diff --git a/drivers/event/cnxk/cn9k_eventdev.c 
b/drivers/event/cnxk/cn9k_eventdev.c
index b104d19b9b..0656940726 100644
--- a/drivers/event/cnxk/cn9k_eventdev.c
+++ b/drivers/event/cnxk/cn9k_eventdev.c
@@ -309,6 +309,7 @@ cn9k_sso_updt_tx_adptr_data(const struct rte_eventdev 
*event_dev)
 static void
 cn9k_sso_fp_fns_set(struct rte_eventdev *event_dev)
 {
+#if defined(RTE_ARCH_ARM64)
 struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
 /* Single WS modes */
 const event_dequeue_t sso_hws_deq[NIX_RX_OFFLOAD_MAX] = {
@@ -662,6 +663,9 @@ cn9k_sso_fp_fns_set(struct rte_eventdev *event_dev)

 event_dev->txa_enqueue_same_dest = event_dev->txa_enqueue;
 rte_mb();
+#else
+   RTE_SET_USED(event_dev);
+#endif
 }

 static void *
diff --git a/drivers/event/cnxk/meson.build b/drivers/event/cnxk/meson.build
index 8a7fd53ebd..4f259988df 100644
--- a/drivers/event/cnxk/meson.build
+++ b/drivers/event/cnxk/meson.build
@@ -31,6 +31,10 @@ if soc_type == 'cn9k' or soc_type == 'all'
 sources += files(
 'cn9k_eventdev.c',
 'cn9k_worker.c',
+)
+
+if host_machine.cpu_family().startswith('aarch')
+sources += files(
 'deq/cn9k/deq_0_15_burst.c',
 'deq/cn9k/deq_16_31_burst.c',
 'deq/cn9k/deq_32_47_burst.c',
@@ -330,11 +334,16 @@ sources += files(
 'tx/cn9k/tx_112_127_dual_seg.c',
 )
 endif
+endif

 if soc_type == 'cn10k' or soc_type == 'all'
 sources += files(
 'cn10k_eventdev.c',
 'cn10k_worker.c',
+)
+
+if host_machine.cpu_family().startswith('aarch')
+sources += files(
 'deq/cn10k/deq_0_15_burst.c',
 'deq/cn10k/deq_16_31_burst.c',
 'deq/cn10k/deq_32_47_burst.c',
@@ -484,6 +493,7 @@ sources += files(
 'tx/cn10k/tx_112_127_seg.c',
 )
 endif
+endif

 extra_flags = ['-flax-vector-conversions', '-Wno-strict-aliasing']
 foreach flag: extra_flags
diff --git a/drivers/net/cnxk/cn10k_rx_select.c 
b/drivers/net/cnxk/cn10k_rx_select.c
index 1e0de1b7ac..1d44f2924e 100644
--- a/drivers/net/cnxk/cn10k_rx_select.c
+++ b/drivers/net/cnxk/cn10k_rx_select.c
@@ -5,7 +5,7 @@
 #include "cn10k_ethdev.h"
 #include "cn10k_rx.h"

-static inline void
+static __rte_used void
 pick_rx_func(struct rte_eth_dev *eth_dev,
  const eth_rx_burst_t rx_burst[NIX_RX_OFFLOAD_MAX])
 {
@@ -25,6 +25,7 @@ pick_rx_func(struct rte_eth_dev *eth_dev,
 void
 cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 {
+#if defined(RTE_ARCH_ARM64)
 struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);

 const eth_rx_burst_t nix_eth_rx_burst[NIX_RX_OFFLOAD_MAX] = {
@@ -111,4 +112,7 @@ cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_reas);
 else
 return pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
+#else
+   RTE_SET_USED(eth_dev);
+#endif
 }
diff --git a/drivers/net/cnxk/cn10k_tx_select.c 
b/drivers/net/cnxk/cn10k_tx_select.c
index 54

[PATCH v1] common/idpf: fix memory leak on AVX512 TX queue close

2023-05-30 Thread Wenjun Wu
When releasing mbufs for the AVX512 Tx queue software ring,
the mbufs in the range [i, tx_tail) should also be freed,
where i is the index at which the preceding release loop
stopped in the software ring.

Signed-off-by: Wenjun Wu 
---
 drivers/common/idpf/idpf_common_rxtx_avx512.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c 
b/drivers/common/idpf/idpf_common_rxtx_avx512.c
index dffb11fcf2..81312617cc 100644
--- a/drivers/common/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c
@@ -1601,6 +1601,10 @@ idpf_tx_release_mbufs_avx512(struct idpf_tx_queue *txq)
}
i = 0;
}
+   for (; i < txq->tx_tail; i++) {
+   rte_pktmbuf_free_seg(swr[i].mbuf);
+   swr[i].mbuf = NULL;
+   }
 }
 
 static const struct idpf_txq_ops avx512_tx_vec_ops = {
-- 
2.34.1
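
In other words, the cleanup has to cover the wrap-around. A simplified
sketch of the intended pattern; 'swr' stands in for the queue's software
ring and the names are shortened, so this is not the actual driver code:

#include <stdint.h>
#include <rte_mbuf.h>

/* Free from the last release point up to the end of the ring, then
 * wrap and free from the ring start up to tx_tail. */
static void
release_sw_ring(struct rte_mbuf **swr, uint16_t start,
		uint16_t nb_desc, uint16_t tx_tail)
{
	uint16_t i;

	for (i = start; i < nb_desc; i++) {
		if (swr[i] != NULL) {
			rte_pktmbuf_free_seg(swr[i]);
			swr[i] = NULL;
		}
	}
	for (i = 0; i < tx_tail; i++) {
		if (swr[i] != NULL) {
			rte_pktmbuf_free_seg(swr[i]);
			swr[i] = NULL;
		}
	}
}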



Reminder - DPDK Tech Board Meeting - Tomorrow, Wed. 5/31/23 @ 8am Pacific/11am Eastern/1500h UTC

2023-05-30 Thread Nathan Southern
Good evening DPDK Community,

Tomorrow, the DPDK Tech board will meet @ 8am Pacific/11am Eastern/1500h
UTC.

Here is a read-only copy of the agenda:

https://annuel.framapad.org/p/r.0c3cc4d1e011214183872a98f6b5c7db


And as always, our login:

http://jit.si/dpdk

See you there.

Thanks,

Nathan

Nathan C. Southern, Project Coordinator

Data Plane Development Kit

The Linux Foundation

248.835.4812 (mobile)

nsouth...@linuxfoundation.org


RE: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH

2023-05-30 Thread Dong Zhou
> -Original Message-
> From: Ferruh Yigit 
> Sent: Wednesday, May 31, 2023 1:46 AM
> To: Dong Zhou ; Ori Kam ; NBU-
> Contact-Thomas Monjalon (EXTERNAL) ; Aman Singh
> ; Yuying Zhang ;
> Andrew Rybchenko ; Olivier Matz
> 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v4] ethdev: add flow item for RoCE infiniband BTH
> 
> On 5/30/2023 4:06 AM, Dong Zhou wrote:
> > IB(InfiniBand) is one type of networking used in high-performance
> > computing with high throughput and low latency. Like Ethernet, IB
> > defines a layered protocol (Physical, Link, Network, Transport
> > Layers). IB provides native support for RDMA(Remote DMA), an extension
> > of the DMA that allows direct access to remote host memory without CPU
> > intervention. IB network requires NICs and switches to support the IB
> > protocol.
> >
> > RoCE(RDMA over Converged Ethernet) is a network protocol that allows
> > RDMA to run on Ethernet. RoCE encapsulates IB packets on Ethernet and
> > has two versions, RoCEv1 and RoCEv2. RoCEv1 is an Ethernet link layer
> > protocol, IB packets are encapsulated in the Ethernet layer and use
> > Ethernet type 0x8915. RoCEv2 is an internet layer protocol, IB packets
> > are encapsulated in UDP payload and use a destination port 4791, The
> > format of the RoCEv2 packet is as follows:
> >   ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)
> >
> > BTH(Base Transport Header) is the IB transport layer header, RoCEv1
> > and RoCEv2 both contain this header. This patch introduces a new RTE
> > item to match the IB BTH in RoCE packets. One use of this match is
> > that the user can monitor RoCEv2's CNP(Congestion Notification
> > Packet) by matching BTH opcode 0x81.
> >
> > This patch also adds the testpmd command line to match the RoCEv2 BTH.
> > Usage example:
> >
> >   testpmd> flow create 0 group 1 ingress pattern
> >eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
> >dst_qp is 0xd3 / end actions queue index 0 / end
> >
> > Signed-off-by: Dong Zhou 
> > Acked-by: Ori Kam 
> > Acked-by: Andrew Rybchenko 
> >
> > v2:
> >  - Change "ethernet" name to "Ethernet" in the commit log.
> >  - Add "RoCE" and "IB" 2 words to words-case.txt.
> >  - Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
> >  - Add "Acked-by" labels in the first ethdev patch.
> >
> > v3:
> >  - Do rebase to fix the patch apply failure.
> >  - Add "Acked-by" label in the second net/mlx5 patch.
> >
> > v4:
> >  - Split this series of patches, only keep the first ethdev patch.
> >
> 
> Patch looks good, can you please add a release notes update too?

Sure, will send the v5 patch to update it.
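
For reference, the equivalent rule built through the rte_flow C API would
look roughly like the sketch below. It assumes the RTE_FLOW_ITEM_TYPE_IB_BTH
item and the rte_flow_item_ib_bth layout proposed in this series; masks are
omitted, so the item's default mask applies:

#include <rte_flow.h>
#include <rte_byteorder.h>

/* Match RoCEv2 CNP traffic (BTH opcode 0x81, dest QP 0xd3) and steer
 * it to a given Rx queue. */
static struct rte_flow *
create_cnp_rule(uint16_t port_id, uint16_t rxq, struct rte_flow_error *err)
{
	struct rte_flow_attr attr = { .group = 1, .ingress = 1 };
	struct rte_flow_item_udp udp = {
		.hdr.dst_port = RTE_BE16(4791),
	};
	struct rte_flow_item_ib_bth bth = {
		.hdr.opcode = 0x81,                 /* CNP */
		.hdr.dst_qp = { 0x00, 0x00, 0xd3 }, /* 24-bit QP number */
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
		{ .type = RTE_FLOW_ITEM_TYPE_IB_BTH, .spec = &bth },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = rxq };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, err);
}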



[PATCH v5] ethdev: add flow item for RoCE infiniband BTH

2023-05-30 Thread Dong Zhou
IB(InfiniBand) is one type of networking used in high-performance
computing with high throughput and low latency. Like Ethernet,
IB defines a layered protocol (Physical, Link, Network, Transport
Layers). IB provides native support for RDMA(Remote DMA), an
extension of the DMA that allows direct access to remote host
memory without CPU intervention. IB network requires NICs and
switches to support the IB protocol.

RoCE(RDMA over Converged Ethernet) is a network protocol that
allows RDMA to run on Ethernet. RoCE encapsulates IB packets on
Ethernet and has two versions, RoCEv1 and RoCEv2. RoCEv1 is an
Ethernet link layer protocol; IB packets are encapsulated in the
Ethernet layer and use Ethernet type 0x8915. RoCEv2 is an internet
layer protocol; IB packets are encapsulated in the UDP payload and
use destination port 4791. The format of the RoCEv2 packet is
as follows:
  ETH + IP + UDP(dport 4791) + IB(BTH + ExtHDR + PAYLOAD + CRC)

BTH(Base Transport Header) is the IB transport layer header, RoCEv1
and RoCEv2 both contain this header. This patch introduces a new
RTE item to match the IB BTH in RoCE packets. One use of this match
is that the user can monitor RoCEv2's CNP(Congestion Notification
Packet) by matching BTH opcode 0x81.

This patch also adds the testpmd command line to match the RoCEv2
BTH. Usage example:

  testpmd> flow create 0 group 1 ingress pattern
   eth / ipv4 / udp dst is 4791 / ib_bth opcode is 0x81
   dst_qp is 0xd3 / end actions queue index 0 / end

Signed-off-by: Dong Zhou 
Acked-by: Ori Kam 
Acked-by: Andrew Rybchenko 

v2:
 - Change "ethernet" name to "Ethernet" in the commit log.
 - Add "RoCE" and "IB" 2 words to words-case.txt.
 - Add "rte_byteorder.h" header file in "rte_ib.h" to fix compile errors.
 - Add "Acked-by" labels in the first ethdev patch.

v3:
 - Do rebase to fix the patch apply failure.
 - Add "Acked-by" label in the second net/mlx5 patch.

v4:
 - Split this series of patches, only keep the first ethdev patch.

v5:
 - Update the release notes.
 - Update the doxy-api-index.md file.
---
 app/test-pmd/cmdline_flow.c | 58 +
 devtools/words-case.txt |  2 +
 doc/api/doxy-api-index.md   |  3 +-
 doc/guides/nics/features/default.ini|  1 +
 doc/guides/prog_guide/rte_flow.rst  |  7 +++
 doc/guides/rel_notes/release_23_07.rst  |  3 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  7 +++
 lib/ethdev/rte_flow.c   |  1 +
 lib/ethdev/rte_flow.h   | 27 
 lib/net/meson.build |  1 +
 lib/net/rte_ib.h| 70 +
 11 files changed, 179 insertions(+), 1 deletion(-)
 create mode 100644 lib/net/rte_ib.h

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..3ade229ffc 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -496,6 +496,11 @@ enum index {
ITEM_QUOTA_STATE_NAME,
ITEM_AGGR_AFFINITY,
ITEM_AGGR_AFFINITY_VALUE,
+   ITEM_IB_BTH,
+   ITEM_IB_BTH_OPCODE,
+   ITEM_IB_BTH_PKEY,
+   ITEM_IB_BTH_DST_QPN,
+   ITEM_IB_BTH_PSN,
 
/* Validate/create actions. */
ACTIONS,
@@ -1452,6 +1457,7 @@ static const enum index next_item[] = {
ITEM_METER,
ITEM_QUOTA,
ITEM_AGGR_AFFINITY,
+   ITEM_IB_BTH,
END_SET,
ZERO,
 };
@@ -1953,6 +1959,15 @@ static const enum index item_aggr_affinity[] = {
ZERO,
 };
 
+static const enum index item_ib_bth[] = {
+   ITEM_IB_BTH_OPCODE,
+   ITEM_IB_BTH_PKEY,
+   ITEM_IB_BTH_DST_QPN,
+   ITEM_IB_BTH_PSN,
+   ITEM_NEXT,
+   ZERO,
+};
+
 static const enum index next_action[] = {
ACTION_END,
ACTION_VOID,
@@ -5523,6 +5538,46 @@ static const struct token token_list[] = {
.call = parse_quota_state_name,
.comp = comp_quota_state_name
},
+   [ITEM_IB_BTH] = {
+   .name = "ib_bth",
+   .help = "match ib bth fields",
+   .priv = PRIV_ITEM(IB_BTH,
+ sizeof(struct rte_flow_item_ib_bth)),
+   .next = NEXT(item_ib_bth),
+   .call = parse_vc,
+   },
+   [ITEM_IB_BTH_OPCODE] = {
+   .name = "opcode",
+   .help = "match ib bth opcode",
+   .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+item_param),
+   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+hdr.opcode)),
+   },
+   [ITEM_IB_BTH_PKEY] = {
+   .name = "pkey",
+   .help = "partition key",
+   .next = NEXT(item_ib_bth, NEXT_ENTRY(COMMON_UNSIGNED),
+item_param),
+   .args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_item_ib_bth,
+   

[PATCH v4 2/2] test: add reassembly perf test

2023-05-30 Thread pbhagavatula
From: Pavan Nikhilesh 

Add a reassembly perf autotest for both IPv4 and IPv6 reassembly.
Each test is performed with a variable number of fragments per flow,
with either ordered or unordered fragments, and with interleaved flows.

Signed-off-by: Pavan Nikhilesh 
Reviewed-by: Amit Prakash Shukla 
Tested-by: Amit Prakash Shukla 
---
 app/test/meson.build|2 +
 app/test/test_reassembly_perf.c | 1002 +++
 2 files changed, 1004 insertions(+)
 create mode 100644 app/test/test_reassembly_perf.c

diff --git a/app/test/meson.build b/app/test/meson.build
index d96ae7a961..70f320f388 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -108,6 +108,7 @@ test_sources = files(
 'test_rawdev.c',
 'test_rcu_qsbr.c',
 'test_rcu_qsbr_perf.c',
+'test_reassembly_perf.c',
 'test_reciprocal_division.c',
 'test_reciprocal_division_perf.c',
 'test_red.c',
@@ -297,6 +298,7 @@ perf_test_names = [
 'trace_perf_autotest',
 'ipsec_perf_autotest',
 'thash_perf_autotest',
+'reassembly_perf_autotest',
 ]
 
 driver_test_names = [
diff --git a/app/test/test_reassembly_perf.c b/app/test/test_reassembly_perf.c
new file mode 100644
index 00..f72b5b576e
--- /dev/null
+++ b/app/test/test_reassembly_perf.c
@@ -0,0 +1,1002 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Marvell.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define MAX_FLOWS  (1024 * 32)
+#define MAX_BKTS   MAX_FLOWS
+#define MAX_ENTRIES_PER_BKT 16
+#define MAX_FRAGMENTS  RTE_LIBRTE_IP_FRAG_MAX_FRAG
+#define MIN_FRAGMENTS  2
+#define MAX_PKTS   (MAX_FLOWS * MAX_FRAGMENTS)
+
+#define MAX_PKT_LEN 2048
+#define MAX_TTL_MS  (5 * MS_PER_S)
+
+/* use RFC863 Discard Protocol */
+#define UDP_SRC_PORT 9
+#define UDP_DST_PORT 9
+
+/* use RFC5735 / RFC2544 reserved network test addresses */
+#define IP_SRC_ADDR(x) ((198U << 24) | (18 << 16) | (0 << 8) | (x))
+#define IP_DST_ADDR(x) ((198U << 24) | (18 << 16) | (1 << 15) | (x))
+
+/* 2001:0200::/48 is IANA reserved range for IPv6 benchmarking (RFC5180) */
+static uint8_t ip6_addr[16] = {32, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0};
+#define IP6_VERSION 6
+
+#define IP_DEFTTL 64 /* from RFC 1340. */
+
+static struct rte_ip_frag_tbl *frag_tbl;
+static struct rte_mempool *pkt_pool;
+static struct rte_mbuf *mbufs[MAX_FLOWS][MAX_FRAGMENTS];
+static uint8_t frag_per_flow[MAX_FLOWS];
+static uint32_t flow_cnt;
+
+#define FILL_MODE_LINEAR  0
+#define FILL_MODE_RANDOM  1
+#define FILL_MODE_INTERLEAVED 2
+
+static int
+reassembly_test_setup(void)
+{
+   uint64_t max_ttl_cyc = (MAX_TTL_MS * rte_get_timer_hz()) / 1E3;
+
+   frag_tbl = rte_ip_frag_table_create(MAX_BKTS, MAX_ENTRIES_PER_BKT,
+   MAX_BKTS * MAX_ENTRIES_PER_BKT, 
max_ttl_cyc,
+   rte_socket_id());
+   if (frag_tbl == NULL)
+   return TEST_FAILED;
+
+   rte_mbuf_set_user_mempool_ops("ring_mp_mc");
+   pkt_pool = rte_pktmbuf_pool_create(
+   "reassembly_perf_pool", MAX_FLOWS * MAX_FRAGMENTS, 0, 0,
+   RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+   if (pkt_pool == NULL) {
+   printf("[%s] Failed to create pkt pool\n", __func__);
+   rte_ip_frag_table_destroy(frag_tbl);
+   return TEST_FAILED;
+   }
+
+   return TEST_SUCCESS;
+}
+
+static void
+reassembly_test_teardown(void)
+{
+   if (frag_tbl != NULL)
+   rte_ip_frag_table_destroy(frag_tbl);
+
+   if (pkt_pool != NULL)
+   rte_mempool_free(pkt_pool);
+}
+
+static void
+randomize_array_positions(void **array, uint8_t sz)
+{
+   void *tmp;
+   int i, j;
+
+   if (sz == 2) {
+   tmp = array[0];
+   array[0] = array[1];
+   array[1] = tmp;
+   } else {
+   for (i = sz - 1; i > 0; i--) {
+   j = rte_rand_max(i + 1);
+   tmp = array[i];
+   array[i] = array[j];
+   array[j] = tmp;
+   }
+   }
+}
+
+static void
+reassembly_print_banner(const char *proto_str)
+{
+   printf("+=="
+  "+\n");
+   printf("| %-32s| %-3s : %-58d|\n", proto_str, "Flow Count", MAX_FLOWS);
+   printf("+++=+=+"
+  "+===+\n");
+   printf("%-17s%-17s%-14s%-14s%-25s%-20s\n", "| Fragment Order",
+  "| Fragments/Flow", "| Outstanding", "| Cycles/Flow",
+  "| Cycles/Fragment insert", "| Cycles/Reassembly |");
+   printf("++==

[PATCH v4 1/2] ip_frag: optimize key compare and hash generation

2023-05-30 Thread pbhagavatula
From: Pavan Nikhilesh 

Use the optimized rte_hash_k32_cmp_eq routine for key comparison on
x86 and ARM64.
Use CRC instructions for hash generation on ARM64.

Signed-off-by: Pavan Nikhilesh 
Reviewed-by: Ruifeng Wang 
---
On Neoverse-N2, performance improved by 10% when measured with
examples/ip_reassembly.

 v4 Changes:
 - Fix compilation failures (sys/queue)
 - Update test case to use proper macros.
 v3 Changes:
 - Drop NEON patch.
 v2 Changes:
 - Fix compilation failure with non ARM64/x86 targets

 lib/hash/rte_cmp_arm64.h   | 16 
 lib/hash/rte_cmp_x86.h | 16 
 lib/ip_frag/ip_frag_common.h   | 14 ++
 lib/ip_frag/ip_frag_internal.c |  4 ++--
 4 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/lib/hash/rte_cmp_arm64.h b/lib/hash/rte_cmp_arm64.h
index e9e26f9abd..a3e85635eb 100644
--- a/lib/hash/rte_cmp_arm64.h
+++ b/lib/hash/rte_cmp_arm64.h
@@ -3,7 +3,7 @@
  */

 /* Functions to compare multiple of 16 byte keys (up to 128 bytes) */
-static int
+static inline int
 rte_hash_k16_cmp_eq(const void *key1, const void *key2,
size_t key_len __rte_unused)
 {
@@ -24,7 +24,7 @@ rte_hash_k16_cmp_eq(const void *key1, const void *key2,
return !(x0 == 0 && x1 == 0);
 }

-static int
+static inline int
 rte_hash_k32_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k16_cmp_eq(key1, key2, key_len) ||
@@ -32,7 +32,7 @@ rte_hash_k32_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 16, key_len);
 }

-static int
+static inline int
 rte_hash_k48_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k16_cmp_eq(key1, key2, key_len) ||
@@ -42,7 +42,7 @@ rte_hash_k48_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 32, key_len);
 }

-static int
+static inline int
 rte_hash_k64_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k32_cmp_eq(key1, key2, key_len) ||
@@ -50,7 +50,7 @@ rte_hash_k64_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 32, key_len);
 }

-static int
+static inline int
 rte_hash_k80_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k64_cmp_eq(key1, key2, key_len) ||
@@ -58,7 +58,7 @@ rte_hash_k80_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 64, key_len);
 }

-static int
+static inline int
 rte_hash_k96_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k64_cmp_eq(key1, key2, key_len) ||
@@ -66,7 +66,7 @@ rte_hash_k96_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 64, key_len);
 }

-static int
+static inline int
 rte_hash_k112_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k64_cmp_eq(key1, key2, key_len) ||
@@ -76,7 +76,7 @@ rte_hash_k112_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 96, key_len);
 }

-static int
+static inline int
 rte_hash_k128_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k64_cmp_eq(key1, key2, key_len) ||
diff --git a/lib/hash/rte_cmp_x86.h b/lib/hash/rte_cmp_x86.h
index 13a5836351..ddfbef462f 100644
--- a/lib/hash/rte_cmp_x86.h
+++ b/lib/hash/rte_cmp_x86.h
@@ -5,7 +5,7 @@
 #include 

 /* Functions to compare multiple of 16 byte keys (up to 128 bytes) */
-static int
+static inline int
 rte_hash_k16_cmp_eq(const void *key1, const void *key2, size_t key_len 
__rte_unused)
 {
const __m128i k1 = _mm_loadu_si128((const __m128i *) key1);
@@ -15,7 +15,7 @@ rte_hash_k16_cmp_eq(const void *key1, const void *key2, 
size_t key_len __rte_unu
return !_mm_test_all_zeros(x, x);
 }

-static int
+static inline int
 rte_hash_k32_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k16_cmp_eq(key1, key2, key_len) ||
@@ -23,7 +23,7 @@ rte_hash_k32_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 16, key_len);
 }

-static int
+static inline int
 rte_hash_k48_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k16_cmp_eq(key1, key2, key_len) ||
@@ -33,7 +33,7 @@ rte_hash_k48_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 32, key_len);
 }

-static int
+static inline int
 rte_hash_k64_cmp_eq(const void *key1, const void *key2, size_t key_len)
 {
return rte_hash_k32_cmp_eq(key1, key2, key_len) ||
@@ -41,7 +41,7 @@ rte_hash_k64_cmp_eq(const void *key1, const void *key2, 
size_t key_len)
(const char *) key2 + 32, key_len);
 }

-static in

[PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device

2023-05-30 Thread Miao Li
From: Chenbo Xia 

This patch introduces an internal representation of the PCI device
which will be used to store the internal information that don't have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper of the
rte_pci_device structure. More fields will be added.

Signed-off-by: Chenbo Xia 
Acked-by: Sunil Kumar Kori 
Acked-by: Yahui Cao 
---
 drivers/bus/pci/bsd/pci.c | 13 -
 drivers/bus/pci/linux/pci.c   | 28 
 drivers/bus/pci/pci_common.c  | 12 ++--
 drivers/bus/pci/private.h | 14 +-
 drivers/bus/pci/windows/pci.c | 14 +-
 5 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, 
int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+   struct rte_pci_device_internal *pdev;
struct rte_pci_device *dev;
struct pci_bar_io bar;
unsigned i, max;
 
-   dev = malloc(sizeof(*dev));
-   if (dev == NULL) {
+   pdev = malloc(sizeof(*pdev));
+   if (pdev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci 
device\n");
return -1;
}
 
-   memset(dev, 0, sizeof(*dev));
+   memset(pdev, 0, sizeof(*pdev));
+   dev = &pdev->device;
dev->device.bus = &rte_pci_bus.bus;
 
dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
memmove(dev2->mem_resource,
dev->mem_resource,
sizeof(dev->mem_resource));
-   pci_free(dev);
+   pci_free(pdev);
}
return 0;
}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
return 0;
 
 skipdev:
-   pci_free(dev);
+   pci_free(pdev);
return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct 
rte_pci_addr *addr)
 {
char filename[PATH_MAX];
unsigned long tmp;
+   struct rte_pci_device_internal *pdev;
struct rte_pci_device *dev;
char driver[PATH_MAX];
int ret;
 
-   dev = malloc(sizeof(*dev));
-   if (dev == NULL)
+   pdev = malloc(sizeof(*pdev));
+   if (pdev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci 
device\n");
return -1;
+   }
 
-   memset(dev, 0, sizeof(*dev));
+   memset(pdev, 0, sizeof(*pdev));
+   dev = &pdev->device;
dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
 
/* get vendor id */
snprintf(filename, sizeof(filename), "%s/vendor", dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr 
*addr)
/* get device id */
snprintf(filename, sizeof(filename), "%s/device", dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr 
*addr)
snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr 
*addr)
snprintf(filename, sizeof(filename), "%s/subsystem_device",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr 
*addr)
snprintf(filename, sizeof(filename), "%s/class",
 dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-   pci_free(dev);
+   pci_free(pdev);
return -1;
}
/* the least 24 bits are valid: 

[PATCH v4 0/4] Support VFIO sparse mmap in PCI bus

2023-05-30 Thread Miao Li
This series introduces a standard VFIO capability, called sparse
mmap, to the PCI bus. In the Linux kernel, it is defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means that instead
of mmapping the whole BAR region into the DPDK process, only parts
of the BAR region are mmapped, based on the sparse mmap information
obtained from the kernel. The rest of the BAR region that is not
mmapped can be accessed by the DPDK process with pread/pwrite
system calls. Sparse mmap is useful when the kernel does not want
userspace to mmap the whole BAR region, or wants control over
access to specific parts of the BAR. Vendors can choose whether to
enable this feature for their devices in their specific kernel
modules.

In this patchset:

Patches 1-3 mainly introduce BAR access APIs so that drivers can
use them to access a specific BAR via pread/pwrite system calls
when part of the BAR is not mmappable. Patch 4 then adds the
VFIO sparse mmap support.

v4:
1. add sparse mmap information allocation and release
2. add release note for BAR access APIs

v3:
fix variable 'pdev' and 'info' uninitialized error

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 doc/guides/rel_notes/release_23_07.rst |   5 +
 drivers/bus/pci/bsd/pci.c  |  35 ++-
 drivers/bus/pci/linux/pci.c|  78 +-
 drivers/bus/pci/linux/pci_init.h   |  14 +-
 drivers/bus/pci/linux/pci_uio.c|  22 ++
 drivers/bus/pci/linux/pci_vfio.c   | 371 -
 drivers/bus/pci/pci_common.c   |  12 +-
 drivers/bus/pci/private.h  |  25 +-
 drivers/bus/pci/rte_bus_pci.h  |  48 
 drivers/bus/pci/version.map|   3 +
 drivers/bus/pci/windows/pci.c  |  14 +-
 lib/eal/include/rte_vfio.h |   1 -
 12 files changed, 525 insertions(+), 103 deletions(-)

-- 
2.25.1
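
As a quick illustration of how a driver would consume the BAR access APIs
from patch 3, here is a minimal sketch. The BAR index and register offset
are made up for the example, and it assumes the APIs land as proposed:

#include <stdint.h>
#include <rte_bus_pci.h>

/* Read a 32-bit register from BAR0. rte_pci_mmio_read() falls back to
 * pread() on the region internally when the BAR is not mmap-ed into
 * the process, so the driver code stays the same either way. */
static int
read_dev_reg32(const struct rte_pci_device *pdev, uint32_t *val)
{
	const int bar = 0;      /* hypothetical BAR index */
	const off_t reg = 0x10; /* hypothetical register offset */

	if (rte_pci_mmio_read(pdev, bar, val, sizeof(*val), reg) !=
	    (int)sizeof(*val))
		return -1;
	return 0;
}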



[PATCH v4 2/4] bus/pci: avoid depending on private value in kernel source

2023-05-30 Thread Miao Li
From: Chenbo Xia 

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It
is not part of VFIO API, and we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h

Signed-off-by: Chenbo Xia 
Acked-by: Sunil Kumar Kori 
Acked-by: Yahui Cao 
---
 drivers/bus/pci/linux/pci.c  |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 197 +++
 drivers/bus/pci/private.h|   9 ++
 lib/eal/include/rte_vfio.h   |   1 -
 5 files changed, 159 insertions(+), 56 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
case RTE_PCI_KDRV_VFIO:
-   return pci_vfio_read_config(intr_handle, buf, len, offset);
+   return pci_vfio_read_config(device, buf, len, offset);
 #endif
default:
rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device 
*device,
return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
case RTE_PCI_KDRV_VFIO:
-   return pci_vfio_write_config(intr_handle, buf, len, offset);
+   return pci_vfio_write_config(device, buf, len, offset);
 #endif
default:
rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..5aef84b7d0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+   uint64_t *size, uint64_t *offset)
+{
+   const struct rte_pci_device_internal *pdev =
+   RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+   if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+   return -1;
+
+   if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+   return -1;
+
+   *size   = pdev->region[index].size;
+   *offset = pdev->region[index].offset;
+
+   return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
void *buf, size_t len, off_t offs)
 {
-   int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+   uint64_t size, offset;
+   int fd;
 
-   if (vfio_dev_fd < 0)
+   fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+   if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+   &size, &offset) != 0)
+   return -1;
+
+   if ((uint64_t)len + offs > size)
return -1;
 
-   return pread64(vfio_dev_fd, buf, len,
-  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+   return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
const void *buf, size_t len, off_t offs)
 {
-   int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+   uint64_t size, offset;
+   int fd;
 
-   if (vfio_dev_fd < 0)
+   fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+   if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+   &size, &offset) != 0)
return -1;
 
-   return pwrite64(vfio_dev_fd, buf, len,
-  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+   if ((uint64_t)len + offs > size)
+   return -1;
+
+   return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(in

[PATCH v4 3/4] bus/pci: introduce helper for MMIO read and write

2023-05-30 Thread Miao Li
From: Chenbo Xia 

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should explicitly do read and write
to access these regions.

Signed-off-by: Chenbo Xia 
Acked-by: Sunil Kumar Kori 
Acked-by: Yahui Cao 
---
 doc/guides/rel_notes/release_23_07.rst |  5 +++
 drivers/bus/pci/bsd/pci.c  | 22 
 drivers/bus/pci/linux/pci.c| 46 
 drivers/bus/pci/linux/pci_init.h   | 10 ++
 drivers/bus/pci/linux/pci_uio.c| 22 
 drivers/bus/pci/linux/pci_vfio.c   | 36 +++
 drivers/bus/pci/rte_bus_pci.h  | 48 ++
 drivers/bus/pci/version.map|  3 ++
 8 files changed, 192 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_07.rst 
b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..dba39134f1 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -55,6 +55,11 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added MMIO read and write APIs to PCI bus.**
+
+  Introduced ``rte_pci_mmio_read()`` and ``rte_pci_mmio_write()`` APIs to PCI
+  bus so that PCI drivers can access PCI memory resources when they are not
+  mapped to process address space.
 
 Removed Items
 -
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+ void *buf, size_t len, off_t offset)
+{
+   if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+   (uint64_t)offset + len > dev->mem_resource[bar].len)
+   return -1;
+   memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+   return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+  const void *buf, size_t len, off_t offset)
+{
+   if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+   (uint64_t)offset + len > dev->mem_resource[bar].len)
+   return -1;
+   memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+   return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device 
*device,
}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+   void *buf, size_t len, off_t offset)
+{
+   char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+   switch (device->kdrv) {
+   case RTE_PCI_KDRV_IGB_UIO:
+   case RTE_PCI_KDRV_UIO_GENERIC:
+   return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+   case RTE_PCI_KDRV_VFIO:
+   return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+   default:
+   rte_pci_device_name(&device->addr, devname,
+   RTE_DEV_NAME_MAX_LEN);
+   RTE_LOG(ERR, EAL,
+   "Unknown driver type for %s\n", devname);
+   return -1;
+   }
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+   const void *buf, size_t len, off_t offset)
+{
+   char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+   switch (device->kdrv) {
+   case RTE_PCI_KDRV_IGB_UIO:
+   case RTE_PCI_KDRV_UIO_GENERIC:
+   return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+   case RTE_PCI_KDRV_VFIO:
+   return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+   default:
+   rte_pci_device_name(&device->addr, devname,
+   RTE_DEV_NAME_MAX_LEN);
+   RTE_LOG(ERR, EAL,
+   "Unknown driver type for %s\n", devname);
+   return -1;
+   }
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle 
*intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,

[PATCH v4 4/4] bus/pci: add VFIO sparse mmap support

2023-05-30 Thread Miao Li
This patch adds sparse mmap support to the PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas in one
VFIO region.

In this patch, the sparse mmap regions are mapped into one contiguous
virtual address region that follows the device-specific BAR layout, so
the driver can still access all mapped sparse mmap regions by using
'bar_base_address + bar_offset'.

Signed-off-by: Miao Li 
Signed-off-by: Chenbo Xia 
Acked-by: Sunil Kumar Kori 
Acked-by: Yahui Cao 
---
 drivers/bus/pci/linux/pci_vfio.c | 138 +++
 drivers/bus/pci/private.h|   2 +
 2 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 24b0795fbd..e6db30d36a 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct 
mapped_pci_resource *vfio_res,
return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+   int bar_index, int additional_flags)
+{
+   struct pci_map *bar = &vfio_res->maps[bar_index];
+   struct vfio_region_sparse_mmap_area *sparse;
+   void *bar_addr;
+   uint32_t i;
+
+   if (bar->size == 0) {
+   RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+   return 0;
+   }
+
+   /* reserve the address using an inaccessible mapping */
+   bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+   MAP_ANONYMOUS | additional_flags, -1, 0);
+   if (bar_addr != MAP_FAILED) {
+   void *map_addr = NULL;
+   for (i = 0; i < bar->nr_areas; i++) {
+   sparse = &bar->areas[i];
+   if (sparse->size) {
+   void *addr = RTE_PTR_ADD(bar_addr, 
(uintptr_t)sparse->offset);
+   map_addr = pci_map_resource(addr, vfio_dev_fd,
+   bar->offset + sparse->offset, 
sparse->size,
+   RTE_MAP_FORCE_ADDRESS);
+   if (map_addr == NULL) {
+   munmap(bar_addr, bar->size);
+   RTE_LOG(ERR, EAL, "Failed to map pci 
BAR%d\n",
+   bar_index);
+   goto err_map;
+   }
+   }
+   }
+   } else {
+   RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for 
BAR%d\n",
+   bar_index);
+   goto err_map;
+   }
+
+   bar->addr = bar_addr;
+   return 0;
+
+err_map:
+   bar->nr_areas = 0;
+   return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -798,7 +846,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
char pci_addr[PATH_MAX] = {0};
int vfio_dev_fd;
struct rte_pci_addr *loc = &dev->addr;
-   int i, ret;
+   int i, j, ret;
struct mapped_pci_resource *vfio_res = NULL;
struct mapped_pci_res_list *vfio_res_list =
RTE_TAILQ_CAST(rte_vfio_tailq.head, mapped_pci_res_list);
@@ -875,13 +923,15 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
for (i = 0; i < vfio_res->nb_maps; i++) {
void *bar_addr;
+   struct vfio_info_cap_header *hdr;
+   struct vfio_region_info_cap_sparse_mmap *sparse;
 
ret = pci_vfio_get_region_info(vfio_dev_fd, ®, i);
if (ret < 0) {
RTE_LOG(ERR, EAL,
"%s cannot get device region info error "
"%i (%s)\n", pci_addr, errno, strerror(errno));
-   goto err_vfio_res;
+   goto err_map;
}
 
pdev->region[i].size = reg->size;
@@ -891,7 +941,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
if (ret < 0) {
free(reg);
-   goto err_vfio_res;
+   goto err_map;
} else if (ret) {
RTE_LOG(INFO, EAL, "Ignore mapping IO port bar(%d)\n",
i);
@@ -920,12 +970,41 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
maps[i].size = reg->size;
maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-   ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-   if (ret < 0) {
-   RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-  

RE: [PATCH] pci: fix comment referencing renamed function

2023-05-30 Thread Xia, Chenbo
> -Original Message-
> From: Thomas Monjalon 
> Sent: Wednesday, May 31, 2023 12:02 AM
> To: dev@dpdk.org
> Cc: sta...@dpdk.org; Gaetan Rivet ; David Marchand
> 
> Subject: [PATCH] pci: fix comment referencing renamed function
> 
> When renaming functions eal_parse_pci_*,
> a referencing comment was missed in the function rte_pci_device_name().
> 
> Fixes: ca52fccbb3b9 ("pci: remove deprecated functions")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  lib/pci/rte_pci.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
> index 5088157e74..aab761b918 100644
> --- a/lib/pci/rte_pci.h
> +++ b/lib/pci/rte_pci.h
> @@ -104,8 +104,7 @@ struct rte_pci_addr {
> 
>  /**
>   * Utility function to write a pci device name, this device name can
> later be
> - * used to retrieve the corresponding rte_pci_addr using eal_parse_pci_*
> - * BDF helpers.
> + * used to retrieve the corresponding rte_pci_addr using
> rte_pci_addr_parse().
>   *
>   * @param addr
>   *   The PCI Bus-Device-Function address
> --
> 2.40.1

Reviewed-by: Chenbo Xia 


RE: [PATCH v6 1/4] ethdev: add API for mbufs recycle mode

2023-05-30 Thread Feifei Wang



> -Original Message-
> From: Morten Brørup 
> Sent: Thursday, May 25, 2023 11:09 PM
> To: Feifei Wang ; tho...@monjalon.net; Ferruh
> Yigit ; Andrew Rybchenko
> 
> Cc: dev@dpdk.org; nd ; Honnappa Nagarahalli
> ; Ruifeng Wang
> 
> Subject: RE: [PATCH v6 1/4] ethdev: add API for mbufs recycle mode
> 
> > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > Sent: Thursday, 25 May 2023 11.46
> >
> > Add 'rte_eth_recycle_rx_queue_info_get' and 'rte_eth_recycle_mbufs'
> > APIs to recycle used mbufs from a transmit queue of an Ethernet
> > device, and move these mbufs into a mbuf ring for a receive queue of
> > an Ethernet device. This can bypass mempool 'put/get' operations hence
> > saving CPU cycles.
> >
> > For each recycling mbufs, the rte_eth_recycle_mbufs() function
> > performs the following operations:
> > - Copy used *rte_mbuf* buffer pointers from Tx mbuf ring into Rx mbuf
> > ring.
> > - Replenish the Rx descriptors with the recycling *rte_mbuf* mbufs
> > freed from the Tx mbuf ring.
> >
> > Suggested-by: Honnappa Nagarahalli 
> > Suggested-by: Ruifeng Wang 
> > Signed-off-by: Feifei Wang 
> > Reviewed-by: Ruifeng Wang 
> > Reviewed-by: Honnappa Nagarahalli 
> > ---
> 
> [...]
> 
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index 2c9d615fb5..c6723d5277 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -59,6 +59,10 @@ struct rte_eth_dev {
> > eth_rx_descriptor_status_t rx_descriptor_status;
> > /** Check the status of a Tx descriptor */
> > eth_tx_descriptor_status_t tx_descriptor_status;
> > +   /** Pointer to PMD transmit mbufs reuse function */
> > +   eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
> > +   /** Pointer to PMD receive descriptors refill function */
> > +   eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
> >
> > /**
> >  * Device data that is shared between primary and secondary
> > processes
> 
> The rte_eth_dev struct currently looks like this:
> 
> /**
>  * @internal
>  * The generic data structure associated with each Ethernet device.
>  *
>  * Pointers to burst-oriented packet receive and transmit functions are
>  * located at the beginning of the structure, along with the pointer to
>  * where all the data elements for the particular device are stored in shared
>  * memory. This split allows the function pointer and driver data to be per-
>  * process, while the actual configuration data for the device is shared.
>  */
> struct rte_eth_dev {
>   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function */
>   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function */
> 
>   /** Pointer to PMD transmit prepare function */
>   eth_tx_prep_t tx_pkt_prepare;
>   /** Get the number of used Rx descriptors */
>   eth_rx_queue_count_t rx_queue_count;
>   /** Check the status of a Rx descriptor */
>   eth_rx_descriptor_status_t rx_descriptor_status;
>   /** Check the status of a Tx descriptor */
>   eth_tx_descriptor_status_t tx_descriptor_status;
> 
>   /**
>* Device data that is shared between primary and secondary
> processes
>*/
>   struct rte_eth_dev_data *data;
>   void *process_private; /**< Pointer to per-process device data */
>   const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD
> */
>   struct rte_device *device; /**< Backing device */
>   struct rte_intr_handle *intr_handle; /**< Device interrupt handle */
> 
>   /** User application callbacks for NIC interrupts */
>   struct rte_eth_dev_cb_list link_intr_cbs;
>   /**
>* User-supplied functions called from rx_burst to post-process
>* received packets before passing them to the user
>*/
>   struct rte_eth_rxtx_callback
> *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>   /**
>* User-supplied functions called from tx_burst to pre-process
>* received packets before passing them to the driver for transmission
>*/
>   struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> 
>   enum rte_eth_dev_state state; /**< Flag indicating the port state */
>   void *security_ctx; /**< Context for security ops */ }
> __rte_cache_aligned;
> 
> Inserting the two new function pointers (recycle_tx_mbufs_reuse and
> recycle_rx_descriptors_refill) as the 7th and 8th fields will move the 'data' 
> and
> 'process_private' pointers out of the first cache line.
> 
> If those data pointers are used in the fast path with the rx_pkt_burst and
> tx_pkt_burst functions, moving them to a different cache line might have a
> performance impact on those two functions.
> 
> Disclaimer: This is a big "if", and wild speculation from me, because I 
> haven't
> looked at it in detail! If this structure is not used in the fast path like 
> this, you
> can ignore my suggestion below.
> 
> Please consider moving the 'data' and 'process_p
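
One way to check the speculation above is to print the field offsets and see
which cache line each member starts on. A throwaway sketch against the
internal ethdev_driver.h header, assuming 64-byte cache lines:

#include <stdio.h>
#include <stddef.h>
#include <ethdev_driver.h>

int
main(void)
{
	/* Which cache line does each hot pointer start on? */
	printf("data: offset %zu (cache line %zu)\n",
	       offsetof(struct rte_eth_dev, data),
	       offsetof(struct rte_eth_dev, data) / RTE_CACHE_LINE_SIZE);
	printf("process_private: offset %zu (cache line %zu)\n",
	       offsetof(struct rte_eth_dev, process_private),
	       offsetof(struct rte_eth_dev, process_private) / RTE_CACHE_LINE_SIZE);
	return 0;
}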

RE: [PATCH v3 3/4] vhost: fix invalid call FD handling

2023-05-30 Thread Xia, Chenbo
> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, May 30, 2023 8:54 PM
> To: Eelco Chaudron ; Xia, Chenbo
> ; david.march...@redhat.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 3/4] vhost: fix invalid call FD handling
> 
> 
> 
> On 5/17/23 11:09, Eelco Chaudron wrote:
> > This patch fixes cases where IRQ injection is tried while
> > the call FD is not valid, which should not happen.
> >
> > Fixes: b1cce26af1dc ("vhost: add notification for packed ring")
> > Fixes: e37ff954405a ("vhost: support virtqueue interrupt/notification
> suppression")
> >
> > Signed-off-by: Maxime Coquelin 
> > Signed-off-by: Eelco Chaudron 
> > ---
> >   lib/vhost/vhost.h |8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > index 37609c7c8d..23a4e2b1a7 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -903,9 +903,9 @@ vhost_vring_call_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq)
> > "%s: used_event_idx=%d, old=%d, new=%d\n",
> > __func__, vhost_used_event(vq), old, new);
> >
> > -   if ((vhost_need_event(vhost_used_event(vq), new, old) &&
> > -   (vq->callfd >= 0)) ||
> > -   unlikely(!signalled_used_valid)) {
> > +   if ((vhost_need_event(vhost_used_event(vq), new, old) ||
> > +   unlikely(!signalled_used_valid)) &&
> > +   vq->callfd >= 0) {
> > eventfd_write(vq->callfd, (eventfd_t) 1);
> > if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
> > 
> > __atomic_fetch_add(&vq->stats.guest_notifications,
> > @@ -974,7 +974,7 @@ vhost_vring_call_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq)
> > if (vhost_need_event(off, new, old))
> > kick = true;
> >   kick:
> > -   if (kick) {
> > +   if (kick && vq->callfd >= 0) {
> > eventfd_write(vq->callfd, (eventfd_t)1);
> > if (dev->notify_ops->guest_notified)
> > dev->notify_ops->guest_notified(dev->vid);
> >
> 
> Reporting Chenbo's R-by, from the VDUSE series RFC:
> 
> Reviewed-by: Chenbo Xia 

Thanks Maxime! By the way, what's your plan for the same fix in the VDUSE
series? Do you plan to drop it from the VDUSE series?

Thanks,
Chenbo



RE: [PATCH v3 4/4] vhost: add device op to offload the interrupt kick

2023-05-30 Thread Xia, Chenbo
> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, May 30, 2023 11:17 PM
> To: Thomas Monjalon ; Eelco Chaudron
> ; Xia, Chenbo ;
> david.march...@redhat.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 4/4] vhost: add device op to offload the interrupt
> kick
> 
> 
> 
> On 5/30/23 15:16, Thomas Monjalon wrote:
> > 30/05/2023 15:02, Maxime Coquelin:
> >>
> >> On 5/17/23 11:09, Eelco Chaudron wrote:
> >>> This patch adds an operation callback which gets called every time the
> >>> library wants to call eventfd_write(). This eventfd_write() call could
> >>> result in a system call, which could potentially block the PMD thread.
> >>>
> >>> The callback function can decide whether it's ok to handle the
> >>> eventfd_write() now or have the newly introduced function,
> >>> rte_vhost_notify_guest(), called at a later time.
> >>>
> >>> This can be used by 3rd party applications, like OVS, to avoid system
> >>> calls being called as part of the PMD threads.
> >>>
> >>> Signed-off-by: Eelco Chaudron 
> >>> ---
> >>>lib/vhost/meson.build |2 ++
> >>>lib/vhost/rte_vhost.h |   23 +-
> >>>lib/vhost/socket.c|   63
> ++---
> >>>lib/vhost/version.map |9 +++
> >>>lib/vhost/vhost.c |   38 ++
> >>>lib/vhost/vhost.h |   58 --
> ---
> >>>6 files changed, 171 insertions(+), 22 deletions(-)
> >>>
> >>
> >>
> >> The patch looks good to me, but that's the first time we use function
> >> versioning in Vhost library, so I'd like another pair of eyes to be
> sure
> >> I don't miss anything.
> >>
> >> Reviewed-by: Maxime Coquelin 
> >>
> >> Thomas, do we need to mention it somewhere in the release note?
> >
> > If compatibility is kept, I think we don't need to mention it.
> >
> >
> 
> Thanks Thomas for the information.
> 
> Maxime

About the release note: besides the versioning, there is also one new API
introduced in this patch, so we still need to mention it in the release notes.

Thanks,
Chenbo


RE: [PATCH v3 1/4] vhost: change vhost_virtqueue access lock to a read/write one

2023-05-30 Thread Xia, Chenbo
Hi Eelco,

> -Original Message-
> From: Eelco Chaudron 
> Sent: Wednesday, May 17, 2023 5:09 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo ;
> david.march...@redhat.com
> Cc: dev@dpdk.org
> Subject: [PATCH v3 1/4] vhost: change vhost_virtqueue access lock to a
> read/write one
> 
> This change will allow the vhost interrupt datapath handling to be split
> between two processes without one of them holding an explicit lock.
> 
> Signed-off-by: Eelco Chaudron 
> ---
>  lib/eal/include/generic/rte_rwlock.h |   17 ++
>  lib/vhost/vhost.c|   46 +
>  lib/vhost/vhost.h|4 +-
>  lib/vhost/vhost_user.c   |   14 +++--
>  lib/vhost/virtio_net.c   |   90 +
> -
>  5 files changed, 94 insertions(+), 77 deletions(-)
> 
> diff --git a/lib/eal/include/generic/rte_rwlock.h
> b/lib/eal/include/generic/rte_rwlock.h
> index 71e2d8d5f4..9e083bbc61 100644
> --- a/lib/eal/include/generic/rte_rwlock.h
> +++ b/lib/eal/include/generic/rte_rwlock.h
> @@ -236,6 +236,23 @@ rte_rwlock_write_unlock(rte_rwlock_t *rwl)
>   __atomic_fetch_sub(&rwl->cnt, RTE_RWLOCK_WRITE, __ATOMIC_RELEASE);
>  }
> 
> +/**
> + * Test if the write lock is taken.
> + *
> + * @param rwl
> + *   A pointer to a rwlock structure.
> + * @return
> + *   1 if the write lock is currently taken; 0 otherwise.
> + */
> +static inline int
> +rte_rwlock_write_is_locked(rte_rwlock_t *rwl)
> +{
> + if (__atomic_load_n(&rwl->cnt, __ATOMIC_RELAXED) & RTE_RWLOCK_WRITE)
> + return 1;
> +
> + return 0;
> +}
> +

Again, we need to update the release notes, as this is a new EAL API.

>  /**
>   * Try to execute critical section in a hardware memory transaction, if
> it
>   * fails or not available take a read lock
> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
> index ef37943817..74bdbfd810 100644
> --- a/lib/vhost/vhost.c
> +++ b/lib/vhost/vhost.c
> @@ -393,9 +393,9 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue
> *vq)
>   else
>   rte_free(vq->shadow_used_split);
> 
> - rte_spinlock_lock(&vq->access_lock);
> + rte_rwlock_write_lock(&vq->access_lock);
>   vhost_free_async_mem(vq);
> - rte_spinlock_unlock(&vq->access_lock);
> + rte_rwlock_write_unlock(&vq->access_lock);
>   rte_free(vq->batch_copy_elems);
>   vhost_user_iotlb_destroy(vq);
>   rte_free(vq->log_cache);
> @@ -630,7 +630,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t
> vring_idx)
> 
>   dev->virtqueue[i] = vq;
>   init_vring_queue(dev, vq, i);
> - rte_spinlock_init(&vq->access_lock);
> + rte_rwlock_init(&vq->access_lock);
>   vq->avail_wrap_counter = 1;
>   vq->used_wrap_counter = 1;
>   vq->signalled_used_valid = false;
> @@ -1305,14 +1305,14 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx)
>   if (!vq)
>   return -1;
> 
> - rte_spinlock_lock(&vq->access_lock);
> + rte_rwlock_read_lock(&vq->access_lock);
> 
>   if (vq_is_packed(dev))
>   vhost_vring_call_packed(dev, vq);
>   else
>   vhost_vring_call_split(dev, vq);
> 
> - rte_spinlock_unlock(&vq->access_lock);
> + rte_rwlock_read_unlock(&vq->access_lock);

Not sure about this. vhost_vring_call_packed/split changes some fields in the
vq. Should we use a write lock here?

Thanks,
Chenbo
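
For reference on the semantics in question: several readers can hold the
lock at once, so any state modified under the read side needs its own
atomicity, while the write side is exclusive. A generic sketch, not vhost
code:

#include <stdint.h>
#include <rte_rwlock.h>

static rte_rwlock_t lock = RTE_RWLOCK_INITIALIZER;
static uint64_t config; /* written only under the write lock */
static uint64_t hits;   /* updated by readers, hence atomic */

static void
reader_path(void)
{
	rte_rwlock_read_lock(&lock);
	/* Many threads may be here concurrently: 'config' is stable,
	 * but anything we modify must be updated atomically. */
	__atomic_fetch_add(&hits, 1, __ATOMIC_RELAXED);
	rte_rwlock_read_unlock(&lock);
}

static void
writer_path(uint64_t new_cfg)
{
	rte_rwlock_write_lock(&lock);
	config = new_cfg; /* exclusive access: a plain store is fine */
	rte_rwlock_write_unlock(&lock);
}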



[PATCH] eal/linux: register mp hotplug callback after memory init

2023-05-30 Thread Zhihong Wang
A secondary process would crash if it tried to handle multi-process requests
before memory init, since globals such as eth_dev_shared_data_lock are not
yet accessible to it at that point.
---
 lib/eal/linux/eal.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index ae323cd492..a74d564597 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1058,12 +1058,6 @@ rte_eal_init(int argc, char **argv)
}
}
 
-   /* register multi-process action callbacks for hotplug */
-   if (eal_mp_dev_hotplug_init() < 0) {
-   rte_eal_init_alert("failed to register mp callback for 
hotplug");
-   return -1;
-   }
-
if (rte_bus_scan()) {
rte_eal_init_alert("Cannot scan the buses for devices");
rte_errno = ENODEV;
@@ -1221,6 +1215,12 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
+   /* register multi-process action callbacks for hotplug after memory 
init */
+   if (eal_mp_dev_hotplug_init() < 0) {
+   rte_eal_init_alert("failed to register mp callback for 
hotplug");
+   return -1;
+   }
+
if (rte_eal_tailqs_init() < 0) {
rte_eal_init_alert("Cannot init tail queues for objects");
rte_errno = EFAULT;
-- 
2.11.0
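
The constraint generalizes to any multi-process action handler: registering
it makes it callable immediately, so registration must not happen before
everything the handler touches is initialized. A minimal sketch using the
public API (handler name and body are made up):

#include <rte_eal.h>
#include <rte_common.h>

/* This handler may be invoked as soon as it is registered, so it must
 * only be registered once the shared memory it dereferences exists. */
static int
hotplug_mp_handler(const struct rte_mp_msg *msg, const void *peer)
{
	RTE_SET_USED(msg);
	RTE_SET_USED(peer);
	/* ... access shared ethdev data here ... */
	return 0;
}

/* Called after memory init, mirroring the ordering in the fix above: */
static int
register_after_memory_init(void)
{
	return rte_mp_action_register("example_hotplug", hotplug_mp_handler);
}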


