[PATCH] net/nfp: fix return value check problem

2024-10-17 Thread Chaoyong He
Fix one return value check problem found by the CI.

Coverity issue: 445519
Fixes: 08ea495d624b ("net/nfp: support loading firmware from flash")
Cc: sta...@dpdk.org

Signed-off-by: Chaoyong He 
Reviewed-by: Peng Zhang 
---
 drivers/net/nfp/nfp_ethdev.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index b16fbe7db7..812c48ff4c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -1807,7 +1807,13 @@ nfp_enable_multi_pf(struct nfp_pf_dev *pf_dev)
net_hw.tx_bar = pf_dev->qc_bar + tx_base * NFP_QCP_QUEUE_ADDR_SZ;
nfp_net_cfg_queue_setup(&net_hw);
rte_spinlock_init(&hw->reconfig_lock);
-   nfp_ext_reconfig(&net_hw.super, NFP_NET_CFG_CTRL_MULTI_PF, NFP_NET_CFG_UPDATE_GEN);
+   err = nfp_ext_reconfig(&net_hw.super, NFP_NET_CFG_CTRL_MULTI_PF,
+   NFP_NET_CFG_UPDATE_GEN);
+   if (err != 0) {
+   PMD_INIT_LOG(ERR, "Configure multiple PF failed.");
+   goto end;
+   }
+
 end:
nfp_cpp_area_release_free(area);
return err;
-- 
2.39.1
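
For readers unfamiliar with the pattern this fix applies, here is a minimal standalone sketch of "check the return value, log, and jump to the shared cleanup label". The helpers fake_reconfig() and fake_area_release() are hypothetical stand-ins and are not part of the nfp driver; only the control flow mirrors the patch.

```c
#include <stdio.h>

/* Counts how many times the cleanup path ran (for illustration only). */
static int release_count;

/* Hypothetical stand-in for a reconfiguration call that may fail. */
static int fake_reconfig(int should_fail)
{
	return should_fail ? -5 : 0;
}

/* Hypothetical stand-in for the resource release done at the 'end' label. */
static void fake_area_release(void)
{
	release_count++;
}

static int enable_feature(int should_fail)
{
	int err;

	err = fake_reconfig(should_fail);
	if (err != 0) {
		fprintf(stderr, "Configure failed: %d\n", err);
		goto end;
	}

	/* Any further setup steps would go here and are skipped on failure. */

end:
	/* Cleanup runs on both the success and the failure path. */
	fake_area_release();
	return err;
}
```

The point of the single label is that the error code is propagated to the caller while the release still happens exactly once per call.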



Re: [EXTERNAL] Re: [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Nitin Saxena
Hi Robin,

See inline comments

Thanks,
Nitin

On Thu, Oct 17, 2024 at 1:20 PM Robin Jarry  wrote:
>
> Hi Nitin, all,
>
> Nitin Saxena, Oct 17, 2024 at 09:03:
> > Hi Robin/David and all,
> >
> > We realized the feature arc patch series is difficult to understand as
> > a new concept. Our objectives with the feature arc changes are as follows:
> >
> > 1. Allow reusability of standard DPDK nodes (defined in lib/nodes/*)
> >with out-of-tree applications (like grout). Currently out-of-tree
> >graph applications are duplicating standard nodes but not reusing
> >the standard ones which are available. In the long term, we would
> >like to mature standard DPDK nodes with flexibility of hooking them
> >to out-of-tree application nodes.
>
> It would be ideal if the in-built nodes could be reused. When we started
> working on grout, I tried multiple approaches where I could reuse these
> nodes, but all failed. The nodes public API seems tailored for app/graph
> but does not fit well with other control plane implementations.
>
> One of the main issues I had is that the ethdev_rx and ethdev_tx nodes
> are cloned per rxq / txq associated with a graph worker. The rte_node
> API requires that every clone has a unique name. This in turn makes hot
> plugging of DPDK ports very complex, if not impossible.

Agreed. I guess hot plugging of DPDK ports was not an objective when the
initial changes went in. But we can add hot-plugging functionality
without affecting performance.

>
> For example, with the in-built nodes, it is not possible to change the
> number of ports or their number of RX queues without destroying the
> whole graph and creating a new one from scratch.

Coincidentally, I have also encountered these technical issues while
writing an out-of-tree application [1]. I had internal discussions
with @Jerin Jacob and other graph maintainers to fix these
shortcomings. If you want, we can collaborate on fixing these issues.

For [port, rq] pair mapping to worker core, I have an alternate design
[2] which currently stops worker cores. It can be enhanced with an
RCU-based scheme for an ideal DPDK implementation.

[1]: https://marvellembeddedprocessors.github.io/dao/guides/applications/secgw-graph.html
[2]: https://github.com/MarvellEmbeddedProcessors/dao/blob/dao-devel/app/secgw-graph/nodes/rxtx/ethdev-rx.c#L27

>
> Also, the current implementation of "ip{4,6}-rewrite" handles writing
> ethernet header data. This would prevent using this node for an
> IP-in-IP tunnel interface as we did in grout.

For IP-in-IP, a separate rewrite node would be required which computes
the checksum etc. but does not add rewrite data.

>
> Do you think we could change the in-built nodes to enforce OSI layer
> separation of concerns? It would make them much more flexible.

Yes. We also agree on making the in-built nodes RFC-compliant and
optimized, with such flexibility in place.

> It may
> cause a slight drop of performance because you'd be splitting processing
> in two different nodes. But I think flexibility is more important.
> Otherwise, the in-built nodes can only be used for very specific
> use-cases.
>
> Finally, I would like to improve the rte_node API to allow defining and
> enforcing per-packet metadata that every node expects as input. The
> current in-built nodes rely on mbuf dynamic fields for this but this
> means you only have 9x32 bits available. And using all of these may
> break some drivers (ixgbe) that rely on dynfields to work. Have you
> considered using mbuf private data for this?

IMO, "node_mbuf_priv_t" would be ideal for most of the use-cases as
it fits in the second 64B cache line. With mbuf private data, the fast path
has to access another cache line per packet, which may not be
efficient from a performance PoV. But we can discuss this in more detail.
That said, I thought of adding "sw_if_index" (which is not the
same as port_id) to accommodate IP-in-IP-like software interfaces.

>
> >
> > 2. Flexibility to enable/disable sub-graphs per interface based on the
> >runtime configuration updates. Protocol sub-graphs can be
> >selectively enabled for few (or all interfaces) at runtime
> >
> > 3. More than one sub-graphs/features can be enabled on an interface.
> >So a packet has to follow a sequential ordering node path on worker
> >cores. Packets may need to move from one sub-graph to another
> >sub-graph per interface
> >
> > 4. Last but not least, an optimized implementation which does not (or
> >minimally) stop worker cores for any control plane runtime updates.
> >Any performance regression should also be avoided
> >
> > I am planning to create a draft presentation on feature arc which
> > I can share, when ready, to discuss. If needed, I can also plan to
> > present that in one of the DPDK community meetings. There we can also
> > discuss whether there are any alternatives for achieving the above objectives.
>
> Looking forward to this.

Sure. Will share ppt asap

>
> Thanks!
>
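
To make objectives 2 and 3 above more concrete, here is a rough standalone sketch of per-interface feature enablement with a fixed traversal order. It is not the API proposed in the RFC series: every name in it (feature_enable, feature_disable, feature_next, MAX_INTERFACES, MAX_FEATURES) is hypothetical, and it ignores the synchronization with worker cores that objective 4 calls for.

```c
#include <stdint.h>

#define MAX_INTERFACES 8   /* hypothetical limit */
#define MAX_FEATURES   32  /* one bit per feature */

/* Per-interface bitmask; the bit position doubles as the order in the arc. */
static uint32_t enabled[MAX_INTERFACES];

static int feature_enable(unsigned int ifindex, unsigned int feature)
{
	if (ifindex >= MAX_INTERFACES || feature >= MAX_FEATURES)
		return -1;
	enabled[ifindex] |= UINT32_C(1) << feature;
	return 0;
}

static int feature_disable(unsigned int ifindex, unsigned int feature)
{
	if (ifindex >= MAX_INTERFACES || feature >= MAX_FEATURES)
		return -1;
	enabled[ifindex] &= ~(UINT32_C(1) << feature);
	return 0;
}

/*
 * Return the first enabled feature whose index is greater than 'current',
 * or -1 when none is left (the packet would then leave the arc).
 * Pass current = -1 to get the first enabled feature.
 */
static int feature_next(unsigned int ifindex, int current)
{
	uint32_t mask;
	int f;

	if (ifindex >= MAX_INTERFACES || current < -1 || current >= MAX_FEATURES)
		return -1;
	mask = enabled[ifindex];
	for (f = current + 1; f < MAX_FEATURES; f++)
		if (mask & (UINT32_C(1) << f))
			return f;
	return -1;
}
```

In a real data path the bitmask would be read by worker cores while the control plane updates it, so some atomic or RCU-style scheme would be needed; the sketch is single-threaded on purpose.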


DPDK Release Status Meeting 2024-10-17

2024-10-17 Thread Mcnamara, John
Release status meeting minutes 2024-10-17
=========================================

Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

Participants:
* AMD
* ARM
* Intel
* Marvell
* Nvidia
* Red Hat


Release Dates
-------------

The following are the current/updated working dates for 24.11:

- Proposal deadline (RFC/v1 patches): 7 September 2024
- API freeze (-rc1): 18 October 2024
- PMD features freeze (-rc2): 28 October 2024
- Builtin applications features freeze (-rc3): 4 November 2024
- Release: 18 November 2024

https://core.dpdk.org/roadmap/#dates


Subtrees
--------

* next-net
  * Most ethdev merged for RC1
  * Most driver patches merged apart from new one which
    will be merged post RC1
  * 2 New PMD: net/sxe, net/xsc - under review
  * 2 Updating: net/r8169, net/zxdh. Also Napatech: net/ntnic.
  * Bonding needs a maintainer

* next-net-intel
  * 4 base code updates applied.
  * Tree pulled for RC1
  * Some patchsets pending for RC2

* next-net-mlx
  * No updates this week.

* next-net-mvl
  * Merged for RC1.
  * Some patches for RC2.

* next-eventdev
  * Merged for RC1.
  * Some patches for RC2.

* next-baseband
  * No updates this week.

* next-virtio
  * No updates this week.

* next-crypto
  * No updates this week.

* next-dts
  * Capabilities merged.
  * ~10 other patches waiting for dependency to be merged.

* main
  * Lots of patches merged ahead of RC1.
  * RC1 targeted for Friday 18 October


**Note**:
  We will look to move this meeting to a US friendly time slot at
  the start of November.


LTS
---

Status of the current LTSes

* 23.11.2 - Released.
* 22.11.6 - Released.
* 21.11.8 - Released.

* 20.11.10 - Will only be updated with CVE and critical fixes.
* 19.11.15 - Will only be updated with CVE and critical fixes.


* Distros
  * Debian 12 contains DPDK v22.11
  * Ubuntu 24.04 contains DPDK v23.11
  * Ubuntu 23.04 contains DPDK v22.11
  * RHEL 8/9 contains DPDK 23.11

Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  * https://www.dpdk.org/hosted-projects/



DPDK Release Status Meetings
----------------------------

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs every Thursday at 9:30 DST over Jitsi on
https://meet.jit.si/DPDK

You don't need an invite to join the meeting, but if you want a calendar
reminder just send an email to "John McNamara john.mcnam...@intel.com" for the
invite.


[PATCH v2 2/2] examples/l3fwd: add option to set mbuf cache size

2024-10-17 Thread Jie Hai
The mempool cache size of mbuf is set to
RTE_MEMPOOL_CACHE_MAX_SIZE by default. This patch allows
users to configure the cache size with "--mbcache", and limits
the parameter to a maximum of RTE_MEMPOOL_CACHE_MAX_SIZE.

Signed-off-by: Jie Hai 
---
 examples/l3fwd/l3fwd.h |  1 +
 examples/l3fwd/main.c  | 33 ++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 618e0eaa3af1..0cce3406ee7d 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -117,6 +117,7 @@ extern struct acl_algorithms acl_alg[];
 extern uint32_t max_pkt_len;
 
 extern uint32_t nb_pkt_per_burst;
+extern uint32_t mb_mempool_cache_size;
 
 /* Send burst of packets on an output interface */
 static inline int
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 2feae5b311a2..1e2da739dded 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -58,6 +58,7 @@ static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST);
 uint16_t nb_rxd = RX_DESC_DEFAULT;
 uint16_t nb_txd = TX_DESC_DEFAULT;
 uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
+uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE;
 
 /**< Ports set in promiscuous mode off by default. */
 static int promiscuous_on;
@@ -399,6 +400,7 @@ print_usage(const char *prgname)
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
" [--burst NPKTS]"
+   " [--mbcache CACHESZ]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
" [--max-pkt-len PKTLEN]"
" [--no-numa]"
@@ -426,6 +428,8 @@ print_usage(const char *prgname)
"Default: %d\n"
"  --burst NPKTS: Burst size in decimal\n"
"Default: %d\n"
+   "  --mbcache CACHESZ: Cache size in decimal\n"
+   "Default: %d\n"
"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for port X\n"
"  --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
"  --no-numa: Disable numa awareness\n"
@@ -455,7 +459,7 @@ print_usage(const char *prgname)
"another is route entry at while line leads 
with character '%c'.\n"
"  --rule_ipv6=FILE: Specify the ipv6 rules entries file.\n"
"  --alg: ACL classify method to use, one of: %s.\n\n",
-   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST,
+   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST, MEMPOOL_CACHE_SIZE,
ACL_LEAD_CHAR, ROUTE_LEAD_CHAR, alg);
 }
 
@@ -673,6 +677,22 @@ parse_lookup(const char *optarg)
return 0;
 }
 
+static void
+parse_mbcache_size(const char *optarg)
+{
+   unsigned long mb_cache_size;
+   char *end = NULL;
+
+   mb_cache_size = strtoul(optarg, &end, 10);
+   if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return;
+   if (mb_cache_size <= RTE_MEMPOOL_CACHE_MAX_SIZE)
+   mb_mempool_cache_size = (uint32_t)mb_cache_size;
+   else
+   rte_exit(EXIT_FAILURE, "mbcache must be >= 0 and <= %d\n",
+RTE_MEMPOOL_CACHE_MAX_SIZE);
+}
+
 static void
 parse_pkt_burst(const char *optarg)
 {
@@ -748,6 +768,7 @@ static const char short_options[] =
 #define CMD_LINE_OPT_RULE_IPV6 "rule_ipv6"
 #define CMD_LINE_OPT_ALG "alg"
 #define CMD_LINE_OPT_PKT_BURST "burst"
+#define CMD_LINE_OPT_MB_CACHE_SIZE "mbcache"
 
 enum {
/* long options mapped to a short option */
@@ -777,7 +798,8 @@ enum {
CMD_LINE_OPT_ENABLE_VECTOR_NUM,
CMD_LINE_OPT_VECTOR_SIZE_NUM,
CMD_LINE_OPT_VECTOR_TMO_NS_NUM,
-   CMD_LINE_OPT_PKT_BURST_NUM
+   CMD_LINE_OPT_PKT_BURST_NUM,
+   CMD_LINE_OPT_MB_CACHE_SIZE_NUM
 };
 
 static const struct option lgopts[] = {
@@ -805,6 +827,7 @@ static const struct option lgopts[] = {
{CMD_LINE_OPT_RULE_IPV6,   1, 0, CMD_LINE_OPT_RULE_IPV6_NUM},
{CMD_LINE_OPT_ALG,   1, 0, CMD_LINE_OPT_ALG_NUM},
{CMD_LINE_OPT_PKT_BURST,   1, 0, CMD_LINE_OPT_PKT_BURST_NUM},
+   {CMD_LINE_OPT_MB_CACHE_SIZE,   1, 0, CMD_LINE_OPT_MB_CACHE_SIZE_NUM},
{NULL, 0, 0, 0}
 };
 
@@ -897,6 +920,10 @@ parse_args(int argc, char **argv)
parse_pkt_burst(optarg);
break;
 
+   case CMD_LINE_OPT_MB_CACHE_SIZE_NUM:
+   parse_mbcache_size(optarg);
+   break;
+
case CMD_LINE_OPT_ETH_DEST_NUM:
parse_eth_dest(optarg);
break;
@@ -1089,7 +1116,7 @@ init_mem(uint16_t portid, unsigned int nb_mbuf)
 portid, socketid);
pktmbuf_pool[portid][socketid] =
rte_pktmbuf_pool_create(s, nb_mbuf,
-   MEMPO

[PATCH v2 1/2] examples/l3fwd: add option to set RX burst size

2024-10-17 Thread Jie Hai
Now the Rx burst size is fixed to MAX_PKT_BURST (32). This
parameter needs to be modified in some performance optimization
scenarios. So an option '--burst' is added to set the burst size
explicitly. The default value is DEFAULT_PKT_BURST (32) and maximum
value is MAX_PKT_BURST (512).

Signed-off-by: Jie Hai 
Acked-by: Chengwen Feng 
---
 examples/l3fwd/l3fwd.h |  7 +++--
 examples/l3fwd/l3fwd_acl.c |  2 +-
 examples/l3fwd/l3fwd_em.c  |  2 +-
 examples/l3fwd/l3fwd_fib.c |  2 +-
 examples/l3fwd/l3fwd_lpm.c |  2 +-
 examples/l3fwd/main.c  | 60 --
 6 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 93ce652d02b7..618e0eaa3af1 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -23,10 +23,11 @@
 #define RX_DESC_DEFAULT 1024
 #define TX_DESC_DEFAULT 1024
 
-#define MAX_PKT_BURST 32
+#define DEFAULT_PKT_BURST 32
+#define MAX_PKT_BURST 512
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
-#define MEMPOOL_CACHE_SIZE 256
+#define MEMPOOL_CACHE_SIZE RTE_MEMPOOL_CACHE_MAX_SIZE
 #define MAX_RX_QUEUE_PER_LCORE 16
 
 #define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST
@@ -115,6 +116,8 @@ extern struct acl_algorithms acl_alg[];
 
 extern uint32_t max_pkt_len;
 
+extern uint32_t nb_pkt_per_burst;
+
 /* Send burst of packets on an output interface */
 static inline int
 send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c
index b635011ef708..ccb9946837ed 100644
--- a/examples/l3fwd/l3fwd_acl.c
+++ b/examples/l3fwd/l3fwd_acl.c
@@ -1119,7 +1119,7 @@ acl_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid,
-   pkts_burst, MAX_PKT_BURST);
+   pkts_burst, nb_pkt_per_burst);
 
if (nb_rx > 0) {
acl_process_pkts(pkts_burst, hops, nb_rx,
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 31a7e05e39d0..da9c45e3a482 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -644,7 +644,7 @@ em_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c
index f38b19af3f57..aa81b12fe7dc 100644
--- a/examples/l3fwd/l3fwd_fib.c
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -239,7 +239,7 @@ fib_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index e8fd95aae9ce..048c02491378 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -205,7 +205,7 @@ lpm_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 01b763e5ba11..2feae5b311a2 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -53,8 +54,10 @@
 
 #define MAX_LCORE_PARAMS 1024
 
+static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST);
 uint16_t nb_rxd = RX_DESC_DEFAULT;
 uint16_t nb_txd = TX_DESC_DEFAULT;
+uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
 
 /**< Ports set in promiscuous mode off by default. */
 static int promiscuous_on;
@@ -395,6 +398,7 @@ print_usage(const char *prgname)
" --config (port,queue,lcore)[,(port,queue,lcore)]"
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
+   " [--burst NPKTS]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
" [--max-pkt-len PKTLEN]"
" [--no-numa]"
@@ -420,6 +424,8 @@ print_usa

[PATCH v2 0/2] examples/l3fwd: add more options

2024-10-17 Thread Jie Hai
Add options to support configuring the RX burst size and the cache size
of the mbuf mempool.

Jie Hai (2):
  examples/l3fwd: add option to set RX burst size
  examples/l3fwd: add option to set mbuf cache size

 examples/l3fwd/l3fwd.h |  8 +++-
 examples/l3fwd/l3fwd_acl.c |  2 +-
 examples/l3fwd/l3fwd_em.c  |  2 +-
 examples/l3fwd/l3fwd_fib.c |  2 +-
 examples/l3fwd/l3fwd_lpm.c |  2 +-
 examples/l3fwd/main.c  | 89 --
 6 files changed, 96 insertions(+), 9 deletions(-)

-- 
2.22.0



RE: [PATCH v2 0/2] examples/l3fwd: add more options

2024-10-17 Thread Morten Brørup
For the series,
Acked-by: Morten Brørup 
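
As an illustration of the kind of option parsing this series adds for "--burst" and "--mbcache", here is a minimal standalone sketch of parsing a decimal argument with strtoul() and rejecting values above a limit. The function name parse_bounded_u32 and its return convention are assumptions made for this example; they are not taken from the patches.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse 'arg' as a decimal number and store it in '*out' if it is a
 * complete number no larger than 'max'. Returns 0 on success and -1
 * otherwise; '*out' is left untouched on failure.
 */
static int parse_bounded_u32(const char *arg, unsigned long max, uint32_t *out)
{
	unsigned long val;
	char *end = NULL;

	if (arg == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 10);
	/* Reject overflow, no digits at all, and trailing garbage. */
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (val > max)
		return -1;

	*out = (uint32_t)val;
	return 0;
}
```

The parse_mbcache_size() helper in patch 2/2 returns silently when the string does not parse and exits only when the value is out of range; the sketch reports both cases to the caller instead, which is just one possible design choice.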



[PATCH v11 01/12] zsda: add zsdadev driver documents

2024-10-17 Thread Hanxiao Li
Introduce ZTE Storage Data Accelerator (ZSDA) drivers,
which can help accelerate storage data processing.

The official product documentation web page is:
https://enterprise.zte.com.cn/products.html?id=101

The new PMD guidelines recommend updating MAINTAINERS
in the first patch:
https://patches.dpdk.org/project/dpdk/patch/20241006184254.53499-1-nandinipersad...@gmail.com/

This patch may fail to compile on its own
because it depends on another patch.

Depends-on: series-cryptodev: add SM4-XTS algo and test cases

Signed-off-by: Hanxiao Li 
---
 MAINTAINERS   |   7 +
 doc/guides/compressdevs/features/zsda.ini |  15 ++
 doc/guides/compressdevs/index.rst |   1 +
 doc/guides/compressdevs/zsda.rst  |  45 
 doc/guides/cryptodevs/features/zsda.ini   |  51 +
 doc/guides/cryptodevs/index.rst   |   1 +
 doc/guides/cryptodevs/zsda.rst| 260 ++
 doc/guides/rel_notes/release_24_11.rst|   8 +
 8 files changed, 388 insertions(+)
 create mode 100644 doc/guides/compressdevs/features/zsda.ini
 create mode 100644 doc/guides/compressdevs/zsda.rst
 create mode 100644 doc/guides/cryptodevs/features/zsda.ini
 create mode 100644 doc/guides/cryptodevs/zsda.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index c5a703b5c0..ff4fc977a0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1221,6 +1221,9 @@ F: drivers/crypto/virtio/
 F: doc/guides/cryptodevs/virtio.rst
 F: doc/guides/cryptodevs/features/virtio.ini
 
+ZTE Storage Data Accelerator(ZSDA)
+M: Hanxiao Li 
+F: drivers/crypto/zsda/
 
 Compression Drivers
 ---
@@ -1268,6 +1271,10 @@ F: drivers/compress/zlib/
 F: doc/guides/compressdevs/zlib.rst
 F: doc/guides/compressdevs/features/zlib.ini
 
+ZTE Storage Data Accelerator(ZSDA)
+M: Hanxiao Li 
+F: drivers/common/zsda/
+F: drivers/compress/zsda/
 
 DMAdev Drivers
 --
diff --git a/doc/guides/compressdevs/features/zsda.ini b/doc/guides/compressdevs/features/zsda.ini
new file mode 100644
index 00..3b087ea7f9
--- /dev/null
+++ b/doc/guides/compressdevs/features/zsda.ini
@@ -0,0 +1,15 @@
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+; Supported features of 'ZSDA' compression driver.
+;
+[Features]
+HW Accelerated = Y
+OOP SGL In SGL Out = Y
+OOP SGL In LB  Out = Y
+OOP LB  In SGL Out = Y
+Deflate= Y
+Adler32= Y
+Crc32  = Y
+Fixed  = Y
+Dynamic= Y
diff --git a/doc/guides/compressdevs/index.rst b/doc/guides/compressdevs/index.rst
index 87ed4f72a4..bab226ffbc 100644
--- a/doc/guides/compressdevs/index.rst
+++ b/doc/guides/compressdevs/index.rst
@@ -17,3 +17,4 @@ Compression Device Drivers
 qat_comp
 uadk
 zlib
+zsda
diff --git a/doc/guides/compressdevs/zsda.rst b/doc/guides/compressdevs/zsda.rst
new file mode 100644
index 00..0140032a61
--- /dev/null
+++ b/doc/guides/compressdevs/zsda.rst
@@ -0,0 +1,45 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2024 ZTE Corporation.
+
+ZTE Storage Data Accelerator (ZSDA) Poll Mode Driver
+====================================================
+
+The ZSDA compression PMD provides poll mode compression & decompression driver
+support for the following hardware accelerator devices:
+
+* ``ZTE Processing accelerators 1cf2``
+
+
+Features
+--------
+
+ZSDA compression PMD has support for:
+
+Compression/Decompression algorithm:
+
+* DEFLATE - using Fixed and Dynamic Huffman encoding
+
+Checksum generation:
+
+* CRC32, Adler32
+
+Huffman code type:
+
+* FIXED
+* DYNAMIC
+
+
+Limitations
+-----------
+
+* Compressdev level 0, no compression, is not supported.
+* No BSD support as BSD ZSDA kernel driver not available.
+* Stateful is not supported.
+
+
+Installation
+------------
+
+The ZSDA compression PMD is built by default with a standard DPDK build.
+
+It depends on a ZSDA kernel driver, see :ref:`building_zsda`.
diff --git a/doc/guides/cryptodevs/features/zsda.ini b/doc/guides/cryptodevs/features/zsda.ini
new file mode 100644
index 00..4d97976815
--- /dev/null
+++ b/doc/guides/cryptodevs/features/zsda.ini
@@ -0,0 +1,51 @@
+;
+; Supported features of the 'zsda' crypto driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Symmetric crypto   = Y
+HW Accelerated = Y
+Protocol offload   = Y
+In Place SGL   = Y
+OOP SGL In SGL Out = Y
+OOP SGL In LB  Out = Y
+OOP LB  In SGL Out = Y
+OOP LB  In LB  Out = Y
+Digest encrypted   = Y
+Sym raw data path API  = Y
+
+;
+; Supported crypto algorithms of the 'zsda' crypto driver.
+;
+[Cipher]
+AES XTS (128)  = Y
+AES XTS (256)  = Y
+SM4 XTS= Y
+;
+; Supported authentication algorithms of the 'zsda' crypto driver.
+;
+[Auth]
+SHA1 = Y
+SHA224   = Y
+SHA256   = Y
+SHA384   = Y
+SHA512   = Y
+SM3  = Y
+
+;
+; Supported

[PATCH v11 03/12] common/zsda: add some common functions

2024-10-17 Thread Hanxiao Li
Introduce common functions and logging macros.

This patch may trigger the following warning:

Warning in drivers/common/zsda/zsda_logs.h:
Do not use variadic argument pack in macros

However, the usage is the same as CCP_LOG_ERR in
ccp_pmd_private, which was merged 4 months ago.


Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/meson.build   |  14 ++
 drivers/common/zsda/zsda_common.c | 239 +
 drivers/common/zsda/zsda_common.h | 334 ++
 drivers/common/zsda/zsda_logs.c   |  20 ++
 drivers/common/zsda/zsda_logs.h   |  25 +++
 drivers/meson.build   |   1 +
 6 files changed, 633 insertions(+)
 create mode 100644 drivers/common/zsda/meson.build
 create mode 100644 drivers/common/zsda/zsda_common.c
 create mode 100644 drivers/common/zsda/zsda_common.h
 create mode 100644 drivers/common/zsda/zsda_logs.c
 create mode 100644 drivers/common/zsda/zsda_logs.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
new file mode 100644
index 00..8971289080
--- /dev/null
+++ b/drivers/common/zsda/meson.build
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 ZTE Corporation
+
+if is_windows
+build = false
+reason = 'not supported on Windows'
+subdir_done()
+endif
+
+deps += ['bus_pci']
+sources += files(
+   'zsda_common.c',
+   'zsda_logs.c',
+   )
diff --git a/drivers/common/zsda/zsda_common.c b/drivers/common/zsda/zsda_common.c
new file mode 100644
index 00..3743d09a2e
--- /dev/null
+++ b/drivers/common/zsda/zsda_common.c
@@ -0,0 +1,239 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include "zsda_common.h"
+
+#include "bus_pci_driver.h"
+
+#define MAGIC_SEND 0xab
+#define MAGIC_RECV 0xcd
+#define ADMIN_VER 1
+
+static const uint8_t crc8_table[256] = {
+   0x00, 0x41, 0x13, 0x52, 0x26, 0x67, 0x35, 0x74, 0x4c, 0x0d, 0x5f, 0x1e,
+   0x6a, 0x2b, 0x79, 0x38, 0x09, 0x48, 0x1a, 0x5b, 0x2f, 0x6e, 0x3c, 0x7d,
+   0x45, 0x04, 0x56, 0x17, 0x63, 0x22, 0x70, 0x31, 0x12, 0x53, 0x01, 0x40,
+   0x34, 0x75, 0x27, 0x66, 0x5e, 0x1f, 0x4d, 0x0c, 0x78, 0x39, 0x6b, 0x2a,
+   0x1b, 0x5a, 0x08, 0x49, 0x3d, 0x7c, 0x2e, 0x6f, 0x57, 0x16, 0x44, 0x05,
+   0x71, 0x30, 0x62, 0x23, 0x24, 0x65, 0x37, 0x76, 0x02, 0x43, 0x11, 0x50,
+   0x68, 0x29, 0x7b, 0x3a, 0x4e, 0x0f, 0x5d, 0x1c, 0x2d, 0x6c, 0x3e, 0x7f,
+   0x0b, 0x4a, 0x18, 0x59, 0x61, 0x20, 0x72, 0x33, 0x47, 0x06, 0x54, 0x15,
+   0x36, 0x77, 0x25, 0x64, 0x10, 0x51, 0x03, 0x42, 0x7a, 0x3b, 0x69, 0x28,
+   0x5c, 0x1d, 0x4f, 0x0e, 0x3f, 0x7e, 0x2c, 0x6d, 0x19, 0x58, 0x0a, 0x4b,
+   0x73, 0x32, 0x60, 0x21, 0x55, 0x14, 0x46, 0x07, 0x48, 0x09, 0x5b, 0x1a,
+   0x6e, 0x2f, 0x7d, 0x3c, 0x04, 0x45, 0x17, 0x56, 0x22, 0x63, 0x31, 0x70,
+   0x41, 0x00, 0x52, 0x13, 0x67, 0x26, 0x74, 0x35, 0x0d, 0x4c, 0x1e, 0x5f,
+   0x2b, 0x6a, 0x38, 0x79, 0x5a, 0x1b, 0x49, 0x08, 0x7c, 0x3d, 0x6f, 0x2e,
+   0x16, 0x57, 0x05, 0x44, 0x30, 0x71, 0x23, 0x62, 0x53, 0x12, 0x40, 0x01,
+   0x75, 0x34, 0x66, 0x27, 0x1f, 0x5e, 0x0c, 0x4d, 0x39, 0x78, 0x2a, 0x6b,
+   0x6c, 0x2d, 0x7f, 0x3e, 0x4a, 0x0b, 0x59, 0x18, 0x20, 0x61, 0x33, 0x72,
+   0x06, 0x47, 0x15, 0x54, 0x65, 0x24, 0x76, 0x37, 0x43, 0x02, 0x50, 0x11,
+   0x29, 0x68, 0x3a, 0x7b, 0x0f, 0x4e, 0x1c, 0x5d, 0x7e, 0x3f, 0x6d, 0x2c,
+   0x58, 0x19, 0x4b, 0x0a, 0x32, 0x73, 0x21, 0x60, 0x14, 0x55, 0x07, 0x46,
+   0x77, 0x36, 0x64, 0x25, 0x51, 0x10, 0x42, 0x03, 0x3b, 0x7a, 0x28, 0x69,
+   0x1d, 0x5c, 0x0e, 0x4f};
+
+static uint8_t
+zsda_crc8(const uint8_t *message, const int length)
+{
+   uint8_t crc = 0;
+   int i;
+
+   for (i = 0; i < length; i++)
+   crc = crc8_table[crc ^ message[i]];
+   return crc;
+}
+
+uint32_t
+zsda_set_reg_8(void *addr, const uint8_t val0, const uint8_t val1,
+ const uint8_t val2, const uint8_t val3)
+{
+   uint8_t val[4];
+   val[0] = val0;
+   val[1] = val1;
+   val[2] = val2;
+   val[3] = val3;
+   ZSDA_CSR_WRITE32(addr, *(uint32_t *)val);
+   return *(uint32_t *)val;
+}
+
+uint8_t
+zsda_get_reg_8(void *addr, const int offset)
+{
+   uint32_t val = ZSDA_CSR_READ32(addr);
+
+   return *(((uint8_t *)&val) + offset);
+}
+
+int
+zsda_admin_msg_init(const struct rte_pci_device *pci_dev)
+{
+   uint8_t *mmio_base = pci_dev->mem_resource[0].addr;
+
+   zsda_set_reg_8(mmio_base + ZSDA_ADMIN_WQ_BASE7, 0, 0, MAGIC_RECV, 0);
+   zsda_set_reg_8(mmio_base + ZSDA_ADMIN_CQ_BASE7, 0, 0, MAGIC_RECV, 0);
+   return 0;
+}
+
+int
+zsda_send_admin_msg(const struct rte_pci_device *pci_dev, void *req,
+   const uint32_t len)
+{
+   uint8_t *mmio_base = pci_dev->mem_resource[0].addr;
+   uint8_t wq_flag;
+   uint8_t crc;
+   uint16_t admin_db;
+   uint32_t retry = ZSDA_TIME_NUM;
+   int i;
+   uint16_t db;
+   int repeat = sizeof(struct zsda_admin_

[PATCH v11 06/12] common/zsda: configure zsda queue enqueue functions

2024-10-17 Thread Hanxiao Li
Add support for zsdadev queue enqueue.

Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/zsda_qp.c | 105 ++
 drivers/common/zsda/zsda_qp.h |   2 +
 2 files changed, 107 insertions(+)

diff --git a/drivers/common/zsda/zsda_qp.c b/drivers/common/zsda/zsda_qp.c
index f1fe6c5817..87932597ed 100644
--- a/drivers/common/zsda/zsda_qp.c
+++ b/drivers/common/zsda/zsda_qp.c
@@ -682,6 +682,111 @@ zsda_queue_pair_release(struct zsda_qp **qp_addr)
return 0;
 }
 
+static int
+zsda_find_next_free_cookie(const struct zsda_queue *queue, void **op_cookie,
+ uint16_t *idx)
+{
+   uint16_t old_tail = queue->tail;
+   uint16_t tail = queue->tail;
+   struct zsda_op_cookie *cookie;
+
+   do {
+   cookie = op_cookie[tail];
+   if (!cookie->used) {
+   *idx = tail & (queue->queue_size - 1);
+   return 0;
+   }
+   tail = zsda_modulo_16(tail + 1, queue->modulo_mask);
+   } while (old_tail != tail);
+
+   return -EINVAL;
+}
+
+static int
+zsda_enqueue(void *op, struct zsda_qp *qp)
+{
+   uint16_t new_tail;
+   enum zsda_service_type type;
+   void **op_cookie;
+   int ret = 0;
+   struct zsda_queue *queue;
+
+   for (type = 0; type < ZSDA_SERVICE_INVALID; type++) {
+   if (qp->srv[type].used) {
+   if (!qp->srv[type].match(op))
+   continue;
+   queue = &qp->srv[type].tx_q;
+   op_cookie = qp->srv[type].op_cookies;
+
+   if (zsda_find_next_free_cookie(queue, op_cookie,
+ &new_tail)) {
+   ret = -EBUSY;
+   break;
+   }
+   ret = qp->srv[type].tx_cb(op, queue, op_cookie,
+ new_tail);
+   if (ret) {
+   qp->srv[type].stats.enqueue_err_count++;
+   ZSDA_LOG(ERR, "Failed! config wqe");
+   break;
+   }
+   qp->srv[type].stats.enqueued_count++;
+
+   queue->tail = zsda_modulo_16(new_tail + 1,
+queue->queue_size - 1);
+
+   if (new_tail > queue->tail)
+   queue->valid =
+   zsda_modulo_8(queue->valid + 1,
+   (uint8_t)(queue->cycle_size - 1));
+
+   queue->pushed_wqe++;
+   break;
+   }
+   }
+
+   return ret;
+}
+
+static void
+zsda_tx_write_tail(struct zsda_queue *queue)
+{
+   if (queue->pushed_wqe)
+   WRITE_CSR_WQ_TAIL(queue->io_addr, queue->hw_queue_number,
+ queue->tail);
+
+   queue->pushed_wqe = 0;
+}
+
+uint16_t
+zsda_enqueue_op_burst(struct zsda_qp *qp, void **ops, uint16_t nb_ops)
+{
+   int ret = 0;
+   enum zsda_service_type type;
+   uint16_t i;
+   uint16_t nb_send = 0;
+   void *op;
+
+   if (nb_ops > ZSDA_MAX_DESC) {
+   ZSDA_LOG(ERR, "Enqueue number bigger than %d", ZSDA_MAX_DESC);
+   return 0;
+   }
+
+   for (i = 0; i < nb_ops; i++) {
+   op = ops[i];
+   ret = zsda_enqueue(op, qp);
+   if (ret < 0)
+   break;
+   nb_send++;
+   }
+
+   for (type = 0; type < ZSDA_SERVICE_INVALID; type++)
+   if (qp->srv[type].used)
+   zsda_tx_write_tail(&qp->srv[type].tx_q);
+
+   return nb_send;
+}
+
 int
 zsda_common_setup_qp(uint32_t zsda_dev_id, struct zsda_qp **qp_addr,
const uint16_t queue_pair_id, const struct zsda_qp_config *conf)
diff --git a/drivers/common/zsda/zsda_qp.h b/drivers/common/zsda/zsda_qp.h
index a9f0d38ba5..f9efff0e5a 100644
--- a/drivers/common/zsda/zsda_qp.h
+++ b/drivers/common/zsda/zsda_qp.h
@@ -135,6 +135,8 @@ int zsda_get_queue_cfg(struct zsda_pci_device *zsda_pci_dev);
 
 int zsda_queue_pair_release(struct zsda_qp **qp_addr);
 
+uint16_t zsda_enqueue_op_burst(struct zsda_qp *qp, void **ops, const uint16_t nb_ops);
+
 int zsda_common_setup_qp(uint32_t dev_id, struct zsda_qp **qp_addr,
const uint16_t queue_pair_id,
const struct zsda_qp_config *conf);
-- 
2.27.0
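
The enqueue path in this patch relies on a ring whose size is a power of two, so that indexes can wrap with a bit mask, and on a scan for the next unused cookie starting at the tail. The following standalone sketch shows that idea in isolation. The names (struct ring, ring_find_free, ring_push, RING_SIZE) are hypothetical and the code is not the driver's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8 /* must be a power of two so that (x & mask) wraps */

struct slot {
	bool used;
};

struct ring {
	struct slot slots[RING_SIZE];
	uint16_t tail;
};

/*
 * Scan at most RING_SIZE slots starting at the tail and wrapping with
 * the mask. Returns 0 and stores the index of the first unused slot,
 * or -1 if every slot is in use.
 */
static int ring_find_free(const struct ring *r, uint16_t *idx)
{
	const uint16_t mask = RING_SIZE - 1;
	uint16_t pos = r->tail;
	uint16_t i;

	for (i = 0; i < RING_SIZE; i++) {
		if (!r->slots[pos].used) {
			*idx = pos;
			return 0;
		}
		pos = (uint16_t)((pos + 1) & mask);
	}
	return -1;
}

/* Claim the next free slot and advance the tail past it. */
static int ring_push(struct ring *r)
{
	uint16_t idx;

	if (ring_find_free(r, &idx) != 0)
		return -1;
	r->slots[idx].used = true;
	r->tail = (uint16_t)((idx + 1) & (RING_SIZE - 1));
	return idx;
}
```

Note that the scan advances with "pos + 1" before masking; a post-increment inside the wrap call would hand the old value to the mask and the position would never move.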

[PATCH v11 09/12] compress/zsda: add zsda compress PMD

2024-10-17 Thread Hanxiao Li
The patch provides a series of interfaces for managing
and controlling the configuration, start, stop,
resource management, etc. of compression devices.

Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/meson.build   |   2 +-
 drivers/compress/zsda/zsda_comp_pmd.c | 464 ++
 drivers/compress/zsda/zsda_comp_pmd.h |  34 ++
 3 files changed, 499 insertions(+), 1 deletion(-)
 create mode 100644 drivers/compress/zsda/zsda_comp_pmd.c
 create mode 100644 drivers/compress/zsda/zsda_comp_pmd.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index f873a357e3..20fb37caaf 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -20,7 +20,7 @@ zsda_compress_path = 'compress/zsda'
 zsda_compress_relpath = '../../' + zsda_compress_path
 includes += include_directories(zsda_compress_relpath)
 if zsda_compress
-   foreach f: ['zsda_comp.c']
+   foreach f: ['zsda_comp.c', 'zsda_comp_pmd.c']
sources += files(join_paths(zsda_compress_relpath, f))
endforeach
 endif
diff --git a/drivers/compress/zsda/zsda_comp_pmd.c b/drivers/compress/zsda/zsda_comp_pmd.c
new file mode 100644
index 00..32221c42a3
--- /dev/null
+++ b/drivers/compress/zsda/zsda_comp_pmd.c
@@ -0,0 +1,464 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include 
+
+#include "zsda_comp.h"
+#include "zsda_comp_pmd.h"
+
+static const struct rte_compressdev_capabilities zsda_comp_capabilities[] = {
+   {
+   .algo = RTE_COMP_ALGO_DEFLATE,
+   .comp_feature_flags = RTE_COMP_FF_HUFFMAN_DYNAMIC |
+           RTE_COMP_FF_OOP_SGL_IN_SGL_OUT |
+           RTE_COMP_FF_OOP_SGL_IN_LB_OUT |
+           RTE_COMP_FF_OOP_LB_IN_SGL_OUT |
+           RTE_COMP_FF_CRC32_CHECKSUM |
+           RTE_COMP_FF_ADLER32_CHECKSUM,
+   .window_size = {.min = 15, .max = 15, .increment = 0},
+   },
+};
+
+static void
+zsda_comp_stats_get(struct rte_compressdev *dev,
+   struct rte_compressdev_stats *stats)
+{
+   struct zsda_common_stat comm = {0};
+
+   zsda_stats_get(dev->data->queue_pairs, dev->data->nb_queue_pairs,
+  &comm);
+   stats->enqueued_count = comm.enqueued_count;
+   stats->dequeued_count = comm.dequeued_count;
+   stats->enqueue_err_count = comm.enqueue_err_count;
+   stats->dequeue_err_count = comm.dequeue_err_count;
+}
+
+static void
+zsda_comp_stats_reset(struct rte_compressdev *dev)
+{
+   zsda_stats_reset(dev->data->queue_pairs, dev->data->nb_queue_pairs);
+}
+
+static int
+zsda_comp_qp_release(struct rte_compressdev *dev, uint16_t queue_pair_id)
+{
+   return zsda_queue_pair_release(
+   (struct zsda_qp **)&(dev->data->queue_pairs[queue_pair_id]));
+}
+
+
+static int
+zsda_setup_comp_queue(struct zsda_pci_device *zsda_pci_dev, const uint16_t qp_id,
+struct zsda_qp *qp, uint16_t nb_des, int socket_id)
+{
+   enum zsda_service_type type = ZSDA_SERVICE_COMPRESSION;
+   struct zsda_qp_config conf;
+   int ret = 0;
+   struct zsda_qp_hw *qp_hw;
+
+   qp_hw = zsda_qps_hw_per_service(zsda_pci_dev, type);
+   conf.hw = qp_hw->data + qp_id;
+   conf.service_type = type;
+   conf.cookie_size = sizeof(struct zsda_op_cookie);
+   conf.nb_descriptors = nb_des;
+   conf.socket_id = socket_id;
+   conf.service_str = "comp";
+
+   ret = zsda_common_setup_qp(zsda_pci_dev->zsda_dev_id, &qp, qp_id, &conf);
+   qp->srv[type].rx_cb = zsda_comp_callback;
+   qp->srv[type].tx_cb = zsda_build_comp_request;
+   qp->srv[type].match = zsda_comp_match;
+
+   return ret;
+}
+
+static int
+zsda_setup_decomp_queue(struct zsda_pci_device *zsda_pci_dev, const uint16_t qp_id,
+  struct zsda_qp *qp, uint16_t nb_des, int socket_id)
+{
+   enum zsda_service_type type = ZSDA_SERVICE_DECOMPRESSION;
+   struct zsda_qp_config conf;
+   int ret = 0;
+   struct zsda_qp_hw *qp_hw;
+
+   qp_hw = zsda_qps_hw_per_service(zsda_pci_dev, type);
+   conf.hw = qp_hw->data + qp_id;
+   conf.service_type = type;
+   conf.cookie_size = sizeof(struct zsda_op_cookie);
+   conf.nb_descriptors = nb_des;
+   conf.socket_id = socket_id;
+   conf.service_str = "decomp";
+
+   ret = zsda_common_setup_qp(zsda_pci_dev->zsda_dev_id, &qp, qp_id, &conf);
+   qp->srv[type].rx_cb = zsda_comp_callback;
+   qp->srv[type].tx_cb = zsda_build_decomp_request;
+   qp->srv[type].match = zsda_decomp_match;
+
+   return ret;
+}
+
+static int
+zsda_comp_qp_setup(struct rte_compressdev *dev, uint16_t qp_id,
+  uint32_t max_inflight_ops, int socket_id)
+{
+   int

[PATCH v11 08/12] compress/zsda: add zsda compress driver

2024-10-17 Thread Hanxiao Li
This patch adds support for WQE configuration for compression
and decompression, preliminary verification of results,
and preparation of checksums.

Signed-off-by: Hanxiao Li 
---
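
Note for readers: the checksum helpers in this patch appear to follow the
conventional CRC-32 (reflected polynomial 0xEDB88320) and Adler-32
(modulus 65521) definitions. The sketch below is a minimal bitwise
reference, not the driver code. It assumes the conventional all-ones
initial value and final XOR for CRC-32, and the names sketch_crc32 and
sketch_adler32 are hypothetical. It can serve as a slow cross-check for a
table-driven implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed conventional
 * parameters: initial value and final XOR are both all ones).
 */
static uint32_t
sketch_crc32(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(data != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Adler-32 with a reduction after every byte: slower than deferring the
 * modulo over blocks of bytes, but trivially free of overflow.
 */
static uint32_t
sketch_adler32(const uint8_t *buf, size_t len)
{
	uint32_t s1 = 1;
	uint32_t s2 = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		s1 = (s1 + buf[i]) % 65521u;
		s2 = (s2 + s1) % 65521u;
	}
	return (s2 << 16) | s1;
}
```

With these definitions, the commonly published check values are
0xCBF43926 for the CRC-32 of the ASCII string "123456789" and 0x11E60398
for the Adler-32 of "Wikipedia". A byte-at-a-time reference like this also
avoids the unaligned 32-bit load used by slicing-by-8 loops, which may
matter on strict-alignment or big-endian targets.
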
 drivers/common/zsda/meson.build   |  12 +-
 drivers/compress/zsda/zsda_comp.c | 392 ++
 drivers/compress/zsda/zsda_comp.h |  52 
 3 files changed, 455 insertions(+), 1 deletion(-)
 create mode 100644 drivers/compress/zsda/zsda_comp.c
 create mode 100644 drivers/compress/zsda/zsda_comp.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 4d1eec8867..f873a357e3 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -7,10 +7,20 @@ if is_windows
 subdir_done()
 endif
 
-deps += ['bus_pci']
+deps += ['bus_pci', 'compressdev']
 sources += files(
'zsda_common.c',
'zsda_logs.c',
'zsda_device.c',
'zsda_qp.c',
)
+
+zsda_compress = true
+zsda_compress_path = 'compress/zsda'
+zsda_compress_relpath = '../../' + zsda_compress_path
+includes += include_directories(zsda_compress_relpath)
+if zsda_compress
+   foreach f: ['zsda_comp.c']
+   sources += files(join_paths(zsda_compress_relpath, f))
+   endforeach
+endif
diff --git a/drivers/compress/zsda/zsda_comp.c b/drivers/compress/zsda/zsda_comp.c
new file mode 100644
index 00..40c978c520
--- /dev/null
+++ b/drivers/compress/zsda/zsda_comp.c
@@ -0,0 +1,392 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include "zsda_comp.h"
+
+#define ZLIB_HEADER_SIZE 2
+#define ZLIB_TRAILER_SIZE 4
+#define GZIP_HEADER_SIZE 10
+#define GZIP_TRAILER_SIZE 8
+#define CHECKSUM_SIZE 4
+
+static uint32_t zsda_read_chksum(uint8_t *data_addr, uint8_t op_code,
+uint32_t produced);
+
+
+#define POLYNOMIAL 0xEDB88320
+static uint32_t crc32_table[8][256];
+static int table_config;
+
+static void
+build_crc32_table(void)
+{
+   for (uint32_t i = 0; i < 256; i++) {
+   uint32_t crc = i;
+   for (uint32_t j = 0; j < 8; j++)
+   crc = (crc >> 1) ^ ((crc & 1) ? POLYNOMIAL : 0);
+   crc32_table[0][i] = crc;
+   }
+
+   for (int i = 1; i < 8; i++) {
+   for (uint32_t j = 0; j < 256; j++)
+   crc32_table[i][j] = (crc32_table[i-1][j] >> 8) ^
+   crc32_table[0][crc32_table[i-1][j] & 0xFF];
+   }
+   table_config = 1;
+}
+
+static uint32_t
+zsda_crc32(const uint8_t *data, size_t length)
+{
+   uint32_t crc = 0xFFFFFFFF;
+
+   if (!table_config)
+   build_crc32_table();
+
+   while (length >= 8) {
+   crc ^= *(const uint32_t *)data;
+   crc = crc32_table[7][crc & 0xFF] ^
+ crc32_table[6][(crc >> 8) & 0xFF] ^
+ crc32_table[5][(crc >> 16) & 0xFF] ^
+ crc32_table[4][(crc >> 24) & 0xFF] ^
+ crc32_table[3][data[4]] ^
+ crc32_table[2][data[5]] ^
+ crc32_table[1][data[6]] ^
+ crc32_table[0][data[7]];
+
+   data += 8;
+   length -= 8;
+   }
+
+   for (size_t i = 0; i < length; i++)
+   crc = (crc >> 8) ^ crc32_table[0][(crc ^ data[i]) & 0xFF];
+
+   return crc ^ 0xFFFFFFFF;
+}
+
+#define MOD_ADLER 65521
+#define NMAX 5552
+static uint32_t
+zsda_adler32(const uint8_t *buf, uint32_t len)
+{
+   uint32_t s1 = 1;
+   uint32_t s2 = 0;
+
+   while (len > 0) {
+   uint32_t k = (len < NMAX) ? len : NMAX;
+   len -= k;
+
+   for (uint32_t i = 0; i < k; i++) {
+   s1 += buf[i];
+   s2 += s1;
+   }
+
+   s1 %= MOD_ADLER;
+   s2 %= MOD_ADLER;
+
+   buf += k;
+   }
+
+   return (s2 << 16) | s1;
+}
+
+int
+zsda_comp_match(const void *op_in)
+{
+   const struct rte_comp_op *op = op_in;
+   const struct zsda_comp_xform *xform = op->private_xform;
+
+   if (op->op_type != RTE_COMP_OP_STATELESS)
+   return 0;
+
+   if (xform->type != RTE_COMP_COMPRESS)
+   return 0;
+
+   return 1;
+}
+
+static uint8_t
+get_opcode(const struct zsda_comp_xform *xform)
+{
+   if (xform->type == RTE_COMP_COMPRESS) {
+   if (xform->checksum_type == RTE_COMP_CHECKSUM_NONE ||
+   xform->checksum_type == RTE_COMP_CHECKSUM_CRC32)
+   return ZSDA_OPC_COMP_GZIP;
+   else if (xform->checksum_type == RTE_COMP_CHECKSUM_ADLER32)
+   return ZSDA_OPC_COMP_ZLIB;
+   }
+   if (xform->type == RTE_COMP_DECOMPRESS) {
+   if (xform->checksum_type == RTE_COMP_CHECKSUM_CRC32 ||
+   xform->checksum_type == RTE_COMP

[PATCH v11 11/12] crypto/zsda: add zsda crypto driver

2024-10-17 Thread Hanxiao Li
This patch adds support for WQE configuration for encryption
and decryption, preliminary verification of results,
and preparation of checksums.

Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/meson.build |   2 +-
 drivers/crypto/zsda/zsda_sym.c  | 273 
 drivers/crypto/zsda/zsda_sym.h  |  50 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/zsda/zsda_sym.c
 create mode 100644 drivers/crypto/zsda/zsda_sym.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 7030b5f05a..59fd7a12ae 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -30,7 +30,7 @@ zsda_crypto_path = 'crypto/zsda'
 zsda_crypto_relpath = '../../' + zsda_crypto_path
 if zsda_crypto
libcrypto = dependency('libcrypto', required: false, method: 'pkg-config')
-   foreach f: ['zsda_sym_session.c']
+   foreach f: ['zsda_sym_session.c', 'zsda_sym.c']
sources += files(join_paths(zsda_crypto_relpath, f))
endforeach
deps += ['security']
diff --git a/drivers/crypto/zsda/zsda_sym.c b/drivers/crypto/zsda/zsda_sym.c
new file mode 100644
index 00..a27234eb4e
--- /dev/null
+++ b/drivers/crypto/zsda/zsda_sym.c
@@ -0,0 +1,273 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include "cryptodev_pmd.h"
+
+#include "zsda_logs.h"
+#include "zsda_sym.h"
+
+#define choose_dst_mbuf(mbuf_src, mbuf_dst) ((mbuf_dst) == NULL ? (mbuf_src) : (mbuf_dst))
+#define LBADS_MAX_REMAINDER (16 - 1)
+
+static uint8_t
+zsda_get_opcode_hash(struct zsda_sym_session *sess)
+{
+   switch (sess->auth.algo) {
+   case RTE_CRYPTO_AUTH_SHA1:
+   return ZSDA_OPC_HASH_SHA1;
+
+   case RTE_CRYPTO_AUTH_SHA224:
+   return ZSDA_OPC_HASH_SHA2_224;
+
+   case RTE_CRYPTO_AUTH_SHA256:
+   return ZSDA_OPC_HASH_SHA2_256;
+
+   case RTE_CRYPTO_AUTH_SHA384:
+   return ZSDA_OPC_HASH_SHA2_384;
+
+   case RTE_CRYPTO_AUTH_SHA512:
+   return ZSDA_OPC_HASH_SHA2_512;
+
+   case RTE_CRYPTO_AUTH_SM3:
+   return ZSDA_OPC_HASH_SM3;
+   default:
+   break;
+   }
+
+   return ZSDA_OPC_INVALID;
+}
+
+static uint8_t
+zsda_get_opcode_crypto(struct zsda_sym_session *sess)
+{
+   if (sess->cipher.op == RTE_CRYPTO_CIPHER_OP_ENCRYPT) {
+   if (sess->cipher.algo == RTE_CRYPTO_CIPHER_AES_XTS &&
+   sess->cipher.key_encry.length == 32)
+   return ZSDA_OPC_EC_AES_XTS_256;
+   else if (sess->cipher.algo == RTE_CRYPTO_CIPHER_AES_XTS &&
+sess->cipher.key_encry.length == 64)
+   return ZSDA_OPC_EC_AES_XTS_512;
+   else if (sess->cipher.algo == RTE_CRYPTO_CIPHER_SM4_XTS)
+   return ZSDA_OPC_EC_SM4_XTS_256;
+   } else if (sess->cipher.op == RTE_CRYPTO_CIPHER_OP_DECRYPT) {
+   if (sess->cipher.algo == RTE_CRYPTO_CIPHER_AES_XTS &&
+   sess->cipher.key_decry.length == 32)
+   return ZSDA_OPC_DC_AES_XTS_256;
+   else if (sess->cipher.algo == RTE_CRYPTO_CIPHER_AES_XTS &&
+sess->cipher.key_decry.length == 64)
+   return ZSDA_OPC_DC_AES_XTS_512;
+   else if (sess->cipher.algo == RTE_CRYPTO_CIPHER_SM4_XTS)
+   return ZSDA_OPC_DC_SM4_XTS_256;
+   }
+   return ZSDA_OPC_INVALID;
+}
+
+int
+zsda_encry_match(const void *op_in)
+{
+   const struct rte_crypto_op *op = op_in;
+   struct rte_cryptodev_sym_session *session = op->sym->session;
+   struct zsda_sym_session *sess =
+   (struct zsda_sym_session *)session->driver_priv_data;
+
+   if (sess->chain_order == ZSDA_SYM_CHAIN_ONLY_CIPHER &&
+   sess->cipher.op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
+   return 1;
+   else
+   return 0;
+}
+
+int
+zsda_decry_match(const void *op_in)
+{
+   const struct rte_crypto_op *op = op_in;
+   struct rte_cryptodev_sym_session *session = op->sym->session;
+   struct zsda_sym_session *sess =
+   (struct zsda_sym_session *)session->driver_priv_data;
+
+   if (sess->chain_order == ZSDA_SYM_CHAIN_ONLY_CIPHER &&
+   sess->cipher.op == RTE_CRYPTO_CIPHER_OP_DECRYPT)
+   return 1;
+   else
+   return 0;
+}
+
+int
+zsda_hash_match(const void *op_in)
+{
+   const struct rte_crypto_op *op = op_in;
+   struct rte_cryptodev_sym_session *session = op->sym->session;
+   struct zsda_sym_session *sess =
+   (struct zsda_sym_session *)session->driver_priv_data;
+
+   if (sess->chain_order == ZSDA_SYM_CHAIN_ONLY_AUTH)
+   return 1;
+   else
+   return 0;
+}
+
+static int
+zsda_check_len_lbads(uint32_t data_len, uint32_t lbads_size)
+{
+   if (data_len < 16) {
+

[PATCH v11 04/12] common/zsda: configure zsda device

2024-10-17 Thread Hanxiao Li
This patch provides a set of interfaces for driver probe, remove, etc.

Signed-off-by: Hanxiao Li 
---
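
Note for readers: device allocation in this patch scans a fixed
per-process array for the first entry whose memzone pointer is NULL. The
sketch below shows that scan with an explicit "table full" sentinel. It
is an illustration only: sketch_slot, sketch_find_free_slot and the -1
return value are hypothetical and do not appear in the patch, which
instead masks the index and compares it against the table size minus one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of the per-process device table. */
struct sketch_slot {
	const void *mz; /* NULL means the slot is free */
};

/* Return the index of the first free slot, or -1 when every slot is used. */
static int
sketch_find_free_slot(const struct sketch_slot *slots, size_t nb_slots)
{
	assert(slots != NULL);
	for (size_t i = 0; i < nb_slots; i++) {
		if (slots[i].mz == NULL)
			return (int)i;
	}
	return -1;
}
```

A distinct sentinel keeps "no free slot" separate from any valid index, so
the caller does not need to reserve the last slot to detect a full table.
Whether the patch reserves the last slot on purpose is not clear from this
excerpt.
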
 drivers/common/zsda/meson.build   |   1 +
 drivers/common/zsda/zsda_device.c | 263 ++
 drivers/common/zsda/zsda_device.h | 112 +
 3 files changed, 376 insertions(+)
 create mode 100644 drivers/common/zsda/zsda_device.c
 create mode 100644 drivers/common/zsda/zsda_device.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 8971289080..44f2375d8d 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -11,4 +11,5 @@ deps += ['bus_pci']
 sources += files(
'zsda_common.c',
'zsda_logs.c',
+   'zsda_device.c',
)
diff --git a/drivers/common/zsda/zsda_device.c b/drivers/common/zsda/zsda_device.c
new file mode 100644
index 00..5af3dcc3c9
--- /dev/null
+++ b/drivers/common/zsda/zsda_device.c
@@ -0,0 +1,263 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+
+#include 
+#include 
+
+#include "zsda_device.h"
+
+/* per-process array of device data */
+struct zsda_device_info zsda_devs[RTE_PMD_ZSDA_MAX_PCI_DEVICES];
+static int zsda_nb_pci_devices;
+
+/*
+ * The set of PCI devices this driver supports
+ */
+static const struct rte_pci_id pci_id_zsda_map[] = {
+   {
+   RTE_PCI_DEVICE(0x1cf2, 0x8050),
+   },
+   {
+   RTE_PCI_DEVICE(0x1cf2, 0x8051),
+   },
+   {.device_id = 0},
+};
+
+static struct zsda_pci_device *
+zsda_pci_get_named_dev(const char *name)
+{
+   unsigned int i;
+
+   if (name == NULL) {
+   ZSDA_LOG(ERR, E_NULL);
+   return NULL;
+   }
+
+   for (i = 0; i < RTE_PMD_ZSDA_MAX_PCI_DEVICES; i++) {
+   if (zsda_devs[i].mz &&
+   (strcmp(((struct zsda_pci_device *)zsda_devs[i].mz->addr)
+   ->name,
+   name) == 0))
+   return (struct zsda_pci_device *)zsda_devs[i].mz->addr;
+   }
+
+   return NULL;
+}
+
+static uint8_t
+zsda_pci_find_free_device_index(void)
+{
+   uint32_t dev_id;
+
+   for (dev_id = 0; dev_id < RTE_PMD_ZSDA_MAX_PCI_DEVICES; dev_id++)
+   if (zsda_devs[dev_id].mz == NULL)
+   break;
+
+   return dev_id & (ZSDA_MAX_DEV - 1);
+}
+
+struct zsda_pci_device *
+zsda_get_zsda_dev_from_pci_dev(const struct rte_pci_device *pci_dev)
+{
+   char name[ZSDA_DEV_NAME_MAX_LEN];
+
+   rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+   return zsda_pci_get_named_dev(name);
+}
+
+struct zsda_pci_device *
+zsda_pci_device_allocate(struct rte_pci_device *pci_dev)
+{
+   struct zsda_pci_device *zsda_pci_dev;
+   uint8_t zsda_dev_id;
+   char name[ZSDA_DEV_NAME_MAX_LEN];
+   unsigned int socket_id = rte_socket_id();
+
+   rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+   snprintf(name + strlen(name), (ZSDA_DEV_NAME_MAX_LEN - strlen(name)),
+"_zsda");
+   if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+   const struct rte_memzone *mz = rte_memzone_lookup(name);
+
+   if (mz == NULL) {
+   ZSDA_LOG(ERR, "Secondary can't find %s mz", name);
+   return NULL;
+   }
+   zsda_pci_dev = mz->addr;
+   zsda_devs[zsda_pci_dev->zsda_dev_id].mz = mz;
+   zsda_devs[zsda_pci_dev->zsda_dev_id].pci_dev = pci_dev;
+   zsda_nb_pci_devices++;
+   return zsda_pci_dev;
+   }
+
+   if (zsda_pci_get_named_dev(name) != NULL) {
+   ZSDA_LOG(ERR, E_CONFIG);
+   return NULL;
+   }
+
+   zsda_dev_id = zsda_pci_find_free_device_index();
+
+   if (zsda_dev_id == (RTE_PMD_ZSDA_MAX_PCI_DEVICES - 1)) {
+   ZSDA_LOG(ERR, "Reached maximum number of ZSDA devices");
+   return NULL;
+   }
+
+   zsda_devs[zsda_dev_id].mz =
+   rte_memzone_reserve(name, sizeof(struct zsda_pci_device),
+   (int)(socket_id & 0xfff), 0);
+
+   if (zsda_devs[zsda_dev_id].mz == NULL) {
+   ZSDA_LOG(ERR, E_MALLOC);
+   return NULL;
+   }
+
+   zsda_pci_dev = zsda_devs[zsda_dev_id].mz->addr;
+   memset(zsda_pci_dev, 0, sizeof(*zsda_pci_dev));
+   strlcpy(zsda_pci_dev->name, name, ZSDA_DEV_NAME_MAX_LEN);
+   zsda_pci_dev->zsda_dev_id = zsda_dev_id;
+   zsda_pci_dev->pci_dev = pci_dev;
+   zsda_devs[zsda_dev_id].pci_dev = pci_dev;
+
+   zsda_nb_pci_devices++;
+
+   return zsda_pci_dev;
+}
+
+static int
+zsda_pci_device_release(const struct rte_pci_device *pci_dev)
+{
+   struct zsda_pci_device *zsda_pci_dev;
+   struct zsda_device_info *inst;
+   char name[ZSDA_DEV_NAME_MAX_LEN];
+
+   if (pci_dev == NULL)
+

[PATCH v11 07/12] common/zsda: configure zsda queue dequeue functions

2024-10-17 Thread Hanxiao Li
Add support for zsdadev queue dequeue.

Signed-off-by: Hanxiao Li 
---
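
Note for readers: the dequeue loop in this patch walks a completion ring
from the current head, stops at the first entry that is not yet valid,
clears each consumed entry and advances the head with a wrap-around mask.
The sketch below models only that control flow. The types sketch_cqe and
sketch_ring, the ring size and the single valid flag are hypothetical
simplifications, and the hardware head write (WRITE_CSR_CQ_HEAD in the
patch) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RING_SIZE 8u /* hypothetical; must be a power of two */

/* Hypothetical completion entry: a valid flag and a cookie index. */
struct sketch_cqe {
	uint8_t valid;
	uint16_t sid;
};

struct sketch_ring {
	struct sketch_cqe slots[SKETCH_RING_SIZE];
	uint16_t head;
	uint16_t modulo_mask; /* SKETCH_RING_SIZE - 1 */
};

/* Consume up to nb_ops valid completions, returning how many were taken. */
static uint16_t
sketch_dequeue(struct sketch_ring *r, uint16_t *sids, uint16_t nb_ops)
{
	uint16_t nb = 0;

	/* The mask trick only works when the mask is 2^n - 1. */
	assert((r->modulo_mask & (r->modulo_mask + 1u)) == 0);

	while (nb < nb_ops) {
		struct sketch_cqe *cqe = &r->slots[r->head];

		if (!cqe->valid)
			break; /* hardware has not completed this entry yet */

		sids[nb++] = cqe->sid;
		memset(cqe, 0, sizeof(*cqe)); /* hand the slot back */
		r->head = (uint16_t)((r->head + 1u) & r->modulo_mask);
	}
	return nb;
}
```

Clearing the entry before moving on is what lets the valid flag double as
the "new completion" marker on the next lap around the ring.
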
 drivers/common/zsda/zsda_qp.c | 56 +++
 drivers/common/zsda/zsda_qp.h |  1 +
 2 files changed, 57 insertions(+)

diff --git a/drivers/common/zsda/zsda_qp.c b/drivers/common/zsda/zsda_qp.c
index 87932597ed..55d3f0702d 100644
--- a/drivers/common/zsda/zsda_qp.c
+++ b/drivers/common/zsda/zsda_qp.c
@@ -787,6 +787,62 @@ zsda_enqueue_op_burst(struct zsda_qp *qp, void **ops, uint16_t nb_ops)
return nb_send;
 }
 
+static void
+zsda_dequeue(struct qp_srv *srv, void **ops, const uint16_t nb_ops, uint16_t *nb)
+{
+   uint16_t head;
+   struct zsda_cqe *cqe;
+   struct zsda_queue *queue = &srv->rx_q;
+   struct zsda_op_cookie *cookie;
+   head = queue->head;
+
+   while (*nb < nb_ops) {
+   cqe = (struct zsda_cqe *)(
+   (uint8_t *)queue->base_addr + head * queue->msg_size);
+
+   if (!CQE_VALID(cqe->err1))
+   break;
+   cookie = srv->op_cookies[cqe->sid];
+
+   ops[*nb] = cookie->op;
+   if (srv->rx_cb(cookie, cqe) == ZSDA_SUCCESS)
+   srv->stats.dequeued_count++;
+   else {
+   ZSDA_LOG(ERR,
+"ERR! Cqe, opcode 0x%x, sid 0x%x, "
+"tx_real_length 0x%x, err0 0x%x, err1 0x%x",
+cqe->op_code, cqe->sid, cqe->tx_real_length,
+cqe->err0, cqe->err1);
+   srv->stats.dequeue_err_count++;
+   }
+   (*nb)++;
+   cookie->used = false;
+
+   head = zsda_modulo_16(head + 1, queue->modulo_mask);
+   queue->head = head;
+   WRITE_CSR_CQ_HEAD(queue->io_addr, queue->hw_queue_number, head);
+   memset(cqe, 0x0, sizeof(struct zsda_cqe));
+   }
+}
+
+uint16_t
+zsda_dequeue_op_burst(struct zsda_qp *qp, void **ops, const uint16_t nb_ops)
+{
+   uint16_t nb = 0;
+   uint32_t type = 0;
+   struct qp_srv *srv;
+
+   for (type = 0; type < ZSDA_SERVICE_INVALID; type++) {
+   if (!qp->srv[type].used)
+   continue;
+   srv = &qp->srv[type];
+   zsda_dequeue(srv, ops, nb_ops, &nb);
+   if (nb >= nb_ops)
+   return nb_ops;
+   }
+   return nb;
+}
+
 int
 zsda_common_setup_qp(uint32_t zsda_dev_id, struct zsda_qp **qp_addr,
const uint16_t queue_pair_id, const struct zsda_qp_config *conf)
diff --git a/drivers/common/zsda/zsda_qp.h b/drivers/common/zsda/zsda_qp.h
index f9efff0e5a..7c61a7b486 100644
--- a/drivers/common/zsda/zsda_qp.h
+++ b/drivers/common/zsda/zsda_qp.h
@@ -136,6 +136,7 @@ int zsda_get_queue_cfg(struct zsda_pci_device *zsda_pci_dev);
 int zsda_queue_pair_release(struct zsda_qp **qp_addr);
 
 uint16_t zsda_enqueue_op_burst(struct zsda_qp *qp, void **ops, const uint16_t nb_ops);
+uint16_t zsda_dequeue_op_burst(struct zsda_qp *qp, void **ops, const uint16_t nb_ops);
 
 int zsda_common_setup_qp(uint32_t dev_id, struct zsda_qp **qp_addr,
const uint16_t queue_pair_id,
-- 
2.27.0

[PATCH v11 00/12] drivers/zsda: introduce zsda drivers

2024-10-17 Thread Hanxiao Li
v11:
- use RTE_LOG_LINE in logging macro.
- fix some known bugs.

v10:
- delete new blank line at EOF
- Cleaning up some code in zsda_log.h

v9:
- add a new feature in default.ini.
- Re-split the patch according to the new PMD guidelines
  https://patches.dpdk.org/project/dpdk/patch/20241006184254.53499-1-nandinipersad...@gmail.com/
- Split SM4-XTS tests into a new series to releases.
- Separate out datapath(enqueue/dequeue) as a separate patch.

v8:

- fix some errors in cryptodevs/features/zsda.ini.

v7: 

- add release notes and some documentations.
- add MAINTAINERS context in the patch where the file/folder is added.
- add to meson.build only the files that are included in each patch.
- add a check for unsupported on Windows.
- notice the implicit cast in C.
- add cover letter.
- compile each of the patches individually.



Hanxiao Li (12):
  zsda: add zsdadev driver documents
  config: add zsda device number
  common/zsda: add some common functions
  common/zsda: configure zsda device
  common/zsda: configure zsda queue base functions
  common/zsda: configure zsda queue enqueue functions
  common/zsda: configure zsda queue dequeue functions
  compress/zsda: add zsda compress driver
  compress/zsda: add zsda compress PMD
  crypto/zsda: add crypto sessions configuration
  crypto/zsda: add zsda crypto driver
  crypto/zsda: add zsda crypto PMD

 MAINTAINERS |   7 +
 config/rte_config.h |   4 +
 doc/guides/compressdevs/features/zsda.ini   |  15 +
 doc/guides/compressdevs/index.rst   |   1 +
 doc/guides/compressdevs/zsda.rst|  45 +
 doc/guides/cryptodevs/features/zsda.ini |  51 ++
 doc/guides/cryptodevs/index.rst |   1 +
 doc/guides/cryptodevs/zsda.rst  | 260 ++
 doc/guides/rel_notes/release_24_11.rst  |   8 +
 drivers/common/zsda/meson.build |  39 +
 drivers/common/zsda/zsda_common.c   | 239 ++
 drivers/common/zsda/zsda_common.h   | 334 
 drivers/common/zsda/zsda_device.c   | 263 ++
 drivers/common/zsda/zsda_device.h   | 112 +++
 drivers/common/zsda/zsda_logs.c |  20 +
 drivers/common/zsda/zsda_logs.h |  25 +
 drivers/common/zsda/zsda_qp.c   | 876 
 drivers/common/zsda/zsda_qp.h   | 149 
 drivers/compress/zsda/zsda_comp.c   | 392 +
 drivers/compress/zsda/zsda_comp.h   |  52 ++
 drivers/compress/zsda/zsda_comp_pmd.c   | 464 +++
 drivers/compress/zsda/zsda_comp_pmd.h   |  34 +
 drivers/crypto/zsda/zsda_sym.c  | 273 ++
 drivers/crypto/zsda/zsda_sym.h  |  50 ++
 drivers/crypto/zsda/zsda_sym_capabilities.h | 111 +++
 drivers/crypto/zsda/zsda_sym_pmd.c  | 445 ++
 drivers/crypto/zsda/zsda_sym_pmd.h  |  33 +
 drivers/crypto/zsda/zsda_sym_session.c  | 512 
 drivers/crypto/zsda/zsda_sym_session.h  |  83 ++
 drivers/meson.build |   1 +
 30 files changed, 4899 insertions(+)
 create mode 100644 doc/guides/compressdevs/features/zsda.ini
 create mode 100644 doc/guides/compressdevs/zsda.rst
 create mode 100644 doc/guides/cryptodevs/features/zsda.ini
 create mode 100644 doc/guides/cryptodevs/zsda.rst
 create mode 100644 drivers/common/zsda/meson.build
 create mode 100644 drivers/common/zsda/zsda_common.c
 create mode 100644 drivers/common/zsda/zsda_common.h
 create mode 100644 drivers/common/zsda/zsda_device.c
 create mode 100644 drivers/common/zsda/zsda_device.h
 create mode 100644 drivers/common/zsda/zsda_logs.c
 create mode 100644 drivers/common/zsda/zsda_logs.h
 create mode 100644 drivers/common/zsda/zsda_qp.c
 create mode 100644 drivers/common/zsda/zsda_qp.h
 create mode 100644 drivers/compress/zsda/zsda_comp.c
 create mode 100644 drivers/compress/zsda/zsda_comp.h
 create mode 100644 drivers/compress/zsda/zsda_comp_pmd.c
 create mode 100644 drivers/compress/zsda/zsda_comp_pmd.h
 create mode 100644 drivers/crypto/zsda/zsda_sym.c
 create mode 100644 drivers/crypto/zsda/zsda_sym.h
 create mode 100644 drivers/crypto/zsda/zsda_sym_capabilities.h
 create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.c
 create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.h
 create mode 100644 drivers/crypto/zsda/zsda_sym_session.c
 create mode 100644 drivers/crypto/zsda/zsda_sym_session.h

-- 
2.27.0

[PATCH v11 12/12] crypto/zsda: add zsda crypto PMD

2024-10-17 Thread Hanxiao Li
The patch provides a series of interfaces for managing
and controlling the configuration, start, stop,
resource management, etc. of crypto devices.

Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/meson.build |   2 +-
 drivers/crypto/zsda/zsda_sym_capabilities.h | 111 +
 drivers/crypto/zsda/zsda_sym_pmd.c  | 445 
 drivers/crypto/zsda/zsda_sym_pmd.h  |  33 ++
 4 files changed, 590 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/zsda/zsda_sym_capabilities.h
 create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.c
 create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 59fd7a12ae..bf99ec5209 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -30,7 +30,7 @@ zsda_crypto_path = 'crypto/zsda'
 zsda_crypto_relpath = '../../' + zsda_crypto_path
 if zsda_crypto
libcrypto = dependency('libcrypto', required: false, method: 'pkg-config')
-   foreach f: ['zsda_sym_session.c', 'zsda_sym.c']
+   foreach f: ['zsda_sym_session.c', 'zsda_sym.c', 'zsda_sym_pmd.c']
sources += files(join_paths(zsda_crypto_relpath, f))
endforeach
deps += ['security']
diff --git a/drivers/crypto/zsda/zsda_sym_capabilities.h b/drivers/crypto/zsda/zsda_sym_capabilities.h
new file mode 100644
index 00..d9e6dc4b40
--- /dev/null
+++ b/drivers/crypto/zsda/zsda_sym_capabilities.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#ifndef _ZSDA_SYM_CAPABILITIES_H_
+#define _ZSDA_SYM_CAPABILITIES_H_
+
+static const struct rte_cryptodev_capabilities zsda_crypto_sym_capabilities[] = {
+   {/* SHA1 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SHA1,
+   .block_size = 64,
+   .key_size = {.min = 0, .max = 0, .increment = 0},
+   .digest_size = {.min = 20, .max = 20, .increment = 2},
+   .iv_size = {0} },
+   }   },
+   }
+   },
+   {/* SHA224 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SHA224,
+   .block_size = 64,
+   .key_size = {.min = 0, .max = 0, .increment = 0},
+   .digest_size = {.min = 28, .max = 28, .increment = 0},
+   .iv_size = {0} },
+   }   },
+   }
+   },
+   {/* SHA256 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SHA256,
+   .block_size = 64,
+   .key_size = {.min = 0, .max = 0, .increment = 0},
+   .digest_size = {.min = 32, .max = 32, .increment = 0},
+   .iv_size = {0} },
+   } },
+   }
+   },
+   {/* SHA384 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SHA384,
+   .block_size = 128,
+   .key_size = {.min = 0, .max = 0, .increment = 0},
+   .digest_size = {.min = 48, .max = 48, .increment = 0},
+   .iv_size = {0} },
+   } },
+   }
+   },
+   {/* SHA512 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SHA512,
+   .block_size = 128,
+   .key_size = {.min = 0, .max = 0, .increment = 0},
+   .digest_size = {.min = 64, .max = 64, .increment = 0},
+   .iv_size = {0} },
+   } },
+   }
+   },
+   {/* SM3 */
+   .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+   { .sym = {
+   .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+   { .auth = {
+   .algo = RTE_CRYPTO_AUTH_SM3,
+

[PATCH v11 05/12] common/zsda: configure zsda queue base functions

2024-10-17 Thread Hanxiao Li
Add support for zsdadev queue interfaces,
including queue start, stop, create, remove, etc.

Signed-off-by: Hanxiao Li 
---
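
Note for readers: zsda_get_queue_cfg() in this patch asks the device for
the service type of each hardware queue and appends the queue to a
per-service table, skipping queues whose type is not a known service. The
sketch below shows that bucketing step in isolation. The enum, table
layout and the array of queue types passed in are hypothetical; in the
patch the type comes from an admin message reply per queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical service types; the last value marks "not a valid service". */
enum sketch_service {
	SKETCH_SERVICE_COMP,
	SKETCH_SERVICE_DECOMP,
	SKETCH_SERVICE_INVALID,
};

#define SKETCH_MAX_QPS 16u /* hypothetical per-service capacity */

struct sketch_tables {
	uint8_t ring_num[SKETCH_SERVICE_INVALID][SKETCH_MAX_QPS];
	uint32_t count[SKETCH_SERVICE_INVALID];
};

/* Append each hardware queue index to the table of its reported service. */
static void
sketch_bucket_queues(const uint8_t *q_type, uint8_t nb_queues,
		     struct sketch_tables *t)
{
	assert(q_type != NULL && t != NULL);
	for (uint8_t i = 0; i < nb_queues; i++) {
		uint8_t type = q_type[i];

		if (type >= SKETCH_SERVICE_INVALID)
			continue; /* unknown service: leave the queue unused */
		if (t->count[type] >= SKETCH_MAX_QPS)
			continue; /* table full: ignore surplus queues */
		t->ring_num[type][t->count[type]++] = i;
	}
}
```

The capacity check is an addition made for the sketch. The excerpt of the
patch shown here does not bound the per-service index, so it presumably
relies on the device never reporting more queues than the table holds.
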
 drivers/common/zsda/meson.build |   1 +
 drivers/common/zsda/zsda_qp.c   | 715 
 drivers/common/zsda/zsda_qp.h   | 146 +++
 3 files changed, 862 insertions(+)
 create mode 100644 drivers/common/zsda/zsda_qp.c
 create mode 100644 drivers/common/zsda/zsda_qp.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 44f2375d8d..4d1eec8867 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -12,4 +12,5 @@ sources += files(
'zsda_common.c',
'zsda_logs.c',
'zsda_device.c',
+   'zsda_qp.c',
)
diff --git a/drivers/common/zsda/zsda_qp.c b/drivers/common/zsda/zsda_qp.c
new file mode 100644
index 00..f1fe6c5817
--- /dev/null
+++ b/drivers/common/zsda/zsda_qp.c
@@ -0,0 +1,715 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include 
+
+#include 
+
+#include "zsda_common.h"
+#include "zsda_logs.h"
+#include "zsda_device.h"
+#include "zsda_qp.h"
+
+#define RING_DIR_TX 0
+#define RING_DIR_RX 1
+
+struct ring_size {
+   uint16_t tx_msg_size;
+   uint16_t rx_msg_size;
+};
+uint8_t zsda_num_used_qps;
+
+struct ring_size zsda_qp_hw_ring_size[ZSDA_MAX_SERVICES] = {
+   [ZSDA_SERVICE_SYMMETRIC_ENCRYPT] = {128, 16},
+   [ZSDA_SERVICE_SYMMETRIC_DECRYPT] = {128, 16},
+   [ZSDA_SERVICE_COMPRESSION] = {32, 16},
+   [ZSDA_SERVICE_DECOMPRESSION] = {32, 16},
+   [ZSDA_SERVICE_HASH_ENCODE] = {32, 16},
+};
+
+static void
+zsda_set_queue_head_tail(const struct zsda_pci_device *zsda_pci_dev,
+const uint8_t qid)
+{
+   struct rte_pci_device *pci_dev =
+   zsda_devs[zsda_pci_dev->zsda_dev_id].pci_dev;
+   uint8_t *mmio_base = pci_dev->mem_resource[0].addr;
+
+   ZSDA_CSR_WRITE32(mmio_base + IO_DB_INITIAL_CONFIG + (qid * 4),
+SET_HEAD_INTI);
+}
+
+static int
+zsda_get_queue_cfg_by_id(const struct zsda_pci_device *zsda_pci_dev,
+const uint8_t qid, struct qinfo *qcfg)
+{
+   struct zsda_admin_req_qcfg req = {0};
+   struct zsda_admin_resp_qcfg resp = {0};
+   int ret = 0;
+   struct rte_pci_device *pci_dev =
+   zsda_devs[zsda_pci_dev->zsda_dev_id].pci_dev;
+
+   if (qid >= MAX_QPS_ON_FUNCTION) {
+   ZSDA_LOG(ERR, "qid beyond limit!");
+   return ZSDA_FAILED;
+   }
+
+   zsda_admin_msg_init(pci_dev);
+   req.msg_type = ZSDA_ADMIN_QUEUE_CFG_REQ;
+   req.qid = qid;
+
+   ret = zsda_send_admin_msg(pci_dev, &req, sizeof(req));
+   if (ret) {
+   ZSDA_LOG(ERR, "Failed! Send msg");
+   return ret;
+   }
+
+   ret = zsda_recv_admin_msg(pci_dev, &resp, sizeof(resp));
+   if (ret) {
+   ZSDA_LOG(ERR, "Failed! Receive msg");
+   return ret;
+   }
+
+   memcpy(qcfg, &resp.qcfg, sizeof(*qcfg));
+
+   return ZSDA_SUCCESS;
+}
+
+
+int
+zsda_get_queue_cfg(struct zsda_pci_device *zsda_pci_dev)
+{
+   uint8_t i;
+   uint32_t index;
+   enum zsda_service_type type;
+   struct zsda_qp_hw *zsda_hw_qps = zsda_pci_dev->zsda_hw_qps;
+   struct qinfo qcfg = {0};
+   int ret = 0;
+
+   for (i = 0; i < zsda_num_used_qps; i++) {
+   zsda_set_queue_head_tail(zsda_pci_dev, i);
+   ret = zsda_get_queue_cfg_by_id(zsda_pci_dev, i, &qcfg);
+   type = qcfg.q_type;
+   if (ret) {
+   ZSDA_LOG(ERR, "get queue cfg!");
+   return ret;
+   }
+   if (type >= ZSDA_SERVICE_INVALID)
+   continue;
+
+   index = zsda_pci_dev->zsda_qp_hw_num[type];
+   zsda_hw_qps[type].data[index].used = true;
+   zsda_hw_qps[type].data[index].tx_ring_num = i;
+   zsda_hw_qps[type].data[index].rx_ring_num = i;
+   zsda_hw_qps[type].data[index].tx_msg_size =
+   zsda_qp_hw_ring_size[type].tx_msg_size;
+   zsda_hw_qps[type].data[index].rx_msg_size =
+   zsda_qp_hw_ring_size[type].rx_msg_size;
+
+   zsda_pci_dev->zsda_qp_hw_num[type]++;
+   }
+
+   return ret;
+}
+
+
+static uint8_t
+zsda_get_num_used_qps(const struct rte_pci_device *pci_dev)
+{
+   uint8_t *mmio_base = pci_dev->mem_resource[0].addr;
+   uint8_t num_used_qps;
+
+   num_used_qps = ZSDA_CSR_READ8(mmio_base + 0);
+
+   return num_used_qps;
+}
+
+static int
+zsda_check_write(uint8_t *addr, const uint32_t dst_value)
+{
+   int times = ZSDA_TIME_NUM;
+   uint32_t val;
+
+   val = ZSDA_CSR_READ32(addr);
+
+   while ((val != dst_value) && times--) {
+   val = ZSDA_CSR_READ32(addr);
+   rte_d

[PATCH v11 10/12] crypto/zsda: add crypto sessions configuration

2024-10-17 Thread Hanxiao Li
Add session support for zsda cryptodev.

Signed-off-by: Hanxiao Li 
---
 drivers/common/zsda/meson.build|  15 +-
 drivers/crypto/zsda/zsda_sym_session.c | 512 +
 drivers/crypto/zsda/zsda_sym_session.h |  83 
 3 files changed, 609 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/zsda/zsda_sym_session.c
 create mode 100644 drivers/crypto/zsda/zsda_sym_session.h

diff --git a/drivers/common/zsda/meson.build b/drivers/common/zsda/meson.build
index 20fb37caaf..7030b5f05a 100644
--- a/drivers/common/zsda/meson.build
+++ b/drivers/common/zsda/meson.build
@@ -7,7 +7,7 @@ if is_windows
 subdir_done()
 endif
 
-deps += ['bus_pci', 'compressdev']
+deps += ['bus_pci', 'compressdev', 'cryptodev']
 sources += files(
'zsda_common.c',
'zsda_logs.c',
@@ -24,3 +24,16 @@ if zsda_compress
sources += files(join_paths(zsda_compress_relpath, f))
endforeach
 endif
+
+zsda_crypto = true
+zsda_crypto_path = 'crypto/zsda'
+zsda_crypto_relpath = '../../' + zsda_crypto_path
+if zsda_crypto
+   libcrypto = dependency('libcrypto', required: false, method: 'pkg-config')
+   foreach f: ['zsda_sym_session.c']
+   sources += files(join_paths(zsda_crypto_relpath, f))
+   endforeach
+   deps += ['security']
+   ext_deps += libcrypto
+   cflags += ['-DBUILD_ZSDA_SYM']
+endif
diff --git a/drivers/crypto/zsda/zsda_sym_session.c b/drivers/crypto/zsda/zsda_sym_session.c
new file mode 100644
index 00..83f9e608cf
--- /dev/null
+++ b/drivers/crypto/zsda/zsda_sym_session.c
@@ -0,0 +1,512 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 ZTE Corporation
+ */
+
+#include "cryptodev_pmd.h"
+
+#include "zsda_logs.h"
+#include "zsda_sym_session.h"
+
+/* AES KEY EXPANSION */
+/**
+ * AES S-boxes
+ * Sbox table: 8bits input convert to 8bits output
+ **/
+static const unsigned char aes_sbox[256] = {
+   /* 0  1  2  3  4  5  6  7  8  9  A  B
+    * C  D  E  F
+    */
+   0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b,
+   0xfe, 0xd7, 0xab, 0x76, 0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+   0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0, 0xb7, 0xfd, 0x93, 0x26,
+   0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+   0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2,
+   0xeb, 0x27, 0xb2, 0x75, 0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+   0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84, 0x53, 0xd1, 0x00, 0xed,
+   0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+   0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f,
+   0x50, 0x3c, 0x9f, 0xa8, 0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+   0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2, 0xcd, 0x0c, 0x13, 0xec,
+   0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+   0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14,
+   0xde, 0x5e, 0x0b, 0xdb, 0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+   0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79, 0xe7, 0xc8, 0x37, 0x6d,
+   0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+   0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f,
+   0x4b, 0xbd, 0x8b, 0x8a, 0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+   0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e, 0xe1, 0xf8, 0x98, 0x11,
+   0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+   0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f,
+   0xb0, 0x54, 0xbb, 0x16};
+
+/**
+ * The round constant word array, Rcon[i]
+ *
+ * From Wikipedia's article on the Rijndael key schedule @
+ * https://en.wikipedia.org/wiki/Rijndael_key_schedule#Rcon "Only the first some
+ * of these constants are actually used – up to rcon[10] for AES-128 (as 11
+ * round keys are needed), up to rcon[8] for AES-192, up to rcon[7] for AES-256.
+ * rcon[0] is not used in AES algorithm."
+ */
+static const unsigned char Rcon[11] = {0x8d, 0x01, 0x02, 0x04, 0x08, 0x10,
+  0x20, 0x40, 0x80, 0x1b, 0x36};
+
+#define GET_AES_SBOX_VAL(num) (aes_sbox[(num)])
+
+/* SM4 KEY EXPANSION */
+/*
+ * 32-bit integer manipulation macros (big endian)
+ */
+#ifndef GET_ULONG_BE
+#define GET_ULONG_BE(n, b, i)                                      \
+   {  \
+   (n) = ((unsigned int)(b)[(i)] << 24) | \
+ ((unsigned int)(b)[(i) + 1] << 16) | \
+ ((unsigned int)(b)[(i) + 2] << 8) |  \
+ ((unsigned in

[PATCH v11 02/12] config: add zsda device number

2024-10-17 Thread Hanxiao Li
Add the number of zsda devices.

Signed-off-by: Hanxiao Li 
---
 config/rte_config.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/config/rte_config.h b/config/rte_config.h
index dd7bb0d35b..e1e85b3291 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -117,6 +117,10 @@
 #define RTE_PMD_QAT_COMP_SGL_MAX_SEGMENTS 16
 #define RTE_PMD_QAT_COMP_IM_BUFFER_SIZE 65536
 
+/* ZSDA device */
+/* Max. number of ZSDA devices which can be attached */
+#define RTE_PMD_ZSDA_MAX_PCI_DEVICES 256
+
 /* virtio crypto defines */
 #define RTE_MAX_VIRTIO_CRYPTO 32
 
-- 
2.27.0

Re: [PATCH v3] hash: separate param checks in hash create func

2024-10-17 Thread Thomas Monjalon
Recheck-request: iol-unit-amd64-testing, iol-unit-arm64-testing




Re: [EXTERNAL] Re: [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Robin Jarry

Hi Nitin, all,

Nitin Saxena, Oct 17, 2024 at 09:03:

Hi Robin/David and all,

We realized the feature arc patch series is difficult to understand as 
a new concept. Our objectives with the feature arc changes are the following:


1. Allow reusability of standard DPDK nodes (defined in lib/nodes/*) 
   with out-of-tree applications (like grout). Currently out-of-tree 
   graph applications are duplicating standard nodes but not reusing 
   the standard ones which are available. In the long term, we would 
   like to mature standard DPDK nodes with flexibility of hooking them 
   to out-of-tree application nodes.


It would be ideal if the in-built nodes could be reused. When we started 
working on grout, I tried multiple approaches where I could reuse these 
nodes, but all failed. The nodes public API seems tailored for app/graph 
but does not fit well with other control plane implementations.


One of the main issues I had is that the ethdev_rx and ethdev_tx nodes 
are cloned per rxq / txq associated with a graph worker. The rte_node 
API requires that every clone has a unique name. This in turn makes hot 
plugging of DPDK ports very complex, if not impossible.


For example, with the in-built nodes, it is not possible to change the 
number of ports or their number of RX queues without destroying the 
whole graph and creating a new one from scratch.


Also, the current implementation of "ip{4,6}-rewrite" handles writing 
ethernet header data. This prevents the node from being used for an 
IP-in-IP tunnel interface as we did in grout.


Do you think we could change the in-built nodes to enforce OSI layer 
separation of concerns? It would make them much more flexible. It may 
cause a slight drop of performance because you'd be splitting processing 
in two different nodes. But I think flexibility is more important. 
Otherwise, the in-built nodes can only be used for very specific 
use-cases.


Finally, I would like to improve the rte_node API to allow defining and 
enforcing per-packet metadata that every node expects as input. The 
current in-built nodes rely on mbuf dynamic fields for this but this 
means you only have 9x32 bits available. And using all of these may 
break some drivers (ixgbe) that rely on dynfields to work. Have you 
considered using mbuf private data for this?




2. Flexibility to enable/disable sub-graphs per interface based on 
   runtime configuration updates. Protocol sub-graphs can be 
   selectively enabled for a few (or all) interfaces at runtime


3. More than one sub-graph/feature can be enabled on an interface. 
   So a packet has to follow a sequential ordering node path on worker 
   cores. Packets may need to move from one sub-graph to another 
   sub-graph per interface


4. Last but not least, an optimized implementation which does not (or 
   minimally) stop worker cores for any control plane runtime updates. 
   Any performance regression should also be avoided


I am planning to create a draft presentation on feature arc which 
I can share, when ready, to discuss. If needed, I can also plan to 
present that in one of the DPDK community meetings. There we can also 
discuss whether there are any alternatives for achieving the above objectives


Looking forward to this.

Thanks!



Re: [PATCH 0/5] Increase minimum meson version

2024-10-17 Thread David Marchand
On Fri, Sep 20, 2024 at 2:57 PM Bruce Richardson
 wrote:
>
> This patchset proposed increasing the minimum meson version to 0.57
> and makes changes to update our build files appropriately for that
> change: replacing deprecated functions, removing unnecessary version
> checks and taking advantage of some new capabilities.
>
> Why 0.57? No one particular reason; it's mainly a conservative version
> bump that doesn't have many impacts, but still gives us the minimum
> updates we need to replace the deprecated get_cross_properties fn
> and have a few extra features guaranteed available.
>
> Bruce Richardson (5):
>   build: increase minimum meson version to 0.57
>   build: remove version check on compiler links function
>   build: remove unnecessary version checks
>   build: use version file support from meson
>   build: replace deprecated meson function
>
>  .ci/linux-setup.sh| 2 +-
>  config/arm/meson.build| 4 ++--
>  config/meson.build| 8 
>  config/riscv/meson.build  | 4 ++--
>  doc/api/meson.build   | 2 +-
>  doc/guides/linux_gsg/sys_reqs.rst | 2 +-
>  doc/guides/prog_guide/build-sdk-meson.rst | 2 +-
>  drivers/common/qat/meson.build| 2 +-
>  drivers/crypto/ipsec_mb/meson.build   | 2 +-
>  drivers/event/cnxk/meson.build| 2 +-
>  drivers/meson.build   | 7 ++-
>  drivers/net/cnxk/meson.build  | 2 +-
>  lib/meson.build   | 6 --
>  meson.build   | 7 ++-
>  14 files changed, 20 insertions(+), 32 deletions(-)

The series looks good, and CI guys gave me the green light.
Series applied, thanks Bruce.


There is one remaining TODO in config/meson.build:

# MS linker requires special treatment.
# TODO: use cc.get_linker_id() with Meson >= 0.54
is_ms_compiler = is_windows and (cc.get_id() == 'msvc')
is_ms_linker = is_windows and (cc.get_id() == 'clang' or
is_ms_compiler)


-- 
David Marchand



[PATCH 03/10] net/mlx5: rework creation of unicast flow rules

2024-10-17 Thread Dariusz Sosnowski
Rework the code responsible for creation of unicast control flow rules,
to allow creation of:

- unicast DMAC flow rules and
- unicast DMAC with VLAN flow rules,

outside of mlx5_traffic_enable() called when port is started.
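
The split described above can be pictured with a small standalone sketch.
It is not the mlx5 code: `struct mac_addr`, `rule_create_fn`,
`create_unicast_rules()` and `count_rule()` are hypothetical names used only
for illustration. The point is that once "create the rule for one address"
is its own function, the bulk path run at port start becomes a loop over the
address table, and the same function can be called directly for a single
address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_ether_addr. */
struct mac_addr {
	uint8_t bytes[6];
};

/* Callback that creates the rule(s) for one address; returns 0 on success. */
typedef int (*rule_create_fn)(const struct mac_addr *addr, void *ctx);

/* An all-zero entry marks an unused slot in the address table. */
static bool
mac_is_zero(const struct mac_addr *addr)
{
	for (size_t i = 0; i < sizeof(addr->bytes); i++) {
		if (addr->bytes[i] != 0)
			return false;
	}
	return true;
}

/* Bulk path: walk the table, skip unused slots, stop on the first error. */
static int
create_unicast_rules(const struct mac_addr *addrs, size_t n,
		     rule_create_fn create, void *ctx)
{
	assert(addrs != NULL && create != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (mac_is_zero(&addrs[i]))
			continue;
		ret = create(&addrs[i], ctx);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback: only counts how many rules were requested. */
static int
count_rule(const struct mac_addr *addr, void *ctx)
{
	(void)addr;
	++*(unsigned int *)ctx;
	return 0;
}
```

Calling `count_rule()` directly corresponds to the single-address path,
while `create_unicast_rules()` corresponds to the loop that remains in the
port start path.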

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/meson.build  |   1 +
 drivers/net/mlx5/mlx5_flow.h  |   9 ++
 drivers/net/mlx5/mlx5_flow_hw.c   | 215 --
 drivers/net/mlx5/mlx5_flow_hw_stubs.c |  41 +
 4 files changed, 219 insertions(+), 47 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_flow_hw_stubs.c

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index eb5eb2cce7..0114673491 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -23,6 +23,7 @@ sources = files(
 'mlx5_flow_dv.c',
 'mlx5_flow_aso.c',
 'mlx5_flow_flex.c',
+'mlx5_flow_hw_stubs.c',
 'mlx5_mac.c',
 'mlx5_rss.c',
 'mlx5_rx.c',
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 86a1476879..2ff0b25d4d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2990,6 +2990,15 @@ struct mlx5_flow_hw_ctrl_fdb {
 #define MLX5_CTRL_VLAN_FILTER(RTE_BIT32(6))
 
 int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
+
+/** Create a control flow rule for matching unicast DMAC (HWS). */
+int mlx5_flow_hw_ctrl_flow_dmac(struct rte_eth_dev *dev, const struct rte_ether_addr *addr);
+
+/** Create a control flow rule for matching unicast DMAC with VLAN (HWS). */
+int mlx5_flow_hw_ctrl_flow_dmac_vlan(struct rte_eth_dev *dev,
+const struct rte_ether_addr *addr,
+const uint16_t vlan);
+
 void mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev);
 
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index f6918825eb..afc9778b97 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15896,12 +15896,14 @@ __flow_hw_ctrl_flows_single_vlan(struct rte_eth_dev *dev,
 }
 
 static int
-__flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
-struct rte_flow_template_table *tbl,
-const enum mlx5_flow_ctrl_rx_eth_pattern_type pattern_type,
-const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+__flow_hw_ctrl_flows_unicast_create(struct rte_eth_dev *dev,
+   struct rte_flow_template_table *tbl,
+   const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type,
+   const struct rte_ether_addr *addr)
 {
-   struct rte_flow_item_eth eth_spec;
+   struct rte_flow_item_eth eth_spec = {
+   .hdr.dst_addr = *addr,
+   };
struct rte_flow_item items[5];
struct rte_flow_action actions[] = {
{ .type = RTE_FLOW_ACTION_TYPE_RSS },
@@ -15909,15 +15911,11 @@ __flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
};
struct mlx5_hw_ctrl_flow_info flow_info = {
.type = MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC,
+   .uc = {
+   .dmac = *addr,
+   },
};
-   const struct rte_ether_addr cmp = {
-   .addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
-   };
-   unsigned int i;
-
-   RTE_SET_USED(pattern_type);
 
-   memset(&eth_spec, 0, sizeof(eth_spec));
memset(items, 0, sizeof(items));
items[0] = (struct rte_flow_item){
.type = RTE_FLOW_ITEM_TYPE_ETH,
@@ -15927,28 +15925,47 @@ __flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
items[2] = flow_hw_get_ctrl_rx_l3_item(rss_type);
items[3] = flow_hw_get_ctrl_rx_l4_item(rss_type);
items[4] = (struct rte_flow_item){ .type = RTE_FLOW_ITEM_TYPE_END };
+
+   if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0, &flow_info, false))
+   return -rte_errno;
+
+   return 0;
+}
+
+static int
+__flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
+struct rte_flow_template_table *tbl,
+const enum mlx5_flow_ctrl_rx_expanded_rss_type rss_type)
+{
+   unsigned int i;
+   int ret;
+
for (i = 0; i < MLX5_MAX_MAC_ADDRESSES; ++i) {
struct rte_ether_addr *mac = &dev->data->mac_addrs[i];
 
-   if (!memcmp(mac, &cmp, sizeof(*mac)))
+   if (rte_is_zero_ether_addr(mac))
continue;
-   eth_spec.hdr.dst_addr = *mac;
-   flow_info.uc.dmac = *mac;
-   if (flow_hw_create_ctrl_flow(dev, dev,
-tbl, items, 0, actions, 0, &flow_info, false))
-   return -rte_

[PATCH 04/10] net/mlx5: support destroying unicast flow rules

2024-10-17 Thread Dariusz Sosnowski
This patch adds support for destroying:

- unicast DMAC control flow rules and
- unicast DMAC with VLAN control flow rules,

without affecting any other control flow rules,
when HWS flow engine is used.
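
The selective removal described above can be sketched in isolation with the
`<sys/queue.h>` list macros. This is a simplified model, not the driver
code: `struct rule`, `rule_add()`, `rule_destroy_dmac()` and `rule_count()`
are hypothetical, and the hardware destroy step (which can fail in the real
driver) is left out. What it shows is the traversal pattern: the successor
is saved before the current entry is unlinked and freed, and the walk
continues to the end because several entries may match the same address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical rule kinds; only RULE_UC_DMAC is eligible for removal here. */
enum rule_type { RULE_OTHER, RULE_UC_DMAC };

struct rule {
	LIST_ENTRY(rule) next;
	enum rule_type type;
	uint8_t dmac[6];
};

LIST_HEAD(rule_list, rule);

/* Allocate a tracking entry and link it at the head of the list. */
static struct rule *
rule_add(struct rule_list *list, enum rule_type type, const uint8_t dmac[6])
{
	struct rule *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return NULL;
	r->type = type;
	memcpy(r->dmac, dmac, sizeof(r->dmac));
	LIST_INSERT_HEAD(list, r, next);
	return r;
}

/* Remove every unicast DMAC entry for the given address; return the count. */
static unsigned int
rule_destroy_dmac(struct rule_list *list, const uint8_t dmac[6])
{
	struct rule *entry;
	unsigned int removed = 0;

	assert(list != NULL);
	entry = LIST_FIRST(list);
	while (entry != NULL) {
		/* Save the successor first: entry may be freed below. */
		struct rule *tmp = LIST_NEXT(entry, next);

		if (entry->type == RULE_UC_DMAC &&
		    memcmp(entry->dmac, dmac, sizeof(entry->dmac)) == 0) {
			LIST_REMOVE(entry, next);
			free(entry);
			removed++;
		}
		entry = tmp;
	}
	return removed;
}

/* Count the entries currently tracked in the list. */
static unsigned int
rule_count(const struct rule_list *list)
{
	const struct rule *r;
	unsigned int n = 0;

	LIST_FOREACH(r, list, next)
		n++;
	return n;
}
```

Entries of other types, or for other addresses, are left untouched, which
is the property the patch relies on.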

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5_flow.h  |  8 +++
 drivers/net/mlx5/mlx5_flow_hw.c   | 72 +++
 drivers/net/mlx5/mlx5_flow_hw_stubs.c | 27 ++
 3 files changed, 107 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 2ff0b25d4d..165d17e40a 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2994,11 +2994,19 @@ int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
 /** Create a control flow rule for matching unicast DMAC (HWS). */
 int mlx5_flow_hw_ctrl_flow_dmac(struct rte_eth_dev *dev, const struct rte_ether_addr *addr);
 
+/** Destroy a control flow rule for matching unicast DMAC (HWS). */
+int mlx5_flow_hw_ctrl_flow_dmac_destroy(struct rte_eth_dev *dev, const struct rte_ether_addr *addr);
+
 /** Create a control flow rule for matching unicast DMAC with VLAN (HWS). */
 int mlx5_flow_hw_ctrl_flow_dmac_vlan(struct rte_eth_dev *dev,
 const struct rte_ether_addr *addr,
 const uint16_t vlan);
 
+/** Destroy a control flow rule for matching unicast DMAC with VLAN (HWS). */
+int mlx5_flow_hw_ctrl_flow_dmac_vlan_destroy(struct rte_eth_dev *dev,
+const struct rte_ether_addr *addr,
+const uint16_t vlan);
+
 void mlx5_flow_hw_cleanup_ctrl_rx_templates(struct rte_eth_dev *dev);
 
 int mlx5_flow_group_to_table(struct rte_eth_dev *dev,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index afc9778b97..35e9eead7e 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -16211,6 +16211,41 @@ mlx5_flow_hw_ctrl_flow_dmac(struct rte_eth_dev *dev,
 addr, 0);
 }
 
+int
+mlx5_flow_hw_ctrl_flow_dmac_destroy(struct rte_eth_dev *dev,
+   const struct rte_ether_addr *addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_hw_ctrl_flow *entry;
+   struct mlx5_hw_ctrl_flow *tmp;
+   int ret;
+
+   /*
+* HWS does not have automatic RSS flow expansion,
+* so each variant of the control flow rule is a separate entry in the list.
+* In that case, the whole list must be traversed.
+*/
+   entry = LIST_FIRST(&priv->hw_ctrl_flows);
+   while (entry != NULL) {
+   tmp = LIST_NEXT(entry, next);
+
+   if (entry->info.type != MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC ||
+   !rte_is_same_ether_addr(addr, &entry->info.uc.dmac)) {
+   entry = tmp;
+   continue;
+   }
+
+   ret = flow_hw_destroy_ctrl_flow(dev, entry->flow);
+   LIST_REMOVE(entry, next);
+   mlx5_free(entry);
+   if (ret)
+   return ret;
+
+   entry = tmp;
+   }
+   return 0;
+}
+
 int
 mlx5_flow_hw_ctrl_flow_dmac_vlan(struct rte_eth_dev *dev,
 const struct rte_ether_addr *addr,
@@ -16220,6 +16255,43 @@ mlx5_flow_hw_ctrl_flow_dmac_vlan(struct rte_eth_dev *dev,
 addr, vlan);
 }
 
+int
+mlx5_flow_hw_ctrl_flow_dmac_vlan_destroy(struct rte_eth_dev *dev,
+const struct rte_ether_addr *addr,
+const uint16_t vlan)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_hw_ctrl_flow *entry;
+   struct mlx5_hw_ctrl_flow *tmp;
+   int ret;
+
+   /*
+* HWS does not have automatic RSS flow expansion,
+* so each variant of the control flow rule is a separate entry in the list.
+* In that case, the whole list must be traversed.
+*/
+   entry = LIST_FIRST(&priv->hw_ctrl_flows);
+   while (entry != NULL) {
+   tmp = LIST_NEXT(entry, next);
+
+   if (entry->info.type != MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN ||
+   !rte_is_same_ether_addr(addr, &entry->info.uc.dmac) ||
+   vlan != entry->info.uc.vlan) {
+   entry = tmp;
+   continue;
+   }
+
+   ret = flow_hw_destroy_ctrl_flow(dev, entry->flow);
+   LIST_REMOVE(entry, next);
+   mlx5_free(entry);
+   if (ret)
+   return ret;
+
+   entry = tmp;
+   }
+   return 0;
+}
+
 static __rte_always_inline uint32_t
 mlx5_reformat_domain_to_tbl_type(const struct rte_flow_indir_action_conf *domain)
 {
diff 

[PATCH 02/10] net/mlx5: add checking if unicast flow rule exists

2024-10-17 Thread Dariusz Sosnowski
Add 2 internal functions for checking if:

- unicast DMAC control flow rule or
- unicast DMAC with VLAN control flow rule,

was created.
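
As a rough illustration of these two checks, the following standalone
sketch does the same kind of lookup over a `<sys/queue.h>` list. The names
(`struct rule`, `uc_dmac_exists()`, `uc_dmac_vlan_exists()`) are
hypothetical and the structures are reduced to the fields the lookup needs;
the real functions operate on the PMD's private control flow list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical rule kinds tracked in the list. */
enum rule_type { RULE_OTHER, RULE_UC_DMAC, RULE_UC_DMAC_VLAN };

struct rule {
	LIST_ENTRY(rule) next;
	enum rule_type type;
	uint8_t dmac[6];
	uint16_t vlan; /* Only meaningful for RULE_UC_DMAC_VLAN. */
};

LIST_HEAD(rule_list, rule);

/* True if a unicast DMAC rule for this address is tracked. */
static bool
uc_dmac_exists(const struct rule_list *list, const uint8_t dmac[6])
{
	const struct rule *r;

	assert(list != NULL);
	LIST_FOREACH(r, list, next) {
		if (r->type == RULE_UC_DMAC &&
		    memcmp(r->dmac, dmac, sizeof(r->dmac)) == 0)
			return true;
	}
	return false;
}

/* True if a unicast DMAC + VLAN rule for this (address, VLAN) pair is tracked. */
static bool
uc_dmac_vlan_exists(const struct rule_list *list, const uint8_t dmac[6],
		    uint16_t vlan)
{
	const struct rule *r;

	assert(list != NULL);
	LIST_FOREACH(r, list, next) {
		if (r->type == RULE_UC_DMAC_VLAN &&
		    memcmp(r->dmac, dmac, sizeof(r->dmac)) == 0 &&
		    r->vlan == vlan)
			return true;
	}
	return false;
}
```

Note that the rule type is part of the match: a DMAC-only rule does not
satisfy the DMAC + VLAN check, and vice versa.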

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.h  | 11 +++
 drivers/net/mlx5/mlx5_flow.c | 37 
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 80829be5b4..3551b793d6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1831,6 +1831,17 @@ struct mlx5_hw_ctrl_flow_info {
};
 };
 
+/** Returns true if a control flow rule with unicast DMAC match on given address was created. */
+bool mlx5_ctrl_flow_uc_dmac_exists(struct rte_eth_dev *dev, const struct rte_ether_addr *addr);
+
+/**
+ * Returns true if a control flow rule with unicast DMAC and VLAN match
+ * on given values was created.
+ */
+bool mlx5_ctrl_flow_uc_dmac_vlan_exists(struct rte_eth_dev *dev,
+   const struct rte_ether_addr *addr,
+   const uint16_t vid);
+
 /** Entry for tracking control flow rules in HWS. */
 struct mlx5_hw_ctrl_flow {
LIST_ENTRY(mlx5_hw_ctrl_flow) next;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index effc61cdc9..69f8bd8d97 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -12180,3 +12180,40 @@ rte_pmd_mlx5_destroy_geneve_tlv_parser(void *handle)
return -rte_errno;
 #endif
 }
+
+bool
+mlx5_ctrl_flow_uc_dmac_exists(struct rte_eth_dev *dev, const struct rte_ether_addr *addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_hw_ctrl_flow *entry;
+   bool exists = false;
+
+   LIST_FOREACH(entry, &priv->hw_ctrl_flows, next) {
+   if (entry->info.type == MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC &&
+   rte_is_same_ether_addr(addr, &entry->info.uc.dmac)) {
+   exists = true;
+   break;
+   }
+   }
+   return exists;
+}
+
+bool
+mlx5_ctrl_flow_uc_dmac_vlan_exists(struct rte_eth_dev *dev,
+  const struct rte_ether_addr *addr,
+  const uint16_t vid)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_hw_ctrl_flow *entry;
+   bool exists = false;
+
+   LIST_FOREACH(entry, &priv->hw_ctrl_flows, next) {
+   if (entry->info.type == MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN &&
+   rte_is_same_ether_addr(addr, &entry->info.uc.dmac) &&
+   vid == entry->info.uc.vlan) {
+   exists = true;
+   break;
+   }
+   }
+   return exists;
+}
-- 
2.39.5



[PATCH 01/10] net/mlx5: track unicast DMAC control flow rules

2024-10-17 Thread Dariusz Sosnowski
All control flow rules in NIC Rx domain, created by HWS flow engine,
were assigned MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS type.
To allow checking if a flow rule with given DMAC or VLAN was created,
the list of associated types is extended with:

- type for unicast DMAC flow rules,
- type for unicast DMAC with VLAN flow rules.

These will be used in the follow-up commit,
which adds functions for checking if a given control flow rule exists.
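
A reduced model of this bookkeeping is sketched below. The type and
function names (`ctrl_rule_type`, `ctrl_rule_info`,
`ctrl_rule_info_dmac()`, ...) are hypothetical and much smaller than the
mlx5 definitions; the sketch only shows the idea of a type tag that selects
which member of a union is valid, so that a later lookup can compare the
stored DMAC and VLAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced set of rule kinds. */
enum ctrl_rule_type {
	CTRL_RULE_DEFAULT_RX_RSS,
	CTRL_RULE_UNICAST_DMAC,
	CTRL_RULE_UNICAST_DMAC_VLAN,
};

/* Hypothetical stand-in for struct rte_ether_addr. */
struct mac_addr {
	uint8_t bytes[6];
};

/* The type tag decides which union member carries valid data. */
struct ctrl_rule_info {
	enum ctrl_rule_type type;
	union {
		uint32_t tx_repr_sq;
		struct {
			struct mac_addr dmac; /* Valid for both unicast kinds. */
			uint16_t vlan;        /* Valid only for DMAC + VLAN. */
		} uc;
	};
};

/* Build the info record for a unicast DMAC rule. */
static struct ctrl_rule_info
ctrl_rule_info_dmac(const struct mac_addr *mac)
{
	struct ctrl_rule_info info;

	assert(mac != NULL);
	memset(&info, 0, sizeof(info));
	info.type = CTRL_RULE_UNICAST_DMAC;
	info.uc.dmac = *mac;
	return info;
}

/* Build the info record for a unicast DMAC + VLAN rule. */
static struct ctrl_rule_info
ctrl_rule_info_dmac_vlan(const struct mac_addr *mac, uint16_t vlan)
{
	struct ctrl_rule_info info = ctrl_rule_info_dmac(mac);

	info.type = CTRL_RULE_UNICAST_DMAC_VLAN;
	info.uc.vlan = vlan;
	return info;
}

/* The VLAN field may only be read when the tag says it is present. */
static bool
ctrl_rule_info_has_vlan(const struct ctrl_rule_info *info)
{
	return info->type == CTRL_RULE_UNICAST_DMAC_VLAN;
}
```

Keeping the address and VLAN next to the type tag means the tracking entry
itself answers "which rule is this?" without inspecting the flow rule
handle.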

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.h | 15 +++
 drivers/net/mlx5/mlx5_flow_hw.c | 11 +++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 18b4c15a26..80829be5b4 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1796,6 +1796,8 @@ enum mlx5_hw_ctrl_flow_type {
MLX5_HW_CTRL_FLOW_TYPE_TX_REPR_MATCH,
MLX5_HW_CTRL_FLOW_TYPE_LACP_RX,
MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS,
+   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC,
+   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN,
 };
 
 /** Additional info about control flow rule. */
@@ -1813,6 +1815,19 @@ struct mlx5_hw_ctrl_flow_info {
 * then fields contains matching SQ number.
 */
uint32_t tx_repr_sq;
+   /** Contains data relevant for unicast control flow rules. */
+   struct {
+   /**
+* If control flow is a unicast DMAC (or with VLAN) flow rule,
+* then this field contains DMAC.
+*/
+   struct rte_ether_addr dmac;
+   /**
+* If control flow is a unicast DMAC with VLAN flow rule,
+* then this field contains VLAN ID.
+*/
+   uint16_t vlan;
+   } uc;
};
 };
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index c5ddd1d404..f6918825eb 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -15908,7 +15908,7 @@ __flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct mlx5_hw_ctrl_flow_info flow_info = {
-   .type = MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS,
+   .type = MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC,
};
const struct rte_ether_addr cmp = {
.addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
@@ -15932,7 +15932,8 @@ __flow_hw_ctrl_flows_unicast(struct rte_eth_dev *dev,
 
if (!memcmp(mac, &cmp, sizeof(*mac)))
continue;
-   memcpy(&eth_spec.hdr.dst_addr.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+   eth_spec.hdr.dst_addr = *mac;
+   flow_info.uc.dmac = *mac;
if (flow_hw_create_ctrl_flow(dev, dev,
 tbl, items, 0, actions, 0, &flow_info, false))
return -rte_errno;
@@ -15954,7 +15955,7 @@ __flow_hw_ctrl_flows_unicast_vlan(struct rte_eth_dev *dev,
{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct mlx5_hw_ctrl_flow_info flow_info = {
-   .type = MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS,
+   .type = MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN,
};
const struct rte_ether_addr cmp = {
.addr_bytes = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
@@ -15979,13 +15980,15 @@ __flow_hw_ctrl_flows_unicast_vlan(struct rte_eth_dev *dev,
 
if (!memcmp(mac, &cmp, sizeof(*mac)))
continue;
-   memcpy(&eth_spec.hdr.dst_addr.addr_bytes, mac->addr_bytes, RTE_ETHER_ADDR_LEN);
+   eth_spec.hdr.dst_addr = *mac;
+   flow_info.uc.dmac = *mac;
for (j = 0; j < priv->vlan_filter_n; ++j) {
uint16_t vlan = priv->vlan_filter[j];
struct rte_flow_item_vlan vlan_spec = {
.hdr.vlan_tci = rte_cpu_to_be_16(vlan),
};
 
+   flow_info.uc.vlan = vlan;
items[1].spec = &vlan_spec;
 if (flow_hw_create_ctrl_flow(dev, dev, tbl, items, 0, actions, 0,
 &flow_info, false))
-- 
2.39.5



[PATCH 00/10] net/mlx5: improve MAC address and VLAN add latency

2024-10-17 Thread Dariusz Sosnowski
Whenever a new MAC address is added to the port, mlx5 PMD will:

- Add this address to `dev->data->mac_addrs[]`.
- Destroy all control flow rules.
- Recreate all control flow rules.

Similar logic is also implemented for VLAN filters.

Because of this logic, the latency of adding a new MAC address
(i.e., the latency of the `rte_eth_dev_mac_addr_add()` function call)
is linear in the number of MAC addresses already configured.
Since each operation of creating/destroying a control flow rule
involves an `ioctl()` syscall, on some setups the latency of adding
a single MAC address can reach ~100 ms when the port is operating
with >= 100 MAC addresses.
The same problem exists for VLAN filters (and VLAN filters even compound it).

This patchset aims to resolve these issues
by reworking how mlx5 PMD handles adding/removing MAC addresses and VLAN filters.
Instead of recreating all control flow rules,
only the necessary flow rules are created/removed on each operation,
thus minimizing the number of syscalls triggered.
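
To make the difference concrete, here is a toy cost model in C. It is an
illustration under stated assumptions, not a measurement and not driver
code: one "operation" stands for one rule create or destroy,
`rules_per_addr` is a hypothetical parameter for how many rules a single
address needs, and all other control flow rules and VLAN filters are
ignored.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Operations needed to add one address when every existing rule is
 * destroyed and then all rules (old + new) are recreated.
 */
static size_t
ops_full_rebuild(size_t configured, size_t rules_per_addr)
{
	size_t destroyed = configured * rules_per_addr;
	size_t created = (configured + 1) * rules_per_addr;

	return destroyed + created;
}

/* Operations needed when only the rules for the new address are created. */
static size_t
ops_incremental(size_t rules_per_addr)
{
	return rules_per_addr;
}
```

Under this model, adding the 128th address with one rule per address costs
`ops_full_rebuild(127, 1)`, i.e. 127 destroys plus 128 creates, whereas
`ops_incremental(1)` is a single create regardless of how many addresses
are already configured.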

Summary of patches:

- Patch 1-2 - Extends the existing `mlx5_hw_ctrl_flow_type` enum with special variants,
  which will be used for tracking MAC and VLAN control flow rules.
- Patch 3-4 - Refactors HWS code for control flow rule creation to allow
  creation of specific control flow rules with unicast MAC/VLAN match.
  Functions for deleting specific rules are also added.
- Patch 5-6 - Prepares the control flow rules list, used by the HWS flow engine,
  to be used by other flow engines.
  The goal is to reuse similar logic in the Verbs and DV flow engines.
- Patch 7-8 - Adjusts legacy flow engines, so that unicast DMAC/VLAN control flow rules
  are added to the control flow rules list.
  Also exposes functions for creating/destroying specific ones.
- Patch 9-10 - Extends the `mlx5_traffic_*` interface with `mlx5_traffic_mac_add/remove` and
  `mlx5_traffic_vlan_add/remove` functions.
  They are used in the implementations of the DPDK APIs for adding/removing MAC addresses/VLAN filters,
  and their goal is to update the set of control flow rules in the minimal number of steps possible,
  without recreating the rules.

As a result of these patches, the time to add the 128th MAC address,
after the 127th was added, drops **from ~72 ms to ~197 us** (at least on my setup).

Dariusz Sosnowski (10):
  net/mlx5: track unicast DMAC control flow rules
  net/mlx5: add checking if unicast flow rule exists
  net/mlx5: rework creation of unicast flow rules
  net/mlx5: support destroying unicast flow rules
  net/mlx5: rename control flow rules types
  net/mlx5: shared init of control flow rules
  net/mlx5: add legacy unicast flow rules management
  net/mlx5: add legacy unicast flow rule registration
  net/mlx5: add dynamic unicast flow rule management
  net/mlx5: optimize MAC address and VLAN filter handling

 drivers/net/mlx5/linux/mlx5_os.c  |   3 +
 drivers/net/mlx5/meson.build  |   1 +
 drivers/net/mlx5/mlx5.h   |  62 +++--
 drivers/net/mlx5/mlx5_flow.c  | 149 ++-
 drivers/net/mlx5/mlx5_flow.h  |  36 +++
 drivers/net/mlx5/mlx5_flow_hw.c   | 349 --
 drivers/net/mlx5/mlx5_flow_hw_stubs.c |  68 +
 drivers/net/mlx5/mlx5_mac.c   |  41 ++-
 drivers/net/mlx5/mlx5_trigger.c   | 262 ++-
 drivers/net/mlx5/mlx5_vlan.c  |   9 +-
 drivers/net/mlx5/windows/mlx5_os.c|   3 +
 11 files changed, 867 insertions(+), 116 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_flow_hw_stubs.c

--
2.39.5



Re: [PATCH v2 0/4] simplify doing 32-bit DPDK builds

2024-10-17 Thread David Marchand
On Fri, Sep 6, 2024 at 6:13 PM Bruce Richardson
 wrote:
>
> To make it easier to build and test DPDK on 32-bit x86 add a set of
> cross-compile ini files for a number of common linux distributions.
> This avoids the user having to remember strange meson incantations
> with '-m32' in various args values and with the PKG_CONFIG_LIBDIR
> overridden in the environment.
>
> Bruce Richardson (4):
>   config: add 32-bit x86 debian cross-compilation file
>   config: add fedora 32-bit x86 cross-compile file
>   config: add arch 32-bit cross-compilation file
>   devtools/test-meson-builds: use cross files for 32bit build
>
>  config/x86/cross-32bit-arch.ini   | 22 ++
>  config/x86/cross-32bit-debian.ini | 22 ++
>  config/x86/cross-32bit-fedora.ini | 22 ++
>  devtools/test-meson-builds.sh | 13 +
>  4 files changed, 71 insertions(+), 8 deletions(-)
>  create mode 100644 config/x86/cross-32bit-arch.ini
>  create mode 100644 config/x86/cross-32bit-debian.ini
>  create mode 100644 config/x86/cross-32bit-fedora.ini

Thank you, relying on meson feature is better.
Series applied, now that meson minimum version is set to 0.57.

I'll post an update to the .ci script for rc2.


-- 
David Marchand



[PATCH 05/10] net/mlx5: rename control flow rules types

2024-10-17 Thread Dariusz Sosnowski
The structs and enumerations used for management of
HWS control flow rules do not really depend on HWS itself.
In order to allow their reuse with Verbs and DV flow engines and
allow fine-grained creation/destruction of unicast DMAC (with VLAN)
flow rules with these flow engines, this patch renames all related
structs and enumerations.
All are renamed as follows:

- Enum mlx5_hw_ctrl_flow_type renamed to mlx5_ctrl_flow_type.
- Enum prefix MLX5_HW_CTRL_FLOW_TYPE_ renamed to
  MLX5_CTRL_FLOW_TYPE_.
- Struct mlx5_hw_ctrl_flow_info renamed to mlx5_ctrl_flow_info.
- Struct mlx5_hw_ctrl_flow renamed to mlx5_ctrl_flow_entry.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.h | 36 
 drivers/net/mlx5/mlx5_flow.c|  8 ++--
 drivers/net/mlx5/mlx5_flow_hw.c | 74 -
 3 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3551b793d6..a51727526f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1787,23 +1787,23 @@ struct mlx5_obj_ops {
 
 #define MLX5_RSS_HASH_FIELDS_LEN RTE_DIM(mlx5_rss_hash_fields)
 
-enum mlx5_hw_ctrl_flow_type {
-   MLX5_HW_CTRL_FLOW_TYPE_GENERAL,
-   MLX5_HW_CTRL_FLOW_TYPE_SQ_MISS_ROOT,
-   MLX5_HW_CTRL_FLOW_TYPE_SQ_MISS,
-   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_JUMP,
-   MLX5_HW_CTRL_FLOW_TYPE_TX_META_COPY,
-   MLX5_HW_CTRL_FLOW_TYPE_TX_REPR_MATCH,
-   MLX5_HW_CTRL_FLOW_TYPE_LACP_RX,
-   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS,
-   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC,
-   MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN,
+enum mlx5_ctrl_flow_type {
+   MLX5_CTRL_FLOW_TYPE_GENERAL,
+   MLX5_CTRL_FLOW_TYPE_SQ_MISS_ROOT,
+   MLX5_CTRL_FLOW_TYPE_SQ_MISS,
+   MLX5_CTRL_FLOW_TYPE_DEFAULT_JUMP,
+   MLX5_CTRL_FLOW_TYPE_TX_META_COPY,
+   MLX5_CTRL_FLOW_TYPE_TX_REPR_MATCH,
+   MLX5_CTRL_FLOW_TYPE_LACP_RX,
+   MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS,
+   MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC,
+   MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN,
 };
 
 /** Additional info about control flow rule. */
-struct mlx5_hw_ctrl_flow_info {
+struct mlx5_ctrl_flow_info {
/** Determines the kind of control flow rule. */
-   enum mlx5_hw_ctrl_flow_type type;
+   enum mlx5_ctrl_flow_type type;
union {
/**
 * If control flow is a SQ miss flow (root or not),
@@ -1843,8 +1843,8 @@ bool mlx5_ctrl_flow_uc_dmac_vlan_exists(struct rte_eth_dev *dev,
const uint16_t vid);
 
 /** Entry for tracking control flow rules in HWS. */
-struct mlx5_hw_ctrl_flow {
-   LIST_ENTRY(mlx5_hw_ctrl_flow) next;
+struct mlx5_ctrl_flow_entry {
+   LIST_ENTRY(mlx5_ctrl_flow_entry) next;
/**
 * Owner device is a port on behalf of which flow rule was created.
 *
@@ -1856,7 +1856,7 @@ struct mlx5_hw_ctrl_flow {
/** Pointer to flow rule handle. */
struct rte_flow *flow;
/** Additional information about the control flow rule. */
-   struct mlx5_hw_ctrl_flow_info info;
+   struct mlx5_ctrl_flow_info info;
 };
 
 /* HW Steering port configuration passed to rte_flow_configure(). */
@@ -1965,8 +1965,8 @@ struct mlx5_priv {
struct mlx5_drop drop_queue; /* Flow drop queues. */
void *root_drop_action; /* Pointer to root drop action. */
rte_spinlock_t hw_ctrl_lock;
-   LIST_HEAD(hw_ctrl_flow, mlx5_hw_ctrl_flow) hw_ctrl_flows;
-   LIST_HEAD(hw_ext_ctrl_flow, mlx5_hw_ctrl_flow) hw_ext_ctrl_flows;
+   LIST_HEAD(hw_ctrl_flow, mlx5_ctrl_flow_entry) hw_ctrl_flows;
+   LIST_HEAD(hw_ext_ctrl_flow, mlx5_ctrl_flow_entry) hw_ext_ctrl_flows;
struct mlx5_flow_hw_ctrl_fdb *hw_ctrl_fdb;
struct rte_flow_pattern_template *hw_tx_repr_tagging_pt;
struct rte_flow_actions_template *hw_tx_repr_tagging_at;
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 69f8bd8d97..af79956eaa 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -12185,11 +12185,11 @@ bool
 mlx5_ctrl_flow_uc_dmac_exists(struct rte_eth_dev *dev, const struct 
rte_ether_addr *addr)
 {
struct mlx5_priv *priv = dev->data->dev_private;
-   struct mlx5_hw_ctrl_flow *entry;
+   struct mlx5_ctrl_flow_entry *entry;
bool exists = false;
 
LIST_FOREACH(entry, &priv->hw_ctrl_flows, next) {
-   if (entry->info.type == 
MLX5_HW_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC &&
+   if (entry->info.type == 
MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC &&
rte_is_same_ether_addr(addr, &entry->info.uc.dmac)) {
exists = true;
break;
@@ -12204,11 +12204,11 @@ mlx5_ctrl_flow_uc_dmac_vlan_exists(struct rte_eth_dev 
*dev,
   const uint16_t vid)
 {
stru

[PATCH 10/10] net/mlx5: optimize MAC address and VLAN filter handling

2024-10-17 Thread Dariusz Sosnowski
This patch:

- Changes MAC address adding/removing handling, so that
  only required control rules are added/removed.
  As a result, rte_eth_dev_mac_addr_add() or
  rte_eth_dev_mac_addr_remove() calls are faster for mlx5 PMD.
- Changes VLAN filtering handling, so that
  only required control flow rules are added/removed.
  As a result, rte_eth_dev_vlan_filter() call is faster for mlx5 PMD.
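
The key change on the removal path is that the internal helper now
reports whether a slot was occupied and hands back the removed address,
so the caller can drop only the matching control flow rules. A minimal
sketch of that pattern, assuming a plain array as the address table
(struct mac_addr, MAX_MACS and mac_table_remove() are illustrative names,
not driver symbols):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_MACS 4 /* illustrative table size, not MLX5_MAX_MAC_ADDRESSES */

/* Hypothetical stand-in for struct rte_ether_addr. */
struct mac_addr {
	uint8_t bytes[MAC_LEN];
};

static bool
mac_is_zero(const struct mac_addr *a)
{
	static const struct mac_addr zero;

	return memcmp(a, &zero, sizeof(zero)) == 0;
}

/*
 * Clear slot `index`. Returns false if the index is out of range or the
 * slot was already empty. If `removed` is not NULL, the old address is
 * copied out before the slot is cleared.
 */
static bool
mac_table_remove(struct mac_addr *table, unsigned int index,
		 struct mac_addr *removed)
{
	if (index >= MAX_MACS || mac_is_zero(&table[index]))
		return false;
	if (removed != NULL)
		*removed = table[index];
	memset(&table[index], 0, sizeof(table[index]));
	return true;
}
```

A caller would only touch flow rules when mac_table_remove() returns
true, and would pass the copied-out address to the rule removal routine.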

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5_mac.c  | 41 +---
 drivers/net/mlx5/mlx5_vlan.c |  9 
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 22a756a52b..0e5d2be530 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -25,15 +25,25 @@
  *   Pointer to Ethernet device structure.
  * @param index
  *   MAC address index.
+ * @param addr
+ *   If MAC address is actually removed, it will be stored here if pointer is 
not a NULL.
+ *
+ * @return
+ *   True if there was a MAC address under given index.
  */
-static void
-mlx5_internal_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+static bool
+mlx5_internal_mac_addr_remove(struct rte_eth_dev *dev,
+ uint32_t index,
+ struct rte_ether_addr *addr)
 {
MLX5_ASSERT(index < MLX5_MAX_MAC_ADDRESSES);
if (rte_is_zero_ether_addr(&dev->data->mac_addrs[index]))
-   return;
+   return false;
mlx5_os_mac_addr_remove(dev, index);
+   if (addr != NULL)
+   *addr = dev->data->mac_addrs[index];
memset(&dev->data->mac_addrs[index], 0, sizeof(struct rte_ether_addr));
+   return true;
 }
 
 /**
@@ -91,15 +101,15 @@ mlx5_internal_mac_addr_add(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac,
 void
 mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
 {
+   struct rte_ether_addr addr = { 0 };
int ret;
 
if (index >= MLX5_MAX_UC_MAC_ADDRESSES)
return;
-   mlx5_internal_mac_addr_remove(dev, index);
-   if (!dev->data->promiscuous) {
-   ret = mlx5_traffic_restart(dev);
+   if (mlx5_internal_mac_addr_remove(dev, index, &addr)) {
+   ret = mlx5_traffic_mac_remove(dev, &addr);
if (ret)
-   DRV_LOG(ERR, "port %u cannot restart traffic: %s",
+   DRV_LOG(ERR, "port %u cannot update control flow rules: 
%s",
dev->data->port_id, strerror(rte_errno));
}
 }
@@ -132,9 +142,7 @@ mlx5_mac_addr_add(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac,
ret = mlx5_internal_mac_addr_add(dev, mac, index);
if (ret < 0)
return ret;
-   if (!dev->data->promiscuous)
-   return mlx5_traffic_restart(dev);
-   return 0;
+   return mlx5_traffic_mac_add(dev, mac);
 }
 
 /**
@@ -154,6 +162,12 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac_addr)
uint16_t port_id;
struct mlx5_priv *priv = dev->data->dev_private;
struct mlx5_priv *pf_priv;
+   struct rte_ether_addr old_mac_addr = dev->data->mac_addrs[0];
+   int ret;
+
+   /* ethdev does not check if new default address is the same as the old 
one. */
+   if (rte_is_same_ether_addr(mac_addr, &old_mac_addr))
+   return 0;
 
/*
 * Configuring the VF instead of its representor,
@@ -188,7 +202,10 @@ mlx5_mac_addr_set(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac_addr)
 
DRV_LOG(DEBUG, "port %u setting primary MAC address",
dev->data->port_id);
-   return mlx5_mac_addr_add(dev, mac_addr, 0, 0);
+   ret = mlx5_mac_addr_add(dev, mac_addr, 0, 0);
+   if (ret)
+   return ret;
+   return mlx5_traffic_mac_remove(dev, &old_mac_addr);
 }
 
 /**
@@ -208,7 +225,7 @@ mlx5_set_mc_addr_list(struct rte_eth_dev *dev,
return -rte_errno;
}
for (i = MLX5_MAX_UC_MAC_ADDRESSES; i != MLX5_MAX_MAC_ADDRESSES; ++i)
-   mlx5_internal_mac_addr_remove(dev, i);
+   mlx5_internal_mac_addr_remove(dev, i, NULL);
i = MLX5_MAX_UC_MAC_ADDRESSES;
while (nb_mc_addr--) {
ret = mlx5_internal_mac_addr_add(dev, mc_addr_set++, i++);
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
index e7161b66fe..43a314a679 100644
--- a/drivers/net/mlx5/mlx5_vlan.c
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -54,7 +54,7 @@ mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
MLX5_ASSERT(priv->vlan_filter_n != 0);
/* Enabling an existing VLAN filter has no effect. */
if (on)
-   goto out;
+   goto no_effect;
/* Remove VLAN filter from list. */
--priv->vlan_filter_n;
memmove(&priv->vlan_filter[i],

[PATCH 08/10] net/mlx5: add legacy unicast flow rule registration

2024-10-17 Thread Dariusz Sosnowski
Whenever a unicast DMAC or unicast DMAC with VLAN ID control flow rule
is created when working with Verbs or DV flow engine,
add this flow rule to the control flow rule list,
with information required for recognizing it.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5_flow.c| 32 +---
 drivers/net/mlx5/mlx5_trigger.c | 26 --
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 463edae70e..2038f78481 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8495,8 +8495,9 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
.type = RTE_FLOW_ACTION_TYPE_END,
},
};
-   uint32_t flow_idx;
+   uintptr_t flow_idx;
struct rte_flow_error error;
+   struct mlx5_ctrl_flow_entry *entry;
unsigned int i;
 
if (!priv->reta_idx_n || !priv->rxqs_n) {
@@ -8506,11 +8507,36 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
action_rss.types = 0;
for (i = 0; i != priv->reta_idx_n; ++i)
queue[i] = (*priv->reta_idx)[i];
+
+   entry = mlx5_malloc(MLX5_MEM_ZERO, sizeof(*entry), 
alignof(typeof(*entry)), SOCKET_ID_ANY);
+   if (entry == NULL) {
+   rte_errno = ENOMEM;
+   goto err;
+   }
+
+   entry->owner_dev = dev;
+   if (vlan_spec == NULL) {
+   entry->info.type = 
MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC;
+   } else {
+   entry->info.type = 
MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN;
+   entry->info.uc.vlan = rte_be_to_cpu_16(vlan_spec->hdr.vlan_tci);
+   }
+   entry->info.uc.dmac = eth_spec->hdr.dst_addr;
+
flow_idx = mlx5_flow_list_create(dev, MLX5_FLOW_TYPE_CTL,
&attr, items, actions, false, &error);
-   if (!flow_idx)
-   return -rte_errno;
+   if (!flow_idx) {
+   mlx5_free(entry);
+   goto err;
+   }
+
+   entry->flow = (struct rte_flow *)flow_idx;
+   LIST_INSERT_HEAD(&priv->hw_ctrl_flows, entry, next);
+
return 0;
+
+err:
+   return -rte_errno;
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index bf836c92fc..4fa9319c4d 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -20,6 +20,8 @@
 #include "mlx5_utils.h"
 #include "rte_pmd_mlx5.h"
 
+static void mlx5_traffic_disable_legacy(struct rte_eth_dev *dev);
+
 /**
  * Stop traffic on Tx queues.
  *
@@ -1736,11 +1738,31 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
return 0;
 error:
ret = rte_errno; /* Save rte_errno before cleanup. */
-   mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+   mlx5_traffic_disable_legacy(dev);
rte_errno = ret; /* Restore rte_errno. */
return -rte_errno;
 }
 
+static void
+mlx5_traffic_disable_legacy(struct rte_eth_dev *dev)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_ctrl_flow_entry *entry;
+   struct mlx5_ctrl_flow_entry *tmp;
+
+   /*
+* Free registered control flow rules first,
+* to free the memory allocated for list entries
+*/
+   entry = LIST_FIRST(&priv->hw_ctrl_flows);
+   while (entry != NULL) {
+   tmp = LIST_NEXT(entry, next);
+   mlx5_legacy_ctrl_flow_destroy(dev, entry);
+   entry = tmp;
+   }
+
+   mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+}
 
 /**
  * Disable traffic flows configured by control plane
@@ -1758,7 +1780,7 @@ mlx5_traffic_disable(struct rte_eth_dev *dev)
mlx5_flow_hw_flush_ctrl_flows(dev);
else
 #endif
-   mlx5_flow_list_flush(dev, MLX5_FLOW_TYPE_CTL, false);
+   mlx5_traffic_disable_legacy(dev);
 }
 
 /**
-- 
2.39.5
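
Note on the cleanup loop in mlx5_traffic_disable_legacy() above: it reads
the successor with LIST_NEXT() before destroying the current entry,
because the destroy helper unlinks and frees it. A minimal standalone
sketch of the same traversal using <sys/queue.h> (struct ctrl_entry and
the ctrl_list_* helpers are hypothetical stand-ins, not driver code):

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mlx5_ctrl_flow_entry. */
struct ctrl_entry {
	LIST_ENTRY(ctrl_entry) next;
	int id;
};

LIST_HEAD(ctrl_list, ctrl_entry);

/* Allocate an entry and insert it at the head; returns 0 on success. */
static int
ctrl_list_add(struct ctrl_list *list, int id)
{
	struct ctrl_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -1;
	e->id = id;
	LIST_INSERT_HEAD(list, e, next);
	return 0;
}

/* Unlink and free every entry; returns the number of entries freed. */
static unsigned int
ctrl_list_flush(struct ctrl_list *list)
{
	struct ctrl_entry *entry = LIST_FIRST(list);
	struct ctrl_entry *tmp;
	unsigned int n = 0;

	while (entry != NULL) {
		/* Read the successor before the entry is unlinked and freed. */
		tmp = LIST_NEXT(entry, next);
		LIST_REMOVE(entry, next);
		free(entry);
		entry = tmp;
		n++;
	}
	return n;
}
```

Using LIST_FOREACH() here would dereference the freed entry to advance,
which is why the loop is written by hand.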



[PATCH 09/10] net/mlx5: add dynamic unicast flow rule management

2024-10-17 Thread Dariusz Sosnowski
This patch extends the mlx5_traffic interface with a couple of functions:

- mlx5_traffic_mac_add() - Create a unicast DMAC flow rule, without
  recreating all control flow rules.
- mlx5_traffic_mac_remove() - Remove a unicast DMAC flow rule,
  without recreating all control flow rules.
- mlx5_traffic_vlan_add() - Create a unicast DMAC with VLAN
  flow rule, without recreating all control flow rules.
- mlx5_traffic_vlan_remove() - Remove a unicast DMAC with VLAN
  flow rule, without recreating all control flow rules.

These functions will be used in the follow-up commit,
which will modify the behavior of adding/removing MAC address
and enabling/disabling VLAN filter in mlx5 PMD.
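
The decision logic of mlx5_traffic_mac_add() can be sketched as follows:
do nothing while the port is stopped, promiscuous or isolated; otherwise
create one DMAC rule when no VLAN filter is configured, or one DMAC with
VLAN rule per configured VLAN ID, skipping rules that already exist. This
is a simplified model under stated assumptions: struct port, struct
uc_rule and the rule_*() helpers are hypothetical and keep rules in a
plain array instead of calling into a flow engine.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_VLANS 8
#define MAX_RULES 32

/* One installed unicast rule: DMAC only, or DMAC plus VLAN ID. */
struct uc_rule {
	uint8_t mac[MAC_LEN];
	bool has_vlan;
	uint16_t vlan;
};

/* Hypothetical, much reduced stand-in for the port state kept by the PMD. */
struct port {
	bool started;
	bool promiscuous;
	bool isolated;
	uint16_t vlan_filter[MAX_VLANS];
	unsigned int vlan_filter_n;
	struct uc_rule rules[MAX_RULES];
	unsigned int rules_n;
};

static bool
rule_exists(const struct port *p, const uint8_t *mac, bool has_vlan,
	    uint16_t vlan)
{
	unsigned int i;

	for (i = 0; i < p->rules_n; ++i) {
		const struct uc_rule *r = &p->rules[i];

		if (r->has_vlan == has_vlan &&
		    (!has_vlan || r->vlan == vlan) &&
		    memcmp(r->mac, mac, MAC_LEN) == 0)
			return true;
	}
	return false;
}

static int
rule_create(struct port *p, const uint8_t *mac, bool has_vlan, uint16_t vlan)
{
	struct uc_rule *r;

	if (p->rules_n == MAX_RULES)
		return -1;
	r = &p->rules[p->rules_n++];
	memcpy(r->mac, mac, MAC_LEN);
	r->has_vlan = has_vlan;
	r->vlan = has_vlan ? vlan : 0;
	return 0;
}

/* Install only the rules that are missing for `mac`; returns 0 on success. */
static int
port_mac_add(struct port *p, const uint8_t *mac)
{
	unsigned int i;

	/* Nothing to do while stopped, promiscuous or isolated. */
	if (!p->started || p->promiscuous || p->isolated)
		return 0;
	if (p->vlan_filter_n == 0) {
		if (rule_exists(p, mac, false, 0))
			return 0;
		return rule_create(p, mac, false, 0);
	}
	for (i = 0; i < p->vlan_filter_n; ++i) {
		uint16_t vlan = p->vlan_filter[i];

		if (rule_exists(p, mac, true, vlan))
			continue;
		if (rule_create(p, mac, true, vlan) != 0)
			return -1;
	}
	return 0;
}
```

Adding the same address twice leaves the rule set unchanged, which is the
property that lets the caller skip a full traffic restart.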

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5.h |   4 +
 drivers/net/mlx5/mlx5_trigger.c | 236 
 2 files changed, 240 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a51727526f..0e026f7bbb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -2372,6 +2372,10 @@ int mlx5_hairpin_bind(struct rte_eth_dev *dev, uint16_t 
rx_port);
 int mlx5_hairpin_unbind(struct rte_eth_dev *dev, uint16_t rx_port);
 int mlx5_hairpin_get_peer_ports(struct rte_eth_dev *dev, uint16_t *peer_ports,
size_t len, uint32_t direction);
+int mlx5_traffic_mac_add(struct rte_eth_dev *dev, const struct rte_ether_addr 
*addr);
+int mlx5_traffic_mac_remove(struct rte_eth_dev *dev, const struct 
rte_ether_addr *addr);
+int mlx5_traffic_vlan_add(struct rte_eth_dev *dev, const uint16_t vid);
+int mlx5_traffic_vlan_remove(struct rte_eth_dev *dev, const uint16_t vid);
 
 /* mlx5_flow.c */
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 4fa9319c4d..cac532b1a1 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1804,3 +1804,239 @@ mlx5_traffic_restart(struct rte_eth_dev *dev)
}
return 0;
 }
+
+static bool
+mac_flows_update_needed(struct rte_eth_dev *dev)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (!dev->data->dev_started)
+   return false;
+   if (dev->data->promiscuous)
+   return false;
+   if (priv->isolated)
+   return false;
+
+   return true;
+}
+
+static int
+traffic_dmac_create(struct rte_eth_dev *dev, const struct rte_ether_addr *addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (priv->sh->config.dv_flow_en == 2)
+   return mlx5_flow_hw_ctrl_flow_dmac(dev, addr);
+   else
+   return mlx5_legacy_dmac_flow_create(dev, addr);
+}
+
+static int
+traffic_dmac_destroy(struct rte_eth_dev *dev, const struct rte_ether_addr 
*addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (priv->sh->config.dv_flow_en == 2)
+   return mlx5_flow_hw_ctrl_flow_dmac_destroy(dev, addr);
+   else
+   return mlx5_legacy_dmac_flow_destroy(dev, addr);
+}
+
+static int
+traffic_dmac_vlan_create(struct rte_eth_dev *dev,
+const struct rte_ether_addr *addr,
+const uint16_t vid)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (priv->sh->config.dv_flow_en == 2)
+   return mlx5_flow_hw_ctrl_flow_dmac_vlan(dev, addr, vid);
+   else
+   return mlx5_legacy_dmac_vlan_flow_create(dev, addr, vid);
+}
+
+static int
+traffic_dmac_vlan_destroy(struct rte_eth_dev *dev,
+const struct rte_ether_addr *addr,
+const uint16_t vid)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (priv->sh->config.dv_flow_en == 2)
+   return mlx5_flow_hw_ctrl_flow_dmac_vlan_destroy(dev, addr, vid);
+   else
+   return mlx5_legacy_dmac_vlan_flow_destroy(dev, addr, vid);
+}
+
+/**
+ * Adjust Rx control flow rules to allow traffic on provided MAC address.
+ */
+int
+mlx5_traffic_mac_add(struct rte_eth_dev *dev, const struct rte_ether_addr 
*addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+
+   if (!mac_flows_update_needed(dev))
+   return 0;
+
+   if (priv->vlan_filter_n > 0) {
+   unsigned int i;
+
+   for (i = 0; i < priv->vlan_filter_n; ++i) {
+   uint16_t vlan = priv->vlan_filter[i];
+   int ret;
+
+   if (mlx5_ctrl_flow_uc_dmac_vlan_exists(dev, addr, vlan))
+   continue;
+
+   ret = traffic_dmac_vlan_create(dev, addr, vlan);
+   if (ret != 0)
+   return ret;
+   }
+
+   return 0;
+   }
+
+   if (mlx5_ctrl_flow_uc_dmac_exists(dev, addr))
+   return 0;
+
+   return traffic_dmac_create(dev, addr);
+}
+
+/**
+ * Adjust Rx control flow rules to disallow traffi

[PATCH 06/10] net/mlx5: shared init of control flow rules

2024-10-17 Thread Dariusz Sosnowski
Control flow rules lists and control flow rule lock
can be reused between all flow engines, but their initialization
was done in flow_hw_configure() implementation.
This patch moves it to mlx5_dev_spawn(),
which is called for Verbs, DV and HWS flow engines.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/linux/mlx5_os.c   | 3 +++
 drivers/net/mlx5/mlx5_flow_hw.c| 3 ---
 drivers/net/mlx5/windows/mlx5_os.c | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 0a8de88759..c8d7fdb8dd 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1701,6 +1701,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
(sh->config.dv_flow_en == 1 && 
mlx5_flow_discover_ipv6_tc_support(eth_dev)))
sh->phdev->config.ipv6_tc_fallback = 
MLX5_IPV6_TC_FALLBACK;
}
+   rte_spinlock_init(&priv->hw_ctrl_lock);
+   LIST_INIT(&priv->hw_ctrl_flows);
+   LIST_INIT(&priv->hw_ext_ctrl_flows);
if (priv->sh->config.dv_flow_en == 2) {
 #ifdef HAVE_MLX5_HWS_SUPPORT
if (priv->sh->config.dv_esw_en) {
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index 0d8224b8de..9ab66f5929 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -11832,9 +11832,6 @@ __flow_hw_configure(struct rte_eth_dev *dev,
if (!priv->dr_ctx)
goto err;
priv->nb_queue = nb_q_updated;
-   rte_spinlock_init(&priv->hw_ctrl_lock);
-   LIST_INIT(&priv->hw_ctrl_flows);
-   LIST_INIT(&priv->hw_ext_ctrl_flows);
ret = flow_hw_action_template_drop_init(dev, error);
if (ret)
goto err;
diff --git a/drivers/net/mlx5/windows/mlx5_os.c 
b/drivers/net/mlx5/windows/mlx5_os.c
index 0ebd233595..80f1679388 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -600,6 +600,9 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
}
mlx5_flow_counter_mode_config(eth_dev);
mlx5_queue_counter_id_prepare(eth_dev);
+   rte_spinlock_init(&priv->hw_ctrl_lock);
+   LIST_INIT(&priv->hw_ctrl_flows);
+   LIST_INIT(&priv->hw_ext_ctrl_flows);
return eth_dev;
 error:
if (priv) {
-- 
2.39.5



[PATCH 07/10] net/mlx5: add legacy unicast flow rules management

2024-10-17 Thread Dariusz Sosnowski
This patch adds the following internal functions for creation of
unicast DMAC flow rules:

- mlx5_legacy_dmac_flow_create() - simple wrapper over
  mlx5_ctrl_flow().
- mlx5_legacy_dmac_vlan_flow_create() - simple wrapper over
  mlx5_ctrl_flow_vlan().

These will be used as a basis for implementing dynamic
additions of unicast DMAC or unicast DMAC with VLAN
control flow rules when new addresses/VLANs are added.

Also, this patch adds the following internal functions
for destruction of unicast DMAC flow rules:

- mlx5_legacy_ctrl_flow_destroy() - assuming a flow rule is on the
  control flow rule list, destroy it.
- mlx5_legacy_dmac_flow_destroy() - find and destroy a flow rule
  with given unicast DMAC.
- mlx5_legacy_dmac_vlan_flow_destroy() - find and destroy a flow rule
  with given unicast DMAC and VLAN ID.

These will be used as a basis for implementing dynamic
removals of unicast DMAC or unicast DMAC with VLAN
control flow rules when addresses/VLANs are removed.
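
The find-and-destroy helpers follow a common pattern: walk the list,
skip entries that do not match the key, then unlink and free the first
match and return immediately. A minimal sketch of that pattern (struct
flow_entry and the flow_list_*() helpers are hypothetical; unlike the
helpers in this patch, which return 0 in both cases, the sketch returns
whether a match was found so the behaviour is easy to check):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define MAC_LEN 6

/* Hypothetical list entry keyed by destination MAC and VLAN ID. */
struct flow_entry {
	LIST_ENTRY(flow_entry) next;
	uint8_t dmac[MAC_LEN];
	uint16_t vlan;
};

LIST_HEAD(flow_list, flow_entry);

/* Allocate and register an entry; returns 0 on success. */
static int
flow_list_add(struct flow_list *list, const uint8_t *dmac, uint16_t vlan)
{
	struct flow_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -1;
	memcpy(e->dmac, dmac, MAC_LEN);
	e->vlan = vlan;
	LIST_INSERT_HEAD(list, e, next);
	return 0;
}

/* Returns true if a matching entry was found, unlinked and freed. */
static bool
flow_list_remove(struct flow_list *list, const uint8_t *dmac, uint16_t vlan)
{
	struct flow_entry *e;

	LIST_FOREACH(e, list, next) {
		if (e->vlan != vlan || memcmp(e->dmac, dmac, MAC_LEN) != 0)
			continue;
		LIST_REMOVE(e, next);
		free(e);
		/* Return at once: the iterator must not touch the freed entry. */
		return true;
	}
	return false;
}
```

Returning right after the removal is what makes LIST_FOREACH() safe
here, since the loop never advances from a freed entry.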

At the moment, no relevant flow rules are registered on the list
when working with Verbs or DV flow engine.
This will be added in the follow-up commit.

Signed-off-by: Dariusz Sosnowski 
---
 drivers/net/mlx5/mlx5_flow.c | 80 
 drivers/net/mlx5/mlx5_flow.h | 19 +
 2 files changed, 99 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index af79956eaa..463edae70e 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -8534,6 +8534,86 @@ mlx5_ctrl_flow(struct rte_eth_dev *dev,
return mlx5_ctrl_flow_vlan(dev, eth_spec, eth_mask, NULL, NULL);
 }
 
+int
+mlx5_legacy_dmac_flow_create(struct rte_eth_dev *dev, const struct 
rte_ether_addr *addr)
+{
+   struct rte_flow_item_eth unicast = {
+   .hdr.dst_addr = *addr,
+   };
+   struct rte_flow_item_eth unicast_mask = {
+   .hdr.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+   };
+
+   return mlx5_ctrl_flow(dev, &unicast, &unicast_mask);
+}
+
+int
+mlx5_legacy_dmac_vlan_flow_create(struct rte_eth_dev *dev,
+ const struct rte_ether_addr *addr,
+ const uint16_t vid)
+{
+   struct rte_flow_item_eth unicast_spec = {
+   .hdr.dst_addr = *addr,
+   };
+   struct rte_flow_item_eth unicast_mask = {
+   .hdr.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+   };
+   struct rte_flow_item_vlan vlan_spec = {
+   .hdr.vlan_tci = rte_cpu_to_be_16(vid),
+   };
+   struct rte_flow_item_vlan vlan_mask = rte_flow_item_vlan_mask;
+
+   return mlx5_ctrl_flow_vlan(dev, &unicast_spec, &unicast_mask, 
&vlan_spec, &vlan_mask);
+}
+
+void
+mlx5_legacy_ctrl_flow_destroy(struct rte_eth_dev *dev, struct 
mlx5_ctrl_flow_entry *entry)
+{
+   uintptr_t flow_idx;
+
+   flow_idx = (uintptr_t)entry->flow;
+   mlx5_flow_list_destroy(dev, MLX5_FLOW_TYPE_CTL, flow_idx);
+   LIST_REMOVE(entry, next);
+   mlx5_free(entry);
+}
+
+int
+mlx5_legacy_dmac_flow_destroy(struct rte_eth_dev *dev, const struct 
rte_ether_addr *addr)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_ctrl_flow_entry *entry;
+
+   LIST_FOREACH(entry, &priv->hw_ctrl_flows, next) {
+   if (entry->info.type != 
MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC ||
+   !rte_is_same_ether_addr(addr, &entry->info.uc.dmac))
+   continue;
+
+   mlx5_legacy_ctrl_flow_destroy(dev, entry);
+   return 0;
+   }
+   return 0;
+}
+
+int
+mlx5_legacy_dmac_vlan_flow_destroy(struct rte_eth_dev *dev,
+  const struct rte_ether_addr *addr,
+  const uint16_t vid)
+{
+   struct mlx5_priv *priv = dev->data->dev_private;
+   struct mlx5_ctrl_flow_entry *entry;
+
+   LIST_FOREACH(entry, &priv->hw_ctrl_flows, next) {
+   if (entry->info.type != 
MLX5_CTRL_FLOW_TYPE_DEFAULT_RX_RSS_UNICAST_DMAC_VLAN ||
+   !rte_is_same_ether_addr(addr, &entry->info.uc.dmac) ||
+   vid != entry->info.uc.vlan)
+   continue;
+
+   mlx5_legacy_ctrl_flow_destroy(dev, entry);
+   return 0;
+   }
+   return 0;
+}
+
 /**
  * Create default miss flow rule matching lacp traffic
  *
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 165d17e40a..db56ae051d 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -2991,6 +2991,25 @@ struct mlx5_flow_hw_ctrl_fdb {
 
 int mlx5_flow_hw_ctrl_flows(struct rte_eth_dev *dev, uint32_t flags);
 
+/** Create a control flow rule for matching unicast DMAC with VLAN (Verbs and 
DV). */
+int mlx5_legacy_dmac_flow_create(struct rte_eth_dev *dev, const struct 
rte_ether_addr *addr);
+
+/** Destroy a control flow rule for matching unicast DMAC with VLAN (Verbs an

Re: [EXTERNAL] [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Christophe Fontaine
Hi all,

What about the following steps:
- update the nodes so they work on the current layer (example: for all L3 
nodes, the current mbuf data offset *must* be pointing to the IP header)
- define a public data structure that would be shared across nodes through priv 
data rather than dynfields? This structure would be the "internal API" between 
nodes (so it has to be tracked across DPDK releases).
We’d need common data shared for all the nodes as well as specific data between 
2 nodes.
As we get to this point, this (hopefully) will help with the node reusability.

- Update the feature arcs to leverage this well known structure, and refine the 
api
- Define which part of the stack needs to be defined as a feature arc, with the 
benefit of the generic API to enable/disable that feature, and which part needs 
to be dynamically pluggable.
For instance, for a router, it may not make sense to define IPv4 support as a 
feature arc.
So, we’d statically connect eth_input to ip_input.
Yet, lldp support is a good candidate for a feature arc: we need to configure 
it per interface, and this is independent of the main graph.

WDYT?
Christophe

> On 17 Oct 2024, at 09:50, Robin Jarry  wrote:
> 
> Hi Nitin, all,
> 
> Nitin Saxena, Oct 17, 2024 at 09:03:
>> Hi Robin/David and all,
>> 
>> We realized the feature arc patch series is difficult to understand as a new 
>> concept. Our objectives are following with feature arc changes
>> 
> >> 1. Allow reusability of standard DPDK nodes (defined in lib/nodes/*) with 
> >> out-of-tree applications (like grout). Currently out-of-tree graph 
> >> applications are duplicating standard nodes but not reusing the standard 
> >> ones which are available. In the long term, we would like to mature 
> >> standard DPDK nodes with flexibility of hooking them to out-of-tree 
> >> application nodes.
> 
> It would be ideal if the in-built nodes could be reused. When we started 
> working on grout, I tried multiple approaches where I could reuse these 
> nodes, but all failed. The nodes public API seems tailored for app/graph but 
> does not fit well with other control plane implementations.
> 
> One of the main issues I had is that the ethdev_rx and ethdev_tx nodes are 
> cloned per rxq / txq associated with a graph worker. The rte_node API 
> requires that every clone has a unique name. This in turn makes hot plugging 
> of DPDK ports very complex, if not impossible.
> 
> For example, with the in-built nodes, it is not possible to change the number 
> of ports or their number of RX queues without destroying the whole graph and 
> creating a new one from scratch.
> 
> Also, the current implementation of "ip{4,6}-rewrite" handles writing 
> ethernet header data. This would prevent it from using this node for an 
> IP-in-IP tunnel interface as we did in grout.
> 
> Do you think we could change the in-built nodes to enforce OSI layer 
> separation of concerns? It would make them much more flexible. It may cause a 
> slight drop of performance because you'd be splitting processing in two 
> different nodes. But I think flexibility is more important. Otherwise, the 
> in-built nodes can only be used for very specific use-cases.
> 
> Finally, I would like to improve the rte_node API to allow defining and 
> enforcing per-packet metadata that every node expects as input. The current 
> in-built nodes rely on mbuf dynamic fields for this but this means you only 
> have 9x32 bits available. And using all of these may break some drivers 
> (ixgbe) that rely on dynfields to work. Have you considered using mbuf 
> private data for this?
> 
>> 
> >> 2. Flexibility to enable/disable sub-graphs per interface based on the
> >> runtime configuration updates. Protocol sub-graphs can be selectively 
> >> enabled for few (or all interfaces) at runtime
> >> 
> >> 3. More than one sub-graphs/features can be enabled on an interface. So a 
> >> packet has to follow a sequential ordering node path on worker cores. 
> >> Packets may need to move from one sub-graph to another sub-graph per 
> >> interface
> >> 
> >> 4. Last but not least, an optimized implementation which does not (or
> >> minimally) stop worker cores for any control plane runtime updates. Any 
> >> performance regression should also be avoided
>> 
> >> I am planning to create a draft presentation on feature arc which I can 
> >> share, when ready, to discuss. If needed, I can also plan to present that in 
> >> one of the DPDK community meetings. There we can also discuss whether there 
> >> are any alternatives for achieving the above objectives
> 
> Looking forward to this.
> 
> Thanks!
> 



[PATCH dpdk] net: add more icmp types and code

2024-10-17 Thread Robin Jarry
Add more ICMP message types and codes based on RFC 792. Rename the
RTE_IP_ICMP_* macros to RTE_ICMP_TYPE_* and add RTE_ICMP_CODE_* macros,
so that types and codes are told apart by prefix. The old names remain
available as deprecated aliases.

Signed-off-by: Robin Jarry 
---
 app/test-pmd/icmpecho.c | 10 +-
 lib/net/rte_icmp.h  | 31 +--
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 68524484e305..4ef23ae67ac4 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -416,7 +416,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
icmp_h = (struct rte_icmp_hdr *) ((char *)ip_h +
  sizeof(struct rte_ipv4_hdr));
if (! ((ip_h->next_proto_id == IPPROTO_ICMP) &&
-  (icmp_h->icmp_type == RTE_IP_ICMP_ECHO_REQUEST) &&
+  (icmp_h->icmp_type == RTE_ICMP_TYPE_ECHO_REQUEST) &&
   (icmp_h->icmp_code == 0))) {
rte_pktmbuf_free(pkt);
continue;
@@ -440,7 +440,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
 * - switch the request IP source and destination
 *   addresses in the reply IP header,
 * - keep the IP header checksum unchanged.
-* - set RTE_IP_ICMP_ECHO_REPLY in ICMP header.
+* - set RTE_ICMP_TYPE_ECHO_REPLY in ICMP header.
 * ICMP checksum is computed by assuming it is valid in the
 * echo request and not verified.
 */
@@ -463,10 +463,10 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
ip_h->src_addr = ip_h->dst_addr;
ip_h->dst_addr = ip_addr;
}
-   icmp_h->icmp_type = RTE_IP_ICMP_ECHO_REPLY;
+   icmp_h->icmp_type = RTE_ICMP_TYPE_ECHO_REPLY;
cksum = ~icmp_h->icmp_cksum & 0xffff;
-   cksum += ~RTE_BE16(RTE_IP_ICMP_ECHO_REQUEST << 8) & 0xffff;
-   cksum += RTE_BE16(RTE_IP_ICMP_ECHO_REPLY << 8);
+   cksum += ~RTE_BE16(RTE_ICMP_TYPE_ECHO_REQUEST << 8) & 0xffff;
+   cksum += RTE_BE16(RTE_ICMP_TYPE_ECHO_REPLY << 8);
cksum = (cksum & 0xffff) + (cksum >> 16);
cksum = (cksum & 0xffff) + (cksum >> 16);
icmp_h->icmp_cksum = ~cksum;
diff --git a/lib/net/rte_icmp.h b/lib/net/rte_icmp.h
index 7a33280aa1e4..5ac165d8d40d 100644
--- a/lib/net/rte_icmp.h
+++ b/lib/net/rte_icmp.h
@@ -50,8 +50,35 @@ struct rte_icmp_hdr {
 } __rte_packed;
 
 /* ICMP packet types */
-#define RTE_IP_ICMP_ECHO_REPLY   0
-#define RTE_IP_ICMP_ECHO_REQUEST 8
+#define RTE_ICMP_TYPE_ECHO_REPLY 0
+#define RTE_IP_ICMP_ECHO_REPLY RTE_DEPRECATED(RTE_IP_ICMP_ECHO_REPLY) 
RTE_ICMP_TYPE_ECHO_REPLY
+#define RTE_ICMP_TYPE_DEST_UNREACHABLE 3
+#define RTE_ICMP_TYPE_REDIRECT 5
+#define RTE_ICMP_TYPE_ECHO_REQUEST 8
+#define RTE_IP_ICMP_ECHO_REQUEST RTE_DEPRECATED(RTE_IP_ICMP_ECHO_REQUEST) 
RTE_ICMP_TYPE_ECHO_REQUEST
+#define RTE_ICMP_TYPE_TTL_EXCEEDED 11
+#define RTE_ICMP_TYPE_PARAM_PROBLEM 12
+#define RTE_ICMP_TYPE_TIMESTAMP_REQUEST 13
+#define RTE_ICMP_TYPE_TIMESTAMP_REPLY 14
+
+/* Destination Unreachable codes */
+#define RTE_ICMP_CODE_UNREACH_NET 0
+#define RTE_ICMP_CODE_UNREACH_HOST 1
+#define RTE_ICMP_CODE_UNREACH_PROTO 2
+#define RTE_ICMP_CODE_UNREACH_PORT 3
+#define RTE_ICMP_CODE_UNREACH_FRAG 4
+#define RTE_ICMP_CODE_UNREACH_SRC 5
+
+/* Time Exceeded codes */
+#define RTE_ICMP_CODE_TTL_EXCEEDED 0
+#define RTE_ICMP_CODE_TTL_FRAG 1
+
+/* Redirect codes */
+#define RTE_ICMP_CODE_REDIRECT_NET 0
+#define RTE_ICMP_CODE_REDIRECT_HOST 1
+#define RTE_ICMP_CODE_REDIRECT_TOS_NET 2
+#define RTE_ICMP_CODE_REDIRECT_TOS_HOST 3
+
 #define RTE_ICMP6_ECHO_REQUEST 128
 #define RTE_ICMP6_ECHO_REPLY   129
 
-- 
2.47.0
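
For readers unfamiliar with the checksum lines touched in icmpecho.c:
they update the ICMP checksum incrementally when the type changes from
echo request to echo reply, instead of summing the whole message again.
A minimal sketch of the technique in host byte order, following the
update formula from RFC 1624 (csum_full() and csum_update() are
illustrative helpers, not DPDK functions):

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over `len` bytes read as big-endian words. */
static uint16_t
csum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Incremental update when one 16-bit word changes from `old_word` to
 * `new_word`: HC' = ~(~HC + ~m + m'), folded to 16 bits.
 */
static uint16_t
csum_update(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~old_csum;

	sum += (uint16_t)~old_word;
	sum += new_word;
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For an echo request turned into a reply, the changed word is the
type/code pair: 0x0800 becomes 0x0000, and csum_update() on the request
checksum gives the same value as recomputing over the modified message.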



Re: [PATCH v10 1/2] power: introduce PM QoS API on CPU wide

2024-10-17 Thread lihuisong (C)



On 2024/10/17 11:20, Stephen Hemminger wrote:

On Thu, 17 Oct 2024 10:11:13 +0800
"lihuisong (C)"  wrote:


Hi Stephen,

On 2024/10/15 23:45, Stephen Hemminger wrote:

On Tue, 15 Oct 2024 17:41:39 +0800
"lihuisong (C)"  wrote:
  

Hi Stephen,

Can you take a look at this reply so as to send out the next version ASAP?
Thanks.😁

/Huisong
On 2024/10/14 20:19, lihuisong (C) wrote:

The biggest issue is that an lcore is not the same as a CPU as far as the kernel 
is concerned.
DPDK supports mapping an lcore to a cpuset, and that is not necessarily the same 
one-to-one mapping
as the values in sysfs. See the EAL documentation.

Yes, you are right.

For example, "--lcores='1,2@(5-7),(3-5)@(0,2),(0,6),7-8'" means starting 9 
EAL threads:
  lcore 0 runs on cpuset 0x41 (cpu 0,6);
  lcore 1 runs on cpuset 0x2 (cpu 1);
  lcore 2 runs on cpuset 0xe0 (cpu 5,6,7);
  lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2);
  lcore 6 runs on cpuset 0x41 (cpu 0,6);
  lcore 7 runs on cpuset 0x80 (cpu 7);
  lcore 8 runs on cpuset 0x100 (cpu 8).

This problem already exists in the power library, and this new API still has it.

How about using lcore_config[lcore_id].cpuset to get the real cpu_id?
For the case where the application uses '--lcores', we simply apply the
operations in the power lib to all CPUs in the lcore's cpuset.
If that is ok, I will fix it for the entire power library and this new API.

Using the lcore_config is the right direction, but the cpuset may have more than
one CPU, so the code needs to iterate over those CPUs. It is probably safe to 
ignore the case where the user misconfigures two lcores to use an overlapping 
set of CPUs, like the example in the doc.

Yes, so we don't need to handle the overlapping-set case.
That is a usage issue, and we just need to document its effect clearly, 
ok?
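
To make the "iterate over those CPUs" point concrete, here is a minimal
sketch that expands a cpuset into the list of CPU ids an operation would
have to be applied to. It is a simplification under stated assumptions:
the cpuset is modelled as a plain 64-bit mask (bit n = CPU n) rather than
the cpuset type DPDK stores per lcore, and cpuset_to_list() is a
hypothetical helper, not a power library function.

```c
#include <stdint.h>

/*
 * Collect the CPU ids set in `mask` (bit n = CPU n) into `cpus`.
 * Returns the number of ids written, at most `max`.
 */
static unsigned int
cpuset_to_list(uint64_t mask, unsigned int *cpus, unsigned int max)
{
	unsigned int cpu;
	unsigned int n = 0;

	for (cpu = 0; cpu < 64 && n < max; ++cpu) {
		if (mask & (UINT64_C(1) << cpu))
			cpus[n++] = cpu;
	}
	return n;
}
```

With the masks from the example above, 0xe0 expands to CPUs 5, 6 and 7
and 0x41 to CPUs 0 and 6, so a per-lcore power operation would be
repeated once per listed CPU.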


Re: [PATCH v3] mem: allow using ASan in multi-process mode

2024-10-17 Thread Artur Paszkiewicz

On 10/3/24 23:18, Stephen Hemminger wrote:

Makes sense, but patch has some fuzz against current main branch.
There is also another patch that address the ASAN touch issue.

https://patchwork.dpdk.org/project/dpdk/patch/20240723083419.12435-1-amic...@kalrayinc.com/


I just sent a new version of the patch, it no longer needs the change
that was related to that linked patch.

Thanks,
Artur


[PATCH v5 0/5] power: refactor power management library

2024-10-17 Thread Sivaprasad Tummala
This patchset refactors the power management library, addressing both
core and uncore power management. The primary changes involve the
creation of dedicated directories for each driver within
'drivers/power/core/*' and 'drivers/power/uncore/*'.

This refactor significantly improves code organization, enhances
clarity, and boosts maintainability. It lays the foundation for more
focused development on individual drivers and facilitates seamless
integration of future enhancements, particularly the AMD uncore driver.

Furthermore, this effort aims to streamline code maintenance by
consolidating common functions for cpufreq and cppc across various
core drivers, thus reducing code duplication.

Sivaprasad Tummala (5):
  power: refactor core power management library
  power: refactor uncore power management library
  test/power: removed function pointer validations
  drivers/power: uncore support for AMD EPYC processors
  maintainers: update for drivers/power

 MAINTAINERS   |   1 +
 app/test/test_power.c |  95 -
 app/test/test_power_cpufreq.c |  52 ---
 app/test/test_power_kvm_vm.c  |  36 --
 drivers/meson.build   |   1 +
 .../power/acpi/acpi_cpufreq.c |  22 +-
 .../power/acpi/acpi_cpufreq.h |   6 +-
 drivers/power/acpi/meson.build|  10 +
 .../power/amd_pstate/amd_pstate_cpufreq.c |  24 +-
 .../power/amd_pstate/amd_pstate_cpufreq.h |   8 +-
 drivers/power/amd_pstate/meson.build  |  10 +
 drivers/power/amd_uncore/amd_uncore.c | 329 ++
 drivers/power/amd_uncore/amd_uncore.h | 226 
 drivers/power/amd_uncore/meson.build  |  20 ++
 .../power/cppc/cppc_cpufreq.c |  22 +-
 .../power/cppc/cppc_cpufreq.h |   8 +-
 drivers/power/cppc/meson.build|  10 +
 .../power/intel_uncore/intel_uncore.c |  18 +-
 .../power/intel_uncore/intel_uncore.h |   8 +-
 drivers/power/intel_uncore/meson.build|   6 +
 .../power/kvm_vm}/guest_channel.c |   0
 .../power/kvm_vm}/guest_channel.h |   0
 .../power/kvm_vm/kvm_vm.c |  22 +-
 .../power/kvm_vm/kvm_vm.h |   6 +-
 drivers/power/kvm_vm/meson.build  |  16 +
 drivers/power/meson.build |  14 +
 drivers/power/pstate/meson.build  |  10 +
 .../power/pstate/pstate_cpufreq.c |  22 +-
 .../power/pstate/pstate_cpufreq.h |   6 +-
 examples/l3fwd-power/main.c   |  12 +-
 lib/power/meson.build |   9 +-
 lib/power/power_common.c  |   2 +-
 lib/power/power_common.h  |  16 +-
 lib/power/rte_power.c | 287 +--
 lib/power/rte_power.h | 141 +---
 lib/power/rte_power_cpufreq_api.h | 209 +++
 lib/power/rte_power_uncore.c  | 207 +--
 lib/power/rte_power_uncore.h  |  87 +++--
 lib/power/rte_power_uncore_ops.h  | 241 +
 lib/power/version.map |  17 +
 40 files changed, 1611 insertions(+), 625 deletions(-)
 rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c (95%)
 rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h (98%)
 create mode 100644 drivers/power/acpi/meson.build
 rename lib/power/power_amd_pstate_cpufreq.c => drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
 rename lib/power/power_amd_pstate_cpufreq.h => drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
 create mode 100644 drivers/power/amd_pstate/meson.build
 create mode 100644 drivers/power/amd_uncore/amd_uncore.c
 create mode 100644 drivers/power/amd_uncore/amd_uncore.h
 create mode 100644 drivers/power/amd_uncore/meson.build
 rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c (95%)
 rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h (97%)
 create mode 100644 drivers/power/cppc/meson.build
 rename lib/power/power_intel_uncore.c => drivers/power/intel_uncore/intel_uncore.c (95%)
 rename lib/power/power_intel_uncore.h => drivers/power/intel_uncore/intel_uncore.h (97%)
 create mode 100644 drivers/power/intel_uncore/meson.build
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
 rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
 rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
 create mode 100644 drivers/power/kvm_vm/meson.build
 create mode 100644 drivers/power/meson.build
 create mode 100644 drivers/power/pstate/meson.build
 rename lib/power/power_pstate_cpufreq.c => drivers/power/pstate/pstate_cpufreq.c (96%)
 rename lib/power/power_pstate_cpufreq.h =>

[PATCH v5 1/5] power: refactor core power management library

2024-10-17 Thread Sivaprasad Tummala
This patch introduces a comprehensive refactor to the core power
management library. The primary focus is on improving modularity
and organization by relocating specific driver implementations
from the 'lib/power' directory to dedicated directories within
'drivers/power/core/*'. The adjustment of meson.build files
enables the selective activation of individual drivers.

These changes contribute to a significant enhancement in code
organization, providing a clearer structure for driver implementations.
The refactor aims to improve overall code clarity and boost
maintainability. Additionally, it establishes a foundation for
future development, allowing for more focused work on individual
drivers and seamless integration of forthcoming enhancements.

v5:
 - fixed code style warning

v4:
 - fixed build error with RTE_ASSERT

v3:
 - renamed rte_power_core_ops.h as rte_power_cpufreq_api.h
 - re-worked on auto detection logic

v2:
 - added NULL check for global_core_ops in rte_power_get_core_ops

Signed-off-by: Sivaprasad Tummala 
---
 drivers/meson.build   |   1 +
 .../power/acpi/acpi_cpufreq.c |  22 +-
 .../power/acpi/acpi_cpufreq.h |   6 +-
 drivers/power/acpi/meson.build|  10 +
 .../power/amd_pstate/amd_pstate_cpufreq.c |  24 +-
 .../power/amd_pstate/amd_pstate_cpufreq.h |   8 +-
 drivers/power/amd_pstate/meson.build  |  10 +
 .../power/cppc/cppc_cpufreq.c |  22 +-
 .../power/cppc/cppc_cpufreq.h |   8 +-
 drivers/power/cppc/meson.build|  10 +
 .../power/kvm_vm}/guest_channel.c |   0
 .../power/kvm_vm}/guest_channel.h |   0
 .../power/kvm_vm/kvm_vm.c |  22 +-
 .../power/kvm_vm/kvm_vm.h |   6 +-
 drivers/power/kvm_vm/meson.build  |  16 +
 drivers/power/meson.build |  12 +
 drivers/power/pstate/meson.build  |  10 +
 .../power/pstate/pstate_cpufreq.c |  22 +-
 .../power/pstate/pstate_cpufreq.h |   6 +-
 lib/power/meson.build |   7 +-
 lib/power/power_common.c  |   2 +-
 lib/power/power_common.h  |  16 +-
 lib/power/rte_power.c | 287 ++
 lib/power/rte_power.h | 141 ++---
 lib/power/rte_power_cpufreq_api.h | 209 +
 lib/power/version.map |  14 +
 26 files changed, 621 insertions(+), 270 deletions(-)
 rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c (95%)
 rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h (98%)
 create mode 100644 drivers/power/acpi/meson.build
 rename lib/power/power_amd_pstate_cpufreq.c => drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%)
 rename lib/power/power_amd_pstate_cpufreq.h => drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%)
 create mode 100644 drivers/power/amd_pstate/meson.build
 rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c (95%)
 rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h (97%)
 create mode 100644 drivers/power/cppc/meson.build
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%)
 rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%)
 rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%)
 rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%)
 create mode 100644 drivers/power/kvm_vm/meson.build
 create mode 100644 drivers/power/meson.build
 create mode 100644 drivers/power/pstate/meson.build
 rename lib/power/power_pstate_cpufreq.c => drivers/power/pstate/pstate_cpufreq.c (96%)
 rename lib/power/power_pstate_cpufreq.h => drivers/power/pstate/pstate_cpufreq.h (98%)
 create mode 100644 lib/power/rte_power_cpufreq_api.h

diff --git a/drivers/meson.build b/drivers/meson.build
index 2733306698..7ef4f581a0 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -29,6 +29,7 @@ subdirs = [
 'event',  # depends on common, bus, mempool and net.
 'baseband',   # depends on common and bus.
 'gpu',# depends on common and bus.
+'power',  # depends on common (in future).
 ]
 
 if meson.is_cross_build()
diff --git a/lib/power/power_acpi_cpufreq.c b/drivers/power/acpi/acpi_cpufreq.c
similarity index 95%
rename from lib/power/power_acpi_cpufreq.c
rename to drivers/power/acpi/acpi_cpufreq.c
index abad53bef1..c3fd10f287 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/drivers/power/acpi/acpi_cpufreq.c
@@ -10,7 +10,7 @@
 #include 
 #include 
 
-#include "power_acpi_cpufreq.h"
+#include "acpi_cpufreq.h"
 #include "power_common.h"
 
 #define STR_SIZE 1024
@@ -583,3 +583,23 @@ int power_acpi_get_capabilities(unsigned int lcore_id,
 
return 0;
 }
+
+static struct rte_power_core_ops acpi_ops = {
+   .name 

[PATCH v5 4/5] drivers/power: uncore support for AMD EPYC processors

2024-10-17 Thread Sivaprasad Tummala
This patch introduces driver support for power management of uncore
components in AMD EPYC processors.

v2:
 - fixed typo in comments section.
 - added fabric frequency get support for legacy platforms.

Signed-off-by: Sivaprasad Tummala 
---
 drivers/power/amd_uncore/amd_uncore.c | 329 ++
 drivers/power/amd_uncore/amd_uncore.h | 226 ++
 drivers/power/amd_uncore/meson.build  |  20 ++
 drivers/power/meson.build |   1 +
 4 files changed, 576 insertions(+)
 create mode 100644 drivers/power/amd_uncore/amd_uncore.c
 create mode 100644 drivers/power/amd_uncore/amd_uncore.h
 create mode 100644 drivers/power/amd_uncore/meson.build

diff --git a/drivers/power/amd_uncore/amd_uncore.c b/drivers/power/amd_uncore/amd_uncore.c
new file mode 100644
index 00..c3e95cdc08
--- /dev/null
+++ b/drivers/power/amd_uncore/amd_uncore.c
@@ -0,0 +1,329 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 Advanced Micro Devices, Inc.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+#include "amd_uncore.h"
+#include "power_common.h"
+#include "e_smi/e_smi.h"
+
+#define MAX_NUMA_DIE 8
+
+struct  __rte_cache_aligned uncore_power_info {
+   unsigned int die;  /* Core die id */
+   unsigned int pkg;  /* Package id */
+   uint32_t freqs[RTE_MAX_UNCORE_FREQS];  /* Frequency array */
+   uint32_t nb_freqs; /* Number of available freqs */
+   uint32_t curr_idx; /* Freq index in freqs array */
+   uint32_t max_freq;/* System max uncore freq */
+   uint32_t min_freq;/* System min uncore freq */
+};
+
+static struct uncore_power_info uncore_info[RTE_MAX_NUMA_NODES][MAX_NUMA_DIE];
+static int esmi_initialized;
+static unsigned int hsmp_proto_ver;
+
+static int
+set_uncore_freq_internal(struct uncore_power_info *ui, uint32_t idx)
+{
+   int ret;
+
+   if (idx >= RTE_MAX_UNCORE_FREQS || idx >= ui->nb_freqs) {
+   POWER_LOG(DEBUG, "Invalid uncore frequency index %u, which "
+   "should be less than %u", idx, ui->nb_freqs);
+   return -1;
+   }
+
+   ret = esmi_apb_disable(ui->pkg, idx);
+   if (ret != ESMI_SUCCESS) {
+   POWER_LOG(ERR, "DF P-state '%u' set failed for pkg %02u",
+   idx, ui->pkg);
+   return -1;
+   }
+
+   POWER_DEBUG_LOG("DF P-state '%u' to be set for pkg %02u die %02u",
+   idx, ui->pkg, ui->die);
+
+   /* write the minimum value first if the target freq is less than current max */
+   ui->curr_idx = idx;
+
+   return 0;
+}
+
+static int
+power_init_for_setting_uncore_freq(struct uncore_power_info *ui)
+{
+   switch (hsmp_proto_ver) {
+   case HSMP_PROTO_VER5:
+   ui->max_freq = 180; /* Hz */
+   ui->min_freq = 120; /* Hz */
+   break;
+   case HSMP_PROTO_VER2:
+   default:
+   ui->max_freq = 160; /* Hz */
+   ui->min_freq = 120; /* Hz */
+   }
+
+   return 0;
+}
+
+/*
+ * Get the available uncore frequencies of the specific die.
+ */
+static int
+power_get_available_uncore_freqs(struct uncore_power_info *ui)
+{
+   ui->nb_freqs = 3;
+   if (ui->nb_freqs >= RTE_MAX_UNCORE_FREQS) {
+   POWER_LOG(ERR, "Too many available uncore frequencies: %d",
+   ui->nb_freqs);
+   return -1;
+   }
+
+   /* Generate the uncore freq bucket array. */
+   switch (hsmp_proto_ver) {
+   case HSMP_PROTO_VER5:
+   ui->freqs[0] = 180;
+   ui->freqs[1] = 144;
+   ui->freqs[2] = 120;
+   break;
+   case HSMP_PROTO_VER2:
+   default:
+   ui->freqs[0] = 160;
+   ui->freqs[1] = 1333000;
+   ui->freqs[2] = 120;
+   }
+
+   POWER_DEBUG_LOG("%u frequency(s) of pkg %02u die %02u are available",
+   ui->nb_freqs, ui->pkg, ui->die);
+
+   return 0;
+}
+
+static int
+check_pkg_die_values(unsigned int pkg, unsigned int die)
+{
+   unsigned int max_pkgs, max_dies;
+   max_pkgs = power_amd_uncore_get_num_pkgs();
+   if (max_pkgs == 0)
+   return -1;
+   if (pkg >= max_pkgs) {
+   POWER_LOG(DEBUG, "Package number %02u can not exceed %u",
+   pkg, max_pkgs);
+   return -1;
+   }
+
+   max_dies = power_amd_uncore_get_num_dies(pkg);
+   if (max_dies == 0)
+   return -1;
+   if (die >= max_dies) {
+   POWER_LOG(DEBUG, "Die number %02u can not exceed %u",
+   die, max_dies);
+   return -1;
+   }
+
+   return 0;
+}
+
+static void
+power_amd_uncore_esmi_init(void)
+{
+   if (esmi_init() == ESMI_SUCCESS) {
+   i

[PATCH v5 2/5] power: refactor uncore power management library

2024-10-17 Thread Sivaprasad Tummala
This patch refactors the power management library, addressing uncore
power management. The primary changes involve the creation of dedicated
directories for each driver within 'drivers/power/uncore/*'. The
adjustment of meson.build files enables the selective activation
of individual drivers.

This refactor significantly improves code organization, enhances
clarity and boosts maintainability. It lays the foundation for more
focused development on individual drivers and facilitates seamless
integration of future enhancements, particularly the AMD uncore driver.

v5:
 - fixed build errors for risc-v/ppc targets

v4:
 - fixed build error with RTE_ASSERT

v3:
 - fixed typo in header file inclusion

Signed-off-by: Sivaprasad Tummala 
---
 .../power/intel_uncore/intel_uncore.c |  18 +-
 .../power/intel_uncore/intel_uncore.h |   8 +-
 drivers/power/intel_uncore/meson.build|   6 +
 drivers/power/meson.build |   3 +-
 lib/power/meson.build |   2 +-
 lib/power/rte_power_uncore.c  | 207 ++-
 lib/power/rte_power_uncore.h  |  87 ---
 lib/power/rte_power_uncore_ops.h  | 241 ++
 lib/power/version.map |   3 +
 9 files changed, 410 insertions(+), 165 deletions(-)
 rename lib/power/power_intel_uncore.c => drivers/power/intel_uncore/intel_uncore.c (95%)
 rename lib/power/power_intel_uncore.h => drivers/power/intel_uncore/intel_uncore.h (97%)
 create mode 100644 drivers/power/intel_uncore/meson.build
 create mode 100644 lib/power/rte_power_uncore_ops.h

diff --git a/lib/power/power_intel_uncore.c b/drivers/power/intel_uncore/intel_uncore.c
similarity index 95%
rename from lib/power/power_intel_uncore.c
rename to drivers/power/intel_uncore/intel_uncore.c
index 4eb9c5900a..804ad5d755 100644
--- a/lib/power/power_intel_uncore.c
+++ b/drivers/power/intel_uncore/intel_uncore.c
@@ -8,7 +8,7 @@
 
 #include 
 
-#include "power_intel_uncore.h"
+#include "intel_uncore.h"
 #include "power_common.h"
 
 #define MAX_NUMA_DIE 8
@@ -475,3 +475,19 @@ power_intel_uncore_get_num_dies(unsigned int pkg)
 
return count;
 }
+
+static struct rte_power_uncore_ops intel_uncore_ops = {
+   .name = "intel-uncore",
+   .init = power_intel_uncore_init,
+   .exit = power_intel_uncore_exit,
+   .get_avail_freqs = power_intel_uncore_freqs,
+   .get_num_pkgs = power_intel_uncore_get_num_pkgs,
+   .get_num_dies = power_intel_uncore_get_num_dies,
+   .get_num_freqs = power_intel_uncore_get_num_freqs,
+   .get_freq = power_get_intel_uncore_freq,
+   .set_freq = power_set_intel_uncore_freq,
+   .freq_max = power_intel_uncore_freq_max,
+   .freq_min = power_intel_uncore_freq_min,
+};
+
+RTE_POWER_REGISTER_UNCORE_OPS(intel_uncore_ops);
diff --git a/lib/power/power_intel_uncore.h b/drivers/power/intel_uncore/intel_uncore.h
similarity index 97%
rename from lib/power/power_intel_uncore.h
rename to drivers/power/intel_uncore/intel_uncore.h
index 20a3ba8ebe..f2ce2f0c66 100644
--- a/lib/power/power_intel_uncore.h
+++ b/drivers/power/intel_uncore/intel_uncore.h
@@ -2,8 +2,8 @@
  * Copyright(c) 2022 Intel Corporation
  */
 
-#ifndef POWER_INTEL_UNCORE_H
-#define POWER_INTEL_UNCORE_H
+#ifndef INTEL_UNCORE_H
+#define INTEL_UNCORE_H
 
 /**
  * @file
@@ -11,7 +11,7 @@
  */
 
 #include "rte_power.h"
-#include "rte_power_uncore.h"
+#include "rte_power_uncore_ops.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -223,4 +223,4 @@ power_intel_uncore_get_num_dies(unsigned int pkg);
 }
 #endif
 
-#endif /* POWER_INTEL_UNCORE_H */
+#endif /* INTEL_UNCORE_H */
diff --git a/drivers/power/intel_uncore/meson.build b/drivers/power/intel_uncore/meson.build
new file mode 100644
index 00..876df8ad14
--- /dev/null
+++ b/drivers/power/intel_uncore/meson.build
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2024 Advanced Micro Devices, Inc.
+
+sources = files('intel_uncore.c')
+deps += ['power']
diff --git a/drivers/power/meson.build b/drivers/power/meson.build
index 8c7215c639..c83047af94 100644
--- a/drivers/power/meson.build
+++ b/drivers/power/meson.build
@@ -6,7 +6,8 @@ drivers = [
 'amd_pstate',
 'cppc',
 'kvm_vm',
-'pstate'
+'pstate',
+'intel_uncore'
 ]
 
 std_deps = ['power']
diff --git a/lib/power/meson.build b/lib/power/meson.build
index d6b86ea19c..63616e60fd 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -13,7 +13,6 @@ if not is_linux
 endif
 sources = files(
 'power_common.c',
-'power_intel_uncore.c',
 'rte_power.c',
 'rte_power_uncore.c',
 'rte_power_pmd_mgmt.c',
@@ -24,6 +23,7 @@ headers = files(
 'rte_power_guest_channel.h',
 'rte_power_pmd_mgmt.h',
 'rte_power_uncore.h',
+'rte_power_uncore_ops.h',
 )
 if cc.has_argument('-Wno-cast-qual')
 cflags += '-Wno-
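
The diff above moves the Intel uncore driver behind an ops table that is
registered with RTE_POWER_REGISTER_UNCORE_OPS. The macro's definition is not
shown in this excerpt, so the following is only a sketch of how such a
self-registering ops table could work, assuming a constructor function and a
linked list. Every demo_* name is hypothetical, and the constructor attribute
is a GCC/Clang extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table; the fields only loosely follow the diff above. */
struct demo_uncore_ops {
	const char *name;
	int (*init)(unsigned int pkg, unsigned int die);
	struct demo_uncore_ops *next;
};

/* Head of the list of registered drivers. */
static struct demo_uncore_ops *demo_ops_head;

/* Link a driver's ops table into the global list. */
static void
demo_register_uncore_ops(struct demo_uncore_ops *ops)
{
	ops->next = demo_ops_head;
	demo_ops_head = ops;
}

/* Generate a constructor so that registration runs before main(). */
#define DEMO_REGISTER_UNCORE_OPS(ops) \
	static void __attribute__((constructor)) \
	demo_register_##ops(void) \
	{ \
		demo_register_uncore_ops(&ops); \
	}

/* Look up a registered driver by name; return NULL if it is absent. */
static struct demo_uncore_ops *
demo_find_uncore_ops(const char *name)
{
	struct demo_uncore_ops *ops;

	for (ops = demo_ops_head; ops != NULL; ops = ops->next) {
		if (strcmp(ops->name, name) == 0)
			return ops;
	}
	return NULL;
}

/* What a driver source file would contain: callbacks, table, registration. */
static int
demo_intel_uncore_init(unsigned int pkg, unsigned int die)
{
	(void)pkg;
	(void)die;
	return 0;
}

static struct demo_uncore_ops demo_intel_ops = {
	.name = "demo-intel-uncore",
	.init = demo_intel_uncore_init,
};

DEMO_REGISTER_UNCORE_OPS(demo_intel_ops)
```

With this shape the library core never names a driver directly; it only walks
the list, which is what allows each driver to live in its own directory and be
enabled or disabled at build time.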

[PATCH v5 5/5] maintainers: update for drivers/power

2024-10-17 Thread Sivaprasad Tummala
Update maintainers for drivers/power/*.

Signed-off-by: Sivaprasad Tummala 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6814991735..9f14e8f8d6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1744,6 +1744,7 @@ M: Anatoly Burakov 
 M: David Hunt 
 M: Sivaprasad Tummala 
 F: lib/power/
+F: drivers/power/*
 F: doc/guides/prog_guide/power_man.rst
 F: app/test/test_power*
 F: examples/l3fwd-power/
-- 
2.34.1



[PATCH v5 3/5] test/power: removed function pointer validations

2024-10-17 Thread Sivaprasad Tummala
After refactoring the power library, power management operations are now
consistently supported regardless of the operating environment, so the
function pointer checks are unnecessary and have been removed from the
applications.

v2:
 - removed function pointer validation in l3fwd-power app.

Signed-off-by: Sivaprasad Tummala 
---
 app/test/test_power.c | 95 ---
 app/test/test_power_cpufreq.c | 52 ---
 app/test/test_power_kvm_vm.c  | 36 -
 examples/l3fwd-power/main.c   | 12 ++---
 4 files changed, 4 insertions(+), 191 deletions(-)

diff --git a/app/test/test_power.c b/app/test/test_power.c
index 403adc22d6..5df5848c70 100644
--- a/app/test/test_power.c
+++ b/app/test/test_power.c
@@ -24,86 +24,6 @@ test_power(void)
 
 #include 
 
-static int
-check_function_ptrs(void)
-{
-   enum power_management_env env = rte_power_get_env();
-
-   const bool not_null_expected = !(env == PM_ENV_NOT_SET);
-
-   const char *inject_not_string1 = not_null_expected ? " not" : "";
-   const char *inject_not_string2 = not_null_expected ? "" : " not";
-
-   if ((rte_power_freqs == NULL) == not_null_expected) {
-   printf("rte_power_freqs should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_get_freq == NULL) == not_null_expected) {
-   printf("rte_power_get_freq should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_set_freq == NULL) == not_null_expected) {
-   printf("rte_power_set_freq should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_up == NULL) == not_null_expected) {
-   printf("rte_power_freq_up should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_down == NULL) == not_null_expected) {
-   printf("rte_power_freq_down should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_max == NULL) == not_null_expected) {
-   printf("rte_power_freq_max should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_min == NULL) == not_null_expected) {
-   printf("rte_power_freq_min should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_turbo_status == NULL) == not_null_expected) {
-   printf("rte_power_turbo_status should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_enable_turbo == NULL) == not_null_expected) {
-   printf("rte_power_freq_enable_turbo should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_freq_disable_turbo == NULL) == not_null_expected) {
-   printf("rte_power_freq_disable_turbo should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-   if ((rte_power_get_capabilities == NULL) == not_null_expected) {
-   printf("rte_power_get_capabilities should%s be NULL, environment has%s been "
-   "initialised\n", inject_not_string1,
-   inject_not_string2);
-   return -1;
-   }
-
-   return 0;
-}
-
 static int
 test_power(void)
 {
@@ -124,10 +44,6 @@ test_power(void)
return -1;
}
 
-   /* Verify that function pointers are NULL */
-   if (check_function_ptrs() < 0)
-   goto fail_all;
-
rte_power_unset_env();
 
/* Perform tests for valid environments.*/
@@ -154,22 +70,11 @@ test_power(void)

[PATCH 12/18] net/r8169: implement Tx path

2024-10-17 Thread Howard Wang
Add implementation for TX datapath.

Signed-off-by: Howard Wang 
---
 drivers/net/r8169/r8169_ethdev.c |   6 +
 drivers/net/r8169/r8169_ethdev.h |  11 +
 drivers/net/r8169/r8169_rxtx.c   | 681 ++-
 3 files changed, 682 insertions(+), 16 deletions(-)

diff --git a/drivers/net/r8169/r8169_ethdev.c b/drivers/net/r8169/r8169_ethdev.c
index 6c06f71385..61aa16cc10 100644
--- a/drivers/net/r8169/r8169_ethdev.c
+++ b/drivers/net/r8169/r8169_ethdev.c
@@ -81,6 +81,11 @@ static const struct eth_dev_ops rtl_eth_dev_ops = {
.rx_queue_setup   = rtl_rx_queue_setup,
.rx_queue_release = rtl_rx_queue_release,
.rxq_info_get = rtl_rxq_info_get,
+
+   .tx_queue_setup   = rtl_tx_queue_setup,
+   .tx_queue_release = rtl_tx_queue_release,
+   .tx_done_cleanup  = rtl_tx_done_cleanup,
+   .txq_info_get = rtl_txq_info_get,
 };
 
 static int
@@ -363,6 +368,7 @@ rtl_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
dev_info->rx_offload_capa = (rtl_get_rx_port_offloads() |
 dev_info->rx_queue_offload_capa);
+   dev_info->tx_offload_capa = rtl_get_tx_port_offloads();
 
return 0;
 }
diff --git a/drivers/net/r8169/r8169_ethdev.h b/drivers/net/r8169/r8169_ethdev.h
index cfcf576bc1..5776601081 100644
--- a/drivers/net/r8169/r8169_ethdev.h
+++ b/drivers/net/r8169/r8169_ethdev.h
@@ -77,6 +77,8 @@ struct rtl_hw {
u16 hw_clo_ptr_reg;
u16 sw_tail_ptr_reg;
u32 MaxTxDescPtrMask;
+   u32 NextHwDesCloPtr0;
+   u32 BeginHwDesCloPtr0;
 
/* Dash */
u8 HwSuppDashVer;
@@ -114,16 +116,25 @@ uint16_t rtl_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
  uint16_t nb_pkts);
 
 void rtl_rx_queue_release(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+void rtl_tx_queue_release(struct rte_eth_dev *dev, uint16_t tx_queue_id);
 
 void rtl_rxq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
   struct rte_eth_rxq_info *qinfo);
+void rtl_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
+  struct rte_eth_txq_info *qinfo);
 
 uint64_t rtl_get_rx_port_offloads(void);
+uint64_t rtl_get_tx_port_offloads(void);
 
 int rtl_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
uint16_t nb_rx_desc, unsigned int socket_id,
const struct rte_eth_rxconf *rx_conf,
struct rte_mempool *mb_pool);
+int rtl_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
+   uint16_t nb_tx_desc, unsigned int socket_id,
+   const struct rte_eth_txconf *tx_conf);
+
+int rtl_tx_done_cleanup(void *tx_queue, uint32_t free_cnt);
 
 int rtl_stop_queues(struct rte_eth_dev *dev);
 void rtl_free_queues(struct rte_eth_dev *dev);
diff --git a/drivers/net/r8169/r8169_rxtx.c b/drivers/net/r8169/r8169_rxtx.c
index 8fcc2ad909..14740ef0a4 100644
--- a/drivers/net/r8169/r8169_rxtx.c
+++ b/drivers/net/r8169/r8169_rxtx.c
@@ -24,11 +24,34 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "r8169_ethdev.h"
 #include "r8169_hw.h"
 #include "r8169_logs.h"
 
+/* Bit mask to indicate what bits required for building TX context */
+#define RTL_TX_OFFLOAD_MASK (RTE_MBUF_F_TX_IPV6 |  \
+RTE_MBUF_F_TX_IPV4 |   \
+RTE_MBUF_F_TX_VLAN |   \
+RTE_MBUF_F_TX_IP_CKSUM |   \
+RTE_MBUF_F_TX_L4_MASK |\
+RTE_MBUF_F_TX_TCP_SEG)
+
+#define MIN_PATCH_LENGTH 47
+#define ETH_ZLEN60 /* Min. octets in frame sans FCS */
+
+/* Struct TxDesc in kernel r8169 */
+struct rtl_tx_desc {
+   u32 opts1;
+   u32 opts2;
+   u64 addr;
+   u32 reserved0;
+   u32 reserved1;
+   u32 reserved2;
+   u32 reserved3;
+};
+
 /* Struct RxDesc in kernel r8169 */
 struct rtl_rx_desc {
u32 opts1;
@@ -36,27 +59,47 @@ struct rtl_rx_desc {
u64 addr;
 };
 
+/* Structure associated with each descriptor of the TX ring of a TX queue. */
+struct rtl_tx_entry {
+   struct rte_mbuf *mbuf;
+};
+
 /* Structure associated with each descriptor of the RX ring of a RX queue. */
 struct rtl_rx_entry {
struct rte_mbuf *mbuf;
 };
 
+/* Structure associated with each TX queue. */
+struct rtl_tx_queue {
+   struct rtl_tx_desc   *hw_ring;
+   struct rtl_tx_entry  *sw_ring;
+   struct rtl_hw*hw;
+   uint64_t hw_ring_phys_addr;
+   uint16_t nb_tx_desc;
+   RTE_ATOMIC(uint32_t) tx_tail;
+   uint16_t tx_head;
+   uint16_t queue_id;
+   uint16_t port_id;
+   uint16_t tx_free_thresh;
+   uint16_t tx_free;
+};
+
 /* Structure associated 

Re: [PATCH] hash: fix thash lfsr initialization

2024-10-17 Thread Thomas Monjalon
06/09/2024 19:01, Vladimir Medvedkin:
> The reverse polynomial for an LFSR was initialized improperly, which
> could generate an improper bit sequence in some situations.
> This patch implements a proper polynomial reversing function.
> 
> Fixes: 28ebff11c2dc ("hash: add predictable RSS")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Vladimir Medvedkin 

There was no review but I assume it will be fine.

Applied, thanks.
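
For readers unfamiliar with the term, the reverse (reciprocal) of a polynomial
of degree n over GF(2) mirrors its coefficients: the coefficient of x^i becomes
the coefficient of x^(n-i). The sketch below shows only that general
definition. It is not the function added by the patch, whose code is not part
of this thread, and the name reciprocal_poly and the convention of storing all
n+1 coefficients explicitly (including the x^n term) are assumptions made for
the example.

```c
#include <stdint.h>

/*
 * Return the reciprocal of a GF(2) polynomial.
 * Bit i of poly is the coefficient of x^i, and bit 'degree' (the leading
 * term) must be set. The degree must not exceed 31.
 */
static uint32_t
reciprocal_poly(uint32_t poly, unsigned int degree)
{
	uint32_t rev = 0;
	unsigned int i;

	for (i = 0; i <= degree; i++) {
		if (poly & (UINT32_C(1) << i))
			rev |= UINT32_C(1) << (degree - i);
	}
	return rev;
}
```

For example, x^4 + x + 1 (0x13) maps to x^4 + x^3 + 1 (0x19), and applying the
function twice returns the original polynomial.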





[PATCH RESEND] malloc: fix allocation for a specific case with ASan

2024-10-17 Thread Artur Paszkiewicz
Allocation would fail with ASan enabled if the size and alignment were
both equal to half of the page size, e.g.:

size_t pg_sz = 2 * (1 << 20);
rte_malloc(NULL, pg_sz / 2, pg_sz / 2);

In that case, try_expand_heap_primary() allocated only one page, which
is not enough to fit an allocation with this alignment when
MALLOC_ELEM_TRAILER_LEN > 0, as correctly checked by
malloc_elem_can_hold().

Signed-off-by: Artur Paszkiewicz 
---
 lib/eal/common/malloc_heap.c | 4 ++--
 lib/eal/common/malloc_mp.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 058aaf4209..5b93e7fcb8 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -401,8 +401,8 @@ try_expand_heap_primary(struct malloc_heap *heap, uint64_t pg_sz,
int n_segs;
bool callback_triggered = false;
 
-   alloc_sz = RTE_ALIGN_CEIL(RTE_ALIGN_CEIL(elt_size, align) +
-   MALLOC_ELEM_OVERHEAD, pg_sz);
+   alloc_sz = RTE_ALIGN_CEIL(RTE_MAX(MALLOC_ELEM_HEADER_LEN, align) +
+   elt_size + MALLOC_ELEM_TRAILER_LEN, pg_sz);
n_segs = alloc_sz / pg_sz;
 
/* we can't know in advance how many pages we'll need, so we malloc */
diff --git a/lib/eal/common/malloc_mp.c b/lib/eal/common/malloc_mp.c
index 9765277f5d..1373da44c9 100644
--- a/lib/eal/common/malloc_mp.c
+++ b/lib/eal/common/malloc_mp.c
@@ -251,8 +251,8 @@ handle_alloc_request(const struct malloc_mp_req *m,
return -1;
}
 
-   alloc_sz = RTE_ALIGN_CEIL(RTE_ALIGN_CEIL(ar->elt_size, ar->align) +
-   MALLOC_ELEM_OVERHEAD, ar->page_sz);
+   alloc_sz = RTE_ALIGN_CEIL(RTE_MAX(MALLOC_ELEM_HEADER_LEN, ar->align) +
+   ar->elt_size + MALLOC_ELEM_TRAILER_LEN, ar->page_sz);
n_segs = alloc_sz / ar->page_sz;
 
/* we can't know in advance how many pages we'll need, so we malloc */
-- 
2.43.0
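
The difference between the old and new alloc_sz expressions in this patch can be checked with a little arithmetic. The following is a minimal sketch, not DPDK code: HDR_LEN and TRAILER_ASAN are hypothetical stand-ins for MALLOC_ELEM_HEADER_LEN and MALLOC_ELEM_TRAILER_LEN, and it assumes MALLOC_ELEM_OVERHEAD is the sum of the two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only (not the real DPDK values). */
#define HDR_LEN      UINT64_C(128) /* stands in for MALLOC_ELEM_HEADER_LEN */
#define TRAILER_ASAN UINT64_C(64)  /* stands in for MALLOC_ELEM_TRAILER_LEN with ASan */

/* Round v up to a multiple of a power-of-two alignment. */
static uint64_t align_ceil(uint64_t v, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Pages requested by the expression removed by the patch. */
static uint64_t pages_old(uint64_t elt_size, uint64_t align,
		uint64_t pg_sz, uint64_t trailer)
{
	/* assumed: overhead = header + trailer */
	uint64_t overhead = HDR_LEN + trailer;

	return align_ceil(align_ceil(elt_size, align) + overhead, pg_sz) / pg_sz;
}

/* Pages requested by the expression added by the patch. */
static uint64_t pages_new(uint64_t elt_size, uint64_t align,
		uint64_t pg_sz, uint64_t trailer)
{
	return align_ceil(max_u64(HDR_LEN, align) + elt_size + trailer,
			pg_sz) / pg_sz;
}
```

With these assumed sizes, a 1 MB allocation aligned to 1 MB on 2 MB pages gives one page with the old expression and two pages with the new one when the trailer is non-zero; with a zero trailer both expressions give one page.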



[PATCH v4] mem: allow using ASan in multi-process mode

2024-10-17 Thread Artur Paszkiewicz
Multi-process applications operate on shared hugepage memory but each
process has its own ASan shadow region which is not synchronized with
the other processes. This causes issues when different processes try to
use the same memory because they have their own view of which addresses
are valid.

Fix it by mapping the shadow regions for allocated segments as shared
memory. The primary process is responsible for creating and removing the
shared memory objects.

Signed-off-by: Artur Paszkiewicz 
---
v4:
- Map ASan shadow shm after mapping the segment.
  Due to a change in ASan behavior[1] the mapped shadow shared memory
  regions are remapped later, when segments are mapped. So instead of
  mapping the whole shadow region when reserving the memseg list memory,
  map only the fragments corresponding to the segments after they are
  mapped. Because of this it is also no longer necessary to disable ASan
  instrumentation for triggering the page fault in alloc_seg().
- Adjusted function naming.
- Enabled unit tests.
v3:
- Removed conditional compilation from eal_common_memory.c.
- Improved comments.
v2:
- Added checks for config options disabling multi-process support.
- Fixed missing unmap in legacy mode.

[1] https://github.com/llvm/llvm-project/commit/a34e702aa16fde4cc76e9360d985a64e008e0b23

 app/test/test_mp_secondary.c   |  2 +-
 app/test/test_pdump.c  |  2 +-
 lib/eal/common/eal_common_memory.c |  7 +++
 lib/eal/common/eal_private.h   | 54 
 lib/eal/linux/eal_memalloc.c   | 30 +
 lib/eal/linux/eal_memory.c | 98 ++
 lib/eal/linux/meson.build  |  4 ++
 7 files changed, 195 insertions(+), 2 deletions(-)

diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index f3694530a8..7da2878f64 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -223,4 +223,4 @@ test_mp_secondary(void)
 
 #endif /* !RTE_EXEC_ENV_WINDOWS */
 
-REGISTER_FAST_TEST(multiprocess_autotest, false, false, test_mp_secondary);
+REGISTER_FAST_TEST(multiprocess_autotest, false, true, test_mp_secondary);
diff --git a/app/test/test_pdump.c b/app/test/test_pdump.c
index 9f7769707e..a0919e89ba 100644
--- a/app/test/test_pdump.c
+++ b/app/test/test_pdump.c
@@ -219,4 +219,4 @@ test_pdump(void)
return TEST_SUCCESS;
 }
 
-REGISTER_FAST_TEST(pdump_autotest, true, false, test_pdump);
+REGISTER_FAST_TEST(pdump_autotest, true, true, test_pdump);
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index a185e0b580..8fbd0c5af9 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -263,6 +263,11 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags)
EAL_LOG(DEBUG, "VA reserved for memseg list at %p, size %zx",
addr, mem_sz);
 
+   if (eal_memseg_list_init_asan_shadow(msl) != 0) {
+   EAL_LOG(ERR, "Failed to init ASan shadow region for memseg list");
+   return -1;
+   }
+
return 0;
 }
 
@@ -1052,6 +1057,8 @@ rte_eal_memory_detach(void)
EAL_LOG(ERR, "Could not unmap memory: %s",
rte_strerror(rte_errno));
 
+   eal_memseg_list_cleanup_asan_shadow(msl);
+
/*
 * we are detaching the fbarray rather than destroying because
 * other processes might still reference this fbarray, and we
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index bb315dab04..96e05647ff 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -309,6 +309,60 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags);
 void
 eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs);
 
+/**
+ * Initialize the MSL ASan shadow region shared memory.
+ *
+ * @param msl
+ *  Memory segment list.
+ * @return
+ *  0 on success, (-1) on failure.
+ */
+#ifdef RTE_MALLOC_ASAN
+int
+eal_memseg_list_init_asan_shadow(struct rte_memseg_list *msl);
+#else
+static inline int
+eal_memseg_list_init_asan_shadow(__rte_unused struct rte_memseg_list *msl)
+{
+   return 0;
+}
+#endif
+
+/**
+ * Cleanup the MSL ASan shadow region shared memory.
+ *
+ * @param msl
+ *  Memory segment list.
+ */
+#ifdef RTE_MALLOC_ASAN
+void
+eal_memseg_list_cleanup_asan_shadow(struct rte_memseg_list *msl);
+#else
+static inline void
+eal_memseg_list_cleanup_asan_shadow(__rte_unused struct rte_memseg_list *msl)
+{
+}
+#endif
+
+/**
+ * Get the MSL ASan shadow shared memory object file descriptor.
+ *
+ * @param msl
+ *  Index of the MSL.
+ * @return
+ *  A file descriptor.
+ */
+#ifdef RTE_MALLOC_ASAN
+int
+eal_memseg_list_get_asan_shadow_fd(int msl_idx);
+#else
+static inline int
+eal_memseg_list_get_asan_shadow_fd(__rte_unused int msl_idx)
+{
+   return -1;
+}
+#endif
+
 /**
  * Distribute available memory
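
The v4 notes above describe mapping only the shadow fragments that correspond to mapped segments. The address arithmetic behind that can be sketched as follows. This is not the patch's code: the 8-to-1 scale and the 0x7fff8000 offset are assumed here as the usual ASan defaults on x86_64, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ASan parameters for x86_64; other architectures differ. */
#define SHADOW_SCALE  3                         /* one shadow byte per 8 bytes */
#define SHADOW_OFFSET UINT64_C(0x00007fff8000)

/* Hypothetical helper type: the shadow range covering one segment. */
struct shadow_range {
	uint64_t start; /* first shadow address */
	uint64_t len;   /* shadow length in bytes */
};

/* Translate an application address into its shadow address. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Compute which part of the shadow region describes a memory segment. */
static struct shadow_range segment_shadow(uint64_t seg_addr, uint64_t seg_len)
{
	struct shadow_range r;

	/* the segment must cover whole 8-byte shadow granules */
	assert(seg_len > 0);
	assert((seg_addr & 7) == 0 && (seg_len & 7) == 0);

	r.start = mem_to_shadow(seg_addr);
	r.len = seg_len >> SHADOW_SCALE;
	return r;
}
```

Under these assumptions a 2 MB hugepage segment corresponds to a 256 KB shadow fragment, which is still a multiple of a 4 KB page and so could be mapped on its own as shared memory.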

Re: [PATCH 0/5] Increase minimum meson version

2024-10-17 Thread Bruce Richardson
On Thu, Oct 17, 2024 at 09:53:18AM +0200, David Marchand wrote:
> On Fri, Sep 20, 2024 at 2:57 PM Bruce Richardson
>  wrote:
> >
> > This patchset proposed increasing the minimum meson version to 0.57
> > and makes changes to update our build files appropriately for that
> > change: replacing deprecated functions, removing unnecessary version
> > checks and taking advantage of some new capabilities.
> >
> > Why 0.57? No one particular reason; it's mainly a conservative version
> > bump that doesn't have many impacts, but still gives us the minimum
> > updates we need to replace the deprecated get_cross_properties fn
> > and have a few extra features guaranteed available.
> >
> > Bruce Richardson (5):
> >   build: increase minimum meson version to 0.57
> >   build: remove version check on compiler links function
> >   build: remove unnecessary version checks
> >   build: use version file support from meson
> >   build: replace deprecated meson function
> >
> >  .ci/linux-setup.sh| 2 +-
> >  config/arm/meson.build| 4 ++--
> >  config/meson.build| 8 
> >  config/riscv/meson.build  | 4 ++--
> >  doc/api/meson.build   | 2 +-
> >  doc/guides/linux_gsg/sys_reqs.rst | 2 +-
> >  doc/guides/prog_guide/build-sdk-meson.rst | 2 +-
> >  drivers/common/qat/meson.build| 2 +-
> >  drivers/crypto/ipsec_mb/meson.build   | 2 +-
> >  drivers/event/cnxk/meson.build| 2 +-
> >  drivers/meson.build   | 7 ++-
> >  drivers/net/cnxk/meson.build  | 2 +-
> >  lib/meson.build   | 6 --
> >  meson.build   | 7 ++-
> >  14 files changed, 20 insertions(+), 32 deletions(-)
> 
> The series looks good, and CI guys gave me the green light.
> Series applied, thanks Bruce.
> 
> 
> There is one remaining TODO in config/meson.build:
> 
> # MS linker requires special treatment.
> # TODO: use cc.get_linker_id() with Meson >= 0.54
> is_ms_compiler = is_windows and (cc.get_id() == 'msvc')
> is_ms_linker = is_windows and (cc.get_id() == 'clang' or
> is_ms_compiler)
> 

Yep.

I'm hoping perhaps one of the windows maintainers/devs could look at this
because I see a number of possible linker values for windows listed in the
table at [1] and I'm not 100% sure which ones are to be accepted here.

/Bruce

[1] https://mesonbuild.com/Reference-tables.html#linker-ids


[PATCH v3 1/2] examples/l3fwd: add option to set RX burst size

2024-10-17 Thread Jie Hai
Currently the Rx burst size is fixed at MAX_PKT_BURST (32). This
parameter needs to be tunable in some performance optimization
scenarios, so an option '--burst' is added to set the burst size
explicitly. The default value is DEFAULT_PKT_BURST (32) and the maximum
value is MAX_PKT_BURST (512).

Signed-off-by: Jie Hai 
Acked-by: Chengwen Feng 
Acked-by: Huisong Li 
Acked-by: Morten Brørup 
---
 examples/l3fwd/l3fwd.h |  7 +++--
 examples/l3fwd/l3fwd_acl.c |  2 +-
 examples/l3fwd/l3fwd_em.c  |  2 +-
 examples/l3fwd/l3fwd_fib.c |  2 +-
 examples/l3fwd/l3fwd_lpm.c |  2 +-
 examples/l3fwd/main.c  | 60 --
 6 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 93ce652d02b7..618e0eaa3af1 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -23,10 +23,11 @@
 #define RX_DESC_DEFAULT 1024
 #define TX_DESC_DEFAULT 1024
 
-#define MAX_PKT_BURST 32
+#define DEFAULT_PKT_BURST 32
+#define MAX_PKT_BURST 512
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
-#define MEMPOOL_CACHE_SIZE 256
+#define MEMPOOL_CACHE_SIZE RTE_MEMPOOL_CACHE_MAX_SIZE
 #define MAX_RX_QUEUE_PER_LCORE 16
 
 #define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST
@@ -115,6 +116,8 @@ extern struct acl_algorithms acl_alg[];
 
 extern uint32_t max_pkt_len;
 
+extern uint32_t nb_pkt_per_burst;
+
 /* Send burst of packets on an output interface */
 static inline int
 send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c
index b635011ef708..ccb9946837ed 100644
--- a/examples/l3fwd/l3fwd_acl.c
+++ b/examples/l3fwd/l3fwd_acl.c
@@ -1119,7 +1119,7 @@ acl_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid,
-   pkts_burst, MAX_PKT_BURST);
+   pkts_burst, nb_pkt_per_burst);
 
if (nb_rx > 0) {
acl_process_pkts(pkts_burst, hops, nb_rx,
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 31a7e05e39d0..da9c45e3a482 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -644,7 +644,7 @@ em_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c
index 993e36cec235..5fa32685f419 100644
--- a/examples/l3fwd/l3fwd_fib.c
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -239,7 +239,7 @@ fib_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index e8fd95aae9ce..048c02491378 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -205,7 +205,7 @@ lpm_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
 
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 01b763e5ba11..8b7a07cc7d67 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -53,8 +54,10 @@
 
 #define MAX_LCORE_PARAMS 1024
 
+static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST, "MAX_PKT_BURST should be at most MEMPOOL_CACHE_SIZE");
 uint16_t nb_rxd = RX_DESC_DEFAULT;
 uint16_t nb_txd = TX_DESC_DEFAULT;
+uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
 
 /**< Ports set in promiscuous mode off by default. */
 static int promiscuous_on;
@@ -395,6 +398,7 @@ print_usage(const char *prgname)
" --config (port,queue,lcore)[,(port,queue,lcore)]"
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
+   " [--burst NPKTS]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"

[PATCH v3 2/2] examples/l3fwd: add option to set mbuf cache size

2024-10-17 Thread Jie Hai
The mempool cache size for mbufs is set to
RTE_MEMPOOL_CACHE_MAX_SIZE by default. This patch allows
users to configure the cache size with "--mbcache", and limits
the parameter to a maximum of RTE_MEMPOOL_CACHE_MAX_SIZE.

Signed-off-by: Jie Hai 
Acked-by: Huisong Li 
Acked-by: Morten Brørup 
---
 examples/l3fwd/l3fwd.h |  1 +
 examples/l3fwd/main.c  | 33 ++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 618e0eaa3af1..0cce3406ee7d 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -117,6 +117,7 @@ extern struct acl_algorithms acl_alg[];
 extern uint32_t max_pkt_len;
 
 extern uint32_t nb_pkt_per_burst;
+extern uint32_t mb_mempool_cache_size;
 
 /* Send burst of packets on an output interface */
 static inline int
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 8b7a07cc7d67..d08525faa1a6 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -58,6 +58,7 @@ static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST, "MAX_PKT_BURST should be at m
 uint16_t nb_rxd = RX_DESC_DEFAULT;
 uint16_t nb_txd = TX_DESC_DEFAULT;
 uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
+uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE;
 
 /**< Ports set in promiscuous mode off by default. */
 static int promiscuous_on;
@@ -399,6 +400,7 @@ print_usage(const char *prgname)
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
" [--burst NPKTS]"
+   " [--mbcache CACHESZ]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
" [--max-pkt-len PKTLEN]"
" [--no-numa]"
@@ -426,6 +428,8 @@ print_usage(const char *prgname)
"Default: %d\n"
"  --burst NPKTS: Burst size in decimal\n"
"Default: %d\n"
+   "  --mbcache CACHESZ: Cache size in decimal\n"
+   "Default: %d\n"
"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for 
port X\n"
"  --max-pkt-len PKTLEN: maximum packet length in decimal 
(64-9600)\n"
"  --no-numa: Disable numa awareness\n"
@@ -455,7 +459,7 @@ print_usage(const char *prgname)
"another is route entry at while line leads 
with character '%c'.\n"
"  --rule_ipv6=FILE: Specify the ipv6 rules entries file.\n"
"  --alg: ACL classify method to use, one of: %s.\n\n",
-   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST,
+   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST, MEMPOOL_CACHE_SIZE,
ACL_LEAD_CHAR, ROUTE_LEAD_CHAR, alg);
 }
 
@@ -673,6 +677,22 @@ parse_lookup(const char *optarg)
return 0;
 }
 
+static void
+parse_mbcache_size(const char *optarg)
+{
+   unsigned long mb_cache_size;
+   char *end = NULL;
+
+   mb_cache_size = strtoul(optarg, &end, 10);
+   if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return;
+   if (mb_cache_size <= RTE_MEMPOOL_CACHE_MAX_SIZE)
+   mb_mempool_cache_size = (uint32_t)mb_cache_size;
+   else
+   rte_exit(EXIT_FAILURE, "mbcache must be >= 0 and <= %d\n",
+RTE_MEMPOOL_CACHE_MAX_SIZE);
+}
+
 static void
 parse_pkt_burst(const char *optarg)
 {
@@ -748,6 +768,7 @@ static const char short_options[] =
 #define CMD_LINE_OPT_RULE_IPV6 "rule_ipv6"
 #define CMD_LINE_OPT_ALG "alg"
 #define CMD_LINE_OPT_PKT_BURST "burst"
+#define CMD_LINE_OPT_MB_CACHE_SIZE "mbcache"
 
 enum {
/* long options mapped to a short option */
@@ -777,7 +798,8 @@ enum {
CMD_LINE_OPT_ENABLE_VECTOR_NUM,
CMD_LINE_OPT_VECTOR_SIZE_NUM,
CMD_LINE_OPT_VECTOR_TMO_NS_NUM,
-   CMD_LINE_OPT_PKT_BURST_NUM
+   CMD_LINE_OPT_PKT_BURST_NUM,
+   CMD_LINE_OPT_MB_CACHE_SIZE_NUM
 };
 
 static const struct option lgopts[] = {
@@ -805,6 +827,7 @@ static const struct option lgopts[] = {
{CMD_LINE_OPT_RULE_IPV6,   1, 0, CMD_LINE_OPT_RULE_IPV6_NUM},
{CMD_LINE_OPT_ALG,   1, 0, CMD_LINE_OPT_ALG_NUM},
{CMD_LINE_OPT_PKT_BURST,   1, 0, CMD_LINE_OPT_PKT_BURST_NUM},
+   {CMD_LINE_OPT_MB_CACHE_SIZE,   1, 0, CMD_LINE_OPT_MB_CACHE_SIZE_NUM},
{NULL, 0, 0, 0}
 };
 
@@ -897,6 +920,10 @@ parse_args(int argc, char **argv)
parse_pkt_burst(optarg);
break;
 
+   case CMD_LINE_OPT_MB_CACHE_SIZE_NUM:
+   parse_mbcache_size(optarg);
+   break;
+
case CMD_LINE_OPT_ETH_DEST_NUM:
parse_eth_dest(optarg);
break;
@@ -1089,7 +1116,7 @@ init_mem(uint16_t portid, unsigned int nb_mbuf)
 portid, socketid);
pktmbuf_pool[portid][socketid] =
rte_

[PATCH v3 0/2] examples/l3fwd: add more options

2024-10-17 Thread Jie Hai
Add options to support configuring the RX burst size and the cache size
of the mbuf mempool.

--
v3:
1. add Acked-bys.
2. fix compile error.
--
Jie Hai (2):
  examples/l3fwd: add option to set RX burst size
  examples/l3fwd: add option to set mbuf cache size

 examples/l3fwd/l3fwd.h |  8 +++-
 examples/l3fwd/l3fwd_acl.c |  2 +-
 examples/l3fwd/l3fwd_em.c  |  2 +-
 examples/l3fwd/l3fwd_fib.c |  2 +-
 examples/l3fwd/l3fwd_lpm.c |  2 +-
 examples/l3fwd/main.c  | 89 --
 6 files changed, 96 insertions(+), 9 deletions(-)

-- 
2.22.0
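
Both options in this series take a decimal value with an upper bound. One way such parsing could be made strict is sketched below. This is an assumption-laden illustration and not the code from the patches: parse_bounded_u32() is a hypothetical helper that rejects empty strings, signs, trailing characters, overflow and out-of-range values instead of silently keeping the default.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of l3fwd.
 * Parse a decimal option value in [min, max].
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 */
static int parse_bounded_u32(const char *arg, uint32_t min, uint32_t max,
		uint32_t *out)
{
	char *end = NULL;
	unsigned long v;

	assert(out != NULL && min <= max);

	if (arg == NULL || arg[0] == '\0')
		return -1;
	/* strtoul() accepts leading whitespace and signs; reject them here */
	if (arg[0] < '0' || arg[0] > '9')
		return -1;

	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (v < min || v > max)
		return -1;

	*out = (uint32_t)v;
	return 0;
}
```

A caller could pass 512 (MAX_PKT_BURST) or RTE_MEMPOOL_CACHE_MAX_SIZE as the upper bound and report an error when -1 is returned; whether 0 is an acceptable lower bound for either option is left open here.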



RE: [PATCH v14 1/4] lib: add generic support for reading PMU events

2024-10-17 Thread Tomasz Duszynski



>-Original Message-
>From: Konstantin Ananyev 
>Sent: Wednesday, October 16, 2024 10:50 AM
>To: Tomasz Duszynski ; Thomas Monjalon 
>
>Cc: ruifeng.w...@arm.com; bruce.richard...@intel.com; 
>david.march...@redhat.com; dev@dpdk.org;
>Jerin Jacob ; konstantin.v.anan...@yandex.ru; 
>mattias.ronnb...@ericsson.com;
>m...@smartsharesystems.com; roret...@linux.microsoft.com; zhou...@loongson.cn;
>step...@networkplumber.org
>Subject: [EXTERNAL] RE: [PATCH v14 1/4] lib: add generic support for reading 
>PMU events
>
>
>> >> +int
>> >> +__rte_pmu_enable_group(void)
>> >> +{
>> >> + struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
>> >> + int ret;
>> >> +
>> >> + if (rte_pmu.num_group_events == 0)
>> >> + return -ENODEV;
>> >> +
>> >> + ret = open_events(group);
>> >> + if (ret)
>> >> + goto out;
>> >> +
>> >> + ret = mmap_events(group);
>> >> + if (ret)
>> >> + goto out;
>> >> +
>> >> + if (ioctl(group->fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) == -1) {
>> >> + ret = -errno;
>> >> + goto out;
>> >> + }
>> >> +
>> >> + if (ioctl(group->fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) == -1) {
>> >> + ret = -errno;
>> >> + goto out;
>> >> + }
>> >> +
>> >> + rte_spinlock_lock(&rte_pmu.lock);
>> >> + TAILQ_INSERT_TAIL(&rte_pmu.event_group_list, group, next);
>> >> + rte_spinlock_unlock(&rte_pmu.lock);
>> >
>> >I thought that after the previous round of reviews, we got a consensus
>> >that it is a bad idea to insert a pointer to a TLS variable into the global list:
>> >https://patchwork.dpdk.org/project/dpdk/patch/20230216175502.3164820-2-tduszynski@marvell.com/
>>
>> I don't think there was any consensus. It was rather solely your point of view.
>
>Here is a mail where I highlighted the problem:
>https://inbox.dpdk.org/dev/6bf789b7ba4e4a8e847431a130372a4b@huawei.com/
>
>Here is a mail where Morten agreed that it needs to be addressed:
>https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87792@smartserver.smartshare.dk/
>
>Here is a mail from David, where he summarizes the remaining work required for
>this series:
>https://inbox.dpdk.org/dev/DM4PR18MB43684B889C50F20DDD1B0A29D20CA@DM4PR18MB4368.namprd18.prod.outlook.com/
>"...
>- Konstantin asked for better explanations in the implementation.
>- He also pointed out at using this feature with non EAL lcores.
>..."
>
>Here is your reply:
>https://inbox.dpdk.org/dev/DM4PR18MB43684B889C50F20DDD1B0A29D20CA@DM4PR18MB4368.namprd18.prod.outlook.com/
>
>You didn't object, so I interpreted that (probably wrongly) as meaning we had a
>consensus here.
>
>> So I still believe that, since this currently runs only on lcores, and
>> lcores do not terminate before the main one, it's safe to access TLS
>> from the main lcore.
>
>Could you clarify what you mean by 'lcores' here?
>Threads with a valid 'lcore_id'?
>There is an API within DPDK that allows the user to assign a vacant lcore_id
>(rte_thread_register) to some new thread, run it for some time, then release
>the lcore_id (rte_thread_unregister) and then terminate the thread.
>
>Another thing: there are no checks within the PMU API on whether the calling
>thread satisfies the expected criteria or not.
>So if the user calls the PMU API from a 'wrong' thread, it will succeed and in
>many cases will keep working as expected. But in other cases (thread lifetime
>shorter than program lifetime) it might cause a crash
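
The lifetime concern raised above is that a global list may outlive the thread-local object it points to. One possible way to bound that lifetime, shown purely as a sketch and not as the approach of the series, is to heap-allocate each list entry and require an explicit unregister before the owning thread exits. All names below are hypothetical, and the locking the real code performs around the list is omitted, so this version is single-threaded only.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical per-thread entry; heap-allocated so it is not tied to TLS. */
struct group {
	int owner_id;
	TAILQ_ENTRY(group) next;
};

TAILQ_HEAD(group_list, group);
static struct group_list groups = TAILQ_HEAD_INITIALIZER(groups);

/* Allocate an entry and link it into the global list. */
static struct group *group_register(int owner_id)
{
	struct group *g = calloc(1, sizeof(*g));

	if (g == NULL)
		return NULL;
	g->owner_id = owner_id;
	TAILQ_INSERT_TAIL(&groups, g, next);
	return g;
}

/* Unlink and free an entry; must run before the owning thread goes away. */
static void group_unregister(struct group *g)
{
	assert(g != NULL);
	TAILQ_REMOVE(&groups, g, next);
	free(g);
}

/* Count entries currently reachable from the global list. */
static unsigned int group_count(void)
{
	unsigned int n = 0;
	struct group *g;

	TAILQ_FOREACH(g, &groups, next)
		n++;
	return n;
}
```

In this arrangement the global list never holds a pointer into another thread's TLS, so a thread that registers, runs for a while and terminates leaves nothing dangling, provided it unregisters first.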

Re: [EXTERNAL] Re: [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Nitin Saxena
Hi Robin/David and all,

We realized the feature arc patch series is difficult to understand as
a new concept. Our objectives with the feature arc changes are the following:

1. Allow reusability of standard DPDK nodes (defined in lib/nodes/*)
with out-of-tree applications (like grout). Currently out-of-tree
graph applications are duplicating standard nodes instead of reusing the
standard ones which are available. In the long term, we would like to mature
standard DPDK nodes with the flexibility of hooking them to out-of-tree
application nodes.

2. Flexibility to enable/disable sub-graphs per interface based on
runtime configuration updates. Protocol sub-graphs can be selectively
enabled for a few (or all) interfaces at runtime.

3. More than one sub-graph/feature can be enabled on an interface,
so a packet has to follow a sequentially ordered node path on worker
cores.
Packets may need to move from one sub-graph to another sub-graph per interface.

4. Last but not least, an optimized implementation which does not (or
only minimally) stop worker cores for any control plane runtime updates.
Any performance regression should also be avoided.

I am planning to create a draft presentation on feature arc which I
can share, when ready, for discussion. If needed, I can also plan to
present it in one of the DPDK community meetings.
There we can also discuss whether there are any alternatives for achieving
the above objectives.

Thanks,
Nitin

On Wed, Oct 16, 2024 at 7:20 PM Nitin Saxena  wrote:
>
> Hi Robin,
>
> Thanks for the review
> Please see my replies inline
>
> Thanks,
> Nitin
>
> On Wed, Oct 16, 2024 at 3:08 PM Robin Jarry  wrote:
> >
> > Hi folks,
> >
> > David Marchand, Oct 16, 2024 at 11:24:
> > > On Mon, Oct 14, 2024 at 1:12 PM Nitin Saxena  wrote:
> > >> I had pushed non RFC patch series before -rc1 date (11th oct).
> > >> We have an ABI change in this patch series 
> > >> https://patches.dpdk.org/project/dpdk/patch/20241010133111.2764712-3-nsax...@marvell.com/
> > >> Could you help merge this patch series in rc2 otherwise it has to wait 
> > >> for next LTS
> > >
> > > Just read through the series, I am not confident with this addition.
> > > It requires a lot of changes in the node code for supporting it, where
> > > it should be something handled in/facilitated by the graph library
> > > itself.
> >
> > As far as I can tell, it will be very complicated (if not impossible) to
> > determine in a generic manner whether a packet must be steered towards
> > a sub tree or not. The decision *must* come from the originating node in
> > some way or another.
>
> Nitin> I am not sure if it *must* always be from the originating node?
> What about a control plane which wants to enable "IP4 feature" on
> interface  'X'  by assigning IP address?
> A originating node (say: ip4-input) *must not* activate IP4 lookup
> sub-graph for interface "X " until control plane assigns any IP
> address to it.
>
> Regarding the complexity of adopting feature arc changes in fast path,
> - a sub-optimal change for feature-arc would be simple and trivial but
> at the cost of performance.
> - Complexity increases when feature arc changes are optimally
> integrated (like "ip4_rewrite" changes in the patch) with no
> performance regression
>
> >
> > > I did not read much from Robin or Christophe who have been writing
> > > more node code than me.
> > > I would prefer their opinion before going forward.
> >
> > This series is indeed very dense. I like the concept of having
> > extensible sub trees in the graph but it feels like the implementation
> > is more complex than it should be.
> >
> > Lacking of another solution, we went for a naive approach in grout.
> > Basically, some nodes have undefined next nodes which are extended using
> > a dedicated API.
>
> Nitin> With an initial glance, it looks like "grout" is trying to
> solve a use-case where a child is being added to the parent's
> undefined next node. This is trying to create a runtime  parent-child
> relationship
>
> On the other hand, feature arc not just create parent-child
> relationships but also sibling-sibling relationships as well. Also
> enabling sub-graph per interface is critical functionality in feature
> arc that adds complexity
>
> Let's assume a use-case in ingress direction, at the IPv4 layer,
> where IPv4-input is the *originating node* and
>
> - On interface X, IPsec-policy, IP4-classify() and IPv4-lookup
> sub-graphs are enabled in a sequential order
> - On interface Y, IP4-classify() and IPv4-lookup sub-graphs are
> enabled. in a sequential order. i.e. IPsec-policy is *disabled* on
> interface Y
>
> In fast path, following processing should happen for "mbuf0" which is
> received on interface "X"
> - "ipv4-input" sends mbuf0 to the first enabled sub-graph node for
> interface X, "IPsec-policy"
> - In "IPsec-policy" node processing, if policy action results in
> "bypass" action for mbuf0, it must then be sent to next enabled
> sub-graph  i.e. "IPv4-classify" (from "IPsec-policy" node)
> - In 

Re: [EXTERNAL] Re: [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Nitin Saxena
Hi Robin,

Thanks for the review
Please see my replies inline

Thanks,
Nitin

On Wed, Oct 16, 2024 at 3:08 PM Robin Jarry  wrote:
>
> Hi folks,
>
> David Marchand, Oct 16, 2024 at 11:24:
> > On Mon, Oct 14, 2024 at 1:12 PM Nitin Saxena  wrote:
> >> I had pushed non RFC patch series before -rc1 date (11th oct).
> >> We have an ABI change in this patch series 
> >> https://patches.dpdk.org/project/dpdk/patch/20241010133111.2764712-3-nsax...@marvell.com/
> >> Could you help merge this patch series in rc2 otherwise it has to wait for 
> >> next LTS
> >
> > Just read through the series, I am not confident with this addition.
> > It requires a lot of changes in the node code for supporting it, where
> > it should be something handled in/facilitated by the graph library
> > itself.
>
> As far as I can tell, it will be very complicated (if not impossible) to
> determine in a generic manner whether a packet must be steered towards
> a sub tree or not. The decision *must* come from the originating node in
> some way or another.

Nitin> I am not sure it *must* always come from the originating node.
What about a control plane which wants to enable the "IP4 feature" on
interface 'X' by assigning an IP address?
An originating node (say: ip4-input) *must not* activate the IP4 lookup
sub-graph for interface "X" until the control plane assigns an IP
address to it.

Regarding the complexity of adopting feature arc changes in the fast path:
- A sub-optimal change for feature arc would be simple and trivial, but
at the cost of performance.
- Complexity increases when feature arc changes are optimally
integrated (like the "ip4_rewrite" changes in the patch) with no
performance regression.

>
> > I did not read much from Robin or Christophe who have been writing
> > more node code than me.
> > I would prefer their opinion before going forward.
>
> This series is indeed very dense. I like the concept of having
> extensible sub trees in the graph but it feels like the implementation
> is more complex than it should be.
>
> Lacking of another solution, we went for a naive approach in grout.
> Basically, some nodes have undefined next nodes which are extended using
> a dedicated API.

Nitin> At first glance, it looks like "grout" is trying to
solve a use-case where a child is being added to the parent's
undefined next node. This creates a runtime parent-child
relationship.

On the other hand, feature arc creates not just parent-child
relationships but also sibling-sibling relationships. Also,
enabling sub-graphs per interface is a critical functionality in feature
arc that adds complexity.

Let's assume a use-case in the ingress direction, at the IPv4 layer,
where IPv4-input is the *originating node* and

- On interface X, the IPsec-policy, IP4-classify() and IPv4-lookup
sub-graphs are enabled in sequential order.
- On interface Y, the IP4-classify() and IPv4-lookup sub-graphs are
enabled in sequential order, i.e. IPsec-policy is *disabled* on
interface Y.

In the fast path, the following processing should happen for "mbuf0", which is
received on interface "X":
- "ipv4-input" sends mbuf0 to the first enabled sub-graph node for
interface X, "IPsec-policy".
- In "IPsec-policy" node processing, if the policy action results in a
"bypass" action for mbuf0, it must then be sent to the next enabled
sub-graph, i.e. "IPv4-classify" (from the "IPsec-policy" node).
- In "IPv4-classify" node processing, if classify fails for mbuf0, then
it should finally be sent to the "IPv4-lookup" node (from the "IPv4-classify"
node).

Whereas for "mbuf1", received on interface Y, the following fast path
processing must happen:
- "Ipv4-input" sends mbuf1 to the first enabled sub-graph node for
interface Y, "IPv4-classify".
- If "IPv4-classify" fails for mbuf1, then it should finally be sent
to the IPv4-lookup node.

To behave differently for interface X and interface Y as above:
- First of all, IPsec-policy/IPv4-classify/IPv4-lookup must be
connected to the "ipv4-input" node (parent-child relationship).
- Also, IPsec-policy/IPv4-classify/IPv4-lookup must be connected
with each other (sibling-sibling relationship).
- Fast path APIs provide *rte_edges_t* to send an mbuf from one node to
another node:
   1. Based on the interface (either interface X or interface Y)
   2. Based on the node from which the fast path APIs are called. The next
enabled feature/sub-graph can only be determined from the previous enabled
feature/sub-graph in the fast path.

I am not sure if grout handles the above use-cases in the same manner. AFAIR,
for any control plane change grout re-creates "graph" objects, which
may not be required with feature arc.
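
The per-interface ordering described above (interface X: IPsec-policy, then IPv4-classify, then IPv4-lookup; interface Y: IPv4-classify, then IPv4-lookup) can be modelled with a small sketch. This is not the feature arc API from the series; every name below is hypothetical, and it assumes the features have a fixed global order and that each interface carries a bitmask of the enabled ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature indices, listed in their processing order. */
enum {
	FEAT_IPSEC_POLICY = 0,
	FEAT_IP4_CLASSIFY = 1,
	FEAT_IP4_LOOKUP = 2,
	FEAT_MAX = 3
};

#define FEAT_NONE  (-1) /* start of the arc, or no further feature */
#define MAX_IFACES 4

/* One bit per enabled feature, per interface (control plane state). */
static uint32_t enabled[MAX_IFACES];

/* Control plane: enable a feature on an interface. */
static void feature_enable(unsigned int iface, int feat)
{
	assert(iface < MAX_IFACES && feat >= 0 && feat < FEAT_MAX);
	enabled[iface] |= UINT32_C(1) << feat;
}

/*
 * Fast path: return the next enabled feature on 'iface' after 'prev',
 * or FEAT_NONE. Pass FEAT_NONE as 'prev' from the originating node.
 */
static int feature_next(unsigned int iface, int prev)
{
	int f;

	assert(iface < MAX_IFACES && prev >= FEAT_NONE && prev < FEAT_MAX);
	for (f = prev + 1; f < FEAT_MAX; f++)
		if (enabled[iface] & (UINT32_C(1) << f))
			return f;
	return FEAT_NONE;
}
```

The lookup depends on both inputs named in the text: the interface, and the feature node the packet is currently leaving. A real implementation would translate the returned feature into an edge and would need to update the per-interface state without stalling worker cores, which this sketch does not attempt.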

>
> https://github.com/DPDK/grout/blob/v0.2/modules/infra/datapath/eth_input.c#L23-L31
>
> This API can be used by other nodes to attach themselves to these
> extensible nodes:
>
> https://github.com/DPDK/grout/blob/v0.2/modules/ip/datapath/arp_input.c#L143
> https://github.com/DPDK/grout/blob/v0.2/modules/ip/datapath/ip_input.c#L124
> https://github.com/DPDK/grout/blob/v0.2/modules/ip6/datapath/ip6_input.c#L122
>
> After which, the e

RE: [PATCH 00/10] net/mlx5: improve MAC address and VLAN add latency

2024-10-17 Thread Slava Ovsiienko
For the entire series:

Acked-by: Viacheslav Ovsiienko 

> -Original Message-
> From: Dariusz Sosnowski 
> Sent: Thursday, October 17, 2024 10:57 AM
> To: Slava Ovsiienko ; Bing Zhao
> ; Ori Kam ; Suanming Mou
> ; Matan Azrad 
> Cc: dev@dpdk.org
> Subject: [PATCH 00/10] net/mlx5: improve MAC address and VLAN add
> latency
> 
> Whenever a new MAC address is added to the port, mlx5 PMD will:
> 
> - Add this address to `dev->data->mac_addrs[]`.
> - Destroy all control flow rules.
> - Recreate all control flow rules.
> 
> Similar logic is also implemented for VLAN filters.
> 
> Because of this logic, the latency of adding a new MAC address (i.e., the
> latency of a `rte_eth_dev_mac_addr_add()` function call) is linear in the
> number of MAC addresses already configured.
> Since each operation of creating/destroying a control flow rule involves an
> `ioctl()` syscall, on some setups the latency of adding a single MAC address
> can reach ~100ms when the port is operating with >= 100 MAC addresses.
> The same problem exists for VLAN filters (and is even compounded by them).
> 
> This patchset aims to resolve these issues, by reworking how mlx5 PMD
> handles adding/removing MAC addresses and VLAN filters.
> Instead of recreating all control flow rules, only necessary flow rules will
> be created/removed on each operation, thus minimizing number of syscalls
> triggered.
> 
> Summary of patches:
> 
> - Patch 1-2 - Extends existing `mlx5_hw_ctrl_flow_type` enum with special
>   variants, which will be used for tracking MAC and VLAN control flow rules.
> - Patch 3-4 - Refactors HWS code for control flow rule creation to allow
>   creation of specific control flow rules with unicast MAC/VLAN match.
>   Also functions are added for deletion of specific rules.
> - Patch 5-6 - Prepares the control flow rules list, used by HWS flow engine,
>   to be used by other flow engine.
>   Goal is to reuse the similar logic in Verbs and DV flow engines.
> - Patch 7-8 - Adjusts legacy flow engines, so that unicast DMAC/VLAN control
>   flow rules are added to the control flow rules list.
>   Also exposes functions for creating/destroying specific ones.
> - Patch 9-10 - Extends `mlx5_traffic_*` interface with
>   `mlx5_traffic_mac_add/remove` and `mlx5_traffic_vlan_add/remove` functions.
>   They are used in implementations of DPDK APIs for adding/removing MAC
>   addresses/VLAN filters and their goal is to update the set of control flow
>   rules in a minimal number of steps possible, without recreating the rules.
> 
> As a result of these patches, the time to add the 128th MAC address, after
> the 127th was added, drops **from ~72 ms to ~197 us** (at least on my setup).
> 
> Dariusz Sosnowski (10):
>   net/mlx5: track unicast DMAC control flow rules
>   net/mlx5: add checking if unicast flow rule exists
>   net/mlx5: rework creation of unicast flow rules
>   net/mlx5: support destroying unicast flow rules
>   net/mlx5: rename control flow rules types
>   net/mlx5: shared init of control flow rules
>   net/mlx5: add legacy unicast flow rules management
>   net/mlx5: add legacy unicast flow rule registration
>   net/mlx5: add dynamic unicast flow rule management
>   net/mlx5: optimize MAC address and VLAN filter handling
> 
>  drivers/net/mlx5/linux/mlx5_os.c  |   3 +
>  drivers/net/mlx5/meson.build  |   1 +
>  drivers/net/mlx5/mlx5.h   |  62 +++--
>  drivers/net/mlx5/mlx5_flow.c  | 149 ++-
>  drivers/net/mlx5/mlx5_flow.h  |  36 +++
>  drivers/net/mlx5/mlx5_flow_hw.c   | 349 --
>  drivers/net/mlx5/mlx5_flow_hw_stubs.c |  68 +
>  drivers/net/mlx5/mlx5_mac.c   |  41 ++-
>  drivers/net/mlx5/mlx5_trigger.c   | 262 ++-
>  drivers/net/mlx5/mlx5_vlan.c  |   9 +-
>  drivers/net/mlx5/windows/mlx5_os.c|   3 +
>  11 files changed, 867 insertions(+), 116 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_flow_hw_stubs.c
> 
> --
> 2.39.5
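
The scaling argument in the quoted cover letter can be put into numbers with a
small sketch. It assumes, purely for illustration, exactly one control flow
rule per configured MAC address and one syscall per rule create/destroy; the
function names are hypothetical and do not exist in the mlx5 PMD.

```c
#include <assert.h>
#include <stdint.h>

/* Rule operations needed to add one MAC address when n_existing addresses
 * are already configured, if all rules are destroyed and then recreated.
 * Assumption: one control flow rule per address. */
static uint32_t
sketch_ops_full_rebuild(uint32_t n_existing)
{
	assert(n_existing < UINT32_MAX / 2);
	/* destroy n_existing rules, then create n_existing + 1 rules */
	return n_existing + (n_existing + 1);
}

/* Rule operations needed when only the new rule is created. */
static uint32_t
sketch_ops_incremental(uint32_t n_existing)
{
	(void)n_existing; /* cost does not depend on the existing rules */
	return 1;
}
```

Under these assumptions, adding the 128th address costs 255 rule operations
with a full rebuild and 1 with the incremental approach. The real rule count
per address may be higher (for example when VLAN filters are also configured),
so this is only a lower bound on the difference.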



Re: [PATCH v2 1/2] examples/l3fwd: add option to set RX burst size

2024-10-17 Thread lihuisong (C)

Acked-by: Huisong Li 

On 2024/10/17 16:58, Jie Hai wrote:

Currently the Rx burst size is fixed at MAX_PKT_BURST (32). This
parameter needs to be tuned in some performance optimization
scenarios, so an option '--burst' is added to set the burst size
explicitly. The default value is DEFAULT_PKT_BURST (32) and the maximum
value is MAX_PKT_BURST (512).
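
A minimal sketch of how such an option value could be validated is shown
below. The helper sketch_parse_burst() is hypothetical and is not the
parse_pkt_burst() from this patch, whose body is not shown here; the two
macros mirror the values stated above. This sketch rejects bad input, whereas
the real code may choose to clamp or fall back to the default instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_PKT_BURST 32
#define MAX_PKT_BURST 512

/* Parse a decimal burst size in the range 1..MAX_PKT_BURST.
 * Returns 0 and stores the value on success; returns -1 and leaves
 * *burst untouched on malformed or out-of-range input. */
static int
sketch_parse_burst(const char *arg, uint32_t *burst)
{
	char *end = NULL;
	unsigned long v;

	assert(arg != NULL && burst != NULL);

	/* Reject empty strings, signs and leading whitespace up front. */
	if (arg[0] < '0' || arg[0] > '9')
		return -1;

	errno = 0;
	v = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0')
		return -1;
	if (v == 0 || v > MAX_PKT_BURST)
		return -1;

	*burst = (uint32_t)v;
	return 0;
}
```

A caller would start from DEFAULT_PKT_BURST and only overwrite it when the
helper returns 0.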

Signed-off-by: Jie Hai 
Acked-by: Chengwen Feng 
---
  examples/l3fwd/l3fwd.h |  7 +++--
  examples/l3fwd/l3fwd_acl.c |  2 +-
  examples/l3fwd/l3fwd_em.c  |  2 +-
  examples/l3fwd/l3fwd_fib.c |  2 +-
  examples/l3fwd/l3fwd_lpm.c |  2 +-
  examples/l3fwd/main.c  | 60 --
  6 files changed, 67 insertions(+), 8 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 93ce652d02b7..618e0eaa3af1 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -23,10 +23,11 @@
  #define RX_DESC_DEFAULT 1024
  #define TX_DESC_DEFAULT 1024
  
-#define MAX_PKT_BURST 32

+#define DEFAULT_PKT_BURST 32
+#define MAX_PKT_BURST 512
  #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
  
-#define MEMPOOL_CACHE_SIZE 256

+#define MEMPOOL_CACHE_SIZE RTE_MEMPOOL_CACHE_MAX_SIZE
  #define MAX_RX_QUEUE_PER_LCORE 16
  
  #define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST

@@ -115,6 +116,8 @@ extern struct acl_algorithms acl_alg[];
  
  extern uint32_t max_pkt_len;
  
+extern uint32_t nb_pkt_per_burst;

+
  /* Send burst of packets on an output interface */
  static inline int
  send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c
index b635011ef708..ccb9946837ed 100644
--- a/examples/l3fwd/l3fwd_acl.c
+++ b/examples/l3fwd/l3fwd_acl.c
@@ -1119,7 +1119,7 @@ acl_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid,
-   pkts_burst, MAX_PKT_BURST);
+   pkts_burst, nb_pkt_per_burst);
  
  			if (nb_rx > 0) {

acl_process_pkts(pkts_burst, hops, nb_rx,
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 31a7e05e39d0..da9c45e3a482 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -644,7 +644,7 @@ em_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
  
diff --git a/examples/l3fwd/l3fwd_fib.c b/examples/l3fwd/l3fwd_fib.c

index f38b19af3f57..aa81b12fe7dc 100644
--- a/examples/l3fwd/l3fwd_fib.c
+++ b/examples/l3fwd/l3fwd_fib.c
@@ -239,7 +239,7 @@ fib_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
  
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c

index e8fd95aae9ce..048c02491378 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -205,7 +205,7 @@ lpm_main_loop(__rte_unused void *dummy)
portid = qconf->rx_queue_list[i].port_id;
queueid = qconf->rx_queue_list[i].queue_id;
nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
-   MAX_PKT_BURST);
+   nb_pkt_per_burst);
if (nb_rx == 0)
continue;
  
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c

index 01b763e5ba11..2feae5b311a2 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -53,8 +54,10 @@
  
  #define MAX_LCORE_PARAMS 1024
  
+static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST);

  uint16_t nb_rxd = RX_DESC_DEFAULT;
  uint16_t nb_txd = TX_DESC_DEFAULT;
+uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
  
  /**< Ports set in promiscuous mode off by default. */

  static int promiscuous_on;
@@ -395,6 +398,7 @@ print_usage(const char *prgname)
" --config (port,queue,lcore)[,(port,queue,lcore)]"
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
+   " [--burst NPKTS]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
  

Re: [PATCH v2 2/2] examples/l3fwd: add option to set mbuf cache size

2024-10-17 Thread lihuisong (C)

lgtm, Acked-by: Huisong Li 

On 2024/10/17 16:58, Jie Hai wrote:

The mempool cache size of mbuf is set to
RTE_MEMPOOL_CACHE_MAX_SIZE by default. This patch allows
users to configure the cache size with "--mbcache", and limits
the parameter to a maximum of RTE_MEMPOOL_CACHE_MAX_SIZE.

Signed-off-by: Jie Hai 
---
  examples/l3fwd/l3fwd.h |  1 +
  examples/l3fwd/main.c  | 33 ++---
  2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 618e0eaa3af1..0cce3406ee7d 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -117,6 +117,7 @@ extern struct acl_algorithms acl_alg[];
  extern uint32_t max_pkt_len;
  
  extern uint32_t nb_pkt_per_burst;

+extern uint32_t mb_mempool_cache_size;
  
  /* Send burst of packets on an output interface */

  static inline int
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 2feae5b311a2..1e2da739dded 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -58,6 +58,7 @@ static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST);
  uint16_t nb_rxd = RX_DESC_DEFAULT;
  uint16_t nb_txd = TX_DESC_DEFAULT;
  uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
+uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE;
  
  /**< Ports set in promiscuous mode off by default. */

  static int promiscuous_on;
@@ -399,6 +400,7 @@ print_usage(const char *prgname)
" [--rx-queue-size NPKTS]"
" [--tx-queue-size NPKTS]"
" [--burst NPKTS]"
+   " [--mbcache CACHESZ]"
" [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
" [--max-pkt-len PKTLEN]"
" [--no-numa]"
@@ -426,6 +428,8 @@ print_usage(const char *prgname)
"Default: %d\n"
"  --burst NPKTS: Burst size in decimal\n"
"Default: %d\n"
+   "  --mbcache CACHESZ: Cache size in decimal\n"
+   "Default: %d\n"
"  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for port 
X\n"
"  --max-pkt-len PKTLEN: maximum packet length in decimal 
(64-9600)\n"
"  --no-numa: Disable numa awareness\n"
@@ -455,7 +459,7 @@ print_usage(const char *prgname)
"another is route entry at while line leads with 
character '%c'.\n"
"  --rule_ipv6=FILE: Specify the ipv6 rules entries file.\n"
"  --alg: ACL classify method to use, one of: %s.\n\n",
-   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST,
+   prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST, MEMPOOL_CACHE_SIZE,
ACL_LEAD_CHAR, ROUTE_LEAD_CHAR, alg);
  }
  
@@ -673,6 +677,22 @@ parse_lookup(const char *optarg)

return 0;
  }
  
+static void

+parse_mbcache_size(const char *optarg)
+{
+   unsigned long mb_cache_size;
+   char *end = NULL;
+
+   mb_cache_size = strtoul(optarg, &end, 10);
+   if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
+   return;
+   if (mb_cache_size <= RTE_MEMPOOL_CACHE_MAX_SIZE)
+   mb_mempool_cache_size = (uint32_t)mb_cache_size;
+   else
+   rte_exit(EXIT_FAILURE, "mbcache must be >= 0 and <= %d\n",
+RTE_MEMPOOL_CACHE_MAX_SIZE);
+}
+
  static void
  parse_pkt_burst(const char *optarg)
  {
@@ -748,6 +768,7 @@ static const char short_options[] =
  #define CMD_LINE_OPT_RULE_IPV6 "rule_ipv6"
  #define CMD_LINE_OPT_ALG "alg"
  #define CMD_LINE_OPT_PKT_BURST "burst"
+#define CMD_LINE_OPT_MB_CACHE_SIZE "mbcache"
  
  enum {

/* long options mapped to a short option */
@@ -777,7 +798,8 @@ enum {
CMD_LINE_OPT_ENABLE_VECTOR_NUM,
CMD_LINE_OPT_VECTOR_SIZE_NUM,
CMD_LINE_OPT_VECTOR_TMO_NS_NUM,
-   CMD_LINE_OPT_PKT_BURST_NUM
+   CMD_LINE_OPT_PKT_BURST_NUM,
+   CMD_LINE_OPT_MB_CACHE_SIZE_NUM
  };
  
  static const struct option lgopts[] = {

@@ -805,6 +827,7 @@ static const struct option lgopts[] = {
{CMD_LINE_OPT_RULE_IPV6,   1, 0, CMD_LINE_OPT_RULE_IPV6_NUM},
{CMD_LINE_OPT_ALG,   1, 0, CMD_LINE_OPT_ALG_NUM},
{CMD_LINE_OPT_PKT_BURST,   1, 0, CMD_LINE_OPT_PKT_BURST_NUM},
+   {CMD_LINE_OPT_MB_CACHE_SIZE,   1, 0, CMD_LINE_OPT_MB_CACHE_SIZE_NUM},
{NULL, 0, 0, 0}
  };
  
@@ -897,6 +920,10 @@ parse_args(int argc, char **argv)

parse_pkt_burst(optarg);
break;
  
+		case CMD_LINE_OPT_MB_CACHE_SIZE_NUM:

+   parse_mbcache_size(optarg);
+   break;
+
case CMD_LINE_OPT_ETH_DEST_NUM:
parse_eth_dest(optarg);
break;
@@ -1089,7 +1116,7 @@ init_mem(uint16_t portid, unsigned int nb_mbuf)
 portid, socketid);
pktmbuf_pool[portid][socketid] =
 

Re: [PATCH v1 1/2] power: fix power library with --lcores

2024-10-17 Thread lihuisong (C)

Good job.
With the changes below,
Acked-by: Huisong Li 


On 2024/10/17 19:02, Sivaprasad Tummala wrote:

This commit fixes an issue in the power library
related to using lcores mapped to different
physical cores (--lcores option in EAL).

Previously, the power library incorrectly accessed
CPU sysfs attributes for power management, treating
lcore IDs as CPU IDs.
e.g. with --lcores '1@128', lcore_id '1' was interpreted
as CPU_id instead of '128'.

This patch corrects the cpu_id based on lcore and CPU
mappings. It also constrains power management support
for lcores mapped to multiple physical cores/threads.

When multiple lcores are mapped to the same physical core,
invoking frequency scaling APIs on any lcore will apply the
changes effectively.

Signed-off-by: Sivaprasad Tummala 
---
  lib/power/power_acpi_cpufreq.c   |  7 ++-
  lib/power/power_amd_pstate_cpufreq.c |  7 ++-
  lib/power/power_common.c | 23 +++
  lib/power/power_common.h |  2 +-
  lib/power/power_cppc_cpufreq.c   |  7 ++-
  lib/power/power_pstate_cpufreq.c |  7 ++-
  6 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 81996e1c13..259bc8a263 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -258,7 +258,12 @@ power_acpi_cpufreq_init(unsigned int lcore_id)
return -1;
}
  
-	pi->lcore_id = lcore_id;

+   if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
+   POWER_LOG(ERR,
+   "Cannot get cpu id mapped for lcore %u", lcore_id);
+   return -1;
+   }
+
/* Check and set the governor */
if (power_set_governor_userspace(pi) < 0) {
POWER_LOG(ERR, "Cannot set governor of lcore %u to "
diff --git a/lib/power/power_amd_pstate_cpufreq.c b/lib/power/power_amd_pstate_cpufreq.c
index 090a0d96cb..9bfb271bb7 100644
--- a/lib/power/power_amd_pstate_cpufreq.c
+++ b/lib/power/power_amd_pstate_cpufreq.c
@@ -376,7 +376,12 @@ power_amd_pstate_cpufreq_init(unsigned int lcore_id)
return -1;
}
  
-	pi->lcore_id = lcore_id;

+   if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
+   POWER_LOG(ERR,
+   "Cannot get cpu id mapped for lcore %u", lcore_id);
+   return -1;
+   }
+
/* Check and set the governor */
if (power_set_governor_userspace(pi) < 0) {
POWER_LOG(ERR, "Cannot set governor of lcore %u to "
diff --git a/lib/power/power_common.c b/lib/power/power_common.c
index 590986d5ef..a8d5cd1c50 100644
--- a/lib/power/power_common.c
+++ b/lib/power/power_common.c
@@ -9,6 +9,7 @@
  
  #include 

  #include 
+#include 
  
  #include "power_common.h"
  
@@ -204,3 +205,25 @@ power_set_governor(unsigned int lcore_id, const char *new_governor,
  
  	return ret;

  }
+
+int check_lcore_and_set_cpu(uint32_t lcore_id, uint32_t *cpu_id)
How about the following function name? It would probably be more broadly
usable by other parts of the power library, such as PM QoS, which is being
upstreamed.

-->
int power_get_lcore_mapped_cpu_id(lcore_id)
negative on failure
>=0 on success

+{
+   rte_cpuset_t lcore_cpus;
+   uint32_t cpu;
+
+   lcore_cpus = rte_lcore_cpuset(lcore_id);
+   if (CPU_COUNT(&lcore_cpus) != 1) {
+   POWER_LOG(ERR,
+   "Power library does not support lcore %u mapping to %u 
cpus",
+   lcore_id, CPU_COUNT(&lcore_cpus));
+   return -1;
+   }
+
+   for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
+   if (CPU_ISSET(cpu, &lcore_cpus))
+   break;
+   }
+   *cpu_id = cpu;
+
+   return 0;
+}
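
The return convention suggested in the review comment above (negative on
failure, CPU id >= 0 on success) could look roughly like the sketch below. It
is self-contained and uses a plain bitmap as a stand-in for rte_cpuset_t, so
struct sketch_cpuset and both functions are hypothetical names, not DPDK or
glibc APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 256

/* Stand-in for rte_cpuset_t: one bit per CPU. */
struct sketch_cpuset {
	uint64_t bits[SKETCH_MAX_CPUS / 64];
};

static void
sketch_cpuset_add(struct sketch_cpuset *s, unsigned int cpu)
{
	assert(cpu < SKETCH_MAX_CPUS);
	s->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* Return the id of the single CPU in the set.
 * Return -1 if the set is empty or holds more than one CPU. */
static int
sketch_mapped_cpu_id(const struct sketch_cpuset *s)
{
	unsigned int cpu;
	int found = -1;

	for (cpu = 0; cpu < SKETCH_MAX_CPUS; cpu++) {
		if ((s->bits[cpu / 64] >> (cpu % 64)) & 1) {
			if (found >= 0)
				return -1; /* mapped to several CPUs */
			found = (int)cpu;
		}
	}
	return found;
}
```

For the '1@128' example from the commit message, a set containing only CPU
128 yields 128, while a set containing both 1 and 128 is rejected.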

<...>


Re: [EXTERNAL] [RFC PATCH 0/3] add feature arc in rte_graph

2024-10-17 Thread Nitin Saxena
Hi Christophe,

Please see inline comments

Thanks,
Nitin

On Thu, Oct 17, 2024 at 2:02 PM Christophe Fontaine  wrote:
>
> Hi all,
>
> What about the following steps:
> - update the nodes so they work on the current layer (example: for all L3 
> nodes, the current mbuf data offset *must* be pointing to the IP header)

Agreed. It would be better if nodes used the
rte_pktmbuf_[append()/shrink() etc..] APIs to manipulate the layer data
offset

> - define a public data structure that would be shared across nodes through 
> priv data, and not dynfields ?

Eventually public data structures should be defined to serve *a
purpose*. Do you refer to creating a generic public structure? If yes,
IMO, it may not be tuned for performance.
IMO, we need to create public structures for each specific purpose.
Feature arc is also a public data structure which optimally saves
the following variables in an 8-byte compact structure
(rte_graph_feature_daa_t) for every interface:
- rte_edge_t (uint16_t)
- next enabled feature (uint8_t) per index (from current node)
- per-interface, feature-specific user_data (uint32_t)

Due to its compact nature, 8 such objects per interface can be saved
in one 64B cache line. So IMO, it is better to create public
structures for a given purpose and optimally define their fields and
APIs.
Also, feature arc specific 3B data is saved in an mbuf dynfield. Hard to
say if priv data would provide a better solution.
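
As an illustration of the sizing argument above, the three fields fit in 8
bytes if laid out as in the sketch below. The struct name, field names, field
order and the padding byte are assumptions made for this example; the actual
layout in the RFC may differ.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64

/* Hypothetical layout: 4 + 2 + 1 + 1 = 8 bytes, no implicit padding. */
struct sketch_feature_data {
	uint32_t user_data;    /* per-interface, feature-specific data */
	uint16_t next_edge;    /* edge to the next node (rte_edge_t is uint16_t) */
	uint8_t  next_feature; /* index of the next enabled feature */
	uint8_t  reserved;     /* explicit padding byte */
};

/* Fail the build if the compiler lays the struct out differently. */
static_assert(sizeof(struct sketch_feature_data) == 8,
	      "feature data must stay 8 bytes");

/* How many entries fit in one cache line. */
static unsigned int
sketch_entries_per_cache_line(void)
{
	return (unsigned int)(SKETCH_CACHE_LINE_SIZE /
			      sizeof(struct sketch_feature_data));
}
```

Placing the widest field first avoids compiler-inserted padding, which is
what keeps eight entries inside a single 64B line.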

> This structure would be the "internal api" (so, that has to be tracked across 
> dpdk releases) between nodes.
> We’d need common data shared for all the nodes as well as specific data 
> between 2 nodes.
> As we get to this point, this (hopefully) will help with the node reusability.

Feature arc also maintains data between 2 nodes per interface and also
for all nodes which are added as features.

>
> - Update the feature arcs to leverage this well known structure, and refine 
> the api
> - Define which part of the stack needs to be defined as a feature arc, with 
> the benefit of the generic API to enable/disable that feature, and which part 
> needs to be dynamically pluggable.
> For instance, for a router, it may not make sense to define IPv4 support as a 
> feature arc.
> So, we’d statically connect eth_input to ip_input.

Agreed

> Yet, lldp support is a good candidate for a feature arc: we need to configure 
> it per interface, and this is independent of the main graph.
>

There would be more protocols which need to be enabled per interface

> WDYT?
> Christophe
>
> > On 17 Oct 2024, at 09:50, Robin Jarry  wrote:
> >
> > Hi Nitin, all,
> >
> > Nitin Saxena, Oct 17, 2024 at 09:03:
> >> Hi Robin/David and all,
> >>
> >> We realized the feature arc patch series is difficult to understand as a 
> >> new concept. Our objectives are following with feature arc changes
> >>
> >> 1. Allow reusability of standard DPDK nodes (defined in lib/nodes/*)
> >> with out-of-tree applications (like grout). Currently out-of-tree graph 
> >> applications are duplicating standard nodes but not reusing the 
> >> standard ones which are available. In the long term, we would like to 
> >> mature standard DPDK nodes with flexibility of hooking them to 
> >> out-of-tree application nodes.
> >
> > It would be ideal if the in-built nodes could be reused. When we started 
> > working on grout, I tried multiple approaches where I could reuse these 
> > nodes, but all failed. The nodes public API seems tailored for app/graph 
> > but does not fit well with other control plane implementations.
> >
> > One of the main issues I had is that the ethdev_rx and ethdev_tx nodes are 
> > cloned per rxq / txq associated with a graph worker. The rte_node API 
> > requires that every clone has a unique name. This in turn makes hot 
> > plugging of DPDK ports very complex, if not impossible.
> >
> > For example, with the in-built nodes, it is not possible to change the 
> > number of ports or their number of RX queues without destroying the whole 
> > graph and creating a new one from scratch.
> >
> > Also, the current implementation of "ip{4,6}-rewrite" handles writing 
> > ethernet header data. This would prevent it from using this node for an 
> > IP-in-IP tunnel interface as we did in grout.
> >
> > Do you think we could change the in-built nodes to enforce OSI layer 
> > separation of concerns? It would make them much more flexible. It may cause 
> > a slight drop of performance because you'd be splitting processing in two 
> > different nodes. But I think flexibility is more important. Otherwise, the 
> > in-built nodes can only be used for very specific use-cases.
> >
> > Finally, I would like to improve the rte_node API to allow defining and 
> > enforcing per-packet metadata that every node expects as input. The current 
> > in-built nodes rely on mbuf dynamic fields for this but this means you only 
> > have 9x32 bits available. And using all of these may break some drivers 
> > (ixgbe) that rely on dynfields to work. Have y

RE: [PATCH v10 2/2] examples/l3fwd-power: add PM QoS configuration

2024-10-17 Thread Konstantin Ananyev

> >
>  Add PM QoS configuration to decrease the delay after sleep in case of
>  entering deeper idle state.
> 
>  Signed-off-by: Huisong Li 
>  Acked-by: Morten Brørup 
>  ---
> examples/l3fwd-power/main.c | 24 
> 1 file changed, 24 insertions(+)
> 
>  diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
>  index 2bb6b092c3..b0ddb54ee2 100644
>  --- a/examples/l3fwd-power/main.c
>  +++ b/examples/l3fwd-power/main.c
>  @@ -47,6 +47,7 @@
> #include 
> #include 
> #include 
>  +#include 
> 
> #include "perf_core.h"
> #include "main.h"
>  @@ -2260,6 +2261,22 @@ init_power_library(void)
>   return -1;
>   }
>   }
>  +
>  +RTE_LCORE_FOREACH(lcore_id) {
>  +/*
>  + * Set the worker lcore's to have strict latency limit 
>  to allow
>  + * the CPU to enter the shallowest idle state.
>  + */
>  +ret = rte_power_qos_set_cpu_resume_latency(lcore_id,
>  +
>  RTE_POWER_QOS_STRICT_LATENCY_VALUE);
> >>> I wonder why it is set to all worker cores silently and unconditionally?
> >>> Wouldn't it be a change from current behavior of the power library?
> >> L3fwd-power uses Rx interrupt to receive packet.
> > AFAIK, not exactly.
> >  From what I remember l3fwd-power still runs RX in poll mode,
> > though it counts the number of idle rx bursts.
> > As that number goes above some threshold, it puts itself into
> > sleep with some timeout value.
> Exactly.
> >
> >> Do you mean this
> >> setting should be for the core of Rx queue, right?
> >> This setting doesn't change the behavior of l3fwd-power. It is just for
> >> getting low resume latency when worker core wakes up from sleeping.
> > As I understand your patch - you force the CPU to select a shallower C-state
> > when entering such sleep.
> > Then it means that possible packet loss will be smaller,
> > but power consumption probably higher, correct?
> correct.
> > If so, then it looks like a change from the current behavior for that app,
> > and we probably need to document what the expected change will be.
> > Or, probably a better way - provide the user with a way to choose,
> > a new cmdline option or so.
> Yes.
> The power consumption may increase but the performance is better due to
> this patch if the platform enables the cpuidle function.

Yes, that is what I expect, and personally I am ok with that.
Though I suspect different users who use this sample as some test-app
might have different priorities in that tradeoff (power vs performance).

> After all, this is just a very minor point. It is enough to document
> this change or impact in the doc of this API. Just to make it clearer for the user.
> What do you think?

I think yes, probably just updating docs (rel-notes, SG ?) will be enough.
David Hunt, what are your thoughts here?
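
For readers unfamiliar with the knob being discussed, the sketch below builds
the path of the Linux per-CPU PM QoS resume latency attribute in sysfs. It is
an assumption that the API under review is backed by this attribute, and
sketch_resume_latency_path() is a hypothetical helper, not part of the power
library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of the per-CPU resume latency attribute.
 * Returns 0 on success, -1 if 'buf' is too small. */
static int
sketch_resume_latency_path(char *buf, size_t len, unsigned int cpu_id)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len,
		     "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us",
		     cpu_id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Note that the path is indexed by CPU id, not by lcore id, so the lcore to CPU
mapping matters here as well.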

> >
>  +if (ret != 0) {
>  +RTE_LOG(ERR, L3FWD_POWER,
>  +"Failed to set strict resume latency on 
>  core%u.\n",
>  +lcore_id);
>  +return ret;
>  +}
>  +}
>  +
>   return ret;
> }
> 
>  @@ -2299,6 +2316,13 @@ deinit_power_library(void)
>   }
>   }
>   }
>  +
>  +RTE_LCORE_FOREACH(lcore_id) {
>  +/* Restore the original value in kernel. */
>  +rte_power_qos_set_cpu_resume_latency(lcore_id,
>  +
>  RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT);
>  +}
>  +
>   return ret;
> }
> 
>  --
>  2.22.0


RE: [PATCH v1 1/2] power: fix power library with --lcores

2024-10-17 Thread Konstantin Ananyev



> This commit fixes an issue in the power library
> related to using lcores mapped to different
> physical cores (--lcores option in EAL).
> 
> Previously, the power library incorrectly accessed
> CPU sysfs attributes for power management, treating
> lcore IDs as CPU IDs.
> e.g. with --lcores '1@128', lcore_id '1' was interpreted
> as CPU_id instead of '128'.
> 
> This patch corrects the cpu_id based on lcore and CPU
> mappings. It also constrains power management support
> for lcores mapped to multiple physical cores/threads.
> 
> When multiple lcores are mapped to the same physical core,
> invoking frequency scaling APIs on any lcore will apply the
> changes effectively.
> 
> Signed-off-by: Sivaprasad Tummala 
> ---
>  lib/power/power_acpi_cpufreq.c   |  7 ++-
>  lib/power/power_amd_pstate_cpufreq.c |  7 ++-
>  lib/power/power_common.c | 23 +++
>  lib/power/power_common.h |  2 +-
>  lib/power/power_cppc_cpufreq.c   |  7 ++-
>  lib/power/power_pstate_cpufreq.c |  7 ++-
>  6 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
> index 81996e1c13..259bc8a263 100644
> --- a/lib/power/power_acpi_cpufreq.c
> +++ b/lib/power/power_acpi_cpufreq.c
> @@ -258,7 +258,12 @@ power_acpi_cpufreq_init(unsigned int lcore_id)
>   return -1;
>   }
> 
> - pi->lcore_id = lcore_id;
> + if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
> + POWER_LOG(ERR,
> + "Cannot get cpu id mapped for lcore %u", lcore_id);
> + return -1;
> + }
> +
>   /* Check and set the governor */
>   if (power_set_governor_userspace(pi) < 0) {
>   POWER_LOG(ERR, "Cannot set governor of lcore %u to "
> diff --git a/lib/power/power_amd_pstate_cpufreq.c b/lib/power/power_amd_pstate_cpufreq.c
> index 090a0d96cb..9bfb271bb7 100644
> --- a/lib/power/power_amd_pstate_cpufreq.c
> +++ b/lib/power/power_amd_pstate_cpufreq.c
> @@ -376,7 +376,12 @@ power_amd_pstate_cpufreq_init(unsigned int lcore_id)
>   return -1;
>   }
> 
> - pi->lcore_id = lcore_id;
> + if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
> + POWER_LOG(ERR,
> + "Cannot get cpu id mapped for lcore %u", lcore_id);
> + return -1;
> + }
> +
>   /* Check and set the governor */
>   if (power_set_governor_userspace(pi) < 0) {
>   POWER_LOG(ERR, "Cannot set governor of lcore %u to "
> diff --git a/lib/power/power_common.c b/lib/power/power_common.c
> index 590986d5ef..a8d5cd1c50 100644
> --- a/lib/power/power_common.c
> +++ b/lib/power/power_common.c
> @@ -9,6 +9,7 @@
> 
>  #include 
>  #include 
> +#include 
> 
>  #include "power_common.h"
> 
> @@ -204,3 +205,25 @@ power_set_governor(unsigned int lcore_id, const char *new_governor,
> 
>   return ret;
>  }
> +
> +int check_lcore_and_set_cpu(uint32_t lcore_id, uint32_t *cpu_id)
> +{
> + rte_cpuset_t lcore_cpus;
> + uint32_t cpu;
> +
> + lcore_cpus = rte_lcore_cpuset(lcore_id);
> + if (CPU_COUNT(&lcore_cpus) != 1) {
> + POWER_LOG(ERR,
> + "Power library does not support lcore %u mapping to %u 
> cpus",
> + lcore_id, CPU_COUNT(&lcore_cpus));
> + return -1;
> + }
> +
> + for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
> + if (CPU_ISSET(cpu, &lcore_cpus))
> + break;
> + }
> + *cpu_id = cpu;
> +
> + return 0;
> +}
> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
> index 83f742f42a..c5034104d0 100644
> --- a/lib/power/power_common.h
> +++ b/lib/power/power_common.h
> @@ -31,5 +31,5 @@ int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
>  int read_core_sysfs_u32(FILE *f, uint32_t *val);
>  int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
>  int write_core_sysfs_s(FILE *f, const char *str);
> -
> +int check_lcore_and_set_cpu(uint32_t lcore_id, uint32_t *cpu_id);
>  #endif /* _POWER_COMMON_H_ */
> diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
> index 32aaacb948..0a1d723bae 100644
> --- a/lib/power/power_cppc_cpufreq.c
> +++ b/lib/power/power_cppc_cpufreq.c
> @@ -362,7 +362,12 @@ power_cppc_cpufreq_init(unsigned int lcore_id)
>   return -1;
>   }
> 
> - pi->lcore_id = lcore_id;
> + if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
> + POWER_LOG(ERR,
> + "Cannot get cpu id mapped for lcore %u", lcore_id);
> + return -1;
> + }
> +
>   /* Check and set the governor */
>   if (power_set_governor_userspace(pi) < 0) {
>   POWER_LOG(ERR, "Cannot set governor of lcore %u to "
> diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
> index 2343121621..116b130be2 100644
> --

[DPDK/ethdev Bug 1566] On Windows, some netuio-bound device can not be used on Windows if at least one device is using Intel ice drivers

2024-10-17 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1566

Bug ID: 1566
   Summary: On Windows, some netuio-bound device can not be used
on Windows if at least one device is using Intel ice
drivers
   Product: DPDK
   Version: 23.11
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: normal
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: e.ho...@deltacast.tv
  Target Milestone: ---

I have a Windows set-up on which I have one Intel XXV710 (2 x 25G port) and one
Intel E810 (2 x 100G port).

When using DPDK v23.11.2, if I bind the 4 NIC ports to netuio and run
dpdk-testpmd, I get the following:

> PS C:\dpdk-23112\bin> .\dpdk-testpmd.exe
> EAL: Detected CPU lcores: 48
> EAL: Detected NUMA nodes: 2
> EAL: Multi-process support is requested, but not available.
> EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: :18:00.0 (socket 0)
> mlx5_common: Cannot list devices, is DevX enabled?
> mlx5_common: Failed to allocate PD Obj using DevX.
> mlx5_common: Failed to initialize device context.
> EAL: Requested device :18:00.0 cannot be used
> EAL: Probe PCI driver: mlx5_pci (15b3:1015) device: :18:00.1 (socket 0)
> mlx5_common: Cannot list devices, is DevX enabled?
> mlx5_common: Failed to allocate PD Obj using DevX.
> mlx5_common: Failed to initialize device context.
> EAL: Requested device :18:00.1 cannot be used
> EAL: Probe PCI driver: net_ice (8086:1592) device: :5e:00.0 (socket 0)
> ice_dev_init(): Failed to read device serial number
> 
> ice_load_pkg_type(): Active package is: 1.3.36.0, ICE OS Default Package
> (double VLAN mode)
> EAL: Probe PCI driver: net_ice (8086:1592) device: :5e:00.1 (socket 0)
> ice_dev_init(): Failed to initialize HW
> EAL: Requested device :5e:00.1 cannot be used
> EAL: Probe PCI driver: net_i40e (8086:158b) device: :af:00.0 (socket 1)
> i40e_enable_extended_tag(): Does not support Extended Tag
> EAL: Probe PCI driver: net_i40e (8086:158b) device: :af:00.1 (socket 1)
> i40e_enable_extended_tag(): Does not support Extended Tag
> eth_i40e_dev_init(): mac address is not valid
> ethdev initialisation failed
> EAL: Requested device :af:00.1 cannot be used

Logs show :
> EAL: Requested device :5e:00.1 cannot be used
> ...
> EAL: Requested device :af:00.1 cannot be used



I tried the following : 
- Only one i40e NIC bound to netuio : OK
- Only one ice NIC bound to netuio : OK
- Two i40e NIC bound to netuio : OK
- Two ice NIC bound to netuio : Second NIC can not be used
- one ice NIC and two i40e NIC bound to netuio : Second i40e NIC can not be
used
- two ice NIC and two i40e NIC bound to netuio (as above) : second i40e and
second ice NIC can not be used



When using v24.07, the problem is not present : 

> PS C:\dpdk-2407\bin> .\dpdk-testpmd.exe
> EAL: Detected CPU lcores: 48
> EAL: Detected NUMA nodes: 2
> EAL: Multi-process support is requested, but not available.
> mlx5_common: Cannot list devices, is DevX enabled?
> mlx5_common: Failed to allocate PD Obj using DevX.
> mlx5_common: Failed to initialize device context.
> PCI_BUS: Requested device :18:00.0 cannot be used
> mlx5_common: Cannot list devices, is DevX enabled?
> mlx5_common: Failed to allocate PD Obj using DevX.
> mlx5_common: Failed to initialize device context.
> PCI_BUS: Requested device :18:00.1 cannot be used
> ice_flow_init(): Failed to initialize DDP parser, raw packet filter will not
> be supported
> ice_flow_init(): Failed to initialize DDP parser, raw packet filter will not
> be supported
> i40e_enable_extended_tag(): Does not support Extended Tag
> i40e_enable_extended_tag(): Does not support Extended Tag


No "cannot be used" message is present for the NICs.

Any plan on fixing this behavior in 23.11?

Best regards,

Eric Houet


Re: [PATCH v1 27/30] net/i40e/base: change time variables from 64 bit to 32 bit

2024-10-17 Thread David Marchand
On Mon, Sep 2, 2024 at 11:58 AM Anatoly Burakov
 wrote:
>
> From: Jaroslaw Ilgiewicz 
>
> Time variables were designed for 32 bit and 64 bit variables are not
> necessary. Changed all to 32 bit.
>
> Signed-off-by: Jaroslaw Ilgiewicz 
> Signed-off-by: Anatoly Burakov 
> ---
>  drivers/net/i40e/base/i40e_common.c| 2 +-
>  drivers/net/i40e/base/i40e_nvm.c   | 6 +++---
>  drivers/net/i40e/base/i40e_prototype.h | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/i40e/base/i40e_common.c b/drivers/net/i40e/base/i40e_common.c
> index 07e18deaea..a2cfafeda9 100644
> --- a/drivers/net/i40e/base/i40e_common.c
> +++ b/drivers/net/i40e/base/i40e_common.c
> @@ -3596,7 +3596,7 @@ enum i40e_status_code i40e_aq_debug_write_register(struct i40e_hw *hw,
>  enum i40e_status_code i40e_aq_request_resource(struct i40e_hw *hw,
> enum i40e_aq_resources_ids resource,
> enum i40e_aq_resource_access_type access,
> -   u8 sdp_number, u64 *timeout,
> +   u8 sdp_number, u32 *timeout,
> struct i40e_asq_cmd_details *cmd_details)
>  {
> struct i40e_aq_desc desc;
> diff --git a/drivers/net/i40e/base/i40e_nvm.c b/drivers/net/i40e/base/i40e_nvm.c
> index 2f6cd9eda5..185af67817 100644
> --- a/drivers/net/i40e/base/i40e_nvm.c
> +++ b/drivers/net/i40e/base/i40e_nvm.c
> @@ -62,7 +62,7 @@ enum i40e_status_code i40e_acquire_nvm(struct i40e_hw *hw,
>  {
> enum i40e_status_code ret_code = I40E_SUCCESS;
> u32 gtime, timeout;
> -   u64 time_left = 0;
> +   u32 time_left = 0;

Some logs (a few lines below) need updating:

i40e_debug(hw, I40E_DEBUG_NVM,
           "NVM acquire type %d failed time_left=%" PRIu64
           " ret=%d aq_err=%d\n",
           access, time_left, ret_code, hw->aq.asq_last_status);

And
i40e_debug(hw, I40E_DEBUG_NVM,
           "NVM acquire timed out, wait %" PRIu64
           " ms before trying again. status=%d aq_err=%d\n",
           time_left, ret_code, hw->aq.asq_last_status);

Afaiu, it should be PRIu32.
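
As a standalone illustration of the point above (not the i40e code itself), here is a minimal sketch of a 32-bit value formatted with PRIu32. The helper name format_time_left() is hypothetical and exists only for this example. Passing a uint32_t argument to a PRIu64 conversion is a format/argument mismatch, which is what -Wformat reports.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: formats a 32-bit time
 * value the way the updated log lines would need to. */
static int
format_time_left(char *buf, size_t len, uint32_t time_left)
{
	assert(buf != NULL);
	/* PRIu32 matches uint32_t; PRIu64 here would be a mismatch. */
	return snprintf(buf, len, "time_left=%" PRIu32, time_left);
}
```

Once time_left becomes u32, every format string that prints it has to change in the same patch, otherwise builds with -Werror fail.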

This was raised by OBS CI:
https://build.opensuse.org/package/live_build_log/home:bluca:dpdk/dpdk/Debian_12/x86_64


-- 
David Marchand



[PATCH v1 1/2] power: fix power library with --lcores

2024-10-17 Thread Sivaprasad Tummala
This commit fixes an issue in the power library
related to using lcores mapped to different
physical cores (--lcores option in EAL).

Previously, the power library incorrectly accessed
CPU sysfs attributes for power management, treating
lcore IDs as CPU IDs.
e.g. with --lcores '1@128', lcore_id '1' was interpreted
as CPU_id instead of '128'.

This patch corrects the cpu_id based on lcore and CPU
mappings. It also disallows power management for an
lcore mapped to more than one physical core/thread.

When multiple lcores are mapped to the same physical core,
invoking frequency scaling APIs on any of those lcores
applies the change to that shared core.

Signed-off-by: Sivaprasad Tummala 
---
 lib/power/power_acpi_cpufreq.c   |  7 ++-
 lib/power/power_amd_pstate_cpufreq.c |  7 ++-
 lib/power/power_common.c | 23 +++
 lib/power/power_common.h |  2 +-
 lib/power/power_cppc_cpufreq.c   |  7 ++-
 lib/power/power_pstate_cpufreq.c |  7 ++-
 6 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 81996e1c13..259bc8a263 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -258,7 +258,12 @@ power_acpi_cpufreq_init(unsigned int lcore_id)
return -1;
}
 
-   pi->lcore_id = lcore_id;
+   if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
+   POWER_LOG(ERR,
+   "Cannot get cpu id mapped for lcore %u", lcore_id);
+   return -1;
+   }
+
/* Check and set the governor */
if (power_set_governor_userspace(pi) < 0) {
POWER_LOG(ERR, "Cannot set governor of lcore %u to "
diff --git a/lib/power/power_amd_pstate_cpufreq.c b/lib/power/power_amd_pstate_cpufreq.c
index 090a0d96cb..9bfb271bb7 100644
--- a/lib/power/power_amd_pstate_cpufreq.c
+++ b/lib/power/power_amd_pstate_cpufreq.c
@@ -376,7 +376,12 @@ power_amd_pstate_cpufreq_init(unsigned int lcore_id)
return -1;
}
 
-   pi->lcore_id = lcore_id;
+   if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
+   POWER_LOG(ERR,
+   "Cannot get cpu id mapped for lcore %u", lcore_id);
+   return -1;
+   }
+
/* Check and set the governor */
if (power_set_governor_userspace(pi) < 0) {
POWER_LOG(ERR, "Cannot set governor of lcore %u to "
diff --git a/lib/power/power_common.c b/lib/power/power_common.c
index 590986d5ef..a8d5cd1c50 100644
--- a/lib/power/power_common.c
+++ b/lib/power/power_common.c
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 #include "power_common.h"
 
@@ -204,3 +205,25 @@ power_set_governor(unsigned int lcore_id, const char *new_governor,
 
return ret;
 }
+
+int check_lcore_and_set_cpu(uint32_t lcore_id, uint32_t *cpu_id)
+{
+   rte_cpuset_t lcore_cpus;
+   uint32_t cpu;
+
+   lcore_cpus = rte_lcore_cpuset(lcore_id);
+   if (CPU_COUNT(&lcore_cpus) != 1) {
+   POWER_LOG(ERR,
+   "Power library does not support lcore %u mapping to %u cpus",
+   lcore_id, CPU_COUNT(&lcore_cpus));
+   return -1;
+   }
+
+   for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
+   if (CPU_ISSET(cpu, &lcore_cpus))
+   break;
+   }
+   *cpu_id = cpu;
+
+   return 0;
+}
diff --git a/lib/power/power_common.h b/lib/power/power_common.h
index 83f742f42a..c5034104d0 100644
--- a/lib/power/power_common.h
+++ b/lib/power/power_common.h
@@ -31,5 +31,5 @@ int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
 int read_core_sysfs_u32(FILE *f, uint32_t *val);
 int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
 int write_core_sysfs_s(FILE *f, const char *str);
-
+int check_lcore_and_set_cpu(uint32_t lcore_id, uint32_t *cpu_id);
 #endif /* _POWER_COMMON_H_ */
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
index 32aaacb948..0a1d723bae 100644
--- a/lib/power/power_cppc_cpufreq.c
+++ b/lib/power/power_cppc_cpufreq.c
@@ -362,7 +362,12 @@ power_cppc_cpufreq_init(unsigned int lcore_id)
return -1;
}
 
-   pi->lcore_id = lcore_id;
+   if (check_lcore_and_set_cpu(lcore_id, &pi->lcore_id) < 0) {
+   POWER_LOG(ERR,
+   "Cannot get cpu id mapped for lcore %u", lcore_id);
+   return -1;
+   }
+
/* Check and set the governor */
if (power_set_governor_userspace(pi) < 0) {
POWER_LOG(ERR, "Cannot set governor of lcore %u to "
diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index 2343121621..116b130be2 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -564,7 +564,12 @@ power_pstate_cpufreq_init(unsigned int lcore_id)
return -1;
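
The lcore-to-CPU lookup that this patch adds can be sketched outside DPDK as follows. This is a minimal sketch under stated assumptions: it uses the Linux/glibc cpu_set_t in place of rte_cpuset_t, and single_cpu_from_set() is a hypothetical name, not a DPDK function. Unlike the quoted hunk, it takes the set by pointer and returns as soon as the CPU is found.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for the CPU_* macros on glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the lookup in the patch: return 0 and store
 * the CPU id when the set holds exactly one CPU, -1 otherwise.
 * single_cpu_from_set() is a hypothetical name, not a DPDK API.
 */
static int
single_cpu_from_set(const cpu_set_t *set, uint32_t *cpu_id)
{
	uint32_t cpu;

	assert(set != NULL && cpu_id != NULL);
	if (CPU_COUNT(set) != 1)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (CPU_ISSET(cpu, set)) {
			*cpu_id = cpu;
			return 0;
		}
	}
	return -1; /* not reached when CPU_COUNT() == 1 */
}
```

With --lcores '1@128', the set for lcore 1 holds only CPU 128, so the lookup yields 128 rather than the lcore id 1.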

[PATCH v1 2/2] test/power: fix power library with --lcores

2024-10-17 Thread Sivaprasad Tummala
When the user requests lcores mapped to different physical
cores using the --lcores EAL option, the power test application
accesses the wrong CPU sysfs attribute when checking the current
frequency.

The patch fixes the cpu_id based on the lcore and cpu mappings.

Signed-off-by: Sivaprasad Tummala 
---
 app/test/test_power_cpufreq.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 619b2811c6..63d13614df 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "test.h"
 
@@ -46,9 +47,10 @@ test_power_caps(void)
 
 static uint32_t total_freq_num;
 static uint32_t freqs[TEST_POWER_FREQS_NUM_MAX];
+static uint32_t cpu_id;
 
 static int
-check_cur_freq(unsigned int lcore_id, uint32_t idx, bool turbo)
+check_cur_freq(__rte_unused unsigned int lcore_id, uint32_t idx, bool turbo)
 {
 #define TEST_POWER_CONVERT_TO_DECIMAL 10
 #define MAX_LOOP 100
@@ -62,13 +64,13 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool turbo)
int i;
 
if (snprintf(fullpath, sizeof(fullpath),
-   TEST_POWER_SYSFILE_CPUINFO_FREQ, lcore_id) < 0) {
+   TEST_POWER_SYSFILE_CPUINFO_FREQ, cpu_id) < 0) {
return 0;
}
f = fopen(fullpath, "r");
if (f == NULL) {
if (snprintf(fullpath, sizeof(fullpath),
-   TEST_POWER_SYSFILE_SCALING_FREQ, lcore_id) < 0) {
+   TEST_POWER_SYSFILE_SCALING_FREQ, cpu_id) < 0) {
return 0;
}
f = fopen(fullpath, "r");
@@ -497,6 +499,20 @@ test_power_cpufreq(void)
 {
int ret = -1;
enum power_management_env env;
+   rte_cpuset_t lcore_cpus;
+
+   lcore_cpus = rte_lcore_cpuset(TEST_POWER_LCORE_ID);
+   if (CPU_COUNT(&lcore_cpus) != 1) {
+   printf("Power management doesn't support "
+   "lcore %u mapping to %u cpus\n",
+   TEST_POWER_LCORE_ID,
+   CPU_COUNT(&lcore_cpus));
+   return TEST_SKIPPED;
+   }
+   for (cpu_id = 0; cpu_id < CPU_SETSIZE; cpu_id++) {
+   if (CPU_ISSET(cpu_id, &lcore_cpus))
+   break;
+   }
 
/* Test initialisation of a valid lcore */
ret = rte_power_init(TEST_POWER_LCORE_ID);
-- 
2.34.1



Re: [PATCH v3 2/2] examples/l3fwd: add option to set mbuf cache size

2024-10-17 Thread fengchengwen
There's one small nit below; with that fixed:
Acked-by: Chengwen Feng 

On 2024/10/17 17:58, Jie Hai wrote:
> The mempool cache size of mbuf is set to
> RTE_MEMPOOL_CACHE_MAX_SIZE as default. This patch allows
> users to configure the cache size by "--mbcache", and limits
> the parameter to a maximum of RTE_MEMPOOL_CACHE_MAX_SIZE.
> 
> Signed-off-by: Jie Hai 
> Acked-by: Huisong Li 
> Acked-by: Morten Brørup 
> ---
>  examples/l3fwd/l3fwd.h |  1 +
>  examples/l3fwd/main.c  | 33 ++---
>  2 files changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
> index 618e0eaa3af1..0cce3406ee7d 100644
> --- a/examples/l3fwd/l3fwd.h
> +++ b/examples/l3fwd/l3fwd.h
> @@ -117,6 +117,7 @@ extern struct acl_algorithms acl_alg[];
>  extern uint32_t max_pkt_len;
>  
>  extern uint32_t nb_pkt_per_burst;
> +extern uint32_t mb_mempool_cache_size;
>  
>  /* Send burst of packets on an output interface */
>  static inline int
> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
> index 8b7a07cc7d67..d08525faa1a6 100644
> --- a/examples/l3fwd/main.c
> +++ b/examples/l3fwd/main.c
> @@ -58,6 +58,7 @@ static_assert(MEMPOOL_CACHE_SIZE >= MAX_PKT_BURST, 
> "MAX_PKT_BURST should be at m
>  uint16_t nb_rxd = RX_DESC_DEFAULT;
>  uint16_t nb_txd = TX_DESC_DEFAULT;
>  uint32_t nb_pkt_per_burst = DEFAULT_PKT_BURST;
> +uint32_t mb_mempool_cache_size = MEMPOOL_CACHE_SIZE;
>  
>  /**< Ports set in promiscuous mode off by default. */
>  static int promiscuous_on;
> @@ -399,6 +400,7 @@ print_usage(const char *prgname)
>   " [--rx-queue-size NPKTS]"
>   " [--tx-queue-size NPKTS]"
>   " [--burst NPKTS]"
> + " [--mbcache CACHESZ]"
>   " [--eth-dest=X,MM:MM:MM:MM:MM:MM]"
>   " [--max-pkt-len PKTLEN]"
>   " [--no-numa]"
> @@ -426,6 +428,8 @@ print_usage(const char *prgname)
>   "Default: %d\n"
>   "  --burst NPKTS: Burst size in decimal\n"
>   "Default: %d\n"
> + "  --mbcache CACHESZ: Cache size in decimal\n"

Suggest "  --mbcache CACHESZ: Mbuf cache size in decimal\n"

> + "Default: %d\n"
>   "  --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for 
> port X\n"
>   "  --max-pkt-len PKTLEN: maximum packet length in decimal 
> (64-9600)\n"
>   "  --no-numa: Disable numa awareness\n"
> @@ -455,7 +459,7 @@ print_usage(const char *prgname)
>   "another is route entry at while line leads 
> with character '%c'.\n"
>   "  --rule_ipv6=FILE: Specify the ipv6 rules entries file.\n"
>   "  --alg: ACL classify method to use, one of: %s.\n\n",
> - prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST,
> + prgname, RX_DESC_DEFAULT, TX_DESC_DEFAULT, DEFAULT_PKT_BURST, 
> MEMPOOL_CACHE_SIZE,
>   ACL_LEAD_CHAR, ROUTE_LEAD_CHAR, alg);
>  }
>  
> @@ -673,6 +677,22 @@ parse_lookup(const char *optarg)
>   return 0;
>  }
>  
> +static void
> +parse_mbcache_size(const char *optarg)
> +{
> + unsigned long mb_cache_size;
> + char *end = NULL;
> +
> + mb_cache_size = strtoul(optarg, &end, 10);
> + if ((optarg[0] == '\0') || (end == NULL) || (*end != '\0'))
> + return;
> + if (mb_cache_size <= RTE_MEMPOOL_CACHE_MAX_SIZE)
> + mb_mempool_cache_size = (uint32_t)mb_cache_size;
> + else
> + rte_exit(EXIT_FAILURE, "mbcache must be >= 0 and <= %d\n",
> +  RTE_MEMPOOL_CACHE_MAX_SIZE);
> +}
> +
>  static void
>  parse_pkt_burst(const char *optarg)
>  {
> @@ -748,6 +768,7 @@ static const char short_options[] =
>  #define CMD_LINE_OPT_RULE_IPV6 "rule_ipv6"
>  #define CMD_LINE_OPT_ALG "alg"
>  #define CMD_LINE_OPT_PKT_BURST "burst"
> +#define CMD_LINE_OPT_MB_CACHE_SIZE "mbcache"
>  
>  enum {
>   /* long options mapped to a short option */
> @@ -777,7 +798,8 @@ enum {
>   CMD_LINE_OPT_ENABLE_VECTOR_NUM,
>   CMD_LINE_OPT_VECTOR_SIZE_NUM,
>   CMD_LINE_OPT_VECTOR_TMO_NS_NUM,
> - CMD_LINE_OPT_PKT_BURST_NUM
> + CMD_LINE_OPT_PKT_BURST_NUM,
> + CMD_LINE_OPT_MB_CACHE_SIZE_NUM
>  };
>  
>  static const struct option lgopts[] = {
> @@ -805,6 +827,7 @@ static const struct option lgopts[] = {
>   {CMD_LINE_OPT_RULE_IPV6,   1, 0, CMD_LINE_OPT_RULE_IPV6_NUM},
>   {CMD_LINE_OPT_ALG,   1, 0, CMD_LINE_OPT_ALG_NUM},
>   {CMD_LINE_OPT_PKT_BURST,   1, 0, CMD_LINE_OPT_PKT_BURST_NUM},
> + {CMD_LINE_OPT_MB_CACHE_SIZE,   1, 0, CMD_LINE_OPT_MB_CACHE_SIZE_NUM},
>   {NULL, 0, 0, 0}
>  };
>  
> @@ -897,6 +920,10 @@ parse_args(int argc, char **argv)
>   parse_pkt_burst(optarg);
>   break;
>  
> + case CMD_LINE_OPT_MB_CACHE_SIZE_NUM:
> + parse_mbcache_size(optarg);
> + break;
> +
>  

Re: [PATCH v1 2/2] test/power: fix power library with --lcores

2024-10-17 Thread lihuisong (C)



On 2024/10/17 19:02, Sivaprasad Tummala wrote:

When the user requests lcores mapped to different physical
cores using the --lcores EAL option, the power test application
accesses the wrong CPU sysfs attribute when checking the current
frequency.

The patch fixes the cpu_id based on the lcore and cpu mappings.

Signed-off-by: Sivaprasad Tummala 
---
  app/test/test_power_cpufreq.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

Acked-by: Huisong Li 


Build issue with Fedora Rawhide

2024-10-17 Thread David Marchand
Hello guys,

It is not clear to me whether this issue is new (or what caused it), but
compilation fails on Rawhide for the net/gve driver (see below for the
log).

Afaics, Intel drivers wrapped __le16 types (and friends) using macros
(example: 
https://git.dpdk.org/dpdk/tree/drivers/net/i40e/base/i40e_osdep.h#n47),
and I suspect it was to avoid such conflicts.
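
To make the typedef-versus-macro distinction concrete, here is a small sketch. All demo_* names are hypothetical stand-ins, and the first typedef only mimics a declaration that a system header such as linux/types.h would already provide; it is not the gve or i40e code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a system header that already owns the name
 * (hypothetical; mimics a typedef pulled in via <linux/types.h>). */
typedef unsigned long long demo_le64;

/*
 * A second typedef of the same name with a different underlying type
 * is the kind of redeclaration the compiler rejects:
 *     typedef unsigned long demo_le64;   // error: conflicting types
 * A macro defined after the header only rewrites later uses of the
 * token, so the earlier typedef is never redeclared.
 */
#define demo_le64 uint64_t

struct demo_desc {
	demo_le64 addr; /* expands to uint64_t */
};

static_assert(sizeof(struct demo_desc) == sizeof(uint64_t),
	      "demo_le64 expands to uint64_t");
```

The trade-off is that a macro silently shadows the system type for the rest of the translation unit, so it is normally confined to the driver's own osdep header.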

Can you send a fix please?

[  224s] FAILED: drivers/libtmp_rte_net_gve.a.p/net_gve_base_gve_adminq.c.o
[  224s] cc -Idrivers/libtmp_rte_net_gve.a.p -Idrivers -I../drivers
-Idrivers/net/gve -I../drivers/net/gve -I../drivers/net/gve/base
-Ilib/ethdev -I../lib/ethdev -I. -I.. -Iconfig -I../config
-Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include
-I../lib/eal/linux/include -Ilib/eal/x86/include
-I../lib/eal/x86/include -Ilib/eal/common -I../lib/eal/common
-Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log
-I../lib/log -Ilib/metrics -I../lib/metrics -Ilib/telemetry
-I../lib/telemetry -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf
-Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/meter
-I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vdev
-I../drivers/bus/vdev -fdiagnostics-color=always
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3
-include rte_config.h -Wcast-qual -Wdeprecated -Wformat
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare
-Wstrict-prototypes -Wundef -Wwrite-strings
-Wno-address-of-packed-member -Wno-packed-not-aligned
-Wno-missing-field-initializers -Wno-zero-length-bounds -D_GNU_SOURCE
-fcommon -Werror -fPIC -march=corei7 -mrtm -DALLOW_EXPERIMENTAL_API
-DALLOW_INTERNAL_API -Wno-format-truncation
-DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.gve -MD -MQ
drivers/libtmp_rte_net_gve.a.p/net_gve_base_gve_adminq.c.o -MF
drivers/libtmp_rte_net_gve.a.p/net_gve_base_gve_adminq.c.o.d -o
drivers/libtmp_rte_net_gve.a.p/net_gve_base_gve_adminq.c.o -c
../drivers/net/gve/base/gve_adminq.c
[  224s] In file included from ../drivers/net/gve/base/../base/gve_desc.h:11,
[  224s]  from ../drivers/net/gve/base/../base/gve.h:9,
[  224s]  from ../drivers/net/gve/base/../gve_ethdev.h:13,
[  224s]  from ../drivers/net/gve/base/gve_adminq.c:6:
[  224s] ../drivers/net/gve/base/../base/gve_osdep.h:41:20: error:
conflicting types for ‘__be64’; have ‘rte_be64_t’ {aka ‘long
unsigned int’}
[  224s]41 | typedef rte_be64_t __be64;
[  224s]   |^~
[  224s] In file included from /usr/include/linux/sched/types.h:5,
[  224s]  from /usr/include/bits/sched.h:60,
[  224s]  from /usr/include/sched.h:43,
[  224s]  from /usr/include/pthread.h:22,
[  224s]  from ../lib/ethdev/ethdev_driver.h:17,
[  224s]  from ../drivers/net/gve/base/../gve_ethdev.h:8:
[  224s] /usr/include/linux/types.h:36:25: note: previous declaration
of ‘__be64’ with type ‘__be64’ {aka ‘long long unsigned
int’}
[  224s]36 | typedef __u64 __bitwise __be64;
[  224s]   | ^~
[  224s] ../drivers/net/gve/base/../base/gve_osdep.h:45:20: error:
conflicting types for ‘__le64’; have ‘rte_le64_t’ {aka ‘long
unsigned int’}
[  224s]45 | typedef rte_le64_t __le64;
[  224s]   |^~
[  224s] /usr/include/linux/types.h:35:25: note: previous declaration
of ‘__le64’ with type ‘__le64’ {aka ‘long long unsigned
int’}
[  224s]35 | typedef __u64 __bitwise __le64;
[  224s]   | ^~



-- 
David Marchand



Re: [PATCH dpdk v3 00/17] IPv6 APIs overhaul

2024-10-17 Thread David Marchand
Hello Robin,

On Thu, Oct 10, 2024 at 9:42 PM Robin Jarry  wrote:
>
> As discussed recently [1], here is a first draft of the IPv6 APIs rework. The
> API change was announced before the 24.07 release [2]. This series is intended
> for 24.11.
>
> [1] http://inbox.dpdk.org/dev/d2sr8t1h39cj.jrqfi6jeh...@redhat.com/
> [2] 
> https://git.dpdk.org/dpdk/commit/?id=835d4c41e0ab58a115c2170c886ba6d3cc1b5764
>
> I tried to keep the patches as small as possible; unfortunately some of them
> are quite big and cannot be broken down if we want to preserve a bisectable
> tree.
>
> Let me know what you think.
>
> Thanks!
>
> Cc: Bruce Richardson 
> Cc: Ferruh Yigit 
> Cc: Konstantin Ananyev 
> Cc: Morten Brørup 
> Cc: Stephen Hemminger 
> Cc: Vladimir Medvedkin 
>
> Changelog:
>
> v3:
>
> - replace *memcpy(ipv6) with direct struct assignments
> - replace in6_addr with rte_ipv6_addr
> - add more ipv6 utils to deal with addresses
> - replace string initializers and RTE_IPV6_ADDR() with RTE_IPV6()
> - restore deleted macro constants and mark them as RTE_DEPRECATED()
>
> Robin Jarry (17):
>   net: split raw checksum functions in separate header
>   net: split ipv6 symbols in separate header
>   net: add structure for ipv6 addresses
>   net: add ipv6 address utilities
>   net: use struct rte_ipv6_addr for header addresses
>   fib6,rib6,lpm6: use struct rte_ipv6_addr
>   fib6,rib6,lpm6: use ipv6 utils
>   rib6,fib6,lpm6: remove duplicate constants
>   cmdline: replace in6_addr with rte_ipv6_addr
>   graph,node: use struct rte_ipv6_addr and utils
>   pipeline: use struct rte_ipv6_addr
>   ipsec,security: use struct rte_ipv6_addr and utils
>   thash: use struct rte_ipv6_addr
>   gro: use struct rte_ipv6_addr
>   rte_flow: use struct rte_ipv6_addr
>   net: add utilities for well known ipv6 address types
>   ipv6: add function to check ipv6 version
>
>  MAINTAINERS |1 +
>  app/graph/ethdev.c  |   44 +-
>  app/graph/ethdev.h  |9 +-
>  app/graph/ip6_route.c   |   51 +-
>  app/graph/meson.build   |2 +-
>  app/graph/neigh.c   |   21 +-
>  app/graph/neigh_priv.h  |4 +-
>  app/graph/route.h   |8 +-
>  app/test-fib/main.c |   51 +-
>  app/test-flow-perf/actions_gen.c|4 +-
>  app/test-flow-perf/items_gen.c  |4 +-
>  app/test-pipeline/pipeline_hash.c   |4 +-
>  app/test-pipeline/pipeline_lpm_ipv6.c   |   11 +-
>  app/test-pmd/cmdline.c  |4 +-
>  app/test-pmd/cmdline_flow.c |   14 +-
>  app/test-pmd/testpmd.h  |   16 +-
>  app/test-sad/main.c |   24 +-
>  app/test/meson.build|1 +
>  app/test/packet_burst_generator.c   |5 +-
>  app/test/test_cmdline_ipaddr.c  |   49 +-
>  app/test/test_cryptodev_security_ipsec.c|1 +
>  app/test/test_fib6.c|   92 +-
>  app/test/test_fib6_perf.c   |8 +-
>  app/test/test_ipfrag.c  |4 +-
>  app/test/test_ipsec_sad.c   |   46 +-
>  app/test/test_lpm6.c|  490 +++--
>  app/test/test_lpm6_data.h   | 2025 ++-
>  app/test/test_lpm6_perf.c   |   10 +-
>  app/test/test_net_ip6.c |  195 ++
>  app/test/test_reassembly_perf.c |   23 +-
>  app/test/test_rib6.c|   55 +-
>  app/test/test_table_combined.c  |2 +-
>  app/test/test_table_tables.c|8 +-
>  app/test/test_thash.c   |   46 +-
>  doc/guides/prog_guide/ipsec_lib.rst |4 +-
>  doc/guides/rel_notes/deprecation.rst|   42 -
>  doc/guides/rel_notes/release_24_11.rst  |   12 +
>  drivers/common/cnxk/cnxk_security.c |   15 +-
>  drivers/crypto/cnxk/cn9k_cryptodev_ops.c|1 +
>  drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c |1 +
>  drivers/net/bnxt/bnxt_flow.c|   12 +-
>  drivers/net/bonding/rte_eth_bond_pmd.c  |6 +-
>  drivers/net/cxgbe/cxgbe_flow.c  |   20 +-
>  drivers/net/dpaa2/dpaa2_flow.c  |   24 +-
>  drivers/net/hinic/hinic_pmd_flow.c  |6 +-
>  drivers/net/hinic/hinic_pmd_tx.c|2 +-
>  drivers/net/hns3/hns3_flow.c|8 +-
>  drivers/net/i40e/i40e_flow.c|   12 +-
>  drivers/net/iavf/iavf_fdir.c|8 +-
>  drivers/net/iavf/iavf_fsub.c|8 +-
>  drivers/net/iavf/iavf_ipsec_crypto.c|9 +-
>  drivers/net/ice/ice_fdir_filter.c   |   12 +-
>  drivers/net/ice/ice_switch_filter.c |   16 +-
>  drivers/net/igc/igc_flow.c  |4 +-
>  drivers/net/ixgbe/ixgbe_flow.c  |   1

RE: [PATCH v4] net/netvsc: fix number Tx queues > Rx queues

2024-10-17 Thread Long Li
> This version of the patch is garbled in patchwork, which means the CI test
> never ran because the patch would not apply.
> 
> You need to clean it up and resubmit it.

I have sent v5 on behalf of Alan.


Re: [PATCH v11 00/12] drivers/zsda: introduce zsda drivers

2024-10-17 Thread Stephen Hemminger
On Thu, 17 Oct 2024 17:21:57 +0800
Hanxiao Li  wrote:

> v11:
> - use RTE_LOG_LINE in logging macro.
> - fix some known bugs.
> 
> v10:
> - delete new blank line at EOF
> - Cleaning up some code in zsda_log.h
> 
> v9:
> - add a new feature  in default.ini.
> - Re-split the patch according to the new PMD guidelines
> https://patches.dpdk.org/project/dpdk/patch/20241006184254.53499-1-nandinipersad...@gmail.com/
> - Split SM4-XTS tests into a new series to releases.
> - Separate out datapath(enqueue/dequeue) as a separate patch.
> 
> v8:
> 
> - fix some errors in cryptodevs/features/zsda.ini.
> 
> v7: 
> 
> - add release notes and some documentations.
> - add MAINTAINERS context in the patch where the file/folder is added.
> - add files in meason.build which are included in the patch only.
> - add a check for unsupported on Windows.
> - notice the implicit cast in C.
> - add cover letter.
> - compile each of the patches individually.
> 
> 
> 
> Hanxiao Li (12):
>   zsda: add zsdadev driver documents
>   config: add zsda device number
>   common/zsda: add some common functions
>   common/zsda: configure zsda device
>   common/zsda: configure zsda queue base functions
>   common/zsda: configure zsda queue enqueue functions
>   common/zsda: configure zsda queue dequeue functions
>   compress/zsda: add zsda compress driver
>   compress/zsda: add zsda compress PMD
>   crypto/zsda: add crypto sessions configuration
>   crypto/zsda: add zsda crypto driver
>   crypto/zsda: add zsda crypto PMD
> 
>  MAINTAINERS |   7 +
>  config/rte_config.h |   4 +
>  doc/guides/compressdevs/features/zsda.ini   |  15 +
>  doc/guides/compressdevs/index.rst   |   1 +
>  doc/guides/compressdevs/zsda.rst|  45 +
>  doc/guides/cryptodevs/features/zsda.ini |  51 ++
>  doc/guides/cryptodevs/index.rst |   1 +
>  doc/guides/cryptodevs/zsda.rst  | 260 ++
>  doc/guides/rel_notes/release_24_11.rst  |   8 +
>  drivers/common/zsda/meson.build |  39 +
>  drivers/common/zsda/zsda_common.c   | 239 ++
>  drivers/common/zsda/zsda_common.h   | 334 
>  drivers/common/zsda/zsda_device.c   | 263 ++
>  drivers/common/zsda/zsda_device.h   | 112 +++
>  drivers/common/zsda/zsda_logs.c |  20 +
>  drivers/common/zsda/zsda_logs.h |  25 +
>  drivers/common/zsda/zsda_qp.c   | 876 
>  drivers/common/zsda/zsda_qp.h   | 149 
>  drivers/compress/zsda/zsda_comp.c   | 392 +
>  drivers/compress/zsda/zsda_comp.h   |  52 ++
>  drivers/compress/zsda/zsda_comp_pmd.c   | 464 +++
>  drivers/compress/zsda/zsda_comp_pmd.h   |  34 +
>  drivers/crypto/zsda/zsda_sym.c  | 273 ++
>  drivers/crypto/zsda/zsda_sym.h  |  50 ++
>  drivers/crypto/zsda/zsda_sym_capabilities.h | 111 +++
>  drivers/crypto/zsda/zsda_sym_pmd.c  | 445 ++
>  drivers/crypto/zsda/zsda_sym_pmd.h  |  33 +
>  drivers/crypto/zsda/zsda_sym_session.c  | 512 
>  drivers/crypto/zsda/zsda_sym_session.h  |  83 ++
>  drivers/meson.build |   1 +
>  30 files changed, 4899 insertions(+)
>  create mode 100644 doc/guides/compressdevs/features/zsda.ini
>  create mode 100644 doc/guides/compressdevs/zsda.rst
>  create mode 100644 doc/guides/cryptodevs/features/zsda.ini
>  create mode 100644 doc/guides/cryptodevs/zsda.rst
>  create mode 100644 drivers/common/zsda/meson.build
>  create mode 100644 drivers/common/zsda/zsda_common.c
>  create mode 100644 drivers/common/zsda/zsda_common.h
>  create mode 100644 drivers/common/zsda/zsda_device.c
>  create mode 100644 drivers/common/zsda/zsda_device.h
>  create mode 100644 drivers/common/zsda/zsda_logs.c
>  create mode 100644 drivers/common/zsda/zsda_logs.h
>  create mode 100644 drivers/common/zsda/zsda_qp.c
>  create mode 100644 drivers/common/zsda/zsda_qp.h
>  create mode 100644 drivers/compress/zsda/zsda_comp.c
>  create mode 100644 drivers/compress/zsda/zsda_comp.h
>  create mode 100644 drivers/compress/zsda/zsda_comp_pmd.c
>  create mode 100644 drivers/compress/zsda/zsda_comp_pmd.h
>  create mode 100644 drivers/crypto/zsda/zsda_sym.c
>  create mode 100644 drivers/crypto/zsda/zsda_sym.h
>  create mode 100644 drivers/crypto/zsda/zsda_sym_capabilities.h
>  create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.c
>  create mode 100644 drivers/crypto/zsda/zsda_sym_pmd.h
>  create mode 100644 drivers/crypto/zsda/zsda_sym_session.c
>  create mode 100644 drivers/crypto/zsda/zsda_sym_session.h
> 

There are a couple of things that should be fixed later.

Series-Acked-by: Stephen Hemminger 


Re: [PATCH v11 05/12] common/zsda: configure zsda queue base functions

2024-10-17 Thread Stephen Hemminger
On Thu, 17 Oct 2024 17:22:02 +0800
Hanxiao Li  wrote:

> +static int
> +zsda_get_queue_cfg_by_id(const struct zsda_pci_device *zsda_pci_dev,
> +  const uint8_t qid, struct qinfo *qcfg)
> +{
> + struct zsda_admin_req_qcfg req = {0};
> + struct zsda_admin_resp_qcfg resp = {0};
> + int ret = 0;
> + struct rte_pci_device *pci_dev =
> + zsda_devs[zsda_pci_dev->zsda_dev_id].pci_dev;
> +
> + if (qid >= MAX_QPS_ON_FUNCTION) {
> + ZSDA_LOG(ERR, "qid beyond limit!");
> + return ZSDA_FAILED;
> + }
> +
> + zsda_admin_msg_init(pci_dev);
> + req.msg_type = ZSDA_ADMIN_QUEUE_CFG_REQ;
> + req.qid = qid;
> +
> + ret = zsda_send_admin_msg(pci_dev, &req, sizeof(req));
> + if (ret) {
> + ZSDA_LOG(ERR, "Failed! Send msg");
> + return ret;
> + }
> +
> + ret = zsda_recv_admin_msg(pci_dev, &resp, sizeof(resp));
> + if (ret) {
> + ZSDA_LOG(ERR, "Failed! Receive msg");
> + return ret;
> + }
> +
> + memcpy(qcfg, &resp.qcfg, sizeof(*qcfg));

Could this just be a structure assignment, to keep type safety?

*qcfg = resp.qcfg;
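
The difference between the two forms can be shown with a small standalone sketch. The demo_* structures and field names are hypothetical stand-ins, not the zsda definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's structures. */
struct demo_qinfo {
	uint16_t q_type;
	uint16_t ring_size;
	uint32_t wq_tail;
};

struct demo_resp {
	uint32_t status;
	struct demo_qinfo qcfg;
};

/* memcpy variant: compiles even if the two sides have unrelated types. */
static void
copy_with_memcpy(struct demo_qinfo *dst, const struct demo_resp *resp)
{
	assert(dst != NULL && resp != NULL);
	memcpy(dst, &resp->qcfg, sizeof(*dst));
}

/* Assignment variant: the compiler rejects mismatched struct types. */
static void
copy_with_assignment(struct demo_qinfo *dst, const struct demo_resp *resp)
{
	assert(dst != NULL && resp != NULL);
	*dst = resp->qcfg;
}
```

Both copy the same bytes here; the assignment simply stops compiling if the type of either side later changes, whereas memcpy would keep copying sizeof(*dst) bytes regardless.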

> +static int
> +zsda_cookie_init(const uint32_t dev_id, struct zsda_qp **qp_addr,
> + const uint16_t queue_pair_id,
> + const struct zsda_qp_config *zsda_qp_conf)
> +{
> + struct zsda_qp *qp = *qp_addr;
> + struct rte_pci_device *pci_dev = zsda_devs[dev_id].pci_dev;
> + char op_cookie_pool_name[RTE_RING_NAMESIZE];
> + uint32_t i;
> + enum zsda_service_type type = zsda_qp_conf->service_type;
> +
> + if (zsda_qp_conf->nb_descriptors != ZSDA_MAX_DESC)
> + ZSDA_LOG(ERR, "Can't create qp for %u descriptors",
> +  zsda_qp_conf->nb_descriptors);
> +
> + qp->srv[type].nb_descriptors = zsda_qp_conf->nb_descriptors;
> +
> + qp->srv[type].op_cookies = rte_zmalloc_socket(
> + "zsda PMD op cookie pointer",
> + zsda_qp_conf->nb_descriptors *
> + sizeof(*qp->srv[type].op_cookies),
> + RTE_CACHE_LINE_SIZE, zsda_qp_conf->socket_id);
> +
> + if (qp->srv[type].op_cookies == NULL) {
> + ZSDA_LOG(ERR, E_MALLOC);
> + return -ENOMEM;
> + }
> +
> + snprintf(op_cookie_pool_name, RTE_RING_NAMESIZE, "%s%d_cks_%s_qp%hu",
> +  pci_dev->driver->driver.name, dev_id,
> +  zsda_qp_conf->service_str, queue_pair_id);
> +
> + qp->srv[type].op_cookie_pool = rte_mempool_lookup(op_cookie_pool_name);
> + if (qp->srv[type].op_cookie_pool == NULL)
> + qp->srv[type].op_cookie_pool = rte_mempool_create(
> + op_cookie_pool_name, qp->srv[type].nb_descriptors,
> + zsda_qp_conf->cookie_size, 64, 0, NULL, NULL, NULL,
> + NULL, (int)(rte_socket_id() & 0xfff), 0);
> + if (!qp->srv[type].op_cookie_pool) {
> + ZSDA_LOG(ERR, E_CREATE);
> + goto exit;
> + }
> +
> + for (i = 0; i < qp->srv[type].nb_descriptors; i++) {
> + if (rte_mempool_get(qp->srv[type].op_cookie_pool,
> + &qp->srv[type].op_cookies[i])) {
> + ZSDA_LOG(ERR, "ZSDA PMD Cannot get op_cookie");
> + goto exit;
> + }
> + memset(qp->srv[type].op_cookies[i], 0,
> +zsda_qp_conf->cookie_size);
> + }
> + return 0;
> +
> +exit:
> + if (qp->srv[type].op_cookie_pool)
> + rte_mempool_free(qp->srv[type].op_cookie_pool);

No need to check for NULL before calling rte_mempool_free(). The
cocci/nullfree script modifies this.


Re: [PATCH 5/6] net: add smaller IPv4 cksum function for simple cases

2024-10-17 Thread Stephen Hemminger
On Thu, 17 Oct 2024 20:03:13 +0100
Bruce Richardson  wrote:

> On Thu, Oct 17, 2024 at 07:15:10PM +0200, Morten Brørup wrote:
> > > +/**
> > > + * Process the IPv4 checksum of an IPv4 header without any extensions.
> > > + *
> > > + * The checksum field does NOT have to be set by the caller, the field
> > > + * is skipped by the calculation.
> > > + *
> > > + * @param ipv4_hdr
> > > + *   The pointer to the contiguous IPv4 header.
> > > + * @return
> > > + *   The complemented checksum to set in the IP packet.
> > > + */
> > > +__rte_experimental
> > > +static inline uint16_t
> > > +rte_ipv4_cksum_simple(const struct rte_ipv4_hdr *ipv4_hdr)
> > > +{
> > > + const uint16_t *v16_h;
> > > + uint32_t ip_cksum;
> > > +
> > > + /*
> > > +  * Compute the sum of successive 16-bit words of the IPv4 header,
> > > +  * skipping the checksum field of the header.
> > > +  */
> > > + v16_h = (const unaligned_uint16_t *)&ipv4_hdr->version_ihl;
> > > + ip_cksum = v16_h[0] + v16_h[1] + v16_h[2] + v16_h[3] +
> > > + v16_h[4] + v16_h[6] + v16_h[7] + v16_h[8] + v16_h[9];
> > > +
> > > + /* reduce 32 bit checksum to 16 bits and complement it */
> > > + ip_cksum = (ip_cksum & 0xffff) + (ip_cksum >> 16);
> > > + ip_cksum = (ip_cksum & 0xffff) + (ip_cksum >> 16);
> > > + ip_cksum = (~ip_cksum) & 0xffff;
> > > + return (ip_cksum == 0) ? 0xffff : (uint16_t) ip_cksum;  
> > 
> > The zero exception does not apply to the checksum stored in the IP header, 
> > only to the checksum in the UDP header.
> >   
> 
> I was wondering about that, because I didn't see it mentioned anywhere in
> the RFCs I consulted, but on the other hand all the implementations in the
> code seemed to have the check for zero.
> 
> > > +}  
> > 
> > Besides that, for the series,  
> 
> So, just to confirm, the zero check at the end of the new ip_cksum_simple
> function should be removed and we always return the computed value
> directly?

It depends on usage.
  - If the computed value is zero, then 0xffff should be placed in
    the IP header.
  - Code often uses the IP checksum routine to check whether an incoming
    checksum is good. In that case, zero means the checksum is valid.



Re: [PATCH] net: improve vlan header type alignment

2024-10-17 Thread Thomas Monjalon
13/10/2024 10:35, Morten Brørup:
> +static_assert(sizeof(struct rte_ether_addr) == 6,
> + "sizeof(struct rte_ether_addr) == 6");
> +static_assert(alignof(struct rte_ether_addr) == 2,
> + "alignof(struct rte_ether_addr) == 2");

Instead of repeating the condition twice,
it would be simpler to use RTE_BUILD_BUG_ON




Re: [PATCH] mbuf: add transport mode ESP packet type

2024-10-17 Thread Thomas Monjalon
Please could we have another review?


22/08/2024 17:32, Alexander Kozyrev:
> Support the IP Encapsulating Security Payload (ESP) in transport mode.
> Currently, we have RTE_PTYPE_TUNNEL_ESP for the ESP tunnel mode.
> Transport mode can be detected by parsing the "Next Header" field.
> The Next Header is TCP for the transport mode and IP for the tunnel mode.
> Add RTE_PTYPE_L4_ESP for the regular transport mode and
> RTE_PTYPE_INNER_L4_ESP for the ESP over UDP packets.
> 
> Signed-off-by: Alexander Kozyrev 
> ---
>  lib/mbuf/rte_mbuf_ptype.c |  2 ++
>  lib/mbuf/rte_mbuf_ptype.h | 36 ++--
>  2 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/mbuf/rte_mbuf_ptype.c b/lib/mbuf/rte_mbuf_ptype.c
> index d6f906b06c..ab180b3dda 100644
> --- a/lib/mbuf/rte_mbuf_ptype.c
> +++ b/lib/mbuf/rte_mbuf_ptype.c
> @@ -50,6 +50,7 @@ const char *rte_get_ptype_l4_name(uint32_t ptype)
>   case RTE_PTYPE_L4_ICMP: return "L4_ICMP";
>   case RTE_PTYPE_L4_NONFRAG: return "L4_NONFRAG";
>   case RTE_PTYPE_L4_IGMP: return "L4_IGMP";
> + case RTE_PTYPE_L4_ESP: return "L4_ESP";
>   default: return "L4_UNKNOWN";
>   }
>  }
> @@ -112,6 +113,7 @@ const char *rte_get_ptype_inner_l4_name(uint32_t ptype)
>   case RTE_PTYPE_INNER_L4_SCTP: return "INNER_L4_SCTP";
>   case RTE_PTYPE_INNER_L4_ICMP: return "INNER_L4_ICMP";
>   case RTE_PTYPE_INNER_L4_NONFRAG: return "INNER_L4_NONFRAG";
> + case RTE_PTYPE_INNER_L4_ESP: return "INNER_L4_ESP";
>   default: return "INNER_L4_UNKNOWN";
>   }
>  }
> diff --git a/lib/mbuf/rte_mbuf_ptype.h b/lib/mbuf/rte_mbuf_ptype.h
> index f2276e2909..c46a94f89f 100644
> --- a/lib/mbuf/rte_mbuf_ptype.h
> +++ b/lib/mbuf/rte_mbuf_ptype.h
> @@ -247,7 +247,7 @@ extern "C" {
>   * It refers to those packets of any IP types, which can be recognized as
>   * fragmented. A fragmented packet cannot be recognized as any other L4 types
>   * (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
> - * RTE_PTYPE_L4_NONFRAG).
> + * RTE_PTYPE_L4_NONFRAG, RTE_PTYPE_L4_IGMP, RTE_PTYPE_L4_ESP).
>   *
>   * Packet format:
>   * <'ether type'=0x0800
> @@ -290,14 +290,15 @@ extern "C" {
>   *
>   * It refers to those packets of any IP types, while cannot be recognized as
>   * any of above L4 types (RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP,
> - * RTE_PTYPE_L4_FRAG, RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP).
> + * RTE_PTYPE_L4_FRAG (for IPv6), RTE_PTYPE_L4_SCTP, RTE_PTYPE_L4_ICMP,
> + * RTE_PTYPE_L4_IGMP (for IPv4), RTE_PTYPE_L4_ESP).
>   *
>   * Packet format:
>   * <'ether type'=0x0800
> - * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0, 'frag_offset'=0>
> + * | 'version'=4, 'protocol'!=[1|2|6|17|50|132], 'MF'=0, 'frag_offset'=0>
>   * or,
>   * <'ether type'=0x86DD
> - * | 'version'=6, 'next header'!=[6|17|44|132|1]>
> + * | 'version'=6, 'next header'!=[1|6|17|44|50|132]>
>   */
>  #define RTE_PTYPE_L4_NONFRAG                0x00000600
>  /**
> @@ -308,6 +309,17 @@ extern "C" {
>   * | 'version'=4, 'protocol'=2, 'MF'=0, 'frag_offset'=0>
>   */
>  #define RTE_PTYPE_L4_IGMP                   0x00000700
> +/**
> + * ESP (IP Encapsulating Security Payload) transport packet type.
> + *
> + * Packet format:
> + * <'ether type'=0x0800
> + * | 'version'=4, 'protocol'=50, 'MF'=0, 'frag_offset'=0>
> + * or,
> + * <'ether type'=0x86DD
> + * | 'version'=6, 'next header'=50>
> + */
> +#define RTE_PTYPE_L4_ESP                    0x00000800
>  /**
>   * Mask of layer 4 packet types.
>   * It is used for outer packet for tunneling cases.
> @@ -652,12 +664,24 @@ extern "C" {
>   *
>   * Packet format (inner only):
>   * <'ether type'=0x0800
> - * | 'version'=4, 'protocol'!=[6|17|132|1], 'MF'=0, 'frag_offset'=0>
> + * | 'version'=4, 'protocol'!=[1|6|17|50|132], 'MF'=0, 'frag_offset'=0>
>   * or,
>   * <'ether type'=0x86DD
> - * | 'version'=6, 'next header'!=[6|17|44|132|1]>
> + * | 'version'=6, 'next header'!=[1|6|17|44|50|132]>
>   */
>  #define RTE_PTYPE_INNER_L4_NONFRAG          0x06000000
> +/**
> + * ESP (IP Encapsulating Security Payload) transport packet type.
> + * It is used for inner packet only.
> + *
> + * Packet format (inner only):
> + * <'ether type'=0x0800
> + * | 'version'=4, 'protocol'=50, 'MF'=0, 'frag_offset'=0>
> + * or,
> + * <'ether type'=0x86DD
> + * | 'version'=6, 'next header'=50>
> + */
> +#define RTE_PTYPE_INNER_L4_ESP              0x08000000
>  /**
>   * Mask of inner layer 4 packet types.
>   */
> 







RE: [PATCH] net: improve vlan header type alignment

2024-10-17 Thread Morten Brørup
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Thursday, 17 October 2024 22.44
> 
> 13/10/2024 10:35, Morten Brørup:
> > +static_assert(sizeof(struct rte_ether_addr) == 6,
> > +   "sizeof(struct rte_ether_addr) == 6");
> > +static_assert(alignof(struct rte_ether_addr) == 2,
> > +   "alignof(struct rte_ether_addr) == 2");
> 
> Instead of repeating the condition twice,
> it would be simpler to use RTE_BUILD_BUG_ON

RTE_BUILD_BUG_ON can only be used in code blocks, so it would need to be 
wrapped in some dummy function.



[PATCH] net/gve: replace typedefs with macros in gve osdep

2024-10-17 Thread Joshua Washington
Currently, a number of integer types are typedef'd to their
corresponding userspace or RTE values. This can be problematic if these
types are already defined somewhere else, as it would cause type
collisions. This patch changes the typedefs to #define macros which are
only defined if the types are not defined already.

Fixes: c9ba2caf6302 ("net/gve/base: add OS-specific implementation")
Fixes: abf1242fbb84 ("net/gve: add struct members and typedefs for DQO")
Cc: junfeng@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Joshua Washington 
Suggested-by: David Marchand 
---
 drivers/net/gve/base/gve_osdep.h | 48 
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/net/gve/base/gve_osdep.h b/drivers/net/gve/base/gve_osdep.h
index c0ee0d567c..64181cebd6 100644
--- a/drivers/net/gve/base/gve_osdep.h
+++ b/drivers/net/gve/base/gve_osdep.h
@@ -29,22 +29,46 @@
 #include 
 #endif
 
-typedef uint8_t u8;
-typedef uint16_t u16;
-typedef uint32_t u32;
-typedef uint64_t u64;
+#ifndef u8
+#define u8 uint8_t
+#endif
+#ifndef u16
+#define u16 uint16_t
+#endif
+#ifndef u32
+#define u32 uint32_t
+#endif
+#ifndef u64
+#define u64 uint64_t
+#endif
 
-typedef rte_be16_t __sum16;
+#ifndef __sum16
+#define __sum16 rte_be16_t
+#endif
 
-typedef rte_be16_t __be16;
-typedef rte_be32_t __be32;
-typedef rte_be64_t __be64;
+#ifndef __be16
+#define __be16 rte_be16_t
+#endif
+#ifndef __be32
+#define __be32 rte_be32_t
+#endif
+#ifndef __be64
+#define __be64 rte_be64_t
+#endif
 
-typedef rte_le16_t __le16;
-typedef rte_le32_t __le32;
-typedef rte_le64_t __le64;
+#ifndef __le16
+#define __le16 rte_le16_t
+#endif
+#ifndef __le32
+#define __le32 rte_le32_t
+#endif
+#ifndef __le64
+#define __le64 rte_le64_t
+#endif
 
-typedef rte_iova_t dma_addr_t;
+#ifndef dma_addr_t
+#define dma_addr_t rte_iova_t
+#endif
 
 #define ETH_MIN_MTU   RTE_ETHER_MIN_MTU
 #define ETH_ALEN   RTE_ETHER_ADDR_LEN
-- 
2.47.0.rc1.288.g06298d1525-goog



Updated Invitation: Adding support for PCIe steering tags in DPDK

2024-10-17 Thread Data Plane Development Kit - Meetings






This meeting has changed times. The instructions to join the call are the same.




Adding support for PCIe steering tags in DPDK

When: Wednesday, October 23rd, 9:00 America/Chicago



Meeting Description:
We discussed adding the PCIe steering tag support to DPDK. This feature allows for stashing the descriptors and packet data closer to the CPUs, possibly allowing for lower latency and higher throughput. This feature requires contributions from CPU vendors and NIC vendors. The goal of the meeting is to present the next version of the API and seek support for implementation from other participants in the community.

Agenda:
- Brief introduction to the feature
- Introduce the APIs from RFC v2 (this will be submitted to the community before the call)
- Dependencies on kernel support - API for reading steering tags
- Addressing ABI in advance as patches will not be ready by 24.11





Ways to join meeting:



1. Join from PC, Mac, iPad, or Android
Meeting link: https://zoom-lfx.platform.linuxfoundation.org/meeting/94917063595?password=77f36625-ad41-4b9c-b067-d33e68c3a29e



2. Join via audio

One tap mobile:
US: +12532158782,,94917063595#*270522# or +13462487799,,94917063595#*270522#


Or dial: 
US: +1 253 215 8782 or +1 346 248 7799 or +1 669 900 6833 or +1 301 715 8592 or +1 312 626 6799 or +1 646 374 8656 or 877 369 0926 (Toll Free) or 855 880 1246 (Toll Free)
Canada: +1 647 374 4685 or +1 647 558 0588 or +1 778 907 2071 or +1 204 272 7920 or +1 438 809 7799 or +1 587 328 1099 or 855 703 8985 (Toll Free)

Meeting ID: 94917063595

Meeting Passcode: 270522







Re: [PATCH 5/6] net: add smaller IPv4 cksum function for simple cases

2024-10-17 Thread Bruce Richardson
On Thu, Oct 17, 2024 at 07:15:10PM +0200, Morten Brørup wrote:
> > +/**
> > + * Process the IPv4 checksum of an IPv4 header without any extensions.
> > + *
> > + * The checksum field does NOT have to be set by the caller, the field
> > + * is skipped by the calculation.
> > + *
> > + * @param ipv4_hdr
> > + *   The pointer to the contiguous IPv4 header.
> > + * @return
> > + *   The complemented checksum to set in the IP packet.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_ipv4_cksum_simple(const struct rte_ipv4_hdr *ipv4_hdr)
> > +{
> > +   const uint16_t *v16_h;
> > +   uint32_t ip_cksum;
> > +
> > +   /*
> > +* Compute the sum of successive 16-bit words of the IPv4 header,
> > +* skipping the checksum field of the header.
> > +*/
> > +   v16_h = (const unaligned_uint16_t *)&ipv4_hdr->version_ihl;
> > +   ip_cksum = v16_h[0] + v16_h[1] + v16_h[2] + v16_h[3] +
> > +   v16_h[4] + v16_h[6] + v16_h[7] + v16_h[8] + v16_h[9];
> > +
> > +   /* reduce 32 bit checksum to 16 bits and complement it */
> > +   ip_cksum = (ip_cksum & 0xffff) + (ip_cksum >> 16);
> > +   ip_cksum = (ip_cksum & 0xffff) + (ip_cksum >> 16);
> > +   ip_cksum = (~ip_cksum) & 0xffff;
> > +   return (ip_cksum == 0) ? 0xffff : (uint16_t) ip_cksum;
> 
> The zero exception does not apply to the checksum stored in the IP header, 
> only to the checksum in the UDP header.
> 

I was wondering about that, because I didn't see it mentioned anywhere in
the RFCs I consulted, but on the other hand all the implementations in the
code seemed to have the check for zero.

> > +}
> 
> Besides that, for the series,

So, just to confirm, the zero check at the end of the new ip_cksum_simple
function should be removed and we always return the computed value
directly?

> Acked-by: Morten Brørup 
> 


RE: [EXTERNAL] [PATCH v5 2/4] cryptodev: add ec points to sm2 op

2024-10-17 Thread Kusztal, ArkadiuszX



> -Original Message-
> From: Akhil Goyal 
> Sent: Friday, October 11, 2024 2:18 PM
> To: Kusztal, ArkadiuszX ; dev@dpdk.org
> Cc: Dooley, Brian 
> Subject: RE: [EXTERNAL] [PATCH v5 2/4] cryptodev: add ec points to sm2 op
> 
> > In the case when PMD cannot support the full process of the SM2, but
> > elliptic curve computation only, additional fields are needed to
> > handle such a case.
> >
> > Points C1, kP therefore were added to the SM2 crypto operation struct.
> >
> > Signed-off-by: Arkadiusz Kusztal 
> > ---
> >  lib/cryptodev/rte_crypto_asym.h | 53
> > ++--
> > -
> >  1 file changed, 39 insertions(+), 14 deletions(-)
> >
> > diff --git a/lib/cryptodev/rte_crypto_asym.h
> > b/lib/cryptodev/rte_crypto_asym.h index 2af6a307f6..65b1a081b1 100644
> > --- a/lib/cryptodev/rte_crypto_asym.h
> > +++ b/lib/cryptodev/rte_crypto_asym.h
> > @@ -607,6 +607,8 @@ enum rte_crypto_sm2_op_capa {
> > /**< Random number generator supported in SM2 ops. */
> > RTE_CRYPTO_SM2_PH,
> > /**< Prehash message before crypto op. */
> > +   RTE_CRYPTO_SM2_PARTIAL,
> > +   /**< Calculate elliptic curve points only. */
> >  };
> >
> >  /**
> > @@ -634,20 +636,43 @@ struct rte_crypto_sm2_op_param {
> >  * will be overwritten by the PMD with the decrypted length.
> >  */
> >
> > -   rte_crypto_param cipher;
> > -   /**<
> > -* Pointer to input data
> > -* - to be decrypted for SM2 private decrypt.
> > -*
> > -* Pointer to output data
> > -* - for SM2 public encrypt.
> > -* In this case the underlying array should have been allocated
> > -* with enough memory to hold ciphertext output (at least X bytes
> > -* for prime field curve of N bytes and for message M bytes,
> > -* where X = (C1 || C2 || C3) and computed based on SM2 RFC as
> > -* C1 (1 + N + N), C2 = M, C3 = N. The cipher.length field will
> > -* be overwritten by the PMD with the encrypted length.
> > -*/
> > +   union {
> > +   rte_crypto_param cipher;
> > +   /**<
> > +* Pointer to input data
> > +* - to be decrypted for SM2 private decrypt.
> > +*
> > +* Pointer to output data
> > +* - for SM2 public encrypt.
> > +* In this case the underlying array should have been allocated
> > +* with enough memory to hold ciphertext output (at least X bytes
> > +* for prime field curve of N bytes and for message M bytes,
> > +* where X = (C1 || C2 || C3) and computed based on SM2 RFC as
> > +* C1 (1 + N + N), C2 = M, C3 = N. The cipher.length field will
> > +* be overwritten by the PMD with the encrypted length.
> > +*/
> > +   struct {
> > +   struct rte_crypto_ec_point C1;
> > +   /**<
> > +* This field is used only when PMD does not support the full
> > +* process of the SM2 encryption/decryption, but the elliptic
> > +* curve part only.
> > +*
> > +* In the case of encryption, it is an output - point C1 = (x1,y1).
> > +* In the case of decryption, it is an input - point C1 = (x1,y1)
> > +*
> > +*/
> > +   struct rte_crypto_ec_point kP;
> > +   /**<
> > +* This field is used only when PMD does not support the full
> > +* process of the SM2 encryption/decryption, but the elliptic
> > +* curve part only.
> > +*
> > +* It is an output in the encryption case, it is a point
> > +* [k]P = (x2,y2)
> > +*/
> 
> It is better to keep the variable names in lower case.
> c1 and kp should be fine.

The reason for keeping some of the letters in uppercase is that it corresponds 
to the general convention of naming for these types.
That's why we have dQ, qInv in RSA key for example, not dq, qinv.

> 
> > +   };
> > +   };
> >
> > rte_crypto_uint id;
> > /**< The SM2 id used by signer and verifier. */
> > --
> > 2.13.6



Re: [PATCH] examples/l3fwd: support setting the data size of mbuf

2024-10-17 Thread Stephen Hemminger
On Wed, 16 Oct 2024 16:22:32 +0800
Chaoyong He  wrote:

> From: Long Wu 
> 
> The previous code used a macro as the data size for mbuf
> to create the mempool and users cannot modify the size.
> 
> Now modify the code to support setting the data size of
> mbuf by '--mbuf-size' parameter. If user does not add the
> parameter in start command line, the default size is still
> 'RTE_MBUF_DEFAULT_BUF_SIZE'.
> 
> Examples:
> dpdk-l3fwd -l 0-3 -- -p 0x03 --mbuf-size=4096
> 
> Signed-off-by: Long Wu 
> Reviewed-by: Chaoyong He 

Patch has build failures

*Build Failed #1:
OS: OpenAnolis8.9-64
Target: x86_64-native-linuxapp-gcc
FAILED: examples/dpdk-l3fwd.p/l3fwd_main.c.o 
gcc -Iexamples/dpdk-l3fwd.p -Iexamples -I../examples -Iexamples/l3fwd 
-I../examples/l3fwd -I../examples/common -I. -I.. -Iconfig -I../config 
-Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include 
-I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include 
-Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs 
-I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -I../lib/metrics 
-Ilib/telemetry -I../lib/telemetry -Ilib/mempool -I../lib/mempool -Ilib/ring 
-I../lib/ring -Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/ethdev 
-I../lib/ethdev -Ilib/meter -I../lib/meter -Ilib/cmdline -I../lib/cmdline 
-Ilib/acl -I../lib/acl -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu 
-Ilib/lpm -I../lib/lpm -Ilib/fib -I../lib/fib -Ilib/rib -I../lib/rib 
-Ilib/eventdev -I../lib/eventdev -Ilib/timer -I../lib/timer -Ilib/cryptodev 
-I../lib/cryptodev -Ilib/dmadev -I../lib/dmadev -fdiagnostics-color=always 
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Werror -std=c11 -O3 
-include rte_config.h -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral 
-Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs 
-Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes 
-Wundef -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers 
-D_GNU_SOURCE -march=native -mrtm -Wno-format-truncation 
-DALLOW_EXPERIMENTAL_API -MD -MQ examples/dpdk-l3fwd.p/l3fwd_main.c.o -MF 
examples/dpdk-l3fwd.p/l3fwd_main.c.o.d -o examples/dpdk-l3fwd.p/l3fwd_main.c.o 
-c ../examples/l3fwd/main.c
../examples/l3fwd/main.c: In function ‘parse_args’:
../examples/l3fwd/main.c:944:44: error: comparison is always false due to 
limited range of data type [-Werror=type-limits]
if (mbuf_seg_size <= 0 || mbuf_seg_size > 0x)
^
cc1: all warnings being treated as errors


Re: [PATCH v1 1/1] mldev: introduce data type conversion functions

2024-10-17 Thread Thomas Monjalon
31/07/2024 08:32, Srikanth Yalavarthi:
> Introduced data type conversion functions with support for
> user defined scale factor and zero-point. Updated library
> functions to support asymmetric / affine conversion for
> integer types.
> 
> Signed-off-by: Srikanth Yalavarthi 

Applied, thanks.





Community Call for Adding Support of PCIe Steering Tags Support in DPDK

2024-10-17 Thread Wathsala Wathawana Vithanage
A DPDK community call on adding support for PCIe steering tags is scheduled for 
10/23/24 at 9AM CST.
Steering tags allow for the stashing of descriptors and packet data closer to 
the CPUs, possibly allowing for lower latency and higher throughput.
This feature requires contributions from CPU vendors and NIC vendors.
The meeting's goal is to present the next version of the API and seek support 
for its implementation from other community participants.

Agenda:
- Brief introduction to the feature
- Introduce the APIs from RFC v2 (this will be submitted to the community 
before the call)
- Dependencies on kernel support - API for reading steering tags
- Addressing ABI in advance as patches will not be ready by 24.11

LFX meeting registration link: 
https://zoom-lfx.platform.linuxfoundation.org/meeting/94917063595?password=77f36625-ad41-4b9c-b067-d33e68c3a29e&invite=true


RE: [PATCH] maintainers: remove Pallavi as windows maintainer

2024-10-17 Thread Kadam, Pallavi



-Original Message-
From: Richardson, Bruce  
Sent: Thursday, October 17, 2024 11:59 AM
To: dev@dpdk.org
Cc: Kadam, Pallavi ; Richardson, Bruce 

Subject: [PATCH] maintainers: remove Pallavi as windows maintainer

Pallavi Kadam is no longer working on DPDK, so she has requested that her name 
be removed from the maintainers file.

Signed-off-by: Bruce Richardson 
Acked-by: Pallavi Kadam 


Re: [PATCH v1 1/1] mldev: add scale and zero point to I/O info struct

2024-10-17 Thread Thomas Monjalon
31/07/2024 08:25, Srikanth Yalavarthi:
> Added scale and zero point to I/O information structure.
> This would provision sharing the recommended sclae factor
> and zero point to the user for quantization process.
> 
> Signed-off-by: Srikanth Yalavarthi 

Applied with a Doxygen comment fixed.




Re: [Patch v5] net/netvsc: fix number Tx queues > Rx queues

2024-10-17 Thread Ferruh Yigit
On 10/17/2024 8:20 PM, lon...@linuxonhyperv.com wrote:
> From: Alan Elder 
> 
> The previous code allowed the number of Tx queues to be set higher than
> the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault due to accessing beyond the end
> of the dev->data->rx_queues[] array.
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue
> meaning that an event buffer is allocated to handle receiving Tx
> completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues
> and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthem...@microsoft.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Alan Elder 
> Signed-off-by: Long Li 
>

Applied to dpdk-next-net/main, thanks.

Checkpatch warning fixed while merging.


Re: [Patch v5] net/netvsc: fix number Tx queues > Rx queues

2024-10-17 Thread Ferruh Yigit
On 10/18/2024 12:02 AM, Stephen Hemminger wrote:
> On Thu, 17 Oct 2024 22:38:49 +
> Long Li  wrote:
> 
>>> Subject: Re: [Patch v5] net/netvsc: fix number Tx queues > Rx queues
>>>
>>> On Thu, 17 Oct 2024 12:20:29 -0700
>>> lon...@linuxonhyperv.com wrote:
>>>   
>>>> +static void
>>>> +hn_rx_queue_free_common(struct hn_rx_queue *rxq) {
>>>> +  if (!rxq)
>>>> +          return;
>>>> +
>>>> +  rte_free(rxq->rxbuf_info);
>>>> +  rte_free(rxq->event_buf);
>>>> +  rte_free(rxq);
>>>> +}
>>>
>>> Minor nit, DPDK style is for the bracket on the next line.
>>> Checkpatch will complain about this.  
>>
>> Is it okay to take the patch as is, or should I send a v6 to have it fixed?
> 
> It is ok as is, but followup to fix the minor stuff like this would be good.
>

I will fix this one while merging.


Re: [PATCH] net/gve: replace typedefs with macros in gve osdep

2024-10-17 Thread Ferruh Yigit
On 10/18/2024 12:42 AM, Joshua Washington wrote:
> Currently, a number of integer types are typedef'd to their
> corresponding userspace or RTE values. This can be problematic if these
> types are already defined somewhere else, as it would cause type
> collisions. This patch changes the typedefs to #define macros which are
> only defined if the types are not defined already.
> 
> Fixes: c9ba2caf6302 ("net/gve/base: add OS-specific implementation")
> Fixes: abf1242fbb84 ("net/gve: add struct members and typedefs for DQO")
> Cc: junfeng@intel.com
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Joshua Washington 
> Suggested-by: David Marchand 
>

Thanks Joshua, probably better to get this fix directly to main for
-rc1, in patchwork assigned it to David.



[DPDK/DTS Bug 1567] dts: Replace the helloworld testsuite with a testsuite which starts testpmd but does nothing

2024-10-17 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=1567

Bug ID: 1567
   Summary: dts: Replace the helloworld testsuite with a testsuite
which starts testpmd but does nothing
   Product: DPDK
   Version: unspecified
  Hardware: All
OS: All
Status: UNCONFIRMED
  Severity: minor
  Priority: Normal
 Component: DTS
  Assignee: dev@dpdk.org
  Reporter: pr...@iol.unh.edu
CC: juraj.lin...@pantheon.tech, pr...@iol.unh.edu
  Target Milestone: ---

This morning during the DTS call we discussed removing the build
process method for building example apps (example apps will not be used in new
DTS) and removing the helloworld testsuite accordingly. 

The question was raised of whether the helloworld testsuite has any actual
value and whether it should be replaced or not. One point made was that running
the helloworld suite can be a sanity check for a user, showing that they have
set up their environment correctly and are running the testsuite correctly. So,
there may be a use case for continuing to provide such a testsuite which
essentially does nothing. 

Paul proposed that we replace it with a suite which just starts testpmd, does
nothing (or something inconsequential) and then stops.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH dpdk v2] mbuf: fix strict aliasing error in allocator

2024-10-17 Thread Thomas Monjalon
25/09/2024 17:47, Stephen Hemminger:
> On Wed, 25 Sep 2024 11:40:54 -0400
> Robin Jarry  wrote:
> 
> > From: Robin Jarry 
> > To: dev@dpdk.org
> > Subject: [PATCH dpdk v2] mbuf: fix strict aliasing error in allocator
> > Date: Wed, 25 Sep 2024 11:40:54 -0400
> > 
> > When building an application with -fstrict-aliasing -Wstrict-aliasing=2,
> > we get errors triggered by rte_mbuf_raw_alloc() which is called inline
> > from rte_pktmbuf_alloc().
> > 
> >  ../dpdk/lib/mbuf/rte_mbuf.h: In function ‘rte_mbuf_raw_alloc’:
> >  ../dpdk/lib/mbuf/rte_mbuf.h:600:42: error: dereferencing type-punned
> >  pointer might break strict-aliasing rules [-Werror=strict-aliasing]
> >600 | if (rte_mempool_get(mp, (void **)&m) < 0)
> >|  ^~
> > 
> > Avoid incorrect casting by using an inline union variable.
> > 
> > Signed-off-by: Robin Jarry 
> 
> Thanks, union is safer than cast.
> 
> Reviewed-by: Stephen Hemminger 

Applied, thanks.





Rescheduling next week's DTS meeting

2024-10-17 Thread Patrick Robb
Hello,

There is a joint Governing Board & Tech Board meeting next Thursday
(October 24) at 9AM EST, when we normally have the DTS meeting. I
think I need to be available to join this, so we will need to
reschedule the DTS call.

I am going to preliminarily move it to Friday at the normal time, but
I can adjust to a different time or day if needed. Thanks.


[Patch v5] net/netvsc: fix number Tx queues > Rx queues

2024-10-17 Thread longli
From: Alan Elder 

The previous code allowed the number of Tx queues to be set higher than
the number of Rx queues.  If a packet was sent on a Tx queue with index
>= number Rx queues there was a segfault due to accessing beyond the end
of the dev->data->rx_queues[] array.

This commit fixes the issue by creating an Rx queue for every Tx queue
meaning that an event buffer is allocated to handle receiving Tx
completion messages.

mbuf pool and Rx ring are not allocated for these additional Rx queues
and RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthem...@microsoft.com
Cc: sta...@dpdk.org

Signed-off-by: Alan Elder 
Signed-off-by: Long Li 
---
v5:
* Resend/fixed up the last version of the patch garbled in patchwork

v4:
* Include segfault core stack in commit message

v3:
* Handle case of Rx queue creation failure in hn_dev_tx_queue_setup.
* Re-use rx queue if it has already been allocated.
* Don't allocate an mbuf if pool is NULL.  This avoids segfault if RSS
  configuration is incorrect.

v2:
* Remove function declaration for static non-member function

 drivers/net/netvsc/hn_ethdev.c |  9 +
 drivers/net/netvsc/hn_rxtx.c   | 68 +-
 2 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index f8cb05a118..1736cb5d07 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@ static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
if (reta_conf[idx].mask & mask)
hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+   /*
+* Ensure we don't allow config that directs traffic to an Rx
+* queue that we aren't going to poll
+*/
+   if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+   PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+   return -EINVAL;
+   }
}
 
err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 870f62e5fa..3e5386aaf1 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -222,6 +222,16 @@ static void hn_reset_txagg(struct hn_tx_queue *txq)
txq->agg_prevpkt = NULL;
 }
 
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq) {
+   if (!rxq)
+   return;
+
+   rte_free(rxq->rxbuf_info);
+   rte_free(rxq->event_buf);
+   rte_free(rxq);
+}
+
 int
 hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
  uint16_t queue_idx, uint16_t nb_desc,
@@ -231,6 +241,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
struct hn_data *hv = dev->data->dev_private;
struct hn_tx_queue *txq;
+   struct hn_rx_queue *rxq = NULL;
char name[RTE_MEMPOOL_NAMESIZE];
uint32_t tx_free_thresh;
int err = -ENOMEM;
@@ -289,6 +300,27 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
goto error;
}
 
+   /*
+* If there are more Tx queues than Rx queues, allocate rx_queues
+* with event buffer so that Tx completion messages can still be
+* received
+*/
+   if (queue_idx >= dev->data->nb_rx_queues) {
+   rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+
+   if (!rxq) {
+   err = -ENOMEM;
+   goto error;
+   }
+
+   /*
+* Don't allocate mbuf pool or rx ring.  RSS is always configured
+* to ensure packets aren't received by this Rx queue.
+*/
+   rxq->mb_pool = NULL;
+   rxq->rx_ring = NULL;
+   }
+
txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
txq->agg_pktmax = hv->rndis_agg_pkts;
txq->agg_align  = hv->rndis_agg_align;
@@ -299,12 +331,15 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 socket_id, tx_conf);
if (err == 0) {
dev->data->tx_queues[queue_idx] = txq;
+   if (rxq != NULL)
+   dev->data->rx_queues[queue_idx] = rxq;
return 0;
}
 
 error:
rte_mempool_free(txq->txdesc_pool);
rte_memzone_free(txq->tx_rndis_mz);
+   hn_rx_queue_free_common(rxq);
rte_free(txq);
return err;
 }
@@ -351,6 +386,12 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
if (!txq)
return;
+   /*
+* Free any Rx queues allocated for a Tx queue without a corresponding
+* Rx queue
+*/
+   if (qid >= dev->data->nb_rx_queues)
+   hn_rx_queue_free_common(dev->data->rx_queues[qid]);
 
rte_mempool_free(txq->txdesc_pool);
 
@@ -540,10
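
The RETA check added to hn_rss_reta_update() in the patch above comes down to one rule: reject any indirection table entry that names a queue index at or beyond nb_rx_queues. Below is a minimal standalone sketch of that rule, assuming a flat uint16_t table. demo_reta_validate() is a hypothetical helper written for illustration, not a function from the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the netvsc driver: return 0 when every
 * entry of an RSS indirection table names an Rx queue that will be polled,
 * and -EINVAL as soon as one entry points past the configured Rx queues.
 */
static int
demo_reta_validate(const uint16_t *reta, size_t reta_size,
		uint16_t nb_rx_queues)
{
	size_t i;

	for (i = 0; i < reta_size; i++) {
		if (reta[i] >= nb_rx_queues)
			return -EINVAL;
	}

	return 0;
}
```

With 4 Rx queues, a table holding only indexes 0 to 3 passes, while a table containing index 4 is rejected. This matters for the Tx-only queues in the patch, which have no mbuf pool or Rx ring and so must never receive traffic.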

RE: [Patch v5] net/netvsc: fix number Tx queues > Rx queues

2024-10-17 Thread Long Li
> Subject: Re: [Patch v5] net/netvsc: fix number Tx queues > Rx queues
> 
> On Thu, 17 Oct 2024 12:20:29 -0700
> lon...@linuxonhyperv.com wrote:
> 
> > +static void
> > +hn_rx_queue_free_common(struct hn_rx_queue *rxq) {
> > +   if (!rxq)
> > +   return;
> > +
> > +   rte_free(rxq->rxbuf_info);
> > +   rte_free(rxq->event_buf);
> > +   rte_free(rxq);
> > +}
> 
> Minor nit, DPDK style is for the bracket on the next line.
> Checkpatch will complain about this.

Is it okay to take the patch as is, or should I send a v6 to have it fixed?
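
For reference, the layout the review comment asks for puts the return type on its own line and the opening brace of the function body on the line after the signature. Below is a minimal sketch of that layout, using a stand-in struct and free() in place of rte_free() so that it builds without DPDK. The demo_ names are hypothetical.

```c
#include <stdlib.h>

/* Stand-in for the driver's Rx queue; only the fields freed below. */
struct demo_rx_queue {
	void *rxbuf_info;
	void *event_buf;
};

/* Return type on its own line, opening brace on the line after the signature. */
static void
demo_rx_queue_free_common(struct demo_rx_queue *rxq)
{
	if (rxq == NULL)
		return;

	free(rxq->rxbuf_info);
	free(rxq->event_buf);
	free(rxq);
}
```

The NULL check at the top is what lets the error path in hn_dev_tx_queue_setup() call the helper unconditionally, even when no Rx queue was allocated.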


[PATCH dpdk v2] net: add more icmp types and code

2024-10-17 Thread Robin Jarry
Add more ICMP message types and codes based on RFC 792. Change the
namespace prefix from RTE_IP_ICMP_ to RTE_ICMP_ to allow differentiation
between types and codes.

Do not include deprecated message types as described in RFC 6918.

Link: https://www.rfc-editor.org/rfc/rfc792
Link: https://www.rfc-editor.org/rfc/rfc6918
Signed-off-by: Robin Jarry 
Acked-by: Stephen Hemminger 
Acked-by: Ferruh Yigit 
---

Notes:
v2: added release note

 app/test-pmd/icmpecho.c| 10 
 doc/guides/rel_notes/release_24_11.rst |  6 +
 lib/net/rte_icmp.h | 33 --
 3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 68524484e305..4ef23ae67ac4 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -416,7 +416,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
icmp_h = (struct rte_icmp_hdr *) ((char *)ip_h +
  sizeof(struct rte_ipv4_hdr));
if (! ((ip_h->next_proto_id == IPPROTO_ICMP) &&
-  (icmp_h->icmp_type == RTE_IP_ICMP_ECHO_REQUEST) &&
+  (icmp_h->icmp_type == RTE_ICMP_TYPE_ECHO_REQUEST) &&
   (icmp_h->icmp_code == 0))) {
rte_pktmbuf_free(pkt);
continue;
@@ -440,7 +440,7 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
 * - switch the request IP source and destination
 *   addresses in the reply IP header,
 * - keep the IP header checksum unchanged.
-* - set RTE_IP_ICMP_ECHO_REPLY in ICMP header.
+* - set RTE_ICMP_TYPE_ECHO_REPLY in ICMP header.
 * ICMP checksum is computed by assuming it is valid in the
 * echo request and not verified.
 */
@@ -463,10 +463,10 @@ reply_to_icmp_echo_rqsts(struct fwd_stream *fs)
ip_h->src_addr = ip_h->dst_addr;
ip_h->dst_addr = ip_addr;
}
-   icmp_h->icmp_type = RTE_IP_ICMP_ECHO_REPLY;
+   icmp_h->icmp_type = RTE_ICMP_TYPE_ECHO_REPLY;
cksum = ~icmp_h->icmp_cksum & 0xffff;
-   cksum += ~RTE_BE16(RTE_IP_ICMP_ECHO_REQUEST << 8) & 0xffff;
-   cksum += RTE_BE16(RTE_IP_ICMP_ECHO_REPLY << 8);
+   cksum += ~RTE_BE16(RTE_ICMP_TYPE_ECHO_REQUEST << 8) & 0xffff;
+   cksum += RTE_BE16(RTE_ICMP_TYPE_ECHO_REPLY << 8);
cksum = (cksum & 0xffff) + (cksum >> 16);
cksum = (cksum & 0xffff) + (cksum >> 16);
icmp_h->icmp_cksum = ~cksum;
diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index d2301461ce35..0e9c81b32b20 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -238,6 +238,12 @@ New Features
   Added ability for node to advertise and update multiple xstat counters,
   that can be retrieved using ``rte_graph_cluster_stats_get``.
 
+* **Added new ICMP message types and codes.**
+
+  New ICMP message types and codes from RFC 792 are available in ``rte_icmp.h``.
+  Message types now use the ``RTE_ICMP_TYPE_`` prefix.
+  Message codes use the ``RTE_ICMP_CODE_`` prefix.
+
 
 Removed Items
 -------------
diff --git a/lib/net/rte_icmp.h b/lib/net/rte_icmp.h
index 7a33280aa1e4..e69d68ab6e22 100644
--- a/lib/net/rte_icmp.h
+++ b/lib/net/rte_icmp.h
@@ -50,8 +50,37 @@ struct rte_icmp_hdr {
 } __rte_packed;
 
 /* ICMP packet types */
-#define RTE_IP_ICMP_ECHO_REPLY   0
-#define RTE_IP_ICMP_ECHO_REQUEST 8
+#define RTE_ICMP_TYPE_ECHO_REPLY 0
+#define RTE_IP_ICMP_ECHO_REPLY \
+   (RTE_DEPRECATED(RTE_IP_ICMP_ECHO_REPLY) RTE_ICMP_TYPE_ECHO_REPLY)
+#define RTE_ICMP_TYPE_DEST_UNREACHABLE 3
+#define RTE_ICMP_TYPE_REDIRECT 5
+#define RTE_ICMP_TYPE_ECHO_REQUEST 8
+#define RTE_IP_ICMP_ECHO_REQUEST \
+   (RTE_DEPRECATED(RTE_IP_ICMP_ECHO_REQUEST) RTE_ICMP_TYPE_ECHO_REQUEST)
+#define RTE_ICMP_TYPE_TTL_EXCEEDED 11
+#define RTE_ICMP_TYPE_PARAM_PROBLEM 12
+#define RTE_ICMP_TYPE_TIMESTAMP_REQUEST 13
+#define RTE_ICMP_TYPE_TIMESTAMP_REPLY 14
+
+/* Destination Unreachable codes */
+#define RTE_ICMP_CODE_UNREACH_NET 0
+#define RTE_ICMP_CODE_UNREACH_HOST 1
+#define RTE_ICMP_CODE_UNREACH_PROTO 2
+#define RTE_ICMP_CODE_UNREACH_PORT 3
+#define RTE_ICMP_CODE_UNREACH_FRAG 4
+#define RTE_ICMP_CODE_UNREACH_SRC 5
+
+/* Time Exceeded codes */
+#define RTE_ICMP_CODE_TTL_EXCEEDED 0
+#define RTE_ICMP_CODE_TTL_FRAG 1
+
+/* Redirect codes */
+#define RTE_ICMP_CODE_REDIRECT_NET 0
+#define RTE_ICMP_CODE_REDIRECT_HOST 1
+#define RTE_ICMP_CODE_REDIRECT_TOS_NET 2
+#define RTE_ICMP_CODE_REDIRECT_TOS_HOST 3
+
 #define RTE_ICMP6_ECHO_REQUEST 128
 #define RTE_ICMP6_ECHO_REPLY   129
 
-- 
2.47.0
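
The checksum arithmetic touched by the icmpecho.c hunk above is an incremental update: instead of summing the whole message again, it removes the old type/code word from the checksum and adds the new one (HC' = ~(~HC + ~m + m'), as in RFC 1624). Below is a standalone sketch of that idea using the new macro names from the patch. It works on a raw byte buffer rather than struct rte_icmp_hdr, and the demo_ helpers are hypothetical. The two macros are redefined locally, with the values from the patch, only so that the block builds without DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch; guarded in case rte_icmp.h is included. */
#ifndef RTE_ICMP_TYPE_ECHO_REPLY
#define RTE_ICMP_TYPE_ECHO_REPLY 0
#endif
#ifndef RTE_ICMP_TYPE_ECHO_REQUEST
#define RTE_ICMP_TYPE_ECHO_REQUEST 8
#endif

/* Full one's-complement checksum over big-endian 16-bit words (RFC 1071). */
static uint16_t
demo_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/*
 * Turn an echo request into an echo reply in place and patch the checksum
 * incrementally. msg[0] is the type, msg[1] the code, msg[2..3] the checksum.
 */
static void
demo_echo_request_to_reply(uint8_t *msg)
{
	uint32_t cksum = ((uint32_t)msg[2] << 8) | msg[3];

	msg[0] = RTE_ICMP_TYPE_ECHO_REPLY;
	cksum = ~cksum & 0xffff;
	/* Subtract the old type word, then add the new one. */
	cksum += ~((uint32_t)RTE_ICMP_TYPE_ECHO_REQUEST << 8) & 0xffff;
	cksum += (uint32_t)RTE_ICMP_TYPE_ECHO_REPLY << 8;
	/* Fold the carries back in twice, as the testpmd code does. */
	cksum = (cksum & 0xffff) + (cksum >> 16);
	cksum = (cksum & 0xffff) + (cksum >> 16);
	cksum = ~cksum & 0xffff;
	msg[2] = (uint8_t)(cksum >> 8);
	msg[3] = (uint8_t)(cksum & 0xff);
}
```

As in testpmd, the request checksum is assumed to be valid and is not verified. Recomputing the checksum from scratch over the reply, with the checksum field zeroed, should give the same value as the incremental update.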



Re: [PATCH dpdk] net: add more icmp types and code

2024-10-17 Thread Robin Jarry

Ferruh Yigit, Oct 18, 2024 at 00:22:
> On 10/17/2024 5:02 PM, Stephen Hemminger wrote:
>> On Thu, 17 Oct 2024 10:33:22 +0200
>> Robin Jarry wrote:
>>
>>> Add more ICMP message types and codes based on RFC 792. Change the
>>> namespace prefix from RTE_IP_ICMP_ to RTE_ICMP_ to allow differentiation
>>> between types and codes.
>>>
>>> Signed-off-by: Robin Jarry
>>
>> Should add a release note for this
>>
>> Acked-by: Stephen Hemminger
>
> +1 to get this in 24.11
>
> Acked-by: Ferruh Yigit
>
>
> @Thomas, probably it is better to get this for -rc1 and as next-net
> already pulled can this directly go to main?

I will probably send a v2 with a release note.

> Btw, is following also required? Just because it is documented in RFC:
>    15  Information Request
>    16  Information Reply

It seems some of the original codes have been deprecated:

https://www.rfc-editor.org/rfc/rfc6918#section-2.2

> Ack stands if above added or not.




Re: [PATCH dpdk] net: add more icmp types and code

2024-10-17 Thread Ferruh Yigit
On 10/17/2024 11:33 PM, Robin Jarry wrote:
> Ferruh Yigit, Oct 18, 2024 at 00:22:
>> On 10/17/2024 5:02 PM, Stephen Hemminger wrote:
>>> On Thu, 17 Oct 2024 10:33:22 +0200
>>> Robin Jarry  wrote:
>>>
>>>> Add more ICMP message types and codes based on RFC 792. Change the
>>>> namespace prefix from RTE_IP_ICMP_ to RTE_ICMP_ to allow differentiation
>>>> between types and codes.
>>>>
>>>> Signed-off-by: Robin Jarry
>>>
>>> Should add a release note for this
>>>
>>> Acked-by: Stephen Hemminger 
>>>
>>
>> +1 to get this in 24.11
>>
>> Acked-by: Ferruh Yigit 
>>
>>
>> @Thomas, probably it is better to get this for -rc1 and as next-net
>> already pulled can this directly go to main?
> 
> I will probably send a v2 with a release note.
> 
>>
>> Btw, is following also required? Just because it is documented in RFC:
>>    15  Information Request
>>    16  Information Reply
> 
> It seems some of the original codes have been deprecated:
> 
> https://www.rfc-editor.org/rfc/rfc6918#section-2.2
> 

Ack


Re: [PATCH] net/nfp: fix RSS failed on VXLAN inner layer

2024-10-17 Thread Ferruh Yigit
On 10/16/2024 9:17 AM, Chaoyong He wrote:
> From: Long Wu 
> 
> Before the commit 5126a904fae0
> ("net/nfp: use offload flag to control VXLAN configuration"),
> 'nfp_net_start()' enabled the NFP_NET_CFG_CTRL_VXLAN flag if the
> hardware had the capability, and 'udp_tunnel_port_add()' and
> 'udp_tunnel_port_del()' only added and deleted the port.
>
> But the commit 5126a904fae0
> ("net/nfp: use offload flag to control VXLAN configuration")
> wrongly made the VXLAN inner RSS flag depend on the Tx offload
> RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO, so NFP_NET_CFG_CTRL_VXLAN could not
> be enabled and 'udp_tunnel_port_add()' and 'udp_tunnel_port_del()'
> could not do their work.
>
> This commit fixes the problem and slightly improves on the initial
> logic: it moves the enabling of NFP_NET_CFG_CTRL_VXLAN into
> 'udp_tunnel_port_add()' and adds the disabling of
> NFP_NET_CFG_CTRL_VXLAN to 'udp_tunnel_port_del()', which makes the
> whole solution more complete and easier to understand.
> 
> Fixes: 5126a904fae0 ("net/nfp: use offload flag to control VXLAN configuration")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Long Wu 
> Reviewed-by: Chaoyong He 
> Reviewed-by: Peng Zhang 
>

Applied to dpdk-next-net/main, thanks.
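
The control flow described in the quoted commit message can be sketched in isolation: the VXLAN control flag is set when a tunnel port is added and cleared when ports are removed. This is not the driver code. struct demo_hw, DEMO_CTRL_VXLAN, the two-slot port table, and the choice to clear the flag only once the last port is gone are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the driver's real definitions. */
#define DEMO_CTRL_VXLAN (UINT32_C(1) << 24)
#define DEMO_MAX_VXLAN_PORTS 2

struct demo_hw {
	uint32_t ctrl;                              /* active control flags */
	uint16_t vxlan_ports[DEMO_MAX_VXLAN_PORTS]; /* configured UDP ports */
	uint8_t vxlan_used[DEMO_MAX_VXLAN_PORTS];   /* slot occupancy */
	unsigned int vxlan_count;                   /* number of used slots */
};

static int
demo_udp_tunnel_port_add(struct demo_hw *hw, uint16_t port)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX_VXLAN_PORTS; i++) {
		if (hw->vxlan_used[i])
			continue;
		hw->vxlan_ports[i] = port;
		hw->vxlan_used[i] = 1;
		hw->vxlan_count++;
		/* Enable the flag here, independent of any Tx TSO offload. */
		hw->ctrl |= DEMO_CTRL_VXLAN;
		return 0;
	}

	return -ENOSPC;
}

static int
demo_udp_tunnel_port_del(struct demo_hw *hw, uint16_t port)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX_VXLAN_PORTS; i++) {
		if (!hw->vxlan_used[i] || hw->vxlan_ports[i] != port)
			continue;
		hw->vxlan_used[i] = 0;
		hw->vxlan_count--;
		/* Assumption: clear the flag once the last port is removed. */
		if (hw->vxlan_count == 0)
			hw->ctrl &= ~DEMO_CTRL_VXLAN;
		return 0;
	}

	return -ENOENT;
}
```

Tying the flag to port add and delete keeps the state in one place: the flag is set exactly while at least one VXLAN port is configured, and it no longer depends on an unrelated Tx offload setting.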

