RE: [EXT] Re: [PATCH v1 3/3] app/testpmd: support different input color method

2022-05-21 Thread Sunil Kumar Kori
Hi All,

Please hold off on the review, as this version is not aligned with the latest
spec changes for the input coloring mechanism.
I will send the next version soon.

Regards
Sunil Kumar Kori

> -----Original Message-----
> From: Ferruh Yigit 
> Sent: Saturday, May 21, 2022 3:40 AM
> To: Aman Singh ; Yuying Zhang
> ; Dumitrescu, Cristian
> 
> Cc: dev@dpdk.org; Xiaoyun Li ; Sunil Kumar Kori
> 
> Subject: [EXT] Re: [PATCH v1 3/3] app/testpmd: support different input color
> method
> 
> External Email
> 
> --
> On 3/1/2022 9:00 AM, sk...@marvell.com wrote:
> > From: Sunil Kumar Kori 
> >
> > Support for input coloring is added based on VLAN. Patch adds support
> > for the same.
> >
> > Signed-off-by: Sunil Kumar Kori 
> 
> Hi Aman, Yuying, Cristian,
> 
> Can you please review this patch?
> 
> For reference, patchwork link:
> https://patches.dpdk.org/project/dpdk/patch/20220301090056.1042866-3-skori@marvell.com/
> 
> Thanks,
> ferruh
> 
> <...>


RE: [PATCH 00/12] Fix compilation with gcc 12

2022-05-21 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, 20 May 2022 22.14
> 
> On Wed, 18 May 2022 12:16:45 +0200
> David Marchand  wrote:
> 
> > Fedora 36 is out since early may and comes with gcc 12.
> > This series fixes compilation or waives some checks.
> >
> > There might be something fishy with rte_memcpy on x86 but, for now,
> > the rte_memcpy related fixes are on the caller side.
> >
> > Some "base" drivers have issues, I chose the simple solution of
> waiving
> > the checks for them.
> >
> > Compilation is the only thing checked.
> > Please driver maintainers, check nothing got broken.
> >
> 
> 
> We need to purge all code still using array size of one
> instead of proper flex array member.

+1 to that!
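
For anyone unfamiliar with the idiom in question, here is a minimal sketch
(not taken from any DPDK header) contrasting the fake trailing array of one
with a C99 flexible array member, which is what the stricter GCC 12 bounds
checks expect:

#include <stdlib.h>
#include <string.h>

/* Old style: trailing "array of one" that is really variable-length.
 * GCC 12 warns when code indexes past data[0]. */
struct msg_v1 {
	size_t len;
	char data[1];
};

/* Preferred: C99 flexible array member, sized at allocation time. */
struct msg_v2 {
	size_t len;
	char data[];	/* flexible array member */
};

static struct msg_v2 *
msg_alloc(const char *payload, size_t len)
{
	struct msg_v2 *m = malloc(sizeof(*m) + len);

	if (m == NULL)
		return NULL;
	m->len = len;
	memcpy(m->data, payload, len);
	return m;
}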



RE: [PATCH 04/12] net/ena: fix build with GCC 12

2022-05-21 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, 20 May 2022 22.28
> 
> On Wed, 18 May 2022 12:16:49 +0200
> David Marchand  wrote:
> 
> > +   for (i = 0; i < RTE_DIM(default_key); ++i)
> > default_key[i] = rte_rand() & 0xff;
> 
> We should have rte_random_bytes() functionality if this gets
> used often.

Since the other pseudorandom functions are called rand, such a function should 
be named rte_rand_bytes().
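
As a rough illustration, such a helper could be layered on the existing
rte_rand() (a hypothetical sketch only, not a DPDK API):

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <rte_random.h>

/* Hypothetical: fill a buffer with pseudorandom bytes from the 64-bit
 * rte_rand() generator. */
static void
rte_rand_bytes(void *buf, size_t len)
{
	uint8_t *dst = buf;

	while (len >= sizeof(uint64_t)) {
		uint64_t r = rte_rand();

		memcpy(dst, &r, sizeof(r));
		dst += sizeof(r);
		len -= sizeof(r);
	}
	if (len > 0) {
		uint64_t r = rte_rand();

		memcpy(dst, &r, len);
	}
}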

> 
> Also, worth considering dropping DPDK random number generator
> in userspace for security reasons and just using more secure kernel
> code.

Absolutely not! We need a fast pseudorandom number generator in DPDK.

If anything, we could consider renaming the functions and header file to 
reflect that they are pseudorandom number generators, and not 
(cryptographically) random generators. That would cause an API/ABI breakage, so 
it's probably not going to happen. ;-)




Re: [PATCH v2] event/dlb2: add support for single 512B write of 4 QEs

2022-05-21 Thread Jerin Jacob
On Fri, May 20, 2022 at 1:31 AM Timothy McDaniel
 wrote:
>
> On Xeon, as 512b accesses are available, movdir64 instruction is able to
> perform 512b read and write to DLB producer port. In order for movdir64
> to be able to pull its data from store buffers (store-buffer-forwarding)
> (before actual write), data should be in single 512b write format.
> This commit add change when code is built for Xeon with 512b AVX support
> to make single 512b write of all 4 QEs instead of 4x64b writes.
>
> Signed-off-by: Timothy McDaniel 
> Acked-by: Kent Wires 
> ===
>
> Changes since V1:
> 1) Split out dlb2_event_build_hcws into two implementations, one
> that uses AVX512 instructions, and one that does not. Each implementation
> is in its own source file in order to avoid build errors if the compiler
> does not support the newer AVX512 instructions.
> 2) Update meson.build to and pull in appropriate source file based on
> whether the compiler supports AVX512VL
> 3) Check if target supports AVX512VL, and use appropriate implementation
> based on this runtime check.
> ---
>  drivers/event/dlb2/dlb2.c  | 206 +-
>  drivers/event/dlb2/dlb2_avx512.c   | 267 +
>  drivers/event/dlb2/dlb2_noavx512.c | 219 +++

Could you change the file name to dlb2_sve.c as noavx512 means it can
be NEON too.
Rest looks good to me. Will merge the next version.
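
For context, the idea behind the patch can be sketched as follows (not the
driver's actual code; the 16-byte QE layout below is a placeholder, and the
real DLB2 HCW format lives in the driver):

#include <immintrin.h>
#include <stdint.h>

/* Placeholder 16-byte queue entry. */
struct qe {
	uint64_t data;
	uint64_t meta;
};

/* Build all four QEs in one contiguous 64B buffer, then issue a single
 * 512-bit store to the producer port window instead of several smaller
 * writes (requires AVX-512F and a 64-byte aligned destination). */
static inline void
write_4qe_as_512b(void *pp_addr, const struct qe qe[4])
{
	__m512i v = _mm512_loadu_si512((const void *)qe);

	_mm512_store_si512(pp_addr, v);
}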


Re: [PATCH 04/12] net/ena: fix build with GCC 12

2022-05-21 Thread Stephen Hemminger
On Sat, 21 May 2022 11:49:47 +0200
Morten Brørup  wrote:

> > 
> > Also, worth considering dropping DPDK random number generator
> > in userspace for security reasons and just using more secure kernel
> > code.  
> 
> Absolutely not! We need a fast pseudorandom number generator in DPDK.
> 
> If anything, we could consider renaming the functions and header file to 
> reflect that they are pseudorandom number generators, and not 
> (cryptographically) random generators. That would cause an API/ABI breakage, 
> so it's probably not going to happen. ;-)


The Linux kernel has received way more attention on random numbers than
DPDK. If you follow the history, what happens is that a simple dumb LCG
or similar random number generator gets invented, and then gets used for
lots of things that people don't think need a strong generator.

Followed by DoS and other attacks where the weak random number generator
is broken when used for doing things like creating sequence numbers or
TCP port assignments.  This is then followed by even more work on the
kernel random number generator to make the default random number generator
stronger.

I bring up this history, so that DPDK won't have to repeat it.

Right now the DPDK random number generator is insecure because it uses a
long-period but weak PRNG and never reseeds itself.

See:
https://lwn.net/Articles/884875/

There is also FIPS to consider.
https://lwn.net/Articles/877607/

Since random number generators are hard, prefer that someone else do it :-)


RE: [RFC] ethdev: datapath-focused meter actions, continue

2022-05-21 Thread Alexander Kozyrev
On Thursday, May 19, 2022 13:35 Dumitrescu, Cristian 
:
> Here are a few takeaways of mine, with a few actions for Alex and Ori:

Thank you, Cristian, for compiling these takeaways. Great summary of the 
discussion.
 
> 6. Alexander Kozyrev to provide pseudo-code for the meter operation with
> the new proposal:
>   (1) meter creation;
>   (2) meter sharing;
>   (3) meter reconfiguration: do we need to remove the flow/flows
> using the meter and re-add them with a new meter object that has updated
> configuration, or can we update the meter object itself (what API?);
>   (4) meter free.

Traditional Meter Usage:

profile_id = rte_mtr_meter_profile_add(RFC_params);
policy_id = rte_mtr_meter_policy_add(actions[RED/YELLOW/GREEN]);
meter_id = rte_mtr_create(profile_id, policy_id);
rte_flow_create(pattern=5-tuple,actions=METER(meter_id));

The METER action effectively translates to the following:
1. Metering a packet stream.
2. Marking packets with an appropriate color.
3. Jump to a policy group.
4. Match on a color.
5. Execute assigned policy actions for the color.

New Meter Usage Model:
profile_id = rte_mtr_meter_profile_add(RFC_params);
*profile_obj_ptr = rte_mtr_meter_profile_get(profile_id);
rte_flow_create(pattern=5-tuple,
actions=METER_MARK(profile_obj_ptr),JUMP);
rte_flow_create(pattern=COLOR, actions=...);

The METER_MARK action effectively translates to the following:
1. Metering a packet stream.
2. Marking packets with an appropriate color.

A user is able to match the color later with the COLOR item.
In order to do this, we add the JUMP action after the METER_MARK action.

3. Jump to a policy group.
4. Match on a color.
5. Execute actions for the color.

Here we decoupled the meter profile usage from the meter policy usage
for greater flexibility and got rid of any locks related to meter_id lookup.

Another example of the meter creation to mimic the old model entirely:
profile_id = rte_mtr_meter_profile_add(RFC_params);
*profile_obj_ptr = rte_mtr_meter_profile_get(profile_id);
policy_id = rte_mtr_meter_policy_add(actions[RED/YELLOW/GREEN]);
*policy_obj_ptr = rte_mtr_meter_policy_get(policy_id);
rte_flow_create(pattern=5-tuple,
actions= METER_MARK(profile_obj_ptr, policy_obj_ptr));

In this case, we define the policy actions right away.
The main advantage is not having to look up the profile_id/policy_id.

To free the meter objects we need to do the following:
rte_flow_destroy(flow_handle);
rte_mtr_meter_policy_delete(policy_id);
rte_mtr_meter_profile_delete(profile_id);
profile_obj_ptr and policy_obj_ptr are no longer valid after that.

The meter profile configuration cannot be updated dynamically
with the current set of patches, but this can be supported later on.
For now, you have to destroy the flows and profiles and recreate them.
But rte_mtr_meter_profile_update()/rte_mtr_meter_policy_update()
can have corresponding siblings without the mtr_id parameter.
In this case, we can update the config and all the flows using them.
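
For illustration, the destroy-and-recreate path in the same pseudo-code style
as above (a sketch following this RFC, not text from it):

rte_flow_destroy(flow_handle);
rte_mtr_meter_profile_delete(profile_id);
profile_id = rte_mtr_meter_profile_add(new_RFC_params);
*profile_obj_ptr = rte_mtr_meter_profile_get(profile_id);
rte_flow_create(pattern=5-tuple, actions=METER_MARK(profile_obj_ptr),JUMP);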

The meter sharing is done via the indirect action Flow API:
profile_id = rte_mtr_meter_profile_add(RFC_params);
*profile_obj_ptr = rte_mtr_meter_profile_get(profile_id);
handle = rte_flow_action_handle_create(action= METER_MARK(profile_obj_ptr, 
NULL));
flow1 = rte_flow_create(pattern=5-tuple-1, actions=INDIRECT(handle));
flow2 = rte_flow_create(pattern=5-tuple-2, actions=INDIRECT(handle));

Once we are done with the flow rules we can free everything.
rte_flow_destroy(flow1);
rte_flow_destroy(flow2);
rte_flow_action_handle_destroy(handle);
rte_mtr_meter_profile_delete(profile_id);


Re: [RFC] ethdev: datapath-focused meter actions, continue

2022-05-21 Thread Ajit Khaparde
On Sat, May 21, 2022 at 7:49 PM Alexander Kozyrev  wrote:
>
> On Thursday, May 19, 2022 13:35 Dumitrescu, Cristian 
> :
> > Here are a few takeaways of mine, with a few actions for Alex and Ori:
>
> Thank you, Cristian, for compiling these takeaways. Great summary of the 
> discussion.
Agree.

>
> > 6. Alexander Kozyrev to provide pseudo-code for the meter operation with
> > the new proposal:

I was wondering how the PMD and application can negotiate whether
to use the old model or the new model.
Maybe the application can determine if the PMD supports the traditional model or
the new model by checking rte_mtr_meter_profile_get or rte_mtr_meter_policy_get?
Or use something else?


<...>




[RFC v2 0/7] introduce per-queue limit watermark and host shaper

2022-05-21 Thread Spike Du
LWM (limit watermark) is a per-Rx-queue attribute. When the Rx queue fullness
reaches the LWM limit, HW sends an event to the DPDK application.
The host shaper can configure a shaper rate and the lwm-triggered flag for a
host port.
The shaper limits the rate of traffic from the host port to the wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically when one
of the host port's Rx queues receives an LWM event.

These two features can be combined to control traffic from the host port to the
wire port.
The workflow is: configure an LWM on the Rx queue and enable the lwm-triggered
flag in the host shaper; after receiving an LWM event, delay a while until the
Rx queue is empty, then disable the shaper. We repeat this workflow to reduce
Rx queue drops.

Add a new ethdev API to set the LWM and a new event,
RTE_ETH_EVENT_RXQ_LIMIT_REACHED, to handle the LWM event. For the host shaper,
because it doesn't align with the existing DPDK framework and is specific to
the Nvidia NIC, use a PMD private API.

For integration with testpmd, put the private cmdline functions and the LWM
event handler in the mlx5 PMD directory by adding a new file, mlx5_testpmd.c.
Only add minimal code in testpmd to invoke interfaces from mlx5_testpmd.c.
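
A rough sketch of this workflow from the application side (the LWM setter and
the event below are introduced by this series, so they are not in released
DPDK; the shaper disable step is only described in a comment):

#include <stdio.h>
#include <rte_ethdev.h>

/* Callback for the proposed RTE_ETH_EVENT_RXQ_LIMIT_REACHED event. A real
 * application would delay until the Rx queue drains, then disable the host
 * shaper, as described above. */
static int
rxq_lwm_event_cb(uint16_t port_id, enum rte_eth_event_type type,
		 void *cb_arg, void *ret_param)
{
	(void)cb_arg;
	(void)ret_param;
	if (type == RTE_ETH_EVENT_RXQ_LIMIT_REACHED)
		printf("port %u: an Rx queue crossed its limit watermark\n",
		       port_id);
	return 0;
}

static int
arm_lwm(uint16_t port_id, uint16_t queue_id)
{
	/* 70 means: raise the event once the queue is more than 70% full. */
	int ret = rte_eth_rx_lwm_set(port_id, queue_id, 70);

	if (ret != 0)
		return ret;
	return rte_eth_dev_callback_register(port_id,
			RTE_ETH_EVENT_RXQ_LIMIT_REACHED,
			rxq_lwm_event_cb, NULL);
}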

Spike Du (7):
  net/mlx5: add LWM support for Rxq
  common/mlx5: share interrupt management
  ethdev: introduce Rx queue based limit watermark
  net/mlx5: add LWM event handling support
  net/mlx5: support Rx queue based limit watermark
  net/mlx5: add private API to config host port shaper
  app/testpmd: add LWM and Host Shaper command

 app/test-pmd/cmdline.c   |  74 +
 app/test-pmd/config.c|  23 ++
 app/test-pmd/meson.build |   4 +
 app/test-pmd/testpmd.c   |  24 ++
 app/test-pmd/testpmd.h   |   1 +
 doc/guides/nics/mlx5.rst |  84 ++
 doc/guides/rel_notes/release_22_07.rst   |   2 +
 drivers/common/mlx5/linux/meson.build|  13 +
 drivers/common/mlx5/linux/mlx5_common_os.c   | 131 +
 drivers/common/mlx5/linux/mlx5_common_os.h   |  11 +
 drivers/common/mlx5/mlx5_prm.h   |  26 ++
 drivers/common/mlx5/version.map  |   2 +
 drivers/common/mlx5/windows/mlx5_common_os.h |  24 ++
 drivers/net/mlx5/linux/mlx5_ethdev_os.c  |  71 -
 drivers/net/mlx5/linux/mlx5_os.c | 132 ++---
 drivers/net/mlx5/linux/mlx5_socket.c |  53 +---
 drivers/net/mlx5/mlx5.c  |  68 +
 drivers/net/mlx5/mlx5.h  |  12 +-
 drivers/net/mlx5/mlx5_devx.c |  60 +++-
 drivers/net/mlx5/mlx5_devx.h |   1 +
 drivers/net/mlx5/mlx5_rx.c   | 292 +++
 drivers/net/mlx5/mlx5_rx.h   |  13 +
 drivers/net/mlx5/mlx5_testpmd.c  | 184 
 drivers/net/mlx5/mlx5_testpmd.h  |  27 ++
 drivers/net/mlx5/mlx5_txpp.c |  28 +-
 drivers/net/mlx5/rte_pmd_mlx5.h  |  30 ++
 drivers/net/mlx5/version.map |   2 +
 drivers/net/mlx5/windows/mlx5_ethdev_os.c|  22 --
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c  |  48 +--
 lib/ethdev/ethdev_driver.h   |  22 ++
 lib/ethdev/rte_ethdev.c  |  52 
 lib/ethdev/rte_ethdev.h  |  74 -
 lib/ethdev/version.map   |   4 +
 33 files changed, 1305 insertions(+), 309 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.c
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.h

-- 
2.27.0



[RFC v2 1/7] net/mlx5: add LWM support for Rxq

2022-05-21 Thread Spike Du
Add an lwm (Limit WaterMark) field to the Rxq object, which indicates the
percentage of the Rx queue size used by HW to raise an LWM event to the user.
Allow LWM setting in the modify_rq command.
Allow dynamic LWM configuration by adding the RDY2RDY state change.

Signed-off-by: Spike Du 
---
 drivers/net/mlx5/mlx5.h  |  1 +
 drivers/net/mlx5/mlx5_devx.c | 13 -
 drivers/net/mlx5/mlx5_devx.h |  1 +
 drivers/net/mlx5/mlx5_rx.h   |  1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ef755ee8cf..305edffe71 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1395,6 +1395,7 @@ enum mlx5_rxq_modify_type {
MLX5_RXQ_MOD_RST2RDY, /* modify state from reset to ready. */
MLX5_RXQ_MOD_RDY2ERR, /* modify state from ready to error. */
MLX5_RXQ_MOD_RDY2RST, /* modify state from ready to reset. */
+   MLX5_RXQ_MOD_RDY2RDY, /* modify state from ready to ready. */
 };
 
 enum mlx5_txq_modify_type {
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 4b48f9433a..c918a50ae9 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -62,7 +62,7 @@ mlx5_rxq_obj_modify_rq_vlan_strip(struct mlx5_rxq_priv *rxq, 
int on)
  * @return
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
-static int
+int
 mlx5_devx_modify_rq(struct mlx5_rxq_priv *rxq, uint8_t type)
 {
struct mlx5_devx_modify_rq_attr rq_attr;
@@ -76,6 +76,11 @@ mlx5_devx_modify_rq(struct mlx5_rxq_priv *rxq, uint8_t type)
case MLX5_RXQ_MOD_RST2RDY:
rq_attr.rq_state = MLX5_RQC_STATE_RST;
rq_attr.state = MLX5_RQC_STATE_RDY;
+   if (rxq->lwm) {
+   rq_attr.modify_bitmask |=
+   MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM;
+   rq_attr.lwm = rxq->lwm;
+   }
break;
case MLX5_RXQ_MOD_RDY2ERR:
rq_attr.rq_state = MLX5_RQC_STATE_RDY;
@@ -85,6 +90,12 @@ mlx5_devx_modify_rq(struct mlx5_rxq_priv *rxq, uint8_t type)
rq_attr.rq_state = MLX5_RQC_STATE_RDY;
rq_attr.state = MLX5_RQC_STATE_RST;
break;
+   case MLX5_RXQ_MOD_RDY2RDY:
+   rq_attr.rq_state = MLX5_RQC_STATE_RDY;
+   rq_attr.state = MLX5_RQC_STATE_RDY;
+   rq_attr.modify_bitmask |= 
MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM;
+   rq_attr.lwm = rxq->lwm;
+   break;
default:
break;
}
diff --git a/drivers/net/mlx5/mlx5_devx.h b/drivers/net/mlx5/mlx5_devx.h
index a95207a6b9..ebd1da455a 100644
--- a/drivers/net/mlx5/mlx5_devx.h
+++ b/drivers/net/mlx5/mlx5_devx.h
@@ -11,6 +11,7 @@ int mlx5_txq_devx_obj_new(struct rte_eth_dev *dev, uint16_t 
idx);
 int mlx5_txq_devx_modify(struct mlx5_txq_obj *obj,
 enum mlx5_txq_modify_type type, uint8_t dev_port);
 void mlx5_txq_devx_obj_release(struct mlx5_txq_obj *txq_obj);
+int mlx5_devx_modify_rq(struct mlx5_rxq_priv *rxq, uint8_t type);
 
 extern struct mlx5_obj_ops devx_obj_ops;
 
diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h
index e715ed6b62..25a5f2c1fa 100644
--- a/drivers/net/mlx5/mlx5_rx.h
+++ b/drivers/net/mlx5/mlx5_rx.h
@@ -175,6 +175,7 @@ struct mlx5_rxq_priv {
struct mlx5_devx_rq devx_rq;
struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
uint32_t hairpin_status; /* Hairpin binding status. */
+   uint32_t lwm:16;
 };
 
 /* External RX queue descriptor. */
-- 
2.27.0



[RFC v2 3/7] ethdev: introduce Rx queue based limit watermark

2022-05-21 Thread Spike Du
LWM (limit watermark) describes the fullness of an Rx queue. If the Rx
queue fullness is above the LWM, the device will trigger the event
RTE_ETH_EVENT_RX_LWM.
LWM is defined as a percentage of the Rx queue size with a valid value range
of [0,99].
Setting LWM to 0 disables it, which is the default.
When translating the percentage to a queue descriptor number, the number
should be bigger than 0 and less than the queue size.
Add LWM configuration and query driver callbacks to eth_dev_ops.
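
A small illustration of the translation rule above (not code from the patch):
with a 1024-descriptor Rx queue, lwm = 30 maps to 1024 * 30 / 100 = 307
descriptors, clamped so the result stays above 0 and below the queue size.

#include <stdint.h>

/* Illustrative only: translate the LWM percentage into a descriptor count
 * following the constraints stated in the commit message. */
static uint32_t
lwm_to_desc(uint8_t lwm, uint32_t queue_size)
{
	uint32_t n;

	if (lwm == 0)
		return 0;	/* 0 disables the watermark */
	n = queue_size * lwm / 100;
	if (n == 0)
		n = 1;		/* must be bigger than 0 */
	if (n >= queue_size)
		n = queue_size - 1;	/* and less than the queue size */
	return n;
}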

Signed-off-by: Spike Du 
---
 lib/ethdev/ethdev_driver.h | 22 
 lib/ethdev/rte_ethdev.c| 52 +++
 lib/ethdev/rte_ethdev.h| 74 +-
 lib/ethdev/version.map |  4 +++
 4 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 69d9dc21d8..12ec5e7e19 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -470,6 +470,23 @@ typedef int (*eth_rx_queue_setup_t)(struct rte_eth_dev 
*dev,
const struct rte_eth_rxconf *rx_conf,
struct rte_mempool *mb_pool);
 
+/**
+ * @internal Set Rx queue limit watermark.
+ * see @rte_eth_rx_lwm_set()
+ */
+typedef int (*eth_rx_queue_lwm_set_t)(struct rte_eth_dev *dev,
+ uint16_t rx_queue_id,
+ uint8_t lwm);
+
+/**
+ * @internal Query queue limit watermark.
+ * see @rte_eth_rx_lwm_query()
+ */
+
+typedef int (*eth_rx_queue_lwm_query_t)(struct rte_eth_dev *dev,
+   uint16_t *rx_queue_id,
+   uint8_t *lwm);
+
 /** @internal Setup a transmit queue of an Ethernet device. */
 typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev,
uint16_t tx_queue_id,
@@ -1168,6 +1185,11 @@ struct eth_dev_ops {
/** Priority flow control queue configure */
priority_flow_ctrl_queue_config_t priority_flow_ctrl_queue_config;
 
+   /** Set Rx queue limit watermark */
+   eth_rx_queue_lwm_set_t rx_queue_lwm_set;
+   /** Query Rx queue limit watermark */
+   eth_rx_queue_lwm_query_t rx_queue_lwm_query;
+
/** Set Unicast Table Array */
eth_uc_hash_table_set_tuc_hash_table_set;
/** Set Unicast hash bitmap */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 8520aec561..0a46c71288 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -4429,6 +4429,58 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, 
uint16_t queue_idx,
queue_idx, tx_rate));
 }
 
+int rte_eth_rx_lwm_set(uint16_t port_id, uint16_t queue_id,
+  uint8_t lwm)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   int ret;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   ret = rte_eth_dev_info_get(port_id, &dev_info);
+   if (ret != 0)
+   return ret;
+
+   if (queue_id > dev_info.max_rx_queues) {
+   RTE_ETHDEV_LOG(ERR,
+   "Set queue LWM:port %u: invalid queue ID=%u.\n",
+   port_id, queue_id);
+   return -EINVAL;
+   }
+
+   if (lwm > 99)
+   return -EINVAL;
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_lwm_set, -ENOTSUP);
+   return eth_err(port_id, (*dev->dev_ops->rx_queue_lwm_set)(dev,
+queue_id, lwm));
+}
+
+int rte_eth_rx_lwm_query(uint16_t port_id, uint16_t *queue_id,
+uint8_t *lwm)
+{
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_dev *dev;
+   int ret;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   ret = rte_eth_dev_info_get(port_id, &dev_info);
+   if (ret != 0)
+   return ret;
+
+   if (queue_id == NULL)
+   return -EINVAL;
+   if (*queue_id >= dev_info.max_rx_queues)
+   *queue_id = 0;
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_lwm_query, -ENOTSUP);
+   return eth_err(port_id, (*dev->dev_ops->rx_queue_lwm_query)(dev,
+queue_id, lwm));
+}
+
 RTE_INIT(eth_dev_init_fp_ops)
 {
uint32_t i;
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..687ae5ff29 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1249,7 +1249,16 @@ struct rte_eth_rxconf {
 */
union rte_eth_rxseg *rx_seg;
 
-   uint64_t reserved_64s[2]; /**< Reserved for future fields */
+   /**
+* Per-queue Rx limit watermark defined as percentage of Rx queue
+* size. If Rx queue receives traffic higher than this percentage,
+* the

[RFC v2 5/7] net/mlx5: support Rx queue based limit watermark

2022-05-21 Thread Spike Du
Add mlx5-specific LWM (limit watermark) configuration and query handlers.
When the Rx queue fullness reaches the LWM limit, the driver catches
an HW event and invokes the user callback.
The query handler finds the next Rx queue with a pending LWM event,
if any, starting from the given Rx queue index.

Signed-off-by: Spike Du 
---
 doc/guides/nics/mlx5.rst   |  12 ++
 doc/guides/rel_notes/release_22_07.rst |   1 +
 drivers/common/mlx5/mlx5_prm.h |   1 +
 drivers/net/mlx5/mlx5.c|   2 +
 drivers/net/mlx5/mlx5_rx.c | 156 +
 drivers/net/mlx5/mlx5_rx.h |   5 +
 6 files changed, 177 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index d83c56de11..79f56018ef 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -93,6 +93,7 @@ Features
 - Connection tracking.
 - Sub-Function representors.
 - Sub-Function.
+- Rx queue LWM (Limit WaterMark) configuration.
 
 
 Limitations
@@ -520,6 +521,9 @@ Limitations
 
 - The NIC egress flow rules on representor port are not supported.
 
+- LWM:
+
+  - Doesn't support shared Rx queue and Hairpin Rx queue.
 
 Statistics
 --
@@ -1680,3 +1684,11 @@ The procedure below is an example of using a ConnectX-5 
adapter card (pf0) with
 #. For each VF PCIe, using the following command to bind the driver::
 
$ echo ":82:00.2" >> /sys/bus/pci/drivers/mlx5_core/bind
+
+LWM introduction
+
+
+LWM (Limit WaterMark) is a per Rx queue attribute, it should be configured as
+a percentage of the Rx queue size.
+When Rx queue fullness is above LWM, an event is sent to PMD.
+
diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index a60a0d5f16..253bc7e381 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -80,6 +80,7 @@ New Features
   * Added support for promiscuous mode on Windows.
   * Added support for MTU on Windows.
   * Added matching and RSS on IPsec ESP.
+  * Added Rx queue LWM(Limit WaterMark) support.
 
 * **Updated Marvell cnxk crypto driver.**
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 630b2c5100..3b5e60532a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3293,6 +3293,7 @@ struct mlx5_aso_wqe {
 
 enum {
MLX5_EVENT_TYPE_OBJECT_CHANGE = 0x27,
+   MLX5_EVENT_TYPE_SRQ_LIMIT_REACHED = 0x14,
 };
 
 enum {
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e04a66625e..35ae51b3af 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2071,6 +2071,8 @@ const struct eth_dev_ops mlx5_dev_ops = {
.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
.vlan_filter_set = mlx5_vlan_filter_set,
.rx_queue_setup = mlx5_rx_queue_setup,
+   .rx_queue_lwm_set = mlx5_rx_queue_lwm_set,
+   .rx_queue_lwm_query = mlx5_rx_queue_lwm_query,
.rx_hairpin_queue_setup = mlx5_rx_hairpin_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
.tx_hairpin_queue_setup = mlx5_tx_hairpin_queue_setup,
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 7d556c2b45..d30522e6df 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -19,12 +19,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
 #include "mlx5.h"
 #include "mlx5_utils.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_devx.h"
 #include "mlx5_rx.h"
 
 
@@ -128,6 +130,17 @@ mlx5_rx_descriptor_status(void *rx_queue, uint16_t offset)
return RTE_ETH_RX_DESC_AVAIL;
 }
 
+/* Get rxq lwm percentage according to lwm number. */
+static uint8_t
+mlx5_rxq_lwm_to_percentage(struct mlx5_rxq_priv *rxq)
+{
+   struct mlx5_rxq_data *rxq_data = &rxq->ctrl->rxq;
+   uint32_t wqe_cnt = 1 << rxq_data->elts_n;
+
+   /* ethdev LWM describes fullness, mlx5 LWM describes emptiness. */
+   return rxq->lwm ? (100 - rxq->lwm * 100 / wqe_cnt) : 0;
+}
+
 /**
  * DPDK callback to get the RX queue information.
  *
@@ -150,6 +163,7 @@ mlx5_rxq_info_get(struct rte_eth_dev *dev, uint16_t 
rx_queue_id,
 {
struct mlx5_rxq_ctrl *rxq_ctrl = mlx5_rxq_ctrl_get(dev, rx_queue_id);
struct mlx5_rxq_data *rxq = mlx5_rxq_data_get(dev, rx_queue_id);
+   struct mlx5_rxq_priv *rxq_priv = mlx5_rxq_get(dev, rx_queue_id);
 
if (!rxq)
return;
@@ -169,6 +183,8 @@ mlx5_rxq_info_get(struct rte_eth_dev *dev, uint16_t 
rx_queue_id,
qinfo->nb_desc = mlx5_rxq_mprq_enabled(rxq) ?
RTE_BIT32(rxq->elts_n) * RTE_BIT32(rxq->log_strd_num) :
RTE_BIT32(rxq->elts_n);
+   qinfo->conf.lwm = rxq_priv ?
+   mlx5_rxq_lwm_to_percentage(rxq_priv) : 0;
 }
 
 /**
@@ -1188,6 +1204,34 @@ mlx5_check_vec_rx_support(struct rte_eth_dev *dev 
__rte_unused)
return -ENOTSUP;
 }
 
+int
+mlx5_rx_queue_lwm_query(s

[RFC v2 4/7] net/mlx5: add LWM event handling support

2022-05-21 Thread Spike Du
When the LWM threshold is hit on an RQ WQE, the kernel driver raises an event
to SW. Use a devx event_channel to catch this and notify the user.
Allocate this channel per shared device.
The channel has a cookie that identifies the specific event port and queue.

Signed-off-by: Spike Du 
---
 drivers/net/mlx5/mlx5.c  | 66 
 drivers/net/mlx5/mlx5.h  |  7 
 drivers/net/mlx5/mlx5_devx.c | 47 +
 drivers/net/mlx5/mlx5_rx.c   | 33 ++
 drivers/net/mlx5/mlx5_rx.h   |  7 
 5 files changed, 160 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f0988712df..e04a66625e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1524,6 +1526,69 @@ mlx5_alloc_shared_dev_ctx(const struct 
mlx5_dev_spawn_data *spawn,
return NULL;
 }
 
+/**
+ * Create LWM event_channel and interrupt handle for shared device
+ * context. All rxqs sharing the device context share the event_channel.
+ * A callback is registered in interrupt thread to receive the LWM event.
+ *
+ * @param[in] priv
+ *   Pointer to mlx5_priv instance.
+ *
+ * @return
+ *   0 on success, negative with rte_errno set.
+ */
+int
+mlx5_lwm_setup(struct mlx5_priv *priv)
+{
+   int fd_lwm;
+
+   pthread_mutex_init(&priv->sh->lwm_config_lock, NULL);
+   priv->sh->devx_channel_lwm = mlx5_os_devx_create_event_channel
+   (priv->sh->cdev->ctx,
+MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+   if (!priv->sh->devx_channel_lwm)
+   goto err;
+   fd_lwm = mlx5_os_get_devx_channel_fd(priv->sh->devx_channel_lwm);
+   priv->sh->intr_handle_lwm = mlx5_os_interrupt_handler_create
+   (RTE_INTR_INSTANCE_F_SHARED, true,
+fd_lwm, mlx5_dev_interrupt_handler_lwm, priv);
+   if (!priv->sh->intr_handle_lwm)
+   goto err;
+   return 0;
+err:
+   if (priv->sh->devx_channel_lwm) {
+   mlx5_os_devx_destroy_event_channel
+   (priv->sh->devx_channel_lwm);
+   priv->sh->devx_channel_lwm = NULL;
+   }
+   pthread_mutex_destroy(&priv->sh->lwm_config_lock);
+   return -rte_errno;
+}
+
+/**
+ * Destroy LWM event_channel and interrupt handle for shared device
+ * context before free this context. The interrupt handler is also
+ * unregistered.
+ *
+ * @param[in] sh
+ *   Pointer to shared device context.
+ */
+void
+mlx5_lwm_unset(struct mlx5_dev_ctx_shared *sh)
+{
+   if (sh->intr_handle_lwm) {
+   mlx5_os_interrupt_handler_destroy(sh->intr_handle_lwm,
+   mlx5_dev_interrupt_handler_lwm, (void *)-1);
+   sh->intr_handle_lwm = NULL;
+   }
+   if (sh->devx_channel_lwm) {
+   mlx5_os_devx_destroy_event_channel
+   (sh->devx_channel_lwm);
+   sh->devx_channel_lwm = NULL;
+   }
+   pthread_mutex_destroy(&sh->lwm_config_lock);
+}
+
 /**
  * Free shared IB device context. Decrement counter and if zero free
  * all allocated resources and close handles.
@@ -1601,6 +1666,7 @@ mlx5_free_shared_dev_ctx(struct mlx5_dev_ctx_shared *sh)
claim_zero(mlx5_devx_cmd_destroy(sh->td));
MLX5_ASSERT(sh->geneve_tlv_option_resource == NULL);
pthread_mutex_destroy(&sh->txpp.mutex);
+   mlx5_lwm_unset(sh);
mlx5_free(sh);
return;
 exit:
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7ebb2cc961..a76f2fed3d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1268,6 +1268,9 @@ struct mlx5_dev_ctx_shared {
struct mlx5_lb_ctx self_lb; /* QP to enable self loopback for Devx. */
unsigned int flow_max_priority;
enum modify_reg flow_mreg_c[MLX5_MREG_C_NUM];
+   void *devx_channel_lwm;
+   struct rte_intr_handle *intr_handle_lwm;
+   pthread_mutex_t lwm_config_lock;
/* Availability of mreg_c's. */
struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
@@ -1405,6 +1408,7 @@ enum mlx5_txq_modify_type {
 };
 
 struct mlx5_rxq_priv;
+struct mlx5_priv;
 
 /* HW objects operations structure. */
 struct mlx5_obj_ops {
@@ -1413,6 +1417,7 @@ struct mlx5_obj_ops {
int (*rxq_event_get)(struct mlx5_rxq_obj *rxq_obj);
int (*rxq_obj_modify)(struct mlx5_rxq_priv *rxq, uint8_t type);
void (*rxq_obj_release)(struct mlx5_rxq_priv *rxq);
+   int (*rxq_event_get_lwm)(struct mlx5_priv *priv, int *rxq_idx, int 
*port_id);
int (*ind_table_new)(struct rte_eth_dev *dev, const unsigned int log_n,
 struct mlx5_ind_table_obj *ind_tbl);
int (*ind_table_modify)(struct rte_eth_dev *dev,
@@ -1603,6 +1608,8 @@ int mlx5_net_remove(struct mlx5_common_devi

[RFC v2 2/7] common/mlx5: share interrupt management

2022-05-21 Thread Spike Du
There is a lot of duplicated code for creating and initializing
rte_intr_handle. Add a new mlx5_os API to do this and replace all PMD-related
code with this API.

Signed-off-by: Spike Du 
---
 drivers/common/mlx5/linux/mlx5_common_os.c   | 131 ++
 drivers/common/mlx5/linux/mlx5_common_os.h   |  11 ++
 drivers/common/mlx5/version.map  |   2 +
 drivers/common/mlx5/windows/mlx5_common_os.h |  24 
 drivers/net/mlx5/linux/mlx5_ethdev_os.c  |  71 --
 drivers/net/mlx5/linux/mlx5_os.c | 132 ---
 drivers/net/mlx5/linux/mlx5_socket.c |  53 +---
 drivers/net/mlx5/mlx5.h  |   2 -
 drivers/net/mlx5/mlx5_txpp.c |  28 +---
 drivers/net/mlx5/windows/mlx5_ethdev_os.c|  22 
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c  |  48 +--
 11 files changed, 217 insertions(+), 307 deletions(-)

diff --git a/drivers/common/mlx5/linux/mlx5_common_os.c 
b/drivers/common/mlx5/linux/mlx5_common_os.c
index d40cfd5cd1..f10a981a37 100644
--- a/drivers/common/mlx5/linux/mlx5_common_os.c
+++ b/drivers/common/mlx5/linux/mlx5_common_os.c
@@ -11,6 +11,7 @@
 #endif
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -964,3 +965,133 @@ mlx5_os_wrapped_mkey_destroy(struct mlx5_pmd_wrapped_mr 
*pmd_mr)
claim_zero(mlx5_glue->dereg_mr(pmd_mr->obj));
memset(pmd_mr, 0, sizeof(*pmd_mr));
 }
+
+/**
+ * Rte_intr_handle create and init helper.
+ *
+ * @param[in] mode
+ *   interrupt instance can be shared between primary and secondary
+ *   processes or not.
+ * @param[in] set_fd_nonblock
+ *   Whether to set fd to O_NONBLOCK.
+ * @param[in] fd
+ *   Fd to set in created intr_handle.
+ * @param[in] cb
+ *   Callback to register for intr_handle.
+ * @param[in] cb_arg
+ *   Callback argument for cb.
+ *
+ * @return
+ *  - Interrupt handle on success.
+ *  - NULL on failure, with rte_errno set.
+ */
+struct rte_intr_handle *
+mlx5_os_interrupt_handler_create(int mode, bool set_fd_nonblock, int fd,
+rte_intr_callback_fn cb, void *cb_arg)
+{
+   struct rte_intr_handle *tmp_intr_handle;
+   int ret, flags;
+
+   tmp_intr_handle = rte_intr_instance_alloc(mode);
+   if (!tmp_intr_handle) {
+   rte_errno = ENOMEM;
+   goto err;
+   }
+   if (set_fd_nonblock) {
+   flags = fcntl(fd, F_GETFL);
+   ret = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+   if (ret) {
+   rte_errno = errno;
+   goto err;
+   }
+   }
+   ret = rte_intr_fd_set(tmp_intr_handle, fd);
+   if (ret)
+   goto err;
+   ret = rte_intr_type_set(tmp_intr_handle, RTE_INTR_HANDLE_EXT);
+   if (ret)
+   goto err;
+   ret = rte_intr_callback_register(tmp_intr_handle, cb, cb_arg);
+   if (ret) {
+   rte_errno = -ret;
+   goto err;
+   }
+   return tmp_intr_handle;
+err:
+   if (tmp_intr_handle)
+   rte_intr_instance_free(tmp_intr_handle);
+   return NULL;
+}
+
+/* Safe unregistration for interrupt callback. */
+static void
+mlx5_intr_callback_unregister(const struct rte_intr_handle *handle,
+ rte_intr_callback_fn cb_fn, void *cb_arg)
+{
+   uint64_t twait = 0;
+   uint64_t start = 0;
+
+   do {
+   int ret;
+
+   ret = rte_intr_callback_unregister(handle, cb_fn, cb_arg);
+   if (ret >= 0)
+   return;
+   if (ret != -EAGAIN) {
+   DRV_LOG(INFO, "failed to unregister interrupt"
+ " handler (error: %d)", ret);
+   MLX5_ASSERT(false);
+   return;
+   }
+   if (twait) {
+   struct timespec onems;
+
+   /* Wait one millisecond and try again. */
+   onems.tv_sec = 0;
+   onems.tv_nsec = NS_PER_S / MS_PER_S;
+   nanosleep(&onems, 0);
+   /* Check whether one second elapsed. */
+   if ((rte_get_timer_cycles() - start) <= twait)
+   continue;
+   } else {
+   /*
+* We get the amount of timer ticks for one second.
+* If this amount elapsed it means we spent one
+* second in waiting. This branch is executed once
+* on first iteration.
+*/
+   twait = rte_get_timer_hz();
+   MLX5_ASSERT(twait);
+   }
+   /*
+* Timeout elapsed, show message (once a second) and retry.
+* We have no other acceptable option here, if we ignore
+* the unregistering return code the h

[RFC v2 6/7] net/mlx5: add private API to config host port shaper

2022-05-21 Thread Spike Du
The host port shaper can be configured with QSHR (QoS Shaper Host Register).
Add a check in the build files to enable or disable this function.

The host shaper configuration affects all the ethdev ports belonging to the
same host port.

The host shaper can configure a shaper rate and the lwm-triggered flag for a
host port.
The shaper limits the rate of traffic from the host port to the wire port.
If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
when one of the host port's Rx queues receives an LWM (Limit Watermark) event.

Signed-off-by: Spike Du 
---
 doc/guides/nics/mlx5.rst   |  26 +++
 doc/guides/rel_notes/release_22_07.rst |   1 +
 drivers/common/mlx5/linux/meson.build  |  13 
 drivers/common/mlx5/mlx5_prm.h |  25 ++
 drivers/net/mlx5/mlx5.h|   2 +
 drivers/net/mlx5/mlx5_rx.c | 103 +
 drivers/net/mlx5/rte_pmd_mlx5.h|  30 +++
 drivers/net/mlx5/version.map   |   2 +
 8 files changed, 202 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 79f56018ef..3da6f5a03c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -94,6 +94,7 @@ Features
 - Sub-Function representors.
 - Sub-Function.
 - Rx queue LWM (Limit WaterMark) configuration.
+- Host shaper support.
 
 
 Limitations
@@ -525,6 +526,12 @@ Limitations
 
   - Doesn't support shared Rx queue and Hairpin Rx queue.
 
+- Host shaper:
+
+  - Support BlueField series NIC from BlueField 2.
+  - When configure host shaper with MLX5_HOST_SHAPER_FLAG_LWM_TRIGGERED flag 
set,
+only rate 0 and 100Mbps are supported.
+
 Statistics
 --
 
@@ -1692,3 +1699,22 @@ LWM (Limit WaterMark) is a per Rx queue attribute, it 
should be configured as
 a percentage of the Rx queue size.
 When Rx queue fullness is above LWM, an event is sent to PMD.
 
+Host shaper introduction
+
+
+Host shaper register is per host port register which sets a shaper
+on the host port.
+All VF/hostPF representors belonging to one host port share one host shaper.
+For example, if representor 0 and representor 1 belong to same host port,
+and a host shaper rate of 1Gbps is configured, the shaper throttles both
+representors' traffic from host.
+Host shaper has two modes for setting the shaper, immediate and deferred to
+LWM event trigger. In immediate mode, the rate limit is configured immediately
+to host shaper. When deferring to LWM trigger, the shaper is not set until an
+LWM event is received by any Rx queue in a VF representor belonging to the host
+port. The only rate supported for deferred mode is 100Mbps (there is no limit
+on the supported rates for immediate mode). In deferred mode, the shaper is set
+on the host port by the firmware upon receiving the LMW event, which allows
+throttling host traffic on LWM events at minimum latency, preventing excess
+drops in the Rx queue.
+
diff --git a/doc/guides/rel_notes/release_22_07.rst 
b/doc/guides/rel_notes/release_22_07.rst
index 253bc7e381..21879bda41 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -81,6 +81,7 @@ New Features
   * Added support for MTU on Windows.
   * Added matching and RSS on IPsec ESP.
   * Added Rx queue LWM(Limit WaterMark) support.
+  * Added host shaper support.
 
 * **Updated Marvell cnxk crypto driver.**
 
diff --git a/drivers/common/mlx5/linux/meson.build 
b/drivers/common/mlx5/linux/meson.build
index 5335f5b027..51c6e5dd2e 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -45,6 +45,13 @@ if static_ibverbs
 ext_deps += declare_dependency(link_args:ibv_ldflags.split())
 endif
 
+libmtcr_ul_found = false
+lib = cc.find_library('mtcr_ul', required:false)
+if lib.found() and run_command('meson', 
'--version').stdout().version_compare('>= 0.49.2')
+libmtcr_ul_found = true
+ext_deps += lib
+endif
+
 sources += files('mlx5_nl.c')
 sources += files('mlx5_common_auxiliary.c')
 sources += files('mlx5_common_os.c')
@@ -207,6 +214,12 @@ has_sym_args = [
 [ 'HAVE_MLX5_IBV_IMPORT_CTX_PD_AND_MR', 'infiniband/verbs.h',
 'ibv_import_device' ],
 ]
+if  libmtcr_ul_found
+has_sym_args += [
+[  'HAVE_MLX5_MSTFLINT', 'mstflint/mtcr.h',
+'mopen'],
+]
+endif
 config = configuration_data()
 foreach arg:has_sym_args
 config.set(arg[0], cc.has_header_symbol(arg[1], arg[2], dependencies: 
libs))
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 3b5e60532a..92d05a7368 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3771,6 +3771,7 @@ enum {
MLX5_CRYPTO_COMMISSIONING_REGISTER_ID = 0xC003,
MLX5_IMPORT_KEK_HANDLE_REGISTER_ID = 0xC004,
MLX5_CREDENTIAL_HANDLE_REGISTER_ID = 0xC005,
+   MLX5_QSHR_REGISTER_ID = 0x4030,
 };
 
 struct mlx5_ifc_register_mtutc_bits {
@@ -3785,6 +3786,30 @@ struct mlx5_ifc_register_mtutc_bits {
 

[RFC v2 7/7] app/testpmd: add LWM and Host Shaper command

2022-05-21 Thread Spike Du
Add command line options to support per-Rxq LWM configuration.
- Command syntax:
  set port  rxq  lwm 
  mlx5 set port  host_shaper lwm_triggered <0|1> rate 

- Example commands:
To configure LWM as 30% of rxq size on port 1 rxq 0:
testpmd> set port 1 rxq 0 lwm 30

To disable LWM on port 1 rxq 0:
testpmd> set port 1 rxq 0 lwm 0

To enable lwm_triggered on port 1 and disable current host shaper:
testpmd> mlx5 set port 1 host_shaper lwm_triggered 1 rate 0

To disable lwm_triggered and current host shaper on port 1:
testpmd> mlx5 set port 1 host_shaper lwm_triggered 0 rate 0

The rate unit is 100Mbps.
To disable lwm_triggered and configure a shaper of 5Gbps on port 1:
testpmd> mlx5 set port 1 host_shaper lwm_triggered 0 rate 50

Add sample code to handle the Rxq LWM event: it delays a while so that the Rxq
empties, then disables the host shaper and rearms the LWM event.

Signed-off-by: Spike Du 
---
 app/test-pmd/cmdline.c  |  74 +
 app/test-pmd/config.c   |  23 
 app/test-pmd/meson.build|   4 +
 app/test-pmd/testpmd.c  |  24 +
 app/test-pmd/testpmd.h  |   1 +
 doc/guides/nics/mlx5.rst|  46 
 drivers/net/mlx5/mlx5_testpmd.c | 184 
 drivers/net/mlx5/mlx5_testpmd.h |  27 +
 8 files changed, 383 insertions(+)
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.c
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.h

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 91e4090582..e8663dd797 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -67,6 +67,9 @@
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
 #include "bpf_cmd.h"
+#ifdef RTE_NET_MLX5
+#include "mlx5_testpmd.h"
+#endif
 
 static struct cmdline *testpmd_cl;
 
@@ -17803,6 +17806,73 @@ cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy 
= {
}
 };
 
+/* *** SET LIMIT WATER MARK FOR A RXQ OF A PORT *** */
+struct cmd_rxq_lwm_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint16_t port_num;
+   cmdline_fixed_string_t rxq;
+   uint16_t rxq_num;
+   cmdline_fixed_string_t lwm;
+   uint16_t lwm_num;
+};
+
+static void cmd_rxq_lwm_parsed(void *parsed_result,
+   __rte_unused struct cmdline *cl,
+   __rte_unused void *data)
+{
+   struct cmd_rxq_lwm_result *res = parsed_result;
+   int ret = 0;
+
+   if ((strcmp(res->set, "set") == 0) && (strcmp(res->port, "port") == 0)
+   && (strcmp(res->rxq, "rxq") == 0)
+   && (strcmp(res->lwm, "lwm") == 0))
+   ret = set_rxq_lwm(res->port_num, res->rxq_num,
+ res->lwm_num);
+   if (ret < 0)
+   printf("rxq_lwm_cmd error: (%s)\n", strerror(-ret));
+
+}
+
+cmdline_parse_token_string_t cmd_rxq_lwm_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_rxq_lwm_result,
+   set, "set");
+cmdline_parse_token_string_t cmd_rxq_lwm_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_rxq_lwm_result,
+   port, "port");
+cmdline_parse_token_num_t cmd_rxq_lwm_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_rxq_lwm_result,
+   port_num, RTE_UINT16);
+cmdline_parse_token_string_t cmd_rxq_lwm_rxq =
+   TOKEN_STRING_INITIALIZER(struct cmd_rxq_lwm_result,
+   rxq, "rxq");
+cmdline_parse_token_num_t cmd_rxq_lwm_rxqnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_rxq_lwm_result,
+   rxq_num, RTE_UINT8);
+cmdline_parse_token_string_t cmd_rxq_lwm_lwm =
+   TOKEN_STRING_INITIALIZER(struct cmd_rxq_lwm_result,
+   lwm, "lwm");
+cmdline_parse_token_num_t cmd_rxq_lwm_lwmnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_rxq_lwm_result,
+   lwm_num, RTE_UINT16);
+
+cmdline_parse_inst_t cmd_rxq_lwm = {
+   .f = cmd_rxq_lwm_parsed,
+   .data = (void *)0,
+   .help_str = "set port  rxq  lwm "
+   "Set lwm for rxq on port_id",
+   .tokens = {
+   (void *)&cmd_rxq_lwm_set,
+   (void *)&cmd_rxq_lwm_port,
+   (void *)&cmd_rxq_lwm_portnum,
+   (void *)&cmd_rxq_lwm_rxq,
+   (void *)&cmd_rxq_lwm_rxqnum,
+   (void *)&cmd_rxq_lwm_lwm,
+   (void *)&cmd_rxq_lwm_lwmnum,
+   NULL,
+   },
+};
+
 /* 

 */
 
 /* list of instructions */
@@ -18089,6 +18159,10 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_show_capability,
(cmdline_parse_inst_t *)&cmd_set_flex_is_pattern,
(cmdline_parse_inst_t *)&cmd_set_flex_spec_pattern,
+   (cmdline_parse_inst_t *)&cmd_rxq_lwm,
+#ifdef RTE_NET_MLX5
+   (cmdline_parse_inst_t *)&mlx5_test_cmd_port_host_shaper,
+#endif
NULL,
 };
 
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.