Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix using heap pointer after free

2021-07-09 Thread Zhang, Qi Z



> -Original Message-
> From: dev  On Behalf Of Wang, Haiyue
> Sent: Friday, July 9, 2021 12:35 PM
> To: Yu, DapengX 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix using heap pointer after 
> free
> 
> > -Original Message-
> > From: Yu, DapengX 
> > Sent: Friday, July 9, 2021 11:15
> > To: Wang, Haiyue 
> > Cc: dev@dpdk.org; Yu, DapengX ; sta...@dpdk.org
> > Subject: [PATCH v2] net/ixgbe: fix using heap pointer after free
> >
> > From: Dapeng Yu 
> >
> > The original code use a heap pointer after it is freed.
> > This patch fix it.
> >
> > Fixes: a14de8b498d1 ("net/ixgbe: destroy consistent filter")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Dapeng Yu 
> > ---
> > V2:
> > * Simplify the patch according to maintainer's comment:
> >   only one "pmd_flow" in the list, so just "break;" is fine.
> > ---
> >  drivers/net/ixgbe/ixgbe_flow.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> 
> Good catch, thanks!
> 
> Reviewed-by: Haiyue Wang 

Applied to dpdk-next-net-intel.

Thanks
Qi
> 
> 
> > --
> > 2.27.0



[dpdk-dev] [PATCH] app/testpmd: fix testpmd doesn't show RSS hash offload

2021-07-09 Thread Jie Wang
This patch reapply Rx/Tx offloads configuration for all ports
after the program configuring the device port. When the program
configures the ports, the default Rx/Tx offloads are modified.

So it is need to reapply Rx/Tx offloads configuration before
testpmd showing offloads.

Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
Cc: sta...@dpdk.org

Signed-off-by: Jie Wang 
---
 app/test-pmd/testpmd.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 1cdd3cdd12..7089ae216d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2475,6 +2475,9 @@ start_port(portid_t pid)
}
 
if (port->need_reconfig > 0) {
+   const struct rte_eth_dev *dev = &rte_eth_devices[pi];
+   int k;
+
port->need_reconfig = 0;
 
if (flow_isolate_all) {
@@ -2508,6 +2511,18 @@ start_port(portid_t pid)
port->need_reconfig = 1;
return -1;
}
+
+   /* Apply TxRx configuration for all ports */
+   port->dev_conf.txmode = dev->data->dev_conf.txmode;
+   port->dev_conf.rxmode = dev->data->dev_conf.rxmode;
+   /* Apply Rx offloads configuration */
+   for (k = 0; k < port->dev_info.max_rx_queues; k++)
+   port->rx_conf[k].offloads =
+   port->dev_conf.rxmode.offloads;
+   /* Apply Tx offloads configuration */
+   for (k = 0; k < port->dev_info.max_tx_queues; k++)
+   port->tx_conf[k].offloads =
+   port->dev_conf.txmode.offloads;
}
if (port->need_reconfig_queues > 0) {
port->need_reconfig_queues = 0;
-- 
2.25.1



[dpdk-dev] [PATCH] crypto/cnxk: fix debug build failure

2021-07-09 Thread Tejasree Kondoj
Removing usage of unavailable macro.

Reported-by: Ali Alnubani 
Suggested-by: David Marchand 
Signed-off-by: Tejasree Kondoj 
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c 
b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 6d322a9cb6..2e1a73939c 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -275,7 +275,7 @@ cn10k_cpt_sec_post_process(struct rte_crypto_op *cop,
m_len = rte_be_to_cpu_16(ip->total_length);
} else {
PLT_ASSERT(((ip->version_ihl & 0xf0) >>
-   RTE_IPV4_IHL_MULTIPLIER) == IPV6_VERSION);
+   RTE_IPV4_IHL_MULTIPLIER) == 6);
ip6 = (struct rte_ipv6_hdr *)ip;
m_len = rte_be_to_cpu_16(ip6->payload_len) +
sizeof(struct rte_ipv6_hdr);
-- 
2.27.0



Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure

2021-07-09 Thread Thomas Monjalon
09/07/2021 00:49, Ajit Khaparde:
> Fix build failures because of uninitialized variable usage.

You should add the error log here.
You don't mention the condition of failure.
Nobody reproduced a failure so far.

> Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")
> 
> Signed-off-by: Kishore Padmanabha 
> Signed-off-by: Ajit Khaparde 





Re: [dpdk-dev] [PATCH v1 4/4] doc: update iavf driver FDIR/RSS for GTPoGRE

2021-07-09 Thread Thomas Monjalon
07/07/2021 14:57, Lingyu Liu:
> Update 21.08 release note for GTPoGRE FDIR and RSS.
> 
> Signed-off-by: Lingyu Liu 
> ---
>  doc/guides/rel_notes/release_21_08.rst | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_21_08.rst 
> b/doc/guides/rel_notes/release_21_08.rst
> index cd02820e68..df719420b8 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -64,6 +64,8 @@ New Features
>  * **Updated Intel iavf driver.**
>  
>* Added Tx QoS VF queue TC mapping.
> +  * Enable FDIR and RSS for GTPoGRE, support filter based on GTPU TEID/QFI,
> +outer most L3 or inner most l3/l4.

Should be in past tense.
Please update the release notes piece by piece in each patch.





Re: [dpdk-dev] [EXT] Re: [pull-request] next-crypto 21.08 rc1

2021-07-09 Thread Anoob Joseph
Hi David,

Pushed patches for both issues.

http://patches.dpdk.org/project/dpdk/patch/1625812123-802-1-git-send-email-ano...@marvell.com/
http://patches.dpdk.org/project/dpdk/patch/20210709094147.12710-1-ktejas...@marvell.com/

Thanks,
Anoob

> -Original Message-
> From: David Marchand 
> Sent: Friday, July 9, 2021 12:13 PM
> To: Anoob Joseph 
> Cc: Ali Alnubani ; Akhil Goyal
> ; NBU-Contact-Thomas Monjalon
> ; dev@dpdk.org; Tejasree Kondoj
> 
> Subject: Re: [EXT] Re: [dpdk-dev] [pull-request] next-crypto 21.08 rc1
> 
> On Fri, Jul 9, 2021 at 6:25 AM Anoob Joseph  wrote:
> >
> > Hi Ali, David,
> >
> > Thanks for pointing it out. The suggestion from David should do. Should we
> submit a patch with fix on master? How do you suggest we proceed?
> 
> Yes, please send a patch.
> 
> I'll take those three fixes directly in main.
> Probably a bit later today.
> 
> 
> --
> David Marchand



Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 5:34 AM Richael Zhuang  wrote:
>
> Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
> supported, which are both not available on arm64 platforms. Add
> support for cppc_cpufreq driver which works on most arm64 platforms.

GHA reported build issues with clang.
https://github.com/ovsrobot/dpdk/runs/3025478784?check_suite_focus=true#step:15:787

Could you look at it?
Thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

2021-07-09 Thread Bruce Richardson
On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote:
> On Thu, Jul 8, 2021 at 8:41 AM fengchengwen  wrote:
> >
> 
> > >>>
> > >>> It's just more conditionals and branches all through the code. Inside 
> > >>> the
> > >>> user application, the user has to check whether to set the flag or not 
> > >>> (or
> > >>> special-case the last transaction outside the loop), and within the 
> > >>> driver,
> > >>> there has to be a branch whether or not to call the doorbell function. 
> > >>> The
> > >>> code on both sides is far simpler and more readable if the doorbell
> > >>> function is exactly that - a separate function.
> > >>
> > >> I disagree. The reason is:
> > >>
> > >> We will have two classes of applications
> > >>
> > >> a) do dma copy request as and when it has data(I think, this is the
> > >> prime use case), for those,
> > >> I think, it is considerable overhead to have two function invocation
> > >> per transfer i.e
> > >> rte_dma_copy() and rte_dma_perform()
> > >>
> > >> b) do dma copy when the data is reached to a logical state,  like copy
> > >> IP frame from Ethernet packets or so,
> > >> In that case, the application will have  a LOGIC to detect when to
> > >> perform it so on the end of
> > >> that rte_dma_copy() flag can be updated to fire the doorbell.
> > >>
> > >> IMO, We are comparing against a branch(flag is already in register) vs
> > >> a set of instructions for
> > >> 1) function pointer overhead
> > >> 2) Need to use the channel context again back in another function.
> > >>
> > >> IMO, a single branch is most optimal from performance PoV.
> > >>
> > > Ok, let's try it and see how it goes.
> >
> > Test result show:
> > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell in 
> > flags
> > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function
> > 3) Both platform could benefit with doorbell in flags if burst < 5
> >
> > There is a performance gain in small bursts (<5). Given the extensive use 
> > of bursts
>  in DPDK applications and users are accustomed to the concept, I do
> not recommend
> > using the 'doorbell' in flags.
> 
> There is NO concept change between one option vs other option. Just
> argument differnet.
> Also, _perform() scheme not used anywhere in DPDK. I
> 
> Regarding performance, I have added dummy instructions to simulate the real 
> work
> load[1], now burst also has some gain in both x86 and arm64[3]
> 
> I have modified your application[2] to dpdk test application to use
> cpu isolation etc.
> So this is gain in flag scheme ad code is checked in to Github[2[
> 


The benchmark numbers all seem very close between the two schemes. On my
team we pretty much have test ioat & idxd drivers ported internally to the
last dmadev draft library, and have sample apps handling traffic using
those. I'll therefore attempt to get these numbers with real traffic on
real drivers to just double check that it's the same as these
microbenchmarks.

Assuming that perf is the same, how to resolve this? Some thoughts:
* As I understand it, the main objection to the separate doorbell function
  is the use of 8-bytes in fastpath slot. Therefore I will also attempt to
  benchmark having the doorbell function not on the same cacheline and check
  perf impact, if any.
* If we don't have a impact to perf by having the doorbell function inside
  the regular "ops" rather than on fastpath cacheline, there is no reason
  we can't implement both schemes. The user can then choose themselves
  whether to doorbell using a flag on last item, or to doorbell explicitly
  using function call.

Of the two schemes, and assuming they are equal, I do have a preference for
the separate function one, primarily from a code readability point of view.
Other than that, I have no strong opinions.

/Bruce


Re: [dpdk-dev] [PATCH 0/4] features for hns3 PMD

2021-07-09 Thread Andrew Rybchenko
On 7/9/21 4:32 AM, Min Hu (Connor) wrote:
> Thanks Andrew,
> "net/hns3: supports disabling PFC by dev configure API"
> this patch could be abandoned.
> 
> The other three can be applied.
> 
> 
> 在 2021/6/14 23:00, Andrew Rybchenko 写道:
>> On 6/13/21 6:05 AM, Min Hu (Connor) wrote:
>>> This patch set contains 4 features for hns3 PMD:
>>> add query basic info support for VF
>>> support for VF modify VLAN filter state
>>> support multiple TC MAC pause
>>> supports disabling PFC by dev configure API
>>>
>>> Chengchang Tang (2):
>>>    net/hns3: add query basic info support for VF
>>>    net/hns3: support for VF modify VLAN filter state
>>>
>>> Huisong Li (2):
>>>    net/hns3: support multiple TC MAC pause
>>>    net/hns3: supports disabling PFC by dev configure API
>>>
>>>   drivers/net/hns3/hns3_cmd.h   |  9 
>>>   drivers/net/hns3/hns3_dcb.c   | 61 +
>>>   drivers/net/hns3/hns3_ethdev.c    |  5 +-
>>>   drivers/net/hns3/hns3_ethdev.h    |  7 +++
>>>   drivers/net/hns3/hns3_ethdev_vf.c | 96
>>> ++-
>>>   drivers/net/hns3/hns3_mbx.h   | 11 -
>>>   6 files changed, 138 insertions(+), 51 deletions(-)
>>>
>>
>> Overall LGTM, but build breakage must be fixed.
>> .

Please, send a new version with the patch dropped.
I'd like patchwork automation to check it once
again.



Re: [dpdk-dev] [PATCH] app/testpmd: fix testpmd doesn't show RSS hash offload

2021-07-09 Thread Andrew Rybchenko
On 7/9/21 6:57 PM, Jie Wang wrote:
> This patch reapply Rx/Tx offloads configuration for all ports
> after the program configuring the device port. When the program
> configures the ports, the default Rx/Tx offloads are modified.
> 
> So it is need to reapply Rx/Tx offloads configuration before
> testpmd showing offloads.
> 
> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Jie Wang 
> ---
>  app/test-pmd/testpmd.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 1cdd3cdd12..7089ae216d 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2475,6 +2475,9 @@ start_port(portid_t pid)
>   }
>  
>   if (port->need_reconfig > 0) {
> + const struct rte_eth_dev *dev = &rte_eth_devices[pi];
> + int k;
> +
>   port->need_reconfig = 0;
>  
>   if (flow_isolate_all) {
> @@ -2508,6 +2511,18 @@ start_port(portid_t pid)
>   port->need_reconfig = 1;
>   return -1;
>   }
> +
> + /* Apply TxRx configuration for all ports */
> + port->dev_conf.txmode = dev->data->dev_conf.txmode;
> + port->dev_conf.rxmode = dev->data->dev_conf.rxmode;
> + /* Apply Rx offloads configuration */
> + for (k = 0; k < port->dev_info.max_rx_queues; k++)
> + port->rx_conf[k].offloads =
> + port->dev_conf.rxmode.offloads;
> + /* Apply Tx offloads configuration */
> + for (k = 0; k < port->dev_info.max_tx_queues; k++)
> + port->tx_conf[k].offloads =
> + port->dev_conf.txmode.offloads;

Does testpmd really require these copies? May be the right fix is to get
rid of these copies at all and show actual
information from data->dev_conf ?




Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq

2021-07-09 Thread David Hunt



On 9/7/2021 4:34 AM, Richael Zhuang wrote:

Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
supported, which are both not available on arm64 platforms. Add
support for cppc_cpufreq driver which works on most arm64 platforms.

Signed-off-by: Richael Zhuang 
---



--snip--

Hi Richael,

It's usuall to inherit the tags from previous versions (such as 
Acked-by:, Tested-by:, etc.) if the changes are relatively minor. That 
helps the patchwork flags keep up to date.


And by "inherit" I mean you can add them in to your patch just below the 
Signed-off-by: tag.


If it's a major re-work of a patch, then there might need to be 
re-review or re-ack.


If you’re doing a re-spin of this patch set for the clang issue, that 
might be a good opportunity to include the tags from previous versions.


Rgds,
Dave.





Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq

2021-07-09 Thread Richael Zhuang


> -Original Message-
> From: David Marchand 
> Sent: Friday, July 9, 2021 5:10 PM
> To: Richael Zhuang 
> Cc: dev ; Yu Jiang ; David Hunt
> 
> Subject: Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq
> 
> On Fri, Jul 9, 2021 at 5:34 AM Richael Zhuang 
> wrote:
> >
> > Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
> > supported, which are both not available on arm64 platforms. Add
> > support for cppc_cpufreq driver which works on most arm64 platforms.
> 
> GHA reported build issues with clang.
> https://github.com/ovsrobot/dpdk/runs/3025478784?check_suite_focus=tru
> e#step:15:787
> 
> Could you look at it?
> Thanks.
> 
> --
> David Marchand

OK, I will fix it soon.


[dpdk-dev] [PATCH v8 0/2] power: add support for cppc cpufreq driver

2021-07-09 Thread Richael Zhuang
v8:
fix build issue with clang

Richael Zhuang (2):
  power: add support for cppc cpufreq
  test/power: round cpuinfo cur freq only in CPPC cpufreq

 app/test/test_power.c  |   3 +-
 app/test/test_power_cpufreq.c  |  26 +-
 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build  |   1 +
 lib/power/power_cppc_cpufreq.c | 680 +
 lib/power/power_cppc_cpufreq.h | 230 +
 lib/power/rte_power.c  |  26 +
 lib/power/rte_power.h  |   2 +-
 8 files changed, 960 insertions(+), 12 deletions(-)
 create mode 100644 lib/power/power_cppc_cpufreq.c
 create mode 100644 lib/power/power_cppc_cpufreq.h

-- 
2.20.1



[dpdk-dev] [PATCH v8 1/2] power: add support for cppc cpufreq

2021-07-09 Thread Richael Zhuang
Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are
supported, which are both not available on arm64 platforms. Add
support for cppc_cpufreq driver which works on most arm64 platforms.

Signed-off-by: Richael Zhuang 
Acked-by: David Hunt 

---
 app/test/test_power.c  |   3 +-
 app/test/test_power_cpufreq.c  |   3 +-
 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build  |   1 +
 lib/power/power_cppc_cpufreq.c | 680 +
 lib/power/power_cppc_cpufreq.h | 230 +
 lib/power/rte_power.c  |  26 +
 lib/power/rte_power.h  |   2 +-
 8 files changed, 946 insertions(+), 3 deletions(-)
 create mode 100644 lib/power/power_cppc_cpufreq.c
 create mode 100644 lib/power/power_cppc_cpufreq.h

diff --git a/app/test/test_power.c b/app/test/test_power.c
index da1d67c0ab..b7b5561348 100644
--- a/app/test/test_power.c
+++ b/app/test/test_power.c
@@ -133,7 +133,8 @@ test_power(void)
/* Perform tests for valid environments.*/
const enum power_management_env envs[] = {PM_ENV_ACPI_CPUFREQ,
PM_ENV_KVM_VM,
-   PM_ENV_PSTATE_CPUFREQ};
+   PM_ENV_PSTATE_CPUFREQ,
+   PM_ENV_CPPC_CPUFREQ};
 
unsigned int i;
for (i = 0; i < RTE_DIM(envs); ++i) {
diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 0c3adc5f33..8516df4ca6 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -496,7 +496,8 @@ test_power_cpufreq(void)
 
/* Test environment configuration */
env = rte_power_get_env();
-   if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ)) {
+   if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ) &&
+   (env != PM_ENV_CPPC_CPUFREQ)) {
printf("Unexpectedly got an environment other than 
ACPI/PSTATE\n");
goto fail_all;
}
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index c92e016783..6fd9f0168a 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -103,6 +103,10 @@ New Features
   usecases. Configuration happens via standard rawdev enq/deq operations. See
   the :doc:`../rawdevs/cnxk_bphy` rawdev guide for more details on this driver.
 
+* **Added cppc_cpufreq support to Power Management library.**
+
+  Added support for cppc_cpufreq driver which works on most arm64 platforms.
+
 
 Removed Items
 -
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..36e5a65874 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -9,6 +9,7 @@ sources = files(
 'guest_channel.c',
 'power_acpi_cpufreq.c',
 'power_common.c',
+'power_cppc_cpufreq.c',
 'power_kvm_vm.c',
 'power_pstate_cpufreq.c',
 'rte_power.c',
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
new file mode 100644
index 00..e92973ab54
--- /dev/null
+++ b/lib/power/power_cppc_cpufreq.c
@@ -0,0 +1,680 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2021 Intel Corporation
+ * Copyright(c) 2021 Arm Limited
+ */
+
+#include 
+#include 
+
+#include "power_cppc_cpufreq.h"
+#include "power_common.h"
+
+/* macros used for rounding frequency to nearest 10 */
+#define FREQ_ROUNDING_DELTA 5
+#define ROUND_FREQ_TO_N_10 10
+
+/* the unit of highest_perf and nominal_perf differs on different arm 
platforms.
+ * For highest_perf, it maybe 300 or 300, both means 3.0GHz.
+ */
+#define UNIT_DIFF 1
+
+#define POWER_CONVERT_TO_DECIMAL 10
+
+#define POWER_GOVERNOR_USERSPACE "userspace"
+#define POWER_SYSFILE_SETSPEED   \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed"
+#define POWER_SYSFILE_SCALING_MAX_FREQ \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_max_freq"
+#define POWER_SYSFILE_SCALING_MIN_FREQ  \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_min_freq"
+#define POWER_SYSFILE_HIGHEST_PERF \
+   "/sys/devices/system/cpu/cpu%u/acpi_cppc/highest_perf"
+#define POWER_SYSFILE_NOMINAL_PERF \
+   "/sys/devices/system/cpu/cpu%u/acpi_cppc/nominal_perf"
+#define POWER_SYSFILE_SYS_MAX \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/cpuinfo_max_freq"
+
+#define POWER_CPPC_DRIVER "cppc-cpufreq"
+#define BUS_FREQ 10
+
+enum power_state {
+   POWER_IDLE = 0,
+   POWER_ONGOING,
+   POWER_USED,
+   POWER_UNKNOWN
+};
+
+/**
+ * Power info per lcore.
+ */
+struct cppc_power_info {
+   unsigned int lcore_id;   /**< Logical core id */
+   uint32_t state;  /**< Power in use state */
+   FILE *f; /**< FD of scaling_setspeed */
+   char governor_o

[dpdk-dev] [PATCH v8 2/2] test/power: round cpuinfo cur freq only in CPPC cpufreq

2021-07-09 Thread Richael Zhuang
On arm platform, the value in "/sys/.../cpuinfo_cur_freq" may not
be exactly the same as what was set when using CPPC cpufreq driver.
For other cpufreq driver, no need to round it currently, or else
this check will fail with turbo enabled. For example, with acpi_cpufreq,
cpuinfo_cur_freq can be 2401000 which is equal to freqs[0].It should
not be rounded to 240.

Fixes: 606a234c6d360 ("test/power: round CPU frequency to check")
Cc: richael.zhu...@arm.com
Cc: sta...@dpdk.org

Signed-off-by: Richael Zhuang 
Acked-by: David Hunt 
---
 app/test/test_power_cpufreq.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 8516df4ca6..b8fc53925c 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -55,7 +55,9 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool 
turbo)
FILE *f;
char fullpath[PATH_MAX];
char buf[BUFSIZ];
+   enum power_management_env env;
uint32_t cur_freq;
+   uint32_t freq_conv;
int ret = -1;
int i;
 
@@ -80,15 +82,18 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool 
turbo)
goto fail_all;
 
cur_freq = strtoul(buf, NULL, TEST_POWER_CONVERT_TO_DECIMAL);
-
-   /* convert the frequency to nearest 10 value
-* Ex: if cur_freq=1396789 then freq_conv=140
-* Ex: if cur_freq=800030 then freq_conv=80
-*/
-   unsigned int freq_conv = 0;
-   freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA)
-   / TEST_ROUND_FREQ_TO_N_10;
-   freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_10;
+   freq_conv = cur_freq;
+
+   env = rte_power_get_env();
+   if (env == PM_ENV_CPPC_CPUFREQ) {
+   /* convert the frequency to nearest 10 value
+* Ex: if cur_freq=1396789 then freq_conv=140
+* Ex: if cur_freq=800030 then freq_conv=80
+*/
+   freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA)
+   / TEST_ROUND_FREQ_TO_N_10;
+   freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_10;
+   }
 
if (turbo)
ret = (freqs[idx] <= freq_conv ? 0 : -1);
-- 
2.20.1



Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows

2021-07-09 Thread Thomas Monjalon
09/07/2021 02:16, Tyler Retzlaff:
> On Thu, Jul 08, 2021 at 10:39:13PM +0200, Thomas Monjalon wrote:
> > 08/07/2021 21:21, Tyler Retzlaff:
> > > (2) importing exported data symbols from a dll/dso on windows requires
> > > that the symbol be decorated with dllimport. optionally loading
> > > performance of dll/dso is also further improved by decorating
> > > exported function symbols. [3]
> > > 
> > > for (2) we would propose the introduction and use of two macros to
> > > allow decoration of exported data symbols. these macro would be or
> > > similarly named __rte_import and __rte_export. of note
> > 
> > That's the same symbol declared in a single place
> > which is exported and imported.
> > So I don't understand the need for 2 macros.
> 
> i may be misinterpreting your reply. you're saying there is no need for
> 2 because we use .def files?
> 
> strictly speaking when exporting C symbols this is true. so yes, we
> could introduce only __rte_import and not bother with __rte_export.
> 
> is that what you meant?
> 
> i don't have any objection to just __rte_import alone but it is
> mandatory for data symbols.

It may be my misunderstanding.
The function is declared only once in the .h
so I don't understand where these 2 macros are used.




Re: [dpdk-dev] [PATCH] crypto/cnxk: add PCI ID for cn9k

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 8:29 AM Anoob Joseph  wrote:
>
> Add PCI ID for crypo_cn9k PMD.
>
> To avoid conflicting PCI ID in crypto_octeontx2 and
> crypto_cn9k PMDs, disable crypto_cn9k PMD when built with
> octeontx2 config.
>
> The lack of PCI ID is causing debug build to fail for
> crypto_cn9k PMD.
>
> Reported-by: Ali Alnubani 
> Suggested-by: David Marchand 
> Signed-off-by: Anoob Joseph 

Applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH] crypto/cnxk: fix debug build failure

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 10:47 AM Tejasree Kondoj  wrote:
>
> Removing usage of unavailable macro.

Fixes: baee42a6beff ("crypto/cnxk: add IPsec datapath")

>
> Reported-by: Ali Alnubani 
> Suggested-by: David Marchand 
> Signed-off-by: Tejasree Kondoj 

Applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v1] net/octeontx/base: fix debug build with clang

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 8:03 AM Shijith Thotton  wrote:
>
> From: David Marchand 
>
> Remove conflicting declaration of this symbol.
>
> Fixes: d0d654986018 ("net/octeontx: support event Rx adapter")
> Cc: sta...@dpdk.org
>
> Signed-off-by: David Marchand 

Applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native builds

2021-07-09 Thread Thomas Monjalon
07/07/2021 15:59, Bruce Richardson:
> On Tue, Jul 06, 2021 at 11:44:28AM +0200, Juraj Linkeš wrote:
> > The current meson option 'machine' should only specify the ISA, which is
> > not sufficient for Arm, where setting ISA implies other settings as well
> > (and is used in Arm configuration as such).
> > Use the existing 'platform' meson option to differentiate the type of
> > the build (native/generic) and set ISA accordingly, unless the user
> > chooses to override it with a new option, 'cpu_instruction_set'.
> > The 'machine' option set the ISA in x86 builds and set native/default
> > 'build type' in aarch64 builds. These two new variables, 'platform' and
> > 'cpu_instruction_set', now properly set both ISA and build type for all
> > architectures in a uniform manner.
> > The 'machine' option also doesn't describe very well what it sets. The
> > new option, 'cpu_instruction_set', is much more descriptive. Keep
> > 'machine' for backwards compatibility.
> > 
> > Signed-off-by: Juraj Linkeš 
> 
> Acked-by: Bruce Richardson 

Applied, thanks.

I did few doc formatting improvements and removed some mentions to Arm
as a SoC is not necessarily Arm (Risc-V is coming).




Re: [dpdk-dev] [PATCH] doc: fix build on Windows with meson 0.58

2021-07-09 Thread Thomas Monjalon
01/07/2021 11:57, Bruce Richardson:
> On Thu, Jul 01, 2021 at 09:34:04AM +0100, Luca Boccassi wrote:
> > On Wed, 2021-06-30 at 19:22 +0300, Dmitry Kozlyuk wrote:
> > > The `doc` target used `echo` as its command.
> > > On Windows, `echo` is always a shell built-in, there is no binary.
> > > Starting from meson 0.58, `run_target()` always searches for command
> > > executable and no longer accepts `echo` as such on Windows.
> > > Replace plain `echo` with a Python one-liner.
> > > 
> > > Fixes: d02a2dab2dfb ("doc: support building HTML guides with meson")
> > > Cc: Bruce Richardson 
> > > Cc: Luca Boccassi 
> > > Cc: sta...@dpdk.org
> > > 
> > > Reported-by: Rob Scheepens 
> > > Signed-off-by: Dmitry Kozlyuk 
> > > ---
> > 
> > Acked-by: Luca Boccassi 
> 
> One small suggestion might be to move the "echo" command definition to
> buildtools folder in case it's wanted for use anywhere in the build at some
> point in the future. However, this patch is fine without that change too.

I did the move in buildtools/meson.build:
echo = py3 + ['-c', 'import sys; print(*sys.argv[1:])']

> Acked-by: Bruce Richardson 

Acked-by: Thomas Monjalon 

Applied, thanks




Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison

2021-07-09 Thread Thomas Monjalon
08/07/2021 16:13, Anatoly Burakov:
>  doc/guides/rel_notes/release_21_08.rst|  2 ++
>  drivers/event/dlb2/dlb2.c | 17 --
>  drivers/net/i40e/i40e_rxtx.c  | 20 +++
>  drivers/net/iavf/iavf_rxtx.c  | 20 +++
>  drivers/net/ice/ice_rxtx.c| 20 +++
>  drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++
>  drivers/net/mlx5/mlx5_rx.c| 17 --
>  .../include/generic/rte_power_intrinsics.h| 33 +++
>  lib/eal/x86/rte_power_intrinsics.c| 17 +-
>  9 files changed, 122 insertions(+), 44 deletions(-)

About the title, it is introducing a new prefix "power_intrinsics:"
with is not so much descriptive.
Probably better to formulate with "eal:" prefix.


> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t 
> rx_queue_id)
>   return rx_queue_count(rxq);
>  }
>  
> +#define CLB_VAL_IDX 0
> +#define CLB_MSK_IDX 1
> +static int
> +mlx_monitor_callback(const uint64_t value,
> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])

Everything is prefixed with mlx5, let's be consistent.
Please replace mlx_ with mlx5_





Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native builds

2021-07-09 Thread Juraj Linkeš


> -Original Message-
> From: Thomas Monjalon 
> Sent: Friday, July 9, 2021 2:31 PM
> To: Juraj Linkeš ; Bruce Richardson
> 
> Cc: dev@dpdk.org; david.march...@redhat.com;
> honnappa.nagaraha...@arm.com; ruifeng.w...@arm.com;
> ferruh.yi...@intel.com; jerinjac...@gmail.com
> Subject: Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native
> builds
> 
> 07/07/2021 15:59, Bruce Richardson:
> > On Tue, Jul 06, 2021 at 11:44:28AM +0200, Juraj Linkeš wrote:
> > > The current meson option 'machine' should only specify the ISA,
> > > which is not sufficient for Arm, where setting ISA implies other
> > > settings as well (and is used in Arm configuration as such).
> > > Use the existing 'platform' meson option to differentiate the type
> > > of the build (native/generic) and set ISA accordingly, unless the
> > > user chooses to override it with a new option, 'cpu_instruction_set'.
> > > The 'machine' option set the ISA in x86 builds and set
> > > native/default 'build type' in aarch64 builds. These two new
> > > variables, 'platform' and 'cpu_instruction_set', now properly set
> > > both ISA and build type for all architectures in a uniform manner.
> > > The 'machine' option also doesn't describe very well what it sets.
> > > The new option, 'cpu_instruction_set', is much more descriptive.
> > > Keep 'machine' for backwards compatibility.
> > >
> > > Signed-off-by: Juraj Linkeš 
> >
> > Acked-by: Bruce Richardson 
> 
> Applied, thanks.
> 
> I did few doc formatting improvements and removed some mentions to Arm as
> a SoC is not necessarily Arm (Risc-V is coming).
> 
> 

Ok, thanks, Thomas.


Re: [dpdk-dev] [PATCH v8 0/2] power: add support for cppc cpufreq driver

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 12:56 PM Richael Zhuang  wrote:
>
> v8:
> fix build issue with clang

This cover letter is empty and we only have a partial changelog.
I'll trust David Hunt reviews and go ahead with the series.
Please put some effort in the cover letter and tracking changes for
future contributions.


>
> Richael Zhuang (2):
>   power: add support for cppc cpufreq
>   test/power: round cpuinfo cur freq only in CPPC cpufreq
>
>  app/test/test_power.c  |   3 +-
>  app/test/test_power_cpufreq.c  |  26 +-
>  doc/guides/rel_notes/release_21_08.rst |   4 +
>  lib/power/meson.build  |   1 +
>  lib/power/power_cppc_cpufreq.c | 680 +
>  lib/power/power_cppc_cpufreq.h | 230 +
>  lib/power/rte_power.c  |  26 +
>  lib/power/rte_power.h  |   2 +-
>  8 files changed, 960 insertions(+), 12 deletions(-)
>  create mode 100644 lib/power/power_cppc_cpufreq.c
>  create mode 100644 lib/power/power_cppc_cpufreq.h

Series applied, thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues

2021-07-09 Thread David Marchand
On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov
 wrote:
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 403c2b03a3..a96e12d155 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
> ``rte_eth_tx_burst_mode_get()``.
>
> +.. _nic_features_get_monitor_addr:
> +
> +PMD power management using monitor addresses
> +
> +
> +Supports getting a monitoring condition to use together with Ethernet PMD 
> power
> +management (see :doc:`../prog_guide/power_man` for more details).
> +
> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
> +

- This new ethdev feature deserves its own commit.

- Adding ethdev maintainers.

We are missing a doc/guides/nics/features/default.ini.
The name of the features proposed here is rather long.
As far as I can see, features should be shorter than:
doc/guides/conf.py:feature_str_len = 30
I am not really inspired.. "Power mgmt address monitor"?

- pmd supporting this feature must have their .ini updated.


>  .. _nic_features_other:
>
>  Other dev ops not represented by a Feature



-- 
David Marchand



Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison

2021-07-09 Thread Burakov, Anatoly

On 09-Jul-21 2:46 PM, Thomas Monjalon wrote:

08/07/2021 16:13, Anatoly Burakov:

  doc/guides/rel_notes/release_21_08.rst|  2 ++
  drivers/event/dlb2/dlb2.c | 17 --
  drivers/net/i40e/i40e_rxtx.c  | 20 +++
  drivers/net/iavf/iavf_rxtx.c  | 20 +++
  drivers/net/ice/ice_rxtx.c| 20 +++
  drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++
  drivers/net/mlx5/mlx5_rx.c| 17 --
  .../include/generic/rte_power_intrinsics.h| 33 +++
  lib/eal/x86/rte_power_intrinsics.c| 17 +-
  9 files changed, 122 insertions(+), 44 deletions(-)


About the title, it is introducing a new prefix "power_intrinsics:"
with is not so much descriptive.
Probably better to formulate with "eal:" prefix.







--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
return rx_queue_count(rxq);
  }
  
+#define CLB_VAL_IDX 0

+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])


Everything is prefixed with mlx5, let's be consistent.
Please replace mlx_ with mlx5_



Sure, will fix.

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues

2021-07-09 Thread Burakov, Anatoly

On 09-Jul-21 3:24 PM, David Marchand wrote:

On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov
 wrote:

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 403c2b03a3..a96e12d155 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
``rte_eth_tx_burst_mode_get()``.

+.. _nic_features_get_monitor_addr:
+
+PMD power management using monitor addresses
+
+
+Supports getting a monitoring condition to use together with Ethernet PMD power
+management (see :doc:`../prog_guide/power_man` for more details).
+
+* **[implements] eth_dev_ops**: ``get_monitor_addr``
+


- This new ethdev feature deserves its own commit.

- Adding ethdev maintainers.

We are missing a doc/guides/nics/features/default.ini.
The name of the features proposed here is rather long.
As far as I can see, features should be shorter than:
doc/guides/conf.py:feature_str_len = 30
I am not really inspired.. "Power mgmt address monitor"?

- pmd supporting this feature must have their .ini updated.



  .. _nic_features_other:

  Other dev ops not represented by a Feature






You'd have to walk me through/translate whatever it is that you just 
said as i have no idea what any of that means :D


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 4:42 PM Burakov, Anatoly
 wrote:
> You'd have to walk me through/translate whatever it is that you just
> said as i have no idea what any of that means :D

https://git.dpdk.org/dpdk/commit/?id=fa5dbd825a


-- 
David Marchand



Re: [dpdk-dev] [PATCH v8 7/7] l3fwd-power: support multiqueue in PMD pmgmt modes

2021-07-09 Thread David Marchand
On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov
 wrote:
>
> Currently, l3fwd-power enforces the limitation of having one queue per
> lcore. This is no longer necessary, so remove the limitation.

Title prefix is examples/l3fwd-power: for consistency sake.


-- 
David Marchand



Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues

2021-07-09 Thread Burakov, Anatoly

On 09-Jul-21 3:46 PM, David Marchand wrote:

On Fri, Jul 9, 2021 at 4:42 PM Burakov, Anatoly
 wrote:

You'd have to walk me through/translate whatever it is that you just
said as i have no idea what any of that means :D


https://git.dpdk.org/dpdk/commit/?id=fa5dbd825a




Thanks, i'll separate it out into a separate commit and try to come up 
with a good name for it!


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences

2021-07-09 Thread Anatoly Burakov
Currently, the error paths can lead to attempts at dereferencing NULL
pointers. Add the check to avoid attempts at dereferencing NULL
pointers.

Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support")
Cc: anatoly.bura...@intel.com

Signed-off-by: Anatoly Burakov 
---
 lib/power/power_pstate_cpufreq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index ba28ddcfca..3b607515fd 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -440,8 +440,10 @@ power_get_available_freqs(struct pstate_power_info *pi)
num_freqs, pi->lcore_id);
 
 out:
-   fclose(f_min);
-   fclose(f_max);
+   if (f_min != NULL)
+   fclose(f_min);
+   if (f_max != NULL)
+   fclose(f_max);
 
return ret;
 }
-- 
2.25.1



Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

2021-07-09 Thread David Marchand
On Thu, Jun 24, 2021 at 4:35 PM Burakov, Anatoly
 wrote:
> Right, so the idea is store the PMD-specific data in the monitor
> condition, and leave it to the callback to interpret it.
>
> The obvious question then is, how many values is enough? Two? Three?
> Four? This option doesn't really solve the basic issue, it just kicks
> the can down the road. I guess three values should be enough for
> everyone (tm) ? :D

Can we have multiple callbacks executed for a given rxq?

Since this is on the rx path, I would say no, but I might be missing
some consideration for future evols of this API.
In this case, a private field in rxq seems a good place for temporary
storage, and then we simply pass a void * opaque pointer.


-- 
David marchand



Re: [dpdk-dev] [dpdk-stable] [PATCH v3] build: fix symlink of drivers for Windows

2021-07-09 Thread Thomas Monjalon
28/05/2021 10:19, Bruce Richardson:
> On Mon, Apr 26, 2021 at 11:07:32AM +0100, Nick Connolly wrote:
> > The symlink-drivers-solibs.sh script was disabled as part of 'install'
> > for Windows because there is no support for shell scripts. However,
> > this means that driver related DLLs are not present in the installed
> > 'libdir' directory. Add a python script to perform the install and use
> > it for Windows if the version of meson supports using an external
> > program with add_install_script (>= 0.55.0).
> > 
> > On Windows, symbolic links are somewhat problematic since the
> > SeCreateSymbolicLinkPrivilege is required to be able to create them.
> > In addition, different cross-compilation environments handle symbolic
> > links differently, e.g. WSL, Msys2, Cygwin. Rather than trying to
> > distinguish these scenarios, the python script will perform a file copy
> > for any Windows specific names.
> > 
> > On Windows, the shared library outputs have different names depending
> > upon which toolset has been used to build them. The script currently
> > handles Clang and GCC.
> > 
> > On Linux the functionality is unchanged, but could be replaced with the
> > python script once the required minimum version of meson is >= 0.55.0.
> > 
> > Fixes: 5c7d86948764 ("build: fix install on Windows")
> > Cc: sta...@dpdk.org
> > 
> > Signed-off-by: Nick Connolly 
> > Tested-by: Narcisa Vasile 
> > Acked-by: Narcisa Vasile 
> > ---
> Reviewed-by: Bruce Richardson 

Added the new file in MAINTAINERS.

Applied, thanks.




Re: [dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 5:02 PM Anatoly Burakov
 wrote:
>
> Currently, the error paths can lead to attempts at dereferencing NULL
> pointers. Add the check to avoid attempts at dereferencing NULL
> pointers.
>

Coverity issue: 371895

> Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support")
>
> Signed-off-by: Anatoly Burakov 

Reviewed-by: David Marchand 


-- 
David Marchand



Re: [dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences

2021-07-09 Thread David Hunt



On 9/7/2021 4:09 PM, David Marchand wrote:

On Fri, Jul 9, 2021 at 5:02 PM Anatoly Burakov
 wrote:

Currently, the error paths can lead to attempts at dereferencing NULL
pointers. Add the check to avoid attempts at dereferencing NULL
pointers.


Coverity issue: 371895



This patch also fixes

Coverity issue: 371889

Rgds,

Dave





Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support")

Signed-off-by: Anatoly Burakov 

Reviewed-by: David Marchand 




Re: [dpdk-dev] [PATCH] kni: update link only on change

2021-07-09 Thread Thomas Monjalon
24/06/2021 15:32, Ferruh Yigit:
> 'rte_kni_update_link()' updates virtual KNI interface link using kernel
> sysfs interface.
> If the requested link status is same as interface link status, do not
> update the link status but return with success.
> 
> Signed-off-by: Ferruh Yigit 

Applied, thanks.





[dpdk-dev] [PATCH 1/3] bitrate: change reg implementation to match API description

2021-07-09 Thread Kevin Traynor
rte_stats_bitrate_reg() API states it returns 'Zero on success'.

However, the implementation directly returns the return of
rte_metrics_reg_names() which may be zero or positive on success,
with a positive value also indicating the index.

The user of rte_stats_bitrate_reg() should not care about the
index as it is stored in the opaque rte_stats_bitrates struct.

Change the implementation of rte_stats_bitrate_reg() to match
the API description by always returning zero on success.

Fixes: 2ad7ba9a6567 ("bitrate: add bitrate statistics library")

Signed-off-by: Kevin Traynor 
---
 lib/bitratestats/rte_bitrate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/bitratestats/rte_bitrate.c b/lib/bitratestats/rte_bitrate.c
index 8fd9f47288..e23e38bc94 100644
--- a/lib/bitratestats/rte_bitrate.c
+++ b/lib/bitratestats/rte_bitrate.c
@@ -56,6 +56,8 @@ rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data)
 
return_value = rte_metrics_reg_names(&names[0], RTE_DIM(names));
-   if (return_value >= 0)
+   if (return_value >= 0) {
bitrate_data->id_stats_set = return_value;
+   return 0;
+   }
return return_value;
 }
-- 
2.31.1



[dpdk-dev] [PATCH 2/3] bitrate: change calc implementation to match API description

2021-07-09 Thread Kevin Traynor
rte_stats_bitrate_calc() API states it returns 'Negative value on error'.

However, the implementation will return the error code from
rte_eth_stats_get() which may be non-zero on error.

Change the implementation of rte_stats_bitrate_calc() to match
the API description by always returning a negative value on error.

Fixes: 2ad7ba9a6567 ("bitrate: add bitrate statistics library")

Signed-off-by: Kevin Traynor 
---
 lib/bitratestats/rte_bitrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bitratestats/rte_bitrate.c b/lib/bitratestats/rte_bitrate.c
index e23e38bc94..1664e4863b 100644
--- a/lib/bitratestats/rte_bitrate.c
+++ b/lib/bitratestats/rte_bitrate.c
@@ -81,5 +81,5 @@ rte_stats_bitrate_calc(struct rte_stats_bitrates 
*bitrate_data,
ret_code = rte_eth_stats_get(port_id, ð_stats);
if (ret_code != 0)
-   return ret_code;
+   return ret_code < 0 ? ret_code : -ret_code;
 
port_data = &bitrate_data->port_stats[port_id];
-- 
2.31.1



[dpdk-dev] [PATCH 3/3] bitrate: promote rte_stats_bitrate_free() to stable

2021-07-09 Thread Kevin Traynor
rte_stats_bitrate_free() has been in DPDK since 20.11.

Its signature is very basic as it just frees an opaque
data struct allocated in rte_stats_bitrate_create()
and returns void.

It's unlikely that such a basic signature would need to change
so might as well promote it to stable for the next major ABI.

Cc: Hemant Agrawal 

Signed-off-by: Kevin Traynor 
---
 lib/bitratestats/rte_bitrate.h | 3 ---
 lib/bitratestats/version.map   | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/bitratestats/rte_bitrate.h b/lib/bitratestats/rte_bitrate.h
index fcd1564ddc..e494389b95 100644
--- a/lib/bitratestats/rte_bitrate.h
+++ b/lib/bitratestats/rte_bitrate.h
@@ -8,6 +8,4 @@
 #include 
 
-#include 
-
 #ifdef __cplusplus
 extern "C" {
@@ -36,5 +34,4 @@ struct rte_stats_bitrates *rte_stats_bitrate_create(void);
  *   Pointer allocated by rte_stats_bitrate_create()
  */
-__rte_experimental
 void rte_stats_bitrate_free(struct rte_stats_bitrates *bitrate_data);
 
diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map
index 152730bb4e..a14d21ebba 100644
--- a/lib/bitratestats/version.map
+++ b/lib/bitratestats/version.map
@@ -9,7 +9,7 @@ DPDK_21 {
 };
 
-EXPERIMENTAL {
+DPDK_22 {
global:
 
rte_stats_bitrate_free;
-};
+} DPDK_21;
-- 
2.31.1



Re: [dpdk-dev] [PATCH v2 01/20] crypto/cnxk: add driver skeleton

2021-07-09 Thread David Marchand
Hello,

On Thu, Jul 8, 2021 at 10:15 PM David Marchand
 wrote:
> - Reproduced the issue in Ubuntu 18.04 GHA vms:
>
> FAILED: drivers/rte_crypto_cnxk.pmd.c
> /usr/bin/python3 ../buildtools/gen-pmdinfo-cfile.py
> /home/runner/work/dpdk/dpdk/build/buildtools ar
> /home/runner/work/dpdk/dpdk/build/drivers/libtmp_rte_crypto_cnxk.a
> drivers/rte_crypto_cnxk.pmd.c /usr/bin/python3
> ../buildtools/pmdinfogen.py elf
> Traceback (most recent call last):
> File "../buildtools/pmdinfogen.py", line 274, in 
> main()
> File "../buildtools/pmdinfogen.py", line 269, in main
> drivers = load_drivers(image)
> File "../buildtools/pmdinfogen.py", line 203, in load_drivers
> drivers.append(Driver.load(image, symbol))
> File "../buildtools/pmdinfogen.py", line 157, in load
> driver.pci_ids = cls._load_pci_ids(image, pci_table_name_symbol)
> File "../buildtools/pmdinfogen.py", line 177, in _load_pci_ids
> pci_id = rte_pci_id.from_buffer_copy(data)
> ValueError: Buffer size too small (4 instead of at least 12 bytes)

I spent some time trying to understand this error, there is probably
multiple bug(s) somewhere between pyelftools/ctype/pmdinfogen.

In any case, on my f32, before
https://git.dpdk.org/dpdk/commit/?id=b146c30d3c0e (that hides the
issue), with a more "recent" pyelftools + python than Ubuntu 18.04, I
can see:

$ cat $HOME/builds/build-clang-shared/drivers/rte_crypto_cnxk.pmd.c
static __attribute__((unused)) const char *generator =
"../../dpdk/buildtools/pmdinfogen.py";
const char crypto_cn9k_pmd_info[] __attribute__((used)) =
"PMD_INFO_STRING= {\"name\": \"crypto_cn9k\", \"kmod\": \"vfio-pci\",
\"pci_ids\": [[10272, 25926, 28516, 24946], [11824, 11569, 11827,
25446]]}";
const char crypto_cn10k_pmd_info[] __attribute__((used)) =
"PMD_INFO_STRING= {\"name\": \"crypto_cn10k\", \"kmod\": \"vfio-pci\",
\"pci_ids\": [[6013, 41203, 65535, 65535]]}";

$ nm -S /home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so
|grep cpt_table
0001d400 0018 b pci_id_cpt_table
0001d2f0 0018 d pci_id_cpt_table

You'll notice the former is located in bss.

$ hexdump -s $(($(printf "%d" 0x0001d400) - $(printf "%d"
0x1000))) -n $(printf "%d" 0x18)
/home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so
001c400 6e61 2067 6576 7372 6f69 206e 3031 302e
001c410 312e 2820 6546 6f64
001c418

$ hexdump -s $(($(printf "%d" 0x0001d2f0) - $(printf "%d"
0x1000))) -n $(printf "%d" 0x18)
/home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so
001c2f0  00ff 177d a0f3    
001c300    
001c308

Those values match what the pmd.c file contains.

Are we missing a "memset()" for such symbols in pmdinfogen?
Or maybe a warning that we are adding an empty pci ids table..

There is no urgency, but if you find some time to look at this. Thanks.


-- 
David Marchand



Re: [dpdk-dev] [PATCH V6] config/arm: add Qualcomm Centriq 2400 part number

2021-07-09 Thread Thomas Monjalon
21/06/2021 03:52, Ruifeng Wang:
> From: Thomas Monjalon 
> > 18/06/2021 10:53, Thierry Herbelot:
> > > On 6/18/21 10:51 AM, Thomas Monjalon wrote:
> > > > 18/06/2021 04:09, Ruifeng Wang:
> > > >> From: Thierry Herbelot 
> > > >>>   'part_number_config': {
> > > >>> -'0xc00': {'machine_args':  ['-march=armv8-a+crc']}
> > > >>> +'0x800': {'machine_args':  ['-march=armv8-a+crc']},
> > > >>> +'0xc00': {'machine_args':  ['-march=armv8-a+crc']},
> > > >> Nit, redundant comma at the end of the line.
> > > >
> > > > What is redundant?
> > >
> > > The comma at the end of the second line is not necessary.
> > 
> > It is a good practice to have comma like other lines, so no need to update 
> > this
> > line when adding more.
> > 
> Looked at style in the rest of the file. Just wanted them to be aligned.
> I'm fine with a trailing comma at the last line.
> 
> Acked-by: Ruifeng Wang 

Applied, thanks





Re: [dpdk-dev] [PATCH] maintainers: update for ARM v8

2021-07-09 Thread Thomas Monjalon
25/06/2021 16:27, jer...@marvell.com:
> From: Jerin Jacob 
> 
> Resigning my maintainership for ARM v8 architecture.
> 
> Signed-off-by: Jerin Jacob 
> ---
> 
> Resigning due to not getting enough quality time to review arm64 architecture
> patches.
> 
> Unlike those days where arm64 architecture started with Cavium HW,
> Now arm64 architecture is quite matured and at par with x86 support, and
> there is enough contribution from Arm.
> 
> I will put my best effort into reviewing the arm64 architecture patches as 
> time
> permits.

Applied




Re: [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file

2021-07-09 Thread Thomas Monjalon
25/06/2021 10:08, Ferruh Yigit:
> On 6/22/2021 4:50 PM, Ray Kinsella wrote:
> > Update to ABI MAINTAINERS.
> > 
> > Signed-off-by: Ray Kinsella 
> > ---
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> >  ABI Policy & Versioning
> >  M: Ray Kinsella 
> > -M: Neil Horman 
> >  F: lib/eal/include/rte_compat.h
> >  F: lib/eal/include/rte_function_versioning.h
> >  F: doc/guides/contributing/abi_*.rst
> 
> Acked-by: Ferruh Yigit 
> 
> Tried to reach out Neil multiple times for ABI issues without success.

Acked-by: Thomas Monjalon 

Applied




[dpdk-dev] [PATCH v9 2/8] net/af_xdp: add power monitor support

2021-07-09 Thread Anatoly Burakov
Implement support for .get_monitor_addr in AF_XDP driver.

Signed-off-by: Anatoly Burakov 
---

Notes:
v8:
- Fix checkpatch issue

v2:
- Rewrite using the callback mechanism

 drivers/net/af_xdp/rte_eth_af_xdp.c | 34 +
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c 
b/drivers/net/af_xdp/rte_eth_af_xdp.c
index eb5660a3dc..989051dd6d 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "compat.h"
 
@@ -788,6 +789,38 @@ eth_dev_configure(struct rte_eth_dev *dev)
return 0;
 }
 
+#define CLB_VAL_IDX 0
+static int
+eth_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t m = (uint32_t)~0;
+
+   /* if the value has changed, abort entering power optimized state */
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct pkt_rx_queue *rxq = rx_queue;
+   unsigned int *prod = rxq->rx.producer;
+   const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */
+
+   /* watch for changes in producer ring */
+   pmc->addr = (void *)prod;
+
+   /* store current value */
+   pmc->opaque[CLB_VAL_IDX] = cur_val;
+   pmc->fn = eth_monitor_callback;
+
+   /* AF_XDP producer ring index is 32-bit */
+   pmc->size = sizeof(uint32_t);
+
+   return 0;
+}
+
 static int
 eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
@@ -1448,6 +1481,7 @@ static const struct eth_dev_ops ops = {
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
+   .get_monitor_addr = eth_get_monitor_addr
 };
 
 /** parse busy_budget argument */
-- 
2.25.1



[dpdk-dev] [PATCH v9 0/8] Enhancements for PMD power management

2021-07-09 Thread Anatoly Burakov
This patchset introduces several changes related to PMD power management:

- Changed monitoring intrinsics to use callbacks as a comparison function, based
  on previous patchset [1] but incorporating feedback [2] - this hopefully will
  make it possible to add support for .get_monitor_addr in virtio
- Add a new intrinsic to monitor multiple addresses, based on RTM instruction
  set and the TPAUSE instruction
- Add support for PMD power management on multiple queues, as well as all
  accompanying infrastructure and example apps changes

v9:
- Added all missing Acks and Tests
- Added a new commit with NIC features
- Addressed minor issues raised in review

v8:
- Fixed checkpatch issue
- Added comment explaining empty poll handling (Konstantin)

v7:
- Fixed various bugs

v6:
- Improved the algorithm for multi-queue sleep
- Fixed segfault and addressed other feedback

v5:
- Removed "power save queue" API and replaced with mechanism suggested by
  Konstantin
- Addressed other feedback

v4:
- Replaced raw number with a macro
- Fixed all the bugs found by Konstantin
- Some other minor corrections

v3:
- Moved some doc updates to NIC features list

v2:
- Changed check inversion to callbacks
- Addressed feedback from Konstantin
- Added doc updates where necessary

[1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=*
[2] 
http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274

Anatoly Burakov (8):
  eal: use callbacks for power monitoring comparison
  net/af_xdp: add power monitor support
  doc: add PMD power management NIC feature
  eal: add power monitor for multiple events
  power: remove thread safety from PMD power API's
  power: support callbacks for multiple Rx queues
  power: support monitoring multiple Rx queues
  examples/l3fwd-power: support multiq in PMD modes

 doc/guides/nics/features.rst  |  10 +
 doc/guides/nics/features/default.ini  |   1 +
 doc/guides/prog_guide/power_man.rst   |  74 +-
 doc/guides/rel_notes/release_21_08.rst|  11 +
 drivers/event/dlb2/dlb2.c |  17 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c   |  34 +
 drivers/net/i40e/i40e_rxtx.c  |  20 +-
 drivers/net/iavf/iavf_rxtx.c  |  20 +-
 drivers/net/ice/ice_rxtx.c|  20 +-
 drivers/net/ixgbe/ixgbe_rxtx.c|  20 +-
 drivers/net/mlx5/mlx5_rx.c|  17 +-
 examples/l3fwd-power/main.c   |   6 -
 lib/eal/arm/rte_power_intrinsics.c|  11 +
 lib/eal/include/generic/rte_cpuflags.h|   2 +
 .../include/generic/rte_power_intrinsics.h|  68 +-
 lib/eal/ppc/rte_power_intrinsics.c|  11 +
 lib/eal/version.map   |   3 +
 lib/eal/x86/rte_cpuflags.c|   2 +
 lib/eal/x86/rte_power_intrinsics.c|  90 ++-
 lib/power/meson.build |   3 +
 lib/power/rte_power_pmd_mgmt.c| 663 +-
 lib/power/rte_power_pmd_mgmt.h|   6 +
 22 files changed, 847 insertions(+), 262 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison

2021-07-09 Thread Anatoly Burakov
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev 
Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
Acked-by: Timothy McDaniel 
---

Notes:
v4:
- Return error if callback is set to NULL
- Replace raw number with a macro in monitor condition opaque data

v2:
- Use callback mechanism for more flexibility
- Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst|  2 ++
 drivers/event/dlb2/dlb2.c | 17 --
 drivers/net/i40e/i40e_rxtx.c  | 20 +++
 drivers/net/iavf/iavf_rxtx.c  | 20 +++
 drivers/net/ice/ice_rxtx.c| 20 +++
 drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++
 drivers/net/mlx5/mlx5_rx.c| 17 --
 .../include/generic/rte_power_intrinsics.h| 33 +++
 lib/eal/x86/rte_power_intrinsics.c| 17 +-
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 ---
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   /* abort if the value matches */
+   return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
expected_value = 0;
 
pmc.addr = monitor_addr;
-   pmc.val = expected_value;
-   pmc.mask = qe_mask.raw_qe[1];
+   /* store expected value and comparison mask in opaque data */
+   pmc.opaque[CLB_VAL_IDX] = expected_value;
+   pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+   /* set up callback */
+   pmc.fn = dlb2_monitor_callback;
pmc.size = sizeof(uint64_t);
 
rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+   const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+   const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   /*
+* we expect the DD bit to be set to 1 if this descriptor was already
+* written to.
+*/
+   return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct 
rte_power_monitor_cond *pmc)
/* watch for changes in status bit */
pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-   /*
-* we expect the DD bit to be set to 1 if this descriptor was already
-* written to.
-*/
-   pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-   pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   /* comparison callback */
+   pmc->fn = i40e_monitor_callback;
 
/* registers are 64-bit */
pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_pr

[dpdk-dev] [PATCH v9 3/8] doc: add PMD power management NIC feature

2021-07-09 Thread Anatoly Burakov
At this point, multiple different Ethernet drivers from multiple vendors
will support the PMD power management scheme. It would be useful to add
it to the NIC feature table to indicate support for it.

Suggested-by: David Marchand 
Signed-off-by: Anatoly Burakov 
---
 doc/guides/nics/features.rst | 10 ++
 doc/guides/nics/features/default.ini |  1 +
 2 files changed, 11 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 403c2b03a3..a96e12d155 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
 * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
 * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
``rte_eth_tx_burst_mode_get()``.
 
+.. _nic_features_get_monitor_addr:
+
+PMD power management using monitor addresses
+
+
+Supports getting a monitoring condition to use together with Ethernet PMD power
+management (see :doc:`../prog_guide/power_man` for more details).
+
+* **[implements] eth_dev_ops**: ``get_monitor_addr``
+
 .. _nic_features_other:
 
 Other dev ops not represented by a Feature
diff --git a/doc/guides/nics/features/default.ini 
b/doc/guides/nics/features/default.ini
index 3b55e0ccb0..f1e947bd9e 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -76,6 +76,7 @@ x86-64   =
 Usage doc=
 Design doc   =
 Perf doc =
+Power mgmt address monitor =
 
 [rte_flow items]
 ah   =
-- 
2.25.1



[dpdk-dev] [PATCH v9 4/8] eal: add power monitor for multiple events

2021-07-09 Thread Anatoly Burakov
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Anatoly Burakov 
Tested-by: David Hunt 
---

Notes:
v4:
- Fixed bugs in accessing the monitor condition
- Abort on any monitor condition not having a defined callback

v2:
- Adapt to callback mechanism

 lib/eal/arm/rte_power_intrinsics.c| 11 +++
 lib/eal/include/generic/rte_cpuflags.h|  2 +
 .../include/generic/rte_power_intrinsics.h| 35 +
 lib/eal/ppc/rte_power_intrinsics.c| 11 +++
 lib/eal/version.map   |  3 +
 lib/eal/x86/rte_cpuflags.c|  2 +
 lib/eal/x86/rte_power_intrinsics.c| 73 +++
 7 files changed, 137 insertions(+)

diff --git a/lib/eal/arm/rte_power_intrinsics.c 
b/lib/eal/arm/rte_power_intrinsics.c
index e83f04072a..78f55b7203 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/include/generic/rte_cpuflags.h 
b/lib/eal/include/generic/rte_cpuflags.h
index 28a5aecde8..d35551e931 100644
--- a/lib/eal/include/generic/rte_cpuflags.h
+++ b/lib/eal/include/generic/rte_cpuflags.h
@@ -24,6 +24,8 @@ struct rte_cpu_intrinsics {
/**< indicates support for rte_power_monitor function */
uint32_t power_pause : 1;
/**< indicates support for rte_power_pause function */
+   uint32_t power_monitor_multi : 1;
+   /**< indicates support for rte_power_monitor_multi function */
 };
 
 /**
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h 
b/lib/eal/include/generic/rte_power_intrinsics.h
index c9aa52a86d..04e8c2ab37 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -128,4 +128,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id);
 __rte_experimental
 int rte_power_pause(const uint64_t tsc_timestamp);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Monitor a set of addresses for changes. This will cause the CPU to enter an
+ * architecture-defined optimized power state until either one of the specified
+ * memory addresses is written to, a certain TSC timestamp is reached, or other
+ * reasons cause the CPU to wake up.
+ *
+ * Additionally, `expected` 64-bit values and 64-bit masks are provided. If
+ * mask is non-zero, the current value pointed to by the `p` pointer will be
+ * checked against the expected value, and if they do not match, the entering 
of
+ * optimized power state may be aborted.
+ *
+ * @warning It is responsibility of the user to check if this function is
+ *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
+ *   Failing to do so may result in an illegal CPU instruction error.
+ *
+ * @param pmc
+ *   An array of monitoring condition structures.
+ * @param num
+ *   Length of the `pmc` array.
+ * @param tsc_timestamp
+ *   Maximum TSC timestamp to wait for. Note that the wait behavior is
+ *   architecture-dependent.
+ *
+ * @return
+ *   0 on success
+ *   -EINVAL on invalid parameters
+ *   -ENOTSUP if unsupported
+ */
+__rte_experimental
+int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp);
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/ppc/rte_power_intrinsics.c 
b/lib/eal/ppc/rte_power_intrinsics.c
index 7fc9586da7..f00b58ade5 100644
--- a/lib/eal/ppc/rte_power_intrinsics.c
+++ b/lib/eal/ppc/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 2df65c6903..887012d02a 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -423,6 +423,9 @@ EXPERIMENTAL {
rte_version_release; # WINDOWS_NO_EXPORT
rte_version_suffix; # WINDOWS_NO_EXPORT
rte_version_year; # WINDOWS_NO_EXPORT
+
+   # added in 21.08
+   r

[dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's

2021-07-09 Thread Anatoly Burakov
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v2:
- Add check for stopped queue
- Clarified doc message
- Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build  |   3 +
 lib/power/rte_power_pmd_mgmt.c | 133 ++---
 lib/power/rte_power_pmd_mgmt.h |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 ---
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
 'rte_power_pmd_mgmt.h',
 'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
/**< Callback mode for this queue */
const struct rte_eth_rxtx_callback *cur_cb;
/**< Callback instance */
-   volatile bool umwait_in_progress;
-   /**< are we currently sleeping? */
uint64_t empty_poll_stats;
/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf 
**pkts __rte_unused,
struct rte_power_monitor_cond pmc;
uint16_t ret;
 
-   /*
-* we might get a cancellation request while being
-* inside the callback, in which case the wakeup
-* wouldn't work because it would've arrived too early.
-*
-* to get around this, we notify the other thread that
-* we're sleeping, so that it can spin until we're done.
-* unsolicited wakeups are perfectly safe.
-*/
-   q_conf->umwait_in_progress = true;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-   /* check if we need to cancel sleep */
-   if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-   /* use monitoring condition to sleep */
-   ret = rte_eth_get_monitor_addr(port_id, qidx,
-   &pmc);
-   if (ret == 0)
-   rte_power_monitor(&pmc, UINT64_MAX);
-   }
-   q_conf->umwait_in_progress = false;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+   /* use monitoring condition to sleep */
+   ret = rte_eth_get_monitor_addr(port_id, qidx,
+   &pmc);
+   if (ret == 0)
+   rte_power_monitor(&pmc, UINT64_MAX);
}
} else
q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+   struct rte_eth_rxq_info qinfo;
+
+   if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+   return -1;
+
+   return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
struct 

[dpdk-dev] [PATCH v9 6/8] power: support callbacks for multiple Rx queues

2021-07-09 Thread Anatoly Burakov
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when all queues in the list were polled and were determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v8:
- Added a comment explaining that we want to sleep on each empty poll after
  threshold has been reached

v7:
- Fix bug where initial sleep target was always set to zero
- Fix logic in handling of n_queues_ready_to_sleep counter
- Update documentation on hardware requirements

v6:
- Track each individual queue sleep status (Konstantin)
- Fix segfault (Dave)

v5:
- Remove the "power save queue" API and replace it with mechanism suggested 
by
  Konstantin

v3:
- Move the list of supported NICs to NIC feature table

v2:
- Use a TAILQ for queues instead of a static array
- Address feedback from Konstantin
- Add additional checks for stopped queues

 doc/guides/prog_guide/power_man.rst|  69 ++--
 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/rte_power_pmd_mgmt.c | 460 +++--
 3 files changed, 398 insertions(+), 136 deletions(-)

diff --git a/doc/guides/prog_guide/power_man.rst 
b/doc/guides/prog_guide/power_man.rst
index c70ae128ac..0e66878892 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -198,34 +198,45 @@ Ethernet PMD Power Management API
 Abstract
 
 
-Existing power management mechanisms require developers
-to change application design or change code to make use of it.
-The PMD power management API provides a convenient alternative
-by utilizing Ethernet PMD RX callbacks,
-and triggering power saving whenever empty poll count reaches a certain number.
-
-Monitor
-   This power saving scheme will put the CPU into optimized power state
-   and use the ``rte_power_monitor()`` function
-   to monitor the Ethernet PMD RX descriptor address,
-   and wake the CPU up whenever there's new traffic.
-
-Pause
-   This power saving scheme will avoid busy polling
-   by either entering power-optimized sleep state
-   with ``rte_power_pause()`` function,
-   or, if it's not available, use ``rte_pause()``.
-
-Frequency scaling
-   This power saving scheme will use ``librte_power`` library
-   functionality to scale the core frequency up/down
-   depending on traffic volume.
-
-.. note::
-
-   Currently, this power management API is limited to mandatory mapping
-   of 1 queue to 1 core (multiple queues are supported,
-   but they must be polled from different cores).
+Existing power management mechanisms require developers to change application
+design or change code to make use of it. The PMD power management API provides 
a
+convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
+power saving whenever empty poll count reaches a certain number.
+
+* Monitor
+   This power saving scheme will put the CPU into optimized power state and
+   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
+   there's new traffic. Support for this scheme may not be available on all
+   platforms, and further limitations may apply (see below).
+
+* Pause
+   This power saving scheme will avoid busy polling by either entering
+   power-optimized sleep state with ``rte_power_pause()`` function, or, if it's
+   not supported by the underlying platform, use ``rte_pause()``.
+
+* Frequency scaling
+   This power saving scheme will use ``librte_power`` library functionality to
+   scale the core frequency up/down depending on traffic volume.
+
+The "monitor" mode is only supported in the following configurations and 
scenarios:
+
+* On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set 
being
+  supported by the CPU. Please refe

[dpdk-dev] [PATCH v9 7/8] power: support monitoring multiple Rx queues

2021-07-09 Thread Anatoly Burakov
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v6:
- Fix the missed feedback from v5

v4:
- Fix possible out of bounds access
- Added missing index increment

 doc/guides/prog_guide/power_man.rst | 15 --
 lib/power/rte_power_pmd_mgmt.c  | 82 -
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/doc/guides/prog_guide/power_man.rst 
b/doc/guides/prog_guide/power_man.rst
index 0e66878892..e387d7811e 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -221,17 +221,22 @@ power saving whenever empty poll count reaches a certain 
number.
 The "monitor" mode is only supported in the following configurations and 
scenarios:
 
 * On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set 
being
-  supported by the CPU. Please refer to your platform documentation for further
-  information.
+  supported by the CPU, while `rte_power_monitor_multi()` requires WAITPKG and
+  RTM instruction sets being supported by the CPU. RTM instruction set may also
+  require booting the Linux with `tsx=on` command line parameter. Please refer
+  to your platform documentation for further information.
 
 * If ``rte_cpu_get_intrinsics_support()`` function indicates that
+  ``rte_power_monitor_multi()`` function is supported by the platform, then
+  monitoring multiple Ethernet Rx queues for traffic will be supported.
+
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
   ``rte_power_monitor()`` is supported by the platform, then monitoring will be
   limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
   monitored from a different lcore).
 
-* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
-  ``rte_power_monitor()`` function is not supported, then monitor mode will not
-  be supported.
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of 
the
+  two monitoring functions are supported, then monitor mode will not be 
supported.
 
 * Not all Ethernet drivers support monitoring, even if the underlying
   platform may support the necessary CPU instructions. Please refer to
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 30772791af..2586204b93 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -126,6 +126,32 @@ queue_list_take(struct pmd_core_cfg *cfg, const union 
queue *q)
return found;
 }
 
+static inline int
+get_monitor_addresses(struct pmd_core_cfg *cfg,
+   struct rte_power_monitor_cond *pmc, size_t len)
+{
+   const struct queue_list_entry *qle;
+   size_t i = 0;
+   int ret;
+
+   TAILQ_FOREACH(qle, &cfg->head, next) {
+   const union queue *q = &qle->queue;
+   struct rte_power_monitor_cond *cur;
+
+   /* attempted out of bounds access */
+   if (i >= len) {
+   RTE_LOG(ERR, POWER, "Too many queues being 
monitored\n");
+   return -1;
+   }
+
+   cur = &pmc[i++];
+   ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
+   if (ret < 0)
+   return ret;
+   }
+   return 0;
+}
+
 static void
 calc_tsc(void)
 {
@@ -215,6 +241,46 @@ lcore_can_sleep(struct pmd_core_cfg *cfg)
return true;
 }
 
+static uint16_t
+clb_multiwait(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
+   struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+   uint16_t max_pkts __rte_unused, void *arg)
+{
+   const unsigned int lcore = rte_lcore_id();
+   struct queue_list_entry *queue_conf = arg;
+   struct pmd_core_cfg *lcore_conf;
+   const bool empty = nb_rx == 0;
+
+   lcore_conf = &lcore_cfgs[lcore];
+
+   /* early exit */
+   if (likely(!empty))
+   /* early exit */
+   queue_reset(lcore_conf, queue_conf);
+   else {
+   struct rte_power_monitor_cond pmc[lcore_conf->n_queues];
+   int ret;
+
+   /* can this queue sleep? */
+   if (!queue_can_sleep(lcore_conf, queue_conf))
+   return nb_rx;
+
+   /* can this lcore sleep? */
+   if (!lcore_can_sleep(lcore_conf))
+   return nb_rx;
+
+   /* gather all monitoring conditions */
+   ret = get_monitor_addresses(lcore_conf, pmc,
+   lcore_conf->n_queues);
+   if (ret < 0)
+   return nb_rx;
+
+   rte_power_mo

[dpdk-dev] [PATCH v9 8/8] examples/l3fwd-power: support multiq in PMD modes

2021-07-09 Thread Anatoly Burakov
Currently, l3fwd-power enforces the limitation of having one queue per
lcore. This is no longer necessary, so remove the limitation.

Signed-off-by: Anatoly Burakov 
Tested-by: David Hunt 
---
 examples/l3fwd-power/main.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8dfed1634..52f56dc405 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -2723,12 +2723,6 @@ main(int argc, char **argv)
printf("\nInitializing rx queues on lcore %u ... ", lcore_id );
fflush(stdout);
 
-   /* PMD power management mode can only do 1 queue per core */
-   if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) {
-   rte_exit(EXIT_FAILURE,
-   "In PMD power management mode, only one queue 
per lcore is allowed\n");
-   }
-
/* init RX queues */
for(queue = 0; queue < qconf->n_rx_queue; ++queue) {
struct rte_eth_rxconf rxq_conf;
-- 
2.25.1



Re: [dpdk-dev] [dpdk-stable] [EXT] [PATCH] compress/mlx5: fix memory region unregistration

2021-07-09 Thread Thomas Monjalon
08/07/2021 08:25, Michael Baum:
> Hi,
> 
> From: Akhil Goyal 
> > > The issue can cause illegal physical address access while a huge-page
> > > A is released and huge-page B is allocated on the same virtual address.
> > > The old MR can be matched using the virtual address of huge-page B but
> > > the HW will access the physical address of huge-page A which is no
> > > more part of the DPDK process.
> > >
> > > Register a driver callback for memory event in order to free out all
> > > the MRs of memory that is going to be freed from the dpdk process.
> > >
> > > Fixes: f8c97babc9f4 ("compress/mlx5: add data-path functions")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Michael Baum 
> > > Acked-by: Matan Azrad 
> > > ---
> > CI is reporting error with this patch. Can you check?
> 
> This patch depends on this patch:
> https://patchwork.dpdk.org/project/dpdk/patch/20210628150614.1769507-1-michae...@nvidia.com/
> It sent before and integrated to master-net-mlx branch, but not integrated to 
> main branch.
> This deficiency probably causes these CI errors.

Please send a v2 to trigger a new test, thanks.





[dpdk-dev] [PATCH v2 1/1] power: fix potential null dereferences

2021-07-09 Thread Anatoly Burakov
Currently, the error paths can lead to attempts at dereferencing NULL
pointers. Add the check to avoid attempts at dereferencing NULL
pointers.

Coverity issue: 371895
Coverity issue: 371889
Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support")
Cc: anatoly.bura...@intel.com

Signed-off-by: Anatoly Burakov 
Reviewed-by: David Marchand 
---
 lib/power/power_pstate_cpufreq.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index ba28ddcfca..3b607515fd 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -440,8 +440,10 @@ power_get_available_freqs(struct pstate_power_info *pi)
num_freqs, pi->lcore_id);
 
 out:
-   fclose(f_min);
-   fclose(f_max);
+   if (f_min != NULL)
+   fclose(f_min);
+   if (f_max != NULL)
+   fclose(f_max);
 
return ret;
 }
-- 
2.25.1



Re: [dpdk-dev] [PATCH v9 3/8] doc: add PMD power management NIC feature

2021-07-09 Thread Burakov, Anatoly

On 09-Jul-21 4:53 PM, Anatoly Burakov wrote:

At this point, multiple different Ethernet drivers from multiple vendors
will support the PMD power management scheme. It would be useful to add
it to the NIC feature table to indicate support for it.

Suggested-by: David Marchand 
Signed-off-by: Anatoly Burakov 
---
  doc/guides/nics/features.rst | 10 ++
  doc/guides/nics/features/default.ini |  1 +
  2 files changed, 11 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 403c2b03a3..a96e12d155 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
``rte_eth_tx_burst_mode_get()``.
  
+.. _nic_features_get_monitor_addr:

+
+PMD power management using monitor addresses
+
+
+Supports getting a monitoring condition to use together with Ethernet PMD power
+management (see :doc:`../prog_guide/power_man` for more details).
+
+* **[implements] eth_dev_ops**: ``get_monitor_addr``
+
  .. _nic_features_other:
  
  Other dev ops not represented by a Feature

diff --git a/doc/guides/nics/features/default.ini 
b/doc/guides/nics/features/default.ini
index 3b55e0ccb0..f1e947bd9e 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -76,6 +76,7 @@ x86-64   =
  Usage doc=
  Design doc   =
  Perf doc =
+Power mgmt address monitor =
  
  [rte_flow items]

  ah   =



Apologies, forgot to git add the driver files to the commit. Will respin 
shortly.


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v10 0/8] Enhancements for PMD power management

2021-07-09 Thread Anatoly Burakov
This patchset introduces several changes related to PMD power management:

- Changed monitoring intrinsics to use callbacks as a comparison function, based
  on previous patchset [1] but incorporating feedback [2] - this hopefully will
  make it possible to add support for .get_monitor_addr in virtio
- Add a new intrinsic to monitor multiple addresses, based on RTM instruction
  set and the TPAUSE instruction
- Add support for PMD power management on multiple queues, as well as all
  accompanying infrastructure and example apps changes

v10:
- Added missing changes to NIC feature .ini files

v9:
- Added all missing Acks and Tests
- Added a new commit with NIC features
- Addressed minor issues raised in review

v8:
- Fixed checkpatch issue
- Added comment explaining empty poll handling (Konstantin)

v7:
- Fixed various bugs

v6:
- Improved the algorithm for multi-queue sleep
- Fixed segfault and addressed other feedback

v5:
- Removed "power save queue" API and replaced with mechanism suggested by
  Konstantin
- Addressed other feedback

v4:
- Replaced raw number with a macro
- Fixed all the bugs found by Konstantin
- Some other minor corrections

v3:
- Moved some doc updates to NIC features list

v2:
- Changed check inversion to callbacks
- Addressed feedback from Konstantin
- Added doc updates where necessary

[1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=*
[2] 
http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274

Anatoly Burakov (8):
  eal: use callbacks for power monitoring comparison
  net/af_xdp: add power monitor support
  doc: add PMD power management NIC feature
  eal: add power monitor for multiple events
  power: remove thread safety from PMD power API's
  power: support callbacks for multiple Rx queues
  power: support monitoring multiple Rx queues
  examples/l3fwd-power: support multiq in PMD modes

 doc/guides/nics/features.rst  |  10 +
 doc/guides/nics/features/af_xdp.ini   |   1 +
 doc/guides/nics/features/default.ini  |   1 +
 doc/guides/nics/features/i40e.ini |   1 +
 doc/guides/nics/features/i40e_vf.ini  |   1 +
 doc/guides/nics/features/iavf.ini |   1 +
 doc/guides/nics/features/ice.ini  |   1 +
 doc/guides/nics/features/ixgbe.ini|   1 +
 doc/guides/nics/features/ixgbe_vf.ini |   1 +
 doc/guides/nics/features/mlx5.ini |   1 +
 doc/guides/prog_guide/power_man.rst   |  74 +-
 doc/guides/rel_notes/release_21_08.rst|  11 +
 drivers/event/dlb2/dlb2.c |  17 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c   |  34 +
 drivers/net/i40e/i40e_rxtx.c  |  20 +-
 drivers/net/iavf/iavf_rxtx.c  |  20 +-
 drivers/net/ice/ice_rxtx.c|  20 +-
 drivers/net/ixgbe/ixgbe_rxtx.c|  20 +-
 drivers/net/mlx5/mlx5_rx.c|  17 +-
 examples/l3fwd-power/main.c   |   6 -
 lib/eal/arm/rte_power_intrinsics.c|  11 +
 lib/eal/include/generic/rte_cpuflags.h|   2 +
 .../include/generic/rte_power_intrinsics.h|  68 +-
 lib/eal/ppc/rte_power_intrinsics.c|  11 +
 lib/eal/version.map   |   3 +
 lib/eal/x86/rte_cpuflags.c|   2 +
 lib/eal/x86/rte_power_intrinsics.c|  90 ++-
 lib/power/meson.build |   3 +
 lib/power/rte_power_pmd_mgmt.c| 663 +-
 lib/power/rte_power_pmd_mgmt.h|   6 +
 30 files changed, 855 insertions(+), 262 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison

2021-07-09 Thread Anatoly Burakov
Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev 
Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
Acked-by: Timothy McDaniel 
---

Notes:
v4:
- Return error if callback is set to NULL
- Replace raw number with a macro in monitor condition opaque data

v2:
- Use callback mechanism for more flexibility
- Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst|  2 ++
 drivers/event/dlb2/dlb2.c | 17 --
 drivers/net/i40e/i40e_rxtx.c  | 20 +++
 drivers/net/iavf/iavf_rxtx.c  | 20 +++
 drivers/net/ice/ice_rxtx.c| 20 +++
 drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++
 drivers/net/mlx5/mlx5_rx.c| 17 --
 .../include/generic/rte_power_intrinsics.h| 33 +++
 lib/eal/x86/rte_power_intrinsics.c| 17 +-
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 ---
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   /* abort if the value matches */
+   return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
expected_value = 0;
 
pmc.addr = monitor_addr;
-   pmc.val = expected_value;
-   pmc.mask = qe_mask.raw_qe[1];
+   /* store expected value and comparison mask in opaque data */
+   pmc.opaque[CLB_VAL_IDX] = expected_value;
+   pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+   /* set up callback */
+   pmc.fn = dlb2_monitor_callback;
pmc.size = sizeof(uint64_t);
 
rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+   const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+   const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   /*
+* we expect the DD bit to be set to 1 if this descriptor was already
+* written to.
+*/
+   return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct 
rte_power_monitor_cond *pmc)
/* watch for changes in status bit */
pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-   /*
-* we expect the DD bit to be set to 1 if this descriptor was already
-* written to.
-*/
-   pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-   pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+   /* comparison callback */
+   pmc->fn = i40e_monitor_callback;
 
/* registers are 64-bit */
pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_pr

[dpdk-dev] [PATCH v10 2/8] net/af_xdp: add power monitor support

2021-07-09 Thread Anatoly Burakov
Implement support for .get_monitor_addr in AF_XDP driver.

Signed-off-by: Anatoly Burakov 
---

Notes:
v8:
- Fix checkpatch issue

v2:
- Rewrite using the callback mechanism

 drivers/net/af_xdp/rte_eth_af_xdp.c | 34 +
 1 file changed, 34 insertions(+)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c 
b/drivers/net/af_xdp/rte_eth_af_xdp.c
index eb5660a3dc..989051dd6d 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "compat.h"
 
@@ -788,6 +789,38 @@ eth_dev_configure(struct rte_eth_dev *dev)
return 0;
 }
 
+#define CLB_VAL_IDX 0
+static int
+eth_monitor_callback(const uint64_t value,
+   const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+   const uint64_t v = opaque[CLB_VAL_IDX];
+   const uint64_t m = (uint32_t)~0;
+
+   /* if the value has changed, abort entering power optimized state */
+   return (value & m) == v ? 0 : -1;
+}
+
+static int
+eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
+{
+   struct pkt_rx_queue *rxq = rx_queue;
+   unsigned int *prod = rxq->rx.producer;
+   const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */
+
+   /* watch for changes in producer ring */
+   pmc->addr = (void *)prod;
+
+   /* store current value */
+   pmc->opaque[CLB_VAL_IDX] = cur_val;
+   pmc->fn = eth_monitor_callback;
+
+   /* AF_XDP producer ring index is 32-bit */
+   pmc->size = sizeof(uint32_t);
+
+   return 0;
+}
+
 static int
 eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
@@ -1448,6 +1481,7 @@ static const struct eth_dev_ops ops = {
.link_update = eth_link_update,
.stats_get = eth_stats_get,
.stats_reset = eth_stats_reset,
+   .get_monitor_addr = eth_get_monitor_addr
 };
 
 /** parse busy_budget argument */
-- 
2.25.1



[dpdk-dev] [PATCH v10 3/8] doc: add PMD power management NIC feature

2021-07-09 Thread Anatoly Burakov
At this point, multiple different Ethernet drivers from multiple vendors
will support the PMD power management scheme. It would be useful to add
it to the NIC feature table to indicate support for it.

Suggested-by: David Marchand 
Signed-off-by: Anatoly Burakov 
---

Notes:
v10:
- Added missing NIC feature support in ini files

 doc/guides/nics/features.rst  | 10 ++
 doc/guides/nics/features/af_xdp.ini   |  1 +
 doc/guides/nics/features/default.ini  |  1 +
 doc/guides/nics/features/i40e.ini |  1 +
 doc/guides/nics/features/i40e_vf.ini  |  1 +
 doc/guides/nics/features/iavf.ini |  1 +
 doc/guides/nics/features/ice.ini  |  1 +
 doc/guides/nics/features/ixgbe.ini|  1 +
 doc/guides/nics/features/ixgbe_vf.ini |  1 +
 doc/guides/nics/features/mlx5.ini |  1 +
 10 files changed, 19 insertions(+)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 403c2b03a3..a96e12d155 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
 * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
 * **[related] API**: ``rte_eth_rx_burst_mode_get()``, 
``rte_eth_tx_burst_mode_get()``.
 
+.. _nic_features_get_monitor_addr:
+
+PMD power management using monitor addresses
+
+
+Supports getting a monitoring condition to use together with Ethernet PMD power
+management (see :doc:`../prog_guide/power_man` for more details).
+
+* **[implements] eth_dev_ops**: ``get_monitor_addr``
+
 .. _nic_features_other:
 
 Other dev ops not represented by a Feature
diff --git a/doc/guides/nics/features/af_xdp.ini 
b/doc/guides/nics/features/af_xdp.ini
index 36953c2dec..4e3f638bf5 100644
--- a/doc/guides/nics/features/af_xdp.ini
+++ b/doc/guides/nics/features/af_xdp.ini
@@ -9,3 +9,4 @@ MTU update   = Y
 Promiscuous mode = Y
 Stats per queue  = Y
 x86-64   = Y
+Power mgmt address monitor = Y
diff --git a/doc/guides/nics/features/default.ini 
b/doc/guides/nics/features/default.ini
index 3b55e0ccb0..f1e947bd9e 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -76,6 +76,7 @@ x86-64   =
 Usage doc=
 Design doc   =
 Perf doc =
+Power mgmt address monitor =
 
 [rte_flow items]
 ah   =
diff --git a/doc/guides/nics/features/i40e.ini 
b/doc/guides/nics/features/i40e.ini
index 1f3f5eb3ff..b6765d0e5a 100644
--- a/doc/guides/nics/features/i40e.ini
+++ b/doc/guides/nics/features/i40e.ini
@@ -51,6 +51,7 @@ x86-32   = Y
 x86-64   = Y
 ARMv8= Y
 Power8   = Y
+Power mgmt address monitor = Y
 
 [rte_flow items]
 ah   = Y
diff --git a/doc/guides/nics/features/i40e_vf.ini 
b/doc/guides/nics/features/i40e_vf.ini
index bac1bb4344..d5b163c1c1 100644
--- a/doc/guides/nics/features/i40e_vf.ini
+++ b/doc/guides/nics/features/i40e_vf.ini
@@ -37,3 +37,4 @@ FreeBSD  = Y
 Linux= Y
 x86-32   = Y
 x86-64   = Y
+Power mgmt address monitor = Y
diff --git a/doc/guides/nics/features/iavf.ini 
b/doc/guides/nics/features/iavf.ini
index 43a84a3bda..146b004da2 100644
--- a/doc/guides/nics/features/iavf.ini
+++ b/doc/guides/nics/features/iavf.ini
@@ -33,6 +33,7 @@ FreeBSD  = Y
 Linux= Y
 x86-32   = Y
 x86-64   = Y
+Power mgmt address monitor = Y
 
 [rte_flow items]
 ah   = Y
diff --git a/doc/guides/nics/features/ice.ini b/doc/guides/nics/features/ice.ini
index 1b9228c678..fbc81c654d 100644
--- a/doc/guides/nics/features/ice.ini
+++ b/doc/guides/nics/features/ice.ini
@@ -42,6 +42,7 @@ Linux= Y
 Windows  = Y
 x86-32   = Y
 x86-64   = Y
+Power mgmt address monitor = Y
 
 [rte_flow items]
 ah   = Y
diff --git a/doc/guides/nics/features/ixgbe.ini 
b/doc/guides/nics/features/ixgbe.ini
index 93a9cc18ab..92228fe194 100644
--- a/doc/guides/nics/features/ixgbe.ini
+++ b/doc/guides/nics/features/ixgbe.ini
@@ -54,6 +54,7 @@ Linux= Y
 ARMv8= Y
 x86-32   = Y
 x86-64   = Y
+Power mgmt address monitor = Y
 
 [rte_flow items]
 eth  = Y
diff --git a/doc/guides/nics/features/ixgbe_vf.ini 
b/doc/guides/nics/features/ixgbe_vf.ini
index 7161e61f9a..ea8342f2c9 100644
--- a/doc/guides/nics/features/ixgbe_vf.ini
+++ b/doc/guides/nics/features/ixgbe_vf.ini
@@ -38,3 +38,4 @@ Linux= Y
 ARMv8= Y
 x86-32   = Y
 x86-64   = Y
+Power mgmt address monitor = Y
diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index 3b82ce41fd..2c7d9f6e8c 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -51,6 +51,7 @@ Power8   

[dpdk-dev] [PATCH v10 4/8] eal: add power monitor for multiple events

2021-07-09 Thread Anatoly Burakov
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to
what UMWAIT does, but without the limitation of having to listen for
just one event. This works because the optimized power state used by the
TPAUSE instruction will cause a wake up on RTM transaction abort, so if
we add the addresses we're interested in to the read-set, any write to
those addresses will wake us up.

Signed-off-by: Konstantin Ananyev 
Signed-off-by: Anatoly Burakov 
Tested-by: David Hunt 
---

Notes:
v4:
- Fixed bugs in accessing the monitor condition
- Abort on any monitor condition not having a defined callback

v2:
- Adapt to callback mechanism

 lib/eal/arm/rte_power_intrinsics.c| 11 +++
 lib/eal/include/generic/rte_cpuflags.h|  2 +
 .../include/generic/rte_power_intrinsics.h| 35 +
 lib/eal/ppc/rte_power_intrinsics.c| 11 +++
 lib/eal/version.map   |  3 +
 lib/eal/x86/rte_cpuflags.c|  2 +
 lib/eal/x86/rte_power_intrinsics.c| 73 +++
 7 files changed, 137 insertions(+)

diff --git a/lib/eal/arm/rte_power_intrinsics.c 
b/lib/eal/arm/rte_power_intrinsics.c
index e83f04072a..78f55b7203 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/include/generic/rte_cpuflags.h 
b/lib/eal/include/generic/rte_cpuflags.h
index 28a5aecde8..d35551e931 100644
--- a/lib/eal/include/generic/rte_cpuflags.h
+++ b/lib/eal/include/generic/rte_cpuflags.h
@@ -24,6 +24,8 @@ struct rte_cpu_intrinsics {
/**< indicates support for rte_power_monitor function */
uint32_t power_pause : 1;
/**< indicates support for rte_power_pause function */
+   uint32_t power_monitor_multi : 1;
+   /**< indicates support for rte_power_monitor_multi function */
 };
 
 /**
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h 
b/lib/eal/include/generic/rte_power_intrinsics.h
index c9aa52a86d..04e8c2ab37 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -128,4 +128,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id);
 __rte_experimental
 int rte_power_pause(const uint64_t tsc_timestamp);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Monitor a set of addresses for changes. This will cause the CPU to enter an
+ * architecture-defined optimized power state until either one of the specified
+ * memory addresses is written to, a certain TSC timestamp is reached, or other
+ * reasons cause the CPU to wake up.
+ *
+ * Additionally, `expected` 64-bit values and 64-bit masks are provided. If
+ * mask is non-zero, the current value pointed to by the `p` pointer will be
+ * checked against the expected value, and if they do not match, the entering 
of
+ * optimized power state may be aborted.
+ *
+ * @warning It is responsibility of the user to check if this function is
+ *   supported at runtime using `rte_cpu_get_intrinsics_support()` API call.
+ *   Failing to do so may result in an illegal CPU instruction error.
+ *
+ * @param pmc
+ *   An array of monitoring condition structures.
+ * @param num
+ *   Length of the `pmc` array.
+ * @param tsc_timestamp
+ *   Maximum TSC timestamp to wait for. Note that the wait behavior is
+ *   architecture-dependent.
+ *
+ * @return
+ *   0 on success
+ *   -EINVAL on invalid parameters
+ *   -ENOTSUP if unsupported
+ */
+__rte_experimental
+int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp);
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/ppc/rte_power_intrinsics.c 
b/lib/eal/ppc/rte_power_intrinsics.c
index 7fc9586da7..f00b58ade5 100644
--- a/lib/eal/ppc/rte_power_intrinsics.c
+++ b/lib/eal/ppc/rte_power_intrinsics.c
@@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 
return -ENOTSUP;
 }
+
+int
+rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
+   const uint32_t num, const uint64_t tsc_timestamp)
+{
+   RTE_SET_USED(pmc);
+   RTE_SET_USED(num);
+   RTE_SET_USED(tsc_timestamp);
+
+   return -ENOTSUP;
+}
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 2df65c6903..887012d02a 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -423,6 +423,9 @@ EXPERIMENTAL {
rte_version_release; # WINDOWS_NO_EXPORT
rte_version_suffix; # WINDOWS_NO_EXPORT
rte_version_year; # WINDOWS_NO_EXPORT
+
+   # added in 21.08
+   r

[dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's

2021-07-09 Thread Anatoly Burakov
Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
of a pressing need for thread safety we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v2:
- Add check for stopped queue
- Clarified doc message
- Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build  |   3 +
 lib/power/rte_power_pmd_mgmt.c | 133 ++---
 lib/power/rte_power_pmd_mgmt.h |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 ---
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
 'rte_power_pmd_mgmt.h',
 'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
/**< Callback mode for this queue */
const struct rte_eth_rxtx_callback *cur_cb;
/**< Callback instance */
-   volatile bool umwait_in_progress;
-   /**< are we currently sleeping? */
uint64_t empty_poll_stats;
/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf 
**pkts __rte_unused,
struct rte_power_monitor_cond pmc;
uint16_t ret;
 
-   /*
-* we might get a cancellation request while being
-* inside the callback, in which case the wakeup
-* wouldn't work because it would've arrived too early.
-*
-* to get around this, we notify the other thread that
-* we're sleeping, so that it can spin until we're done.
-* unsolicited wakeups are perfectly safe.
-*/
-   q_conf->umwait_in_progress = true;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-   /* check if we need to cancel sleep */
-   if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-   /* use monitoring condition to sleep */
-   ret = rte_eth_get_monitor_addr(port_id, qidx,
-   &pmc);
-   if (ret == 0)
-   rte_power_monitor(&pmc, UINT64_MAX);
-   }
-   q_conf->umwait_in_progress = false;
-
-   rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+   /* use monitoring condition to sleep */
+   ret = rte_eth_get_monitor_addr(port_id, qidx,
+   &pmc);
+   if (ret == 0)
+   rte_power_monitor(&pmc, UINT64_MAX);
}
} else
q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+   struct rte_eth_rxq_info qinfo;
+
+   if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+   return -1;
+
+   return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
struct 

[dpdk-dev] [PATCH v10 6/8] power: support callbacks for multiple Rx queues

2021-07-09 Thread Anatoly Burakov
Currently, there is a hard limitation on the PMD power management
support that only allows it to support a single queue per lcore. This is
not ideal as most DPDK use cases will poll multiple queues per core.

The PMD power management mechanism relies on ethdev Rx callbacks, so it
is very difficult to implement such support because callbacks are
effectively stateless and have no visibility into what the other ethdev
devices are doing. This places limitations on what we can do within the
framework of Rx callbacks, but the basics of this implementation are as
follows:

- Replace per-queue structures with per-lcore ones, so that any device
  polled from the same lcore can share data
- Any queue that is going to be polled from a specific lcore has to be
  added to the list of queues to poll, so that the callback is aware of
  other queues being polled by the same lcore
- Both the empty poll counter and the actual power saving mechanism is
  shared between all queues polled on a particular lcore, and is only
  activated when all queues in the list were polled and were determined
  to have no traffic.
- The limitation on UMWAIT-based polling is not removed because UMWAIT
  is incapable of monitoring more than one address.

Also, while we're at it, update and improve the docs.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v8:
- Added a comment explaining that we want to sleep on each empty poll after
  threshold has been reached

v7:
- Fix bug where initial sleep target was always set to zero
- Fix logic in handling of n_queues_ready_to_sleep counter
- Update documentation on hardware requirements

v6:
- Track each individual queue sleep status (Konstantin)
- Fix segfault (Dave)

v5:
- Remove the "power save queue" API and replace it with mechanism suggested 
by
  Konstantin

v3:
- Move the list of supported NICs to NIC feature table

v2:
- Use a TAILQ for queues instead of a static array
- Address feedback from Konstantin
- Add additional checks for stopped queues

 doc/guides/prog_guide/power_man.rst|  69 ++--
 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/rte_power_pmd_mgmt.c | 460 +++--
 3 files changed, 398 insertions(+), 136 deletions(-)

diff --git a/doc/guides/prog_guide/power_man.rst 
b/doc/guides/prog_guide/power_man.rst
index c70ae128ac..0e66878892 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -198,34 +198,45 @@ Ethernet PMD Power Management API
 Abstract
 
 
-Existing power management mechanisms require developers
-to change application design or change code to make use of it.
-The PMD power management API provides a convenient alternative
-by utilizing Ethernet PMD RX callbacks,
-and triggering power saving whenever empty poll count reaches a certain number.
-
-Monitor
-   This power saving scheme will put the CPU into optimized power state
-   and use the ``rte_power_monitor()`` function
-   to monitor the Ethernet PMD RX descriptor address,
-   and wake the CPU up whenever there's new traffic.
-
-Pause
-   This power saving scheme will avoid busy polling
-   by either entering power-optimized sleep state
-   with ``rte_power_pause()`` function,
-   or, if it's not available, use ``rte_pause()``.
-
-Frequency scaling
-   This power saving scheme will use ``librte_power`` library
-   functionality to scale the core frequency up/down
-   depending on traffic volume.
-
-.. note::
-
-   Currently, this power management API is limited to mandatory mapping
-   of 1 queue to 1 core (multiple queues are supported,
-   but they must be polled from different cores).
+Existing power management mechanisms require developers to change application
+design or change code to make use of it. The PMD power management API provides 
a
+convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
+power saving whenever empty poll count reaches a certain number.
+
+* Monitor
+   This power saving scheme will put the CPU into optimized power state and
+   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
+   there's new traffic. Support for this scheme may not be available on all
+   platforms, and further limitations may apply (see below).
+
+* Pause
+   This power saving scheme will avoid busy polling by either entering
+   power-optimized sleep state with ``rte_power_pause()`` function, or, if it's
+   not supported by the underlying platform, use ``rte_pause()``.
+
+* Frequency scaling
+   This power saving scheme will use ``librte_power`` library functionality to
+   scale the core frequency up/down depending on traffic volume.
+
+The "monitor" mode is only supported in the following configurations and 
scenarios:
+
+* On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set 
being
+  supported by the CPU. Please refe

[dpdk-dev] [PATCH v10 7/8] power: support monitoring multiple Rx queues

2021-07-09 Thread Anatoly Burakov
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov 
Acked-by: Konstantin Ananyev 
Tested-by: David Hunt 
---

Notes:
v6:
- Fix the missed feedback from v5

v4:
- Fix possible out of bounds access
- Added missing index increment

 doc/guides/prog_guide/power_man.rst | 15 --
 lib/power/rte_power_pmd_mgmt.c  | 82 -
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/doc/guides/prog_guide/power_man.rst 
b/doc/guides/prog_guide/power_man.rst
index 0e66878892..e387d7811e 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -221,17 +221,22 @@ power saving whenever empty poll count reaches a certain 
number.
 The "monitor" mode is only supported in the following configurations and 
scenarios:
 
 * On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set 
being
-  supported by the CPU. Please refer to your platform documentation for further
-  information.
+  supported by the CPU, while `rte_power_monitor_multi()` requires WAITPKG and
+  RTM instruction sets being supported by the CPU. RTM instruction set may also
+  require booting the Linux with `tsx=on` command line parameter. Please refer
+  to your platform documentation for further information.
 
 * If ``rte_cpu_get_intrinsics_support()`` function indicates that
+  ``rte_power_monitor_multi()`` function is supported by the platform, then
+  monitoring multiple Ethernet Rx queues for traffic will be supported.
+
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
   ``rte_power_monitor()`` is supported by the platform, then monitoring will be
   limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
   monitored from a different lcore).
 
-* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
-  ``rte_power_monitor()`` function is not supported, then monitor mode will not
-  be supported.
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of 
the
+  two monitoring functions are supported, then monitor mode will not be 
supported.
 
 * Not all Ethernet drivers support monitoring, even if the underlying
   platform may support the necessary CPU instructions. Please refer to
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 30772791af..2586204b93 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -126,6 +126,32 @@ queue_list_take(struct pmd_core_cfg *cfg, const union 
queue *q)
return found;
 }
 
+static inline int
+get_monitor_addresses(struct pmd_core_cfg *cfg,
+   struct rte_power_monitor_cond *pmc, size_t len)
+{
+   const struct queue_list_entry *qle;
+   size_t i = 0;
+   int ret;
+
+   TAILQ_FOREACH(qle, &cfg->head, next) {
+   const union queue *q = &qle->queue;
+   struct rte_power_monitor_cond *cur;
+
+   /* attempted out of bounds access */
+   if (i >= len) {
+   RTE_LOG(ERR, POWER, "Too many queues being 
monitored\n");
+   return -1;
+   }
+
+   cur = &pmc[i++];
+   ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
+   if (ret < 0)
+   return ret;
+   }
+   return 0;
+}
+
 static void
 calc_tsc(void)
 {
@@ -215,6 +241,46 @@ lcore_can_sleep(struct pmd_core_cfg *cfg)
return true;
 }
 
+static uint16_t
+clb_multiwait(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
+   struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+   uint16_t max_pkts __rte_unused, void *arg)
+{
+   const unsigned int lcore = rte_lcore_id();
+   struct queue_list_entry *queue_conf = arg;
+   struct pmd_core_cfg *lcore_conf;
+   const bool empty = nb_rx == 0;
+
+   lcore_conf = &lcore_cfgs[lcore];
+
+   /* early exit */
+   if (likely(!empty))
+   /* early exit */
+   queue_reset(lcore_conf, queue_conf);
+   else {
+   struct rte_power_monitor_cond pmc[lcore_conf->n_queues];
+   int ret;
+
+   /* can this queue sleep? */
+   if (!queue_can_sleep(lcore_conf, queue_conf))
+   return nb_rx;
+
+   /* can this lcore sleep? */
+   if (!lcore_can_sleep(lcore_conf))
+   return nb_rx;
+
+   /* gather all monitoring conditions */
+   ret = get_monitor_addresses(lcore_conf, pmc,
+   lcore_conf->n_queues);
+   if (ret < 0)
+   return nb_rx;
+
+   rte_power_mo

[dpdk-dev] [PATCH v10 8/8] examples/l3fwd-power: support multiq in PMD modes

2021-07-09 Thread Anatoly Burakov
Currently, l3fwd-power enforces the limitation of having one queue per
lcore. This is no longer necessary, so remove the limitation.

Signed-off-by: Anatoly Burakov 
Tested-by: David Hunt 
---
 examples/l3fwd-power/main.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8dfed1634..52f56dc405 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -2723,12 +2723,6 @@ main(int argc, char **argv)
printf("\nInitializing rx queues on lcore %u ... ", lcore_id );
fflush(stdout);
 
-   /* PMD power management mode can only do 1 queue per core */
-   if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) {
-   rte_exit(EXIT_FAILURE,
-   "In PMD power management mode, only one queue 
per lcore is allowed\n");
-   }
-
/* init RX queues */
for(queue = 0; queue < qconf->n_rx_queue; ++queue) {
struct rte_eth_rxconf rxq_conf;
-- 
2.25.1



Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows

2021-07-09 Thread Tyler Retzlaff
On Fri, Jul 09, 2021 at 01:34:06PM +0200, Thomas Monjalon wrote:
> 09/07/2021 02:16, Tyler Retzlaff:
> > On Thu, Jul 08, 2021 at 10:39:13PM +0200, Thomas Monjalon wrote:
> > > 08/07/2021 21:21, Tyler Retzlaff:
> > > > (2) importing exported data symbols from a dll/dso on windows requires
> > > > that the symbol be decorated with dllimport. optionally loading
> > > > performance of dll/dso is also further improved by decorating
> > > > exported function symbols. [3]
> > > > 
> > > > for (2) we would propose the introduction and use of two macros to
> > > > allow decoration of exported data symbols. these macro would be or
> > > > similarly named __rte_import and __rte_export. of note
> > > 
> > > That's the same symbol declared in a single place
> > > which is exported and imported.
> > > So I don't understand the need for 2 macros.
> > 
> > i may be misinterpreting your reply. you're saying there is no need for
> > 2 because we use .def files?
> > 
> > strictly speaking when exporting C symbols this is true. so yes, we
> > could introduce only __rte_import and not bother with __rte_export.
> > 
> > is that what you meant?
> > 
> > i don't have any objection to just __rte_import alone but it is
> > mandatory for data symbols.
> 
> It may be my misunderstanding.
> The function is declared only once in the .h
> so I don't understand where these 2 macros are used.
> 

okay, i think the question is about mechanically where they are used and
i'll limit my answer to data symbols since that's what we need them for.

the export is typically used at the point of definition and the import
at declaration. so ignoring anything about tls the following example
hopefully illustrates.

let's start with a module i'll just call it eal.dll it is declaring a
data symbol that will be exported and may be referenced internally within
eal.dll itself but also externally by other modules outside of eal.dll.

// eal.c
__rte_export
int eal_some_var;

// eal.h
#ifndef BUILDING_EAL
__rte_import
#endif
extern int eal_some_var;

// eal_other.c
#define BUILDING_EAL
#include 

eal_some_var = 100;

we also have another module app.exe that consumes the variable
exported from eal.dll.

// app.c
#include 

eal_some_var = 200;

as mentioned in my first reply you can get away with not using
__rte_export because dpdk also lists the exports in the .map/.def files
even for data symbols but it is required for import [1]. the relevant
text from the documentation is quoted here.

Using __declspec(dllimport) is optional on function declarations,
but the compiler produces more efficient code if you use this
keyword.  However, you must use __declspec(dllimport) for the
importing executable to access the DLL's public data symbols and
objects. Note that the users of your DLL still need to link with
an import library.

specifically "you must use __declspec(dllimport) for the importing
executable to access the DLL's public data symbols" is relevant.

the other internal/external conditional compile for BUILDING_EAL is
applied to avoid generating the code for the extra layer of indirection
mentioned here [2] when using the variable in the same module in which
it was defined.

1. 
https://docs.microsoft.com/en-us/cpp/build/importing-into-an-application-using-declspec-dllimport?view=msvc-160
2. 
https://docs.microsoft.com/en-us/cpp/build/importing-data-using-declspec-dllimport?view=msvc-160


[dpdk-dev] [PATCH] examples/pipeline: add FIB example

2021-07-09 Thread Cristian Dumitrescu
Add example for FIB with VRF and ECMP support.

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Churchill Khangar 
---
Depends-on: patch-17716 ("[V2] pipeline: add support for LPM lookup")

 examples/pipeline/examples/fib.cli|  58 ++
 examples/pipeline/examples/fib.spec   | 171 ++
 .../examples/fib_nexthop_group_table.txt  |  54 ++
 .../pipeline/examples/fib_nexthop_table.txt   |  22 +++
 .../pipeline/examples/fib_routing_table.txt   |  30 +++
 5 files changed, 335 insertions(+)
 create mode 100644 examples/pipeline/examples/fib.cli
 create mode 100644 examples/pipeline/examples/fib.spec
 create mode 100644 examples/pipeline/examples/fib_nexthop_group_table.txt
 create mode 100644 examples/pipeline/examples/fib_nexthop_table.txt
 create mode 100644 examples/pipeline/examples/fib_routing_table.txt

diff --git a/examples/pipeline/examples/fib.cli 
b/examples/pipeline/examples/fib.cli
new file mode 100644
index 0..b20aed3cf
--- /dev/null
+++ b/examples/pipeline/examples/fib.cli
@@ -0,0 +1,58 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2020 Intel Corporation
+
+;
+; Customize the LINK parameters to match your setup.
+;
+mempool MEMPOOL0 buffer 2304 pool 32K cache 256 cpu 0
+
+link LINK0 dev :18:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK1 dev :18:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK2 dev :3b:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK3 dev :3b:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+
+;
+; PIPELINE0 setup.
+;
+pipeline PIPELINE0 create 0
+
+pipeline PIPELINE0 port in 0 link LINK0 rxq 0 bsz 32
+pipeline PIPELINE0 port in 1 link LINK1 rxq 0 bsz 32
+pipeline PIPELINE0 port in 2 link LINK2 rxq 0 bsz 32
+pipeline PIPELINE0 port in 3 link LINK3 rxq 0 bsz 32
+
+pipeline PIPELINE0 port out 0 link LINK0 txq 0 bsz 32
+pipeline PIPELINE0 port out 1 link LINK1 txq 0 bsz 32
+pipeline PIPELINE0 port out 2 link LINK2 txq 0 bsz 32
+pipeline PIPELINE0 port out 3 link LINK3 txq 0 bsz 32
+pipeline PIPELINE0 port out 4 sink none
+
+pipeline PIPELINE0 build ./examples/pipeline/examples/fib.spec
+
+;
+; Initial set of table entries.
+;
+; The table entries can later be updated at run-time through the CLI commands. 
Once the application
+; has been successfully started, the command to get the CLI prompt is: telnet 
0.0.0.0 8086.
+;
+pipeline PIPELINE0 table routing_table add 
./examples/pipeline/examples/fib_routing_table.txt
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group add
+pipeline PIPELINE0 selector nexthop_group_table group member add 
./examples/pipeline/examples/fib_nexthop_group_table.txt
+pipeline PIPELINE0 table nexthop_table add 
./examples/pipeline/examples/fib_nexthop_table.txt
+pipeline PIPELINE0 commit
+
+;
+; Pipelines-to-threads mapping.
+;
+thread 1 pipeline PIPELINE0 enable
diff --git a/examples/pipeline/examples/fib.spec 
b/examples/pipeline/examples/fib.spec
new file mode 100644
index 0..3f8c76610
--- /dev/null
+++ b/examples/pipeline/examples/fib.spec
@@ -0,0 +1,171 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2021 Intel Corporation
+
+; This example illustrates a FIB [1] with VRF [2] and ECMP [3] support. A FIB 
essentially is the
+; data plane copy of the routing table. The VRF support allows for multiple 
logical routing tables
+; to co-exist as part of the same "physical" routing table; the VRF ID 
typically identifies the
+; logical table to provide the matching route for the IP destination address 
of the input packet.
+; The ECMP provides a load balancing mechanism for the packet forwarding by 
allowing for multiple
+; next hops (of equal or different weights, in case of WCMP [4]) to be 
provided for each route.
+;
+; In this example, the VRF ID is read from the IP source address of the input 
packet as opposed to a
+; more complex classification scheme being used. The routing table produces 
the ID of the group of
+; next hops associated with the current route, out of which a single next hop 
is selected based on a
+; hashing scheme that preserves the packet order within each flow (with the 
flow defined here by a
+; typical 3-tuple) by always selecting the same next hop for packets that are 
part of the same flow.
+; The next hop provides the Ethernet header and the output port for

[dpdk-dev] [PATCH 2/4] ethdev: move jumbo frame offload check to library

2021-07-09 Thread Ferruh Yigit
Setting MTU bigger than RTE_ETHER_MTU requires the jumbo frame support,
and application should enable the jumbo frame offload support for it.

When jumbo frame offload is not enabled by application, but MTU bigger
than RTE_ETHER_MTU is requested there are two options, either fail or
enable jumbo frame offload implicitly.

Enabling jumbo frame offload implicitly is selected by many drivers
since setting a big MTU value already implies it, and this increases
usability.

This patch moves this logic from drivers to the library, both to reduce
the duplicated code in the drivers and to make behaviour more visible.

Signed-off-by: Ferruh Yigit 
---
 drivers/net/axgbe/axgbe_ethdev.c|  9 ++---
 drivers/net/bnxt/bnxt_ethdev.c  |  9 ++---
 drivers/net/cnxk/cnxk_ethdev_ops.c  |  5 -
 drivers/net/cxgbe/cxgbe_ethdev.c|  8 
 drivers/net/dpaa/dpaa_ethdev.c  |  7 ---
 drivers/net/dpaa2/dpaa2_ethdev.c|  7 ---
 drivers/net/e1000/em_ethdev.c   |  9 ++---
 drivers/net/e1000/igb_ethdev.c  |  9 ++---
 drivers/net/enetc/enetc_ethdev.c|  7 ---
 drivers/net/hinic/hinic_pmd_ethdev.c|  7 ---
 drivers/net/hns3/hns3_ethdev.c  |  8 
 drivers/net/hns3/hns3_ethdev_vf.c   |  6 --
 drivers/net/i40e/i40e_ethdev.c  |  5 -
 drivers/net/i40e/i40e_ethdev_vf.c   |  5 -
 drivers/net/iavf/iavf_ethdev.c  |  7 ---
 drivers/net/ice/ice_ethdev.c|  5 -
 drivers/net/igc/igc_ethdev.c|  9 ++---
 drivers/net/ipn3ke/ipn3ke_representor.c |  5 -
 drivers/net/ixgbe/ixgbe_ethdev.c|  7 ++-
 drivers/net/liquidio/lio_ethdev.c   |  7 ---
 drivers/net/nfp/nfp_net.c   |  6 --
 drivers/net/octeontx/octeontx_ethdev.c  |  5 -
 drivers/net/octeontx2/otx2_ethdev_ops.c |  5 -
 drivers/net/qede/qede_ethdev.c  |  4 
 drivers/net/sfc/sfc_ethdev.c|  9 -
 drivers/net/thunderx/nicvf_ethdev.c |  6 --
 drivers/net/txgbe/txgbe_ethdev.c|  6 --
 lib/ethdev/rte_ethdev.c | 18 +-
 28 files changed, 29 insertions(+), 171 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c
index 76aeec077f2b..2960834b4539 100644
--- a/drivers/net/axgbe/axgbe_ethdev.c
+++ b/drivers/net/axgbe/axgbe_ethdev.c
@@ -1492,15 +1492,10 @@ static int axgb_mtu_set(struct rte_eth_dev *dev, 
uint16_t mtu)
dev->data->port_id);
return -EBUSY;
}
-   if (mtu > RTE_ETHER_MTU) {
-   dev->data->dev_conf.rxmode.offloads |=
-   DEV_RX_OFFLOAD_JUMBO_FRAME;
+   if (mtu > RTE_ETHER_MTU)
val = 1;
-   } else {
-   dev->data->dev_conf.rxmode.offloads &=
-   ~DEV_RX_OFFLOAD_JUMBO_FRAME;
+   else
val = 0;
-   }
AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val);
return 0;
 }
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 335505a106d5..4344a012f06e 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3018,15 +3018,10 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, 
uint16_t new_mtu)
return -EINVAL;
}
 
-   if (new_mtu > RTE_ETHER_MTU) {
+   if (new_mtu > RTE_ETHER_MTU)
bp->flags |= BNXT_FLAG_JUMBO;
-   bp->eth_dev->data->dev_conf.rxmode.offloads |=
-   DEV_RX_OFFLOAD_JUMBO_FRAME;
-   } else {
-   bp->eth_dev->data->dev_conf.rxmode.offloads &=
-   ~DEV_RX_OFFLOAD_JUMBO_FRAME;
+   else
bp->flags &= ~BNXT_FLAG_JUMBO;
-   }
 
/* Is there a change in mtu setting? */
if (eth_dev->data->mtu == new_mtu)
diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c 
b/drivers/net/cnxk/cnxk_ethdev_ops.c
index 695d0d6fd3e2..349896f6a1bf 100644
--- a/drivers/net/cnxk/cnxk_ethdev_ops.c
+++ b/drivers/net/cnxk/cnxk_ethdev_ops.c
@@ -439,11 +439,6 @@ cnxk_nix_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
plt_err("Failed to max Rx frame length, rc=%d", rc);
goto exit;
}
-
-   if (mtu > RTE_ETHER_MTU)
-   dev->rx_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
-   else
-   dev->rx_offloads &= ~DEV_RX_OFFLOAD_JUMBO_FRAME;
 exit:
return rc;
 }
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 8cf61f12a8d6..0c9cc2f5bb3f 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -313,14 +313,6 @@ int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, 
uint16_t mtu)
if (mtu < RTE_ETHER_MIN_MTU || new_mtu > dev_info.max_rx_pktlen)
return -EINVAL;
 
-   /* set to jumbo mode if needed */
-   if (mtu > RTE_ETHER_MTU)
-   eth_

[dpdk-dev] [PATCH 3/4] ethdev: move check to library for MTU set

2021-07-09 Thread Ferruh Yigit
Move requested MTU value check to the API to prevent the duplicated
code.

Signed-off-by: Ferruh Yigit 
---
 drivers/net/axgbe/axgbe_ethdev.c| 15 ---
 drivers/net/bnxt/bnxt_ethdev.c  |  2 +-
 drivers/net/cxgbe/cxgbe_ethdev.c| 13 +
 drivers/net/dpaa/dpaa_ethdev.c  |  2 --
 drivers/net/dpaa2/dpaa2_ethdev.c|  4 
 drivers/net/e1000/em_ethdev.c   | 10 --
 drivers/net/e1000/igb_ethdev.c  | 11 ---
 drivers/net/enetc/enetc_ethdev.c|  4 
 drivers/net/hinic/hinic_pmd_ethdev.c|  8 +---
 drivers/net/i40e/i40e_ethdev.c  | 17 -
 drivers/net/i40e/i40e_ethdev_vf.c   | 17 -
 drivers/net/iavf/iavf_ethdev.c  | 10 ++
 drivers/net/ice/ice_ethdev.c| 14 +++---
 drivers/net/igc/igc_ethdev.c|  5 -
 drivers/net/ipn3ke/ipn3ke_representor.c |  6 --
 drivers/net/liquidio/lio_ethdev.c   | 10 --
 drivers/net/nfp/nfp_net.c   |  4 
 drivers/net/octeontx/octeontx_ethdev.c  |  4 
 drivers/net/octeontx2/otx2_ethdev_ops.c |  5 -
 drivers/net/qede/qede_ethdev.c  | 12 
 drivers/net/thunderx/nicvf_ethdev.c |  6 --
 drivers/net/txgbe/txgbe_ethdev.c| 10 --
 lib/ethdev/rte_ethdev.c |  9 +
 23 files changed, 29 insertions(+), 169 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c
index 2960834b4539..c36cd7b1d2f0 100644
--- a/drivers/net/axgbe/axgbe_ethdev.c
+++ b/drivers/net/axgbe/axgbe_ethdev.c
@@ -1478,25 +1478,18 @@ axgbe_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 
 static int axgb_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 {
-   struct rte_eth_dev_info dev_info;
struct axgbe_port *pdata = dev->data->dev_private;
-   uint32_t frame_size = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
-   unsigned int val = 0;
-   axgbe_dev_info_get(dev, &dev_info);
-   /* check that mtu is within the allowed range */
-   if (mtu < RTE_ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen)
-   return -EINVAL;
+   unsigned int val;
+
/* mtu setting is forbidden if port is start */
if (dev->data->dev_started) {
PMD_DRV_LOG(ERR, "port %d must be stopped before configuration",
dev->data->port_id);
return -EBUSY;
}
-   if (mtu > RTE_ETHER_MTU)
-   val = 1;
-   else
-   val = 0;
+   val = mtu > RTE_ETHER_MTU ? 1 : 0;
AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val);
+
return 0;
 }
 
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 4344a012f06e..1e7da8ba61a6 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -2991,7 +2991,7 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t 
new_mtu)
uint32_t overhead = BNXT_MAX_PKT_LEN - BNXT_MAX_MTU;
struct bnxt *bp = eth_dev->data->dev_private;
uint32_t new_pkt_size;
-   uint32_t rc = 0;
+   uint32_t rc;
uint32_t i;
 
rc = is_bnxt_in_error(bp);
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 0c9cc2f5bb3f..70b879fed100 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -301,21 +301,10 @@ int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, 
uint16_t mtu)
 {
struct port_info *pi = eth_dev->data->dev_private;
struct adapter *adapter = pi->adapter;
-   struct rte_eth_dev_info dev_info;
-   int err;
uint16_t new_mtu = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
 
-   err = cxgbe_dev_info_get(eth_dev, &dev_info);
-   if (err != 0)
-   return err;
-
-   /* Must accommodate at least RTE_ETHER_MIN_MTU */
-   if (mtu < RTE_ETHER_MIN_MTU || new_mtu > dev_info.max_rx_pktlen)
-   return -EINVAL;
-
-   err = t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1,
+   return t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1,
-1, -1, true);
-   return err;
 }
 
 /*
diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
index a444f749bb96..60dd4f67fc26 100644
--- a/drivers/net/dpaa/dpaa_ethdev.c
+++ b/drivers/net/dpaa/dpaa_ethdev.c
@@ -167,8 +167,6 @@ dpaa_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 
PMD_INIT_FUNC_TRACE();
 
-   if (mtu < RTE_ETHER_MIN_MTU || frame_size > DPAA_MAX_RX_PKT_LEN)
-   return -EINVAL;
/*
 * Refuse mtu that requires the support of scattered packets
 * when this feature has not been enabled before.
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index be2858b3adac..6b44b0557e6a 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net

[dpdk-dev] [PATCH 4/4] ethdev: remove jumbo offload flag

2021-07-09 Thread Ferruh Yigit
Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag.

Instead of drivers announce this capability, application can deduct the
capability by checking reported 'dev_info.max_mtu' or
'dev_info.max_rx_pktlen'.

And instead of application explicitly set this flag to enable jumbo
frames, this can be deducted by driver by comparing requested 'mtu' to
'RTE_ETHER_MTU'.

Removing this additional configuration for simplification.

Signed-off-by: Ferruh Yigit 
---
 app/test-eventdev/test_pipeline_common.c  |  2 -
 app/test-pmd/cmdline.c|  2 +-
 app/test-pmd/config.c | 24 +-
 app/test-pmd/testpmd.c| 46 +--
 app/test-pmd/testpmd.h|  2 +-
 doc/guides/howto/debug_troubleshoot.rst   |  2 -
 doc/guides/nics/bnxt.rst  |  1 -
 doc/guides/nics/features.rst  |  3 +-
 drivers/net/atlantic/atl_ethdev.c |  1 -
 drivers/net/axgbe/axgbe_ethdev.c  |  1 -
 drivers/net/bnx2x/bnx2x_ethdev.c  |  1 -
 drivers/net/bnxt/bnxt.h   |  1 -
 drivers/net/bnxt/bnxt_ethdev.c| 10 +---
 drivers/net/bonding/rte_eth_bond_pmd.c|  8 
 drivers/net/cnxk/cnxk_ethdev.h|  5 +-
 drivers/net/cnxk/cnxk_ethdev_ops.c|  1 -
 drivers/net/cxgbe/cxgbe.h |  1 -
 drivers/net/cxgbe/cxgbe_ethdev.c  |  8 
 drivers/net/cxgbe/sge.c   |  5 +-
 drivers/net/dpaa/dpaa_ethdev.c|  2 -
 drivers/net/dpaa2/dpaa2_ethdev.c  |  2 -
 drivers/net/e1000/e1000_ethdev.h  |  4 +-
 drivers/net/e1000/em_ethdev.c |  4 +-
 drivers/net/e1000/em_rxtx.c   | 19 +++-
 drivers/net/e1000/igb_rxtx.c  |  3 +-
 drivers/net/ena/ena_ethdev.c  |  2 -
 drivers/net/enetc/enetc_ethdev.c  |  3 +-
 drivers/net/enic/enic_res.c   |  1 -
 drivers/net/failsafe/failsafe_ops.c   |  2 -
 drivers/net/fm10k/fm10k_ethdev.c  |  1 -
 drivers/net/hinic/hinic_pmd_ethdev.c  |  1 -
 drivers/net/hns3/hns3_ethdev.c|  1 -
 drivers/net/hns3/hns3_ethdev_vf.c |  1 -
 drivers/net/i40e/i40e_ethdev.c|  1 -
 drivers/net/i40e/i40e_ethdev_vf.c |  3 +-
 drivers/net/i40e/i40e_rxtx.c  |  2 +-
 drivers/net/iavf/iavf_ethdev.c|  3 +-
 drivers/net/ice/ice_dcf_ethdev.c  |  3 +-
 drivers/net/ice/ice_dcf_vf_representor.c  |  1 -
 drivers/net/ice/ice_ethdev.c  |  1 -
 drivers/net/ice/ice_rxtx.c|  3 +-
 drivers/net/igc/igc_ethdev.h  |  1 -
 drivers/net/igc/igc_txrx.c|  2 +-
 drivers/net/ionic/ionic_ethdev.c  |  1 -
 drivers/net/ipn3ke/ipn3ke_representor.c   |  3 +-
 drivers/net/ixgbe/ixgbe_ethdev.c  |  5 +-
 drivers/net/ixgbe/ixgbe_pf.c  |  9 +---
 drivers/net/ixgbe/ixgbe_rxtx.c|  3 +-
 drivers/net/mlx4/mlx4_rxq.c   |  1 -
 drivers/net/mlx5/mlx5_rxq.c   |  1 -
 drivers/net/mvneta/mvneta_ethdev.h|  3 +-
 drivers/net/mvpp2/mrvl_ethdev.c   |  1 -
 drivers/net/nfp/nfp_net.c |  6 +--
 drivers/net/octeontx/octeontx_ethdev.h|  1 -
 drivers/net/octeontx2/otx2_ethdev.h   |  1 -
 drivers/net/octeontx_ep/otx_ep_ethdev.c   |  3 +-
 drivers/net/octeontx_ep/otx_ep_rxtx.c |  6 ---
 drivers/net/qede/qede_ethdev.c|  1 -
 drivers/net/sfc/sfc_rx.c  |  2 -
 drivers/net/thunderx/nicvf_ethdev.h   |  1 -
 drivers/net/txgbe/txgbe_rxtx.c|  1 -
 drivers/net/virtio/virtio_ethdev.c|  1 -
 drivers/net/vmxnet3/vmxnet3_ethdev.c  |  1 -
 examples/ip_fragmentation/main.c  |  3 +-
 examples/ip_reassembly/main.c |  3 +-
 examples/ipsec-secgw/ipsec-secgw.c|  2 -
 examples/ipv4_multicast/main.c|  1 -
 examples/kni/main.c   |  5 --
 examples/l3fwd-acl/main.c |  2 -
 examples/l3fwd-graph/main.c   |  1 -
 examples/l3fwd-power/main.c   |  2 -
 examples/l3fwd/main.c |  1 -
 .../performance-thread/l3fwd-thread/main.c|  2 -
 examples/vhost/main.c |  2 -
 lib/ethdev/rte_ethdev.c   | 26 +--
 lib/ethdev/rte_ethdev.h   |  1 -
 76 files changed, 42 insertions(+), 250 deletions(-)

diff --git a/app/test-eventdev/test_pipeline_common.c 
b/app/test-eventdev/test_pipeline_common.c
index 5fcea74b4d43..2775e72c580d 100644
--- a/app/test-eventdev/test_pipeline_common.c
+++ b/app/test-eventdev/test_pipeline_common.c
@@ -199,8 +199,6 @@ pipeline_ethdev_setup(struct evt_test *test, stru

Re: [dpdk-dev] [PATCH v18 0/7] aarch64 -> aarch32 cross compilation support

2021-07-09 Thread Thomas Monjalon
07/07/2021 15:25, Juraj Linkeš:
> Add support for aarch32 cross build in meson.
> 
> Aarch32 is an execution state that allows execution of 32-bit code on
> armv8 machines. This execution state contains a superset of previous
> armv7 32-bit instructions and features. Thus the aarch32 build is distinct
> from arvm7 build.
> 
> v18:
> Rebased, adjusted to use arm soc format.
> 
> Acked-by: Aaron Conole 
> 
> Juraj Linkeš (4):
>   net/virtio: fix aarch32 build
>   eal/arm: update CPU flags
>   build: add aarch32 meson build flags
>   build: add aarch32 to meson cross-compilation
> 
> Phil Yang (1):
>   doc: add aarch32 build guidance
> 
> Ruifeng Wang (2):
>   net/sfc: fix aarch32 build
>   net/bnxt: fix aarch32 build

Applied, thanks.

I did few small improvements in the doc.





[dpdk-dev] RHEL 7 support

2021-07-09 Thread Thomas Monjalon
Hi,

I would like to open a discussion about RHEL 7 support in DPDK.
How long do we want to support it in new DPDK versions?
Can we drop RHEL 7 support starting DPDK 21.11?

If we decide to drop RHEL 7 support, does it mean we can generally use
standard C11 atomics?

What other benefits or impacts can you think about?




Re: [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring

2021-07-09 Thread Thomas Monjalon
07/07/2021 07:48, Ruifeng Wang:
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
> 
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
> 
> Gavin Hu (1):
>   spinlock: use wfe to reduce contention on aarch64
> 
> Ruifeng Wang (2):
>   ring: use wfe to wait for ring tail update on aarch64
>   build: add option to enable wait until equal

As discussed in the thread, patches 1 & 2 are applied.
The patch 3 (meson option) is rejected.




Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure

2021-07-09 Thread Ajit Khaparde
On Fri, Jul 9, 2021 at 1:48 AM Thomas Monjalon  wrote:

> 09/07/2021 00:49, Ajit Khaparde:
> > Fix build failures because of uninitialized variable usage.
>
> You should add the error log here.
> You don't mention the condition of failure.
> Nobody reproduced a failure so far.
>
Sure. I did not notice it on my Fedora34 either.
The build error was reported in the dpdk-test-report [1].
Let me resend a v2 with the information.

[1] http://mails.dpdk.org/archives/test-report/2021-July/203186.html


> > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")
> >
> > Signed-off-by: Kishore Padmanabha 
> > Signed-off-by: Ajit Khaparde 
>
>
>
>


Re: [dpdk-dev] [PATCH] net/bnxt: fix missing barriers in completion handling

2021-07-09 Thread Lance Richardson
On Fri, Jul 9, 2021 at 2:00 AM Ruifeng Wang  wrote:
>

> > +/**
> > + * Check validity of a completion ring entry. If the entry is valid, 
> > include a
> > + * C11 __ATOMIC_ACQUIRE fence to ensure that subsequent loads of fields
> > in the
> > + * completion are not hoisted by the compiler or by the CPU to come before
> > the
> > + * loading of the "valid" field.
> > + *
> > + * Note: the caller must not access any fields in the specified completion
> > + * entry prior to calling this function.
> > + *
> > + * @param cmp
> Nit, cmpl

Thanks, good catch. I'll fix this in v2.


>
> >
> >   /* Check to see if hw has posted a completion for the descriptor. */
> > @@ -3327,7 +3327,7 @@ bnxt_tx_descriptor_status_op(void *tx_queue,
> > uint16_t offset)
> >   cons = RING_CMPL(ring_mask, raw_cons);
> >   txcmp = (struct tx_cmpl *)&cp_desc_ring[cons];
> >
> > - if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
> > + if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1))
> cpr->cp_ring_struct->ring_size can be used instead of 'ring_mask + 1'?
>
> >   break;
> >
> >   if (CMP_TYPE(txcmp) == TX_CMPL_TYPE_TX_L2)
>
> 
>
> > diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > index 263e6ec3c..13211060c 100644
> > --- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > +++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
> > @@ -339,7 +339,7 @@ bnxt_handle_tx_cp_vec(struct bnxt_tx_queue *txq)
> >   cons = RING_CMPL(ring_mask, raw_cons);
> >   txcmp = (struct tx_cmpl *)&cp_desc_ring[cons];
> >
> > - if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct))
> > + if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1))
> Same here. I think cpr->cp_ring_struct->ring_size can be used and it avoids 
> calculation.
> Also some places in other vector files.

It's true that cpr->cp_ring_struct->ring_size and ring_mask + 1 are
equivalent, but there doesn't seem to be a meaningful difference
between the two in the generated code.

Based on disassembly of x86 and Arm code for this function, the compiler
correctly determines that the value of ring_mask + 1 doesn't change within
the loop, so it is only computed once. The only difference would be in
whether an add instruction or a load instruction is used to put the value
in the register.


Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure

2021-07-09 Thread Lance Richardson
On Fri, Jul 9, 2021 at 4:48 AM Thomas Monjalon  wrote:
>
> 09/07/2021 00:49, Ajit Khaparde:
> > Fix build failures because of uninitialized variable usage.
>
> You should add the error log here.
> You don't mention the condition of failure.
> Nobody reproduced a failure so far.

Hi Thomas,

This addresses FC34 (which I think is using gcc11) build failures, there
is an example here:
 http://mails.dpdk.org/archives/test-report/2021-July/203186.html

Regards,
Lance


Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs

2021-07-09 Thread Tyler Retzlaff
On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
> > +
> > +Promotion to stable
> > +~~~
> > +
> > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable 
> > ABI
> > +once a maintainer and/or the original contributor is satisfied that the 
> > API is
> > +reasonably mature. In exceptional circumstances, should an API still be
> 
> Is this line with git commit message?
> Why making an exceptional case? why not make it stable after two years
> or remove it.
> My worry is if we make an exception case, it will be difficult to
> enumerate the exception case.

i think the intent here is to indicate that an api/abi doesn't just
automatically become stable after a period of time.  there also has to
be an evaluation by the maintainer / community before making it stable.

so i guess the timer is something that should force that evaluation. as
a part of that evaluation one would imagine there is justification for
keeping the api as experimental for longer and if so a rationale as to
why.


Re: [dpdk-dev] [PATCH v10 0/8] Enhancements for PMD power management

2021-07-09 Thread David Marchand
On Fri, Jul 9, 2021 at 6:08 PM Anatoly Burakov
 wrote:
>
> This patchset introduces several changes related to PMD power management:
>
> - Changed monitoring intrinsics to use callbacks as a comparison function, 
> based
>   on previous patchset [1] but incorporating feedback [2] - this hopefully 
> will
>   make it possible to add support for .get_monitor_addr in virtio
> - Add a new intrinsic to monitor multiple addresses, based on RTM instruction
>   set and the TPAUSE instruction
> - Add support for PMD power management on multiple queues, as well as all
>   accompanying infrastructure and example apps changes
>
> v10:
> - Added missing changes to NIC feature .ini files
>
> v9:
> - Added all missing Acks and Tests
> - Added a new commit with NIC features
> - Addressed minor issues raised in review
>
> v8:
> - Fixed checkpatch issue
> - Added comment explaining empty poll handling (Konstantin)
>
> v7:
> - Fixed various bugs
>
> v6:
> - Improved the algorithm for multi-queue sleep
> - Fixed segfault and addressed other feedback
>
> v5:
> - Removed "power save queue" API and replaced with mechanism suggested by
>   Konstantin
> - Addressed other feedback
>
> v4:
> - Replaced raw number with a macro
> - Fixed all the bugs found by Konstantin
> - Some other minor corrections
>
> v3:
> - Moved some doc updates to NIC features list
>
> v2:
> - Changed check inversion to callbacks
> - Addressed feedback from Konstantin
> - Added doc updates where necessary
>
> [1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=*
> [2] 
> http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274
>
> Anatoly Burakov (8):
>   eal: use callbacks for power monitoring comparison
>   net/af_xdp: add power monitor support
>   doc: add PMD power management NIC feature
>   eal: add power monitor for multiple events
>   power: remove thread safety from PMD power API's
>   power: support callbacks for multiple Rx queues
>   power: support monitoring multiple Rx queues
>   examples/l3fwd-power: support multiq in PMD modes

Overall, the series lgtm.

I still have a comment on the opaque pointer passed in callbacks.
This is not blocking, we can still go with followup patches in this release.

It would be great if drivers maintainers could implement this new ops
in their driver or give feedback on what should be enhanced.

Series applied, thanks.


-- 
David Marchand



[dpdk-dev] [PATCH v2] net/bnxt: fix build failure

2021-07-09 Thread Ajit Khaparde
Fix build failures because of uninitialized variables.

../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function 
‘ulp_mapper_index_tbl_process’:
../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int 
*)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, 
resource_func))’ may be used uninitialized in this function 
[-Werror=maybe-uninitialized]
 2252 | struct bnxt_ulp_glb_resource_info glb_res;
  |   ^~~
../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘glb_res.resource_type’ 
may be used uninitialized in this function [-Werror=maybe-uninitialized]

../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’:
../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
   95 | uint32_t index;
  |  ^

Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")

Signed-off-by: Kishore Padmanabha 
Signed-off-by: Ajit Khaparde 
---
v1->v2: update commit log to show the build error encountered.
---
 drivers/net/bnxt/tf_core/dpool.c | 1 +
 drivers/net/bnxt/tf_ulp/ulp_mapper.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/tf_core/dpool.c b/drivers/net/bnxt/tf_core/dpool.c
index 0dae42b1bb..145efa486f 100644
--- a/drivers/net/bnxt/tf_core/dpool.c
+++ b/drivers/net/bnxt/tf_core/dpool.c
@@ -125,6 +125,7 @@ int dpool_defrag(struct dpool *dpool,
largest_free_size = 0;
largest_free_index = 0;
count = 0;
+   index = 0;
 
for (i = 0; i < dpool->size; i++) {
if (DP_IS_FREE(dpool->entry[i].flags)) {
diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c 
b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
index acd9f996e8..871dbad0fe 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
@@ -2249,7 +2249,7 @@ ulp_mapper_index_tbl_process(struct bnxt_ulp_mapper_parms 
*parms,
struct tf_free_tbl_entry_parms free_parms = { 0 };
uint32_t tbl_scope_id;
struct tf *tfp;
-   struct bnxt_ulp_glb_resource_info glb_res;
+   struct bnxt_ulp_glb_resource_info glb_res = { 0 };
uint16_t bit_size;
bool alloc = false;
bool write = false;
-- 
2.21.1 (Apple Git-122.3)



Re: [dpdk-dev] [PATCH 1/2] build: fix SVE compile error with gcc8.3

2021-07-09 Thread Thomas Monjalon
28/06/2021 04:57, Chengwen Feng:
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -488,3 +488,9 @@ if cc.get_define('__ARM_FEATURE_CRYPTO', args: 
> machine_args) != ''
>  compile_time_cpuflags += ['RTE_CPUFLAG_AES', 'RTE_CPUFLAG_PMULL',
>  'RTE_CPUFLAG_SHA1', 'RTE_CPUFLAG_SHA2']
>  endif
> +
> +# Check whether SVE ACLE is supported and set the corresponding flag which 
> will used with SVE ACLE code.
> +if (cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != '' and
> +cc.check_header('arm_sve.h'))
> +dpdk_conf.set('RTE_HAS_SVE_ACLE', 1)
> +endif

Simpler and better sorted:

--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -524,6 +524,9 @@ endif
 
 if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
 compile_time_cpuflags += ['RTE_CPUFLAG_SVE']
+if (cc.check_header('arm_sve.h'))
+dpdk_conf.set('RTE_HAS_SVE_ACLE', 1)
+endif
 endif





Re: [dpdk-dev] [PATCH 0/2] bugfix for SVE compile

2021-07-09 Thread Thomas Monjalon
28/06/2021 04:57, Chengwen Feng:
> This patch set contains two bugfixes for SVE compile.
> Note:
> 1) Because 2/2 patch needs backport to 20.11 so we separate it.
> 2) These two patches already acked by ARM guys from previous threads.
> 
> Chengwen Feng (2):
>   build: fix SVE compile error with gcc8.3
>   net/hns3: fix SVE code compile error with gcc8.3

Applied with proposed improvement, thanks.




[dpdk-dev] [PATCH v2] net/bnxt: fix missing barriers in completion handling

2021-07-09 Thread Lance Richardson
Ensure that Rx/Tx/Async completion entry fields are accessed
only after the completion's valid flag has been loaded and
verified. This is needed for correct operation on systems that
use relaxed memory consistency models.

Fixes: 2eb53b134aae ("net/bnxt: add initial Rx code")
Fixes: 6eb3cc2294fd ("net/bnxt: add initial Tx code")
Cc: sta...@dpdk.org
Signed-off-by: Lance Richardson 
Reviewed-by: Ajit Khaparde 
---
v2:
* Corrected name of the first parameter to bnxt_cpr_cmp_valid() in
  comments ('cmp' to 'cmpl').

 drivers/net/bnxt/bnxt_cpr.h   | 36 ---
 drivers/net/bnxt/bnxt_ethdev.c| 16 ++--
 drivers/net/bnxt/bnxt_irq.c   |  7 +++---
 drivers/net/bnxt/bnxt_rxr.c   |  9 ---
 drivers/net/bnxt/bnxt_rxtx_vec_avx2.c |  2 +-
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c |  2 +-
 drivers/net/bnxt/bnxt_rxtx_vec_sse.c  |  2 +-
 drivers/net/bnxt/bnxt_txr.c   |  2 +-
 8 files changed, 54 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h
index 2a56ec52c..4095c8c40 100644
--- a/drivers/net/bnxt/bnxt_cpr.h
+++ b/drivers/net/bnxt/bnxt_cpr.h
@@ -8,13 +8,10 @@
 #include 
 
 #include 
+#include "hsi_struct_def_dpdk.h"
 
 struct bnxt_db_info;
 
-#define CMP_VALID(cmp, raw_cons, ring) \
-   (!!(rte_le_to_cpu_32(((struct cmpl_base *)(cmp))->info3_v) &\
-   CMPL_BASE_V) == !((raw_cons) & ((ring)->ring_size)))
-
 #define CMP_TYPE(cmp)  \
(((struct cmpl_base *)cmp)->type & CMPL_BASE_TYPE_MASK)
 
@@ -121,4 +118,35 @@ bool bnxt_is_recovery_enabled(struct bnxt *bp);
 bool bnxt_is_master_func(struct bnxt *bp);
 
 void bnxt_stop_rxtx(struct bnxt *bp);
+
+/**
+ * Check validity of a completion ring entry. If the entry is valid, include a
+ * C11 __ATOMIC_ACQUIRE fence to ensure that subsequent loads of fields in the
+ * completion are not hoisted by the compiler or by the CPU to come before the
+ * loading of the "valid" field.
+ *
+ * Note: the caller must not access any fields in the specified completion
+ * entry prior to calling this function.
+ *
+ * @param cmpl
+ *   Pointer to an entry in the completion ring.
+ * @param raw_cons
+ *   Raw consumer index of entry in completion ring.
+ * @param ring_size
+ *   Size of completion ring.
+ */
+static __rte_always_inline bool
+bnxt_cpr_cmp_valid(const void *cmpl, uint32_t raw_cons, uint32_t ring_size)
+{
+   const struct cmpl_base *c = cmpl;
+   bool expected, valid;
+
+   expected = !(raw_cons & ring_size);
+   valid = !!(rte_le_to_cpu_32(c->info3_v) & CMPL_BASE_V);
+   if (valid == expected) {
+   rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+   return true;
+   }
+   return false;
+}
 #endif
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ed09f1bf5..ee6929692 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3126,7 +3126,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
 {
struct bnxt *bp = (struct bnxt *)dev->data->dev_private;
struct bnxt_cp_ring_info *cpr;
-   uint32_t desc = 0, raw_cons;
+   uint32_t desc = 0, raw_cons, cp_ring_size;
struct bnxt_rx_queue *rxq;
struct rx_pkt_cmpl *rxcmp;
int rc;
@@ -3138,6 +3138,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
rxq = dev->data->rx_queues[rx_queue_id];
cpr = rxq->cp_ring;
raw_cons = cpr->cp_raw_cons;
+   cp_ring_size = cpr->cp_ring_struct->ring_size;
 
while (1) {
uint32_t agg_cnt, cons, cmpl_type;
@@ -3145,7 +3146,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t 
rx_queue_id)
cons = RING_CMP(cpr->cp_ring_struct, raw_cons);
rxcmp = (struct rx_pkt_cmpl *)&cpr->cp_desc_ring[cons];
 
-   if (!CMP_VALID(rxcmp, raw_cons, cpr->cp_ring_struct))
+   if (!bnxt_cpr_cmp_valid(rxcmp, raw_cons, cp_ring_size))
break;
 
cmpl_type = CMP_TYPE(rxcmp);
@@ -3189,7 +3190,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t 
offset)
struct bnxt_rx_queue *rxq = rx_queue;
struct bnxt_cp_ring_info *cpr;
struct bnxt_rx_ring_info *rxr;
-   uint32_t desc, raw_cons;
+   uint32_t desc, raw_cons, cp_ring_size;
struct bnxt *bp = rxq->bp;
struct rx_pkt_cmpl *rxcmp;
int rc;
@@ -3203,6 +3204,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t 
offset)
 
rxr = rxq->rx_ring;
cpr = rxq->cp_ring;
+   cp_ring_size = cpr->cp_ring_struct->ring_size;
 
/*
 * For the vector receive case, the completion at the requested
@@ -3219,7 +3221,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t 
offset)
cons = RING_CMP(cpr->cp_ring_struct, raw_cons);
r

Re: [dpdk-dev] [PATCH v2] net/bnxt: fix build failure

2021-07-09 Thread Thomas Monjalon
09/07/2021 20:38, Ajit Khaparde:
> On Fri, Jul 9, 2021 at 8:37 AM Ajit Khaparde 
> wrote:
> 
> > Fix build failures because of uninitialized variables.
> >
> > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function
> > ‘ulp_mapper_index_tbl_process’:
> > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int
> > *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info,
> > resource_func))’ may be used uninitialized in this function
> > [-Werror=maybe-uninitialized]
> >  2252 | struct bnxt_ulp_glb_resource_info glb_res;
> >   |   ^~~
> > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
> > ‘glb_res.resource_type’ may be used uninitialized in this function
> > [-Werror=maybe-uninitialized]
> >
> > ../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’:
> > ../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used
> > uninitialized in this function [-Werror=maybe-uninitialized]
> >95 | uint32_t index;
> >   |  ^
> >
> > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")
> >
> > Signed-off-by: Kishore Padmanabha 
> > Signed-off-by: Ajit Khaparde 
> 
> Patch applied to dpdk-next-net-brcm for-next-net branch. Thanks

Pulled in main with additional info:
seen on Fedora Core 34 (GCC 11)





Re: [dpdk-dev] [PATCH V3] table: fix bucket empty logic

2021-07-09 Thread Thomas Monjalon
> > Due to a typo, only 3 out of 4 keys in the bucket of the exact match
> > table were considered, which can result in valid keys being
> > incorrectly dropped from the table.
> > 
> > Fixes: d0a00966618ba ("table: add exact match SWX table")
> > Cc: sta...@dpdk.org
> > Cc: Cristian Dumitrescu 
> > 
> > Signed-off-by: Thierry Herbelot 
> 
> Acked-by: Cristian Dumitrescu 

Applied, thanks




Re: [dpdk-dev] [PATCH V2] pipeline: fix table entry read

2021-07-09 Thread Thomas Monjalon
06/07/2021 00:56, Cristian Dumitrescu:
> The rte_swx_pipeline_table_entry_read() function is used to read from
> a character string a table entry that is to be added to the table,
> deleted from the table or set as the default entry of the table.
> Addition needs both the match and the part of the entry, deletion
> ignores the action part, while the default set ignores the match part,
> hence the need to make both the match and the action part optional.
> The logic for skipping the match or the action part was broken, hence
> the current fix.
> 
> Fixes: b32c0a2c5e4c ("pipeline: add SWX table update high level API")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Cristian Dumitrescu 
> Signed-off-by: Venkata Suresh Kumar P 
> Signed-off-by: Churchill Khangar 

Applied, thanks




Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands

2021-07-09 Thread Thomas Monjalon
03/07/2021 00:46, Cristian Dumitrescu:
> From: Churchill Khangar 
> 
> For more felxibility, the single monolithic table update command is
> split into table entry add, table entry delete, table default entry
> add, pipeline commit and pipeline abort.
> 
> Signed-off-by: Churchill Khangar 
> Signed-off-by: Cristian Dumitrescu 

Series applied, thanks.





Re: [dpdk-dev] [PATCH V2] pipeline: add support for LPM lookup

2021-07-09 Thread Thomas Monjalon
08/07/2021 12:11, Cristian Dumitrescu:
> Add support for the Longest Prefix Match (LPM) lookup to the SWX
> pipeline.
> 
> Signed-off-by: Cristian Dumitrescu 
> Signed-off-by: Churchill Khangar 

Applied, thanks.





Re: [dpdk-dev] [PATCH] examples/pipeline: add FIB example

2021-07-09 Thread Thomas Monjalon
09/07/2021 19:07, Cristian Dumitrescu:
> Add example for FIB with VRF and ECMP support.
> 
> Signed-off-by: Cristian Dumitrescu 
> Signed-off-by: Churchill Khangar 

Applied, thanks.





Re: [dpdk-dev] [PATCH v2] app/procinfo: add device registers dump

2021-07-09 Thread Thomas Monjalon
21/06/2021 04:17, Min Hu (Connor):
> From: Chengchang Tang 
> 
> This patch add support for dump the device registers from a running
> application. It can help developers locate the problem.
> 
> Signed-off-by: Chengchang Tang 
> Signed-off-by: Min Hu (Connor) 
> ---
> v2:
> * some logs are adjusted and error string are printed after
> file operation fails.

Ping for review please.





Re: [dpdk-dev] [PATCH] examples/pipeline: add FIB example

2021-07-09 Thread Thomas Monjalon
09/07/2021 23:43, Thomas Monjalon:
> 09/07/2021 19:07, Cristian Dumitrescu:
> > Add example for FIB with VRF and ECMP support.
> > 
> > Signed-off-by: Cristian Dumitrescu 
> > Signed-off-by: Churchill Khangar 
> 
> Applied, thanks.

Sorry, no it cannot be in 21.08-rc1 because the series "selector table"
fails to compile on 32-bit target.




Re: [dpdk-dev] [PATCH V2] pipeline: add support for LPM lookup

2021-07-09 Thread Thomas Monjalon
09/07/2021 23:41, Thomas Monjalon:
> 08/07/2021 12:11, Cristian Dumitrescu:
> > Add support for the Longest Prefix Match (LPM) lookup to the SWX
> > pipeline.
> > 
> > Signed-off-by: Cristian Dumitrescu 
> > Signed-off-by: Churchill Khangar 
> 
> Applied, thanks.

Sorry, no it cannot be in 21.08-rc1 because the series "selector table"
fails to compile on 32-bit target.




Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands

2021-07-09 Thread Thomas Monjalon
09/07/2021 23:37, Thomas Monjalon:
> 03/07/2021 00:46, Cristian Dumitrescu:
> > From: Churchill Khangar 
> > 
> > For more felxibility, the single monolithic table update command is
> > split into table entry add, table entry delete, table default entry
> > add, pipeline commit and pipeline abort.
> > 
> > Signed-off-by: Churchill Khangar 
> > Signed-off-by: Cristian Dumitrescu 
> 
> Series applied, thanks.

Sorry, only first 2 patches are really kept,
because the patch 3 fails 32-bit compilation:

lib/pipeline/rte_swx_pipeline.c:9851:33: error:
array subscript ‘struct rte_swx_table_selector_params[0]’
is partly outside array bounds of ‘unsigned char[24]’ [-Werror=array-bounds]





Re: [dpdk-dev] [PATCH v2] net/bnxt: fix build failure

2021-07-09 Thread Ajit Khaparde
On Fri, Jul 9, 2021 at 8:37 AM Ajit Khaparde 
wrote:

> Fix build failures because of uninitialized variables.
>
> ../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function
> ‘ulp_mapper_index_tbl_process’:
> ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int
> *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info,
> resource_func))’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
>  2252 | struct bnxt_ulp_glb_resource_info glb_res;
>   |   ^~~
> ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error:
> ‘glb_res.resource_type’ may be used uninitialized in this function
> [-Werror=maybe-uninitialized]
>
> ../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’:
> ../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
>95 | uint32_t index;
>   |  ^
>
> Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation")
>
> Signed-off-by: Kishore Padmanabha 
> Signed-off-by: Ajit Khaparde 
> ---
> v1->v2: update commit log to show the build error encountered.
> ---
>
Patch applied to dpdk-next-net-brcm for-next-net branch. Thanks



>  drivers/net/bnxt/tf_core/dpool.c | 1 +
>  drivers/net/bnxt/tf_ulp/ulp_mapper.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/tf_core/dpool.c
> b/drivers/net/bnxt/tf_core/dpool.c
> index 0dae42b1bb..145efa486f 100644
> --- a/drivers/net/bnxt/tf_core/dpool.c
> +++ b/drivers/net/bnxt/tf_core/dpool.c
> @@ -125,6 +125,7 @@ int dpool_defrag(struct dpool *dpool,
> largest_free_size = 0;
> largest_free_index = 0;
> count = 0;
> +   index = 0;
>
> for (i = 0; i < dpool->size; i++) {
> if (DP_IS_FREE(dpool->entry[i].flags)) {
> diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
> b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
> index acd9f996e8..871dbad0fe 100644
> --- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c
> +++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c
> @@ -2249,7 +2249,7 @@ ulp_mapper_index_tbl_process(struct
> bnxt_ulp_mapper_parms *parms,
> struct tf_free_tbl_entry_parms free_parms = { 0 };
> uint32_t tbl_scope_id;
> struct tf *tfp;
> -   struct bnxt_ulp_glb_resource_info glb_res;
> +   struct bnxt_ulp_glb_resource_info glb_res = { 0 };
> uint16_t bit_size;
> bool alloc = false;
> bool write = false;
> --
> 2.21.1 (Apple Git-122.3)
>
>


[dpdk-dev] [PATCH V4 2/5] table: add support for selector tables

2021-07-09 Thread Cristian Dumitrescu
A selector table is made up of groups of weighted members, with a
given member potentially part of several groups. The select operation
returns a member ID by first selecting a group based on an input group
ID and then selecting a member within that group based on hashing one
or several input header/meta-data fields. It is very useful for
implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part
of the action selector described by the P4 Portable Switch
Architecture (PSA) specification.

Signed-off-by: Cristian Dumitrescu 
---
 lib/table/meson.build  |   2 +
 lib/table/rte_swx_table_selector.c | 581 +
 lib/table/rte_swx_table_selector.h | 203 ++
 lib/table/version.map  |   8 +
 4 files changed, 794 insertions(+)
 create mode 100644 lib/table/rte_swx_table_selector.c
 create mode 100644 lib/table/rte_swx_table_selector.h

diff --git a/lib/table/meson.build b/lib/table/meson.build
index b7b70b805..16e55f086 100644
--- a/lib/table/meson.build
+++ b/lib/table/meson.build
@@ -3,6 +3,7 @@
 
 sources = files(
 'rte_swx_table_em.c',
+   'rte_swx_table_selector.c',
 'rte_swx_table_wm.c',
 'rte_table_acl.c',
 'rte_table_array.c',
@@ -20,6 +21,7 @@ headers = files(
 'rte_lru.h',
 'rte_swx_table.h',
 'rte_swx_table_em.h',
+   'rte_swx_table_selector.h',
 'rte_swx_table_wm.h',
 'rte_table.h',
 'rte_table_acl.h',
diff --git a/lib/table/rte_swx_table_selector.c 
b/lib/table/rte_swx_table_selector.c
new file mode 100644
index 0..541ebc221
--- /dev/null
+++ b/lib/table/rte_swx_table_selector.c
@@ -0,0 +1,581 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "rte_swx_table_selector.h"
+
+#ifndef RTE_SWX_TABLE_SELECTOR_HUGE_PAGES_DISABLE
+
+#include 
+
+static void *
+env_calloc(size_t size, size_t alignment, int numa_node)
+{
+   return rte_zmalloc_socket(NULL, size, alignment, numa_node);
+}
+
+static void
+env_free(void *start, size_t size __rte_unused)
+{
+   rte_free(start);
+}
+
+#else
+
+#include 
+
+static void *
+env_calloc(size_t size, size_t alignment __rte_unused, int numa_node)
+{
+   void *start;
+
+   if (numa_available() == -1)
+   return NULL;
+
+   start = numa_alloc_onnode(size, numa_node);
+   if (!start)
+   return NULL;
+
+   memset(start, 0, size);
+   return start;
+}
+
+static void
+env_free(void *start, size_t size)
+{
+   if ((numa_available() == -1) || !start)
+   return;
+
+   numa_free(start, size);
+}
+
+#endif
+
+#if defined(RTE_ARCH_X86_64)
+
+#include 
+
+#define crc32_u64(crc, v) _mm_crc32_u64(crc, v)
+
+#else
+
+static inline uint64_t
+crc32_u64_generic(uint64_t crc, uint64_t value)
+{
+   int i;
+
+   crc = (crc & 0xLLU) ^ value;
+   for (i = 63; i >= 0; i--) {
+   uint64_t mask;
+
+   mask = -(crc & 1LLU);
+   crc = (crc >> 1LLU) ^ (0x82F63B78LLU & mask);
+   }
+
+   return crc;
+}
+
+#define crc32_u64(crc, v) crc32_u64_generic(crc, v)
+
+#endif
+
+/* Key size needs to be one of: 8, 16, 32 or 64. */
+static inline uint32_t
+hash(void *key, void *key_mask, uint32_t key_size, uint32_t seed)
+{
+   uint64_t *k = key;
+   uint64_t *m = key_mask;
+   uint64_t k0, k2, k5, crc0, crc1, crc2, crc3, crc4, crc5;
+
+   switch (key_size) {
+   case 8:
+   crc0 = crc32_u64(seed, k[0] & m[0]);
+   return crc0;
+
+   case 16:
+   k0 = k[0] & m[0];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc0 ^= crc1;
+
+   return crc0;
+
+   case 32:
+   k0 = k[0] & m[0];
+   k2 = k[2] & m[2];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc2 = crc32_u64(k2, k[3] & m[3]);
+   crc3 = k2 >> 32;
+
+   crc0 = crc32_u64(crc0, crc1);
+   crc1 = crc32_u64(crc2, crc3);
+
+   crc0 ^= crc1;
+
+   return crc0;
+
+   case 64:
+   k0 = k[0] & m[0];
+   k2 = k[2] & m[2];
+   k5 = k[5] & m[5];
+
+   crc0 = crc32_u64(k0, seed);
+   crc1 = crc32_u64(k0 >> 32, k[1] & m[1]);
+
+   crc2 = crc32_u64(k2, k[3] & m[3]);
+   crc3 = crc32_u64(k2 >> 32, k[4] & m[4]);
+
+   crc4 = crc32_u64(k5, k[6] & m[6]);
+   crc5 = crc32_u64(k5 >> 32, k[7] & m[7]);
+
+   crc0 = crc32_u64(crc0, (crc1 << 32) ^ crc2);
+   crc1 = crc32_u64(crc3, (crc4 << 32) ^ crc5);
+
+   crc0 ^= crc1;
+
+   return crc0;
+
+   default:
+   crc0 = 0;
+   return crc0;

[dpdk-dev] [PATCH V4 1/5] examples/pipeline: improve table update CLI commands

2021-07-09 Thread Cristian Dumitrescu
From: Churchill Khangar 

For more felxibility, the single monolithic table update command is
split into table entry add, table entry delete, table default entry
add, pipeline commit and pipeline abort.

Signed-off-by: Churchill Khangar 
Signed-off-by: Cristian Dumitrescu 
---
 examples/pipeline/cli.c   | 589 --
 examples/pipeline/examples/vxlan.cli  |   3 +-
 examples/pipeline/examples/vxlan_pcap.cli |   3 +-
 3 files changed, 428 insertions(+), 167 deletions(-)

diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c
index 215dd8e85..30754e319 100644
--- a/examples/pipeline/cli.c
+++ b/examples/pipeline/cli.c
@@ -1038,25 +1038,76 @@ table_entry_free(struct rte_swx_table_entry *entry)
free(entry);
 }
 
-static const char cmd_pipeline_table_update_help[] =
-"pipeline  table  update  "
-" ";
+#ifndef MAX_LINE_SIZE
+#define MAX_LINE_SIZE 2048
+#endif
+
+static int
+pipeline_table_entries_add(struct rte_swx_ctl_pipeline *p,
+  const char *table_name,
+  FILE *file,
+  uint32_t *file_line_number)
+{
+   char *line = NULL;
+   uint32_t line_id = 0;
+   int status = 0;
+
+   /* Buffer allocation. */
+   line = malloc(MAX_LINE_SIZE);
+   if (!line)
+   return -ENOMEM;
+
+   /* File read. */
+   for (line_id = 1; ; line_id++) {
+   struct rte_swx_table_entry *entry;
+   int is_blank_or_comment;
+
+   if (fgets(line, MAX_LINE_SIZE, file) == NULL)
+   break;
+
+   entry = rte_swx_ctl_pipeline_table_entry_read(p,
+ table_name,
+ line,
+ 
&is_blank_or_comment);
+   if (!entry) {
+   if (is_blank_or_comment)
+   continue;
+
+   status = -EINVAL;
+   goto error;
+   }
+
+   status = rte_swx_ctl_pipeline_table_entry_add(p,
+ table_name,
+ entry);
+   table_entry_free(entry);
+   if (status)
+   goto error;
+   }
+
+error:
+   free(line);
+   *file_line_number = line_id;
+   return status;
+}
+
+static const char cmd_pipeline_table_add_help[] =
+"pipeline  table  add \n";
 
 static void
-cmd_pipeline_table_update(char **tokens,
-   uint32_t n_tokens,
-   char *out,
-   size_t out_size,
-   void *obj)
+cmd_pipeline_table_add(char **tokens,
+  uint32_t n_tokens,
+  char *out,
+  size_t out_size,
+  void *obj)
 {
struct pipeline *p;
-   char *pipeline_name, *table_name, *line = NULL;
-   char *file_name_add, *file_name_delete, *file_name_default;
-   FILE *file_add = NULL, *file_delete = NULL, *file_default = NULL;
-   uint32_t line_id;
+   char *pipeline_name, *table_name, *file_name;
+   FILE *file = NULL;
+   uint32_t file_line_number = 0;
int status;
 
-   if (n_tokens != 8) {
+   if (n_tokens != 6) {
snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
return;
}
@@ -1068,192 +1119,313 @@ cmd_pipeline_table_update(char **tokens,
return;
}
 
-   if (strcmp(tokens[2], "table") != 0) {
-   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "table");
+   table_name = tokens[3];
+
+   file_name = tokens[5];
+   file = fopen(file_name, "r");
+   if (!file) {
+   snprintf(out, out_size, "Cannot open file %s.\n", file_name);
+   return;
+   }
+
+   status = pipeline_table_entries_add(p->ctl,
+   table_name,
+   file,
+   &file_line_number);
+   if (status)
+   snprintf(out, out_size, "Invalid entry in file %s at line %u\n",
+file_name,
+file_line_number);
+
+   fclose(file);
+}
+
+static int
+pipeline_table_entries_delete(struct rte_swx_ctl_pipeline *p,
+ const char *table_name,
+ FILE *file,
+ uint32_t *file_line_number)
+{
+   char *line = NULL;
+   uint32_t line_id = 0;
+   int status = 0;
+
+   /* Buffer allocation. */
+   line = malloc(MAX_LINE_SIZE);
+   if (!line)
+   return -ENOMEM;
+
+   /* File read. */
+   for (line_id = 1; ; line_id++) {
+   struct rte_swx_table_entry *entry;
+   int is_blank_or_comment;
+
+   

[dpdk-dev] [PATCH V4 3/5] pipeline: add support for selector tables

2021-07-09 Thread Cristian Dumitrescu
Add pipeline-level support for selector tables,

Signed-off-by: Cristian Dumitrescu 
---
Change log:

V3 -> V4: Fixed proper name of allocated structure
(rte_swx_table_selector_params instead of rte_swx_pipeline_selector_params).

 lib/pipeline/rte_swx_ctl.c   | 700 -
 lib/pipeline/rte_swx_ctl.h   | 253 +
 lib/pipeline/rte_swx_pipeline.c  | 748 ---
 lib/pipeline/rte_swx_pipeline.h  |  51 ++
 lib/pipeline/rte_swx_pipeline_spec.c | 354 -
 lib/pipeline/version.map |  13 +
 6 files changed, 2034 insertions(+), 85 deletions(-)

diff --git a/lib/pipeline/rte_swx_ctl.c b/lib/pipeline/rte_swx_ctl.c
index 5d04e750f..8cabce2b9 100644
--- a/lib/pipeline/rte_swx_ctl.c
+++ b/lib/pipeline/rte_swx_ctl.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include 
+
 #include "rte_swx_ctl.h"
 
 #define CHECK(condition, err_code) 
\
@@ -89,11 +91,44 @@ struct table {
uint32_t n_delete;
 };
 
+struct selector {
+   /* Selector table info. */
+   struct rte_swx_ctl_selector_info info;
+
+   /* group_id field. */
+   struct rte_swx_ctl_table_match_field_info group_id_field;
+
+   /* selector fields. */
+   struct rte_swx_ctl_table_match_field_info *selector_fields;
+
+   /* member_id field. */
+   struct rte_swx_ctl_table_match_field_info member_id_field;
+
+   /* Current selector table. Array of info.n_groups_max elements.*/
+   struct rte_swx_table_selector_group **groups;
+
+   /* Pending selector table subject to the next commit. Array of 
info.n_groups_max elements.
+*/
+   struct rte_swx_table_selector_group **pending_groups;
+
+   /* Valid flag per group. Array of n_groups_max elements. */
+   int *groups_added;
+
+   /* Pending delete flag per group. Group deletion is subject to the next 
commit. Array of
+* info.n_groups_max elements.
+*/
+   int *groups_pending_delete;
+
+   /* Params. */
+   struct rte_swx_table_selector_params params;
+};
+
 struct rte_swx_ctl_pipeline {
struct rte_swx_ctl_pipeline_info info;
struct rte_swx_pipeline *p;
struct action *actions;
struct table *tables;
+   struct selector *selectors;
struct rte_swx_table_state *ts;
struct rte_swx_table_state *ts_next;
int numa_node;
@@ -709,6 +744,209 @@ table_free(struct rte_swx_ctl_pipeline *ctl)
ctl->tables = NULL;
 }
 
+static void
+selector_group_members_free(struct selector *s, uint32_t group_id)
+{
+   struct rte_swx_table_selector_group *group = s->groups[group_id];
+
+   if (!group)
+   return;
+
+   for ( ; ; ) {
+   struct rte_swx_table_selector_member *m;
+
+   m = TAILQ_FIRST(&group->members);
+   if (!m)
+   break;
+
+   TAILQ_REMOVE(&group->members, m, node);
+   free(m);
+   }
+
+   free(group);
+   s->groups[group_id] = NULL;
+}
+
+static void
+selector_pending_group_members_free(struct selector *s, uint32_t group_id)
+{
+   struct rte_swx_table_selector_group *group = 
s->pending_groups[group_id];
+
+   if (!group)
+   return;
+
+   for ( ; ; ) {
+   struct rte_swx_table_selector_member *m;
+
+   m = TAILQ_FIRST(&group->members);
+   if (!m)
+   break;
+
+   TAILQ_REMOVE(&group->members, m, node);
+   free(m);
+   }
+
+   free(group);
+   s->pending_groups[group_id] = NULL;
+}
+
+static int
+selector_group_duplicate_to_pending(struct selector *s, uint32_t group_id)
+{
+   struct rte_swx_table_selector_group *g, *gp;
+   struct rte_swx_table_selector_member *m;
+
+   selector_pending_group_members_free(s, group_id);
+
+   g = s->groups[group_id];
+   gp = s->pending_groups[group_id];
+
+   if (!gp) {
+   gp = calloc(1, sizeof(struct rte_swx_table_selector_group));
+   if (!gp)
+   goto error;
+
+   TAILQ_INIT(&gp->members);
+
+   s->pending_groups[group_id] = gp;
+   }
+
+   if (!g)
+   return 0;
+
+   TAILQ_FOREACH(m, &g->members, node) {
+   struct rte_swx_table_selector_member *mp;
+
+   mp = calloc(1, sizeof(struct rte_swx_table_selector_member));
+   if (!mp)
+   goto error;
+
+   memcpy(mp, m, sizeof(struct rte_swx_table_selector_member));
+
+   TAILQ_INSERT_TAIL(&gp->members, mp, node);
+   }
+
+   return 0;
+
+error:
+   selector_pending_group_members_free(s, group_id);
+   return -ENOMEM;
+}
+
+static void
+selector_free(struct rte_swx_ctl_pipeline *ctl)
+{
+   uint32_t i;
+
+   if (ctl->selectors)
+   return;
+
+   for (i = 0; i < ctl->info.n_selectors; i++) {

[dpdk-dev] [PATCH V4 4/5] examples/pipeline: add support for selector tables

2021-07-09 Thread Cristian Dumitrescu
Add application-evel support for selector tables.

Signed-off-by: Churchill Khangar 
Signed-off-by: Cristian Dumitrescu 
---
 examples/pipeline/cli.c | 563 
 1 file changed, 563 insertions(+)

diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c
index 30754e319..f67783c8f 100644
--- a/examples/pipeline/cli.c
+++ b/examples/pipeline/cli.c
@@ -1368,6 +1368,467 @@ cmd_pipeline_table_show(char **tokens,
snprintf(out, out_size, MSG_ARG_INVALID, "table_name");
 }
 
+static const char cmd_pipeline_selector_group_add_help[] =
+"pipeline  selector  group add\n";
+
+static void
+cmd_pipeline_selector_group_add(char **tokens,
+   uint32_t n_tokens,
+   char *out,
+   size_t out_size,
+   void *obj)
+{
+   struct pipeline *p;
+   char *pipeline_name, *selector_name;
+   uint32_t group_id;
+   int status;
+
+   if (n_tokens != 6) {
+   snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
+   return;
+   }
+
+   pipeline_name = tokens[1];
+   p = pipeline_find(obj, pipeline_name);
+   if (!p || !p->ctl) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "pipeline_name");
+   return;
+   }
+
+   if (strcmp(tokens[2], "selector") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "selector");
+   return;
+   }
+
+   selector_name = tokens[3];
+
+   if (strcmp(tokens[4], "group") ||
+   strcmp(tokens[5], "add")) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "group add");
+   return;
+   }
+
+   status = rte_swx_ctl_pipeline_selector_group_add(p->ctl,
+   selector_name,
+   &group_id);
+   if (status)
+   snprintf(out, out_size, MSG_CMD_FAIL, tokens[0]);
+   else
+   snprintf(out, out_size, "Group ID: %u\n", group_id);
+}
+
+static const char cmd_pipeline_selector_group_delete_help[] =
+"pipeline  selector  group delete \n";
+
+static void
+cmd_pipeline_selector_group_delete(char **tokens,
+   uint32_t n_tokens,
+   char *out,
+   size_t out_size,
+   void *obj)
+{
+   struct pipeline *p;
+   char *pipeline_name, *selector_name;
+   uint32_t group_id;
+   int status;
+
+   if (n_tokens != 7) {
+   snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
+   return;
+   }
+
+   pipeline_name = tokens[1];
+   p = pipeline_find(obj, pipeline_name);
+   if (!p || !p->ctl) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "pipeline_name");
+   return;
+   }
+
+   if (strcmp(tokens[2], "selector") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "selector");
+   return;
+   }
+
+   selector_name = tokens[3];
+
+   if (strcmp(tokens[4], "group") ||
+   strcmp(tokens[5], "delete")) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "group delete");
+   return;
+   }
+
+   if (parser_read_uint32(&group_id, tokens[6]) != 0) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "group_id");
+   return;
+   }
+
+   status = rte_swx_ctl_pipeline_selector_group_delete(p->ctl,
+   selector_name,
+   group_id);
+   if (status)
+   snprintf(out, out_size, MSG_CMD_FAIL, tokens[0]);
+}
+
+#define GROUP_MEMBER_INFO_TOKENS_MAX 6
+
+static int
+token_is_comment(const char *token)
+{
+   if ((token[0] == '#') ||
+   (token[0] == ';') ||
+   ((token[0] == '/') && (token[1] == '/')))
+   return 1; /* TRUE. */
+
+   return 0; /* FALSE. */
+}
+
+static int
+pipeline_selector_group_member_read(const char *string,
+ uint32_t *group_id,
+ uint32_t *member_id,
+ uint32_t *weight,
+ int *is_blank_or_comment)
+{
+   char *token_array[GROUP_MEMBER_INFO_TOKENS_MAX], **tokens;
+   char *s0 = NULL, *s;
+   uint32_t n_tokens = 0, group_id_val, member_id_val, weight_val;
+   int blank_or_comment = 0;
+
+   /* Check input arguments. */
+   if (!string || !string[0])
+   goto error;
+
+   /* Memory allocation. */
+   s0 = strdup(string);
+   if (!s0)
+   goto error;
+
+   /* Parse the string into tokens. */
+   for (s = s0; ; ) {
+   char *token;
+
+   token = strtok_r(s, " \f\n\r\t\v", &s);
+   if (!token || token_is_comment(token))
+   break;
+
+   if (n_tokens > GROUP_MEMBER_INFO_TOKENS_MAX)
+   goto error;
+
+   token_array[n_tokens] = token;
+   n_tokens++;
+   }
+
+   if (!n_tokens) {
+   blank_or_comment = 1;
+   goto 

[dpdk-dev] [PATCH V4 5/5] examples/pipeline: add selector example

2021-07-09 Thread Cristian Dumitrescu
Added the files to illustrate the selector table usage.

Signed-off-by: Cristian Dumitrescu 
---
 examples/pipeline/examples/selector.cli  | 31 
 examples/pipeline/examples/selector.spec | 95 
 examples/pipeline/examples/selector.txt  |  4 +
 3 files changed, 130 insertions(+)
 create mode 100644 examples/pipeline/examples/selector.cli
 create mode 100644 examples/pipeline/examples/selector.spec
 create mode 100644 examples/pipeline/examples/selector.txt

diff --git a/examples/pipeline/examples/selector.cli 
b/examples/pipeline/examples/selector.cli
new file mode 100644
index 0..36f3ead54
--- /dev/null
+++ b/examples/pipeline/examples/selector.cli
@@ -0,0 +1,31 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2020 Intel Corporation
+
+mempool MEMPOOL0 buffer 2304 pool 32K cache 256 cpu 0
+
+link LINK0 dev :18:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK1 dev :18:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK2 dev :3b:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+link LINK3 dev :3b:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on
+
+pipeline PIPELINE0 create 0
+
+pipeline PIPELINE0 port in 0 link LINK0 rxq 0 bsz 32
+pipeline PIPELINE0 port in 1 link LINK1 rxq 0 bsz 32
+pipeline PIPELINE0 port in 2 link LINK2 rxq 0 bsz 32
+pipeline PIPELINE0 port in 3 link LINK3 rxq 0 bsz 32
+
+pipeline PIPELINE0 port out 0 link LINK0 txq 0 bsz 32
+pipeline PIPELINE0 port out 1 link LINK1 txq 0 bsz 32
+pipeline PIPELINE0 port out 2 link LINK2 txq 0 bsz 32
+pipeline PIPELINE0 port out 3 link LINK3 txq 0 bsz 32
+pipeline PIPELINE0 port out 4 sink none
+
+pipeline PIPELINE0 build ./examples/pipeline/examples/selector.spec
+
+pipeline PIPELINE0 selector s group add
+pipeline PIPELINE0 selector s group member add 
./examples/pipeline/examples/selector.txt
+pipeline PIPELINE0 commit
+pipeline PIPELINE0 selector s show
+
+thread 1 pipeline PIPELINE0 enable
diff --git a/examples/pipeline/examples/selector.spec 
b/examples/pipeline/examples/selector.spec
new file mode 100644
index 0..72e49cb1d
--- /dev/null
+++ b/examples/pipeline/examples/selector.spec
@@ -0,0 +1,95 @@
+; SPDX-License-Identifier: BSD-3-Clause
+; Copyright(c) 2020 Intel Corporation
+
+; A selector table is made out of groups of weighted members, with a given 
member potentially part
+; of several groups. The select operation returns a member ID by first 
selecting a group based on an
+; input group ID and then selecting a member within that group by hashing one 
or several input
+; header or meta-data fields. It is very useful for implementing an Equal-Cost 
Multi-Path (ECMP) or
+; Weighted-Cost Multi-Path (WCMP) enabled FIB or a load balancer. It is part 
of the action selector
+; construct described by the P4 Portable Switch Architecture (PSA) 
specification.
+;
+; Normally, an action selector FIB is built with a routing table (the base 
table), a selector table
+; (the group table) and a next hop table (the member table). One of the 
routing table actions sets
+; up the group ID meta-data field used as the index into the group table, 
which produces the member
+; ID meta-data field, i.e. the next hop ID that is used as the index into the 
next hop table. The
+; next hop action prepares the output packet for being sent next hop in the 
network by prepending
+; one or several headers to the packet (Ethernet at the very least), 
decrementing the TTL and
+; recomputing the IPv4 checksum, etc. The selector allows for multiple next 
hops to be specified
+; for any given route as opposed to a single next hop per route; for every 
packet, its next hop is
+; picked out of the set of next hops defined for the route while preserving 
the packet ordering
+; within the flow, with the flow defined by the selector n-tuple fields.
+;
+; In this simple example, the base table and the member table are striped out 
in order to focus
+; exclusively on illustrating the selector table. The group_id is read from 
the destination MAC
+; address and the selector n-tuple is represented by the Protocol, the source 
IP address and the
+; destination IP address fields. The member_id produced by the selector table 
is used to identify
+; the output port which facilitates the testing of different member weights by 
simply comparing the
+; rates of output packets sent on different ports.
+
+//
+// Headers
+//
+struct ethernet_h {
+   bit<48> dst_addr
+   bit<48> src_addr
+   bit<16> ethertype
+}
+
+struct ipv4_h {
+   bit<8> ver_ihl
+   bit<8> diffserv
+   bit<16> total_len
+   bit<16> identification
+   bit<16> flags_offset
+   bit<8> ttl
+   bit<8> protocol
+   bit<16> hdr_checksum
+   bit<32> src_addr
+   bit<32> dst_addr
+}
+
+header ethernet instanceof ethernet_h
+header ipv4 instanceof ipv4_h
+
+//
+// Meta-data
+//
+struct metadata_t {
+   bit<32> port_in
+   bit<32> port_out
+   bit<32> group_id
+}
+
+metadata instanceo

Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands

2021-07-09 Thread Dumitrescu, Cristian


> -Original Message-
> From: Thomas Monjalon 
> Sent: Friday, July 9, 2021 11:14 PM
> To: Dumitrescu, Cristian 
> Cc: dev@dpdk.org; Khangar, Churchill 
> Subject: Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table
> update CLI commands
> 
> 09/07/2021 23:37, Thomas Monjalon:
> > 03/07/2021 00:46, Cristian Dumitrescu:
> > > From: Churchill Khangar 
> > >
> > > For more felxibility, the single monolithic table update command is
> > > split into table entry add, table entry delete, table default entry
> > > add, pipeline commit and pipeline abort.
> > >
> > > Signed-off-by: Churchill Khangar 
> > > Signed-off-by: Cristian Dumitrescu 
> >
> > Series applied, thanks.
> 
> Sorry, only first 2 patches are really kept,
> because the patch 3 fails 32-bit compilation:
> 
> lib/pipeline/rte_swx_pipeline.c:9851:33: error:
> array subscript ‘struct rte_swx_table_selector_params[0]’
> is partly outside array bounds of ‘unsigned char[24]’ [-Werror=array-bounds]
> 
> 

Hi Thomas,

I just set V4 for this patch set (and the entire series); it was basically just 
a typo that created a small memory allocation bug: the code was incorrectly 
using sizeof(struct rte_swx_pipeline_selector_params) instead of sizeof(struct 
rte_swx_table_selector_params) for the calloc call, hence the warning.

I know it is very late at night right now and you are still awake working to 
get RC1 done (I am just 1 hour before you time zone wise ;)), I would 
appreciate if you could still look at applying this for RC1, only if possible.

Thank you so much!

Regards,
Cristian


[dpdk-dev] [PATCH v2 0/3] features for hns3 PMD

2021-07-09 Thread Min Hu (Connor)
This patchset contains 3 features for hns3 PMD:
add query basic info support for VF
support for VF modify VLAN filter state
support multiple TC MAC pause

Chengchang Tang (2):
  net/hns3: add query basic info support for VF
  net/hns3: support for VF modify VLAN filter state

Huisong Li (1):
  net/hns3: support multiple TC MAC pause
---
v2:
* Abandon the last patch "supports disabling PFC by dev configure API".

 drivers/net/hns3/hns3_cmd.h   |  9 
 drivers/net/hns3/hns3_ethdev.c|  5 +-
 drivers/net/hns3/hns3_ethdev.h|  6 +++
 drivers/net/hns3/hns3_ethdev_vf.c | 96 ++-
 drivers/net/hns3/hns3_mbx.h   | 11 -
 5 files changed, 104 insertions(+), 23 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v2 3/3] net/hns3: support multiple TC MAC pause

2021-07-09 Thread Min Hu (Connor)
From: Huisong Li 

MAC PAUSE can take effect on a single TC or multiple TCs, depending on the
hardware. For example, the Kunpeng 920 supports MAC pause in a single TC,
and the Kunpeng 930 supports MAC pause in multiple TCs. This patch
supports MAC PAUSE in multiple TC for some hardware.

Signed-off-by: Huisong Li 
Signed-off-by: Min Hu (Connor) 
---
 drivers/net/hns3/hns3_ethdev.c | 5 -
 drivers/net/hns3/hns3_ethdev.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index e515125..17b995a 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -3317,6 +3317,7 @@ hns3_get_capability(struct hns3_hw *hw)
pf->tqp_config_mode = HNS3_FIXED_MAX_TQP_NUM_MODE;
hw->rss_info.ipv6_sctp_offload_supported = false;
hw->udp_cksum_mode = HNS3_SPECIAL_PORT_SW_CKSUM_MODE;
+   pf->support_multi_tc_pause = false;
return 0;
}
 
@@ -3337,6 +3338,7 @@ hns3_get_capability(struct hns3_hw *hw)
pf->tqp_config_mode = HNS3_FLEX_MAX_TQP_NUM_MODE;
hw->rss_info.ipv6_sctp_offload_supported = true;
hw->udp_cksum_mode = HNS3_SPECIAL_PORT_HW_CKSUM_MODE;
+   pf->support_multi_tc_pause = true;
 
return 0;
 }
@@ -6103,6 +6105,7 @@ static int
 hns3_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
 {
struct hns3_hw *hw = HNS3_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct hns3_pf *pf = HNS3_DEV_PRIVATE_TO_PF(dev->data->dev_private);
int ret;
 
if (fc_conf->high_water || fc_conf->low_water ||
@@ -6132,7 +6135,7 @@ hns3_flow_ctrl_set(struct rte_eth_dev *dev, struct 
rte_eth_fc_conf *fc_conf)
return -EOPNOTSUPP;
}
 
-   if (hw->num_tc > 1) {
+   if (hw->num_tc > 1 && !pf->support_multi_tc_pause) {
hns3_err(hw, "in multi-TC scenarios, MAC pause is not 
supported.");
return -EOPNOTSUPP;
}
diff --git a/drivers/net/hns3/hns3_ethdev.h b/drivers/net/hns3/hns3_ethdev.h
index 729e1c3..3485614 100644
--- a/drivers/net/hns3/hns3_ethdev.h
+++ b/drivers/net/hns3/hns3_ethdev.h
@@ -783,6 +783,7 @@ struct hns3_pf {
uint8_t prio_tc[HNS3_MAX_USER_PRIO]; /* TC indexed by prio */
uint16_t pause_time;
bool support_fc_autoneg;   /* support FC autonegotiate */
+   bool support_multi_tc_pause;
 
uint16_t wanted_umv_size;
uint16_t max_umv_size;
-- 
2.7.4



  1   2   >