Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix using heap pointer after free
> -Original Message- > From: dev On Behalf Of Wang, Haiyue > Sent: Friday, July 9, 2021 12:35 PM > To: Yu, DapengX > Cc: dev@dpdk.org; sta...@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] net/ixgbe: fix using heap pointer after > free > > > -Original Message- > > From: Yu, DapengX > > Sent: Friday, July 9, 2021 11:15 > > To: Wang, Haiyue > > Cc: dev@dpdk.org; Yu, DapengX ; sta...@dpdk.org > > Subject: [PATCH v2] net/ixgbe: fix using heap pointer after free > > > > From: Dapeng Yu > > > > The original code use a heap pointer after it is freed. > > This patch fix it. > > > > Fixes: a14de8b498d1 ("net/ixgbe: destroy consistent filter") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Dapeng Yu > > --- > > V2: > > * Simplify the patch according to maintainer's comment: > > only one "pmd_flow" in the list, so just "break;" is fine. > > --- > > drivers/net/ixgbe/ixgbe_flow.c | 1 + > > 1 file changed, 1 insertion(+) > > > > Good catch, thanks! > > Reviewed-by: Haiyue Wang Applied to dpdk-next-net-intel. Thanks Qi > > > > -- > > 2.27.0
[dpdk-dev] [PATCH] app/testpmd: fix testpmd doesn't show RSS hash offload
This patch reapply Rx/Tx offloads configuration for all ports after the program configuring the device port. When the program configures the ports, the default Rx/Tx offloads are modified. So it is need to reapply Rx/Tx offloads configuration before testpmd showing offloads. Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") Cc: sta...@dpdk.org Signed-off-by: Jie Wang --- app/test-pmd/testpmd.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 1cdd3cdd12..7089ae216d 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -2475,6 +2475,9 @@ start_port(portid_t pid) } if (port->need_reconfig > 0) { + const struct rte_eth_dev *dev = &rte_eth_devices[pi]; + int k; + port->need_reconfig = 0; if (flow_isolate_all) { @@ -2508,6 +2511,18 @@ start_port(portid_t pid) port->need_reconfig = 1; return -1; } + + /* Apply TxRx configuration for all ports */ + port->dev_conf.txmode = dev->data->dev_conf.txmode; + port->dev_conf.rxmode = dev->data->dev_conf.rxmode; + /* Apply Rx offloads configuration */ + for (k = 0; k < port->dev_info.max_rx_queues; k++) + port->rx_conf[k].offloads = + port->dev_conf.rxmode.offloads; + /* Apply Tx offloads configuration */ + for (k = 0; k < port->dev_info.max_tx_queues; k++) + port->tx_conf[k].offloads = + port->dev_conf.txmode.offloads; } if (port->need_reconfig_queues > 0) { port->need_reconfig_queues = 0; -- 2.25.1
[dpdk-dev] [PATCH] crypto/cnxk: fix debug build failure
Removing usage of unavailable macro. Reported-by: Ali Alnubani Suggested-by: David Marchand Signed-off-by: Tejasree Kondoj --- drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c index 6d322a9cb6..2e1a73939c 100644 --- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c +++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c @@ -275,7 +275,7 @@ cn10k_cpt_sec_post_process(struct rte_crypto_op *cop, m_len = rte_be_to_cpu_16(ip->total_length); } else { PLT_ASSERT(((ip->version_ihl & 0xf0) >> - RTE_IPV4_IHL_MULTIPLIER) == IPV6_VERSION); + RTE_IPV4_IHL_MULTIPLIER) == 6); ip6 = (struct rte_ipv6_hdr *)ip; m_len = rte_be_to_cpu_16(ip6->payload_len) + sizeof(struct rte_ipv6_hdr); -- 2.27.0
Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure
09/07/2021 00:49, Ajit Khaparde: > Fix build failures because of uninitialized variable usage. You should add the error log here. You don't mention the condition of failure. Nobody reproduced a failure so far. > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") > > Signed-off-by: Kishore Padmanabha > Signed-off-by: Ajit Khaparde
Re: [dpdk-dev] [PATCH v1 4/4] doc: update iavf driver FDIR/RSS for GTPoGRE
07/07/2021 14:57, Lingyu Liu: > Update 21.08 release note for GTPoGRE FDIR and RSS. > > Signed-off-by: Lingyu Liu > --- > doc/guides/rel_notes/release_21_08.rst | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/doc/guides/rel_notes/release_21_08.rst > b/doc/guides/rel_notes/release_21_08.rst > index cd02820e68..df719420b8 100644 > --- a/doc/guides/rel_notes/release_21_08.rst > +++ b/doc/guides/rel_notes/release_21_08.rst > @@ -64,6 +64,8 @@ New Features > * **Updated Intel iavf driver.** > >* Added Tx QoS VF queue TC mapping. > + * Enable FDIR and RSS for GTPoGRE, support filter based on GTPU TEID/QFI, > +outer most L3 or inner most l3/l4. Should be in past tense. Please update the release notes piece by piece in each patch.
Re: [dpdk-dev] [EXT] Re: [pull-request] next-crypto 21.08 rc1
Hi David, Pushed patches for both issues. http://patches.dpdk.org/project/dpdk/patch/1625812123-802-1-git-send-email-ano...@marvell.com/ http://patches.dpdk.org/project/dpdk/patch/20210709094147.12710-1-ktejas...@marvell.com/ Thanks, Anoob > -Original Message- > From: David Marchand > Sent: Friday, July 9, 2021 12:13 PM > To: Anoob Joseph > Cc: Ali Alnubani ; Akhil Goyal > ; NBU-Contact-Thomas Monjalon > ; dev@dpdk.org; Tejasree Kondoj > > Subject: Re: [EXT] Re: [dpdk-dev] [pull-request] next-crypto 21.08 rc1 > > On Fri, Jul 9, 2021 at 6:25 AM Anoob Joseph wrote: > > > > Hi Ali, David, > > > > Thanks for pointing it out. The suggestion from David should do. Should we > submit a patch with fix on master? How do you suggest we proceed? > > Yes, please send a patch. > > I'll take those three fixes directly in main. > Probably a bit later today. > > > -- > David Marchand
Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq
On Fri, Jul 9, 2021 at 5:34 AM Richael Zhuang wrote: > > Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are > supported, which are both not available on arm64 platforms. Add > support for cppc_cpufreq driver which works on most arm64 platforms. GHA reported build issues with clang. https://github.com/ovsrobot/dpdk/runs/3025478784?check_suite_focus=true#step:15:787 Could you look at it? Thanks. -- David Marchand
Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote: > On Thu, Jul 8, 2021 at 8:41 AM fengchengwen wrote: > > > > > >>> > > >>> It's just more conditionals and branches all through the code. Inside > > >>> the > > >>> user application, the user has to check whether to set the flag or not > > >>> (or > > >>> special-case the last transaction outside the loop), and within the > > >>> driver, > > >>> there has to be a branch whether or not to call the doorbell function. > > >>> The > > >>> code on both sides is far simpler and more readable if the doorbell > > >>> function is exactly that - a separate function. > > >> > > >> I disagree. The reason is: > > >> > > >> We will have two classes of applications > > >> > > >> a) do dma copy request as and when it has data(I think, this is the > > >> prime use case), for those, > > >> I think, it is considerable overhead to have two function invocation > > >> per transfer i.e > > >> rte_dma_copy() and rte_dma_perform() > > >> > > >> b) do dma copy when the data is reached to a logical state, like copy > > >> IP frame from Ethernet packets or so, > > >> In that case, the application will have a LOGIC to detect when to > > >> perform it so on the end of > > >> that rte_dma_copy() flag can be updated to fire the doorbell. > > >> > > >> IMO, We are comparing against a branch(flag is already in register) vs > > >> a set of instructions for > > >> 1) function pointer overhead > > >> 2) Need to use the channel context again back in another function. > > >> > > >> IMO, a single branch is most optimal from performance PoV. > > >> > > > Ok, let's try it and see how it goes. > > > > Test result show: > > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell in > > flags > > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function > > 3) Both platform could benefit with doorbell in flags if burst < 5 > > > > There is a performance gain in small bursts (<5). Given the extensive use > > of bursts > in DPDK applications and users are accustomed to the concept, I do > not recommend > > using the 'doorbell' in flags. > > There is NO concept change between one option vs other option. Just > argument differnet. > Also, _perform() scheme not used anywhere in DPDK. I > > Regarding performance, I have added dummy instructions to simulate the real > work > load[1], now burst also has some gain in both x86 and arm64[3] > > I have modified your application[2] to dpdk test application to use > cpu isolation etc. > So this is gain in flag scheme ad code is checked in to Github[2[ > The benchmark numbers all seem very close between the two schemes. On my team we pretty much have test ioat & idxd drivers ported internally to the last dmadev draft library, and have sample apps handling traffic using those. I'll therefore attempt to get these numbers with real traffic on real drivers to just double check that it's the same as these microbenchmarks. Assuming that perf is the same, how to resolve this? Some thoughts: * As I understand it, the main objection to the separate doorbell function is the use of 8-bytes in fastpath slot. Therefore I will also attempt to benchmark having the doorbell function not on the same cacheline and check perf impact, if any. * If we don't have a impact to perf by having the doorbell function inside the regular "ops" rather than on fastpath cacheline, there is no reason we can't implement both schemes. The user can then choose themselves whether to doorbell using a flag on last item, or to doorbell explicitly using function call. Of the two schemes, and assuming they are equal, I do have a preference for the separate function one, primarily from a code readability point of view. Other than that, I have no strong opinions. /Bruce
Re: [dpdk-dev] [PATCH 0/4] features for hns3 PMD
On 7/9/21 4:32 AM, Min Hu (Connor) wrote: > Thanks Andrew, > "net/hns3: supports disabling PFC by dev configure API" > this patch could be abandoned. > > The other three can be applied. > > > 在 2021/6/14 23:00, Andrew Rybchenko 写道: >> On 6/13/21 6:05 AM, Min Hu (Connor) wrote: >>> This patch set contains 4 features for hns3 PMD: >>> add query basic info support for VF >>> support for VF modify VLAN filter state >>> support multiple TC MAC pause >>> supports disabling PFC by dev configure API >>> >>> Chengchang Tang (2): >>> net/hns3: add query basic info support for VF >>> net/hns3: support for VF modify VLAN filter state >>> >>> Huisong Li (2): >>> net/hns3: support multiple TC MAC pause >>> net/hns3: supports disabling PFC by dev configure API >>> >>> drivers/net/hns3/hns3_cmd.h | 9 >>> drivers/net/hns3/hns3_dcb.c | 61 + >>> drivers/net/hns3/hns3_ethdev.c | 5 +- >>> drivers/net/hns3/hns3_ethdev.h | 7 +++ >>> drivers/net/hns3/hns3_ethdev_vf.c | 96 >>> ++- >>> drivers/net/hns3/hns3_mbx.h | 11 - >>> 6 files changed, 138 insertions(+), 51 deletions(-) >>> >> >> Overall LGTM, but build breakage must be fixed. >> . Please, send a new version with the patch dropped. I'd like patchwork automation to check it once again.
Re: [dpdk-dev] [PATCH] app/testpmd: fix testpmd doesn't show RSS hash offload
On 7/9/21 6:57 PM, Jie Wang wrote: > This patch reapply Rx/Tx offloads configuration for all ports > after the program configuring the device port. When the program > configures the ports, the default Rx/Tx offloads are modified. > > So it is need to reapply Rx/Tx offloads configuration before > testpmd showing offloads. > > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > Cc: sta...@dpdk.org > > Signed-off-by: Jie Wang > --- > app/test-pmd/testpmd.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > index 1cdd3cdd12..7089ae216d 100644 > --- a/app/test-pmd/testpmd.c > +++ b/app/test-pmd/testpmd.c > @@ -2475,6 +2475,9 @@ start_port(portid_t pid) > } > > if (port->need_reconfig > 0) { > + const struct rte_eth_dev *dev = &rte_eth_devices[pi]; > + int k; > + > port->need_reconfig = 0; > > if (flow_isolate_all) { > @@ -2508,6 +2511,18 @@ start_port(portid_t pid) > port->need_reconfig = 1; > return -1; > } > + > + /* Apply TxRx configuration for all ports */ > + port->dev_conf.txmode = dev->data->dev_conf.txmode; > + port->dev_conf.rxmode = dev->data->dev_conf.rxmode; > + /* Apply Rx offloads configuration */ > + for (k = 0; k < port->dev_info.max_rx_queues; k++) > + port->rx_conf[k].offloads = > + port->dev_conf.rxmode.offloads; > + /* Apply Tx offloads configuration */ > + for (k = 0; k < port->dev_info.max_tx_queues; k++) > + port->tx_conf[k].offloads = > + port->dev_conf.txmode.offloads; Does testpmd really require these copies? May be the right fix is to get rid of these copies at all and show actual information from data->dev_conf ?
Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq
On 9/7/2021 4:34 AM, Richael Zhuang wrote: Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are supported, which are both not available on arm64 platforms. Add support for cppc_cpufreq driver which works on most arm64 platforms. Signed-off-by: Richael Zhuang --- --snip-- Hi Richael, It's usuall to inherit the tags from previous versions (such as Acked-by:, Tested-by:, etc.) if the changes are relatively minor. That helps the patchwork flags keep up to date. And by "inherit" I mean you can add them in to your patch just below the Signed-off-by: tag. If it's a major re-work of a patch, then there might need to be re-review or re-ack. If you’re doing a re-spin of this patch set for the clang issue, that might be a good opportunity to include the tags from previous versions. Rgds, Dave.
Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq
> -Original Message- > From: David Marchand > Sent: Friday, July 9, 2021 5:10 PM > To: Richael Zhuang > Cc: dev ; Yu Jiang ; David Hunt > > Subject: Re: [dpdk-dev] [PATCH v7 1/2] power: add support for cppc cpufreq > > On Fri, Jul 9, 2021 at 5:34 AM Richael Zhuang > wrote: > > > > Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are > > supported, which are both not available on arm64 platforms. Add > > support for cppc_cpufreq driver which works on most arm64 platforms. > > GHA reported build issues with clang. > https://github.com/ovsrobot/dpdk/runs/3025478784?check_suite_focus=tru > e#step:15:787 > > Could you look at it? > Thanks. > > -- > David Marchand OK, I will fix it soon.
[dpdk-dev] [PATCH v8 0/2] power: add support for cppc cpufreq driver
v8: fix build issue with clang Richael Zhuang (2): power: add support for cppc cpufreq test/power: round cpuinfo cur freq only in CPPC cpufreq app/test/test_power.c | 3 +- app/test/test_power_cpufreq.c | 26 +- doc/guides/rel_notes/release_21_08.rst | 4 + lib/power/meson.build | 1 + lib/power/power_cppc_cpufreq.c | 680 + lib/power/power_cppc_cpufreq.h | 230 + lib/power/rte_power.c | 26 + lib/power/rte_power.h | 2 +- 8 files changed, 960 insertions(+), 12 deletions(-) create mode 100644 lib/power/power_cppc_cpufreq.c create mode 100644 lib/power/power_cppc_cpufreq.h -- 2.20.1
[dpdk-dev] [PATCH v8 1/2] power: add support for cppc cpufreq
Currently in DPDK only acpi_cpufreq and pstate_cpufreq drivers are supported, which are both not available on arm64 platforms. Add support for cppc_cpufreq driver which works on most arm64 platforms. Signed-off-by: Richael Zhuang Acked-by: David Hunt --- app/test/test_power.c | 3 +- app/test/test_power_cpufreq.c | 3 +- doc/guides/rel_notes/release_21_08.rst | 4 + lib/power/meson.build | 1 + lib/power/power_cppc_cpufreq.c | 680 + lib/power/power_cppc_cpufreq.h | 230 + lib/power/rte_power.c | 26 + lib/power/rte_power.h | 2 +- 8 files changed, 946 insertions(+), 3 deletions(-) create mode 100644 lib/power/power_cppc_cpufreq.c create mode 100644 lib/power/power_cppc_cpufreq.h diff --git a/app/test/test_power.c b/app/test/test_power.c index da1d67c0ab..b7b5561348 100644 --- a/app/test/test_power.c +++ b/app/test/test_power.c @@ -133,7 +133,8 @@ test_power(void) /* Perform tests for valid environments.*/ const enum power_management_env envs[] = {PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM, - PM_ENV_PSTATE_CPUFREQ}; + PM_ENV_PSTATE_CPUFREQ, + PM_ENV_CPPC_CPUFREQ}; unsigned int i; for (i = 0; i < RTE_DIM(envs); ++i) { diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c index 0c3adc5f33..8516df4ca6 100644 --- a/app/test/test_power_cpufreq.c +++ b/app/test/test_power_cpufreq.c @@ -496,7 +496,8 @@ test_power_cpufreq(void) /* Test environment configuration */ env = rte_power_get_env(); - if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ)) { + if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ) && + (env != PM_ENV_CPPC_CPUFREQ)) { printf("Unexpectedly got an environment other than ACPI/PSTATE\n"); goto fail_all; } diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index c92e016783..6fd9f0168a 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -103,6 +103,10 @@ New Features usecases. Configuration happens via standard rawdev enq/deq operations. See the :doc:`../rawdevs/cnxk_bphy` rawdev guide for more details on this driver. +* **Added cppc_cpufreq support to Power Management library.** + + Added support for cppc_cpufreq driver which works on most arm64 platforms. + Removed Items - diff --git a/lib/power/meson.build b/lib/power/meson.build index c1097d32f1..36e5a65874 100644 --- a/lib/power/meson.build +++ b/lib/power/meson.build @@ -9,6 +9,7 @@ sources = files( 'guest_channel.c', 'power_acpi_cpufreq.c', 'power_common.c', +'power_cppc_cpufreq.c', 'power_kvm_vm.c', 'power_pstate_cpufreq.c', 'rte_power.c', diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c new file mode 100644 index 00..e92973ab54 --- /dev/null +++ b/lib/power/power_cppc_cpufreq.c @@ -0,0 +1,680 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2021 Intel Corporation + * Copyright(c) 2021 Arm Limited + */ + +#include +#include + +#include "power_cppc_cpufreq.h" +#include "power_common.h" + +/* macros used for rounding frequency to nearest 10 */ +#define FREQ_ROUNDING_DELTA 5 +#define ROUND_FREQ_TO_N_10 10 + +/* the unit of highest_perf and nominal_perf differs on different arm platforms. + * For highest_perf, it maybe 300 or 300, both means 3.0GHz. + */ +#define UNIT_DIFF 1 + +#define POWER_CONVERT_TO_DECIMAL 10 + +#define POWER_GOVERNOR_USERSPACE "userspace" +#define POWER_SYSFILE_SETSPEED \ + "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed" +#define POWER_SYSFILE_SCALING_MAX_FREQ \ + "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_max_freq" +#define POWER_SYSFILE_SCALING_MIN_FREQ \ + "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_min_freq" +#define POWER_SYSFILE_HIGHEST_PERF \ + "/sys/devices/system/cpu/cpu%u/acpi_cppc/highest_perf" +#define POWER_SYSFILE_NOMINAL_PERF \ + "/sys/devices/system/cpu/cpu%u/acpi_cppc/nominal_perf" +#define POWER_SYSFILE_SYS_MAX \ + "/sys/devices/system/cpu/cpu%u/cpufreq/cpuinfo_max_freq" + +#define POWER_CPPC_DRIVER "cppc-cpufreq" +#define BUS_FREQ 10 + +enum power_state { + POWER_IDLE = 0, + POWER_ONGOING, + POWER_USED, + POWER_UNKNOWN +}; + +/** + * Power info per lcore. + */ +struct cppc_power_info { + unsigned int lcore_id; /**< Logical core id */ + uint32_t state; /**< Power in use state */ + FILE *f; /**< FD of scaling_setspeed */ + char governor_o
[dpdk-dev] [PATCH v8 2/2] test/power: round cpuinfo cur freq only in CPPC cpufreq
On arm platform, the value in "/sys/.../cpuinfo_cur_freq" may not be exactly the same as what was set when using CPPC cpufreq driver. For other cpufreq driver, no need to round it currently, or else this check will fail with turbo enabled. For example, with acpi_cpufreq, cpuinfo_cur_freq can be 2401000 which is equal to freqs[0].It should not be rounded to 240. Fixes: 606a234c6d360 ("test/power: round CPU frequency to check") Cc: richael.zhu...@arm.com Cc: sta...@dpdk.org Signed-off-by: Richael Zhuang Acked-by: David Hunt --- app/test/test_power_cpufreq.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c index 8516df4ca6..b8fc53925c 100644 --- a/app/test/test_power_cpufreq.c +++ b/app/test/test_power_cpufreq.c @@ -55,7 +55,9 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool turbo) FILE *f; char fullpath[PATH_MAX]; char buf[BUFSIZ]; + enum power_management_env env; uint32_t cur_freq; + uint32_t freq_conv; int ret = -1; int i; @@ -80,15 +82,18 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool turbo) goto fail_all; cur_freq = strtoul(buf, NULL, TEST_POWER_CONVERT_TO_DECIMAL); - - /* convert the frequency to nearest 10 value -* Ex: if cur_freq=1396789 then freq_conv=140 -* Ex: if cur_freq=800030 then freq_conv=80 -*/ - unsigned int freq_conv = 0; - freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA) - / TEST_ROUND_FREQ_TO_N_10; - freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_10; + freq_conv = cur_freq; + + env = rte_power_get_env(); + if (env == PM_ENV_CPPC_CPUFREQ) { + /* convert the frequency to nearest 10 value +* Ex: if cur_freq=1396789 then freq_conv=140 +* Ex: if cur_freq=800030 then freq_conv=80 +*/ + freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA) + / TEST_ROUND_FREQ_TO_N_10; + freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_10; + } if (turbo) ret = (freqs[idx] <= freq_conv ? 0 : -1); -- 2.20.1
Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
09/07/2021 02:16, Tyler Retzlaff: > On Thu, Jul 08, 2021 at 10:39:13PM +0200, Thomas Monjalon wrote: > > 08/07/2021 21:21, Tyler Retzlaff: > > > (2) importing exported data symbols from a dll/dso on windows requires > > > that the symbol be decorated with dllimport. optionally loading > > > performance of dll/dso is also further improved by decorating > > > exported function symbols. [3] > > > > > > for (2) we would propose the introduction and use of two macros to > > > allow decoration of exported data symbols. these macro would be or > > > similarly named __rte_import and __rte_export. of note > > > > That's the same symbol declared in a single place > > which is exported and imported. > > So I don't understand the need for 2 macros. > > i may be misinterpreting your reply. you're saying there is no need for > 2 because we use .def files? > > strictly speaking when exporting C symbols this is true. so yes, we > could introduce only __rte_import and not bother with __rte_export. > > is that what you meant? > > i don't have any objection to just __rte_import alone but it is > mandatory for data symbols. It may be my misunderstanding. The function is declared only once in the .h so I don't understand where these 2 macros are used.
Re: [dpdk-dev] [PATCH] crypto/cnxk: add PCI ID for cn9k
On Fri, Jul 9, 2021 at 8:29 AM Anoob Joseph wrote: > > Add PCI ID for crypo_cn9k PMD. > > To avoid conflicting PCI ID in crypto_octeontx2 and > crypto_cn9k PMDs, disable crypto_cn9k PMD when built with > octeontx2 config. > > The lack of PCI ID is causing debug build to fail for > crypto_cn9k PMD. > > Reported-by: Ali Alnubani > Suggested-by: David Marchand > Signed-off-by: Anoob Joseph Applied, thanks. -- David Marchand
Re: [dpdk-dev] [PATCH] crypto/cnxk: fix debug build failure
On Fri, Jul 9, 2021 at 10:47 AM Tejasree Kondoj wrote: > > Removing usage of unavailable macro. Fixes: baee42a6beff ("crypto/cnxk: add IPsec datapath") > > Reported-by: Ali Alnubani > Suggested-by: David Marchand > Signed-off-by: Tejasree Kondoj Applied, thanks. -- David Marchand
Re: [dpdk-dev] [PATCH v1] net/octeontx/base: fix debug build with clang
On Fri, Jul 9, 2021 at 8:03 AM Shijith Thotton wrote: > > From: David Marchand > > Remove conflicting declaration of this symbol. > > Fixes: d0d654986018 ("net/octeontx: support event Rx adapter") > Cc: sta...@dpdk.org > > Signed-off-by: David Marchand Applied, thanks. -- David Marchand
Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native builds
07/07/2021 15:59, Bruce Richardson: > On Tue, Jul 06, 2021 at 11:44:28AM +0200, Juraj Linkeš wrote: > > The current meson option 'machine' should only specify the ISA, which is > > not sufficient for Arm, where setting ISA implies other settings as well > > (and is used in Arm configuration as such). > > Use the existing 'platform' meson option to differentiate the type of > > the build (native/generic) and set ISA accordingly, unless the user > > chooses to override it with a new option, 'cpu_instruction_set'. > > The 'machine' option set the ISA in x86 builds and set native/default > > 'build type' in aarch64 builds. These two new variables, 'platform' and > > 'cpu_instruction_set', now properly set both ISA and build type for all > > architectures in a uniform manner. > > The 'machine' option also doesn't describe very well what it sets. The > > new option, 'cpu_instruction_set', is much more descriptive. Keep > > 'machine' for backwards compatibility. > > > > Signed-off-by: Juraj Linkeš > > Acked-by: Bruce Richardson Applied, thanks. I did few doc formatting improvements and removed some mentions to Arm as a SoC is not necessarily Arm (Risc-V is coming).
Re: [dpdk-dev] [PATCH] doc: fix build on Windows with meson 0.58
01/07/2021 11:57, Bruce Richardson: > On Thu, Jul 01, 2021 at 09:34:04AM +0100, Luca Boccassi wrote: > > On Wed, 2021-06-30 at 19:22 +0300, Dmitry Kozlyuk wrote: > > > The `doc` target used `echo` as its command. > > > On Windows, `echo` is always a shell built-in, there is no binary. > > > Starting from meson 0.58, `run_target()` always searches for command > > > executable and no longer accepts `echo` as such on Windows. > > > Replace plain `echo` with a Python one-liner. > > > > > > Fixes: d02a2dab2dfb ("doc: support building HTML guides with meson") > > > Cc: Bruce Richardson > > > Cc: Luca Boccassi > > > Cc: sta...@dpdk.org > > > > > > Reported-by: Rob Scheepens > > > Signed-off-by: Dmitry Kozlyuk > > > --- > > > > Acked-by: Luca Boccassi > > One small suggestion might be to move the "echo" command definition to > buildtools folder in case it's wanted for use anywhere in the build at some > point in the future. However, this patch is fine without that change too. I did the move in buildtools/meson.build: echo = py3 + ['-c', 'import sys; print(*sys.argv[1:])'] > Acked-by: Bruce Richardson Acked-by: Thomas Monjalon Applied, thanks
Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
08/07/2021 16:13, Anatoly Burakov: > doc/guides/rel_notes/release_21_08.rst| 2 ++ > drivers/event/dlb2/dlb2.c | 17 -- > drivers/net/i40e/i40e_rxtx.c | 20 +++ > drivers/net/iavf/iavf_rxtx.c | 20 +++ > drivers/net/ice/ice_rxtx.c| 20 +++ > drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++ > drivers/net/mlx5/mlx5_rx.c| 17 -- > .../include/generic/rte_power_intrinsics.h| 33 +++ > lib/eal/x86/rte_power_intrinsics.c| 17 +- > 9 files changed, 122 insertions(+), 44 deletions(-) About the title, it is introducing a new prefix "power_intrinsics:" with is not so much descriptive. Probably better to formulate with "eal:" prefix. > --- a/drivers/net/mlx5/mlx5_rx.c > +++ b/drivers/net/mlx5/mlx5_rx.c > @@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t > rx_queue_id) > return rx_queue_count(rxq); > } > > +#define CLB_VAL_IDX 0 > +#define CLB_MSK_IDX 1 > +static int > +mlx_monitor_callback(const uint64_t value, > + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) Everything is prefixed with mlx5, let's be consistent. Please replace mlx_ with mlx5_
Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native builds
> -Original Message- > From: Thomas Monjalon > Sent: Friday, July 9, 2021 2:31 PM > To: Juraj Linkeš ; Bruce Richardson > > Cc: dev@dpdk.org; david.march...@redhat.com; > honnappa.nagaraha...@arm.com; ruifeng.w...@arm.com; > ferruh.yi...@intel.com; jerinjac...@gmail.com > Subject: Re: [dpdk-dev] [PATCH v7] build: use platform for generic and native > builds > > 07/07/2021 15:59, Bruce Richardson: > > On Tue, Jul 06, 2021 at 11:44:28AM +0200, Juraj Linkeš wrote: > > > The current meson option 'machine' should only specify the ISA, > > > which is not sufficient for Arm, where setting ISA implies other > > > settings as well (and is used in Arm configuration as such). > > > Use the existing 'platform' meson option to differentiate the type > > > of the build (native/generic) and set ISA accordingly, unless the > > > user chooses to override it with a new option, 'cpu_instruction_set'. > > > The 'machine' option set the ISA in x86 builds and set > > > native/default 'build type' in aarch64 builds. These two new > > > variables, 'platform' and 'cpu_instruction_set', now properly set > > > both ISA and build type for all architectures in a uniform manner. > > > The 'machine' option also doesn't describe very well what it sets. > > > The new option, 'cpu_instruction_set', is much more descriptive. > > > Keep 'machine' for backwards compatibility. > > > > > > Signed-off-by: Juraj Linkeš > > > > Acked-by: Bruce Richardson > > Applied, thanks. > > I did few doc formatting improvements and removed some mentions to Arm as > a SoC is not necessarily Arm (Risc-V is coming). > > Ok, thanks, Thomas.
Re: [dpdk-dev] [PATCH v8 0/2] power: add support for cppc cpufreq driver
On Fri, Jul 9, 2021 at 12:56 PM Richael Zhuang wrote: > > v8: > fix build issue with clang This cover letter is empty and we only have a partial changelog. I'll trust David Hunt reviews and go ahead with the series. Please put some effort in the cover letter and tracking changes for future contributions. > > Richael Zhuang (2): > power: add support for cppc cpufreq > test/power: round cpuinfo cur freq only in CPPC cpufreq > > app/test/test_power.c | 3 +- > app/test/test_power_cpufreq.c | 26 +- > doc/guides/rel_notes/release_21_08.rst | 4 + > lib/power/meson.build | 1 + > lib/power/power_cppc_cpufreq.c | 680 + > lib/power/power_cppc_cpufreq.h | 230 + > lib/power/rte_power.c | 26 + > lib/power/rte_power.h | 2 +- > 8 files changed, 960 insertions(+), 12 deletions(-) > create mode 100644 lib/power/power_cppc_cpufreq.c > create mode 100644 lib/power/power_cppc_cpufreq.h Series applied, thanks. -- David Marchand
Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues
On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov wrote: > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst > index 403c2b03a3..a96e12d155 100644 > --- a/doc/guides/nics/features.rst > +++ b/doc/guides/nics/features.rst > @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information. > * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``. > * **[related] API**: ``rte_eth_rx_burst_mode_get()``, > ``rte_eth_tx_burst_mode_get()``. > > +.. _nic_features_get_monitor_addr: > + > +PMD power management using monitor addresses > + > + > +Supports getting a monitoring condition to use together with Ethernet PMD > power > +management (see :doc:`../prog_guide/power_man` for more details). > + > +* **[implements] eth_dev_ops**: ``get_monitor_addr`` > + - This new ethdev feature deserves its own commit. - Adding ethdev maintainers. We are missing a doc/guides/nics/features/default.ini. The name of the features proposed here is rather long. As far as I can see, features should be shorter than: doc/guides/conf.py:feature_str_len = 30 I am not really inspired.. "Power mgmt address monitor"? - pmd supporting this feature must have their .ini updated. > .. _nic_features_other: > > Other dev ops not represented by a Feature -- David Marchand
Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
On 09-Jul-21 2:46 PM, Thomas Monjalon wrote: 08/07/2021 16:13, Anatoly Burakov: doc/guides/rel_notes/release_21_08.rst| 2 ++ drivers/event/dlb2/dlb2.c | 17 -- drivers/net/i40e/i40e_rxtx.c | 20 +++ drivers/net/iavf/iavf_rxtx.c | 20 +++ drivers/net/ice/ice_rxtx.c| 20 +++ drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++ drivers/net/mlx5/mlx5_rx.c| 17 -- .../include/generic/rte_power_intrinsics.h| 33 +++ lib/eal/x86/rte_power_intrinsics.c| 17 +- 9 files changed, 122 insertions(+), 44 deletions(-) About the title, it is introducing a new prefix "power_intrinsics:" with is not so much descriptive. Probably better to formulate with "eal:" prefix. --- a/drivers/net/mlx5/mlx5_rx.c +++ b/drivers/net/mlx5/mlx5_rx.c @@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id) return rx_queue_count(rxq); } +#define CLB_VAL_IDX 0 +#define CLB_MSK_IDX 1 +static int +mlx_monitor_callback(const uint64_t value, + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) Everything is prefixed with mlx5, let's be consistent. Please replace mlx_ with mlx5_ Sure, will fix. -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues
On 09-Jul-21 3:24 PM, David Marchand wrote: On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov wrote: diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 403c2b03a3..a96e12d155 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information. * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``. * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``. +.. _nic_features_get_monitor_addr: + +PMD power management using monitor addresses + + +Supports getting a monitoring condition to use together with Ethernet PMD power +management (see :doc:`../prog_guide/power_man` for more details). + +* **[implements] eth_dev_ops**: ``get_monitor_addr`` + - This new ethdev feature deserves its own commit. - Adding ethdev maintainers. We are missing a doc/guides/nics/features/default.ini. The name of the features proposed here is rather long. As far as I can see, features should be shorter than: doc/guides/conf.py:feature_str_len = 30 I am not really inspired.. "Power mgmt address monitor"? - pmd supporting this feature must have their .ini updated. .. _nic_features_other: Other dev ops not represented by a Feature You'd have to walk me through/translate whatever it is that you just said as i have no idea what any of that means :D -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues
On Fri, Jul 9, 2021 at 4:42 PM Burakov, Anatoly wrote: > You'd have to walk me through/translate whatever it is that you just > said as i have no idea what any of that means :D https://git.dpdk.org/dpdk/commit/?id=fa5dbd825a -- David Marchand
Re: [dpdk-dev] [PATCH v8 7/7] l3fwd-power: support multiqueue in PMD pmgmt modes
On Thu, Jul 8, 2021 at 4:14 PM Anatoly Burakov wrote: > > Currently, l3fwd-power enforces the limitation of having one queue per > lcore. This is no longer necessary, so remove the limitation. Title prefix is examples/l3fwd-power: for consistency sake. -- David Marchand
Re: [dpdk-dev] [PATCH v8 5/7] power: support callbacks for multiple Rx queues
On 09-Jul-21 3:46 PM, David Marchand wrote: On Fri, Jul 9, 2021 at 4:42 PM Burakov, Anatoly wrote: You'd have to walk me through/translate whatever it is that you just said as i have no idea what any of that means :D https://git.dpdk.org/dpdk/commit/?id=fa5dbd825a Thanks, i'll separate it out into a separate commit and try to come up with a good name for it! -- Thanks, Anatoly
[dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences
Currently, the error paths can lead to attempts at dereferencing NULL pointers. Add the check to avoid attempts at dereferencing NULL pointers. Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov --- lib/power/power_pstate_cpufreq.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c index ba28ddcfca..3b607515fd 100644 --- a/lib/power/power_pstate_cpufreq.c +++ b/lib/power/power_pstate_cpufreq.c @@ -440,8 +440,10 @@ power_get_available_freqs(struct pstate_power_info *pi) num_freqs, pi->lcore_id); out: - fclose(f_min); - fclose(f_max); + if (f_min != NULL) + fclose(f_min); + if (f_max != NULL) + fclose(f_max); return ret; } -- 2.25.1
Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion
On Thu, Jun 24, 2021 at 4:35 PM Burakov, Anatoly wrote: > Right, so the idea is store the PMD-specific data in the monitor > condition, and leave it to the callback to interpret it. > > The obvious question then is, how many values is enough? Two? Three? > Four? This option doesn't really solve the basic issue, it just kicks > the can down the road. I guess three values should be enough for > everyone (tm) ? :D Can we have multiple callbacks executed for a given rxq? Since this is on the rx path, I would say no, but I might be missing some consideration for future evols of this API. In this case, a private field in rxq seems a good place for temporary storage, and then we simply pass a void * opaque pointer. -- David marchand
Re: [dpdk-dev] [dpdk-stable] [PATCH v3] build: fix symlink of drivers for Windows
28/05/2021 10:19, Bruce Richardson: > On Mon, Apr 26, 2021 at 11:07:32AM +0100, Nick Connolly wrote: > > The symlink-drivers-solibs.sh script was disabled as part of 'install' > > for Windows because there is no support for shell scripts. However, > > this means that driver related DLLs are not present in the installed > > 'libdir' directory. Add a python script to perform the install and use > > it for Windows if the version of meson supports using an external > > program with add_install_script (>= 0.55.0). > > > > On Windows, symbolic links are somewhat problematic since the > > SeCreateSymbolicLinkPrivilege is required to be able to create them. > > In addition, different cross-compilation environments handle symbolic > > links differently, e.g. WSL, Msys2, Cygwin. Rather than trying to > > distinguish these scenarios, the python script will perform a file copy > > for any Windows specific names. > > > > On Windows, the shared library outputs have different names depending > > upon which toolset has been used to build them. The script currently > > handles Clang and GCC. > > > > On Linux the functionality is unchanged, but could be replaced with the > > python script once the required minimum version of meson is >= 0.55.0. > > > > Fixes: 5c7d86948764 ("build: fix install on Windows") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Nick Connolly > > Tested-by: Narcisa Vasile > > Acked-by: Narcisa Vasile > > --- > Reviewed-by: Bruce Richardson Added the new file in MAINTAINERS. Applied, thanks.
Re: [dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences
On Fri, Jul 9, 2021 at 5:02 PM Anatoly Burakov wrote: > > Currently, the error paths can lead to attempts at dereferencing NULL > pointers. Add the check to avoid attempts at dereferencing NULL > pointers. > Coverity issue: 371895 > Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support") > > Signed-off-by: Anatoly Burakov Reviewed-by: David Marchand -- David Marchand
Re: [dpdk-dev] [PATCH v1 1/1] power: fix potential null dereferences
On 9/7/2021 4:09 PM, David Marchand wrote: On Fri, Jul 9, 2021 at 5:02 PM Anatoly Burakov wrote: Currently, the error paths can lead to attempts at dereferencing NULL pointers. Add the check to avoid attempts at dereferencing NULL pointers. Coverity issue: 371895 This patch also fixes Coverity issue: 371889 Rgds, Dave Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support") Signed-off-by: Anatoly Burakov Reviewed-by: David Marchand
Re: [dpdk-dev] [PATCH] kni: update link only on change
24/06/2021 15:32, Ferruh Yigit: > 'rte_kni_update_link()' updates virtual KNI interface link using kernel > sysfs interface. > If the requested link status is same as interface link status, do not > update the link status but return with success. > > Signed-off-by: Ferruh Yigit Applied, thanks.
[dpdk-dev] [PATCH 1/3] bitrate: change reg implementation to match API description
rte_stats_bitrate_reg() API states it returns 'Zero on success'. However, the implementation directly returns the return of rte_metrics_reg_names() which may be zero or positive on success, with a positive value also indicating the index. The user of rte_stats_bitrate_reg() should not care about the index as it is stored in the opaque rte_stats_bitrates struct. Change the implementation of rte_stats_bitrate_reg() to match the API description by always returning zero on success. Fixes: 2ad7ba9a6567 ("bitrate: add bitrate statistics library") Signed-off-by: Kevin Traynor --- lib/bitratestats/rte_bitrate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/bitratestats/rte_bitrate.c b/lib/bitratestats/rte_bitrate.c index 8fd9f47288..e23e38bc94 100644 --- a/lib/bitratestats/rte_bitrate.c +++ b/lib/bitratestats/rte_bitrate.c @@ -56,6 +56,8 @@ rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data) return_value = rte_metrics_reg_names(&names[0], RTE_DIM(names)); - if (return_value >= 0) + if (return_value >= 0) { bitrate_data->id_stats_set = return_value; + return 0; + } return return_value; } -- 2.31.1
[dpdk-dev] [PATCH 2/3] bitrate: change calc implementation to match API description
rte_stats_bitrate_calc() API states it returns 'Negative value on error'. However, the implementation will return the error code from rte_eth_stats_get() which may be non-zero on error. Change the implementation of rte_stats_bitrate_calc() to match the API description by always returning a negative value on error. Fixes: 2ad7ba9a6567 ("bitrate: add bitrate statistics library") Signed-off-by: Kevin Traynor --- lib/bitratestats/rte_bitrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/bitratestats/rte_bitrate.c b/lib/bitratestats/rte_bitrate.c index e23e38bc94..1664e4863b 100644 --- a/lib/bitratestats/rte_bitrate.c +++ b/lib/bitratestats/rte_bitrate.c @@ -81,5 +81,5 @@ rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data, ret_code = rte_eth_stats_get(port_id, ð_stats); if (ret_code != 0) - return ret_code; + return ret_code < 0 ? ret_code : -ret_code; port_data = &bitrate_data->port_stats[port_id]; -- 2.31.1
[dpdk-dev] [PATCH 3/3] bitrate: promote rte_stats_bitrate_free() to stable
rte_stats_bitrate_free() has been in DPDK since 20.11. Its signature is very basic as it just frees an opaque data struct allocated in rte_stats_bitrate_create() and returns void. It's unlikely that such a basic signature would need to change so might as well promote it to stable for the next major ABI. Cc: Hemant Agrawal Signed-off-by: Kevin Traynor --- lib/bitratestats/rte_bitrate.h | 3 --- lib/bitratestats/version.map | 4 ++-- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/lib/bitratestats/rte_bitrate.h b/lib/bitratestats/rte_bitrate.h index fcd1564ddc..e494389b95 100644 --- a/lib/bitratestats/rte_bitrate.h +++ b/lib/bitratestats/rte_bitrate.h @@ -8,6 +8,4 @@ #include -#include - #ifdef __cplusplus extern "C" { @@ -36,5 +34,4 @@ struct rte_stats_bitrates *rte_stats_bitrate_create(void); * Pointer allocated by rte_stats_bitrate_create() */ -__rte_experimental void rte_stats_bitrate_free(struct rte_stats_bitrates *bitrate_data); diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map index 152730bb4e..a14d21ebba 100644 --- a/lib/bitratestats/version.map +++ b/lib/bitratestats/version.map @@ -9,7 +9,7 @@ DPDK_21 { }; -EXPERIMENTAL { +DPDK_22 { global: rte_stats_bitrate_free; -}; +} DPDK_21; -- 2.31.1
Re: [dpdk-dev] [PATCH v2 01/20] crypto/cnxk: add driver skeleton
Hello, On Thu, Jul 8, 2021 at 10:15 PM David Marchand wrote: > - Reproduced the issue in Ubuntu 18.04 GHA vms: > > FAILED: drivers/rte_crypto_cnxk.pmd.c > /usr/bin/python3 ../buildtools/gen-pmdinfo-cfile.py > /home/runner/work/dpdk/dpdk/build/buildtools ar > /home/runner/work/dpdk/dpdk/build/drivers/libtmp_rte_crypto_cnxk.a > drivers/rte_crypto_cnxk.pmd.c /usr/bin/python3 > ../buildtools/pmdinfogen.py elf > Traceback (most recent call last): > File "../buildtools/pmdinfogen.py", line 274, in > main() > File "../buildtools/pmdinfogen.py", line 269, in main > drivers = load_drivers(image) > File "../buildtools/pmdinfogen.py", line 203, in load_drivers > drivers.append(Driver.load(image, symbol)) > File "../buildtools/pmdinfogen.py", line 157, in load > driver.pci_ids = cls._load_pci_ids(image, pci_table_name_symbol) > File "../buildtools/pmdinfogen.py", line 177, in _load_pci_ids > pci_id = rte_pci_id.from_buffer_copy(data) > ValueError: Buffer size too small (4 instead of at least 12 bytes) I spent some time trying to understand this error, there is probably multiple bug(s) somewhere between pyelftools/ctype/pmdinfogen. In any case, on my f32, before https://git.dpdk.org/dpdk/commit/?id=b146c30d3c0e (that hides the issue), with a more "recent" pyelftools + python than Ubuntu 18.04, I can see: $ cat $HOME/builds/build-clang-shared/drivers/rte_crypto_cnxk.pmd.c static __attribute__((unused)) const char *generator = "../../dpdk/buildtools/pmdinfogen.py"; const char crypto_cn9k_pmd_info[] __attribute__((used)) = "PMD_INFO_STRING= {\"name\": \"crypto_cn9k\", \"kmod\": \"vfio-pci\", \"pci_ids\": [[10272, 25926, 28516, 24946], [11824, 11569, 11827, 25446]]}"; const char crypto_cn10k_pmd_info[] __attribute__((used)) = "PMD_INFO_STRING= {\"name\": \"crypto_cn10k\", \"kmod\": \"vfio-pci\", \"pci_ids\": [[6013, 41203, 65535, 65535]]}"; $ nm -S /home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so |grep cpt_table 0001d400 0018 b pci_id_cpt_table 0001d2f0 0018 d pci_id_cpt_table You'll notice the former is located in bss. $ hexdump -s $(($(printf "%d" 0x0001d400) - $(printf "%d" 0x1000))) -n $(printf "%d" 0x18) /home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so 001c400 6e61 2067 6576 7372 6f69 206e 3031 302e 001c410 312e 2820 6546 6f64 001c418 $ hexdump -s $(($(printf "%d" 0x0001d2f0) - $(printf "%d" 0x1000))) -n $(printf "%d" 0x18) /home/dmarchan/builds/build-clang-shared/drivers/librte_crypto_cnxk.so 001c2f0 00ff 177d a0f3 001c300 001c308 Those values match what the pmd.c file contains. Are we missing a "memset()" for such symbols in pmdinfogen? Or maybe a warning that we are adding an empty pci ids table.. There is no urgency, but if you find some time to look at this. Thanks. -- David Marchand
Re: [dpdk-dev] [PATCH V6] config/arm: add Qualcomm Centriq 2400 part number
21/06/2021 03:52, Ruifeng Wang: > From: Thomas Monjalon > > 18/06/2021 10:53, Thierry Herbelot: > > > On 6/18/21 10:51 AM, Thomas Monjalon wrote: > > > > 18/06/2021 04:09, Ruifeng Wang: > > > >> From: Thierry Herbelot > > > >>> 'part_number_config': { > > > >>> -'0xc00': {'machine_args': ['-march=armv8-a+crc']} > > > >>> +'0x800': {'machine_args': ['-march=armv8-a+crc']}, > > > >>> +'0xc00': {'machine_args': ['-march=armv8-a+crc']}, > > > >> Nit, redundant comma at the end of the line. > > > > > > > > What is redundant? > > > > > > The comma at the end of the second line is not necessary. > > > > It is a good practice to have comma like other lines, so no need to update > > this > > line when adding more. > > > Looked at style in the rest of the file. Just wanted them to be aligned. > I'm fine with a trailing comma at the last line. > > Acked-by: Ruifeng Wang Applied, thanks
Re: [dpdk-dev] [PATCH] maintainers: update for ARM v8
25/06/2021 16:27, jer...@marvell.com: > From: Jerin Jacob > > Resigning my maintainership for ARM v8 architecture. > > Signed-off-by: Jerin Jacob > --- > > Resigning due to not getting enough quality time to review arm64 architecture > patches. > > Unlike those days where arm64 architecture started with Cavium HW, > Now arm64 architecture is quite matured and at par with x86 support, and > there is enough contribution from Arm. > > I will put my best effort into reviewing the arm64 architecture patches as > time > permits. Applied
Re: [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file
25/06/2021 10:08, Ferruh Yigit: > On 6/22/2021 4:50 PM, Ray Kinsella wrote: > > Update to ABI MAINTAINERS. > > > > Signed-off-by: Ray Kinsella > > --- > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > ABI Policy & Versioning > > M: Ray Kinsella > > -M: Neil Horman > > F: lib/eal/include/rte_compat.h > > F: lib/eal/include/rte_function_versioning.h > > F: doc/guides/contributing/abi_*.rst > > Acked-by: Ferruh Yigit > > Tried to reach out Neil multiple times for ABI issues without success. Acked-by: Thomas Monjalon Applied
[dpdk-dev] [PATCH v9 2/8] net/af_xdp: add power monitor support
Implement support for .get_monitor_addr in AF_XDP driver. Signed-off-by: Anatoly Burakov --- Notes: v8: - Fix checkpatch issue v2: - Rewrite using the callback mechanism drivers/net/af_xdp/rte_eth_af_xdp.c | 34 + 1 file changed, 34 insertions(+) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index eb5660a3dc..989051dd6d 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "compat.h" @@ -788,6 +789,38 @@ eth_dev_configure(struct rte_eth_dev *dev) return 0; } +#define CLB_VAL_IDX 0 +static int +eth_monitor_callback(const uint64_t value, + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) +{ + const uint64_t v = opaque[CLB_VAL_IDX]; + const uint64_t m = (uint32_t)~0; + + /* if the value has changed, abort entering power optimized state */ + return (value & m) == v ? 0 : -1; +} + +static int +eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + struct pkt_rx_queue *rxq = rx_queue; + unsigned int *prod = rxq->rx.producer; + const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */ + + /* watch for changes in producer ring */ + pmc->addr = (void *)prod; + + /* store current value */ + pmc->opaque[CLB_VAL_IDX] = cur_val; + pmc->fn = eth_monitor_callback; + + /* AF_XDP producer ring index is 32-bit */ + pmc->size = sizeof(uint32_t); + + return 0; +} + static int eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { @@ -1448,6 +1481,7 @@ static const struct eth_dev_ops ops = { .link_update = eth_link_update, .stats_get = eth_stats_get, .stats_reset = eth_stats_reset, + .get_monitor_addr = eth_get_monitor_addr }; /** parse busy_budget argument */ -- 2.25.1
[dpdk-dev] [PATCH v9 0/8] Enhancements for PMD power management
This patchset introduces several changes related to PMD power management: - Changed monitoring intrinsics to use callbacks as a comparison function, based on previous patchset [1] but incorporating feedback [2] - this hopefully will make it possible to add support for .get_monitor_addr in virtio - Add a new intrinsic to monitor multiple addresses, based on RTM instruction set and the TPAUSE instruction - Add support for PMD power management on multiple queues, as well as all accompanying infrastructure and example apps changes v9: - Added all missing Acks and Tests - Added a new commit with NIC features - Addressed minor issues raised in review v8: - Fixed checkpatch issue - Added comment explaining empty poll handling (Konstantin) v7: - Fixed various bugs v6: - Improved the algorithm for multi-queue sleep - Fixed segfault and addressed other feedback v5: - Removed "power save queue" API and replaced with mechanism suggested by Konstantin - Addressed other feedback v4: - Replaced raw number with a macro - Fixed all the bugs found by Konstantin - Some other minor corrections v3: - Moved some doc updates to NIC features list v2: - Changed check inversion to callbacks - Addressed feedback from Konstantin - Added doc updates where necessary [1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=* [2] http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274 Anatoly Burakov (8): eal: use callbacks for power monitoring comparison net/af_xdp: add power monitor support doc: add PMD power management NIC feature eal: add power monitor for multiple events power: remove thread safety from PMD power API's power: support callbacks for multiple Rx queues power: support monitoring multiple Rx queues examples/l3fwd-power: support multiq in PMD modes doc/guides/nics/features.rst | 10 + doc/guides/nics/features/default.ini | 1 + doc/guides/prog_guide/power_man.rst | 74 +- doc/guides/rel_notes/release_21_08.rst| 11 + drivers/event/dlb2/dlb2.c | 17 +- drivers/net/af_xdp/rte_eth_af_xdp.c | 34 + drivers/net/i40e/i40e_rxtx.c | 20 +- drivers/net/iavf/iavf_rxtx.c | 20 +- drivers/net/ice/ice_rxtx.c| 20 +- drivers/net/ixgbe/ixgbe_rxtx.c| 20 +- drivers/net/mlx5/mlx5_rx.c| 17 +- examples/l3fwd-power/main.c | 6 - lib/eal/arm/rte_power_intrinsics.c| 11 + lib/eal/include/generic/rte_cpuflags.h| 2 + .../include/generic/rte_power_intrinsics.h| 68 +- lib/eal/ppc/rte_power_intrinsics.c| 11 + lib/eal/version.map | 3 + lib/eal/x86/rte_cpuflags.c| 2 + lib/eal/x86/rte_power_intrinsics.c| 90 ++- lib/power/meson.build | 3 + lib/power/rte_power_pmd_mgmt.c| 663 +- lib/power/rte_power_pmd_mgmt.h| 6 + 22 files changed, 847 insertions(+), 262 deletions(-) -- 2.25.1
[dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison
Previously, the semantics of power monitor were such that we were checking current value against the expected value, and if they matched, then the sleep was aborted. This is somewhat inflexible, because it only allowed us to check for a specific value in a specific way. This commit replaces the comparison with a user callback mechanism, so that any PMD (or other code) using `rte_power_monitor()` can define their own comparison semantics and decision making on how to detect the need to abort the entering of power optimized state. Existing implementations are adjusted to follow the new semantics. Suggested-by: Konstantin Ananyev Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt Acked-by: Timothy McDaniel --- Notes: v4: - Return error if callback is set to NULL - Replace raw number with a macro in monitor condition opaque data v2: - Use callback mechanism for more flexibility - Address feedback from Konstantin doc/guides/rel_notes/release_21_08.rst| 2 ++ drivers/event/dlb2/dlb2.c | 17 -- drivers/net/i40e/i40e_rxtx.c | 20 +++ drivers/net/iavf/iavf_rxtx.c | 20 +++ drivers/net/ice/ice_rxtx.c| 20 +++ drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++ drivers/net/mlx5/mlx5_rx.c| 17 -- .../include/generic/rte_power_intrinsics.h| 33 +++ lib/eal/x86/rte_power_intrinsics.c| 17 +- 9 files changed, 122 insertions(+), 44 deletions(-) diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index 476822b47f..912fb13b84 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -144,6 +144,8 @@ API Changes * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string truncation. +* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism. + ABI Changes --- diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index eca183753f..252bbd8d5e 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num) } } +#define CLB_MASK_IDX 0 +#define CLB_VAL_IDX 1 +static int +dlb2_monitor_callback(const uint64_t val, + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) +{ + /* abort if the value matches */ + return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0; +} + static inline int dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, struct dlb2_eventdev_port *ev_port, @@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, expected_value = 0; pmc.addr = monitor_addr; - pmc.val = expected_value; - pmc.mask = qe_mask.raw_qe[1]; + /* store expected value and comparison mask in opaque data */ + pmc.opaque[CLB_VAL_IDX] = expected_value; + pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1]; + /* set up callback */ + pmc.fn = dlb2_monitor_callback; pmc.size = sizeof(uint64_t); rte_power_monitor(&pmc, timeout + start_ticks); diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index e518409fe5..8489f91f1d 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -81,6 +81,18 @@ #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK) +static int +i40e_monitor_callback(const uint64_t value, + const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* +* we expect the DD bit to be set to 1 if this descriptor was already +* written to. +*/ + return (value & m) == m ? -1 : 0; +} + int i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.qword1.status_error_len; - /* -* we expect the DD bit to be set to 1 if this descriptor was already -* written to. -*/ - pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); - pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* comparison callback */ + pmc->fn = i40e_monitor_callback; /* registers are 64-bit */ pmc->size = sizeof(uint64_t); diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c index f817fbc49b..d61b32fcee 100644 --- a/drivers/net/iavf/iavf_rxtx.c +++ b/drivers/net/iavf/iavf_rxtx.c @@ -57,6 +57,18 @@ iavf_pr
[dpdk-dev] [PATCH v9 3/8] doc: add PMD power management NIC feature
At this point, multiple different Ethernet drivers from multiple vendors will support the PMD power management scheme. It would be useful to add it to the NIC feature table to indicate support for it. Suggested-by: David Marchand Signed-off-by: Anatoly Burakov --- doc/guides/nics/features.rst | 10 ++ doc/guides/nics/features/default.ini | 1 + 2 files changed, 11 insertions(+) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 403c2b03a3..a96e12d155 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information. * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``. * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``. +.. _nic_features_get_monitor_addr: + +PMD power management using monitor addresses + + +Supports getting a monitoring condition to use together with Ethernet PMD power +management (see :doc:`../prog_guide/power_man` for more details). + +* **[implements] eth_dev_ops**: ``get_monitor_addr`` + .. _nic_features_other: Other dev ops not represented by a Feature diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini index 3b55e0ccb0..f1e947bd9e 100644 --- a/doc/guides/nics/features/default.ini +++ b/doc/guides/nics/features/default.ini @@ -76,6 +76,7 @@ x86-64 = Usage doc= Design doc = Perf doc = +Power mgmt address monitor = [rte_flow items] ah = -- 2.25.1
[dpdk-dev] [PATCH v9 4/8] eal: add power monitor for multiple events
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to what UMWAIT does, but without the limitation of having to listen for just one event. This works because the optimized power state used by the TPAUSE instruction will cause a wake up on RTM transaction abort, so if we add the addresses we're interested in to the read-set, any write to those addresses will wake us up. Signed-off-by: Konstantin Ananyev Signed-off-by: Anatoly Burakov Tested-by: David Hunt --- Notes: v4: - Fixed bugs in accessing the monitor condition - Abort on any monitor condition not having a defined callback v2: - Adapt to callback mechanism lib/eal/arm/rte_power_intrinsics.c| 11 +++ lib/eal/include/generic/rte_cpuflags.h| 2 + .../include/generic/rte_power_intrinsics.h| 35 + lib/eal/ppc/rte_power_intrinsics.c| 11 +++ lib/eal/version.map | 3 + lib/eal/x86/rte_cpuflags.c| 2 + lib/eal/x86/rte_power_intrinsics.c| 73 +++ 7 files changed, 137 insertions(+) diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c index e83f04072a..78f55b7203 100644 --- a/lib/eal/arm/rte_power_intrinsics.c +++ b/lib/eal/arm/rte_power_intrinsics.c @@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) return -ENOTSUP; } + +int +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp) +{ + RTE_SET_USED(pmc); + RTE_SET_USED(num); + RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; +} diff --git a/lib/eal/include/generic/rte_cpuflags.h b/lib/eal/include/generic/rte_cpuflags.h index 28a5aecde8..d35551e931 100644 --- a/lib/eal/include/generic/rte_cpuflags.h +++ b/lib/eal/include/generic/rte_cpuflags.h @@ -24,6 +24,8 @@ struct rte_cpu_intrinsics { /**< indicates support for rte_power_monitor function */ uint32_t power_pause : 1; /**< indicates support for rte_power_pause function */ + uint32_t power_monitor_multi : 1; + /**< indicates support for rte_power_monitor_multi function */ }; /** diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h index c9aa52a86d..04e8c2ab37 100644 --- a/lib/eal/include/generic/rte_power_intrinsics.h +++ b/lib/eal/include/generic/rte_power_intrinsics.h @@ -128,4 +128,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id); __rte_experimental int rte_power_pause(const uint64_t tsc_timestamp); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor a set of addresses for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either one of the specified + * memory addresses is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, `expected` 64-bit values and 64-bit masks are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they do not match, the entering of + * optimized power state may be aborted. + * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. + * Failing to do so may result in an illegal CPU instruction error. + * + * @param pmc + * An array of monitoring condition structures. + * @param num + * Length of the `pmc` array. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * + * @return + * 0 on success + * -EINVAL on invalid parameters + * -ENOTSUP if unsupported + */ +__rte_experimental +int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp); + #endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/eal/ppc/rte_power_intrinsics.c b/lib/eal/ppc/rte_power_intrinsics.c index 7fc9586da7..f00b58ade5 100644 --- a/lib/eal/ppc/rte_power_intrinsics.c +++ b/lib/eal/ppc/rte_power_intrinsics.c @@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) return -ENOTSUP; } + +int +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp) +{ + RTE_SET_USED(pmc); + RTE_SET_USED(num); + RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; +} diff --git a/lib/eal/version.map b/lib/eal/version.map index 2df65c6903..887012d02a 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -423,6 +423,9 @@ EXPERIMENTAL { rte_version_release; # WINDOWS_NO_EXPORT rte_version_suffix; # WINDOWS_NO_EXPORT rte_version_year; # WINDOWS_NO_EXPORT + + # added in 21.08 + r
[dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's
Currently, we expect that only one callback can be active at any given moment, for a particular queue configuration, which is relatively easy to implement in a thread-safe way. However, we're about to add support for multiple queues per lcore, which will greatly increase the possibility of various race conditions. We could have used something like an RCU for this use case, but absent of a pressing need for thread safety we'll go the easy way and just mandate that the API's are to be called when all affected ports are stopped, and document this limitation. This greatly simplifies the `rte_power_monitor`-related code. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v2: - Add check for stopped queue - Clarified doc message - Added release notes doc/guides/rel_notes/release_21_08.rst | 4 + lib/power/meson.build | 3 + lib/power/rte_power_pmd_mgmt.c | 133 ++--- lib/power/rte_power_pmd_mgmt.h | 6 ++ 4 files changed, 66 insertions(+), 80 deletions(-) diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index 912fb13b84..b9a3caabf0 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -146,6 +146,10 @@ API Changes * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism. +* rte_power: The experimental PMD power management API is no longer considered + to be thread safe; all Rx queues affected by the API will now need to be + stopped before making any changes to the power management scheme. + ABI Changes --- diff --git a/lib/power/meson.build b/lib/power/meson.build index 36e5a65874..bf937acde4 100644 --- a/lib/power/meson.build +++ b/lib/power/meson.build @@ -22,4 +22,7 @@ headers = files( 'rte_power_pmd_mgmt.h', 'rte_power_guest_channel.h', ) +if cc.has_argument('-Wno-cast-qual') +cflags += '-Wno-cast-qual' +endif deps += ['timer', 'ethdev'] diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c index db03cbf420..9b95cf1794 100644 --- a/lib/power/rte_power_pmd_mgmt.c +++ b/lib/power/rte_power_pmd_mgmt.c @@ -40,8 +40,6 @@ struct pmd_queue_cfg { /**< Callback mode for this queue */ const struct rte_eth_rxtx_callback *cur_cb; /**< Callback instance */ - volatile bool umwait_in_progress; - /**< are we currently sleeping? */ uint64_t empty_poll_stats; /**< Number of empty polls */ } __rte_cache_aligned; @@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, struct rte_power_monitor_cond pmc; uint16_t ret; - /* -* we might get a cancellation request while being -* inside the callback, in which case the wakeup -* wouldn't work because it would've arrived too early. -* -* to get around this, we notify the other thread that -* we're sleeping, so that it can spin until we're done. -* unsolicited wakeups are perfectly safe. -*/ - q_conf->umwait_in_progress = true; - - rte_atomic_thread_fence(__ATOMIC_SEQ_CST); - - /* check if we need to cancel sleep */ - if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { - /* use monitoring condition to sleep */ - ret = rte_eth_get_monitor_addr(port_id, qidx, - &pmc); - if (ret == 0) - rte_power_monitor(&pmc, UINT64_MAX); - } - q_conf->umwait_in_progress = false; - - rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + /* use monitoring condition to sleep */ + ret = rte_eth_get_monitor_addr(port_id, qidx, + &pmc); + if (ret == 0) + rte_power_monitor(&pmc, UINT64_MAX); } } else q_conf->empty_poll_stats = 0; @@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx, return nb_rx; } +static int +queue_stopped(const uint16_t port_id, const uint16_t queue_id) +{ + struct rte_eth_rxq_info qinfo; + + if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0) + return -1; + + return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED; +} + int rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id, uint16_t queue_id, enum rte_power_pmd_mgmt_type mode) { struct
[dpdk-dev] [PATCH v9 6/8] power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management support that only allows it to support a single queue per lcore. This is not ideal as most DPDK use cases will poll multiple queues per core. The PMD power management mechanism relies on ethdev Rx callbacks, so it is very difficult to implement such support because callbacks are effectively stateless and have no visibility into what the other ethdev devices are doing. This places limitations on what we can do within the framework of Rx callbacks, but the basics of this implementation are as follows: - Replace per-queue structures with per-lcore ones, so that any device polled from the same lcore can share data - Any queue that is going to be polled from a specific lcore has to be added to the list of queues to poll, so that the callback is aware of other queues being polled by the same lcore - Both the empty poll counter and the actual power saving mechanism is shared between all queues polled on a particular lcore, and is only activated when all queues in the list were polled and were determined to have no traffic. - The limitation on UMWAIT-based polling is not removed because UMWAIT is incapable of monitoring more than one address. Also, while we're at it, update and improve the docs. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v8: - Added a comment explaining that we want to sleep on each empty poll after threshold has been reached v7: - Fix bug where initial sleep target was always set to zero - Fix logic in handling of n_queues_ready_to_sleep counter - Update documentation on hardware requirements v6: - Track each individual queue sleep status (Konstantin) - Fix segfault (Dave) v5: - Remove the "power save queue" API and replace it with mechanism suggested by Konstantin v3: - Move the list of supported NICs to NIC feature table v2: - Use a TAILQ for queues instead of a static array - Address feedback from Konstantin - Add additional checks for stopped queues doc/guides/prog_guide/power_man.rst| 69 ++-- doc/guides/rel_notes/release_21_08.rst | 5 + lib/power/rte_power_pmd_mgmt.c | 460 +++-- 3 files changed, 398 insertions(+), 136 deletions(-) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index c70ae128ac..0e66878892 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -198,34 +198,45 @@ Ethernet PMD Power Management API Abstract -Existing power management mechanisms require developers -to change application design or change code to make use of it. -The PMD power management API provides a convenient alternative -by utilizing Ethernet PMD RX callbacks, -and triggering power saving whenever empty poll count reaches a certain number. - -Monitor - This power saving scheme will put the CPU into optimized power state - and use the ``rte_power_monitor()`` function - to monitor the Ethernet PMD RX descriptor address, - and wake the CPU up whenever there's new traffic. - -Pause - This power saving scheme will avoid busy polling - by either entering power-optimized sleep state - with ``rte_power_pause()`` function, - or, if it's not available, use ``rte_pause()``. - -Frequency scaling - This power saving scheme will use ``librte_power`` library - functionality to scale the core frequency up/down - depending on traffic volume. - -.. note:: - - Currently, this power management API is limited to mandatory mapping - of 1 queue to 1 core (multiple queues are supported, - but they must be polled from different cores). +Existing power management mechanisms require developers to change application +design or change code to make use of it. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever empty poll count reaches a certain number. + +* Monitor + This power saving scheme will put the CPU into optimized power state and + monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever + there's new traffic. Support for this scheme may not be available on all + platforms, and further limitations may apply (see below). + +* Pause + This power saving scheme will avoid busy polling by either entering + power-optimized sleep state with ``rte_power_pause()`` function, or, if it's + not supported by the underlying platform, use ``rte_pause()``. + +* Frequency scaling + This power saving scheme will use ``librte_power`` library functionality to + scale the core frequency up/down depending on traffic volume. + +The "monitor" mode is only supported in the following configurations and scenarios: + +* On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set being + supported by the CPU. Please refe
[dpdk-dev] [PATCH v9 7/8] power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev Rx queues while entering the energy efficient power state. The multi version will be used unconditionally if supported, and the UMWAIT one will only be used when multi-monitor is not supported by the hardware. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v6: - Fix the missed feedback from v5 v4: - Fix possible out of bounds access - Added missing index increment doc/guides/prog_guide/power_man.rst | 15 -- lib/power/rte_power_pmd_mgmt.c | 82 - 2 files changed, 90 insertions(+), 7 deletions(-) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0e66878892..e387d7811e 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -221,17 +221,22 @@ power saving whenever empty poll count reaches a certain number. The "monitor" mode is only supported in the following configurations and scenarios: * On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set being - supported by the CPU. Please refer to your platform documentation for further - information. + supported by the CPU, while `rte_power_monitor_multi()` requires WAITPKG and + RTM instruction sets being supported by the CPU. RTM instruction set may also + require booting the Linux with `tsx=on` command line parameter. Please refer + to your platform documentation for further information. * If ``rte_cpu_get_intrinsics_support()`` function indicates that + ``rte_power_monitor_multi()`` function is supported by the platform, then + monitoring multiple Ethernet Rx queues for traffic will be supported. + +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only ``rte_power_monitor()`` is supported by the platform, then monitoring will be limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be monitored from a different lcore). -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the - ``rte_power_monitor()`` function is not supported, then monitor mode will not - be supported. +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the + two monitoring functions are supported, then monitor mode will not be supported. * Not all Ethernet drivers support monitoring, even if the underlying platform may support the necessary CPU instructions. Please refer to diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c index 30772791af..2586204b93 100644 --- a/lib/power/rte_power_pmd_mgmt.c +++ b/lib/power/rte_power_pmd_mgmt.c @@ -126,6 +126,32 @@ queue_list_take(struct pmd_core_cfg *cfg, const union queue *q) return found; } +static inline int +get_monitor_addresses(struct pmd_core_cfg *cfg, + struct rte_power_monitor_cond *pmc, size_t len) +{ + const struct queue_list_entry *qle; + size_t i = 0; + int ret; + + TAILQ_FOREACH(qle, &cfg->head, next) { + const union queue *q = &qle->queue; + struct rte_power_monitor_cond *cur; + + /* attempted out of bounds access */ + if (i >= len) { + RTE_LOG(ERR, POWER, "Too many queues being monitored\n"); + return -1; + } + + cur = &pmc[i++]; + ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur); + if (ret < 0) + return ret; + } + return 0; +} + static void calc_tsc(void) { @@ -215,6 +241,46 @@ lcore_can_sleep(struct pmd_core_cfg *cfg) return true; } +static uint16_t +clb_multiwait(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *arg) +{ + const unsigned int lcore = rte_lcore_id(); + struct queue_list_entry *queue_conf = arg; + struct pmd_core_cfg *lcore_conf; + const bool empty = nb_rx == 0; + + lcore_conf = &lcore_cfgs[lcore]; + + /* early exit */ + if (likely(!empty)) + /* early exit */ + queue_reset(lcore_conf, queue_conf); + else { + struct rte_power_monitor_cond pmc[lcore_conf->n_queues]; + int ret; + + /* can this queue sleep? */ + if (!queue_can_sleep(lcore_conf, queue_conf)) + return nb_rx; + + /* can this lcore sleep? */ + if (!lcore_can_sleep(lcore_conf)) + return nb_rx; + + /* gather all monitoring conditions */ + ret = get_monitor_addresses(lcore_conf, pmc, + lcore_conf->n_queues); + if (ret < 0) + return nb_rx; + + rte_power_mo
[dpdk-dev] [PATCH v9 8/8] examples/l3fwd-power: support multiq in PMD modes
Currently, l3fwd-power enforces the limitation of having one queue per lcore. This is no longer necessary, so remove the limitation. Signed-off-by: Anatoly Burakov Tested-by: David Hunt --- examples/l3fwd-power/main.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index f8dfed1634..52f56dc405 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -2723,12 +2723,6 @@ main(int argc, char **argv) printf("\nInitializing rx queues on lcore %u ... ", lcore_id ); fflush(stdout); - /* PMD power management mode can only do 1 queue per core */ - if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) { - rte_exit(EXIT_FAILURE, - "In PMD power management mode, only one queue per lcore is allowed\n"); - } - /* init RX queues */ for(queue = 0; queue < qconf->n_rx_queue; ++queue) { struct rte_eth_rxconf rxq_conf; -- 2.25.1
Re: [dpdk-dev] [dpdk-stable] [EXT] [PATCH] compress/mlx5: fix memory region unregistration
08/07/2021 08:25, Michael Baum: > Hi, > > From: Akhil Goyal > > > The issue can cause illegal physical address access while a huge-page > > > A is released and huge-page B is allocated on the same virtual address. > > > The old MR can be matched using the virtual address of huge-page B but > > > the HW will access the physical address of huge-page A which is no > > > more part of the DPDK process. > > > > > > Register a driver callback for memory event in order to free out all > > > the MRs of memory that is going to be freed from the dpdk process. > > > > > > Fixes: f8c97babc9f4 ("compress/mlx5: add data-path functions") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Michael Baum > > > Acked-by: Matan Azrad > > > --- > > CI is reporting error with this patch. Can you check? > > This patch depends on this patch: > https://patchwork.dpdk.org/project/dpdk/patch/20210628150614.1769507-1-michae...@nvidia.com/ > It sent before and integrated to master-net-mlx branch, but not integrated to > main branch. > This deficiency probably causes these CI errors. Please send a v2 to trigger a new test, thanks.
[dpdk-dev] [PATCH v2 1/1] power: fix potential null dereferences
Currently, the error paths can lead to attempts at dereferencing NULL pointers. Add the check to avoid attempts at dereferencing NULL pointers. Coverity issue: 371895 Coverity issue: 371889 Fixes: 06cffd468fdd ("power: refactor ACPI and intel_pstate support") Cc: anatoly.bura...@intel.com Signed-off-by: Anatoly Burakov Reviewed-by: David Marchand --- lib/power/power_pstate_cpufreq.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c index ba28ddcfca..3b607515fd 100644 --- a/lib/power/power_pstate_cpufreq.c +++ b/lib/power/power_pstate_cpufreq.c @@ -440,8 +440,10 @@ power_get_available_freqs(struct pstate_power_info *pi) num_freqs, pi->lcore_id); out: - fclose(f_min); - fclose(f_max); + if (f_min != NULL) + fclose(f_min); + if (f_max != NULL) + fclose(f_max); return ret; } -- 2.25.1
Re: [dpdk-dev] [PATCH v9 3/8] doc: add PMD power management NIC feature
On 09-Jul-21 4:53 PM, Anatoly Burakov wrote: At this point, multiple different Ethernet drivers from multiple vendors will support the PMD power management scheme. It would be useful to add it to the NIC feature table to indicate support for it. Suggested-by: David Marchand Signed-off-by: Anatoly Burakov --- doc/guides/nics/features.rst | 10 ++ doc/guides/nics/features/default.ini | 1 + 2 files changed, 11 insertions(+) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 403c2b03a3..a96e12d155 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information. * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``. * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``. +.. _nic_features_get_monitor_addr: + +PMD power management using monitor addresses + + +Supports getting a monitoring condition to use together with Ethernet PMD power +management (see :doc:`../prog_guide/power_man` for more details). + +* **[implements] eth_dev_ops**: ``get_monitor_addr`` + .. _nic_features_other: Other dev ops not represented by a Feature diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini index 3b55e0ccb0..f1e947bd9e 100644 --- a/doc/guides/nics/features/default.ini +++ b/doc/guides/nics/features/default.ini @@ -76,6 +76,7 @@ x86-64 = Usage doc= Design doc = Perf doc = +Power mgmt address monitor = [rte_flow items] ah = Apologies, forgot to git add the driver files to the commit. Will respin shortly. -- Thanks, Anatoly
[dpdk-dev] [PATCH v10 0/8] Enhancements for PMD power management
This patchset introduces several changes related to PMD power management: - Changed monitoring intrinsics to use callbacks as a comparison function, based on previous patchset [1] but incorporating feedback [2] - this hopefully will make it possible to add support for .get_monitor_addr in virtio - Add a new intrinsic to monitor multiple addresses, based on RTM instruction set and the TPAUSE instruction - Add support for PMD power management on multiple queues, as well as all accompanying infrastructure and example apps changes v10: - Added missing changes to NIC feature .ini files v9: - Added all missing Acks and Tests - Added a new commit with NIC features - Addressed minor issues raised in review v8: - Fixed checkpatch issue - Added comment explaining empty poll handling (Konstantin) v7: - Fixed various bugs v6: - Improved the algorithm for multi-queue sleep - Fixed segfault and addressed other feedback v5: - Removed "power save queue" API and replaced with mechanism suggested by Konstantin - Addressed other feedback v4: - Replaced raw number with a macro - Fixed all the bugs found by Konstantin - Some other minor corrections v3: - Moved some doc updates to NIC features list v2: - Changed check inversion to callbacks - Addressed feedback from Konstantin - Added doc updates where necessary [1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=* [2] http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274 Anatoly Burakov (8): eal: use callbacks for power monitoring comparison net/af_xdp: add power monitor support doc: add PMD power management NIC feature eal: add power monitor for multiple events power: remove thread safety from PMD power API's power: support callbacks for multiple Rx queues power: support monitoring multiple Rx queues examples/l3fwd-power: support multiq in PMD modes doc/guides/nics/features.rst | 10 + doc/guides/nics/features/af_xdp.ini | 1 + doc/guides/nics/features/default.ini | 1 + doc/guides/nics/features/i40e.ini | 1 + doc/guides/nics/features/i40e_vf.ini | 1 + doc/guides/nics/features/iavf.ini | 1 + doc/guides/nics/features/ice.ini | 1 + doc/guides/nics/features/ixgbe.ini| 1 + doc/guides/nics/features/ixgbe_vf.ini | 1 + doc/guides/nics/features/mlx5.ini | 1 + doc/guides/prog_guide/power_man.rst | 74 +- doc/guides/rel_notes/release_21_08.rst| 11 + drivers/event/dlb2/dlb2.c | 17 +- drivers/net/af_xdp/rte_eth_af_xdp.c | 34 + drivers/net/i40e/i40e_rxtx.c | 20 +- drivers/net/iavf/iavf_rxtx.c | 20 +- drivers/net/ice/ice_rxtx.c| 20 +- drivers/net/ixgbe/ixgbe_rxtx.c| 20 +- drivers/net/mlx5/mlx5_rx.c| 17 +- examples/l3fwd-power/main.c | 6 - lib/eal/arm/rte_power_intrinsics.c| 11 + lib/eal/include/generic/rte_cpuflags.h| 2 + .../include/generic/rte_power_intrinsics.h| 68 +- lib/eal/ppc/rte_power_intrinsics.c| 11 + lib/eal/version.map | 3 + lib/eal/x86/rte_cpuflags.c| 2 + lib/eal/x86/rte_power_intrinsics.c| 90 ++- lib/power/meson.build | 3 + lib/power/rte_power_pmd_mgmt.c| 663 +- lib/power/rte_power_pmd_mgmt.h| 6 + 30 files changed, 855 insertions(+), 262 deletions(-) -- 2.25.1
[dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison
Previously, the semantics of power monitor were such that we were checking current value against the expected value, and if they matched, then the sleep was aborted. This is somewhat inflexible, because it only allowed us to check for a specific value in a specific way. This commit replaces the comparison with a user callback mechanism, so that any PMD (or other code) using `rte_power_monitor()` can define their own comparison semantics and decision making on how to detect the need to abort the entering of power optimized state. Existing implementations are adjusted to follow the new semantics. Suggested-by: Konstantin Ananyev Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt Acked-by: Timothy McDaniel --- Notes: v4: - Return error if callback is set to NULL - Replace raw number with a macro in monitor condition opaque data v2: - Use callback mechanism for more flexibility - Address feedback from Konstantin doc/guides/rel_notes/release_21_08.rst| 2 ++ drivers/event/dlb2/dlb2.c | 17 -- drivers/net/i40e/i40e_rxtx.c | 20 +++ drivers/net/iavf/iavf_rxtx.c | 20 +++ drivers/net/ice/ice_rxtx.c| 20 +++ drivers/net/ixgbe/ixgbe_rxtx.c| 20 +++ drivers/net/mlx5/mlx5_rx.c| 17 -- .../include/generic/rte_power_intrinsics.h| 33 +++ lib/eal/x86/rte_power_intrinsics.c| 17 +- 9 files changed, 122 insertions(+), 44 deletions(-) diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index 476822b47f..912fb13b84 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -144,6 +144,8 @@ API Changes * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string truncation. +* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism. + ABI Changes --- diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index eca183753f..252bbd8d5e 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num) } } +#define CLB_MASK_IDX 0 +#define CLB_VAL_IDX 1 +static int +dlb2_monitor_callback(const uint64_t val, + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) +{ + /* abort if the value matches */ + return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0; +} + static inline int dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, struct dlb2_eventdev_port *ev_port, @@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, expected_value = 0; pmc.addr = monitor_addr; - pmc.val = expected_value; - pmc.mask = qe_mask.raw_qe[1]; + /* store expected value and comparison mask in opaque data */ + pmc.opaque[CLB_VAL_IDX] = expected_value; + pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1]; + /* set up callback */ + pmc.fn = dlb2_monitor_callback; pmc.size = sizeof(uint64_t); rte_power_monitor(&pmc, timeout + start_ticks); diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index e518409fe5..8489f91f1d 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -81,6 +81,18 @@ #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK) +static int +i40e_monitor_callback(const uint64_t value, + const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused) +{ + const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* +* we expect the DD bit to be set to 1 if this descriptor was already +* written to. +*/ + return (value & m) == m ? -1 : 0; +} + int i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) { @@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) /* watch for changes in status bit */ pmc->addr = &rxdp->wb.qword1.status_error_len; - /* -* we expect the DD bit to be set to 1 if this descriptor was already -* written to. -*/ - pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); - pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + /* comparison callback */ + pmc->fn = i40e_monitor_callback; /* registers are 64-bit */ pmc->size = sizeof(uint64_t); diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c index f817fbc49b..d61b32fcee 100644 --- a/drivers/net/iavf/iavf_rxtx.c +++ b/drivers/net/iavf/iavf_rxtx.c @@ -57,6 +57,18 @@ iavf_pr
[dpdk-dev] [PATCH v10 2/8] net/af_xdp: add power monitor support
Implement support for .get_monitor_addr in AF_XDP driver. Signed-off-by: Anatoly Burakov --- Notes: v8: - Fix checkpatch issue v2: - Rewrite using the callback mechanism drivers/net/af_xdp/rte_eth_af_xdp.c | 34 + 1 file changed, 34 insertions(+) diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c index eb5660a3dc..989051dd6d 100644 --- a/drivers/net/af_xdp/rte_eth_af_xdp.c +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "compat.h" @@ -788,6 +789,38 @@ eth_dev_configure(struct rte_eth_dev *dev) return 0; } +#define CLB_VAL_IDX 0 +static int +eth_monitor_callback(const uint64_t value, + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]) +{ + const uint64_t v = opaque[CLB_VAL_IDX]; + const uint64_t m = (uint32_t)~0; + + /* if the value has changed, abort entering power optimized state */ + return (value & m) == v ? 0 : -1; +} + +static int +eth_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + struct pkt_rx_queue *rxq = rx_queue; + unsigned int *prod = rxq->rx.producer; + const uint32_t cur_val = rxq->rx.cached_prod; /* use cached value */ + + /* watch for changes in producer ring */ + pmc->addr = (void *)prod; + + /* store current value */ + pmc->opaque[CLB_VAL_IDX] = cur_val; + pmc->fn = eth_monitor_callback; + + /* AF_XDP producer ring index is 32-bit */ + pmc->size = sizeof(uint32_t); + + return 0; +} + static int eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) { @@ -1448,6 +1481,7 @@ static const struct eth_dev_ops ops = { .link_update = eth_link_update, .stats_get = eth_stats_get, .stats_reset = eth_stats_reset, + .get_monitor_addr = eth_get_monitor_addr }; /** parse busy_budget argument */ -- 2.25.1
[dpdk-dev] [PATCH v10 3/8] doc: add PMD power management NIC feature
At this point, multiple different Ethernet drivers from multiple vendors will support the PMD power management scheme. It would be useful to add it to the NIC feature table to indicate support for it. Suggested-by: David Marchand Signed-off-by: Anatoly Burakov --- Notes: v10: - Added missing NIC feature support in ini files doc/guides/nics/features.rst | 10 ++ doc/guides/nics/features/af_xdp.ini | 1 + doc/guides/nics/features/default.ini | 1 + doc/guides/nics/features/i40e.ini | 1 + doc/guides/nics/features/i40e_vf.ini | 1 + doc/guides/nics/features/iavf.ini | 1 + doc/guides/nics/features/ice.ini | 1 + doc/guides/nics/features/ixgbe.ini| 1 + doc/guides/nics/features/ixgbe_vf.ini | 1 + doc/guides/nics/features/mlx5.ini | 1 + 10 files changed, 19 insertions(+) diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst index 403c2b03a3..a96e12d155 100644 --- a/doc/guides/nics/features.rst +++ b/doc/guides/nics/features.rst @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information. * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``. * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``. +.. _nic_features_get_monitor_addr: + +PMD power management using monitor addresses + + +Supports getting a monitoring condition to use together with Ethernet PMD power +management (see :doc:`../prog_guide/power_man` for more details). + +* **[implements] eth_dev_ops**: ``get_monitor_addr`` + .. _nic_features_other: Other dev ops not represented by a Feature diff --git a/doc/guides/nics/features/af_xdp.ini b/doc/guides/nics/features/af_xdp.ini index 36953c2dec..4e3f638bf5 100644 --- a/doc/guides/nics/features/af_xdp.ini +++ b/doc/guides/nics/features/af_xdp.ini @@ -9,3 +9,4 @@ MTU update = Y Promiscuous mode = Y Stats per queue = Y x86-64 = Y +Power mgmt address monitor = Y diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini index 3b55e0ccb0..f1e947bd9e 100644 --- a/doc/guides/nics/features/default.ini +++ b/doc/guides/nics/features/default.ini @@ -76,6 +76,7 @@ x86-64 = Usage doc= Design doc = Perf doc = +Power mgmt address monitor = [rte_flow items] ah = diff --git a/doc/guides/nics/features/i40e.ini b/doc/guides/nics/features/i40e.ini index 1f3f5eb3ff..b6765d0e5a 100644 --- a/doc/guides/nics/features/i40e.ini +++ b/doc/guides/nics/features/i40e.ini @@ -51,6 +51,7 @@ x86-32 = Y x86-64 = Y ARMv8= Y Power8 = Y +Power mgmt address monitor = Y [rte_flow items] ah = Y diff --git a/doc/guides/nics/features/i40e_vf.ini b/doc/guides/nics/features/i40e_vf.ini index bac1bb4344..d5b163c1c1 100644 --- a/doc/guides/nics/features/i40e_vf.ini +++ b/doc/guides/nics/features/i40e_vf.ini @@ -37,3 +37,4 @@ FreeBSD = Y Linux= Y x86-32 = Y x86-64 = Y +Power mgmt address monitor = Y diff --git a/doc/guides/nics/features/iavf.ini b/doc/guides/nics/features/iavf.ini index 43a84a3bda..146b004da2 100644 --- a/doc/guides/nics/features/iavf.ini +++ b/doc/guides/nics/features/iavf.ini @@ -33,6 +33,7 @@ FreeBSD = Y Linux= Y x86-32 = Y x86-64 = Y +Power mgmt address monitor = Y [rte_flow items] ah = Y diff --git a/doc/guides/nics/features/ice.ini b/doc/guides/nics/features/ice.ini index 1b9228c678..fbc81c654d 100644 --- a/doc/guides/nics/features/ice.ini +++ b/doc/guides/nics/features/ice.ini @@ -42,6 +42,7 @@ Linux= Y Windows = Y x86-32 = Y x86-64 = Y +Power mgmt address monitor = Y [rte_flow items] ah = Y diff --git a/doc/guides/nics/features/ixgbe.ini b/doc/guides/nics/features/ixgbe.ini index 93a9cc18ab..92228fe194 100644 --- a/doc/guides/nics/features/ixgbe.ini +++ b/doc/guides/nics/features/ixgbe.ini @@ -54,6 +54,7 @@ Linux= Y ARMv8= Y x86-32 = Y x86-64 = Y +Power mgmt address monitor = Y [rte_flow items] eth = Y diff --git a/doc/guides/nics/features/ixgbe_vf.ini b/doc/guides/nics/features/ixgbe_vf.ini index 7161e61f9a..ea8342f2c9 100644 --- a/doc/guides/nics/features/ixgbe_vf.ini +++ b/doc/guides/nics/features/ixgbe_vf.ini @@ -38,3 +38,4 @@ Linux= Y ARMv8= Y x86-32 = Y x86-64 = Y +Power mgmt address monitor = Y diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index 3b82ce41fd..2c7d9f6e8c 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -51,6 +51,7 @@ Power8
[dpdk-dev] [PATCH v10 4/8] eal: add power monitor for multiple events
Use RTM and WAITPKG instructions to perform a wait-for-writes similar to what UMWAIT does, but without the limitation of having to listen for just one event. This works because the optimized power state used by the TPAUSE instruction will cause a wake up on RTM transaction abort, so if we add the addresses we're interested in to the read-set, any write to those addresses will wake us up. Signed-off-by: Konstantin Ananyev Signed-off-by: Anatoly Burakov Tested-by: David Hunt --- Notes: v4: - Fixed bugs in accessing the monitor condition - Abort on any monitor condition not having a defined callback v2: - Adapt to callback mechanism lib/eal/arm/rte_power_intrinsics.c| 11 +++ lib/eal/include/generic/rte_cpuflags.h| 2 + .../include/generic/rte_power_intrinsics.h| 35 + lib/eal/ppc/rte_power_intrinsics.c| 11 +++ lib/eal/version.map | 3 + lib/eal/x86/rte_cpuflags.c| 2 + lib/eal/x86/rte_power_intrinsics.c| 73 +++ 7 files changed, 137 insertions(+) diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c index e83f04072a..78f55b7203 100644 --- a/lib/eal/arm/rte_power_intrinsics.c +++ b/lib/eal/arm/rte_power_intrinsics.c @@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) return -ENOTSUP; } + +int +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp) +{ + RTE_SET_USED(pmc); + RTE_SET_USED(num); + RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; +} diff --git a/lib/eal/include/generic/rte_cpuflags.h b/lib/eal/include/generic/rte_cpuflags.h index 28a5aecde8..d35551e931 100644 --- a/lib/eal/include/generic/rte_cpuflags.h +++ b/lib/eal/include/generic/rte_cpuflags.h @@ -24,6 +24,8 @@ struct rte_cpu_intrinsics { /**< indicates support for rte_power_monitor function */ uint32_t power_pause : 1; /**< indicates support for rte_power_pause function */ + uint32_t power_monitor_multi : 1; + /**< indicates support for rte_power_monitor_multi function */ }; /** diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h index c9aa52a86d..04e8c2ab37 100644 --- a/lib/eal/include/generic/rte_power_intrinsics.h +++ b/lib/eal/include/generic/rte_power_intrinsics.h @@ -128,4 +128,39 @@ int rte_power_monitor_wakeup(const unsigned int lcore_id); __rte_experimental int rte_power_pause(const uint64_t tsc_timestamp); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor a set of addresses for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either one of the specified + * memory addresses is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, `expected` 64-bit values and 64-bit masks are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they do not match, the entering of + * optimized power state may be aborted. + * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. + * Failing to do so may result in an illegal CPU instruction error. + * + * @param pmc + * An array of monitoring condition structures. + * @param num + * Length of the `pmc` array. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * + * @return + * 0 on success + * -EINVAL on invalid parameters + * -ENOTSUP if unsupported + */ +__rte_experimental +int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp); + #endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/eal/ppc/rte_power_intrinsics.c b/lib/eal/ppc/rte_power_intrinsics.c index 7fc9586da7..f00b58ade5 100644 --- a/lib/eal/ppc/rte_power_intrinsics.c +++ b/lib/eal/ppc/rte_power_intrinsics.c @@ -38,3 +38,14 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) return -ENOTSUP; } + +int +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], + const uint32_t num, const uint64_t tsc_timestamp) +{ + RTE_SET_USED(pmc); + RTE_SET_USED(num); + RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; +} diff --git a/lib/eal/version.map b/lib/eal/version.map index 2df65c6903..887012d02a 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -423,6 +423,9 @@ EXPERIMENTAL { rte_version_release; # WINDOWS_NO_EXPORT rte_version_suffix; # WINDOWS_NO_EXPORT rte_version_year; # WINDOWS_NO_EXPORT + + # added in 21.08 + r
[dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's
Currently, we expect that only one callback can be active at any given moment, for a particular queue configuration, which is relatively easy to implement in a thread-safe way. However, we're about to add support for multiple queues per lcore, which will greatly increase the possibility of various race conditions. We could have used something like an RCU for this use case, but absent of a pressing need for thread safety we'll go the easy way and just mandate that the API's are to be called when all affected ports are stopped, and document this limitation. This greatly simplifies the `rte_power_monitor`-related code. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v2: - Add check for stopped queue - Clarified doc message - Added release notes doc/guides/rel_notes/release_21_08.rst | 4 + lib/power/meson.build | 3 + lib/power/rte_power_pmd_mgmt.c | 133 ++--- lib/power/rte_power_pmd_mgmt.h | 6 ++ 4 files changed, 66 insertions(+), 80 deletions(-) diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst index 912fb13b84..b9a3caabf0 100644 --- a/doc/guides/rel_notes/release_21_08.rst +++ b/doc/guides/rel_notes/release_21_08.rst @@ -146,6 +146,10 @@ API Changes * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism. +* rte_power: The experimental PMD power management API is no longer considered + to be thread safe; all Rx queues affected by the API will now need to be + stopped before making any changes to the power management scheme. + ABI Changes --- diff --git a/lib/power/meson.build b/lib/power/meson.build index 36e5a65874..bf937acde4 100644 --- a/lib/power/meson.build +++ b/lib/power/meson.build @@ -22,4 +22,7 @@ headers = files( 'rte_power_pmd_mgmt.h', 'rte_power_guest_channel.h', ) +if cc.has_argument('-Wno-cast-qual') +cflags += '-Wno-cast-qual' +endif deps += ['timer', 'ethdev'] diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c index db03cbf420..9b95cf1794 100644 --- a/lib/power/rte_power_pmd_mgmt.c +++ b/lib/power/rte_power_pmd_mgmt.c @@ -40,8 +40,6 @@ struct pmd_queue_cfg { /**< Callback mode for this queue */ const struct rte_eth_rxtx_callback *cur_cb; /**< Callback instance */ - volatile bool umwait_in_progress; - /**< are we currently sleeping? */ uint64_t empty_poll_stats; /**< Number of empty polls */ } __rte_cache_aligned; @@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, struct rte_power_monitor_cond pmc; uint16_t ret; - /* -* we might get a cancellation request while being -* inside the callback, in which case the wakeup -* wouldn't work because it would've arrived too early. -* -* to get around this, we notify the other thread that -* we're sleeping, so that it can spin until we're done. -* unsolicited wakeups are perfectly safe. -*/ - q_conf->umwait_in_progress = true; - - rte_atomic_thread_fence(__ATOMIC_SEQ_CST); - - /* check if we need to cancel sleep */ - if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { - /* use monitoring condition to sleep */ - ret = rte_eth_get_monitor_addr(port_id, qidx, - &pmc); - if (ret == 0) - rte_power_monitor(&pmc, UINT64_MAX); - } - q_conf->umwait_in_progress = false; - - rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + /* use monitoring condition to sleep */ + ret = rte_eth_get_monitor_addr(port_id, qidx, + &pmc); + if (ret == 0) + rte_power_monitor(&pmc, UINT64_MAX); } } else q_conf->empty_poll_stats = 0; @@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx, return nb_rx; } +static int +queue_stopped(const uint16_t port_id, const uint16_t queue_id) +{ + struct rte_eth_rxq_info qinfo; + + if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0) + return -1; + + return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED; +} + int rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id, uint16_t queue_id, enum rte_power_pmd_mgmt_type mode) { struct
[dpdk-dev] [PATCH v10 6/8] power: support callbacks for multiple Rx queues
Currently, there is a hard limitation on the PMD power management support that only allows it to support a single queue per lcore. This is not ideal as most DPDK use cases will poll multiple queues per core. The PMD power management mechanism relies on ethdev Rx callbacks, so it is very difficult to implement such support because callbacks are effectively stateless and have no visibility into what the other ethdev devices are doing. This places limitations on what we can do within the framework of Rx callbacks, but the basics of this implementation are as follows: - Replace per-queue structures with per-lcore ones, so that any device polled from the same lcore can share data - Any queue that is going to be polled from a specific lcore has to be added to the list of queues to poll, so that the callback is aware of other queues being polled by the same lcore - Both the empty poll counter and the actual power saving mechanism is shared between all queues polled on a particular lcore, and is only activated when all queues in the list were polled and were determined to have no traffic. - The limitation on UMWAIT-based polling is not removed because UMWAIT is incapable of monitoring more than one address. Also, while we're at it, update and improve the docs. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v8: - Added a comment explaining that we want to sleep on each empty poll after threshold has been reached v7: - Fix bug where initial sleep target was always set to zero - Fix logic in handling of n_queues_ready_to_sleep counter - Update documentation on hardware requirements v6: - Track each individual queue sleep status (Konstantin) - Fix segfault (Dave) v5: - Remove the "power save queue" API and replace it with mechanism suggested by Konstantin v3: - Move the list of supported NICs to NIC feature table v2: - Use a TAILQ for queues instead of a static array - Address feedback from Konstantin - Add additional checks for stopped queues doc/guides/prog_guide/power_man.rst| 69 ++-- doc/guides/rel_notes/release_21_08.rst | 5 + lib/power/rte_power_pmd_mgmt.c | 460 +++-- 3 files changed, 398 insertions(+), 136 deletions(-) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index c70ae128ac..0e66878892 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -198,34 +198,45 @@ Ethernet PMD Power Management API Abstract -Existing power management mechanisms require developers -to change application design or change code to make use of it. -The PMD power management API provides a convenient alternative -by utilizing Ethernet PMD RX callbacks, -and triggering power saving whenever empty poll count reaches a certain number. - -Monitor - This power saving scheme will put the CPU into optimized power state - and use the ``rte_power_monitor()`` function - to monitor the Ethernet PMD RX descriptor address, - and wake the CPU up whenever there's new traffic. - -Pause - This power saving scheme will avoid busy polling - by either entering power-optimized sleep state - with ``rte_power_pause()`` function, - or, if it's not available, use ``rte_pause()``. - -Frequency scaling - This power saving scheme will use ``librte_power`` library - functionality to scale the core frequency up/down - depending on traffic volume. - -.. note:: - - Currently, this power management API is limited to mandatory mapping - of 1 queue to 1 core (multiple queues are supported, - but they must be polled from different cores). +Existing power management mechanisms require developers to change application +design or change code to make use of it. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever empty poll count reaches a certain number. + +* Monitor + This power saving scheme will put the CPU into optimized power state and + monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever + there's new traffic. Support for this scheme may not be available on all + platforms, and further limitations may apply (see below). + +* Pause + This power saving scheme will avoid busy polling by either entering + power-optimized sleep state with ``rte_power_pause()`` function, or, if it's + not supported by the underlying platform, use ``rte_pause()``. + +* Frequency scaling + This power saving scheme will use ``librte_power`` library functionality to + scale the core frequency up/down depending on traffic volume. + +The "monitor" mode is only supported in the following configurations and scenarios: + +* On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set being + supported by the CPU. Please refe
[dpdk-dev] [PATCH v10 7/8] power: support monitoring multiple Rx queues
Use the new multi-monitor intrinsic to allow monitoring multiple ethdev Rx queues while entering the energy efficient power state. The multi version will be used unconditionally if supported, and the UMWAIT one will only be used when multi-monitor is not supported by the hardware. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Tested-by: David Hunt --- Notes: v6: - Fix the missed feedback from v5 v4: - Fix possible out of bounds access - Added missing index increment doc/guides/prog_guide/power_man.rst | 15 -- lib/power/rte_power_pmd_mgmt.c | 82 - 2 files changed, 90 insertions(+), 7 deletions(-) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0e66878892..e387d7811e 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -221,17 +221,22 @@ power saving whenever empty poll count reaches a certain number. The "monitor" mode is only supported in the following configurations and scenarios: * On Linux* x86_64, `rte_power_monitor()` requires WAITPKG instruction set being - supported by the CPU. Please refer to your platform documentation for further - information. + supported by the CPU, while `rte_power_monitor_multi()` requires WAITPKG and + RTM instruction sets being supported by the CPU. RTM instruction set may also + require booting the Linux with `tsx=on` command line parameter. Please refer + to your platform documentation for further information. * If ``rte_cpu_get_intrinsics_support()`` function indicates that + ``rte_power_monitor_multi()`` function is supported by the platform, then + monitoring multiple Ethernet Rx queues for traffic will be supported. + +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only ``rte_power_monitor()`` is supported by the platform, then monitoring will be limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be monitored from a different lcore). -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the - ``rte_power_monitor()`` function is not supported, then monitor mode will not - be supported. +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the + two monitoring functions are supported, then monitor mode will not be supported. * Not all Ethernet drivers support monitoring, even if the underlying platform may support the necessary CPU instructions. Please refer to diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c index 30772791af..2586204b93 100644 --- a/lib/power/rte_power_pmd_mgmt.c +++ b/lib/power/rte_power_pmd_mgmt.c @@ -126,6 +126,32 @@ queue_list_take(struct pmd_core_cfg *cfg, const union queue *q) return found; } +static inline int +get_monitor_addresses(struct pmd_core_cfg *cfg, + struct rte_power_monitor_cond *pmc, size_t len) +{ + const struct queue_list_entry *qle; + size_t i = 0; + int ret; + + TAILQ_FOREACH(qle, &cfg->head, next) { + const union queue *q = &qle->queue; + struct rte_power_monitor_cond *cur; + + /* attempted out of bounds access */ + if (i >= len) { + RTE_LOG(ERR, POWER, "Too many queues being monitored\n"); + return -1; + } + + cur = &pmc[i++]; + ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur); + if (ret < 0) + return ret; + } + return 0; +} + static void calc_tsc(void) { @@ -215,6 +241,46 @@ lcore_can_sleep(struct pmd_core_cfg *cfg) return true; } +static uint16_t +clb_multiwait(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *arg) +{ + const unsigned int lcore = rte_lcore_id(); + struct queue_list_entry *queue_conf = arg; + struct pmd_core_cfg *lcore_conf; + const bool empty = nb_rx == 0; + + lcore_conf = &lcore_cfgs[lcore]; + + /* early exit */ + if (likely(!empty)) + /* early exit */ + queue_reset(lcore_conf, queue_conf); + else { + struct rte_power_monitor_cond pmc[lcore_conf->n_queues]; + int ret; + + /* can this queue sleep? */ + if (!queue_can_sleep(lcore_conf, queue_conf)) + return nb_rx; + + /* can this lcore sleep? */ + if (!lcore_can_sleep(lcore_conf)) + return nb_rx; + + /* gather all monitoring conditions */ + ret = get_monitor_addresses(lcore_conf, pmc, + lcore_conf->n_queues); + if (ret < 0) + return nb_rx; + + rte_power_mo
[dpdk-dev] [PATCH v10 8/8] examples/l3fwd-power: support multiq in PMD modes
Currently, l3fwd-power enforces the limitation of having one queue per lcore. This is no longer necessary, so remove the limitation. Signed-off-by: Anatoly Burakov Tested-by: David Hunt --- examples/l3fwd-power/main.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index f8dfed1634..52f56dc405 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -2723,12 +2723,6 @@ main(int argc, char **argv) printf("\nInitializing rx queues on lcore %u ... ", lcore_id ); fflush(stdout); - /* PMD power management mode can only do 1 queue per core */ - if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) { - rte_exit(EXIT_FAILURE, - "In PMD power management mode, only one queue per lcore is allowed\n"); - } - /* init RX queues */ for(queue = 0; queue < qconf->n_rx_queue; ++queue) { struct rte_eth_rxconf rxq_conf; -- 2.25.1
Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
On Fri, Jul 09, 2021 at 01:34:06PM +0200, Thomas Monjalon wrote: > 09/07/2021 02:16, Tyler Retzlaff: > > On Thu, Jul 08, 2021 at 10:39:13PM +0200, Thomas Monjalon wrote: > > > 08/07/2021 21:21, Tyler Retzlaff: > > > > (2) importing exported data symbols from a dll/dso on windows requires > > > > that the symbol be decorated with dllimport. optionally loading > > > > performance of dll/dso is also further improved by decorating > > > > exported function symbols. [3] > > > > > > > > for (2) we would propose the introduction and use of two macros to > > > > allow decoration of exported data symbols. these macro would be or > > > > similarly named __rte_import and __rte_export. of note > > > > > > That's the same symbol declared in a single place > > > which is exported and imported. > > > So I don't understand the need for 2 macros. > > > > i may be misinterpreting your reply. you're saying there is no need for > > 2 because we use .def files? > > > > strictly speaking when exporting C symbols this is true. so yes, we > > could introduce only __rte_import and not bother with __rte_export. > > > > is that what you meant? > > > > i don't have any objection to just __rte_import alone but it is > > mandatory for data symbols. > > It may be my misunderstanding. > The function is declared only once in the .h > so I don't understand where these 2 macros are used. > okay, i think the question is about mechanically where they are used and i'll limit my answer to data symbols since that's what we need them for. the export is typically used at the point of definition and the import at declaration. so ignoring anything about tls the following example hopefully illustrates. let's start with a module i'll just call it eal.dll it is declaring a data symbol that will be exported and may be referenced internally within eal.dll itself but also externally by other modules outside of eal.dll. // eal.c __rte_export int eal_some_var; // eal.h #ifndef BUILDING_EAL __rte_import #endif extern int eal_some_var; // eal_other.c #define BUILDING_EAL #include eal_some_var = 100; we also have another module app.exe that consumes the variable exported from eal.dll. // app.c #include eal_some_var = 200; as mentioned in my first reply you can get away with not using __rte_export because dpdk also lists the exports in the .map/.def files even for data symbols but it is required for import [1]. the relevant text from the documentation is quoted here. Using __declspec(dllimport) is optional on function declarations, but the compiler produces more efficient code if you use this keyword. However, you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols and objects. Note that the users of your DLL still need to link with an import library. specifically "you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols" is relevant. the other internal/external conditional compile for BUILDING_EAL is applied to avoid generating the code for the extra layer of indirection mentioned here [2] when using the variable in the same module in which it was defined. 1. https://docs.microsoft.com/en-us/cpp/build/importing-into-an-application-using-declspec-dllimport?view=msvc-160 2. https://docs.microsoft.com/en-us/cpp/build/importing-data-using-declspec-dllimport?view=msvc-160
[dpdk-dev] [PATCH] examples/pipeline: add FIB example
Add example for FIB with VRF and ECMP support. Signed-off-by: Cristian Dumitrescu Signed-off-by: Churchill Khangar --- Depends-on: patch-17716 ("[V2] pipeline: add support for LPM lookup") examples/pipeline/examples/fib.cli| 58 ++ examples/pipeline/examples/fib.spec | 171 ++ .../examples/fib_nexthop_group_table.txt | 54 ++ .../pipeline/examples/fib_nexthop_table.txt | 22 +++ .../pipeline/examples/fib_routing_table.txt | 30 +++ 5 files changed, 335 insertions(+) create mode 100644 examples/pipeline/examples/fib.cli create mode 100644 examples/pipeline/examples/fib.spec create mode 100644 examples/pipeline/examples/fib_nexthop_group_table.txt create mode 100644 examples/pipeline/examples/fib_nexthop_table.txt create mode 100644 examples/pipeline/examples/fib_routing_table.txt diff --git a/examples/pipeline/examples/fib.cli b/examples/pipeline/examples/fib.cli new file mode 100644 index 0..b20aed3cf --- /dev/null +++ b/examples/pipeline/examples/fib.cli @@ -0,0 +1,58 @@ +; SPDX-License-Identifier: BSD-3-Clause +; Copyright(c) 2020 Intel Corporation + +; +; Customize the LINK parameters to match your setup. +; +mempool MEMPOOL0 buffer 2304 pool 32K cache 256 cpu 0 + +link LINK0 dev :18:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK1 dev :18:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK2 dev :3b:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK3 dev :3b:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on + +; +; PIPELINE0 setup. +; +pipeline PIPELINE0 create 0 + +pipeline PIPELINE0 port in 0 link LINK0 rxq 0 bsz 32 +pipeline PIPELINE0 port in 1 link LINK1 rxq 0 bsz 32 +pipeline PIPELINE0 port in 2 link LINK2 rxq 0 bsz 32 +pipeline PIPELINE0 port in 3 link LINK3 rxq 0 bsz 32 + +pipeline PIPELINE0 port out 0 link LINK0 txq 0 bsz 32 +pipeline PIPELINE0 port out 1 link LINK1 txq 0 bsz 32 +pipeline PIPELINE0 port out 2 link LINK2 txq 0 bsz 32 +pipeline PIPELINE0 port out 3 link LINK3 txq 0 bsz 32 +pipeline PIPELINE0 port out 4 sink none + +pipeline PIPELINE0 build ./examples/pipeline/examples/fib.spec + +; +; Initial set of table entries. +; +; The table entries can later be updated at run-time through the CLI commands. Once the application +; has been successfully started, the command to get the CLI prompt is: telnet 0.0.0.0 8086. +; +pipeline PIPELINE0 table routing_table add ./examples/pipeline/examples/fib_routing_table.txt +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group add +pipeline PIPELINE0 selector nexthop_group_table group member add ./examples/pipeline/examples/fib_nexthop_group_table.txt +pipeline PIPELINE0 table nexthop_table add ./examples/pipeline/examples/fib_nexthop_table.txt +pipeline PIPELINE0 commit + +; +; Pipelines-to-threads mapping. +; +thread 1 pipeline PIPELINE0 enable diff --git a/examples/pipeline/examples/fib.spec b/examples/pipeline/examples/fib.spec new file mode 100644 index 0..3f8c76610 --- /dev/null +++ b/examples/pipeline/examples/fib.spec @@ -0,0 +1,171 @@ +; SPDX-License-Identifier: BSD-3-Clause +; Copyright(c) 2021 Intel Corporation + +; This example illustrates a FIB [1] with VRF [2] and ECMP [3] support. A FIB essentially is the +; data plane copy of the routing table. The VRF support allows for multiple logical routing tables +; to co-exist as part of the same "physical" routing table; the VRF ID typically identifies the +; logical table to provide the matching route for the IP destination address of the input packet. +; The ECMP provides a load balancing mechanism for the packet forwarding by allowing for multiple +; next hops (of equal or different weights, in case of WCMP [4]) to be provided for each route. +; +; In this example, the VRF ID is read from the IP source address of the input packet as opposed to a +; more complex classification scheme being used. The routing table produces the ID of the group of +; next hops associated with the current route, out of which a single next hop is selected based on a +; hashing scheme that preserves the packet order within each flow (with the flow defined here by a +; typical 3-tuple) by always selecting the same next hop for packets that are part of the same flow. +; The next hop provides the Ethernet header and the output port for
[dpdk-dev] [PATCH 2/4] ethdev: move jumbo frame offload check to library
Setting MTU bigger than RTE_ETHER_MTU requires the jumbo frame support, and application should enable the jumbo frame offload support for it. When jumbo frame offload is not enabled by application, but MTU bigger than RTE_ETHER_MTU is requested there are two options, either fail or enable jumbo frame offload implicitly. Enabling jumbo frame offload implicitly is selected by many drivers since setting a big MTU value already implies it, and this increases usability. This patch moves this logic from drivers to the library, both to reduce the duplicated code in the drivers and to make behaviour more visible. Signed-off-by: Ferruh Yigit --- drivers/net/axgbe/axgbe_ethdev.c| 9 ++--- drivers/net/bnxt/bnxt_ethdev.c | 9 ++--- drivers/net/cnxk/cnxk_ethdev_ops.c | 5 - drivers/net/cxgbe/cxgbe_ethdev.c| 8 drivers/net/dpaa/dpaa_ethdev.c | 7 --- drivers/net/dpaa2/dpaa2_ethdev.c| 7 --- drivers/net/e1000/em_ethdev.c | 9 ++--- drivers/net/e1000/igb_ethdev.c | 9 ++--- drivers/net/enetc/enetc_ethdev.c| 7 --- drivers/net/hinic/hinic_pmd_ethdev.c| 7 --- drivers/net/hns3/hns3_ethdev.c | 8 drivers/net/hns3/hns3_ethdev_vf.c | 6 -- drivers/net/i40e/i40e_ethdev.c | 5 - drivers/net/i40e/i40e_ethdev_vf.c | 5 - drivers/net/iavf/iavf_ethdev.c | 7 --- drivers/net/ice/ice_ethdev.c| 5 - drivers/net/igc/igc_ethdev.c| 9 ++--- drivers/net/ipn3ke/ipn3ke_representor.c | 5 - drivers/net/ixgbe/ixgbe_ethdev.c| 7 ++- drivers/net/liquidio/lio_ethdev.c | 7 --- drivers/net/nfp/nfp_net.c | 6 -- drivers/net/octeontx/octeontx_ethdev.c | 5 - drivers/net/octeontx2/otx2_ethdev_ops.c | 5 - drivers/net/qede/qede_ethdev.c | 4 drivers/net/sfc/sfc_ethdev.c| 9 - drivers/net/thunderx/nicvf_ethdev.c | 6 -- drivers/net/txgbe/txgbe_ethdev.c| 6 -- lib/ethdev/rte_ethdev.c | 18 +- 28 files changed, 29 insertions(+), 171 deletions(-) diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c index 76aeec077f2b..2960834b4539 100644 --- a/drivers/net/axgbe/axgbe_ethdev.c +++ b/drivers/net/axgbe/axgbe_ethdev.c @@ -1492,15 +1492,10 @@ static int axgb_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) dev->data->port_id); return -EBUSY; } - if (mtu > RTE_ETHER_MTU) { - dev->data->dev_conf.rxmode.offloads |= - DEV_RX_OFFLOAD_JUMBO_FRAME; + if (mtu > RTE_ETHER_MTU) val = 1; - } else { - dev->data->dev_conf.rxmode.offloads &= - ~DEV_RX_OFFLOAD_JUMBO_FRAME; + else val = 0; - } AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val); return 0; } diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 335505a106d5..4344a012f06e 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3018,15 +3018,10 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu) return -EINVAL; } - if (new_mtu > RTE_ETHER_MTU) { + if (new_mtu > RTE_ETHER_MTU) bp->flags |= BNXT_FLAG_JUMBO; - bp->eth_dev->data->dev_conf.rxmode.offloads |= - DEV_RX_OFFLOAD_JUMBO_FRAME; - } else { - bp->eth_dev->data->dev_conf.rxmode.offloads &= - ~DEV_RX_OFFLOAD_JUMBO_FRAME; + else bp->flags &= ~BNXT_FLAG_JUMBO; - } /* Is there a change in mtu setting? */ if (eth_dev->data->mtu == new_mtu) diff --git a/drivers/net/cnxk/cnxk_ethdev_ops.c b/drivers/net/cnxk/cnxk_ethdev_ops.c index 695d0d6fd3e2..349896f6a1bf 100644 --- a/drivers/net/cnxk/cnxk_ethdev_ops.c +++ b/drivers/net/cnxk/cnxk_ethdev_ops.c @@ -439,11 +439,6 @@ cnxk_nix_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu) plt_err("Failed to max Rx frame length, rc=%d", rc); goto exit; } - - if (mtu > RTE_ETHER_MTU) - dev->rx_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME; - else - dev->rx_offloads &= ~DEV_RX_OFFLOAD_JUMBO_FRAME; exit: return rc; } diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c index 8cf61f12a8d6..0c9cc2f5bb3f 100644 --- a/drivers/net/cxgbe/cxgbe_ethdev.c +++ b/drivers/net/cxgbe/cxgbe_ethdev.c @@ -313,14 +313,6 @@ int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu) if (mtu < RTE_ETHER_MIN_MTU || new_mtu > dev_info.max_rx_pktlen) return -EINVAL; - /* set to jumbo mode if needed */ - if (mtu > RTE_ETHER_MTU) - eth_
[dpdk-dev] [PATCH 3/4] ethdev: move check to library for MTU set
Move requested MTU value check to the API to prevent the duplicated code. Signed-off-by: Ferruh Yigit --- drivers/net/axgbe/axgbe_ethdev.c| 15 --- drivers/net/bnxt/bnxt_ethdev.c | 2 +- drivers/net/cxgbe/cxgbe_ethdev.c| 13 + drivers/net/dpaa/dpaa_ethdev.c | 2 -- drivers/net/dpaa2/dpaa2_ethdev.c| 4 drivers/net/e1000/em_ethdev.c | 10 -- drivers/net/e1000/igb_ethdev.c | 11 --- drivers/net/enetc/enetc_ethdev.c| 4 drivers/net/hinic/hinic_pmd_ethdev.c| 8 +--- drivers/net/i40e/i40e_ethdev.c | 17 - drivers/net/i40e/i40e_ethdev_vf.c | 17 - drivers/net/iavf/iavf_ethdev.c | 10 ++ drivers/net/ice/ice_ethdev.c| 14 +++--- drivers/net/igc/igc_ethdev.c| 5 - drivers/net/ipn3ke/ipn3ke_representor.c | 6 -- drivers/net/liquidio/lio_ethdev.c | 10 -- drivers/net/nfp/nfp_net.c | 4 drivers/net/octeontx/octeontx_ethdev.c | 4 drivers/net/octeontx2/otx2_ethdev_ops.c | 5 - drivers/net/qede/qede_ethdev.c | 12 drivers/net/thunderx/nicvf_ethdev.c | 6 -- drivers/net/txgbe/txgbe_ethdev.c| 10 -- lib/ethdev/rte_ethdev.c | 9 + 23 files changed, 29 insertions(+), 169 deletions(-) diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c index 2960834b4539..c36cd7b1d2f0 100644 --- a/drivers/net/axgbe/axgbe_ethdev.c +++ b/drivers/net/axgbe/axgbe_ethdev.c @@ -1478,25 +1478,18 @@ axgbe_dev_supported_ptypes_get(struct rte_eth_dev *dev) static int axgb_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) { - struct rte_eth_dev_info dev_info; struct axgbe_port *pdata = dev->data->dev_private; - uint32_t frame_size = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; - unsigned int val = 0; - axgbe_dev_info_get(dev, &dev_info); - /* check that mtu is within the allowed range */ - if (mtu < RTE_ETHER_MIN_MTU || frame_size > dev_info.max_rx_pktlen) - return -EINVAL; + unsigned int val; + /* mtu setting is forbidden if port is start */ if (dev->data->dev_started) { PMD_DRV_LOG(ERR, "port %d must be stopped before configuration", dev->data->port_id); return -EBUSY; } - if (mtu > RTE_ETHER_MTU) - val = 1; - else - val = 0; + val = mtu > RTE_ETHER_MTU ? 1 : 0; AXGMAC_IOWRITE_BITS(pdata, MAC_RCR, JE, val); + return 0; } diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 4344a012f06e..1e7da8ba61a6 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -2991,7 +2991,7 @@ int bnxt_mtu_set_op(struct rte_eth_dev *eth_dev, uint16_t new_mtu) uint32_t overhead = BNXT_MAX_PKT_LEN - BNXT_MAX_MTU; struct bnxt *bp = eth_dev->data->dev_private; uint32_t new_pkt_size; - uint32_t rc = 0; + uint32_t rc; uint32_t i; rc = is_bnxt_in_error(bp); diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c index 0c9cc2f5bb3f..70b879fed100 100644 --- a/drivers/net/cxgbe/cxgbe_ethdev.c +++ b/drivers/net/cxgbe/cxgbe_ethdev.c @@ -301,21 +301,10 @@ int cxgbe_dev_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu) { struct port_info *pi = eth_dev->data->dev_private; struct adapter *adapter = pi->adapter; - struct rte_eth_dev_info dev_info; - int err; uint16_t new_mtu = mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN; - err = cxgbe_dev_info_get(eth_dev, &dev_info); - if (err != 0) - return err; - - /* Must accommodate at least RTE_ETHER_MIN_MTU */ - if (mtu < RTE_ETHER_MIN_MTU || new_mtu > dev_info.max_rx_pktlen) - return -EINVAL; - - err = t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1, + return t4_set_rxmode(adapter, adapter->mbox, pi->viid, new_mtu, -1, -1, -1, -1, true); - return err; } /* diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c index a444f749bb96..60dd4f67fc26 100644 --- a/drivers/net/dpaa/dpaa_ethdev.c +++ b/drivers/net/dpaa/dpaa_ethdev.c @@ -167,8 +167,6 @@ dpaa_mtu_set(struct rte_eth_dev *dev, uint16_t mtu) PMD_INIT_FUNC_TRACE(); - if (mtu < RTE_ETHER_MIN_MTU || frame_size > DPAA_MAX_RX_PKT_LEN) - return -EINVAL; /* * Refuse mtu that requires the support of scattered packets * when this feature has not been enabled before. diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c index be2858b3adac..6b44b0557e6a 100644 --- a/drivers/net/dpaa2/dpaa2_ethdev.c +++ b/drivers/net
[dpdk-dev] [PATCH 4/4] ethdev: remove jumbo offload flag
Removing 'DEV_RX_OFFLOAD_JUMBO_FRAME' offload flag. Instead of drivers announce this capability, application can deduct the capability by checking reported 'dev_info.max_mtu' or 'dev_info.max_rx_pktlen'. And instead of application explicitly set this flag to enable jumbo frames, this can be deducted by driver by comparing requested 'mtu' to 'RTE_ETHER_MTU'. Removing this additional configuration for simplification. Signed-off-by: Ferruh Yigit --- app/test-eventdev/test_pipeline_common.c | 2 - app/test-pmd/cmdline.c| 2 +- app/test-pmd/config.c | 24 +- app/test-pmd/testpmd.c| 46 +-- app/test-pmd/testpmd.h| 2 +- doc/guides/howto/debug_troubleshoot.rst | 2 - doc/guides/nics/bnxt.rst | 1 - doc/guides/nics/features.rst | 3 +- drivers/net/atlantic/atl_ethdev.c | 1 - drivers/net/axgbe/axgbe_ethdev.c | 1 - drivers/net/bnx2x/bnx2x_ethdev.c | 1 - drivers/net/bnxt/bnxt.h | 1 - drivers/net/bnxt/bnxt_ethdev.c| 10 +--- drivers/net/bonding/rte_eth_bond_pmd.c| 8 drivers/net/cnxk/cnxk_ethdev.h| 5 +- drivers/net/cnxk/cnxk_ethdev_ops.c| 1 - drivers/net/cxgbe/cxgbe.h | 1 - drivers/net/cxgbe/cxgbe_ethdev.c | 8 drivers/net/cxgbe/sge.c | 5 +- drivers/net/dpaa/dpaa_ethdev.c| 2 - drivers/net/dpaa2/dpaa2_ethdev.c | 2 - drivers/net/e1000/e1000_ethdev.h | 4 +- drivers/net/e1000/em_ethdev.c | 4 +- drivers/net/e1000/em_rxtx.c | 19 +++- drivers/net/e1000/igb_rxtx.c | 3 +- drivers/net/ena/ena_ethdev.c | 2 - drivers/net/enetc/enetc_ethdev.c | 3 +- drivers/net/enic/enic_res.c | 1 - drivers/net/failsafe/failsafe_ops.c | 2 - drivers/net/fm10k/fm10k_ethdev.c | 1 - drivers/net/hinic/hinic_pmd_ethdev.c | 1 - drivers/net/hns3/hns3_ethdev.c| 1 - drivers/net/hns3/hns3_ethdev_vf.c | 1 - drivers/net/i40e/i40e_ethdev.c| 1 - drivers/net/i40e/i40e_ethdev_vf.c | 3 +- drivers/net/i40e/i40e_rxtx.c | 2 +- drivers/net/iavf/iavf_ethdev.c| 3 +- drivers/net/ice/ice_dcf_ethdev.c | 3 +- drivers/net/ice/ice_dcf_vf_representor.c | 1 - drivers/net/ice/ice_ethdev.c | 1 - drivers/net/ice/ice_rxtx.c| 3 +- drivers/net/igc/igc_ethdev.h | 1 - drivers/net/igc/igc_txrx.c| 2 +- drivers/net/ionic/ionic_ethdev.c | 1 - drivers/net/ipn3ke/ipn3ke_representor.c | 3 +- drivers/net/ixgbe/ixgbe_ethdev.c | 5 +- drivers/net/ixgbe/ixgbe_pf.c | 9 +--- drivers/net/ixgbe/ixgbe_rxtx.c| 3 +- drivers/net/mlx4/mlx4_rxq.c | 1 - drivers/net/mlx5/mlx5_rxq.c | 1 - drivers/net/mvneta/mvneta_ethdev.h| 3 +- drivers/net/mvpp2/mrvl_ethdev.c | 1 - drivers/net/nfp/nfp_net.c | 6 +-- drivers/net/octeontx/octeontx_ethdev.h| 1 - drivers/net/octeontx2/otx2_ethdev.h | 1 - drivers/net/octeontx_ep/otx_ep_ethdev.c | 3 +- drivers/net/octeontx_ep/otx_ep_rxtx.c | 6 --- drivers/net/qede/qede_ethdev.c| 1 - drivers/net/sfc/sfc_rx.c | 2 - drivers/net/thunderx/nicvf_ethdev.h | 1 - drivers/net/txgbe/txgbe_rxtx.c| 1 - drivers/net/virtio/virtio_ethdev.c| 1 - drivers/net/vmxnet3/vmxnet3_ethdev.c | 1 - examples/ip_fragmentation/main.c | 3 +- examples/ip_reassembly/main.c | 3 +- examples/ipsec-secgw/ipsec-secgw.c| 2 - examples/ipv4_multicast/main.c| 1 - examples/kni/main.c | 5 -- examples/l3fwd-acl/main.c | 2 - examples/l3fwd-graph/main.c | 1 - examples/l3fwd-power/main.c | 2 - examples/l3fwd/main.c | 1 - .../performance-thread/l3fwd-thread/main.c| 2 - examples/vhost/main.c | 2 - lib/ethdev/rte_ethdev.c | 26 +-- lib/ethdev/rte_ethdev.h | 1 - 76 files changed, 42 insertions(+), 250 deletions(-) diff --git a/app/test-eventdev/test_pipeline_common.c b/app/test-eventdev/test_pipeline_common.c index 5fcea74b4d43..2775e72c580d 100644 --- a/app/test-eventdev/test_pipeline_common.c +++ b/app/test-eventdev/test_pipeline_common.c @@ -199,8 +199,6 @@ pipeline_ethdev_setup(struct evt_test *test, stru
Re: [dpdk-dev] [PATCH v18 0/7] aarch64 -> aarch32 cross compilation support
07/07/2021 15:25, Juraj Linkeš: > Add support for aarch32 cross build in meson. > > Aarch32 is an execution state that allows execution of 32-bit code on > armv8 machines. This execution state contains a superset of previous > armv7 32-bit instructions and features. Thus the aarch32 build is distinct > from arvm7 build. > > v18: > Rebased, adjusted to use arm soc format. > > Acked-by: Aaron Conole > > Juraj Linkeš (4): > net/virtio: fix aarch32 build > eal/arm: update CPU flags > build: add aarch32 meson build flags > build: add aarch32 to meson cross-compilation > > Phil Yang (1): > doc: add aarch32 build guidance > > Ruifeng Wang (2): > net/sfc: fix aarch32 build > net/bnxt: fix aarch32 build Applied, thanks. I did few small improvements in the doc.
[dpdk-dev] RHEL 7 support
Hi, I would like to open a discussion about RHEL 7 support in DPDK. How long do we want to support it in new DPDK versions? Can we drop RHEL 7 support starting DPDK 21.11? If we decide to drop RHEL 7 support, does it mean we can generally use standard C11 atomics? What other benefits or impacts can you think about?
Re: [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring
07/07/2021 07:48, Ruifeng Wang: > The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling > for a memory location to become equal to a given value'[1]. > > Use the API for the rte spinlock and ring implementations. > With the wait until equal APIs being stable, changes will not impact ABI. > > Gavin Hu (1): > spinlock: use wfe to reduce contention on aarch64 > > Ruifeng Wang (2): > ring: use wfe to wait for ring tail update on aarch64 > build: add option to enable wait until equal As discussed in the thread, patches 1 & 2 are applied. The patch 3 (meson option) is rejected.
Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure
On Fri, Jul 9, 2021 at 1:48 AM Thomas Monjalon wrote: > 09/07/2021 00:49, Ajit Khaparde: > > Fix build failures because of uninitialized variable usage. > > You should add the error log here. > You don't mention the condition of failure. > Nobody reproduced a failure so far. > Sure. I did not notice it on my Fedora34 either. The build error was reported in the dpdk-test-report [1]. Let me resend a v2 with the information. [1] http://mails.dpdk.org/archives/test-report/2021-July/203186.html > > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") > > > > Signed-off-by: Kishore Padmanabha > > Signed-off-by: Ajit Khaparde > > > >
Re: [dpdk-dev] [PATCH] net/bnxt: fix missing barriers in completion handling
On Fri, Jul 9, 2021 at 2:00 AM Ruifeng Wang wrote: > > > +/** > > + * Check validity of a completion ring entry. If the entry is valid, > > include a > > + * C11 __ATOMIC_ACQUIRE fence to ensure that subsequent loads of fields > > in the > > + * completion are not hoisted by the compiler or by the CPU to come before > > the > > + * loading of the "valid" field. > > + * > > + * Note: the caller must not access any fields in the specified completion > > + * entry prior to calling this function. > > + * > > + * @param cmp > Nit, cmpl Thanks, good catch. I'll fix this in v2. > > > > > /* Check to see if hw has posted a completion for the descriptor. */ > > @@ -3327,7 +3327,7 @@ bnxt_tx_descriptor_status_op(void *tx_queue, > > uint16_t offset) > > cons = RING_CMPL(ring_mask, raw_cons); > > txcmp = (struct tx_cmpl *)&cp_desc_ring[cons]; > > > > - if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct)) > > + if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1)) > cpr->cp_ring_struct->ring_size can be used instead of 'ring_mask + 1'? > > > break; > > > > if (CMP_TYPE(txcmp) == TX_CMPL_TYPE_TX_L2) > > > > > diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c > > b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c > > index 263e6ec3c..13211060c 100644 > > --- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c > > +++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c > > @@ -339,7 +339,7 @@ bnxt_handle_tx_cp_vec(struct bnxt_tx_queue *txq) > > cons = RING_CMPL(ring_mask, raw_cons); > > txcmp = (struct tx_cmpl *)&cp_desc_ring[cons]; > > > > - if (!CMP_VALID(txcmp, raw_cons, cp_ring_struct)) > > + if (!bnxt_cpr_cmp_valid(txcmp, raw_cons, ring_mask + 1)) > Same here. I think cpr->cp_ring_struct->ring_size can be used and it avoids > calculation. > Also some places in other vector files. It's true that cpr->cp_ring_struct->ring_size and ring_mask + 1 are equivalent, but there doesn't seem to be a meaningful difference between the two in the generated code. Based on disassembly of x86 and Arm code for this function, the compiler correctly determines that the value of ring_mask + 1 doesn't change within the loop, so it is only computed once. The only difference would be in whether an add instruction or a load instruction is used to put the value in the register.
Re: [dpdk-dev] [PATCH] net/bnxt: fix build failure
On Fri, Jul 9, 2021 at 4:48 AM Thomas Monjalon wrote: > > 09/07/2021 00:49, Ajit Khaparde: > > Fix build failures because of uninitialized variable usage. > > You should add the error log here. > You don't mention the condition of failure. > Nobody reproduced a failure so far. Hi Thomas, This addresses FC34 (which I think is using gcc11) build failures, there is an example here: http://mails.dpdk.org/archives/test-report/2021-July/203186.html Regards, Lance
Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote: > > + > > +Promotion to stable > > +~~~ > > + > > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable > > ABI > > +once a maintainer and/or the original contributor is satisfied that the > > API is > > +reasonably mature. In exceptional circumstances, should an API still be > > Is this line with git commit message? > Why making an exceptional case? why not make it stable after two years > or remove it. > My worry is if we make an exception case, it will be difficult to > enumerate the exception case. i think the intent here is to indicate that an api/abi doesn't just automatically become stable after a period of time. there also has to be an evaluation by the maintainer / community before making it stable. so i guess the timer is something that should force that evaluation. as a part of that evaluation one would imagine there is justification for keeping the api as experimental for longer and if so a rationale as to why.
Re: [dpdk-dev] [PATCH v10 0/8] Enhancements for PMD power management
On Fri, Jul 9, 2021 at 6:08 PM Anatoly Burakov wrote: > > This patchset introduces several changes related to PMD power management: > > - Changed monitoring intrinsics to use callbacks as a comparison function, > based > on previous patchset [1] but incorporating feedback [2] - this hopefully > will > make it possible to add support for .get_monitor_addr in virtio > - Add a new intrinsic to monitor multiple addresses, based on RTM instruction > set and the TPAUSE instruction > - Add support for PMD power management on multiple queues, as well as all > accompanying infrastructure and example apps changes > > v10: > - Added missing changes to NIC feature .ini files > > v9: > - Added all missing Acks and Tests > - Added a new commit with NIC features > - Addressed minor issues raised in review > > v8: > - Fixed checkpatch issue > - Added comment explaining empty poll handling (Konstantin) > > v7: > - Fixed various bugs > > v6: > - Improved the algorithm for multi-queue sleep > - Fixed segfault and addressed other feedback > > v5: > - Removed "power save queue" API and replaced with mechanism suggested by > Konstantin > - Addressed other feedback > > v4: > - Replaced raw number with a macro > - Fixed all the bugs found by Konstantin > - Some other minor corrections > > v3: > - Moved some doc updates to NIC features list > > v2: > - Changed check inversion to callbacks > - Addressed feedback from Konstantin > - Added doc updates where necessary > > [1] http://patches.dpdk.org/project/dpdk/list/?series=16930&state=* > [2] > http://patches.dpdk.org/project/dpdk/patch/819ef1ace187365a615d3383e54579e3d9fb216e.1620747068.git.anatoly.bura...@intel.com/#133274 > > Anatoly Burakov (8): > eal: use callbacks for power monitoring comparison > net/af_xdp: add power monitor support > doc: add PMD power management NIC feature > eal: add power monitor for multiple events > power: remove thread safety from PMD power API's > power: support callbacks for multiple Rx queues > power: support monitoring multiple Rx queues > examples/l3fwd-power: support multiq in PMD modes Overall, the series lgtm. I still have a comment on the opaque pointer passed in callbacks. This is not blocking, we can still go with followup patches in this release. It would be great if drivers maintainers could implement this new ops in their driver or give feedback on what should be enhanced. Series applied, thanks. -- David Marchand
[dpdk-dev] [PATCH v2] net/bnxt: fix build failure
Fix build failures because of uninitialized variables. ../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function ‘ulp_mapper_index_tbl_process’: ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, resource_func))’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 2252 | struct bnxt_ulp_glb_resource_info glb_res; | ^~~ ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘glb_res.resource_type’ may be used uninitialized in this function [-Werror=maybe-uninitialized] ../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’: ../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 95 | uint32_t index; | ^ Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") Signed-off-by: Kishore Padmanabha Signed-off-by: Ajit Khaparde --- v1->v2: update commit log to show the build error encountered. --- drivers/net/bnxt/tf_core/dpool.c | 1 + drivers/net/bnxt/tf_ulp/ulp_mapper.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bnxt/tf_core/dpool.c b/drivers/net/bnxt/tf_core/dpool.c index 0dae42b1bb..145efa486f 100644 --- a/drivers/net/bnxt/tf_core/dpool.c +++ b/drivers/net/bnxt/tf_core/dpool.c @@ -125,6 +125,7 @@ int dpool_defrag(struct dpool *dpool, largest_free_size = 0; largest_free_index = 0; count = 0; + index = 0; for (i = 0; i < dpool->size; i++) { if (DP_IS_FREE(dpool->entry[i].flags)) { diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c b/drivers/net/bnxt/tf_ulp/ulp_mapper.c index acd9f996e8..871dbad0fe 100644 --- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c +++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c @@ -2249,7 +2249,7 @@ ulp_mapper_index_tbl_process(struct bnxt_ulp_mapper_parms *parms, struct tf_free_tbl_entry_parms free_parms = { 0 }; uint32_t tbl_scope_id; struct tf *tfp; - struct bnxt_ulp_glb_resource_info glb_res; + struct bnxt_ulp_glb_resource_info glb_res = { 0 }; uint16_t bit_size; bool alloc = false; bool write = false; -- 2.21.1 (Apple Git-122.3)
Re: [dpdk-dev] [PATCH 1/2] build: fix SVE compile error with gcc8.3
28/06/2021 04:57, Chengwen Feng: > --- a/config/arm/meson.build > +++ b/config/arm/meson.build > @@ -488,3 +488,9 @@ if cc.get_define('__ARM_FEATURE_CRYPTO', args: > machine_args) != '' > compile_time_cpuflags += ['RTE_CPUFLAG_AES', 'RTE_CPUFLAG_PMULL', > 'RTE_CPUFLAG_SHA1', 'RTE_CPUFLAG_SHA2'] > endif > + > +# Check whether SVE ACLE is supported and set the corresponding flag which > will used with SVE ACLE code. > +if (cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != '' and > +cc.check_header('arm_sve.h')) > +dpdk_conf.set('RTE_HAS_SVE_ACLE', 1) > +endif Simpler and better sorted: --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -524,6 +524,9 @@ endif if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != '' compile_time_cpuflags += ['RTE_CPUFLAG_SVE'] +if (cc.check_header('arm_sve.h')) +dpdk_conf.set('RTE_HAS_SVE_ACLE', 1) +endif endif
Re: [dpdk-dev] [PATCH 0/2] bugfix for SVE compile
28/06/2021 04:57, Chengwen Feng: > This patch set contains two bugfixes for SVE compile. > Note: > 1) Because 2/2 patch needs backport to 20.11 so we separate it. > 2) These two patches already acked by ARM guys from previous threads. > > Chengwen Feng (2): > build: fix SVE compile error with gcc8.3 > net/hns3: fix SVE code compile error with gcc8.3 Applied with proposed improvement, thanks.
[dpdk-dev] [PATCH v2] net/bnxt: fix missing barriers in completion handling
Ensure that Rx/Tx/Async completion entry fields are accessed only after the completion's valid flag has been loaded and verified. This is needed for correct operation on systems that use relaxed memory consistency models. Fixes: 2eb53b134aae ("net/bnxt: add initial Rx code") Fixes: 6eb3cc2294fd ("net/bnxt: add initial Tx code") Cc: sta...@dpdk.org Signed-off-by: Lance Richardson Reviewed-by: Ajit Khaparde --- v2: * Corrected name of the first parameter to bnxt_cpr_cmp_valid() in comments ('cmp' to 'cmpl'). drivers/net/bnxt/bnxt_cpr.h | 36 --- drivers/net/bnxt/bnxt_ethdev.c| 16 ++-- drivers/net/bnxt/bnxt_irq.c | 7 +++--- drivers/net/bnxt/bnxt_rxr.c | 9 --- drivers/net/bnxt/bnxt_rxtx_vec_avx2.c | 2 +- drivers/net/bnxt/bnxt_rxtx_vec_neon.c | 2 +- drivers/net/bnxt/bnxt_rxtx_vec_sse.c | 2 +- drivers/net/bnxt/bnxt_txr.c | 2 +- 8 files changed, 54 insertions(+), 22 deletions(-) diff --git a/drivers/net/bnxt/bnxt_cpr.h b/drivers/net/bnxt/bnxt_cpr.h index 2a56ec52c..4095c8c40 100644 --- a/drivers/net/bnxt/bnxt_cpr.h +++ b/drivers/net/bnxt/bnxt_cpr.h @@ -8,13 +8,10 @@ #include #include +#include "hsi_struct_def_dpdk.h" struct bnxt_db_info; -#define CMP_VALID(cmp, raw_cons, ring) \ - (!!(rte_le_to_cpu_32(((struct cmpl_base *)(cmp))->info3_v) &\ - CMPL_BASE_V) == !((raw_cons) & ((ring)->ring_size))) - #define CMP_TYPE(cmp) \ (((struct cmpl_base *)cmp)->type & CMPL_BASE_TYPE_MASK) @@ -121,4 +118,35 @@ bool bnxt_is_recovery_enabled(struct bnxt *bp); bool bnxt_is_master_func(struct bnxt *bp); void bnxt_stop_rxtx(struct bnxt *bp); + +/** + * Check validity of a completion ring entry. If the entry is valid, include a + * C11 __ATOMIC_ACQUIRE fence to ensure that subsequent loads of fields in the + * completion are not hoisted by the compiler or by the CPU to come before the + * loading of the "valid" field. + * + * Note: the caller must not access any fields in the specified completion + * entry prior to calling this function. + * + * @param cmpl + * Pointer to an entry in the completion ring. + * @param raw_cons + * Raw consumer index of entry in completion ring. + * @param ring_size + * Size of completion ring. + */ +static __rte_always_inline bool +bnxt_cpr_cmp_valid(const void *cmpl, uint32_t raw_cons, uint32_t ring_size) +{ + const struct cmpl_base *c = cmpl; + bool expected, valid; + + expected = !(raw_cons & ring_size); + valid = !!(rte_le_to_cpu_32(c->info3_v) & CMPL_BASE_V); + if (valid == expected) { + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + return true; + } + return false; +} #endif diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index ed09f1bf5..ee6929692 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -3126,7 +3126,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t rx_queue_id) { struct bnxt *bp = (struct bnxt *)dev->data->dev_private; struct bnxt_cp_ring_info *cpr; - uint32_t desc = 0, raw_cons; + uint32_t desc = 0, raw_cons, cp_ring_size; struct bnxt_rx_queue *rxq; struct rx_pkt_cmpl *rxcmp; int rc; @@ -3138,6 +3138,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t rx_queue_id) rxq = dev->data->rx_queues[rx_queue_id]; cpr = rxq->cp_ring; raw_cons = cpr->cp_raw_cons; + cp_ring_size = cpr->cp_ring_struct->ring_size; while (1) { uint32_t agg_cnt, cons, cmpl_type; @@ -3145,7 +3146,7 @@ bnxt_rx_queue_count_op(struct rte_eth_dev *dev, uint16_t rx_queue_id) cons = RING_CMP(cpr->cp_ring_struct, raw_cons); rxcmp = (struct rx_pkt_cmpl *)&cpr->cp_desc_ring[cons]; - if (!CMP_VALID(rxcmp, raw_cons, cpr->cp_ring_struct)) + if (!bnxt_cpr_cmp_valid(rxcmp, raw_cons, cp_ring_size)) break; cmpl_type = CMP_TYPE(rxcmp); @@ -3189,7 +3190,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t offset) struct bnxt_rx_queue *rxq = rx_queue; struct bnxt_cp_ring_info *cpr; struct bnxt_rx_ring_info *rxr; - uint32_t desc, raw_cons; + uint32_t desc, raw_cons, cp_ring_size; struct bnxt *bp = rxq->bp; struct rx_pkt_cmpl *rxcmp; int rc; @@ -3203,6 +3204,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t offset) rxr = rxq->rx_ring; cpr = rxq->cp_ring; + cp_ring_size = cpr->cp_ring_struct->ring_size; /* * For the vector receive case, the completion at the requested @@ -3219,7 +3221,7 @@ bnxt_rx_descriptor_status_op(void *rx_queue, uint16_t offset) cons = RING_CMP(cpr->cp_ring_struct, raw_cons); r
Re: [dpdk-dev] [PATCH v2] net/bnxt: fix build failure
09/07/2021 20:38, Ajit Khaparde: > On Fri, Jul 9, 2021 at 8:37 AM Ajit Khaparde > wrote: > > > Fix build failures because of uninitialized variables. > > > > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function > > ‘ulp_mapper_index_tbl_process’: > > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int > > *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, > > resource_func))’ may be used uninitialized in this function > > [-Werror=maybe-uninitialized] > > 2252 | struct bnxt_ulp_glb_resource_info glb_res; > > | ^~~ > > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: > > ‘glb_res.resource_type’ may be used uninitialized in this function > > [-Werror=maybe-uninitialized] > > > > ../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’: > > ../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used > > uninitialized in this function [-Werror=maybe-uninitialized] > >95 | uint32_t index; > > | ^ > > > > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") > > > > Signed-off-by: Kishore Padmanabha > > Signed-off-by: Ajit Khaparde > > Patch applied to dpdk-next-net-brcm for-next-net branch. Thanks Pulled in main with additional info: seen on Fedora Core 34 (GCC 11)
Re: [dpdk-dev] [PATCH V3] table: fix bucket empty logic
> > Due to a typo, only 3 out of 4 keys in the bucket of the exact match > > table were considered, which can result in valid keys being > > incorrectly dropped from the table. > > > > Fixes: d0a00966618ba ("table: add exact match SWX table") > > Cc: sta...@dpdk.org > > Cc: Cristian Dumitrescu > > > > Signed-off-by: Thierry Herbelot > > Acked-by: Cristian Dumitrescu Applied, thanks
Re: [dpdk-dev] [PATCH V2] pipeline: fix table entry read
06/07/2021 00:56, Cristian Dumitrescu: > The rte_swx_pipeline_table_entry_read() function is used to read from > a character string a table entry that is to be added to the table, > deleted from the table or set as the default entry of the table. > Addition needs both the match and the part of the entry, deletion > ignores the action part, while the default set ignores the match part, > hence the need to make both the match and the action part optional. > The logic for skipping the match or the action part was broken, hence > the current fix. > > Fixes: b32c0a2c5e4c ("pipeline: add SWX table update high level API") > Cc: sta...@dpdk.org > > Signed-off-by: Cristian Dumitrescu > Signed-off-by: Venkata Suresh Kumar P > Signed-off-by: Churchill Khangar Applied, thanks
Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands
03/07/2021 00:46, Cristian Dumitrescu: > From: Churchill Khangar > > For more felxibility, the single monolithic table update command is > split into table entry add, table entry delete, table default entry > add, pipeline commit and pipeline abort. > > Signed-off-by: Churchill Khangar > Signed-off-by: Cristian Dumitrescu Series applied, thanks.
Re: [dpdk-dev] [PATCH V2] pipeline: add support for LPM lookup
08/07/2021 12:11, Cristian Dumitrescu: > Add support for the Longest Prefix Match (LPM) lookup to the SWX > pipeline. > > Signed-off-by: Cristian Dumitrescu > Signed-off-by: Churchill Khangar Applied, thanks.
Re: [dpdk-dev] [PATCH] examples/pipeline: add FIB example
09/07/2021 19:07, Cristian Dumitrescu: > Add example for FIB with VRF and ECMP support. > > Signed-off-by: Cristian Dumitrescu > Signed-off-by: Churchill Khangar Applied, thanks.
Re: [dpdk-dev] [PATCH v2] app/procinfo: add device registers dump
21/06/2021 04:17, Min Hu (Connor): > From: Chengchang Tang > > This patch add support for dump the device registers from a running > application. It can help developers locate the problem. > > Signed-off-by: Chengchang Tang > Signed-off-by: Min Hu (Connor) > --- > v2: > * some logs are adjusted and error string are printed after > file operation fails. Ping for review please.
Re: [dpdk-dev] [PATCH] examples/pipeline: add FIB example
09/07/2021 23:43, Thomas Monjalon: > 09/07/2021 19:07, Cristian Dumitrescu: > > Add example for FIB with VRF and ECMP support. > > > > Signed-off-by: Cristian Dumitrescu > > Signed-off-by: Churchill Khangar > > Applied, thanks. Sorry, no it cannot be in 21.08-rc1 because the series "selector table" fails to compile on 32-bit target.
Re: [dpdk-dev] [PATCH V2] pipeline: add support for LPM lookup
09/07/2021 23:41, Thomas Monjalon: > 08/07/2021 12:11, Cristian Dumitrescu: > > Add support for the Longest Prefix Match (LPM) lookup to the SWX > > pipeline. > > > > Signed-off-by: Cristian Dumitrescu > > Signed-off-by: Churchill Khangar > > Applied, thanks. Sorry, no it cannot be in 21.08-rc1 because the series "selector table" fails to compile on 32-bit target.
Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands
09/07/2021 23:37, Thomas Monjalon: > 03/07/2021 00:46, Cristian Dumitrescu: > > From: Churchill Khangar > > > > For more felxibility, the single monolithic table update command is > > split into table entry add, table entry delete, table default entry > > add, pipeline commit and pipeline abort. > > > > Signed-off-by: Churchill Khangar > > Signed-off-by: Cristian Dumitrescu > > Series applied, thanks. Sorry, only first 2 patches are really kept, because the patch 3 fails 32-bit compilation: lib/pipeline/rte_swx_pipeline.c:9851:33: error: array subscript ‘struct rte_swx_table_selector_params[0]’ is partly outside array bounds of ‘unsigned char[24]’ [-Werror=array-bounds]
Re: [dpdk-dev] [PATCH v2] net/bnxt: fix build failure
On Fri, Jul 9, 2021 at 8:37 AM Ajit Khaparde wrote: > Fix build failures because of uninitialized variables. > > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c: In function > ‘ulp_mapper_index_tbl_process’: > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: ‘*(unsigned int > *)((char *)&glb_res + offsetof(struct bnxt_ulp_glb_resource_info, > resource_func))’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > 2252 | struct bnxt_ulp_glb_resource_info glb_res; > | ^~~ > ../drivers/net/bnxt/tf_ulp/ulp_mapper.c:2252:43: error: > ‘glb_res.resource_type’ may be used uninitialized in this function > [-Werror=maybe-uninitialized] > > ../drivers/net/bnxt/tf_core/dpool.c: In function ‘dpool_defrag’: > ../drivers/net/bnxt/tf_core/dpool.c:95:18: error: ‘index’ may be used > uninitialized in this function [-Werror=maybe-uninitialized] >95 | uint32_t index; > | ^ > > Fixes: 05b405d58148 ("net/bnxt: add dpool allocator for EM allocation") > > Signed-off-by: Kishore Padmanabha > Signed-off-by: Ajit Khaparde > --- > v1->v2: update commit log to show the build error encountered. > --- > Patch applied to dpdk-next-net-brcm for-next-net branch. Thanks > drivers/net/bnxt/tf_core/dpool.c | 1 + > drivers/net/bnxt/tf_ulp/ulp_mapper.c | 2 +- > 2 files changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/bnxt/tf_core/dpool.c > b/drivers/net/bnxt/tf_core/dpool.c > index 0dae42b1bb..145efa486f 100644 > --- a/drivers/net/bnxt/tf_core/dpool.c > +++ b/drivers/net/bnxt/tf_core/dpool.c > @@ -125,6 +125,7 @@ int dpool_defrag(struct dpool *dpool, > largest_free_size = 0; > largest_free_index = 0; > count = 0; > + index = 0; > > for (i = 0; i < dpool->size; i++) { > if (DP_IS_FREE(dpool->entry[i].flags)) { > diff --git a/drivers/net/bnxt/tf_ulp/ulp_mapper.c > b/drivers/net/bnxt/tf_ulp/ulp_mapper.c > index acd9f996e8..871dbad0fe 100644 > --- a/drivers/net/bnxt/tf_ulp/ulp_mapper.c > +++ b/drivers/net/bnxt/tf_ulp/ulp_mapper.c > @@ -2249,7 +2249,7 @@ ulp_mapper_index_tbl_process(struct > bnxt_ulp_mapper_parms *parms, > struct tf_free_tbl_entry_parms free_parms = { 0 }; > uint32_t tbl_scope_id; > struct tf *tfp; > - struct bnxt_ulp_glb_resource_info glb_res; > + struct bnxt_ulp_glb_resource_info glb_res = { 0 }; > uint16_t bit_size; > bool alloc = false; > bool write = false; > -- > 2.21.1 (Apple Git-122.3) > >
[dpdk-dev] [PATCH V4 2/5] table: add support for selector tables
A selector table is made up of groups of weighted members, with a given member potentially part of several groups. The select operation returns a member ID by first selecting a group based on an input group ID and then selecting a member within that group based on hashing one or several input header/meta-data fields. It is very useful for implementing an ECMP/WCMP-enabled FIB or a load balancer. It is part of the action selector described by the P4 Portable Switch Architecture (PSA) specification. Signed-off-by: Cristian Dumitrescu --- lib/table/meson.build | 2 + lib/table/rte_swx_table_selector.c | 581 + lib/table/rte_swx_table_selector.h | 203 ++ lib/table/version.map | 8 + 4 files changed, 794 insertions(+) create mode 100644 lib/table/rte_swx_table_selector.c create mode 100644 lib/table/rte_swx_table_selector.h diff --git a/lib/table/meson.build b/lib/table/meson.build index b7b70b805..16e55f086 100644 --- a/lib/table/meson.build +++ b/lib/table/meson.build @@ -3,6 +3,7 @@ sources = files( 'rte_swx_table_em.c', + 'rte_swx_table_selector.c', 'rte_swx_table_wm.c', 'rte_table_acl.c', 'rte_table_array.c', @@ -20,6 +21,7 @@ headers = files( 'rte_lru.h', 'rte_swx_table.h', 'rte_swx_table_em.h', + 'rte_swx_table_selector.h', 'rte_swx_table_wm.h', 'rte_table.h', 'rte_table_acl.h', diff --git a/lib/table/rte_swx_table_selector.c b/lib/table/rte_swx_table_selector.c new file mode 100644 index 0..541ebc221 --- /dev/null +++ b/lib/table/rte_swx_table_selector.c @@ -0,0 +1,581 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ +#include +#include +#include +#include + +#include +#include + +#include "rte_swx_table_selector.h" + +#ifndef RTE_SWX_TABLE_SELECTOR_HUGE_PAGES_DISABLE + +#include + +static void * +env_calloc(size_t size, size_t alignment, int numa_node) +{ + return rte_zmalloc_socket(NULL, size, alignment, numa_node); +} + +static void +env_free(void *start, size_t size __rte_unused) +{ + rte_free(start); +} + +#else + +#include + +static void * +env_calloc(size_t size, size_t alignment __rte_unused, int numa_node) +{ + void *start; + + if (numa_available() == -1) + return NULL; + + start = numa_alloc_onnode(size, numa_node); + if (!start) + return NULL; + + memset(start, 0, size); + return start; +} + +static void +env_free(void *start, size_t size) +{ + if ((numa_available() == -1) || !start) + return; + + numa_free(start, size); +} + +#endif + +#if defined(RTE_ARCH_X86_64) + +#include + +#define crc32_u64(crc, v) _mm_crc32_u64(crc, v) + +#else + +static inline uint64_t +crc32_u64_generic(uint64_t crc, uint64_t value) +{ + int i; + + crc = (crc & 0xLLU) ^ value; + for (i = 63; i >= 0; i--) { + uint64_t mask; + + mask = -(crc & 1LLU); + crc = (crc >> 1LLU) ^ (0x82F63B78LLU & mask); + } + + return crc; +} + +#define crc32_u64(crc, v) crc32_u64_generic(crc, v) + +#endif + +/* Key size needs to be one of: 8, 16, 32 or 64. */ +static inline uint32_t +hash(void *key, void *key_mask, uint32_t key_size, uint32_t seed) +{ + uint64_t *k = key; + uint64_t *m = key_mask; + uint64_t k0, k2, k5, crc0, crc1, crc2, crc3, crc4, crc5; + + switch (key_size) { + case 8: + crc0 = crc32_u64(seed, k[0] & m[0]); + return crc0; + + case 16: + k0 = k[0] & m[0]; + + crc0 = crc32_u64(k0, seed); + crc1 = crc32_u64(k0 >> 32, k[1] & m[1]); + + crc0 ^= crc1; + + return crc0; + + case 32: + k0 = k[0] & m[0]; + k2 = k[2] & m[2]; + + crc0 = crc32_u64(k0, seed); + crc1 = crc32_u64(k0 >> 32, k[1] & m[1]); + + crc2 = crc32_u64(k2, k[3] & m[3]); + crc3 = k2 >> 32; + + crc0 = crc32_u64(crc0, crc1); + crc1 = crc32_u64(crc2, crc3); + + crc0 ^= crc1; + + return crc0; + + case 64: + k0 = k[0] & m[0]; + k2 = k[2] & m[2]; + k5 = k[5] & m[5]; + + crc0 = crc32_u64(k0, seed); + crc1 = crc32_u64(k0 >> 32, k[1] & m[1]); + + crc2 = crc32_u64(k2, k[3] & m[3]); + crc3 = crc32_u64(k2 >> 32, k[4] & m[4]); + + crc4 = crc32_u64(k5, k[6] & m[6]); + crc5 = crc32_u64(k5 >> 32, k[7] & m[7]); + + crc0 = crc32_u64(crc0, (crc1 << 32) ^ crc2); + crc1 = crc32_u64(crc3, (crc4 << 32) ^ crc5); + + crc0 ^= crc1; + + return crc0; + + default: + crc0 = 0; + return crc0;
[dpdk-dev] [PATCH V4 1/5] examples/pipeline: improve table update CLI commands
From: Churchill Khangar For more felxibility, the single monolithic table update command is split into table entry add, table entry delete, table default entry add, pipeline commit and pipeline abort. Signed-off-by: Churchill Khangar Signed-off-by: Cristian Dumitrescu --- examples/pipeline/cli.c | 589 -- examples/pipeline/examples/vxlan.cli | 3 +- examples/pipeline/examples/vxlan_pcap.cli | 3 +- 3 files changed, 428 insertions(+), 167 deletions(-) diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c index 215dd8e85..30754e319 100644 --- a/examples/pipeline/cli.c +++ b/examples/pipeline/cli.c @@ -1038,25 +1038,76 @@ table_entry_free(struct rte_swx_table_entry *entry) free(entry); } -static const char cmd_pipeline_table_update_help[] = -"pipeline table update " -" "; +#ifndef MAX_LINE_SIZE +#define MAX_LINE_SIZE 2048 +#endif + +static int +pipeline_table_entries_add(struct rte_swx_ctl_pipeline *p, + const char *table_name, + FILE *file, + uint32_t *file_line_number) +{ + char *line = NULL; + uint32_t line_id = 0; + int status = 0; + + /* Buffer allocation. */ + line = malloc(MAX_LINE_SIZE); + if (!line) + return -ENOMEM; + + /* File read. */ + for (line_id = 1; ; line_id++) { + struct rte_swx_table_entry *entry; + int is_blank_or_comment; + + if (fgets(line, MAX_LINE_SIZE, file) == NULL) + break; + + entry = rte_swx_ctl_pipeline_table_entry_read(p, + table_name, + line, + &is_blank_or_comment); + if (!entry) { + if (is_blank_or_comment) + continue; + + status = -EINVAL; + goto error; + } + + status = rte_swx_ctl_pipeline_table_entry_add(p, + table_name, + entry); + table_entry_free(entry); + if (status) + goto error; + } + +error: + free(line); + *file_line_number = line_id; + return status; +} + +static const char cmd_pipeline_table_add_help[] = +"pipeline table add \n"; static void -cmd_pipeline_table_update(char **tokens, - uint32_t n_tokens, - char *out, - size_t out_size, - void *obj) +cmd_pipeline_table_add(char **tokens, + uint32_t n_tokens, + char *out, + size_t out_size, + void *obj) { struct pipeline *p; - char *pipeline_name, *table_name, *line = NULL; - char *file_name_add, *file_name_delete, *file_name_default; - FILE *file_add = NULL, *file_delete = NULL, *file_default = NULL; - uint32_t line_id; + char *pipeline_name, *table_name, *file_name; + FILE *file = NULL; + uint32_t file_line_number = 0; int status; - if (n_tokens != 8) { + if (n_tokens != 6) { snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]); return; } @@ -1068,192 +1119,313 @@ cmd_pipeline_table_update(char **tokens, return; } - if (strcmp(tokens[2], "table") != 0) { - snprintf(out, out_size, MSG_ARG_NOT_FOUND, "table"); + table_name = tokens[3]; + + file_name = tokens[5]; + file = fopen(file_name, "r"); + if (!file) { + snprintf(out, out_size, "Cannot open file %s.\n", file_name); + return; + } + + status = pipeline_table_entries_add(p->ctl, + table_name, + file, + &file_line_number); + if (status) + snprintf(out, out_size, "Invalid entry in file %s at line %u\n", +file_name, +file_line_number); + + fclose(file); +} + +static int +pipeline_table_entries_delete(struct rte_swx_ctl_pipeline *p, + const char *table_name, + FILE *file, + uint32_t *file_line_number) +{ + char *line = NULL; + uint32_t line_id = 0; + int status = 0; + + /* Buffer allocation. */ + line = malloc(MAX_LINE_SIZE); + if (!line) + return -ENOMEM; + + /* File read. */ + for (line_id = 1; ; line_id++) { + struct rte_swx_table_entry *entry; + int is_blank_or_comment; + +
[dpdk-dev] [PATCH V4 3/5] pipeline: add support for selector tables
Add pipeline-level support for selector tables, Signed-off-by: Cristian Dumitrescu --- Change log: V3 -> V4: Fixed proper name of allocated structure (rte_swx_table_selector_params instead of rte_swx_pipeline_selector_params). lib/pipeline/rte_swx_ctl.c | 700 - lib/pipeline/rte_swx_ctl.h | 253 + lib/pipeline/rte_swx_pipeline.c | 748 --- lib/pipeline/rte_swx_pipeline.h | 51 ++ lib/pipeline/rte_swx_pipeline_spec.c | 354 - lib/pipeline/version.map | 13 + 6 files changed, 2034 insertions(+), 85 deletions(-) diff --git a/lib/pipeline/rte_swx_ctl.c b/lib/pipeline/rte_swx_ctl.c index 5d04e750f..8cabce2b9 100644 --- a/lib/pipeline/rte_swx_ctl.c +++ b/lib/pipeline/rte_swx_ctl.c @@ -10,6 +10,8 @@ #include #include +#include + #include "rte_swx_ctl.h" #define CHECK(condition, err_code) \ @@ -89,11 +91,44 @@ struct table { uint32_t n_delete; }; +struct selector { + /* Selector table info. */ + struct rte_swx_ctl_selector_info info; + + /* group_id field. */ + struct rte_swx_ctl_table_match_field_info group_id_field; + + /* selector fields. */ + struct rte_swx_ctl_table_match_field_info *selector_fields; + + /* member_id field. */ + struct rte_swx_ctl_table_match_field_info member_id_field; + + /* Current selector table. Array of info.n_groups_max elements.*/ + struct rte_swx_table_selector_group **groups; + + /* Pending selector table subject to the next commit. Array of info.n_groups_max elements. +*/ + struct rte_swx_table_selector_group **pending_groups; + + /* Valid flag per group. Array of n_groups_max elements. */ + int *groups_added; + + /* Pending delete flag per group. Group deletion is subject to the next commit. Array of +* info.n_groups_max elements. +*/ + int *groups_pending_delete; + + /* Params. */ + struct rte_swx_table_selector_params params; +}; + struct rte_swx_ctl_pipeline { struct rte_swx_ctl_pipeline_info info; struct rte_swx_pipeline *p; struct action *actions; struct table *tables; + struct selector *selectors; struct rte_swx_table_state *ts; struct rte_swx_table_state *ts_next; int numa_node; @@ -709,6 +744,209 @@ table_free(struct rte_swx_ctl_pipeline *ctl) ctl->tables = NULL; } +static void +selector_group_members_free(struct selector *s, uint32_t group_id) +{ + struct rte_swx_table_selector_group *group = s->groups[group_id]; + + if (!group) + return; + + for ( ; ; ) { + struct rte_swx_table_selector_member *m; + + m = TAILQ_FIRST(&group->members); + if (!m) + break; + + TAILQ_REMOVE(&group->members, m, node); + free(m); + } + + free(group); + s->groups[group_id] = NULL; +} + +static void +selector_pending_group_members_free(struct selector *s, uint32_t group_id) +{ + struct rte_swx_table_selector_group *group = s->pending_groups[group_id]; + + if (!group) + return; + + for ( ; ; ) { + struct rte_swx_table_selector_member *m; + + m = TAILQ_FIRST(&group->members); + if (!m) + break; + + TAILQ_REMOVE(&group->members, m, node); + free(m); + } + + free(group); + s->pending_groups[group_id] = NULL; +} + +static int +selector_group_duplicate_to_pending(struct selector *s, uint32_t group_id) +{ + struct rte_swx_table_selector_group *g, *gp; + struct rte_swx_table_selector_member *m; + + selector_pending_group_members_free(s, group_id); + + g = s->groups[group_id]; + gp = s->pending_groups[group_id]; + + if (!gp) { + gp = calloc(1, sizeof(struct rte_swx_table_selector_group)); + if (!gp) + goto error; + + TAILQ_INIT(&gp->members); + + s->pending_groups[group_id] = gp; + } + + if (!g) + return 0; + + TAILQ_FOREACH(m, &g->members, node) { + struct rte_swx_table_selector_member *mp; + + mp = calloc(1, sizeof(struct rte_swx_table_selector_member)); + if (!mp) + goto error; + + memcpy(mp, m, sizeof(struct rte_swx_table_selector_member)); + + TAILQ_INSERT_TAIL(&gp->members, mp, node); + } + + return 0; + +error: + selector_pending_group_members_free(s, group_id); + return -ENOMEM; +} + +static void +selector_free(struct rte_swx_ctl_pipeline *ctl) +{ + uint32_t i; + + if (ctl->selectors) + return; + + for (i = 0; i < ctl->info.n_selectors; i++) {
[dpdk-dev] [PATCH V4 4/5] examples/pipeline: add support for selector tables
Add application-evel support for selector tables. Signed-off-by: Churchill Khangar Signed-off-by: Cristian Dumitrescu --- examples/pipeline/cli.c | 563 1 file changed, 563 insertions(+) diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c index 30754e319..f67783c8f 100644 --- a/examples/pipeline/cli.c +++ b/examples/pipeline/cli.c @@ -1368,6 +1368,467 @@ cmd_pipeline_table_show(char **tokens, snprintf(out, out_size, MSG_ARG_INVALID, "table_name"); } +static const char cmd_pipeline_selector_group_add_help[] = +"pipeline selector group add\n"; + +static void +cmd_pipeline_selector_group_add(char **tokens, + uint32_t n_tokens, + char *out, + size_t out_size, + void *obj) +{ + struct pipeline *p; + char *pipeline_name, *selector_name; + uint32_t group_id; + int status; + + if (n_tokens != 6) { + snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]); + return; + } + + pipeline_name = tokens[1]; + p = pipeline_find(obj, pipeline_name); + if (!p || !p->ctl) { + snprintf(out, out_size, MSG_ARG_INVALID, "pipeline_name"); + return; + } + + if (strcmp(tokens[2], "selector") != 0) { + snprintf(out, out_size, MSG_ARG_NOT_FOUND, "selector"); + return; + } + + selector_name = tokens[3]; + + if (strcmp(tokens[4], "group") || + strcmp(tokens[5], "add")) { + snprintf(out, out_size, MSG_ARG_NOT_FOUND, "group add"); + return; + } + + status = rte_swx_ctl_pipeline_selector_group_add(p->ctl, + selector_name, + &group_id); + if (status) + snprintf(out, out_size, MSG_CMD_FAIL, tokens[0]); + else + snprintf(out, out_size, "Group ID: %u\n", group_id); +} + +static const char cmd_pipeline_selector_group_delete_help[] = +"pipeline selector group delete \n"; + +static void +cmd_pipeline_selector_group_delete(char **tokens, + uint32_t n_tokens, + char *out, + size_t out_size, + void *obj) +{ + struct pipeline *p; + char *pipeline_name, *selector_name; + uint32_t group_id; + int status; + + if (n_tokens != 7) { + snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]); + return; + } + + pipeline_name = tokens[1]; + p = pipeline_find(obj, pipeline_name); + if (!p || !p->ctl) { + snprintf(out, out_size, MSG_ARG_INVALID, "pipeline_name"); + return; + } + + if (strcmp(tokens[2], "selector") != 0) { + snprintf(out, out_size, MSG_ARG_NOT_FOUND, "selector"); + return; + } + + selector_name = tokens[3]; + + if (strcmp(tokens[4], "group") || + strcmp(tokens[5], "delete")) { + snprintf(out, out_size, MSG_ARG_NOT_FOUND, "group delete"); + return; + } + + if (parser_read_uint32(&group_id, tokens[6]) != 0) { + snprintf(out, out_size, MSG_ARG_INVALID, "group_id"); + return; + } + + status = rte_swx_ctl_pipeline_selector_group_delete(p->ctl, + selector_name, + group_id); + if (status) + snprintf(out, out_size, MSG_CMD_FAIL, tokens[0]); +} + +#define GROUP_MEMBER_INFO_TOKENS_MAX 6 + +static int +token_is_comment(const char *token) +{ + if ((token[0] == '#') || + (token[0] == ';') || + ((token[0] == '/') && (token[1] == '/'))) + return 1; /* TRUE. */ + + return 0; /* FALSE. */ +} + +static int +pipeline_selector_group_member_read(const char *string, + uint32_t *group_id, + uint32_t *member_id, + uint32_t *weight, + int *is_blank_or_comment) +{ + char *token_array[GROUP_MEMBER_INFO_TOKENS_MAX], **tokens; + char *s0 = NULL, *s; + uint32_t n_tokens = 0, group_id_val, member_id_val, weight_val; + int blank_or_comment = 0; + + /* Check input arguments. */ + if (!string || !string[0]) + goto error; + + /* Memory allocation. */ + s0 = strdup(string); + if (!s0) + goto error; + + /* Parse the string into tokens. */ + for (s = s0; ; ) { + char *token; + + token = strtok_r(s, " \f\n\r\t\v", &s); + if (!token || token_is_comment(token)) + break; + + if (n_tokens > GROUP_MEMBER_INFO_TOKENS_MAX) + goto error; + + token_array[n_tokens] = token; + n_tokens++; + } + + if (!n_tokens) { + blank_or_comment = 1; + goto
[dpdk-dev] [PATCH V4 5/5] examples/pipeline: add selector example
Added the files to illustrate the selector table usage. Signed-off-by: Cristian Dumitrescu --- examples/pipeline/examples/selector.cli | 31 examples/pipeline/examples/selector.spec | 95 examples/pipeline/examples/selector.txt | 4 + 3 files changed, 130 insertions(+) create mode 100644 examples/pipeline/examples/selector.cli create mode 100644 examples/pipeline/examples/selector.spec create mode 100644 examples/pipeline/examples/selector.txt diff --git a/examples/pipeline/examples/selector.cli b/examples/pipeline/examples/selector.cli new file mode 100644 index 0..36f3ead54 --- /dev/null +++ b/examples/pipeline/examples/selector.cli @@ -0,0 +1,31 @@ +; SPDX-License-Identifier: BSD-3-Clause +; Copyright(c) 2020 Intel Corporation + +mempool MEMPOOL0 buffer 2304 pool 32K cache 256 cpu 0 + +link LINK0 dev :18:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK1 dev :18:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK2 dev :3b:00.0 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on +link LINK3 dev :3b:00.1 rxq 1 128 MEMPOOL0 txq 1 512 promiscuous on + +pipeline PIPELINE0 create 0 + +pipeline PIPELINE0 port in 0 link LINK0 rxq 0 bsz 32 +pipeline PIPELINE0 port in 1 link LINK1 rxq 0 bsz 32 +pipeline PIPELINE0 port in 2 link LINK2 rxq 0 bsz 32 +pipeline PIPELINE0 port in 3 link LINK3 rxq 0 bsz 32 + +pipeline PIPELINE0 port out 0 link LINK0 txq 0 bsz 32 +pipeline PIPELINE0 port out 1 link LINK1 txq 0 bsz 32 +pipeline PIPELINE0 port out 2 link LINK2 txq 0 bsz 32 +pipeline PIPELINE0 port out 3 link LINK3 txq 0 bsz 32 +pipeline PIPELINE0 port out 4 sink none + +pipeline PIPELINE0 build ./examples/pipeline/examples/selector.spec + +pipeline PIPELINE0 selector s group add +pipeline PIPELINE0 selector s group member add ./examples/pipeline/examples/selector.txt +pipeline PIPELINE0 commit +pipeline PIPELINE0 selector s show + +thread 1 pipeline PIPELINE0 enable diff --git a/examples/pipeline/examples/selector.spec b/examples/pipeline/examples/selector.spec new file mode 100644 index 0..72e49cb1d --- /dev/null +++ b/examples/pipeline/examples/selector.spec @@ -0,0 +1,95 @@ +; SPDX-License-Identifier: BSD-3-Clause +; Copyright(c) 2020 Intel Corporation + +; A selector table is made out of groups of weighted members, with a given member potentially part +; of several groups. The select operation returns a member ID by first selecting a group based on an +; input group ID and then selecting a member within that group by hashing one or several input +; header or meta-data fields. It is very useful for implementing an Equal-Cost Multi-Path (ECMP) or +; Weighted-Cost Multi-Path (WCMP) enabled FIB or a load balancer. It is part of the action selector +; construct described by the P4 Portable Switch Architecture (PSA) specification. +; +; Normally, an action selector FIB is built with a routing table (the base table), a selector table +; (the group table) and a next hop table (the member table). One of the routing table actions sets +; up the group ID meta-data field used as the index into the group table, which produces the member +; ID meta-data field, i.e. the next hop ID that is used as the index into the next hop table. The +; next hop action prepares the output packet for being sent next hop in the network by prepending +; one or several headers to the packet (Ethernet at the very least), decrementing the TTL and +; recomputing the IPv4 checksum, etc. The selector allows for multiple next hops to be specified +; for any given route as opposed to a single next hop per route; for every packet, its next hop is +; picked out of the set of next hops defined for the route while preserving the packet ordering +; within the flow, with the flow defined by the selector n-tuple fields. +; +; In this simple example, the base table and the member table are striped out in order to focus +; exclusively on illustrating the selector table. The group_id is read from the destination MAC +; address and the selector n-tuple is represented by the Protocol, the source IP address and the +; destination IP address fields. The member_id produced by the selector table is used to identify +; the output port which facilitates the testing of different member weights by simply comparing the +; rates of output packets sent on different ports. + +// +// Headers +// +struct ethernet_h { + bit<48> dst_addr + bit<48> src_addr + bit<16> ethertype +} + +struct ipv4_h { + bit<8> ver_ihl + bit<8> diffserv + bit<16> total_len + bit<16> identification + bit<16> flags_offset + bit<8> ttl + bit<8> protocol + bit<16> hdr_checksum + bit<32> src_addr + bit<32> dst_addr +} + +header ethernet instanceof ethernet_h +header ipv4 instanceof ipv4_h + +// +// Meta-data +// +struct metadata_t { + bit<32> port_in + bit<32> port_out + bit<32> group_id +} + +metadata instanceo
Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table update CLI commands
> -Original Message- > From: Thomas Monjalon > Sent: Friday, July 9, 2021 11:14 PM > To: Dumitrescu, Cristian > Cc: dev@dpdk.org; Khangar, Churchill > Subject: Re: [dpdk-dev] [PATCH V3 1/5] examples/pipeline: improve table > update CLI commands > > 09/07/2021 23:37, Thomas Monjalon: > > 03/07/2021 00:46, Cristian Dumitrescu: > > > From: Churchill Khangar > > > > > > For more felxibility, the single monolithic table update command is > > > split into table entry add, table entry delete, table default entry > > > add, pipeline commit and pipeline abort. > > > > > > Signed-off-by: Churchill Khangar > > > Signed-off-by: Cristian Dumitrescu > > > > Series applied, thanks. > > Sorry, only first 2 patches are really kept, > because the patch 3 fails 32-bit compilation: > > lib/pipeline/rte_swx_pipeline.c:9851:33: error: > array subscript ‘struct rte_swx_table_selector_params[0]’ > is partly outside array bounds of ‘unsigned char[24]’ [-Werror=array-bounds] > > Hi Thomas, I just set V4 for this patch set (and the entire series); it was basically just a typo that created a small memory allocation bug: the code was incorrectly using sizeof(struct rte_swx_pipeline_selector_params) instead of sizeof(struct rte_swx_table_selector_params) for the calloc call, hence the warning. I know it is very late at night right now and you are still awake working to get RC1 done (I am just 1 hour before you time zone wise ;)), I would appreciate if you could still look at applying this for RC1, only if possible. Thank you so much! Regards, Cristian
[dpdk-dev] [PATCH v2 0/3] features for hns3 PMD
This patchset contains 3 features for hns3 PMD: add query basic info support for VF support for VF modify VLAN filter state support multiple TC MAC pause Chengchang Tang (2): net/hns3: add query basic info support for VF net/hns3: support for VF modify VLAN filter state Huisong Li (1): net/hns3: support multiple TC MAC pause --- v2: * Abandon the last patch "supports disabling PFC by dev configure API". drivers/net/hns3/hns3_cmd.h | 9 drivers/net/hns3/hns3_ethdev.c| 5 +- drivers/net/hns3/hns3_ethdev.h| 6 +++ drivers/net/hns3/hns3_ethdev_vf.c | 96 ++- drivers/net/hns3/hns3_mbx.h | 11 - 5 files changed, 104 insertions(+), 23 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH v2 3/3] net/hns3: support multiple TC MAC pause
From: Huisong Li MAC PAUSE can take effect on a single TC or multiple TCs, depending on the hardware. For example, the Kunpeng 920 supports MAC pause in a single TC, and the Kunpeng 930 supports MAC pause in multiple TCs. This patch supports MAC PAUSE in multiple TC for some hardware. Signed-off-by: Huisong Li Signed-off-by: Min Hu (Connor) --- drivers/net/hns3/hns3_ethdev.c | 5 - drivers/net/hns3/hns3_ethdev.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c index e515125..17b995a 100644 --- a/drivers/net/hns3/hns3_ethdev.c +++ b/drivers/net/hns3/hns3_ethdev.c @@ -3317,6 +3317,7 @@ hns3_get_capability(struct hns3_hw *hw) pf->tqp_config_mode = HNS3_FIXED_MAX_TQP_NUM_MODE; hw->rss_info.ipv6_sctp_offload_supported = false; hw->udp_cksum_mode = HNS3_SPECIAL_PORT_SW_CKSUM_MODE; + pf->support_multi_tc_pause = false; return 0; } @@ -3337,6 +3338,7 @@ hns3_get_capability(struct hns3_hw *hw) pf->tqp_config_mode = HNS3_FLEX_MAX_TQP_NUM_MODE; hw->rss_info.ipv6_sctp_offload_supported = true; hw->udp_cksum_mode = HNS3_SPECIAL_PORT_HW_CKSUM_MODE; + pf->support_multi_tc_pause = true; return 0; } @@ -6103,6 +6105,7 @@ static int hns3_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf) { struct hns3_hw *hw = HNS3_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct hns3_pf *pf = HNS3_DEV_PRIVATE_TO_PF(dev->data->dev_private); int ret; if (fc_conf->high_water || fc_conf->low_water || @@ -6132,7 +6135,7 @@ hns3_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf) return -EOPNOTSUPP; } - if (hw->num_tc > 1) { + if (hw->num_tc > 1 && !pf->support_multi_tc_pause) { hns3_err(hw, "in multi-TC scenarios, MAC pause is not supported."); return -EOPNOTSUPP; } diff --git a/drivers/net/hns3/hns3_ethdev.h b/drivers/net/hns3/hns3_ethdev.h index 729e1c3..3485614 100644 --- a/drivers/net/hns3/hns3_ethdev.h +++ b/drivers/net/hns3/hns3_ethdev.h @@ -783,6 +783,7 @@ struct hns3_pf { uint8_t prio_tc[HNS3_MAX_USER_PRIO]; /* TC indexed by prio */ uint16_t pause_time; bool support_fc_autoneg; /* support FC autonegotiate */ + bool support_multi_tc_pause; uint16_t wanted_umv_size; uint16_t max_umv_size; -- 2.7.4