Re: [PATCH v2 16/29] power: remove experimental from API's
On 09/08/2023 01:10, Stephen Hemminger wrote: The power management API's were last changed in 22.11 release. Therefore remove experimental for 23.11 release. Signed-off-by: Stephen Hemminger --- Acked-by: David Hunt
Re: [PATCH v4 16/28] power: remove experimental from API's
On 19/10/2023 20:10, Stephen Hemminger wrote: The power management API's were last changed in 22.11 release. Therefore remove experimental for 23.11 release. Signed-off-by: Stephen Hemminger --- Acked-by: David Hunt
Re: [PATCH] maintainers: volunteer to maintain power library
On 23/10/2023 05:27, Sivaprasad Tummala wrote: Add co-maintainer for power library. Signed-off-by: Sivaprasad Tummala --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) Thanks for the help, Sivaprasad. Acked-by: David Hunt
Re: [PATCH v1 3/4] test/power: removed function pointer validations
On 20/07/2024 17:50, Sivaprasad Tummala wrote: After refactoring the power library, power management operations are now consistently supported regardless of the operating environment, making function pointer checks unnecessary and thus removed from applications. Signed-off-by: Sivaprasad Tummala --- app/test/test_power.c | 95 --- app/test/test_power_cpufreq.c | 52 --- app/test/test_power_kvm_vm.c | 36 - 3 files changed, 183 deletions(-) Hi Sivaprasad, Nice work on the patch-set. There's just four function pointer checks remaining that my compiler is complaining about. They are in examples/l3fwd-power/main.c (lines 443, 452, 1350, 1353). It would be nice to have these removed as well, seeing as the functions are now inlines and don't need these checks. I'm running the patch set through some tests here, will keep you posted on progress. Rgds, Dave. ---snip---
Re: [PATCH v1 1/4] power: refactor core power management library
Hi Sivaprasad, A couple of comments below: On 20/07/2024 17:50, Sivaprasad Tummala wrote: This patch introduces a comprehensive refactor to the core power management library. The primary focus is on improving modularity and organization by relocating specific driver implementations from the 'lib/power' directory to dedicated directories within 'drivers/power/core/*'. The adjustment of meson.build files enables the selective activation of individual drivers. These changes contribute to a significant enhancement in code organization, providing a clearer structure for driver implementations. The refactor aims to improve overall code clarity and boost maintainability. Additionally, it establishes a foundation for future development, allowing for more focused work on individual drivers and seamless integration of forthcoming enhancements. Signed-off-by: Sivaprasad Tummala --- drivers/meson.build | 1 + .../power/acpi/acpi_cpufreq.c | 22 +- .../power/acpi/acpi_cpufreq.h | 6 +- drivers/power/acpi/meson.build| 10 + .../power/amd_pstate/amd_pstate_cpufreq.c | 24 +- .../power/amd_pstate/amd_pstate_cpufreq.h | 8 +- drivers/power/amd_pstate/meson.build | 10 + .../power/cppc/cppc_cpufreq.c | 22 +- .../power/cppc/cppc_cpufreq.h | 8 +- drivers/power/cppc/meson.build| 10 + .../power/kvm_vm}/guest_channel.c | 0 .../power/kvm_vm}/guest_channel.h | 0 .../power/kvm_vm/kvm_vm.c | 22 +- .../power/kvm_vm/kvm_vm.h | 6 +- drivers/power/kvm_vm/meson.build | 16 + drivers/power/meson.build | 12 + drivers/power/pstate/meson.build | 10 + .../power/pstate/pstate_cpufreq.c | 22 +- .../power/pstate/pstate_cpufreq.h | 6 +- lib/power/meson.build | 7 +- lib/power/power_common.c | 2 +- lib/power/power_common.h | 16 +- lib/power/rte_power.c | 287 ++ lib/power/rte_power.h | 139 ++--- lib/power/rte_power_core_ops.h| 208 + lib/power/version.map | 14 + 26 files changed, 618 insertions(+), 270 deletions(-) rename lib/power/power_acpi_cpufreq.c => drivers/power/acpi/acpi_cpufreq.c (95%) rename lib/power/power_acpi_cpufreq.h => drivers/power/acpi/acpi_cpufreq.h (98%) create mode 100644 drivers/power/acpi/meson.build rename lib/power/power_amd_pstate_cpufreq.c => drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%) rename lib/power/power_amd_pstate_cpufreq.h => drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%) create mode 100644 drivers/power/amd_pstate/meson.build rename lib/power/power_cppc_cpufreq.c => drivers/power/cppc/cppc_cpufreq.c (95%) rename lib/power/power_cppc_cpufreq.h => drivers/power/cppc/cppc_cpufreq.h (97%) create mode 100644 drivers/power/cppc/meson.build rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%) rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%) rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c (82%) rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h (98%) create mode 100644 drivers/power/kvm_vm/meson.build create mode 100644 drivers/power/meson.build create mode 100644 drivers/power/pstate/meson.build rename lib/power/power_pstate_cpufreq.c => drivers/power/pstate/pstate_cpufreq.c (96%) rename lib/power/power_pstate_cpufreq.h => drivers/power/pstate/pstate_cpufreq.h (98%) create mode 100644 lib/power/rte_power_core_ops.h --snip-- diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c index 36c3f3da98..8afb5949b9 100644 --- a/lib/power/rte_power.c +++ b/lib/power/rte_power.c @@ -8,153 +8,86 @@ #include #include "rte_power.h" -#include "power_acpi_cpufreq.h" -#include "power_cppc_cpufreq.h" #include "power_common.h" -#include "power_kvm_vm.h" -#include "power_pstate_cpufreq.h" -#include "power_amd_pstate_cpufreq.h" -enum power_management_env global_default_env = PM_ENV_NOT_SET; +static enum power_management_env global_default_env = PM_ENV_NOT_SET; +static struct rte_power_core_ops *global_power_core_ops; Suggest initialising this to NULL so we can check in rte_power_get_core_ops if it's null and throw an error. --snip-- +struct rte_power_core_ops * +rte_power_get_core_ops(void) +{ Need a check here to see if rte_power_get_core_ops is NULL. If it is, then the developer has probably called a frequency change API before the relevant init function, so throw an error. Also, all the functions that call this need to check if it returns NULL so as to avoid a segfault when they attempts to call the op function. + return global_power_core_op
Re: [PATCH v1 2/4] power: refactor uncore power management library
On 20/07/2024 17:50, Sivaprasad Tummala wrote: This patch refactors the power management library, addressing uncore power management. The primary changes involve the creation of dedicated directories for each driver within 'drivers/power/uncore/*'. The adjustment of meson.build files enables the selective activation of individual drivers. This refactor significantly improves code organization, enhances clarity and boosts maintainability. It lays the foundation for more focused development on individual drivers and facilitates seamless integration of future enhancements, particularly the AMD uncore driver. Signed-off-by: Sivaprasad Tummala --- .../power/intel_uncore/intel_uncore.c | 18 +- .../power/intel_uncore/intel_uncore.h | 8 +- drivers/power/intel_uncore/meson.build| 7 + drivers/power/meson.build | 3 +- lib/power/meson.build | 2 +- lib/power/rte_power_uncore.c | 206 ++- lib/power/rte_power_uncore.h | 91 --- lib/power/rte_power_uncore_ops.h | 239 ++ lib/power/version.map | 1 + 9 files changed, 406 insertions(+), 169 deletions(-) rename lib/power/power_intel_uncore.c => drivers/power/intel_uncore/intel_uncore.c (95%) rename lib/power/power_intel_uncore.h => drivers/power/intel_uncore/intel_uncore.h (97%) create mode 100644 drivers/power/intel_uncore/meson.build create mode 100644 lib/power/rte_power_uncore_ops.h --snip-- diff --git a/lib/power/rte_power_uncore.c b/lib/power/rte_power_uncore.c index 48c75a5da0..127f6ed212 100644 --- a/lib/power/rte_power_uncore.c +++ b/lib/power/rte_power_uncore.c @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: BSD-3-Clause * Copyright(c) 2010-2014 Intel Corporation - * Copyright(c) 2023 AMD Corporation + * Copyright(c) 2024 Advanced Micro Devices, Inc. */ --snip-- +struct rte_power_uncore_ops * +rte_power_get_uncore_ops(void) +{ + RTE_ASSERT(global_uncore_ops != NULL); I'm only seeing this now after sending the email for the first patch. This would be a good solution for the global_core_ops check in rte_power_get_core_ops() in rte_power.c, and would be the smaller change, rather than checking everywhere rte_power_get_env() is called. + + return global_uncore_ops; } --snip--
Re: [PATCH v1 4/4] power/amd_uncore: uncore power management support for AMD EPYC processors
On 20/07/2024 17:50, Sivaprasad Tummala wrote: This patch introduces driver support for power management of uncore components in AMD EPYC processors. Signed-off-by: Sivaprasad Tummala --- drivers/power/amd_uncore/amd_uncore.c | 321 ++ drivers/power/amd_uncore/amd_uncore.h | 226 ++ drivers/power/amd_uncore/meson.build | 20 ++ drivers/power/meson.build | 1 + 4 files changed, 568 insertions(+) create mode 100644 drivers/power/amd_uncore/amd_uncore.c create mode 100644 drivers/power/amd_uncore/amd_uncore.h create mode 100644 drivers/power/amd_uncore/meson.build diff --git a/drivers/power/amd_uncore/amd_uncore.c b/drivers/power/amd_uncore/amd_uncore.c new file mode 100644 index 00..f15eaaa307 --- /dev/null +++ b/drivers/power/amd_uncore/amd_uncore.c @@ -0,0 +1,321 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Advanced Micro Devices, Inc. + */ + +#include +#include +#include + +#include + +#include "amd_uncore.h" +#include "power_common.h" +#include "e_smi/e_smi.h" + +#define MAX_UNCORE_FREQS 8 +#define MAX_NUMA_DIE 8 + +#define BUS_FREQ 1000 + +struct __rte_cache_aligned uncore_power_info { + unsigned int die; /* Core die id */ + unsigned int pkg; /* Package id */ + uint32_t freqs[MAX_UNCORE_FREQS]; /* Frequency array */ + uint32_t nb_freqs; /* Number of available freqs */ + uint32_t curr_idx; /* Freq index in freqs array */ + uint32_t max_freq;/* System max uncore freq */ + uint32_t min_freq;/* System min uncore freq */ +}; + +static struct uncore_power_info uncore_info[RTE_MAX_NUMA_NODES][MAX_NUMA_DIE]; +static int esmi_initialized; + +static int +set_uncore_freq_internal(struct uncore_power_info *ui, uint32_t idx) +{ + int ret; + + if (idx >= MAX_UNCORE_FREQS || idx >= ui->nb_freqs) { + POWER_LOG(DEBUG, "Invalid uncore frequency index %u, which " + "should be less than %u", idx, ui->nb_freqs); + return -1; + } + + ret = esmi_apb_disable(ui->pkg, idx); + if (ret != ESMI_SUCCESS) { + POWER_LOG(ERR, "DF P-state '%u' set failed for pkg %02u", + idx, ui->pkg); + return -1; + } + + POWER_DEBUG_LOG("DF P-state '%u' to be set for pkg %02u die %02u", + idx, ui->pkg, ui->die); + + /* write the minimum value first if the target freq is less than current max */ + ui->curr_idx = idx; + + return 0; +} + +/* + * Fopen the sys file for the future setting of the uncore die frequency. + */ Comment may need updating, as function is not reading any sysfs files (for the moment, at least). +static int +power_init_for_setting_uncore_freq(struct uncore_power_info *ui) +{ + /* open and read all uncore sys files */ Comment may need updating, as function is not reading any sysfs files (for the moment, at least). + /* Base max */ + ui->max_freq = 180; + ui->min_freq = 120; + + return 0; +} + +/* + * Get the available uncore frequencies of the specific die by reading the + * sys file. + */ Comment may need updating, as function is not reading any sysfs files. 3 uncore frequencies hard-coded for the moment, may get via esmi or sysfs in the future. +static int +power_get_available_uncore_freqs(struct uncore_power_info *ui) +{ + int ret = -1; + uint32_t i, num_uncore_freqs = 3; + uint32_t fabric_freqs[] = { + /* to be extended for probing support in future */ + 1800, + 1444, + 1200 + }; + + if (num_uncore_freqs >= MAX_UNCORE_FREQS) { + POWER_LOG(ERR, "Too many available uncore frequencies: %d", + num_uncore_freqs); + goto out; + } + + /* Generate the uncore freq bucket array. */ + for (i = 0; i < num_uncore_freqs; i++) + ui->freqs[i] = fabric_freqs[i] * BUS_FREQ; + + ui->nb_freqs = num_uncore_freqs; + + ret = 0; + + POWER_DEBUG_LOG("%d frequency(s) of pkg %02u die %02u are available", + num_uncore_freqs, ui->pkg, ui->die); + +out: + return ret; +} + --snip--
Re: [PATCH v3] power: fix number of uncore freqs
On 22/07/2024 21:15, Stephen Hemminger wrote: The number of uncore frequencies was defined in three places, and two of them were too small leading to test failures. All places should be using RTE_MAX_UNCORE_FREQS. Bugzilla ID: 1499 Fixes: 60b8a661a957 ("power: add Intel uncore frequency control") Signed-off-by: Stephen Hemminger --- v3 - restore one change lost in V2 app/test/test_power_intel_uncore.c | 4 +--- lib/power/power_intel_uncore.c | 7 +++ 2 files changed, 4 insertions(+), 7 deletions(-) --snip-- LGTM. Acked-by: David Hunt
Re: [PATCH] examples/distributor: fix syntax on single core rx and distributor
Hi Ferruh, On 20/06/2022 18:10, Ferruh Yigit wrote: On 6/20/2022 5:31 PM, Abdullah Ömer Yamaç wrote: This patch fixes the syntax error when using the single-core for both rx and distributor functions. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: sta...@dpdk.org Signed-off-by: Abdullah Ömer Yamaç Acked-by: Ferruh Yigit Can we convert compile time macro to dynamic configuration to prevent build errors like this? +1 for the dynamic config. Maybe a "-c" on the command line for "combine rx and dist threads"? Tested-by: David Hunt
Re: [PATCH] examples/distributor: update dynamic configuration
Hi Ömer, I've a few comments: On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Syntax error of the single RX and distributor core is fixed. * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: sta...@dpdk.org Signed-off-by: Abdullah Ömer Yamaç --- Cc: david.h...@intel.com --- doc/guides/sample_app_ug/dist_app.rst | 3 +- examples/distributor/main.c | 205 +++--- 2 files changed, 152 insertions(+), 56 deletions(-) diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst index 3bd03905c3..5c80561187 100644 --- a/doc/guides/sample_app_ug/dist_app.rst +++ b/doc/guides/sample_app_ug/dist_app.rst @@ -42,11 +42,12 @@ Running the Application .. code-block:: console - .//examples/dpdk-distributor [EAL options] -- -p PORTMASK + .//examples/dpdk-distributor [EAL options] -- -p PORTMASK [-c] where, * -p PORTMASK: Hexadecimal bitmask of ports to configure + * -c: Combines the RX core with distribution core #. To run the application in linux environment with 10 lcores, 4 ports, issue the command: diff --git a/examples/distributor/main.c b/examples/distributor/main.c index 02bf91f555..6e98f78054 100644 --- a/examples/distributor/main.c +++ b/examples/distributor/main.c @@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx; volatile uint8_t quit_signal_dist; volatile uint8_t quit_signal_work; unsigned int power_lib_initialised; +bool enable_lcore_rx_distributor; static volatile struct app_stats { struct { @@ -256,14 +257,82 @@ lcore_rx(struct lcore_params *p) } app_stats.rx.rx_pkts += nb_rx; -/* - * You can run the distributor on the rx core with this code. Returned - * packets are then send straight to the tx core. - */ -#if 0 - rte_distributor_process(d, bufs, nb_rx); - const uint16_t nb_ret = rte_distributor_returned_pktsd, - bufs, BURST_SIZE*2); + /* +* Swap the following two lines if you want the rx traffic +* to go directly to tx, no distribution. +*/ + struct rte_ring *out_ring = p->rx_dist_ring; + /* struct rte_ring *out_ring = p->dist_tx_ring; */ + + uint16_t sent = rte_ring_enqueue_burst(out_ring, + (void *)bufs, nb_rx, NULL); + + app_stats.rx.enqueued_pkts += sent; + if (unlikely(sent < nb_rx)) { + app_stats.rx.enqdrop_pkts += nb_rx - sent; + RTE_LOG_DP(DEBUG, DISTRAPP, + "%s:Packet loss due to full ring\n", __func__); + while (sent < nb_rx) + rte_pktmbuf_free(bufs[sent++]); + } + if (++port == nb_ports) + port = 0; + } + if (power_lib_initialised) + rte_power_exit(rte_lcore_id()); + /* set worker & tx threads quit flag */ + printf("\nCore %u exiting rx task.\n", rte_lcore_id()); + quit_signal = 1; + return 0; +} + +static int +lcore_rx_and_distributor(struct lcore_params *p) +{ + struct rte_distributor *d = p->d; + const uint16_t nb_ports = rte_eth_dev_count_avail(); + const int socket_id = rte_socket_id(); + uint16_t port; + struct rte_mbuf *bufs[BURST_SIZE*2]; + + RTE_ETH_FOREACH_DEV(port) { + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0) + continue; + + if (rte_eth_dev_socket_id(port) > 0 && + rte_eth_dev_socket_id(port) != socket_id) + printf("WARNING, port %u is on remote NUMA node to " + "RX thread.\n\tPerformance will not " + "be optimal.\n", port); + } + + printf("\nCore %u doing packet RX and Distributor.\n", rte_lcore_id()); + port = 0; + while (!quit_signal_rx) { + + /* skip ports that are not enabled */ + if ((enabled_port_mask & (1 << port)) == 0) { + if (++port == nb_ports) + port = 0; + continue; + } + const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, + BURST_SIZE); + if (unlikely(nb_rx == 0)) { +
Re: [PATCH] examples/distributor: update dynamic configuration
Hi Ömer, On 27/06/2022 17:28, Omer Yamac wrote: Hi David, Thank you for your review. I have two questions. The first one is about new release. As I remember new DPDK realize will be published in a short time and my previous fix in that release. Therefore, Should I wait for that release to submit patch? Yes, You can wait, or you can submit to the mailing list now and mark the patch as "Deferred" in patchwork. Once 22.07 is released it will get marked as "New", and under consideration for 22.11. The other question is below, On 27.06.2022 18:51, Hunt, David wrote: Hi Ömer, I've a few comments: On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Syntax error of the single RX and distributor core is fixed. * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: sta...@dpdk.org Signed-off-by: Abdullah Ömer Yamaç --- --snip-- "1 lcore for stats (can be core 0)\n" @@ -733,6 +808,15 @@ main(int argc, char *argv[]) "1 lcore for packet TX\n" "and at least 1 lcore for worker threads\n"); + if (rte_lcore_count() < 4 && enable_lcore_rx_distributor) + rte_exit(EXIT_FAILURE, "Error, This application needs at " + "least 4 logical cores to run:\n" + "1 lcore for stats (can be core 0)\n" + "1 lcore for packet RX and distribution\n" + "1 lcore for packet TX\n" + "and at least 1 lcore for worker threads\n"); + the two checks above could be replaced with something like: min_cores = 4 + enable_lcore_rx_distributor; if (rte_lcore_count() < min_cores) rte_exit(EXIT_FAILURE, "Error, This application needs at " "least %d logical cores to run:\n" "1 lcore for stats (can be core 0)\n" "1 lcore for packet RX\n" "1 lcore for distribution\n" "1 lcore for packet TX\n" "and at least 1 lcore for worker threads\n", min_cores); Is it okay, if I change the error string such that: "1 or 2 lcore for packet RX and distribution" Sure, that's fine. --snip-- Thanks, Dave.
Re: [PATCH] examples/distributor: update dynamic configuration
On 28/06/2022 12:06, Omer Yamac wrote: Hi David, I have one more question. When I was working on new patch, I just want to make sure what we are doing. On 27.06.2022 18:51, Hunt, David wrote: Hi Ömer, I've a few comments: On 21/06/2022 21:15, Abdullah Ömer Yamaç wrote: --clipped-- @@ -39,6 +39,7 @@ volatile uint8_t quit_signal_rx; volatile uint8_t quit_signal_dist; volatile uint8_t quit_signal_work; unsigned int power_lib_initialised; +bool enable_lcore_rx_distributor; static volatile struct app_stats { struct { --clipped-- @@ -724,7 +794,12 @@ main(int argc, char *argv[]) if (ret < 0) rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n"); - if (rte_lcore_count() < 5) + if (enable_lcore_rx_distributor) + num_workers = rte_lcore_count() - 3; + else + num_workers = rte_lcore_count() - 4; + This could be "num_workers = rte_lcore_count() - (4 - enable_lcore_rx_distributor)". For the "if-else" case of enable_lcore_rx_distributor, we will reduce the line of codes; but I am not sure about that change. Because the type of the variable is bool and we are using arithmetic operation on that variable. I think it is a little bit harder for people to understand operation. Am I right? I can suggest one more solution. We may change the data type to "unsigned int" or Is it okay to leave as before? --clipped-- Hi Ömer, You raise a good point about readability. Let's leave it as you had it originally. Maybe just add a couple of one-line comments? "rx and distributor combined, 3 fixed function cores" and "separate rx and distributor, 4 fixed function cores? Rgds, Dave.
Re: [PATCH] examples/distributor: update dynamic configuration
On 28/06/2022 13:06, Omer Yamac wrote: Hi, Here is the final version. If it is ok, I will test the code and publish. if (enable_lcore_rx_distributor){ // rx and distributor combined, 3 fixed function cores (stat, TX, at least 1 worker) min_cores = 4; num_workers = rte_lcore_count() - 3; } else{ // separate rx and distributor, 3 fixed function cores (stat, TX, at least 1 worker) min_cores = 5; num_workers = rte_lcore_count() - 4; } Hi Ömer, Please use standard comment formatting (look elsewhere for examples). I'm not sure you need min_cores any more. Rgds, Dave. --snip--
Re: [PATCH 2/2] examples/vm_power_manager: use safe version of list iterator
On 01/06/2022 11:54, Hamza Khan wrote: Currently, when vm_power_manager exits, we are using a LIST_FOREACH macro to iterate over VM info structures while freeing them. This leads to use-after-free error. To address this, use the newly added LIST_FOREACH_SAFE macro. Fixes: e8ae9b662506 ("examples/vm_power: channel manager and monitor in host") Cc: alan.ca...@intel.com Cc: sta...@dpdk.org Signed-off-by: Hamza Khan --- examples/vm_power_manager/channel_manager.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index 838465ab4b..bc95cec8d6 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -1005,9 +1005,9 @@ channel_manager_exit(void) { unsigned i; char mask[RTE_MAX_LCORE]; - struct virtual_machine_info *vm_info; + struct virtual_machine_info *vm_info, *tmp; - LIST_FOREACH(vm_info, &vm_list_head, vms_info) { + LIST_FOREACH_SAFE(vm_info, &vm_list_head, vms_info, tmp) { rte_spinlock_lock(&(vm_info->config_spinlock)); Acked-by: David Hunt
Re: [PATCH v3] examples/vm_power_manager: use safe version of list iterator
On 08/07/2022 09:51, Hamza Khan wrote: Currently, when vm_power_manager exits, we are using a LIST_FOREACH macro to iterate over VM info structures while freeing them. This leads to use-after-free error. To address this, replace all usages of LIST_* with TAILQ_* macros, and use the RTE_TAILQ_FOREACH_SAFE macro to iterate and delete VM info structures. * The change is small and doesn’t affect other code * Testing was performed on the patch Fixes: e8ae9b662506 ("examples/vm_power: channel manager and monitor in host") Cc: alan.ca...@intel.com Cc: sta...@dpdk.org Signed-off-by: Hamza Khan --- V3: Update commit message V2: Use RTE_TAILQ_* marcos --- examples/vm_power_manager/channel_manager.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index 838465ab4b..e82c26ddca 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -29,6 +29,8 @@ #include "channel_monitor.h" #include "power_manager.h" +#include "rte_tailq.h" + #define RTE_LOGTYPE_CHANNEL_MANAGER RTE_LOGTYPE_USER1 @@ -58,16 +60,16 @@ struct virtual_machine_info { virDomainInfo info; rte_spinlock_t config_spinlock; int allow_query; - LIST_ENTRY(virtual_machine_info) vms_info; + RTE_TAILQ_ENTRY(virtual_machine_info) vms_info; }; -LIST_HEAD(, virtual_machine_info) vm_list_head; +RTE_TAILQ_HEAD(, virtual_machine_info) vm_list_head; static struct virtual_machine_info * find_domain_by_name(const char *name) { struct virtual_machine_info *info; - LIST_FOREACH(info, &vm_list_head, vms_info) { + RTE_TAILQ_FOREACH(info, &vm_list_head, vms_info) { if (!strncmp(info->name, name, CHANNEL_MGR_MAX_NAME_LEN-1)) return info; } @@ -878,7 +880,7 @@ add_vm(const char *vm_name) new_domain->allow_query = 0; rte_spinlock_init(&(new_domain->config_spinlock)); - LIST_INSERT_HEAD(&vm_list_head, new_domain, vms_info); + TAILQ_INSERT_HEAD(&vm_list_head, new_domain, vms_info); return 0; } @@ -900,7 +902,7 @@ remove_vm(const char *vm_name) rte_spinlock_unlock(&vm_info->config_spinlock); return -1; } - LIST_REMOVE(vm_info, vms_info); + TAILQ_REMOVE(&vm_list_head, vm_info, vms_info); rte_spinlock_unlock(&vm_info->config_spinlock); rte_free(vm_info); return 0; @@ -953,7 +955,7 @@ channel_manager_init(const char *path __rte_unused) { virNodeInfo info; - LIST_INIT(&vm_list_head); + TAILQ_INIT(&vm_list_head); if (connect_hypervisor(path) < 0) { global_n_host_cpus = 64; global_hypervisor_available = 0; @@ -1005,9 +1007,9 @@ channel_manager_exit(void) { unsigned i; char mask[RTE_MAX_LCORE]; - struct virtual_machine_info *vm_info; + struct virtual_machine_info *vm_info, *tmp; - LIST_FOREACH(vm_info, &vm_list_head, vms_info) { + RTE_TAILQ_FOREACH_SAFE(vm_info, &vm_list_head, vms_info, tmp) { rte_spinlock_lock(&(vm_info->config_spinlock)); @@ -1022,7 +1024,7 @@ channel_manager_exit(void) } rte_spinlock_unlock(&(vm_info->config_spinlock)); - LIST_REMOVE(vm_info, vms_info); + TAILQ_REMOVE(&vm_list_head, vm_info, vms_info); rte_free(vm_info); } Acked-by: David Hunt
Re: [PATCH] examples/l3fwd-power: support CPPC cpufreq
On 22/02/2023 02:13, Jie Hai wrote: Hi, David Hunt, Kindly ping. Could you please take a look at this patch? Thanks, Jie Hai On 2023/1/31 10:58, Jie Hai wrote: Currently the l3fwd-power only supports ACPI cpufreq and Pstate cpufreq, This patch adds CPPC cpufreq. Signed-off-by: Jie Hai Hi, Jie Hai, Apologies, this patch never got to my inbox. Looks good to me. Acked-by: David Hunt
Re: [PATCH v4 09/19] remove repeated word 'worker'
On 22/02/2023 16:25, Stephen Hemminger wrote: Found by doing duplicate word scan. Signed-off-by: Stephen Hemminger --- lib/distributor/rte_distributor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/distributor/rte_distributor.c b/lib/distributor/rte_distributor.c index 3969b2ea7e8a..c1a721dd821d 100644 --- a/lib/distributor/rte_distributor.c +++ b/lib/distributor/rte_distributor.c @@ -576,7 +576,7 @@ rte_distributor_process(struct rte_distributor *d, } } -/* Add to current worker worker */ + /* Add to current worker */ unsigned int idx = bl->count++; bl->tags[idx] = new_tag; Acked-by: David Hunt
Re: [PATCH v4 05/19] remove repeated word 'on'
On 22/02/2023 16:25, Stephen Hemminger wrote: Found by doing duplicate word scan. Signed-off-by: Stephen Hemminger --- drivers/net/bonding/rte_eth_bond_8023ad.h | 2 +- examples/vm_power_manager/channel_monitor.h | 2 +- examples/vm_power_manager/oob_monitor.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.h b/drivers/net/bonding/rte_eth_bond_8023ad.h index 7eb392f8c8a1..7ad8d6d00bd5 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.h +++ b/drivers/net/bonding/rte_eth_bond_8023ad.h @@ -271,7 +271,7 @@ rte_eth_bond_8023ad_ext_slowtx(uint16_t port_id, uint16_t slave_id, struct rte_mbuf *lacp_pkt); /** - * Enable dedicated hw queues for 802.3ad control plane traffic on on slaves + * Enable dedicated hw queues for 802.3ad control plane traffic on slaves * * This function creates an additional tx and rx queue on each slave for * dedicated 802.3ad control plane traffic . A flow filtering rule is diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index 2b38c554b5cd..ab69524af52c 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -41,7 +41,7 @@ extern "C" { int channel_monitor_init(void); /** - * Run the channel monitor, loops forever on on epoll_wait. + * Run the channel monitor, loops forever on epoll_wait. * * * @return diff --git a/examples/vm_power_manager/oob_monitor.h b/examples/vm_power_manager/oob_monitor.h index b96e08df782c..2389c1151956 100644 --- a/examples/vm_power_manager/oob_monitor.h +++ b/examples/vm_power_manager/oob_monitor.h @@ -20,7 +20,7 @@ extern "C" { int branch_monitor_init(void); /** - * Run the OOB branch monitor, loops forever on on epoll_wait. + * Run the OOB branch monitor, loops forever on epoll_wait. * * * @return Acked-by: David Hunt
Re: [PATCH v4 02/19] remove repeated word 'to'
On 22/02/2023 16:25, Stephen Hemminger wrote: Found by doing duplicate word scan. Signed-off-by: Stephen Hemminger --- app/test/test_resource.c | 2 +- lib/pipeline/rte_swx_ctl.c | 2 +- lib/power/guest_channel.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/app/test/test_resource.c b/app/test/test_resource.c index 8f41e3babdc5..05c27db203cc 100644 --- a/app/test/test_resource.c +++ b/app/test/test_resource.c @@ -45,7 +45,7 @@ static int test_resource_c(void) r->name); TEST_ASSERT_SUCCESS(resource_fwrite_file(r, "test_resource.c"), - "Failed to to write file %s", r->name); + "Failed to write file %s", r->name); f = fopen("test_resource.c", "r"); TEST_ASSERT_NOT_NULL(f, diff --git a/lib/pipeline/rte_swx_ctl.c b/lib/pipeline/rte_swx_ctl.c index f7f49d7cebad..857770d297a4 100644 --- a/lib/pipeline/rte_swx_ctl.c +++ b/lib/pipeline/rte_swx_ctl.c @@ -1683,7 +1683,7 @@ rte_swx_ctl_pipeline_table_entry_delete(struct rte_swx_ctl_pipeline *ctl, CHECK(!table_entry_check(ctl, table_id, entry, 1, 0), EINVAL); /* The entry is found in the table->entries list: -* - Move the existing entry from the table->entries list to to the +* - Move the existing entry from the table->entries list to the * table->pending_delete list. */ existing_entry = table_entries_find(table, entry); diff --git a/lib/power/guest_channel.c b/lib/power/guest_channel.c index 969a9e5aaa06..7b2ae0b6506f 100644 --- a/lib/power/guest_channel.c +++ b/lib/power/guest_channel.c @@ -74,7 +74,7 @@ guest_channel_host_connect(const char *path, unsigned int lcore_id) fd_path, lcore_id); fd = open(fd_path, O_RDWR); if (fd < 0) { - RTE_LOG(ERR, GUEST_CHANNEL, "Unable to to connect to '%s' with error " + RTE_LOG(ERR, GUEST_CHANNEL, "Unable to connect to '%s' with error " "%s\n", fd_path, strerror(errno)); return -1; } Acked-by: David Hunt
Re: [PATCH v9 09/22] power: replace RTE_LOGTYPE_POWER with dynamic type
On 21/02/2023 19:01, Stephen Hemminger wrote: Use dynamic log type for power library. Also replace use of RTE_LOGTYPE_USER1 with lib.power.guest. Signed-off-by: Stephen Hemminger --- lib/eal/common/eal_common_log.c | 1 - lib/eal/include/rte_log.h | 2 +- lib/power/guest_channel.c | 3 ++- lib/power/power_common.c| 2 ++ lib/power/power_common.h| 3 ++- lib/power/power_kvm_vm.c| 1 + lib/power/rte_power.c | 1 + 7 files changed, 9 insertions(+), 4 deletions(-) diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c index 9e853addb717..39e1e6680dea 100644 --- a/lib/eal/common/eal_common_log.c +++ b/lib/eal/common/eal_common_log.c @@ -355,7 +355,6 @@ static const struct logtype logtype_strings[] = { {RTE_LOGTYPE_HASH, "lib.hash"}, {RTE_LOGTYPE_LPM,"lib.lpm"}, {RTE_LOGTYPE_KNI,"lib.kni"}, - {RTE_LOGTYPE_POWER, "lib.power"}, {RTE_LOGTYPE_METER, "lib.meter"}, {RTE_LOGTYPE_SCHED, "lib.sched"}, {RTE_LOGTYPE_PORT, "lib.port"}, diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h index 1408722b2c2f..7d4345acceca 100644 --- a/lib/eal/include/rte_log.h +++ b/lib/eal/include/rte_log.h @@ -36,7 +36,7 @@ extern "C" { #define RTE_LOGTYPE_LPM7 /**< Log related to LPM. */ #define RTE_LOGTYPE_KNI8 /**< Log related to KNI. */ /* was RTE_LOGTYPE_ACL */ -#define RTE_LOGTYPE_POWER 10 /**< Log related to power. */ +/* was RTE_LOGTYPE_POWER */ #define RTE_LOGTYPE_METER 11 /**< Log related to QoS meter. */ #define RTE_LOGTYPE_SCHED 12 /**< Log related to QoS port scheduler. */ #define RTE_LOGTYPE_PORT 13 /**< Log related to port. */ diff --git a/lib/power/guest_channel.c b/lib/power/guest_channel.c index 969a9e5aaa06..efc326d520ca 100644 --- a/lib/power/guest_channel.c +++ b/lib/power/guest_channel.c @@ -17,7 +17,8 @@ #include "guest_channel.h" -#define RTE_LOGTYPE_GUEST_CHANNEL RTE_LOGTYPE_USER1 +RTE_LOG_REGISTER_SUFFIX(guest_channel_logtype, guest, INFO); +#define RTE_LOGTYPE_GUEST_CHANNEL guest_channel_logtype /* Timeout for incoming message in milliseconds. */ #define TIMEOUT 10 diff --git a/lib/power/power_common.c b/lib/power/power_common.c index 1e09facb863f..bf77eafa886b 100644 --- a/lib/power/power_common.c +++ b/lib/power/power_common.c @@ -12,6 +12,8 @@ #include "power_common.h" +RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO); + #define POWER_SYSFILE_SCALING_DRIVER \ "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver" #define POWER_SYSFILE_GOVERNOR \ diff --git a/lib/power/power_common.h b/lib/power/power_common.h index c1c713927621..63a3a443509e 100644 --- a/lib/power/power_common.h +++ b/lib/power/power_common.h @@ -5,11 +5,12 @@ #ifndef _POWER_COMMON_H_ #define _POWER_COMMON_H_ - #include #define RTE_POWER_INVALID_FREQ_INDEX (~0) +extern int power_logtype; +#define RTE_LOGTYPE_POWER power_logtype #ifdef RTE_LIBRTE_POWER_DEBUG #define POWER_DEBUG_TRACE(fmt, args...) \ diff --git a/lib/power/power_kvm_vm.c b/lib/power/power_kvm_vm.c index 6a8109d44959..db031f43105a 100644 --- a/lib/power/power_kvm_vm.c +++ b/lib/power/power_kvm_vm.c @@ -8,6 +8,7 @@ #include "rte_power_guest_channel.h" #include "guest_channel.h" +#include "power_common.h" #include "power_kvm_vm.h" #define FD_PATH "/dev/virtio-ports/virtio.serial.port.poweragent" diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c index 63a43bd8f5ae..db0e7705a9ef 100644 --- a/lib/power/rte_power.c +++ b/lib/power/rte_power.c @@ -10,6 +10,7 @@ #include "rte_power.h" #include "power_acpi_cpufreq.h" #include "power_cppc_cpufreq.h" +#include "power_common.h" #include "power_kvm_vm.h" #include "power_pstate_cpufreq.h" Acked-by: David Hunt
Re: [PATCH v9 08/22] examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
On 21/02/2023 19:01, Stephen Hemminger wrote: Convert to using a dynamic logtype for the application. Signed-off-by: Stephen Hemminger --- examples/l3fwd-power/main.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index f57c54d2b57c..76b890b76c88 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -51,7 +51,8 @@ #include "perf_core.h" #include "main.h" -#define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 +RTE_LOG_REGISTER(l3fwd_power_logtype, l3fwd.power, INFO); +#define RTE_LOGTYPE_L3FWD_POWER l3fwd_power_logtype #define MAX_PKT_BURST 32 @@ -2236,7 +2237,7 @@ init_power_library(void) /* init power management library */ ret = rte_power_init(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Library initialization failed on core %u\n", lcore_id); return ret; @@ -2245,7 +2246,7 @@ init_power_library(void) env = rte_power_get_env(); if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Only ACPI and PSTATE mode are supported\n"); return -1; } @@ -2263,7 +2264,7 @@ deinit_power_library(void) /* deinit power management library */ ret = rte_power_exit(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Library deinitialization failed on core %u\n", lcore_id); return ret; @@ -2332,7 +2333,7 @@ update_telemetry(__rte_unused struct rte_timer *tim, ret = rte_metrics_update_values(RTE_METRICS_GLOBAL, telstats_index, values, RTE_DIM(values)); if (ret < 0) - RTE_LOG(WARNING, POWER, "failed to update metrics\n"); + RTE_LOG(WARNING, L3FWD_POWER, "failed to update metrics\n"); } static int @@ -2381,7 +2382,7 @@ launch_timer(unsigned int lcore_id) rte_get_main_lcore()); } - RTE_LOG(INFO, POWER, "Bring up the Timer\n"); + RTE_LOG(INFO, L3FWD_POWER, "Bring up the Timer\n"); telemetry_setup_timer(); @@ -2397,7 +2398,7 @@ launch_timer(unsigned int lcore_id) } } - RTE_LOG(INFO, POWER, "Timer_subsystem is done\n"); + RTE_LOG(INFO, L3FWD_POWER, "Timer_subsystem is done\n"); return 0; } Acked-by: David Hunt
Re: [PATCH v9 07/22] examples/power: replace use of RTE_LOGTYPE_POWER
On 21/02/2023 19:01, Stephen Hemminger wrote: Don't use static logtype in sample application. Signed-off-by: Stephen Hemminger --- examples/distributor/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/distributor/main.c b/examples/distributor/main.c index 21304d661873..542f76cf9664 100644 --- a/examples/distributor/main.c +++ b/examples/distributor/main.c @@ -679,7 +679,7 @@ init_power_library(void) /* init power management library */ ret = rte_power_init(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + fprintf(stderr, "Library initialization failed on core %u\n", lcore_id); /* Acked-by: David Hunt
Re: [PATCH v7 09/22] power: replace RTE_LOGTYPE_POWER with dynamic type
On 15/02/2023 17:23, Stephen Hemminger wrote: Use dynamic log type for power library. Also replace use of RTE_LOGTYPE_USER1 with lib.power.guest. Signed-off-by: Stephen Hemminger --- lib/eal/common/eal_common_log.c | 1 - lib/eal/include/rte_log.h| 2 +- lib/power/guest_channel.c| 3 ++- lib/power/power_common.c | 2 ++ lib/power/power_common.h | 3 ++- lib/power/power_kvm_vm.c | 1 + lib/power/rte_power.c| 1 + lib/power/rte_power_empty_poll.c | 1 + 8 files changed, 10 insertions(+), 4 deletions(-) diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c index 9e853addb717..39e1e6680dea 100644 --- a/lib/eal/common/eal_common_log.c +++ b/lib/eal/common/eal_common_log.c @@ -355,7 +355,6 @@ static const struct logtype logtype_strings[] = { {RTE_LOGTYPE_HASH, "lib.hash"}, {RTE_LOGTYPE_LPM,"lib.lpm"}, {RTE_LOGTYPE_KNI,"lib.kni"}, - {RTE_LOGTYPE_POWER, "lib.power"}, {RTE_LOGTYPE_METER, "lib.meter"}, {RTE_LOGTYPE_SCHED, "lib.sched"}, {RTE_LOGTYPE_PORT, "lib.port"}, diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h index 1408722b2c2f..7d4345acceca 100644 --- a/lib/eal/include/rte_log.h +++ b/lib/eal/include/rte_log.h @@ -36,7 +36,7 @@ extern "C" { #define RTE_LOGTYPE_LPM7 /**< Log related to LPM. */ #define RTE_LOGTYPE_KNI8 /**< Log related to KNI. */ /* was RTE_LOGTYPE_ACL */ -#define RTE_LOGTYPE_POWER 10 /**< Log related to power. */ +/* was RTE_LOGTYPE_POWER */ #define RTE_LOGTYPE_METER 11 /**< Log related to QoS meter. */ #define RTE_LOGTYPE_SCHED 12 /**< Log related to QoS port scheduler. */ #define RTE_LOGTYPE_PORT 13 /**< Log related to port. */ diff --git a/lib/power/guest_channel.c b/lib/power/guest_channel.c index 969a9e5aaa06..efc326d520ca 100644 --- a/lib/power/guest_channel.c +++ b/lib/power/guest_channel.c @@ -17,7 +17,8 @@ #include "guest_channel.h" -#define RTE_LOGTYPE_GUEST_CHANNEL RTE_LOGTYPE_USER1 +RTE_LOG_REGISTER_SUFFIX(guest_channel_logtype, guest, INFO); +#define RTE_LOGTYPE_GUEST_CHANNEL guest_channel_logtype /* Timeout for incoming message in milliseconds. */ #define TIMEOUT 10 diff --git a/lib/power/power_common.c b/lib/power/power_common.c index 1e09facb863f..bf77eafa886b 100644 --- a/lib/power/power_common.c +++ b/lib/power/power_common.c @@ -12,6 +12,8 @@ #include "power_common.h" +RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO); + #define POWER_SYSFILE_SCALING_DRIVER \ "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver" #define POWER_SYSFILE_GOVERNOR \ diff --git a/lib/power/power_common.h b/lib/power/power_common.h index c1c713927621..63a3a443509e 100644 --- a/lib/power/power_common.h +++ b/lib/power/power_common.h @@ -5,11 +5,12 @@ #ifndef _POWER_COMMON_H_ #define _POWER_COMMON_H_ - #include #define RTE_POWER_INVALID_FREQ_INDEX (~0) +extern int power_logtype; +#define RTE_LOGTYPE_POWER power_logtype #ifdef RTE_LIBRTE_POWER_DEBUG #define POWER_DEBUG_TRACE(fmt, args...) \ diff --git a/lib/power/power_kvm_vm.c b/lib/power/power_kvm_vm.c index 6a8109d44959..db031f43105a 100644 --- a/lib/power/power_kvm_vm.c +++ b/lib/power/power_kvm_vm.c @@ -8,6 +8,7 @@ #include "rte_power_guest_channel.h" #include "guest_channel.h" +#include "power_common.h" #include "power_kvm_vm.h" #define FD_PATH "/dev/virtio-ports/virtio.serial.port.poweragent" diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c index 63a43bd8f5ae..db0e7705a9ef 100644 --- a/lib/power/rte_power.c +++ b/lib/power/rte_power.c @@ -10,6 +10,7 @@ #include "rte_power.h" #include "power_acpi_cpufreq.h" #include "power_cppc_cpufreq.h" +#include "power_common.h" #include "power_kvm_vm.h" #include "power_pstate_cpufreq.h" diff --git a/lib/power/rte_power_empty_poll.c b/lib/power/rte_power_empty_poll.c index 4a4db512474e..64e3b4b46d5e 100644 --- a/lib/power/rte_power_empty_poll.c +++ b/lib/power/rte_power_empty_poll.c @@ -10,6 +10,7 @@ #include "rte_power.h" #include "rte_power_empty_poll.h" +#include "power_common.h" #define INTERVALS_PER_SECOND 100 /* (10ms) */ #define SECONDS_TO_TRAIN_FOR 2 Acked-by: David Hunt
Re: [PATCH v10 09/22] power: replace RTE_LOGTYPE_POWER with dynamic type
On 22/02/2023 16:07, Stephen Hemminger wrote: Use dynamic log type for power library. Also replace use of RTE_LOGTYPE_USER1 with lib.power.guest. Signed-off-by: Stephen Hemminger --- lib/eal/common/eal_common_log.c | 1 - lib/eal/include/rte_log.h | 2 +- lib/power/guest_channel.c | 3 ++- lib/power/power_common.c| 2 ++ lib/power/power_common.h| 3 ++- lib/power/power_kvm_vm.c| 1 + lib/power/rte_power.c | 1 + 7 files changed, 9 insertions(+), 4 deletions(-) diff --git a/lib/eal/common/eal_common_log.c b/lib/eal/common/eal_common_log.c index 9e853addb717..39e1e6680dea 100644 --- a/lib/eal/common/eal_common_log.c +++ b/lib/eal/common/eal_common_log.c @@ -355,7 +355,6 @@ static const struct logtype logtype_strings[] = { {RTE_LOGTYPE_HASH, "lib.hash"}, {RTE_LOGTYPE_LPM,"lib.lpm"}, {RTE_LOGTYPE_KNI,"lib.kni"}, - {RTE_LOGTYPE_POWER, "lib.power"}, {RTE_LOGTYPE_METER, "lib.meter"}, {RTE_LOGTYPE_SCHED, "lib.sched"}, {RTE_LOGTYPE_PORT, "lib.port"}, diff --git a/lib/eal/include/rte_log.h b/lib/eal/include/rte_log.h index 1408722b2c2f..7d4345acceca 100644 --- a/lib/eal/include/rte_log.h +++ b/lib/eal/include/rte_log.h @@ -36,7 +36,7 @@ extern "C" { #define RTE_LOGTYPE_LPM7 /**< Log related to LPM. */ #define RTE_LOGTYPE_KNI8 /**< Log related to KNI. */ /* was RTE_LOGTYPE_ACL */ -#define RTE_LOGTYPE_POWER 10 /**< Log related to power. */ +/* was RTE_LOGTYPE_POWER */ #define RTE_LOGTYPE_METER 11 /**< Log related to QoS meter. */ #define RTE_LOGTYPE_SCHED 12 /**< Log related to QoS port scheduler. */ #define RTE_LOGTYPE_PORT 13 /**< Log related to port. */ diff --git a/lib/power/guest_channel.c b/lib/power/guest_channel.c index 969a9e5aaa06..efc326d520ca 100644 --- a/lib/power/guest_channel.c +++ b/lib/power/guest_channel.c @@ -17,7 +17,8 @@ #include "guest_channel.h" -#define RTE_LOGTYPE_GUEST_CHANNEL RTE_LOGTYPE_USER1 +RTE_LOG_REGISTER_SUFFIX(guest_channel_logtype, guest, INFO); +#define RTE_LOGTYPE_GUEST_CHANNEL guest_channel_logtype /* Timeout for incoming message in milliseconds. */ #define TIMEOUT 10 diff --git a/lib/power/power_common.c b/lib/power/power_common.c index 1e09facb863f..bf77eafa886b 100644 --- a/lib/power/power_common.c +++ b/lib/power/power_common.c @@ -12,6 +12,8 @@ #include "power_common.h" +RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO); + #define POWER_SYSFILE_SCALING_DRIVER \ "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver" #define POWER_SYSFILE_GOVERNOR \ diff --git a/lib/power/power_common.h b/lib/power/power_common.h index c1c713927621..63a3a443509e 100644 --- a/lib/power/power_common.h +++ b/lib/power/power_common.h @@ -5,11 +5,12 @@ #ifndef _POWER_COMMON_H_ #define _POWER_COMMON_H_ - #include #define RTE_POWER_INVALID_FREQ_INDEX (~0) +extern int power_logtype; +#define RTE_LOGTYPE_POWER power_logtype #ifdef RTE_LIBRTE_POWER_DEBUG #define POWER_DEBUG_TRACE(fmt, args...) \ diff --git a/lib/power/power_kvm_vm.c b/lib/power/power_kvm_vm.c index 6a8109d44959..db031f43105a 100644 --- a/lib/power/power_kvm_vm.c +++ b/lib/power/power_kvm_vm.c @@ -8,6 +8,7 @@ #include "rte_power_guest_channel.h" #include "guest_channel.h" +#include "power_common.h" #include "power_kvm_vm.h" #define FD_PATH "/dev/virtio-ports/virtio.serial.port.poweragent" diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c index 63a43bd8f5ae..db0e7705a9ef 100644 --- a/lib/power/rte_power.c +++ b/lib/power/rte_power.c @@ -10,6 +10,7 @@ #include "rte_power.h" #include "power_acpi_cpufreq.h" #include "power_cppc_cpufreq.h" +#include "power_common.h" #include "power_kvm_vm.h" #include "power_pstate_cpufreq.h" Acked-by: David Hunt
Re: [PATCH v10 08/22] examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
On 22/02/2023 16:07, Stephen Hemminger wrote: Convert to using a dynamic logtype for the application. Signed-off-by: Stephen Hemminger --- examples/l3fwd-power/main.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index f57c54d2b57c..76b890b76c88 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -51,7 +51,8 @@ #include "perf_core.h" #include "main.h" -#define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 +RTE_LOG_REGISTER(l3fwd_power_logtype, l3fwd.power, INFO); +#define RTE_LOGTYPE_L3FWD_POWER l3fwd_power_logtype #define MAX_PKT_BURST 32 @@ -2236,7 +2237,7 @@ init_power_library(void) /* init power management library */ ret = rte_power_init(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Library initialization failed on core %u\n", lcore_id); return ret; @@ -2245,7 +2246,7 @@ init_power_library(void) env = rte_power_get_env(); if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Only ACPI and PSTATE mode are supported\n"); return -1; } @@ -2263,7 +2264,7 @@ deinit_power_library(void) /* deinit power management library */ ret = rte_power_exit(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + RTE_LOG(ERR, L3FWD_POWER, "Library deinitialization failed on core %u\n", lcore_id); return ret; @@ -2332,7 +2333,7 @@ update_telemetry(__rte_unused struct rte_timer *tim, ret = rte_metrics_update_values(RTE_METRICS_GLOBAL, telstats_index, values, RTE_DIM(values)); if (ret < 0) - RTE_LOG(WARNING, POWER, "failed to update metrics\n"); + RTE_LOG(WARNING, L3FWD_POWER, "failed to update metrics\n"); } static int @@ -2381,7 +2382,7 @@ launch_timer(unsigned int lcore_id) rte_get_main_lcore()); } - RTE_LOG(INFO, POWER, "Bring up the Timer\n"); + RTE_LOG(INFO, L3FWD_POWER, "Bring up the Timer\n"); telemetry_setup_timer(); @@ -2397,7 +2398,7 @@ launch_timer(unsigned int lcore_id) } } - RTE_LOG(INFO, POWER, "Timer_subsystem is done\n"); + RTE_LOG(INFO, L3FWD_POWER, "Timer_subsystem is done\n"); return 0; } Acked-by: David Hunt
Re: [PATCH v10 07/22] examples/power: replace use of RTE_LOGTYPE_POWER
On 22/02/2023 16:07, Stephen Hemminger wrote: Don't use static logtype in sample application. Signed-off-by: Stephen Hemminger --- examples/distributor/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/distributor/main.c b/examples/distributor/main.c index 21304d661873..542f76cf9664 100644 --- a/examples/distributor/main.c +++ b/examples/distributor/main.c @@ -679,7 +679,7 @@ init_power_library(void) /* init power management library */ ret = rte_power_init(lcore_id); if (ret) { - RTE_LOG(ERR, POWER, + fprintf(stderr, "Library initialization failed on core %u\n", lcore_id); /* Acked-by: David Hunt
Re: [RFC PATCH 1/2] power: refactor core power management library
On 01/03/2024 02:56, lihuisong (C) wrote: 在 2024/2/20 23:33, Sivaprasad Tummala 写道: This patch introduces a comprehensive refactor to the core power management library. The primary focus is on improving modularity and organization by relocating specific driver implementations from the 'lib/power' directory to dedicated directories within 'drivers/power/core/*'. The adjustment of meson.build files enables the selective activation of individual drivers. These changes contribute to a significant enhancement in code organization, providing a clearer structure for driver implementations. The refactor aims to improve overall code clarity and boost maintainability. Additionally, it establishes a foundation for future development, allowing for more focused work on individual drivers and seamless integration of forthcoming enhancements. Good job. +1 to refacotor. <...> Also a +1 from me, looks like a sensible re-organisation of the power code. Regards, Dave. diff --git a/drivers/meson.build b/drivers/meson.build index f2be71bc05..e293c3945f 100644 --- a/drivers/meson.build +++ b/drivers/meson.build @@ -28,6 +28,7 @@ subdirs = [ 'event', # depends on common, bus, mempool and net. 'baseband', # depends on common and bus. 'gpu', # depends on common and bus. + 'power', # depends on common (in future). ] if meson.is_cross_build() diff --git a/drivers/power/core/acpi/meson.build b/drivers/power/core/acpi/meson.build new file mode 100644 index 00..d10ec8ee94 --- /dev/null +++ b/drivers/power/core/acpi/meson.build @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 AMD Limited + +sources = files('power_acpi_cpufreq.c') + +headers = files('power_acpi_cpufreq.h') + +deps += ['power'] diff --git a/lib/power/power_acpi_cpufreq.c b/drivers/power/core/acpi/power_acpi_cpufreq.c similarity index 95% rename from lib/power/power_acpi_cpufreq.c rename to drivers/power/core/acpi/power_acpi_cpufreq.c This file is in power lib. How about remove the 'power' prefix of this file name? like acpi_cpufreq.c, cppc_cpufreq.c. index f8d978d03d..69d80ad2ae 100644 --- a/lib/power/power_acpi_cpufreq.c +++ b/drivers/power/core/acpi/power_acpi_cpufreq.c @@ -577,3 +577,22 @@ int power_acpi_get_capabilities(unsigned int lcore_id, return 0; } + +static struct rte_power_ops acpi_ops = { How about use the following structure name? "struct rte_power_cpufreq_ops" or "struct rte_power_core_ops" After all, we also have other power ops, like uncore, right? + .init = power_acpi_cpufreq_init, + .exit = power_acpi_cpufreq_exit, + .check_env_support = power_acpi_cpufreq_check_supported, + .get_avail_freqs = power_acpi_cpufreq_freqs, + .get_freq = power_acpi_cpufreq_get_freq, + .set_freq = power_acpi_cpufreq_set_freq, + .freq_down = power_acpi_cpufreq_freq_down, + .freq_up = power_acpi_cpufreq_freq_up, + .freq_max = power_acpi_cpufreq_freq_max, + .freq_min = power_acpi_cpufreq_freq_min, + .turbo_status = power_acpi_turbo_status, + .enable_turbo = power_acpi_enable_turbo, + .disable_turbo = power_acpi_disable_turbo, + .get_caps = power_acpi_get_capabilities +}; + +RTE_POWER_REGISTER_OPS(acpi_ops); diff --git a/lib/power/power_acpi_cpufreq.h b/drivers/power/core/acpi/power_acpi_cpufreq.h similarity index 100% rename from lib/power/power_acpi_cpufreq.h rename to drivers/power/core/acpi/power_acpi_cpufreq.h diff --git a/drivers/power/core/amd-pstate/meson.build b/drivers/power/core/amd-pstate/meson.build new file mode 100644 index 00..8ec4c960f5 --- /dev/null +++ b/drivers/power/core/amd-pstate/meson.build @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2024 AMD Limited + +sources = files('power_amd_pstate_cpufreq.c') + +headers = files('power_amd_pstate_cpufreq.h') + +deps += ['power'] diff --git a/lib/power/power_amd_pstate_cpufreq.c b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c similarity index 95% rename from lib/power/power_amd_pstate_cpufreq.c rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c index 028f84416b..9938de72a6 100644 --- a/lib/power/power_amd_pstate_cpufreq.c +++ b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c @@ -700,3 +700,22 @@ power_amd_pstate_get_capabilities(unsigned int lcore_id, return 0; } + +static struct rte_power_ops amd_pstate_ops = { + .init = power_amd_pstate_cpufreq_init, + .exit = power_amd_pstate_cpufreq_exit, + .check_env_support = power_amd_pstate_cpufreq_check_supported, + .get_avail_freqs = power_amd_pstate_cpufreq_freqs, + .get_freq = power_amd_pstate_cpufreq_get_freq, + .set_freq = power_amd_pstate_cpufreq_set_freq, + .freq_down = power_amd_pstate_cpufreq_freq_down, + .freq_up = power_amd_pstate_cpufreq_freq_up, + .freq_max = power_amd_pstate_cpufreq_freq_max, + .freq_min = power_amd_pstate_cpufreq_freq_min, + .turbo_
Re: [PATCH v3 8/8] examples/l3fwd-power: update to call arg parser API
Hi Euan, On 07/12/2023 16:18, Euan Bourke wrote: Update to the l3fwd-power example application to call the arg parser library for its 'combined core string parser' instead of implementing its own corelist parser. The default_type passed into the function call is a corelist. Signed-off-by: Euan Bourke --- examples/l3fwd-power/perf_core.c | 51 +--- 1 file changed, 8 insertions(+), 43 deletions(-) diff --git a/examples/l3fwd-power/perf_core.c b/examples/l3fwd-power/perf_core.c index 41ef6d0c9a..f8511e30b3 100644 --- a/examples/l3fwd-power/perf_core.c +++ b/examples/l3fwd-power/perf_core.c @@ -12,6 +12,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -177,56 +178,20 @@ parse_perf_config(const char *q_arg) int parse_perf_core_list(const char *corelist) { - int i, idx = 0; - unsigned int count = 0; - char *end = NULL; - int min, max; + int count; + uint16_t cores[RTE_MAX_LCORE]; if (corelist == NULL) { printf("invalid core list\n"); return -1; } + count = rte_arg_parse_core_string(corelist, cores, RTE_DIM(cores), 1); - /* Remove all blank characters ahead and after */ - while (isblank(*corelist)) - corelist++; - i = strlen(corelist); - while ((i > 0) && isblank(corelist[i - 1])) - i--; + for (int i = 0; i < count; i++) nit: you've used int here, but below you use uint16_t for a for loop. If you're re-spinning, it might be worth making consistent. But no biggie. --snip-- @@ -234,7 +199,7 @@ parse_perf_core_list(const char *corelist) nb_hp_lcores = count; printf("Configured %d high performance cores\n", nb_hp_lcores); - for (i = 0; i < nb_hp_lcores; i++) + for (uint16_t i = 0; i < nb_hp_lcores; i++) printf("\tHigh performance core %d %d\n", i, hp_lcores[i]); I've also tested this with a 16-core incantation of l3fwd-power with various combinations of cores, seems to work well. Acked-by: David Hunt
Re: [PATCH 3/3] examples/vm_power_manager: do not use EAL logtype
On 11/12/2023 17:23, Stephen Hemminger wrote: Be consistent for all the error printouts and use fprintf(). The EAL logtype is reserved for internal use by EAL. Signed-off-by: Stephen Hemminger --- examples/vm_power_manager/main.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index b159291d77ce..c14138202004 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c --snip-- Acked-by: David Hunt
Re: [PATCH] examples/l3fwd-power: fix to configure the uncore env
Hi Sivaprasad, Karen, On 27/10/2023 13:36, Kelly, Karen wrote: On 26/10/2023 16:19, Sivaprasad Tummala wrote: Updated the l3fwd-power app to configure the uncore env before invoking any uncore APIs. With auto-detection in 'rte_power_uncore_init()' it is too late because other APIs already called. Bugzilla ID: 1304 Fixes: ac1edcb6621a ("power: refactor uncore power management API") Cc:karen.ke...@intel.com Signed-off-by: Sivaprasad Tummala --- examples/l3fwd-power/main.c | 6 ++ lib/power/rte_power_uncore.c | 7 +++ lib/power/rte_power_uncore.h | 9 + 3 files changed, 18 insertions(+), 4 deletions(-) Ran on my system, bug is resolved after applying this patch. Tested-by: Karen Kelly Acked-by: David Hunt
Re: [PATCH v1 1/1] maintainers: update maintainership of power lib
On 15/06/2023 10:16, Anatoly Burakov wrote: Add co-maintainer for power library. Signed-off-by: Anatoly Burakov --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 906b31f97c..21c971a273 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1671,6 +1671,7 @@ M: Gaetan Rivet F: lib/pci/ Power management +M: Anatoly Burakov M: David Hunt F: lib/power/ F: doc/guides/prog_guide/power_man.rst Acked-by: David Hunt
Re: [PATCH v2] examples/distributor: update dynamic configuration
On 02/09/2022 06:20, Omer Yamac wrote: Hi David, I applied the changes as new version (v3), Thank you --snip-- With the above suggested changes to the commit message: Reviewed-by: David Hunt Hi Omer, 2 things: Usually you submit subsequent versions of patches "in-reply-to" the previous versions to keep them together. Please see other examples of this on the mailing list. It is normal to include any Reviewed-by or Acked-by tags in subsequent versions if the patch has not changed significantly. Rgds, Dave.
Re: [PATCH v3] examples/distributor: update dynamic configuration
On 01/09/2022 15:09, omer.yamac at ceng.metu.edu.tr (Abdullah Ömer Yamaç) wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Consecutive termination of the lcores fixed. The termination order was wrong, and you couldn't terminate the application while traffic was capturing. The current order is RX -> Distributor -> TX -> Workers * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Signed-off-by: Abdullah ?mer Yama? --- Cc: david.hunt at intel.com --- LGTM Reviewed-by: David Hunt
Re: [PATCH v2 3/3] l3fwd-power: add option to call uncore API
Hi Tadhg, On 13/07/2022 15:07, Tadhg Kearney wrote: Add option for setting uncore frequency min/max/index, through uncore api. This will be set for each package and die on the SKU. On exit, uncore frequency will be reverted back to previous frequency. Signed-off-by: Tadhg Kearney --- .../sample_app_ug/l3_forward_power_man.rst| 28 +++ examples/l3fwd-power/main.c | 190 -- 2 files changed, 202 insertions(+), 16 deletions(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 8f6d906200..1e452140a1 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -97,6 +97,12 @@ where, * -P: Sets all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address. Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted. +* -u: optional, sets uncore frequency to minimum value. + +* -U: optional, sets uncore frequency to maximum value. + +* -i (frequency index): optional, sets uncore frequency to frequency index value, by setting min and max values to be the same. + * --config (port,queue,lcore)[,(port,queue,lcore)]: determines which queues from which ports are mapped to which cores. * --max-pkt-len: optional, maximum packet length in decimal (64-9600) @@ -364,3 +370,25 @@ in the DPDK Programmer's Guide for more details on PMD power management. .. code-block:: console .//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" --pmd-mgmt=scale + +Setting Uncore Values +- + +Uncore frequency can be adjusted through manipulating related sysfs entries to adjust the minimum and maximum uncore values. +This will be set for each package and die on the SKU. Three options are available for setting uncore frequency: + +``-u`` + This will set uncore to minimum frequency possible. + +``-U`` + This will set uncore to maximum frequency possible. + +``-i`` + This will allow you to set the specific uncore frequency index that you want, by setting + minimum and maximum values to be the same. Frequency index's are set 10Hz apart from + maximum to minimum. + Frequency index values are in descending order, ie, index 0 is maximum frequency index. + +.. code-block:: console + +.//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" -i 1 diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 887c6eae3f..5f74e29e3a 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -15,6 +15,8 @@ #include #include #include +#include +#include #include #include @@ -47,6 +49,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -71,6 +74,7 @@ #ifndef APP_LOOKUP_METHOD #define APP_LOOKUP_METHOD APP_LOOKUP_LPM + #endif #if (APP_LOOKUP_METHOD == APP_LOOKUP_EXACT_MATCH) @@ -83,7 +87,7 @@ #ifndef IPv6_BYTES #define IPv6_BYTES_FMT "%02x%02x:%02x%02x:%02x%02x:%02x%02x:"\ - "%02x%02x:%02x%02x:%02x%02x:%02x%02x" + "%02x%02x:%02x%02x:%02x%02x:%02x%02x" #define IPv6_BYTES(addr) \ addr[0], addr[1], addr[2], addr[3], \ addr[4], addr[5], addr[6], addr[7], \ @@ -134,9 +138,14 @@ #define NUM_TELSTATS RTE_DIM(telstats_strings) +#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency" + static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT; static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; +/* Max number of nodes times dies available on uncore */ +#define MAX_DIE_NODES (RTE_MAX_NUMA_DIE * RTE_MAX_NUMA_NODES) + /* ethernet addresses of ports */ static struct rte_ether_addr ports_eth_addr[RTE_MAX_ETHPORTS]; @@ -145,6 +154,8 @@ static rte_spinlock_t locks[RTE_MAX_ETHPORTS]; /* mask of enabled ports */ static uint32_t enabled_port_mask = 0; +/* if uncore frequency was enabled without errors */ +static int enabled_uncore; /* Ports set in promiscuous mode off by default. */ static int promiscuous_on = 0; /* NUMA is enabled by default. */ @@ -165,6 +176,13 @@ struct telstats_name { char name[RTE_ETH_XSTATS_NAME_SIZE]; }; +struct uncore_info { + unsigned int pkg; + unsigned int die; +}; + +struct uncore_info ui[MAX_DIE_NODES]; + /* telemetry stats to be reported */ const struct telstats_name telstats_strings[] = { {"empty_poll"}, @@ -557,9 +575,9 @@ static void print_ipv6_key(struct ipv6_5tuple key) { printf( "IP dst = " IPv6_BYTES_FMT ", IP src = " IPv6_BYTES_FMT ", " - "port dst = %d, port src = %d, proto = %d\n", - IPv6_BYTES(key.ip_dst), IPv6_BYTES(key.ip_src), - key.port
Re: [PATCH v4 2/3] l3fwd-power: add option to call uncore API
On 20/09/2022 10:48, Tadhg Kearney wrote: Add option for setting uncore frequency min/max/index, through uncore API. This will be set for each package and die on the SKU. On exit, uncore min and max frequency will be reverted back to previous frequencies. Signed-off-by: Tadhg Kearney --- .../sample_app_ug/l3_forward_power_man.rst| 29 + examples/l3fwd-power/main.c | 122 +- 2 files changed, 148 insertions(+), 3 deletions(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 8f6d906200..08ac8ef369 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -97,6 +97,12 @@ where, * -P: Sets all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address. Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted. +* -u: optional, sets uncore min/max frequency to minimum value. + +* -U: optional, sets uncore min/max frequency to maximum value. + +* -i (frequency index): optional, sets uncore frequency to frequency index value, by setting min and max values to be the same. + * --config (port,queue,lcore)[,(port,queue,lcore)]: determines which queues from which ports are mapped to which cores. * --max-pkt-len: optional, maximum packet length in decimal (64-9600) @@ -364,3 +370,26 @@ in the DPDK Programmer's Guide for more details on PMD power management. .. code-block:: console .//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" --pmd-mgmt=scale + +Setting Uncore Values +- + +Uncore frequency can be adjusted through manipulating related sysfs entries to adjust the minimum and maximum uncore values. +This will be set for each package and die on the SKU. The driver for enabling this is available from kernel version 5.6 and above. +Three options are available for setting uncore frequency: + +``-u`` + This will set uncore minimum and maximum frequencies to minimum possible value. + +``-U`` + This will set uncore minimum and maximum frequencies to maximum possible value. + +``-i`` + This will allow you to set the specific uncore frequency index that you want, by setting + the uncore frequency to a frequency pointed by index. Frequency index's are set 100MHz apart from + maximum to minimum. + Frequency index values are in descending order, ie, index 0 is maximum frequency index. + +.. code-block:: console + +.//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" -i 1 diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 887c6eae3f..d1a32594c0 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -179,6 +180,12 @@ enum busy_rate { FULL = 100 }; +enum uncore_choice { + UNCORE_MIN = 0, + UNCORE_MAX = 1, + UNCORE_IDX = 2 +}; + /* reference poll count to measure core busyness */ #define DEFAULT_COUNT 1 /* @@ -1616,6 +1623,9 @@ print_usage(const char *prgname) " [--max-pkt-len PKTLEN]\n" " -p PORTMASK: hexadecimal bitmask of ports to configure\n" " -P: enable promiscuous mode\n" + " -u: set min/max frequency for uncore to minimum value\n" + " -U: set min/max frequency for uncore to maximum value\n" + " -i (frequency index): set min/max frequency for uncore to specified frequency index\n" " --config (port,queue,lcore): rx queues configuration\n" " --high-perf-cores CORELIST: list of high performance cores\n" " --perf-config: similar as config, cores specified as indices" @@ -1672,6 +1682,74 @@ static int parse_max_pkt_len(const char *pktlen) return len; } +static int +parse_uncore_options(enum uncore_choice choice, const char *argument) +{ + unsigned int die, pkg, max_pkg, max_die; + int ret = 0; + max_pkg = rte_power_uncore_get_num_pkgs(); + if (max_pkg == 0) + return -1; + + for (pkg = 0; pkg < max_pkg; pkg++) { + max_die = rte_power_uncore_get_num_dies(pkg); + if (max_die == 0) + return -1; + for (die = 0; die < max_die; die++) { + ret = rte_power_uncore_init(pkg, die); + if (ret == -1) { + RTE_LOG(INFO, L3FWD_POWER, "Unable to initialize uncore for pkg %02u die %02u\n" + , pkg, die); + return ret; + } + if (choice == UNCORE_MIN) { +
Re: [PATCH v4 1/3] power: add uncore frequency control API to the power library
On 20/09/2022 10:48, Tadhg Kearney wrote: Add API to allow uncore frequency adjustment. This is done through manipulating related uncore frequency control sysfs entries to adjust the minimum and maximum uncore frequency values. Nine API's are being added that are all public and experimental. Signed-off-by: Tadhg Kearney --- doc/guides/prog_guide/power_man.rst| 37 ++ doc/guides/rel_notes/release_22_11.rst | 5 + lib/power/meson.build | 2 + lib/power/rte_power_uncore.c | 447 + lib/power/rte_power_uncore.h | 194 +++ lib/power/version.map | 11 + 6 files changed, 696 insertions(+) create mode 100644 lib/power/rte_power_uncore.c create mode 100644 lib/power/rte_power_uncore.h diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 98cfd3c1f3..49ff3edef0 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -276,6 +276,43 @@ API Overview for Ethernet PMD Power Management * **Set Scaling Max Freq**: Set the maximum frequency (kHz) to be used in Frequency Scaling mode. +Uncore API +-- + +Abstract + + +Uncore is a term used by Intel to describe the functions of a microprocessor that are +not in the core, but which must be closely connected to the core to achieve high performance; +L3 cache, on-die memory controller, etc. +Significant power savings can be achieved by reducing the uncore frequency to its lowest value. + +The Linux kernel provides the driver “intel-uncore-frequency" to control the uncore frequency limits +for x86 platform. The driver is available from kernel version 5.6 and above. +This manipulates the contest of MSR 0x620, which sets min/max of the uncore for the SKU. Need to mention that the uncore frequency control is made available in the kernel by enabling the CONFIG_INTEL_UNCORE_FREQ_CONTROL kernel option which was added in 5.6 + + +API Overview for Uncore +~~~ +* **Uncore Power Init**: Initialise uncore power, populate frequency array and record + original min & max for pkg & die. + +* **Uncore Power Exit**: Exit uncore power, restoring original min & max for pkg & die. + +* **Get Uncore Power Freq**: Get current uncore freq index for pkg & die. + +* **Set Uncore Power Freq**: Set min & max uncore freq index for pkg & die (min and max will be the same). + +* **Uncore Power Max**: Set max uncore freq index for pkg & die. + +* **Uncore Power Min**: Set min uncore freq index for pkg & die. + +* **Get Num Freqs**: Get the number of frequencies in the index array. + +* **Get Num Pkgs**: Get the number of packages (CPUs) on the system. + +* **Get Num Dies**: Get the number of die's on a given package. + References -- diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst index 8c021cf050..8e184034d8 100644 --- a/doc/guides/rel_notes/release_22_11.rst +++ b/doc/guides/rel_notes/release_22_11.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. === +* **Added uncore frequency control API to the power library.** + + Add api to allow uncore frequency adjustment. This is done through + manipulating related uncore frequency control sysfs entries to + adjust the minimum and maximum uncore frequency values. Removed Items - diff --git a/lib/power/meson.build b/lib/power/meson.build index ba8d66074b..80cdeb72d4 100644 --- a/lib/power/meson.build +++ b/lib/power/meson.build @@ -21,12 +21,14 @@ sources = files( 'rte_power.c', 'rte_power_empty_poll.c', 'rte_power_pmd_mgmt.c', +'rte_power_uncore.c', ) headers = files( 'rte_power.h', 'rte_power_empty_poll.h', 'rte_power_pmd_mgmt.h', 'rte_power_guest_channel.h', +'rte_power_uncore.h', ) if cc.has_argument('-Wno-cast-qual') cflags += '-Wno-cast-qual' diff --git a/lib/power/rte_power_uncore.c b/lib/power/rte_power_uncore.c new file mode 100644 index 00..b3004e5bfc --- /dev/null +++ b/lib/power/rte_power_uncore.c @@ -0,0 +1,447 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Intel Corporation + */ + +#include +#include +#include + +#include + +#include "rte_power_uncore.h" +#include "power_common.h" + +#define MAX_UNCORE_FREQS 32 +#define MAX_NUMA_DIE 8 +#define BUS_FREQ 10 +#define FILTER_LENGTH 18 +#define PACKAGE_FILTER "package_%02u_die_*" +#define DIE_FILTER "package_%02u_die_%02u" +#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency" +#define POWER_GOVERNOR_PERF "performance" +#define POWER_UNCORE_SYSFILE_MAX_FREQ \ + "/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz" +#define POWER_UNCORE_SYSFILE_MIN_FREQ \ +
Re: [PATCH v4 3/3] test/power: add unit tests for uncore API
Hi Tadhg, On 20/09/2022 10:48, Tadhg Kearney wrote: Add basic unit tests covering all nine uncore API's. Signed-off-by: Tadhg Kearney --- app/test/meson.build | 2 + app/test/test_power_uncore.c | 299 +++ 2 files changed, 301 insertions(+) create mode 100644 app/test/test_power_uncore.c diff --git a/app/test/meson.build b/app/test/meson.build index bf1d81f84a..170401ccdc 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -100,6 +100,7 @@ test_sources = files( 'test_power.c', 'test_power_cpufreq.c', 'test_power_kvm_vm.c', +'test_power_uncore.c', 'test_prefetch.c', 'test_rand_perf.c', 'test_rawdev.c', @@ -240,6 +241,7 @@ fast_tests = [ ['power_cpufreq_autotest', false, true], ['power_autotest', true, true], ['power_kvm_vm_autotest', false, true], +['power_uncore_autotest', true, true], ['reorder_autotest', true, true], ['service_autotest', true, true], ['thash_autotest', true, true], diff --git a/app/test/test_power_uncore.c b/app/test/test_power_uncore.c new file mode 100644 index 00..7bc3ed7260 --- /dev/null +++ b/app/test/test_power_uncore.c @@ -0,0 +1,299 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2022 Intel Corporation + */ + +#include "test.h" + +#ifndef RTE_LIB_POWER + +static int +test_power_uncore(void) +{ + printf("Power management library not supported, skipping test\n"); + return TEST_SKIPPED; +} + +#else +#include +#include + +#define MAX_UNCORE_FREQS 32 + +#define VALID_PKG 0 +#define VALID_DIE 0 +#define INVALID_PKG (rte_power_uncore_get_num_pkgs() + 1) +#define INVALID_DIE (rte_power_uncore_get_num_dies(VALID_PKG) + 1) +#define VALID_INDEX 1 +#define INVALID_INDEX (MAX_UNCORE_FREQS + 1) + +static int check_power_uncore_init(void) +{ + int ret; + + /* Test initialisation of uncore configuration*/ + ret = rte_power_uncore_init(VALID_PKG, VALID_DIE); + if (ret < 0) { + printf("Cannot initialise uncore power management for pkg %u die %u, this " + "may occur if environment is not configured " + "correctly(APCI cpufreq) or operating in another valid " + "Power management environment\n", VALID_PKG, VALID_DIE); + return -1; + } + + /* Unsuccessful Test */ + ret = rte_power_uncore_init(INVALID_PKG, INVALID_DIE); + if (ret == 0) { + printf("Unexpectedly was able to initialise uncore power management " + "for pkg %u die %u\n", INVALID_PKG, INVALID_DIE); + return -1; + } + + return 0; +} + +static int +check_power_get_uncore_freq(void) +{ + int ret; + + /* Successfully get uncore freq */ + ret = rte_power_get_uncore_freq(VALID_PKG, VALID_DIE); + if (ret < 0) { + printf("Failed to get uncore frequency for pkg %u die %u\n", + VALID_PKG, VALID_DIE); + return -1; + } + + /* Unsuccessful Test */ + ret = rte_power_get_uncore_freq(INVALID_PKG, INVALID_DIE); + if (ret >= 0) { + printf("Unexpectedly got invalid uncore frequency for pkg %u die %u\n", + INVALID_PKG, INVALID_DIE); + return -1; + } + + return 0; +} + +static int +check_power_set_uncore_freq(void) +{ + int ret; + + /* Successfully set uncore freq */ + ret = rte_power_set_uncore_freq(VALID_PKG, VALID_DIE, VALID_INDEX); + if (ret < 0) { + printf("Failed to set uncore frequency for pkg %u die %u index %u\n", + VALID_PKG, VALID_DIE, VALID_INDEX); + return -1; + } + + /* Try to unsuccessfully set invalid uncore freq index */ + ret = rte_power_set_uncore_freq(VALID_PKG, VALID_DIE, INVALID_INDEX); + if (ret == 0) { + printf("Unexpectedly set invalid uncore index for pkg %u die %u index %u\n", + VALID_PKG, VALID_DIE, INVALID_INDEX); + return -1; + } + + /* Unsuccessful Test */ + ret = rte_power_set_uncore_freq(INVALID_PKG, INVALID_DIE, VALID_INDEX); + if (ret == 0) { + printf("Unexpectedly set invalid uncore frequency for pkg %u die %u index %u\n", + INVALID_PKG, INVALID_DIE, VALID_INDEX); + return -1; + } + + return 0; +} + +static int +check_power_uncore_freq_max(void) +{ + int ret; + + /* Successfully get max uncore freq */ + ret = rte_power_uncore_freq_max(VALID_PKG, VALID_DIE); + if (ret < 0) { + printf("Failed to set m
Re: [PATCH v6 2/3] l3fwd-power: add option to call uncore API
Hi Tadhg, On 28/09/2022 10:06, Tadhg Kearney wrote: Add option for setting uncore frequency min/max/index, through uncore API. This will be set for each package and die on the SKU. On exit, uncore min and max frequency will be reverted back to previous frequencies. Signed-off-by: Tadhg Kearney Reviewed-by: David Hunt --- .../sample_app_ug/l3_forward_power_man.rst| 29 examples/l3fwd-power/main.c | 126 +- 2 files changed, 153 insertions(+), 2 deletions(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 8f6d906200..08ac8ef369 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -97,6 +97,12 @@ where, * -P: Sets all ports to promiscuous mode so that packets are accepted regardless of the packet's Ethernet MAC destination address. Without this option, only packets with the Ethernet MAC destination address set to the Ethernet address of the port are accepted. +* -u: optional, sets uncore min/max frequency to minimum value. + +* -U: optional, sets uncore min/max frequency to maximum value. + +* -i (frequency index): optional, sets uncore frequency to frequency index value, by setting min and max values to be the same. + * --config (port,queue,lcore)[,(port,queue,lcore)]: determines which queues from which ports are mapped to which cores. * --max-pkt-len: optional, maximum packet length in decimal (64-9600) @@ -364,3 +370,26 @@ in the DPDK Programmer's Guide for more details on PMD power management. .. code-block:: console .//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" --pmd-mgmt=scale + +Setting Uncore Values +- + +Uncore frequency can be adjusted through manipulating related sysfs entries to adjust the minimum and maximum uncore values. +This will be set for each package and die on the SKU. The driver for enabling this is available from kernel version 5.6 and above. +Three options are available for setting uncore frequency: + +``-u`` + This will set uncore minimum and maximum frequencies to minimum possible value. + +``-U`` + This will set uncore minimum and maximum frequencies to maximum possible value. + +``-i`` + This will allow you to set the specific uncore frequency index that you want, by setting + the uncore frequency to a frequency pointed by index. Frequency index's are set 100MHz apart from + maximum to minimum. + Frequency index values are in descending order, ie, index 0 is maximum frequency index. + +.. code-block:: console + +.//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" -i 1 diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 887c6eae3f..0b6acc8e58 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -161,6 +162,9 @@ static struct rte_timer telemetry_timer; /* stats index returned by metrics lib */ int telstats_index; +/* flag to check if uncore option enabled */ +int enabled_uncore = -1; + struct telstats_name { char name[RTE_ETH_XSTATS_NAME_SIZE]; }; @@ -179,6 +183,12 @@ enum busy_rate { FULL = 100 }; +enum uncore_choice { + UNCORE_MIN = 0, + UNCORE_MAX = 1, + UNCORE_IDX = 2 +}; + /* reference poll count to measure core busyness */ #define DEFAULT_COUNT 1 /* @@ -1616,6 +1626,9 @@ print_usage(const char *prgname) " [--max-pkt-len PKTLEN]\n" " -p PORTMASK: hexadecimal bitmask of ports to configure\n" " -P: enable promiscuous mode\n" + " -u: set min/max frequency for uncore to minimum value\n" + " -U: set min/max frequency for uncore to maximum value\n" + " -i (frequency index): set min/max frequency for uncore to specified frequency index\n" " --config (port,queue,lcore): rx queues configuration\n" " --high-perf-cores CORELIST: list of high performance cores\n" " --perf-config: similar as config, cores specified as indices" @@ -1672,6 +1685,74 @@ static int parse_max_pkt_len(const char *pktlen) return len; } +static int +parse_uncore_options(enum uncore_choice choice, const char *argument) +{ + unsigned int die, pkg, max_pkg, max_die; + int ret = 0; + max_pkg = rte_power_uncore_get_num_pkgs(); + if (max_pkg == 0) + return -1; + + for (pkg = 0; pkg < max_pkg; pkg++) { + max_die = rte_power_uncore_get_num_dies(pkg); + if (max_die == 0) + return -1; + for (die = 0; die < max_die; die++) { + ret = rte_power_uncore_init(pkg, die); +
Re: [PATCH v7 0/3] add uncore api to be called through l3fwd-power
Hi Tadhg, On 28/09/2022 14:30, Tadhg Kearney wrote: This is targeting 22.11 and aims to add an API to DPDK power library to allow uncore frequency adjustment. This will be called through the l3fwd-power app and gives the ability to set the minimum and maximum uncore frequency to both min, max or specific frequency index. Signed-off-by: tadhgkearney Reviewed-by: David Hunt --- v2: Fix compilation warnings and errors. v3: Remove addition of x86 global macros. Add 2 new API's for getting package and die numbers from system. Address comments from mailing list. Improve efficiency of code and code quality. v4: Fix compilation warnings and errors. v5: Improve error message for uncore access not working. v6: Fix uncore exit being called if uncore option not selected. v7: Address ml comments. Tadhg Kearney (3): power: add uncore frequency control API to the power library l3fwd-power: add option to call uncore API test/power: add unit tests for uncore API app/test/meson.build | 2 + app/test/test_power_uncore.c | 301 doc/guides/prog_guide/power_man.rst | 38 ++ doc/guides/rel_notes/release_22_11.rst| 5 + .../sample_app_ug/l3_forward_power_man.rst| 29 ++ examples/l3fwd-power/main.c | 126 - lib/power/meson.build | 2 + lib/power/rte_power_uncore.c | 451 ++ lib/power/rte_power_uncore.h | 194 lib/power/version.map | 11 + 10 files changed, 1157 insertions(+), 2 deletions(-) create mode 100644 app/test/test_power_uncore.c create mode 100644 lib/power/rte_power_uncore.c create mode 100644 lib/power/rte_power_uncore.h Latest revision looks good to me. Consider the series: Acked-by: David Hunt
Re: [PATCH v9 1/3] power: add Intel uncore frequency control API to power library
On 06/10/2022 18:32, Stephen Hemminger wrote: On Thu, 6 Oct 2022 09:38:01 + Tadhg Kearney wrote: Add API to allow uncore frequency adjustment. Uncore is a term used by Intel to describe function of a microprocessor that are closely connected to the core to achieve high performance. This is done through manipulating related uncore frequency control sysfs entries to adjust the minimum and maximum uncore frequency values and works on Linux for Intel hardware. Signed-off-by: Tadhg Kearney Reviewed-by: David Hunt Acked-by: David Hunt Hi Stephen, Looks like this is missing an opportunity for a more general long term solution in DPDK. We're hoping that this is the first step along the path to that long-term solution. It's like the power library frequency control for cores, which was initially Intel only, and then more architectures were added over time. The API's are experimental, so can be adapted if needed. Shouldn't this be a general thing like the Linux kernel scheduler. I don't think the kernel scheduler has any concept of uncore busyness, the uncore frequency is typically controlled by hardware, and if there's enough polling going on, the frequency of the uncore will remain high (if uncore frequency scaling is enabled). We're addressing that in this patch in that if the application realises that it's not processing a lot of packets even though most of it's cores are polling, it can tell the hardware to scale down the uncore to save power. Uncore is Intel specific, but there is already big/little cores on many ARM platforms. I don't think big/little is related to uncore frequency scaling. The big/little cores still need to communicate via that architecture's communications bus (called uncore in Intel's case, though I've seen that term used on other architectures also). Where an architecture can scale the frequency of this communications bus, this patch set's functionality can be extended in the future to cover this. Regards, Dave
Re: [PATCH v1 1/3] examples/power: remove empty poll mode from l3fwd-power
On 20/12/2022 12:56, David Hunt wrote: Remove calls to the experimental empty poll API. l3fwd-power is the only app that uses this. This API is no longer needed as it is superceded by the monitor/pause/scale callback mechanism. I did check the spelling against my spellchecker with checkpatch, and it was clean. Too bad my dictionary uses a disputed spelling... From https://www.merriam-webster.com/dictionary/supercede: "Supercede has occurred as a spelling variant of supersede since the 17th century, and it is common in current published writing. It continues, however, to be widely regarded as an error." I'll fix in the next patch revision. Rgds, Dave.
Re: [PATCH v4] examples/vm_power_manager: use safe version of list iterator
On 22/08/2022 11:58, Reshma Pattan wrote: From: Hamza Khan Currently, when vm_power_manager exits, we are using a LIST_FOREACH macro to iterate over VM info structures while freeing them. This leads to use-after-free error. To address this, replace all usages of LIST_* with TAILQ_* macros, and use the RTE_TAILQ_FOREACH_SAFE macro to iterate and delete VM info structures. * The change is small and doesn’t affect other code * Testing was performed on the patch Fixes: e8ae9b662506 ("examples/vm_power: channel manager and monitor in host") Cc: alan.ca...@intel.com Cc: sta...@dpdk.org Signed-off-by: Hamza Khan Reviewed-by: Reshma Pattan Acked-by: Reshma Pattan Signed-off-by: Reshma Pattan --- v4: fix header file inclusion --- --snip-- Acked-by: David Hunt
Re: [PATCH] power: add unpriv. read of turbo % for pstate
Hi Markus, On 24/08/2022 20:28, Markus Theil wrote: If DPDK applications should be used with a minimal set of privileges, using the msr kernel module on linux should not be necessary. Since at least kernel 4.4 the rdmsr call to obtain the last non-turbo boost frequency can be left out, if the sysfs interface is used. Also RHEL 7 with recent kernel updates should include the sysfs interface for this (I only looked this up for CentOS 7). Signed-off-by: Markus Theil --- lib/power/power_pstate_cpufreq.c | 69 ++-- 1 file changed, 40 insertions(+), 29 deletions(-) diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c index 78c9197695..c3d66a8f68 100644 --- a/lib/power/power_pstate_cpufreq.c +++ b/lib/power/power_pstate_cpufreq.c @@ -35,15 +35,9 @@ "/sys/devices/system/cpu/cpu%u/cpufreq/cpuinfo_min_freq" #define POWER_SYSFILE_BASE_FREQ \ "/sys/devices/system/cpu/cpu%u/cpufreq/base_frequency" +#define POWER_SYSFILE_TURBO_PCT \ + "/sys/devices/system/cpu/intel_pstate/turbo_pct" #define POWER_PSTATE_DRIVER "intel_pstate" -#define POWER_MSR_PATH "/dev/cpu/%u/msr" - -/* - * MSR related - */ -#define PLATFORM_INFO 0x0CE -#define NON_TURBO_MASK0xFF00 -#define NON_TURBO_OFFSET 0x8 enum power_state { @@ -74,37 +68,33 @@ struct pstate_power_info { static struct pstate_power_info lcore_power_info[RTE_MAX_LCORE]; /** - * It is to read the specific MSR. + * It is to read the turbo mode percentage from sysfs */ - static int32_t -power_rdmsr(int msr, uint64_t *val, unsigned int lcore_id) +power_read_turbo_pct(uint64_t *outVal) { int fd, ret; - char fullpath[PATH_MAX]; + char val[4] = {0}; - snprintf(fullpath, sizeof(fullpath), POWER_MSR_PATH, lcore_id); - - fd = open(fullpath, O_RDONLY); + fd = open(POWER_SYSFILE_TURBO_PCT, O_RDONLY); if (fd < 0) { - RTE_LOG(ERR, POWER, "Error opening '%s': %s\n", fullpath, + RTE_LOG(ERR, POWER, "Error opening '%s': %s\n", POWER_SYSFILE_TURBO_PCT, strerror(errno)); return fd; } - ret = pread(fd, val, sizeof(uint64_t), msr); + ret = read(fd, val, sizeof(val)); if (ret < 0) { - RTE_LOG(ERR, POWER, "Error reading '%s': %s\n", fullpath, + RTE_LOG(ERR, POWER, "Error reading '%s': %s\n", POWER_SYSFILE_TURBO_PCT, strerror(errno)); goto out; } - POWER_DEBUG_TRACE("MSR Path %s, offset 0x%X for lcore %u\n", - fullpath, msr, lcore_id); + *outVal = (uint64_t) atol(val); I'd recommend replacing atol with strtol, it's a safer implementation. It's more commonly found in DPDK code than atol. - POWER_DEBUG_TRACE("Ret value %d, content is 0x%"PRIx64"\n", ret, *val); + POWER_DEBUG_TRACE("power turbo pct: %"PRIu64"\n", *outVal); out: close(fd); return ret; @@ -116,8 +106,9 @@ out:close(fd); static int power_init_for_setting_freq(struct pstate_power_info *pi) { - FILE *f_base = NULL, *f_base_max = NULL, *f_min = NULL, *f_max = NULL; - uint32_t base_ratio, base_max_ratio; + FILE *f_base = NULL, *f_base_min = NULL, *f_base_max = NULL, +*f_min = NULL, *f_max = NULL; + uint32_t base_ratio, base_min_ratio, base_max_ratio; uint64_t max_non_turbo; int ret; @@ -130,6 +121,14 @@ power_init_for_setting_freq(struct pstate_power_info *pi) goto err; } + open_core_sysfs_file(&f_base_min, "r", POWER_SYSFILE_BASE_MIN_FREQ, + pi->lcore_id); + if (f_base_min == NULL) { + RTE_LOG(ERR, POWER, "failed to open %s\n", + POWER_SYSFILE_BASE_MIN_FREQ); + goto err; + } + open_core_sysfs_file(&f_min, "rw+", POWER_SYSFILE_MIN_FREQ, pi->lcore_id); if (f_min == NULL) { @@ -158,6 +157,14 @@ power_init_for_setting_freq(struct pstate_power_info *pi) goto err; } + /* read base min ratio */ + ret = read_core_sysfs_u32(f_base_min, &base_min_ratio); + if (ret < 0) { + RTE_LOG(ERR, POWER, "Failed to read %s\n", + POWER_SYSFILE_BASE_MIN_FREQ); + goto err; + } + /* base ratio may not exist */ if (f_base != NULL) { ret = read_core_sysfs_u32(f_base, &base_ratio); @@ -170,20 +177,22 @@ power_init_for_setting_freq(struct pstate_power_info *pi) base_ratio = 0; } - /* Add MSR read to detect turbo status */ - if (power_rdmsr(PLATFORM_INFO, &max_non_turbo, pi->lcore_id) < 0) - goto err; - /* no errors after this point */ - /* convert ratios to bins */ base_max_ratio /= BUS_FREQ; + base_
Re: [PATCH v1 3/3] examples/l3fwd-power: enable PMD power mgmt on Arm
On 25/08/2022 07:42, Feifei Wang wrote: For Arm aarch, power monitor uses WFE instruction to enable, which can not exit automatically within the time limit. This means 'rte_power_monitor_wakeup' API needs to be called to wake up sleep cores if there is no store operation to monitored address. Furthermore, we disable power monitor feature on the main core so that it can be used to wake up other sleeping cores when it receives SIGINT siginal. Signed-off-by: Feifei Wang Reviewed-by: Ruifeng Wang --- examples/l3fwd-power/main.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 887c6eae3f..2bd0d700f0 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -432,8 +432,16 @@ static void signal_exit_now(int sigtype) { - if (sigtype == SIGINT) + if (sigtype == SIGINT) { +#if defined(RTE_ARCH_ARM64) + /** +* wake_up api does not need input parameter on Arm, +* so 0 is meaningless here. +*/ + rte_power_monitor_wakeup(0); +#endif quit_signal = true; + } } @@ -2885,6 +2893,25 @@ main(int argc, char **argv) "Error setting scaling freq max: err=%d, lcore %d\n", ret, lcore_id); +#if defined(RTE_ARCH_ARM64) + /* Ensure the main lcore does not enter the power-monitor state, +* so that it can be used to wake up other lcores on ARM. +* This is due to WFE instruction has no timeout wake-up mechanism, +* and if users want to exit actively, the main lcore is needed +* to send SEV instruction to wake up other lcores. +*/ + unsigned int main_lcore = rte_get_main_lcore(); + if (lcore_id != main_lcore || + pmgmt_type != RTE_POWER_MGMT_TYPE_MONITOR) { + ret = rte_power_ethdev_pmgmt_queue_enable( + lcore_id, portid, queueid, + pmgmt_type); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "rte_power_ethdev_pmgmt_queue_enable: err=%d, port=%d\n", + ret, portid); + } +#else ret = rte_power_ethdev_pmgmt_queue_enable( lcore_id, portid, queueid, pmgmt_type); @@ -2892,6 +2919,7 @@ main(int argc, char **argv) rte_exit(EXIT_FAILURE, "rte_power_ethdev_pmgmt_queue_enable: err=%d, port=%d\n", ret, portid); +#endif } } } Hi Feifei, Acked-by: David Hunt
Re: [PATCH] doc: add removal note for power empty poll API
On 02/08/2022 16:22, Reshma Pattan wrote: Add removal note for experimental empty poll API. CC: David Hunt Signed-off-by: Reshma Pattan --- doc/guides/prog_guide/power_man.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 98cfd3c1f3..2e47d87cbb 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,12 @@ User Cases -- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +Removal Note + +The experimental empty poll APIs will be removed from the library in a future DPDK release. +Suggest to use new lcore poll busyness APIs added in 22.11. + + Ethernet PMD Power Management API - Hi Reshma, Yes, these APIs will be superseded by the newer poll busyness telemetry. Acked-by: David Hunt
Re: [PATCH v2] examples/distributor: update dynamic configuration
Hi Ömer, On 28/06/2022 20:54, omer.yamac at ceng.metu.edu.tr (Abdullah Ömer Yamaç) wrote: In this patch, * It is possible to switch the running mode of the distributor using the command line argument. * With "-c" parameter, you can run RX and Distributor on the same core. * Without "-c" parameter, you can run RX and Distributor on the different core. * Syntax error of the single RX and distributor core is fixed. I believe this particular fix is already merged and back-ported to stable. No need to include this line in the commit message. * Consecutive termination of the lcores fixed. The termination order was wrong, and you couldn't terminate the application while traffic was capturing. The current order is RX -> Distributor -> TX -> Workers * When "-c" parameter is active, the wasted distributor core is also deactivated in the main function. Fixes: 4a7f40c0ff9a ("examples/distributor: add dedicated core") Cc: stable at dpdk.org This is a feature change, not a fix, so I don't believe you need the "Fixes" line or the "Cc: stable" line. Signed-off-by: Abdullah ?mer Yama? I've tested this with the "-c" option, works well. Traffic coming into the app is distributed among the core. With -c added to the command line parameters, I have an extra worker core, as expected. Looks good to me. With the above suggested changes to the commit message: Reviewed-by: David Hunt
Re: [dpdk-dev] [PATCH v4 1/6] lib: distributor performance enhancements
Thanks for the comments Bruce. Addressed below. On 16/1/2017 4:36 PM, Bruce Richardson wrote: On Mon, Jan 09, 2017 at 07:50:43AM +, David Hunt wrote: Now sends bursts of up to 8 mbufs to each worker, and tracks the in-flight flow-ids (atomic scheduling) New file with a new api, similar to the old API except with _burst at the end of the function names Can you explain why this is necessary, and also how the new version works compared to the old. I know this is explained in the cover letter, but the cover letter does not make the git commit log. Sure. I'll add extra comments into the git comment. The main reason is to preserve the original API. This gives the user the choice to migrate to the new API should they wish to. Signed-off-by: David Hunt --- diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c new file mode 100644 index 000..ae7cf9d --- /dev/null +++ b/lib/librte_distributor/rte_distributor_burst.c @@ -0,0 +1,558 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. Update year since we aren't in 2016 any more. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "rte_distributor_priv.h" +#include "rte_distributor_burst.h" + +TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst); + +static struct rte_tailq_elem rte_dist_burst_tailq = { +.name = "RTE_DIST_BURST", +}; +EAL_REGISTER_TAILQ(rte_dist_burst_tailq) + +/ APIs called by workers / + +/ Burst Packet APIs called by workers / + +/* This function should really be called return_pkt_burst() */ 1) Why should it be? 2) Why isn't it called that? Please explain the naming. It seemed to me that the main use of this function was to return the packets from the worker rather than requesting new packets, whilst also toggling the bit to tell the distributor to send more packets. So I guess it's OK as it is. I've removed the comment to remove this confusion. +void +rte_distributor_request_pkt_burst(struct rte_distributor_burst *d, +unsigned int worker_id, struct rte_mbuf **oldpkt, +unsigned int count) +{ +struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]); +unsigned int i; + +volatile int64_t *retptr64; + + +/* if we dont' have any packets to return, return. */ +if (count == 0) +return; + So if we don't return anything we don't get any more packets, right? What happens if we return fewer packets than we were previously given? If that is allowed, why the restriction on returning at least one? You are correct. We should be able to return 0, and still flip the handshake bit to request more packets. This check will be removed. +retptr64 = &(buf->retptr64[0]); + +int +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d, +unsigned int worker_id, struct rte_mbuf **pkts, +struct rte_mbuf **oldpkt, unsigned int return_count) +{ +unsigned int count; +uint64_t retries = 0; + +rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count); + +count = rte_distributor_poll_pkt_burst(d, worker_id, pkts); +while (count == 0) { +rte_pause(); +retries++; +if (retries > 1000) +return 0; This behaviour is differe
Re: [dpdk-dev] [PATCH v4 2/6] lib: add distributor vector flow matching
On 16/1/2017 4:40 PM, Bruce Richardson wrote: On Mon, Jan 09, 2017 at 07:50:44AM +, David Hunt wrote: --- a/lib/librte_distributor/rte_distributor_burst.c +++ b/lib/librte_distributor/rte_distributor_burst.c @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d, } switch (d->dist_match_fn) { + case RTE_DIST_MATCH_VECTOR: + find_match_vec(d, &flows[0], &matches[0]); + break; default: find_match_scalar(d, &flows[0], &matches[0]); } Will link not fail on non-x86 platforms due to find_match_vec not having any implementation on those platforms? /Bruce I've added a fallback find_match_vec in rte_distributor_match_generic.c that calls find_match_scalar. Rgds, Dave.
Re: [dpdk-dev] [PATCH v4 1/6] lib: distributor performance enhancements
On 13/1/2017 3:19 PM, Bruce Richardson wrote: On Mon, Jan 09, 2017 at 07:50:43AM +, David Hunt wrote: Now sends bursts of up to 8 mbufs to each worker, and tracks the in-flight flow-ids (atomic scheduling) New file with a new api, similar to the old API except with _burst at the end of the function names Signed-off-by: David Hunt --- lib/librte_distributor/Makefile| 2 + lib/librte_distributor/rte_distributor.c | 72 +-- lib/librte_distributor/rte_distributor_burst.c | 558 + lib/librte_distributor/rte_distributor_burst.h | 255 ++ lib/librte_distributor/rte_distributor_priv.h | 189 +++ lib/librte_distributor/rte_distributor_version.map | 9 + 6 files changed, 1014 insertions(+), 71 deletions(-) create mode 100644 lib/librte_distributor/rte_distributor_burst.c create mode 100644 lib/librte_distributor/rte_distributor_burst.h create mode 100644 lib/librte_distributor/rte_distributor_priv.h Run a documentation sanity check after this patch throws up a few warnings: --- /dev/null 2017-01-10 10:26:01.206201474 + +++ /tmp/doc-check/doc.txt 2017-01-13 15:19:50.717102848 + @@ -0,0 +1,6 @@ +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:187: warning: argument 'mbuf' of command @param is not found in the argument list of rte_distributor_return_pkt_burst(struct rte_distributor_burst *d, unsigned int worker_id, struct rte_mbuf **oldpkt, int num) +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:199: warning: The following parameters of rte_distributor_return_pkt_burst(struct rte_distributor_burst *d, unsigned int worker_id, struct rte_mbuf **oldpkt, int num) are not documented: + parameter 'oldpkt' + parameter 'num' +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73: warning: Found unknown command `\in_flight_bitmask' +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73: warning: Found unknown command `\rte_distributor_process' Regards, /Bruce Will be cleaned up in next revsion. Thanks, Dave.
Re: [dpdk-dev] [PATCH v4 2/6] lib: add distributor vector flow matching
On 13/1/2017 3:26 PM, Bruce Richardson wrote: On Mon, Jan 09, 2017 at 07:50:44AM +, David Hunt wrote: Signed-off-by: David Hunt --- lib/librte_distributor/Makefile| 4 + lib/librte_distributor/rte_distributor_burst.c | 11 +- lib/librte_distributor/rte_distributor_match_sse.c | 113 + lib/librte_distributor/rte_distributor_priv.h | 6 ++ 4 files changed, 133 insertions(+), 1 deletion(-) create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile index 2acc54d..a725aaf 100644 --- a/lib/librte_distributor/Makefile +++ b/lib/librte_distributor/Makefile @@ -44,6 +44,10 @@ LIBABIVER := 1 # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c +ifeq ($(CONFIG_RTE_ARCH_X86),y) +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c +endif + I believe some of the intrinsics used in the vector code are SSE4.2 instructions, so you need to pass that flag for the compilation for e.g. the "default" target for packaging into distros. # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c index ae7cf9d..35044c4 100644 --- a/lib/librte_distributor/rte_distributor_burst.c +++ b/lib/librte_distributor/rte_distributor_burst.c @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d, } switch (d->dist_match_fn) { + case RTE_DIST_MATCH_VECTOR: + find_match_vec(d, &flows[0], &matches[0]); + break; default: find_match_scalar(d, &flows[0], &matches[0]); } @@ -538,7 +541,13 @@ rte_distributor_create_burst(const char *name, snprintf(d->name, sizeof(d->name), "%s", name); d->num_workers = num_workers; - d->dist_match_fn = RTE_DIST_MATCH_SCALAR; +#if defined(RTE_ARCH_X86) + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2)) { + d->dist_match_fn = RTE_DIST_MATCH_VECTOR; + } else { +#endif + d->dist_match_fn = RTE_DIST_MATCH_SCALAR; + } Two issues here: 1) the check needs to be for SSE4.2, not SSE2 [minimum for DPDK on x86 is SSE3 anyway, so no need for any checks for SSE2] 2) The closing brace should be ifdefed out to fix compilation on non-x86 platforms. A simpler/better solution might actually be to remove the braces since only a single line is involved in each branch. Regards, /Bruce Will be resolved up in next revision. Thanks, Dave.
Re: [dpdk-dev] [PATCH v1] doc: add distributor library API change notice
On 9/2/2017 2:20 PM, Ferruh Yigit wrote: On 2/6/2017 8:08 AM, David Hunt wrote: Given that the packet distributor library improvements (1) will not be in 17.02, I plan on doing some consolidation of the API for burst operation for 17.05, merging the two api's into one, with options for single or burst operation. (1) http://dpdk.org/dev/patchwork/patch/19911/ Signed-off-by: David Hunt --- doc/guides/rel_notes/deprecation.rst | 6 ++ 1 file changed, 6 insertions(+) diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 755dc65..925e156 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -55,6 +55,12 @@ Deprecation Notices and will be removed in 17.02. It is replaced by ``rte_mempool_generic_get/put`` functions. +* lib: distributor library API will be changed to incorporate a burst- + oriented API. This will include a change to ``rte_distributor_create`` + to specify which type of instance to create (single or burst), and + additional calls for ``rte_poll_pkt_burst`` and ``rte_return_pkt_burst``, + among others. Should new APIs (rte_poll_pkt_burst & rte_return_pkt_burst) have "rte_distributor_" name space? Apart from this: Acked-by: Ferruh Yigit Ferruh, Thanks for the third Ack. Thomas, Would you prefer me to re-spin the patch after inserting "_distributor" into the two function names, or would you be so good as to do it during the merge? Regards, Dave.
Re: [dpdk-dev] [PATCH v2 1/5] lib: distributor performance enhancements
Thanks for the review, Jerin, I very much appreciate it. I'll address all the minor comments, and I've a comment or two on the remaining changes below. On 22/12/2016 12:47 PM, Jerin Jacob wrote: On Thu, Dec 22, 2016 at 04:37:04AM +, David Hunt wrote: --snip-- + + /* set the GET_BUF but even if we got no returns */ + buf->retptr64[0] |= RTE_DISTRIB_GET_BUF; + + return 0; +} + +#if RTE_MACHINE_CPUFLAG_SSE2 +static inline void Move SSE version of the code to separate file so that later other SIMD arch specific version like NEON can be incorporated. Sure. Will do. I'll model it on the i40e SIMD layout. + switch (d->dist_match_fn) { +#ifdef RTE_MACHINE_CPUFLAG_SSE2 Is this conditional compilation flag is really required ? i.e RTE_DIST_MATCH_SSE will not enabled in non SSE case So I can always leave the call to find_match_sse2 in there, but the run-time cpu flags check will take care of whether it's called or not? OK sure. Thanks, Dave.
Re: [dpdk-dev] [PATCH v2 3/5] test: add distributor_perf autotest
On 22/12/2016 12:19 PM, Jerin Jacob wrote: On Thu, Dec 22, 2016 at 04:37:06AM +, David Hunt wrote: + struct rte_distributor_burst *d = arg; + unsigned int count = 0; + unsigned int num = 0; + unsigned int id = __sync_fetch_and_add(&worker_idx, 1); Use rte_atomic equivalent Jerin, I'm looking for an equivalent, but I can't seem to find one. Here I'm assigning 'id' with the incremented value of worker_idx in one statement. However, rte_atomic32_add just increments the variable and returns void, so I'd have to use two statements, losing the atomicity. static inline void rte_atomic32_add(rte_atomic32_t *v, int32_t inc) There's a second reason why I can't use the rte_atomics, and that's because worker_idx is a volatile. Maybe we could add new atomic functions in the future to address this? Thanks, Dave.
Re: [dpdk-dev] [PATCH v5 2/3] doc: add sw eventdev pipeline to sample app ug
Hi Jerin, On 3/7/2017 6:37 AM, Jerin Jacob wrote: -Original Message- From: Harry van Haaren Adi a new entry in the sample app user-guides, which details the working of the eventdev_pipeline_sw. Signed-off-by: Harry van Haaren Signed-off-by: David Hunt --- doc/guides/sample_app_ug/eventdev_pipeline_sw.rst | 190 ++ doc/guides/sample_app_ug/index.rst| 1 + 2 files changed, 191 insertions(+) create mode 100644 doc/guides/sample_app_ug/eventdev_pipeline_sw.rst diff --git a/doc/guides/sample_app_ug/eventdev_pipeline_sw.rst b/doc/guides/sample_app_ug/eventdev_pipeline_sw.rst new file mode 100644 index 000..65c33a8 --- /dev/null +++ b/doc/guides/sample_app_ug/eventdev_pipeline_sw.rst @@ -0,0 +1,190 @@ + + +Eventdev Pipeline SW Sample Application Eventdev Pipeline SW PMD Sample Application Sure. +=== + +The eventdev pipeline sample application is a sample app that demonstrates +the usage of the eventdev API using the software PMD. It shows how an +application can configure a pipeline and assign a set of worker cores to +perform the processing required. + +The application has a range of command line arguments allowing it to be +configured for various numbers worker cores, stages,queue depths and cycles per +stage of work. This is useful for performance testing as well as quickly testing +a particular pipeline configuration. + + +statistics that the PMD provides. The statistics provided depend on the PMD +used, see the Event Device Drivers section for a list of eventdev PMDs. diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index 02611ef..90be36a 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -69,6 +69,7 @@ Sample Applications User Guides netmap_compatibility ip_pipeline test_pipeline +eventdev_pipeline_sw eventdev_pipeline_sw_pmd There's no need to change this, it would break 'make doc' The filename of the .rst is "./doc/guides/sample_app_ug/eventdev_pipeline_sw.rst" The app called eventdev_pipeline_sw. dist_app vm_power_management tep_termination With above changes: Acked-by: Jerin Jacob
Re: [dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
Hi Jerin, On 3/7/2017 4:57 AM, Jerin Jacob wrote: -Original Message- From: Harry van Haaren This commit adds a sample app for the eventdev library. The app has been tested with DPDK 17.05-rc2, hence this release (or later) is recommended. The sample app showcases a pipeline processing use-case, with event scheduling and processing defined per stage. The application receives traffic as normal, with each packet traversing the pipeline. Once the packet has been processed by each of the pipeline stages, it is transmitted again. The app provides a framework to utilize cores for a single role or multiple roles. Examples of roles are the RX core, TX core, Scheduling core (in the case of the event/sw PMD), and worker cores. Various flags are available to configure numbers of stages, cycles of work at each stage, type of scheduling, number of worker cores, queue depths etc. For a full explaination, please refer to the documentation. A few comments on bugs and "to avoid the future rework on base code when HW PMD is introduced". As we agreed, We will keep the functionality intact to provide an application to test ethdev + eventdev with _SW PMD_ for 17.08 Sure OK. I will Address. --- examples/Makefile | 2 + examples/eventdev_pipeline/Makefile | 49 ++ examples/eventdev_pipeline/main.c | 999 3 files changed, 1050 insertions(+) create mode 100644 examples/eventdev_pipeline/Makefile create mode 100644 examples/eventdev_pipeline/main.c Do we need to update the MAINTAINERS file? Updated diff --git a/examples/Makefile b/examples/Makefile index 6298626..a6dcc2b 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -100,4 +100,6 @@ $(info vm_power_manager requires libvirt >= 0.9.3) endif endif +DIRS-y += eventdev_pipeline Can you change to eventdev_pipeline_sw_pmd to emphasis on the scope. We will rename to eventdev_pipeline once it working effectively on both SW and HW PMD with ethdev. OK, I've updated the directory, app name and relevant docs across the board so they're all eventdev_pipeline_sw_pmd. This should make it clear to anyone using it that it's for the sw_pmd only, and an updated version will be provided later. + include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/eventdev_pipeline/Makefile b/examples/eventdev_pipeline/Makefile new file mode 100644 index 000..4c26e15 --- /dev/null +++ b/examples/eventdev_pipeline/Makefile @@ -0,0 +1,49 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. 2016-2017 Done. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# + +static unsigned int active_cores; +static unsigned int num_workers; +static long num_packets = (1L << 25); /* do ~32M packets */ +static unsigned int num_fids = 512; +static unsigned int num_stages = 1; +static unsigned int worker_cq_depth = 16; +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY; +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1}; +static int16_t qid[MAX_NUM_STAGES] = {-1}; +static int worker_cycles; +static int enable_queue_priorities; + +struct prod_data { + uint8_t dev_id; + uint8_t port_id; + int32_t qid; + unsigned int num_nic_ports; +} __rte_cache_aligned; + +struct cons_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static struct prod_data prod_data; +static struct cons_data cons_data; + +struct worker_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static unsigned int *enqueue_cnt; +static unsigned int *dequeue_cnt; Not been consumed. Remove it. Done. + +static volatile int done; +static int quiet; +static int dump_dev; +static int dump_dev_signal; + +static uint32_t rx_lock; +static uint32_t tx_lock; +static uint32_t sched_lock; +static bool rx_single; +static bool tx_single; +static bool sched_single; + +static unsigned int rx_core[MAX_NUM_CORE]; +static unsigned int tx_core[MAX_NUM_CORE]; +static unsigned int sched_core[MAX_NUM_CORE]; +static unsigned int worker_core[MAX_NUM_CORE]; + +static struct rte_eth_dev_tx_buffer *tx_buf[RTE_MAX_ETHPORTS]; Could you please remove this global variable and group under a structure for "command line parsing specific" and "fast path specific"(anything comes in producer(), worker() and consumer()). And please allocate "fast path specific" structure variable from huge page area. So that we can easily add new parsing and fastpath variable in future. Done. Fastpath vars now allocated using rte_malloc() + +static int +consumer(void) +{ + const uint64_t freq_khz = rte_get_timer_hz() / 1000; + struct rte_event packets[BATCH_SIZE]; + + static uint64_t received; + static uint64_t last_pkts; + static uint64_t last_time; + static uint64_t start_time; + unsigned int i, j; + u
Re: [dpdk-dev] [PATCH v5 2/3] doc: add sw eventdev pipeline to sample app ug
On 3/7/2017 10:32 AM, Jerin Jacob wrote: -Original Message- --snip-- @@ -69,6 +69,7 @@ Sample Applications User Guides netmap_compatibility ip_pipeline test_pipeline +eventdev_pipeline_sw eventdev_pipeline_sw_pmd There's no need to change this, it would break 'make doc' The filename of the .rst is "./doc/guides/sample_app_ug/eventdev_pipeline_sw.rst" The app called eventdev_pipeline_sw. OK Jerin, I've now changed the app name to eventdev_pipeline_sw_pmd, including directory name, rst name, etc. So now it's all consistent. Rgds, Dave.
Re: [dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
Hi Jerin, On 5/7/2017 6:30 AM, Jerin Jacob wrote: -Original Message- Date: Tue, 4 Jul 2017 08:55:25 +0100 From: "Hunt, David" To: Jerin Jacob CC: dev@dpdk.org, harry.van.haa...@intel.com, Gage Eads , Bruce Richardson Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 Hi Jerin, Hi David, I have checked the v6. Some comments below. On 3/7/2017 4:57 AM, Jerin Jacob wrote: -Original Message- --snip-- +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# + +static unsigned int active_cores; +static unsigned int num_workers; +static long num_packets = (1L << 25); /* do ~32M packets */ +static unsigned int num_fids = 512; +static unsigned int num_stages = 1; +static unsigned int worker_cq_depth = 16; Any reason not move this to struct config_data? Sure. All vars now moved to either config_data or fastpath_data. +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY; +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1}; +static int16_t qid[MAX_NUM_STAGES] = {-1}; +static int worker_cycles; used in fastpath move to struct fastpath_data. Performance drops by ~20% when I put these in fastpath_data. No performace drop when in config_data. I think we need to leaving in config_data for now. +static int enable_queue_priorities; struct config_data ? Done. + +struct prod_data { + uint8_t dev_id; + uint8_t port_id; + int32_t qid; + unsigned int num_nic_ports; +} __rte_cache_aligned; + +struct cons_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static struct prod_data prod_data; +static struct cons_data cons_data; + +struct worker_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static unsigned int *enqueue_cnt; +static unsigned int *dequeue_cnt; Not been consumed. Remove it. Done. + + return 0; +} + + +static inline void +work(struct rte_mbuf *m) +{ + struct ether_hdr *eth; + struct ether_addr addr; + + /* change mac addresses on packet (to use mbuf data) */ + eth = rte_pktmbuf_mtod(m, struct ether_hdr *); + ether_addr_copy(ð->d_addr, &addr); + ether_addr_copy(ð->s_addr, ð->d_addr); + ether_addr_copy(&addr, ð->s_addr); If it is even number of stages(say 2), Will mac swap be negated? as we are swapping on each stage NOT in consumer? The mac swap is just to touch the mbuf. It does not matter if it is negated. Are you sure or I am missing something here? for example,If I add following piece of code, irrespective number of stages it should send the same packet if source packet is same. Right ? But not happening as expected(Check the src and dest mac address) stage == 1 : 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |'+',..E. 0010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 | 0020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.cabcdef 0030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 | | | | | ghijklmnopqr stage == 2 : 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |','+..E. 0010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 | 0020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.cabcdef 0030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 | | | | | ghijklmnopqr diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c index c62cba2..a7aaf37 100644 --- a/examples/eventdev_pipeline_sw_pmd/main.c +++ b/examples/eventdev_pipeline_sw_pmd/main.c @@ -147,8 +147,9 @@ consumer(void) if (start_time == 0) @@ -157,6 +158,7 @@ consumer(void) received += n; for (i = 0; i < n; i++) { uint8_t outport = packets[i].mbuf->port; + rte_pktmbuf_dump(stdout, packets[i].mbuf, 64); rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport], packets[i].mbuf); Either fix the mac swap properly or remove it. Temporary fix added. Now reading in addr and writing it back without swapping. Not ideal, will probably need more work in the future. Added a FIXME in the code with agreement from Jerin. + + if (!quiet) { + printf("\nPort Workload distribution:\n"); + uint32_t i; + uint64_t tot_pkts = 0; + uint64_t pkts_per_wkr[RTE_MAX_LCORE] = {0}; + for (i = 0; i < num_workers; i++) { + char statname[64]; + snprintf(statname, sizeof(statname), "port_%u_rx", + worker_data[i].port_id); + pkts_per_wkr[i] = rte_event_dev_
Re: [dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
On 6/7/2017 4:31 AM, Jerin Jacob wrote: -Original Message- Date: Wed, 5 Jul 2017 12:15:51 +0100 From: "Hunt, David" To: Jerin Jacob CC: dev@dpdk.org, harry.van.haa...@intel.com, Gage Eads , Bruce Richardson Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 Hi Jerin, On 5/7/2017 6:30 AM, Jerin Jacob wrote: -Original Message- Date: Tue, 4 Jul 2017 08:55:25 +0100 From: "Hunt, David" To: Jerin Jacob CC: dev@dpdk.org, harry.van.haa...@intel.com, Gage Eads , Bruce Richardson Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 Hi Jerin, Hi David, I have checked the v6. Some comments below. On 3/7/2017 4:57 AM, Jerin Jacob wrote: -Original Message- --snip-- +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# + +static unsigned int active_cores; +static unsigned int num_workers; +static long num_packets = (1L << 25); /* do ~32M packets */ +static unsigned int num_fids = 512; +static unsigned int num_stages = 1; +static unsigned int worker_cq_depth = 16; Any reason not move this to struct config_data? Sure. All vars now moved to either config_data or fastpath_data. +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY; +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1}; +static int16_t qid[MAX_NUM_STAGES] = {-1}; +static int worker_cycles; used in fastpath move to struct fastpath_data. Performance drops by ~20% when I put these in fastpath_data. No performace drop when in config_data. I think we need to leaving in config_data for now. I checked v7 it looks to OK to merge. Can you fix following minor issue with check patch and check-git-log.sh check-git-log.sh - Wrong headline lowercase: doc: add sw eventdev pipeline to sample app ug Will Do. Change sw to SW ### examples/eventdev_pipeline: added sample app Will Do. Add _sw_pmd Note: Change application to new name. checkpatch.sh - WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'consumer', this function's name, in a string #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178: + printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, " WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'worker', this function's name, in a string #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337: + printf(" worker %u thread done. RX=%zu TX=%zu\n", total: 0 errors, 2 warnings, 1078 lines checked These are false positives. The text in the messages are not meant to be the function name. If anything, I would prefer to change the function names to have " _thread"? I will give pull request Thomas on Friday morning. I will include this change set in the pull request. Regarding the performance drop, Can you add __rte_cache_aligned on those variable which creates regression in moving to rte_malloc area. The cache line could be shared? If not fixing then its fine we will look into that latter. I will investigate and post a new patch in a few hours. and Can you share your command to test for this regression, I will also try on x86 box if I get time? Sure. I'm using this: ./examples/eventdev_pipeline_sw_pmd/build/app/eventdev_pipeline_sw_pmd -c ff0 -w :18:00.0 -w :18:00.1 -w :18:00.2 -w :18:00.3 --vdev=event_sw0 -- -s 4 -r 10 -t 10 -e 20 -w f00 +static int enable_queue_priorities; struct config_data ? Done. + +struct prod_data { + uint8_t dev_id; + uint8_t port_id; + int32_t qid; + unsigned int num_nic_ports; +} __rte_cache_aligned; + +struct cons_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static struct prod_data prod_data; +static struct cons_data cons_data; + +struct worker_data { + uint8_t dev_id; + uint8_t port_id; +} __rte_cache_aligned; + +static unsigned int *enqueue_cnt; +static unsigned int *dequeue_cnt; Not been consumed. Remove it. Done. + + return 0; +} + + +static inline void +work(struct rte_mbuf *m) +{ + struct ether_hdr *eth; + struct ether_addr addr; + + /* change mac addresses on packet (to use mbuf data) */ + eth = rte_pktmbuf_mtod(m, struct ether_hdr *); + ether_addr_copy(ð->d_addr, &addr); + ether_addr_copy(ð->s_addr, ð->d_addr); + ether_addr_copy(&addr, ð->s_addr); If it is even number of stages(say 2), Will mac swap be negated? as we are swapping on each stage NOT in consumer? The mac swap is just
Re: [dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
On 6/7/2017 11:04 AM, Hunt, David wrote: On 6/7/2017 4:31 AM, Jerin Jacob wrote: Note: Change application to new name. checkpatch.sh - WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'consumer', this function's name, in a string #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178: +printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, " WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'worker', this function's name, in a string #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337: +printf(" worker %u thread done. RX=%zu TX=%zu\n", total: 0 errors, 2 warnings, 1078 lines checked These are false positives. The text in the messages are not meant to be the function name. If anything, I would prefer to change the function names to have " _thread"? Or perhaps, better still, change the function names to verbs, i.e. produce() consume(), do_work(). Regards, Dave.
Re: [dpdk-dev] [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
Hi Jerin, On 6/7/2017 11:04 AM, Hunt, David wrote: On 6/7/2017 4:31 AM, Jerin Jacob wrote: -Original Message- --snip-- I checked v7 it looks to OK to merge. Can you fix following minor issue with check patch and check-git-log.sh check-git-log.sh - Wrong headline lowercase: doc: add sw eventdev pipeline to sample app ug Will Do. Change sw to SW ### examples/eventdev_pipeline: added sample app Will Do. Add _sw_pmd Both of these will be in next patch. Note: Change application to new name. checkpatch.sh - WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'consumer', this function's name, in a string #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178: +printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, " WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'worker', this function's name, in a string #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337: +printf(" worker %u thread done. RX=%zu TX=%zu\n", total: 0 errors, 2 warnings, 1078 lines checked These are false positives. The text in the messages are not meant to be the function name. If anything, I would prefer to change the function names to have " _thread"? Having looked at this a bit more, and unable to reproduce with my original kernel version checkpatch, and the patchwork version does not show, and the 4.11.9 stable kernel version does not show, I suggest we mark these down as false positives, as the string is not intended to show the function name. I will give pull request Thomas on Friday morning. I will include this change set in the pull request. Regarding the performance drop, Can you add __rte_cache_aligned on those variable which creates regression in moving to rte_malloc area. The cache line could be shared? If not fixing then its fine we will look into that latter. I will investigate and post a new patch in a few hours. Of the 4 variables I am attempting to move into fastpath structure, no matter whether I move them one at a time or all at once, with __rte_cache_align or not, I still see a significant performance degradation. I suggest looking into this later. I will push a patch in the next couple of hours with the first two changes mentioned above. OK with you? Regards, Dave.
Re: [dpdk-dev] [PATCH v3 1/3] mk: add sensible default target with defconfig
-Original Message- From: Thomas Monjalon [mailto:tho...@monjalon.net] Sent: Thursday, 3 August, 2017 11:40 PM To: Hunt, David Cc: dev@dpdk.org; shreyansh.j...@nxp.com Subject: Re: [dpdk-dev] [PATCH v3 1/3] mk: add sensible default target with defconfig 07/06/2017 16:37, David Hunt: > Users can now use 'make defconfig' to generate a configuration using > the most appropriate defaults for the current machine. > > > arch taken from uname -m > machine defaults to native > execenv is taken from uname, Linux=linuxapp, otherwise bsdapp > toolchain is taken from $CC -v to see which compiler to use > > Signed-off-by: David Hunt > Acked-by: Shreyansh Jain Looks to be a good idea if it is really automatic. > +${CC} -v 2>&1 | \ > +grep " version " | cut -d ' ' -f 1) Unfortunately, it depends on $CC which is not commonly exported. What about defaulting to gcc? > - @echo "Configuration done" > + @echo "Configuration done using "$(shell basename \ > + $(RTE_CONFIG_TEMPLATE) | sed "s/defconfig_//g") RTE_CONFIG_TEMPLATE is not defined in this patch (and I do not see the benefit in next patch). Thomas, Does this mean that this patch is not going into this release? It has been acked for almost a month now, with no further comment. The one hour between your comment and the release of RC4 did not give me a reasonable amount of time to address your concerns. I also feel that the lack of comments in the last month should mean that the patch should be applied as is. If changes are required, I am happy to address in the next release. Regards, Dave.
Re: [dpdk-dev] [PATCH v3 1/3] mk: add sensible default target with defconfig
On 4/8/2017 10:36 AM, Thomas Monjalon wrote: 04/08/2017 10:22, Hunt, David: From: Thomas Monjalon [mailto:tho...@monjalon.net] 07/06/2017 16:37, David Hunt: Users can now use 'make defconfig' to generate a configuration using the most appropriate defaults for the current machine. arch taken from uname -m machine defaults to native execenv is taken from uname, Linux=linuxapp, otherwise bsdapp toolchain is taken from $CC -v to see which compiler to use Signed-off-by: David Hunt Acked-by: Shreyansh Jain Looks to be a good idea if it is really automatic. +${CC} -v 2>&1 | \ +grep " version " | cut -d ' ' -f 1) Unfortunately, it depends on $CC which is not commonly exported. What about defaulting to gcc? - @echo "Configuration done" + @echo "Configuration done using "$(shell basename \ + $(RTE_CONFIG_TEMPLATE) | sed "s/defconfig_//g") RTE_CONFIG_TEMPLATE is not defined in this patch (and I do not see the benefit in next patch). Thomas, Does this mean that this patch is not going into this release? It has been acked for almost a month now, with no further comment. The one hour between your comment and the release of RC4 did not give me a reasonable amount of time to address your concerns. I also feel that the lack of comments in the last month should mean that the patch should be applied as is. If changes are required, I am happy to address in the next release. You're right, I'm very sorry not taking time to review it before. I think only the first patch should be integrated, without the comment for RTE_CONFIG_TEMPLATE. Opinion? OK, I would be OK with the first patch. However, I think the RTE_CONFIG_TEMPLATE comment part of the patch is fine, we just tested it here. It's only RTE_TEMPLATE I'm introducing in the second patch, nor RTE_CONFIG_TEMPLATE. That existed before this patch set. So the echo command in the first patch works fine, and shows the user what template the script has used to configure itself. I could upload another patch with just the first patch (and the relevant 2 lines from the docs patch) as a v4? Regards, Dave.
Re: [dpdk-dev] [PATCH v3 1/3] mk: add sensible default target with defconfig
On 4/8/2017 11:05 AM, Thomas Monjalon wrote: 04/08/2017 11:53, Hunt, David: On 4/8/2017 10:36 AM, Thomas Monjalon wrote: 04/08/2017 10:22, Hunt, David: From: Thomas Monjalon [mailto:tho...@monjalon.net] 07/06/2017 16:37, David Hunt: Users can now use 'make defconfig' to generate a configuration using the most appropriate defaults for the current machine. arch taken from uname -m machine defaults to native execenv is taken from uname, Linux=linuxapp, otherwise bsdapp toolchain is taken from $CC -v to see which compiler to use Signed-off-by: David Hunt Acked-by: Shreyansh Jain Looks to be a good idea if it is really automatic. +${CC} -v 2>&1 | \ +grep " version " | cut -d ' ' -f 1) Unfortunately, it depends on $CC which is not commonly exported. What about defaulting to gcc? - @echo "Configuration done" + @echo "Configuration done using "$(shell basename \ + $(RTE_CONFIG_TEMPLATE) | sed "s/defconfig_//g") RTE_CONFIG_TEMPLATE is not defined in this patch (and I do not see the benefit in next patch). Thomas, Does this mean that this patch is not going into this release? It has been acked for almost a month now, with no further comment. The one hour between your comment and the release of RC4 did not give me a reasonable amount of time to address your concerns. I also feel that the lack of comments in the last month should mean that the patch should be applied as is. If changes are required, I am happy to address in the next release. You're right, I'm very sorry not taking time to review it before. I think only the first patch should be integrated, without the comment for RTE_CONFIG_TEMPLATE. Opinion? OK, I would be OK with the first patch. However, I think the RTE_CONFIG_TEMPLATE comment part of the patch is fine, we just tested it here. It's only RTE_TEMPLATE I'm introducing in the second patch, nor RTE_CONFIG_TEMPLATE. That existed before this patch set. So the echo command in the first patch works fine, and shows the user what template the script has used to configure itself. Ah OK I totally missed it :) I could upload another patch with just the first patch (and the relevant 2 lines from the docs patch) as a v4? Yes perfect Thomas, OK, V5 sent. (v4 had 1 line missing in docs). There's just the one patch in the set now. Thanks, Dave.
Re: [dpdk-dev] [PATCH v10 02/18] lib: create private header file
On 15/3/2017 5:18 PM, Thomas Monjalon wrote: 2017-03-15 06:19, David Hunt: +/** + * Number of packets to deal with in bursts. Needs to be 8 so as to + * fit in one cache line. + */ +#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t)) error: 'rte_xmm_t' undeclared here (arm compilation) Can it be fixed by including rte_vect.h? Ideally I would prefer we stop using XMM types in a generic code. XMM are x86-only registers. It has been translated for other arches but we should use a more generic name. What was the intention here? SSE-optimized code or 128-bit size? Please check lib/librte_eal/common/include/generic/rte_vect.h for a generic type. Thomas, Including rte_vect.h does indeed resolve the issue. I had originally had "#define RTE_DIST_BURST_SIZE 8" but thought that latest definition would give further clarity as to why it's set to 8. There are 2 reasons. 1. The vector instruction I use for the matching works on 8 uint16s at a time 2. The (x86) cache lines communicating with the worker cores fit 8 mbuf pointers at a time. So there are 2 options to resolve: 1. #include at the top of rte_distributor_private.h 2. revert back to "#define RTE_DIST_BURST_SIZE 8" Personally, I'd probably lean towards option 2 (with additional comment) , as it removes the mention of xmm from the generic header file, as well as being valid for both reasons, whereas the xmm #define really only helps to explain one reason. Do you have any preference? Let me know and I can push up a v11. Regards, Dave. P.S. Suggested change: /* * Transfer up to 8 mbufs at a time to/from workers, and * flow matching algorithm optimised for 8 flow IDs at a time */ #define RTE_DIST_BURST_SIZE 8
Re: [dpdk-dev] [PATCH 1/2] drivers/mempool: add stack mempool handler as driver
On 20/3/2017 10:03 AM, Shreyansh Jain wrote: CONFIG_RTE_DRIVER_MEMPOOL_STACK option added to common_base. Stack mempool handler moved from lib/librte_mempool into drivers/mempool. With this patch, the Stack mempool handler registration is optional and toggled via the configuration file. In case disabled (N), it would imply request for creating of mempool would fail. Signed-off-by: Shreyansh Jain --- config/common_base | 5 + drivers/Makefile | 1 + drivers/mempool/Makefile | 36 + drivers/mempool/stack/Makefile | 55 drivers/mempool/stack/rte_mempool_stack.c | 147 + .../mempool/stack/rte_mempool_stack_version.map| 4 + lib/librte_mempool/Makefile| 1 - lib/librte_mempool/rte_mempool_stack.c | 147 - 8 files changed, 248 insertions(+), 148 deletions(-) create mode 100644 drivers/mempool/Makefile create mode 100644 drivers/mempool/stack/Makefile create mode 100644 drivers/mempool/stack/rte_mempool_stack.c create mode 100644 drivers/mempool/stack/rte_mempool_stack_version.map delete mode 100644 lib/librte_mempool/rte_mempool_stack.c diff --git a/config/common_base b/config/common_base index 37aa1e1..3e70aa0 100644 --- a/config/common_base +++ b/config/common_base @@ -464,6 +464,11 @@ CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512 CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n # +# Compile Mempool drivers +# +CONFIG_RTE_DRIVER_MEMPOOL_STACK=y + +# # Compile librte_mbuf # CONFIG_RTE_LIBRTE_MBUF=y diff --git a/drivers/Makefile b/drivers/Makefile index 81c03a8..193056b 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -31,6 +31,7 @@ include $(RTE_SDK)/mk/rte.vars.mk +DIRS-y += mempool DIRS-y += net DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto diff --git a/drivers/mempool/Makefile b/drivers/mempool/Makefile new file mode 100644 index 000..b256a34 --- /dev/null +++ b/drivers/mempool/Makefile @@ -0,0 +1,36 @@ +# BSD LICENSE +# +# Copyright(c) 2017 NXP. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of NXP nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +DIRS-$(CONFIG_RTE_DRIVER_MEMPOOL_STACK) += stack + +include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/mempool/stack/Makefile b/drivers/mempool/stack/Makefile new file mode 100644 index 000..f98eb5b --- /dev/null +++ b/drivers/mempool/stack/Makefile @@ -0,0 +1,55 @@ +# BSD LICENSE +# +# Copyright(c) 2017 NXP. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of NXP nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR I
Re: [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor
Hi Thomas, On 27/3/2017 2:02 PM, Thomas Monjalon wrote: 2017-03-20 10:08, David Hunt: Also bumped up the ABI version number in the Makefile It would be good to explain the intent of versioning here. Signed-off-by: David Hunt Acked-by: Bruce Richardson --- lib/librte_distributor/Makefile| 2 +- lib/librte_distributor/rte_distributor.c | 57 +++--- lib/librte_distributor/rte_distributor_v1705.h | 89 ++ lib/librte_distributor/rte_distributor_v20.c | 10 +++ lib/librte_distributor/rte_distributor_version.map | 14 5 files changed, 162 insertions(+), 10 deletions(-) create mode 100644 lib/librte_distributor/rte_distributor_v1705.h diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile index 2b28eff..2f05cf3 100644 --- a/lib/librte_distributor/Makefile +++ b/lib/librte_distributor/Makefile @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) EXPORT_MAP := rte_distributor_version.map -LIBABIVER := 1 +LIBABIVER := 2 Why keeping ABI compat if you bump ABIVER? I guess you do not really want to bump now. You are correct. The symbol versioning will ensure old binaries will work without the bump in LIBABIVER. Please do not apply this line. Thanks, Dave.
Re: [dpdk-dev] [PATCH v11 14/18] examples/distributor: tweaks for performance
On 27/3/2017 2:04 PM, Thomas Monjalon wrote: 2017-03-20 10:08, David Hunt: Approximately 10% performance increase due to these changes. It would have been better to explain what are the changes. Signed-off-by: David Hunt Acked-by: Bruce Richardson --- examples/distributor/main.c | 36 +++- 1 file changed, 23 insertions(+), 13 deletions(-) Hi Thomas, Sure, how about this: This patch tunes Rx, Tx, and rte_distributor_process() burst sizes to maximize performance. It also addresses some checkpatch issues. The result is approximately 10% performance increase. Regards, Dave.
Re: [dpdk-dev] [PATCH v11 07/18] lib: make v20 header file private
On 27/3/2017 2:10 PM, Thomas Monjalon wrote: 2017-03-20 10:08, David Hunt: Signed-off-by: David Hunt Acked-by: Bruce Richardson [...] SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h -SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h There is no explanation for this change. I think it would be clearer if squashed with the previous patch, switching to the new API. Let me know, I can squash it myself :) Sure, makes sense to squash this one. Please do. :) Thanks, Dave.
Re: [dpdk-dev] [PATCH v6 11/21] event/sw: add scheduling logic
On 30/3/2017 12:25 AM, Harry van Haaren wrote: From: Bruce Richardson Add in the scheduling function which takes the events from the producer queues and buffers them before scheduling them to consumer queues. The scheduling logic includes support for atomic, reordered, and parallel scheduling of flows. Signed-off-by: Bruce Richardson Signed-off-by: Gage Eads Signed-off-by: Harry van Haaren --- v6: - Fix handling of event priority normalization (Jerin) --- drivers/event/sw/Makefile | 1 + drivers/event/sw/sw_evdev.c | 1 + drivers/event/sw/sw_evdev.h | 11 + drivers/event/sw/sw_evdev_scheduler.c | 601 ++ 4 files changed, 614 insertions(+) create mode 100644 drivers/event/sw/sw_evdev_scheduler.c diff --git a/drivers/event/sw/Makefile b/drivers/event/sw/Makefile index b6ecd91..a7f5b3d 100644 --- a/drivers/event/sw/Makefile +++ b/drivers/event/sw/Makefile @@ -54,6 +54,7 @@ EXPORT_MAP := rte_pmd_evdev_sw_version.map # library source files SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_worker.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_scheduler.c # export include files SYMLINK-y-include += diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 2c28547..f91a04b 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -557,6 +557,7 @@ sw_probe(const char *name, const char *params) dev->enqueue_burst = sw_event_enqueue_burst; dev->dequeue = sw_event_dequeue; dev->dequeue_burst = sw_event_dequeue_burst; + dev->schedule = sw_event_schedule; if (rte_eal_process_type() != RTE_PROC_PRIMARY) return 0; diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h index ab372fd..7c157c7 100644 --- a/drivers/event/sw/sw_evdev.h +++ b/drivers/event/sw/sw_evdev.h @@ -248,8 +248,18 @@ struct sw_evdev { /* Cache how many packets are in each cq */ uint16_t cq_ring_space[SW_PORTS_MAX] __rte_cache_aligned; + /* Array of pointers to load-balanced QIDs sorted by priority level */ + struct sw_qid *qids_prioritized[RTE_EVENT_MAX_QUEUES_PER_DEV]; + + /* Stats */ + struct sw_point_stats stats __rte_cache_aligned; + uint64_t sched_called; int32_t sched_quanta; + uint64_t sched_no_iq_enqueues; + uint64_t sched_no_cq_enqueues; + uint64_t sched_cq_qid_called; + uint8_t started; uint32_t credit_update_quanta; }; @@ -272,5 +282,6 @@ uint16_t sw_event_enqueue_burst(void *port, const struct rte_event ev[], uint16_t sw_event_dequeue(void *port, struct rte_event *ev, uint64_t wait); uint16_t sw_event_dequeue_burst(void *port, struct rte_event *ev, uint16_t num, uint64_t wait); +void sw_event_schedule(struct rte_eventdev *dev); #endif /* _SW_EVDEV_H_ */ diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c new file mode 100644 index 000..c0fe6a3 --- /dev/null +++ b/drivers/event/sw/sw_evdev_scheduler.c @@ -0,0 +1,601 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include "sw_evdev.h" +#include "iq_ring.h" +#include "event_ring.h" + +#define SW_IQS_MASK (SW_IQ
Re: [dpdk-dev] [PATCH v6 13/21] event/sw: add dump function for easier debugging
On 30/3/2017 12:25 AM, Harry van Haaren wrote: From: Bruce Richardson Segfault issue resolved when only partially configured and rte_event_dev_dump() is called before start(), Reported-by: Vipin Varghese Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- drivers/event/sw/sw_evdev.c | 148 1 file changed, 148 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 04ab7ad..37f5db5 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -441,6 +441,153 @@ sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info) *info = evdev_sw_info; } +static void +sw_dump(struct rte_eventdev *dev, FILE *f) +{ + const struct sw_evdev *sw = sw_pmd_priv(dev); + + static const char * const q_type_strings[] = { + "Ordered", "Atomic", "Parallel", "Directed" + }; + uint32_t i; + fprintf(f, "EventDev %s: ports %d, qids %d\n", "todo-fix-name", + sw->port_count, sw->qid_count); + + fprintf(f, "\trx %"PRIu64"\n\tdrop %"PRIu64"\n\ttx %"PRIu64"\n", + sw->stats.rx_pkts, sw->stats.rx_dropped, sw->stats.tx_pkts); + fprintf(f, "\tsched calls: %"PRIu64"\n", sw->sched_called); + fprintf(f, "\tsched cq/qid call: %"PRIu64"\n", sw->sched_cq_qid_called); + fprintf(f, "\tsched no IQ enq: %"PRIu64"\n", sw->sched_no_iq_enqueues); + fprintf(f, "\tsched no CQ enq: %"PRIu64"\n", sw->sched_no_cq_enqueues); + uint32_t inflights = rte_atomic32_read(&sw->inflights); + uint32_t credits = sw->nb_events_limit - inflights; + fprintf(f, "\tinflight %d, credits: %d\n", inflights, credits); + +#define COL_RED "\x1b[31m" +#define COL_RESET "\x1b[0m" + + for (i = 0; i < sw->port_count; i++) { + int max, j; + const struct sw_port *p = &sw->ports[i]; + if (!p->initialized) { + fprintf(f, " %sPort %d not initialized.%s\n", + COL_RED, i, COL_RESET); + continue; + } + fprintf(f, " Port %d %s\n", i, + p->is_directed ? " (SingleCons)" : ""); + fprintf(f, "\trx %"PRIu64"\tdrop %"PRIu64"\ttx %"PRIu64 + "\t%sinflight %d%s\n", sw->ports[i].stats.rx_pkts, + sw->ports[i].stats.rx_dropped, + sw->ports[i].stats.tx_pkts, + (p->inflights == p->inflight_max) ? + COL_RED : COL_RESET, + sw->ports[i].inflights, COL_RESET); + + fprintf(f, "\tMax New: %u" + "\tAvg cycles PP: %"PRIu64"\tCredits: %u\n", + sw->ports[i].inflight_max, + sw->ports[i].avg_pkt_ticks, + sw->ports[i].inflight_credits); + fprintf(f, "\tReceive burst distribution:\n"); + float zp_percent = p->zero_polls * 100.0 / p->total_polls; + fprintf(f, zp_percent < 10 ? "\t\t0:%.02f%% " : "\t\t0:%.0f%% ", + zp_percent); + for (max = (int)RTE_DIM(p->poll_buckets); max-- > 0;) + if (p->poll_buckets[max] != 0) + break; + for (j = 0; j <= max; j++) { + if (p->poll_buckets[j] != 0) { + float poll_pc = p->poll_buckets[j] * 100.0 / + p->total_polls; + fprintf(f, "%u-%u:%.02f%% ", + ((j << SW_DEQ_STAT_BUCKET_SHIFT) + 1), + ((j+1) << SW_DEQ_STAT_BUCKET_SHIFT), + poll_pc); + } + } + fprintf(f, "\n"); + + if (p->rx_worker_ring) { + uint64_t used = qe_ring_count(p->rx_worker_ring); + uint64_t space = qe_ring_free_count(p->rx_worker_ring); + const char *col = (space == 0) ? COL_RED : COL_RESET; + fprintf(f, "\t%srx ring used: %4"PRIu64"\tfree: %4" + PRIu64 COL_RESET"\n", col, used, space); + } else + fprintf(f, "\trx ring not initialized.\n"); + + if (p->cq_worker_ring) { + uint64_t used = qe_ring_count(p->cq_worker_ring); + uint64_t space = qe_ring_free_count(p->cq_worker_ring); + const char *col = (space == 0) ? COL_RED : COL_RESET; + fprintf(f, "\t%scq ring used: %4"PRIu64"\tfree: %4" + PRIu64 COL_RESET"\n", col, used, space); + } else + fpri
Re: [dpdk-dev] [PATCH v6 14/21] event/sw: add xstats support
On 30/3/2017 12:25 AM, Harry van Haaren wrote: From: Bruce Richardson Add support for xstats to report out on the state of the eventdev. Useful for debugging and for unit tests, as well as observability at runtime and performance tuning of apps to work well with the scheduler. --snip-- +static int +sw_xstats_reset_port(struct sw_evdev *sw, uint8_t port_id, + const uint32_t ids[], uint32_t nb_ids) +{ + const uint32_t reset = 1; + const uint32_t ret_n_lt_stats = 0; + int offset = sw->xstats_offset_for_port[port_id]; + int nb_stat = sw->xstats_count_per_port[port_id]; + + if (ids) { + uint32_t nb_reset = sw_xstats_update(sw, + RTE_EVENT_DEV_XSTATS_PORT, port_id, + ids, NULL, nb_ids, + reset, ret_n_lt_stats); + return nb_reset == nb_ids ? 0 : -EINVAL; + } else + sw_xstats_reset_range(sw, offset, nb_stat); + + return 0; +} Checkpatch warning here: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return #747: FILE: drivers/event/sw/sw_evdev_xstats.c:611: + return nb_reset == nb_ids ? 0 : -EINVAL; + } else So can be changed to + return nb_reset == nb_ids ? 0 : -EINVAL; + } + + sw_xstats_reset_range(sw, offset, nb_stat); + + return 0; Apart from that: Acked-by: David Hunt
Re: [dpdk-dev] [PATCH v7 21/22] doc: add SW eventdev PMD to 17.05 release notes
On 30/3/2017 8:30 PM, Harry van Haaren wrote: Signed-off-by: Harry van Haaren --- doc/guides/rel_notes/release_17_05.rst | 7 +++ 1 file changed, 7 insertions(+) diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst index 918f483..a5b8351 100644 --- a/doc/guides/rel_notes/release_17_05.rst +++ b/doc/guides/rel_notes/release_17_05.rst @@ -76,6 +76,13 @@ EAL Drivers ~~~ +* **Added Software Eventdev PMD.** + + Added support for the software eventdev PMD. The software eventdev is a + software based scheduler device that implements the eventdev API. This + PMD allows an application to configure a pipeline using the eventdev + library, and run the scheduling workload on a CPU core. + Libraries ~ Acked-by: David Hunt
Re: [dpdk-dev] [PATCH] eal: fix bug in x86 cmpset
On 9/2/2017 4:53 PM, Thomas Monjalon wrote: 2016-11-06 22:09, Thomas Monjalon: 2016-09-29 18:34, Thomas Monjalon: 2016-09-30 02:54, Nikhil Rao: The original code used movl instead of xchgl, this caused rte_atomic64_cmpset to use ebx as the lower dword of the source to cmpxchg8b instead of the lower dword of function argument "src". Could you please start the explanation with a statement of what is wrong from an user point of view? It could help to understand how severe it is. Please, we need a clear explanation of the bug, and an acknowledgement. Should we close this bug? I took a few minutes to look at this, and the issue can easily be reproduced with a small snippet of code. With the 'mov', the lower dword of the result is incorrect. This is resolved by using 'xchgl'. void main() { uint64_t a = 0xff00ff; rte_atomic64_cmpset( &a, 0xff00ff, 0xfa00fa); printf("0x%lx\n", a); } When using 'mov', the result is 0xfa When using 'xchgl', the result is 0xfa00fa, as expected. Rgds, Dave.
Re: [dpdk-dev] [PATCH] eal: fix bug in x86 cmpset
On 10/2/2017 10:53 AM, Thomas Monjalon wrote: 2017-02-10 10:39, Hunt, David: On 9/2/2017 4:53 PM, Thomas Monjalon wrote: 2016-11-06 22:09, Thomas Monjalon: 2016-09-29 18:34, Thomas Monjalon: 2016-09-30 02:54, Nikhil Rao: The original code used movl instead of xchgl, this caused rte_atomic64_cmpset to use ebx as the lower dword of the source to cmpxchg8b instead of the lower dword of function argument "src". Could you please start the explanation with a statement of what is wrong from an user point of view? It could help to understand how severe it is. Please, we need a clear explanation of the bug, and an acknowledgement. Should we close this bug? I took a few minutes to look at this, and the issue can easily be reproduced with a small snippet of code. With the 'mov', the lower dword of the result is incorrect. This is resolved by using 'xchgl'. void main() { uint64_t a = 0xff00ff; rte_atomic64_cmpset( &a, 0xff00ff, 0xfa00fa); printf("0x%lx\n", a); } When using 'mov', the result is 0xfa When using 'xchgl', the result is 0xfa00fa, as expected. This operation is used a lot in drivers for link status. I think we need to clearly explain what was the consequence of this bug. Agreed. It's probably also worth noting that its only on the __PIC__ enabled codepath so would have more of an affect on the distros.
Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
On 21/2/2017 3:17 AM, David Hunt wrote: Move files out of the way so that we can replace with new versions of the distributor libtrary. Files are named in such a way as to match the symbol versioning that we will apply for backward ABI compatibility. Signed-off-by: David Hunt --- ---snip-- Apologies, this patch should have been sent with '--find-renames', thus reducing the size of this patch significantly, and eliminating checkpatch warnings/errors. Dave.
Re: [dpdk-dev] [PATCH v7 07/17] lib: apply symbol versioning to distibutor lib
On 21/2/2017 3:17 AM, David Hunt wrote: Note: LIBABIVER is also bumped up in the Makefile Signed-off-by: David Hunt --- lib/librte_distributor/rte_distributor.c | 10 +- lib/librte_distributor/rte_distributor_v20.c | 10 ++ lib/librte_distributor/rte_distributor_version.map | 14 ++ 3 files changed, 33 insertions(+), 1 deletion(-) --snip-- The following is generated by checkpatch for this patch: ERROR:SPACING: space required after that ',' (ctx:VxO) #70: FILE: lib/librte_distributor/rte_distributor.c:105: +BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt,, 17.05); ^ However, I also tried with a space: ERROR:SPACING: space prohibited before that ',' (ctx:WxW) #26: FILE: lib/librte_distributor/rte_distributor.c:105: +BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, , 17.05); ^ So in either case it seems it's not possible to make checkpatch happy. Rgds, Dave.
Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
On 24/2/2017 2:03 PM, Bruce Richardson wrote: On Tue, Feb 21, 2017 at 03:17:37AM +, David Hunt wrote: Move files out of the way so that we can replace with new versions of the distributor libtrary. Files are named in such a way as to match the symbol versioning that we will apply for backward ABI compatibility. Signed-off-by: David Hunt --- app/test/test_distributor.c | 2 +- app/test/test_distributor_perf.c | 2 +- examples/distributor/main.c | 2 +- lib/librte_distributor/Makefile | 4 +- lib/librte_distributor/rte_distributor.c | 487 --- lib/librte_distributor/rte_distributor.h | 247 -- lib/librte_distributor/rte_distributor_v20.c | 487 +++ lib/librte_distributor/rte_distributor_v20.h | 247 ++ Rather than changing the unit tests and example applications, I think this patch would be better with a new rte_distributor.h file which simply does "#include ". Alternatively, I recently upstreamed a patch, which went into 17.02, to allow symlinks in the folder so you could create a symlink to the renamed file. /Bruce Thanks for the review, Bruce. I've just finished reworking the patchset on your review comments (including later emails) and will post soon. Regards, Dave.
Re: [dpdk-dev] [PATCH v7 04/17] lib: add new burst oriented distributor structs
On 24/2/2017 2:08 PM, Bruce Richardson wrote: On Tue, Feb 21, 2017 at 03:17:40AM +, David Hunt wrote: Signed-off-by: David Hunt --- lib/librte_distributor/rte_distributor_private.h | 61 1 file changed, 61 insertions(+) diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h index 2d85b9b..c8e0f98 100644 --- a/lib/librte_distributor/rte_distributor_private.h +++ b/lib/librte_distributor/rte_distributor_private.h @@ -129,6 +129,67 @@ struct rte_distributor_v20 { struct rte_distributor_returned_pkts returns; }; +/* All different signature compare functions */ +enum rte_distributor_match_function { + RTE_DIST_MATCH_SCALAR = 0, + RTE_DIST_MATCH_VECTOR, + RTE_DIST_NUM_MATCH_FNS +}; + +/** + * Buffer structure used to pass the pointer data between cores. This is cache + * line aligned, but to improve performance and prevent adjacent cache-line + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on + * the next cache line to worker 0, we pad this out to two cache lines. + * We can pass up to 8 mbufs at a time in one cacheline. + * There is a separate cacheline for returns in the burst API. + */ +struct rte_distributor_buffer { + volatile int64_t bufptr64[RTE_DIST_BURST_SIZE] + __rte_cache_aligned; /* <= outgoing to worker */ + + int64_t pad1 __rte_cache_aligned;/* <= one cache line */ + + volatile int64_t retptr64[RTE_DIST_BURST_SIZE] + __rte_cache_aligned; /* <= incoming from worker */ + + int64_t pad2 __rte_cache_aligned;/* <= one cache line */ + + int count __rte_cache_aligned; /* <= number of current mbufs */ +}; + +struct rte_distributor { + TAILQ_ENTRY(rte_distributor) next;/**< Next in list. */ + + char name[RTE_DISTRIBUTOR_NAMESIZE]; /**< Name of the ring. */ + unsigned int num_workers; /**< Number of workers polling */ + unsigned int alg_type;/**< Number of alg types */ + + /**> +* First cache line in the this array are the tags inflight +* on the worker core. Second cache line are the backlog +* that are going to go to the worker core. +*/ + uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2] + __rte_cache_aligned; + + struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS] + __rte_cache_aligned; + + struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS]; + + struct rte_distributor_returned_pkts returns; + + enum rte_distributor_match_function dist_match_fn; + + struct rte_distributor_v20 *d_v20; +}; + +void +find_match_scalar(struct rte_distributor *d, + uint16_t *data_ptr, + uint16_t *output_ptr); + #ifdef __cplusplus } #endif The last patch claimed that this header file is for structs/definitions common between the old and new implementations. These definitions look to apply only to the new one, so do they belong in the .c file instead? The _v20 structs are used as a fallback in the new struct, so probably best to have in a common private file.
Re: [dpdk-dev] [PATCH v7 04/17] lib: add new burst oriented distributor structs
On 24/2/2017 2:09 PM, Bruce Richardson wrote: On Tue, Feb 21, 2017 at 03:17:40AM +, David Hunt wrote: Signed-off-by: David Hunt --- lib/librte_distributor/rte_distributor_private.h | 61 1 file changed, 61 insertions(+) diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h index 2d85b9b..c8e0f98 100644 --- a/lib/librte_distributor/rte_distributor_private.h +++ b/lib/librte_distributor/rte_distributor_private.h @@ -129,6 +129,67 @@ struct rte_distributor_v20 { struct rte_distributor_returned_pkts returns; }; +/* All different signature compare functions */ +enum rte_distributor_match_function { + RTE_DIST_MATCH_SCALAR = 0, + RTE_DIST_MATCH_VECTOR, + RTE_DIST_NUM_MATCH_FNS +}; + +/** + * Buffer structure used to pass the pointer data between cores. This is cache + * line aligned, but to improve performance and prevent adjacent cache-line + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on + * the next cache line to worker 0, we pad this out to two cache lines. + * We can pass up to 8 mbufs at a time in one cacheline. + * There is a separate cacheline for returns in the burst API. + */ +struct rte_distributor_buffer { + volatile int64_t bufptr64[RTE_DIST_BURST_SIZE] + __rte_cache_aligned; /* <= outgoing to worker */ + + int64_t pad1 __rte_cache_aligned;/* <= one cache line */ + + volatile int64_t retptr64[RTE_DIST_BURST_SIZE] + __rte_cache_aligned; /* <= incoming from worker */ + + int64_t pad2 __rte_cache_aligned;/* <= one cache line */ + + int count __rte_cache_aligned; /* <= number of current mbufs */ +}; Rather than adding padding elements here, would it be better and clearer just to align the values to 128B (or more strictly CACHE_LINE_SZ * 2)? /Bruce I tried various combinations of __rte_align(128) and taking out the pads, but the performance regressed 10-15%. For the moment, I suggest leaving as is. Dave.
Re: [dpdk-dev] [PATCH v7 08/17] test: change params to distributor autotest
On 24/2/2017 2:14 PM, Bruce Richardson wrote: On Tue, Feb 21, 2017 at 03:17:44AM +, David Hunt wrote: In the next few patches, we'll want to test old and new API, so here we're allowing different parameters to be passed to the tests, instead of just a distributor struct. Signed-off-by: David Hunt --- app/test/test_distributor.c | 64 + 1 file changed, 42 insertions(+), 22 deletions(-) diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c index 6a4e20b..fdfa793 100644 --- a/app/test/test_distributor.c +++ b/app/test/test_distributor.c @@ -45,6 +45,13 @@ #define BURST 32 #define BIG_BATCH 1024 +struct worker_params { + char name[64]; + struct rte_distributor_v20 *dist; +}; + +struct worker_params worker_params; + /* statics - all zero-initialized by default */ static volatile int quit; /**< general quit variable for all threads */ static volatile int zero_quit; /**< var for when we just want thr0 to quit*/ @@ -81,7 +88,8 @@ static int handle_work(void *arg) { struct rte_mbuf *pkt = NULL; - struct rte_distributor_v20 *d = arg; + struct worker_params *wp = arg; + struct rte_distributor_v20 *d = wp->dist; The cover letter indicated that using new vs old API was just a matter of passing a different parameter to create. I therefore would not expect to see references to v20 APIs or structures in any code outside the lib itself. Am I missing something? /Bruce Patchset has now been reworked so the api switches over and apps migrated to new SPI in one step, so _v20 never exposed. Dave.
Re: [dpdk-dev] [PATCH v8 09/18] lib: add symbol versioning to distributor
ERROR:SPACING: space prohibited before that ',' (ctx:WxW) #84: FILE: lib/librte_distributor/rte_distributor.c:172: +BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05); ^ FYI, checkpatch does not like this regardless of whether there's a space there or not. It complains either way. :) Regards, Dave. On 1/3/2017 7:47 AM, David Hunt wrote: Also bumped up the ABI version number in the Makefile Signed-off-by: David Hunt --- lib/librte_distributor/Makefile| 2 +- lib/librte_distributor/rte_distributor.c | 8 lib/librte_distributor/rte_distributor_v20.c | 10 ++ lib/librte_distributor/rte_distributor_version.map | 14 ++ 4 files changed, 33 insertions(+), 1 deletion(-) diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile index 2b28eff..2f05cf3 100644 --- a/lib/librte_distributor/Makefile +++ b/lib/librte_distributor/Makefile @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) EXPORT_MAP := rte_distributor_version.map -LIBABIVER := 1 +LIBABIVER := 2 # all source are stored in SRCS-y SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c index 6e1debf..2c5511d 100644 --- a/lib/librte_distributor/rte_distributor.c +++ b/lib/librte_distributor/rte_distributor.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d, } return count; } +BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05); int rte_distributor_return_pkt(struct rte_distributor *d, @@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, return 0; } +BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05); / APIs called on distributor core ***/ @@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d, return num_mbufs; } +BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05); /* return to the caller, packets returned from workers */ int @@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d, return retval; } +BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05); /* * Return the number of packets in-flight in a distributor, i.e. packets @@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d) return flushed; } +BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05); /* clears the internal returns array in the distributor */ void @@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d) for (wkr = 0; wkr < d->num_workers; wkr++) d->bufs[wkr].retptr64[0] = 0; } +BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05); /* creates a distributor instance */ struct rte_distributor * @@ -638,3 +645,4 @@ rte_distributor_create(const char *name, return d; } +BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05); diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c index 1f406c5..bb6c5d7 100644 --- a/lib/librte_distributor/rte_distributor_v20.c +++ b/lib/librte_distributor/rte_distributor_v20.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include "rte_distributor_v20.h" @@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d, rte_pause(); buf->bufptr64 = req; } +VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0); struct rte_mbuf * rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d, @@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d, int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS; return (struct rte_mbuf *)((uintptr_t)ret); } +VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0); struct rte_mbuf * rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d, @@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d, rte_pause(); return ret; } +VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0); int rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d, @@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d, buf->bufptr64 = req; return 0; } +VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0); / APIs called on distributor core ***/ @@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d, d->returns.count = ret_count; return num_mbufs; } +VERSION_SYMBOL(rte_distributor_process, _v20, 2.0); /* return to the caller, packets returned from workers */ int @@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d, return retval; } +VER
Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
-Original Message- From: Richardson, Bruce Sent: Friday, 10 March, 2017 4:22 PM To: Hunt, David Cc: dev@dpdk.org Subject: Re: [PATCH v9 09/18] lib: add symbol versioning to distributor On Mon, Mar 06, 2017 at 09:10:24AM +, David Hunt wrote: > Also bumped up the ABI version number in the Makefile > > Signed-off-by: David Hunt > --- > lib/librte_distributor/Makefile| 2 +- > lib/librte_distributor/rte_distributor.c | 57 +++--- > lib/librte_distributor/rte_distributor_v1705.h | 89 > ++ A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn. /Bruce
Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
On 10/3/2017 4:22 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:24AM +, David Hunt wrote: Also bumped up the ABI version number in the Makefile Signed-off-by: David Hunt --- lib/librte_distributor/Makefile| 2 +- lib/librte_distributor/rte_distributor.c | 57 +++--- lib/librte_distributor/rte_distributor_v1705.h | 89 ++ A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn. /Bruce The first introduction of this file is what will become the public header. For successful compilation, this cannot be called rte_distributor.h until the symbol versioning patch, at which stage I will rename the file, and introduce the symbol versioned header at the same time. In the next patch I'll rename this version of the files as rte_distributor_public.h to make this clearer. Regards, Dave.
Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
On 13/3/2017 11:01 AM, Van Haaren, Harry wrote: From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Hunt, David Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor On 10/3/2017 4:22 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:24AM +, David Hunt wrote: Also bumped up the ABI version number in the Makefile Signed-off-by: David Hunt --- lib/librte_distributor/Makefile| 2 +- lib/librte_distributor/rte_distributor.c | 57 +++--- lib/librte_distributor/rte_distributor_v1705.h | 89 ++ A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn. /Bruce The first introduction of this file is what will become the public header. For successful compilation, this cannot be called rte_distributor.h until the symbol versioning patch, at which stage I will rename the file, and introduce the symbol versioned header at the same time. In the next patch I'll rename this version of the files as rte_distributor_public.h to make this clearer. Suggestion to use rte_distributor_next.h instead of public? Public doesn't indicate if its old or new, while next would make that clearer IMO :) Good call, will use "_next". Its clearer. Thanks, Dave.
Re: [dpdk-dev] [PATCH v9 16/18] examples/distributor: give Rx thread a core
On 10/3/2017 4:51 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:31AM +, David Hunt wrote: This so that with the increased amount of stats we are counting, we don't interfere with the rx core. Where are the stats being counted in the current code and how would they interfere? /Bruce Previous version of the distributor example did not print out several lines of statistics every second, which the new version does. I felt that it it would be better to separate this off to it's own core, so that the rx core performance would not be impacted. I'll add an extra comment in the commit message. Regards, Dave.
Re: [dpdk-dev] [PATCH v9 04/18] lib: add new distributor code
On 10/3/2017 4:03 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:19AM +, David Hunt wrote: This patch includes public header file which will be used once we add in the symbol versioning for v20 and v1705 APIs. Also includes v1702 header file, and code for new Now 1705. burst-capable distributor library. This will be re-named as rte_distributor.h later in the patch-set The new distributor code contains a very similar API to the legacy code, but now sends bursts of up to 8 mbufs to each worker. Flow ID's are reduced to 15 bits for an optimal flow matching algorithm. Signed-off-by: David Hunt --- lib/librte_distributor/Makefile | 1 + lib/librte_distributor/rte_distributor.c | 628 +++ lib/librte_distributor/rte_distributor_private.h | 7 +- lib/librte_distributor/rte_distributor_v1705.h | 269 ++ 4 files changed, 904 insertions(+), 1 deletion(-) create mode 100644 lib/librte_distributor/rte_distributor.c create mode 100644 lib/librte_distributor/rte_distributor_v1705.h Minor nit, I think this patch might be squashed into the previous one, to have new structures and code together. /Bruce Done in the next version. Dave.
Re: [dpdk-dev] [PATCH v9 12/18] examples/distributor: allow for extra stats
On 10/3/2017 4:46 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:27AM +, David Hunt wrote: + freq = rte_get_timer_hz(); + t = rte_rdtsc() + freq; + while (!quit_signal_dist) { + if (t < rte_rdtsc()) { + print_stats(); + t = rte_rdtsc() + freq; + } + } + You can probably put in a usleep or nanosleep inot the while loop above. No need to burn an entire core by polling the tsc. /Bruce Done in the next version. Dave.
Re: [dpdk-dev] [PATCH v9 14/18] examples/distributor: give distributor a core
On 10/3/2017 4:49 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:29AM +, David Hunt wrote: Signed-off-by: David Hunt --- Title could do with some rewording - e.g. "make distributor API calls on dedicated core" This also requires an explanation as to why the change is being made. Does it not also need an update to the sample app guide about how the app works? /Bruce Yes, sure. And I'll add some more info into the dist_app.rst file. Rgds, Dave.
Re: [dpdk-dev] [PATCH v9 15/18] examples/distributor: limit number of Tx rings
On 10/3/2017 4:50 PM, Bruce Richardson wrote: On Mon, Mar 06, 2017 at 09:10:30AM +, David Hunt wrote: Signed-off-by: David Hunt --- Please explain reason for change. /Bruce I've re-worked this change, as it's mostly do to with performance, resulting in about 10% gain. The number of Tx rings has been removed, as it was part of an investigation for high core count operation, and is not relevant. Rgds, Dave.
[dpdk-dev] [PATCH v3 00/15] Introduce SoC device/driver framework for EAL
Shreyansh, Jan, Hemant, On 9/9/2016 9:43 AM, Shreyansh Jain wrote: > Introduction: > = > > This patch set is direct derivative of Jan's original series [1],[2]. > > - As this deviates substantially from original series, if need be I can > post it as a separate patch rather than v2. Please suggest. > - Also, there are comments on original v1 ([4]) which are _not_ > incorporated in this series as they refer to section no more in new > version. > - This v3 version is based on the rte_driver/device patchset v9 [10]. > That series introduced device structures (rte_driver/rte_device) > generalizing devices into PCI, VDEV, XXX. For the purpose of this > patchset, XXX=>SOC. > > ---snip--- FYI, I've reviewed this patch set, and it looks to me like there's some very good work here. Each patch in the set builds nicely on the one before, and logically introduces the changes one by one. I've no functional suggestions as the implementation looks clean, but I've one or two tiny suggestions on headers and error messages. I'll add a reply to the relevant patches in the set. Also, there's one or two issues thrown up by checkpatch, but I suspect they're false positives, as I'm using the 4.6 kernel version. Regards, Dave
[dpdk-dev] [PATCH v3 01/15] eal/soc: introduce very essential SoC infra definitions
Some small comments below: On 9/9/2016 9:43 AM, Shreyansh Jain wrote: > Define initial structures and functions for the SoC infrastructure. > This patch supports only a very minimal functions for now. > More features will be added in the following commits. > > Includes rte_device/rte_driver inheritance of > rte_soc_device/rte_soc_driver. > > Signed-off-by: Jan Viktorin > Signed-off-by: Shreyansh Jain > Signed-off-by: Hemant Agrawal > --- > app/test/Makefile | 1 + > app/test/test_soc.c | 90 + > lib/librte_eal/common/Makefile | 2 +- > lib/librte_eal/common/eal_private.h | 4 + > lib/librte_eal/common/include/rte_soc.h | 138 > > 5 files changed, 234 insertions(+), 1 deletion(-) > create mode 100644 app/test/test_soc.c > create mode 100644 lib/librte_eal/common/include/rte_soc.h > > diff --git a/app/test/Makefile b/app/test/Makefile > index 611d77a..64b261d 100644 > --- a/app/test/Makefile > +++ b/app/test/Makefile > @@ -77,6 +77,7 @@ APP = test > # > SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := commands.c > SRCS-y += test.c > +SRCS-y += test_soc.c > SRCS-y += resource.c > SRCS-y += test_resource.c > test_resource.res: test_resource.c > diff --git a/app/test/test_soc.c b/app/test/test_soc.c > new file mode 100644 > index 000..916a863 > --- /dev/null > +++ b/app/test/test_soc.c > @@ -0,0 +1,90 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2016 RehiveTech. All rights reserved. > + * All rights reserved. Remove un-needed "All rights reserved" > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of RehiveTech nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include "test.h" > + > +static char *safe_strdup(const char *s) > +{ > + char *c = strdup(s); > + > + if (c == NULL) > + rte_panic("failed to strdup '%s'\n", s); > + > + return c; > +} > + > +static int test_compare_addr(void) > +{ > + struct rte_soc_addr a0; > + struct rte_soc_addr a1; > + struct rte_soc_addr a2; > + > + a0.name = safe_strdup("ethernet0"); > + a0.fdt_path = NULL; > + > + a1.name = safe_strdup("ethernet0"); > + a1.fdt_path = NULL; > + > + a2.name = safe_strdup("ethernet1"); > + a2.fdt_path = NULL; > + > + TEST_ASSERT(!rte_eal_compare_soc_addr(&a0, &a1), > + "Failed to compare two soc addresses that equal"); > + TEST_ASSERT(rte_eal_compare_soc_addr(&a0, &a2), > + "Failed to compare two soc addresses that differs"); > + > + free(a2.name); > + free(a1.name); > + free(a0.name); > + return 0; > +} > + > +static int > +test_soc(void) > +{ > + if (test_compare_addr()) > + return -1; > + > + return 0; > +} > + > +REGISTER_TEST_COMMAND(soc_autotest, test_soc); > diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile > index dfd64aa..b414008 100644 > --- a/lib/librte_eal/common/Makefile > +++ b/lib/librte_eal/common/Makefile > @@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk > > INC := rte_branch_prediction.h rte_common.h > INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h > -INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h > +INC += rte_log.h rte_memory.h rte_memzone.h rte_soc.h rte_pci.h > INC += rte_p
[dpdk-dev] [PATCH v3 02/15] eal/soc: add rte_eal_soc_register/unregister logic
On 9/9/2016 9:43 AM, Shreyansh Jain wrote: > Registeration of a SoC driver through a helper DRIVER_REGISTER_SOC > (on the lines of DRIVER_REGISTER_PCI). soc_driver_list stores all the > registered drivers. > > Test case has been introduced to verify the registration and > deregistration. > > Signed-off-by: Jan Viktorin > Signed-off-by: Shreyansh Jain > Signed-off-by: Hemant Agrawal > --- > app/test/test_soc.c | 111 > > lib/librte_eal/bsdapp/eal/rte_eal_version.map | 4 + > lib/librte_eal/common/eal_common_soc.c | 56 > lib/librte_eal/common/include/rte_soc.h | 26 ++ > lib/librte_eal/linuxapp/eal/Makefile| 1 + > lib/librte_eal/linuxapp/eal/rte_eal_version.map | 3 + > 6 files changed, 201 insertions(+) > create mode 100644 lib/librte_eal/common/eal_common_soc.c > > diff --git a/app/test/test_soc.c b/app/test/test_soc.c > index 916a863..ac03e64 100644 > --- a/app/test/test_soc.c > +++ b/app/test/test_soc.c > @@ -75,6 +75,108 @@ static int test_compare_addr(void) > free(a2.name); > free(a1.name); > free(a0.name); > + > + return 0; > +} > + > +/** > + * Empty PMD driver based on the SoC infra. > + * > + * The rte_soc_device is usually wrapped in some higher-level struct > + * (eth_driver). We simulate such a wrapper with an anonymous struct here. > + */ > +struct test_wrapper { > + struct rte_soc_driver soc_drv; > +}; > + > +struct test_wrapper empty_pmd0 = { > + .soc_drv = { > + .driver = { > + .name = "empty_pmd0" > + }, > + }, > +}; > + > +struct test_wrapper empty_pmd1 = { > + .soc_drv = { > + .driver = { > + .name = "empty_pmd1" > + }, > + }, > +}; > + > +static int > +count_registered_socdrvs(void) > +{ > + int i; > + struct rte_soc_driver *drv; > + > + i = 0; > + TAILQ_FOREACH(drv, &soc_driver_list, next) > + i += 1; > + > + return i; > +} > + > +static int > +test_register_unregister(void) > +{ > + struct rte_soc_driver *drv; > + int count; > + > + rte_eal_soc_register(&empty_pmd0.soc_drv); > + > + TEST_ASSERT(!TAILQ_EMPTY(&soc_driver_list), > + "No PMD is present but the empty_pmd0 should be there"); > + drv = TAILQ_FIRST(&soc_driver_list); > + TEST_ASSERT(!strcmp(drv->driver.name, "empty_pmd0"), > + "The registered PMD is not empty_pmd0 but '%s'", > + drv->driver.name); > + > + rte_eal_soc_register(&empty_pmd1.soc_drv); > + > + count = count_registered_socdrvs(); > + TEST_ASSERT_EQUAL(count, 2, "Expected 2 PMDs but detected %d", count); > + > + rte_eal_soc_unregister(&empty_pmd0.soc_drv); > + count = count_registered_socdrvs(); > + TEST_ASSERT_EQUAL(count, 1, "Expected 1 PMDs but detected %d", count); > + > + rte_eal_soc_unregister(&empty_pmd1.soc_drv); > + > + printf("%s has been successful\n", __func__); > + return 0; > +} > + > +/* save real devices and drivers until the tests finishes */ > +struct soc_driver_list real_soc_driver_list = > + TAILQ_HEAD_INITIALIZER(real_soc_driver_list); > + > +static int test_soc_setup(void) > +{ > + struct rte_soc_driver *drv; > + > + /* no real drivers for the test */ > + while (!TAILQ_EMPTY(&soc_driver_list)) { > + drv = TAILQ_FIRST(&soc_driver_list); > + rte_eal_soc_unregister(drv); > + TAILQ_INSERT_TAIL(&real_soc_driver_list, drv, next); > + } > + > + return 0; > +} > + > +static int test_soc_cleanup(void) > +{ > + struct rte_soc_driver *drv; > + > + /* bring back real drivers after the test */ > + while (!TAILQ_EMPTY(&real_soc_driver_list)) { > + drv = TAILQ_FIRST(&real_soc_driver_list); > + TAILQ_REMOVE(&real_soc_driver_list, drv, next); > + rte_eal_soc_register(drv); > + } > + > return 0; > } > > @@ -84,6 +186,15 @@ test_soc(void) > if (test_compare_addr()) > return -1; > > + if (test_soc_setup()) > + return -1; > + > + if (test_register_unregister()) > + return -1; > + > + if (test_soc_cleanup()) > + return -1; > + > return 0; > } > > diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > index 7b3d409..cda8009 100644 > --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map > +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map > @@ -168,4 +168,8 @@ DPDK_16.11 { > > rte_eal_dev_attach; > rte_eal_dev_detach; > + soc_driver_list; > + rte_eal_soc_register; > + rte_eal_soc_unregister; > + > } DPDK_16.07; > diff --git a/lib/librte_eal/common/eal_common_soc.c > b/lib/librte_eal/common/eal_common_soc.c > new file mode 100644 > index 000..56135ed > --- /dev/null > +++ b/lib/librte_eal/common
[dpdk-dev] [PATCH v3 12/15] ether: extract function eth_dev_get_intr_handle
On 9/9/2016 9:43 AM, Shreyansh Jain wrote: > We abstract access to the intr_handle here as we want to get > it either from the pci_dev or soc_dev. > > Signed-off-by: Jan Viktorin > Signed-off-by: Shreyansh Jain > Signed-off-by: Hemant Agrawal > --- > lib/librte_ether/rte_ethdev.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c > index e9f5467..104ea4a 100644 > --- a/lib/librte_ether/rte_ethdev.c > +++ b/lib/librte_ether/rte_ethdev.c > @@ -2526,6 +2526,17 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev, > rte_spinlock_unlock(&rte_eth_dev_cb_lock); > } > > +static inline > +struct rte_intr_handle *eth_dev_get_intr_handle(struct rte_eth_dev *dev) > +{ > + if (dev->pci_dev) { > + return &dev->pci_dev->intr_handle; > + } > + > + RTE_VERIFY(0); Rather than RTE_VERIFY(0), might I suggest using rte_panic with a more relevant error message? > + return NULL; > +} > + > int > rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int op, void *data) > { > @@ -2538,7 +2549,7 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int > op, void *data) > RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); > > dev = &rte_eth_devices[port_id]; > - intr_handle = &dev->pci_dev->intr_handle; > + intr_handle = eth_dev_get_intr_handle(dev); > if (!intr_handle->intr_vec) { > RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n"); > return -EPERM; > @@ -2598,7 +2609,7 @@ rte_eth_dev_rx_intr_ctl_q(uint8_t port_id, uint16_t > queue_id, > return -EINVAL; > } > > - intr_handle = &dev->pci_dev->intr_handle; > + intr_handle = eth_dev_get_intr_handle(dev); > if (!intr_handle->intr_vec) { > RTE_PMD_DEBUG_TRACE("RX Intr vector unset\n"); > return -EPERM;
[dpdk-dev] [PATCH v3 13/15] ether: extract function eth_dev_get_driver_name
On 9/9/2016 9:43 AM, Shreyansh Jain wrote: > Signed-off-by: Jan Viktorin > Signed-off-by: Shreyansh Jain > Signed-off-by: Hemant Agrawal > --- > lib/librte_ether/rte_ethdev.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c > index 104ea4a..4fa65ca 100644 > --- a/lib/librte_ether/rte_ethdev.c > +++ b/lib/librte_ether/rte_ethdev.c > @@ -2568,6 +2568,17 @@ rte_eth_dev_rx_intr_ctl(uint8_t port_id, int epfd, int > op, void *data) > return 0; > } > > +static inline > +const char *eth_dev_get_driver_name(const struct rte_eth_dev *dev) > +{ > + if (dev->pci_dev) { > + return dev->driver->pci_drv.driver.name; > + } > + > + RTE_VERIFY(0); Same comment as last patch, maybe add an rte_panic with more descriptive error message. > + return NULL; > +} > + > const struct rte_memzone * > rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char > *ring_name, >uint16_t queue_id, size_t size, unsigned align, > @@ -2575,9 +2586,11 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev > *dev, const char *ring_name, > { > char z_name[RTE_MEMZONE_NAMESIZE]; > const struct rte_memzone *mz; > + const char *drv_name; > > + drv_name = eth_dev_get_driver_name(dev); > snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d", > - dev->driver->pci_drv.driver.name, ring_name, > + drv_name, ring_name, >dev->data->port_id, queue_id); > > mz = rte_memzone_lookup(z_name);
[dpdk-dev] [PATCH v2 2/2] mempool:pktmbuf pool default fallback for mempool ops error
Hi Hemant, On 15/9/2016 6:13 PM, Hemant Agrawal wrote: > In the rte_pktmbuf_pool_create, if the default external mempool is > not available, the implementation can default to "ring_mp_mc", which > is an software implementation. > > Signed-off-by: Hemant Agrawal > --- > lib/librte_mbuf/rte_mbuf.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c > index 4846b89..4adb4f5 100644 > --- a/lib/librte_mbuf/rte_mbuf.c > +++ b/lib/librte_mbuf/rte_mbuf.c > @@ -176,6 +176,11 @@ rte_pktmbuf_pool_create(const char *name, unsigned n, > > rte_errno = rte_mempool_set_ops_byname(mp, > RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL); > + > + /* on error, try falling back to the software based default pool */ > + if (rte_errno == -EOPNOTSUPP) > + rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL); Should we log a warning message here saying that we're falling back to the mp/mc handler? > + > if (rte_errno != 0) { > RTE_LOG(ERR, MBUF, "error setting mempool handler\n"); > return NULL;
[dpdk-dev] [PATCH v2 1/2] eal/mempool: introduce check for external mempool availability
Hi Hemant, On 15/9/2016 6:13 PM, Hemant Agrawal wrote: > External offloaded mempool may not be available always. This check enables > run time verification of the presence of external mempool before the > mempool ops are installed. > > An application built with a specific external mempool may work fine > on host. But, may not work on VM, specificaly if non-hw specific interfaces > are used e.g. virtio-net. > > This is required to make sure that same application can work in all > environment for a given hw platform. > > The solution proposed here is that rte_mempool_set_ops_byname should return > specific error in case the current execution environment is not supporting > the given mempool. Thereby allowing it to fallback on software mempool > implementation e.g. "ring_mp_mc" > > This patch introduces new optional "pool_verify" as mempool ops function > to check if external mempool instance is available or not. I've a small suggestion around the naming of the new functions. It seems to me that this patch is more about mempool handler 'support' than 'verification'. Could I suggest a different name for this new function? I think maybe "supported" may be more appropriate than "pool_verify". The return code that's returned when a mempool handler is not available is -EOPNOTSUPP, which is what suggested the word "supported" to me. I think that: if (ops->supported) return ops->supported(mp); makes for slightly easier reading. The wrapper function would logically then be called rte_mempool_ops_supported(). > Signed-off-by: Hemant Agrawal > --- > Changes in v2: > * fixed the pool_verify copy in rte_mempool_register_ops > --- > lib/librte_mempool/rte_mempool.h | 21 + > lib/librte_mempool/rte_mempool_ops.c | 19 +++ > lib/librte_mempool/rte_mempool_ring.c | 4 > lib/librte_mempool/rte_mempool_stack.c | 3 ++- > 4 files changed, 46 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_mempool/rte_mempool.h > b/lib/librte_mempool/rte_mempool.h > index 0243f9e..8599df1 100644 > --- a/lib/librte_mempool/rte_mempool.h > +++ b/lib/librte_mempool/rte_mempool.h > @@ -387,6 +387,12 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool > *mp, >*/ > typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp); > > +/** > + * Return if the given external mempool is available for this instance. > + * it is optional to implement for mempools > + */ > +typedef int (*rte_mempool_pool_verify)(const struct rte_mempool *mp); > + > /** Structure defining mempool operations structure */ > struct rte_mempool_ops { > char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */ > @@ -395,6 +401,8 @@ struct rte_mempool_ops { > rte_mempool_enqueue_t enqueue; /**< Enqueue an object. */ > rte_mempool_dequeue_t dequeue; /**< Dequeue an object. */ > rte_mempool_get_count get_count; /**< Get qty of available objs. */ > + rte_mempool_pool_verify pool_verify; > + /**< Verify if external mempool is available for usages*/ > } __rte_cache_aligned; > > #define RTE_MEMPOOL_MAX_OPS_IDX 16 /**< Max registered ops structs */ > @@ -516,6 +524,18 @@ void > rte_mempool_ops_free(struct rte_mempool *mp); > > /** > + * @internal wrapper to verify the external mempool availability > + * > + * @param mp > + * Pointer to the memory pool. > + * @return > + * 0: Success; external mempool instance is available > + * - <0: Error; external mempool instance is not available > + */ > +int > +rte_mempool_ops_pool_verify(const struct rte_mempool *mp); > + > +/** >* Set the ops of a mempool. >* >* This can only be done on a mempool that is not populated, i.e. just after > @@ -531,6 +551,7 @@ rte_mempool_ops_free(struct rte_mempool *mp); >* - 0: Success; the mempool is now using the requested ops functions. >* - -EINVAL - Invalid ops struct name provided. >* - -EEXIST - mempool already has an ops struct assigned. > + * - -EOPNOTSUPP - mempool instance not available. >*/ > int > rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name, > diff --git a/lib/librte_mempool/rte_mempool_ops.c > b/lib/librte_mempool/rte_mempool_ops.c > index 5f24de2..c2e765f 100644 > --- a/lib/librte_mempool/rte_mempool_ops.c > +++ b/lib/librte_mempool/rte_mempool_ops.c > @@ -85,6 +85,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h) > ops->enqueue = h->enqueue; > ops->dequeue = h->dequeue; > ops->get_count = h->get_count; > + ops->pool_verify = h->pool_verify; > > rte_spinlock_unlock(&rte_mempool_ops_table.sl); > > @@ -123,6 +124,18 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp) > return ops->get_count(mp); > } > > +/* wrapper to check if given external mempool is available for this > instance.*/ > +int > +rte_mempool_ops_pool_verify(const struct rte_mempool *mp) > +{ > + struct rt
Re: [dpdk-dev] [PATCH v2 1/2] mk: allow use of environment var for make config
Shreyansh, I found an issue (or two) with this part of the patch, and have a proposed solution. 1. RTE_TARGET originally had a different meaning. It was used for making examples, specifying the target directory of where the SDK was built. It's not good to re-purpose this for something else, as I'm doing in this patch. (even though I'm not sure that variable is suitably named in the first place, but that's a different issue). 2. If we set RTE_TARGET on the environment, we will break the 'make -C examples/', unless we set RTE_TARGET to be something else (i.e. 'make -C examples/ RTE_TARGET=build'). One value for making DPDK, and another for building examples. It's confusing to the user. An alternative patch would be as follows: RTE_CONFIG_TEMPLATE := ifdef T *-ifeq ("$(origin T)", "command line")* RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(T) *-endif** *endif export RTE_CONFIG_TEMPLATE So instead of setting 'RTE_TARGET' on in the environment, we set 'T' instead. This allows 'T' to come from the command line OR an environment variable. It resolves the 'make examples' issue, and everything else works as it did before, 'make install', etc. It seems to me to be a cleaner solution for this. What do you think? If it's OK with you, I'll submit a v3 (the 'make defconfig' part of the patchset will remain the same as v2). Rgds, Dave. On 26/5/2017 9:52 AM, David Hunt wrote: Users can now set RTE_TARGET in their environment and use 'make config' without T=template. If RTE_TARGET is set in the user's environment, and if T= is not used, 'make config' will use $RTE_TARGET. Signed-off-by: David Hunt --- mk/rte.sdkroot.mk | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk index 2843b7d..9bdaf20 100644 --- a/mk/rte.sdkroot.mk +++ b/mk/rte.sdkroot.mk @@ -63,6 +63,8 @@ ifdef T ifeq ("$(origin T)", "command line") RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(T) endif +else ifdef RTE_TARGET +RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(RTE_TARGET) endif export RTE_CONFIG_TEMPLATE
Re: [dpdk-dev] [PATCH v2 1/2] mk: allow use of environment var for make config
Hi Shreyansh, On 7/6/2017 10:36 AM, Shreyansh Jain wrote: Hello David, On Wednesday 07 June 2017 02:09 PM, Hunt, David wrote: Shreyansh, I found an issue (or two) with this part of the patch, and have a proposed solution. 1. RTE_TARGET originally had a different meaning. It was used for making examples, specifying the target directory of where the SDK was built. It's not good to re-purpose this for something else, as I'm doing in this patch. (even though I'm not sure that variable is suitably named in the first place, but that's a different issue). Even I didn't realize this until you highlighted here. 2. If we set RTE_TARGET on the environment, we will break the 'make -C examples/', unless we set RTE_TARGET to be something else (i.e. 'make -C examples/ RTE_TARGET=build'). One value for making DPDK, and another for building examples. It's confusing to the user. Agree about re-using RTE_TARGET is breaking existing assumption about its use. An alternative patch would be as follows: RTE_CONFIG_TEMPLATE := ifdef T *-ifeq ("$(origin T)", "command line")* RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(T) *-endif** *endif export RTE_CONFIG_TEMPLATE So, that would mean, user would do either of the following: make T= config or export T= make config Is that correct? (I tried it and it seems to be working fine) First method is same as today. For the second, I am just skeptical whether we should use such a small identifier ("T") or we have a new RTE_TEMPLATE. Either way, I am OK. [export T=] looks fine to me - in fact, on a second though, IMO, if T= is provided as command line, it should also be acceptable as env variable. I did a quick poll here in the office and people feel that 'T' is too short for an environment variable. RTE_TEMPLATE would be preferred, and it's a sensible choice that does not conflict with RTE_TARGET. So if we use RTE_TEMPLATE, we'd also have to put in a couple of lines for the "make install" case, but it's still a small enough patch: diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk index dbac2a2..a464b01 100644 --- a/mk/rte.sdkinstall.mk +++ b/mk/rte.sdkinstall.mk @@ -47,6 +47,10 @@ ifneq ($(MAKECMDGOALS),pre_install) include $(RTE_SDK)/mk/rte.vars.mk endif *+ifndef T** **+T := $(RTE_TEMPLATE)** **+endif** * ifdef T # defaults with T= will install an almost flat staging tree export prefix ?= kerneldir ?= $(prefix)/kmod diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk index 076a2d7..0b71a4e 100644 --- a/mk/rte.sdkroot.mk +++ b/mk/rte.sdkroot.mk @@ -63,6 +63,8 @@ ifdef T ifeq ("$(origin T)", "command line") RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(T) endif *+else** **+RTE_CONFIG_TEMPLATE := $(RTE_SRCDIR)/config/defconfig_$(RTE_TEMPLATE)** * endif export RTE_CONFIG_TEMPLATE So if T is provided on the command line, it takes priority. If that seems reasonable to you, I'll push up a v3. :) Regards, Dave.
Re: [dpdk-dev] [PATCH 1/3] examples/eventdev_pipeline: added sample app
Hi Jerin, I'm assisting Harry on the sample app, and have just pushed up a V2 patch based on your feedback. I've addressed most of your suggestions, comments below. There may still a couple of outstanding questions that need further discussion. Regards, Dave On 10/5/2017 3:12 PM, Jerin Jacob wrote: > -Original Message- >> Date: Fri, 21 Apr 2017 10:51:37 +0100 >> From: Harry van Haaren >> To: dev@dpdk.org >> CC: jerin.ja...@caviumnetworks.com, Harry van Haaren >> , Gage Eads , Bruce >> Richardson >> Subject: [PATCH 1/3] examples/eventdev_pipeline: added sample app >> X-Mailer: git-send-email 2.7.4 >> >> This commit adds a sample app for the eventdev library. >> The app has been tested with DPDK 17.05-rc2, hence this >> release (or later) is recommended. >> >> The sample app showcases a pipeline processing use-case, >> with event scheduling and processing defined per stage. >> The application recieves traffic as normal, with each >> packet traversing the pipeline. Once the packet has >> been processed by each of the pipeline stages, it is >> transmitted again. >> >> The app provides a framework to utilize cores for a single >> role or multiple roles. Examples of roles are the RX core, >> TX core, Scheduling core (in the case of the event/sw PMD), >> and worker cores. >> >> Various flags are available to configure numbers of stages, >> cycles of work at each stage, type of scheduling, number of >> worker cores, queue depths etc. For a full explaination, >> please refer to the documentation. >> >> Signed-off-by: Gage Eads >> Signed-off-by: Bruce Richardson >> Signed-off-by: Harry van Haaren > > Thanks for the example application to share the SW view. > I could make it run on HW after some tweaking(not optimized though) > > [...] >> +#define MAX_NUM_STAGES 8 >> +#define BATCH_SIZE 16 >> +#define MAX_NUM_CORE 64 > > How about RTE_MAX_LCORE? Core usage in the sample app is held in a uint64_t. Adding arrays would be possible, but I feel that the extra effort would not give that much benefit. I've left as is for the moment, unless you see any strong requirement to go beyond 64 cores? > >> + >> +static unsigned int active_cores; >> +static unsigned int num_workers; >> +static unsigned long num_packets = (1L << 25); /* do ~32M packets */ >> +static unsigned int num_fids = 512; >> +static unsigned int num_priorities = 1; > > looks like its not used. Yes, Removed. > >> +static unsigned int num_stages = 1; >> +static unsigned int worker_cq_depth = 16; >> +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY; >> +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1}; >> +static int16_t qid[MAX_NUM_STAGES] = {-1}; > > Moving all fastpath related variables under a structure with cache > aligned will help. I tried a few different combinations of this, and saw no gains, some losses. So will leave as is for the moment, if that's OK. > >> +static int worker_cycles; >> +static int enable_queue_priorities; >> + >> +struct prod_data { >> +uint8_t dev_id; >> +uint8_t port_id; >> +int32_t qid; >> +unsigned num_nic_ports; >> +}; Yes, saw a percent or two gain when this plus following two data structs cache aligned. > > cache aligned ? > >> + >> +struct cons_data { >> +uint8_t dev_id; >> +uint8_t port_id; >> +}; >> + > > cache aligned ? Yes, see comment above > >> +static struct prod_data prod_data; >> +static struct cons_data cons_data; >> + >> +struct worker_data { >> +uint8_t dev_id; >> +uint8_t port_id; >> +}; > > cache aligned ? Yes, see comment above > >> + >> +static unsigned *enqueue_cnt; >> +static unsigned *dequeue_cnt; >> + >> +static volatile int done; >> +static volatile int prod_stop; > > No one updating the prod_stop. Old var, removed. > >> +static int quiet; >> +static int dump_dev; >> +static int dump_dev_signal; >> + >> +static uint32_t rx_lock; >> +static uint32_t tx_lock; >> +static uint32_t sched_lock; >> +static bool rx_single; >> +static bool tx_single; >> +static bool sched_single; >> + >> +static unsigned rx_core[MAX_NUM_CORE]; >> +static unsigned tx_core[MAX_NUM_CORE]; >> +static unsigned sched_core[MAX_NUM_CORE]; >> +static unsigned worker_core[MAX_NUM_CORE]; >> + >> +static bool >> +core_in_use(unsigned lcore_id) { >> +return (rx_core[lcore_id] || sched_core[lcore_id] || >> +tx_core[lcore_id] || worker_core[lcore_id]); >> +} >> + >> +static struct rte_eth_dev_tx_buffer *tx_buf[RTE_MAX_ETHPORTS]; >> + >> +static void >> +rte_eth_tx_buffer_retry(struct rte_mbuf **pkts, uint16_t unsent, >> +void *userdata) > > IMO, It is better to not use rte_eth_* for application functions. Sure, removed the 'rte_' part of the function name. > >> +{ >> +int port_id = (uintptr_t) userdata; >> +unsigned _sent = 0; >> + >> +do { >> +/* Note: hard-coded TX queue */ >> +_sent += rte_eth_tx_burst(port_id, 0, &pkts[_sent], >> + unsent - _sent); >> +} while (_sent != unsent); >> +}
Re: [dpdk-dev] [PATCH 1/3] examples/eventdev_pipeline: added sample app
Hi Jerin: On 27/6/2017 10:35 AM, Jerin Jacob wrote: -Original Message- Date: Mon, 26 Jun 2017 15:46:47 +0100 From: "Hunt, David" To: Jerin Jacob , Harry van Haaren CC: dev@dpdk.org, Gage Eads , Bruce Richardson Subject: Re: [dpdk-dev] [PATCH 1/3] examples/eventdev_pipeline: added sample app User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 Hi Jerin, Hi David, Looks like you have sent the old version. The below mentioned comments are not addressed in v2. Oops. Glitch in the Matrix. I've just pushed a V3 with the changes. I'm assisting Harry on the sample app, and have just pushed up a V2 patch based on your feedback. I've addressed most of your suggestions, comments below. There may still a couple of outstanding questions that need further discussion. A few general comments: 1) Nikhil/Gage's proposal on ethdev rx to eventdev adapter will change the major portion(eventdev setup and producer()) of this application 2) Producing one lcore worth of packets, really cant show as example eventdev application as it will be pretty bad in low-end machine. At least application infrastructure should not limit. Considering above points, Should we wait for rx adapter to complete first? I would like to show this as real world application to use eventdev. Thoughts? On the same note: Can we finalize on rx adapter proposal? I can work on v1 of patch and common code if Nikhil or Gage don't have bandwidth. Let me know? last followup: http://dpdk.org/ml/archives/dev/2017-June/068776.html I had a quick chat with Harry, and wonder if we'd be as well to merge the app as it is now, and as the new frameworks become available, the app can be updated to make use of them? I feel it would be better to have something out there for people to play with than waiting for 17.11. Also, if you have bandwidth to patch the app for your desired use cases, that would be a good contribution. I'd only be guessing for some of it :) Regards, Dave On 10/5/2017 3:12 PM, Jerin Jacob wrote: -Original Message- Date: Fri, 21 Apr 2017 10:51:37 +0100 From: Harry van Haaren To: dev@dpdk.org CC: jerin.ja...@caviumnetworks.com, Harry van Haaren , Gage Eads , Bruce Richardson Subject: [PATCH 1/3] examples/eventdev_pipeline: added sample app X-Mailer: git-send-email 2.7.4 This commit adds a sample app for the eventdev library. The app has been tested with DPDK 17.05-rc2, hence this release (or later) is recommended. The sample app showcases a pipeline processing use-case, with event scheduling and processing defined per stage. The application recieves traffic as normal, with each packet traversing the pipeline. Once the packet has been processed by each of the pipeline stages, it is transmitted again. The app provides a framework to utilize cores for a single role or multiple roles. Examples of roles are the RX core, TX core, Scheduling core (in the case of the event/sw PMD), and worker cores. Various flags are available to configure numbers of stages, cycles of work at each stage, type of scheduling, number of worker cores, queue depths etc. For a full explaination, please refer to the documentation. Signed-off-by: Gage Eads Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren Thanks for the example application to share the SW view. I could make it run on HW after some tweaking(not optimized though) [...] +#define MAX_NUM_STAGES 8 +#define BATCH_SIZE 16 +#define MAX_NUM_CORE 64 How about RTE_MAX_LCORE? Core usage in the sample app is held in a uint64_t. Adding arrays would be possible, but I feel that the extra effort would not give that much benefit. I've left as is for the moment, unless you see any strong requirement to go beyond 64 cores? I think, it is OK. Again with service core infrastructure this will change. + +static unsigned int active_cores; +static unsigned int num_workers; +static unsigned long num_packets = (1L << 25); /* do ~32M packets */ +static unsigned int num_fids = 512; +static unsigned int num_priorities = 1; looks like its not used. Yes, Removed. +static unsigned int num_stages = 1; +static unsigned int worker_cq_depth = 16; +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY; +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1}; +static int16_t qid[MAX_NUM_STAGES] = {-1}; Moving all fastpath related variables under a structure with cache aligned will help. I tried a few different combinations of this, and saw no gains, some losses. So will leave as is for the moment, if that's OK. I think, the one are using in fastpath better to allocate from huge page using rte_malloc() +static int worker_cycles; +static int enable_queue_priorities; + +struct prod_data { +uint8_t dev_id; +uint8_t port_id; +int32_t qid; +unsigned num_nic_ports; +}; Yes, saw a percent or two gain when this plus following two data structs cache aligned. lo