RE: [PATCH v13 1/3] power: introduce PM QoS API on CPU wide

Tummala, Sivaprasad Fri, 25 Oct 2024 05:08:42 -0700

[AMD Official Use Only - AMD Internal Distribution Only]

Hi Huisong,


LGTM! One comment to update the doxygen documentation for the new APIs.

> -----Original Message-----
> From: Huisong Li <[email protected]>
> Sent: Friday, October 25, 2024 2:49 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; Yigit, Ferruh
> <[email protected]>; [email protected]; [email protected];
> Tummala, Sivaprasad <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [PATCH v13 1/3] power: introduce PM QoS API on CPU wide
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> The deeper the idle state, the lower the power consumption, but the longer the
> resume time. Some service are delay sensitive and very except the low resume
> time, like interrupt packet receiving mode.
>
> And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> interface is used to set and get the resume latency limit on the cpuX for 
> userspace.
> Each cpuidle governor in Linux select which idle state to enter based on this 
> CPU
> resume latency in their idle task.
>
> The per-CPU PM QoS API can be used to control this CPU's idle state selection
> and limit just enter the shallowest idle state to low the delay when wake up 
> from by
> setting strict resume latency (zero value).
>
> Signed-off-by: Huisong Li <[email protected]>
> Acked-by: Morten Brørup <[email protected]>
> Acked-by: Chengwen Feng <[email protected]>
> Acked-by: Konstantin Ananyev <[email protected]>
> ---
>  doc/guides/prog_guide/power_man.rst    |  19 ++++
>  doc/guides/rel_notes/release_24_11.rst |   5 +
>  lib/power/meson.build                  |   2 +
>  lib/power/rte_power_qos.c              | 123 +++++++++++++++++++++++++
>  lib/power/rte_power_qos.h              |  73 +++++++++++++++
>  lib/power/version.map                  |   4 +
>  6 files changed, 226 insertions(+)
>  create mode 100644 lib/power/rte_power_qos.c  create mode 100644
> lib/power/rte_power_qos.h
>
> diff --git a/doc/guides/prog_guide/power_man.rst
> b/doc/guides/prog_guide/power_man.rst
> index f6674efe2d..91358b04f3 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -107,6 +107,25 @@ User Cases
>  The power management mechanism is used to save power when performing L3
> forwarding.
>
>
> +PM QoS
> +------
> +
> +The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
> +interface is used to set and get the resume latency limit on the cpuX
> +for userspace. Each cpuidle governor in Linux select which idle state
> +to enter based on this CPU resume latency in their idle task.
> +
> +The deeper the idle state, the lower the power consumption, but the
> +longer the resume time. Some service are latency sensitive and very
> +except the low resume time, like interrupt packet receiving mode.
> +
> +Applications can set and get the CPU resume latency by the
> +``rte_power_qos_set_cpu_resume_latency()`` and
> +``rte_power_qos_get_cpu_resume_latency()``
> +respectively. Applications can set a strict resume latency (zero value)
> +by the ``rte_power_qos_set_cpu_resume_latency()`` to low the resume
> +latency and get better performance (instead, the power consumption of 
> platform
> may increase).
> +
> +
>  Ethernet PMD Power Management API
>  ---------------------------------
>
> diff --git a/doc/guides/rel_notes/release_24_11.rst
> b/doc/guides/rel_notes/release_24_11.rst
> index fa4822d928..d9e268274b 100644
> --- a/doc/guides/rel_notes/release_24_11.rst
> +++ b/doc/guides/rel_notes/release_24_11.rst
> @@ -237,6 +237,11 @@ New Features
>    This field is used to pass an extra configuration settings such as ability
>    to lookup IPv4 addresses in network byte order.
>
> +* **Introduce per-CPU PM QoS interface.**
> +
> +  * Add per-CPU PM QoS interface to low the resume latency when wake up from
> +    idle state.
> +
>  * **Added new API to register telemetry endpoint callbacks with private
> arguments.**
>
>    A new ``rte_telemetry_register_cmd_arg`` function is available to pass an 
> opaque
> value to diff --git a/lib/power/meson.build b/lib/power/meson.build index
> 2f0f3d26e9..9b5d3e8315 100644
> --- a/lib/power/meson.build
> +++ b/lib/power/meson.build
> @@ -23,12 +23,14 @@ sources = files(
>          'rte_power.c',
>          'rte_power_uncore.c',
>          'rte_power_pmd_mgmt.c',
> +       'rte_power_qos.c',
>  )
>  headers = files(
>          'rte_power.h',
>          'rte_power_guest_channel.h',
>          'rte_power_pmd_mgmt.h',
>          'rte_power_uncore.h',
> +       'rte_power_qos.h',
>  )
>
>  deps += ['timer', 'ethdev']
> diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c new file 
> mode
> 100644 index 0000000000..4dd0532b36
> --- /dev/null
> +++ b/lib/power/rte_power_qos.c
> @@ -0,0 +1,123 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#include <errno.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <rte_lcore.h>
> +#include <rte_log.h>
> +
> +#include "power_common.h"
> +#include "rte_power_qos.h"
> +
> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US       \
> +       "/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
> +
> +#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN      32
> +
> +int
> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency) {
> +       char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
> +       uint32_t cpu_id;
> +       FILE *f;
> +       int ret;
> +
> +       if (!rte_lcore_is_enabled(lcore_id)) {
> +               POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
> +               return -EINVAL;
> +       }
> +       ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
> +       if (ret != 0)
> +               return ret;
> +
> +       if (latency < 0) {
> +               POWER_LOG(ERR, "latency should be greater than and equal to 
> 0");
> +               return -EINVAL;
> +       }
> +
> +       ret = open_core_sysfs_file(&f, "w",
> PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
> +       if (ret != 0) {
> +               POWER_LOG(ERR, "Failed to open
> "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
> +                         cpu_id, strerror(errno));
> +               return ret;
> +       }
> +
> +       /*
> +        * Based on the sysfs interface pm_qos_resume_latency_us under
> +        * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their
> meaning
> +        * is as follows for different input string.
> +        * 1> the resume latency is 0 if the input is "n/a".
> +        * 2> the resume latency is no constraint if the input is "0".
> +        * 3> the resume latency is the actual value to be set.
> +        */
> +       if (latency == RTE_POWER_QOS_STRICT_LATENCY_VALUE)
> +               snprintf(buf, sizeof(buf), "%s", "n/a");
> +       else if (latency ==
> RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
> +               snprintf(buf, sizeof(buf), "%u", 0);
> +       else
> +               snprintf(buf, sizeof(buf), "%u", latency);
> +
> +       ret = write_core_sysfs_s(f, buf);
> +       if (ret != 0)
> +               POWER_LOG(ERR, "Failed to write
> "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
> +                         cpu_id, strerror(errno));
> +
> +       fclose(f);
> +
> +       return ret;
> +}
> +
> +int
> +rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id) {
> +       char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
> +       int latency = -1;
> +       uint32_t cpu_id;
> +       FILE *f;
> +       int ret;
> +
> +       if (!rte_lcore_is_enabled(lcore_id)) {
> +               POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
> +               return -EINVAL;
> +       }
> +       ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
> +       if (ret != 0)
> +               return ret;
> +
> +       ret = open_core_sysfs_file(&f, "r",
> PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
> +       if (ret != 0) {
> +               POWER_LOG(ERR, "Failed to open
> "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
> +                         cpu_id, strerror(errno));
> +               return ret;
> +       }
> +
> +       ret = read_core_sysfs_s(f, buf, sizeof(buf));
> +       if (ret != 0) {
> +               POWER_LOG(ERR, "Failed to read
> "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
> +                         cpu_id, strerror(errno));
> +               goto out;
> +       }
> +
> +       /*
> +        * Based on the sysfs interface pm_qos_resume_latency_us under
> +        * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their
> meaning
> +        * is as follows for different output string.
> +        * 1> the resume latency is 0 if the output is "n/a".
> +        * 2> the resume latency is no constraint if the output is "0".
> +        * 3> the resume latency is the actual value in used for other string.
> +        */
> +       if (strcmp(buf, "n/a") == 0)
> +               latency = RTE_POWER_QOS_STRICT_LATENCY_VALUE;
> +       else {
> +               latency = strtoul(buf, NULL, 10);
> +               latency = latency == 0 ?
> RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
> +       }
> +
> +out:
> +       fclose(f);
> +
> +       return latency != -1 ? latency : ret; }
> diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h new file 
> mode
> 100644 index 0000000000..7a8dab9272
> --- /dev/null
> +++ b/lib/power/rte_power_qos.h
> @@ -0,0 +1,73 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2024 HiSilicon Limited
> + */
> +
> +#ifndef RTE_POWER_QOS_H
> +#define RTE_POWER_QOS_H
> +
> +#include <stdint.h>
> +
> +#include <rte_compat.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file rte_power_qos.h
> + *
> + * PM QoS API.
> + *
> + * The CPU-wide resume latency limit has a positive impact on this
> +CPU's idle
> + * state selection in each cpuidle governor.
> + * Please see the PM QoS on CPU wide in the following link:
> + *
> +https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?hig
> +hlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-lat
> +ency-us
> + *
> + * The deeper the idle state, the lower the power consumption, but the
> + * longer the resume time. Some service are delay sensitive and very
> +except the
> + * low resume time, like interrupt packet receiving mode.
> + *
> + * In these case, per-CPU PM QoS API can be used to control this CPU's
> +idle
> + * state selection and limit just enter the shallowest idle state to
> +low the
> + * delay after sleep by setting strict resume latency (zero value).
> + */
> +
> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE             0
> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
> INT32_MAX
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @param lcore_id
> + *   target logical core id
> + *
> + * @param latency
> + *   The latency should be greater than and equal to zero in microseconds 
> unit.
> + *
> + * @return
> + *   0 on success. Otherwise negative value is returned.
> + */
> +__rte_experimental
> +int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int
> +latency);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the current resume latency of this logical core.
> + * The default value in kernel is @see
> +RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
> + * if don't set it.
> + *
> + * @return
> + *   Negative value on failure.
> + *   >= 0 means the actual resume latency limit on this core.
> + */
> +__rte_experimental
> +int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_POWER_QOS_H */
> diff --git a/lib/power/version.map b/lib/power/version.map index
> c9a226614e..08f178a39d 100644
> --- a/lib/power/version.map
> +++ b/lib/power/version.map
> @@ -51,4 +51,8 @@ EXPERIMENTAL {
>         rte_power_set_uncore_env;
>         rte_power_uncore_freqs;
>         rte_power_unset_uncore_env;
> +
> +       # added in 24.11
> +       rte_power_qos_get_cpu_resume_latency;
> +       rte_power_qos_set_cpu_resume_latency;
>  };
> --
> 2.22.0

Acked-by: Sivaprasad Tummala <[email protected]>

RE: [PATCH v13 1/3] power: introduce PM QoS API on CPU wide

Reply via email to