On 13-Jan-21 5:29 PM, Burakov, Anatoly wrote:
On 13-Jan-21 12:58 PM, Ananyev, Konstantin wrote:
-----Original Message-----
From: Burakov, Anatoly <anatoly.bura...@intel.com>
Sent: Tuesday, January 12, 2021 5:37 PM
To: dev@dpdk.org
Cc: Ma, Liang J <liang.j...@intel.com>; Hunt, David
<david.h...@intel.com>; Ray Kinsella <m...@ashroe.eu>; Neil Horman
<nhor...@tuxdriver.com>; tho...@monjalon.net; Ananyev, Konstantin
<konstantin.anan...@intel.com>; McDaniel, Timothy
<timothy.mcdan...@intel.com>; Richardson, Bruce
<bruce.richard...@intel.com>; Macnamara, Chris
<chris.macnam...@intel.com>
Subject: [PATCH v16 07/11] power: add PMD power management API and
callback
From: Liang Ma <liang.j...@intel.com>
Add a simple on/off switch that will enable saving power when no
packets are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering an
architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when packets arrive.
This API mandates a core-to-single-queue mapping (that is, multiple
queued per device are supported, but they have to be polled on different
cores).
This design is using PMD RX callbacks.
1. UMWAIT/UMONITOR:
When a certain threshold of empty polls is reached, the core will go
into a power optimized sleep while waiting on an address of next RX
descriptor to be written to.
2. TPAUSE/Pause instruction
This method uses the pause (or TPAUSE, if available) instruction to
avoid busy polling.
3. Frequency scaling
Reuse existing DPDK power library to scale up/down core frequency
depending on traffic volume.
Signed-off-by: Liang Ma <liang.j...@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---
Notes:
v15:
- Fix check in UMWAIT callback
v13:
- Rework the synchronization mechanism to not require locking
- Add more parameter checking
- Rework n_rx_queues access to not go through internal PMD
structures and use
public API instead
v13:
- Rework the synchronization mechanism to not require locking
- Add more parameter checking
- Rework n_rx_queues access to not go through internal PMD
structures and use
public API instead
doc/guides/prog_guide/power_man.rst | 44 +++
doc/guides/rel_notes/release_21_02.rst | 10 +
lib/librte_power/meson.build | 5 +-
lib/librte_power/rte_power_pmd_mgmt.c | 359 +++++++++++++++++++++++++
lib/librte_power/rte_power_pmd_mgmt.h | 90 +++++++
lib/librte_power/version.map | 5 +
6 files changed, 511 insertions(+), 2 deletions(-)
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
...
+
+static uint16_t
+clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts
__rte_unused,
+uint16_t nb_rx, uint16_t max_pkts __rte_unused,
+void *addr __rte_unused)
+{
+
+struct pmd_queue_cfg *q_conf;
+
+q_conf = &port_cfg[port_id][qidx];
+
+if (unlikely(nb_rx == 0)) {
+q_conf->empty_poll_stats++;
+if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+struct rte_power_monitor_cond pmc;
+uint16_t ret;
+
+/*
+ * we might get a cancellation request while being
+ * inside the callback, in which case the wakeup
+ * wouldn't work because it would've arrived too early.
+ *
+ * to get around this, we notify the other thread that
+ * we're sleeping, so that it can spin until we're done.
+ * unsolicited wakeups are perfectly safe.
+ */
+q_conf->umwait_in_progress = true;
This write and subsequent read can be reordered by the cpu.
I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and
in disable() code-path below.
+
+/* check if we need to cancel sleep */
+if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
+/* use monitoring condition to sleep */
+ret = rte_eth_get_monitor_addr(port_id, qidx,
+&pmc);
+if (ret == 0)
+rte_power_monitor(&pmc, -1ULL);
+}
+q_conf->umwait_in_progress = false;
+}
+} else
+q_conf->empty_poll_stats = 0;
+
+return nb_rx;
+}
+
...
+
+int
+rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
+uint16_t port_id, uint16_t queue_id)
+{
+struct pmd_queue_cfg *queue_cfg;
+
+RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
+return -EINVAL;
+
+/* no need to check queue id as wrong queue id would not be enabled */
+queue_cfg = &port_cfg[port_id][queue_id];
+
+if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+return -EINVAL;
+
+/* let the callback know we're shutting down */
+queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;
Same as above - write to pwr_mgmt_state and read from umwait_in_progress
could be reordered by cpu.
Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
BTW, out of curiosity - why do you need this intermediate
state (PMD_MGMT_BUSY) at all?
Why not directly:
queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
?
Thanks for suggestions, i'll add those.
The goal for the "intermediate" step is to prevent Rx callback from
sleeping in the first place. We can't "wake up" earlier than it goes to
sleep, but we may get a request to disable power management while we're
at the beginning of the callback and haven't yet entered the
rte_power_monitor code.
In this case, setting it to "BUSY" will prevent the callback from ever
sleeping in the first place (see rte_power_pmd_mgmt:108 check), and will
unset the "umwait in progress" if there was any.
So, we have three situations to handle:
1) wake up during umwait
2) "wake up" during callback after we've set the "umwait in progress"
flag but before actual umwait happens - we don't wait to exit before
we're sure there's nothing sleeping there
3) "wake up" during callback before we set the "umwait in progress" flag
1) is handled by the rte_power_monitor_wakeup() call, so that's taken
care of. 2) is handled by the other thread waiting on "umwait in
progress" becoming false. 3) is handled by having this BUSY check in the
umwait thread.
Hope that made sense!
On further thoughts, the "BUSY" thing relies on a hidden assumption that
enable/disable power management per queue is supposed to be thread safe.
If we let go of this assumption, we can get by with just enable/disable,
so i think i'll just document the thread safety and leave out the "BUSY"
part.
--
Thanks,
Anatoly