On 28-Jun-21 2:29 PM, Ananyev, Konstantin wrote:


Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
Rx queues while entering the energy efficient power state. The multi
version will be used unconditionally if supported, and the UMWAIT one
will only be used when multi-monitor is not supported by the hardware.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---
  doc/guides/prog_guide/power_man.rst |  9 ++--
  lib/power/rte_power_pmd_mgmt.c      | 76 ++++++++++++++++++++++++++++-
  2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index fac2c19516..3245a5ebed 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -221,13 +221,16 @@ power saving whenever empty poll count reaches a certain number.
  The "monitor" mode is only supported in the following configurations and 
scenarios:

  * If ``rte_cpu_get_intrinsics_support()`` function indicates that
+  ``rte_power_monitor_multi()`` function is supported by the platform, then
+  monitoring multiple Ethernet Rx queues for traffic will be supported.
+
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
    ``rte_power_monitor()`` is supported by the platform, then monitoring will be
    limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
    monitored from a different lcore).

-* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
-  ``rte_power_monitor()`` function is not supported, then monitor mode will not
-  be supported.
+* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the
+  two monitoring functions is supported, then monitor mode will not be supported.

  * Not all Ethernet devices support monitoring, even if the underlying
    platform may support the necessary CPU instructions. Please refer to
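
(As an aside: an application can query these capabilities itself before
enabling monitor mode. A minimal sketch, assuming the new
``power_monitor_multi`` intrinsics bit added by this patchset:)

	#include <rte_cpuflags.h>

	/* returns 2 for multi-queue monitoring, 1 for single-queue, 0 for none */
	static int
	monitor_support_level(void)
	{
		struct rte_cpu_intrinsics intr;

		rte_cpu_get_intrinsics_support(&intr);
		if (intr.power_monitor_multi)
			return 2; /* multiple Rx queues per lcore can be monitored */
		if (intr.power_monitor)
			return 1; /* strict 1 core 1 queue mapping */
		return 0;         /* monitor mode is not available */
	}
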
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 7762cd39b8..aab2d4f1ee 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -155,6 +155,24 @@ queue_list_remove(struct pmd_core_cfg *cfg, const union queue *q)
       return 0;
  }

+static inline int
+get_monitor_addresses(struct pmd_core_cfg *cfg,
+             struct rte_power_monitor_cond *pmc)
+{
+     const struct queue_list_entry *qle;
+     size_t i = 0;
+     int ret;
+
+     TAILQ_FOREACH(qle, &cfg->head, next) {
+             struct rte_power_monitor_cond *cur = &pmc[i];

Looks like you never increment 'i' inside that function.
Also, it would probably be safer to add a 'num' parameter to check that
we will never overrun the pmc[] boundaries.

Will fix in v4, good catch!


+             const union queue *q = &qle->queue;
+             ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
+             if (ret < 0)
+                     return ret;
+     }
+     return 0;
+}
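
(For reference, the v4 fix will probably end up looking something like the
following - just a sketch at this point, with the 'num' bound suggested above:)

	static inline int
	get_monitor_addresses(struct pmd_core_cfg *cfg,
			struct rte_power_monitor_cond *pmc, size_t num)
	{
		const struct queue_list_entry *qle;
		size_t i = 0;
		int ret;

		TAILQ_FOREACH(qle, &cfg->head, next) {
			const union queue *q = &qle->queue;

			/* bail out before we can overrun pmc[] */
			if (i >= num)
				return -1;

			/* 'i' is now actually incremented on each iteration */
			ret = rte_eth_get_monitor_addr(q->portid, q->qid, &pmc[i++]);
			if (ret < 0)
				return ret;
		}
		return 0;
	}
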
+
  static void
  calc_tsc(void)
  {
@@ -183,6 +201,48 @@ calc_tsc(void)
       }
  }

+static uint16_t
+clb_multiwait(uint16_t port_id, uint16_t qidx,
+             struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
+             uint16_t max_pkts __rte_unused, void *addr __rte_unused)
+{
+     const unsigned int lcore = rte_lcore_id();
+     const union queue q = {.portid = port_id, .qid = qidx};
+     const bool empty = nb_rx == 0;
+     struct pmd_core_cfg *q_conf;
+
+     q_conf = &lcore_cfg[lcore];
+
+     /* early exit */
+     if (likely(!empty)) {
+             q_conf->empty_poll_stats = 0;
+     } else {
+             /* do we care about this particular queue? */
+             if (!queue_is_power_save(q_conf, &q))
+                     return nb_rx;

I still don't understand the need for a 'special' power_save queue here...
Why can't we just have a function:

get_number_of_queues_whose_sequential_empty_polls_less_than_threshold(struct pmd_core_cfg *lcore_cfg)

and then just:

/* all queues have at least EMPTYPOLL_MAX sequential empty polls */
if (get_number_of_queues_whose_sequential_empty_polls_less_than_threshold(q_conf) == 0) {
	/* go into power-save mode here */
}

Okay, let's go through this step by step :)

Let's suppose we have three queues - q0, q1 and q2. We want to sleep whenever there's no traffic on *all of them*; however, we cannot know that until we have checked all of them.

So, let's suppose that q0, q1 and q2 were empty all this time, but now some traffic arrives at q2 while we're still checking q0. We see that q0 is empty, and all of the queues were empty for the last N polls, so we think it is safe to sleep at q0, despite the fact that traffic has just arrived at q2.

This is not an issue with MONITOR mode, because we can see via the NIC callback whether the current Rx ring descriptor is busy or not, *but this is not possible* with PAUSE and SCALE modes, because they don't have the sneaky lookahead function of MONITOR! So, with PAUSE and SCALE modes, it is possible to end up in a situation where you *think* you don't have any traffic, but you actually do; you just haven't checked the relevant queue yet.

In order to prevent this from happening, we do not sleep on every queue; instead, we sleep *once* per loop. That is, we check q0, check q1, check q2, and only then do we decide whether we want to sleep or not.

Of course, with such a scheme it is still possible to e.g. sleep in q2 while there's traffic waiting in q0, but the worst case is less bad, because we'll be doing at worst one extra sleep.

Whereas with what you're suggesting, if we had e.g. 10 queues to poll, and traffic has just arrived at q0 while we're checking q1, we'll be sleeping at q1, then at q2, then at q3, then at q4, then at q5... and 9 sleeps later we finally reach q0 and find out, after all this time, that we shouldn't have slept in the first place. Hopefully you get the point now :)

So, the idea here is, for any N queues, sleep only once, not N times.
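
In other words, the poll loop conceptually becomes something like the sketch
below. This is illustrative only - poll_queue() and enter_power_save() are
made-up stand-ins, not the real callback-based implementation:

	#include <stdbool.h>
	#include <stddef.h>

	#define NUM_QUEUES 3
	#define EMPTYPOLL_MAX 512 /* threshold of sequential empty polls */

	extern unsigned int poll_queue(size_t qidx); /* hypothetical: returns nb_rx */
	extern void enter_power_save(void);          /* hypothetical: sleep once */

	static unsigned int empty_polls[NUM_QUEUES];

	static void
	poll_loop(void)
	{
		for (;;) {
			bool all_idle = true;
			size_t i;

			/* poll every queue assigned to this lcore first... */
			for (i = 0; i < NUM_QUEUES; i++) {
				if (poll_queue(i) != 0)
					empty_polls[i] = 0;
				else
					empty_polls[i]++;
				if (empty_polls[i] <= EMPTYPOLL_MAX)
					all_idle = false;
			}

			/* ...and only then decide, once, whether to sleep */
			if (all_idle)
				enter_power_save();
		}
	}
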


+
+             /*
+              * we can increment unconditionally here because if there were
+              * non-empty polls in other queues assigned to this core, we
+              * dropped the counter to zero anyway.
+              */
+             q_conf->empty_poll_stats++;
+             if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
+                     struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];

I think you need here:
struct rte_power_monitor_cond pmc[q_conf->n_queues];

I think VLAs are generally agreed to be unsafe, so I'm avoiding them here.
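
(Combined with the 'num' parameter suggested earlier, the fixed-size array can
also be bounds-checked at the call site - sketch only:)

	/* pass the capacity of pmc[] so that get_monitor_addresses()
	 * can never overrun it; RTE_DIM() yields the array element count
	 */
	ret = get_monitor_addresses(q_conf, pmc, RTE_DIM(pmc));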



+                     int ret;
+
+                     /* gather all monitoring conditions */
+                     ret = get_monitor_addresses(q_conf, pmc);
+
+                     if (ret == 0)
+                             rte_power_monitor_multi(pmc,
+                                     q_conf->n_queues, UINT64_MAX);
+             }
+     }
+
+     return nb_rx;
+}
+
  static uint16_t
clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
               uint16_t nb_rx, uint16_t max_pkts __rte_unused,
@@ -348,14 +408,19 @@ static int
  check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
  {
       struct rte_power_monitor_cond dummy;
+     bool multimonitor_supported;

       /* check if rte_power_monitor is supported */
       if (!global_data.intrinsics_support.power_monitor) {
             RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
               return -ENOTSUP;
       }
+     /* check if multi-monitor is supported */
+     multimonitor_supported =
+                     global_data.intrinsics_support.power_monitor_multi;

-     if (cfg->n_queues > 0) {
+     /* if we're adding a new queue, do we support multiple queues? */
+     if (cfg->n_queues > 0 && !multimonitor_supported) {
             RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
               return -ENOTSUP;
       }
@@ -371,6 +436,13 @@ check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
       return 0;
  }

+static inline rte_rx_callback_fn
+get_monitor_callback(void)
+{
+     return global_data.intrinsics_support.power_monitor_multi ?
+             clb_multiwait : clb_umwait;
+}
+
  int
  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
               uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
@@ -434,7 +506,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
               if (ret < 0)
                       goto end;

-             clb = clb_umwait;
+             clb = get_monitor_callback();
               break;
       case RTE_POWER_MGMT_TYPE_SCALE:
               /* check if we can add a new queue */
--
2.25.1



--
Thanks,
Anatoly
