> >> Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
> >> Rx queues while entering the energy efficient power state. The multi
> >> version will be used unconditionally if supported, and the UMWAIT one
> >> will only be used when multi-monitor is not supported by the hardware.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> >> ---
> >>  doc/guides/prog_guide/power_man.rst |  9 ++--
> >>  lib/power/rte_power_pmd_mgmt.c      | 76 ++++++++++++++++++++++++++++-
> >>  2 files changed, 80 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
> >> index fac2c19516..3245a5ebed 100644
> >> --- a/doc/guides/prog_guide/power_man.rst
> >> +++ b/doc/guides/prog_guide/power_man.rst
> >> @@ -221,13 +221,16 @@ power saving whenever empty poll count reaches a certain number.
> >>  The "monitor" mode is only supported in the following configurations and scenarios:
> >>
> >>  * If ``rte_cpu_get_intrinsics_support()`` function indicates that
> >> +  ``rte_power_monitor_multi()`` function is supported by the platform, then
> >> +  monitoring multiple Ethernet Rx queues for traffic will be supported.
> >> +
> >> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
> >>    ``rte_power_monitor()`` is supported by the platform, then monitoring will be
> >>    limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
> >>    monitored from a different lcore).
> >>
> >> -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
> >> -  ``rte_power_monitor()`` function is not supported, then monitor mode will not
> >> -  be supported.
> >> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the
> >> +  two monitoring functions are supported, then monitor mode will not be supported.
> >>
> >>  * Not all Ethernet devices support monitoring, even if the underlying
> >>    platform may support the necessary CPU instructions. Please refer to
> >> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> >> index 7762cd39b8..aab2d4f1ee 100644
> >> --- a/lib/power/rte_power_pmd_mgmt.c
> >> +++ b/lib/power/rte_power_pmd_mgmt.c
> >> @@ -155,6 +155,24 @@ queue_list_remove(struct pmd_core_cfg *cfg, const union queue *q)
> >>  	return 0;
> >>  }
> >>
> >> +static inline int
> >> +get_monitor_addresses(struct pmd_core_cfg *cfg,
> >> +		struct rte_power_monitor_cond *pmc)
> >> +{
> >> +	const struct queue_list_entry *qle;
> >> +	size_t i = 0;
> >> +	int ret;
> >> +
> >> +	TAILQ_FOREACH(qle, &cfg->head, next) {
> >> +		struct rte_power_monitor_cond *cur = &pmc[i];
> >
> > Looks like you never increment 'i' value inside that function.
> > Also it probably will be safer to add 'num' parameter to check that
> > we will never over-run pmc[] boundaries.
>
> Will fix in v4, good catch!
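For illustration, the fixed helper might look roughly like this (a sketch
only, not the actual v4 code; it adds the per-iteration increment and the
suggested 'num' bound):

static inline int
get_monitor_addresses(struct pmd_core_cfg *cfg,
		struct rte_power_monitor_cond *pmc, size_t num)
{
	const struct queue_list_entry *qle;
	size_t i = 0;
	int ret;

	TAILQ_FOREACH(qle, &cfg->head, next) {
		const union queue *q = &qle->queue;
		struct rte_power_monitor_cond *cur;

		/* prevent out-of-bounds writes into pmc[] */
		if (i >= num)
			return -1;

		cur = &pmc[i++];
		ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
		if (ret < 0)
			return ret;
	}
	return 0;
}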
> >
> >> +		const union queue *q = &qle->queue;
> >> +		ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
> >> +		if (ret < 0)
> >> +			return ret;
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >>  static void
> >>  calc_tsc(void)
> >>  {
> >> @@ -183,6 +201,48 @@ calc_tsc(void)
> >>  	}
> >>  }
> >>
> >> +static uint16_t
> >> +clb_multiwait(uint16_t port_id, uint16_t qidx,
> >> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> >> +		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> >> +{
> >> +	const unsigned int lcore = rte_lcore_id();
> >> +	const union queue q = {.portid = port_id, .qid = qidx};
> >> +	const bool empty = nb_rx == 0;
> >> +	struct pmd_core_cfg *q_conf;
> >> +
> >> +	q_conf = &lcore_cfg[lcore];
> >> +
> >> +	/* early exit */
> >> +	if (likely(!empty)) {
> >> +		q_conf->empty_poll_stats = 0;
> >> +	} else {
> >> +		/* do we care about this particular queue? */
> >> +		if (!queue_is_power_save(q_conf, &q))
> >> +			return nb_rx;
> >
> > I still don't understand the need of 'special' power_save queue here...
> > Why we can't just have a function:
> >
> > get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(struct pmd_core_cfg *lcore_cfg),
> >
> > and then just:
> >
> > /* all queues have at least EMPTYPOLL_MAX sequential empty polls */
> > if (get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(q_conf) == 0) {
> > 	/* go into power-save mode here */
> > }
>
> Okay, let's go through this step by step :)
>
> Let's suppose we have three queues - q0, q1 and q2. We want to sleep
> whenever there's no traffic on *all of them*, however we cannot know
> that until we have checked all of them.
>
> So, let's suppose that q0, q1 and q2 were empty all this time, but now
> some traffic arrived at q2 while we're still checking q0. We see that q0
> is empty, and all of the queues were empty for the last N polls, so we
> think we will be safe to sleep at q0 despite the fact that traffic has
> just arrived at q2.
>
> This is not an issue with MONITOR mode because we will be able to see if
> current Rx ring descriptor is busy or not via the NIC callback, *but
> this is not possible* with PAUSE and SCALE modes, because they don't
> have the sneaky lookahead function of MONITOR! So, with PAUSE and SCALE
> modes, it is possible to end up in a situation where you *think* you
> don't have any traffic, but you actually do, you just haven't checked
> the relevant queue yet.
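To restate the scheme in code, the per-poll decision in clb_multiwait
above boils down to roughly the following (a simplified sketch;
go_to_sleep() is a stand-in for the actual monitor/pause/scale action):

/* any non-empty poll on this lcore resets the counter; only the
 * designated power-save queue may increment it and trigger a sleep */
if (nb_rx != 0)
	q_conf->empty_poll_stats = 0;
else if (queue_is_power_save(q_conf, &q)) {
	q_conf->empty_poll_stats++;
	if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
		go_to_sleep();
}

So for any N queues on the lcore there is at most one sleep per poll
loop, though traffic can still land on a queue that was already checked
earlier in that loop.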
I think such a situation is unavoidable.
Yes, traffic can arrive to *any* queue at *any* time.
With your example above - the user chose q2 as the 'special' queue, but
traffic actually arrives on q0 or q1.
And yes, if the user chooses the PAUSE or SCALE methods he *can* miss the
traffic, because, as you said, for these methods there is no notification
mechanism. I think these are just unavoidable limitations of these
power-save methods.

> In order to prevent this from happening, we do not sleep on every queue,
> instead we sleep *once* per loop.

Yes, totally agree we shouldn't sleep on *every* queue.
We need to go to sleep when there is no traffic on *any* of the queues we
monitor.

> That is, we check q0, check q1, check
> q2, and only then we decide whether we want to sleep or not.
>
> Of course, with such scheme it is still possible to e.g. sleep in q2
> while there's traffic waiting in q0,

Yes, exactly.

> but worst case is less bad with
> this scheme, because we'll be doing at worst 1 extra sleep.

Hmm, I think it would be one extra sleep anyway.

> Whereas with what you're suggesting, if we had e.g. 10 queues to poll,
> and we checked q1 but traffic has just arrived at q0, we'll be sleeping
> at q1, then we'll be sleeping at q2, then we'll be sleeping at q3, then
> we'll be sleeping at q4, then we'll be sleeping at q5.... and 9 sleeps
> later we finally reach q0 and find out after all this time that we
> shouldn't have slept in the first place.

Ah ok, I think I understand now what you are saying.
Sure, to avoid such a situation, we'll need to maintain extra counters
and update them properly when we go to sleep. I should have stated that
clearly at the beginning. It might be easier to explain what I meant
with a code snippet.

lcore_conf needs 2 counters:

uint64_t nb_queues_ready_to_sleep;
uint64_t nb_sleeps;

Plus each queue needs 2 counters:

uint64_t nb_empty_polls;
uint64_t nb_sleeps;

Now, at rx_callback():

/* check if a sleep happened since the previous call;
 * if yes, then reset the queue counters */
if (queue->nb_sleeps != lcore_conf->nb_sleeps) {
	queue->nb_sleeps = lcore_conf->nb_sleeps;
	queue->nb_empty_polls = 0;
}

/* packet arrived, reset counters */
if (nb_rx != 0) {
	/* queue is not 'ready_to_sleep' any more */
	if (queue->nb_empty_polls > EMPTYPOLL_MAX)
		lcore_conf->nb_queues_ready_to_sleep--;
	queue->nb_empty_polls = 0;

/* empty poll */
} else {
	/* queue reached the EMPTYPOLL_MAX threshold,
	 * mark it as 'ready_to_sleep' */
	if (queue->nb_empty_polls == EMPTYPOLL_MAX)
		lcore_conf->nb_queues_ready_to_sleep++;
	queue->nb_empty_polls++;
}

/* no traffic on any queue for at least EMPTYPOLL_MAX iterations */
if (lcore_conf->nb_queues_ready_to_sleep == lcore_conf->n_queues) {
	/* update counters and sleep */
	lcore_conf->nb_sleeps++;
	lcore_conf->nb_queues_ready_to_sleep = 0;
	goto_sleep();
}

> Hopefully you get the point now :)
>
> So, the idea here is, for any N queues, sleep only once, not N times.

> >
> >> +
> >> +		/*
> >> +		 * we can increment unconditionally here because if there were
> >> +		 * non-empty polls in other queues assigned to this core, we
> >> +		 * dropped the counter to zero anyway.
> >> +		 */
> >> +		q_conf->empty_poll_stats++;
> >> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> >> +			struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
> >
> > I think you need here:
> > struct rte_power_monitor_cond pmc[q_conf->n_queues];
>
> I think VLA's are generally agreed upon to be unsafe, so i'm avoiding
> them here.

Wonder why? These days DPDK uses VLA in dozens of places...
But if you'd like to avoid VLA - you can use alloca(), or have
lcore_conf->pmc[] and realloc() it when a new queue is added to/removed
from the list.
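For instance, a minimal sketch of the alloca() variant (illustrative
only; it assumes the bound-checked get_monitor_addresses() signature
sketched earlier, and alloca() needs <alloca.h>):

/* size the conditions array to the actual number of monitored queues
 * instead of using a fixed RTE_MAX_ETHPORTS-sized array */
struct rte_power_monitor_cond *pmc =
		alloca(q_conf->n_queues * sizeof(*pmc));

if (get_monitor_addresses(q_conf, pmc, q_conf->n_queues) == 0)
	rte_power_monitor_multi(pmc, q_conf->n_queues, UINT64_MAX);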
> >
> >
> >> +			uint16_t ret;
> >> +
> >> +			/* gather all monitoring conditions */
> >> +			ret = get_monitor_addresses(q_conf, pmc);
> >> +
> >> +			if (ret == 0)
> >> +				rte_power_monitor_multi(pmc,
> >> +						q_conf->n_queues, UINT64_MAX);
> >> +		}
> >> +	}
> >> +
> >> +	return nb_rx;
> >> +}
> >> +
> >>  static uint16_t
> >>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> >>  		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> >> @@ -348,14 +408,19 @@ static int
> >>  check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
> >>  {
> >>  	struct rte_power_monitor_cond dummy;
> >> +	bool multimonitor_supported;
> >>
> >>  	/* check if rte_power_monitor is supported */
> >>  	if (!global_data.intrinsics_support.power_monitor) {
> >>  		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> >>  		return -ENOTSUP;
> >>  	}
> >> +	/* check if multi-monitor is supported */
> >> +	multimonitor_supported =
> >> +			global_data.intrinsics_support.power_monitor_multi;
> >>
> >> -	if (cfg->n_queues > 0) {
> >> +	/* if we're adding a new queue, do we support multiple queues? */
> >> +	if (cfg->n_queues > 0 && !multimonitor_supported) {
> >>  		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
> >>  		return -ENOTSUP;
> >>  	}
> >> @@ -371,6 +436,13 @@ check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
> >>  	return 0;
> >>  }
> >>
> >> +static inline rte_rx_callback_fn
> >> +get_monitor_callback(void)
> >> +{
> >> +	return global_data.intrinsics_support.power_monitor_multi ?
> >> +			clb_multiwait : clb_umwait;
> >> +}
> >> +
> >>  int
> >>  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
> >>  		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
> >> @@ -434,7 +506,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
> >>  		if (ret < 0)
> >>  			goto end;
> >>
> >> -		clb = clb_umwait;
> >> +		clb = get_monitor_callback();
> >>  		break;
> >>  	case RTE_POWER_MGMT_TYPE_SCALE:
> >>  		/* check if we can add a new queue */
> >> --
> >> 2.25.1
> >
> >
>
> --
> Thanks,
> Anatoly