On 14 Jun 11:59, Hunt, David wrote:
> Hi Liang
> 
> 
> 
> On 8/6/2018 10:57 AM, Liang Ma wrote:
> > 1. Abstract
> > 
> > For packet processing workloads such as DPDK polling is continuous.
> > This means CPU cores always show 100% busy independent of how much work
> > those cores are doing. It is critical to accurately determine how busy
> > a core is hugely important for the following reasons:
> > 
> >     * No indication of overload conditions
> > 
> >     * User do not know how much real load is on a system meaning resulted in
> >       wasted energy as no power management is utilized
> > 
> > Tried and failed schemes include calculating the cycles required from
> > the load on the core, in other words the business. For example,
> 
> Typo, I think this should be busyness. :)
agree, I will update in next patch.
> 
> > how many cycles it costs to handle each packet and determining the frequency
> > cost per core. Due to the varying nature of traffic, types of frames and 
> > cost
> > in cycles to process, this mechanism becomes complex quickly where
> > a simple scheme is required to solve the problems.
> > 
> > 2. Proposed solution
> > 
> > For all polling mechanism, the proposed solution focus on how many times
> > empty poll executed instead of calculating how many cycles it cost to handle
> > each packet. The less empty poll number means current core is busy with
> > processing workload, therefore,  the higher frequency is needed. The high
> > empty poll number indicate current core has lots spare time, therefore,
> > we can lower the frequency.
> > 
> > 2.1 Power state definition:
> > 
> >     LOW:  the frequency is used for purge mode.
> > 
> >     MED:  the frequency is used to process modest traffic workload.
> > 
> >     HIGH: the frequency is used to process busy traffic workload.
> > 
> > 2.2 There are two phases to establish the power management system:
> > 
> >     a.Initialization/Training phase. There is no traffic pass-through,
> >       the system will test average empty poll numbers  with LOW/MED/HIGH
> >       power state. Those average empty poll numbers  will be the baseline
> >       for the normal phase. The system will collect all core's counter
> >       every 100ms. The Training phase will take 5 seconds.
> 
> I suggest that you add in that all Rx packets are blocked during this phase,
> and we measure the number of empty polls possible for each of the
> frequencies (low, medium and high).
I don't think the Rx packet is blocked,  currently, there is no traffic 
pass-throughduring training phase.
> 
> >     b.Normal phase. When the real traffic pass-though, the system will
> >       compare run-time empty poll moving average value with base line
> >       then make decision to move to HIGH power state of MED  power state.
> >       The system will collect all core's counter every 100ms.
> 
> I think this may need to be reduced from 100ms. The reaction to an increase
> in traffic would need
> to be quicker than this to avoid buffer overflow.
> 
agree, I will reduce to 10ms in next patch.
> > 
> > 3. Proposed  API
> > 
> > 1.  rte_empty_poll_stat_init(void);
> > which is used to initialize the power management system.
> > 2.  rte_empty_poll_stat_free(void);
> > which is used to free the resource hold by power management system.
> > 3.  rte_empty_poll_stat_update(unsigned int lcore_id);
> > which is used to update specific core empty poll counter, not thread safe
> > 4.  rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> > which is used to update specific core valid poll counter, not thread safe
> > 5.  rte_empty_poll_stat_fetch(unsigned int lcore_id);
> > which is used to get specific core empty poll counter.
> > 6.  rte_poll_stat_fetch(unsigned int lcore_id);
> > which is used to get specific core valid poll counter.
> > 
> > 7.  rte_empty_poll_set_freq(enum freq_val index, uint32_t limit);
> > which allow user customize the frequency of power state.
> > 
> > 8.  rte_empty_poll_setup_timer(void);
> > which is used to setup the timer/callback to process all above counter.
> > 
> > Signed-off-by: Liang Ma <liang.j...@intel.com>
> > ---
> >   lib/librte_power/Makefile         |   3 +-
> >   lib/librte_power/meson.build      |   5 +-
> >   lib/librte_power/rte_empty_poll.c | 529 
> > ++++++++++++++++++++++++++++++++++++++
> >   lib/librte_power/rte_empty_poll.h | 135 ++++++++++
> >   4 files changed, 669 insertions(+), 3 deletions(-)
> >   create mode 100644 lib/librte_power/rte_empty_poll.c
> >   create mode 100644 lib/librte_power/rte_empty_poll.h
> > 
> > diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile
> > index 6f85e88..dbc175a 100644
> > --- a/lib/librte_power/Makefile
> > +++ b/lib/librte_power/Makefile
> > @@ -16,8 +16,9 @@ LIBABIVER := 1
> >   # all source are stored in SRCS-y
> >   SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c power_acpi_cpufreq.c
> >   SRCS-$(CONFIG_RTE_LIBRTE_POWER) += power_kvm_vm.c guest_channel.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_empty_poll.c
> >   # install this header file
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h  rte_empty_poll.h
> 
> I would propose re-naming the .h and .c file to
> rte_*power_*empty_poll.[h/c], so we can
> associate it with the power library.
> 
> 
> >   include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build
> > index 253173f..5270fa3 100644
> > --- a/lib/librte_power/meson.build
> > +++ b/lib/librte_power/meson.build
> > @@ -5,5 +5,6 @@ if host_machine.system() != 'linux'
> >     build = false
> >   endif
> >   sources = files('rte_power.c', 'power_acpi_cpufreq.c',
> > -           'power_kvm_vm.c', 'guest_channel.c')
> > -headers = files('rte_power.h')
> > +           'power_kvm_vm.c', 'guest_channel.c',
> > +           'rte_empty_poll.c')
> > +headers = files('rte_power.h','rte_empty_poll.h')
> > diff --git a/lib/librte_power/rte_empty_poll.c 
> > b/lib/librte_power/rte_empty_poll.c
> > new file mode 100644
> > index 0000000..57bd63b
> > --- /dev/null
> > +++ b/lib/librte_power/rte_empty_poll.c
> > @@ -0,0 +1,529 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2018 Intel Corporation
> > + */
> > +
> > +#include <string.h>
> > +
> > +#include <rte_lcore.h>
> > +#include <rte_cycles.h>
> > +#include <rte_atomic.h>
> > +#include <rte_malloc.h>
> > +
> > +
> > +
> > +
> > +#include "rte_power.h"
> > +#include "rte_empty_poll.h"
> > +
> > +
> > +
> > +
> > +#define INTERVALS_PER_SECOND 10     /* (100ms) */
> > +#define SECONDS_TO_TRAIN_FOR  5
> > +#define DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD 70
> > +#define DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD 30
> > +#define DEFAULT_CYCLES_PER_PACKET 800
> > +
> > +
> > +static struct ep_params *ep_params;
> > +static uint32_t med_to_high_threshold = 
> > DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD;
> > +static uint32_t high_to_med_threshold = 
> > DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD;
> > +
> > +static uint32_t avail_freqs[RTE_MAX_LCORE][NUM_FREQS];
> > +
> > +static uint32_t total_avail_freqs[RTE_MAX_LCORE];
> > +
> > +static uint32_t freq_index[NUM_FREQ];
> > +
> > +
> > +
> > +static uint32_t
> > +get_freq_index(enum freq_val index)
> > +{
> > +   return freq_index[index];
> > +}
> > +
> > +
> > +static int
> > +set_power_freq(int lcore_id, enum freq_val freq, bool specific_freq)
> > +{
> > +   int err = 0;
> > +   uint32_t power_freq_index;
> > +   if (!specific_freq)
> > +           power_freq_index = get_freq_index(freq);
> > +   else
> > +           power_freq_index = freq;
> > +
> > +   err = rte_power_set_freq(lcore_id, power_freq_index);
> > +
> > +   return err;
> > +}
> > +
> > +
> > +static inline void __attribute__((always_inline))
> > +exit_training_state(struct priority_worker *poll_stats)
> > +{
> > +   RTE_SET_USED(poll_stats);
> > +}
> > +
> > +static inline void __attribute__((always_inline))
> > +enter_training_state(struct priority_worker *poll_stats)
> > +{
> > +   poll_stats->iter_counter = 0;
> > +   poll_stats->cur_freq = LOW;
> > +   poll_stats->queue_state = TRAINING;
> > +}
> > +
> > +static inline void __attribute__((always_inline))
> > +enter_normal_state(struct priority_worker *poll_stats)
> > +{
> > +   /* Clear the averages arrays and strs */
> > +   memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> > +   poll_stats->ec = 0;
> > +   memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> > +   poll_stats->pc = 0;
> > +
> > +   poll_stats->cur_freq = MED;
> > +   poll_stats->iter_counter = 0;
> > +   poll_stats->threshold_ctr = 0;
> > +   poll_stats->queue_state = MED_NORMAL;
> > +   set_power_freq(poll_stats->lcore_id, MED, false);
> > +
> > +   /* Try here */
> > +   poll_stats->thresh[MED].threshold_percent = med_to_high_threshold;
> > +   poll_stats->thresh[HGH].threshold_percent = high_to_med_threshold;
> > +}
> > +
> > +static inline void __attribute__((always_inline))
> > +enter_busy_state(struct priority_worker *poll_stats)
> > +{
> > +   memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> > +   poll_stats->ec = 0;
> > +   memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> > +   poll_stats->pc = 0;
> > +
> > +   poll_stats->cur_freq = HGH;
> > +   poll_stats->iter_counter = 0;
> > +   poll_stats->threshold_ctr = 0;
> > +   poll_stats->queue_state = HGH_BUSY;
> > +   set_power_freq(poll_stats->lcore_id, HGH, false);
> > +}
> > +
> > +static inline void __attribute__((always_inline))
> > +enter_purge_state(struct priority_worker *poll_stats)
> > +{
> > +   poll_stats->iter_counter = 0;
> > +   poll_stats->queue_state = LOW_PURGE;
> > +}
> > +
> > +static inline void __attribute__((always_inline))
> > +set_state(struct priority_worker *poll_stats,
> > +           enum queue_state new_state)
> > +{
> > +   enum queue_state old_state = poll_stats->queue_state;
> > +   if (old_state != new_state) {
> > +
> > +           /* Call any old state exit functions */
> > +           if (old_state == TRAINING)
> > +                   exit_training_state(poll_stats);
> > +
> > +           /* Call any new state entry functions */
> > +           if (new_state == TRAINING)
> > +                   enter_training_state(poll_stats);
> > +           if (new_state == MED_NORMAL)
> > +                   enter_normal_state(poll_stats);
> > +           if (new_state == HGH_BUSY)
> > +                   enter_busy_state(poll_stats);
> > +           if (new_state == LOW_PURGE)
> > +                   enter_purge_state(poll_stats);
> > +   }
> > +}
> > +
> > +
> > +static void
> > +update_training_stats(struct priority_worker *poll_stats,
> > +           uint32_t freq,
> > +           bool specific_freq,
> > +           uint32_t max_train_iter)
> > +{
> > +   RTE_SET_USED(specific_freq);
> > +
> > +   char pfi_str[32];
> > +   uint64_t p0_empty_deq;
> > +
> > +   sprintf(pfi_str, "%02d", freq);
> > +
> > +   if (poll_stats->cur_freq == freq &&
> > +                   poll_stats->thresh[freq].trained == false) {
> > +           if (poll_stats->thresh[freq].cur_train_iter == 0) {
> > +
> > +                   set_power_freq(poll_stats->lcore_id,
> > +                                   freq, specific_freq);
> > +
> > +                   poll_stats->empty_dequeues_prev =
> > +                           poll_stats->empty_dequeues;
> > +
> > +                   poll_stats->thresh[freq].cur_train_iter++;
> > +
> > +                   return;
> > +           } else if (poll_stats->thresh[freq].cur_train_iter
> > +                           <= max_train_iter) {
> > +
> > +                   p0_empty_deq = poll_stats->empty_dequeues -
> > +                           poll_stats->empty_dequeues_prev;
> > +
> > +                   poll_stats->empty_dequeues_prev =
> > +                           poll_stats->empty_dequeues;
> > +
> > +                   poll_stats->thresh[freq].base_edpi += p0_empty_deq;
> > +                   poll_stats->thresh[freq].cur_train_iter++;
> > +
> > +           } else {
> > +                   if (poll_stats->thresh[freq].trained == false) {
> > +                           poll_stats->thresh[freq].base_edpi =
> > +                                   poll_stats->thresh[freq].base_edpi /
> > +                                   max_train_iter;
> > +
> > +                           /* Add on a factor of 0.05%, this should remove 
> > any */
> > +                           /* false negatives when the system is 0% busy */
> > +                           poll_stats->thresh[freq].base_edpi +=
> > +                                   poll_stats->thresh[freq].base_edpi / 
> > 2000;
> 
> Please use #define for magic numbers
> 
> > +
> > +                           poll_stats->thresh[freq].trained = true;
> > +                           poll_stats->cur_freq++;
> > +
> > +                   }
> > +           }
> > +   }
> > +}
> > +
> > +static inline uint32_t __attribute__((always_inline))
> > +update_stats(struct priority_worker *poll_stats)
> > +{
> > +   uint64_t tot_edpi = 0, tot_ppi = 0;
> > +   uint32_t j, percent;
> > +
> > +   struct priority_worker *s = poll_stats;
> > +
> > +   uint64_t cur_edpi = s->empty_dequeues - s->empty_dequeues_prev;
> > +
> > +   s->empty_dequeues_prev = s->empty_dequeues;
> > +
> > +   uint64_t ppi = s->num_dequeue_pkts - s->num_dequeue_pkts_prev;
> > +
> > +   s->num_dequeue_pkts_prev = s->num_dequeue_pkts;
> > +
> > +   if (s->thresh[s->cur_freq].base_edpi < cur_edpi)
> > +           return 1000UL; /* Value to make us fail */
> > +
> > +   s->edpi_av[s->ec++ % BINS_AV] = cur_edpi;
> > +   s->ppi_av[s->pc++ % BINS_AV] = ppi;
> > +
> > +   for (j = 0; j < BINS_AV; j++) {
> > +           tot_edpi += s->edpi_av[j];
> > +           tot_ppi += s->ppi_av[j];
> > +   }
> > +
> > +   tot_edpi = tot_edpi / BINS_AV;
> > +
> > +   percent = 100 - (uint32_t)((float)tot_edpi /
> > +                   (float)s->thresh[s->cur_freq].base_edpi * 100);
> > +
> > +   return (uint32_t)percent;
> > +}
> > +
> > +
> > +static inline void  __attribute__((always_inline))
> > +update_stats_normal(struct priority_worker *poll_stats)
> > +{
> > +   uint32_t percent;
> > +
> > +   if (poll_stats->thresh[poll_stats->cur_freq].base_edpi == 0)
> > +           return;
> > +
> > +   percent = update_stats(poll_stats);
> > +
> > +   if (percent > 100)
> > +           return;
> > +
> > +   if (poll_stats->cur_freq == LOW)
> > +           RTE_LOG(INFO, POWER, "Purge Mode is not supported\n");
> > +   else if (poll_stats->cur_freq == MED) {
> > +
> > +           if (percent > poll_stats->thresh[poll_stats->cur_freq].
> > +                           threshold_percent) {
> > +
> > +                   if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> > +                           poll_stats->threshold_ctr++;
> > +                   else
> > +                           set_state(poll_stats, HGH_BUSY);
> > +
> > +           } else {
> > +                   /* reset */
> > +                   poll_stats->threshold_ctr = 0;
> > +           }
> > +
> > +   } else if (poll_stats->cur_freq == HGH) {
> > +
> > +           if (percent < poll_stats->thresh[poll_stats->cur_freq].
> > +                           threshold_percent) {
> > +
> > +                   if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> > +                           poll_stats->threshold_ctr++;
> > +                   else
> > +                           set_state(poll_stats, MED_NORMAL);
> > +           } else
> > +                   /* reset */
> > +                   poll_stats->threshold_ctr = 0;
> > +
> > +
> > +   }
> > +}
> > +
> > +static int
> > +empty_poll_trainning(struct priority_worker *poll_stats,
> > +           uint32_t max_train_iter) {
> > +
> > +   if (poll_stats->iter_counter < INTERVALS_PER_SECOND) {
> > +           poll_stats->iter_counter++;
> > +           return 0;
> > +   }
> > +
> > +
> > +   update_training_stats(poll_stats,
> > +                   LOW,
> > +                   false,
> > +                   max_train_iter);
> > +
> > +   update_training_stats(poll_stats,
> > +                   MED,
> > +                   false,
> > +                   max_train_iter);
> > +
> > +   update_training_stats(poll_stats,
> > +                   HGH,
> > +                   false,
> > +                   max_train_iter);
> > +
> > +
> > +   if (poll_stats->thresh[LOW].trained == true
> > +                   && poll_stats->thresh[MED].trained == true
> > +                   && poll_stats->thresh[HGH].trained == true) {
> > +
> > +           set_state(poll_stats, MED_NORMAL);
> > +
> > +           RTE_LOG(INFO, POWER, "Training is Complete for %d\n",
> > +                           poll_stats->lcore_id);
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static void
> > +empty_poll_detection(struct rte_timer *tim,
> > +           void *arg)
> > +{
> > +
> > +   uint32_t i;
> > +
> > +   struct priority_worker *poll_stats;
> > +
> > +   RTE_SET_USED(tim);
> > +
> > +   RTE_SET_USED(arg);
> > +
> > +   for (i = 0; i < NUM_NODES; i++) {
> > +
> > +           poll_stats = &(ep_params->wrk_data.wrk_stats[i]);
> > +
> > +           if (rte_lcore_is_enabled(poll_stats->lcore_id) == 0)
> > +                   continue;
> > +
> > +           switch (poll_stats->queue_state) {
> > +           case(TRAINING):
> > +                   empty_poll_trainning(poll_stats,
> > +                                   ep_params->max_train_iter);
> > +                   break;
> > +
> > +           case(HGH_BUSY):
> > +           case(MED_NORMAL):
> > +                   update_stats_normal(poll_stats);
> > +
> > +                   break;
> > +
> > +           case(LOW_PURGE):
> > +                   break;
> > +           default:
> > +                   break;
> > +
> > +           }
> > +
> > +   }
> > +
> > +}
> > +
> > +int
> > +rte_empty_poll_stat_init(void)
> > +{
> > +   uint32_t i;
> > +   /* Allocate the ep_params structure */
> > +   ep_params = rte_zmalloc_socket(NULL,
> > +                   sizeof(struct ep_params),
> > +                   0,
> > +                   rte_socket_id());
> > +
> > +   if (!ep_params)
> > +           rte_panic("Cannot allocate heap memory for ep_params "
> > +                           "for socket %d\n", rte_socket_id());
> > +
> > +   freq_index[LOW] = 14;
> > +   freq_index[MED] = 9;
> > +   freq_index[HGH] = 1;
> > +
> 
> Are these default frequency indexes? Can they be changed? Maybe say so in a
> comment.
> 
> > +   RTE_LOG(INFO, POWER, "Initialize the Empty Poll\n");
> > +
> > +   /* 5 seconds worth of training */
> > +   ep_params->max_train_iter = INTERVALS_PER_SECOND * SECONDS_TO_TRAIN_FOR;
> > +
> > +   struct stats_data *w = &ep_params->wrk_data;
> > +
> > +   /* initialize all wrk_stats state */
> > +   for (i = 0; i < NUM_NODES; i++) {
> > +
> > +           if (rte_lcore_is_enabled(i) == 0)
> > +                   continue;
> > +
> > +           set_state(&w->wrk_stats[i], TRAINING);
> > +           /*init the freqs table */
> > +           total_avail_freqs[i] = rte_power_freqs(i,
> > +                           avail_freqs[i],
> > +                           NUM_FREQS);
> > +
> > +           if (get_freq_index(LOW) > total_avail_freqs[i])
> > +                   return -1;
> > +
> > +   }
> > +
> > +
> > +   return 0;
> > +}
> > +
> > +void
> > +rte_empty_poll_stat_free(void)
> > +{
> > +
> > +   RTE_LOG(INFO, POWER, "Close the Empty Poll\n");
> > +
> > +   if (ep_params != NULL)
> > +           rte_free(ep_params);
> > +}
> > +
> > +int
> > +rte_empty_poll_stat_update(unsigned int lcore_id)
> > +{
> > +   struct priority_worker *poll_stats;
> > +
> > +   if (lcore_id >= NUM_NODES)
> > +           return -1;
> > +
> > +   poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> > +
> > +   if (poll_stats->lcore_id == 0)
> > +           poll_stats->lcore_id = lcore_id;
> > +
> > +   poll_stats->empty_dequeues++;
> > +
> > +   return 0;
> > +}
> > +
> > +int
> > +rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt)
> > +{
> > +
> > +   struct priority_worker *poll_stats;
> > +
> > +   if (lcore_id >= NUM_NODES)
> > +           return -1;
> > +
> > +   poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> > +
> > +   if (poll_stats->lcore_id == 0)
> > +           poll_stats->lcore_id = lcore_id;
> > +
> > +   poll_stats->num_dequeue_pkts += nb_pkt;
> > +
> > +   return 0;
> > +}
> > +
> > +
> > +uint64_t
> > +rte_empty_poll_stat_fetch(unsigned int lcore_id)
> > +{
> > +   struct priority_worker *poll_stats;
> > +
> > +   if (lcore_id >= NUM_NODES)
> > +           return -1;
> > +
> > +   poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> > +
> > +   if (poll_stats->lcore_id == 0)
> > +           poll_stats->lcore_id = lcore_id;
> > +
> > +   return poll_stats->empty_dequeues;
> > +}
> > +
> > +uint64_t
> > +rte_poll_stat_fetch(unsigned int lcore_id)
> > +{
> > +   struct priority_worker *poll_stats;
> > +
> > +   if (lcore_id >= NUM_NODES)
> > +           return -1;
> > +
> > +   poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> > +
> > +   if (poll_stats->lcore_id == 0)
> > +           poll_stats->lcore_id = lcore_id;
> > +
> > +   return poll_stats->num_dequeue_pkts;
> > +}
> > +
> > +void
> > +rte_empty_poll_set_freq(enum freq_val index, uint32_t limit)
> > +{
> > +   switch (index) {
> > +
> > +   case LOW:
> > +           freq_index[LOW] = limit;
> > +           break;
> > +
> > +   case MED:
> > +           freq_index[MED] = limit;
> > +           break;
> > +
> > +   case HGH:
> > +           freq_index[HGH] = limit;
> > +           break;
> > +   default:
> > +           break;
> > +   }
> > +}
> > +
> > +void
> > +rte_empty_poll_setup_timer(void)
> > +{
> > +   int lcore_id = rte_lcore_id();
> > +   uint64_t hz = rte_get_timer_hz();
> > +
> > +   struct  ep_params *ep_ptr = ep_params;
> > +
> > +   ep_ptr->interval_ticks = hz / INTERVALS_PER_SECOND;
> > +
> > +   rte_timer_reset_sync(&ep_ptr->timer0,
> > +                   ep_ptr->interval_ticks,
> > +                   PERIODICAL,
> > +                   lcore_id,
> > +                   empty_poll_detection,
> > +                   (void *)ep_ptr);
> > +
> > +}
> > diff --git a/lib/librte_power/rte_empty_poll.h 
> > b/lib/librte_power/rte_empty_poll.h
> > new file mode 100644
> > index 0000000..7e036ee
> > --- /dev/null
> > +++ b/lib/librte_power/rte_empty_poll.h
> > @@ -0,0 +1,135 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2018 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_EMPTY_POLL_H
> > +#define _RTE_EMPTY_POLL_H
> > +
> > +/**
> > + * @file
> > + * RTE Power Management
> > + */
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +#include <sys/queue.h>
> > +
> > +#include <rte_common.h>
> > +#include <rte_byteorder.h>
> > +#include <rte_log.h>
> > +#include <rte_string_fns.h>
> > +#include <rte_power.h>
> > +#include <rte_timer.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#define NUM_FREQS 20
> 
> Should probably be RTE_MAX_LCORE_FREQS
> 
> > +
> > +#define BINS_AV 4 /* has to be ^2 */
> > +
> > +#define DROP (NUM_DIRECTIONS * NUM_DEVICES)
> > +
> > +#define NUM_PRIORITIES          2
> > +
> > +#define NUM_NODES         31 /*any reseanable prime number should work*/
> > +
> > +
> > +enum freq_val {
> > +   LOW,
> > +   MED,
> > +   HGH,
> 
> HGH should be HIGH. Where we see this on it's own in the code, it's not
> obvious that it's related to MEDIUM and LOW.  Also, maybe MED should be
> MEDIUM.
> 
> Would suggest prefixing these with FREQ_. Makes the code more readable.
> 
> > +   NUM_FREQ = NUM_FREQS
> 
> This should probably be the number of elements in freq_val rather than
> NUM_FREQS
> 
> I think there is some confusion in the code about the number of frequencies.
> Any array where it's holding all the possible frequencies should use
> RTE_MAX_LCORE_FREQS.
> But it looks ti me that the freq_index array only holds three values, the
> indexes for Low, Medium and High.
> 
> 
> > +};
> > +
> > +
> > +enum queue_state {
> > +   TRAINING, /* NO TRAFFIC */
> > +   MED_NORMAL,   /* MED */
> > +   HGH_BUSY,     /* HIGH */
> > +   LOW_PURGE,    /* LOW */
> > +};
> 
> I'm not sure that the NORMAL, BUSY and PURGE words add any value here. How
> about MEDIUM, HIGH and LOW?
> 
> > +
> > +/* queue stats */
> > +struct freq_threshold {
> > +
> > +   uint64_t base_edpi;
> > +   bool trained;
> > +   uint32_t threshold_percent;
> > +   uint32_t cur_train_iter;
> > +};
> > +
> > +
> > +struct priority_worker {
> > +
> > +   /* Current dequeue and throughput counts */
> > +   /* These 2 are written to by the worker threads */
> > +   /* So keep them on their own cache line */
> > +   uint64_t empty_dequeues;
> > +   uint64_t num_dequeue_pkts;
> > +
> > +   enum queue_state queue_state;
> > +
> > +   uint64_t empty_dequeues_prev;
> > +   uint64_t num_dequeue_pkts_prev;
> > +
> > +   /* Used for training only */
> > +   struct freq_threshold thresh[NUM_FREQ];
> > +   enum freq_val cur_freq;
> > +
> > +   /* bucket arrays to calculate the averages */
> > +   uint64_t edpi_av[BINS_AV];
> > +   uint32_t  ec;
> > +   uint64_t ppi_av[BINS_AV];
> > +   uint32_t  pc;
> 
> Suggest a comment per line explaining what the names of these variables
> mean. edpi, ppi, etc.
> 
> 
> > +
> > +   uint32_t lcore_id;
> > +   uint32_t iter_counter;
> > +   uint32_t threshold_ctr;
> > +   uint32_t display_ctr;
> > +   uint8_t  dev_id;
> > +
> > +} __rte_cache_aligned;
> > +
> > +
> > +struct stats_data {
> > +
> > +   struct priority_worker wrk_stats[NUM_NODES];
> > +
> > +   /* flag to stop rx threads processing packets until training over */
> > +   bool start_rx;
> > +
> > +};
> > +
> > +struct ep_params {
> > +
> > +   /* timer related stuff */
> > +   uint64_t interval_ticks;
> > +   uint32_t max_train_iter;
> > +   struct rte_timer timer0;
> > +
> > +   struct stats_data wrk_data;
> > +};
> > +
> > +
> > +int rte_empty_poll_stat_init(void);
> > +
> > +void rte_empty_poll_stat_free(void);
> > +
> > +int rte_empty_poll_stat_update(unsigned int lcore_id);
> > +
> > +int rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> > +
> > +uint64_t rte_empty_poll_stat_fetch(unsigned int lcore_id);
> > +
> > +uint64_t rte_poll_stat_fetch(unsigned int lcore_id);
> > +
> > +void rte_empty_poll_set_freq(enum freq_val index, uint32_t limit);
> > +
> > +void rte_empty_poll_setup_timer(void);
> 
> All the function prototypes need documentation. Please see rte_power.h.
> And please makes sure that Doxygen documentation generates correctly from
> the comments.
> Suggest prefixing the functions with rte_power_
> 
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif
> 
> A few more comments throughout the code would be good.
> 
> 
> 

Reply via email to