On 26/08/2022 23:06, Mattias Rönnblom wrote:
On 2022-08-25 17:28, Kevin Laatz wrote:
From: Anatoly Burakov <anatoly.bura...@intel.com>

Currently, there is no way to measure lcore poll busyness in a passive way, without any modifications to the application. This patch adds a new EAL API
that will be able to passively track core polling busyness.

There's no generic way, but the DSW event device keeps track of lcore utilization (i.e., the fraction of cycles used to perform actual work, as opposed to just polling empty queues), and it does so with the same basic principles as, from what it seems after a quick look, are used in this patch.


The poll busyness is calculated by relying on the fact that most DPDK APIs
will poll for packets. Empty polls can be counted as "idle", while

Lcore worker threads poll for work. Packets, timeouts, completions, event device events, etc.

Yes, the wording was too restrictive here - the patch includes changes to drivers and libraries such as dmadev, eventdev, ring, etc., that poll for work and would want to mark it as "idle" or "busy".



non-empty polls can be counted as busy. To measure lcore poll busyness, we

I guess what is meant here is that cycles spent after non-empty polls can be counted as busy (useful) cycles? Potentially including the cycles spent for the actual poll operation. ("Poll busyness" is a very vague term, in my opinion.)

Similarly, cycles spent after an empty poll would not be counted.

Correct, the generic functionality works this way. Any cycles between a "busy poll" and the next "idle poll" will be counted as busy/useful work (and vice versa).



simply call the telemetry timestamping function with the number of polls a particular code section has processed, and count the number of cycles we've spent processing empty bursts. The more empty bursts we encounter, the fewer
cycles we spend in the "busy" state, and the lower the reported core poll busyness
will be.


Is this the same scheme as DSW? Where a non-zero burst in the idle state means a transition from idle to busy, and a zero-burst poll in the busy state means a transition from busy to idle?

The issue with this scheme is that you might potentially end up with a state transition for every iteration of the application's main loop, if packets (or other items of work) only come in on one of the lcore's potentially many RX queues (or other input queues, such as eventdev ports). That means an rdtsc for every loop, which isn't too bad, but still might be noticeable.

An application that gathers items of work from multiple sources before actually doing anything breaks this model. For example, consider an lcore worker owning two RX queues, performing rte_eth_rx_burst() on both before attempting to process any of the received packets. If the last poll is empty, the cycles spent will be considered idle, even though they were busy.

An lcore worker might also decide to poll the same RX queue multiple times (until it hits an empty poll, or reaches some upper bound) before initiating processing of the packets.

Yes, more complex applications will need to be modified to gain a more fine-grained busyness metric. In order to achieve this level of accuracy, application context is required. The 'RTE_LCORE_POLL_BUSYNESS_TIMESTAMP()' macro can be used within the application to mark sections as "busy" or "not busy". Using your example above, the application could keep track of multiple bursts (whether they have work or not) and call the macro before initiating the processing, to signal that there is, in fact, work to be done. Something along the lines of the sketch below, for instance.
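(BURST_SIZE, port and process_packets() are placeholders here; the assumption is that the macro takes the number of items just gathered.)

struct rte_mbuf *bufs0[BURST_SIZE], *bufs1[BURST_SIZE];
uint16_t nb_rx0, nb_rx1;

nb_rx0 = rte_eth_rx_burst(port, 0, bufs0, BURST_SIZE);
nb_rx1 = rte_eth_rx_burst(port, 1, bufs1, BURST_SIZE);

/* mark the section busy if either queue returned work, rather than
 * relying on the per-poll accounting inside the instrumented drivers
 */
RTE_LCORE_POLL_BUSYNESS_TIMESTAMP(nb_rx0 + nb_rx1);

process_packets(bufs0, nb_rx0);
process_packets(bufs1, nb_rx1);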

There's a section in the documentation update in this patchset that describes it. It might need more work if it's not clear :-)



I didn't read your code in detail, so I might be jumping to conclusions.

In order for all of the above to work without modifications to the
application, the library code needs to be instrumented with calls to the
lcore telemetry busyness timestamping function. The following parts of DPDK
are instrumented with lcore telemetry calls:

- All major driver APIs:
   - ethdev
   - cryptodev
   - compressdev
   - regexdev
   - bbdev
   - rawdev
   - eventdev
   - dmadev
- Some additional libraries:
   - ring
   - distributor

In the past, I've suggested this kind of functionality should go into the service framework instead, with the service function explicitly signaling whether or not the cycles were spent on something useful.

That seems to me like a more straightforward and more accurate solution, but it does require the application to deploy everything as services, and also requires a change of the service function signature.
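Something along these lines, hypothetically (this is not existing DPDK API; today's service callback takes only a void * argument and returns an int32_t):

/* hypothetical extension of the service callback, letting the service
 * itself report whether its cycles were spent on useful work
 */
enum rte_service_work {
    RTE_SERVICE_WORK_IDLE, /* polled, found nothing to do */
    RTE_SERVICE_WORK_BUSY, /* performed useful work */
};

typedef int32_t (*rte_service_func_v2)(void *args,
                                       enum rte_service_work *work);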


To avoid performance impact from having lcore telemetry support, a global variable is exported by EAL, and a call to the timestamping function is wrapped
in a macro, so that whenever telemetry is disabled, it only takes one

Use a static inline function if you don't need the additional expressive power of a macro.

I suggest you also mention the performance implications, when this function is enabled.

Sure, I can add a note in the next revision.



additional branch and no function calls are performed. It is also possible
to disable it at compile time by commenting out RTE_LCORE_POLL_BUSYNESS in the
build config.
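For illustration, the disabled path described above amounts to a wrapper of roughly this shape (the internal function name below is an assumption, not the patch's actual symbol):

extern int __rte_lcore_telemetry_enabled;

#define RTE_LCORE_POLL_BUSYNESS_TIMESTAMP(nb_items) do {        \
    /* single branch when disabled, no function call is made */ \
    if (__rte_lcore_telemetry_enabled)                           \
        __rte_lcore_poll_busyness_timestamp(nb_items);           \
} while (0)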

This patch also adds a telemetry endpoint to report lcore poll busyness, as
well as telemetry endpoints to enable/disable lcore telemetry. A
documentation entry has been added to the howto guides to explain the usage
of the new telemetry endpoints and API.


Should there really be a dependency from the EAL to the telemetry library? That creates a cycle. Maybe some dependency inversion would be in order? The telemetry library could instead register an interest in getting busy/idle cycle reports from lcores.

Signed-off-by: Kevin Laatz <kevin.la...@intel.com>
Signed-off-by: Conor Walsh <conor.wa...@intel.com>
Signed-off-by: David Hunt <david.h...@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>

---
v3:
   * Fix missed renaming to poll busyness
   * Fix clang compilation
   * Fix arm compilation

v2:
   * Use rte_get_tsc_hz() to adjust the telemetry period
   * Rename to reflect polling busyness vs general busyness
   * Fix segfault when calling telemetry timestamp from an unregistered
     non-EAL thread.
   * Minor cleanup
---
  config/meson.build                          |   1 +
  config/rte_config.h                         |   1 +
  lib/bbdev/rte_bbdev.h                       |  17 +-
  lib/compressdev/rte_compressdev.c           |   2 +
  lib/cryptodev/rte_cryptodev.h               |   2 +
  lib/distributor/rte_distributor.c           |  21 +-
  lib/distributor/rte_distributor_single.c    |  14 +-
  lib/dmadev/rte_dmadev.h                     |  15 +-
  lib/eal/common/eal_common_lcore_telemetry.c | 293 ++++++++++++++++++++
  lib/eal/common/meson.build                  |   1 +
  lib/eal/include/rte_lcore.h                 |  80 ++++++
  lib/eal/meson.build                         |   3 +
  lib/eal/version.map                         |   7 +
  lib/ethdev/rte_ethdev.h                     |   2 +
  lib/eventdev/rte_eventdev.h                 |  10 +-
  lib/rawdev/rte_rawdev.c                     |   6 +-
  lib/regexdev/rte_regexdev.h                 |   5 +-
  lib/ring/rte_ring_elem_pvt.h                |   1 +
  meson_options.txt                           |   2 +
  19 files changed, 459 insertions(+), 24 deletions(-)
  create mode 100644 lib/eal/common/eal_common_lcore_telemetry.c

<snip>
diff --git a/lib/eal/common/eal_common_lcore_telemetry.c b/lib/eal/common/eal_common_lcore_telemetry.c
new file mode 100644
index 0000000000..bba0afc26d
--- /dev/null
+++ b/lib/eal/common/eal_common_lcore_telemetry.c
@@ -0,0 +1,293 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <unistd.h>
+#include <limits.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_errno.h>
+#include <rte_lcore.h>
+
+#ifdef RTE_LCORE_POLL_BUSYNESS
+#include <rte_telemetry.h>
+#endif
+
+int __rte_lcore_telemetry_enabled;

Is "telemetry" really the term to use here? Isn't this just another piece of statistics? It can be used for telemetry, or in some other fashion.

(Use bool not int.)

Can rename to '__rte_lcore_stats_enabled' in next revision.



+
+#ifdef RTE_LCORE_POLL_BUSYNESS
+
+struct lcore_telemetry {
+    int poll_busyness;
+    /**< Calculated poll busyness (gets set/returned by the API) */
+    int raw_poll_busyness;
+    /**< Calculated poll busyness times 100. */
+    uint64_t interval_ts;
+    /**< when previous telemetry interval started */
+    uint64_t empty_cycles;
+    /**< empty cycle count since last interval */
+    uint64_t last_poll_ts;
+    /**< last poll timestamp */
+    bool last_empty;
+    /**< if last poll was empty */
+    unsigned int contig_poll_cnt;
+    /**< contiguous (always empty/non empty) poll counter */
+} __rte_cache_aligned;
+
+static struct lcore_telemetry *telemetry_data;
+
+#define LCORE_POLL_BUSYNESS_MAX 100
+#define LCORE_POLL_BUSYNESS_NOT_SET -1
+#define LCORE_POLL_BUSYNESS_MIN 0
+
+#define SMOOTH_COEFF 5
+#define STATE_CHANGE_OPT 32
+
+static void lcore_config_init(void)
+{
+    int lcore_id;
+
+    telemetry_data = calloc(RTE_MAX_LCORE, sizeof(telemetry_data[0]));
+    if (telemetry_data == NULL)
+        rte_panic("Could not init lcore telemetry data: Out of memory\n");
+
+    RTE_LCORE_FOREACH(lcore_id) {
+        struct lcore_telemetry *td = &telemetry_data[lcore_id];
+
+        td->interval_ts = 0;
+        td->last_poll_ts = 0;
+        td->empty_cycles = 0;
+        td->last_empty = true;
+        td->contig_poll_cnt = 0;
+        td->poll_busyness = LCORE_POLL_BUSYNESS_NOT_SET;
+        td->raw_poll_busyness = 0;
+    }
+}
+
+int rte_lcore_poll_busyness(unsigned int lcore_id)
+{
+    const uint64_t active_thresh = rte_get_tsc_hz() * RTE_LCORE_POLL_BUSYNESS_PERIOD_MS;
+    struct lcore_telemetry *tdata;
+
+    if (lcore_id >= RTE_MAX_LCORE)
+        return -EINVAL;
+    tdata = &telemetry_data[lcore_id];
+
+    /* if the lcore is not active */
+    if (tdata->interval_ts == 0)
+        return LCORE_POLL_BUSYNESS_NOT_SET;
+    /* if the core hasn't been active in a while */
+    else if ((rte_rdtsc() - tdata->interval_ts) > active_thresh)
+        return LCORE_POLL_BUSYNESS_NOT_SET;
+
+    /* this core is active, report its poll busyness */
+    return telemetry_data[lcore_id].poll_busyness;
+}
+
+int rte_lcore_poll_busyness_enabled(void)
+{
+    return __rte_lcore_telemetry_enabled;
+}
+
+void rte_lcore_poll_busyness_enabled_set(int enable)

Use bool.

+{
+    __rte_lcore_telemetry_enabled = !!enable;

!!Another reason to use bool!! :)

If you're allowed to call this function during operation, you'll need an atomic store here (and an atomic load on the read side).

Ack



+
+    if (!enable)
+        lcore_config_init();
+}
+
+static inline int calc_raw_poll_busyness(const struct lcore_telemetry *tdata,
+                    const uint64_t empty, const uint64_t total)
+{
+    /*
+     * we don't want to use floating point math here, but we want for our poll
+     * busyness to react smoothly to sudden changes, while still keeping the
+     * accuracy and making sure that over time the average follows poll busyness
+     * as measured just-in-time. therefore, we will calculate the average poll
+     * busyness using integer math, but shift the decimal point two places
+     * to the right, so that 100.0 becomes 10000. this allows us to report
+     * integer values (0..100) while still allowing ourselves to follow the
+     * just-in-time measurements when we calculate our averages.
+     */
+    const int max_raw_idle = LCORE_POLL_BUSYNESS_MAX * 100;
+

Why not just store/manage the number of busy (or idle, or both) cycles? Then the user can decide what time period to average over, to what extent the lcore utilization from previous periods should be factored in, etc.

There's an option 'RTE_LCORE_POLL_BUSYNESS_PERIOD_MS' added to rte_config.h which would allow the user to define the time period over which the utilization should be reported. We only do this calculation if that time interval has elapsed.



In DSW, I initially presented only a load statistic (which averaged over 250 us, with some contribution from previous periods). I later came to realize that just exposing the number of busy cycles left the calling application with many more options. For example, to present the average load during 1 s, you needed some control thread sampling the load statistic during that time period; whereas once the busy cycles statistic was introduced, it just had to read that value twice (at the beginning of the period and at the end) and compare it with the amount of wallclock time passed.
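With a raw busy-cycle counter, a monitoring thread can derive utilization over any window it likes, e.g. (rte_lcore_busy_cycles() is a hypothetical accessor, not existing API):

/* sample the counter twice and compare against the wallclock TSC delta;
 * rte_lcore_busy_cycles() is hypothetical
 */
uint64_t busy_start = rte_lcore_busy_cycles(lcore_id);
uint64_t tsc_start = rte_rdtsc();

sleep(1); /* whatever window the caller chooses */

uint64_t busy_end = rte_lcore_busy_cycles(lcore_id);
uint64_t tsc_end = rte_rdtsc();

double utilization = (double)(busy_end - busy_start) /
                     (double)(tsc_end - tsc_start);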

<snip>
