service: add perf measurements for with stats mode

Mattias Rönnblom Fri, 02 Sep 2022 10:18:23 -0700

On 2022-07-11 15:18, Harry van Haaren wrote:

This commit improves the performance reporting of the service
cores polling loop to show both with and without statistics
collection modes. Collecting cycle statistics is costly, due
to calls to rte_rdtsc() per service iteration.

That is true for a service deployed on only a single core. For multi-core services, non-rdtsc-related overhead dominates. For example, if the service is deployed on 11 cores, the extra statistics-related overhead is ~1000 cc/service call on x86_64. 2x rdtsc shouldn't be more than ~50 cc.


Reported-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
Suggested-by: Morten Brørup <m...@smartsharesystems.com>
Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>

---

This is split out as a separate patch from the fix to allow
measuring the before/after of the service stats atomic fixup.
---
  app/test/test_service_cores.c | 36 ++++++++++++++++++++++++-----------
  1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index ced6ed0081..7415b6b686 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -777,6 +777,22 @@ service_run_on_app_core_func(void *arg)
        return rte_service_run_iter_on_app_lcore(*delay_service_id, 1);
  }

+static float
+service_app_lcore_perf_measure(uint32_t id)
+{
+       /* Performance test: call in a loop, and measure tsc() */
+       const uint32_t perf_iters = (1 << 12);
+       uint64_t start = rte_rdtsc();
+       uint32_t i;
+       for (i = 0; i < perf_iters; i++) {
+               int err = service_run_on_app_core_func(&id);

In a real-world scenario, the latency of this function isn't representative of the overall service core overhead.

For example, consider a scenario where an lcore has a single service mapped to it. rte_service.c will call service_run() 64 times, but only one of those calls will be a "hit" that actually runs the service. One iteration of the service loop costs ~600 cc on a machine where this performance benchmark reports 128 cc. (Both with statistics disabled.)

For low-latency services, this is a significant overhead.

+               TEST_ASSERT_EQUAL(0, err, "perf test: returned run failure");
+       }
+       uint64_t end = rte_rdtsc();
+
+       return (end - start)/(float)perf_iters;
+}
+
  static int
  service_app_lcore_poll_impl(const int mt_safe)
  {
@@ -828,17 +844,15 @@ service_app_lcore_poll_impl(const int mt_safe)
                                "MT Unsafe: App core1 didn't return -EBUSY");
        }

-       /* Performance test: call in a loop, and measure tsc() */
-       const uint32_t perf_iters = (1 << 12);
-       uint64_t start = rte_rdtsc();
-       uint32_t i;
-       for (i = 0; i < perf_iters; i++) {
-               int err = service_run_on_app_core_func(&id);
-               TEST_ASSERT_EQUAL(0, err, "perf test: returned run failure");
-       }
-       uint64_t end = rte_rdtsc();
-       printf("perf test for %s: %0.1f cycles per call\n", mt_safe ?
-               "MT Safe" : "MT Unsafe", (end - start)/(float)perf_iters);
+       /* Measure performance of no-stats and with-stats. */
+       float cyc_no_stats = service_app_lcore_perf_measure(id);
+
+       TEST_ASSERT_EQUAL(0, rte_service_set_stats_enable(id, 1),
+                               "failed to enable stats for service.");
+       float cyc_with_stats = service_app_lcore_perf_measure(id);
+
+       printf("perf test for %s, no stats: %0.1f, with stats %0.1f cycles/call\n",
+               mt_safe ? "MT Safe" : "MT Unsafe", cyc_no_stats, cyc_with_stats);

        unregister_all();

        return TEST_SUCCESS;

Re: [PATCH v3 1/2] test/service: add perf measurements for with stats mode
