Re: [PATCH v4 3/7] eal: add lcore variable performance test

Mattias Rönnblom Mon, 16 Sep 2024 09:13:02 -0700

On 2024-09-16 13:54, Morten Brørup wrote:

From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Monday, 16 September 2024 13.13


On 2024-09-16 12:52, Mattias Rönnblom wrote:

Add basic micro benchmark for lcore variables, in an attempt to assure
that the overhead isn't significantly greater than alternative
approaches, in scenarios where the benefits aren't expected to show up
(i.e., when plenty of cache is available compared to the working set
size of the per-lcore data).


Here are some test results for a Raptor Cove @ 3.2 GHz (GCC 11):

   + ------------------------------------------------------- +
   + Test Suite : lcore variable perf autotest
   + ------------------------------------------------------- +
Latencies [TSC cycles/update]
Modules/Variables  Static array  Thread-local Storage  Lcore variables
                  1           3.9           5.5              3.7
                  2           3.8           5.5              3.8
                  4           4.9           5.5              3.7
                  8           3.8           5.5              3.8
                 16          11.3           5.5              3.7
                 32          20.9           5.5              3.7
                 64          23.5           5.5              3.7
                128          23.2           5.5              3.7
                256          23.5           5.5              3.7
                512          24.1           5.5              3.7
               1024          25.3           5.5              3.9
   + TestCase [ 0] : test_lcore_var_access succeeded
   + ------------------------------------------------------- +


The reason for TLS being slower than lcore variables (which in turn
rely on TLS for lcore id lookup) is the lazy initialization conditional
that is imposed on the TLS variant. If that could be avoided (which is
module-dependent, I suppose), TLS would beat lcore variables at ~3.0
cycles/update.
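
To make the lazy initialization conditional concrete, here is a minimal
sketch in plain C11. It is not the benchmark code from the patch: the
struct and the function names (mod_state, lazy_update, eager_init,
eager_update) are hypothetical, and _Thread_local stands in for
RTE_PER_LCORE. The only point is that the lazy variant carries a branch
on every update, while the non-lazy variant does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread module state; not taken from the patch. */
struct mod_state {
	bool initialized;
	uint64_t counter;
};

static _Thread_local struct mod_state lazy_state;
static _Thread_local struct mod_state eager_state;

/* Lazy variant: every update first tests the per-thread init flag. */
static inline uint64_t
lazy_update(void)
{
	if (!lazy_state.initialized) {
		lazy_state.counter = 0; /* stand-in for real setup work */
		lazy_state.initialized = true;
	}
	return ++lazy_state.counter;
}

/* Non-lazy variant: must be called once on each thread before updates. */
static inline void
eager_init(void)
{
	eager_state.counter = 0;
	eager_state.initialized = true;
}

/* Non-lazy update: no conditional on the fast path. */
static inline uint64_t
eager_update(void)
{
	return ++eager_state.counter;
}
```

Whether the branch accounts for the whole difference in the table is
not something this sketch can show; it only illustrates where the
conditional sits.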


I think you should not assume lazy initialization of TLS in your benchmark.
Our application uses TLS, and when spinning up a new thread, we call a
per-lcore init function of each module before calling the per-lcore run
function. This design pattern is also described in Figure 1.4 [1] in the 
Programmer's Guide.

[1]: https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html
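
The init-then-run pattern described above could look roughly like the
following sketch. Everything in it is an assumption made for
illustration: the module_ops struct, the two toy modules and
worker_main are hypothetical names, and plain C11 _Thread_local is used
instead of any DPDK macro. The worker entry point has the
int (*)(void *) shape of an lcore function, but no DPDK API is called.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module hooks; not a DPDK structure. */
struct module_ops {
	void (*lcore_init)(void); /* called once per thread, before run */
	void (*lcore_run)(void);  /* fast path, assumes init was done */
};

static _Thread_local uint64_t mod_a_count;
static _Thread_local uint64_t mod_b_count;

static void mod_a_init(void) { mod_a_count = 0; }
static void mod_a_run(void) { mod_a_count += 1; }
static void mod_b_init(void) { mod_b_count = 100; }
static void mod_b_run(void) { mod_b_count += 2; }

static const struct module_ops modules[] = {
	{ mod_a_init, mod_a_run },
	{ mod_b_init, mod_b_run },
};

#define N_MODULES (sizeof(modules) / sizeof(modules[0]))

/* Thread entry point: initialize every module, then run each one once. */
static int
worker_main(void *arg)
{
	size_t i;

	(void)arg;

	for (i = 0; i < N_MODULES; i++)
		modules[i].lcore_init();

	/* A real worker would loop here; one pass keeps the sketch short. */
	for (i = 0; i < N_MODULES; i++)
		modules[i].lcore_run();

	return 0;
}
```

With this structure the run functions need no "initialized?" check, at
the cost of requiring every thread to go through worker_main (or an
equivalent) before touching module state.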

Per-lcore init functions may or may not be an option, depending on what
API you need to adhere to. But maybe I should add a non-lazy TLS variant
as well.

I should probably add some information on lcore variables in the EAL
programmer's guide as well.

Non-lazy TLS would be a more viable option if there were proper
framework support for it. Now, I'm not sure there is a better way to do
it in a DPDK library than how it's done for tracing, where there's an
explicit call per thread created. Other DPDK-internal users of
RTE_PER_LCORE seem to depend on lazy initialization.


I must say I'm surprised to see lcore variables doing this well, at
these very modest working set sizes. Probably, you can stay at near-zero
L1 misses with lcore variables (and TLS), but start missing the L1 with
static arrays.
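
A rough way to reason about the working set is to count how many cache
lines a single lcore touches when it updates one variable in each of N
modules. The sketch below does only that arithmetic, under assumptions
that are not stated in this thread: 64-byte cache lines, a static-array
layout where each module pads its per-lcore slot to a full cache line,
and a packed layout where one lcore's values from all modules sit back
to back. The function names are hypothetical.

```c
#include <stddef.h>

#define CACHE_LINE 64u /* assumed cache line size in bytes */

/*
 * Assumed static-array layout: each module owns an array with one
 * cache-line-padded slot per lcore, so a given lcore touches at least
 * one distinct line per module.
 */
static size_t
lines_static_array(size_t n_modules, size_t var_size)
{
	size_t lines_per_slot = (var_size + CACHE_LINE - 1) / CACHE_LINE;

	return n_modules * lines_per_slot;
}

/*
 * Assumed packed layout: one lcore's values for all modules are stored
 * contiguously, so several modules share a cache line.
 */
static size_t
lines_packed(size_t n_modules, size_t var_size)
{
	return (n_modules * var_size + CACHE_LINE - 1) / CACHE_LINE;
}
```

For 1024 modules with 8-byte variables this gives 1024 lines for the
padded layout versus 128 for the packed one. It counts lines only and
does not model cache associativity or prefetching, so it cannot by
itself explain where in the table the static-array numbers start to
climb.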
