> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Monday, 16 September 2024 18.13
>
> On 2024-09-16 13:54, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Monday, 16 September 2024 13.13
> >>
> >> The reason for TLS being slower than lcore variables (which in turn
> >> rely on TLS for lcore id lookup) is the lazy initialization
> >> conditional that is imposed on that variant. Could that be avoided (which is
> >> module-dependent, I suppose), it beats lcore variables at ~3.0 cycles/update.
> >
> > I think you should not assume lazy initialization of TLS in your benchmark.
> > Our application uses TLS, and when spinning up a new thread, we call
> > a per-lcore init function of each module before calling the per-lcore
> > run function. This design pattern is also described in Figure 1.4 [1] in
> > the Programmer's Guide.
> >
> > [1]: https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html
> >
>
> Per-lcore init functions may be an option, and also may not, depending
> on what API you need to adhere to. But maybe I should add a non-lazy TLS
> variant as well.
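The two TLS variants being compared can be sketched roughly as follows. This is a minimal sketch in plain C11 using `_Thread_local` rather than DPDK's `RTE_DEFINE_PER_LCORE()`; the `module_state` struct and the `lazy_*`/`eager_*` functions are hypothetical and not taken from the actual benchmark. It only illustrates where the lazy-initialization conditional sits on the update path, and how a per-lcore init function moves that work out of it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread module state. The buffer is heap-allocated, so it
 * cannot be set up by a static initializer and must be initialized per thread.
 * (Freeing it at thread exit is omitted for brevity.) */
struct module_state {
	uint64_t *stats;
};

/* Lazy variant: every update first checks whether this thread's state exists. */
static _Thread_local struct module_state lazy_state;

static uint64_t lazy_update(void)
{
	if (lazy_state.stats == NULL) { /* conditional paid on every update */
		lazy_state.stats = calloc(1, sizeof(*lazy_state.stats));
		if (lazy_state.stats == NULL)
			abort();
	}
	return ++*lazy_state.stats;
}

/* Non-lazy variant: a per-lcore init function is called once from within the
 * new thread, before the per-lcore run function, so updates need no branch. */
static _Thread_local struct module_state eager_state;

static void eager_init(void)
{
	eager_state.stats = calloc(1, sizeof(*eager_state.stats));
	if (eager_state.stats == NULL)
		abort();
}

static uint64_t eager_update(void)
{
	/* Precondition check only; compiled out when NDEBUG is defined. */
	assert(eager_state.stats != NULL);
	return ++*eager_state.stats;
}
```

In the non-lazy variant, each thread must call `eager_init()` before its first `eager_update()`, which is the same init-then-run ordering as the per-lcore pattern described in the quoted text.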
Certainly. Both, or just non-lazy, is fine with me.

>
> I should probably add some information on lcore variables in the EAL
> programmer's guide as well.

+1

>
> Non-lazy TLS would be a more viable option if there were proper
> framework support for it.

The framework should provide RTE_LCORE_INIT macros for modules to define per-lcore init functions, which EAL should call when it creates additional threads. They should obviously be called from within the newly created thread, not from the main thread. And if a per-lcore init function only needs to do its work for worker threads, it can check the thread type as the first thing.

> Now, I'm not sure there is a better way to do
> it in a DPDK library than how it's done for tracing, where there's an
> explicit call per thread created. Other DPDK-internal users of
> RTE_PER_LCORE seem to depend on lazy initialization.

The framework lacks the per-thread init feature, so it's implemented differently in different modules. Don't get distracted by how the trace module does it. Just imagine the framework offering some generic mechanism to do it.
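DPDK's existing `RTE_INIT()` constructor mechanism suggests one way such a generic mechanism could look. Below is a minimal sketch in plain C, assuming the GCC/Clang `constructor` attribute. `LCORE_INIT`, `enum lcore_thread_type`, `lcore_thread_entry()` and the example module are hypothetical names for illustration only, not existing EAL APIs; in a real implementation, EAL would run the entry wrapper at the start of each thread it creates, from within that thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical thread types the framework passes to each init function. */
enum lcore_thread_type { THREAD_TYPE_MAIN, THREAD_TYPE_WORKER, THREAD_TYPE_SERVICE };

typedef void (*lcore_init_fn)(enum lcore_thread_type type);

#define MAX_LCORE_INITS 64
static lcore_init_fn lcore_inits[MAX_LCORE_INITS];
static size_t num_lcore_inits;

/* Called from constructors before main(), while still single-threaded. */
static void lcore_init_register(lcore_init_fn fn)
{
	if (num_lcore_inits >= MAX_LCORE_INITS)
		abort();
	lcore_inits[num_lcore_inits++] = fn;
}

/* Hypothetical macro in the spirit of the proposed RTE_LCORE_INIT: declares
 * fn, registers it from a constructor, and opens fn's definition. */
#define LCORE_INIT(fn) \
	static void fn(enum lcore_thread_type type); \
	static void __attribute__((constructor)) fn##_register(void) \
	{ \
		lcore_init_register(fn); \
	} \
	static void fn(enum lcore_thread_type type)

/* What the framework's thread entry would do, inside the new thread: run all
 * registered per-lcore init functions, then the per-lcore run function. */
static int lcore_thread_entry(enum lcore_thread_type type,
			      int (*run)(void *), void *arg)
{
	for (size_t i = 0; i < num_lcore_inits; i++)
		lcore_inits[i](type);
	return run(arg);
}

/* Example module with worker-only per-thread state. */
static _Thread_local int my_module_ready;

LCORE_INIT(my_module_lcore_init)
{
	if (type != THREAD_TYPE_WORKER) /* check the thread type first */
		return;
	my_module_ready = 1;
}

static int my_module_lcore_run(void *arg)
{
	(void)arg;
	return my_module_ready ? 0 : -1; /* no lazy-init conditional needed */
}
```

Because registration happens in constructors before `main()` and the registry is only read afterwards, the per-thread path needs no locking, and modules no longer each need their own ad hoc per-thread init scheme.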