+CC: EAL and Memory maintainers.

> From: Don Wallwork [mailto:d...@xsightlabs.com]
> Sent: Tuesday, 26 April 2022 23.26
>
> On 4/26/2022 5:21 PM, Stephen Hemminger wrote:
> > On Tue, 26 Apr 2022 17:01:18 -0400
> > Don Wallwork <d...@xsightlabs.com> wrote:
> >
> >> On 4/26/2022 10:58 AM, Stephen Hemminger wrote:
> >>> On Tue, 26 Apr 2022 08:19:59 -0400
> >>> Don Wallwork <d...@xsightlabs.com> wrote:
> >>>
> >>>> Add support for using hugepages for worker lcore stack memory. The
> >>>> intent is to improve performance by reducing stack memory related
> >>>> TLB misses and also by using memory local to the NUMA node of each
> >>>> lcore.

This certainly seems like a good idea!

However, I wonder: Does the O/S assign memory local to the NUMA node to an lcore-pinned thread's stack when instantiating the thread? And does the DPDK EAL ensure that the preconditions for the O/S to do that are present? (I have appended a small probe sketch at the end of this message.)

(Not relevant for this patch, but the same locality questions come to mind regarding Thread Local Storage.)

> >>>>
> >>>> Platforms desiring to make use of this capability must enable the
> >>>> associated option flag and stack size settings in platform config
> >>>> files.
> >>>> ---
> >>>>  lib/eal/linux/eal.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 39 insertions(+)
> >>>>
> >>> Good idea but having a fixed size stack makes writing complex
> >>> applications more difficult. Plus you lose the safety of guard pages.

Would it be possible to add a guard page or guard region by using the O/S memory allocator instead of rte_zmalloc_socket()? Since the stack is considered private to the process, i.e. not accessible from other processes, this patch does not need to provide remote access to stack memory from secondary processes - and thus it is not a requirement for this feature to use DPDK managed memory.
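To illustrate what I have in mind, here is a minimal sketch using plain mmap()/mprotect() instead of DPDK managed memory. The 2 MB stack size is a placeholder, not a proposal for the actual value; and if the stack were backed by MAP_HUGETLB, the guard region would presumably have to be a whole hugepage:

#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define WORKER_STACK_SIZE (2 * 1024 * 1024) /* placeholder size */

static void *worker_main(void *arg)
{
	/* worker body goes here */
	return NULL;
}

int main(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = WORKER_STACK_SIZE + page;

	/* Plain anonymous mapping; adding MAP_HUGETLB would back the
	 * stack with hugepages, but then the guard region would have
	 * to be a whole hugepage as well. */
	void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Stacks grow downwards, so revoke all access to the lowest
	 * page; an overflow then faults instead of silently corrupting
	 * adjacent memory. */
	if (mprotect(base, page, PROT_NONE) != 0) {
		perror("mprotect");
		return 1;
	}

	pthread_attr_t attr;
	pthread_attr_init(&attr);
	/* The usable stack begins just above the guard page. */
	pthread_attr_setstack(&attr, (char *)base + page,
			WORKER_STACK_SIZE);

	pthread_t tid;
	if (pthread_create(&tid, &attr, worker_main, NULL) != 0) {
		perror("pthread_create");
		return 1;
	}
	pthread_join(tid, NULL);

	pthread_attr_destroy(&attr);
	munmap(base, total);
	return 0;
}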
> >> Thanks for the quick reply.
> >>
> >> The expectation is that use of this optional feature would be limited
> >> to cases where the performance gains justify the implications of
> >> these tradeoffs. For example, a specific data plane application may
> >> be okay with limited stack size and could be tested to ensure stack
> >> usage remains within limits.

How to identify the required stack size and verify it... If aiming for small stacks, some instrumentation would be nice, like rte_mempool_audit() and rte_mempool_list_dump().

Alternatively, just assume that the stack is "always big enough", and don't worry about it - like the default O/S stack size.

And as Stephen already mentioned: Regardless of stack size, overflowing the stack will cause memory corruption instead of a segmentation fault.

Keep in mind that the required stack size depends not only on the application, but also on DPDK and any other libraries used by the application.

> >>
> >> Also, since this applies only to worker threads, the main thread
> >> would not be impacted by this change.
> >>
> >>
> > I would prefer it as a runtime, not compile time option.
> > That way distributions could ship DPDK and application could opt in
> > if it wanted.
>
> Good point.. I'll work on a v2 and will post that when it's ready.

May I suggest using the stack size configured in the O/S, from pthread_attr_getstacksize() or similar, instead of choosing the stack size manually? If you want it to be configurable, use the default size unless explicitly specified otherwise. (A small sketch follows below.)

Do the worker threads need a different stack size than the main thread? In my opinion: "Nice to have", not "must have".

Do the worker threads need different stack sizes individually? In my opinion: Perhaps "nice to have", certainly not "must have".
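Regarding my suggestion above to use the stack size configured in the O/S: a minimal sketch, assuming glibc behavior, where an attribute object that has not been given an explicit size reports the system default (derived from RLIMIT_STACK):

#include <pthread.h>
#include <stdio.h>

int main(void)
{
	pthread_attr_t attr;
	size_t size = 0;

	/* With glibc, an attribute object with no explicit stack size
	 * reports the system default when queried. */
	pthread_attr_init(&attr);
	pthread_attr_getstacksize(&attr, &size);
	printf("default stack size: %zu bytes\n", size);

	pthread_attr_destroy(&attr);
	return 0;
}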
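And returning to my NUMA question at the top: under the default first-touch policy on Linux, a stack page should land on the node of the core that first touches it, which a pinned worker can check with something like the sketch below (link with -lnuma; core 1 is an arbitrary choice for the example). One caveat, as far as I know: glibc initializes the thread descriptor at the top of the stack from the creating thread, so the topmost pages may not be node-local.

#define _GNU_SOURCE
#include <numaif.h>	/* move_pages(); link with -lnuma */
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static void *worker_main(void *arg)
{
	/* A local variable is first touched here, on the core this
	 * thread is pinned to. */
	volatile char probe[64];
	probe[0] = 0;

	void *page = (void *)((uintptr_t)&probe[0] &
			~((uintptr_t)sysconf(_SC_PAGESIZE) - 1));
	int node = -1;

	/* With a NULL nodes array, move_pages() moves nothing; it only
	 * reports the node each page currently resides on. */
	if (move_pages(0, 1, &page, NULL, &node, 0) == 0)
		printf("stack page resides on NUMA node %d\n", node);
	return NULL;
}

int main(void)
{
	/* Pin the worker before it starts running, so its first touch
	 * happens on the intended core. */
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(1, &set);

	pthread_attr_t attr;
	pthread_attr_init(&attr);
	pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

	pthread_t tid;
	pthread_create(&tid, &attr, worker_main, NULL);
	pthread_join(tid, NULL);

	pthread_attr_destroy(&attr);
	return 0;
}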