On Tue 02-03-21 06:02:14, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 11:52 PM Muchun Song <songmuc...@bytedance.com> wrote:
> >
> > The alloc_thread_stack_node() cannot guarantee that allocated stack pages
> > are in the same node when CONFIG_VMAP_STACK. Because we do not specify
> > __GFP_THISNODE to __vmalloc_node_range().
>
> Instead of __GFP_THISNODE, mention that the kernel_clone() passes
> NUMA_NO_NODE which is being used for __vmalloc_node_range().

If we really want to do this then I would recommend reasoning along the
following lines:
"
For simplification 991e7673859e ("mm: memcontrol: account kernel stack
per node") has changed the per zone vmalloc backed stack pages
accounting to per node. By doing that we have lost a certain precision
because those pages might live in different NUMA nodes. In the end
NR_KERNEL_STACK_KB exported to the userspace might be overestimated on
some nodes while underestimated on others.

< some examples would go here ideally >

This doesn't pose any real problem for the correctness of the kernel
behavior as the counter is not used for any internal processing but it
can cause some confusion to the userspace.

Address the problem by accounting each vmalloc backing page to its own
node.
"
-- 
Michal Hocko
SUSE Labs
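
To make the over/underestimation concrete, here is a small userspace model of the two accounting schemes. It is only a sketch, not kernel code: struct node_stats, MAX_NODES, PAGE_KB and both helper functions are hypothetical names invented for illustration, 4 KiB pages are assumed, and the coarse scheme is assumed to charge the whole stack to the node of its first backing page.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4   /* hypothetical number of NUMA nodes in this model */
#define PAGE_KB   4   /* assumes 4 KiB pages */

/* Per-node stand-in for the NR_KERNEL_STACK_KB counter. */
struct node_stats {
	long kernel_stack_kb[MAX_NODES];
};

/*
 * Coarse scheme (assumption): charge the whole stack to the node of
 * its first backing page. sign is +1 to charge and -1 to uncharge.
 */
static void account_stack_first_page(struct node_stats *st,
				     const int *page_nid,
				     size_t nr_pages, int sign)
{
	if (nr_pages == 0)
		return;
	assert(page_nid[0] >= 0 && page_nid[0] < MAX_NODES);
	st->kernel_stack_kb[page_nid[0]] += sign * (long)(nr_pages * PAGE_KB);
}

/* Per-page scheme: charge each backing page to the node it lives on. */
static void account_stack_per_page(struct node_stats *st,
				   const int *page_nid,
				   size_t nr_pages, int sign)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		assert(page_nid[i] >= 0 && page_nid[i] < MAX_NODES);
		st->kernel_stack_kb[page_nid[i]] += sign * PAGE_KB;
	}
}
```

For a four-page stack whose pages sit on nodes 0, 0, 1 and 1, the first helper reports 16 KB on node 0 and nothing on node 1, while the second reports 8 KB on each node. The total is the same in both cases; only the per-node split differs.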