On Tue, 17 May 2016 06:44:54 -0700 Andi Kleen <[email protected]> wrote:
> From: Andi Kleen <[email protected]>
>
> Linux pre-allocates the task structs of the idle tasks for all possible CPUs.
> This currently means they all end up on node 0. This also implies
> that the cache line of MWAIT, which is around the flags field in the task
> struct, are all located in node 0.
>
> We see a noticeable performance improvement on Knights Landing CPUs when
> the cache lines used for MWAIT are located in the local nodes of the CPUs
> using them. I would expect this to give a (likely slight) improvement
> on other systems too.
>
> The patch implements placing the idle task in the node of
> its CPUs, by passing the right target node to copy_process()
>

Looks nice.

This is nicer ;)


From: Andrew Morton <[email protected]>
Subject: allocate-idle-task-for-a-cpu-always-on-its-local-node-fix

use NUMA_NO_NODE, not a bare -1

Cc: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

 kernel/fork.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN kernel/fork.c~allocate-idle-task-for-a-cpu-always-on-its-local-node-fix kernel/fork.c
--- a/kernel/fork.c~allocate-idle-task-for-a-cpu-always-on-its-local-node-fix
+++ a/kernel/fork.c
@@ -346,7 +346,7 @@ static struct task_struct *dup_task_stru
 	struct thread_info *ti;
 	int err;
 
-	if (node < 0)
+	if (node == NUMA_NO_NODE)
 		node = tsk_fork_get_node(orig);
 	tsk = alloc_task_struct_node(node);
 	if (!tsk)
@@ -1754,7 +1754,7 @@ long _do_fork(unsigned long clone_flags,
 	}
 
 	p = copy_process(clone_flags, stack_start, stack_size,
-			 child_tidptr, NULL, trace, tls, -1);
+			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
 	/*
 	 * Do this prior waking up the new thread - the thread pointer
 	 * might get invalid after that point, if the thread exits quickly.
_
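
For readers following along outside the kernel tree, here is a rough
userspace sketch of the node-selection fallback that the hunk in
dup_task_struct() touches. It is only an illustration, not the kernel
code: pick_alloc_node() and fallback_node() are hypothetical names, and
fallback_node() merely stands in for tsk_fork_get_node() by returning
the parent's node. NUMA_NO_NODE is defined locally as (-1) to mirror the
kernel's definition.

```c
/* Mirrors the kernel's definition; redefined here so the sketch builds
 * in userspace. */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical stand-in for tsk_fork_get_node(): assume the fallback is
 * simply the node the parent task lives on.
 */
static int fallback_node(int parent_node)
{
	return parent_node;
}

/*
 * Hypothetical helper showing the check from the patch: only the
 * explicit NUMA_NO_NODE sentinel triggers the fallback; any other
 * value is used as the target node unchanged.
 */
static int pick_alloc_node(int requested_node, int parent_node)
{
	if (requested_node == NUMA_NO_NODE)
		return fallback_node(parent_node);
	return requested_node;
}
```

With this sketch, pick_alloc_node(NUMA_NO_NODE, 0) falls back to node 0,
while pick_alloc_node(3, 0) keeps the requested node 3. Note one
difference from the old "node < 0" test: a negative value other than
NUMA_NO_NODE no longer triggers the fallback and is passed through as-is.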

