Some NUMA machines have memory-less nodes. Currently, the page allocator builds the full fallback info in build_zonelists(), but the memblock allocator does not use this info. For a memory-less node, the memblock allocator simply falls back to node 0 instead of using the nearest node. Unfortunately, the percpu section, which is accessed frequently after boot, is allocated by memblock.
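
To illustrate the difference between falling back to node 0 and falling back to the nearest node, here is a minimal userspace sketch. It is not code from this series: the topology, the distance table, and the names demo_node_has_memory, demo_node_distance and demo_nearest_node_with_memory are all hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_NODES 4

/* Hypothetical topology: nodes 1 and 3 are memory-less. */
static const bool demo_node_has_memory[DEMO_NR_NODES] = {
	true, false, true, false
};

/* Hypothetical SLIT-style distance table; 10 means "local". */
static const int demo_node_distance[DEMO_NR_NODES][DEMO_NR_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return the node with memory that is closest to @nid, or -1 if no
 * node has memory. Ties are resolved in favour of the lowest node id.
 */
static int demo_nearest_node_with_memory(int nid)
{
	int best = -1;
	int best_dist = INT_MAX;

	assert(nid >= 0 && nid < DEMO_NR_NODES);

	for (int n = 0; n < DEMO_NR_NODES; n++) {
		if (!demo_node_has_memory[n])
			continue;
		if (demo_node_distance[nid][n] < best_dist) {
			best_dist = demo_node_distance[nid][n];
			best = n;
		}
	}
	return best;
}
```

In this made-up topology, an allocation for memory-less node 3 would be served from node 2 (distance 20) rather than from node 0 (distance 40), which is the kind of gain the series is after for percpu data.
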
This series aims to improve the performance of the percpu section on memory-less nodes by feeding each node's fallback info to the memblock allocator on x86, as is already done for the page allocator. On other architectures, independent work is required to set up the node to cpumask map earlier.

CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Dave Hansen <[email protected]>
CC: Vlastimil Babka <[email protected]>
CC: Mike Rapoport <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Joonsoo Kim <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Andi Kleen <[email protected]>
CC: Petr Tesarik <[email protected]>
CC: Michal Hocko <[email protected]>
CC: Stephen Rothwell <[email protected]>
CC: Jonathan Corbet <[email protected]>
CC: Nicholas Piggin <[email protected]>
CC: Daniel Vacek <[email protected]>
CC: [email protected]

Pingfan Liu (6):
  mm/numa: extract the code of building node fall back list
  mm/memblock: make full utilization of numa info
  x86/numa: define numa_init_array() conditional on CONFIG_NUMA
  x86/numa: concentrate the code of setting cpu to node map
  x86/numa: push forward the setup of node to cpumask map
  x86/numa: build node fallback info after setting up node to cpumask map

 arch/x86/include/asm/topology.h |  4 ---
 arch/x86/kernel/setup.c         |  2 ++
 arch/x86/kernel/setup_percpu.c  |  3 --
 arch/x86/mm/numa.c              | 40 +++++++++++-------------
 include/linux/memblock.h        |  3 ++
 mm/memblock.c                   | 68 ++++++++++++++++++++++++++++++++++++++---
 mm/page_alloc.c                 | 48 +++++++++++++++++------------
 7 files changed, 114 insertions(+), 54 deletions(-)

--
2.7.4

