On Fri 25-01-19 19:15:49, Michal Hocko wrote: > On Fri 25-01-19 18:33:15, Michal Hocko wrote: > > On Fri 25-01-19 17:39:38, Michal Hocko wrote: > > > On Fri 25-01-19 11:16:30, robert shteynfeld wrote: > > > > Attached is the dmesg from patched kernel. > > > > > > Your Node1 physical memory range precedes Node0 which is quite unusual > > > but it shouldn't be a huge problem on its own. But memory ranges are > > > not aligned to the memory section > > > > > > [ 0.286954] Early memory node ranges > > > [ 0.286955] node 1: [mem 0x0000000000001000-0x0000000000090fff] > > > [ 0.286955] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff] > > > [ 0.286956] node 1: [mem 0x0000000100000000-0x0000001423ffffff] > > > [ 0.286956] node 0: [mem 0x0000001424000000-0x0000002023ffffff] > > > > > > As you can see the last pfn for the node1 is inside the section and > > > Node0 starts right after. This is quite unusual as well. If for no other > > > reasons then the memmap of those struct pages will be remote for one or > > > the other. Actually I am not even sure we can handle that properly > > > because we do expect 1:1 mapping between sections and nodes. > > > > > > Now it also makes some sense why 2830bf6f05fb ("mm, memory_hotplug: > > > initialize struct pages for the full memory section") made any > > > difference. We simply write over a potentially initialized struct page > > > and blow up on that. I strongly suspect that the commit just uncovered > > > a pre-existing problem. Let me think what we can do about that. > > > > Appart from force aligning node's start the only other option is to > > revert 2830bf6f05fb and handling the underlying issue in the hotplug > > code. > > We cannot really align because we have things like ZONE_DMA starting at > 0x1000 and who knows what else. So let's go with the revert. Hutplug > simply needs a larger surgery to get rid of the PAGES_PER_SECTION > inherent assumptions. > > Linus, could you take the revert please?
or should I post the patch as a reply to make your life easier? > From 817b18d3db36a6900ca9043af8c1416c56358be3 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mho...@suse.com> > Date: Fri, 25 Jan 2019 19:08:58 +0100 > Subject: [PATCH] Revert "mm, memory_hotplug: initialize struct pages for the > full memory section" > > This reverts commit 2830bf6f05fb3e05bc4743274b806c821807a684. > > The underlying assumption that one sparse section belongs into a single > numa node doesn't hold really. Robert Shteynfeld has reported a boot > failure. The boot log was not captured but his memory layout is as > follows: > [ 0.286954] Early memory node ranges > [ 0.286955] node 1: [mem 0x0000000000001000-0x0000000000090fff] > [ 0.286955] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff] > [ 0.286956] node 1: [mem 0x0000000100000000-0x0000001423ffffff] > [ 0.286956] node 0: [mem 0x0000001424000000-0x0000002023ffffff] > > This means that node0 starts in the middle of a memory section which is > also in node1. memmap_init_zone tries to initialize padding of a section > even when it is outside of the given pfn range because there are code > paths (e.g. memory hotplug) which assume that the full worth of memory > section is always initialized. In this particular case, though, such a > range is already intialized and most likely already managed by the page > allocator. Scribbling over those pages corrupts the internal state and > likely blows up when any of those pages gets used. > > Reported-by: Robert Shteynfeld <robert.shteynf...@gmail.com> > Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the > full memory section") > Cc: stable > Signed-off-by: Michal Hocko <mho...@suse.com> > --- > mm/page_alloc.c | 12 ------------ > 1 file changed, 12 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index d295c9bc01a8..35fdde041f5c 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5701,18 +5701,6 @@ void __meminit memmap_init_zone(unsigned long size, > int nid, unsigned long zone, > cond_resched(); > } > } > -#ifdef CONFIG_SPARSEMEM > - /* > - * If the zone does not span the rest of the section then > - * we should at least initialize those pages. Otherwise we > - * could blow up on a poisoned page in some paths which depend > - * on full sections being initialized (e.g. memory hotplug). > - */ > - while (end_pfn % PAGES_PER_SECTION) { > - __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid); > - end_pfn++; > - } > -#endif > } > > #ifdef CONFIG_ZONE_DEVICE > -- > 2.20.1 > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs