Bharata B Rao's on June 19, 2019 5:45 pm:
> We hit the following BUG_ON when memory hotplugged before reboot
> is unplugged after reboot:
> 
>  kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> 
>  remove_pagetable+0x594/0x6a0 (unreliable)
>  remove_pagetable+0x94/0x6a0
>  vmemmap_free+0x394/0x410
>  sparse_remove_one_section+0x26c/0x2e8
>  __remove_pages+0x428/0x540
>  arch_remove_memory+0xd0/0x170
>  __remove_memory+0xd4/0x1a0
>  dlpar_remove_lmb+0xbc/0x110
>  dlpar_memory+0xa80/0xd20
>  handle_dlpar_errorlog+0xa8/0x160
>  pseries_hp_work_fn+0x2c/0x60
>  process_one_work+0x46c/0x860
>  worker_thread+0x364/0x5e0
>  kthread+0x1b0/0x1c0
>  ret_from_kernel_thread+0x5c/0x68
> 
> This occurs because, during reboot-after-hotplug, the hotplugged
> memory range gets initialized as regular memory and its page
> tables are set up using the memblock allocator. This means that we
> wouldn't have initialized the PMD or PTE fragment count for
> those PMD or PTE pages.
> 
> Fixing this involves three aspects:
> 
> - Walk the init_mm page tables from mem_init() and initialize
>   the PMD and PTE fragment counts appropriately.
> - When we do early allocation of PMD (and PGD as well) pages,
>   allocate in PAGE_SIZE granularity so that we are sure that
>   the complete page is available for us to set the fragment
>   count, which is part of struct page.
> - When a PMD or PTE page is freed, check if it comes from the
>   memblock allocator and free it appropriately.
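[As an aside for readers: the "initialize the fragment counts" step above boils down to something like the following sketch. This is illustrative only and not from the patch; the helper name is made up, and real code would also need the PAGE_SIZE-granularity allocation the changelog mentions.]

```c
/*
 * Illustrative sketch, not part of the patch: for each PMD/PTE page
 * that early boot carved out of memblock, seed the refcount that
 * pmd_fragment_free()/pte_fragment_free() will later decrement, so
 * the BUG_ON(pt_frag_refcount <= 0) check no longer trips.
 */
static void fixup_frag_count(struct page *page)
{
	if (!atomic_read(&page->pt_frag_refcount))
		atomic_set(&page->pt_frag_refcount, 1);
}
```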
> 
> Reported-by: Srikanth Aithal <srait...@linux.vnet.ibm.com>
> Signed-off-by: Bharata B Rao <bhar...@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h |  1 +
>  arch/powerpc/include/asm/sparsemem.h       |  1 +
>  arch/powerpc/mm/book3s64/pgtable.c         | 12 +++-
>  arch/powerpc/mm/book3s64/radix_pgtable.c   | 67 +++++++++++++++++++++-
>  arch/powerpc/mm/mem.c                      |  5 ++
>  arch/powerpc/mm/pgtable-frag.c             |  5 +-
>  6 files changed, 87 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index 574eca33f893..4320f2790e8d 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -285,6 +285,7 @@ static inline unsigned long radix__get_tree_size(void)
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
>  int radix__remove_section_mapping(unsigned long start, unsigned long end);
> +void radix__fixup_pgtable_fragments(void);
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  #endif /* __ASSEMBLY__ */
>  #endif
> diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
> index 3192d454a733..e662f9232d35 100644
> --- a/arch/powerpc/include/asm/sparsemem.h
> +++ b/arch/powerpc/include/asm/sparsemem.h
> @@ -15,6 +15,7 @@
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
>  extern int remove_section_mapping(unsigned long start, unsigned long end);
> +void fixup_pgtable_fragments(void);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
>  extern int resize_hpt_for_hotplug(unsigned long new_mem_size);
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
> index 01bc9663360d..7efe9cc16b39 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -186,6 +186,13 @@ int __meminit remove_section_mapping(unsigned long start, unsigned long end)
>  
>  	return hash__remove_section_mapping(start, end);
>  }
> +
> +void fixup_pgtable_fragments(void)
> +{
> +	if (radix_enabled())
> +		radix__fixup_pgtable_fragments();
> +}
> +
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
>  void __init mmu_partition_table_init(void)
> @@ -320,7 +327,10 @@ void pmd_fragment_free(unsigned long *pmd)
>  	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
>  	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
>  		pgtable_pmd_page_dtor(page);
> -		__free_page(page);
> +		if (PageReserved(page))
> +			free_reserved_page(page);
Hmm. Rather than adding this special case here, I wonder if you can just
go along in your fixup walk and convert all these pages to non-reserved
pages? ClearPageReserved ; init_page_count ; adjust_managed_page_count ;
should do the trick, right?

> +		else
> +			__free_page(page);

Thanks,
Nick
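[For readers following the thread: the alternative Nick suggests amounts to something like the sketch below. It is illustrative only, not code from either participant; the helper name is invented, though ClearPageReserved(), init_page_count() and adjust_managed_page_count() are real kernel APIs.]

```c
/*
 * Sketch of the suggested alternative: while walking the early page
 * tables, hand each memblock-allocated page-table page over to normal
 * page accounting, so the unchanged __free_page() path in
 * pmd_fragment_free() works on it later with no PageReserved special
 * case.
 */
static void convert_early_pgtable_page(struct page *page)
{
	ClearPageReserved(page);		/* no longer a reserved early allocation */
	init_page_count(page);			/* refcount = 1, as if freshly allocated */
	adjust_managed_page_count(page, 1);	/* account it as managed memory */
}
```

The trade-off is that the special case moves out of the hot free path and into the one-time fixup walk, which already visits every early-allocated page-table page.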