Re: [Bug 204789] New: Boot failure with more than 256G of memory
Hello,

Regression set to "yes". Not sure how I missed that. :)

Will report future PPC issues that I come across to this list as well.

Thanks!

-Cameron

On 9/11/19 7:31 AM, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sun, 08 Sep 2019 00:04:26 + bugzilla-dae...@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=204789
>>
>> Bug ID: 204789
>> Summary: Boot failure with more than 256G of memory
>> Product: Memory Management
>> Version: 2.5
>> Kernel Version: 5.2.x
>> Hardware: PPC-64
>> OS: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: high
>> Priority: P1
>> Component: Other
>> Assignee: a...@linux-foundation.org
>> Reporter: c...@neo-zeon.de
>> Regression: No
>
> "Yes" :)
>
>> Kernel series 5.2.x will not boot on my Talos II workstation with dual
>> POWER9 18 core processors and 512G of physical memory with
>> disable_radix=yes and 4k pages. 5.3-rc6 did not work either. 5.1 and
>> earlier boot fine.
>
> Thanks. It's probably best to report this on the powerpc list, cc'ed here.
>
>> I can get the system to boot IF I leave the Radix MMU enabled or if I
>> boot a kernel with 64k pages. I haven't yet tested enabling the Radix MMU
>> with 64k pages at the same time, but I suspect this would work. This is a
>> system I cannot take down TOO frequently.
>>
>> The system will also boot with the Radix MMU disabled and 4k pages with
>> 256G or less memory. Setting mem on the kernel CLI to 256G or less
>> results in a successful boot. With mem=257G or higher, no Radix MMU, and
>> 4k pages, the kernel will not boot.
>>
>> Petitboot comes up, but the system fails VERY early in boot in the serial
>> console with:
>>
>> SIGTERM received, booting...
>> [ 23.838858] kexec_core: Starting new kernel
>>
>> Early printk is enabled, and it never progresses any further.
>>
>> 5.1 boots just fine with the Radix MMU disabled and 4k pages.
>> Unfortunately, I currently need 4k pages for bcache to work, and Radix
>> MMU disabled in order for FreeBSD 12.x to work under KVM so I'm sticking
>> with 5.1.21 for now.
>>
>> I have been unable to reproduce this issue in KVM.
>>
>> Here are my PCIe peripherals:
>> 1. Microsemi/Adaptec HBA 1100-4i SAS controller
>> 2. Megaraid 9316-16i SAS RAID controller.
>>
>> I've only tried little endian as this is a little endian install.
>>
>> --
>> You are receiving this mail because:
>> You are the assignee for the bug.
Re: [Bug 204789] New: Boot failure with more than 256G of memory
Yep, the box comes up now, but with 256G memory as expected.

I'll get back to you on when I'll be able to bisect.

Thanks!

On 9/13/19 7:21 AM, Aneesh Kumar K.V wrote:
> Aneesh Kumar K.V writes:
>
>> Andrew Morton writes:
>>
>>> (switched to email. Please respond via emailed reply-to-all, not via the
>>> bugzilla web interface).
>>>
>>> On Sun, 08 Sep 2019 00:04:26 + bugzilla-dae...@bugzilla.kernel.org wrote:
>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=204789
>>>>
>>>> Bug ID: 204789
>>>> Summary: Boot failure with more than 256G of memory
>>>> Product: Memory Management
>>>> Version: 2.5
>>>> Kernel Version: 5.2.x
>>>> Hardware: PPC-64
>>>> OS: Linux
>>>> Tree: Mainline
>>>> Status: NEW
>>>> Severity: high
>>>> Priority: P1
>>>> Component: Other
>>>> Assignee: a...@linux-foundation.org
>>>> Reporter: c...@neo-zeon.de
>>>> Regression: No
>>>
>>> "Yes" :)
>>>
>>>> Kernel series 5.2.x will not boot on my Talos II workstation with dual
>>>> POWER9 18 core processors and 512G of physical memory with
>>>> disable_radix=yes and 4k pages. 5.3-rc6 did not work either. 5.1 and
>>>> earlier boot fine.
>>>
>>> Thanks. It's probably best to report this on the powerpc list, cc'ed here.
>>>
>>>> I can get the system to boot IF I leave the Radix MMU enabled or if I
>>>> boot a kernel with 64k pages. I haven't yet tested enabling the Radix
>>>> MMU with 64k pages at the same time, but I suspect this would work.
>>>> This is a system I cannot take down TOO frequently.
>>>>
>>>> The system will also boot with the Radix MMU disabled and 4k pages with
>>>> 256G or less memory. Setting mem on the kernel CLI to 256G or less
>>>> results in a successful boot. With mem=257G or higher, no Radix MMU,
>>>> and 4k pages, the kernel will not boot.
>>>>
>>>> Petitboot comes up, but the system fails VERY early in boot in the
>>>> serial console with:
>>>>
>>>> SIGTERM received, booting...
>>>> [ 23.838858] kexec_core: Starting new kernel
>>>>
>>>> Early printk is enabled, and it never progresses any further.
>>>>
>>>> 5.1 boots just fine with the Radix MMU disabled and 4k pages.
>>>> Unfortunately, I currently need 4k pages for bcache to work, and Radix
>>>> MMU disabled in order for FreeBSD 12.x to work under KVM so I'm
>>>> sticking with 5.1.21 for now.
>>>>
>>>> I have been unable to reproduce this issue in KVM.
>>>>
>>>> Here are my PCIe peripherals:
>>>> 1. Microsemi/Adaptec HBA 1100-4i SAS controller
>>>> 2. Megaraid 9316-16i SAS RAID controller.
>>>>
>>>> I've only tried little endian as this is a little endian install.
>>
>> Will you be able to bisect this? I tried 4K PAGESIZE on P8 with upstream
>> kernel and I can't recreate the issue.
>>
>> [root@ltc ~]# free -g
>>               total        used        free      shared  buff/cache   available
>> Mem:            495           0         494           0           0         493
>> Swap:             0           0           0
>> [root@ltc ~]# getconf PAGESIZE
>> 4096
>> [root@ltc ~]# grep Hash /proc/cpuinfo
>> MMU : Hash
>>
>> I will see if I can get a P9 system with largemem
>
> I was able to recreate this on a system that has memory above the 16TB
> address. I guess your P9 system memory layout is also like that. Can you
> try this patch? It doesn't really fix the issue, as in it doesn't map the
> full 512GB of memory, but it does prevent the kernel crash.
>
> commit ebd05100344765fc3c030f0c257c2f9236fcd1ec
> Author: Aneesh Kumar K.V
> Date: Fri Sep 13 19:26:25 2019 +0530
>
>     powerpc/book3s64/hash/4k: 4k supports only 16TB linear mapping
>
>     With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel
>     regions in the same 0xc range"), we now split the 64TB address range
>     into 4 contexts each of 16TB. That implies we can do only 16TB linear
>     mapping. Make sure we don't add physical memory above 16TB if that is
>     present in the system.
>
>     Signed-off-by: Aneesh Kumar K.V
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index bb3deb76c951..86cce8189240 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -35,12 +35,16 @@ extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
>   * memory requirements with large number of sections.
>   * 51 bits is the max physical real address on POWER9
>   */
> -#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
> - defined(CONFIG_PPC_64K_PAGES)
> +
> +#if defined(CONFIG_PPC_64K_PAGES)
> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)
>  #define MAX_PHYSMEM_BITS 51
>  #else
>  #define MAX_PHYSMEM_BITS 46
>  #endif
> +#else /* CONFIG_PPC_64K_PAGES */
> +#define MAX_PHYSMEM_BITS 44
> +#endif
>
>  /* 64-bit classic hash table MMU */
>  #include
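
The 16TB figure in the patch description above follows from plain arithmetic on the 4K hash constants. The following is only a rough user-space sketch of that arithmetic, not kernel code: the values of MAX_EA_BITS_PER_CONTEXT, REGION_SHIFT and MAX_PHYSMEM_BITS are the ones quoted in this thread, while the TB macro and the helper phys_range_fits_linear_map() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in this thread for 4K hash after commit 0034d395f89d. */
#define MAX_EA_BITS_PER_CONTEXT 46                     /* 64TB per context */
#define REGION_SHIFT (MAX_EA_BITS_PER_CONTEXT - 2)     /* 44: four areas of 16TB */
#define H_KERN_MAP_SIZE (UINT64_C(1) << REGION_SHIFT)  /* 16TB */
#define MAX_PHYSMEM_BITS 44                            /* cap from the workaround patch */

/* Illustrative convenience macro, not a kernel definition. */
#define TB (UINT64_C(1) << 40)

/* 64TB split four ways leaves 16TB for each map area, including the linear map. */
static_assert((UINT64_C(1) << MAX_EA_BITS_PER_CONTEXT) == 64 * TB,
	      "one context spans 64TB");
static_assert(H_KERN_MAP_SIZE == 16 * TB, "each map area spans 16TB");
static_assert((UINT64_C(1) << MAX_PHYSMEM_BITS) == H_KERN_MAP_SIZE,
	      "the physical memory cap matches the linear map size");

/*
 * Hypothetical helper (not a kernel function): returns true if the physical
 * range [base, base + size) lies entirely below the 16TB linear map limit.
 */
static inline bool phys_range_fits_linear_map(uint64_t base, uint64_t size)
{
	return base < H_KERN_MAP_SIZE && size <= H_KERN_MAP_SIZE - base;
}
```

Under these assumptions, a 256G range starting at physical address 0 fits, while any range that starts at or above the 16TB address does not. That would be consistent with the reproduction described above on a system with memory above the 16TB address, though the actual memory layout of the Talos II is not shown in this thread.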
Re: [Bug 204789] New: Boot failure with more than 256G of memory
With the kernel I built at 0034d395f89d, the problem is still there. With the kernel I built at the previous commit, a35a3c6f6065, the system boots.

This confirms that the regression is due to 0034d395f89d.

Thanks!

On 9/13/19 9:13 AM, Aneesh Kumar K.V wrote:
> On 9/13/19 8:35 PM, Cameron Berkenpas wrote:
>> Yep, the box comes up now, but with 256G memory as expected.
>>
>> I'll get back to you on when I'll be able to bisect.
>>
>> Thanks!
>
> I am sure this is due to commit: 0034d395f89d ("powerpc/mm/hash64: Map all
> the kernel regions in the same 0xc range"). We reduced the linear map
> range for 4K page size to 16TB there.
>
> -aneesh
Re: [Bug 204789] New: Boot failure with more than 256G of memory
Hello,

Unfortunately, this patch set has made things quite a bit worse for me. Appending mem=256G doesn't fix it either. In all cases, the system at least gets past early boot, and then I usually get a panic and eventual reboot, or occasionally it just locks up entirely.

Here's my very first attempt at booting the kernel, where I didn't even get a panic:
https://pastebin.com/a3TVZcVB

Here's another attempt where I get a panic:
https://pastebin.com/QsJjyC2v

Finally, here's an attempt with mem=256G:
https://pastebin.com/swgLYie9

I don't know that these results are substantially different from each other, but perhaps there's something helpful.

Sometimes (but not in any of the above), the host gets to the point that systemd starts up, but ultimately it seems I get the same stacktrace.

At one point, I ended up with a CPU guarded out, but it was simple to recover.

-Cameron

On 9/17/19 8:15 PM, Aneesh Kumar K.V wrote:
> On 9/13/19 10:58 PM, Cameron Berkenpas wrote:
>> With the kernel I built at 0034d395f89d, the problem is still there.
>> With the kernel I built at the previous commit, a35a3c6f6065, the system
>> boots.
>>
>> This confirms that the regression is due to 0034d395f89d.
>
> https://lore.kernel.org/linuxppc-dev/20190917145702.9214-1-aneesh.ku...@linux.ibm.com
>
> This series should help you.
>
> -aneesh
Re: [PATCH] powerpc/mm/book3s64/hash: Update 4k PAGE_SIZE kernel mapping
Seems to work for me so far! I've tried successfully against 5.2.21 and 5.3.6.

Thanks!

-Cameron

On 10/15/19 10:51 PM, Aneesh Kumar K.V wrote:
> With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions
> in the same 0xc range"), kernel now split the 64TB address range into 4
> contexts each of 16TB. That implies we can do only 16TB linear mapping.
> This results in boot failure on some P9 systems. Fix this by redoing the
> hash 4k mapping as below.
>
> vmalloc start = 0xd000
> IO start = 0xd0003800
> vmemmap start = 0xf000
>
> Vmalloc area is now 56TB in size and IO remap 8TB. We need to keep them in
> the same top nibble address because we map both of them in the Linux page
> table and they share the init_mm page table. We need a large vmalloc space
> because we use percpu embedded first chunk allocator.
>
> Both linear and vmemmap range is of 64TB size each and is mapped
> respectively using 0xc and 0xf top nibble.
>
> Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
> Reported-by: Cameron Berkenpas
> Signed-off-by: Aneesh Kumar K.V
> ---
> arch/powerpc/include/asm/book3s/64/hash-4k.h | 54 ++--
> arch/powerpc/include/asm/book3s/64/hash-64k.h | 73 -
> arch/powerpc/include/asm/book3s/64/hash.h | 82 ++-
> 3 files changed, 123 insertions(+), 86 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index 8fd8599c9395..4cbb9fe22d76 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -12,23 +12,59 @@
>   * Hence also limit max EA bits to 64TB.
>   */
>  #define MAX_EA_BITS_PER_CONTEXT 46
> -
> -#define REGION_SHIFT (MAX_EA_BITS_PER_CONTEXT - 2)
> +/*
> + * For 4k hash, considering we restricted by a page table sizing that
> + * limit our address range to 64TB, keep the kernel virtual
> + * mapping in 0xd region.
> + */
> +#define H_KERN_VIRT_START ASM_CONST(0xd000)
>
>  /*
> - * Our page table limit us to 64TB. Hence for the kernel mapping,
> - * each MAP area is limited to 16 TB.
> - * The four map areas are: linear mapping, vmap, IO and vmemmap
> + * Top 4 bits are ignored in page table walk.
>   */
> -#define H_KERN_MAP_SIZE (ASM_CONST(1) << REGION_SHIFT)
> +#define EA_MASK (~(0xfUL << 60))
>
>  /*
> - * Define the address range of the kernel non-linear virtual area
> - * 16TB
> + * Place vmalloc and IO in the 64TB range because we map them via linux page
> + * table and table size is limited to 64TB.
> + */
> +#define H_VMALLOC_START H_KERN_VIRT_START
> +/*
> + * 56TB vmalloc size. We require large vmalloc space for percpu mapping.
>   */
> -#define H_KERN_VIRT_START ASM_CONST(0xc0001000)
> +#define H_VMALLOC_SIZE (56UL << 40)
> +#define H_VMALLOC_END (H_VMALLOC_START + H_VMALLOC_SIZE)
> +
> +#define H_KERN_IO_START H_VMALLOC_END
> +#define H_KERN_IO_SIZE (8UL << 40)
> +#define H_KERN_IO_END (H_KERN_IO_START + H_KERN_IO_SIZE)
> +
> +#define H_VMEMMAP_START ASM_CONST(0xf000)
> +#define H_VMEMMAP_SIZE (1UL << MAX_EA_BITS_PER_CONTEXT)
> +#define H_VMEMMAP_END (H_VMEMMAP_START + H_VMEMMAP_SIZE)
>
>  #ifndef __ASSEMBLY__
> +static inline int get_region_id(unsigned long ea)
> +{
> +	int id = (ea >> 60UL);
> +
> +	switch (id) {
> +	case 0x0:
> +		return USER_REGION_ID;
> +	case 0xc:
> +		return LINEAR_MAP_REGION_ID;
> +	case 0xd:
> +		if (ea < H_KERN_IO_START)
> +			return VMALLOC_REGION_ID;
> +		else
> +			return IO_REGION_ID;
> +	case 0xf:
> +		return VMEMMAP_REGION_ID;
> +	default:
> +		return INVALID_REGION_ID;
> +	}
> +}
> +
>  #define H_PTE_TABLE_SIZE (sizeof(pte_t) << H_PTE_INDEX_SIZE)
>  #define H_PMD_TABLE_SIZE (sizeof(pmd_t) << H_PMD_INDEX_SIZE)
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index d1d9177d9ebd..fc44bc590ac8 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -13,18 +13,61 @@
>   * is handled in the hotpath.
>   */
>  #define MAX_EA_BITS_PER_CONTEXT 49
> -#define REGION_SHIFT MAX_EA_BITS_PER_CONTEXT
> +
> +/*
> + * Define the address range of the kernel non-linear virtual area
> + * 2PB
> + */
> +#define H_KERN_VIRT_START ASM_CONST(0xc008)
>
>  /*
>   * We use one context for each MAP area.
>   */
> +#define REGION_SHIFT MAX_EA_BITS_PER_CONTEXT
>  #define H_KERN_MAP_SIZE (1UL << MAX_EA_BITS_PER_CONTEXT)
>
>  /*
> - * Define the address range of the kernel non-l