On Sat, May 20, 2017 at 5:02 AM, Baoquan He <b...@redhat.com> wrote:
> On SGI UV systems, the kernel can hang with KASLR enabled.
>
> The backtrace is:
>
> kernel BUG at arch/x86/mm/init_64.c:311!
> invalid opcode: 0000 [#1] SMP
> [...]
> RIP: 0010:__init_extra_mapping+0x188/0x196
> [...]
> Call Trace:
>  init_extra_mapping_uc+0x13/0x15
>  map_high+0x67/0x75
>  map_mmioh_high_uv3+0x20a/0x219
>  uv_system_init_hub+0x12d9/0x1496
>  uv_system_init+0x27/0x29
>  native_smp_prepare_cpus+0x28d/0x2d8
>  kernel_init_freeable+0xdd/0x253
>  ? rest_init+0x80/0x80
>  kernel_init+0xe/0x110
>  ret_from_fork+0x2c/0x40
>
> The root cause is that the SGI UV system needs to map its MMIOH regions
> into the direct mapping section, and that mapping happens in rest_init().
> However, mm KASLR is done in kernel_randomize_memory(), which runs much
> earlier than the SGI UV MMIOH mapping and does not account for the MMIOH
> regions. With KASLR disabled, there is a 64TB space for the direct
> mapping of system RAM, and both system RAM and the SGI UV MMIOH regions
> share this 64TB space. With KASLR enabled, mm KASLR reserves only the
> actual size of system RAM plus 10TB for the direct mapping. The later
> MMIOH mapping of SGI UV can then go beyond the upper bound of the direct
> mapping section and step into the VMALLOC or VMEMMAP area, which
> triggers the BUG_ON() in __init_extra_mapping().
>
> E.g. on the SGI UV3 machine where this bug was reported, there are two
> MMIOH regions:
>
> [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
> [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000
>
> They are [16TB-16G, 16TB) and [16TB, 32TB). On this machine, 512G of RAM
> is spread out over 1TB regions, so the above two SGI MMIOH regions will
> also be mapped into the direct mapping section.
>
> To fix it, check whether this is an SGI UV system by calling
> is_early_uv_system() in kernel_randomize_memory(). If it is, do not
> adapt the size of the direct mapping section. Do it now.
>
> Signed-off-by: Baoquan He <b...@redhat.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: x...@kernel.org
> Cc: Thomas Garnier <thgar...@google.com>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Masahiro Yamada <yamada.masah...@socionext.com>
> ---
>  arch/x86/mm/kaslr.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index aed2064..20b0456 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -27,6 +27,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/setup.h>
>  #include <asm/kaslr.h>
> +#include <asm/uv/uv.h>
>
>  #include "mm_internal.h"
>
> @@ -123,7 +124,7 @@ void __init kernel_randomize_memory(void)
>  		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
>
>  	/* Adapt phyiscal memory region size based on available memory */
> -	if (memory_tb < kaslr_regions[0].size_tb)
> +	if (memory_tb < kaslr_regions[0].size_tb && !is_early_uv_system())
Given your example, is there any way we could just restrict memory_tb to
32TB? Or will different configurations result in different mappings?

>  		kaslr_regions[0].size_tb = memory_tb;
>
>  	/* Calculate entropy available between regions */
> --
> 2.5.5
>

--
Thomas
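For concreteness, a minimal sketch of the alternative being asked about
above (clamping the region instead of skipping the shrink entirely on UV);
the 32TB floor is an assumption taken from the MMIOH ranges quoted in the
changelog, not something the posted patch implements:

	/*
	 * Illustrative sketch only, not the posted patch: keep at least
	 * 32TB of direct mapping on UV so the MMIOH ranges
	 * [16TB-16G, 16TB) and [16TB, 32TB) stay inside it. The 32UL
	 * floor is an assumption based on the dmesg lines above.
	 */
	if (memory_tb < kaslr_regions[0].size_tb) {
		if (is_early_uv_system())
			kaslr_regions[0].size_tb = max(memory_tb, 32UL);
		else
			kaslr_regions[0].size_tb = memory_tb;
	}

Whether a fixed 32TB floor is actually sufficient depends on the
per-configuration MMIOH layout, which is exactly the open question above.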