On 1/20/2018 10:52 AM, Laura Abbott wrote: > On 01/20/2018 05:13 AM, Ingo Molnar wrote: >> >> * Ingo Molnar <[email protected]> wrote: >> >>> 2) >>> >>> using global variables, which is unsafe in early code if the kernel is >>> relocatable. >>> >>> The bisected to commit uses a new sme_populate_pgd_data to collect >>> variables that >>> were already on the stack, which should be position independent and safe. >>> >>> But the other commits use sme_active(), which does: >>> >>> bool sme_active(void) >>> { >>> return sme_me_mask && !sev_enabled; >>> } >>> EXPORT_SYMBOL(sme_active); >>> >>> And that looks PIC-unsafe to me, as both are globals: >>> >>> u64 sme_me_mask __section(.data) = 0; >>> EXPORT_SYMBOL(sme_me_mask); >>> >>> Does the code start working if you force sme_active() to 0 while >>> keeping the >>> function call, i.e. something like the hack below? >> >> BTW., this aspect of the boot code is really fragile, and depending on >> compiler >> there could be unsafe relocations generated without it being 'obvious' >> from the >> patch itself. It's also pretty compiler and code layout dependent ... >> >> A good way to check this I think would be to turn off >> CONFIG_RELOCATABLE=y in the >> .config - does that make the kernel boot again? >> >> If that makes a difference then we need to take a look at the >> relocations in the >> two key files, with CONFIG_RELOCATABLE=y turned back on: >> >> objdump -r arch/x86/kernel/head64.o >> objdump -r arch/x86/mm/mem_encrypt.o >> >> There's three types of relocations that should be there normally: >> >> #define R_X86_64_64 1 /* Direct 64 bit */ >> #define R_X86_64_PC32 2 /* PC relative 32 bit signed */ >> #define R_X86_64_32S 11 /* Direct 32 bit sign extended */ >> >> Only R_X86_64_PC32 is safe as-is, R_X86_64_32S needs to be used via >> fixup_pointer(). >> >> What makes this difficult in the SME context is that the early boot >> portion of >> arch/x86/mm/mem_encrypt.c is not separated out, but mixed in with later >> code. >> >> I missed this aspect when reviewing and merging this code :-( >> >> Maybe a diff of the list of relocations of the before/after commit >> points would be >> nice. >> >> I.e. does something like: >> >> git checkout <last_working_commit_sha1> >> objdump -r arch/x86/mm/mem_encrypt.o | grep R_X86 | cut -d' ' -f2- > >> working.relocs >> >> git checkout <first_broken_commit_sha1> >> objdump -r arch/x86/mm/mem_encrypt.o | grep R_X86 | cut -d' ' -f2- > >> broken.relocs >> >> diff -up working.relocs broken.relocs >> >> show any changes to the relocations? >> >> Side note: >> >> Regardless of whether it's the root cause for this regression we >> definitely need >> to improve the relocations robustness of early boot code: at minimum we >> should >> isolate all critical functionality into a separate section, and then add >> tooling >> checks to make sure all relocations are safe. >> >> Thanks, >> >> Ingo >> > > For the previous question, changing it to sme_active _does_ make the > kernel work. Unfortunately, I can't test without relocations since > I need to boot with CONFIG_EFI_STUB, but the relocations did show > something interesting: > > +R_X86_64_PC32 __stack_chk_fail-0x0000000000000004 > > There's a new call to __stack_chk_fail and if I dump the end of > sme_encrypt_kernel I do see that stuck in there. I bet the size > of struct sme_populate_pgd_data is now large enough to trigger > a stack check. If I add __nostackprotector to sme_encrypt_kernel > like sme_enable has, it boots fine. This would explain why that > particular commit showed as the problem in bisection.
Great find Laura. It must have something to do with compiler levels since my level didn't insert that check. Thanks, Tom > > Thanks, > Laura

