On Mon, Sep 14, 2020 at 2:05 PM Nick Desaulniers <ndesaulni...@google.com> wrote: > > (Bill, please use `./scripts/get_maintainer.pl <patchfile>` to get the > appropriate list of folks and mailing lists to CC) > > On Thu, Sep 3, 2020 at 8:06 AM Andy Lutomirski <l...@kernel.org> wrote: > > > > So with my selftests hat on, the inline asm utterly sucks. Trying to > > use pushfq / popfq violates the redzone and requires a gross hack to > > work around. It also messes up the DWARF annotations. The kernel > > gets a free pass on both of these because we don't use a redzone and > > we don't use DWARF. > > Sorry, I don't fully understand: > 1. What's the redzone?
Userspace ABI. x86_64 userspace is allowed to use 128 bytes below RSP. > 2. How does pushfq/popfq violate it? It clobbers a stack slot owned by the compiler. > 3. What is the "gross hack"/"workaround?" (So that we might consider > removing it if these builtins help). Look in tools/testing/selftests/x86/helpers.h. I have a patch to switch to the builtins. > 4. The kernel does use DWARF, based on configs, right? Indeed, but user code in the kernel tree (e.g. selftests) does. > #ifdef CONFIG_X86_64 > #define __read_eflags __builtin_ia32_readeflags_u64 > #define __write_eflags __builtin_i32_writeeflags_u64 > #else > #define __read_eflags __builtin_ia32_readeflags_u32 > #define __write_eflags __builtin_i32_writeeflags_u32 > #endif Looks reasonable to me. > > Which would be concise. For smap_save() we can use clac() and stac() > which might be nicer than `asm goto` (kudos for using `asm goto`, but > plz no more). :^P Also, we can probably cleanup the note about GCC < > 4.9 now. :) I find it a bit implausible that popfq is faster than a branch, so the smap_restore() code seems suboptimal. For smap_save(), I'm not sure what's ideal, but the current code seems fine other than the bogus constraint. > > > Clang chooses the most general constraint when multiple constraints > > > are specified. So it will use the stack when retrieving the flags. > > > > I haven't looked at clang's generated code to see if it's sensible > > from an optimization perspective, but surely the kernel code right now > > is genuinely wrong. GCC is free to omit the frame pointer and > > generate a stack-relative access for the pop, in which case it will > > malfunction. > > Sorry, this is another case I don't precisely follow, would you mind > explaining it further to me please? The compiler is permitted (and expected!) to instantiate an m constraint as something like offset(%rsp). For example, this: unsigned long bogus_read_flags(void) { unsigned long ret; asm volatile ("pushfq\n\tpopq %0" : "=m" (ret)); return ret; } compiles to: pushfq popq -8(%rsp) movq -8(%rsp), %rax ret which is straight-up wrong. Changing it to "=rm" gives: pushfq popq %rax ret which is correct, but this only works because gcc politely chose the r option instead of the m option. clang chooses poorly and gives: read_flags: pushfq popq -8(%rsp) movq -8(%rsp), %rax retq which is wrong. I filed a clang bug: https://bugs.llvm.org/show_bug.cgi?id=47530 but the kernel is buggy here -- clang is within its rights to generate the bogus sequence above. Bill's email was IMO rather obfuscated, and I assume this is what he meant. Of course, clang unhelpfully generates poor code for the builtin, too: nice_read_eflags: pushq %rbp movq %rsp, %rbp pushfq popq %rax popq %rbp retq I filed a bug for that, too: https://bugs.llvm.org/show_bug.cgi?id=47531 So we at least need to fix the bogus "=rm", and we should seriously consider using the builtin.