(Bill, please use `./scripts/get_maintainer.pl <patchfile>` to get the appropriate list of folks and mailing lists to CC)
On Thu, Sep 3, 2020 at 8:06 AM Andy Lutomirski <l...@kernel.org> wrote: > > So with my selftests hat on, the inline asm utterly sucks. Trying to > use pushfq / popfq violates the redzone and requires a gross hack to > work around. It also messes up the DWARF annotations. The kernel > gets a free pass on both of these because we don't use a redzone and > we don't use DWARF. Sorry, I don't fully understand: 1. What's the redzone? 2. How does pushfq/popfq violate it? 3. What is the "gross hack"/"workaround?" (So that we might consider removing it if these builtins help). 4. The kernel does use DWARF, based on configs, right? > > Now GCC and clang's builtins are also ugly. But perhaps we could have > a little wrapper that is less ugly? > > static __always_inline unsigned long __read_eflags(void) > { > #ifdef CONFIG_X86_64 > return __builtin_ia32_readeflags_u64(); > #else > return __builtin_ia32_readeflags_u32(); > } It looks like __builtin_ia32_readeflags_u32, __builtin_ia32_readeflags_u64, __builtin_ia32_writeeflags_u64, and __builtin_ia32_writeeflags_u32 were first available in GCC 4.9; they weren't in GCC 4.8 or older, so we can make use of them unconditionally. I think it would be nice to use the above. Could even be: #ifdef CONFIG_X86_64 #define __read_eflags __builtin_ia32_readeflags_u64 #define __write_eflags __builtin_i32_writeeflags_u64 #else #define __read_eflags __builtin_ia32_readeflags_u32 #define __write_eflags __builtin_i32_writeeflags_u32 #endif Which would be concise. For smap_save() we can use clac() and stac() which might be nicer than `asm goto` (kudos for using `asm goto`, but plz no more). :^P Also, we can probably cleanup the note about GCC < 4.9 now. :) > > > > > > Why can't clang use the inline asm version? > > > > Clang chooses the most general constraint when multiple constraints > > are specified. So it will use the stack when retrieving the flags. > > I haven't looked at clang's generated code to see if it's sensible > from an optimization perspective, but surely the kernel code right now > is genuinely wrong. GCC is free to omit the frame pointer and > generate a stack-relative access for the pop, in which case it will > malfunction. Sorry, this is another case I don't precisely follow, would you mind explaining it further to me please? Bill, it looks like Clang is forcibly emitting a frame pointer for these: https://godbolt.org/z/GPMeKc, is that right/needed? I thought -O2 implied -fomit-frame-pointer, in fact I think I fixed that in Clang for x86 most recently... > > IOW, we appear to have an actual bug, and clang just happens to be > triggering it, no? We've seen bugs in the past where "m" constraints were used with inline asm that modified the stack pointer (commit 428b8f1d9f92 "KVM: VMX: don't allow memory operands for inline asm that modifies SP"), though I thought only leaving the stack imbalanced (not restoring it to its previous value before the end of the inline asm) was problematic. Though commit d0a8d9378d16 ("x86/paravirt: Make native_save_fl() extern inline") is what I'm reminded of with this patch. (Stack protectors continue to wind up in places that cause trouble. GCC folks have patches to disable them on a per function basis via function attribute like Clang does, though they've been sitting in code review for a long time. Recently, they've caused some headaches with LTO as well. Once we have those, I plan on adding that to compiler_attributes.h.) I would sleep better at night if I undid/reverted d0a8d9378d16. -- Thanks, ~Nick Desaulniers