TL;DR LLVM currently has a bug when unrolling loops containing asm goto and we have a fix in hand.
Following up from our thread last week (https://lkml.org/lkml/2019/6/29/14). With an x86 defconfig (6fbc7275c7a9) +clang 9 (e1006259d84d), I see the following 4 warnings from objtool: arch/x86/events/intel/core.o: warning: objtool: intel_pmu_nhm_workaround()+0xa8: unreachable instruction arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable instruction arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x88: redundant UACCESS disable Peter mentioned seeing 6 cases: https://lkml.org/lkml/2019/6/27/118 We have 21 open reports (mostly unverified, I think someone was testing -O3 -march=native): https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aissue+is%3Aopen+label%3Aobjtool Taking a look specifically at arch/x86/kernel/cpu/mtrr/generic.o (as tglx and Peter did in https://lkml.org/lkml/2019/6/27/118: Disassembly and jump table: https://gist.github.com/nickdesaulniers/395129e75d9f7afe028db832145594d8 More concise snippet: https://gist.github.com/nickdesaulniers/a3159782908cb6e50e26ea9563b14c20 Of interest are the disassembled __jump_table entries; in groups of three, there is a group for which the second element is duplicated with a previous group. This is bad because (as explained by Peter in https://lkml.org/lkml/2019/6/27/118) the triples are in the form (code location, jump target, pointer to key). Duplicate or repeated jump targets are unexpected, and will lead to incorrect control flow after patching such code locations. Also, the jump target should be 0x7 bytes ahead of the location, IIUC. I've annotated the concise snippet: https://gist.github.com/nickdesaulniers/c0cb8fc67720a5abbadf42fde5440a8e we can see that two groups in the __jump_table share a similar jump target, unexpectedly. Now that we can identify what's bad (duplicate __jump_table entries), we can try to creduce the input. (https://nickdesaulniers.github.io/blog/2019/01/18/finding-compiler-bugs-with-c-reduce/) The preprocessed input is 37497 lines long. We want to reduce it down, but I haven't been able to write a correct "interestingness test" for creduce where I was then able to make sense of the output. Instead, I'm looking into program slicing: https://en.wikipedia.org/wiki/Program_slicing. We'd want to do this for C code, and it should be possible to implement with libclang, but since it doesn't exist yet AFAICT and llvm-extract does, we can do the program slicing on the LLVM IR level. This will also give us a test case to add to the compiler's test suite. $ clang <same c flags kernel uses via `make V=`> \ -emit-llvm -S -Xclang -disable-llvm- passes -o get_fixed_ranges generic.ll $ llvm-extract -func get_fixed_ranges generic.ll -o generic.get_fixed_ranges.ll -S --recursive is a more manageable 541 lines of LLVM IR. https://gist.github.com/nickdesaulniers/73aef95e9735a50765f654ebdcbb99dd `callbr` is the LLVM IR instruction representing `asm goto`. https://llvm.org/docs/LangRef.html#callbr-instruction The gist of the instruction is: callbr ... <list of block addresses from the label list> to <fallthrough basic block [<list of block addresses from the label list>]. IIUC, the repeated list of addresses better match. We can then observe the middle end work its optimizations: $ opt -O2 -print-before-all -print-after-all generic.get_fixed_ranges.ll -S -o - 2>&1 | less (27199 lines of output, omitted) Here's the relevant snippet of the 27199 lines: https://gist.github.com/nickdesaulniers/089bf56165675065bca830cd7da767d8 grepping through the output and looking for the first pass which contains non-matching list of block addresses in the label list, we come upon: ``` callbr void asm sideeffect "1:.byte 0x0f,0x1f,0x44,0x00,0\0A\09.pushsection __jump_table, \22aw\22 \0A\09 .balign 8 \0A\09.long 1b - ., ${2:l} - . \0A\09 .quad ${0:c} + ${1:c} - .\0A\09.popsection \0A\09", "i,i,X,~{dirflag},~{fpsr},~{flags}"(%struct.static_key* getelementptr inbounds (%struct.tracepoint, %struct.tracepoint* @__tracepoint_read_msr, i64 0, i32 1), i1 false, i8* blockaddress(@get_fixed_ranges, %20)) #6 to label %native_read_msr.exit31.1 [label %49], !srcloc !4 ``` (particularly I would have expected the `%20` and `%49` to match.) which occurs before "MergedLoadStoreMotion," but due to what looks like a bug in the pass printer (https://lists.llvm.org/pipermail/cfe-dev/2019-July/062795.html) doesn't print the results of loop unrolling, which runs just before "MergedLoadStoreMotion." I see that `get_fixed_ranges` is a loop that gets unrolled to have a trip count of two. $ opt -O2 -debug-only=loop-unroll generic.get_fixed_ranges.ll -S -o /dev/null ... Loop Unroll: F[get_fixed_ranges] Loop % Loop Size = 27 Trip Count = 2 Trip Multiple = 2 COMPLETELY UNROLLING loop % with trip count 2! If I manually add some print statements to loop unrolling, I can see that before we unroll the loop, things look good. After unrolling the loop, things look bad. Indeed, if I kneecap loop unrolling the problem goes away: $ opt -O2 -debug-only=loop-unroll -unroll-full-max-count=0 generic.get_fixed_ranges.ll -S -o - So this is a bug in our loop unroller when unrolling IR representations of asm goto. Peter's guess that this was a bug in inlining was very close! Function inlining and loop unrolling are closely related and probably share a lot of machinery related to cloning of basic blocks and instructions. In this case, if we have a loop with asm goto and we unroll the loop, the referenced labels from the asm goto statement need to reflect the newly duplicated labels. In this case, the new duplicated asm goto is referencing the initial asm goto's label, which would result in non-sensical control flow at runtime. In the loop unroller, I can see that we skip loop unrolling in the event of "indirectbr" instructions. https://github.com/llvm/llvm-project/blob/97316fff5d6a7c3d2c072301a118ac0cb5094ad8/llvm/lib/Transforms/Utils/LoopUnroll.cpp#L294 https://github.com/llvm/llvm-project/blob/97316fff5d6a7c3d2c072301a118ac0cb5094ad8/llvm/lib/Analysis/LoopInfo.cpp#L432 Conservatively, we can block loop unrolling when we see asm goto in a loop: diff --git a/llvm/lib/Analysis/LoopInfo.cpp b/llvm/lib/Analysis/LoopInfo.cpp index 00dbe30c2b3d..f800c890d1db 100644 --- a/llvm/lib/Analysis/LoopInfo.cpp +++ b/llvm/lib/Analysis/LoopInfo.cpp @@ -433,7 +433,8 @@ bool Loop::isSafeToClone() const { // Return false if any loop blocks contain indirectbrs, or there are any calls // to noduplicate functions. for (BasicBlock *BB : this->blocks()) { - if (isa<IndirectBrInst>(BB->getTerminator())) + if (isa<IndirectBrInst>(BB->getTerminator()) || + isa<CallBrInst>(BB->getTerminator())) return false; for (Instruction &I : *BB) This causes objtool to not find any issues in arch/x86/kernel/cpu/mtrr/generic.o. I don't observe any duplication in the __jump_table section of the resulting .o file. It also cuts down the objtool warnings I observe in a defconfig (listed at the beginning of the email) from 4 to 2. (platform-quirks.o predates asm goto, i915_gem_execbuffer.o is likely a separate bug). Coincidentally, Alexander and Bill just found what I think is the same bug: https://bugs.llvm.org/show_bug.cgi?id=42489 So for now we'll be conservative/correct and disable unrolling of loops that contain asm goto; then we can use the same test case to permit unrolling with the correct code transformations added. -- Thanks, ~Nick Desaulniers