https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
Bug ID: 71785
Summary: Computed gotos are mostly optimized away
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: andres at anarazel dot de
Target Milestone: ---

Hi,

I'm working on some interpreter-like constructs in postgres. To reduce
the number of mispredictions I wanted to use the "typical" jump threading
approach.

Unfortunately, with gcc-6 (gcc-6 (Debian 6.1.1-8) 6.1.1 20160630) and up
to a recent snapshot ((Debian 20160612-1) 7.0.0 20160612 (experimental)
[trunk revision 237336]), gcc merges some of the gotos together into a
common label and jumps there.

In the attached file (a small artificial case showing the problem), with
-O3 this results in

    CASE_OP_A:
        someglobal++;
        op++;
        goto *dispatch_table[op->opcode];
    CASE_OP_B:
        do_stuff_b(op->arg);
        op++;
        goto *dispatch_table[op->opcode];

being implemented as

    .L5:
            addq    $8, %rbx
            jmp     *%rax
    ...
    .L3:
            movl    (%rbx), %eax
            addl    $1, someglobal(%rip)
            movq    dispatch_table.1772(,%rax,8), %rax
            jmp     .L5
    ...
    .L4:
            movl    -4(%rbx), %edi
            call    do_stuff_b
            movl    (%rbx), %eax
            movq    dispatch_table.1772(,%rax,8), %rax
            jmp     .L5

I've tried -fno-gcse and -fno-crossjumping, and neither seems to fix the
problem.

It's also kind of weird that the load from the dispatch table is still
performed in the individual branches; only the final jmp *%rax happens in
the common location (.L5 here).

In the actual case I'm fighting with, gcc "inlines" the jmp *%rax in one
of the dispatches, but not in the other 8.

Is there any additional information I can provide?

Regards,

Andres
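
For readers without the attachment, here is a minimal sketch of the kind of dispatch loop described above. It is not the attached file. The struct layout, the OP_HALT opcode, the b_total counter, the body of do_stuff_b and the function name run are all assumptions made for illustration; only CASE_OP_A, CASE_OP_B, someglobal, dispatch_table, op->opcode and op->arg come from the report. It relies on the GNU C "labels as values" extension, so it needs gcc or a compatible compiler.

```c
#include <assert.h>

/* Hypothetical opcode set; the report only names OP_A and OP_B. */
enum { OP_A, OP_B, OP_HALT, NUM_OPS };

/* Assumed 8-byte layout: opcode followed by its argument. */
struct op {
    unsigned opcode;
    int arg;
};

int someglobal;
int b_total; /* assumption: something observable for do_stuff_b to touch */

void do_stuff_b(int arg)
{
    b_total += arg;
}

void run(const struct op *op)
{
    /* One label address per opcode (GNU C labels-as-values extension). */
    static void *const dispatch_table[NUM_OPS] = {
        [OP_A] = &&CASE_OP_A,
        [OP_B] = &&CASE_OP_B,
        [OP_HALT] = &&CASE_OP_HALT,
    };

    assert(op->opcode < NUM_OPS);
    goto *dispatch_table[op->opcode];

CASE_OP_A:
    someglobal++;
    op++;
    /* Each handler ends in its own indirect jump. */
    goto *dispatch_table[op->opcode];

CASE_OP_B:
    do_stuff_b(op->arg);
    op++;
    goto *dispatch_table[op->opcode];

CASE_OP_HALT:
    return;
}
```

Compiling a file like this with gcc -O3 -S and counting the indirect jmp instructions in the output is one way to check whether each handler kept its own dispatch jump. The result may differ between compiler versions and flags, so the sketch makes no claim about what any particular gcc emits.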