https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46219
Adam Warner <adam at consulting dot net.nz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
            Version|4.6.0                       |4.9.1
         Resolution|FIXED                       |---

--- Comment #6 from Adam Warner <adam at consulting dot net.nz> ---
Great work thanks Kai Tietz and Richard Henderson!

I've come across a situation where complex jmp is not generated and crafted a
simplified test case:

$ cat gcc_bug_no_complex_indirect_jmp.c
#include <stdint.h>

typedef void (*fn0_t)(uint8_t *rdi);
typedef void (*fn1_t)(uint8_t *rdi, fn0_t *rsi);

fn0_t fn0_dispatch[256];
fn1_t fn1_dispatch[256];

void fn0_test(uint8_t *rdi) {
  fn0_t *rsi = fn0_dispatch;
  fn1_dispatch[rdi[1]](rdi, rsi);
}

int main(void) {
  asm volatile ("ret; jmpq *0x601140(,%rax,8)");
  return 0;
}

$ gcc --version
gcc (Debian 4.9.1-4) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -O3 gcc_bug_no_complex_indirect_jmp.c && objdump -d -m i386:x86-64:intel a.out |less
...
00000000004003c0 <main>:
  4003c0:       c3                      ret
  4003c1:       ff 24 c5 40 11 60 00    jmp    QWORD PTR [rax*8+0x601140]
...
00000000004004c0 <fn0_test>:
  4004c0:       0f b6 47 01             movzx  eax,BYTE PTR [rdi+0x1]
  4004c4:       be 40 09 60 00          mov    esi,0x600940
  4004c9:       48 8b 04 c5 40 11 60    mov    rax,QWORD PTR [rax*8+0x601140]
  4004d0:       00
  4004d1:       ff e0                   jmp    rax
...

The last two instructions should be merged into JMP QWORD PTR
[rax*8+0x601140]. This is a 7 byte instruction. Fortuitously fn0_test would
become 16 bytes total (no more than 16 bytes of machine code can be decoded in
one clock cycle on Intel Core 2).
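
The size claim above can be checked by adding up the encoded lengths shown in the objdump listing. The following is only a sketch of that arithmetic: the enum and function names are made up for illustration and are not part of the test case or of GCC, and the byte counts are simply read off the listing in this comment.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded lengths in bytes, read off the objdump listing in the comment.
   All names here are hypothetical, chosen only for this illustration. */
enum {
    LEN_MOVZX_EAX_BYTE = 4, /* 0f b6 47 01              movzx eax,BYTE PTR [rdi+0x1] */
    LEN_MOV_ESI_IMM32  = 5, /* be 40 09 60 00           mov esi,0x600940 */
    LEN_MOV_RAX_TABLE  = 8, /* 48 8b 04 c5 40 11 60 00  mov rax,QWORD PTR [rax*8+0x601140] */
    LEN_JMP_RAX        = 2, /* ff e0                    jmp rax */
    LEN_JMP_TABLE      = 7  /* ff 24 c5 40 11 60 00     jmp QWORD PTR [rax*8+0x601140] */
};

/* Merging the load and the register jump into one memory-indirect jmp
   replaces 8 + 2 bytes with 7 bytes, a saving of 3 bytes. */
static_assert(LEN_MOV_RAX_TABLE + LEN_JMP_RAX - LEN_JMP_TABLE == 3,
              "merged jmp should save 3 bytes");

/* Size of fn0_test as emitted by gcc 4.9.1 -O3 in the listing:
   movzx, mov esi, mov rax from the table, then jmp rax. */
size_t fn0_test_size_emitted(void)
{
    return LEN_MOVZX_EAX_BYTE + LEN_MOV_ESI_IMM32
         + LEN_MOV_RAX_TABLE + LEN_JMP_RAX;
}

/* Size of fn0_test if the last two instructions were merged into the
   single jmp through the dispatch table. */
size_t fn0_test_size_merged(void)
{
    return LEN_MOVZX_EAX_BYTE + LEN_MOV_ESI_IMM32 + LEN_JMP_TABLE;
}
```

With these lengths, fn0_test_size_emitted() gives 19 bytes and fn0_test_size_merged() gives 16 bytes, which matches the 16 byte total mentioned above.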