https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67886
            Bug ID: 67886
           Summary: Incomplete optimization for virtual function call into
                    freshly constructed object
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Simon.Richter at hogyros dot de
  Target Milestone: ---

This is a bit of a corner/academic case, but came up in a Stack Overflow
discussion:

    struct Base { virtual void func() = 0; };
    struct Derived : Base { virtual void func() { }; };

    void test()
    {
        Base* base = new Derived;

        for (int i = 0; i < 1000; ++i)
        {
            base->func();
        }
    }

The generated assembler code on x86_64 with -O3 is

    Disassembly of section .text:

    0000000000000000 <test()>:
       0:   55                      push   %rbp
       1:   53                      push   %rbx
       2:   bf 08 00 00 00          mov    $0x8,%edi
       7:   bb e8 03 00 00          mov    $0x3e8,%ebx
       c:   48 83 ec 08             sub    $0x8,%rsp
      10:   e8 00 00 00 00          callq  15 <test()+0x15>
                            11: R_X86_64_PC32       operator new(unsigned long)-0x4
      15:   ba 00 00 00 00          mov    $0x0,%edx
                            16: R_X86_64_32 vtable for Derived+0x10
      1a:   48 89 c5                mov    %rax,%rbp
      1d:   48 c7 00 00 00 00 00    movq   $0x0,(%rax)
                            20: R_X86_64_32S        vtable for Derived+0x10
      24:   eb 13                   jmp    39 <test()+0x39>
      26:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
      2d:   00 00 00
      30:   83 eb 01                sub    $0x1,%ebx
      33:   74 1a                   je     4f <test()+0x4f>
      35:   48 8b 55 00             mov    0x0(%rbp),%rdx
      39:   48 8b 12                mov    (%rdx),%rdx
      3c:   48 81 fa 00 00 00 00    cmp    $0x0,%rdx
                            3f: R_X86_64_32S        Derived::func()
      43:   74 eb                   je     30 <test()+0x30>
      45:   48 89 ef                mov    %rbp,%rdi
      48:   ff d2                   callq  *%rdx
      4a:   83 eb 01                sub    $0x1,%ebx
      4d:   75 e6                   jne    35 <test()+0x35>
      4f:   48 83 c4 08             add    $0x8,%rsp
      53:   5b                      pop    %rbx
      54:   5d                      pop    %rbp
      55:   c3                      retq

    Disassembly of section .text._ZN7Derived4funcEv:

    0000000000000000 <Derived::func()>:
       0:   f3 c3                   repz retq

This looks like an optimization half-done.

The optimizer correctly inlines the call to Derived::func() into the loop,
and also correctly verifies that the function pointer found in the vtable is
indeed the function that was inlined; otherwise, the inlined body is skipped
and the regular indirect call is made.

I presume that the pointer is rechecked on every loop iteration because the
called function could destroy the object and create a new one in its place
that still derives from Base, so that is correct.

If you set -fPIC, the actual values for the vtable pointer and the pointer to
Derived::func() are fetched outside of the loop and rechecked on each loop
iteration, which is again correct.

However: without -fPIC, there is no way to get a different definition of
Derived::func() without invoking UB, so the function pointer check is
tautological and can be optimized out. That would unravel the entire fuzzy
ball: the inlined function does not destroy the object, so inlining it into
the loop should give an empty loop that can be removed.

Also, wouldn't setting -fvisibility=hidden take Derived's symbols out of the
dynamic symbol table, in which case I wouldn't be able to override them at
runtime with a preload library?

The optimal solution from an assembler programmer's perspective would be to
use the knowledge that the inlined function does not touch the object's
vtable, and create a path that handles the remaining loop iterations once the
object has been shown to be a Derived object. This would probably be optimized
to a conditional jump to the ret instruction in the RTL pass, but I don't have
enough knowledge to tell whether that would be easily doable in this case.
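
For illustration only, here is a rough C++ sketch of the two loop shapes discussed above, with the vtable modelled by hand as a table of function pointers. All names here (Object, FuncPtr, guarded_loop, peeled_loop, and so on) are made up for this sketch and are not taken from GCC; it only approximates what the emitted code does and what the suggested path could look like, and says nothing about how GCC would implement it.

```cpp
#include <cassert>

// Hypothetical stand-in for an object with a one-slot vtable. These names
// only mirror the loads seen in the disassembly; they are not GCC internals.
struct Object;
using FuncPtr = void (*)(Object*);

struct Object {
    const FuncPtr* vptr;  // pointer to the object's function table
};

int other_calls = 0;  // counts executions of other_func, for demonstration

// Plays the role of Derived::func(): empty, and it never touches vptr.
void derived_func(Object*) {}

// Some other override, reached only through the indirect call.
void other_func(Object*) { ++other_calls; }

const FuncPtr derived_vtable[] = { derived_func };
const FuncPtr other_vtable[]   = { other_func };

// Shape of the emitted code: reload the slot and compare it against the
// speculated target on every iteration. Returns the number of indirect calls.
int guarded_loop(Object* obj, int n) {
    int indirect_calls = 0;
    for (int i = 0; i < n; ++i) {
        FuncPtr target = obj->vptr[0];
        if (target == derived_func) {
            // Inlined body of derived_func: nothing to do.
        } else {
            target(obj);  // fall back to the real indirect call
            ++indirect_calls;
        }
    }
    return indirect_calls;
}

// Shape suggested in the report: once the check succeeds, the inlined body
// cannot change vptr, so every remaining iteration is a no-op and the loop
// can exit immediately.
int peeled_loop(Object* obj, int n) {
    int indirect_calls = 0;
    for (int i = 0; i < n; ++i) {
        FuncPtr target = obj->vptr[0];
        if (target == derived_func) {
            break;  // the rest of the loop is empty; skip it
        }
        target(obj);
        ++indirect_calls;
    }
    return indirect_calls;
}
```

Calling both functions on an Object whose vptr points at derived_vtable makes no indirect calls in either case; the difference is that peeled_loop leaves the loop after the first successful check instead of repeating the comparison 1000 times.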