Hi,

I've been playing around with the gcc -flto flag and the inlining features for a while, in search of both optimized performance and a full understanding of g++ behaviour.
Right now, I'm puzzled by the assembly output produced for this piece of code:

    #include <iostream>
    using namespace std;

    class A {
    public:
        inline virtual void blah() { cout << "A" << endl; }
    };

    class B : public A {
    public:
        inline virtual void blah() { cout << "B" << endl; }
    };

    class C {
    public:
        void blah() { cout << "C" << endl; }
    };

    int main(int argc, char** argv)
    {
        A* ptr = 0;
        if(argc == 1)
            ptr = new B();
        else
            ptr = new A();

        ptr->blah();
        B().blah();
        C().blah();
    }

I would expect the compiler to be able to inline blah() when it is called statically on objects of class B and C, and to use a VTable resolution for the call ptr->blah().

Here's the relevant assembly code produced by g++ with flags -O3 and -S:

    main:
    .LFB976:
            .cfi_startproc
            subq    $24, %rsp
            .cfi_def_cfa_offset 32
            cmpl    $1, %edi
            movl    $8, %edi
            je      .L18
            call    _Znwm
            movq    %rax, %rdi
            movq    $_ZTV1A+16, (%rax)
            movl    $_ZTV1A+16, %eax
    .L16:
            call    *(%rax)
            movq    %rsp, %rdi
            movq    $_ZTV1B+16, (%rsp)
            call    _ZN1B4blahEv
            movl    $.LC2, %esi
            movl    $_ZSt4cout, %edi
            call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
            movq    %rax, %rdi
            call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
            xorl    %eax, %eax
            addq    $24, %rsp
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            ret
    .L18:
            .cfi_restore_state
            call    _Znwm
            movq    %rax, %rdi
            movq    $_ZTV1B+16, (%rax)
            movl    $_ZTV1B+16, %eax
            jmp     .L16
            .cfi_endproc

The puzzling part is that the call to C().blah() is indeed inlined and ptr->blah() uses a VTable resolution, but the code for B().blah() uses neither: the static address is resolved, but the code is not inlined! (The same behaviour occurs with a statically typed pointer to an object of class B.)

I understand that the compiler propagates the types properly, but even after determining the correct type for the object of type B, it only resolves the vtable reference (hence no call *(%..x)) and does not perform the inlining.

Question: why?
Can someone explain to me the exact order in which g++ performs its optimizations and how they interact with each other? I know this might be tricky, but any light shed on this would be helpful. Also, did I miss a flag that would enable g++ to do the inlining after the resolution?

From a practical point of view, I understand that this example does not by itself justify an absolute need for inlining. However, I do have a time-critical application that would get a 25-30% increase in speed if I could solve this issue. Also, I'm just curious to understand why g++ behaves this way (or whether it's actually a bug), because it runs counter to my most basic intuition and to the beliefs of many people I know.

Thanks in advance for any answer to come.

Kind Regards

-- 
Thierry Lavoie, B.Ing., M.scA.
PhD. Student, Polytechnique Montreal
Lecturer INF2010: Data Structures and Algorithm
Lecturer LOG3210: Languages and Compilers