https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560
--- Comment #15 from Witold Baryluk <witold.baryluk+gcc at gmail dot com> ---
I know this is a pretty old bug, but while exploring GCC and Clang assembly output on Godbolt I ran into the same issue:

https://godbolt.org/z/qPzMhWse1

class A {
 public:
  virtual int f7(int x) const;
};

int g(const A * const a, int x) {
  int r = 0;
  for (int i = 0; i < 10000; i++)
    r += a->f7(x);
  return r;
}

(The same happens without the loop, when a->f7 is simply called several times in a row.)

g(A const*, int):
        push    r13
        mov     r13d, esi
        push    r12
        xor     r12d, r12d
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 10000
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]   # load vtable pointer of *a
        mov     esi, r13d
        mov     rdi, rbp
        call    [QWORD PTR [rax]]        # f7 indirect call through vtable slot
        add     r12d, eax
        dec     ebx
        jne     .L2
        add     rsp, 8
        pop     rbx
        pop     rbp
        mov     eax, r12d
        pop     r12
        pop     r13
        ret

I was expecting mov rax, QWORD PTR [rbp+0] and the slot load inside call [QWORD PTR [rax]] to be hoisted out of the loop: load the function pointer once before the loop, then do a register-indirect call (call reg) inside it. A bit sad.

Has there been any recent work on this optimization? Are there at least some cases where it is valid to CSE these loads, or to move them out of the loop?
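To make the expected transformation concrete, here is a minimal C++ sketch that models virtual dispatch by hand instead of using a real vtable. The names Obj, VTable, f7_plain, f7_switch, g_reload, g_hoisted and check_model are all hypothetical and are not GCC internals or the actual ABI layout. g_reload mirrors the loop in g as GCC compiles it (reload the table pointer and the slot on every iteration), and g_hoisted is the form the comment asks for (load the function pointer once, call through a register). The model also lets a callee overwrite the table pointer, to show why the two forms are only equivalent when the compiler can prove that pointer does not change during the loop.

```cpp
#include <cassert>

// Hypothetical hand-rolled model of virtual dispatch (not GCC's or the
// ABI's real layout): an object starts with a pointer to a table of
// function pointers, playing the role of the vtable pointer in a->f7(x).
struct Obj;
struct VTable {
    int (*f7)(const Obj* self, int x);
};
struct Obj {
    const VTable* vptr;
};

// Plain implementation: returns x + 1 and leaves the table pointer alone.
static int f7_plain(const Obj*, int x) { return x + 1; }
static const VTable kPlain = {f7_plain};

// Hypothetical callee that overwrites the caller's table pointer, standing
// in for anything inside f7 that could change what the vptr load returns.
static int f7_switch(const Obj* self, int x) {
    const_cast<Obj*>(self)->vptr = &kPlain;  // OK: the objects used below are non-const
    return x;
}
static const VTable kSwitching = {f7_switch};

// Mirrors the loop GCC emits today: reload the table pointer and the slot
// on every iteration.
int g_reload(const Obj* a, int x) {
    int r = 0;
    for (int i = 0; i < 10000; i++)
        r += a->vptr->f7(a, x);
    return r;
}

// The hoisted form: load the function pointer once, then call through a
// register inside the loop. Only equivalent to g_reload if nothing in the
// loop can change a->vptr.
int g_hoisted(const Obj* a, int x) {
    int (*const fn)(const Obj*, int) = a->vptr->f7;
    int r = 0;
    for (int i = 0; i < 10000; i++)
        r += fn(a, x);
    return r;
}

// Both forms agree when the table pointer is stable, and disagree when the
// callee rewrites it.
void check_model() {
    Obj stable1{&kPlain}, stable2{&kPlain};
    assert(g_reload(&stable1, 2) == g_hoisted(&stable2, 2));  // both 30000

    Obj changing1{&kSwitching}, changing2{&kSwitching};
    assert(g_reload(&changing1, 2) == 29999);   // 2 + 9999 * 3
    assert(g_hoisted(&changing2, 2) == 20000);  // 10000 * 2
}
```

With the switching table, the reloading loop picks up the new table after the first call while the hoisted loop keeps calling the stale pointer. In real C++ the analogous change would have to go through the object-lifetime rules (for example placement new of another type), which this model does not try to capture; it only illustrates the condition a compiler has to prove before hoisting the loads.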