http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58863

            Bug ID: 58863
           Summary: for loop not aligned at -O2 or -O3
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ali.baharev at gmail dot com

The for loop in work() is the hotspot:

    const int LOOP_BOUND = 200000000;

    __attribute__((noinline))
    static int add(const int& x, const int& y) {
        return x + y;
    }

    __attribute__((noinline))
    static int work(int xval, int yval) {
        int sum(0);
        for (int i=0; i<LOOP_BOUND; ++i) {
            int x(xval+sum);
            int y(yval+sum);
            int z = add(x, y);
            sum += z;
        }
        return sum;
    }

    int main(int , char* argv[]) {
        int result = work(*argv[1], *argv[2]);
        return result;
    }

Running

    g++ -O2 main.cpp && objdump -d a.out | c++filt

gives

    400598:  41 8d 34 1c    lea    (%r12,%rbx,1),%esi
    [...]
    4005ab:  75 eb          jne    400598 <work(int, int)+0x18>

so the loop head at 0x400598 is not on a 16-byte boundary.

According to the documentation:

http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

    -falign-loops
    Enabled at levels -O2, -O3.

Judging from the assembly code, gcc appears to align other things to the
next 16-byte boundary by default on this machine. If I pass
-falign-loops=16, the loop becomes:

    4005a0:  41 8d 34 1c    lea    (%r12,%rbx,1),%esi
    [...]
    4005b3:  75 eb          jne    4005a0 <work(int, int)+0x20>

I guess it is also supposed to look like this when just -O2 is passed; at
least that is what the documentation suggests to me.
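
For anyone who wants to double-check the address arithmetic in the two listings, here is a minimal sketch. The helper names is_aligned and align_up are hypothetical (they are not part of gcc or of the test case), and both assume the boundary is a power of two. The only inputs are the loop-head addresses and the +0x18 / +0x20 offsets quoted in the objdump output.

```cpp
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of boundary.
// Assumes boundary is a power of two.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t boundary) {
    return (addr & (boundary - 1)) == 0;
}

// Hypothetical helper: rounds addr up to the next multiple of boundary.
// Assumes boundary is a power of two.
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint64_t boundary) {
    return (addr + boundary - 1) & ~(boundary - 1);
}

// Loop head with plain -O2: 0x400598 is 8 bytes past a 16-byte boundary.
static_assert(!is_aligned(0x400598, 16), "plain -O2 loop head is not 16-byte aligned");

// Loop head with -falign-loops=16: 0x4005a0 sits on a 16-byte boundary.
static_assert(is_aligned(0x4005a0, 16), "-falign-loops=16 loop head is 16-byte aligned");

// Rounding the unaligned loop head up to 16 bytes gives the address seen
// with -falign-loops=16.
static_assert(align_up(0x400598, 16) == 0x4005a0, "8 bytes of padding reach the boundary");

// Both listings imply the same entry address for work():
// 0x400598 - 0x18 == 0x4005a0 - 0x20 == 0x400580, which is 16-byte aligned.
static_assert(0x400598 - 0x18 == 0x4005a0 - 0x20, "work() entry is unchanged");
static_assert(is_aligned(0x400598 - 0x18, 16), "work() entry is 16-byte aligned");
```

The checks are static_asserts, so compiling the file (for example with g++ -std=c++11 -c) is enough to confirm them. They show that the function entry is aligned in both builds, and only the loop head differs by 8 bytes of padding.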