Consider the following functions:

    // g++ -mtune=core2 -O3 -S -dp
    void loop(int* dest, int* src, int count) {
        for(int i=0; i < count; i++)
            dest[i] = src[i];
    }

    void loop_few(int* dest, int* src) {
        loop(dest, src, 8);
    }

    void loop_many(int* dest, int* src) {
        loop(dest, src, 64);
    }
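For comparison, the transformation being asked for can be written out by hand. This is an illustrative sketch only, not GCC output, and the names `loop_unrolled8` and `loop_many_unrolled` are hypothetical. `loop_unrolled8` shows a generic 8x unroll that needs a remainder loop for the `count % 8` leftover elements. `loop_many_unrolled` shows the ideal result for a constant trip count of 64: because 64 % 8 == 0, the remainder loop can be dropped entirely.

```cpp
// Hypothetical hand-unrolled equivalent of loop(): copies 8 elements per
// iteration, then finishes the count % 8 leftover elements one at a time.
void loop_unrolled8(int* dest, int* src, int count) {
    int i = 0;
    for (; i + 8 <= count; i += 8) {
        dest[i + 0] = src[i + 0];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
        dest[i + 4] = src[i + 4];
        dest[i + 5] = src[i + 5];
        dest[i + 6] = src[i + 6];
        dest[i + 7] = src[i + 7];
    }
    for (; i < count; i++)  // remainder (epilogue) loop
        dest[i] = src[i];
}

// Hypothetical "ideal" version of loop_many(): the trip count is the
// constant 64, a multiple of 8, so no remainder loop is required.
void loop_many_unrolled(int* dest, int* src) {
    for (int i = 0; i < 64; i += 8) {
        dest[i + 0] = src[i + 0];
        dest[i + 1] = src[i + 1];
        dest[i + 2] = src[i + 2];
        dest[i + 3] = src[i + 3];
        dest[i + 4] = src[i + 4];
        dest[i + 5] = src[i + 5];
        dest[i + 6] = src[i + 6];
        dest[i + 7] = src[i + 7];
    }
}
```

Both functions copy exactly the same elements as the original `loop()`; the only difference is how many loop-control instructions (increment, compare, branch) execute per element copied.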
loop() unrolls 8x, as expected. loop_few() peels completely, as expected. However, loop_many() neither peels nor unrolls:

    _Z9loop_manyPiS_:
            xorl    %edx, %edx              # 34  *movdi_xor_rex64      [length = 2]
    .L47:
            movl    (%rsi,%rdx,4), %eax     # 11  *movsi_1/1            [length = 3]
            movl    %eax, (%rdi,%rdx,4)     # 12  *movsi_1/2            [length = 3]
            incq    %rdx                    # 13  *adddi_1_rex64/1      [length = 3]
            cmpq    $64, %rdx               # 15  cmpdi_1_insn_rex64/1  [length = 4]
            jne     .L47                    # 16  *jcc_1                [length = 2]
            rep ; ret                       # 35  return_internal_long  [length = 1]

Ideally, the optimizer would unroll 8x, then notice that count % 8 == 0 and eliminate the code that handles the leftover (partial) iterations. Even a plain unroll that keeps that remainder code would be better than nothing.

--
           Summary: Loop unrolling not performed with large constant loop bound
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: scovich at gmail dot com
 GCC target triplet: x86_64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32729