Using Gentoo gcc 3.4.3. This could look like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11707 (and they might be the same; however, I think I had the problem with 3.3.4 too).
I have also had this problem in other, older versions, and in two projects I have worked on it has been really annoying.

I think that if a loop is unrolled and the loop variable is eliminated, the variable should be replaced with a constant in each copy of the body (and the ifs that are then always false should be removed). That is not the case:

    int test(int v)
    {
      int x = 0;
      for (int u=0;u<2;u++)
      {
        if (u>v) // v is an input argument; the compiler can't decide this at compile time
        {
          if (u%2==1) // can only happen for u==1 (so loops for 0 and 2 do not do
            x++;      // anything. Hoped gcc would notice when unrolling.
        }
      }
      return x;
    }

g++ -O3 -unroll-loops -S simple_test.cpp gives me the following code:

            .text
            .align 2
            .p2align 4,,15
    .globl _Z4testi
            .type   _Z4testi, @function
    _Z4testi:
    .LFB2:
            pushl   %ebp
    .LCFI0:
            xorl    %edx, %edx
            movl    %esp, %ebp
    .LCFI1:
            xorl    %eax, %eax
            incl    %eax
            cmpl    8(%ebp), %eax
            jle     .L4
            testb   $1, %al
            setne   %cl
            movzbl  %cl, %eax
            addl    %eax, %edx
    .L4:
            popl    %ebp
            movl    %edx, %eax
            ret
    .LFE2:
            .size   _Z4testi, .-_Z4testi
            .section        .note.GNU-stack,"",@progbits
            .ident  "GCC: (GNU) 3.4.3-20050110 (Gentoo 3.4.3.20050110-r2, ssp-3.4.3.20050110-0, pie-8.7.7)"

If I unroll manually, like this:

    int test(int v)
    {
      int x = 0;
      if (0>v)
      {
        if (0%2==1)
          x++;
      }
      if (1>v)
      {
        if (1%2==1)
          x++;
      }
      if (2>v)
      {
        if (2%2==1)
          x++;
      }
      return x;
    }

and then compile with just -O3, I get the much nicer:

            .text
            .align 2
            .p2align 4,,15
    .globl _Z4testi
            .type   _Z4testi, @function
    _Z4testi:
    .LFB2:
            pushl   %ebp
    .LCFI0:
            xorl    %eax, %eax
            movl    %esp, %ebp
    .LCFI1:
            cmpl    $0, 8(%ebp)
            popl    %ebp
            setle   %al
            ret
    .LFE2:
            .size   _Z4testi, .-_Z4testi
            .section        .note.GNU-stack,"",@progbits
            .ident  "GCC: (GNU) 3.4.3-20050110 (Gentoo 3.4.3.20050110-r2, ssp-3.4.3.20050110-0, pie-8.7.7)"

I have had two cases where this optimization is very important. One is when you, in a sense, program a chessboard "from within". The other case was a raytracer I wrote with a friend. In that situation we had to settle for a not-so-fast switch (since we did not want to pollute our code with a manual unroll).
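For the situation where a manual unroll would pollute the code, one possible workaround is to let templates do the expansion, so that the loop variable is a compile-time constant in every copy of the body. The following is only a sketch under that assumption: Unroll and test_unrolled are hypothetical names that do not appear in the original code, and what a given gcc version actually generates for it would still have to be checked with -S.

```cpp
// Hypothetical helper (not part of the original test case):
// Unroll<U, N>::run expands the loop body for u = U, U+1, ..., N-1,
// with u available as the compile-time constant U in each instantiation.
template <int U, int N>
struct Unroll {
    static void run(int v, int& x) {
        if (U > v) {           // v is only known at run time
            if (U % 2 == 1)    // constant expression for each instantiation
                x++;
        }
        Unroll<U + 1, N>::run(v, x);  // expand the next "iteration"
    }
};

// Terminating specialization: nothing left to expand when U == N.
template <int N>
struct Unroll<N, N> {
    static void run(int, int&) {}
};

// Same result as the manually unrolled test(): covers u = 0, 1, 2.
int test_unrolled(int v) {
    int x = 0;
    Unroll<0, 3>::run(v, x);
    return x;
}
```

With this, test_unrolled(v) should return 1 for v <= 0 and 0 otherwise, which is the same answer the loop and the manual unroll give. The body is written once, at the cost of moving it into a template.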
The chessboard example (here a simple case: how many knight moves does white have? We do not consider check, pins, or pieces being in the way):

    int knight_square_count(unsigned char* board)
    {
      int count = 0;
      for (int bp=0;bp<64;bp++)
      {
        if (board[bp]==WHITE_KNIGHT)
        {
          if (bp%8>1 && bp/8>0) count++;
          if (bp%8>0 && bp/8>1) count++;
          if (bp%8<6 && bp/8>0) count++;
          if (bp%8<7 && bp/8>1) count++;
          if (bp%8>1 && bp/8<7) count++;
          if (bp%8>0 && bp/8<6) count++;
          if (bp%8<6 && bp/8<7) count++;
          if (bp%8<7 && bp/8<6) count++;
        }
      }
      return count;
    }

In the above situation a manual unroll (with -O3) is more than 400% faster. (I have timed it and it is close to 500%.)

I thought that one of the main ideas of unrolling loops was to let every iteration be optimized "on its own" (without making the source code ugly).

Regards, and thanks for the best (free) compiler

BSc Computer Science
Thorbjørn Martsum

PS: There might also be a reason for things being as they are. In that case I just don't understand why, so please explain.

-- 
           Summary: unroll misses simple elimination - works with manual unroll
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tlm at daimi dot au dot dk
                CC: gcc-bugs at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827