------- Comment #30 from dave at hiauly1 dot hia dot nrc dot ca 2006-07-31 02:09 -------
Subject: Re: [4.2 Regression] FAIL: gcc.c-torture/execute/builtin-bitops-1.c execution, -O3 -fomit-frame-pointeRO

> no, this seems correct to me. In cfglayout mode (that is used in the loop
> optimizations), the unconditional jumps are removed and they are represented
> only implicitly by edges in the cfg. BB 13 is probably a forwarder block
> created by some of the loop shape canonicalization transformations, and its
> single successor is the block containing label 27 (i.e., bb 13 implicitly
> contains (jump_insn (set (pc) (label_ref 27)))

Ok, I've learned something new about RTL.

Looking some more, I think the bug is in the cse2 pass. In particular,
the substitution of reg:SI 138 for reg:SI 176 appears wrong.

Before this pass, the most significant 32 bits of the long long were
arithmetically shifted (ashift) first by 1 and then -1 when i is 0. The
latter shift is equivalent to a shift of 31 because of the modulo nature
of shifting on this target. Thus, reg:SI 176 should be 0 when i is 0.
However, the cse2 substitution or's in the most significant 32-bits of
the original long long into the least significant 32 bits of reg:DI 113.
This results in bit 31 being counted twice.

For some reason, this substitution doesn't happen in my_parityll. There
are some differences in the unrolling in main and my_parityll. For example,
I see this in the output for main:

Loop 2 is simple:
  simple exit 7 -> 8
  number of iterations: (plus:SI (not:SI (reg/v:SI 109 [ i ]))
    (reg:SI 142))
  upper bound: -1

Whereas, for my_parityll:

Loop 1 is simple:
  simple exit 6 -> 7
  number of iterations: (const_int 63 [0x3f])
  upper bound: 63

Dave

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26244
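
For reference, the modulo-shift argument above can be sketched in C. This is only a model under stated assumptions, not the RTL from the dump: the helper names (shl_mod32, carry_into_low, low_word_shr) are hypothetical, and the general shift count ~i is an assumption chosen because it gives the count of -1 described above when i is 0. The point it illustrates is that shifting the high word left by 1 and then by -1 (i.e. 31 modulo 32) pushes every bit out, so the value OR'ed into the low word should be 0 for i == 0.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit left shift on a target that takes
   the shift count modulo 32, so a count of -1 behaves like 31. */
static uint32_t shl_mod32(uint32_t x, int count)
{
    return x << ((unsigned)count & 31u);
}

/* Bits of the high word that carry into the low word when the 64-bit
   value hi:lo is shifted right by i (0 <= i < 32).  Modeled as two
   left shifts: first by 1, then by ~i (an assumed count; ~0 == -1). */
static uint32_t carry_into_low(uint32_t hi, int i)
{
    return shl_mod32(shl_mod32(hi, 1), ~i);
}

/* Low 32 bits of (hi:lo) >> i under the same model. */
static uint32_t low_word_shr(uint32_t hi, uint32_t lo, int i)
{
    return (lo >> i) | carry_into_low(hi, i);
}
```

With i == 0, carry_into_low() returns 0 for any hi, so low_word_shr() leaves lo unchanged. A substitution that OR's the unshifted high word into the low word instead would inject the high bits a second time, which is the double counting described above.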