The code in the attached testcase is taken from povray-3.6.1 and exposes a nasty regression triggered by the new optimized string functions. Note that the expanded RTL of the pov_calloc() function is OK, but a subsequent RTL optimization pass (bbro) puts the basic blocks in the wrong order.
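The attached mem.c is not reproduced in this report. As a rough sketch only, a testcase of this kind might look like the following. The body of pov_calloc() and the helpers all_zero() and check_zeroed() are assumptions made for illustration, not the povray source or the actual attachment; the only things taken from the report are the function name pov_calloc(), the "block" and "actsize" variable names visible in the assembly comments, and the use of memset().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for povray's pov_calloc(): allocate a block and
   zero it with memset(), so that -minline-all-stringops expands the fill
   inline. The real function body is an assumption here. */
static void *pov_calloc(size_t nitems, size_t size)
{
    size_t actsize = nitems * size;
    void *block;

    if (actsize == 0)
        actsize = 1;            /* never ask malloc() for 0 bytes */

    block = malloc(actsize);
    if (block == NULL)
        return NULL;

    memset(block, 0, actsize);  /* the call that gets miscompiled */
    return block;
}

/* Hypothetical helper: returns 1 when all n bytes at p are zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    size_t i;

    assert(p != NULL);
    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: allocate `size` bytes through pov_calloc() and
   report whether the block really came back zeroed (1) or not (0). */
int check_zeroed(size_t size)
{
    unsigned char *p = pov_calloc(1, size);
    int ok;

    if (p == NULL)
        return 0;
    ok = all_zero(p, size);
    free(p);
    return ok;
}
```

A driver for this sketch would call check_zeroed(SIZE) from main() and abort() when it returns 0, with SIZE supplied on the command line via -DSIZE=... as in the commands further down. With a correct compiler the check passes for every size.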
It is evident that %ebx is cleared in BB4 and dies in BB5. This dump is from _.148r.rnreg:

--cut here--
;; Start of basic block 4, registers live: 0 [ax] 1 [dx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Pred edge  3 [40.0%]  (fallthru)
(note:HI 72 29 119 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 119 72 31 4 (parallel [
            (set (reg:SI 3 bx [68])
                (const_int 0 [0x0]))
            (clobber (reg:CC 17 flags))
        ]) 38 {*movsi_xor} (nil)
    (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(note:HI 31 119 89 4 NOTE_INSN_DELETED)

(insn 89 31 33 4 (set (reg:CCZ 17 flags)
        (compare:CCZ (and:SI (reg:SI 0 ax [orig:59 block ] [59])
                (const_int 1 [0x1]))
            (const_int 0 [0x0]))) 286 {testsi_1} (nil)
    (expr_list:REG_DEAD (reg:SI 0 ax [orig:59 block ] [59])
        (nil)))

(jump_insn:HI 33 89 73 4 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 36)
            (pc))) 530 {*jcc_1} (insn_list:REG_DEP_TRUE 32 (nil))
    (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (expr_list:REG_BR_PROB (const_int 9000 [0x2328])
            (nil))))
;; End of basic block 4, registers live: 1 [dx] 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Succ edge  6 [90.0%]
;; Succ edge  5 [10.0%]  (fallthru)

;; Start of basic block 5, registers live: 1 [dx] 3 [bx] 4 [si] 5 [di] 6 [bp] 7 [sp] 20 [frame]
;; Pred edge  4 [10.0%]  (fallthru)
(note:HI 73 33 95 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 95 73 34 5 (set (reg:QI 0 ax)
        (reg:QI 3 bx)) 55 {*movqi_1} (nil)
    (nil))

(insn:HI 34 95 35 5 (parallel [
            (set (mem:QI (reg/f:SI 5 di [orig:67 block ] [67]) [0 S1 A8])
                (reg:QI 0 ax))
            (set (reg/f:SI 5 di [orig:67 block ] [67])
                (plus:SI (reg/f:SI 5 di [orig:67 block ] [67])
                    (const_int 1 [0x1])))
        ]) 720 {*strsetqi_1} (nil)
    (expr_list:REG_DEAD (reg:QI 0 ax)
        (nil)))
--cut here--

However, _.149.bbro renames BB4 and BB5 into BB12 and BB17 respectively, where BB12 can be reached _conditionally_ from BB3.
This produces wrong code for pov_calloc():

--cut here--
        movl    %eax, %esi      #, block
        testl   %eax, %eax      # block
        je      .L4             #,              <<< check for NULL
        movl    %ebx, %edx      # actsize, actsize
        movl    %eax, %edi      # block, block
        cmpl    $3, %ebx        #, actsize      <<< memset check for "< 4"
        ja      .L13            #,              <<< jump taken only for >= 4
        testb   $2, %dl         #, actsize
        jne     .L14            #,              <<< here we go with wrong %ebx
.L9:
        andb    $1, %dl         #, actsize
        jne     .L15            #,              <<< here too.
.L4:
        movl    %esi, %eax      # block, <result>
        addl    $16, %esp       #,
        popl    %ebx            #
        popl    %esi            #
        popl    %edi            #
        popl    %ebp            #
        ret
.L15:
        movl    %ebx, %eax      #,              <<< wrong %ebx moved to %eax
        stosb                                   <<< FUBAR 2.
        movl    %esi, %eax      # block, <result>
        addl    $16, %esp       #,
        popl    %ebx            #
        popl    %esi            #
        popl    %edi            #
        popl    %ebp            #
        ret
.L14:
        movl    %ebx, %eax      #,              <<< wrong %ebx moved to %eax
        stosw                                   <<< FUBAR 1.
        andb    $1, %dl         #, actsize
        je      .L4             #,
        jmp     .L15            #
.L13:
        xorl    %ebx, %ebx      # tmp68         <<< %ebx is cleared here!!
        testb   $1, %al         #, block
        jne     .L16            #,
.L7:
        testl   $2, %edi        #, block
        .p2align 4,,5
        jne     .L17            #,
.L8:
        movl    %edx, %ecx      # actsize, tmp71
        shrl    $2, %ecx        #, tmp71
        movl    %ebx, %eax      # tmp68,
        rep stosl
        testb   $2, %dl         #, actsize
        je      .L9             #,
        jmp     .L14            #
.L16:
        movl    %ebx, %eax      #,              <<< this part is OK, but only reached for size >= 4
        stosb
        subl    $1, %edx        #, actsize
        jmp     .L7             #
.L17:
        movl    %ebx, %eax      #,              <<< this part is OK, but only reached for size >= 4
        stosw
        subl    $2, %edx        #, actsize
        jmp     .L8             #
--cut here--

This can be confirmed by running the testcase:

> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=1 mem.c
> ./a.out
Aborted
> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=2 mem.c
> ./a.out
Aborted
> gcc -O2 -m32 -march=pentium4 -minline-all-stringops -DSIZE=4 mem.c
> ./a.out
> echo $?
0

--
Summary: Wrong code with optimized memset() (possible bug in RTL bbro optimizer)
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: major
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ubizjak at gmail dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30213