Hello gcc-hackers,
Tracking down a gcse bug while unrolling a loop where the count is known to be one, I've narrowed the problem down to the actions of commit_edge_insertions and bypass_conditional_jumps, and I could use a little help in appreciating the reasoning behind what they're /trying/ to do. The bug is similar or perhaps related to PR/9974, see http://gcc.gnu.org/ml/gcc-patches/2003-03/msg02162.html for the explanation of that one; I also have a non-cc0 target, and a SET immediately prior to a conditional jump is being assumed to be a CC-setter. But the reg_killed_on_edge patch that solved that one isn't helping me in this case. Here's the testcase: ----------------------------------<snip!>---------------------------------- struct Peripheral { int index; }; struct PeripheralManager { struct Peripheral *activePeripherals[1]; }; struct PeripheralManager *manager; int RegisterPeripheral (struct Peripheral *periph); int RegisterPeripheral (struct Peripheral *periph) { int rc = 0; if ((0 != periph) && (-1 == periph->index)) { unsigned i = 0; while ((i < 1) && (0 != manager->activePeripherals [i])) { i++; } if (i < 1) { manager->activePeripherals [i] = periph; periph->index = (int) i; rc = 1; } } return rc; } ----------------------------------<snip!>---------------------------------- When compiled at O2, gcc optimises away the entire body of the function and just returns zero. This happens during the first gcse pass, and it appears to be because of bad code movement on an edge during cjump bypassing. What happens is that the "i++" from the loop body is always executed, because it gets hoisted up past the "0 != manager->activePeripherals [i]" test at the loop top; later, gcc realises that since i is always at least 1, the "if (i < NUM_ARRAY_ELEMENTS (manager->activePeripherals))" clause can never happen and discards it; as a result it decides i is unused and discards the entire loop body as well. In the code below[*], BB#2 initialises the loop counter (r74) to zero, loads the variable 'manager' into r85 and r75, then loads up r89 from (mem r85) to get the first entry of the activePeripherals[] array for testing at the top of the loop in the entry condition. If r89 fails the non-zero test in the jump insn 100, it exits immediately to BB#6, which performs the 'if' test after the loop. Otherwise it falls through to BB#4. BB#4 is the loop body. It increments the loop counter r74 and tests if the result is non-zero, which is equivalent to saying '< 1' with unsigned, where '1' is the loop count; if so, it exits the loop to BB#6. Otherwise, it falls into BB#5, which tests the second part of the loop condition, the activePeripherals[] array entry, falling out to BB#6 if it fails, or returning to BB#4 if not. (Oddly enough it hasn't advanced r75, the pointer to the array, even though it has advanced the index number r74; then again, it does know the size of the array is 1 and the loop will not iterate). Anyway this looks reasonable enough until gcse comes along. Near the end of gcse_main, one_cprop_pass is called one final time, but for the first time with the flag set that permits it to modify jumps. This causes one_cprop_pass to call bypass_conditional_jumps, and that's how we get into trouble. Initially, we've got a flow graph that looks something like this: |=============| |BB#2 | | i = 0 | | | |if arr[i]==0:+--+ |=============| | | | | | V | +->|=============| | | |BB#4 | | | | i++ | | | | if i!=0:+--+ | | | | | |=============| | | | | | | | | V | | |=============| | | |BB#5 | | | | | | +--+:if arr[i]!=0| | | | | |=============| | | | | | V | |=============|<-+ |BB#6 | | | | if i!=0:+--+ | | | |=============| | | | | | V | |=============| | |BB#7 | | | | | |arr[i]=periph| | | | | |=============| | | | | | V | |=============|<-+ |BB#8 | | | | | | | |=============| where 'arr' stands for 'manager->activePeripherals'. The first thing that happens in bypass_conditional_jumps is that it decides that BB#4 can be bypassed when coming from BB#2... ----------------------------------<snip!>---------------------------------- Replacing insn 100 by jump 115 JUMP-BYPASS: Proved reg 74 in jump_insn 22 equals constant (const_int 0 [0x0]) Bypass edge from 2->4 to 6 ----------------------------------<snip!>---------------------------------- ... so we end up with this RTL: ----------------------------------<snip!>---------------------------------- ;; Start of basic block 2, registers live: (nil) (note 81 14 18 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 18 81 95 2 0x18d50390 (set (reg/v:SI 74) (const_int 0 [0x0])) 25 {movsi} (nil) (expr_list:REG_EQUAL (const_int 0 [0x0]) (nil))) (insn 95 18 113 2 0x18d50390 (set (reg:SI 85) (mem/f:SI (symbol_ref/v:SI ("manager")) [5 manager+0 S4 A32])) 25 {movsi} (nil) (nil)) (insn 113 95 108 2 0x0 (set (reg:SI 90) (reg:SI 85)) -1 (nil) (nil)) (insn 108 113 99 2 0x0 (set (reg/s/f:SI 75) (reg:SI 85)) -1 (nil) (nil)) (insn 99 108 114 2 0x18d50390 (set (reg:SI 89) (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25 {movsi} (nil) (nil)) (insn 114 99 115 2 0x0 (set (reg:SI 91) (reg:SI 89)) -1 (nil) (nil)) (jump_insn 115 114 116 2 0x0 (set (pc) (label_ref 47)) -1 (nil) (nil)) ;; End of basic block 2, registers live: (nil) ----------------------------------<snip!>---------------------------------- .. and a graph like this: |=============| |BB#2 | | i = 0 | | | | | |=============| | +---------+ | +->|=============| | | |BB#4 | | | | i++ | | | | if i!=0:+--+ | | | | | |=============| | | | | | | | | V | | |=============| | | |BB#5 | | | | | | +--+:if arr[i]!=0| | | | | |=============| | | | | | V | |=============|<-+ |BB#6 | | | | if i!=0:+--+ | | | |=============| | | | | | V | |=============| | |BB#7 | | | | | |arr[i]=periph| | | | | |=============| | | | | | V | |=============|<-+ |BB#8 | | | | | | | |=============| Which is fine, but then it calls commit_edge_insertions, and something goes wrong. For some reason, the insn that increments i from BB#4 gets moved into the end of BB#2 just before the jump: ------------------------------------------------------------------------ @@ -69,11 +69,16 @@ (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25 {mov si} (nil) (nil)) -(insn 114 99 115 2 0x0 (set (reg:SI 91) +(insn 114 99 117 2 0x0 (set (reg:SI 91) (reg:SI 89)) -1 (nil) (nil)) -(jump_insn 115 114 116 2 0x0 (set (pc) +(insn 117 114 115 2 0x0 (set (reg/v:SI 74) + (plus:SI (reg/v:SI 74) + (const_int 1 [0x1]))) -1 (nil) + (nil)) + +(jump_insn 115 117 116 2 0x0 (set (pc) (label_ref 47)) -1 (nil) (nil)) ;; End of basic block 2, registers live: ------------------------------------------------------------------------ I've tracked this down enough to understand that bypass_block thinks that insn 39 is a CC setter for the conditional jump insn 22, and that the new insn 117 is a duplicate of insn 39 that has been placed on the cfg edge that now points from BB#2 to BB#6; and that it believes this cc setter is required in order for the conditional jump at BB#6 which is a duplicate of the one at BB#4 to still be valid. What I don't understand, though, is why it gets lifted into BB#2. It is happening in commit_one_edge_insertion, at the comment that reads: /* If the source has one successor and the edge is not abnormal, insert there. Except for the entry block. */ which is where the stray insn 117 gets inserted into the tail end of BB#2. But I can't quite grok what the actual /bug/ is here. Should that edge not have been chosen for bypassing? Or is the decision to bypass correct, but the placement of the hoisted insn wrong? Is the bug perhaps because the bypass code notes that there are two code paths from BB#2 to BB#7, notices that the indirect one depends on a condition that it knows is true, but doesn't realise that that doesn't mean they are equivalent to each other? That is, is it because after the replacement of the cjump insn 100 with the non-conditional jump insn 115, there are effectively *two* edges from BB#2 to BB#7, only /one/ of which should have the SET insn moved onto it, and something in gcc is assuming that any two edges with the same source and destination must be actually identical, an assumption which can be violated in the presence of copy instructions inserted on one edge? Is there anything in the cfg code that would try and merge edges in this fashion? cheers, DaveK [*] - RTL before bad transformation ----------------------------------<snip!>---------------------------------- ;; Start of basic block 2, registers live: (nil) (note 81 14 18 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 18 81 95 2 0x18d50390 (set (reg/v:SI 74) (const_int 0 [0x0])) 25 {movsi} (nil) (expr_list:REG_EQUAL (const_int 0 [0x0]) (nil))) (insn 95 18 108 2 0x18d50390 (set (reg:SI 85) (mem/f:SI (symbol_ref/v:SI ("manager")) [5 manager+0 S4 A32])) 25 {movsi} (nil) (nil)) (insn 108 95 99 2 0x0 (set (reg/s/f:SI 75) (reg:SI 85)) -1 (nil) (nil)) (insn 99 108 100 2 0x18d50390 (set (reg:SI 89) (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25 {movsi} (nil) (nil)) (jump_insn 100 99 20 2 0x18d50390 (set (pc) (if_then_else (eq:SI (const_int 0 [0x0]) (reg:SI 89)) (label_ref 47) (pc))) 45 {int_cond_branch} (nil) (expr_list:REG_BR_PRED (concat (const_int 13 [0xd]) (const_int 3600 [0xe10])) (nil))) ;; End of basic block 2, registers live: (nil) (note 20 100 41 NOTE_INSN_LOOP_BEG) ;; Start of basic block 4, registers live: (nil) (code_label 41 20 82 4 7 "" [1 uses]) (note 82 41 39 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 39 82 40 4 0x18d50390 (set (reg/v:SI 74) (plus:SI (reg/v:SI 74) (const_int 1 [0x1]))) 2 {addsi3} (nil) (nil)) (note 40 39 104 4 NOTE_INSN_LOOP_CONT) (note 104 40 22 4 NOTE_INSN_LOOP_VTOP) (jump_insn 22 104 84 4 0x18d50390 (set (pc) (if_then_else (ne:SI (const_int 0 [0x0]) (reg/v:SI 74)) (label_ref 47) (pc))) 46 {int_cond_branch_rev} (nil) (nil)) ;; End of basic block 4, registers live: (nil) ;; Start of basic block 5, registers live: (nil) (note 84 22 27 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (insn 27 84 28 5 0x18d50390 (set (reg/s:SI 79) (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25 {movsi} (nil) (nil)) (jump_insn 28 27 46 5 0x18d50390 (set (pc) (if_then_else (ne:SI (const_int 0 [0x0]) (reg/s:SI 79)) (label_ref 41) (pc))) 46 {int_cond_branch_rev} (nil) (nil)) ;; End of basic block 5, registers live: (nil) (note 46 28 47 NOTE_INSN_LOOP_END) ;; Start of basic block 6, registers live: (nil) (code_label 47 46 88 6 4 "" [2 uses]) (note 88 47 49 6 [bb 6] NOTE_INSN_BASIC_BLOCK) (jump_insn 49 88 89 6 0x18d50390 (set (pc) (if_then_else (ne:SI (const_int 0 [0x0]) (reg/v:SI 74)) (label_ref 64) (pc))) 46 {int_cond_branch_rev} (nil) (nil)) ;; End of basic block 6, registers live: (nil) ;; Start of basic block 7, registers live: (nil) (note 89 49 53 7 [bb 7] NOTE_INSN_BASIC_BLOCK) -- Can't think of a witty .sigline today....