GCSE again: bypass_conditional_jumps -vs- commit_edge_insertions - problem with ccsetters?

Dave Korn Tue, 31 Oct 2006 11:37:20 -0800


    Hello gcc-hackers,


  Tracking down a gcse bug while unrolling a loop where the count is known to
be one, I've narrowed the problem down to the actions of
commit_edge_insertions and bypass_conditional_jumps, and I could use a little
help in appreciating the reasoning behind what they're /trying/ to do.

  The bug is similar or perhaps related to PR/9974, see
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg02162.html for the explanation of
that one; I also have a non-cc0 target, and a SET immediately prior to a
conditional jump is being assumed to be a CC-setter.  But the
reg_killed_on_edge patch that solved that one isn't helping me in this case.

  Here's the testcase:

----------------------------------<snip!>----------------------------------

struct Peripheral {
        int index;
};

struct PeripheralManager {
        struct Peripheral *activePeripherals[1];
};

struct PeripheralManager *manager;

int RegisterPeripheral (struct Peripheral *periph);
int RegisterPeripheral (struct Peripheral *periph)
{
    int rc = 0;

    if ((0 != periph) && (-1 == periph->index))
    {
        unsigned    i = 0;

        while ((i < 1)
        &&     (0 != manager->activePeripherals [i]))
        {
            i++;
        }

        if (i < 1)
        {
            manager->activePeripherals [i] = periph;
            periph->index = (int) i;

            rc = 1;
        }
    }
    return  rc;
}

----------------------------------<snip!>----------------------------------

  When compiled at O2, gcc optimises away the entire body of the function and
just returns zero.  This happens during the first gcse pass, and it appears to
be because of bad code movement on an edge during cjump bypassing.  What
happens is that the "i++" from the loop body is always executed, because it
gets hoisted up past the "0 != manager->activePeripherals [i]" test at the
loop top; later, gcc realises that since i is always at least 1, the "if (i <
NUM_ARRAY_ELEMENTS (manager->activePeripherals))" clause can never happen and
discards it; as a result it decides i is unused and discards the entire loop
body as well.

  In the code below[*], BB#2 initialises the loop counter (r74) to zero, loads
the variable 'manager' into r85 and r75, then loads up r89 from (mem r85) to
get the first entry of the activePeripherals[] array for testing at the top of
the loop in the entry condition.  If r89 fails the non-zero test in the jump
insn 100, it exits immediately to BB#6, which performs the 'if' test after the
loop.  Otherwise it falls through to BB#4.

  BB#4 is the loop body.  It increments the loop counter r74 and tests if the
result is non-zero, which is equivalent to saying '< 1' with unsigned, where
'1' is the loop count; if so, it exits the loop to BB#6.  Otherwise, it falls
into BB#5, which tests the second part of the loop condition, the
activePeripherals[] array entry, falling out to BB#6 if it fails,  or
returning to BB#4 if not.  (Oddly enough it hasn't advanced r75, the pointer
to the array, even though it has advanced the index number r74; then again, it
does know the size of the array is 1 and the loop will not iterate).

  Anyway this looks reasonable enough until gcse comes along.  Near the end of
gcse_main, one_cprop_pass is called one final time, but for the first time
with the flag set that permits it to modify jumps.  This causes one_cprop_pass
to call bypass_conditional_jumps, and that's how we get into trouble.

  Initially, we've got a flow graph that looks something like this:


     |=============|
     |BB#2         |
     |  i = 0      |
     |             |
     |if arr[i]==0:+--+
     |=============|  |
            |         |
            |         |
            V         |
  +->|=============|  |
  |  |BB#4         |  |
  |  |  i++        |  |
  |  |     if i!=0:+--+
  |  |             |  |
  |  |=============|  |
  |         |         |
  |         |         |
  |         V         |
  |  |=============|  |
  |  |BB#5         |  |
  |  |             |  |
  +--+:if arr[i]!=0|  |
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|<-+
     |BB#6         |
     |             |
     |     if i!=0:+--+
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|  |
     |BB#7         |  |
     |             |  |
     |arr[i]=periph|  |
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|<-+
     |BB#8         |
     |             |
     |             |
     |             |
     |=============|

  where 'arr' stands for 'manager->activePeripherals'.  The first thing that
happens in bypass_conditional_jumps is that it decides that BB#4 can be
bypassed when coming from BB#2...

----------------------------------<snip!>----------------------------------
Replacing insn 100 by jump 115
JUMP-BYPASS: Proved reg 74 in jump_insn 22 equals constant (const_int 0 [0x0])
Bypass edge from 2->4 to 6
----------------------------------<snip!>----------------------------------

... so we end up with this RTL:

----------------------------------<snip!>----------------------------------
;; Start of basic block 2, registers live: (nil)
(note 81 14 18 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 18 81 95 2 0x18d50390 (set (reg/v:SI 74)
        (const_int 0 [0x0])) 25 {movsi} (nil)
    (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

(insn 95 18 113 2 0x18d50390 (set (reg:SI 85)
        (mem/f:SI (symbol_ref/v:SI ("manager")) [5 manager+0 S4 A32])) 25
{movsi} (nil)
    (nil))

(insn 113 95 108 2 0x0 (set (reg:SI 90)
        (reg:SI 85)) -1 (nil)
    (nil))

(insn 108 113 99 2 0x0 (set (reg/s/f:SI 75)
        (reg:SI 85)) -1 (nil)
    (nil))

(insn 99 108 114 2 0x18d50390 (set (reg:SI 89)
        (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25
{movsi} (nil)
    (nil))

(insn 114 99 115 2 0x0 (set (reg:SI 91)
        (reg:SI 89)) -1 (nil)
    (nil))

(jump_insn 115 114 116 2 0x0 (set (pc)
        (label_ref 47)) -1 (nil)
    (nil))
;; End of basic block 2, registers live:
 (nil)
----------------------------------<snip!>----------------------------------

.. and a graph like this:

     |=============|
     |BB#2         |
     |  i = 0      |
     |             |
     |             |   
     |=============|   
            |          
            +---------+
                      |
  +->|=============|  |
  |  |BB#4         |  |
  |  |  i++        |  |
  |  |     if i!=0:+--+
  |  |             |  |
  |  |=============|  |
  |         |         |
  |         |         |
  |         V         |
  |  |=============|  |
  |  |BB#5         |  |
  |  |             |  |
  +--+:if arr[i]!=0|  |
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|<-+
     |BB#6         |
     |             |
     |     if i!=0:+--+
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|  |
     |BB#7         |  |
     |             |  |
     |arr[i]=periph|  |
     |             |  |
     |=============|  |
            |         |
            |         |
            V         |
     |=============|<-+
     |BB#8         |
     |             |
     |             |
     |             |
     |=============|


  Which is fine, but then it calls commit_edge_insertions, and something goes
wrong.  For some reason, the insn that increments i from BB#4 gets moved into
the end of BB#2 just before the jump:


------------------------------------------------------------------------
@@ -69,11 +69,16 @@
         (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25
{mov
si} (nil)
     (nil))

-(insn 114 99 115 2 0x0 (set (reg:SI 91)
+(insn 114 99 117 2 0x0 (set (reg:SI 91)
         (reg:SI 89)) -1 (nil)
     (nil))

-(jump_insn 115 114 116 2 0x0 (set (pc)
+(insn 117 114 115 2 0x0 (set (reg/v:SI 74)
+        (plus:SI (reg/v:SI 74)
+            (const_int 1 [0x1]))) -1 (nil)
+    (nil))
+
+(jump_insn 115 117 116 2 0x0 (set (pc)
         (label_ref 47)) -1 (nil)
     (nil))
 ;; End of basic block 2, registers live:
------------------------------------------------------------------------

  I've tracked this down enough to understand that bypass_block thinks that
insn 39 is a CC setter for the conditional jump insn 22, and that the new insn
117 is a duplicate of insn 39 that has been placed on the cfg edge that now
points from BB#2 to BB#6; and that it believes this cc setter is required in
order for the conditional jump at BB#6 which is a duplicate of the one at BB#4
to still be valid.

  What I don't understand, though, is why it gets lifted into BB#2.  It is
happening in commit_one_edge_insertion, at the comment that reads:

      /* If the source has one successor and the edge is not abnormal,
         insert there.  Except for the entry block.  */

which is where the stray insn 117 gets inserted into the tail end of BB#2.
But I can't quite grok what the actual /bug/ is here.  Should that edge not
have been chosen for bypassing?  Or is the decision to bypass correct, but the
placement of the hoisted insn wrong?

  Is the bug perhaps because the bypass code notes that there are two code
paths from BB#2 to BB#7, notices that the indirect one depends on a condition
that it knows is true, but doesn't realise that that doesn't mean they are
equivalent to each other?

  That is, is it because after the replacement of the cjump insn 100 with the
non-conditional jump insn 115, there are effectively *two* edges from BB#2 to
BB#7, only /one/ of which should have the SET insn moved onto it, and
something in gcc is assuming that any two edges with the same source and
destination must be actually identical, an assumption which can be violated in
the presence of copy instructions inserted on one edge?  Is there anything in
the cfg code that would try and merge edges in this fashion?

    cheers,
      DaveK

[*] - RTL before bad transformation
----------------------------------<snip!>----------------------------------

;; Start of basic block 2, registers live: (nil)
(note 81 14 18 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 18 81 95 2 0x18d50390 (set (reg/v:SI 74)
        (const_int 0 [0x0])) 25 {movsi} (nil)
    (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

(insn 95 18 108 2 0x18d50390 (set (reg:SI 85)
        (mem/f:SI (symbol_ref/v:SI ("manager")) [5 manager+0 S4 A32])) 25
{movsi} (nil)
    (nil))

(insn 108 95 99 2 0x0 (set (reg/s/f:SI 75)
        (reg:SI 85)) -1 (nil)
    (nil))

(insn 99 108 100 2 0x18d50390 (set (reg:SI 89)
        (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25
{movsi} (nil)
    (nil))

(jump_insn 100 99 20 2 0x18d50390 (set (pc)
        (if_then_else (eq:SI (const_int 0 [0x0])
                (reg:SI 89))
            (label_ref 47)
            (pc))) 45 {int_cond_branch} (nil)
    (expr_list:REG_BR_PRED (concat (const_int 13 [0xd])
            (const_int 3600 [0xe10]))
        (nil)))
;; End of basic block 2, registers live:
 (nil)

(note 20 100 41 NOTE_INSN_LOOP_BEG)

;; Start of basic block 4, registers live: (nil)
(code_label 41 20 82 4 7 "" [1 uses])

(note 82 41 39 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 39 82 40 4 0x18d50390 (set (reg/v:SI 74)
        (plus:SI (reg/v:SI 74)
            (const_int 1 [0x1]))) 2 {addsi3} (nil)
    (nil))

(note 40 39 104 4 NOTE_INSN_LOOP_CONT)

(note 104 40 22 4 NOTE_INSN_LOOP_VTOP)

(jump_insn 22 104 84 4 0x18d50390 (set (pc)
        (if_then_else (ne:SI (const_int 0 [0x0])
                (reg/v:SI 74))
            (label_ref 47)
            (pc))) 46 {int_cond_branch_rev} (nil)
    (nil))
;; End of basic block 4, registers live:
 (nil)

;; Start of basic block 5, registers live: (nil)
(note 84 22 27 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 27 84 28 5 0x18d50390 (set (reg/s:SI 79)
        (mem/s:SI (reg:SI 85) [2 <variable>.activePeripherals S4 A32])) 25
{movsi} (nil)
    (nil))

(jump_insn 28 27 46 5 0x18d50390 (set (pc)
        (if_then_else (ne:SI (const_int 0 [0x0])
                (reg/s:SI 79))
            (label_ref 41)
            (pc))) 46 {int_cond_branch_rev} (nil)
    (nil))
;; End of basic block 5, registers live:
 (nil)

(note 46 28 47 NOTE_INSN_LOOP_END)

;; Start of basic block 6, registers live: (nil)
(code_label 47 46 88 6 4 "" [2 uses])

(note 88 47 49 6 [bb 6] NOTE_INSN_BASIC_BLOCK)

(jump_insn 49 88 89 6 0x18d50390 (set (pc)
        (if_then_else (ne:SI (const_int 0 [0x0])
                (reg/v:SI 74))
            (label_ref 64)
            (pc))) 46 {int_cond_branch_rev} (nil)
    (nil))
;; End of basic block 6, registers live:
 (nil)

;; Start of basic block 7, registers live: (nil)
(note 89 49 53 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

-- 
Can't think of a witty .sigline today....

GCSE again: bypass_conditional_jumps -vs- commit_edge_insertions - problem with ccsetters?

Reply via email to