On Wed, 2020-02-05 at 13:30 +0000, Richard Sandiford wrote:
> Jeff Law <l...@redhat.com> writes:
> > Richard & Segher, if y'all could check my analysis here, it'd be
> > appreciated.
> > 
> > pr90275 is a P2 regression that is only triggering on ARM.  David's
> > testcase in c#1 is the best for this problem as it doesn't require
> > magic flags like -fno-dce to trigger.
> > 
> > The block in question:
> > 
> > > (code_label 89 88 90 24 15 (nil) [0 uses])
> > > (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> > > (insn 97 90 98 24 (parallel [
> > >             (set (reg:CC 100 cc)
> > >                 (compare:CC (reg:SI 131 [ d_lsm.21 ])
> > >                     (const_int 0 [0])))
> > >             (set (reg:SI 135 [ d_lsm.21 ])
> > >                 (reg:SI 131 [ d_lsm.21 ]))
> > >         ]) "pr90275.c":21:45 248 {*movsi_compare0}
> > >      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 98 97 151 24 (set (reg:SI 136 [+4 ])
> > >         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 
> > > {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])
> > >         (expr_list:REG_DEAD (reg:CC 100 cc)
> > >             (nil))))
> > > (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])
> > >         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])
> > >         (nil)))
> > > (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])
> > >         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}
> > >      (expr_list:REG_DEAD (reg:SI 136 [+4 ])
> > >         (nil)))
> > > 
> > insns 97 and 151 are the most important.
> > 
> > We process insn 97 which creates an equivalency between r135 and r131. 
> > This is expressed by putting both on on the "same_value" chain
> > (table_elt->{next,prev}_same_value).
> > 
> > When we put the REGs on the chain we'll set REG_QTY to a positive
> > number which indicates their values are valid.
> > 
> > We continue processing insns forward and run into insn 151 which is a
> > self-copy.
> > 
> > First CSE will invalidate r131 (because its set).  Invalidation is
> > accomplished by setting REG_QTY for r131 to a negative value.  It does
> > not remove r131 from the same value chains.
> > 
> > Then CSE will call insert_regs for r131.  The qty is not valid, so we
> > get into this code:
> > 
> > >      if (modified || ! qty_valid)
> > >         {
> > >           if (classp)
> > >             for (classp = classp->first_same_value;
> > >                  classp != 0;
> > >                  classp = classp->next_same_value)
> > >               if (REG_P (classp->exp)
> > >                   && GET_MODE (classp->exp) == GET_MODE (x))
> > >                 {
> > >                   unsigned c_regno = REGNO (classp->exp);
> > > 
> > >                   gcc_assert (REGNO_QTY_VALID_P (c_regno));
> > > [ ... ]
> > 
> > So we walk the chain of same values for r131.  WHen walking we run into
> > r131 itself.  Since r131 has been invalidated  we trip the assert.
> > 
> > 
> > The fix is pretty simple.  We just arrange to stop processing insns
> > that are nop reg->reg copies much like we already do for mem->mem
> > copies and (set (pc) (pc)).
> > 
> > This has bootstrapped and regression tested on x86_64.  I've also
> > verified it fixes the testcase in c#1 of pr90275, the test in pr93125
> > and pr92388.  Interestingly enough I couldn't trigger the original
> > testcase in 90275, but I'm confident this is ultimately all the same
> > problem.
> 
> This looks similar to the infamous (to me):
> 
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html
> 
> which had to be reverted because it broke powerpc64 bootstrap.
> The problem was that n_sets is misleading for calls:
> 
>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html
> 
> That's easy to fix (and I have a fix).  But given the damage this caused
> last time, I think it's probably best left to GCC 11.
Yea, it's closely related.  In your case you need to effectively ignore
the nop insn to get the optimization you want.  In mine that nop insn
causes an ICE.

I think we could take your cse bits + adding a !CALL_P separately from
the simplify-rtx stuff which Segher objected to.  THat'd likely solve
the ARM ICEs and take you a tiny step forward on optimizing that SVE
case.  Thoughts?

Jeff

Reply via email to