On Mon, 27 Jun 2016, Richard Biener wrote:

> On Wed, 15 Jun 2016, Richard Sandiford wrote:
> 
> > Richard Biener <rguent...@suse.de> writes:
> > > With the proposed cost change for vector construction we will end up
> > > vectorizing the testcase in PR68961 again (on x86_64 and likely
> > > on ppc64le as well after that target gets adjustments).  Currently
> > > we can't optimize that away again by noticing the direct overlap of
> > > argument and return registers.  The obstacle is
> > >
> > > (insn 7 4 8 2 (set (reg:V2DF 93)
> > >         (vec_concat:V2DF (reg/v:DF 91 [ a ])
> > >             (reg/v:DF 92 [ aa ]))) 
> > > ...
> > > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> > >         (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> > > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> > >         (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> > >
> > > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> > >
> > > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> > > (easy fix below).
> > >
> > > Then combine doesn't like to simplify the multi-use (it tries some
> > > parallel it seems).  So I went to forwprop which eventually manages
> > > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> > > because it is not a constant.  Thus I allow arbitrary simplification
> > > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> > > to be a magic flag to tell it to restrict to the case where all
> > > uses can be simplified or so, nor to restrict simplifications to a REG.
> > > But I don't see any undesirable simplifications of (subreg 
> > > ([vec_]concat)).
> > 
> > Adding that as a special case to propagate_rtx feels like a hack though :-)
> > I think:
> > 
> > > Index: gcc/fwprop.c
> > > ===================================================================
> > > *** gcc/fwprop.c  (revision 237286)
> > > --- gcc/fwprop.c  (working copy)
> > > *************** propagate_rtx (rtx x, machine_mode mode,
> > > *** 664,670 ****
> > >         || (GET_CODE (new_rtx) == SUBREG
> > >             && REG_P (SUBREG_REG (new_rtx))
> > >             && (GET_MODE_SIZE (mode)
> > > !               <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))))
> > >       flags |= PR_CAN_APPEAR;
> > >     if (!varying_mem_p (new_rtx))
> > >       flags |= PR_HANDLE_MEM;
> > > --- 664,673 ----
> > >         || (GET_CODE (new_rtx) == SUBREG
> > >             && REG_P (SUBREG_REG (new_rtx))
> > >             && (GET_MODE_SIZE (mode)
> > > !               <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))
> > > !       || ((GET_CODE (new_rtx) == VEC_CONCAT
> > > !            || GET_CODE (new_rtx) == CONCAT)
> > > !           && GET_CODE (x) == SUBREG))
> > >       flags |= PR_CAN_APPEAR;
> > >     if (!varying_mem_p (new_rtx))
> > >       flags |= PR_HANDLE_MEM;
> > 
> > ...this if statement should fundamentally only test new_rtx.
> > E.g. we'd want the same thing for any SUBREG inside X.
> > 
> > How about changing:
> > 
> >   /* The replacement we made so far is valid, if all of the recursive
> >      replacements were valid, or we could simplify everything to
> >      a constant.  */
> >   return valid_ops || can_appear || CONSTANT_P (tem);
> > 
> > so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> > I suppose that's likely to increase register pressure though,
> > if only some uses of new_rtx simplify.  (There again, requiring all
> > uses to be replaceable could make hot code the hostage of cold code.)
> 
> Yes, my fear was about a register pressure increase in the case where
> not all uses can be replaced (fwprop doesn't seem to have code to
> verify or require that).
> 
> I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG
> case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the
> new_rtx == [VEC_]CONCAT case for example.

Btw, I have installed the simplify-rtx.c part now.

Richard.

2016-06-29  Richard Biener  <rguent...@suse.de>

        PR rtl-optimization/68961
        * simplify-rtx.c (simplify_subreg): Handle VEC_CONCAT like CONCAT.
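
For readers following along, here is a minimal, self-contained sketch of the
idea behind that ChangeLog entry: a subreg that covers exactly one operand of
a (vec_)concat can be replaced by that operand.  It is a toy model, not the
code that was committed.  The toy_rtx type, the toy_code enum and
toy_simplify_subreg_of_concat are hypothetical names invented for this
illustration, and the sketch only handles the exact-match case, assuming each
operand carries its size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for RTL expressions.  These are NOT GCC's rtx types;
   every name here is hypothetical and exists only for illustration.  */
enum toy_code { TOY_REG, TOY_VEC_CONCAT };

struct toy_rtx
{
  enum toy_code code;
  unsigned size;                /* Size of the value in bytes.  */
  int regno;                    /* Meaningful for TOY_REG only.  */
  const struct toy_rtx *op[2];  /* Meaningful for TOY_VEC_CONCAT only.  */
};

/* If X is a concat and a subreg of OUTER_SIZE bytes at offset BYTE covers
   exactly one of its two operands, return that operand.  Otherwise return
   NULL, meaning "no simplification found".  */
const struct toy_rtx *
toy_simplify_subreg_of_concat (const struct toy_rtx *x,
                               unsigned outer_size, unsigned byte)
{
  assert (x != NULL);

  if (x->code != TOY_VEC_CONCAT)
    return NULL;

  unsigned offset = 0;
  for (int i = 0; i < 2; i++)
    {
      const struct toy_rtx *part = x->op[i];
      /* The subreg must start where the operand starts and have the
         same size; partial or straddling accesses are left alone.  */
      if (byte == offset && outer_size == part->size)
        return part;
      offset += part->size;
    }
  return NULL;
}
```

In terms of the RTL quoted earlier in the thread, modelling (reg:V2DF 93) as
a 16-byte concat of two 8-byte registers 91 and 92, an 8-byte subreg at
offset 0 maps to register 91 and one at offset 8 maps to register 92.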
