On Wed, 15 Jun 2016, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > With the proposed cost change for vector construction we will end up
> > vectorizing the testcase in PR68961 again (on x86_64 and likely
> > on ppc64le as well after that target gets adjustments).  Currently
> > we can't optimize that away again noticing the direct overlap of
> > argument and return registers.  The obstacle is
> >
> > (insn 7 4 8 2 (set (reg:V2DF 93)
> >         (vec_concat:V2DF (reg/v:DF 91 [ a ])
> >             (reg/v:DF 92 [ aa ]))) 
> > ...
> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> >         (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> >         (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> >
> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> >
> > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> > (easy fix below).
> >
> > Then combine doesn't like to simplify the multi-use (it tries some
> > parallel it seems).  So I went to forwprop which eventually manages
> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> > because it is not a constant.  Thus I allow arbitrary simplification
> > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> > to be a magic flag to tell it to restrict to the case where all
> > uses can be simplified or so, nor to restrict simplifications to a REG.
> > But I don't see any undesirable simplifications of (subreg 
> > ([vec_]concat)).
> 
> Adding that as a special case to propagate_rtx feels like a hack though :-)
> I think:
> 
> > Index: gcc/fwprop.c
> > ===================================================================
> > *** gcc/fwprop.c    (revision 237286)
> > --- gcc/fwprop.c    (working copy)
> > *************** propagate_rtx (rtx x, machine_mode mode,
> > *** 664,670 ****
> >         || (GET_CODE (new_rtx) == SUBREG
> >       && REG_P (SUBREG_REG (new_rtx))
> >       && (GET_MODE_SIZE (mode)
> > !         <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))))
> >       flags |= PR_CAN_APPEAR;
> >     if (!varying_mem_p (new_rtx))
> >       flags |= PR_HANDLE_MEM;
> > --- 664,673 ----
> >         || (GET_CODE (new_rtx) == SUBREG
> >       && REG_P (SUBREG_REG (new_rtx))
> >       && (GET_MODE_SIZE (mode)
> > !         <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))
> > !       || ((GET_CODE (new_rtx) == VEC_CONCAT
> > !      || GET_CODE (new_rtx) == CONCAT)
> > !     && GET_CODE (x) == SUBREG))
> >       flags |= PR_CAN_APPEAR;
> >     if (!varying_mem_p (new_rtx))
> >       flags |= PR_HANDLE_MEM;
> 
> ...this if statement should fundamentally only test new_rtx.
> E.g. we'd want the same thing for any SUBREG inside X.
> 
> How about changing:
> 
>   /* The replacement we made so far is valid, if all of the recursive
>      replacements were valid, or we could simplify everything to
>      a constant.  */
>   return valid_ops || can_appear || CONSTANT_P (tem);
> 
> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> I suppose that's likely to increase register pressure though,
> if only some uses of new_rtx simplify.  (There again, requiring all
> uses to be replaceable could make hot code the hostage of cold code.)

Yes, my fear was increased register pressure when not all uses can be
replaced (fwprop doesn't seem to have code to verify or require
that).

I can drop the GET_CODE (x) == SUBREG check and instead add a PR_REG
flag, so that REG_P (tem) && !HARD_REGISTER_P (tem) is accepted only
when new_rtx is a [VEC_]CONCAT, for example.
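
To make the PR_REG idea concrete, here is a toy model in plain C.  It
is only a sketch, not a patch against fwprop.c: PR_REG is a hypothetical
flag, the toy_code enum and both helper functions are invented for
illustration, and the flag values are arbitrary.  Only the final
validity expression mirrors the code quoted above.

```c
#include <stdbool.h>

/* Toy stand-ins for RTL codes; these are NOT GCC's real definitions.  */
enum toy_code
{
  TOY_CONST,        /* something CONSTANT_P would accept */
  TOY_PSEUDO_REG,   /* REG_P && !HARD_REGISTER_P */
  TOY_HARD_REG,     /* REG_P && HARD_REGISTER_P */
  TOY_VEC_CONCAT,
  TOY_CONCAT,
  TOY_OTHER
};

/* PR_CAN_APPEAR and PR_HANDLE_MEM appear in the diff above; PR_REG is
   hypothetical.  The numeric values are arbitrary.  */
enum { PR_CAN_APPEAR = 1, PR_HANDLE_MEM = 2, PR_REG = 4 };

/* Hypothetical helper: choose flags by looking only at new_rtx, so no
   GET_CODE (x) == SUBREG test is needed.  */
static int
toy_flags_for (enum toy_code new_rtx_code)
{
  int flags = 0;
  if (new_rtx_code == TOY_VEC_CONCAT || new_rtx_code == TOY_CONCAT)
    flags |= PR_REG;
  return flags;
}

/* Hypothetical helper modelling the final check: the existing
   valid_ops || can_appear || CONSTANT_P (tem) test, plus a pseudo
   register result when PR_REG is set.  */
static bool
toy_replacement_valid (bool valid_ops, int flags, enum toy_code tem_code)
{
  bool can_appear = (flags & PR_CAN_APPEAR) != 0;
  bool is_constant = (tem_code == TOY_CONST);
  bool is_pseudo = (flags & PR_REG) != 0 && tem_code == TOY_PSEUDO_REG;
  return valid_ops || can_appear || is_constant || is_pseudo;
}
```

In this model a subreg of a [VEC_]CONCAT that simplifies to a pseudo is
accepted, while the same result for any other new_rtx, or a hard
register result, is still rejected.  It does nothing about the register
pressure concern when only some uses simplify.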

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
