On Mon, 27 Jun 2016, Richard Biener wrote: > On Wed, 15 Jun 2016, Richard Sandiford wrote: > > > Richard Biener <rguent...@suse.de> writes: > > > With the proposed cost change for vector construction we will end up > > > vectorizing the testcase in PR68961 again (on x86_64 and likely > > > on ppc64le as well after that target gets adjustments). Currently > > > we can't optimize that away again noticing the direct overlap of > > > argument and return registers. The obstackle is > > > > > > (insn 7 4 8 2 (set (reg:V2DF 93) > > > (vec_concat:V2DF (reg/v:DF 91 [ a ]) > > > (reg/v:DF 92 [ aa ]))) > > > ... > > > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ]) > > > (subreg:DI (reg:TI 88 [ D.1756 ]) 0)) > > > (insn 24 21 11 2 (set (reg:DI 100 [+8 ]) > > > (subreg:DI (reg:TI 88 [ D.1756 ]) 8)) > > > > > > which we eventually optimize to DFmode subregs of (reg:V2DF 93). > > > > > > First of all simplify_subreg doesn't handle the subregs of a vec_concat > > > (easy fix below). > > > > > > Then combine doesn't like to simplify the multi-use (it tries some > > > parallel it seems). So I went to forwprop which eventually manages > > > to do this but throws away the result (reg:DF 91) or (reg:DF 92) > > > because it is not a constant. Thus I allow arbitrary simplification > > > results for SUBREGs of [VEC_]CONCAT operations. There doesn't seem > > > to be a magic flag to tell it to restrict to the case where all > > > uses can be simplified or so, nor to restrict simplifications to a REG. > > > But I don't see any undesirable simplifications of (subreg > > > ([vec_]concat)). > > > > Adding that as a special case to propgate_rtx feels like a hack though :-) > > I think: > > > > > Index: gcc/fwprop.c > > > =================================================================== > > > *** gcc/fwprop.c (revision 237286) > > > --- gcc/fwprop.c (working copy) > > > *************** propagate_rtx (rtx x, machine_mode mode, > > > *** 664,670 **** > > > || (GET_CODE (new_rtx) == SUBREG > > > && REG_P (SUBREG_REG (new_rtx)) > > > && (GET_MODE_SIZE (mode) > > > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))) > > > flags |= PR_CAN_APPEAR; > > > if (!varying_mem_p (new_rtx)) > > > flags |= PR_HANDLE_MEM; > > > --- 664,673 ---- > > > || (GET_CODE (new_rtx) == SUBREG > > > && REG_P (SUBREG_REG (new_rtx)) > > > && (GET_MODE_SIZE (mode) > > > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))) > > > ! || ((GET_CODE (new_rtx) == VEC_CONCAT > > > ! || GET_CODE (new_rtx) == CONCAT) > > > ! && GET_CODE (x) == SUBREG)) > > > flags |= PR_CAN_APPEAR; > > > if (!varying_mem_p (new_rtx)) > > > flags |= PR_HANDLE_MEM; > > > > ...this if statement should fundamentally only test new_rtx. > > E.g. we'd want the same thing for any SUBREG inside X. > > > > How about changing: > > > > /* The replacement we made so far is valid, if all of the recursive > > replacements were valid, or we could simplify everything to > > a constant. */ > > return valid_ops || can_appear || CONSTANT_P (tem); > > > > so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid? > > I suppose that's likely to increase register pressure though, > > if only some uses of new_rtx simplify. (There again, requiring all > > uses to be replacable could make hot code the hostage of cold code.) > > Yes, my fear was about register presure increase for the case not all > uses can be replaced (fwprop doesn't seem to have code to verify or > require that). > > I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG > case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the > new_rtx == [VEC_]CONCAT case for example.
Btw, I have installed the simplify-rtx.c part now. Richard. 2016-06-29 Richard Biener <rguent...@suse.de> PR rtl-optimization/68961 * simplify-rtx.c (simplify_subreg): Handle VEC_CONCAT like CONCAT.