On Sun, 3 Jul 2016, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Wed, 15 Jun 2016, Richard Sandiford wrote: > > > >> Richard Biener <rguent...@suse.de> writes: > >> > With the proposed cost change for vector construction we will end up > >> > vectorizing the testcase in PR68961 again (on x86_64 and likely > >> > on ppc64le as well after that target gets adjustments). Currently > >> > we can't optimize that away again noticing the direct overlap of > >> > argument and return registers. The obstackle is > >> > > >> > (insn 7 4 8 2 (set (reg:V2DF 93) > >> > (vec_concat:V2DF (reg/v:DF 91 [ a ]) > >> > (reg/v:DF 92 [ aa ]))) > >> > ... > >> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ]) > >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0)) > >> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ]) > >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8)) > >> > > >> > which we eventually optimize to DFmode subregs of (reg:V2DF 93). > >> > > >> > First of all simplify_subreg doesn't handle the subregs of a vec_concat > >> > (easy fix below). > >> > > >> > Then combine doesn't like to simplify the multi-use (it tries some > >> > parallel it seems). So I went to forwprop which eventually manages > >> > to do this but throws away the result (reg:DF 91) or (reg:DF 92) > >> > because it is not a constant. Thus I allow arbitrary simplification > >> > results for SUBREGs of [VEC_]CONCAT operations. There doesn't seem > >> > to be a magic flag to tell it to restrict to the case where all > >> > uses can be simplified or so, nor to restrict simplifications to a REG. > >> > But I don't see any undesirable simplifications of (subreg > >> > ([vec_]concat)). > >> > >> Adding that as a special case to propgate_rtx feels like a hack though :-) > >> I think: > >> > >> > Index: gcc/fwprop.c > >> > =================================================================== > >> > *** gcc/fwprop.c (revision 237286) > >> > --- gcc/fwprop.c (working copy) > >> > *************** propagate_rtx (rtx x, machine_mode mode, > >> > *** 664,670 **** > >> > || (GET_CODE (new_rtx) == SUBREG > >> > && REG_P (SUBREG_REG (new_rtx)) > >> > && (GET_MODE_SIZE (mode) > >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)))))) > >> > flags |= PR_CAN_APPEAR; > >> > if (!varying_mem_p (new_rtx)) > >> > flags |= PR_HANDLE_MEM; > >> > --- 664,673 ---- > >> > || (GET_CODE (new_rtx) == SUBREG > >> > && REG_P (SUBREG_REG (new_rtx)) > >> > && (GET_MODE_SIZE (mode) > >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))))) > >> > ! || ((GET_CODE (new_rtx) == VEC_CONCAT > >> > ! || GET_CODE (new_rtx) == CONCAT) > >> > ! && GET_CODE (x) == SUBREG)) > >> > flags |= PR_CAN_APPEAR; > >> > if (!varying_mem_p (new_rtx)) > >> > flags |= PR_HANDLE_MEM; > >> > >> ...this if statement should fundamentally only test new_rtx. > >> E.g. we'd want the same thing for any SUBREG inside X. > >> > >> How about changing: > >> > >> /* The replacement we made so far is valid, if all of the recursive > >> replacements were valid, or we could simplify everything to > >> a constant. */ > >> return valid_ops || can_appear || CONSTANT_P (tem); > >> > >> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid? > >> I suppose that's likely to increase register pressure though, > >> if only some uses of new_rtx simplify. (There again, requiring all > >> uses to be replacable could make hot code the hostage of cold code.) > > > > Yes, my fear was about register presure increase for the case not all > > uses can be replaced (fwprop doesn't seem to have code to verify or > > require that). > > > > I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG > > case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the > > new_rtx == [VEC_]CONCAT case for example. > > I don't think that helps though. There might be other uses of a > VEC_CONCAT that aren't SUBREGs, in which case we'd have the same > problem of keeping both values live at once. > > How about restricting the REG_P (tem) && !HARD_REGISTER_P (tem) > to cases where new_rtx has more words than tem?
So would you really make a simple mode-size check here? I wonder which cases are there other than the subreg of [vec_concat] that would end up with this case. That is, if (REG_P (tem) && !HARD_REGISTER_P (tem) && GET_MODE (tem) == GET_MODE_INNER (GET_MODE (new_rtx)) && (VECTOR_MODE_P (GET_MODE (new_rtx)) || COMPLEX_MODE_P (GET_MODE (new_rtx)))) return true; works as would constraining new_rtx to [VEC_]CONCAT instead of vector/complex modes. I'm worried about relaxing it further as partial int modes also have a GET_MODE_INNER - relaxing to && GET_MODE_INNER (GET_MODE (new_rtx)) != GET_MODE (new_rtx) I mean. If we restrict it to [VEC_]CONCAT we can even allow any REG_P && !HARD_REGISTER_P as I can't see any outer (non-noop) operation that will simplify into a REG_P besides a subreg. But the above form would capture the intent to allow simplifying operations on vector / complex to a scalar component. Which means I can test the following if that looks better than what I had originally. There's Seghers 2->2 combine patch but that has wider impact as well. Thanks, Richard. 2016-07-04 Richard Biener <rguent...@suse.de> PR rtl-optimization/68961 * fwprop.c (propagate_rtx): Allow SUBREGs of VEC_CONCAT and CONCAT to simplify to a non-constant. * gcc.target/i386/pr68961.c: New testcase. Index: gcc/testsuite/gcc.target/i386/pr68961.c =================================================================== *** gcc/testsuite/gcc.target/i386/pr68961.c (revision 0) --- gcc/testsuite/gcc.target/i386/pr68961.c (working copy) *************** *** 0 **** --- 1,19 ---- + /* { dg-do compile { target lp64 } } */ + /* { dg-options "-O3 -fno-vect-cost-model -fdump-tree-slp2-details" } */ + + struct x { double d[2]; }; + + struct x + pack (double a, double aa) + { + struct x u; + u.d[0] = a; + u.d[1] = aa; + return u; + } + + /* The function should be optimized to just return as arguments and + result exactly overlap even when previously vectorized. */ + + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */ + /* { dg-final { scan-assembler-not "mov" } } */ Index: gcc/fwprop.c =================================================================== *** gcc/fwprop.c (revision 237955) --- gcc/fwprop.c (working copy) *************** propagate_rtx_1 (rtx *px, rtx old_rtx, r *** 619,624 **** --- 619,633 ---- *px = tem; + /* Allow replacements that simplify operations on a vector or complex + value to a component. The most prominent case is + (subreg ([vec_]concat ...)). */ + if (REG_P (tem) && !HARD_REGISTER_P (tem) + && GET_MODE (tem) == GET_MODE_INNER (GET_MODE (new_rtx)) + && (VECTOR_MODE_P (GET_MODE (new_rtx)) + || COMPLEX_MODE_P (GET_MODE (new_rtx)))) + return true; + /* The replacement we made so far is valid, if all of the recursive replacements were valid, or we could simplify everything to a constant. */