On Tue, Aug 16, 2022 at 6:30 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes: > > On Tue, 9 Aug 2022 at 18:42, Richard Biener <richard.guent...@gmail.com> > > wrote: > >> > >> On Tue, Aug 9, 2022 at 12:10 PM Prathamesh Kulkarni > >> <prathamesh.kulka...@linaro.org> wrote: > >> > > >> > On Mon, 8 Aug 2022 at 14:27, Richard Biener <richard.guent...@gmail.com> > >> > w>> > > > >> > > > >> > > /* If result vector has greater length than input vector, > >> > > + then allow permuting two vectors as long as: > >> > > + a) sel.nelts_per_pattern == 1 > >> > > + b) sel.npatterns == len of input vector. > >> > > + The intent is to permute input vectors, and > >> > > + dup the elements in resulting vector to target vector length. */ > >> > > + > >> > > + if (maybe_gt (TYPE_VECTOR_SUBPARTS (type), > >> > > + TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))) > >> > > + { > >> > > + nelts = sel.encoding ().npatterns (); > >> > > + if (sel.encoding ().nelts_per_pattern () != 1 > >> > > + || (!known_eq (nelts, TYPE_VECTOR_SUBPARTS (TREE_TYPE > >> > > (arg0))))) > >> > > + return NULL_TREE; > >> > > + } > >> > > > >> > > so the only case you add is non-VLA to VLA and there > >> > > explicitely only the case of a period that's same as the > >> > > element count in the input vectors. > >> > > > >> > > > >> > > @@ -2602,6 +2602,9 @@ dump_generic_node (pretty_printer *pp, tree > >> > > node, int spc, dump_flags_t flags, > >> > > pp_space (pp); > >> > > } > >> > > } > >> > > + if (VECTOR_TYPE_P (TREE_TYPE (node)) > >> > > + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (node)).is_constant ()) > >> > > + pp_string (pp, ", ... "); > >> > > pp_right_brace (pp); > >> > > > >> > > btw, I do wonder if VLA CONSTRUCTORs are a "thing"? Are they? > >> > Well, it got created for the following case after folding: > >> > svint32_t f2(int a, int b, int c, int d) > >> > { > >> > int32x4_t v = {a, b, c, d}; > >> > return svld1rq_s32 (svptrue_b8 (), &v[0]); > >> > } > >> > > >> > The svld1rq_s32 call gets folded to: > >> > v = {a, b, c, d} > >> > lhs = VEC_PERM_EXPR<v, v, {0, 1, 2, 3, ... }> > >> > > >> > fold_vec_perm then folds the above VEC_PERM_EXPR to > >> > VLA constructor, since elements in v (in_elts) are not constant, and > >> > need_ctor is thus true: > >> > lhs = {a, b, c, d, ...} > >> > I added "..." to make it more explicit that it's a VLA constructor. > >> > >> But I doubt we do anything reasonable with such a beast? Do we? > >> I suppose it's like a vec_duplicate if you view it as V1TImode > >> but do we actually make sure to do this duplication? > > I am not sure. As mentioned above, the current code-gen for VLA > > constructor looks pretty bad. > > Should we avoid folding VLA constructors for now ? > > VLA constructors aren't really a thing. At least, the only VLA vector > you could represent with current CONSTRUCTOR nodes is a fixed-length > sequence at the start of an otherwise zero vector. I'm not sure > we even use that though (perhaps we do and I've forgotten). > > > I guess these are 2 different issues: > > (a) Resolving ICE with VEC_PERM_EXPR for above aarch64 tests. > > (b) Extending fold_vec_perm to handle vectors with differing lengths. > > > > For (a), I think the issue with using: > > res_type = gimple_assign_lhs (stmt) > > in previous patch, was that op2's type will change to match tgt_units, > > if we go thru > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) branch, > > and may thus not be same as len(lhs_type) anymore, and hit the assert > > in fold_vec_perm. > > > > IIUC, for lhs = VEC_PERM_EXPR<rhs1, rhs2, mask>, we now have the > > following semantics: > > (1) Element types for lhs, rhs1 and rhs2 should be the same. > > (2) len(lhs) == len(mask) and len(rhs1) == len(rhs2). > > Yeah. > > > The attached patch changes res_type from TREE_TYPE (arg0) to following: > > res_type = build_vector_type (TREE_TYPE (TREE_TYPE (arg0)), > > TYPE_VECTOR_SUBPARTS (op2)) > > so it has same element type as arg0 (and arg1) and len of op2. > > Does that look reasonable ? > > > > If we need a cast from res_type to lhs_type, then both would be fixed > > width vectors > > with len(lhs_type) being a multiple of len(res_type). > > IIUC, we don't support casting from VLA vector to/from fixed width vector, > > Yes, that's not supported as a cast. If the compiler knows the > length of the "VLA" vector then it's not VLA. If it doesn't > know the length of the VLA vector then the sizes could be different > (preventing VIEW_CONVERT_EXPR) and the number of elements could be > different (preventing pointwise CONVERT_EXPRs). > > > or from VLA vector of one type to VLA vector of other type ? > > That's supported though. They work just like VLS vectors: if the sizes > are the same then we can use VIEW_CONVERT_EXPR, if the number of elements > are the same then we can do pointwise conversions (e.g. element-by-element > extensions, truncations, conversions to float, conversions to integer, etc). > > > Currently, if op2 is VLA, and we enter the branch: > > (code == VIEW_CONVERT_EXPR || code2 == VIEW_CONVERT_EXPR) > > then I think it will bail out because op2_units will not be a compile > > time constant, > > and constant_multiple_p (op2_units, tgt_units, &factor) would return false. > > > > The patch resolves ICE's for aarch64 cases, without regressing the > > ones in PR106360. > > Does it look OK ? > > Richi should have the final say, but it looks OK in principle to me. > I'm not sure it's worth applying independently of the follow-on patch > though, if that patch is in the offing anyway. > > I guess my only question is whether tree-ssa-forwprop.cc really needs > to build a new type. Couldn't it just use the TREE_TYPE of the lhs > instead?
I think the point was they are not necssarily the same when we looked through a VIEW_CONVERT? A comment might be in order here. Btw, please don't add new asserts that trivially hold in critical paths. diff --git a/gcc/match.pd b/gcc/match.pd index 330c1db0c8e..aa20cc713c5 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -7845,6 +7845,12 @@ and, (with { tree op0 = @0, op1 = @1, op2 = @2; + + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (type), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op2)))); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0)), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (op1)))); + machine_mode result_mode = TYPE_MODE (type); machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0)); > > Thanks, > Richard