On Mon, 6 Feb 2023 at 20:14, Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Perhaps I'm missing something (I'm not too familiar with SVE semantics), but
> is there a reason that the solution for PR96463 uses a VEC_PERM_EXPR and not
> just a VEC_DUPLICATE_EXPR? The folding of svld1rq (svptrue_..., ...) doesn't
> seem to require either the blending or the permutation functionality of a
> VEC_PERM_EXPR. Instead, it seems to be misusing (the modified) VEC_PERM_EXPR
> as a form of VIEW_CONVERT_EXPR that allows us to convert/mismatch the type of
> the operands to the type of the result.

Hi,
I am not sure if we could use VEC_DUPLICATE_EXPR for the PR96463 case as-is.
Perhaps we could extend VEC_DUPLICATE_EXPR to take N operands,
so the resulting vector has npatterns = N, nelts_per_pattern = 1 ?
AFAIU, extending VEC_PERM_EXPR to handle vectors with different lengths
would allow for more optimization opportunities besides PR96463.

>
> Conceptually, (as in Richard's original motivation for the PR),
> svint32_t foo (int32x4_t x) { return svld1rq (svptrue_b8 (), &x[0]); }
> can be optimized to (something like)
> svint32_t foo (int32x4_t x) { return svdup_32 (x[0]); } // or dup z0.q, z0.q[0] equivalent

I guess that should be equivalent to svdupq_s32 (x[0], x[1], x[2], x[3]) ?

Thanks,
Prathamesh

> hence it makes sense for fold to transform the gimple form of the first,
> into the gimple form of the second(?)
>
> Just curious.
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandif...@arm.com>
> > Sent: 06 February 2023 12:22
> > To: Richard Biener <richard.guent...@gmail.com>
> > Cc: Roger Sayle <ro...@nextmovesoftware.com>; GCC Patches <gcc-patc...@gcc.gnu.org>
> > Subject: Re: [DOC PATCH] Document the VEC_PERM_EXPR tree code (and minor clean-ups).
> >
> > Richard Biener <richard.guent...@gmail.com> writes:
> > > On Sat, Feb 4, 2023 at 9:35 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
> > >>
> > >>
> > >> This patch (primarily) documents the VEC_PERM_EXPR tree code in
> > >> generic.texi. For ease of review, it is provided below as a pair of
> > >> diffs. The first contains just the new text added to describe
> > >> VEC_PERM_EXPR, the second tidies up this part of the documentation by
> > >> sorting the tree codes into alphabetical order, and providing
> > >> consistent section naming/capitalization, so changing this section
> > >> from "Vectors" to "Vector Expressions" (matching the nearby "Unary
> > >> and Binary Expressions").
> > >>
> > >> Tested with make pdf and make html on x86_64-pc-linux-gnu.
> > >> The reviewer(s) can decide whether to approve just the new content,
> > >> or the content+clean-up. Ok for mainline?
> > >
> > > +@item VEC_PERM_EXPR
> > > +This node represents a vector permute/blend operation. The three
> > > +operands must be vectors of the same number of elements. The first
> > > +and second operands must be vectors of the same type as the entire
> > > +expression,
> > >
> > > this was recently relaxed for the case of constant permutes in which
> > > case the first and second operands only have to have the same element
> > > type as the result. See tree-cfg.cc:verify_gimple_assign_ternary.
> > >
> > > The following description will become a bit more awkward here and for
> > > rhs1/rhs2 with different number of elements the modulo interpretation
> > > doesn't hold - I believe we require in-bounds elements for constant
> > > permutes. Richard can probably clarify things here.
> >
> > I thought that the modulo behaviour still applies when the node has a
> > constant selector, it's just that the in-range form is the canonical one.
> >
> > With variable-length vectors, I think it's in principle possible to have a
> > stepped constant selector whose start elements are in-range but whose final
> > elements aren't (and instead wrap around when applied).
> > E.g. the selector could zip the last quarter of the inputs followed by the
> > first quarter.
> >
> > Thanks,
> > Richard
>