Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Dec 12, 2017 at 4:32 PM, Richard Sandiford
> <richard.sandif...@linaro.org> wrote:
>> Richard Biener <richard.guent...@gmail.com> writes:
>>> On Sun, Dec 10, 2017 at 12:06 AM, Richard Sandiford
>>> <richard.sandif...@linaro.org> wrote:
>>>> This series is a replacement for:
>>>> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00747.html
>>>> based on the feedback that using VEC_PERM_EXPR would be better.
>>>>
>>>> The changes are:
>>>>
>>>> (1) Remove the restriction that the selector elements have to have the
>>>>     same width as the data elements, but only for constant selectors.
>>>>     This lets through the cases we need without also allowing
>>>>     potentially-expensive ops.  Adding support for the variable
>>>>     case can be done later if it seems useful, but it's not trivial.
>>>>
>>>> (2) Encode the integer form of constant selectors (vec_perm_indices)
>>>>     in the same way as the new VECTOR_CST encoding, so that it can
>>>>     cope with variable-length vectors.
>>>>
>>>> (3) Remove the vec_perm_const optab and reuse the target hook to emit
>>>>     code.  This avoids the need to create a CONST_VECTOR for the wide
>>>>     selectors, and hence the need to have a corresponding wide vector
>>>>     mode (which the target wouldn't otherwise need or support).
>>>
>>> Hmm.  Makes sense I suppose.
>>>
>>>> (4) When handling the variable vec_perm optab, check that modes can store
>>>>     all element indices before using them.
>>>>
>>>> (5) Unconditionally use ssizetype selector elements in permutes created
>>>>     by the vectoriser.
>>>
>>> Why specifically _signed_ sizetype?  That sounds like an odd choice.
>>> But I'll eventually see when looking at the patch.
>>
>> Sorry, should have said.  The choice doesn't matter for vector lengths
>> that are a power of 2,
>
> which are the only ones we support anyway?
Yeah, for fixed-length at the tree level.  The variable-length support
allows (2^N)*X vectors for non-power-of-2 X though, and we support
non-power-of-2 fixed-length vectors in RTL (e.g. V12QI).

>> but for others, using a signed selector means that
>> -1 always selects the last input element, whereas for unsigned selectors,
>> the element selected by -1 would depend on the selector precision.  (And the
>> use of sizetype precision is pretty arbitrary.)
>
> hmm, so you are saying that vec_perm <v1, v2, { -1, -2, ... }> is equal
> to vec_perm <v1, v2, {2*n-1, 2*n-2, ....}?

Yeah.

> tree.def defines VEC_PERM_EXPR via
>
>    N = length(mask)
>    foreach i in N:
>      M = mask[i] % (2*N)
>      A = M < N ? v0[M] : v1[M-N]
>
> which doesn't reflect this behavior.  Does this behavior persist for variable
> vector permutations?

__builtin_shuffle is defined to wrap though:

  The elements of the input vectors are numbered in memory ordering of
  @var{vec0} beginning at 0 and @var{vec1} beginning at @var{N}.  The
  elements of @var{mask} are considered modulo @var{N} in the single-operand
  case and modulo @math{2*@var{N}} in the two-operand case.

I think we need to preserve that for VEC_PERM_EXPR, otherwise we'd need
to introduce the masking operation when lowering __builtin_shuffle to
VEC_PERM_EXPR.

>>> Does that mean we have a VNDImode vector unconditionally for the
>>> permute even though a vector matching the width of the data members
>>> would work?
>>
>> A VECTOR_CST of N DIs, yeah.  It only becomes a VNDI at the rtl level
>> if we're selecting 64-bit data elements.
>
> And on GIMPLE?  Do we have a vector type with ssizetype elements
> unconditionally?

For autovectorised permutes, yes.  Other permutes we keep the existing types.

Thanks,
Richard
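The signed-versus-unsigned selector argument in this thread can be checked with a small stand-alone C model. This is a sketch, not GCC code. The helper names `select_unsigned` and `select_signed` are hypothetical. The model assumes a selector element is either reinterpreted as a PRECISION-bit unsigned value or treated as signed, then reduced modulo 2*N. That reduction follows the tree.def and __builtin_shuffle descriptions quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): the index selected by a
   two-input permute of N-element vectors when the selector element SEL
   is first truncated to an unsigned PRECISION-bit value and then
   reduced modulo 2*N.  */
static unsigned
select_unsigned (int64_t sel, unsigned precision, unsigned n)
{
  assert (precision > 0 && precision <= 64 && n > 0);
  uint64_t mask = (precision == 64
                   ? UINT64_MAX
                   : (UINT64_C (1) << precision) - 1);
  /* Converting a negative value to uint64_t is well defined (it wraps),
     so -1 becomes all-ones before masking.  */
  uint64_t u = (uint64_t) sel & mask;
  return (unsigned) (u % (2 * (uint64_t) n));
}

/* Hypothetical helper (not part of GCC): the same selection, but with SEL
   treated as signed and reduced into the range [0, 2*N), so -1 always
   picks element 2*N-1, the last element of the second input.  */
static unsigned
select_signed (int64_t sel, unsigned n)
{
  assert (n > 0);
  int64_t m = 2 * (int64_t) n;
  int64_t r = sel % m;   /* C's % truncates toward zero, so r may be negative.  */
  return (unsigned) (r < 0 ? r + m : r);
}
```

Take a hypothetical 7-element input (2*N = 14). Here `select_unsigned (-1, 8, 7)` gives 3 and `select_unsigned (-1, 16, 7)` gives 1, so the result depends on the selector precision. By contrast, `select_signed (-1, 7)` gives 13. For 12-element inputs such as V12QI (2*N = 24), the unsigned form gives 15 rather than the last element, 23. With N = 4, both helpers map -1 to 7. This is why the choice only matters for non-power-of-2 element counts.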