On June 19, 2019 11:05:42 AM GMT+02:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >Richard Biener <rguent...@suse.de> writes: >> On June 19, 2019 10:55:16 AM GMT+02:00, Jakub Jelinek ><ja...@redhat.com> wrote: >>>Hi! >>> >>>When VEC_[LR]SHIFT_EXPR has been replaced with VEC_PERM_EXPR, >>>vec_shl_optab >>>has been removed as unused, because we only used vec_shr_optab for >the >>>reductions. >>>Without this patch the vect-simd-*.c tests can be vectorized just >fine >>>for SSE4 and above, but can't be with SSE2. As the comment in >>>tree-vect-stmts.c tries to explain, for the inclusive scan operation >we >>>want (when using V8SImode vectors): >>> _30 = MEM <vector(8) int> [(int *)&D.2043]; >>> _31 = MEM <vector(8) int> [(int *)&D.2042]; >>> _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>; >>> _33 = _31 + _32; >>> // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] >}; >>> _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>; >>> _35 = _33 + _34; >>> // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], >_31[0]+.._31[3], >>> // _31[1]+.._31[4], ... _31[4]+.._31[7] }; >>> _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>; >>> _37 = _35 + _36; >>> // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], >_31[0]+.._31[3], >>> // _31[0]+.._31[4], ... _31[0]+.._31[7] }; >>> _38 = _30 + _37; >>> _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>; >>> MEM <vector(8) int> [(int *)&D.2043] = _39; >>> MEM <vector(8) int> [(int *)&D.2042] = _38; */ >>>For V4SImode vectors that would be VEC_PERM_EXPR <x, init, { 4, 0, 1, >2 >>>}>, >>>VEC_PERM_EXPR <x2, init, { 4, 5, 0, 1 }> and >>>VEC_PERM_EXPR <x3, init, { 3, 3, 3, 3 }> etc. >>>Unfortunately, SSE2 can't do the VEC_PERM_EXPR <x, init, { 4, 0, 1, 2 >>>}> >>>permutation (the other two it can do). Well, to be precise, it can >do >>>it >>>using the vector left shift which has been removed as unused, >provided >>>that init is initializer_zerop (shifting all zeros from the left). >>>init usually is all zeros, that is the neutral element of additive >>>reductions and couple of others too, in the unlikely case that some >>>other >>>reduction is used with scan (multiplication, minimum, maximum, >bitwise >>>and), >>>we can use a VEC_COND_EXPR with constant first argument, i.e. a blend >>>or >>>and/or. >>> >>>So, this patch reintroduces vec_shl_optab (most backends actually >have >>>those >>>patterns already) and handles its expansion and vector generic >lowering >>>similarly to vec_shr_optab - i.e. it is a VEC_PERM_EXPR where the >first >>>operand is initializer_zerop and third operand starts with a few >>>numbers >>>smaller than number of elements (doesn't matter which one, as all >>>elements >>>are same - zero) followed by nelts, nelts+1, nelts+2, ... >>>Unlike vec_shr_optab which has zero as the second operand, this one >has >>>it >>>as first operand, because VEC_PERM_EXPR canonicalization wants to >have >>>first element selector smaller than number of elements. And unlike >>>vec_shr_optab, where we also have a fallback in >have_whole_vector_shift >>>using normal permutations, this one doesn't need it, that "fallback" >is >>>tried >>>first before vec_shl_optab. >>> >>>For the vec_shl_optab checks, it tests only for constant number of >>>elements >>>vectors, not really sure if our VECTOR_CST encoding can express the >>>left >>>shifts in any way nor whether SVE supports those (I see aarch64 has >>>vec_shl_insert but that is just a fixed shift by element bits and >>>shifts in >>>a scalar rather than zeros). >>> >>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? >> >> Ok. > >I think it would be worth instead telling evpc that the second permute >vector is zero. Permutes with a second vector of zero are somewhat >special for SVE too, and could be in other cases for other targets. > >I thought the direction of travel was not to have optabs for specific >kinds of permute any more. E.g. zip, unzip and blend are common >permutes too, but we decided not to commonise those.
The issue is that vec_perm_const_ok doesn't have enough info here (second operand is all zeros). So the optab is needed to query target capabilities, not so much for the actual expansion which could go via vec_perm_const. I thought of having special permute vector entries to denote zero or don't - care which would make it possible to Have this info and allow these permutes to be single vector permutes. But then encoding this might be awkward. Richard. >Thanks, >Richard