[Bug tree-optimization/97770] [ICELAKE]Missing vectorization for vpopcnt

crazylht at gmail dot com via Gcc-bugs Fri, 04 Jun 2021 00:41:49 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770


--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #14)
> So we vectorize to
> 
>   _18 = .POPCOUNT (vect__5.7_22);
>   _17 = .POPCOUNT (vect__5.7_21);
>   vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
>   _6 = 0;
>   _7 = dest_13(D) + _2;
>   vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
>   vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
>   _8 = (long long int) _6;
> 
> which is exactly the issue that in the scalar code we have a 'int' producing
> popcount with long long argument but the vector IFN produces a result of the
> same width as the argument.  So the vectorizer compensates for that
> (VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that's in the scalar
> code (vec_unpack_{lo,hi}_expr).  The fix for this and for the missing
> byte and word variants is to add a pattern to tree-vect-patterns.c for this
> case matching it to the .POPCOUNT internal function.  That possibly applies
> to other bitops, too, like parity, ctz, ffs, etc.  There's quite some
> _widen helpers in the pattern recog code so I'm not sure how complicated
> it is to match
> 
>  (long)popcountl(long)
> 
> and
> 
>  (short)popcount((int)short)
> 
> Richard may have a good idea since he did the last "big" surgery there.

Any suggestions for this? Should we change the prototype of the builtins, or
add a vec_recog_popcnt_pattern to the vectorizer?
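
For concreteness, the two shapes quoted above might come from loops like the
ones below. This is only a sketch, not the testcase attached to the bug: the
function names, signatures and element types are assumptions chosen for
illustration. In both loops the builtin returns 'int', so the scalar code
contains a conversion that the same-width vector .POPCOUNT does not need.

```c
#include <stddef.h>

/* Hypothetical loop for the (long)popcountl(long) shape: the builtin takes a
   64-bit argument but returns int, so the store widens int -> long long.  */
void
popcount_ll (long long *dest, const unsigned long long *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = __builtin_popcountll (src[i]);
}

/* Hypothetical loop for the (short)popcount((int)short) shape: the argument
   is promoted to int before the call and the int result is truncated back
   to a 16-bit element.  */
void
popcount_short (unsigned short *dest, const unsigned short *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = (unsigned short) __builtin_popcount (src[i]);
}
```

A pattern in the vectorizer would presumably need to recognise both
conversions around the call and map the whole statement to a .POPCOUNT on the
element type of the loads and stores.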
