https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770
--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #14)
> So we vectorize to
>
>   _18 = .POPCOUNT (vect__5.7_22);
>   _17 = .POPCOUNT (vect__5.7_21);
>   vect__6.8_16 = VEC_PACK_TRUNC_EXPR <_18, _17>;
>   _6 = 0;
>   _7 = dest_13(D) + _2;
>   vect__8.9_10 = [vec_unpack_lo_expr] vect__6.8_16;
>   vect__8.9_9 = [vec_unpack_hi_expr] vect__6.8_16;
>   _8 = (long long int) _6;
>
> which is exactly the issue that in the scalar code we have a 'int' producing
> popcount with long long argument but the vector IFN produces a result of the
> same width as the argument.  So the vectorizer compensates for that
> (VEC_PACK_TRUNC_EXPR) and then vectorizes the widening that's in the scalar
> code (vec_unpack_{lo,hi}_expr).  The fix for this and for the missing
> byte and word variants is to add a pattern to tree-vect-patterns.c for this
> case matching it to the .POPCOUNT internal function.  That possibly applies
> to other bitops, too, like parity, ctz, ffs, etc.  There's quite some
> _widen helpers in the pattern recog code so I'm not sure how complicated
> it is to match
>
>   (long)popcountl(long)
>
> and
>
>   (short)popcount((int)short)
>
> Richard may have a good idea since he did the last "big" surgery there.

Any suggestions for this? Should we change the prototype of the builtins, or
add a vec_recog_popcnt_pattern to the vectorizer?
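
For reference, a minimal C sketch of the two scalar shapes quoted above. This
is not the testcase from the PR: the function names are hypothetical, and the
16-bit variant takes unsigned short (an assumption, so the count stays within
16 bits). Whether GCC vectorizes these loops, and with which sequence, depends
on the target and options.

```c
#include <assert.h>
#include <stddef.h>

/* 64-bit shape, "(long)popcountl(long)": the builtin returns int, so the
   scalar code widens that int back to long long for the store.  */
void popcount_ll(long long *dest, const long long *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = __builtin_popcountll((unsigned long long) src[i]);
}

/* 16-bit shape, "(short)popcount((int)short)": the argument is promoted
   to a wider type for the call and the int result is narrowed again.  */
void popcount_s(short *dest, const unsigned short *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = (short) __builtin_popcount(src[i]);
}

/* Hypothetical self-check showing the expected scalar semantics.  */
void popcount_selfcheck(void)
{
    long long in[2] = { 0x0F, -1 };
    long long out[2];

    popcount_ll(out, in, 2);
    assert(out[0] == 4);   /* 0x0F has four bits set */
    assert(out[1] == 64);  /* all 64 bits set */
}
```

In both loops the element width of the input and output arrays is the same,
which is the case where a pattern mapping directly to .POPCOUNT would avoid
the pack/unpack pair.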