https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97770
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #4)
> What's missing is middle-end folding support to narrow popcount to the
> appropriate internal function call with byte/half-word width when target
> support
> is available.  But I'm quite sure there's no scalar popcount instruction
> operating on half-word or byte pieces of a GPR?
>
> Alternatively the vectorizer can use patterns to do this.

Yes, but for 64-bit width the vectorizer generates suboptimal code.

sse #c3

  vector(2) long long unsigned int vect__4.6;
  vector(2) long long unsigned int vect__4.5;
  vector(2) long long unsigned int _8;
  vector(2) long long unsigned int _26;
...
...
  _8 = .POPCOUNT (vect__4.5_16);
  _26 = .POPCOUNT (vect__4.6_9);
  vect__5.7_22 = VEC_PACK_TRUNC_EXPR <_8, _26>; --- Why do we do this?
  vector(4) int vect__5.7;

It could generate directly

  v4di = .POPCOUNT (v4di);
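
For reference, a loop of roughly the following shape would be expected to give a dump like the one above: 64-bit inputs, popcount results stored as int, hence the two V2DI popcounts packed into one vector(4) int. The testcase from comment #3 is not reproduced in this comment, so the function name, signature and types below are assumptions, not the actual testcase.

```c
#include <stddef.h>

/* Hypothetical reconstruction, not the testcase from comment #3.
   Inputs are 64 bits wide and results are stored as int, so a
   vectorized version has to narrow the popcount results.  */
void
popcount_loop (int *restrict dst, const unsigned long long *restrict src,
               size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}
```

Whether the loop is actually vectorized this way depends on the target flags and GCC version used.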