https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120647
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
   Last reconfirmed|                            |2025-06-19
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947
             Target|                            |x86_64-*-*
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. Conditional reduction could have a special case for when popcount on
the condition mask is available. In principle the generated code isn't that
bad, but we are unpacking the mask from the vec<char> compare in order to
perform a .COND_ADD of vec<int>. It might be more efficient to unpack a
vec<char> of zeros or ones to add to four IVs or, in the case of constant niters
(48 here), to choose a narrower counting IV (char) and only reduce to an int in
the epilogue.
That would get you the following when there's no popcount. How vector masks
transfer to GPRs is a bit iffy at the moment (but it would work).
vector_comparison:
.LFB0:
	.cfi_startproc
	vmovdqu8	(%rsi), %ymm1
	vpcmpeqd	%ymm0, %ymm0, %ymm0
	vpcmpeqb	(%rdi), %ymm1, %k1
	vpabsb	%ymm0, %ymm1{%k1}{z}
	vmovdqa	%xmm1, %xmm2
	vextracti32x4	$0x1, %ymm1, %xmm1
	vpaddb	%xmm1, %xmm2, %xmm2
	vmovdqu8	32(%rsi), %xmm1
	vpcmpeqb	32(%rdi), %xmm1, %k1
	vpabsb	%xmm0, %xmm1
	vpaddb	%xmm1, %xmm2, %xmm2{%k1}
	vpsrldq	$8, %xmm2, %xmm1
	vpaddb	%xmm1, %xmm2, %xmm0
	vpxor	%xmm1, %xmm1, %xmm1
	vpsadbw	%xmm1, %xmm0, %xmm0
	vpextrb	$0, %xmm0, %eax
	movsbl	%al, %eax
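
For illustration, here is a scalar C model of the three counting strategies
discussed above. It is only a sketch, not the testcase from this PR: the
function names, the unsigned char element type and the 16-lane width are
assumptions; only the trip count of 48 comes from the comment.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>
#include <stdint.h>

/* N is the constant trip count mentioned above; LANES is an assumed
   vector width (16 bytes), chosen only for this illustration.  */
enum { N = 48, LANES = 16 };

/* Baseline: every comparison result is widened to int before it is
   accumulated, which mirrors a .COND_ADD on vec<int>.  */
int count_equal_wide(const unsigned char *a, const unsigned char *b)
{
    int count = 0;
    for (size_t i = 0; i < N; i++)
        count += (a[i] == b[i]);
    return count;
}

/* Narrow counting IV: one 8-bit counter per lane, widened to int
   only in the epilogue.  */
int count_equal_narrow(const unsigned char *a, const unsigned char *b)
{
    /* Each lane is incremented at most N / LANES times.  */
    static_assert(N % LANES == 0 && N / LANES <= UINT8_MAX,
                  "8-bit lane counters must not overflow");
    uint8_t lane[LANES] = {0};
    for (size_t i = 0; i < N; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            lane[j] += (uint8_t)(a[i + j] == b[i + j]);

    int count = 0;                      /* epilogue reduction */
    for (size_t j = 0; j < LANES; j++)
        count += lane[j];
    return count;
}

/* Popcount variant: collect the comparison results as a bit mask and
   count the set bits (__builtin_popcountll is a GCC/Clang builtin).  */
int count_equal_popcount(const unsigned char *a, const unsigned char *b)
{
    static_assert(N <= 64, "mask must fit in 64 bits");
    uint64_t mask = 0;
    for (size_t i = 0; i < N; i++)
        mask |= (uint64_t)(a[i] == b[i]) << i;
    return __builtin_popcountll(mask);
}
```

All three functions return the same count for any pair of 48-byte inputs. The
narrow variant is only valid because the trip count is a known constant that
keeps each 8-bit lane counter from wrapping.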
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations