https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
So after Jakub's update the vectorizer patch yields

sumint:
.LFB0:
        .cfi_startproc
        vpxor   %xmm0, %xmm0, %xmm0
        leaq    4096(%rdi), %rax
        .p2align 4,,10
        .p2align 3
.L2:
        vpaddd  (%rdi), %ymm0, %ymm0
        addq    $32, %rdi
        cmpq    %rdi, %rax
        jne     .L2
        vextracti128    $1, %ymm0, %xmm1
        vpaddd  %xmm0, %xmm1, %xmm0
        vpsrldq $8, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $4, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        vzeroupper
        ret

That's not using the unpacking strategy (sum adjacent elements) but still
the vector shift approach (add upper/lower halves). That's something that
can be changed independently.

Waiting for final vec_extract/init2 optab settling.
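
For illustration only, here is a scalar C sketch of the two reduction orders contrasted above. It is not taken from the patch or from GCC. The names reduce_halves, reduce_adjacent and check_reduction are hypothetical, and the sumint loop is inferred from the assembly (4096 bytes / 4 bytes per int = 1024 elements) rather than copied from the testcase. The eight array slots stand in for the eight 32-bit lanes of %ymm0.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 8 };  /* eight 32-bit lanes, as in one %ymm register */

/* Assumed shape of the source loop: 1024 ints, inferred from the
   4096-byte bound in the assembly, not copied from the testcase. */
int sumint(const int *arr)
{
    int sum = 0;
    for (size_t i = 0; i < 1024; i++)
        sum += arr[i];
    return sum;
}

/* Shift approach: add the upper half onto the lower half, then halve
   the width (8 -> 4 -> 2 -> 1).  This mirrors the vextracti128 /
   vpsrldq $8 / vpsrldq $4 steps, each followed by vpaddd. */
int reduce_halves(const int *v)
{
    int t[LANES];
    for (int i = 0; i < LANES; i++)
        t[i] = v[i];
    for (int width = LANES / 2; width >= 1; width /= 2)
        for (int i = 0; i < width; i++)
            t[i] += t[i + width];
    return t[0];
}

/* Adjacent-element approach: sum neighbouring pairs (0+1, 2+3, ...),
   then repeat on the shrunken vector. */
int reduce_adjacent(const int *v)
{
    int t[LANES];
    for (int i = 0; i < LANES; i++)
        t[i] = v[i];
    for (int n = LANES; n > 1; n /= 2)
        for (int i = 0; i < n / 2; i++)
            t[i] = t[2 * i] + t[2 * i + 1];
    return t[0];
}

/* Both orders must give the same total as long as no intermediate sum
   overflows int (signed overflow is undefined in C). */
void check_reduction(const int *v)
{
    assert(reduce_halves(v) == reduce_adjacent(v));
}
```

Both functions take log2(8) = 3 add steps; the difference is only which lanes get paired at each step, which is why the choice can be changed independently of the loop vectorization itself.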