https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Bug ID: 110062
Summary: missed vectorization in graphicsmagick
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Phoronix claims a 31% performance difference between GCC 13 and Clang on the
sharpen benchmark of GraphicsMagick. On Zen 3 I reproduce only 4%, but the
benchmark spends nearly all its time in a single short inner loop:
97.56% gm gm [.] ConvolveImage.
0.88% gm libgomp.so.1.0.0 [.] 0x000000000002
0.67% gm libc.so.6 [.] __memmove_avx_
GCC version:
2.38 │500:┌─→vmovss (%r8,%rax,4),%xmm2
0.04 │ │ movzbl 0x2(%rdx,%rax,4),%ebp
0.09 │ │ vcvtsi2ss %ebp,%xmm0,%xmm1
7.44 │ │ movzbl 0x1(%rdx,%rax,4),%ebp
0.16 │ │ vfmadd231ss %xmm1,%xmm2,%xmm7
30.23 │ │ vcvtsi2ss %ebp,%xmm0,%xmm1
2.38 │ │ movzbl (%rdx,%rax,4),%ebp
0.03 │ │ inc %rax
0.00 │ │ vfmadd231ss %xmm1,%xmm2,%xmm9
22.80 │ │ vcvtsi2ss %ebp,%xmm0,%xmm1
1.03 │ │ vfmadd231ss %xmm1,%xmm2,%xmm10
30.49 │ ├──cmp %rax,%rbx
0.18 │ └──jne 500
Clang version:
0.00 │1e70:┌─→movzbl 0x2(%rdx,%rsi,4),%r9d
0.05 │ │ vbroadcastss (%rcx,%rsi,4),%xmm3
0.56 │ │ movzwl (%rdx,%rsi,4),%r11d
0.05 │ │ inc %rsi
0.00 │ │ vcvtsi2ss %r9d,%xmm10,%xmm2
0.71 │ │ vfmadd231ss %xmm2,%xmm3,%xmm0
1.17 │ │ vmovd %r11d,%xmm2
0.00 │ │ vpmovzxbd %xmm2,%xmm2
0.06 │ │ vcvtdq2ps %xmm2,%xmm2
0.89 │ │ vfmadd231ps %xmm2,%xmm3,%xmm1
1.98 │ ├──cmp %rsi,%r10
0.00 │ └──jne 1e70
0.00 │ ↑ jmp 1630
Probably the same issue as in PR109812, but this one reproduces on Zen CPUs
and the loop is even shorter.
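
For reference, here is a reduced C sketch of the loop shape that the
disassembly above suggests. It is not the GraphicsMagick source: the names
Pixel, Sum and convolve_row are hypothetical, and the field order is only
inferred from the byte offsets 0, 1 and 2 with a stride of 4. Per iteration
it loads one float kernel weight and three unsigned bytes, converts each byte
to float and accumulates into three separate sums. GCC emits three scalar
vcvtsi2ss/vfmadd231ss pairs for this; Clang handles the bytes at offsets 0
and 1 together via vpmovzxbd/vcvtdq2ps/vfmadd231ps and keeps only the third
channel scalar.

```c
#include <stddef.h>

/* Hypothetical 4-byte pixel; the field order is a guess based on the
   byte offsets 0, 1 and 2 seen in the disassembly. */
typedef struct {
    unsigned char blue, green, red, opacity;
} Pixel;

/* Three independent float accumulators, one per colour channel. */
typedef struct {
    float red, green, blue;
} Sum;

/* Accumulate one kernel row: every channel of pixel i is converted to
   float and multiplied by the same weight k[i]. */
Sum convolve_row(const Pixel *p, const float *k, size_t width)
{
    Sum s = {0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < width; i++) {
        s.red   += k[i] * (float)p[i].red;
        s.green += k[i] * (float)p[i].green;
        s.blue  += k[i] * (float)p[i].blue;
    }
    return s;
}
```

Compiling something like this with -O2 -march=znver3 on both compilers should
be a reasonable starting point for a standalone testcase, assuming the reduced
loop triggers the same code generation as the original.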