https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96246
Bug ID: 96246
Summary: [AVX512] inefficient code generation for vpblendm*
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Target: i386, x86-64
cat test.c
---
typedef int v8si __attribute__ ((__vector_size__ (32)));
v8si
foo (v8si a, v8si b, v8si c, v8si d)
{
return a > b ? c : d;
}
---
gcc11 -O2 -mavx512f -mavx512vl
GCC generates:
---
vpcmpd $6, %ymm1, %ymm0, %k1
vmovdqa32 %ymm2, %ymm3{%k1}
vmovdqa %ymm3, %ymm0
ret
---
This could be optimized to:
---
vpcmpd $6, %ymm1, %ymm0, %k1
vpblendmd %ymm2, %ymm3, %ymm0 {%k1}
---
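GCC fails to generate the optimal code because, in sse.md, define_insn "<avx512>_load<mode>_mask" has the same RTL pattern as define_insn "<avx512>_blendm<mode>" and appears earlier in the file. RTL pattern matching therefore always recognizes the insn as <avx512>_load<mode>_mask. This misses the opportunity in pass_reload, and the insn can't be combined into <avx512>_blendm<mode> after reload. As the output above shows, the masked-move form writes its result over the merge operand's register (%ymm3), so an extra vmovdqa is needed to move it into the return register %ymm0.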