https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With just -mavx512f we still produce a long sequence of instructions (it looks
like the operation was scalarized), while LLVM is able to produce:
foo(short __vector(16)): # @foo(short __vector(16))
.cfi_startproc
# %bb.0:
vpmovzxwd ymm1, xmm0 # ymm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
vextracti128 xmm0, ymm0, 1
vpmovzxwd ymm0, xmm0 # ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
vinserti64x4 zmm0, zmm1, ymm0, 1
ret
bar(short __vector(32)): # @bar(short __vector(32))
.cfi_startproc
# %bb.0:
vpmovdw ymm0, zmm0
ret
For -march=skylake-avx512 we now produce:
foo(short __vector(16)):
vpmovzxwd zmm0, ymm0
ret
bar(short __vector(32)):
vpmovdw ymm0, zmm0
ret
So still confirmed for the -mavx512f case.