On Thu, Dec 2, 2021 at 6:07 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar
> element 0 inserts from a GP register, SSE register or memory. Also
> add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is
> split after reload to a sequence of PBROADCASTW and PBLENDW.
>
> The V8HF inserts from memory improve from:
>
> -       vpbroadcastw    4(%esp), %xmm1
> -       vpblendw        $16, %xmm1, %xmm0, %xmm0
> +       vpinsrw $4, 4(%esp), %xmm0, %xmm0
>
> and V8HF inserts from SSE register to element 0 improve from:
>
>         vpxor   %xmm2, %xmm2, %xmm2
> -       vpbroadcastw    %xmm0, %xmm0
>         vpblendw        $1, %xmm0, %xmm2, %xmm0
>
> Based on the above improvements, the register allocator is able to determine
> the optimal instruction (or instruction sequence) based on the register set
> of the input value, so there is no need to manually expand V8HI and V8HF
> inserts to the sequence of VEC_DUPLICATE and VEC_MERGE RTXes.
>
Thanks.
> 2021-12-01  Uroš Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog:
>
>     PR target/102811
>     * config/i386/sse.md (VI2F): Remove mode iterator.
>     (VI2F_256_512): New mode iterator.
>     (vec_set<V8_128:mode>_0): New insn pattern.
>     (vec_set<VI2F_256_512:mode>_0>): Rename from vec_set<VI2F:mode>mode.
>     Use VI2F_256_512 mode iterator instead of VI2F.
>     (*axv512fp16_movsh): Remove.
>     (<sse2p4_1>_pinsr<ssemodesuffix>): Add (x,x,x) AVX2 alternative.
>     Do not disable V8HF mode insn on AVX2 targets.
>     (pinsrw -> pbroadcast + pblendw peephole2): New peephole.
>     (pinsrw -> pbroadcast + pblendw splitter): New post-reload splitter.
>     * config/i386/i386.md (extendhfsf): Call gen_vec_setv8hf_0.
>     * config/i386/i386-expand.c (ix86_expand_vector_set)
>     <case E_V8HFmode>: Use vec_merge path for TARGET_AVX2.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/102881
>     * gcc.target/i386/pr102811-1.c: New test.
>     * gcc.target/i386/avx512fp16-1c.c (dg-final): Update
>     scan-assembler-times scan strings for ia32 targets.
>     * gcc.target/i386/pr102327-1.c (dg-final): Ditto.
>     * gcc.target/i386/pr102811.c: Rename from ...
>     * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: ... this.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} w/ and
> w/o -mf16c.
>
> Pushed to master.
>
> Uros.
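
For readers following along, the two insert shapes discussed above can be
sketched in plain C with the GCC vector extension. This is only a rough
illustration, not the contents of the new test: the helper names
set_lane4 and set_lane0_zero are hypothetical, and it uses V8HI (short)
lanes rather than V8HF so that it builds without _Float16 support. Lane 4
matches the $4 immediate of vpinsrw and the $16 (1 << 4) mask of vpblendw
in the first diff. Which instructions the compiler actually picks depends
on the GCC version and the -m flags in use.

```c
/* Eight 16-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef short v8hi __attribute__ ((vector_size (16)));

/* Hypothetical helper: replace lane 4 of X with a value loaded from
   memory.  This is the "insert from memory" shape.  */
v8hi
set_lane4 (v8hi x, const short *p)
{
  x[4] = *p;
  return x;
}

/* Hypothetical helper: build a vector with VAL in lane 0 and zeros in
   all other lanes.  This is the "element 0 insert" shape.  */
v8hi
set_lane0_zero (short val)
{
  v8hi r = { 0, 0, 0, 0, 0, 0, 0, 0 };
  r[0] = val;
  return r;
}
```

Compiling something like this with -O2 -mavx2 and looking at the -S
output is one way to see which sequence gets chosen on a given compiler.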

-- 
BR,
Hongtao