On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > I thought I'd just missed the bug fixing season of stage3, but there > appears to a little latitude in early stage4 (for vector patches), so > I'll post this now. > > This patch resolves PR target/106060 by providing efficient methods for > materializing/synthesizing special "vector" constants on x86. Currently > there are three methods of materializing a vector constant; the most > general is to load a vector from the constant pool, secondly "duplicated" > constants can be synthesized by moving an integer between units and > broadcasting (or shuffling it), and finally the special cases of the > all-zeros vector and all-ones vectors can be loaded via a single SSE > instruction. This patch handles additional cases that can be synthesized > in two instructions, loading an all-ones vector followed by another SSE > instruction. Following my recent patch for PR target/112992, there's > conveniently a single place in i386-expand.cc where these special cases > can be handled. > > Two examples are given in the original bugzilla PR for 106060. > > __m256i > should_be_cmpeq_abs () > { > return _mm256_set1_epi8 (1); > } > > is now generated (with -O3 -march=x86-64-v3) as: > > vpcmpeqd %ymm0, %ymm0, %ymm0 > vpabsb %ymm0, %ymm0 > ret > > and > > __m256i > should_be_cmpeq_add () > { > return _mm256_set1_epi8 (-2); > } > > is now generated as: > > vpcmpeqd %ymm0, %ymm0, %ymm0 > vpaddb %ymm0, %ymm0, %ymm0 > ret > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2024-01-16 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/106060 > * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. > (struct ix86_vec_bcast_map_simode_t): New type for table below. > (ix86_vec_bcast_map_simode): Table of SImode constants that may > be efficiently synthesized by a ix86_vec_bcast_alg method. > (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. > (ix86_vector_duplicate_simode_const): Efficiently synthesize > V4SImode and V8SImode constants that duplicate special constants. > (ix86_vector_duplicate_value): Attempt to synthesize "special" > vector constants using ix86_vector_duplicate_simode_const. > * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a > vector integer mode costs with a single SSE instruction. >
+ switch (entry->alg) + { + case VEC_BCAST_PXOR: + if (mode == V8SImode && !TARGET_AVX2) + return false; + emit_move_insn (target, CONST0_RTX (mode)); + return true; + case VEC_BCAST_PCMPEQ: + if ((mode == V4SImode && !TARGET_SSE2) + || (mode == V8SImode && !TARGET_AVX2)) + return false; + emit_move_insn (target, CONSTM1_RTX (mode)); + return true; I think we need to prevent those standard_sse_constant_p getting in ix86_expand_vector_init_duplicate by below codes. /* If all values are identical, broadcast the value. */ if (all_same && (nvars != 0 || !standard_sse_constant_p (gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)), mode)) && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, XVECEXP (vals, 0, 0))) return; + case VEC_BCAST_PABSB: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 = gen_reg_rtx (V16QImode); + emit_insn (gen_absv16qi2 (tmp2, tmp1)); Shouldn't it rely on TARGET_SSE2? + case VEC_BCAST_PADDB: + if (mode == V4SImode) + { + tmp1 = gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 = gen_reg_rtx (V16QImode); + emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1)); Ditto here and for all logic shift cases. + } + + if ((mode == V4SImode || mode == V8SImode) + && CONST_INT_P (val) + && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val))) + return true; + The alternative way is adding a pre_reload define_insn_and_split to match specific const_vector and splitt it into new instructions. In theoritically, the constant info can be retained before combine and will enable more simplication. Also the patch can be extend to V16SImode, but it can be a separate patch. > gcc/testsuite/ChangeLog > PR target/106060 > * gcc.target/i386/auto-init-8.c: Update test case. > * gcc.target/i386/avx512fp16-3.c: Likewise. > * gcc.target/i386/pr100865-9a.c: Likewise. > * gcc.target/i386/pr106060-1.c: New test case. > * gcc.target/i386/pr106060-2.c: Likewise. > * gcc.target/i386/pr106060-3.c: Likewise. > * gcc.target/i386/pr70314-3.c: Update test case. > * gcc.target/i386/vect-shiftv4qi.c: Likewise. > * gcc.target/i386/vect-shiftv8qi.c: Likewise. > > > Thanks in advance, > Roger > -- > -- BR, Hongtao