On Fri, Jan 26, 2024 at 3:03 AM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Hongtao, > Many thanks for the review. Here's a revised version of my patch > that addresses (most of) the issues you've raised. Firstly the > handling of zero and all_ones in this function is mostly for > completeness/documentation, these standard_sse_constant_p > values are (currently/normally) handled elsewhere. But I have > added an "n_var == 0" optimization to ix86_expand_vector_init. > > As you've suggested I've added explicit TARGET_SSE2 tests where > required, and for consistency I've also added support for AVX512's > V16SImode. > > As you've predicted, the eventual goal is to move this after combine > (or reload) using define_insn_and_split, but that requires a significant > restructuring that should be done in steps. This also interacts with > a similar planned reorganization of TImode constant handling. If > all 128-bit (vector) constants are acceptable before combine, then > STV has the freedom to chose V1TImode (and this broadcast > functionality) to implement TImode operations on immediate > constants. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline (in stage 1)? Ok, thanks for handling this. > > > 2024-01-25 Roger Sayle <ro...@nextmovesoftware.com> > Hongtao Liu <hongtao....@intel.com> > > gcc/ChangeLog > PR target/106060 > * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. > (struct ix86_vec_bcast_map_simode_t): New type for table below. > (ix86_vec_bcast_map_simode): Table of SImode constants that may > be efficiently synthesized by a ix86_vec_bcast_alg method. > (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. > (ix86_vector_duplicate_simode_const): Efficiently synthesize > V4SImode and V8SImode constants that duplicate special constants. > (ix86_vector_duplicate_value): Attempt to synthesize "special" > vector constants using ix86_vector_duplicate_simode_const. > * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a > vector integer mode costs with a single SSE instruction. > > gcc/testsuite/ChangeLog > PR target/106060 > * gcc.target/i386/auto-init-8.c: Update test case. > * gcc.target/i386/avx512fp16-3.c: Likewise. > * gcc.target/i386/pr100865-9a.c: Likewise. > * gcc.target/i386/pr101796-1.c: Likewise. > * gcc.target/i386/pr106060-1.c: New test case. > * gcc.target/i386/pr106060-2.c: Likewise. > * gcc.target/i386/pr106060-3.c: Likewise. > * gcc.target/i386/pr70314.c: Update test case. > * gcc.target/i386/vect-shiftv4qi.c: Likewise. > * gcc.target/i386/vect-shiftv8qi.c: Likewise. > > > Roger > -- > > > -----Original Message----- > > From: Hongtao Liu <crazy...@gmail.com> > > Sent: 17 January 2024 03:13 > > To: Roger Sayle <ro...@nextmovesoftware.com> > > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com> > > Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant > > materialization. > > > > On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > I thought I'd just missed the bug fixing season of stage3, but there > > > appears to a little latitude in early stage4 (for vector patches), so > > > I'll post this now. > > > > > > This patch resolves PR target/106060 by providing efficient methods > > > for materializing/synthesizing special "vector" constants on x86. > > > Currently there are three methods of materializing a vector constant; > > > the most general is to load a vector from the constant pool, secondly > > "duplicated" > > > constants can be synthesized by moving an integer between units and > > > broadcasting (or shuffling it), and finally the special cases of the > > > all-zeros vector and all-ones vectors can be loaded via a single SSE > > > instruction. This patch handles additional cases that can be synthesized > > > in two instructions, loading an all-ones vector followed by another > > > SSE instruction. Following my recent patch for PR target/112992, > > > there's conveniently a single place in i386-expand.cc where these > > > special cases can be handled. > > > > > > Two examples are given in the original bugzilla PR for 106060. > > > > > > __m256i > > > should_be_cmpeq_abs () > > > { > > > return _mm256_set1_epi8 (1); > > > } > > > > > > is now generated (with -O3 -march=x86-64-v3) as: > > > > > > vpcmpeqd %ymm0, %ymm0, %ymm0 > > > vpabsb %ymm0, %ymm0 > > > ret > > > > > > and > > > > > > __m256i > > > should_be_cmpeq_add () > > > { > > > return _mm256_set1_epi8 (-2); > > > } > > > > > > is now generated as: > > > > > > vpcmpeqd %ymm0, %ymm0, %ymm0 > > > vpaddb %ymm0, %ymm0, %ymm0 > > > ret > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=unix{-m32} > > > with no new failures. Ok for mainline? > > > > > > > > > 2024-01-16 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > PR target/106060 > > > * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. > > > (struct ix86_vec_bcast_map_simode_t): New type for table below. > > > (ix86_vec_bcast_map_simode): Table of SImode constants that may > > > be efficiently synthesized by a ix86_vec_bcast_alg method. > > > (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. > > > (ix86_vector_duplicate_simode_const): Efficiently synthesize > > > V4SImode and V8SImode constants that duplicate special constants. > > > (ix86_vector_duplicate_value): Attempt to synthesize "special" > > > vector constants using ix86_vector_duplicate_simode_const. > > > * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a > > > vector integer mode costs with a single SSE instruction. > > > > > > > + switch (entry->alg) > > + { > > + case VEC_BCAST_PXOR: > > + if (mode == V8SImode && !TARGET_AVX2) return false; > > + emit_move_insn (target, CONST0_RTX (mode)); > > + return true; > > + case VEC_BCAST_PCMPEQ: > > + if ((mode == V4SImode && !TARGET_SSE2) > > + || (mode == V8SImode && !TARGET_AVX2)) return false; > > + emit_move_insn (target, CONSTM1_RTX (mode)); > > + return true; > > > > I think we need to prevent those standard_sse_constant_p getting in > > ix86_expand_vector_init_duplicate by below codes. > > > > /* If all values are identical, broadcast the value. */ > > if (all_same > > && (nvars != 0 || !standard_sse_constant_p (gen_rtx_CONST_VECTOR > > (mode, XVEC (vals, 0)), mode)) > > && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, > > XVECEXP (vals, 0, 0))) > > return; > > > > + case VEC_BCAST_PABSB: > > + if (mode == V4SImode) > > + { > > + tmp1 = gen_reg_rtx (V16QImode); > > + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); > > + tmp2 = gen_reg_rtx (V16QImode); > > + emit_insn (gen_absv16qi2 (tmp2, tmp1)); > > Shouldn't it rely on TARGET_SSE2? > > > > + case VEC_BCAST_PADDB: > > + if (mode == V4SImode) > > + { > > + tmp1 = gen_reg_rtx (V16QImode); > > + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); > > + tmp2 = gen_reg_rtx (V16QImode); > > + emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1)); > > Ditto here and for all logic shift cases. > > + } > > > > + > > + if ((mode == V4SImode || mode == V8SImode) > > + && CONST_INT_P (val) > > + && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val))) > > + return true; > > + > > The alternative way is adding a pre_reload define_insn_and_split to match > > specific const_vector and splitt it into new instructions. > > In theoritically, the constant info can be retained before combine and will > > enable > > more simplication. > > > > Also the patch can be extend to V16SImode, but it can be a separate patch. > > > > > gcc/testsuite/ChangeLog > > > PR target/106060 > > > * gcc.target/i386/auto-init-8.c: Update test case. > > > * gcc.target/i386/avx512fp16-3.c: Likewise. > > > * gcc.target/i386/pr100865-9a.c: Likewise. > > > * gcc.target/i386/pr106060-1.c: New test case. > > > * gcc.target/i386/pr106060-2.c: Likewise. > > > * gcc.target/i386/pr106060-3.c: Likewise. > > > * gcc.target/i386/pr70314-3.c: Update test case. > > > * gcc.target/i386/vect-shiftv4qi.c: Likewise. > > > * gcc.target/i386/vect-shiftv8qi.c: Likewise. > > > > > > > > > Thanks in advance, > > > Roger > > > -- > > > > > > > > > -- > > BR, > > Hongtao
-- BR, Hongtao