On Wed, Jun 9, 2021 at 12:44 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > [For some reason this message didn't reach my gmail account] > > > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTO > > operands to vector broadcast from an integer with AVX2. > > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which > > won't increase stack alignment requirement and blocks transformation by > > the combine pass. > > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast > > from memory. > > 4. Update avx512f_cond_move.c to expect integer broadcast. > > + else if (TARGET_64BIT > + && ix86_broadcast (val, GET_MODE_BITSIZE (DImode), > + val_broadcast)) > + { > + /* NB: MOVQ takes a 32-bit signed immediate operand. */ > + if (trunc_int_for_mode (val_broadcast, SImode) != val_broadcast) > + return nullptr; > + broadcast_mode = DImode; > + } > + else > + return nullptr; > > We have MOVABS insn and movdi_internal knows when to switch between > MOVQ and MOVABS.
Fixed in the v4 patch. > + if (!ix86_expand_vector_init_duplicate (false, vector_mode, target, > + GEN_INT (val_broadcast))) > + gcc_unreachable (); > > We are using: > > bool ok = ix86_expand_vector_init_duplicate (...); > gcc_assert (ok); > > idiom throughout i386/. Let's keep it this way. Fixed in the v4 patch. > + if (REGNO (target) < FIRST_PSEUDO_REGISTER) > + target = gen_rtx_REG (mode, REGNO (target)); > + else > + target = convert_to_mode (mode, target, 1); > + > > This is not needed. lowpart_subreg should do the trick when changing > mode of hard regs (also see comment for ix86_gen_scratch_sse_rtx). Fixed in the v4 patch. > + rtx first; > + > + if (can_create_pseudo_p () > + && GET_MODE_SIZE (mode) >= 16 > + && GET_MODE_CLASS (mode) == MODE_VECTOR_INT > + && (MEM_P (op1) > + && SYMBOL_REF_P (XEXP (op1, 0)) > + && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0))) > + && (first = ix86_broadcast_from_integer_constant (mode, op1))) > + { > + /* Broadcast to XMM/YMM/ZMM register from an integer constant. */ > + op1 = ix86_gen_scratch_sse_rtx (mode, false); > + if (!ix86_expand_vector_init_duplicate (false, mode, op1, first)) > + gcc_unreachable (); > + emit_move_insn (op0, op1); > + return; > > Please try to avoid assignment inside the condition. And also use > "gcc_assert (ok)" here. Fixed in the v4 patch. > +/* Return a scratch register in MODE for vector load and store. If > + CONSTANT_INT_BROADCAST is true, it is used to hold constant integer > + broadcast result. */ > + > +rtx > +ix86_gen_scratch_sse_rtx (machine_mode mode, > + bool constant_int_broadcast) > > This function should always return hard reg, simply: > > return gen_rtx_REG (mode, (TARGET_64BIT > ? LAST_REX_SSE_REG : LAST_SSE_REG)); > > The complications with pseudo does not bring us anything (at the end > we need a hard reg anyway, and I guess reload knows quite well how to > avoid used temporary). Fixed in the v4 patch. > The function can then be renamed to ix86_gen_scratch_sse_reg. I'd like to keep the ix86_gen_scratch_sse_rtx name: rtx ix86_gen_scratch_sse_rtx (machine_mode mode) { if (TARGET_SSE) return gen_rtx_REG (mode, (TARGET_64BIT ? LAST_REX_SSE_REG : LAST_SSE_REG)); else return gen_reg_rtx (mode); } so that it can be used to implement the target hook for memset later. > * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer > broadcast. > * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise. > * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise. > * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise. > * gcc.target/i386/avx512f_cond_move.c: Also pass > -mprefer-vector-width=512 and expect integer broadcast. > > No review for the above changes for AVX512 tests, someone else should > check if the new code is better here. A small benchmark: https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast shows that integer broadcast is faster than embedded memory broadcast: $ make gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c gcc -g -c -o memory.o memory.S gcc -g -c -o broadcast.o broadcast.S gcc -o test test.o memory.o broadcast.o ./test memory : 425538 broadcast : 375260 $ Thanks. -- H.J.