On Thu, Aug 27, 2020 at 2:25 PM Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Thu, Jul 09, 2020 at 04:33:46PM +0800, Hongtao Liu via Gcc-patches wrote: > > +static void > > +replace_constant_pool_with_broadcast (rtx_insn* insn) > > +{ > > + subrtx_ptr_iterator::array_type array; > > + FOR_EACH_SUBRTX_PTR (iter, array, &PATTERN (insn), ALL) > > + { > > + rtx *loc = *iter; > > + rtx x = *loc; > > + rtx broadcast_mem, vec_dup, constant, first; > > + machine_mode mode; > > + if (GET_CODE (x) != MEM > > MEM_P > > > + || GET_CODE (XEXP (x, 0)) != SYMBOL_REF > > SYMBOL_REF_P > > > + || !CONSTANT_POOL_ADDRESS_P (XEXP (x, 0))) > > + continue; > > + > > + mode = GET_MODE (x); > > + if (!VECTOR_MODE_P (mode)) > > + return; > > + > > + constant = get_pool_constant (XEXP (x, 0)); > > + first = XVECEXP (constant, 0, 0); > > Shouldn't this verify first that GET_CODE (constant) == CONST_VECTOR > and punt otherwise? > > > + broadcast_mem = force_const_mem (GET_MODE_INNER (mode), first); > > + vec_dup = gen_rtx_VEC_DUPLICATE (mode, broadcast_mem); > > + *loc = vec_dup; > > + INSN_CODE (insn) = -1; > > + /* Revert change if there's no corresponding pattern. */ > > + if (recog_memoized (insn) < 0) > > + { > > + *loc = x; > > + recog_memoized (insn); > > + } > > The usual way of doing this would be through > validate_change (insn, loc, vec_dup, 0); > > Also, isn't the pass also useful for TARGET_AVX and above (but in that case > only if it is a simple memory load)? Or are avx/avx2 broadcast slower than > full vector loads? > > As Jeff wrote, I wonder if when successfully replacing those pool constants > the old constant pool entries will be omitted. > > Another thing I wonder about is whether more analysis shouldn't be used. > E.g. if the constant pool entry is already emitted into .rodata anyway > (e.g. some earlier function needed it), using the broadcast will mean > actually larger .rodata. If {1to8} and similar is as fast as reading all > the same elements from memory (or faster), perhaps in that case it should > broadcast from the first element of the existing constant pool full vector > rather than creating a new one. > And similarly, perhaps the function should look at all constant pool entries > in the current function (not yet emitted into .rodata) and if it would > succeed for some and not for others, either use broadcast from its first > element or not perform it for the others too.
IIRC I once implemented this (re-using vector constant components for non-vector pool entries) but it was quite hackish and never merged it seems. Richard. > Jakub >