"H.J. Lu" <hjl.to...@gmail.com> writes: > On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: >> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford >> > <richard.sandif...@arm.com> wrote: >> >> >> >> "H.J. Lu" <hjl.to...@gmail.com> writes: >> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow >> >> > broadcasting an integer constant to a vector when broadcast instruction >> >> > is available. This can be used by memset expander to avoid >> >> > vec_duplicate >> >> > when loading from constant pool is more efficient. >> >> >> >> I don't see any changes in target-independent code though, other than >> >> the doc update. It's still the case that (existing) uses of >> >> vec_duplicate_optab do not allow it to fail. >> > >> > I have a followup patch set on >> > >> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast >> > >> > to use it to expand memset with vector broadcast: >> > >> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3 >> > >> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast >> > expander returns FAIL and load from constant pool will be used. >> >> Hmm, but as Jeff and I mentioned in the earlier replies, >> vec_duplicate_optab shouldn't be used for constants. Constants >> should go via the move expanders instead. >> >> In a previous message I suggested: >> >> … would it work to change: >> >> /* Try using vec_duplicate_optab for uniform vectors. */ >> if (!TREE_SIDE_EFFECTS (exp) >> && VECTOR_MODE_P (mode) >> && eltmode == GET_MODE_INNER (mode) >> && ((icode = optab_handler (vec_duplicate_optab, mode)) >> != CODE_FOR_nothing) >> && (elt = uniform_vector_p (exp))) >> >> to something like: >> >> /* Try using vec_duplicate_optab for uniform vectors. */ >> if (!TREE_SIDE_EFFECTS (exp) >> && VECTOR_MODE_P (mode) >> && eltmode == GET_MODE_INNER (mode) >> && (elt = uniform_vector_p (exp))) >> { >> if (TREE_CODE (elt) == INTEGER_CST >> || TREE_CODE (elt) == POLY_INT_CST >> || TREE_CODE (elt) == REAL_CST >> || TREE_CODE (elt) == FIXED_CST) >> { >> rtx src = gen_const_vec_duplicate (mode, expand_normal >> (node)); >> emit_move_insn (target, src); >> break; >> } >> … >> } >> >> if that code was the source of the constant operand. If we're adding a >> new use of vec_duplicate_optab then that should be similarly protected >> against constant operands. >> > > Your comments apply to my initial vec_duplicate patch that caused the > gcc.dg/pr100239.c failure. It has been fixed by > > commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515 > Author: Richard Biener <rguent...@suse.de> > Date: Mon Jun 7 20:08:13 2021 +0200 > > middle-end/100951 - make sure to generate VECTOR_CST in lowering > > When vector lowering creates piecewise ops make sure to create > VECTOR_CSTs instead of CONSTRUCTORs when possible. > > The problem I am running into now is in my memset vector broadcast > patch. In order to optimize vector broadcast for memset, I need to > generate a pseudo register for > > __builtin_memset (ops, 3, 38); > > only when vector broadcast is available: > > rtx target = nullptr; > > unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode); > machine_mode vector_mode; > if (!mode_for_vector (QImode, nunits).exists (&vector_mode)) > gcc_unreachable (); > > enum insn_code icode = optab_handler (vec_duplicate_optab, > vector_mode); > if (icode != CODE_FOR_nothing) > { > rtx reg = targetm.gen_memset_scratch_rtx (vector_mode); > class expand_operand ops[2]; > create_output_operand (&ops[0], reg, vector_mode); > create_input_operand (&ops[1], data, QImode); > if (maybe_expand_insn (icode, 2, ops)) > { > if (!rtx_equal_p (reg, ops[0].value)) > emit_move_insn (reg, ops[0].value); > target = lowpart_subreg (mode, reg, vector_mode); > } > } > > return target; <<< Return nullptr to load from constant pool.
I don't think this is a correct use of vec_duplicate_optab. If the scalar operand is a constant then the move should always go through the move expanders instead, as a move from a CONST_VECTOR. Thanks, Richard