Hi! On Tue, Mar 16, 2021 at 07:57:17PM +0100, Jakub Jelinek wrote: > On Thu, Mar 11, 2021 at 07:57:23AM +0800, Xionghu Luo via Gcc-patches wrote: > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index ec068c58aa5..48eb91132a9 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -7000,11 +7000,15 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx > > val, rtx idx) > > > > gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx)); > > > > - gcc_assert (GET_MODE (idx) == E_SImode); > > - > > machine_mode inner_mode = GET_MODE (val); > > > > - rtx tmp = gen_reg_rtx (GET_MODE (idx)); > > + machine_mode idx_mode = GET_MODE (idx); > > + rtx tmp = gen_reg_rtx (DImode); > > + if (idx_mode != DImode) > > + tmp = convert_modes (DImode, idx_mode, idx, 0); > > + else > > + tmp = idx; > > + > > int width = GET_MODE_SIZE (inner_mode); > > > > gcc_assert (width >= 1 && width <= 8); > > @@ -7012,9 +7016,7 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, > > rtx idx) > > int shift = exact_log2 (width); > > /* Generate the IDX for permute shift, width is the vector element size. > > idx = idx * width. */ > > - emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift))); > > - > > - tmp = convert_modes (DImode, SImode, tmp, 1); > > + emit_insn (gen_ashldi3 (tmp, tmp, GEN_INT (shift))); > > This looks broken.
Yup. > The gen_reg_rtx (DImode); call result is completely ignored, > so it wastes one pseudo, and I'm not convinced that idx nor tmp > is guaranteed to be usable as output of shift. > So, shouldn't it be instead: > rtx tmp = gen_reg_rtx (DImode); > if (idx_mode != DImode) > idx = convert_modes (DImode, idx_mode, idx, 0); > > ... > emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift)); > ? Yeah, something like that. Just idx = convert_modes (DImode, idx_mode, idx, 1); ... rtx tmp = gen_reg_rtx (DImode); emit_insn (gen_ashldi3 (tmp, idx, GEN_INT (shift)); works just as well. > Also, dunno what is less and what is more expensive on > rs6000, whether convert_modes with unsigned = 0 or 1. Unsigned is never more expensive. Sometimes it is cheaper. It depends more on the situation what is best. Just look at the generated code? > I think rs6000 only supports very narrow vectors, so even for The only hardware vectors are 128 bits. There are modes for two vectors concatenated as well, but that is just for RTL, you should not have insns making such a pair. > say QImode idx anything with MSB set will be out of out of bounds and UB, > so you can sign or zero extend, whatever is more efficient. Yup. > > + machine_mode idx_mode = GET_MODE (idx); > > + rtx tmp = gen_reg_rtx (DImode); > > + if (idx_mode != DImode) > > + tmp = convert_modes (DImode, idx_mode, idx, 0); > > + else > > + tmp = idx; > > + > > gcc_assert (width >= 1 && width <= 4); > > > > if (!BYTES_BIG_ENDIAN) > > { > > /* idx = idx * width. */ > > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > > + emit_insn (gen_muldi3 (tmp, tmp, GEN_INT (width))); > > /* idx = idx + 8. */ > > - emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8))); > > + emit_insn (gen_adddi3 (tmp, tmp, GEN_INT (8))); > > } > > else > > { > > - emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width))); > > - emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp)); > > + emit_insn (gen_muldi3 (tmp, idx, GEN_INT (width))); > > + emit_insn (gen_subdi3 (tmp, GEN_INT (24 - width), tmp)); > > Ditto. But the above was even inconsistent, sometimes it > used tmp (for LE) and otherwise idx in whatever mode it has. Thanks for the review Jakub, I kinda dropped this :-( Segher