On Fri, Jan 14, 2022 at 10:00 AM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Here's a revised version of this patch incorporating your suggestion of using > force_reg instead of emit_move_insn to a pseudo allocated by gen_reg_rtx. > I also took the opportunity to transition the rest of the function (and > clean-up > those around it) to use this preferred idiom. > > This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap > and make -k check with no new failures. OK for mainline? > > > 2022-01-14 Roger Sayle <ro...@nextmovesoftware.com> > Uroš Bizjak <ubiz...@gmail.com> > > gcc/ChangeLog > * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Use force_reg. > (ix86_expand_ti_to_v1ti): Use force_reg. > (ix86_expand_v1ti_shift): Use force_reg. > (ix86_expand_v1ti_rotate): Use force_reg. > (ix86_expand_v1ti_ashiftrt): Provide new three operation > implementations for shifts by 111..126 bits. Use force_reg.
LGTM, as far as I can review the patch due to code churn... Thanks, Uros. > > Thanks again, > Roger > -- > > > -----Original Message----- > > From: Uros Bizjak <ubiz...@gmail.com> > > Sent: 12 January 2022 19:18 > > To: Roger Sayle <ro...@nextmovesoftware.com> > > Cc: GCC Patches <gcc-patches@gcc.gnu.org> > > Subject: Re: [PATCH] x86_64: Improvements to arithmetic right shifts of > > V1TImode values. > > > > On Tue, Jan 11, 2022 at 2:26 PM Roger Sayle <ro...@nextmovesoftware.com> > > wrote: > > > > > > > > > This patch to the i386 backend's ix86_expand_v1ti_ashiftrt provides > > > improved (shorter) implementations of V1TI mode arithmetic right > > > shifts for constant amounts between 111 and 126 bits. The > > > significance of this range is that this functionality is useful for > > > (eventually) providing sign extension from HImode and QImode to V1TImode. > > > > > > For example, x>>112 (to sign extend a 16-bit value), was previously > > > generated as a four operation sequence: > > > > > > movdqa %xmm0, %xmm1 // word 7 6 5 4 3 2 1 0 > > > psrad $31, %xmm0 // V8HI = [S,S,?,?,?,?,?,?] > > > psrad $16, %xmm1 // V8HI = [S,X,?,?,?,?,?,?] > > > punpckhqdq %xmm0, %xmm1 // V8HI = [S,S,?,?,S,X,?,?] > > > pshufd $253, %xmm1, %xmm0 // V8HI = [S,S,S,S,S,S,S,X] > > > > > > with this patch, we now generates a three operation sequence: > > > > > > psrad $16, %xmm0 // V8HI = [S,X,?,?,?,?,?,?] > > > pshufhw $254, %xmm0, %xmm0 // V8HI = [S,S,S,X,?,?,?,?] > > > pshufd $254, %xmm0, %xmm0 // V8HI = [S,S,S,S,S,S,S,X] > > > > > > The correctness of generated code is confirmed by the existing > > > run-time test gcc.target/i386/sse2-v1ti-ashiftrt-1.c in the testsuite. > > > This idiom is safe to use for shifts by 127, but that case gets > > > handled by a two operation sequence earlier in this function. > > > > > > > > > This patch has been tested on x86_64-pc-linux-gnu with a make > > > bootstrap and make -k check with no new failures. OK for mainline? > > > > > > > > > 2022-01-11 Roger Sayle <ro...@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > * config/i386/i386-expand.c (ix86_expand_v1ti_ashiftrt): Provide > > > new three operation implementations for shifts by 111..126 bits. > > > > + if (bits >= 111) > > + { > > + /* Three operations. */ > > + rtx tmp1 = gen_reg_rtx (V4SImode); > > + rtx tmp2 = gen_reg_rtx (V4SImode); > > + emit_move_insn (tmp1, gen_lowpart (V4SImode, op1)); > > + emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits - 96))); > > > > This can be written as: > > > > rtx tmp1 = force_reg (V4SImode, gen_lowpart (V4SImode, op1)); emit_insn > > (gen_ashrv4i3 (tmp2, tmp1, GEN_INT ...)); > > > > + rtx tmp3 = gen_reg_rtx (V8HImode); > > + rtx tmp4 = gen_reg_rtx (V8HImode); > > + emit_move_insn (tmp3, gen_lowpart (V8HImode, tmp2)); > > + emit_insn (gen_sse2_pshufhw (tmp4, tmp3, GEN_INT (0xfe))); > > > > Here in a similar way... > > > > + rtx tmp5 = gen_reg_rtx (V4SImode); > > + rtx tmp6 = gen_reg_rtx (V4SImode); > > + emit_move_insn (tmp5, gen_lowpart (V4SImode, tmp4)); > > + emit_insn (gen_sse2_pshufd (tmp6, tmp5, GEN_INT (0xfe))); > > > > ... also here. > > > > + rtx tmp7 = gen_reg_rtx (V1TImode); > > + emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6)); > > + emit_move_insn (operands[0], tmp7); > > > > And here a simple: > > > > emit_move_insn (operands[0], gen_lowpart (V1TImode, tmp6); > > > > + return; > > + } > > + > > > > Uros.