On Tue, Jan 11, 2022 at 2:26 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch to the i386 backend's ix86_expand_v1ti_ashiftrt provides
> improved (shorter) implementations of V1TI mode arithmetic right shifts
> for constant amounts between 111 and 126 bits.  The significance of
> this range is that this functionality is useful for (eventually)
> providing sign extension from HImode and QImode to V1TImode.
>
> For example, x>>112 (to sign extend a 16-bit value), was previously
> generated as a four operation sequence:
>
>         movdqa  %xmm0, %xmm1            // word    7 6 5 4 3 2 1 0
>         psrad   $31, %xmm0              // V8HI = [S,S,?,?,?,?,?,?]
>         psrad   $16, %xmm1              // V8HI = [S,X,?,?,?,?,?,?]
>         punpckhqdq      %xmm0, %xmm1    // V8HI = [S,S,?,?,S,X,?,?]
>         pshufd  $253, %xmm1, %xmm0      // V8HI = [S,S,S,S,S,S,S,X]
>
> with this patch, we now generate a three operation sequence:
>
>         psrad   $16, %xmm0              // V8HI = [S,X,?,?,?,?,?,?]
>         pshufhw $254, %xmm0, %xmm0      // V8HI = [S,S,S,X,?,?,?,?]
>         pshufd  $254, %xmm0, %xmm0      // V8HI = [S,S,S,S,S,S,S,X]
>
> The correctness of generated code is confirmed by the existing
> run-time test gcc.target/i386/sse2-v1ti-ashiftrt-1.c in the testsuite.
> This idiom is safe to use for shifts by 127, but that case gets handled
> by a two operation sequence earlier in this function.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
> and make -k check with no new failures.  OK for mainline?
>
>
> 2022-01-11  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.c (ix86_expand_v1ti_ashiftrt): Provide
>         new three operation implementations for shifts by 111..126 bits.
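
For readers who want to convince themselves of the lane bookkeeping in the
three operation sequence quoted above, here is a minimal portable C sketch.
It is only a model, not the code from the patch: the type v128 and the
helpers model_psrad, model_pshufhw, model_pshufd, ref_ashiftrt,
three_op_ashiftrt and sequences_agree are hypothetical names made up for
this illustration, and the instruction semantics are written out by hand
rather than taken from intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as four 32-bit lanes; d[3] is the most significant. */
typedef struct { uint32_t d[4]; } v128;

/* Portable arithmetic right shift of one 32-bit lane by n (0..31). */
static uint32_t sra32 (uint32_t x, unsigned n)
{
  uint32_t fill = (x & 0x80000000u) ? ~0u : 0u;
  return n == 0 ? x : (x >> n) | (fill << (32 - n));
}

/* Model of psrad: arithmetic shift of every 32-bit lane. */
static v128 model_psrad (v128 x, unsigned n)
{
  for (int i = 0; i < 4; i++)
    x.d[i] = sra32 (x.d[i], n);
  return x;
}

/* Model of pshufhw: permute 16-bit words 4..7, leave words 0..3 alone. */
static v128 model_pshufhw (v128 x, unsigned imm)
{
  uint32_t w[4];
  for (int i = 0; i < 4; i++)
    {
      int j = 4 + ((imm >> (2 * i)) & 3);           /* source word index */
      w[i] = (x.d[j / 2] >> (16 * (j % 2))) & 0xffffu;
    }
  x.d[2] = w[0] | (w[1] << 16);
  x.d[3] = w[2] | (w[3] << 16);
  return x;
}

/* Model of pshufd: permute the four 32-bit lanes. */
static v128 model_pshufd (v128 x, unsigned imm)
{
  v128 r;
  for (int i = 0; i < 4; i++)
    r.d[i] = x.d[(imm >> (2 * i)) & 3];
  return r;
}

/* Reference 128-bit arithmetic right shift, for bits in 1..127. */
static v128 ref_ashiftrt (v128 x, unsigned bits)
{
  uint32_t fill = (x.d[3] & 0x80000000u) ? ~0u : 0u;
  v128 r;
  for (unsigned i = 0; i < 4; i++)
    {
      unsigned pos = bits + 32 * i;   /* bit of x that lands in bit 0 of lane i */
      unsigned lane = pos / 32, off = pos % 32;
      uint32_t lo = lane < 4 ? x.d[lane] : fill;
      uint32_t hi = lane + 1 < 4 ? x.d[lane + 1] : fill;
      r.d[i] = off ? (lo >> off) | (hi << (32 - off)) : lo;
    }
  return r;
}

/* The three operation sequence: psrad, pshufhw $0xfe, pshufd $0xfe. */
static v128 three_op_ashiftrt (v128 x, unsigned bits)
{
  x = model_psrad (x, bits - 96);
  x = model_pshufhw (x, 0xfe);
  return model_pshufd (x, 0xfe);
}

/* Return 1 if the model matches the reference for every count in 111..126. */
static int sequences_agree (v128 x)
{
  for (unsigned bits = 111; bits <= 126; bits++)
    {
      v128 a = three_op_ashiftrt (x, bits), b = ref_ashiftrt (x, bits);
      for (int i = 0; i < 4; i++)
        if (a.d[i] != b.d[i])
          return 0;
    }
  return 1;
}
```

The lower bound of 111 falls out of the model: after a per-lane shift by
at least 15, the upper 16-bit word of the top lane consists only of sign
bits, so pshufhw and pshufd can replicate it across the rest of the vector.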

+  if (bits >= 111)
+    {
+      /* Three operations.  */
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits - 96)));

This can be written as:

rtx tmp1 = force_reg (V4SImode, gen_lowpart (V4SImode, op1));
emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT ...));

+      rtx tmp3 = gen_reg_rtx (V8HImode);
+      rtx tmp4 = gen_reg_rtx (V8HImode);
+      emit_move_insn (tmp3, gen_lowpart (V8HImode, tmp2));
+      emit_insn (gen_sse2_pshufhw (tmp4, tmp3, GEN_INT (0xfe)));

Here in a similar way...

+      rtx tmp5 = gen_reg_rtx (V4SImode);
+      rtx tmp6 = gen_reg_rtx (V4SImode);
+      emit_move_insn (tmp5, gen_lowpart (V4SImode, tmp4));
+      emit_insn (gen_sse2_pshufd (tmp6, tmp5, GEN_INT (0xfe)));

... also here.

+      rtx tmp7 = gen_reg_rtx (V1TImode);
+      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+      emit_move_insn (operands[0], tmp7);

And here a simple:

emit_move_insn (operands[0], gen_lowpart (V1TImode, tmp6));

+      return;
+    }
+

Uros.
