Dear GCC folk,

# Issue Background

We’re investigating an issue with redundant sign-extension instructions being 
emitted with the riscv backend. Firstly I would like to state that riscv is 
possibly a unique backend with respect to its canonical sign-extended register 
form due to the following:

- POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit 
operations on RV64 is sign-extended not zero-extended.

- PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to 
signed DI mode values holding SI mode subregs

- RISC-V does not have register aliases for these different modes, rather 
different instructions are selected to operate on different register widths.

- The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc.

- Some instructions such as logical ops only have full word width variants 
(AND, OR, XOR) but these instructions maintain canonical form as there is no 
horizontal movement of bits.


# Issue Details

During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to define 
TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote values to 
wider modes. This was done to ensure ABI correctness so that registers holding 
narrower modes always contain canonical sign extended values.

The result of this is that sign_extend operations are inserted but later 
eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly 
eliminated, and indeed most 32-bit instructions are handled correctly. This is 
what the pattern looks like for a typical 32-bit instruction after expand:

;;
;; Full RTL generated for this function:
;;
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ])
       (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1
    (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 74)
       (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0)
           (const_int 24 [0x18]))) "lshift.c":1 -1
    (nil))
(insn 7 6 8 2 (set (reg:DI 75)
       (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1
    (nil))
(insn 8 7 12 2 (set (reg:DI 72 [ <retval> ])
       (reg:DI 75)) "lshift.c":1 -1
    (nil))
(insn 12 8 13 2 (set (reg/i:DI 10 a0)
       (reg:DI 72 [ <retval> ])) "lshift.c":1 -1
    (nil))
(insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1
    (nil))


# Problem test cases

- testcase 1 - open coded bswap - redundant sign extension instructions
- testcase 2 - open coded right shift - redundant sign extension instructions
- testcase 3 - open coded left shift - control / correct behaviour

It appears that the downside to the PROMOTE_MODE transition is that several 
redundant sign-extension operations are cropping up under specific 
circumstances. One of the first small isolators is testcase 1 (below), which is 
an open-coded bswap. You’ll notice the SEXT.W emitted before the return. This 
was then further isolated to an even simpler testcase 2 which is a single right 
shift which also emits SEXT.W before the return. A 32-bit left shift or other 
32-bit operations correctly have the redundant sign_extend eliminated.

We have isolated the code that prevented the right shift from having its 
redundant sign extension eliminated and it is target independent code in 
simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is more 
efficient, likely an assumption based on the fact that most platforms define 
POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero extended 
form vs sign extended form. The redundant sign extension appears to be due to 
an optimisation in simply_rtx that generates zero extend for right shifts where 
the value is non zero, based on the assumption that zero extend is” better”. 
This is not true on platforms that define POINTERS_EXTEND_UNSIGNED to false 
i.e. naturally promote subregister width operations to canonical sign-extended 
form internally.


# Partial resolution of test case 2

We have prepared a patch that disables the zero extend optimisation on 
platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in testcase 
2. We believe this fix is very low risk because right shift for values above 
zero will always have a zero sign bit, thus either sign or zero extension can 
be used (this is in the comment), however sign-extension is “better” on RISC-V 
and likely all other platforms with POINTERS_EXTEND_UNSIGNED false. Platforms 
like x86, Aarch64 and most others define POINTERS_EXTEND_UNSIGNED to 1 so would 
not be affected by this change.

Please review this candidate patch to fix the codegen issue for testcase 2.


diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ce632ae..25dd70f 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, 
machine_mode mode, rtx op)
      /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as
         (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0.  */
      if (GET_CODE (op) == LSHIFTRT
+#if defined(POINTERS_EXTEND_UNSIGNED)
+      /* we skip this optimisation if pointers naturally extend signed */
+         && POINTERS_EXTEND_UNSIGNED
+#endif
         && CONST_INT_P (XEXP (op, 1))
         && XEXP (op, 1) != const0_rtx)
       return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));


However, after applying this patch, we noticed that while it resolved the 
redundant sign extension after the right shift in testcase 2, it has not 
resolved the redundant sign extension emitted in testcase 1. 


# Questions

1. Is the first patch an appropriate fix for testcase 2?

This particular patch seems harmless as zero or sign extending a right shift of 
a non-zero value will have the same result, however in RISC-V’s case, 
sign-extension can be eliminated at no cost and the zero_extend results in the 
redundant SEXT.W due to an inserted sign_extend for canonicalisation of the 
register value. It seemed to me that we can’t easily access promote_mode in 
simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the canonical 
representation for wider types is sign-extended. I believe that is the 
intention of that flag as it is used this way in the sign and zero extension 
logic in emit-rtl.c

2. Where should we fix the original bswap testcase 1?

The redundant sign extension is not solved by the patch above and it seems we 
have found 2 distinct issues. We are now wondering if logic to handle redundant 
sign extension elimination for the word width logical operations (AND/OR/XOR) 
is missing in simplify_rtx or REE (Redundant Extension Elimination). 64-bit 
logical operations on DI mode registers with SI mode subregs should also have 
their sign_extend operations eliminated as there is no horizontal bit movement 
introduced by these operations. Upon looking at the RTL for testcase 1, we can 
see that there is a very similar pattern to the right shift case. We are 
seeking guidance on where to find the logic (or lack thereof) to eliminate 
sign_extend on DI mode registers containing SI mode subregs for full width 
logical ops. Should this logic be in simplify-rtx.c or ree.c?

Regards,
Michael.

P.S. if the patch is acceptable, I can post to gcc-patches and/or raise bugs in 
GCC bugzilla… I guess we’re initially seeking guidance on the issue…


testcase 1:

$ cat bswap.c
unsigned bswap(unsigned p) {
      return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) | 
       ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
}
$ cat bswap.s
bswap2:
        sllw    a3,a0,24
        srlw    a5,a0,24
        sllw    a4,a0,8
        or      a5,a5,a3
        li      a3,16711680
        and     a4,a4,a3
        or      a5,a5,a4
        li      a4,65536
        add     a4,a4,-256
        srlw    a0,a0,8
        and     a0,a0,a4
        or      a0,a5,a0
        sext.w  a0,a0   # redundant
        ret


testcase 2:

$ cat rshift.c
unsigned rs24(unsigned rs1) { return rs1 >> 24; }
$ cat rshift.s
rs24:
        sllw    a0,a0,24
        sext.w  a0,a0   # redundant
        ret


testcase 3:

$ cat lshift.c
unsigned ls24(unsigned rs1) { return rs1 << 24; }
$ cat lshift.s
ls24:
        sllw    a0,a0,24
        ret

Reply via email to