> On 30 Aug 2017, at 12:36 PM, Michael Clark <[email protected]> wrote:
>
> Dear GCC folk,
>
>
> # Issue Background
>
> We’re investigating an issue with redundant sign-extension instructions being
> emitted with the riscv backend. Firstly I would like to state that riscv is
> possibly a unique backend with respect to its canonical sign-extended
> register form due to the following:
>
> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit
> operations on RV64 is sign-extended not zero-extended.
>
> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to
> signed DI mode values holding SI mode subregs
>
> - RISC-V does not have register aliases for these different modes, rather
> different instructions are selected to operate on different register widths.
>
> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc.
>
> - Some instructions such as logical ops only have full word width variants
> (AND, OR, XOR) but these instructions maintain canonical form as there is no
> horizontal movement of bits.
>
>
> # Issue Details
>
> During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to define
> TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote values to
> wider modes. This was done to ensure ABI correctness so that registers
> holding narrower modes always contain canonical sign extended values.
>
> The result of this is that sign_extend operations are inserted but later
> eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly
> eliminated, and indeed most 32-bit instructions are handled correctly. This
> is what the pattern looks like for a typical 32-bit instruction after expand:
>
> ;;
> ;; Full RTL generated for this function:
> ;;
> (note 1 0 4 NOTE_INSN_DELETED)
> (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ])
> (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1
> (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (set (reg:SI 74)
> (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0)
> (const_int 24 [0x18]))) "lshift.c":1 -1
> (nil))
> (insn 7 6 8 2 (set (reg:DI 75)
> (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1
> (nil))
> (insn 8 7 12 2 (set (reg:DI 72 [ <retval> ])
> (reg:DI 75)) "lshift.c":1 -1
> (nil))
> (insn 12 8 13 2 (set (reg/i:DI 10 a0)
> (reg:DI 72 [ <retval> ])) "lshift.c":1 -1
> (nil))
> (insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1
> (nil))
>
>
> # Problem test cases
>
> - testcase 1 - open coded bswap - redundant sign extension instructions
> - testcase 2 - open coded right shift - redundant sign extension instructions
> - testcase 3 - open coded left shift - control / correct behaviour
>
> It appears that the downside to the PROMOTE_MODE transition is that several
> redundant sign-extension operations are cropping up under specific
> circumstances. One of the first small isolators is testcase 1 (below), which
> is an open-coded bswap. You’ll notice the SEXT.W emitted before the return.
> This was then further isolated to an even simpler testcase 2 which is a
> single right shift which also emits SEXT.W before the return. A 32-bit left
> shift or other 32-bit operations correctly have the redundant sign_extend
> eliminated.
>
> We have isolated the code that prevented the right shift from having its
> redundant sign extension eliminated and it is target independent code in
> simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is
> more efficient, likely an assumption based on the fact that most platforms
> define POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero
> extended form vs sign extended form. The redundant sign extension appears to
> be due to an optimisation in simply_rtx that generates zero extend for right
> shifts where the value is non zero, based on the assumption that zero extend
> is” better”. This is not true on platforms that define
> POINTERS_EXTEND_UNSIGNED to false i.e. naturally promote subregister width
> operations to canonical sign-extended form internally.
>
>
> # Partial resolution of test case 2
>
> We have prepared a patch that disables the zero extend optimisation on
> platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in
> testcase 2. We believe this fix is very low risk because right shift for
> values above zero will always have a zero sign bit, thus either sign or zero
> extension can be used (this is in the comment), however sign-extension is
> “better” on RISC-V and likely all other platforms with
> POINTERS_EXTEND_UNSIGNED false. Platforms like x86, Aarch64 and most others
> define POINTERS_EXTEND_UNSIGNED to 1 so would not be affected by this change.
>
> Please review this candidate patch to fix the codegen issue for testcase 2.
>
>
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index ce632ae..25dd70f 100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code,
> machine_mode mode, rtx op)
> /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as
> (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0. */
> if (GET_CODE (op) == LSHIFTRT
> +#if defined(POINTERS_EXTEND_UNSIGNED)
> + /* we skip this optimisation if pointers naturally extend signed */
> + && POINTERS_EXTEND_UNSIGNED
> +#endif
> && CONST_INT_P (XEXP (op, 1))
> && XEXP (op, 1) != const0_rtx)
> return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
>
>
> However, after applying this patch, we noticed that while it resolved the
> redundant sign extension after the right shift in testcase 2, it has not
> resolved the redundant sign extension emitted in testcase 1.
>
>
> # Questions
>
> 1. Is the first patch an appropriate fix for testcase 2?
>
> This particular patch seems harmless as zero or sign extending a right shift
> of a non-zero value will have the same result, however in RISC-V’s case,
> sign-extension can be eliminated at no cost and the zero_extend results in
> the redundant SEXT.W due to an inserted sign_extend for canonicalisation of
> the register value. It seemed to me that we can’t easily access promote_mode
> in simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the canonical
> representation for wider types is sign-extended. I believe that is the
> intention of that flag as it is used this way in the sign and zero extension
> logic in emit-rtl.c
>
> 2. Where should we fix the original bswap testcase 1?
>
> The redundant sign extension is not solved by the patch above and it seems we
> have found 2 distinct issues. We are now wondering if logic to handle
> redundant sign extension elimination for the word width logical operations
> (AND/OR/XOR) is missing in simplify_rtx or REE (Redundant Extension
> Elimination). 64-bit logical operations on DI mode registers with SI mode
> subregs should also have their sign_extend operations eliminated as there is
> no horizontal bit movement introduced by these operations. Upon looking at
> the RTL for testcase 1, we can see that there is a very similar pattern to
> the right shift case. We are seeking guidance on where to find the logic (or
> lack thereof) to eliminate sign_extend on DI mode registers containing SI
> mode subregs for full width logical ops. Should this logic be in
> simplify-rtx.c or ree.c?
I believe these patterns can have their sign extension eliminated on riscv but
REE is not touching them for some reason:
(sign_extend:DI (and:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI)))
(sign_extend:DI (or:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI)))
(sign_extend:DI (xor:DI (subreg:DI (reg:SI)) (subreg:DI (reg:SI)))
I’m now reading combine_reaching_defs in gcc/ree.c. We see Elimination
opportunities = 1 realized = 0 in REE, which means combine_reaching_defs
failed. I am presently assuming this is where the sign_extend should be
eliminated.
This is a MCVE for the logical op case:
https://cx.rv8.io/g/PCEvoa
testcase 4
$ cat rp2.c
unsigned rp2(unsigned x) { return -x & x; }
$ cat rp2.s
rp2(unsigned int):
subw a5,zero,a0
and a0,a0,a5
sext.w a0,a0
ret
$ cat rp2.c.229r.expand
;;
;; Full RTL generated for this function:
;;
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 74 [ x ])
(reg:DI 10 a0 [ x ])) "rp2.c":1 -1
(nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 76)
(minus:SI (const_int 0 [0])
(subreg/s/u:SI (reg/v:DI 74 [ x ]) 0))) "rp2.c":1 -1
(nil))
(insn 7 6 8 2 (set (reg:SI 77)
(subreg/s/u:SI (reg/v:DI 74 [ x ]) 0)) "rp2.c":1 -1
(nil))
(insn 8 7 9 2 (set (reg:DI 78)
(and:DI (subreg:DI (reg:SI 76) 0)
(subreg:DI (reg:SI 77) 0))) "rp2.c":1 -1
(nil))
(insn 9 8 10 2 (set (reg:DI 79)
(sign_extend:DI (subreg:SI (reg:DI 78) 0))) "rp2.c":1 -1
(nil))
(insn 10 9 14 2 (set (reg:DI 73 [ <retval> ])
(reg:DI 79)) "rp2.c":1 -1
(nil))
(insn 14 10 15 2 (set (reg/i:DI 10 a0)
(reg:DI 73 [ <retval> ])) "rp2.c":1 -1
(nil))
(insn 15 14 0 2 (use (reg/i:DI 10 a0)) "rp2.c":1 -1
(nil))
;; Function rp2 (rp2, funcdef_no=0, decl_uid=1411, cgraph_uid=0, symbol_order=0)
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
Trying to eliminate extension:
(insn 14 8 15 2 (set (reg/i:DI 10 a0)
(sign_extend:DI (reg:SI 10 a0 [78]))) "rp2.c":1 86 {extendsidi2}
(nil))
Elimination opportunities = 1 realized = 0
starting the processing of deferred insns
ending the processing of deferred insns
$ cat rp2.c.276r.ree
Dataflow summary:
;; invalidated by call 0 [zero] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10
[a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30
[t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39
[ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7]
60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11]
;; hardware regs used 2 [sp]
;; regular block artificial uses 2 [sp]
;; eh block artificial uses 2 [sp] 64 [arg]
;; entry block defs 1 [ra] 2 [sp] 7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3]
14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47
[fa5] 48 [fa6] 49 [fa7]
;; exit block uses 1 [ra] 2 [sp] 10 [a0]
;; regs ever live 10 [a0] 15 [a5]
;; ref usage r1={1d,1u} r2={1d,2u} r7={1d} r10={3d,5u} r11={1d} r12={1d}
r13={1d} r14={1d} r15={2d,1u} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d}
r45={1d} r46={1d} r47={1d} r48={1d} r49={1d}
;; total ref usage 31{22d,9u,0e} in 4{4 regular + 0 call} insns.
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 3 4 9 2 NOTE_INSN_FUNCTION_BEG)
(note 9 3 6 2 NOTE_INSN_DELETED)
(insn 6 9 8 2 (set (reg:SI 15 a5 [76])
(minus:SI (const_int 0 [0])
(reg:SI 10 a0 [orig:74 x ] [74]))) "rp2.c":1 10 {subsi3}
(nil))
(insn 8 6 14 2 (set (reg:DI 10 a0 [78])
(and:DI (reg/v:DI 10 a0 [orig:74 x ] [74])
(reg:DI 15 a5 [76]))) "rp2.c":1 70 {anddi3}
(nil))
(insn 14 8 15 2 (set (reg/i:DI 10 a0)
(sign_extend:DI (reg:SI 10 a0 [78]))) "rp2.c":1 86 {extendsidi2}
(nil))
(insn 15 14 17 2 (use (reg/i:DI 10 a0)) "rp2.c":1 -1
(nil))
(note 17 15 18 NOTE_INSN_DELETED)
(note 18 17 0 NOTE_INSN_DELETED)