On Wed, Aug 30, 2017 at 2:36 AM, Michael Clark <michaeljcl...@mac.com> wrote: > Dear GCC folk, > > > # Issue Background > > We’re investigating an issue with redundant sign-extension instructions being > emitted with the riscv backend. Firstly I would like to state that riscv is > possibly a unique backend with respect to its canonical sign-extended > register form due to the following: > > - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit > operations on RV64 is sign-extended not zero-extended. > > - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to > signed DI mode values holding SI mode subregs > > - RISC-V does not have register aliases for these different modes, rather > different instructions are selected to operate on different register widths. > > - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, etc. > > - Some instructions such as logical ops only have full word width variants > (AND, OR, XOR) but these instructions maintain canonical form as there is no > horizontal movement of bits. > > > # Issue Details > > During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to define > TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote values to > wider modes. This was done to ensure ABI correctness so that registers > holding narrower modes always contain canonical sign extended values. > > The result of this is that sign_extend operations are inserted but later > eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly > eliminated, and indeed most 32-bit instructions are handled correctly. This > is what the pattern looks like for a typical 32-bit instruction after expand: > > ;; > ;; Full RTL generated for this function: > ;; > (note 1 0 4 NOTE_INSN_DELETED) > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ]) > (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1 > (nil)) > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (set (reg:SI 74) > (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0) > (const_int 24 [0x18]))) "lshift.c":1 -1 > (nil)) > (insn 7 6 8 2 (set (reg:DI 75) > (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1 > (nil)) > (insn 8 7 12 2 (set (reg:DI 72 [ <retval> ]) > (reg:DI 75)) "lshift.c":1 -1 > (nil)) > (insn 12 8 13 2 (set (reg/i:DI 10 a0) > (reg:DI 72 [ <retval> ])) "lshift.c":1 -1 > (nil)) > (insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1 > (nil)) > > > # Problem test cases > > - testcase 1 - open coded bswap - redundant sign extension instructions > - testcase 2 - open coded right shift - redundant sign extension instructions > - testcase 3 - open coded left shift - control / correct behaviour > > It appears that the downside to the PROMOTE_MODE transition is that several > redundant sign-extension operations are cropping up under specific > circumstances. One of the first small isolators is testcase 1 (below), which > is an open-coded bswap. You’ll notice the SEXT.W emitted before the return. > This was then further isolated to an even simpler testcase 2 which is a > single right shift which also emits SEXT.W before the return. A 32-bit left > shift or other 32-bit operations correctly have the redundant sign_extend > eliminated. > > We have isolated the code that prevented the right shift from having its > redundant sign extension eliminated and it is target independent code in > simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is > more efficient, likely an assumption based on the fact that most platforms > define POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero > extended form vs sign extended form. The redundant sign extension appears to > be due to an optimisation in simply_rtx that generates zero extend for right > shifts where the value is non zero, based on the assumption that zero extend > is” better”. This is not true on platforms that define > POINTERS_EXTEND_UNSIGNED to false i.e. naturally promote subregister width > operations to canonical sign-extended form internally. > > > # Partial resolution of test case 2 > > We have prepared a patch that disables the zero extend optimisation on > platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in > testcase 2. We believe this fix is very low risk because right shift for > values above zero will always have a zero sign bit, thus either sign or zero > extension can be used (this is in the comment), however sign-extension is > “better” on RISC-V and likely all other platforms with > POINTERS_EXTEND_UNSIGNED false. Platforms like x86, Aarch64 and most others > define POINTERS_EXTEND_UNSIGNED to 1 so would not be affected by this change. > > Please review this candidate patch to fix the codegen issue for testcase 2. > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index ce632ae..25dd70f 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, > machine_mode mode, rtx op) > /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as > (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0. */ > if (GET_CODE (op) == LSHIFTRT > +#if defined(POINTERS_EXTEND_UNSIGNED) > + /* we skip this optimisation if pointers naturally extend signed */ > + && POINTERS_EXTEND_UNSIGNED > +#endif > && CONST_INT_P (XEXP (op, 1)) > && XEXP (op, 1) != const0_rtx) > return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
Is it just me or does this miss a || mode != Pmode || GET_MODE (op) != ptr_mode check? Note the comment says exactly the opposite as the transform... I'm not even sure why this simplification is correct in the first place?! > > However, after applying this patch, we noticed that while it resolved the > redundant sign extension after the right shift in testcase 2, it has not > resolved the redundant sign extension emitted in testcase 1. > > > # Questions > > 1. Is the first patch an appropriate fix for testcase 2? > > This particular patch seems harmless as zero or sign extending a right shift > of a non-zero value will have the same result, however in RISC-V’s case, > sign-extension can be eliminated at no cost and the zero_extend results in > the redundant SEXT.W due to an inserted sign_extend for canonicalisation of > the register value. It seemed to me that we can’t easily access promote_mode > in simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the canonical > representation for wider types is sign-extended. I believe that is the > intention of that flag as it is used this way in the sign and zero extension > logic in emit-rtl.c > > 2. Where should we fix the original bswap testcase 1? > > The redundant sign extension is not solved by the patch above and it seems we > have found 2 distinct issues. We are now wondering if logic to handle > redundant sign extension elimination for the word width logical operations > (AND/OR/XOR) is missing in simplify_rtx or REE (Redundant Extension > Elimination). 64-bit logical operations on DI mode registers with SI mode > subregs should also have their sign_extend operations eliminated as there is > no horizontal bit movement introduced by these operations. Upon looking at > the RTL for testcase 1, we can see that there is a very similar pattern to > the right shift case. We are seeking guidance on where to find the logic (or > lack thereof) to eliminate sign_extend on DI mode registers containing SI > mode subregs for full width logical ops. Should this logic be in > simplify-rtx.c or ree.c? > > Regards, > Michael. > > P.S. if the patch is acceptable, I can post to gcc-patches and/or raise bugs > in GCC bugzilla… I guess we’re initially seeking guidance on the issue… > > > testcase 1: > > $ cat bswap.c > unsigned bswap(unsigned p) { > return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) | > ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000); > } > $ cat bswap.s > bswap2: > sllw a3,a0,24 > srlw a5,a0,24 > sllw a4,a0,8 > or a5,a5,a3 > li a3,16711680 > and a4,a4,a3 > or a5,a5,a4 > li a4,65536 > add a4,a4,-256 > srlw a0,a0,8 > and a0,a0,a4 > or a0,a5,a0 > sext.w a0,a0 # redundant > ret > > > testcase 2: > > $ cat rshift.c > unsigned rs24(unsigned rs1) { return rs1 >> 24; } > $ cat rshift.s > rs24: > sllw a0,a0,24 > sext.w a0,a0 # redundant > ret > > > testcase 3: > > $ cat lshift.c > unsigned ls24(unsigned rs1) { return rs1 << 24; } > $ cat lshift.s > ls24: > sllw a0,a0,24 > ret