> On 30 Aug 2017, at 9:14 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Wed, Aug 30, 2017 at 2:36 AM, Michael Clark <michaeljcl...@mac.com> wrote:
>> Dear GCC folk,
>> 
>> 
>> # Issue Background
>> 
>> We’re investigating an issue with redundant sign-extension instructions 
>> being emitted with the riscv backend. Firstly I would like to state that 
>> riscv is possibly a unique backend with respect to its canonical 
>> sign-extended register form due to the following:
>> 
>> - POINTERS_EXTEND_UNSIGNED is false, and the canonical form after 32-bit 
>> operations on RV64 is sign-extended not zero-extended.
>> 
>> - PROMOTE_MODE is defined on RV64 such that SI mode values are promoted to 
>> signed DI mode values holding SI mode subregs
>> 
>> - RISC-V does not have register aliases for these different modes, rather 
>> different instructions are selected to operate on different register widths.
>> 
>> - The 32-bit instructions sign-extend results. e.g. all shifts, add, sub, 
>> etc.
>> 
>> - Some instructions such as logical ops only have full word width variants 
>> (AND, OR, XOR) but these instructions maintain canonical form as there is no 
>> horizontal movement of bits.
>> 
>> 
>> # Issue Details
>> 
>> During porting from GCC 6.1 to GCC 7.1, the RISC-V port was changed to 
>> define TRULY_NOOP_TRUNCATION to 1 and PROMOTE_MODE was set up to promote 
>> values to wider modes. This was done to ensure ABI correctness so that 
>> registers holding narrower modes always contain canonical sign extended 
>> values.
>> 
>> The result of this is that sign_extend operations are inserted but later 
>> eliminated in ree/simplyfy_rtx. In testcase 3 the sign_extend is correctly 
>> eliminated, and indeed most 32-bit instructions are handled correctly. This 
>> is what the pattern looks like for a typical 32-bit instruction after expand:
>> 
>> ;;
>> ;; Full RTL generated for this function:
>> ;;
>> (note 1 0 4 NOTE_INSN_DELETED)
>> (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn 2 4 3 2 (set (reg/v:DI 73 [ rs1 ])
>>       (reg:DI 10 a0 [ rs1 ])) "lshift.c":1 -1
>>    (nil))
>> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 6 3 7 2 (set (reg:SI 74)
>>       (ashift:SI (subreg/s/u:SI (reg/v:DI 73 [ rs1 ]) 0)
>>           (const_int 24 [0x18]))) "lshift.c":1 -1
>>    (nil))
>> (insn 7 6 8 2 (set (reg:DI 75)
>>       (sign_extend:DI (reg:SI 74))) "lshift.c":1 -1
>>    (nil))
>> (insn 8 7 12 2 (set (reg:DI 72 [ <retval> ])
>>       (reg:DI 75)) "lshift.c":1 -1
>>    (nil))
>> (insn 12 8 13 2 (set (reg/i:DI 10 a0)
>>       (reg:DI 72 [ <retval> ])) "lshift.c":1 -1
>>    (nil))
>> (insn 13 12 0 2 (use (reg/i:DI 10 a0)) "lshift.c":1 -1
>>    (nil))
>> 
>> 
>> # Problem test cases
>> 
>> - testcase 1 - open coded bswap - redundant sign extension instructions
>> - testcase 2 - open coded right shift - redundant sign extension instructions
>> - testcase 3 - open coded left shift - control / correct behaviour
>> 
>> It appears that the downside to the PROMOTE_MODE transition is that several 
>> redundant sign-extension operations are cropping up under specific 
>> circumstances. One of the first small isolators is testcase 1 (below), which 
>> is an open-coded bswap. You’ll notice the SEXT.W emitted before the return. 
>> This was then further isolated to an even simpler testcase 2 which is a 
>> single right shift which also emits SEXT.W before the return. A 32-bit left 
>> shift or other 32-bit operations correctly have the redundant sign_extend 
>> eliminated.
>> 
>> We have isolated the code that prevented the right shift from having its 
>> redundant sign extension eliminated and it is target independent code in 
>> simplify-rtx. Code in simplify_unary_operation_1 assumes zero extension is 
>> more efficient, likely an assumption based on the fact that most platforms 
>> define POINTERS_EXTEND_UNSIGNED to true and naturally promote words to zero 
>> extended form vs sign extended form. The redundant sign extension appears to 
>> be due to an optimisation in simply_rtx that generates zero extend for right 
>> shifts where the value is non zero, based on the assumption that zero extend 
>> is” better”. This is not true on platforms that define 
>> POINTERS_EXTEND_UNSIGNED to false i.e. naturally promote subregister width 
>> operations to canonical sign-extended form internally.
>> 
>> 
>> # Partial resolution of test case 2
>> 
>> We have prepared a patch that disables the zero extend optimisation on 
>> platforms that define POINTERS_EXTEND_UNSIGNED. It fixes the issue in 
>> testcase 2. We believe this fix is very low risk because right shift for 
>> values above zero will always have a zero sign bit, thus either sign or zero 
>> extension can be used (this is in the comment), however sign-extension is 
>> “better” on RISC-V and likely all other platforms with 
>> POINTERS_EXTEND_UNSIGNED false. Platforms like x86, Aarch64 and most others 
>> define POINTERS_EXTEND_UNSIGNED to 1 so would not be affected by this change.
>> 
>> Please review this candidate patch to fix the codegen issue for testcase 2.
>> 
>> 
>> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
>> index ce632ae..25dd70f 100644
>> --- a/gcc/simplify-rtx.c
>> +++ b/gcc/simplify-rtx.c
>> @@ -1503,6 +1503,10 @@ simplify_unary_operation_1 (enum rtx_code code, 
>> machine_mode mode, rtx op)
>>      /* (sign_extend:M (lshiftrt:N <X> (const_int I))) is better as
>>         (zero_extend:M (lshiftrt:N <X> (const_int I))) if I is not 0.  */
>>      if (GET_CODE (op) == LSHIFTRT
>> +#if defined(POINTERS_EXTEND_UNSIGNED)
>> +      /* we skip this optimisation if pointers naturally extend signed */
>> +         && POINTERS_EXTEND_UNSIGNED
>> +#endif
>>         && CONST_INT_P (XEXP (op, 1))
>>         && XEXP (op, 1) != const0_rtx)
>>       return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
> 
> Is it just me or does this miss a || mode != Pmode || GET_MODE (op) != 
> ptr_mode
> check?  Note the comment says exactly the opposite as the transform...
> 
> I’m not even sure why this simplification is correct in the first place?!

I hope you are not confusing my use of POINTERS_EXTEND_UNSIGNED as a proxy for 
the property that defines whether sub width operations sign-extend to the full 
width of the register vs zero extend. Are you taking about our added comment?

The added comment is in fact correct with the proviso that we are using the 
property as a proxy for promotion of registers. I should probably say 
“registers” vs “pointers” in the comment. We are skipping the transform if 
POINTERS_EXTEND_UNSIGNED is defined and is false. RISC-V defines it to false as 
registers naturally EXTEND_SIGNED vs EXTEND_UNSIGNED.

My suspicion is that this code was an X86 optimisation hidden in target 
independent code. i.e instead of emitting MOVSX/MOVSXD (if the extend is not 
elided), emit nothing. However on RISC-V it is the other way around.

We though we could use promote_mode to check and check whether unsigned was 
false, but promote_mode requires a tree type and I wasn’t sure how to call it. 
It doesn’t seem to be called in very many places.

POINTERS_EXTEND_UNSIGNED false appears to be defined on targets that sign 
extend subregister widths.
POINTERS_EXTEND_UNSIGNED 1 appears to be defined on targets that zero extend 
subregister widths.
POINTERS_EXTEND_UNSIGNED -1 (which is true) is defined on some targets. I 
assume they sign-extend but the meaning has been overloaded.

We could #define SUBREGISTERS_EXTEND_UNSIGNED false if that makes it more 
clear, or figure out how to call promote_mode here as I mentioned earlier, 
however on reading set_reg_attrs_from_value in emit-rtl.c it seems 
POINTERS_EXTEND_UNSIGNED is a proxy for defining that sub registers zero-extend 
and is set to false if the target sign extends. MIPS for example also sets 
POINTERS_EXTEND_UNSIGNED to false. X86 sets it to 1.

>> However, after applying this patch, we noticed that while it resolved the 
>> redundant sign extension after the right shift in testcase 2, it has not 
>> resolved the redundant sign extension emitted in testcase 1.
>> 
>> 
>> # Questions
>> 
>> 1. Is the first patch an appropriate fix for testcase 2?
>> 
>> This particular patch seems harmless as zero or sign extending a right shift 
>> of a non-zero value will have the same result, however in RISC-V’s case, 
>> sign-extension can be eliminated at no cost and the zero_extend results in 
>> the redundant SEXT.W due to an inserted sign_extend for canonicalisation of 
>> the register value. It seemed to me that we can’t easily access promote_mode 
>> in simplify-rtx so we used POINTERS_EXTEND_UNSIGNED to indicate the 
>> canonical representation for wider types is sign-extended. I believe that is 
>> the intention of that flag as it is used this way in the sign and zero 
>> extension logic in emit-rtl.c
>> 
>> 2. Where should we fix the original bswap testcase 1?
>> 
>> The redundant sign extension is not solved by the patch above and it seems 
>> we have found 2 distinct issues. We are now wondering if logic to handle 
>> redundant sign extension elimination for the word width logical operations 
>> (AND/OR/XOR) is missing in simplify_rtx or REE (Redundant Extension 
>> Elimination). 64-bit logical operations on DI mode registers with SI mode 
>> subregs should also have their sign_extend operations eliminated as there is 
>> no horizontal bit movement introduced by these operations. Upon looking at 
>> the RTL for testcase 1, we can see that there is a very similar pattern to 
>> the right shift case. We are seeking guidance on where to find the logic (or 
>> lack thereof) to eliminate sign_extend on DI mode registers containing SI 
>> mode subregs for full width logical ops. Should this logic be in 
>> simplify-rtx.c or ree.c?
>> 
>> Regards,
>> Michael.
>> 
>> P.S. if the patch is acceptable, I can post to gcc-patches and/or raise bugs 
>> in GCC bugzilla… I guess we’re initially seeking guidance on the issue…
>> 
>> 
>> testcase 1:
>> 
>> $ cat bswap.c
>> unsigned bswap(unsigned p) {
>>      return ((p >> 24) & 0x000000ff) | ((p << 8 ) & 0x00ff0000) |
>>       ((p >> 8 ) & 0x0000ff00) | ((p << 24) & 0xff000000);
>> }
>> $ cat bswap.s
>> bswap2:
>>        sllw    a3,a0,24
>>        srlw    a5,a0,24
>>        sllw    a4,a0,8
>>        or      a5,a5,a3
>>        li      a3,16711680
>>        and     a4,a4,a3
>>        or      a5,a5,a4
>>        li      a4,65536
>>        add     a4,a4,-256
>>        srlw    a0,a0,8
>>        and     a0,a0,a4
>>        or      a0,a5,a0
>>        sext.w  a0,a0   # redundant
>>        ret
>> 
>> 
>> testcase 2:
>> 
>> $ cat rshift.c
>> unsigned rs24(unsigned rs1) { return rs1 >> 24; }
>> $ cat rshift.s
>> rs24:
>>        sllw    a0,a0,24
>>        sext.w  a0,a0   # redundant
>>        ret
>> 
>> 
>> testcase 3:
>> 
>> $ cat lshift.c
>> unsigned ls24(unsigned rs1) { return rs1 << 24; }
>> $ cat lshift.s
>> ls24:
>>        sllw    a0,a0,24
>>        ret

Reply via email to