Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

Palmer Dabbelt Fri, 30 Jun 2023 17:36:57 -0700

On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:

On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vine...@rivosinc.com> wrote:




On 6/30/23 16:50, Andrew Waterman wrote:
> I don't believe this is correct; the subtraction is needed to account
> for the fact that the low part might be negative, resulting in a
> borrow from the high part.  See the output for your test case below:
>
> $ cat test.c
> #include <stdio.h>
>
> int main()
> {
>    unsigned long result, tmp;
>
> asm (
>    "li      %1,-252645376\n"
>    "addi    %1,%1,240\n"
>    "slli    %0,%1,32\n"
>    "add     %0,%0,%1"
>      : "=r" (result), "=r" (tmp));
>
>    printf("%lx\n", result);
>
>    return 0;
> }
> $ riscv64-unknown-elf-gcc -O2 test.c
> $ spike pk a.out
> bbl loader
> f0f0f0eff0f0f0f0
> $

Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
So is it possible to have a better code seq for the testcase at all ?


You're welcome!

When Zba is implemented, then inserting a zext.w would do the trick;
see below.  (The generalization is that the zext.w is needed if the
32-bit constant is negative.)  When Zba is not implemented, I think
the original sequence is optimal.

li      a5, -252645376
addi    a5, a5, 240
slli    a0, a5, 32
zext.w  a5, a5
add     a0, a0, a5

For the non-Zba case, I think we can leverage the two high partsstarting out the same to save an instruction generating the constant.So for the original code sequence of

       li      a5,-252645376
       addi    a5,a5,241
       li      a0,-252645376
       slli    a5,a5,32
       addi    a0,a0,240
       add     a0,a5,a0
       ret

we could instead generate

       li      a5,-252645376
       addi    a0,a5,240
       addi    a5,a5,241
       slli    a5,a5,32
       add     a0,a5,a0
       ret

which IIUC produces the same result. I think something along the linesof this (with the corresponding cost function updates) would do it


   diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
   index de578b5b899..32b6033a966 100644
   --- a/gcc/config/riscv/riscv.cc
   +++ b/gcc/config/riscv/riscv.cc
   @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode 
mode)
      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);

riscv_move_integer (hi, hi, hival, mode);

   -  riscv_move_integer (lo, lo, loval, mode);
   +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
   +    rtx delta = gen_reg_rrtx (mode);
   +    riscv_move_integer (delta, delta, loval - hival, mode);
   +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
   +  } else {
   +    riscv_move_integer (lo, lo, loval, mode);
   +  }

hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));

      hi = force_reg (mode, hi);

though I suppose that would produce a slightly different sequence that has the
same number of instructions but a slightly longer dependency chain, something
more like

       li      a5,-252645376
       addi    a5,a5,241
       addi    a0,a5,-1
       slli    a5,a5,32
       add     a0,a5,a0
       ret

Take that all with a grain of salt, though, as I just ate some very spicy
chicken and can barely see straight :)


-Vineet

>
>
> On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vine...@rivosinc.com> wrote:
>>
>>
>> On 6/30/23 16:33, Vineet Gupta wrote:
>>> Ran into a minor snafu in const splitting code when playing with test
>>> case from an old PR/23813.
>>>
>>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>>>
>>> This currently generates
>>>
>>>        li      a5,-252645376
>>>        addi    a5,a5,241
>>>        li      a0,-252645376
>>>        slli    a5,a5,32
>>>        addi    a0,a0,240
>>>        add     a0,a5,a0
>>>        ret
>>>
>>> The signed math in hival extraction introduces an additional bit,
>>> causing loval == hival check to fail.
>>>
>>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
../gcc/config/riscv/riscv.cc:702
>>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> | (gdb)n
>>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> | (gdb)
>> FWIW (and I missed adding this observation to the changelog) I pondered
>> about using unsigned loval/hival with zext_hwi() but that in certain
>> cases can cause additional insns
>>
>> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
>> 0xFFFFFFFF_80000000
>>
>>
>>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>>> | (gdb) p/x val
>>> | $2 = 0xf0f0f0f0f0f0f0f0
>>> | (gdb) p/x loval
>>> | $3 = 0xfffffffff0f0f0f0
>>> | (gdb) p/x hival
>>> | $4 = 0xfffffffff0f0f0f1
>>>                          ^^^
>>> Fix that by eliding the subtraction in shift.
>>>
>>> With patch:
>>>
>>>        li      a5,-252645376
>>>        addi    a5,a5,240
>>>        slli    a0,a5,32
>>>        add     a0,a0,a5
>>>        ret
>>>
>>> gcc/ChangeLog:
>>>
>>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
>>>          do elide subtraction of loval.
>>>        * (riscv_split_integer_cost): Ditto.
>>>        * (riscv_build_integer): Ditto
>>>
>>> Signed-off-by: Vineet Gupta <vine...@rivosinc.com>
>>> ---
>>> I wasn't planning to do any more work on large const stuff, but just ran 
into it this
>>> on a random BZ entry when trying search for redundant constant stuff.
>>> The test seemed too good to pass :-)
>>> ---
>>>    gcc/config/riscv/riscv.cc | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>> index 5ac187c1b1b4..377d3aac794b 100644
>>> --- a/gcc/config/riscv/riscv.cc
>>> +++ b/gcc/config/riscv/riscv.cc
>>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
>>>          && (value > INT32_MAX || value < INT32_MIN))
>>>        {
>>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
>>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
>>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
>>>          int hi_cost, lo_cost;
>>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
>>>    {
>>>      int cost;
>>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
>>>
>>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
>>> @@ -700,7 +700,7 @@ static rtx
>>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>>>    {
>>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>>>
>>>      riscv_move_integer (lo, lo, loval, mode);

Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

Reply via email to