On Fri, Jun 30, 2023 at 5:36 PM Palmer Dabbelt <pal...@rivosinc.com> wrote:
>
> On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> > On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vine...@rivosinc.com> wrote:
> >>
> >>
> >>
> >> On 6/30/23 16:50, Andrew Waterman wrote:
> >> > I don't believe this is correct; the subtraction is needed to account
> >> > for the fact that the low part might be negative, resulting in a
> >> > borrow from the high part.  See the output for your test case below:
> >> >
> >> > $ cat test.c
> >> > #include <stdio.h>
> >> >
> >> > int main()
> >> > {
> >> >    unsigned long result, tmp;
> >> >
> >> > asm (
> >> >    "li      %1,-252645376\n"
> >> >    "addi    %1,%1,240\n"
> >> >    "slli    %0,%1,32\n"
> >> >    "add     %0,%0,%1"
> >> >      : "=r" (result), "=r" (tmp));
> >> >
> >> >    printf("%lx\n", result);
> >> >
> >> >    return 0;
> >> > }
> >> > $ riscv64-unknown-elf-gcc -O2 test.c
> >> > $ spike pk a.out
> >> > bbl loader
> >> > f0f0f0eff0f0f0f0
> >> > $
> >>
> >> Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
> >> So is it possible to have a better code seq for the testcase at all ?
> >
> > You're welcome!
> >
> > When Zba is implemented, then inserting a zext.w would do the trick;
> > see below.  (The generalization is that the zext.w is needed if the
> > 32-bit constant is negative.)  When Zba is not implemented, I think
> > the original sequence is optimal.
> >
> > li      a5, -252645376
> > addi    a5, a5, 240
> > slli    a0, a5, 32
> > zext.w  a5, a5
> > add     a0, a0, a5
>
> For the non-Zba case, I think we can leverage the two high parts
> starting out the same to save an instruction generating the constant.
> So for the original code sequence of
>
>         li      a5,-252645376
>         addi    a5,a5,241
>         li      a0,-252645376
>         slli    a5,a5,32
>         addi    a0,a0,240
>         add     a0,a5,a0
>         ret
>
> we could instead generate
>
>         li      a5,-252645376
>         addi    a0,a5,240
>         addi    a5,a5,241
>         slli    a5,a5,32
>         add     a0,a5,a0
>         ret
>
> which IIUC produces the same result.  I think something along the lines
> of this (with the corresponding cost function updates) would do it
>
>     diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>     index de578b5b899..32b6033a966 100644
>     --- a/gcc/config/riscv/riscv.cc
>     +++ b/gcc/config/riscv/riscv.cc
>     @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode 
> mode)
>        rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>
>        riscv_move_integer (hi, hi, hival, mode);
>     -  riscv_move_integer (lo, lo, loval, mode);
>     +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost 
> (loval)) {
>     +    rtx delta = gen_reg_rrtx (mode);
>     +    riscv_move_integer (delta, delta, loval - hival, mode);
>     +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
>     +  } else {
>     +    riscv_move_integer (lo, lo, loval, mode);
>     +  }
>
>        hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
>        hi = force_reg (mode, hi);
>
> though I suppose that would produce a slightly different sequence that has the
> same number of instructions but a slightly longer dependency chain, something
> more like
>
>         li      a5,-252645376
>         addi    a5,a5,241
>         addi    a0,a5,-1
>         slli    a5,a5,32
>         add     a0,a5,a0
>         ret
>
> Take that all with a grain of salt, though, as I just ate some very spicy
> chicken and can barely see straight :)

Yeah, that might end up being a false economy for superscalars.

In general, I wouldn't recommend spending too many cleverness beans on
non-Zba+Zbb implementations.  Going forward, we should expect that
even very simple cores provide those extensions.

>
>
> >
> >
> >>
> >> -Vineet
> >>
> >> >
> >> >
> >> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vine...@rivosinc.com> 
> >> > wrote:
> >> >>
> >> >>
> >> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >> >>> Ran into a minor snafu in const splitting code when playing with test
> >> >>> case from an old PR/23813.
> >> >>>
> >> >>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >> >>>
> >> >>> This currently generates
> >> >>>
> >> >>>        li      a5,-252645376
> >> >>>        addi    a5,a5,241
> >> >>>        li      a0,-252645376
> >> >>>        slli    a5,a5,32
> >> >>>        addi    a0,a0,240
> >> >>>        add     a0,a5,a0
> >> >>>        ret
> >> >>>
> >> >>> The signed math in hival extraction introduces an additional bit,
> >> >>> causing loval == hival check to fail.
> >> >>>
> >> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
> >> >>> ../gcc/config/riscv/riscv.cc:702
> >> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> | (gdb)n
> >> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 
> >> >>> 32);
> >> >>> | (gdb)
> >> >> FWIW (and I missed adding this observation to the changelog) I pondered
> >> >> about using unsigned loval/hival with zext_hwi() but that in certain
> >> >> cases can cause additional insns
> >> >>
> >> >> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
> >> >> 0xFFFFFFFF_80000000
> >> >>
> >> >>
> >> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >> >>> | (gdb) p/x val
> >> >>> | $2 = 0xf0f0f0f0f0f0f0f0
> >> >>> | (gdb) p/x loval
> >> >>> | $3 = 0xfffffffff0f0f0f0
> >> >>> | (gdb) p/x hival
> >> >>> | $4 = 0xfffffffff0f0f0f1
> >> >>>                          ^^^
> >> >>> Fix that by eliding the subtraction in shift.
> >> >>>
> >> >>> With patch:
> >> >>>
> >> >>>        li      a5,-252645376
> >> >>>        addi    a5,a5,240
> >> >>>        slli    a0,a5,32
> >> >>>        add     a0,a0,a5
> >> >>>        ret
> >> >>>
> >> >>> gcc/ChangeLog:
> >> >>>
> >> >>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
> >> >>>          do elide subtraction of loval.
> >> >>>        * (riscv_split_integer_cost): Ditto.
> >> >>>        * (riscv_build_integer): Ditto
> >> >>>
> >> >>> Signed-off-by: Vineet Gupta <vine...@rivosinc.com>
> >> >>> ---
> >> >>> I wasn't planning to do any more work on large const stuff, but just 
> >> >>> ran into it this
> >> >>> on a random BZ entry when trying search for redundant constant stuff.
> >> >>> The test seemed too good to pass :-)
> >> >>> ---
> >> >>>    gcc/config/riscv/riscv.cc | 6 +++---
> >> >>>    1 file changed, 3 insertions(+), 3 deletions(-)
> >> >>>
> >> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >> >>> index 5ac187c1b1b4..377d3aac794b 100644
> >> >>> --- a/gcc/config/riscv/riscv.cc
> >> >>> +++ b/gcc/config/riscv/riscv.cc
> >> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op 
> >> >>> *codes, HOST_WIDE_INT value,
> >> >>>          && (value > INT32_MAX || value < INT32_MIN))
> >> >>>        {
> >> >>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> >> >>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 
> >> >>> 32);
> >> >>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> >> >>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> >> >>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> >> >>>          int hi_cost, lo_cost;
> >> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >> >>>    {
> >> >>>      int cost;
> >> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >> >>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >> >>>
> >> >>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> >> >>> @@ -700,7 +700,7 @@ static rtx
> >> >>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >> >>>    {
> >> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> >> >>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >> >>>
> >> >>>      riscv_move_integer (lo, lo, loval, mode);
> >>

Reply via email to