https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

            Bug ID: 105733
           Summary: riscv: Poor codegen for large stack frames
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jrtc27 at jrtc27 dot com
  Target Milestone: ---
            Target: riscv*-*-*

For the following test:

#define BUF_SIZE 2064

void
foo(unsigned long i)
{
    volatile char buf[BUF_SIZE];

    buf[i] = 0;
}

GCC currently generates:

foo:
        li      t0,-4096
        addi    t0,t0,2016
        li      a4,4096
        add     sp,sp,t0
        li      a5,-4096
        addi    a4,a4,-2032
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        add     a0,a5,a0
        li      t0,4096
        sd      a5,8(sp)
        sb      zero,2032(a0)
        addi    t0,t0,-2016
        add     sp,sp,t0
        jr      ra

whereas Clang generates the much shorter:

foo:
        lui     a1, 1
        addiw   a1, a1, -2016
        sub     sp, sp, a1
        addi    a1, sp, 16
        add     a0, a0, a1
        sb      zero, 0(a0)
        lui     a0, 1
        addiw   a0, a0, -2016
        add     sp, sp, a0
        ret

In particular, the sequence:

        li      a4,4096
        ...
        li      a5,-4096
        addi    a4,a4,-2032
        add     a4,a4,a5

is rather surprising to see: it computes 4096 - 2032 - 4096 = -2032, which a
single li a4,-2032 would produce. Constant-folding that alone would halve the
instruction-count difference between GCC and Clang.

See: https://godbolt.org/z/8EGc85dsf

Reply via email to