https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116548

            Bug ID: 116548
           Summary: [avr] ivopts Introducing expensive loop condition
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gjl at gcc dot gnu.org
  Target Milestone: ---

#include <stdint.h>

uint8_t add1 (const uint8_t *bb, uint8_t nn)
{
    uint8_t sum = 0;
    do
    {
        sum += *bb++;
    } while (--nn);    // Why not just 2 instructions: decrement + branch
    return sum;
}

$ avr-gcc -mmcu=avr4 -S -Os -dp

The generated code has an expensive and overly complicated loop condition:

add1:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
        mov r20,r24      ;  47  [c=4 l=1]  movqi_insn/0
        mov r18,r24      ;  48  [c=4 l=1]  movqi_insn/0
        mov r19,r25      ;  49  [c=4 l=1]  movqi_insn/0
        ldi r24,0                ;  50  [c=4 l=1]  movqi_insn/0
.L5:
        movw r30,r18     ;  38  [c=4 l=1]  *movhi/0
        subi r18,-1      ;  39  [c=4 l=2]  *addhi3_clobber/1
        sbci r19,-1
        ld r25,Z                 ;  40  [c=4 l=1]  movqi_insn/3
        add r24,r25      ;  41  [c=4 l=1]  *addqi3/0
        mov r25,r22      ;  42  [c=4 l=1]  movqi_insn/0
        sub r25,r18      ;  43  [c=4 l=1]  *subqi3/0
        add r25,r20      ;  55  [c=4 l=1]  *op8.for.cczn.plus/1
        brne .L5                 ;  56  [c=4 l=1]  branch_ZN
/* epilogue start */
        ret              ;  53  [c=0 l=1]  return

In the loop, R19:R18 and Z (R31:R30) both hold the current address.

The loop condition is:
Insn 42 = Move nn to R25.
Insn 43 = Subtract (low byte of) current address from R25.
Insn 55 = Add (low byte of) initial address to R25.
Insn 56 = Branch if result != 0.

These are 4 instructions, and the register pressure is high: one register that
holds nn, one that holds the current address, one that holds the start address,
and one to compute the condition.
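A rough C model of the loop in the first listing may help. This is a hedged, host-side sketch rather than GCC's actual intermediate code: `add1_ivopts_model`, `check_equivalence` and `start_lo` are hypothetical names, and taking the low address byte through `uintptr_t` is an assumption standing in for the 8-bit register arithmetic. It shows that nn is never decremented; the exit test is rebuilt from the pointer on every iteration, yet it yields the same result as the original.

```c
#include <assert.h>
#include <stdint.h>

/* Original function from the report: the exit test decrements nn. */
uint8_t add1(const uint8_t *bb, uint8_t nn)
{
    uint8_t sum = 0;
    do
    {
        sum += *bb++;
    } while (--nn);
    return sum;
}

/* Hypothetical model of the ivopts output: nn stays constant and the
   exit test is derived from the low bytes of the current and start
   addresses (insns 42, 43 and 55 in the first listing). */
uint8_t add1_ivopts_model(const uint8_t *bb, uint8_t nn)
{
    const uint8_t start_lo = (uint8_t)(uintptr_t)bb; /* like R20 */
    const uint8_t *cur = bb;                          /* like R19:R18 */
    uint8_t sum = 0;                                  /* like R24 */
    uint8_t cond;                                     /* like R25 */
    do
    {
        sum += *cur++;                                /* insns 38-41 */
        cond = nn;                                    /* insn 42 */
        cond -= (uint8_t)(uintptr_t)cur;              /* insn 43 */
        cond += start_lo;                             /* insn 55 */
    } while (cond != 0);                              /* insn 56 */
    return sum;
}

/* Usage: both variants must agree for every nn. buf must hold at least
   256 bytes because nn == 0 runs 256 iterations. */
void check_equivalence(const uint8_t *buf)
{
    for (unsigned n = 0; n < 256; n++)
        assert(add1(buf, (uint8_t)n) == add1_ivopts_model(buf, (uint8_t)n));
}
```

So the transformation is valid; the problem is only that it is more expensive than the decrement it replaces.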

Instead, the code could just decrement (DEC) nn in R22 and branch on != 0.
That means less code, fewer cycles and lower register pressure, even in the
case where the start address and nn are still needed after the loop.
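That a plain decrement loses nothing can be checked exhaustively. The sketch below (helper names are hypothetical) models only the trip count: after k iterations the ivopts test computes nn - lo8(start + k) + lo8(start), which is nn - k (mod 256), so it reaches zero exactly when the decremented counter would. The check runs over every start-address low byte and every nn, including nn == 0, which gives 256 iterations in both forms.

```c
#include <assert.h>
#include <stdint.h>

/* Trip count of the source loop: do { ... } while (--nn); */
unsigned trips_decrement(uint8_t nn)
{
    unsigned k = 0;
    do
        k++;
    while (--nn);
    return k;
}

/* Trip count of the ivopts exit test, using only 8-bit arithmetic:
   cond = nn - lo8(current address) + lo8(start address). */
unsigned trips_ivopts(uint8_t start_lo, uint8_t nn)
{
    unsigned k = 0;
    uint8_t cur_lo = start_lo;
    uint8_t cond;
    do
    {
        k++;
        cur_lo++;                                  /* pointer increment, low byte */
        cond = (uint8_t)(nn - cur_lo + start_lo);  /* insns 42, 43, 55 */
    } while (cond != 0);
    return k;
}

/* Exhaustive check over all start_lo and nn (256 * 256 cases). */
void check_trip_counts(void)
{
    for (unsigned s = 0; s < 256; s++)
        for (unsigned n = 0; n < 256; n++)
            assert(trips_ivopts((uint8_t)s, (uint8_t)n)
                   == trips_decrement((uint8_t)n));
}
```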

With -fno-ivopts, the code is:

add1:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
        movw r18,r24     ;  51  [c=4 l=1]  *movhi/0
        ldi r24,0                ;  44  [c=4 l=1]  movqi_insn/0
.L5:
        movw r30,r18     ;  35  [c=4 l=1]  *movhi/0
        ld r25,Z                 ;  36  [c=4 l=1]  movqi_insn/3
        subi r18,-1      ;  37  [c=4 l=2]  *addhi3_clobber/1
        sbci r19,-1
        add r24,r25      ;  38  [c=4 l=1]  *addqi3/0
        subi r22,lo8(1)  ;  49  [c=4 l=1]  *op8.for.cczn.plus/0
        brne .L5                 ;  50  [c=4 l=1]  branch_ZN
/* epilogue start */
        ret              ;  47  [c=0 l=1]  return

This version keeps nn in R22 and implements the loop condition as a decrement
(insn 49) plus a branch (insn 50).
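Until the cost model is fixed, the -fno-ivopts code can presumably be requested for a single function through GCC's `optimize` attribute, which takes -f options without the -f prefix. This is an untested sketch: `add1_noivopts` and `usage_example` are hypothetical names, and the GCC manual recommends the `optimize` attribute for debugging only, so passing -fno-ivopts on the command line may be the safer choice.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as add1, but compiled as if with -fno-ivopts.
   GCC-specific; other compilers may warn and ignore the attribute. */
__attribute__((optimize("no-ivopts")))
uint8_t add1_noivopts(const uint8_t *bb, uint8_t nn)
{
    uint8_t sum = 0;
    do
    {
        sum += *bb++;
    } while (--nn);
    return sum;
}

/* Usage: the result is an ordinary 8-bit sum. */
void usage_example(void)
{
    static const uint8_t data[] = { 10, 20, 30 };
    assert(add1_noivopts(data, 3) == 60);
}
```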

It seems the ivopts cost model is off for this case.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls
--with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared
--with-long-double=64 --enable-languages=c,c++
gcc version 15.0.0 20240829 (experimental) (GCC)
