https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116548
Bug ID: 116548
Summary: [avr] ivopts Introducing expensive loop condition
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gjl at gcc dot gnu.org
Target Milestone: ---

uint8_t add1 (const uint8_t *bb, uint8_t nn)
{
    uint8_t sum = 0;
    do
    {
        sum += *bb++;
    } while (--nn); // Why not just 2 instructions: decrement + branch
    return sum;
}

$ avr-gcc -mmcu=avr4 -S -Os -dp

produces an expensive and overly complicated loop condition:

add1:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
    mov r20,r24      ;  47  [c=4 l=1]  movqi_insn/0
    mov r18,r24      ;  48  [c=4 l=1]  movqi_insn/0
    mov r19,r25      ;  49  [c=4 l=1]  movqi_insn/0
    ldi r24,0        ;  50  [c=4 l=1]  movqi_insn/0
.L5:
    movw r30,r18     ;  38  [c=4 l=1]  *movhi/0
    subi r18,-1      ;  39  [c=4 l=2]  *addhi3_clobber/1
    sbci r19,-1
    ld r25,Z         ;  40  [c=4 l=1]  movqi_insn/3
    add r24,r25      ;  41  [c=4 l=1]  *addqi3/0
    mov r25,r22      ;  42  [c=4 l=1]  movqi_insn/0
    sub r25,r18      ;  43  [c=4 l=1]  *subqi3/0
    add r25,r20      ;  55  [c=4 l=1]  *op8.for.cczn.plus/1
    brne .L5         ;  56  [c=4 l=1]  branch_ZN
/* epilogue start */
    ret              ;  53  [c=0 l=1]  return

In the loop, R18 and R30 (Z) hold the current address. The loop condition is:

Insn 42 = Move nn to R25.
Insn 43 = Subtract (low byte of) the current address from R25.
Insn 55 = Add (low byte of) the initial address to R25.
Insn 56 = Branch if the result != 0.

These are 4 instructions, and the register pressure is: a reg that holds nn, a reg that holds the current address, a reg that holds the start address (R20), and a reg to compute the condition.

Instead, the code could just DECrement nn (passed in R22) and branch on != 0, which means less code, fewer cycles and less register pressure, even in the case when the start address and nn are needed after the loop.
With -fno-ivopts, the code is:

add1:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
    movw r18,r24     ;  51  [c=4 l=1]  *movhi/0
    ldi r24,0        ;  44  [c=4 l=1]  movqi_insn/0
.L5:
    movw r30,r18     ;  35  [c=4 l=1]  *movhi/0
    ld r25,Z         ;  36  [c=4 l=1]  movqi_insn/3
    subi r18,-1      ;  37  [c=4 l=2]  *addhi3_clobber/1
    sbci r19,-1
    add r24,r25      ;  38  [c=4 l=1]  *addqi3/0
    subi r22,lo8(1)  ;  49  [c=4 l=1]  *op8.for.cczn.plus/0
    brne .L5         ;  50  [c=4 l=1]  branch_ZN
/* epilogue start */
    ret              ;  47  [c=0 l=1]  return

This uses nn in R22 with a decrement (insn 49) and a branch (insn 50). It seems the ivopts cost model is off.

Target: avr
Configured with: ../../source/gcc-master/configure --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --disable-shared --with-long-double=64 --enable-languages=c,c++
gcc version 15.0.0 20240829 (experimental) (GCC)