пт, 30 авг. 2024 г. в 16:09, Georg-Johann Lay <a...@gjlay.de>: > > There are cases, where opportunities to use POST_INC addressing > only occur very late in the compilation process. Take for example > the following function from AVR-LibC's qsort: > > void swapfunc (char *a, char *b, int n) > { > do > { > char t = *a; > *a++ = *b; > *b++ = t; > } while (--n > 0); > } > > > which -mmcu=avrtiny -S -Os -dp compiles to: > > swapfunc: > push r28 ; 72 [c=4 l=1] pushqi1/0 > push r29 ; 73 [c=4 l=1] pushqi1/0 > /* prologue: function */ > /* frame size = 0 */ > /* stack size = 2 */ > mov r26,r24 ; 66 [c=4 l=1] movqi_insn/0 > mov r27,r25 ; 67 [c=4 l=1] movqi_insn/0 > mov r30,r22 ; 68 [c=4 l=1] movqi_insn/0 > mov r31,r23 ; 69 [c=4 l=1] movqi_insn/0 > mov r22,r20 ; 70 [c=4 l=1] movqi_insn/0 > mov r23,r21 ; 71 [c=4 l=1] movqi_insn/0 > .L2: > ld r20,X ; 55 [c=4 l=1] movqi_insn/3 > ld r21,Z ; 57 [c=4 l=1] movqi_insn/3 > st X,r21 ; 58 [c=4 l=1] movqi_insn/2 > subi r26,-1 ; 59 [c=4 l=2] *addhi3_clobber/0 > sbci r27,-1 > st Z,r20 ; 61 [c=4 l=1] movqi_insn/2 > subi r30,-1 ; 62 [c=4 l=2] *addhi3_clobber/0 > sbci r31,-1 > subi r22, 1 ; 81 [c=4 l=2] *add.for.cczn.hi/0 > sbci r23, 0 > breq .+2 ; 82 [c=8 l=2] branch_ZN > brpl .L2 > /* epilogue start */ > pop r29 ; 76 [c=4 l=1] popqi > pop r28 ; 77 [c=4 l=1] popqi > ret ; 78 [c=0 l=1] return_from_epilogue > > Insn 56+57 and insns 61+62 are post-inc stores. They are not recognized > because the code prior to cprop_hardreg is a bit of a mess that moves > addresses back and forth (including Y). Only after cprop_hardreg the > code is simple enough so post-inc can be detected by avr-fuse-add. > > Hence this patch runs a 2nd instance of that pass late after > cprop_hardreg (the 1st instance runs prior to RTL peephole). > > It renames avr_split_tiny_move to avr_split_fake_addressing_move > because that function also splits some insns on non-avrtiny. > > The patch removes a define_split that's no more needed because > such splits are performed by avr_split_fake_addressing_move. > > Passed without new regressions. Ok for trunk?
Ok. Please apply. Denis > > Johann > > p.s. post-inc etc. optimizations is basically non-existent in GCC. > One reasons seems to be SSA because each SSA variable gets its own > register, which results in a jungle of required registers which > even pass auto-inc-dec cannot penetrate... > > Take for example the following code, that would require 12 instructions > as indicated by the comments: > > void add4 (uint8_t *aa, const __flash uint8_t *bb, uint8_t nn) > { > // Set Z (R30) to bb (1 MOVW or 2 MOVs) > // Set X (R26) to aa (1 MOVW or 2 MOVs) > do > { > uint8_t sum = 0; > sum += *bb++; // 1 instruction: POST_INC load > sum += *bb++; // 2 instructions: POST_INC load + add > sum += *bb++; // 2 instructions: POST_INC load + add > sum += *bb++; // 2 instructions: POST_INC load + add > *aa++ = sum; // 1 instruction: POST_INC store > } while (--nn); // 2 instructions: dec + branch > } > > But -mmcu=avr4 -Os -S -dp compiles this to madness that > requires more than a dozen registers and computes each > intermediate in its own 16-bit register, coming out with > code that requires 34 instructions instead of 12. > > -- > > AVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg. > > gcc/ > * config/avr/avr-protos.h (avr_split_tiny_move): Rename to > avr_split_fake_addressing_move. > * config/avr/avr-passes.cc: Same. > (avr_pass_data_fuse_add) <tv_id>: Set to TV_MACH_DEP. > (avr_pass_fuse_add) <clone>: Override. > * config/avr/avr-passes.def (avr_pass_fuse_add): Run again > after pass_cprop_hardreg. > * config/avr/avr.md (split-lpmx): Remove a define_split. Such > splits are performed by avr_split_fake_addressing_move.