Hi Roger, Your patch doesn't introduce new regressions. However, before pushing to the mainline you need to fix some issues: 1. Please fix the trailing spaces and blocks of 8 spaces which should be replaced with tabs. You can use check_GNU_style.py script to spot them. 2. Please use capital letters for code iterators (i.e., any_shift_rotate).
Once the above issues are fixed, please proceed with your commit. Thank you for your contribution, Claudiu On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch completes the ARC back-end's transition to using pre-reload > splitters for SImode shifts and rotates on targets without a barrel > shifter. The core part is that the shift_si3 define_insn is no longer > needed, as shifts and rotates that don't require a loop are split > before reload, and then because shift_si3_loop is the only caller > of output_shift, both can be significantly cleaned up and simplified. > The output_shift function (Claudiu's "the elephant in the room") is > renamed output_shift_loop, which handles just the four instruction > zero-overhead loop implementations. > > Aside from the clean-ups, the user visible changes are much improved > implementations of SImode shifts and rotates on affected targets. > > For the function: > unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); } > > GCC with -O2 -mcpu=em would previously generate: > > rotr_1: lsr_s r2,r0 > bmsk_s r0,r0,0 > ror r0,r0 > j_s.d [blink] > or_s r0,r0,r2 > > with this patch, we now generate: > > j_s.d [blink] > ror r0,r0 > > For the function: > unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); } > > GCC with -O2 -mcpu=em would previously generate: > > rotr_31: > mov_s r2,r0 ;4 > asl_s r0,r0 > add.f 0,r2,r2 > rlc r2,0 > j_s.d [blink] > or_s r0,r0,r2 > > with this patch we now generate an add.f followed by an adc: > > rotr_31: > add.f r0,r0,r0 > j_s.d [blink] > add.cs r0,r0,1 > > > Shifts by constants requiring a loop have been improved for even counts > by performing two operations in each iteration: > > int shl10(int x) { return x >> 10; } > > Previously looked like: > > shl10: mov.f lp_count, 10 > lpnz 2f > asr r0,r0 > nop > 2: # end single insn loop > j_s [blink] > > > And now becomes: > > shl10: > mov lp_count,5 > lp 2f > asr r0,r0 > asr r0,r0 > 2: # end single insn loop > j_s [blink] > > > So emulating ARC's SWAP on architectures that don't have it: > > unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); } > > previously required 10 instructions and ~70 cycles: > > rotr_16: > mov_s r2,r0 ;4 > mov.f lp_count, 16 > lpnz 2f > add r0,r0,r0 > nop > 2: # end single insn loop > mov.f lp_count, 16 > lpnz 2f > lsr r2,r2 > nop > 2: # end single insn loop > j_s.d [blink] > or_s r0,r0,r2 > > now becomes just 4 instructions and ~18 cycles: > > rotr_16: > mov lp_count,8 > lp 2f > ror r0,r0 > ror r0,r0 > 2: # end single insn loop > j_s [blink] > > > This patch has been tested with a cross-compiler to arc-linux hosted > on x86_64-pc-linux-gnu and (partially) tested with the compile-only > portions of the testsuite with no regressions. Ok for mainline, if > your own testing shows no issues? > > > 2023-10-07 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/arc/arc-protos.h (output_shift): Rename to... > (output_shift_loop): Tweak API to take an explicit rtx_code. > (arc_split_ashl): Prototype new function here. > (arc_split_ashr): Likewise. > (arc_split_lshr): Likewise. > (arc_split_rotl): Likewise. > (arc_split_rotr): Likewise. > * config/arc/arc.cc (output_shift): Delete local prototype. Rename. > (output_shift_loop): New function replacing output_shift to output > a zero overheap loop for SImode shifts and rotates on ARC targets > without barrel shifter (i.e. no hardware support for these insns). > (arc_split_ashl): New helper function to split *ashlsi3_nobs. > (arc_split_ashr): New helper function to split *ashrsi3_nobs. > (arc_split_lshr): New helper function to split *lshrsi3_nobs. > (arc_split_rotl): New helper function to split *rotlsi3_nobs. > (arc_split_rotr): New helper function to split *rotrsi3_nobs. > * config/arc/arc.md (any_shift_rotate): New define_code_iterator. > (define_code_attr insn): New code attribute to map to pattern name. > (<any_shift_rotate>si3): New expander unifying previous ashlsi3, > ashrsi3 and lshrsi3 define_expands. Adds rotlsi3 and rotrsi3. > (*<any_shift_rotate>si3_nobs): New define_insn_and_split that > unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs. > We now call arc_split_<insn> in arc.cc to implement each split. > (shift_si3): Delete define_insn, all shifts/rotates are now split. > (shift_si3_loop): Rename to... > (<insn>si3_loop): define_insn to handle loop implementations of > SImode shifts and rotates, calling ouput_shift_loop for template. > (rotrsi3): Rename to... > (*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror. > (*rotlsi3): New define_insn_and_split to transform left rotates > into right rotates before reload. > (rotlsi3_cnt1): New define_insn_and_split to implement a left > rotate by one bit using an add.f followed by an adc. > * config/arc/predicates.md (shiftr4_operator): Delete. > > > Thanks in advance, > Roger > -- >