With FUSION you might get farther. See the arm port as I recall. The quick overview, FUSION allows instructions that are not contiguous to be paired up and fused together. it was built for load/load store/store combining.
On Apr 19, 2015, at 10:09 PM, sameera <sameera.deshpa...@imgtec.com> wrote: > Gentle reminder! > > - Thanks and regards, > Sameera D. > > On Monday 30 March 2015 04:58 PM, sameera wrote: >> Hi! >> >> Sorry for delay in sending this patch for review. >> Please find attached updated patch. >> >> In P5600, 2 consecutive loads/stores of same type which access contiguous >> memory locations are bonded together by instruction issue unit to dispatch >> single load/store instruction which accesses both locations. This allows 2X >> improvement in memory intensive code. This optimization can be performed >> for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions. >> >> This patch adds peephole2 patterns to identify such loads/stores, and put >> them in parallel, so that the scheduler will not split it - thereby >> guaranteeing h/w level load/store bonding. >> >> The patch is tested with dejagnu for correctness, and tested on hardware for >> performance. >> Ok for trunk? >> >> Changelog: >> gcc/ >> * config/mips/mips.md (JOIN_MODE): New mode iterator. >> (join2_load_Store<JOIN_MODE:mode>): New pattern. >> (join2_loadhi): Likewise. >> (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode >> load-load and store-stores. >> * config/mips/mips.opt (mload-store-pairs): New option. >> (TARGET_LOAD_STORE_PAIRS): New macro. >> *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise. >> *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. >> *config/mips/mips.c(mips_load_store_bonding_p): New function. >> >> - Thanks and regards, >> Sameera D. >> >> On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote: >>> Hi Richard, >>> >>> Thanks for the review. >>> Please find attached updated patch after your review comments. >>> >>> Changelog: >>> gcc/ >>> * config/mips/mips.md (JOIN_MODE): New mode iterator. >>> (join2_load_Store<JOIN_MODE:mode>): New pattern. >>> (join2_loadhi): Likewise. >>> (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode >>> load-load and store-stores. >>> * config/mips/mips.opt (mload-store-pairs): New option. >>> (TARGET_LOAD_STORE_PAIRS): New macro. >>> *config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise. >>> *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype. >>> *config/mips/mips.c(mips_load_store_bonding_p): New function. >>> >>> The change is tested with dejagnu with additional options >>> -mload-store-pairs and -mtune=p5600. >>> The perf measurement is yet to finish. >>> >>>>> We had offline discussion based on your comment. There is additional >>>>> view on the same. >>>>> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining >>>>> ISAs do not support P5600. >>>>> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is >>>>> implemented separately, and hence, as you suggested, P5600 Ld-ST >>>>> bonding optimization should not be enabled for them. >>>>> So, is it fine if I emit error for any ISAs other than mips32r2, >>>>> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should >>>>> continue by emitting warning and disabling P5600? >>>> >>>> No, the point is that we have two separate concepts: ISA and optimisation >>>> target. -mipsN and -march=N control the ISA (which instructions are >>>> available) and -mtune=M controls optimisation decisions within the >>>> constraints of that N, such as scheduling and the cost of things like >>>> multiplication and division. >>>> >>>> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II- >>>> compatible code, optimise it for p5600, but make sure that 24k workarounds >>>> are used. The code would run correctly on any MIPS II-compatible processor >>>> without known errata and also on the 24k. >>> Ok, disabled the peephole pattern for fix-24k and micromips - to allow >>> specific patterns to be matched. >>> >>>>> + >>>>> +mld-st-pairing >>>>> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store >>>>> +pairing >>>> >>>> Other options are just "TARGET_" + the captialised form of the option name, >>>> so I'd prefer TARGET_LD_ST_PAIRING instead. Although "ld" might be >>>> misleading since it's an abbreviation for "load" rather than the LD >>>> instruction. >>>> Maybe -mload-store-pairs, since plurals are more common than "-ing"? >>>> Not sure that's a great suggestion though. >>> Renamed the option and corresponding macro as suggested. >>> >>>>> Performance testing for this patch is not yet done. >>>>> If the patch proves beneficial in most of the testcases (which we >>>>> believe will do on P5600) we will enable this optimization by default >>>>> for P5600 - in which case this option can be removed. >>>> >>>> OK. Sending the patch for comments before performance testing is fine, but >>>> I think it'd be better to commit the patch only after the testing is done, >>>> since >>>> otherwise the patch might need to be tweaked. >>>> >>>> I don't see any problem with keeping the option in case people want to >>>> experiment with it. I just think the patch should only go in once it can >>>> be >>>> enabled by default for p5600. I.e. the option would exist to turn off the >>>> pairing. >>>> >>>> Not having the option is fine too of course. >>> Yes, after perf analysis, I will share the results across, and then >>> depending upon the impact, the decision can be made - whether to make the >>> option >>> as default or not, and then the patch will be submitted. >>> >>>> We should allow pairing even without -mtune=p5600. >>> The load-store pairing is currently attribute of P5600, so I have not >>> enabled the pairing without mtune=5600. If need be, can enable that without >>> mtune=p5600. >>> >>>> >>>> (define_mode_iterator JOIN_MODE [ >>>> SI >>>> (DI "TARGET_64BIT") >>>> (SF "TARGET_HARD_FLOAT") >>>> (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")]) >>>> >>> Done this change. >>> >>>> and then extend: >>>> >>>>> @@ -883,6 +884,8 @@ >>>>> (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")]) >>>>> (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")]) >>>>> >>>>> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")]) >>>>> + >>>>> ;; The unextended ranges of the MIPS16 addiu and daddiu instructions >>>>> ;; are different. Some forms of unextended addiu have an 8-bit >>>>> immediate ;; field but the equivalent daddiu has only a 5-bit field. >>>> >>>> this accordingly. >>> In order to allow d/f for both register classes, the pattern >>> join2_load_store<mode> was altered a bit which eliminated this mode >>> iterator. >>> >>>> >>>> Outer (parallel ...)s are redundant in a define_insn. >>> Removed. >>> >>>> >>>> It would be better to add the mips_load_store_insns for each operand >>>> rather than multiplying one of them by 2. Or see the next bit for an >>>> alternative. >>> Using the alternative method as you suggested, so this change is not needed. >>> >>>> Please instead add HI to the define_mode_iterator so that we can use the >>>> same peephole and define_insn. >>> Added HI in the mode iterator to eliminate join2_storehi pattern and >>> corresponding peephole2. >>> As arithmetic operations on HImode is not supported, we generate zero or >>> sign extended loads in such cases. >>> To handle that case, join2_loadhi pattern is kept. >>> >>> - Thanks and regards, >>> Sameera D. >>>