With FUSION you might get farther.  See the arm port as I recall.

The quick overview, FUSION allows instructions that are not contiguous to be 
paired up and fused together.  it was built for load/load store/store combining.

On Apr 19, 2015, at 10:09 PM, sameera <sameera.deshpa...@imgtec.com> wrote:
> Gentle reminder!
> 
> - Thanks and regards,
>  Sameera D.
> 
> On Monday 30 March 2015 04:58 PM, sameera wrote:
>> Hi!
>> 
>> Sorry for delay in sending this patch for review.
>> Please find attached updated patch.
>> 
>> In P5600, 2 consecutive loads/stores of same type which access contiguous 
>> memory locations are bonded together by instruction issue unit to dispatch
>> single load/store instruction which accesses both locations. This allows 2X 
>> improvement in memory intensive code. This optimization can be performed
>> for LH, SH, LW, SW, LWC, SWC, LDC, SDC instructions.
>> 
>> This patch adds peephole2 patterns to identify such loads/stores, and put 
>> them in parallel, so that the scheduler will not split it - thereby
>> guaranteeing h/w level load/store bonding.
>> 
>> The patch is tested with dejagnu for correctness, and tested on hardware for 
>> performance.
>> Ok for trunk?
>> 
>> Changelog:
>> gcc/
>>         * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>     (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>     (join2_loadhi): Likewise.
>>     (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>     load-load and store-stores.
>>     * config/mips/mips.opt (mload-store-pairs): New option.
>>     (TARGET_LOAD_STORE_PAIRS): New macro.
>>     *config/mips/mips.h (ENABLE_LD_ST_PAIRS): Likewise.
>>     *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>     *config/mips/mips.c(mips_load_store_bonding_p): New function.
>> 
>> - Thanks and regards,
>>   Sameera D.
>> 
>> On Tuesday 24 June 2014 04:12 PM, Sameera Deshpande wrote:
>>> Hi Richard,
>>> 
>>> Thanks for the review.
>>> Please find attached updated patch after your review comments.
>>> 
>>> Changelog:
>>> gcc/
>>>    * config/mips/mips.md (JOIN_MODE): New mode iterator.
>>>    (join2_load_Store<JOIN_MODE:mode>): New pattern.
>>>    (join2_loadhi): Likewise.
>>>    (define_peehole2): Add peephole2 patterns to join 2 HI/SI/SF/DF-mode
>>>    load-load and store-stores.
>>>    * config/mips/mips.opt (mload-store-pairs): New option.
>>>    (TARGET_LOAD_STORE_PAIRS): New macro.
>>>    *config/mips/mips.h (ENABLE_P5600_LD_ST_PAIRS): Likewise.
>>>    *config/mips/mips-protos.h (mips_load_store_bonding_p): New prototype.
>>>    *config/mips/mips.c(mips_load_store_bonding_p): New function.
>>> 
>>> The change is tested with dejagnu with additional options 
>>> -mload-store-pairs and -mtune=p5600.
>>> The perf measurement is yet to finish.
>>> 
>>>>> We had offline discussion based on your comment. There is additional
>>>>> view on the same.
>>>>> Only ISAs mips32r2, mips32r3 and mips32r5 support P5600. Remaining
>>>>> ISAs do not support P5600.
>>>>> For mips32r2 (24K) and mips32r3 (micromips), load-store pairing is
>>>>> implemented separately, and hence, as you suggested, P5600 Ld-ST
>>>>> bonding optimization should not be enabled for them.
>>>>> So, is it fine if I emit error for any ISAs other than mips32r2,
>>>>> mips32r3 and mips32r5 when P5600 is enabled, or the compilation should
>>>>> continue by emitting warning and disabling P5600?
>>>> 
>>>> No, the point is that we have two separate concepts: ISA and optimisation
>>>> target.  -mipsN and -march=N control the ISA (which instructions are
>>>> available) and -mtune=M controls optimisation decisions within the
>>>> constraints of that N, such as scheduling and the cost of things like
>>>> multiplication and division.
>>>> 
>>>> E.g. you could have -mips2 -mtune=p5600 -mfix-24k: generate MIPS II-
>>>> compatible code, optimise it for p5600, but make sure that 24k workarounds
>>>> are used.  The code would run correctly on any MIPS II-compatible processor
>>>> without known errata and also on the 24k.
>>> Ok, disabled the peephole pattern for fix-24k and micromips - to allow 
>>> specific patterns to be matched.
>>> 
>>>>> +
>>>>> +mld-st-pairing
>>>>> +Target Report Var(TARGET_ENABLE_LD_ST_PAIRING) Enable load/store
>>>>> +pairing
>>>> 
>>>> Other options are just "TARGET_" + the captialised form of the option name,
>>>> so I'd prefer TARGET_LD_ST_PAIRING instead.  Although "ld" might be
>>>> misleading since it's an abbreviation for "load" rather than the LD 
>>>> instruction.
>>>> Maybe -mload-store-pairs, since plurals are more common than "-ing"?
>>>> Not sure that's a great suggestion though.
>>> Renamed the option and corresponding macro as suggested.
>>> 
>>>>> Performance testing for this patch is not yet done.
>>>>> If the patch proves beneficial in most of the testcases (which we
>>>>> believe will do on P5600) we will enable this optimization by default
>>>>> for P5600 - in which case this option can be removed.
>>>> 
>>>> OK.  Sending the patch for comments before performance testing is fine, but
>>>> I think it'd be better to commit the patch only after the testing is done, 
>>>> since
>>>> otherwise the patch might need to be tweaked.
>>>> 
>>>> I don't see any problem with keeping the option in case people want to
>>>> experiment with it.  I just think the patch should only go in once it can 
>>>> be
>>>> enabled by default for p5600.  I.e. the option would exist to turn off the
>>>> pairing.
>>>> 
>>>> Not having the option is fine too of course.
>>> Yes, after perf analysis, I will share the results across, and then 
>>> depending upon the impact, the decision can be made - whether to make the 
>>> option
>>> as default or not, and then the patch will be submitted.
>>> 
>>>> We should allow pairing even without -mtune=p5600.
>>> The load-store pairing is currently attribute of P5600, so I have not 
>>> enabled the pairing without mtune=5600. If need be, can enable that without
>>> mtune=p5600.
>>> 
>>>> 
>>>> (define_mode_iterator JOIN_MODE [
>>>>   SI
>>>>   (DI "TARGET_64BIT")
>>>>   (SF "TARGET_HARD_FLOAT")
>>>>   (DF "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT")])
>>>> 
>>> Done this change.
>>> 
>>>> and then extend:
>>>> 
>>>>> @@ -883,6 +884,8 @@
>>>>>  (define_mode_attr loadx [(SF "lwxc1") (DF "ldxc1") (V2SF "ldxc1")])
>>>>> (define_mode_attr storex [(SF "swxc1") (DF "sdxc1") (V2SF "sdxc1")])
>>>>> 
>>>>> +(define_mode_attr insn_type [(SI "") (SF "fp") (DF "fp")])
>>>>> +
>>>>>  ;; The unextended ranges of the MIPS16 addiu and daddiu instructions
>>>>> ;; are different.  Some forms of unextended addiu have an 8-bit
>>>>> immediate  ;; field but the equivalent daddiu has only a 5-bit field.
>>>> 
>>>> this accordingly.
>>> In order to allow d/f for both register classes, the pattern 
>>> join2_load_store<mode> was altered a bit which eliminated this mode 
>>> iterator.
>>> 
>>>> 
>>>> Outer (parallel ...)s are redundant in a define_insn.
>>> Removed.
>>> 
>>>> 
>>>> It would be better to add the mips_load_store_insns for each operand
>>>> rather than multiplying one of them by 2.  Or see the next bit for an
>>>> alternative.
>>> Using the alternative method as you suggested, so this change is not needed.
>>> 
>>>> Please instead add HI to the define_mode_iterator so that we can use the
>>>> same peephole and define_insn.
>>> Added HI in the mode iterator to eliminate join2_storehi pattern and 
>>> corresponding peephole2.
>>> As arithmetic operations on HImode is not supported, we generate zero or 
>>> sign extended loads in such cases.
>>> To handle that case, join2_loadhi pattern is kept.
>>> 
>>> - Thanks and regards,
>>>    Sameera D.
>>> 

Reply via email to