On Thu, May 15, 2014 at 6:31 PM, Oleg Endo <oleg.e...@t-online.de> wrote:
> Hi,
>
> On 15 May 2014, at 09:26, "bin.cheng" <bin.ch...@arm.com> wrote:
>
>> Hi,
>> Targets like ARM and AARCH64 support double-word load store instructions,
>> and these instructions are generally faster than the corresponding two
>> load/stores. GCC currently uses peephole2 to merge paired load/store into
>> one single instruction which has a disadvantage. It can only handle simple
>> cases like the two instructions actually appear sequentially in instruction
>> stream, and is too weak to handle cases in which the two load/store are
>> intervened by other irrelevant instructions.
>>
>> Here comes up with a new GCC pass looking through each basic block and
>> merging paired load store even they are not adjacent to each other. The
>> algorithm is pretty simple:
>> 1) In initialization pass iterating over instruction stream it collects
>> relevant memory access information for each instruction.
>> 2) It iterates over each basic block, tries to find possible paired
>> instruction for each memory access instruction. During this work, it checks
>> dependencies between the two possible instructions and also records the
>> information indicating how to pair the two instructions. To avoid quadratic
>> behavior of the algorithm, It introduces new parameter
>> max-merge-paired-loadstore-distance and set the default value to 4, which is
>> large enough to catch major part of opportunities on ARM/cortex-a15.
>> 3) For each candidate pair, it calls back-end's hook to do target dependent
>> check and merge the two instructions if possible.
>>
>> Though the parameter is set to 4, for miscellaneous benchmarks, this pass
>> can merge numerous opportunities except ones already merged by peephole2
>> (same level numbers of opportunities comparing to peepholed ones). GCC
>> bootstrap can also confirm this finding.
>
> This is interesting. E.g. on SH there are insns to load/store SFmode pairs.
> However, these insns require a mode switch and have some constraints on
> register usage. So in the SH case the load/store pairing would need to be
> done before reg alloc and before mode switching.
>
>>
>> Yet there is an open issue about when we should run this new pass. Though
>> register renaming is disabled by default now, I put this pass after it,
>> because renaming can resolve some false dependencies thus benefit this pass.
>> Another finding is, it can capture a lot more opportunities if it's after
>> sched2, but I am not sure whether it will mess up with scheduling results in
>> this way.
>
> How about the following.
> Instead of adding new hooks and inserting the pass to the general pass list,
> make the new pass class take the necessary callback functions directly.
> Then targets can just instantiate the pass, passing their impl of the
> callbacks, and insert the pass object into the pass list at a place that
> fits best for the target.

Oh, I didn't know we could do this in GCC. But yes, a target may want to
run the pass at whatever place fits that target best.
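To make the suggested structure a bit more concrete, here is a rough toy
model of it. It is Python, not GCC code, and every name in it is made up for
illustration: the Insn record, the PairMergePass class, the can_merge
callback, the 4-byte word size, and the reading of the distance limit as
"how many following instructions to look at" are all assumptions, not what
the patch does. It only shows a pass object that receives the target check
as a callback and searches a basic block for pairs within a small window.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

WORD = 4  # assumed word size in bytes


@dataclass(frozen=True)
class Insn:
    """Toy instruction record (hypothetical, not a GCC structure)."""
    kind: str                        # "load", "store" or "other"
    reads: frozenset = frozenset()   # registers read
    writes: frozenset = frozenset()  # registers written
    base: str = ""                   # base register of a memory access
    offset: int = 0                  # byte offset from the base register


def load(dest, base, offset):
    return Insn("load", frozenset({base}), frozenset({dest}), base, offset)


def store(src, base, offset):
    return Insn("store", frozenset({src, base}), frozenset(), base, offset)


def other(reads, writes):
    return Insn("other", frozenset(reads), frozenset(writes))


class PairMergePass:
    """Pass object that takes the target-dependent check as a callback."""

    def __init__(self, can_merge: Callable[[Insn, Insn], bool],
                 max_distance: int = 4):
        self.can_merge = can_merge
        self.max_distance = max_distance

    @staticmethod
    def _blocked(block, i, j):
        """True if block[j] cannot be moved up next to block[i]."""
        second = block[j]
        for k in range(i, j):
            mid = block[k]
            # Register dependencies: do not reorder a def with a use or def.
            if (mid.writes & (second.reads | second.writes)
                    or mid.reads & second.writes):
                return True
            # Memory: conservatively never cross a store, and never move
            # a store across another memory access.
            if (k > i and mid.kind != "other"
                    and "store" in (mid.kind, second.kind)):
                return True
        return False

    def find_pairs(self, block: List[Insn]) -> List[Tuple[int, int]]:
        pairs, used = [], set()
        for i, first in enumerate(block):
            if i in used or first.kind == "other":
                continue
            # Bounded window keeps the search from going quadratic.
            last = min(i + 1 + self.max_distance, len(block))
            for j in range(i + 1, last):
                second = block[j]
                if j in used or second.kind != first.kind:
                    continue
                if (second.base != first.base
                        or abs(second.offset - first.offset) != WORD):
                    continue
                if self._blocked(block, i, j):
                    continue
                if self.can_merge(first, second):  # target callback
                    pairs.append((i, j))
                    used.update((i, j))
                    break
        return pairs


def toy_target_check(a, b):
    # Hypothetical target rule: the lower address must be 8-byte aligned.
    return min(a.offset, b.offset) % 8 == 0


block = [
    load("r0", "r4", 0),
    other({"r2", "r3"}, {"r2"}),
    load("r1", "r4", 4),
    store("r5", "r6", 0),
    other({"r2"}, {"r6"}),
    store("r7", "r6", 4),
]
pairs = PairMergePass(toy_target_check).find_pairs(block)
```

In this example only the two loads at positions 0 and 2 are reported as a
pair. The two stores are rejected because the instruction between them
writes r6, which the second store uses as its base register.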
Thanks,
bin
>
>
>>
>> So, any comments about this?
>>
>> Thanks,
>> bin
>>
>>
>> 2014-05-15  Bin Cheng  <bin.ch...@arm.com>
>>     * common.opt (flag_merge_paired_loadstore): New option.
>>     * merge-paired-loadstore.c: New file.
>>     * Makefile.in: Support new file.
>>     * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
>>     (load_latency_expanded_p, arm_merge_paired_loadstore): New function.
>>     * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
>>     * doc/invoke.texi (-fmerge-paired-loadstore): New.
>>     (max-merge-paired-loadstore-distance): New.
>>     * doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
>>     * doc/tm.texi: Regenerated.
>>     * target.def (merge_paired_loadstore): New.
>>     * tree-pass.h (make_pass_merge_paired_loadstore): New decl.
>>     * passes.def (pass_merge_paired_loadstore): New pass.
>>     * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
>>
>> gcc/testsuite/ChangeLog
>> 2014-05-15  Bin Cheng  <bin.ch...@arm.com>
>>
>>     * gcc.target/arm/merge-paired-loadstore.c: New test.
>>
>> <merge-paired-loadstore-20140515.txt>

--
Best Regards.