On Sat, Oct 11, 2014 at 5:13 AM, Jeff Law <l...@redhat.com> wrote:
> On 09/30/14 03:22, Bin Cheng wrote:
>>
>> Hi,
>> many load/store pairs as my old patch.  Then I decided to take one step
>> forward and introduce a generic instruction fusion infrastructure in GCC,
>> because in essence a load/store pair is no different from other kinds of
>> instruction fusion; all these optimizations want to do is push
>> instructions together in the instruction flow.
>
> Great generalization.  And yes, you're absolutely right: what you're doing
> is building a fairly generic mechanism to mark insns that might fuse
> together.
>
> So, some questions.  Let's assume I've got 3 kinds of insns: A, B & C.
>
> I can fuse AB or AC, but not BC.  In fact, moving B & C together may
> significantly harm performance.
>
> So my question is: can a given insn have different fusion priorities
> depending on its scheduling context?
>
> So perhaps an example.  Let's say I have an insn stream with the following
> kinds of instructions, all ready at the same time:
>
> AAAAAAAABBBBCCCC
>
> Can I create 8 distinct fusion priorities such that I ultimately schedule
> AB(1) AB(2) AB(3) AB(4) AC(5) AC(6) AC(7) AC(8)?
>
> I guess another way to ask the question: are fusion priorities static,
> based on the insn/alternative, or can they vary?  And if they can vary,
> can they vary on each tick of the scheduler?
>
> Now the next issue is I really don't want all those to fire
> back-to-back-to-back.  I'd like some other insns to be inserted between
> each fusion pair if they're in the ready list.  I guess the easiest way to
> get that is to assign the same fusion priority to other insns in the ready
> queue, even though they don't participate in fusion.  So
>
> ABX(1) ABY(2) ...
>
> where X & Y are some other arbitrary insns that don't participate in the
> AB fusion, but will issue in the same cycle as the AB fused insn.
>
> Though I guess if we run fusion + peep2 between sched1 and sched2, that
> problem would just resolve itself, as we'd have fused AB together into a
> new insn and we'd schedule normally with the fused insns and X, Y.
>
>> So here comes this patch.  It adds a new sched_fusion pass just before
>> peephole2.  The methodology is:
>> 1) The priority in the scheduler is extended into a [fusion_priority,
>> priority] pair, with fusion_priority as the major key and priority as
>> the minor key.
>> 2) The back-end assigns a priority pair to each instruction; instructions
>> that want to be fused together get the same fusion_priority assigned.
>
> I think the bulk of my questions above are targeted at this part of the
> change.  When are these assignments made, and how much freedom does the
> backend have to make/change those assignments?
>
> So another question: given a fused pair, is there any way to guarantee
> ordering within the fused pair?  This is useful to cut down on the number
> of peep2 patterns.  I guess we could twiddle the priority in those cases
> to force a particular ordering of the fused pair, right?
>
> I wonder if we could use this to zap all the hair I added to caller-save
> back in the early 90s to try and widen the save/restore modes.  So instead
> of st; st; call; ld; ld, we'd generate std; call; ldd.  It was a huge win
> for floating point on the sparc processors of that time.  I don't expect
> you to do that investigation; just thinking out loud.
>
>> I collected performance data for both cortex-a15 and cortex-a57 (with a
>> local peephole ldp/stp patch); the benchmarks are noticeably improved on
>> arm/aarch64.  I also collected instrumentation data about how many
>> load/store pairs are found.  For the four versions of load/store pair
>> patches:
>> 0) Mike's original patch.
>> 1) My original prototype patch.
>> 2) A cleaned-up version of Mike's pass (with implementation bugs
>> resolved).
>> 3) This new prototype fusion pass.
>>
>> The numbers of paired opportunities satisfy the relations below:
>> 3 * N0 ~ N1 ~ N2 < N3
>> For example, for one benchmark suite, we have:
>> N0 ~= 1300
>> N1/N2 ~= 5000
>> N3 ~= 7500
>
> Nice.  Very nice.
>
> Overall it's a fairly simple change.  I'll look deeper into it next week.
>
> jeff
Hi Jeff,

Any new comments?

Thanks,
bin