On Sat, Oct 11, 2014 at 5:13 AM, Jeff Law <l...@redhat.com> wrote:
> On 09/30/14 03:22, Bin Cheng wrote:
>>
>> Hi,
>> many load/store pairs as my old patch.  Then I decided to take one step
>> forward and introduce a generic instruction fusion infrastructure in GCC,
>> because in essence a load/store pair is no different from other kinds of
>> instruction fusion; all these optimizations want to do is push
>> instructions together in the instruction flow.
>
> Great generalization.  And yes, you're absolutely right: what you're doing
> is building a fairly generic mechanism to mark insns that might fuse
> together.
>
> So, some questions.  Let's assume I've got 3 kinds of insns: A, B & C.
>
> I can fuse AB or AC, but not BC.  In fact, moving B & C together may
> significantly harm performance.
>
> So my question is: can a given insn have different fusion priorities
> depending on its scheduling context?
>
> So perhaps an example.  Let's say I have an insn stream with the following
> kinds of instructions, all ready at the same time:
>
> AAAAAAAABBBBCCCC
>
> Can I create 8 distinct fusion priorities such that I ultimately schedule
> AB(1) AB(2) AB(3) AB(4) AC(5) AC(6) AC(7) AC(8)?
>
> I guess another way to ask the question: are fusion priorities static,
> based on the insn/alternative, or can they vary?  And if they can vary,
> can they vary on each tick of the scheduler?
>
> Now the next issue is I really don't want all those to fire
> back-to-back-to-back.  I'd like some other insns to be inserted between
> each fusion pair if they're in the ready list.  I guess the easiest way to
> get that is to assign the same fusion priority to other insns in the ready
> queue, even though they don't participate in fusion.  So
>
> ABX(1) ABY(2) ...
>
> where X & Y are some other arbitrary insns that don't participate in the
> AB fusion, but will issue in the same cycle as the AB fused insn.
>
> Though I guess if we run fusion + peep2 between sched1 and sched2, that
> problem would just resolve itself, as we'd have fused AB together into a
> new insn and we'd schedule normally with the fused insns and X, Y.
>
>> So here comes this patch.  It adds a new sched_fusion pass just before
>> peephole2.  The methodology is:
>> 1) The priority in the scheduler is extended into a [fusion_priority,
>> priority] pair, with fusion_priority as the major key and priority as
>> the minor key.
>> 2) The back-end assigns a priority pair to each instruction; instructions
>> that want to be fused together get the same fusion_priority assigned.
>
> I think the bulk of my questions above are targeted at this part of the
> change.  When are these assignments made, and how much freedom does the
> backend have to make/change those assignments?
>
> So another question: given a fused pair, is there any way to guarantee
> ordering within the fused pair?  This is useful to cut down on the number
> of peep2 patterns.  I guess we could twiddle the priority in those cases
> to force a particular ordering of the fused pair, right?
>
> I wonder if we could use this to zap all the hair I added to caller-save
> back in the early 90s to try and widen the save/restore modes.  So instead
> of st; st; call; ld; ld, we'd generate std; call; ldd.  It was a huge win
> for floating point on the sparc processors of that time.  I don't expect
> you to do that investigation; just thinking out loud.
>
>> I collected performance data for both cortex-a15 and cortex-a57 (with a
>> local peephole ldp/stp patch); the benchmarks are noticeably improved on
>> arm/aarch64.  I also collected instrumentation data about how many
>> load/store pairs are found.  For the four versions of load/store pair
>> patches:
>> 0) Mike's original patch.
>> 1) My original prototype patch.
>> 2) A cleaned-up version of Mike's pass (with implementation bugs
>> resolved).
>> 3) This new prototype fusion pass.
>>
>> The numbers of paired opportunities satisfy the relations below:
>> 3 * N0 ~ N1 ~ N2 < N3
>> For example, for one benchmark suite, we have:
>> N0 ~= 1300
>> N1/N2 ~= 5000
>> N3 ~= 7500
>
> Nice.  Very nice.
>
> Overall it's a fairly simple change.  I'll look deeper into it next week.
>
> jeff
Hi Jeff,

Any new comments?

Thanks,
bin