Indeed, we observed some problems with scheduling, which we believe have more to do with the scheduling algorithm than with the DFA model, as we said in https://gcc.gnu.org/ml/gcc/2015-09/msg00118.html

Cheers,

--
Evandro Menezes
Austin, TX

> -----Original Message-----
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Nikolai Bozhenov
> Sent: Monday, September 14, 2015 2:28
> To: James Greenhalgh
> Cc: gcc@gcc.gnu.org
> Subject: Re: [AArch64] A question about Cortex-A57 pipeline description
>
> Thanks for the reply! I see your point. Indeed, I've also seen cases where the
> load pipeline was overused at the beginning of a basic block, whereas at the
> end the code got stuck with a bunch of stores and no other instructions to
> run in parallel. And indeed, relaxing the restrictions makes things even
> worse in some cases. Anyway, I don't believe it's the best we can do, I'm
> going to have a closer look at the scheduler and see what can be done to
> improve the situation.
>
> Nikolai
>
>
> On 09/11/2015 07:21 PM, James Greenhalgh wrote:
> > On Fri, Sep 11, 2015 at 04:31:37PM +0100, Nikolai Bozhenov wrote:
> >> Hi!
> >>
> >> Recently I got somewhat confused by Cortex-A57 pipeline description
> >> in GCC and I would be grateful if you could help me understand a few
> >> unclear points.
> > Sure,
> >
> >> Particularly I am interested in how memory operations (loads/stores)
> >> are scheduled. It seems that according to the cortex-a57.md file,
> >> firstly, two memory operations may never be scheduled at the same
> >> cycle and, secondly, two loads may never be scheduled at two
> >> consecutive cycles:
> >>
> >>     ;; 5. Two pipelines for load and store operations: LS1, LS2. The most
> >>     ;;    valuable thing we can do is force a structural hazard to split
> >>     ;;    up loads/stores.
> >>
> >>     (define_cpu_unit "ca57_ls_issue" "cortex_a57")
> >>     (define_cpu_unit "ca57_ldr, ca57_str" "cortex_a57")
> >>     (define_reservation "ca57_load_model" "ca57_ls_issue,ca57_ldr*2")
> >>     (define_reservation "ca57_store_model" "ca57_ls_issue,ca57_str")
> >>
> >> However, the Cortex-A57 Software Optimization Guide states that the
> >> core is able to execute one load operation and one store operation
> >> every cycle. And that agrees with my experiments. Indeed, a loop
> >> consisting of 10 loads, 10 stores and several arithmetic operations
> >> takes on average about 10 cycles per iteration, provided that the
> >> instructions are intermixed properly.
> >>
> >> So, what is the purpose of additional restrictions imposed on the
> >> scheduler in cortex-a57.md file? It doesn't look like an error.
> >> Rather, it looks like a deliberate decision.
> > When designing the model for the Cortex-A57 processor, I was primarily
> > trying to build a model which would increase the blend of utilized
> > pipelines on each cycle across a range of benchmarks, rather than to
> > accurately reflect the constraints listed in the Cortex-A57 Software
> > Optimisation Guide [1].
> >
> > My reasoning here is that the Cortex-A57 is a high-performance
> > processor, and an accurate model would be infeasible to build. Because
> > of this, it is unlikely that the model in GCC will be representative
> > of the true state of the processor, and consequently GCC may make
> > decisions which result in an instruction stream which would bias
> > towards one execution pipeline. In particular, given a less
> > restrictive model, GCC will try to hoist more loads to be earlier in
> > the basic block, which can result in less good utilization of the other
> > execution pipelines.
> >
> > In my experiments, I found this model to be more beneficial across a
> > range of benchmarks than a model with the additional restrictions I
> > imposed relaxed.
> >
> > I'd be happy to consider counter-examples where this modeling produces
> > suboptimal results - and where the changes you suggest are sufficient
> > to resolve the issue.
> >
> > Thanks,
> > James
> >
> > ---
> > [1]: Cortex-A57 Software Optimisation Guide
> >
> > http://infocenter.arm.com/help/topic/com.arm.doc.uan0015a/cortex_a57_software_optimisation_guide_external.pdf
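
For reference, the effect of the two reservations quoted above can be
sketched with a small Python model. This is only an illustration under
stated assumptions: it handles just the two reservation operators used in
the excerpt (',' advances one cycle, '*N' repeats a unit for N cycles),
it issues instructions strictly in order, and it is not GCC's DFA or its
scheduler. The names expand() and issue_cycles() are made up for this
sketch.

```python
from itertools import count

# Reservation strings copied from the quoted cortex-a57.md excerpt.
RESERVATIONS = {
    "load": "ca57_ls_issue,ca57_ldr*2",
    "store": "ca57_ls_issue,ca57_str",
}


def expand(reservation):
    """Expand a reservation string into one unit name per cycle.

    Assumption: only ',' (next cycle) and '*N' (repeat N cycles) occur.
    """
    units = []
    for step in reservation.split(","):
        name, _, repeat = step.strip().partition("*")
        units.extend([name] * int(repeat or 1))
    return units


def issue_cycles(ops):
    """Issue ops in order, each at the earliest cycle without a unit conflict."""
    busy = set()      # (cycle, unit) pairs that are already reserved
    cycles = []
    earliest = 0
    for op in ops:
        units = expand(RESERVATIONS[op])
        for start in count(earliest):
            wanted = {(start + i, unit) for i, unit in enumerate(units)}
            if not (wanted & busy):
                break
        busy |= wanted
        cycles.append(start)
        earliest = start  # in-order: never issue before the previous op
    return cycles


if __name__ == "__main__":
    print(issue_cycles(["load", "load"]))
    print(issue_cycles(["store", "store"]))
    print(issue_cycles(["load", "store"] * 10)[-1] + 1)
```

Under these assumptions, two loads issue at cycles 0 and 2, two stores at
cycles 0 and 1, and a perfectly interleaved run of 10 loads and 10 stores
needs 20 issue cycles in the model, against the roughly 10 cycles per
iteration that Nikolai measured on hardware.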