On 2014-10-21 12:06 AM, Maxim Kuvyrkov wrote:
Hi,

This patch adds auto-prefetcher modeling to the GCC scheduler.  The auto-prefetcher 
model is currently enabled only for ARM Cortex-A15, since this is the only CPU 
that I know of to have a hardware auto-prefetcher unit.

The documentation on the auto-prefetcher is very sparse, and all I have are my empirical studies and a short 
note in the Cortex-A15 manual (search for "L2 cache auto-prefetcher").  This patch, therefore, 
implements a very abstract model that makes the scheduler prefer "mem_op (base+8); mem_op (base+12)" 
over "mem_op (base+12); mem_op (base+8)".  In other words, the scheduler tries to issue memory operations 
in order of increasing memory offsets.

The auto-prefetcher model implementation is based on max_issue multipass lookahead 
scheduling, and its "guard" hook.  The guard hook examines the contents of the 
ready list and the queue, and, if it finds instructions with lower memory offsets, marks 
instructions with higher memory offsets as unavailable for immediate scheduling.
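
To illustrate the idea only, here is a minimal sketch of the guard decision, not the code from the patch.  The struct mem_op type, its fields, and the function autopref_guard_sketch are hypothetical names made up for this example, and the sketch assumes plain base+displacement addresses held in a flat array rather than the real ready list and queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory instruction: a base register
   number plus a constant displacement.  Not a GCC data structure.  */
struct mem_op
{
  int base_reg;
  long offset;
  bool delayed;   /* Set when the op should not be issued right now.  */
};

/* Mark an op as delayed when another op in the same window uses the
   same base register with a strictly lower offset.  Returns the number
   of ops marked as delayed.  */
size_t
autopref_guard_sketch (struct mem_op *ops, size_t n)
{
  size_t n_delayed = 0;

  for (size_t i = 0; i < n; i++)
    {
      ops[i].delayed = false;
      for (size_t j = 0; j < n; j++)
        if (j != i
            && ops[j].base_reg == ops[i].base_reg
            && ops[j].offset < ops[i].offset)
          {
            ops[i].delayed = true;
            n_delayed++;
            break;
          }
    }

  return n_delayed;
}
```

With a window holding (base+12), (base+8) and an access off a different base, only the (base+12) access is delayed, so (base+8) gets issued first.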

This patch has been in the works since the beginning of the year, and many of my 
previous scheduler cleanup patches were to prepare the infrastructure for this 
feature.

Ramana, this change requires benchmarking, which I can't easily do at the 
moment.  I would appreciate any benchmarking results that you can share.  In 
particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be 
tuned/confirmed for Cortex-A15.

At the moment the parameter is set to "2", which means that the autopref model 
will look through ready list and 1-stall queue in search of relevant instructions.  
Values of -1 (disable autopref), 0 (use autopref only in rank_for_schedule), 1 (look 
through ready list), 2 (look through ready list and 1-stall queue), and 3 (look through 
ready list and 2-stall queue) should be considered and benchmarked.
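
As a rough summary of the values listed above, the mapping can be sketched as follows.  The helper names are hypothetical and exist only for this example; they are not functions from the patch.

```c
#include <stdbool.h>

/* Hypothetical helpers summarizing the queue depth values described
   above.  Not code from the patch.  */

/* -1 disables the autopref model entirely.  */
bool
autopref_enabled (int depth)
{
  return depth >= 0;
}

/* 0 uses autopref only in rank_for_schedule; the lookahead search
   through the ready list starts at 1.  */
bool
autopref_searches_ready_list (int depth)
{
  return depth >= 1;
}

/* Number of stall levels of the queue searched in addition to the
   ready list: 2 -> 1-stall queue, 3 -> 2-stall queue.  */
int
autopref_stall_levels (int depth)
{
  return depth >= 2 ? depth - 1 : 0;
}
```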

Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and 
aarch64-linux-gnu.  OK to apply, provided no performance or correctness 
regressions?

[ChangeLog is part of the git patch]



I'd prefer symbolic constants for dont_delay. Also, the address can contain other parts, e.g. an index on some targets. It is not necessary to change the code, but a comment would be nice saying that right now it is oriented toward machines with base+disp-only addressing.

That said, it is probably a matter of taste, so you are free to commit it without any change.

  Thanks, Maxim.
