On 2014-10-21 12:06 AM, Maxim Kuvyrkov wrote:
Hi,
This patch adds auto-prefetcher modeling to the GCC scheduler. The auto-prefetcher
model is currently enabled only for ARM Cortex-A15, since this is the only CPU
that I know of that has a hardware auto-prefetcher unit.
The documentation on the auto-prefetcher is very sparse; all I have are my empirical studies and a short
note in the Cortex-A15 manual (search for "L2 cache auto-prefether"). This patch, therefore,
implements a very abstract model that makes the scheduler prefer "mem_op (base+8); mem_op (base+12)"
over "mem_op (base+12); mem_op (base+8)". In other words, the scheduler tries to issue memory operations
in order of increasing memory offsets.
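
The ordering preference described above can be illustrated with a small standalone sketch. This is not code from the patch: `struct mem_op` and `autopref_compare` are hypothetical names, and the sketch assumes each memory operation has already been decomposed into a base register and a constant offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a memory operation:
   a base register number plus a constant offset.  */
struct mem_op
{
  int base_reg;
  long offset;
};

/* Return a negative value if A should be issued before B, a positive
   value if B should be issued before A, and 0 if the model has no
   preference (different bases or equal offsets).  */
static int
autopref_compare (const struct mem_op *a, const struct mem_op *b)
{
  assert (a != NULL && b != NULL);

  /* Offsets are only comparable when both accesses use the same base.  */
  if (a->base_reg != b->base_reg)
    return 0;

  if (a->offset < b->offset)
    return -1;
  if (a->offset > b->offset)
    return 1;
  return 0;
}
```

With this comparison, "mem_op (base+8)" sorts before "mem_op (base+12)", and accesses off unrelated base registers are left in whatever order the other heuristics choose.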
The auto-prefetcher model implementation is based on max_issue multipass lookahead
scheduling and its "guard" hook. The guard hook examines the contents of the
ready list and the queue and, if it finds instructions with lower memory offsets, marks
instructions with higher memory offsets as unavailable for immediate scheduling.
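
As a rough illustration of the guard decision only, the check could look like the sketch below. It is not the patch's implementation: `struct mem_op` and `autopref_should_delay` are hypothetical names, the pending instructions (ready list plus the examined part of the queue) are assumed to be flattened into one array, and each access is assumed to be a base register plus a constant offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a memory operation.  */
struct mem_op
{
  int base_reg;
  long offset;
};

/* Return true if CAND should be marked unavailable for immediate
   scheduling, i.e. if some pending memory operation uses the same
   base register with a lower offset.  PENDING stands in for the
   ready list plus the examined part of the queue.  */
static bool
autopref_should_delay (const struct mem_op *cand,
                       const struct mem_op *pending, size_t n)
{
  assert (cand != NULL && (pending != NULL || n == 0));

  for (size_t i = 0; i < n; i++)
    if (pending[i].base_reg == cand->base_reg
        && pending[i].offset < cand->offset)
      return true;

  return false;
}
```

An instruction that already has the lowest offset for its base is never delayed by this check, so at least one memory operation per base stays available.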
This patch has been in the works since the beginning of the year, and many of my
previous scheduler cleanup patches were to prepare the infrastructure for this
feature.
Ramana, this change requires benchmarking, which I can't easily do at the
moment. I would appreciate any benchmarking results that you can share. In
particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be
tuned/confirmed for Cortex-A15.
At the moment the parameter is set to "2", which means that the autopref model
will look through the ready list and the 1-stall queue in search of relevant instructions.
Values of -1 (disable autopref), 0 (use autopref only in rank_for_schedule), 1 (look
through ready list), 2 (look through ready list and 1-stall queue), and 3 (look through
ready list and 2-stall queue) should be considered and benchmarked.
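
For reference while benchmarking, the values listed above can be summarized as a small decoding function. This is only a sketch: `struct autopref_scope` and `autopref_scope_for_depth` are hypothetical names, and two points are assumptions rather than statements from the patch: that values of 1 and above keep the rank_for_schedule use enabled, and that a value N >= 2 means "ready list plus (N-1)-stall queue".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a given queue-depth value enables.  */
struct autopref_scope
{
  bool use_in_rank;   /* Model is consulted in rank_for_schedule.  */
  bool search_ready;  /* Guard looks through the ready list.  */
  int queue_stalls;   /* Guard also looks this many stalls into the queue.  */
};

static struct autopref_scope
autopref_scope_for_depth (int depth)
{
  assert (depth >= -1);

  /* depth == -1: autopref disabled entirely.  */
  struct autopref_scope s = { false, false, 0 };

  if (depth >= 0)
    s.use_in_rank = true;         /* Assumed to stay on for larger values.  */
  if (depth >= 1)
    s.search_ready = true;
  if (depth >= 2)
    s.queue_stalls = depth - 1;   /* 2 -> 1-stall queue, 3 -> 2-stall queue.  */

  return s;
}
```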
Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and
aarch64-linux-gnu. OK to apply, provided no performance or correctness
regressions?
[ChangeLog is part of the git patch]
I'd prefer symbolic constants for dont_delay. Also, the address can
contain other parts, e.g. an index on some targets. It is not necessary
to change the code, but a comment would be nice saying that right now it is
oriented toward machines with base+disp-only addressing.
That said, it is probably a matter of taste, so you are free to commit it
without any change.
Thanks, Maxim.