On Mon, Oct 20, 2014 at 9:06 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> Hi,
>
> This patch adds auto-prefetcher modeling to GCC scheduler. The
> auto-prefetcher model is currently enabled only for ARM Cortex-A15, since
> this is the only CPU that I know of to have the hardware auto-prefetcher unit.

That might be the only ARM processor, but I know the PowerPC 970 and POWER4
have a hardware auto-prefetcher. They differ slightly in how many streams
can be active; the 970 has some streams reserved for user streams. The
PowerPC Cell also has something similar.

Thanks,
Andrew

>
> The documentation on the auto-prefetcher is very sparse, and all I have are
> my empirical studies and a short note in the Cortex-A15 manual (search for "L2
> cache auto-prefetcher"). This patch, therefore, implements a very abstract
> model that makes the scheduler prefer "mem_op (base+8); mem_op (base+12)" over
> "mem_op (base+12); mem_op (base+8)". In other words, the scheduler tries to
> issue memory operations in order of increasing memory offsets.
>
> The auto-prefetcher model implementation is based on max_issue multipass
> lookahead scheduling and its "guard" hook. The guard hook examines the contents
> of the ready list and the queue and, if it finds instructions with lower
> memory offsets, marks instructions with higher memory offsets as unavailable
> for immediate scheduling.
>
> This patch has been in the works since the beginning of the year, and many of
> my previous scheduler cleanup patches were to prepare the infrastructure for
> this feature.
>
> Ramana, this change requires benchmarking, which I can't easily do at the
> moment. I would appreciate any benchmarking results that you can share. In
> particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH needs to be
> tuned/confirmed for Cortex-A15.
>
> At the moment the parameter is set to "2", which means that the autopref
> model will look through the ready list and the 1-stall queue in search of
> relevant instructions. Values of -1 (disable autopref), 0 (use autopref only in
> rank_for_schedule), 1 (look through ready list), 2 (look through ready list
> and 1-stall queue), and 3 (look through ready list and 2-stall queue) should
> be considered and benchmarked.
>
> Bootstrapped on x86_64-linux-gnu and regtested on arm-linux-gnueabihf and
> aarch64-linux-gnu. OK to apply, provided there are no performance or
> correctness regressions?
>
> [ChangeLog is part of the git patch]
>
> Thank you,
>
> --
> Maxim Kuvyrkov
> www.linaro.org
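
The guard-hook idea described above may be easier to follow as a small C sketch. This is not code from the patch, and it does not use GCC's internal types. The struct mem_op record and the autopref_rank and autopref_guard functions are hypothetical. The sketch also assumes each memory operation has already been reduced to three things: a base register, a constant offset, and the number of stall cycles before it becomes ready, where 0 means it is in the ready list. The depth argument mirrors the PARAM_SCHED_AUTOPREF_QUEUE_DEPTH values listed in the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scheduler candidate (not GCC's rtx_insn). */
struct mem_op {
  int id;        /* instruction identifier */
  int base_reg;  /* base register of the address */
  long offset;   /* constant offset from base_reg */
  int stall;     /* 0 = in ready list; k = ready after k stall cycles */
};

/* Ranking heuristic: for two memory ops on the same base, return a
   negative value if A has the lower offset (A should issue first),
   positive if B does, and 0 if the offsets are equal.  Ops on
   different bases are left unordered (0) by this heuristic. */
int autopref_rank(const struct mem_op *a, const struct mem_op *b)
{
  if (a->base_reg != b->base_reg)
    return 0;
  return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Guard check: return true if INSN should be held back because another
   candidate with the same base and a lower offset is visible within
   DEPTH.  Depth semantics follow the mail: -1 disables the model,
   0 restricts it to ranking (no guard), 1 looks at the ready list
   only, and N >= 2 also looks at the queue up to N-1 stall cycles. */
bool autopref_guard(const struct mem_op *insn,
                    const struct mem_op *cands, size_t n, int depth)
{
  assert(insn != NULL && (cands != NULL || n == 0));
  if (depth < 1)
    return false;
  for (size_t i = 0; i < n; i++) {
    const struct mem_op *c = &cands[i];
    if (c->id == insn->id)
      continue;              /* skip INSN itself */
    if (c->stall > depth - 1)
      continue;              /* too far in the queue to be considered */
    if (autopref_rank(c, insn) < 0)
      return true;           /* a lower-offset op on the same base exists */
  }
  return false;
}
```

For example, suppose loads from base+12 and base+8 are both in the ready list. Then autopref_guard holds back the base+12 load, so the base+8 load issues first. Now add a base+4 load that is one stall cycle away. With depth 1 that load is not seen. With depth 2 it also holds back the base+8 load. This is the kind of behavior the queue-depth tuning would change.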