On Fri, Sep 15, 2017 at 2:33 AM, Kugan Vivekanandarajah
<kugan.vivekanandara...@linaro.org> wrote:
> This patch adds aarch64_loop_unroll_adjust to limit partial unrolling
> in rtl based on strided-loads in loop.
>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  <kug...@linaro.org>
>
>         * cfgloop.h (iv_analyze_biv): export.
>         * loop-iv.c: Likewise.
>         * config/aarch64/aarch64.c (strided_load_p): New.
>         (insn_has_strided_load): New.
>         (count_strided_load_rtl): New.
>         (aarch64_loop_unroll_adjust): New.
This implementation assumes a particular kind of hardware prefetcher and
collisions in that prefetcher. Are you sure this helps every single
micro-architecture out there (or rather, that it doesn't harm any)?

Furthermore, how has this patchset been benchmarked: on which
micro-architecture, with which benchmarks, and what is the performance
impact? And why should this be considered for generic?

regards
Ramana