On Mon, May 7, 2018 at 7:09 AM, Luis Machado <luis.mach...@linaro.org> wrote:
>
>
> On 05/01/2018 03:30 PM, Jeff Law wrote:
>>
>> On 01/22/2018 06:46 AM, Luis Machado wrote:
>>>
>>> This patch adds a new option to control the minimum stride, for a memory
>>> reference, after which the loop prefetch pass may issue software prefetch
>>> hints for. There are two motivations:
>>>
>>> * Make the pass less aggressive, only issuing prefetch hints for bigger
>>>   strides that are more likely to benefit from prefetching. I've noticed
>>>   a case in cpu2017 where we were issuing thousands of hints, for example.
>>>
>>> * For processors that have a hardware prefetcher, like Falkor, it allows
>>>   the loop prefetch pass to defer prefetching of smaller (less than the
>>>   threshold) strides to the hardware prefetcher instead. This prevents
>>>   conflicts between the software prefetcher and the hardware prefetcher.
>>>
>>> I've noticed considerable reduction in the number of prefetch hints and
>>> slightly positive performance numbers. This aligns GCC and LLVM in terms
>>> of prefetch behavior for Falkor.
>>>
>>> The default settings should guarantee no changes for existing targets.
>>> Those are free to tweak the settings as necessary.
>>>
>>> No regressions in the testsuite and bootstrapped ok on aarch64-linux.
>>>
>>> Ok?
>>>
>>> 2018-01-22  Luis Machado  <luis.mach...@linaro.org>
>>>
>>>     Introduce option to limit software prefetching to known constant
>>>     strides above a specific threshold with the goal of preventing
>>>     conflicts with a hardware prefetcher.
>>>
>>>     gcc/
>>>     * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
>>>     <minimum_stride>: New const int field.
>>>     * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
>>>     minimum_stride field.
>>>     (exynosm1_prefetch_tune): Likewise.
>>>     (thunderxt88_prefetch_tune): Likewise.
>>>     (thunderx_prefetch_tune): Likewise.
>>>     (thunderx2t99_prefetch_tune): Likewise.
>>>     (qdf24xx_prefetch_tune): Likewise. Set minimum_stride to 2048.
>>>     (aarch64_override_options_internal): Update to set
>>>     PARAM_PREFETCH_MINIMUM_STRIDE.
>>>     * doc/invoke.texi (prefetch-minimum-stride): Document new option.
>>>     * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
>>>     * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
>>>     * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
>>>     stride is constant and is below the minimum stride threshold.
>>
>> OK for the trunk.
>> jeff
>>
>
> Thanks. Committed as revision 259995 now.

This breaks bootstrap on x86:

../../src-trunk/gcc/tree-ssa-loop-prefetch.c: In function ‘bool should_issue_prefetch_p(mem_ref*)’:
../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1010:54: error: comparison of integer expressions of different signedness: ‘long long unsigned int’ and ‘int’ [-Werror=sign-compare]
       && absu_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
../../src-trunk/gcc/tree-ssa-loop-prefetch.c:1014:4: error: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘long long int’ [-Werror=format=]
    "Step for reference %u:%u (%d) is less than the mininum "
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "required stride of %d\n",
    ~~~~~~~~~~~~~~~~~~~~~~~~~
    ref->group->uid, ref->uid, int_cst_value (ref->group->step),
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
H.J.
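
For reference, both diagnostics above come from mixing a wide step value with a plain int: the comparison pairs an unsigned wide value with a signed int threshold, and the dump string prints a wide value with %d. The following is only a minimal standalone sketch of one way such a check and message could be written without those warnings. It is not the committed fix. The names wide_int_sketch, abs_unsigned, below_minimum_stride and describe are hypothetical stand-ins for GCC's own types and helpers, and treating a negative threshold as "no minimum" is an assumption made for the sketch. Within GCC itself, a macro such as HOST_WIDE_INT_PRINT_DEC would presumably play the role PRId64 plays here.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a wide signed step type; not GCC's HOST_WIDE_INT.
using wide_int_sketch = std::int64_t;

// Absolute value returned as unsigned. The negation is done in unsigned
// arithmetic so that the most negative value is handled without overflow.
std::uint64_t abs_unsigned(wide_int_sketch x) {
  return x < 0 ? std::uint64_t{0} - static_cast<std::uint64_t>(x)
               : static_cast<std::uint64_t>(x);
}

// True when |step| is below the threshold. Both operands of the comparison
// are unsigned, so -Wsign-compare has nothing to flag.
// Assumption: a negative threshold means "no minimum configured".
bool below_minimum_stride(wide_int_sketch step, int minimum_stride) {
  if (minimum_stride < 0)
    return false;
  return abs_unsigned(step) < static_cast<std::uint64_t>(minimum_stride);
}

// Builds the dump message. PRId64 matches the 64-bit step argument, so the
// format string and the argument types agree on every host.
std::string describe(unsigned group_uid, unsigned ref_uid,
                     wide_int_sketch step, int minimum_stride) {
  char buf[160];
  std::snprintf(buf, sizeof buf,
                "Step for reference %u:%u (%" PRId64 ") is less than the "
                "minimum required stride of %d\n",
                group_uid, ref_uid, step, minimum_stride);
  return std::string(buf);
}
```

With the qdf24xx threshold of 2048 from the ChangeLog, a step of 64 or -64 would be reported as below the minimum, while a step of exactly 2048 would not.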