Re: [PATCH 1/2] Introduce prefetch-minimum stride option

Luis Machado Tue, 23 Jan 2018 05:13:30 -0800

Hi Kyrill,

On 01/23/2018 07:32 AM, Kyrill Tkachov wrote:

Hi Luis,
On 22/01/18 13:46, Luis Machado wrote:
This patch adds a new option to control the minimum stride, for a memory
reference, above which the loop prefetch pass may issue software prefetch
hints. There are two motivations:
* Make the pass less aggressive, only issuing prefetch hints for bigger strides
that are more likely to benefit from prefetching. I've noticed a case in cpu2017
where we were issuing thousands of hints, for example.

I've noticed a large amount of prefetch hints being issued as well, but had not
analysed it further.

I've gathered some numbers for this. Some of the most extreme cases before both patches:


CPU2017

xalancbmk_s: 3755 hints
wrf_s: 10950 hints
parest_r: 8521 hints

CPU2006

gamess: 11377 hints
wrf: 3238 hints

After both patches:

CPU2017

xalancbmk_s: 1 hint
wrf_s: 20 hints
parest_r: 0 hints

CPU2006

gamess: 44 hints
wrf: 16 hints

* For processors that have a hardware prefetcher, like Falkor, it allows the
loop prefetch pass to defer prefetching of smaller (less than the threshold)
strides to the hardware prefetcher instead. This prevents conflicts between
the software prefetcher and the hardware prefetcher.
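
To make the threshold idea concrete, here is a minimal sketch in C of the kind of decision described in the two points above. It is not code from the patch: the struct names, field names and the function should_issue_prefetch are all hypothetical, and the sketch assumes that a threshold of 0 means "no threshold" and that the magnitude of the stride is what gets compared. The real pass may differ on both points.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tuning knobs; the names are illustrative, not GCC's. */
struct prefetch_tuning {
    long min_stride;            /* bytes; 0 means "no threshold" in this sketch */
    bool prefetch_nonconstant;  /* allow hints for non-constant strides */
};

/* Hypothetical description of one memory reference inside a loop. */
struct mem_ref {
    bool stride_is_constant;
    long stride;                /* bytes per iteration; only meaningful if constant */
};

/* Return true if a software prefetch hint may be issued for this reference.
   Constant strides below the threshold are left to the hardware prefetcher.
   Comparing the absolute value is an assumption of this sketch. */
static bool should_issue_prefetch(const struct prefetch_tuning *t,
                                  const struct mem_ref *r)
{
    if (!r->stride_is_constant)
        return t->prefetch_nonconstant;
    return labs(r->stride) >= t->min_stride;
}
```

With a hypothetical threshold of 2048 bytes, a reference with a 64-byte stride would get no hint while one with a 4096-byte stride would. With the sketch's "no threshold" setting, every constant stride still qualifies, which is the sense in which permissive defaults leave existing behavior alone.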

I've noticed considerable reduction in the number of prefetch hints and
slightly positive performance numbers. This aligns GCC and LLVM in terms of
prefetch behavior for Falkor.
Do you, by any chance, have a link to the LLVM review that implemented that behavior?
It's okay if you don't, but I think it would be useful context.


I've dug it up. The base change was implemented here:

review: https://reviews.llvm.org/D17945
RFC: http://lists.llvm.org/pipermail/llvm-dev/2015-December/093514.html

And then target-specific changes were introduced later for specific processors.

One small difference in LLVM is the fact that the second parameter, prefetching of non-constant strides, is implicitly switched off if one sets the minimum stride length. My approach here makes that second parameter adjustable.
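
The difference between the two policies can be sketched in C as follows. This is only an illustration of the paragraph above, not code from either compiler: the struct and both function names are hypothetical, and the sketch treats any nonzero threshold as "the minimum stride has been set", which may not match the exact condition LLVM uses.

```c
#include <stdbool.h>

/* Hypothetical parameter pair; names are illustrative only. */
struct prefetch_params {
    long min_stride;           /* 0 means "not set" in this sketch */
    bool nonconstant_strides;  /* requested setting for non-constant strides */
};

/* Coupled policy, as described for LLVM above: setting a minimum stride
   implicitly switches off prefetching of non-constant strides. */
static bool nonconstant_enabled_coupled(const struct prefetch_params *p)
{
    if (p->min_stride > 0)
        return false;
    return p->nonconstant_strides;
}

/* Independent policy, as described for this patch: the second parameter
   is honored regardless of the minimum stride. */
static bool nonconstant_enabled_independent(const struct prefetch_params *p)
{
    return p->nonconstant_strides;
}
```

The two functions only disagree when a threshold is set and non-constant stride prefetching is also requested; in that case the coupled policy overrides the request and the independent one keeps it.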

I've seen big gains due to prefetching of non-constant strides, but it tends to be tricky to control and usually comes together with significant regressions as well.

The fact that we potentially unroll loops along with issuing prefetch hints also makes things a bit erratic.

The default settings should guarantee no changes for existing targets. Those
are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?


Are there any benchmark numbers you can share?
I think this approach is sensible.

Comparing the previous, more aggressive, pass behavior with the new one, I've seen a slight improvement for CPU2006, 0.15% for both INT and FP.

For CPU2017 the previous behavior was actually a bit harmful, regressing performance by about 1.2% in intspeed. The new behavior kept intspeed stable and slightly improved fpspeed by 0.15%.

The motivation for the future is to have better control of software prefetching so we can fine-tune the pass, either through generic loop prefetch code or by using the target-specific parameters.

Since your patch touches generic code as well as AArch64
code you'll need an approval from a midend maintainer as well as an AArch64 maintainer.
Also, GCC development is now in the regression fixing stage, so unless this fixes a regression
it may have to wait until GCC 9 development is opened.

That is my understanding. I thought I'd put this up for review anyway so people can chime in and provide their thoughts.


Thanks for the review.

Luis
