Hi Andrew, I took the liberty of rebasing your patch on top of my patchset. Does it look correct?
I think I addressed all the comments you had about my review and posted updated patches.

--
Maxim Kuvyrkov
www.linaro.org

> On Jan 30, 2017, at 7:25 PM, Andrew Pinski <apin...@cavium.com> wrote:
>
> On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski <apin...@cavium.com> wrote:
>>>
>>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski <apin...@cavium.com> wrote:
>>>>>> Hi,
>>>>>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>>>>> -mcpu=thunderxt88p1. I filled out the tuning structures for both
>>>>>> thunderx and thunderx2t99. No other core current enables software
>>>>>> prefetching so I set them to 0 which does not change the default
>>>>>> parameters.
>>>>>>
>>>>>> OK? Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>>>>> CN88xx with no regressions. I got a 2x improvement for 462.libquantum
>>>>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>>>>> CN99xx's SPEC did not change.
>>>>>
>>>>> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>>>>
>>>> And I wonder if most benefit comes from the unrolling the pass might do
>>>> rather than from the prefetches...
>>>
>>> Not in this case. The main reason why I know is because the number of
>>> L1 and L2 misses drops a lot.
>>
>> I can confirm this. In my experiments loop unrolling hurts several tests.
>
> Not on the cores I tried it. I tried it on both ThunderX CN88xx and
> ThunderX CN99xx, I did not get any regressions due to unrolling.
>
> Thanks,
> Andrew
>
>>
>> The prefetching approach I'm testing for -O2 includes disabling of loop
>> unrolling to prevent code bloat.
>>
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
0007-Prefetch-tuning-for-ThunderX.patch
Description: Binary data