On Fri, Feb 3, 2017 at 4:00 AM, Maxim Kuvyrkov
<maxim.kuvyr...@linaro.org> wrote:
> Hi Andrew,
>
> I took the liberty of rebasing your patch on top of my patchset. Does it
> look correct?

Yes this looks correct.

Thanks,
Andrew

>
> I think I addressed all the comments you had about my review and posted
> updated patches.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>
>> On Jan 30, 2017, at 7:25 PM, Andrew Pinski <apin...@cavium.com> wrote:
>>
>> On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
>> <maxim.kuvyr...@linaro.org> wrote:
>>>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski <apin...@cavium.com> wrote:
>>>>
>>>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski <apin...@cavium.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>>>>>> -mcpu=thunderxt88p1. I filled out the tuning structures for both
>>>>>>> thunderx and thunderx2t99. No other core current enables software
>>>>>>> prefetching so I set them to 0 which does not change the default
>>>>>>> parameters.
>>>>>>>
>>>>>>> OK? Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>>>>>> CN88xx with no regressions. I got a 2x improvement for 462.libquantum
>>>>>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>>>>>> CN99xx's SPEC did not change.
>>>>>>
>>>>>> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>>>>>
>>>>> And I wonder if most benefit comes from the unrolling the pass might do
>>>>> rather than from the prefetches...
>>>>
>>>> Not in this case. The main reason why I know is because the number of
>>>> L1 and L2 misses drops a lot.
>>>
>>> I can confirm this. In my experiments loop unrolling hurts several tests.
>>
>> Not on the cores I tried it. I tried it on both ThunderX CN88xx and
>> ThunderX CN99xx, I did not get any regressions due to unrolling.
>>
>> Thanks,
>> Andrew
>>
>>>
>>> The prefetching approach I'm testing for -O2 includes disabling of loop
>>> unrolling to prevent code bloat.
>>>
>>> --
>>> Maxim Kuvyrkov
>>> www.linaro.org
>