Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

Andrew Pinski Sat, 27 May 2017 22:02:53 -0700

On Tue, Feb 28, 2017 at 1:53 AM, Maxim Kuvyrkov
<maxim.kuvyr...@linaro.org> wrote:
>> On Feb 20, 2017, at 5:38 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> 
>> wrote:
>>
>> Hi Maxim,
>>
>> On 30/01/17 11:24, Maxim Kuvyrkov wrote:
>>> This patch series improves the -fprefetch-loop-arrays pass through small fixes 
>>> and tweaks, and then enables it for several AArch64 cores.
>>>
>>> My tunings were done on and for Qualcomm hardware, with results varying 
>>> between +0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3, 
>>> depending on hardware revision.
>>>
>>> This patch series enables restricted -fprefetch-loop-arrays at -O2, which 
>>> also improves SPEC2006 numbers.
>>>
>>> Biggest progressions are on 419.mcf and 437.leslie3d, with no serious 
>>> regressions on other benchmarks.
>>>
>>> I'm now investigating making -fprefetch-loop-arrays more aggressive for 
>>> Qualcomm hardware, which improves performance on most benchmarks, but also 
>>> causes big regressions on 454.calculix and 462.libquantum.  If I can fix 
>>> these two regressions, prefetching will give another boost to AArch64.
>>>
>>> Andrew just posted similar prefetching tunings for Cavium's cores, and the 
>>> two patches have trivial conflicts.  I'll post mine as-is, since it addresses 
>>> one of the comments on Andrew's review (adding a stand-alone struct for 
>>> tuning parameters).
>>>
>>> Andrew, feel free to just copy-paste it to your patch, since it is just a 
>>> mechanical change.
>>>
>>> All patches were bootstrapped and regtested on x86_64-linux-gnu and 
>>> aarch64-linux-gnu.
>>>
>>
>> I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning 
>> struct entries appropriately
>> modified to enable the changes on those cores.
>> I'm seeing the mcf and leslie3d improvements as well on Cortex-A72 and 
>> Cortex-A53 and no noticeable regressions.
>> I've also verified that the improvements are due to the prefetch 
>> instructions rather than just the unrolling that
>> the pass does.
>> So I'm in favor of enabling this for the cores that benefit from it.
>>
>> Do you plan to get this in for GCC 8?
>
> Hi Kyrill,
>
> My hope was to push them in time for GCC 7, but it seems too late now.  I'll 
> return to these patches at the beginning of Stage 1.


Ping on this patch set as I really want to get in the prefetching side
for ThunderX 1 and 2.  Or should I resubmit my patch set?

Thanks,
Andrew

>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
