> On Feb 3, 2017, at 8:12 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> wrote:
> 
> Hi all,
> 
> While evaluating Maxim's SW prefetch patches [1] I noticed that the
> aarch64 prefetch pattern is overly restrictive in its address operand.
> It only accepts simple register addressing modes. In fact, the PRFM
> instruction accepts almost all modes that a normal 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation
> code to be emitted which we could avoid.

Thanks for this fix; I'll test it on my hardware.

I've reviewed your patch and it looks OK to me.

> 
> This patch relaxes the restrictions on the prefetch define_insn. It
> creates a predicate and constraint that allow the full addressing modes
> that PRFM allows. Thus for the testcase in the patch (adapted from one
> of the existing __builtin_prefetch tests in the testsuite) we can
> generate a:
> prfm    PLDL1STRM, [x1, 8]
> 
> instead of the current
> prfm    PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
> 
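
For readers following along, here is a minimal sketch of the kind of
source that exercises the case described above. It is not the testcase
from the patch: the function name sum_with_prefetch and the loop shape
are hypothetical. It assumes an LP64 target, so that the element after
p[i] lies 8 bytes further on, and it uses locality 0, which the aarch64
backend is expected to print as PLDL1STRM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the testcase from the patch.
   Sums n longs while prefetching the element after the current one.  */
long sum_with_prefetch(const long *p, size_t n)
{
    assert(p != NULL);

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Arguments: address, rw = 0 (read), locality = 0 (streaming).
           The address is base plus a constant 8-byte offset on LP64,
           which is the form a relaxed pattern may fold into the PRFM
           operand instead of computing it with a separate add.
           On the last iteration this is the one-past-the-end address,
           which is valid to form; the prefetch itself does not fault.  */
        __builtin_prefetch(&p[i + 1], 0, 0);
        sum += p[i];
    }
    return sum;
}
```

Compiling something like this with an aarch64 GCC at -O2 and reading the
generated assembly should show whether the offset ends up inside the
prfm operand; the exact output depends on the compiler version and on
how the loop is optimised.
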
> I've removed the %a output modifier in the output template and wrapped
> the address operand into a DImode MEM before passing it down to
> aarch64_print_operand.
> 
> This is because operand 0 is an address operand rather than a memory
> operand and thus doesn't have a mode associated with it.  When
> processing the 'a' output modifier the code in final.c will call
> TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode argument.  This will ICE on
> aarch64 because we need a mode for the memory in order for
> aarch64_classify_address to work correctly.  Rather than overriding the
> VOIDmode in aarch64_print_operand_address I decided to instead create
> the DImode MEM in the "prefetch" output template and treat it as a
> normal 64-bit memory address, which at the point of assembly output is
> what it is anyway.

I agree that it is cleaner to wrap the prefetch operand in a DImode MEM just
before printing it out as assembly.  There is little to be gained by relaxing
the asserts in aarch64_print_operand_address.

> 
> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top of Maxim's patchset
> because fewer address calculation instructions are emitted due to the
> use of the more expressive addressing modes. It also fixes a
> performance regression that I observed in 410.bwaves from Maxim's
> patches on Cortex-A72. I'll be running a full set of benchmarks to
> evaluate this further, but I think this is the right thing to do.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Maxim, do you want to try this on top of your patches on your hardware
> to see if it helps with the regressions you mentioned?

Sure.

--
Maxim Kuvyrkov
www.linaro.org

