On 29/11/12 17:16, Christophe Lyon wrote:
Hi,
I have been working on a patch to avoid using Neon for 64-bit bitops
when it's too expensive to move data between core and Neon registers.
Benchmarking and validation look OK on the 4.7 branch (compiler
configured for Thumb and hard FP):
- validation on cortex-a9 board OK
- benchmarking shows a 10.5% improvement on spec2k's crafty bench. On
other benches we are between -0.5% and +0.5%.
On trunk I have noticed a regression in gfortran when using modulo
scheduling: sms-1.f90 now fails, but I suspect it's not because of
this patch since forcing compilation for armv5t makes the same test
fail with and without my patch.
Hmm, that's worrying. Could you please make sure this is recorded in
Bugzilla? If this is a regression, please mark it as such.
Specifically, I have observed that the loop:
862e: 3b01 subs r3, #1
8630: ef70 08a1 vadd.i64 d16, d16, d17
8634: ec51 0b30 vmov r0, r1, d16
8638: e9e2 0102 strd r0, r1, [r2, #8]!
863c: d1f7 bne.n 862e <main+0x3e>
is transformed into:
862e: 3901 subs r1, #1
8630: 1912 adds r2, r2, r4
8632: eb43 0305 adc.w r3, r3, r5
8636: e9e0 2302 strd r2, r3, [r0, #8]!
863a: d1f8 bne.n 862e <main+0x3e>
with my patch.
This looks wrong: adds overwrites the flags that subs set, so bne no
longer tests the loop counter (adc.w only consumes the carry and sets
no flags).
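For readers without the benchmark source at hand, here is a minimal C sketch of the kind of loop that could compile to the two listings above. The source of the loop is not shown in this mail, so the function names (fill_running_sum, add64_by_halves) and the loop shape are assumptions for illustration only. The second function spells out what the adds/adc pair computes on the two 32-bit halves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reproducer: keep a running 64-bit sum and store it to
   successive slots.  The real loop behind the listings is not shown in
   the mail, so this shape is an assumption.  */
void fill_running_sum(uint64_t *out, size_t n, uint64_t start, uint64_t step)
{
    uint64_t acc = start;
    for (size_t i = 0; i < n; i++) {
        acc += step;   /* vadd.i64 on Neon, or adds/adc on core registers */
        out[i] = acc;  /* stored as two 32-bit halves (strd) */
    }
}

/* The same 64-bit add written on 32-bit halves, mirroring adds/adc.  */
uint64_t add64_by_halves(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);

    uint32_t lo = alo + blo;          /* adds: low words, wraps modulo 2^32 */
    uint32_t carry = lo < alo;        /* carry out of the low-word add */
    uint32_t hi = ahi + bhi + carry;  /* adc: high words plus carry in */

    return ((uint64_t)hi << 32) | lo;
}
```

In the core-register version the carry from the low-word add has to reach the high-word add, which is why the order of the flag-setting instructions in the loop matters.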
The patch is:
2012-11-28  Christophe Lyon  <christophe.l...@linaro.org>

	gcc/
	* config/arm/arm-protos.h (tune_params): Add
	prefer_neon_for_64bits field.
	* config/arm/arm.c (prefer_neon_for_64bits): New variable.
	(arm_slowmul_tune): Default prefer_neon_for_64bits to false.
	(arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
	(arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
	(arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
	(arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
	(arm_option_override): Handle -mneon-for-64bits new option.
	* config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
	(prefer_neon_for_64bits): Declare new variable.
	* config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
	avoid_neon_for_64bits and neon_for_64bits.
	(arch_enabled): Handle new arch types.
	(one_cmpldi2): Use new arch names.
	* config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
	(anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
	neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
	of onlya8.
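
As a rough illustration of what the ChangeLog describes, the sketch below shows one way a per-CPU tuning default and a command-line override could be combined. It is not the code from the patch. The struct, the enum, and the function name are hypothetical stand-ins, and the assumption that an explicit option takes precedence over the tuning default is mine, not something stated in this mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tuning-table entry; not the real GCC
   tune_params struct.  */
struct tune_params_sketch {
    const char *name;
    bool prefer_neon_for_64bits;  /* false for every tuning in the ChangeLog */
};

/* Hypothetical tri-state for the command-line option.  */
enum neon64_opt { NEON64_UNSET, NEON64_OFF, NEON64_ON };

/* Assumed policy: an explicit option wins, otherwise the tuning default
   for the selected CPU applies.  */
bool resolve_prefer_neon_for_64bits(const struct tune_params_sketch *tune,
                                    enum neon64_opt opt)
{
    if (opt == NEON64_ON)
        return true;
    if (opt == NEON64_OFF)
        return false;
    return tune->prefer_neon_for_64bits;
}
```

With every tuning defaulting to false, as the ChangeLog lists, this model only selects the Neon alternatives when the option is given explicitly.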
Is it OK for trunk?
Now that this optimization is disabled by default, the onlya8 code is
completely redundant and should be purged, along with the insn
alternatives that used it.
R.