Hi,
We'd like to draw attention to CodeSourcery's patch for the ARM backend,
which gives GCC mainline a 4% gain on SPEC2K INT:
http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch
(the patch is also attached).
Originally, we noticed that GNU Go runs 6% faster on cortex-a8 with
-fno-gcse. Profiling showed that the slowdown is most likely caused by
cache misses when accessing global variables: GCC generates ldr
instructions (literal-pool loads) for them, which can be avoided by
emitting a movt/movw pair instead. The RTL expressions for these
instructions are high and lo_sum. Currently, a symbol_ref expands into
high and lo_sum, but then cprop1 decides that this is redundant and
merges them back into a single load insn.
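To illustrate what happens (the RTL below is a rough sketch; the register
number and exact operand forms are invented for the example), expand
produces a high/lo_sum pair for a global `x`, which cprop1 then collapses
into a single move that is later materialized as a literal-pool load:

```
;; After expand: the address of `x` is built in two halves
(set (reg:SI 134) (high:SI (symbol_ref:SI "x")))                 ; movw rN, #:lower16:x
(set (reg:SI 134) (lo_sum:SI (reg:SI 134) (symbol_ref:SI "x")))  ; movt rN, #:upper16:x

;; After cprop1: the pair is folded into one insn, which ends up
;; as a load from the constant pool
(set (reg:SI 134) (symbol_ref:SI "x"))                           ; ldr rN, =x
```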
The problem was also found by the Linaro community:
https://bugs.launchpad.net/gcc-linaro/+bug/886124 .
There is also a patch from CodeSourcery (attached), which was ported to
Linaro GCC 4.5 but is missing from later Linaro releases.
This patch splits symbol_refs at a later stage (after cprop) instead of
generating movt/movw at expand time.
It fixes our GNU Go test case. We also tested it on SPEC2K INT (ref)
with a GCC 4.8 snapshot from May 12, 2012 on cortex-a9 with -O2 and -mthumb:
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
---------- -------- -------- -------- -------- -------- -------
164.gzip 1400 492 284 1400 497 282 -0.70%
175.vpr 1400 433 323 1400 458 306 -5.26%
176.gcc 1100 203 542 1100 198 557 2.77%
181.mcf 1800 529 340 1800 528 341 0.29%
186.crafty 1000 261 383 1000 256 391 2.09%
197.parser 1800 709 254 1800 701 257 1.18%
252.eon 1300 219 594 1300 202 644 8.42%
253.perlbmk 1800 389 463 1800 367 490 5.83%
254.gap 1100 259 425 1100 236 467 9.88%
255.vortex 1900 498 382 1900 442 430 12.57%
256.bzip2 1500 452 332 1500 424 354 6.63%
300.twolf 3000 916 328 3000 853 352 7.32%
SPECint_base2000 376
SPECint2000 391 3.99%
SPEC2K INT grows by 4% (up to 12.57% on vortex; the vpr slowdown is
likely due to high run-to-run variance on this test).
Similarly, there are gains of 3-4% without -mthumb on cortex-a9 and on
cortex-a8 (both Thumb-2 and ARM modes).
The patch applies to current trunk and passes the regression tests on
qemu-arm.
Would it be good to have it in trunk?
If everybody agrees, we can take care of committing it.
--
Best regards,
Dmitry
2010-08-20 Jie Zhang <j...@codesourcery.com>
Merged from Sourcery G++ 4.4:
gcc/
2009-05-29 Julian Brown <jul...@codesourcery.com>
Merged from Sourcery G++ 4.3:
* config/arm/arm.md (movsi): Don't split symbol refs here.
(define_split): New.
2010-08-18 Julian Brown <jul...@codesourcery.com>
Issue #9222
=== modified file 'gcc/config/arm/arm.md'
--- old/gcc/config/arm/arm.md 2010-08-20 16:41:37 +0000
+++ new/gcc/config/arm/arm.md 2010-08-23 14:39:12 +0000
@@ -5150,14 +5150,6 @@
optimize && can_create_pseudo_p ());
DONE;
}
-
- if (TARGET_USE_MOVT && !target_word_relocations
- && GET_CODE (operands[1]) == SYMBOL_REF
- && !flag_pic && !arm_tls_referenced_p (operands[1]))
- {
- arm_emit_movpair (operands[0], operands[1]);
- DONE;
- }
}
else /* TARGET_THUMB1... */
{
@@ -5265,6 +5257,19 @@
"
)
+(define_split
+ [(set (match_operand:SI 0 "arm_general_register_operand" "")
+ (match_operand:SI 1 "general_operand" ""))]
+ "TARGET_32BIT
+ && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+ && !flag_pic && !target_word_relocations
+ && !arm_tls_referenced_p (operands[1])"
+ [(clobber (const_int 0))]
+{
+ arm_emit_movpair (operands[0], operands[1]);
+ DONE;
+})
+
(define_insn "*thumb1_movsi_insn"
[(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,l,l,l,>,l, m,*lhk")
(match_operand:SI 1 "general_operand" "l, I,J,K,>,l,mi,l,*lhk"))]