Hi,
We'd like to draw attention to CodeSourcery's patch for the ARM backend,
which gives GCC mainline a 4% gain on SPEC2K INT:
http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch
(the patch is also attached).
Originally, we noticed that GNU Go runs 6% faster on cortex-a8 with
-fno-gcse. Profiling showed that the slowdown is most likely caused by
cache misses when accessing global variables: GCC generates ldr
instructions (literal-pool loads) for them, which can be avoided by
emitting a movt/movw pair instead. The RTL expressions for these
instructions are high and lo_sum. Currently, a symbol_ref expands into
high and lo_sum, but then cprop1 decides that this is redundant and
merges them back into a single load insn.
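To illustrate what happens (the RTL below is a rough sketch; the register
number and exact operand forms are invented for the example), expand
produces a high/lo_sum pair for a global `x`, which cprop1 then collapses
into a single move that is later materialized as a literal-pool load:

```
;; After expand: the address of `x` is built in two halves
(set (reg:SI 134) (high:SI (symbol_ref:SI "x")))                 ; movw rN, #:lower16:x
(set (reg:SI 134) (lo_sum:SI (reg:SI 134) (symbol_ref:SI "x")))  ; movt rN, #:upper16:x

;; After cprop1: the pair is folded into one insn, which ends up
;; as a load from the constant pool
(set (reg:SI 134) (symbol_ref:SI "x"))                           ; ldr rN, =x
```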
The problem was also found by the Linaro community:
https://bugs.launchpad.net/gcc-linaro/+bug/886124 .
There is also a patch from CodeSourcery (attached), which was ported to
Linaro GCC 4.5 but is missing from later Linaro releases.
This patch splits symbol_refs at a later stage (after cprop) instead of
generating movt/movw at expand time.
It fixes our GNU Go test case. We also tested it on SPEC2K INT (ref)
with a GCC 4.8 snapshot from May 12, 2012 on cortex-a9 with -O2 and -mthumb:
Base Base Base Peak Peak Peak
Benchmarks Ref Time Run Time Ratio Ref Time Run Time Ratio
---------- -------- -------- -------- -------- -------- -------
164.gzip 1400 492 284 1400 497 282 -0.70%
175.vpr 1400 433 323 1400 458 306 -5.26%
176.gcc 1100 203 542 1100 198 557 2.77%
181.mcf 1800 529 340 1800 528 341 0.29%
186.crafty 1000 261 383 1000 256 391 2.09%
197.parser 1800 709 254 1800 701 257 1.18%
252.eon 1300 219 594 1300 202 644 8.42%
253.perlbmk 1800 389 463 1800 367 490 5.83%
254.gap 1100 259 425 1100 236 467 9.88%
255.vortex 1900 498 382 1900 442 430 12.57%
256.bzip2 1500 452 332 1500 424 354 6.63%
300.twolf 3000 916 328 3000 853 352 7.32%
SPECint_base2000 376
SPECint2000 391 3.99%
SPEC2K INT grows by 4% (up to 12.57% on vortex; the vpr slowdown is
likely due to high run-to-run variance on this test).
Similarly, there are gains of 3-4% without -mthumb on cortex-a9 and on
cortex-a8 (both Thumb-2 and ARM modes).
The patch applies to current trunk and passes the regression tests on
qemu-arm.
Would it be good to have it in trunk?
If everybody agrees, we can take care of committing it.
--
Best regards,
Dmitry
2010-08-20 Jie Zhang <j...@codesourcery.com>
Merged from Sourcery G++ 4.4:
gcc/
2009-05-29 Julian Brown <jul...@codesourcery.com>
Merged from Sourcery G++ 4.3:
* config/arm/arm.md (movsi): Don't split symbol refs here.
(define_split): New.
2010-08-18 Julian Brown <jul...@codesourcery.com>
Issue #9222
=== modified file 'gcc/config/arm/arm.md'
--- old/gcc/config/arm/arm.md 2010-08-20 16:41:37 +0000
+++ new/gcc/config/arm/arm.md 2010-08-23 14:39:12 +0000
@@ -5150,14 +5150,6 @@
optimize && can_create_pseudo_p ());
DONE;
}
-
- if (TARGET_USE_MOVT && !target_word_relocations
- && GET_CODE (operands[1]) == SYMBOL_REF
- && !flag_pic && !arm_tls_referenced_p (operands[1]))
- {
- arm_emit_movpair (operands[0], operands[1]);
- DONE;
- }
}
else /* TARGET_THUMB1... */
{
@@ -5265,6 +5257,19 @@
"
)
+(define_split
+ [(set (match_operand:SI 0 "arm_general_register_operand" "")
+ (match_operand:SI 1 "general_operand" ""))]
+ "TARGET_32BIT
+ && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+ && !flag_pic && !target_word_relocations
+ && !arm_tls_referenced_p (operands[1])"
+ [(clobber (const_int 0))]
+{
+ arm_emit_movpair (operands[0], operands[1]);
+ DONE;
+})
+
(define_insn "*thumb1_movsi_insn"
[(set (match_operand:SI 0 "nonimmediate_operand" "=l,l,l,l,l,>,l, m,*lhk")
(match_operand:SI 1 "general_operand" "l, I,J,K,>,l,mi,l,*lhk"))]